[ntar-workers] Packet Compression (was NTAR - PCAP next generation
dump file format)
Jose M. Gonzalez
chema at cs.berkeley.edu
Fri Jul 1 01:37:56 GMT 2005
I like Alex's third choice, i.e., leaving packet headers outside of the
bzip2/zip compression. Decompression would then be needed only when the
parser has to read the packet contents.
I'm actually trying to think about which scenarios will benefit from
compressing traces.
- first one is when you have a large trace and want to make it smaller.
You want to zip it w/o requiring unzipping the full trace to use it
again.
For this scenario, I think Alex's third approach will work. A possible
implementation is to have a CompressedPacketBlock with the type
"Packets Compressed Using Alex Idea." This CPB will contain a body with
the following structure:
+-------------------+------------------------+
| raw packet header | compressed packet data |
+-------------------+------------------------+
A raw packet header would include the same info as a Packet Block
(interface_id, drops_count, timestamp, caplen, len), plus the length
of the compressed data.
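
A minimal sketch of the body layout described above, to make the "parse the header without decompressing" point concrete. The field widths, the byte order, the names CPB_HEADER, pack_cpb_body, read_header and read_data, and the use of zlib as a stand-in for bzip2/zip are all assumptions made for illustration. None of them come from the NTAR draft.

```python
import struct
import zlib

# Hypothetical header layout (widths and order are assumptions):
# interface_id (u16), drops_count (u16), timestamp (u64), caplen (u32),
# len (u32), compressed_len (u32); little-endian, no padding.
CPB_HEADER = struct.Struct("<HHQIII")


def pack_cpb_body(interface_id, drops_count, timestamp, wire_len, data):
    """Build a body: raw header followed by the zlib-compressed packet data."""
    compressed = zlib.compress(data)
    header = CPB_HEADER.pack(interface_id, drops_count, timestamp,
                             len(data), wire_len, len(compressed))
    return header + compressed


def read_header(body):
    """Parse only the raw header; nothing is decompressed here."""
    return CPB_HEADER.unpack_from(body, 0)


def read_data(body):
    """Decompress the packet data, only when the contents are needed."""
    compressed_len = CPB_HEADER.unpack_from(body, 0)[5]
    start = CPB_HEADER.size
    return zlib.decompress(body[start:start + compressed_len])
```

A reader that filters on timestamp or caplen would call only read_header and skip compressed_len bytes to reach the next packet.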
In order to accommodate uncompressed L2-L4 data, the first X bytes
of the packet could be left uncompressed.
+-------------------+--------------------+---------------------+
| raw packet header | raw data beginning | zipped rest of data |
+-------------------+--------------------+---------------------+
To support this, the raw packet header could be extended with a "length
of the uncompressed data" word.
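
The extended layout can be sketched the same way. Again, the field widths, the names SPLIT_HEADER, pack_split_body, read_raw_prefix and read_full_packet, and zlib as the compressor are assumptions for illustration only.

```python
import struct
import zlib

# Hypothetical layout: Packet Block fields plus two lengths (widths assumed).
# interface_id (u16), drops_count (u16), timestamp (u64), caplen (u32),
# len (u32), raw_len (u32), compressed_len (u32); little-endian, no padding.
SPLIT_HEADER = struct.Struct("<HHQIIII")


def pack_split_body(interface_id, drops_count, timestamp, wire_len, data, raw_len):
    """Keep the first raw_len bytes as-is and zlib-compress the remainder."""
    raw_len = min(raw_len, len(data))
    head, rest = data[:raw_len], data[raw_len:]
    compressed = zlib.compress(rest) if rest else b""
    header = SPLIT_HEADER.pack(interface_id, drops_count, timestamp,
                               len(data), wire_len, raw_len, len(compressed))
    return header + head + compressed


def read_raw_prefix(body):
    """Return the uncompressed beginning without touching the zipped part."""
    raw_len = SPLIT_HEADER.unpack_from(body, 0)[5]
    return body[SPLIT_HEADER.size:SPLIT_HEADER.size + raw_len]


def read_full_packet(body):
    """Rebuild the whole captured packet, decompressing the tail if present."""
    fields = SPLIT_HEADER.unpack_from(body, 0)
    raw_len, compressed_len = fields[5], fields[6]
    start = SPLIT_HEADER.size + raw_len
    tail = body[start:start + compressed_len]
    return read_raw_prefix(body) + (zlib.decompress(tail) if tail else b"")
```

With raw_len set to cover the L2-L4 headers, a filter on addresses or ports would need only read_raw_prefix.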
- second one is a dumper trying to squeeze only a few bytes from every
packet. For this scenario, I think a GoPBlock with the type "Compressed
Packets" will be a better idea (you save bytes in exchange for limiting
browsing to the forward direction only). Packets will be compressed by
using the snaplen to cut them.
This is a case of lossy compression, focusing on the beginning of
every packet, and throwing out the rest. In my case (and I assume this
may be just my experience), I've been interested in keeping just the
L3-L4 headers (I was doing network monitoring). For me, the L7 contents
were useless.
A GoPBlock will contain lots of packets, whose structure will be as
follows:
+-------------------+--------------------+
| raw packet header | raw data beginning |
+-------------------+--------------------+
This is the same idea as the previous approach, with a snaplen="length
of the uncompressed data," and using a GoPBlock instead of lots of
PacketBlocks. That should get us to around 28+14+20+40 ~ 100 bytes/pkt.
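
A rough sketch of the grouped, snaplen-truncated layout, showing why browsing is forward-only. The per-packet header widths and the names PKT_HEADER, pack_group and iter_group are assumptions for illustration. The hypothetical header here is 20 bytes, not the 28 used in the estimate above.

```python
import struct

# Hypothetical per-packet header inside the group (widths are assumptions):
# interface_id (u16), drops_count (u16), timestamp (u64), caplen (u32), len (u32).
PKT_HEADER = struct.Struct("<HHQII")


def pack_group(packets, snaplen):
    """Concatenate (interface_id, drops_count, timestamp, data) records,
    keeping only the first snaplen bytes of each packet (lossy)."""
    out = bytearray()
    for interface_id, drops_count, timestamp, data in packets:
        kept = data[:snaplen]
        out += PKT_HEADER.pack(interface_id, drops_count, timestamp,
                               len(kept), len(data))
        out += kept
    return bytes(out)


def iter_group(body):
    """Walk the group front to back. There is no index, so a reader can
    only move forward from one packet to the next."""
    offset = 0
    while offset < len(body):
        _iface, _drops, timestamp, caplen, wire_len = \
            PKT_HEADER.unpack_from(body, offset)
        offset += PKT_HEADER.size
        yield timestamp, wire_len, body[offset:offset + caplen]
        offset += caplen
```

With a snaplen of 74 bytes (14+20+40), each full-size packet in this sketch takes 20+74 = 94 bytes. The 28+14+20+40 sum above comes to 102 bytes.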
- If 100 bytes/packet is still too much data, we could try to go further,
compressing the data beginning using an ad hoc, content-based compression
method. I doubt that this is ever useful, though.
Regards,
-Chema
Loris Degioanni wrote:
> >2. multiple compression blocks (with or without multiple section
> >headers) - this allows chunking of the compression, and allows a limited
> >random access comparable to splitting a classic capture file and
> >compressing them independently.
> >
> >A third choice that I'm surprised isn't supported (or, apparently,
> >supportable) is one where only the packet data is contained in a
> >compression block; with the packet block header remaining uncompressed.
> >This sort of thing would be especially useful for full-packet captures,
> >which can get very large, and really need compression. While a
> >simplistic implementation would probably not provide great compression,
> >due to the duplication of compression algorithm header data in each
> >packet, a more sophisticated approach might provide a common compression
> >dictionary block that could be used to decompress each of the individual
> >packets.
>
> That's a good idea. When defining the file format, I thought about
> per-packet compression, but I rejected it because of:
>
> - the overhead to set-up compression for every packet
> - the limited amount of compression obtained
>
> The issues I still see in your approach are:
> - it's quite complex to implement. In particular, is there any library
> we can rely on?
> - the location of the compression dictionary block. The beginning of the
> file? In that case you have to jump back in order to update it. The end
> of the file? You run the risk of losing a lot of information if
> something goes wrong while you write it.
>
> >This third choice is also limited by the types of data that can be
> >represented in the (uncompressed) packet block headers - currently this
> >is only timestamp, (capture) length, inbound/outbound and error flags,
> >and packet hash. For random or packet selection access, it would be
> >very useful if it were possible to include address features of the
> >captured packet, e.g. IP or MAC src/dst addresses, TCP/UDP src/dst
> >ports, etc. This could be done using new options,
>
> ... or by an additional block that will contain this kind of information
> for one (or more?) packet blocks.
>
> Loris
>
> >although the fact
> >that options follow packet data is mildly annoying in this case (I
> >understand the reasoning for that, and am not suggesting changing it -
> >it's just that for this (ab)use of options, having to seek past the
> >packet data is inconvenient).
> >
> >@alex
> >_______________________________________________
> >ntar-workers mailing list
> >ntar-workers at winpcap.org
> >https://www.winpcap.org/mailman/listinfo/ntar-workers
> >