[ntar-workers] Packet Compression (was NTAR - PCAP next generation dump file format)

Fri Jul 1 01:37:56 GMT 2005

I like Alex's third choice, i.e., leaving packet headers outside of the 
bzip2/zip compression. That way, decompression is only needed when the 
parser has to read the packet contents. 

I'm trying to think about which scenarios would benefit from 
compressing traces.

- The first one is when you have a large trace and want to make it smaller. 
	You want to compress it without having to decompress the whole trace 
	to use it again. 

	For this scenario, I think Alex's third approach will work. A possible 
	implementation is a CompressedPacketBlock (CPB) with the type 
	"Packets Compressed Using Alex's Idea." This CPB will contain a body 
	with the following structure:

	+-------------------+------------------------+
	| raw packet header | compressed packet data |
	+-------------------+------------------------+

	A raw packet header would include the same info as a Packet Block 
	(interface_id, drops_count, timestamp, caplen, len), plus the length 
	of the compressed data. 

	To keep the L2-L4 headers readable without decompression, the first 
	X bytes of the packet could be left uncompressed. 

	+-------------------+--------------------+---------------------+
	| raw packet header | raw data beginning | zipped rest of data |
	+-------------------+--------------------+---------------------+

	To support this, the raw packet header could be extended with a "length 
	of the uncompressed data" word. 

- The second one is a dumper that wants to keep only a few bytes of every 
	packet. For this scenario, I think a GoPBlock with the type "Compressed 
	Packets" would be a better idea (you save bytes at the cost of limiting 
	browsing to the forward direction only). Packets are "compressed" by 
	using the snaplen to truncate them. 

	This is a case of lossy compression that focuses on the beginning of 
	every packet and throws out the rest. In my case (and this may be just 
	my experience), I was only interested in keeping the L3-L4 headers (I 
	was doing network monitoring). For me, the L7 contents were useless. 

	A GoPBlock will contain lots of packets, whose structure will be as 
	follows: 

	+-------------------+--------------------+
	| raw packet header | raw data beginning |
	+-------------------+--------------------+

	This is the same idea as the previous approach, with snaplen = "length 
	of the uncompressed data," and a GoPBlock instead of lots of 
	PacketBlocks. That should bring us down to around 28+14+20+40 ~ 100 
	bytes/pkt.

- If 100 bytes/packet is still too much data, we could go further and 
	compress the data beginning using an ad hoc, content-based compression 
	method. I doubt this would ever be useful, though. 
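
A minimal sketch of the first scenario's per-packet record (raw packet header, raw data beginning, compressed rest), in Python with only the standard library. The field widths are assumptions (the text names the fields but not their sizes), zlib's deflate stands in for bzip2/zip, and the names encode_record, decode_header, record_size and decode_data are hypothetical. What it illustrates: a reader can parse the header, look at the raw L2-L4 bytes, and skip to the next record without decompressing anything.

```python
import struct
import zlib

# Hypothetical per-packet header: the fields named in the text
# (interface_id, drops_count, timestamp, caplen, len) plus the
# "length of the uncompressed data" and the compressed-data length.
# Widths and byte order are assumptions for illustration only.
HDR = struct.Struct("<HHQIIII")
FIELDS = ("interface_id", "drops_count", "timestamp",
          "caplen", "len", "raw_len", "comp_len")


def encode_record(interface_id, drops_count, timestamp, data, orig_len, raw_len):
    """Keep the first raw_len bytes as-is and deflate the rest of the packet."""
    raw = data[:raw_len]
    comp = zlib.compress(data[raw_len:])
    hdr = HDR.pack(interface_id, drops_count, timestamp,
                   len(data), orig_len, len(raw), len(comp))
    return hdr + raw + comp


def decode_header(buf, off=0):
    """Parse only the uncompressed header; nothing is inflated."""
    return dict(zip(FIELDS, HDR.unpack_from(buf, off)))


def record_size(hdr):
    """Bytes to skip to reach the next record, again without inflating."""
    return HDR.size + hdr["raw_len"] + hdr["comp_len"]


def decode_data(buf, off=0):
    """Return the full captured bytes, inflating the tail only on request."""
    hdr = decode_header(buf, off)
    start = off + HDR.size
    raw = buf[start:start + hdr["raw_len"]]
    comp = buf[start + hdr["raw_len"]:off + record_size(hdr)]
    return raw + zlib.decompress(comp)
```

Each record runs its own deflate stream here, so it pays exactly the per-packet setup cost and limited compression ratio that Loris points out in the quoted discussion; a shared dictionary would be one way to reduce that.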

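A similar sketch for the second scenario: a GoPBlock body made of many (raw packet header, raw data beginning) pairs, where the snaplen does the lossy "compression". Again the header layout and the names pack_gop and iter_gop are hypothetical, chosen only to show the mechanics.

```python
import struct

# Hypothetical per-packet header inside a GoPBlock:
# interface_id, drops_count, timestamp, caplen, len (widths assumed).
GOP_HDR = struct.Struct("<HHQII")


def pack_gop(packets, snaplen):
    """Concatenate snaplen-truncated packets into one GoP body (lossy)."""
    out = bytearray()
    for ifid, drops, ts, data in packets:
        kept = data[:snaplen]  # everything past snaplen is thrown away
        out += GOP_HDR.pack(ifid, drops, ts, len(kept), len(data))
        out += kept
    return bytes(out)


def iter_gop(body):
    """Walk the body forward only: caplen says where the next header starts."""
    off = 0
    while off < len(body):
        ifid, drops, ts, caplen, orig_len = GOP_HDR.unpack_from(body, off)
        off += GOP_HDR.size
        yield ifid, drops, ts, orig_len, body[off:off + caplen]
        off += caplen
```

With this 20-byte header and a snaplen of 14+20+40 = 74 bytes, each packet costs 94 bytes, in line with the ~100 bytes/pkt estimate. iter_gop can only move forward, because a packet's offset is known only after reading every earlier header; that is the browsing trade-off mentioned above.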
Regards, 
-Chema

Loris Degioanni wrote:
> >2. multiple compression blocks (with or without multiple section 
> >headers) - this allows chunking of the compression, and allows a limited 
> >random access comparable to splitting a classic capture file and 
> >compressing them independently.
> >
> >A third choice that I'm surprised isn't supported (or, apparently, 
> >supportable) is one where only the packet data is contained in a 
> >compression block; with the packet block header remaining uncompressed.  
> >This sort of thing would be especially useful for full-packet captures, 
> >which can get very large, and really need compression.  While a 
> >simplistic implementation would probably not provide great compression, 
> >due to the duplication of compression algorithm header data in each 
> >packet, a more sophisticated approach might provide a common compression 
> >dictionary block that could be used to decompress each of the individual 
> >packets.
> 
> That's a good idea. When defining the file format, I thought about 
> per-packet compression, but I rejected it because of:
> 
> - the overhead to set-up compression for every packet
> - the limited amount of compression obtained
> 
> The issues I still see in your approach are:
> - it's quite complex to implement. In particular, is there any library 
> we can rely on?
> - the location of the compression dictionary block. The beginning of the 
> file? In that case you have to jump back in order to update it. The end 
> of the file? You run the risk of losing a lot of information if 
> something goes wrong while you write it.
> 
> >This third choice is also limited by the types of data that can be 
> >represented in the (uncompressed) packet block headers - currently this 
> >is only timestamp, (capture) length, inbound/outbound and error flags, 
> >and packet hash.  For random or packet selection access, it would be 
> >very useful if it were possible to include address features of the 
> >captured packet, e.g. IP or MAC src/dst addresses, TCP/UDP src/dst 
> >ports, etc.  This could be done using new options, 
> 
> ... or by an additional block that will contain this kind of information 
> for one (or more?) packet blocks.
> 
> Loris
> 
> >although the fact 
> >that options follow packet data is mildly annoying in this case (I 
> >understand the reasoning for that, and am not suggesting changing it - 
> >it's just that for this (ab)use of options, having to seek past the 
> >packet data is inconvenient).
> >
> >@alex
> >_______________________________________________
> >ntar-workers mailing list
> >ntar-workers at winpcap.org
> >https://www.winpcap.org/mailman/listinfo/ntar-workers
> >