[ntar-workers] Seekable file layouts etc
Gianluca Varenni
gianluca.varenni at gmail.com
Sun Jul 10 03:12:50 GMT 2005
----- Original Message -----
From: "Alexander Dupuy" <alex.dupuy at counterstorm.com>
To: <ntar-workers at winpcap.org>
Sent: Friday, July 08, 2005 10:14 AM
Subject: Re: [ntar-workers] Seekable file layouts etc
>I wrote:
>>> when a packet header seemed to be garbage ... it would look backwards in
>>> the (saved) previous packet data searching for a file header.
>
>>> It would be very nice if there was some similar sanity checking that
>>> could
>>> be performed on NTAR packet block headers ... With the special magic for
>>> SPB headers, resynchronization in this case should not be too difficult.
>
> Gianluca Varenni replied:
>> Uhm, can you elaborate on that? Do you mean adding something at the end
>> of a
>> block? At the moment every block has a trailer containing the block size;
>> this trailer can be used for both backward seeks in the file, and for
>> sanity
>> check.
>
> I was thinking of something at the start of the block, so that you can
> determine sanity before you read the block data itself (as it may be long,
> 1-2K or even more with jumbo frames). However, the trailing block size
> would be usable for this purpose, as you would use a valid length trailer
> from one block to validate the next one (for truncation detection).
Exactly. The library should check that the trailing block size is correct
(ntar does that when closing the block with "ntar_close_block()"). Actually,
I've just discovered that this check is not done when the file supports seeks
(basically I skip the entire block data, including the trailer). I'll fix it
tomorrow.
>
>> What do you mean by "special magic for SPB headers?" SPB is the simple
>> packet block (so a "normal" block), and it should have nothing to do with
>> synchronization.
>
> Sorry, I meant SHB (section header block).
>
>> In any case, it would be interesting to have some mechanism to
>> resynchronize a tracefile obtained from truncated captures, but I don't
>> know if it's that easy (basically you need to find a new SHB where you
>> were expecting the data of a block).
>
>> Maybe the best idea in this case would be to have some sort of "recovery
>> mode" in ntar, where all "cool" features (backward seeks, random access
>> to blocks, indexes/markers) are disabled. You basically use this mode to
>> take the corrupted trace file and regenerate a good trace file (cutting
>> out all the garbage).
>
> Christian Kreibich then responded:
>> I think once you detect corruption you simply have to start from the last
>> valid block and do a byte-by-byte scan and test whether the parseable
>> sequence of blocks looks decent, per the above. From that you should be
>> able
>> to fix the block size fields in situ to restore correct sequencing (if
>> you don't want to duplicate a 5GB trace), or ...
>
> An internal recovery mode would be useful. This might best be handled by
> a state variable with three possible values: Unknown, Read, and
> Recovering. The initial state would be Unknown, and it would be reset to
> Unknown after any API seek is performed, and would disable any attempt at
> recovery. When in Unknown state, any successful block read would set the
> state to Read. The Recovering state would be set when truncation is
> detected, and would disable all operations that perform file i/o, except
> for block/section/file close, and get_next_section/get_next_block.
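
A minimal sketch of the three-state variable described above, written in C. All names here (recovery_state, recovery_event, next_state) are hypothetical and are not part of the ntar API; the sketch also assumes that a seek requested while Recovering is refused, so the state does not change in that case.

```c
#include <assert.h>

/* Hypothetical names for illustration only; not part of ntar. */
typedef enum {
    RSTATE_UNKNOWN,    /* initial state, and state after any API seek     */
    RSTATE_READ,       /* a block has been read successfully since then   */
    RSTATE_RECOVERING  /* truncation detected, recovery buffer in use     */
} recovery_state;

typedef enum {
    EV_SEEK,                  /* an API seek was performed                */
    EV_BLOCK_READ_OK,         /* a block was read and validated           */
    EV_TRUNCATION_WITH_SHB,   /* bad trailer, but an SHB was found        */
    EV_RECOVERY_BUFFER_EMPTY  /* recovery buffer fully consumed           */
} recovery_event;

/* Returns the state that follows `s` when event `ev` occurs. */
recovery_state next_state(recovery_state s, recovery_event ev)
{
    assert(s <= RSTATE_RECOVERING);
    switch (ev) {
    case EV_SEEK:
        /* Assumption: seeks are rejected while recovering. */
        return s == RSTATE_RECOVERING ? s : RSTATE_UNKNOWN;
    case EV_BLOCK_READ_OK:
        return s == RSTATE_UNKNOWN ? RSTATE_READ : s;
    case EV_TRUNCATION_WITH_SHB:
        /* Recovery is only attempted from the Read state. */
        return s == RSTATE_READ ? RSTATE_RECOVERING : s;
    case EV_RECOVERY_BUFFER_EMPTY:
        return s == RSTATE_RECOVERING ? RSTATE_UNKNOWN : s;
    }
    return s;
}
```

In this reading, the only difference between Unknown and Read is whether a truncation is allowed to start a recovery: after a seek the previous block has not been validated, so no recovery is attempted until one block has been read successfully.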
I think I'm dumb... What is the point of having two separate states,
"unknown" and "read"?
>
> Detection of and recovery from an invalid (truncated) block could be
> implemented using something like the following logic, without any need for
> seeks on the ntar file (i.e. this should work even when reading from a
> pipe):
>
> Before block.c:read_raw_block_data returns success, it verifies
> non-truncation by checking that the trailing length is present and matches
> the leading length.
> (This is an API change; currently the trailer is verified only in
> ntar_read_section and ntar_close_block [close_block_read_mode].)
This should be quite trivial to implement.
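
As a rough illustration of that check, here is a sketch in C. The function name block_trailer_matches is hypothetical (it is not the code in block.c), and the layout is an assumption: 4 bytes of block type, 4 bytes of total block length, the body, then the same 4-byte length as trailer, with both lengths already in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not the real ntar code.
 * `raw` holds `avail` bytes starting at the beginning of a block.
 * Returns 1 if the trailing length is present and equals the leading
 * length, 0 otherwise (including when the block is truncated).
 * Assumes both length fields are in host byte order.
 */
int block_trailer_matches(const unsigned char *raw, size_t avail)
{
    uint32_t leading, trailing;

    assert(raw != NULL);
    if (avail < 12)                 /* type + length + trailer minimum */
        return 0;

    memcpy(&leading, raw + 4, 4);   /* leading total block length */
    if (leading < 12 || leading > avail)
        return 0;                   /* implausible or truncated block */

    memcpy(&trailing, raw + leading - 4, 4);
    return trailing == leading;
}
```

A real implementation would also have to byte-swap the lengths when the section was written with the opposite endianness; that step is left out here.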
>
> In Read state (only), if an invalid trailing length is present, its four
> bytes and 11 additional bytes read from the file are appended to the
> block's raw data (the 11 bytes are needed in case the SHB has only been
> partially read, and represent 3 bytes of block type, 4 bytes of block
> length, and 4 bytes of byte-order magic).
Correct.
>
> The raw data is scanned forwards, looking for the SHB "\r\n\n\r"
> palindrome, and confirming that one of the possible byte-order magic
> numbers follows four bytes later. If an SHB is detected in the raw data,
> state is set to Recovering, the data from the SHB palindrome to the end of
> the raw data is copied to a recovery buffer, and a special
> RECOVERABLE_TRUNCATION error is returned. If no SHB is detected, this is
> an unrecoverable error, and a MISMATCH error is returned.
Ok.
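
The forward scan could look roughly like the sketch below. The function name find_shb is hypothetical. The four SHB bytes are taken as quoted in the message ("\r\n\n\r"), and the byte-order magic 0x1A2B3C4D (in either byte order) is an assumption; both should be checked against the actual format definition before use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* SHB block type bytes as quoted in the message above (assumption). */
static const unsigned char SHB_TYPE[4] = { '\r', '\n', '\n', '\r' };
/* Assumed byte-order magic, in both possible byte orders. */
static const unsigned char BOM_A[4] = { 0x1A, 0x2B, 0x3C, 0x4D };
static const unsigned char BOM_B[4] = { 0x4D, 0x3C, 0x2B, 0x1A };

/*
 * Hypothetical helper: scans `buf` forwards for the SHB palindrome and
 * confirms that a byte-order magic follows four bytes after it, i.e.
 * after the 4-byte block length. Returns the offset of the SHB, or -1
 * if none is found.
 */
long find_shb(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i + 12 <= len; i++) {
        if (memcmp(buf + i, SHB_TYPE, 4) != 0)
            continue;
        if (memcmp(buf + i + 8, BOM_A, 4) == 0 ||
            memcmp(buf + i + 8, BOM_B, 4) == 0)
            return (long)i;
    }
    return -1;
}
```

A return value of -1 corresponds to the unrecoverable MISMATCH case; otherwise the bytes from the returned offset to the end of the raw data are what would be copied into the recovery buffer.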
>
> In Recovering mode, get_next_section and get_next_block read data from the
> recovery buffer instead of doing freads, until the recovery buffer has
> been exhausted, at which point the state is set to Unknown.
In general, the idea should work (regardless of whether we keep the two
states "read" and "unknown", or only one). I'm only a bit worried about the
recovery buffer: it may add a lot of complexity to the code (basically I need
to add some code that checks whether to read data from the recovery buffer or
from the file). However, with my callback-based implementation of reads (I've
just released a new version of ntar), this should be *much* easier: basically
we substitute the "normal" read callback with a fake one that reads data from
the recovery buffer (as long as data are present), and then "backpatches" the
read callback to the original one when all the bits in the recovery buffer
have been processed. Sounds cool!
What do you think?
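
To make the callback idea concrete, here is a small sketch in C. The struct and function names (struct reader, file_read, recovery_read, begin_recovery) are invented for illustration and do not reflect the actual ntar callback interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct reader;
typedef size_t (*read_cb)(struct reader *r, void *dst, size_t n);

/* Hypothetical reader handle; not the real ntar structure. */
struct reader {
    read_cb read;                 /* currently installed read callback  */
    read_cb original_read;        /* callback to restore after recovery */
    FILE *fp;                     /* underlying trace file or pipe      */
    const unsigned char *rec_buf; /* recovery buffer                    */
    size_t rec_len, rec_pos;      /* its size and read position         */
};

/* "Normal" callback: reads straight from the file. */
size_t file_read(struct reader *r, void *dst, size_t n)
{
    return fread(dst, 1, n, r->fp);
}

/* "Fake" callback: drains the recovery buffer, then restores the
 * original callback and lets it satisfy the rest of the request. */
size_t recovery_read(struct reader *r, void *dst, size_t n)
{
    size_t avail = r->rec_len - r->rec_pos;
    size_t take = n < avail ? n : avail;

    memcpy(dst, r->rec_buf + r->rec_pos, take);
    r->rec_pos += take;
    if (r->rec_pos == r->rec_len) {
        r->read = r->original_read;   /* backpatch */
        if (take < n)
            take += r->read(r, (unsigned char *)dst + take, n - take);
    }
    return take;
}

/* Installs the fake callback over `buf` (which must outlive recovery). */
void begin_recovery(struct reader *r, const unsigned char *buf, size_t len)
{
    assert(r != NULL && buf != NULL);
    r->original_read = r->read;
    r->rec_buf = buf;
    r->rec_len = len;
    r->rec_pos = 0;
    r->read = recovery_read;
}
```

With this shape, the block-parsing code keeps calling r->read() and never needs to know whether the bytes come from the recovery buffer or from the file.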
Have a nice day
GV
>
> @alex
> --
> mailto:dupuy at counterstorm.com
> _______________________________________________
> ntar-workers mailing list
> ntar-workers at winpcap.org
> https://www.winpcap.org/mailman/listinfo/ntar-workers