[ntar-workers] Some perf test results on NTAR.

Tue Jul 12 19:20:47 GMT 2005

----- Original Message ----- 
From: "Guy Harris" <guy at alum.mit.edu>
To: <ntar-workers at winpcap.org>
Sent: Tuesday, July 12, 2005 12:55 AM
Subject: Re: [ntar-workers] Some perf test results on NTAR.

> Gianluca Varenni wrote:
>
>> The read tests (test018) were run using the standard (vanilla) NTAR
>> library and a modified one that does *not* use seeks to jump from one
>> block to another (instead, I read all the data from each block into a
>> throwaway buffer).
>
> The original AT&T "standard I/O library" routines, at least in some 
> version of AT&T UNIX, would, as I remember, discard all buffered 
> information on an fseek(), do an lseek(), and either fill the buffer or 
> rely on the next read to fill the buffer - it wouldn't check whether it 
> could seek within the buffer.
>
> It might be that the MSVC++ or GNU libc standard I/O libraries do the same
> thing.  For long seeks, this would probably be more efficient, when moving
> forward through the file, than just reading forward, as you don't read all
> the intermediate data.  For *short* seeks, however, you're likely to be
> seeking within the buffer, in which case reading forward means you just
> skip stuff in the buffer, while an fseek() will throw out the buffered
> data and cause it to be re-read, causing extra I/Os.
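
As a rough illustration of the trade-off described above, here is a minimal C sketch that reads forward for short skips and falls back to fseek() for long ones. The helper names skip_forward() and skip_block() and the 4096-byte threshold are hypothetical, not part of NTAR; the right threshold would have to be found by measurement on each C library.

```c
#include <stdio.h>

/* Hypothetical threshold: skips up to this size are done by reading.
   The value is a placeholder, to be tuned by measurement. */
#define SHORT_SKIP_LIMIT 4096

/* Hypothetical helper (not part of NTAR): skip `count` bytes by reading
   them into a scratch buffer, so buffered data is consumed, not dropped. */
static int skip_forward(FILE *fp, size_t count)
{
    char scratch[4096];

    while (count > 0) {
        size_t chunk = count < sizeof scratch ? count : sizeof scratch;
        size_t got = fread(scratch, 1, chunk, fp);

        if (got == 0)
            return -1;   /* EOF or read error before the skip completed */
        count -= got;
    }
    return 0;
}

/* Hypothetical policy: read forward for short skips, seek for long ones. */
static int skip_block(FILE *fp, long count)
{
    if (count < 0)
        return -1;
    if (count <= SHORT_SKIP_LIMIT)
        return skip_forward(fp, (size_t)count);
    return fseek(fp, count, SEEK_CUR) == 0 ? 0 : -1;
}
```

Note that the two paths differ at end of file: reading forward reports a failure when the file is too short, while fseek() can succeed in positioning past the end.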

Probably. Fortunately, MS ships the CRT source with VS.NET 2003 (at least
the sources for the static CRT), so I'll have a look at how they implement
that stuff.
Another idea I have is to actually disable the FILE buffering using
setbuf(), and see what happens. At least on Windows, I expect an
improvement, because in any case the OS has its own global file caching
mechanism.
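
A minimal sketch of that experiment, assuming the file is opened with plain fopen(). The helper name open_unbuffered() is hypothetical; setvbuf() with _IONBF is the standard C call, and setbuf(fp, NULL) has the same effect.

```c
#include <stdio.h>

/* Hypothetical helper (not part of NTAR): open a file for reading with
   stdio buffering turned off, so every fread() goes to the OS. */
static FILE *open_unbuffered(const char *path)
{
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;

    /* Must be done before any other operation on the stream.
       setbuf(fp, NULL) is the older equivalent spelling. */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

Whether this helps or hurts would still have to be measured, since unbuffered small reads each cost a system call.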

I'll try this approach as soon as I find time.

Have a nice day
GV

>
> For applications that only need to access the capture file sequentially, 
> just using the "standard I/O" library (FILE *) routines is probably good 
> enough.
>
> For applications that don't, if there's a performance issue with random 
> access, the right answer would probably be to use custom accessors with 
> NTAR, and have the application do its own buffering and handle seeks 
> within the buffer sanely.  (For Ethereal, there are other reasons, such as 
> handling compressed data and handling seeks on a pipe so we can read from 
> a pipe - Ethereal's Wiretap library does seeks even when reading 
> sequentially, both to try to open files as various file types and to 
> implement various heuristics to, for example, handle various annoying 
> mutant libpcap formats that use the standard magic number but don't use 
> the standard record format - why we'd ultimately want to do that.)  That 
> does, of course, mean that the accessor routines would have to include a 
> seek routine.
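
The idea of an application doing its own buffering and handling seeks within the buffer could look roughly like the C sketch below. Everything here (the buffered_reader type, br_init(), br_seek(), br_read(), the 8192-byte buffer) is hypothetical and is not NTAR's actual accessor interface; the underlying I/O is done with fseek()/fread() only to keep the sketch short.

```c
#include <stdio.h>
#include <string.h>

#define BR_BUFSIZE 8192   /* hypothetical buffer size */

typedef struct {
    FILE *fp;
    long buf_start;          /* file offset of buf[0] */
    size_t buf_len;          /* number of valid bytes in buf */
    long pos;                /* current logical file offset */
    unsigned long refills;   /* underlying reads done, for measurement */
    unsigned char buf[BR_BUFSIZE];
} buffered_reader;

static void br_init(buffered_reader *br, FILE *fp)
{
    br->fp = fp;
    br->buf_start = 0;
    br->buf_len = 0;
    br->pos = 0;
    br->refills = 0;
}

/* A seek only records the new position; the buffer is kept, so a later
   read that lands inside it needs no I/O. */
static int br_seek(buffered_reader *br, long offset)
{
    if (offset < 0)
        return -1;
    br->pos = offset;
    return 0;
}

/* Copy up to `len` bytes from the current position; refill the buffer
   only when the position falls outside it. Returns bytes copied. */
static size_t br_read(buffered_reader *br, void *dst, size_t len)
{
    unsigned char *out = dst;
    size_t done = 0;

    while (done < len) {
        size_t off, avail, n;
        int inside = br->pos >= br->buf_start &&
                     br->pos < br->buf_start + (long)br->buf_len;

        if (!inside) {
            if (fseek(br->fp, br->pos, SEEK_SET) != 0)
                break;
            br->buf_start = br->pos;
            br->buf_len = fread(br->buf, 1, BR_BUFSIZE, br->fp);
            br->refills++;
            if (br->buf_len == 0)
                break;   /* EOF or read error */
        }
        off = (size_t)(br->pos - br->buf_start);
        avail = br->buf_len - off;
        n = len - done < avail ? len - done : avail;
        memcpy(out + done, br->buf + off, n);
        done += n;
        br->pos += (long)n;
    }
    return done;
}
```

With this arrangement, short forward or backward seeks that stay inside the buffer cost nothing beyond updating pos; the refills counter is there only to make that visible when testing.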
>
> BTW, this brings to mind something I remember from my youth (when, for 
> fun, I'd order OS/360 manuals from IBM and read them).  OS/360's QSAM 
> (Queued Sequential Access Method, which did buffered I/O, along the 
> lines of what you get with the FILE * routines, as I remember) had what 
> they called "locate mode" and "move mode"; in "move mode", a read would 
> copy data from the QSAM buffer to the application's buffer, while, in 
> "locate mode", a read would just return a pointer to the record in the 
> QSAM buffer.  (Records weren't split across blocks; block sizes are 
> variable on IBM's disks, although the count/key/data stuff might now be 
> implemented in disk controller firmware atop modern fixed-length-sector 
> disks.)
>
> If NTAR were to do its own buffering, it could, in theory, offer "locate 
> mode", although for those records that were split across buffer blocks, 
> it'd have to reassemble the record in its own buffer and supply a pointer 
> to it in that buffer.  I don't know whether this would be worth doing; the 
> buffer would probably have to be big enough to hold several records to 
> make it worth doing (so that the chances of a record being split across
> buffer blocks are low enough that a significant number of reads require no
> data copying).
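
A "locate mode" read along those lines might be sketched in C as follows. This is only an illustration under stated assumptions: the locate_reader type, lr_init(), lr_locate(), the copies counter and the 4096-byte sizes are all hypothetical, the caller is assumed to know the record length already, and records are assumed never to exceed the buffer size.

```c
#include <stdio.h>
#include <string.h>

#define LR_BUFSIZE 4096   /* hypothetical; real use would want it larger */
#define LR_MAXREC  4096   /* must not exceed LR_BUFSIZE in this sketch */

typedef struct {
    FILE *fp;
    size_t off;              /* next unread byte in buf */
    size_t len;              /* number of valid bytes in buf */
    unsigned long copies;    /* records that had to be reassembled */
    unsigned char buf[LR_BUFSIZE];
    unsigned char spill[LR_MAXREC];   /* holds a reassembled record */
} locate_reader;

static void lr_init(locate_reader *lr, FILE *fp)
{
    lr->fp = fp;
    lr->off = 0;
    lr->len = 0;
    lr->copies = 0;
}

/* Return a pointer to the next `reclen` bytes, or NULL on EOF, error or
   an oversized record. The pointer is valid only until the next call. */
static const unsigned char *lr_locate(locate_reader *lr, size_t reclen)
{
    size_t avail, need;

    if (reclen > LR_MAXREC)
        return NULL;
    if (lr->off == lr->len) {            /* buffer fully consumed */
        lr->len = fread(lr->buf, 1, LR_BUFSIZE, lr->fp);
        lr->off = 0;
    }
    avail = lr->len - lr->off;
    if (reclen <= avail) {               /* common case: no copying */
        const unsigned char *p = lr->buf + lr->off;
        lr->off += reclen;
        return p;
    }
    /* Record is split across two buffer loads: reassemble it. */
    memcpy(lr->spill, lr->buf + lr->off, avail);
    lr->len = fread(lr->buf, 1, LR_BUFSIZE, lr->fp);
    need = reclen - avail;
    if (need > lr->len) {                /* file ended mid-record */
        lr->off = lr->len;
        return NULL;
    }
    memcpy(lr->spill + avail, lr->buf, need);
    lr->off = need;
    lr->copies++;
    return lr->spill;
}
```

The copies counter gives a direct way to check the point made above: the larger the buffer relative to the typical record, the smaller the fraction of reads that need reassembly.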