[ntar-workers] Some perf test results on NTAR.

Tue Jul 12 07:55:56 GMT 2005

Gianluca Varenni wrote:

> The read tests (test018) were run using the standard (vanilla) NTAR
> library, and a modified one that does *not* use seeks to jump from a
> block to another (instead, I read all the data from each block in a
> fake buffer).

The original AT&T "standard I/O library" routines, at least in some 
version of AT&T UNIX, would, as I remember, discard all buffered 
information on an fseek(), do an lseek(), and either fill the buffer or 
rely on the next read to fill the buffer - it wouldn't check whether it 
could seek within the buffer.

It might be that the MSVC++ or GNU libc standard I/O libraries do the 
same thing.  For long forward seeks, this would probably be more 
efficient than just reading forward, as you don't read all the 
intermediate data.  For *short* seeks, however, the target is likely to 
be within the buffer already, in which case reading forward just skips 
data in the buffer, while an fseek() will throw out the buffered data 
and cause it to be re-read, causing extra I/Os.
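
To make the trade-off concrete, here's a minimal sketch of a "skip 
forward" helper that reads and discards for short skips and uses 
fseek() only for long ones.  The name skip_forward() and the 
SHORT_SKIP_MAX threshold are made up for illustration; they aren't part 
of NTAR, and the right threshold would depend on the library's buffer 
size and would have to be measured.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical threshold: skips at or below this are done by reading. */
#define SHORT_SKIP_MAX 4096L

/*
 * Skip n bytes forward from the current position of fp.
 * Returns 0 on success, -1 on error or (for short skips) premature EOF.
 * Note that fseek() may succeed even when it moves past end-of-file,
 * so the two paths differ in how they report a truncated file.
 */
static int skip_forward(FILE *fp, long n)
{
    char scratch[512];

    assert(n >= 0);

    /* Long skip: let the library seek, avoiding the intermediate reads. */
    if (n > SHORT_SKIP_MAX)
        return fseek(fp, n, SEEK_CUR) == 0 ? 0 : -1;

    /* Short skip: consume bytes, most likely straight from the buffer. */
    while (n > 0) {
        size_t want = n < (long)sizeof scratch ? (size_t)n : sizeof scratch;
        size_t got = fread(scratch, 1, want, fp);

        if (got == 0)
            return -1;
        n -= (long)got;
    }
    return 0;
}
```

Whether this actually wins depends on what the particular fseek() 
implementation does; if it already seeks within its buffer, the helper 
buys nothing.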

For applications that only need to access the capture file sequentially, 
just using the "standard I/O" library (FILE *) routines is probably good 
enough.

For applications that don't, if there's a performance issue with random 
access, the right answer would probably be to use custom accessors with 
NTAR, and have the application do its own buffering and handle seeks 
within the buffer sanely.  (That's what we'd ultimately want to do for 
Ethereal, although there are other reasons for it there as well, such 
as handling compressed data and handling seeks on a pipe so we can read 
from a pipe.  Ethereal's Wiretap library does seeks even when reading 
sequentially, both to try to open files as various file types and to 
implement various heuristics to, for example, handle various annoying 
mutant libpcap formats that use the standard magic number but don't use 
the standard record format.)  That does, of course, mean that the 
accessor routines would have to include a seek routine.
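
As a rough sketch of what "handle seeks within the buffer sanely" could 
look like: the reader below only goes to the underlying seek routine 
when the target offset lies outside the data it already holds.  All of 
the names here (raw_io, buf_reader, br_seek(), br_read(), and the 
memory-backed mem_src used to count raw calls) are hypothetical; this 
is not NTAR's accessor interface, just an illustration of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical raw accessors; NOT NTAR's actual interface. */
typedef struct {
    void *handle;
    long (*read)(void *handle, void *buf, size_t len); /* 0 at EOF, -1 on error */
    int (*seek)(void *handle, long offset);            /* absolute; 0 on success */
} raw_io;

#define BUF_SIZE 4096

/* Invariant: the raw position is always buf_start + buf_len. */
typedef struct {
    raw_io io;
    unsigned char buf[BUF_SIZE];
    long buf_start;  /* file offset of buf[0] */
    size_t buf_len;  /* number of valid bytes in buf */
    size_t buf_pos;  /* next byte in buf to hand out */
} buf_reader;

/* Seek to an absolute offset, staying inside the buffer when possible. */
static int br_seek(buf_reader *br, long offset)
{
    assert(offset >= 0);
    if (offset >= br->buf_start &&
        offset <= br->buf_start + (long)br->buf_len) {
        br->buf_pos = (size_t)(offset - br->buf_start); /* no I/O at all */
        return 0;
    }
    if (br->io.seek(br->io.handle, offset) != 0)
        return -1;
    br->buf_start = offset;          /* buffer is now empty at new offset */
    br->buf_len = br->buf_pos = 0;
    return 0;
}

/* Copy up to len bytes to dst; returns bytes copied, or -1 on error. */
static long br_read(buf_reader *br, void *dst, size_t len)
{
    unsigned char *out = dst;
    size_t done = 0;

    while (done < len) {
        if (br->buf_pos == br->buf_len) {       /* buffer used up: refill */
            long got;

            br->buf_start += (long)br->buf_len;
            br->buf_len = br->buf_pos = 0;
            got = br->io.read(br->io.handle, br->buf, BUF_SIZE);
            if (got < 0)
                return -1;
            if (got == 0)
                break;                          /* end of file */
            br->buf_len = (size_t)got;
        }
        size_t n = br->buf_len - br->buf_pos;
        if (n > len - done)
            n = len - done;
        memcpy(out + done, br->buf + br->buf_pos, n);
        br->buf_pos += n;
        done += n;
    }
    return (long)done;
}

/* Memory-backed source that counts raw calls, for illustration only. */
typedef struct {
    const unsigned char *data;
    size_t size, pos;
    int reads, seeks;
} mem_src;

static long mem_read(void *h, void *buf, size_t len)
{
    mem_src *m = h;
    size_t n = m->size - m->pos;

    if (n > len)
        n = len;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    m->reads++;
    return (long)n;
}

static int mem_seek(void *h, long off)
{
    mem_src *m = h;

    if (off < 0 || (size_t)off > m->size)
        return -1;
    m->pos = (size_t)off;
    m->seeks++;
    return 0;
}
```

A buf_reader is expected to start zero-initialized with the raw 
position at offset 0.  With the counting source, a seek back to an 
offset that's still in the buffer leaves both the reads and seeks 
counters unchanged, which is the behavior the old fseek() didn't give 
you.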

BTW, this brings to mind something I remember from my youth (when, for 
fun, I'd order OS/360 manuals from IBM and read them).  OS/360's QSAM 
(Queued Sequential Access Method, which did buffered I/O, along the 
lines of what you get with the FILE * routines, as I remember) had what 
they called "locate mode" and "move mode"; in "move mode", a read would 
copy data from the QSAM buffer to the application's buffer, while, in 
"locate mode", a read would just return a pointer to the record in the 
QSAM buffer.  (Records weren't split across blocks; block sizes are 
variable on IBM's disks, although the count/key/data stuff might now be 
implemented in disk controller firmware atop modern fixed-length-sector 
disks.)

If NTAR were to do its own buffering, it could, in theory, offer "locate 
mode", although for those records that were split across buffer blocks, 
it'd have to reassemble the record in its own buffer and supply a 
pointer to it in that buffer.  I don't know whether this would be worth 
doing; the buffer would probably have to be big enough to hold several 
records for it to pay off (so that the chances of a record being split 
across buffer blocks are low enough that a significant number of reads 
require no data copying).
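
Here's a minimal sketch of what such a "locate mode" read might look 
like, under some assumptions of mine: the record length is already 
known to the caller, records are capped at a hypothetical MAX_RECORD, 
and fread() stands in for whatever raw read routine would really fill 
the buffer (with stdio underneath there's still a copy into the block, 
so this only illustrates the bookkeeping).  The names locate_reader and 
locate() are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 8192  /* hypothetical buffer block size */
#define MAX_RECORD 4096  /* hypothetical cap on record length */

typedef struct {
    FILE *fp;
    unsigned char block[BLOCK_SIZE]; /* current buffer block */
    size_t len, pos;                 /* valid bytes, next unread byte */
    unsigned char spill[MAX_RECORD]; /* reassembly area for split records */
    unsigned long copies;            /* how many records needed reassembly */
} locate_reader;

/* Load the next block; returns nonzero if any data was read. */
static int fill(locate_reader *r)
{
    r->len = fread(r->block, 1, BLOCK_SIZE, r->fp);
    r->pos = 0;
    return r->len > 0;
}

/*
 * Return a pointer to the next n bytes, valid until the next call,
 * or NULL at end of file or on a truncated record.
 */
static const unsigned char *locate(locate_reader *r, size_t n)
{
    size_t have = 0;

    assert(n <= MAX_RECORD);
    if (r->pos == r->len && !fill(r))
        return NULL;

    /* Common case: the record is wholly in the block, so no copy. */
    if (r->len - r->pos >= n) {
        const unsigned char *p = r->block + r->pos;

        r->pos += n;
        return p;
    }

    /* Record is split across blocks: reassemble it in the spill area. */
    r->copies++;
    while (have < n) {
        size_t take;

        if (r->pos == r->len && !fill(r))
            return NULL;
        take = r->len - r->pos;
        if (take > n - have)
            take = n - have;
        memcpy(r->spill + have, r->block + r->pos, take);
        r->pos += take;
        have += take;
    }
    return r->spill;
}
```

The reader is expected to start zero-initialized with fp set.  The 
copies counter is there just to show how often the slow path is taken; 
with 100-byte records and 8192-byte blocks, roughly one read in 82 
would need the copy.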