[ntar-workers] Some perf test results on NTAR.
Guy Harris
guy at alum.mit.edu
Tue Jul 12 07:55:56 GMT 2005
Gianluca Varenni wrote:
> The read tests (test018) were run using the standard (vanilla) NTAR
> library, and a modified one that does *not* use seeks to jump from a
> block to another (instead, I read all the data from each block in a
> fake buffer).
The original AT&T "standard I/O library" routines, at least in some
version of AT&T UNIX, would, as I remember, discard all buffered
information on an fseek(), do an lseek(), and either fill the buffer or
rely on the next read to fill the buffer - it wouldn't check whether it
could seek within the buffer.
It might be that the MSVC++ or GNU libc standard I/O library does the
same thing. For long forward seeks, this would probably be more
efficient than just reading forward, as you don't read all the
intermediate data. For *short* seeks, however, the target is likely to
be within the buffer, in which case reading forward just skips data
already in the buffer, while an fseek() will throw out the buffered data
and cause it to be re-read, causing extra I/Os.
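
As a rough illustration of the trade-off above, here is a minimal sketch
in C that skips short distances by reading and discarding, and only
falls back to fseek() for long jumps. The names skip_forward(),
advance() and SHORT_SKIP_LIMIT are made up for this example, and the
threshold is an arbitrary guess, not a measured value; whether this wins
at all depends on how a given C library implements fseek().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical threshold: at or below this, read forward instead of seeking. */
#define SHORT_SKIP_LIMIT 8192L

/*
 * Skip `count` bytes by reading and discarding them, so that bytes
 * already sitting in the stdio buffer are consumed rather than thrown away.
 * Returns 0 on success, -1 if EOF or an error is hit first.
 */
static int skip_forward(FILE *fp, size_t count)
{
    char scratch[4096];

    assert(fp != NULL);
    while (count > 0) {
        size_t chunk = count < sizeof scratch ? count : sizeof scratch;
        size_t got = fread(scratch, 1, chunk, fp);

        if (got == 0)
            return -1;      /* EOF or read error before we skipped it all */
        count -= got;
    }
    return 0;
}

/*
 * Move forward by `distance` bytes: read-and-discard for short hops,
 * fseek() for long ones. Returns 0 on success, -1 on failure.
 */
static int advance(FILE *fp, long distance)
{
    if (distance < 0)
        return -1;          /* this sketch only handles forward movement */
    if (distance <= SHORT_SKIP_LIMIT)
        return skip_forward(fp, (size_t)distance);
    return fseek(fp, distance, SEEK_CUR) == 0 ? 0 : -1;
}
```

A caller jumping from one block to the next would call advance(fp, n)
with the number of bytes remaining in the current block.
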
For applications that only need to access the capture file sequentially,
just using the "standard I/O" library (FILE *) routines is probably good
enough.
For applications that don't, if there's a performance issue with random
access, the right answer would probably be to use custom accessors with
NTAR, and have the application do its own buffering and handle seeks
within the buffer sanely. (Ethereal would ultimately want to do that
for other reasons as well, such as handling compressed data and handling
seeks on a pipe so we can read from a pipe. Ethereal's Wiretap library
does seeks even when reading sequentially, both to try to open files as
various file types and to implement heuristics that, for example, handle
various annoying mutant libpcap formats that use the standard magic
number but don't use the standard record format.) That does, of course,
mean that the accessor routines would have to include a seek routine.
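
A minimal sketch of what "handle seeks within the buffer sanely" could
look like, assuming a POSIX file descriptor underneath. The struct and
the br_open(), br_seek() and br_read() names are hypothetical and are
not NTAR's actual accessor interface; the point is only that a seek
whose target is already buffered moves an index instead of discarding
data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BR_BUFSIZE 4096

struct buf_reader {
    int fd;                 /* underlying descriptor */
    off_t buf_start;        /* file offset of buf[0] */
    size_t buf_len;         /* valid bytes in buf */
    size_t buf_pos;         /* next unread byte in buf */
    unsigned long refills;  /* number of read() calls that returned data */
    unsigned char buf[BR_BUFSIZE];
};

/* Open `path` read-only with an empty buffer at offset 0. */
static int br_open(struct buf_reader *br, const char *path)
{
    br->fd = open(path, O_RDONLY);
    br->buf_start = 0;
    br->buf_len = br->buf_pos = 0;
    br->refills = 0;
    return br->fd < 0 ? -1 : 0;
}

/* Seek to absolute offset `target`; stay in the buffer when possible. */
static int br_seek(struct buf_reader *br, off_t target)
{
    if (target >= br->buf_start &&
        target <= br->buf_start + (off_t)br->buf_len) {
        br->buf_pos = (size_t)(target - br->buf_start);  /* no I/O needed */
        return 0;
    }
    if (lseek(br->fd, target, SEEK_SET) == (off_t)-1)
        return -1;
    br->buf_start = target;         /* buffer is now empty at `target` */
    br->buf_len = br->buf_pos = 0;
    return 0;
}

/* Read up to `want` bytes; returns bytes copied (short at EOF) or -1. */
static ssize_t br_read(struct buf_reader *br, void *dst, size_t want)
{
    unsigned char *out = dst;
    size_t done = 0;

    while (done < want) {
        size_t take;

        assert(br->buf_pos <= br->buf_len);
        if (br->buf_pos == br->buf_len) {   /* buffer exhausted: refill */
            ssize_t n;

            br->buf_start += (off_t)br->buf_len;
            br->buf_len = br->buf_pos = 0;
            n = read(br->fd, br->buf, sizeof br->buf);
            if (n < 0)
                return -1;
            if (n == 0)
                break;                      /* EOF */
            br->buf_len = (size_t)n;
            br->refills++;
        }
        take = br->buf_len - br->buf_pos;
        if (take > want - done)
            take = want - done;
        memcpy(out + done, br->buf + br->buf_pos, take);
        br->buf_pos += take;
        done += take;
    }
    return (ssize_t)done;
}
```

The refills counter is there only to make the effect visible: a short
seek followed by a read leaves it unchanged, while a seek outside the
buffered range costs one lseek() plus one read().
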
BTW, this brings to mind something I remember from my youth (when, for
fun, I'd order OS/360 manuals from IBM and read them). OS/360's QSAM
(Queued Sequential Access Method, which did buffered I/O, along the
lines of what you get with the FILE * routines, as I remember) had what
they called "locate mode" and "move mode"; in "move mode", a read would
copy data from the QSAM buffer to the application's buffer, while, in
"locate mode", a read would just return a pointer to the record in the
QSAM buffer. (Records weren't split across blocks; block sizes are
variable on IBM's disks, although the count/key/data stuff might now be
implemented in disk controller firmware atop modern fixed-length-sector
disks.)
If NTAR were to do its own buffering, it could, in theory, offer "locate
mode", although for those records that were split across buffer blocks,
it'd have to reassemble the record in its own buffer and supply a
pointer to it in that buffer. I don't know whether this would be worth
doing; the buffer would probably have to be big enough to hold several
records to make it worth doing (so that the chances of a record being
split across buffer blocks are low enough that a significant number of
reads require no data copying).
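
To make the "locate mode" idea concrete, here is a small sketch in C.
Everything in it is an assumption for illustration: the lm_ names are
invented, the "file" is just an in-memory array standing in for real
I/O, the caller supplies each record's length, and the buffer is kept
deliberately tiny so that split records are easy to provoke.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LM_BUFSIZE 64   /* tiny on purpose; a real buffer would hold many records */
#define LM_MAXREC  64   /* largest record that can be reassembled */

struct lm_reader {
    const unsigned char *src;       /* stand-in for the file contents */
    size_t src_len, src_pos;
    unsigned char buf[LM_BUFSIZE];  /* the I/O buffer */
    size_t len, pos;                /* valid bytes / next unread byte in buf */
    unsigned char spill[LM_MAXREC]; /* reassembly area for split records */
    unsigned long copied;           /* records that needed reassembly */
};

static void lm_init(struct lm_reader *r, const unsigned char *src, size_t n)
{
    memset(r, 0, sizeof *r);
    r->src = src;
    r->src_len = n;
}

/* Stand-in for a read() into the I/O buffer; returns bytes loaded. */
static size_t lm_refill(struct lm_reader *r)
{
    size_t n = r->src_len - r->src_pos;

    if (n > LM_BUFSIZE)
        n = LM_BUFSIZE;
    memcpy(r->buf, r->src + r->src_pos, n);
    r->src_pos += n;
    r->len = n;
    r->pos = 0;
    return n;
}

/*
 * "Locate mode" read: return a pointer to the next `rec_len` bytes.
 * If the record lies wholly in the buffer, the pointer is into the
 * buffer itself and nothing is copied; otherwise the record is
 * reassembled in `spill`. The pointer is valid until the next call.
 * Returns NULL at end of data or if the record is truncated.
 */
static const unsigned char *lm_locate(struct lm_reader *r, size_t rec_len)
{
    size_t done = 0;

    assert(rec_len > 0 && rec_len <= LM_MAXREC);
    if (r->pos == r->len && lm_refill(r) == 0)
        return NULL;
    if (r->len - r->pos >= rec_len) {       /* common case: no copy */
        const unsigned char *p = r->buf + r->pos;

        r->pos += rec_len;
        return p;
    }
    while (done < rec_len) {                /* record spans buffer blocks */
        size_t take;

        if (r->pos == r->len && lm_refill(r) == 0)
            return NULL;
        take = r->len - r->pos;
        if (take > rec_len - done)
            take = rec_len - done;
        memcpy(r->spill + done, r->buf + r->pos, take);
        r->pos += take;
        done += take;
    }
    r->copied++;
    return r->spill;
}
```

With 40-byte records and a 64-byte buffer, roughly every other record is
split and has to be copied, which is the situation the paragraph above
warns about; the copy-free path only pays off once the buffer is large
relative to the records.
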