[ntar-workers] Some perf test results on NTAR.
Gianluca Varenni
gianluca.varenni at gmail.com
Tue Jul 12 06:31:47 GMT 2005
Hello everyone.
During the weekend I ran some tests to get a rough idea of the performance
of ntar on windows and linux, and to check the issue of "slow" seeks that
Loris and I were talking about in some other threads on the mailing list.
Basically, I used two tests (test017 and test018, included in the latest
NTAR release I've put on the website) that respectively create synthetic
traces and read traces (the read test only reads the type of each block;
the block data are *not* retrieved).
As you can see from the data below, I used a fairly powerful DELL server
with fast SCSI disks. All the traces were saved to/read from a freshly
formatted HD (with NTFS or ext2), in order to avoid fragmentation issues.
Moreover, I used some tricks before performing each read test, in order to
empty all the OS and disk caches (the results change a lot if you don't pay
attention to this detail).
The read tests (test018) were run using the standard (vanilla) NTAR library,
and a modified one that does *not* use seeks to jump from a block to another
(instead, I read all the data from each block in a fake buffer).
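To make the difference between the two read variants concrete, here is a minimal sketch of the two ways of skipping over a block body with the FILE calls. The helper names (skip_by_seek, skip_by_read) and the 64 KB scratch buffer size are made up for illustration; this is not the actual NTAR code.

```c
#include <assert.h>
#include <stdio.h>

/* Skip len bytes of block body with a relative seek (the "vanilla" way). */
static int skip_by_seek(FILE *fp, long len)
{
    assert(fp != NULL && len >= 0);
    return fseek(fp, len, SEEK_CUR) == 0 ? 0 : -1;
}

/* Skip len bytes by reading them into a throwaway buffer, so the stream
 * is only ever accessed sequentially (the "no-seek" way). */
static int skip_by_read(FILE *fp, size_t len)
{
    static unsigned char scratch[64 * 1024];   /* contents are discarded */

    assert(fp != NULL);
    while (len > 0) {
        size_t chunk = len < sizeof scratch ? len : sizeof scratch;

        if (fread(scratch, 1, chunk, fp) != chunk)
            return -1;          /* I/O error, or EOF inside the block */
        len -= chunk;
    }
    return 0;
}
```

Both helpers leave the stream positioned at the start of the next block; the second one simply pays for copying the skipped bytes instead of calling fseek.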
The results are shown below.
Some thoughts about the tests:
- the tests are rather incomplete; it would be interesting to repeat them
+ with other linux filesystems
+ tuning the cluster size of the NTFS filesystems
+ trying RAID 0
+ using another read app, that actually reads the block contents.
- the performance of test017 (dump) is pretty much the same on linux and
windows
- considering an ethernet link type, the write performance is about 400Mbps
(+/-20Mbps) for packets between 64 and 1518 bytes.
- on windows test018 (read) is heavily (badly) affected by seeks, especially
if seeks are short. The same tests with seeks disabled show an impressive
improvement (for 100M packets of 64 bytes the time drops from about 790
to about 240). I think this is mainly due to the buffering used by the FILE
calls (fwrite/fread/fseek), which NTAR currently uses. It would be
very interesting (and it's quite easy to do) to implement the
read/write/seek NTAR callbacks using the native windows
ReadFile/WriteFile/???seek???.
- on linux test018 (read) is somewhat affected by short seeks (but much less
than on windows). Again, I think this is mainly due to the buffering used by
the FILE calls (fwrite/fread/fseek), which NTAR currently uses. It
would be very interesting (and it's quite easy to do) to implement the
read/write/seek NTAR callbacks using the native posix read/write/??seek??.
- in general, read operations seem to be comparable between windows and
linux with big packets and using the "no-seek optimization" under windows.
When small packets are involved, linux seems to be better. I would like to
investigate the problem, trying to reimplement the read/write/seek
callbacks using some more low level calls, and see what happens.
- CPU load: arghh, it's quite a pain to measure it. Basically, the CPU load
tends to be quite variable, going from 0% to some peak value, and I was
not able to compute a mean value for that in most cases. Under Windows, in
any case, the CPU load seems to be concentrated on one single CPU. Under
linux, I tried using "top" to measure the cpu load, and I had some strange
results (maybe strange for me because I'm not so used to it). Basically,
during some of the tests the cpu load for the test processes (test017 or
test018) was very high (85%), but the total load of every single CPU in
the system (hitting "1" while top runs) was much lower. I think I'm
missing something...
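As a rough idea of what the "native calls" experiment mentioned in the list above could look like on the POSIX side, here is a minimal sketch built on read/write/lseek. The raw_file type and the raw_* function names are hypothetical and do not reflect NTAR's real callback signatures; using lseek for the seek part is my assumption.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle; NTAR's real callback interface may differ. */
typedef struct raw_file {
    int fd;
} raw_file;

/* Read exactly len bytes. Returns 0 on success, -1 on error or early EOF. */
static int raw_read(raw_file *f, void *buf, size_t len)
{
    unsigned char *p = buf;

    assert(f != NULL && buf != NULL);
    while (len > 0) {
        ssize_t n = read(f->fd, p, len);

        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;          /* real error, or EOF before len bytes */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes. Returns 0 on success, -1 on error. */
static int raw_write(raw_file *f, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    assert(f != NULL && buf != NULL);
    while (len > 0) {
        ssize_t n = write(f->fd, p, len);

        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Move the file offset by delta bytes from the current position.
 * Returns the new offset, or (off_t)-1 on error. */
static off_t raw_seek_cur(raw_file *f, off_t delta)
{
    assert(f != NULL);
    return lseek(f->fd, delta, SEEK_CUR);
}
```

These calls bypass the stdio buffer entirely, so every short read becomes a system call; whether that is faster or slower than the FILE calls for small blocks is exactly what would need to be measured.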
Any comments or ideas for new tests (or volunteers for tests) are very
welcome.
Have a nice day
GV
PS. Sorry for the *extremely* long mail...
------------------------------------------------------------------
Platform: DELL PowerEdge 2850
Dual XEON 3GHz Hyperthreading (=4 virtual CPUs)
1 GB RAM
1 x Seagate Cheetah 15k rpm 36GB SCSI (OS)
1 x Seagate Cheetah 15k rpm 36GB SCSI (trace files)
==============================Windows==============================
OS: Windows Server 2003
Trace file disk formatted with NTFS, standard cluster size.
Test 017-A (Dump of packets to disk)
------------------------------------
Vanilla NTAR 1.1.0.190 (static VC7 CRT)
PacketSize  NumPackets  Time(runA,runB)  1 CPU load estimate
64          10M         17, 14           0-100% (variable)
64          100M        147, 161         0-100% (variable)
1518        1M          29, 25           0-40% (variable)
1518        10M         293, 263         0-40% (variable)
Test 017-B (Dump of packets to disk)
------------------------------------
NTAR 1.1.0.190 compiled with DLL VC7 CRT (msvcrt71.dll)
PacketSize  NumPackets  Time(runA,runB)  1 CPU load estimate
64          10M         11, 11           0-100% (variable)
64          100M        151, 146         0-100% (variable)
1518        1M          21, 18           0-40% (variable)
1518        10M         300, 257         0-40% (variable)
Test 018-A (Read of blocks from disk)
------------------------------------
Vanilla NTAR 1.1.0.190 (static VC7 CRT, all seeks enabled)
PacketSize  NumPackets  Time(runA,runB)  1 CPU load estimate
64          10M         77, 79           80%
64          100M        788, 793         80%
1518        1M          41, 44           30%
1518        10M         387, 395         30%
Test 018-B (Read of blocks from disk)
------------------------------------
NTAR 1.1.0.190 (static VC7 CRT, seeks to jump from a block to another disabled)
PacketSize  NumPackets  Time(runA,runB)  1 CPU load estimate
64          10M         24, 24           100%
64          100M        241, 242         100%
1518        1M          22, 22           30%
1518        10M         231, 231         30%
===============================Linux===============================
OS: Linux Fedora CORE 3, kernel 2.6.11.something
Trace file disk formatted with Linux LVM filesystem + ext2.
Test 017 (Dump of packets to disk)
------------------------------------
Vanilla NTAR 1.1.0.190
PacketSize  NumPackets  Time(runA,runB)  "top" CPU load estimate
64          10M         10, 7, 13        ????
64          100M        142, 147         ????
1518        1M          14, 24           ????
1518        10M         302, 296         ????
Test 018-A (Read of blocks from disk)
------------------------------------
Vanilla NTAR 1.1.0.190 (all seeks enabled)
PacketSize  NumPackets  Time(runA,runB)  "top" CPU load estimate
64          10M         19, 18           85%
64          100M        176, 168         85%
1518        1M          27, 26           ????
1518        10M         227, 227         ????
Test 018-B (Read of blocks from disk)
------------------------------------
NTAR 1.1.0.190 (seeks to jump from a block to another disabled)
PacketSize  NumPackets  Time(runA,runB)  "top" CPU load estimate
64          10M         15, 15           85%
64          100M        129, 141         85%
1518        1M          27, 26           ????
1518        10M         223, 227         ????
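
For anyone who wants to turn the dump times in the tables above into a data rate, the arithmetic can be sketched as below. It assumes that the Time column is in seconds and that "10M" means 10,000,000 packets; neither is stated explicitly in the tables. It counts packet bytes only and ignores any per-block overhead in the trace file, so it will not necessarily reproduce the write figure quoted in the mail. The function name payload_mbps is made up for illustration.

```c
#include <assert.h>

/* Payload-only throughput in Mbit/s:
 * packets * bytes per packet * 8 bits / seconds / 1e6. */
static double payload_mbps(double packet_bytes, double num_packets,
                           double seconds)
{
    assert(seconds > 0.0);
    return packet_bytes * num_packets * 8.0 / seconds / 1e6;
}
```

For example, payload_mbps(1518, 10e6, 293) (the first run of the 10M x 1518-byte row of Test 017-A) comes out at roughly 414 under those assumptions.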