This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.
"not as terrible" is hardly reassuring as a selling point.
-p f2 array) and read rates of 40-60MB/s, but variable within a significant margin. Using watch iostat 1 2 to check the spread of activity across the drives I could see either well balanced low rates of 5-7MB/s per drive, or unbalanced rates, almost arbitrarily. When I measured the single drives I got pretty good performance. At their simplest the array and drive tests were done with just

  sysctl vm/drop_caches=3
  dd bs=4k if=/dev/sdc of=/dev/null

but for more varied tests I also used Bonnie 1.4 (with -o_direct of course). Anyhow just dd as in the above is sufficient to get a coarse but reasonable idea. It is also useful to watch vmstat 1 while doing tests.
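Where the spread across the drives looks unbalanced, a per-drive comparison along these lines can help. This is only a sketch of the procedure described above, not the script actually used; the member drive names sda to sdd are an assumption for illustration, and count=50000 matches the 205MB reads shown further down:

  #!/bin/sh
  # Sketch: measure the sequential read rate of each member drive in turn,
  # so that a slow or unbalanced drive stands out. Run as root.
  for DEV in /dev/sda /dev/sdb /dev/sdc /dev/sdd
  do
      # Drop the page cache so reads are not served from RAM.
      sysctl vm/drop_caches=3 >/dev/null
      echo "== $DEV"
      # dd prints its transfer summary on stderr, hence the 2>&1.
      dd bs=4k count=50000 if="$DEV" of=/dev/null 2>&1 | grep 'MB/s'
  done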
I then experimented with setting the read-ahead for both the array device (/dev/md0) and the underlying disk block devices, with a command like blockdev --setra 65536 /dev/md0 and also with smaller values. The results were that with a read-ahead of less than 64 the underlying disk drives would perform at less than their top rate:
  tree$ sudo sh -x /tmp/testra
  + for N in 1 4 16 32 64 128
  + blockdev --setra 1 /dev/sda
  + sysctl vm/drop_caches=1
  vm.drop_caches = 1
  + dd bs=4k count=50000 if=/dev/sda of=/dev/null
  + grep MB/s
  204800000 bytes (205 MB) copied, 6.94646 s, 29.5 MB/s
  + for N in 1 4 16 32 64 128
  + blockdev --setra 4 /dev/sda
  + sysctl vm/drop_caches=1
  vm.drop_caches = 1
  + dd bs=4k count=50000 if=/dev/sda of=/dev/null
  + grep MB/s
  204800000 bytes (205 MB) copied, 6.93524 s, 29.5 MB/s
  + for N in 1 4 16 32 64 128
  + blockdev --setra 16 /dev/sda
  + sysctl vm/drop_caches=1
  vm.drop_caches = 1
  + dd bs=4k count=50000 if=/dev/sda of=/dev/null
  + grep MB/s
  204800000 bytes (205 MB) copied, 4.11668 s, 49.7 MB/s
  + for N in 1 4 16 32 64 128
  + blockdev --setra 32 /dev/sda
  + sysctl vm/drop_caches=1
  vm.drop_caches = 1
  + dd bs=4k count=50000 if=/dev/sda of=/dev/null
  + grep MB/s
  204800000 bytes (205 MB) copied, 3.88555 s, 52.7 MB/s
  + for N in 1 4 16 32 64 128
  + blockdev --setra 64 /dev/sda
  + sysctl vm/drop_caches=1
  vm.drop_caches = 1
  + dd bs=4k count=50000 if=/dev/sda of=/dev/null
  + grep MB/s
  204800000 bytes (205 MB) copied, 3.53645 s, 57.9 MB/s
  + for N in 1 4 16 32 64 128
  + blockdev --setra 128 /dev/sda
  + sysctl vm/drop_caches=1
  vm.drop_caches = 1
  + dd bs=4k count=50000 if=/dev/sda of=/dev/null
  + grep MB/s
  204800000 bytes (205 MB) copied, 3.58348 s, 57.2 MB/s

and the same happened with the RAID10 device itself, but for read-ahead values smaller than 64k.
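Judging by the xtrace output, /tmp/testra was presumably something along these lines; this is a reconstruction from the trace, not the actual script (the 2>&1 is needed because dd prints its summary on stderr):

  #!/bin/sh
  # Sketch of the read-ahead sweep suggested by the trace above:
  # for each read-ahead value, drop the cache and time a raw sequential read.
  for N in 1 4 16 32 64 128
  do
      blockdev --setra "$N" /dev/sda
      sysctl vm/drop_caches=1
      dd bs=4k count=50000 if=/dev/sda of=/dev/null 2>&1 | grep MB/s
  done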
At some point I noticed something entirely unexpected: in the vmstat output the number of interrupts was inversely proportional to the value of the read-ahead. Since I was doing my tests on an otherwise idle machine, and there were very few interrupts when I was not running dd, this obviously correlated with the RAID operations. After some more testing with dd, using various combinations of read-ahead (--setra) and read length (bs=), and watching with vmstat, my summary was:
* Increasing the read-ahead allowed dd to read faster, but the maximum speed of the read device was reached quickly, with a value of 64; thereafter only the number of interrupts would decrease.
* Changing bs= for dd had instead essentially no effect compared to changing the value of the read-ahead.
* The values in the bi column of the output of vmstat seemed to be exact multiples of the read-ahead, which I noticed when I set the read-ahead to 1000 and other powers of 10:
  $ sudo blockdev --setra 1000 /dev/sda
  $ sudo dd bs=4k count=500000 if=/dev/sda of=/dev/null & vmstat 1
  [2] 3697
  procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
   2  1      0 330772 648812 727796    0    0    21     5   91  104  3 11 86  0  0
   2  2      0 330056 664148 727340    0    0 50464     0  304  425 12 44  0 45  0
   1  2      0 330932 682976 720968    0    0 51000     0  299  415 11 45  0 44  0
   1  2      0 330148 700512 715600    0    0 48500     0  287  423 12 43  0 46  0
   1  2      0 330556 716128 710148    0    0 50000     4  300  432 12 41  0 48  0
   1  2      0 330228 732268 705280    0    0 49000     0  295  409 12 43  0 46  0
   1  2      0 330880 744936 702148    0    0 48500     0  286  419 10 47  0 44  0
   1  2      0 329916 758820 699220    0    0 49500     0  289  430 16 39  0 46  0
   1  2      0 330092 770968 696012    0    0 49000     0  289  425 11 43  0 47  0
   1  2      0 330808 781252 693556    0    0 48500     0  289  415  7 48  0 46  0
   1  2      0 336416 792724 684552    0    0 50500     0  295  414  7 50  0 44  0
   1  2      0 845892 810656 163568    0    0 48500     0  308  511 12 45  0 43  0
   1  2      0 868672 819836 139180    0    0 50000     0  340  558 11 42  0 48  0

That's uncanny, down to some lines having a multiple of 500 instead of 1000.
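A quick way to spot the same pattern on another device is to compare vmstat's bi column against the device's configured read-ahead. The sketch below is my own illustrative glue, not part of the original tests; blockdev --getra and vmstat's column layout are standard, the divisibility check is just what the observation above suggests, and it assumes a non-zero read-ahead:

  #!/bin/sh
  # Sketch: check whether vmstat's bi samples are whole multiples
  # of the device read-ahead (in 512-byte sectors).
  DEV="${1:-/dev/sda}"
  RA=$(blockdev --getra "$DEV")

  # Sample vmstat once a second for 10 seconds; bi is the 9th column.
  vmstat 1 10 | awk -v ra="$RA" '
      NR > 2 && ra > 0 {            # skip the two header lines
          bi = $9
          printf "bi=%-8d %s of read-ahead %d\n", bi,
                 (bi % ra == 0 ? "a multiple" : "not a multiple"), ra
      }'

This only flags exact multiples; it does not explain why they appear, but it makes the pattern easy to spot.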
mailboxing, usually in its tagged queueing variant.
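Since the sentence above refers to host adapters queueing multiple commands, it may be worth noting that on Linux one can usually see whether tagged (or native) command queueing is in effect for a disk from the SCSI layer's queue_depth attribute; the check below is my addition, not part of the original entry, and the sysfs path applies to SCSI and libata devices:

  #!/bin/sh
  # Sketch: report the command queue depth of each SCSI/SATA disk.
  # A depth greater than 1 indicates tagged/native command queueing.
  for d in /sys/block/sd*
  do
      [ -r "$d/device/queue_depth" ] || continue
      printf '%s: queue_depth=%s\n' "${d##*/}" "$(cat "$d/device/queue_depth")"
  done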