This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.
syslog. I don't really need the extensions,
so I did not much bother to read the relevant parts of the
manual, until now, and I have been lucky. The reason is that
therein lies the syntax for the log file format, and the syntax
is the same as for MS-DOS
environment variables, as this quote makes tragically clear:
The Microsoft cultural hegemony seems nearly absolute.

A template for RFC 3164 format:
$template RFC3164fmt,"<%PRI%>%TIMESTAMP% %HOSTNAME% %syslogtag%%msg%"
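Such a template is then referenced by name after an action; a hypothetical fragment (the forwarding target host is made up, only the template line comes from the manual):

```
$template RFC3164fmt,"<%PRI%>%TIMESTAMP% %HOSTNAME% %syslogtag%%msg%"
# forward everything to a remote host, formatted with the template above
*.* @@loghost.example.com:514;RFC3164fmt
```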
Saito stated that this should help lower Toshiba's NAND manufacturing costs by 40-50 percent each year.
This target of compound cost reduction is major news, because if achieved it will substantially alter many current tradeoffs between capacity, speed and power density in the design of storage systems, especially because flash memory keeps information even when powered off just like disks, but powers on much more quickly.

Moving towards 2011, Saito also stated that the price ratio between SSDs and HDDs will likely dissipate as well, as long as NAND manufacturing costs keep falling at his 50 percent per year goal.
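As a rough sanity check of what such compounding means (my own arithmetic, not Saito's figures):

```python
# Compound effect of a hypothetical 50%/year NAND cost reduction.
cost = 100.0  # arbitrary starting cost index
for year in range(1, 5):
    cost *= 0.5
    print(year, cost)
# after 4 years the cost index is 6.25, i.e. 1/16th of the start
```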
platters.
scratch my itch logic. But perhaps it will be applied a bit too much: it seems likely to me that development tools will be the first to be enhanced to take advantage of multiple CPUs on a chip, because they are the main itch of a developer, and they are also relatively easy to parallelize.
embarrassingly parallel ones or development tools taking much advantage of parallelism anytime soon. I am more optimistic about their diffusion though: while as I am typing on this 2-CPU laptop only my editor is expending a minuscule amount of CPU time, on one CPU at a time, in a different context someone might be listening to music or watching streaming video or compiling the kernel in the background.
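An embarrassingly parallel workload like independent compilation units is the easy case; a minimal sketch (the job names are made up, and the "compilation" is a stand-in):

```python
from concurrent.futures import ProcessPoolExecutor

def compile_unit(name):
    """Stand-in for an independent compilation job (hypothetical)."""
    return f"{name}.o"

if __name__ == "__main__":
    units = ["main", "util", "io", "net"]
    # each unit needs no coordination with the others: embarrassingly parallel
    with ProcessPoolExecutor() as pool:
        objects = list(pool.map(compile_unit, units))
    print(objects)  # ['main.o', 'util.o', 'io.o', 'net.o']
```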
There Is No Alternative, and multiple CPU chips are all we can get. As to that, it is not so easy to find single CPU chips anymore, at least for desktops and laptops.
single-system image: it has a (possibly replicated) metadata server with pointers to the various bits of the single image. It is a bit of a weak point, but a filesystem server usually must have some kind of entry point; the alternative is conceivably to use a lot of multicasting, which has its own downsides. In practice the most obvious consequence is that Lustre performs well when used with large files, as then the metadata to data ratio is low.
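The large-file point can be illustrated with made-up numbers (roughly one metadata lookup per file open, amortized over the file's size; this is an illustration, not Lustre internals):

```python
# Metadata-to-data ratio: metadata operations per byte of file data.
def metadata_ratio(file_size_bytes, metadata_ops_per_open=1):
    return metadata_ops_per_open / file_size_bytes

small = metadata_ratio(4 * 1024)       # 4KiB file
large = metadata_ratio(1 * 1024**3)    # 1GiB file
print(small / large)  # the small file pays 262144x more metadata per byte
```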
backported features, but also because of the social nature of the concept of "works" for any large software system: that not too many people complain at any one time. Which means that a long lived software package accumulates a lot of complaints and (ideally...) fixes.
cross section between the software and user issues be small enough. Using the metaphor of software as a lens in a macroscope, what matters is that it be transparent enough for it to remain useful. Unfortunately many fixes introduce new problems as they trigger new bugs, just as removing spots on a lens might introduce new blemishes. Thus the reluctance of software vendors to fix bugs that affect only a minority of users: fixing them can create new bugs that affect the majority, and that would be really bad.
clusters to say whether their aim is to offer:
cluster on its own is used to indicate the Beowulf variety.
figure of merit, as the performance envelopes of various file systems are often quite differently shaped, and that shape is often misperceived. Among the different types of performance, those more commonly recognized are:
Thumper), Dell 2950 generic servers, Lustre, Linux RAID10, and DRBD for mirroring over the network. More specifically the target is around 200TB of cheap, highly available storage as a single pool or a few large ones. Some of the details:
only 6 million objects, and serial checking is sort of reasonably fast for that size of OST.
-p f2
read speed might be significantly higher
at the expense of lower write speed). In my dreams perhaps, but
not entirely delusional.
General rule of thumb at the moment is 128MB of RAM/TB of filesystem plus 4MB/million inodes on that filesystem.
Which is a significant improvement on previously reported space requirements.

Right now, I can repair a 9TB filesystem with ~150 million inodes in 2GB of RAM without going to swap using xfs_repair 2.9.4 and with no custom/tuning/config options.
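Applying the quoted rule of thumb to that reported 9TB, ~150 million inode filesystem (my arithmetic, not part of the report):

```python
# Quoted rule: 128MB of RAM per TB of filesystem,
# plus 4MB per million inodes on that filesystem.
tb = 9
inode_millions = 150
ram_mb = 128 * tb + 4 * inode_millions
print(ram_mb)  # 1752MB, consistent with the repair fitting in 2GB of RAM
```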
Penryn 45nm quad-CPU chip:
and in another article: a description of the physical characteristics of that type of IC (each quad-core Xeon chip has two of these ICs, each with two CPUs):

As Intel launches the L5420, a low power Xeon at 2.5 GHz. This CPU consumes 50 W (TDP), thus less than 12.5W per core, and only 16W (4 W per core) when running idle. The CPU consumes as little power as the previous 65 nm L5335, but performs about 30% better in, for example, Povray, Sungard and Cinebench.
In the same article another amazing statement as to the physical cost of an x86 CPU decoder:

These days, Intel manufactures millions of Core 2 Duo processors, each made up of 410 million transistors (over 130 times the transistor count of the original Pentium) in an area around 1/3 the size.
As I have been fond of repeating over many years, for now CPU architecture is dead, as immense transistor budgets make it almost irrelevant. The same article also talks about another one of my favourite topics, chips with many CPUs, of which a CPU design with relatively few transistors could be a building block:

Back when AMD first announced its intentions to extend the x86 ISA to 64-bits I asked Fred Weber, AMD's old CTO, whether it really made sense to extend x86 or if Intel made the right move with Itanium and its brand new ISA. His response made sense at the time, but I didn't quite understand the magnitude of what he was saying.
Fred said that the overhead of maintaining x86 compatibility was negligible, at the time around 10% of the die was the x86 decoder and that percentage would only shrink over time. We're now at around 8x the transistor count of the K8 processor that Fred was talking about back then and the cost of maintaining x86 backwards compatibility has shrunk to a very small number.
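A back-of-the-envelope version of Fred Weber's point (my own arithmetic, assuming the decoder's transistor count stays roughly fixed while the budget grows):

```python
# If the x86 decoder was ~10% of the K8 die and its size stays roughly
# constant while the total transistor budget grows 8x, its share becomes:
decoder_share_k8 = 0.10
budget_growth = 8
print(decoder_share_k8 / budget_growth)  # 0.0125, i.e. about 1.25% of the die
```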
The article reports quite a few interesting details about this building block CPU, initially to be used for palmtops, but which can still run 64 bit code:

Built on Intel's 45nm manufacturing process, the Atom is Intel's smallest x86 microprocessor with a < 25 mm^2 die size and 13 mm x 14 mm package size. Unlike previous Intel processors that targeted these market segments, Atom is 100% x86 compatible (Merom ISA to be specific; the Penryn SSE4 support isn't there due to die/power constraints).
Apr 5 15:05:12 tree kernel: gss_create: Pseudoflavor 390005 not found!
<6>RPC: Couldn't create auth handle (flavor 390005)
Apr 5 15:05:23 tree kernel: gss_create: Pseudoflavor 390005 not found!
<6>RPC: Couldn't create auth handle (flavor 390005)

But one of the most ironic cases is the printing of error messages from the syslog dæmon itself:
Apr 4 18:03:16 tree syslogd: select: Invalid argument
Apr 4 18:04:16 tree syslogd: select: Invalid argument

Fortunately there is strace to figure out what the actual error is, no thanks to the authors of so many bad error messages:
time(NULL) = 1207404572
writev(1, [{"Apr 5 15:09:32", 15}, {" ", 1}, {"", 0}, {"base", 4}, {" ", 1}, {"syslogd: select: Invalid argumen"..., 33},
fsync(1) = -1 EINVAL (Invalid argument)
writev(2, [{"Apr 5 15:09:32", 15}, {" ", 1}, {"", 0}, {"base", 4}, {" ", 1}, {"syslogd: select: Invalid argumen"..., 33},
fsync(2) = -1 EINVAL (Invalid argument)
writev(6, [{"Apr 5 15:09:32", 15}, {" ", 1}, {"", 0}, {"base", 4}, {" ", 1}, {"syslogd: select: Invalid argumen"..., 33},

So the syslogd error message is not just useless, it is wrong too: the file descriptors on which the fsync calls report an error are for files that are not plain or block device files:
*.info	|/var/spool/xconsole
*.=debug	|/var/spool/xconsoled
as these are named pipes. Adding a - before the | removes the error, as the - tells syslogd not to fsync on every line.

readahead | cached write | direct write | cached read | direct read
---|---|---|---|---
16 | 192MiB 28% | 253MiB 10% | 84MiB 28% | 522MiB 24%
512 | 165MiB 24% | 253MiB 10% | 478MiB 68% | 496MiB 22%
16384 | 176MiB 25% | 262MiB 11% | 673MiB 61% | 481MiB 21%
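Incidentally, the EINVAL that fsync returns on those named pipes is easy to reproduce; a minimal sketch for Linux (an anonymous pipe behaves the same as a named one here):

```python
import errno
import os

r, w = os.pipe()
try:
    os.fsync(w)
except OSError as e:
    # pipes have no backing store to synchronize, so fsync fails
    print(errno.errorcode[e.errno])  # EINVAL
```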
O_DIRECT
. There should be a difference as going
through the page cache means an extra memory-to-memory copy,
which in this singular test is not amortized over multiple uses,
but it is the level of overhead here that is amazing.
# blockdev --setra 16384 /dev/md0
# hdparm -tT /dev/md0
/dev/md0:
 Timing cached reads: 20748 MB in 2.00 seconds = 10396.44 MB/sec
 Timing buffered disk reads: 1966 MB in 3.00 seconds = 655.21 MB/sec

So what's going on here? Which part of the page cache subsystem is being awful? Even more confusingly, why the much lower number for cached reads here, when specifying not to do caching?
# hdparm -tT --direct /dev/md0
/dev/md0:
 Timing O_DIRECT cached reads: 3452 MB in 2.00 seconds = 1725.17 MB/sec
 Timing O_DIRECT disk reads: 1430 MB in 3.00 seconds = 476.20 MB/sec

At least in this something can be discovered: that cached reads means reading the first 2MiB of the block device once, and then timing repeated reads of the same, but in that case it is odd that it is faster than sequential reading of the same. Also, the array drives even when all are transferring can at most do around 800MB/s aggregate, so obviously some caching is going on despite the
O_DIRECT
, probably
though in the on-drive RAM buffers, which are rather larger than
2MiB. So probably 1.7GB/s is the maximum speed over the two SAS
host adapters and buses being used, which sort of figures, as
that is just under 8
PCIe
lanes.
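Checking that figure (my assumption: roughly 250MB/s of usable bandwidth per PCIe 1.x lane):

```python
# Approximate usable bandwidth of 8 PCIe 1.x lanes.
lanes = 8
mb_per_lane = 250  # rough usable MB/s per lane after encoding overhead
print(lanes * mb_per_lane)  # 2000 MB/s, just above the ~1725MB/s measured
```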