Software and hardware annotations, q1 2005
This document contains only my personal opinions and calls of
judgement, and where any comment is made as to the quality of
anybody's work, the comment is an opinion, in my judgement.
- 050329
- I have finally gotten around to adding the Ubuntu sources to my
Debian /etc/apt/sources.list:
deb http://archive.Ubuntu.com/ubuntu/ hoary main universe restricted multiverse
deb http://archive.Ubuntu.com/ubuntu/ warty-security main universe restricted multiverse
deb http://archive.Ubuntu.com/ubuntu/ warty-updates main universe restricted multiverse
deb http://archive.Ubuntu.com/ubuntu/ warty main universe restricted multiverse
in order to have the option of installing Ubuntu-only
packages or package versions.
To ensure that only Debian packages are considered by default
I have had to release-pin packages with an origin of Ubuntu to a
low priority, by putting these lines in /etc/apt/preferences:
Package: *
Pin: release o=Ubuntu
Pin-Priority: 90
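With the Ubuntu origin pinned below 100, versions from it never
displace Debian ones by default, but they can still be requested
explicitly; for example (the package name is just a placeholder):
apt-get install somepackage/hoary
apt-get install -t hoary somepackage
The first form takes only that package from hoary; the second
also allows its dependencies to come from there.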
- 050321
- I have just had a look at the initrd for Debian and I was quite
amazed to see it is 2MB compressed and about 5MB uncompressed. It
is a pretty large root filesystem, and some mini distributions
are smaller.
I was looking at it to answer a question by someone as to how to
prevent Debian from loading a specific SCSI driver at startup,
even if the host adapter was present. It turns out that this is
not easy, because there are many excessively helpful mechanisms
that try to automate driver loading and configuration, starting
with those in the initrd.
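One approach that should work with the sarge-era hotplug and
initrd-tools packages, sketched here with aic7xxx standing in for
whichever driver is unwanted (the kernel version is also just an
example):
# stop hotplug from loading the module after boot
echo aic7xxx >> /etc/hotplug/blacklist
# have mkinitrd include only the modules the root device actually
# needs (MODULES=dep instead of the default MODULES=most), then
# rebuild the initrd
sed -i 's/^MODULES=most/MODULES=dep/' /etc/mkinitrd/mkinitrd.conf
mkinitrd -o /boot/initrd.img-2.6.8-2-686 2.6.8-2-686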
- 050319
- Having a renewed interest in VoIP I have started looking at
developments a bit more recent than H323, and this obviously
means SIP, IAX2 and the Asterisk software exchange.
Unsurprisingly it looks like the usual: the whole area is
underdocumented and the design of things is awkward and
inconsistent.
- 050315
- Playing with the 2.6.11.2 kernel and looking at the SUSE
patches I have noticed they carry version 0.15 of the ZyDAS 1201
driver as a patch.
That patch applies cleanly and the driver just works; even the
firmware loading seems good (I have put the ZyDAS firmware files
in /usr/local/lib/firmware which is the right place for manually
installed firmware files).
The newer releases of the driver are much better than the older
ones (I had tried release 0.8 a while ago) and the 2.6 USB code
seems to have improved a fair bit too. Even better, there is a
note saying that ZyDAS is helping by giving documentation. This
means ZyDAS joins the good Linux WiFi chipsets, which is
particularly welcome as ZyDAS chipset USB thingies are easy to
find, cheap and small.
Just out of curiosity I have tried to measure the effective
802.11b speed one can get in less than optimal conditions. I have
my AP in one room and a PC in the next room, with a wall in
between that is rather radio opaque, and thus a signal strength
of 35/128 and default parameters.
Under these conditions I could get around 600KiB/s, or around
5Mb/s, out of the theoretical maximum of 11Mb/s. This is not too
bad, and very similar to the actual utilization, around 50%, with
802.11g, but I suspect that in better conditions and with a
little tuning (frame size, MSS, ...) this can be improved.
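A measurement like this needs nothing fancy; for example (host
name and path made up), assuming a web server on the wired side
of the AP:
wget -O /dev/null http://wired-host/pub/some-large-file.iso
and the average rate wget reports at the end is the effective
throughput.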
- 050314
- Well, I have been playing around with the 2.6.11.2 Linux
kernel release, and it seems a lot more reliable than previous
releases in the 2.6.x series. I have made a couple
of interesting discoveries:
- It is now possible to select the elevator on a per-block-device
basis, by writing to /sys/block/dev/queue/scheduler, where dev is
the device name (example after this list).
- I figured out why I was getting much lower hdparm -t results
under 2.6 than under 2.4: it appears that to get the same results
under 2.6 I must raise the filesystem readahead, set with
hdparm -a n, to some large value like 512 blocks.
Evidently 2.6 kernels don't automagically do as much readahead
as 2.4 kernels do.
Note that lots of readahead makes streaming tests look better,
but may be terrible for other uses...
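For reference, both knobs look like this (the device name hda is
just an example):
# pick the deadline elevator for one device only
echo deadline > /sys/block/hda/queue/scheduler
# raise readahead to 512 sectors, then re-run the streaming test
hdparm -a 512 /dev/hda
hdparm -t /dev/hda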
- 050312
- Since the Linux 2.6.11 kernel release is so recent, along with
its two official updates up to 2.6.11.2, I have had a look again
at a major distribution variant of the same kernel, the SUSE
kotd package, to see how much of a variant it is; other kernels
from major distributions are similar, for example there is a
list of extra drivers in recent Mandrake kernels.
Well, it has almost 500 patches. Some of these add functionality
like UML and Xen. But there are very many fixes... These are the
patch collections:
Collection   # of files   total # of lines
arch                 19              16130
drivers             150             365887
fixes               106              14786
rpmify               13                911
suse                157             236323
uml                  10               2817
xen                  14              36169
Of these the suse and drivers collections seem to be mostly
extensions, but they also contain a lot of fixes.
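For what it is worth, the per-collection counts above can be
reproduced with something like the following, assuming each
collection is unpacked as one directory of patch files (the
patches.* directory names are an assumption):
for d in arch drivers fixes rpmify suse uml xen; do
    printf '%-10s %6d %9d\n' "$d" \
        "$(ls patches.$d | wc -l)" "$(cat patches.$d/* | wc -l)"
done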
Now the question is: if these things are good for SUSE, who are
careful people who test things and have many happy users, why
aren't these patches in the original kernel? My comments on this
are:
- Well, some of those patches are not really relevant to the
original kernel, because they are SUSE specific or not applicable
to a general purpose kernel. But still, there are so many fixes.
- In order to become part of the original kernel, they have to be
submitted to the original kernel maintainers. I can easily see
that the major distributions may think it is not in their best
interests to proactively contribute their collections of fixes to
the original kernel.
Another point is that very many of these changes to the kernel
are extensions, and they are distributed as source patches. I
know that Linus Torvalds prefers it like this, but I still think
that the Linux kernel should have some mechanism to allow
modularization of the source. Even accepting that it is a
monolithic kernel at runtime does not imply that its source
should be monolithic to the extent it is now.
- 050301
- Things with P2P seem a bit better, or perhaps worse, than the
impressions I got from looking at the people queued to download
from my machine. Other statistics show that the percentage of
users that cannot provide uploads is way less than the 40-50% I
had summarily estimated. According to statistics by Razorback2,
which is probably the biggest eDonkey directory site, only about
15% of users cannot provide uploads (LowID).
So the mystery of why queues are so long and terrible and
download speeds so low persists.
However, I tried a few other downloads. One, a fairly large one,
for an ISO9660 image with some media test files, started almost
immediately and proceeded at high speed (around 40KiB/s). The
reason seems to be that the file was seeded by some high speed
servers from Razorback2 itself, thus showing how effective
seeding is.
I then decided to try and download something that was a bit
large but also had very many available sources. Finding something
suitable was not easy, in part because of the variously
objectionable nature of most of the really popular stuff
(unfortunately most of it was not freedom software), in part
because not a lot of files are popular.
Well, even with many complete sources, queuing took a fairly long
time and downloads were not speedy. Typically once started there
were 3-4 sources (out of a few hundred) each delivering around
3-10KiB/s.
- 050301
- Well, more observations on the dreadful P2P situation. I have
left my eDonkey client running with a tasty selection of free
software ISOs. The top uploads served are reported as
KANOTIX-2005-01.iso with 1.2GB, ubcd32-full.iso with 0.9GB, and
knoppix-std-0.1.iso with 0.7GB, and I haven't had any download
running for a bit.
I have occasionally tried to download something to test the
download side, and while my upload side is constantly busy, when
I try to download things there is often a single host offering
them, and there are huge queues.
I am not at all surprised that my experience so far (and that of
many others I have read about online) has been so negative, with
extremely poor download rates, scarce availability of seeds, and
long queueing times.
- 050223
- In the past few days I have been trying out P2P systems like
eDonkey, Gnutella and OpenFT, and it is pretty obvious there are
fairly big problems with the P2P model of operation. The main
problem however is simply lack of bandwidth, that is of seeds
for downloads. This is going to become worse and worse as ISPs
are now trying to switch from high monthly fixed fees to low
monthly fees and then charging for bandwidth, in both directions,
but the situation is ugly enough as it is.
The main symptom is that I have had both aMule and giFT running
constantly for a week now, and I have uploaded well over six
times more than I have downloaded. When I have tried to download
something, it just gets stuck for many hours or days waiting for
some site to become free, and then it downloads at very, very
slow speeds (for example it took several days to get the ISO
image of System Rescue CD, which downloaded in a few dozen
minutes from a SourceForge mirror).
This has been worse for eDonkey than for Gnutella/OpenFT.
This seems to be a rather common experience, and indeed
there are some obvious and systematic causes:
- Almost all P2P hosts are either on a modem or an ADSL line.
This means that at best the theoretical up bandwidth is half the
down bandwidth, and in several cases it is one eighth, as many
services offer 2Mb/s accounts with a 256kb/s up limit (there are
also technical reasons for inefficient line utilization).
- About half of P2P hosts seem to be behind firewalls that
forbid incoming connections completely. This is less of a
problem for Gnutella/OpenFT, but it is just like that with
eDonkey.
Now the combined effect of these two inevitable issues is that in
theory download and upload bandwidth should equalize to about
half the typical/most common bandwidth, which is about 256kb/s,
or in practice (taking into account some technicalities, such as
protocol framing overhead) an effective limit of about 28KiB/s.
So I would expect my up bandwidth to be close to 28KiB/s, but my
effective down bandwidth to be around 14KiB/s, assuming that P2P
is indeed peer to peer, that is that sharing happens
symmetrically.
But I usually observe that my aggregate downloads amount to a lot
less than that: when downloading happens at all it runs at
3-6KiB/s, and as a rule downloading doesn't happen at all for
hours, as transfer requests sit in queues before getting a short
burst of 3-6KiB/s, for an average download speed well below even
that, never mind being equal to the upload speed.
Not only does my up link run at top capacity all the time (which
is not good, as I am on a theoretically 1:50 contended service),
I also see around 80-100 hosts in the queue of people waiting for
a chance to download from my host, and some of them have been
queued for days.
All this indicates not only that maximum upload speeds are on
average much lower than download speeds, because of the
asymmetric speeds of both V.90 modems and ADSL, and that around
half of the participating hosts don't accept incoming connections
at all because of firewalls, but also that very, very few sites
are actually sharing.
In other words the typical usage pattern is that people get
online, wait a long time to download something they are
interested in, and then once the download is complete, they close
the connection and stop sharing.
That means files are shared just about only while they are being
downloaded, as they are being downloaded, and then only in half
of the cases, at less than half speed, and most crucially while
they are mostly incomplete.
The waiting happens because very few of the P2P hosts have a
complete file image to share, and everybody else has got
incomplete ones that are incomplete in the same way.
In other words:
- P2P networks actually have very hierarchical, download-style
usage patterns.
- There are very few seed hosts to kickstart the temporary
sharing that actually takes place, and these seeds are on
relatively slow and overloaded connections.
In effect P2P networks are not fully peer to peer; they are
shared download systems (much like BitTorrent), with not much in
the way of sites to download from to start with.
The consequence is that P2P systems currently are just
about useless for an important and interesting use, which is
to replace or augment FTP/HTTP/RSYNC/Torrent sites as the
primary distribution mechanisms for free software, and in
particular for ISO images of free software operating system
install CDs.
This is highly regrettable, because P2P could instead be a
particularly efficient viral marketing channel for free software
installers.
Two fixes are possible, one both weak and unfeasible, the other
in theory excellent but unlikely, and with a detail to sort out:
- Change the behaviour of peers to continue sharing even after
the download completes.
- This is weak because peers, most of whom have consumer grade,
contended modem or ADSL connections, have pitiful upload
bandwidths to offer, and it is unfeasible because it just goes
against the grain of user behaviour and, as ISPs more commonly
charge for bandwidth, against their self interest.
- Put the same repositories that currently offer their archives
via FTP/HTTP/RSYNC/Torrent on P2P networks too.
- This can work really well and greatly improve the reliability
of downloads from those sites, at the same time relieving them of
a large part of the bandwidth cost, as after all while people
download they also end up sharing. It is unlikely however that
this will happen, as many repositories are publicly funded (e.g.
hosted by universities) and P2P systems have been demonized as
vehicles for dishonest and criminal behaviour. The technical
problem is that P2P systems typically present a completely flat
view of the namespace of available files, and most existing
archives are arranged, for very good reasons, hierarchically.
This can be fixed by having P2P servers flatten file paths into
file names, which is not hard.
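As a sketch of what such flattening could look like on the
archive side, assuming a plain Unix shell (the directory and file
names are just examples):
# publish pub/distros/knoppix/some-image.iso under the flat name
# pub_distros_knoppix_some-image.iso
find pub -type f | sed 's|/|_|g'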
As a final note, I suspect that the current popularity of P2P
systems despite their awful performance is due to historical
causes; in the beginning all P2P systems probably were in effect
seeded by university students, and in particular computer science
ones, who enjoyed symmetrical and very high bandwidth connections
thanks to their attachment to campus networks.
Then the enormous amount of bandwidth consumed and the
illegal nature of much of the content offered for sharing led
universities to forbid such seeding, and the P2P systems
remaining out there are now seedless and sad ghosts of what
they were, still popular thanks to fresh memories of a golden
age that is no more.
- 050223
- I am looking into P2P programs, mostly based around the eDonkey
or the OpenFT protocols.
The motivation of this research is that freedom software packages
are becoming ever more sophisticated and bigger, in particular
the albums/compilations known as distributions, especially the
live CD ones.
The existing download methods are all somewhat unsatisfactory:
- FTP or HTTP
- Download is from a single server per file, putting huge loads
on that server.
- No built-in verification of the integrity of the transferred
file.
- When an MD5 checksum file is also available, it only tells
whether the download failed, not where.
- Partial downloads in practice can only be restarted from the
end.
- Fortunately there are very many FTP and HTTP servers, even if
they are prone to congestion; unfortunately there are few
systematic catalogs of servers and indexes of their contents,
with the result that the well known servers are even more prone
to congestion.
- RSYNC
- RSYNC downloads in chunks and verifies the integrity of each
chunk, and can redownload any arbitrary chunk, so that's pretty
nice (see the example after this list).
- There is still a single download source at a time.
- There are relatively few download servers.
- There are even fewer ways to find catalogs of RSYNC servers and
indexes of their contents than for FTP and HTTP.
- Existing RSYNC clients are slightly more awkward than FTP or
HTTP clients, which have nice shell-style or commander-style
interfaces.
- BitTorrent
- BitTorrent is basically RSYNC where chunks can come from many
different servers, which all register with the original
BitTorrent server, which may or may not be the one with the
original content.
- eDonkey
- TBD
- Gnutella
- TBD
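This is the sort of resumable, verified RSYNC transfer meant in
the comparison above; the server name and path are made up:
# --partial keeps what has already arrived; a rerun of the same
# command re-fetches only the missing or differing chunks
rsync --partial --progress \
    rsync://some.mirror.example/distro/images/livecd.iso .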
- 050218
- The ridiculous font situation under Linux is getting ever
worse. I have been looking at changing the font used for the GUI
elements (toolbars, menus, not the page) in Mozilla and Firefox.
The following disgusting issues arose:
- Mozilla uses GTK 1, which uses the X11 native font system, and
Firefox uses GTK 2, which completely ignores it in favour of that
idiocy, Fontconfig/Xft2.
- One can change the Mozilla GUI font by editing/overriding the
theme description for its GTK 1 theme, that is by adding some
poorly documented lines to $HOME/.gtkrc.
- In theory, and as documented, one can change the GUI font for
Firefox by similarly editing/overriding the GTK 2 theme, by
adding some poorly documented lines to $HOME/.gtkrc-2.0.
- The GTK 2 per-user theme file is called .gtkrc-2.0 even in the
2.2 and 2.4 releases of GTK 2.
- The font specification in the .gtkrc-2.0 file uses the setting
gtk-font-name, whose syntax is similar to, but incompatible with,
that of Fontconfig/Xft2 font names, which in turn is hardly
documented, and the differences seem gratuitous. For example, in
Fontconfig font names the point size is separated from the font
name by a dash, but not in GTK 2 settings (examples after this
list).
- In any case, a bright guy has made sure that several settings
which are possible in .gtkrc-2.0 are actually overridden by
equivalent settings in the GConf database, which apparently is
only documented in an email announcing this patch to a mailing
list; this requires Firefox to depend not just on the GTK
libraries, but also on the GNOME libraries, or at least the GConf
ones.
- Even after all this idiocy has been worked out, if I
choose a bitmap/PCF font it is bold by default, and I
haven't been able to switch that off. Why?
Why? Why?
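For the record, these are the kinds of lines involved in the two
cases above; the font names and sizes are just examples:
# $HOME/.gtkrc (GTK 1, XLFD-style font name)
style "user-font" {
  fontset = "-*-helvetica-medium-r-normal-*-*-120-*-*-*-*-*-*"
}
widget_class "*" style "user-font"
# $HOME/.gtkrc-2.0 (GTK 2; note "Helvetica 12" with a space,
# where Fontconfig would write "Helvetica-12")
gtk-font-name = "Helvetica 12"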
In this like in many other cases (ALSA springs to mind) the
unwillingness and perhaps inability to think things through
and go beyond the cool half-assed demo
stage seem to me the driving forces.
- 050215
- The insanity of Linux kernel development is becoming ever more
manifest in the 2.6.x series. For the sake of entertainment I
have had a look at the 2.6.x kernel packages by RedHat and SUSE,
among many. Well, the RH ES 4.0 2.6.7 kernel has over 250
patches, and the SUSE 2.6.10 kernel source package has several
archives of patches, including a 4 gigabyte one of fixes.
Sure, some of these will be cool little features that don't
really need to be in the mainline kernel (like UML and Xen
support), but the number of mere bug fixes, especially inside
drivers, is amazing.
Understandably Linus says that his main worry is to make sure
that the overall core structure of Linux is right, and this has
meant paying a lot less attention to device issues, but it is
getting a bit ridiculous.
Also, RedHat and SUSE are hardly untrustworthy as to the stuff
they do with their kernels; one might be tempted to just include
almost all their patches in the mainline kernel, as, if they are
good for them, they are probably good for everybody.
- 050202
- Thanks to a letter by Michael Forbes to Linux Magazine I have
discovered the recently introduced --link-dest option to rsync,
and the Perl wrapper script rsnapshot that uses it to automate
creating backups of filesystems that are both incremental and
full, using forests of hard links.
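The basic idea, sketched with made-up paths: each new snapshot
directory looks like a full copy, but any file unchanged since
the previous snapshot is just a hard link into it, so it costs
almost no extra space:
rsync -a --link-dest=/backup/home.1 /home/ /backup/home.0/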
- 050201
- As to ALSA, I haven't had the time yet to check whether there
is a mixer plugin in 1.0.8, but it has a reworked alsamixer with
a rather less misleading user interface, in particular for
controls that do not correspond to sound channels.
- 050112
- Quite entertaining interview with Linus Torvalds in the January
2005 issue of Linux Magazine. Among the interesting points are
that he is currently using a dual PPC G5 system, to practice code
portability, that he lists the ARM architecture along with x86
and PPC as one of the crucial Linux architectures, the importance
he gives to embedded Linux, and also to SMP (on which he says his
pessimism was wrong, which I disagree with).
Very interesting blog entry about the consequences of defining
pseudo-OO in base C, which then suggests the use of a
preprocessor to autogenerate all the plumbing:
Lets face it, because of C's constraints, writing GTK code, and
especially widgets, can be ridiculously slow due to all the long
names and the object orientation internals that C can't hide.
C with Classes anyone? :-)
Also, found an interesting product that
traces a lot of WIN32 API calls.
- 050111
- Good news for those concerned with the slightly primitive state
of ALSA mixing: apparently version 1.0.8rc2 has a mixer
abstraction plugin in the ALSA library, and a new graphical mixer
application, Mix2005, has been announced.
- 050109
- Rather fascinating article on Tomcat and general web serving
performance issues, So you want high performance by Peter Lin. It
discusses issues like the very high cost of parsing XML, optimal
JNDC architectures, how much time and money it takes to get
physical high speed lines, and the cost of power and cooling for
faster CPUs and disks in racks.