This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.
A report by CERN in 2007 showed that undetected storage system errors are far more common (1, 2) than manufacturer estimates of undetected storage device errors would suggest, mainly but not only because of hardware and firmware bugs.
Stored data corruption can be detected by well-placed strong checksums, and can be corrected by using redundancy, either in the form of replication or of error correction codes.
The only proper way to protect data against corruption is end-to-end, that is to associate checksums and redundancy with the logical data object itself, so that they are carried with it wherever it is stored.
But that imposes an overhead for curation of data, which is expensive for data of middling importance. Therefore checksums, and sometimes even redundancy, are a common feature of storage devices: for example every data block on a magnetic disk drive has them.
For some years now server (and personal computer) CPUs have been fast enough to allow just-in-case computing of checksums inside filesystems, every time data is written into a file, optionally using them for data verification every time data is read from a file. The Linux filesystems that use them currently are:
name | meta data | data | type | inline | notes |
---|---|---|---|---|---|
Arvados Keep | no | yes | MD5 | no | Distributed filesystem. Checksums identify data blocks. |
ZFS | yes | yes | Fletcher, SHA256 | no | Scrubbing can verify all checksums. |
Reiser4 | yes | yes | CRC32c | yes | Since version 4.0.1 of Reiser4. Requires at least version 1.1.0 of reiser4progs. |
Btrfs | yes | yes | CRC32c | no | Scrubbing can verify all checksums. Checksum field is 256b, currently only used for CRC32c. Checksumming can be disabled explicitly or by turning off copy-on-write mode. |
NILFS2 | yes | yes | CRC32? | no? | Not checked on read. Used to detect failed writes during log recovery rather than data corruption. |
ext4 | yes | no | CRC32c | no | Since Linux 3.6. Requires: conversion to 64b layout, version 1.43 of e2fsprogs. |
XFS | yes | no | CRC32c | yes | Since Linux 3.15. Requires: conversion to version 5 layout, version 3.2.0 of xfsprogs. |
Note: in the above inline
indicates whether the checksum is embedded with the metadata
or data it refers to. If the checksum is inline it indicates
only whether the data is corrupt or not, but it could be the
wrong data; if the checksum is not inline, but in some kind of
data descriptor, it also allows detecting mismatches between
expected and actual data, but the check is more expensive. The
authors of Reiser4
prefer inline checksums
while the ZFS authors
prefer checksums in data descriptors.
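As a concrete illustration, here is how a whole-filesystem verification of those checksums can be triggered; a minimal sketch, where the Btrfs mount point /data and the ZFS pool name tank are hypothetical:

# Btrfs: read every block and verify its checksum, printing statistics at the end
btrfs scrub start -B /data
btrfs scrub status /data
# ZFS: the equivalent operation on a pool, with progress shown by "zpool status"
zpool scrub tank
zpool status tank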
Note: among non-Linux filesystems there are the BSD filesystem Hammer, the MS-Windows filesystem ReFS, the distributed filesystem GPFS, and Apple's not-quite-finished APFS.
It seems widely believed that redundancy-based systems also
have checksums
that can be used to verify
data integrity. Such redundancy can take the form of replication as in RAID1, of parity and syndromes as in RAID5 and RAID6, or of erasure codes in more complex parity schemes. For example the
MD RAID
module of Linux has check and repair
operations that start something similar to the scrubbing
of ZFS and Btrfs and use available
redundancy to fix detected issues.
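For instance the check and repair operations are triggered through sysfs; a small sketch, assuming an array called /dev/md0:

# Read all members and compare mirrors/parity without changing anything
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat                       # progress of the ongoing check
cat /sys/block/md0/md/mismatch_cnt     # count of inconsistent sectors found
# Rewrite parity/mirror copies where they disagree
echo repair > /sys/block/md0/md/sync_action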
While these checks can be useful as far as they go, they do not go very far, because they are consistency checks at best, not integrity checks, and they exist only because it is possible to double-use redundancy schemes to perform limited consistency checks.
The basic issue is that it is possible to use some type or
another of code
(of which replication is a
simple form) for identity, integrity and redundancy purposes,
but the types of codes that are good for one purpose are not
necessarily as good for the others.
For example MD5 is still good for checking integrity but not as good for checking identity: if you download a file along with its MD5 checksum, and the computed MD5 checksum of the downloaded file matches the published one, then it is quite likely that the file was not corrupted during download, but it is rather less likely that the file downloaded matches the original, even if the two seem the same statement. That is because the probability of a random error resulting in the same MD5 code is low, but it is possible to deliberately construct a file with the same MD5 checksum as another. Even if that is possible it is expensive, so a match of MD5 checksums provides some assurance of identity too.
Note: in the not-inline case mentioned in a previous note an integrity checksum in a data descriptor is effectively dual-used as an identity code. But most integrity checksums are CRC32c, which is a lot weaker for identity checking than a strong identity code like MD5 or SHA256.
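In practice this is why published download checksums are increasingly SHA256 rather than MD5; a small sketch, where the file and checksum-file names are hypothetical:

# Good enough against accidental corruption during download
md5sum -c linux-4.4.tar.xz.md5
# Better when identity also matters, as MD5 collisions can be constructed deliberately
sha256sum -c linux-4.4.tar.xz.sha256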
Similar considerations apply to using redundancy codes to verify integrity. And that's just about the codes; proper integrity checking also relies on careful staging of the checks, which may be different from the staging for creating and using redundancy.
While recently reading a fairly enthusiastic review of the Samsung Galaxy S7 Edge mobile phone it was unsurprising to read:
With a renewed focus on mobile gaming, and perhaps also an eye on the problems that rival manufacturers' phones have suffered with overheating, the Samsung Galaxy S7 Edge now has an internal liquid cooling system.
In practice, it will still heat up if you put it under pressure. Run the full set of GFXBench graphics tests while charging, for instance, and the phone will still get uncomfortably warm. I measured it at a peak of 43°C. It’s good to know that Samsung is acknowledging the issue, though, even if not solving it entirely.
Unsurprising because I had already noted that desktops are never going away: laptops, tablets and phones are normally in contact with parts of the user's body and are difficult to cool, so if they are powerful they are going to get uncomfortably warm.
The review was enthusiastic because it is a really well designed, powerful mini-tablet, with very high quality components, including a very good AMOLED display with 2560×1440 pixels and a 5.5in diagonal.
Like the similar top rated Apple mobile phones all this does
not come cheap: It's expensive, costing £639 inc VAT
at retail
.
That seems currently unremarkable, but let's ponder that figure again: it is equivalent to around US$ 1,000, and in the USA, one of the richest countries in history, about half of workers would have difficulty paying an unplanned US$ 400 expense.
Clearly US$ 1,000 mobile phones are conspicuous
consumption items, status symbols, as smart
mobile phones with fairly reasonable functionality cost
$100-$200, so one would expect the top end phones of Apple and
Samsung to be elite products that sell in very small numbers
to those for whom US $1,000 is a small matter; Apple's
founder once stated that Apple products were meant to be the
equivalent of BMW cars, that is, luxury products for the affluent.
However Apple and Samsung top end products are in effect mass market ones and many are purchased by the people in the income class that would not be able to pay an unplanned US$ 400 expense, and that's why Apple is so huge and immensely profitable, instead of being merely a high-margin, small-sales niche producer. There are three main reasons for this:
In first-world countries since the 1980s, high interest-rate credit is easily available to fund sales of expensive status symbol gadgets to relatively low-income purchasers; especially when the loan and the high interest rate charged are disguised as long-term mobile phone service contracts, where often the mobile phone service price is in effect a small fraction of the monthly payment. The review mentioned above indeed says
Free from £36 per month (around US$ 50), where a minimum term of 2 years usually applies; of those perhaps £6 are for the actual mobile phone service. Plus the larger monthly payments usually end up being paid, due to consumer inertia, for rather longer than those 2 minimum years.
The last point is by far and away the most important, because
it means that Apple's business model
is no
longer really that of selling premium electronics products,
but of
selling (indirectly via phone companies)
small mortgages
with a high
interest-rate (what used to be called usury
). Without a phenomenally loose credit
policy Apple would not be able to have both
such large profits and such huge unit sales.
Consider other Apple products like laptops that are in the same price range: many upper-middle class persons have an Apple MacBook, which is a status symbol too, but also a more plausibly utilitarian gadget than a US$ 1,000 mobile phone; yet that is a small market of affluent consumers for whom an up-front expense of US$ 1,000 is eminently affordable. That used to be the natural market for Apple and BMW products, and Apple did well out of it, but not as spectacularly as when it started effectively selling small mortgages with the introduction of the iPhone line.
Note: Also compare with wristwatches, briefcases or purses: there are many more people with US$ 500 or US$ 1,000 cellphones than with US$ 500 or US$ 1,000 wristwatches, briefcases or purses.
The HTC and Valve Vive is a virtual reality access device that sits between the VR room (CAVE), where most of the access equipment is part of the room, and personal access devices like stereoscopic glasses, which have recently become popular in the variant with an embedded mobile phone. The Vive includes both VR room equipment, in the form of locators that provide a room-based frame of reference, and stereoscopic glasses that provide user-based viewing and act as the pointer for the locators: the combined effect is that when the user wears the glasses it is as if the user were in a CAVE.
The Vive is one of the first VR access devices that has both a price accessible to (affluent) consumers and quality that is good enough for sustained immersion. A guest on a science-fiction author's website has given it a very positive review, with the observation that such VR access devices may have interesting side-effects.
Many years ago I was involved in some work on something
similar, and I had a few early VR access devices, including an
early pair of nVidia stereoscopic glasses that worked pretty
well at the time, even if with much reduced quality compared to what is
possible today. At the time I had suggested something similar
with ultrasound room-based locators as in
ultrasound 3D mice
(I bought a fairly cheap commercial model 15 years ago and it
worked quite well) which I think is still a good option for
example for desk or body based locators; another VR access
device uses webcams as locators, and other techniques can be
used.
What is notable about the review is mostly the very strong sense it conveys that the Vive provides good quality access, that the immersion from it is of a similar level of quality to that from a conventional spectator monitor.
A while ago I got some mechanical keyboards and gave some first impressions, so here is an update.
As to the CM Storm QuickFire TK, I have been disappointed to see that four of its keys became unreliable quite a while ago. These are supposed to be highly reliable Cherry MX brown switches, and I could accept that one of them may have been defective, but that four of them (among them the much used Enter key) have become unreliable may mean that they were fakes.
Note: The Enter key is mostly dead. But it works if I remove the keycap and, as I press it, I also push it northwards, where presumably the keyswitch is. Which means that the metal lip of the keyswitch has become dirty, or is out of position, or has already become enervated.
I have replaced it as my main keyboard with the Ducky DK-9087 Shine 3 keyboard, which has continued to work very well, and is nicely backlit. The black rubber lacquer on its backlit keys has worn a bit on the left-Shift, left-Ctrl, A, I, O and Enter keycaps, but that is expected, and the advantage of having Cherry MX keys is that there is a choice of spare keycaps. I still prefer the texture of the PBT keycaps I bought, but they are less suitable for backlit keys, so I have not put them on yet.
The Corsair K65 still works well but I have not used it much, as I was using it for my rarely used test and gaming desktop, and then I replaced it in order to test a Zalman KM-500.
The KM-500 is one of the cheapest mechanical keyboards; it does not use Cherry MX switches, but its switches are very similar to the Cherry MX Black ones, and take Cherry MX compatible keycaps. The version I got had a non-international layout with a thin Enter key and UK keycaps, but 104 keys instead of 105 keys.
The missing key is the one with the vertical bar and backslash symbols. Under X-Windows these can be typed with ISO_Level3_Shift-`, which is marked on the keycap, and with ISO_Level3_Shift--, which is not marked, and somewhat inconvenient, especially for MS-Windows users that use the backslash a lot.
Note: ISO_Level3_Shift in X-Windows is usually mapped on the Right-Alt or Alt-Gr key.
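The current mappings can be inspected under X-Windows, for example as below; the output will of course depend on the configured layout:

# Which keycodes produce "backslash" and "bar" in the active keymap
xmodmap -pke | egrep 'backslash|bar'
# Which physical key is acting as ISO_Level3_Shift (usually Right-Alt/AltGr)
xmodmap -pke | grep ISO_Level3_Shift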
After light use I am fairly happy with it; I have seen reviews on the web that say that after a while some keys become unreliable, but this has not happened to me yet.
Also while all the keyboards and mice that I bought work-ish with my PCs, some use advanced USB protocols, and those do not work with some of my USB hubs or KVM switches. Fortunately the K65 has a hardware switch to enable various different protocol modes, and so does the SHINE3. Some of the mice however just are not supported by some hubs or KVM switches.
My laptop is four years old and soon after buying it I also replaced its disk drive with a 256GB flash SSD (1, 2, 3) which I use for the / and /home filetrees.
It is still going quite without problems and it reports that only 3% of its rated total writes have been used after 36,279 hours (around 1,500 full days) of use:
# smartctl -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-22-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x002f 100 100 050 Pre-fail Always - 0
  5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
  9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 36279
 12 Power_Cycle_Count 0x0032 100 100 001 Old_age Always - 620
170 Grown_Failing_Block_Ct 0x0033 100 100 010 Pre-fail Always - 0
171 Program_Fail_Count 0x0032 100 100 001 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 001 Old_age Always - 65
173 Wear_Leveling_Count 0x0033 097 097 010 Pre-fail Always - 101
174 Unexpect_Power_Loss_Ct 0x0032 100 100 001 Old_age Always - 133
181 Non4k_Aligned_Access 0x0022 100 100 001 Old_age Always - 67 8 59
183 SATA_Iface_Downshift 0x0032 100 100 001 Old_age Always - 4
184 End-to-End_Error 0x0033 100 100 050 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 001 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 001 Old_age Always - 0
189 Factory_Bad_Block_Ct 0x000e 100 100 001 Old_age Always - 81
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 0
195 Hardware_ECC_Recovered 0x003a 100 100 001 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 100 100 001 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 001 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 001 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 001 Old_age Always - 0
202 Perc_Rated_Life_Used 0x0018 097 097 001 Old_age Offline - 3
206 Write_Error_Rate 0x000e 100 100 001 Old_age Always - 0
I have another two 256GB flash SSDs, one from SK Hynix that is in a PC that I rarely switch on:
# smartctl -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-18-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f 166 166 006 Pre-fail Always - 0
  5 Reallocated_Sector_Ct 0x0032 253 253 036 Old_age Always - 0
  9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 3342
 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 51
100 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 2982997
171 Unknown_Attribute 0x0032 253 253 000 Old_age Always - 0
172 Unknown_Attribute 0x0032 253 253 000 Old_age Always - 0
174 Unknown_Attribute 0x0030 100 100 000 Old_age Offline - 12
175 Program_Fail_Count_Chip 0x0032 253 253 000 Old_age Always - 0
176 Erase_Fail_Count_Chip 0x0032 253 253 000 Old_age Always - 0
177 Wear_Leveling_Count 0x0032 100 100 000 Old_age Always - 2744576
178 Used_Rsvd_Blk_Cnt_Chip 0x0032 100 100 000 Old_age Always - 29
179 Used_Rsvd_Blk_Cnt_Tot 0x0032 100 100 000 Old_age Always - 214
180 Unused_Rsvd_Blk_Cnt_Tot 0x0032 100 100 000 Old_age Always - 5098
181 Program_Fail_Cnt_Total 0x0032 253 253 000 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 253 253 000 Old_age Always - 0
183 Runtime_Bad_Block 0x0032 253 253 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 253 253 000 Old_age Always - 0
188 Command_Timeout 0x0032 253 253 000 Old_age Always - 0
191 Unknown_SSD_Attribute 0x0032 253 253 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 029 000 000 Old_age Always - 29 (Min/Max 14/41)
195 Hardware_ECC_Recovered 0x0032 253 253 000 Old_age Always - 0
201 Unknown_SSD_Attribute 0x000e 100 100 000 Old_age Always - 0
204 Soft_ECC_Correction 0x000e 100 100 --- Old_age Always - 0
231 Temperature_Celsius 0x0033 253 253 --- Pre-fail Always - 0
234 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 10896
241 Total_LBAs_Written 0x0032 100 100 --- Old_age Always - 8702
242 Total_LBAs_Read 0x0032 100 100 --- Old_age Always - 2014
250 Read_Error_Retry_Rate 0x0032 100 100 --- Old_age Always - 1720
Another from Samsung which I bought 11 months ago:
# smartctl -A /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-21-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
  9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 8813
 12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 56
177 Wear_Leveling_Count 0x0013 097 097 000 Pre-fail Always - 127
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 073 059 000 Old_age Always - 27
195 Hardware_ECC_Recovered 0x001a 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 099 099 000 Old_age Always - 9
235 Unknown_Attribute 0x0012 099 099 000 Old_age Always - 39
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 37982825467
Both are also used for the / and /home filetrees, and they too have been problem-free. In general my experience is that flash SSDs are more reliable than disk drives, and that reports of low flash SSD reliability depend on poorly designed hardware or firmware, or very high rates of rewriting.
What has gone wrong is raw speed (sequential, 1,000×1MiB blocks): the 4-year-old Crucial device has slowed down from the SATA2 top speed of 250-270MB/s either reading or writing to 128MB/s reading and between 49MB/s and 117MB/s writing; the SK Hynix device has slowed down from around 500MB/s reading or writing to 318MB/s reading and 395MB/s writing; the Samsung device still does around 500MB/s reading, but writing oscillates between that and 190MB/s.
Obviously the random access times are still very good, so responsiveness is good, but the slowdown on the four year old drive is pretty huge and surprising. My current guess is that since it is used for the / and /home filetrees it is subject to a large number of small rewrites, e.g. from rsyslog and collectd. Presumably the distribution of free blocks among erase blocks is then very fragmented, and so will be that of data blocks when they get rewritten; I guess that if I were to back up the contents, run SECURITY ERASE, and reload, it would go back to top rated speed. After all the volume of rewrites is small, as the devices all report very little use of their rated write lifetimes.
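A less drastic measure than a full SECURITY ERASE is to make sure the flash translation layer at least knows which blocks are free, by trimming the mounted filesystems; the more drastic option is sketched after it. The hdparm commands destroy all data and are shown only as a reminder of the usual procedure, with /dev/sda as a placeholder:

# Tell the FTL which blocks are unused (ext4, XFS and Btrfs support discard)
fstrim -v /
fstrim -v /home
# The drastic option: after a full backup, an ATA SECURITY ERASE resets the FTL
hdparm --user-master u --security-set-pass p /dev/sda
hdparm --user-master u --security-erase p /dev/sda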
A knowledgeable colleague gave me a link to
a very amusing presentation
based on pretending that it is still the 1970s, major new
programming concepts have been developed, and it is easy to
predict that they will be dominant in 20-40 years' time; while in fact none of them became dominant. One of these concepts was visual programming
languages, those based on flowchart-like
organization of the code.
This brought me to the site for the Self language, in which visual programming is possible, and in particular to another presentation on it and related topics by David Ungar.
The latter presentation made me cringe because, while the presenter correctly points out that Self is not widely used, the claims that it is in some grand sense better than stuff which is widely available are grossly misguided.
The first is the naive conceit that visual programming in the sense of a reactive virtual world on screen in which programs are represented as flowcharts somehow makes programming easier. The challenge of programming is in managing scale, and developing and maintaining the interactions and dependencies of vast amounts of complicated code, not building cool looking pictures on a screen. This is because programming is based on understanding what is being coded, and this depends on the ability to build a modular abstract mental model of what is being programmed, and on-screen flowcharts or even dynamic runtime simulations by themselves do not achieve that. The idea that visual programming works is as simplistic as the idea behind COBOL that since it looked like english as in ADD 1 TO number of accounts it would make programming easy even for untrained people.
Another major and related one is the idea that programming
can be made more approachable by adopting some kind of
real-world metaphor, for example that of sending messages
among people, or switches connected to objects inside a
simulation. While I think that visual programming is
essentially useless at building a modular abstract model of a
program, real-world
metaphors seem to me to
make it harder, because the behaviour of program entities can only very superficially be made to fit some kind of metaphor based on real-world objects.
A computer desktop or folder does not behave like a physical desk-top or folder, except in the vaguest sense, and to operate them users have to realize that it does not, which sometimes takes time.
Note: Also in a classic experiment by
Mike Lesk two groups of secretaries were taught
to use the same text editor program in two different fashions:
one group using familiar metaphors for operations like cut
and paste
and scroll
, and the other using made-up
nonsensical names. The second group was at least equally
proficient, because the key was building an abstract mental model of how the text editor program behaved, and the familiar metaphors did not help with that.
The other is that in the specific case of Self two very big related misunderstandings are embodied in it and its terminology, even if they are presented as cool, insightful features by the presenter:
Messages are just the names of dynamically overloaded procedures. Merely calling procedures methods and procedure names messages does not change them into something else. Languages like Smalltalk-80 (and successors) or Objective-C have in effect alternate syntaxes for Lisp (like much else) and in particular a clumsy subset of defmethod. In particular in Self overload resolution can happen only on the first and thus distinguished argument of a procedure call, as in Smalltalk-80 (and most successors) or Objective-C, the argument mistakenly called the target of the non-existent message.
Given TYPE A x; TYPE B y; Method(x,y); the resolution of the overloaded name Method depends not on the types A and B but on the values of x and y; that is, the conceptually-global name overload tables are indexed by value and not by type.
Overall, while visual programming and programming based on metaphors are pointless or bad ideas, the research on Self involved developing or improving several interesting implementation techniques, which were then used for other languages and systems. In part because they were pretty smart, but also because the hardware that made those techniques viable had just been developed or was being developed at the same time, and a lot of computer research in essence is the exploration of the possibilities opened up by better hardware.
On my GNU/Linux laptop I have 228 kernel space threads:
tree$ ps ax | grep '[0-9] \[' | wc -l
228
Plus whatever number of creepy little user space background processes, but let's skip that. That 228 count of kernel threads is ridiculous enough. I do not even have 228 peripherals or anything that might remotely require, by working in parallel with other stuff, that many running threads.
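Most of those are children of kthreadd (PID 2), so it is easy to see which subsystems spawn them:

# Count and name the kernel threads, grouped by command name
ps --ppid 2 -o comm= | sort | uniq -c | sort -rn | head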
What's going on? All those threads are mostly inactive. They
are either waiting to be polled by something or are waiting to
poll something. That is indeed ridiculous. In a typical
quiescent system, as my laptop is most of the time, because I am thinking or typing and the computer is in the endless space between the words, the system should have essentially no threads. Perhaps one user thread for my X terminal, and another for the X server it communicates with, which runs asynchronously to it. But nothing else should be going on.
Those threads that are waiting to poll or be polled are unnecessary: they can be replaced by functions that get called by the threads that are properly necessary.
A properly necessary thread is something that runs because it represents inside a computer an external source of activity, like a user or network client (not a daemon). If a user does nothing, or there is no active network client, the computer should just lock the CPU into idle mode, and have nearly zero processes or threads (kernel or user space).
An operating system is a collection of (privileged) libraries, not a program, or even worse a process, or even worse still a collection of processes or threads. On a just-booted computer the only active processes should be:
on the polled side, the login process (traditionally something like the UNIX init), or one per login device (traditionally something like the UNIX getty), plus the daemon that spawns network service processes (inetd or xinetd traditionally) that serve network clients, or one process per network service;
on the polling side, a daemon to run periodic activities from a list (traditionally something like the UNIX crond).
Nothing else is needed: there is no need to do local IO or network communications except when a user or network client require it. Computer systems should be passive devices, not running two hundred (mostly low activity) threads. Old style systems, like Multics or MUSS or UNIX were naturally like that. What went wrong then? Some guesses:
Computer science courses often were illustrating a model of program organization as standalone objects, or even parallel communicating services, even when in many cases simple procedures and procedure calls were sufficient. I guess because of a vulgar misunderstanding of encapsulation, or because of coolness. I have seen in many cases computer science graduates, especially those taught in Java or similar languages, gratuitously encapsulate functions in classes/objects or even services. For example I once saw a hash function wholly seriously encapsulated as a class, with the following mode of use, instead of
a = hash(b):
- create an instance of the Hash class: Hash hb = new Hash;
- set the value to be hashed on the Hash instance: hb.setUnhashed(b);
- compute the hash: hb.computeHash();
- retrieve the result: a = hb.getHashed();
- release the instance: free hb; hb = null;
The fashion for network services, where even trivial operations are implemented as always-running stateful service processes awaiting requests, not merely as classes encapsulating stateless procedures.
Given the above guesses, in my imagination many hipster developers (or people called Lennart, or Greg) conceive of reading a disk block or allocating memory as a service to be implemented in its own thread, or something similarly inane.
Perhaps we shall soon experience the glorious moment when the
trend is taken to its logical conclusion and Linux will have a
kernel-side process creation service running as a thread (and
user-side ls, cat, cp microservices
implemented as daemons, and a
new-style shell
daemon that orchestrates
calls to them).
I was wishing some years ago for higher-DPI desktop monitors and they are slowly appearing, probably because high DPIs have become popular on mobile phones, tablets and laptops, in large part thanks to Apple.
I have just noticed that there are now more examples of a 24in monitor that has 3840×2160 pixels, which amounts to nearly 190DPI. It is almost print resolution and the pixels stop being visible at a viewing distance of around 50cm or 20in.
The price of £230 is also remarkable because it has an IPS panel and it is much the same as that of a similar 24in monitor with 1920×1200 pixels.
There are also similar monitors with a 27in display with 160DPI for around £360, and a 32in display with nearly 140DPI for around £600.
My 32in 2560×1440 monitor still looks good to me (even if it still feels a bit too large), so I am not upgrading soon; besides, a monitor with more than 1920×1200 pixels is not supported well by older laptops like my current one or by cheaper KVM switches.
For a long time I have been doing computer science work and some research, most of it about infrastructure, and a lot of it in recent years about the Internet and the World Wide Web. I continue to have a sense of wonder at some of the most remarkable outcomes of their history, outcomes that give me greater hope for the evolution of civilization, if they can be maintained.
The most obvious examples are Google Earth, Wikipedia, Google YouTube, Yahoo Flickr, The Internet Archive, The Gutenberg Project. These in essence are all libraries, and indeed I am a bookish person, but that is not the only reason I am awed by them: it is not just the vastness of their content, or their worldwide availability, or the number of volunteers that maintain them, but mostly that this content was difficult and expensive to access before the Internet and the World Wide Web, and is now much easier to find and download.
These sites provide not just content, including a long tail of content which is important but of narrow interest, but very accessible content. Because of the very large improvement in accessibility they are probably, in the long term, as important as Gutenberg's reinvention of printing with movable type in the 15th century, which at first made existing content far more accessible.
In a similar way I appreciate very much (in different ways) blogging and statistical sites like Wordpress, Blogspot, University of Oxford podcasts, Tumblr, even if much of their content is vanity publishing, and sites like St. Louis FRED, Yahoo! Finance, EuroStat, OECD Statistics, as they support the publishing of much interesting content.
I feel wonder at being able to look at places like Socotra on Google Earth, find out on Wikipedia that it used to be the Dioscurides bishopric of the Church of the East, and read about the history of all of these in the same afternoon, where in years past it would have taken several trips to research libraries.
A few days ago I read the May 25th, 2015 open letter by Linas Vepstas (and there is a somewhat defeatist reply) to Debian and Ubuntu developers in which he quite rightly notes how simple critical aspects of their systems have been complexified into fragility. The prime examples are the usual ones of DBUS and systemd, and it names the usual Kay Sievers and Lennart Poettering as influential instigators (to which I would add the similarly destructive Keith Packard and Greg Kroah-Hartman, and the latter seems to me one of the first and most despicable).
A minor but telling symptom happened to me today: when replacing a failing hard disk I found that I could not reformat one of its partitions as it was reported as still in use by the Linux kernel. This despite my having unmounted it, checked /proc/mounts, and checked dmsetup ls and losetup -a just in case; also, knowing that the system was running some services in LXC-style containers, I checked whether it was mounted or used in any of those containers.
I was especially perplexed as I noticed that the four kernel threads that the XFS filesystem creates on a mount were still active for the relevant block device, so the kernel was correctly reporting it as being in use, and this was not a generic bug. So I started looking for a bug in XFS unmounting related to IO errors, as the disk was being replaced because of IO errors in the filesystem metadata, which might have resulted in an aborted and stuck unmount.
I found some mentions of similar situations in various web pages, but the best hint came from a page that suggested a further check, unrelated to XFS issues: to also check
the /proc/*/mounts pseudo-files, which list
per-process (rather than per-system or per-container) mount
points maintained by Linux. It turned out that five of the
hundreds of processes had a private
(per-process) mount of the relevant block device, none of
which actually needed it.
That is, each of those five processes had its own per-process mount namespace.
So I checked and the system had several such namespaces:
/proc# ls -ld */ns/mnt | sed 's/.* //' | sort | uniq -c
    518 mnt:[4026531840]
      1 mnt:[4026531856]
      1 mnt:[4026532278]
     29 mnt:[4026532853]
     78 mnt:[4026532856]
     77 mnt:[4026532930]
     37 mnt:[4026532998]
     77 mnt:[4026533066]
    107 mnt:[4026533134]
     29 mnt:[4026533200]
      1 mnt:[4026533522]
      1 mnt:[4026533524]
      1 mnt:[4026533649]
      1 mnt:[4026533650]
      1 mnt:[4026533686]
Here the one with the highest count is the per-system namespace, those with the next highest counts belong to the LXC containers, and those with single-digit counts are per-process, each used by a single process. The latter are somewhat inexplicable but most likely exist by accident: the relevant processes seem related to the shameless OpenStack cluster and VM management system, which includes the overcomplicated Neutron virtual network system; that uses network namespaces, and it is possible that someone coded it to unshare all namespace types instead of just networking.
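Given the PID of one of those processes, its private view of the mounts can be inspected directly, or entered with nsenter; in this sketch the PID 12345 and the partition name sdb1 are placeholders:

# The per-process mount table, as seen by that process
grep sdb1 /proc/12345/mounts
# Or run a command inside that process's mount namespace
nsenter -t 12345 -m cat /proc/mounts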
But even given this there is a complexifying consequence of having per-process namespaces and of how they have been defined: in effect checking /proc/mounts, or the /proc/mounts of all LXC containers, is currently pointless; one must always check the namespaces of all running processes.
That is because namespaces can be recursively redefined, and containing namespaces (at least filesystem ones) are not supersets of the contained ones. Which means that it is not possible to look only at the resources used by a containing namespace to manage it: one has to look explicitly at those of all contained ones too.
That makes investigating and reasoning about systems a lot more complex, and I have already noticed that a lot of people do not seem to be fully able to understand well the consequences of layers of virtualization and partitioning, such as VLANs or even simpler things like the consequences of sharing a disk arm among too many virtual machines, never mind sophisticated classification and access control schemes like SELinux.
I have finally decided to find the reason why I have been getting this kind of kernel report every few seconds:
cfg80211: World regulatory domain updated:
cfg80211: DFS Master region: unset
cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm), (N/A)
cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
That took a bit too much time and insight, and the general context is:
When a regulatory domain is hinted, the kernel generates an event to request a configuration of that regulatory domain.
The configuration can be provided for example with iw reg set DE; env COUNTRY=DE crda (a default may be specified in most distributions in /etc/default/crda for use by setregdomain), so that first the DE regulatory domain is suggested to the kernel and then its configuration is sent to the kernel, or by setting an equivalent rule for the awful udev subsystem, for example like:
KERNEL=="regulatory*", ACTION=="change", SUBSYSTEM=="platform", RUN+="/sbin/crda"
But there are some details that cause trouble:
Within the driver, we use the fictitious country code “X2” to represent this worldwide regulatory domain.
There is currently no interface to configure a different domain. The driver reads the SROM country code from the chip and hands it up to mac80211 as the regulatory hint, however this information is otherwise unused with the driver.
In practice the simplest solution is to disable the udev rules for regulatory domains, on my system 40-crda.rules and 85-regulatory.rules, because if there is no rule for an event it gets ignored.
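Since rules in /etc/udev/rules.d override those shipped by packages, the rules can be masked rather than deleted; the file names are as found on my system and may differ between distributions:

ln -s /dev/null /etc/udev/rules.d/40-crda.rules
ln -s /dev/null /etc/udev/rules.d/85-regulatory.rules
udevadm control --reload-rules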
Another solution would be patching crda so it can send the configuration for a domain under the name of a different domain, so as to send it always with domain code 00.
But I am not sure whether the latter would work, because the crda documentation says that the regulatory domain configuration that is used by the kernel is the intersection of that received by it and that stored on the device:
In order to achieve this devices always respect their programmed regulatory domain and a country code selection will only enhance regulatory restrictions.
Unfortunately the regulatory domain programmed into the
device as X2 is the default world
domain
which is already the most restrictive.
After looking at Isilon as to anisotropy, I have noticed that some founders of Isilon have created another company, Qumulo, to develop and market an improved product.
From a few articles on Qumulo it seems that Isilon had very
anisotropic behaviour as to recovery (one of the
figures of merit
for filesystems that
I had listed
previously):
To prove that QSFS can handle lots of files, Qumulo stacked up a minimum configuration of its appliance, which has four storage server nodes, and loaded it up with over 4 billion files and nearly 300,000 directories.
Godman says you cannot do this with Isilon arrays because if you blew a disk drive it would take you months to rebuild the data. But the Qumulo appliance can rebuild the data from a lost drive in about 17 hours.
If recovery is done by enumerating objects then it amounts to a whole-tree operation, and the consequence is that large numbers of objects take a lot of operations. I have previously remarked on how slow and expensive fsck-style operations can be (1, 2, 3).
As to the recovery time of Qumulo, that article continues with:
But the Qumulo appliance can rebuild the data from a lost drive in about 17 hours.
(The company is not saying how it is protecting the data and what mechanism it is using to rebuild lost data if a drive fails, but says the process takes orders of magnitude less time than in other scale-out storage arrays sold today.) Godman says that the system can do a rebuild and continue to perform its analytics regardless of the load on this entry system.
Previously the entry system
had been described as:
The base Qumulo Q0626 appliance comes with four server nodes, each of them coming in a 1U form factor with four 6 TB Helium disk drives from HGST/Western Digital and four 800 GB solid state disks from Intel.
The nodes in the appliance have a single Intel Xeon E5-1650 v2 processor with six cores running at 3.5 GHz and 64 GB of main memory to run the Qumulo Core software stack. The storage servers have two 10 Gb/sec Ethernet ports that allows them to be clustered together
Those 6TB drives can be duplicated sequentially in around 11-12 hours, and that is as fast as it can be done, so a recovery time of 17 hours while the system is under load largely implies that the recovery process is similar to the resynchronization of a RAID set.
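That 11-12 hour figure is just the sequential transfer time of the whole drive; a rough check, assuming an average of around 145MB/s across the platters:

# 6TB at ~145MB/s average sequential rate, expressed in hours (integer arithmetic)
echo $(( 6 * 10**12 / (145 * 10**6) / 3600 ))
# prints 11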
Note: Perhaps a RAID3 set, as the recovery time is claimed to not impact operation, or perhaps a RAID10 set (4 drives) or perhaps a log-structured object replicating scheme.
But then there is implicit in the above another massive anisotropy: the base system has 4×6TB drives and 2×800GB flash SSDs. The 6TB drives are big and cheap, but with an horrifyingly low IOPS-per-TB (1, 2) and exactly the opposite for the 800GB flash SSDs.
Obviously the expectation is that the metadata will reside
entirely on the 1.6TB capacity of the flash SSDs, and that the
working set
of the 24TB of raw disk
capacity will fit long-term in the flash SSDs as well. Given
the relative raw sizes the assumption is likely to be that the
working set is 10-20 times smaller than the capacity of the
system. It is likely that the metadata will fit on the flash
SSDs, but when the working set of the data does not fit on
them those 6TB drives will end up in the oversaturated regime
of arm contention. As someone wrote, latency does not scale like bandwidth.
Clearly a double flash SSD failure would be catastrophic, but it would also be interesting to see the consequences of anisotropy on any realistic workload as to how the system would behave if one of the flash SSDs were removed and the flash SSD cache capacity were thus halved.
While looking at the innumerable logs of a very nice new
Dell XPS 13
(1,
2,
3)
I noticed several entries where CPU speed had been throttled
because of rising temperature, which
did not surprise me because:
It is an Ultrabook® laptop, which means it is very thin and has no active cooling, and dissipates heat via its metal case.
This indicates why desktops are not going to go away: for high CPU or GPU intensive workloads, like reading Google Mail or playing 3D games, desktops can keep a running CPU at a lower temperature with active cooling, and since they do not need to be in contact with the body of the user, can run it hotter if needed.
Avoiding getting hot is quite a fundamental issue with portable devices that are held in contact with parts of the user's body, and it is not going to be solved easily. I am indeed typing this on a laptop on a train and it is keeping my thighs rather uncomfortably hot because I have forgotten to run:
killall -STOP firefox chromium-browser
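The frozen browsers can of course be resumed later with the matching signal:

killall -CONT firefox chromium-browser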
Conversely any workload that does not involve sustained high CPU or GPU work can be run very conveniently on portable or passively cooled devices, and therefore even many server workloads could be run on laptops or mini-servers.
Ironically it is web user interfaces that keep CPUs busy and thus impact portable computers.
During a chat about storage for big data someone ventured
that
Isilon
storage clusters have a fairly wide and somewhat isotropic
performance envelope
,
to which my comment was that they are expensive and still have moderately frequent corner cases where they do not work too well.
This was acknowledged, in particular that they have a cost per TB probably more than four times that of a baseline storage cluster with a narrow performance envelope based on large slow drives with low IOPS-per-TB (excluding from the comparison the Isilon archival product using large slow drives).
That factor of four times does not buy a really clever
design: it is a simple design based on a fast, very low latency
network which allows both the use of parity
RAID-like redundancy, and for every backend to be a frontend,
via a distributed data directory.
Put another way, it has a very low latency quasi-broadcast
data and metadata interconnect. That is not quite but somewhat
close to a mostly-scalable non-blocking low-latency
crossbar-switch
(1,
2,
3)
(an isotropic interconnect), and inasmuch as it is not a proper one it still allows pathological cases where hotspots happen
(because of residual anisotropy, especially in the storage
elements it connects).
It is very hard to compete with something that approximates a crossbar-switch because:
Note: the realized performance envelope is the intersection of that of the workload and that of the machinery on which it runs. A machinery with an isotropic performance envelope can therefore run many different workloads well, and can thus be considered to be grossly overprovisioned in quality rather than capacity, because each workload only utilizes it partially.
However mostly low-latency, mostly isotropic interconnects like that used by Isilon are rarely useful, because it rarely happens both that what matters is median or maximum, as opposed to average, latency, and that workloads with very different anisotropic profiles are used against the same machinery.
Because it is usually possible to segment different workloads onto differently structured machinery, that is to match the anisotropic workload profile with a similarly anisotropic machinery performance envelope.
So there are three base choices that could be somewhat equivalent at least in cost terms:
The Isilon products implement a mix of the first and second choices, because they do not actually have an isotropic crossbar-switch, but a somewhat more anisotropic interconnect plus more capacity to expand the performance envelope.
That is an interesting choice that seems optimal if workloads are known to have several differently anisotropic envelopes, or if it is not known in advance whether they do. But my impression is that usually workload performance envelopes are known in advance, and therefore having multiple machineries to match them is cheaper, spending some of the cost difference on overcapacity for those. The advantage here is that scaling anisotropic machinery is much easier.
Having been impressed recently by the transfer rate of an M.2 flash SSD and that of a 32GB USB3 flash key, I have now also been impressed by the functionality and transfer rates of some USB3 UASP storage devices, in particular an external USB3 UASP NAS disk device:
$ sudo sysctl vm/drop_caches=1; sudo dd bs=1M count=3000 if=/dev/sdc of=/dev/zero
vm.drop_caches = 1
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 12.9986 s, 242 MB/s
and a StarTech.com SDOCKU33BV dock, which reaches higher with a flash SSD capable of 500MB/s on a native SATA 6Gb/s bus:
# sysctl vm/drop_caches=1; dd bs=1M count=3000 if=/dev/sda of=/dev/zero
vm.drop_caches = 1
3000+0 records in
3000+0 records out
3145728000 bytes (3.1 GB) copied, 6.60298 s, 476 MB/s
The latter, with a suitable motherboard, thanks to UASP is not only much faster (and more reliable and better specified) than USB3 with the traditional USB Mass Storage protocol, but can also work with the smartctl and hdparm tools to offer the same storage maintenance options as native SATA/SAS. UASP is then similar to the ancient ATAPI protocol, which allowed the same SCSI command protocol to be used over the ATA bus.
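Whether an enclosure is actually being driven by UASP rather than the traditional protocol is easy to check; device names are those from the examples above:

# "uas" means UASP is in use, "usb-storage" means the traditional protocol
lsusb -t | grep Driver=
# With UASP the usual SAT pass-through gives access to SMART data
smartctl -a -d sat /dev/sdc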
The faster single-CPU speed resulted in a noticeable increase in reported disk transfer rates, for example from 170MB/s to 210MB/s. That means that disk transfer rates are somewhat CPU bound, which is not entirely news, but still surprises me a bit.
But mostly having 8 CPUs is also good for backups with pbzip2 (or pigz, lbzip2, pixz, ...) instead of lzop:
Tasks: 565 total, 1 running, 564 sleeping, 0 stopped, 0 zombie
%Cpu0 : 99.0 us, 0.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu1 : 92.4 us, 6.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu2 : 97.7 us, 2.0 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 96.4 us, 3.0 sy, 0.0 ni, 0.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 95.7 us, 4.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 95.0 us, 5.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 99.3 us, 0.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 86.4 us, 13.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 16395708 total, 14381516 used, 2014192 free, 100476 buffers
KiB Swap: 0 total, 0 used, 0 free. 11103208 cached Mem
 PID PPID USER PR NI VIRT RES DATA %CPU %MEM TIME+ TTY COMMAND
3287 11565 root 20 0 646392 36104 629832 754.0 0.2 6:37.91 pts/15 pbzip2 -2
3288 11565 root 20 0 4412 788 348 15.9 0.0 0:08.16 pts/15 aespipe -e a+
3286 11565 root 20 0 47868 8712 5248 10.9 0.1 0:08.07 pts/15 tar -c -b 64+
But it is still only around 50MB/s:
 8 0 0 170052 13780 13806248 0 0 53288 0 4070 4984 95 5 0 0 0
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 9 1 0 163232 13780 13815528 0 0 54724 30788 4541 4894 95 6 0 0 0
 8 0 0 166176 13780 13815628 0 0 52708 18320 5333 5958 94 6 0 0 0
 8 0 0 181432 13780 13802544 0 0 46964 10008 6236 5549 95 5 0 0 0
 8 0 0 151932 13780 13833064 0 0 43600 0 3739 4572 96 4 0 0 0
 8 1 0 169064 13780 13828516 0 0 51300 44992 4960 5693 95 5 0 0 0
 8 0 0 175976 13780 13824516 0 0 47904 18960 4200 4673 95 5 0 0 0
13 0 0 150420 13780 13855808 0 0 50356 0 4069 4882 95 5 0 0 0
 8 0 0 156180 13780 13851672 0 0 50348 0 4277 5703 96 4 0 0 0
 8 0 0 153664 13780 13856964 0 0 51424 0 4318 5383 95 5 0 0 0
Actually pigz is a lot faster than that and in most cases does not even need 8 CPUs.
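For reference, a sketch of the kind of pipeline shown in the top output above; the source tree, cipher choice, key file and destination are assumptions:

# Archive, compress on all CPUs, encrypt, and write to the backup area
tar -c -b 64 -C / home \
  | pbzip2 -2 \
  | aespipe -e AES256 -P /root/backup.keyfile \
  > /backup/home.tar.bz2.aes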
Having 8 CPUs is good also for running backups between two
pairs of two drives, especially encrypted ones, or using RSYNC
over checksumming filesystems like BtrFS
.
The general topic of fsync and proper storage system design is fascinating and full of controversy, and I have been considering a post on it for many years, but in the meantime here is an illustration of its depth. The basic issue is that:
The only way to retire (as opposed to issue) fast random reads or writes to or from disk storage in the general case is to use a lot of small fast disks. Assuming that something else can be done has been called a request for an O_PONIES flag.
Anything else only defers the writes, that is buffers them temporarily.
The issue is to define temporarily
. In a
topical thread
there is a discussion of this issue and in it I see a typical
big misunderstanding of the issues worded as effectively a
demand for O_PONIES:
The operation I want to do is:
1. Apply changes to the store
2. Wait until all of those writes hit disk
3. Delete the temporary file
I do not care if step 2 takes 5 minutes, nor do I want the kernel to schedule my writes in any particular way. If you implement step 2 as a fsync() (or fdatasync()) you're having a potentially huge impact on I/O throughput. I've seen these frequent fsync()s cause 50x performance drops!
The request here is the desire that step 2 should last as long as possible, to minimize the frequency of the expensive fsync implementation, but also should happen just before any system or device issue that might cause data loss. Amazing insight: indeed fsync is not needed except before an actual data loss situation. The difficulty is knowing when a data loss situation might be about to happen.
Also the idea that fsync impacts performance instead of speed is based on the usual misunderstanding of what performance is.
This comment is far more interesting:
For example, because of this fsync() issue, and the fact that fsync() calls flush all outstanding writes for the entire filesystem the file(s) being fsync()'ed reside upon, I've set up my servers such that my $PGDATA/pg_xlog directory is a symlink from the volume mounted at (well, above) $PGDATA to a separate, much smaller filesystem.
(That is: transaction logs, which must be fsync()'ed often to guarantee consistency, and enable crash recovery, reside on a smaller, dedicated filesystem, separate from the rest of my database's disk footprint.)
If I didn't do that, at every checkpoint, my performance would measurably fall.
Note: here the conclusion that performance would measurably fall is proper, because by separating files with different requirements better speed is obtained without an equivalent decrease in effective safety; that is, the performance envelope has been actually expanded.
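A sketch of that arrangement, assuming a small dedicated filesystem already mounted at the hypothetical /srv/pg_xlog_fs and the database server stopped:

# Move the transaction log onto the small, separate filesystem
mv "$PGDATA"/pg_xlog /srv/pg_xlog_fs/pg_xlog
# Leave a symlink in its old place so PostgreSQL finds it unchanged
ln -s /srv/pg_xlog_fs/pg_xlog "$PGDATA"/pg_xlog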
In part because it makes the excellent point of using storage systems with different profiles for data with different requirements.
In part because it is interesting to note that the statement
that fsync() calls flush all outstanding writes for the
entire filesystem the file(s) being fsync()'ed reside
upon
is not always right: for some filesystems that need
not happen, such as most of those that do some form of copy-on-write
,
for example for Btrfs:
- fsync(file) only writes metadata for that one file
- fsync(file) does not trigger writeback of any other data blocks
Anyhow for fsync-intensive workloads Btrfs has a significant advantage in a common case, one that pushes the performance envelope wider: since it can easily make snapshots it can process multiple fsync operations in parallel as a single transaction, rather than in series as a carefully ordered sequence of transactions:
There are also file system limitations to consider: btrfs is not quite stable enough for production, but it has the ability to journal and write data simultaneously, whereas XFS and ext4 do not.
Important Since Ceph has to write all data to the journal before it can send an ACK (for XFS and EXT4 at least), having the journal and OSD performance in balance is really important!
The reason is that while metadata updates must only happen after the relevant data updates, metadata updates by copy only take effect when higher level metadata are committed, up to the top of the tree.
The first desktop monitor for retail sale with an OLED display is the Dell UP3017Q and the display is a 30in 3840×2160 one.
The monitor list price is $5,000 which is quite high,
especially considering that a television with a 55in
3840×2160 display lists for less than half that at
around $2,400
(1,
2);
this may be the difference between mass market and professional
pricing.
Having previously remarked that it is useful when buying something to understand the market segmentation tactics of vendors, I have been looking for a new laptop for myself, so I had a look at a popular online laptop shop that helpfully lists many laptops allowing selection by attribute and listing the number of laptops in their catalog with that attribute. My impressions are that:
This said, my idea of a good contemporary laptop is an average corporate laptop with 13.3in display, 8GiB of RAM and 256GB of flash SSD, typically used with an external display and keyboard and mouse because:
So my usual preference is to buy an average corporate laptop from a brand known to take care about them (usually Toshiba, but I also like Lenovo and Dell), and separately to buy 8GiB of RAM and a 256GB flash SSD and upgrade it. In part because I do not really need something like an Ultrabook and I do not particularly like their non-upgradeability or non-repairability.
This is what I did last time, in 2010, as I bought a Toshiba Satellite Pro L630 with the cheapest configuration I could find, with 2GiB of RAM and a 250GB disk drive, and upgraded it to 8GiB of RAM and a 256GB flash SSD, reusing the disk drive for backups.
While that laptop is now 5 years old, it still works very well. I have also put in a bigger battery that gives an autonomy of 8 hours. The only limitations seem to be that the SATA chipset only supports 3Gb/s, and that the i3 370M 2.4GHz is a bit dated, as it draws more power than recent CPUs, has a slower, older graphics core, and lacks AES acceleration and VT-d IO acceleration. Conversely it is possible to upgrade memory and storage without fully opening up the laptop shell.
But then I do not usually run VMs or 3D graphics programs on that laptop, nor do I do so much IO that software AES is that noticeable, and the flash SSD currently in it is not capable of bus speeds higher than 3Gb/s anyhow.
While Toshiba currently seems to have repositioned the Satellite Pro brand for what I call consumer-oriented laptops, the Tecra brand models are essentially equivalent to the Satellite Pro L630, with much the same speed and features, except for a lower-power, more advanced CPU, and at much the same price. It looks to me that there has been very little progress in laptop products in 5 years, both as to features and as to price.
I also looked at ease of upgrading RAM, storage and battery, which are for me important concerns: Lenovo designed laptops tend to have 1 RAM slot and to require removing the bottom of the case, Toshiba designed laptops seem to have usually 2 RAM slots of which one is in use, Dell designed laptops (1, 2) seem to be more commonly old-style and have easily removed and replaced disks, often in a caddy, and to have 2 RAM slots, sometimes under a dedicated flap.
Overall I am fairly hesitant as I do not see yet much point in an upgrade until the L630 breaks.
So I upgraded my current (quite average) desktop, replacing its AMD Phenom II X3 720 chip, which has 3× 2.8GHz CPUs, with a new FX 8370E chip, which has 8× 3.3GHz CPUs. Total power consumption is supposed to be the same and so are most features.
I did not replace it because of the increase in CPU clock frequency (faster CPUs could be bought in that price range, but I went for a lower-power-draw CPU), but solely because of the extra number of CPUs. That seems strange because there are few applications that can usefully keep 8 threads busy, but the reason is instead JavaScript-based web sites, as each tends to consume around one CPU.
With only
3 CPUs interactive work becomes
sluggish. While I usually run my browser with JavaScript disabled, this cannot always happen as some sites, including online shops, not just Google and Tumblr and Flickr and the like, are JavaScript based. AJaX-using sites that implement dynamically self-updating pages are
particularly bad, and that describes many web
applications.
Ideally I would be able to just suspend particular tabs in a window or even a whole window, but probably browser
developers have systems with dozens of CPUs, so they do not
feel the need to scratch that itch. I have found an awkward
alternative which is to use a separate Firefox instance using
a distinct profile
just for the worst
sites, and then I can freeze that with kill -STOP but
then I have to find out its process number.
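A slightly less awkward way, if the separate instance is started with a distinctively named profile (the profile name heavyjs below is hypothetical), is to match on the command line instead of looking up the process number:

# Freeze and later resume only the Firefox instance using the "heavyjs" profile
pkill -STOP -f 'firefox.*-P heavyjs'
pkill -CONT -f 'firefox.*-P heavyjs'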
Update 170616: some advertising sites also inject ads that are very CPU heavy, not just JavaScript, but also movie players and animations on loop.