Hard decisions

It is getting increasingly difficult to buy a car. The number of concurrent facts – fuel consumption, acceleration, insurance class, safety rating, and many more – that you need to juggle has become so large that finding the optimum combination is a challenge to artificial intelligence.

Choosing hard disks seemed much simpler. Apart from basic specifications, such as capacity and interface (generally SATA 6 Gbps now), there are really only form factor (2.5 or 3.5 inch), speed, price (neatly re-expressed per gigabyte for comparison), and reliability, given as ‘mean time between failures’ (MTBF, normally quoted in hours). As I have generally tried to go for those drives least likely to crash from under me, the MTBF has usually been of considerable importance to me.

Like many electro-mechanical devices, hard disks show a complex pattern of failures with age, not unlike human death rates. There is an early peak of failures occurring during the ‘burn-in time’ of the first weeks in use, then a long period with a fairly steady and very low failure rate.

Finally, as components start to wear out, there is a late peak, in which most disks eventually die on you. If you get it right, your Mac or its disks will have been replaced well before the failure rate climbs in old age, so the chances of suffering disk failure during its working life remain very low.
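That three-phase pattern is often called a ‘bathtub curve’. As a rough sketch only, it can be modelled as the sum of a falling early failure rate, a constant background rate, and a rising wear-out rate. The Weibull form used here is a common modelling choice rather than anything a disk manufacturer has published, and every parameter value is invented purely to give the right shape.

```python
# Illustrative bathtub curve. All numbers are invented for shape only;
# they are NOT measured data for any real drive.

def weibull_hazard(t, shape, scale):
    """Weibull failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t_years):
    """Failure rate (failures per drive-year) at age t_years, as a sum of three phases."""
    early = weibull_hazard(t_years, shape=0.5, scale=50.0)    # shape < 1: rate falls with age
    steady = 0.01                                              # constant background rate
    wear_out = weibull_hazard(t_years, shape=5.0, scale=8.0)  # shape > 1: rate rises with age
    return early + steady + wear_out

for age in (0.05, 0.5, 2.0, 6.0, 9.0):
    print(f"{age:5.2f} years: {bathtub_hazard(age):.3f} failures per drive-year")
```

With these made-up figures the rate starts high, bottoms out at around two years, and climbs steeply after six, which is the period you want to have replaced the disk before reaching.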

Given that even cheap consumer units now have MTBFs of half a million hours and more, and ‘enterprise quality’ drives comfortably exceed 1.2 million hours, you might be tempted to conclude that modern disks should long outlive their host computers, as a million hours is well over 100 years (about 114 years of continuous running). Whilst I am sure the manufacturers would like you to live in that cloud cuckoo land of misapprehension, I think the time has come to disabuse you.

Most, perhaps all, disk manufacturers measure MTBF by running a large number of disks for a relatively short period, counting how many failures occur, and then simply extrapolating to give the average period between failures. So if they were testing 10 000 disks with an MTBF of about a million hours, they might have run the disks under stress for 500 hours, and found 5 failures over that time: 5 million disk-hours in total, or one failure per million hours. In many cases, disks will have been burned in beforehand, so that the small early peak in failures is left out of the count. And mark this: none will get anywhere near the increasing failure rates of old age. So the MTBF gives you no information about how long the disk should last before it should be retired, nor how rapidly aged disks can be expected to fail.
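The arithmetic behind that extrapolation is short enough to sketch. This is my simplified reading of the method described above, not any manufacturer’s documented procedure; the function name `estimate_mtbf` is my own, and it ignores the fact that a failed disk stops accumulating hours.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years

def estimate_mtbf(drives, test_hours, failures):
    """Total drive-hours on test divided by the number of failures seen."""
    if failures <= 0:
        raise ValueError("need at least one failure to estimate an MTBF this way")
    return drives * test_hours / failures

# The worked example from the text: 10 000 disks, 500 hours, 5 failures.
mtbf_hours = estimate_mtbf(drives=10_000, test_hours=500, failures=5)
print(f"MTBF: {mtbf_hours:,.0f} hours, or about {mtbf_hours / HOURS_PER_YEAR:.0f} years")
# prints "MTBF: 1,000,000 hours, or about 114 years"

print(f"Each disk was actually observed for about {500 / 24:.0f} days")
# prints "Each disk was actually observed for about 21 days"
```

The contrast between the two printed lines is the whole point: a figure of more than a century comes from watching each disk for about three weeks.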

So the MTBF is a bit like an averaged death rate for humans between the ages of 25 and 35 – wildly optimistic compared to the rest of our lives. If you were a life insurer, would you base your premiums and payouts on such a figure?
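To make the comparison with a death rate concrete, an MTBF can be turned into a chance of failure per year. The sketch below assumes a constant failure rate and round-the-clock use, which is only reasonable during the long flat period between the two peaks; the function name is hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # continuous use, ignoring leap years

def annual_failure_probability(mtbf_hours):
    """Chance that a disk fails within one year, ASSUMING a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

for mtbf in (500_000, 1_000_000, 1_200_000):
    print(f"MTBF {mtbf:>9,} hours: {annual_failure_probability(mtbf):.2%} chance of failure per year")
```

Read this way, a million-hour MTBF claims only that rather less than one disk in a hundred fails per year while in its prime. It says nothing about when that prime ends.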

There is now a better way to estimate likely disk life: the length of the manufacturer’s warranty. Drives and their components are now so carefully engineered that you can bet that your drive will not fail before its warranty ends. But once that period is over, all its components will start up that slope of increasing risk of failure, the late peak.

So if you want a drive to last five years, buy one with a five-year manufacturer’s warranty. If you buy one with only a three-year warranty, then don’t expect it to last much longer. If they had built it to last for five years, they would offer that longer period of warranty.

Thankfully there are also ways of anticipating some disk failures before they happen. Disks with S.M.A.R.T. technology, which monitors the disk’s condition, can give a week or more of notice of serious failure. Further details are in this article. S.M.A.R.T. will not warn of all failures, and catastrophic crashes can still occur out of the blue.

For my money, S.M.A.R.T. and a five-year warranty are better requirements for a hard disk than any fanciful MTBF that makes it sound as if my hard disks will outlive even our children.

Updated from the original, which was first published in MacUser volume 20 issue 24, 2004.