Cause of Death – how disks become corrupted and crash

Disk errors and crashes may be terminal, rescuable, or fully repairable. Understanding their nature enables you to choose how to tackle them.

Starting back at work after a holiday period is as hard for Macs as for humans. Most workplace computers will have been shut down for the break, and starting them up is the act most likely to result in hard disk failure.

As you are fixing that first invigorating coffee, ponder whether you are up to recovering from a crashed or corrupted disk. As with any similar crisis, before deciding on the best strategy you need to analyse how your disk has failed.

Major hardware failure

Hardware failure is the ultimate cause of disk death because it is essentially irreversible as far as users are concerned. Such events were the original, literal ‘crashes’ of early hard disks, in which the read/write heads made physical contact with the magnetic coating on the storage platters. This can be audible, and in the worst case destroys the entire drive.

Some models have been particularly prone to this, such as the earliest Deskstar drives, made when the brand still belonged to IBM. They incorporated several innovations that did not become reliable or robust until IBM sold its hard disk business to Hitachi, who have since transformed the ‘deathstar’ into one of the most consistently reliable brands.

More common are less catastrophic failures in components of the disk controller and interface. These and other hardware problems are widely monitored using variables measured by the S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) system, which can be checked in Disk Utility and more specialist tools, as detailed here. Although the data stored on the disk may be left intact, and remain theoretically recoverable, in practice paying a specialist to recover them would be prodigiously expensive.

When a disk suffers a hardware failure, you may get sufficient warning from a S.M.A.R.T. monitor, but in most cases all its mounted volumes just disappear, quite probably resulting in a frozen Mac or kernel panic.

Details of how to get your Mac restarted are given here. Once you have restarted it (from a different drive if the startup disk has gone down), the dead drive is nowhere to be seen, and all you can do is replace it.

If you think that a disk is about to undergo hardware failure, perhaps because of an intermittent problem or the noises that it is making, shut your Mac down and replace the disk as quickly as possible, while its contents can still be recovered.

Minor hardware failure

The only common minor form of hardware failure is when a single isolated storage unit on a disk becomes unusable, a ‘bad block’. Most disks contain a few bad blocks, and their controllers can mark them out as unusable to ensure that no further data are lost.

Rapid accumulation of bad blocks is generally a warning of imminent catastrophe, but when they appear only occasionally it is much more likely that the disk will go on for months or years before developing any more. In most cases bad blocks are managed silently, although a detailed S.M.A.R.T. assessment should reveal them.
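To illustrate what ‘rapid accumulation’ might mean in practice, here is a minimal Python sketch that classifies a series of readings of a disk’s reallocated block count. It assumes you have recorded the raw value of S.M.A.R.T. attribute 5 (Reallocated_Sector_Ct), for example from smartmontools’ smartctl, on successive dates. The threshold of ten new bad blocks per 30 days is an arbitrary, hypothetical figure, not a manufacturer’s specification.

```python
from datetime import date

# Hypothetical threshold: more than this many newly reallocated
# blocks per 30 days is treated as "rapid accumulation".
RAPID_PER_30_DAYS = 10


def bad_block_trend(readings):
    """Classify a series of (date, reallocated_count) readings.

    readings: list of (datetime.date, int) tuples, such as the raw value of
    S.M.A.R.T. attribute 5 recorded at each check.
    Returns "stable", "growing" or "rapid".
    """
    if len(readings) < 2:
        return "stable"
    readings = sorted(readings)
    first_day, first_count = readings[0]
    last_day, last_count = readings[-1]
    new_blocks = last_count - first_count
    if new_blocks <= 0:
        return "stable"
    # Avoid dividing by zero when two readings share a date
    days = max((last_day - first_day).days, 1)
    rate = new_blocks * 30 / days
    return "rapid" if rate > RAPID_PER_30_DAYS else "growing"
```

A steady count, however high, is far less worrying than one that keeps climbing between checks.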

Major software failure

Software failures are far more common than those in hardware, and are much more likely to be amenable to repair. However, they are also more varied in their severity: a single serious software failure in a critical part of the disk can make it impossible for Mac OS X to recognise the disk, effectively trashing its entire contents. There are four different types of information stored on most disks, and recoverability depends on which has been damaged, as well as how extensive the damage is.

All disks contain top-level information about the disk, most importantly a description of the one or more partitions or volumes that it contains: the disk’s partition map.

Mac disks normally follow one of two partitioning schemes: the GUID Partition Table (GPT) for all Intel Mac models, or the older Apple Partition Map (APM), standard for PowerPC-based models. In contrast, PC-based disks, including memory sticks, follow PC standards with a master boot record (MBR). Tools such as Disk Utility tell you which scheme any given disk uses when it is mounted.
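As a rough illustration of how the three schemes can be told apart, this Python sketch inspects the first two sectors of a raw disk image. It relies on well-known on-disk signatures: ‘EFI PART’ at the start of LBA 1 for GPT, ‘ER’ and ‘PM’ at the start of blocks 0 and 1 for the Apple Partition Map, and the 0x55AA boot signature at the end of an MBR. It assumes 512-byte sectors and is a simplified guess, not how Disk Utility actually works.

```python
SECTOR = 512  # assumption: 512-byte logical sectors


def partition_scheme(image):
    """Guess a disk's partitioning scheme from its first two sectors."""
    block0, block1 = image[:SECTOR], image[SECTOR:2 * SECTOR]
    # GPT disks also carry a 'protective' MBR, so test for GPT first
    if block1[:8] == b"EFI PART":
        return "GPT"  # GUID Partition Table header lives in LBA 1
    if block0[:2] == b"ER" and block1[:2] == b"PM":
        return "APM"  # driver descriptor in block 0, first map entry in block 1
    if block0[510:512] == b"\x55\xaa":
        return "MBR"  # boot signature at the end of the master boot record
    return "unknown"
```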

Corruption of the table that sets out where a disk’s partitions are located normally makes that disk, and all its volumes, unusable. On GPT disks, though, a backup copy of the partition table is kept at the opposite end of the disk, so it may be possible to repair the damage using that copy.
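The value of that backup can be shown with a short Python sketch. A GPT header normally sits in LBA 1, with its backup in the last LBA of the disk, and each header carries a CRC32 of itself computed with its own CRC field zeroed. The sketch checks the primary header and falls back to the backup when the primary fails its signature or CRC check. It assumes 512-byte sectors and a 92-byte header, ignores the partition entry array and its separate CRC, and only reads; it is not a repair tool.

```python
import struct
import zlib

SECTOR = 512  # assumption: 512-byte logical sectors
# Signature, revision, header size, header CRC32, reserved, current LBA,
# backup LBA, first/last usable LBA, disk GUID, entries LBA, entry count,
# entry size, entry array CRC32: 92 bytes, little-endian
HDR = struct.Struct("<8sIIIIQQQQ16sQIII")


def header_ok(image, lba):
    """Return the parsed header at `lba` if its signature and CRC32 check out, else None."""
    raw = image[lba * SECTOR : lba * SECTOR + HDR.size]
    if len(raw) < HDR.size:
        return None
    fields = HDR.unpack(raw)
    if fields[0] != b"EFI PART" or fields[2] != HDR.size:
        return None
    # The stored CRC covers the header with the CRC field itself set to zero
    zeroed = raw[:16] + b"\0\0\0\0" + raw[20:]
    if zlib.crc32(zeroed) != fields[3]:
        return None
    return fields


def find_gpt_header(image):
    """Prefer the primary header in LBA 1; fall back to the backup in the last LBA."""
    primary = header_ok(image, 1)
    if primary is not None:
        return "primary", primary
    last_lba = len(image) // SECTOR - 1
    backup = header_ok(image, last_lba)
    if backup is not None:
        return "backup", backup
    return "none", None
```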

Minor software failure

Each volume then contains its own file system (on hard disks, normally Mac Extended or HFS+, unless it uses a Windows format for Boot Camp). This has extensive index tables containing details of all the files stored in the volume, and most importantly large ‘link tables’ that list the physical location of each of the logical storage blocks for every file.

Different types of error can afflict these, but one of the most common is when the tables become confused, so that a single storage block is claimed as being part of more than one file – cross-linking. Repairing cross-linked files is tricky, and usually results in one or both of the files being damaged.
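The idea of cross-linking is easy to see in a short Python sketch. It assumes a hypothetical, already-decoded map from each file to the block numbers that its link tables claim; real file systems store this information very differently, but the check itself amounts to finding blocks with more than one owner.

```python
from collections import defaultdict


def find_cross_links(block_map):
    """Return {block: sorted list of files} for every block claimed by more than one file.

    block_map: hypothetical dict mapping each file name to an iterable of
    the storage block numbers recorded for it.
    """
    owners = defaultdict(set)
    for name, blocks in block_map.items():
        for block in blocks:
            owners[block].add(name)
    return {block: sorted(names) for block, names in owners.items() if len(names) > 1}


print(find_cross_links({"a.txt": [1, 2, 3], "b.doc": [3, 4], "c.jpg": [5]}))
```

Finding the conflict is the easy part. A repair tool then has to decide which file keeps each shared block, and the other file loses whatever data it should have held there, which is why one or both files usually end up damaged.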

The other index tables can also become corrupted, resulting in a range of odd effects, from changed permissions to broken aliases and files renamed in gobbledegook. Established Mac repair tools are usually good at limiting, if not repairing, such damage, but sometimes errors propagate, and each time that you attempt repair, more problems are spawned.

Although journalling of the Mac’s file system has done little to prevent corruption of file contents, it has been valuable in protecting the file system’s own data tables. Problems with those tables were quite common in classic versions of Mac OS, but have become increasingly rare in OS X.
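Journalling protects the file system tables by recording each set of changes in a journal before applying them, then replaying only complete, committed transactions after a crash. The Python sketch below shows that principle using a hypothetical, much-simplified record format; it is not the layout of the HFS+ journal. As with HFS+, which journals only file system metadata, nothing here covers the contents of files themselves.

```python
def replay_journal(metadata, journal):
    """Apply only committed transactions from a simplified write-ahead journal.

    metadata: dict standing in for the on-disk file system tables.
    journal: list of records, each ("write", txn_id, key, value) or ("commit", txn_id).
    Writes from a transaction with no commit record (for example, one cut
    short by a crash) are discarded, so the tables are never left half-updated.
    """
    pending = {}
    for record in journal:
        if record[0] == "write":
            _, txn, key, value = record
            pending.setdefault(txn, []).append((key, value))
        elif record[0] == "commit":
            # Apply every write in this transaction together
            for key, value in pending.pop(record[1], []):
                metadata[key] = value
    return metadata
```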

At their worst, these can result in completely confused directories for a volume. Although the constituent files remain intact, your Mac is unable to make any sense of the folder structure within which those files are supposed to reside.

Specialists can perform a manual recovery, but receiving a randomly arranged assortment of the hundreds of thousands of files that once made up your startup disk is not a dream solution. The best chance of making sense of them may lie in the specialist skills of Alsoft’s DiskWarrior ($119.95 including a ‘thumb drive’ for recovery), which can often make a remarkably good educated guess at rebuilding the original directory structures, restoring order to the recovered volume.

DiskWarrior cannot fix your startup volume, so for repair work it needs to be run from Recovery Mode.
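DiskWarrior’s methods are proprietary, but the general idea of piecing a directory tree back together can be sketched. In HFS+, each catalog record identifies its item by a catalog node ID (CNID) and names the CNID of its parent folder, the root folder having CNID 2. This hypothetical Python sketch rebuilds full paths from such (cnid, parent, name) triples, and gathers items whose parents are missing or loop back on themselves into an ‘Orphans’ folder rather than discarding them.

```python
ROOT_FOLDER_ID = 2  # CNID of the root folder in HFS+


def rebuild_paths(records):
    """Rebuild full paths from (cnid, parent_cnid, name) records.

    Items whose chain of parents cannot be traced back to the root, because
    a parent is missing or the chain loops, are placed under a hypothetical
    /Orphans folder instead of being lost.
    """
    by_id = {cnid: (parent, name) for cnid, parent, name in records}
    paths = {}
    for cnid in by_id:
        parts, seen, current = [], set(), cnid
        while current != ROOT_FOLDER_ID:
            if current in seen or current not in by_id:
                parts = ["Orphans", by_id[cnid][1]]
                break
            seen.add(current)
            parent, name = by_id[current]
            parts.append(name)  # collected from the item up towards the root
            current = parent
        else:
            parts.reverse()  # reached the root: put names in top-down order
        paths[cnid] = "/" + "/".join(parts)
    return paths
```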

File corruption

The final class of information is the contents of each file. Damage here can give rise to the most subtle but pervasive of disk errors, for example when a process runs out of control and writes data to the wrong storage locations, or a service runs haywire and tramples over innocent files belonging to another application.

If the affected files are a key part of Mac OS X, such as most of the contents of the /System/Library folder, this can make it impossible to start up from the affected volume. However, once you have started up from another volume, you can access other files on the affected volume freely.
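Because this kind of corruption is silent, one practical defence is to record checksums of files that should not change, such as archived documents, and compare them later. This minimal Python sketch uses SHA-256 from the standard hashlib module; the function names are hypothetical, and because it reads each file whole it is only suited to modest file sizes.

```python
import hashlib
from pathlib import Path


def manifest(root):
    """Map each file's path (relative to root) to the SHA-256 digest of its contents."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            result[str(path.relative_to(root))] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def changed_files(old, new):
    """List files present in both manifests whose contents no longer match."""
    return sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
```

A mismatch shows only that a file has changed, not why, so on its own it cannot tell corruption apart from a deliberate edit.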

Armed with information on S.M.A.R.T. status, and reports from checks or repairs using Disk Utility, fsck and other tools, you should now have a much better idea of the type and scale of problem with which you are wrestling. Although you probably still need to make yourself a second, even larger cup of coffee, at least you will now have a strategy, instead of that nauseogenic feeling of welling panic.

Technique: Recovery and repair

Your first decision as to whether to try to repair a disk centres on the intended outcome.

If you suspect that the disk is not going to be usable again, and it contains valuable or important files, then recovering as much of its contents as possible becomes your top priority. In that case, a good strategy is to extract the drive and follow the guidance here, possibly making a ‘block copy’ of the original before recovering files from the copy.
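Dedicated tools such as GNU ddrescue do this far more thoroughly, but the core idea of a block copy from a failing disk fits in a short Python sketch. It copies fixed-size chunks, writes zeros in place of any chunk that cannot be read so that everything after it stays at its original offset, and returns the offsets of the failed chunks so they can be retried or reported. The 4 KiB chunk size is an assumption, and the source and destination are ordinary binary file objects that you have already opened, such as a disk image.

```python
CHUNK = 4096  # assumption: copy in 4 KiB pieces


def block_copy(src, dst, size, chunk=CHUNK):
    """Copy `size` bytes from file object src to dst, skipping unreadable chunks.

    Chunks whose read raises OSError are written as zeros, and their offsets
    are returned in a list.
    """
    bad = []
    for offset in range(0, size, chunk):
        length = min(chunk, size - offset)
        src.seek(offset)
        try:
            data = src.read(length)
        except OSError:
            data = b""
            bad.append(offset)
        # Pad failed or short reads so the copy keeps the original layout
        dst.seek(offset)
        dst.write(data.ljust(length, b"\0"))
    return bad
```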

Attempts to repair the disk may well reduce the number of intact files that you can recover from it, but if successful you may then be able to use the drive again. If that is your aim, you need to muster all the repair tools available, and give it your best shot.

The normal Unix command-line tool for checking and repairing hard disks is fsck, which will in turn call specialist commands to deal with individual file systems, such as fsck_hfs for Apple’s Mac Extended (HFS+) volumes.

When Mac OS X was first released, these were not particularly effective, and usually baulked at attempting any serious repairs. However, they have now matured very well, and whilst fsck_hfs is still unlikely to be able to help you through substantial directory repairs, it copes with much else.

Repairing permissions is no longer the panacea that it once was, but is part of checking general disk health.

Disk Utility, the free bundled application that employs fsck and thus fsck_hfs, has therefore become the tool of first choice for dealing with disk errors and crashes. Provided that the hard disk can be recognised and mounted – so has not suffered significant hardware failure or damage to the partition map or other top-level information – start with Disk Utility. Indeed, most third-party disk repair tools do much the same, although one of the better tools such as Prosoft’s Drive Genius 4 ($99 excluding ‘thumb drive’) may be worth the extra investment.

Drive Genius offers a wider range of tools, including S.M.A.R.T. diagnostics.

Safe and effective repair of a startup disk requires you to restart from another bootable volume, such as an optical disk, USB stick (‘thumb drive’), recovery partition or another bootable hard disk, and run Disk Utility from there. If that is not possible, you can restart in single-user mode and invoke fsck at the command line. The latter was preferred before OS X had its own recovery systems, but has largely been superseded since.

If your hard disk will not mount, you will probably have to consider paying a specialist data recovery service handsomely if you need to recover its contents. Some third-party tools might be able to help, but the chances of success are slim. When Disk Utility (or fsck) is unable to complete repairs, directory damage can sometimes be overcome by Alsoft’s DiskWarrior.

However, once you have revived a volume, you should recover data from it immediately and consider re-initialising it. Avoid trusting any disk that has had to undergo substantial repair for any longer than you need to rescue its contents.

Related guides

My Mac don’t work – troubleshooting tools and techniques
Recovering from a hard crash – when your startup drive is missing or damaged
Q&A – Dead Mac recovery – recovering files from a dead Mac
S.M.A.R.T.ypants – hard drive failure detection

Updated from the original, which was first published in MacUser volume 28 issue 01, 2012.