It’s easy to underestimate the importance of Time Machine (TM) to Mac users. I’ve been writing Q&A and other technical sections for Mac magazines for over thirty years now, and can’t even guess how many letters and messages I have received asking what to do after hard disk failure. Before the introduction of TM, suggesting that the user went to their backups was like asking them to walk unaided to the South Pole. Since TM, the majority of those submitting questions are regularly making backups, almost all of them using TM.
As we’ve come to make backups, so we’ve come to rely on them. When that rickety old hard drive does fail, we expect to be able to replace it, restore from our backup, and carry on as before. Although this is the rule, there are also plenty of exceptions: users who go to their TM backup only to discover that it can’t restore a lot of what should have been there, or that the whole backup is as broken as the failed disk. This article suggests what you can do to prevent such nasty shocks, and how you can ensure your backups will work when you want them to.
Delicate anatomy of backups
For the moment at least, all TM backups are made and maintained on HFS+ volumes, either running as the file system on a local disk, or in a ‘virtual’ HFS+ file system in a sparsebundle, in the case of shared or networked backups. This is good because HFS+ is an old file system so has plenty of support in third-party utilities, and it’s bad because it isn’t particularly fault-tolerant.
Your backups don’t just contain copies of each file as it has changed over time, but a much greater number of hard links to previously saved files, and – a distinctive feature of HFS+ – hard links to directories. The latter are essential to minimise the number of hard links to files, and a key feature of TM backups. Over a year of use, a busy Mac can readily accumulate more than 1 TB of backups, in millions of files and folders.
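To make the idea of a hard link concrete, here is a minimal Python sketch that hard-links a file from one folder into another and confirms that both names refer to the same data. The folder and file names are purely illustrative, and this is not how TM itself is implemented. It only covers hard links to files: the directory hard links that TM relies on can't be created with ordinary calls like this.

```python
import os
import tempfile


def demo_hard_link() -> tuple[bool, int]:
    """Create a file in one 'backup' folder and hard-link it into the next."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "backup-1")   # hypothetical folder names
        second = os.path.join(root, "backup-2")
        os.mkdir(first)
        os.mkdir(second)

        original = os.path.join(first, "report.txt")
        with open(original, "w") as f:
            f.write("unchanged between backups\n")

        # The second folder stores no new copy of the data, only another
        # directory entry pointing at the same file on disk.
        linked = os.path.join(second, "report.txt")
        os.link(original, linked)

        a, b = os.stat(original), os.stat(linked)
        same_file = (a.st_ino == b.st_ino) and (a.st_dev == b.st_dev)
        return same_file, a.st_nlink


same_file, link_count = demo_hard_link()
print(same_file, link_count)
```

Each unchanged file costs only a directory entry rather than a second copy, which is why the directory grows so much faster than the data it describes.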
The file system directories on backup volumes thus grow over time to become huge, with the majority of their entries being hard links. It’s perhaps unsurprising that, among those millions of directory entries, errors can occur, and their directories can become sub-optimal and slow to access. The commonest causes given for significant directory damage are forced restarts following kernel panics, and sudden loss of power. Their frequency and severity have been reduced with the introduction of journalling to HFS+, but they do still occur, and appear most common in sparsebundles, if user reports are anything to go by.
Protecting your backup drive
The most vulnerable time for your backups is when a backup is being made, as that’s when files and links are being written. A kernel panic in the Mac, or loss of power to either Mac or backup drive, are well-known causes of errors in the backup.
There isn’t much you can do to protect against forced restarts, other than ensuring that your Mac has a minimum of kernel panics. But loss of mains power can be guarded against by powering both Mac and backup drive from a good Uninterruptible Power Supply (UPS), with a USB connection from it to your Mac to initiate an orderly shutdown in the event that the mains outage is prolonged. I’m always surprised at the number of Mac users who either have no UPS at all, or don’t supply their backup drive from a protected outlet, and just deliver it filtered mains power rather than that backed up by its battery.
It’s also worth remembering to test your UPS periodically, to ensure that it does shut down your Mac and backup drive properly, but never to run a test when a backup is due, or is taking place.
A UPS is particularly important for backups on hard disks: there, sudden power loss may not only corrupt the file system, but can also result in a physical crash of the disk, losing everything. Although this is unusual in modern hard drives, it does still happen. If your Mac and backup drive aren’t protected by a UPS, no matter how much maintenance you might perform on your backups, one brief mains outage can still ruin them.
Maintaining your backups
The larger your backups, the slower their access becomes, and the more vulnerable they are to error. To keep them as reliable as possible, it makes good sense to limit their size. Leaving them to get on with it for several years isn’t a good idea: aim to start a new backup set every year or two, depending on how they accumulate. If your backup drive has replaceable hard disk(s), this is a good opportunity to remove the drive(s) and replace them with new, putting the old backups into safe keeping as archives.
Although this might be a good plan when using, say, a RAID system with relatively inexpensive hard drives, it clearly doesn’t fit in with the transition to SSDs for storage. Once you’ve invested in several TB of SSD, the last thing that you’re going to do is replace them every couple of years! One alternative then might be to keep your TM backups running for 1-2 years, at the end of which you archive a full disk image, then wipe your backup storage and start from scratch.
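If you want a reminder of when a backup set is due for retirement, its age can be worked out from the names of the backups themselves. The sketch below assumes that each backup path contains a date stamp of the form YYYY-MM-DD-HHMMSS, as in the paths printed by tmutil listbackups. The function names and the two-year limit are my own choices for illustration.

```python
import re
from datetime import datetime

# Assumed date stamp embedded in each backup's path, e.g. 2019-03-01-101500.
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}-\d{6})")


def backup_dates(paths):
    """Return the sorted dates found in a list of backup paths."""
    dates = []
    for path in paths:
        match = STAMP.search(path)
        if match:
            dates.append(datetime.strptime(match.group(1), "%Y-%m-%d-%H%M%S"))
    return sorted(dates)


def set_age_days(paths, now):
    """Age in whole days of the oldest backup, or None if no dates were found."""
    dates = backup_dates(paths)
    if not dates:
        return None
    return (now - dates[0]).days


def due_for_new_set(age_days, max_days=730):
    """True once the set is older than max_days (two years by default)."""
    return age_days is not None and age_days > max_days
```

Feed it the list of backup paths and datetime.now(), and treat the result as a prompt to archive and start afresh rather than as a hard rule: how quickly your backups accumulate matters as much as their age.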
Some users perform monthly maintenance, using a tool such as Alsoft’s DiskWarrior to completely rebuild the directories on their backups. This appears to be effective, provided that both Mac and backup storage are protected by a UPS. For relatively small backups on fast storage such as SSDs, maintenance shouldn’t take long either, but this doesn’t scale so well: rebuilding the directory of 1 TB of backups on an external hard disk RAID system may take those backups offline for several hours, and larger backups accumulated over several years will need even longer. It therefore helps relatively little to prolong the period over which you can keep a backup series going, and you need to balance the time taken for maintenance against the risk of errors.
Where DiskWarrior may have a much more important role is in maintaining sparsebundles on shared and networked backup drives. A disproportionately high number of reports of severe and fatal errors seem to occur in those, although that may also be the result of users allowing them to grow larger than they might on local storage.
Since High Sierra added APFS snapshots to TM’s backups, snapshot errors and problems have been reported. As far as I am aware, the only tools which can currently check and repair these are Disk Utility, preferably in Recovery Mode, and the command tool on which it relies, fsck_apfs. Once utilities like DiskWarrior are able to work with APFS and its snapshots, they should make a big difference.
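For those who prefer to script such checks, Disk Utility's verification is also available from the command line as diskutil verifyVolume. The Python sketch below simply wraps that command. The function names are mine, and treating a non-zero exit status as failure is an assumption rather than documented behaviour. It verifies only and makes no attempt at repair.

```python
import subprocess
import sys


def verify_command(volume: str) -> list[str]:
    """Build the diskutil command that verifies a volume without repairing it."""
    return ["diskutil", "verifyVolume", volume]


def verify_volume(volume: str) -> bool:
    """Run the verification on macOS and report whether it appeared to succeed."""
    if sys.platform != "darwin":
        raise RuntimeError("diskutil is only available on macOS")
    result = subprocess.run(verify_command(volume), capture_output=True, text=True)
    print(result.stdout)
    # Assumption: a non-zero exit status means that problems were found.
    return result.returncode == 0
```

You would call verify_volume() with the mount point or disk identifier of the volume in question. Running it while a backup is in progress is best avoided.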
Testing your backups
There’s no point in making backups if you don’t test them out regularly too. If you’re rebuilding their directories every month using DiskWarrior, a good time to do this is once your scheduled rebuild has been completed. Simply select a couple of important documents and folders from a couple of different backups, and restore them – taking great care not to overwrite their current version.
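One way to make test restores safe and repeatable is to copy the item from the backup into a scratch folder, never its original location, and then compare checksums. This is a minimal Python sketch of that idea using ordinary file copies rather than TM's own restore feature. The paths and function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_to_scratch(backup_copy: Path, scratch_dir: Path) -> Path:
    """Copy a file out of a backup into a scratch folder, never overwriting."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    target = scratch_dir / backup_copy.name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.copy2(backup_copy, target)
    return target


# Demonstration with temporary folders standing in for a real backup.
with tempfile.TemporaryDirectory() as root:
    backup = Path(root) / "backup" / "notes.txt"
    backup.parent.mkdir()
    backup.write_text("draft chapter\n")
    restored = restore_to_scratch(backup, Path(root) / "restore-test")
    intact = sha256_of(backup) == sha256_of(restored)

print(intact)
```

A matching checksum shows only that the file could be read back from the backup in full. It is still worth opening the restored document to confirm that it is the version you expected.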
This is also a good time to get a full log transcript from a recent backup, to see if there are any significant errors occurring. T2M2 looks for those errors which propagate up to the TM sub-system itself, but using the TimeMachineFull predicate in Ulbow will reveal many more, all of which should be of little or no significance.
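If you would rather pull the log entries yourself, the log show command can do so from Terminal or a script. The Python sketch below builds such a command, assuming that TM's entries are written under the subsystem com.apple.TimeMachine. That predicate is my own simple choice and is not the TimeMachineFull predicate used in Ulbow. The keyword filter is a crude illustration and no substitute for reading the entries.

```python
import subprocess
import sys


def tm_log_command(last: str = "2h") -> list[str]:
    """Build a 'log show' command for recent Time Machine entries."""
    # Assumption: TM writes its entries under this subsystem.
    predicate = 'subsystem == "com.apple.TimeMachine"'
    return ["log", "show", "--predicate", predicate, "--last", last, "--info"]


def tm_log_lines(last: str = "2h") -> list[str]:
    """Run the command on macOS and return its output as a list of lines."""
    if sys.platform != "darwin":
        raise RuntimeError("the log command is only available on macOS")
    result = subprocess.run(
        tm_log_command(last), capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def mentions_problem(line: str) -> bool:
    """Crude filter for lines worth a closer look."""
    lowered = line.lower()
    return "error" in lowered or "failed" in lowered
```

Set the period to cover the whole of a recent backup, for example tm_log_lines("1d"), then pass the lines through mentions_problem() to pick out those to read first.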
Backups and archives
Another common and costly error is to presume that what you back up using TM forms an archive, from which you can retrieve documents and data of lasting importance. In most backup policies involving TM, it occupies a middle place:
- previous versions stored in the macOS version system, looking back minutes or hours;
- backups made by Time Machine, looking back days or months;
- permanent archiving, looking back years.
Versions enable you to recover a section you deleted half an hour ago. Backups enable you to recover any file over the last year or two. Archives are restricted to documents and data of lasting importance, and should go back many years. If you haven’t thought about them, now is the time to build them into your plan, and to ensure that there are also off-site copies in case disaster strikes.
Remember that any substantial natural or man-made disaster will almost certainly destroy local or locally networked Time Machine backups.