Time Machine: past, present and future

Before Time Machine was introduced in Mac OS X 10.5 Leopard in 2007, relatively few users backed up their Macs, and third-party products were generally either expensive or limited, or both. The advent of Time Machine was, for a great many users, the first time that they had complete, convenient and contemporary backups.

Time Machine was a brilliant design and engineering accomplishment: by integrating it into the existing Mac interface, Apple made it appear to be just another part of the Finder. However, that has also proved to be its limitation, and as storage has grown in size and macOS has become so complex, it has come to creak and groan under the strain. Over the years it has been updated and has sort of coped, but it’s now ready for some fairly fundamental re-engineering.

Time Machine 1.0

When first introduced, the backup process was quite simple.

[Diagram: the Time Machine backup process in Mac OS X 10.5.]

Every hour, on the dot, the backup service examined the record of changes made to the file system since the last backup, using the FSEvents database, and worked out what had changed and needed to be copied into the backup. During the backup phase itself, it copied across only those files which had been created or changed since the last backup was made.
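The same change journal that Time Machine read from the hidden FSEvents store on each volume is exposed to other software through the FSEvents API. As a rough illustration of the principle only, here’s a minimal Swift sketch of a tool asking for everything that has changed since a stored event ID; the event ID and watched path are placeholders, not what backupd itself uses.

```swift
import CoreServices
import Foundation

// Called with the paths that have changed; a backup tool would add these to its copy list.
let callback: FSEventStreamCallback = { _, _, _, eventPaths, _, _ in
    // With kFSEventStreamCreateFlagUseCFTypes, eventPaths is a CFArray of CFString.
    let paths = unsafeBitCast(eventPaths, to: NSArray.self) as? [String] ?? []
    for path in paths {
        print("changed since last backup:", path)
    }
}

// Hypothetical event ID recorded at the end of the previous backup; passing it as
// sinceWhen makes FSEvents replay everything changed since that point.
let lastBackupEventID: FSEventStreamEventId = 123_456_789

let stream = FSEventStreamCreate(
    kCFAllocatorDefault,
    callback,
    nil,                                   // no context needed for this sketch
    ["/Users"] as CFArray,                 // path(s) to watch
    lastBackupEventID,
    5.0,                                   // latency in seconds before events are delivered
    FSEventStreamCreateFlags(kFSEventStreamCreateFlagUseCFTypes)
)!

FSEventStreamSetDispatchQueue(stream, DispatchQueue(label: "fsevents"))
FSEventStreamStart(stream)
dispatchMain()
```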

[Diagram: how hard links are used within Time Machine backups.]

It did this, and still does, by using hard links in the backup, and Apple added a new feature to its HFS+ file system to support it: directory hard links. Where an entire folder has remained unchanged since the last backup, Time Machine simply creates a hard link to the existing folder in that backup. Where an existing file has changed, though, the new version is written to the backup inside a changed folder, which in turn can contain hard links to its unchanged contents.

This preserves the illusion that each backup consists of the complete contents of the source, but only requires the copying of changed files, and creation of a great many hard links to files and folders. It’s also completely dependent on the backup volume using the HFS+ file system, to support those directory hard links.

Without directory hard links, backups would quickly be overwhelmed by hard links to files. If you had a million files and folders on the backup source volume, every hourly backup would have to create a million items: copies of those which had changed, and individual hard links for every file which remained unchanged. Directory hard links provide the efficiency needed for this scheme to work.
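Directory hard links were never exposed through public APIs, so the following Swift sketch can only illustrate the decision logic, hard-linking unchanged files one at a time rather than linking whole folders; the paths and the modification-date test are simplified assumptions.

```swift
import Foundation

// A minimal sketch of the hard-link strategy, assuming a previous backup at
// lastBackup, a new (empty) backup folder at newBackup, and the date of the
// last backup. Real directory hard links are an HFS+ feature created by
// backupd itself; here unchanged items are hard-linked file by file instead.
func backUp(source: URL, lastBackup: URL, newBackup: URL, since lastDate: Date) throws {
    let fm = FileManager.default
    try fm.createDirectory(at: newBackup, withIntermediateDirectories: true)
    for name in try fm.contentsOfDirectory(atPath: source.path) {
        let src = source.appendingPathComponent(name)
        let old = lastBackup.appendingPathComponent(name)
        let new = newBackup.appendingPathComponent(name)
        var isDirectory: ObjCBool = false
        fm.fileExists(atPath: src.path, isDirectory: &isDirectory)
        if isDirectory.boolValue {
            // Recurse: unchanged children become hard links, changed ones are copied.
            try backUp(source: src, lastBackup: old, newBackup: new, since: lastDate)
        } else if let modified = try fm.attributesOfItem(atPath: src.path)[.modificationDate] as? Date,
                  modified <= lastDate, fm.fileExists(atPath: old.path) {
            try fm.linkItem(at: old, to: new)   // unchanged: hard link into the new backup
        } else {
            try fm.copyItem(at: src, to: new)   // new or changed: copy the data across
        }
    }
}
```

Because hard links all refer to the same data on disk, each apparently complete backup costs little more storage than the changed files it actually contains.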

Time Machine 1.0+

By the time that we were using macOS 10.12 Sierra, this had become a bit more sophisticated. For a start, the initiation of each backup was no longer simply a matter of a time interval: it had become a task scheduled by two intricate, interlocking systems, DAS (Duet Activity Scheduler) and CTS (Centralized Task Scheduling). These aimed to have the backup made at an appropriate moment roughly every hour, depending on what else was going on. Unfortunately, DAS had a bug at the time which caused automatic backups to fail when a Mac ran continuously for several days; that wasn’t fixed until High Sierra.

[Diagram: the Time Machine backup process in macOS 10.12 Sierra.]

Here’s a more detailed account of that sequence reconstructed from the log.

[Diagram: the detailed HFS+ backup sequence, reconstructed from the log.]

The first task is to determine the destination backup folder, then to work out what needs to be backed up on this occasion. For HFS+ volumes, Time Machine’s backupd examines the hidden FSEvents records stored at the root of each volume, which record the changes made to its files and folders. If it can’t find a ‘proper’ FSEvents record, it performs a deep traversal of the volume to construct one. Deep traversals can take a long time, in many cases over an hour, leading to cancellation of the next scheduled backup.
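A deep traversal amounts to walking the entire volume and comparing each item against the last backup. As a rough Swift sketch of what that involves, assuming only that the date of the last successful backup is known:

```swift
import Foundation

// Walk the whole volume and collect anything modified since the last backup.
// This stands in for what backupd does when no usable FSEvents record exists.
func deepTraversal(of volume: URL, since lastBackup: Date) -> [URL] {
    var changed: [URL] = []
    let keys: [URLResourceKey] = [.contentModificationDateKey]
    guard let enumerator = FileManager.default.enumerator(
        at: volume, includingPropertiesForKeys: keys) else { return changed }
    for case let item as URL in enumerator {
        let values = try? item.resourceValues(forKeys: Set(keys))
        if let modified = values?.contentModificationDate, modified > lastBackup {
            changed.append(item)   // file or folder changed since the last backup
        }
    }
    return changed
}
```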

Once backupd knows what needs to be backed up, it calculates the disk space required, allowing for overhead or ‘padding’. That figure is compared with the free space on the backup destination: if there is room, the backup proceeds; if not, an error is thrown.
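A much-simplified version of that space check might look like this in Swift; the 20% padding factor is purely an assumption, as the overhead backupd actually allows isn’t documented.

```swift
import Foundation

// Does the destination have room for the changed bytes plus an assumed 20% padding?
func destinationHasRoom(for changedBytes: Int64, at destination: URL) throws -> Bool {
    let padded = Int64(Double(changedBytes) * 1.2)
    let values = try destination.resourceValues(
        forKeys: [.volumeAvailableCapacityForImportantUsageKey])
    guard let free = values.volumeAvailableCapacityForImportantUsage else { return false }
    return free >= padded
}
```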

backupd then copies changed items to the backup destination, and makes hard links to all the unchanged files and folders.

With the backup itself complete, backupd turns to maintenance, identifying which old backups have expired according to its rules. Those are then deleted, in a process which can take longer than making the backup itself. Once that task is done, backupd reports that the backup is complete, and signals that back to CTS and DAS so that the next automatic backup can be scheduled.
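Apple’s published retention rules keep hourly backups for 24 hours, daily backups for a month, and weekly backups beyond that. Here’s a simplified Swift sketch of thinning along those lines; the real rules also take free space on the destination into account.

```swift
import Foundation

// Which backups have expired under the published retention rules: keep every
// hourly backup for 24 hours, one backup a day for a month, one a week after that.
func expiredBackups(_ backups: [Date], now: Date = Date()) -> [Date] {
    let calendar = Calendar.current
    var keptDays = Set<DateComponents>()
    var keptWeeks = Set<DateComponents>()
    var expired: [Date] = []
    for backup in backups.sorted() {                        // oldest first
        let age = now.timeIntervalSince(backup)
        if age < 24 * 60 * 60 {
            continue                                        // within 24 hours: always kept
        } else if age < 30 * 24 * 60 * 60 {
            let day = calendar.dateComponents([.year, .month, .day], from: backup)
            if keptDays.insert(day).inserted { continue }   // first backup of its day is kept
        } else {
            let week = calendar.dateComponents([.yearForWeekOfYear, .weekOfYear], from: backup)
            if keptWeeks.insert(week).inserted { continue } // first backup of its week is kept
        }
        expired.append(backup)
    }
    return expired
}
```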

For Macs that couldn’t access their normal Time Machine backup volume, such as laptops on the move, Apple introduced Mobile Time Machine in Mac OS X 10.7. That took tens of thousands of lines of code to perform what became so easy with the introduction of Apple’s new file system, APFS: making a snapshot of a volume’s metadata.

Snapshots are simply copies of all the information within the file system about the directories and files on the volume. As long as the original file data from the time of the snapshot remains accessible on the volume, it takes but the twinkling of an eye to revert the file system to that state, and no files have to be copied in the process.
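There’s no public API for creating APFS snapshots, but the same mechanism is exposed through the tmutil command; this Swift sketch simply wraps it to make a local snapshot and then list those already stored on the startup volume.

```swift
import Foundation

// Run tmutil with the given arguments and return its standard output.
@discardableResult
func tmutil(_ arguments: [String]) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/tmutil")
    process.arguments = arguments
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    process.waitUntilExit()
    return String(decoding: pipe.fileHandleForReading.readDataToEndOfFile(), as: UTF8.self)
}

// Make a local snapshot of the APFS volumes, then list those on the startup volume.
try tmutil(["localsnapshot"])
print(try tmutil(["listlocalsnapshots", "/"]))
```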

Time Machine 1.3

In High Sierra, Time Machine started using APFS snapshots, where available, instead of FSEvents to determine what required backing up. The general scheme then changed to that shown below.

[Diagram: the Time Machine backup process in macOS 10.14, using APFS snapshots.]

Using log entries, it’s possible to construct a more detailed account.

[Diagram: the detailed APFS backup sequence, reconstructed from the log.]

backupd is scheduled using the same mechanism, which is now less prone to failure than it was in the past. It starts in the normal way, by determining the destination for the backup. Having recognised that it has an APFS volume to back up, it then follows a different sequence.

The preparatory sequence identifies and deletes expired local snapshots. According to Apple’s Support Note, these local snapshots are kept for 24 hours, and you should expect to find a full 24 hours of snapshots at any time.

Once that maintenance is complete, backupd makes a fresh snapshot, which is saved to the backup folder. This is set as the ‘stable’ snapshot, and mounted ready for access. The last snapshot, made an hour earlier during the Time Machine backup, is then mounted too as the ‘reference’, and backupd determines which files and folders should be backed up by comparing those two snapshots, as stored in their backup folders.
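backupd works from the snapshots’ file system metadata; as a crude illustration of the same idea, here’s a Swift sketch which walks one mounted snapshot and compares each item against its counterpart in the other, treating anything new or different as needing to be backed up. The mount points and the size/date comparison used are assumptions, not backupd’s actual method.

```swift
import Foundation

// Compare the 'stable' (current) snapshot mount against the 'reference'
// (previous) one, returning the relative paths of items that have changed.
func changedItems(reference: URL, stable: URL) -> [String] {
    let fm = FileManager.default
    let keys: Set<URLResourceKey> = [.contentModificationDateKey, .fileSizeKey]
    var changed: [String] = []
    guard let enumerator = fm.enumerator(at: stable,
                                         includingPropertiesForKeys: Array(keys)) else { return changed }
    for case let item as URL in enumerator {
        let relative = String(item.path.dropFirst(stable.path.count))
        let counterpart = URL(fileURLWithPath: reference.path + relative)
        let new = try? item.resourceValues(forKeys: keys)
        let old = try? counterpart.resourceValues(forKeys: keys)   // nil if absent from reference
        if old == nil
            || new?.fileSize != old?.fileSize
            || new?.contentModificationDate != old?.contentModificationDate {
            changed.append(relative)   // new or changed since the reference snapshot
        }
    }
    return changed
}
```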

backupd then checks that there is sufficient free space on the backup destination and, if there is, performs the same process as with HFS+, copying changed items and making hard links to the rest. That is followed by new steps, which save a clone family cache to the new backup folder, and back-up-later caches there too. The precise purpose of these isn’t yet clear, although the latter may well list files which changed while the backup was being made.

The two snapshots are then unmounted, their job complete, and the latest snapshot is marked as the next to be used as the reference snapshot when backupd is next run.

A second local snapshot is then made, showing the state of the file system on completion of the backup. This appears to be saved to the local snapshot store rather than copied to the backup, and is intended for use when the full Time Machine backup isn’t available. It is accessible in the Time Machine app: for an APFS volume, over the previous 24 hours you should find two backups made every hour instead of one. The first represents that in the backup destination, and the second (a few minutes later) is the local snapshot.

backupd finally performs the same maintenance routines on the backup folder, to remove old expired backups, on completion of which it reports that the backup is complete.
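The whole sequence above can be followed in Console or with the log command. This Swift snippet wraps log show to pull recent Time Machine entries from the unified log; the subsystem filter and time window shown are just starting points you may need to adjust.

```swift
import Foundation

// Retrieve the last two hours of Time Machine entries from the unified log.
let process = Process()
process.executableURL = URL(fileURLWithPath: "/usr/bin/log")
process.arguments = ["show", "--last", "2h", "--info",
                     "--predicate", "subsystem == \"com.apple.TimeMachine\""]
let pipe = Pipe()
process.standardOutput = pipe
try process.run()
process.waitUntilExit()
print(String(decoding: pipe.fileHandleForReading.readDataToEndOfFile(), as: UTF8.self))
```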

Time Machine 2.0

As we prepare to upgrade to Catalina this autumn/fall, for many of us the only disks still in HFS+ format are our Time Machine backups. Anyone wanting to take advantage of, say, an SSD RAID array for high-speed backups still has to format it in HFS+, and all our backups become creaky and error-prone over time because HFS+ barely copes with the load. This worries some users to the point where they’re driven to rebuild the file system directories on their backups every few months to try to keep them running sweetly.

The next release of Time Machine is on its way, though. Whether it will arrive early next year during Catalina’s cycle isn’t clear at present, but Apple’s engineers gave clues at WWDC of where it’s now headed.

APFS snapshots appear to be the key to the future Time Machine 2.0, and Apple is testing out their principles in a new version of Apple Software Restore, asr, in Catalina. This will be able to use snapshot deltas, listing the changes made to the file system between snapshots, to perform a partial restore. Another important consideration is that this works well in the face of file system-level encryption, now standard in all Macs equipped with a T2 chip.

At present, asr should do this, but without Time Machine’s Finder-like interface. It could be feasible to use Catalina’s new firmlinks, the bidirectional links used to wire paths together in the Volume Groups that implement the read-only system volume. But I think those will be reserved for that specific purpose and not used within Time Machine backups.

I think it extremely unlikely that Apple intends adding directory hard links to APFS just to support the old and ailing method of showing backups, as that would only perpetuate the problems which they cause. I also find it hard to believe that Apple specified APFS without them but lacked a plan for implementing Time Machine backups on its new file system.

Hopefully some time in the next year we should see how Apple is going to replace the current wonderful but flawed illusion generated by Time Machine backups.