Time Machine to APFS: Understanding backups

In the first article in this series on how Time Machine makes backups to APFS volumes, I looked at how backing up has changed since Time Machine was introduced. Before going on to explore how Big Sur backs up to APFS volumes in detail, it’s essential that you understand the three different schemes which Apple has used for backing up. I hope this may clear up what appears to be common confusion.

Pure HFS+ backup

The original backup scheme used in Time Machine, from its release in Mac OS X 10.5 until 10.12 Sierra, was based entirely on hard links to files and folders in the backup. Time Machine (backupd) first worked out what needed to be backed up by examining the log of changes made to the source volume, which is kept in its FSEvents database. Those files and folders which had changed were marked as requiring backup, and were then copied to the backup volume.

[Diagram: tmhfsbackup]

In the example shown in the diagram, one file has changed, NewDoc, which is contained in a hierarchy of two folders. To create those in the backup, the two folders have to be created, and NewDoc copied across into the correct place inside Folder2. However, OldDoc, and OldFolder with its contents, OldDocs, have remained unchanged since the last backup. For the sake of simplicity and economy, Time Machine therefore doesn’t copy OldDoc or OldFolder across, but merely makes hard links to the copies which already exist in the previous backup.

When you browse that backup in the Finder, what you see appears identical to the source, although only one of those files has actually been copied, together with two new folders.
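To make the idea concrete, here is a minimal sketch in Python of a backup which copies changed files and hard-links unchanged ones to the previous backup. It rests on several assumptions of my own: the function names are hypothetical, changes are detected by comparing file contents rather than by reading FSEvents, and only files are hard-linked, because most file systems (unlike HFS+) don't allow hard links to directories. It isn't how backupd is implemented, just an illustration of the principle.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional


def backup_with_hard_links(source: Path, previous: Optional[Path],
                           new_backup: Path) -> List[str]:
    """Build new_backup from source, returning the relative paths copied.

    Files identical to those in the previous backup are hard-linked to it
    instead of being copied. Folders are always recreated, as directory
    hard links are an HFS+ speciality which this sketch doesn't attempt.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source):
        rel_dir = Path(dirpath).relative_to(source)
        (new_backup / rel_dir).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = Path(dirpath) / name
            dst = new_backup / rel_dir / name
            old = previous / rel_dir / name if previous is not None else None
            # Assumption: a byte-for-byte comparison stands in for FSEvents.
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # unchanged: link to the previous backup
            else:
                shutil.copy2(src, dst)   # new or changed: copy the data
                copied.append((rel_dir / name).as_posix())
    return sorted(copied)


def demo():
    """Recreate the example: OldDoc and OldFolder unchanged, NewDoc added."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "source"
        (source / "OldFolder").mkdir(parents=True)
        (source / "OldDoc").write_text("old")
        (source / "OldFolder" / "OldDocs").write_text("older")
        first = root / "backup1"
        backup_with_hard_links(source, None, first)

        (source / "Folder1" / "Folder2").mkdir(parents=True)
        (source / "Folder1" / "Folder2" / "NewDoc").write_text("new")
        second = root / "backup2"
        copied = backup_with_hard_links(source, first, second)
        # True when both backups point at the same file on disk.
        shared = os.path.samefile(first / "OldDoc", second / "OldDoc")
        return copied, shared


if __name__ == "__main__":
    print(demo())
```

In this sketch the second backup copies only NewDoc, while OldDoc in both backups is the same file on disk. Because it can't link OldFolder as a whole, it has to recreate that folder and link each file inside it, which is exactly the overhead that directory hard links spared Time Machine on HFS+.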

Back up APFS to HFS+

With the release of macOS 10.13 High Sierra, users started backing up APFS volumes to HFS+ backups. Although the basic scheme remained unchanged and reliant on HFS+ hard links to both files and folders, Apple changed the way that Time Machine determined what needed to be backed up, and added snapshots as a bonus.

Instead of looking for changes in the FSEvents log, Time Machine made a snapshot of the APFS volume, which was stored on that same volume (the source). It then compared that with the snapshot made at the time of the last backup to work out what had changed, and so discovered what needed to be backed up. Not only was the snapshot used to work out what needed to be copied into the backup, but it was also retained on the volume being backed up for up to 24 hours. This provided another means for users to restore the contents of that volume very quickly, by rolling back to a previous snapshot.
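The comparison of two snapshots can be sketched as a simple set operation. In this illustration each snapshot is modelled as a dictionary mapping a file path to some marker of its contents (a version label here); that model, and the name snapshot_diff, are my own assumptions and not the way that APFS or Time Machine represent snapshots internally.

```python
from __future__ import annotations


def snapshot_diff(previous: dict[str, str],
                  current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots, each modelled as {path: content marker}."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    changed = sorted(p for p in current
                     if p in previous and current[p] != previous[p])
    return {"added": added, "changed": changed, "deleted": deleted}


def needs_backup(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Paths which must be copied: everything new or changed."""
    diff = snapshot_diff(previous, current)
    return sorted(diff["added"] + diff["changed"])


if __name__ == "__main__":
    last_backup = {"OldDoc": "v1", "ChangedDoc": "v1"}
    now = {"OldDoc": "v1", "ChangedDoc": "v2", "NewDoc": "v1"}
    print(needs_backup(last_backup, now))
```

Anything absent from the result, such as OldDoc in this usage example, doesn't need to be copied and can be hard-linked to the previous backup instead.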

[Diagram: tmapfshfsbackup]

The diagram above shows how snapshots were added into the existing scheme relying on hard links.

Although snapshots are a bonus to the user, because they’re stored on the volume being backed up, any serious error or failure of the source volume would be likely to destroy both the original files and their local snapshots. However, as those snapshots are a feature of APFS, they can’t usefully be copied to the backup, which still has to use HFS+ with its near-unique support for directory hard links.

Apple also appears to have discovered that making a snapshot ‘diff’ isn’t always as quick or accurate as using FSEvents, and in Mojave provides a range of different methods, from which Time Machine chooses the one most appropriate for the disk being backed up.

Back up APFS to APFS

The big problem which Apple had to overcome in supporting APFS for storing backups is that its new file system only supports hard links to files, not directories, so requiring a completely new structure for the backup. The solution which Apple appears to have developed (I still haven’t seen this officially documented) is to construct a snapshot on the backup volume.

Normally, snapshots are stored on the volume whose file system they copy. A snapshot includes all the directory and file structure, but none of the data which makes up the files. While that snapshot remains on that disk, changed and deleted blocks of file data are kept rather than being returned for reuse, as the snapshot refers to those old blocks to reconstitute the files which it retains. This is explained in the following diagram.

[Diagram: tmsnapshot]

In this example, a snapshot was made a little time ago. That contains the file system metadata (red), which includes the two folders (pale yellow) and references the data for two files: OldDoc (blue) and ChangedDoc. At that time, the latter included two blocks of data, ChangedDoc1 (red) and ChangedDoc2 (blue).

The user then creates a new document, NewDoc (green), and edits ChangedDoc so that it’s stored in one new block (green) and one old block (blue).

At that stage, rolling back to the snapshot would remove NewDoc, which was created since the snapshot was made, and it would restore the old (red) first block of ChangedDoc. The data specific to that snapshot therefore consists of the file system, the snapshot itself, and the original first block of ChangedDoc.
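The same example can be worked through in a toy model. This sketch assumes a much-simplified volume in which each file is a list of block numbers, a snapshot is a frozen copy of that file table, and a block is only freed once neither the live file system nor any snapshot refers to it. The Volume class and its methods are hypothetical and bear no relation to APFS's real on-disk structures.

```python
class Volume:
    """Toy copy-on-write volume: files are lists of block ids."""

    def __init__(self):
        self.blocks = {}      # block id -> bytes
        self.files = {}       # file name -> list of block ids
        self.snapshots = {}   # label -> frozen copy of the file table
        self._next_id = 0

    def _alloc(self, data):
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id

    def _collect(self):
        """Free blocks referenced by neither the live files nor a snapshot."""
        tables = [self.files] + list(self.snapshots.values())
        live = {b for table in tables for ids in table.values() for b in ids}
        for block_id in list(self.blocks):
            if block_id not in live:
                del self.blocks[block_id]

    def write_file(self, name, chunks):
        self.files[name] = [self._alloc(c) for c in chunks]
        self._collect()

    def replace_block(self, name, index, data):
        # Copy on write: the old block is never overwritten in place.
        ids = list(self.files[name])
        ids[index] = self._alloc(data)
        self.files[name] = ids
        self._collect()

    def read_file(self, name):
        return b"".join(self.blocks[b] for b in self.files[name])

    def snapshot(self, label):
        self.snapshots[label] = {n: list(ids) for n, ids in self.files.items()}

    def snapshot_only_blocks(self, label):
        """Blocks kept alive solely because the snapshot still refers to them."""
        in_snapshot = {b for ids in self.snapshots[label].values() for b in ids}
        in_live = {b for ids in self.files.values() for b in ids}
        return in_snapshot - in_live

    def rollback(self, label):
        self.files = {n: list(ids) for n, ids in self.snapshots[label].items()}
        self._collect()


if __name__ == "__main__":
    vol = Volume()
    vol.write_file("OldDoc", [b"blue"])
    vol.write_file("ChangedDoc", [b"red", b"blue"])
    vol.snapshot("earlier")
    vol.write_file("NewDoc", [b"green"])
    vol.replace_block("ChangedDoc", 0, b"green")
    print(vol.snapshot_only_blocks("earlier"))
    vol.rollback("earlier")
    print(sorted(vol.files), vol.read_file("ChangedDoc"))
```

After the edits, the only block held purely for the snapshot's benefit is the original first block of ChangedDoc. Rolling back removes NewDoc and restores that block, and the two newer blocks are then released for reuse.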

What Time Machine does when creating a backup is to make a regular snapshot of the volume which is being backed up, and copy that to the backup volume. It then works out the files and blocks which need to be copied to the backup to add all the new and changed data since the previous backup, copies those across, and joins them up to the file system in the snapshot it has just saved on the backup volume. (It’s possible that Time Machine only copies a snapshot ‘diff’, but log entries indicate that it’s the whole snapshot which is copied, rather than just a derivative.)

[Diagram: tmapfsbackup]

In this case, the file data which needs to be added to the file system metadata in the copied snapshot consists of just NewDoc; unchanged files (and blocks of files) aren’t copied, but the snapshot refers to files which exist in the previous backup. This achieves similar economy and efficiency to the old HFS+ scheme using hard links, but without using links. Instead, references to existing data are made directly within the file system in the snapshot.

There’s a clear advantage to this new scheme in that it functions not just with whole files, but with changed blocks within files. Just as a snapshot references the data blocks which make up each file, so a snapshot-based backup can back up individual blocks which have changed, which can be significantly more efficient in the storage space required.
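The saving from working with blocks rather than whole files is easy to estimate. The following sketch splits two versions of a file into fixed-size blocks and reports which would need to be copied. The 4096-byte block size, the function names, and the use of SHA-256 digests to spot changes are all assumptions for illustration; APFS already knows which blocks a snapshot references, so has no need to hash anything.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Digest of each fixed-size block in data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes,
                   block_size: int = BLOCK_SIZE) -> List[int]:
    """Indexes of blocks in new which differ from, or don't exist in, old."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [i for i, digest in enumerate(new_digests)
            if i >= len(old_digests) or digest != old_digests[i]]


def bytes_to_copy(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Total size of the blocks which a block-level backup would copy."""
    return sum(len(new[i * block_size:(i + 1) * block_size])
               for i in changed_blocks(old, new, block_size))


if __name__ == "__main__":
    before = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE   # two blocks
    after = b"C" * BLOCK_SIZE + b"B" * BLOCK_SIZE    # first block edited
    print(changed_blocks(before, after), bytes_to_copy(before, after), len(after))
```

For the two-block file in that usage example, a scheme which works with whole files has to copy all 8192 bytes, whereas one which works with blocks copies only the first block of 4096 bytes.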

This new scheme not only retains hourly snapshots on the source volume, which are still kept for up to 24 hours, but provides its backups in the form of snapshots on the backup volume, where the file system data are stored in addition to the snapshot itself.

The final piece of magic used by Time Machine backups to APFS volumes is that its snapshots can be made not only for whole volumes, but for individual folders within a volume. If you want to back up just your Home Documents folder, Time Machine will do that, rather than having to back up your complete Data volume.