When making copies or backups of our files, it’s vital to make clear distinctions between different methods. There’s a tendency to use the terms copy, clone and backup as if they mean essentially the same thing. This article explains how they differ, and their consequences.
Copy is the generic term, and applies widely to many different processes. You can copy individual files from one folder to another, to a different storage medium, and between remote systems. Although copying should always be faithful to the data in files, it’s less reliable when it comes to their metadata and the precise representation of the data. Much of this stems from the capabilities of different file systems, and it’s useful to consider some examples from APFS.
Copy a file within an APFS volume and you’ll normally not duplicate the data at all, but end up with a clone file, which initially shares the same data in storage as the original. As that copy diverges from its original, APFS will normally store only the changed data separately, until eventually the data of those two files are completely separate.
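The way a clone file shares data with its original can be illustrated with a toy model. This is only a sketch of the copy-on-write idea, not how APFS is implemented: the names ToyVolume, ToyFile and shared_block_count are hypothetical, and the 4-byte block size is chosen purely to keep the example readable.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # unrealistically small, only to keep the demo readable


class ToyVolume:
    """Hypothetical block store; an assumption for illustration, not APFS."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}  # block id -> stored data
        self._next_id = 0

    def store(self, data: bytes) -> int:
        """Store data in a newly allocated block and return its id."""
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id


class ToyFile:
    """A file is just an ordered list of block ids on a volume."""

    def __init__(self, volume: ToyVolume, block_ids: List[int]) -> None:
        self.volume = volume
        self.block_ids = block_ids

    @classmethod
    def create(cls, volume: ToyVolume, data: bytes) -> "ToyFile":
        ids = [volume.store(data[i:i + BLOCK_SIZE])
               for i in range(0, len(data), BLOCK_SIZE)]
        return cls(volume, ids)

    def clone(self) -> "ToyFile":
        # Only the list of block ids is copied; no data is duplicated.
        return ToyFile(self.volume, list(self.block_ids))

    def write_block(self, index: int, data: bytes) -> None:
        # Copy-on-write: changed data goes to a new block, so any other
        # file still pointing at the old block is unaffected.
        # (Unreferenced blocks are never freed in this toy model.)
        self.block_ids[index] = self.volume.store(data)

    def read(self) -> bytes:
        return b"".join(self.volume.blocks[i] for i in self.block_ids)


def shared_block_count(a: ToyFile, b: ToyFile) -> int:
    """Number of blocks that both files still have in common."""
    return len(set(a.block_ids) & set(b.block_ids))
```

In this model, cloning a three-block file stores no new blocks, and changing one block of the clone adds just one new block to the volume, leaving the other two shared and the original file untouched.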
Copy a sparse file within the same APFS volume, or to another APFS volume, and it should remain in this special space-saving format. Do the same from an APFS volume to HFS+, which doesn’t support the format, and the sparse file explodes to its full size. That could mean that a 20 MB original is copied as a 50 GB file full of empty space.
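One way to see whether a copy has kept its sparse format is to compare a file’s logical size with the space actually allocated to it. The following is a minimal sketch, assuming a Unix-like system such as macOS or Linux where os.stat reports st_blocks in 512-byte units; the function names are hypothetical. It’s only a heuristic, as other features, such as file system compression, can also make the allocated size smaller than the logical size.

```python
import os
from typing import NamedTuple


class SizeInfo(NamedTuple):
    logical: int    # size the file claims to be, in bytes
    allocated: int  # space actually allocated in storage, in bytes


def size_info(path) -> SizeInfo:
    """Return logical and allocated sizes for the file at path."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on macOS and Linux; it isn't
    # available on Windows.
    return SizeInfo(st.st_size, st.st_blocks * 512)


def looks_sparse(path) -> bool:
    """Heuristic: True when less space is allocated than the logical size."""
    info = size_info(path)
    return info.allocated < info.logical


def make_sparse_candidate(path, size: int) -> None:
    """Create an empty file extended to size bytes without writing any data.

    Whether the result is really stored sparsely depends on the file system.
    """
    with open(path, "wb") as f:
        f.truncate(size)
```

Running size_info() on the original and on its copy shows whether the copy has exploded: on a file system without sparse file support, the allocated size of the copy should be close to its full logical size.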
Non-standard file metadata, particularly extended attributes (xattrs), are even less predictable in their behaviour. Some xattrs are never copied, others always, provided the destination supports some means of storing them, which varies between different file systems.
The most faithful copying occurs when the source and destination storage use the same file system and no network transfer is involved, but every method of copying should always preserve the data in the file.
Cloning a volume or a whole disk is a special form of copying in which the aim is to create a perfect replica of the original, including a file system identical in every respect, with identical data and metadata. Because it’s usually performed between two different storage units, it isn’t normally expected that contents will be stored at exactly the same locations, and in the past it has been used to defragment data and free space, for instance by cloning a volume to another disk, then cloning it back to the original.
Cloning HFS+ is fairly straightforward, and has been longstanding practice which isn’t reliant on specific tools. Because APFS volumes can have snapshots and inaccessible directories, and in special circumstances aren’t free-standing from other volumes, there’s now only one tool capable of cloning volumes, Apple Software Restore, asr. This can be used in a two-step process on a disk image made from the original volume, or directly.
Cloning a general-purpose APFS volume should be relatively straightforward using asr, but when it comes to the volumes most users want to clone, the System or Data volume from a macOS boot volume group, this becomes more complex. In Big Sur and later, System and Data volumes are joined together by special bi-directional firmlinks, which are crucial to their function. A System or Data volume that isn’t correctly firmlinked to its partner simply can’t function.
As these firmlinks are created during the creation of the paired volumes, adding them later may not be possible. If you were to try the old trick of cloning a Data volume to external storage and cloning it back again, then its firmlinks would be lost and the volume left non-functional. As the asr documentation explains, there’s also more to a boot volume group than just the firmlinked System and Data volumes: Preboot and Recovery volumes are also required for the group to be bootable.
The end result is that cloning boot volume groups may remain possible using asr, but for Big Sur and later, particularly when working with the even more complex container layout of internal storage in Apple silicon Macs, cloning can’t be relied upon any more, and a different approach is required, based on copying rather than cloning, and quite possibly ending up migrating user data from a copy or backup.
A backup is a type of copy with a specific purpose: to be able to restore part or all of the contents of the copy. Unlike file-by-file copies, a backup doesn’t have to be stored in the same format as the original, so long as the backup software has a means of restoring the original from that backup.
Time Machine in its two forms is a good example.
In its original form backing up to HFS+, Time Machine copies all files that have changed since the previous backup. It also creates large numbers of hard links to those directories and files which haven’t changed, to create the impression that each backup is a complete copy of the source. Restoring from that is relatively simple, as when those hard links are to be copied back, they already point to the backup copy of each file.
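The hard-link scheme described above can be sketched in a few lines. This is a simplified illustration rather than Time Machine’s own method: the function incremental_backup is hypothetical, it decides whether a file has changed by comparing contents instead of using Time Machine’s change tracking, and it hard-links only files, as ordinary APIs don’t allow the directory hard links that Time Machine uses on HFS+. It also assumes that both backups are on the same volume, as hard links can’t cross volumes.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def incremental_backup(source: Path, previous: Optional[Path],
                       destination: Path) -> None:
    """Back up source into destination.

    Files unchanged since the previous backup become hard links to the
    previous backup's copy; new or changed files are copied in full.
    Empty directories are ignored in this sketch.
    """
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        old_file = previous / rel if previous is not None else None
        if (old_file is not None and old_file.is_file()
                and filecmp.cmp(src_file, old_file, shallow=False)):
            # Unchanged: add another name for the data already backed up.
            os.link(old_file, dst_file)
        else:
            # New or changed: store a fresh copy, with its metadata.
            shutil.copy2(src_file, dst_file)
```

Each backup folder then looks like a complete copy of the source, although unchanged files occupy storage only once, and restoring a file from any backup is an ordinary copy.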
A similar illusion is created by Time Machine when it backs up to APFS, only instead of using hard links to unchanged items, Time Machine uses the links inside a synthetic snapshot, assembled from the file system metadata and copied data.
In the past, other backup schemes have stored the contents of files in large databases, rather than using a conventional folder and file system. As file systems have grown, that technique has largely fallen into disuse, but could still be used to back up folders rather than large volumes.
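To show how a backup can use a quite different format from its source, here’s a minimal sketch that stores the contents of a folder in a SQLite database and restores them again. It isn’t based on any particular backup product: the table layout and the function names backup_folder and restore_folder are assumptions made for illustration, and it saves only file paths and data, not metadata such as dates, permissions or extended attributes. The database file should be kept outside the folder being backed up.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def backup_folder(source: Path, db_path: Path) -> None:
    """Store every file under source in a SQLite database at db_path."""
    with closing(sqlite3.connect(str(db_path))) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, content BLOB NOT NULL)"
        )
        for f in source.rglob("*"):
            if f.is_file():
                # Paths are stored relative to source, so the backup
                # can be restored to any destination folder.
                conn.execute(
                    "INSERT OR REPLACE INTO files (path, content) VALUES (?, ?)",
                    (f.relative_to(source).as_posix(), f.read_bytes()),
                )
        conn.commit()


def restore_folder(db_path: Path, destination: Path) -> None:
    """Recreate the backed-up files and folders under destination."""
    with closing(sqlite3.connect(str(db_path))) as conn:
        for rel, content in conn.execute("SELECT path, content FROM files"):
            target = destination / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(content)
```

The backup here is a single database file that can’t be browsed as folders and files, yet it still qualifies as a backup because restore_folder() can rebuild the original contents from it.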
I hope these explanations are helpful, particularly when you’re preparing your plan for upgrading macOS in the coming weeks or months.