Backing up a Mac, or just making a faithful copy of files stored on one of its volumes, isn’t as easy as it might seem. Underneath the surface of the two native Mac file systems lurk crocodiles ready to catch the unwary who assume that their files can readily be copied to other file systems. This article considers some of those problems, so that you know what you’ve got to deal with.
HFS+ may be an old file system, but it holds two nasty surprises for anyone trying to copy it. The more prominent are extended attributes, arbitrary metadata that are normally stored separately from the file’s data. These began in Classic Mac days as resource forks, which are still used, but are now only one of hundreds of types of extended attribute.
Extended attributes may carry special flags indicating whether they’re ephemeral, and should be ignored when copying, or whether they should be preserved. Unfortunately, that system isn’t widely used, and is little known too. Unless the copy preserves enduring extended attributes, and reattaches them when the files are copied back, some files will be irreparably damaged.
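Apple’s flag scheme, declared in the header xattr_flags.h, can encode these flags as a suffix after a `#` at the end of the attribute’s name, as in `com.apple.lastuseddate#PS`. The Python sketch below parses such a suffix and applies a simple copying policy. The letter meanings are assumptions recalled from that header and should be checked against it; `preserve_when_copying()` and the name `com.example.cache#N` are hypothetical, and the sketch ignores any defaults the system may apply to attributes whose names carry no suffix.

```python
from __future__ import annotations

# ASSUMPTION: letter codes as recalled from Apple's <xattr_flags.h>;
# check them against the header on your Mac before relying on them.
FLAG_LETTERS = {
    "P": "no_export",
    "C": "content_dependent",
    "N": "never_preserve",
    "S": "syncable",
    "B": "only_backup",
}


def split_xattr_name(name: str) -> tuple[str, set[str]]:
    """Split an xattr name into its base name and any flags after '#'."""
    base, sep, suffix = name.rpartition("#")
    if not sep:
        # No '#' in the name, so no flags are encoded in it.
        return name, set()
    # Unknown letters are ignored rather than treated as errors.
    return base, {FLAG_LETTERS[c] for c in suffix if c in FLAG_LETTERS}


def preserve_when_copying(name: str) -> bool:
    """Hypothetical policy: keep every attribute not flagged never_preserve."""
    _, flags = split_xattr_name(name)
    return "never_preserve" not in flags


for attr in ("com.apple.lastuseddate#PS", "com.example.cache#N"):
    print(attr, preserve_when_copying(attr))
# com.apple.lastuseddate#PS True
# com.example.cache#N False
```

A real copier would apply a policy like this to each name returned when listing a file’s extended attributes, then copy only those it keeps.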
The other, and more serious, problem is directory hard links, although you’re only likely to encounter them when copying Time Machine backups, which depend on them. HFS+ is one of very few file systems that use these, as most self-respecting file system engineers realise the problems they produce. A directory hard link works the same as a normal hard link, only it links to a whole directory instead of a file. Time Machine backups to HFS+ volumes invariably contain large numbers of them, as a single link to an unchanged directory avoids having to create hard links to every file inside it, while preserving the illusion that each backup is a complete copy of the original. Saner file systems that don’t support them have no equivalent, which is one reason why Time Machine can’t convert existing backups on HFS+ to its new APFS format.
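Copy tools handle ordinary file hard links by remembering each source file’s device and inode numbers, and recreating a link rather than a second copy when the same inode turns up again; a tool that doesn’t do this expands every link into a full duplicate. The Python sketch below shows that bookkeeping for file hard links only. It’s a simplified illustration that doesn’t handle symbolic links to directories, ownership or extended attributes, and it can’t reproduce directory hard links, because few file systems other than HFS+ allow them to be created at all. The function name is hypothetical.

```python
from __future__ import annotations

import os
import shutil


def copy_tree_preserving_hard_links(src_root: str, dst_root: str) -> int:
    """Copy src_root to dst_root, recreating file hard links instead of
    duplicating their data. Returns the number of links recreated."""
    seen: dict[tuple[int, int], str] = {}  # (st_dev, st_ino) -> first copy
    links_made = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            st = os.lstat(src)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                # Same inode met before: link to its existing copy.
                os.link(seen[key], dst)
                links_made += 1
                continue
            # First sighting: copy data and basic metadata.
            shutil.copy2(src, dst, follow_symlinks=False)
            if st.st_nlink > 1:
                seen[key] = dst
    return links_made
```

Because each inode is copied only once, the copy occupies no more space than the original, however many links point to each file.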
One minor footnote here is that HFS+ also supports compressed files, although those are confined to system volumes, as Apple never made them available to third parties.
As a modern file system, APFS uses several techniques that have equivalents on other modern file systems, but which often work slightly differently there, and so require conversion. The good news is that it doesn’t support directory hard links; the bad news is that extended attributes are becoming even more widely used, to the point where few documents now lack them. Most, though, are quarantine or macl xattrs, which are considered ephemeral and normally stripped when copying files to another volume. Compressed files are still supported, but remain confined to system files.
The two remaining problems, which affect copying from any APFS volume, are space-saving devices: clone files and sparse files.
Clone files are normally created when copying or duplicating a file within the same volume (file system), and work like hard links, except that the two references to the same data have different inodes. When a cloned file is modified, the file system maintains the clone relationship as long as it can, saving only the changed storage blocks, until every block differs between the two files and they are no longer clones at all.
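How a clone drifts apart from its original can be pictured with a toy model in which each file is just a list of references to storage blocks. This Python sketch is not how APFS is implemented; it only illustrates the copy-on-write bookkeeping, in which cloning copies the references but not the data, and each write gives the modified file a newly allocated block. All names in it are hypothetical. `blocks_used()` also shows why treating clones as separate files costs space: it counts each shared block once, whereas a copy that ignores cloning stores every block of both files.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count

_block_ids = count()  # hypothetical allocator handing out new block numbers


@dataclass
class ToyFile:
    blocks: list[int] = field(default_factory=list)


def create(n_blocks: int) -> ToyFile:
    return ToyFile([next(_block_ids) for _ in range(n_blocks)])


def clone(f: ToyFile) -> ToyFile:
    # A clone shares every block with its source: no data is copied.
    return ToyFile(list(f.blocks))


def write_block(f: ToyFile, index: int) -> None:
    # Copy-on-write: the modified block goes to newly allocated storage.
    f.blocks[index] = next(_block_ids)


def shared_blocks(a: ToyFile, b: ToyFile) -> int:
    return len(set(a.blocks) & set(b.blocks))


def blocks_used(*files: ToyFile) -> int:
    # Storage actually consumed: each distinct block counts once.
    return len(set().union(*(f.blocks for f in files)))


original = create(4)
copy = clone(original)
print(shared_blocks(original, copy), blocks_used(original, copy))  # 4 4
write_block(copy, 0)
write_block(copy, 1)
print(shared_blocks(original, copy), blocks_used(original, copy))  # 2 6
write_block(copy, 2)
write_block(copy, 3)
print(shared_blocks(original, copy), blocks_used(original, copy))  # 0 8
```

Only in the final state, when the two files share nothing, does their storage match the 8 blocks that a clone-unaware copy would use from the start.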
Detecting clone files has only recently become possible, and even now it’s almost impossible to tell how much data is common to the original and its clone. Copying strategies therefore have to assume both are different files with no common data, which expands the size of the copy, and prevents clones from being reconstituted when copied back to APFS. That can lead to significant increases in the storage space required.
Sparse files, which have equivalents on many other modern file systems, have to be created using specific system calls, and then store only their non-empty data, so the file they represent can be very much larger than the storage they actually occupy. Just a few MB of data in a sparse file can represent a raw file of hundreds of GB.
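On many Unix file systems, a hole is left when a program seeks beyond the end of a file before writing, and that gap then reads back as zeros without occupying storage. This Python sketch uses that approach and compares the file’s logical size with the space allocated to it, assuming `st_blocks` is reported in 512-byte units, as it is on Linux and macOS. It’s an illustration rather than a guaranteed way of making sparse files: whether a hole is actually created depends on the file system and the calls used, so on some volumes the two figures will be close. The function names are hypothetical.

```python
from __future__ import annotations

import os
import tempfile

HOLE = 100 * 1024 * 1024  # 100 MiB of nothing before the data


def make_sparse_file(path: str, hole_size: int, payload: bytes) -> None:
    """Write payload after seeking past a hole of hole_size bytes."""
    with open(path, "wb") as f:
        f.seek(hole_size)  # move past end of file without writing
        f.write(payload)


def sizes(path: str) -> tuple[int, int]:
    """Return (logical size, allocated bytes) for path."""
    st = os.stat(path)
    # ASSUMPTION: st_blocks counts 512-byte units.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.bin")
    make_sparse_file(path, HOLE, b"data")
    logical, allocated = sizes(path)
    print(f"logical {logical:,} bytes, allocated {allocated:,} bytes")
```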
At first, sparse files appeared to be uncommon, but they’re now quite widely used by databases and in other situations, and their raw ‘original’ can be extremely large. Unless they’re converted to an equivalent on the other file system, and converted back to sparse files when restored, a copy or restore can easily exceed the free storage available on the disk. They can also be extremely slow to copy: they’re expanded to their full non-sparse size when read, so copying a sparse file of 5 MB representing a 500 GB original takes the time expected for 500 GB, not that for 5 MB, as all the empty data is transferred too.
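A copier can avoid reading and writing all those zeros if the operating system will report where the data lies. Linux and macOS provide the `SEEK_DATA` and `SEEK_HOLE` options to `lseek()` for this, which Python exposes as `os.SEEK_DATA` and `os.SEEK_HOLE` where the platform supports them. The sketch below copies only the data regions, writing each at the same offset in the destination, then sets the destination’s length so that any trailing hole is kept. It assumes the source isn’t changing during the copy, and whether the destination ends up sparse depends on its file system; on one without hole support, the source simply appears as a single data region and is copied in full. The function name is hypothetical.

```python
import errno
import os

CHUNK = 1024 * 1024  # copy data regions in 1 MiB pieces


def copy_sparse(src: str, dst: str) -> int:
    """Copy src to dst, skipping holes. Returns the bytes of data copied."""
    copied = 0
    fd_in = os.open(src, os.O_RDONLY)
    try:
        size = os.fstat(fd_in).st_size
        fd_out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            offset = 0
            while offset < size:
                try:
                    # Start of the next region containing data.
                    data_start = os.lseek(fd_in, offset, os.SEEK_DATA)
                except OSError as e:
                    if e.errno == errno.ENXIO:
                        break  # only a hole remains up to end of file
                    raise
                # End of that region: the next hole, or end of file.
                data_end = os.lseek(fd_in, data_start, os.SEEK_HOLE)
                pos = data_start
                while pos < data_end:
                    chunk = os.pread(fd_in, min(CHUNK, data_end - pos), pos)
                    if not chunk:
                        break
                    # Writing at the same offset leaves gaps as holes.
                    os.pwrite(fd_out, chunk, pos)
                    pos += len(chunk)
                    copied += len(chunk)
                offset = data_end
            # Extend to the full logical size so a trailing hole is kept.
            os.ftruncate(fd_out, size)
        finally:
            os.close(fd_out)
    finally:
        os.close(fd_in)
    return copied
```

Comparing the returned count with `os.path.getsize(dst)` shows how much reading and writing was avoided; the file system may report data regions in whole blocks, so the count can exceed the bytes actually written by the original program.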
I have a free app for working with sparse and clone files: Sparsity.
The final structure, which presents an insoluble problem, is effectively confined to APFS volumes used to store Time Machine backups. There, each backup is a snapshot, complete with all the linked data for its contents. There’s currently no method available to copy snapshots from their APFS volume to any other volume. As Time Machine backups to APFS storage consist entirely of snapshots, there’s no way to copy a backup to another volume, regardless of its file system.
If you do intend to use another file system to store copies or backups from a Mac, you’ll need to check how it handles each of these crocodiles, both when copying to that storage and when copying back to your Mac. If you don’t, one day you’ll be bitten. One popular solution, used for Time Machine backups on alien file systems, is the sparsebundle, but that opens up yet another can of worms, with the occasional crocodile among them.