Inside APFS: from containers to clones

This article is an attempt to explain some of the key features in APFS, as of macOS Ventura. Older file systems are relatively simple and straightforward to use; modern file systems like ZFS, Btrfs and APFS are much richer in features, many of which challenge our understanding.

Partition table

Whatever the type of storage, hard disk or solid-state, its space needs to be organised to store files and metadata. At the top level of each disk is its partitioning scheme, dividing storage into large contiguous blocks for use with file systems. Conventional usage refers to these as partitions, but in APFS they’re also known as containers. The scheme now used universally for macOS is the GUID Partition Table (GPT), shown diagrammatically below with the start of the storage at the top.

[Diagram: GUID Partition Table (GPT) layout]

Near the start of the storage is the Primary GPT Header, containing the table that maps where the partitions are on the disk. This is repeated at the end of the storage as the Secondary GPT Header, which should always describe exactly the same partitions.

In the header, there’s an initial block containing information about the storage as a whole, followed by an entry for each partition. Those entries specify the type of partition, give each its own unique GUID/UUID, give the start and end locations of that partition, its attributes, and name. Following the header and its list of entries are the partitions themselves, each containing file-system specific data.
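To make those partition entries concrete, here's a minimal Python sketch that packs and unpacks a single entry using the 128-byte layout from the UEFI specification. It works on a synthetic entry rather than reading a real disk. The helper names (build_entry, parse_entry, header_lbas, partition_bytes) are hypothetical, and the 512-byte block size is an assumption, as some disks use 4096-byte blocks.

```python
import struct
import uuid

# One GPT partition entry, per the UEFI specification (128 bytes, little-endian):
# type GUID, unique GUID, first LBA, last LBA (inclusive), attribute flags, UTF-16LE name.
ENTRY_FORMAT = "<16s16sQQQ72s"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 128

# Partition type GUID that identifies an APFS container.
APFS_TYPE = uuid.UUID("7C3457EF-0000-11AA-AA11-00306543ECAC")


def build_entry(type_guid, unique_guid, first_lba, last_lba, attrs, name):
    """Encode a partition entry (hypothetical helper, used here for test data)."""
    return struct.pack(
        ENTRY_FORMAT,
        type_guid.bytes_le,  # GPT stores GUIDs in mixed-endian order
        unique_guid.bytes_le,
        first_lba,
        last_lba,
        attrs,
        name.encode("utf-16-le"),  # struct pads the name out with zero bytes
    )


def parse_entry(raw):
    """Decode one 128-byte partition entry into a dict."""
    type_guid, unique_guid, first_lba, last_lba, attrs, name = struct.unpack(
        ENTRY_FORMAT, raw[:ENTRY_SIZE]
    )
    return {
        "type": uuid.UUID(bytes_le=type_guid),
        "guid": uuid.UUID(bytes_le=unique_guid),
        "first_lba": first_lba,
        "last_lba": last_lba,
        "attributes": attrs,
        "name": name.decode("utf-16-le").rstrip("\x00"),
    }


def header_lbas(total_blocks):
    """Primary header follows the protective MBR at LBA 0; the secondary is the last block."""
    return 1, total_blocks - 1


def partition_bytes(entry, block_size=512):
    """Partition size in bytes; last_lba is inclusive, hence the + 1."""
    return (entry["last_lba"] - entry["first_lba"] + 1) * block_size


entry = parse_entry(build_entry(APFS_TYPE, uuid.uuid4(), 40, 2087, 0, "Container"))
print(entry["type"] == APFS_TYPE, entry["name"], partition_bytes(entry))
# prints: True Container 1048576
```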

Container

In HFS+ each volume, with its own file system, is a separate partition. If you want to change the size of a volume, that requires changing the disk’s partition table, which may be impossible without losing data. This also means that HFS+ volumes can’t share free space.

In APFS, partitions are known as containers, which have a fixed size and don’t share storage with other containers. Within each container are one or more volumes, each containing its own file system and sharing the same space within that container. An APFS container stores all the higher-level information common to the file systems within it. These include volume metadata, snapshots, and provision for space management and crash protection.
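The difference in how free space is shared can be modelled in a few lines. This is a toy Python sketch, not how APFS is implemented: FixedPartition and Container are hypothetical classes that track only byte counts, with no blocks or metadata.

```python
from dataclasses import dataclass, field


class OutOfSpace(Exception):
    pass


@dataclass
class FixedPartition:
    """HFS+-style volume: its capacity is fixed by the partition table."""
    capacity: int
    used: int = 0

    def write(self, nbytes):
        if self.used + nbytes > self.capacity:
            raise OutOfSpace("partition full, whatever space other partitions have")
        self.used += nbytes


@dataclass
class Container:
    """APFS-style container: every volume draws on the same pool of free space."""
    capacity: int
    volumes: dict = field(default_factory=dict)  # volume name -> bytes used

    def add_volume(self, name):
        self.volumes[name] = 0

    def free(self):
        return self.capacity - sum(self.volumes.values())

    def write(self, volume, nbytes):
        if nbytes > self.free():
            raise OutOfSpace("container full")
        self.volumes[volume] += nbytes

    def delete(self, volume, nbytes):
        # Freed space returns to the shared pool, available to any volume.
        self.volumes[volume] -= min(nbytes, self.volumes[volume])
```

With a 1000-unit container holding volumes "Data" and "Projects", writing 600 units to Data and then 300 to Projects leaves 100 free, whereas a 500-unit fixed partition refuses a 600-unit write outright.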

Each APFS container has one instance of the Space Manager, a major feature of APFS that keeps track of free space within the container and allocates and frees storage blocks on demand. A container also has one instance of the Reaper, which manages the deletion of objects too large to be deleted within a single file system transaction. It tracks the deletion state of those large objects so they can be removed across multiple transactions.
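The Reaper’s job, spreading the deletion of one large object over several transactions, can be sketched with a toy model. The SpaceManager and Reaper classes below are hypothetical simplifications: free space is a Python set of block numbers, and each "transaction" may free at most a fixed batch of blocks, with the Reaper remembering how far it got.

```python
from collections import deque


class SpaceManager:
    """Toy space manager: tracks free block numbers in one container."""

    def __init__(self, total_blocks):
        self.free = set(range(total_blocks))

    def allocate(self, count):
        if count > len(self.free):
            raise RuntimeError("container full")
        blocks = sorted(self.free)[:count]
        self.free.difference_update(blocks)
        return blocks

    def release(self, blocks):
        self.free.update(blocks)


class Reaper:
    """Toy reaper: frees a large object's blocks one batch per transaction."""

    def __init__(self, space_manager, batch):
        self.sm = space_manager
        self.batch = batch
        self.pending = deque()  # [object id, blocks still to free]

    def schedule(self, obj_id, blocks):
        self.pending.append([obj_id, list(blocks)])

    def run_transaction(self):
        """Free up to `batch` blocks; unfinished work carries over to the next call."""
        budget, freed = self.batch, 0
        while budget and self.pending:
            entry = self.pending[0]
            step = entry[1][:budget]
            del entry[1][:budget]
            self.sm.release(step)
            freed += len(step)
            budget -= len(step)
            if not entry[1]:
                self.pending.popleft()  # this object is now fully deleted
        return freed


sm = SpaceManager(total_blocks=100)
big_file = sm.allocate(60)
reaper = Reaper(sm, batch=25)
reaper.schedule("big_file", big_file)
print([reaper.run_transaction() for _ in range(4)])  # prints: [25, 25, 10, 0]
```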

Volume

An APFS volume contains file system directories, file metadata and file data. Each has its own superblock, containing the location of the root file system tree, the extent reference tree, and the snapshot metadata tree, as well as the volume object map.

Objects stored on disk are never modified in place, a major departure from HFS+. Instead, a copy of the object is modified and written out to a new location on disk. This is the overriding principle of copy-on-write, and it applies both to objects stored by the file system and within the file system itself.
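Copy-on-write can be illustrated with a toy object store in Python. CowStore is hypothetical: the "disk" is an append-only list and the object map is a dict from object ID to location. Each update writes a new copy first and only then repoints the map, so the previous version is never overwritten. If a crash came between those two steps, the map would still point at an intact old version, and a saved copy of the map can still read old versions, loosely like a snapshot.

```python
class CowStore:
    """Toy copy-on-write store: objects are never modified in place."""

    def __init__(self):
        self.disk = []  # append-only storage; an index is a "location"
        self.omap = {}  # object id -> location of its current version

    def write(self, oid, value):
        self.disk.append(value)  # 1. write the new copy to a fresh location
        self.omap[oid] = len(self.disk) - 1  # 2. only then repoint the map
        return self.omap[oid]

    def read(self, oid):
        return self.disk[self.omap[oid]]

    def checkpoint(self):
        """Save the current map; the versions it points to stay intact."""
        return dict(self.omap)

    def read_at(self, saved_map, oid):
        return self.disk[saved_map[oid]]


store = CowStore()
store.write("readme", "v1")
saved = store.checkpoint()
store.write("readme", "v2")
print(store.read("readme"), store.read_at(saved, "readme"), store.disk)
# prints: v2 v1 ['v1', 'v2']
```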

Hard links

These are available in both HFS+ (which also supports directory hard links) and APFS, which only supports hard links to files. The Finder can’t create them, so they’re normally made in Terminal using a command like:
ln /Users/myname/Movies/myMovie.mov /Users/myname/Documents/Project1/myNewMovie.mov

That command creates a second entry in the file system pointing to the same file data. The file system keeps a count of those references to determine when to delete the file, so when you’ve finished using a hard link, you can put it in the Trash without the original being deleted. Only when no references to that file remain will it be deleted from the file system.
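The reference counting described above can be seen from Python’s standard os module, whose os.link() wraps the same link() system call that ln relies on. This sketch runs in a temporary folder, and the file names simply echo the example command. It creates a hard link, confirms both names share one inode with a link count of 2, then removes the original name and reads the data back through the link.

```python
import os
import tempfile


def demo_hard_link(folder):
    """Create a file, hard-link it, delete the original name, read via the link."""
    original = os.path.join(folder, "myMovie.mov")
    link = os.path.join(folder, "myNewMovie.mov")
    with open(original, "wb") as f:
        f.write(b"movie data")
    os.link(original, link)  # equivalent to: ln original link
    same_inode = os.stat(original).st_ino == os.stat(link).st_ino
    count_before = os.stat(link).st_nlink  # two names, one file
    os.remove(original)  # drop one name, as emptying the Trash would
    with open(link, "rb") as f:
        data = f.read()  # the data survive through the other name
    count_after = os.stat(link).st_nlink  # one reference left
    return same_inode, count_before, count_after, data


with tempfile.TemporaryDirectory() as folder:
    print(demo_hard_link(folder))  # prints: (True, 2, 1, b'movie data')
```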

Hard links look and work exactly like the original file, and can be moved around freely within the same volume. Copy one to another volume, though, and the copy will be a complete unlinked file. Hard links to files and to directories are one of the essential ingredients of Time Machine backups on HFS+, but as APFS doesn’t support directory hard links, Time Machine has to use a different backup format when stored on APFS.

Clones

Duplicate or copy a file in HFS+ and a new entry is made in the file system for the copy, and all the data in the original file are copied to a new storage area to create a different file. Whenever it can, APFS doesn’t copy any data at all, but creates a clone file instead. This resembles a hard link, in that the file record points to the same data as the original, but a clone is a separate file with its own inode.

Conditions which have to be met for macOS to create a clone are:

  • both the original and copy files must be on the same APFS volume, so sharing the same file system;
  • copying must be performed using one of the two copyItem() methods in FileManager, copyItem(at:to:) or copyItem(atPath:toPath:).

In practice, these include all copies and duplicates made within the same volume by the Finder, and most made by apps. This also applies to whole folders, provided they’re copied according to these rules.

Where this gets confusing is that the Finder doesn’t tell you that the duplicate takes no extra space. Put three duplicates in a folder, and the Finder assures you that they take three times the space of one of them, but that isn’t true. What’s more, when Time Machine backs them up to an APFS backup store, it doesn’t copy three files, just the original and two clones. However, if you copy those three clones to a different volume, that copy doesn’t meet the requirements for cloning, and three separate files are created on the destination volume.
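The gap between what the Finder reports and what clones actually use can be shown with a toy model. ToyVolume, CloneFile and Extent are hypothetical and greatly simplified: each clone gets its own inode number but shares reference-counted extents with its original, and writing to a shared extent gives that file a new extent of its own rather than changing the shared one.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Extent:
    data: bytes
    refs: int = 1  # number of files using this extent


@dataclass
class CloneFile:
    inode: int
    extents: list


class ToyVolume:
    """Toy model of APFS-style clones on a single volume."""

    def __init__(self):
        self.files = {}
        self._inodes = count(1)

    def create(self, name, data):
        self.files[name] = CloneFile(next(self._inodes), [Extent(data)])

    def clone(self, src, dst):
        """New file with a new inode, but no data copied: the extents are shared."""
        original = self.files[src]
        for extent in original.extents:
            extent.refs += 1
        self.files[dst] = CloneFile(next(self._inodes), list(original.extents))

    def write(self, name, index, data):
        """Copy on write: this file gets a fresh extent; other files keep the old one."""
        f = self.files[name]
        f.extents[index].refs -= 1  # an extent with no refs is no longer counted
        f.extents[index] = Extent(data)

    def reported_size(self):
        """Sum of every file's size, as the Finder adds it up."""
        return sum(len(e.data) for f in self.files.values() for e in f.extents)

    def space_used(self):
        """Each distinct extent still in use, counted once."""
        distinct = {id(e): e for f in self.files.values() for e in f.extents}
        return sum(len(e.data) for e in distinct.values())


vol = ToyVolume()
vol.create("movie.mov", bytes(1000))
vol.clone("movie.mov", "movie copy.mov")
vol.clone("movie.mov", "movie copy 2.mov")
print(vol.reported_size(), vol.space_used())  # prints: 3000 1000
vol.write("movie copy.mov", 0, b"x" * 1000)
print(vol.reported_size(), vol.space_used())  # prints: 3000 2000
```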

Sparse files

Many apps, such as databases, now work with files that are largely empty. Stored conventionally, those would take a lot of space while holding no actual data, so APFS introduces a new type of file, the sparse file. This saves space by skipping the empty data (runs of zero bytes) and storing only the contents that aren’t empty.

For this to work, the app writing the sparse file has to follow strict rules. If it assembles a block of data consisting of a few bytes of regular data, 5 GB of zero bytes, and another few bytes of regular data, writing that to a file in the normal way doesn’t create a sparse file. To write a sparse file, the app needs to work with file handles and seek to file offsets to skip over the empty data instead of writing it. Only empty data skipped using seek calls are omitted from the sparse file.
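The difference between writing empty data and seeking over it can be tried in Python, whose file objects wrap the same file handles and seek calls. This sketch uses a 64 MB gap instead of 5 GB to keep it quick, and write_sparse, write_dense and sizes are hypothetical helpers. It assumes a file system that supports sparse files, such as APFS, and that st_blocks counts 512-byte units, as it does on macOS and Linux.

```python
import os
import tempfile


def write_sparse(path, gap):
    """A few bytes, then seek past `gap` bytes without writing them, then a few more."""
    with open(path, "wb") as f:
        f.write(b"head")
        f.seek(gap, os.SEEK_CUR)  # skipped, so never stored
        f.write(b"tail")


def write_dense(path, gap):
    """The same contents written the normal way, zero bytes included."""
    with open(path, "wb") as f:
        f.write(b"head")
        f.write(bytes(gap))  # gap bytes of zeros, actually written out
        f.write(b"tail")


def sizes(path):
    """(logical size, bytes allocated on disk); st_blocks is in 512-byte units."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as folder:
    gap = 64 * 1024 * 1024
    for writer in (write_sparse, write_dense):
        path = os.path.join(folder, writer.__name__)
        writer(path, gap)
        print(writer.__name__, sizes(path))
```

Both files should report the same logical size, but on a file system with sparse support the first should allocate only a few kilobytes.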

What you end up with behaves unusually. Use the Finder’s Get Info and you’ll see that its size is 5 GB, but it only uses 8 KB on disk.

Duplicate it to fill a folder, and that will be reported as having a size of, say, 55 GB, but only taking 90 KB on disk. Results from Terminal are no more helpful: ls -la simply says that each of those sparse files is 5 GB in size.
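A rough survey of a folder, comparing the total size that ls -la reports with the space actually allocated, can be sketched with os.walk and os.lstat. survey() is a hypothetical helper and only a heuristic: any file allocating less than its logical size is flagged as likely sparse, which might also catch compressed files. Clones aren’t detected at all, because each clone’s st_blocks normally reports its full allocation.

```python
import os
import stat


def survey(folder):
    """Return (logical bytes, allocated bytes, paths that look sparse) for a folder tree."""
    logical = allocated = 0
    likely_sparse = []
    seen = set()
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)  # don't follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue  # regular files only
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # count hard-linked files once
            seen.add(key)
            on_disk = st.st_blocks * 512  # st_blocks uses 512-byte units
            logical += st.st_size
            allocated += on_disk
            if on_disk < st.st_size:
                likely_sparse.append(path)
    return logical, allocated, likely_sparse
```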

Time taken for each of these operations is a good indicator of whether APFS has kept the sparse file, or exploded it to full size. Creating, moving and copying a sparse file takes an instant; the moment a progress indicator appears, you know that the sparse file has exploded.

Duplicating or moving a sparse file within the same APFS volume retains its sparseness. Originally, copying a sparse file between volumes, or using cp in Terminal, could lose its sparseness. Now, copying sparse files between APFS volumes, even on different disks, should retain their format. Sparse files should also be preserved when backed up using Time Machine to APFS, but other backup utilities may not be as successful.
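One way an app can avoid exploding a sparse file when copying it is to re-create the holes itself. This Python sketch reads the source in chunks and seeks over any chunk that is entirely zero instead of writing it. sparse_copy() and the 64 KB chunk size are hypothetical choices. Because it doesn’t read the source’s actual holes, it also turns any stored runs of zeros into holes, and it only helps if the destination file system supports sparse files. Where the platform supports them, lseek() with SEEK_HOLE and SEEK_DATA offers a more faithful way to find the original holes.

```python
import os

CHUNK = 64 * 1024  # hypothetical chunk size


def sparse_copy(src, dst, chunk=CHUNK):
    """Copy src to dst, leaving a hole wherever a whole chunk is zeros."""
    zeros = bytes(chunk)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            if buf == zeros[: len(buf)]:
                fout.seek(len(buf), os.SEEK_CUR)  # skip: nothing is written here
            else:
                fout.write(buf)
        # A trailing hole writes nothing, so set the final length explicitly.
        fout.truncate(fout.tell())
```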

Sparse files invariably explode to full size when copied to a different file system, for instance when backing up from APFS to HFS+, even when the destination file system offers its own sparse file format. There currently appears to be no solution: compressing sparse files breaks their format, and when decompressed they explode to full size.

If you want to experiment with sparse files, or survey folders for clones and sparse files, try my free utility Sparsity.

Key point summary

  • HFS+ volumes are fixed-size partitions of a disk.
  • APFS volumes vary in size and share space within a single partition or container.
  • Clone files are different from hard links: each clone is a separate file that shares its data with the original.
  • Clones can only exist within the same volume, but are preserved in Time Machine backups to APFS.
  • Sparse files are a special format containing only non-empty data, requiring special creation.
  • Sparse files are preserved across APFS volumes, even between different disks, and in Time Machine backups to APFS.
  • Sparse files explode to full size if not (re)written correctly, or when transferred to other file systems.
  • Never compress sparse files, as they will explode to full size during compression or when decompressed.