Inside the file system: 3 APFS containers and volumes

The previous articles in this series have explained the basic division of storage into partitions, with a GUID partition table, then outlined the current implementation of Apple’s older file system HFS+. This concluding article attempts to do the same for the replacement for HFS+, Apple File System or APFS. In writing this I have been constrained by the limited information provided in Apple’s documentation.

Whereas HFS+ is implemented as a file system placed directly in a GPT partition, so that each partition holds a single volume, APFS adds an intermediate layer, the container, within which there can be multiple volumes, each containing its own file system and sharing the same space within the container. This causes great confusion over terms: for the sake of clarity, in the rest of this article I will avoid referring to partitions as much as possible, and use the APFS terms container and volume.

Container

An APFS container stores all the higher-level information which is common to the file systems within it. This includes the volume metadata, snapshots, and provision for space management and crash protection.

Several copies of the container superblock are held at any time, including copies from the immediate past. The first copy in a container is primarily used to locate the Checkpoint area, where the other copies are held; the most recent of those can then be accessed to get a current view of the container and its data.
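As a rough illustration of how the most recent copy might be chosen, the sketch below scans candidate blocks and picks the one that looks like a container superblock and has the highest transaction identifier. The 32-byte object header and the NXSB magic follow my reading of the Apple File System Reference; the function names are my own, and a real reader would also verify each block's checksum, which is omitted here.

```python
import struct

NX_MAGIC = b"NXSB"  # container superblock magic as it appears on disk
# Object header: checksum (8 bytes), object id, transaction id, type, subtype
OBJ_HEADER = struct.Struct("<8sQQII")


def parse_candidate(block: bytes):
    """Return (xid, block) if the block looks like a container superblock, else None."""
    if len(block) < OBJ_HEADER.size + 4:
        return None
    _cksum, _oid, xid, _type, _subtype = OBJ_HEADER.unpack_from(block, 0)
    if block[OBJ_HEADER.size:OBJ_HEADER.size + 4] != NX_MAGIC:
        return None
    return xid, block


def latest_superblock(blocks):
    """Pick the candidate with the highest transaction identifier (xid)."""
    candidates = [c for c in map(parse_candidate, blocks) if c is not None]
    if not candidates:
        raise ValueError("no container superblock found")
    return max(candidates, key=lambda c: c[0])


def make_block(xid: int, magic: bytes = NX_MAGIC) -> bytes:
    """Build a fake block, for demonstration only."""
    return OBJ_HEADER.pack(b"\0" * 8, 1, xid, 1, 0) + magic + b"\0" * 28
```

Calling latest_superblock() on a list of blocks built with make_block() returns the one with the largest xid, and ignores any block whose magic doesn't match.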

The container also holds the EFI Jumpstart, an embedded EFI driver which is used to boot a Mac from that container. The container superblock also records the location of other container structures, including the container object map and a list of volumes.

The Checkpoint area contains a record of in-memory state together with a copy of the container superblock at that moment. It stores ephemeral objects to provide protection in the event of a crash.

Each APFS container has exactly one instance of the Space Manager. This is a major feature of APFS which keeps track of free space within the container, allocating and freeing storage blocks on demand. It contains bitmaps which it uses to record and manage free space.
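To give a feel for how a bitmap can record free space, here is a toy allocator in which each bit stands for one block. It is only a sketch of the general technique: the class and method names are my own, and it makes no attempt to reproduce the Space Manager's actual on-disk structures.

```python
class BitmapAllocator:
    """Toy free-space tracker: one bit per block, 1 = in use."""

    def __init__(self, block_count: int):
        self.block_count = block_count
        self.bits = bytearray((block_count + 7) // 8)

    def _is_used(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self) -> int:
        """Mark the first free block as used and return its number."""
        for block in range(self.block_count):
            if not self._is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise MemoryError("container is full")

    def free(self, block: int) -> None:
        """Mark a block as free again."""
        if not self._is_used(block):
            raise ValueError(f"block {block} is already free")
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def free_count(self) -> int:
        """Count the blocks that are currently free."""
        return sum(not self._is_used(b) for b in range(self.block_count))
```

Because every volume in the container draws on the same allocator, space freed by one volume immediately becomes available to the others.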

A container also has exactly one instance of the Reaper, which manages the deletion of objects which are too large to be deleted within a single file system transaction. This tracks the deletion state of those large objects so they can be removed across multiple transactions.
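The idea of spreading one deletion over several transactions can be sketched as follows. This is a simplified model under my own assumptions: ReapJob, reap_step and the per-transaction budget are hypothetical names, not structures taken from Apple's documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ReapJob:
    """Deletion state for one large object (hypothetical structure)."""
    blocks: list                                # blocks still waiting to be freed
    freed: list = field(default_factory=list)   # blocks already released

    @property
    def done(self) -> bool:
        return not self.blocks


def reap_step(job: ReapJob, budget: int) -> int:
    """Free at most `budget` blocks in one transaction; return how many were freed."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    batch, job.blocks = job.blocks[:budget], job.blocks[budget:]
    job.freed.extend(batch)
    return len(batch)


def reap_all(job: ReapJob, budget: int) -> int:
    """Run transactions until the object is gone; return the number of transactions."""
    transactions = 0
    while not job.done:
        reap_step(job, budget)
        transactions += 1
    return transactions
```

Since the remaining work is held in the job itself, deletion interrupted after any step can resume from where it stopped rather than starting again.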

Volume

An APFS volume contains file system directory structures, file metadata and file content. Each volume has its own superblock, which contains the location of the root file system tree, the extent reference tree, and the snapshot metadata tree, as well as the volume object map. The trees used are, as in HFS+, B-trees, and file system objects are stored in them as records. For example, a directory in the file system consists of an inode record, several directory entry records, and an extended attributes record.

Volumes can have roles assigned, of which the most important for macOS are the System and Data roles used to compose a Volume Group in macOS 10.15 and later. Volumes also record which backward-incompatible features they use. If a version of APFS finds a backward-incompatible feature which it doesn't support in use by any volume, then Apple’s instructions are that the volume mustn’t be mounted.
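A check of this kind is usually a simple bitmask test. The sketch below shows the general shape; the flag names and values are placeholders of my own, not the real constants from Apple's reference.

```python
# Placeholder flag values, for illustration only; not Apple's real constants
FEATURE_A = 0x1
FEATURE_B = 0x2
FEATURE_C = 0x4

# Incompatible features which this imaginary implementation understands
SUPPORTED_INCOMPATIBLE = FEATURE_A | FEATURE_B


def may_mount(volume_flags: int, supported: int = SUPPORTED_INCOMPATIBLE) -> bool:
    """Refuse to mount if the volume uses any incompatible feature we don't know."""
    unknown = volume_flags & ~supported
    return unknown == 0
```

Here a volume using only FEATURE_A can be mounted, but one which also uses FEATURE_C cannot, as this implementation has no way of knowing what that feature changes.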

Objects stored on disk are never modified in place, a major departure from HFS+. Instead, a copy of the object is modified and written out to a new location on disk – this is the overriding principle of copy on write which applies both to objects being stored by the file system, and within the file system itself.
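Copy on write can be modelled in a few lines. In this toy store, which is my own simplification rather than anything from APFS itself, each write goes to a fresh block and an object map is updated to point at it, so the previous version is left untouched.

```python
class CowStore:
    """Toy copy-on-write store: objects are never overwritten in place."""

    def __init__(self):
        self.blocks = {}      # block address -> bytes; written once, never changed
        self.object_map = {}  # object id -> current block address
        self.next_addr = 0

    def write(self, oid: int, data: bytes) -> int:
        """Write data to a fresh block and repoint the object at it."""
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = data
        self.object_map[oid] = addr
        return addr

    def read(self, oid: int) -> bytes:
        """Read the current version of an object through the object map."""
        return self.blocks[self.object_map[oid]]

    def snapshot(self) -> dict:
        """Freeze the current object map; old blocks stay readable through it."""
        return dict(self.object_map)
```

Writing the same object twice leaves two blocks in the store, and a copy of the object map taken between the writes still leads to the older one.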

There is also provision to protect the file system during the lengthy process of encryption. While a volume is being encrypted, the file system maintains a special recovery block which is used to recover following any system crash which might otherwise result in corruption.

Checks and repairs

The rather fragmentary information above should enable better understanding of the log returned by First Aid in Disk Utility:
Volume was successfully unmounted.
Performing fsck_apfs -y -x /dev/rdisk6s1
Checking the container superblock.
Checking the space manager.
Checking the space manager free queue trees.
Checking the object map.
Checking volume.
Checking the APFS volume superblock.
The volume ThunderBay3 was formatted by diskmanagementd (1412.81.1) and last modified by apfs_kext (1412.141.1).
Checking the object map.
Checking the snapshot metadata tree.
Checking the snapshot metadata.
Checking the extent ref tree.
Checking the fsroot tree.
Verifying allocated space.
The volume /dev/rdisk6s1 appears to be OK.
File system check exit code is 0.
Restoring the original state found as mounted.

For each volume checked, the following container areas are checked first:

  • container superblock
  • Space Manager
  • object map.

Those same areas are also checked when a container is selected for First Aid.

After those, the following areas of the volume are checked:

  • volume superblock
  • volume formatting and modification, giving the names of tools and build numbers used
  • object map
  • snapshot metadata tree and metadata
  • extent reference tree
  • root file system tree
  • allocated space.
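If you want to pull the same information out of a saved log, a short script is enough. This sketch assumes the log has the same wording as the example above, with each step on a line starting "Checking" and a final line giving the exit code; the function name is my own.

```python
import re


def summarise_fsck_log(log: str):
    """Return (areas checked, exit code or None) from fsck_apfs-style output."""
    areas = []
    exit_code = None
    for line in log.splitlines():
        line = line.strip()
        if line.startswith("Checking "):
            # Keep the name of the area, without the trailing full stop
            areas.append(line[len("Checking "):].rstrip("."))
        match = re.search(r"exit code is (\d+)", line)
        if match:
            exit_code = int(match.group(1))
    return areas, exit_code


# A few lines taken from the First Aid log shown earlier
SAMPLE = """\
Checking the container superblock.
Checking the space manager.
Checking the object map.
Checking volume.
Checking the APFS volume superblock.
File system check exit code is 0.
"""
```

Running summarise_fsck_log(SAMPLE) gives the list of areas in the order they were checked, together with the exit code as an integer.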

I hope this casts a little more light on what is going on in APFS.

Unfortunately, at present Apple’s documentation of APFS is far from complete, hampering attempts to develop third-party repair tools. Although some are starting to appear, fsck_apfs, either called directly or through Disk Utility, remains the gold standard. Data recovery services are similarly struggling to tackle the more hopeless cases.

Reference

Apple File System Reference (PDF)