APFS: Directories and names

In the first article in this series looking at APFS, I explained some basic features of files and their clones, and how they’re centred on an inode. But a file system can’t just be a soup of millions of inodes. It needs structure, both for storing all those file inodes and other objects like extended attributes (xattrs), and for the directories/folders that we use to organise the contents of volumes.

B+trees

In HFS+, this is largely accomplished using B-trees. Each of these has a root node, multiple internal nodes (like the branches of a tree), and leaves. APFS instead uses a variant called B+trees, which store data only in their leaf nodes and never in internal nodes. B+trees are widely used in modern file systems as an efficient means of structuring the large pools of objects they require.

Each node in an APFS B+tree has a table of contents listing the offset to the location of each key and value pair stored in that node. That table of contents is sorted by key, and keeps its free space at its end. The key-value pairs follow the table. All the keys come first, then the node’s free space, then all the values.
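The following minimal Python sketch of a single leaf node shows that layout concretely. It’s a toy model under stated assumptions, not APFS’s on-disk format. The class and method names are hypothetical. The table of contents is held in a Python list rather than inside the node’s buffer. Real nodes also add a header, fixed-size entries and checksums. In the sketch, keys are packed forward from the start of the buffer and values backward from its end, which leaves the free space between them. The table of contents stays sorted by key, so a lookup can binary-search it.

```python
class Node:
    """Toy B+tree leaf node: a sorted table of contents (TOC) over one buffer.

    Hypothetical simplification, not the APFS on-disk format: keys are packed
    forward from the start of the buffer, values backward from its end, so the
    free space sits between them. TOC entries hold only offsets and lengths.
    """

    def __init__(self, size: int = 4096):
        self.buf = bytearray(size)
        self.toc = []            # (key_off, key_len, val_off, val_len), sorted by key
        self.key_end = 0         # end of the key area
        self.val_start = size    # start of the value area

    def _key(self, i: int) -> bytes:
        off, length, _, _ = self.toc[i]
        return bytes(self.buf[off:off + length])

    def _search(self, key: bytes) -> int:
        """Binary-search the TOC; return the first index whose key is >= key."""
        lo, hi = 0, len(self.toc)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._key(mid) < key:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def free_space(self) -> int:
        return self.val_start - self.key_end

    def insert(self, key: bytes, value: bytes) -> None:
        if len(key) + len(value) > self.free_space():
            raise ValueError("node full")  # a real B+tree would split the node here
        i = self._search(key)
        if i < len(self.toc) and self._key(i) == key:
            raise KeyError("duplicate key")
        key_off = self.key_end
        self.buf[key_off:key_off + len(key)] = key
        self.key_end += len(key)
        self.val_start -= len(value)
        self.buf[self.val_start:self.val_start + len(value)] = value
        self.toc.insert(i, (key_off, len(key), self.val_start, len(value)))

    def lookup(self, key: bytes):
        i = self._search(key)
        if i < len(self.toc) and self._key(i) == key:
            _, _, val_off, val_len = self.toc[i]
            return bytes(self.buf[val_off:val_off + val_len])
        return None


node = Node(64)
node.insert(b"beta", b"2")
node.insert(b"alpha", b"1")
print(node.lookup(b"alpha"), node.free_space())   # b'1' 53
```

Because the table of contents is kept sorted, a lookup becomes a binary search over small fixed-size entries rather than a scan through variable-length keys.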

B+trees store file inodes and their optional records, such as file extents, xattrs, and the siblings used to implement hard links. Those records alone can’t structure files into a directory tree. That’s the job of directory records, which are stored alongside them in the volume’s file-system B+tree. Each directory consists of an inode (required), together with optional directory records, xattrs, and directory statistics.

Directory record

An xattr record contains its own name, but a file’s inode doesn’t contain the file’s name. The file is identified only by its inode number. Its name is stored in its directory record, and the path to the file is constructed from the names of the directories traversed from that volume’s root.

A directory record’s key contains the inode number of the directory it belongs to, together with the item’s name and the length of that name. The record’s value holds the item’s own inode number. The length is a 10-bit unsigned integer that counts bytes, including the terminating null character. Names are therefore limited to 1022 bytes of UTF-8 plus that terminating null. However, it’s commonly stated that the maximum in practice is 255 characters (which happens to be the maximum length of an APFS volume name), and that the maximum total path length is 1024 characters. These limits may be enforced by macOS rather than APFS.
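The small Python check below shows why that limit is counted in bytes rather than characters. The function name is hypothetical. It assumes, as described above, that the 10-bit field counts UTF-8 bytes including the terminating null. It tests only that on-disk limit, not the lower limits that macOS may enforce.

```python
MAX_LEN_FIELD = (1 << 10) - 1   # largest value a 10-bit field can hold: 1023


def fits_name_field(name: str) -> bool:
    """True if the name's UTF-8 bytes plus a terminating null fit in 10 bits."""
    return len(name.encode("utf-8")) + 1 <= MAX_LEN_FIELD


print(fits_name_field("a" * 1022))        # True: 1022 bytes + null = 1023
print(fits_name_field("a" * 1023))        # False
print(fits_name_field("\u00e9" * 512))    # False: only 512 characters, but 1024 bytes
```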

Important information contained within inode value structures includes:

  • the id of the parent directory
  • the id leading to a file’s extent records; for dataless files, that’s the file’s id
  • datestamps of creation, last modification, last modification of its attributes, last access
  • for directories, the number of children
  • for files, the number of hard links included as siblings
  • the BSD flags (as seen using chflags)
  • identifiers of the owner and group.
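The minimal Python sketch below gathers the fields listed above into one record. The class, function and field names are hypothetical. Two details come from my reading of Apple’s reference rather than from this article, so treat them as assumptions:

- timestamps are counted in nanoseconds since 1 January 1970 UTC;
- a single field holds the child count for directories and the link count for files.

The BSD flags check uses Python’s real `stat.UF_HIDDEN` constant.

```python
import stat
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def apfs_time(ns: int) -> datetime:
    """Convert a nanosecond count since 1970-01-01 UTC (assumed) to a datetime.

    datetime only holds microseconds, so the last three digits are dropped.
    """
    return EPOCH + timedelta(microseconds=ns // 1000)


@dataclass
class InodeValue:
    """Hypothetical, simplified view of the inode fields listed above."""
    parent_id: int            # id of the parent directory
    private_id: int           # id leading to the file's extent records
    create_time: int          # all four times: nanoseconds since 1970 (assumed)
    mod_time: int
    change_time: int          # last modification of attributes
    access_time: int
    is_directory: bool
    children_or_links: int    # number of children (directories) or hard links (files)
    bsd_flags: int            # flags as set using chflags
    owner: int                # owner's user id
    group: int                # group id

    @property
    def nchildren(self) -> int:
        if not self.is_directory:
            raise AttributeError("files record a link count, not children")
        return self.children_or_links

    @property
    def nlink(self) -> int:
        if self.is_directory:
            raise AttributeError("directories record children, not links")
        return self.children_or_links

    @property
    def hidden(self) -> bool:
        # UF_HIDDEN is the flag set by `chflags hidden`.
        return bool(self.bsd_flags & stat.UF_HIDDEN)


f = InodeValue(2, 16, 0, 86_400 * 10**9, 0, 0, False, 3, stat.UF_HIDDEN, 501, 20)
print(f.nlink, apfs_time(f.mod_time).date(), f.hidden)   # 3 1970-01-02 True
```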

Directory record keys additionally contain a hash of the name, an unsigned 22-bit integer. The hash is used as a proxy for the name when searching for and comparing names.
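As I read Apple’s reference, the 22-bit hash and the 10-bit name length share a single 32-bit field in the directory record key. The hash occupies the upper 22 bits and the length the lower 10. Treat that layout as an assumption here. The minimal sketch below packs and unpacks such a field. The function and constant names are hypothetical, and the hash value used is arbitrary.

```python
from typing import Tuple

LEN_MASK = 0x3FF        # low 10 bits: name length in bytes, including the null
HASH_MASK = 0x3FFFFF    # the 22-bit hash, before shifting
HASH_SHIFT = 10         # the hash sits above the 10-bit length


def pack_len_and_hash(name_len: int, name_hash: int) -> int:
    """Combine a name length and a 22-bit hash into one 32-bit value."""
    if not 0 < name_len <= LEN_MASK:
        raise ValueError("name length must fit in 10 bits")
    if not 0 <= name_hash <= HASH_MASK:
        raise ValueError("hash must fit in 22 bits")
    return (name_hash << HASH_SHIFT) | name_len


def unpack_len_and_hash(field: int) -> Tuple[int, int]:
    """Split the 32-bit value back into (name length, hash)."""
    return field & LEN_MASK, (field >> HASH_SHIFT) & HASH_MASK


# Length of "café" in UTF-8 plus the terminating null, with an arbitrary hash.
packed = pack_len_and_hash(len("caf\u00e9".encode("utf-8")) + 1, 0x2A5F1)
print(hex(packed), unpack_len_and_hash(packed))
```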

Directory and file names

These pose multiple problems with case-sensitivity and Unicode normalisation. When APFS was first announced, it was intended that it wouldn’t have anything to do with Unicode normalisation, and that names would simply be stored as UTF-8 without performing any normalisation. There was widespread concern at this, because HFS+ performs normalisation to Form D, and not performing any normalisation would cause problems for many users.

This is because many visually identical Unicode characters can be represented by more than one sequence of code points. For example, café can be encoded in UTF-8 either as 63 61 66 c3 a9 or as 63 61 66 65 cc 81. If APFS didn’t handle normalisation in some way, it would allow two files or directories with apparently identical names to exist side by side. HFS+ won’t allow that, as it normalises all file and directory names to Unicode Form D before saving them.
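Python’s standard `unicodedata` module can reproduce those two encodings and show that normalising both to Form D makes them identical. Python applies standard Unicode NFD. HFS+ uses a slightly different variant, so this illustrates the principle rather than HFS+’s exact behaviour.

```python
import unicodedata

precomposed = "caf\u00e9"     # é as a single code point, U+00E9
decomposed = "cafe\u0301"     # e followed by the combining acute accent U+0301

print(precomposed.encode("utf-8").hex(" "))   # 63 61 66 c3 a9
print(decomposed.encode("utf-8").hex(" "))    # 63 61 66 65 cc 81
print(precomposed == decomposed)              # False: different code points
# After normalisation to Form D (NFD) the two strings are identical.
print(unicodedata.normalize("NFD", precomposed) == unicodedata.normalize("NFD", decomposed))  # True
```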

Initially, APFS didn’t handle normalisation and, as expected, this caused problems for users and many apps, including some of Apple’s. As a result, APFS was changed so that it now preserves both the case and the normalisation of names, yet treats names that differ only in their normalisation as the same. It achieves this by comparing hashes of file and directory names instead of normalising the stored names themselves. The case-sensitive and case-insensitive variants of APFS each behave appropriately too.

It’s far more efficient for APFS to work as much as possible with hashes of file and directory names, rather than their original Unicode representation. One of its more common tasks illustrates this: searching for a collision, to check whether a name the user wants to give a file or directory is already in use in that directory. Comparing the proposed name against every name already in the directory is slow. Instead, APFS pre-computes a hash of each name, and compares hashes as 22-bit numbers.
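The hypothetical Python index below sketches the hash-first idea. It buckets names by a 22-bit hash, so most checks are a cheap integer lookup. `zlib.crc32` is plain CRC-32, standing in here for the hash APFS actually uses, and names are compared exactly as given, without normalisation. Only 22 bits are kept, so unrelated names can share a hash. This sketch therefore confirms any hash match by comparing the names themselves.

```python
import zlib
from collections import defaultdict


class NameIndex:
    """Hypothetical directory index that compares 22-bit hashes before names."""

    def __init__(self):
        self.buckets = defaultdict(list)   # 22-bit hash -> names with that hash

    @staticmethod
    def hash22(name: str) -> int:
        # Stand-in hash: plain CRC-32 of the UTF-8 bytes, keeping the low 22 bits.
        return zlib.crc32(name.encode("utf-8")) & 0x3FFFFF

    def collides(self, name: str) -> bool:
        # Integer lookup first; full string comparison only within one bucket.
        return name in self.buckets.get(self.hash22(name), [])

    def add(self, name: str) -> None:
        if self.collides(name):
            raise FileExistsError(name)
        self.buckets[self.hash22(name)].append(name)


idx = NameIndex()
idx.add("notes.txt")
print(idx.collides("notes.txt"), idx.collides("notes2.txt"))   # True False
```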

Hash computation therefore takes into account both normalisation and case-sensitivity:

  • First, the name is normalised to Form D and, for the case-insensitive variant of APFS, its case is folded as well.
  • The resulting UTF-8 string is then converted to UTF-32, and its CRC-32C (Castagnoli) checksum is computed.
  • The lowest 22 bits of that value are then stored in the directory record, and used in place of the name for operations.
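The three steps above can be sketched in Python as follows. The CRC-32C routine uses the standard parameters: reflected polynomial 0x82F63B78, with an initial value and final XOR of 0xFFFFFFFF. Everything else is an assumption on my part:

- the function names are hypothetical;
- UTF-32 is taken as little-endian, without a terminating null;
- Python’s `unicodedata` and `casefold()` follow the Unicode standard rather than Apple’s own tables;
- Apple’s reference may specify further details that aren’t reproduced here.

Don’t expect this sketch to reproduce the values actually stored on disk.

```python
import unicodedata


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli) with the standard parameters."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def name_hash22(name: str, case_insensitive: bool) -> int:
    """Sketch of the steps above; not guaranteed to match APFS on disk."""
    s = unicodedata.normalize("NFD", name)              # step 1: Form D
    if case_insensitive:
        # casefold() stands in for APFS's own case folding (assumption).
        s = unicodedata.normalize("NFD", s.casefold())
    utf32 = s.encode("utf-32-le")                       # step 2: UTF-32 (LE assumed)
    return crc32c(utf32) & 0x3FFFFF                     # step 3: lowest 22 bits


print(hex(crc32c(b"123456789")))   # 0xe3069283, the standard CRC-32C check value
print(name_hash22("caf\u00e9", False) == name_hash22("cafe\u0301", False))   # True
print(name_hash22("Caf\u00e9", True) == name_hash22("CAFE\u0301", True))     # True
```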

This ensures that both variants of APFS preserve both normalisation and case. At the same time, they prevent most normalisation problems and support either case-sensitivity (the default for iOS) or case-insensitivity (the default for macOS). What this doesn’t do, though, is present a normalised form of names to applications. Those instead rely on a layer above the file system to normalise file and directory names. An app that works below that layer may encounter names stored in non-normalised Unicode.
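The behaviour described above separates the name that is stored from the form used for comparison. The hypothetical Python class below sketches that split. It keeps each name exactly as supplied, but rejects a new name that matches an existing one after NFD normalisation, plus case folding on a case-insensitive volume. To keep it short, it uses the normalised string itself as the comparison key rather than a 22-bit hash of it. Python’s standard NFD and `casefold()` stand in for APFS’s own rules.

```python
import unicodedata


def comparison_key(name: str, case_insensitive: bool) -> str:
    """Form used only for comparing names; never stored or shown."""
    key = unicodedata.normalize("NFD", name)
    if case_insensitive:
        key = unicodedata.normalize("NFD", key.casefold())
    return key


class PreservingDirectory:
    """Hypothetical directory: preserves names, compares normalised forms."""

    def __init__(self, case_insensitive: bool = True):
        self.case_insensitive = case_insensitive
        self.entries = {}   # comparison key -> name exactly as originally supplied

    def create(self, name: str) -> None:
        key = comparison_key(name, self.case_insensitive)
        if key in self.entries:
            raise FileExistsError(f"{name!r} clashes with {self.entries[key]!r}")
        self.entries[key] = name

    def listing(self) -> list:
        return sorted(self.entries.values())


d = PreservingDirectory(case_insensitive=True)
d.create("Caf\u00e9.txt")          # stored with precomposed é, as supplied
try:
    d.create("cafe\u0301.TXT")     # differs only in normalisation and case
except FileExistsError as err:
    print(err)
print(d.listing())                 # the original precomposed name is preserved
```

On a case-sensitive volume, the same class would accept both "Readme" and "README", but would still reject two spellings of café that differ only in normalisation.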

CRC-32C is also used in other file systems, including Btrfs and ext4. It’s quick to compute, with dedicated instructions on both Intel and Arm processors, and it detects errors better than simple additive checksums.

Directory statistics

Directory records and their xattr records provide the standard directory information, and directories often have their own xattrs. In addition, APFS provides directory statistics records. These store the total size of all files contained within that directory, including those in all its child directories. At present they appear to be rarely used. They support fast directory sizing, a feature announced with APFS that remains only partially implemented.

The value record of a directory statistics object contains the total number of files and folders in that directory (total number of children), and the total size of all files including those of its children, with any hard links counted in full. Thus, if a directory contains three hard links to a single file, the size of that file is counted three times in the total.
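The minimal Python sketch below makes that counting rule concrete over a hypothetical in-memory tree. The `Entry` class and `total_size` function are illustrative, not APFS structures. By default it counts every directory entry in full, as described above, so three hard links to one 1000-byte file add 3000 bytes. The `count_links_once` option shows the alternative of counting each inode only once.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Entry:
    """Hypothetical in-memory directory entry, not an APFS structure."""
    name: str
    inode: int
    size: int = 0                 # bytes; ignored for directories
    is_dir: bool = False
    children: List["Entry"] = field(default_factory=list)


def total_size(d: Entry, count_links_once: bool = False,
               seen: Optional[Set[int]] = None) -> int:
    """Sum file sizes beneath d, recursing into child directories.

    By default every entry counts in full, so each hard link to an inode adds
    that file's size again; count_links_once counts each inode only once.
    """
    if seen is None:
        seen = set()
    total = 0
    for child in d.children:
        if child.is_dir:
            total += total_size(child, count_links_once, seen)
        elif not (count_links_once and child.inode in seen):
            seen.add(child.inode)
            total += child.size
    return total


root = Entry("root", 2, is_dir=True, children=[
    Entry("a", 42, 1000), Entry("b", 42, 1000), Entry("c", 42, 1000),  # 3 links, 1 inode
    Entry("sub", 3, is_dir=True, children=[Entry("d", 43, 500)]),
])
print(total_size(root), total_size(root, count_links_once=True))   # 3500 1500
```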

These directory statistics records are only used with directories that have been specifically flagged to include them, and must then be included for all their children too. Together, these implement the fast directory sizing announced for APFS. Unfortunately, in normal use you’re unlikely ever to come across this feature. Apparently the flag must be set when the directory is first created, and can’t later be applied or propagated to enclosed (child) directories. That would prevent the user from moving any other directories into those featuring fast sizing, severely restricting their use and value. The only volume they could be applied to appears to be the System volume in Catalina and later, and it’s not clear whether they are even used there.

User operations on directories/folders

Different parts of macOS and apps see and access files quite differently. For APFS, primary identification is through inode numbers and inodes, for both files and directories. An item’s inode number stays the same within its file system (volume), no matter where it is in the directory tree, whether it has data or is dataless, or what its name is.

For the user and their apps, the primary means of identifying a file is through its URL. That URL is built from the series of directory names that make up its path, followed by the filename. Paths are also used in symbolic links, which commonly store the relative path to a file from the link’s location in the directory tree. When any part of a path changes in name or location, the URL is broken.

Hard links preserve access to files regardless of path or name, effectively linking directly to the inode number. However, APFS doesn’t support directory hard links, so they can’t be used to make directories independent of their location within the directory tree. Directory hard links are available in HFS+ (one of the few file systems that support them), but they’re primarily used in Time Machine backups on HFS+ storage, and seldom by users.
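That difference between path-based and inode-based access shows up at user level with Python’s standard `os` calls on macOS or Linux. The file names are arbitrary. After the original file is renamed, the hard link still reaches its data because both names share one inode, while the symbolic link, which stores only a path, is left dangling. This shows user-visible behaviour, not APFS internals.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "report.txt")
    with open(original, "w") as f:
        f.write("hello")

    hard = os.path.join(tmp, "hard.txt")
    os.link(original, hard)         # a second directory entry for the same inode
    soft = os.path.join(tmp, "soft.txt")
    os.symlink(original, soft)      # stores a path, not an inode number

    renamed = os.path.join(tmp, "renamed.txt")
    os.rename(original, renamed)    # the original path no longer exists

    same_inode = os.stat(hard).st_ino == os.stat(renamed).st_ino
    link_count = os.stat(hard).st_nlink
    with open(hard) as f:
        contents = f.read()
    symlink_broken = not os.path.exists(soft)   # exists() follows the link

print(same_inode, link_count, contents, symlink_broken)   # True 2 hello True
```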

Finder aliases and their Bookmark equivalents try to get the best of both worlds, first trying a saved URL path, then falling back to an inode approach if that fails.
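The hypothetical Python sketch below shows that path-first, inode-fallback strategy. `SavedRef`, `save_ref` and `resolve` are illustrative names. This isn’t how Finder aliases or Bookmarks are actually stored or resolved. Walking the directory tree to find a matching inode is slow, and here it only stands in for asking the file system directly.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavedRef:
    """Hypothetical stand-in for an alias or bookmark: a path plus an inode."""
    path: str
    device: int
    inode: int


def save_ref(path: str) -> SavedRef:
    st = os.stat(path)
    return SavedRef(os.path.abspath(path), st.st_dev, st.st_ino)


def resolve(ref: SavedRef, search_root: str) -> Optional[str]:
    # First try the saved path.
    if os.path.exists(ref.path):
        return ref.path
    # Fall back to finding the same inode on the same device.
    for dirpath, dirnames, filenames in os.walk(search_root):
        for name in dirnames + filenames:
            candidate = os.path.join(dirpath, name)
            st = os.lstat(candidate)
            if (st.st_dev, st.st_ino) == (ref.device, ref.inode):
                return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "draft.txt")
    open(path, "w").close()
    ref = save_ref(path)
    os.mkdir(os.path.join(tmp, "archive"))
    moved = os.path.join(tmp, "archive", "final.txt")
    os.rename(path, moved)           # path and name both change; the inode doesn't
    found = resolve(ref, tmp)
    print(found == moved)            # True
```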

At a kernel level, though, file system independence is achieved using neither URLs nor inodes, but the kernel’s own private vnodes, which I’ll return to in a future article.

Summary

  • APFS file system structures, including files, directories, file extents and xattrs, are stored and accessed using B+trees, with sorted tables of contents.
  • Directory records structure the files in a volume into a directory tree.
  • Directories have their own inode numbers and attributes, and their directory records hold the names of the files and directories they contain. Directories can also have their own xattrs.
  • APFS uses hashes of names for search and comparison.
  • Name hashing allows the preservation of Unicode normalisation and case at a file-system level, while effectively normalising comparisons using Unicode Form D, and supporting the option of case-sensitivity or -insensitivity.
  • Fast directory sizing is supported by directory statistics records, but appears almost impossible to use at present.
  • APFS primary identification uses inode numbers and inodes.
  • User primary identification uses URLs based on directory path and file names, although hard links work at an inode level, and Aliases and Bookmarks aim to use both URLs and inodes.
  • At a kernel level, primary identification uses vnodes, which are independent of file system.

Articles in this series

1. Files and clones

Reference

Apple’s APFS Reference (PDF), last revised 22 June 2020.