What should we know about APFS special files?

We may have been using APFS for nearly seven years, but some of its features remain thoroughly opaque. On Christmas Day, I posed the puzzle of 60 TB of snapshots being removed from a 2 TB disk. While we all accept that may be “technically correct”, for ordinary users it makes no sense. Suggestions that they should be “educated” miss the point that the Finder has to be accessible to all users, whether or not they have a degree in Computer Science. If my eleven-year-old granddaughter can’t make sense of it, then the Finder is a failure.

Today I turn to another thorny issue raised by the ingenuity of APFS: the size of its special file types, sparse and ‘clone’ files. As usual, I start with a practical demonstration.

Demonstration

If you’re using macOS virtual machines (VMs) on an Apple silicon Mac, one of those VMs is an excellent subject for this. If you don’t have one, you can create a read-write disk image (UDRW) using Disk Utility. Ensure that it’s in APFS format, and make it nice and large, say 25 GB. Once it has been created and mounted, unmount it, mount it again, then unmount it once more, to ensure that it’s now stored as a sparse file.

Select the VM or disk image, and use the Finder’s Get Info command to check its size.

In my case, I’ve used a 100 GB VM, whose size is given as 107 GB, although it only takes 18.47 GB on disk. Then, select the VM or disk image and press Command-D to duplicate it in the Finder. Select the duplicate, and Get Info on it.

That copy has the same size, and the same lesser space taken on disk, although the Finder duplicated it in the twinkling of an eye, which would only be possible if it had been ‘cloned’ rather than copied.

Sparse files

APFS is one of many file systems that can reduce the space taken to store some large files not by compression, but by storing only the data they need, as seen in this demonstration. Disk images, whether forming the greatest part of a VM, or as a separate file, start off as being almost entirely empty, and only grow as contents are added to them.

When macOS mounts a disk image, APFS performs a Trim on it, to gather all its free space together. When that image is saved, that free space isn’t written to the file, as it would just waste space. By writing that disk image in a special sparse file format, disk space required is reduced from slightly more than 100 GB to around 18 GB.
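The two figures that Get Info reports can also be read from a file’s metadata. What follows is a minimal Python sketch, assuming a Unix-like system on which st_blocks is counted in 512-byte units, as it is on macOS and Linux. The function names file_sizes and looks_sparse are invented for illustration.

```python
import os


def file_sizes(path):
    """Return (logical size, bytes allocated on disk) for a file."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units (true on macOS and Linux).
    return st.st_size, st.st_blocks * 512


def looks_sparse(path):
    """True when less space is allocated than the logical size implies.

    This is only a hint: other mechanisms, such as file system
    compression, can also make a file smaller on disk than its size.
    """
    logical, allocated = file_sizes(path)
    return allocated < logical
```

For the VM in the demonstration, file_sizes would be expected to return something close to the 107 GB and 18.47 GB shown in Get Info, although I haven’t verified that the two always agree exactly.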

Clone files

Although commonly known as ‘clones’, these aren’t exact copies at all, but two separate files that, initially at least, share the same data on disk. When the Finder duplicates a file for you, APFS creates the file system metadata for that new file, giving it a new inode number, but the file’s data are initially stored in the same extents as the original. As those two files change, their unique data are written to new extents on disk, and they steadily drift apart until they become completely independent.
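The way two clones start out sharing extents and then drift apart can be shown with a toy model. This Python sketch is purely illustrative: the class ToyVolume and all its methods are invented here, and it bears no relation to the real on-disk structures of APFS. It never frees unused extents, which a real file system would have to do.

```python
class ToyVolume:
    """Toy copy-on-write volume: each file is a list of extent IDs."""

    def __init__(self):
        self.extents = {}   # extent ID -> bytes stored in that extent
        self.files = {}     # inode number -> list of extent IDs
        self._next_extent = 0
        self._next_inode = 0

    def _new_extent(self, data):
        eid = self._next_extent
        self._next_extent += 1
        self.extents[eid] = data
        return eid

    def create(self, chunks):
        """Create a file from a list of byte strings, one per extent."""
        inode = self._next_inode
        self._next_inode += 1
        self.files[inode] = [self._new_extent(c) for c in chunks]
        return inode

    def clone(self, inode):
        """New metadata and inode number, but the same extents."""
        new_inode = self._next_inode
        self._next_inode += 1
        self.files[new_inode] = list(self.files[inode])
        return new_inode

    def write(self, inode, index, data):
        """Changed data go to a new extent; the other file is untouched."""
        self.files[inode][index] = self._new_extent(data)

    def read(self, inode):
        return b"".join(self.extents[e] for e in self.files[inode])

    def shared_extents(self, a, b):
        return len(set(self.files[a]) & set(self.files[b]))


vol = ToyVolume()
original = vol.create([b"AAAA", b"BBBB", b"CCCC"])
duplicate = vol.clone(original)       # shares all three extents
vol.write(duplicate, 1, b"XXXX")      # now shares only two
```

After the write, the original still reads back unchanged, and the volume holds four extents rather than the six that two full copies would need.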

The only clue given here by the Finder that two VMs or disk images are clones and share data in this way is their names. Change the name of the copy and move it away, keeping it in the same volume, and you’d never know that its data were being shared with another file, nor the identity of the original.

Recognising sparse and clone files

Aside from the intentional discrepancy reported in Get Info for sparse files, telling which are sparse and which are clones isn’t possible in the macOS GUI. To understand more, I’ll use my free utility Precize, which reports more information culled from corners of the file system.

The original disk image inside the VM has an inode number of 22513585, given in its volfs and FileRefURL paths at the top, a Disk size considerably smaller than its total file size, and ticks both the Sparse and Clone checkboxes at the foot.

The duplicated disk image has a different inode number of 24847441, identical sizes, and the same two checkboxes ticked. To the left of those checkboxes, the Ref count on each copy is 1, confirming that neither is hard-linked. Even here, using as much information as I can glean from APFS, there’s no way to tell which file has been cloned from which.
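The distinction drawn here between a hard link and a separate file can be checked with standard metadata calls. This is a minimal Python sketch, assuming a Unix-like file system that supports hard links; the helper names identity and demo are invented for illustration. These calls reveal nothing about cloning: as far as they are concerned, a clone looks just like the ordinary copy made here.

```python
import os
import shutil


def identity(path):
    """Return (inode number, link count) for a file."""
    st = os.stat(path)
    return st.st_ino, st.st_nlink


def demo(directory):
    """Create a file, a hard link to it, and an ordinary copy of it."""
    original = os.path.join(directory, "original")
    with open(original, "wb") as f:
        f.write(b"some data")

    linked = os.path.join(directory, "linked")
    os.link(original, linked)            # second name for the same inode

    copied = os.path.join(directory, "copied")
    shutil.copyfile(original, copied)    # separate file with its own inode

    return identity(original), identity(linked), identity(copied)
```

The original and its hard link report the same inode number with a link count of 2, while the copy has its own inode number and a link count of 1, matching the Ref count of 1 seen for both the VM’s disk image and its duplicate.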

Effect on disk space

The only hint of sparse files in the macOS GUI is the space they take on disk, and that could mislead the user into thinking that a VM or disk image that only takes 18.47 GB on disk can be copied to a disk with a capacity of 25 GB, for instance. This is easy to test using another disk image: create another read-write disk image with APFS as its file system, of a size sufficient to accommodate that given ‘on disk’ but too small for its full size. Try copying the original VM or disk image to it, and the Finder will refuse on the grounds that it’s too large for that disk.
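The arithmetic behind that refusal is simple enough to sketch. This Python fragment mirrors the behaviour described above and is not the Finder’s actual logic, which I can’t see. The function names fits_by_full_size and would_fit are invented for illustration, and the figures are those from my demonstration.

```python
import os
import shutil

GB = 10**9  # decimal gigabytes, as used by the Finder


def fits_by_full_size(full_size, free_space):
    """Compare the full (logical) size, not the size on disk."""
    return full_size <= free_space


def would_fit(source_path, destination_dir):
    """Apply the same test to a real file and a real destination."""
    full_size = os.stat(source_path).st_size
    free_space = shutil.disk_usage(destination_dir).free
    return fits_by_full_size(full_size, free_space)


full_size = 107 * GB            # size given in Get Info
size_on_disk = 18_470_000_000   # 18.47 GB taken on disk
capacity = 25 * GB              # the smaller test disk image

refused = not fits_by_full_size(full_size, capacity)
misleading = size_on_disk <= capacity
```

Both refused and misleading come out true: the figure for space on disk suggests the copy should fit, but the test against full size says it won’t.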

However, if you copy the VM or disk image to an APFS volume that does have sufficient free space to accommodate its full size, the space used according to Disk Utility and the Finder is considerably less than that size, although significantly larger than it takes on its original volume. In my case, for a VM originally taking 18 GB on disk, when copied to another APFS volume it used 25 GB.

If you try that out, watch the progress dialog carefully during copying. It starts by claiming that it has the full size (100+ GB) to copy, and proceeds as if that were the case. Then, as soon as the progress bar reaches the size actually taken on disk, in this case only a quarter of the way through, copying completes almost instantly. Maybe the Finder was more surprised at that than the user.

While APFS preserves sparse files when copying them to another APFS volume, that doesn’t work for other file systems such as HFS+, where the source file has to be fully expanded as it’s being copied, requiring additional time as well as the full disk space. None of this works for clone files, which can only remain cloned within the same APFS volume, of course.
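To show what preserving a sparse file during copying involves, here is a hedged Python sketch of a copy that skips the empty regions of its source. It assumes a Unix-like system on which lseek supports SEEK_DATA and SEEK_HOLE, and a destination file system that leaves unwritten regions unallocated. The function name sparse_copy is invented for illustration, and this isn’t a description of how the Finder or APFS actually copy files.

```python
import errno
import os


def sparse_copy(src, dst, chunk=1 << 20):
    """Copy src to dst, writing only the regions that contain data.

    Returns the number of bytes actually read and written.
    """
    fin = os.open(src, os.O_RDONLY)
    fout = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    copied = 0
    try:
        size = os.fstat(fin).st_size
        offset = 0
        while offset < size:
            try:
                # Find the next region holding data at or after offset.
                start = os.lseek(fin, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:
                    break  # nothing but a hole remains before end of file
                raise
            # Find where that data region ends.
            end = os.lseek(fin, start, os.SEEK_HOLE)
            pos = start
            while pos < end:
                buf = os.pread(fin, min(chunk, end - pos), pos)
                if not buf:
                    break
                view = memoryview(buf)
                while view:
                    # Write at the same offset, allowing for short writes.
                    n = os.pwrite(fout, view, pos)
                    view = view[n:]
                    pos += n
                    copied += n
            offset = end
        # Set the full logical size without writing the holes.
        os.ftruncate(fout, size)
    finally:
        os.close(fin)
        os.close(fout)
    return copied
```

On a file system that doesn’t report holes, the whole file is treated as one data region and the copy degrades to an ordinary full-size copy, which is much what happens when a sparse file is copied to HFS+.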

The benefits of sparse and clone files

In terms of disk space used, the benefits of sparse and clone files aren’t as obvious as you might like. Because of their potential to swell to full size, sparse files can’t be copied to a volume that isn’t large enough to cope with that, but once they have been copied they only require their current size on disk. In that sense, telling the user in the Get Info dialog that a sparse file only occupies a small amount of disk space can build unrealistic expectations, although currently it’s the only means in macOS for the user to discover that a file is stored in sparse format.

As far as the user is concerned, the greatest benefits come in speed of handling, and effects on SSD ‘wear’. Creating clone files is almost instant, even if they’re huge, and because of their efficiency in the use of storage extents they minimise erase-write cycles on SSD storage. Not informing the user that two files are clones of one another also avoids potential confusion that could arise if they were to think that clones behaved like hard-linked files, in that changing one of a pair of clones doesn’t change the content of the other.

User information

Sparse and clone files are essentially omitted from user documentation of macOS. One place I had expected Apple to provide information about the storage of disk images in sparse file format was in its explanation of different types of disk image and their creation. Although sparse bundles and sparse disk images are described as being “an expandable file that shrinks and grows as needed”, there’s no mention of flexibility of size for read/write disk images now that they’re stored as sparse files. The man page for hdiutil seems similarly unaware of this change, which dates back to Monterey.

A little knowledge

The problem for users with sparse and clone files, like so many of the advanced features of APFS, is that knowing just a little is dangerous. An obvious example is giving figures for space taken on disk in the Get Info dialog. Armed with that information, but without deeper understanding, a user might expect to be able to copy a sparse file of 18 GB size on disk, and a full size of 100 GB, to a volume that has only 20 GB available. Equally, they’d be surprised when that same sparse file was copied to an HFS+ volume and exploded to its full size, or it was copied over a network and took forever to transfer the full 100 GB.

These difficulties are no less for the Finder, as illustrated by the behaviour of its progress dialog when copying a sparse file to another volume. For plain files, the amount of data to be transferred is the regular file size. For a sparse file, that depends on whether the transfer mode and destination support its sparse format. Even then, the copied file may not be the same size as the source, as demonstrated above.

Magic works best when the spectator either knows nothing about the sleight of hand involved, or is another skilled magician.