Last Week on My Mac: APFS and pursuit of the free lunch

Ever since our ancestors first built shelters we’ve been in search of more storage space. The ingenuity displayed by designers of modern caravans reflects the effort we’re prepared to expend squeezing more into less, in pursuit of the most prized free lunch of them all.

The designers of APFS succumbed to the quest for that same free lunch. Apple’s brief overview for developers explains how:

  • “A clone is a copy of a file or directory that occupies no additional space on disk.”
  • “You can use this behavior, for example, to reduce storage space required for document revisions and copies.”
  • “Free Space Is Shared Between Volumes”
  • “This behavior lets files that contain blank sections, such as disk images and database dumps, be saved on disk more efficiently.”

There’s a warning, though, that this lunch might not be completely free:
“Computing the sum of each volume’s available free space isn’t a reliable way to determine the total free space within a partition. In general, check whether the space required to perform a particular operation is available on the volume, rather than trying to calculate the partition’s total free space.”
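That advice amounts to asking the destination volume whether one specific operation will fit, rather than adding up figures across volumes. Here's a minimal sketch in Python, using the standard library's `shutil.disk_usage`. The function name `has_room_for` and its `margin_bytes` reserve are my own illustrative choices, not anything Apple or APFS defines, and the answer is only a snapshot: other volumes sharing the same container can change it at any moment.

```python
import shutil

def has_room_for(path, required_bytes, margin_bytes=0):
    """Return True if the volume holding `path` reports enough free space.

    `margin_bytes` is a hypothetical safety reserve chosen by the caller.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= required_bytes + margin_bytes

if __name__ == "__main__":
    # Ask the volume containing the current directory about a 1 GB operation.
    print(has_room_for(".", 1_000_000_000))
```

For the reasons given later, `required_bytes` is best taken as the fully expanded size of whatever is being written.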

Over the last couple of weeks, I’ve been revisiting two of these techniques for saving storage space, sparse files and clones, both of which are almost entirely hidden from users and administrators by the opaque magic of APFS. In neither case does an app have direct control over whether the file system employs the technique: they’re both automatic when certain API calls are made, and only in the last few months have developers even been able to detect whether their files are stored in sparse format or have been cloned. Users are told of the free lunch, but not its hidden costs.

Sparse files are of course common features of other file systems, but their more demanding requirements mean that in many circumstances they’re little used. In a survey of my ~/Documents folder, of over 70,000 files only 22 were found to be sparse, with a total expanded size of nearly 8 GB. Calculate their sparsity ratios, equivalent to the data compression ratio of expanded size against size taken on disk, and few exceed 2.0. Some specialist files, particularly databases, far exceed that, but it’s unusual for these to cause problems even when copied to file systems which don’t support sparse formats.
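The sparsity ratio is simple to estimate. This is a rough sketch in Python, assuming that `os.stat` reports `st_blocks` in 512-byte units, as it does on macOS and Linux; the function names are my own. A ratio well above 1.0 suggests a sparse file but doesn't prove it, as other space-saving features such as transparent compression could give a similar result.

```python
import os

def sparsity_ratio(logical_bytes, allocated_bytes):
    """Expanded (logical) size divided by the size taken on disk."""
    if allocated_bytes <= 0:
        # Nothing allocated: treat an empty file as 1.0, anything else as unbounded.
        return float("inf") if logical_bytes > 0 else 1.0
    return logical_bytes / allocated_bytes

def file_sparsity(path):
    """Estimate the sparsity ratio of one file from its stat record."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on macOS and Linux (absent on Windows).
    allocated = st.st_blocks * 512
    return sparsity_ratio(st.st_size, allocated)

if __name__ == "__main__":
    # A file with an expanded size of 8 GB taking 4 GB on disk has a ratio of 2.0.
    print(sparsity_ratio(8_000_000_000, 4_000_000_000))
```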

Clones are much more common. Over 30% of the files in that same ~/Documents folder, totalling 7 GB, are clones or have at some time been cloned. But there’s no currently accessible way of determining how much that free lunch could eventually cost, as determining how much storage is allocated to more than one file is something confined to the file system, if APFS even knows. Clones are an irresistible side-effect of the efficient implementation of copy-on-write in APFS: the principle of only writing out changed blocks makes this affordable on media whose life is limited by the number of times its blocks can be erased.

Because expansion of sparse files and clones to full size is controlled by the file system, the only safe course for apps, users and administrators is to budget according to their fully expanded size, rather than that occupied on disk. Trying to fill any disk right up to its currently unused space is sooner or later going to result in conflict, when additional space is required for changes to a cloned file or for a sparse file gaining more data.

If freeing up storage doesn’t return any more usable free space, what are the real benefits then to sparse files and clones?

Apple identifies those in terms of efficiency:

  • “Clones let you make fast, power-efficient file copies on the same volume.”
  • “This behavior lets files that contain blank sections, such as disk images and database dumps, be saved on disk more efficiently.”

Cloning large files is an impressive sleight of file system. With just a few bytes being written to the file system, it looks like a whole new 100 GB file has been copied in an instant. It isn’t a real saving, of course, unless the copy functions like a hard link, and never changes. If either original or copy is changed much, this saving turns into a debt which is steadily repaid with every edit. There’s no debt incurred in speed, though, and most importantly it minimises the amount of storage which has to be erased, so reducing SSD wear and extending working life.
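How that debt accumulates can be seen in a toy model, sketched here in Python. It's purely illustrative: the `ToyVolume` class and its block numbering are my own invention, and make no claim about how APFS stores or tracks its data. It assumes only that a clone starts by sharing every block with its original, and that each changed block is written to a fresh location.

```python
class ToyVolume:
    """Toy model of block sharing between clones; not APFS's real design."""

    def __init__(self):
        self.files = {}    # file name -> list of block identifiers
        self.next_id = 0

    def _new_block(self):
        self.next_id += 1
        return self.next_id

    def create(self, name, n_blocks):
        self.files[name] = [self._new_block() for _ in range(n_blocks)]

    def clone(self, source, name):
        # Cloning copies only the list of references, not the blocks.
        self.files[name] = list(self.files[source])

    def write(self, name, index):
        # Copy-on-write: the changed block goes to a new location, and the
        # old one remains in use by any other file still referencing it.
        self.files[name][index] = self._new_block()

    def blocks_used(self):
        # Count each block once, however many files share it.
        return len({b for blocks in self.files.values() for b in blocks})

vol = ToyVolume()
vol.create("original", 100)
vol.clone("original", "copy")
print(vol.blocks_used())    # 100: the clone costs nothing yet
for i in range(30):
    vol.write("copy", i)
print(vol.blocks_used())    # 130: each changed block repays part of the debt
```

In this model, the space reported as saved when the clone was made is exactly the space that must still be available if every block is eventually changed.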

When used appropriately, sparse files are far more efficient in every respect. Because their savings aren’t tied to that specific volume but baked into the format, they can get away without ever repaying their debt on Macs which have abandoned file systems like HFS+. But for those users still backing up to or working with file systems which don’t support them, sparse files will bring shocks, such as a backup of what seemed like 2 GB which expands during copying to ten or even a hundred times that size. Those bugs need to be addressed.

The other hidden cost of both sparse files and clones is increasing complexity and user confusion. This hasn’t been helped by macOS’s recent history – since the arrival of APFS – of being unable to give a consistent and sensible account of free and used space on storage.

Although the caravan designer can fold a 2.5 metre bed into a 2 metre trailer, you still can’t expand it to full size without making the trailer bigger. There’s always some cost to that lunch after all.