Sparse files are common in APFS

If there’s one thing we’re learning about APFS, it’s that file sizes are flexible. That means that free space comes and goes, and sometimes copying 800 GB of files onto a 1 TB disk runs out of space.

Free space problems have many causes, but there are three main reasons that file sizes aren’t fixed:

  • locally-copied files are often cloned, and only become full-sized when both copies are put on a different volume;
  • some files are stored compressed, a feature first introduced in HFS+;
  • some files are stored in a special sparse form, which can squeeze 100 MB into as little as 29 KB.

This article looks at this last category, APFS Sparse Files, which have been generally thought to be rare. In Big Sur they’re not at all unusual, just difficult to identify.

How Sparse Files work

Increasing numbers of files written by all sorts of apps and services consist of large voids between islands of meaningful data. Storing all that void data is wasteful, so APFS tries to store only the real data. It does this transparently to both the software developer and the user: Finder’s Get Info dialog doesn’t currently show whether a file is stored in sparse format, although a file size much larger than its size ‘on disk’ is a strong clue. Terminal commands likewise give no indication beyond the same disparity in size.
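That disparity can also be read programmatically. What follows is a minimal Swift sketch, assuming that POSIX stat() is an acceptable source: its st_size field holds the file’s logical size, and st_blocks the number of 512-byte blocks actually allocated to it. The names SizeReport and sizeReport(atPath:) are hypothetical. A large gap between the two figures suggests, but doesn’t prove, that a file is sparse, as compression can have the same effect.

```swift
import Foundation
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Logical size and allocated size of one file, both in bytes.
struct SizeReport {
    let logical: Int64    // st_size: the file's full (expanded) length
    let allocated: Int64  // st_blocks × 512: space actually allocated on disk
}

/// Reads both sizes with POSIX stat(); returns nil if the file can't be read.
func sizeReport(atPath path: String) -> SizeReport? {
    var info = stat()
    guard stat(path, &info) == 0 else { return nil }
    return SizeReport(logical: Int64(info.st_size),
                      allocated: Int64(info.st_blocks) * 512)
}

// Usage: pass a path as the first argument, or fall back to /etc/hosts.
let target = CommandLine.arguments.dropFirst().first ?? "/etc/hosts"
if let r = sizeReport(atPath: target) {
    print("\(target): logical \(r.logical) bytes, on disk \(r.allocated) bytes")
}
```

These are the same two figures that Finder presents as size and size ‘on disk’.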


How Sparse Files stop working

Once a sparse file has been created, it can be modified to fill in its original voids, forcing the size stored to rise even though the total size of the file hasn’t changed. Sparse files are kept in sparse format as much as possible, and should remain sparse when copied or duplicated within the same volume. Copying them between different volumes and disks isn’t so predictable, and sometimes results in them ‘exploding’ to full size. That’s normal when they’re copied to file systems that don’t support sparse files, such as HFS+, and to iCloud.
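Filling in voids can be observed directly. This Swift sketch, using a hypothetical scratch file and sizes, creates a 32 MB file containing data only at each end, then writes 4 MB of non-zero data into the middle and compares stat() figures before and after. Where the file is stored sparse, as on APFS, the logical size should stay at 32 MB while the allocated size rises by roughly the amount written; exact figures will vary with the volume and Mac.

```swift
import Foundation
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Logical size and allocated size (st_blocks × 512) in bytes, from stat().
func sizes(_ path: String) -> (logical: Int64, allocated: Int64) {
    var info = stat()
    guard stat(path, &info) == 0 else { return (-1, -1) }
    return (Int64(info.st_size), Int64(info.st_blocks) * 512)
}

// Hypothetical scratch file and size: 32 MB expanded.
let url = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent("fill-demo.data")
let total: UInt64 = 32 * 1024 * 1024

// Write 4 bytes at each end, seeking over the void in between.
FileManager.default.createFile(atPath: url.path, contents: nil)
var handle = try FileHandle(forWritingTo: url)
try handle.write(contentsOf: Data([1, 2, 3, 4]))
try handle.seek(toOffset: total - 4)
try handle.write(contentsOf: Data([4, 3, 2, 1]))
try handle.close()
let before = sizes(url.path)

// Reopen (without truncating) and fill 4 MB of the void with non-zero bytes.
handle = try FileHandle(forWritingTo: url)
try handle.seek(toOffset: total / 2)
try handle.write(contentsOf: Data(repeating: 0xAB, count: 4 * 1024 * 1024))
try handle.close()
let after = sizes(url.path)

// Logical size stays the same; allocated size is expected to rise.
print("before:", before, "after:", after)
```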

You should expect all sparse files to be fully expanded when they’re backed up to HFS+ disks, as in Time Machine backups made before Big Sur; Time Machine may not estimate their expanded size correctly either, as I have described previously. Expansion takes place at the source of a copy: for example, if you copy a sparse file from your internal APFS disk to an external disk in HFS+ format, the full expanded size of its data has to be copied across to the external disk.

Backing up a sparse file to an APFS backup store is preferable, as that can preserve its compact format. Whether it has been preserved can be seen in log entries reporting the numbers and sizes of items copied in a backup. For example, the entry
2 Files Copied (l: 10 GB p: 66 KB)
shows that the listed size of those two files was 10 GB, but the actual size transferred was only 66 KB. The time taken for such backups also reflects the small amount of data transferred.
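To put a number on the saving from such an entry, this hypothetical Swift helper parses the l: and p: figures from a line in exactly the form shown above. It assumes decimal units (1 KB = 1,000 bytes) and only the unit names listed in the code; both are assumptions, not documented behaviour of the log. For the example entry, the listed size works out at roughly 150,000 times the amount actually transferred.

```swift
import Foundation

/// Hypothetical helper: extracts the l: and p: sizes, in bytes, from a backup log
/// summary shaped like "2 Files Copied (l: 10 GB p: 66 KB)".
/// Assumption: units are decimal (1 KB = 1,000 bytes) and limited to those listed.
func parseCopySizes(_ line: String) -> (logical: Double, physical: Double)? {
    let multipliers: [String: Double] = ["bytes": 1, "KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12]

    // Reads "<number> <unit>" immediately following `tag`.
    func bytes(after tag: String) -> Double? {
        guard let tagRange = line.range(of: tag) else { return nil }
        let fields = line[tagRange.upperBound...].split(separator: " ", maxSplits: 2)
        guard fields.count >= 2, let number = Double(String(fields[0])) else { return nil }
        let unit = String(fields[1]).trimmingCharacters(in: CharacterSet(charactersIn: ")"))
        guard let multiplier = multipliers[unit] else { return nil }
        return number * multiplier
    }

    guard let logical = bytes(after: "(l: "), let physical = bytes(after: " p: ") else { return nil }
    return (logical, physical)
}

// Usage with the example entry above.
if let sizes = parseCopySizes("2 Files Copied (l: 10 GB p: 66 KB)") {
    print("listed/transferred ratio:", sizes.logical / sizes.physical)
}
```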

How any app can create a sparse file

Throughout my quest for these elusive sparse files, I had assumed that only certain apps could create them. That isn’t true: macOS now defaults to creating files in sparse format whenever certain conditions are met. This means that an SQLite database could easily be a sparse file, and so could some of your documents. Apple explains this in its developer documentation.

For a file created by an app to be a sparse file, the following criteria must be met:

  • the file must be created using the FileHandle class for writing;
  • the original expanded size of the file must exceed that contained in a single storage sector;
  • for the data to be stored in sparse format, voids within it must be created by seeking, rather than writing blocks of bytes such as 0x00.
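The third criterion is the one most easily missed. This Swift sketch writes two files of the same logical size, one with its void created by seeking and the other with the void written out explicitly as 0x00 bytes, then compares the space allocated to each using stat(). The function names, file names and 32 MB size are hypothetical; 32 MB was chosen only to exceed both size thresholds given in this article. On APFS you’d expect only the seeked file to be stored sparse.

```swift
import Foundation
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Bytes allocated on disk (st_blocks × 512), or -1 if stat() fails.
func allocatedBytes(_ path: String) -> Int64 {
    var info = stat()
    return stat(path, &info) == 0 ? Int64(info.st_blocks) * 512 : -1
}

/// Hypothetical helper: writes a file of `size` bytes with 10 bytes of data at
/// each end. The void in between is either skipped by seeking, or written out
/// explicitly as 0x00 bytes.
func makeTestFile(at url: URL, size: UInt64, seekOverVoid: Bool) throws {
    FileManager.default.createFile(atPath: url.path, contents: nil)
    let handle = try FileHandle(forWritingTo: url)
    defer { try? handle.close() }
    try handle.write(contentsOf: Data([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
    if seekOverVoid {
        try handle.seek(toOffset: size - 10)                       // leave a void
    } else {
        try handle.write(contentsOf: Data(count: Int(size) - 20))  // write zeros
    }
    try handle.write(contentsOf: Data([10, 9, 8, 7, 6, 5, 4, 3, 2, 1]))
}

let dir = URL(fileURLWithPath: NSTemporaryDirectory())
let seekURL = dir.appendingPathComponent("seek-demo.data")
let zeroURL = dir.appendingPathComponent("zeros-demo.data")
let size: UInt64 = 32 * 1024 * 1024

try makeTestFile(at: seekURL, size: size, seekOverVoid: true)
try makeTestFile(at: zeroURL, size: size, seekOverVoid: false)
print("seeked:", allocatedBytes(seekURL.path), "bytes on disk")
print("zeros: ", allocatedBytes(zeroURL.path), "bytes on disk")
```

Both files have identical contents when read back; only the way the void was produced differs.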

Creation of sparse files is a feature of APFS, and is transparent to the app. APFS decides which are created as sparse files, and that can’t be directly manipulated by the app or the user.

The minimum size of sparse files varies considerably. On my iMac Pro, for example, any file meeting the other two criteria will be in sparse format when its original size is greater than 8 KB. On my M1 Mac mini, that threshold is 16 MB.

The first and third requirements depend on the code used to create the file and to add data to it. As an example, this is the code used by Sparsity to write sparse files:
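import Foundation
let fm = FileManager.default
let url = URL(fileURLWithPath: "/tmp/Sparsity.data")  // hypothetical destination
let theSizeUnits: UInt64 = 1_000_000  // hypothetical: bytes per size unit (MB)
let theSizeVal: UInt64 = 100          // hypothetical: number of size units
fm.createFile(atPath: url.path, contents: nil, attributes: nil)
let theFHandle = try FileHandle(forWritingTo: url)
let theData1 = Data([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
let theData2 = Data([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
theFHandle.write(theData1)  // 10 bytes at the start of the file
// seek over the void rather than writing zeros, to 10 bytes before the end
theFHandle.seek(toFileOffset: ((theSizeUnits * theSizeVal) - 10))
theFHandle.write(theData2)  // final 10 bytes, giving the file its full size
theFHandle.closeFile()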

How to test whether any file is sparse

In APFS, sparse files are distinguished by a flag, INODE_IS_SPARSE, in the inode flags (j_inode_flags). Although those are generally opaque, and not readable for example using FileAttributeKeys, support has been added to URLResourceValues in Big Sur for the key isSparseKey, which can be read using code of the form
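import Foundation
let theSourceURL = URL(fileURLWithPath: "/path/to/file")  // hypothetical path
// isSparseKey is available from macOS 11 Big Sur
let theRes = try theSourceURL.resourceValues(forKeys: [.isSparseKey])
// isSparse is nil if the value is unavailable, so treat nil as not sparse
if theRes.isSparse ?? false {
    // do what you want with the sparse file
}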

Apart from guessing from the disparity in file sizes, there appears to be no way for the user or a script to test whether any given file is sparse, and so could suddenly grow in the disk space it requires when it’s copied or modified. If you can’t tell whether a file is sparse, the only safe approach is to treat its total, fully expanded size as the space it could need. Then if the file is sparse and gets fully expanded, there are no nasty surprises.
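Following that advice, a worst-case check before copying can simply add up the logical sizes of the files and compare the total with the free space on the destination. This Swift sketch does that using FileManager; the function names and example paths are hypothetical. It deliberately ignores any space saved by sparse storage, compression or cloning, so it errs towards overestimating the space needed.

```swift
import Foundation

/// Sums the logical (fully expanded) sizes of the files under `root`,
/// ignoring any space saved by sparse storage, compression or cloning.
func totalLogicalSize(ofFolderAt root: String) -> UInt64 {
    let fm = FileManager.default
    guard let walker = fm.enumerator(atPath: root) else { return 0 }
    var total: UInt64 = 0
    for case let relative as String in walker {
        let full = (root as NSString).appendingPathComponent(relative)
        var isDirectory: ObjCBool = false
        guard fm.fileExists(atPath: full, isDirectory: &isDirectory), !isDirectory.boolValue,
              let attributes = try? fm.attributesOfItem(atPath: full),
              let size = attributes[.size] as? NSNumber else { continue }
        total += size.uint64Value
    }
    return total
}

/// Free space on the volume holding `path`, as reported by FileManager.
func freeBytes(onVolumeContaining path: String) -> UInt64? {
    guard let attributes = try? FileManager.default.attributesOfFileSystem(forPath: path),
          let free = attributes[.systemFreeSize] as? NSNumber else { return nil }
    return free.uint64Value
}

/// Worst-case check: would the source still fit if every file expanded fully?
func fitsWhenFullyExpanded(source: String, destination: String) -> Bool? {
    guard let free = freeBytes(onVolumeContaining: destination) else { return nil }
    return totalLogicalSize(ofFolderAt: source) <= free
}

// Usage with hypothetical paths.
if let fits = fitsWhenFullyExpanded(source: "/Users/Shared/ToCopy", destination: "/Volumes/External1") {
    print(fits ? "should fit even if fully expanded" : "may run out of space")
}
```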

Sparsity

Last year, I provided a simple utility named Sparsity which can write test sparse files. With the introduction of isSparseKey in Big Sur, I now offer version 1.1 which can crawl volumes and folders hunting down sparse files. As this relies on isSparseKey, this version requires macOS 11, and doesn’t run on earlier versions of macOS from High Sierra to Catalina, even though they support sparse files too.

Sparsity version 1.1 is available from Downloads above, and from its Product Page.

When Sparsity finds a sparse file, it reports its expanded size, the size currently required on disk, and the ratio of the first to the second, as the sparsity ratio, such as
/Volumes/External1/Documents/0newDownloads/Sparsity.data ① 100 GB ② 8 KB ③ 12207031.0
① is the fully expanded size of that sparse file, ② is the space it currently requires on disk, and ③ = ①/② rounded to the nearest whole number. The sparsity ratio is similar to a compression ratio, and in this example is exceptionally high. Ratios nearer 2.0 are more common.
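The ratio itself is simple arithmetic, as in this minimal Swift sketch with a hypothetical function name. The report line above is consistent with ① being 100,000,000,000 bytes and ② being 8,192 bytes, since 100,000,000,000 / 8,192 = 12,207,031.25, which rounds to 12207031.

```swift
import Foundation

/// Sparsity ratio: expanded (logical) size divided by the space currently
/// required on disk, rounded to the nearest whole number.
/// Returns nil when nothing is allocated, to avoid dividing by zero.
func sparsityRatio(expanded: UInt64, onDisk: UInt64) -> Double? {
    guard onDisk > 0 else { return nil }
    return (Double(expanded) / Double(onDisk)).rounded()
}

// Worked example using the figures above: ① = 100,000,000,000 bytes, ② = 8,192 bytes.
if let ratio = sparsityRatio(expanded: 100_000_000_000, onDisk: 8_192) {
    print(ratio)  // prints 12207031.0
}
```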

When crawling around your Mac, you’ll notice that many apparently normal files are reported as sparse, even though their sparsity ratios are around 1.0. At first I thought they might simply be false positives, but the more I look at them, the more I realise that the sparse file format has become one of the main types of file in APFS. We need to get used to their quirky behaviour.