APFS: How sparse files work

When sparse files were first introduced in APFS, they appeared to be unusual if not rare. Since then, changes in the way that macOS manages disk images and other improvements have made them more widespread. They’re now commonly found in:

many databases,
disk images, including standard UDRW read-write disk images,
most virtual machines.

Because they’re transparent to the user, they’re easily overlooked. I recently opened Finder’s Get Info for a folder where I keep a small selection of virtual machines (VMs) for an Apple silicon Mac. The nominal total size of that folder is 1.4 TB, but its contents only take 337 GB on disk, less than a quarter of their full size, a benefit amplified here by the fact that some of those VMs have also been cloned. That’s on an internal SSD of 2 TB with 1.4 TB free space. Without the storage efficiency of sparse files and clones, there would only be 360 GB of free space on that SSD.

Sparse files in APFS

Sparse files achieve their amazing economy by storing only the real data in a file that otherwise consists largely of unused space. At their extreme, you can create a sparse file of 10 GB containing just a single block, 8 KB, of data, with the remainder unused. In regular format that file would require just over 10 GB of storage; in sparse format it takes just 8 KB on disk.

To achieve this, APFS does very little indeed. The file’s inode contains the INODE_IS_SPARSE flag, and in its extended-field the number of sparse bytes in the data stream, INO_EXT_TYPE_SPARSE_BYTES, is given as an unsigned 64-bit integer.

The trick is accomplished in the file’s extent map, which maps each offset in the file’s data, given in bytes, to the physical block address at which that extent starts. To return to the example 10 GB sparse file, its inode has the INODE_IS_SPARSE flag set, its extended-field gives the number of sparse bytes in the file, and its file extent map gives the physical block address for the non-null data at its offset at the end of the file. There’s no need for any additional metadata.
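
One way to picture that metadata is as a small model in Swift. The sketch below only illustrates the description above and is not the APFS on-disk layout: the type and property names (Extent, SparseFileModel, sparseBytes) and the physical block address are invented for the example, and 10 GB is taken as 10 × 1024³ bytes.

```swift
/// One entry in a simplified extent map: a run of file bytes backed by storage.
struct Extent {
    let fileOffset: UInt64      // where the run starts within the file, in bytes
    let physicalBlock: UInt64   // made-up block address where the run is stored
    let length: UInt64          // length of the run, in bytes
}

/// Toy stand-in for the metadata described above; NOT the APFS on-disk layout.
struct SparseFileModel {
    let nominalSize: UInt64
    let extents: [Extent]

    /// Bytes actually stored on disk: the sum of all extent lengths.
    var storedBytes: UInt64 { extents.reduce(0) { $0 + $1.length } }

    /// Bytes with no storage behind them, the kind of figure a sparse-bytes field records.
    var sparseBytes: UInt64 { nominalSize - storedBytes }

    /// The extent holding `offset`, or nil when that offset lies in a void.
    func extent(containing offset: UInt64) -> Extent? {
        extents.first { offset >= $0.fileOffset && offset - $0.fileOffset < $0.length }
    }
}

// The example file: 10 GB nominal size, with one 8 KB block of data at its end.
let blockSize: UInt64 = 8 * 1024
let tenGB: UInt64 = 10 * 1024 * 1024 * 1024
let example = SparseFileModel(
    nominalSize: tenGB,
    extents: [Extent(fileOffset: tenGB - blockSize, physicalBlock: 123_456, length: blockSize)]
)

print(example.storedBytes)                           // 8192
print(example.sparseBytes)                           // 10737410048
print(example.extent(containing: 0) == nil)          // true: offset 0 is in the void
print(example.extent(containing: tenGB - 1) != nil)  // true: the last byte is stored
```

In this model an offset inside the void simply has no extent, so nothing needs to be stored for it beyond the count of sparse bytes.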

Writing a sparse file

For a file created by an app to be a sparse file, the following criteria must be met:

the file must be created using the FileHandle class for writing;
the nominal size of the file must exceed that contained in a single storage block, which determines the minimum size of a sparse file in any given APFS file system;
for the data to be stored in sparse format, voids (null data) within it must be created by seeking, rather than writing blocks of bytes such as 0x00.

This is straightforward using Swift or another appropriate language. First get the default FileManager:
let fm = FileManager.default

Then create the new file at the URL url; this creates the inode in the file system:
fm.createFile(atPath: url.path, contents: nil, attributes: nil)

Get a FileHandle object for writing to the file at that URL:
let theFHandle = try FileHandle(forWritingTo: url)

Write the first block of non-null data to the start of the sparse file; this writes a block of data that’s recorded in the file extent map:
theFHandle.write(data1)

To insert a void following that, seek to a later offset, here near the end of the file; this skips over the intervening range without writing anything to it:
theFHandle.seek(toFileOffset: offsetAtTheEnd)

Write a second block of data at the end; this block of data is also recorded in the file extent map:
theFHandle.write(data2)

Close the FileHandle:
theFHandle.closeFile()

At the end of this, the file has its INODE_IS_SPARSE flag set, with INO_EXT_TYPE_SPARSE_BYTES recording the number of sparse bytes in the inode extended-field. The file extent map records one storage block for the first block of data at the start of the file, and one for the second block of data at the end of the file.
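
Pulled together, those steps might look like the following sketch. It uses the newer throwing FileHandle methods (write(contentsOf:), seek(toOffset:) and close()) in place of the older calls, and the function name writeSparseFile and its parameters are invented for illustration. Whether the result is actually stored in sparse format still depends on the file system and the criteria listed earlier.

```swift
import Foundation

/// Writes `head` at the start of a new file and `tail` so that the file ends at
/// `nominalSize`, seeking over the gap between them instead of writing zeroes.
/// Hypothetical helper: the name and parameters are assumptions for this sketch.
func writeSparseFile(at url: URL, head: Data, tail: Data, nominalSize: UInt64) throws {
    precondition(UInt64(head.count) + UInt64(tail.count) <= nominalSize,
                 "head and tail must fit inside the nominal size")

    // Create the empty file first; FileHandle(forWritingTo:) needs it to exist.
    guard FileManager.default.createFile(atPath: url.path, contents: nil, attributes: nil) else {
        throw CocoaError(.fileWriteUnknown)
    }

    let handle = try FileHandle(forWritingTo: url)
    defer { try? handle.close() }

    // First block of real data, at offset 0.
    try handle.write(contentsOf: head)

    // Skip to where the final block should start; nothing is written in between.
    try handle.seek(toOffset: nominalSize - UInt64(tail.count))

    // Second block of real data, ending exactly at the nominal size.
    try handle.write(contentsOf: tail)
}

// Usage: a 64 MB file with a few bytes of data at each end.
let sparseURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("example-sparse-\(UUID().uuidString).bin")
try writeSparseFile(at: sparseURL,
                    head: Data("start".utf8),
                    tail: Data("end".utf8),
                    nominalSize: 64 * 1024 * 1024)
```

The file’s nominal size here comes entirely from the position of the final write, since no length is ever set explicitly.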

Why sparse disk images?

Disk images often contain substantial amounts of free space that would be ideal for storage as a sparse file. Their creation is an ingenious combination of trimming free space inside the disk image, then saving the resulting file in sparse format.

When a UDRW disk image is first created, it’s written as a single file of the size set by the user. When that disk image is mounted for a second time, provided that its internal file system is HFS+ or APFS, it’s automatically trimmed so that all its unused blocks are coalesced into a single chunk of unused space. That doesn’t happen on its initial mount, because trimming is only triggered automatically when its file system is mounted. Nor does it ever happen with disk images whose internal format is FAT or ExFAT, for which trimming isn’t supported.

Once that trim has been performed, the disk image is then saved in sparse format, omitting the space occupied by its unused storage blocks.

Sparse files can readily explode

Because some methods of copying sparse files may not use FileHandle objects with calls that preserve their sparse format, they can inadvertently explode to their full size. When Time Machine backs up sparse files to APFS backup storage, their format is preserved, as it is when they’re restored to an APFS volume. Small changes may occur as a result of differences in block size reflected in file extents, though. I’m pleased to confirm that Carbon Copy Cloner also retains sparse file format in its backups and restores, and I suspect that SuperDuper! may well do so, although I haven’t tested that.
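
As a rough sketch of what a sparse-aware copy could look like using FileHandle, the function below reads the source in chunks and seeks past any chunk that contains only zero bytes instead of writing it. This is an assumption about one workable approach, not a description of how Time Machine or Carbon Copy Cloner work, and copyPreservingHoles and its chunkSize parameter are hypothetical names.

```swift
import Foundation

/// Copies `source` to `destination`, skipping over chunks that are entirely zero
/// so that they can remain voids in the copy. Hypothetical helper for illustration.
func copyPreservingHoles(from source: URL, to destination: URL, chunkSize: Int = 1 << 20) throws {
    // Create (or replace) the destination so that it can be opened for writing.
    guard FileManager.default.createFile(atPath: destination.path, contents: nil, attributes: nil) else {
        throw CocoaError(.fileWriteUnknown)
    }

    let input = try FileHandle(forReadingFrom: source)
    defer { try? input.close() }
    let output = try FileHandle(forWritingTo: destination)
    defer { try? output.close() }

    var offset: UInt64 = 0
    while let chunk = try input.read(upToCount: chunkSize), !chunk.isEmpty {
        // Only chunks holding some non-zero byte are written; the rest are skipped.
        if chunk.contains(where: { $0 != 0 }) {
            try output.seek(toOffset: offset)
            try output.write(contentsOf: chunk)
        }
        offset += UInt64(chunk.count)
    }

    // A trailing void is never written, so set the final length explicitly.
    try output.truncate(atOffset: offset)
}
```

This still reads the source at its full nominal size, so it saves space at the destination rather than reading time, and any void smaller than a chunk is written out as ordinary zeroes. The destination is replaced if it already exists.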

It’s easy to demonstrate what happens when copying doesn’t preserve the sparse format: create a sparse file, for example using Sparsity, and copy that across to another Mac using AirDrop. While the original will be seen to occupy far less space on disk than its nominal size, the copy on the other Mac will no longer be sparse. What’s even worse, copying the file takes the longer time expected for its nominal size, as it copies across all the null data.
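
A quick way to check whether a copy has exploded is to compare a file’s nominal size with the space allocated to it. The sketch below does that with stat(2), whose st_blocks field counts 512-byte units. FileSizes, fileSizes(atPath:) and looksSparse are hypothetical names, and the comparison is only a heuristic: it doesn’t read the INODE_IS_SPARSE flag itself.

```swift
import Foundation

/// Nominal (logical) size and on-disk allocation of a file, both in bytes.
struct FileSizes {
    let logical: Int64
    let allocated: Int64

    /// Heuristic only: true when less space is allocated than the nominal size.
    var looksSparse: Bool { allocated < logical }
}

/// Reads both sizes using stat(2); returns nil if the file can't be examined.
func fileSizes(atPath path: String) -> FileSizes? {
    var info = stat()
    guard stat(path, &info) == 0 else { return nil }
    // st_size is the nominal length; st_blocks counts allocated 512-byte units.
    return FileSizes(logical: Int64(info.st_size),
                     allocated: Int64(info.st_blocks) * 512)
}
```

Running fileSizes(atPath:) on the original and on the copy should show a similar nominal size for both, with the allocated figure much larger for a copy that is no longer sparse.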

In some circumstances, exploding sparse files can have serious consequences. Returning to the example of my folder of VMs, if I were to restore that from a backup that didn’t preserve all those sparse files, free space on that SSD would collapse from 1.4 TB to 360 GB.

Tools

Sparsity creates test sparse files and can discover which files in any given folder are in sparse format;
Precize provides full information about files, including whether they are sparse or clone files.

11 Comments


1

Will on June 8, 2024 at 8:53 am

Are there any good CLI tools that can create&manipulate sparse files?

- 2
  
  hoakley on June 8, 2024 at 9:18 am
  
  I don’t know of any way to create a sparse file at the command line – it’s rather deeper than command tools go.
  I believe that rsync has options that allow their faithful copying. Otherwise you’ll need to write your own tools, I’m afraid. Did you have anything specific in mind?
  Howard.
  
  - 3
    
    Will on June 8, 2024 at 10:46 am
    
    I use nix instead of homebrew. And one trick I do with it is to create a postgres db for my project with all the tables and test fixture data, then shove all those files it creates in the nix store. Then for tests, I copy out a copy of it to a temp directory, then throw it away at the end of the test, so I don’t have to pay the time costs of creating a new db each time. I figured maybe it’d be nice to sparsify all the db files to save a tiny bit of space, but also make the copies out to the temp dir a few microseconds faster.
    
    It’s probably not actually worth it, all things considered, but could be neat.
    
    - 4
      
      hoakley on June 8, 2024 at 1:08 pm
      
      Well, the long-standing caution is that you can’t (without doing a lot of work) convert a non-sparse file to sparse format. Even in Swift, it’s very difficult, and I don’t know of a general way to do so.
      Howard.
      
- 5
  
  Krzysztof on June 8, 2024 at 10:00 am
  
  You can create a sparse file with truncate(1).
  
  truncate -s +1000M sparsefile
  
  - 6
    
    hoakley on June 8, 2024 at 10:35 am
    
    Thank you.
    Howard
    
7

Simon on June 8, 2024 at 2:29 pm

What happens if you get a small sparse file onto an APFS volume that when exploded would occupy more than the free space left on that volume? I suppose all good until the sparse file is exploded. Is there a way to force explode it in place? Just curious, not saying this would be an actual issue.

- 8
  
  hoakley on June 8, 2024 at 3:35 pm
  
  I haven’t looked at many of these scenarios yet, but what I suspect is:
  – if you try to create a sparse file whose nominal (max) size is greater than free space, although it could be accommodated in sparse format, I think you’d get an error for insufficient free disk space.
  – it’s unusual for a sparse file to explode in place. Normally that happens when copying between volumes, and is handled in the usual way: as the source is being read to stream across to the new volume, the file explodes to nominal size, and the copy is refused because of insufficient free space on the destination.
  – if an action were to be undertaken that would increase the space on disk beyond that available, then that would result in an insufficient free space error, and would fail.
  Of course, if you’re restoring from a backup, that would leave you in the awkward situation where the restore failed because of insufficient free space. But that could happen for other reasons, too.
  Howard.
  
  - 9
    
    Simon on June 9, 2024 at 12:27 am
    
    Thank you, Howard.
    
10

Iljitsch van Beijnum on June 16, 2024 at 6:14 am

Howard, turns out it’s actually not hard to deallocate unused blocks from a file and thus make it a sparse file.

I found a little utility written in C for Windows and Linux that takes an SQLite file and deallocates unused pages in the database. I was able to make this work on the Mac using:

fcntl(fd, F_PUNCHHOLE, &punchhole)

See https://github.com/iljitschvanbeijnum/sqlite_sparse/blob/master/sqlite_sparse.c

- 11
  
  hoakley on June 16, 2024 at 12:59 pm
  
  Congratulations! I think what that does is directly change the extents, not something you can do using any of the standard APIs from Swift or Objective-C.
  Howard.
  