Exploring sparse files and potential clones with Sparsity 1.2

You don’t need my little app to clone files in Big Sur, as that’s easily done in the Finder. What we do need is a way to tell which files are clones. Sparsity 1.2 does just that: you can now use its Crawler to identify clones as well as sparse files.

What’s a clone?

When changing an existing file, APFS prefers to use copy on write to minimise the amount of data it writes to the disk. When changed data are written out to an existing file, they’re written not to the block containing the original data, but to a different block. Not only that, but APFS only writes out as much new data as it needs to.

Let’s say that an original document is stored in two blocks, and we make changes which affect only the contents of the second.

Instead of APFS writing out new versions of both of those blocks, it only writes the changed block, and the new file is then composed of one new and one old block. So long as all three blocks are kept in storage, the file system can readily deliver either the original or the changed version of that file.

Now apply this to copying files. Provided the right system calls are used to create the copy, APFS doesn’t write out a complete copy of the data, but creates a new file which gets its data from the old one. This happens pretty well instantly, as the amount of data written to disk is almost zero. From then on, each time any of the file data are changed, only those changed storage blocks are written to disk. One block at a time, the two files grow steadily apart.
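To make that concrete, here’s a toy model in Python of files that share storage blocks. It isn’t how APFS is implemented, and the names BlockStore, File, clone and change_block are all invented for this illustration. It only shows the bookkeeping: a clone writes no new blocks, and a change writes just the one block affected.

```python
class BlockStore:
    """Toy storage: a block is never overwritten once written."""

    def __init__(self):
        self.blocks = []

    def write(self, data: bytes) -> int:
        """Store data in a fresh block and return that block's number."""
        self.blocks.append(data)
        return len(self.blocks) - 1


class File:
    """A file is just an ordered list of block numbers in a store."""

    def __init__(self, store: BlockStore, block_ids):
        self.store = store
        self.block_ids = list(block_ids)

    @classmethod
    def create(cls, store: BlockStore, chunks):
        """Write each chunk to its own new block."""
        return cls(store, [store.write(chunk) for chunk in chunks])

    def clone(self) -> "File":
        """Copy only the list of block numbers; no data are written."""
        return File(self.store, self.block_ids)

    def change_block(self, index: int, data: bytes) -> None:
        """Copy on write: put the new data in a new block, leave the old one."""
        self.block_ids[index] = self.store.write(data)

    def read(self) -> bytes:
        return b"".join(self.store.blocks[i] for i in self.block_ids)

    def shared_with(self, other: "File") -> int:
        """Number of blocks the two files still have in common."""
        return len(set(self.block_ids) & set(other.block_ids))


store = BlockStore()
original = File.create(store, [b"first ", b"second"])  # two blocks written
copy = original.clone()                                # still two blocks
copy.change_block(1, b"edited")                        # a third block is written
```

After the change, the store holds three blocks, the original still reads as it did, and the two files share only their first block.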

Telling cloned files apart

As I’ve explained, Big Sur introduces a way that apps can check whether a file has been cloned or is itself a clone. This isn’t yet available at the command line, and Apple hasn’t yet extended any of its apps or tools to reveal this information.

It also has its limitations: this flag is set when the cloning first occurs, and doesn’t appear to be changed even when the file no longer shares any storage blocks with its clone, or the clone has been deleted. Until Apple or a third party documents how to check whether a file currently shares storage with a clone, this is the best that we’ve got.

What Sparsity does

The primary purpose of my free utility Sparsity remains the creation of ‘demonstration’ sparse files, for which it uses a tiny window.

You’ll see what has changed when you open its Crawler window from the Window menu.

Sparsity can now search not only for sparse files, but for clones as well. For each, according to how you set its checkboxes, it reports the total number found and their total size. The latter is given as their fully expanded or decloned size. For example, my current ~/Documents folder reports:
Total 22 sparse files found in 71522 files scanned. Total size = 7.66 GB
Total 22345 clone files found in 71522 files scanned. Total size = 7.02 GB

If you’re surprised at the number of sparse files Sparsity finds, wait until you see how many have at some time been clones.
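If you’d like to approximate the sparse file half of such a report yourself, here’s a minimal sketch in Python. It isn’t Sparsity’s method: it assumes that a file is probably sparse when it occupies less space on disk than its logical size, that st_blocks counts 512-byte units, and that sizes are given in decimal GB. The names looks_sparse, crawl and report are invented for this example. Compressed files may also trip this heuristic, and it can’t detect clones at all.

```python
import os
import stat

BLOCK_UNIT = 512  # assumption: st_blocks is counted in 512-byte units


def looks_sparse(logical_size: int, allocated_blocks: int) -> bool:
    """Heuristic: less space allocated on disk than the file's logical size."""
    return allocated_blocks * BLOCK_UNIT < logical_size


def crawl(folder: str):
    """Walk folder and return (files scanned, sparse files found,
    total logical size in bytes of those sparse files)."""
    scanned = found = total = 0
    for root, _dirs, names in os.walk(folder):
        for name in names:
            path = os.path.join(root, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # unreadable or vanished during the walk
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, sockets and other non-files
            scanned += 1
            if looks_sparse(info.st_size, info.st_blocks):
                found += 1
                total += info.st_size  # the fully expanded size
    return scanned, found, total


def report(scanned: int, found: int, total: int) -> str:
    """Format the totals in the style of the Crawler output quoted above."""
    return (f"Total {found} sparse files found in {scanned} files scanned. "
            f"Total size = {total / 1e9:.2f} GB")
```

To try it on your own Documents folder, call print(report(*crawl(os.path.expanduser("~/Documents")))).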

The value of knowing these is in identifying potential problems when large sparse files or clones assume their full sizes on being copied to other storage. In the case of clones, that means any other volume, even one in APFS format, although as far as I can see at present Time Machine backups on APFS appear to be an exception, and do preserve clones.

Sparsity version 1.2 is available from here: sparsity12, from Downloads above, and from its Product Page.

Have fun with it!