Finding and removing duplicate files has long been one of the mainstays of Mac housekeeping. Not only did duplicates waste space on the disk they were stored on, but they wasted it again when backed up. For those without the time and patience to do this manually, there are plenty of housekeeping utilities you can pay for, some requiring substantial subscriptions. The activity even won its own name: deduplication.
When APFS came to Macs, that all changed. One of its more subtle features is the clone file, which is easily demonstrated. Now deduplication and those utilities are at best questionable, if not an utter waste of time, effort and money.
Demonstration
Find or create yourself a hefty file around 10 GB in size. Don’t use a format that is in fact a bundle or folder in disguise, like RTFD, as that can make this more complicated: a solid, monolithic file is ideal. Then create yourself a test folder on the same volume, and drag-copy the file into it to make the first clone. At this stage, with just the one clone, open Disk Utility, select that volume and click on its Info tool. Copy from that window the volume sizes for later reference.
Back in the Finder, select the first clone and press Command-D several times to clone it again and again. Note how those duplicates are created instantly, without the time required to copy 10 GB of data.
Select that folder in the Finder, and press Command-I to Get Info on it. The Finder will tell you that it occupies 50 GB or whatever, as if each of those clones is taking its full size in disk space. Now go back into Disk Utility and get fresh size measurements for that volume. Although this can be complicated by background file activity on some volumes, you should see that there has been no increase in the amount of space used in that volume, for all the 50 or so GB it appears to have gained.
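The same comparison can be made from a script rather than by eye. What follows is only a rough sketch in Python: the function names apparent_size and volume_used are made up for illustration, and nothing in it is specific to APFS. The first function adds up the logical size of every file in a folder, which is roughly the kind of figure the Finder arrives at, while the second asks the volume how much space is actually in use.

```python
import os
import shutil


def apparent_size(folder):
    """Sum the logical sizes of all regular files under folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # skip symbolic links
                total += os.path.getsize(path)
    return total


def volume_used(path):
    """Bytes currently in use on the volume that holds path."""
    return shutil.disk_usage(path).used
```

Call volume_used() before and after making the clones. If they really are clones, the difference should be far smaller than apparent_size() reports for the test folder, allowing for background file activity on the volume.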
If you happen to have a copy of Sparsity around, use its crawler to demonstrate that all those files are marked as clone files.
Before you trash that test folder, copy the volume space figures again, and check them after you have emptied the Trash. That’s the only disappointment: as those clones never took any space, you don’t get any back when you remove them.
Test
If you’ve paid for a housekeeping utility, now is the time to check whether it can tell you that those files are clones, and there’s no value in deleting them, although they look like duplicates. If it’s smart enough to recognise that, then your money may have been well spent. If it claims that they’re copies and proposes deleting them, you might like to reconsider any subscription you’re paying for that software.
Explanation
When a copy is made of any file, if two criteria are met, then APFS will try to make a clone, not a copy:
- both the original and copy files must be on the same APFS volume, so sharing the same file system;
- copying must be performed using either of two specific commands, both forms of copyItem(), in FileManager.
In practice, these include all copies and duplicates made within the same volume by the Finder, and most made by apps. This should also apply to whole folders, provided that they’re copied according to the same rules.
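For anyone who wants to experiment with this from a script, here is a hedged sketch in Python. FileManager’s copyItem() is a Swift and Objective-C API that isn’t directly reachable from Python, so this sketch instead assumes the lower-level clonefile() system call that macOS provides, which isn’t mentioned above. The function name clone_or_copy is made up for illustration. Where cloning isn’t possible, it falls back to an ordinary copy.

```python
import ctypes
import errno
import os
import shutil
import sys


def clone_or_copy(src, dst):
    """Try clonefile() on macOS; otherwise make an ordinary copy.

    Returns "clone" or "copy" to say which route was taken.
    """
    if sys.platform == "darwin":
        # Assumption: clonefile() is exported by the system C library.
        libc = ctypes.CDLL(None, use_errno=True)
        clonefile = libc.clonefile
        clonefile.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_uint32]
        clonefile.restype = ctypes.c_int
        if clonefile(os.fsencode(src), os.fsencode(dst), 0) == 0:
            return "clone"
        err = ctypes.get_errno()
        # Fall back only when cloning is unsupported here or the two
        # paths are on different volumes; report anything else.
        if err not in (errno.ENOTSUP, errno.EXDEV):
            raise OSError(err, os.strerror(err), dst)
    shutil.copy2(src, dst)
    return "copy"
```

Copying between two volumes should return "copy", in keeping with the first criterion above, while copying within one APFS volume should return "clone".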
Unlike hard links, clone files are separate files, with distinct inodes, but when first created they share the same data. If you then make changes to either the original or copy, copy-on-write saves just the changed data blocks for the file that has changed. The more changes are made to a clone, the more new data blocks it uses, until eventually all its data could be different, and the two files together occupy space equal to the sum of their sizes.
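That behaviour can be illustrated with a toy model. This is emphatically not how APFS is implemented: the ToyVolume class and its methods are invented for illustration, and each file is simply a list of block numbers. It does show how space used grows only as a clone’s blocks are rewritten.

```python
class ToyVolume:
    """Toy model: each file is a list of block IDs, shared until written."""

    def __init__(self):
        self.files = {}       # file name -> list of block IDs
        self.next_block = 0

    def _new_block(self):
        block = self.next_block
        self.next_block += 1
        return block

    def create(self, name, n_blocks):
        self.files[name] = [self._new_block() for _ in range(n_blocks)]

    def clone(self, src, dst):
        # The clone has its own block list, pointing at the same blocks.
        self.files[dst] = list(self.files[src])

    def write(self, name, index):
        # Copy-on-write: only this file gets a fresh block for the change.
        self.files[name][index] = self._new_block()

    def blocks_used(self):
        # Count each block once, however many files refer to it.
        return len({b for blocks in self.files.values() for b in blocks})


vol = ToyVolume()
vol.create("original", 4)
vol.clone("original", "clone")
print(vol.blocks_used())   # 4: the clone costs nothing extra

vol.write("clone", 0)
print(vol.blocks_used())   # 5: one changed block is stored separately

for i in range(1, 4):
    vol.write("clone", i)
print(vol.blocks_used())   # 8: nothing is shared any more
```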
There’s another simple way to break clones: take your original demonstration folder of clones and copy them to another volume. You’ll notice from the time taken to copy each that they become separate files in the process. Copy them back to the original volume, and they’ll remain separate, and the folder size given by the Finder will be accurate and reflected in Disk Utility’s figures.
macOS has another trick up its sleeve: Time Machine backups should recognise cloned files, and only back up one set of data for them, provided that it’s backing up to APFS. So not only do clone files not waste any space on their volume, but they don’t waste any in backups either.
Finally, why don’t some housekeeping utilities recognise clones? That’s because there’s no official way to tell them apart. There is a flag that records whether a file has ever been cloned, reported by Sparsity and Precize, but it doesn’t tell you whether it’s still a complete clone, or has been changed, nor the identity of the other file of that clone pair.
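To see why a utility can be fooled, consider what a simple duplicate finder has to go on. The sketch below is a naive illustration in Python, not the method used by any particular product, and the names file_digest and find_lookalikes are made up. It groups files by a hash of their contents, and uses inode numbers only to avoid counting hard links twice. As clones have identical contents and distinct inodes, just as true copies do, a check like this has no way to tell them apart.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_lookalikes(folder):
    """Return groups of files whose contents are identical.

    Hard links to the same inode are collapsed into a single entry.
    Clones and true copies both appear here as 'duplicates'.
    """
    groups = defaultdict(dict)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue
            st = os.stat(path)
            groups[file_digest(path)].setdefault((st.st_dev, st.st_ino), path)
    return [sorted(g.values()) for g in groups.values() if len(g) > 1]
```

Run against the demonstration folder, this should list every clone as a duplicate of the others, even though deleting them would recover no space.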
Deduplicating files in macOS is therefore something of a lost cause when you may be dealing with clones, not copies.