Orphaned snapshots: a growing problem?

In yesterday’s article, I described how I discovered that two of my Time Machine snapshots had apparently got stuck, and couldn’t be deleted when automatic backups tried to ‘thin’ those snapshots. That in turn was filling my log with error messages every time that Time Machine made an automatic backup. This article explains what happened, and how I fixed the problem.

Apple has stringent rules about which apps are allowed to make snapshots, and requires them to maintain their own snapshots so they don’t grow excessively. APFS snapshots are strange beasts, as they tend to grow over time. This is because they effectively consist of a copy of the APFS file system metadata, together with the original versions of all the file data which has been changed or deleted since that snapshot was made. This enables you to revert that volume to the exact state that it was in at the time that the snapshot was made. The further your volume departs from that state, the more data the snapshot has to retain in order to return to it.

Time Machine makes a snapshot of each volume that it’s backing up every hour, and keeps those snapshots for only 24 hours. That effectively limits the size of the snapshots, so long as it can keep ‘thinning’, that is deleting, them. If one snapshot escapes this without you being aware of it, it can grow steadily until it takes a significant part of your storage space.

Unfortunately, Time Machine has no way of informing the user that it hasn’t been able to thin old snapshots, unless you resort to the command line or browse its backup entries in the log. So unthinned snapshots may only make their presence felt as free space on your storage gradually shrinks. Third-party apps like Carbon Copy Cloner don’t suffer this problem, as they show you all their own (and other) snapshots, and allow you to track and delete them manually. It’s worrying that Time Machine doesn’t yet meet the same standards.

I therefore listed Time Machine snapshots in Terminal, using the command
tmutil listlocalsnapshots /
and the list returned started
Snapshots for volume group containing disk /:
com.apple.TimeMachine.2020-01-30-045455.local
com.apple.TimeMachine.2020-01-30-125442.local
com.apple.TimeMachine.2020-02-03-115455.local
com.apple.TimeMachine.2020-02-03-125456.local
com.apple.TimeMachine.2020-02-03-135455.local

and so on.

You can see the problem easily: there are two unthinned snapshots from 30 January, when the oldest should have been from 3 February. Attempting to delete those old snapshots using the command
tmutil deletelocalsnapshots 2020-01-30-045455
was unsuccessful.
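
Picking stale snapshots out of a long listing by eye is error-prone, so here is a minimal sketch of how it might be scripted. The function name stale_snapshots and the idea of passing a cutoff are my own assumptions, not anything provided by tmutil. It relies only on the name format shown in the listing above, where timestamps written as YYYY-MM-DD-HHMMSS sort correctly as plain text.

```shell
# Hypothetical helper, not part of tmutil.
# Reads the output of `tmutil listlocalsnapshots` on standard input and
# prints the timestamp of every snapshot older than the cutoff in $1.
# The cutoff uses the same YYYY-MM-DD-HHMMSS format as the snapshot names.
stale_snapshots() {
  awk -v cutoff="$1" '
    /^com\.apple\.TimeMachine\./ {
      stamp = $0
      sub(/^com\.apple\.TimeMachine\./, "", stamp)  # drop the name prefix
      sub(/\.local$/, "", stamp)                    # drop the name suffix
      # Appending "" forces a string comparison, which matches
      # chronological order for this fixed-width timestamp format.
      if ((stamp "") < (cutoff "")) print stamp
    }'
}

# Possible use on a Mac (assumes the names are in local time, and uses
# the -v option of the BSD date command supplied with macOS):
#   tmutil listlocalsnapshots / | stale_snapshots "$(date -v-24H +%Y-%m-%d-%H%M%S)"
```

Each timestamp it prints is in the form expected by tmutil deletelocalsnapshots, although in my case that command failed on the stuck snapshots however it was invoked.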

Carbon Copy Cloner revealed a slightly different picture.

[Screenshot: the stuck snapshots shown in Carbon Copy Cloner]

Carbon Copy Cloner showed that the two stuck snapshots were of the System volume, not the Data volume, just as had been reported in the log, although they occupied no space in storage. This is expected, as they’re snapshots of the read-only System volume, which hadn’t changed since I updated to Catalina 10.15.3 on 28 January, two days before they were made.

Not only that, but Carbon Copy Cloner showed two phantom volumes, each named Untitled, at the lower left of its window. Maybe those were the two mounted snapshots which couldn’t be unmounted, and therefore couldn’t be deleted? Carbon Copy Cloner was also unable to delete those two stuck snapshots.

At this stage, I went back and checked the log for the time when those snapshots were made by Time Machine, and a day later when they should have been removed. Doing so was a problem because, to my surprise, the unified log didn’t go back far enough in time to be able to browse entries from just a few days ago – an issue which I’ll return to tomorrow.

With no real clues as to what had gone wrong with the snapshots, and nothing else that I could do to remove them, I applied the universal solution: I restarted the Mac, and left Time Machine to run its next automatic backup. When I then checked the list of snapshots using tmutil and Carbon Copy Cloner, those stuck snapshots were gone for good, as were the errors.

Time Machine needs similar controls over its snapshots to those provided by Carbon Copy Cloner.