Time Machine to APFS: When a network backup goes wrong

Because they are more complex and depend on other systems, Time Machine backups to shared storage on your network are more prone to fail than those to local storage. As I’ve spent much of the day sorting one such failure out, I thought it might be useful to discuss what went wrong and what went wronger.

Scenario

My M1 MacBook Pro (MBP) doesn’t normally have external storage attached, so I’ve configured it to make backups to a shared backup volume on my M1 Mac mini, using Time Machine to APFS (TMA) running over Wi-Fi through my adjacent router. As it’s unusual for both M1 Macs to be running at the same time, with the shared disk attached to the Mac mini, the last such backup was a couple of weeks ago (please don’t tut me). Since then there hasn’t been a great deal of change on the MBP apart from updates to macOS 11.3.1 and Xcode 12.5.

Failed backup

The day started well, with TMA getting stuck into its backup of 25.53 GB at around 07:50. I knew that with Xcode to back up this was going to take several hours, despite the TM pane insisting that there was only about an hour remaining. It was still insisting that at 08:23, 09:06, 10:19 and even 12:08, by which time it had copied only 80% of the data.

Looking at the T2M2 speed records, this backup started hopefully at a little before 07:52, reached a peak transfer rate of 9.38 MB/s just after 08:00, then hit Xcode at 08:17. From then until 12:35 it plodded through copying Xcode as I’ve seen before. But with fewer than 400 items to go, the router decided to reset itself, and the backup failed at 12:39.
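To put those figures in perspective, here is a rough calculation in Python. It assumes decimal units (1 GB = 1000 MB), takes the pane's 80% figure at 12:08 at face value, and uses the approximate start time of 07:50; the variable and function names are only illustrative.

```python
from datetime import datetime

TOTAL_MB = 25.53 * 1000  # 25.53 GB, assuming decimal units (1 GB = 1000 MB)
PEAK_MB_PER_S = 9.38     # peak transfer rate reported by T2M2


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM clock times on the same day."""
    fmt = "%H:%M"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Best case: the whole backup copied at the peak rate throughout.
best_case_minutes = TOTAL_MB / PEAK_MB_PER_S / 60

# Observed: about 80% copied between roughly 07:50 and 12:08.
average_mb_per_s = 0.80 * TOTAL_MB / elapsed_seconds("07:50", "12:08")

print(f"Best case at peak rate: {best_case_minutes:.0f} minutes")
print(f"Average rate to 12:08: {average_mb_per_s:.2f} MB/s")
print(f"Peak is {PEAK_MB_PER_S / average_mb_per_s:.1f} times the average")
```

On those assumptions, the whole backup would have taken about 45 minutes at the peak rate, whereas the average rate up to 12:08 was only about 1.3 MB/s, roughly a seventh of the peak. That gap is consistent with an estimate of "about an hour remaining" that never came true.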

TMA then spent the afternoon trying to recover from the failure, which the TM pane described as “cleaning up”. The log showed no sign of purposeful activity, and further scheduled backups were all cancelled, apparently because TMA was still backing up. I restarted the MBP in a bid to end that and restore normal backups, but this achieved nothing either, the TM pane still claiming that it was cleaning up.
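For anyone wanting to look at the log themselves, one way is the `log show` command filtered to Time Machine's entries. The sketch below only builds and prints that command; the helper name and the six-hour window are illustrative, and filtering on the `com.apple.TimeMachine` subsystem is an assumption about where the relevant entries are written, not something recorded in this account.

```python
import shlex


def time_machine_log_command(hours: int = 6) -> list[str]:
    """Build a `log show` command covering the last `hours` hours.

    Assumes Time Machine writes to the com.apple.TimeMachine subsystem.
    """
    return [
        "log", "show",
        "--predicate", 'subsystem == "com.apple.TimeMachine"',
        "--info",
        "--last", f"{hours}h",
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into Terminal on the Mac.
    print(shlex.join(time_machine_log_command()))
```

Printing rather than running the command keeps the sketch harmless on any system; on the Mac itself, the printed line can be pasted into Terminal.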

I wondered if it might help to add Xcode to the TM exclusion list, in the hope that before the day was done I might complete at least one backup. After I did that, the TM pane switched from cleaning up to “preparing” a backup which never came.

Restarting backups

The solution was to disable automatic backups altogether in the TM pane, wait a while, then turn them back on again. TMA then started a countdown to the next backup, which began with a long period of “preparing backup” starting at 17:22.
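The same off-and-on cycle can probably be scripted with `tmutil disable` and `tmutil enable`, both of which need root privileges. The sketch below assumes those commands have the same effect as the switch in the TM pane, which wasn't tested here; the function name, the 60-second pause and the `dry_run` flag are all hypothetical.

```python
import shlex
import subprocess
import time


def toggle_automatic_backups(pause_seconds: float = 60.0, dry_run: bool = True) -> list[str]:
    """Turn automatic backups off, wait, then turn them back on.

    Returns the commands as strings; they are run only when dry_run is False.
    """
    commands = [["sudo", "tmutil", "disable"], ["sudo", "tmutil", "enable"]]
    for index, command in enumerate(commands):
        if not dry_run:
            subprocess.run(command, check=True)
            if index == 0:
                # Hypothetical pause standing in for "wait a while".
                time.sleep(pause_seconds)
    return [shlex.join(command) for command in commands]


if __name__ == "__main__":
    # Dry run by default: just show what would be executed.
    for line in toggle_automatic_backups():
        print(line)
```

How long to wait between the two commands is a guess; the only thing known from this episode is that some pause was allowed before backups were turned back on.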

TMA then checked for “runtime corruption” of the backup sparse bundle, which took 90 seconds to complete successfully. The sparse bundle was then mounted, and a routine automatic backup occurred. The last ‘incomplete’ backup (from 28 April) was found and deleted, and FSEvents was chosen as the strategy for determining what should be backed up.

The backup performed (which this time omitted Xcode) amounted to 517 MB in 3260 items. Copying started at 17:48:55 and completed at 18:00:45, with the whole backup declared complete at 18:01:45. Following that, Spotlight indexing took place as normal, and Time Machine had apparently recovered. During this backup, the TM pane hopelessly overestimated the time required, at over half an hour, when copying actually took just under 12 minutes.
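Those figures give a rough idea of throughput for the recovered backup. A minimal calculation, assuming decimal units (1 MB = 1000 KB) and using only the copy phase between the two timestamps above:

```python
from datetime import datetime

SIZE_MB = 517   # size of the backup
ITEMS = 3260    # number of items copied

fmt = "%H:%M:%S"
copy_seconds = (
    datetime.strptime("18:00:45", fmt) - datetime.strptime("17:48:55", fmt)
).total_seconds()

rate_mb_per_s = SIZE_MB / copy_seconds   # average transfer rate
items_per_s = ITEMS / copy_seconds       # average items copied per second
mean_item_kb = SIZE_MB * 1000 / ITEMS    # average item size, decimal KB

print(f"Copy time: {copy_seconds / 60:.1f} minutes")
print(f"Average rate: {rate_mb_per_s:.2f} MB/s, {items_per_s:.1f} items/s")
print(f"Mean item size: {mean_item_kb:.0f} KB")
```

That works out at a little over 0.7 MB/s and between four and five items per second, with a mean item size of around 160 KB, so even without Xcode this backup ran well below the 9.38 MB/s peak seen in the morning.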

Conclusions

When a network backup fails in mid-flight, TMA may be left in a deadlocked state in which it’s trying to clean up the failed backup, but can’t do so on its own.

The best way to resume normal backups in such situations is the time-honoured trick of turning TMA off and back on again.

When TMA detects a failed backup, it appears to check automatically for runtime corruption in the backup sparse bundle. That is reassuring to know, although what it does if the sparse bundle has been damaged is unclear (and I’d rather not discover).

macOS 11.3.1 hasn’t fixed any of the problems with backing up huge file trees, such as those found in Xcode, over a network. If those problems are the result of bugs or shortcomings in SMB, then they haven’t changed.

If you have Xcode installed, add it to your TM exclusions list now.
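If you prefer the command line to the TM pane, `tmutil addexclusion -p` adds a fixed-path exclusion (it needs root), and `tmutil isexcluded` reports whether a path is excluded. The sketch below assumes Xcode is in its default location of /Applications/Xcode.app and that a fixed-path exclusion behaves like an entry in the pane's list; the function name and `dry_run` flag are hypothetical.

```python
import shlex
import subprocess

XCODE_PATH = "/Applications/Xcode.app"  # assumed default install location


def exclude_from_backups(path: str = XCODE_PATH, dry_run: bool = True) -> list[str]:
    """Add a fixed-path Time Machine exclusion, then check that it took effect.

    Returns the commands as strings; they are run only when dry_run is False.
    """
    commands = [
        ["sudo", "tmutil", "addexclusion", "-p", path],  # -p: fixed-path exclusion
        ["tmutil", "isexcluded", path],                  # report exclusion status
    ]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return [shlex.join(command) for command in commands]


if __name__ == "__main__":
    # Dry run by default: just show what would be executed.
    for line in exclude_from_backups():
        print(line)
```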