Time Machine beaten by the clock: backup fails due to bug

When I woke the screen on my iMac Pro last Sunday morning, I was shocked to see this notification:

tmerror01

As might be expected, when I clicked in the Details button, the Time Machine pane opened and there was no further information given. So I opened my own utility, T2M2, and checked for problems making backups over the last eight hours. According to that, there weren’t any errors.

tmerror02

So which was right? As it turned out, both of them, and neither.

Looking in more detail at the T2M2 report, 8 auto backups had been started over that time, of which only 7 had completed successfully. Not only that, but there should have been 3 snapshots made for each of those 8 backups. That would have resulted in a total of 24 snapshots over that period, but only 21 were found. It looks as if one of those automatic backups had failed to create its three snapshots, as a result of which it had aborted.

This happened to occur during the night of 26-27 October, when we put the clocks back. Could that have been the cause, perhaps? How could changing local clock time (as shown in the menubar) have caused snapshots to fail? Or was I just leaping to the wrong conclusion?

The answers became more clear when I browsed the log using Consolation 3.

Here’s a very brief outline of relevant log entries made in the automatic backup preceding the error:
01:00:41.863 Starting automatic backup
01:01:27.685 Created Time Machine local snapshot with name 'com.apple.TimeMachine.2019-10-27-010127' on disk '/Volumes/External2'
01:01:27.798 Created Time Machine local snapshot with name 'com.apple.TimeMachine.2019-10-27-010127' on disk '/Volumes/External1'
01:01:30.128 Created Time Machine local snapshot with name 'com.apple.TimeMachine.2019-10-27-010127' on disk '/'
01:03:19.356 Marked as reference snapshot: com.apple.TimeMachine.2019-10-27-010127
01:03:19.485 Completed snapshot: 2019-10-27-010317
01:11:25.977 Backup completed successfully.

Times here are given in UTC, which would a minute earlier have been an hour behind local time, so this backup would have actually started at 02:00:41 according to local clock time had the clocks not been changing that night. However, the clock change occurred at 02:00:00 local time, so for the second time that morning, the local time would have been showing 01:00:41, this time the same as UTC.

These entries show Time Machine successfully making its three snapshots, and naming each com.apple.TimeMachine.2019-10-27-010127, in accordance with both UTC and local time, the latter having just changed.

It was the next automatic backup which failed, though. Because there was nothing else going on at the time, by chance it was despatched to start almost exactly an hour later, at just after 02:00:00, both UTC and local time.

02:00:42.249 Starting automatic backup
02:01:27.947 Failed to create APFS snapshot with name 'com.apple.TimeMachine.2019-10-27-010127' on volume '/Volumes/External2', error: Error Domain=NSPOSIXErrorDomain Code=17 "File exists"
02:01:27.948 Failed to create APFS snapshot with name 'com.apple.TimeMachine.2019-10-27-010127' on volume '/', error: Error Domain=NSPOSIXErrorDomain Code=17 "File exists"
02:01:27.948 Failed to create APFS snapshot with name 'com.apple.TimeMachine.2019-10-27-010127' on volume '/Volumes/External1', error: Error Domain=NSPOSIXErrorDomain Code=17 "File exists"
02:01:27.948 Failed to create local snapshots
02:01:27.948 Failed to create or mount stable snapshot.
02:01:27.959 Backup failed with error 49: 49

By an odd quirk of fate, this backup chose to try to create its three snapshots at 02:01:27. For a reason I can’t fathom, it decided to name those snapshots as if it was an hour ago, 01:01:27, which gave them identical names to those made during the last backup. Those snapshots failed because of the name conflict, and the whole backup was aborted as a result.

The next backup, started almost immediately after that failure, returned to normal, and selected the correct (UTC and local) time to use in the snapshot name.

02:03:42.956 Starting automatic backup
02:03:43.312 Created Time Machine local snapshot with name 'com.apple.TimeMachine.2019-10-27-020343' on disk '/Volumes/External2'
02:03:43.346 Created Time Machine local snapshot with name 'com.apple.TimeMachine.2019-10-27-020343' on disk '/'
02:03:43.415 Created Time Machine local snapshot with name 'com.apple.TimeMachine.2019-10-27-020343' on disk '/Volumes/External1'
02:06:49.279 Backup completed successfully.

When I first saw the error occur in the log, I thought that this was a simple case of Time Machine not using UTC times when naming its snapshots. Because only UTC provides unique times, and local times play evil tricks at least twice a year, software which could be running at a time when local time is adjusted for season, or when changing time zones, is likely to get into trouble.

What is odd here is that, in naming snapshots, Time Machine seems to have been doing the right thing. But whatever it was doing, an hour after local time was put back an hour, it too dropped back an hour when it shouldn’t have done. That caused a name clash with those snapshots, which couldn’t be made, and the whole backup had to be cancelled. As it happened, at that time in the morning, I was asleep in bed, and no harm was done. But the code which decides on the names of snapshots is clearly faulty. I hope that its bug(s) don’t affect any other backups.

Every so often, users ask me why T2M2, Consolation, and my other utilities insist on working in UTC rather than local time. I appreciate that it’s not always the most convenient, but this is an excellent example of why it’s important to use a consistent system which always delivers unique times. Can you imagine the chaos which would have resulted if either T2M2 or Consolation had been trying to use local time here?