Last weekend, in the UK, we put our clocks back to GMT/UTC for the winter. As I reported later, that triggered a bug in Time Machine’s backups which caused one scheduled backup to fail. Although no big deal, as the nature of the bug is obscure, and the log entries remain puzzling, it bears further examination. This article starts by looking at system clock time, and its changes.
I made one significant error in that previous article: I stripped the time zone information from the log timestamps, which confused me into thinking that all the entries used the same +0000 timezone. They didn’t, so my first task was to establish when my Mac changed its clock.
One complication here is the log show
command, which I’m using indirectly through Consolation 3. If you work in Terminal, you can specify start and end times for log extracts with timezones; that’s a sophistication which, for the sake of simplicity, I have omitted from Consolation. That uses another supported time format when setting start and end times, HH:MM:SS. And when there’s a time change, that behaves oddly.
If I set the start time to 01:16:45, and the end time to 01:16:46, a period of just one second, log show
returns all log entries between 00:16:45 and 01:16:46 in UTC, as it decides that the start time is +0100, but the end time is +0000. Instead of getting one second of log, you get an hour and a second. But there’s also no way of specifying a time between those without using the full time format including timezone. log show
should surely use the same timezone for both start and end, which would most sensibly be that current at the time of making that call. That’s a second bug.
When I browse all those log entries, this is the (edited) sequence for the time change. At this time of the day, my Mac is normally unused, with the display sleeping, but not the system. It’s not until 02:09:00+0100 that macOS decides to do anything about the clock change, which results from a scheduled time check despatched by DAS:
02:09:42.422018+0100 DuetActivitySchedulerDaemon 0:com.apple.timed.ntp.wanted:35EF27:[{name: MemoryPressurePolicy, policyWeight: 5.000, response: {Decision: Can Proceed, Score: 0.50, Rationale: [{[memoryPressure]: Required:2.00, Observed:1.00},]}}] sumScores:33.010000, denominator:35.510000, FinalDecision: Can Proceed FinalScore: 0.929597}
02:09:42.422032+0100 DuetActivitySchedulerDaemon '0:com.apple.timed.ntp.wanted:35EF27' CurrentScore: 0.929597, ThresholdScore: 0.680332 DecisionToRun:1
So the time service timed
is run to check the time:
02:09:42.591101+0100 timed cmd,fetchTime,num,0,result,0,mach,616385934006709,tv_sec,1572138582,tv_usec,626723,delay,0.021936,dispersion,0.000214,more,1,ip,17.253.34.123,port,123,attempts,0,slept,0,use_service_port,1
02:09:42.615412+0100 timed timed cmd,fetchTime,num,1,result,0,mach,616385958316720,tv_sec,1572138582,tv_usec,652252,delay,0.024178,dispersion,0.000122,more,1,ip,17.253.34.125,port,123,attempts,0,slept,0,use_service_port,1
02:09:42.636299+0100 timed timed cmd,fetchTime,num,2,result,0,mach,616385979243209,tv_sec,1572138582,tv_usec,672021,delay,0.020790,dispersion,0.000198,more,1,ip,17.253.34.251,port,123,attempts,0,slept,0,use_service_port,1
02:09:42.658154+0100 timed timed cmd,SYNC,rtc_s,57863648.357542999,unc_s,0.000500000,mach,616385979717240
02:09:42.658229+0100 timed Received time Sun Oct 27 01:09:42 2019±0.01 from "NTP"
At this moment, the clock time reflected in log entries is still in summer time +0100, but as it’s now after 02:00:00, that should be set back an hour. Corrections are made first to the SMC:
02:09:42.766914+0100 kernel AppleSMCRTC setUTCTimeOfDay setting PMU offset
02:09:42.766916+0100 kernel AppleSMCRTC SMCRTC: setting PMU offset to 49619761047646
02:09:42.767220+0100 kernel AppleSMCRTC SMCRTC: set PMU offset to 49619761047646
Following those, the system time is changed, and the log timestamps switch to GMT/UTC +0000:
01:09:42.804816+0000 === system wallclock time adjusted
01:09:42.805090+0000 timed Setting system time to Sun Oct 27 01:09:42 2019 from TMTimeSynthesizer
After that, a series of times are changed to match, for example
01:09:42.807196+0000 CommCenter #I Got significant time change notification
01:09:42.808738+0000 KnowledgeMonitor Time changed; reevaluating sunrise/sunset times
What I find puzzling here is that, knowing that summer time ended at 02:00:00+0100 that morning, macOS left it to a routine scheduled time check, which occurred more than 9 minutes later, to perform that time change. Given the importance of synchronised times in some network operations, for example, this looks like a third bug to me. When such time checks were scheduled by cron
, I doubt that they’d be left unattended to for so long.
So how does this fit with the observed error in Time Machine backup? That’s where things get even more confusing. The log entries for those failed snapshots read:
02:01:27.947839+0100 backupd TimeMachine Failed to create APFS snapshot with name 'com.apple.TimeMachine.2019-10-27-010127' on volume '/Volumes/External2', error: Error Domain=NSPOSIXErrorDomain Code=17 "File exists"
02:01:27.948013+0100 backupd TimeMachine Failed to create APFS snapshot with name 'com.apple.TimeMachine.2019-10-27-010127' on volume '/', error: Error Domain=NSPOSIXErrorDomain Code=17 "File exists"
02:01:27.948727+0100 backupd TimeMachine Failed to create APFS snapshot with name 'com.apple.TimeMachine.2019-10-27-010127' on volume '/Volumes/External1', error: Error Domain=NSPOSIXErrorDomain Code=17 "File exists"
Note that those log entries were written when the system clock was still using summer time +0100, and are timed more than 8 minutes before macOS performed the time check which resulted in changing the system clock. Yet at that time, backupd
was assigning names to the snapshots which had already been set back to GMT/UTC +0000, and that is how the name collision occurred (the original bug). So the fourth bug is that the algorithm for naming these snapshots is not based on system time, but something different, which isn’t unique and can allow a collision to occur.
Finally, there’s still another twist to this story: the apparent discrepancy between my two utilities Consolation 3 and T2M2. As Ingo pointed out, as they both use the unified log to derive their information, they should have been in agreement as to how many backups should have been performed over this period.
However, the two apps differ in how they handle times obtained from the unified log. Consolation delivers the exact strings for the datestamp obtained from the log show
command, with whatever timezone adjustment they include. T2M2 works with the times it obtains from the log, so converts them into macOS internal Date values, which account for timezone changes, and allow it to calculate intervals between backups, and so on. So when the timezone changes, you may see small discrepancies between their output. For the whole of the rest of the year, that shouldn’t be the case.
To summarise the bugs which this has revealed:
backupd
should handle snapshot name collisions safely, so as to avoid any backup failing.log show
should handle HH:MM:SS formats for start and end times consistently, using the same timezone even when applied across timezone changes.- Scheduled summertime/DST changes should should be applied to system time at the correct time of 02:00:00, rather then being despatched when convenient by DAS.
backupd
should name snapshots so as to ensure that name collisions cannot occur, e.g. to append the timezone, or to use UTC/GMT rather than local time.
The above applies to macOS 10.14.6 Mojave, although I strongly suspect that the issues are identical in both High Sierra and Catalina.
Perhaps it is time to replace Occam’s Razor, or the law of parsimony, with a principle more appropriate to modern software and operating systems: bugs will necessarily be multiple. If you think you’ve found one, you can be fairly confident that you’ll find more. In a final twist of irony, to prove that law, when I was checking the source code of Consolation 3, I found no less than three completely unrelated bugs, which are now fixed in version 3.4.
Many thanks to Michael Tsai @mjtsai and Ingo for their helpful comments which made me dig deeper. At least I’ve made this hole significantly bigger and buggier.