The Time Bomb Ticking in Sierra: Grand Central Dispatch is broken

If you are running macOS Sierra – any version from the original 10.12 release to the current 10.12.3 – and leave your Mac running all the time without letting the system sleep, you are likely to encounter a serious bug in its scheduling of tasks. This may make Sierra systems unfit for use as servers, requiring them to be restarted every 10-20 days to work around this bug.

From the early days since Sierra’s release, I have noticed that Time Machine backups can become irregular. At first, I assumed that was intentional, and part of some scheme to make them work on demand, rather than simply by the clock. More recently, I have noticed that this only happened when my iMac had been running continuously, without system sleep, for periods of around 14 days.

Without ready access to Sierra’s new log system, it is extremely difficult to discover what is happening in situations like this. Now that is available with my utility Consolation (available from the Downloads page in the top menu), it was only a matter of time before I would be able to study what happened when backups became irregular. When they did so early yesterday morning, I was ready to obtain some log extracts to discover what had failed.

At 02:14 on 24 February, Time Machine made its last scheduled backup without significant error:
02:14:27.483262+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Starting automatic backup
02:14:27.497811+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Backing up to /dev/disk3s2: /Volumes/PROMISE PEGASUS/Backups.backupdb
02:14:27.917536+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Will copy (8.8 MB) from Macintosh HD
02:14:27.917889+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Found 108 files (8.8 MB) needing backup
02:14:27.952353+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] 14.94 GB required (including padding), 2.99 TB available
02:15:04.685828+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Copied 118 items (8.8 MB) from volume Macintosh HD. Linked 4920.
02:15:06.160280+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Created new backup: 2017-02-24-021504
02:15:06.537408+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Starting post-backup thinning
02:15:06.537437+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] No post-backup thinning needed: no expired backups exist
02:15:06.569439+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Backup completed successfully.

The next backup did not start until shortly after I woke the iMac from display sleep just before 07:00:
07:00:26.301630+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Starting automatic backup
07:00:26.316105+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Backing up to /dev/disk3s2: /Volumes/PROMISE PEGASUS/Backups.backupdb

and so through to
07:06:16.743302+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Backup completed successfully.

As is often the case when backups become irregular, another backup was then scheduled too soon after that:
07:06:16.811196+0000 backupd-helper: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Not starting scheduled Time Machine backup: Less than 10 minutes since last backup

I have my tool Blowhole being run hourly by launchd, and that continued to run normally throughout the night:
01:03:25.368631+0000 blowhole: [co.eclecticlight.blowhole.general] Blowhole snorted!
02:03:25.370928+0000 blowhole: [co.eclecticlight.blowhole.general] Blowhole snorted!
03:03:25.366645+0000 blowhole: [co.eclecticlight.blowhole.general] Blowhole snorted!
04:03:25.371652+0000 blowhole: [co.eclecticlight.blowhole.general] Blowhole snorted!
05:03:25.368168+0000 blowhole: [co.eclecticlight.blowhole.general] Blowhole snorted!
06:03:25.371388+0000 blowhole: [co.eclecticlight.blowhole.general] Blowhole snorted!
07:03:25.363190+0000 blowhole: [co.eclecticlight.blowhole.general] Blowhole snorted!

Following my previous scheme, I then looked at the initiation of backups. Strangely, there was no evidence of any attempt during the period that backups were not made:
02:15:06.569439+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Backup completed successfully.
02:15:06.491497+0000 backupd-helper: (CoreFoundation) Loading Preferences From System CFPrefsD For Search List
06:59:56.273217+0000 backupd: (libsystem_trace.dylib) subsystem: com.apple.siri, category: Intents, enable_level: 1, persist_level: 1, default_ttl: 0, info_ttl: 0, debug_ttl: 0, generate_symptoms: 0, enable_oversize: 0, privacy_setting: 0, enable_private_data: 0
06:59:56.274917+0000 backupd: (CoreFoundation) Loading Preferences From System CFPrefsD For Search List
06:59:56.276761+0000 backupd: (CoreFoundation) Loading Preferences From System CFPrefsD For Search List
07:00:26.301630+0000 backupd: (TimeMachine) [com.apple.TimeMachine.TMLogInfo] Starting automatic backup

Exactly the same pattern emerges in CTS (the Centralised Task Scheduling service), with the same long gap between 02:15 and 06:59:
02:15:06.577575+0000 UserEventAgent: (com.apple.cts) [com.apple.xpc.activity.All] Completing DASActivity: 0:com.apple.backupd-auto
02:15:06.571134+0000 UserEventAgent: (com.apple.cts) [com.apple.xpc.activity.All] Rescheduling XPC Activity: com.apple.backupd-auto
02:15:06.577757+0000 UserEventAgent: (com.apple.cts) [com.apple.xpc.activity.All] Submitting DASActivity: <_DASActivity: "0:com.apple.backupd-auto", Utility, 60s, [2/24/17, 3:05:09 AM - 2/24/17, 3:35:09 AM], Plugin Required, Dark-Wake Eligible>
06:59:56.151461+0000 UserEventAgent: (com.apple.cts) [com.apple.xpc.activity.All] DAS told us to run com.apple.backupd-auto
06:59:56.252172+0000 UserEventAgent: (com.apple.cts) [com.apple.xpc.activity.All] com.apple.backupd-auto state change 1 -> 2
06:59:56.236160+0000 UserEventAgent: (com.apple.cts) [com.apple.xpc.activity.All] Initiating XPC Activity: com.apple.backupd-auto

But this is the most remarkable log extract, for DAS, the Duet Activity Scheduler, which is at the heart of Grand Central Dispatch and responsible for scheduling these tasks:
02:15:10.080074+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.scoring] <private>:[
<private> ] sumScores:39.010000, denominator:39.010000, FinalDecision: Can Proceed FinalScore: 1.000000}
02:15:10.080078+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.scoring] '<private>' CurrentScore: 1.000000, ThresholdScore: 0.805706 DecisionToRun:1
02:15:10.080088+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.lifecycle] Running activities : <private>
02:15:10.080246+0000 UserEventAgent: (DuetActivityScheduler) [com.apple.duetactivityscheduler.client] STARTING: <private>
02:15:10.215795+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.lifecycle] COMPLETED <private>!
06:54:45.435722+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.default] Trigger: <private> is now [1]
06:54:45.436061+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.default] Trigger: <private> is now [1]
06:54:47.472471+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.lifecycle] Attempting to suspend based on triggers: <private>
06:54:47.472491+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.default] Ignoring trigger <private> because conditions are deteriorating
06:54:47.472509+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.default] Ignoring trigger <private> because conditions are deteriorating
06:54:47.472512+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.lifecycle] Evaluating 0 activities based on triggers
06:59:54.099975+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.default] Trigger: <private> is now [0]
06:59:54.100348+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.default] Trigger: <private> is now [0]
06:59:56.133287+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.lifecycle] Attempting to suspend based on triggers: <private>
06:59:56.133356+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.lifecycle] Evaluating 72 activities based on triggers
06:59:56.133512+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.scoring] <private>:[
<private> ] sumScores:45.448571, denominator:49.020000, FinalDecision: Can Proceed FinalScore: 0.927143}
06:59:56.133518+0000 DuetHeuristic-BM: (DuetActivitySchedulerDaemon) [com.apple.duetactivityscheduler.scoring] '<private>' CurrentScore: 0.927143, ThresholdScore: 0.507172 DecisionToRun:1

For a period of nearly 5 hours, DAS failed to rescore the 72 activities for which it was responsible, and therefore failed to recognise that a backup was due. Thus DAS did not tell CTS to run the backupd-auto command, which in turn resulted in Time Machine’s backups not taking place. It was only when I generated user events by waking the display that DAS was kicked back into action, when it recognised that timed tasks were overdue.

These log extracts also demonstrate how frustrating Apple’s ‘privacy’ protection has become: many log entries are rendered almost completely meaningless because their key content is redacted by <private>.

The only way that I know to restore normal function to DAS, so that it once again will schedule backups and other activities correctly, is to restart. Sure enough, following a restart Time Machine backups returned to normal, and ran almost like clockwork.

Let’s hope that Apple addresses this bug well before macOS 10.13.

If you do leave your Mac running macOS Sierra for more than a few days continuously, watch for backups becoming irregular. As soon as they do, you should restart it, as DAS failure will affect many more tasks than just backups: its list of ‘activities’ above contains 72 members, all of which will have become erratic and unreliable once the bug becomes manifest. It’s not a time machine so much as a time bomb.