Inside iCloud Drive: Pushed updates, that syncing feeling

The two common scenarios in which iCloud Drive appears to become stuck, and cause long waits for users, are uploading files, and propagating such uploads (and other changes) to other clients connected to the same account.

In my previous article I looked at the steps involved in uploading. This article looks at propagating those changes to other clients, but stops short of them downloading new or changed files.

To investigate this, I ran my tool Cirrus simultaneously on two Macs connected to the same iCloud account, on the same local network. I ran the standard 1 MB upload test in Cirrus on one, then watched for that new file to be propagated via iCloud Drive to the other. I did this with privacy disabled, and examined both log extracts, and a full log extract on the recipient Mac, using Consolation. I’ll refer to the uploading Mac as A (running 10.12.6), and the syncing Mac as B (running 10.13.4).

Mac A’s progress icon went very quickly to over 90% complete, as it usually does, then paused before completing within 5 seconds. The last delay coincides with the phase identified by CloudDocs as sync down, which runs after the actual upload is complete, and is described by CloudKit as FetchRecordZoneChanges.

This phase started at 4.207 seconds time elapsed, and completed at 4.462 seconds, at which point Mac A was finished.

Receipt of the push notification from APS to Mac B of changes to iCloud Drive was logged at 4.310 seconds time elapsed, well before the completion of the final phase on Mac A.


The log entry first recording the push notification is worth noting:
04.310 courier Stream processing: complete yes, invalid no, length parsed 249, parameters {
APSProtocolCommand = 10;
APSProtocolMessageExpiry = "1970-01-01 18:12:15 +0000";
APSProtocolMessageID = <e9c74616>;
APSProtocolMessagePriority = 5;
APSProtocolMessageTimestamp = 1522602544155347565;
APSProtocolPayload = <…>;
APSProtocolToken = <…>;
APSProtocolTopicHash = <…>;

I hope that the expiry given means that the message never expires.

This was then interpreted by apsd as:
04.311 courier-oversized Received message for enabled topic '' onInterface: NonCellular with payload '{
ck = {
ce = 1;
cid = "";
fet = {
dbs = 1;
sid = "subscription/";
zid = "";
zoid = "…";
nid = "…";
}' with priority 5

Key events in its processing include:
04.311 apsd asked to store incoming message <…> with guid … environment …
04.311 courier Stream processing: complete no, invalid no, length parsed 0, parameters (null)

The next entry is important, and records that low-priority notifications are then deleted, which could of course explain some problems:
04.311 apsd deleting low-priority incoming messages for topic environment production
04.311 apsd APSMessageStore - APSIncomingMessageRecordDeleteMessagesForTopic 1 environment production
04.316 apsd APSMessageStore - New message record [<…>{}] has ID [8726].
04.316 apsd calling completion block for incoming message …
04.316 courier Sending acknowledgement message with response 0 and messageId <…> (3922150934)
04.317 courier _notifyForIncomingMessage … with guid …
04.317 connectionServer Dispatching low priority message: <xpc object>
04.317 apsd Looking up connection on peer: … found …
04.318 connection bird ApplePushService … Delivering message from apsd: … …
04.318 connection bird ApplePushService … responding with an ack for message with guid …

At this stage, bird has received the pushed notification, and this is announced by CloudDocs at the start of the sync sequence:
04.318 CloudDocs [NOTICE] received a push for client zone
CloudDocs then starts the sync down, which is announced by bird.

This triggers a long dialog between cloudd and CloudKit to perform the sync down:
04.322 CloudKit starts CKFetchRecordZoneChangesOperation
which is then echoed by cloudd as fetch-record-changes.

Next, CloudKit starts CKDFetchRecordZoneChangesOperation, announced by cloudd as db-operation, and CloudKit starts CKDRecordFetchAggregator.

cloudd announces url-request then c2-request-task, and at this stage DuetActivityScheduler (DAS), one of the main activity scheduling and dispatching services, becomes involved, submitting activities for this to one of its activities groups, Note that this is scheduled to a pool of background connection activities, which may explain why sync down can become greatly delayed.

After cloudd announces url-request-transmission, there is a loop to fetch the new/changed records for iCloud Drive. This runs in sequence:
04.830 cloudd announces fetch-records
CloudKit starts CKDFetchRecordsOperation
cloudd announces decrypt-records
CloudKit starts CKDDecryptRecordsOperation
cloudd announces db-operation, then starts its state transition sequences
04.849 CloudDocs [INFO] got a record
CloudKit finished CKDFetchRecordsOperation
and then recycles with cloudd announcing fetch-records again, until all the records are fetched.

Once those are completed:
04.863 bird announces did-sync-down
CloudDocs [NOTICE] received 2 edited items from the cloud for
CloudKit finished CKDFetchRecordsOperation, CKDRecordFetchAggregator, CKDFetchRecordZoneChangesOperation
04.868 bird finish-changes
CloudKit finished CKFetchRecordZoneChangesOperation – sync down (push triggered)

This completes the sync-down, and the first run through the iCloud progress icon. At this stage, iCloud Drive shows the new/changed items, but they have not yet been downloaded to Mac B. That sequence starts with the announcement:
00:04.871 CloudDocs [NOTICE] we will now download new documents automatically because the account contains less than 5.37 GB
00:04.913 CloudDocs [NOTICE] downloading 1 documents in

then bird calls dl-content and client/fetch-records, for CloudKit to download (opportunistic). I will analyse the download process in another article shortly.

The sync down sequence on Mac B is summarised in the following chart. Note that MMCS is not involved at all (judging by its complete lack of log entries) in this sequence, as no bulk transfers take place.


This suggests several good reasons for a Mac or iOS device failing to sync promptly:

  • iCloud may not push the notification promptly
  • the pushed notification may not be received, relying on iCloud to send another when that first message passes unacknowledged
  • the pushed notification may get deleted by higher priority messages
  • bird, CloudDocs, or CloudKit may stall part way through the sync down sequence.

It should be fairly straightforward to distinguish between these using the log obtained with Cirrus.

These also explain why uploading a file to iCloud Drive could clear a sync down failure, by forcing a fresh sync down of its own. That should include all earlier changes too.

My concern over performance is the involvement of DAS and the use of its pool of background connection activities. If that pool is busy, iCloud Drive sync down could be delayed considerably. There doesn’t appear to be any way to influence that, which is a design decision on Apple’s part.