How M1 Macs feel faster than Intel models: it’s about QoS

Last week I showed fascinating screenshots of how M1 Macs can run background processes exclusively on their four ‘efficiency’ cores. I’ve now been digging deeper, and this article is a summary of how Intel and M1 Macs deal with multiple tasks, priorities and resources under macOS, and how that influences our assessment of their speed.

Operations and QoS

Whether its cores are Intel or ARM, macOS has to manage how tasks are run side-by-side on the same processor. Although this can be achieved in many different ways, for the sake of simplicity I concentrate here on operation queues using Apple’s Foundation framework, which you’re most likely to encounter in macOS apps.

At their heart, the principle of these is fairly simple. An app has to perform work which will take time, so to avoid blocking the user and to let that work run alongside other apps, the developer puts the code and data into an Operation, like a task, which can then be run on one or more cores at a time. Two common features which the developer can specify are the maximum number of concurrent operations, and their importance, or Quality of Service (QoS), and it’s the latter which I focus on here.
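As a minimal sketch of that idea, and not a description of how any particular app is written, the following Swift wraps some work in Operations and sets both features. OperationQueue, BlockOperation, maxConcurrentOperationCount and qualityOfService are Foundation's own names; the function sumOfSquares(upTo:), the class ResultStore and the function runQueuedWork() are hypothetical stand-ins invented for illustration.

```swift
import Foundation

// Hypothetical stand-in for time-consuming work, such as compressing part of a file.
func sumOfSquares(upTo n: Int) -> Int {
    var total = 0
    for i in 1...n { total += i * i }
    return total
}

// Hypothetical thread-safe container, as operations may finish on different threads.
final class ResultStore: @unchecked Sendable {
    private let lock = NSLock()
    private var values: [Int] = []

    func append(_ value: Int) {
        lock.lock()
        defer { lock.unlock() }
        values.append(value)
    }

    var sortedValues: [Int] {
        lock.lock()
        defer { lock.unlock() }
        return values.sorted()
    }
}

func runQueuedWork() -> [Int] {
    let store = ResultStore()
    let queue = OperationQueue()
    queue.maxConcurrentOperationCount = 2   // at most two operations run at once

    for n in [1_000, 2_000, 3_000] {
        let operation = BlockOperation {
            store.append(sumOfSquares(upTo: n))
        }
        operation.qualityOfService = .background   // lowest of the named QoS levels
        queue.addOperation(operation)
    }

    // Block until every queued operation has completed.
    queue.waitUntilAllOperationsAreFinished()
    return store.sortedValues
}

let demoResults = runQueuedWork()
print(demoResults)
```

Changing .background to .userInteractive is all it takes to request the highest QoS instead; what macOS then does with that request is the subject of the rest of this article.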

Apple provides four QoS levels, and a fifth which leaves it up to macOS to decide. When writing the code, the developer uses names for the levels of QoS, ranging from background (lowest) to userInteractive (highest). These turn out to be integer values spread evenly between 9 and 33, respectively, which is strange in itself.* Trying to use integer values for which there are no names, such as 32 or 34, doesn’t work: macOS doesn’t recognise those values, and assigns the QoS -1, which lets the system decide which of the four defined levels to use.
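The names and their integer values can be listed with a few lines of Swift. This is only a sketch: the case names are those of Foundation's QualityOfService, and on macOS the raw values printed should be the 9 to 33 and -1 described above, although other platforms' Foundation ports may number them differently. The levels array and its labels are introduced here purely for illustration.

```swift
import Foundation

// The four named QoS levels from lowest to highest, followed by .default,
// which leaves the choice to the system.
let levels: [(name: String, qos: QualityOfService)] = [
    ("background", .background),
    ("utility", .utility),
    ("userInitiated", .userInitiated),
    ("userInteractive", .userInteractive),
    ("default", .default),
]

for level in levels {
    // rawValue is the integer behind each name.
    print(level.name, level.qos.rawValue)
}
```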

Testing

To investigate what QoS does I’ve built another version of my free compressor-decompressor utility Cormorant (available below) which lets you set the QoS for its tasks. I then created a standard 10 GB test file, and compressed that many, many times using different QoS settings and concurrent operations.

On an Intel Xeon W 8-core processor, when there are no competing processes, all QoS settings result in the operation being performed as quickly as possible. My test 10 GB file normally took 5.6 to 6.6 seconds to compress irrespective of the QoS. QoS only came into play when there were competing operations taking place at the same time, when it functions as a priority.

For example, running one compression at a QoS of 9 (background) and another at 33 (userInteractive), the process with the higher QoS still completed in normal time, and that with the lower QoS was delayed, taking as much as 24 seconds. When multiple compression tasks were used to load the processor, hyperthreading was seen in Activity Monitor, with virtual second cores taking some of the additional load.
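A rough way to try this kind of comparison without Cormorant is sketched below. It is not the code used for these tests: busyWork(iterations:) is a hypothetical CPU-bound loop standing in for compression, and Timings and raceAtDifferentQoS(iterations:copies:) are names invented for this example. It queues the same work at background and userInteractive QoS together and reports how long the slowest operation at each level took. Each operation is single-threaded, so the copies parameter needs to be large enough to load all the cores before any competition appears.

```swift
import Foundation

// Hypothetical CPU-bound workload standing in for compression.
func busyWork(iterations: Int) -> UInt64 {
    var x: UInt64 = 1
    for i in 0..<iterations {
        x = x &* 6364136223846793005 &+ UInt64(i)
    }
    return x
}

// Hypothetical thread-safe record of the slowest finish time per QoS label.
final class Timings: @unchecked Sendable {
    private let lock = NSLock()
    private var slowestByLabel: [String: TimeInterval] = [:]
    private var checksum: UInt64 = 0   // keeps the work from being optimised away

    func record(_ label: String, _ seconds: TimeInterval, _ value: UInt64) {
        lock.lock()
        defer { lock.unlock() }
        slowestByLabel[label] = max(slowestByLabel[label] ?? 0, seconds)
        checksum ^= value
    }

    var slowest: [String: TimeInterval] {
        lock.lock()
        defer { lock.unlock() }
        return slowestByLabel
    }
}

// Queue `copies` operations at each of two QoS levels and time them from a common start.
func raceAtDifferentQoS(iterations: Int, copies: Int) -> [String: TimeInterval] {
    let timings = Timings()
    let queue = OperationQueue()
    let levels: [(String, QualityOfService)] = [
        ("background", .background),
        ("userInteractive", .userInteractive),
    ]
    let start = Date()
    for (label, qos) in levels {
        for _ in 0..<copies {
            let operation = BlockOperation {
                let value = busyWork(iterations: iterations)
                timings.record(label, Date().timeIntervalSince(start), value)
            }
            operation.qualityOfService = qos
            queue.addOperation(operation)
        }
    }
    queue.waitUntilAllOperationsAreFinished()
    return timings.slowest
}

let race = raceAtDifferentQoS(iterations: 5_000_000, copies: 4)
for (label, seconds) in race.sorted(by: { $0.key < $1.key }) {
    print(label, String(format: "%.2f s", seconds))
}
```

The absolute times depend entirely on the Mac and the workload, so they aren't comparable with the compression figures quoted here; it's the difference between the two levels that is of interest.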

Repeating similar tests on an M1, whether a Mac mini with its mains power supply or a MacBook Pro running on battery, the results were quite different. All operations with a QoS of 9 (background) were run exclusively on the four Efficiency (Icestorm) cores, even when that resulted in their being fully loaded and the Performance cores remaining idle. Operations with any higher QoS, from 17 to 33, were run on all eight cores.

The effect on performance of the task was also distinct. With a QoS of 9 (background), the standard compression task took 38-43 seconds, changing little with loading of higher QoS operations. When two intensive background tasks were run at the same time, one completed in that same time (40 seconds), while the other took almost twice the time (77 seconds), both remaining constrained to using the Efficiency cores.

Operations with higher QoS were also more consistent than on Intel cores. When two were run at the same time, the task with the higher QoS completed in much the same time as when run alone, and that with the lower QoS was extended to around 15.5 seconds, still less than half the time required on the Efficiency cores.

Strategy

If you’ve already got access to an M1 Mac, you can observe Apple’s new strategy at work. Open Activity Monitor and in its Window menu use the command CPU History to display what’s going on with its cores as your M1 idles and you use it.

This is my previous example of what happens during Spotlight indexing.

Here, this M1 MacBook Pro is making a Time Machine backup to a network share.

The pattern of use of cores is that almost all the activities of macOS are run on the Efficiency cores, with only the occasional blip on the Performance cores. Running apps and performing other user tasks is the other way around, with the brunt borne on the Performance rather than Efficiency cores. This is because those user tasks are more likely to run with QoS of at least 17, and in many cases 25 and 33.

As far as the processor goes, an M1 Mac is divided into two: the four Efficiency cores are there largely to run macOS and its many background tasks, freeing the four Performance cores for the apps which you run.

It feels so much faster

Benchmarks are all very well, but one almost universal comment made about M1 Macs is how much faster they feel, even when performance measurements don’t show as big a difference as we might like. One very effective way of giving a good impression of speed is to segregate macOS and user software to use different cores in the way that the M1 does.

Few events give a worse impression to the user than the interface slowing down in the face of a problem in the operating system. We’ve all experienced it: this could be a rogue mdworker process which keeps crashing and restarting, or anything which causes macOS to choke. Because those processes are handed over to the Efficiency cores, all they do now is slow other macOS background tasks, to which we’re much less sensitive.

The Time Machine backup pictured above ran ridiculously slowly, taking over 15 minutes to back up less than 1 GB of files. Had I not been watching it in Activity Monitor, I would have been completely unaware of its poor performance. Because Macs with Intel processors can’t segregate their tasks onto different cores in the same way, when macOS starts to choke on something it affects user processes too.

For those who felt that desktop Apple Silicon Macs should return to a pool of identical cores, I too would love 8 Performance cores, but please don’t take away those macOS/Efficiency cores.

Cormorant 1.3

Cormorant version 1.3 is now available from here: cormorant13, from Downloads above, from its Product Page, and through its auto-update mechanism. Have even more fun!

* Thanks to Stan, in his comment below, who points out what should have been blindingly obvious: QoS values are from a bitmask, so make sense as binary flags rather than integers. These are presumably just part of a group of settings internal to Operations.
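The pattern is easier to see when the four values are written out in binary. The sketch below does only that, and splits each value into its lowest three bits and the bits above them; that split is one way of reading the numbers, not a documented layout, and the helper binary(_:width:) is a name made up for this example.

```swift
// The four named QoS values quoted in the article.
let qosValues = [9, 17, 25, 33]

// Render an integer as a zero-padded binary string of the given width.
func binary(_ value: Int, width: Int = 6) -> String {
    let digits = String(value, radix: 2)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

for value in qosValues {
    // Show the value, its binary form, the bits above the lowest three, and the lowest three.
    print(value, binary(value), "upper:", value >> 3, "lower:", value & 0b111)
}
```

Read this way, the upper bits simply count 1, 2, 3, 4 while the lowest three bits stay at 001, which explains why the values step by 8.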