My first article in this series explained the history behind Apple’s M-series chips, and how they use ARM’s big.LITTLE architecture in heterogeneous multi-processing (HMP) with two types of CPU core. If you haven’t yet watched my presentation for MacSysAdmin 2022, now is a good time to view it, so you’re better prepared for the detail that follows.
The P cores in Apple’s M1 and M2 series chips have six integer units and four floating-point/NEON units. While they use plenty of techniques such as out-of-order execution to optimise performance, as explored and documented by Dougall Johnson, Maynard Handley and others, those are well beyond the influence or control of mere users. Here I’ll concentrate on features of more direct relevance to how macOS uses those cores.
P cores idle at a frequency of 600 MHz, and have a maximum frequency of either 3204 MHz in the original M1 chip, or 3228 MHz in M1 Pro/Max/Ultra versions. In practice, under the management of macOS, P cores normally run at steady frequencies of either 600 MHz or 3036 MHz and above, although they can run at intermediate frequencies while load is changing. Once load is removed, they return almost immediately to idle frequency.
Frequencies in both types of core are set by cluster, and don’t differ within any given cluster. So when the first P cluster is loaded with one or more threads, macOS raises the frequency of all four of its cores until those threads complete, when frequency falls back very quickly to idle.
Power measurements match frequency, with each P core typically drawing up to a maximum around 2.5 W for a cluster total of about 10 W, but using very little when idle.
In terms of functional units, each E core is roughly half a P core, sufficient to ensure that E cores have full support for floating-point and NEON features. This means that anything a P core can do, an E core can do too, if rather more slowly.
E cores also idle at a frequency of 600 MHz, but have a maximum frequency of only 2064 MHz, which is the same across the whole M1 series of chips. macOS also controls the frequency of E cores slightly differently, in that they can be run at an intermediate frequency of 972 MHz, as well as idle and maximum. Although this might appear to be a minor detail, it turns out to be significant in their control and performance.
If the relationship between performance and power were linear, you might then expect an E core to use a third of the power of a P core, thus a maximum of about 800 mW. When measured, each E core has a maximum power usage of less than half that, around 300 mW.
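The gap between the linear prediction and the measurement is easy to check with the figures quoted above; this short sketch simply restates that arithmetic:

```python
# The article's power figures: if power scaled linearly with throughput,
# an E core (about a third of a P core's throughput) should draw about a
# third of a P core's maximum power.
P_CORE_MAX_W = 2.5                 # W, typical P-core maximum
expected_e_w = P_CORE_MAX_W / 3    # the linear prediction
measured_e_w = 0.3                 # W, measured E-core maximum

print(f"linear prediction: {expected_e_w * 1000:.0f} mW")  # about 833 mW
print(f"measured:          {measured_e_w * 1000:.0f} mW")  # 300 mW, well under half the prediction
```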
Taken together with the difference in functional units, you’d expect an E core running at maximum frequency and 100% active residency to have a throughput of about a third of a P core at its maximum frequency. In practice, running tight loops of code accessing only registers, E cores can achieve almost twice the expected throughput, giving them nearly two thirds of the throughput of P cores. For example, a task running in two threads allocated to two P cores might complete in 32 seconds, and on two E cores in 52 seconds.
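Again, the expected and observed ratios can be worked through from the figures given above (the M1 Pro/Max maximum frequencies and the 32 s and 52 s timings):

```python
# Back-of-envelope check of E-core throughput against the naive expectation.
P_MAX_FREQ = 3228   # MHz, P-core maximum (M1 Pro/Max/Ultra)
E_MAX_FREQ = 2064   # MHz, E-core maximum
UNIT_RATIO = 0.5    # each E core has roughly half the functional units of a P core

# Naive expectation: throughput scales with frequency x functional units
expected_ratio = (E_MAX_FREQ / P_MAX_FREQ) * UNIT_RATIO
print(f"expected E:P throughput ratio = {expected_ratio:.2f}")  # 0.32, about a third

# Observed: the same two-thread task took 32 s on P cores, 52 s on E cores
observed_ratio = 32 / 52
print(f"observed E:P throughput ratio = {observed_ratio:.2f}")  # 0.62, nearly two thirds
```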
Real-world task performance of E cores isn’t as impressive, though. Compressing an IPSW image using two threads and two P cores takes 32 seconds, but on two E cores takes 134 seconds, for almost a quarter of the performance. Thus, whether code is allocated to the P or E cores can make a substantial difference to the time it takes to complete.
If the relationship between performance and power were linear, then there would be no efficiency benefit to running tasks more slowly on cores that used lower power. Because E cores draw far less power than linear scaling would predict, substantial savings can be made by running tasks on the E cores alone, instead of on P cores. One example, again based on file compression, required 10.3 J of total energy when run on P cores, but only 3.1 J on the E cores, just 30% of the former.
Thus, for this specific instance of compression, running its threads entirely on E cores takes four times as long as on P cores, but uses a total of less than a third of the energy.
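The trade-off in this compression example, combining the timings and energies quoted above, comes out as:

```python
# Energy and time trade-off for the compression example above.
t_p, t_e = 32.0, 134.0    # seconds: two P cores vs two E cores
e_p, e_e = 10.3, 3.1      # joules total for the same task

print(f"E cores take {t_e / t_p:.1f} times as long")      # about 4.2x
print(f"but use {100 * e_e / e_p:.0f}% of the energy")    # about 30%
```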
In Apple’s current designs, the number of P cores in any M1 chip is equal to or greater than the number of E cores, and in the faster chips P cores outnumber E cores 4:1. This works well when threads allocated to E cores need to be completed over a period of time, rather than at a moment in time, such as background services. Tasks the user is waiting for then need to use the greater and more immediate capacity of P cores. This is quite different from many Intel Alder Lake chips, which provide equal numbers of their core types.
Task performance isn’t just limited by core performance. A good example is making a Time Machine backup, which is heavily dependent on I/O with storage. By default, macOS throttles that I/O so that it doesn’t impair the performance of user tasks. This means that running Time Machine’s background backup service on P rather than E cores wouldn’t be expected to alter performance significantly, unless its I/O throttling were also removed.
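For those who want to experiment, the control generally understood to govern this low-priority I/O throttling is the `debug.lowpri_throttle_enabled` sysctl; note that this is a system-wide setting, and Apple doesn’t document it as a supported interface, so treat the following as a sketch rather than a recommendation:

```shell
# Inspect the current low-priority I/O throttling setting (macOS)
sysctl debug.lowpri_throttle_enabled

# Disable throttling until the next reboot; backups and other background
# I/O will then compete freely with user tasks
sudo sysctl debug.lowpri_throttle_enabled=0

# Re-enable it afterwards
sudo sysctl debug.lowpri_throttle_enabled=1
```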
While Activity Monitor’s CPU History window provides valuable qualitative information about core allocation and performance on Apple silicon chips, it has one major flaw that prevents its use for quantitative work: CPU %, whether given in its main window or shown by the height of columns in CPU History, takes no account whatsoever of the frequency at which cores are running. There’s a good example of this shown in my MacSysAdmin presentation, and I’ll examine this in a future article in this series.
You should also ignore the Energy values given, as they’re based entirely on CPU % and take no account of core frequency or type. It’s extraordinary that its estimation of energy use makes no distinction between the P and E cores in Apple silicon chips.
Sadly, the only way of getting reliable information about core frequency, energy and power is with the command tool powermetrics.
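On Apple silicon, powermetrics (bundled with macOS, and requiring root) reports per-cluster frequencies and CPU power in its samples; a minimal invocation looks like this:

```shell
# Take a single one-second sample of CPU frequency and power
sudo powermetrics --samplers cpu_power -i 1000 -n 1
```

Its output includes active frequencies and residencies for the E and P clusters, which is exactly the information that Activity Monitor’s CPU % omits.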