Apple silicon: 3 But does it save energy?

At the end of the second article in this series, I concluded that the power efficiency of the CPU cores in Apple silicon chips enables high performance to be sustained at low thermal pressure. What I haven't yet considered is energy use.

A large proportion of Apple’s hardware business depends on low-energy devices such as iPhones, iPads and its highly successful Mac notebooks. Those have to strike the right balance between performance and features on the one hand, and endurance on a battery charge on the other. Models such as the MacBook Air have sold by the million largely on their superior endurance, and that has been one of the main driving factors behind the development of Apple silicon.

At first sight, though, manipulating core frequency should have little if any impact on energy use. Return to the formula for estimating dynamic power use given in the previous article:
P = C × f × V²
where P is dynamic power, C is a constant normally considered to be a switched load capacitance, f is core frequency, and V is voltage. If that holds good, then running a thread at half the core frequency should require about half the power, but it will also take twice as long to complete. That will reduce thermal pressure, but does nothing for total energy use.
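To make that reasoning concrete, here’s a minimal sketch in Swift of the arithmetic, using arbitrary units and assuming the voltage is held constant; the numbers are illustrative, not measurements.

```swift
// Illustrative only: arbitrary units, voltage held constant.
let c = 1.0                 // C, switched load capacitance
let v = 1.0                 // V, core voltage
let work = 200_000_000.0    // loop iterations the thread must complete

for f in [3.0e9, 1.5e9] {                    // full and half core frequency, in Hz
    let power = c * f * v * v                // P = C × f × V²
    let time = work / f                      // seconds needed to finish the thread
    let energy = power * time                // energy = power × time
    print("f: \(f) Hz  P: \(power)  t: \(time)  E: \(energy)")
}
// Both frequencies give the same E: halving f halves P but doubles t.
```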

Core types

This is where the two core types come in handy. Although I haven’t seen any similar analysis for M3 cores, E and P cores in the M1 CPU are built differently: the E core has roughly half the processing units of the P core, and at the same frequency completes instructions more slowly. If the E core really is more efficient in its use of energy, then a combination of E and P cores, together with a wise core allocation strategy, could significantly reduce total energy use and increase battery endurance.

One way to assess this for CPU cores is to estimate energy use against throughput of in-core test loops for the two core types individually. This is easier with the M3 Pro than with M1 Pro or Max chips, because of the number of P and E cores in each, and the way in which macOS controls E core frequency in the M1 Pro and Max.
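For illustration, an in-core test loop of this kind might look like the following Swift sketch; the arithmetic shown here is my own stand-in, not the code used in the original tests.

```swift
import Foundation

// A sketch of an in-core floating-point test loop: tight, dependent arithmetic
// that stays within the core and doesn't touch memory.
func floatingPointTest(iterations: Int) -> Double {
    var value = 1.0
    for _ in 0..<iterations {
        value = value * 1.000_000_01 + 0.000_000_01  // dependent multiply-add keeps the FP pipeline busy
        if value > 2.0 { value -= 1.0 }              // keep the value bounded
    }
    return value   // returning the result stops the compiler optimising the loop away
}

let start = Date()
let result = floatingPointTest(iterations: 200_000_000)
print("result \(result) in \(Date().timeIntervalSince(start)) s")
```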

When running a standard thread consisting of 200 million loops of floating-point arithmetic, total energy used by the two core types and clusters differed greatly. At high frequency, using the P cluster incurred an overhead of about 450 mJ, and an additional cost of just over 900 mJ for each P core used. However, the additional cost of each E core used at high frequency was half that, about 450 mJ, and at low frequency that fell to only 150 mJ, in addition to a cluster overhead of a mere 60 mJ.
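Those regression estimates amount to a simple linear cost model, sketched below in Swift. The values are those given above, except the E cluster overhead at high frequency, which I’ve assumed.

```swift
// A hypothetical linear model of the regression estimates above (all values in mJ).
struct ClusterCost {
    let overhead: Double                    // fixed cost of using the cluster at all
    let perCore: Double                     // additional cost for each active core
    func energy(cores: Int) -> Double { overhead + Double(cores) * perCore }
}

let pHighFrequency = ClusterCost(overhead: 450, perCore: 900)   // P cores at high frequency
let eHighFrequency = ClusterCost(overhead: 450, perCore: 450)   // E cores at high frequency (overhead assumed)
let eLowFrequency  = ClusterCost(overhead: 60,  perCore: 150)   // E cores at low frequency

print(pHighFrequency.energy(cores: 6))   // ≈ 5,850 mJ for six P cores
print(eLowFrequency.energy(cores: 6))    // ≈ 960 mJ for the same task on six E cores at low frequency
```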

[Chart: measured energy use (mJ) for the same task run on different numbers of P cores, and of E cores at high and low frequency.]

Measured values aren’t quite as simple as those estimated by linear regression, and are shown in the chart above: energy use when running exactly the same computational task on different numbers of cores of the same type, and on E cores at both low frequency (low QoS) and high frequency (high QoS).

For P cores, energy use ranged between 1,000 and 1,300 mJ per core, but was about 500 mJ per core for E cores at high frequency, and only 200 mJ at low frequency. Thus, for background tasks that don’t need to be completed rapidly, allocating them to E cores rather than P cores can reduce energy use to around a fifth (20%).

Core allocation strategy

That explains the first part of the core allocation strategy used by macOS: allocating high QoS threads to P cores, and low QoS threads to E cores. But what happens when there are too many high QoS threads for the available P cores, and they overflow onto the E cores? Because an E core running at high frequency uses only about half the energy of a P core, those overflowing threads run slightly slower, but still require only about half the energy they would on a P core.
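For those writing code, this strategy is driven by the QoS assigned to each thread or queue. The sketch below shows the idea using Grand Central Dispatch; the queue labels and placeholder functions are mine, and macOS, not the app, makes the final core allocation.

```swift
import Foundation

// Placeholder functions standing in for real work.
func performMaintenance() { /* background housekeeping, e.g. indexing */ }
func renderPreview()      { /* latency-sensitive work */ }

// Low QoS work is run on E cores at low frequency; high QoS work goes to
// P cores first, overflowing onto E cores at high frequency when P cores are busy.
let background  = DispatchQueue(label: "maintenance", qos: .background)
let interactive = DispatchQueue(label: "compute", qos: .userInitiated)

background.async { performMaintenance() }
interactive.async { renderPreview() }
```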

Although I don’t yet have access to an M3 Max to measure this difference in energy use, it uses the same cores as the M3 Pro, so I can predict the effects of cluster overflow for the two 6-core P clusters in the M3 Max. When its first 6-core P cluster overflows to the second, each additional P core should incur another 900-1,000 mJ in energy, plus a cluster overhead of about 450 mJ. Thus using just one additional P core would cost another 1,400 mJ or so, whereas in the M3 Pro overflow to the E cluster costs only about 500 mJ.
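That prediction is simple arithmetic, sketched below using a mid-range figure of 950 mJ per additional P core; these are estimates from the model above, not measurements.

```swift
// Predicted cost of overflowing a single high-QoS thread (values in mJ, from the estimates above).
let m3MaxSecondPCluster = 450.0 + 950.0    // second P cluster overhead + one P core (900-1,000 mJ)
let m3ProECluster       = 500.0            // one E core at high frequency in the M3 Pro
print(m3MaxSecondPCluster)                  // ≈ 1,400 mJ
print(m3MaxSecondPCluster / m3ProECluster)  // ≈ 2.8× the energy for the same overflowing thread
```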

Beyond cores

Unfortunately, extending these comparisons beyond the CPU cores becomes more complex, and in practice almost impossible to measure. At least these figures explain why energy efficiency requires two types of CPU core, together with an effective management strategy to strike the right balance between them.

Since the release of the M1 family, Apple has diversified core allocation strategies in macOS with the addition of special-purpose modes. The first of these applies during virtualisation of macOS, and simply treats all virtualised threads (thus virtual CPU cores) as high QoS threads on the host. That simplifies management of the virtual machine, but increases the energy cost of its low QoS threads. This has the odd side-effect that background threads in a VM run faster than they would on the host.

The other special-purpose mode is Game Mode, which appears to have been little investigated. While this gives high-priority access to the GPU as might be expected, its effect on core allocation is more opaque, as designated games running in this mode are given preferential access to the E, rather than P, cores. Further work is needed to understand the benefits.

There is a third special-purpose mode of core allocation that I have examined in detail, and attributed to the use of the AMX matrix co-processor. That leads me to the next article in this series, where I will consider the roles of the specialist processing units in Apple silicon chips, including the NEON vector processing unit in each CPU core, the AMX, neural engine (ANE), and the GPU in Compute mode.

Concepts

  • E cores are designed differently from P cores, to use less energy.
  • Energy use of an M3 E core is about 20-50% that of a P core when running the same task.
  • Core allocation strategies minimise energy use of background tasks.
  • Overflowing threads from a P cluster to an E cluster roughly halves their energy use, compared with running them on P cores.

Previously in this series

Apple silicon: 1 Cores, clusters and performance
Apple silicon: 2 Power and thermal glory

Further reading

Evaluating M3 Pro CPU cores: 1 General performance
Evaluating M3 Pro CPU cores: 2 Power and energy
Evaluating M3 Pro CPU cores: 3 Special CPU modes
Evaluating M3 Pro CPU cores: 4 Vector processing in NEON
Evaluating M3 Pro CPU cores: 5 Quest for the AMX
Evaluating the M3 Pro: Summary
Finding and evaluating AMX co-processors in Apple silicon chips
Comparing Accelerate performance on Apple silicon and Intel cores
M3 CPU cores have become more versatile