Apple silicon: 3 But does it save energy?

At the end of the second article in this series, I concluded that the power efficiency of the CPU cores in Apple silicon chips enables high performance to be sustained at low thermal pressure. What I haven't yet considered is energy use.

A large proportion of Apple’s hardware business depends on low-energy devices such as iPhones, iPads and its highly successful Mac notebooks. Those have to strike the right balance between performance and features on the one hand, and endurance on a battery charge on the other. Models such as the MacBook Air have sold by the million largely on their superior endurance, and that has been one of the main driving factors behind the development of Apple silicon.

At first sight, though, manipulating core frequency should have little if any impact on energy use. Return to the formula for estimating dynamic power use given in the previous article:
P = C × f × V²
where P is dynamic power, C is a constant normally considered to be a switched load capacitance, f is core frequency, and V is voltage. If that holds good, then running a thread at half the core frequency should require about half the power, but it will also take twice as long to complete. That will reduce thermal pressure, but does nothing for total energy use.
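To make that reasoning concrete, here’s a minimal sketch in Swift of the arithmetic, using arbitrary units and assuming the voltage is held constant; the numbers are illustrative, not measurements.

```swift
// Illustrative only: arbitrary units, voltage held constant.
let c = 1.0                 // C, switched load capacitance
let v = 1.0                 // V, core voltage
let work = 200_000_000.0    // loop iterations the thread must complete

for f in [3.0e9, 1.5e9] {                    // full and half core frequency, in Hz
    let power = c * f * v * v                // P = C × f × V²
    let time = work / f                      // seconds needed to finish the thread
    let energy = power * time                // energy = power × time
    print("f: \(f) Hz  P: \(power)  t: \(time)  E: \(energy)")
}
// Both frequencies give the same E: halving f halves P but doubles t.
```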

Core types

This is where the two core types come in handy. Although I haven’t seen any similar analysis for M3 cores, E and P cores in the M1 CPU are built differently: the E core has roughly half the processing units of the P core, and at the same frequency completes instructions more slowly. If the E core really is more efficient in its use of energy, then a combination of E and P cores, together with a wise core allocation strategy, could significantly reduce total energy use and increase battery endurance.

One way to assess this for CPU cores is to estimate energy use against throughput of in-core test loops for the two core types individually. This is easier with the M3 Pro than with M1 Pro or Max chips, because of the number of P and E cores in each, and the way in which macOS controls E core frequency in the M1 Pro and Max.
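For illustration, an in-core test loop of this kind might look like the following Swift sketch; the arithmetic shown here is my own stand-in, not the code used in the original tests.

```swift
import Foundation

// A sketch of an in-core floating-point test loop: tight, dependent arithmetic
// that stays within the core and doesn't touch memory.
func floatingPointTest(iterations: Int) -> Double {
    var value = 1.0
    for _ in 0..<iterations {
        value = value * 1.000_000_01 + 0.000_000_01  // dependent multiply-add keeps the FP pipeline busy
        if value > 2.0 { value -= 1.0 }              // keep the value bounded
    }
    return value   // returning the result stops the compiler optimising the loop away
}

let start = Date()
let result = floatingPointTest(iterations: 200_000_000)
print("result \(result) in \(Date().timeIntervalSince(start)) s")
```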

When running a standard thread consisting of 200 million loops of floating-point arithmetic, total energy used by the two core types and clusters differed greatly. At high frequency, using the P cluster incurred an overhead of about 450 mJ, and an additional cost of just over 900 mJ for each P core used. However, the additional cost of each E core used at high frequency was half that, about 450 mJ, and at low frequency that fell to only 150 mJ, in addition to a cluster overhead of a mere 60 mJ.
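Those regression estimates amount to a simple linear cost model, sketched below in Swift. The values are those given above, except the E cluster overhead at high frequency, which I’ve assumed.

```swift
// A hypothetical linear model of the regression estimates above (all values in mJ).
struct ClusterCost {
    let overhead: Double                    // fixed cost of using the cluster at all
    let perCore: Double                     // additional cost for each active core
    func energy(cores: Int) -> Double { overhead + Double(cores) * perCore }
}

let pHighFrequency = ClusterCost(overhead: 450, perCore: 900)   // P cores at high frequency
let eHighFrequency = ClusterCost(overhead: 450, perCore: 450)   // E cores at high frequency (overhead assumed)
let eLowFrequency  = ClusterCost(overhead: 60,  perCore: 150)   // E cores at low frequency

print(pHighFrequency.energy(cores: 6))   // ≈ 5,850 mJ for six P cores
print(eLowFrequency.energy(cores: 6))    // ≈ 960 mJ for the same task on six E cores at low frequency
```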

[Chart: measured energy use (mJ) for the same task run on different numbers of P cores, and of E cores at high and low frequency.]

Measured values aren’t quite as simple as those estimated by linear regression, and are shown in the chart above: energy use when running exactly the same computational task on different numbers of cores of the same type, and on E cores at both low frequency (low QoS) and high frequency (high QoS).

For P cores, energy use ranged between 1,000 and 1,300 mJ per core, but was about 500 mJ per core for E cores at high frequency, and only 200 mJ at low frequency. Thus, for background tasks that don’t need to be completed rapidly, allocating them to E cores rather than P cores can reduce energy use to around a fifth (20%).

Core allocation strategy

That explains the first part of the core allocation strategy used by macOS: allocating high QoS threads to P cores, and low QoS threads to E cores. But what happens when there are too many high QoS threads for the available P cores, and they overflow onto the E cores? Because an E core running at high frequency uses only about half the energy of a P core, those overflowing threads run slightly slower, but still require only about half the energy they would on a P core.
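For those writing code, this strategy is driven by the QoS assigned to each thread or queue. The sketch below shows the idea using Grand Central Dispatch; the queue labels and placeholder functions are mine, and macOS, not the app, makes the final core allocation.

```swift
import Foundation

// Placeholder functions standing in for real work.
func performMaintenance() { /* background housekeeping, e.g. indexing */ }
func renderPreview()      { /* latency-sensitive work */ }

// Low QoS work is run on E cores at low frequency; high QoS work goes to
// P cores first, overflowing onto E cores at high frequency when P cores are busy.
let background  = DispatchQueue(label: "maintenance", qos: .background)
let interactive = DispatchQueue(label: "compute", qos: .userInitiated)

background.async { performMaintenance() }
interactive.async { renderPreview() }
```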

Although I don’t yet have access to an M3 Max to measure this difference in energy use, it uses the same cores as the M3 Pro, so I can predict the effects of cluster overflow for the two 6-core P clusters in the M3 Max. When its first 6-core P cluster overflows to the second, each additional P core should incur another 900-1,000 mJ in energy, plus a cluster overhead of about 450 mJ. Thus using just one additional P core would cost another 1,400 mJ or so, whereas in the M3 Pro overflow to the E cluster costs only about 500 mJ.
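That prediction is simple arithmetic, sketched below using a mid-range figure of 950 mJ per additional P core; these are estimates from the model above, not measurements.

```swift
// Predicted cost of overflowing a single high-QoS thread (values in mJ, from the estimates above).
let m3MaxSecondPCluster = 450.0 + 950.0    // second P cluster overhead + one P core (900-1,000 mJ)
let m3ProECluster       = 500.0            // one E core at high frequency in the M3 Pro
print(m3MaxSecondPCluster)                  // ≈ 1,400 mJ
print(m3MaxSecondPCluster / m3ProECluster)  // ≈ 2.8× the energy for the same overflowing thread
```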

Beyond cores

Unfortunately, extending these comparisons beyond the CPU cores becomes more complex, and in practice almost impossible to measure. At least these figures explain why energy efficiency requires two types of CPU core, together with an effective management strategy to strike the right balance between them.

Since the release of the M1 family, Apple has diversified core allocation strategies in macOS with the addition of special-purpose modes. The first of these applies during virtualisation of macOS, and simply treats all virtualised threads (thus virtual CPU cores) as high QoS threads on the host. That simplifies management of the virtual machine, but increases the energy cost of its low QoS threads. This has the odd side-effect that background threads in a VM run faster than they would on the host.

The other special-purpose mode is Game Mode, which appears to have been little investigated. While this gives high-priority access to the GPU as might be expected, its effect on core allocation is more opaque, as designated games running in this mode are given preferential access to the E, rather than P, cores. Further work is needed to understand the benefits.

There is a third special-purpose mode of core allocation that I have examined in detail, and attributed to the use of the AMX matrix co-processor. That leads me to the next article in this series, where I will consider the roles of the specialist processing units in Apple silicon chips, including the NEON vector processing unit in each CPU core, the AMX, neural engine (ANE), and the GPU in Compute mode.

Concepts

  • E cores are designed differently from P cores, to use less energy.
  • Energy use of an M3 E core is about 20-50% that of a P core when running the same task.
  • Core allocation strategies minimise energy use of background tasks.
  • Overflowing threads from a P cluster to an E cluster roughly halves their energy use, compared with running them on P cores.

Previously in this series

Apple silicon: 1 Cores, clusters and performance
Apple silicon: 2 Power and thermal glory

Further reading

Evaluating M3 Pro CPU cores: 1 General performance
Evaluating M3 Pro CPU cores: 2 Power and energy
Evaluating M3 Pro CPU cores: 3 Special CPU modes
Evaluating M3 Pro CPU cores: 4 Vector processing in NEON
Evaluating M3 Pro CPU cores: 5 Quest for the AMX
Evaluating the M3 Pro: Summary
Finding and evaluating AMX co-processors in Apple silicon chips
Comparing Accelerate performance on Apple silicon and Intel cores
M3 CPU cores have become more versatile