Making the most of Apple silicon power: 4 Frequency

There’s a major difference between the two most popular versions of Apple’s M1 chips: the original has four E and four P cores, while the M1 Pro and Max have only two E cores and eight P cores. It follows that the latter should deliver twice the performance of the original design when running higher QoS threads on their P cores. It also implies that Macs with M1 Pro or Max chips run multiple threads at lowest QoS, for background tasks, at half the speed of the original M1 chip. That would be embarrassing if true: a basic MacBook Air or mini would complete tasks such as Time Machine backups and Spotlight indexing in half the time taken by an expensive MacBook Pro or Studio with an M1 Max.

To understand how macOS addresses this, I look back at what happens when an original M1 chip’s E cores are loaded with threads running at lowest QoS.

[Figure: CPU History for the E cores of an original M1 running 1, 2, 3, 4, 6 and 8 threads at lowest QoS]

From the left, the tests run 1, 2, 3, 4, 6 and 8 threads. Tests with up to four threads complete in the same time, as each thread gets an E core to itself. Once there are more threads than E cores, a test takes longer: only one thread runs on each core at a time, so GCD queues the additional threads until the first batch of four has completed.
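That batching behaviour can be sketched as a simple model. This is only an illustration, assuming that every thread does the same amount of work and that queued threads start only when a core becomes free; the function name and the unit of time are hypothetical, not measurements from these tests.

```python
import math

def completion_time(threads: int, cores: int = 4, time_per_thread: float = 1.0) -> float:
    """Estimate wall-clock time when each core runs one thread at a time.

    Assumption: all threads take the same time, so they complete in
    batches of up to `cores` threads.
    """
    if threads <= 0:
        return 0.0
    batches = math.ceil(threads / cores)
    return batches * time_per_thread

# The thread counts used in the tests on the original M1's four E cores
for n in (1, 2, 3, 4, 6, 8):
    print(f"{n} threads: {completion_time(n)} time units")
```

In this model, 1 to 4 threads all take one unit of time, while 6 and 8 threads each need a second batch and so take two.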

What happens on an M1 Pro or Max is quite different.

[Figure: CPU History for the E cores of an M1 Pro running 1 to 4 threads]

Here’s a similar sequence of 1-4 threads on an M1 Pro chip. Note how the second test, with 2 threads, is clearly narrower (shorter in duration) than the first, with a single thread. Those two threads complete in about half the time taken by the single thread, although the areas shown for the two tests appear similar.

This doesn’t make sense, and highlights a major shortcoming in using Activity Monitor’s CPU History window to study how macOS uses the cores in Apple silicon chips. It’s only when you look at the frequency those E cores were running at that sense returns, and the explanation is revealed.

Run one thread on the two E cores in an M1 Pro or Max, and the cores run at a frequency of around 1000 MHz, half their maximum. Run two threads at the same time, and their frequency is boosted to their maximum of 2064 MHz.

One way to look at this more precisely is to graph the speed of execution against the number of threads, or notional cores, as each thread is effectively allocated to a single core.

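[Figure: speed of execution in billion loops/second against number of threads, for P and E cores on the original M1 and M1 Pro]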

The upper solid line shows this relationship for P cores being used at maximum QoS. Each core effectively adds 0.15 billion loops/second to total throughput, whether on an original M1 (+ points) or M1 Pro (♢ unfilled diamonds). Although not shown here, on an M1 Pro that line continues up to its total of eight P cores.

The broken line below shows the same relationship for E cores in an original M1, this time each adding 0.033 billion loops/second, 22% of the throughput of each P core. Shown in red, though, are the equivalent points for the two E cores in an M1 Pro (or Max): with one core, throughput is the same as an original M1, but with both cores active, throughput is more than double that of two cores on the original chip. That’s macOS controlling the E core frequency to ensure that M1 Pro and Max chips don’t perform any slower than those in the original M1. In fact, the two E cores are here slightly faster than all four E cores of the original M1 together, when those are running at 1000 MHz.
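The arithmetic behind those figures can be checked with a short sketch. It uses the per-core rates and frequencies quoted in this article, and makes one loud assumption that the article doesn't state: that E core throughput scales linearly with frequency. The function and constant names are hypothetical.

```python
P_CORE_RATE = 0.15      # billion loops/s added by each P core
E_CORE_RATE = 0.033     # billion loops/s added by each E core at about 1000 MHz
E_FREQ_LOW_MHZ = 1000   # approximate E core frequency at lowest QoS on the original M1
E_FREQ_MAX_MHZ = 2064   # maximum E core frequency

def e_core_throughput(cores: int, freq_mhz: float) -> float:
    """Estimated throughput in billion loops/s.

    Assumption: throughput is proportional to core frequency.
    """
    return cores * E_CORE_RATE * freq_mhz / E_FREQ_LOW_MHZ

ratio = E_CORE_RATE / P_CORE_RATE                      # E core relative to P core
m1_four = e_core_throughput(4, E_FREQ_LOW_MHZ)         # original M1, all four E cores
m1_two = e_core_throughput(2, E_FREQ_LOW_MHZ)          # original M1, two E cores
pro_two = e_core_throughput(2, E_FREQ_MAX_MHZ)         # M1 Pro/Max, both E cores

print(f"E core / P core: {ratio:.0%}")
print(f"M1, 4 E cores at 1000 MHz: {m1_four:.3f}")
print(f"M1 Pro, 2 E cores at 2064 MHz: {pro_two:.3f}")
print(f"M1 Pro 2 cores / M1 2 cores: {pro_two / m1_two:.2f}")
```

Under that assumption, two E cores at 2064 MHz come out at about 0.136 billion loops/second against 0.132 for four E cores at 1000 MHz, which is consistent with the small advantage seen in the graph.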

This clearly illustrates the danger of believing Activity Monitor’s figures for CPU % and its CPU History window: while they appear to show core allocation faithfully, because they don’t take into account the frequency of cores in Apple silicon chips, they will mislead.
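To see why the same CPU % can represent very different amounts of work, here is a minimal sketch that scales a CPU % figure by core frequency. This isn't anything Activity Monitor reports: the function is hypothetical, the frequency would have to come from elsewhere (for example a tool such as powermetrics), and it assumes work done is proportional to frequency.

```python
def frequency_adjusted_load(cpu_percent: float, freq_mhz: float,
                            max_freq_mhz: float = 2064) -> float:
    """Scale a CPU % figure by how fast the core was actually running.

    Assumption: work done per unit time is proportional to frequency.
    """
    return cpu_percent * freq_mhz / max_freq_mhz

# Two E cores that both show 100% in CPU History
slow = frequency_adjusted_load(100, 1000)   # core held at about 1000 MHz
fast = frequency_adjusted_load(100, 2064)   # core at maximum frequency

print(f"100% at 1000 MHz is roughly {slow:.0f}% of a core at full speed")
print(f"100% at 2064 MHz is {fast:.0f}% of a core at full speed")
```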

Putting these observations together shows that macOS has different strategies for managing E cores in different chips:

On the original M1, all four E cores are run at a frequency of about 1000 MHz when running threads of lowest QoS, further enhancing their economy of power. However, when those same cores are used to run threads of higher QoS, they will normally run at their maximum frequency, so sacrificing energy efficiency for better performance.
E cores in M1 Pro/Max chips are run at maximum frequency when they’re loaded with two or more lowest QoS threads, giving up some energy efficiency to deliver performance at least as good as the E cores in the original M1. When higher QoS threads spill over onto E cores in an M1 Pro/Max, they’re run at maximum frequency to deliver better performance, closer to that of P cores.
macOS not only allocates threads to cores, but also controls their frequency according to the number of E cores available and thread QoS.
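Those strategies can be summarised as a small lookup function. This is a sketch of the behaviour observed in these tests, not of how macOS implements it: the function name and chip strings are hypothetical, the lowest QoS is taken to be the numeric value 9, and the original M1's maximum E core frequency is assumed to be the same 2064 MHz quoted for the M1 Pro and Max.

```python
E_FREQ_REDUCED_MHZ = 1000   # approximate frequency seen at lowest QoS
E_FREQ_MAX_MHZ = 2064       # maximum E core frequency (assumed for all three chips)
LOWEST_QOS = 9              # assumed numeric value of the lowest QoS

def e_core_frequency(chip: str, qos: int, threads: int) -> int:
    """Return the approximate E core frequency in MHz observed in these tests."""
    if threads <= 0:
        raise ValueError("at least one thread is required")
    if chip not in ("M1", "M1 Pro", "M1 Max"):
        raise ValueError(f"unknown chip: {chip}")
    if qos > LOWEST_QOS:
        # Higher QoS threads on E cores normally run at maximum frequency
        return E_FREQ_MAX_MHZ
    if chip == "M1":
        # Four E cores, held at reduced frequency for economy
        return E_FREQ_REDUCED_MHZ
    # M1 Pro/Max: two E cores, boosted once both are loaded
    return E_FREQ_MAX_MHZ if threads >= 2 else E_FREQ_REDUCED_MHZ

print(e_core_frequency("M1", 9, 4))
print(e_core_frequency("M1 Pro", 9, 1))
print(e_core_frequency("M1 Pro", 9, 2))
print(e_core_frequency("M1 Max", 33, 1))
```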

By this stage I hope that it’s clear how important QoS is in determining how macOS allocates threads to different cores, and sets their frequency. But if QoS is hard-coded into apps and services, then how can the user have any influence over the controls in macOS? That’s the starting point for the next article.

Previous articles

Making the most of Apple silicon power: 1 M-series chips are different
Making the most of Apple silicon power: 2 Core capabilities
Making the most of Apple silicon power: 3 Controls

MacSysAdmin 2022 video (watch)
MacSysAdmin 2022 Keynote slides (download)

4 Comments

1

WuMing2 on October 19, 2022 at 3:31 am

Wondering what prevents apps from misusing QoS to benefit just themselves. As an example, Samsung infamously did (and possibly still does) set a custom QoS level for the most popular speed benchmark applications on its ARM-based smartphones. Thanks.

- 2
  
  hoakley on October 19, 2022 at 6:21 am
  
  I don’t know whether Android uses QoS, or anything like it. However, in the scheme used by macOS any performance tests such as benchmarks would normally be run at maximum QoS, as they’re trying to measure maximum performance.
  Geekbench effectively does this when it gets the chance: for its single-core tests, it runs them in one thread on a P core, as you’d expect. Multi-core tests are then run in sufficient threads to use all available P cores. At a QoS of 33, they then get the cores run at maximum frequency, which is exactly what happens to other apps. So there’s no ‘cheating’, and I’m not even sure how an app could ‘cheat’ in macOS.
  Howard.
  
  - 3
    
    WuMing2 on October 19, 2022 at 8:11 am
    
    Benchmark apps were an example. I meant to ask about any app. Can they monopolise P cores by manipulating QoS? Thanks.
    
    - 4
      
      hoakley on October 19, 2022 at 9:18 am
      
      Because of multiprocessing, no app can “monopolise” the P cores, any more than it could an SMP Intel processor. However, apps can certainly set higher QoS than we might wish them to – for example running all their threads at maximum QoS. That doesn’t block other apps from getting their share, though.
      Because much of what apps do is intended to be run at high QoS, as they are actions for the user, those aren’t so much of a problem. But if you had a backup app that always made its backups at a QoS of 33, then the user would notice a reduction in responsiveness of the other apps they’re trying to use at the same time a backup is being made. That doesn’t change with Apple silicon.
      What does change is that those apps which use QoS wisely can now run time-consuming background threads on the E cores alone, where the user isn’t likely to notice them. SMP processors don’t have that option.
      Howard.
      
