Last Week on My Mac: Core allocation in M1 chips

I’ve never understood how engineers coped with the switch to email. So many engineering calculations in the past were accomplished on the backs of envelopes that I can’t imagine how they manage without that fertile medium. One envelope I’d love to see is that from Apple’s M1 design team in which they decided the numbers of Firestorm and Icestorm cores. For, despite Apple having longer and greater experience of designing chips containing multiple types of CPU core, its M1 series chips are the first to be used in quantity for general purpose computing.

Let me repeat that: Apple’s M1 chips are the first such asymmetric design to be used in quantity for general purpose computing. That’s quite a milestone.

There’s some debate over whether M1 chips actually use asymmetric multiprocessing (AMP), or whether this should instead be termed heterogeneous computing. Apple prefers the former, but terminology shouldn’t obscure this milestone. It raises two particularly interesting questions: how Apple chose the number of cores of each type for its first two M1 chips, and how macOS uses them. I suspect the answer to the first is inspired guesswork, the truth on the backs of many envelopes, so it’s the second I dwell on here.

Rather than requiring every process to express its preference for the core type it should be run on, Apple uses a more opaque system based on Quality of Service (QoS). For user processes, this comes in four levels, of which only one constrains threads to be run exclusively on the Efficiency (E) cores; for the other three, threads can be allocated to either Performance (P) or E cores, with a preference for the former.
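
To make that mapping concrete, here is a minimal Python sketch of the policy just described. It is a toy model, not Apple’s scheduler: the numeric values are those of the public macOS QoS classes (9 being the lowest, background level), while the names `QOS_LEVELS` and `eligible_cores` are invented for this illustration.

```python
# Toy model of QoS-based core eligibility on M1; not Apple's actual scheduler.
# Numeric values match the public macOS QoS classes; the table and function
# names are hypothetical, chosen for this illustration only.

QOS_LEVELS = {
    "background": 9,
    "utility": 17,
    "userInitiated": 25,
    "userInteractive": 33,
}


def eligible_cores(qos_name):
    """Return the core types a thread may run on, in order of preference."""
    if qos_name not in QOS_LEVELS:
        raise ValueError(f"unknown QoS level: {qos_name}")
    if qos_name == "background":
        # Only the lowest level is confined to the Efficiency cores.
        return ["E"]
    # The other three levels prefer Performance cores but may also use E cores.
    return ["P", "E"]


if __name__ == "__main__":
    for name, value in QOS_LEVELS.items():
        print(f"{name} ({value}): {', '.join(eligible_cores(name))}")
```

The point of the model is how little it contains: a process states how important its work is, and the choice of core type follows from that rather than from any explicit request.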

Outside the process, there is also very limited control over which cores get which threads. One command, taskpolicy, and one API call, setpriority(), can demote processes so that their threads are confined to E cores, but there doesn’t appear to be any way to promote those already constrained to E cores so that they can enjoy a bite of the P core cherries.
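
As a hedged illustration of that one-way control, the Python sketch below builds the `taskpolicy` invocation that demotes a running process to background. It assumes the `-b` (background) and `-p` (process ID) options described in the tool’s man page; the helper names `demote_command` and `demote` are hypothetical, and nothing is executed unless `demote` is called on macOS.

```python
import platform
import subprocess


def demote_command(pid):
    """Build the taskpolicy command that confines a process to the E cores.

    Assumes taskpolicy's -b (background) and -p (process ID) options.
    """
    if isinstance(pid, bool) or not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    return ["taskpolicy", "-b", "-p", str(pid)]


def demote(pid):
    """Run the command on macOS; on other systems only return it."""
    cmd = demote_command(pid)
    if platform.system() == "Darwin":
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # 1234 is a placeholder process ID, not a real process.
    print(" ".join(demote_command(1234)))
```

This is the brake and nothing more: the sketch has no counterpart that lifts threads already confined to the E cores by their own QoS onto the P cores.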

General purpose computing on Macs is remarkably diverse. You’ll find Macs working in almost every role you could expect of a computer: as servers and graphics workstations, hosting large databases, rendering movies, playing games, streaming media, and more. For some systems, Apple’s limited control over allocation to core types works very well, but for others it doesn’t. Not only that, there are times when users want to override what macOS sets for them.

One obvious example is performing the first Time Machine backup, a task which often takes more than an hour even when backups are stored on relatively fast disks. Most users want to get that backup out of the way as quickly as possible, but macOS decides for them that I/O will be throttled, and backupd’s threads will be run exclusively on E cores. Those are excellent choices for later, hourly backups, but far from ideal on the first occasion, and there’s no manual override available.

The defence is that the P cores are dedicated primarily to user interactive tasks, which is good for those circumstances in which the user decides to continue using their apps during that first backup. But if the user chooses to leave their Mac to get on with it, macOS will happily throttle I/O and run two E cores flat out while eight P cores sit idle for a couple of hours. That’s not being responsive.

An alternative approach, adopted by Asahi Linux, for example, is to leave the developer and user to determine which core types each process is run on. While that makes sense in Linux, I doubt that many Mac users would want to do that, and it risks unwise users making bad choices.

As Apple’s engineers move past this remarkable milestone, they must also be planning where to go next in terms of thread allocation. Most of the tools are already there, in the form of QoS, Grand Central Dispatch, and most recently RunningBoard and its associates. What users need is more flexible and sophisticated core allocation which responds to changes in core load, and gives the user options. At the moment, those options consist of just a brake, which isn’t a good way to control any device apart from a toboggan.

While many like to speculate on how many E and P cores we’ll see in Apple’s next chips, we should perhaps be paying more attention to all the envelopes that are evolving more advanced core allocation systems, which are likely to play a greater part in the Mac’s present and future.

6 Comments

1

Albert Godfrind on February 6, 2022 at 12:06 pm

I wonder how this separation of cores will affect or be used by virtualization tools, for example with Linux ARM64 …

- 2
  
  hoakley on February 6, 2022 at 11:09 pm
  
  That’s a very good question. I don’t know whether the M1 virtualisation environment gives access to core allocation. I rather suspect that it doesn’t, merely assigning an intermediate to high QoS and running on all available cores.
  Howard.
  
3

Jon Gotow on February 6, 2022 at 8:08 pm

One other issue that I’m wrestling with on these asymmetric (or heterogeneous) CPUs is what the metrics from utilities like Activity Monitor, iStat Menus and my own App Tamer actually mean. We’re used to reporting a process’ demands in terms of “% CPU.” When an app uses 75% CPU on an Intel Mac, that’s considered quite a lot. But on the M1 Macs, that doesn’t account for which type of core(s) the process is running on. Certainly 75% CPU usage on an E core is putting much less of an actual load on the M1 processor than 75% CPU usage on a P core. Unfortunately, the only other similar metric we have in Activity Monitor is “Energy Impact,” which encompasses much more than the raw CPU load, though it’s not clear exactly how Activity Monitor arrives at the value. Do we need to think about some other way of measuring how hard a Mac is working?

- 4
  
  hoakley on February 6, 2022 at 11:18 pm
  
  Thank you. Yes, it’s even more complicated I fear. Existing CPU% appears to be a simple sum of the active residency of all the cores. So, when an M1 Pro is running at peak, you’d expect the scale to be 0-1000%. As you point out, that should really be a weighted sum, in which the E cores might count at half, making the scale maximum 900% instead.
  That’s all fine until you look at core frequency. While active residency of each of the four E cores in an original M1 chip could readily come to 400% in total, if they’re running QoS 9 threads, then they’ll be running at around 972 MHz, less than half their maximum, which they can only achieve when they’re running higher QoS threads. But obtaining core frequency isn’t at all easy, so even Activity Monitor ignores it.
  Nothing’s simple, clear or reliable.
  Howard.
  
  - 5
    
    gotow on February 8, 2022 at 6:40 pm
    
    Yes, that’s the other fly in the ointment – CPU speed. As you point out, standard system APIs like sysctl() no longer deliver CPU frequency on M1-powered Macs. So we’ve got different capabilities with different cores, CPU frequencies that adjust on the fly, and _then_ the actual time a process is allocated on particular cores. Complicated indeed. I’m still stuck on how to quantify the benefit when App Tamer moves an app to the E cores.
    
    - 6
      
      hoakley on February 8, 2022 at 8:36 pm
      
      I do wish that Apple would release the source of powermetrics so we could work out how to obtain frequency information.
      Howard.
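
The CPU% arithmetic discussed in this thread can be worked through in a few lines of Python. This is only a sketch of the weighting suggested above, assuming an M1 Pro with eight P and two E cores and an E core counted at half a P core; the function names are hypothetical, and core frequency is deliberately left out because, as noted, it isn’t readily obtainable.

```python
def cpu_scale_max(p_cores, e_cores, e_weight=1.0):
    """Top of the CPU% scale when each E core counts as e_weight of a P core."""
    return 100.0 * (p_cores + e_weight * e_cores)


def weighted_cpu_percent(p_residency, e_residency, e_weight=0.5):
    """Weighted sum of per-core active residency, each value in 0-100."""
    return sum(p_residency) + e_weight * sum(e_residency)


if __name__ == "__main__":
    # Assumed M1 Pro layout: 8 P cores and 2 E cores.
    print(cpu_scale_max(8, 2))       # simple sum of all cores
    print(cpu_scale_max(8, 2, 0.5))  # E cores counted at half
    # Four E cores of an original M1 fully busy, no P core activity.
    print(weighted_cpu_percent([0] * 4, [100] * 4))
```

With the simple sum the scale tops out at 1000%, and with E cores at half weight it tops out at 900%, matching the figures in the comment. The weight of 0.5 is a guess taken from the discussion, not a measured ratio.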
      
