How you can’t promote threads on an M1

There are two types of CPU core in Apple’s M1 chips which are used by different types of software. Efficiency (E) cores are predominantly used to run services and background processes in macOS, such as making Time Machine backups and maintaining Spotlight indexes. User apps run their threads largely on Performance (P) cores which consume more power to complete their tasks interactively.

macOS doesn’t give developers or users direct control over which type of core processes or threads are run on. Instead, core allocation is determined by setting the Quality of Service (QoS) for processes, threads and their queues. The lowest QoS setting of background (integer equivalent 9) confines that code to the E core cluster; three higher settings allow it to be run on either E or P clusters, and thus to run faster but with higher power use. Apps can give the user control over the QoS of their threads, and it’s possible to run code such as command tools at a chosen QoS, but there’s no general method by which the user can control QoS, and thus how fast any given process or thread will run.
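As a rough illustration of that rule, here is a small Python sketch mapping QoS levels to the core types their threads may use. The values 9 and 33 appear in this article; 17 and 25 are the usual integer equivalents for the utility and userInitiated levels, and the function name eligible_cores is made up for this example. It is a toy model of the behaviour described here, not how macOS implements it.

```python
# Toy model of the QoS rule described in the article; not macOS code.
QOS = {
    "background": 9,
    "utility": 17,
    "userInitiated": 25,
    "userInteractive": 33,
}

def eligible_cores(qos):
    """Return the core types a thread at this QoS may be run on."""
    if qos <= QOS["background"]:
        return {"E"}          # lowest QoS: confined to the E cluster
    return {"E", "P"}         # higher QoS: either cluster

for name, value in QOS.items():
    print(f"{name:16} QoS {value:2} -> {sorted(eligible_cores(value))}")
```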

There are times when the user might wish to accelerate completion of tasks which are normally run exclusively on E cores. For example, knowing that a particular backup might be large, they might elect to leave their Mac to get on with that, wanting to run it as quickly as possible. There are two methods which appear intended to change the QoS of processes: the command tool taskpolicy, and its equivalent code function setpriority().

Recent experience with these demonstrates that, while they can be used to demote running processes to E cores, they can’t promote processes and threads which are already confined to E cores so that they can use both types. I have recently demonstrated this using the command
taskpolicy -B -p 567
which should promote the process with PID 567 to run on both types of core. When that process is normally confined to running on the E cluster, that command has no effect on its core allocation or performance.

Jon Gotow of St. Clair Software has added an experimental feature using setpriority() to his App Tamer utility, and confirmed this phenomenon. While setpriority() can be used to demote processes to use only E cores, it can’t promote those already confined to E cores to have access to P cores as well.
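For readers who want to try this kind of call themselves, the following Python sketch shows one way it might look. It assumes the macOS-specific PRIO_DARWIN_PROCESS and PRIO_DARWIN_BG values from <sys/resource.h>, which Python’s os module doesn’t expose, so they are defined by hand. The helper names are hypothetical, and this is not App Tamer’s code, whose source I haven’t seen.

```python
import os
import sys

# Values taken from macOS <sys/resource.h>; defined here because
# Python's os module does not provide them.
PRIO_DARWIN_PROCESS = 4
PRIO_DARWIN_BG = 0x1000

def darwin_bg_args(pid, background):
    """Arguments for setpriority(): PRIO_DARWIN_BG demotes, 0 lifts the demotion."""
    return (PRIO_DARWIN_PROCESS, pid, PRIO_DARWIN_BG if background else 0)

def set_darwin_background(pid, background):
    """Hypothetical helper: apply or lift the background state of a process."""
    if sys.platform != "darwin":
        raise OSError("PRIO_DARWIN_BG is only available on macOS")
    os.setpriority(*darwin_bg_args(pid, background))
```

Calling set_darwin_background(567, True) would correspond to demoting PID 567, and set_darwin_background(567, False) to lifting that demotion. Going by the results here, the second call shouldn’t be expected to move a process that was already confined to the E cores onto the P cores.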

[Image: schedthreadapptamer]

What might at first appear more puzzling is that taskpolicy and setpriority() can repromote processes and threads which they have demoted. If a process is normally set to run at high QoS, so having access to both E and P cores, and it’s then demoted to run on E cores alone, it can be promoted back to have access to both core types again. This implies that taskpolicy and setpriority() work not by changing QoS, but by acting directly on which cores can be used.

Methods

To investigate this, I modified my AsmAttic test utility so that it can manage two thread queues, one at minimum QoS, the other at maximum. This is used by a new option to alternate test threads between those two extreme QoS values.

Previously, AsmAttic ran one thread queue at a single user-selectable QoS. It creates up to a hundred identical threads and adds those to that queue. As you’d expect, threads are run in the order that they’re created, and after allowing for some to run more slowly when allocated to the E cluster, they normally complete in similar order, with batch effects.

Results

Alternating test threads between different QoS changes this. With modest numbers of odd-numbered threads run at lowest QoS and even-numbered ones at highest QoS, all the high QoS threads, run on P cores, complete first, in roughly the same order that they are added to the queue.
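The alternation can be pictured with a few lines of Python. This only models the assignment of thread numbers to the two queues; it is not AsmAttic’s code, and the name qos_for_thread is invented for the sketch.

```python
LOW_QOS, HIGH_QOS = 9, 33  # lowest and highest QoS, as used in these tests

def qos_for_thread(number):
    """Odd-numbered threads get the lowest QoS, even-numbered the highest."""
    return LOW_QOS if number % 2 else HIGH_QOS

# Six test threads, numbered from 1 in order of creation.
threads = {n: qos_for_thread(n) for n in range(1, 7)}
high = [n for n, q in threads.items() if q == HIGH_QOS]
low = [n for n, q in threads.items() if q == LOW_QOS]
print("high QoS queue:", high)  # high QoS queue: [2, 4, 6]
print("low QoS queue:", low)    # low QoS queue: [1, 3, 5]
```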

[Image: schedthreadasmattic]

This shows a small demonstration, in which three QoS 33 threads, numbers 6, 4 and 2, completed at an elapsed time of less than 0.66 seconds, followed by QoS 9 threads 1, 3 and 5 after 1.7 and 3.6 seconds.

Running the command
taskpolicy -B -p 567
against AsmAttic’s PID didn’t change results at all, with QoS 9 threads still running slower on the E cluster. However, as expected
taskpolicy -b -p 567
confined all threads to the E cluster, as shown in the CPU History window, and confirmed using powermetrics.
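If you want to script those two invocations, a minimal Python wrapper might look like this. It uses only the -b, -B and -p options shown in this article; the helper names are hypothetical, and run_taskpolicy only makes sense on macOS, where taskpolicy is available.

```python
import subprocess

def taskpolicy_args(pid, demote):
    """Build the taskpolicy command line used in this article.

    -b asks for the process to be demoted; -B asks for it to be promoted.
    """
    flag = "-b" if demote else "-B"
    return ["taskpolicy", flag, "-p", str(pid)]

def run_taskpolicy(pid, demote):
    """Run taskpolicy and return its exit status (hypothetical helper, macOS only)."""
    return subprocess.run(taskpolicy_args(pid, demote), check=False).returncode

print(" ".join(taskpolicy_args(567, demote=False)))  # taskpolicy -B -p 567
print(" ".join(taskpolicy_args(567, demote=True)))   # taskpolicy -b -p 567
```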

[Image: schedthreadcpuhist]

The first run, marked by the figure 1, shows these tests being run without the use of taskpolicy. Short peaks on both P clusters reflect those threads run from the high QoS queue, and a prolonged peak on the two E cores at the top results from the odd-numbered threads running on the E cluster.

Shown at the figure 2 is the effect of running taskpolicy -b on AsmAttic’s PID. This successfully demotes all threads, from both queues, to run on the E cluster alone. What it doesn’t do, though, is alter the order of completion of the threads, which are still run in the same sequence as determined by their original QoS. Furthermore, running taskpolicy -B returns the high QoS threads to the P cores, but doesn’t affect low QoS threads.

The most likely explanation is that, on M1 chips, taskpolicy and setpriority() don’t affect QoS, and can only (currently) demote processes and threads which could on the basis of their QoS be assigned to either core type, so that they’re run on E cores alone. This demonstrates a separation between dispatching according to QoS in Grand Central Dispatch (GCD), and allocation to core type. Threads with the lowest QoS are indelibly marked to be run on E cores; those with higher QoS are normally marked capable of being run on either core type, but they can be restricted to just the E cluster.
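That explanation can be written down as a toy model in Python. It assumes a per-thread restriction flag, here called e_only, which is separate from QoS; that flag and the names ModelThread, demote and promote are hypothetical, chosen only to mirror the behaviour seen in these tests rather than anything known about macOS internals.

```python
from dataclasses import dataclass

BACKGROUND_QOS = 9  # lowest QoS

@dataclass
class ModelThread:
    qos: int
    e_only: bool = False  # hypothetical restriction set by taskpolicy -b / setpriority()

    def cores(self):
        """Core types this thread may be allocated to in the toy model."""
        if self.qos == BACKGROUND_QOS or self.e_only:
            return {"E"}
        return {"E", "P"}

def demote(thread):
    thread.e_only = True   # restrict to E cores; QoS is left untouched

def promote(thread):
    thread.e_only = False  # lift the restriction; cannot override QoS 9

low, high = ModelThread(qos=9), ModelThread(qos=33)
promote(low)
assert low.cores() == {"E"}                       # background thread stays on E cores
demote(high)
assert high.cores() == {"E"} and high.qos == 33   # demoted, but QoS unchanged
promote(high)
assert high.cores() == {"E", "P"}                 # repromoted to both core types
```

Because dispatch order in this model depends only on qos, which neither demote nor promote changes, it also fits the finding that the order of completion stays the same after demotion.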

Conclusions

On current M1 series chips:

external controls, in taskpolicy and setpriority(), appear unable to change QoS, or the dispatching of threads by GCD;
those external controls can limit threads, which on the basis of QoS could be allocated to either core type, to just E cores;
those external controls cannot promote threads, which on the basis of QoS can only be allocated to E cores, so they can be run on either core type;
thus threads originally designated for E cores alone can’t be run on P cores;
promotion of background processes and threads so they can be run more quickly using P cores isn’t currently possible in macOS;
dispatching threads according to QoS and their allocation to clusters are performed separately in macOS.

10 Comments


1

David C. on January 24, 2022 at 8:05 pm

These commands remind me a bit of the legacy Unix “nice” and “renice” commands.

The “nice” command launches a process at reduced or elevated priority (elevation only being permitted to root users). It doesn’t actually change the process priority, however, but launches it with a non-zero “nice” value as a part of its per-process metadata. The priority and the nice value work together to compute an effective priority, which is used by the scheduler.

A nice value of 0 (the default) runs the process at its normal priority, while values greater than 0 (up to 20) request that it run at reduced priority. Negative values (down to -20) may be selected by root users to request that it run at elevated priority.

“renice” is similar, but can change the nice value of a running process.

So, an ordinary user can launch a process at reduced priority (with a positive nice value) and later increase the priority up to what would be its default (nice value of 0), but not raise it above that.

Your observations appear to be similar to this, strongly implying that the commands/APIs you’re using are behaving similarly to the way nice works. You have a parameter that is separate from the actual priority (QoS), which can be used to lower priority but not raise it. So when multiple threads are all set to the same reduced priority, they all schedule as they previously did, relative to each other. And you can “raise” the priorities back up to the defaults, but no higher than that.

It would not surprise me if we later discover that there is a permission that can actually raise a thread’s priority above normal. Maybe root can do it, or maybe it requires some special procedure to grant the permission (similar to how SIP and other system features restrict the root account in other ways).

- 2
  
  hoakley on January 24, 2022 at 9:12 pm
  
  Thank you.
  macOS still has nice and renice, although most find them pretty ineffective.
  They’re very different from what I’m describing here. QoS sets thread queue priorities, which aren’t modified by command/API. As I have shown with my tests using AsmAttic, the threads are still dispatched according to QoS.
  What happens after that is that each thread is allocated to a core cluster, something alien to nice/renice, which are designed for SMP. Those threads originally marked with the lowest QoS are also marked as E cluster only. Other threads can be run on either type of core, and all the command/API does is determine whether that’s (E or P) or (E only). That’s independent of any dispatch priority. Indeed, threads with a QoS of 33, highest priority, can be confined to run on the E cores alone, as in the demonstration shown in the CPU History window above.
  Howard.
  
  - 3
    
    David C. on January 27, 2022 at 6:35 pm
    
    No disagreement here. I didn’t mean to imply that nice/renice affects QoS. I was just using it as an example of a legacy API with similar behavior – where users can force lower-than-normal priority and later raise priority up to, but no higher than, the default value.
    
    - 4
      
      hoakley on January 27, 2022 at 10:43 pm
      
      Thank you – yes, you’ve underlined the problem here. QoS itself is about priority, but allocation to core clusters isn’t; it’s concerned with making the best use of resources, including power/energy. That’s what AMP is all about.
      Howard.
      
5

Tom C. on March 7, 2022 at 1:34 pm

I am intimidated by your wonderful work on the M1. I am binge-reading it all.

So I apologize in advance for the stupid question and for failing to find the code in question, but isn’t the scheduling algorithm of xnu open-source? I was able to see some QOS related code in Apple’s repository – but not the actual scheduling. You are speaking of the core allocation algorithm as if it’s mysterious but it should be open for all to see. Or is it hidden somewhere in a lower/higher level than the kernel?

- 6
  
  hoakley on March 7, 2022 at 1:43 pm
  
  Thank you.
  Yes, however I can’t find any source which allocates GCD threads to specific core types or clusters. I think that occurs above the kernel. Maybe someone with the time to work their way through the labyrinth of source can find it, but I’ve been unsuccessful in doing so.
  Howard.
  
  - 7
    
    bemachshavashnia on March 7, 2022 at 4:06 pm
    
    Well, I found the code of the scheduler at https://opensource.apple.com/source/xnu/xnu-7195.81.3/osfmk/kern/sched_amp_common.c.auto.html It explicitly refers to P and E cores, but it requires an understanding of Mach scheduling and psets (apparently “processor sets”) above mine…
    
    - 8
      
      hoakley on March 7, 2022 at 6:46 pm
      
      Thank you.
      That’s very interesting, but quite different from what happens in macOS! According to that source, the two lowest QoS levels result in threads being confined to E cores; in macOS, it’s only the very lowest, ‘background’, which is.
      That’s a striking disparity. Neither does there appear to be anything about setting core frequency, which is the nub of the differences between the original M1 and M1 Pro/Max behaviours.
      Perhaps it’s as well that I didn’t see that code earlier.
      Howard.
      
9

Jay on June 13, 2022 at 9:55 pm

What program are you using to view the CPU History here which labels the performance and efficiency cores?

- 10
  
  hoakley on June 13, 2022 at 9:57 pm
  
  Activity Monitor.
  Howard
  
