Scheduling of Processes on M1 Series Chips: first draft

As I promised in yesterday’s article about how macOS 12 manages many processes on the cores of M1 series chips, here is my first draft account, complete with an initial flowchart of sorts. For the observations and evidence, and fuller details of how I’ve arrived at this, please refer to that previous article and the several links it contains.

The chips

In terms of CPU cores, there are currently two variants of the M1 chip: the original version, which shipped in 2020, and the variant used in the M1 Pro and M1 Max, which shipped in late 2021.

The original M1 chip has two core clusters, each containing four cores. One cluster contains Efficiency (E) cores with a maximum frequency of 2064 MHz and about half the internal processing units of the Performance (P) cores in the other cluster. P cores also have a higher maximum frequency, of 3204 MHz in the M1 and 3228 MHz in the M1 Pro/Max.

In contrast, the M1 Pro/Max has three core clusters: one containing just two E cores, and two containing four P cores each. Cores are managed and run as clusters rather than individually. For instance, when you load four high-priority processes onto an M1 Pro/Max chip, they will be run in the first P cluster, and whenever possible the second P cluster will remain unloaded and inactive. Frequency is also set per cluster, so it shouldn’t differ between cores within any given cluster.
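The two layouts can be written down as data. What follows is a minimal Python sketch under stated assumptions. The `Cluster` class, its field names and the `total_cores` helper are hypothetical and exist only for illustration. The core counts and maximum frequencies come from this article. Each cluster carries a single frequency because frequency is set per cluster, not per core.

```python
from dataclasses import dataclass

# Hypothetical model of the M1 cluster layouts; only the core counts and
# maximum frequencies are taken from the observations in this article.

@dataclass(frozen=True)
class Cluster:
    name: str        # e.g. "E", "P0", "P1"
    core_type: str   # "E" or "P"
    cores: int       # number of cores in the cluster
    max_mhz: int     # one frequency per cluster, shared by all its cores

M1 = [
    Cluster("E", "E", 4, 2064),
    Cluster("P0", "P", 4, 3204),
]

M1_PRO_MAX = [
    Cluster("E", "E", 2, 2064),
    Cluster("P0", "P", 4, 3228),
    Cluster("P1", "P", 4, 3228),
]

def total_cores(chip, core_type=None):
    """Count cores on a chip, optionally restricted to one core type."""
    return sum(c.cores for c in chip if core_type is None or c.core_type == core_type)

print(total_cores(M1), total_cores(M1_PRO_MAX))            # 8 10
print(total_cores(M1, "P"), total_cores(M1_PRO_MAX, "P"))  # 4 8
```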

Queues

This all starts with the creation of an Operation or similar, with an assigned Quality of Service (QoS) that determines how macOS will schedule it. Processes with the lowest QoS of 9 are deemed ‘background’ processes and will be run exclusively on the E cores. Those with higher QoS, up to the maximum of 33, are deemed ‘user’ processes and are eligible to run on either P or E cores, according to their availability. Although I don’t (yet) have any direct evidence, I suspect that the queues for these two QoS types are maintained separately. I also suspect that the two intermediate QoS values, 17 and 25, are handled here in the same way as QoS 33.
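The suspected split into two queues can be sketched in a few lines of Python. The numeric values match Darwin’s `QOS_CLASS_BACKGROUND`, `QOS_CLASS_UTILITY`, `QOS_CLASS_USER_INITIATED` and `QOS_CLASS_USER_INTERACTIVE` constants. The two separate queues and the `queue_for` and `enqueue` helpers are hypothetical. They model my suspicion above, not any confirmed macOS internals.

```python
from collections import deque

# Raw QoS values matching Darwin's QOS_CLASS_* constants.
QOS_BACKGROUND = 9
QOS_UTILITY = 17
QOS_USER_INITIATED = 25
QOS_USER_INTERACTIVE = 33
KNOWN_QOS = {QOS_BACKGROUND, QOS_UTILITY, QOS_USER_INITIATED, QOS_USER_INTERACTIVE}

# Hypothetical: background and user work held in separate queues.
queues = {"background": deque(), "user": deque()}

def queue_for(qos):
    """Name of the queue a process would join under this model."""
    if qos not in KNOWN_QOS:
        raise ValueError(f"unexpected QoS value: {qos}")
    if qos == QOS_BACKGROUND:
        return "background"   # E cores only
    return "user"             # 17, 25 and 33 all treated alike: P or E cores

def enqueue(name, qos):
    """Append a named process to the queue its QoS selects."""
    queues[queue_for(qos)].append(name)

for i, qos in enumerate([9, 33, 17, 9, 25]):
    enqueue(f"proc{i}", qos)

print(list(queues["background"]))  # ['proc0', 'proc3']
print(list(queues["user"]))        # ['proc1', 'proc2', 'proc4']
```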

When a process slot becomes free on one of the designated types of core for that queue, macOS assigns the next process in the queue to that slot. Most of my observations have been made on M1 chips that are nearly idle, with all their slots available. In those circumstances, when there are multiple processes in the queue, macOS allocates them to clusters in batches. For example, if the E cluster (four E cores in the original M1 chip) is almost idle and there are ten processes in the low QoS queue, macOS will assign the first four of those processes to that cluster.
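The batching in that example can be sketched in Python, assuming one process per free core slot taken in queue order. The `assign_batch` function is a hypothetical illustration of the observed behaviour, not a macOS API.

```python
from collections import deque

def assign_batch(queue, free_slots):
    """Pop up to free_slots processes from the front of the queue.

    Hypothetical model of batch allocation: one process per free core slot,
    in queue order; any remainder waits for slots to free up.
    """
    batch = []
    while queue and len(batch) < free_slots:
        batch.append(queue.popleft())
    return batch

# Ten low-QoS processes waiting, four idle E cores on an original M1.
low_qos = deque(f"bg{i}" for i in range(10))
first = assign_batch(low_qos, free_slots=4)
print(first)         # ['bg0', 'bg1', 'bg2', 'bg3']
print(len(low_qos))  # 6
```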

Background processes

Low QoS processes are loaded and run differently in original M1 and M1 Pro/Max chips, as they have different E cluster sizes.

In the original M1 chip, with four E cores, QoS 9 processes are run with the core frequency set at about 1000 MHz (1 GHz). The M1 Pro/Max, with its two E cores, behaves differently. If there’s only one process, it’s run on the cluster at a frequency of about 1000 MHz, but if there are two processes, the frequency is increased to 2000 MHz. This ensures that the E cluster in the M1 Pro/Max delivers at least the same performance for background tasks as that in the original M1, at similar power consumption, despite the difference in cluster size.
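The arithmetic behind that comparison can be checked with a crude proxy: the number of active E cores multiplied by the cluster frequency. Below is a small Python sketch encoding the approximate frequencies observed. The functions `background_e_mhz` and `background_throughput` are hypothetical. Core-MHz is only a rough stand-in for performance, and it ignores power entirely.

```python
def background_e_mhz(chip, n_processes):
    """Approximate E-cluster frequency (MHz) for QoS 9 work, per the observations.

    Hypothetical encoding: the original M1 runs background work at about
    1000 MHz; the M1 Pro/Max goes from about 1000 MHz with one process to
    2000 MHz once both of its E cores are loaded.
    """
    if n_processes < 1:
        raise ValueError("need at least one process")
    if chip == "M1":
        return 1000
    if chip == "M1 Pro/Max":
        return 1000 if n_processes == 1 else 2000
    raise ValueError(f"unknown chip: {chip}")

def background_throughput(chip, n_processes, e_cores):
    """Crude proxy for throughput: active E cores x cluster frequency (core-MHz)."""
    active = min(n_processes, e_cores)
    return active * background_e_mhz(chip, n_processes)

print(background_throughput("M1", 4, e_cores=4))          # 4000
print(background_throughput("M1 Pro/Max", 2, e_cores=2))  # 4000
```

On this proxy, a fully loaded two-core cluster at 2000 MHz matches a fully loaded four-core cluster at 1000 MHz.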

User processes

All processes with a QoS higher than 9 appear to be handled similarly at present, although further work is needed to investigate that properly.

As high QoS processes are eligible to be run on either of the core types and any core cluster, their management differs between M1 and M1 Pro/Max variants. On the original M1, with its single P cluster, batches of up to eight processes can be distributed to the two available clusters, with four process slots available on each. When there are four or fewer processes, they will be run on the P cluster whenever possible, and the E cluster is only recruited when there are more high QoS processes in the queue. P cores are run at a frequency of about 3 GHz, and E cores at about 2 GHz, twice the frequency normally used for QoS 9 processes.
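For the original M1, that fill order can be sketched in Python, assuming fixed slot counts of four per cluster. The `place_user_m1` function and the `M1_USER_MHZ` table are hypothetical names. The approximate frequencies are those observed above.

```python
# Approximate frequencies (MHz) observed for high-QoS work on an original M1.
M1_USER_MHZ = {"P": 3000, "E": 2000}

def place_user_m1(n):
    """Hypothetical placement of n high-QoS processes on an original M1.

    The P cluster's four slots fill first; the E cluster's four slots are
    recruited only beyond four processes; anything past eight stays queued.
    """
    p = min(n, 4)
    e = min(max(n - 4, 0), 4)
    return {"P": p, "E": e, "queued": n - p - e}

print(place_user_m1(3))   # {'P': 3, 'E': 0, 'queued': 0}
print(place_user_m1(10))  # {'P': 4, 'E': 4, 'queued': 2}
```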

M1 Pro and Max chips have a total of three clusters: two of four P cores each, plus the half-size two-core E cluster. With up to four processes in the queue, they will be allocated to the first P cluster (P0). Processes 5-8 will go to the second P cluster (P1), which would otherwise remain unloaded and inactive for economy. If there are a further two processes in the queue, they will be run on the E cores. Frequencies are set to the maximum for each core type: 3228 MHz on P0 and P1, and 2064 MHz on the E cluster.
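The same idea extends to the three clusters of the M1 Pro/Max, filling P0, then P1, then the two-core E cluster. This Python sketch is a hypothetical model of the observed order. `PRO_MAX_ORDER`, `PRO_MAX_USER_MHZ` and `place_user_pro_max` are illustrative names, and the frequencies are the maxima given above.

```python
# Maximum frequencies (MHz) set for high-QoS work on M1 Pro/Max, as observed.
PRO_MAX_USER_MHZ = {"P0": 3228, "P1": 3228, "E": 2064}

# Hypothetical fill order and slot counts: P0, then P1, then the E cluster.
PRO_MAX_ORDER = [("P0", 4), ("P1", 4), ("E", 2)]

def place_user_pro_max(n):
    """Hypothetical placement of n high-QoS processes on an M1 Pro/Max."""
    placement = {}
    remaining = n
    for cluster, slots in PRO_MAX_ORDER:
        take = min(remaining, slots)   # fill this cluster before the next
        placement[cluster] = take
        remaining -= take
    placement["queued"] = remaining    # beyond ten processes, the rest wait
    return placement

for n in (4, 7, 12):
    p = place_user_pro_max(n)
    active = {c: PRO_MAX_USER_MHZ[c] for c, _ in PRO_MAX_ORDER if p[c]}
    print(n, p, active)
```

With four or fewer processes, P1 stays at zero in this model, which matches the economy behaviour described above.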

[Flowchart: scheduling of processes on M1 series chips]

Here’s a tear-out PDF to take away: schedulingprocessesm1

Contention

The greatest limitation of my testing to date is that I haven’t observed how contention from other parent processes, and between different QoS levels, might affect these behaviours. For instance, what happens if one app is trying to run processes with a high QoS while another is trying to run processes with a low QoS? Does macOS then reserve the E cluster for the latter, and limit the former to the P cluster(s)? Are processes with the highest QoS of 33 given priority over those with either of the two intermediate QoS values?

I will be looking at these in the coming weeks, and reporting back. In the meantime, if you’re aware of any other evidence which confirms or contradicts any of the above, please let me know, preferably by comment below (or email). I value your thoughts and results.