I hope the previous articles in this series have convinced you how well macOS manages the use of the two core types in Apple silicon chips, to deliver optimum performance and minimise power consumption. This article looks at one performance-critical situation where macOS core management is very different, and it largely abandons its usual strategy.
There are a few occasions when threads are run on just one P core. These include early phases of the boot process, until the kernel starts cores other than the first, and for some parts at least of the preparation of macOS updates, which also appear to be confined to the first P core. It’s currently unclear how the latter is achieved, and it doesn’t appear to be described or documented.
Apple’s current implementation of lightweight virtualisation doesn’t expose core types to a guest macOS, and effectively runs each virtual CPU as a high QoS thread on the host. The overall effect is that the guest is managed as if it were running on an SMP processor, with all its cores of the same type.
This is best demonstrated by comparing concurrent CPU History windows in the virtual machine (VM) and the host, while running benchmark tests in the guest.
This cutout from a single screenshot shows the CPU History window of the host at the left, and that of the guest at the right. Both operating systems are Ventura, with the guest allocated four virtual CPUs (vCPU), running on an M1 Max. A series of tests was run on the guest, with increasing numbers of threads going from the left to right, for 1, 2, 3, 4 and 8 threads. The four vCPUs map almost entirely into the four P cores of the first P cluster in the host, with a little overspill to P cores in the second cluster.
Changing the QoS of the threads run in the guest has no effect at all on performance or core allocation in either the host or guest. On the host, vCPUs are allocated to P cores in the same sequence as with processes on the host, from the first cluster, then the second, and finally to the two E cores.
When this is performed on a Ventura host, the VM is shown as running ‘user’ rather than ‘system’ threads in CPU History. Previous testing in Monterey (shown below for four vCPUs) showed VM threads on the host as ‘system’ instead, suggesting a change in the way that Activity Monitor classifies VM threads.
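The order in which the host fills its cores with vCPU threads, as described above, can be sketched as a toy model. This isn't how macOS is implemented: the cluster names, the `place_vcpus` function and the `queued` entry are all my own assumptions for illustration, and the layout is simply that of the M1 Max used in these tests. It also ignores the small overspill into the second P cluster seen in CPU History.

```python
# Toy model only, not Apple's scheduler. The layout is the M1 Max from the
# tests above: two clusters of four P cores and one cluster of two E cores.
# The cluster names are hypothetical labels.
M1_MAX_CLUSTERS = [("P0", 4), ("P1", 4), ("E", 2)]


def place_vcpus(vcpus, clusters=M1_MAX_CLUSTERS):
    """Fill clusters in order and report how many vCPU threads land on each.

    Threads that don't fit on any core are counted under "queued".
    """
    if vcpus < 0:
        raise ValueError("vcpus must not be negative")
    placement = {}
    remaining = vcpus
    for name, cores in clusters:
        used = min(remaining, cores)  # a cluster can't take more than it has
        placement[name] = used
        remaining -= used
    placement["queued"] = remaining
    return placement


if __name__ == "__main__":
    for n in (4, 8, 10, 12):
        print(n, place_vcpus(n))
```

With four vCPUs the model puts them all on the first P cluster, and only when ten are requested does it reach the E cores. Anything beyond ten is left in `queued`, which is the situation considered in the next section.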
Choking on threads
Provided that the host is lightly loaded, this lightweight virtualisation works well with numbers of vCPUs up to the number of P cores. Performance isn’t as good when vCPU numbers are increased further to bring in the E cores, up to the total number of cores in the chip. Once the number of vCPUs exceeds the number of physical cores, the VM grinds to a halt, as the host queues VM threads behind those already running on the cores, and they become deadlocked.
Virtualisation software needs to ensure that users can’t inadvertently allocate a VM more vCPUs than the total number of cores available in the host.
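A minimal sketch of that kind of guard follows. The `clamp_vcpus` function and its `reserve_for_host` parameter are hypothetical names of my own, not part of any virtualiser. It assumes that `os.cpu_count()` matches the number of physical cores, which should hold on Apple silicon as its cores don't use SMT; an app built on Apple’s Virtualization framework would more likely consult the limits that framework reports, such as `VZVirtualMachineConfiguration.maximumAllowedCPUCount`.

```python
import os


def clamp_vcpus(requested, host_cores=None, reserve_for_host=0):
    """Limit a requested vCPU count to what the host can actually run.

    host_cores defaults to os.cpu_count(), an assumption that logical and
    physical core counts are the same. reserve_for_host is a hypothetical
    option to hold some cores back for the host's own processes.
    """
    if requested < 1:
        raise ValueError("a VM needs at least one vCPU")
    if reserve_for_host < 0:
        raise ValueError("reserve_for_host must not be negative")
    if host_cores is None:
        host_cores = os.cpu_count() or 1  # cpu_count() can return None
    ceiling = max(1, host_cores - reserve_for_host)
    return min(requested, ceiling)


if __name__ == "__main__":
    # A 10-core M1 Max, with the user asking for 16 vCPUs.
    print(clamp_vcpus(16, host_cores=10))
    # The same request, keeping the two E cores free for the host.
    print(clamp_vcpus(16, host_cores=10, reserve_for_host=2))
```

Holding back cores for the host is a design choice rather than a requirement: the hard limit that matters is the total number of physical cores, but stopping short of it should avoid the poorer performance seen once E cores are brought in.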
Effects on performance
Provided the number of vCPUs allocated to a VM allows the host sufficient physical cores for its own processes, lightweight virtualisation is truly lightweight in its effects on host performance. With four vCPUs on an M1 Pro/Max, both E cores and almost all of four P cores are available to the host. Even with the VM under heavy load, macOS core allocation strategy prevents that from impairing the host.
The penalties of virtualisation are more apparent in the guest: QoS has no effect, background and userInteractive threads contend for the same vCPUs, and all the advantages of core allocation strategy are lost. These need careful consideration when deciding whether to virtualise. Oddly, virtualisation is one way to get background threads at the lowest QoS to run on P cores instead of E cores, although it’s unlikely to prove practical.
- Each vCPU for a macOS VM is run as one high QoS thread on the host.
- With careful choice of the number of vCPUs, lightweight virtualisation need have little effect on host performance.
- vCPUs have a single core type, and QoS has no effect on thread performance in guest macOS.
- Guest macOS thus performs differently from that expected on Apple silicon chips.
- Consequently, virtualised macOS may prove unsuitable for some purposes, such as running a mixture of demanding background and userInteractive threads.
So far, these articles have concentrated on CPU cores. In the next article, I will cast my net wider and consider other computational units within Apple silicon chips.
Making the most of Apple silicon power: 1 M-series chips are different
Making the most of Apple silicon power: 2 Core capabilities
Making the most of Apple silicon power: 3 Controls
Making the most of Apple silicon power: 4 Frequency
Making the most of Apple silicon power: 5 User control
Making the most of Apple silicon power: 6 Empowering users
Does removing I/O throttling make backups faster?
MacSysAdmin 2022 video (watch)
MacSysAdmin 2022 Keynote slides (download)