Making the most of Apple silicon power: 7 Virtualisation and core use

I hope the previous articles in this series have convinced you how well macOS manages the use of the two core types in Apple silicon chips, to deliver optimum performance and minimise power consumption. This article looks at one performance-critical situation where macOS largely abandons that strategy: virtualisation.

Core solos

There are a few occasions when threads are run on just one P core. These include early phases of the boot process, until the kernel starts cores other than the first, and at least some parts of the preparation of macOS updates, which also appear to be confined to the first P core. It’s currently unclear how the latter is achieved, and it doesn’t appear to be documented.

Virtual CPUs

Apple’s current implementation of lightweight virtualisation takes no account of core types in a guest macOS: each virtual CPU is effectively run as a high QoS thread on the host, with little regard for the type of core it lands on. The overall effect is that virtualisation is managed as if it were running on a symmetric multiprocessing (SMP) processor.

This is best demonstrated by comparing concurrent CPU History windows in the virtual machine (VM) and the host, while running benchmark tests in the guest.

[Screenshot: vcpuVent]

This cutout from a single screenshot shows the CPU History window of the host at the left, and that of the guest at the right. Both operating systems are Ventura, with the guest allocated four virtual CPUs (vCPUs), running on an M1 Max. A series of tests was run on the guest, with the number of threads increasing from left to right: 1, 2, 3, 4 and 8 threads. The four vCPUs map almost entirely onto the four P cores of the first P cluster in the host, with a little overspill to P cores in the second cluster.

Changing the QoS of the threads run in the guest has no effect at all on performance or core allocation in either the host or guest. On the host, vCPUs are allocated to P cores in the same sequence as with processes on the host, from the first cluster, then the second, and finally to the two E cores.
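The allocation order described above can be sketched as a toy model in Python. This is only an illustration of the sequence observed in CPU History, not how macOS schedules threads: the cluster names, the `place_vcpus` function and the M1 Max layout of two four-core P clusters and one two-core E cluster are assumptions made for this sketch, and it ignores the small overspill between clusters seen in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str   # label used only in this sketch
    kind: str   # "P" or "E"
    cores: int  # number of cores in the cluster


# Assumed layout for a full M1 Max, listed in the order vCPUs appear to fill them.
M1_MAX = (
    Cluster("P0", "P", 4),
    Cluster("P1", "P", 4),
    Cluster("E0", "E", 2),
)


def place_vcpus(vcpus: int, clusters=M1_MAX) -> dict:
    """Fill clusters in order and report how many vCPU threads land on each."""
    if vcpus < 0:
        raise ValueError("vcpus must be non-negative")
    placement = {}
    remaining = vcpus
    for cluster in clusters:
        used = min(remaining, cluster.cores)
        placement[cluster.name] = used
        remaining -= used
    # Any vCPU threads left over have no core of their own and must wait.
    placement["queued"] = remaining
    return placement


print(place_vcpus(4))   # {'P0': 4, 'P1': 0, 'E0': 0, 'queued': 0}
print(place_vcpus(10))  # {'P0': 4, 'P1': 4, 'E0': 2, 'queued': 0}
```

With four vCPUs the model leaves the second P cluster and both E cores free, which matches the left-hand CPU History window in the screenshot.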

When this is performed on a Ventura host, the VM is shown as running ‘user’ rather than ‘system’ threads in CPU History. Previous testing in Monterey (shown below for four vCPUs) showed VM threads on the host as ‘system’ instead, suggesting a change in the way that Activity Monitor classifies VM threads.

[Screenshot: vcpu4]

Choking on threads

Provided that the host is lightly loaded, this lightweight virtualisation works well with numbers of vCPUs up to the number of P cores. Performance isn’t as good when E cores are brought in to raise the vCPU count to the total number of cores in the chip. Once the number of vCPUs exceeds the number of physical cores, the VM grinds to a halt: the host queues VM threads behind those already running on the cores, and they become deadlocked.

Virtualisation software needs to ensure that users can’t inadvertently allocate more vCPUs than the total number of cores available in the host.
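A minimal sketch of such a guard, in Python, might look like the following. The function name `advise_vcpus` and its messages are hypothetical, and the core counts are passed in rather than detected; the three outcomes simply restate the behaviour described in this section.

```python
def advise_vcpus(requested: int, p_cores: int, e_cores: int) -> tuple:
    """Clamp a requested vCPU count to the host's cores and describe the result."""
    if requested < 1:
        raise ValueError("a VM needs at least one vCPU")
    if p_cores < 1 or e_cores < 0:
        raise ValueError("invalid core counts")
    total = p_cores + e_cores
    allowed = min(requested, total)
    if requested > total:
        note = "clamped: more vCPUs than physical cores stalls the VM"
    elif allowed > p_cores:
        note = "spills onto E cores: expect reduced performance"
    else:
        note = "fits within P cores"
    return allowed, note


# Core counts assumed for a full M1 Max: 8 P cores and 2 E cores.
for wanted in (4, 10, 12):
    print(wanted, advise_vcpus(wanted, p_cores=8, e_cores=2))
```

Clamping to the total core count only prevents the stall. Staying at or below the number of P cores, and leaving some of those free for the host, is the more conservative choice.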

Effects on performance

Provided the number of vCPUs allocated to a VM allows the host sufficient physical cores for its own processes, lightweight virtualisation is truly lightweight in its effects on host performance. With four vCPUs on an M1 Pro/Max, both E cores and almost all of four P cores are available to the host. Even with the VM under heavy load, macOS core allocation strategy prevents that from impairing the host.

The penalties of virtualisation are more apparent in the guest: QoS has no effect, background and userInteractive threads contend for the same vCPUs, and all the advantages of core allocation strategy are lost. These need careful consideration when deciding whether to virtualise. Oddly, virtualisation is one way to get background threads running at the lowest QoS onto P cores instead of E cores, although it’s unlikely to prove practical.

Conclusions

Each vCPU for a macOS VM is run as one high QoS thread on the host.
With careful choice of the number of vCPUs, lightweight virtualisation need have little effect on host performance.
vCPUs have a single core type, and QoS has no effect on thread performance in guest macOS.
Guest macOS thus performs differently from that expected on Apple silicon chips.
Consequently, virtualised macOS may prove unsuitable for some purposes, such as running a mixture of demanding background and userInteractive threads.

So far, these articles have concentrated on CPU cores. In the next article, I will cast my net wider and consider other computational units within Apple silicon chips.

Previous articles

Making the most of Apple silicon power: 1 M-series chips are different
Making the most of Apple silicon power: 2 Core capabilities
Making the most of Apple silicon power: 3 Controls
Making the most of Apple silicon power: 4 Frequency
Making the most of Apple silicon power: 5 User control
Making the most of Apple silicon power: 6 Empowering users

Does removing I/O throttling make backups faster?

MacSysAdmin 2022 video (watch)
MacSysAdmin 2022 Keynote slides (download)

4 Comments

1

Paul Rockwell on October 27, 2022 at 3:01 pm

There were anecdotal reports of Windows 11 ARM running better virtualized on Apple Silicon than on native platforms. I’ve always wondered if virtual core allocation behavior contributed to that. Your analysis gives my theory some plausibility. Thanks!

- 2
  
  hoakley on October 27, 2022 at 6:36 pm
  
  Thank you – that is interesting. What I hear of Windows and thread management isn’t encouraging, so maybe the full might of P cores is the solution.
  Howard.
  
3

joethewalrus on November 14, 2022 at 1:15 am

Thank you! You’ve led me to a workaround for a huge gripe of mine, which is that the single-threaded nature of music encoding (to ALAC or AAC) takes forever on an M1 Mac using the Apple Music app. It’s locked to the E cores, and throttled on the 4E core non-pro M1s, so that the same 41 AIFF tracks encoding to AAC-256 takes 3m55s on a 2020 M1 Mac Mini and 3m57s on a 2021 8core MacBook Pro. The same process on the same tracks takes 2:32 on a 2018 6core i5 Mac Mini, 3:13 on a 2019 i5 MB Air, and 3:27 on a 2011 i5 Mac Mini. That’s right…my 2021 MacBook Pro encodes music in Apple Music (Ventura) slower than my 10 year old Mac Mini does in iTunes (High Sierra).

That is, of course, unless you run the process in a Virtual Machine on the M1 Pro system. Then the same tracks take 59 seconds, 4x faster than when run in the host OS, exactly the kind of performance we were promised with Apple Silicon. Apple should immediately give users the option to encode in Music on the P cores, and it would be really nice if we could encode multiple tracks in parallel in separate threads. Of course, they never will; their strategy is to sell you their streaming service and those of us who collect our own (legally licensed) tracks are an afterthought at best.

- 4
  
  hoakley on November 14, 2022 at 5:09 pm
  
  Thank you.
  I’m amazed that Apple constrains encoding like that. If it’s single-threaded, then you shouldn’t need any more than 2 or 3 vCPUs either to enjoy that performance.
  Howard.
  
