One of the most valuable features of virtualisation of macOS guests on an Apple silicon host is near-native performance. I have previously reported measurements of integer and floating-point core performance in VMs that were close to those of the host. This article reports fuller comparisons on both M1 Max and M3 Pro hosts over a much wider range of in-core tests.
Testing
Native tests were run on a Mac Studio M1 Max (8P+2E) and a MacBook Pro M3 Pro (6P+6E), both running Sonoma 14.2.1. Virtual machines were created in Sonoma 14.2.1 using Viable, and run in Viable with 6 virtual cores and 16 GB of memory. Tests were run and timed using the same app, AsmAttic, built for the ARM64 architecture only. Each test was run with a single thread, and with 4 simultaneous threads, although the times used for analysis are those for the single thread, thus representing a total of 100% active residency on a Performance core on the host. Although QoS was set high (33) for every test, that only takes effect when running on the host, as VMs don't make any distinction between core types.
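For illustration, here's a minimal Swift sketch of how such a harness might dispatch a test routine at high QoS on one or more threads. This isn't AsmAttic's actual code: the function name and timing method here are my own assumptions.

```swift
import Foundation

// Hypothetical harness, not AsmAttic's source: run one test closure on a
// given number of threads at high QoS, and return the elapsed time in seconds.
// The raw QoS value 33 corresponds to .userInteractive.
func timeTest(threads: Int, _ test: @escaping () -> Void) -> Double {
    let group = DispatchGroup()
    let start = DispatchTime.now()
    for _ in 0..<threads {
        DispatchQueue.global(qos: .userInteractive).async(group: group) {
            test()
        }
    }
    group.wait()                         // block until every thread has finished
    let end = DispatchTime.now()
    return Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1e9
}

// Single-thread and 4-thread runs of the same routine.
let single = timeTest(threads: 1) { /* e.g. floating-point loop */ }
let four   = timeTest(threads: 4) { /* e.g. floating-point loop */ }
```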
Tests used were:
- empty loop, only incrementing an integer loop counter (assembly)
- integer arithmetic (assembly)
- floating-point arithmetic using multiply-add (assembly)
- NEON vector unit calculating a dot-product on two vectors of four 32-bit floating-point numbers (assembly)
- simd_dot, calculating a dot-product on two vectors of four 32-bit floating-point numbers (SIMD library)
- CPU matrix multiplication of two 16 x 16 matrices of 32-bit floating-point numbers (Swift)
- vDSP_mmul matrix multiplication of two 16 x 16 matrices of 32-bit floating-point numbers (Accelerate library)
- SparseMultiply, multiplication of dense and sparse matrices of 32-bit floating-point numbers (Sparse Solvers, Accelerate library)
- BNNSMatMul matrix multiplication of 32-bit floating-point numbers (Accelerate library).
Source code has been appended to previous articles (see links at the end).
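As a rough guide to what the higher-level tests involve, here's a minimal Swift sketch of the SIMD dot-product and the two 16 x 16 matrix multiplications. It's only an illustration under my own assumptions about layout and values, not the code used in AsmAttic, and the assembly tests can't be shown this way.

```swift
import Accelerate
import simd

// SIMD library dot product on two vectors of four 32-bit floats.
let v1 = simd_float4(1, 2, 3, 4)
let v2 = simd_float4(4, 3, 2, 1)
let dot = simd_dot(v1, v2)                         // 20.0

// Two 16 x 16 matrices of Floats, stored row-major; values are arbitrary.
let n = 16
let a = (0..<n * n).map { Float($0 % 7) }
let b = (0..<n * n).map { Float($0 % 5) }

// 'CPU' matrix multiplication: a plain Swift triple loop.
var c = [Float](repeating: 0, count: n * n)
for i in 0..<n {
    for j in 0..<n {
        var sum: Float = 0
        for k in 0..<n {
            sum += a[i * n + k] * b[k * n + j]
        }
        c[i * n + j] = sum
    }
}

// The same product using vDSP_mmul from the Accelerate library.
var d = [Float](repeating: 0, count: n * n)
vDSP_mmul(a, 1, b, 1, &d, 1,
          vDSP_Length(n), vDSP_Length(n), vDSP_Length(n))
```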
Results are expressed throughout as percentages relative to host performance: those for the VM run on an M3 Pro are thus given relative to the native M3 Pro results, with higher percentages meaning faster than the host, and lower percentages slower.
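In other words, taking the time to complete a test as the measure, the percentage works out as something like this (the figures here are invented for illustration):

```swift
// Hypothetical figures: the host completes a test in 2.00 s, the VM in 2.10 s.
let hostTime = 2.00
let vmTime = 2.10
let relativePerformance = hostTime / vmTime * 100   // ≈ 95%, slightly slower than the host
```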
Results
In general, tests performed in VMs ran only slightly slower than when run on the host. The results are summarised in the chart below.

Bars in pale blue show results for the M1 Max, and those in red are for the M3 Pro. Apart from some minor exceptions and the results for vDSP_mmul, VMs ran tests only slightly slower than the host. The empty loop ran slightly faster on the M1 VM, but rather slower on the M3 VM. Otherwise VMs ran at 93-99% of native speed, regardless of whether the test used integer, floating-point or NEON units in the core.
The most obvious and greatest exception was vDSP_mmul matrix multiplication, whose performance fell markedly to about 50% of native when virtualised on both chips. This was seen both in single-thread tests and when running 4 threads at a time.
The second chart shows performance for tests run in a VM on the M3 Pro relative to results from the M1 Max running native, again with higher percentages being faster.

With the sole exception of vDSP_mmul, all tests ran significantly faster in a VM on the M3 than they did natively on an M1, at 120-160% of native M1 speed.
Explanation
These results are consistent with the claim that virtualisation of macOS on Apple silicon Macs using Apple's API delivers near-native CPU performance, at least as far as processing units within CPU cores are concerned. Assembly language routines that use the integer, floating-point and NEON arithmetic units run almost as fast when virtualised, as do most functions in the Accelerate and related libraries. All other factors, such as memory and GPU access, being equal, this should deliver near-native performance to apps running virtualised.
The marked difference in vDSP_mmul matrix multiplication demonstrates that not all functions in the Accelerate library fare as well, though. As Apple provides no information on which hardware that function can run on, we can only speculate why it performs so poorly when run in virtualisation.
This suggests that, when running natively on both M1 and M3 chips, the Accelerate library uses hardware that isn't available to it when virtualised. That's almost certainly a processing unit outside the CPU cores, of which the favourite must be the AMX matrix co-processor. Previous powermetrics measurements made during these tests have shown no recorded power consumption by the ANE neural engine, making that a most unlikely candidate.
On the M1 Max, core allocation when running vDSP_mmul tests has also been shown to be very different from that of other tests used, and again implies that the Accelerate library is accessing hardware outside the CPU core.
I'm not aware of any studies of AMX use from VMs, but here it appears most likely that, when running natively, the Accelerate function uses the AMX; as that isn't available to virtualised code, it falls back to a non-AMX substitute running at half the speed.
Conclusions
- In-core performance tests demonstrate that most CPU code run in a macOS VM on Apple silicon does so at near-native speed.
- Virtualised code runs significantly faster on M3 cores than native code does on M1 cores. Thus, all other factors being equal, a VM running on an M3 chip is likely to perform better than the same code run natively on an M1.
- One Accelerate library function, vDSP_mmul, is an exception to this, and runs at about half native speed when virtualised.
- That’s most likely the result of vDSP_mmul using the AMX matrix co-processor, which implies that the AMX can’t be used from a VM.
Previous articles
Evaluating M3 Pro CPU cores: 1 General performance
Evaluating M3 Pro CPU cores: 2 Power and energy
Evaluating M3 Pro CPU cores: 3 Special CPU modes
Evaluating M3 Pro CPU cores: 4 Vector processing in NEON
Evaluating M3 Pro CPU cores: 5 Quest for the AMX
Evaluating the M3 Pro: Summary
Finding and evaluating AMX co-processors in Apple silicon chips
Comparing Accelerate performance on Apple silicon and Intel cores
Can a different core allocation strategy work on Apple silicon?
M3 CPU cores have become more versatile
