hoakley March 10, 2022 Macs, Technology

What performance to expect in the Mac Studio

Following this week’s announcement of a fourth chip in the first generation of Apple Silicon systems, we’re all full of talk about the heights of its performance. Apple also threw us the teaser that it has yet to announce its replacement for the Mac Pro, which presumably will also use the new M1 Ultra. This article steps back from the hype to assess the performance already attained, and where it’s heading.

M1 family

There are now four chips in the M1 family, and according to Apple that’s the complete set:

M1, with 4 P and 4 E cores, 8 core GPU, and 16 core Neural Engine,
M1 Pro, with 8 P and 2 E cores, 16 core GPU, and 16 core Neural Engine,
M1 Max, with 8 P and 2 E cores, 32 core GPU, and 16 core Neural Engine,
M1 Ultra, with 16 P and 4 E cores, 64 core GPU, and 32 core Neural Engine.

There have been rumours of a fifth, consisting of four M1 Max chips conjoined, which could have been intended for the forthcoming Mac Pro, but it now appears most likely the replacement for Apple’s top-end model will also use the M1 Ultra – see the comments to this article for more thoughts about that.

Although there are differences in caches across the variants, their E cores are essentially the same, and the most significant difference in their P cores is that those in the original M1 chip have a slightly lower maximum frequency of 3204 MHz, while those in the M1 Pro/Max have a maximum of 3228 MHz.

CPU performance

When running tight loops of assembly code which only access in-core resources including registers, there’s a strong linear relationship between performance, measured as the number of loops completed per second, and the number of threads run.

[Chart m1allCoresFloatThreads: loop throughput against number of threads, with regression lines for P and E cores]

Looking first at the solid line, that’s a linear regression through the loop throughputs measured as 10^9 loops per second, against the number of test threads run on P cores. That has a gradient of 0.15, indicating that each P core runs this code at a rate of 150 million loops per second. The broken line is the equivalent regression for the four E cores in an original M1 chip, each of which runs the loop at a rate of 50 million loops per second, a third of a P core.
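
The arithmetic behind those gradients can be sketched in a few lines of Python. This is only an illustrative model, assuming perfectly linear scaling and one thread per core, and taking the two per-core rates from the regressions described above. The function and constant names are invented for this sketch and aren't part of any test tool.

```python
# Per-core loop rates taken from the regression gradients quoted above,
# in units of 10^9 loops per second. All names here are hypothetical.
P_CORE_RATE = 0.15   # each P core: 150 million loops per second
E_CORE_RATE = 0.05   # each E core in the original M1: 50 million loops per second


def expected_throughput(threads: int, cores: int, rate_per_core: float) -> float:
    """Estimate total throughput (10^9 loops/s) for threads on one core type.

    Assumes one thread per core and perfectly linear scaling; threads beyond
    the number of cores add nothing in this simple model.
    """
    if threads < 0 or cores < 0:
        raise ValueError("threads and cores must be non-negative")
    return min(threads, cores) * rate_per_core


if __name__ == "__main__":
    # Four threads on the four P cores of an original M1.
    print(expected_throughput(4, 4, P_CORE_RATE))
    # Four threads on its four E cores.
    print(expected_throughput(4, 4, E_CORE_RATE))
```

The model deliberately ignores frequency management, so it describes the regression lines rather than any individual measurement.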

There are also two points plotted with an x which start on the regression line for the E cores, but rise sharply above it for 2 threads. Those are the results from the two E cores in an M1 Pro chip. With a single thread running on them, they follow the performance of the E cores in the M1, but loading a second thread results in a loop throughput which matches the total of all four E cores in the M1. That’s the result of core frequency control imposed by macOS.

For code running in sufficient threads to ensure that each core has full active residency, and whose performance isn’t limited by access to resources such as memory, we can expect a linear performance increase with increasing numbers of P cores. Effects of increasing the number of E cores are, though, largely determined by the way in which their frequency is managed.

Benchmark results

The first set of benchmarks for an M1 Ultra has been published by Juli Clover on MacRumors. These are based on the Geekbench 5 CPU suite, and indicate a single-core score of 1793, and multi-core of 24,055. My own previous tests on my M1 Pro returned a remarkably similar value of 1772 for single-core, and 12,548 multi-core, the latter being slightly more than half that of the M1 Ultra.
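
As a quick check on those figures, the ratios can be worked out directly. This is just arithmetic on the four Geekbench 5 scores quoted above; the variable and function names are mine, chosen for illustration.

```python
# Geekbench 5 CPU scores quoted above.
ULTRA_SINGLE, ULTRA_MULTI = 1793, 24055
PRO_SINGLE, PRO_MULTI = 1772, 12548


def ratio(a: float, b: float) -> float:
    """Return a / b, guarding against division by zero."""
    if b == 0:
        raise ValueError("denominator must be non-zero")
    return a / b


single_ratio = ratio(ULTRA_SINGLE, PRO_SINGLE)  # close to 1
multi_ratio = ratio(ULTRA_MULTI, PRO_MULTI)     # a little under 2
pro_share = ratio(PRO_MULTI, ULTRA_MULTI)       # M1 Pro as a fraction of the Ultra

if __name__ == "__main__":
    print(f"single-core ratio: {single_ratio:.3f}")
    print(f"multi-core ratio:  {multi_ratio:.3f}")
    print(f"M1 Pro multi-core as share of Ultra: {pro_share:.1%}")
```

With twice the number of P and E cores, a multi-core ratio approaching 2 is what linear scaling would predict.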

These are entirely in accordance with what you’d expect for threads being run at high QoS, where they’ll be given maximum frequency on both P and E cores. What they don’t tell us about is performance of the E cores when running threads at minimum QoS, which is more typical of macOS background tasks such as Time Machine backups and Spotlight indexing services.

Benchmarking GPU and Neural Engine performance is more complicated, as access to both is normally limited to APIs such as Metal for GPUs. For developers, the joy of these chips is that such access is largely transparent and handled by macOS. That should result in a linear performance improvement with increasing numbers of cores, provided they can all be used by the app.

What to expect

User processes are almost exclusively run on P cores, with E cores being recruited when there’s sufficient demand. Those user processes should therefore be accelerated in proportion to the number of P cores, provided that there are sufficient threads to run. It’s that last requirement which is key: if there are 8 or fewer threads with high active occupancy, then the M1 Ultra’s 16 P cores will be of little or no extra benefit. Only when the number of heavyweight threads exceeds 8 will those extra cores result in improved performance over the M1 Pro/Max.
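
That threshold can be expressed as a simple idealised model. The sketch below assumes each heavyweight thread fully occupies one P core, and ignores E cores, memory limits and frequency management; the function name is hypothetical, and the P core counts are those listed for the M1 Max and M1 Ultra above.

```python
def p_core_speedup(threads: int, p_cores_a: int, p_cores_b: int) -> float:
    """Idealised speedup of chip A over chip B for `threads` heavyweight threads.

    Each chip can only make use of as many P cores as there are threads,
    so the speedup is the ratio of the P cores actually in use.
    """
    if threads < 1 or p_cores_a < 1 or p_cores_b < 1:
        raise ValueError("all arguments must be at least 1")
    return min(threads, p_cores_a) / min(threads, p_cores_b)


ULTRA_P_CORES = 16
MAX_P_CORES = 8

if __name__ == "__main__":
    for n in (4, 8, 12, 16, 24):
        print(n, p_core_speedup(n, ULTRA_P_CORES, MAX_P_CORES))
```

In this model the Ultra shows no gain up to 8 threads, and its advantage then grows with thread count until all 16 P cores are occupied.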

This is likely to be reinforced by macOS’s management of cores, which are grouped into clusters, typically of four cores (two in the case of E cores in the M1 Pro/Max). When running 4 or fewer heavyweight threads, only the first P cluster (P0) will be active; with 5-8 threads, the second P cluster (P1) will be added; the Ultra’s P2 and P3 clusters will normally remain inactive, at a frequency of 600 MHz and full idle, until 9 or more threads are fully active.
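
The way clusters are recruited can be sketched as follows. This is a simplified model rather than a description of the macOS scheduler: it assumes P clusters of four cores filled strictly in order, and the split between 9-12 threads (adding P2) and 13 or more (adding P3) is my extrapolation from the pattern described above. The names are invented for this sketch.

```python
import math

CORES_PER_P_CLUSTER = 4  # typical P cluster size, as described above


def active_p_clusters(threads: int, total_clusters: int) -> list[str]:
    """Return the P clusters expected to be active for `threads` heavyweight threads.

    Assumes clusters are recruited in order (P0, P1, ...) and that each
    cluster is filled before the next is brought in.
    """
    if threads < 0 or total_clusters < 1:
        raise ValueError("threads must be >= 0 and total_clusters >= 1")
    needed = math.ceil(threads / CORES_PER_P_CLUSTER)
    return [f"P{i}" for i in range(min(needed, total_clusters))]


if __name__ == "__main__":
    # An M1 Ultra has four P clusters, an M1 Pro/Max has two.
    for n in (3, 6, 9, 16):
        print(n, active_p_clusters(n, total_clusters=4))
```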

Where the M1 Ultra may prove of little advantage is in macOS background tasks, which aren’t just configured to run on E cores alone but also usually have I/O throttling applied. Unless that throttling is eased and E cores are run at higher frequencies, tasks such as Time Machine backups and Spotlight indexing are likely to take as long on a Mac Studio equipped with an M1 Ultra as on one with an M1 Max, or even on an original M1.

Glossary

Active residency is the proportion (usually percentage) of clock cycles in which a core is actively processing, and not idling.

E core is an Efficiency core (Icestorm), designed for low power consumption while still delivering useful performance.

P core is a Performance core (Firestorm), designed for high performance at higher power consumption.

QoS is Quality of Service, a setting used in macOS to determine both priority and core allocation of threads. The lowest, background with a numeric value of 9, results in that thread being run exclusively on E cores; three higher values result in the thread being run preferentially on P cores, but they can be run on E cores when all P cores are already at high active residency.

Revised following the comments below, for which I’m very grateful, and updated 1800 GMT 10 March 2022.

28 Comments


1

Oliver Busch on March 10, 2022 at 9:55 am

SVP of Hardware Engineering John Ternus mentioned in the event video that M1 Ultra would be the last chip in the M1 family, though.

- 2
  
  Colstan on March 10, 2022 at 1:18 pm
  
  Correct me if I am wrong, I could be, but didn’t he say that it was the last member of the M1 *family*? Keep in mind that the various M1 incarnations are branding. Apple could release something like a 40-core chip inside the Mac Pro and simply call it the M1 Ultra with 40 cores.
  
  I think you are probably right, and that we’ve seen the last of the M1 line, and that Mac Pro might be based upon M2. That’s just speculation, at the moment. However, I’m just wondering if we are reading too much into the branding side of it.
  
  - 3
    
    hoakley on March 10, 2022 at 2:05 pm
    
    Thank you. His exact words, transcribed from the video, were:
    “making our transition nearly complete, with just one more product to go, Mac Pro, but that is for another day”
    He didn’t refer to whether the Mac Pro would use the Ultra, or another M1 derivative (nor would he, of course).
    Launching a brand new top-of-the-range chip based on the M2 would be an extremely high risk venture, and until Apple was happy that a more basic member of the family was giving good yields and proving reliable, I don’t think it would attempt that. I expect Apple will start the M2 generation in a similar way to the M1: a replacement for the original M1 first, then an M2 Pro and Max, and only then, when yields are good and the design proven, go for a duplex design like the Ultra.
    This isn’t just branding, it’s the way that design and manufacture tends to work, particularly at the leading edge. And the first requirement for any new generation is a new pair of core types.
    Howard.
    
    - 4
      
      Colstan on March 10, 2022 at 2:33 pm
      
      Thanks for the clarification about the Mac Pro, but I was curious about what he said regarding the M1 Ultra being the last in the chip family, regardless of the Mac Pro. However, I may not be recalling that correctly. I understand that Apple’s cadence goes from least complex (M1) to most (M1 Ultra) for technical, not marketing reasons. The only reason I mention the M2 being a possibility for the Mac Pro is because Ming-Chi Kuo is saying the Mac Pro is a 2023 product. That could be announced at WWDC and shipping in January, but it wouldn’t be the first time that Apple has delayed the Mac Pro. I know that would violate their two-year plan. For what it is worth, a former chip architect who worked with the guys at Apple when he was at AMD and DEC says he isn’t sure that the M1 Max can go quad.
      
      In other words, you need a degree in Kremlinology to divine Apple’s intentions. At least it gives us something to talk about.
      
    - 5
      
      hoakley on March 10, 2022 at 4:05 pm
      
      Thank you – I stand corrected, he did earlier refer to the Ultra as “one last chip to the M1 family”. Which suggests that the Mac Pro will also use the Ultra.
      Howard.
      
    - 6
      
      James on March 10, 2022 at 3:47 pm
      
      He actually did say it was the last chip in the M1 family. It is at 25:20 into the video at the beginning of the M1 Ultra segment. The exact quote from John Ternus was, “We are adding one more chip to the M1 family and it is going to blow your mind.”
      
    - 7
      
      hoakley on March 10, 2022 at 4:09 pm
      
      Thank you – you’re correct, he did refer to adding one “last” chip to the M1 family.
      Howard.
      
    - 8
      
      James on March 10, 2022 at 3:51 pm
      
      Sorry, I got one critical word wrong in that previous quote. What John Ternus said was, “We are adding one last chip to the M1 family and it is going to blow your mind.” The critical word was “last” and not “more”. Very clear that the M1 Family is now complete.
      
- 9
  
  hoakley on March 10, 2022 at 1:55 pm
  
  Not according to the video: his exact words were:
  “making our transition nearly complete, with just one more product to go, Mac Pro, but that is for another day.”
  He didn’t state whether the Mac Pro replacement would use the Ultra or another chip.
  Howard.
  
  - 10
    
    Oliver Busch on March 10, 2022 at 7:29 pm
    
    As already clarified above, this was not related to the Mac Pro, but indeed to the M1 Ultra, as – quote – “one *last* chip to the M1 family”.
    
    My hunch (as a person unfamiliar with the matter): new chip generation for the next Mac Pro with new core design based on ARMv9 ISA.
    
    - 11
      
      hoakley on March 10, 2022 at 8:19 pm
      
      So Apple’s first M2 chip is going to be Ultra class, with brand new cores, new Fabric, and new GPU? That would probably be the biggest gamble ever in processor design, and why Apple wasn’t launching the M1 Ultra eighteen months ago.
      Howard.
      
    - 12
      
      Oliver Busch on March 12, 2022 at 2:45 pm
      
      Again, I am not an expert. The alternative would mean an upcoming Mac Pro would use the M1 Ultra as well?
      
    - 13
      
      hoakley on March 12, 2022 at 4:30 pm
      
      If Apple doesn’t intend any more members of the M1 family, and wouldn’t be ready to release an M2 Ultra for over a year, then the only other possibility would be an M1 Ultra for the Mac Pro as well.
      Howard.
      
14

Oliver Busch on March 10, 2022 at 10:00 am

Btw., completely unrelated: is the NAS review you mentioned a while ago the one I see on the cover of the current issue of MacFormat?
(Thanks to Brexit, the only viable alternative to read MacFormat here in DE would be readdle, where I can only see the cover.)

- 15
  
  hoakley on March 10, 2022 at 10:02 am
  
  Yes – issue 376 which I’ve just bought today. I also have the lead cover feature!
  Howard
  
16

Harald Striepe on March 10, 2022 at 3:42 pm

Howard,
What’s your basis for the fifth chip option?

- 17
  
  hoakley on March 10, 2022 at 4:08 pm
  
  There has been speculation from the release of the Pro and Max that the design incorporated interconnection, which we now see in the Ultra. Although connecting four is harder than two, it is feasible, and could then provide something well beyond the Ultra to support those who really do need core farms.
  However, it appears that the Ultra is the last in the M1 family, which will make the Mac Pro quite a challenge.
  Howard.
  
  - 18
    
    Harald Striepe on March 10, 2022 at 8:06 pm
    
    The current fabric between two chips uses 10000 lines. I am not sure doubling the traffic over the same lines is manageable without impacting performance given Unified Memory.
    
    - 19
      
      hoakley on March 10, 2022 at 8:22 pm
      
      Thank you. If a quad design is ever used, I don’t think the additional pair of chips (chipsets?) would interconnect the same – and you’ve pointed out why. There are other interesting possibilities, of course.
      Howard.
      
    - 20
      
      hoakley on March 10, 2022 at 8:25 pm
      
      Harald,
      Do you think it would be plausible for Apple to launch the Ultra version of the next generation, e.g. M2, before it has built and proved more basic versions of the new chip on lower-end Macs?
      If the Mac Pro doesn’t have an M1 Ultra, then it’s going to need second-gen Apple Silicon, which surely would push it out by well over a year, possibly two?
      Howard.
      
    - 21
      
      Harald Striepe on March 11, 2022 at 12:57 am
      
      I would think that really depends on their delivery schedule for the Mac Pro Apple Silicon.
      They probably need to make that one 2x the performance of the Mac Studio Ultra, but with a 4nm Ultra the 2x Max/fabric type of design should yield that.
      I also just read that the fabric is on silicon. The Ultra chips are one on the wafer. Crazy hard!
      Not sure what their yields are. They can bin down to 48 cores, but not sure whether they actually have a way to scribe and save a Max chip if the other half is not working.
      This really does explain the $1000 delta of Ultra to equivalent Max.
      
22

Javier Gallardo on March 10, 2022 at 6:40 pm

Well… the interesting thing (for me) is how this up-scaling is going to work for software. You make a clear resume and general description, but same as M1 differs to Pro & Max in core count, Ultra makes another jump. How will this be used by software? (I ignore how M-native “pro” apps are using cores. Sorry, perhaps answer is trivial, and of course more apps could run concurrently. But how LogicPro or FinalCut benefit from double cores?).
Presentation made big force about being “transparent” to developers. That’s a quite fascinating thing! (But deserves explanation, imho)

- 23
  
  hoakley on March 10, 2022 at 8:16 pm
  
  Thank you.
  In macOS, you don’t have to decide which cores to run your code on, nor whether they make use of the larger GPUs, as the system handles all that for you. However, for an app to benefit from using more than 8 cores, it has to have more than 8 threads which run at the same time. That requires careful structuring and coding to make that possible. For some apps, that’s easy to do, and they can usefully run as many threads as they’re allowed. For others, they can only run a small number of threads at a time.
  This is why some apps get much faster when run on computers with a large number of cores, and others don’t benefit at all. I don’t know about Logic Pro or Final Cut, and that is going to vary considerably between apps.
  So before you go out and commit the extra £/$/€ 2,000 to upgrade from an M1 Mac to an Ultra, you need to know whether the software you use will actually get any performance benefit.
  So, while using many cores or GPUs doesn’t require the developer to write the software to specifically use those cores/GPUs, it does need to be written so that it could, if they’re available.
  Howard.
  
24

Warren Nagourney on March 11, 2022 at 1:43 am

The most recent ATP podcast raised some interesting questions about the future Mac Pro. The suggestion from what was said at the keynote is that Apple will either use more than one M1 Ultra unit or a new processor (non-M1) design. If the former, John Siracusa raised the problem of asymmetric memory access: since Apple is committed (for good reasons) to keeping the memory on the substrate and any program would regard the memory system as a single unit, the access time for two M1 Ultras would depend whether the memory was on its own substrate or on the other one. This sounds somewhat awkward.

As has been said, a different processor would not be available soon in an “Ultra” form, since it would need to be tested first in simpler configurations. This suggests a 2+ year wait for the Mac Pro. Then, why did they tease us about it?

The future of that product is a puzzle.

- 25
  
  Warren Nagourney on March 11, 2022 at 3:26 am
  
  My apologies: the phrase that Siracusa used to describe the address-dependent memory access time for two M1 Ultras was “nonuniform memory access” (and not asymmetric memory access).
  
26

Michael Tsai - Blog - Mac Studio on March 16, 2022 at 7:00 pm

[…] Update (2022-03-16): Howard Oakley: […]

27

Anders Åberg on March 18, 2022 at 7:17 pm

What if Apple says: screw efficiency, up the clock speed, let’s go to 5GHz with the M1 Ultra in the Mac Pro?

- 28
  
  hoakley on March 18, 2022 at 8:41 pm
  
  Thank you. As far as I can tell, the Firestorm cores already run at the highest frequency for an ARM, of 3.2 GHz. Going to 5 GHz would appear to be groundbreaking, generate roughly double the heat output, and probably require a new design. Early indications from Ultra systems are that it’s really hard to saturate their cores. I’m not sure what that rise in frequency would offer in the real world, either.
  Howard.
  