How M1 Macs may lag behind

Like it or loathe it, AI seems here to stay, and in the form of machine learning (ML) it has already been changing our Macs. Use Spotlight, Siri, word completion, or any image processing tools, and you’ll be benefitting from them. Apple silicon chips contain sophisticated hardware support for AI and ML in their GPUs and in the ANE, a dedicated neural engine. While the ANE is probably the least-used part of those chips at present, that’s changing rapidly, and Apple is set to release more support in the near future. One clue is the appearance of a new Private Framework named DeepThought in Sonoma 14.2.

But not all M-series chips are equal in this respect: M1 chips have more limited support for recent AI/ML features, including what has become a near-universal format for floating-point numbers, bfloat16. Without that, Macs with M1 chips are likely to remain at a significant disadvantage when running AI and ML functions.

Representing integers (whole numbers) in the binary formats used by computers is relatively straightforward: the more digits, the larger the numbers that can be represented. With one hex digit (four bits), you get 0-15 in decimal; double that to two hex digits (eight bits), and the range extends from 0-255. If you want negative numbers, reserve one of those bits for the sign (in practice using two’s complement), and eight bits then cover -128 to +127.
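For example, Swift’s standard eight-bit integer types show those ranges directly (a trivial sketch, nothing Apple-specific):

```swift
// Eight-bit integer ranges, matching the two-hex-digit examples above.
print(UInt8.min, UInt8.max)   // 0 255    (unsigned)
print(Int8.min, Int8.max)     // -128 127 (signed, two's complement)
```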

The most common way of representing floating-point numbers is similar to scientific or engineering notation in decimal. That uses a sign (+ or -), a fraction, and an exponent. For example, the number -1,234,567.89 can be expressed as -1.23456789 x 10^6 (ten to the power of six): that has a negative sign, a fraction of 1.23456789, and an exponent of 6. Because these are computers, powers of 2 are used for the exponent rather than powers of ten.
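Swift’s standard floating-point types expose those three components directly, so a brief sketch can show the same number broken into its base-2 parts:

```swift
// Decompose a Float into the sign, base-2 exponent and fraction (significand)
// described above, using Swift's standard floating-point properties.
let x: Float = -1234567.89
print(x.sign)          // minus
print(x.exponent)      // 20, the power of 2 rather than of 10
print(x.significand)   // about 1.1774, so the magnitude is about 1.1774 x 2^20
```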

The most common floating-point formats are those of the IEEE 754 standard, where a single-precision 32-bit float has a sign bit, an 8-bit exponent, and 23 bits to contain the fraction. The size allowed for the exponent determines the range of floating-point numbers that can be represented in that format, while the size allowed for the fraction determines how precise any number can be.
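As an illustration (not any particular library’s API, just Swift’s standard bit-pattern access), those three fields can be pulled straight out of a Float’s 32 bits:

```swift
// IEEE 754 single-precision layout: 1 sign bit, 8 exponent bits, 23 fraction bits.
let bits = Float(-1234567.89).bitPattern   // the raw 32-bit pattern as a UInt32
let sign     = bits >> 31                  // 1 for negative, 0 for positive
let exponent = (bits >> 23) & 0xFF         // stored with a bias of 127, so 147 here (147 - 127 = 20)
let fraction = bits & 0x7F_FFFF            // the 23 fraction bits
print(sign, exponent, fraction)            // 1 147 1487935
```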

With recent rapid developments in AI and ML, several new floating-point formats have come into use, among them what’s known as bfloat16, with a sign bit, an 8-bit exponent just like the single-precision 32-bit float, but only 7 bits to contain the fraction. In half the number of bits, bfloat16 thus covers the same range as the 32-bit standard, but at much lower precision. That’s claimed to be ideal for AI, ML, and use with smart sensor technology.
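Because bfloat16 keeps the same sign and exponent fields, its bit pattern is effectively the top 16 bits of a single-precision float. Here’s a minimal Swift sketch using a hypothetical BFloat16 wrapper; it truncates rather than rounds, purely for illustration, and isn’t how Apple’s frameworks or any hardware implement the format:

```swift
// bfloat16 stored in a UInt16: sign (1 bit) + exponent (8 bits) + the top 7 fraction bits,
// i.e. the high half of an IEEE 754 single-precision bit pattern.
struct BFloat16 {
    let bits: UInt16

    init(_ value: Float) {
        // Keep the high 16 bits, discarding the low 16 fraction bits (truncation, not rounding).
        bits = UInt16(truncatingIfNeeded: value.bitPattern >> 16)
    }

    var floatValue: Float {
        // Widen back to Float by padding the discarded fraction bits with zeros.
        Float(bitPattern: UInt32(bits) << 16)
    }
}

let original: Float = 1234567.89
let asBFloat = BFloat16(original).floatValue
print(original, asBFloat)   // 1234567.9 1228800.0
```

Note how the converted value keeps the same power of 2, and so the same order of magnitude, while only two or three significant decimal digits survive.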

bfloat16 was developed by the Google Brain team, has been adopted quickly over the last couple of years in Intel, AMD and Arm processors, and is widely supported in the tools and libraries used for AI and ML. As far as Apple’s M-series chips go, M2 and M3 CPUs support the ARMv8.6A instruction set, which includes bfloat16 support, but the M1 only supports ARMv8.5A, which doesn’t. Support in their GPUs and Apple’s neural engine (ANE) is less clear, although work on the M1 ANE suggests that it uses float16 (presumably IEEE half-precision 16-bit float) throughout. Given that the first M1 Macs were delivered in late 2020, it seems most unlikely that Apple could have incorporated support for bfloat16 in the M1’s design.

If the use of bfloat16 is as advantageous as is generally claimed, it looks like M1 Macs will remain at a significant disadvantage compared with M2 and later models. As Apple and third parties roll out more products with AI and ML at their heart, don’t be surprised if their performance on M1 Macs proves disappointing compared with that on M2 and M3 models.

The situation is even starker for Intel Macs, though, as they lack such hardware support altogether, and are already being left behind.
