A brief history of Mac numeric processing

The Motorola 68000 CPU had no floating-point instructions, so Apple introduced SANE, then moved on to the PowerPC Velocity Engine, its Accelerate framework, and more.

Apple silicon: 4 A little help from friends and co-processors

How the NEON vector processor, neural engine, matrix co-processor, and GPU all deliver high performance with low power and energy use.

Why does virtualisation run some code far slower on Apple silicon?

In a wide range of in-core tests, CPU performance in VMs is close to that of code running native on the host, and M3 VMs are faster than M1 native. With one significant exception.

M3 CPU cores have become more versatile

M3 chips widen the gap between Pro and Max variants. They also change relative performance between P and E cores to make M3 CPUs more versatile.

Why apps need to Accelerate

If Apple offered to do much of the hard work of coding your app for you for free, and to optimise it for different Mac hardware, how could you refuse?

Evaluating the M3 Pro: Summary

Comparison with M1 variants, energy use with comparison between M3 Pro and Max, virtualisation, Game Mode, vector processing and matrix co-processing – all in summary.

Evaluating M3 Pro CPU cores: 4 Vector processing in NEON

Differences in vector processing performance between the M1 Max and M3 Pro, and in their use of power. Their frequency control is more complex.

Explainer: Vectors, Accelerate and poor performance on M1 Macs

Some apps and other code don’t appear to run faster on M1 chips, and some even run more slowly. Could this be because they aren’t using the best acceleration for vectors and matrices?

Code in ARM Assembly: Lanes and loads in NEON

How ARM64 uses its special SIMD registers in lanes, and how they can be loaded with and without de-interleaving.