The Motorola 68000 CPU had no floating point instructions, so Apple introduced SANE, then went on to the PowerPC Velocity Engine, and its Accelerate framework, and more.
NEON
How the NEON vector processor, neural engine, matrix co-processor, and GPU all deliver high performance with low power and energy use.
In a wide range of in-core tests, CPU performance in VMs is close to that of code running natively on the host, and M3 VMs are faster than M1 native, with one significant exception.
M3 chips widen the gap between Pro and Max variants. They also change relative performance between P and E cores to make M3 CPUs more versatile.
If Apple offered to do much of the hard work of coding your app for you for free, and to optimise it for different Mac hardware, how could you refuse?
Comparison with M1 variants, energy use with comparison between M3 Pro and Max, virtualisation, Game Mode, vector processing and matrix co-processing – all in summary.
Differences in vector processing performance between the M1 Max and M3 Pro, and in their use of power. Their frequency control is more complex.
Some apps and other code don’t appear to run faster on M1 chips, and some even run more slowly. Could this be because they aren’t using the best acceleration for vectors and matrices?
How ARM64 uses its special SIMD registers in lanes, and how they can be loaded with and without de-interleaving.
