How ARM64 divides its dedicated SIMD registers into lanes, and how those registers can be loaded with and without de-interleaving.
More cores are great for running more processes, but how can you make individual operations within a process faster? SIMD is one solution.
By confining macOS background tasks to the Efficiency cores, M1 Macs can run user apps unfettered on their Performance cores. And that feels really fast.
How the M1’s asymmetric cores can run background tasks more efficiently, or deliver high performance, according to Quality of Service.
Designing algorithms which can benefit from multiple cores and GPUs is not only non-trivial, it remains desperately difficult for humans.