How ARM64 uses its special SIMD registers in lanes, and how they can be loaded with and without de-interleaving.
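As a rough illustration of lanes and de-interleaving, here is a minimal sketch using the Swift standard library's SIMD types rather than NEON assembly. The sample values and the names `interleaved`, `left`, `right` and `mixed` are assumptions made up for this example, and the compiler is free to choose its own instructions, so this models the idea (in NEON terms, roughly the difference between an LD1-style plain load and an LD2-style de-interleaving load) rather than showing the actual register traffic.

```swift
// Eight interleaved Floats, laid out as pairs: L0 R0 L1 R1 L2 R2 L3 R3
// (hypothetical stereo samples, chosen only for illustration).
let interleaved = SIMD8<Float>(1, 10, 2, 20, 3, 30, 4, 40)

// Without de-interleaving: the first four lanes keep their memory order.
let firstFour = interleaved.lowHalf      // lanes: 1, 10, 2, 20

// With de-interleaving: even-numbered lanes go to one vector,
// odd-numbered lanes to another.
let left = interleaved.evenHalf          // lanes: 1, 2, 3, 4
let right = interleaved.oddHalf          // lanes: 10, 20, 30, 40

// A single vector operation then acts on all four lanes at once.
let mixed = (left + right) * 0.5         // lanes: 5.5, 11, 16.5, 22
```

De-interleaving matters when the data arrive as repeating groups (stereo pairs, RGB triples) but the arithmetic wants each component in its own register.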
Three recent WWDC sessions extol Apple’s “extensive reference material”, yet Xcode can’t find anything on these rich and extensive libraries.
More cores are great for running more processes, but how can you make individual operations within a process faster? SIMD is one solution.
Benchmarking 32-bit Float vector dot-product calculations using Swift, NEON assembly, and Apple’s SIMD libraries, on Intel and M1 Macs.
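For orientation, the kind of calculation being compared can be sketched as below. This is not the benchmark code itself: the function names `dotScalar` and `dotSIMD4` are hypothetical, the vector version uses the Swift standard library's `SIMD4<Float>` rather than hand-written NEON or Apple's libraries, and no timings are implied.

```swift
// Plain Swift: one multiply-add per element.
func dotScalar(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have equal length")
    var sum: Float = 0
    for i in 0..<a.count {
        sum += a[i] * b[i]
    }
    return sum
}

// SIMD sketch: four 32-bit Floats are multiplied and accumulated per step.
func dotSIMD4(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have equal length")
    var acc = SIMD4<Float>(repeating: 0)
    let vectorEnd = a.count - a.count % 4   // last index covered by whole groups of 4
    var i = 0
    while i < vectorEnd {
        let va = SIMD4<Float>(a[i], a[i + 1], a[i + 2], a[i + 3])
        let vb = SIMD4<Float>(b[i], b[i + 1], b[i + 2], b[i + 3])
        acc += va * vb                      // lane-wise multiply, lane-wise add
        i += 4
    }
    var sum = acc.sum()                     // horizontal add across the four lanes
    while i < a.count {                     // leftover elements, handled one at a time
        sum += a[i] * b[i]
        i += 1
    }
    return sum
}

let a: [Float] = [1, 2, 3, 4, 5]
let b: [Float] = [6, 7, 8, 9, 10]
print(dotScalar(a, b), dotSIMD4(a, b))
```

Because floating-point addition is not associative, the two versions can differ in the last bits on longer inputs, which is worth bearing in mind when checking one implementation against another.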