M1 Icestorm cores can still perform very well

What are the penalties in real-world use for running your code on Icestorm cores, using around 10% of the power used by Firestorms?

Code in ARM Assembly: Lanes and loads in NEON

How ARM64 uses its special SIMD registers in lanes, and how they can be loaded with and without de-interleaving.

Accelerating the M1 Mac: an introduction to SIMD

More cores are great for running more processes, but how can you make individual operations within a process faster? SIMD is one solution.

Code in ARM Assembly: Rounding and arithmetic

Details the options available for rounding floating point numbers, and all the scalar floating point operations. There’s another cheat sheet summary too.

Code in ARM Assembly: Floating point registers and conversions

Floating point numbers are very different from integers, but are loaded and stored much the same. Conversion between registers, including to and from integers, is complex.

Code in ARM Assembly: Conditions without branches

Where code can make simple selections according to a conditional test, it may be possible to eliminate branching and ensure rapid execution.

Are there flaws in some ARM64 instructions?

Many processors, including the ARM64, have instructions to perform fused multiply-add operations. Do they deliver reduced error and better performance?

Code in ARM Assembly: Bit operations

An overview of bit operations, including MOVK for 16-bit immediate values, bit shift operations, bitwise AND, OR, XOR, and more, plus a cheat sheet.

Code in ARM Assembly: Integer arithmetic

Basic integer arithmetic – add, subtract, negate, multiply, multiply-and-add, and divide – in its many variations, with some catches for those more used to high-level languages.

Code in ARM Assembly: Moving data around

Explaining the LDR family of instructions for loading registers, MOV for moving one register to another, STR for storing to memory, and SXTx/UXTx for filling a register with smaller data types.

The Eclectic Light Company

assembly language