Code in ARM Assembly: Integer arithmetic

This article continues looking at ARM64 assembly language using the general-purpose registers. The previous article worked through instructions for moving data around; this one looks at integer arithmetic instructions.

Datatypes

Although working with 64-bit Swift integers should be straightforward in terms of registers (X0-X30), other datatypes crop up frequently. As a reminder:

  • X register, doubleword = 64-bit, C long, Swift Int
  • W register, word = 32-bit, C int, Swift Int32
  • Halfword = 16-bit, C short, Swift Int16
  • Byte = 8-bit, C char, Swift Int8

Each of these can of course exist in signed or unsigned form.

Generally useful abstract registers are XZR, the 64-bit X Zero Register, and WZR, its 32-bit W equivalent, which represent zero.

Basic 64-bit instructions

The basic arithmetic instructions for 64- and 32-bit integers in general-purpose registers are:

  • ADD Xd, Xa, Xb – adds Xa and Xb, and stores the lowest 64 bits of the result in Xd
  • SUB Xd, Xa, Xb – subtracts Xa – Xb, and stores the lowest 64 bits of the result in Xd
  • MUL Xd, Xa, Xb – multiplies Xa by Xb, and stores the lowest 64 bits of the result in Xd
  • SDIV Xd, Xa, Xb – divides signed Xa/Xb, and stores the quotient, rounded towards zero, in Xd
  • UDIV Xd, Xa, Xb – divides unsigned Xa/Xb, and stores the quotient, rounded towards zero, in Xd

These use X registers for 64-bit values, and W registers for 32-bit. Generally speaking, you can’t mix registers such as adding two W registers into an X register, although there are composite instructions which can extend a smaller datatype in one register, as explained below.
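To make "the lowest 64 bits of the result" concrete, here's a minimal sketch in C which models ADD, SUB and MUL on X registers using uint64_t, whose arithmetic wraps modulo 2^64. The function names are invented for illustration, and this only models the result, not how the processor works.

```c
#include <stdint.h>

/* Hypothetical C models of ADD, SUB and MUL on X registers.
   Unsigned 64-bit arithmetic in C wraps modulo 2^64, so only the
   lowest 64 bits of each result are kept, and nothing reports
   that a carry or overflow was lost. */
uint64_t wrap_add(uint64_t xa, uint64_t xb) { return xa + xb; }
uint64_t wrap_sub(uint64_t xa, uint64_t xb) { return xa - xb; }
uint64_t wrap_mul(uint64_t xa, uint64_t xb) { return xa * xb; }
```

For instance, wrap_add(UINT64_MAX, 1) returns 0, with nothing to tell the caller that the true result needed a 65th bit.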

As you might expect, each of those basic instructions has variants and relatives which form extended families. If you’re doing anything beyond simple arithmetic on values of the same size, it pays to be more familiar with individual instructions.

One of the most important differences between arithmetic coding in high-level and assembly languages is the lack of protective features in the latter. Using vanilla instructions such as those above, assembly arithmetic ignores carries, overflows, even division by zero – all of which a high-level language like Swift checks for you. The last of these is a good example: dividing by zero doesn’t raise any error but simply returns zero, so unless your code tests whether the denominator is zero, and handles that error, any calling code will be unaware that the result is meaningless. That isn’t true of floating point arithmetic, though, which generates and propagates errors in conformance with IEEE 754.

Add, subtract, negate

Neither ADD nor SUB takes into account any incoming carry (for ADD) or borrow (for SUB). Instructions which do essentially the same job but respect the Carry flag in the NZCV flags are ADC and SBC:

  • ADC Xd, Xa, Xb – adds Xa and Xb and the value of the Carry flag, storing the result in Xd
  • SBC Xd, Xa, Xb – subtracts Xa – Xb and the value of NOT(the Carry flag), storing the result in Xd

Neither ADD nor SUB (nor ADC/SBC) changes the NZCV flags. Equivalent instructions which set flags according to the result have the suffix S: ADDS, SUBS, ADCS and SBCS. These are often used together in idioms, for example these two, which extend native 64-bit addition and subtraction to 128-bit using pairs of X registers:

ADDS X1, X3, X5 // adds the lower order 64 bits, and sets the Carry flag
ADC X0, X2, X4 // adds the higher order 64 bits, and any carried over

SUBS X1, X3, X5 // subtracts the lower order 64 bits, and clears the Carry flag if a borrow is needed (X3 is lower than X5 as unsigned values)
SBC X0, X2, X4 // subtracts the higher order 64 bits, and subtracts a further 1 if the Carry flag is clear
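The way the Carry flag links each pair of instructions can be sketched in C. This is only a model under my own assumptions: the u128_pair type and function names are invented, and the Carry flag is represented by an ordinary variable.

```c
#include <stdint.h>

/* A 128-bit value held as two 64-bit halves, like a pair of X registers. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128_pair;

/* Models ADDS on the low halves followed by ADC on the high halves. */
u128_pair add128(u128_pair a, u128_pair b) {
    u128_pair r;
    r.lo = a.lo + b.lo;                /* ADDS: low 64 bits of the sum */
    uint64_t carry = (r.lo < a.lo);    /* Carry is set if the sum wrapped */
    r.hi = a.hi + b.hi + carry;        /* ADC: adds in the Carry flag */
    return r;
}

/* Models SUBS on the low halves followed by SBC on the high halves. */
u128_pair sub128(u128_pair a, u128_pair b) {
    u128_pair r;
    r.lo = a.lo - b.lo;                /* SUBS: low 64 bits of the difference */
    uint64_t carry = (a.lo >= b.lo);   /* Carry is set when no borrow is needed */
    r.hi = a.hi - b.hi - (1 - carry);  /* SBC: subtracts NOT(Carry) */
    return r;
}
```

Note that for subtraction the Carry flag works as an inverted borrow, which is why SBC subtracts NOT(the Carry flag) rather than the flag itself.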

Instructions in this family are readily used with immediate constants, such as
ADD Xd, Xa, #immediate // adds an immediate constant, such as #1

When working with datatypes smaller than 32-bit words, you can combine instructions which extend those into W or X registers, of the form
ADD Xd, Xa, ab, -XT-
where ab could be an X or W register, and -XT- is an extension to the second source operand, drawn from those in the SXTx and UXTx family described previously. For example,
ADD X1, X2, W3, SXTW
sign-extends a 32-bit word in register W3 and adds it to the value in X2, writing the result to register X1. When such an extension is being used, a left shift of up to 4 bits can also be applied to the extended value. This is explained in the instruction reference.
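As a rough illustration of what the extension does to the second source operand, here's a C sketch of the SXTW and UXTB forms. The function names are my own, and the casts assume the usual two's complement behaviour of compilers such as Clang and GCC.

```c
#include <stdint.h>

/* Hypothetical model of ADD Xd, Xa, Wb, SXTW #shift:
   sign-extend the 32-bit operand to 64 bits, shift it left
   (0 to 4 places), then add with 64-bit wraparound. */
uint64_t add_sxtw(uint64_t xa, uint32_t wb, unsigned shift) {
    uint64_t extended = (uint64_t)(int64_t)(int32_t)wb;
    return xa + (extended << shift);
}

/* Hypothetical model of ADD Xd, Xa, Wb, UXTB #shift:
   zero-extend the lowest byte of the operand instead. */
uint64_t add_uxtb(uint64_t xa, uint32_t wb, unsigned shift) {
    uint64_t extended = (uint64_t)(uint8_t)wb;
    return xa + (extended << shift);
}
```

So add_sxtw(10, 0xFFFFFFFF, 0) treats the word as -1 and returns 9, while add_uxtb(0, 0x1FF, 0) uses only the lowest byte and returns 255.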

There’s also a small family of negation instructions: NEG (negate) and its flag-setting sibling NEGS, and their with-carry equivalents NGC (negate with carry) and NGCS. For example,
NGCS X0, X1
subtracts from zero both the value in X1 and the value of NOT(the Carry flag), writes the result to X0, and updates the NZCV flags according to that result. This is the equivalent of
SBCS X0, XZR, X1
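That equivalence can be modelled in a few lines of C. This sketch only computes the result written to the destination register and doesn't model the flag updates; the function names are invented, and carry stands in for the Carry flag as 0 or 1.

```c
#include <stdint.h>

/* Hypothetical model of SBC: Xa - Xb - NOT(Carry), wrapping to 64 bits.
   carry must be 0 or 1. */
uint64_t model_sbc(uint64_t xa, uint64_t xb, unsigned carry) {
    return xa - xb - (uint64_t)(1u - carry);
}

/* NGC is SBC with the zero register as its first source operand. */
uint64_t model_ngc(uint64_t xb, unsigned carry) {
    return model_sbc(0, xb, carry);
}
```

With the Carry flag set, model_ngc(5, 1) gives plain -5 as a 64-bit pattern; with it clear, model_ngc(5, 0) gives -6.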

Multiply and multiply-add/subtract

The standard instruction can be used in either 32- or 64-bit form, but sizes can’t be mixed:
MUL X1, X2, X3 // or, for 32-bit
MUL W1, W2, W3
There are no -S or -C versions which set or respect NZCV flags. This means that there’s no simple way to detect overflow. If you suspect overflow might occur, then you should consider using a combination of standard multiply with multiply high instructions.

On ARM64 processors, MUL is actually an alias of an instruction which combines multiplication and addition, MADD, with the zero register as the value to add:
MADD Xd, Xa, Xb, Xc
or its 32-bit sibling using W registers. What these do is calculate the result of (Xa * Xb) + Xc, and store that in Xd. Its relatives are MSUB, which subtracts the product from the third source operand to give Xc – (Xa * Xb), and MNEG, which takes no third source operand and simply negates the product.
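The order of operands in this family is easy to get wrong, so here's a short C sketch of what each instruction computes on X registers. The function names are my own, and results wrap to 64 bits just as they do in the registers.

```c
#include <stdint.h>

/* MADD Xd, Xa, Xb, Xc : Xd = Xc + (Xa * Xb) */
uint64_t model_madd(uint64_t xa, uint64_t xb, uint64_t xc) {
    return xc + xa * xb;
}

/* MSUB Xd, Xa, Xb, Xc : Xd = Xc - (Xa * Xb) */
uint64_t model_msub(uint64_t xa, uint64_t xb, uint64_t xc) {
    return xc - xa * xb;
}

/* MUL Xd, Xa, Xb is MADD with the zero register as Xc. */
uint64_t model_mul(uint64_t xa, uint64_t xb) {
    return model_madd(xa, xb, 0);
}

/* MNEG Xd, Xa, Xb is MSUB with the zero register as Xc. */
uint64_t model_mneg(uint64_t xa, uint64_t xb) {
    return model_msub(xa, xb, 0);
}
```

For example, model_madd(3, 4, 5) is 17, and model_msub(3, 4, 20) is 8.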

Other relatives are SMULH (signed multiply high), SMULL (signed multiply long), SMSUBL (signed multiply-subtract long, also SMNEGL), and SMADDL (signed multiply-add long). The ‘long’ variants multiply two 32-bit W register values to give a 64-bit product; SMADDL then adds that to a 64-bit X register value, SMSUBL subtracts it from one, and SMNEGL negates it. The ‘high’ variant multiplies two 64-bit X registers and stores the high doubleword of the 128-bit result in a 64-bit X register. These enable the chaining of instructions to cope with overflowing results, such as
MUL X1, X2, X3 // multiplies the 64-bit integers to give the low-order 64 bits of the result
SMULH X0, X2, X3 // repeats the same multiplication, this time storing the high-order 64 bits of the result
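One common use of that pair is to test whether a signed product fits in 64 bits: it does only when the high half is nothing more than the sign extension of the low half. The C sketch below models this, assuming a compiler which provides the __int128 extension and arithmetic right shifts (as Clang and GCC do on 64-bit targets); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models MUL: the low-order 64 bits of the product. */
int64_t mul_low(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a * (uint64_t)b);
}

/* Models SMULH: the high-order 64 bits of the signed 128-bit product.
   Assumes the __int128 compiler extension is available. */
int64_t mul_high(int64_t a, int64_t b) {
    return (int64_t)(((__int128)a * (__int128)b) >> 64);
}

/* The product fits in 64 bits only if the high half is all zeros
   (low half non-negative) or all ones (low half negative). */
bool mul_overflows(int64_t a, int64_t b) {
    int64_t lo = mul_low(a, b);
    int64_t hi = mul_high(a, b);
    return hi != (lo < 0 ? -1 : 0);
}
```

In assembly the same test would compare the SMULH result against the MUL result shifted arithmetically right by 63 bits; that is my suggestion rather than anything prescribed by the instruction set.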

For 32-bit multiplication, this is conveniently accomplished in the single instruction
SMULL X0, W1, W2 // multiplies the 32-bit integers to return the 64-bit result

There are unsigned variants UMADDL, UMNEGL, UMSUBL, UMULH and UMULL which perform the same operations as their signed equivalents on unsigned integers.
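The difference between the signed and unsigned long multiplies lies in how the 32-bit operands are widened before multiplication. A minimal C sketch, with invented function names:

```c
#include <stdint.h>

/* SMULL Xd, Wa, Wb : sign-extend both words, then multiply. */
int64_t model_smull(int32_t wa, int32_t wb) {
    return (int64_t)wa * (int64_t)wb;
}

/* UMULL Xd, Wa, Wb : zero-extend both words, then multiply. */
uint64_t model_umull(uint32_t wa, uint32_t wb) {
    return (uint64_t)wa * (uint64_t)wb;
}
```

The same bit pattern 0xFFFFFFFF in a W register is -1 to SMULL but 4294967295 to UMULL, so multiplying it by 2 gives -2 in one case and 0x1FFFFFFFE in the other. Neither can overflow, as the product of two 32-bit values always fits in 64 bits.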

Divide

The standard instructions can be used in either 32- or 64-bit form, but sizes can’t be mixed, for example signed divide:
SDIV X1, X2, X3 // or, for 32-bit
SDIV W1, W2, W3
which return X2/X3 or W2/W3 respectively. For unsigned integers, use UDIV instead. There are no -S or -C versions which set or respect NZCV flags. This means that there’s no simple way to detect division by zero, which will return a result of zero. If you suspect that might occur, then you need to test whether the denominator is zero and handle that error appropriately.
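To show what such a test might look like, here's a C sketch which first models what SDIV returns, then wraps it in a check the caller can act on. The names are hypothetical. The INT64_MIN / -1 case is handled separately because it's undefined in C; on ARM64 SDIV simply returns INT64_MIN there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of SDIV on X registers: no trap and no flags. */
int64_t model_sdiv(int64_t xa, int64_t xb) {
    if (xb == 0) {
        return 0;                  /* division by zero returns zero */
    }
    if (xa == INT64_MIN && xb == -1) {
        return INT64_MIN;          /* the one overflowing case wraps */
    }
    return xa / xb;                /* C also truncates towards zero */
}

/* Checked wrapper: returns false rather than a meaningless quotient. */
bool checked_sdiv(int64_t xa, int64_t xb, int64_t *quotient) {
    if (xb == 0) {
        return false;
    }
    *quotient = model_sdiv(xa, xb);
    return true;
}
```

In assembly, the equivalent check would typically be a CBZ on the denominator register before the SDIV, branching to whatever error handling you provide.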

Division also doesn’t return any remainder, which needs to be calculated separately, for example using the following for 64-bit integers:
SDIV X0, X1, X2 // signed division X1/X2
MNEG X2, X0, X2 // multiply quotient x denominator and negate
ADD X1, X1, X2 // add numerator to negated product to give remainder in X1 and quotient still in X0.
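Those three instructions can be followed step by step in this C sketch. It assumes the denominator isn't zero and that the division itself doesn't overflow; the function name is my own, and unsigned arithmetic is used for the multiply and add so that they wrap as the registers do.

```c
#include <stdint.h>

/* Mirrors SDIV, MNEG, ADD to give the remainder of n / d.
   Assumes d != 0 and not (n == INT64_MIN && d == -1). */
int64_t remainder_sketch(int64_t n, int64_t d) {
    uint64_t q = (uint64_t)(n / d);        /* SDIV X0, X1, X2 */
    uint64_t neg = 0 - q * (uint64_t)d;    /* MNEG X2, X0, X2 */
    return (int64_t)((uint64_t)n + neg);   /* ADD X1, X1, X2 */
}
```

Because the quotient is rounded towards zero, the remainder takes the sign of the numerator: remainder_sketch(-7, 2) is -1, matching C's % operator. The multiply and add can also be folded into a single MSUB X1, X0, X2, X1, which computes X1 – (X0 * X2) directly.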

That covers integer arithmetic. In the next article in this series, I’ll tackle bit and other operations on the general-purpose registers, and try to produce a cheat-sheet summarising these instructions.

Previous articles in this series:

1: Building an app to develop assembly routines, including an explanation of calling assembly language from Swift, with a complete Xcode project
2: Registers explained
3: Working with pointers
4: Controlling flow
5: Conditional loops
6: Flow, pipelines and performance
7: Moving data around

Downloads:

ARM register summary
ARM operand architecture
Conditions and conditional branching instructions
Control Flow
AsmAttic 2, a complete Xcode project (version 2)
AsmAttic, a complete Xcode project (version 1)

References

Procedure Call Standard for the Arm 64-bit Architecture (ARM), from GitHub.
Writing ARM64 Code for Apple Platforms (Apple).
Stephen Smith (2020) Programming with 64-Bit ARM Assembly Language, Apress, ISBN 978 1 4842 5880 4.
Daniel Kusswurm (2020) Modern Arm Assembly Language Programming, Apress, ISBN 978 1 4842 6266 5.
ARM64 Instruction Set Reference (ARM).