Code in ARM Assembly: Moving data around

After a short diversion looking at program flow and performance, I move on here to look at instructions involving the general-purpose registers. If you haven’t already downloaded and browsed the Instruction Set Reference, you might still be under the impression that ARM64 processors have a reduced instruction set. In fact there are a great many instructions, but they do at least come in families. In this article I look at those concerned with moving data around: loading a register, moving data between registers, and storing the contents of a register. These are the bread and butter of a great deal of code.

Data types and registers

Each 64-bit general-purpose register from X0 up to X30 can also be used to hold 32-bit words and smaller data types. When used for 32-bit words, the lower half of register X0 is referred to as W0, and so on for X1 and W1 up to X30 and W30. Corresponding data types for those and fractions of words are:

  • X register, doubleword = 64-bit, C long, Swift Int
  • W register, word = 32-bit, C int, Swift Int32
  • Halfword = 16-bit, C short, Swift Int16
  • Byte = 8-bit, C char, Swift Int8

Each can be signed or unsigned. When a smaller type is held in the 32 bits of a W register, it is sign-extended if it is a signed data type, or zero-extended if it is unsigned.

There’s a family of instructions to perform such conversions:

  • the initial letter of the instruction is S for sign-extending, or U for zero-extending;
  • the next two letters are XT denoting this as extending smaller data to fill the register;
  • the last letter denotes the original data type – B for byte, H for halfword (16 bit), or W for word (32 bit).

Thus
SXTB W1, W0
sign-extends a single byte in the least-significant byte of register W0 into a word in register W1, taking 0x88 to 0xFFFF FF88, and
UXTW X3, W2
zero-extends the word in register W2 into the lowest 32 bits in the 64-bit register X3, taking 0x8844 2211 to 0x0000 0000 8844 2211.
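
To make the effect of these extensions concrete, here is a minimal sketch written in portable C rather than assembly, so that it can be run on any machine. The function names sxtb, uxtb, uxtw and sxtw are hypothetical helpers named after the instructions they imitate. They only model the results and aren't taken from AsmAttic. The signed casts assume the usual two's-complement conversion, which every mainstream compiler provides.

```c
#include <stdint.h>

/* Model of SXTB Wd, Wn: keep the low byte of wn and sign-extend it to 32 bits. */
uint32_t sxtb(uint32_t wn) {
    return (uint32_t)(int32_t)(int8_t)(uint8_t)wn;
}

/* Model of UXTB Wd, Wn: keep the low byte of wn and zero-extend it to 32 bits. */
uint32_t uxtb(uint32_t wn) {
    return wn & 0xFFu;
}

/* Model of UXTW Xd, Wn: zero-extend a 32-bit word to 64 bits. */
uint64_t uxtw(uint32_t wn) {
    return (uint64_t)wn;
}

/* Model of SXTW Xd, Wn: sign-extend a 32-bit word to 64 bits. */
uint64_t sxtw(uint32_t wn) {
    return (uint64_t)(int64_t)(int32_t)wn;
}
```

With these, sxtb(0x88) gives 0xFFFFFF88 and uxtw(0x88442211) gives 0x0000000088442211, matching the two examples above.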

Loading registers with the LDR family

The LDR instruction loads the specified register with data obtained from a memory address, which can include a constant stored alongside the code as a literal. In its basic form, it can be used with 32- and 64-bit data, using any of the standard addressing modes explained earlier in this series. Other members of this family of instructions have names constructed as follows:

  • the initial three letters are LDR to denote these load a register;
  • the fourth and final letter can be H or B, to load a halfword (16 bit) or byte (8 bit), respectively, for unsigned data;
  • the fourth letter can be S to indicate the data is signed, followed by a fifth letter indicating its size, W for word, H for halfword, or B for byte.

So the instruction to load a signed byte is LDRSB which includes sign-extension, and that to load an unsigned byte is LDRB which zero-extends it instead. LDRSB and LDRSH can sign-extend into a destination W or X register, while LDRSW only accepts an X register, as a word already fills a W register.
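
The difference between the signed and unsigned loads can be sketched in portable C. This is only a model of the results, not real assembly: the helper names are hypothetical, memory is a plain byte array, and halfwords are assumed to be stored little-endian, as they are on Apple's ARM64 platforms.

```c
#include <stdint.h>

/* Model of LDRB Wt, [Xn]: load one byte and zero-extend it to 32 bits. */
uint32_t ldrb(const uint8_t *addr) {
    return (uint32_t)addr[0];
}

/* Model of LDRSB Wt, [Xn]: load one byte and sign-extend it to 32 bits. */
uint32_t ldrsb_w(const uint8_t *addr) {
    return (uint32_t)(int32_t)(int8_t)addr[0];
}

/* Model of LDRSB Xt, [Xn]: load one byte and sign-extend it to 64 bits. */
uint64_t ldrsb_x(const uint8_t *addr) {
    return (uint64_t)(int64_t)(int8_t)addr[0];
}

/* Model of LDRH Wt, [Xn]: load a little-endian halfword and zero-extend it. */
uint32_t ldrh(const uint8_t *addr) {
    return (uint32_t)addr[0] | ((uint32_t)addr[1] << 8);
}

/* Model of LDRSH Wt, [Xn]: load a little-endian halfword and sign-extend it. */
uint32_t ldrsh_w(const uint8_t *addr) {
    return (uint32_t)(int32_t)(int16_t)(uint16_t)ldrh(addr);
}
```

Loading the byte 0x88 with ldrb gives 0x00000088, while ldrsb_w gives 0xFFFFFF88 and ldrsb_x fills all 64 bits with the sign.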

There are additional load instructions, such as LDP which loads a pair of registers and is discussed further below, but for the moment mastery of the LDR family will get you a long way.

Moving register contents with MOV

Although there are more involved variants, and other instructions such as those for conversions detailed above, the basic instruction is
MOV Xd, Xs
MOV Wd, Ws

where d is the number of the destination register, and s that of the source. This is actually an alias of the more opaque instruction
ORR Xd, XZR, Xs
(for X registers) which performs an inclusive (bitwise) OR of Xs with the zero register XZR and writes the result in Xd. You may find that some disassemblers show many MOV instructions as ORR instructions of this form, which could otherwise be very confusing.

MOV is also commonly used to load a register with a constant. Although I think I have seen a statement that it’s no longer necessary to precede such immediate integer values with the hash sign #, that’s a useful marker for such data and I adhere to it. Normally, such constants are given in decimal as #15, but if you prefer you can use hexadecimal such as #0x0F. Because the constant is encoded within the 32-bit instruction itself, there are limits on its size.
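
One of those limits can be sketched in C. The simplest encoding behind MOV with a constant is MOVZ, which holds a 16-bit value optionally shifted left by 16, 32 or 48 bits. The hypothetical helper below tests whether a 64-bit constant fits that single form. It deliberately ignores the other encodings an assembler may choose for MOV (MOVN and the bitmask form of ORR), so a false result here doesn't mean that a constant can't be loaded in one instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if v is a 16-bit value shifted left by 0, 16, 32 or 48 bits,
   that is, if it fits a single MOVZ Xd, #imm16, LSL #shift. */
bool fits_movz_x(uint64_t v) {
    for (int shift = 0; shift < 64; shift += 16) {
        /* Clear the 16-bit field at this shift; nothing else may be set. */
        if ((v & ~(0xFFFFULL << shift)) == 0) {
            return true;
        }
    }
    return false;
}
```

So 15 and 0x10000 each fit, but 0x12345 doesn't, because its set bits straddle two 16-bit fields.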

Storing register contents with STR

There are three basic store instructions:

  • STR stores the contents of a whole register, X (64-bit) or W (32-bit),
  • STRH stores the least significant halfword in a W register,
  • STRB stores the least significant byte in a W register.

The catch with each of these is that the first register given (normally referred to as the destination!) is the source of the data, and the second resolves to an address in memory, using a standard addressing mode. This is the reverse of most other instructions, in which the result ends up in the first register given.
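
As a rough model of what the three stores write, here is a short C sketch. The helper names are hypothetical, memory is a plain byte array, and bytes are written in little-endian order, which is an assumption matching Apple's ARM64 platforms. The parameters follow the order of the assembly operands, with the data first and the address second.

```c
#include <stdint.h>

/* Model of STRB Wt, [Xn]: store only the least significant byte of wt. */
void store_b(uint32_t wt, uint8_t *addr) {
    addr[0] = (uint8_t)(wt & 0xFFu);
}

/* Model of STRH Wt, [Xn]: store the least significant halfword of wt. */
void store_h(uint32_t wt, uint8_t *addr) {
    addr[0] = (uint8_t)(wt & 0xFFu);
    addr[1] = (uint8_t)((wt >> 8) & 0xFFu);
}

/* Model of STR Wt, [Xn]: store all four bytes of wt. */
void store_w(uint32_t wt, uint8_t *addr) {
    for (int i = 0; i < 4; i++) {
        addr[i] = (uint8_t)((wt >> (8 * i)) & 0xFFu);
    }
}
```

Storing 0x11223344 with store_b changes one byte of memory to 0x44 and leaves its neighbours alone, while store_h writes 0x44 then 0x33.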

As with LDP, there’s another instruction STP to store a pair of registers. These are most often used in pairs to push onto and pop off the stack. Examples taken from disassembled code might read:
STP X20, X19, [SP, #-0x20]!
STP X29, X30, [SP, #0x10]
// …
LDP X29, X30, [SP, #0x10]
LDP X20, X19, [SP], #0x20

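which first store pairs of registers to the stack, then pop them off after the code routine between them. The first STP uses pre-indexing, marked by the !, to move SP down by 0x20 bytes before storing, and the final LDP uses post-indexing to move SP back up by 0x20 bytes after loading.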
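
The way SP moves in that sequence can be followed in a small C model. Everything here is a hypothetical toy: the Stack type, its 16 slots and the helper names are mine, there is no bounds checking, and SP is held as a byte offset into an array rather than a real address.

```c
#include <stdint.h>

enum { STACK_SLOTS = 16 };

typedef struct {
    uint64_t mem[STACK_SLOTS]; /* stack memory, one element per 8 bytes */
    int64_t sp;                /* stack pointer as a byte offset into mem */
} Stack;

/* Model of STP Xa, Xb, [SP, #off]! (pre-index): adjust SP first, then store. */
void stp_pre(Stack *s, uint64_t xa, uint64_t xb, int64_t off) {
    s->sp += off;
    s->mem[s->sp / 8] = xa;
    s->mem[s->sp / 8 + 1] = xb;
}

/* Model of STP Xa, Xb, [SP, #off] (offset): store at SP + off, SP unchanged. */
void stp_off(Stack *s, uint64_t xa, uint64_t xb, int64_t off) {
    s->mem[(s->sp + off) / 8] = xa;
    s->mem[(s->sp + off) / 8 + 1] = xb;
}

/* Model of LDP Xa, Xb, [SP, #off] (offset): load from SP + off, SP unchanged. */
void ldp_off(const Stack *s, uint64_t *xa, uint64_t *xb, int64_t off) {
    *xa = s->mem[(s->sp + off) / 8];
    *xb = s->mem[(s->sp + off) / 8 + 1];
}

/* Model of LDP Xa, Xb, [SP], #off (post-index): load at SP, then adjust SP. */
void ldp_post(Stack *s, uint64_t *xa, uint64_t *xb, int64_t off) {
    *xa = s->mem[s->sp / 8];
    *xb = s->mem[s->sp / 8 + 1];
    s->sp += off;
}
```

Starting with SP at the top of the array, calling stp_pre with -0x20, stp_off with 0x10, ldp_off with 0x10 and ldp_post with 0x20 mirrors the four instructions above, and leaves SP back where it started with all four values recovered.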

My own source code, used in AsmAttic demonstrations, uses a similar technique for register X30, which is the Link Register, and can be referred to as LR:
STR LR, [SP, #-16]! // put LR on the stack
// …
LDR LR, [SP], #16 // restore LR from the stack

The next article in this series will look at integer arithmetic instructions for the general-purpose registers. In the meantime, feel free to exercise some of the instructions above. Disassemble some ARM64 code and trace the loading and storing of data, and try some out yourself within AsmAttic’s framework.

Previous articles in this series:

1: Building an app to develop assembly routines, including an explanation of calling assembly language from Swift, with a complete Xcode project
2: Registers explained
3: Working with pointers
4: Controlling flow
5: Conditional loops
6: Flow, pipelines and performance

Downloads:

ARM register summary
ARM operand architecture
Conditions and conditional branching instructions
Control Flow
AsmAttic 2, a complete Xcode project (version 2)
AsmAttic, a complete Xcode project (version 1)

References

Procedure Call Standard for the Arm 64-bit Architecture (ARM) from GitHub
Writing ARM64 Code for Apple Platforms (Apple)
Stephen Smith (2020) Programming with 64-Bit ARM Assembly Language, Apress, ISBN 978 1 4842 5880 4.
Daniel Kusswurm (2020) Modern Arm Assembly Language Programming, Apress, ISBN 978 1 4842 6266 5.
ARM64 Instruction Set Reference (ARM).