Apple silicon: 5 Memory and internal storage

So far in this series explaining how Apple silicon Macs differ from their predecessors, I have concentrated on processors built into their chip. This article looks at how they store data in memory and on their internal SSDs, major determinants of both performance and energy requirements.

Unified memory

CPU cores have small amounts of local cache memory, in the M1:

  • Level 1 (L1) instruction cache of 192 KB (P cores) and 128 KB L1 data cache in each core for its immediate use;
  • L2 cache of 12-48 MB (P cores) shared within each core cluster.

E core caches are smaller than those for P cores. L1 instruction cache is surprisingly large in order to support out-of-order execution of instructions as a means of ensuring performance and efficiency of CPU cores.
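If you want to check these cache sizes on your own Mac, macOS reports them through sysctl. The following is a minimal sketch in Python, assuming the hw.perflevel0 keys that recent versions of macOS expose for the P core type. The function names are my own, and the sample text is illustrative, matching the M1 figures quoted above rather than captured from a real Mac.

```python
import subprocess

# sysctl keys for the P core type (perflevel0) on Apple silicon Macs.
KEYS = (
    "hw.perflevel0.l1icachesize",
    "hw.perflevel0.l1dcachesize",
    "hw.perflevel0.l2cachesize",
)

def parse_sysctl(text):
    """Turn 'key: value' lines into a dict of integer byte counts."""
    sizes = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            sizes[key.strip()] = int(value.strip())
    return sizes

def read_cache_sizes():
    """Ask sysctl for the cache sizes; only meaningful on an Apple silicon Mac."""
    result = subprocess.run(
        ["sysctl", *KEYS], capture_output=True, text=True, check=True
    )
    return parse_sysctl(result.stdout)

# Illustrative text consistent with the M1 figures above, not captured output.
SAMPLE = """hw.perflevel0.l1icachesize: 196608
hw.perflevel0.l1dcachesize: 131072
hw.perflevel0.l2cachesize: 12582912"""

for key, size in parse_sysctl(SAMPLE).items():
    print(f"{key}: {size // 1024} KB")
```

On an Apple silicon Mac, calling read_cache_sizes() in place of parsing the sample should return that Mac's own values. Substituting perflevel1 in the keys should give the equivalent figures for the E cores.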

In addition to co-processors such as the neural engine and AMX, which may have their own local cache memory, there are more than 20 other cores in the chip, to support functions previously performed by the System Management Controller (SMC), input/output features and ports, and the Secure Enclave Processor (SEP). Most of those are likely to have their own small local cache memory.

The bulk of memory in M-series chips is unified into a single store, with a System Level Cache (Last Level Cache) ranging from 8 MB to 96 MB across the M1 family, from the basic chip to the Ultra. While caches are integrated within the chip itself, Unified Memory is mounted in the chip carrier, minimising its physical connection path, but making its replacement largely impractical. Main memory is thus shared between CPU cores, GPU, AMX, and other processing units within the chip.

When GPUs evolved from simple display controllers, they acquired their own memory to replace what had been special Video RAM (VRAM), which was typically wired in so it was accessible to both the processor that wrote to it, and the video controller that wrote its contents to the display. High-speed memory required for GPUs was then expensive and limited in capacity, so it made good sense that it was separate from main memory. The penalty is that transferring data between main and GPU memory takes time and requires careful synchronisation. As general-purpose memory became faster, some GPUs were thus designed to share main memory.

Sharing main memory in that way is very different from the arrangement in the low-performance GPUs that became popular in many notebook designs. Those also use main memory, but a portion is allocated for their exclusive use, and they don’t share memory with the processor.

Unified memory thus brings the advantage that data don’t have to be moved around between main, GPU or any other dedicated memory, apart from local caches. Its drawback is that it can’t be shared outside the chip, making external GPUs far more difficult to support. It remains to be seen whether any Apple silicon Mac will be able to support an external GPU, or whether that would offer any advantage over the GPU integrated into the chip.

How much?

Unified memory has extensive effects on memory use, particularly when compared against previous Intel Macs with separate GPU memory. Memory requirements of WindowServer, for instance, work differently. That process is responsible for compositing all the windows and other elements in the macOS interface into the image to be displayed, a task that overlaps with features of the GPU. You can get insight into the complexity of these differences in Apple’s accounts of resource storage in Intel and Apple silicon Macs.

As a result, there’s no simple rule by which you can estimate Unified memory requirements from those observed in an Intel Mac. In most cases, adding the Intel main and GPU memory will predict unnecessarily high Unified memory. For example, replacing an iMac Pro with 32 GB main memory and 8 GB graphics card shouldn’t require anything like a minimum of 40 GB. On the other hand, those who claim that Unified memory requirements are lower than main memory in Intel models are equally wrong: while they can be, switching from a 32 GB Intel Mac to a 16 or 24 GB Apple silicon Mac is likely to result in increased use of VM swap space on disk.
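The reasoning above gives bounds rather than a formula, and those can be sketched in a few lines of Python. This is only an illustration of the argument: the function names are my own, and the list of memory options is hypothetical, not a catalogue of what Apple sells.

```python
def unified_memory_bounds(main_gb, gpu_gb):
    """Return (reference, ceiling) in GB for an Intel Mac being replaced.

    reference: Intel main memory alone; the real need is unlikely to be much lower.
    ceiling:   main + GPU memory; the real need is likely to be below this.
    """
    return main_gb, main_gb + gpu_gb

def candidate_sizes(options_gb, main_gb, gpu_gb):
    """Pick the memory options lying between the reference and the ceiling."""
    reference, ceiling = unified_memory_bounds(main_gb, gpu_gb)
    return [size for size in sorted(options_gb) if reference <= size <= ceiling]

# The iMac Pro example: 32 GB main memory and an 8 GB graphics card.
reference, ceiling = unified_memory_bounds(32, 8)
print(f"Likely need: below {ceiling} GB, but not much below {reference} GB")

# Hypothetical list of memory options, for illustration only.
print(candidate_sizes([16, 24, 32, 36, 48], 32, 8))
```

Note that this deliberately excludes the 16 and 24 GB options, as they are the ones likely to lead to increased use of swap space.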

Apple has long had a corporate dislike of using special ECC (error-correcting code) memory, and has only offered it as an option on some high-end Macs. It’s not known whether the memory controllers in Apple silicon chips are capable of managing ECC memory, and coupled with RAM modules being fitted into the chip carrier, it seems unlikely that Apple intends offering it as an option in the future.

Internal storage

As with their T2-equipped predecessors, Apple silicon Macs come with internal solid-state storage, although the modules used are different from internal SSDs sold separately. The latter come with complete controllers responsible for managing their NAND flash storage, including caches/buffers, wear levelling, management of bad blocks and other housekeeping. In T2 and Apple silicon internal storage, functions of the SSD controller are divided between the NAND flash itself, mounted outside the chip, and accessory cores in what Apple terms the Fabric within the Apple silicon chip itself. Among the most important features external to the SSD is encryption, which is performed in hardware, within the chip.

It’s sometimes mistakenly claimed that T2 and Apple silicon Macs have their internal storage soldered in, but that isn’t the case for two model lines, the Mac Studio and Pro, both of whose internal storage is replaceable. Not that replacement is simple, as it requires NAND flash components not generally available, and the Mac to be put into DFU mode for a full restore to be performed using an IPSW image of macOS and full firmware before that Mac can start up again.

The end result is suitably high performance from internal storage to match that delivered by the rest of the Apple silicon Mac. External SSDs might theoretically be able to match the speed of the internal, but as they can only be connected via the Thunderbolt 3/4 bus they’re limited in practice to read and write speeds of around 3 GB/s, compared with typical internal SSD performance of 6-7 GB/s, and that’s with full encryption too.
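To put those transfer rates into perspective, here is a short Python calculation of how long an idealised sequential transfer would take at each speed. It assumes sustained rates of 3 GB/s for external and 6.5 GB/s for internal storage, taken from the figures above, and ignores caching, file system overhead and thermal throttling. The 100 GB size is an arbitrary example.

```python
def transfer_seconds(size_gb, rate_gb_per_s):
    """Idealised time to move size_gb at a sustained rate, in seconds."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_s

SIZE_GB = 100  # arbitrary example size

RATES = (
    ("external over Thunderbolt, about 3 GB/s", 3.0),
    ("internal SSD, about 6.5 GB/s", 6.5),
)

for label, rate in RATES:
    print(f"{label}: {transfer_seconds(SIZE_GB, rate):.1f} s")
```

Under those assumptions, the external transfer takes a little over twice as long as the internal one.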

Before Apple released Big Sur, it was widely thought that internal storage of T2 Macs included whole-disk encryption, whether or not FileVault was enabled. That encryption is performed in hardware, and so imposes no overhead. Since the introduction of the Signed System Volume in macOS 11, Apple has stated that its bootable snapshot isn’t encrypted, because it already enjoys the protection of a hash tree, making encryption unnecessary. However, the Data volume on internal storage is fully encrypted, with the FileVault option available to protect the volume encryption key with the user’s password. This means that enabling FileVault incurs no performance penalty at all.
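For those unfamiliar with hash trees, the general idea can be shown in a few lines of Python. This is a simplified sketch and not the layout used by the Signed System Volume: the choice of SHA-256, the pairing scheme, and the block contents are all my assumptions for illustration.

```python
import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Hash each block, then hash pairs of hashes upwards until one root remains."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [sha256(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical block contents standing in for parts of a system volume.
blocks = [b"kernel", b"frameworks", b"apps", b"fonts"]
trusted_root = merkle_root(blocks)

# Changing any single block changes the root, so the alteration is detected.
tampered = [b"kernel", b"frameworks", b"apps!", b"fonts"]
print(merkle_root(blocks) == trusted_root)
print(merkle_root(tampered) == trusted_root)
```

Because any alteration to the contents changes the root hash, only that one value has to be trusted, which is why integrity can be assured without encrypting the contents.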

Finally, unlike Intel Macs, Apple silicon models rely on their internal storage to be able to boot. This is because pre-boot ‘firmware’ is stored in hidden partitions or containers on the internal SSD. After starting the boot process from a small ROM, control passes first to the Low-Level Bootloader (LLB) before iBoot, which then hands over to the macOS kernel. Each stage in that process verifies the next, in Secure Boot, and no UEFI is involved at any stage. Bootable external disks can’t and don’t contain LLB or iBoot, for which the Mac has to rely on access to its internal storage. This ensures the integrity of the process, and its inability to proceed if internal storage is unable to support it. It does, though, enable full security when starting up from an external disk, so that option doesn’t require any compromise in security as it does in Intel Macs with a T2 chip.
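The principle of each stage verifying the next can be modelled as a simple chain in Python. This is a toy sketch under loud assumptions: a bare SHA-256 digest stands in for whatever verification Secure Boot really performs, the image contents are placeholders, and the function names are my own.

```python
import hashlib

def digest(image):
    return hashlib.sha256(image).hexdigest()

def build_chain(images):
    """images: list of (name, bytes) in boot order.

    Each stage records the digest it expects of the next stage.
    """
    chain = []
    for i, (name, image) in enumerate(images):
        expected = digest(images[i + 1][1]) if i + 1 < len(images) else None
        chain.append({"name": name, "image": image, "expects_next": expected})
    return chain

def boot(chain):
    """Run stages in order, stopping as soon as a stage fails to verify the next."""
    ran = []
    for i, stage in enumerate(chain):
        ran.append(stage["name"])
        if stage["expects_next"] is None:
            break  # last stage, nothing further to verify
        if digest(chain[i + 1]["image"]) != stage["expects_next"]:
            break  # verification failed, so the next stage never runs
    return ran

# Placeholder images for the stages named in the text.
stages = [("Boot ROM", b"rom"), ("LLB", b"llb"), ("iBoot", b"iboot"), ("kernel", b"kernel")]
chain = build_chain(stages)
print(boot(chain))

chain[2]["image"] = b"modified iboot"  # simulate a tampered iBoot
print(boot(chain))
```

In the second run, LLB's check of the altered iBoot fails, so the process halts before iBoot or the kernel can run.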

Concepts

  • CPU cores have their own L1 cache, with a large instruction cache to support out-of-order execution, and share L2 cache within a cluster.
  • Main memory is Unified, so used by CPU cores, GPU, AMX and others. This ensures that data don’t have to be copied from main to GPU memory, for example.
  • Estimating the amount of memory required doesn’t follow any simple rules, but it’s likely to be less than the sum of main and GPU memory in an Intel Mac, and not much less than its main memory alone.
  • Storage used as the internal SSD is different from regular SSD modules, as some of its controller functions are performed within the chip.
  • Internal SSDs are typically more than twice as fast as external storage, and better-matched to overall Apple silicon performance. Encryption for FileVault incurs no performance penalty at all.
  • Apple silicon Macs must start their boot process from the internal SSD, even when starting up from a system on an external disk. That ensures the integrity of Secure Boot.

Previously in this series

Apple silicon: 1 Cores, clusters and performance
Apple silicon: 2 Power and thermal glory
Apple silicon: 3 But does it save energy?
Apple silicon: 4 A little help from friends and co-processors
