Last Week on My Mac: A Christmas Core Carol

The ideas behind symmetric multi-processing (SMP) are simplistic and adopt an approach which goes back to Victorian times.

Scrooge’s business has one clerk who does the accounting and correspondence, and runs errands for him. As the firm grows in size, there’s too much work for one, so Scrooge reluctantly hires a second clerk, who does the same tasks. Eventually, flushed with success, the company ends up with a large room full of clerks, all doing the same book-keeping, correspondence and errands, but Scrooge is unimpressed, because thirty clerks can’t do thirty times as much as one.

When Bob Cratchit goes to ask Scrooge for Christmas Day off, he suggests to the old miser that he’d be better off hiring a couple of energetic young boys to run his errands, and giving the accounting tasks to those who are best at arithmetic. Predictably, Scrooge has none of that, and angrily tells Cratchit he has work for him to do on Christmas Day as well.

Apple had a similar choice as it was developing chips for its mobile devices. Early on it decided to evolve asymmetric multi-processing (AMP) chips for iPhones and iPads, each containing a combination of cores designed for performance (P) or efficiency (E). Its first quad-core chip, the A10 Fusion, launched in 2016, came with two Hurricane (P) and two Zephyr (E) cores, although in that early design only one core type could be active at any time. A year later, in the A11 Bionic, Apple stretched to two Monsoon (P) and four Mistral (E) cores which ran concurrently. Significantly, it was the A10 Fusion which went on to become the basis of the T2 chip, the first AMP chip used in Macs.

Just over a year ago, Apple launched Macs and devices with two combinations of Firestorm (P) and Icestorm (E) cores. Its A14 Bionic chips, with 2P+4E, were destined for iPads and iPhones, while the 4P+4E combination became the M1 as used by the first generation of Apple Silicon Macs, and iPad Pros.

This year saw three new chips, and two different combinations of cores. While the M1 Pro and Max CPU cores are variants of the original M1 with 8P+2E, new iPhone and iPad models feature the A15 Bionic, with 2P+4E using Avalanche (P) and Blizzard (E) cores.

In just five years, Apple has come up with a succession of AMP designs with total core counts now amounting to ten, and there’s speculation that the current M1 Max chip is designed to be linked into pairs, perhaps offering as many as 16 P and 4 E cores for future Apple Silicon Macs. There’s also a third type of core which has been identified in the M1, which isn’t a CPU as such, but runs many of the chip’s specialist interfaces: the Chinook, which may have its origins in cores in the A12 Bionic. While we know of at least a dozen, there are probably more than twice that number embedded in M1 series chips. Add to those GPUs, Neural Engines, and the AMX2 matrix math processor, and you’ll see how committed to AMP Apple has become over those five years.

As Maynard Handley has discovered from trawling through Apple’s many patents, this ingenuity in hardware design must be matched in software to manage system and user processes. If AMP is to succeed, the right processes must end up on the most appropriate cores, which demands a great deal more than just balancing their loads.

One of the basic mechanisms used for this is the request for a Quality of Service (QoS) for each process when it’s run. The software developer can opt for a low value, which confines that process to the E cores, or one of three higher values, which enables it to be scheduled for either type. Differences between the three higher values currently seem elusive, but may become more obvious as Apple’s AMP chips develop. Smart scheduling makes or breaks AMP, determining performance, responsiveness to changing demand, and energy use.
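To make that rule concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not Apple's scheduler: the four level names are borrowed from the QoS classes in Apple's Dispatch API, while `eligible_cores`, `pick_core` and the "try P cores first" preference are hypothetical, invented purely for illustration.

```python
from enum import Enum


class CoreType(Enum):
    P = "performance"
    E = "efficiency"


class QoS(Enum):
    # One low value and three higher values, as described in the text.
    BACKGROUND = 0
    UTILITY = 1
    USER_INITIATED = 2
    USER_INTERACTIVE = 3


def eligible_cores(qos):
    """Return the set of core types a process with this QoS may run on.

    The lowest QoS is confined to E cores; the three higher values
    may be scheduled on either type.
    """
    if qos is QoS.BACKGROUND:
        return {CoreType.E}
    return {CoreType.P, CoreType.E}


def pick_core(qos, free):
    """Choose a core type given `free`, a dict of idle core counts.

    ASSUMPTION: eligible processes try P cores before E cores. The
    article does not say how the real scheduler orders its choices.
    Returns None when no eligible core is idle and the process must wait.
    """
    allowed = eligible_cores(qos)
    for core in (CoreType.P, CoreType.E):
        if core in allowed and free.get(core, 0) > 0:
            return core
    return None


if __name__ == "__main__":
    # An M1-like 4P+4E chip with all E cores busy.
    free = {CoreType.P: 4, CoreType.E: 0}
    for qos in QoS:
        print(qos.name, "->", pick_core(qos, free))
```

Even in this crude form, the asymmetry shows: with every E core busy, a process at the lowest QoS has to wait although P cores sit idle, while processes at the three higher values can run straight away. In this model those three are indistinguishable, which matches how elusive the differences between them currently seem.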

Apple’s patents covering this and other details of its approach to AMP date from the appearance of the A10 Fusion, and no doubt the techniques used in recent iPhones, iPads and M1 series Macs have advanced considerably over subsequent years of experience. Discovering how this now works, and its implications for those developing apps to run on Apple Silicon systems, is a new challenge. This is growing ever more complex, and even more important, with the three different M1 chips (M1, M1 Pro and M1 Max) now shipping across several different models. Next year, as Apple continues replacing existing Intel Macs, we can expect to see more combinations of P and E cores, and refinements of core scheduling to make the best use of them.

After many years of sticking to Scrooge’s approach, Bob Cratchit’s time has come at last.