Last Week on My Mac: A Christmas Core Carol

The ideas behind symmetric multi-processing (SMP) are simple, and adopt an approach which goes back to Victorian times.

Scrooge’s business has one clerk who does the accounting, correspondence and runs errands for him. As the firm grows in size, there’s too much work for one, so Scrooge reluctantly hires a second clerk, who does the same tasks. Eventually, flushed with success, the company ends up with a large room full of clerks, all doing the same book-keeping, correspondence and running errands, but Scrooge is unimpressed, because thirty clerks can’t do thirty times as much as one.

When Bob Cratchit goes to ask Scrooge for Christmas Day off, he suggests to the old miser that he’d be better off hiring a couple of energetic young boys to run his errands, and giving the accounting tasks to those who are best at arithmetic. Predictably, Scrooge has none of that, and angrily tells Cratchit he has work for him to do on Christmas Day as well.

Apple had a similar choice as it was developing chips for its mobile devices. Early on it decided to evolve asymmetric multi-processing (AMP) chips for iPhones and iPads, each containing a combination of cores designed for performance (P) or efficiency (E). Its first quad-core chip, the A10 Fusion, launched in 2016, came with two Hurricane (P) and two Zephyr (E) cores, although in that early design only one core type could be active at any time. A year later, in the A11 Bionic, Apple stretched to two Monsoon (P) and four Mistral (E) cores which ran concurrently. Significantly, it was the A10 which went on to become the T2 chip, the first AMP chip used in Macs.

Just over a year ago, Apple launched Macs and devices with two combinations of Firestorm (P) and Icestorm (E) cores. Its A14 Bionic chips, with 2P+4E, were destined for iPads and iPhones, while the 4P+4E combination became the M1 as used by the first generation of Apple Silicon Macs, and iPad Pros.

This year saw three new chips, and two different combinations of cores. While the M1 Pro and Max are variants of the original M1 with their CPU cores arranged as 8P+2E, new iPhone and iPad models feature the A15 Bionic, with 2P+4E using Avalanche (P) and Blizzard (E) cores.

In just five years, Apple has come up with a succession of AMP designs with total core counts now amounting to ten, and there’s speculation that the current M1 Max chip is designed to be linked into pairs, perhaps offering as many as 16 P and 4 E cores for future Apple Silicon Macs. There’s also a third type of core which has been identified in the M1, which isn’t a CPU as such, but runs many of the chip’s specialist interfaces: the Chinook, which may have its origins in cores in the A12 Bionic. While we know of at least a dozen, there are probably more than twice that number embedded in M1 series chips. Add to those GPUs, Neural Engines, and the AMX2 matrix math processor, and you’ll see how committed to AMP Apple has become over the last five years.

As Maynard Handley has discovered from trawling through Apple’s many patents, this ingenuity in hardware design must be matched in software to manage system and user processes. If AMP is to succeed, the right processes must end up on the most appropriate cores, which demands a great deal more than just balancing their loads.

One of the basic mechanisms used for this is the request for a Quality of Service (QoS) for each process when it’s run. The software developer can opt for a low value, which confines that process to the E cores, or one of three higher values, which enables it to be scheduled for either type. Differences between the three higher values currently seem elusive, but may become more obvious as Apple’s AMP chips develop. Smart scheduling makes or breaks AMP, determining performance, responsiveness to changing demand, and energy use.
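The rule described above can be sketched as a toy model. This is only an illustration in Python, not Apple’s scheduler: the function names, the dictionary of idle cores, and the preference for a P core at higher QoS are all assumptions made for the example. The four level names follow Apple’s public QoS classes, and the numbers are the raw values macOS uses for them.

```python
from enum import IntEnum


class QoS(IntEnum):
    """Four QoS levels, numbered with the raw values macOS uses for them."""
    BACKGROUND = 9          # the low value: confined to E cores
    UTILITY = 17
    USER_INITIATED = 25
    USER_INTERACTIVE = 33


def eligible_core_types(qos):
    """Return the core types a process at this QoS may be scheduled on."""
    if qos == QoS.BACKGROUND:
        return frozenset({"E"})
    return frozenset({"P", "E"})


def place(qos, idle):
    """Pick a core type for a new process, or None if it has to queue.

    `idle` maps a core type ("P" or "E") to its number of idle cores.
    Assumption for this sketch: higher QoS prefers a P core, and
    falls back to an E core only when no P core is idle.
    """
    allowed = eligible_core_types(qos)
    for core_type in ("P", "E"):
        if core_type in allowed and idle.get(core_type, 0) > 0:
            return core_type
    return None


if __name__ == "__main__":
    idle = {"P": 4, "E": 0}   # E cores all busy, four P cores idle
    print(place(QoS.BACKGROUND, idle))
    print(place(QoS.USER_INITIATED, idle))
```

With every E core busy and four P cores idle, the background process in this model is left to queue (`None`) rather than being moved to a P core, while the user-initiated one lands on a P core straight away.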

Apple’s patents covering this and other details of its approach to AMP date from the appearance of the A10 Fusion, and no doubt the techniques used in recent iPhones, iPads and M1 series Macs have advanced considerably over subsequent years of experience. Discovering how this now works, and its implications for those developing apps to run on Apple Silicon systems, is a new challenge. This is growing ever more complex, and even more important, with the two different M1 chips now shipping in five different models. Next year, as Apple continues replacing existing Intel Macs, we can expect to see more combinations of P and E cores, and refinements of core scheduling to make the best use of them.

After many years of sticking to Scrooge’s approach, Bob Cratchit’s time has come at last.

8 Comments


1

Duncan on December 5, 2021 at 10:55 am

That’s a great analogy with Scrooge!

Going back to pre-OSX/Apple days, Steve Jobs’ NeXT machines, running on a single 68040 processor, needed a performance boost beyond Motorola’s then-current designs. NeXT wanted to stay competitive with Sun’s SPARC machines and other workstations of the day. There were numerous rumors of what might come next, including a switch to RISC architecture and the tantalizing prospects of a then-exotic (for desktop-class machines) multi-CPU design. One strong speculation was a followup to the NeXT Cube with the NeXT ‘Brick’ (escaping from Jobs’ initial and costly geometric design constraints) which would have two RISC CPUs, dedicated to two different areas of responsibility.

One CPU would continue with the main task of number-crunching and other algorithms that would proceed within the limits of the CPU itself. But Jobs presciently understood the importance of usability even when the machine got loaded, and the rumors held that the second CPU would be foremost dedicated to running the ever-more-sophisticated User Interface, even if the first CPU got bogged down. Anyone who used early Sun or HP workstations in those days well remembers the jittering mouse pointer and window tearing that was the first sign that their operating systems were straining under load. Jobs wanted none of that and pushed his engineers to prioritize the UI in all their developments.

Alas, NeXT as a company failed to advance beyond their 68040 ‘Turbo’ machine and instead turned to Intel and their infamous ‘beige box’ menagerie of vendors as a host architecture for the now unshackled NeXTSTEP operating system. Multiprocessing for NeXT/Apple would revert to its conventional SMP roots for another decade until the iPhone offered an entirely new platform to spring from.

- 2
  
  hoakley on December 5, 2021 at 10:14 pm
  
  Thank you.
  Howard.
  
3

artiste212 on December 5, 2021 at 5:40 pm

What a delightful way of presenting this topic. I’m in hospital recovering from surgery and not feeling much like reading serious material, but I started reading this and couldn’t stop until the end. Thanks for this, Howard

- 4
  
  hoakley on December 5, 2021 at 10:16 pm
  
  Thank you.
  I’m so sorry to hear that you’re in hospital: I wish you a rapid recovery and look forward to your return to full speed!
  Howard.
  
5

ericrfmwp on December 5, 2021 at 7:15 pm

FANTASTIC analogy! I love it!

- 6
  
  hoakley on December 5, 2021 at 10:17 pm
  
  Thank you.
  Howard.
  
7

Tim on December 5, 2021 at 10:11 pm

SMP/AMP traditionally refers to capabilities, not performance, e.g., in AMP, one CPU is blessed as the only one which can enter the kernel, or the only one connected to I/O. As far as I know, XNU is still SMP.

What Apple is doing here is more typically called “heterogeneous computing”.

To extend the Dickens analogy, HC is acknowledging that some accountants are faster (and perhaps also more expensive) than others. AMP is realizing that you’ve only got one CPA on staff who is qualified to oversee a crew of accountants. Your apprentices can add numbers all day long (and some may be quite good at it), but only the CPA can legally stamp a final document and submit it to the government.

- 8
  
  hoakley on December 5, 2021 at 10:39 pm
  
  Thank you.
  I think it’s a complex mixture, depending on how you look at it.
  Some of the cores, the Chinooks, are very much AMP in that traditional sense. They don’t run any of macOS at all, only their ‘firmware’, and are connected to specific I/O devices, which they manage.
  The P and E cores are also managed quite differently. For example, macOS won’t run processes with the minimum QoS on a P core – I’ve never seen that happen in all the tests that I’ve been doing. Even if the P cores are doing next to nothing and the E cores are maxed out, if given a low QoS, the processes will be queued for the E cores.
  Watch how the many services in macOS are run on the cores, and you’ll see almost all the routine work is performed on the E cores, with the P cores being primarily used to run the human interface and user processes.
  To quote the Wikipedia definition of AMP, “not all of the multiple interconnected central processing units (CPUs) are treated equally”.
  Howard.
  
