Live Text, Visual Look Up, Face Recognition, ML and Privacy

The hardware in my iPhone XR and my M1 Macs is vastly superior to that in my iMac Pro in one respect: since the A12 in 2018, Apple’s own chips have incorporated a Neural Engine.* This article considers what difference that makes, and how it affects our privacy.

The Apple Neural Engine (ANE) is by no means unique. Intel even offers its Neural Compute Sticks, but no Intel Mac has neural computation cores. This means that when an Intel Mac performs any feature which uses Machine Learning (ML), it’s not just slower than a recent iPhone: that code has to run largely on its main CPU, possibly with assistance from the GPU. Do the same on an M1 Mac and the hard work is performed on its ANE and, in part, its GPU, which are far quicker and don’t steal cycles from other code running on the general-purpose cores.
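
To make that difference concrete, here’s a minimal sketch of how an app might ask Core ML where to run a model. It isn’t Apple’s code, and the compiled model ImageClassifier.mlmodelc is hypothetical, but the compute-unit setting is real: leaving it at .all frees Core ML to use the ANE and GPU, while .cpuOnly approximates what an Intel Mac without neural cores is left to do.

import CoreML

// A sketch, not Apple's own code: Core ML lets an app declare which compute
// units a model may use. "ImageClassifier.mlmodelc" is a hypothetical compiled
// model bundled with the app.
func loadModel(cpuOnly: Bool) throws -> MLModel {
    guard let url = Bundle.main.url(forResource: "ImageClassifier",
                                    withExtension: "mlmodelc") else {
        throw CocoaError(.fileNoSuchFile)
    }
    let config = MLModelConfiguration()
    // .all lets Core ML schedule work on the ANE, GPU or CPU as it sees fit;
    // .cpuOnly confines it to the general-purpose cores.
    config.computeUnits = cpuOnly ? .cpuOnly : .all
    return try MLModel(contentsOf: url, configuration: config)
}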

It’s believed that the ANE also benefits from the Unified Memory model of M1 series chips. Other neural accelerators normally have their own, limited memory, and incur the overhead of moving data to and from it. It’s likely that the ANE and GPU can be allocated shared memory, which would improve performance when running code reliant on both.

Everyday examples of features reliant on ML include speech recognition, face recognition, and most recently Live Text and Visual Look Up, both of which are new with macOS Monterey. While some of those may currently appear to be toys, they’re steadily becoming more important for many Mac users.

Apple started work in earnest on face detection and recognition eight years ago, when the techniques it uses today were relatively novel. At the outset, its engineers decided that these tasks would have to be performed entirely locally, rather than by uploading images for remote analysis. That meant implementing computationally intensive methods on low-power mobile devices with limited memory and storage. The only practical solution was to design dedicated hardware: the 8-core Neural Engine incorporated into the A12 Bionic chip, the first ANE to support Core ML.

The basic functions required by ML are now implemented in Apple’s Accelerate framework, including its BNNS routines, and in Metal Performance Shaders. Core ML is layered on top of those, providing higher-level support for ML models and running them on the ANE and GPU where it can. Above that are domain-specific frameworks for vision, natural language, speech and sound analysis. It’s those which are used by the features the user sees, including Live Text and Visual Look Up.
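
As an illustration of that top, domain-specific layer, here’s a minimal sketch of calling the Vision framework to recognise text in an image, much as Live Text does. It’s an assumed example built on the public API, not Apple’s implementation, and it relies on the typed results introduced with the macOS 12 SDK.

import Vision

// A sketch of the Vision layer at work: text recognition of the kind behind
// Live Text, requested through VNRecognizeTextRequest. Core ML decides where
// the underlying networks run; on an M1 Mac that's largely the ANE.
func recognisedText(in imageURL: URL) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // the slower, neural-network recogniser
    let handler = VNImageRequestHandler(url: imageURL, options: [:])
    try handler.perform([request])
    // Each observation offers ranked candidate strings; take the best of each.
    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}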

ML in iOS and macOS is low-profile, and intended to be as transparent to the user as possible. In the time it takes to display the contextual menu on an image in Safari, mediaanalysisd may have loaded half a dozen neural networks and computed several neural hashes. On an M1 Mac, that will have taken less than half the time it takes on an Intel Mac, and without the CPU load incurred there. Apple reports that running face recognition end-to-end on a recent ANE takes less than 4 ms, eight times faster than running it on the GPU.
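
Face detection, the first stage of the pipeline Apple describes in the articles referenced below, is also exposed through Vision. This sketch is an assumed example rather than mediaanalysisd’s code; identifying whose face it is isn’t a public API, but locating faces is, and it all runs locally.

import Vision

// A sketch of on-device face detection through Vision. This only finds where
// faces are; recognising the person is not exposed as a public API.
func faceCount(in imageURL: URL) throws -> Int {
    let request = VNDetectFaceRectanglesRequest()
    let handler = VNImageRequestHandler(url: imageURL, options: [:])
    try handler.perform([request])
    return request.results?.count ?? 0   // number of faces found, computed locally
}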

No images or other data have been uploaded to servers, as all neural computation is performed entirely locally. Only when you decide to Look Up that image does macOS connect to a server, and what it uploads consists of neural (or perceptual) hashes; while those characterise an image, the original can’t be reconstructed from them. Face recognition and Live Text are even more private, as no data is uploaded to any servers at all.
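
The neural hashes themselves aren’t exposed to developers, but Vision’s feature prints offer a rough public analogue: a compact descriptor which characterises an image and can be compared for similarity, yet can’t be turned back into the picture. A minimal sketch, under that assumption:

import Vision

// A sketch using Vision's feature prints, a rough public analogue of a neural
// (perceptual) hash: they describe an image compactly but aren't reversible.
func featurePrint(for imageURL: URL) throws -> VNFeaturePrintObservation? {
    let request = VNGenerateImageFeaturePrintRequest()
    let handler = VNImageRequestHandler(url: imageURL, options: [:])
    try handler.perform([request])
    return request.results?.first
}

// Smaller distances mean more similar images.
func distance(between a: VNFeaturePrintObservation,
              and b: VNFeaturePrintObservation) throws -> Float {
    var d: Float = 0
    try a.computeDistance(&d, to: b)
    return d
}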

The downside of this transparency is that ML is extremely hard to observe and monitor. While you can watch the activity of each of the CPU cores and the GPU, there are no equivalent tools for the ANE. We’re fortunate that Live Text and Visual Look Up are still new, and make copious entries in the Unified log. Even so, the information provided is limited, for example to
Creating context 5060079424 eng=10007 dev=-1
Creating plan 5067031040
Creating plan 5067031040
espresso_plan_add_network plan=5067031040 path=<private> cp=65552
Loaded network: '<private>' pf=10007 cp=1

from com.apple.espresso.

Other uses of the ANE, for example by third-party apps, may leave no trace in the log, and there’s nothing to be seen in Activity Monitor. Even the expert’s tool powermetrics provides only ANE power use and four bandwidth counters.

The Apple Neural Engine provides the high-performance machine learning that delivers new features like Live Text and Visual Look Up without your data leaving the privacy of your M1 series Mac. It remains one of the biggest secrets in Apple Silicon.

* A11 chips also include a Neural Engine, but with only two cores it’s much simpler and doesn’t support Core ML.

References

Core ML introduction
Core ML and Vision Tutorial: On-device training on iOS, by Audrey Tam and Christine Abernathy
An On-device Deep Neural Network for Face Detection, 2017 article giving Apple’s early history and concepts
Recognizing People in Photos Through Private On-Device Machine Learning, 2021 update on Apple’s progress
BNNS routines documented.