Apple has been investing heavily in AI and machine learning. Every one of its M1 series chips, like the A11 and later chips in iPhones and iPads, features an Apple Neural Engine to accelerate those workloads, yet it’s comparatively unusual to be able to watch it at work. Visual Look Up, introduced in Monterey and iOS/iPadOS 15, isn’t shy about recording what it’s up to in the log, demonstrating whether that investment is paying off. Over the last couple of weeks I’ve been exploring how this works, and here I consider what happens when it doesn’t.
Visual Look Up (VLU) consists of two main phases. In the first, the image is analysed and classified, objects within it are identified, and neural or perceptual hashes are computed by neural networks. In the second phase, those hashes are used to search Apple’s servers for matches, either of the whole image as a work of art, or of objects identified within it, such as paintings (within a painting), nature (flowers and plants), landmarks, or pets. Identifications and linked information about what has been recognised are then displayed.
These two phases are distinct; the first is performed entirely locally, and doesn’t require data to be uploaded to any server. Only if the user chooses to perform the Look Up does any information leave that system. In any case, as NeuralHashes are computed locally, the image isn’t uploaded to Apple, as would be required by Google Images Search.
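To make the idea of a locally computed image fingerprint concrete, here’s a minimal sketch of an “average hash”, one of the simplest perceptual hashes. This is purely illustrative: Apple’s NeuralHash is a trained neural network and works quite differently in detail, but the principle of reducing an image to a compact hash that can be matched without uploading the image itself is the same.

```python
# Illustrative sketch only: a minimal "average hash". Apple's NeuralHash
# is a neural network and works quite differently, but the principle of
# a compact, locally computed fingerprint is the same.

def average_hash(pixels):
    """Hash a 2-D list of greyscale values (0-255): one bit per pixel,
    set when that pixel is brighter than the image's mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Count differing bits between two hashes: a small distance means
    the images are perceptually similar."""
    return bin(a ^ b).count("1")

# A 4x4 test pattern, and a uniformly brightened copy of it.
img = [[10, 200, 10, 200],
       [200, 10, 200, 10],
       [10, 200, 10, 200],
       [200, 10, 200, 10]]
brighter = [[min(p + 20, 255) for p in row] for row in img]

# Brightening shifts every pixel and the mean together, so the hash,
# and hence any match against it, is unchanged.
print(hamming(average_hash(img), average_hash(brighter)))  # prints 0
```

Only hashes like these need leave the Mac for the search phase, which is why the image itself never has to be uploaded.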
There are four possible outcomes to an attempt to perform VLU on an image:
- MAD Visual Search (MVS) returns one or more correctly recognised objects;
- MVS returns one or more incorrectly recognised objects;
- MVS returns no recognised object;
- Look Up isn’t offered at all.
VLU is unable to distinguish between the first two, as the user is never invited to provide feedback as to whether objects are correctly recognised, and in some cases may not know the correct answer anyway. In this sense, VLU doesn’t learn in the slightest. Apple should be able to obtain information from its servers as to what proportion of VLU searches fail to recognise objects, although it wouldn’t know whether they could have succeeded.
The most opaque outcomes, for both Apple and the user, are those in which analysis is performed but fails to reach the state where visual search could follow: VLU then isn’t offered for an image when it appears that it should be. The user sees this last outcome differently according to whether it occurs in a browser window, or in the Preview or Photos apps. In a browser, when the user summons the contextual menu, the Look Up command is never added to it. In Preview or Photos, the Info tool never gains its ornament at the upper left, and the open image is never awarded the white spot which could trigger MVS on demand.
That last outcome, the image which can’t be looked up at all, is unusual, but appears common for certain types of image. For example, the following painting by the great Swedish natural history painter Bruno Liljefors consistently fails, although your mileage might differ.
It’s sharp, contains ample detail, and is instantly recognisable to the human eye. Some other paintings by Liljefors also fail, as seen in this selection.
VLU is impressively tolerant of manipulation or damage to images of paintings, though. I have explored this using the better-known example of Leonardo da Vinci’s Mona Lisa. For example, VLU correctly recognises this intentionally blurred version.
But just a little more loss of detail and the following image fails to offer Look Up.
I have added those three images to my VLU test page so you can use them in your own testing.
I suspect the problem with some of Liljefors’ paintings is that blades of grass and other visual clutter distract from objects which aren’t easy to resolve. Interestingly, Google Images Search struggles with the blurred version of the Mona Lisa which VLU is able to identify, but not with the other blurred version, which VLU couldn’t resolve. Clearly Google’s and Apple’s techniques work rather differently.
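That tolerance, and its limits, can be modelled in miniature with the same kind of simple average hash sketched earlier (again, a stand-in for Apple’s NeuralHash, which is a learned model and behaves differently in detail): mild blurring preserves the coarse structure the hash encodes, while total loss of detail destroys it.

```python
# Toy model of blur tolerance in perceptual hashing, using a simple
# average hash as a stand-in for Apple's NeuralHash.

def average_hash(pixels):
    """One bit per pixel, set where the pixel exceeds the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Count differing bits between two hashes."""
    return bin(a ^ b).count("1")

def box_blur(pixels):
    """3x3 mean filter, clamping coordinates at the image edges."""
    h, w = len(pixels), len(pixels[0])
    return [[sum(pixels[max(0, min(h - 1, y + dy))][max(0, min(w - 1, x + dx))]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) // 9
             for x in range(w)] for y in range(h)]

# An 8x8 image with one vertical edge: bright on the left, dark on the right.
img = [[255 if x < 4 else 0 for x in range(8)] for y in range(8)]
h = average_hash(img)

# Mild blur softens the edge but leaves every bit of the hash intact.
print(hamming(h, average_hash(box_blur(img))))      # prints 0

# Replace every pixel with the global mean: all detail is gone, and every
# set bit of the hash is lost with it.
flat_mean = sum(p for row in img for p in row) // len(img) ** 2
featureless = [[flat_mean] * 8 for _ in range(8)]
print(hamming(h, average_hash(featureless)))        # prints 32
```

A hash built on coarse structure shrugs off changes that preserve that structure, but once the detail it depends on is gone, there’s nothing left to match, which is consistent with VLU tolerating one blurred Mona Lisa yet failing on a slightly more degraded one.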
Examining the log for these Look Up failures isn’t particularly revealing, as no entries there report an error or failure. MRC and MAD parsing are reported as completing, and the completion handler claims that it “Has Analysis” with “Error: (null)”. However, the sequence stops abruptly after that, before the usual entries of

```
Setting Active Interaction Types: <private> for baseView: <private>
Setting Analysis: <private> with requestID: 1 on interaction: <private>
```
This suggests that, for these pathological images, computation of NeuralHashes fails, and without those to search with, VisionKit doesn’t add Look Up to the contextual menu (Safari), or the Info tool (Preview, Photos).
In real-world use, such cases appear unusual and marginal. However, if the technique used in VLU were applied to other image searching, as Apple has proposed for CSAM detection, these failures could become more significant. Like security vulnerabilities, image characteristics which cause classification and search to fail altogether would be exploitable, setting up a cat-and-mouse pursuit similar to that which currently plagues malware defences, such as the detection signatures used by XProtect.
What we have for Visual Look Up in Monterey is version one. It’s very quick, and generally highly effective and robust to image manipulation. I have no doubt that it will improve in future updates, and that fewer images will fail so completely as these do at present. It remains to be seen whether future versions will be as exploitable.