You can disable Live Text in Language & Region settings, but what exactly does that do? Does it also block Visual Look Up, or connections to Apple?
VisionKit
How text within images is recognised as Live Text, and objects of interest are classified, using VisionKit, mediaanalysisd, Espresso, the ANE and more.
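For an app that wants the same text recognition Live Text performs, the public route is the Vision framework rather than mediaanalysisd itself. Here's a minimal sketch using VNRecognizeTextRequest; the image path is hypothetical and error handling is kept to the bare minimum.

```swift
import AppKit
import Vision

// Sketch: recognise text in an image using the public Vision API.
// The file path is hypothetical.
let url = URL(fileURLWithPath: "/Users/me/Pictures/sample.jpg")
guard let nsImage = NSImage(contentsOf: url),
      let cgImage = nsImage.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
    fatalError("Couldn't load the image")
}

let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        // Take the top candidate string for each detected text region.
        if let candidate = observation.topCandidates(1).first {
            print("\(candidate.string)  (confidence \(candidate.confidence))")
        }
    }
}
request.recognitionLevel = .accurate      // the slower, neural-network path
request.usesLanguageCorrection = true

let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
try handler.perform([request])
```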
Does Sonoma analyse and classify your images? What content does it obtain from them? How to discover what search terms it recognises? And could this be used to detect CSAM?
For some images, Visual Look Up fails so completely that it’s not even offered. Could this be exploited as a way of blocking image recognition?
Visual Look Up also recognises flowers, landmarks and pets, as well as well-known paintings. Here’s how it does those, and how Live Text is different.
The first phase analyses the image, detecting and classifying any objects of interest within it. When the user clicks on the white dot, the second phase completes the process with a search for the best match.
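As a rough sketch of how those two phases appear through VisionKit's public API (macOS 13 or later): ImageAnalyzer performs the local analysis pass, and ImageAnalysisOverlayView is what offers the white dot and completes the search when it's clicked. The image view and function names below are assumed for illustration, not taken from any of these articles.

```swift
import AppKit
import VisionKit

// Sketch of the two phases of Visual Look Up via VisionKit (macOS 13+).
@MainActor
func analyse(_ nsImage: NSImage, over imageView: NSImageView) async throws {
    let analyzer = ImageAnalyzer()
    let configuration = ImageAnalyzer.Configuration([.text, .visualLookUp])

    // Phase 1: local analysis and classification; no search is performed yet.
    let analysis = try await analyzer.analyze(nsImage, orientation: .up,
                                              configuration: configuration)

    // The white dot is only offered if phase 1 found a candidate object.
    guard analysis.hasResults(for: .visualLookUp) else { return }

    // Phase 2: the overlay view shows the white dot; clicking it runs the search.
    let overlay = ImageAnalysisOverlayView()
    overlay.analysis = analysis
    overlay.preferredInteractionTypes = .visualLookUp
    overlay.frame = imageView.bounds
    overlay.autoresizingMask = [.width, .height]
    imageView.addSubview(overlay)
}
```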
Not only does this version of Mints extract information from the log detailing what happens during Visual Look Up, but it also includes its own browser window in which to perform the look up.
Remember Apple’s failed attempt to detect CSAM in images? Would that have been similar to the way that Visual Look Up works? Is this the thin end of the wedge?
A promising start for a new feature which could, with a little improvement, become a uniquely powerful tool.
