Last Week on My Mac: Spotlight sorcery

According to scientific tradition, we first observe then experiment. If you proceed to the latter before you understand how a system behaves, then you’re likely to labour under misapprehensions and your trials can become tribulations. Only when a system is thoroughly opaque and mysterious can we risk attempting both together.

That’s the case for Spotlight, which despite its name does everything but shine any light on its mechanisms. It presents itself in several guises: as a combination of web and local search (🔍), as local search with a limited set of logical operators (Finder’s Find), as full-blown predicate-based local search (mdfind), as in-app file search (Core Spotlight), and as the coder’s NSMetadataQuery with predicates. It relies on indexes scattered across hundreds of binary files, and runs multiple processes, while writing next to nothing in the log.

Last week’s code-doodling has been devoted to turning the Spotlight features in Mints into a separate app, SpotTest, so I can extend them to allow testing of different volumes and to search for text derived from images. Those extensions are proving thorny because of Spotlight’s unpredictable behaviour across different Macs running Sequoia.

Every week I search for screenshots to illustrate another article on Mac history. When using my old iMac Pro where most of them are stored, Spotlight will find many images containing search terms from the text shown within them, even from ancient QuickDraw PICT images, demonstrating that text is being recovered using Live Text’s optical character recognition. When I try to repeat this using test images on an Apple silicon Mac, Spotlight seems unable to recognise any such recovered text.

Image analysis on Macs has a stormy history. In a well-intentioned gaffe four years ago, Apple shocked us when it announced that it intended to check our images for CSAM content. Although it eventually dropped that idea, there have been rumours ever since about our Macs secretly looking through our images and reporting back to Apple. It didn’t help that at the same time Apple announced Live Text as one of the new features of macOS Monterey, and brought further image analysis with Visual Look Up.

Although I looked at this in detail, it’s hard to prove a negative, and every so often I’m confronted by someone who remains convinced that Apple is monitoring the images on their Mac. I was thus dragged back to reconsider it in macOS Sonoma. What I didn’t consider at that time was how text derived from Live Text and image analysis found its way into Spotlight’s indexes, which forms part of my quest in SpotTest.

This doesn’t of course apply to images in PDF documents. When I looked at those, I concluded: “If you have PDF documents that have been assembled from scans or other images without undergoing any form of text recognition, then macOS currently can’t index any text that you may still be able to extract using Live Text. If you want to make the text content of a PDF document searchable, then you must ensure that it contains its own text content.” I reiterated that in a later overview.

My old images aren’t PDFs but QuickDraw PICTs, TIFFs, PNGs and JPEGs, many from more than 20 years ago. When the circumstances are right, macOS quietly runs Live Text over them and stores any text it recovers in Spotlight’s indexes. It also analyses each image for recognisable objects, and adds those too. These happen more slowly than regular content indexing by mdworker, some considerable time after the image has been created, and have nothing whatsoever to do with our viewing those images in QuickLook or the Finder, or even using Live Text or Visual Look Up ourselves.
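Spotlight doesn’t expose that background indexing path, but you can get an idea of what text is recoverable from a given image by running the Vision framework’s text recognition yourself. The sketch below is only for comparison, assuming a command-line context with the image path supplied as an argument; it isn’t how mdworker or Spotlight does it.

import Foundation
import Vision

// Sketch: recognise text in an image, roughly what Live Text recovers,
// to compare with what Spotlight has (or hasn't) indexed for that file.
// The image path is taken from the first command-line argument.
let path = CommandLine.arguments.dropFirst().first ?? "test.png"
let url = URL(fileURLWithPath: path)

let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    // Print the best candidate string for each text region found.
    for observation in observations {
        if let best = observation.topCandidates(1).first {
            print(best.string)
        }
    }
}
request.recognitionLevel = .accurate

let handler = VNImageRequestHandler(url: url, options: [:])
do {
    try handler.perform([request])
} catch {
    print("Text recognition failed: \(error)")
}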

There are deeper problems to come. Among them is discovering the results of image recognition, which can be revealed at the command line using a search such as
mdfind "(** == 'cattle*'cdw) && (kMDItemContentTypeTree == 'public.image'cd)"
to discover all images that have been recognised as containing cattle. There’s no equivalent of the first part of that when calling NSMetadataQuery from Swift code, and a predicate of
kMDItemTextContent CONTAINS[cd] \"cattle\"
will only discover text recovered by Live Text, not the names of objects recognised within an image.
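For the record, a minimal sketch of that query from Swift might look like the following, assuming a command-line context; the predicate mirrors the kMDItemTextContent search above, so it should return images containing text recovered by Live Text, but can’t reach the recognised object names that mdfind’s ** wildcard finds.

import Foundation

// Sketch: the NSMetadataQuery equivalent of the kMDItemTextContent predicate.
// This finds images whose indexed text (recovered by Live Text) contains the
// search term, but not images whose recognised objects match it, as there's
// no NSMetadataQuery counterpart to mdfind's ** wildcard key.
let query = NSMetadataQuery()
query.searchScopes = [NSMetadataQueryLocalComputerScope]
query.predicate = NSPredicate(
    format: "(kMDItemTextContent CONTAINS[cd] %@) AND (kMDItemContentTypeTree == %@)",
    "cattle", "public.image")

var observer: NSObjectProtocol?
observer = NotificationCenter.default.addObserver(
    forName: .NSMetadataQueryDidFinishGathering,
    object: query, queue: .main) { _ in
    query.disableUpdates()
    for case let item as NSMetadataItem in query.results {
        if let path = item.value(forAttribute: NSMetadataItemPathKey) as? String {
            print(path)
        }
    }
    query.stop()
    if let observer = observer {
        NotificationCenter.default.removeObserver(observer)
    }
    exit(0)
}

query.start()
RunLoop.main.run()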

What started as a quick doodle is now bogged down in the quirks of Spotlight, which defies the scientific method. Perhaps it’s time for a little sorcery.

Frederick Sandys (1829–1904), Medea (1866–68), oil on wood panel with gilded background, 61.2 x 45.6 cm, Birmingham Museum and Art Gallery, Birmingham, England. Wikimedia Commons.