hoakley June 28, 2026 Macs, Technology

Last Week on My Mac: Spotlight on semantics

You may have noticed one phrase that was repeated throughout much of WWDC earlier this month: semantic search. Although it had appeared occasionally in the past, this year it came up in more than a dozen presentations, starting in the Platforms State of the Union on day 1. Just what is changing in Spotlight that is semantic?

In traditional search of text content, Spotlight discovers in its content indexes each file containing the search term you have provided. When you search for the term cow, it should return only those files containing those exact characters. This is inevitably a bit more complex, as we normally want search to be case-insensitive, and there are other rules we might want to apply, such as whether that should return words like cower where the term is a prefix, or the place-name Cowleaze, where it’s also a capitalised location name. Those are normally determined by a set of language-specific rules for the Unicode collation applied.
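
To make the difference between those matching rules concrete, here is a minimal sketch in Python. It isn't how Spotlight works internally: the function names are my own, the tokeniser is crude, and Unicode normalisation with case folding only stands in for proper language-specific collation.

```python
import unicodedata


def normalise(text):
    # NFKC folds compatibility forms, and casefold() gives caseless matching.
    # This is an assumption standing in for real language-specific collation.
    return unicodedata.normalize("NFKC", text).casefold()


def words(text):
    # Crude tokeniser: split on whitespace and strip surrounding punctuation.
    return [w.strip(".,;:!?\"'()") for w in text.split()]


def match(term, text, mode="exact"):
    """Return True if text contains term as a whole word, or as a word prefix."""
    term = normalise(term)
    tokens = [normalise(w) for w in words(text)]
    if mode == "exact":
        return term in tokens
    if mode == "prefix":
        return any(t.startswith(term) for t in tokens)
    raise ValueError("mode must be 'exact' or 'prefix'")
```

With whole-word matching, cow finds "The Cow stood still." but not "They cower near Cowleaze"; switch to prefix matching and the second sentence becomes a hit too.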

Where there are many hits, as occurs when searching the internet, search ranking can be used to order the results, for example by putting the most frequently visited websites that contain the term first, or by applying a more complex ranking algorithm. But that is of limited use when searching local files.

Semantic search is different, in that its matches aren’t as crisp and Boolean. Rather than working like a simple index, it’s more like a thesaurus in effect. This associates the word cow with a meaning, such as a mature female ox of the species Bos taurus, then looks up related concepts. Some will be close matches, like cattle, bovid, or ungulate, others might be related terms like heifer, an immature cow, and other terms with similar or related meaning.

Semantics is heavily dependent on context. If you’re a farmer, you won’t be interested in the females of other species also known as cows, such as elephants and rhinos, which a zoologist would want to include. A more general audience might want to include its slang use for a disagreeable woman. There are also regional variations: in US English, cow commonly refers to both sexes and all ages of oxen, while in Australian and New Zealand English it can extend to almost anything that’s deemed objectionable.

In the days before AI, this type of search was often referred to as fuzzy, compared to the crisp black-and-white of regular search, as it returns not only hits that contain the specified term, but also those in a grey zone of related terms.

One way to envisage this is to represent concepts, encapsulated as tokens, in multi-dimensional space. Each concept can be located by its coordinates, and by calculating the distance between any two concepts you can express how closely related they are. Semantic search thus tries to discover files and other items of similar and related concepts.
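
As an illustration of that idea, this short Python sketch places four concepts in a three-dimensional space and ranks them by straight-line distance. The coordinates are invented for the example, real systems use far more dimensions, and they often prefer other measures such as cosine similarity.

```python
import math

# Invented coordinates, purely for illustration; not taken from any real model.
CONCEPTS = {
    "cow": (0.90, 0.80, 0.10),
    "cattle": (0.85, 0.82, 0.12),
    "heifer": (0.80, 0.70, 0.20),
    "tractor": (0.10, 0.60, 0.90),
}


def distance(a, b):
    # Euclidean distance between two concepts: smaller means more closely related.
    return math.dist(CONCEPTS[a], CONCEPTS[b])


def nearest(term, limit=2):
    # Rank every other concept by its distance from the search term.
    others = [c for c in CONCEPTS if c != term]
    return sorted(others, key=lambda c: distance(term, c))[:limit]
```

Here nearest("cow") returns cattle and heifer, in that order, while tractor lies much further away and would only be included if the search cast its net very wide.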

Earlier search methods did this using explicit lists of terms. For example, the photo below shows a few Belted Galloway cattle grazing in a field on chalk downs near here.

Traditionally, if I were maintaining my own image library I’d have to enter detailed information about that image to be stored in Exif metadata, a time-consuming task that’s also prone to error. I could get the location or breed wrong, but we now have the benefit of GPS to ensure at least the location is accurate.

More recently we’ve been able to get images analysed automatically, and in that case it returned a set of keywords to identify the contents:
{animal, cow, mammal, ungulates, outdoor, grass, land, sky, cloudy, "blue sky", plant, shrub}
If we then search for images with the keyword of cow, that should appear in the results, but it omits semantically similar words such as cattle or oxen.

Rather than compiling more exhaustive sets of keywords, semantic search can broaden the scope to cope better. And because we can interact through Siri, we can fine-tune our search results by specifying the cattle should be black and white, perhaps, and combining conventional search criteria such as location.
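
The gap between the two approaches can be sketched in a few lines of Python using the keywords above. The filename and the small table of related terms are hypothetical, and the table is only a hand-written stand-in for whatever model a real semantic search would use to find related concepts.

```python
# Keywords returned by the automatic analysis described above.
# The filename is hypothetical.
IMAGE_KEYWORDS = {
    "galloways.jpg": {
        "animal", "cow", "mammal", "ungulates", "outdoor", "grass",
        "land", "sky", "cloudy", "blue sky", "plant", "shrub",
    },
}

# Hand-written stand-in for a semantic model (an assumption for this sketch).
RELATED = {
    "cattle": {"cow", "oxen"},
    "oxen": {"cow", "cattle"},
    "cow": {"cattle", "oxen"},
}


def keyword_search(term, library=IMAGE_KEYWORDS):
    # Crisp search: the term must be one of the stored keywords.
    return sorted(name for name, kws in library.items() if term in kws)


def broadened_search(term, library=IMAGE_KEYWORDS):
    # Broadened search: also accept any keyword related to the term.
    terms = {term} | RELATED.get(term, set())
    return sorted(name for name, kws in library.items() if terms & kws)
```

Searching for cow finds the photo either way, but a keyword search for cattle returns nothing, while the broadened search does find it.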

Getting this to work effectively brings some limitations. Because semantics are so contextual and variable, this involves apps and Core Spotlight. That’s a big benefit to user privacy, as Core Spotlight’s indexes are separated by user and stored locally, although in places like ~/Library/Metadata rather than volume-based Spotlight indexes in the existing hidden .Spotlight-V100 folders. And unlike global Spotlight indexing and search, it requires apps to have code to support both tasks, as it can’t just happen by magic.

While I’m sure we’ll all be impressed with many of the results of semantic search, hits that we never expected to find, it’s going to prove harder to assess those that it misses. That’s the more concerning aspect of the performance of all search systems, and in many cases how we will judge their value. Even if you aren’t impressed yet by other advanced AI coming in Golden Gate, semantic search could prove decisive.

8 Comments

1

Simon on June 28, 2026 at 3:51 pm

Feels to me like the more emphasis is placed on context, the poorer Spotlight works for just simple file name search. Tahoe has brought this to new heights: I cannot get Spotlight to show simple text files even if I literally punch in their name. So Spotlight refuses to list simpleFile.txt unless I punch in that entire file name all the way to the very end. simple will not suffice. simpleF will not suffice. It will list a gazillion nonsense *.h files for simple, but I cannot for the life of me get it to prioritize showing the file that has the literal filename I’m punching in. I’ve tried deactivating all the extra categories Apple has over the years added (web search wtf???) in an attempt to Google-ify Spotlight and I can rebuild Spotlight databases till the cows come home. But I frickin cannot get Spotlight to just show me my darned simpleFile.txt as the first hit when I punch in simplefi… Or forget first hit, just even show it at all! OK, done. Rant over. ;)

I’m a big fan of knocking out the bare basics first before you try to get fancy. I would have considered simple file name search the bare basics and contextual searching (not to mention semantics, as you astutely point out here, Howard) to be fancy. But apparently I’m a totally misguided luddite. Oh well.

So now I’m looking for a simple file name search that I can use as a launcher to replace Spotlight, which to me is becoming more and more useless by every major macOS update. My favorite and ever so efficient launcher is no more. So 3rd party to the rescue. This replacement should have a menubar interface like Spotlight. I should be able to invoke it with cmd-space. And all it should do is show me a list of files with names that contain my search query, sorted by similarity to what I’m typing, plus (if we’re getting fancy) according to my file type priorities (remember when Spotlight allowed us to select that display of text files was more important than display of Pages docs?). If this tool also offered, besides return/enter or click to launch, cmd-return to show in Finder that would be the icing on the cake. I’d love to throw $$$ at a Mac dev if they could provide me with such a simple utility. :)

- 2
  
  hoakley on June 28, 2026 at 7:09 pm
  
  I’m sorry you’ve had these problems, but I don’t think they’re general.
  
  Although I hardly ever use the pop-up Spotlight window, I do use the Finder’s Find window quite often, and have never experienced the problem there, nor in specialist utilities like HoudahSpot, etc. I have just checked Spotlight’s search window to look for an app, and it works excellently here, so I’m at a bit of a loss to explain why yours doesn’t.
  
  None of these use Core Spotlight, though, and you can’t draw conclusions from performance using completely different search methods on physically separate indexes.
  Howard.
  
  - 3
    
    Simon on June 29, 2026 at 1:26 pm
    
    Searching for apps indeed works just fine here too. In fact, I use Spotlight most as an apps launcher because that works so smoothly.
    
    The issue I describe is specifically .txt files. My suspicion is that there, at least internally, still exists some kind of prioritization (like the thing users used to be able to adjust) and Apple in all its wisdom has determined that nobody could be interested in boring old text files so everything else gets listed first. And thus, until you type in that entire file name, making search results collapse to just a few or even only one candidate, you won’t see that one text file you’re obviously searching for.
    
    - 4
      
      hoakley on June 29, 2026 at 5:29 pm
      
      I think that’s a ranking issue, and it’s dependent on the filename. But I wouldn’t use the global search window to search for local files, in any case, and the Finder’s Find window is much better suited to that, and never returns websites and other unwanted hits, just files and folders.
      Howard.
      
5

Duncan on June 28, 2026 at 4:02 pm

“While I’m sure we’ll all be impressed with many of the results of semantic search, hits that we never expected to find, it’s going to prove harder to assess those that it misses.”

Therein lies my concern with any search routine. No one can expect perfection when ‘interpretation’ of the terms is involved (as opposed to a literal letter-for-letter text search), but how are we to get the balance correct?

Email spam filtering has had to contend with this problem for decades now, and mostly gets it correct. However the most significant training for spam filters comes from crowd-sourced feedback. With a private on-device system that Apple implements (whose principle I fully agree with), how does the feedback loop work? If the search misses a result (and thus we likely don’t know what was missed), how are we to correct Apple’s algorithm for the sake of improvement?

- 6
  
  hoakley on June 28, 2026 at 6:57 pm
  
  This is a longstanding problem with all search methods, and normally addressed during development using test suites. However, I don’t think those are readily applied to Core Spotlight, because it’s app-specific. Some apps may use machine learning and other methods of self-improvement, but while those work well for fine-tuning hits, they can’t do much for misses.
  Ultimately, it will be our experience that leads to improvement, although most methods can expose parameters the user could adjust. For example, in the case of distance between tokens in multi-dimensional space, that distance could be adjusted by the user. I doubt whether we’ll see that, though.
  Howard.
  
- 7
  
  Duncan on June 28, 2026 at 8:14 pm
  
  One conceivable way to improve results is to start with a significantly large test corpus with known attributes (as predetermined by humans – CAPTCHA training anyone?) and that way the misses will be quantifiable with each query. Focus solely on the misses and keep training the algorithm until it makes firm associations with the queries. Obviously this won’t yield 100% success but it could improve the overall results.
  
  I’m guessing that data scientists and developers already work along this vein, but a missing component ten years ago was the unimaginably-vast quantities of source material that we now have available thanks to AI warehousing. But again, humans will need to vet the test corpus somehow before understanding what the search algorithm might be missing. And random blurry photos will still pose a problem.
  
  - 8
    
    hoakley on June 28, 2026 at 9:33 pm
    
    There are several such corpora available, some free of charge. However, they’re of little use in these contextual situations. LLMs can and have been used, as they already have the necessary tokenisation and modelling. But don’t ever expect humans to be involved in vetting or checking – the whole purpose is to minimise human involvement.
    We’ll see how well it works.
    Howard.
    
