Diagnosing a Spotlight bug in Big Sur: failure to index RTF content

Spotlight search is great when it works, but all too often it doesn’t seem to work as we expect it to. If you’ve tried out the test features in Mints and been puzzled, either that you can’t get it to recognise text in Rich Text (RTF) files, or that your Spotlight works but mine doesn’t, here’s how to investigate a real live bug in Spotlight.

Spotlight is a complex series of linked services, any of which can go wrong. When a search consistently fails, these are my suggestions for the most likely causes and how to test for them:

  • The folder or volume isn’t being indexed, as it’s in Spotlight’s list of exclusions; check in the Spotlight pane, or copy the file to a location which definitely is being indexed.
  • The Spotlight indexes on that volume are dysfunctional or broken; copy the file to another location where indexes are still functional; force re-indexing by adding the volume to the Privacy list, closing the pane, and removing it again. That may take some hours to complete, though, and tends to be my last resort.
  • It’s a malformed file, whose content can’t be harvested by the Spotlight importer even though you’d expect it to be indexed; try another file of the same type to see if that works any better.
  • The document may have the wrong extension or UTI, which results in the wrong Spotlight importer trying to harvest its contents and failing; check the extension and UTI.
  • A broken Spotlight importer is being used; check whether it’s the correct one (see below).
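
Some of these checks can also be made from Terminal. The following is only a rough sketch, not a complete diagnostic: check_spotlight_basics is a hypothetical helper name of my own, and the volume defaults to / purely as an assumption. It asks mdutil for the indexing status of a volume, then asks mdls which UTI has been recorded for the file.

```shell
#!/bin/sh
# Sketch only: check_spotlight_basics is a hypothetical helper, not part of macOS.
# Usage: check_spotlight_basics /path/to/file [volume]
check_spotlight_basics() {
    file="$1"
    volume="${2:-/}"   # assumption: default to the startup volume

    # Report whether Spotlight indexing is enabled on the volume.
    mdutil -s "$volume"

    # Show the UTI recorded for the file, and the UTIs it conforms to.
    mdls -name kMDItemContentType -name kMDItemContentTypeTree "$file"
}
```

For an RTF file, the content type reported should be public.rtf; anything else suggests that the extension or UTI is the place to look.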

When developing the Spotlight features in my free utility Mints, I noticed that Spotlight couldn’t find my special search term syzygy999 in the test RTF file, no matter where I moved it, and no matter which app saved it. Inevitably, it’s a bit more complex than that: if I use Finder’s Spotlight search or HoudahSpot, the more of the search term that I give, the lower the chances of finding it in the test file.

Give just the first letter s, and my test document shows up among the hits every time.

Add a second letter, the y, and it vanishes from them.

From this simple test, we know that the file has been indexed. However, as John points out in his comment below, unless we search specifically for Text Content, Spotlight finds this file because it matches the s in the filename rather than anything in its content. Switch to finding by text content and Spotlight is unable to find any content in any RTF file which it has indexed. John reports that converting RTF files to RTFD works around this bug, as their content is then correctly indexed. That’s even stranger, because if you look inside an RTFD file, the text content is stored in – can’t you guess? – an RTF file.
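
The difference between matching a filename and matching text content can also be seen using mdfind in Terminal. This is a minimal sketch under my own assumptions: search_name_vs_content is a hypothetical helper name, and the wildcard query against kMDItemTextContent is just one way of writing a content search, not necessarily what Finder or HoudahSpot do internally.

```shell
#!/bin/sh
# Sketch only: search_name_vs_content is a hypothetical helper, not part of macOS.
# Usage: search_name_vs_content TERM FOLDER
search_name_vs_content() {
    term="$1"
    dir="$2"

    echo "Matches on file name:"
    mdfind -onlyin "$dir" -name "$term"

    echo "Matches on text content:"
    # c = ignore case, d = ignore diacritics
    mdfind -onlyin "$dir" "kMDItemTextContent == '*${term}*'cd"
}
```

Running something like search_name_vs_content syzygy999 ~/Documents should list the test RTF file in the second group if its content has been indexed; on a Mac affected by this bug, I would expect that group to omit it.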

There’s one way of discovering which Spotlight importer is called by macOS to harvest metadata from any given file. As usual, this is buried in developer documentation which Apple has now archived and no longer maintains: the Spotlight Importer Programming Guide. The command has changed slightly from those days, but if you type into Terminal something like
mdimport -t -d 2 MyFile.rtf
where MyFile.rtf is the path to your test file, you should see a response which early on identifies exactly which importer handles that file. In this case, it came back with
Imported '/Users/hoakley/Documents/SpotTestA.rtf' of type 'public.rtf' with plugIn /System/Library/Spotlight/RichText.mdimporter.
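
If you want to check several files, picking the importer out of that response by eye soon becomes tedious. As a sketch only, the hypothetical helper importer_from_output below extracts the plug-in path from text on its standard input; it assumes the response contains a line in the same form as the one quoted above, which may differ in other versions of macOS.

```shell
#!/bin/sh
# Sketch only: importer_from_output is a hypothetical helper, not part of macOS.
# Reads mdimport's test output on standard input and prints the plug-in path.
# Assumes a line of the form "... with plugIn /path/to/Some.mdimporter".
importer_from_output() {
    sed -n 's/.* with plugIn \(.*mdimporter\).*/\1/p'
}
```

It could then be used as mdimport -t -d 2 MyFile.rtf 2>&1 | importer_from_output, where 2>&1 is there in case the response is written to standard error rather than standard output.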

So we now know that Big Sur’s built-in Spotlight importer for Rich Text files has a bug, in which it can fail to index the contents of RTF files correctly. What fascinates me, though, is that this doesn’t appear to be universal in Big Sur, even though I can reproduce it every time on both Intel and M1 Macs here. Are we all using the same importer, I wonder?

(Corrected, thanks to John in his comment below, at 1245 UTC.)