Spotlight search can skip files

Many Mac users feel that Spotlight search has steadily deteriorated, but it’s very hard to assess this objectively. Maybe it’s like the police looking younger, and just a sign of advancing age?

I’ve been looking at some other issues recently which required me to flag Rich Text files using distinctive words, then search for them using Spotlight, in Mojave 10.14.6. In the course of doing that, I discovered that Spotlight can completely overlook files and consistently fail to find words within them, which appears to result from an indexing problem.

My tests ran as follows. Using several different apps capable of saving Rich Text in RTF and RTFD formats, I created single-line documents containing one of my distinctive words, here Superhydrogenated. I also added document metadata with other distinctive words such as Maxillofacial and Anathematically. I put these in a folder, then opened the Spotlight search window and tried to locate them.


One example folder of test files held two documents each written by Nisus Writer Pro, Pages, TextEdit, and Microsoft Word, for a total of eight documents, each containing the word Superhydrogenated.


On this occasion, Spotlight search was only able to find half of them: both of those written by Pages, one by TextEdit, and the Microsoft Word .docx version, but not its RTF equivalent. Which files were found didn’t follow any consistent pattern across the different apps, though.
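Checking which files a search returns is easier to repeat if it’s scripted. What follows is only a minimal sketch in Python, not the method used for the tests here, which were run from the Spotlight search window. It assumes the `mdfind` command with its `-onlyin` option and a raw query against `kMDItemTextContent`; the function names are made up for illustration, and files are compared by name only, which assumes a flat folder.

```python
import subprocess
from pathlib import Path


def build_mdfind_command(folder, word):
    """Build an mdfind command searching indexed text content within one folder."""
    # The trailing 'c' and 'd' ask for a match insensitive to case and diacritics.
    query = f'kMDItemTextContent == "{word}"cd'
    return ["mdfind", "-onlyin", str(folder), query]


def skipped_files(all_paths, found_paths):
    """Return names of files present in the folder that the search did not return."""
    found = {Path(p).name for p in found_paths}
    return sorted(Path(p).name for p in all_paths if Path(p).name not in found)


def run_search(folder, word):
    """Run the search (macOS only) and return the paths that mdfind reports."""
    result = subprocess.run(build_mdfind_command(folder, word),
                            capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line]
```

On a Mac, passing the folder’s contents and the output of `run_search(folder, "Superhydrogenated")` to `skipped_files` would list the documents that the search missed.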

I also tried copying the whole folder between volumes, which produced different patterns, and sometimes complete success in locating all the files. Both volumes are otherwise reliable SSDs in APFS format.

Suspecting that Spotlight needed to reindex the two volumes, I forced that using the Spotlight pane. Once that was complete, the documents which were present at the time of reindexing could then all be found correctly. But 12 hours or so later, when I copied the folder containing those documents, the copies showed problems similar to those which had occurred before reindexing: some files were found and others were not.

The only action which appeared able to change Spotlight’s search results was writing extended attributes (xattrs) to a file which had previously been omitted from the results. Otherwise, if Spotlight skipped the file in one search, it skipped it every time; likewise, the files which it did find, it found consistently.
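For anyone wanting to try the same thing, here’s a minimal sketch of writing an xattr from Python, assuming the `xattr` command line tool and its `-w` option as supplied with macOS. The attribute name used as a default is a made-up placeholder, not one used in my tests, and nothing here guarantees that Spotlight will then find the file.

```python
import subprocess


def build_xattr_write_command(path, name, value):
    """Build an xattr command that writes one extended attribute to a file."""
    return ["xattr", "-w", name, value, str(path)]


def write_xattr(path, name="co.example.spotlightnudge", value="1"):
    """Write the attribute (macOS only); the default name is a hypothetical placeholder."""
    subprocess.run(build_xattr_write_command(path, name, value), check=True)
```

Keeping the command builder separate from the function which runs it makes it easy to inspect exactly what would be executed before touching any files.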

To see where this error is most probably occurring, you need to understand what happens with respect to Spotlight when a file is created or changed:

  1. Creating a new file, or changing an existing one, is recorded in that volume’s FSEvents database, in the volume’s .fseventsd folder.
  2. FSEvents notifies Spotlight that a file has been created or changed, prompting Spotlight to reindex the content and metadata of the file(s) concerned.
  3. One of the multiple copies of the mdworker daemon checks the type (UTI) of the changed file, and locates the appropriate .mdimporter plugin bundle for that type. In the case of Rich Text files, this is /System/Library/Spotlight/RichText.mdimporter.
  4. The mdworker uses the .mdimporter code to generate indexed content for the changed file.
  5. That indexed content is then added to the Spotlight database in the volume’s .Spotlight-V100 folder for use in future searches.

That series of steps is usually completed within a second or two of the file being created or edited.
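That delay can be measured rather than estimated. The sketch below is an assumption about how you might do it, not something used in these tests: it polls a caller-supplied check (for example, one which runs a Spotlight search and looks for the new file) until the check succeeds or a timeout expires. The function name and its parameters are hypothetical.

```python
import time


def wait_until_found(is_found, timeout=10.0, interval=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_found() until it returns True.

    Returns the elapsed time in seconds, or None if the timeout expires first.
    The clock and sleep functions are injectable so this can be tested quickly.
    """
    start = clock()
    while True:
        if is_found():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

A file which is still not found after a generous timeout, say a minute, is a candidate for one that Spotlight has skipped rather than one it is merely slow to index.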

Experience from reindexing the two volumes is that steps 3 to 5 worked perfectly well during that action. In particular, RichText.mdimporter appears to work correctly with the test files used here, and there was no evidence that their data caused mdworker daemons to crash or behave unpredictably. It’s therefore most likely that FSEvents is failing to notice the changed file in the first place or, if it does notice, that it’s failing to notify Spotlight that the file needs to be reindexed.
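One way to probe this for a single skipped file is to check its type and then ask for it to be imported directly, without waiting for any notification. This is a minimal sketch, assuming the `mdls` command with its `-name` option and the `mdimport` command given a file path; the function names are hypothetical, and I haven’t used this in the tests described here. If a skipped file became searchable after an explicit import, that would be consistent with the importer working and the failure lying earlier in the chain, but it wouldn’t prove it.

```python
import subprocess


def parse_content_type(mdls_output):
    """Extract the UTI from output such as: kMDItemContentType = "public.rtf"."""
    for line in mdls_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "kMDItemContentType":
            value = value.strip().strip('"')
            # mdls prints (null) when the attribute has no value.
            return None if value == "(null)" else value
    return None


def build_import_command(path):
    """Build an mdimport command asking Spotlight to import a single file."""
    return ["mdimport", str(path)]


def check_file(path):
    """Report the file's UTI, then request an import of it (macOS only)."""
    out = subprocess.run(["mdls", "-name", "kMDItemContentType", str(path)],
                         capture_output=True, text=True, check=True).stdout
    subprocess.run(build_import_command(path), check=True)
    return parse_content_type(out)
```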

In Mojave, and particularly on APFS volumes, FSEvents is used significantly less than it was in earlier versions of macOS: FSEvents used to form the basis for determining which files Time Machine should back up, but that has now been taken over by APFS snapshots. Perhaps it’s just as well.

If you’ve had similar experiences, of files which seem never to be searched by Spotlight until you reindex a volume, I’d be very interested to hear about them.