In the background: Spotlight indexing

If you’ve ever watched Activity Monitor shortly after logging in to your Mac, you’ll have seen how busy it is for the first ten minutes or more. Apple silicon Macs are different here, because their sustained high % CPU is largely restricted to the Efficiency cores. This is commonly attributed to Spotlight indexing files, and may appear worrying. This article tries to describe what’s going on over that period, and why it doesn’t necessarily mean there’s a problem with Spotlight.

On-the-fly indexing

When new files are created, or existing ones changed, Spotlight indexes them very quickly. The first mdworker process is spawned within a second, and others are added to it. They’re active for about 0.2 seconds before the new posting lists they create are ready to be added to that volume’s indexes. They may later be followed by CGPDFService and mediaanalysisd running similar image analysis to that performed in Live Text. Text extracted from the files is then compressed by mds_stores before adding it to that volume’s Spotlight indexes, within seven seconds or so of file creation.

These steps are summarised in the diagram above, where those in blue update metadata indexes, and those in yellow and green update content indexes. It’s most likely that each file to be added to the indexes has its own individual mdworker process that works with a separate mdimporter.

Spotlight indexes

The indexes used in search services are conventionally referred to as inverted, because they map each term to the files that contain it rather than each file to its terms, and those would normally be largely static. Spotlight’s have to accommodate constant change as files are altered and saved, new files are created, and others are deleted. To enable its main inverted indexes to remain well-structured and efficient, Spotlight stores appear to use separate transient posting tables to hold recently acquired metadata and content. Periodically, data from those is assimilated into its more static tables. Similarly, when files are deleted, their indexed metadata and contents aren’t removed immediately, but only when the store next undergoes housekeeping.
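
The arrangement described above can be illustrated with a toy model. This is only a sketch of the general technique (a main inverted index, a transient table for recent additions, and deferred deletion), not a description of how Spotlight’s stores are actually implemented; the class SketchIndex and all of its method names are hypothetical.

```python
from collections import defaultdict


class SketchIndex:
    """Toy inverted index with a transient table and deferred deletion.

    Hypothetical illustration only; it does not handle re-indexing a
    changed file or reusing the ID of a deleted file.
    """

    def __init__(self):
        self.main = defaultdict(set)       # term -> file IDs, the largely static index
        self.transient = defaultdict(set)  # term -> file IDs indexed recently
        self.deleted = set()               # file IDs awaiting housekeeping

    def add(self, file_id, text):
        # New files only touch the small transient table.
        for term in text.lower().split():
            self.transient[term].add(file_id)

    def delete(self, file_id):
        # Deletion is only recorded; postings stay until housekeeping.
        self.deleted.add(file_id)

    def search(self, term):
        # A query consults both tables and hides files marked as deleted.
        term = term.lower()
        hits = self.main.get(term, set()) | self.transient.get(term, set())
        return hits - self.deleted

    def housekeeping(self):
        # Merge transient postings into the main index, then purge deletions.
        for term, ids in self.transient.items():
            self.main[term] |= ids
        self.transient.clear()
        for term in list(self.main):
            self.main[term] -= self.deleted
            if not self.main[term]:
                del self.main[term]
        self.deleted.clear()


index = SketchIndex()
index.add(1, "Spotlight indexes files")
index.add(2, "Time Machine backup files")
index.delete(2)
print(sorted(index.search("files")))  # [1]
index.housekeeping()
print(sorted(index.search("files")))  # [1]
```

Search results are the same before and after housekeeping; what changes is where the postings live, which is why the cost of maintenance can be deferred to a quieter time.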

Image analysis and text extraction performed by CGPDFService and mediaanalysisd, introduced in macOS Sonoma, are computationally intensive, and normally deferred until they can be performed with minimal disruption to the user. When completed, that text also needs to be incorporated in Spotlight’s content indexes.

Startup sequence

I gathered 15 log extracts each covering all entries (excluding Signposts) for periods of 3 seconds during the 11 minutes of high Spotlight process activity after user login, on a Mac mini M4 Pro running macOS 26.2 Tahoe. Those show Spotlight processes running in phases, starting from an arbitrary time zero when their activity was first seen reaching a peak:

  • 00:00 – mdworker processes were indexing files for periods of 1-4 seconds each; Spotlight indexes were being maintained, with a journal reset and sync;
  • 02:40 – CGPDFService started;
  • 04:10 – mediaanalysisd started running its Live Text extraction on files, with photoanalysisd activity; then coremanagedspotlightd maintained indexes, replaying journals;
  • 07:20 – mediaanalysisd continued Live Text extraction;
  • 10:40 – mdworker returned to indexing as before; index maintenance occurred again with a journal reset and sync, following which index file permissions were set;
  • 10:45 – caches were deleted and there was general tidying up before background processes tailed off.

Times are given as MM:SS following the arbitrary start. After about 5 minutes had elapsed, Activity Monitor and the log also showed substantial activity for the initial Time Machine backup, and for the daily complete set of XProtect Remediator scans.
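
To make the spacing of those phases easier to see, the MM:SS offsets in the list can be converted to seconds and differenced. This is a minimal sketch using only the times given above; the function names and the shortened phase labels are my own, and the intervals are gaps between the times each phase was first observed, not measured durations.

```python
def to_seconds(stamp):
    """Convert an 'MM:SS' string into whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# Offsets from the arbitrary time zero, with abbreviated labels.
PHASES = [
    ("00:00", "mdworker indexing, journal reset and sync"),
    ("02:40", "CGPDFService started"),
    ("04:10", "mediaanalysisd Live Text extraction started"),
    ("07:20", "mediaanalysisd continued"),
    ("10:40", "mdworker indexing, index maintenance"),
    ("10:45", "caches deleted, tidying up"),
]


def gaps(phases):
    """Return (label, seconds until the next phase began) for all but the last phase."""
    starts = [to_seconds(stamp) for stamp, _ in phases]
    return [(phases[i][1], starts[i + 1] - starts[i]) for i in range(len(starts) - 1)]


for label, secs in gaps(PHASES):
    print(f"{secs:4d} s  {label}")
```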

All Spotlight processes appeared to run in the background, at low QoS and on Efficiency cores, apart from those of mediaanalysisd. That process ran at a QoS of Utility rather than Background or Maintenance, as confirmed by MADServiceTextProcessing being called with a QoS numeric value of 17 instead of 9 or less. Work at that QoS would normally be scheduled on Performance cores, although little was seen on those in Activity Monitor’s CPU History window. Text extraction run by mediaanalysisd typically took about 0.25 seconds for each file processed. mediaanalysisd ran repeatedly for about 6.5 minutes, between 04:10 and about 10:40.
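
For readers who want to interpret QoS numeric values like those in their own log extracts, here is a small lookup sketch. The values are those of the public QoS classes defined in macOS’s sys/qos.h header; Maintenance isn’t among the public classes, so it’s treated here only as falling at or below Background, as the text above implies. The function names are hypothetical.

```python
# Public QoS class values from <sys/qos.h>; names abbreviated.
QOS_CLASSES = {
    0x21: "User Interactive",  # 33
    0x19: "User Initiated",    # 25
    0x15: "Default",           # 21
    0x11: "Utility",           # 17
    0x09: "Background",        # 9
}


def describe_qos(value):
    """Return the public QoS class name for a numeric value, if there is one."""
    return QOS_CLASSES.get(value, "Unknown or non-public")


def is_background_or_lower(value):
    """True for values of 9 or less, the range expected for background work."""
    return value <= 0x09


for value in (17, 9):
    print(value, describe_qos(value), is_background_or_lower(value))
```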

Abnormally prolonged indexing

Several macOS upgrades in recent years appear to have caused Spotlight indexing at startup to take prolonged periods, in some cases reported as several days, and comparable to the time required to rebuild all indexes from scratch. Given the paucity of log entries recording index maintenance, this can be difficult to confirm, although text extraction by mediaanalysisd is easier to identify. In most cases, it seems preferable to allow prolonged maintenance to run to completion, by allowing that Mac to run without sleeping. In Apple silicon Macs, as those maintenance processes should run almost exclusively on E cores, this should have limited impact on the user.

Forcing a full reindex of a volume is likely to take longer than allowing maintenance to complete.

Key points

  • Spotlight indexes new and changed files rapidly to supplementary journals rather than main indexes.
  • Macs that are shut down daily perform extensive indexing and index maintenance shortly after the user logs in.
  • Macs that remain running should perform the same maintenance periodically during light use.
  • Maintenance includes the incorporation of supplementary journals into main indexes.
  • Text extraction from images by mediaanalysisd is performed at the same time, and can take a long time.
  • Although image analysis may be run on P cores, almost all Spotlight indexing and maintenance is performed in the background on E cores.
  • Prolonged indexing and maintenance isn’t necessarily a bad sign, and may well be normal.
  • Disrupting Spotlight routine maintenance may affect search results.