hoakley July 30, 2025 Macs, Technology

A deeper dive into Spotlight indexes

From its inception, Spotlight was designed to encompass multiple types of search, among them searching the metadata and contents of local files, as detailed in Apple’s patent by Yan Arrouye and Keith Mortensen filed in 2000 (see References at the end), five years before Spotlight was released. Although documentation of local search is limited, a series of patents awarded to Apple provides deeper insights. At the heart of local search are hidden index and supporting files in each volume’s .Spotlight-V100 folder; these are served by the mds daemon and its helpers.

Indexing

When any file in a watched folder is created or saved, Spotlight re-indexes that file and updates the volume’s Spotlight indexes. The process runs like this:

The file change is recorded in that volume’s FSEvents database, in the volume’s .fseventsd folder.
FSEvents notifies Spotlight that a file has been created or changed, prompting Spotlight to re-index the content and metadata of the file(s) concerned.
One of the multiple copies of the mdworker daemon checks the type (UTI) of the changed file, and locates the appropriate mdimporter plugin bundle for that type. In the case of Rich Text files, this is /System/Library/Spotlight/RichText.mdimporter, for example. Additional plugins can be found in /Library/Spotlight/ and sometimes ~/Library/Spotlight/, but most app-specific plugins are now stored in the Library/Spotlight folder inside that app.
The mdworker uses the mdimporter code to generate indexed content for the changed file.
That indexed content is then added to the Spotlight files in the volume’s .Spotlight-V100 folder, for use in future searches.

That series of steps is usually completed within a second or two of the file being created or edited, and both metadata and content are available to search shortly afterwards.
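The steps above can be modelled as a toy event handler. This is a minimal sketch, not how mds or mdworker are implemented: the FileEvent type, the IMPORTERS table, the VolumeStore class and the example path are hypothetical stand-ins for FSEvents notifications, the mdimporter plugin bundles and the .Spotlight-V100 store. Only the RichText.mdimporter path comes from the text; public.rtf is the system UTI for Rich Text files.

```python
from dataclasses import dataclass, field

@dataclass
class FileEvent:
    """Hypothetical stand-in for an FSEvents notification."""
    path: str
    uti: str

# Hypothetical UTI-to-plugin table. Only the RichText entry comes from the
# article; the real lookup is performed by mdworker.
IMPORTERS = {
    "public.rtf": "/System/Library/Spotlight/RichText.mdimporter",
}

@dataclass
class VolumeStore:
    """Toy stand-in for the volume's .Spotlight-V100 store."""
    entries: dict = field(default_factory=dict)

def run_importer(plugin: str, path: str) -> dict:
    # Placeholder for the importer's work: a real mdimporter extracts
    # metadata and text content from the file's data.
    return {"path": path, "importer": plugin}

def handle_event(event: FileEvent, store: VolumeStore) -> bool:
    """Re-index one changed file; return False if no importer matches."""
    plugin = IMPORTERS.get(event.uti)
    if plugin is None:
        return False  # this toy simply skips the file
    store.entries[event.path] = run_importer(plugin, event.path)
    return True

store = VolumeStore()
handle_event(FileEvent("/Users/me/notes.rtf", "public.rtf"), store)
```

In this toy a file with no matching importer is skipped; what Spotlight itself records for such files isn't covered here.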

Content extracted from each file that’s indexed by an mdworker process includes:

file attributes, such as datestamps;
extended attributes, stored in the file system metadata; these include keywords and copyright information, where provided;
structured metadata from the main data in the file, as specified by that file type’s mdimporter plugin; examples include EXIF data;
content, normally text, exported from the main data of the file, again using the mdimporter plugin.

Evidence from multiple patents shows that file metadata and content are indexed separately. Content appears to go through a conventional processing sequence:

text content is divided into tokens at word boundaries, and most frequent words such as the may be eliminated as stop words;
a stemmer may be used to derive word stems, and prefixes may also be separated;
an indexer generates an inverted index.
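That conventional sequence can be sketched in a few lines of Python. The stop-word list and the suffix-stripping "stemmer" below are deliberately crude, hypothetical stand-ins (a real system would use something like the Porter stemmer), and this index maps each term only to document IDs. Apple hasn't documented which stop words or stemmer, if any, Spotlight uses.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; Spotlight's own list is undocumented.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}

# Crude suffix stripping, standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def tokenise(text: str) -> list[str]:
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three characters so short words aren't mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def terms(text: str) -> list[str]:
    """Tokenise, drop stop words, then stem what remains."""
    return [stem(t) for t in tokenise(text) if t not in STOP_WORDS]

def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in terms(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

index = build_index({1: "The lights are indexing", 2: "Indexed light of the moon"})
```

Here "lights" and "light" reduce to the same term, as do "indexing" and "indexed", so a search for either form finds both documents, while "the" and "of" never reach the index.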

Tokenisation of file names uses rules for word boundaries laid down in the International Components for Unicode. In practice, word boundaries include a space, the underscore _, the hyphen - and changes of case used in CamelCase. At least in file names, Spotlight treats each of the following examples as three words:
one target two one_target_two one-target-two OneTargetTwo
Languages other than English may allow other word boundaries, but those are the most common.
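Those boundaries can be approximated with two regular expressions. This is a minimal sketch, not ICU's word-break algorithm and not Spotlight's own tokeniser: it splits on spaces, underscores and hyphens, and where a lower-case letter is followed by an upper-case one, then lower-cases each word for comparison (an assumption made here for simplicity).

```python
import re

def filename_words(name: str) -> list[str]:
    """Split a file name into words using the boundaries described above.

    A regex approximation: not ICU word breaking, and not Spotlight's code.
    """
    # Put a space wherever a lower-case letter is followed by an upper-case
    # one, so that CamelCase names break into separate words.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    # Split on runs of spaces, underscores and hyphens, dropping empty pieces.
    return [word.lower() for word in re.split(r"[ _-]+", spaced) if word]

examples = ["one target two", "one_target_two", "one-target-two", "OneTargetTwo"]
for example in examples:
    print(f"{example!r}: {filename_words(example)}")
```

All four examples produce ['one', 'target', 'two']. Runs of capitals, as in ABCFile, aren't split by this sketch; how Spotlight handles those isn't stated in the text.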

Indexes

A volume’s hidden index folder contains the store itself, in a folder named Store-V2, and VolumeConfiguration.plist, a standard property list containing several dictionaries:

Annotations, a large dictionary containing Creation_Predicates, another dictionary with extensive settings
datestamps of creation and modification, with version numbers
Exclusions, an array of excluded paths
Options, a brief dictionary
Stores, with the UUID of the store directory, datestamps, versions and other details.
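Because VolumeConfiguration.plist is a standard property list, Python's plistlib can read it. This sketch summarises the dictionaries listed above; it assumes the Stores dictionary is keyed by store UUID, which would match the store directory naming described below but isn't confirmed here. Reading the real file needs elevated privileges and Full Disk Access for the app you run it from, so the example parses a small, made-up plist instead (the exclusion path and UUID are hypothetical).

```python
import plistlib

def summarise_volume_config(data: bytes) -> dict:
    """Summarise the top-level contents of a VolumeConfiguration.plist."""
    config = plistlib.loads(data)
    stores = config.get("Stores", {})
    return {
        "top_level_keys": sorted(config),
        "exclusions": list(config.get("Exclusions", [])),
        # Assumption: Stores is keyed by the UUID of each store directory.
        "store_uuids": sorted(stores) if isinstance(stores, dict) else [],
    }

# Made-up sample with the dictionaries named in the text.
sample = plistlib.dumps({
    "Annotations": {"Creation_Predicates": {}},
    "Exclusions": ["/Users/me/Private"],
    "Options": {},
    "Stores": {"ABCD-1234": {}},
})
summary = summarise_volume_config(sample)
print(summary)
```

On a Mac with sufficient access, you could instead pass the bytes of /System/Volumes/Data/.Spotlight-V100/VolumeConfiguration.plist, read with open(path, "rb").read().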

The store directory is named using the UUID given in VolumeConfiguration.plist, and holds around 99 files and folders containing that volume’s store. Of those, store.db uses a proprietary format that has been reverse-engineered by Yogesh Khatri, who provides a parser at https://github.com/ydkhatri/spotlight_parser. That file is relatively small, as the dictionaries, indexes and postings are contained in the many other files. The diagram below outlines this structure and lists some of the contents of the store.

Items shown in blue are directories, those in red are most likely to change soon after files are changed, and the ellipsis … after a name indicates there are multiple items with that as a prefix. Note that, while Core Spotlight has its own journal directory, other files don’t appear to separate its indexes.

Inverted indexes

Spotlight’s indexes are based on what is known by convention as the inverted index. At its most basic, this consists of a dictionary of tokens together with a posting list for each of them. Each posting list references the locations where that token occurs in the documents that have been indexed.

For example, suppose the token light has been obtained by tokenisation of a text file. For that token in the dictionary, there will be a postings list identifying where that token was found. There are different conventions as to how those posting lists work, and whether they include separate document identifiers.
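A positional inverted index makes the idea concrete. In this minimal sketch each posting is a (document ID, word position) pair, one of the conventions mentioned above; the documents and IDs are made up, and Spotlight's own posting format is undocumented.

```python
from collections import defaultdict

def positional_index(docs: dict[int, list[str]]) -> dict[str, list[tuple[int, int]]]:
    """Map each token to its posting list of (document ID, position) pairs."""
    postings: dict[str, list[tuple[int, int]]] = defaultdict(list)
    # Visit documents in ID order so every posting list stays sorted.
    for doc_id in sorted(docs):
        for position, token in enumerate(docs[doc_id]):
            postings[token].append((doc_id, position))
    return dict(postings)

docs = {
    1: ["spotlight", "finds", "light"],
    2: ["the", "light", "and", "the", "dark", "light"],
}
index = positional_index(docs)
print(index["light"])  # prints [(1, 2), (2, 1), (2, 5)]
```

Token matching is exact: the occurrence of light inside spotlight isn't recorded under light.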

Apple’s patents include several elaborations of basic inverted indexes. Hornkvist and others describe a two-level inverted index table with a live index, an annotated postings list, update sets, and multiple index files with deltas. The two-level table keeps frequent tokens in a small table that is optimised for updates, and less common tokens in a larger table optimised for searching rather than updating.
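The split between an update-friendly table and a search-friendly one might look like the toy below. It isn't the structure claimed in Hornkvist's patent: the threshold parameter, the dict of lists for frequent tokens, and the sorted parallel arrays (searched with bisect, which also suits prefix searches) for the remaining tokens are all assumptions made for illustration. Promoting a token between tables as its frequency changes isn't modelled.

```python
import bisect

class TwoLevelIndex:
    """Toy two-level inverted index with a hypothetical frequency threshold."""

    def __init__(self, postings: dict[str, list[int]], threshold: int):
        # Frequent tokens: small table, posting lists cheap to append to.
        self.frequent = {t: list(p) for t, p in postings.items() if len(p) >= threshold}
        # Other tokens: compact sorted arrays, good for binary and prefix search
        # but costly to update, because insertions shift later entries.
        rare = sorted((t, tuple(p)) for t, p in postings.items() if len(p) < threshold)
        self.rare_tokens = [t for t, _ in rare]
        self.rare_postings = [p for _, p in rare]

    def _find(self, token: str) -> int:
        """Index of token in the sorted array, or -1 if absent."""
        i = bisect.bisect_left(self.rare_tokens, token)
        if i < len(self.rare_tokens) and self.rare_tokens[i] == token:
            return i
        return -1

    def add(self, token: str, doc_id: int) -> None:
        if token in self.frequent:
            self.frequent[token].append(doc_id)  # cheap in-place append
            return
        i = self._find(token)
        if i >= 0:
            # Rebuild the immutable posting tuple: deliberately more costly.
            self.rare_postings[i] = self.rare_postings[i] + (doc_id,)
        else:
            pos = bisect.bisect_left(self.rare_tokens, token)
            self.rare_tokens.insert(pos, token)
            self.rare_postings.insert(pos, (doc_id,))

    def lookup(self, token: str) -> list[int]:
        if token in self.frequent:
            return list(self.frequent[token])
        i = self._find(token)
        return list(self.rare_postings[i]) if i >= 0 else []

    def prefix_lookup(self, prefix: str) -> list[str]:
        """Less common tokens starting with prefix form one contiguous run."""
        start = bisect.bisect_left(self.rare_tokens, prefix)
        end = start
        while end < len(self.rare_tokens) and self.rare_tokens[end].startswith(prefix):
            end += 1
        return self.rare_tokens[start:end]

idx = TwoLevelIndex({"spotlight": [1, 2, 3], "light": [2], "moon": [3]}, threshold=3)
idx.add("light", 4)
idx.add("lighthouse", 5)
print(idx.lookup("light"), idx.prefix_lookup("light"))
```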

Sachs and Sagotsky describe a collocation index constructed from an inverted index by determining the distances between occurrences of tokens in their posting lists. Token pairs whose distances fall within a specified threshold are then added to the collocation index.
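Given positional posting lists, a collocation index can be derived by comparing positions. In this minimal sketch a pair of tokens is added when they occur in the same document within max_distance words of each other; the distance measure, the threshold value and storing a co-occurrence count are assumptions, not details taken from Sachs and Sagotsky's patent. The nested loops are quadratic, which is fine for illustration but not for a real index.

```python
from itertools import combinations

def collocations(
    postings: dict[str, list[tuple[int, int]]], max_distance: int
) -> dict[tuple[str, str], int]:
    """Count close co-occurrences for each pair of tokens.

    postings maps each token to (document ID, word position) pairs.
    Pairs with at least one co-occurrence within max_distance are kept.
    """
    index = {}
    for a, b in combinations(sorted(postings), 2):
        count = 0
        for doc_a, pos_a in postings[a]:
            for doc_b, pos_b in postings[b]:
                if doc_a == doc_b and abs(pos_a - pos_b) <= max_distance:
                    count += 1
        if count:
            index[(a, b)] = count
    return index

postings = {
    "search": [(1, 3), (2, 0)],
    "refinement": [(1, 4), (2, 9)],
    "light": [(2, 1)],
}
print(collocations(postings, max_distance=1))
# prints {('light', 'search'): 1, ('refinement', 'search'): 1}
```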

Changing indexes

Most inverted index systems are largely static, but Spotlight’s have to accommodate constant change as files are altered and saved, new files are created, and others are deleted. To keep the main inverted indexes well-structured and efficient, Spotlight stores appear to use separate transient posting tables to hold recently acquired metadata and content. Periodically, data from those is incorporated into more static tables. Similarly, when files are deleted, their indexed metadata and contents aren’t removed immediately, but only when the store next undergoes housekeeping.
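That arrangement resembles a main index plus a delta, merged during housekeeping. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, document IDs are assumed never to be reused, and filtering deleted files out of search results before housekeeping is an assumption about behaviour the article doesn't describe.

```python
class ChangingIndex:
    """Toy index: a static main table, a transient delta, deferred deletes."""

    def __init__(self):
        self.main: dict[str, tuple[int, ...]] = {}  # rebuilt only by merge()
        self.delta: dict[str, list[int]] = {}       # recently indexed files
        self.deleted: set[int] = set()              # deletions awaiting housekeeping

    def add(self, token: str, doc_id: int) -> None:
        self.delta.setdefault(token, []).append(doc_id)

    def delete(self, doc_id: int) -> None:
        # Only record the deletion; postings stay put until merge().
        self.deleted.add(doc_id)

    def search(self, token: str) -> list[int]:
        hits = set(self.main.get(token, ())) | set(self.delta.get(token, []))
        # Assumption: deleted files are filtered out at query time.
        return sorted(hits - self.deleted)

    def merge(self) -> None:
        """Housekeeping: fold the delta into main and purge deleted files."""
        merged = {}
        for token in set(self.main) | set(self.delta):
            ids = set(self.main.get(token, ())) | set(self.delta.get(token, []))
            ids -= self.deleted
            if ids:
                merged[token] = tuple(sorted(ids))
        self.main = merged
        self.delta.clear()
        self.deleted.clear()

idx = ChangingIndex()
idx.add("light", 1)
idx.add("light", 2)
idx.merge()          # housekeeping: (1, 2) now in the main table
idx.add("light", 3)  # a new file goes into the delta
idx.delete(1)        # the deletion is only recorded
print(idx.search("light"), idx.main["light"])  # prints [2, 3] (1, 2)
idx.merge()
print(idx.main["light"])  # prints (2, 3)
```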

This is likely to explain sustained periods of activity of mds and its helpers, for example in the minutes after startup. This is difficult to establish, as that activity isn’t accompanied by informative entries in the log.

Summary

Each volume has its own hidden, top-level .Spotlight-V100 folder containing Spotlight indexes for the contents of that volume.
When files change, mdworker processes extract metadata and contents for indexing, using the mdimporter plugin for that file type.
Metadata and content appear to be indexed separately.
Text content is tokenised and filtered using stop words and may be further processed for stems and prefixes.
Inverted indexes are used, with entries in a dictionary having a postings list specifying the locations of each occurrence.
More elaborate inverted indexes may be used, separating frequent tokens from those less common.
Indexes are designed to cope with frequent changes, only incorporating those into more static tables during periodic housekeeping.
Local Spotlight indexes and indexing are complicated and almost entirely undocumented.

References

US Patent 6,847,959 B1 Universal Interface for Retrieval of Information in a Computer System, Yan Arrouye and Keith Mortensen, filed 5 January 2000, dated 25 January 2005.
US Patent 7,698,328 B2 User-Directed Search Refinement, Matthew G Sachs and Jonathan A Sagotsky, filed 11 August 2006, dated 13 April 2010.
US Patent 7,783,589 B2 Inverted Index Processing, John M Hornkvist and others, filed 4 August 2006, dated 24 August 2010.
Stefan Büttcher, Charles LA Clarke and Gordon V Cormack (2010) Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, ISBN 978 0 262 52887 0.

I’m very grateful to Yogesh Khatri for correcting me about the store.db database (see comments below).

23 Comments


1

Duncan on July 30, 2025 at 11:28 am

If the ‘V’ in the name “.Spotlight-V100” stands for ‘Version’ then that implies that there were to be subsequent iterations of those folders or their structure. Yet we’re still at ‘Version 100’ after all these years.

If my assumption is not correct, then why append the ‘-V100’ at all? Why not just use “.Spotlight” and keep things simple?

- 2
  
  hoakley on July 30, 2025 at 12:59 pm
  
  Thank you, Duncan.
  I have always read that as meaning ‘version 1.00’, and it’s the same with some other locations, such as the hidden versions database. What’s striking here is that the index store inside it is named Store-V2. So maybe they’re just window dressing.
  Howard.
  
3

fds on July 30, 2025 at 3:07 pm

One aspect of Spotlight I’ve always been unsure about is database version compatibility between major macOS releases.

I have some bad memories from long ago, back around the PowerPC – Intel switch, when using the same external drive with Macs running different OS versions meant repeatedly regenerating the Spotlight index. Has that ever been improved? I simply stopped trying to do that, and don’t ever dare attach my large external drives to an older OS version any more.

As mentioned with the seemingly versioned folder or file names, it would make sense to be able to have multiple versions of Spotlight databases around on the same disk, for different OS versions. Obviously that would only waste storage if the older database version was no longer needed by the user, and tracking file system changes made in the other OS version between last mount could be tricky.

- 4
  
  hoakley on July 30, 2025 at 3:57 pm
  
  Thank you. It’s impossible to tell without knowing in detail the structure of every file in the index.
  I think the situation was very different at the time of the switch to Intel: Spotlight had only been released a year or so earlier, and was probably itself undergoing great change.
  While more recent versions of macOS have been able to index more content, such as text recovered from images by OCR, and some image recognition, there’s absolutely no reason those should have resulted in any change in index format or other incompatibilities.
  Do you have any evidence of incompatibilities arising since, say, 2010?
  Howard.
  
  - 5
    
    fds on July 30, 2025 at 4:12 pm
    
    No, not really, as far as my external drives go; but I’ve been avoiding the situation, and not even attempting to test it, as noted. Come to think of it, however, I don’t recall newer macOS releases re-indexing my external drives for a long time. Even though I’m quite sure that, as part of the now-yearly OS upgrade ritual, the internal drive does keep getting re-indexed, or at least refreshed in some way. There’s usually a progress bar post-upgrade before Spotlight is fully ready.
    
6

joethewalrus on July 30, 2025 at 6:33 pm

Now you have me deeply curious about the size of my Spotlight index, but I’m struggling to find a way to determine that. It won’t show in Finder even when I enable hidden files, and when I try to access it in the Terminal, it either tells me it’s 128 bytes, or denies me entry to the directory. I apologize if you’ve already answered this before, but please let me know if you have a solution.

- 7
  
  hoakley on July 30, 2025 at 6:43 pm
  
  Here’s one way:
  – enable hidden files
  – navigate to /System/Volumes/Macintosh HD and identify the locked folder
  – copy that to another location, e.g. another volume such as an external disk, inside another folder so it doesn’t clash with another index folder; you’ll be asked to authenticate
  – the copy will tell you how large it is, and once it’s at a new location you can remove the leading dot from its name.
  Howard.
  
  - 8
    
    joethewalrus on July 31, 2025 at 6:03 am
    
    Thanks for that. I’m able to follow that suggestion for a non-boot volume, and determine the index is impressively compact (131 kb) for the volume where I keep my virtual machines and a handful of installers.
    
    Unfortunately, it looks like neither Sequoia nor Tahoe will permit that act on the Spotlight index on the boot volume’s linked data volume. Finder won’t even show the index on the merged boot volume, and Terminal returns “operation not permitted” on sudo cp -R when performed on this Spotlight index (which it will at least show, but not operate on).
    
    I may try disabling SIP and give it another go, but my Mac is too busy for a reboot right now.
    
    - 9
      
      hoakley on July 31, 2025 at 6:06 am
      
      Well, before suggesting that, that’s exactly what I did on a Sequoia Data volume on the startup internal SSD.
      Howard
      
    - 10
      
      joethewalrus on July 31, 2025 at 6:21 am
      
      *Throws arms at the heavens*
      
      “Why does my Mac hate me!?!?!!”
      
      I’ll figure out what I’m doing wrong.
      
    - 11
      
      joethewalrus on July 31, 2025 at 6:36 am
      
      Ok, that wasn’t so hard to figure out. I don’t know what Terminal’s problem is, but in Finder the problem was, of course, me. I was looking in the root directory “/” and you were clearly telling me to go to /System/Volumes/Macintosh HD.
      
      The answer is 2.8 GB, or 6.2% of the total storage of the main volume on my main laptop, and 1.7 GB or 1.7%* of the storage on the main volume of my Mac Mini. That’s a curious discrepancy that goes to show HOW we use a Mac matters significantly in Spotlight’s index size.
      
      I’ll gladly sacrifice 6-7% of a volume for the services Spotlight provides, but I’m not sure I’m willing to give much more.
      
      *Fear not, friends, I am not living dangerously on a 1 TB SSD. It just so happens that the main container of a 2TB SSD is currently holding 1007 GB of data.
      
    - 12
      
      hoakley on July 31, 2025 at 11:15 am
      
      Well done. Those indexes are probably considerably smaller than caches or snapshots, and far more useful in the long run. Indexed content is probably the reason for the difference.
      Howard.
      
    - 13
      
      joethewalrus on July 31, 2025 at 6:41 am
      
      Adding one more thing I just thought of. It is possible, though not a hypothesis I’m willing to test in the immediate future, that the Spotlight index on the MacBook Pro is inflated due to my recent update to Tahoe, twelve hours ago.
      
      The Mac Mini will run Sequoia at least until Tahoe release day.
      
- 14
  
  kapitainsky on July 31, 2025 at 6:40 am
  
  In terminal:
  
  $ sudo du -sh /System/Volumes/Data/.Spotlight-V100
  5.4G /System/Volumes/Data/.Spotlight-V100
  
  - 15
    
    joethewalrus on July 31, 2025 at 6:51 am
    
    % sudo du -sh /System/Volumes/Data/.Spotlight-V100
    
    du: /System/Volumes/Data/.Spotlight-V100: Operation not permitted
    
    - 16
      
      kapitainsky on July 31, 2025 at 9:31 am
      
      du: /System/Volumes/Data/.Spotlight-V100: Operation not permitted
      
      It requires Terminal application you are using to have “Full Disk Access” granted in Settings -> Privacy & Security -> Full Disk Access.
      
    - 17
      
      joethewalrus on July 31, 2025 at 10:13 am
      
      Thank you. I don’t recall when I turned full disk access off or why, but I had assumed it was active and hadn’t checked it.
      
      Earlier I wrote:
      It is possible, though not a hypothesis I’m willing to test in the immediate future, that the Spotlight index on the MacBook Pro is inflated due to my recent update to Tahoe, twelve hours ago.
      
      To add evidence to the hypothesis, in the last 3–4 hours, the index has decreased in size to 1.9 GB.
      
18

Jerry Leichter on September 10, 2025 at 11:11 pm
Reply

FYI, store.db, despite its name, isn’t an sqlite3 database. At least that’s what sqlite3 will tell you if you ask it to do any operation on the file (like .tables to list the tables).

There’s also a .store.db. My best guess is that one of the files is a backup copy of the other.

- 19
  
  hoakley on September 11, 2025 at 5:32 am
  
  Thank you, Jerry.
  Howard.
  
20

Kevin D on November 25, 2025 at 11:41 pm

Question – what is the password that DB Browser is asking me for when attempting to open a copied off version of my own store.db? Thank you!

- 21
  
  hoakley on November 26, 2025 at 8:51 pm
  
  My only suggestion is that it’s your admin password, to gain elevated privileges to access that database.
  Howard.
  
22

Yogesh Khatri on December 7, 2025 at 11:30 am

Hi Howard, the store.db and .store.db files are not SQLITE databases. They have a proprietary format which I reverse engineered a few years back. A parser is available here: https://github.com/ydkhatri/spotlight_parser

Thanks for all your macOS articles. They’ve been great to read!

- 23
  
  hoakley on December 7, 2025 at 11:50 am
  
  Thank you very much indeed for that info and the link. I have amended the article with due credit.
  Howard.
  