The Eclectic Light Company

hoakley July 30, 2025 Macs, Technology

A deeper dive into Spotlight indexes

From its inception Spotlight was designed to encompass multiple types of search, among them searching the metadata and contents of local files, as detailed in Apple’s patent by Yan Arrouye and Keith Mortensen filed in 2000 (see References at the end), five years before Spotlight was released. Although documentation of local search is limited, a series of patents awarded to Apple provides deeper insights. At the heart of local search are hidden index and supporting files in each volume’s .Spotlight-V100 folder, which is served by the mds daemon and its helpers.

Indexing

When any file in a watched folder is created or saved, Spotlight re-indexes that file for the volume’s Spotlight indexes. The process runs like this:

  1. The file change is recorded in that volume’s FSEvents database, in the volume’s .fseventsd folder.
  2. FSEvents notifies Spotlight that a file has been created or changed, prompting Spotlight to re-index the content and metadata of the file(s) concerned.
  3. One of the multiple copies of the mdworker daemon checks the type (UTI) of the changed file, and locates the appropriate mdimporter plugin bundle for that type. In the case of Rich Text files, this is /System/Library/Spotlight/RichText.mdimporter, for example. Additional plugins can be found in /Library/Spotlight/ and sometimes ~/Library/Spotlight/, but most app-specific plugins are now stored in the Library/Spotlight folder inside that app.
  4. The mdworker uses the mdimporter code to generate indexed content for the changed file.
  5. That indexed content is then added to the Spotlight files in the volume’s .Spotlight-V100 folder, for use in future searches.

That series of steps is usually completed within a second or two of the file being created or edited, and both metadata and content are available to search shortly afterwards.
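
Steps 3 to 5 above can be pictured as a simple dispatch from file type to importer. The following is only a toy model in Python, not how mdworker is implemented: the registry, the extension lookup, the importer functions and the dictionary used as a store are all hypothetical stand-ins, and file data is treated as a plain string for brevity.

```python
import os
from dataclasses import dataclass

@dataclass
class IndexedItem:
    path: str
    attributes: dict
    text: str

def import_rich_text(path, data):
    """Stand-in for an importer plugin: returns (attributes, text)."""
    return {"kind": "Rich Text"}, data

def import_plain_text(path, data):
    """Stand-in for a second importer plugin."""
    return {"kind": "Plain Text"}, data

# Hypothetical lookup from file extension to type identifier (UTI).
UTI_BY_EXTENSION = {".rtf": "public.rtf", ".txt": "public.plain-text"}

# Hypothetical registry from UTI to the importer that handles it.
IMPORTERS = {
    "public.rtf": import_rich_text,
    "public.plain-text": import_plain_text,
}

def handle_change(path, data, store):
    """Find the type, pick an importer, and add its output to the store."""
    extension = os.path.splitext(path)[1].lower()
    uti = UTI_BY_EXTENSION.get(extension)
    importer = IMPORTERS.get(uti)
    if importer is None:
        return None  # no importer for this type in the toy model
    attributes, text = importer(path, data)
    item = IndexedItem(path, attributes, text)
    store[path] = item
    return item

store = {}
item = handle_change("/tmp/notes.rtf", "Spotlight indexes", store)
skipped = handle_change("/tmp/notes.xyz", "ignored", store)
```

In this model a file whose type has no importer is simply left out of the store, which is a simplification made for the sketch.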

Content extracted from each file that’s indexed by an mdworker process includes:

  • file attributes, such as datestamps;
  • extended attributes, stored in the file system metadata; these include keywords and copyright information, where provided;
  • structured metadata from the main data in the file, as specified by that file type’s mdimporter plugin; examples include EXIF data;
  • content, normally text, exported from the main data of the file, again using the mdimporter plugin.

Evidence from multiple patents shows that file metadata and content are indexed separately. Content appears to go through a conventional processing sequence:

  • text content is divided into tokens at word boundaries, and the most frequent words, such as ‘the’, may be eliminated as stop words;
  • a stemmer may be used to derive word stems, and prefixes may also be separated;
  • an indexer generates an inverted index.
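
That sequence can be sketched in a few lines of Python. This is a generic illustration of conventional processing rather than Spotlight’s own code: the stop word list is made up, and crude_stem is a deliberately rough stand-in for a real stemmer.

```python
import re
from collections import defaultdict

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}

def tokenise(text):
    """Split text into lower-case tokens at simple word boundaries."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def build_index(documents):
    """Return an inverted index mapping each stem to sorted document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenise(text):
            if token in STOP_WORDS:
                continue
            index[crude_stem(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "The lights are shining", 2: "Painting the light"}
index = build_index(docs)
```

Because both lights and light reduce to the same stem, a search for either would find both documents, while the stop word the never reaches the index.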

Tokenisation of file names uses rules for word boundaries laid down in the International Components for Unicode. In practice, word boundaries include a space, the underscore _, hyphen - and changes of case used in CamelCase. At least in file names, Spotlight treats each of the following examples as three words:
one target two
one_target_two
one-target-two
OneTargetTwo

Languages other than English may allow other word boundaries, but those are the most common.
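
Those four boundaries can be approximated with two regular expressions. This Python sketch is only an ASCII approximation of the behaviour described above, not the ICU rules themselves, and split_name is a hypothetical helper name.

```python
import re

def split_name(name):
    """Split a file name at spaces, underscores, hyphens and case changes."""
    # Insert a space wherever a lower-case letter is followed by upper-case.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    # Then split on runs of whitespace, underscores and hyphens.
    return [word.lower() for word in re.split(r"[\s_\-]+", spaced) if word]

examples = ["one target two", "one_target_two", "one-target-two", "OneTargetTwo"]
results = [split_name(example) for example in examples]
```

Each of the four examples yields the same three words, one, target and two.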

Indexes

A volume’s hidden index folder contains the store itself, in a folder named Store-V2, and VolumeConfiguration.plist, a standard property list containing several dictionaries:

  • Annotations, a large dictionary containing Creation_Predicates, another dictionary with extensive settings
  • datestamps of creation and modification, with version numbers
  • Exclusions, an array of excluded paths
  • Options, a brief dictionary
  • Stores, with the UUID of the store directory, datestamps, versions and other details.
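
As it’s a standard property list, VolumeConfiguration.plist can be read with any plist parser. The Python sketch below uses plistlib on a made-up sample rather than a real file, and assumes that Stores is a dictionary keyed by the store’s UUID; that layout, the sample path and the all-zero UUID are assumptions for illustration. Reading the real file is likely to need elevated privileges and Full Disk Access.

```python
import plistlib

def summarise_volume_config(plist_bytes):
    """Return the excluded paths and store UUIDs from a configuration plist."""
    config = plistlib.loads(plist_bytes)
    return {
        "exclusions": list(config.get("Exclusions", [])),
        # Assumption: Stores is a dictionary whose keys are store UUIDs.
        "store_uuids": sorted(config.get("Stores", {})),
    }

# Made-up sample data standing in for a real VolumeConfiguration.plist.
sample = plistlib.dumps({
    "Exclusions": ["/Volumes/Example/Private"],
    "Options": {},
    "Stores": {"00000000-0000-0000-0000-000000000000": {}},
})
summary = summarise_volume_config(sample)
```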

The store directory is named using the UUID given in VolumeConfiguration.plist, and holds around 99 files and folders containing that volume’s store. Of those, store.db uses a proprietary format that has been reverse-engineered by Yogesh Khatri, who provides a parser at https://github.com/ydkhatri/spotlight_parser. That file is relatively small, as the dictionaries, indexes and postings are contained in the many other files. The diagram below outlines this structure and lists some of the contents of the store.

Items shown in blue are directories, those in red are most likely to change soon after files are changed, and the ellipsis … after a name indicates there are multiple items with that as a prefix. Note that, while Core Spotlight has its own journal directory, other files don’t appear to separate its indexes.

Inverted indexes

Spotlight’s indexes are based on what is known by convention as the inverted index. At its most basic, this consists of a dictionary together with a posting list for each of the tokens in that dictionary. Posting lists reference the location of each occurrence of that token in the documents that have been indexed.

For example, suppose the token light has been obtained by tokenisation of a text file. For that token in the dictionary, there will be a postings list identifying where that token was found. There are different conventions as to how those posting lists work, and whether they include separate document identifiers.
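
One of those conventions, in which each posting records a document identifier together with the positions of the token within that document, might look like this in Python. It’s a generic textbook layout chosen for illustration, not the format Spotlight uses, and the file names are invented.

```python
from collections import defaultdict

def build_positional_index(documents):
    """Map each token to {document id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(position)
    return index

docs = {"a.txt": "light and shade", "b.txt": "the light of the light"}
index = build_positional_index(docs)
light_postings = index["light"]
```

Here the postings for light show it as the first word of a.txt, and as the second and fifth words of b.txt, counting positions from zero.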

Apple’s patents include several elaborations of basic inverted indexes. Hornkvist and others describe a two-level inverted indexing table with a live index, together with an annotated postings list, with update sets and multiple index files with deltas. The two-level table keeps frequent tokens in a small table that is optimised for updates, and less common tokens in a larger table optimised for searching rather than updating.
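
The idea behind the two-level table can be shown with a much simplified Python sketch. It doesn’t reproduce the structures in the patent: here the set of frequent tokens is fixed in advance, the small table just appends postings in arrival order, and the large table keeps them sorted, all of which are assumptions made for illustration.

```python
import bisect

class TwoLevelIndex:
    """Toy index with an update-friendly table and a search-friendly one."""

    def __init__(self, frequent_tokens):
        self.frequent_tokens = set(frequent_tokens)
        self.small = {}  # frequent tokens: append-only, cheap to update
        self.large = {}  # other tokens: kept sorted, cheap to search

    def add(self, token, doc_id):
        if token in self.frequent_tokens:
            self.small.setdefault(token, []).append(doc_id)
            return
        postings = self.large.setdefault(token, [])
        i = bisect.bisect_left(postings, doc_id)
        if i == len(postings) or postings[i] != doc_id:
            postings.insert(i, doc_id)

    def lookup(self, token):
        if token in self.frequent_tokens:
            # Sorting and de-duplication are deferred until search time.
            return sorted(set(self.small.get(token, [])))
        return list(self.large.get(token, []))

idx = TwoLevelIndex({"the"})
for token, doc_id in [("the", 3), ("the", 1), ("the", 3),
                      ("spotlight", 5), ("spotlight", 2)]:
    idx.add(token, doc_id)
```

The trade-off is that frequent tokens cost almost nothing to add but need tidying when searched, while less common tokens cost more to add but are ready for searching.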

Sachs and Sagotsky describe a collocation index constructed from an inverted index by determining distances between the occurrence of tokens in posting lists. Those that fall within a specified threshold are then added to the collocation index.
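
A collocation index of that kind can be derived from positional postings as in the following Python sketch. The postings, the threshold of two positions and the function name are all made up for the example, and measuring distance in word positions is an assumption rather than a detail taken from the patent.

```python
from itertools import combinations

def collocations(positional_index, threshold):
    """Return {(token_a, token_b): [doc ids]} where the tokens occur close together."""
    result = {}
    for a, b in combinations(sorted(positional_index), 2):
        shared_docs = set(positional_index[a]) & set(positional_index[b])
        for doc_id in sorted(shared_docs):
            close = any(
                abs(pos_a - pos_b) <= threshold
                for pos_a in positional_index[a][doc_id]
                for pos_b in positional_index[b][doc_id]
            )
            if close:
                result.setdefault((a, b), []).append(doc_id)
    return result

# Hypothetical postings: token -> {doc id: [word positions]}.
postings = {
    "eclectic": {1: [0], 2: [0]},
    "light": {1: [1], 2: [9]},
    "company": {1: [2]},
}
pairs = collocations(postings, threshold=2)
```

In document 2 the tokens eclectic and light are nine positions apart, so only document 1 is recorded for that pair.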

Changing indexes

Most inverted index systems are largely static, but Spotlight’s have to accommodate constant change as files are altered and saved, new files are created, and others are deleted. To enable the main inverted indexes to remain well-structured and efficient, Spotlight stores appear to use separate transient posting tables to store recently acquired metadata and content. Periodically data from those is incorporated into more static tables. Similarly, when files are deleted their indexed metadata and contents aren’t removed immediately, but when the store next undergoes housekeeping.

That housekeeping is likely to explain sustained periods of activity of mds and its helpers, for example in the minutes after startup, although this is difficult to establish, as that activity isn’t accompanied by informative entries in the log.
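
The combination of transient tables, deferred deletion and periodic housekeeping can be modelled in a few lines of Python. This is a sketch of the general technique under simple assumptions, not of Spotlight’s stores: the class and method names are hypothetical, postings are plain sets of document ids, and ids are assumed never to be reused.

```python
class ChangingIndex:
    """Toy index with a static table, a transient table and deferred deletes."""

    def __init__(self):
        self.static = {}      # token -> doc ids, changed only by housekeeping
        self.transient = {}   # token -> doc ids added since last housekeeping
        self.deleted = set()  # doc ids awaiting removal

    def add(self, doc_id, tokens):
        for token in tokens:
            self.transient.setdefault(token, set()).add(doc_id)

    def delete(self, doc_id):
        # Nothing is removed yet; the id is only marked as deleted.
        self.deleted.add(doc_id)

    def search(self, token):
        hits = self.static.get(token, set()) | self.transient.get(token, set())
        return sorted(hits - self.deleted)

    def housekeeping(self):
        # Fold recent additions into the static table.
        for token, ids in self.transient.items():
            self.static.setdefault(token, set()).update(ids)
        self.transient.clear()
        # Only now are postings for deleted documents really removed.
        for token in list(self.static):
            self.static[token] -= self.deleted
            if not self.static[token]:
                del self.static[token]
        self.deleted.clear()

idx = ChangingIndex()
idx.add(1, ["light"])
idx.add(2, ["light", "shade"])
idx.delete(1)
before = idx.search("light")
idx.housekeeping()
after = idx.search("light")
```

Searches give the same answer before and after housekeeping; what changes is where the postings are held, and that the static table is only rewritten in occasional batches.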

Summary

  • Each volume has its own hidden, top-level .Spotlight-V100 folder containing Spotlight indexes for the contents of that volume.
  • When files change, mdworker processes extract metadata and contents for indexing, using the mdimporter plugin for that file type.
  • Metadata and content appear to be indexed separately.
  • Text content is tokenised and filtered using stop words and may be further processed for stems and prefixes.
  • Inverted indexes are used, with entries in a dictionary having a postings list specifying the locations of each occurrence.
  • More elaborate inverted indexes may be used, separating frequent tokens from those less common.
  • Indexes are designed to cope with frequent changes, only incorporating those into more static tables during periodic housekeeping.
  • Local Spotlight indexes and indexing are complicated and almost entirely undocumented.

References

US Patent 6,847,959 B1 Universal Interface for Retrieval of Information in a Computer System, Yan Arrouye and Keith Mortensen, filed 5 January 2000, dated 25 January 2005.
US Patent 7,698,328 B2 User-Directed Search Refinement, Matthew G Sachs and Jonathan A Sagotsky, filed 11 August 2006, dated 13 April 2010.
US Patent 7,783,589 B2 Inverted Index Processing, John M Hornkvist and others, filed 4 August 2006, dated 24 August 2010.
Stefan Büttcher, Charles LA Clarke and Gordon V Cormack (2010) Information Retrieval, Implementing and Evaluating Search Engines, MIT Press, ISBN 978 0 262 52887 0.

I’m very grateful to Yogesh Khatri for correcting me about the store.db database (see comments below).

Posted in Macs, Technology and tagged Apple, index, inverted index, mdimporter, mds_stores, mdworker, metadata, patents, Spotlight. Bookmark the permalink.

23 Comments
  1. 1
    Duncan on July 30, 2025 at 11:28 am

    If the ‘V’ in the name “.Spotlight-V100” stands for ‘Version’ then that implies that there were to be subsequent iterations of those folders or their structure. Yet we’re still at ‘Version 100’ after all these years.

    If my assumption is not correct, then why append the ‘-V100’ at all? Why not just use “.Spotlight” and keep things simple?

    • 2
      hoakley on July 30, 2025 at 12:59 pm

      Thank you, Duncan.
      I have always read that as meaning ‘version 1.00’, and it’s the same with some other locations, such as the hidden versions database. What’s striking here is that the index store inside it is named Store-V2. So maybe they’re just window dressing.
      Howard.

  2. 3
    fds on July 30, 2025 at 3:07 pm

    One aspect of Spotlight I’ve always been unsure about is database version compatibility between major macOS releases.

    I have some bad memories from long ago, back around the PowerPC – Intel switch, when attempting to use the same external drive with Macs running different OS versions kept needing to regenerate the Spotlight index. Has that ever been improved? I simply stopped trying to do that, and don’t ever dare attach my large external drives to an older OS version any more.

    As mentioned with the seemingly versioned folder or file names, it would make sense to be able to have multiple versions of Spotlight databases around on the same disk, for different OS versions. Obviously that would only waste storage if the older database version was no longer needed by the user, and tracking file system changes made in the other OS version between last mount could be tricky.

    • 4
      hoakley on July 30, 2025 at 3:57 pm

      Thank you. It’s impossible to tell without knowing in detail the structure of every file in the index.
      I think the situation was very different at the time of the switch to Intel: Spotlight had only been released a year or so earlier, and was probably itself undergoing great change.
      While more recent versions of macOS have been able to index more content, such as text recovered from images by OCR, and some image recognition, there’s absolutely no reason those should have resulted in any change in index format or other incompatibilities.
      Do you have any evidence of incompatibilities arising since, say, 2010?
      Howard.

      • 5
        fds on July 30, 2025 at 4:12 pm

        No, not really, as far as my external drives go; but I’ve been avoiding the situation, and not even attempting to test it, as noted. Come to think of it, however, I don’t recall newer macOS releases re-indexing my external drives for a long time. Even though I’m quite sure that, as part of the now-yearly OS upgrade ritual, the internal drive does keep getting re-indexed, or at least refreshed in some way. There’s usually a progress bar post-upgrade before Spotlight is fully ready.

  3. 6
    joethewalrus on July 30, 2025 at 6:33 pm

    Now you have me deeply curious about the size of my Spotlight index, but I’m struggling to find a way to determine that. It won’t show in Finder even when I enable hidden files, and when I try to access it in the Terminal, it either tells me it’s 128 bytes, or denies me entry to the directory. I apologize if you’ve already answered this before, but please let me know if you have a solution.

    • 7
      hoakley on July 30, 2025 at 6:43 pm

      Here’s one way:
      – enable hidden files
      – navigate to /System/Volumes/Macintosh HD and identify the locked folder
      – copy that to another location, e.g. another volume such as an external disk, inside another folder so it doesn’t clash with another index folder; you’ll be asked to authenticate
      – the copy will tell you how large it is, and once it’s at a new location you can remove the leading dot from its name.
      Howard.

      • 8
        joethewalrus on July 31, 2025 at 6:03 am

        Thanks for that. I’m able to follow that suggestion for a non-boot volume, and determine the index is impressively compact (131 kb) for the volume where I keep my virtual machines and a handful of installers.

        Unfortunately, it looks like neither Sequoia nor Tahoe will permit that act on the Spotlight index on the boot volume’s linked data volume. Finder won’t even show the index on the merged boot volume, and Terminal returns “operation not permitted” on sudo cp -R when performed on this Spotlight index (which it will at least show, but not operate on).

        I may try disabling SIP and give it another go, but my Mac is too busy for a reboot right now.

        • 9
          hoakley on July 31, 2025 at 6:06 am

          Well, before suggesting that, that’s exactly what I did on a Sequoia Data volume on the startup internal SSD.
          Howard

        • 10
          joethewalrus on July 31, 2025 at 6:21 am

          *Throws arms at the heavens*

          “Why does my Mac hate me!?!?!!”

          I’ll figure out what I’m doing wrong.

        • 11
          joethewalrus on July 31, 2025 at 6:36 am

          Ok, that wasn’t so hard to figure out. I don’t know what Terminal’s problem is, but in Finder the problem was, of course, me. I was looking in the root directory “/” and you were clearly telling me to go to /System/Volumes/Macintosh HD.

          The answer is 2.8 GB, or 6.2% of the total storage of the main volume on my main laptop, and 1.7 GB, or 1.7%*, of the storage on the main volume of my Mac Mini. That’s a curious discrepancy that goes to show HOW we use a Mac matters significantly in Spotlight’s index size.

          I’ll gladly sacrifice 6-7% of a volume for the services Spotlight provides, but I’m not sure I’m willing to give much more.

          *Fear not, friends, I am not living dangerously on a 1 TB SSD. It just so happens that the main container of a 2TB SSD is currently holding 1007 GB of data.

        • 12
          hoakley on July 31, 2025 at 11:15 am

          Well done. Those indexes are probably considerably smaller than caches or snapshots, and far more useful in the long run. Indexed content is probably the reason for the difference.
          Howard.

        • 13
          joethewalrus on July 31, 2025 at 6:41 am

          Adding one more thing I just thought of. It is possible, though not a hypothesis I’m willing to test in the immediate future, that the Spotlight index on the MacBook Pro is inflated due to my recent update to Tahoe, twelve hours ago.

          The Mac Mini will run Sequoia at least until Tahoe release day.

    • 14
      kapitainsky on July 31, 2025 at 6:40 am

      In terminal:

      $ sudo du -sh /System/Volumes/Data/.Spotlight-V100
      5.4G /System/Volumes/Data/.Spotlight-V100

      • 15
        joethewalrus on July 31, 2025 at 6:51 am

        % sudo du -sh /System/Volumes/Data/.Spotlight-V100

        du: /System/Volumes/Data/.Spotlight-V100: Operation not permitted

        • 16
          kapitainsky on July 31, 2025 at 9:31 am

          du: /System/Volumes/Data/.Spotlight-V100: Operation not permitted

          It requires the Terminal application you are using to have “Full Disk Access” granted in Settings -> Privacy & Security -> Full Disk Access.

        • 17
          joethewalrus on July 31, 2025 at 10:13 am

          Thank you. I don’t recall when I turned full disk access off or why, but I had assumed it was active and hadn’t checked it.

          Earlier I wrote:
          It is possible, though not a hypothesis I’m willing to test in the immediate future, that the Spotlight index on the MacBook Pro is inflated due to my recent update to Tahoe, twelve hours ago.

          To add evidence to the hypothesis, in the last 3–4 hours, the index has decreased in size to 1.9 GB.

  4. 18
    Jerry Leichter on September 10, 2025 at 11:11 pm

    FYI, store.db, despite its name, isn’t an sqlite3 database. At least that’s what sqlite3 will tell you if you ask it to do any operation on the file (like .tables to list the tables).

    There’s also a .store.db. My best guess is that one of the files is a backup copy of the other.

    • 19
      hoakley on September 11, 2025 at 5:32 am

      Thank you, Jerry.
      Howard.

  5. 20
    Kevin D on November 25, 2025 at 11:41 pm

    Question – what is the password that DB Browser is asking me for when attempting to open a copied off version of my own store.db? Thank you!

    • 21
      hoakley on November 26, 2025 at 8:51 pm

      My only suggestion is that it’s your admin password, to gain elevated privileges to access that database.
      Howard.

  6. 22
    Yogesh Khatri on December 7, 2025 at 11:30 am

    Hi Howard, the store.db and .store.db files are not SQLITE databases. They have a proprietary format which I reverse engineered a few years back. A parser is available here: https://github.com/ydkhatri/spotlight_parser

    Thanks for all your macOS articles. They’ve been great to read!

    • 23
      hoakley on December 7, 2025 at 11:50 am

      Thank you very much indeed for that info and the link. I have amended the article with due credit.
      Howard.
