How to check whether Spotlight is getting the right metadata

Spotlight can only search the metadata it has entered in its indexes. As I demonstrated a couple of days ago in two test cases, some metadata may be present in a file and available for indexing, yet never be added to those indexes. This normally occurs because of a problem or bug in the mdimporter responsible for extracting metadata and passing it for storage in the indexes. Fortunately, macOS provides a way to identify such omissions, using two command tools.

Commands

The first command
mdimport -t -d2 filename
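lists all the metadata recognised by the mdimporter used for that file. The -t option performs a test import without storing anything in the indexes, and -d2 prints the attributes returned. Currently, this command may crash persistently for some types of file, such as images.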

The second command
mdls filename
lists all indexed metadata for that file, and shouldn’t crash.
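
If you want to capture both outputs for the same file in one go, here is a minimal Python sketch. The function names build_commands and capture are my own, invented for illustration, and merging the error stream into the captured text is an assumption on my part, in case the debug output from mdimport doesn't arrive on standard output.

```python
import subprocess


def build_commands(path):
    """Return the two command lines used in this article for one file."""
    target = str(path)
    return {
        "mdimport": ["mdimport", "-t", "-d2", target],
        "mdls": ["mdls", target],
    }


def capture(path):
    """Run both commands (macOS only) and return their text output by name."""
    results = {}
    for name, argv in build_commands(path).items():
        # Assumption: merge stderr into stdout so debug output isn't lost.
        # check=False means a crash in mdimport doesn't raise an exception.
        done = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            check=False,
        )
        results[name] = done.stdout
    return results
```

Calling capture() with the full path to a file returns a dictionary with the two outputs as strings, ready to save or compare.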

mdimport – aspirations

Output from mdimport is a long catalogue of all the metadata attached to and associated with that file. This starts with a statement of the file examined, tells you its type as a UTI, and reveals which mdimporter was used:
Imported '/Users/hoakley/Documents/0xattrtests/testtext1.text' of type 'public.plain-text' with plugIn /System/Library/Spotlight/RichText.mdimporter.

It then tells you how many metadata attributes it found:
35 attributes returned

Those are listed, starting with those found in extended attributes, prefaced by :EA:, for example
":EA:kMDItemLastUsedDate" = "2026-05-04 18:52:32 +0000";

Then come standard file metadata:
":MD:kMDItemPath" = "/Users/hoakley/Documents/0xattrtests/testtext1.text";
"_kMDItemContentChangeDate" = "2026-05-04 11:34:50 +0000";

The main body lists all the rest, with the kMDItem prefix common to metadata attributes:
kMDItemContentCreationDate = "2026-05-04 11:34:49 +0000";

Among those are the UTI of the file, and its more general types in the UTI tree. These can explain why a file appears to have been processed by the wrong mdimporter:
kMDItemContentType = "public.plain-text";
kMDItemContentTypeTree = ("public.plain-text", "public.text", "public.data", "public.item", "public.content");

There’s a long series of entries giving the long form of the file type in multiple languages:
kMDItemKind = {en = "Plain Text Document"; };

Text content that has been indexed isn’t given in this form of the command, but a summary is:
kMDItemTextContent = "<<< Text content of 4968 characters >>>";

Those are the metadata that should then be passed to Spotlight to be stored in its indexes, but they aren’t necessarily what does get stored. To discover that, we need the mdls output. Note that additional metadata obtained by mediaanalysisd and the CGPDF Service aren’t included in the mdimport output, as they operate separately from mdimporters and normally after a significant delay.
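
To make a long mdimport listing easier to work with, here is a rough Python sketch that sorts attribute lines by their prefix. It assumes each attribute starts on its own line in the form shown above, and simply skips anything else, such as the header, the count, and the continuation lines of multi-line values. The name parse_mdimport and the label "unprefixed" are my own.

```python
import re

# Matches lines such as
#   ":EA:kMDItemLastUsedDate" = "2026-05-04 18:52:32 +0000";
#   kMDItemContentType = "public.plain-text";
LINE = re.compile(
    r'^\s*"?(?P<prefix>:[A-Z]+:)?(?P<name>_?kMDItem\w+)"?'
    r'\s*=\s*(?P<value>.*?);?\s*$'
)


def parse_mdimport(text):
    """Return {(source, name): value} for each attribute line found.

    source is "EA" or "MD" when the line carries that prefix, and
    "unprefixed" otherwise. Values are kept as raw text, quotes included.
    """
    found = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # header, attribute count, or continuation line
        source = (match.group("prefix") or "").strip(":") or "unprefixed"
        found[(source, match.group("name"))] = match.group("value")
    return found
```

Keeping the prefix as part of the key matters, because the same attribute can appear twice, once from an extended attribute and once from the file data.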

mdls – reality

This output is far shorter, and contains the entries in Spotlight’s indexes for that file, except for indexed text content. The only way to assess whether text content has been indexed is by searching for text the file should contain.

This should match metadata attributes seen in the mdimport output, such as:
_kMDItemDisplayNameWithExtensions = "testtext1.text"
kMDItemContentCreationDate = 2026-05-04 11:34:49 +0000
kMDItemContentType = "public.plain-text"
kMDItemKind = "Plain Text Document"
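
As mdls doesn't report indexed text content, one way to check it is to search with mdfind for a phrase the file is known to contain. The following Python sketch is only an outline: the function names are my own, it expects the full path to the file, and a plain mdfind query can also match on metadata such as the file name, so pick a phrase that only occurs in the text.

```python
import subprocess
from pathlib import Path


def mdfind_command(folder, phrase):
    """Build an mdfind search limited to one folder."""
    return ["mdfind", "-onlyin", str(folder), phrase]


def listed(path, mdfind_output):
    """True if path is one of the result lines printed by mdfind."""
    wanted = str(path)
    return any(line.strip() == wanted for line in mdfind_output.splitlines())


def text_is_indexed(path, phrase):
    """Search the file's own folder (macOS only) for the phrase."""
    path = Path(path)
    done = subprocess.run(
        mdfind_command(path.parent, phrase),
        capture_output=True,
        text=True,
        check=False,
    )
    return listed(path, done.stdout)
```

A result of False doesn't prove a bug, as indexing may simply not have caught up with a recently changed file.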

Examples

Plain text file with extended attributes

mdimport:
":EA:kMDItemAuthors" = "Andy Bill Charlie";
":EA:kMDItemComment" = "A. regular comment.";
":EA:kMDItemDescription" = "A description.";
":EA:kMDItemKeywords" = "keyword1,ketwird2,keyword3";
":EA:kMDItemSubject" = "The subject.";

mdls:
kMDItemAuthors = ("Andy Bill Charlie")
kMDItemComment = "A. regular comment."
kMDItemDescription = "A description."
kMDItemKeywords = ("keyword1,ketwird2,keyword3")
kMDItemSubject = "The subject."

Metadata attributes were faithfully added to Spotlight’s indexes.

RTF file with extended attributes

mdimport:
":EA:kMDItemAuthors" = "Andy Bill Charlie";
":EA:kMDItemComment" = "A. regular comment.";
":EA:kMDItemDescription" = "A description.";
":EA:kMDItemKeywords" = "keyword1,ketwird2,keyword3";
":EA:kMDItemSubject" = "The subject.";
kMDItemAuthors = "<null>";
kMDItemComment = "<null>";
kMDItemKeywords = "<null>";
kMDItemSubject = "<null>";

The last four are those obtained from the (absent) Info metadata embedded in the file data, and conflict with those from four of the extended attributes.

mdls:
kMDItemComment = "A. regular comment."
kMDItemDescription = "A description."
kMDItemKeywords = ("keyword1,ketwird2,keyword3")
kMDItemSubject = "The subject."

These reveal that Spotlight’s indexes captured four of the five extended attributes, and ignored the null values for the Info metadata. However, kMDItemAuthors is missing, presumably because of a bug in the mdimporter.
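
The same comparison can be sketched in Python, using the RTF excerpts above retyped with plain quotation marks. This is only an illustration under simple assumptions: it looks at attribute names alone, treats a value of "<null>" or (null) as absent, and the function names are my own.

```python
import re

# Captures the attribute name (ignoring any :EA: or :MD: prefix) and its value.
KEY = re.compile(r'^\s*"?(?::[A-Z]+:)?(_?kMDItem\w+)"?\s*=\s*(.*?);?\s*$')
NULLS = {"<null>", "(null)"}


def non_null_keys(text):
    """Names of attributes that appear with a value other than null."""
    keys = set()
    for line in text.splitlines():
        match = KEY.match(line)
        if match and match.group(2).strip('"') not in NULLS:
            keys.add(match.group(1))
    return keys


def missing_from_index(mdimport_text, mdls_text):
    """Attributes the mdimporter returned that mdls doesn't list."""
    return sorted(non_null_keys(mdimport_text) - non_null_keys(mdls_text))


MDIMPORT_RTF = """
":EA:kMDItemAuthors" = "Andy Bill Charlie";
":EA:kMDItemComment" = "A. regular comment.";
":EA:kMDItemDescription" = "A description.";
":EA:kMDItemKeywords" = "keyword1,ketwird2,keyword3";
":EA:kMDItemSubject" = "The subject.";
kMDItemAuthors = "<null>";
kMDItemComment = "<null>";
kMDItemKeywords = "<null>";
kMDItemSubject = "<null>";
"""

MDLS_RTF = """
kMDItemComment = "A. regular comment."
kMDItemDescription = "A description."
kMDItemKeywords = ("keyword1,ketwird2,keyword3")
kMDItemSubject = "The subject."
"""

print(missing_from_index(MDIMPORT_RTF, MDLS_RTF))  # prints ['kMDItemAuthors']
```

With complete outputs rather than these excerpts, the list will be longer, as mdls doesn't report everything that mdimport does, text content for example. Treat each name as a candidate to check by eye rather than as proof of a bug.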

I’m considering whether it might be useful to add these to SpotTest, to help diagnose problems.