How QuickLook Preview doesn’t tell Apple about images

Since I published my detailed explanation of why there are reports that macOS might appear to send Apple details of images you browse in the Finder, rumour has continued unabated. Most recently, while agreeing with my conclusions, @mysk has claimed this was the result of a bug that has been fixed in macOS 13.2. This article looks at just what does happen in Live Text, and why it contacts Apple.

What do we know?

The story so far is of a single claim that, when browsing images in Ventura’s Finder, mediaanalysisd tried to make an outgoing connection to an Apple server, as revealed by the software firewall Little Snitch. After replicating that using QuickLook Preview, I demonstrated that the outgoing connection was made during image analysis for Live Text. Thus the facts are:

  • Simply browsing images in the Finder doesn’t elicit any outgoing network connections.
  • However, opening an image in QuickLook Preview (pressing the Spacebar) can instigate an outgoing network connection.
  • Opening an image in QuickLook Preview can trigger the OCR process in Live Text, but not Visual Look Up.
  • Live Text doesn’t compute any neural hashes for an image, but analyses the image for possible text content, which is different.
  • Any outgoing network connection during Live Text OCR therefore cannot send Apple any identifiers that could be used to check an image for CSAM or any other content.
  • Thus, the claim that browsing images in the Finder could be used to check for CSAM has no factual support.

This is fully consistent with Apple’s most recent statement on checking images for CSAM, as reported late last year in Wired, but carefully omitted by those spreading rumours. Specifically, Apple is there reported as stating:
“We have further decided to not move forward with our previously proposed CSAM detection tool for iCloud Photos. Children can be protected without companies combing through personal data, and we will continue working with governments, child advocates, and other companies to help protect young people, preserve their right to privacy, and make the internet a safer place for children and for us all.”

Despite this, some still claim that the outgoing connection during Live Text OCR somehow provides information about an image to Apple. This article therefore analyses logs obtained from this event immediately after updating a virtual machine to macOS 13.2.

Image analysis

When you press the Spacebar, QuickLook recognises that you want to display a preview in a floating window, which starts this process.
7.244339 com.apple.quicklook previewView:<private> didShowPreviewItem:<private>

Because this is a full preview rather than a Finder thumbnail, Live Text is supported, but not Visual Look Up. VisionKit gets to work starting image analysis.
7.776088 com.apple.VisionKit BEGIN "VKImageAnalyzerProcessRequestEvent"
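
VisionKit’s public ImageAnalyzer API (macOS 13 and later) is the app-facing counterpart of this machinery. As a rough sketch only, assuming the documented behaviour of that API rather than anything QuickLook itself does, asking for text analysis alone looks like this:
import AppKit
import ImageIO
import VisionKit

// A minimal sketch using the public ImageAnalyzer API (macOS 13+).
// The image URL is supplied by the caller; error handling is kept minimal.
@MainActor
func recogniseText(in url: URL) async throws -> String? {
    guard ImageAnalyzer.isSupported,
          let nsImage = NSImage(contentsOf: url),
          let cgImage = nsImage.cgImage(forProposedRect: nil, context: nil, hints: nil)
    else { return nil }

    // Ask for Live Text only, not Visual Look Up.
    let configuration = ImageAnalyzer.Configuration([.text])
    let analyzer = ImageAnalyzer()
    let analysis = try await analyzer.analyze(cgImage, orientation: .up,
                                              configuration: configuration)
    // The transcript is the recognised text, if any.
    return analysis.transcript
}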

For this, mediaanalysisd is spawned, set up for XPC access, and passed the request for image analysis.
7.801901 com.apple.mediaanalysisd internal event: WILL_SPAWN, code = 0
7.805015 com.apple.mediaanalysisd Successfully spawned mediaanalysisd[1605] because ipc (mach)
7.824079 mediaanalysisd Received on-demand image processing request (CVPixelBuffer) with MADRequestID 1

Because this is being run in real-time, it’s given a high QoS.
7.827881 mediaanalysisd Run <private> (1) [QoS: 25 Cost: 1.000]; remaining resource: 0.00
7.828544 mediaanalysisd VCPMADVIDocumentRecognitionTask running...
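
The QoS value of 25 shown there appears to correspond to the user-initiated class. A quick check in Swift (this only assumes libdispatch’s public constants, nothing specific to mediaanalysisd) confirms the raw value:
import Dispatch

// QOS_CLASS_USER_INITIATED has the raw value 0x19, i.e. 25,
// which appears to match the "QoS: 25" in the log entry above.
print(QOS_CLASS_USER_INITIATED.rawValue)  // prints 25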

CoreML and Espresso then get going with the analysis using neural networks. A small sample of many similar log entries is given here.
7.838437 com.apple.espresso Creating context 5043897904 eng=5 dev=-3
7.838482 com.apple.espresso Creating plan 5033208320
7.848279 com.apple.espresso espresso_plan_add_network plan=5033208320 path=<private> cp=65552 Completed
7.851370 com.apple.coreml <private> class has successfully loaded the model at <private>.
8.334736 com.apple.espresso [change_input_shapes] index=0/1 name=<private> w=200 h=32 k=1 n=4 seq=1
8.395860 com.apple.espresso espresso_plan_add_network plan=5033345536 path=<private> cp=65552 Completed

Note the time gap of nearly 0.5 seconds, during which image analysis is taking place using neural networks.
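
Those private models aren’t accessible directly, but the Vision framework offers the same on-device text recognition to any app. Here’s a minimal sketch, assuming an image URL supplied by the caller, using the two recognition languages that appear later in these logs:
import Vision

// A sketch of on-device OCR using the public Vision API.
// The image URL is supplied by the caller; all processing stays on the Mac.
func recogniseText(at imageURL: URL) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.recognitionLanguages = ["en", "ja"]   // the locales queried later in these logs
    request.usesLanguageCorrection = true         // engages the language-model stage

    let handler = VNImageRequestHandler(url: imageURL, options: [:])
    try handler.perform([request])

    // Take the best candidate string from each detected text region.
    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}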

Language modelling

This doesn’t generate any neural hashes, but if it’s successful in discovering what may be text within the image, LanguageModeling is initiated to decipher that text. Again, this is a small sample of many similar entries.
8.396966 com.apple.LanguageModeling Options is updating <private> from 1 to 0
8.397006 com.apple.LanguageModeling Creating CompositeLanguageModel (<private>):
<private>
8.398759 com.apple.LanguageModeling NgramModel: Loaded language model: <private>
8.399817 com.apple.Lexicon Lexicon <private>

Linguistic data

It’s at this early stage that macOS checks the currency of local linguistic data to use in text recognition. Significantly, the client is given as mediaanalysisd, which may well be identified as the parent process by a software firewall.
8.403145 com.apple.mobileassetd Creating client/daemon connection: 2A0DE96A-A3AD-4A5B-BD0A-4D3B99BC8F91
8.403240 com.apple.mobileassetd -[ControlManager handleClientConnection:on:]_block_invoke: assetType: com.apple.MobileAsset.LinguisticData client: mediaanalysisd, command: 1 (MA_QUERY_ASSET_TYPE)

This relies on a catalogue kept in /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/com_apple_MobileAsset_LinguisticData.xml, with numerous .asset files in sub-folders. If you care to look there, you will see the large number of such files. On one of my Macs there are more than 90 folders, each containing a further hierarchy of language-specific files.
8.403331 com.apple.mobileassetd -[ControlManager determineAssets:clientName:connection:downloadingTasks:message:resultTypes:queryArray:isForSpecificAsset:specificAssetId:specificAllowedDifferences:]: mediaanalysisd queried for: com.apple.MobileAsset.LinguisticData with returnType of: 2 with Purpose: (null)
8.403415 com.apple.mobileassetd -[ControlManager newCatalogLoad:withPurpose:]: Catalog fileLocation: /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/com_apple_MobileAsset_LinguisticData.xml
8.454579 com.apple.mobileassetd dataFillInstalledWithPurpose: Path to asset dir: /System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData/495dbc280e6494f8635c3f2e7797e0ccf53546d3.asset
8.454627 com.apple.mobileassetd -[ControlManager determineAssets:clientName:connection:downloadingTasks:message:resultTypes:queryArray:isForSpecificAsset:specificAssetId:specificAllowedDifferences:]: mediaanalysisd queried for: com.apple.MobileAsset.LinguisticData with returnTypes 2 (MAUnionOfCatalogInstalled) and found 0 assets with result 0 (MAQuerySuccessful) --> From 743 listed in the catalog and 1 local (1/1 downloaded, 0 preinstalled)--> Catalog info ({ isLiveServer = 0; }) --> Filtered for MAUnionOfCatalogInstalled to 0 in catalog (0 installed, 0 server-only, 0 preinstalled), 0 installedNotInCatalog, 0 installedWithOS, 0 requiredByOS; the query params are: [ AssetLocale:'en' and AssetType:'Delta' and _CompatibilityVersion:'11' and _SupportedPlatforms:'macOS'] --> Merged to 0 assets
8.454970 com.apple.DataDeliveryServices assetsForQuery: <query: com.apple.MobileAsset.LinguisticData, locO: 1, iO: 1, latO: 1, <filter: {
AssetLocale = "{(\n en\n)}";
AssetType = "{(\n Delta\n)}";
}>> final result: (
) was cached: 0, cachedOnly: 0

A similar sequence adds another language.
8.506237 com.apple.DataDeliveryServices assetsForQuery: <query: com.apple.MobileAsset.LinguisticData, locO: 1, iO: 1, latO: 1, <filter: {
AssetLocale = "{(\n ja\n)}";
AssetType = "{(\n Delta\n)}";
}>> final result: (
) was cached: 0, cachedOnly: 0
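
If you’re curious which linguistic assets are already installed on your Mac, a short sketch of my own (it does nothing more than read the folder named in those entries) counts the .asset bundles:
import Foundation

// Count the locally installed linguistic asset bundles in the
// folder used by mobileassetd, as shown in the log entries above.
let assetDir = URL(fileURLWithPath:
    "/System/Library/AssetsV2/com_apple_MobileAsset_LinguisticData")
let contents = (try? FileManager.default.contentsOfDirectory(
    at: assetDir, includingPropertiesForKeys: nil)) ?? []
let assets = contents.filter { $0.pathExtension == "asset" }
print("Installed linguistic asset bundles: \(assets.count)")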

On this occasion, because macOS had been updated so recently, no .asset files had to be downloaded or updated, and LanguageModeling was able to proceed without making any network connection.
8.611597 com.apple.LanguageModeling NgramModel: Loaded language model: <private>
8.613612 LanguageModeling Creating CompositeLanguageModel (<private>):
<private>
8.624175 com.apple.LanguageModeling NgramModel: Loaded language model: <private>
8.654266 com.apple.coreml <private> class has successfully loaded the model at <private>.
8.668123 com.apple.LanguageModeling NeuralNetwork: Loaded neural language model: <private>

A similar series of entries records any translation of recognised text, which may again involve downloading updated translation information from Apple’s servers.

That lets the document recognition task complete, followed by the visual search gating task.
8.682162 mediaanalysisd VCPMADVIDocumentRecognitionTask complete
8.682166 mediaanalysisd VCPMADVIVisualSearchGatingTask running...
8.940130 mediaanalysisd VCPMADVIVisualSearchGatingTask complete (0)

Reporting

Finally, 1.2 seconds after the start of image analysis, VisionKit reports it complete.
8.942336 com.apple.VisionKit Completed MRC Parsing of 0 elements in 0.000000 seconds.
8.945134 com.apple.VisionKit VisualSearchGating: Request completed: <private>
8.945151 com.apple.VisionKit VisualSearchGating: Request completed: <private>
8.945313 com.apple.VisionKit END "VisionKit MAD Parse Request"
8.945355 com.apple.VisionKit Request completed: <private>
8.945396 com.apple.VisionKit Calling completion handler For Request ID:1
Total Processing Time 1169.34ms
Has Analysis: YES
TextLength: 65 DD: 0, MRC: 0, VS:0
request: <private>
Error: (null)

The reported total processing time was 1.17 seconds, and the analysis yielded text of length 65 (presumably Unicode code points). It’s at this point that the Live Text icon appears at the bottom right of the preview window, and you can select all the recognised text in that image.
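
In an app, the equivalent of that selectable text layer is VisionKit’s ImageAnalysisOverlayView. A rough sketch, assuming an ImageAnalysis obtained as in the earlier example, adds it over an existing NSImageView:
import AppKit
import VisionKit

// A sketch of presenting Live Text selection over an NSImageView,
// assuming `analysis` was produced by ImageAnalyzer as shown earlier.
@MainActor
func addLiveText(_ analysis: ImageAnalysis, over imageView: NSImageView) {
    let overlay = ImageAnalysisOverlayView()
    overlay.frame = imageView.bounds
    overlay.autoresizingMask = [.width, .height]
    overlay.preferredInteractionTypes = .textSelection   // Live Text only
    overlay.analysis = analysis
    imageView.addSubview(overlay)
}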

Visual Look Up

The sequence of log entries above is quite different from those I have described for Visual Look Up (VLU). Most obvious is the fact that the final phase in VLU, Visual Search, doesn’t take place. That phase is marked not by the VisionKit Analyzer process seen in both VLU and Live Text, but by VisionKit’s MAD Visual Search, which is confined to VLU. It’s only then that the neural hash(es) computed during analysis are sent to Apple’s servers by mediaanalysisd, for the servers to return information about image matches and content.
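
In terms of the public API, that difference comes down to which analysis types are requested. The following is only an illustration of the distinction, not a claim about how QuickLook itself is written:
import VisionKit

// Live Text only: analysis stays on this Mac.
let textOnly = ImageAnalyzer.Configuration([.text])

// Adding Visual Look Up brings in the Visual Search phase,
// which consults Apple’s servers about the image’s content.
let withLookUp = ImageAnalyzer.Configuration([.text, .visualLookUp])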

Conclusions

  • Live Text analysis doesn’t generate neural hashes or other identifiers for an image, in the way that Visual Look Up does.
  • Any connection to Apple’s servers during Live Text analysis is performed before the image has been analysed, and before the extraction of any text. It cannot, therefore, send Apple any image identifiers or extracted text.
  • Live Text relies on language asset files, which may need to be augmented or updated over a network connection during text recognition.
  • macOS 13.1 and 13.2 perform Live Text in essentially the same way, and both will attempt to connect to Apple’s servers if they need to update language asset files.
  • Users may encounter outgoing connections when opening a local image in QuickLook Preview, but can have confidence that those connections aren’t being used to send Apple, or any third party, image identifiers or text extracted from the image.
  • Blocking the outgoing connections used in Live Text will only result in poorer text recognition, and does nothing to improve the user’s privacy.

As ever, I welcome other factual evidence.