Live Text is a feature available in Monterey on all Macs officially supported by that version of macOS. It’s not available yet for all languages and types of script: Apple’s official list includes multiple versions of Cantonese, Chinese, English, French, German, Italian, Portuguese and Spanish, but doesn’t yet mention Japanese, although that does appear to work to a degree already.
Beyond confining its use to certain apps, including Safari, Preview and Photos, Apple doesn’t mention any other requirements or limitations. Although not documented by Apple, it also works in WKWebView views, which may be found in third-party apps. To help you test this, I have now added a sample image to my Visual Look Up test page, readily accessed using my free utility Mints.
If you want to use a similar facility on any text shown on your display, regardless of which app it’s being displayed in, then TextSniper from the App Store should prove ideal.
Although it may appear related to Visual Look Up, which recognises images and objects within them such as dog and cat breeds, and many paintings, Live Text uses a different mechanism to recognise text in images. This doesn’t rely on any information being sent from your Mac anywhere else, nor on Siri or any other part of macOS. Neither do you need to access contextual menus or open Information windows: Live Text just works, and lets you select the text it has already recognised, as if by magic.
What happens in Live Text
When you open an image containing text which Live Text could recognise, macOS doesn’t try to recognise any text at first, but appears to segment that image into areas which it thinks contain recognisable text.
The feature is triggered when the pointer is passed over an area suspected to contain text. This initiates the process of text recognition and changes the pointer to an I-beam. First, VisionKit determines whether the device supports analysis. If it does, it then begins a request for an Image Analyzer Process, and parsing of that image area by mediaanalysisd.
Like Visual Look Up, analysis is performed using neural networks. Where available (currently only Apple M1 series chips), this is performed on the Apple Neural Engine (ANE), otherwise it’s run on the CPU, and managed by Espresso, which is responsible for this Machine Learning support.
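The machinery behind Live Text is private, but Apple exposes the same kind of on-device text recognition to developers through the public Vision framework. As a rough illustration of the recognition step described above, here’s a minimal Swift sketch using VNRecognizeTextRequest; the image path is hypothetical, and the `.accurate` level selects the neural-network recogniser rather than the faster character-matching one.

```swift
import AppKit
import Vision

// Hypothetical path to an image containing text.
guard let image = NSImage(contentsOfFile: "/path/to/sample.png"),
      let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
    fatalError("Couldn't load the image")
}

// Build a text-recognition request; the completion handler receives
// one observation per block of text found in the image.
let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        // Each observation offers ranked candidate strings; take the best.
        if let candidate = observation.topCandidates(1).first {
            print(candidate.string, "confidence:", candidate.confidence)
        }
    }
}
request.recognitionLevel = .accurate          // neural-network path
request.recognitionLanguages = ["en-US"]      // constrain to English here

// Run the request against the image.
let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
try handler.perform([request])
```

This is only an analogue: Live Text itself is driven by VisionKit and mediaanalysisd without any action by the host app, whereas a Vision request like this must be made explicitly.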
After a few runs of neural networks, mediaanalysisd creates a Composite Language Model of the text, capable of handling multiple languages. The underlying language(s) are recognised using linguistic data within macOS. This is apparently performed for each block of text recognised in the image, until the Document Recognition task is completed.
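The linguistic data used for this step isn’t documented, but macOS does expose on-device language identification to developers through the NaturalLanguage framework, which gives a feel for what recognising the underlying language(s) of a block of text involves. A minimal sketch, assuming the sample strings stand in for recognised text blocks:

```swift
import NaturalLanguage

// Stand-ins for text blocks recognised in an image.
let blocks = ["The quick brown fox", "Bonjour tout le monde", "Guten Morgen"]

let recognizer = NLLanguageRecognizer()
for block in blocks {
    recognizer.reset()
    recognizer.processString(block)
    if let language = recognizer.dominantLanguage {
        print("\"\(block)\" →", language.rawValue)
    }
    // Ranked alternatives, useful where scripts are ambiguous.
    let hypotheses = recognizer.languageHypotheses(withMaximum: 2)
    print(hypotheses)
}
```

Whether mediaanalysisd uses this same framework or its own models is an open question; the point is that identification runs entirely on the Mac, per block of text.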
With the recognised text parsed and ready for the user to select, VisionKit declares the media analysis complete, and writes the time taken to the log. For readily recognised English, this is typically around 400-800 ms.
Once you have selected some of the text recognised in the image, you can copy it, or Control-click it to access other services such as translation and Look Up. Live Text works with short snippets such as phone numbers, and with whole windows. When recognising Latin/Roman characters, it also works with projected and vertical arrangements of characters. On a good day it can even extract text which you may find difficult to decipher.
Live Text is an excellent example of a simple but powerful tool which relies on the machine learning features built into macOS, and is accelerated when used on M1 Macs with their ANE hardware.