Mojave’s Linguistic Support: a promising start

One of Mojave’s major new enabling technologies is its support for parsing natural language. This relies on its Machine Learning (ML) system, and is aimed at controlling apps (primarily on iOS) using spoken input: speech recognition in Siri converts what you say into text, which an app can then analyse and act on.

This linguistic analysis is a valuable tool in its own right: for tagging natural language to help you learn and improve second languages, for grammatical analysis (are you using “too many” adverbs, for example, if you like grammatical prescription), and for translation.
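
To make the adverb example concrete, here is a minimal sketch in Swift using NLTagger from the NaturalLanguage framework. The function name adverbCount and the sample sentence are my own, for illustration only, and which words are tagged as adverbs depends on the models that ship with Mojave, so treat the counts as indicative rather than definitive.

```swift
import NaturalLanguage

/// Counts the words which the tagger classifies as adverbs.
/// Returns both the adverb count and the total word count.
/// (adverbCount is an illustrative name, not part of any Apple API.)
func adverbCount(in text: String) -> (adverbs: Int, words: Int) {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    var adverbs = 0
    var words = 0
    // Skip whitespace and punctuation so that only words are visited.
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation]
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: options) { tag, _ in
        words += 1
        if tag == NLTag.adverb {
            adverbs += 1
        }
        return true  // carry on to the next word
    }
    return (adverbs, words)
}

let sample = "She spoke quickly and quietly."
let result = adverbCount(in: sample)
print("\(result.adverbs) adverbs in \(result.words) words")
```

Dividing the first figure by the second gives a crude measure of how adverb-heavy a passage is, which is about as far as grammatical prescription should probably go.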

At present, support for languages is relatively limited: in the first release of Mojave, as far as I can tell, the full features of linguistic analysis are available for only five languages: English, French, Spanish, German, and Russian. But Apple is clearly working on many other languages, and Mojave can already recognise many more, although it doesn’t yet seem to like Esperanto in the slightest.
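
One way to check which languages have part-of-speech support on a given Mac is to ask NLTagger which tag schemes it offers for each language. This is only a sketch: the list of candidate languages is my own choice, and the answers may well change as Apple updates its linguistic data.

```swift
import NaturalLanguage

// An illustrative selection of languages to query; extend as you wish.
let candidates: [NLLanguage] = [
    .english, .french, .spanish, .german, .russian, .arabic, .turkish
]

for language in candidates {
    // Ask which tag schemes are available for word-level tagging.
    let schemes = NLTagger.availableTagSchemes(for: .word, language: language)
    let status = schemes.contains(.lexicalClass) ? "available" : "not available"
    print("\(language.rawValue): parts of speech \(status)")
}
```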

The main support files for this appear to be in RequiredAssets bundles in /System/Library/LinguisticData, but there are also some additions which appear to relate to more dynamic models in ~/Library/LanguageModeling. According to Apple, improved language models and linguistic data are among the data files which will be updated silently and in the background, so we may not have to wait for full macOS updates to obtain improved support.

Adding new languages depends on the availability of suitable corpora. Any macOS developer can use the ML support in Xcode 10 to build support for a new language, provided that they have access to sufficiently large collections of text in that language in which words are classified into parts of speech. Those collections are then used to train a new classifier model. For many ‘Latin’ languages such corpora are widely available, but many other major languages remain poorly supported at present.
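
As a rough sketch of what that training might look like, I’m assuming here that the route is Create ML’s MLWordTagger, typically run in a macOS playground in Xcode 10. The file paths, the column names "tokens" and "labels", and the metadata are all placeholders of mine, and the corpus is assumed to be JSON in which each record holds parallel arrays of words and their tags.

```swift
import CreateML
import Foundation

// Hypothetical locations: substitute your own corpus and output paths.
let corpusURL = URL(fileURLWithPath: "/path/to/tagged-corpus.json")
let modelURL = URL(fileURLWithPath: "/path/to/WordTagger.mlmodel")

do {
    // Each JSON record is assumed to look like:
    // {"tokens": ["Mi", "parolas"], "labels": ["Pronoun", "Verb"]}
    let corpus = try MLDataTable(contentsOf: corpusURL)

    // Train a word tagger from the tokens and their part-of-speech labels.
    let wordTagger = try MLWordTagger(trainingData: corpus,
                                      tokenColumn: "tokens",
                                      labelColumn: "labels")

    // Placeholder metadata to be stored inside the saved model.
    let metadata = MLModelMetadata(author: "Your Name",
                                   shortDescription: "Part-of-speech tagger trained on a custom corpus",
                                   version: "1.0")
    try wordTagger.write(to: modelURL, metadata: metadata)
} catch {
    print("Training failed: \(error)")
}
```

The quality of the resulting model depends almost entirely on the size and accuracy of the tagged corpus, which is why the availability of corpora matters so much.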

I’ve been testing out twenty different languages, from العربية (Arabic) to Türkçe (Turkish), and am impressed by how well the current implementation of linguistic analysis works with multiple languages. It seems able to recognise the switch between languages instantly, even when a language isn’t yet supported in full analysis.
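
Language recognition can be tried on its own using NLLanguageRecognizer. The sketch below reports the dominant language of each line in a passage of mixed languages; the helper name languagesByLine and the sample text are invented for illustration, and recognition is probabilistic, so short passages in particular may be misidentified.

```swift
import NaturalLanguage

/// Returns each non-empty line of the text with its most likely language,
/// or nil where the recogniser cannot decide.
/// (languagesByLine is an illustrative name, not part of any Apple API.)
func languagesByLine(_ text: String) -> [(line: String, language: NLLanguage?)] {
    return text
        .split(separator: "\n")
        .map { String($0) }
        .map { (line: $0, language: NLLanguageRecognizer.dominantLanguage(for: $0)) }
}

let mixed = """
The quick brown fox jumps over the lazy dog.
Le renard brun rapide saute par-dessus le chien paresseux.
Der schnelle braune Fuchs springt über den faulen Hund.
"""

for entry in languagesByLine(mixed) {
    print(entry.language?.rawValue ?? "unknown", "-", entry.line)
}
```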


While Mojave was still in beta, I put together a small app, Nalaprop, which uses linguistic analysis to mark up text according to parts of speech. I have just added a little colour key to it, and bundled my test file with passages in those twenty languages, in version 1.0b3 of Nalaprop, now available from here: nalaprop10b3
and from Downloads above.

If you’re studying a second language, interested in different languages, or just plain curious, I hope that you’ll find it interesting, and a demonstration of just what Mojave can do. I am afraid that it only runs on Mojave, though: High Sierra and earlier lack the support for linguistic analysis.

11 Comments


1

Frode @ WordPress.com on September 28, 2018 at 5:49 pm

This is quite exciting, thank you for Nalaprop.

- 2
  
  hoakley on September 28, 2018 at 8:33 pm
  
  Thank you.
  I’m looking to develop it further into a classifying concordancer, possibly a bit more. But it is mostly just a fun exploration of something which could become much bigger in the future.
  Howard.
  
3

Frode @ WordPress.com on September 29, 2018 at 11:51 am

oh YOU wrote this Howard? I thought you only tweeted it, I didn’t look at the name on the top. This kind of analysis is what we should try to build into our word processor Liquid | Author. What do you think?

- 4
  
  hoakley on September 29, 2018 at 11:54 am
  
  Yes – all me. Look at the docs with it to see the source. It’s easy to call, and free in Mojave.
  Howard
  
5

Frode @ WordPress.com on September 29, 2018 at 1:13 pm

Yes, I’ll discuss this with my developers but if there is something you would specifically like to experiment with in a word processor we can do that. This would be commercial work, but not with a big budget, if you are interested. I am thinking initially about simple things like highlighting all names in a document but ONLY on scrolling, since that is likely when you are looking for names and things like that…

- 6
  
  hoakley on September 29, 2018 at 1:17 pm
  
  Thank you. I’m happy hacking away for fun but I think it would be exciting to see some commercial products starting to use these new features.
  Howard
  
7

Frode @ WordPress.com on September 29, 2018 at 1:43 pm

How about a Skype or FaceTime call/chat Monday or Tuesday?

- 8
  
  hoakley on September 29, 2018 at 2:41 pm
  
  Delighted: my email is in the About page here, so we can set it up.
  Howard.
  
9

Frode @ WordPress.com on September 29, 2018 at 10:03 pm

emailed :-)

- 10
  
  hoakley on September 30, 2018 at 7:53 pm
  
  Frode,
  Sorry – not received any email yet. I’ll try DMing you.
  Howard.
  
11

Voice Interaction for Text – liquid thoughts on September 30, 2018 at 7:41 am

[…] opened up, such as the linguistic support for macOS Mojave which Howard Oakley mentions in his blog https://eclecticlight.co/2018/09/27/mojaves-linguistic-support-a-promising-start and Apple’s APIs for machine learning being included in iOS and macOS, as coreML […]

