Analysing natural languages with Nalaprop

Last week, when we were looking at Cupertinos and other errors perpetrated by spell-checkers, I mentioned the natural language capabilities of macOS and my little app Nalaprop. This article is an introduction to Nalaprop’s features, which rely on the advanced analysis functions built into recent versions of macOS from Mojave onwards. If you’re in the least bit interested in language, or want to analyse any writing, then Nalaprop has features which are only available in expensive and specialised products – all free, courtesy of macOS.

Mojave and later include sophisticated functions which analyse natural languages in several different ways. To start with, I’ll explain these using the multi-language sample file which is included with Nalaprop: nalapropMultiTest1.txt. Nalaprop opens plain text files directly, or you can paste or type text for analysis into its left view. To start this demonstration, simply open that file using the Open command.

nalaprop01

When it opens a text file, the contents are automatically analysed and displayed in two of its views. At the left is the original text, and in the middle is the same text with each word coloured according to its part of speech, such as noun and verb. The key to the colours used is at the right.

Scroll down through the centre view and you’ll notice that this file consists of multiple languages, a total of 20. Some of them are shown in multiple colours, indicating that macOS has been able to recognise that language and parse it into parts of speech. There’s support for this for the following eight languages: English, French, Spanish, German, Italian, Portuguese, Russian, and Turkish (I suspect the last is limited).

Apple normally adds support for languages other than English when the first full version of each major version of macOS is released. If you’re running Nalaprop on a Monterey beta, you’ll only see English parsed for the time being, but that should change with the release of 12.0 in the next month or so.

To perform a wider range of analyses, click on the MultiParse button at the top of the window. After a short delay during which macOS analyses the text in more detail, the popup menu at the top of the middle view is enabled, and the colours of the text in that view change, as does the colour key at the right.

To see the sample text analysed for its script (alphabet), select Script in that popup menu. This displays each script in a different colour, which includes Russian, Arabic, and more. Even those languages which macOS is unable to parse into parts of speech are recognised and labelled.

nalaprop02

Repeat this by switching the popup menu to Language, and you’ll see that macOS is excellent at recognising almost all of the languages, although some of the East Asian languages and Esperanto at the end of the samples make it struggle.

There are two entries in the popup menu for Types: the upper of the two uses colours different from the default, and the lower is the same as when you first opened the file.

The last of the options in the popup menu shows different text from the original, at the left: these are Lemmas, word roots without grammatical alteration (inflection). Instead of the verb was, it displays the verb root be, and so on. This can be very helpful if you’re looking at the frequency of words, or trying to translate them, for instance.

Nalaprop also performs word frequency analysis, and can do that either on the words used or on their lemmas. To test this out, you’ll need a fairly long plain text document: here I’ll use a copy of Charles Dickens’ novel A Christmas Carol. Open that using the Open command, which should automatically parse it into parts of speech. Then click on the List button at the top right of the window.

nalaprop03

All the words found in the document are then summarised by part of speech, starting with verbs. For each word found, the number of occurrences is given in parentheses. Because this analysis is performed on unmodified words, you’ll see words from the same lemma, like afford and afforded, counted separately.

Nalaprop can be smarter than that: click on the MultiParse button, and wait while macOS works through the whole document again; this can take a few moments, and you may see the spinning beachball briefly, which indicates it’s busy and not hung. When that completes, click on the List button again.

nalaprop04

The contents of the view at the right now shows the frequency not of individual inflected words, but of their roots. In English, for example, you shouldn’t see the words am or was listed, but they’re now included in the lemmatised form be.

You can save the word frequency list (but neither of the other views) by clicking on the Save button.

Nalaprop has an extensive Help book available, also provided as a separate PDF, and two colour keys which help you see the colour palettes in detail. You can use its Find command to turn it into an interactive concordance, based on the word frequency list.

In case you’re wondering, Nalaprop correctly recognises there as a pronoun, their as a determiner, and they’re as a pronoun and verb, and even gives their lemmas respectively as there, they and they-be. I did assure you that your Mac is smarter than its spell-checker.

Nalaprop requires Mojave or later, is a Universal App so runs native on M1 Macs, and is fully compatible with current betas of Monterey. It’s free.