Easier than the Georgian verb

We have resolved to holiday in Georgia, not this year but at some point in the future. That is Georgia the country, not the US state.

Georgia holds two particular attractions: the Caucasus mountains soaring 3,000 metres above ancient villages, and a language with arguably the most complex verb conjugation on earth.

For those accustomed to the morphological simplicity of the English verb, the Georgian variety is daunting. It is constructed from a sequence of up to 11 different components, starting with a preverb that indicates the direction (towards, away, and so on) in verbs associated with motion, going through elements indicating subject and objects and markers for tense, and including the root that conveys the core meaning.

These are glued together into a collision of consonants, then cast into Georgian’s unique and beautiful alphabet, whose only relative is that of Armenian, in all other respects a completely different language. For example, the word transliterated as ts’q’vlep’av means “you squeeze (fruit pulp or similar) with the hand”, the ’ marks indicating glottal stops that are sounded simultaneously with the preceding consonant.

In the absence of any helpful installable dictionaries, I have been slowly assembling my own, which has been helping me to build my vocabulary. It is most fortunate that much of Georgia’s written literature is available in PDF format, from its National Parliamentary Library, so I quickly built an impressive corpus. But before I could browse its contents using a concordancing tool, I had to export it as text.

Thanks to Unicode support in OS X, most documents emerged intact in their original Georgian characters, but a significant few were spat out in some obscure pseudo-phonetic Romanized format. At first I suspected Acrobat’s equivocal stance on Unicode, so tried those documents in Preview, only to suffer the same problem: every attempt to copy or export their contents somehow munged Georgian into near-gibberish.
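Sorting the intact exports from the Romanized ones need not be done by eye. The following is a minimal Python sketch, not the method used at the time: it assumes each document has already been exported to a text string, and the function name georgian_ratio is invented for illustration. It reports what share of the letters in a text fall in the main Unicode Georgian block (U+10A0 to U+10FF).

```python
def georgian_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters in the Georgian block.

    Only the main Georgian block (U+10A0 to U+10FF) is checked; that is an
    assumption which suits modern Mkhedruli text but ignores the
    supplementary Georgian blocks.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    georgian = sum(1 for ch in letters if "\u10a0" <= ch <= "\u10ff")
    return georgian / len(letters)
```

A file that exported properly should score close to 1.0, while one that emerged in Romanized form should score close to 0.0, so a threshold somewhere in between is enough to flag the problem documents.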

Thinking that this was a simple encoding error, I bought Text Encoding Converter from the App Store. This is a valuable tool for dealing with such problems: for instance, many different encodings have been employed for Japanese Kanji documents, and they commonly exhibit problems similar to those I was experiencing with Georgian. This inexpensive utility lets you browse text files using more than a hundred different encoding systems from Windows, Mac, and other architectures, until you can work out which works best to turn them into well-formed content.
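The same trial-and-error approach can be sketched in a few lines of Python. This is only an illustration of the idea, not how Text Encoding Converter works internally: the function name try_encodings and the short candidate list are assumptions, although the codec names themselves are standard Python ones.

```python
from typing import Dict, Iterable, Optional

# A small, assumed sample of candidates; Python ships many more codecs.
CANDIDATES = ("utf-8", "shift_jis", "euc_jp", "iso2022_jp", "mac_roman", "cp1252")


def try_encodings(
    data: bytes, candidates: Iterable[str] = CANDIDATES
) -> Dict[str, Optional[str]]:
    """Decode the same bytes under each candidate encoding.

    Returns a mapping from encoding name to the decoded text, or to None
    where the bytes are not valid in that encoding.
    """
    results: Dict[str, Optional[str]] = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results
```

A successful decode is not proof of the right encoding: single-byte encodings such as cp1252 accept most byte sequences and simply yield gibberish, so the results still have to be read by a human, just as in the utility.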

In my case, not one of the available encoding systems helped, and exhaustive online searching also drew a blank.

The vital clue came when I engaged in a brief font orgy, installing far more Georgian fonts than I ever should have. Those configured for Unicode worked a treat with OS X’s Georgian keyboard configuration (selected in the Input Sources tab of the Keyboard pane), but most worked only with an English keyboard.

So these mysteriously Romanized text files contained the simple keyboard equivalents that you would type when using a pre-Unicode font, something unrelated to any of the seven different transliterations officially used for Georgian.

For example, when using one of these fonts, to generate the Georgian character ‘gan’ (გ if you have a suitable font to hand), pronounced as a hard ‘g’, you press the English keyboard letter ‘G’ (Shift-g) rather than lower case ‘g’. So all I had to do was to back-substitute the correct Georgian letter for each corresponding English key. Even in megabytes of exported text, this was readily accomplished using BBEdit, in which I set up a Text Factory script to map the whole Georgian character set in one fell swoop.
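For anyone without BBEdit, the same back-substitution can be sketched in Python. This is not the Text Factory script itself, and the table is deliberately incomplete: only the ‘G’ to გ pairing comes from the example above, while the other four entries are placeholders assumed purely for illustration. A real table has to be read off the key layout of the particular pre-Unicode font.

```python
# Hypothetical key-to-letter table for one pre-Unicode Georgian font.
LEGACY_KEY_TO_GEORGIAN = {
    "G": "\u10d2",  # gan: the one pairing taken from the text
    "a": "\u10d0",  # an (assumed for illustration)
    "b": "\u10d1",  # ban (assumed for illustration)
    "d": "\u10d3",  # don (assumed for illustration)
    "e": "\u10d4",  # en (assumed for illustration)
}

# str.maketrans builds a one-character-to-one-character lookup table.
_TABLE = str.maketrans(LEGACY_KEY_TO_GEORGIAN)


def back_substitute(text: str) -> str:
    """Replace each legacy-font keystroke with its Georgian letter.

    Characters missing from the table (spaces, digits, punctuation, and
    any key not yet mapped) pass through unchanged.
    """
    return text.translate(_TABLE)
```

Because the substitution is strictly one character for one character, it is case-sensitive by design, which matters when a font puts different Georgian letters on the shifted and unshifted forms of the same key.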

In these days of apparently universal Unicode support, the encoding of text can still prove a puzzling problem. Compared to the complexities of the Georgian verb, though, it seems a doddle.

Updated from the original, which was first published in MacUser volume 28 issue 11, 2012.