Beyond Unicode: lightweight fonts for language documentation

When Tevfik Esenç died on 7 October 1992, he had no idea how famous he would become. The last Circassian who could still speak the Ubykh language, he now merits his own Wikipedia page, videos on YouTube, and citation by eminent authors.

Since then, several hundred, perhaps over a thousand, languages have died in anonymity. Notable last native speakers, both before and after Esenç, have included Marie Smith Jones (Eyak), Ned Maddrell (Manx), Big Bill Neidjie (Gaagudju), Fidelia Fielding (Mohegan-Pequot), and Cristina Calderón (Yaghan).

Linguists around the world are striving to document thousands of dying tongues before they similarly disappear forever.

I would have thought that the massive standard that is Unicode 7.0, with over 113,000 characters available, should be ideal for this task. It has recently added support for historic scripts and extinct languages such as Linear A, Avestan, Samaritan, and Vedic Sanskrit, and sports more than a thousand Egyptian hieroglyphs, so it seems exhaustively comprehensive.

But standards like Unicode can be cumbersome when you are working in a remote place trying to build a tribal dictionary: the Unicode Consortium took 8 months to upgrade from 6.1 to 6.2, adding just a single character, the Turkish lira sign. At that rate, many more languages would die before their scripts could be standardised.

The Summer Institute of Linguistics (SIL International), an organisation at the forefront of documenting dying languages, has developed a lightweight non-Roman script platform to address this: Graphite. Although Apple Advanced Typography (AAT) is flexible enough to cater for this need, SIL required a cross-platform environment as most field workers not only have to brave wars, disease and insects, but have to use Windows too.

Graphite is no rival to OpenType, and the two can co-exist happily, with applications supporting both smart font technologies. Whilst OpenType endeavours to save description effort within fonts by coding script behaviour into the rendering engine, the Graphite engine is script-agnostic, as all the script description resides in the font.
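Because the script description travels inside the font, one rough way to tell whether a font is Graphite-enabled is to look for Graphite's own tables in its table directory. The following is only a minimal sketch, assuming a plain single-font TrueType or OpenType file (not a collection or a WOFF), and assuming that the presence of a 'Silf' table is a good enough marker. The function names are my own, not part of any SIL tool.

```python
import struct

# Table tags used by Graphite fonts ('Sill' is optional).
GRAPHITE_TAGS = {"Silf", "Glat", "Gloc", "Feat", "Sill"}


def sfnt_table_tags(data: bytes) -> list[str]:
    """Return the table tags listed in an sfnt table directory."""
    if len(data) < 12:
        raise ValueError("too short to be an sfnt font")
    # Header: uint32 version, uint16 numTables, then three uint16 search fields.
    _version, num_tables = struct.unpack(">IH", data[:6])
    tags = []
    for i in range(num_tables):
        # Each 16-byte record is: tag, checksum, offset, length.
        start = 12 + 16 * i
        record = data[start:start + 16]
        if len(record) < 16:
            raise ValueError("truncated table directory")
        tags.append(record[:4].decode("latin-1"))
    return tags


def graphite_tables(data: bytes) -> set[str]:
    """Return whichever Graphite table tags are present in the font."""
    return GRAPHITE_TAGS & set(sfnt_table_tags(data))


def has_graphite_tables(data: bytes) -> bool:
    # Assumption: a font with a 'Silf' table carries Graphite rules.
    return "Silf" in graphite_tables(data)


# Hypothetical usage, with a font file of your own:
# with open("MyFont.ttf", "rb") as f:
#     print(has_graphite_tables(f.read()))
```

A font can carry both OpenType layout tables and Graphite tables, so a positive result here says nothing about whether OpenType is also supported.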

SIL cites example scripts such as Tai Tham (also known as Lanna), used by a tiny minority of people in Northern Thailand and Laos, and those of primarily academic interest like Linear B, an ancient syllabic script used to write an early form of Greek. With an application that supports Graphite, all you need do is create a font to implement the desired script behaviour, and you can install and use it immediately.

This flexibility is attractive not only to linguists, but also to anyone trying to accomplish something that OpenType will simply not countenance, such as complex contextualisation.

This may seem a return to the world of font-dependence and rampant ‘mojibake’ (text rendered as garbage due to codepage errors), but Graphite is not about mass-market mainstream scripts. Precious few folk are ever likely to read Tai Tham documents, and those who do can reasonably be expected to have an appropriate Graphite font installed.
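For anyone who has not met mojibake, it is easy to reproduce. This small sketch, which uses only Python's standard codecs, takes a word encoded as UTF-8 and decodes it with the wrong codepage (Windows-1252 here, chosen purely for illustration).

```python
# A surname containing one non-ASCII letter.
original = "Esenç"

# Encode as UTF-8, then (wrongly) decode the bytes as Windows-1252.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # prints "EsenÃ§"

# Reversing the mistake recovers the text, but only because no bytes were lost.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored)  # prints "Esenç"
```

The two bytes that UTF-8 uses for ç are each read as a separate Windows-1252 character, which is why one letter turns into two.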

Support for Graphite is already surprisingly extensive. Firefox 11 and later can use it when its gfx.font_rendering.graphite.enabled flag is set to true via the about:config page, and you can also experiment with it in LibreOffice (3.4 and later), OpenOffice (3.2 and later), and the XeTeX typesetting system.

SIL offers 17 production-quality Graphite fonts, including Tengwar to support Tolkien’s Elvish script. Sadly its cross-platform Graide integrated font development tool is still at version 0.8 and the OS X version (0.5) currently appears non-functional.

We will never know how many languages have died in the last couple of decades, but perhaps SIL’s devoted efforts to record some of them, and its work on Graphite, will be a fitting memorial to Tevfik Esenç and others.

Updated from the original, which was published in MacUser volume 29 issue 04, 2013.