When you shouldn’t use unconventional Unicode text

I can imagine the scene. It’s around 1452 in Mainz, and in his workshop in Hof Humbrecht the great Johannes Gutenberg has been called over to look at some of the first work of a new apprentice who’s started setting pages for his 42-line Bible. Halfway down the page, the young man clearly became bored and began setting the words in a crazy mixture of fonts. Little did they know then that a mere 568 years later, we’d be doing the same in our tweets.

Yet there’s one evergreen tweet that keeps appearing, pleading with us not to do that, because it causes grief to those reliant on screen readers to hear the words which they can’t read for themselves. Are they right, or wrong?

Like so much these days, Twitter uses not plain old ASCII for text, but Unicode, with its support for over 140,000 different characters or codepoints. What could possibly be wrong in using a little creative freedom, or tools like the wonderful Textlicious, and spicing up our text with fancy styles?

When we look at such text, it’s easy to read. But the characters we readily interpret as styled Latin aren’t what they seem. Unicode doesn’t work as a text styling system, as Rich Text does. Use a Rich Text editor to apply fonts and styles to your words and the underlying characters remain unchanged. It’s easy to demonstrate this.
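A quick way to see this for yourself, without a screen reader, is to list the codepoints behind a plain and a ‘styled’ letter. The following is a minimal sketch using Python’s standard unicodedata module; the helper name describe is just for illustration, and the styled character is the mathematical script w.

```python
import unicodedata

def describe(text):
    """Return (character, codepoint, Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

plain = "w"             # ordinary Latin letter
styled = "\U0001D4CC"   # script-style w, of the kind 'fancy text' tools emit

# The two look related on screen, but are entirely separate codepoints.
for row in describe(plain + styled):
    print(row)
```

The plain letter is reported as LATIN SMALL LETTER W, while the styled one is MATHEMATICAL SCRIPT SMALL W: to software, they are different characters rather than one character in two fonts.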

In System Preferences, open the Accessibility pane and select the Speech item at the left. Then enable Speak selected text when the key is pressed, using a convenient key combination. Open your favourite Rich Text editor, such as my free DelightEd. Type in a few lines of text and apply fonts and styles to satisfy Gutenberg’s apprentice. When you’re ready, select all that text and press the key combination to have it read to you.

Now try that with text which has been ‘styled’ using unconventional Unicode characters. You can do this easily using Textlicious, or you can encode that text in my own free app Dystextia, which even uses characters that look identical to regular Latin text. Then select that text and press the key combination to have it read aloud.


One of two things will happen: in many cases, macOS refuses to even attempt to read those words, but with Dystextia’s encoding it speaks incomprehensibly, spelling out the names of individual Unicode codepoints, which might seem amusing at first. Now imagine you have very limited vision, and that Accessibility feature is the only way you have of knowing what’s written in that tweet.

Several have suggested that screen readers should be smarter, able to map unconventional codepoints used to ‘style’ text in this way. Although they could be much better than they are, they’d have to cater for thousands of potential substitutions, and for a shortcoming in the design of Unicode: it doesn’t group lookalike characters in any general way. Its compatibility mappings do tie some styled letters, such as the mathematical script 𝓌, back to the plain Latin codepoint for w, but lookalikes borrowed from other scripts have no such relationship. To accomplish that, a screen reader would have to recognise that it needed to remap ‘styled’ text to plain Latin, and be provided with mapping tables. There’s the issue of context too: the writer might be using that specific codepoint to represent a well-known mathematical variable, for instance.
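To give a feel for what such remapping involves, here is a rough sketch using Unicode’s NFKC compatibility normalisation, as exposed by Python’s standard unicodedata module. It isn’t how any particular screen reader works, and the function name fold_styled is made up for this example. NFKC happens to fold the mathematical alphanumeric letters back to plain Latin, but it does nothing for lookalikes taken from other scripts, and it also alters other characters (ligatures and superscripts, for example), so it is no complete answer.

```python
import unicodedata

def fold_styled(text):
    """Apply NFKC compatibility normalisation to text (illustrative helper)."""
    return unicodedata.normalize("NFKC", text)

script_w = "\U0001D4CC"   # MATHEMATICAL SCRIPT SMALL W
cyrillic_a = "\u0430"     # CYRILLIC SMALL LETTER A, a lookalike for Latin a

# The mathematical letter carries a compatibility mapping to plain w.
print(fold_styled(script_w) == "w")      # True

# The Cyrillic lookalike has no such mapping and is left untouched.
print(fold_styled(cyrillic_a) == "a")    # False
```

Note that folding discards the distinction a mathematician may have intended by choosing the script letter, which is the problem of context again.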

Another problem resulting from the use of unconventional Unicode codepoints to represent regular Latin text is that it disrupts search. That’s one of the intentions of encoding text using Dystextia, of course, but in tweets and other text which need to be searchable, it’s decidedly unhelpful.
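The effect on search is easy to reproduce. This sketch builds a ‘bold’ word from the mathematical bold letters and shows that an ordinary substring search no longer finds it. The helper fake_bold and the sample sentence are invented for illustration; whether a real search service normalises text before matching is something I can’t vouch for, so the last line shows only what happens if it does.

```python
import unicodedata

BOLD_SMALL_A = 0x1D41A  # MATHEMATICAL BOLD SMALL A; a to z are contiguous

def fake_bold(text):
    """Swap ASCII lowercase letters for mathematical bold lookalikes."""
    return "".join(
        chr(BOLD_SMALL_A + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )

tweet = "read the " + fake_bold("manual") + " first"

# A plain search for the word fails: none of its letters are in the text.
print("manual" in tweet)                                   # False

# Only after compatibility normalisation does the word become findable.
print("manual" in unicodedata.normalize("NFKC", tweet))    # True
```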

Before reaching for Textlicious and similar tools to ‘style’ text using unconventional codepoints, please ask yourself whether you want to exclude everyone using a screen reader, and prevent others from finding your text using search. If you’re still determined to put form above content, then go ahead. But don’t be surprised when someone complains.