Compound emoji can confuse

If you’re deeply immersed in emoji, you probably already know about and use compound emoji, known by many as ZWJs, after the Unicode Zero Width Joiner (ZWJ) character which is used to form them. If you don’t know what they are, read on.

Most of us are familiar with the idea that entering some accented and similar characters from the keyboard requires two keystrokes. For example, the normal way of entering the letter e with an acute accent is to press Opt-e then the e key. What is then inserted into the text is actually a single Unicode character é.

Some systems allow you to combine a sequence of two or more emoji, joined by a special invisible character, the Zero Width Joiner (Unicode U+200D), so that the sequence is displayed as a single emoji formed from all of them: these are the compounds known as ZWJs. If your system doesn’t support the display of such a compound, the joiner itself remains invisible, as it has zero width, and you see an emoji ‘word’ consisting of the two or more separate characters instead.

For example,
👩 + ZWJ + 🚀 = 👩‍🚀
being a ZWJ compound formed from U+1F469, U+200D, and U+1F680. But your system, browser, and font may display that just as the two emoji 👩🚀 instead.
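
To make that construction concrete, here is a minimal Python sketch which builds the same compound from its three code points. The helper name join_with_zwj is just an illustration, not part of any library, and whether the result is drawn as one emoji or two depends entirely on the system displaying it.

```python
# The three code points named above, written as escapes so that the
# source does not depend on how your editor renders emoji.
WOMAN = "\U0001F469"
ZWJ = "\u200D"        # Zero Width Joiner
ROCKET = "\U0001F680"

def join_with_zwj(*parts):
    """Join emoji with U+200D between each pair (hypothetical helper)."""
    return ZWJ.join(parts)

astronaut = join_with_zwj(WOMAN, ROCKET)

# List the code points actually stored in the string.
code_points = [f"U+{ord(ch):04X}" for ch in astronaut]
print(code_points)
```

The string always contains three code points; only its rendering differs between systems.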

If you paste that compound emoji into the Characters pane in macOS, it should display the compound character and tell you how it is constructed, although (unless you have a very heavily customised keyboard) you can’t enter any of those three constituent characters from your keyboard, let alone the entire compound.

There’s an extensive, if not complete, listing of current ZWJ compounds on Emojipedia, each with a link to explain its constituents.

Another couple of examples illustrate how complex these can become:
🤦🏾‍♂️ ‘face palm man’ is made up from U+1F926 + U+1F3FE + U+200D + U+2642 + U+FE0F, and
👨 👩 👧 a man, a woman, and a girl compound into a family 👨‍👩‍👧.
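
If you don’t have a character viewer to hand, you can take a compound apart in code. This sketch uses Python’s standard unicodedata module to list each code point with its Unicode name; the function name describe is my own choice, and the names returned depend on the version of the Unicode database shipped with your Python, so anything it doesn’t know is shown as ‘<unnamed>’.

```python
import unicodedata

def describe(text):
    """Return (code point label, Unicode name) pairs for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# The two compounds given above, written as escapes.
FACE_PALM_MAN = "\U0001F926\U0001F3FE\u200D\u2642\uFE0F"
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

for label, name in describe(FACE_PALM_MAN):
    print(label, name)
```

Running describe on the family shows that it needs two joiners, one between each pair of people.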

These ZWJ compounds are different from those emoji which can be modified for skin colour, using modifiers known sometimes as the Fitzpatrick types, after the eponymous scale used to classify skin tone. For example, these show the basic ‘girl’ emoji U+1F467 in the following variants:
👧👧🏻👧🏼👧🏽👧🏾👧🏿
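which are made by following ‘girl’ U+1F467 with either no modifier, or one of the five Fitzpatrick modifiers U+1F3FB to U+1F3FF. No ZWJ is involved: the modifier comes immediately after the emoji it changes.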
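
The same six variants can be generated in a few lines of Python, as a rough sketch; the constant names here are mine.

```python
GIRL = "\U0001F467"

# The five skin tone modifiers, U+1F3FB to U+1F3FF inclusive.
MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]

# The unmodified emoji, then the emoji followed directly by each modifier.
variants = [GIRL] + [GIRL + modifier for modifier in MODIFIERS]

for variant in variants:
    print(variant, [f"U+{ord(ch):04X}" for ch in variant])
```

Each modified variant is two code points long, where the unmodified emoji is just one.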

If you do any programming with Unicode strings, this illustrates some of the problems now embedded in the encoding system which is at the heart of all strings and text. Pause for a moment and ponder whether an app should consider these compounds, both ZWJ and Fitzpatrick types, as one character or several. Should the same rules apply to a system which cannot show the compound characters, but only shows the constituent emoji?
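
To see how slippery ‘length’ becomes, this Python sketch counts the same compound in several ways. The first function uses only standard encodings. The second, count_visible_emoji, is a deliberately naive helper of my own which treats joiners, skin tone modifiers, and the variation selector U+FE0F as glue; it is not the segmentation algorithm defined by Unicode, and it will give wrong answers on text beyond these simple cases.

```python
ZWJ = "\u200D"

def count_units(text):
    """Count a string as code points, UTF-16 code units, and UTF-8 bytes."""
    return {
        "code_points": len(text),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }

def count_visible_emoji(text):
    """Naive count of displayed emoji, assuming compounds are supported."""
    count = 0
    glue_next = False
    for ch in text:
        cp = ord(ch)
        if ch == ZWJ:
            glue_next = True      # the next code point joins this compound
        elif cp == 0xFE0F or 0x1F3FB <= cp <= 0x1F3FF:
            pass                  # selector or modifier: part of the previous emoji
        elif glue_next:
            glue_next = False     # joined on by the preceding ZWJ
        else:
            count += 1
    return count

astronaut = "\U0001F469\u200D\U0001F680"
print(count_units(astronaut))          # {'code_points': 3, 'utf16_units': 5, 'utf8_bytes': 11}
print(count_visible_emoji(astronaut))  # 1
```

None of those numbers is wrong; they simply answer different questions, and a system which can’t display the compound would show two emoji for a string which this count calls one.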

For many users in many circumstances, the biggest problem with all these different emoji is their readability. Even at larger font sizes, they are hard to distinguish and many visually similar emoji have very different meanings: you can observe that by enlarging the font used to display this page in your browser.

As a replacement for verbal communication, emoji have too many serious shortcomings; as fun Easter eggs, they’re entertaining. And don’t even think about the iPhone X’s Animoji…