Did you miss this toolset? Information and communication theory

My wife has a predilection for the online game of Text Twist, a strange game in which you try to build as many legitimate words as possible out of six letters. Its strangeness arises because every answer is already there in plain view: you just have to get the letters in the correct order for each word.

For example, when you are given the letters ACHMSS, there is just one word in English which can be formed from all the letters: CHASMS. There are also 5-letter words like SCAMS, 4-letter words like MASS, and 3-letter words like CAM. If you have never played it before, the aim in Text Twist is to arrive at all acceptable English words of 3-6 letters within a set period of time. If you fail to get at least one 6-letter word, you lose the game. Points are amassed for each word which you derive.

The letters which you are given to ‘solve’ are effectively an encoding of all the words contained therein. As such, they are ‘rich’ in information, provided that you can decode them. The randomisation of the choice and order of letters is what is responsible for that richness, but only when set against the dictionary used: in English, for example, the letters BCHMSZ would have no such information, as no matter how you select from them or try to rearrange them, you cannot make any words from them.
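To make that decoding concrete, here is a minimal sketch in Python of how the words hidden in a set of letters could be found. The function names and the tiny word list are my own illustrative assumptions, not anything taken from the game itself; a real solver would load a full English dictionary.

```python
from collections import Counter

def can_form(word, rack):
    """Return True if `word` can be spelled using each rack letter at most once."""
    need = Counter(word.upper())
    have = Counter(rack.upper())
    return all(have[letter] >= count for letter, count in need.items())

def solve(rack, dictionary):
    """List the 3-6 letter dictionary words which can be built from the rack."""
    return sorted(w for w in dictionary if 3 <= len(w) <= 6 and can_form(w, rack))

# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = ["CHASMS", "SCAMS", "MASS", "CAM", "ZEBRA", "MATCH"]

print(solve("ACHMSS", TOY_DICTIONARY))  # ['CAM', 'CHASMS', 'MASS', 'SCAMS']
print(solve("BCHMSZ", TOY_DICTIONARY))  # []
```

The same letters yield something or nothing depending entirely on the dictionary passed in, which is the point made above: the information is only there relative to the code used to read it.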

As with many things in life, from trivial wordgames to sophisticated science like genetic inheritance, there is underlying theory and structure – in what is known as information and communication theory.

This theory, which covers both ‘static’ information and its communication, is remarkably novel. Unlike most other fields of human enterprise, it was never even approached by the classical Greeks and Romans. They wrote much about the science of argument and discourse, in rhetoric, and pondered the study of knowledge itself, but this area escaped even the great thinkers of the Renaissance and the Enlightenment.

Information and communication theory is a product of the twentieth century, with early ideas such as Harry Nyquist’s analysis of telegraph speed in 1924, and Ralph Hartley’s proposals on measuring information in 1928. Its foundation is usually attributed to Claude E Shannon’s paper A Mathematical Theory of Communication published in 1948, and it is still undergoing substantial development and change.

Its heartland is in telecommunications engineering, and computing. Formally taught in many university courses on those disciplines, it has also become important in diverse but rather isolated, perhaps esoteric, topics such as economic theory, statistics, and molecular biology.

However, in a more enlightened moment recently on Twitter, I went as far as to suggest that it should be taught to everyone who completes a university degree, as it is one of the fundamental toolsets needed in life. I included the arts/humanities explicitly: anyone who writes, paints, composes/performs is engaging in the communication of information. Unless they understand key concepts in information and communication theory, they lack vital skills and tools.

I am not referring here to the sort of ‘communications skills’ teaching which often takes place, aimed at improving oral and written presentations. Useful though tips on PowerPoint are (top tip: use Keynote), such practical instruction does not come close to any theory, such as what information is, the importance of entropy, coding, communication channels, or receivers.

Unlike many scientific theories, those of information and communication are not yet cut and dried. There is an established ‘classical’ theory, which has as its essence some sweeping and important concepts. The first is how to measure information, which is effectively the definition of the term. This is usually illustrated in terms of the tossing of coins and rolling of dice.

Entropy is the amount of uncertainty involved in predicting the value of a ‘random’ variable, or at least one which could be random. For a coin being tossed, there are two outcomes, heads and tails, which are equally likely. Encapsulating the outcomes in a minimal single message requires one binary digit, a bit, which might be coded as 0 for tails, and 1 for heads.

For a six-sided fair die, there are six outcomes which are equally likely, which requires between two and three bits to encode. Because the die has more possible outcomes, and they require more binary digits to encode, in terms of information, the die has higher entropy, and contains more information.
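The coin and die figures can be checked with a few lines of Python. This is only a sketch: it uses Shannon's standard definition of entropy, which I have not spelled out above, and the function name is my own.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a list of outcome probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

fair_coin = [1 / 2] * 2  # heads and tails, equally likely
fair_die = [1 / 6] * 6   # six faces, equally likely

print(entropy_bits(fair_coin))           # 1.0
print(round(entropy_bits(fair_die), 3))  # 2.585
```

The die's value of about 2.585 bits is the "between two and three bits" mentioned above: two bits can only distinguish four outcomes, while three bits can distinguish eight.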

Returning to the example of Text Twist, each of the six letters provided in the game has 26 different possible values, although because of the nature of English words they are not truly randomly distributed (the most common letters being e and a, the least common j, q, x and z).

Without detailed prior analysis of six-letter words in English, it would be reasonable to assume that there might be 26^6 (26 to the power of 6) possible 6-letter ‘words’, which is almost 309 million, although a great many of those do not form any English words. Encoding each letter in binary requires slightly less than 5 bits per character. Thus our Text Twist ‘words’ have high entropy and a lot of information, relative to coins and dice.
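The arithmetic is easy to reproduce. The sketch below assumes, as the paragraph above does, that all 26 letters are equally likely, which overstates the true figure for English; the variable names are mine.

```python
import math

ALPHABET_SIZE = 26
WORD_LENGTH = 6

# Number of distinct six-letter strings, and bits needed per letter,
# assuming every letter is equally likely.
combinations = ALPHABET_SIZE ** WORD_LENGTH
bits_per_letter = math.log2(ALPHABET_SIZE)

print(combinations)                             # 308915776
print(round(bits_per_letter, 2))                # 4.7
print(round(WORD_LENGTH * bits_per_letter, 1))  # 28.2
```

So a six-letter rack carries at most about 28 bits under this assumption, against 1 bit for the coin and about 2.6 for the die.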

Encoding those six letters (which are case insensitive) would require a minimum of 6 bytes in ASCII or Unicode UTF-8, but 24 bytes in UTF-32. However, in Morse code, which has variable-length characters, 6-letter words vary in length from 6 (EEEEEE, TTTTTT or any mixture) to 24 dots and dashes, each roughly equivalent to a binary digit if the pauses between letters are ignored. The word CHASMS would in fact require 18 bits, or just over 2 bytes.
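These lengths can be confirmed with a short Python sketch. The table is the standard International Morse alphabet for the 26 letters; the function name is my own, and the count deliberately ignores the pauses which separate letters in real Morse, as the figures above do.

```python
# International Morse code for the 26 letters.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}

def morse_symbols(word):
    """Count the dots and dashes in a word, ignoring gaps between letters."""
    return sum(len(MORSE[letter]) for letter in word.upper())

word = "CHASMS"
print(len(word.encode("ascii")))      # 6 bytes
print(len(word.encode("utf-8")))      # 6 bytes
print(len(word.encode("utf-32-le")))  # 24 bytes (no byte-order mark)
print(morse_symbols(word))            # 18 dots and dashes
print(morse_symbols("EEEEEE"), morse_symbols("QQQQQQ"))  # 6 24
```

Note that Python's plain "utf-32" codec prepends a 4-byte byte-order mark, which is why the sketch uses "utf-32-le" to get the bare 24 bytes.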

The length of messages in Morse is thus shorter than in ASCII or UTF-8, and its encoding more efficient, and close to estimates of entropy. This is because Alfred Vail, who worked with Samuel Morse and Joseph Henry to devise the code, counted the different letter stocks held by a local printer, to estimate the frequency of use of different characters in English, and they decided to use variable- rather than fixed-length encoding.

[Image: the word ‘chasms’ as a small anti-aliased graphic]

We can of course represent words in many other ways, such as in a graphics file, which could also have letters distorted as is used in CAPTCHA challenge-response tests. Now the size of our message is increasing further: the small anti-aliased image shown above requires 2878 bytes when in PNG format. As a CAPTCHA, it could be ambiguous or misread, perhaps, so with this inefficiency in coding we face the paradox that it may be communicated less clearly. That is because we are interposing additional steps between the transmitter and receiver, each of which can only increase the risk of the received information having errors.

[Image: the same word with the lower half of its letters blurred]

A common way for images to become damaged used to be faxing. In the version above, I have simulated a fault in the process of transmitting the image, which has blurred the lower half of the letters beyond recognition. Thankfully our brains are used to filling in the missing or damaged sections of images, and most of us would recognise that the word shown is still chasms, provided that we could assume that it was in English.

Visual artists can use such effects deliberately in their works, but, as with abstract art, their success is not guaranteed. This is because they are depending on the receiver (viewer) understanding how to reconstruct or decode the image which they present. Highly abstract art, such as Jackson Pollock’s huge drip paintings, is most dependent on the receiver, although in image terms it is also extremely high in entropy and thus the amount of information which it contains.

This has been a whistlestop tour which may give you a feel for what I mean by information and communication theory, and for some of the big ideas which come with it. I will return to the subject in the future.