If we can compress text down so efficiently, using Morse code or even better using a computer, can we do the same for still images, such as photos?
Most images consist of a rectangular array, traditionally given row by row, of colour specifications for each pixel in sequence. Although the depths of images differ, for the sake of simplicity I will here assume that our image consists of 3 channels containing one-byte red, green, and blue colour values, and a fourth alpha channel (which might contain transparency or other information). I will also ignore any additional information which comes with many images, such as metadata and the colour profile.
Individual images may have inherent structure within their content, but compression cannot assume any particular structure, and must still perform acceptably for unusual images.
Run length encoding, as mentioned for text, can still be a useful if crude scheme: for the simplest of images, it can perform quite well. But in many real-world images, there are small changes between adjacent pixels which break up long runs of identical values. Furthermore, we need to look not only at linear runs along the rows, but also at 2D areas covering adjacent rows.
It is difficult to devise a lossless compression scheme which achieves good levels of compression across the great majority of images without performing poorly on more unusual cases.
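To illustrate how fragile run length encoding is, here is a minimal Python sketch working on a single row of RGBA pixels. The function names and the two test rows are invented for this example; real image formats which use run length encoding pack their runs rather differently.

```python
def rle_encode(row):
    """Collapse a row of pixel values into [value, run_length] pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # same as previous pixel: extend the run
        else:
            runs.append([pixel, 1])   # different pixel: start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


# A row of 500 identical red pixels collapses to a single run.
flat = [(255, 0, 0, 255)] * 500
# The same row with a one-level flicker in the red channel has no runs at all.
noisy = [(255 - (i % 2), 0, 0, 255) for i in range(500)]

print(len(rle_encode(flat)))   # 1
print(len(rle_encode(noisy)))  # 500
```

A difference of one level in one channel, which no viewer would notice, is enough to stop the scheme from compressing the row at all.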
However, with a better understanding of the limits to human visual perception (many of which are covered in this series of articles), it has been possible to devise lossy compression methods which – when used carefully – produce images which are usually impossible to distinguish from the original. The most popular of these remains JPEG.
The full sequence of events required to encode an image using JPEG consists of:
- changing its colour space to Y’CʙCʀ;
- reducing the resolution of the Cʙ and Cʀ channels, as we perceive fine colour details more poorly than we perceive brightness details;
- splitting the image into 8 x 8 blocks for frequency-based analysis using the Discrete Cosine Transform;
- reducing the high-frequency components in each block according to the quality setting chosen by the user (0-100);
- losslessly compressing the block data using a form of Huffman encoding, in which the most common data is given the shortest encoding (as with text and Morse code).
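To make those steps more concrete, here is a minimal Python sketch of the first four, applied to a single 8 x 8 block. It is an illustration under stated assumptions rather than a working encoder: the colour conversion uses the full-range coefficients from the JFIF specification, the chroma reduction is a plain 2 x 2 average, the quantisation table `ILLUSTRATIVE_TABLE` is invented for this example and is not the one in the standard, and the final Huffman stage is left out. All the function names are hypothetical.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to Y'CbCr (full-range JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(channel):
    """Halve a channel's resolution by averaging each 2 x 2 group of samples.

    Assumes the channel has even width and height.
    """
    return [[(channel[y][x] + channel[y][x + 1]
              + channel[y + 1][x] + channel[y + 1][x + 1]) / 4
             for x in range(0, len(channel[0]), 2)]
            for y in range(0, len(channel), 2)]


def dct_8x8(block):
    """Two-dimensional type-II DCT of an 8 x 8 block of level-shifted samples."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


# Hypothetical table: coarser steps for higher frequencies (NOT the standard one).
ILLUSTRATIVE_TABLE = [[8 + 4 * (u + v) for v in range(8)] for u in range(8)]


def quantise(coeffs, table):
    """Divide each coefficient by its step and round: this is where data is lost."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)]
            for u in range(8)]


# Brightness channel of an 8 x 8 block of identical red pixels, shifted by -128.
y, cb, cr = rgb_to_ycbcr(255, 0, 0)
luma_block = [[y - 128] * 8 for _ in range(8)]
quantised = quantise(dct_8x8(luma_block), ILLUSTRATIVE_TABLE)
nonzero = sum(1 for row in quantised for value in row if value != 0)

print(quantised[0][0], nonzero)        # -52 1
print(subsample_2x2([[1, 3], [5, 7]]))  # [[4.0]]
```

A uniform block leaves only one non-zero value, the average (DC) term, so there is almost nothing left for the final lossless stage to encode.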
Key features of this process are that it is designed to lose the information which we are least likely to perceive, and that the user controls the amount of information which is lost, and thus the degree of any degradation in image quality.
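How a single quality number controls the loss varies between encoders. As one example, the sketch below follows the convention used by the widely deployed IJG libjpeg library, which turns the quality setting into a percentage by which a base table of quantisation steps is scaled. The function name `scale_steps` is hypothetical, and the sample steps are the first row of the example luminance table given in the JPEG standard; other encoders may map quality quite differently.

```python
def scale_steps(base_steps, quality):
    """Scale quantisation steps for a quality setting, libjpeg-style.

    Larger steps discard more detail; a step of 1 discards almost none.
    """
    quality = max(1, min(100, quality))   # libjpeg treats 0 as 1
    if quality < 50:
        percent = 5000 // quality
    else:
        percent = 200 - 2 * quality
    # Round to nearest, then keep each step within the 8-bit range 1..255.
    return [min(255, max(1, (step * percent + 50) // 100))
            for step in base_steps]


# First row of the standard's example luminance table (low to high frequency).
base = [16, 11, 10, 16, 24, 40, 51, 61]

print(scale_steps(base, 5))    # [160, 110, 100, 160, 240, 255, 255, 255]
print(scale_steps(base, 50))   # [16, 11, 10, 16, 24, 40, 51, 61]
print(scale_steps(base, 100))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Under this convention, quality 50 uses the base table unchanged, quality 100 reduces every step to 1, and very low settings push the steps for high frequencies to their maximum.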
Taking a standard 500 x 500 image consisting entirely of identical red pixels, JPEG compresses it without loss from its uncompressed size of 1 MB (750 KB with only three channels) to just 6 KB, no matter what the quality setting.
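The arithmetic behind those figures is easily checked. This short Python sketch assumes one byte per channel, as above, and takes 1 KB as 1,000 bytes; the function name is hypothetical.

```python
def raw_size_bytes(width, height, channels, bytes_per_channel=1):
    """Uncompressed size of an image, ignoring metadata and colour profile."""
    return width * height * channels * bytes_per_channel


rgba = raw_size_bytes(500, 500, 4)
rgb = raw_size_bytes(500, 500, 3)
jpeg = 6_000  # the 6 KB JPEG file quoted above

print(rgba)                # 1000000
print(rgb)                 # 750000
print(round(rgba / jpeg))  # 167
```

So for this rather artificial image, the JPEG file is roughly 1/167 of the size of the four-channel original.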
Although intended for photographic images rather than fine black-and-white patterns, JPEG can still do well on a test pattern. In this and other test cases shown here, the image on the left was compressed at a quality level of 5, that in the middle at 50, and that on the right at 100. For this pattern, file sizes are (respectively) 36 KB, 70 KB, and 149 KB. To see the images in their full glory (and without Moiré effects), you need to click on them and magnify to full size.
Lower quality values do produce quite visibly defective images, though. In this series of colour gradients, quality level 5 produced a file of 13 KB, level 50 of 15 KB, and level 100 of 150 KB. Once quality is reduced below about 50, it is common to see progressively more obvious blocks of uniform (and often incorrect) colour, as shown in this example of a detail from Delacroix’s painting (via Wikimedia Commons).
Overall, on real-world rather than exceptional images, near-lossless JPEG compression should be able to reduce the file size to less than a third of that of the original uncompressed image. Reducing the quality to 40-60 should result in a barely perceptible reduction in image quality, but a reduction in file size to around one thirtieth of that of the original uncompressed image. Further reduction in quality then starts to result in obvious pixellation effects which degrade the image significantly.
By discarding some of the original information, which we perceive only very weakly or not at all, JPEG achieves very high storage efficiency: far superior to that of any of the lossless methods of compressing text.
Other methods of lossy compression, which may use different techniques such as wavelet transforms, can perform even better than JPEG does, but for most users JPEG is efficient enough, and does not damage the look of an image.
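To put those rules of thumb into numbers, here is a rough Python estimate for a hypothetical 4000 x 3000 photo with three one-byte channels. The fractions are just the approximate ones given above, not measurements, and the function name is invented for this example.

```python
def estimated_jpeg_sizes(width, height, channels=3):
    """Apply the rough one-third and one-thirtieth rules to an image's raw size."""
    raw = width * height * channels
    return {
        "raw": raw,
        "near_lossless_max": raw // 3,   # "less than a third"
        "quality_40_60": raw // 30,      # "around one thirtieth"
    }


sizes = estimated_jpeg_sizes(4000, 3000)
print(sizes["raw"])                # 36000000
print(sizes["near_lossless_max"])  # 12000000
print(sizes["quality_40_60"])      # 1200000
```

On those assumptions, a 36 MB uncompressed photo would come down to under 12 MB at near-lossless quality, and to a little over 1 MB at a quality of 40-60.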