Data Compression

No matter how much disk storage you have, it just seems to fill up. Compressing little-used and archived items might be a good way to save space. Does it?

There are two fundamentally different approaches to the compression of data stored in computer files. If the contents must be preserved bit-for-bit, as with software installers, archives, and normal documents, then compression cannot lose information, so must be non-lossy. If the contents are media that are to be shown or played to a human audience, then some degradation in quality may be acceptable if it helps achieve high compression, so some fidelity can be lost, and the method can be lossy.


Non-lossy compression techniques tend to perform fairly uniformly across different types of uncompressed data, and thus come in general purpose compression tools. Stuffit, Zip, RAR, 7-Zip and others have all found favour at different times for different purposes.

Because decompression must result in perfect reconstitution of the original input data, there are severe limits in what can be achieved. For example, if a file consisted of a single 1 and 9999 zeros, run length encoding could compress that file very efficiently, coding it as <start of file><1 x 1><9999 x 0><end of file>. But very few files are as empty – you may encounter these in ‘sparse disk images’ maintained by Disk Utility – so that technique seldom results in high compression ratios over average binary files.
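Run length encoding is simple enough to sketch in a few lines of Python. This is a minimal illustration only: the function names are my own, and the file is modelled as a string of '0' and '1' characters rather than real packed bits.

```python
from itertools import groupby


def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (symbol, run_length) pairs back into the original sequence."""
    return "".join(symbol * count for symbol, count in runs)


# The file described above: a single 1 followed by 9999 zeros.
sparse = "1" + "0" * 9999
runs = rle_encode(sparse)
print(runs)  # [('1', 1), ('0', 9999)]

# Non-lossy means the round trip must be exact.
assert rle_decode(runs) == sparse

# Data that changes often leaves very little for run length encoding to collapse.
mixed = "0110100110010110"
print(len(rle_encode(mixed)), "runs for", len(mixed), "symbols")
```

The second example shows why the technique seldom helps with ordinary binary files: when runs are short, the list of runs is almost as long as the data itself.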

Applying non-lossy compression to a 24-bit deep 768 by 576 pixel raw image occupying 1.4 MB space can squeeze it down to 1.1 MB using Zip compression, or 996 KB using Stuffit or 7-Zip, for a best compressed size of 71% of the original. Non-lossy LZW compression applied within the same TIFF image does as well as Zip, reaching 1.1 MB, or 79%.

However smarter use of non-lossy techniques can do far better: a folder containing 25 identical copies of that same image, uncompressed size 35 MB, shrinks to 12.4 MB using Zip, 11.5 MB with Stuffit, and a remarkable 556 KB with 7-Zip, which is clearly optimising across as well as within files to achieve 1.6%.
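One plausible explanation for that difference is how far back each method can look for repeated data. The sketch below uses Python's standard zlib module (Deflate, the method normally used inside Zip files, with a 32 KB window) and lzma module (the family of methods used by 7-Zip, with a dictionary of several megabytes). The 100 KB block of random bytes is a stand-in for one hard-to-compress image, and whether this is exactly what the tools did with the test folder is my assumption.

```python
import lzma
import os
import zlib

# Stand-in for one file that compresses poorly on its own.
block = os.urandom(100_000)

# 25 identical copies laid end to end, as in a folder archived as one stream.
folder = block * 25

# Deflate can only refer back 32 KB, so it never sees the earlier copies.
deflated = zlib.compress(folder, 9)

# LZMA's default dictionary is large enough to span the 100 KB between copies.
lzma_out = lzma.compress(folder)

print(f"original: {len(folder):>9,} bytes")
print(f"zlib:     {len(deflated):>9,} bytes")
print(f"lzma:     {len(lzma_out):>9,} bytes")

# Both remain non-lossy.
assert zlib.decompress(deflated) == folder
assert lzma.decompress(lzma_out) == folder
```

With a short window, every copy looks like new data; with a long one, only the first copy has to be stored in full.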

The lesson is that there is relatively little to be gained from trying lots of different non-lossy methods unless the data that you are compressing are unusual, in which case some may prove far better than the rest.


Lossy compression is inevitably tailored to the medium that is to be compressed.


The clearest example of this is psycho-acoustic techniques in audio compression, such as the ubiquitous MP3 developed by the Fraunhofer Institute from the 1980s onwards. Even the most discriminating human ear does not perceive all the sounds present in audio tracks, so by simplifying the audio file, the amount of information that has to be encoded and compressed is reduced, yielding higher compression ratios.

Early implementations of MP3 were prone to audible distortion of percussive sounds, and still can sound sickly if Indonesian gamelan is encoded at low bit rates, for instance. Similar degradation of quality is readily seen in the pixellation of highly compressed JPEG images, and motion blurring in video compression. These illustrate the importance of control over lossy compressors, to determine the amount of degradation in the compressed output.


Taking the same 1.4 MB raw image and saving it as a maximum quality JPEG reduces it to 596 KB, 43% of the original size, superior to all generic non-lossy compressors and without discernible reduction in image quality. High quality JPEG compression takes the file down to 352 KB (25%), moderate to 224 KB (16%), and low to 152 KB (11%), with increasingly obvious compression artefacts.

Although improved in the JPEG 2000 specification, JPEG has proved itself to be a generally excellent lossy compression method for photographic and similar images, typically at 24 bit depth. Shallower images, such as single-bit black-and-white, and vector graphics, require their own formats, and specialist methods are popular in medical radiography, aerial photography, and the like.

Applying a further non-lossy compression step to a JPEG image is usually of little benefit: the 224 KB moderately compressed image shrinks to 136 KB using Zip, an overall compression of 10%, whilst 7-Zip performs even worse at 180 KB (13%). These illustrate the general rule that once well-compressed, trying further compression techniques results in diminishing returns.
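The general rule of diminishing returns is easy to reproduce with any non-lossy compressor. As a rough sketch, the following Python compresses some made-up text with zlib, then compresses the result a second time. The word list and sizes are arbitrary choices of mine, not data from the tests above.

```python
import random
import zlib

# Build some repetitive pseudo-text from a small, invented vocabulary.
rng = random.Random(1)
words = ["disk", "image", "archive", "lossy", "codec", "zip", "frame", "audio"]
text = " ".join(rng.choice(words) for _ in range(50_000)).encode()

once = zlib.compress(text, 9)   # first pass removes most of the redundancy
twice = zlib.compress(once, 9)  # second pass has almost nothing left to find

print(f"original:      {len(text):>8,} bytes")
print(f"compressed:    {len(once):>8,} bytes")
print(f"compressed x2: {len(twice):>8,} bytes")
```

Well-compressed output looks close to random, which is exactly the kind of data that a second compressor cannot shrink much further.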


The ultimate challenge for lossy compression is that of video, which is usually accompanied by compressed audio tracks. With all but the lowest resolutions of video, lossy compression is required just to move the data to and from storage devices. One second of SD (768 x 576) video at 25 complete frames per second (25p) occupies 35 MB, so the transfer rate for raw SD video is 280 Mb per second, without the soundtrack.

With USB 2.0 offering a maximum of 480 Mb per second, and in practice realising significantly less, even SD is approaching the limit of popular external data connections, and High Definition comfortably surpasses them. The solution for low bandwidth data buses is lossy compression.
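The arithmetic behind those figures can be checked in a few lines of Python. This sketch assumes exactly 3 bytes per pixel with no file headers or padding, and decimal megabytes, so it comes out slightly below the rounded 35 MB and 280 Mb figures, which start from the 1.4 MB image file.

```python
WIDTH, HEIGHT = 768, 576   # SD frame size from the text
BYTES_PER_PIXEL = 3        # 24-bit colour
FPS = 25                   # 25 complete frames per second (25p)
USB2_MBPS = 480            # nominal USB 2.0 maximum, in megabits per second

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = frame_bytes * FPS
megabits_per_second = bytes_per_second * 8 / 1_000_000

print(f"one frame:  {frame_bytes / 1_000_000:.2f} MB")
print(f"one second: {bytes_per_second / 1_000_000:.1f} MB")
print(
    f"data rate:  {megabits_per_second:.0f} Mb/s "
    f"({megabits_per_second / USB2_MBPS:.0%} of nominal USB 2.0)"
)
```

Changing WIDTH and HEIGHT to a High Definition frame size shows how quickly uncompressed video outgrows the same connection.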

A straight QuickTime movie consisting of 25 identical images in 1 second of 25p SD video occupies 12.9 MB. There is then an almost infinite range of different compression methods and settings available. For example, using H.264 compression at a medium quality setting squeezes the movie down to 196 KB, 1.5% of original size.

A DivX movie using High Def settings to preserve resolution is 136 KB, or 1.1%. MPEG-4 export using the same H.264 compression but this time for 256 kbps transfer bandwidth is even smaller, at 116 KB or 0.9%, but unlike the previous methods has significant image degradation at scene start and finish. Windows Media Video 9 standard encoding appears impressive at 56 KB, only by collapsing resolution to 320 x 240.

Using non-lossy compression on video files shows how valuable specialised lossy techniques are: the 12.9 MB demonstration movie barely shrinks, to 12.4 MB, using Zip. It is only 7-Zip that achieves the apparently impossible, squeezing the QuickTime movie down to 516 KB, 4% of its original size, still larger than high quality lossy methods. However you end up compressing data, you must ensure that the technique used is effective for the data you are applying it to.

Common use

The most common everyday use of non-lossy compression is in files to be sent as email attachments. If you are serious about compression you should normally perform this yourself, leaving your email client to handle the less contentious issue of encoding to a 7-bit format. As that encoding step inevitably increases the size of attachments as they are sent, efficient compression can more than compensate for that overhead.
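To put a number on that overhead, here is a small Python sketch. It assumes the 7-bit encoding is Base64, the usual choice for binary attachments, and uses an artificially repetitive document of my own invention, so the compression shown is far better than you should expect from real files.

```python
import base64
import zlib

# An invented, highly repetitive document standing in for an attachment.
document = b"Compressing little-used and archived items saves space. " * 2_000

# Encoding alone: every 3 bytes become 4 characters, about a third larger.
raw_encoded = base64.b64encode(document)

# Compressing first, then encoding what is left.
zipped_encoded = base64.b64encode(zlib.compress(document, 9))

print(f"original:             {len(document):>8,} bytes")
print(f"encoded only:         {len(raw_encoded):>8,} bytes")
print(f"compressed + encoded: {len(zipped_encoded):>8,} bytes")
```

Real mail adds line breaks to the encoded text, which this sketch omits, so the true overhead is slightly higher.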

First discover which formats the recipient can decompress. Although Zip is fairly universally available across all platforms, Stuffit is proprietary and only commonly supported on Macs. GNU zip’s .gz files can be decompressed on almost any platform but PC users may not be aware of the format.

RAR enjoyed popularity for a while, but 7-Zip seems the more efficient and has good free cross-platform support. Another option is to send a self-decompressing archive, which contains a decompressor bundled into a little platform-specific application, but these often only serve to confuse and carry significant overhead.

Tools: Performance

To illustrate how leading compression tools perform in real life, I put Entropy (App Store, £13.99), WinZip (App Store, £22.99), and OS X’s bundled Archive Utility (free) through their paces on substantial tasks: 495 JPEG images occupying 1.02 GB, 213 PDF documents totalling 978 MB, a single 1.08 GB disk image, and 2360 text files of 102 MB total size.

Entropy offers a range of compression methods, but the most efficient are slow, and even they may not achieve much compression.

The best performer in terms of compressed file size was Entropy running the 7zip technique in high compression mode. This shrank the text files to 7.5% of their original size, JPEGs to 91%, PDFs to 86%, and the disk image to 94%. However this also took the longest time, ranging from 3.75 to 4.25 minutes per GB of uncompressed files. Thankfully decompression takes under a quarter of the compression time, less than a minute per GB.

WinZip can also be slow at times when using high compression methods.

The best performer in terms of time to compress was WinZip running its standard Zip 2.0 algorithm, which took only 11 to 20 seconds per GB of uncompressed files. However compression ratios were not as good as achieved by 7zip: 24% of original size for text content, 96% for JPEGs, 90% for PDFs, and no change at all for the disk image.
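You can get a feel for the same trade-off between time and compressed size without either app. This rough Python sketch times the standard zlib and lzma modules on made-up text. The helper name, word list and settings are my own, the timings will vary from one machine to another, and none of this reproduces the tests reported here.

```python
import lzma
import random
import time
import zlib


def measure(name, compress, data):
    """Return (name, compressed size as a fraction of the original, seconds taken)."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    return name, len(packed) / len(data), elapsed


# Invented test data: pseudo-text built from a small vocabulary.
rng = random.Random(7)
words = ["storage", "archive", "image", "video", "audio", "text", "file", "folder"]
data = " ".join(rng.choice(words) for _ in range(200_000)).encode()

results = [
    measure("zlib level 1", lambda d: zlib.compress(d, 1), data),
    measure("zlib level 9", lambda d: zlib.compress(d, 9), data),
    measure("lzma preset 9", lambda d: lzma.compress(d, preset=9), data),
]

for name, ratio, seconds in results:
    print(f"{name:<14} {ratio:6.1%} of original in {seconds:.3f} s")
```

Try it on your own files by replacing data with the contents of a document, a JPEG and a disk image to see how much the type of content matters.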

Neither of the commercial apps was flawless: Entropy seemed unable to engage high compression mode for the RAR method, and despite claims of excellent performance, WinZip was unable to compress in its new Zipx format. Other modes were roughly comparable with free Archive Utility in either GZip (cpgz) or standard zip methods – far quicker than 7zip high compression but slower than WinZip, and compressed file sizes similar to those achieved by WinZip.

Apple’s free Archive Utility only supports Zip and GnuZip (.gz) formats, but is simple to use.

With a fast Mac and ample time, Entropy’s 7zip is the best performer. When time is of the essence, WinZip is worth buying, otherwise Archive Utility is probably as good as you need.

Tools: Compressor

Formerly part of Apple’s costly Final Cut Studio suite, Compressor 4.2 is now sold separately in the App Store for the modest price of £39.99. This opens up all the compression-decompression modules (CODECs) available in QuickTime for transcoding video and audio: you can import media in any format supported by your CODECs, and export it to a different format.

One of Compressor’s most important features is a library of settings that you can extend. This ensures that you can develop optimised transcodings for each purpose, then save and re-use those settings whenever you need to. To further simplify the process of transcoding, once you have developed settings that you wish to use frequently, you can turn these into a droplet, allowing you just to drop a file onto it to execute transcoding.

Compressor lets you preview the effect of proposed compression on video.

When you are developing settings, Compressor’s Preview window gives you direct access to see and hear the results of your transcoding. A neat feature of this is that you can adjust the vertical divider in the window, to show specific details in the image before and after the conversion.

Compressor also has modest pretensions as an effects editor, offering a range of straightforward video and audio effects, including adjustment of image geometry, fading, timecode overlays, and the like, that could prove convenient in batch operation, for instance. As conversions are performed in the background using Apple Qmaster, now bundled within the Compressor app, they can be distributed over different computers provided that each has Compressor installed.

Although very convenient and powerful, Compressor cannot overcome one of the fundamental laws of compression: each time that you encode using a lossy CODEC, there will be irreversible degradation in the output. You therefore need to plan your workflow to keep encoding and transcoding to a minimum, whenever possible working from master files.

Updated from the original, which was first published in MacUser volume 29 issue 05, 2013.