One of the most common types of file to want to archive reliably is the digital image. Whether you just want to put important family photos into safe storage, or you’re more serious about your digital photography, you’ll want archived images to have the best possible chance of surviving for years. Common practice for many is to store original ‘raw’ files for reference, and to keep another copy (perhaps fully processed) in a different format. With many of us now using iOS devices to capture important images comes another question: should you store those in their original HEIC format, or convert them to something else?
This article describes some results from tests in which I’ve deliberately corrupted test images in different formats using my tool Vandal. It’s important to understand that Vandal doesn’t try to mimic any particular type of data corruption, but makes precise, evenly spread, randomised changes to data at a byte level. Some ‘natural’ errors won’t involve single bytes like that, but might flip single bits; others might change whole sectors at a time. Vandal isn’t a model for any specific type of corruption, merely a tool for producing single-byte changes of this type.
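To make the idea of evenly spread, randomised single-byte changes concrete, here is a minimal Python sketch. It is not Vandal’s actual code or algorithm: the function name `corrupt_bytes`, its parameters, the choice of one changed byte per equal-sized stretch of the file, and the definition of 1 MB as 1,000,000 bytes are all assumptions made for illustration.

```python
import random


def corrupt_bytes(data, rate_per_mb, seed=0):
    """Return (damaged copy, offsets changed) for roughly rate_per_mb bytes per MB.

    Hypothetical helper: 1 MB is assumed to be 1,000,000 bytes.
    """
    if not data:
        return data, []
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    count = max(1, round(len(data) / 1_000_000 * rate_per_mb))
    count = min(count, len(data))
    stride = len(data) // count  # length of each equal-sized stretch
    damaged = bytearray(data)
    offsets = []
    for i in range(count):
        # Pick one random position inside each stretch, so the changes
        # are spread across the whole file rather than clustered.
        offset = i * stride + rng.randrange(stride)
        # XOR with a non-zero value guarantees the byte really changes.
        damaged[offset] ^= rng.randrange(1, 256)
        offsets.append(offset)
    return bytes(damaged), offsets
```

With these assumptions, a 2 MB file at a rate of 4 bytes per MB would have 8 bytes changed, one in each 250,000-byte stretch, and the file length stays the same.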
I looked at images from a range of sources in five popular formats: TIFF, PNG, JPEG, HEIC (from an iPhone XR), and some different camera ‘raw’ formats. Wherever possible, I’ve used smaller image files of 1-2 MB, so that Vandal can produce 1-8 changed bytes in each image. This wasn’t possible with raw or TIFF images though, whose minimum sizes were 4.7 and 6.1 MB respectively.
Even using very low rates of data corruption of 1-4 changed bytes per megabyte (B/MB), most images suffered substantial damage, much of which would be difficult if not impossible to repair well. All the corrupted images could be opened successfully, but a few lost all usable content. For each file type, I show one of the test images with 1 and 4 B/MB corruption applied to it.
In JPEG images at low levels of corruption, 2 bytes in a 2 MB image and 4 bytes in a 4 MB image, the visible artefact was limited to a partial horizontal line; such lines multiplied and became more prominent as corruption rose to 8 and 16 bytes respectively. (Here using JPEG 1:8 compression, baseline DCT, Huffman coding.)
Damage was more severe when the JPEG was recompressed, for example from 80% to 100%, and the lossless JPEG format produced one of the most badly damaged images of all.
Overall, JPEG appeared moderately resilient to Vandal’s low levels of corruption, although the more serious damage to some images is a significant concern.
The effect of even low levels of corruption on PNG format images was quite destructive. Although the upper section of each image was preserved unaltered, the remainder of the image below a variable cut-off line was badly affected and unusable. With higher levels of corruption, the cut-off moved higher up the image, leaving progressively less of it usable. (Here using 1:11 compression, deflate/inflate, noninterlaced.)
This distinctive behaviour is visually destructive, and makes PNG unsuitable for use where even low levels of file corruption might occur.
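One plausible explanation for that cut-off is that PNG stores its image data as a deflate-compressed stream, in which a single damaged byte can derail decoding of everything that follows it. The sketch below illustrates the general effect using Python’s standard `zlib` module on synthetic data. It is not a PNG decoder and doesn’t model PNG’s row filters or checksums; the helper names and the sample data are assumptions made for illustration.

```python
import random
import zlib


def flip_byte(data, offset):
    """Return a copy of data with every bit of one byte inverted."""
    damaged = bytearray(data)
    damaged[offset] ^= 0xFF
    return bytes(damaged)


def inflate_until_error(stream, chunk=64):
    """Decode a zlib stream in small chunks, keeping output up to the first error."""
    decoder = zlib.decompressobj()
    out = bytearray()
    for start in range(0, len(stream), chunk):
        try:
            out += decoder.decompress(stream[start:start + chunk])
        except zlib.error:
            break  # the decoder can no longer make sense of the stream
    return bytes(out)


def intact_prefix(original, recovered):
    """Count the leading bytes that still match the original."""
    count = 0
    for a, b in zip(original, recovered):
        if a != b:
            break
        count += 1
    return count


def make_sample(size=50_000, seed=1):
    """Synthetic, compressible stand-in for pixel data (not a real image)."""
    rng = random.Random(seed)
    return bytes(rng.choice(b"\x10\x40\x80\xc0") for _ in range(size))


pixels = make_sample()

# Uncompressed: damage stays confined to the one byte that was changed.
raw_damaged = flip_byte(pixels, len(pixels) // 2)
raw_changed = sum(a != b for a, b in zip(pixels, raw_damaged))

# Compressed: the same kind of damage, applied to the deflate stream.
packed = zlib.compress(pixels)
recovered = inflate_until_error(flip_byte(packed, len(packed) // 2))
intact = intact_prefix(pixels, recovered)

print(f"uncompressed: {raw_changed} byte differs")
print(f"compressed: first {intact} of {len(pixels)} bytes intact")
```

In the uncompressed copy only the changed byte differs, whereas in the compressed copy the data decoded before the damaged byte survives and what follows is either garbage or can’t be decoded at all. That is at least consistent with an intact upper section above an unusable remainder.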
Most uncompressed TIFF images appeared completely unaffected by single-byte corruption as produced by Vandal, even at higher levels.
The snag is, of course, their huge file size: in the case of this image, 60.3 MB for a 3664 × 2742 image. One test TIFF image proved an exception to that resilience: it turned out to be the only one that used compression (1:2). In that case, all corrupted versions were cropped to show a small upper region of intact image, with black below, making the corrupted file unusable. If TIFF is to be used in situations where corruption could occur, it therefore needs to be uncompressed.
Corrupted HEIC images have a distinctive artefact, with each changed byte resulting in small, badly corrupted rectangles scattered across an otherwise unaltered image. At low levels of corruption, these might miss important areas of the image, leaving the whole still usable, but as the corruption increases, the rectangles become more numerous, soon rendering the image useless.
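As a rough check on the 60.3 MB file size quoted above, the size of uncompressed pixel data can be estimated from the image dimensions alone. This sketch assumes three colour channels at 16 bits each, ignores headers and metadata, and takes 1 MB as 1,000,000 bytes; none of those details is stated for the test image, so treat them as assumptions that happen to match the quoted size.

```python
def uncompressed_bytes(width, height, channels=3, bits_per_channel=16):
    """Estimate the size of raw pixel data, ignoring headers and metadata."""
    return width * height * channels * bits_per_channel // 8


def as_megabytes(size_in_bytes):
    """Convert bytes to MB, assuming 1 MB = 1,000,000 bytes."""
    return size_in_bytes / 1_000_000


size_16bit = uncompressed_bytes(3664, 2742)                     # 60,280,128 bytes
size_8bit = uncompressed_bytes(3664, 2742, bits_per_channel=8)  # 30,140,064 bytes

print(f"16 bits per channel: {as_megabytes(size_16bit):.1f} MB")
print(f"8 bits per channel: {as_megabytes(size_8bit):.1f} MB")
```

Under those assumptions the 16-bit estimate comes to about 60.3 MB, and an 8-bit version of the same image would be about half that. With that many bytes of pixel data, a handful of changed bytes alters only a handful of individual colour samples, which may be why the damage is so hard to see.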
Because of this behaviour, HEIC is not suitable for use in situations where the image might become corrupted, although some corrupted images may still prove usable.
Camera ‘raw’ formats vary very widely in their structure, and as a result the effects of corruption also vary widely. In general, though, most of the corrupted camera raw images were severely affected, and rendered unusable. Unless you know a particular raw format is resistant to corruption, you should avoid using it where that’s considered a risk.
By far the most resilient image file format in the face of corruption is uncompressed TIFF, which also results in the largest files. If a second file format is to be kept to guard against the effects of corruption, then uncompressed TIFF appears to be the best choice.
Most JPEG images performed quite well too, displaying relatively small visual artefacts as a result of corruption, and careful use of this format should also prove quite resilient. JPEG is widely understood, and particularly valuable or important images can be repaired and retouched by specialists if necessary. If the resulting JPEG image is smaller than about 10 MB, it could perhaps be combined with ECC protection at 10% using MacPAR deLuxe to offer even greater potential for recoverability.
Images captured on iOS devices would be better preserved by conversion to JPEG before storage, rather than leaving them in their original HEIC format, which doesn’t appear as resilient to even low levels of corruption.
In general, lossless JPEG and PNG formats appear susceptible to visual artefacts even at low levels of corruption, and should be avoided wherever there is a risk of it.
With other images and other types of file corruption, results may differ, as will your mileage, I’m sure. I hope this has still provided some useful suggestions.