File Integrity 8: Compression, encryption and disk images

We often can’t, or may not wish to, store old files in standard document formats. In many cases they’re in compressed archives or disk images, or we need to keep them inside encrypted sparse bundles to protect sensitive contents. Each of these container formats has a reputation for being fragile when corrupted. In this article, I look briefly at how well deserved that reputation is, by deliberately corrupting some examples using Vandal.
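To make a rate such as 1 B/MB concrete, here is a minimal Python sketch of that kind of damage. It is not Vandal's code: the function name corrupt, the decimal megabyte of 1,000,000 bytes, and the choice to change randomly selected bytes are all assumptions made for illustration.

```python
import random
from typing import Optional


def corrupt(data: bytes, bytes_per_mb: float = 1.0, seed: Optional[int] = None) -> bytes:
    """Return a copy of data with roughly bytes_per_mb bytes changed per MB."""
    if not data:
        return data
    rng = random.Random(seed)  # a fixed seed makes a test run repeatable
    # Assumption: 1 MB = 1,000,000 bytes; always damage at least one byte.
    count = max(1, round(len(data) / 1_000_000 * bytes_per_mb))
    count = min(count, len(data))
    damaged = bytearray(data)
    # sample() picks distinct offsets, so no byte is hit twice.
    for offset in rng.sample(range(len(data)), count):
        # XOR with a non-zero value guarantees the byte really changes.
        damaged[offset] ^= rng.randrange(1, 256)
    return bytes(damaged)
```

Anything like this should only ever be run on a copy of a file, never on the sole original.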

Zip archives

Having had problems in the past with .zip archives which had apparently become too corrupted to open, I was expecting the same with modern examples. I was very surprised when low levels of corruption, of 1-2 B/MB (bytes per megabyte), apparently didn’t cause any problems in opening and decompressing them using the macOS Archive Utility. They could also be opened using other apps which handle .zip archives, including WinZip.

That said, at higher levels of corruption all compressed archives run the risk that some of their contents become unusable, and may require specific decompression tools which cope better with damaged archives. Some forms of compression are designed to perform better, but I haven’t discovered any which implement a robust and testable form of error correction code (ECC).
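One way to find out which members of a damaged archive are affected is to check each against the CRC-32 stored in the archive. The sketch below does that with Python's standard zipfile module. The function name check_zip and its status strings are hypothetical, and it says nothing about how Archive Utility or WinZip decide whether an archive is usable.

```python
import zipfile
import zlib


def check_zip(archive) -> str:
    """Report the first problem found in a Zip archive (path or file object)."""
    try:
        with zipfile.ZipFile(archive) as zf:
            # testzip() reads every member and compares its CRC-32;
            # it returns the name of the first bad member, or None.
            bad = zf.testzip()
    except zipfile.BadZipFile as err:
        # The archive's own directory structure could not be read.
        return f"unreadable: {err}"
    except zlib.error as err:
        # A damaged deflate stream can fail before any CRC is compared.
        return f"damaged compressed data: {err}"
    return "ok" if bad is None else f"damaged member: {bad}"
```

A result of "ok" only means that every member matched its stored checksum. A damaged member may still be partly recoverable with a more tolerant tool.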

Some feasible mechanisms for file corruption become more likely, more severe, or both, the larger a file is or, in the case of smaller files, the more files there are. Storing files in compressed archives reduces those risks, by squeezing them into a smaller space, or turning many files into one. Set against the risk that damage makes some of an archive’s contents unusable, this makes it complex to assess the effect of compressed archives on resilience to corruption: there are pros and cons.

For large compressed archives, existing ECC methods are unlikely to prove of much benefit. But for Zip archives of less than around 10 MB, it might make good sense to store them in archives with Parchive (Par2) ECC, to minimise the risk of corruption rendering them unusable. ECC should be applied to the compressed archive, not before compression.
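As a rough sketch of what applying Par2 to an already compressed archive could look like, the Python below builds and runs commands for the par2cmdline tool. It assumes that par2 has been installed separately, as it isn't part of macOS. The function names and the 10% redundancy figure are illustrative choices, not recommendations from testing.

```python
import shutil
import subprocess
from typing import Dict, List


def par2_commands(archive: str, redundancy_percent: int = 10) -> Dict[str, List[str]]:
    """Build par2cmdline argument lists for one archive (nothing is run here)."""
    recovery = archive + ".par2"
    return {
        # -r sets the amount of recovery data as a percentage of the source size.
        "create": ["par2", "create", f"-r{redundancy_percent}", recovery, archive],
        "verify": ["par2", "verify", recovery],
        "repair": ["par2", "repair", recovery],
    }


def run_par2(action: str, archive: str) -> int:
    """Run one action if the par2 tool is installed; return its exit status."""
    if shutil.which("par2") is None:
        raise FileNotFoundError("par2 command-line tool not found on PATH")
    return subprocess.run(par2_commands(archive)[action]).returncode
```

Calling run_par2("create", "photos.zip") after compression would write the recovery files alongside the archive, and "verify" followed by "repair" would later check and attempt to restore it.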

Disk images

Apple’s standard disk images incorporate checksums which are used to verify that they are undamaged before they can be opened. Every disk image which I have damaged using Vandal, even at low rates of 1 B/MB, fails to open normally as a result.

You can work around this using hdiutil in Terminal, or more simply with the excellent free FastDMG, which is a friendly front-end to those complex command options. However, these may still be unable to open a corrupted disk image, and even when they can, the contents are likely to be irreparably damaged. The extent of that damage is unpredictable: it could be just the loss of a single unimportant file, or the whole image may appear empty.
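For those who prefer to script the hdiutil route, here is a hedged Python sketch. It relies on hdiutil's verify verb and the -noverify and -readonly options of its attach verb. The function names are hypothetical, it runs only on macOS, and I make no claim that FastDMG issues the same commands.

```python
import subprocess
import sys
from typing import List


def verify_command(image: str) -> List[str]:
    """Command that checks a disk image against its stored checksum."""
    return ["hdiutil", "verify", image]


def attach_command(image: str) -> List[str]:
    """Command that mounts an image without checking its checksum first."""
    # -noverify skips the checksum test; -readonly avoids writing to a damaged image.
    return ["hdiutil", "attach", "-noverify", "-readonly", image]


def attach_skipping_verification(image: str) -> bool:
    """Report the checksum result, then try to mount the image read-only anyway."""
    if sys.platform != "darwin":
        raise OSError("hdiutil is only available on macOS")
    verified = subprocess.run(verify_command(image)).returncode == 0
    print("checksum OK" if verified else "checksum failed, attaching without verification")
    return subprocess.run(attach_command(image)).returncode == 0
```

Even when the attach step succeeds, the mounted contents need checking file by file, as the damage could be anywhere.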

If you have to use disk images to store important files, you should consider protecting them using Parchive (Par2) ECC. In general, though, they are not a format which you should choose to use for archives.

Sparse bundles and encryption

Sparse bundles consist of some top-level files in the enclosing folder, and a folder full of storage bands. With such a complex structure, and data divided across bands which can be 8.4 MB in size, I expected these to be very fragile in the face of corruption. Yet even after I corrupted two of the band files at a rate of 1 B/MB, a test encrypted sparse bundle opened correctly.
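Because each band is an ordinary file, it is easy to record a digest of every band and later see which ones have changed. The Python sketch below assumes that the bands live in a folder named bands inside the bundle. The function names and the choice of SHA-256 are illustrative. It detects change only, and can't repair anything.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def band_digests(bundle: str) -> Dict[str, str]:
    """Map each band file name in a sparse bundle to its SHA-256 digest."""
    bands = Path(bundle) / "bands"  # assumption: bands are stored in this subfolder
    digests = {}
    for band in sorted(bands.iterdir()):
        if band.is_file():
            h = hashlib.sha256()
            with band.open("rb") as f:
                # Read in 1 MiB chunks so large bands are never loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[band.name] = h.hexdigest()
    return digests


def changed_bands(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List bands that were added, removed or altered between two snapshots."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))
```

This is only meaningful for a bundle which hasn't been mounted and written to between the two snapshots, as normal use changes bands too.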

So far I have been unable to find any detailed documentation on the current sparse bundle format which establishes whether it might use some form of ECC. For the moment I am surprised at its apparent resilience to limited corruption, and will look in more detail at this in a future article here.

Encryption and resilience

Many users are concerned that encrypted volumes may prove generally less resilient to damage or corruption than unencrypted volumes, particularly internal SSDs in Macs equipped with T2 chips, which have no unencrypted option.

Because the T2 chip acts as the disk controller for internal storage in these models, and access to that storage can only be performed through that specific T2 chip, whether or not the data are encrypted shouldn’t be an issue affecting resilience at all. Disk repair and recovery tools also have to work through the disk controller, and encryption should therefore be completely transparent to them.

More significant is the fact that internal storage can only be accessed through that specific T2 chip, which is both a security feature and a limitation on recovery methods which can be used on the SSD. If a technician could remove and access the SSD in your Mac, then so could a thief, for example. Experience shows that the most common significant failure mode for most SSDs is total failure, which is usually unrecoverable whether the contents are encrypted or not, and regardless of any ECC system. The only defence here is good backing up.

Updated 1140 UTC 24 April 2020 to correct details on disk images, with thanks to Joss for reminding me that you can still open damaged disk images, and for the link to FastDMG.