File Integrity 1: Why bother?

Over coming articles, I’m going to consider a subject which most of us just take for granted: that the contents of our files can be relied upon – file integrity. It’s one of those things we know from bitter experience isn’t completely reliable, but what more can you do than trust your Mac? This article starts by looking at what can go wrong, and what’s there to prevent it.

There are many different reasons that the contents of a file can change, including:

  • we (or a process acting on our behalf) can change it deliberately, by editing the file;
  • non-malicious software can change it accidentally, for example by writing to the wrong file or storage block;
  • the data stored can become altered as a result of failure or ‘bit rot’;
  • malicious software can change the data.

I’m concerned here with the latter three and their variations.

There was a time when it wasn’t uncommon for wobbly apps, often just as they were about to crash, to wreak destruction among files stored on disk. At that time, many apps used to write data using low-level commands for speed. Thankfully that’s now unusual, and accidental modification of files by other apps should be a rarity. But it can still happen, even with protections such as sandboxes.

Hard disks are well-known for developing errors and ‘bad blocks’ which can corrupt files, and despite some claims to the contrary this remains true, to a lesser extent, of SSDs. The worst cases result in complete failure of the storage and send you to your backups, but minor errors and ‘bit rot’ appear to be more common. All storage media become unreliable with use and time, although meaningful estimates of error rate are very hard to come by.

If you’ve ever tried accessing old DVD-R or CD-R storage, you’ll have come across examples where files can only be read with errors, or the whole disc is unreadable, even when it has been stored in good conditions in the dark.

One previously common cause of data corruption was failure to complete outstanding disk operations before a forced restart due to a kernel panic or other severe fault. File systems such as HFS+ are particularly prone to this because of the way that they write changes out to disk. Apple introduced journalling to tackle this, and that has been effective in reducing its occurrence, but doesn’t eliminate it altogether. APFS was designed on the ‘copy on write’ principle, which should make this a problem of the past, although in practice it can still occur very rarely.

The best-known examples of malicious software modifying user files are, of course, in ransomware. Such wholesale encryption of files is quite a different issue, but several malicious apps and PUPs have also corrupted user files, and may do so unintentionally.

Overall, files kept on recent storage systems in modern computers are still prone to damage and corruption, although they should be less of a problem than they have been in the past.

Storage manufacturers now try to reduce the chances of files becoming corrupted or damaged, for instance by using error-correcting codes (ECC) in their products. There’s a conflict here in that ECC requires additional storage, effectively reducing that available to the user, and increases its cost per GB. Storage is a price-sensitive market, and few purchasers are prepared to pay 25% more or get 25% less capacity just to have good ECC cover. Its benefits are also not readily visible to the user, while the additional processing required during writing can impair performance.

RAID systems are widely used to safeguard data integrity. The most fault-tolerant, level 6, usually uses ECC, but is far from efficient: four 1 TB disks used at this level provide a total of only 2 TB of effective storage capacity, making it particularly expensive when implemented using SSDs. Write performance is also significantly slowed, even when implemented in hardware.

Error-correction can also be incorporated into the file system, as is the case with Btrfs (Linux) and ZFS (cross-platform). This involves a process of ‘data scrubbing’ which scans the file system detecting errors and trying to repair them. Although OpenZFS is available for macOS and compatible with Catalina, installation and use are non-trivial and only feasible for advanced users.

Neither HFS+ nor APFS attempts any form of error correction on stored data, nor has Apple announced any intention that APFS ever will.

If error correction isn’t readily available in macOS, the next best thing is to be able to check the integrity of important files. That should enable you to detect a damaged file and replace it with a good copy from backup or archive.

Alternatives to testing integrity aren’t particularly helpful. You could, for example, check the file modification date of all important files, assuming that you know for each file exactly what that should be. In any case, that will only reveal files which have been modified through the file system: bit rot and similar damage alter only the data, not the modification date. For some, opening and checking the document in its normal editor/viewer is sufficient validation, but that’s only true if you can compare the current version against an earlier copy.
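To make that concrete, here’s a minimal sketch in Swift of the basic technique such checking relies on: compute a cryptographic digest of a file while it’s known to be good, store it, then recompute and compare later. This isn’t Dintch’s actual implementation, and the file path is purely illustrative; it uses Apple’s CryptoKit framework, which requires macOS 10.15 or later.

import Foundation
import CryptoKit

// Compute the SHA-256 digest of a file's contents as a hex string.
// Reading the whole file into memory at once is fine for ordinary documents;
// very large files would be better read and hashed in chunks.
func sha256Hex(of url: URL) throws -> String {
    let data = try Data(contentsOf: url)
    let digest = SHA256.hash(data: data)
    return digest.map { String(format: "%02x", $0) }.joined()
}

do {
    // Hypothetical path, purely for illustration.
    let fileURL = URL(fileURLWithPath: "/Users/me/Documents/Report.pages")

    // Record the digest while the file is known to be good, and keep it somewhere safe.
    let recordedDigest = try sha256Hex(of: fileURL)

    // …later, recompute and compare: any change to the data, however small,
    // produces a different digest, even if the modification date is untouched.
    let currentDigest = try sha256Hex(of: fileURL)
    if currentDigest == recordedDigest {
        print("Contents match the recorded digest.")
    } else {
        print("Contents have changed – restore this file from backup.")
    }
} catch {
    print("Couldn't read the file: \(error)")
}

Because the digest depends only on the file’s data, it catches bit rot and other silent changes that a modification date check would miss; what it can’t do is repair the damage, which is why you still need a good copy in your backups.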

Neither HFS+ nor APFS performs any integrity checking on the data stored in regular files, and Apple has given no sign that APFS ever will. APFS does currently perform limited integrity checking on certain file system metadata, but that’s as far as it goes. This may seem a major shortcoming in a new file system, but you’ve got to remember that APFS, unlike ZFS, isn’t primarily designed for large server systems, but has to scale down to the Apple Watch and Apple TV: you’d hardly want your Watch to stop telling the time for an hour while it performs a full scrub.

If error correction isn’t feasible for most Mac users, and there’s no system support for integrity checking of files, the only way to address these issues is with third-party products. That’s what I’ll be looking at in the next article in this series, and where my utility Dintch comes in.