Checking file integrity with Dintch (first beta)

We all expect our documents and other files to remain intact and uncorrupted. Although some storage media such as hard drives are well-known to develop errors, we’re careful to back those up, I’m sure. But in many cases, all backups will do is save another copy of the corrupt file. So how can you tell whether some random bits in a file have changed, and potentially rendered it unusable?

Not easily.

The standard technique for doing this calculates a checksum or hash for each file, and saves that. By comparing the current checksum against the saved one, you can then tell at any time in the future whether that file has changed. If you’re adept with scripting in macOS, this isn’t a particularly tough challenge, but it’s one which few take on. Alternatively, you can use DigLloyd’s IntegrityChecker, which is expensive and doesn’t appear to be notarized. One significant problem with both of those approaches is that there isn’t an easy way to attach a checksum to a file, so whenever you copy or move a file, you have to repeat the process.
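As a minimal sketch of that standard technique, and not the code used in any of the tools mentioned here, the following Python computes a SHA-256 digest for a file and compares it with one saved earlier. The function names sha256_of and is_unchanged are my own choices for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, saved_digest):
    """True when the file's current digest equals the one saved earlier."""
    return sha256_of(path) == saved_digest
```

Where the saved digest is kept is left open here: a script like this would typically write it to a separate list, which is exactly why the checksum gets left behind when the file is copied or moved.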

Dintch is a new app, currently only compatible with Catalina, which tries to address some of these issues. It has two buttons:

  • click on # to select a folder (it only handles folders or volumes, not individual files, at present) which you want to tag with checksums;
  • click on Check to select a folder which you’ve previously tagged, to check the integrity of its files.

To ensure that Dintch can access all the folders you might wish to use it on, add it to the Full Disk Access list in the Privacy pane before use.

When tagging, Dintch traverses all the folders and files within the folder or volume you select. For every file within that, it calculates the SHA256 digest and writes it out to that file as an extended attribute. That overwrites any existing tag on that file. If you don’t have write access to that file, then no such tag can be written, of course.

When checking, Dintch repeats its traversal of all the folders and files within the selected folder or volume. For each file, it looks to see whether that file has already been tagged. If it has, it then calculates the SHA256 digest of that file at that moment, and checks whether it is the same as that previously saved. If they don’t match, then it reports that.
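The tagging and checking passes described above can be sketched roughly as follows in Python. This is an illustration under stated assumptions rather than Dintch's own code: the attribute name TAG_NAME is hypothetical, as are all the function names, and the reading and writing of tags is passed in as a pair of functions so that the traversal logic doesn't depend on any one xattr API. The xattr_write and xattr_read helpers rely on os.setxattr and os.getxattr, which Python only provides on Linux; on macOS you would need to substitute a third-party module or the xattr command tool.

```python
import hashlib
import os

# Hypothetical attribute name, for illustration only.
TAG_NAME = "user.example.sha256"


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_under(root):
    """Yield every regular file below root, skipping symlinks."""
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def tag_tree(root, write_tag):
    """Tag each file with its current digest, overwriting any old tag.
    Returns the files which couldn't be tagged."""
    untagged = []
    for path in files_under(root):
        try:
            write_tag(path, sha256_of(path))
        except OSError:
            untagged.append(path)  # for example, no write access
    return untagged


def check_tree(root, read_tag):
    """Return the files whose current digest differs from the saved tag.
    Files without a tag are skipped."""
    mismatched = []
    for path in files_under(root):
        saved = read_tag(path)
        if saved is not None and saved != sha256_of(path):
            mismatched.append(path)
    return mismatched


def xattr_write(path, digest):
    """Save the digest as an extended attribute (Linux-only in Python)."""
    os.setxattr(path, TAG_NAME, digest.encode("ascii"))


def xattr_read(path):
    """Return the saved digest, or None if the file has no tag."""
    try:
        return os.getxattr(path, TAG_NAME).decode("ascii")
    except OSError:
        return None
```

On a system where those two helpers work, tag_tree(folder, xattr_write) followed later by check_tree(folder, xattr_read) gives a list of the files which have changed since they were tagged. Because the tag store is pluggable, the same traversal can be tried out with an ordinary dictionary, passing its __setitem__ and get methods instead.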


In this first beta, I have done little to optimise the speed. Using a fast internal SSD, though, this current version takes 85 seconds to tag 5361 files totalling 18.5 GB, and 75 seconds to check them all. Using a USB-C SSD, those times extend only a little, suggesting that storage speed isn’t yet the bottleneck and that there are still performance gains which my code can achieve. Checking the same files stored on a 25 GB BD-R disk extends the time to nearly 24 minutes, which is clearly I/O limited.
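To put those timings into perspective, here is a rough calculation of the throughput they imply. It assumes decimal gigabytes (1 GB = 1,000,000,000 bytes) and takes the BD-R time as a full 24 minutes, so the figures are approximate.

```python
GB = 1000 ** 3  # decimal gigabytes, an assumption


def throughput_mb_per_s(total_bytes, seconds):
    """Average throughput in MB/s for a run over total_bytes."""
    return total_bytes / seconds / 1_000_000


size = 18.5 * GB
runs = [
    ("tag, internal SSD", 85),
    ("check, internal SSD", 75),
    ("check, BD-R", 24 * 60),
]
for label, seconds in runs:
    print(f"{label}: {throughput_mb_per_s(size, seconds):.0f} MB/s")
# tag, internal SSD: 218 MB/s
# check, internal SSD: 247 MB/s
# check, BD-R: 13 MB/s
```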

Reporting is also fairly basic: there’s a Verbose option which delivers more information, and you can add Debug if you wish.

Although using extended attributes might sound fragile, the custom xattr which I use here is preserved across the following:

  • Finder duplicate,
  • Finder copy and move between folders on the same volume,
  • Finder copy and move between different volumes, including external/removable volumes,
  • Finder copy and move in/out of iCloud on the same Mac,
  • Finder copy and move in/out of iCloud between two different Macs,
  • CCC and ChronoSync backups, and I expect Time Machine too,
  • HFS+ and APFS file systems, at least.

The one failure that I’ve had so far is, sadly, burning to a Mac-only BD-R using Toast Burn, but I will be looking in more detail at how to preserve the tags when writing to optical media for archives.

There’s a lot more to do with Dintch yet, but I’d appreciate it if you could take a look and find some bugs for me, please. My next goal is to add code which supports the same features on versions of macOS prior to 10.15, and to start tuning its performance as well as increasing the usefulness of its reporting.

Dintch 1.0b1 is available from here: dintch10b1, from Downloads above, and from its new Product Page. This does use my auto-update mechanism, so future versions will offer directly downloadable updates too.

What’s in its name? It’s a digest-based integrity checker. The word is also, I gather, a slang contraction for dinosaur.