Maintaining the integrity of important files

All of us have important files we don’t want to lose. Backing them up in depth should ensure that, no matter what happens, we can always find a copy. But how can we tell whether that copy is intact, or has been damaged or corrupted?

Ideally, we could use a file system that includes integrity checks, such as ZFS or Btrfs, but those are only readily available on external systems such as NAS, and not native to macOS. This article looks at an alternative, using my free family of utilities, including Dintch, which has just been updated.

Checksums and hashes

Most techniques for checking and maintaining file integrity start by making checksums, hashes or ‘digests’ of each file to be protected. When these are made by the file system, they’re effortless for the user, but currently that isn’t an option with either HFS+ or APFS. For macOS native file systems, they’re an add-on for the user.

There’s a wide choice of different checksums and hashes which can be used for this. Those that are quickest and simplest to calculate are normally most prone to collisions, in which two different files have the same checksum. Although collisions aren’t normally a problem, they could allow a maliciously altered file to be substituted without detection. Dintch and its relatives use the SHA-256 hash, which is considered robust enough for cryptography, and is extensively used and supported by macOS.
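To make the idea concrete, here is a minimal sketch in Python of hashing a file with SHA-256 and later checking it against a saved value. The function names `sha256_of_file` and `is_intact` are illustrative only, and this isn’t a description of how Dintch itself is implemented.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 digest of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, saved_hash):
    """Recompute the file's hash and compare it with one saved earlier."""
    return sha256_of_file(path) == saved_hash
```

Changing even a single byte of the file gives a different digest, so `is_intact` returns `False` for any copy that no longer matches the hash saved when it was tagged.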

Associating the hash with the file

Some integrity-checking systems store hashes of the files they’re protecting in a separate file inside each folder. If those files are always going to remain together in the same location, that’s not a bad choice, but it requires that every time you move files around between folders, you have to regenerate their hashes for the hash file collection in their new folder. It’s then easy for files to become separated from their hashes. There’s also the frightening possibility that the file containing all the hashes itself becomes damaged.

Thankfully, macOS has a solution for this in extended attributes (xattrs), which can remain associated with a file no matter where it’s moved to, provided that it remains on a file system which supports xattrs, such as HFS+ or APFS. There is a slight twist to this, in that macOS does sometimes strip certain xattrs; iCloud has been known to do this, for instance.

To prevent that from happening, macOS supports a system of flags that can be appended to the names of xattrs to make them persist. That’s the technique used by Dintch and its relatives: so long as each tagged file is only moved to and from volumes which support xattrs and their flags, the hash moves with that file wherever it goes.

Dintch and its relatives use a custom xattr named co.eclecticlight.dintch.hash, which is given the special flag #S to make it persist in iCloud and when copied between volumes and disks.
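As a rough sketch of what tagging involves, the Python below builds the flagged xattr name and the arguments for the macOS `xattr` command tool, whose `-w` option writes an attribute. Storing the hash as hexadecimal text is an assumption made for illustration, as the format Dintch actually writes isn’t described here, and the helper names are hypothetical.

```python
import subprocess

# xattr name taken from the article; the flag is appended after a '#'.
HASH_XATTR = "co.eclecticlight.dintch.hash"


def flagged_name(base_name, flags="S"):
    """Append xattr flags to a name, e.g. 'name' -> 'name#S'."""
    return f"{base_name}#{flags}"


def xattr_write_command(path, name, value):
    """Build the argument list for: xattr -w <name> <value> <file>."""
    return ["/usr/bin/xattr", "-w", name, value, str(path)]


def tag_file(path, hex_digest):
    """Write the digest to the file as a flagged xattr (macOS only).

    Assumption: the value is stored as hex text, which may differ from
    what Dintch really does.
    """
    name = flagged_name(HASH_XATTR)
    subprocess.run(xattr_write_command(path, name, hex_digest), check=True)
```

You can inspect the result in Terminal with `xattr -l` followed by the file’s path, which lists each xattr and its contents.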

Compression and archiving

If you intend to compress and archive files that have been tagged with their hashes using this method, you need to be sure that your compressor and archiver also preserve xattrs. All good Mac utilities, including my favourite Keka, offer this as an option, and it’s mandatory in my own compression tool Cormorant.

If you want to perform belt-and-braces integrity checking, tag the contents of your archives before compressing them, then tag the compressed file as well. Dintch’s hashes take almost no space, and little time is required to check them.

Timestamps

Although not normally included in integrity-checking systems, it’s sometimes important to know when a file’s integrity was checked by creating and saving its hash. Dintch and its relatives support that option, using another xattr co.eclecticlight.dintch.time, which contains the time of creation of the hash in UTF-8 text, making it readily accessible.
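Keeping the timestamp as plain UTF-8 text means any tool can read it back. The sketch below shows one way that could work, assuming an ISO 8601 format in UTC. That format is my assumption for illustration and may not match what Dintch writes, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

# xattr name taken from the article.
TIME_XATTR = "co.eclecticlight.dintch.time"


def timestamp_bytes(moment=None):
    """Encode a moment (default: now, in UTC) as UTF-8 ISO 8601 text."""
    moment = moment or datetime.now(timezone.utc)
    return moment.isoformat(timespec="seconds").encode("utf-8")


def parse_timestamp(raw):
    """Decode UTF-8 ISO 8601 text back into a datetime."""
    return datetime.fromisoformat(raw.decode("utf-8"))
```

Because the value is ordinary text rather than a binary date, it remains legible when listed with the `xattr` command tool.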

Tools

If you’re going to use any system of integrity checking, you need a range of tools to ensure you don’t end up wasting time hammering in a screw. The Dintch family consists of three tools to cover almost every possible use:

  • Dintch, designed to work with folders or directories, with a GUI;
  • Fintch, a drag-and-drop version designed for individual files and small collections;
  • cintch, a command-line tool you can invoke in shell scripts, AppleScripts, and even Shortcuts if you wish.

They’re all available from their Product Page, and are compatible with all versions of macOS from El Capitan to Ventura, and with HFS+ and APFS file systems. Note that hashes are consistent across all those too: you can tag files in HFS+ on El Capitan and their tags will still be valid in APFS on your M1 Mac Studio Ultra.

This also marks the release of a new version of Dintch. Because this is intended to be used for larger tagging and checking tasks, it can take some time to complete. Version 1.4 now lets you set the speed of its tasks, with three presets for the Quality of Service and number of threads.

These have less effect on Intel Macs, but still work there: I checked nearly 100,000 largely untagged files at just over 500 per second at Dintch’s 🏎 ‘red racing car’ setting, and at just over 350 per second at the 🐢 ‘green turtle’ setting. M1 Macs are significantly faster, and show a greater difference between the two extreme settings: over 1,500 files/second on the P cores, down to 400 per second running only on the E cores. This new control gives you the choice when running longer tasks of putting them into the background on the E cores where they won’t interrupt your work.
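The effect of the thread count alone can be sketched in Python, which has no direct way to set macOS Quality of Service, so this covers only half of what Dintch’s presets control. The preset names and thread counts below are hypothetical and chosen purely for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical presets: names and thread counts are illustrative only.
PRESETS = {"fast": 8, "medium": 4, "slow": 1}


def hash_one(path):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_many(paths, preset="medium"):
    """Hash many files using the number of threads given by the preset."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=PRESETS[preset]) as pool:
        # map() yields results in the same order as the input paths.
        return dict(zip(paths, pool.map(hash_one, paths)))
```

Every preset produces the same hashes; only the time taken and the load placed on the Mac should differ.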

Dintch version 1.4 is available now from here: ditch14, from Downloads above, from its Product Page, and via its auto-update mechanism.

Finally, I’ve been conducting a long-term experiment in iCloud Drive, where I have 97 files that I tagged over two years ago, at 1632 on 11 April 2020 according to their timestamps. Every so often I download and check their hashes. Not one of them has changed yet.