Should we take bit rot seriously?

We all know that, over time, the documents and other data we store become gradually corrupted by ‘bit rot’. In its broadest sense, this is data degradation resulting from imperfections in storage media. But over what period is bit rot a significant risk? Are we talking months, years or centuries?

Bit rot, and the measures intended to protect against it, is among the most controversial topics in computing. Most accounts are full of sweeping assertions and technical descriptions of how bit rot might happen, but the only figures they quote are drawn from manufacturers’ specifications, or from old studies of hard disks that are almost certainly irrelevant to modern storage media, even to modern hard disks.

If you can’t measure it, does it even exist?

When researching this article, I looked for contemporary measurements of the rate of bit rot on current storage media, but was unable to find a robust scientific study that might yield such figures. That isn’t to say it hasn’t been done, but if it has, someone’s keeping very quiet about the results. This is in spite of three modern file systems – notably ZFS, Btrfs and ReFS – all incorporating methods designed to detect bit rot.
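The principle those file systems rely on can be sketched in a few lines of Python. This assumes a simplified model in which each block’s checksum is stored apart from its data, loosely in the way ZFS keeps checksums in block pointers; it isn’t a representation of any real on-disk format.

import hashlib

def write_block(data):
    # Keep the checksum separate from the data it covers, so rot in the
    # data can't silently corrupt the checksum as well.
    return data, hashlib.sha256(data).digest()

def read_block(data, stored_digest):
    # Verify on every read: any flipped bit changes the digest.
    if hashlib.sha256(data).digest() != stored_digest:
        raise IOError("checksum mismatch: possible bit rot in this block")
    return data

data, digest = write_block(b"important archive record")
assert read_block(data, digest) == data          # intact data passes
# read_block(b"important archive recorD", digest) would raise IOError

Note that this only detects the damage; on its own it can’t tell you what the block should have contained.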

Detecting bit rot in your sole copy of an important file isn’t particularly helpful either. Much better are error-correcting codes (ECC), which in a self-healing file system can repair errors as they occur, provided they fall within the code’s capacity to correct. Even that is of little help if the medium is write-once, as most archival storage is likely to be.
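To illustrate how a code can repair as well as detect, here’s a sketch of the classic Hamming(7,4) code in Python, which locates and fixes any single flipped bit in a 7-bit codeword. Real storage systems use far stronger codes; nothing here reflects any particular implementation.

def hamming74_encode(d):
    # d is four data bits [d1, d2, d3, d4]; returns a 7-bit codeword.
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    # Returns (repaired codeword, position of the flipped bit, or 0 if none).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # recheck positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # recheck positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 means clean; else the bad position
    if syndrome:
        c = c.copy()
        c[syndrome - 1] ^= 1         # flip the corrupted bit back
    return c, syndrome

codeword = hamming74_encode([1, 0, 1, 1])
damaged = codeword.copy()
damaged[4] ^= 1                      # simulate one bit of rot
repaired, pos = hamming74_correct(damaged)
assert repaired == codeword and pos == 5

The cost is the overhead: here, three extra bits for every four of data, which is why such protection is never free.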

The storage medium that should benefit most from a file system designed to detect bit rot and to self-correct using ECC is the hard disk. Few hard disks survive three years without developing some errors, so even if you replace them at that point, some repairs will probably have been needed, or some data lost. Hard disks are also among the poorer-performing media in widespread use, and the overhead of ECC and other safety measures could only worsen that.

SSDs are a different matter: without much better information on the risks and rates of bit rot in them, it seems hard to justify imposing that overhead on their performance just in case it might prove significant.

A better strategy for SSDs and write-once media such as archival Blu-ray discs is surely to monitor file checksums periodically, falling back to a known-good copy of any file whose checksum has changed. That in turn requires a simple GUI method of scanning a volume and comparing each file’s current checksum with its expected value. I’m thinking about that a great deal just now.
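As a sketch of that strategy, the following Python records a manifest of SHA-256 checksums for a volume, then on later runs reports any file whose checksum no longer matches, so it can be restored from a good copy. This is only a command-line outline of the comparison, not the GUI tool in mind, and the manifest name and layout are illustrative.

import hashlib, json, sys
from pathlib import Path

MANIFEST = "checksums.json"   # hypothetical manifest kept at the volume root

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def record(root):
    # First pass: compute and store a checksum for every file.
    manifest = {str(p.relative_to(root)): sha256_of(p)
                for p in root.rglob("*")
                if p.is_file() and p.name != MANIFEST}
    (root / MANIFEST).write_text(json.dumps(manifest, indent=2))

def verify(root):
    # Later passes: flag any file whose checksum has changed or gone missing.
    manifest = json.loads((root / MANIFEST).read_text())
    for rel, expected in manifest.items():
        path = root / rel
        if not path.exists():
            print(f"MISSING  {rel}")
        elif sha256_of(path) != expected:
            print(f"CHANGED  {rel}  - restore from a known-good copy")

if __name__ == "__main__":
    action, root = sys.argv[1], Path(sys.argv[2])
    record(root) if action == "record" else verify(root)

Run once with ‘record’ after writing the archive, then with ‘verify’ on each periodic check; any file reported as changed is a candidate for restoration from another copy.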

Further reading

Wikipedia: a brief article explaining what can happen
Jody Bruchon’s article arguing that most measures against bit rot are a waste of effort
ProStorage, claiming the exact opposite.