Last Week on My Mac: The importance of document repair

None of us gets any younger, and neither do our documents. As those files age, like their owners, they slowly decay with ‘bit rot’ (including ‘disc rot’ and data degradation). When you return to try opening them, it is infuriating to discover – if their ancient formats are still supported – that so many have become broken and unusable.

It isn’t easy to get anyone to take the past seriously in an industry obsessed with the future, yet it troubles ordinary users quite frequently. In tackling their questions, I’m asked disturbingly often how someone can recover cherished movies of their children, or of relatives who are long since dead. As we grow old, and those precious files grow old, this can only increase.

Some key players have made accessing our old documents even harder. Some time ago, Microsoft stopped new versions of Word from opening the oldest formats of Word .doc files, and Apple has been no better. Those curating archives of official records may have had sufficient prescience and time to convert their collections to formats such as PDF/A which are intended for such purposes, but I have plenty of documents in formats which don’t appear to be supported by any current products.

In any case, I have questioned whether any form of PDF is well-suited to long-term storage, given that one or two small errors in the wrong parts of a PDF document can lose its entire contents. It’s ironic that a format which can unintentionally retain information which we thought had been removed is also so susceptible to even minor damage.
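One of those fragile parts is the cross-reference table, which records the byte offset of each object in the file. As a rough sketch of how a repair tool might work around a damaged table, the Python below scans the raw bytes for object headers of the form `N G obj` and rebuilds a map of offsets from them. The function name `rebuild_offsets` is hypothetical, and the approach is deliberately naive: it won't find objects packed inside compressed object streams, and it could be fooled by matching text inside stream data.

```python
import re

# An indirect PDF object starts with "<object number> <generation> obj".
OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")

def rebuild_offsets(pdf_bytes: bytes) -> dict[tuple[int, int], int]:
    """Map (object number, generation) to the byte offset of its header.

    Later definitions overwrite earlier ones, on the assumption that
    revised objects are appended after the originals.
    """
    offsets = {}
    for match in OBJ_HEADER.finditer(pdf_bytes):
        key = (int(match.group(1)), int(match.group(2)))
        offsets[key] = match.start()
    return offsets
```

A real recovery tool would go on to validate each object it finds and locate the document catalog among them, but even this much shows that the contents can often be located without the table.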

Modern file formats, based on XML, should in theory be more robust, and capable of recovering much more content even after substantial damage. Yet try altering a few bytes in them, and you’ll discover that the apps which use them all too often let us down. They usually don’t even attempt to make sense of what they can: they throw an error, and that’s it. A ruby wedding anniversary is lost in the mists of time.
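To illustrate how much a more forgiving reader could salvage, here is a minimal Python sketch that uses the standard library’s incremental `XMLPullParser` to keep the text of every element that closed cleanly before the first error, instead of discarding the lot. The function name `salvage_text` is an assumption for illustration, and a real app would need to do far more, such as resuming after the damaged region.

```python
import xml.etree.ElementTree as ET

def salvage_text(xml_bytes: bytes) -> list[str]:
    """Return the text of each element that ended before the first parse error."""
    parser = ET.XMLPullParser(events=("end",))
    recovered = []
    try:
        # A parse error surfaces either here or while reading the events.
        parser.feed(xml_bytes)
        for _event, elem in parser.read_events():
            if elem.text and elem.text.strip():
                recovered.append(elem.text.strip())
    except ET.ParseError:
        # Stop at the damage, but keep everything gathered so far.
        pass
    return recovered
```

Given a document in which the third paragraph has a mangled closing tag, this still returns the text of the first two paragraphs, where a strict parser would return nothing at all.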

Some file formats are designed to check the integrity of their contents, for example using cross-reference tables or hashes. When these don’t match up, instead of attempting to recover as much of the intact data as they can, apps simply refuse to play. This can of course be protective, particularly if the document has been deliberately tampered with.
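The alternative to refusing outright can be sketched in a few lines of Python. This assumes a purely hypothetical format in which a document is stored as a series of chunks, each with its own CRC-32 checksum. Instead of rejecting the whole file on the first mismatch, `recover` keeps every chunk that still verifies and reports which ones failed. Both function names are invented for illustration.

```python
import zlib

def pack(chunks: list[bytes]) -> list[tuple[int, bytes]]:
    """Store each chunk alongside its CRC-32 checksum (hypothetical format)."""
    return [(zlib.crc32(chunk), chunk) for chunk in chunks]

def recover(records: list[tuple[int, bytes]]) -> tuple[list[bytes], list[int]]:
    """Return the chunks that still verify, plus the indexes of those that don't."""
    intact, damaged = [], []
    for index, (checksum, payload) in enumerate(records):
        if zlib.crc32(payload) == checksum:
            intact.append(payload)
        else:
            damaged.append(index)
    return intact, damaged
```

Reporting the damaged indexes matters as much as returning the intact data: the user can see exactly what has been lost, and a document which may have been tampered with is never passed off as whole.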

Much of our present focus is on backups. When that massive Keynote presentation won’t open, stock advice is to return to the last version saved by Time Machine or those kept in the macOS version system. Ten years on, those backups will have long since gone, and because macOS stores versions separately from documents, those versions will have vanished too. Our current recovery systems are geared to the immediate present and recent past, no more than a few years ago at the very most, and rely on finding a replacement for a damaged document.

Currently, if you can’t open an old file, your best chance is to pay a data recovery service to try. For those documents of commercial value, that offers a way ahead. Unless you’re very well-heeled, you won’t be able to afford that for personal records, such as those recording your parents’ ruby wedding anniversary, or the early years of your children’s lives.

We’re slowly succumbing to the supposition that digital archives are preferable to traditional media such as print on paper. Corrupt the catalog in a PDF document, and that’s probably the last that you’ll see of its contents. Although a printed book which has lost its spine and contents pages isn’t as immediately useful as one that’s intact, you don’t need those to be able to read its body from cover to cover.

The photos in my parents’ wedding album have slowly faded over the years, but I can still see their faces clearly despite the nearly seventy years that have passed. When it comes to opening PICT screenshots from less than twenty years ago, though, I’m fast running out of luck. The PostScript original draft of my thesis won’t open at all, and I’ve several movies of our kids which seem irretrievably broken.

Wouldn’t it be good to see a Recover command in every File menu?