Explainer: Archiving isn’t backing up

There’s a common assumption that keeping good backups of your important documents ensures that they’re also archived. However, the aims and techniques of archiving and backup are very different. If, in fifty years’ time, someone were to go to today’s backups to retrieve one of those files, chances are they wouldn’t find it, and even if they did, they’d be unable to open it. This article explains the differences.

By archiving, I mean putting precious files somewhere they can still be retrieved in at least ten years’ time. They will include financial, business and employment records, as well as all finished work that you want to record for posterity. For most people, they will also include a careful selection of still images, movies, and the more important documents you might write, such as books, theses and papers. They’re what you and the law want you to keep in perpetuity.

Essential considerations are:

  • storage medium
  • file formats
  • indexing and access
  • physical storage
  • integrity checks.

Storage medium

While your backups are most likely to be kept on hard disks or SSDs, neither of those is in the least suitable for archives. Instead, you need a removable medium, today probably Blu-ray disks intended for archival use, such as M-DISC. I’ll be looking in detail at how you can do that in the next week or two.

If you have copious archives of importance beyond your family, then you should look at systems preferred by professionals, such as Sony’s Optical Disk Archive. However, they depend on proprietary formats, media and peripherals, which are also significantly more expensive.

File formats

While it’s fine to archive documents in their original format, the same as in your backups, it’s also important to extract their contents into more durable formats. Among those most likely to prove durable for the next 50-100 years are:

  • ASCII and UTF-8 for text files,
  • JPEG and PNG for still images,
  • video, audio and rich media using one of the widely-used compression standards and file formats,
  • XML-based open document standards,
  • CSV for data,
  • PDF provided that it complies with one of the archival standards PDF/A-1 to /A-3.

You may find it worthwhile tarring together large collections of smaller files, but don’t use an unusual compression or archive format, which might prove opaque in the future.
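As a rough illustration of bundling a folder into a plain tar file with no compression, here is a minimal Python sketch using the standard tarfile module. The function name make_plain_tar and the paths you pass to it are hypothetical, chosen only for this example; the tar file should be written outside the folder being bundled.

```python
import tarfile
from pathlib import Path


def make_plain_tar(source_dir: str, tar_path: str) -> list:
    """Bundle source_dir into an uncompressed tar file and return its member names.

    Hypothetical helper for illustration; tar_path should lie outside source_dir.
    """
    source = Path(source_dir)
    # Mode "w" writes a plain tar with no compression layer on top.
    with tarfile.open(tar_path, mode="w", format=tarfile.PAX_FORMAT) as tar:
        tar.add(str(source), arcname=source.name)
    # Mode "r:" only opens uncompressed tar files, so this confirms the
    # result can be read back without any decompression step.
    with tarfile.open(tar_path, mode="r:") as tar:
        return sorted(tar.getnames())
```

Reading the archive back straight after writing it is a cheap way of confirming that nothing went wrong before you burn it to disk.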

Indexing and access

For large collections, even when structured carefully, a thorough list of contents in UTF-8 text format is essential. Index and search tools could help, but in this respect too archives are very different from backups. If you’re going to be gathering terabytes of files, look at some of the dedicated archiving solutions. Although some, like Greenstone, are free to use, they aren’t intended for casual users and might prove too demanding.
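A list of contents can be as simple as one line per file. The following Python sketch, which is only one way of doing it, walks a folder and writes each file’s relative path and size to a UTF-8 text file. The function name write_contents_list and the tab-separated layout are my own assumptions for this example, not an established standard.

```python
import os
from pathlib import Path


def write_contents_list(root: str, list_path: str) -> int:
    """Write one line per file under root: relative path, a tab, size in bytes.

    Hypothetical helper for illustration; returns the number of files listed.
    """
    root_path = Path(root)
    count = 0
    with open(list_path, "w", encoding="utf-8", newline="\n") as out:
        for folder, subfolders, files in os.walk(root_path):
            subfolders.sort()  # visit subfolders in a stable, alphabetical order
            for name in sorted(files):
                item = Path(folder) / name
                relative = item.relative_to(root_path).as_posix()
                out.write(f"{relative}\t{item.stat().st_size}\n")
                count += 1
    return count
```

Because the result is plain text, it can be printed and kept with the paper records as well as being burned to the disk alongside the files it describes.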

Physical storage

Archive optical disks should be stored in cases with centre hub security, not in sleeves. They must be kept in a cool, dry and dark container, in which there is no mould or fungus. They also need to be protected from physical threats such as flood and fire. Firesafes are a popular way of achieving this, but you must then ensure that their combination or keys are readily available and not separated from the firesafe.

Don’t print on the disk itself, and keep paper records alongside the disks in the same container, but not inside the cases themselves.

Integrity checks

If you’re serious about maintaining your archives, some form of integrity checking, such as that provided by my free utilities Dintch, Fintch and cintch, is essential. Check a sample on each disk once a year, to ensure that none has started to deteriorate. If you do detect errors, that’s the time to burn a replacement.
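To show the general idea of an integrity check, here is a minimal Python sketch that records a SHA-256 digest for every file in a folder, then later reports any file that is missing or has changed. It is a generic stand-in and makes no claim to reproduce how the utilities named above work. The function names and the layout of the digest list are assumptions made for this example, and the digest list is assumed to be saved outside the folder being checked.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_digests(root: str, manifest: str) -> int:
    """Write 'digest<two spaces>relative path' for each file under root."""
    root_path = Path(root)
    files = sorted(p for p in root_path.rglob("*") if p.is_file())
    with open(manifest, "w", encoding="utf-8", newline="\n") as out:
        for p in files:
            out.write(f"{sha256_of(p)}  {p.relative_to(root_path).as_posix()}\n")
    return len(files)


def check_digests(root: str, manifest: str) -> list:
    """Return the relative paths that are missing or whose digest has changed."""
    root_path = Path(root)
    failures = []
    with open(manifest, encoding="utf-8") as lines:
        for line in lines:
            expected, relative = line.rstrip("\n").split("  ", 1)
            target = root_path / relative
            if not target.is_file() or sha256_of(target) != expected:
                failures.append(relative)
    return failures
```

You would run record_digests once when the disk is burned, then check_digests against the mounted disk at each yearly check. An empty list means every file recorded still matches.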

Further reading

Wikipedia point of entry
British Library digital preservation site.