File Integrity 11: Which RAID levels enable file recovery?

It’s good that more users are adopting RAID arrays for external storage, but it’s essential to remember that RAID in itself doesn’t necessarily improve protection of file integrity. Across the seven standard levels of RAID, 0 to 6, there is great variation in the protection afforded to the contents of the array as a whole, and even more when it comes to protecting the contents of individual files on it. This article looks at what each level means in practice.

For any storage system to protect the integrity of the files stored on it, it first needs to be able to detect whether a file can’t be read fully, or when reading it doesn’t deliver the expected data. The latter means keeping checksums or digests against which the current file can be compared. Should the data from any file not match its expected checksum/digest, the system then needs a means of repairing that copy, or of sourcing another copy which does match the checksum/digest. This is quite different from being able to recover the whole contents of the array in the event that one (or more) of its disks fails.
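The detection step can be illustrated with a minimal Python sketch. It assumes SHA-256 as the digest and that the expected digest was saved when the file was known to be good; the function names `file_digest` and `is_intact` are made up for this example, and aren’t how any particular RAID or integrity-checking product works.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file as hex, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def is_intact(path: Path, expected_digest: str) -> bool:
    """True only if the file reads in full and matches its stored digest."""
    try:
        return file_digest(path) == expected_digest
    except OSError:
        # The file couldn't be read fully, which is itself an integrity failure.
        return False
```

On its own this only reports that a file is damaged. Repair still needs a second copy, or redundant data from which the original can be reconstructed.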

RAID 0, or striping, has no redundancy of data or error recovery at all. You can add file integrity checks, for example using Dintch or Fintch, but those will only indicate which files are damaged, not recover or repair them.

RAID 1, or mirroring, provides two or more complete copies of everything stored on the RAID system, which has inherent redundancy. In the event that one disk suffers failure, files can still be retrieved from the surviving disk until the RAID has been successfully rebuilt, normally by replacing the failed disk. Disk integrity checking is assumed to detect any errors, but to be confident that any given file is intact, file integrity checks are needed.

Care is needed to ensure that the disks in a mirror are drawn from different production batches, as hard disks (and possibly SSDs too) from the same batch tend to fail at about the same time. While double-disk failure may appear unlikely in theory, in practice it’s by no means unknown.
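Why mirroring benefits from file integrity checks can be seen in a small Python sketch. It assumes that each mirrored copy of a file can be read separately and that a SHA-256 digest was recorded when the file was known to be good; `pick_intact_copy` is a hypothetical helper, not part of any RAID implementation.

```python
import hashlib
from typing import Optional, Sequence


def pick_intact_copy(copies: Sequence[bytes], expected_digest: str) -> Optional[bytes]:
    """Return the first copy whose SHA-256 matches the stored digest, or None."""
    for data in copies:
        if hashlib.sha256(data).hexdigest() == expected_digest:
            return data
    # Every copy is damaged, so the mirror alone can't repair the file.
    return None
```

Without the stored digest, two copies that differ only show that one of them is wrong, not which one.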

RAID 2 and 3 are very rarely encountered. Level 2 stripes at bit level and incorporates a Hamming error-correcting code (ECC), and level 3 stripes at byte level with a dedicated parity disk, so both should offer some level of detection and repair.

RAID 4, 5 and 6 appear very similar, and their differences might seem technical, but they are important when deciding which level to use.

RAID 4 uses block-level striping with parity data stored on a dedicated disk, which should support at least some degree of ECC. However, recovering the contents of a failed data disk depends on normal function of the parity disk, and every write has to update that one disk, making this level less popular.

RAID 5 uses block-level striping with parity data distributed across the disks, and is widely used as a good compromise between resilience and parity overhead (hence cost). In its most advanced form in ZFS, where it’s also known as RAID-Z1, it’s coupled with integrity checks and automatic recovery.
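The principle behind single parity can be shown in a few lines of Python. This is a simplified sketch of XOR parity over one stripe, with made-up four-byte blocks; real implementations rotate the parity block across the disks and work on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each on a different disk (illustrative values).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails. XORing the surviving data blocks
# with the parity block reconstructs the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

Parity of this kind rebuilds a block that is known to be missing. If a block is silently corrupted instead, the parity no longer adds up, but it can’t identify which block is wrong, which is why coupling it with per-block or per-file checksums matters.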

RAID 6 is similar to 5 but uses a second parity block in each stripe, which lets it survive the failure of two disks at the expense of space efficiency. This too involves an ECC implementation, which is used in an advanced form in RAID-Z2 in ZFS.
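The trade-off between resilience and space can be put into rough numbers. The Python sketch below assumes equal-sized disks and ignores file system and metadata overhead; `usable_fraction` is a made-up helper, and the minimum disk counts are those usually quoted for each level.

```python
def usable_fraction(level: int, disks: int) -> float:
    """Fraction of raw capacity left for data, assuming equal-sized disks."""
    minimum_disks = {0: 2, 1: 2, 4: 3, 5: 3, 6: 4}
    if level not in minimum_disks:
        raise ValueError(f"RAID level {level} is not covered by this sketch")
    if disks < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == 1:
        # Every disk in a mirror holds a full copy, so one disk's worth is usable.
        return 1 / disks
    # Striped levels give up 0 (RAID 0), 1 (RAID 4, 5) or 2 (RAID 6) disks' worth.
    parity_disks = {0: 0, 4: 1, 5: 1, 6: 2}[level]
    return (disks - parity_disks) / disks


for level in (0, 1, 4, 5, 6):
    print(f"RAID {level} with 4 disks: {usable_fraction(level, 4):.0%} usable")
```

With more disks in the array, the relative cost of the second parity block in RAID 6 falls, since two disks’ worth of parity is spread over a larger total.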

Popular non-RAID designations for drive arrays include JBOD (Just a Bunch of Disks), Spanned or BIG, and Unraid. None of those is expected to incorporate any form of integrity checking or file recovery.

One important issue with all RAID systems is that they are designed to protect the integrity of files once they are stored on them, and don’t normally check that copying any files to or from the RAID storage preserves their integrity. For example, Time Machine backups don’t include a verification step in which backed up copies of files are checked against their originals. Any errors which occur during copying can therefore result in the RAID array faithfully preserving the integrity of corrupt files.
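A verification step of the kind that’s missing could look like the following Python sketch. It assumes SHA-256 digests and an ordinary file copy; `sha256_of` and `copy_verified` are hypothetical names, and this isn’t how Time Machine or any RAID software works.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file as hex, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def copy_verified(src: Path, dst: Path) -> None:
    """Copy src to dst, then check that the copy matches the original."""
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise OSError(f"copy of {src} to {dst} doesn't match the original")
```

One caveat is that reading the copy back straight away may be satisfied from a cache rather than from the disks themselves, so a check like this is a sketch of the idea rather than proof of what was written to the array.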

Another area of uncertainty is file system error. Although this should be extremely rare, damaged or corrupt file system data stored on a RAID array can have unpredictable effects on its contents.

Unless you pay extra for a hardware RAID system which implements RAID functions in its firmware, you’ll be relying on one of two software implementations:

  • AppleRAID supports only two RAID levels, 0 (striping) and 1 (mirroring), together with JBOD, according to Apple.
  • SoftRAID supports RAID levels 0 (striping), 1 (mirroring), 4 and 5 (both using striping with parity), and the combination of 1+0, according to its product page.

AppleRAID’s lack of support for any of the more sophisticated and resilient levels 4-6 is frustrating.

Some RAID levels do provide mechanisms, often sophisticated, for preserving file integrity, including both detection and correction of corruption. However, unless they’re properly integrated into a system-wide scheme, their benefit is constrained.

Further detailed comparisons are given in Wikipedia.