How to design a method which not only detects errors in file data, but enables them to be corrected? A tale of Hamming, Reed, Solomon and CDs.
error correcting
Storage has to be reliable, efficient and resilient. However, efficiency and resilience oppose one another. What’s the best solution? New file formats, CRC in the file system, or what?
Can you use error-correcting code to repair very large files, for example of around 20 GB or more?
What difference does it make when a file has a block of 512 bytes corrupted instead of just a single byte? Results from image formats and recovery using ECC.
How can you squeeze recovery data into smaller storage space than you’d need for a second copy of a file? Using codes, explained here.
For all their compactness and ease of access, are our files going to prove less durable than a clay tablet recording a commercial transaction over 4,000 years ago?
Tests bring some surprises, with encrypted sparse bundles looking resilient to small amounts of corruption.
Looks at plain text, CSV, XML, JSON, RTF, RTFD, .docx, .xlsx, and PDF. Which should you trust with your important documents in archives?
Which format – alongside Camera Raw – should you store archived images in: JPEG, PNG, TIFF or Apple’s new HEIC?
Simply having ECC enabled doesn’t mean that damaged files can be recovered. It appears promising, but needs careful real-world evaluation.