It’s common to need to share documents in which some of the content must be held back. If you work in areas such as law, public administration, education, or healthcare, you will be aware of the frequent need to redact or obscure information in documents which have to be made public, or at least passed to those who mustn’t see some of their details.
If you ever post images or documents to social networks, such as Facebook, Twitter, LinkedIn, or even a blog, you should also be very mindful of what might need to be held back or removed from them.
Redaction and obscuring are also common points of failure. Earlier this month, a US school board released a report about a pupil who shot dead 17 people in a Florida High School. It contained a lot of private information which couldn’t be released, and the school board thought that they had redacted it. However, much of that information had actually been left in the document, which was then used by a newspaper as the basis of a long article about the gunman.
You don’t have to look hard to find far worse errors. In 2009, a major UK bank inadvertently released many credit card and other financial details in bankruptcy forms which were filed electronically. Later that year, the US Transportation Security Administration published a manual about its screening methods which unintentionally exposed details which it had intended to keep secret. And in 2014, the New York Times published the name of an NSA agent which should have been redacted from a collection of leaked security documents.
In a recent exchange on Twitter, @SwiftOnSecurity noted that Microsoft Word doesn’t have a redaction feature, and invited suggestions for alternatives to using a paid-for version of Adobe Acrobat to perform this task. Many responses suggested printing documents out, using the time-honoured black marker, and scanning them back in – which isn’t as crazy as it might seem.
The problem with trying to remove content from modern documents is their complexity. Word processor and other formats can include a great deal of old content which you can’t see in the app’s view of that document. Even when you might think that you’ve obliterated the words that can’t be released, they may still lurk there, somewhere inside the file.
This is most obvious – and has caught most unawares – in old Microsoft Word formats, which often contain chunks of text which were removed from the document many edits ago, but have somehow lingered on in some unshown section of the document’s file. Apps which are trying to be smart might save changes for some future undo action, and copies of the pasteboard/clipboard at the time that they’re saved. For someone who knows their way around that document format, it is chilling how much history can be recovered.
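As a rough illustration of how such leftovers can be found, here is a minimal Python sketch which searches the raw bytes of a file for terms that should no longer be there. The function names and the list of encodings are my own assumptions for illustration, and the search is case-sensitive. A hit proves that the text is still in the file, but a clean result proves nothing, as content may be compressed or stored in some other encoding.

```python
from pathlib import Path

# Encodings in which leftover text might survive. This list is an assumption
# for illustration, not an exhaustive survey of document formats.
ENCODINGS = ("utf-8", "utf-16-le", "utf-16-be", "cp1252")


def find_residual_terms(data, terms):
    """Return {term: [(encoding, byte_offset), ...]} for terms found in data."""
    hits = {}
    for term in terms:
        seen = set()  # skip encodings which produce identical bytes
        for enc in ENCODINGS:
            try:
                needle = term.encode(enc)
            except UnicodeEncodeError:
                continue  # the term can't be represented in this encoding
            if not needle or needle in seen:
                continue
            seen.add(needle)
            start = data.find(needle)
            while start != -1:
                hits.setdefault(term, []).append((enc, start))
                start = data.find(needle, start + 1)
    return hits


def scan_file(path, terms):
    """Search the raw bytes of the file at path for each of the terms."""
    return find_residual_terms(Path(path).read_bytes(), terms)
```

Calling scan_file() on the document you are about to release, with a list of the names or numbers you removed, returns an empty dictionary when none of them is found in any of the encodings tried.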
The first question to ask is what medium you need to publish or pass on. There are some – video and other complex formats – which are so dangerous that you need the advice of a specialist if you’re going to do the job properly. It might seem simple to pixellate out someone’s face, or just edit out a short section of a movie, but ensuring that it is done properly requires knowledge and skill. Work with someone who does it for a living to get the best results.
Audio is another minefield of file formats and potentially extractable data. If you have to supply audio, you can edit it and use a ‘flat’ format in relative safety, but if you need to use acoustic filters or a format which you don’t know well, consult a professional rather than getting it wrong.
Words and images may seem simple, but they are the ones which catch most people out. Your next question should be in what format you are going to deliver the redacted documents.
PDF has become very popular, as it was intended for exactly this type of task. But simply drawing black boxes over content you wish to protect will do absolutely nothing, and is one of the commonest causes of redaction failure. If you can edit that PDF, so can the recipient, and the chances are they will have far better tools for doing so.
Paid-for versions of Adobe Acrobat, and some third-party PDF editors, include a proper redaction tool which can ensure that others can’t see the content you need to keep private. But PDF is also a complex document format in which it is easy to hide unseen text and images. PDF can also readily hold metadata, which will not be visible in the body of a document.
I will work through redacting a PDF document in another article, but suffice it to say that the redaction itself is only part of the process: you must also check that any hidden text or other content is completely removed before you can consider the job properly done.
Most other binary, proprietary, and even open file formats are potential nightmares, except for simple, flat image formats such as JPEG, and of course good old plain text.
Images pose two problems of their own. The first is the image format to be used. Popular formats like JPEG and PNG don’t maintain the layers which some image editors produce during editing, but the native formats of most image editors, such as Photoshop, normally do. If you black out part of a document and then deliver it in a format which preserves its layers, you could discover that it is a trivial task for the recipient to remove your black overlays.
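One simple precaution is to confirm that the file you are about to send really is in a flat format, whatever its extension claims. The following Python sketch does that by looking at the first few bytes of the file; the function names are hypothetical, and only three signatures are recognised. It identifies the container type and nothing more, so it is no substitute for opening the file and inspecting it.

```python
# File signatures ("magic bytes") for three common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"8BPS": "Photoshop (PSD)",
}

# Formats which the text above treats as flat, i.e. not preserving layers.
FLAT_FORMATS = {"JPEG", "PNG"}


def sniff_image_format(header):
    """Return the name of a recognised format, or None if it is unknown."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def is_flat_image(path):
    """True only when the file starts with a JPEG or PNG signature."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(16)) in FLAT_FORMATS
```

Anything which is_flat_image() rejects, including formats it doesn’t recognise, should be exported again as JPEG or PNG before release.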
Blacking out sections of an image as if still using a black marker is unsightly, and most people prefer to blur sensitive information. I do this frequently for images which I post here, which may also appear in more public places like Twitter, and when submitting screenshots for publication in MacFormat and elsewhere.
Blurring needs to be performed with care, and checked carefully. Think criminal, and see what you could read into your blurred box, particularly with the help of a little knowledge and unblurring software. In the above example, you can guess that my Startup Disk is named Macintosh HD, which was blurred at the default setting and is a likely choice in any case. Hopefully, the more extensively blurred Serial Number is not as easy to guess.
Finally, as you’re about to send someone a file from your Mac, ensure that it isn’t taking other things with it. Saved versions aren’t transferred with files when they are copied or moved elsewhere, so at least you shouldn’t need to worry about them, unless your editor itself stores such information in the document. But the file could have extended attributes (xattrs) containing metadata or thumbnails (images) which need to be checked before release: my free editor xattred is a reliable way of doing that.
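If you would rather check from a script, this Python sketch lists the names of any extended attributes attached to a file. It assumes that the xattr command-line tool bundled with macOS is on your path, and uses os.listxattr where Python provides it, which is on Linux rather than macOS. The function name is my own, and the sketch only lists names: it doesn’t show or remove their contents.

```python
import errno
import os
import shutil
import subprocess


def list_xattrs(path):
    """Return a sorted list of the extended attribute names on path."""
    if hasattr(os, "listxattr"):
        # Available in Python on Linux, but not on macOS.
        try:
            return sorted(os.listxattr(path))
        except OSError as err:
            if err.errno == errno.ENOTSUP:
                return []  # this file system can't store xattrs at all
            raise
    tool = shutil.which("xattr")  # command-line tool bundled with macOS
    if tool is None:
        raise RuntimeError("no way of listing extended attributes was found")
    result = subprocess.run(
        [tool, path], capture_output=True, text=True, check=True
    )
    # With a single file, the tool prints one attribute name per line.
    return sorted(line for line in result.stdout.splitlines() if line)
```

On a Mac, a name such as com.apple.metadata:kMDItemWhereFroms in the result is worth a closer look, as that attribute can record where a file was downloaded from.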
There are document management systems which have sophisticated features for protecting or redacting sections of the documents which they maintain. If you work for a large organisation which often has to handle these issues, they could be a good way to go, but they are aimed at corporate users rather than the likes of you and me.