PDF without Adobe: 13 PDF documents can readily leak data

The PDF document format is extremely complex and, like other complex document formats, is prone to retaining content which you thought had been removed. This article looks at the risk of different macOS PDF apps leaking information by accident.

The best-known example of this is in redaction. Many embarrassing news stories have been based on PDF documents which were assumed to have had sensitive information stripped out, but in which careful investigation found the redacted content intact. I look here at three actions which you can take to remove or obscure content in a PDF document: redaction, which is designed to remove content so that it can’t be recovered; simple removal of text by deletion; and deletion of PDF annotations.

The apps which I have tested are:

  • Preview 10.1 (Mojave 10.14.3), which doesn’t offer redaction or editing of content, only annotations;
  • PDF Expert 2.4.22, from the App Store, which is full-featured;
  • PDFpenPro 10 (version 10.2.2), from the App Store, which is full-featured;
  • Adobe Acrobat (Pro) DC 19.010.20098, which forms the benchmark.

Redaction

To test this, I redacted all the text in a single-page PDF containing a single paragraph. Before editing, the contents of that page were 2075 bytes compressed (Flate-encoded). Following redaction, I saved the document as normal, although Acrobat offered to “find and remove hidden information” in the document, an offer which I accepted. Neither PDF Expert nor PDFpenPro offered anything prior to a normal save.

All three apps which offer redaction wrote PDF documents in which the redacted text content appeared to have been removed, although they adopted slightly different techniques to achieve that. Both PDF Expert and PDFpenPro offer commands to save the document ‘flattened’, and I recommend that anyone using either of those two apps does this following redaction, as it appears to write cleaner files in which all traces of redacted content have been removed.

Adobe Acrobat offers additional options to clean redacted files, which may inspire even greater confidence that all redacted content has been removed and that there is nothing left to cause embarrassment. These are perhaps best suited to governments and large organisations which need a more extensive audit trail.

Redaction remains a minefield for PDF documents. Longer documents contain huge quantities of data distributed across hundreds or thousands of different objects, most of it compressed and so not easily checked with a text editor. If I were trying to redact very sensitive information, I’d be keen to check the result thoroughly using a forensic tool before putting my trust in it.
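
Short of a proper forensic tool, a rough first check is to inflate the compressed streams yourself and search them for text which should have gone. The Python sketch below is an illustration only, not something any of these apps provides. Its function names are hypothetical, and it assumes that the streams are Flate-compressed and that the text is stored in a form which can be searched directly, which isn’t always the case, as fonts can use their own encodings and a phrase can be split into fragments.

```python
import re
import zlib

# Raw bytes between the 'stream' and 'endstream' keywords of each object.
# The lookbehind stops the tail of 'endstream' being mistaken for 'stream'.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflated_streams(pdf_bytes):
    """Yield the inflated contents of every stream that zlib can decode."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the line ending left before 'endstream'.
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not Flate-compressed, or uses a filter this sketch ignores.
            continue


def streams_containing(pdf_bytes, needle):
    """Count the inflated streams in which the needle bytes still appear."""
    return sum(1 for data in inflated_streams(pdf_bytes) if needle in data)
```

Read the saved file in binary mode and pass its bytes, with the phrase you redacted, to streams_containing. Any result above zero means that the text survives somewhere in the file. A result of zero is only weak reassurance, for the reasons given above.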

Simple deletion

If you need to be confident of the removal of text from a PDF document, then don’t trust deletion in a PDF editor. Here, I took the same single-page PDF document that I used to test redaction, simply selected all the text in its single paragraph and removed it. For many users, this isn’t suitable as a redaction method anyway, as it doesn’t show the redaction, just an area of missing text.

The approach taken by both PDF Expert and Adobe Acrobat retained the deleted content in its entirety, but added an amended page with the text removed from it. Only PDFpenPro rewrote the original object which had contained the text content, in edited form with that text removed.

Using Save As or flattening a document after simple deletion should purge the original content, but no app suggested doing that to enforce the effect of the deletion.
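
Retaining the old content and adding an amended version is what PDF’s incremental update mechanism does: changes are appended to the end of the file, and each appended section finishes with its own %%EOF marker. Assuming that is what these apps are doing, the following Python sketch (with hypothetical function names) counts those markers and recovers the file as it stood at each one. Some other layouts, such as linearised files, can also contain more than one marker, so treat a positive result as a prompt to look closer rather than as proof.

```python
EOF_MARKER = b"%%EOF"


def revisions(pdf_bytes):
    """Return the file as it stood at each end-of-file marker, oldest first."""
    snapshots = []
    position = pdf_bytes.find(EOF_MARKER)
    while position != -1:
        end = position + len(EOF_MARKER)
        # Everything up to and including this marker is one earlier state.
        snapshots.append(pdf_bytes[:end])
        position = pdf_bytes.find(EOF_MARKER, end)
    return snapshots


def has_appended_changes(pdf_bytes):
    """True when more than one end-of-file marker is present."""
    return len(revisions(pdf_bytes)) > 1
```

Writing the first item returned by revisions to a new file and opening it in a PDF viewer should, if the document was updated incrementally, show it as it was before the deletion.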

Apart from when using a proper redaction tool, users should assume that all content removed from plain sight is still retained in the document’s data, and can be recovered by forensic analysis of the file.

Removal of annotations

Annotations are often used during the development of documents for release or publication, and may contain comments and other material whose release would be embarrassing or worse. An important step late in PDF production is the removal of such notes and annotations, and the user needs to be able to trust their PDF editor to strip these out thoroughly.

At last, Preview makes its entrance, as it has extensive annotation tools. Unfortunately these don’t always implement annotations cleanly. Adding a single short note to a 12 KB PDF swells that document to 35 KB, which is astonishingly inefficient. The good news is that the note contents aren’t saved in plain text form, but neither are they removed when the note is deleted, so they could still be recovered.

PDF Expert, PDFpenPro and Adobe Acrobat all store notes in PDF documents so that their contents are in plain text. All someone needs is a good text editor, and they can browse the contents of all those annotations. Not only that, but when you think you have removed an annotation, its plain text content is left in the document. This is because those three apps modify the end of the PDF file to ensure that the annotation isn’t displayed, without removing its content from the data. Again, using Save As or flattening the PDF file should cleanse these remains.
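
To see what a text editor would reveal, you can scan the raw file for the /Contents entries in which annotations keep their text. The Python sketch below is a minimal illustration with a hypothetical function name. It assumes that notes are stored as uncompressed literal strings in parentheses, and it will miss hex strings and any string containing unescaped nested parentheses.

```python
import re

# A literal string after /Contents, allowing backslash-escaped characters.
# Page objects use '/Contents 4 0 R' style references, which don't match.
NOTE_RE = re.compile(rb"/Contents\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def note_texts(pdf_bytes):
    """Return the raw bytes of each literal /Contents string in the file."""
    return [m.group(1) for m in NOTE_RE.finditer(pdf_bytes)]
```

Running note_texts on a file both before and after removing an annotation is a quick way to see whether its text is still there.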

PDF Expert is the only one of these apps which offers a single command to remove all annotations from a document. Although that works very well for what you see, it doesn’t strip note contents from the file either.

Conclusions

PDF is a dangerous document format, in which old content can remain hidden for the lifetime of a document unless the user forces the entire file to be restructured and rewritten. Simple deletion of content and removal of annotations should never be trusted to do anything more than remove those items from view. At the very least, you should Save As or save to ‘flattened’ form if you want old content stripped. And after all that, none of these apps gives you a simple way to check that it has been removed from your document.

If you can, avoid trusting any secrets or other sensitive information to PDF. It may come back to bite you.