Discovering hidden information in PDF documents can be incredibly easy. Take, for example, the response filed early in January to the special counsel’s allegations that Paul Manafort lied to prosecutors, which you can obtain from here. Open that document in my free PDF reader Podofyllin, and every single character which has been ‘redacted’ from it is clearly visible in its text view.
The reason for this is that the attempt to ‘redact’ the original PDF was performed using an open source tool developed jointly by Foxit and Google, PDFium, which doesn’t actually support true redaction at all. Instead, the person who edited the file drew black boxes over the text which they wanted to redact. Use a normal PDF viewer and it looks like the underlying text has gone, but in fact it’s still there, as revealed so simply by Podofyllin.
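To make the point concrete, here is a rough Python sketch of why a black box hides nothing. It is not how Podofyllin works, just an illustration using only the standard library: it decompresses each Flate-encoded stream it can find and lists the literal strings inside text objects (between the BT and ET operators). The function names are my own, and it assumes simple fonts with literal strings; hex strings, escape sequences, custom font encodings and other stream filters are not handled, so many real PDFs will give unreadable or empty results.

```python
import re
import zlib
from typing import Iterator

# Raw bytes between the 'stream' and 'endstream' keywords of each object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object: everything between the BT and ET operators.
TEXT_OBJECT_RE = re.compile(rb"\bBT\b(.*?)\bET\b", re.DOTALL)
# A literal string such as (Hello); nested parentheses are not handled.
STRING_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)")


def decoded_streams(data: bytes) -> Iterator[bytes]:
    """Yield each stream, Flate-decompressed where that succeeds."""
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line left before 'endstream'
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            yield raw  # uncompressed, or a filter this sketch ignores


def text_in_pdf(data: bytes) -> list[str]:
    """Return the literal strings of every text object, drawn over or not."""
    texts = []
    for stream in decoded_streams(data):
        for block in TEXT_OBJECT_RE.finditer(stream):
            parts = STRING_RE.findall(block.group(1))
            if parts:
                texts.append(b"".join(parts).decode("latin-1"))
    return texts
```

Calling `text_in_pdf(open("some.pdf", "rb").read())` on a file ‘redacted’ with drawn rectangles would, under those assumptions, still list the covered text, because the rectangle is just another drawing instruction that follows the text rather than replacing it.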
Podofyllin can perform the same revelatory analysis of court documents served in February 2017 in the case of Six4Three against Facebook, which give insight into Facebook’s knowledge about its privacy gaps.
But this is easy stuff. What about discovering complete prior versions of a PDF document, including orphaned content which isn’t now visible, but remains hidden in the PDF file?
That is possible in some PDF documents, thanks to the use of incremental saving. When a PDF has been edited and then saved using the regular Save command, rather than Save As or saving ‘flattened’, many apps leave the existing contents of the file untouched and append the changes to its end. In the case of the Manafort and Facebook documents above, at least the last save had rewritten the whole PDF file, so those can’t reveal their editing history, but many PDFs can.
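The idea can be sketched in a few lines of Python. Each incremental save normally ends with its own %%EOF marker, so cutting the file off after each marker in turn gives a candidate for each earlier version. This is only a minimal sketch under that assumption, with function names of my own choosing, and it isn’t necessarily how Podofyllin does it. A marker can occasionally appear for other reasons (linearised files, for instance, can contain more than one without having been incrementally saved), so each candidate needs to be opened and checked.

```python
from pathlib import Path

EOF_MARKER = b"%%EOF"


def find_revision_ends(data: bytes) -> list[int]:
    """Return the offset just past each %%EOF marker, in file order."""
    ends = []
    pos = data.find(EOF_MARKER)
    while pos != -1:
        end = pos + len(EOF_MARKER)
        ends.append(end)
        pos = data.find(EOF_MARKER, end)
    return ends


def split_revisions(data: bytes) -> list[bytes]:
    """Return one candidate file per marker, oldest first.

    Each candidate is the original truncated just after a %%EOF marker,
    with a newline added to finish the line.
    """
    return [data[:end] + b"\n" for end in find_revision_ends(data)]


def save_revisions(pdf_path: str, out_dir: str) -> int:
    """Write each candidate revision to out_dir and return how many there were."""
    source = Path(pdf_path)
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    revisions = split_revisions(source.read_bytes())
    for number, revision in enumerate(revisions, start=1):
        (target / f"{source.stem}-v{number}.pdf").write_bytes(revision)
    return len(revisions)
```

A file with only one marker has almost certainly been written out whole, and has no history to recover this way. The last candidate is simply the current document.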
Here’s an example PDF, with uninteresting content I’m afraid, in which incremental saving has left five versions, including the current one, in its file.
Discovering and saving such old versions of PDFs is the new feature in the latest version of Podofyllin, 1.0b10, which is now available from here: podofyllin10b10 and from Downloads above. It’s a single menu command, with all the work being done by the app, which also detects whether the PDF is likely to have used incremental saving.
These are all useful tools for viewing the results of other people’s mistakes, but they’re also invaluable for anyone who edits PDFs. Few PDF editors use the macOS version system, and Podofyllin may be your only way of reverting to a previous version of a PDF which you’re editing. It’s also an excellent way of checking that the PDF you have been preparing for release or publication is ready for others to view, or whether it still contains content that mustn’t be included.
Postscript:
Versions prior to 1.0b10 appear to work fine in Mojave, and I believe are good in High Sierra too. However, they don’t work properly on Sierra, due to different behaviour in macOS. This should have been corrected in version 1.0b10. I am extremely grateful to EcleX, who tested these out on Sierra for me, and enabled me to work around these issues.