One reason there are so many gaffes with PDF files – where sensitive information is disclosed inadvertently – is that it’s so hard to see inside a PDF document. Open one in a good text editor like BBEdit, and almost its entire contents will be binary junk. Even scrolling through it looking for errors is well nigh pointless.
Alternatives, such as the structure view in Adobe Acrobat Pro, don’t reveal all the data which could be hidden inside a PDF, because they rely on the file’s own Catalog, and will miss orphaned content, for example.
One goal that I have for Podofyllin, my free PDF viewer, is that it will make it easier to check what really is inside a PDF document, and help prevent the errors which are gross enough to reach the news every few weeks – like that in the Paul Manafort case. And I now have another new version to offer which takes a big step along the road to making PDF more comprehensible.
Open the Source window for any PDF document in Podofyllin, and instead of drowning in streams of noise, you can now see its structure and every object, including any which have become orphaned. The app parses the source of the PDF, strips out unreadable binary streams, and displays its text content with syntactic colouring which looks particularly good in Dark Mode.
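To give a feel for what stripping the binary streams involves, here is a minimal sketch in Python. It is not Podofyllin's own code, and the function names `strip_streams` and `list_objects` are made up for illustration. It assumes a simple PDF in which every stream body sits between the `stream` and `endstream` keywords, and it will be fooled by a stream whose data happens to contain the text `endstream`.

```python
import re

# Body of a stream: everything between the `stream` and `endstream` keywords.
# The lookbehind stops the tail of `endstream` being taken for an opening keyword.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# Start of an indirect object, e.g. `12 0 obj`.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def strip_streams(pdf_bytes: bytes) -> str:
    """Return the PDF source with each stream body replaced by a short note."""
    def placeholder(match):
        return (b"stream\n[%d bytes of stream data omitted]\nendstream"
                % len(match.group(1)))

    readable = STREAM_RE.sub(placeholder, pdf_bytes)
    # Latin-1 maps every byte to a character, so decoding cannot fail.
    return readable.decode("latin-1")


def list_objects(pdf_bytes: bytes) -> list:
    """Return (object number, generation) for every object defined in the file."""
    # Remove stream data first so that binary noise cannot mimic `N G obj`.
    without_streams = STREAM_RE.sub(b"stream\nendstream", pdf_bytes)
    return [(int(num), int(gen)) for num, gen in OBJ_RE.findall(without_streams)]
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `strip_streams` gives text that is short enough to read and search, while `list_objects` lists every object in the file whether or not anything refers to it.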
Two versions of the source are provided: the lower of the two views shows exactly what is in the PDF file, and the upper view shows how Quartz 2D in macOS renders it internally, which is the same as Podofyllin exports as ‘flattened’ PDF. Both views now support the Find command, so you can search for buried text quite easily.
Podofyllin version 1.0b11 for Sierra, High Sierra and Mojave is now available from here: podofyllin10b11 and from Downloads above.
Now that Podofyllin is starting to parse the PDF properly, I will build on that to try to detect issues such as orphaned objects, and hopefully to recover text from compressed streams. By popular request, I will also be looking at saving window sizes and positions in a future version.
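As a rough illustration of how orphaned objects might be detected, the Python sketch below follows references outwards from the trailer and reports any object that is never reached. This is only one possible approach and not a description of what Podofyllin does or will do. The name `find_orphans` is hypothetical. The sketch assumes a classic PDF with a plain `trailer` section, ignores generation numbers, and does not handle cross-reference streams, object streams, or stream data that happens to look like a reference.

```python
import re

# An indirect object and its body: `N G obj ... endobj`.
OBJ_BODY_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)

# A reference to an indirect object: `N G R`.
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")

# The trailer dictionary, which points to the Catalog via /Root.
TRAILER_RE = re.compile(rb"trailer(.*?)startxref", re.DOTALL)


def find_orphans(pdf_bytes: bytes) -> list:
    """Return the numbers of objects that cannot be reached from the trailer."""
    objects = {int(num): body for num, _gen, body in OBJ_BODY_RE.findall(pdf_bytes)}

    trailer = TRAILER_RE.search(pdf_bytes)
    pending = []
    if trailer:
        pending = [int(num) for num, _gen in REF_RE.findall(trailer.group(1))]

    reachable = set()
    while pending:
        num = pending.pop()
        if num in reachable or num not in objects:
            continue  # already visited, or a reference to a missing object
        reachable.add(num)
        # Queue everything this object refers to in turn.
        pending.extend(int(n) for n, _gen in REF_RE.findall(objects[num]))

    return sorted(set(objects) - reachable)
```

Any object number returned is defined in the file but not linked from the Catalog or the other trailer entries, which makes it a candidate for content that a viewer will not display but which is still present in the file.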