Podofyllin now exports Rich Text and delves deeper into PDF documents

Having added support to my Rich Text editor DelightEd to open PDF files as Rich Text, I promised to add a matching feature to Podofyllin, so that you can export the whole of a PDF in Rich Text format. I hope that is what I have accomplished in this new version.

There are limitations, unfortunately. PDF can draw each word, even individual characters, quite separately, and has no block mark-up comparable to that of RTF, so not all the style information that you might hope for will transfer. Despite that, I think that you should find it a useful additional feature.
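
To illustrate why so little structure survives, here is a minimal Python sketch that lists the text-drawing steps in a small content stream. It is not how Podofyllin or DelightEd work: the stream fragment is hand-written for illustration, the name drawn_words is hypothetical, and the pattern only handles simple literal strings shown with Tj after a Td move.

```python
import re

# A hand-written, uncompressed content stream fragment (illustrative only).
# Each word gets its own Td move and its own Tj show operator.
CONTENT = b"""BT
/F1 12 Tf
72 720 Td (Rich) Tj
30 0 Td (Text) Tj
-30 -14 Td (export) Tj
ET"""

# Captures "dx dy Td (string) Tj"; ignores escapes, hex strings and TJ arrays.
SHOW_RE = re.compile(
    rb"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td\s*\((.*?)\)\s*Tj"
)

def drawn_words(stream):
    """Return (dx, dy, text) for each simple Td/Tj pair in the stream."""
    return [
        (float(dx.decode("ascii")), float(dy.decode("ascii")), text.decode("latin-1"))
        for dx, dy, text in SHOW_RE.findall(stream)
    ]

for dx, dy, word in drawn_words(CONTENT):
    print(f"move by ({dx}, {dy}), then draw {word!r}")
```

Nothing in the stream says that the three words form one line or one paragraph. An exporter can only infer that from the positions, which is why block-level styling is hard to recover.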

I have also been working further on Podofyllin’s internals, which parse and analyse PDF content independently of the macOS PDFKit (Quartz 2D) engine. This new version analyses the lists of children in Pages objects, so that it can now start building the document’s structural tree. I have accordingly restructured the source window, dividing it into three views rather than two.
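
As a rough idea of what analysing those lists of children involves, the following Python sketch reads the Kids array of each Pages object from raw PDF data and prints the resulting tree. It is an assumption-laden illustration, not Podofyllin's code: the function names and sample data are hypothetical, and the regular expressions only cope with uncompressed objects, not object streams or encrypted files.

```python
import re

# Matches an uncompressed indirect object: "12 0 obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# Matches /Type /Pages but not /Type /Page.
PAGES_RE = re.compile(rb"/Type\s*/Pages\b")
# Matches a /Kids array of indirect references, e.g. /Kids [3 0 R 4 0 R].
KIDS_RE = re.compile(rb"/Kids\s*\[(.*?)\]", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R")

def pages_children(data):
    """Map each Pages object number to the object numbers in its /Kids."""
    children = {}
    for match in OBJ_RE.finditer(data):
        number, body = int(match.group(1)), match.group(3)
        if PAGES_RE.search(body):
            kids = KIDS_RE.search(body)
            refs = REF_RE.findall(kids.group(1)) if kids else []
            children[number] = [int(ref) for ref in refs]
    return children

def walk(children, node, depth=0, seen=None):
    """Return indented lines for the tree under node, skipping repeat visits."""
    seen = set() if seen is None else seen
    if node in seen:  # guards against cycles in malformed files
        return []
    seen.add(node)
    # Anything without a /Kids entry is assumed here to be a leaf Page.
    kind = "Pages" if node in children else "Page"
    lines = ["  " * depth + f"{kind} {node}"]
    for kid in children.get(node, []):
        lines += walk(children, kid, depth + 1, seen)
    return lines

# Hand-written fragment for illustration; not a complete, valid PDF.
SAMPLE = b"""%PDF-1.4
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 3 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R >>
endobj
4 0 obj
<< /Type /Pages /Parent 2 0 R /Kids [5 0 R 6 0 R] /Count 2 >>
endobj
5 0 obj
<< /Type /Page /Parent 4 0 R >>
endobj
6 0 obj
<< /Type /Page /Parent 4 0 R >>
endobj
"""

print("\n".join(walk(pages_children(SAMPLE), 2)))
```

Starting from object 2, the root Pages node named in the Catalog, this prints Pages 2 with Page 3 and Pages 4 beneath it, and Page 5 and Page 6 beneath Pages 4.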

The top view shows the structure and objects of the PDF as seen by macOS, using Quartz 2D; the view below that uses the same format to display the contents of the PDF document as saved to disk. The new bottom view contains Podofyllin’s analysis of both of those.

At present, this analysis is limited, and includes:

  • The version of PDF used.
  • The ‘official’ page count of the final document.
  • Numbers of entries in the Info and Catalog dictionaries.
  • The total number of objects.
  • The total number of page objects (which may exceed the final page count).
  • Numbers of Catalogs, Info objects, and Trailers. There is usually one of each, but again there can be several, particularly when a PDF has been saved incrementally.
  • The number of versions of the PDF content found within the file.

The last four items are given separately for the original file on disk and for the PDF as seen in macOS. The latter can be simpler or more complex than the file on disk, and represents the PDF when ‘flattened’ for saving.
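
Some of these figures can be approximated by scanning the raw bytes of a file, as in the Python sketch below. This is only a guess at the kind of counting involved, not Podofyllin's method: the function name raw_pdf_stats is hypothetical, the count of %%EOF markers is assumed to stand in for the number of versions, and Info objects are left out because they carry no Type entry and have to be found through the trailer instead.

```python
import re

def raw_pdf_stats(data):
    """Count a few structural markers in uncompressed PDF data."""
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "objects": len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data)),
        # \b stops /Type /Pages from being counted as a page object.
        "page_objects": len(re.findall(rb"/Type\s*/Page\b", data)),
        "catalogs": len(re.findall(rb"/Type\s*/Catalog\b", data)),
        "trailers": len(re.findall(rb"\btrailer\b", data)),
        # Assumption: each saved version of the content ends with %%EOF.
        "versions": len(re.findall(rb"%%EOF", data)),
    }

# Hand-written fragment for illustration; offsets and xref tables are omitted,
# so this is not a valid PDF.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"trailer\n<< /Size 4 /Root 1 0 R >>\nstartxref\n0\n%%EOF\n"
    # An incremental save appends a revised page object and a second trailer.
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Rotate 90 >>\nendobj\n"
    b"trailer\n<< /Size 4 /Root 1 0 R /Prev 0 >>\nstartxref\n0\n%%EOF\n"
)

for name, value in raw_pdf_stats(SAMPLE).items():
    print(f"{name}: {value}")
```

In this sample the tree has a single page, yet the scan finds two page objects, two trailers and two versions, because the incremental save left the superseded page object in the file.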

Finally, Podofyllin has been ported to Swift 5 and Xcode 10.2, which proved fairly painless.

Version 1.0b15 of Podofyllin is available from here: podofyllin10b15 and in Downloads above.

My intention now is to focus on analysing the structure of the PDF in more detail, looking, for example, for orphaned pages and other objects. If you have any other feature requests for Podofyllin, please add them as comments below so that I can consider them for the next release.