DelightEd now opens the rich text content of PDFs, as well as HTML, RTF and text files

Several readers here have been asking about extracting Rich Text from PDF documents, something which you can already do by copying text from them in Podofyllin. Before adding this as a new feature to Podofyllin, I thought it would be a worthwhile addition to my Rich Text editor DelightEd.

If you’re already using DelightEd version 1.1, you’ll know that it can open not only RTF files but HTML and plain text too, and can save documents to each of those formats. One aspect of this which bothered me was the risk of saving over an existing HTML file, and that risk is even more critical when dealing with PDFs.

So this new version of DelightEd adds a safeguard when saving an open HTML document for the first time: this is treated as a Save As command, and you are prompted to enter a new filename. Of course, if you want to overwrite the original HTML file, that is easy to do. The prompt only appears the first time that you save, so it doesn’t keep nagging you.

I have built in identical protection when opening PDF documents, where it is even more important because of the extensive changes which DelightEd will make to most PDFs. However, if you open a PDF or HTML document whose contents you want to preserve, you should still save it immediately under a different filename. If you don’t, the autosave feature may sooner or later silently overwrite your original file.

There doesn’t appear to be an easy way around this. If I disable autosaving, then DelightEd can’t save versions of its documents, which is a significant loss when you’re editing RTF. So on balance, I think that it’s better to leave autosaving enabled, and to leave the user to ensure that they don’t overwrite a precious original, just as in any other app with autosave.


When you open a PDF document in DelightEd, the app scans it page by page, and extracts as much styled text as it can find. Because of the way that PDF handles text styling, this doesn’t capture the full layout and styling, but is still a great advance on working with plain text. Opening very large PDFs can be slow: one of my test documents runs to over 1500 pages, and spins the beachball for some time, but eventually opens its nearly half a million words.

Unfortunately, because of some issues with the way that macOS generates PDF, saving a PDF from a document in DelightEd results in something rather odd: the saved document becomes one huge page, and isn’t paginated according to your page setup. In the case of my large test document, that’s a single page which is almost 600,000 pixels deep. If you want to save a paginated PDF document, use the Print command and its Save as PDF option instead.

So my recommended workflow for importing Rich Text from an existing PDF document is to open a copy of that document in DelightEd, and save it as RTF immediately after opening, before working on it. This ensures that the original document is preserved intact, and that your editorial work on its Rich Text conversion is saved to the macOS version system.

At the moment, both saving to PDF and printing use the appearance of the document window. If that’s in Dark Mode, then any printed or PDF version will remain in Dark Mode too. If you want a normal Light Mode PDF or print, switch the document window to Light Mode first, using Command-3. This does, though, have its benefits: currently PDF doesn’t have any mechanism for rendering documents in Dark Mode, but if you save them from DelightEd, you can create a version which is fixed in Dark Mode. I don’t know of any other PDF tool which will do this, and will look at adding it to Podofyllin as a feature too.

DelightEd version 1.2 for Sierra, High Sierra and Mojave is now available from here: DelightEd12 and from Downloads above.
