Q&A: Fragmented old PDF

Q I needed to access an old and long document which was created using Microsoft Word 2008. For simplicity I opened a PDF file made from it, but cannot get any sense from that file: when I copy paragraphs of text from it, they paste in with each word on its own line. How can I get this text to flow into its original paragraphs?

A This is the result of a bug which appeared when Word 2008 turned a document using certain fonts into PDF. The resulting file stores each word as a separate chunk of text, which breaks its flow, so that anything copied from it pastes into other applications as a column of single words.

One of the offending fonts was Cambria, a TrueType font that not only came bundled with Word 2008, but had also become the default for many of its document styles. So unless a Word user specifically chose different settings, the body text of their documents used Cambria, and text copied from PDF files generated from them is unusable when pasted elsewhere.

The simplest solution is to go back to the original Word document. If you want to create a fresh PDF, select all the text, change its font to an established one that is unaffected by the bug, such as Times New Roman, and then generate the PDF again. If you only want to copy the content, it is much easier to save the document as plain text or Rich Text, which bypasses the potential complexities of PDF altogether.
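
If the original Word document is no longer available, text pasted from the faulty PDF can sometimes be tidied up with a short script. The following is only a minimal sketch in Python and not a tested fix for this bug: the function name reflow is hypothetical, and it assumes that the pasted text still has a blank line between paragraphs, which PDFs affected by this bug may not preserve.

```python
import re


def reflow(pasted: str) -> str:
    """Join text pasted one word per line back into paragraphs.

    Assumption: a blank line marks a paragraph break. If the paste
    contains no blank lines, everything is returned as one paragraph.
    """
    # Split into paragraphs at blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", pasted.strip())
    # Within each paragraph, collapse all line breaks and runs of
    # spaces into single spaces.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


if __name__ == "__main__":
    sample = "The\nquick\nbrown\n\nfox\njumps\n"
    print(reflow(sample))
```

Where the paste has lost its paragraph breaks entirely, the script can only return one long paragraph, and the breaks have to be restored by hand.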

Comments Remember that the primary purpose of Acrobat PDF is to preserve the layout and appearance of the page, not to give easy access to its content. Even when unaffected by bugs such as this, PDF is far from ideal when you need to access the text content of a document; plain text and similar basic formats are much more amenable to such use.

Updated from the original, which was first published in MacUser volume 26 issue 11, 2010.