The PDF format has been designed to be rich in content, which can include:
- Images, which can be in formats which have been used to exploit vulnerabilities, such as JPEG.
- URIs, which can link to malicious sites.
- Embedded files, which could be executable malicious code.
- JavaScript objects, which can exploit vulnerabilities in the reader app in particular.
These make PDF a good vector for the transmission of malware, and its opaque format offers the malware developer effective ways of concealing a wide range of malicious components.
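As a rough illustration of how some of the content types listed above can be spotted, here is a minimal Python sketch that counts a few name tokens in the raw bytes of a PDF. The list of names and the function names are my own assumptions for illustration, not a complete catalogue. The approach also has clear limits: objects stored inside compressed streams won't be seen, and a count of zero proves nothing about safety.

```python
import re

# Name tokens commonly associated with active or risky PDF content.
# This list is an illustrative assumption, not an exhaustive catalogue.
RISKY_NAMES = [b"/JavaScript", b"/JS", b"/URI", b"/EmbeddedFile",
               b"/OpenAction", b"/Launch", b"/DCTDecode"]


def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx hex escapes, which PDF allows inside name tokens."""
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)


def count_risky_names(data: bytes) -> dict:
    """Count occurrences of each name token in the raw (uncompressed) bytes."""
    data = decode_name_escapes(data)
    counts = {}
    for name in RISKY_NAMES:
        # Reject matches followed by a letter or digit, so that /JS
        # does not match a longer name which merely starts with it.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts
```

To try it on a real document, read the file with `open(path, "rb").read()` and pass the bytes to `count_risky_names`. Any non-zero count is only a prompt for closer inspection, not evidence of malware.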
Security and exploitation
PDF documents have long been used as a vector for the distribution of malware. They first came to prominence in 2008, when there was a spate of malicious PDFs which attacked PCs. In 2011, OSX/Revir-B was delivered in what posed as a Chinese language PDF, but turned out to be a Trojan. Most recently, the Trojan named ‘Komplex’, which was used in targeted attacks, has been hidden in a PDF which appears to detail aspects of a Russian aerospace programme.
macOS normally scans PDF documents using XProtect before opening them, just as it does with JPEG images. Here’s a log excerpt showing XProtect being called to check a PDF file before it’s opened in my app Podofyllin, after double-clicking the document in the Finder:
30.360916 LaunchServices LaunchApplication: appToLaunch={ "ApplicationType"="Foreground", "CFBundleExecutablePath"="/Applications/Podofyllin.app/Contents/MacOS/Podofyllin", "CFBundleExecutablePathDeviceID"=16777220, "CFBundleExecutablePathINode"=16767785, "CFBundleIdentifier"="co.eclecticlight.Podofyllin", "CFBundleName"="Podofyllin", "CFBundlePackageType"="APPL", "LSApplicationLockedInStoppedStateKey"=true, "LSBundlePath"="/Applications/Podofyllin.app", "LSBundlePathDeviceID"=16777220, "LSBundlePathINode"=16767779, "LSExecutableFormat"="LSExecutableMachOFormat", "LSLaunchedInQuarantine"=true } modifiers: { "AddPSNArgument"=true, "LSAdditionalEnvironmentVars"={ }, "LSLaunchAsync"=true, "LSLaunchStopped"=true, "LSLaunchStoppedTemporarily"=true } args=[ NULL ]
30.373097 XprotectService Calling SecAssessmentCreate with URL <private>, context <private>
30.392467 XprotectService SecAssessment results: <private> (null)
30.394549 AE AESendMessage(aevt,odoc <private>
30.395284 Podofyllin AE RECEIVED:(aevt,odoc) <private>
following which Podofyllin opened the document as requested.
That said, XProtect doesn’t appear to be capable of detecting Revir-B, although its A strain is detected, and Apple’s recent bad habit of obfuscating the identities of malware makes it impossible to determine whether there is any protection now in macOS against Komplex.
Protection against JavaScript exploits is rather better, though: Preview and the Quartz engine don’t appear to run any JavaScript content in PDFs, and any opened automatically within Safari should be subject to that browser’s more general JavaScript policy and protection. However, most Safari installations automatically open PDF documents within the browser, rather than saving them to disk. Malware doesn’t yet seem to have tried to exploit this widespread behaviour.
If you open PDFs received in unsolicited messages, or those on remote sites of doubtful credentials, then you are at significant risk of being exposed to malware within them, and should ensure that you are well protected and suitably cautious. PDF documents remain a vector for the delivery of malware.
Integrity of PDFs
PDF is a complex file format, and, as I have remarked here, any intended or inadvertent changes to the data in a PDF are liable to make the document unusable, and its content inaccessible, unless they are made by a good PDF editor.
This makes it difficult, but by no means impossible, to tamper with the contents of a PDF document without leaving traces which should be apparent in the file’s data. However, the opacity of PDF files is sufficient to let forgeries appear completely convincing. For example, text content can be edited to change its sense completely. Evidence of that editing may appear in the PDF data, but not in the document as seen by the user.
Many users trust a system in which they add an image of their signature as an annotation to a PDF document to ‘sign’ it – a feature of Preview, for example. What they don’t always appreciate is that doing so doesn’t prevent that document from being edited later, making it appear that they have signed up to something very different.
Digital signatures should prevent such tampering from taking place. They use security certificates to sign a hash of the PDF document, which verifies that it hasn’t been tampered with, in a similar way to developer certificates and code signatures. However, digital signing isn’t free, although services such as DocuSign make it relatively inexpensive and convenient to use. Digital signatures also cannot be applied to documents by the macOS Quartz engine, so require a compatible PDF app such as Adobe Acrobat.
If a PDF hasn’t been digitally signed, you can’t put trust in its contents without carrying out careful forensic analysis of its data.
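To show why a hash reveals tampering, here is a minimal Python sketch of the hashing step alone. It is not a PDF digital signature: a real signature also signs the digest using a certificate's private key and embeds the result in the document. The function names and sample bytes are hypothetical.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the document bytes as hex."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest against one recorded earlier."""
    return hmac.compare_digest(digest(data), recorded_digest)


# Hypothetical document bytes, before and after a one-character edit.
original = b"I agree to pay 100 pounds."
tampered = b"I agree to pay 900 pounds."

recorded = digest(original)
print(is_unchanged(original, recorded))
print(is_unchanged(tampered, recorded))
```

The weakness of a bare hash is that anyone who edits the document can also recompute the hash, which is why a signature ties the digest to a certificate.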
Forensics and hidden data in PDF files
PDFs are widely used for official documents, and pose a significant threat of unintentionally releasing content. This has most commonly occurred when attempts to redact sections have been incompetent and incomplete.
Redacting PDFs requires a proper tool and meticulous attention to detail to ensure that the redacted content isn’t left behind in the file for others to discover and unmask. I have already looked in detail at performing redaction using Adobe Acrobat.
The forensic analysis of PDF documents is a specialist area. Some tools claim to be able to reconstruct previous versions of a PDF which has been edited, and others can recover hidden content which the user thought had been removed.
Annotations are particularly amenable to forensic analysis because of the way in which apps save them in the PDF file. In most cases, annotations are appended to the end of the document data in the order in which they are added. Careful analysis of the data can therefore provide quite a detailed history of that document since it was first created. Few users are aware of this potential, as it is hidden in the opaque data in the PDF file.
One simple process which can remove that history is to write the whole document out as a fresh file, something which is unusual in the case of PDFs. However, as with redaction, it can be very difficult to tell what remains hidden in PDF data.
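The appended history described above can be glimpsed with a minimal Python sketch. It assumes that each save which appends data ends with its own `%%EOF` marker, as the PDF format's incremental updates do. The function names are hypothetical, and the result is only an approximation: the marker could also occur inside stream data, and linearised files may contain an extra marker even without any edits.

```python
def revision_ends(data: bytes) -> list:
    """Return the byte offset just past each %%EOF marker."""
    marker = b"%%EOF"
    ends = []
    pos = data.find(marker)
    while pos != -1:
        ends.append(pos + len(marker))
        pos = data.find(marker, pos + 1)
    return ends


def revisions(data: bytes) -> list:
    """Each prefix ending at a marker approximates an earlier saved state."""
    return [data[:end] for end in revision_ends(data)]
```

A document which has been written out as a fresh file would normally yield a single revision, while one annotated several times in place may yield several, each of which can be saved and opened as a candidate earlier version.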