Let’s play unintended consequences, in this case with macOS security and privacy protection, the file system and Time Machine backups. Although this is quite a complex and technical game, I promise you a reward at the end: plenty of your backups could be wasted space. How could you possibly refuse?
The chain starts with security and privacy protection, which in recent versions of macOS have taken to using extended attributes a great deal. One of the commonest uses of Apple’s bundled Preview app is to read PDF documents, for which it is the default app in macOS. Every time that Preview opens a PDF document, it marks that document by writing a quarantine flag to it, if that hasn’t already been done. That happens regardless of whether you save that document or not. Apple has never explained this behaviour, but as the quarantine flag is used to mark potentially malicious files, we must presume that this is for security. Even when that PDF was created on that same Mac and has never left its storage, Preview follows this same behaviour.
In Catalina, there’s another even more puzzling reason for an extended attribute being written to a document. This time it appears to be part of the per-document privacy controls which were introduced in 10.15. The extended attribute involved is named com.apple.macl, and like all those quarantine flags, has never been mentioned let alone documented by Apple.
As I explained previously, fiddling with a file’s extended attributes has complicated consequences. It doesn’t change important file attributes, like the date of last modification, and extended attributes are even ignored when reporting a file’s size. Third-party backup apps generally rely on the file modification date to determine whether a file needs to be backed up: if that date is more recent than the last backup, then the file is included in the next backup. One unfortunate consequence of this is that changes to a file which don’t alter its modification date, such as changes to its extended attributes, don’t normally result in a new copy of that file being made in the backup.
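As a rough sketch of that date-based test, assuming a backup tool which records only the time of its last run, the check might look like this in Python. The function name needs_backup and its last_backup_time parameter are hypothetical, and aren't taken from any real backup app.

```python
import os


def needs_backup(path: str, last_backup_time: float) -> bool:
    """Date-based test: back up only if modified since the last backup.

    Hypothetical helper for illustration. It looks only at the modification
    time, so it never inspects extended attributes at all.
    """
    return os.stat(path).st_mtime > last_backup_time
```

Because writing an extended attribute leaves the modification date untouched, a file which has only gained a quarantine flag since the last backup would return False here and be skipped.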
Time Machine has to work differently. This is because it can do one of two things for a file which is included in the volume or folder it has to back up: it can either save a complete copy of that file, including its attributes and extended attributes (metadata), or it can create a hard link to the previous version of that file. If a file’s metadata change, then the hard link will not reflect that change, but will show a previous version of the metadata, which isn’t an accurate reflection of the file’s state at the time of the backup.
Time Machine has a cunning system which works around this problem. Instead of using the modification date, it tries to use the FSEvents database, which logs all changes which are made to files on each volume, including changes to file metadata.
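The choice between copying and hard-linking can be modelled in a few lines of Python. This is only a toy model under my own assumptions: ChangeEvent and plan_backup are hypothetical names, and neither the real FSEvents API nor Time Machine’s actual implementation is shown here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical stand-in for one logged change to a file."""
    path: str
    content_changed: bool
    metadata_changed: bool


def plan_backup(files: Iterable[str], events: Iterable[ChangeEvent]) -> Dict[str, str]:
    """Decide, per file, between a full copy and a hard link.

    Any logged change, whether to content or metadata, forces a full copy;
    files with no logged change are hard-linked to their previous backup.
    """
    changed = {e.path for e in events if e.content_changed or e.metadata_changed}
    return {path: ("copy" if path in changed else "hard link") for path in files}
```

In this model, a PDF whose only logged event is a metadata change is planned as a full copy, just like one whose contents were rewritten.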
When metadata used to change relatively infrequently, this had little in the way of adverse effects. Now that security and privacy protection are doing so much with extended attributes, the unintended consequence is that many of the files which are copied into each Time Machine backup haven’t actually changed in substance, but a quarantine flag has been added, for instance.
It’s easy to demonstrate this in action if you’re making Time Machine backups. Simply create a sizeable PDF file which doesn’t have a quarantine flag attached to it, or strip the flag from a file which already has one. Leave the file alone for the next automatic backup. After that, open the document using Preview, which will in a fraction of a second automatically write a quarantine flag to it. Leave it for the next automatic backup, and that backup will contain a second copy of that PDF which only differs in that quarantine flag, maybe as little as 31 bytes in all. Imagine this happening to many 10 GB movie clips and you see where this is heading.
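To put a number on that, here is a small piece of arithmetic in Python. It assumes that the whole file is copied again when only the flag changes, takes 31 bytes as the size of the flag, and treats 10 GB as 10 × 1000³ bytes; copy_amplification is a name I’ve made up for this illustration.

```python
def copy_amplification(file_size: int, changed_bytes: int = 31) -> float:
    """Bytes written to the backup per byte that actually changed."""
    if changed_bytes <= 0:
        raise ValueError("changed_bytes must be positive")
    return file_size / changed_bytes


# A 10 GB movie clip (taken as 10 * 1000**3 bytes) with a 31-byte flag added.
print(f"{copy_amplification(10 * 1000**3):,.0f} bytes copied per byte changed")
```

For that 10 GB clip, the ratio comes out at over 300 million bytes copied for every byte which really changed.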
The saving grace is that apps like Preview, which write quarantine flags so frequently, shouldn’t normally replace an existing quarantine flag, so each file should only be duplicated in this way once. Some third-party apps also now write quarantine flags to every PDF file which they save. Apple’s new com.apple.macl extended attribute isn’t as constant, though, and can lead to multiple otherwise identical copies of the same files being backed up.
The amount of space being wasted in Time Machine backups by unnecessarily duplicated files is hard to estimate, and will vary widely between users. If you regularly acquire or generate PDFs which don’t come with quarantine flags attached, and open them all using Preview, then you’re likely to have duplicates of pretty well all your PDFs in your Time Machine backups.
Maybe the time has come for Time Machine to ignore FSEvents records for some changes to extended attributes. When 31 bytes of metadata cause many MB or GB to be backed up, there’s clearly something wrong with the system, and security is tripping up Time Machine.
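One way such a rule might look is sketched below in Python. It rests on a large assumption, that the backup tool could learn which extended attributes changed, and the names IGNORABLE_XATTRS and should_copy are hypothetical. The choice of attributes to ignore is purely illustrative, not a recommendation.

```python
from typing import Iterable

# Illustrative only: attributes whose changes alone wouldn't force a new copy.
IGNORABLE_XATTRS = frozenset({"com.apple.quarantine", "com.apple.macl"})


def should_copy(content_changed: bool, changed_xattrs: Iterable[str]) -> bool:
    """Return True if a file deserves a fresh copy in the next backup.

    Content changes always count. Extended attribute changes count only
    when at least one changed attribute is outside the ignorable set.
    """
    if content_changed:
        return True
    return bool(set(changed_xattrs) - IGNORABLE_XATTRS)
```

Under this rule a file which has only gained a quarantine flag would be hard-linked rather than copied, at the cost of the backup holding slightly stale metadata for it.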