Files in macOS have a lot of attributes, or metadata, which provide important information, such as the date that they were last modified. It’s not always clear which of these should be expected to change when different parts of a file change. This article explains some of those most likely to cause confusion.
When you change the data within a file, its date of last modification is also altered. This large PDF file hasn’t been modified since it was created almost seven years ago, but macOS also tracks when it was last opened, through an extended attribute (xattr) of type com.apple.lastuseddate, and the datestamp there is used for the Last opened field.
If you were to change some of that file’s attributes (metadata), such as its permissions, that doesn’t alter the date of modification, as the data in the file remain untouched. As you do that without even opening the file, changing permissions alone won’t affect the Last opened date either.
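You can check this for yourself with a few lines of Python. What follows is only a minimal sketch: it works on a scratch file in a temporary folder (the name sample.txt is hypothetical), changes its permissions, and compares the modification datestamp before and after. It assumes standard POSIX behaviour, in which changing permissions updates the inode change time but leaves the modification time alone.

```python
import os
import stat
import tempfile
import time


def chmod_leaves_mtime_alone() -> bool:
    """Change a scratch file's permissions and report whether its
    modification time is unchanged afterwards."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.txt")  # hypothetical scratch file
        with open(path, "w") as handle:
            handle.write("some data\n")

        before = os.stat(path)
        time.sleep(0.05)  # let the clock move on so any change would show
        os.chmod(path, stat.S_IRUSR)  # owner read-only: a metadata-only change
        after = os.stat(path)

        # Restore write permission so the temporary folder is easy to remove.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
        return before.st_mtime_ns == after.st_mtime_ns


if __name__ == "__main__":
    print("modification date unchanged:", chmod_leaves_mtime_alone())
```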
Some apps, including my utility for checking the integrity of files, Dintch, don’t touch the data in a file, or its regular attributes like datestamps or permissions, but instead write to xattrs. In Dintch’s case, it calculates the SHA256 digest (checksum) of a file and writes that to a custom xattr.
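To give a flavour of that kind of tagging, here is a hedged sketch in Python. It is not Dintch’s code: the xattr name user.example.sha256 and the function names are made up for illustration. Python’s standard library only offers os.setxattr on Linux, so the sketch writes the xattr where that call exists and otherwise just returns the digest; on macOS you would need something like the xattr command tool or a third-party module instead.

```python
import hashlib
import os
import tempfile

# Hypothetical xattr name for illustration only; not the one Dintch uses.
XATTR_NAME = "user.example.sha256"


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 digest of a file's data as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files don't have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tag_file(path: str):
    """Compute the digest and try to store it in an xattr.

    Returns (digest, stored), where stored says whether the xattr was written.
    The file's data are only read, never changed.
    """
    value = sha256_of_file(path)
    stored = False
    if hasattr(os, "setxattr"):  # only available on Linux in the standard library
        try:
            os.setxattr(path, XATTR_NAME, value.encode("ascii"))
            stored = True
        except OSError:
            pass  # this file system doesn't accept user xattrs
    return value, stored


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.bin")  # hypothetical scratch file
        with open(path, "wb") as handle:
            handle.write(b"abc")
        print(tag_file(path))
```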
Changing a file’s xattrs doesn’t alter the data in the file itself. Xattrs aren’t even stored in the main data area of a volume, but apart, in the file system metadata. So adding or changing a xattr won’t change the date of modification, nor the date that the file was last opened. In fact, unless you look at the xattrs using a utility such as my xattr editor xattred, you won’t even notice what’s happened.
Some users have noticed that changing permissions or xattrs normally triggers a fresh copy of that file to be made when Time Machine next makes a backup. This is a particular concern when you change a lot of permissions, by resetting or repairing Home folder permissions, or tag a large folder using Dintch. But the same doesn’t happen when you use a third-party backup app instead of Time Machine. How can that be?
This all depends on the method that the backup system uses to determine which files need to be backed up. Most third-party tools, like ChronoSync, look at file modification dates to determine whether a file needs to be backed up. If that date is more recent than the last backup, that file will be backed up in the next backup. As changing permissions or xattrs doesn’t affect file modification dates, those actions won’t cause the file to be backed up again.
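The modification-date test can be sketched in Python as follows. This is an illustration of the general approach only, not ChronoSync’s actual logic, and the names needs_backup and large.pdf are hypothetical. The demonstration sets datestamps explicitly with os.utime so that its outcome doesn’t depend on timing.

```python
import os
import tempfile
import time


def needs_backup(path: str, last_backup_time: float) -> bool:
    """Return True if the file's data were modified after the last backup."""
    return os.stat(path).st_mtime > last_backup_time


def demo() -> list:
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "large.pdf")  # hypothetical scratch file
        with open(path, "wb") as handle:
            handle.write(b"placeholder data")

        last_backup = time.time()
        # Pretend the data were last modified an hour before that backup.
        os.utime(path, (last_backup - 3600, last_backup - 3600))
        results = [needs_backup(path, last_backup)]

        # Changing permissions leaves the modification date as it was.
        os.chmod(path, 0o600)
        results.append(needs_backup(path, last_backup))

        # Simulate the data being edited a minute after the backup.
        os.utime(path, (last_backup + 60, last_backup + 60))
        results.append(needs_backup(path, last_backup))
        return results


if __name__ == "__main__":
    print(demo())
```

Running this should print [False, False, True]: only the simulated change to the data makes the file a candidate for the next backup.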
Time Machine is different. Instead of looking at file modification dates, it looks in a database stored on each volume which records all the events which occur in the file system, FSEvents. As actions on the file’s attributes and its xattrs are recorded in FSEvents, they will normally result in a fresh backup being made of the file, even though its data haven’t been changed. The reasoning behind this is that, in a Time Machine backup, when a file isn’t backed up, a hard link is instead created to its previous copy. As that also connects to its old permissions, other attributes, and xattrs, it isn’t an accurate copy of the file as it has now become. There’s no way to create that hard link and then attach the changed permissions or xattrs to it, for instance. That’s one drawback of conventional backup tools compared with Time Machine: they don’t capture changes which are confined to permissions or xattrs.
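The difference between the two strategies can be shown with a toy model in Python. This is emphatically not the FSEvents API, nor how Time Machine is implemented: the Event type, its kinds, and both function names are assumptions made up for the purpose. It simply treats the record of changes as a list of events, and compares selecting files on any event with selecting only those whose data changed.

```python
from typing import Iterable, NamedTuple, Set


class Event(NamedTuple):
    """One entry in a hypothetical record of file system changes."""
    path: str
    kind: str  # "data", "permissions" or "xattr" in this toy model


def selected_by_any_event(events: Iterable[Event]) -> Set[str]:
    """Event-driven selection: any recorded change makes a file a candidate."""
    return {event.path for event in events}


def selected_by_data_change(events: Iterable[Event]) -> Set[str]:
    """Modification-date style selection: only changes to data count."""
    return {event.path for event in events if event.kind == "data"}


if __name__ == "__main__":
    # Hypothetical changes made since the last backup.
    since_last_backup = [
        Event("Documents/large.pdf", "xattr"),
        Event("Documents/notes.txt", "data"),
        Event("Documents/report.doc", "permissions"),
    ]
    print(sorted(selected_by_any_event(since_last_backup)))
    print(sorted(selected_by_data_change(since_last_backup)))
```

In this model the PDF whose xattrs alone were changed is selected by the first function but not by the second, which matches the behaviour described above.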
It’s easy to demonstrate this on a folder which is being backed up hourly. Put a large document, such as my 7.9 MB PDF, in that folder, and let it get backed up the first time, as it stands. Once that has been done, change its permissions or xattrs, and watch it being backed up again an hour later.
I have one folder which is backed up every hour by both Time Machine and ChronoSync. It’s worth comparing what they did when I twiddled with that large PDF’s xattrs. Here are the log messages extracted from each of the Time Machine backups:
13:05 Copied 8 items (7.9 MB) from volume External1. Linked 494. Moved 0
14:05 Copied 5 items (7.9 MB) from volume External1. Linked 55. Moved 0
15:05 Copied 5 items (7.9 MB) from volume External1. Linked 55. Moved 0
ChronoSync made only one backup copy of that file over the same period:
13:05 Scanned: 220755 Processed: 3 Data copied: 7.91 MB Skipped: 0 Errors: 0
14:05 Scanned: 220755 Processed: 1 Data copied: 0 bytes Skipped: 0 Errors: 0
15:05 Scanned: 220755 Processed: 1 Data copied: 0 bytes Skipped: 0 Errors: 0
At 15:10, that PDF file was safely in both Time Machine’s backups and ChronoSync’s, and its data were identical, but only Time Machine showed the changes in xattrs which had taken place. The cost to the Time Machine backup was significant though: it had three complete copies of identical data. Scale that up to a whole Data volume and that could amount to tens or even hundreds of GB.
On Sunday I’ll explain how this advantage in Time Machine turns sour, and wastes your backup storage. In the meantime, feel free to work out for yourself how this could backfire.