
The Eclectic Light Company

Macs, painting, and more
hoakley March 29, 2020 Macs, Technology

Last Week on My Mac: How Time Machine backups waste space

Let’s play unintended consequences, in this case with macOS security and privacy protection, the file system and Time Machine backups. Although this is quite a complex and technical game, I promise you a reward at the end: plenty of the space taken by your backups could be wasted. How could you possibly refuse?

The chain starts with security and privacy protection, which in recent versions of macOS has come to rely heavily on extended attributes. Preview, Apple’s bundled app, is the macOS default for opening PDF documents, and reading them is one of its commonest uses. Every time that Preview opens a PDF document, it marks that document by writing a quarantine flag (the com.apple.quarantine extended attribute) to it, if that hasn’t already been done. That happens whether or not you save the document. Apple has never explained this behaviour, but as the quarantine flag is used to mark potentially malicious files, we must presume that it’s done for security. Preview behaves the same way even when the PDF was created on that same Mac and has never left its storage.

In Catalina, there’s another, even more puzzling, reason for an extended attribute to be written to a document. This time it appears to be part of the per-document privacy controls introduced in 10.15. The extended attribute involved is named com.apple.macl and, like the quarantine flags written to documents, has never been mentioned, let alone documented, by Apple.

As I explained previously, changing a file’s extended attributes has complicated consequences. It doesn’t change important file attributes such as the date of last modification, and extended attributes are even ignored when a file’s size is reported. Third-party backup apps generally rely on the file modification date to determine whether a file needs to be backed up: if that date is more recent than the last backup, the file is included in the next backup. One unfortunate consequence is that changes to a file which don’t alter its modification date, such as changes to its extended attributes, don’t normally result in a new copy of that file being made in the backup.
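
To show why such a change slips through, here’s a minimal Python sketch of the modification-date rule. It uses a made-up `FileState` model rather than any real backup app’s code, and the xattr value is only a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    """Simplified model of a file: its contents, modification time and xattrs."""
    data: bytes
    mtime: float
    xattrs: dict = field(default_factory=dict)


def needs_backup_by_mtime(f: FileState, last_backup_time: float) -> bool:
    # A typical rule for modification-date-based backup tools:
    # copy the file only if it was modified after the last backup.
    return f.mtime > last_backup_time


# A PDF last modified at t=100 and backed up at t=200.
pdf = FileState(data=b"%PDF-1.7 ...", mtime=100.0)
last_backup = 200.0

# Later, an app adds an xattr; the modification time is left unchanged.
pdf.xattrs["com.apple.quarantine"] = b"placeholder"

print(needs_backup_by_mtime(pdf, last_backup))  # False: the xattr change is missed
```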

Time Machine has to work differently, because for each file in the volume or folder it backs up, it can do one of two things: save a complete copy of that file, including its attributes and extended attributes (metadata), or create a hard link to the copy of that file already in the previous backup. A hard link can’t carry metadata of its own, so if a file’s metadata have changed, the hard link will show the previous version of the metadata, which isn’t an accurate reflection of the file’s state at the time of the backup.
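
Time Machine’s own code isn’t public, but the underlying file-system behaviour is easy to demonstrate. In this minimal Python sketch, POSIX permissions stand in for metadata in general, and the file names are hypothetical. A hard link is just another name for the same inode, so a metadata change made through one name shows through the other, and a link can’t hold metadata of its own.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "report.pdf")
    linked = os.path.join(d, "report-link.pdf")
    with open(original, "wb") as f:
        f.write(b"%PDF-1.7 example")
    os.chmod(original, 0o644)

    # Create a second directory entry (a hard link) for the same inode.
    os.link(original, linked)

    # Change metadata through one name...
    os.chmod(linked, 0o600)

    # ...and read it back through both names.
    st_orig = os.stat(original)
    st_link = os.stat(linked)

print(st_orig.st_ino == st_link.st_ino)    # True: one inode, two names
print(oct(stat.S_IMODE(st_orig.st_mode)))  # 0o600: the change shows through both
print(st_orig.st_nlink)                    # 2
```

Extended attributes likewise belong to the inode rather than to any one name, which is why a backup can’t hard-link to an old copy and attach fresh metadata to the link.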

Time Machine has a cunning system which works around this problem. Instead of using the modification date, it normally uses the FSEvents database, which logs all changes made to files on each volume, including changes to file metadata. Any file with a recorded change, even if only to its metadata, is then copied in full into the next backup.
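
That selection rule can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not Time Machine’s actual logic: the `Event` flags are loosely named after FSEvents item flags but their values are arbitrary, and the paths are hypothetical.

```python
from enum import Flag, auto


class Event(Flag):
    # Loosely modelled on FSEvents item flags; the values here are arbitrary.
    MODIFIED = auto()        # file contents changed
    XATTR_MOD = auto()       # extended attributes changed
    INODE_META_MOD = auto()  # other metadata, such as permissions, changed


def files_to_copy(events: dict) -> set:
    """Event-driven selection: any recorded change means the whole file is copied."""
    return {path for path, flags in events.items() if flags}


events = {
    "/Users/me/Documents/notes.txt": Event.MODIFIED,
    "/Users/me/Documents/manual.pdf": Event.XATTR_MOD,  # only a quarantine flag added
    "/Users/me/Documents/old.key": Event(0),            # no change recorded
}
print(sorted(files_to_copy(events)))
# ['/Users/me/Documents/manual.pdf', '/Users/me/Documents/notes.txt']
```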

When metadata changed relatively infrequently, this had few adverse effects. Now that security and privacy protection make such heavy use of extended attributes, the unintended consequence is that many of the files copied into each Time Machine backup haven’t changed in substance at all: a quarantine flag has merely been added, for instance.

It’s easy to demonstrate this in action if you’re making Time Machine backups. Simply create a sizeable PDF file which doesn’t have a quarantine flag attached to it, or strip the flag from one which already has. Leave the file alone until the next automatic backup. Then open the document in Preview, which will automatically write a quarantine flag to it within a fraction of a second. Leave it for the next automatic backup, and that backup will contain a second copy of that PDF which differs only in that quarantine flag, perhaps as little as 31 bytes in all. Imagine this happening to many 10 GB movie clips, and you can see where this is heading.
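
The arithmetic behind that warning is simple. This minimal Python sketch assumes a hypothetical collection of twenty 10 GB clips, each of which has gained only 31 bytes of xattr, and compares what a whole-file backup copies with what actually changed.

```python
GB = 10 ** 9  # decimal gigabytes


def backup_cost(file_sizes: list, xattr_delta_bytes: int) -> tuple:
    """Return (bytes copied by a whole-file backup, bytes that actually changed)
    when each file has only gained an xattr of xattr_delta_bytes."""
    copied = sum(file_sizes)                       # each file is copied in full
    changed = xattr_delta_bytes * len(file_sizes)  # only the new metadata differs
    return copied, changed


clips = [10 * GB] * 20  # twenty hypothetical 10 GB movie clips
copied, changed = backup_cost(clips, 31)
print(copied // GB, "GB copied for", changed, "bytes of changed metadata")
# 200 GB copied for 620 bytes of changed metadata
```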

The saving grace is that apps like Preview, which write quarantine flags so frequently, shouldn’t normally replace an existing quarantine flag, so each file should only be duplicated in this way once. Some third-party apps also now write a quarantine flag to every PDF file which they save, leaving Preview nothing to add. Apple’s new com.apple.macl extended attribute doesn’t stay as constant, though, and can lead to multiple otherwise identical copies of the same files being backed up.
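
The difference between a write-once flag and one that keeps changing can be sketched as follows. This is a simplified model with several assumptions: the file is opened once in each of four intervals between backups, every xattr write is recorded by FSEvents and so triggers one full copy, and, purely as a hypothesis since Apple hasn’t documented it, com.apple.macl is rewritten on every opening. The function names are made up for illustration.

```python
def open_in_preview(xattrs: dict) -> bool:
    """Preview as described in the article: add a quarantine flag only if
    there isn't one already. Returns True if an xattr was written."""
    if "com.apple.quarantine" not in xattrs:
        xattrs["com.apple.quarantine"] = b"placeholder"
        return True
    return False


def rewrite_macl(xattrs: dict, token: bytes) -> bool:
    """Hypothetical: assume com.apple.macl gets a new value on every opening."""
    xattrs["com.apple.macl"] = token
    return True


def full_copies(writes_per_interval: list) -> int:
    # One copy already in the backup, plus another for each interval with a write.
    return 1 + sum(writes_per_interval)


q_xattrs = {}
quarantine_writes = [open_in_preview(q_xattrs) for _ in range(4)]
m_xattrs = {}
macl_writes = [rewrite_macl(m_xattrs, bytes([i])) for i in range(4)]

print(full_copies(quarantine_writes))  # 2
print(full_copies(macl_writes))        # 5
```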

The amount of space being wasted in Time Machine backups by unnecessarily duplicated files is hard to estimate, and will vary widely between users. If you regularly acquire or generate PDFs which don’t come with quarantine flags attached, and open them all using Preview, then you’re likely to have duplicates of pretty well all your PDFs in your Time Machine backups.

Maybe the time has come for Time Machine to ignore FSEvents records for some changes to extended attributes. When 31 bytes of metadata cause many MB or GB to be backed up, there’s clearly something wrong with the system, and security is tripping up Time Machine.
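
One possible shape for such a change is sketched below in Python. This is a hypothetical policy, not anything Apple has proposed: the `Event` flags are loosely named after FSEvents item flags with arbitrary values, and `com.example.tags` is an invented xattr name. As far as I know, FSEvents records that a file’s xattrs changed but not which ones, so the `changed_xattrs` argument stands for the result of comparing the file’s xattrs with those of its previous backed-up copy.

```python
from enum import Flag, auto


class Event(Flag):
    # Loosely modelled on FSEvents item flags; the values here are arbitrary.
    MODIFIED = auto()
    XATTR_MOD = auto()
    INODE_META_MOD = auto()


# Hypothetical list of xattrs whose changes alone shouldn't trigger a full copy.
IGNORED_XATTRS = {"com.apple.quarantine", "com.apple.macl"}


def should_copy(flags: Event, changed_xattrs: set) -> bool:
    if flags & (Event.MODIFIED | Event.INODE_META_MOD):
        return True  # contents or other metadata changed: copy as before
    if flags & Event.XATTR_MOD:
        # Copy only if at least one xattr outside the ignore list changed.
        return bool(changed_xattrs - IGNORED_XATTRS)
    return False


print(should_copy(Event.XATTR_MOD, {"com.apple.quarantine"}))  # False
print(should_copy(Event.XATTR_MOD, {"com.example.tags"}))      # True
print(should_copy(Event.MODIFIED | Event.XATTR_MOD, set()))    # True
```

The trade-off is the one that hard links already impose: whenever such a change is ignored, the backed-up copy carries stale values of those xattrs.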

Posted in Macs, Technology and tagged backup, extended attributes, file system, FSEvents, metadata, PDF, Preview, Time Machine, xattr.

21 Comments

  1. 1
    Martin on March 29, 2020 at 7:57 am

    Interesting. But I think it’s an advantage of Time Machine that it also backs up extended attributes. IMHO Apple should rethink the mechanism to make it more efficient than it is right now. Maybe they already have, but considered it too expensive for the benefits to be gained?

    • 2
      hoakley on March 29, 2020 at 8:42 am

      Thanks.
      I agree that backing up xattrs is worthwhile when they carry useful info.
      Unfortunately, that’s no longer the case for quarantine flags on documents. To have an entire 100 GB file backed up because 31 bytes of meaningless xattr have changed is frankly stupid.
      Howard.

  2. 3
    cwinte on March 29, 2020 at 12:11 pm

    How well is the current filesystem architecture suited to handling the attributes distinctly from the file content? Sounds rather like the old data & resource forks, but the metadata around a file sounds like it should reside in the filesystem rather than the file proper… That way as OS versions change they can copy, migrate and use or ignore the attributes as the OS needs or wishes.
    How do all the extended attributes behave when the file is on networked shares, Dropbox or other file shares on non-macOS systems? It all seems something of a minefield!

    • 4
      hoakley on March 29, 2020 at 12:29 pm

      Thank you.
      Yes, xattrs are stored in the file system metadata. I’ve written several articles here looking at how they’re handled in different situations, including iCloud and some other file systems. Most now preserve most xattrs, but it is complex.
      Howard.

      • 5
        cwinte on March 29, 2020 at 12:47 pm

        Thanks for the eclectic illumination. I don’t quite see why TimeMachine backs up a file due to metadata changes. Surely it really wants to keep versioned FS metadata.
        Do you get the impression Cupertino is actually clear and consistent about what the flagging is for, what its implications are and whether we users should be able to have some choices?
        Sounds like the quarantine flag does little of real use, but I’d be glad to hear of examples.
        All I’ve ever had is one more step I must perform to do what I wanted to anyway. I get that not all users always know what they might be allowing or every security implication, but if we are forced into routine bypassing when we know there is no risk (a constant yelling of Wolf, wolf!) then we are more likely to fall prey if a real wolf arrives… To that end Apple should surely add minimal extra cautions rather than splash them everywhere.

        • 6
          hoakley on March 29, 2020 at 2:20 pm

          Time Machine determines what to back up on the basis of FSEvents. When a file’s attributes or xattrs change, then that’s recorded as a change in FSEvents, therefore results in the file being backed up. But as TM backs up files, not deltas or blocks, the only thing it can do is back the whole file up – hence the problem.
          The quarantine flag on documents isn’t, AFAIK, documented for developers, and users aren’t supposed to know anything about it either. As with all these recent security measures, there’s no explanation, and no opt out. You can’t stop this behaviour in Preview and other apps – it just happens. The only workaround is never to open a PDF with Preview.
          Howard.

      • 7
        Christian on March 29, 2020 at 2:01 pm

        Yes Howard, this IS a complicated issue. Have a look here: https://en.wikipedia.org/wiki/Comparison_of_file_synchronization_software#Commercial

        “Delta copying” is a seemingly rare feature. Am I wrong that “Delta copying” will prevent backing up the whole file when only a few bytes have been changed inside?

        • 8
          hoakley on March 29, 2020 at 2:25 pm

          Thank you.
          Time Machine is a file-based backup system, because of its use of hard links in the backup. As I explained, there’s no way that you can create a partial hard link to updated metadata – the hard link goes to a complete file, including its associated metadata.
          Interestingly, though, APFS snapshots aren’t file-based, I believe, but block-based, so will be more economical in their ‘backups’.
          Howard.

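
The ‘delta copying’ raised here, and the block-level transfer Raoul describes further down, can be illustrated with a minimal Python sketch of fixed-size block comparison, a simpler relative of rsync’s rolling-checksum algorithm. The 4096-byte block size and the 1 MB test data are assumptions for illustration, and the sketch compares file contents only; it isn’t how Time Machine or APFS work.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(data: bytes) -> list:
    """SHA-256 digest of each fixed-size block (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(old: bytes, new: bytes) -> list:
    """Indices of blocks in `new` that are missing from, or differ in, `old`."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or h != old_h[i]]


old = bytes(1_000_000)            # 1 MB of zeros standing in for a file
new = bytearray(old)
new[500_000:500_031] = b"x" * 31  # change 31 bytes in the middle
blocks = changed_blocks(old, bytes(new))

# Blocks to send, bytes to send, versus bytes a whole-file copy would send.
print(len(blocks), len(blocks) * BLOCK_SIZE, len(new))  # 1 4096 1000000
```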

  3. 9
    Thomas Tempelmann (@tempelorg) on March 29, 2020 at 2:00 pm

    Huh, doesn’t macOS nowadays also report an “attribute change” date? Which hopefully gets updated whenever the EAs changed? And backup programs would check this date?

    • 10
      hoakley on March 29, 2020 at 2:23 pm

      Thanks, Thomas.
      Which attribute would that be? I can’t see one in the documentation, or at least no way of reading it.
      As for TM, it doesn’t normally use attributes to determine which files to back up, but FSEvents, which records changed attributes and changed xattrs as a change in that file, so precipitates a backup.
      Howard.

      • 11
        Thomas Tempelmann (@tempelorg) on March 29, 2020 at 4:06 pm

        POSIX knows this date, APFS has it (change_time), and there’s the API key NSURLAttributeModificationDateKey. (And yes, I know about FSEvents)

        • 12
          hoakley on March 29, 2020 at 4:19 pm

          Thank you, Thomas. That’s useful info.
          Interestingly, none of those is exposed in the Swift interfaces: NSURLAttributeModificationDateKey only appears in Obj-C, and va_change_time appears to be APFS internal and not exposed. It’s also unclear from the minimal documentation whether they refer to xattrs as well as regular attributes.
          TM certainly doesn’t check them, and in most backups doesn’t check file modification dates either, simply uses FSEvents. ChronoSync uses plain file modification dates, which I suspect are relied on by other file-based backup utilities.
          Howard.

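
The distinction Thomas draws, between the modification time and the POSIX change time (ctime), is easy to see with Python’s `os.stat`. This minimal sketch changes permissions rather than an xattr, because Python’s `os.setxattr` is only available on Linux; whether xattr-only changes update ctime on APFS is the open question in this thread, and the sketch doesn’t settle it.

```python
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "doc.pdf")
    with open(path, "wb") as f:
        f.write(b"%PDF-1.7 example")
    os.chmod(path, 0o644)

    before = os.stat(path)
    time.sleep(1.1)        # allow for file systems with 1-second timestamps
    os.chmod(path, 0o600)  # a metadata-only change: permissions
    after = os.stat(path)

mtime_changed = after.st_mtime_ns != before.st_mtime_ns
ctime_changed = after.st_ctime_ns != before.st_ctime_ns
print(mtime_changed, ctime_changed)  # False True
```

A backup tool comparing ctime rather than mtime would catch such changes, but it would also catch every other metadata change, so it would still copy whole files for them.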

  4. 13
    Duncan on March 29, 2020 at 3:41 pm

    Howard: “Interestingly, though, APFS snapshots aren’t file-based, I believe, but block-based, so will be more economical in their ‘backups’.”

    In your estimation, if/when we get to an APFS-based Time Machine (TM2, as you’ve called it) would that then mitigate the current all-or-nothing file backup problem that you’ve described in this article?

    (In many more ways than one I get the feeling that Catalina will be seen as a ‘placeholder’ version of MacOS that Apple pushed out the door to meet their self-imposed annual cycle. The follow-on version has a lot of cleanup to do, which will hopefully rectify a number of problems that Catalina introduced.)

    • 14
      hoakley on March 29, 2020 at 4:22 pm

      Sorry, I have absolutely no idea, as it depends on the architecture of the replacement backup store used by TM 2.0. If it’s file-based, then I think it would be very hard to do anything different from the present scheme. If it’s ASR, there might be more scope, but that still depends on how the backup is structured.
      Howard.

      • 15
        Duncan on March 29, 2020 at 5:02 pm

        Thanks for your reply. Of course you can’t speak for Apple’s future plans – I guess I was asking if an APFS-based TM2 at least offers the *possibility* of incremental xattr backup, versus the current all-or-nothing situation we’re now seeing. From what I understand it does.

        (And yes, the caveat of using APFS on spinning disks still remains, so maybe Apple is waiting for a tipping point where most people are using SSDs as Time Machine storage before switching over. Hopefully that day is soon.)

  5. 16
    Raoul on March 30, 2020 at 12:04 am

    Given that Apple were going to adopt ZFS way back in 2007 with Mac OS X 10.5, it suggests that Apple are already keen on block-based backups, or to be more accurate, taking a snapshot of a dataset.
    When used with a Copy On Write (CoW) filesystem (which APFS is), there is virtually no cost to the system to take a snapshot of a dataset: no initial index being built, no data transfer.

    Of course, once the snapshot exists, it then needs to be transferred to other media, but given that this operation also occurs at the block level, the speed gains go through the roof!
    Make a change to a 1GB PSD file, and only the changed blocks need to be sent to the backup rather than the whole file again.

    CoW filesystems that are block-aware are an absolute treat to work with! Add checksumming of the blocks (which Apple have done for metadata only) to the mix and you’re in heaven.

    I expect TM to be a lovely tool in the not too distant future once Apple fully migrate over to APFS and add all the bells and whistles. In the meantime, I’ll stick with OpenZFS on macOS to protect my data. ;))

    • 17
      hoakley on March 30, 2020 at 6:29 am

      Thank you.
      If you read my many articles here about Time Machine since it started using APFS snapshots in macOS 10.13, and its future, you’ll see a lot of info about how they are currently used, and what you can do with them, and how to do it.
      However, there are some major gaps. One is being able to save snapshot deltas, and how to present backups consisting of snapshots and their deltas to the user so that they can restore individual files from them – which is another big gap in the current snapshot system.
      At present, snapshots are a great way to restore a whole volume, as used by ASR. But a backup system has to be able to do a lot more than volume restore. And that’s where the problems lie.
      Howard.

  6. 18
    Raoul on March 30, 2020 at 12:19 am

    And further to the post above, snapshots already exist on macOS.
    https://bombich.com/kb/ccc5/leveraging-snapshots-on-apfs-volumes
    and
    https://derflounder.wordpress.com/2019/05/08/creating-managing-and-using-apple-file-system-snapshots-for-startup-drive-backups/

    And a great read about how close Apple were to using ZFS.
    http://dtrace.org/blogs/ahl/2016/06/15/apple_and_zfs/

    • 19
      hoakley on March 30, 2020 at 6:31 am

      Thank you.
      I’m delighted that you’ve discovered snapshots, which have been used in APFS since macOS 10.13.
      In your reading, you might like to look at some of the many articles on this blog about snapshots. There are several dozen, which extend far beyond those which you cite.
      Howard.

  7. 20
    Michael Tsai - Blog - Xattrs Make Time Machine Backups Waste Space on March 30, 2020 at 9:05 pm

    […] Howard Oakley: […]

  8. 21
    Xattrs Make Time Machine Backups Waste Space | Business Marketing Journal on May 30, 2020 at 6:19 pm

    […] Howard Oakley: […]
