hoakley January 3, 2024 Macs, Technology

What should we know about APFS special files?

We may have been using APFS for nearly seven years, but some of its features remain thoroughly opaque. On Christmas Day, I posed the puzzle of 60 TB of snapshots being removed from a 2 TB disk. While we all accept that may be “technically correct”, for ordinary users it makes no sense. Suggestions that they should be “educated” miss the point that the Finder has to be accessible to all users, whether or not they have a degree in Computer Science. If my eleven year-old granddaughter can’t make sense of it, then the Finder is a failure.

Today I turn to another thorny issue raised by the ingenuity of APFS: the size of its special file types, sparse and ‘clone’ files. As usual, I start with a practical demonstration.

Demonstration

If you’re using macOS virtual machines (VMs) on an Apple silicon Mac, one of those VMs is an excellent subject for this. If you don’t have one, then you can create a read-write disk image (UDRW) using Disk Utility. Ensure that it’s in APFS format, and make it nice and large, say 25 GB. Once it has been created and mounted, unmount it, mount it again, then unmount it, to ensure that it’s now stored as a sparse file.

Select the VM or disk image, and use the Finder’s Get Info command to check its size.


In my case, I’ve used a 100 GB VM, whose size is given as 107 GB, although it only takes 18.47 GB on disk. Then, select the VM or disk image and press Command-D to duplicate it in the Finder. Select the duplicate, and Get Info on it.
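
The two figures in Get Info appear to correspond to a file's logical size and the storage actually allocated to it. As a minimal sketch, assuming Python 3 on macOS or another POSIX system, both can be read with `os.stat()`: `st_size` is the logical length, and `st_blocks` counts 512-byte units allocated, so holes in a sparse file aren't counted. The helper names are hypothetical, and that the Finder derives its ‘on disk’ figure exactly this way is an assumption. The arithmetic also fits the 107 GB shown for a ‘100 GB’ VM: the Finder reports decimal gigabytes, and 100 GiB is 107,374,182,400 bytes, assuming the VM's disk was sized in binary units.

```python
import os
import sys

def logical_and_allocated(path):
    """Return (logical size, allocated bytes) for path."""
    st = os.stat(path)
    # st_size is the file's logical length; st_blocks counts the 512-byte
    # units actually allocated on disk, so unallocated holes aren't included.
    return st.st_size, st.st_blocks * 512

def finder_gb(n_bytes):
    """Convert bytes to the decimal gigabytes the Finder displays."""
    return n_bytes / 1_000_000_000

if __name__ == "__main__":
    for path in sys.argv[1:]:
        size, on_disk = logical_and_allocated(path)
        print(f"{path}: {finder_gb(size):.2f} GB, {finder_gb(on_disk):.2f} GB on disk")
```

Pointed at the disk image inside a VM bundle, this should report a logical size far larger than its allocated size, in the same way as Get Info.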


That copy has the same size, and takes the same smaller amount of space on disk, although the Finder duplicated it in the twinkling of an eye, which would only be possible if it had been ‘cloned’ rather than copied.

Sparse files

APFS is one of many file systems that can reduce the space taken to store some large files not by compression, but by storing only the data they need, as seen in this demonstration. Disk images, whether forming the greatest part of a VM, or as a separate file, start off almost entirely empty, and only grow as contents are added to them.

When macOS mounts a disk image, APFS performs a Trim on it, to gather all its free space together. When that image is saved, that free space isn’t written to the file, as that would just waste space. Because the disk image is written as a sparse file, the disk space it requires is reduced from slightly more than 100 GB to around 18 GB.
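
Any program can make a sparse file in a similar way, by extending a file without writing to the extended region, so that region is never allocated. This is a minimal sketch in Python, assuming a file system that supports sparse files (APFS does, HFS+ doesn't); `make_sparse` is a hypothetical helper, and the exact allocation reported will depend on the file system's block size.

```python
import os
import tempfile

def make_sparse(path, logical_size, payload=b"data"):
    """Write payload at offset 0, then extend the file to logical_size
    without writing anything else, leaving the remainder as a hole."""
    with open(path, "wb") as f:
        f.write(payload)
        # truncate() to a larger size extends the file; on file systems that
        # support sparse files, the extended region isn't allocated on disk.
        f.truncate(logical_size)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse.img")
        make_sparse(p, 1 << 30)  # 1 GiB logical size
        st = os.stat(p)
        print("logical:", st.st_size, "allocated:", st.st_blocks * 512)
```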

Clone files

Although commonly known as ‘clones’, these aren’t exact copies at all, but two separate files that, initially at least, share the same data on disk. When the Finder duplicates a file for you, APFS creates the file system metadata for that new file, giving it a new inode number, but the file’s data are initially stored in the same extents as the original’s. As those two files change, the data unique to each are written to new extents on disk, and they steadily drift apart until they become completely independent.
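
The sharing described above is a form of copy-on-write. The following is not how APFS stores anything internally; it's a toy model in Python, with invented class and method names, showing the principle: a clone gets new metadata and a new inode number pointing at the same extents, and each later write allocates a fresh extent used only by the file being changed. Note that nothing in the model records which file was the original.

```python
class ToyVolume:
    """Toy copy-on-write store: each extent is written once and may be
    referenced by several files. Not APFS's real on-disk structures."""

    def __init__(self):
        self.extents = {}      # extent id -> bytes held on "disk"
        self.files = {}        # inode number -> list of extent ids
        self._next_extent = 0
        self._next_inode = 1

    def _alloc(self, data):
        eid = self._next_extent
        self._next_extent += 1
        self.extents[eid] = data
        return eid

    def _new_inode(self):
        ino = self._next_inode
        self._next_inode += 1
        return ino

    def create(self, blocks):
        ino = self._new_inode()
        self.files[ino] = [self._alloc(b) for b in blocks]
        return ino

    def clone(self, ino):
        # New metadata and inode number, but the same extents: no data copied.
        new = self._new_inode()
        self.files[new] = list(self.files[ino])
        return new

    def write_block(self, ino, index, data):
        # Copy-on-write: the change goes to a fresh extent used only by this
        # file, so a clone still referencing the old extent is unaffected.
        self.files[ino][index] = self._alloc(data)

    def read(self, ino):
        return b"".join(self.extents[e] for e in self.files[ino])

    def shared_extents(self, a, b):
        return len(set(self.files[a]) & set(self.files[b]))

    def bytes_used(self):
        # Count each extent still referenced by any file exactly once.
        live = set().union(*self.files.values())
        return sum(len(self.extents[e]) for e in live)

if __name__ == "__main__":
    vol = ToyVolume()
    a = vol.create([b"aaaa", b"bbbb"])
    b = vol.clone(a)
    print(vol.shared_extents(a, b), vol.bytes_used())  # 2 shared, 8 bytes
    vol.write_block(b, 0, b"zzzz")
    print(vol.shared_extents(a, b), vol.bytes_used())  # 1 shared, 12 bytes
```

Each write to either file reduces sharing by one extent, which is the gradual drift towards independence.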

The only clue the Finder gives that two VMs or disk images are clones sharing data in this way is their names. Change the name of the copy and move it elsewhere in the same volume, and you’d never know that its data were being shared with another file, nor the identity of the original.

Recognising sparse and clone files

Aside from the intentional discrepancy reported in Get Info for sparse files, telling which are sparse and which are clones isn’t possible in the macOS GUI. To understand more, I’ll use my free utility Precize, which reports more information culled from corners of the file system.


The original disk image inside the VM has an inode number of 22513585, given in its volfs and FileRefURL paths at the top, a Disk size considerably smaller than its total file size, and ticks both the Sparse and Clone checkboxes at the foot.


The duplicated disk image has a different inode number of 24847441, identical sizes, and the same two checkboxes ticked. To the left of those checkboxes, the Ref count on each copy is 1, confirming that neither is hard-linked. Even here, using as much information as I can glean from APFS, there’s no way to tell which file has been cloned from which.
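
The distinction that Precize's Ref count reveals can be reproduced with `os.stat()`. In this minimal sketch, assuming Python 3 on a POSIX file system, a copy (cloned or not) is a separate file with its own inode number and a link count of 1, while a hard link shares the original's inode and raises its link count to 2. Whether `shutil.copyfile()` produces a clone isn't assumed here either way, and `inode_and_links` is a hypothetical helper. As with Precize, nothing in these fields reveals which file was copied from which.

```python
import os
import shutil
import tempfile

def inode_and_links(path):
    """Return (inode number, hard link count) for path."""
    st = os.stat(path)
    return st.st_ino, st.st_nlink

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        orig = os.path.join(d, "original")
        with open(orig, "wb") as f:
            f.write(b"x" * 4096)
        copy = os.path.join(d, "copy")
        shutil.copyfile(orig, copy)   # separate file: new inode, count 1
        print("copy:", inode_and_links(orig), inode_and_links(copy))
        link = os.path.join(d, "link")
        os.link(orig, link)           # hard link: same inode, count 2
        print("hard link:", inode_and_links(orig), inode_and_links(link))
```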

Effect on disk space

Although the only mention of sparse files in the macOS GUI is the space they take on disk, that could mislead the user into thinking that a VM or disk image that takes only 18.47 GB on disk can be copied to a disk with a capacity of 25 GB, for instance. This is easy to test using another disk image: create another read-write disk image with APFS as its file system, of a size sufficient to accommodate the size given ‘on disk’ but too small for its full size. Try copying the original VM or disk image to it, and the Finder will refuse on the grounds that it’s too large for that disk.
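
The Finder's check isn't documented, but its behaviour in this test is consistent with comparing the file's full logical size against the destination's free space, ignoring the smaller on-disk figure. This sketch encodes that assumed rule with hypothetical function names. With the figures above, a 107 GB full size, 18.47 GB on disk and roughly 25 GB free, the copy is refused even though the on-disk size would fit.

```python
import os
import shutil

def preflight(logical, allocated, free):
    """Classify a proposed copy, assuming the check is made against the
    file's full logical size, as the Finder's refusal suggests."""
    if logical <= free:
        return "fits"
    if allocated <= free:
        return "refused, although its on-disk size would fit"
    return "refused"

def preflight_path(src, dest_dir):
    """Apply preflight() to a real file and destination volume."""
    st = os.stat(src)
    free = shutil.disk_usage(dest_dir).free
    return preflight(st.st_size, st.st_blocks * 512, free)
```

For example, `preflight(107_000_000_000, 18_470_000_000, 25_000_000_000)` returns the second verdict, which is exactly the situation a user relying on Get Info's ‘on disk’ figure wouldn't expect.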

However, if you copy the VM or disk image to an APFS volume that does have sufficient free space to accommodate its full size, the space used according to Disk Utility and the Finder is considerably less than that size, although significantly larger than it takes on its original volume. In my case, a VM taking 18 GB on disk used 25 GB when copied to another APFS volume.

If you try that out, watch the progress dialog carefully during copying. It starts by claiming that it has the full size (100+ GB) to copy, and proceeds as if that were the case. Then, as soon as the progress bar reaches the size actually taken on disk, in this case only a quarter of the way through, copying completes almost instantly. Maybe the Finder was more surprised at that than the user.

While APFS preserves sparse files when copying them to another APFS volume, that doesn’t work for other file systems such as HFS+, where the source file has to be fully expanded as it’s being copied, requiring additional time as well as the full disk space. None of this works for clone files, which can only remain cloned within the same APFS volume, of course.
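
One way a copier can preserve sparseness is to visit only the regions of the source that hold data, using `lseek()` with `SEEK_DATA` and `SEEK_HOLE` (supported by macOS and Linux), and to skip over the holes in the destination. This is a sketch of that technique, not a description of how the Finder copies; `sparse_copy` is a hypothetical name. It returns the number of bytes actually copied, which can be far less than the logical size, and that may help explain a progress dialog that suddenly completes. On a source file system without sparse support, the whole file is reported as data; on a destination such as HFS+, skipped regions are allocated and zero-filled, matching the expansion described above.

```python
import errno
import os
import tempfile

def _write_all(fd, buf):
    # os.write() may write fewer bytes than requested; loop until done.
    view = memoryview(buf)
    while view:
        n = os.write(fd, view)
        view = view[n:]

def sparse_copy(src, dst, chunk=1 << 20):
    """Copy src to dst, reading only its data regions and leaving holes
    unwritten. Returns the number of bytes actually copied."""
    copied = 0
    fin = os.open(src, os.O_RDONLY)
    try:
        size = os.fstat(fin).st_size
        fout = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            offset = 0
            while offset < size:
                try:
                    start = os.lseek(fin, offset, os.SEEK_DATA)
                except OSError as e:
                    if e.errno == errno.ENXIO:
                        break  # no data beyond offset: the rest is a hole
                    raise
                end = os.lseek(fin, start, os.SEEK_HOLE)
                os.lseek(fin, start, os.SEEK_SET)
                os.lseek(fout, start, os.SEEK_SET)  # skip over the hole
                while start < end:
                    buf = os.read(fin, min(chunk, end - start))
                    if not buf:
                        break
                    _write_all(fout, buf)
                    start += len(buf)
                    copied += len(buf)
                offset = end
            # Set the full logical length; a trailing hole stays unallocated.
            os.ftruncate(fout, size)
        finally:
            os.close(fout)
    finally:
        os.close(fin)
    return copied

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src.img"), os.path.join(d, "dst.img")
        with open(src, "wb") as f:
            f.write(b"A" * 4096)
            f.seek(64 * 1024 * 1024)  # leave a 64 MiB hole
            f.write(b"B" * 4096)
        n = sparse_copy(src, dst)
        print(f"logical {os.path.getsize(dst)} bytes, copied {n} bytes")
```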

The benefits of sparse and clone files

In terms of disk space used, the benefits of sparse and clone files aren’t as obvious as you might like. Because of their potential to swell to full size, sparse files can’t be copied to a volume that isn’t large enough to cope with that, but once they have been copied they only require their current size on disk. In that sense, telling the user in the Get Info dialog that a sparse file only occupies a small amount of disk space can build unrealistic expectations, although currently it’s the only means in macOS for the user to discover that a file is stored in sparse format.

As far as the user is concerned, the greatest benefits come in speed of handling, and effects on SSD ‘wear’. Creating clone files is almost instant, even if they’re huge, and because of their efficient use of storage extents they minimise erase-write cycles on SSD storage. Not informing the user that two files are clones of one another also avoids the confusion that could arise if they thought that clones behave like hard-linked files: unlike a hard link, changing one of a pair of clones doesn’t change the content of the other.
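
Creating a clone is fast because no file data are copied, only new metadata. Besides the Finder's Duplicate command, programs can request a clone with the macOS `clonefile()` system call (macOS 10.12 and later), which `cp -c` also uses. This sketch calls it through `ctypes`, on the assumption that the symbol is reachable via `ctypes.CDLL(None)`, and falls back to an ordinary copy on other platforms or when cloning isn't possible; `clone_or_copy` is a hypothetical helper.

```python
import ctypes
import errno
import os
import shutil
import sys
import tempfile

def clone_or_copy(src, dst):
    """Create dst as a clone of src using macOS clonefile() where possible,
    falling back to an ordinary copy. Returns True if a clone was made."""
    if sys.platform == "darwin":
        libc = ctypes.CDLL(None, use_errno=True)
        clonefile = libc.clonefile
        clonefile.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_uint32]
        clonefile.restype = ctypes.c_int
        if clonefile(os.fsencode(src), os.fsencode(dst), 0) == 0:
            return True
        err = ctypes.get_errno()
        # EXDEV: src and dst are on different volumes; ENOTSUP: the file
        # system can't clone (not APFS). Anything else (e.g. EEXIST) is an error.
        if err not in (errno.EXDEV, errno.ENOTSUP):
            raise OSError(err, os.strerror(err), dst)
    shutil.copyfile(src, dst)
    return False

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "original.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(1 << 20))
        dst = os.path.join(d, "original copy.bin")
        print("cloned" if clone_or_copy(src, dst) else "copied")
```

The fallback for `EXDEV` reflects the point made above: clones can only exist within a single APFS volume.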

User information

Sparse and clone files are essentially omitted from macOS user documentation. One place I had expected Apple to provide information about the storage of disk images in sparse file format was its explanation of the different types of disk image and their creation. Although sparse bundles and sparse disk images are described as being “an expandable file that shrinks and grows as needed”, there’s no mention of flexibility of size for read/write disk images now that they’re stored as sparse files. The hdiutil man page seems similarly unaware of this change, which dates back to Monterey.

A little knowledge

The problem for users with sparse and clone files, like so many of the advanced features of APFS, is that knowing just a little is dangerous. An obvious example is giving figures for space taken on disk in the Get Info dialog. Armed with that information, but without deeper understanding, a user might expect to be able to copy a sparse file that takes 18 GB on disk, with a full size of 100 GB, to a volume that has only 20 GB available. Equally, they’d be surprised when that same sparse file was copied to an HFS+ volume and exploded to its full size, or when it was copied over a network and took forever to transfer the full 100 GB.

These difficulties apply equally to the Finder, as illustrated by the behaviour of its progress dialog when copying a sparse file to another volume. For plain files, the amount of data to be transferred is simply the file size. For a sparse file, it depends on whether the transfer mode and destination support its sparse format. Even then, the copied file may not take the same space on disk as the source, as demonstrated above.

Magic works best when the spectator either knows nothing about the sleight of hand involved, or is another skilled magician.

18 Comments


1

Enzo Vincenzo on January 3, 2024 at 9:29 am

By ignoring the problems, not acting immediately and even more so by not providing explanations, it is as if Apple is saying: “macOS is mine and I’m in charge. Let everyone save themselves in their own way.”

Maybe, instead of Finder, I would write…:
“The Mac has to be accessible to all users, whether or not they have a degree in Computer Science. If my eleven year-old granddaughter can’t make sense of it, then the Mac is a failure.” and not just the Finder. 😉

O.T.
On this your very clear and incisive sentence, I reiterate my alarm at the failure to respect the Privacy of the macOS lockscreen, hoping that sooner or later someone at Apple will understand and take action.
Everyone should be protected, from 11 year old children who do not perceive the risks, up to the elderly with little familiarity with technology…
The fact is that Windows creates lockscreens by default with different backgrounds from the Home screen and also Linux, iPhone, iPad, Androids and all Macs from their creation up to High Sierra. At least Apple Engineers gave us an option instead of nothing…

Thank you, Howard, for your commitment to not overlooking every aspect that catches your attention! And if as an IT expert and developer you understand the technical aspects and it is important that you talk about them, as an elderly person and as a sensitive Doctor I understand the negative aspects of the new macOS which can cause irreversible damage to inexperienced people, from 10 to 90 years old, even graduates. In the same way that you discover and list technical problems, for my part I have witnessed serious damage of various kinds caused by the fact that macOS leaves the personal image chosen for the Desktop visible on the lockscreen.

I will be the only one to tell the world, but I feel this as a constraint and arrogance. And it is more or less from the time associated with this arrogant attitude that Apple’s lack of information regarding many new aspects of macOS also begins, such as the one you talk about in this post and others…

- 2
  
  Enzo Vincenzo on January 3, 2024 at 9:35 am
  
  Kindly, could you correct my post and write Finder and Mac where appears due to Blog error? Thank you. I wanted to highlight the name with <> and the Blog cut it off
  Correction.
  Maybe, instead of Finder I would write…:
  “The Mac has to be accessible to all users, whether or not they have a degree in Computer Science. If my eleven year-old granddaughter can’t make sense of it, then the Mac is a failure.” and not just the Finder.
  
  - 3
    
    hoakley on January 3, 2024 at 9:43 am
    
    Thank you.
    I have made that correction: I’m afraid that angle brackets cause problems in WordPress comments, and are worth avoiding as they can cause unexpected results!
    Howard.
    
    - 4
      
      Enzo Vincenzo on January 3, 2024 at 9:54 am
      
      Thank you! To avoid making the post heavier, perhaps now you could also delete my message in which I suggest the correction.
      I also noticed that in my post I wrote I.T. instead of O.T. and if it doesn’t bother you… you can correct for elegance 🙂
      Finally you can also delete these request messages and leave only my first post.
      
    - 5
      
      hoakley on January 3, 2024 at 10:00 am
      
      Thank you, but I don’t delete comments where they form part of the record, only when they’re duplicates, or inappropriate.
      I’m not sure what you mean by an OT, though? IT seemed to make more sense to me!
      Howard.
      
    - 6
      
      hoakley on January 3, 2024 at 10:02 am
      
      Ah – I have seen the O.T. now and changed it as requested.
      Howard.
      
- 7
  
  hoakley on January 3, 2024 at 10:11 am
  
  Thank you. I’m not sure that you’re understanding my point about these special file types.
  I don’t think that Apple is being arrogant here (and I’d rather avoid getting into discussions about lock screens here, please). There are lots of things on Macs that ordinary users see simply as magic, and I think sparse and clone files are a good example. In fact, apart from that one hint in Get Info, ordinary users could happily and successfully use their Macs without knowing about them at all: they just work, although there are some small cracks that suggest something is going on, like the odd behaviour of the progress dialog when copying a sparse file.
  Problems come when you know a little about them, but don’t have a fuller understanding, which is easy given Apple’s lack of documentation, even for developers. Then you can make incorrect assumptions, and see the discrepancies and contradictions that lie just below the surface.
  Perhaps the challenge here is providing advanced users with more complete information, something that Apple so seldom does these days. We’re just left to get on with it, even when we misunderstand or make errors, and I’m sure that some will accuse me of doing the same here.
  Howard.
  
  - 8
    
    Tristan Hubsch on January 3, 2024 at 5:40 pm
    
    “A little learning is a dang’rous thing.” (Or, “Better be ignorant of a matter than half know it,” 16 centuries earlier.) You are right: “Mac = ‘magic that just works’ ” is how “the rest of us” think, and at Apple’s continuing and quite explicitly advertising behest. (With such a mindset, what’s a few terabytes between magicians?) For my own sins and for the most part, I too am one of “the rest of us.” …except when the gremlins and other demimagical inhabitants in my workhorse gum up a work sequence that worked just fine for years and until a few weeks earlier, when I’m forced to avail myself of “a little learning,” mostly “catch-as-catch-can” until some syzygy of incantations “does the trick,” and I’m left relieved that the work project is saved, but deeply unhappy with creeping but inevitable loss of a profound understanding.
    
    In the case at hand, I indeed see Apple at quite inexcusable a fault for “mixing apples and oranges,” and reporting the deletion of 60 TB of storage space it coulda/woulda/shoulda needed (under whatever imaginable circumstances) for the content that is being erased, but which is evidently taking up much less space, within your 2TB storage volume. Yes, we (should) have all learned by now that the size of a file is a rather fluid notion, given various methods of compression… However, data that the target audience (→ “the rest of us”) is likely to compare should always be reported in simply comparable units. Mixing apples and oranges is never “technically correct” — unless the quotes are meant sarcastically.
    
    - 9
      
      Tristan Hubsch on January 3, 2024 at 5:50 pm
      
      PS. “compression” above stands for any and all “compression, cloning, overlap, …” storage-saving tricks.
      
    - 10
      
      hoakley on January 3, 2024 at 7:49 pm
      
      Thank you.
      Howard.
      
11

Tristan Hubsch on January 3, 2024 at 7:53 pm

Also, the “progress bar” pretense (esp. the iterative kind, where one begets another begets a third…) have reached a stage where they convey merely that “something is happening.” Even the “xx minutes left” caption that sometimes accompanies them is most often unhinged from all customary ways of measuring time. (Your passage on the Finder being more surprised than the user might in fact have a germ of an explanation, i.e., the updater app is probably as surprised at the herky-jerky prog…og…og…gress.) Might as well put up a rectangle of the Game of Life, just to reassure the user that the Mac hasn’t croaked.

- 12
  
  hoakley on January 3, 2024 at 10:42 pm
  
  Thank you. I feel an article coming on for Saturday.
  Howard.
  
13

Tristan Hubsch on January 3, 2024 at 7:56 pm

Ooops (sorry): ’twas supposed to be: “the user might” and “at the herky-jerky prog…”

- 14
  
  hoakley on January 3, 2024 at 10:43 pm
  
  I have made those corrections, thank you.
  Howard.
  
15

Mario Wolczko on January 4, 2024 at 4:11 am

This may be of interest: https://github.com/mwolczko/extents

- 16
  
  hoakley on January 4, 2024 at 7:09 am
  
  Thank you so much – impressive (and painstaking) work. Congratulations!
  I think you have demonstrated well the difficulties involved.
  Howard.
  
17

pantulis on February 17, 2024 at 11:30 pm

Based on work by Dyorgio Nascimento, I wrote this small tool to help identify cloned files in an APFS filesystem.

https://github.com/pantulis/apfs-check-clones

This is the first C code I’ve written in 20 years!

- 18
  
  hoakley on February 18, 2024 at 8:07 am
  
  Well done – and thank you for the link.
  I’m sure that will come in handy for someone, although of course it doesn’t solve the more general problem in deduplication.
  Howard.
  
  LikeLike