COW and clones: how they save space and SSD wear

APFS was developed primarily for solid-state storage in Macs and Apple’s devices, down to the 8 GB in the earliest series of Apple Watch. As a modern file system it has features that cater for SSDs rather than hard disks, among them copy-on-write and clone files. This article explains how they work, and their impact on different storage media and on the user.

Copy on Write

One of the big enemies of hard disk storage is fragmentation. When the storage blocks containing a file’s data get split up and occupy different locations on the platter, reading and writing that file takes longer as the heads have to seek to each of those locations. To minimise fragmentation and time spent seeking, file systems designed for hard disks try to store each file in contiguous storage, and where possible to overwrite existing blocks.

SSDs are in effect the exact opposite: there’s no seeking involved, but the erase-write cycle takes the most time, and each erase eats into the expected life of the SSD. Overwriting existing storage would be painfully slow, though, as it would have to be erased first. Instead of trying to write in place, APFS therefore tries to copy on write (COW).

Take an example of a file initially occupying four contiguous blocks of storage FA-FD.

cowclones1

When the content of block FC is changed in an edit, that block is written out to a new location, AB, rather than being overwritten in place: it’s copied on write. On a hard disk, that would be a bad move, as the file has become fragmented; when reading it from disk, the heads would have to seek from FB to AB and back to FD. On an SSD, AB will already have been erased during housekeeping, ready to be written to, so COW is quickest. It’s also a lot safer, as the old block FC won’t be erased immediately. If anything goes wrong during the write, the file system can easily fall back to its previous state. If snapshots are being made, then the data in FC won’t be erased until it’s no longer required by a snapshot. COW thus has multiple roles in APFS.
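That sequence can be sketched as a toy model in Python. The Volume class, its methods and the list of free blocks are all invented for illustration and don’t correspond to real APFS structures; only the block names come from the example above.

```python
# Toy model of copy-on-write. Everything here is hypothetical:
# it mimics the article's example, not APFS's on-disk format.

class Volume:
    def __init__(self, free_blocks):
        self.free = list(free_blocks)   # already-erased blocks, ready to write
        self.data = {}                  # block name -> contents
        self.pending_release = []       # old blocks that haven't been erased yet

    def write_new(self, contents):
        """Write contents to the next free block and return its name."""
        block = self.free.pop(0)
        self.data[block] = contents
        return block

    def cow_write(self, file_blocks, index, contents):
        """Change one block of a file by writing to a fresh block."""
        old = file_blocks[index]
        new = self.write_new(contents)
        file_blocks[index] = new            # the file now points at the new block
        self.pending_release.append(old)    # the old data survives for now
        return old, new


vol = Volume(free_blocks=["AB", "AC", "AD", "AE"])
vol.data.update({"FA": "a", "FB": "b", "FC": "c", "FD": "d"})
file_blocks = ["FA", "FB", "FC", "FD"]

old, new = vol.cow_write(file_blocks, 2, "c-edited")
print(file_blocks)      # ['FA', 'FB', 'AB', 'FD']
print(vol.data[old])    # c
```

Because the old contents of FC are still present after the write, this model could fall back to the previous state simply by pointing the file at FC again, which is the safety property described above.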

Clones

You’ll recall from my previous article that an APFS clone pair consists of two File System Objects that share the same data to begin with. For this example, I’ll colour their shared storage blocks in purple, and their unique blocks in pink or blue depending on which file they belong to.

cowclones2

When the clone pair is first made, it’s very efficient, as the pair only takes up the space of a single file, apart from the second File System Object, which is relatively small.

If you then open the blue clone and make changes to what has been stored in block FC, under COW those will be written out to a new block at AB.

cowclones3

After this first edit, the pink file’s data is stored in blocks FA, FB, FC, and FD as it was originally; the blue file’s data is now stored in FA, FB, AB (changed data) and FD. As this follows COW, it minimises the amount of storage required, now a total of just 5 blocks, minimises erase cycles on the SSD, and is safest in the event of anything going wrong. It also doesn’t require any additional space to be kept in a snapshot.
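The block counts can be checked with a minimal sketch, assuming each file is represented simply as a list of the block names it references. The helper functions are hypothetical, and the small second File System Object is ignored.

```python
# Hypothetical model of a clone pair; block names follow the example above.
pink = ["FA", "FB", "FC", "FD"]
blue = list(pink)   # cloning copies the references to blocks, not their data


def blocks_used(*files):
    """Count the distinct storage blocks referenced by any of the files."""
    return len(set().union(*files))


def shared_blocks(a, b):
    """List the blocks that both files still reference."""
    return sorted(set(a) & set(b))


before = blocks_used(pink, blue)   # the pair occupies one file's worth of blocks
blue[2] = "AB"                     # COW: the edited FC is written out to AB
after = blocks_used(pink, blue)

print(before, after)               # 4 5
print(shared_blocks(pink, blue))   # ['FA', 'FB', 'FD']
```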

Eventually, changes made to the cloned pair of files reach the point where they differ in each of their storage blocks, and are effectively completely separate files.

cowclones4

Now the pink file’s data remains stored in FA, FB, FC and FD, but the blue file’s data is in AC, AD, AB and AE. The two files now take twice the storage space that they did when they were first cloned, and there are four extra blocks that need to be retained in any snapshot.

This explains how the space occupied by clone files usually increases over time and editing, and how the disk space required by a snapshot also grows over time. Normally, file systems like APFS report the space actually used at that moment in time, and can’t predict its growth potential in the future. So when you are editing a clone file, even though the size of the file you’re editing may not change, the space it requires in storage is likely to increase.
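To make that growth concrete, here’s a short sketch that replays the example one edit at a time. The order of the edits is an assumption, as the article only gives the first edit and the end state, and the list-of-blocks representation is again hypothetical.

```python
# Replay the example: the blue clone is edited one block at a time.
pink = ["FA", "FB", "FC", "FD"]
blue = list(pink)

# (index of the block edited, new location); the order is assumed
edits = [(2, "AB"), (0, "AC"), (1, "AD"), (3, "AE")]

history = []
for index, new_block in edits:
    blue[index] = new_block
    logical_size = len(blue)                # blocks in the file: never changes
    physical = len(set(pink) | set(blue))   # blocks the pair really occupies
    history.append((logical_size, physical))

print(blue)      # ['AC', 'AD', 'AB', 'AE']
print(history)   # [(4, 5), (4, 6), (4, 7), (4, 8)]
```

The size of the file being edited stays at four blocks throughout, while the storage used by the pair climbs from five blocks to eight.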

In this simplified example, the end result wouldn’t be particularly bad on a hard disk. In reality, file changes are much messier and fragmentation is more severe. But with APFS on a hard disk, the sting in the tail that causes most performance problems isn’t so much in the file data as in the file system itself. As the file system changes it becomes fragmented, forcing many more seeks to access objects and their structures, until it all comes to a grinding halt, with no easy solution. No file system can be ideal for all storage media.

Summary

cowclones0

9 Comments

1

kapitainsky on May 4, 2023 at 8:03 am

“Overwriting existing storage would be painfully slow, though, as it would have to be erased first. Instead of trying to write in place, APFS therefore tries to copy on write (COW).”

It is not entirely correct. Overwriting the same location (from OS perspective – the same LBA address) on SSD does not overwrite the same NAND cells and does not bring any performance penalty. SSD drive firmware maintains its own mapping table between LBA addresses and cells locations. Let’s say LBA 100 is stored in page 400. What happens when OS overwrites this address is that page 400 will be marked as stale and new LBA 100 content stored in another page.

- 2
  
  hoakley on May 4, 2023 at 11:37 am
  
  Thank you.
  “It is not entirely correct” is incorrect. “Overwriting the same location (from OS perspective – the same LBA address)” I didn’t write that, you did. Yes, of course there’s wear-levelling and other processes going on in the SSD which also prevent the same memory being used. That’s not what I’m writing about: I’m referring here to what has been done on hard disks, which is to write in place, i.e. in exactly the same storage location, not “from the OS perspective”, as that isn’t the same place, is it?
  Howard.
  
  - 3
    
    kapitainsky on May 4, 2023 at 12:13 pm
    
    Yes and this is why COW writing approach does not make any difference for SSD wear. Sector is written – SSD firmware decides where it goes physically and if to erase anything. In your example writing new AB block or overwriting FC block are for SSD internals the same operation.
    
    Actually COW without TRIM is worse for SSD. When you overwrite block FC SSD can mark physical block as “dirty” and erase it later. When you instead write block AB from disk perspective there are two blocks with data. But for this we have TRIM.
    
    - 4
      
      hoakley on May 4, 2023 at 12:24 pm
      
      Thank you.
      “this is why COW writing approach does not make any difference for SSD wear”
      Whoever said it does? If you’d care to re-read the article, you’ll see that I make no such claim, as the number of erase-write cycles is the same whether you write in place or copy on write.
      What makes the difference is cloning (you know, what the rest of the article is about), where COW is part of the process that minimises the number of writes performed.
      HFS+ doesn’t clone, as it makes no sense on a hard disk: duplicate a file and you make a full copy, so that as that copy changes, its data can be written in place.
      APFS does clone, which brings economy not only in the use of space, but in the number of erase-write cycles, which is important for SSDs.
      Sorry, perhaps I simply didn’t explain this clearly enough, but my wife understood it, and she’s my editor-in-chief!
      Howard.
      
    - 5
      
      kapitainsky on May 4, 2023 at 12:28 pm
      
      Indeed if editor-in-chief understands it differently there is nothing more I can add:) You are right about clones. Maybe I misread first part of your article.
      
6

Jake Richards on May 4, 2023 at 2:22 pm

Brilliantly elegant explanation, as usual, Howard. Thanks for making simple that which is sometimes obtuse!

- 7
  
  hoakley on May 4, 2023 at 9:32 pm
  
  Thank you.
  Howard.
  
8

Foo on May 13, 2023 at 8:23 pm

The article is a bit misleading. Every write to a file does not cause COW behavior. COW will happen in the case of an extent having a reference count greater than 1. That only happens if the file is cloned or a snapshot has been taken and that extent is part of the snapshot.

If there are no snapshots on the volume and the file has not been cloned, writes are just as they are on HFS+ and overwrite existing blocks.

- 9
  
  hoakley on May 13, 2023 at 8:49 pm
  
  Thank you.
  But what’s written to the SSD doesn’t actually overwrite the existing memory, because of wear-levelling etc. does it, whereas hard disks do try to overwrite the exact physical locations on the platter, to avoid fragmentation? I also think that COW always occurs for changes saved in the file system metadata, doesn’t it?
  The problem here is that if all these processes are spelled out in full detail, most users give up trying to understand them after about ten seconds, and don’t even get as far as clones. As I think I said above, this is primarily for pedagogic rather than pedantic purposes.
  Howard.
  
