Copy, move and clone files in APFS, a primer

Copying and moving files within and between volumes are among the commonest actions in the Finder. This primer explains how they work on APFS volumes in Catalina, where they aren’t as simple as they were in HFS+.

Key concepts

In APFS, volumes behave differently from the way that they do in HFS+. Although some compare them with traditional folders, that’s misleading and will only confuse. Each volume has its own file system, complete with directories and other metadata, including extended attributes. Volumes within the same container share the same free space on disk, so each can grow and shrink flexibly. But they don’t share file systems: drag a file from one volume to another and a new copy of that file is made on the second volume. You can’t share data or inodes between APFS volumes, although inodes are managed differently across Catalina’s new Volume Groups.

iCloud storage is mapped to files on your startup (Data) volume, and effectively forms part of that volume even when all those files have been evicted to remote storage. When a file is evicted to iCloud, it’s still represented by a local stub file. The important consequence of this is that iCloud Drive behaves as if it’s an extension of your startup volume, not a different volume altogether.

APFS clone files (‘copy on write clones’) are in between normal files and hard links. When you make a copy or duplicate of a file to the same volume, APFS creates a new file which shares the same data as the original. As the copy diverges during editing, the changed data are saved separately, and the common data storage reduces until the files have no shared data at all. This all happens transparently, and there’s no easy way to tell whether two files share common data, or how much of it they share. Cloning only happens within the same file system and volume, but its effect is seen on free space, which affects all volumes within that container.

Apple explains this well in the man page for clonefile():
The cloned file dst shares its data blocks with the src file but has its own copy of attributes, extended attributes and ACLs which are identical to those of the named file src with the exceptions [of ownership information, setuid and setgid bits]. Subsequent writes to either the original or cloned file are private to the file being modified (copy-on-write).

Another approach is to contrast cloning with hard links:
With a hard link, two or more references in the file system are fixed to the same object in storage. Any changes made to the data in that object are therefore reflected identically no matter which reference you use to access the data. In a clone, the two references are actually to distinct objects in the file system. When you make changes to the data via one of those references, the data are split for the two references so that they will actually see quite different content.
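That contrast can be sketched in a toy model. This is only an illustration and makes no claim about how APFS stores anything internally: the ToyVolume class and its method names are invented here, each ‘block’ is just a short byte string, and every write is simply assumed to go to a fresh block.

```python
class ToyVolume:
    """Toy model of names, file objects and data blocks (not real APFS)."""

    def __init__(self):
        self.blocks = {}   # block id -> bytes held in storage
        self.files = {}    # file object id -> ordered list of block ids
        self.names = {}    # name in the file system -> file object id
        self._last_id = 0

    def _new_id(self):
        self._last_id += 1
        return self._last_id

    def create(self, name, chunks):
        file_id = self._new_id()
        self.files[file_id] = []
        for chunk in chunks:
            block_id = self._new_id()
            self.blocks[block_id] = chunk
            self.files[file_id].append(block_id)
        self.names[name] = file_id

    def hard_link(self, src, dst):
        # A second name fixed to the SAME file object.
        self.names[dst] = self.names[src]

    def clone(self, src, dst):
        # A NEW file object which starts out referencing the same blocks.
        file_id = self._new_id()
        self.files[file_id] = list(self.files[self.names[src]])
        self.names[dst] = file_id

    def write(self, name, index, chunk):
        # Copy on write: changed data go to a fresh block, so any other
        # file object still referencing the old block is unaffected.
        block_id = self._new_id()
        self.blocks[block_id] = chunk
        self.files[self.names[name]][index] = block_id

    def read(self, name):
        return b"".join(self.blocks[b] for b in self.files[self.names[name]])

    def blocks_in_use(self):
        return len({b for ids in self.files.values() for b in ids})


vol = ToyVolume()
vol.create("original", [b"A", b"B", b"C"])
vol.hard_link("original", "link")
vol.clone("original", "clone")
print(vol.blocks_in_use())    # 3: the clone costs no extra data blocks

vol.write("clone", 2, b"D")   # edit the clone
print(vol.read("original"), vol.read("link"), vol.read("clone"))
# b'ABC' b'ABC' b'ABD'
print(vol.blocks_in_use())    # 4: only the changed block is stored separately

vol.write("link", 0, b"X")    # edit via the hard link
print(vol.read("original"), vol.read("link"), vol.read("clone"))
# b'XBC' b'XBC' b'ABD'
```

The change made through the hard link shows up under both names, because they refer to one object; the change made to the clone stays private to it, and storage use only grows as the two diverge.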

Basic behaviours

Within the same volume:

  • dragging a file moves it between folders;
  • Option-dragging a file copies it between folders, leaving the original where it was;
  • Command-dragging a file moves it between folders, removing the original.

Between different volumes:

  • dragging a file copies it between volumes;
  • Option-dragging a file copies it between volumes, leaving the original where it was;
  • Command-dragging a file moves it between volumes, removing the original.

Because iCloud Drive folders act as if they’re part of your startup volume, what happens depends on the other volume involved: if it’s the startup volume, actions follow the within-volume rules; if it’s any other volume, they follow the between-volume rules.

If you want to play safe and know whether dragging a file will result in a copy or move, always use the modifier keys. This is probably the safest way to work with iCloud Drive, so you don’t have to consider whether the other volume is the startup volume or not.
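The rules above can be summarised in a few lines of code. This is a sketch rather than anything the Finder exposes: the function name drag_result and its modifier argument are made up for illustration, and it assumes that the device ID reported by stat (st_dev) is a fair test of whether two paths are on the same volume.

```python
import os


def drag_result(source_path, destination_folder, modifier=None):
    """Predict whether a Finder drag moves or copies, per the rules above.

    modifier is None for a plain drag, or "option" or "command".
    """
    if modifier == "option":
        return "copy"    # always leaves the original where it was
    if modifier == "command":
        return "move"    # always removes the original
    # Plain drag: the outcome depends on whether both are on one volume.
    # Assumption: equal st_dev values mean the same volume.
    same_volume = (os.stat(source_path).st_dev
                   == os.stat(destination_folder).st_dev)
    return "move" if same_volume else "copy"
```

Calling drag_result(path, folder) for a file and a folder on the same volume should give "move"; with a folder on an external disk it should give "copy".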

When to expect cloning to occur

Whenever you copy or duplicate a file in the Finder, macOS tries to do this by cloning it. That requires the file and its copy to be on the same APFS volume, but there doesn’t appear to be any maximum or minimum size requirement. Older versions of macOS (10.13 and 10.14) may not use cloning when copying or duplicating folders, but Catalina now clones the files inside the folder, at least when that folder is relatively shallow.

Determining whether a file has been truly copied or cloned is extremely difficult using standard tools. Both the Finder and Terminal’s ls command claim that cloned files occupy the same amount of storage space as true copies. Often the only way to tell is how long the copy/duplicate process takes, and that only works for very large files which would normally result in a progress bar due to the time they take.
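For reference, these are the per-file figures that standard tools have to work with. The helper name reported_sizes is invented for this sketch; the values come from an ordinary stat call, and as described above they shouldn’t be expected to reveal whether any of the storage is shared with a clone.

```python
import os


def reported_sizes(path):
    """Return (logical size, allocated size) in bytes, as stat reports them."""
    info = os.stat(path)
    # st_blocks is counted in 512-byte units on macOS and Linux.
    return info.st_size, info.st_blocks * 512
```

Comparing reported_sizes() for an original and its duplicate shows two files that each appear to occupy their full size.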

The only control the user has over whether a file being copied or duplicated is cloned is in the command-line tool cp, which has an option to force cloning to occur where possible. In
cp -c oldfilename newfilename
the -c option tells cp to clone the file rather than copy its data, where that’s possible.
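For those who prefer to script this, the same request can be made by calling clonefile() directly. The following is a minimal sketch, not a replacement for cp: the function name clone_or_copy is invented here, it reaches clonefile() through ctypes on macOS only, and it falls back to an ordinary copy whenever cloning fails or isn’t available.

```python
import ctypes
import os
import shutil
import sys


def clone_or_copy(src, dst):
    """Try to clone src to dst; otherwise make an ordinary copy.

    Returns "cloned" or "copied". Refuses to overwrite an existing dst.
    """
    if os.path.lexists(dst):
        raise FileExistsError(dst)
    if sys.platform == "darwin":
        # clonefile(const char *src, const char *dst, uint32_t flags)
        libc = ctypes.CDLL(None, use_errno=True)
        clonefile = libc.clonefile
        clonefile.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                              ctypes.c_uint32]
        clonefile.restype = ctypes.c_int
        if clonefile(os.fsencode(src), os.fsencode(dst), 0) == 0:
            return "cloned"
        # Cloning failed, for example because src and dst are on
        # different volumes or the file system isn't APFS.
    shutil.copy2(src, dst)
    return "copied"
```

The return value at least tells you which of the two happened, which the Finder doesn’t.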

If you want to ensure that two copies of the same file on the same volume aren’t clones, and use separate data storage, the only robust method is to copy the file to a different volume, change its name there (if necessary) and copy/move it back to the first volume.

Links and aliases

Links and aliases are also affected by moving and copying. The effect on symbolic links is variable, but in general they are at high risk of breaking, particularly when copied/moved to a different volume with different paths. Hard links are effectively the same as the original file, and when copied/moved to another volume the whole file will be copied/moved. Finder aliases and bookmarks are designed to be more robust than symbolic links, but in practice can readily break too, particularly when copied/moved to a different volume.
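One common way for a symbolic link to break is easy to reproduce. This sketch uses made-up file and folder names in a temporary directory, and covers only the case of a link with a relative target that is moved to another folder; links with absolute targets fail in other ways, such as when the volume name in the path changes.

```python
import os
import tempfile


def relative_link_after_move():
    """Move a relative symbolic link to another folder and test it.

    Returns (worked_before, still_a_link, works_after).
    """
    with tempfile.TemporaryDirectory() as root:
        docs = os.path.join(root, "docs")
        elsewhere = os.path.join(root, "elsewhere")
        os.mkdir(docs)
        os.mkdir(elsewhere)
        with open(os.path.join(docs, "report.txt"), "w") as f:
            f.write("data")

        link = os.path.join(docs, "report-link")
        os.symlink("report.txt", link)    # target is relative to docs
        worked_before = os.path.exists(link)

        moved = os.path.join(elsewhere, "report-link")
        os.rename(link, moved)            # moves the link, not its target
        still_a_link = os.path.islink(moved)
        works_after = os.path.exists(moved)   # follows the link
        return worked_before, still_a_link, works_after


print(relative_link_after_move())   # (True, True, False)
```

The link survives the move intact, but its target path is now resolved from the new folder, where there is no report.txt.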

Neither symbolic nor hard links are cloned, although in theory at least there’s no reason that Finder aliases and bookmarks shouldn’t be cloned. Determining whether they are seems even more difficult than it is for ordinary files.

Danger in clones

Cloning is an excellent way in the short term to minimise the storage space required by copies of large files. But, if either file is being actively developed, these benefits are strictly temporary, as the files diverge in content and progressively require more separate storage space. This gives rise to the strange effect that making small changes to a cloned file may increase its storage requirements disproportionately.

More sinister are its implications for data integrity. Take for example a 100 GB video file, which is copied twice so that different changes can be made to the end of each copy. When those edits are complete, the structure of the three files is:
File 1: ABC
File 2: ABD
File 3: ABE
with A and B as data common to all three, resulting from the original cloning. If the data in A or B are then corrupted, all three files become corrupt. The only exception is a change made through the file system, which triggers the use of separate storage for the file being modified. Thus, failure in a storage block or ‘bit rot’ damages all three files at once. Currently, there doesn’t appear to be any means of discovering which data are common to more than one file, which makes this difficult to detect or understand. All the user will see is that suddenly all three files are broken. Nor are there any tools capable of ‘de-cloning’ files.
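The three-file example can be played out in a few lines. This is again a toy, with invented contents standing in for the data in A to E: storage is a dictionary, and the damage is done directly to it, behind the back of the file system, so nothing is copied on write.

```python
# Storage as seen below the file system: one entry per piece of data.
storage = {"A": b"intro", "B": b"middle",
           "C": b"end-1", "D": b"end-2", "E": b"end-3"}

# Each file is an ordered list of references into that storage.
files = {"File 1": ["A", "B", "C"],
         "File 2": ["A", "B", "D"],
         "File 3": ["A", "B", "E"]}


def read(name):
    return b"".join(storage[ref] for ref in files[name])


before = {name: read(name) for name in files}

# 'Bit rot' in shared data B: this isn't a write through the file system.
storage["B"] = b"m!ddle"
damaged_by_b = [name for name in files if read(name) != before[name]]
print(damaged_by_b)   # ['File 1', 'File 2', 'File 3']

# Repair B, then damage C, which only File 1 references.
storage["B"] = b"middle"
storage["C"] = b"end-?"
damaged_by_c = [name for name in files if read(name) != before[name]]
print(damaged_by_c)   # ['File 1']
```

With three true copies, the first case would have damaged just one file.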

This emphasises the importance of sound workflows for files of importance, and of checking their integrity.
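One simple way to check integrity is to record a digest of each important file when it’s known to be good, and compare it later. This sketch uses SHA-256 from Python’s standard library; the function name sha256_of is arbitrary, and where and how you keep the recorded digests is left to your workflow.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 digest of a file as a hex string.

    The file is read in chunks so that very large files, such as
    video, don't have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest of a file that hasn’t been edited no longer matches the one recorded earlier, its data have changed in storage. A digest can only detect that damage; it can’t repair it, which is where backups come in.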