You generally expect a volume to be mounted using its name: if it’s named MyDisk, once mounted it should appear on your Mac as /Volumes/MyDisk. There’s one important exception to this: the current Data volume, by default named either Macintosh HD - Data (Intel) or simply Data (Apple silicon), isn’t mounted in /Volumes at all, but at /System/Volumes/Data. This article looks at another situation where APFS volumes appear at mount points that differ from their volume names: when there’s the potential for a name collision, such as one resulting from Unicode normalisation.
The problem
Some characters in Unicode can be represented in more than one way. Although this is usually explained using accented Roman characters like e-acute é, it’s worst in Korean, where much of the character set seems to come in more than one form. In the simple case of é, there are two ways of encoding it: either decomposed into the letter e followed by a combining acute accent, giving the UTF-8 bytes 65 cc 81, or as the single composed character é, with the UTF-8 bytes c3 a9. Although the two look identical to us, they’re completely different when used in a file name.
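To see that difference at the byte level, here’s a short sketch using Python’s standard unicodedata module (bytes.hex() with a separator needs Python 3.8 or later). The Hangul syllable 한 is my own choice of Korean example, not one taken from the text above.

```python
import unicodedata

composed = "\u00e9"        # é as one code point, U+00E9 (Form C)
decomposed = "e\u0301"     # e followed by U+0301 COMBINING ACUTE ACCENT (Form D)


def utf8_hex(s: str) -> str:
    """Return the UTF-8 encoding of s as space-separated hex bytes."""
    return s.encode("utf-8").hex(" ")


print(utf8_hex(composed))      # c3 a9
print(utf8_hex(decomposed))    # 65 cc 81
print(composed == decomposed)  # False: different code points and bytes
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# A precomposed Hangul syllable decomposes into separate jamo in Form D.
hangul = "\ud55c"              # 한, U+D55C
print(len(hangul), len(unicodedata.normalize("NFD", hangul)))  # 1 3
```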
In the past, HFS+ in particular took care of this for us by converting both composed and decomposed forms into one: the decomposed form, or Form D. But modern file systems like APFS have none of that, and store the Unicode they’re given. When entered at the keyboard, that’s more likely to be decomposed é, but when generated in an app it could well be composed é instead. Early in the life of APFS, this caused many problems for those using accented or Korean characters in file and folder names, and has been progressively addressed since.
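The practical consequence can be sketched with a toy model. It’s hypothetical and much simplified, not how either file system is actually implemented: ToyDirectory with normalise=True stores every name in Form D, as HFS+ did, while normalise=False stores names exactly as supplied, as described for APFS above.

```python
import unicodedata
from typing import Dict, Optional


class ToyDirectory:
    """Hypothetical, much-simplified model of a directory's name handling.

    normalise=True stores every name in Form D, as HFS+ did;
    normalise=False stores names exactly as supplied.
    """

    def __init__(self, normalise: bool) -> None:
        self.normalise = normalise
        self.entries: Dict[str, str] = {}

    def _key(self, name: str) -> str:
        # The key under which a name is stored and looked up.
        return unicodedata.normalize("NFD", name) if self.normalise else name

    def create(self, name: str, contents: str) -> None:
        key = self._key(name)
        if key in self.entries:
            raise FileExistsError(name)
        self.entries[key] = contents

    def lookup(self, name: str) -> Optional[str]:
        return self.entries.get(self._key(name))


composed, decomposed = "caf\u00e9", "cafe\u0301"

hfs_like = ToyDirectory(normalise=True)
hfs_like.create(composed, "first")
try:
    hfs_like.create(decomposed, "second")
except FileExistsError:
    print("rejected: same name once normalised")
print(hfs_like.lookup(decomposed))  # first

raw = ToyDirectory(normalise=False)
raw.create(composed, "first")
raw.create(decomposed, "second")    # accepted as a different name
print(len(raw.entries))             # 2
```

In the normalising model the two forms are one name, so the second create() is refused and either form finds the same entry. Without normalisation they’re two distinct entries that look identical to the user.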
The last time that I looked at the state of normalisation in APFS, almost two years ago, I found there were still issues and bugs, the weirdest being that a volume whose name contained composed forms (Form C) was inaccessible to Spotlight indexing. That wasn’t helped by the fact that, unlike the Finder, Disk Utility doesn’t normalise the names of APFS volumes, and lets you create two volumes in the same container that differ only in their normalisation, yet appear identical to the user.
You can still see this in Ventura today: here are two volumes side by side in a single container, with what appear to be identical names, but different contents.
But their Get Info dialogs in the Finder show that they do in fact have different names.
Names and paths
APFS and Disk Utility are very flexible when it comes to naming volumes. If you want, you can mix cases across volumes that aren’t case-sensitive, and even use exactly the same name for as many volumes as you want, because the file system doesn’t identify volumes by name. What’s most important to APFS is the UUID of the volume: try mounting two volumes with identical UUIDs and you’ll see what I mean.
What’s important to macOS is where that volume is mounted, its mount path. If two volumes, no matter what their names, came to share the same mount path, chaos would ensue. So the crucial step here is allocating each volume a unique mount path when it’s mounted, in a way that’s robust in the face of potential name conflicts.
What happens is that the volume name is normalised to Unicode Form D, or decomposed, and compared without case-sensitivity to existing mount paths. If there’s any clash, then a number is appended after the normalised name to form the additional mount path. That’s perhaps best illustrated with some examples:
- If a volume named one is already mounted at /Volumes/one, and the next to be mounted is also named one, the second will be mounted at /Volumes/one 1 to avoid the obvious name conflict.
- If a volume named OneTwo is already mounted at /Volumes/OneTwo, and the next to be mounted is named onetwo, the second will be mounted at /Volumes/onetwo 1 to avoid a conflict when paths are compared case-insensitively.
- If a volume named CaféÅngstrom (composed, Form C) is already mounted at /Volumes/CaféÅngstrom (there normalised to decomposed, Form D), and the next to be mounted is named CaféÅngstrom (decomposed, Form D), the second will be mounted at /Volumes/CaféÅngstrom 1 (decomposed, Form D), so both mount paths are normalised and don’t conflict.
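The rule illustrated by those examples can be approximated in a short Python sketch. It’s my reconstruction from the behaviour described, not Apple’s code: the function name allocate_mount_path, the use of str.casefold() to stand in for case-insensitive comparison, and the numbering continuing 1, 2, … for repeated clashes are all assumptions.

```python
import unicodedata
from typing import Iterable


def _match_key(path: str) -> str:
    # Canonical caseless form: Form D, case folded, then Form D again,
    # so paths differing only in normalisation or case compare equal.
    return unicodedata.normalize(
        "NFD", unicodedata.normalize("NFD", path).casefold())


def allocate_mount_path(volume_name: str, existing_paths: Iterable[str],
                        root: str = "/Volumes") -> str:
    """Approximate the mount-path rule described above (not Apple's code):
    normalise the name to Form D, and if that clashes case-insensitively
    with a path already in use, append " 1", " 2", ... until it doesn't."""
    base = unicodedata.normalize("NFD", volume_name)
    in_use = {_match_key(p) for p in existing_paths}
    candidate = f"{root}/{base}"
    n = 1
    while _match_key(candidate) in in_use:
        candidate = f"{root}/{base} {n}"
        n += 1
    return candidate


mounted = ["/Volumes/one", "/Volumes/OneTwo"]
print(allocate_mount_path("one", mounted))     # /Volumes/one 1
print(allocate_mount_path("onetwo", mounted))  # /Volumes/onetwo 1

# A Form C name mounts first, then the same name in Form D.
mounted = [allocate_mount_path("Caf\u00e9\u00c5ngstrom", [])]
second = allocate_mount_path("Cafe\u0301A\u030angstrom", mounted)
print(second == unicodedata.normalize("NFD", "/Volumes/Caf\u00e9\u00c5ngstrom 1"))  # True
```

Under this sketch, mounting a third volume named one while /Volumes/one and /Volumes/one 1 are in use would give /Volumes/one 2; I haven’t confirmed that macOS numbers further collisions in exactly that way.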
To help the user, volume names passed through the Finder or in Terminal are normalised by decomposition to Form D, but no changes are made to case. Such changes aren’t necessary for folder or file names because, unlike volumes, they can’t coexist in the same way.
Spotlight fix
Spotlight now correctly indexes and searches volumes whose names contain composed forms (Form C).
Remaining problems with normalisation
Internally, APFS and macOS should now behave consistently and robustly with respect to Unicode normalisation, but they can’t solve all the problems you could come across.
Apps and scripts can still get normalisation wrong, particularly if they assume that the system normalises file names for them. They can also use the wrong calls to compare strings: for example, Form C and D strings should appear the same when compared using NSString compare(), but may differ with an isEqual() comparison. It’s not uncommon for developers to ignore completely the problems that can arise from Unicode forms and normalisation.
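Python’s == on strings, like isEqual(), compares code points literally, so a script that checks a name supplied in one form against names stored on disk in another can miss a match. A minimal sketch of a safer approach, using the hypothetical helpers same_name() and find_entry(), normalises both sides before comparing, which is closer in spirit to NSString compare():

```python
import unicodedata
from typing import Iterable, Optional


def same_name(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent. Both are normalised to
    Form C here; normalising both to Form D would work equally well."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def find_entry(wanted: str, names: Iterable[str]) -> Optional[str]:
    """Return the entry in names (e.g. from os.listdir()) that matches
    wanted regardless of normalisation, or None if there is none."""
    for name in names:
        if same_name(name, wanted):
            return name
    return None


listing = ["re\u0301sume\u0301.txt", "notes.txt"]  # stored decomposed (Form D)
typed = "r\u00e9sum\u00e9.txt"                     # supplied composed (Form C)

print(typed in listing)                          # False: literal comparison
print(find_entry(typed, listing) == listing[0])  # True
```

find_entry() returns the name as it appears in the listing, so any later file operation uses the exact form stored on disk rather than the form that was typed.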
Problems can also arise when transferring files to and from other file systems. Although most modern file systems no longer normalise, some still do, and they can create havoc unless you’re aware of their behaviour.
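One precaution before copying to a file system that does normalise is to check for names that would collapse into one there. This is a hypothetical helper, normalisation_collisions(), written for illustration; it assumes the destination normalises to plain Unicode Form D (optionally ignoring case), whereas HFS+, for example, uses its own variant of Form D, so treat the result as an approximation.

```python
import unicodedata
from collections import defaultdict
from typing import Iterable, List


def normalisation_collisions(names: Iterable[str], form: str = "NFD",
                             fold_case: bool = False) -> List[List[str]]:
    """Group names that a destination normalising to `form` (and, if
    fold_case is True, ignoring case) would treat as the same name.
    Only groups containing more than one entry are returned."""
    groups = defaultdict(list)
    for name in names:
        key = unicodedata.normalize(form, name)
        if fold_case:
            key = unicodedata.normalize(form, key.casefold())
        groups[key].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = ["caf\u00e9.txt", "cafe\u0301.txt", "Notes.txt", "notes.txt"]
print(len(normalisation_collisions(names)))                  # 1
print(len(normalisation_collisions(names, fold_case=True)))  # 2
```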
The final message is that you should never assume that a volume’s name will be used unchanged in its mount path, as I hope I have demonstrated above.