File problems in iOS 10.3 and macOS 10.13: What’s in a name?

When you’re testing out a key part of an operating system, such as a file system, the most difficult part is getting sufficient beta-testers. For Apple’s new APFS, the file system which is intended to replace the Mac Extended File System, HFS+, in the autumn, there is a simple solution: deploy it onto all iOS, watchOS, and tvOS devices. And that is exactly what Apple has done in iOS 10.3, watchOS 3.2, and tvOS 10.2.

So far, this huge beta-test seems to be going quite well. There are two main elements to the test: live on-the-fly conversion of storage to use APFS, and the new file system in use. Inevitably a few users experienced problems during the iOS 10.3 update which may be attributable to the conversion, but as far as I can see, they were very few, and for the great majority of users the conversion appears to have worked perfectly.

Early experience has brought one important issue to the fore, though, which may well trip up a lot of Macs and their users in the autumn: file (and folder) names.

In the early days, naming files and folders was simple: 8.3 names (up to eight characters, a dot, and a three-character extension) drawn from a small subset of ASCII text. Macs brought greater flexibility, and with it came a dark pit of complexity in which APFS is likely to wallow. The central issue is how to encode file (and folder) names.

You’ll know about Unicode, and wonder why that isn’t the solution. Not only is Unicode not the complete solution, but it’s also a lot of the problem.

Take the file names Café.jpg and Café.jpg. They look identical, but if you paste them into a good text editor to count the number of characters in each, you’ll spot a bizarre difference: the first contains nine characters, the second only eight. Paste their e-acute characters into the Emoji & Symbols panel, and you’ll see that the first uses two Unicode characters U+0065 and U+0301 to make up one visible character, whilst the second uses the single U+00E9 character – which is why their character counts differ.
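If you'd rather not rely on a text editor, the same check can be made in a few lines of Python. This is only a sketch: the two names are written with escape sequences so that the difference survives copying and pasting, and code_points is a helper name of my own invention, not part of any library.

```python
import unicodedata

# Two spellings of the same visible name, written with escapes so the
# difference cannot be lost in copy and paste.
decomposed = "Cafe\u0301.jpg"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "Caf\u00e9.jpg"   # the single character U+00E9


def code_points(name):
    """Return one 'U+XXXX NAME' string per code point in the name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in name]


print(len(decomposed), len(precomposed))   # 9 8
print(decomposed == precomposed)           # False
print(code_points(decomposed)[3:5])        # the e and its combining accent
print(code_points(precomposed)[3:4])       # the single precomposed e-acute
```

Note that Python counts code points here, which is exactly the difference that matters for file names.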

Deciding how a file system deals with thousands of similar problems in Unicode is a vital part of making that file system ready for the real world, and for the hundreds of millions of its users across the world. There are related decisions to be made about how the file system handles case: should it allow the two files Café.jpg and café.jpg to co-exist within the same folder, or treat them as effectively identical, and therefore in conflict?

The most common solution to reconciling these issues is to ‘normalise’ file and folder names according to a set of rules. If you make all e-acute characters follow a single Unicode encoding, then the file system will only see normalised forms, and be able to handle them accordingly. Just as you won’t be able to have two files which appear to have the same name but use different Unicode routes to do so, when you search for a file named Café.jpg, you won’t miss files with the same name but slightly different encoding.
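To illustrate what normalisation buys you, here is a minimal Python sketch using the standard unicodedata module. The function name same_name is my own, and the choice of Form D as the default is just for the example; any single form applied consistently to both names would do.

```python
import unicodedata


def same_name(a, b, form="NFD"):
    """Compare two names after normalising both to the same Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


decomposed = "Cafe\u0301.jpg"   # e + combining acute accent
precomposed = "Caf\u00e9.jpg"   # single e-acute character

print(decomposed == precomposed)                        # False: raw code points differ
print(same_name(decomposed, precomposed))               # True: identical once normalised
print(len(unicodedata.normalize("NFC", decomposed)))    # 8: composed form
print(len(unicodedata.normalize("NFD", precomposed)))   # 9: decomposed form
```

A search which compares names this way will find the file whichever encoding was used to create it.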

Such issues are bad enough for users, but they become critical for developers. If they do the wrong thing, you can end up with crucial files being overwritten, and all sorts of strange effects. If you want to experience related problems, try using the case-sensitive version of HFS+ on your Mac, and see how many apps trip over it; I’m told that most Adobe products are well-known for failing in that circumstance.

The version of HFS+ which we almost universally use for the startup volumes of our Macs does not preserve normalisation, but does preserve case even though it is case-insensitive. Filenames in HFS+ are normalised using Unicode 3.2 Form D (excluding some character substitutions). That is what we, and all our current apps, are used to, and expect to use.

APFS is offered in case-sensitive and case-insensitive versions, but iOS 10.3 only uses case-sensitive APFS. According to Apple:
“The case-insensitive variant of APFS is normalization-preserving, but not normalization-sensitive. The case-sensitive variant of APFS is both normalization-preserving and normalization-sensitive. Filenames in APFS are encoded in UTF-8 and are not normalized.”

Assuming that macOS 10.13 standardises on case-sensitive APFS as does iOS 10.3, converting from HFS+ to APFS thus moves file and folder names from a case-insensitive, non-normalisation-preserving file system which normalises filenames, to a case-sensitive, normalisation-preserving and normalisation-sensitive file system which doesn’t normalise filenames. They are almost exact opposites!
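The contrast can be modelled, very roughly, by asking what each file system would treat as the 'same' name. The Python below is a toy model under loud assumptions, not a description of either implementation: hfs_like_key and apfs_case_sensitive_key are names I have made up, Python's unicodedata uses whichever Unicode version ships with the interpreter rather than Unicode 3.2, it knows nothing of the substitutions which HFS+ excludes, and str.casefold is standing in for HFS+'s own case-folding rules.

```python
import unicodedata


def hfs_like_key(name):
    """Toy stand-in for case-insensitive HFS+: decompose, then fold case."""
    return unicodedata.normalize("NFD", name).casefold()


def apfs_case_sensitive_key(name):
    """Toy stand-in for case-sensitive APFS: the raw UTF-8 bytes, untouched."""
    return name.encode("utf-8")


def can_coexist(a, b, key):
    """Two names can share a folder only if their keys differ."""
    return key(a) != key(b)


decomposed = "Cafe\u0301.jpg"
precomposed = "Caf\u00e9.jpg"
lower_case = "caf\u00e9.jpg"

print(can_coexist(decomposed, precomposed, hfs_like_key))             # False
print(can_coexist(decomposed, precomposed, apfs_case_sensitive_key))  # True
print(can_coexist(precomposed, lower_case, hfs_like_key))             # False
print(can_coexist(precomposed, lower_case, apfs_case_sensitive_key))  # True
```

In this model, names which HFS+ would consider one and the same become distinct files on case-sensitive APFS, which is where apps that assume the old behaviour can go astray.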

For users, the number of problems is likely to grow with how far their language departs from plain, unaccented English characters. The great majority of users in North America, the UK, Australia, and other parts of the world in which English is the first language will probably not notice the difference. Languages which use Roman script with accented characters may well run into some problems, but these are likely to be sporadic and relatively minor. As for Korean, in which every precomposed Hangul syllable also has a decomposed form, that could be interesting, to say the least.

As for apps, Apple glibly advises developers to use high-level APIs (NSFileManager/FileManager and NSURL) when interacting with the file system, which it assures us will “avoid introducing bugs in your code with mismatched Unicode normalisation in filenames.”

Which is fine where you can do that. Whoever wrote that text is blissfully ignorant of the many creaky old parts of the APIs for which time has stood still: my own current case in point is Keychain Services. It has calls which are the only way of discovering certain important pieces of information, such as the pathname to the default user keychain, which have not been revised since Mac OS X Jaguar nearly fifteen years ago. Apple’s documentation does not tell us whether the paths so returned are normalised, or whether they will become bug-generators when used on APFS.

For developers, the whole area is fast becoming a minefield. Read some of the discussion generated here or here, for example, and you will see how confused and confusing the whole subject has become.

It is, though, becoming clear that a substantial number of current apps, and more older ones, will break when run on APFS. They generally won’t ‘unexpectedly quit’, but most will manifest strange behaviour in some way – something which is likely to be a nightmare to diagnose and fix. Some bugs will appear only when certain files are stored in iCloud, and not when they’re kept locally, or the other way around. Bugs resulting from normalisation incompatibility are notoriously odd and hard to pin down.
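One modest aid to diagnosis is to look for names which differ only in their normalisation or case, as those are the ones most likely to be confused. The sketch below is an assumption-laden illustration rather than a tool: find_collisions is a hypothetical helper, it works on a plain list of names rather than a real volume, and its idea of 'equivalent' (Form D plus Python's case folding) only approximates what any Apple file system does.

```python
import unicodedata
from collections import defaultdict


def find_collisions(names):
    """Group names that become identical after NFD normalisation and case folding."""
    groups = defaultdict(list)
    for name in names:
        key = unicodedata.normalize("NFD", name).casefold()
        groups[key].append(name)
    # Only groups with more than one member are potential trouble.
    return [group for group in groups.values() if len(group) > 1]


listing = [
    "Cafe\u0301.jpg",   # decomposed e-acute
    "Caf\u00e9.jpg",    # precomposed e-acute
    "caf\u00e9.jpg",    # precomposed, lower case
    "Notes.txt",
    "readme.md",
]

for group in find_collisions(listing):
    # Print with escapes so that look-alike names can be told apart.
    print([name.encode("unicode_escape").decode("ascii") for name in group])
```

In real use the list could come from os.listdir on a folder of interest; printing the names with escapes is deliberate, as the whole problem is that they look the same on screen.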

Six months before we are likely to be switching over to APFS, the on-the-fly conversion is looking very sound, although it has yet to meet volumes with millions of files. I’m fairly sure that the new file system will prove quite robust in use too, although even tiny rates of failure could quickly result in serious problems. But the one thing that is likely to cause the most grief is the old, well-known problem of the naming of files and folders.