APFS: iOS 11 and High Sierra will fix normalisation problems

Apple has now finally committed to addressing normalisation problems which have been plaguing some users of iOS 10.3 and those trying early releases of APFS in macOS. In essence, iOS 11 and macOS 10.13 High Sierra will effectively normalise file and folder names on APFS volumes.

Before WWDC last month, Apple’s position was that APFS stored file and folder names without performing any normalisation on them. This contrasted with HFS+, which normalises those names. Responsibility was placed on developers to ensure that they used the correct operating system calls to avoid problems with normalisation.

It became clear in iOS 10.3, the first release to use APFS without normalisation, and in versions 10.3.1 and 10.3.2, that this approach was flawed. The great majority of users experienced no problems. But if you obtained files from a source which preferred a different normalisation form from that used in HFS+, and their names contained accented or non-Roman characters which are changed by normalisation, then access to those files could fail.

The most obvious problems arose for iOS users who transferred files from Windows, which prefers a different normalisation form from HFS+, when those files were named using Korean and other non-Roman scripts, although this extended to European languages with accented characters like ñ and é. There’s a chilling series of messages on the Apple Developer Forums in which an iOS app developer details how users running iOS 10.3 were transferring files using iTunes for Windows, but could not access those files once they were on an iOS device.
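To see why such names clash, here is a minimal sketch in Python using the standard unicodedata module. It is only an illustration of the two common normalisation forms, not anything taken from Apple's code; the file name is made up, and I am assuming that the Windows side supplies composed (Form C) names while HFS+ expects decomposed ones.

```python
import unicodedata

# "é" as a single precomposed code point (Form C, NFC).
composed = unicodedata.normalize("NFC", "caf\u00e9.txt")
# "e" followed by a combining acute accent (Form D, NFD).
decomposed = unicodedata.normalize("NFD", "caf\u00e9.txt")

# The two names look identical on screen but differ as strings and as bytes.
same_text = composed == decomposed
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# Normalising both to the same form makes them compare equal again.
same_after_normalising = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# A single Korean syllable splits into its component jamo under Form D.
hangul_decomposed = unicodedata.normalize("NFD", "\ud55c")

print(same_text, len(composed_bytes), len(decomposed_bytes))
print(same_after_normalising, len(hangul_decomposed))
```

A file system which compares names byte for byte, without normalising them first, treats the composed and decomposed versions as two different files.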

One developer, who produces iMazing, even incorporated normalisation into their product to ensure its compatibility with Windows files bearing names which would cause problems on APFS.
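I don't know how iMazing does this internally, but the general idea of normalising names before they reach the device can be sketched as follows. The helper names are hypothetical, and plain NFD is only an approximation: HFS+ uses its own variant of decomposed form, so the target form here is an assumption.

```python
import unicodedata

def normalise_name(name: str, form: str = "NFD") -> str:
    """Return the name in the requested Unicode normalisation form."""
    return unicodedata.normalize(form, name)

def names_needing_change(names, form: str = "NFD"):
    """List the names which would be altered by normalising to the given form."""
    return [name for name in names if normalise_name(name, form) != name]

# Hypothetical names arriving from Windows in composed form.
incoming = ["plain.txt", "ma\u00f1ana.txt"]
to_fix = names_needing_change(incoming)
fixed = [normalise_name(name) for name in incoming]
print(len(to_fix), [len(name) for name in fixed])
```

Names consisting only of unaccented ASCII characters are unchanged by either form, which is why most users never saw a problem.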

At WWDC just over a month ago, Apple’s APFS team explained how iOS 11 will bring normalisation to APFS. Documentation was changed, but details of the variant of APFS intended to be the default for High Sierra remained a bit vague.

Apple has now changed the APFS documentation again, making it clear that normalisation is going to be handled by APFS for both iOS 11 and High Sierra. Indeed, the problems so far with iOS 10.3 have driven Apple to fix this early in iOS 10.3.3 and Sierra 10.12.6, due for release very shortly.

To quote Apple’s latest position:
This means that developers don’t need to do any additional work to ensure correct normalization behavior in these versions of macOS and iOS.

Both iOS and macOS will initially offer what Apple terms runtime normalisation, which carries some overhead in performance, but native normalisation which is built into APFS itself will follow in iOS 11 and macOS 10.13 High Sierra. iOS users will need to perform an erase restore in order to benefit from native normalisation when iOS 11 is first released, but others will be upgraded to that in a future release of iOS 11. I suspect that the full High Sierra release will offer native normalisation from the outset, although Apple hasn’t yet committed to that.

There remain some questions whose answers will become clearer as testing of iOS 11 and High Sierra proceeds. Runtime normalisation is most probably implemented in runtime libraries, which will need to be updated before apps can take advantage of it. This suggests that apps produced using Xcode 8.3.3 and earlier may not be able to use runtime normalisation, and will have to be rebuilt using a later release of Xcode before they become problem-free.

This is also going to make it important for those currently using iOS 10.3.0-10.3.2, and macOS 10.12.0-10.12.5 with APFS volumes, to upgrade to iOS 11 and macOS 10.12.6 or 10.13 – or they may still be stuck with normalisation issues in the future.

APFS and HFS+ will continue to take different approaches to tackling the problems of normalisation. Apple states that:
APFS preserves the normalization of the filename and uses hashes of the normalized form of the filename to provide normalization insensitivity, whereas HFS+ stores the normalized form of the filename on disk to provide normalization insensitivity.

One consequence of this is the order in which filenames are returned when listing them using readdir(2) in the standard C library libc: HFS+ has returned filenames in lexicographic order after normalisation, whereas APFS will return filenames in order according to their hashes. I’m sure that will catch someone out.
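A toy model may help here. The sketch below is emphatically not APFS's on-disk format: ToyDirectory and name_key are names I have invented, and SHA-256 and NFD are stand-ins for whatever hash and normalisation form APFS really uses. It shows only the principle Apple describes, which is to keep the name as supplied and find it through a hash of its normalised form.

```python
import hashlib
import unicodedata

def name_key(name: str) -> str:
    """Hash of the normalised name (NFD and SHA-256 are assumptions)."""
    normalised = unicodedata.normalize("NFD", name)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

class ToyDirectory:
    """Keeps each name exactly as supplied, indexed by name_key()."""

    def __init__(self):
        self._entries = {}  # hash of normalised name -> original name

    def create(self, name: str) -> None:
        key = name_key(name)
        if key in self._entries:
            # Same name in a different normalisation form: treated as a clash.
            raise FileExistsError(name)
        self._entries[key] = name

    def lookup(self, name: str):
        """Find a name whichever normalisation form the caller uses."""
        return self._entries.get(name_key(name))

    def listing(self):
        """Return names in hash order, not lexicographic order."""
        return [self._entries[key] for key in sorted(self._entries)]

directory = ToyDirectory()
directory.create("caf\u00e9.txt")   # composed form, stored as given
directory.create("apple.txt")
found = directory.lookup("cafe\u0301.txt")  # decomposed form still matches
print(found == "caf\u00e9.txt")
print(sorted(directory.listing()))  # sort explicitly if order matters
```

The practical lesson is that code which needs a particular order should sort the names itself rather than rely on whatever order a directory listing happens to return.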

Finally, Apple continues to state that APFS does not support the directory hard links on which Time Machine’s current backups rely. If you convert an existing Time Machine backup to APFS, then all those directory hard links will be converted to symbolic links or aliases. It remains to be seen whether that is a good idea, whether that conversion still allows them to be used by Time Machine, or whether backups will use a different scheme in High Sierra.

I think that APFS is going to be ready after all.