Watch that space: More fun with Unicode in file names

This isn’t one of this weekend’s Mac riddles, but how can you possibly have four files all with the same name side by side in the same folder?

filenames01

Whichever way you look at them, they really do have the same name, don’t they?

filenames02

Except when you list them in Terminal. All of a sudden spaces appear where the Finder didn’t reveal them.

filenames03

The secret, of course, lies in Unicode, specifically its ZERO WIDTH SPACE character, U+200B, which is encoded in UTF-8 as the three bytes E2 80 8B. All is revealed when you copy one of those apparently identical file names and paste it into my free utility Apfelstrudel.
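If you'd like to see those bytes for yourself without any extra tools, here's a minimal Python sketch. The four variants are my assumption about how such a set of names could be built (none, leading, trailing, and both); the helper name hex_bytes is purely illustrative.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, U+200B


def hex_bytes(name: str) -> str:
    """Return the UTF-8 encoding of a name as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in name.encode("utf-8"))


# Hypothetical variants: the same visible name with ZERO WIDTH SPACE
# added in different positions.
variants = [
    "ThisFile",
    ZWSP + "ThisFile",
    "ThisFile" + ZWSP,
    ZWSP + "ThisFile" + ZWSP,
]

for name in variants:
    # Character count, then the bytes that are really in the name.
    print(len(name), hex_bytes(name))
```

Each variant is a different string as far as the file system is concerned, even though all four look the same when displayed.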

filenames04

What looks in the Finder like the eight characters T h i s F i l e actually contains a leading and trailing ZERO WIDTH SPACE, shown here as e2 80 8b. Terminal perceptively displays each ZERO WIDTH SPACE as a regular space, but it hasn’t actually changed the character: copy the name from Terminal’s output and the ZERO WIDTH SPACE is still copied accurately.
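A rough way to make such characters visible in a pasted name is to spell out anything in Unicode's "format" general category (Cf), which is where ZERO WIDTH SPACE lives. This is only a sketch using Python's standard unicodedata module, not how Apfelstrudel works, and the function name reveal is my own invention.

```python
import unicodedata


def reveal(name: str) -> str:
    """Replace invisible format characters with a readable tag."""
    out = []
    for ch in name:
        if unicodedata.category(ch) == "Cf":
            # Spell out the code point and its Unicode name.
            label = unicodedata.name(ch, "UNNAMED")
            out.append(f"<U+{ord(ch):04X} {label}>")
        else:
            out.append(ch)
    return "".join(out)


print(reveal("\u200bThisFile\u200b"))
```

Ordinary names pass through unchanged, so this can be run over any name you have copied from the Finder or Terminal.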

I’m sure that this phenomenon hasn’t escaped the attention of those who want to compromise our Macs. Imagine someone using ZERO WIDTH SPACE characters to spoof important files which we expect to be there. But bless the Unicode folk: it’s nothing to do with them, as they just make the standard. Who would have thought that an invisible character could ever cause problems? Indeed, I’m disappointed that they haven’t yet come up with a whole Roman alphabet of ZERO WIDTH SMALL LETTER A and so on.
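One simple check for this kind of spoofing is to group the names in a folder by what they look like once invisible format characters are removed, and flag any group with more than one member. What follows is a sketch under that assumption, with hypothetical function names. It will also flag legitimate uses of format characters, such as the joiners in some emoji and scripts, so treat its output as a prompt to look closer rather than as proof.

```python
import unicodedata
from collections import defaultdict


def visible_form(name: str) -> str:
    """Drop characters in Unicode category Cf (invisible format characters)."""
    return "".join(ch for ch in name if unicodedata.category(ch) != "Cf")


def lookalike_groups(names):
    """Map each visible form to the names sharing it, keeping only clashes."""
    groups = defaultdict(list)
    for name in names:
        groups[visible_form(name)].append(name)
    return {shown: same for shown, same in groups.items() if len(same) > 1}


# Example with made-up names; for a real folder, pass os.listdir(path).
sample = ["ThisFile", "\u200bThisFile", "ThisFile\u200b", "Other"]
for shown, same in lookalike_groups(sample).items():
    print(shown, [ascii(n) for n in same])
```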

I’m very grateful to @0xdade for drawing attention to this.