How to fix a URL which breaks because of Unicode content

macOS handles all text as Unicode, but there are still some oddities that can catch you out. The most likely is with embedded URLs – web links in particular. Every so often you’ll place a link in a document, only to discover that the link doesn’t work properly. This article explains how to work around that.

This caught me out when I was placing web links in a Storyspace 3 document as detailed here. I did all the right things, but when I clicked on the link, it was dead. I checked everything, and it seemed to be set up correctly, but still the link refused to work. Once I had drilled down and produced some simple test examples, I was able to inform the app’s developers, who identified the problem very quickly and will include a fix in the next releases of Storyspace and Tinderbox.

There is a workaround which you, as a user, can deploy so that such broken links can still work properly.

URLs on the internet are generally now handled using Unicode characters, but there is still some software which can’t cope with them. For that, a URL has to resort to embedding UTF-8 encoded characters, escaped using plain ASCII, instead of pure Unicode. You will have seen this sometimes. For example, a link like
https://en.wikipedia.org/wiki/Salomé_(1918_film)
contains a single non-ASCII character, é. Throw that at something which is not able to handle full Unicode characters and it will fail. Instead, that URL can contain embedded encoded UTF-8 to cover that one character.

Bring up the floating character pane, paste the letter é into its search box, and you’ll see that the hexadecimal UTF-8 code for that character is the two bytes C3 A9. This means that in our encoded version, it will be represented as %C3%A9: each byte is written as % followed by its two hex digits, as % is used to escape to this encoding method.
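If you would rather not look the bytes up in the character pane, the same lookup can be sketched in a few lines of Python. This is only an illustration, not part of the original workaround, and the variable names are my own. It assumes the precomposed form of é (U+00E9); a decomposed é, made of e plus a combining accent, would give different bytes.

```python
# Assumption: the precomposed character U+00E9 (é), written as an escape
# so that the source file's own encoding cannot change the result.
char = "\u00e9"

# Encode the character as UTF-8 and list each byte as two hex digits.
utf8_bytes = char.encode("utf-8")
hex_pairs = [f"{b:02X}" for b in utf8_bytes]

# Prefix every byte with % to get the escaped form used in URLs.
percent_encoded = "".join(f"%{h}" for h in hex_pairs)

print(hex_pairs)        # ['C3', 'A9']
print(percent_encoded)  # %C3%A9
```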

Try the encoded URL of
https://en.wikipedia.org/wiki/Salom%C3%A9_(1918_film)
and you should find that works fine and brings up the correct page.
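As a quick check that the encoded URL really does stand for the original one, you could decode it again. This sketch uses `unquote` from Python's standard `urllib.parse` module, which decodes the escaped bytes as UTF-8 by default; the variable names are only illustrative.

```python
from urllib.parse import unquote

encoded = "https://en.wikipedia.org/wiki/Salom%C3%A9_(1918_film)"

# unquote() turns each %XX escape back into a byte and decodes the
# result as UTF-8, recovering the original Unicode character.
decoded = unquote(encoded)

print(decoded)  # https://en.wikipedia.org/wiki/Salomé_(1918_film)
```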

If you can’t be bothered to encode your characters by hand with the help of the characters pane, Don’s Tools provides a free encoding system online, although it goes further than you should need and encodes everything outside the minimal ASCII character set.
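If you would like something more conservative than an online encoder, here is a minimal sketch in Python which escapes only the non-ASCII characters and leaves everything else in the URL untouched. The function name `escape_non_ascii` is hypothetical, and it assumes that the non-ASCII characters are in the path or query of the URL; a non-ASCII host name is handled by a different mechanism and should not be escaped this way.

```python
def escape_non_ascii(url: str) -> str:
    """Return url with each non-ASCII character replaced by its
    percent-escaped UTF-8 bytes; ASCII characters are left as they are.

    Hypothetical helper for illustration. Assumes the non-ASCII
    characters are not in the host name part of the URL.
    """
    parts = []
    for ch in url:
        if ord(ch) < 128:
            # Plain ASCII: keep unchanged, including / : ( ) and _
            parts.append(ch)
        else:
            # Non-ASCII: one %XX escape per UTF-8 byte
            parts.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(parts)


original = "https://en.wikipedia.org/wiki/Salom\u00e9_(1918_film)"
fixed = escape_non_ascii(original)
print(fixed)  # https://en.wikipedia.org/wiki/Salom%C3%A9_(1918_film)
```

Because only characters outside ASCII are touched, a URL which is already plain ASCII, or which has already been escaped, passes through unchanged.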

In Storyspace 3, for example, you can manually alter your link URL to replace non-ASCII characters either when you first make the Web Link, in this dialog

[Screenshot: the Web Link dialog in Storyspace 3]

or you can fix a broken link by selecting the writing space and using the Browse Links command in the View menu, which will show all the links for that writing space. Select the broken link and you can then edit its URL.

[Screenshot: Browse Links in Storyspace 3]

This only applies to links for which you can edit the URL, of course. If you know that a type of link doesn’t work properly with Unicode URLs but cannot edit them before or after pasting them in, you will not be able to use this workaround, I’m afraid.

You may find a similar workaround for other issues that arise with Unicode text in other circumstances: discover how to enter escaped and encoded UTF-8, and use that instead of straight Unicode.