How WordPress comments and Property Lists can trip you up

Did you realise that when you write a comment on one of these articles, you’re actually writing XML? Much of the time it makes little difference, but occasionally it bites back. Some people like to use ‘angle brackets’, the mathematical less-than < and greater-than > symbols, for various purposes. Unfortunately, because they’re two of the five characters with special meanings in XML, this can have unexpected effects.

Another place where you may come to write XML without being fully aware is when editing Property Lists (including Preference files) in macOS. This caught a user out recently when they were editing an extended attribute in my free utility xattred: that particular extended attribute is stored in Property List format, so when they inadvertently used the ampersand & they were unable to save their changed extended attribute.
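To see the effect for yourself, here’s a minimal sketch in Python using its standard plistlib module. It isn’t how xattred works, and the key and value in it are made up purely for illustration. It writes a value containing an ampersand and angle brackets to an XML Property List, then shows what happens when a bare ampersand is put back by hand.

```python
import plistlib

# Hypothetical value, invented for this example.
value = {"Title": "Fish & Chips <draft>"}

# plistlib escapes the special characters when it serialises to XML,
# so the <string> element contains: Fish &amp; Chips &lt;draft&gt;
xml_bytes = plistlib.dumps(value, fmt=plistlib.FMT_XML)
xml_text = xml_bytes.decode("utf-8")
print(xml_text)

# Reading the XML back restores the original characters.
round_tripped = plistlib.loads(xml_bytes)
print(round_tripped["Title"])

# Simulate hand-editing: put the bare ampersand back into the XML.
broken = xml_text.replace("&amp;", "&").encode("utf-8")
try:
    plistlib.loads(broken)
    parsed_ok = True
except Exception as err:  # the XML parser rejects the bare ampersand
    parsed_ok = False
    print("Could not read edited Property List:", err)
```

The first two steps succeed because the library performs the substitution for you. The last step fails for the same reason that a bare ampersand can’t be saved in a Property List: the result is no longer well-formed XML.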

This is perhaps easier to cope with in Property Lists, which at least have the decency to tell you that they’re written in XML. At the top of every Property List in XML format you should see the preamble:
<?xml version="1.0" encoding="UTF-8"?>

WordPress comments give no such warning, of course, but lull you into a false sense of security before tripping you up with at least two (sometimes three) of those five special characters.

In theory, apps like xattred and the WordPress comment editor could allow for this and perform automatic character substitution. The problem they face in doing that is that they can’t distinguish reliably between, say, a regular ampersand and one used correctly to start an ‘escape’ sequence. Although I seldom use characters like < in my articles, there are times when you can’t avoid them. For me, they’re most common in quotations from logs and the command line. Knowing of the problem, I manually replace each < with &lt; in the article. But what if my editor were to automatically replace every & with &amp;? That would of course be far worse: each &lt; that I had carefully typed would become &amp;lt; and appear in the article as the text &lt; instead of <.
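The trap with automatic substitution can be shown in a few lines of Python. This is only a sketch: naive_escape is a hypothetical stand-in for an over-helpful editor, not anything that xattred or WordPress actually does, and unescape from the standard xml.sax.saxutils module stands in for whatever eventually displays the text.

```python
from xml.sax.saxutils import unescape


def naive_escape(text: str) -> str:
    """Hypothetical 'helpful' editor: escape every ampersand it sees."""
    return text.replace("&", "&amp;")


typed_plain = "Fish & Chips"   # a regular ampersand
typed_escaped = "if a &lt; b"  # the writer has already escaped the <

for typed in (typed_plain, typed_escaped):
    stored = naive_escape(typed)
    shown = unescape(stored)   # what a reader would eventually see
    print(f"{typed!r} -> stored {stored!r} -> shown {shown!r}")
```

The plain ampersand survives the round trip, but the text that was already escaped is escaped a second time, so the reader sees the characters &lt; rather than the < that was intended. An editor has no reliable way to tell the two cases apart from the text alone.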

So apps like xattred, and the WordPress comment editor, have to make the ill-founded assumption that everyone knows that they’re typing in XML and will make the appropriate substitutions.

For the record, here are the five characters which XML treats differently, and how you can type them in so the correct character appears:

  • < less than – use &lt; every time;
  • > greater than – use &gt; every time;
  • & ampersand – use &amp; whenever you want a literal ampersand rather than the start of an escape such as &lt;;
  • " double-quote – this is seldom a problem, and only when used inside an XML attribute value that is itself enclosed in double-quotes, where you can use &quot;
  • ' apostrophe or single-quote – this is seldom a problem, and only when used inside an XML attribute value that is itself enclosed in single-quotes, where you can use &apos;
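If you would rather not make these substitutions by hand, the list above can be sketched in Python using the escape and unescape functions from its standard xml.sax.saxutils module. Those functions handle &, < and > on their own; the two quote characters are passed in as extra entries. The names escape_xml, unescape_xml and the sample text are mine, chosen for illustration.

```python
from xml.sax.saxutils import escape, unescape

# escape() always handles &, < and >; the quotes are optional extras.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {entity: char for char, entity in QUOTES.items()}


def escape_xml(text: str) -> str:
    """Replace all five special characters with their XML escapes."""
    return escape(text, QUOTES)


def unescape_xml(text: str) -> str:
    """Reverse escape_xml, turning the five escapes back into characters."""
    return unescape(text, UNQUOTES)


sample = "Tom & Jerry say \"it's <fun>\""
escaped = escape_xml(sample)
print(escaped)
print(unescape_xml(escaped))
```

Note that this should only be applied to text that hasn’t been escaped already, for the reason given earlier: running it twice turns each &amp; into &amp;amp;.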

I will add information about this to xattred’s Help book in its next revision, but if you’ll forgive me, I think that I’ve now escaped enough for today and will sit down and do some graphics rather than XML.