Writing your own content-blocker for Safari 9 in El Capitan: 2

The first article in this series covered the basics of writing your own content-blocker for Safari version 9 in OS X 10.11 El Capitan. I will now move on to explain more fully the different options available for triggers and actions.

Triggers

The only trigger which we used in the first article was url-filter, which was set to
"https?://(www.)?nastysite.*"
to capture http://www.nastysite.com and its many variants, including under HTTPS, and to
".*"
which matches every URL.

The complete list of supported triggers is:

  • if-domain to specify a list of domains to which the rule is restricted
  • load-type to specify the relation to the main resource, either first-party if the load has the same origin as the document, or third-party if it does not
  • resource-type to specify a list of resource types (see below)
  • unless-domain to specify a list of exception domains, on which the rule is not applied; it cannot be used in the same trigger as if-domain
  • url-filter to specify the URL pattern, as a regular expression
  • url-filter-is-case-sensitive to set whether the url-filter argument is treated as being case-sensitive.

Resource types which can be specified include document, image, style-sheet, script, font, raw (any untyped load, such as an XMLHttpRequest), svg-document, media, and popup. These indicate how the load engine intends to use the resource, not necessarily the type of the resource. So markup containing
<img src="sheet.css">
is handed to the load engine as an image, and should be treated as an image resource type, rather than a style-sheet.
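
To make that distinction concrete, here is a minimal sketch of a rule which should block that load. It assumes the trigger and action layout of rules from the first article, and the file name sheet.css is only the illustrative one used above.

```json
[
  {
    "trigger": {
      "url-filter": "sheet\\.css",
      "resource-type": ["image"]
    },
    "action": {
      "type": "block"
    }
  }
]
```

If resource-type were set to ["style-sheet"] instead, I would expect this rule not to fire for that img element, as the load engine treats the request as an image.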

Here are some examples:

  • "if-domain": ["nasty.com", "nastier.com"] specifies nasty.com and nastier.com as being included in the list of domains
  • "resource-type": ["image", "style-sheet", "script"] specifies any resource loaded as an image, style sheet, or script
  • "unless-domain": ["trusty.com"] specifies trusty.com as an exception from the list of domains
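  • "url-filter": "trackingcode\\.js" specifies any URL containing the string trackingcode.js (the backslash is doubled because it must itself be escaped inside a JSON string)
  • "url-filter-is-case-sensitive": true makes the url-filter in the same trigger case sensitive; by default it is not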
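
Those fragments only make sense inside complete rules, so here is a hedged sketch of how they might be combined. The domains and the file name are the made-up ones from the examples above, and the pairing of each trigger with a block action is my own assumption for illustration.

```json
[
  {
    "trigger": {
      "url-filter": "trackingcode\\.js",
      "url-filter-is-case-sensitive": true,
      "resource-type": ["script"],
      "load-type": ["third-party"],
      "unless-domain": ["trusty.com"]
    },
    "action": {
      "type": "block"
    }
  },
  {
    "trigger": {
      "url-filter": ".*",
      "resource-type": ["image", "style-sheet", "script"],
      "if-domain": ["nasty.com", "nastier.com"]
    },
    "action": {
      "type": "block"
    }
  }
]
```

Note that if-domain and unless-domain are kept in separate rules here, and that every trigger still carries a url-filter, even when it is only ".*".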

Actions

There are only two fields which can be specified in an action: type, a string which defines what to do when the rule is triggered, and selector, a string containing the list of selectors to apply to a page. The supported values are:

  • "type": "block" to specify blocking action by aborting the load of the resource
  • "type": "block-cookies" to specify blocking cookies, but remaining within Safari’s current privacy policy
  • "type": "ignore-previous-rules" to cause all previous rules to be ignored
  • "type": "css-display-none" to hide elements of the page according to the list of selectors (which must be given)
  • "selector" followed by a string containing a comma-separated list of selectors, whose display property will be set to none, hiding the matching elements; it is used only with css-display-none.

All selectors supported by WebKit are allowed in content extensions, including compound selectors and those from CSS Selectors Level 4 which WebKit implements. If you are unsure which can be used, please consult the W3C documentation and Surfin’ Safari.

Here are some example actions:

  • "type": "block" blocks loading of that resource
  • "type": "block-cookies" blocks cookies
  • "type": "css-display-none", "selector": "#newsletter, :matches(.main-page, .article) .annoying-overlay" hides the element with the ID newsletter, and any element of class annoying-overlay inside an element of class main-page or article.
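
Placed in complete rules, those actions might look like the following sketch. The triggers are my own assumptions, chosen only to give each action something to fire on; nastysite.com is the invented site from the first article.

```json
[
  {
    "trigger": {
      "url-filter": "https?://(www\\.)?nastysite\\.com"
    },
    "action": {
      "type": "block"
    }
  },
  {
    "trigger": {
      "url-filter": ".*",
      "load-type": ["third-party"]
    },
    "action": {
      "type": "block-cookies"
    }
  },
  {
    "trigger": {
      "url-filter": ".*"
    },
    "action": {
      "type": "css-display-none",
      "selector": "#newsletter, :matches(.main-page, .article) .annoying-overlay"
    }
  }
]
```

Only the css-display-none action carries a selector; the other two need nothing beyond their type.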

Regex

The regular expressions supported are a subset of those in JavaScript:

  • a dot ‘.’ matches any character
  • ranges of characters are given as [a-z], for example, to match lower case Roman letters
  • ‘?’ means 0 or 1 occurrences of the previous expression, ‘+’ means one or more, and ‘*’ means any number including 0; however these should be minimised as much as possible, as they carry a significant performance hit
  • characters are grouped using parentheses ()
  • ‘^’ indicates the beginning of the line, but may only be the first character of the expression
  • ‘$’ indicates the end of the line, but may only be the last character of the expression.

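All strings passed as arguments to the url-filter trigger are treated as regular expressions. This means that the dot (full stop, period) appearing in domain names should be given as ‘\.’ or it will be interpreted as meaning any character: you will note that the example code given here does not always follow that! Because the rules are stored as JSON, the backslash itself has to be escaped there, so the pattern is written as ‘\\.’ in the file. URLs are matched in their canonical form, which is entirely ASCII with the scheme and domain already in lower case, and expansions of regular expressions must yield ASCII.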
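
As a rough illustration of that syntax, the two rules below use an anchor at each end, a character range, quantifiers, and escaped dots. The host names (ads followed by digits at nasty.com) and the choice of .gif are hypothetical, invented purely for this sketch.

```json
[
  {
    "trigger": {
      "url-filter": "^https?://ads[0-9]+\\.nasty\\.com/"
    },
    "action": {
      "type": "block"
    }
  },
  {
    "trigger": {
      "url-filter": "\\.gif$",
      "if-domain": ["nasty.com"]
    },
    "action": {
      "type": "block"
    }
  }
]
```

Each ‘\\.’ in the file reaches the regular expression engine as ‘\.’, so it matches a literal dot and nothing else.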

Parsing of rules

Rules are read and parsed into Safari in the order in which they appear in the JSON file, and are applied in that order. The only exception is when a rule with the action ignore-previous-rules is triggered: for that load, the actions of all the rules above it are ignored, and only those below it are still followed.
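
The effect of ordering is easiest to see in a small example. This is a sketch under my own assumptions: trusty.com stands in for a site you want to exempt, and trackingcode.js for a script you never want loaded.

```json
[
  {
    "trigger": {
      "url-filter": ".*",
      "resource-type": ["script"],
      "load-type": ["third-party"]
    },
    "action": {
      "type": "block"
    }
  },
  {
    "trigger": {
      "url-filter": ".*",
      "if-domain": ["trusty.com"]
    },
    "action": {
      "type": "ignore-previous-rules"
    }
  },
  {
    "trigger": {
      "url-filter": "trackingcode\\.js"
    },
    "action": {
      "type": "block"
    }
  }
]
```

As I understand it, on trusty.com the second rule cancels the first, so third-party scripts load there, while the third rule still blocks trackingcode.js everywhere because it comes after the ignore-previous-rules action.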

There is a practical limit on the number of rules which can be parsed, of around 50,000. However, long before Safari gets to that point, you will find that applying the rules takes so much time that there is little point in performing any blocking.

In addition to making minimal and efficient use of the quantifiers ?, + and * in the url-filter trigger, you should try to define CSS rules before any action of type ignore-previous-rules, make the triggers as specific as possible, and group together rules with similar actions. These should lead to improved performance, as well as keeping logical order within the content-blocker.

Further information

Surfin’ Safari