More Fun Scripting with Swift in Xcode: String search

AppleScript has an almost hidden string search feature which can be used in code of the form:
set AppleScript's text item delimiters to the search_string
set the item_list to every text item of this_text
and returns a list split at each occurrence of the search_string. There are no such subtleties that I have found in Swift, but equally there’s a whole gamut of different types and methods of searching strings.
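For comparison, the nearest Swift counterpart that I know of is Foundation's components(separatedBy:), which likewise returns the pieces of text lying between each occurrence of a separator. This is only a minimal sketch; thisText, searchString and their values are made up for illustration.

```swift
import Foundation

// Hypothetical sample values, for illustration only.
let thisText = "alpha, beta, gamma"
let searchString = ", "

// Split the text at each occurrence of the search string.
let itemList = thisText.components(separatedBy: searchString)

// A single item means the search string did not occur at all.
let wasFound = itemList.count > 1
print(itemList, wasFound)
```

As with the AppleScript trick, a list holding more than one item tells you that the search string was found.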

When working on Consolation 3, I wanted to offer the user a choice between simple and regex-based string searching. For the former, there are three variants of String.contains() functions:

contains – case-sensitive, non-literal,
localizedCaseInsensitiveContains – case-insensitive, localised,
localizedStandardContains – case-insensitive, diacritic-insensitive, locale-aware.

They’re simple to call too, for example:
theBoolean = theTextString.localizedCaseInsensitiveContains(theSearchString)
and that is the function which I used.
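To see how the three variants differ, here is a small sketch run against a made-up line of text. The sample string and the result names are my own assumptions, not taken from Consolation.

```swift
import Foundation

// Hypothetical sample text, for illustration only.
let sampleText = "Error: Café process exited"

// Case-sensitive: "cafe" differs from "Café" in both case and accent.
let plainResult = sampleText.contains("cafe")

// Ignores case, but the accent still has to match.
let caseInsensitiveResult = sampleText.localizedCaseInsensitiveContains("CAFÉ")

// Ignores both case and diacritics.
let standardResult = sampleText.localizedStandardContains("cafe")

print(plainResult, caseInsensitiveResult, standardResult)
```

The localised variants follow the user's current locale, so results could differ for characters whose case rules are locale-specific.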

Choosing which to use is fairly obvious with respect to case-sensitivity, but can get confusing when it comes to diacritics and Unicode characters which may or may not normalise. One easy way of assessing the properties of string comparisons in the face of normalisation is to look at them in my free app Apfelstrudel (from Downloads above).


For example, for the string
Ångstrom café mañana
which normalises to different Forms C and D, it reports:
Forms C and D differ
Forms C and D strings are equal on Swift == String comparison.
Forms C and D strings are NOT equal on NSString isEqual() comparison.
Forms C and D strings are NOT equal on NSString localizedStandardCompare() comparison.
Forms C and D strings are equal on NSString compare() comparison with caseInsensitive.
Form C contains Form D string using NSString contains().
Form C contains Form D string using NSString localizedStandardContains().
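If you want to check the first two of those findings for yourself, something like this sketch should do. It uses Foundation's precomposedStringWithCanonicalMapping and decomposedStringWithCanonicalMapping to produce the two forms; the variable names are my own, and this is not how Apfelstrudel itself is written.

```swift
import Foundation

let original = "Ångstrom café mañana"
let formC = original.precomposedStringWithCanonicalMapping   // Form C (NFC)
let formD = original.decomposedStringWithCanonicalMapping    // Form D (NFD)

// Form D stores each accent as a separate combining scalar,
// so the two forms differ in their number of Unicode scalars.
let scalarsDiffer = formC.unicodeScalars.count != formD.unicodeScalars.count

// Swift's == compares by canonical equivalence, so the forms test equal.
let swiftEqual = (formC == formD)

print(scalarsDiffer, swiftEqual)
```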

Offering regex is rather more complex, but has many more options and features. The basis of this is in NSRegularExpression, which may look fairly intimidating and opaque. It yields quickly to a methodical approach.

I first created a variable to hold my regex, a ‘not found’ range (as the result is going to be an NSRange, not a Boolean), and the regex options and matching options. The latter two are now nested types:
var theRegEx = NSRegularExpression()
let theNFRange = NSRange(location: NSNotFound, length: 0)
let regexOptions: NSRegularExpression.Options = NSRegularExpression.Options(rawValue: 0)
let regexMatchingOptions: NSRegularExpression.MatchingOptions = NSRegularExpression.MatchingOptions(rawValue: 0)

When I was ready to initialise my NSRegularExpression variable, I called that in a do - try - catch structure, to handle the possibility of failure:
do {
    theRegEx = try NSRegularExpression(pattern: theSearchStr, options: regexOptions)
} catch {
    // handle the failure
}
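One way to package that up is a small helper which returns nil when the user's pattern won't compile. This is only a sketch: makeRegex and its caseInsensitive parameter are hypothetical names of my own, and how you report the failure is up to you.

```swift
import Foundation

// Hypothetical helper: returns nil when the pattern fails to compile.
func makeRegex(_ pattern: String, caseInsensitive: Bool = false) -> NSRegularExpression? {
    let options: NSRegularExpression.Options = caseInsensitive ? [.caseInsensitive] : []
    do {
        return try NSRegularExpression(pattern: pattern, options: options)
    } catch {
        // An invalid pattern throws; report it and let the caller decide what to do.
        print("Invalid regex '\(pattern)': \(error.localizedDescription)")
        return nil
    }
}

let goodRegex = makeRegex("error \\d+")   // a valid pattern
let badRegex = makeRegex("(unclosed")     // unbalanced parenthesis, so this fails
```

Returning an optional keeps the do - try - catch in one place, which matters when the pattern comes straight from a text field.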

To perform the regex find, I first had to make a range containing the whole of the text to be searched. NSRange counts UTF-16 code units rather than Swift characters, so the length should come from the string's utf16 view; a character count falls short whenever the text contains decomposed accents or emoji:
let theRange = NSMakeRange(0, theTextString.utf16.count)
I then called rangeOfFirstMatch() to get the range of the first match of the search string already set in theRegEx:
let theMatch = theRegEx.rangeOfFirstMatch(in: theTextString, options: regexMatchingOptions, range: theRange)
If that returned range is ‘not found’, it will be equal to my theNFRange, i.e. {NSNotFound, 0}:
theBoolean = !(NSEqualRanges(theMatch, theNFRange))
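Pulled together, those steps might look like the following sketch. The function name containsMatch, the pattern and the sample lines are all made up for illustration. It measures the range in UTF-16 code units, which is what NSRange expects, and tests the returned location against NSNotFound rather than comparing whole ranges.

```swift
import Foundation

// Hypothetical helper wrapping the range, match and 'not found' test.
func containsMatch(_ regex: NSRegularExpression, in text: String) -> Bool {
    // NSRange is measured in UTF-16 code units, not Swift Characters.
    let wholeRange = NSRange(location: 0, length: text.utf16.count)
    let match = regex.rangeOfFirstMatch(in: text, options: [], range: wholeRange)
    // A failed search returns {NSNotFound, 0}.
    return match.location != NSNotFound
}

// Made-up pattern and log lines; try! is acceptable only because the pattern is a known-good literal.
let diskRegex = try! NSRegularExpression(pattern: "disk\\d+s\\d+", options: [])
let hit = containsMatch(diskRegex, in: "mounted disk1s2 read-only")
let miss = containsMatch(diskRegex, in: "no volumes mounted")
print(hit, miss)
```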

If you’re only doing this once, you could instead call numberOfMatches(), which saves making the range comparison: if the number returned is zero, then there are no matches. But here I am calling it on potentially thousands of lines from the log. I think that stopping the search with the first match is likely to be more computationally efficient when there are matches.
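To show the two approaches side by side, here is a sketch over a few invented log lines. The lines, the pattern and the names are assumptions of mine. numberOfMatches() counts every match in a line, whereas firstMatch() returns nil when there is none and need look no further than the first hit.

```swift
import Foundation

// Made-up log lines and pattern, for illustration only.
let logLines = [
    "kernel: error 5 on disk0",
    "launchd: started",
    "kernel: error 12, error 13"
]
let errorRegex = try! NSRegularExpression(pattern: "error \\d+", options: [])

// Count every match in each line.
let counts = logLines.map { line in
    errorRegex.numberOfMatches(in: line, options: [],
                               range: NSRange(location: 0, length: line.utf16.count))
}

// Keep only the lines with at least one match, stopping at the first hit.
let matchingLines = logLines.filter { line in
    errorRegex.firstMatch(in: line, options: [],
                          range: NSRange(location: 0, length: line.utf16.count)) != nil
}

print(counts, matchingLines.count)
```

I haven't measured the difference in speed between the two, so treat the efficiency argument as a reasonable guess rather than a benchmark.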

I hope that using regex searches with Swift is now much clearer.
