We’re all familiar with Occam’s Razor, sometimes known as the Law of Parsimony, which we often apply when solving problems. In our proposed solution, we shouldn’t ‘multiply entities without necessity’. Thus we tend to prefer the simplest solution, the one that makes the fewest assumptions, provided that other solutions are equally valid (and make the same predictions). It’s widely applicable, except, I think, to software and modern operating systems.
Last week’s example was what at first seemed to be a single, simple bug in Time Machine’s backups. I was amazingly lucky (or perhaps unfortunate) to catch it, as it’s only likely to occur on one night each year, and can only affect Macs left awake running Time Machine backups throughout the night.
It just so happened that, when my Mac was still on summer time, it made a set of snapshots for a backup at 01:01:27 local time, then exactly an hour later tried to make another set for its next backup. By then the clocks had gone back an hour, so it tried to make those snapshots at 01:01:27 again. Because snapshots are named using the local time as a suffix, the new set couldn’t be made, and the whole backup failed.
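To make the collision concrete, here’s a minimal sketch in Python, assuming snapshot names of the form com.apple.TimeMachine.YYYY-MM-DD-HHMMSS; the date and timezone are illustrative rather than taken from my logs. Two instants an hour apart map to the same local-time name:

    from datetime import datetime, timedelta, timezone
    from zoneinfo import ZoneInfo

    # Illustrative sketch: around the autumn clock change, two different
    # instants share the same local time, so any name derived from local
    # time collides. The date and zone here are assumptions.
    tz = ZoneInfo("Europe/London")
    first = datetime(2018, 10, 28, 0, 1, 27, tzinfo=timezone.utc)  # 01:01:27 BST
    second = first + timedelta(hours=1)                            # 01:01:27 GMT

    for instant in (first, second):
        local = instant.astimezone(tz)
        print(local.strftime("com.apple.TimeMachine.%Y-%m-%d-%H%M%S"))
    # Both iterations print com.apple.TimeMachine.2018-10-28-010127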
That, at least, was the clean-shaven Occam approach: a single bug in macOS, which is extremely unlikely to occur, and merely loses a single set of backups once in a (very) blue moon.
Closer examination of the log records raised further questions, and a couple of other experts (Ingo and Michael Tsai) weren’t happy with my explanation. I have since identified four separate problems which I rate as bugs, each of which played a part in that incident. Not only that, but in the course of digging through my own source code, to check that my log browser Consolation wasn’t the source of some of the odd observations, I noticed three completely unrelated bugs in that app, which I have now fixed.
This is by no means a unique occurrence in modern software and operating systems. Critics will undoubtedly claim that it’s due to the failure of operating system vendors and third-party developers to fix bugs and release bug-free code. In theory, of course, that’s true, but it assumes that bug-free code is feasible.
Many years ago, I recall talking to some software engineers who worked for one of the large early computer companies in the UK, coding mainly in COBOL. They astonished me by insisting that, if each of them added just one new line of code to their product each working day, they felt proud of their accomplishment. Around that time, I was developing CAD/CAM software for classic Macs, and on some days I wrote a hundred or more lines of Object Pascal. If I couldn’t have done that, my products couldn’t have been brought to market within the few months that were commercially acceptable.
Not only that, but Apple, Microsoft and other operating system vendors now have to maintain such huge and, in parts, antiquated codebases that no matter how many engineers they employ, fixing all their bugs is a Sisyphean task. It has parallels in the provision of medical and other aid in disasters, where the system of triage was developed to prioritise need and match the delivery of care to it. Triage is only a formalisation of something we do every day, assigning priorities when trying to manage multiple events during complex activities like driving or childcare.
Another factor which makes bug-fixing Sisyphean is the interrelatedness of it all. It’s a well-known phenomenon that fixing one bug often breaks other code, which may, however unintentionally, have relied on the way that the original bug behaved. So, far from the first fix solving everything, bugs propagate from that patch, and if you constantly shave using Occam’s Razor, you generate more bugs than you fix. Modern software now behaves like a complex system, defying simple analysis.
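As a toy illustration of that phenomenon, here’s a sketch in Python with invented names: a routine that wrongly returned a fraction rather than a percentage, and a caller which had quietly learned to compensate for it:

    # Hypothetical example of a fix breaking code that relied on the bug.
    def scaled_percent(value, total):
        # This originally returned value / total, a fraction rather than
        # a percentage; callers compensated by multiplying by 100.
        return 100 * value / total  # the 'fix'

    def format_progress(done, total):
        # Written against the buggy behaviour, this still compensates,
        # so after the fix it reports 2500% instead of 25%.
        return f"{scaled_percent(done, total) * 100:.0f}%"

    print(format_progress(1, 4))  # prints '2500%' once the fix lands

The fix is locally correct, yet the patch has propagated a new bug into a caller that had adapted to the old behaviour.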
There’s a subtle irony here: one of the first programming languages that marked the transition of software to complex systems was named in honour of William of Ockham, who devised the Razor. Designed by David May at Inmos and first released in 1983, occam was one of the first concurrent programming languages, and helped ensure that modern software is so complex and interwoven. It also influenced Python.
Perhaps it’s time, for modern software, to replace Occam’s Razor with a more appropriate principle: bugs will necessarily be multiple. If you think you’ve found one, you can be fairly confident that you’ll find more, so don’t stop until you’ve dug deeper and found the others. In the past, proposed antitheses to Occam’s Razor have been termed anti-razors; in this case, I suggest that Macco’s Beard might be more appropriate.