Why faster is seldom quicker

The laws determining how fast a conventional bicycle will go are simple, and can be succinctly summarised as “the more you pay, the less you get, the faster you go”.

Light bikes, which can be propelled more readily up hill and down dale, are fabricated from expensive materials such as carbon fibre composites, for small markets prepared to pay heftily for them. Depending on whether you want to go fast on the flat, where air resistance is your greatest foe, or to defy gravity by ascending long, steep Alpine passes, you may have to invest in an aerodynamically faired bike, or an insanely light one.

Only the bluntest of bike vendors would dare tell you the overriding law that the more you train, and the less you weigh, the faster you go.

Switch to computers, and many will tell you that Moore’s Law is the underlying determinant of speed.

In fact Gordon E. Moore, co-founder of Intel, did not claim that processor performance would increase exponentially, but that the number of transistors that could be incorporated into a chip at minimum cost would double every two years. It is evident that Moore's real law has a physical limit, since transistors have a minimum physical size, and he recognised as much. Processor performance is even more constrained, because it is not a simple function of squeezing more transistors onto single chips, but gets confounded by instruction sets, clock speeds, cooling, and more.

This year’s keynote at Apple’s Worldwide Developers Conference (WWDC) once again promised performance improvements in OS X. I cannot remember a WWDC in which that has not been at least part of the promised future. Indeed, if I added them all together, by now my Mac should already have finished doing the things that I will think of in a few minutes’ time.

Mac OS X 10.6 Snow Leopard was, I think, the first to embrace fully the hardware approach that nearly thrust the UK to the vanguard of computer design. First announced on 8 June 2008, to the wild enthusiasm that only WWDC can engender, it was not demonstrated in public until the following year’s WWDC, and did not ship until that August.

Nearly thirty years ago, Iann Barron’s INMOS, based at Aztec West near Bristol and fabricating its chips in Newport, Wales, came very close to capturing the performance computing market worldwide, with its Transputer parallel processor. Had they been able to fund development and growth better, our Macs might now contain a cluster of Transputers instead of Intel cores. Even by today’s standards, Transputers were radically different, being designed to process tasks in parallel, talking to one another over tightly-coupled high-speed links, and running software written in Occam, a dedicated parallel programming language.

Macs had multiple processors before the switch to Intel, but since then even the most basic of laptops boasts two processor cores. With an operating system that endeavours to balance the load across as many cores as are available, two quad-core processors should run four times as fast as a single dual-core. Design and implement your benchmark carefully, and you can demonstrate this, but in the real world, life is not as simple.

Designing software to run in parallel on several processors (or cores) has been a problem since the heady days of the Transputer. Then, our standard demonstration was calculating and displaying the fractal graphics of the Mandelbrot set, something readily farmed out to as many or as few processors as you wish.
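That demo parallelises so cleanly because every point, and hence every row, of the image is computed independently of all the others. A minimal sketch of the same idea, using Python’s multiprocessing pool in place of Occam and Transputer links (the dimensions and iteration limit here are arbitrary illustrative choices):

```python
from multiprocessing import Pool

WIDTH, HEIGHT, MAX_ITER = 64, 48, 100

def mandelbrot_row(y):
    # Each row is independent of every other, so rows can be farmed
    # out to as many or as few workers as you wish -- the classic
    # Transputer-era demonstration.
    row = []
    for x in range(WIDTH):
        c = complex(-2.0 + 3.0 * x / WIDTH, -1.2 + 2.4 * y / HEIGHT)
        z, n = 0j, 0
        while abs(z) <= 2 and n < MAX_ITER:
            z = z * z + c
            n += 1
        row.append(n)       # iteration count colours the pixel
    return row

if __name__ == "__main__":
    with Pool() as pool:    # one worker per core by default
        image = pool.map(mandelbrot_row, range(HEIGHT))
    # points that never escape lie inside the set
    print(sum(row.count(MAX_ITER) for row in image), "points in the set")
```

Because the rows share no state, doubling the workers really does come close to halving the time, which is exactly why this example flattered parallel hardware so well.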

As fascination with fractal graphics has waned, the realisation has grown that most software cannot simply be split into parallel chunks. Even repetitive tasks like transcoding compressed video do not scale as readily. So when we ramp up the demands on our 12-core Mac Pros, we very seldom get a full return on our investment.
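The textbook account of that diminishing return is Amdahl’s law (not named in the original piece, but it is the standard formalisation): if only a fraction p of a task can run in parallel, then n cores yield a speedup of 1 / ((1 − p) + p/n), which can never exceed 1 / (1 − p) however many cores you buy.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores when fraction p of the work
    is parallelisable and the remaining 1 - p stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the work parallelisable, 12 cores manage
# nowhere near a twelvefold speedup:
for n in (2, 4, 12):
    print(f"{n} cores: {amdahl_speedup(0.9, n):.2f}x")
# prints roughly 1.82x, 3.08x and 5.71x
```

With 90% parallelisable work the ceiling is 10x no matter how many cores are thrown at it, which is why the 12-core Mac Pro so seldom pays for itself.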

Snow Leopard delivered Grand Central Dispatch, a novel technology to help software developers get the most out of multi-processor, multi-core Macs. It also brought OpenCL, which lets apps harness the power of the Graphics Processing Unit (GPU) in the hope of accelerating them further.
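Grand Central Dispatch’s central idea is that developers submit small blocks of work to queues, and the system schedules those blocks across whatever cores are available. GCD itself is an Apple API used from C, Objective-C or Swift; as a rough, non-Apple analogue of the same submit-to-a-managed-pool pattern, Python’s standard thread pool behaves similarly (the names below are illustrative, not GCD’s API):

```python
from concurrent.futures import ThreadPoolExecutor

def work(item):
    # Stands in for a GCD block: any small, self-contained task.
    return item * item

# The executor plays the role of a dispatch queue: work is submitted
# asynchronously, and the system decides how to spread it over cores.
with ThreadPoolExecutor() as queue:
    futures = [queue.submit(work, i) for i in range(8)]
    results = [f.result() for f in futures]

print(results)  # the squares of 0..7, computed by the pool
```

The attraction, in GCD as here, is that the programmer expresses only the units of work and their dependencies, leaving the operating system to decide how many cores to use.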

Designing algorithms which can benefit from multiple cores and GPUs is non-trivial; it remains desperately difficult, whether for humans or for so-called parallelising developer tools. Even when you do manage to rewrite your code so that it can be distributed across different cores, you then hit the next performance bottleneck: disk access, or network speed, or …

At least we are now spared those tedious Mandelbrot demos. Although maybe they would have been light relief during this year’s WWDC Keynote.

Updated from the original, which was first published in MacUser volume 24 issue 18, 2008.