At the start of computers there were numbers, and ever since then they have played a major part in both apps and system software. This article explains some of the numeric types used by your Mac, how they differ, and how they can catch you out.
Numbers on computers fall into two broad classes: those which are represented exactly, which are mainly integers, and those which are normally approximated, including most floating point numbers.
Integers are the simplest numbers to represent in binary and hexadecimal, and those which play the fewest tricks. They come in several varieties, determined by their size in bytes, and whether they can be both negative and positive. Old crocks like me still remember when the standard integer was represented in just eight bits. The largest unsigned integer was then 1111 1111 in binary, or FF in hexadecimal, which is 255 in regular decimal notation. If one of those bits had to be used to indicate whether the number is positive or negative, instead of ranging from 0 to 255, they could only range from -128 to +127 in the two’s complement scheme used by almost all processors.
Soon integers grew to 16 bits, then 32, and now the standard length of 64 bits, offering a range of numbers beyond our comprehension, and beyond the needs of even the largest of distributed file systems.
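As a minimal sketch of those ranges, the following uses the fixed-width integer types in the Swift standard library. The constant name allOnes is mine, chosen only for illustration. Swift's signed types use two's complement, so Int8 reaches -128 rather than -127.

```swift
// Ranges of some fixed-width integer types in the Swift standard library.
print("UInt8:", UInt8.min, "...", UInt8.max)
print("Int8: ", Int8.min, "...", Int8.max)
print("Int64:", Int64.min, "...", Int64.max)

// 1111 1111 in binary and FF in hexadecimal are the same 8-bit pattern.
// allOnes is a hypothetical name used only for this example.
let allOnes: UInt8 = 0b1111_1111
print(allOnes == 0xFF, allOnes)

// Read as a signed (two's complement) integer, the same bits mean -1.
print(Int8(bitPattern: allOnes))
```

The last line shows that signedness is purely a matter of interpretation: the bits don't change, only the type used to read them.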
Most problems that arise in integers do so from one of four causes:
- the order of bytes, which can be ‘big-endian’ or ‘little-endian’ according to processor type and setting;
- conversion between different lengths;
- whether signed or unsigned;
- overflow, in which the result of an operation, such as the product of two integers, requires a number larger than the maximum for their length.
Together, these can result in quite complex errors. For example, suitably misinterpreted as a signed integer using the wrong byte order, the 32-bit unsigned integer for 65,535 (0000 FFFF) can become -65,536 (FFFF 0000). Integer arithmetic opens up other possibilities for error, such as the evergreen favourite of dividing by zero, but compared with floating point operations those are relatively straightforward.
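The four causes listed above can each be reproduced in a few lines of Swift. This is only a sketch using standard library calls (byteSwapped, init(bitPattern:), init(truncatingIfNeeded:), init(exactly:) and multipliedReportingOverflow(by:)); the variable names and the sample value 70,000 are my own choices for illustration.

```swift
// Byte order and signedness: the same four bytes, read two ways.
let original: UInt32 = 0x0000_FFFF            // 65,535
let swapped = original.byteSwapped            // bytes reversed: FFFF 0000
let misread = Int32(bitPattern: swapped)      // same bits read as signed
print(original, String(swapped, radix: 16, uppercase: true), misread)

// Conversion between lengths: 70,000 doesn't fit in 16 bits.
let wide: Int32 = 70_000
let truncated = Int16(truncatingIfNeeded: wide)  // keeps only the low 16 bits
let checked = Int16(exactly: wide)               // nil, as the value can't be represented
print(truncated, String(describing: checked))

// Overflow: the product needs more than 32 bits.
let (partial, overflowed) = Int32.max.multipliedReportingOverflow(by: 2)
print(partial, overflowed)
```

Swift's ordinary arithmetic operators trap on overflow rather than wrapping silently, which is why the reporting variant is used here to inspect the wrapped result.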
Floating point numbers
Integers are fine for counting integral objects such as people and file sizes, but in the real world most things have to be measured in floating point or decimal numbers like 3.14159. In maths, those numbers are drawn from a continuous range which has to include extremely large positive and negative values, and many which are very close to zero. They’re most familiar to us from engineering or scientific notation which expresses them in terms of a number from 1.0 to almost 10.0, multiplied by a power of ten, e.g. 1.68301 x 10^-6, which is just above zero at 0.00000168301.
The most widely used form of floating point number in macOS is the Double, which uses 64 bits to encode a number using similar principles to engineering/scientific notation, only the powers used aren’t decimal but binary, which makes them more difficult to grok. In decimal notation, with the radix 10, 0.00000168301 has the significand of 1.68301 and the exponent of -6, making it 1.68301 x 10^-6. As a computer Double, the radix is 2 (binary), so it has a significand of 1.76476389376 and an exponent of -20, making it 1.76476389376 x 2^-20.
Some Doubles are exact expressions of the number they’re trying to represent. An obvious example is 1.0, represented as 1.0 x 2^0. But even fairly simple numbers like 71.3927 are confusing, with a representation of 1.1155109375 x 2^6 (radix 2).
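Swift exposes both parts of a Double directly, so the figures above can be checked with the standard library's significand and exponent properties. This is a sketch only; the variable names are mine, and the printed significands are shortest round-trip decimals rather than the exact binary values.

```swift
// The radix used by Double is 2, not 10.
print(Double.radix)

// 0.00000168301 is stored as roughly 1.76476389376 x 2^-20.
let small = 0.00000168301
print(small.significand, small.exponent)

// 71.3927 is stored as roughly 1.1155109375 x 2^6.
let plain = 71.3927
print(plain.significand, plain.exponent)

// 1.0 is exact: 1.0 x 2^0.
let one = 1.0
print(one.significand, one.exponent)

// Reassembling the parts gives back the identical Double.
let rebuilt = Double(sign: plain.sign,
                     exponent: plain.exponent,
                     significand: plain.significand)
print(rebuilt == plain)
```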
Unlike mathematical numbers, there’s a finite number of different Doubles, and their distribution is far from even. The same Double which represents 71.39270000000000 also represents 71.39270000000001, and all the numbers in between them, for all but one of which it is only an approximation. Around those numbers, there are roughly 70 trillion different floating point numbers per unit (1.0) step in number. These become more dense around zero, and less dense at the extreme ends of the number line. As Doubles become larger in absolute value (disregarding their sign), so they become less precise in absolute but not relative terms.
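The spacing between neighbouring Doubles can be measured with the standard library's ulp ("unit in the last place") and nextUp properties. The sketch below assumes nothing beyond those; the sample magnitudes in the loop are arbitrary choices of mine.

```swift
let x = 71.3927

// The gap from x to the next larger Double, and that neighbour itself.
print(x.ulp, x.nextUp)

// How many Doubles fit into a unit (1.0) step around x: 2^46, about 70 trillion.
print(1.0 / x.ulp)

// Two decimal literals that differ in the 14th decimal place.
print(71.39270000000001 == x)

// Absolute spacing grows with magnitude, while relative spacing stays
// of the same order.
for value in [1.0e-6, 1.0, 1.0e6, 1.0e12] {
    print(value, value.ulp, value.ulp / value)
}
```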
Because they’re only approximations, Doubles suffer several problems which can adversely affect calculating with them. These include rounding and cancellation errors.
Rounding errors occur because Doubles have fixed length, so the last place has to be rounded up or down to give the best approximation to the real number. The standard for floating point (IEEE 754) specifies no fewer than five different rounding functions, which can result in a Double being rounded up or down. Although the relative errors which result from rounding should be small, they can accumulate in long series of calculations to the point where they affect overall accuracy.
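A classic way to watch rounding errors accumulate is to add 0.1 repeatedly, as 0.1 has no exact binary representation. This is a minimal sketch with loop counts picked purely for illustration, not a measure of what any real calculation would suffer.

```swift
// Ten additions of 0.1 don't quite reach 1.0.
var total = 0.0
for _ in 0..<10 {
    total += 0.1
}
print(total, total == 1.0)

// Over a million additions the individual roundings add up to a
// visible, though still relatively small, drift from 100,000.
var million = 0.0
for _ in 0..<1_000_000 {
    million += 0.1
}
print(million, million - 100_000.0)
```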
Cancellation errors can be very large, even when only the result of a single operation. This term refers to potentially very inaccurate results from subtracting numbers which are very close in value. When almost all the digits of the result are lost, these errors can be catastrophic, and may cause the order of calculations to determine the result.
These can be illustrated by two simple calculations, each of which should return a result of exactly 0.0:
((10000000.001 - 10000000.000) - 0.001) * 1.0e8
(10000000.001 - (10000000.000 + 0.001)) * 1.0e8
Yet using Swift Doubles, the first returns the incorrect result of 0.016391277311150754, while the second returns the expected 0.0.
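Those two calculations can be pasted straight into a Swift playground or script. The constant names first and second are mine; everything else is taken from the expressions above.

```swift
// Subtracting first: 10000000.001 can't be stored exactly, and the
// subtraction exposes its representation error, which is then scaled up.
let first = ((10000000.001 - 10000000.000) - 0.001) * 1.0e8

// Adding first: the sum rounds to the same Double as 10000000.001,
// so the difference is exactly zero.
let second = (10000000.001 - (10000000.000 + 0.001)) * 1.0e8

print(first)
print(second)
```

Mathematically the two expressions are identical, so the difference between them comes entirely from the order in which the operations are performed.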
With a whole IEEE standard to themselves, floating point numbers have grown their own subdivision of errors and non-errors. The most commonly encountered of these is the NaN, Not a Number, which used to puzzle those plugging through spreadsheets when a formula attempted a heinous crime such as division by zero. The joy of NaNs is their propagation: once a NaN creeps into a calculation, it’s likely to turn the whole thing NaN; I’ve always wondered why this phenomenon was never termed NaNbread. Then there are two different signed zeroes, +0 and -0, or if you really want a choice, why not have an unsigned zero too, and then decide whether you want all three to be equal or not.
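NaN propagation and the two signed zeroes are easy to see in Swift. In this sketch the zero is held in a constant rather than written as a literal in each division, and the names are my own; note that under IEEE 754 it's zero divided by zero which gives NaN, while a non-zero number divided by zero gives an infinity.

```swift
let zero = 0.0

// Zero divided by zero is Not a Number.
let notANumber = zero / zero
print(notANumber.isNaN)

// Once a NaN is in a calculation, it propagates through the rest.
let poisoned = (notANumber + 1.0) * 2.0
print(poisoned.isNaN)

// A NaN isn't even equal to itself.
print(notANumber == notANumber)

// A non-zero number divided by zero gives an infinity instead.
let infinity = 1.0 / zero
print(infinity)

// +0 and -0 compare as equal, but carry different signs...
let negativeZero = -zero
print(zero == negativeZero, negativeZero.sign == .minus)

// ...and the difference shows up as soon as you divide by them.
print(1.0 / negativeZero)
```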
Some systems also support extended precision beyond Doubles. One of the advances brought by the first widely used maths coprocessor, Intel’s 8087, was the availability of 80-bit Extended calculations. Although valuable for some, in general, mixing precisions leads to further strange errors which can be very hard to trace. macOS tries to avoid those, and ARM processors don’t have any Extended features, which have to be implemented in additional libraries for those that need them.
You will occasionally come across other numeric formats, including fixed point and arbitrary precision. These don’t normally have any direct support by general purpose processors, but are implemented in libraries, making them considerably slower and non-transferable. And then there are arrays of numbers in vectors and matrices, complex numbers, and everything else that mathematicians have devised. There seems no end.
Start with Jean-Michel Muller et al (2018), Handbook of Floating-Point Arithmetic, 2nd ed, Birkhäuser, ISBN 978 3 319 76525 9. Then progress to Peter Kornerup and David W Matula (2010), Finite Precision Number Systems and Arithmetic, Cambridge UP, ISBN 978 0 521 76135 2. Complete the basics with Jean-Michel Muller (2016), Elementary Functions: Algorithms and Implementation, 3rd ed, Birkhäuser, ISBN 978 1 4899 7981 0. You can then progress to matrices, on which there is a huge literature.