Explainer: numbers in macOS

At the start of computers there were numbers, and ever since then they have played a major part in both apps and system software. This article explains some of the numeric types used by your Mac, how they differ, and how they can catch you out.

Numbers on computers fall into two broad classes: those which are represented exactly, which are mainly integers, and those which are normally approximated, including most floating point numbers.

Integers

These are the simplest to represent in binary and hexadecimal, and those which play the fewest tricks. They come in several varieties, determined by their size in bytes, and whether they can be both negative and positive. Old crocks like me still remember when the standard integer was represented in just eight bits. The largest unsigned integer was then 1111 1111 in binary, or FF in hexadecimal, which is 255 in regular decimal notation. If one of those bits had to be used to indicate whether the number is positive or negative, instead of ranging from 0 to 255, they could only be between -128 and +127 in the two’s complement form that’s now universal, or between -127 and +127 with a simple sign bit.
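Those limits are easy to check for yourself. This is a minimal sketch in Swift, assuming only the standard library's fixed-width integer types; the constant names are mine, chosen for illustration. Swift's signed integers use two's complement, so the signed 8-bit range runs from -128 to 127.

```swift
// Limits of the 8-bit integer types in the Swift standard library.
let unsignedMax = UInt8.max   // 1111 1111 in binary, FF in hexadecimal
let signedMin = Int8.min      // two's complement gives one extra negative value
let signedMax = Int8.max

print(unsignedMax, String(unsignedMax, radix: 2), String(unsignedMax, radix: 16, uppercase: true))
// prints "255 11111111 FF"
print(signedMin, signedMax)
// prints "-128 127"

// The largest signed 64-bit integer, for comparison.
print(Int64.max)
// prints "9223372036854775807"
```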

Soon integers grew to 16 bits, then 32, and now the standard length of 64 bits, offering a range of numbers beyond our comprehension, or even the largest of distributed file systems.

Most problems that arise in integers do so from one of four causes:

the order of bytes, which can be ‘big-endian’ or ‘little-endian’ according to processor type and setting;
conversion between different lengths;
whether signed or unsigned;
overflow, in which the result of an operation, such as the product of two integers, requires a number larger than the maximum for their length.

Together, these can result in quite complex errors. For example, suitably misinterpreted as a signed integer using the wrong byte order, the 32-bit unsigned integer for 65,535 (0000 FFFF) can become -65,536 (FFFF 0000) in two’s complement. Integer arithmetic opens up other possibilities for error, such as the evergreen favourite of dividing by zero, but compared with floating point operations those are relatively straightforward.
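The four causes listed above can each be reproduced in a few lines. This is a sketch rather than anything definitive, written in Swift using standard library calls (byteSwapped, Int32(bitPattern:), UInt8(truncatingIfNeeded:) and multipliedReportingOverflow(by:)); the constant names are illustrative. Swift reads the swapped bit pattern as a two's complement Int32.

```swift
// The 32-bit unsigned integer 65,535, written 0000 FFFF in hexadecimal.
let original: UInt32 = 0x0000_FFFF

// Byte order: reading its four bytes in the opposite order gives FFFF 0000.
let swapped = original.byteSwapped

// Signedness: reinterpret the same 32 bits as a signed integer.
let misread = Int32(bitPattern: swapped)
print(original, swapped, misread)
// prints "65535 4294901760 -65536"

// Conversion between lengths: keep only the lowest 8 bits.
let truncated = UInt8(truncatingIfNeeded: original)

// Overflow: the product doesn't fit in 32 bits, so ask for a report
// instead of letting the * operator stop the program.
let (partial, overflowed) = Int32.max.multipliedReportingOverflow(by: 2)
print(truncated, partial, overflowed)
// prints "255 -2 true"
```

Swift's ordinary arithmetic operators trap on integer overflow rather than wrapping silently, which is why the reporting variant is used here.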

Floating point numbers

Integers are fine for counting integral objects such as people and file sizes, but in the real world most things have to be measured in floating point or decimal numbers like 3.14159. In maths, those numbers are drawn from a continuous range which has to include extremely large positive and negative values, and many which are very close to zero. They’re most familiar to us from engineering or scientific notation which expresses them in terms of a number from 1.0 to almost 10.0, multiplied by a power of ten, e.g. 1.68301 x 10^-6, which is just above zero at 0.00000168301.

The most widely used form of floating point number in macOS is the Double, which uses 64 bits to encode a number using similar principles to engineering/scientific notation, only the powers used aren’t decimal but binary, which makes them more difficult to grok. In decimal notation, with the radix 10, 0.00000168301 has the significand of 1.68301 and the exponent of -6, making it 1.68301 x 10^-6. As a computer Double, the radix is 2 (binary), so it has a significand of 1.76476389376 and an exponent of -20, making it 1.76476389376 x 2^-20.

Some Doubles are exact expressions of the number they’re trying to represent. An obvious example is 1.0, represented as 1.0 x 2^0. But even fairly simple numbers like 71.3927 are confusing, with a representation of 1.1155109375 x 2^6.
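Swift exposes these parts directly, so the binary significand and exponent of the examples above can be inspected. This is a minimal sketch assuming the standard library's exponent and significand properties on Double; the constant names are mine. The significands are stored in binary, so they're only close to the decimal figures quoted here.

```swift
let small = 0.00000168301
print(small.exponent)      // prints "-20"
print(small.significand)   // close to 1.76476389376

let simple = 71.3927
print(simple.exponent)     // prints "6"
print(simple.significand)  // close to 1.1155109375

// Rebuilding the value from its parts returns the original Double.
let rebuilt = Double(sign: .plus, exponent: simple.exponent, significand: simple.significand)
print(rebuilt == simple)   // prints "true"
```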

Unlike mathematical numbers, there’s a finite number of different Doubles, and their distribution is far from even. The same Double which represents 71.39270000000000 also represents 71.39270000000001, and all the numbers in between them, for all but one of which it is only an approximation. Around those numbers, there are roughly 70 trillion different floating point numbers per unit (1.0) step. These become more dense around zero, and less dense at the extreme ends of the number line. As Doubles become larger in absolute value (disregarding their sign), so they become less precise in absolute but not relative terms.
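The spacing between neighbouring Doubles can be measured using the ulp (unit in the last place) property. This sketch assumes only the Swift standard library, and its constant names are illustrative. Between 64 and 128 the spacing is 2^-46, so there are 2^46 Doubles per unit step, or just over 70 trillion.

```swift
let x = 71.3927

// Gap between x and the next larger Double.
let spacing = x.ulp
print(x.nextUp - x == spacing)   // prints "true"

// Number of Doubles in a unit (1.0) step around x: 2^46.
let perUnit = 1.0 / spacing
print(perUnit)                   // prints "70368744177664.0"

// Two decimal literals closer together than that gap become the same Double.
let sameDouble = (71.39270000000000 == 71.39270000000001)
print(sameDouble)                // prints "true"

// The gap grows with magnitude: absolute precision falls, relative precision doesn't.
let one = 1.0
let large = 1.0e15
print(one.ulp < large.ulp)       // prints "true"
```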

Because they’re only approximations, Doubles suffer several problems which can adversely affect calculating with them. These include rounding and cancellation errors.

Rounding errors occur because Doubles have fixed length, so the last place has to be rounded up or down to give the best approximation to the real number. The standard for floating point (IEEE 754) specifies no fewer than five different rounding modes, which can result in a Double being rounded up or down. Although the relative errors which result from rounding should be small, they can accumulate in long series of calculations to the point where they affect overall accuracy.
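Accumulation doesn't need a long series to become visible. As a small illustration in Swift, under the default round-to-nearest mode, adding 0.1 ten times doesn't give exactly 1.0, because 0.1 has no exact binary representation and each addition is rounded again. The variable names are mine.

```swift
// Add 0.1 ten times; each step rounds to the nearest Double.
var total = 0.0
for _ in 1...10 {
    total += 0.1
}
print(total == 1.0)       // prints "false"
print(total)              // prints "0.9999999999999999"

// The same effect in a single addition.
let sumMatches = (0.1 + 0.2 == 0.3)
print(sumMatches)         // prints "false"
```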

Cancellation errors can be very large, even when only the result of a single operation. This term refers to potentially very inaccurate results from subtracting numbers which are very close in value. When almost all the digits of the result are lost, these errors can be catastrophic, and may cause the order of calculations to determine the result.

These can be illustrated by two simple calculations, each of which should return a result of exactly 0.0:
((10000000.001 - 10000000.000) - 0.001) * 1.0e8
and
(10000000.001 - (10000000.000 + 0.001)) * 1.0e8
Yet using Swift Doubles, the first returns the incorrect result of 0.016391277311150754.
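Both calculations can be run as they stand. In this sketch the constant names first and second are mine; the expressions are those given above. The first subtraction leaves only the rounding error made when 10000000.001 was stored, and multiplying by 1.0e8 makes that error obvious.

```swift
// Subtracting two nearly equal numbers first exposes the rounding error in 10000000.001.
let first = ((10000000.001 - 10000000.000) - 0.001) * 1.0e8

// Adding the small number first lets the two large values cancel exactly.
let second = (10000000.001 - (10000000.000 + 0.001)) * 1.0e8

print(first)    // about 0.0164 rather than 0.0
print(second)   // prints "0.0"
```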

With a whole IEEE standard to themselves, floating point numbers have grown their own subdivision of errors and non-errors. The most commonly encountered of these is the NaN, Not a Number, which used to puzzle those plugging through spreadsheets when a formula attempted a heinous crime such as dividing zero by zero. The joy of NaNs is their propagation: once a NaN creeps into a calculation, it’s likely to turn the whole thing NaN; I’ve always wondered why this phenomenon was never termed NaNbread. Then there are two different signed zeroes, +0 and -0, or if you really want a choice, why not have an unsigned zero too, and then decide whether you want all three to be equal or not.
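These special values are easily summoned in Swift. The following is a sketch using only standard library properties such as isNaN and sign, with constant names of my own. In IEEE 754 arithmetic, dividing a non-zero number by zero gives an infinity, while zero divided by zero gives a NaN.

```swift
let zero = 0.0
let notANumber = zero / zero   // no meaningful value, so the result is NaN
let infinite = 1.0 / zero      // IEEE 754 defines this as +infinity

print(notANumber.isNaN)                  // prints "true"
print(infinite == Double.infinity)       // prints "true"

// Propagation: any arithmetic involving a NaN yields a NaN.
let spread = notANumber * 2.0 + 1.0
print(spread.isNaN)                      // prints "true"

// A NaN isn't even equal to itself.
print(notANumber == notANumber)          // prints "false"

// The two signed zeroes compare equal, but can still be told apart.
let negativeZero = -0.0
print(negativeZero == zero)              // prints "true"
print(negativeZero.sign == .minus)       // prints "true"
print(1.0 / negativeZero)                // prints "-inf"
```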

Some systems also support extended precision beyond Doubles. One of the advances brought by the first widely used maths coprocessor, Intel’s 8087, was the availability of 80-bit Extended calculations. Although valuable for some, in general, mixing precisions leads to further strange errors which can be very hard to trace. macOS tries to avoid those, and ARM processors don’t have any Extended features, which have to be implemented in additional libraries for those that need them.

Others

You will occasionally come across other numeric formats, including fixed point and arbitrary precision. These don’t normally have any direct support by general purpose processors, but are implemented in libraries, making them considerably slower and non-transferable. And then there are arrays of numbers in vectors and matrices, complex numbers, and everything else that mathematicians have devised. There seems no end.

Further reading

Start with Jean-Michel Muller et al (2018), Handbook of Floating-Point Arithmetic, 2nd ed, Birkhäuser, ISBN 978 3 319 76525 9. Then progress to Peter Kornerup and David W Matula (2010), Finite Precision Number Systems and Arithmetic, Cambridge UP, ISBN 978 0 521 76135 2. Complete the basics with Jean-Michel Muller (2016), Elementary Functions: Algorithms and Implementation, 3rd ed, Birkhäuser, ISBN 978 1 4899 7981 0. You can then progress to matrices, on which there is a huge literature.

6 Comments


1

David Charlap on April 24, 2021 at 3:57 pm

Great article. Here are a few other interesting points:

Although sometimes used in apps, the Intel 80-bit floats (aka “long double”) were intended to be used to minimize round-off error. If I remember correctly, the 8087 and related FPUs will perform computations at 80-bit precision and will then convert the results to 32- or 64-bit afterward, in order to eliminate potential round-off errors from intermediate processing steps (that should be hidden to software).

I don’t know if this is still the case, but the technique (computing at higher precision than visible to the user) goes back to the days of calculators which frequently perform computations with 2-3 more digits than are visible on the display, in order to minimize round-off error.

Another interesting bit of trivia, although unrelated to macOS, is that internal representations vary across CPU architectures. Although all modern desktop systems today use twos-complement integers and IEEE floating point numbers (with big- and little-endian variants of each), this isn’t universally the case.

For instance, IBM mainframe architectures (360, 370 and its successors to this day) use power-of-16 floats, and some older architectures used sign-magnitude integers.

For example, in sign-magnitude form, an 8-bit signed integer representing -1 is 10000001, compared to a twos-complement CPU where it is 11111111. This also means that sign-magnitude integers have a representation for -0.

IBM floats are also interesting. Like IEEE floats, they have a sign bit, a mantissa and an exponent, but IBM’s exponents represent powers of 16, not powers of 2. This provides a greater range of values, but potentially less precision, since changes to the exponent result in 4-bit shifts in the mantissa. It also means that IBM systems can’t assume the most significant bit of the mantissa to be a 1 (as they can with IEEE floats), so you lose the extra bit of precision that IEEE “normalized” representations provide.

Years ago, I had to write software for transferring binary data between PCs (running OS/2) and IBM 9370 mainframes. It was fun coming up with integer and floating point conversion functions that worked entirely in terms of binary operations (shifts, masks, etc.) because it was far too expensive, computationally, to do it arithmetically on the slow processors of the time.

- 2
  
  hoakley on April 24, 2021 at 10:04 pm
  
  Thank you.
  Yes, in the 8087 the Extended format was primarily intended for internal use. However, it had to be supported by all compilers which supported the co-processor. In those days I used products from MicroWay, which were the bee’s knees, and led me astray into Transputers.
  Howard.
  
3

Michael Newbery on April 25, 2021 at 1:22 am

David Charlap covered IBM360. The Burroughs Large Systems machines had a 48 bit word, with 96 bit doubles. All numbers were held as floating point (with a power of 8 exponent). Integers were simply floating point numbers with an exponent of zero. All numbers are signed, so zero is +0 or -0 (they compare equal); the maximum integer is 549,755,813,887; the maximum double integer is 302,231,454,903,657,293,676,544; single precision numbers are in the range 4.313591466736256e+68 to 8.758115402030107e-47, and doubles are (8**13-1)*8**32767 to 8**(-32755), which most current processors will simply fail to cope with.
But wait, these machines could also be coerced to do decimal arithmetic (4 bit BCD representation of digits) to keep the accountants happy (and eliminate a whole lot of currency rounding problems).

Meanwhile, some languages (such as Ruby) will also happily let you define Rationals, where 1/3 is not approximated to 0.33333 (or, more problematically, its binary representation), but held exactly as Rational(1,3), which can be very useful for avoiding some errors.

And the takeaway from all of this… use numerical libraries whenever possible, where some very smart people have created algorithms that try to eliminate the side-effects of these issues.

- 4
  
  hoakley on April 25, 2021 at 7:27 pm
  
  Thank you.
  The only problem with using some libraries is your reliance on their accuracy and potential errors. For example, while Apple used to publish an excellent and highly-detailed account of its numerics, in the SANE manual, I don’t recall any documentation on numerical support in Mac OS X, at least not of comparable detail or quality. In some cases, even the description of the function is sketchy, there’s no information on the algorithms used, nor their expected errors or limitations.
  Howard.
  
  - 5
    
    Michael Newbery on April 25, 2021 at 10:50 pm
    
    Ah, yes, the sort of libraries I was meaning were things like IMSL, NAG, Octave and the like.
    Alas, the sort of detailed documentation which used to be provided by Apple on basic system functions (e.g. SANE) seems to have quietly vanished.
    
    - 6
      
      hoakley on April 26, 2021 at 8:12 am
      
      Thank you.
      Yes, I’ve had licences for IMSL and NAG in the past. They remain seriously specialist, very expensive, and as far as I can see at present don’t make use of any of the features of the M1 like GPU and the neural engine. As for accessing them from the likes of Swift, I think that I’d be on my own.
      Howard.
      
