Can you trust floating-point arithmetic on Apple Silicon?

In the last five months, we’ve read endless benchmarks run on M1 Macs, and a great deal about their speed. This article asks the other essential question: how accurate are they? Specifically, how does ARM floating-point arithmetic compare with that on Intel processors?

If speed benchmarks seem a bit geeky, floating-point arithmetic might appear as dull as ditchwater. But it's very important, as so much in macOS relies on the processor's floating-point instructions working perfectly. Way back in the days of crude colour displays, screen graphics were computed using integers; after all, that's what a display pixel was. Those integers have long since been replaced by floating-point numbers, so every calculation for the display now relies on floating-point arithmetic.

One good way of testing the ARM CPU's floating-point accuracy against that of Intel processors is to look at some well-known calculations which floating-point arithmetic generally gets wrong, and whose errors vary between implementations. I have chosen three from the Handbook of Floating-Point Arithmetic (see reference), in which current Intel processors don't return the result obtained by exact calculation. This may seem perverse, but looking through a vast number of correctly-performed calculations tells you far less. By their errors you shall know them!

The Muller-Kahan Test (Muller 1.3.2.1)

This calculates a sequence which seems to converge to an incorrect limit when compared against exact calculation. Swift code is:
// Muller's recurrence: u(n+1) = 111 - 1130/u(n) + 3000/(u(n)*u(n-1))
let max = 21            // number of terms to compute
var u: Double = 2.0     // u(1)
var v: Double = -4.0    // u(2)
var w: Double
for i in 3...max {
    w = 111.0 - 1130.0/v + 3000.0/(v*u)
    u = v
    v = w               // v now holds u(i)
}

Using exact arithmetic, the value of v should converge on 6 as the number of iterations (max) tends towards infinity. In floating-point arithmetic, though, rounding errors creep in even in the early iterations, and the sequence converges on 100 instead. On Intel processors, when max = 21, the final value of v is 99.8985692661829, and that's exactly the same when run on an M1.
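
To watch where the drift sets in, the same loop can be wrapped in a small function which prints each term as it's computed. This is only a minimal sketch for anyone who wants to reproduce the figures themselves; the function name mullerKahan(iterations:) is purely illustrative and isn't taken from the Handbook. Swift code is:
func mullerKahan(iterations: Int) -> Double {
    var u: Double = 2.0     // u(1)
    var v: Double = -4.0    // u(2)
    for i in 3...iterations {
        // u(n+1) = 111 - 1130/u(n) + 3000/(u(n)*u(n-1))
        let w = 111.0 - 1130.0/v + 3000.0/(v*u)
        u = v
        v = w
        print("term \(i): \(v)")    // print each term so the drift towards 100 can be seen
    }
    return v
}
let finalValue = mullerKahan(iterations: 21)
print("final value of v: \(finalValue)")    // 99.8985692661829 on both Intel and M1, as above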

The Chaotic Bank (Muller 1.3.2.2)

This is phrased as a story in which a man goes to a bank, which promises him that, if he deposits exactly $(e – 1) in an account, then at the end of each year they will multiply the balance by the number of years the account has been open (×1 at the end of the first year, ×2 at the end of the second, and so on, up to ×25 in the 25th and final year), then deduct $1 as their fee. Swift code is:
// starting balance: (e - 1), rounded to the nearest Double
var account: Double = 1.71828182845904523536028747135
for i in 1...25 {
    // at the end of year i, multiply the balance by i, then deduct the $1 fee
    account = (Double(i)*account) - 1.0
}

Exact arithmetic shows that the amount in the account dwindles towards 0 as the years pass. However, if the initial value of (e – 1) is slightly below the true value, the result tends to minus infinity; if it's slightly above, the result tends to positive infinity. On Intel processors, the final value of account is 1201807247.4104486, and again that's exactly the same when run on an M1.
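
That sensitivity is easy to demonstrate. In the minimal sketch below, the same loop is wrapped in a function taking the initial deposit as a parameter, so it can be re-run with the deposit nudged slightly below and slightly above (e – 1); the helper name chaoticBank(startingWith:) isn't from the Handbook, and the size of the nudge is arbitrary. Swift code is:
func chaoticBank(startingWith deposit: Double) -> Double {
    var account = deposit
    for i in 1...25 {
        // at the end of year i, multiply the balance by i, then deduct the $1 fee
        account = (Double(i)*account) - 1.0
    }
    return account
}
let eMinusOne = 1.71828182845904523536028747135    // (e - 1), rounded to the nearest Double
print(chaoticBank(startingWith: eMinusOne))             // the value quoted above
print(chaoticBank(startingWith: eMinusOne - 1.0e-12))   // deposit slightly below (e - 1)
print(chaoticBank(startingWith: eMinusOne + 1.0e-12))   // deposit slightly above (e - 1)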

Rump’s Function (Muller 1.3.2.3)

Rump designed a function in 1988 which has continued to return incorrect results since he first ran it on an IBM S/370 computer. The numbers here have been carefully chosen so that they're exactly representable in binary floating-point arithmetic with a precision of at least 17 bits. Swift code is:
// inputs chosen by Rump to be exactly representable
let a: Double = 77617.0
let b: Double = 33096.0
// powers of b, built up by repeated multiplication
let b2: Double = b*b
let b4: Double = b2*b2
let b6: Double = b4*b2
let b8: Double = b4*b4
let a2: Double = a*a
let firstexpr: Double = (11.0*a2*b2) - b6 - (121.0*b4) - 2.0
// Rump's function: 333.75*b^6 + a^2*(11*a^2*b^2 - b^6 - 121*b^4 - 2) + 5.5*b^8 + a/(2*b)
let f = (333.75*b6) + (a2*firstexpr) + (5.5*b8) + (a/(2.0*b))

On Rump’s IBM S/370, this returned 1.172603… in single, double and extended precision, although the exact result is -0.827396…. On a modern Intel processor using Doubles, this returns -1.1805916207174113e+21, and that’s exactly the same when run on an M1.
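
Part of what made Rump's example so insidious on the S/370 was that single, double and extended precision all agreed on the same wrong answer, so it's worth repeating the calculation at more than one precision on modern processors. The generic sketch below isn't from the Handbook; it simply evaluates the same expression for any BinaryFloatingPoint type, so the Float and Double results can be compared directly. Swift code is:
// Rump's function, generic over the floating-point type used for the arithmetic
func rump<T: BinaryFloatingPoint>(_ a: T, _ b: T) -> T {
    let b2 = b*b
    let b4 = b2*b2
    let b6 = b4*b2
    let b8 = b4*b4
    let a2 = a*a
    let firstexpr = (11.0*a2*b2) - b6 - (121.0*b4) - 2.0
    return (333.75*b6) + (a2*firstexpr) + (5.5*b8) + (a/(2.0*b))
}
print("Float:  \(rump(Float(77617), Float(33096)))")
print("Double: \(rump(Double(77617), Double(33096)))")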

Conclusion

In each of these three unusual test cases, the erroneous results returned by the M1 using double-precision arithmetic are identical to those returned by a current Intel processor, specifically a 3.2 GHz 8-core Intel Xeon W. This implies that results from floating-point arithmetic on the two processors should be almost (if not completely) identical, even when they don't match the results of exact calculation.

Reference

Muller, Brunie et al. (2018) Handbook of Floating-Point Arithmetic, 2nd ed, Birkhäuser, ISBN 978 3 319 76525 9.