On the maximum relative error when computing $x^n$ in floating-point arithmetic

On the maximum relative error when computing $x^n$ in floating-point arithmetic
Jean-Michel Muller
Joint work with S. Graillat and V. Lefèvre
INVA 2014


Floating-point numbers, roundings
Precision-$p$ binary FP number (set $F_p$): either 0 or $x = X \cdot 2^{e_x - p + 1}$, where $X$ and $e_x \in \mathbb{Z}$, with $2^{p-1} \le |X| \le 2^p - 1$.
The exponent range is assumed unlimited → results are valid unless underflow/overflow occurs;
$X$: integral significand of $x$;
$2^{1-p} \cdot X$: significand of $x$;
$e_x$: exponent of $x$.
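The decomposition above can be tried out on ordinary Python floats. The following is a minimal sketch, assuming Python's `float` is IEEE 754 binary64 (so $p = 53$); the helper name `integral_significand` is made up for this illustration.

```python
import math

P = 53  # assumption: Python floats are IEEE 754 binary64, hence p = 53


def integral_significand(x: float) -> tuple[int, int]:
    """Return (X, e_x) with x == X * 2**(e_x - P + 1) and 2**(P-1) <= |X| <= 2**P - 1."""
    if x == 0.0 or not math.isfinite(x):
        raise ValueError("expected a nonzero finite float")
    m, e = math.frexp(x)        # x == m * 2**e with 0.5 <= |m| < 1
    X = int(math.ldexp(m, P))   # scaling by 2**P is exact and yields an integer
    return X, e - 1             # m * 2**e == X * 2**((e - 1) - P + 1)


X, e_x = integral_significand(0.1)
print(X, e_x)
# Rebuilding the float from (X, e_x) is exact as well.
print(math.ldexp(X, e_x - P + 1) == 0.1)
```

Subnormal inputs are normalized by `math.frexp`, which matches the unlimited-exponent-range model of the slide rather than the actual binary64 encoding.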

Floating-point numbers, roundings
In general, the sum, product, quotient, etc., of two FP numbers is not an FP number: it must be rounded;
correct rounding: given a rounding function $\circ$, when $(a \top b)$ is performed, the returned value is $\circ(a \top b)$;
default rounding function RN:
(i) for all FP numbers $y$, $|\mathrm{RN}(t) - t| \le |y - t|$;
(ii) if there are two FP numbers that satisfy (i), $\mathrm{RN}(t)$ is the one whose integral significand is even.

Relative error due to roundings, $u$ and $\gamma$ notations
Let $t \in \mathbb{R}$, $2^e \le t < 2^{e+1}$. We have $2^e \le \mathrm{RN}(t) \le 2^{e+1}$, and
$$|\mathrm{RN}(t) - t| \le 2^{e-p}. \tag{1}$$
→ upper bound on the relative error due to rounding $t$:
$$\left|\frac{\mathrm{RN}(t) - t}{t}\right| \le 2^{-p}. \tag{2}$$
$u = 2^{-p}$: rounding unit.

Relative error due to roundings, $u$ and $\gamma$ notations
[Figure: number line from $2^e$ to $2^{e+1}$ with FP spacing $2^{e-p+1}$; for $\hat{t} = \mathrm{RN}(t)$, $|t - \hat{t}| \le 2^{e-p} \le u \cdot t$.]
Figure 1: In precision-$p$ binary FP arithmetic, in the normal range, the relative error due to rounding to nearest is bounded by $u = 2^{-p}$.

Relative error due to roundings, $u$ and $\gamma$ notations
Floating-point multiplication $a \times b$:
exact result $z = ab$;
computed result $\hat{z} = \mathrm{RN}(z)$;
$$(1 - u) \cdot z \le \hat{z} \le (1 + u) \cdot z; \tag{3}$$
→ when we approximate $\pi_n = a_1 \cdot a_2 \cdots a_n$ by $\hat{\pi}_n = \mathrm{RN}(\cdots \mathrm{RN}(\mathrm{RN}(a_1 \cdot a_2) \cdot a_3) \cdots a_n)$, we have
Theorem 1
$$(1 - u)^{n-1} \pi_n \le \hat{\pi}_n \le (1 + u)^{n-1} \pi_n. \tag{4}$$

Relative error due to roundings, $u$ and $\gamma$ notations
→ relative error on the product $a_1 \cdot a_2 \cdots a_n$ bounded by $\psi_{n-1} = (1 + u)^{n-1} - 1$.
If we define (Higham)
$$\gamma_k = \frac{ku}{1 - ku},$$
then, as long as $ku < 1$ (always holds in practical cases), $k \cdot u \le \psi_k \le \gamma_k$.
→ classical bound: $\gamma_{n-1}$.
For "reasonable" $n$, $\psi_{n-1}$ is very slightly better than $\gamma_{n-1}$, yet $\gamma_{n-1}$ is easier to manipulate;
in single and double precision we never observed a relative error $\ge (n - 1) \cdot u$.
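To get a feel for how close $k \cdot u$, $\psi_k$ and $\gamma_k$ are, here is a small sketch using exact rational arithmetic. The function names `psi` and `gamma` simply mirror the slide's notation, and the choice $u = 2^{-24}$ (binary32) is only an example.

```python
from fractions import Fraction


def psi(k: int, u: Fraction) -> Fraction:
    """psi_k = (1 + u)**k - 1, computed exactly."""
    return (1 + u) ** k - 1


def gamma(k: int, u: Fraction) -> Fraction:
    """gamma_k = k*u / (1 - k*u), defined only when k*u < 1."""
    if k * u >= 1:
        raise ValueError("gamma_k requires k*u < 1")
    return k * u / (1 - k * u)


u = Fraction(1, 2**24)  # example value: rounding unit of binary32
for k in (1, 2, 10, 1000):
    assert k * u <= psi(k, u) <= gamma(k, u)
    # Relative excess of psi_k and gamma_k over the linear term k*u.
    print(k, float(psi(k, u) / (k * u) - 1), float(gamma(k, u) / (k * u) - 1))
```

Both excesses are of order $k \cdot u$, which is why the three quantities are nearly indistinguishable for moderate $k$.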

Special case: $n \le 4$
The bound on the relative error due to rounding can be slightly improved (using a remark by Jeannerod and Rump):
if $2^e \le t < 2^{e+1}$, then $|t - \mathrm{RN}(t)| \le 2^{e-p} = u \cdot 2^e$, and
if $t \ge 2^e \cdot (1 + u)$, then $|t - \mathrm{RN}(t)| / t \le u / (1 + u)$;
if $t = 2^e \cdot (1 + \tau \cdot u)$ with $\tau \in [0, 1)$, then $|t - \mathrm{RN}(t)| / t = \tau \cdot u / (1 + \tau \cdot u) < u / (1 + u)$,
→ the maximum relative error due to rounding is bounded by $u / (1 + u)$ (attained → no further improvement);
→ we can replace (4) by
$$\left(1 - \frac{u}{1+u}\right)^{n-1} \pi_n \le \hat{\pi}_n \le \left(1 + \frac{u}{1+u}\right)^{n-1} \pi_n. \tag{5}$$
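The refined bound $u/(1+u)$ and the fact that it is attained can be checked on binary64. This sketch assumes that CPython's integer true division `a / b` returns the correctly rounded (ties-to-even) quotient, which is what makes the hypothetical helper `rn` a faithful model of RN for rational inputs.

```python
from fractions import Fraction

P = 53                      # assumption: binary64 floats
U = Fraction(1, 2**P)       # rounding unit u = 2**-p


def rn(t: Fraction) -> float:
    """Round the rational t to the nearest binary64 number (assumes int/int is correctly rounded)."""
    return t.numerator / t.denominator


def rel_err(t: Fraction) -> Fraction:
    """Exact relative error |RN(t) - t| / |t|."""
    return abs(Fraction(rn(t)) - t) / abs(t)


worst = 1 + U  # t = 2**0 * (1 + u): exactly halfway between 1 and 1 + 2u
samples = [Fraction(1, 3), Fraction(22, 7), Fraction(10**30 + 7, 10**10 + 3), worst]

for t in samples:
    assert rel_err(t) <= U / (1 + U)

print(rel_err(worst) == U / (1 + U))
```

The midpoint $1 + u$ rounds to 1 under ties-to-even, so its relative error is exactly $u/(1+u)$.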

Special case: $n \le 4$
Property 1
If $1 \le k \le 3$ then
$$\left(1 + \frac{u}{1+u}\right)^k < 1 + k \cdot u.$$
$k = 2$:
$$\left(1 + \frac{u}{1+u}\right)^2 - (1 + 2u) = -\frac{u^2 \cdot (1 + 2u)}{(1+u)^2} < 0;$$
$k = 3$:
$$\left(1 + \frac{u}{1+u}\right)^3 - (1 + 3u) = -\frac{u^3 \cdot (2 + 3u)}{(1+u)^3} < 0.$$
With $k = n - 1$ → for $n \le 4$, the relative error of the iterative product of $n$ FP numbers is bounded by $(n - 1) \cdot u$.
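The two closed forms of Property 1 are easy to confirm with exact rationals. In this sketch the helper name `gap` is hypothetical; it returns the left-hand side minus $1 + k \cdot u$.

```python
from fractions import Fraction


def gap(k: int, u: Fraction) -> Fraction:
    """(1 + u/(1+u))**k - (1 + k*u), computed exactly."""
    return (1 + u / (1 + u)) ** k - (1 + k * u)


for p in (11, 24, 53):
    u = Fraction(1, 2**p)
    assert gap(1, u) < 0
    assert gap(2, u) == -u**2 * (1 + 2 * u) / (1 + u) ** 2
    assert gap(3, u) == -u**3 * (2 + 3 * u) / (1 + u) ** 3
    # For k = 4 the sign flips, so this argument alone does not go beyond n = 4.
    assert gap(4, u) > 0

print(float(gap(4, Fraction(1, 2**24))))
```

For $k = 4$ the difference expands to $(2u^2 + 4u^3 - u^4 - 4u^5)/(1+u)^4$, which is positive for small $u$.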

The particular case of computing powers
“General” case of an iterated product: no proof for $n \ge 5$ that $(n - 1) \cdot u$ is a valid bound (when starting the study we conjectured this is the case);
→ focus on $x^n$, where $x \in F_p$ and $n \in \mathbb{N}$;
we assume the “naive” algorithm is used:
    y ← x
    for k = 2 to n do
        y ← RN(x · y)
    end for
    return y
notation: $\hat{x}_j$ = value of $y$ after the iteration corresponding to $k = j$ in the for loop.
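The naive algorithm translates directly into Python. This is a sketch under the assumption that `float` multiplication is a correctly rounded binary64 operation (true on mainstream platforms); `naive_power` and `relative_error` are illustrative names, and the error is measured exactly with `Fraction`.

```python
from fractions import Fraction

P = 53                   # assumption: binary64 floats
U = Fraction(1, 2**P)    # rounding unit u = 2**-p


def naive_power(x: float, n: int) -> float:
    """Compute x**n by n - 1 successive rounded multiplications."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    y = x
    for _ in range(2, n + 1):
        y = x * y        # y <- RN(x * y)
    return y


def relative_error(x: float, n: int) -> Fraction:
    """Exact relative error of naive_power(x, n) with respect to x**n."""
    exact = Fraction(x) ** n
    return abs(Fraction(naive_power(x, n)) - exact) / abs(exact)


for x in (1.1, 1.3, 1.7, 1.999):
    for n in (2, 6, 20):
        err = relative_error(x, n)
        # Express the error in units of u and compare with n - 1.
        print(x, n, float(err / U), n - 1)
```

The printed errors, in units of $u$, can be compared with the candidate bound $n - 1$ shown next to them.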

Main result
We wish to prove
Theorem 2
Assume $p \ge 5$ (holds in all practical cases). If
$$n \le \sqrt{2^{1/3} - 1} \cdot 2^{p/2},$$
then
$$|\hat{x}_n - x^n| \le (n - 1) \cdot u \cdot x^n.$$
We can assume $1 \le x < 2$;
two cases: $x$ close to 1, and $x$ far from 1.
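To see how large $n$ may be in practice, the condition of Theorem 2 can be tested with integers only: $n \le \sqrt{2^{1/3} - 1} \cdot 2^{p/2}$ is equivalent to $(n^2 + 2^p)^3 \le 2^{3p+1}$. The sketch below uses that reformulation; the function names are hypothetical.

```python
import math


def theorem2_applies(n: int, p: int) -> bool:
    """True when p >= 5 and n <= sqrt(2**(1/3) - 1) * 2**(p/2), tested exactly."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # n**2 <= (2**(1/3) - 1) * 2**p  <=>  (n**2 + 2**p)**3 <= 2 * 2**(3*p)
    return p >= 5 and (n * n + 2**p) ** 3 <= 2 ** (3 * p + 1)


def largest_n(p: int) -> int:
    """Largest n covered by the theorem for precision p, found by bisection."""
    if p < 5:
        raise ValueError("the theorem assumes p >= 5")
    lo, hi = 1, math.isqrt(2**p) + 1   # condition holds at lo and fails at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if theorem2_applies(mid, p):
            lo = mid
        else:
            hi = mid
    return lo


for p in (11, 24, 53, 113):
    print(p, largest_n(p))
```

For binary32 ($p = 24$) the limit is a little over two thousand, and for binary64 ($p = 53$) it is in the tens of millions.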

Preliminary results
First, $(1 - u)^{n-1} \ge 1 - (n - 1) \cdot u$ for all $n \ge 2$ and $u \in [0, 1]$.
→ the left-hand bound of
$$(1 - u)^{n-1} \pi_n \le \hat{\pi}_n \le (1 + u)^{n-1} \pi_n$$
suffices to show that
$$(1 - (n - 1) \cdot u) \cdot x^n \le \hat{x}_n$$
→ to establish the Theorem, we only need to focus on the right-hand bound.

Preliminary results
For $t \ne 0$, define
$$\underline{t} = \frac{t}{2^{\lfloor \log_2 |t| \rfloor}}.$$
We have,
Lemma 3
Let $t$ be a real number. If
$$2^e \le w \cdot 2^e \le |t| < 2^{e+1}, \quad e \in \mathbb{Z} \tag{6}$$
(in other words, if $w \le |\underline{t}|$) then
$$\left|\frac{\mathrm{RN}(t) - t}{t}\right| \le \frac{u}{w}.$$
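Lemma 3 can be exercised numerically. The sketch below computes the normalized value $t / 2^{\lfloor \log_2 |t| \rfloor}$ exactly for rational $t$ and checks the bound $u/w$ on binary64. It assumes that CPython's integer true division is correctly rounded, so that the hypothetical helper `rn` models RN; `significand` and `lemma3_holds` are likewise illustrative names.

```python
from fractions import Fraction

P = 53                   # assumption: binary64 floats
U = Fraction(1, 2**P)    # rounding unit u = 2**-p


def rn(t: Fraction) -> float:
    """Round the rational t to nearest binary64 (assumes int/int is correctly rounded)."""
    return t.numerator / t.denominator


def significand(t: Fraction) -> Fraction:
    """|t| / 2**floor(log2 |t|), a rational in [1, 2)."""
    t = abs(t)
    e = t.numerator.bit_length() - t.denominator.bit_length()
    if Fraction(2) ** e > t:   # bit lengths give floor(log2 t) up to an offset of 1
        e -= 1
    return t / Fraction(2) ** e


def lemma3_holds(t: Fraction, w: Fraction) -> bool:
    """Check |RN(t) - t| / |t| <= u / w for some w with 1 <= w <= significand(t)."""
    if not 1 <= w <= significand(t):
        raise ValueError("w must satisfy 1 <= w <= significand(t)")
    return abs(Fraction(rn(t)) - t) / abs(t) <= U / w


print(significand(Fraction(22, 7)), lemma3_holds(Fraction(22, 7), Fraction(3, 2)))
```

Taking `w = significand(t)` gives the tightest instance of the lemma for a given `t`.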

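[Figure: number line from $2^e$ to $2^{e+1}$. For $t \ge w \cdot 2^e$, $|t - \mathrm{RN}(t)| / t \le u / w$. At the marked point $y$, $|y - \hat{y}| / y = u / (1 + u)$ (largest); at the marked point $z$, $|z - \hat{z}| / z = u / (2 - u)$, with $\hat{y} = \mathrm{RN}(y)$ and $\hat{z} = \mathrm{RN}(z)$.]
Figure 2: The bound on the relative error due to rounding to nearest can be reduced to $u / (1 + u)$. Furthermore, if we know that $w \le \underline{t} = t / 2^e$, then $|\mathrm{RN}(t) - t| / t \le u / w$.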

[Plot: horizontal axis $t$ from 0 to 8, vertical axis from 0 to 0.06.]
Figure 3: Relative error due to rounding, namely $|\mathrm{RN}(t) - t| / t$, for $1/5 \le t \le 8$, and $p = 4$.

Local maximum error for $x^6$ as a function of $x$ ($p = 53$)
Figure 4: The input interval $[1, 2)$ is divided into 512 equal-sized subintervals. In each subinterval, we calculate $x^6$ for 5000 consecutive FP numbers $x$, compute the relative error, and plot the largest attained error.

Main idea behind the proof
At least once in the execution of the algorithm, $x \cdot y$ is far enough from 1 to sufficiently reduce the error bound on the multiplication $y \leftarrow \mathrm{RN}(x \cdot y)$, so that the overall error bound becomes $\le (n - 1) \cdot u$.
    y ← x
    for k = 2 to n do
        y ← RN(x · y)
    end for
    return y
$$\psi_{n-1} = (1 + u)^{n-1} - 1 = (n - 1) u + \left(\tfrac{1}{2} n^2 - \tfrac{3}{2} n + 1\right) u^2 + \cdots$$
→ we have to save $\approx \frac{n^2}{2} u^2$, which requires one of the values $x \cdot y$ to be larger than $\approx 1 + \frac{n^2}{2} u$.

What we are going to show
Unless $x$ is very near 1, at least once $x \cdot y \ge 1 + n^2 u$, so that in (4) the term $(1 + u)^{n-1}$ can be replaced by
$$(1 + u)^{n-2} \cdot \left(1 + \frac{u}{1 + n^2 u}\right).$$
→ we need to bound this last quantity. We have,
Lemma 4
If $0 \le u \le 2 / (3 n^2)$ and $n \ge 3$ then
$$(1 + u)^{n-2} \cdot \left(1 + \frac{u}{1 + n^2 u}\right) \le 1 + (n - 1) \cdot u. \tag{7}$$
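Inequality (7) can be spot-checked with exact rational arithmetic, including at the extreme value $u = 2/(3n^2)$ allowed by the lemma. This is only a numerical sanity check, not a proof; the function names are hypothetical.

```python
from fractions import Fraction


def lemma4_lhs(n: int, u: Fraction) -> Fraction:
    """(1 + u)**(n - 2) * (1 + u / (1 + n**2 * u)), computed exactly."""
    return (1 + u) ** (n - 2) * (1 + u / (1 + n * n * u))


def lemma4_rhs(n: int, u: Fraction) -> Fraction:
    """1 + (n - 1) * u."""
    return 1 + (n - 1) * u


def lemma4_holds(n: int, u: Fraction) -> bool:
    """Check inequality (7) under the hypotheses n >= 3 and 0 <= u <= 2/(3 n**2)."""
    if n < 3 or not 0 <= u <= Fraction(2, 3 * n * n):
        raise ValueError("hypotheses of Lemma 4 are not satisfied")
    return lemma4_lhs(n, u) <= lemma4_rhs(n, u)


for n in (3, 10, 100):
    u_max = Fraction(2, 3 * n * n)
    print(n, lemma4_holds(n, u_max), float(lemma4_rhs(n, u_max) - lemma4_lhs(n, u_max)))
```

The printed slack `rhs - lhs` shows how much room the inequality leaves at the largest admissible `u`.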

Proof of Lemma 4 (with the help of Bruno Salvy)
Proving the Lemma reduces to proving that
$$P(u) = (1 + (n - 1) u)(1 + n^2 u) - (1 + u)^{n-2} (1 + n^2 u + u) \ge 0$$
for $0 \le u \le 2 / (3 n^2)$. We have
$\ln(1 + u) \le u - \frac{u^2}{2} + \frac{u^3}{3}$.
$\ln(1 + u) \le u \Rightarrow (n - 2) \ln(1 + u) < 1 / (2n) \le 1/6$;
For $0 \le t \le 1/6$, $e^t \le 1 + t + \frac{3}{5} t^2$;
→ for $0 \le u \le 2 / (3 n^2)$, to prove that $P(u) \ge 0$ it suffices to prove that
$$Q(n, u) = (1 + (n - 1) u)\left(n^2 u + 1\right) - \left(1 + (n - 2)\left(u - \tfrac{1}{2} u^2 + \tfrac{1}{3} u^3\right) + \tfrac{3}{5} (n - 2)^2 \left(u - \tfrac{1}{2} u^2 + \tfrac{1}{3} u^3\right)^2\right) \times \left(n^2 u + u + 1\right) \ge 0. \tag{8}$$

Proof of Lemma 4 (with the help of Bruno Salvy)
By defining $a = n^2 u$, $(5 n^2 / a^2) \cdot Q(n, u)$ is equal to
$$\begin{aligned}
S(n, a) = {} & -3a + 2 + \frac{\tfrac{29}{2} a + \tfrac{19}{2}}{n} + \frac{3a^2 - 17a - 7}{n^2} - \frac{1}{6} \frac{a (82a - 5)}{n^3} \\
& - \frac{1}{12} \frac{a (33a^2 - 187a + 20)}{n^4} + \frac{1}{3} \frac{a^2 (33a - 8)}{n^5} + \frac{1}{12} \frac{a^2 (12a^2 - 153a + 52)}{n^6} \\
& - \frac{a^3 (4a - 7)}{n^7} - \frac{1}{3} \frac{a^3 (a^2 - 14a + 21)}{n^8} + \frac{4}{3} \frac{a^4 (a - 2)}{n^9} - \frac{1}{3} \frac{a^4 (5a - 8)}{n^{10}} \\
& + \frac{4}{3} \frac{a^5}{n^{11}} - \frac{4}{3} \frac{a^5}{n^{12}}
\end{aligned} \tag{9}$$
We wish to show that $S(n, a) \ge 0$ for $0 \le a \le 2/3$.
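Since both sides are rational functions, the identity $S(n, a) = (5 n^2 / a^2) \cdot Q(n, a / n^2)$ can be confirmed exactly at sample points. The sketch below transcribes $Q$ from (8) and $S$ from (9) as read from the slides; the sample values of $n$ and $a$ are arbitrary.

```python
from fractions import Fraction


def Q(n: int, u: Fraction) -> Fraction:
    """Q(n, u) from (8), with ln(1+u) replaced by its cubic upper bound."""
    log_bound = u - u**2 / 2 + u**3 / 3
    exp_bound = 1 + (n - 2) * log_bound + Fraction(3, 5) * (n - 2) ** 2 * log_bound**2
    return (1 + (n - 1) * u) * (n * n * u + 1) - exp_bound * (n * n * u + u + 1)


def S(n: int, a: Fraction) -> Fraction:
    """S(n, a) from (9), term by term."""
    n = Fraction(n)
    return (
        -3 * a + 2
        + (29 * a + 19) / (2 * n)
        + (3 * a**2 - 17 * a - 7) / n**2
        - a * (82 * a - 5) / (6 * n**3)
        - a * (33 * a**2 - 187 * a + 20) / (12 * n**4)
        + a**2 * (33 * a - 8) / (3 * n**5)
        + a**2 * (12 * a**2 - 153 * a + 52) / (12 * n**6)
        - a**3 * (4 * a - 7) / n**7
        - a**3 * (a**2 - 14 * a + 21) / (3 * n**8)
        + 4 * a**4 * (a - 2) / (3 * n**9)
        - a**4 * (5 * a - 8) / (3 * n**10)
        + 4 * a**5 / (3 * n**11)
        - 4 * a**5 / (3 * n**12)
    )


for n in (3, 5, 17):
    for a in (Fraction(1, 7), Fraction(1, 2), Fraction(2, 3)):
        assert S(n, a) == 5 * n * n / a**2 * Q(n, a / (n * n))
print("identity holds at all sample points")
```

Exact agreement at these points is a useful guard against transcription errors in the thirteen terms of (9).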

We examine the terms of $S(n, a)$ separately. For $a \in [0, 2/3]$ and $n \ge 3$:
$-3a + 2 \ge 0$;
$\left(\tfrac{29}{2} a + \tfrac{19}{2}\right) n^{-1} \ge 19 / (2n)$;
$\dfrac{3a^2 - 17a - 7}{n^2} \ge -6 / n$;
$-\dfrac{1}{6} \dfrac{a (82a - 5)}{n^3} \ge -7 / (10 n)$;
$-\dfrac{1}{12} \dfrac{a (33a^2 - 187a + 20)}{n^4} \ge -17 / (10000 n)$;
$\dfrac{1}{3} \dfrac{a^2 (33a - 8)}{n^5} \ge -3 / (10000 n)$;
$\dfrac{1}{12} \dfrac{a^2 (12a^2 - 153a + 52)}{n^6} \ge -69 / (10000 n)$;
$-\dfrac{a^3 (4a - 7)}{n^7} \ge 0$;
$-\dfrac{1}{3} \dfrac{a^3 (a^2 - 14a + 21)}{n^8} \ge -6 / (10000 n)$;
$\dfrac{4}{3} \dfrac{a^4 (a - 2)}{n^9} \ge -6 / (100000 n)$;
$-\dfrac{1}{3} \dfrac{a^4 (5a - 8)}{n^{10}}$ and $\dfrac{4}{3} \dfrac{a^5}{n^{11}}$ are both $\ge 0$;
$-\dfrac{4}{3} \dfrac{a^5}{n^{12}} \ge -1 / (1000000 n)$.
→ for $0 \le a \le 2/3$ and $n \ge 3$, $S(n, a) \ge 2790439 / (1000000 n)$.
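The final constant is just the sum of the termwise lower bounds, each expressed as a coefficient of $1/n$. A short exact computation, with the list of coefficients copied from the bounds above, confirms the arithmetic.

```python
from fractions import Fraction

# Coefficients c such that each term of S(n, a) is bounded below by c / n,
# in the order in which the terms are examined.
lower_bounds = [
    Fraction(0),               # -3a + 2
    Fraction(19, 2),           # 1/n term
    Fraction(-6),              # 1/n^2 term
    Fraction(-7, 10),          # 1/n^3 term
    Fraction(-17, 10000),      # 1/n^4 term
    Fraction(-3, 10000),       # 1/n^5 term
    Fraction(-69, 10000),      # 1/n^6 term
    Fraction(0),               # 1/n^7 term
    Fraction(-6, 10000),       # 1/n^8 term
    Fraction(-6, 100000),      # 1/n^9 term
    Fraction(0),               # 1/n^10 term
    Fraction(0),               # 1/n^11 term
    Fraction(-1, 1000000),     # 1/n^12 term
]

total = sum(lower_bounds)
print(total, total == Fraction(2790439, 1000000))
```

Because the total is positive, $S(n, a) > 0$ on the whole range, which is what the proof of Lemma 4 needs.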

Two remarks
Remark 1
Assume $n \le \sqrt{2/3} \cdot 2^{p/2}$. If $\exists k \le n$ s.t. $\mathrm{RN}(x \cdot \hat{x}_{k-1}) \le x \cdot \hat{x}_{k-1}$ (i.e., if in the algorithm at least one rounding is done downwards), then
$$\hat{x}_n \le (1 + (n - 1) \cdot u) x^n.$$
Proof. We have
$$\hat{x}_n \le (1 + u)^{n-2} x^n.$$
Lemma 4 implies $(1 + u)^{n-2} < 1 + (n - 1) \cdot u$. Therefore,
$$\hat{x}_n \le (1 + (n - 1) \cdot u) x^n.$$
