Algorithms using real numbers Floating point arithmetic limited - PDF document

Algorithms using “real” numbers • Floating point arithmetic – limited precision Algorithms using “real” numbers – computational errors – IEEE standard – type double in Java • Computational errors due to limited precision arithmetic • floating point arithmetic – typical errors involving summation, convergence, subtraction • typical computational errors due to limited precision arithmetic – how to avoid some of them • Applications of numerical algorithms • advice for constructing algorithms using real numbers – Ex. π -computation – graphical / geometric computations – scientific computations 1 2 Floating point arithmetic: limited precision • A computer can handle only a finite subset of the reals directly Floating point arithmetic: limited precision in hardware • These numbers make up a floating point number system • Each floating point number in F ( β, s, m, M ) has the form F ( β, s, m, M ) characterized by y = ± d 1 .d 2 . . . d s · β e , β ∈ N\{ 1 } a base a number of digits s ∈ N with m ≤ e ≤ M and 0 ≤ d k ≤ β − 1. • The mantissa d 1 .d 2 . . . d s represents d 1 + d 2 · β − 1 + · · · + d s · β − s +1 a smallest exponent m ∈ Z M ∈ Z largest exponent • the exponent part is β e • Each floating point number has the form • The system includes 0 = 0 . 0 . . . 0 · β e . y = ± d 1 .d 2 . . . d s · β e , • All other numbers are normalised: 1 ≤ d 1 ≤ β − 1 with m ≤ e ≤ M and 0 ≤ d k ≤ β − 1. 3 4 Floating point arithmetic: limited precision Floating point arithmetic: computational errors • Example: the number system F (2 , 3 , − 2 , 1) contains 33 numbers: • standard demo system is F (10 , 4 , − 99 , 99) • 1.573 og 0.1824 are representable, but the following are not − 1 . 11 · 2 1 . . . 1 . 573 + 0 . 1824 = 1 . 7554 − 1 . 00 · 2 − 2 1 . 573 − 0 . 1824 = 1 . 3906 0 . 00 × 1 . 573 0 . 1824 = 0 . 2869152 1 . 00 · 2 − 2 1 . 573 / 0 . 1824 = 8 . 6239035 . . . . . . • Exact arithmetic on a finite subset of reals is not possible. 1 . 11 · 2 1 • strategy for rounding/truncating to a representable number is needed − 4 − 3 − 2 − 1 0 1 2 3 4 5 6

Floating point arithmetic: computational errors Floating point arithmetic: computational errors • Algebraic laws invalid: • rounding/truncation defined by function fl : R �→ M , where M (1 . 418 ⊕ 2937) ⊖ 2936 = 2938 ⊖ 2936 = 2 . 000 is machine numbers 1 . 418 ⊕ (2937 ⊖ 2936) = 1 . 418 ⊕ 1 . 000 = 2 . 418 • machine arithmetic operations ⊕ , ⊖ , ⊗ , ⊘ defined by 1 . 418 ⊗ (2001 ⊖ 2000) 1 . 418 ⊗ 1 . 000 = = 1 . 418 x ⊕ y = fl ( x + y ) etc. (1 . 418 ⊗ 2001) ⊖ (1 . 418 ⊗ 2000) = 2837 ⊖ 2836 = 1 . 000 • In demo system F (10 , 4 , − 99 , 99) define fl by truncation, e.g. • The usual associative and distributive laws are not valid for machine arithmetic. Sometimes: 1 . 573 0 . 1824 = fl (1 . 573 + 0 . 1824) = fl (1 . 7554) = 1 . 755 ⊕ 1 . 573 0 . 1824 = fl (1 . 573 0 . 1824) = fl (1 . 3906) = 1 . 390 ⊖ − 1 . 573 0 . 1824 = fl (1 . 573 0 . 1824) = fl (0 . 2869152) = . 2869 ⊗ × ( a ⊕ b ) ⊖ c a ⊕ ( b ⊖ c ) � = 1 . 573 0 . 1824 = fl (1 . 573 / 0 . 1824) = fl (8 . 6239035 . . . ) = 8 . 623 ⊘ a ⊗ ( b ⊖ c ) � = ( a ⊗ b ) ⊖ ( a ⊗ c ) 7 8 Floating point arithmetic in Java IEEE standard for binary floating point arithmetic • Java implementations use IEEE standard for the primitive types • IEEE standard describes two number systems, approximately float and double on most computers Single precision F (2 , 24 , − 126 , 127) • Use modifier strictfp to enforce IEEE standard independent of Double precision F (2 , 53 , − 1022 , 1023) the underlying hardware. • The IEEE systems has more numbers: • Representable values in type double are ± b 1 .b 2 . . . b 53 · 2 e , – closes the representational gap around zero with xtra numbers where b i ∈ { 0 , 1 } and − 1022 ≤ e ≤ 1023 of the form 0 .b 1 b 2 . . . b 52 2 − 1022 ±∞ and NaN – has representations for ±∞ and NaN (= Not a Number) • Java syntax for ±∞ and NaN is: Double.NEGATIVE INFINITY , • The IEEE system has detailed rules for rounding to representable Double.POSITIVE INFINITY and Double.NaN numbers, ex. • Warning: legal java double literals may not be directly – overflow: 1 / 0 or 2 1023 · 2 has result ∞ representable: literal 0.2 is represented as – underflow: 1 / ∞ or 2 − 1022 − 52 / 2 has result 0 fl (0 . 2 10 ) = fl (0 . 00110011 . . . 2 ) = – 0 / 0 or ∞ − ∞ has result NaN 1 . 1001100110011001100110011001100110011001100110011010 · 2 − 3 9 10 Sources of errors in numerical computations Limited precision arithmetic • The limited precision of arithmetic in R (10 , 4 , − 99 , 99) is illustrated by computing • Hardware errors n →∞ (1 + 1 – Pentium bug n ) n e = lim • Limited precision arithmetic – summation: adding numbers of unequal magnitude (1 ⊕ 1 ⊘ 10 k ) 10 k 1 ⊘ 10 k 1 ⊕ (1 ⊘ 10 k ) k – subtraction: catastrophic cancellation 1 1 . 000 E-1 1 . 100 2 . 591 – convergence: may cycle or diverge 2 1 . 000 E-2 1 . 010 2 . 591 – formula: mathematically equivalent formulae have different 3 1 . 000 E-3 1 . 001 2 . 591 computational properties 4 1 . 000 E-4 1 . 000 1 . 000 • Logical errors 5 1 . 000 E-5 1 . 000 1 . 000 11 12

Limited precision arithmetic Limited precision arithmetic • The e -computation in R (2 , 24 , − 126 , 127): (1 ⊕ 1 ⊘ 2 k ) 2 k • The e -computation seems to give a reasonable result, if one stops 1 ⊘ 2 k 1 ⊕ (1 ⊘ 2 k ) k just before the result degenerates to 1? 1 0 . 5 1 . 5 2 . 25 • WRONG: 2 0 . 25 1 . 25 2 . 4414062 3 0 . 125 1 . 125 2 . 5657845 e 4 0 . 0625 1 . 0625 2 . 6379285 R (10 , 4 , − 99 , 99) 2 . 591 . . . . . . . . . . . . R (2 , 24 , − 126 , 127) 2.71 75977 22 2 . 3841858 E-7 1 . 0000002 2 . 7175977 R (2 , 53 , − 1022 , 1023) 2.7182818 08182473 23 1 . 1920929 E-7 1 . 0000001 2 . 7175977 exact 2.71828182845904523536 ... 24 5 . 9604645 E-8 1 . 0 1 . 0 • less than half the significant digits are correct! 25 2 . 9802322 E-8 1 . 0 1 . 0 13 14 Summation • try forward summation: Summation H n = ( · · · ((1 ⊕ 1 2 ) ⊕ 1 3 ) ⊕ · · · ⊕ 1 n ) • The n th harmonic number is defined by ⊕ , ⊖ , ⊗ , ⊘ forward n H n exact n 10 2 . 928 2 . 927 1 i = 1 + 1 2 + 1 3 + · · · + 1 � H n = n 100 5 . 187 5 . 142 i =1 1000 7 . 485 7 . 069 • Associative law is not valid. 10000 9 . 787 7 . 069 • Which order of summation is best? 100000 12 . 09 7 . 069 – from the left? 1000000 14 . 39 7 . 069 – from the right? • The computed sum is constant for n ≥ 1000 since – pairwise in binary tree? 1 • experiment in toy system R (10 , 4 , − 99 , 99) with truncation 1001 = 7 . 069 ⊕ 9 . 990 · 10 − 4 = 7 . 069 7 . 069 ⊕ 15 16 Summation Summation • try backward summation: H n = (1 ⊕ ( 1 2 ⊕ ( 1 3 ⊕ · · · ⊕ 1 • add terms pairwise in a tree structure n ) · · · )) H n = ( · · · ((1 ⊕ 1 2 ) ⊕ ( 1 3 ⊕ 1 4 ⊕ )) · · · ⊕ 1 n ) · · · )) ⊕ n H n exact ⊕ , ⊖ , ⊗ , ⊘ backward forward 10 2 . 928 2 . 928 2 . 927 ⊕ ⊕ 100 5 . 187 5 . 170 5 . 142 1000 7 . 485 7 . 284 7 . 069 ⊕ ⊕ ⊕ ⊕ 10000 9 . 787 8 . 069 7 . 069 100000 12 . 09 8 . 069 7 . 069 1 1 1 1 1 1 1 1 1000000 14 . 39 8 . 069 7 . 069 2 3 4 5 6 7 8 • By adding terms in increasing order, the error is reduced! 17 18

Summation Summation • similar experiment for java double, R (2 , 53 , − 1022 , 1023) n H n exact ⊕ , ⊖ , ⊗ , ⊘ tree-wise backward forward H 100000 error + , − , · , / exact 12 . 090146129863428 10 2 . 928 2 . 928 2 . 928 2 . 927 − 3 · 10 − 15 100 5 . 187 5 . 183 5 . 170 5 . 142 ⊕ , ⊖ , ⊗ , ⊘ “tree-wise” 12 . 0901461298634 31 20 · 10 − 15 ⊕ , ⊖ , ⊗ , ⊘ backward 12 . 0901461298634 08 1000 7 . 485 7 . 477 7 . 284 7 . 069 93 · 10 − 15 10000 9 . 787 9 . 778 8 . 069 7 . 069 ⊕ , ⊖ , ⊗ , ⊘ forward 12 . 090146129863 335 100000 12 . 09 12 . 07 8 . 069 7 . 069 • seems better than R (10 , 4 , − 99 , 99) ? 1000000 14 . 39 14 . 36 8 . 069 7 . 069 • truncation errors add up, but rounding errors may cancel out • By adding equal sized terms the error is diminished further! (though there is no guarantee!) 19 20 Convergence: monotone • Example: find root of polynomial f ( x ) using Newton iteration Summation • compute z = lim k →∞ x k for x k +1 = x k − f ( x k ) f ′ ( x k ) • Problem – compute x 1 + x 2 + · · · + x n • Warning: f ( x k ) y = f ( x ) – ⊕ not associative! • Advice: – Add terms in increasing order α z – Even better: add terms of equal size only x k +1 x k 21 22 Convergence: monotone √ Convergence: monotone • Computing 2 by Newton iteration • Example: squareroot computation, √ a R (10 , 4 , − 99 , 99) R (2 , 53 , − 1022 , 1023) • Find root of f ( x ) = x 2 − a . k x k x k 0 1 . 500 1 . 5000000000000000 • Newton iteration: compute 1 1 . 416 1 . 41 66666666666667 x k +1 = x k − x 2 k − a 2 1 . 414 1 . 41421 56862745099 2 x k 3 1 . 414 1 . 41421356237 46899 using x 0 = 1 4 1 . 414 1 . 414213562373095 1 2( a + 1) 5 1 . 414 1 . 4142135623730950 • Does it hold that 6 1 . 414 1 . 414213562373095 1 k →∞ x k = √ a lim √ exact 2 1 . 414 1 . 4142135623730950 for limited precision arithmetic? • the computation in R (2 , 53 , − 1022 , 1023) seems to be cycling? 23 24

Algorithms using real numbers Floating point arithmetic limited - PDF document

Algorithms using real numbers Floating point arithmetic limited precision Algorithms using real numbers computational errors IEEE standard type double in Java Computational errors due to limited precision

Complex Numbers Complex Numbers 1 / 19 Complex Numbers Complex numbers ( C ) are an extension of

Algorithms using real numbers (continued) Noter ch.4 Algorithms using real numbers

BY THE NUMBERS BY THE NUMBERS BY THE NUMBERS BY THE NUMBERS BY THE NUMBERS BY THE NUMBERS

Classes of Real Numbers All real numbers can be represented by a line: 1/2 1 0

An introduction to complex numbers The complex numbers Are the real numbers not sufficient? A

Real Numbers in Real Applications John Harrison Intel Corporation Real numbers for fun and

Dual Numbers Gino van den Bergen gino@dtecta.com Introduction Dual numbers extend real

Real Numbers and their Properties Types of Numbers Z + Natural numbers - counting numbers - 1

Real graduates, Real graduates, real transitions, real transitions, real stories: real

Binary Numbers 723 Binary Numbers 723 = 7x100 + 2x10 + 3x1 Binary Numbers 723 = 7x100 + 2x10 +

Signed numbers Goals unsigned numbers - non-negative integers signed numbers - positive/negative

Binary Numbers Binary numbers look like this Binary Numbers or Binary Code Binary numbers or

Monday Week 11 1,000 numbers 10,000 numbers 100,000 numbers 1/15 Sorting Algorithms randomly

Random Numbers, Files, and Onwards Random Numbers Computers cannot produce truly random numbers.

Review of Natural Numbers, Review of Irrational Numbers Real Numbers Properties of Exponents

Math 211 Math 211 Complex Numbers and Matrices October 29, 2001 2 Complex Numbers Complex

Wadim Zudilin 1 October 2015 59th Annual Meeting of the It is not known exactly where and

Monte Carlo Methods Why Monte Carlo? Computation of number ( 1) n +1 = 1 1 2 + 1

Deep Learning for Geometry Processing 3D Representations View-Based and Volumetric CNNs 3D

Rituparna, a king of Ayodhya said 5 000 years ago: I of dice possess the science and in

Mirosaw Brozis IBL in maths lesson - is it possible ? Poland, bromir1@poczta.fm WHY ? HOW ?

International Summer school on TeV Experimental Physics (iSTEP 2015) Second Bulletin

Introduction to the Data Library [DL] a tool to analyze and visualize data I. Introduction

The Web-based SWISS WORLD ATLAS interactive Evaluation of User Experiences Ch. Haeberling, L.