on various ways to split a floating point number
play

On various ways to split a floating-point number Claude-Pierre - PowerPoint PPT Presentation

On various ways to split a floating-point number Claude-Pierre Jeannerod Jean-Michel Muller Paul Zimmermann Inria, CNRS, ENS Lyon, Universit e de Lyon, Universit e de Lorraine France ARITH-25 June 2018 -1- Splitting a floating-point


  1. On various ways to split a floating-point number Claude-Pierre Jeannerod Jean-Michel Muller Paul Zimmermann Inria, CNRS, ENS Lyon, Universit´ e de Lyon, Universit´ e de Lorraine France ARITH-25 June 2018 -1-

  2. Splitting a floating-point number = ? X round(x) frac(x) 0 2 First bit of x = ufp(x) (for scalings) X All products are � b �� a computed exactly � 2 X � 2 with one FP � a 2 + b 2 → 2 k + multiplication 2 k 2 k X (Dekker product) X Dekker product (1971) -2-

  3. Splitting a floating-point number absolute splittings (e.g., ⌊ x ⌋ ), vs relative splittings (e.g., most significant bits, splitting of the significands for multiplication); no bit manipulations of the In each "bin", the sum is computed exactly binary representations (would result in less portable Matlab program in a paper by programs) → only FP Zielke and Drygalla (2003), operations. analysed and improved by Rump, Ogita, and Oishi (2008), reproducible summation, by Demmel & Nguyen. -3-

  4. Notation and preliminary definitions IEEE-754 compliant FP arithmetic with radix β , precision p , and extremal exponents e min and e max ; F = set of FP numbers. x ∈ F can be written � M � · β e , x = β p − 1 M , e ∈ Z , with | M | < β p and e min � e � e max , and | M | maximum under these constraints; significand of x : M · β − p +1 ; RN = rounding to nearest with some given tie-breaking rule (assumed to be either “to even” or “to away”, as in IEEE 754-2008); -4-

  5. Notation and preliminary definitions Definition 1 (classical ulp) The unit in the last place of t ∈ R is � β ⌊ log β | t |⌋− p +1 if | t | � β e min , ulp( t ) = β e min − p +1 otherwise. Definition 2 (ufp) The unit in the first place of t ∈ R is � β ⌊ log β | t |⌋ if t � = 0, ufp( t ) = 0 if t = 0. (introduced by Rump, Ogita and Oishi in 2007) -5-

  6. Notation and preliminary definitions exponent significand e x = 1.xxxxxxxxx . 2 e ufp(x) = 1.00000000 . 2 e ulp(x) = 0.00000001 . 2 Guiding thread of the talk: catastrophic cancellation is your friend. -6-

  7. Absolute splittings: 1. nearest integer Uses a constant C . Same operations as Fast2Sum, yet different assumptions. Algorithm 1 C 1.10000000000…0 Require: C , x ∈ F + x s ← RN( C + x ) x h ← RN( s − C ) s = 1.100 x ℓ ← RN( x − x h ) { optional } - C 1.10000000000…0 return x h { or ( x h , x ℓ ) } 0000 First occurrence we found: Hecker (1996) in radix 2 with C = 2 p − 1 or C = 2 p − 1 + 2 p − 2 . Use of latter constant referred to as the 1.5 trick. Theorem 3 Assume C integer with β p − 1 � C � β p . If β p − 1 − C � x � β p − C , then x h is an integer such that | x − x h | � 1 / 2 . Furthermore, x = x h + x ℓ . -7-

  8. Absolute splittings: 2. floor function An interesting question is to compute ⌊ x ⌋ , or more generally ⌊ x /β k ⌋ . Algorithm 2 Require: x ∈ F y ← RN( x − 0 . 5) C ← RN( β p − x ) s ← RN( C + y ) x h ← RN( s − C ) return x h Theorem 4 Assume β is even, x ∈ F , 0 � x � β p − 1 . Then Algorithm 2 returns x h = ⌊ x ⌋ . -8-

  9. Relative splittings expressing a precision- p FP number x as the exact sum of a ( p − s )-digit number x h and an s -digit number x ℓ ; first use with s = ⌊ p / 2 ⌋ (Dekker product, 1971) another use: s = p − 1 → x h is a power of β giving the order of magnitude of x . Two uses: evaluate ulp( x ) or ufp( x ). Useful functions in the error analysis of FP algorithms; -9-

  10. Relative splittings expressing a precision- p FP number x as the exact sum of a ( p − s )-digit number x h and an s -digit number x ℓ ; first use with s = ⌊ p / 2 ⌋ (Dekker product, 1971) another use: s = p − 1 → x h is a power of β giving the order of magnitude of x . Two uses: evaluate ulp( x ) or ufp( x ). Useful functions in the error analysis of FP algorithms; → exact information -9-

  11. Relative splittings expressing a precision- p FP number x as the exact sum of a ( p − s )-digit number x h and an s -digit number x ℓ ; first use with s = ⌊ p / 2 ⌋ (Dekker product, 1971) another use: s = p − 1 → x h is a power of β giving the order of magnitude of x . Two uses: evaluate ulp( x ) or ufp( x ). Useful functions in the error analysis of FP algorithms; → exact information power of β close to | x | : for scaling x , such a weaker condition suffices, and can be satisfied using fewer operations. -9-

  12. Relative splittings expressing a precision- p FP number x as the exact sum of a ( p − s )-digit number x h and an s -digit number x ℓ ; first use with s = ⌊ p / 2 ⌋ (Dekker product, 1971) another use: s = p − 1 → x h is a power of β giving the order of magnitude of x . Two uses: evaluate ulp( x ) or ufp( x ). Useful functions in the error analysis of FP algorithms; → exact information power of β close to | x | : for scaling x , such a weaker condition suffices, and can be satisfied using fewer operations. → approximate information -9-

  13. Veltkamp splitting x ∈ F and s < p → two FP numbers x h and x ℓ s.t. x = x h + x ℓ , with the significand of x h fitting in p − s digits, and the one of x ℓ in s digits ( s − 1 when β = 2 and s � 2). Remember: catastrophic can- cellation is your friend! Algorithm 3 Veltkamp’s splitting. s Require: C = β s + 1 and x in F - γ - xxxxxxxxxxxxxxxx + x γ ← RN( Cx ) δ ← RN( x − γ ) δ = - xxxxxxxxxxxxxxxx x h ← RN( γ + δ ) + γ xxxxxxxxxxxxxxxx x ℓ ← RN( x − x h ) 0000 return ( x h , x ℓ ) p - s Dekker (1971): radix 2 analysis, implicitly assuming no overflow; extended to any radix β by Linnainmaa (1981); works correctly even in the presence of underflows; Boldo (2006): Cx does not overflow ⇒ no other operation overflows. -10-

  14. Veltkamp splitting: FMA variant If an FMA instruction is available, we suggest the following variant, that requires fewer operations. Remarks Algorithm 4 FMA-based relative x ℓ obtained in parallel with x h splitting. even without an FMA, γ and Require: C = β s + 1 and x in F β s x can be computed in γ ← RN( Cx ) parallel, x h ← RN( γ − β s x ) the bounds on the numbers of x ℓ ← RN( Cx − γ ) digits of x h and x ℓ given by return ( x h , x ℓ ) Theorem 5 can be attained. Theorem 5 Let x ∈ F and s ∈ Z s.t. 1 � s < p. Barring underflow and overflow, Algorithm 4 computes x h , x ℓ ∈ F s.t. x = x h + x ℓ . If β = 2 , the significands of x h and x ℓ have at most p − s and s bits, respectively. If β > 2 then they have at most p − s + 1 and s + 1 digits, respectively. -11-

  15. Extracting a single bit (radix 2) computing ufp( x ) or ulp( x ), or scaling x ; Veltkamp’s splitting (Algorithm 3) to x with s = p − 1: the resulting x h has a 1-bit significand and it is nearest x in precision p − s = 1. For computing sign( x ) · ufp( x ), we can use the following algorithm, introduced by Rump (2009). Very rough explanation: Algorithm 5 q ≈ 2 p − 1 x + x Require: β = 2, ϕ = 2 p − 1 +1, ψ = r ≈ 2 p − 1 x 1 − 2 − p , and x ∈ F q ← RN( ϕ x ) → q − r ≈ x but in the massive r ← RN( ψ q ) cancellation we loose all bits δ ← RN( q − r ) but the most significant. return δ -12-

  16. Extracting a single bit (radix 2) These solutions raise the following issues. If | x | is large, then an overflow can occur in the first line of both Algorithms 3 and 5. To avoid overflow in Algorithm 5: scale it by replacing ϕ by 1 2 + 2 − p and returning 2 p δ at the end. However, this variant will not work for subnormal x . → to use Algorithm 5, we somehow need to check the order of magnitude of x . If we are only interested in scaling x , then requiring the exact value of ufp( x ) is overkill: one can get a power of 2 “close” to x with a cheaper algorithm. -13-

  17. Extracting a single bit (radix 2) Algorithm 6 sign( x ) · ulp H ( x ) for radix 2 and | x | > 2 e min . Require: β = 2, ψ = 1 − 2 − p , and x ∈ F a ← RN( ψ x ) δ ← RN( x − a ) return δ Theorem 6 If | x | > 2 e min , then Algorithm 6 returns � 1 2 ulp( x ) if | x | is a power of 2 , sign( x ) · ulp( x ) otherwise. Similar algorithm for ufp( x ), under the condition | x | < 2 e max − p +1 . -14-

  18. Underflow-safe and almost overflow-free scaling β = 2, p � 4; RN breaks ties “to even” or “to away”; η = 2 e min − p +1 : smallest positive element of F . Given a nonzero FP number x , compute a scaling factor δ s.t.: | x | /δ is much above the underflow threshold and much below the overflow threshold (so that, for example, we can safely square it); δ is an integer power of 2 ( → no rounding errors when multiplying or dividing by it). Algorithms proposed at the beginning of this talk: simple, but underflow or overflow can occur for many inputs x . -15-

  19. Underflow-safe and almost overflow-free scaling Following algorithm: underflow-safe and almost overflow-free in the sense that only the two extreme values x = ± (2 − 2 1 − p ) · 2 e max must be excluded. Algorithm 7 Require: β = 2, Φ = 2 − p + 2 − 2 p +1 , η = 2 e min − p +1 , and x ∈ F y ← | x | e ← RN(Φ y + η ) { or e ← RN(RN(Φ y ) + η ) without FMA } y s up ← RN( y + e ) δ ← RN( y s up − y ) return δ -16-

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend