

  1. Efficient implementation of elementary functions in the medium-precision range
     Fredrik Johansson (LFANT, INRIA Bordeaux)
     22nd IEEE Symposium on Computer Arithmetic (ARITH 22), Lyon, France, June 2015

  4. Elementary functions
     Functions: exp, log, sin, cos, atan
     My goal is to do interval arithmetic with arbitrary precision.
     Input: floating-point number x = a · 2^b, precision p ≥ 2
     Output: m, r with f(x) ∈ [m − r, m + r] and r ≈ 2^(−p) |f(x)|

  7. Precision ranges
     Hardware precision (n ≈ 53 bits)
     ◮ Extensively studied - elsewhere!
     Medium precision (n ≈ 100 - 10 000 bits)
     ◮ Multiplication costs M(n) = O(n²) or O(n^1.6)
     ◮ Argument reduction + rectangular splitting: O(n^(1/3) M(n))
     ◮ In the lower range, software overhead is significant
     Very high precision (n ≫ 10 000 bits)
     ◮ Multiplication costs M(n) = O(n log n log log n)
     ◮ Asymptotically fast algorithms: binary splitting, arithmetic-geometric mean (AGM) iteration: O(M(n) log n)

  10. Improvements in this work
     1. The low-level mpn layer of GMP is used exclusively
     ◮ Most libraries (e.g. MPFR) use more convenient types, e.g. mpz and mpfr, to implement elementary functions
     2. Argument reduction based on lookup tables
     ◮ Old idea, not well explored in high precision
     3. Faster evaluation of Taylor series
     ◮ Optimized version of Smith's rectangular splitting algorithm
     ◮ Takes advantage of mpn-level functions

  12. Recipe for elementary functions
     exp(x) | sin(x), cos(x) | log(1+x) | atan(x)
     ↓ Domain reduction using π and log(2) ↓
     x ∈ [0, log 2) | x ∈ [0, π/4) | x ∈ [0, 1) | x ∈ [0, 1)
     ↓ Argument-halving r ≈ 8 times ↓
     exp(x) = [exp(x/2)]²,  log(1+x) = 2 log(√(1+x))
     x ∈ [0, 2^(−r))
     ↓ Taylor series
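The halving recipe can be sketched in ordinary double precision (an illustration only, not the paper's arbitrary-precision code; the function name and the term count are chosen for the example): divide x by 2^r, sum a few Taylor terms of the tiny argument, then square r times to undo the reduction.

```python
import math

def exp_by_halving(x, r=8):
    """Sketch of exp via argument halving: exp(x) = (exp(x/2^r))^(2^r)."""
    y = x / 2**r              # tiny reduced argument, y < 2^-r for x < 1
    s = term = 1.0
    for k in range(1, 8):     # a few Taylor terms suffice since y is tiny
        term *= y / k
        s += term
    for _ in range(r):
        s *= s                # square r times to undo the halvings
    return s
```

Each squaring is a full-precision multiplication, which is why the number of halvings r is kept moderate.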

  13. Better recipe at medium precision
     exp(x) | sin(x), cos(x) | log(1+x) | atan(x)
     ↓ Domain reduction using π and log(2) ↓
     x ∈ [0, log 2) | x ∈ [0, π/4) | x ∈ [0, 1) | x ∈ [0, 1)
     ↓ Lookup table with 2^r ≈ 2⁸ entries ↓
     exp(t + x) = exp(t) exp(x),  log(1 + t + x) = log(1+t) + log(1 + x/(1+t))
     x ∈ [0, 2^(−r))
     ↓ Taylor series
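The table-based variant can be sketched the same way in double precision (illustration only, with an assumed r = 8 and a naive table layout, not the paper's fixed-point representation): read exp(t) from a precomputed table and apply a short Taylor series to the remainder y < 2^-r.

```python
import math

R = 8
TABLE = [math.exp(i / 2**R) for i in range(2**R)]   # exp(i/2^r), 2^8 entries

def exp_by_table(x):
    """Sketch of exp on [0, log 2) via lookup table + short Taylor series."""
    i = int(x * 2**R)
    y = x - i / 2**R          # remainder, y in [0, 2^-r)
    s = term = 1.0
    for k in range(1, 8):     # a few terms suffice since y < 2^-8
        term *= y / k
        s += term
    return TABLE[i] * s       # exp(t + y) = exp(t) * exp(y)
```

Compared with repeated halving, the table replaces the r final squarings by a single multiplication by exp(t).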

  16. Argument reduction formulas
     What we want to compute: f(x), x ∈ [0, 1)
     Table size: q = 2^r; index: i = ⌊2^r x⌋, t = i/q
     Precomputed value: f(t)
     Remaining value to compute: f(y), y ∈ [0, 2^(−r))
     exp(x) = exp(t) exp(y),  y = x − i/q
     sin(x) = sin(t) cos(y) + cos(t) sin(y),  y = x − i/q
     cos(x) = cos(t) cos(y) − sin(t) sin(y),  y = x − i/q
     log(1+x) = log(1+t) + log(1+y),  y = (qx − i)/(i + q)
     atan(x) = atan(t) + atan(y),  y = (qx − i)/(ix + q)
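The two less obvious reductions follow from log((1+x)/(1+t)) and the arctangent subtraction formula. A quick floating-point sanity check, with an arbitrarily chosen x and r = 8:

```python
import math

r = 8
q = 2**r
x = 0.7                       # arbitrary test point in [0, 1)
i = int(q * x)
t = i / q

# log(1+x) = log(1+t) + log(1+y) with y = (qx - i)/(i + q)
y_log = (q * x - i) / (i + q)
lhs_log = math.log1p(x)
rhs_log = math.log1p(t) + math.log1p(y_log)

# atan(x) = atan(t) + atan(y) with y = (qx - i)/(ix + q)
y_atan = (q * x - i) / (i * x + q)
lhs_atan = math.atan(x)
rhs_atan = math.atan(t) + math.atan(y_atan)
```

In both cases y is of size about 2^(-r), so the remaining series converges quickly.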

  17. Optimizing lookup tables
     m = 2 tables with 2⁵ + 2⁵ entries give the same reduction as m = 1 table with 2¹⁰ entries.
     Function | Precision | m | r | Entries  | Size (KiB)
     exp      | ≤ 512     | 1 | 8 | 178      | 11.125
     exp      | ≤ 4608    | 2 | 5 | 23+32    | 30.9375
     sin      | ≤ 512     | 1 | 8 | 203      | 12.6875
     sin      | ≤ 4608    | 2 | 5 | 26+32    | 32.625
     cos      | ≤ 512     | 1 | 8 | 203      | 12.6875
     cos      | ≤ 4608    | 2 | 5 | 26+32    | 32.625
     log      | ≤ 512     | 2 | 7 | 128+128  | 16
     log      | ≤ 4608    | 2 | 5 | 32+32    | 36
     atan     | ≤ 512     | 1 | 8 | 256      | 16
     atan     | ≤ 4608    | 2 | 5 | 32+32    | 36
     Total size: 236.6875 KiB

  20. Taylor series
     Logarithmic series:
     atan(x) = x − x³/3 + x⁵/5 − x⁷/7 + …
     log(1+x) = 2 atanh(x/(x+2))
     With x < 2^(−10), need 230 terms for 4600-bit precision
     Exponential series:
     exp(x) = 1 + x + x²/2! + x³/3! + x⁴/4! + …
     sin(x) = x − x³/3! + x⁵/5! − …
     cos(x) = 1 − x²/2! + x⁴/4! − …
     With x < 2^(−10), need 280 terms for 4600-bit precision
     Above 300 bits: cos(x) = √(1 − sin²(x))
     Above 800 bits: exp(x) = sinh(x) + √(1 + sinh²(x))
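The quoted term counts can be sanity-checked: with x < 2^(-10) each term of the odd logarithmic-type series shrinks by about 20 bits, while the factorials in the exponential-type series make its terms shrink even faster. A rough count (helper names are illustrative, and the slide's 280 is a round figure):

```python
import math

def log_series_terms(bits, xbits=10):
    """Terms of x - x^3/3 + x^5/5 - ... until the next term is below 2^-bits."""
    k = 0
    while xbits * (2 * k + 1) < bits:   # |x^(2k+1)/(2k+1)| ~ 2^(-xbits*(2k+1))
        k += 1
    return k

def exp_series_terms(bits, xbits=10):
    """Terms of sum x^n/n! until the next term is below 2^-bits."""
    n, lg_fact = 0, 0.0
    while xbits * n + lg_fact < bits:   # |x^n/n!| ~ 2^(-xbits*n - log2(n!))
        n += 1
        lg_fact += math.log2(n)
    return n
```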

  26. Evaluating Taylor series using rectangular splitting
     Paterson and Stockmeyer, 1973: evaluate ∑_{i=0}^{n} c_i x^i in O(n) cheap steps + O(n^(1/2)) expensive steps:
     (c₀ + c₁x + c₂x² + c₃x³) +
     x⁴ (c₄ + c₅x + c₆x² + c₇x³) +
     x⁸ (c₈ + c₉x + c₁₀x² + c₁₁x³) +
     x¹² (c₁₂ + c₁₃x + c₁₄x² + c₁₅x³)
     ◮ Smith, 1989: elementary and hypergeometric functions
     ◮ Brent & Zimmermann, 2010: improvements to Smith
     ◮ FJ, 2014: generalization to D-finite functions
     ◮ New: optimized algorithm for elementary functions
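A minimal generic sketch of the Paterson-Stockmeyer scheme (names are illustrative; the paper's version operates on fixed-point mpn operands, not Python objects): the powers x², …, x^m are computed once, each row then costs only cheap scalar multiply-adds, and only the one multiplication by x^m per row is an expensive full product.

```python
from math import isqrt

def rectangular_eval(c, x):
    """Evaluate sum(c[i] * x**i) by rectangular splitting."""
    n = len(c)
    m = max(isqrt(n), 1)                  # row width ~ sqrt(n)
    xs = [1]
    for _ in range(m):
        xs.append(xs[-1] * x)             # powers x^0 .. x^m, computed once
    result = 0
    for k in range(((n - 1) // m) * m, -1, -m):
        # one row: c[k] + c[k+1]*x + ... (cheap steps when c[i] are scalars)
        row = sum(ci * xs[j] for j, ci in enumerate(c[k:k + m]))
        result = result * xs[m] + row     # one expensive Horner step per row
    return result
```

With integer or rational inputs the result is exact, which makes the scheme easy to test against direct evaluation.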

  28. Logarithmic series
     Rectangular splitting:
     x + (1/2)x² + x³((1/3) + (1/4)x + (1/5)x² + x³((1/6) + (1/7)x + (1/8)x²))
     Improved algorithm with fewer divisions:
     x + (1/60)(30x² + x³(20 + 15x + 12x² + x³(10 + (60/56)(8x + 7x²))))
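The fewer-divisions idea replaces the per-term divisions by small integer coefficients over shared denominators (60 = lcm(3,4,5), 56 = lcm(7,8)). The grouping below is a reconstruction (the exact layout on the slide may differ), verified in exact rational arithmetic to expand to the same truncated series as the plain splitting:

```python
from fractions import Fraction

x = Fraction(1, 7)   # any rational test point works

# Plain rectangular splitting of x + x^2/2 + ... + x^8/8
plain = x + Fraction(1, 2) * x**2 + x**3 * (
    Fraction(1, 3) + Fraction(1, 4) * x + Fraction(1, 5) * x**2
    + x**3 * (Fraction(1, 6) + Fraction(1, 7) * x + Fraction(1, 8) * x**2))

# Same polynomial with integer coefficients over shared denominators
fewer_div = x + Fraction(1, 60) * (
    30 * x**2 + x**3 * (20 + 15 * x + 12 * x**2
                        + x**3 * (10 + Fraction(60, 56) * (8 * x + 7 * x**2))))
```

The cheap steps now multiply by machine-word integers, and the divisions are replaced by a few multiplications by precomputed constants.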

  29. Exponential series
     Rectangular splitting:
     1 + x + (1/2!)x² + x³((1/3!) + (1/4!)x + (1/5!)x² + x³((1/6!) + (1/7!)x + (1/8!)x²))
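The exponential analogue keeps the factorial coefficients. Again a rational-arithmetic check that the nested form expands to the truncated series (the layout mirrors the logarithmic case; it is an assumed reconstruction, not taken verbatim from the slide):

```python
from fractions import Fraction
from math import factorial

x = Fraction(1, 9)   # any rational test point works

split = 1 + x + Fraction(1, factorial(2)) * x**2 + x**3 * (
    Fraction(1, factorial(3)) + Fraction(1, factorial(4)) * x
    + Fraction(1, factorial(5)) * x**2
    + x**3 * (Fraction(1, factorial(6)) + Fraction(1, factorial(7)) * x
              + Fraction(1, factorial(8)) * x**2))

series = sum(Fraction(1, factorial(k)) * x**k for k in range(9))
```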
