SLIDE 1

Performance evaluation of an efficient double-double BLAS1 function with error-free transformation and its application to explicit extrapolation methods

Tomonori Kouya, Shizuoka Institute of Science and Technology, kouya.tomonori@sist.ac.jp
ARITH26, June 10–12, 2019, Kyoto, Japan

SLIDE 2

Outline

  • 1. Summary
  • 2. Extrapolation to solve ODE
  • 3. Error-free Transformation and its application
  • 4. Numerical Experiments
  • 5. Conclusion and future work
SLIDE 3

Summary

▶ For initial value problems of ordinary differential equations (ODEs), we want to obtain more accurate double-precision numerical solutions more quickly than when using double-double (DD) precision arithmetic.
▶ We have implemented lighter yet accurate BLAS1 functions with error-free transformation (EFT) and applied them to explicit extrapolation methods.
⇓ The presented routines can be effective for large systems of linear ODEs and for small nonlinear ODEs, especially when a harmonic sequence is used.

SLIDE 4

Initial value problem of ordinary differential equation

Initial value problem of an ordinary differential equation (ODE for short) to be solved:

$$\frac{dy}{dt} = f(t, y) \in \mathbb{R}^n, \qquad y(0) = y_0, \qquad \text{integration interval: } [0, t_{\mathrm{end}}]. \tag{1}$$

⇓ We compute y_next ≈ y(t_next) at each t_next ∈ [0, t_end] from y_old ≈ y(t_old).

SLIDE 5

Extrapolation for ODE: Bulirsch-Stoer Algorithm

Given a support sequence {w_i}, a maximum number of stages L, a relative tolerance ε_R, and an absolute tolerance ε_A.

Support sequences:
  • Romberg: 2, 4, 8, ..., 2^i, ... ⇒ stable but slow
  • Harmonic: 2, 4, 6, 8, ..., 2(i + 1), ... ⇒ unstable but fast

Process to calculate the initial sequence T_i1 (i = 1, 2, ..., L):

  • 1. h := (t_next − t_old)/w_i, so that t_k := t_old + kh ∈ [t_old, t_next]
  • 2. t_0 := t_old, y_0 ≈ y(t_0)
  • 3. Explicit Euler method: y_1 := y_0 + h f(t_0, y_0)
  • 4. Explicit midpoint method to get y_2, y_3, ..., y_{w_i}: y_{k+1} := y_{k−1} + 2h f(t_k, y_k) (k = 1, 2, ..., w_i − 1)
  • 5. Set the initial sequence for extrapolation: S(h/w_i) := y_{w_i} (a C sketch of steps 1–5 follows below)
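A minimal C sketch of steps 1–5 in plain double precision, without the error terms introduced later; the function and type names here are illustrative, not the author's code:

```c
#include <stdlib.h>

/* Right-hand side f(t, y): writes f(t, y) into fy (length n). */
typedef void (*ode_rhs)(double t, const double *y, double *fy, int n);

/* Compute T_i1 = S(h/w_i) := y_{w_i} with the explicit Euler + midpoint scheme. */
void compute_Ti1(ode_rhs f, double t_old, double t_next,
                 const double *y_old, double *T_i1, int n, int w_i)
{
    double h = (t_next - t_old) / w_i;           /* step 1: substep size      */
    double *y_prev = malloc(n * sizeof(double)); /* y_{k-1}                   */
    double *y_cur  = malloc(n * sizeof(double)); /* y_k                       */
    double *fy     = malloc(n * sizeof(double));

    /* steps 2-3: y_0 := y_old, then one explicit Euler step for y_1 */
    f(t_old, y_old, fy, n);
    for (int j = 0; j < n; j++) {
        y_prev[j] = y_old[j];
        y_cur[j]  = y_old[j] + h * fy[j];
    }

    /* step 4: explicit midpoint rule y_{k+1} := y_{k-1} + 2 h f(t_k, y_k) */
    for (int k = 1; k <= w_i - 1; k++) {
        double t_k = t_old + k * h;
        f(t_k, y_cur, fy, n);
        for (int j = 0; j < n; j++) {
            double y_next = y_prev[j] + 2.0 * h * fy[j];
            y_prev[j] = y_cur[j];
            y_cur[j]  = y_next;
        }
    }

    /* step 5: T_i1 := S(h/w_i) = y_{w_i} */
    for (int j = 0; j < n; j++) T_i1[j] = y_cur[j];

    free(y_prev); free(y_cur); free(fy);
}
```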
SLIDE 6

Extrapolation for ODE: Bulirsch-Stoer Algorithm (cont.)

  • 1. T_11 := S(h/w_1)
  • 2. For i = 2, ..., L:

    T_i1 := S(h/w_i)

    For j = 2, ..., i, extrapolate to get a better approximation (sketched in C below):

    $$R_{ij} := \left( \left( \frac{w_i}{w_{i-j+1}} \right)^2 - 1 \right)^{-1} \left( T_{i,j-1} - T_{i-1,j-1} \right), \qquad T_{ij} := T_{i,j-1} + R_{ij}$$

    Check the convergence status: if ∥R_ij∥ ≤ ε_R ∥T_{i,j−1}∥ + ε_A, then y_next := T_ij. (2)

  • 3. y_next := T_LL if not converged
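In plain double precision, recurrence (2) for one stage i can be sketched per component as follows; the row-major table layout and the function name are this write-up's choices, not the author's code:

```c
/* One extrapolation sweep for stage i (recurrence (2)), scalar version.
   T is an L x L table stored row-major: T[i*L + j] holds T_{i+1,j+1} in the
   slide's 1-based notation. w[] is the support sequence (w[0] = w_1, ...). */
void extrapolate_row(double *T, int L, const int *w, int i)
{
    /* 0-based j = 1..i corresponds to the slide's j = 2, ..., i+1 for stage i+1 */
    for (int j = 1; j <= i; j++) {
        double ratio = (double)w[i] / (double)w[i - j];       /* w_i / w_{i-j+1} */
        double c     = 1.0 / (ratio * ratio - 1.0);
        double R     = c * (T[i*L + (j-1)] - T[(i-1)*L + (j-1)]);  /* R_ij */
        T[i*L + j]   = T[i*L + (j-1)] + R;                         /* T_ij */
    }
}
```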
SLIDE 7

Application of EFT: FMA with error

  • cf. S. Boldo & J.-M. Muller [6]

(s, e1, e2) := FMAerror(a, x, y)
    s := FMA(a, x, y)
    (u1, u2) := TwoProd(a, x)
    (α1, α2) := TwoSum(y, u2)
    (β1, β2) := TwoSum(u1, α1)
    γ := (β1 ⊖ s) ⊕ β2
    (e1, e2) := QuickTwoSum(γ, α2)
    return (s, e1, e2)

Properties: s + e1 + e2 = ax + y exactly, where s = FMA(a, x, y) is the correctly rounded value of ax + y; |e1 + e2| ≤ ½ ulp(s); |e2| ≤ ½ ulp(e1).
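A direct C transcription of FMAerror and its building blocks might look as follows; this is a sketch (function names are ours) and it relies on a correctly rounded fma() from <math.h>, so compile without value-changing optimizations such as -ffast-math:

```c
#include <math.h>   /* fma */

/* Standard error-free transformations. */
static void two_sum(double a, double b, double *s, double *e) {
    *s = a + b;
    double v = *s - a;
    *e = (a - (*s - v)) + (b - v);
}
static void quick_two_sum(double a, double b, double *s, double *e) {
    /* assumes |a| >= |b| */
    *s = a + b;
    *e = b - (*s - a);
}
static void two_prod(double a, double b, double *p, double *e) {
    *p = a * b;
    *e = fma(a, b, -*p);   /* exact: e = a*b - p */
}

/* FMAerror (Boldo & Muller): s + e1 + e2 = a*x + y with s = fma(a, x, y). */
static void fma_error(double a, double x, double y,
                      double *s, double *e1, double *e2) {
    double u1, u2, a1, a2, b1, b2, g;
    *s = fma(a, x, y);
    two_prod(a, x, &u1, &u2);
    two_sum(y, u2, &a1, &a2);
    two_sum(u1, a1, &b1, &b2);
    g = (b1 - *s) + b2;
    quick_two_sum(g, a2, e1, e2);
}
```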

SLIDE 8

Application of EFT (2): FMA with approximated error

  • cf. S. Boldo & J.-M. Muller [6]

(s, e) := FMAerrorApprox(a, x, y)
    s := FMA(a, x, y)
    (u1, u2) := TwoProd(a, x)
    (α1, α2) := TwoSum(y, u1)
    γ := α1 ⊖ s
    e := (u2 ⊕ α2) ⊕ γ
    return (s, e)

When IEEE 754 double precision arithmetic is used in FMAerrorApprox, the error bound is |(s + e) − (ax + y)| ≤ 7 · 2⁻¹⁰⁵ |s|.
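In the same style, a sketch of FMAerrorApprox (helper functions repeated from the previous sketch; names are ours):

```c
#include <math.h>   /* fma */

static void two_sum(double a, double b, double *s, double *e) {
    *s = a + b;
    double v = *s - a;
    *e = (a - (*s - v)) + (b - v);
}
static void two_prod(double a, double b, double *p, double *e) {
    *p = a * b;
    *e = fma(a, b, -*p);
}

/* FMAerrorApprox (Boldo & Muller): s = fma(a, x, y) and a single error term e
   with |(s + e) - (a*x + y)| <= 7 * 2^-105 * |s| in binary64. */
static void fma_error_approx(double a, double x, double y,
                             double *s, double *e) {
    double u1, u2, a1, a2, g;
    *s = fma(a, x, y);
    two_prod(a, x, &u1, &u2);
    two_sum(y, u1, &a1, &a2);
    g = a1 - *s;
    *e = (u2 + a2) + g;
}
```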

SLIDE 9

Application of EFT: BLAS1 with error

y := AXPY(α, x, y)
    y := α ⊗ x ⊕ y
    return y

⇓

(y, ey) := AXPYerror(α, eα, x, ex, y, ey)
    (y, e1, e2) := FMAerror(α, x, y)
    ey := e1 ⊕ e2 ⊕ α ⊗ ex ⊕ eα ⊗ x ⊕ ey
    return (y, ey)

or, with the approximated FMA error:

(y, ey) := AXPYerrorA(α, eα, x, ex, y, ey)
    (y, e) := FMAerrorApprox(α, x, y)
    ey := e ⊕ α ⊗ ex ⊕ eα ⊗ x ⊕ ey
    return (y, ey)
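Applied elementwise to length-n vectors, AXPYerror might be sketched in C as follows (function names are ours; the compact helpers repeat the earlier EFT sketches):

```c
#include <math.h>

/* EFT helpers, as in the earlier sketches. */
static void two_sum(double a, double b, double *s, double *e)
{ *s = a + b; double v = *s - a; *e = (a - (*s - v)) + (b - v); }
static void two_prod(double a, double b, double *p, double *e)
{ *p = a * b; *e = fma(a, b, -*p); }
static void fma_error(double a, double x, double y,
                      double *s, double *e1, double *e2)
{
    double u1, u2, a1, a2, b1, b2, g, q1, q2;
    *s = fma(a, x, y);
    two_prod(a, x, &u1, &u2);
    two_sum(y, u2, &a1, &a2);
    two_sum(u1, a1, &b1, &b2);
    g = (b1 - *s) + b2;
    q1 = g + a2; q2 = a2 - (q1 - g);   /* QuickTwoSum(g, a2) */
    *e1 = q1; *e2 = q2;
}

/* AXPYerror: y := alpha*x + y elementwise, with error terms carried in ey.
   (alpha, e_alpha), (x[i], ex[i]), (y[i], ey[i]) are (value, error) pairs. */
static void axpy_error(int n, double alpha, double e_alpha,
                       const double *x, const double *ex,
                       double *y, double *ey)
{
    for (int i = 0; i < n; i++) {
        double s, e1, e2;
        fma_error(alpha, x[i], y[i], &s, &e1, &e2);
        ey[i] = e1 + e2 + alpha * ex[i] + e_alpha * x[i] + ey[i];
        y[i]  = s;
    }
}
```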

SLIDE 10

Application of EFT: BLAS1 with error

x := SCAL(α, x)
    x := α ⊗ x
    return x

⇓

(x, ex) := SCALerror(α, eα, x, ex)
    (w1, w2) := TwoProd(α, x)
    w2 := α ⊗ ex ⊕ eα ⊗ (x ⊕ ex) ⊕ w2
    (x, ex) := QuickTwoSum(w1, w2)
    return (x, ex)
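A corresponding elementwise sketch of SCALerror (again with the EFT helpers repeated; names are ours):

```c
#include <math.h>

/* TwoProd and QuickTwoSum as in the earlier sketches. */
static void two_prod(double a, double b, double *p, double *e)
{ *p = a * b; *e = fma(a, b, -*p); }
static void quick_two_sum(double a, double b, double *s, double *e)
{ *s = a + b; *e = b - (*s - a); }

/* SCALerror: x := alpha * x elementwise, with error terms carried in ex. */
static void scal_error(int n, double alpha, double e_alpha,
                       double *x, double *ex)
{
    for (int i = 0; i < n; i++) {
        double w1, w2;
        two_prod(alpha, x[i], &w1, &w2);
        w2 = alpha * ex[i] + e_alpha * (x[i] + ex[i]) + w2;
        quick_two_sum(w1, w2, &x[i], &ex[i]);
    }
}
```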

SLIDE 11

Extrapolation with EFT

Approximation ⇒ (Approximation, its error):
    f(t_k, y_k) := f_k ⇒ f(t_k + et_k, y_k + ey_k) = f_k + ef_k

Explicit Euler method: y_1 := y_0 + h f_0
⇓
    (y_1, ey_1) := (y_0, ey_0)
    (y_1, ey_1) := AXPYerror(h, eh, f_0, ef_0, y_1, ey_1)
        or (y_1, ey_1) := AXPYerrorA(h, eh, f_0, ef_0, y_1, ey_1)

Explicit midpoint method: y_{k+1} := y_{k−1} + 2h f_k (k = 1, 2, ..., w_i − 1)
⇓
    (y_{k+1}, ey_{k+1}) := (y_{k−1}, ey_{k−1})
    (y_{k+1}, ey_{k+1}) := AXPYerror(2 ⊗ h, 2 ⊗ eh, f_k, ef_k, y_{k+1}, ey_{k+1})
        or (y_{k+1}, ey_{k+1}) := AXPYerrorA(2 ⊗ h, 2 ⊗ eh, f_k, ef_k, y_{k+1}, ey_{k+1})
    (k = 1, 2, ..., w_i − 1)
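For illustration, one error-carrying explicit-midpoint substep built on AXPYerrorA could look like this in C; it is a self-contained sketch with the helpers repeated, and all names are ours rather than the author's library:

```c
#include <math.h>

/* Compact EFT helpers (see the FMA slides). */
static void two_sum(double a, double b, double *s, double *e)
{ *s = a + b; double v = *s - a; *e = (a - (*s - v)) + (b - v); }
static void two_prod(double a, double b, double *p, double *e)
{ *p = a * b; *e = fma(a, b, -*p); }
static void fma_error_approx(double a, double x, double y, double *s, double *e)
{
    double u1, u2, a1, a2;
    *s = fma(a, x, y);
    two_prod(a, x, &u1, &u2);
    two_sum(y, u1, &a1, &a2);
    *e = (u2 + a2) + (a1 - *s);
}

/* AXPYerrorA, elementwise over length-n vectors. */
static void axpy_error_a(int n, double alpha, double e_alpha,
                         const double *x, const double *ex,
                         double *y, double *ey)
{
    for (int i = 0; i < n; i++) {
        double s, e;
        fma_error_approx(alpha, x[i], y[i], &s, &e);
        ey[i] = e + alpha * ex[i] + e_alpha * x[i] + ey[i];
        y[i]  = s;
    }
}

/* One explicit-midpoint substep with error terms (k = 1, ..., w_i - 1):
   (y_{k+1}, ey_{k+1}) := (y_{k-1}, ey_{k-1});
   (y_{k+1}, ey_{k+1}) := AXPYerrorA(2h, 2 eh, f_k, ef_k, y_{k+1}, ey_{k+1}). */
static void midpoint_substep(int n, double h, double eh,
                             const double *fk, const double *efk,
                             const double *y_prev, const double *ey_prev,
                             double *y_next, double *ey_next)
{
    for (int i = 0; i < n; i++) { y_next[i] = y_prev[i]; ey_next[i] = ey_prev[i]; }
    axpy_error_a(n, 2.0 * h, 2.0 * eh, fk, efk, y_next, ey_next);
}
```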

SLIDE 12

Extrapolation with EFT (cont.)

Extrapolation process with error terms:

Preliminary (computed in DD): (c_ij, ec_ij) := 1/((w_i/w_{i−j+1})² − 1)

    (R_ij, eR_ij) := (T_{i,j−1}, eT_{i,j−1})
    (T_ij, eT_ij) := (T_{i,j−1}, eT_{i,j−1})
    (R_ij, eR_ij) := AXPYerror(−1, 0, T_{i−1,j−1}, eT_{i−1,j−1}, R_ij, eR_ij)
        or (R_ij, eR_ij) := AXPYerrorA(−1, 0, T_{i−1,j−1}, eT_{i−1,j−1}, R_ij, eR_ij)
    (R_ij, eR_ij) := SCALerror(c_ij, ec_ij, R_ij, eR_ij)
    (T_ij, eT_ij) := AXPYerror(1, 0, R_ij, eR_ij, T_ij, eT_ij)
        or (T_ij, eT_ij) := AXPYerrorA(1, 0, R_ij, eR_ij, T_ij, eT_ij)
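Composed from the BLAS1-with-error sketches above, one EFT extrapolation update might read as follows; only prototypes of axpy_error / scal_error are repeated here, and all names are this write-up's assumptions rather than the author's C code:

```c
/* Routines as sketched on the AXPYerror / SCALerror slides. */
void axpy_error(int n, double alpha, double e_alpha,
                const double *x, const double *ex, double *y, double *ey);
void scal_error(int n, double alpha, double e_alpha, double *x, double *ex);

/* One EFT extrapolation update: from (T_{i,j-1}, eT_{i,j-1}) and
   (T_{i-1,j-1}, eT_{i-1,j-1}), produce (R_ij, eR_ij) and (T_ij, eT_ij). */
void eft_extrapolate(int n, double c_ij, double ec_ij,
                     const double *T_prev, const double *eT_prev, /* T_{i-1,j-1} */
                     const double *T_cur,  const double *eT_cur,  /* T_{i,j-1}   */
                     double *R, double *eR, double *T_new, double *eT_new)
{
    for (int k = 0; k < n; k++) {
        R[k] = T_cur[k];      eR[k] = eT_cur[k];      /* (R_ij, eR_ij) := (T_{i,j-1}, eT_{i,j-1}) */
        T_new[k] = T_cur[k];  eT_new[k] = eT_cur[k];
    }
    axpy_error(n, -1.0, 0.0, T_prev, eT_prev, R, eR);  /* R := T_{i,j-1} - T_{i-1,j-1} */
    scal_error(n, c_ij, ec_ij, R, eR);                 /* R := c_ij * R                */
    axpy_error(n,  1.0, 0.0, R, eR, T_new, eT_new);    /* T_ij := T_{i,j-1} + R        */
}
```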
SLIDE 13

Møller method

The Møller method was proposed to reduce the accumulation of round-off errors incurred during the approximation of IVPs of ODEs; it is a type of compensated summation. For the original summation S_i := S_{i−1} + z_{i−1}, we compute it as follows:

    s_i := z_{i−1} ⊖ R_{i−1}  (R_0 = 0)
    S_i := S_{i−1} ⊕ s_i
    r_i := S_i ⊖ S_{i−1}
    R_i := r_i ⊖ s_i

⇓

    s_i := z_{i−1} ⊕ R′_{i−1}  (R′_0 = 0)
    (S_i, R′_i) := QuickTwoSum(S_{i−1}, s_i).  (3)
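A C sketch of the rewritten Møller summation (3), here applied to a plain sum of m terms (names are ours):

```c
/* QuickTwoSum as in the earlier sketches (assumes |a| >= |b|). */
static void quick_two_sum(double a, double b, double *s, double *e)
{ *s = a + b; *e = b - (*s - a); }

/* Møller-style compensated summation of z[0..m-1] using recurrence (3). */
static double moller_sum(const double *z, int m)
{
    double S = 0.0, R = 0.0;          /* running sum S_i and correction R'_i */
    for (int i = 0; i < m; i++) {
        double s = z[i] + R;          /* s_i := z_{i-1} (+) R'_{i-1}          */
        quick_two_sum(S, s, &S, &R);  /* (S_i, R'_i) := QuickTwoSum(S_{i-1}, s_i) */
    }
    return S;                          /* R still carries the residual correction */
}
```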

SLIDE 14

Computing environment

Ryzen: AMD Ryzen 1700 (2.7 GHz), Ubuntu 16.04.5, GCC 5.4.0, QD 2.3.18 [7], LAPACK 3.8.0
Corei7: Intel Core i7-9700K (3.6 GHz), Ubuntu 18.04.2, GCC 7.3.0, QD 2.3.20, LAPACK 3.8.0

SLIDE 15

Targeted algorithms

Our target precisions are IEEE 754 double precision (Double) and DD as provided by the QD library. The targeted algorithms are as follows:

  • DEFT: double precision with AXPYerror
  • DEFTA: double precision, f + ef, with AXPYerrorA
  • DMøller: double precision Møller method

DEFTA means that FMAerrorApprox is used in the entire extrapolation process. For the DEFT, DEFTA, and DD computations, we used DD-precision f.

▶ To check for convergence (4), ∥R_ij∥ ≤ ε_R ∥T_{i,j−1}∥ + ε_A (4), we used ε_R = ε_A = 0 unless otherwise specified.
▶ All EFT basic functions were coded as C macros (see the sketch below).
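For instance, TwoSum can be written as a statement-style C macro along these lines; this is a sketch of the general idea, not the author's exact macros:

```c
/* TwoSum as a C macro: s + e == a + b exactly in binary64 (round-to-nearest).
   The output s must not alias a or b, since a and b are read after s is set. */
#define TWO_SUM(a, b, s, e) do {                     \
    double _ts_v;                                    \
    (s) = (a) + (b);                                 \
    _ts_v = (s) - (a);                               \
    (e) = ((a) - ((s) - _ts_v)) + ((b) - _ts_v);     \
} while (0)
```

A small usage example: with a = 1.0e16 and b = 1.0, `TWO_SUM(a, b, s, e)` yields s == 1.0e16 and e == 1.0, i.e. the rounding error of the addition is recovered exactly.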

SLIDE 16

Numerical experiments

1. Problem 1 (simple linear ODE):

$$\frac{d}{dt}\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} -y_1 \\ \vdots \\ -n y_n \end{bmatrix} \;\Rightarrow\; y(t) = \begin{bmatrix} \exp(-t) \\ \vdots \\ \exp(-nt) \end{bmatrix}$$

y(0) = [1 1 ··· 1]^T, t ∈ [0, 1/4], n = 2048.

2. Problem 2 (resonance problem):

$$\frac{d}{dt}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} y_2 \\ -\alpha y_1^2 \sin t + 2\alpha y_1 y_2 \cos t \end{bmatrix}$$

y(0) = [1 α]^T, t ∈ [0, 37], where α = 0.99999999. The analytical solution is

$$y(t) = \begin{bmatrix} 1/(1 - \alpha \sin t) \\ \alpha \cos t/(1 - \alpha \sin t)^2 \end{bmatrix}.$$

C sketches of both right-hand sides follow below.
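Plain double-precision right-hand sides of the two test problems might be coded as follows (illustrative sketches; the experiments described above actually evaluate f in DD precision for DEFT, DEFTA, and DD):

```c
#include <math.h>

#define ALPHA 0.99999999

/* Problem 1: dy_i/dt = -i * y_i, i = 1..n (exact solution y_i(t) = exp(-i t)). */
void f_linear(double t, const double *y, double *fy, int n)
{
    (void)t;   /* autonomous system: t is unused */
    for (int i = 0; i < n; i++)
        fy[i] = -(double)(i + 1) * y[i];
}

/* Problem 2 (resonance): y1' = y2, y2' = -a*y1^2*sin t + 2*a*y1*y2*cos t. */
void f_resonance(double t, const double *y, double *fy, int n)
{
    (void)n;   /* fixed dimension 2 */
    fy[0] = y[1];
    fy[1] = -ALPHA * y[0] * y[0] * sin(t) + 2.0 * ALPHA * y[0] * y[1] * cos(t);
}
```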

SLIDE 17

Problem 1: Simple Linear ODE

d dt      y1 y2 . . . yn      =      −y1 −2y2 . . . −nyn      = ⇒ y(t) =      exp(−x) exp(−2x) . . . exp(−nx)      y(0) = [1 1 · · · 1]T , t ∈ [0, 1/4], n = 2048.

SLIDE 18

Problem 1: Simple Linear ODE

Romberg sequence, L = 4, at t_end = 1/4.

Computational time (s) on Ryzen:
#steps   DD     DEFT   DEFTA  Double  DMøller
512      1.79   1.41   0.99   0.2     0.33
1024     3.59   2.81   1.95   0.41    0.67
2048     7.18   5.64   3.82   0.81    1.33
4096     14.4   11.3   7.58   1.62    2.66

Computational time (s) on Corei7:
#steps   DD     DEFT   DEFTA  Double  DMøller
512      1.17   0.86   0.73   0.1     0.26
1024     2.33   1.69   1.47   0.21    0.52
2048     4.64   3.39   2.92   0.41    1.04
4096     9.34   6.75   5.87   0.82    2.07

Max. relative error:
#steps   DD       DEFT     DEFTA    Double   DMøller
512      1.8E-07  1.8E-07  1.8E-07  1.8E-07  1.8E-07
1024     1.2E-10  1.2E-10  1.2E-10  1.2E-10  1.2E-10
2048     9.3E-14  9.3E-14  9.3E-14  1.5E-13  9.4E-14
4096     8.2E-17  4.6E-16  4.6E-16  2.3E-13  4.3E-14

SLIDE 19

Problem 1: Simple Linear ODE

Harmonic sequence, L = 6, at t_end = 1/4.

Computational time (s) on Ryzen:
#steps   DD     DEFT   DEFTA  Double  DMøller
512      1.87   1.76   1.42   0.28    0.4
1024     3.74   3.53   2.84   0.55    0.81
2048     7.48   6.93   5.58   1.11    1.62
4096     14.9   10.4   8.38   2.22    3.24

Computational time (s) on Corei7:
#steps   DD     DEFT   DEFTA  Double  DMøller
512      1.4    1.04   0.89   0.1     0.26
1024     2.8    2.07   1.78   0.21    0.52
2048     5.6    4.11   3.5    0.41    1.04
4096     11.2   6.17   5.27   0.82    2.07

Max. relative error:
#steps   DD       DEFT     DEFTA    Double   DMøller
512      4.3E-10  4.3E-10  4.3E-10  4.3E-10  4.3E-10
1024     1.7E-14  2.7E-14  2.7E-14  7.1E-13  6.6E-13
2048     8.4E-19  1.3E-14  1.3E-14  9.2E-13  7.2E-13
4096     4.6E-23  5.5E-15  5.5E-15  1.0E-12  7.6E-13

SLIDE 20

Problem 2: Resonance problem

We pick the following resonance problem, which requires step-size control:

$$\frac{d}{dt}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} y_2 \\ -\alpha y_1^2 \sin t + 2\alpha y_1 y_2 \cos t \end{bmatrix}$$

y(0) = [1 α]^T, t ∈ [0, 37], where α = 0.99999999. The analytical solution is

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1/(1 - \alpha \sin t) \\ \alpha \cos t/(1 - \alpha \sin t)^2 \end{bmatrix}.$$

The step-size control algorithm is the one proposed by Murofushi and Nagasaka [4], wherein the current step size is halved if the convergence condition (4) is not satisfied (see the sketch below). The maximum number of stages is L = 12 for the Romberg sequence and L = 18 for the harmonic sequence, as recommended in [4].
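Schematically, the step-size control loop can be sketched as follows; extrapolation_step is a hypothetical routine that performs one extrapolation step and reports whether the convergence test (4) was satisfied:

```c
#include <stdbool.h>

/* Hypothetical: one extrapolation step from t with step size h; returns true
   if the convergence test (4) was satisfied and y was advanced in place. */
bool extrapolation_step(double t, double h, double *y, int n);

/* Integrate over [t0, t_end], halving h whenever convergence (4) fails
   (step-size control as in Murofushi and Nagasaka [4]). */
void integrate(double t0, double t_end, double h0, double *y, int n)
{
    double t = t0, h = h0;
    while (t < t_end) {
        if (t + h > t_end) h = t_end - t;     /* do not overshoot t_end      */
        if (extrapolation_step(t, h, y, n)) {
            t += h;                           /* accepted: advance the time  */
        } else {
            h *= 0.5;                         /* rejected: halve the step    */
        }
    }
}
```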

SLIDE 21

Problem 2: Resonance problem

Computational time and maximum relative errors at t_end = 37 with the Romberg sequence (L = 12):

Method    #steps  Time on Ryzen (s)  Time on Corei7 (s)  Max.Rel.Err.
Double    84      0.360              0.018               1.0E-01
DEFT      100     0.514              0.401               3.7E-04
DEFTA     100     0.507              0.398               3.7E-04
DMøller   98      0.149              0.094               5.2E-04
DD        213     1.895              1.598               1.1E-17

SLIDE 22

Problem 2: Resonance problem

Computational time and maximum relative errors at t_end = 37 with the harmonic sequence (L = 18):

Method            #steps  Time on Ryzen (s)  Time on Corei7 (s)  Max.Rel.Err.
Double            NC      -                  -                   -
DEFT              159     0.0458             0.0383              4.5E-04
DEFTA             159     0.0448             0.0378              4.5E-04
DMøller           NC      -                  -                   -
DD (εR = 10⁻¹⁶)   121     0.136              0.077               3.2E-02
DD (εR = 10⁻¹⁸)   186     0.158              0.098               6.0E-05
DD (εR = 10⁻³⁰)   6455    1.53               1.30                4.6E-13

(NC: not converged)

SLIDE 23

Conclusion

▶ DEFTA is approximately 1.6 times faster than DD and 1.2 times faster than DEFT.
▶ There is no difference between DEFT's and DEFTA's approximations.
▶ DEFT and DEFTA are effective for the resonance problem with the harmonic sequence.

SLIDE 24

References

  • [1] T. Ogita, S. M. Rump, and S. Oishi, Accurate Sum and Dot Product, SIAM Journal on Scientific Computing, 26 (2005), 1955–1988.
  • [2] Y. Kobayashi and T. Ogita, A fast and efficient algorithm for solving ill-conditioned linear systems, JSIAM Letters, 7 (2015), 1–4.
  • [3] E. Hairer, S. P. Nørsett, and G. Wanner, Solving Ordinary Differential Equations I, Springer-Verlag, New York, 1996.
  • [4] M. Murofushi and H. Nagasaka, The relationship between the round-off errors and Møller's algorithm in the extrapolation method, Annals Num., 1 (1994), 451–458.
  • [5] O. Møller, Quasi Double-Precision in Floating Point Addition, BIT, 5 (1965), 37–50.
  • [6] S. Boldo and J.-M. Muller, Exact and Approximated Error of the FMA, IEEE Transactions on Computers, 60 (2011), 157–164.