Implementation and performance evaluation of an extended precision floating-point arithmetic library for high-accuracy semidefinite programming
Mioara Joldes, Jean-Michel Muller and Valentina Popescu
ARITH 24, July 2017
CAMPARY: CudA Multiple Precision ARithmetic librarY

When do we need more precision?
– Computing correctly rounded transcendental functions (e.g., CRLIBM).
– Dynamical systems: computing periodic orbits (e.g., finding sinks in the Hénon map, iterating the Lorenz attractor); celestial mechanics (e.g., long-term stability of the solar system).
  [Figure: plot with axes x1 and x2]
– Optimization problems in experimental mathematics: computation of kissing numbers, bounds for binary codes, problems in control theory and structural design (e.g., the wing of the Airbus A380), problems in quantum chemistry/information, etc.
⇒ solved using Semi-Definite Programming (SDP)

Outline
– Overview of Semi-Definite Programming
– SDPA-CAMPARY
– Performance and Numerical Results

What is SDP?
– a convex optimization problem;
– an extension of linear programming;
– applied to the cone of symmetric matrices with non-negative eigenvalues;
– the linear vector inequalities are replaced by linear matrix inequalities (LMI).

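Formal definition
– $\mathbb{R}^{n \times n}$: the space of $n \times n$ real matrices;
– $\mathbb{S}^n \subset \mathbb{R}^{n \times n}$: the subspace of real symmetric matrices, equipped with the inner product $\langle A, B \rangle_{\mathbb{S}^n} = \operatorname{tr}(A^T B)$, where $\operatorname{tr}(A)$ is the trace of $A$;
– $A \succeq O$ denotes a positive semidefinite matrix $A$.

$$
\begin{aligned}
(P)\qquad p^* = \sup_{X \in \mathbb{S}^n} \;& \langle C, X \rangle_{\mathbb{S}^n} \\
\text{s.t.}\;& \langle A_i, X \rangle_{\mathbb{S}^n} = b_i, \quad i = 1, \dots, m, \\
& X \succeq O,
\end{aligned}
$$

$$
\begin{aligned}
(D)\qquad d^* = \inf_{y \in \mathbb{R}^m} \;& b^T y \\
\text{s.t.}\;& Y := \sum_{i=1}^{m} y_i A_i - C \succeq O,
\end{aligned}
$$

for given $C, A_i \in \mathbb{S}^n$, $i = 1, \dots, m$, and $b \in \mathbb{R}^m$.

Classical solution method: the Primal-Dual Interior-Point Method (PDIPM).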
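As a short worked step (standard SDP theory, not stated on the slide), weak duality follows directly from the two formulations. For any primal-feasible $X$ and dual-feasible $y$:

```latex
\begin{aligned}
b^T y - \langle C, X \rangle_{\mathbb{S}^n}
  &= \sum_{i=1}^{m} y_i \, \langle A_i, X \rangle_{\mathbb{S}^n} - \langle C, X \rangle_{\mathbb{S}^n} \\
  &= \Big\langle \sum_{i=1}^{m} y_i A_i - C,\; X \Big\rangle_{\mathbb{S}^n}
   = \langle Y, X \rangle_{\mathbb{S}^n} \;\ge\; 0 .
\end{aligned}
```

The last inequality holds because the trace inner product of two positive semidefinite matrices is non-negative, so $p^* \le d^*$. The quantity $\langle Y, X \rangle_{\mathbb{S}^n}$ is the duality gap that a primal-dual interior-point method drives toward zero.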

Existing SDP solvers
– double precision: SeDuMi, SDPT3, CSDP, MOSEK (proprietary software);
– exact rational arithmetic: SPECTRA;
– interval arithmetic: VSDP;
– higher (extended) precision: the SDPA family (DD, QD and GMP versions).

SDPA features
– written in C/C++;
– starting with v6.0 it incorporates LAPACK for dense matrix computations;
– more recently it integrated MPACK (a multiple-precision linear algebra package based on BLAS and LAPACK);
– MPACK also offers a GPU-tuned implementation of the Rgemm routine in double-double.

What is CAMPARY? CudA Multiple-Precision ARithmetic librarY
– uses the multiple-term approach for extending the available precision → floating-point expansions;
– moderate arbitrary precision (a few hundred bits);
– targets both CPU and GPU (compilers: GCC, NVCC);
– underlying FP format: binary32 (up to 12 terms) or binary64 (up to 39 terms);
◦ sequential algorithms: all basic operations (+/−, ×, ÷, √)
  – accurate algorithms: tight error bounds;
  – “quick-and-dirty” algorithms: corner cases are not considered;
  ⋆ optimized algorithms for double-word arithmetic;
◦ GPU-tuned parallel algorithms: +/−, ×;
– thorough correctness proofs and error analysis.
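The multiple-term approach can be illustrated with the classic error-free transformations on which floating-point expansions are built. The following is a minimal Python sketch (Python floats are binary64), not CAMPARY's own code: the function names `two_sum`, `fast_two_sum`, `dw_add` and `exact` are hypothetical, and `dw_add` follows the accurate double-word addition pattern described in the double-word arithmetic literature, assuming round-to-nearest and no overflow.

```python
from fractions import Fraction


def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    a_virtual = s - b
    b_virtual = s - a_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def fast_two_sum(a, b):
    """Dekker's Fast2Sum: same result as two_sum, but requires |a| >= |b| (or a == 0)."""
    s = a + b
    e = b - (s - a)
    return s, e


def dw_add(x, y):
    """Add two double-word numbers x = (xh, xl) and y = (yh, yl).

    Two TwoSums add the high and low parts; two Fast2Sums renormalise
    the result into a (high, low) pair again.
    """
    xh, xl = x
    yh, yl = y
    sh, sl = two_sum(xh, yh)
    th, tl = two_sum(xl, yl)
    c = sl + th
    vh, vl = fast_two_sum(sh, c)
    w = tl + vl
    return fast_two_sum(vh, w)


def exact(*terms):
    """Exact rational value of an expansion, i.e. the unevaluated sum of its floats."""
    return sum(Fraction(t) for t in terms)
```

For example, `two_sum(0.1, 0.2)` returns the rounded sum together with the rounding error that plain `0.1 + 0.2` discards, and `exact(*two_sum(0.1, 0.2)) == exact(0.1, 0.2)` holds. An n-term expansion generalises the `(high, low)` pair to n floats of decreasing magnitude.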

Integrating CAMPARY with MPACK
Reminder: MPACK provides a GPU-tuned implementation in double-double (DD) for matrix multiplication.
1. we replaced the underlying arithmetic for all CPU routines in DD;
2. we re-implemented the GPU-tuned Rgemm using CAMPARY:
– the classical blocking algorithm is employed;
– one thread is created for each element of a block;
– a specific number of threads is also allocated per block;
– shared memory is used for each block;
– reading is done from global memory.
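The blocking scheme listed above can be sketched as a sequential Python simulation. This is an illustration under stated assumptions, not the actual CUDA kernel: `blocked_matmul` and its `tile` parameter are hypothetical names, plain floats stand in for the multiple-precision type, and the loops over `(ti, tj)` stand in for threads that would run in parallel on a GPU.

```python
def blocked_matmul(A, B, tile=2):
    """Compute C = A * B with the classical blocking scheme, simulated sequentially.

    Each (bi, bj) pair plays the role of a thread block, and each (ti, tj)
    pair the role of one thread that owns one element of the output tile.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for bi in range(0, n, tile):          # block row of C
        for bj in range(0, m, tile):      # block column of C
            rows = range(bi, min(bi + tile, n))
            cols = range(bj, min(bj + tile, m))
            for bk in range(0, k, tile):  # march along the inner dimension
                ks = range(bk, min(bk + tile, k))
                # "Shared memory": copy both tiles out of "global memory" once.
                a_tile = [[A[i][p] for p in ks] for i in rows]
                b_tile = [[B[p][j] for j in cols] for p in ks]
                # Each "thread" (ti, tj) accumulates into its own output element.
                for ti, i in enumerate(rows):
                    for tj, j in enumerate(cols):
                        acc = C[i][j]
                        for tp in range(len(ks)):
                            acc += a_tile[ti][tp] * b_tile[tp][tj]
                        C[i][j] = acc
    return C
```

In a multiple-precision version, the `acc += a * b` step would be replaced by the corresponding multiple-term multiply and add operations; the tile size here is arbitrary, whereas a real kernel would tune it to the shared memory available per block.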

[Figure: MFLOPs versus matrix dimension (0 to 2000); legend: [25], CAMPARY]
Performance of RGEMM with CAMPARY vs [Nakata2012] in DD on GPU.
Max. performance:
– 14.8 GFlops for CAMPARY,
– 16.4 GFlops for [Nakata2012].

[Figure: MFLOPs versus matrix dimension (0 to 1000); legend: 3D, 4D, 5D, 6D, 8D]
Performance of RGEMM with CAMPARY for n-double on GPU.
Max. performance:
– 1.6 GFlops for 3D,
– 976 MFlops for 4D,
– 660 MFlops for 5D,
– 453 MFlops for 6D,
– 200 MFlops for 8D.

SDPA-CAMPARY
1. started from the SDPA-DD package, in which we changed the underlying arithmetic;
2. linked the CAMPARY version of MPACK with it;
3. tested performance using standard problems from the SDPLIB package;
4. tested accuracy on binary codes problems from Sotirov’s collection.
