CAMPARY: CudA Multiple Precision ARithmetic librarY

Implementation and performance evaluation of an extended precision floating-point arithmetic library for high-accuracy semidefinite programming
Mioara Joldes, Jean-Michel Muller and Valentina Popescu
ARITH 24, July 2017


  5. When do we need more precision?
     – Computing correctly rounded transcendental functions (e.g., CRLIBM).
     – Dynamical systems field:
         – computing periodic orbits (e.g., finding sinks in the Hénon map, iterating the Lorenz attractor),
         – celestial mechanics (e.g., long-term stability of the solar system).
       [Figure: plot in the (x1, x2) plane]
     – Optimization problems in experimental mathematics:
         – computation of kissing numbers,
         – bounds for binary codes,
         – problems in control theory and structural design (e.g., the wing of the Airbus A380),
         – problems in quantum chemistry/information, etc.
       ⇒ solved using Semi-Definite Programming (SDP)

  6. Outline
     – Overview of Semi-Definite Programming
     – SDPA-CAMPARY
     – Performance and Numerical Results

  7. What is SDP?
     – a convex optimization problem;
     – an extension of linear programming;
     – applied to the cone of symmetric matrices with non-negative eigenvalues;
     – the linear vector inequalities are replaced by linear matrix inequalities (LMI).

  8. Formal definition
     – $\mathbb{R}^{n \times n}$: the space of $n \times n$ real matrices;
     – $\mathbb{S}^n \subset \mathbb{R}^{n \times n}$: the subspace of real symmetric matrices, equipped with the inner product $\langle A, B \rangle_{\mathbb{S}^n} = \operatorname{tr}(A^T B)$, where $\operatorname{tr}(A)$ is the trace of $A$;
     – $A \succeq O$ denotes a positive semidefinite matrix $A$.

     $$(P)\quad p^* = \sup_{X \in \mathbb{S}^n} \langle C, X \rangle_{\mathbb{S}^n} \quad \text{s.t.}\quad \langle A_i, X \rangle_{\mathbb{S}^n} = b_i,\ i = 1, \dots, m,\quad X \succeq O,$$

     $$(D)\quad d^* = \inf_{y \in \mathbb{R}^m} b^T y \quad \text{s.t.}\quad Y := \sum_{i=1}^{m} y_i A_i - C \succeq O,$$

     for given $C, A_i \in \mathbb{S}^n$, $i = 1, \dots, m$, and $b \in \mathbb{R}^m$.
     Classical solving: Primal-Dual Interior-Point Method (PDIPM) algorithm.
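A short worked step, not taken from the slides, may help connect (P) and (D): for any primal-feasible $X$ and dual-feasible $y$, the dual objective bounds the primal objective from above (weak duality). The derivation below uses only the constraints of (P) and (D) and the standard fact that the inner product of two positive semidefinite matrices is non-negative.

```latex
\begin{aligned}
b^{T} y - \langle C, X \rangle_{\mathbb{S}^n}
  &= \sum_{i=1}^{m} y_i \,\langle A_i, X \rangle_{\mathbb{S}^n}
     - \langle C, X \rangle_{\mathbb{S}^n}
     && \text{since } \langle A_i, X \rangle_{\mathbb{S}^n} = b_i \\
  &= \Big\langle \sum_{i=1}^{m} y_i A_i - C,\; X \Big\rangle_{\mathbb{S}^n}
   = \langle Y, X \rangle_{\mathbb{S}^n} \;\ge\; 0
     && \text{since } Y \succeq O,\ X \succeq O .
\end{aligned}
```

Taking the supremum over $X$ and the infimum over $y$ gives $p^* \le d^*$. The gap $\langle Y, X \rangle_{\mathbb{S}^n}$ is the quantity a primal-dual interior-point method drives towards zero.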

  10. Existing SDP solvers
      – in double precision: SeDuMi, SDPT3, CSDP, MOSEK (proprietary software);
      – exact rational arithmetic: SPECTRA;
      – uses interval arithmetic: VSDP;
      – supports higher extended precision: SDPA family (DD, QD and GMP versions).
      SDPA features
      – written in C/C++;
      – starting with v6.0 it incorporates LAPACK for dense matrix computations;
      – more recently it integrated MPACK (a multiple-precision linear algebra package based on BLAS and LAPACK);
      – MPACK also offers a GPU-tuned implementation in double-double of the Rgemm routine.

  17. What is CAMPARY? CudA Multiple-Precision ARithmetic librarY
      – uses the multiple-term approach for extending the available precision → floating-point expansions;
      – moderate arbitrary precision (a few hundred bits);
      – targets both CPU and GPU (compilers: GCC, NVCC);
      – underlying FP format: binary32 (up to 12 terms) or binary64 (up to 39 terms);
      ◦ sequential algorithms: all basic operations (+/−, ×, ÷, √)
          – accurate algorithms: tight error bounds;
          – "quick-and-dirty" algorithms: do not consider corner cases;
          ⋆ optimized algorithms for double-word arithmetic;
      ◦ GPU-tuned parallel algorithms: +/−, ×;
      – thorough correctness proofs and error analysis.
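To make the "multiple-term approach" concrete, here is a minimal C++ sketch of the two-term (double-word) case built on binary64. It uses the classical error-free transformations (Knuth's TwoSum, Dekker's Fast2Sum, and an FMA-based TwoProd) and one well-known double-word addition scheme. All names (`DoubleWord`, `two_sum`, `fast_two_sum`, `two_prod`, `dw_add`, `example`) are illustrative assumptions and are not CAMPARY's API; the sketch assumes round-to-nearest and no value-changing compiler options such as `-ffast-math`.

```cpp
#include <cassert>
#include <cmath>

// A double-word number: the unevaluated sum hi + lo, with lo much smaller than hi.
// (Illustrative type, not CAMPARY's.)
struct DoubleWord {
    double hi;
    double lo;
};

// Knuth's TwoSum: hi = fl(a + b), lo = exact rounding error,
// so a + b == hi + lo exactly (barring overflow).
inline DoubleWord two_sum(double a, double b) {
    double s  = a + b;
    double bb = s - a;
    double e  = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Dekker's Fast2Sum: same result with fewer operations,
// valid when the exponent of a is at least that of b.
inline DoubleWord fast_two_sum(double a, double b) {
    double s = a + b;
    double e = b - (s - a);
    return {s, e};
}

// TwoProd via fused multiply-add: a * b == hi + lo exactly
// (barring overflow and underflow).
inline DoubleWord two_prod(double a, double b) {
    double p = a * b;
    double e = std::fma(a, b, -p);
    return {p, e};
}

// Double-word + double-word addition: sum the high and low parts
// separately with TwoSum, then renormalize twice with Fast2Sum.
inline DoubleWord dw_add(DoubleWord x, DoubleWord y) {
    DoubleWord s = two_sum(x.hi, y.hi);
    DoubleWord t = two_sum(x.lo, y.lo);
    DoubleWord v = fast_two_sum(s.hi, s.lo + t.hi);
    return fast_two_sum(v.hi, t.lo + v.lo);
}

// Usage example: 1 + 2^-60 does not fit in one binary64 number,
// but a double-word holds it exactly as (1, 2^-60).
inline void example() {
    DoubleWord x = two_sum(1.0, std::ldexp(1.0, -60));
    assert(x.hi == 1.0 && x.lo == std::ldexp(1.0, -60));
}
```

Longer expansions follow the same idea with more terms: each extra term captures the rounding error left over by the previous ones, at the cost of more floating-point operations per arithmetic step.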

  20. Integrating CAMPARY with MPACK
      Reminder: MPACK provides a GPU-tuned implementation in double-double (DD) for matrix multiplication.
      1. we replaced the underlying arithmetic for all CPU routines in DD;
      2. we re-implemented the GPU-tuned Rgemm using CAMPARY:
         – a classical blocking algorithm is employed;
         – a thread is created for each element of a block;
         – a specific number of threads is also allocated per block;
         – shared memory is used for each block;
         – reading is done from global memory.
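The blocking scheme listed above can be illustrated with a CPU-side analogue in C++. This is only a sketch under stated assumptions, not the authors' GPU kernel: the tile loops stand in for GPU thread blocks, the innermost (i, j) pair stands in for one GPU thread, and the tiles of A and B are what a GPU version would stage in shared memory. The names `DD`, `dd_add`, `dd_mul`, `gemm_dd_blocked` and the tile size `TILE` are hypothetical, and the routine computes only C = A·B rather than the full scaled update a BLAS-style Rgemm performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Double-double value: the unevaluated sum hi + lo. (Illustrative type.)
struct DD { double hi, lo; };

// Error-free sum (Knuth's TwoSum): a + b == hi + lo exactly.
static inline DD two_sum(double a, double b) {
    double s = a + b, bb = s - a;
    return {s, (a - (s - bb)) + (b - bb)};
}
// Error-free sum when a dominates b (Dekker's Fast2Sum).
static inline DD fast_two_sum(double a, double b) {
    double s = a + b;
    return {s, b - (s - a)};
}
// Error-free product via fused multiply-add: a * b == hi + lo exactly.
static inline DD two_prod(double a, double b) {
    double p = a * b;
    return {p, std::fma(a, b, -p)};
}
// Double-double addition and multiplication built from the blocks above.
static inline DD dd_add(DD x, DD y) {
    DD s = two_sum(x.hi, y.hi);
    DD t = two_sum(x.lo, y.lo);
    DD v = fast_two_sum(s.hi, s.lo + t.hi);
    return fast_two_sum(v.hi, t.lo + v.lo);
}
static inline DD dd_mul(DD x, DD y) {
    DD c = two_prod(x.hi, y.hi);
    double cross = x.hi * y.lo + x.lo * y.hi;  // low-order cross terms
    return fast_two_sum(c.hi, c.lo + cross);
}

constexpr std::size_t TILE = 16;  // hypothetical tile edge, not a value from the slides

// C = A * B for row-major n x n matrices, computed tile by tile.
void gemm_dd_blocked(std::size_t n, const std::vector<DD>& A,
                     const std::vector<DD>& B, std::vector<DD>& C) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    std::fill(C.begin(), C.end(), DD{0.0, 0.0});
    for (std::size_t ii = 0; ii < n; ii += TILE)
        for (std::size_t jj = 0; jj < n; jj += TILE)
            // GPU analogue: the (ii, jj) tile of C belongs to one thread block.
            for (std::size_t kk = 0; kk < n; kk += TILE) {
                // GPU analogue: these tiles of A and B would sit in shared memory.
                const std::size_t i_end = std::min(ii + TILE, n);
                const std::size_t j_end = std::min(jj + TILE, n);
                const std::size_t k_end = std::min(kk + TILE, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t j = jj; j < j_end; ++j) {
                        // GPU analogue: one thread per (i, j) element.
                        DD acc = C[i * n + j];
                        for (std::size_t k = kk; k < k_end; ++k)
                            acc = dd_add(acc, dd_mul(A[i * n + k], B[k * n + j]));
                        C[i * n + j] = acc;
                    }
            }
}
```

Blocking changes only the order in which partial products are visited; the arithmetic per element is the same as in the naive triple loop, so swapping `DD` for a longer expansion type changes the cost per operation but not the structure.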

  21. [Figure: MFLOPs (0 to 18000) versus matrix dimension (0 to 2000); series: [25] and CAMPARY]
      Performance of RGEMM with CAMPARY vs [Nakata2012] in DD on GPU. Max. performance:
      – 14.8 GFlops for CAMPARY,
      – 16.4 GFlops for [Nakata2012].

  22. [Figure: MFLOPs (0 to 1600) versus matrix dimension (0 to 1000); series: 3D, 4D, 5D, 6D, 8D]
      Performance of RGEMM with CAMPARY for n-double on GPU. Max. performance:
      – 1.6 GFlops for 3D,
      – 976 MFlops for 4D,
      – 660 MFlops for 5D,
      – 453 MFlops for 6D,
      – 200 MFlops for 8D.

  27. SDPA-CAMPARY
      1. started from the SDPA-DD package, in which we changed the underlying arithmetic;
      2. linked the CAMPARY version of MPACK with it;
      3. tested performance using standard problems from the SDPLIB package;
      4. tested accuracy on binary codes problems from Sotirov's collection.
