The Multiprecision Effort in the US Exascale Computing Project


1. The Multiprecision Effort in the US Exascale Computing Project
ICERM: Variable Precision in Mathematical and Scientific Computing, May 7/8, 2020
Hartwig Anzt & FiNE@KIT, in collaboration with Jack Dongarra & ICL, Ulrike Meier Yang, Enrique Quintana-Orti, and many others...
www.kit.edu | KIT – The Research University in the Helmholtz Association

2. What is the Multiprecision Effort in ECP
• Coordinated effort across all math library projects of the US Exascale Computing Project;
• Administratively part of the xSDK4ECP project led by Ulrike Meier Yang (LLNL);
• Links the multiprecision efforts of ECP project partners and creates synergies across the individual efforts;
• Evaluates the status quo and develops and deploys production-ready software;
• Algorithm focus on linear solvers, eigenvalue solvers, preconditioners, multigrid methods, FFT, Machine Learning (ML) technology;
• Hardware focus on leadership computers (Summit, Frontier…);
• We are focusing on performance, not (bit-wise) reproducibility.
H. Anzt: The Multiprecision Effort in the US Exascale Computing Project, 05/08/2020

3. Floating point formats and performance on GPUs

NVIDIA GPU generations, 2006-2020:

  Generation   Rel. compute performance     Rel. memory performance
               (double : single : half)     (double : single : half)
  Tesla        1 : 8                        1 : 2
  Fermi        1 : 8                        1 : 2
  Kepler       1 : 24                       1 : 2
  Maxwell      1 : 32                       1 : 2
  Pascal       1 : 2 : 4                    1 : 2 : 4
  Volta        1 : 2 : 16*                  1 : 2 : 4
  * Tensor cores

4. Floating point formats and performance on GPUs (table as on slide 3)
• For compute-bound applications, the performance gains from using lower precision depend on the architecture: up to 16x for FP16 on Volta, up to 32x for FP32 on Maxwell.
• For memory-bound applications, the performance gains from using lower precision are architecture-independent and correspond to the floating point format complexity (#bits): generally 2x for FP32, 4x for FP16.

5. Take-Away
• Performance of compute-bound algorithms depends on the format support of the hardware.
• Performance of memory-bound algorithms scales hardware-independently with the inverse of the format complexity.
Experiments based on the Ginkgo library (https://ginkgo-project.github.io/), example: ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

6. IEEE 754 Floating Point Formats
[Figure: sign bit | exponent | significand. Courtesy of Ignacio Laguna, LLNL (IDEAS Webinar #34 on Tools and Techniques for Floating-Point Analysis)]
Broadly speaking…
• The length of the exponent determines the range of the values that can be represented;
• The length of the significand determines how accurately values can be represented.

7. IEEE 754 Floating Point Formats
[Figure: bit layouts of double precision (FP64), single precision (FP32), and half precision (FP16). Courtesy of Ignacio Laguna, LLNL (IDEAS Webinar #34 on Tools and Techniques for Floating-Point Analysis)]
Broadly speaking…
• The length of the exponent determines the range of the values that can be represented;
• The length of the significand determines how accurately values can be represented.

8. Floating point formats and accuracy
• The length of the exponent determines the range of the values that can be represented;
• The length of the significand determines how accurately values can be represented;
• Rounding effects accumulate over a sequence of computations.
Let us focus on linear systems of the form Ax = b.
• The conditioning of a linear system reflects how sensitive the solution x is with regard to changes in the right-hand side b.
• Rule of thumb: relative residual accuracy = (unit round-off) * (linear system's condition number)
N. Higham: Accuracy and Stability of Numerical Algorithms. SIAM, 2002.

9. Floating point formats and accuracy
Linear system Ax = b with cond(A) ≈ 10^4
Experiments based on the Ginkgo library (https://ginkgo-project.github.io/), example: ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

10. Floating point formats and accuracy
Linear system Ax = b with cond(A) ≈ 10^4
Double Precision: accuracy improvement ~10^12
Switching the working precision in the code:
  - ValueType = double;
  + ValueType = float;
relative residual accuracy = (unit round-off) * (linear system's condition number)
Experiments based on the Ginkgo library (https://ginkgo-project.github.io/), example: ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

11. Floating point formats and accuracy
Linear system Ax = b with cond(A) ≈ 10^4
Double Precision: accuracy improvement ~10^12
Single Precision: accuracy improvement ~10^4
relative residual accuracy = (unit round-off) * (linear system's condition number)
Experiments based on the Ginkgo library (https://ginkgo-project.github.io/), example: ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

12. Floating point formats and accuracy
Linear system Ax = b with cond(A) = 10^3
Double Precision: accuracy improvement ~10^13
Single Precision: accuracy improvement ~10^4
Experiments based on the Ginkgo library (https://ginkgo-project.github.io/), example: ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

13. Floating point formats and accuracy
Linear system Ax = b with cond(A) ≈ 10^4
Single Precision is 10% faster than Double Precision!
Experiments based on the Ginkgo library (https://ginkgo-project.github.io/), example: ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

14. Floating point formats and accuracy
Linear system Ax = b with cond(A) ≈ 10^7 (apache2 from SuiteSparse)
Double Precision: accuracy improvement ~10^9
Single Precision: no improvement
Experiments based on the Ginkgo library (https://ginkgo-project.github.io/), example: ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

15. Take-Away
• Performance of compute-bound algorithms depends on the format support of the hardware.
• Performance of memory-bound algorithms scales hardware-independently with the inverse of the format complexity.
• relative residual accuracy = (unit round-off) * (linear system's condition number)
• If the problem is well-conditioned and a low-accuracy solution is acceptable, use a low precision format (e.g. IEEE single precision, or even IEEE half precision).

16. Take-Away (bullets as on slide 15)
Framework for exploring the effect of the floating point format in iterative solvers: https://github.com/ginkgo-project/ginkgo
Contributors: Terry Cojean, Goran Flegar, Thomas Grützmacher, Pratik Nayak, Mike Tsai, Tobias Ribizel, Fritz Göbel
