Iterative Solution of Linear Systems in Electromagnetics (and not only): Experiences with CUDA

SLIDE 1

UnConventional High Performance Computing (UCHPC) 2010 workshop

Iterative Solution of Linear Systems in Electromagnetics (and not only): Experiences with CUDA

  • D. De Donno, A. Esposito, G. Monti, L. Tarricone
  • Innovation Engineering Department, University of Salento - Lecce, Italy

August 30th, 2010

SLIDE 2

Outline

  • Background on CEM issues
  • Motivations and objectives
  • Design and implementation
  • Experimental results
  • Conclusions and ongoing work

SLIDE 3

Background

One of the fundamental tasks in numerical computing is the solution of linear systems:

$A x = b$

These systems arise very frequently in scientific computing, for example from finite-difference or finite-element approximations of partial differential equations.

In Computational ElectroMagnetics (CEM), for instance, modeling the interaction of electromagnetic fields with physical objects and the environment gives rise to linear systems with a large number of unknowns.

Typical applications: RF components and 3D EM circuits, EMC analysis, antenna design, scattering problems.

SLIDE 4

Method of Moments (MoM) linear systems

In CEM a key role is played by the Method of Moments (MoM), which transforms the integro-differential Maxwell's equations into a linear system of algebraic equations:

Object discretization → MoM-based EM simulator → solve the linear system

$Z I = V$

Z is the MoM impedance matrix containing the complex reaction terms between basis functions. In large EM problems, Z can be reduced to an unstructured and significantly sparse matrix without affecting the numerical accuracy. ITERATIVE SOLVERS (e.g. CG, BiCG) are preferred in these cases.

SLIDE 5

Objective

We implemented a Bi-Conjugate Gradient (BiCG) iterative solver for GPUs. It tackles unstructured sparse matrices with double-precision complex data.

Recently, cheap and powerful graphics processors (GPUs) have emerged as a valid alternative to supercomputers and computational grids. Our implementation takes advantage of CUDA, a standard C language extension for parallel application development on NVIDIA GPUs. A CUDA application consists of SPMD computations (kernels) performed by threads running in parallel on the GPU streaming multiprocessors (SMs).
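
To make these concepts concrete, here is a minimal CUDA example, illustrative only and not part of the solver described in these slides: an SPMD kernel that scales a vector, launched over a grid of thread blocks.

```cuda
#include <cuda_runtime.h>

// SPMD kernel: every thread executes the same code on a different element,
// selected through its global thread index.
__global__ void scale(double* x, double a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the last block
        x[i] *= a;
}

int main()
{
    const int n = 1 << 20;
    double* d_x;
    cudaMalloc(&d_x, n * sizeof(double));           // allocate GPU global memory
    // ... fill d_x with cudaMemcpy ...
    int threads = 128;                              // threads per block
    int blocks  = (n + threads - 1) / threads;      // enough blocks to cover n
    scale<<<blocks, threads>>>(d_x, 2.0, n);        // kernel launch on the SMs
    cudaDeviceSynchronize();
    cudaFree(d_x);
    return 0;
}
```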

SLIDE 6

The BiCG algorithm (1/2)

BiCG is a generalization of the CG (Conjugate Gradient) method.

CG method
  • real symmetric matrices
  • complex Hermitian matrices

BiCG method
  • real non-symmetric matrices
  • complex non-Hermitian matrices

In the initialization phase of BiCG, the following variables are defined:

  • residual and bi-residual: $r_0 = b - A x_0$, $\tilde{r}_0 = r_0$
  • preconditioned residual and bi-residual: $d_0 = M^{-1} r_0$, $\tilde{d}_0 = M^{-H} \tilde{r}_0$
  • direction and bi-direction: $p_0 = d_0$, $\tilde{p}_0 = \tilde{d}_0$
  • initial residual error: $\rho_0 = \tilde{r}_0^{\,H} d_0$

SLIDE 7

The BiCG algorithm (2a/2)

In the BiCG main loop, the following steps are repeated at each iteration.

STEP 1. Calculate the step-length parameter $\alpha$ and form the new solution estimate:

$q_i = A p_i$
$\tilde{q}_i = A^{H} \tilde{p}_i$
$\alpha_i = \rho_i / (\tilde{p}_i^{\,H} q_i)$
$x_{i+1} = x_i + \alpha_i p_i$

SLIDE 8

The BiCG algorithm (2b/2)

STEP 2. Update the residual and bi-residual, with and without preconditioning:

$r_{i+1} = r_i - \alpha_i q_i$
$\tilde{r}_{i+1} = \tilde{r}_i - \alpha_i^{*} \tilde{q}_i$
$d_{i+1} = M^{-1} r_{i+1}$
$\tilde{d}_{i+1} = M^{-H} \tilde{r}_{i+1}$

SLIDE 9

The BiCG algorithm (2c/2)

STEP 3. Calculate the residual error ρ and the bi-conjugacy coefficient β:

$\rho_{i+1} = \tilde{r}_{i+1}^{\,H} d_{i+1}$
$\beta_i = \rho_{i+1} / \rho_i$

SLIDE 10

The BiCG algorithm (2d/2)

STEP 4. Update the next direction and bi-direction vectors:

$p_{i+1} = d_{i+1} + \beta_i p_i$
$\tilde{p}_{i+1} = \tilde{d}_{i+1} + \beta_i^{*} \tilde{p}_i$

The iteration continues until the convergence criterion is satisfied:

$\|r_i\|_2 / \|b\|_2 \le \varepsilon$

Commonly used values of $\varepsilon$ are $10^{-6}$ to $10^{-7}$.

SLIDE 11

Design of the GPU-enabled BiCG

In the GPU-enabled BiCG algorithm, the main loop execution is controlled on the host side, whereas the computations inside it are performed on the GPU.

INITIALIZATION consists in:
  • reading and storing the system matrix in a given sparse format;
  • allocating data structures on the GPU and calculating the BiCG initial variables.

BiCG MAIN LOOP consists of the iterative invocation of parallel CUDA kernels performing the BiCG operations.

FINALIZATION consists in retrieving the final results from GPU global memory.
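
A possible host-side skeleton of this structure is sketched below. It is illustrative only: variable names, the iteration limit, and the kernels referenced in the comments are placeholders rather than the actual implementation, and the preconditioner M is assumed diagonal so that applying it is an element-wise product.

```cuda
// Illustrative host-side skeleton of the GPU-enabled BiCG: control flow stays
// on the CPU, every vector/matrix operation is a CUDA kernel launch.
void bicg_gpu_skeleton()
{
    // INITIALIZATION: copy the sparse matrix (CRS or HYB) and b to the GPU,
    // compute r0 = b - A*x0, rt0 = r0, d0 = M^-1 r0, p0 = d0, rho0 = <rt0, d0>.

    double res_norm = 1.0;            // ||r_i||_2, updated by a reduction kernel
    double b_norm   = 1.0;            // ||b||_2, computed once at start-up
    const double eps = 1e-7;
    int it = 0, max_it = 10000;

    while (res_norm / b_norm > eps && it < max_it) {
        // STEP 1: SpMV kernels   q = A p,  qt = A^H pt
        //         dot kernel     alpha = rho / <pt, q>;   axpy  x += alpha p
        // STEP 2: axpy kernels   r -= alpha q,  rt -= conj(alpha) qt
        //         e-w product    d = M^-1 r,  dt = M^-H rt  (diagonal M)
        // STEP 3: dot kernels    rho_new = <rt, d>,  beta = rho_new / rho
        // STEP 4: axpy kernels   p = d + beta p,  pt = dt + conj(beta) pt
        // Only the scalars (alpha, beta, res_norm) are copied back to the host,
        // so host <-> GPU traffic per iteration is negligible.
        ++it;
    }

    // FINALIZATION: copy the solution vector x back from GPU global memory.
}
```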

SLIDE 12

Implementation (1/3)

Four basic CUDA kernels are enough to completely describe the BiCG main loop:

  • SpMV – sparse matrix-vector product: 8·nnz FLOPs
  • Dot product – scalar product of two vectors: 8·N FLOPs
  • E-w product – element-wise product of two vectors: 6·N FLOPs
  • axpy – ax + y (a scalar, x and y vectors): 8·N FLOPs

(With complex data, one multiplication costs 6 real floating-point operations and one addition costs 2, which is where the per-element counts above come from.)

Recall that we are considering non-symmetric and non-Hermitian sparse matrices with N rows and nnz non-zero complex coefficients.

SLIDE 13

Implementation (2a/3)

SpMV – this CUDA kernel implements the Bell and Garland algorithm (*), the best-performing code currently available for the sparse matrix-vector product:

$q_i = A p_i$
$\tilde{q}_i = A^{H} \tilde{p}_i$

Supported sparse matrix formats
  • CRS (Compressed Row Storage)
  • HYB (hybrid ELLpack-COOrdinate format)

Main modifications to the original code
  • double-precision complex matrix support;
  • CUDA grid, register number, shared and texture memory exploitation optimized for double-precision complex data.

(*) N. Bell and M. Garland, "Implementing sparse matrix-vector multiplication on throughput-oriented processors", in Supercomputing '09, Nov. 2009.
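
As a purely illustrative sketch (not the Bell-Garland code itself, which adds vectorized row processing, texture caching and other optimizations), a scalar CRS SpMV kernel for double-precision complex data looks roughly as follows, with one thread per matrix row:

```cuda
#include <cuComplex.h>   // cuDoubleComplex, cuCadd, cuCmul

__global__ void spmv_crs_scalar(int n_rows,
                                const int* __restrict__ row_ptr,
                                const int* __restrict__ col_idx,
                                const cuDoubleComplex* __restrict__ val,
                                const cuDoubleComplex* __restrict__ x,
                                cuDoubleComplex* __restrict__ y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;        // one thread per row
    if (row < n_rows) {
        cuDoubleComplex sum = make_cuDoubleComplex(0.0, 0.0);
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum = cuCadd(sum, cuCmul(val[j], x[col_idx[j]]));  // 8 flops per nnz
        y[row] = sum;
    }
}
```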

SLIDE 14

Implementation (2b/3)

Dot product – the cuBLAS dot function doesn't support double-precision complex data, therefore we implemented it from scratch. To maximize performance we also adapted Mark Harris' parallel reduction code (**) as the core of our implementation:

$\alpha_i = \rho_i / (\tilde{p}_i^{\,H} q_i)$
$\rho_{i+1} = \tilde{r}_{i+1}^{\,H} d_{i+1}$

(**) M. Harris, S. Sengupta, J.D. Owens, "Parallel prefix sum (scan) with CUDA", in GPU Gems 3, H. Nguyen, Ed., Addison-Wesley, August 2007.
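
The sketch below illustrates the idea of such a block-wise tree reduction for a complex dot product; it assumes 128 threads per block (a power of two), conjugates the first operand, and leaves the final reduction of the per-block partial sums to a second, tiny kernel or to the host. It is in the spirit of Harris' reduction, not the actual implementation.

```cuda
#include <cuComplex.h>

// Per-block partial sums of sum_i conj(a[i]) * b[i]; launch with 128 threads
// per block, then reduce the 'partial' array (one entry per block) separately.
__global__ void zdot_partial(const cuDoubleComplex* a, const cuDoubleComplex* b,
                             cuDoubleComplex* partial, int n)
{
    __shared__ cuDoubleComplex sdata[128];               // one slot per thread
    int tid = threadIdx.x;

    cuDoubleComplex sum = make_cuDoubleComplex(0.0, 0.0);
    for (int i = blockIdx.x * blockDim.x + tid; i < n;
         i += gridDim.x * blockDim.x)                    // grid-stride loop
        sum = cuCadd(sum, cuCmul(cuConj(a[i]), b[i]));
    sdata[tid] = sum;
    __syncthreads();

    for (int s = blockDim.x / 2; s > 0; s >>= 1) {       // tree reduction
        if (tid < s)
            sdata[tid] = cuCadd(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }
    if (tid == 0)
        partial[blockIdx.x] = sdata[0];                  // one value per block
}
```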

SLIDE 15

Implementation (3/3)

E-w product and axpy – also in this case, the cuBLAS functions provided by CUDA don't support double-precision complex data, therefore we implemented them from scratch:

$d_{i+1} = M^{-1} r_{i+1}$,  $\tilde{d}_{i+1} = M^{-H} \tilde{r}_{i+1}$
$x_{i+1} = x_i + \alpha_i p_i$
$r_{i+1} = r_i - \alpha_i q_i$,  $\tilde{r}_{i+1} = \tilde{r}_i - \alpha_i^{*} \tilde{q}_i$
$p_{i+1} = d_{i+1} + \beta_i p_i$,  $\tilde{p}_{i+1} = \tilde{d}_{i+1} + \beta_i^{*} \tilde{p}_i$

We defined a CUDA grid of 192 blocks, each with 128 threads, to fully exploit the GPU's resources. Threads load data from GPU global memory and perform calculations in parallel. We optimized the global memory access pattern to obtain completely coalesced loads and stores, thus minimizing latency.
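
A double-precision complex axpy written for such a fixed launch configuration can be sketched as below; the grid-stride loop lets 192 x 128 threads cover vectors of any length with coalesced accesses. The code is illustrative, not the actual kernel.

```cuda
#include <cuComplex.h>

// y = a*x + y for double-precision complex vectors; 8 real flops per element.
__global__ void zaxpy(cuDoubleComplex a,
                      const cuDoubleComplex* __restrict__ x,
                      cuDoubleComplex* __restrict__ y, int n)
{
    int stride = gridDim.x * blockDim.x;        // 192 * 128 threads in flight
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = cuCadd(cuCmul(a, x[i]), y[i]);
}

// Example launch with the grid described above:
//   zaxpy<<<192, 128>>>(alpha, d_p, d_x, N);
```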

SLIDE 16

Optimization strategies

In the design and implementation of the CUDA kernels we adopted the following optimization strategies (a small illustrative sketch follows the list):

  • CUDA on-chip shared memory exploitation for fast memory accesses;
  • register usage optimization;
  • loop unrolling;
  • texture memory exploitation for caching data that are spatially close together;
  • built-in arrays to store complex data, thus maximizing aligned memory accesses;
  • optimization of the thread block dimension to maximize multiprocessor occupancy.
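
As a small illustrative example of two of these points (built-in vector types for complex data and loop unrolling), an element-wise complex product can be written as below; this is not the actual kernel, only a sketch of the technique.

```cuda
// Storing a complex number as a CUDA built-in double2 keeps real and imaginary
// parts in one aligned 16-byte word, so consecutive threads issue coalesced loads.
__device__ __forceinline__ double2 cmul(double2 a, double2 b)
{
    return make_double2(a.x * b.x - a.y * b.y,    // real part
                        a.x * b.y + a.y * b.x);   // imaginary part
}

__global__ void ew_product(const double2* __restrict__ a,
                           const double2* __restrict__ b,
                           double2* __restrict__ c, int n)
{
    #pragma unroll 4                              // loop-unrolling hint to the compiler
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        c[i] = cmul(a[i], b[i]);                  // element-wise complex product
}
```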

SLIDE 17

Experiments and results

We tested our GPU-enabled BiCG solver on linear systems whose matrices were obtained:

1. from the application of the MoM to the design of EM circuits;
2. from the "University of Florida Sparse Matrix Collection".

The experimentation was carried out on the following platform:

HARDWARE configuration
  • GPU: NVIDIA GeForce GTX 260, 24 SMs (192 cores), 896 MB of GDDR3 SDRAM
  • CPU: Intel Core2 Quad Q9550 @ 2.83 GHz, 4 GB of RAM

SOFTWARE configuration
  • CUDA v. 2.3 with 190.53 driver, optimized for Ubuntu 9.10 32-bit O.S.
  • ATLAS v. 3.6 BLAS library

SLIDE 18

EM-MoM matrices: case study

The EM matrices derived from the MoM concern the design of branch-line couplers in microstrip technology: four-port devices widely adopted in microwave and millimetre-wave applications such as power dividers and combiners. More specifically, the analyzed layout consists of two branch-line couplers connected by means of a 360° microstrip line and operating in the 2.5-3.5 GHz frequency band.

SLIDE 19

EM-MoM matrices: results (1/3)

The figure below shows the convergence times of the sequential (CPU) and parallel (GPU) BiCG algorithm when varying the number of system unknowns, for two different sparse matrix storage formats (HYB and CRS).

The desired matrix sparsity pattern was obtained by means of a thresholding operation: we kept the percentage of non-zero elements at about 5% of the total number of entries while maintaining good accuracy of the final solution.

The adopted BiCG stopping criterion:

$\|r_i\|_2 / \|b\|_2 \le 10^{-7}$
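
The thresholding operation mentioned above can be pictured as a simple host-side routine that keeps only the entries of the dense MoM matrix whose magnitude exceeds a given threshold and stores them in CRS form. The function below is a hypothetical illustration, not the preprocessing code actually used; in practice the threshold is tuned so that about 5% of the entries survive.

```cuda
#include <complex>
#include <vector>

// Build a CRS representation of the dense n x n matrix Z, dropping every entry
// whose magnitude is below 'thresh'.
void threshold_to_crs(const std::complex<double>* Z, int n, double thresh,
                      std::vector<int>& row_ptr, std::vector<int>& col_idx,
                      std::vector<std::complex<double> >& val)
{
    row_ptr.assign(n + 1, 0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            std::complex<double> z = Z[(size_t)i * n + j];
            if (std::abs(z) >= thresh) {           // keep significant entries only
                col_idx.push_back(j);
                val.push_back(z);
            }
        }
        row_ptr[i + 1] = (int)val.size();          // CRS row pointer = running count
    }
}
```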

SLIDE 20

EM-MoM matrices: results (2/3)

The figure below shows the BiCG performance in terms of number of floating-point operations (FLOPs) per second.

Problem size (N)   Speed-up CRS   Speed-up HYB
2E+3                3.01           1.84
4E+3               12.8           11.5
6E+3               23.79          21.36
8E+3               24.76          20.74
10E+3              28.57          23.69

The achieved speed-ups are higher when the matrix dimension allows for an optimal exploitation of the hardware resources. CRS gains the most from GPU parallelization, achieving a speed-up of almost 30.

SLIDE 21

EM-MoM matrices: results (3/3)

For all the EM matrices we analyzed, the CRS format always produced faster results because of the high variability of the number of non-zeros per row. As the figure shows, the number of non-zeros per row varies widely in typical EM-MoM matrices, so CRS performs better than HYB. Indeed, the HYB storage format is suitable when the non-zero distribution across rows is fairly uniform.

SLIDE 22

FLORIDA matrices (1/2)

In order to demonstrate the validity of our GPU-enabled BiCG implementation, we also conducted tests on sparse matrices taken from "The University of Florida Sparse Matrix Collection". We identified ten complex sparse matrices belonging to different research areas and exhibiting different sizes, sparsity patterns and numbers of non-zeros.

SLIDE 23

FLORIDA matrices (2/2)

The figure below shows the performance obtained in terms of number of floating-point operations per second. In the worst case the achieved speed-up is about 10, while at best we obtained 55, with 15 GFlops/s for the GPU-enabled BiCG. As the number of non-zeros per row was substantially constant for all the chosen matrices, the HYB format performed better than CRS in all cases.

SLIDE 24

Conclusions and ongoing work

  • In this work, the achievement of peak performance for EM solvers through the use of inexpensive and powerful GPUs has been investigated.
  • Taking advantage of the CUDA library, we implemented a BiCG algorithm which tackles unstructured sparse matrices with double-precision complex data and manages two sparse matrix formats.
  • The solver has been tested on problems from several research areas. Results in terms of convergence behaviour and GPU vs. CPU performance have been provided as a validation and assessment of solver efficiency.

As further improvements to our work we plan to:
1. develop and test other sparse matrix formats suitable for EM-MoM problems;
2. integrate complex and well-known preconditioners into our BiCG algorithm;
3. compare our code with efficient BiCG solvers adopting the Intel MKL BLAS library on multi-core architectures;
4. hybridize our BiCG code to exploit multi-GPU and multi-core CPUs together.

SLIDE 25

Contacts

Danilo De Donno: danilo.dedonno@unisalento.it
Alessandra Esposito: alessandra.esposito@unisalento.it
Giuseppina Monti: giuseppina.monti@unisalento.it
Luciano Tarricone: luciano.tarricone@unisalento.it

www.electromagnetics.unile.it