

SLIDE 1

Dense Linear Algebra on Heterogeneous Platforms: State of the Art and Trends

Paolo Bientinesi

AICES, RWTH Aachen pauldj@aices.rwth-aachen.de

ComplexHPC Spring School 2013 Heterogeneous computing - Impact on algorithms June 7th, 2013 Uppsala University, Sweden

Paolo Bientinesi (AICES, RWTH Aachen) 1 / 34

SLIDE 2

1. Setting the stage
2. Part 1: blocked algorithms
3. Part 2: multithreading, fork-join
4. Part 3: multithreading, algorithms-by-blocks
5. Part 4: streaming

SLIDES 3-7

Dense Linear Algebra

[Figure: a 12 × 12 matrix M of × marks; successive slides show fully dense, block-structured, sparse, lower-triangular, and banded nonzero patterns.]

SLIDES 8-10

Dense Linear Algebra

Linear systems: Ax = b, AX = B, least squares, ...
Eigenproblems: Ax = λx, AX = BXΛ, SVD, ...
Matrix equations: AX + XB = C, A := (A + A⁻¹)/2, ...
Support routines: factorizations, reductions, ...

SLIDES 11-17

Organization in layers

Other libraries: ScaLAPACK, Elemental, PETSc, ...

LAPACK: LU = A, LLᵀ = A, QR = A, QᵀTQ = A, ...

BLAS:
  BLAS-3: C := C + AB, C := L⁻¹B   (A, B, C, L ∈ Rⁿˣⁿ)
  BLAS-2: y := y + Ax, y := L⁻¹x   (A, L ∈ Rⁿˣⁿ; x, y ∈ Rⁿ)
  BLAS-1: y := y + αx, dot := α + xᵀy   (x, y ∈ Rⁿ; α ∈ R)
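The three BLAS levels can be illustrated with NumPy (a sketch; NumPy is used here only as a convenient front end to an underlying BLAS, and all variable names are illustrative):

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)
alpha = 0.5

# BLAS-1: vector-vector operations (axpy, dot): O(n) flops on O(n) data
y1 = y + alpha * x
d = x @ y

# BLAS-2: matrix-vector operations (gemv): O(n^2) flops on O(n^2) data
y2 = y + A @ x

# BLAS-3: matrix-matrix operations (gemm): O(n^3) flops on O(n^2) data
C = np.zeros((n, n))
C = C + A @ B
```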

SLIDE 18

Example: AX = B (full A)

[Tree: AX = B (linear system) decomposes into LU = A (LU factorization) and two triangular systems LX = B; the factorization and the triangular solves in turn reduce to GEMM updates C = AB + C.]

SLIDES 19-26

Why BLAS-3? Why GEMM?

BLAS                                            #FLOPS   Mem. refs.   Ratio
Level 3: C := C + AB   (A, B, C ∈ Rⁿˣⁿ)         2n³      4n²          n/2
Level 2: y := y + Ax   (A ∈ Rⁿˣⁿ; x, y ∈ Rⁿ)    2n²      n²           2
Level 1: y := y + αx   (x, y ∈ Rⁿ; α ∈ R)       2n       3n           2/3

Morale: For BLAS-3, the larger the problem the better, as long as it fits in memory. GEMM is the building block for all the other BLAS-3 kernels, and for LAPACK.
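The flop and memory-reference counts above are easy to recompute; a minimal sketch in plain Python (the function name and dictionary layout are illustrative):

```python
def blas_ratios(n):
    """#FLOPS, memory references, and their ratio for the three BLAS levels."""
    levels = {
        1: (2 * n,    3 * n),     # y := y + alpha*x : read x and y, write y
        2: (2 * n**2, n**2),      # y := y + A*x     : reading A dominates
        3: (2 * n**3, 4 * n**2),  # C := C + A*B     : read A, B, C, write C
    }
    return {lvl: (f, m, f / m) for lvl, (f, m) in levels.items()}

# For n = 1000, BLAS-3 performs 500 flops per memory reference,
# while BLAS-1 and BLAS-2 stay at 2/3 and 2 regardless of n.
r = blas_ratios(1000)
```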

SLIDE 27

Part 1: Blocked algorithms

Simple example: Cholesky factorization
Input: matrix A, symmetric and positive definite.
Goal: determine a lower-triangular L such that LLᵀ = A.

Partition
  L = ( L_TL    0   )
      ( L_BL  L_BR )
and determine the blocks, one panel at a time.
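A minimal sketch of the blocked algorithm in Python/NumPy (the block size nb and the function name are illustrative, not from the slides): at each step the diagonal block is factored (CHOL), the panel below it is updated with a triangular solve (TRSM), and the trailing matrix receives a symmetric update (SYRK).

```python
import numpy as np

def blocked_cholesky(A, nb=2):
    """Return lower-triangular L with L @ L.T == A (A symmetric positive definite)."""
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # CHOL: factor the diagonal block in place
        A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])
        if e < n:
            # TRSM: panel update, L21 such that L21 @ L11.T = A21
            A[e:, k:e] = np.linalg.solve(A[k:e, k:e], A[e:, k:e].T).T
            # SYRK: symmetric update of the trailing matrix
            A[e:, e:] -= A[e:, k:e] @ A[e:, k:e].T
    return np.tril(A)
```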

SLIDE 28

Cholesky factorization

iteration i

[Figure: the leading blocks of L are DONE; the current panel and the trailing matrix remain.]

SLIDE 29

Cholesky factorization

iteration i + 1

[Figure] Unblocked algorithm: sqrt, trsv, syr. Blocked algorithm: CHOL, TRSM, SYRK.

SLIDE 30

Cholesky factorization

iteration i + 1

[Figure: progress after iteration i + 1 for the unblocked and the blocked algorithm.]

SLIDE 31

Cholesky: unblocked vs. blocked algorithms

SLIDE 32

Part 2: Parallelism? fork-join

Solution #1: multithreaded BLAS (+ vector instructions)

[Figure: each operation in the sequence (CHOL, TRSM, GEMM; LU, TRSM, TRSM, GEMM) runs as an internally parallel call, with a join after each one.]

Advantage: ease of use. Legacy code!
Drawback: unnecessary synchronization points.
OpenBLAS, ATLAS, BLIS, old versions of MKL, ...
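The fork-join pattern can be sketched in plain Python (the operations are placeholders, not BLAS calls): each "call" forks workers over the blocks of its operands and joins them before the next call starts, which is exactly where the unnecessary synchronization points come from.

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join_call(op, blocks, pool):
    # Fork: apply op to every block in parallel; join: wait for all of them.
    return list(pool.map(op, blocks))

with ThreadPoolExecutor(max_workers=4) as pool:
    data = list(range(8))
    # Each "BLAS call" completes fully before the next one begins,
    # even when only some blocks of the next call depend on its output.
    step1 = fork_join_call(lambda b: b * 2, data, pool)
    step2 = fork_join_call(lambda b: b + 1, step1, pool)
```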

SLIDE 34

Multithreaded BLAS

Xeon, 32 physical cores, MKL

[Plot: efficiency of GEMM (0.2 to 1) vs. matrix dimension (1000 to 8000) for 1, 2, 4, 8, 16, and 32 threads.]

SLIDES 35-44

Development of LA libraries

  • New architecture / new architectural features
  • GEMM → peak performance [FFT, SpMV]
  • BLAS3, factorizations, AX=B → LINPACK benchmark
  • factorizations, AX=B
  • factorizations, AX=B, support routines
  • ...
  • eigenproblems

SLIDE 45

Examples 2005–2006: multicores

GEMM, mtBLAS, HPL

2005-7: Blue Gene L, dual cores (PowerPC A2)
2008-9: Roadrunner, dual cores (Opteron)
2009: Jaguar, 6-cores (Opteron)
2010-11: K Computer, 8-cores (SPARC64)
...

libFLAME, PLASMA, MKL; Eigensolvers: 2010

SLIDE 46

Examples 2005–2006: Cell

GEMM: 99% [FFT], no BLAS, HPL

2008-9: Roadrunner, >1 PetaFLOP, 6k × (Opteron + 2 Cell)

no libs; 2009: discontinued; Eigensolvers: –

SLIDE 47

Examples 2005: GPGPUs

cuBLAS (*), HPL

2010: Tianhe-1A, 2.5k dual-GPU (ATI Radeon)

CULA (*), libFLAME, MAGMA (multi-GPU); Eigensolvers: 2010-11

SLIDE 48

Present: Intel Xeon Phi (MIC)

GEMM (*), MKL, HPL (predicted)

2013: Tianhe-2, 16k × (2 Ivy Bridge + 3 Phi)

  • Also ARM, FPGAs, DSPs, ...

SLIDE 50

Future 2017: Exascale (?)

???

heterogeneous architecture

SLIDE 51

Back to the algorithms: Chol. fact

Fork-join ⇒ unnecessary synchronizations

[Figure: Iteration 1 (CHOL, SYRK, TRSM) followed by Iteration 2 (CHOL, SYRK, TRSM), with a synchronization point after every operation.]

SLIDE 52

Part 3: Parallelism? algorithms-by-blocks

Idea: decompose the tasks. Solution #2: dependent tasks + scheduling.

AX = B, with A and B partitioned into blocks of rows:

  A0 X = B0,   A1 X = B1,   A2 X = B2

SLIDE 55

Storage by blocks

[Figure: tiles touched by CHOL, SYRK, and TRSM in iterations 1 and 2 when the matrix is stored by blocks.]

SLIDE 56

Creating tasks

Iteration 1: CHOL; TRSM ×3; SYRK ×3; GEMM ×3

SLIDE 57

Creating tasks

Iteration 2: CHOL; TRSM ×2; SYRK ×2; GEMM ×1

SLIDE 58

Creating tasks

Iteration 3: CHOL; TRSM ×1; SYRK ×1

SLIDE 59

Creating tasks

Iteration 4: CHOL
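The per-iteration task lists can be generated mechanically. A sketch in plain Python (the function name and tuple layout are illustrative): for tile column k of a tiled matrix, emit one CHOL for the diagonal tile, one TRSM per tile below it, and one SYRK or GEMM per tile of the trailing submatrix.

```python
def cholesky_tasks(n_tiles):
    """Yield (iteration, kind, i, j) tasks for a tiled Cholesky factorization."""
    for k in range(n_tiles):
        yield (k, "CHOL", k, k)              # factor diagonal tile (k, k)
        for i in range(k + 1, n_tiles):
            yield (k, "TRSM", i, k)          # panel solve on tile (i, k)
        for i in range(k + 1, n_tiles):
            yield (k, "SYRK", i, i)          # symmetric update of tile (i, i)
            for j in range(k + 1, i):
                yield (k, "GEMM", i, j)      # general update of tile (i, j)

# For a 4 x 4-tile matrix, iteration 1 produces 1 CHOL, 3 TRSM, 3 SYRK,
# and 3 GEMM, matching the task lists on the slides.
tasks = list(cholesky_tasks(4))
```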

SLIDES 60-64

Dependencies

[Figure, repeated across five build slides with different tasks highlighted: the ten tasks of iteration 1 (CHOL; TRSM ×3; GEMM ×3; SYRK ×3) and the dependencies among them.]

SLIDE 65

DAG - Dependencies

4 × 4-tile matrix

[DAG: the first CHOL feeds three TRSMs; the TRSMs feed the SYRK/GEMM updates of the trailing 3 × 3 tiles; those feed the next CHOL, which feeds two TRSMs, then SYRK/GEMM updates, then CHOL, one TRSM, one SYRK, and the final CHOL.]

SLIDE 66

Task Execution

4 × 4-tile matrix

Stage | Scheduled tasks
  1   | CHOL
  2   | TRSM TRSM TRSM TRSM
  3   | SYRK GEMM SYRK GEMM
  4   | GEMM SYRK GEMM GEMM
  5   | GEMM SYRK CHOL
  6   | TRSM TRSM TRSM
  7   | SYRK GEMM SYRK GEMM
  8   | GEMM SYRK CHOL
  9   | TRSM TRSM
 10   | SYRK GEMM SYRK
 11   | CHOL
 12   | TRSM
 13   | SYRK
 14   | CHOL

SLIDE 67

SPD Inverse: Chol+Inv+GEMM

5 × 5-tile matrix

SLIDE 68

SPD Inverse: Chol+Inv+GEMM

5 × 5-tile matrix

Stage | Scheduled tasks
  1   | CHOL
  2   | TRSM TRSM TRSM TRSM
  3   | SYRK GEMM SYRK GEMM
  4   | GEMM SYRK GEMM GEMM
  5   | GEMM SYRK CHOL TRSM
  6   | TRSM TRSM TRSM TRSM
  7   | TRSM TRSM TRINV SYRK
  8   | GEMM SYRK GEMM GEMM
  9   | SYRK TTMM CHOL TRSM
 10   | TRSM TRSM TRSM TRSM
 11   | GEMM GEMM GEMM SYRK
 12   | GEMM SYRK TRSM CHOL
 13   | TRSM TRSM TRINV SYRK
 14   | TRSM GEMM GEMM GEMM
 15   | GEMM TRMM SYRK TRSM
 16   | TRSM TTMM CHOL TRSM
 17   | SYRK TRINV GEMM SYRK
 18   | GEMM GEMM GEMM TRMM
 19   | TRMM TRSM TRSM TRSM
 20   | TRSM TRSM TRSM TRSM
 21   | TTMM SYRK GEMM SYRK
 22   | TRINV GEMM GEMM TRINV
 23   | SYRK SYRK GEMM SYRK
 24   | TRMM GEMM TRMM GEMM
 25   | TRMM SYRK GEMM GEMM
 26   | TTMM GEMM TRMM TRMM
 27   | SYRK TRMM
 28   | TRMM
 29   | TTMM

SLIDE 69

Blocked vs. By-blocks

SLIDE 70

Blocked vs. By-blocks: crossover

[Plot: performance of Cholesky (LLᵀ = A), GFLOPS (10 to 80) vs. matrix dimension (2000 to 14000), SuperMatrix vs. FLAME; the two curves cross over.]

SLIDE 71

Multithreaded BLAS vs. Algorithms-by-blocks

No absolute winner: crossover!
Multithreaded BLAS: ✓ ease of use; ✖ synchronization.
Algorithms-by-blocks: ✓ out-of-order execution; ✓ parallelism dictated by data dependencies; ✖ plateaus.
libFLAME, MKL, PLASMA, ...
Heterogeneous systems ↔ schedulers

SLIDE 72

Part 4: Streaming

The entire problem does not even fit in main memory.

Example:  b := (XᵀM⁻¹X)⁻¹XᵀM⁻¹y

Linear regression with non-independent outcomes

Inputs: M ∈ Rⁿˣⁿ, SPD, n ∈ [10³, 10⁴]
        X ∈ Rⁿˣᵖ, full rank, p ∈ [1, 20]
        y ∈ Rⁿ
Output: b ∈ Rᵖ

⋆ Sequence of thousands of problems ⋆

SLIDE 74

Algorithm

LLᵀ = M
X := L⁻¹X
S := XᵀX
GGᵀ = S
y := L⁻¹y
b := Xᵀy
b := G⁻¹b
b := G⁻ᵀb

SLIDE 75

Algorithm – bottleneck?

LLᵀ = M
X := L⁻¹X   → to accelerator (TRSM)
S := XᵀX
GGᵀ = S
y := L⁻¹y
b := Xᵀy
b := G⁻¹b
b := G⁻ᵀb

SLIDE 76

double+triple buffering

GPU CPU HDD

[Timeline across GPU, CPU, and HDD lanes: while the GPU runs TRSM on problem b, the CPU computes and writes out problem b-1, problems b+1 and b+2 are sent to the GPU, and problems b+2 and b+3 are read from disk. Legend: CPU ⇄ GPU transfer, HDD ⇄ CPU transfer, GPU computation, CPU computation, data dependencies, asynchronous dispatch.]
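The overlap of disk reads, accelerator work, and CPU work can be prototyped with a small thread pipeline (plain Python; queue sizes of 2 play the role of the double buffers, and the load/compute/store callables are placeholders for the HDD read, the GPU TRSM, and the CPU post-processing):

```python
import threading
import queue

def stream(batches, load, compute, store, depth=2):
    """Run load -> compute -> store as a 3-stage pipeline with bounded buffers."""
    q_in, q_out = queue.Queue(maxsize=depth), queue.Queue(maxsize=depth)

    def reader():                        # stage 1: e.g. HDD -> RAM
        for b in batches:
            q_in.put(load(b))
        q_in.put(None)                   # end-of-stream marker

    def worker():                        # stage 2: e.g. TRSM on the accelerator
        while (item := q_in.get()) is not None:
            q_out.put(compute(item))
        q_out.put(None)

    threads = [threading.Thread(target=reader), threading.Thread(target=worker)]
    for t in threads:
        t.start()
    results = []                         # stage 3: e.g. CPU compute + write-back
    while (r := q_out.get()) is not None:
        results.append(store(r))
    for t in threads:
        t.join()
    return results
```

With depth = 2 each buffer holds at most two problems, one being filled while the previous one is consumed, which is the double-buffering scheme of the figure.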

SLIDE 77

double+triple buffering

[Figure: buffer layout for double + triple buffering. Data X for problems b-3 … b+3 streams HDD → CPU/RAM → GPU; the GPU holds buffers A, B, C for problems b+2, b-1, b+1 and runs TRSM on problem b; results r for problems b-1 and b-2 flow back toward disk.]

SLIDE 78

Summary

Dense linear algebra: it’s a matter of layers.
GEMM = the foundation, THE building block.
Irrespective of the architecture, there is a need for highly optimized BLAS.
How to use threads/cores? Multithreaded BLAS vs. algorithms-by-blocks.
Large/many problems but limited memory ⇒ streaming.

SLIDE 79

References

MKL            http://software.intel.com/en-us/intel-mkl
cuBLAS, CULA   https://developer.nvidia.com/cublas
libFLAME       http://wiki.cs.utexas.edu/flame
BLIS           http://code.google.com/p/blis/
OpenBLAS       http://xianyi.github.io/OpenBLAS/
MAGMA          http://icl.cs.utk.edu/magma/
PLASMA         http://icl.cs.utk.edu/plasma/
