A Parallel Generator of Non-Hermitian Matrices computed from Known Given Spectra
Xinzhe WU 1,2, Serge G. Petiton 1,2, Hervé Galicher 3, Christophe Calvin 4
1 Maison de la Simulation/CNRS, Gif-sur-Yvette, 91191, France
2 CRIStAL, Université de Lille, France
3 King Abdullah University of Science and Technology, Saudi Arabia
4 CEA Saclay, France
Minisymposium 89: Scalable Eigenvalue Computation, March 09, 2018
SIAM Parallel Processing for Scientific Computing 2018, Tokyo, Japan

Introduction: Outline
1 Introduction
2 A Scalable Matrix Generator from Given Spectra (SMG2S)
3 Experimentations, evaluation and analysis
4 Accuracy Verification
5 Conclusion and Perspectives

Introduction: Eigenvalues and eigenvalue problems
Eigenvalues and eigenvectors: for a square matrix $A$, if there is a nonzero vector $u \in \mathbb{C}^n$ such that $Au = \lambda u$ for some scalar $\lambda$, then $\lambda$ is called an eigenvalue of $A$ with corresponding (right) eigenvector $u$.
Applications of eigenvalue problems:
1 numerical simulation
  - the Schrödinger equation [8], molecular simulation [11], geology [7], etc.
  - preconditioners for solving linear systems, e.g. UCGLE [12].
2 machine learning and pattern recognition
  - principal component analysis (PCA) [4]
  - Fisher discriminant analysis (FDA) [2]
  - clustering [9], etc.
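As a small worked instance of the definition $Au = \lambda u$, the sketch below checks two candidate eigenpairs of a 2×2 non-symmetric matrix. The matrix, the candidate pairs and the helper name `matvec` are illustrative assumptions, not taken from the slides.

```python
def matvec(A, u):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

# A small non-symmetric example matrix (hypothetical, chosen for illustration).
A = [[2, 0],
     [1, 3]]

# Candidate eigenpairs (lambda, u) of A.
pairs = [(2, [1, -1]), (3, [0, 1])]

for lam, u in pairs:
    # u is an eigenvector for lam exactly when A u equals lam * u.
    is_eigenpair = matvec(A, u) == [lam * x for x in u]
    print(lam, u, is_eigenpair)
```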

Introduction: Requirement of large-scale matrix generator
The background:
- the eigenvalue problem size in both machine learning and numerical simulation is increasing;
- the numerical methods should be adjusted to the coming exascale platforms.
Thus there are three special requirements on the test matrices for the evaluation of numerical algorithms:
- their spectra must be known and easily controlled;
- they should be sparse, non-Hermitian and non-trivial;
- they should be able to reach a very large size, in terms of the number of non-zero elements and/or the matrix dimension, to evaluate the algorithms on large-scale systems.

Introduction: Related works
The related work:
- the Tim Davis collection [5];
- the Matrix Market collection [3];
- Bai's collection [1];
- J. Demmel's generation suite in 1989 to benchmark LAPACK [6], etc.
Only the method proposed by J. Demmel generates test matrices with given spectra. It transforms the diagonal matrix holding the given spectrum into a dense matrix with the same spectrum using orthogonal matrices, and then reduces it to an unsymmetric band matrix by Householder transformations. This method requires $O(n^3)$ time and $O(n^2)$ storage even for generating a small-bandwidth matrix.

A Scalable Matrix Generator from Given Spectra (SMG2S): Mathematical notations (H. Galicher et al.)
For all matrices $A \in \mathbb{C}^{n \times n}$, $M \in \mathbb{C}^{n \times n}$, $n \in \mathbb{N}$, a linear operator $\widetilde{A_A}$ of matrix $M$ determined by matrix $A$ can be set up as Formula (1):
$$\widetilde{A_A}: \mathbb{C}^{n \times n} \to \mathbb{C}^{n \times n}, \quad M \mapsto AM - MA. \tag{1}$$
$$(\widetilde{A_A})^k(M_0) = \sum_{m=0}^{k} (-1)^m C_k^m A^{k-m} M_0 A^m. \tag{2}$$
$$M_{i+1} = M_i + \frac{1}{i!} (\widetilde{A_A})^i(M_0), \quad i \in (0, +\infty). \tag{3}$$
In order to make $(\widetilde{A_A})^i$ tend to 0 in a limited number of steps, it is necessary that $A = B^{-1}PB$ with the matrix $P$ nilpotent. For simplification, based on the preliminary theoretical research [10], the matrix $B$ is set to the identity matrix $I \in \mathbb{N}^{n \times n}$.
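The two properties used on this slide can be checked numerically on a tiny case: repeated application of $M \mapsto AM - MA$ agrees with the binomial expansion of Formula (2), and the operator power vanishes after finitely many steps once $A$ is nilpotent. The sketch below is a minimal illustration under assumptions: the 4×4 matrices and all function names are hypothetical, and dense integer lists stand in for the sparse matrices of the real generator.

```python
from math import comb

def matmul(X, Y):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(X, k):
    """X**k by repeated multiplication (k >= 0)."""
    n = len(X)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = matmul(P, X)
    return P

def op_power(A, M, k):
    """Apply the operator M -> AM - MA to M, k times."""
    for _ in range(k):
        AM, MA = matmul(A, M), matmul(M, A)
        M = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(AM, MA)]
    return M

def op_power_binomial(A, M, k):
    """Right-hand side of Formula (2): sum of (-1)^m C(k,m) A^(k-m) M A^m."""
    n = len(M)
    total = [[0] * n for _ in range(n)]
    for m in range(k + 1):
        term = matmul(matmul(matpow(A, k - m), M), matpow(A, m))
        coef = (-1) ** m * comb(k, m)
        total = [[t + coef * x for t, x in zip(rt, rx)]
                 for rt, rx in zip(total, term)]
    return total

# Hypothetical example: ones on part of the sub-diagonal, so A**3 == 0.
A = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
M0 = [[1, 0, 0, 0], [4, 2, 0, 0], [7, 5, 3, 0], [0, 8, 6, 9]]

agree = all(op_power(A, M0, k) == op_power_binomial(A, M0, k) for k in range(6))
# With A**3 == 0, every term of Formula (2) contains A**3 once k >= 5.
vanishes = op_power(A, M0, 5) == [[0] * 4 for _ in range(4)]
print(agree, vanishes)
```

More generally, if $A^p = 0$ then each term $A^{k-m} M_0 A^m$ in Formula (2) contains a factor $A^p$ as soon as $k \ge 2p - 1$, which is why the sum in Formula (3) stops after finitely many terms.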

A Scalable Matrix Generator from Given Spectra (SMG2S): SMG2S Algorithm (H. Galicher et al.)
The SMG2S algorithm is given as:
Algorithm 1 Matrix Generation Method
Input: $Spec_{in} \in \mathbb{C}^n$, $h$, $d$
Output: $M_t \in \mathbb{C}^{n \times n}$
1: Insert random elements in $h$ lower diagonals of $M_0 \in \mathbb{N}^{n \times n}$
2: Insert $Spec_{in}$ on the diagonal of $M_0$ and $M_0 = (2d-2)!\, M_0$
3: Randomly insert 1 and 0 on the sub-diagonal of $A \in \mathbb{N}^{n \times n}$ with the maximum continuous length of 1 to be $d$
4: for $i = 0, \cdots, 2(d-2)-1$ do
5:   $M_{i+1} = M_i + \left(\prod_{k=i+1}^{2d-2} k\right) (\widetilde{A_A})^i(M_0)$
6: end for
7: $M_t = \frac{1}{(2d-2)!} M_{2d-2}$
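The idea behind Algorithm 1 can be sketched on a small dense matrix. This is not the authors' parallel implementation, and it rests on stated assumptions: the recurrence is read as $M_t = M_0 + \sum_{i \ge 1} \frac{1}{i!} (AM - MA)^{(i)}(M_0)$ with $M_0$ counted once; exact rational arithmetic replaces the $(2d-2)!$ scaling; the loop bound comes from $A^n = 0$ rather than from $d$; the example spectrum is real; and all names (`smg2s_dense`, `power_traces`) are hypothetical. The check compares the traces of $M_t, M_t^2, \dots, M_t^n$ with the power sums of the prescribed eigenvalues, which in exact arithmetic determines the characteristic polynomial.

```python
from fractions import Fraction
from math import factorial

def matmul(X, Y):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def smg2s_dense(spectrum, lower, A):
    """Sum of (AM - MA)-operator powers applied to M0, divided by i!."""
    n = len(spectrum)
    # M0: given strictly-lower entries plus the prescribed spectrum on the diagonal.
    M0 = [[Fraction(lower[i][j]) if j < i else Fraction(0) for j in range(n)]
          for i in range(n)]
    for i in range(n):
        M0[i][i] = Fraction(spectrum[i])
    result = [row[:] for row in M0]
    term = M0
    # A strictly lower triangular gives A**n == 0, so terms with i >= 2n-1 vanish.
    for i in range(1, 2 * n - 1):
        AM, MA = matmul(A, term), matmul(term, A)
        term = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(AM, MA)]
        result = [[r + t / factorial(i) for r, t in zip(rr, rt)]
                  for rr, rt in zip(result, term)]
    return result

def power_traces(M):
    """Traces of M, M^2, ..., M^n."""
    n = len(M)
    P, out = M, []
    for _ in range(n):
        out.append(sum(P[i][i] for i in range(n)))
        P = matmul(P, M)
    return out

# Hypothetical inputs: n = 5, two filled lower diagonals, 0/1 sub-diagonal in A.
spectrum = [1, 2, 3, 4, 5]
lower = [[0, 0, 0, 0, 0], [7, 0, 0, 0, 0], [2, 9, 0, 0, 0],
         [0, 4, 6, 0, 0], [0, 0, 3, 8, 0]]
A = [[0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0], [0, 0, 0, 1, 0]]

Mt = smg2s_dense(spectrum, lower, A)
expected = [sum(Fraction(x) ** k for x in spectrum) for k in range(1, 6)]
print(power_traces(Mt) == expected)
```

Exact fractions are used here only so that the spectrum check is free of rounding; a practical implementation would work in floating point on sparse storage.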

A Scalable Matrix Generator from Given Spectra (SMG2S): Parallel Implementation on CPUs and GPUs (X. Wu and S. Petiton)
We implement SMG2S on homogeneous and heterogeneous machines. The former implementation is based on MPI and PETSc (Portable, Extensible Toolkit for Scientific Computation); the latter is based on MPI, CUDA, and PETSc. The kernel of the implementation is SpGEMM (sparse matrix-matrix multiplication).
Figure: The structure of a CPU-GPU implementation of SpGEMM, where each GPU is attached to a CPU. The GPU is in charge of the computation, while the CPU handles the MPI communication among processes.
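To make the SpGEMM kernel concrete, here is a minimal single-process sketch of a row-wise sparse product, applied to the two products $AM$ and $MA$ that the generator needs. It is an illustration only: the dictionary-of-rows storage, the function name `spgemm` and the 3×3 inputs are assumptions, and none of the MPI, PETSc or CUDA distribution described above is shown.

```python
def spgemm(X, Y):
    """Row-wise sparse product Z = X * Y.

    Matrices are stored as {row: {col: value}} and hold only non-zeros.
    """
    Z = {}
    for i, row in X.items():
        acc = {}
        for k, x_ik in row.items():
            # Row i of Z accumulates x_ik times row k of Y.
            for j, y_kj in Y.get(k, {}).items():
                acc[j] = acc.get(j, 0) + x_ik * y_kj
        acc = {j: v for j, v in acc.items() if v != 0}
        if acc:
            Z[i] = acc
    return Z

# Hypothetical 3x3 inputs: A has ones on its sub-diagonal, M is lower bidiagonal.
A = {1: {0: 1}, 2: {1: 1}}
M = {0: {0: 5}, 1: {0: 1, 1: 6}, 2: {1: 2, 2: 7}}

AM = spgemm(A, M)
MA = spgemm(M, A)
print(AM)
print(MA)
```

Because each output row depends on one row of the left operand, rows can be assigned to different processes, with communication needed only for the rows of the right operand that a process does not own.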

Experimentations, evaluation and analysis: Experimental hardware environment
We implement SMG2S on the supercomputers Tianhe-2 and Romeo. The node specification for the two platforms is given as follows:
Table: Node Specifications of the cluster ROMEO and Tianhe-2
Machine Name | ROMEO | Tianhe-2
Nodes Number | BullX R421 × 130 | 16000 × nodes
Mother Board | SuperMicro X9DRG-QF | Specific Infiniband
CPU | 2 × Intel Ivy Bridge 8 cores 2.6 GHz | 2 × Intel Ivy Bridge 12 cores 2.2 GHz
Memory | DDR3 32GB | DDR3 64GB
Accelerator | NVIDIA GPU Tesla K20X × 2 | Intel Knights Corner × 3

Experimentations, evaluation and analysis: Strong and Weak Scalability Evaluation (X. Wu and S. Petiton)
The strong and weak scaling tests on CPUs are given as:
[Figure: two plots of Time (s) versus Number of CPU cores, one for Tianhe-2 (48 to 1536 cores) and one for ROMEO (16 to 256 cores), each with the curves CD-SS, CD-WS, CS-SS, CS-WS, RD-SS, RD-WS, RS-SS and RS-WS.]
Figure: Strong and weak scalability on Tianhe-2 and Romeo. A base 2 logarithmic scale is used for the X-axis, and a base 10 logarithmic scale for the Y-axis. "CD" is short for "complex double", "CS" for "complex single", "RD" for "real double", "RS" for "real single", "SS" for "strong scalability", and "WS" for "weak scalability". On Tianhe-2, the matrix size for strong scalability is $1.6 \times 10^7$, and the matrix sizes for weak scalability range from $1.0 \times 10^6$ to $3.2 \times 10^7$. On Romeo, the matrix size for strong scalability is $3.2 \times 10^6$, and the matrix sizes for weak scalability range from $4.0 \times 10^5$ to $6.4 \times 10^6$. $h$ and $d$ are respectively 8 and 4.

Experimentations, evaluation and analysis: Strong and Weak Scalability Evaluation (X. Wu and S. Petiton)
The strong and weak scaling tests on multi-GPUs are given as:
[Figure: plot of Time (s) versus Number of GPUs on ROMEO (4 to 64 GPUs), with the curves CD-SS, CD-WS, CS-SS, CS-WS, RD-SS, RD-WS, RS-SS and RS-WS.]
Figure: Strong and weak scalability of GPUs on Romeo. A base 2 logarithmic scale is used for the X-axis, and a base 10 logarithmic scale for the Y-axis. "CD" is short for "complex double", "CS" for "complex single", "RD" for "real double", "RS" for "real single", "SS" for "strong scalability", and "WS" for "weak scalability". The matrix size for strong scalability is $8.0 \times 10^5$, and the matrix sizes for weak scalability range from $2.0 \times 10^5$ to $3.2 \times 10^6$. $h$ and $d$ are respectively 8 and 4.

Experimentations, evaluation and analysis: Multi-GPU Speedup Evaluation (X. Wu and S. Petiton)
The multi-GPU speedup over CPUs is given as:
[Figure: bar chart "Weak Scaling Speedup of GPUs vs CPUs on ROMEO", showing Speedup/4CPUs for SMG2S on CPU and SMG2S on GPU against the CPU or GPU number (4, 8, 16, 32, 64).]
Figure: Weak scaling speedup of GPUs vs CPUs on Romeo with real double scalar type. The X-axis refers to the number of computing units, from 4 to 64, and the Y-axis refers to the speedup of CPUs or GPUs over the time spent by 4 CPUs with matrix size $2.0 \times 10^5$. The matrix sizes for the weak scalability are respectively $2.0 \times 10^5$, $4.0 \times 10^5$, $8.0 \times 10^5$, $1.6 \times 10^6$ and $3.2 \times 10^6$. $h$ and $d$ are respectively 8 and 4.

Accuracy Verification: Verification method (X. Wu and S. Petiton)
We proposed a method, based on the Shifted Inverse Power Method, to check the ability of SMG2S to keep the given spectra.
Algorithm 2 Shifted Inverse Power Method
Input: Matrix $A$, initial guess for desired eigenvalue $\sigma$, initial vector $v_0$
Output: Approximate eigenpair $(\theta, v)$
1: $y = v_0$
2: for $i = 1, 2, 3, \cdots$ do
3:   $\theta = \|y\|_\infty$, $v = y / \theta$
4:   Solve $(A - \sigma I) y = v$
5: end for
Check the error: $error = \frac{\|Av' - \lambda v'\|}{\|Av'\|}$
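The verification loop of Algorithm 2 can be sketched in a few lines for a small dense matrix. This is a minimal illustration under assumptions: the linear solve is a plain Gaussian elimination rather than a parallel solver, the error uses the 2-norm (the slide does not say which norm), the shift is taken slightly away from the target eigenvalue so that $A - \sigma I$ stays non-singular, and the 3×3 matrix and all function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0] * n
    for r in reversed(range(n)):
        s = aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x

def shifted_inverse_power(A, sigma, v0, iters=50):
    """Return an approximate eigenvector for the eigenvalue closest to sigma."""
    n = len(A)
    shifted = [[A[i][j] - (sigma if i == j else 0) for j in range(n)]
               for i in range(n)]
    y = list(v0)
    for _ in range(iters):
        theta = max(abs(x) for x in y)      # infinity norm of y
        v = [x / theta for x in y]
        y = solve(shifted, v)               # (A - sigma I) y = v
    theta = max(abs(x) for x in y)
    return [x / theta for x in y]

def relative_residual(A, lam, v):
    """||A v - lam v|| / ||A v|| in the 2-norm (norm choice is an assumption)."""
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    num = sum(abs(p - lam * x) ** 2 for p, x in zip(Av, v)) ** 0.5
    den = sum(abs(p) ** 2 for p in Av) ** 0.5
    return num / den

# Hypothetical test matrix with known eigenvalues 2, 3 and 5 on its diagonal.
A = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 1.0, 5.0]]

v = shifted_inverse_power(A, sigma=3.1, v0=[1.0, 1.0, 1.0])
print(relative_residual(A, 3.0, v))
```

A small residual for each prescribed eigenvalue indicates that the generated matrix kept that eigenvalue; the same functions accept complex entries, since only `abs` and arithmetic are used.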
