A Parallel Generator of Non-Hermitian Matrices Computed from Given Spectra

SLIDE 1

A Parallel Generator of Non-Hermitian Matrices computed from Given Spectra

Xinzhe WU (1,2), Serge G. Petiton (1,2), Yutong Lu (3)

(1) Maison de la Simulation, Gif-sur-Yvette, 91191, France
(2) CRIStAL, Université de Lille, France
(3) National Supercomputing Center in Guangzhou, Sun Yat-sen University, China

VECPAR18, São Pedro, Brazil, 2018

SLIDE 2

Introduction

Outline

1. Introduction
2. A Scalable Matrix Generator from Given Spectra (SMG2S)
3. Experimentations, evaluation and analysis
4. Accuracy Verification
5. Application: Krylov Solvers Evaluation using SMG2S
6. Conclusion and Perspectives

Xinzhe WU (MDLS, France) | A Scalable Test Matrix Generator | São Pedro, Brazil, 2018

SLIDE 3

Introduction

Linear System Solvers and Spectra

When a linear system Ax = b with a non-Hermitian matrix A is solved by a Krylov subspace method such as GMRES (Saad and Schultz (1986)), the spectrum of A influences several aspects of the solution procedure, to a greater or lesser extent:

1. Convergence analysis;
2. Preconditioners;
3. Deflation of eigenvalues;
4. Recycling of eigenvalues for a sequence of linear systems;
5. etc.

SLIDE 4

Introduction

Requirement of large-scale matrix generator

Today, linear problems keep growing in size, and numerical methods must adapt to the coming exascale platforms. Test matrices used to evaluate numerical algorithms must therefore meet four requirements:

  • their spectra must be known and customizable;
  • they should be sparse, non-Hermitian and non-trivial;
  • they should reach very high dimensions, so that algorithms can be evaluated on large-scale systems;
  • they should be generated in parallel, with good scalability and a low memory requirement during generation.

SLIDE 5

Introduction

Related works

Related work:

  • Saad's SPARSKIT (Saad (1990));
  • the Tim Davis collection (Davis and Hu (2011));
  • the Matrix Market collection (Boisvert et al. (1997));
  • Bai's collection (Bai et al. (1996));
  • the Galeri package of Trilinos, which generates simple, well-known finite element and finite difference matrices;
  • J. Demmel's generation suite of 1989 for benchmarking LAPACK (Demmel and McKenney (1989)), etc.

Only Demmel's method generates matrices with given spectra. It transforms a diagonal matrix into a dense matrix by orthogonal matrices, then reduces the dense matrix to an unsymmetric band matrix by Householder transformations. This method requires O(n^3) time and O(n^2) storage, even to generate a matrix with a small bandwidth.
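To make the cost of a dense approach concrete, the following is a minimal NumPy sketch of its first step only. It assumes the unitary factor comes from the QR factorization of a random matrix; the function name `dense_from_spectrum` is hypothetical, the Householder band reduction is not shown, and this is not Demmel's actual code.

```python
import numpy as np

def dense_from_spectrum(spectrum, seed=0):
    """Return Q diag(spectrum) Q^H for a random unitary Q (assumed construction)."""
    rng = np.random.default_rng(seed)
    n = len(spectrum)
    # A random complex matrix; its QR factorization yields a unitary Q.
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(z)
    # Similarity transform: the result is dense but keeps the given eigenvalues.
    return q @ np.diag(spectrum) @ q.conj().T

spectrum = np.array([1 + 1j, 2 - 1j, 3 + 0.5j, 4 - 0.5j, 5.0])
a = dense_from_spectrum(spectrum)
computed = np.sort_complex(np.linalg.eigvals(a))
```

Both `q` and the product are full n × n arrays, and the two dense products cost O(n^3) operations, which is where the storage and time bounds quoted above come from.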

SLIDE 7

A Scalable Matrix Generator from Given Spectra (SMG2S)

Mathematical notations

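Based on the preliminary theoretical work of H. Galicher (Galicher et al. (2014)), for all matrices $A \in \mathbb{C}^{n \times n}$, $M \in \mathbb{C}^{n \times n}$, $n \in \mathbb{N}$, a linear operator $A_A$ acting on the matrix $M$ and determined by the matrix $A$ can be defined as in Formula (1):

\[ A_A : \mathbb{C}^{n \times n} \to \mathbb{C}^{n \times n}, \quad M \mapsto AM - MA. \tag{1} \]

\[ (A_A)^k(M_0) = \sum_{m=0}^{k} (-1)^m C_k^m A^{k-m} M_0 A^m. \tag{2} \]

\[ M_{i+1} = M_i + \frac{1}{i!} (A_A)^i(M_0), \quad i \in (0, +\infty). \tag{3} \]

So that $(A_A)^i$ vanishes after a finite number of steps, we select $A$ to be a nilpotent matrix.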
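As a small numerical check of Formulas (1) and (2), the sketch below applies the operator k times and compares the result with the binomial expansion. The function names are hypothetical and the matrices are small random dense arrays chosen only for illustration.

```python
import numpy as np
from math import comb

def commutator(a, m):
    """Formula (1): M -> AM - MA."""
    return a @ m - m @ a

def commutator_power(a, m0, k):
    """Apply the operator of Formula (1) k times to m0."""
    out = m0
    for _ in range(k):
        out = commutator(a, out)
    return out

def binomial_form(a, m0, k):
    """Right-hand side of Formula (2)."""
    mp = np.linalg.matrix_power
    return sum((-1) ** m * comb(k, m) * mp(a, k - m) @ m0 @ mp(a, m)
               for m in range(k + 1))

rng = np.random.default_rng(1)
a = rng.standard_normal((5, 5))
m0 = rng.standard_normal((5, 5))
lhs = commutator_power(a, m0, 4)
rhs = binomial_form(a, m0, 4)
```

The two results agree up to rounding. The identity holds because left and right multiplication by A commute with each other, so the k-th power of their difference expands binomially.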

SLIDE 8

A Scalable Matrix Generator from Given Spectra (SMG2S)

Nilpotent Matrix

The selected nilpotent matrix is given as:

[Figure residue: an n × n matrix whose nonzero entries are 1s, annotated with the parameters p and d and the dimension n.]

Figure: Nilpotent Matrix.

If p = 1 with d ∈ N∗, or p = 2 with d ∈ N∗ even, the nilpotency of A is d + 1.
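The nilpotency claim can be checked numerically. The sketch below assumes one reading of the figure: the only nonzero entries of A lie on the p-th upper diagonal, as runs of d ones separated by a single zero. That pattern, and the function names, are assumptions made for illustration.

```python
import numpy as np

def nilpotent(n, p, d):
    """n x n matrix whose p-th upper diagonal holds d ones, then one zero, repeated (assumed pattern)."""
    a = np.zeros((n, n), dtype=int)
    for i in range(n - p):
        if i % (d + 1) != d:  # every (d+1)-th entry of the diagonal stays zero
            a[i, i + p] = 1
    return a

def nilpotency(a):
    """Smallest k with a^k = 0, or None if no such k <= n exists."""
    power = np.eye(len(a), dtype=int)
    for k in range(1, len(a) + 1):
        power = power @ a
        if not power.any():
            return k
    return None

index_p1 = nilpotency(nilpotent(12, p=1, d=3))       # p = 1, any d
index_p2_even = nilpotency(nilpotent(12, p=2, d=4))  # p = 2, d even
index_p2_odd = nilpotency(nilpotent(12, p=2, d=3))   # p = 2, d odd
```

Under this assumed pattern, the first two cases give d + 1 (4 and 5), while the odd-d case with p = 2 gives 6 rather than 4. That is consistent with the restriction to even d when p = 2.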

SLIDE 9

A Scalable Matrix Generator from Given Spectra (SMG2S)

SMG2S Algorithm

The SMG2S algorithm is given as:

Algorithm 1 Matrix Generation Method
Input: $Spec_{in} \in \mathbb{C}^n$, p, h, d
Output: $M_t \in \mathbb{C}^{n \times n}$

1: Insert random elements in the h lower diagonals of $M_0 \in \mathbb{C}^{n \times n}$
2: Insert $Spec_{in}$ on the diagonal of $M_0$ and set $M_0 = (2d)! M_0$
3: Generate the nilpotent matrix $A \in \mathbb{N}^{n \times n}$ with parameters p and d
4: for $i = 0, \cdots, 2d - 1$ do
5:   $M_{i+1} = M_i + \left( \prod_{k=i+1}^{2d} k \right) (A_A)^i(M_0)$
6: end for
7: $M_t = \frac{1}{(2d)!} M_{2d}$
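The following dense NumPy sketch illustrates the idea behind Algorithm 1 on a small matrix. It rests on several assumptions: the recurrence is read as the finite sum of $(A_A)^i(M_0)/i!$ for i = 0, ..., 2d, and the $(2d)!$ scaling is omitted because floating-point division is used. The nilpotent pattern (d ones then one zero on the p-th upper diagonal) is also assumed. All names are hypothetical, and this is not the SMG2S implementation, which is sparse and parallel.

```python
import numpy as np
from math import factorial

def nilpotent(n, p, d):
    """Assumed pattern: d ones then one zero, repeated on the p-th upper diagonal."""
    a = np.zeros((n, n))
    for i in range(n - p):
        if i % (d + 1) != d:
            a[i, i + p] = 1.0
    return a

def smg2s_dense(spec, p, h, d, seed=0):
    rng = np.random.default_rng(seed)
    n = len(spec)
    # Steps 1-2: random entries on the h lower diagonals, spectrum on the diagonal.
    m0 = np.zeros((n, n), dtype=complex)
    for k in range(1, h + 1):
        m0 += np.diag(rng.random(n - k), -k)
    m0 += np.diag(spec)
    # Step 3: nilpotent matrix.
    a = nilpotent(n, p, d)
    # Steps 4-7: accumulate (A_A)^i(M0) / i! for i = 0 .. 2d.
    mt = m0.copy()
    term = m0.copy()
    for i in range(1, 2 * d + 1):
        term = a @ term - term @ a
        mt += term / factorial(i)
    return m0, a, mt

n, p, h, d = 12, 1, 2, 3
spec = np.arange(1, n + 1) + 0.5j * (-1.0) ** np.arange(n)
m0, a, mt = smg2s_dense(spec, p, h, d)
computed = np.sort_complex(np.linalg.eigvals(mt))
```

Because A is nilpotent with index d + 1, every term beyond i = 2d vanishes, so the finite sum equals exp(A) M0 exp(-A) exactly. The generated matrix is therefore similar to M0 and `computed` matches `spec` up to rounding.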

SLIDE 10

A Scalable Matrix Generator from Given Spectra (SMG2S)

Matrix Generation Example

Through SMG2S, the nilpotent matrix transforms a lower band matrix into a band matrix that has the same spectrum.

Figure: Matrix Generation Example. (The initial matrix has h lower diagonals; the generated matrix keeps the lower bandwidth h and has an upper bandwidth l < 2pd.)

The operation complexity is max(O(hdn), O(d^2 n)). If d ≪ n and h ≪ n, this amounts to O(n) operations and O(n) memory space.

SLIDE 11

A Scalable Matrix Generator from Given Spectra (SMG2S)

Matrix Generation Example

Figure: Matrix Generation Sparsity Pattern.

SLIDE 12

A Scalable Matrix Generator from Given Spectra (SMG2S)

Parallel Implementation on CPUs and GPUs

We implement SMG2S on homogeneous and heterogeneous machines. The former implementation is based on MPI and PETSc, the latter on MPI, CUDA, and PETSc. The kernel of the implementation is the sparse matrix-matrix multiplication (SpGEMM).

[Figure residue: block decomposition of a matrix product across processes. Each host (CPU) communicates with the others through MPI and drives its device (GPU) through CUDA. Each process holds two blocks of each operand and forms its block of the result as the sum of two block products.]

Figure: The structure of a CPU-GPU implementation of SpGEMM, where each GPU is attached to a CPU. The GPU is in charge of the computation, while the CPU handles the MPI communication among processes.
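A serial sketch of such a row-distributed product is given below. It is a dense simulation without MPI or CUDA, and the block names (`a_dia`, `a_off`, `b_loc`, `b_oth`) are one reading of the garbled figure labels. Treat them as assumptions, not as the names used in SMG2S.

```python
import numpy as np

def block_row_matmul(a, b, nprocs):
    """Simulate C = A @ B with the rows of A, B and C split over nprocs processes."""
    n = a.shape[0]
    bounds = [(k * n) // nprocs for k in range(nprocs + 1)]
    blocks = []
    for rank in range(nprocs):
        lo, hi = bounds[rank], bounds[rank + 1]
        a_rows = a[lo:hi]                                  # rows of A owned by this process
        a_dia = a_rows[:, lo:hi]                           # columns matching the locally owned rows of B
        a_off = np.delete(a_rows, np.s_[lo:hi], axis=1)    # all remaining columns
        b_loc = b[lo:hi]                                   # rows of B already held locally
        b_oth = np.delete(b, np.s_[lo:hi], axis=0)         # rows that would be fetched from other processes
        blocks.append(a_dia @ b_loc + a_off @ b_oth)       # local block of C
    return np.vstack(blocks)

rng = np.random.default_rng(0)
a = rng.standard_normal((10, 10))
b = rng.standard_normal((10, 10))
c = block_row_matmul(a, b, nprocs=4)
```

In a real distributed run only `b_oth` requires communication, which is why the split into a local and a remote product is convenient: the local product can proceed while remote rows are in transit.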

SLIDE 13

A Scalable Matrix Generator from Given Spectra (SMG2S)

Optimized Communication Implementation on CPUs

The implementation of SMG2S, and in particular the communication of the parallel SpGEMM kernel, can be specifically optimized by exploiting the particular structure of the nilpotent matrix A.

[Figure residue: the matrix M distributed by rows over Proc 0 to Proc 3; panel (a) shows AM and panel (b) shows MA, with the offsets p, d + 1 and 2d + 2 marked.]

Figure: (a) AM operation; (b) MA operation.
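One way to see why the structure of A helps is sketched below. It assumes that A has a single nonzero diagonal at offset p, as in the nilpotent pattern. In that case AM is a scaled copy of M shifted up by p rows, and MA is a scaled copy shifted right by p columns, so no general SpGEMM is needed. The function names are hypothetical and this is not the SMG2S code.

```python
import numpy as np

def left_mult(diag_vals, m, p):
    """A @ M when A[i, i+p] = diag_vals[i]: row i of the result is row i+p of M, scaled."""
    out = np.zeros_like(m)
    out[:-p] = diag_vals[:, None] * m[p:]
    return out

def right_mult(diag_vals, m, p):
    """M @ A: column j+p of the result is column j of M, scaled by diag_vals[j]."""
    out = np.zeros_like(m)
    out[:, p:] = m[:, :-p] * diag_vals[None, :]
    return out

n, p, d = 9, 2, 4
# Assumed nilpotent pattern: d ones followed by one zero along the p-th upper diagonal.
diag_vals = np.array([0.0 if i % (d + 1) == d else 1.0 for i in range(n - p)])
a = np.diag(diag_vals, p)
m = np.random.default_rng(2).standard_normal((n, n))
am = left_mult(diag_vals, m, p)
ma = right_mult(diag_vals, m, p)
```

With M distributed by rows, the row shift in AM only needs the first p rows held by the next process, while the column shift in MA stays local to each process.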

SLIDE 15

Experimentations, evaluation and analysis

Experimental hardware environment

We implement SMG2S on the supercomputers Tianhe-2 and ROMEO. The node specifications of the two platforms are as follows:

Table: Node Specifications of the cluster ROMEO and Tianhe-2

Machine Name | ROMEO | Tianhe-2
Nodes Number | BullX R421 × 130 | 16000 nodes
Mother Board | SuperMicro X9DRG-QF | Specific Infiniband
CPU | 2 × Intel Ivy Bridge, 8 cores, 2.6 GHz | 2 × Intel Ivy Bridge, 12 cores, 2.2 GHz
Memory | DDR3 32 GB | DDR3 64 GB
Accelerator | NVIDIA GPU Tesla K20X × 2 | Intel Knights Corner × 3

SLIDE 16

Experimentations, evaluation and analysis

Scalability and Speedup Evaluation I

The CPU scaling results on ROMEO are:

(a) CPU strong scaling on ROMEO. [Plot: time (s), log scale, versus number of CPUs from 16 to 256; curves: complex double, real double, complex double (optimized), real double (optimized).]

(b) CPU weak scaling on ROMEO. [Plot: same axes and curves as (a).]

SLIDE 17

Experimentations, evaluation and analysis

Scalability and Speedup Evaluation II

The CPU scaling results on Tianhe-2 are:

(c) CPU strong scaling on Tianhe-2. [Plot: time (s), log scale, versus number of CPUs from 48 to 1536; curves: complex double, real double, complex double (optimized), real double (optimized).]

(d) CPU weak scaling on Tianhe-2. [Plot: same axes and curves as (c).]

SLIDE 18

Experimentations, evaluation and analysis

Scalability and Speedup Evaluation III

The GPU scaling results on ROMEO are:

(e) GPU strong scaling on ROMEO. [Plot: time (s), log scale, versus number of GPUs from 4 to 64; curves: complex double, real double.]

(f) GPU weak scaling on ROMEO. [Plot: same axes as (e); legend as printed: complex double, complex double.]

SLIDE 19

Experimentations, evaluation and analysis

Scalability and Speedup Evaluation IV

The speedups of the different implementations are:

(g) Speedup of different implementations. [Bar chart: "Speedup/4CPUs" versus CPU or GPU number (4, 8, 16, 32, 64). Legend: SMG2S on CPU, SMG2S on GPU, Optimized SMG2S on CPU. Bar labels in extraction order, one row of five values per series:]

1.0  0.9  1.0  1.0  1.0
1.9  1.9  1.9  1.8  1.8
8.4  8.4  8.1  8.0  7.9

SLIDE 21

Accuracy Verification

Verification method

We propose a method, based on the Shifted Inverse Power Method, to check the ability of SMG2S to preserve the given spectra.

Algorithm 2 Shifted Inverse Power Method
Input: Matrix A, initial guess σ for the desired eigenvalue, initial vector $v_0$
Output: Approximate eigenpair (θ, v)

1: $y = v_0$
2: for i = 1, 2, 3, ... do
3:   $\theta = \|y\|_\infty$, $v = y / \theta$
4:   Solve $(A - \sigma I) y = v$
5: end for

Check the error:

\[ \text{error} = \frac{\|Av' - \lambda v'\|}{\|Av'\|} \]
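A minimal dense sketch of Algorithm 2 and of the error check follows. Two details are assumptions that the slide does not state: the eigenvalue estimate is taken as the Rayleigh quotient of the final vector, and the iteration runs for a fixed number of steps. The function name and the test matrix are hypothetical.

```python
import numpy as np

def shifted_inverse_power(a, sigma, v0, iters=50):
    """Approximate the eigenpair of `a` closest to the shift `sigma`."""
    shifted = a - sigma * np.eye(a.shape[0])
    y = v0.astype(complex)
    for _ in range(iters):
        theta = np.linalg.norm(y, np.inf)   # step 3: infinity-norm scaling
        v = y / theta
        y = np.linalg.solve(shifted, v)     # step 4: solve (A - sigma I) y = v
    v = y / np.linalg.norm(y)
    lam = (v.conj() @ a @ v) / (v.conj() @ v)   # Rayleigh quotient (assumed estimate)
    error = np.linalg.norm(a @ v - lam * v) / np.linalg.norm(a @ v)
    return lam, v, error

# Upper triangular test matrix with known eigenvalues 1, 2, 3, 4, 5.
a = np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) + np.triu(0.3 * np.ones((5, 5)), 1)
lam, v, error = shifted_inverse_power(a, sigma=2.9, v0=np.ones(5))
```

With the shift 2.9 the iteration converges to the eigenvalue 3, the one closest to the shift. In a verification run, each given eigenvalue would serve as the shift σ in turn and the relative error would be compared with a tolerance.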

SLIDE 22

Accuracy Verification

Verification results I

The verification tests were carried out with four different types of spectra.

(h) Clustered Eigenvalues I: acceptance = 93%, max error = 2 × 10^-2. [Plot: initial and computed eigenvalues in the complex plane, real axis versus imaginary axis.]

(i) Clustered Eigenvalues II: acceptance = 100%, max error = 7 × 10^-5. [Plot: same axes and series as (h).]

SLIDE 23

Accuracy Verification

Verification results II

The verification tests were carried out with four different types of spectra.

(j) Dominant Clustered Eigenvalues: acceptance = 94%, max error = 3 × 10^-2. [Plot: initial and computed eigenvalues in the complex plane, real axis versus imaginary axis.]

(k) Conjugate and Closest Eigenvalues: acceptance = 100%, max error = 3 × 10^-7. [Plot: same axes and series as (j).]

SLIDE 25

Application: Krylov Solvers Evaluation using SMG2S

Workflow

An interface can be provided to PETSc, Trilinos, and any other public or personal parallel solver; it restores the distributed data into the data structures of the target library. This feature can significantly reduce the I/O of applications and make the evaluation of numerical methods more efficient.

[Figure residue: Parallel SMG2S running on processes #1 to #4 passes its distributed matrix, through the interface (PETSc, Trilinos, etc.), to linear and eigen solvers running on processes #1 to #4.]

Figure: SMG2S Workflow and Interface.

SLIDE 26

Application: Krylov Solvers Evaluation using SMG2S

An Example

We have also used SMG2S to evaluate different iterative methods for solving non-Hermitian linear systems.

[Plot: residual (1 down to 1e-10, log scale) versus GMRES iteration step number (up to 1750); curves: SOR, Jacobi, no preconditioner, a hybrid method.]

Figure: Convergence Comparison using a matrix generated by SMG2S.

SLIDE 28

Conclusion and Perspectives

How to Get SMG2S

1. SMG2S is open source software released under the GNU Lesser General Public License v3.0.
2. Website: https://smg2s.github.io.
3. The development version is available on GitHub: https://github.com/SMG2S.
4. Documentation: https://smg2s.github.io/files/smg2s-manual.pdf.
5. So far, SMG2S provides interfaces to the scientific computing packages PETSc and Trilinos/Tpetra, and to the C and Python programming languages.

Contact us: xinzhe.wu@ed.univ-lille1.fr and serge.petiton@univ-lille1.fr

SLIDE 29

Conclusion and Perspectives

Graphical User Interface

SMG2S provides a graphical user interface for verifying the accuracy of the eigenvalues.

SLIDE 31

Conclusion and Perspectives

Conclusion and Perspectives

1. SMG2S is a method for generating large-scale non-Hermitian matrices, with good scalability and the capability to preserve the accuracy of the given spectra;
2. Interfaces to more scientific libraries should be implemented;
3. An interface to Fortran can be supported in the future.

The related talks:

  • X. Wu and S.G. Petiton: A Parallel Generator of Non-Hermitian Matrices Computed from Known Given Spectra. SIAM PP18 in Tokyo, Japan.
  • X. Wu and S.G. Petiton: A Parallel Generator of Non-Hermitian Matrices Computed from Given Spectra. PMAA18 in Zürich, Switzerland.

SLIDE 32

Conclusion and Perspectives

Acknowledgement

  • This work is partially supported by the ROMEO HPC Center Champagne Ardenne, which provided access to the cluster ROMEO.
  • This work is funded by the German-Japanese-French project MYX of the French National Research Agency (ANR) under the SPPEXA framework.

SLIDE 33

Conclusion and Perspectives

Thank you for your attention! Questions?

SLIDE 34

Conclusion and Perspectives

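Theorem

Consider the matrices $A \in \mathbb{C}^{n \times n}$, $M_0 \in \mathbb{C}^{n \times n}$, $n \in \mathbb{N}^*$. If $M$ satisfies

\[ \begin{cases} \dfrac{dM(t)}{dt} = AM(t) - M(t)A, \\ M(t = 0) = M_0, \end{cases} \]

then the matrices $M(t)$ and $M_0$ are similar, $\forall A \in \mathbb{C}^{n \times n}$.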

SLIDE 35

Conclusion and Perspectives

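Proof. Denote by $\sigma(M_0)$ and $\sigma(M_t)$ the spectra of $M_0$ and $M_t$, respectively. If $M_0$ is a diagonalisable matrix, then $\forall \lambda \in \sigma(M_0)$ there exists an eigenvector $v \neq 0$ that satisfies the relation

\[ M_0 v = \lambda v. \tag{4} \]

Define $v(t)$ through the matrix $B = I_n$:

\[ v(t) = B_t v = e^{tA} B v. \tag{5} \]

We get:

\[ \frac{d(M_t v(t) - \lambda v(t))}{dt} = \frac{dM_t}{dt} v(t) + M_t \frac{dv(t)}{dt} - \lambda \frac{dv(t)}{dt} = A(M_t v(t) - \lambda v(t)) + \lambda A v(t) - M_t A v(t) + M_t \frac{dB_t}{dt} v - \lambda \frac{dB_t}{dt} v. \tag{6} \]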

SLIDE 36

Conclusion and Perspectives

Proof (continued). With the definition of $B_t$ in Equation (5), we have:

\[ \frac{dB_t}{dt} = A B_t. \tag{7} \]

Thus Equation (6) can be simplified to

\[ \frac{d(M_t v(t) - \lambda v(t))}{dt} = A(M_t v(t) - \lambda v(t)). \tag{8} \]

The initial condition for the above equation is:

\[ \left. M_t v(t) - \lambda v(t) \right|_{t=0} = M_0 B v - \lambda B v = M_0 v - \lambda v = 0. \tag{9} \]

Hence the solution of this differential equation is 0, and $\forall \lambda \in \sigma(M_0)$ we have $\lambda \in \sigma(M_t)$. Since $\dim(M_0) = \dim(M_t)$, we have $\sigma(M_0) = \sigma(M_t)$. $M_0$ and $M_t$ are similar: they have the same eigenvalues but different eigenvectors.
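The theorem can also be checked numerically on a small example. The sketch below uses the closed form M(t) = exp(tA) M0 exp(-tA), which is the standard solution of this differential equation and is consistent with v(t) = exp(tA) v in the proof. The truncated Taylor series used for the exponential, and all names, are assumptions made to keep the example self-contained.

```python
import numpy as np

def expm_taylor(x, terms=30):
    """Truncated Taylor series of the matrix exponential (adequate for small norms)."""
    out = np.eye(len(x))
    term = np.eye(len(x))
    for k in range(1, terms):
        term = term @ x / k
        out = out + term
    return out

def flow(a, m0, t):
    """M(t) = exp(tA) M0 exp(-tA)."""
    return expm_taylor(t * a) @ m0 @ expm_taylor(-t * a)

rng = np.random.default_rng(3)
a = 0.3 * rng.standard_normal((4, 4))
# Lower triangular M0, so its eigenvalues are the diagonal entries 1, 2, 3, 4.
m0 = np.diag([1.0, 2.0, 3.0, 4.0]) + np.tril(rng.random((4, 4)), -1)
t, eps = 0.7, 1e-5
mt = flow(a, m0, t)
# Central finite difference of M(t), compared with AM(t) - M(t)A.
derivative = (flow(a, m0, t + eps) - flow(a, m0, t - eps)) / (2 * eps)
residual = np.linalg.norm(derivative - (a @ mt - mt @ a))
eig_t = np.sort_complex(np.linalg.eigvals(mt))
```

The residual stays at the level of the finite-difference error, and `eig_t` reproduces the diagonal of `m0`, in line with the statement that M(t) and M0 share the same spectrum.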
