SLIDE 1

A Parallel Generator of Non-Hermitian Matrices computed from Known Given Spectra

Xinzhe WU (1, 2) and Serge G. Petiton (1, 2)

(1) Maison de la Simulation, Gif-sur-Yvette, 91191, France
(2) CRIStAL, Université de Lille, France

PMAA18, Zurich, Jun. 2018

SLIDE 2

Introduction

Outline

1 Introduction
2 A Scalable Matrix Generator from Given Spectra (SMG2S)
3 Experimentations, evaluation and analysis
4 Accuracy Verification
5 Application: Krylov Solvers Evaluation using SMG2S
6 Conclusion and Perspectives

Xinzhe WU (MDLS, France) SMG2S: A Scalable Test Matrix Generator PMAA18, Zurich, Jun. 2018 2 / 27

SLIDE 3

Introduction

Linear System Solvers and Spectra

When solving linear systems Ax = b, with A a non-Hermitian matrix, by Krylov subspace methods such as GMRES (Saad and Schultz (1986)), the spectrum of A affects several aspects of the solution procedure, such as:

1 Convergence analysis;
2 Preconditioners;
3 Recycling of eigenvalues for a sequence of linear systems;
4 etc.

SLIDE 4

Introduction

Requirements for a large-scale matrix generator

Today, the size of linear problems keeps increasing, and numerical methods must adapt to the coming exascale platforms. Test matrices for the evaluation of numerical algorithms therefore have four special requirements:

1 their spectra must be known and customizable;
2 they should be sparse, non-Hermitian and non-trivial;
3 they may need a very high dimension, to evaluate the algorithms on large-scale systems;
4 they should be generated in parallel, with low memory requirements during generation.

SLIDE 5

Introduction

Related works

Related work:

  • Saad's SPARSKIT (Saad (1990));
  • the Tim Davis collection (Davis and Hu (2011));
  • the Matrix Market collection (Boisvert et al. (1997));
  • Bai's collection (Bai et al. (1996));
  • the Galeri package of Trilinos, which generates simple, well-known finite element and finite difference matrices;
  • J. Demmel's generation suite from 1989 to benchmark LAPACK (Demmel and McKenney (1989)), etc.

Only Demmel's method generates matrices with given spectra: it transforms the diagonal matrix into a dense matrix using orthogonal matrices, and then reduces it to an unsymmetric band matrix by Householder transformations. This method requires O(n^3) time and O(n^2) storage, even for generating a matrix with a small bandwidth.

SLIDE 6

A Scalable Matrix Generator from Given Spectra (SMG2S)

SLIDE 7

A Scalable Matrix Generator from Given Spectra (SMG2S)

Mathematical notations

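Based on the theoretical preliminaries of H. Galicher (Galicher et al. (2014)), for all $n \in \mathbb{N}$ and all matrices $A, M \in \mathbb{C}^{n\times n}$, a linear operator $\mathrm{ad}_A$ acting on $M$ and determined by $A$ can be defined as in Formula (1):

$$\mathrm{ad}_A : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}, \quad M \mapsto AM - MA. \tag{1}$$

$$(\mathrm{ad}_A)^k(M_0) = \sum_{m=0}^{k} (-1)^m C_k^m \, A^{k-m} M_0 A^m. \tag{2}$$

$$M_{i+1} = M_i + \frac{1}{i!}(\mathrm{ad}_A)^i(M_0), \quad i \in (0, +\infty). \tag{3}$$

In order to make $(\mathrm{ad}_A)^i$ vanish after a finite number of steps, we select $A$ to be a nilpotent matrix.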
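Formulas (1) and (2) can be checked numerically on a small case. The following minimal sketch (pure Python with exact `fractions.Fraction` arithmetic; the helper names and the 4×4 test matrices are illustrative assumptions, not part of SMG2S) applies ad_A repeatedly, compares the result with the binomial expansion of Formula (2), and finds the first power of ad_A that vanishes when A is nilpotent.

```python
from fractions import Fraction
from math import comb


def matmul(X, Y):
    """Dense product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def lincomb(X, Y, a=1, b=1):
    """Return a*X + b*Y entrywise."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def matpow(X, k):
    n = len(X)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = matmul(R, X)
    return R


def ad(A, M):
    """Formula (1): ad_A(M) = AM - MA."""
    return lincomb(matmul(A, M), matmul(M, A), 1, -1)


def ad_power_iterated(A, M, k):
    """(ad_A)^k(M) obtained by applying Formula (1) k times."""
    for _ in range(k):
        M = ad(A, M)
    return M


def ad_power_binomial(A, M, k):
    """Formula (2): sum_{m=0}^{k} (-1)^m C(k, m) A^(k-m) M A^m."""
    n = len(M)
    S = [[0] * n for _ in range(n)]
    for m in range(k + 1):
        term = matmul(matmul(matpow(A, k - m), M), matpow(A, m))
        S = lincomb(S, term, 1, (-1) ** m * comb(k, m))
    return S


def is_zero(X):
    return all(x == 0 for row in X for x in row)


# Illustrative data: a 4x4 nilpotent A (ones on the first superdiagonal,
# so A^4 = 0) and a lower-triangular M0 with diagonal 1..4.
n = 4
A = [[int(j == i + 1) for j in range(n)] for i in range(n)]
M0 = [[Fraction(0)] * n for _ in range(n)]
for i in range(n):
    M0[i][i] = Fraction(i + 1)
    if i > 0:
        M0[i][i - 1] = Fraction(1, 2)

# Formulas (1) and (2) agree for every power tested.
for k in range(8):
    assert ad_power_iterated(A, M0, k) == ad_power_binomial(A, M0, k)

# First k with (ad_A)^k(M0) = 0; it exists because A is nilpotent.
k_zero = next(k for k in range(20) if is_zero(ad_power_iterated(A, M0, k)))
print("(ad_A)^k(M0) vanishes from k =", k_zero)
```

Since every term of Formula (2) with k − m ≥ 4 or m ≥ 4 contains A^4 = 0, the powers are guaranteed to vanish by k = 7; with this banded M0 they already vanish at k = 5.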

SLIDE 8

A Scalable Matrix Generator from Given Spectra (SMG2S)

Nilpotent Matrix

The selected nilpotent matrix is given as:

[Diagram of the nilpotent matrix A, with entries equal to 1 on one of its diagonals.]

Figure: Nilpotent Matrix.

If $p = 1$ with $d \in \mathbb{N}^*$, or $p = 2$ with $d \in \mathbb{N}^*$ even, the nilpotency of $A$ is $d + 1$.
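The nilpotency claim can be explored with a small sketch. The exact layout of A is only shown in the figure, so the pattern below is an assumption: ones on the p-th upper diagonal, in a repeating run of d ones followed by a single zero. With this pattern, A splits into chains of at most d + 1 indices, which gives the stated nilpotency. The function names are hypothetical.

```python
def nilpotent_matrix(n, p, d):
    """Hypothetical SMG2S-style nilpotent matrix: ones on the p-th upper
    diagonal, in a repeating pattern of d ones followed by one zero."""
    A = [[0] * n for _ in range(n)]
    for i in range(n - p):
        if i % (d + 1) != d:          # every (d+1)-th position stays zero
            A[i][i + p] = 1
    return A


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def nilpotency_index(A):
    """Smallest k >= 1 with A^k = 0 (at most n for a nilpotent n x n matrix)."""
    P = A
    for k in range(1, len(A) + 1):
        if all(x == 0 for row in P for x in row):
            return k
        P = matmul(P, A)
    return None                        # A is not nilpotent


for p, d in [(1, 3), (2, 2), (2, 4), (2, 3)]:
    print(f"p={p}, d={d}: nilpotency {nilpotency_index(nilpotent_matrix(20, p, d))}")
```

Under this assumed pattern, the first three cases give d + 1 (4, 3 and 5). The last case shows why d must be even when p = 2: for odd d, every zero of the pattern falls on an odd index, so the chain through the even indices is never broken and, for n = 20, the nilpotency jumps to 10.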

SLIDE 9

A Scalable Matrix Generator from Given Spectra (SMG2S)

SMG2S Algorithm

The SMG2S algorithm is given as:

Algorithm 1 Matrix Generation Method
Input: $Spec_{in} \in \mathbb{C}^n$, $h$, $d$
Output: $M_t \in \mathbb{C}^{n\times n}$
1: Insert random elements in the $h$ lower diagonals of $M_0 \in \mathbb{C}^{n\times n}$
2: Insert $Spec_{in}$ on the diagonal of $M_0$ and set $M_0 = (2d-2)!\,M_0$
3: Generate the nilpotent matrix $A \in \mathbb{N}^{n\times n}$ with parameters $p$ and $d$
4: for $i = 0, \cdots, 2(d-2)-1$ do
5: $\quad M_{i+1} = M_i + \left(\prod_{k=i+1}^{2d-2} k\right)(\mathrm{ad}_A)^i(M_0)$
6: end for
7: $M_t = \frac{1}{(2d-2)!}\,M_{2d-2}$
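The following is a small sequential sketch of the idea behind Algorithm 1, under stated assumptions. It is not the SMG2S code. It accumulates $\sum_i (\mathrm{ad}_A)^i(M_0)/i!$ using integer weights $N!/i! = \prod_{k=i+1}^{N} k$ and a single final division by $N!$, in the spirit of steps 2, 5 and 7. Here N is found at run time as the last nonzero power, rather than fixed to 2d − 2. The nilpotent pattern, the real test spectrum and all function names are hypothetical. Exact rationals are used so that two checks hold exactly. First, the result equals $e^{A} M_0 e^{-A}$ (a standard identity), which is a similarity transform. Second, the power sums $\mathrm{tr}(M_t^k)$, k = 1..n, match the given spectrum.

```python
import random
from fractions import Fraction
from math import factorial


def zeros(n):
    return [[Fraction(0)] * n for _ in range(n)]


def matmul(X, Y):
    n = len(X)
    return [[sum((X[i][k] * Y[k][j] for k in range(n)), Fraction(0))
             for j in range(n)] for i in range(n)]


def lincomb(X, Y, a=1, b=1):
    """Return a*X + b*Y entrywise."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def is_zero(X):
    return all(x == 0 for row in X for x in row)


def nilpotent_matrix(n, p, d):
    # Hypothetical pattern: ones on the p-th upper diagonal, d ones then a zero.
    A = zeros(n)
    for i in range(n - p):
        if i % (d + 1) != d:
            A[i][i + p] = Fraction(1)
    return A


def smg2s_sketch(spec, h, p, d, seed=0):
    """Return (Mt, M0, A) with Mt = sum_i (ad_A)^i(M0) / i!."""
    n, rng = len(spec), random.Random(seed)
    M0 = zeros(n)
    for i in range(n):
        M0[i][i] = Fraction(spec[i])               # step 2 (no pre-scaling here)
        for j in range(max(0, i - h), i):          # step 1: h random lower diagonals
            M0[i][j] = Fraction(rng.randint(-9, 9), 10)
    A = nilpotent_matrix(n, p, d)                  # step 3
    powers = [M0]                                  # (ad_A)^i(M0), i = 0, 1, ...
    while not is_zero(powers[-1]):
        last = powers[-1]
        powers.append(lincomb(matmul(A, last), matmul(last, A), 1, -1))
    N = len(powers) - 2                            # highest nonzero power
    # Integer weights N!/i! = prod_{k=i+1}^{N} k, one division at the end.
    S = zeros(n)
    for i in range(N + 1):
        S = lincomb(S, powers[i], 1, factorial(N) // factorial(i))
    return [[x / factorial(N) for x in row] for row in S], M0, A


def exp_nilpotent(A, sign):
    """exp(sign * A) as the finite series sum_k (sign * A)^k / k!."""
    n = len(A)
    E = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    term, k = E, 1
    while True:
        term = [[sign * x / k for x in row] for row in matmul(term, A)]
        if is_zero(term):
            return E
        E, k = lincomb(E, term), k + 1


def trace_of_power(M, k):
    P = M
    for _ in range(k - 1):
        P = matmul(P, M)
    return sum(P[i][i] for i in range(len(M)))


spec = [1, 2, 3, 4, 5, 6, 7, 8]
n = len(spec)
Mt, M0, A = smg2s_sketch(spec, h=2, p=1, d=3)
# The series equals exp(A) M0 exp(-A), a similarity transform of M0.
assert Mt == matmul(matmul(exp_nilpotent(A, 1), M0), exp_nilpotent(A, -1))
# Power sums tr(Mt^k) = sum(lambda^k), k = 1..n, determine the spectrum.
for k in range(1, n + 1):
    assert trace_of_power(Mt, k) == sum(Fraction(lam) ** k for lam in spec)
upper = max(j - i for i in range(n) for j in range(n) if Mt[i][j] != 0)
print("spectrum preserved; upper bandwidth of Mt:", upper)
```

The integer weights mirror the motivation for the factorial scaling in Algorithm 1: apart from the last step, all updates use only integer multiples of $(\mathrm{ad}_A)^i(M_0)$.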

SLIDE 10

A Scalable Matrix Generator from Given Spectra (SMG2S)

Matrix Generation Example

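Through SMG2S, this nilpotent matrix transforms a lower band matrix into a band matrix with the same spectrum.

[Diagram: the initial matrix with h lower diagonals and the generated band matrix, which keeps h lower diagonals and has l < 2pd upper diagonals.]

Figure: Matrix Generation Example.

The operation complexity is max(O(hdn), O(d^2 n)). If d ≪ n and h ≪ n, both the number of operations and the memory space are O(n).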

SLIDE 11

A Scalable Matrix Generator from Given Spectra (SMG2S)

Parallel Implementation on CPUs and GPUs

We implement SMG2S on both homogeneous and heterogeneous machines: the former implementation is based on MPI and PETSc, the latter on MPI, CUDA and PETSc. The core kernel of the implementation is SpGEMM (sparse general matrix-matrix multiplication).

[Diagram: C = A × B computed by SpGEMM over several processes. Each process holds its local parts of A and B and computes its part of C; labels: Host (CPU), Device (GPU), MPI, CUDA, MPI & CUDA.]

Figure: The structure of a CPU-GPU implementation of SpGEMM, where each GPU is attached to a CPU. The GPU is in charge of the computation, while the CPU handles the MPI communication among processes.

SLIDE 12

A Scalable Matrix Generator from Given Spectra (SMG2S)

Optimized Communication Implementation on CPUs

The implementation of SMG2S, and in particular the communication in the parallel SpGEMM kernel, can be optimized by exploiting the particular structure of the nilpotent matrix A.

[Diagram: matrix M distributed over Proc 0, Proc 1, Proc 2 and Proc 3.]

Figure: (a) AM operation; (b) MA operation.
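One way to read this optimization is sketched below. The sketch makes two assumptions: M uses a contiguous row-block distribution (as PETSc's parallel AIJ matrices do), and A follows a hypothetical pattern (ones on the p-th upper diagonal, d ones then a zero). Because A has a single nonzero diagonal, AM is just a shift of rows of M and MA a shift of columns, so no general SpGEMM is needed. MA stays local to each process, and AM needs only the first p rows owned by the next process. All names are illustrative.

```python
import random


def nilpotent_matrix(n, p, d):
    # Hypothetical pattern: ones on the p-th upper diagonal, d ones then a zero.
    A = [[0] * n for _ in range(n)]
    for i in range(n - p):
        if i % (d + 1) != d:
            A[i][i + p] = 1
    return A


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def a_entry(i, n, p, d):
    """A[i][i+p] for the pattern above (the only possible nonzero in row i)."""
    return 1 if i + p < n and i % (d + 1) != d else 0


def shift_AM(M, p, d):
    """Row i of AM is A[i][i+p] * (row i+p of M): a row shift, no SpGEMM."""
    n = len(M)
    return [[a_entry(i, n, p, d) * M[i + p][j] if i + p < n else 0
             for j in range(n)] for i in range(n)]


def shift_MA(M, p, d):
    """Column j of MA is A[j-p][j] * (column j-p of M): a local column shift."""
    n = len(M)
    return [[a_entry(j - p, n, p, d) * M[i][j - p] if j >= p else 0
             for j in range(n)] for i in range(n)]


def remote_rows_for_AM(q, n, nprocs, p):
    """Rows of M that process q must receive to form its rows of AM, assuming
    a contiguous row-block distribution with n divisible by nprocs."""
    b = n // nprocs
    r0, r1 = q * b, (q + 1) * b
    return sorted({i + p for i in range(r0, r1) if i + p < n} - set(range(r0, r1)))


n, p, d, nprocs = 16, 2, 2, 4
rng = random.Random(1)
M = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
A = nilpotent_matrix(n, p, d)
assert shift_AM(M, p, d) == matmul(A, M)
assert shift_MA(M, p, d) == matmul(M, A)
for q in range(nprocs):
    print(f"process {q} receives rows {remote_rows_for_AM(q, n, nprocs, p)} for AM")
```

With 4 processes and p = 2, each process except the last receives exactly two rows from its successor. The message size therefore depends on p, not on n.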

SLIDE 13

Experimentations, evaluation and analysis

SLIDE 14

Experimentations, evaluation and analysis

Experimental hardware environment

We implement SMG2S on the supercomputers Tianhe-2 and ROMEO. The node specifications of the two platforms are as follows:

Table: Node Specifications of the clusters ROMEO and Tianhe-2

Machine Name | ROMEO | Tianhe-2
Nodes Number | BullX R421 × 130 | 16000 nodes
Mother Board | SuperMicro X9DRG-QF | Specific
Interconnect | Infiniband |
CPU | 2 × Intel Ivy Bridge, 8 cores, 2.6 GHz | 2 × Intel Ivy Bridge, 12 cores, 2.2 GHz
Memory | DDR3 32 GB | DDR3 64 GB
Accelerator | 2 × NVIDIA Tesla K20X GPU | 3 × Intel Knights Corner

SLIDE 15

Experimentations, evaluation and analysis

Scalability and Speedup Evaluation I

The scaling and speedup evaluations are given as:

(a) Strong and Weak Scaling of SMG2S on Tianhe-2 [plot: time (s), log scale, vs. number of CPU cores, 48 to 1536; curves CD-SS, RD-SS, CD-WS, RD-WS, OCD-SS, ORD-SS, OCD-WS, ORD-WS]

(b) Strong and Weak Scaling of SMG2S on ROMEO [plot: time (s), log scale, vs. number of CPU cores, 16 to 256; same curves]

SLIDE 16

Experimentations, evaluation and analysis

Scalability and Speedup Evaluation II

(c) Strong and Weak Scaling of SMG2S on ROMEO with multiple GPUs [plot: time (s), log scale, vs. number of GPUs, 4 to 64; curves CD-SS, RD-SS, CD-WS, RD-WS]

(d) Speedup of different implementations (y-axis label: Speedup/4CPUs):

CPU or GPU number | 4 | 8 | 16 | 32 | 64
SMG2S on CPU | 1.0 | 0.9 | 1.0 | 1.0 | 1.0
SMG2S on GPU | 1.9 | 1.9 | 1.9 | 1.8 | 1.8
Optimized SMG2S | 8.4 | 8.4 | 8.1 | 8.0 | 7.9

SLIDE 17

Accuracy Verification

SLIDE 18

Accuracy Verification

Verification method

We propose a method, based on the Shifted Inverse Power Method, to check how well SMG2S preserves the given spectra.

Algorithm 2 Shifted Inverse Power Method
Input: matrix $A$, initial guess $\sigma$ for the desired eigenvalue, initial vector $v_0$
Output: approximate eigenpair $(\theta, v)$
1: $y = v_0$
2: for $i = 1, 2, 3, \cdots$ do
3: $\quad \theta = \|y\|_\infty$, $v = y/\theta$
4: $\quad$ Solve $(A - \sigma I)\,y = v$
5: end for

Check the error: $\mathrm{error} = \dfrac{\|Av' - \lambda v'\|}{\|Av'\|}$
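A minimal dense sketch of this check for small matrices, in pure Python with complex arithmetic, is given below. Two details are assumptions of the sketch, not taken from the slides. First, θ is the entry of y with the largest modulus, keeping its sign or phase, so the normalization settles on a fixed eigenvector. Second, the eigenvalue estimate λ used in the error formula is the Rayleigh quotient of the final v. The test matrices are illustrative.

```python
def solve(B, rhs):
    """Solve B x = rhs by Gaussian elimination with partial pivoting."""
    n = len(B)
    M = [list(row) + [rhs[i]] for i, row in enumerate(B)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def norm2(v):
    return sum(abs(x) ** 2 for x in v) ** 0.5


def shifted_inverse_power(A, sigma, v0, iters=60):
    """Algorithm 2, with theta taken as the entry of y of largest modulus."""
    n = len(A)
    shifted = [[A[i][j] - (sigma if i == j else 0) for j in range(n)]
               for i in range(n)]
    y = [complex(x) for x in v0]
    for _ in range(iters):
        theta = max(y, key=abs)
        v = [yi / theta for yi in y]
        y = solve(shifted, v)              # (A - sigma I) y = v
    theta = max(y, key=abs)
    v = [yi / theta for yi in y]
    Av = matvec(A, v)
    # Rayleigh quotient as the eigenvalue estimate lambda.
    lam = (sum(vi.conjugate() * avi for vi, avi in zip(v, Av))
           / sum(abs(vi) ** 2 for vi in v))
    error = norm2([avi - lam * vi for avi, vi in zip(Av, v)]) / norm2(Av)
    return lam, v, error


# Lower-triangular test matrix: its eigenvalues are the diagonal 1..5.
A = [[1, 0, 0, 0, 0],
     [0.5, 2, 0, 0, 0],
     [0.3, 0.5, 3, 0, 0],
     [0, 0.3, 0.5, 4, 0],
     [0, 0, 0.3, 0.5, 5]]
lam, v, err = shifted_inverse_power(A, 3.2, [1, 1, 1, 1, 1])
print(lam, err)
```

With σ = 3.2 the iteration converges to the eigenvalue 3, the one closest to the shift, and the relative error reaches roughly machine precision. For an SMG2S matrix, the same check would be repeated with shifts near each prescribed eigenvalue.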

SLIDE 19

Accuracy Verification

Verification results I

The verification tests were performed with four different types of spectra.

(e) Spec1: Clustered Eigenvalues I
(f) Spec2: Dominant Clustered Eigenvalue II

[Plots: initial and computed eigenvalues in the complex plane (real axis vs. imaginary axis).]

SLIDE 20

Accuracy Verification

Verification results II

(g) Spec3: Clustered Eigenvalues III
(h) Spec4: Conjugate and Closest Eigenvalues

[Plots: initial and computed eigenvalues in the complex plane (real axis vs. imaginary axis).]

Figure: Verification using Different Types of Spectra.

SLIDE 21

Application: Krylov Solvers Evaluation using SMG2S

SLIDE 22

Application: Krylov Solvers Evaluation using SMG2S

Workflow

SMG2S can provide an interface to PETSc, Trilinos, and any other public or in-house parallel solver, which loads the distributed matrix data directly into the data structures of each library. Because the test matrix is generated in memory rather than read from files, this feature can significantly reduce the I/O of applications and make the evaluation of numerical methods more efficient.

[Diagram: Parallel SMG2S (processes #1 to #4) → Interface (PETSc, Trilinos, etc.) → Linear and Eigen Solvers (processes #1 to #4).]

Figure: SMG2S Workflow and Interface.
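The following sketch illustrates what such an interface has to do on each process. It converts the entries one process owns, for a contiguous range of global rows, into the three CSR arrays (row pointers, global column indices, values) that distributed sparse-matrix libraries commonly accept for a local row block. PETSc's `MatCreateMPIAIJWithArrays`, for example, takes local CSR arrays of this kind. The function name and the dictionary input format are hypothetical, not part of SMG2S.

```python
def local_block_to_csr(entries, row_start, row_end):
    """Hypothetical helper: turn {(global_row, global_col): value} for the
    rows [row_start, row_end) owned by one process into local CSR arrays."""
    rows = [[] for _ in range(row_end - row_start)]
    for (i, j), v in entries.items():
        if not row_start <= i < row_end:
            raise ValueError(f"row {i} is not owned by this process")
        if v != 0:                          # keep only structural nonzeros
            rows[i - row_start].append((j, v))
    row_ptr, col_idx, values = [0], [], []
    for r in rows:
        r.sort(key=lambda t: t[0])          # ascending global column index
        col_idx.extend(j for j, _ in r)
        values.extend(v for _, v in r)
        row_ptr.append(len(col_idx))        # end offset of this local row
    return row_ptr, col_idx, values


# Rows 4..7 of a larger matrix, as one process might hold them.
entries = {(4, 4): 2.0, (4, 5): 1.0, (5, 3): 0.5, (5, 5): 3.0,
           (6, 6): 4.0, (7, 6): -1.0, (7, 7): 5.0}
row_ptr, col_idx, values = local_block_to_csr(entries, 4, 8)
print(row_ptr, col_idx, values)
```

Row offsets are local (they start at 0 on every process), while column indices stay global. This split matches how row-distributed formats usually describe a local block.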

SLIDE 23

Application: Krylov Solvers Evaluation using SMG2S

An Example

We have also used SMG2S to evaluate different iterative methods for solving non-Hermitian linear systems.

[Plot: residual (log scale, from 1 down to 1e-10) vs. GMRES iteration step number (up to about 1750); curves: SOR, Jacobi, No preconditioner, a hybrid method.]

Figure: Convergence Comparison Example.

SLIDE 24

Conclusion and Perspectives

SLIDE 25

Conclusion and Perspectives

Conclusion and Perspectives

1 SMG2S is a method to generate large-scale non-Hermitian matrices with good scalability and the capability to preserve the given spectra accurately;
2 Interfaces to more scientific libraries should be implemented;
3 A beta version is available on GitHub: https://github.com/brunowu/SMG2S.git;
4 The software package will be finished soon;
5 It will be shared with researchers around the world from related backgrounds.

The related talk:

  • X. Wu and S.G. Petiton: A Parallel Generator of Non-Hermitian Matrices Computed from Known Given Spectra. SIAM PP18, Tokyo, Japan.

The related paper:

  • X. Wu and S.G. Petiton: A Parallel Generator of Non-Hermitian Matrices computed from Given Spectra. (Submitted to IEEE CLUSTER2018).

SLIDE 26

Conclusion and Perspectives

Acknowledgement

We would like to thank Prof. Yutong LU and the team at the National Supercomputing Center in Guangzhou for providing access to Tianhe-2. This work is partially supported by the ROMEO HPC Center Champagne-Ardenne, which provided access to the cluster ROMEO. This work is funded by the German-Japanese-French MYX project of the French National Research Agency (ANR) under the SPPEXA framework.

SLIDE 27

Conclusion and Perspectives

Thank you for your attention! Questions?
