On Scalability Behaviour of Monte Carlo Sparse Approximate Inverse and Hybrid Algorithms for Matrix Computations - PowerPoint PPT Presentation


SLIDE 1

www.bsc.es

SOFIA 2013

Vassil Alexandrov (ICREA-BSC)

On Scalability Behaviour of Monte Carlo Sparse Approximate Inverse and Hybrid Algorithms for Matrix Computations

SLIDE 2

Overview

– About BSC
– HPC trends
– Monte Carlo - history and motivation
– Parallel algorithm
– Scaling behaviour
– Monte Carlo, SPAI and MSPAI
– Experimental results
– Conclusions

SLIDE 3

BSC OVERVIEW

SLIDE 4

Barcelona Supercomputing Center

The BSC-CNS (over 350 researchers) objectives:

  • R&D in Computer Sciences, Life Sciences and Earth Sciences
  • Supercomputing support to external research

Application scopes and research lines:

  • Earth Sciences, Astrophysics, Engineering, Physics, Life Sciences
  • Compilers and tuning of application kernels
  • Programming models and performance tuning tools
  • Architectures and hardware technologies
  • Extreme computing: solving problems with uncertainty, exascale computing

SLIDE 5

BSC

  • Four joint centres/institutes: IBM, Intel, Microsoft, CUDA
  • Severo Ochoa excellence award from the Spanish Government
  • PRACE - PATC training centre
  • 62 European and national grants
SLIDE 6

MareNostrum III

IBM iDataPlex cluster with 3028 compute nodes

  • Peak performance of 1 Petaflops
  • 48,448 Intel SandyBridge-EP E5-2670 cores at 2.6 GHz
  • Two 8-core CPUs per node (16 cores/node)
  • 94.625 TB of main memory (32 GB/node)
  • 1.9 PB of disk storage
  • Interconnection networks: Infiniband, Gigabit Ethernet
  • Operating system: Linux (SuSE distribution)
  • Consisting of 36 racks
  • Footprint: 120 m²

Completed system - 48,448 cores and predicted to be in the top 25

SLIDE 7

MinoTauro

NVIDIA GPU cluster with 128 Bull B505 blades

  • 2 Intel E5649 6-core processors at 2.53 GHz per node; 5544 cores in total
  • 2 NVIDIA M2090 GPU cards per node
  • 24 GB of main memory per node
  • Peak performance: 185.78 TFlops
  • 250 GB SSD (solid-state disk) as local storage
  • 2 Infiniband QDR links (40 Gbit each) to a non-blocking network
  • RedHat Linux
  • 14 links of 10 GbitEth to connect to BSC GPFS storage

The Green 500 list, November 2012: #36 with 1266 MFlops/Watt, 81.5 kW total power

SLIDE 8

HPC TRENDS

SLIDE 9

SLIDE 10

SLIDE 11

Projected Performance Development

SLIDE 12

Road to Exascale

  • Prof. Jack Dongarra, ScalA12, SLC, USA
SLIDE 13

MONTE CARLO METHODS FOR LINEAR ALGEBRA

SLIDE 14

History – Serial Monte Carlo methods for matrix computation

– Curtis - 1956
– Halton - 1970s
– Michailov - 1980s
– Ermakov - 1980s
– Alexandrov, Fathi - 1996-2000 (hybrid methods)
– Sabelfeld - 1990s and 2009

SLIDE 15

History – Parallel Monte Carlo methods for matrix computation

– Alexandrov, Megson - 1994, O(nNT)
– Alexandrov, Fathi - 1998-2000
– Alexandrov, Dimov - 1998-2002, resolvent MC
– Alexandrov, Weihrauch, Branford - 2000-2008
– Karaivanova - 2000s, QMC
– Alexandrov, Dimov, Weihrauch, Branford - 2005-2009
– Alexandrov, Strassburg - 2010-2013, hybrid methods

SLIDE 16

Motivation

Many scientific and engineering problems revolve around:

– inverting a real n×n matrix (MI)

  • Given A
  • Find A⁻¹

– solving a system of linear algebraic equations (SLAE)

  • Given A and b
  • Solve for x in Ax = b
  • Or find A⁻¹ and calculate x = A⁻¹b
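As a concrete illustration (a minimal numpy sketch, not from the slides), both routes give the same x, although a direct solve is cheaper and more stable than explicitly forming the inverse:

```python
import numpy as np

# Hypothetical 2x2 system, for illustration only
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x_solve = np.linalg.solve(A, b)   # solve Ax = b directly
x_inv = np.linalg.inv(A) @ b      # form A^-1 first, then x = A^-1 b

assert np.allclose(x_solve, x_inv)
```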
SLIDE 17

Motivation cont.

Traditional methods for dense matrices:

– Gaussian elimination
– Gauss-Jordan
– Both take O(n³) steps

Time prohibitive if:

– the problem size is large
– a real-time solution is required

SLIDE 18

Idea of the Monte Carlo method

We wish to estimate a quantity α. Define a random variable ξ whose mathematical expectation is α, i.e. E[ξ] = α, and take N independent realisations ξᵢ of ξ.

– Then the average ξ̄ = (1/N) Σᵢ ξᵢ estimates α
– And E[ξ̄] = α, with statistical error decreasing as O(1/√N)
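The principle (average N independent realisations of a random variable whose expectation is the target quantity) can be sketched in a few lines of Python; the example random variable below is an illustration of mine, not from the slides:

```python
import numpy as np

def mc_estimate(sample, N, seed=0):
    """Average N independent realisations of a random variable xi
    whose mathematical expectation is the target quantity alpha."""
    rng = np.random.default_rng(seed)
    xs = np.array([sample(rng) for _ in range(N)])
    # estimate of alpha and its standard error, which shrinks as 1/sqrt(N)
    return xs.mean(), xs.std(ddof=1) / np.sqrt(N)

# Example: xi = U^2 with U uniform on [0, 1], so alpha = E[xi] = 1/3
est, err = mc_estimate(lambda rng: rng.uniform() ** 2, N=100_000)
```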

SLIDE 19

Reason for using Monte Carlo

O(NT) steps to find one element of

– the matrix inverse A⁻¹, or
– the solution vector x

where

– N is the number of Markov chains
– T is the length of the Markov chains

This is independent of n, the size of the matrix or problem. The algorithms can be efficiently parallelised.
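To make the N-chains-of-length-T structure concrete, here is an illustrative random-walk estimator (my sketch, not the authors' code) for one entry of (I − A)⁻¹ via the Neumann series I + A + A² + …, assuming the spectral radius of A is below 1:

```python
import numpy as np

def mc_inverse_entry(A, i, j, N=10_000, T=30, seed=0):
    """Estimate entry (i, j) of (I - A)^{-1} with N random walks
    (Markov chains) of length T, starting from state i."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0.0
    for _ in range(N):
        k = i
        w = 1.0
        est = 1.0 if k == j else 0.0   # identity term of the Neumann series
        for _ in range(T):
            k_next = rng.integers(n)   # uniform transition probability 1/n
            w *= A[k, k_next] * n      # importance weight a_{k,k'} / p_{k,k'}
            k = k_next
            if k == j:
                est += w               # unbiased sample of the (A^m)_{ij} term
        total += est
    return total / N

A = np.array([[0.1, 0.2],
              [0.05, 0.1]])
est = mc_inverse_entry(A, 0, 0)        # exact value of (I - A)^{-1}[0, 0] is 1.125
```

Note that the cost O(NT) per entry never touches n directly, which is the scalability argument made on this slide.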

SLIDE 20

MC Important Properties

  • Efficient distribution of the compute data
  • Minimum communication / communication-reducing algorithms
  • Increased precision is achieved by adding extra computations (without restart)
  • Fault tolerance achieved through adding extra computations

SLIDE 21

Parallel Algorithm

Start with a diagonally dominant matrix B̂ and make the split B̂ = D − B₁, where

– D holds the diagonal elements of B̂
– B₁ holds only the off-diagonal elements

Then compute A = D⁻¹B₁.

If we had started with a matrix B that was not diagonally dominant, an additional splitting B = B̂ − (B̂ − B) would have been made at the beginning, and a recovery step would be needed at the end of the algorithm to obtain B⁻¹ from B̂⁻¹.
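A small numpy sketch of this split (illustrative only, assuming the input is already diagonally dominant): since B̂ = D(I − A), we have B̂⁻¹ = (I − A)⁻¹D⁻¹, and the Monte Carlo method targets (I − A)⁻¹ through its Neumann series:

```python
import numpy as np

def make_iteration_matrix(B_hat):
    """Split B_hat = D - B1 (D diagonal, B1 off-diagonal only)
    and form the iteration matrix A = D^{-1} B1."""
    D = np.diag(np.diag(B_hat))
    B1 = D - B_hat                 # off-diagonal part, signed so B_hat = D - B1
    return np.linalg.solve(D, B1)  # D^{-1} B1 without explicitly inverting D

B_hat = np.array([[4.0, 1.0],
                  [1.0, 3.0]])    # diagonally dominant example
A = make_iteration_matrix(B_hat)
D_inv = np.diag(1.0 / np.diag(B_hat))
# B_hat = D (I - A), hence B_hat^{-1} = (I - A)^{-1} D^{-1}
recovered = np.linalg.inv(np.eye(2) - A) @ D_inv
assert np.allclose(recovered, np.linalg.inv(B_hat))
```

Diagonal dominance keeps the spectral radius of A below 1, which is exactly what the Neumann-series random walks require.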

SLIDE 22

Parallel Algorithm cont.

Finding B̂⁻¹

SLIDE 23

Parallel Algorithm cont.

Matrix Inversion using MCMC

SLIDE 24

Parallel Algorithm cont.

– Matrix setup
– Use parallel Monte Carlo to find a rough inverse of A
– Use parallel iterative refinement to improve accuracy and retrieve the final inverse
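The slides do not spell out the refinement scheme; one common iteration with this flavour is Newton-Schulz, sketched below as an assumption of mine rather than the authors' exact method. A cheap rough inverse stands in for the Monte Carlo output:

```python
import numpy as np

def refine_inverse(A, X0, iters=10):
    """Refine a rough inverse X0 of A with the Newton-Schulz iteration
    X_{k+1} = X_k (2I - A X_k), which converges quadratically
    whenever the initial residual satisfies ||I - A X0|| < 1."""
    n = A.shape[0]
    X = X0.copy()
    for _ in range(iters):
        X = X @ (2.0 * np.eye(n) - A @ X)
    return X

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
# Stand-in for a rough (e.g. Monte Carlo) inverse: a scaled identity
X = refine_inverse(A, 0.2 * np.eye(2))
assert np.allclose(X, np.linalg.inv(A))
```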

SLIDE 25

Parallel Algorithm cont.

SLIDE 26

Parallel Algorithm cont.

– Almost linear speedup for large enough problem sizes
– Minimal inter-process communication necessary
– Work and communication are balanced
– Smaller problem sizes suffer due to the computation/communication ratio

SLIDE 27

HPC SIMULATION

SLIDE 28

Motivation

– Predict behaviour on a different system
– Find bottlenecks, sweet spots and scaling problems
– Easier than running on several machines
– Reproducible

SLIDE 29

xSim – The Extreme-Scale Simulator

Developed at the Oak Ridge National Laboratory (ORNL), USA.

Used for real-life application testing and for developing algorithmic resilience options (Extreme Computing Group at BSC and Computer Science and Mathematics Division at ORNL).

A highly scalable solution that trades off accuracy for node-oversubscription simulation.

Executes real applications, algorithms or their models atop a simulated HPC environment, for:

– Performance evaluation, including identification of resource contention and underutilization issues
– Investigation at extreme scale, beyond the capabilities of existing simulation efforts

SLIDE 30

xSim – The Extreme-Scale Simulator

The simulator is a library. Virtual processors expose a virtual MPI to applications using Parallel Discrete Event Simulation (PDES).

Easy to use:

– Replace the MPI header
– Compile and link with the simulator library
– Run the MPI program

Support for C and Fortran MPI applications.

SLIDE 31

Scaling Behaviour

Core Scaling on a 240 core system

SLIDE 32

Scaling Behaviour cont.

Core Scaling, logarithmic

SLIDE 33

Scaling Behaviour cont.

Problem scaling

SLIDE 34

MONTE CARLO AND SPAI

SLIDE 35

Combination of Monte Carlo and SPAI

SPAI - SParse Approximate Inverse preconditioner

– Computes a sparse approximate inverse M of a given matrix A by minimising ‖AM − I‖ in the Frobenius norm
– M is explicitly computed and can be used as a preconditioner for an iterative method
– Uses the BiCGSTAB algorithm to solve systems of linear algebraic equations Ax = b

Sparse Monte Carlo matrix inverse algorithm

– Computes an approximate inverse using Monte Carlo methods
– Uses an iterative filter refinement to retrieve the inverse
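A toy dense sketch of the Frobenius-norm minimisation (my illustration, not the SPAI code, which works on sparse data structures and can adapt the pattern): ‖AM − I‖²_F decouples into n independent least-squares problems, one per column of M:

```python
import numpy as np

def spai_fixed_pattern(A, pattern):
    """Approximate inverse M minimising ||A M - I||_F column by column,
    with each column restricted to a prescribed set of nonzero rows."""
    n = A.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        rows = pattern[j]                     # allowed nonzeros of column j
        e_j = np.zeros(n)
        e_j[j] = 1.0
        # minimise ||A[:, rows] m - e_j||_2 over the reduced unknowns
        m, *_ = np.linalg.lstsq(A[:, rows], e_j, rcond=None)
        M[rows, j] = m
    return M

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
# With a full pattern the minimiser coincides with the exact inverse
M = spai_fixed_pattern(A, [list(range(3))] * 3)
assert np.allclose(A @ M, np.eye(3))
```

Restricting `pattern[j]` to a few rows is what keeps M sparse and each least-squares problem tiny in the real preconditioner.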

SLIDE 36

Combination of Monte Carlo and SPAI cont.

Idea: use a Monte Carlo-computed approximate matrix inverse as a preconditioner, considering:

  • the computation time for the MC solution
  • the suitability of the rough inverse
  • matrix types that are not invertible via the Frobenius-norm minimisation
SLIDE 37

Experiments

Preliminary results

– optimisation of parameters necessary

Randomly generated sparse and dense matrices, with standard SPAI settings

– RHS taken from the input matrix

Monte Carlo method without the filter option

– A very rough inverse that already provides a good alternative

SLIDE 38

Experiments

Computation time in seconds for both preconditioners (Frobenius vs Monte Carlo), for matrix sizes 50-5000 at sparsity 0.5 and sparsity 0.7. [Chart data omitted]

SLIDE 39

Experiments cont.

Residual from the BiCGSTAB solution in SPAI (Frobenius vs Monte Carlo), for matrix sizes 50-5000 at sparsity 0.5 and sparsity 0.7. [Chart data omitted]

SLIDE 40

MC & SPAI

SLIDE 41

MC & SPAI residuals

SLIDE 42

Monte Carlo, SPAI & MSPAI

SLIDE 43

SLIDE 44

MONTE CARLO AND MSPAI

SLIDE 45

Monte Carlo & MSPAI

SLIDE 46

Comparison Monte Carlo - MSPAI

University of Florida Sparse Matrix Collection: symmetric matrix from the PARSEC set, n = 268,000, nnz ≈ 18.5 million

SLIDE 47

Residuals for Parsec data set

SLIDE 48

Runtime for Si10H16

SLIDE 49

www.bsc.es

Thank you!
