Introduction Sparse Matrix-Vector Product Sparse Matrix-Matrix Product Experimental Evaluation Conclusions

SparseBLAS Products in UPC: an Evaluation of Storage Formats

Jorge González-Domínguez*, Óscar García-López, Guillermo L. Taboada, María J. Martín, Juan Touriño

Computer Architecture Group University of A Coruña (Spain) {jgonzalezd,oscar.garcia,taboada,mariam,juan}@udc.es

International Conference on Computational and Mathematical Methods in Science and Engineering CMMSE 2011

1. Introduction
2. Sparse Matrix-Vector Product
3. Sparse Matrix-Matrix Product
4. Experimental Evaluation
5. Conclusions

1. Introduction

UPC: a Suitable Alternative for HPC in the Multi-core Era

Programming Models:
  • Traditionally: shared/distributed memory programming models
  • Challenge: hybrid memory architectures
  • PGAS (Partitioned Global Address Space)

PGAS Languages:
  • UPC -> C
  • Titanium -> Java
  • Co-Array Fortran -> Fortran

UPC Compilers:
  • Berkeley UPC
  • GCC (Intrepid)
  • Michigan TU
  • HP, Cray and IBM UPC Compilers

Studied Numerical Operations

BLAS Libraries:
  • Basic Linear Algebra Subprograms
  • Specification of a set of numerical functions
  • Widely used by scientists and engineers
  • SparseBLAS and PBLAS (Parallel BLAS)

Studied Routines:
  • usmv: Sparse Matrix-Vector Product (y ← α ∗ A ∗ x + y)
  • usmm: Sparse Matrix-Matrix Product (C ← α ∗ A ∗ B + C)

Studied Storage Formats

Elements ordered by rows:
  • Coordinate
  • Compressed Sparse Row (CSR)
  • Block Sparse Row (BSR)
  • Skyline with lower matrices

Elements ordered by columns:
  • Compressed Sparse Column (CSC)
  • Skyline with upper matrices

Elements ordered by diagonals:
  • Diagonal
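
To make three of the formats listed above concrete, the following is a minimal Python sketch that converts a small dense matrix to the coordinate, CSR and CSC layouts. It is an illustration only, not the authors' UPC implementation: the function names (`dense_to_coordinate`, `dense_to_csr`, `dense_to_csc`) and the 0-based indexing are assumptions made for this example.

```python
def dense_to_coordinate(dense):
    """Return (row, column, value) triples of the nonzeros, ordered by rows."""
    return [(i, j, v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0]


def dense_to_csr(dense):
    """Return (values, col_idx, row_ptr): nonzeros stored row after row."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends inside values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def dense_to_csc(dense):
    """Return (values, row_idx, col_ptr): nonzeros stored column after column."""
    values, row_idx, col_ptr = [], [], [0]
    n_cols = len(dense[0]) if dense else 0
    for j in range(n_cols):
        for i, row in enumerate(dense):
            if row[j] != 0:
                values.append(row[j])
                row_idx.append(i)
        # col_ptr[j + 1] marks where column j ends inside values/row_idx
        col_ptr.append(len(values))
    return values, row_idx, col_ptr


A = [[1, 0, 2],
     [0, 0, 3],
     [4, 5, 0]]
print(dense_to_coordinate(A))
print(dense_to_csr(A))  # ([1, 2, 3, 4, 5], [0, 2, 2, 0, 1], [0, 2, 3, 5])
print(dense_to_csc(A))  # ([1, 4, 5, 2, 3], [0, 2, 2, 0, 1], [0, 2, 3, 5])
```

CSR and CSC store the same five nonzeros; only the traversal order and the meaning of the pointer array differ, which is why the slides group the formats by element ordering.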

2. Sparse Matrix-Vector Product

Syntax

y ← α ∗ A ∗ x + y

Structures:
  • α -> Scalar
  • A -> Sparse matrix
  • x -> Dense vector
  • y -> Dense vector
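
As a sequential reference for the operation defined above, here is a short Python sketch of the product with A held in CSR. The name `usmv_csr` and its argument list are hypothetical and chosen for this example; they do not reproduce the SparseBLAS or UPC interfaces.

```python
def usmv_csr(alpha, values, col_idx, row_ptr, x, y):
    """Update y in place with alpha * A * x + y, where A is stored in CSR."""
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Walk the nonzeros of row i only
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] += alpha * acc
    return y


# A = [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CSR
values, col_idx, row_ptr = [1, 2, 3, 4, 5], [0, 2, 2, 0, 1], [0, 2, 3, 5]
print(usmv_csr(2.0, values, col_idx, row_ptr, [1, 1, 1], [10, 10, 10]))
```

Each row of the result depends only on the nonzeros of that row, which is what makes a row-wise split of the work natural for this format.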

Distribution by rows

Characteristics:
  • Well balanced computational workload in multiplication
  • Unbalanced computational workload in final additions
  • Gathering of data only with one copy per thread
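
The row distribution can be simulated sequentially as in the sketch below, where each loop iteration plays the role of one thread. The slides do not say how rows are assigned to threads, so the contiguous equal-count blocks produced by the hypothetical helper `split_rows` are an assumption, as is the name `usmv_by_rows`.

```python
def split_rows(n_rows, n_threads):
    """Contiguous row blocks whose sizes differ by at most one (assumed policy)."""
    base, extra = divmod(n_rows, n_threads)
    blocks, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def usmv_by_rows(alpha, values, col_idx, row_ptr, x, y, n_threads):
    """Return alpha * A * x + y with the rows of a CSR matrix split among threads."""
    result = [0.0] * len(y)
    for lo, hi in split_rows(len(y), n_threads):  # one iteration per simulated thread
        local = []
        for i in range(lo, hi):
            acc = sum(values[k] * x[col_idx[k]]
                      for k in range(row_ptr[i], row_ptr[i + 1]))
            local.append(alpha * acc + y[i])
        # Gathering: each thread contributes one contiguous block of the result
        result[lo:hi] = local
    return result


values, col_idx, row_ptr = [1, 2, 3, 4, 5], [0, 2, 2, 0, 1], [0, 2, 3, 5]
print(usmv_by_rows(2.0, values, col_idx, row_ptr, [1, 1, 1], [10, 10, 10], 2))
```

Because every thread owns a disjoint slice of y, no reduction is needed; the single slice assignment per thread mirrors the "one copy per thread" gathering noted above.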

Distribution by columns

Characteristics:
  • Well balanced computational workload
  • Gathering of data with one reduce per vector element
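
With a column distribution every thread can touch any element of y, so each one builds a full-length partial vector and the partials are then summed element by element. The sketch below simulates this sequentially for a CSC matrix; the cyclic assignment of columns to threads and the name `usmv_by_columns` are assumptions for illustration, not details taken from the slides.

```python
def usmv_by_columns(alpha, values, row_idx, col_ptr, x, y, n_threads):
    """Return alpha * A * x + y with the columns of a CSC matrix split among threads."""
    n_cols = len(col_ptr) - 1
    partials = []
    for t in range(n_threads):  # one iteration per simulated thread
        partial = [0.0] * len(y)
        for j in range(t, n_cols, n_threads):  # cyclic column assignment (assumed)
            for k in range(col_ptr[j], col_ptr[j + 1]):
                partial[row_idx[k]] += values[k] * x[j]
        partials.append(partial)
    # Gathering: one reduction (a sum over threads) per element of the vector
    return [alpha * sum(p[i] for p in partials) + y[i] for i in range(len(y))]


# A = [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CSC
values, row_idx, col_ptr = [1, 4, 5, 2, 3], [0, 2, 2, 0, 1], [0, 2, 3, 5]
print(usmv_by_columns(2.0, values, row_idx, col_ptr, [1, 1, 1], [10, 10, 10], 2))
```

The final list comprehension is where the cost of this distribution shows up: one reduction per vector element, compared with a plain copy per thread in the row distribution.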

Distribution by diagonals

Characteristics:
  • Unbalanced computational workload
  • Gathering of data with one reduce per vector element
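
A sketch of the diagonal case follows. It assumes a simple diagonal layout, a dictionary that maps each diagonal offset d to the values A[i][i+d] along it, and a cyclic assignment of diagonals to threads; neither the layout nor the name `usmv_by_diagonals` comes from the slides.

```python
def usmv_by_diagonals(alpha, diagonals, x, y, n_threads):
    """Return alpha * A * x + y with the stored diagonals split among threads.

    diagonals maps an offset d to the list of values A[i][i + d] along that
    diagonal (d = 0 main, d > 0 above, d < 0 below the main diagonal).
    """
    offsets = sorted(diagonals)
    partials = []
    for t in range(n_threads):  # one iteration per simulated thread
        partial = [0.0] * len(y)
        for d in offsets[t::n_threads]:  # cyclic diagonal assignment (assumed)
            first_row = max(0, -d)
            for k, a in enumerate(diagonals[d]):
                i = first_row + k
                partial[i] += a * x[i + d]
        partials.append(partial)
    # Gathering: one reduction per element of the vector
    return [alpha * sum(p[i] for p in partials) + y[i] for i in range(len(y))]


# Tridiagonal A = [[2, 1, 0], [3, 4, 5], [0, 6, 7]]
diagonals = {-1: [3, 6], 0: [2, 4, 7], 1: [1, 5]}
print(usmv_by_diagonals(2.0, diagonals, [1, 1, 1], [1, 1, 1], 2))
```

Diagonals away from the main one are shorter, so threads that receive them do less work, which is one way to read the unbalanced workload noted above.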

3. Sparse Matrix-Matrix Product

Syntax

C ← α ∗ A ∗ B + C

Structures:
  • α -> Scalar
  • A -> Sparse matrix
  • B -> Dense matrix
  • C -> Dense matrix
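
A sequential reference for this operation, with A in CSR and the dense matrices B and C as lists of rows, might look like the sketch below. The name `usmm_csr` and its signature are hypothetical and do not reproduce the SparseBLAS interface.

```python
def usmm_csr(alpha, values, col_idx, row_ptr, B, C):
    """Update C in place with alpha * A * B + C, where A is stored in CSR."""
    n_cols = len(B[0])
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = alpha * values[k]
            b_row = B[col_idx[k]]
            # Nonzero A[i][col_idx[k]] scales one row of B into row i of C
            for j in range(n_cols):
                C[i][j] += a * b_row[j]
    return C


# A = [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CSR
values, col_idx, row_ptr = [1, 2, 3, 4, 5], [0, 2, 2, 0, 1], [0, 2, 3, 5]
B = [[1, 0], [0, 1], [1, 1]]
C = [[1, 1], [1, 1], [1, 1]]
print(usmm_csr(2.0, values, col_idx, row_ptr, B, C))
```

Compared with the matrix-vector case, every nonzero of A now updates a whole row of C, so the amount of work and data per nonzero grows with the number of columns of B.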

Distribution by rows

Characteristics:
  • Well balanced computational workload in multiplication
  • Unbalanced computational workload in additions
  • Gathering of data only with one copy per thread

Distribution by columns

Characteristics:
  • Well balanced computational workload in multiplication
  • Well balanced computational workload in additions
  • Gathering of data with one copy per thread and row
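
The slides do not spell out what is split by columns here. One reading, which is an assumption made for this sketch and not a statement from the authors, is that each thread owns a contiguous block of columns of B and C and multiplies the whole sparse matrix by its block. With C stored row by row, a thread's block is then one separate contiguous piece per row, which would account for a gathering cost of one copy per thread and row. The name `usmm_by_columns` is hypothetical.

```python
def usmm_by_columns(alpha, values, col_idx, row_ptr, B, C, n_threads):
    """Return alpha * A * B + C with the columns of B and C split among threads."""
    n_rows, n_cols = len(C), len(C[0])
    result = [row[:] for row in C]
    base, extra = divmod(n_cols, n_threads)
    start = 0
    for t in range(n_threads):  # one iteration per simulated thread
        stop = start + base + (1 if t < extra else 0)
        for i in range(n_rows):
            local = C[i][start:stop]
            for k in range(row_ptr[i], row_ptr[i + 1]):
                b_row = B[col_idx[k]]
                for j in range(start, stop):
                    local[j - start] += alpha * values[k] * b_row[j]
            # Gathering: one copy per thread and per row of C
            result[i][start:stop] = local
        start = stop
    return result


# A = [[1, 0, 2], [0, 0, 3], [4, 5, 0]] in CSR
values, col_idx, row_ptr = [1, 2, 3, 4, 5], [0, 2, 2, 0, 1], [0, 2, 3, 5]
B = [[1, 0], [0, 1], [1, 1]]
C = [[1, 1], [1, 1], [1, 1]]
print(usmm_by_columns(2.0, values, col_idx, row_ptr, B, C, 2))
```

Under this reading every thread processes all nonzeros of A and the same number of additions per row, which is consistent with both workloads being well balanced even for an irregular sparse matrix.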

4. Experimental Evaluation

Experimental Results with a Regular Matrix (I)

Experimental Results with a Regular Matrix (II)

[Figure: matrix-vector product, nemeth26-large. Speedups versus number of threads (8, 16, 32, 64) for the coordinate, csr, bsr, csc, diagonal, sky-upper and sky-lower formats.]

Experimental Results with a Regular Matrix (and III)

[Figure: matrix-matrix product, nemeth26. Speedups versus number of threads (8, 16, 32, 64) for the coordinate, csr, bsr, csc, diagonal, sky-upper and sky-lower formats.]

Experimental Results with an Irregular Matrix (I)

Experimental Results with an Irregular Matrix (II)

[Figure: matrix-vector product, exdata-large. Speedups versus number of threads (8, 16, 32, 64) for the coordinate, csr, bsr, csc and diagonal formats.]

Experimental Results with an Irregular Matrix (and III)

[Figure: matrix-matrix product, exdata. Speedups versus number of threads (8, 16, 32, 64) for the coordinate, csr, bsr, csc and diagonal formats.]

5. Conclusions

Main Conclusions

Summary:
  • High speedups for both routines
  • Best approach:
      Sparse matrix-vector product -> by rows
      Sparse matrix-matrix product:
        If regular sparse matrix -> by rows
        If irregular sparse matrix -> by columns

Future Work:
  • Study the impact of performing each distribution
