SparseBLAS Products in UPC: an Evaluation of Storage Formats


1. Title slide: SparseBLAS Products in UPC: an Evaluation of Storage Formats. Jorge González-Domínguez*, Óscar García-López, Guillermo L. Taboada, María J. Martín, Juan Touriño. Computer Architecture Group, University of A Coruña (Spain). {jgonzalezd,oscar.garcia,taboada,mariam,juan}@udc.es. International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2011.

2. Outline: 1. Introduction; 2. Sparse Matrix-Vector Product; 3. Sparse Matrix-Matrix Product; 4. Experimental Evaluation; 5. Conclusions.

3. Outline (current section: Introduction).

4. UPC: a Suitable Alternative for HPC in the Multi-core Era. Programming models: traditionally shared- or distributed-memory; the challenge is hybrid memory architectures, addressed by PGAS (Partitioned Global Address Space). PGAS languages: UPC (C), Titanium (Java), Co-Array Fortran (Fortran). UPC compilers: Berkeley UPC, GCC (Intrepid), Michigan TU, and the HP, Cray and IBM UPC compilers.

6. Studied Numerical Operations. BLAS (Basic Linear Algebra Subprograms) libraries: a specification of a set of numerical functions, widely used by scientists and engineers; SparseBLAS and PBLAS (Parallel BLAS). Studied routines: usmv, the sparse matrix-vector product (α ∗ A ∗ x + y = y), and usmm, the sparse matrix-matrix product (α ∗ A ∗ B + C = C).

7. Studied Storage Formats. Elements ordered by rows: Coordinate, Compressed Sparse Row (CSR), Block Sparse Row (BSR), Skyline with lower matrices. Elements ordered by columns: Compressed Sparse Column (CSC), Skyline with upper matrices. Elements ordered by diagonals: Diagonal.

8. Outline (current section: Sparse Matrix-Vector Product).

9. Sparse Matrix-Vector Product: Syntax. α ∗ A ∗ x + y = y, where α is a scalar, A a sparse matrix, and x and y dense vectors.

10. Distribution by rows. Characteristics: well-balanced computational workload in the multiplication; unbalanced computational workload in the final additions; gathering of data with only one copy per thread.

11. Distribution by columns. Characteristics: well-balanced computational workload; gathering of data with one reduce per vector element.

12. Distribution by diagonals. Characteristics: unbalanced computational workload; gathering of data with one reduce per vector element.

13. Outline (current section: Sparse Matrix-Matrix Product).

14. Sparse Matrix-Matrix Product: Syntax. α ∗ A ∗ B + C = C, where α is a scalar, A a sparse matrix, and B and C dense matrices.

15. Distribution by rows. Characteristics: well-balanced computational workload in the multiplication; unbalanced computational workload in the additions; gathering of data with only one copy per thread.

16. Distribution by columns. Characteristics: well-balanced computational workload in both the multiplication and the additions; gathering of data with one copy per thread and row.

17. Outline (current section: Experimental Evaluation).

18. Experimental Results with a Regular Matrix (I).

19. Experimental Results with a Regular Matrix (II). [Chart: speedups of the matrix-vector product on the nemeth26-large matrix for 8, 16, 32 and 64 threads, comparing the coordinate, csr, bsr, csc, diagonal, sky-upper and sky-lower formats.]

20. Experimental Results with a Regular Matrix (and III). [Chart: speedups of the matrix-matrix product on the nemeth26 matrix for 8, 16, 32 and 64 threads, comparing the coordinate, csr, bsr, csc, diagonal, sky-upper and sky-lower formats.]

21. Experimental Results with an Irregular Matrix (I).

22. Experimental Results with an Irregular Matrix (II). [Chart: speedups of the matrix-vector product on the exdata-large matrix for 8, 16, 32 and 64 threads, comparing the coordinate, csr, bsr, csc and diagonal formats.]

23. Experimental Results with an Irregular Matrix (and III). [Chart: speedups of the matrix-matrix product on the exdata matrix for 8, 16, 32 and 64 threads, comparing the coordinate, csr, bsr, csc and diagonal formats.]

24. Outline (current section: Conclusions).

25. Main Conclusions. Summary: high speedups for both routines. Best approach: for the sparse matrix-vector product, distribution by rows; for the sparse matrix-matrix product, distribution by rows if the sparse matrix is regular, and by columns if it is irregular. Future work: study the impact of performing each distribution.

