SLIDE 1


Design and Performance Issues of Cholesky and LU Solvers using UPCBLAS

Jorge González-Domínguez*, Osni A. Marques**, María J. Martín*, Guillermo L. Taboada*, Juan Touriño*

*Computer Architecture Group, University of A Coruña, Spain ({jgonzalezd,mariam,taboada,juan}@udc.es)
**Computational Research Division, Lawrence Berkeley National Laboratory, CA, USA (OAMarques@lbl.gov)

10th IEEE International Symposium on Parallel and Distributed Processing with Applications ISPA 2012

SLIDE 2


1. Introduction
2. Cholesky Solver
3. LU Solver
4. Experimental Evaluation
5. Conclusions

SLIDE 4


UPC: a Suitable Alternative for HPC in Multi-core Era

Programming Models:

- Traditionally: shared-memory and distributed-memory programming models
- Challenge: hybrid memory architectures
- PGAS (Partitioned Global Address Space)

PGAS Languages:

- UPC (C)
- Titanium (Java)
- Co-Array Fortran (Fortran)

Main advantages of the PGAS model:
- Simplifies programming
- Allows an efficient use of one-sided communications
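
For illustration only (this sketch is not part of the slides): a minimal UPC example of one-sided communication, in which a thread copies a block of a shared array owned by another thread with upc_memget, with no matching send/receive on the owner's side. The array v and the block size BLK are made-up names for the example.

#include <upc.h>
#include <stdio.h>

#define BLK 4                               /* elements with affinity to each thread */

shared [BLK] double v[BLK*THREADS];         /* one block of BLK elements per thread */

int main(void) {
    double local[BLK];
    /* each thread initializes the block it has affinity to */
    for (int i = 0; i < BLK; i++)
        v[MYTHREAD*BLK + i] = MYTHREAD;
    upc_barrier;
    /* one-sided get: fetch the block owned by the next thread into private memory */
    upc_memget(local, &v[((MYTHREAD + 1) % THREADS) * BLK], BLK * sizeof(double));
    printf("thread %d read %g from thread %d\n",
           MYTHREAD, local[0], (MYTHREAD + 1) % THREADS);
    return 0;
}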

SLIDE 5


UPCBLAS

Characteristics of the Library
- Includes parallel BLAS routines built on top of UPC
- Focused on increasing programmability
- Distributed matrices and vectors are represented by shared arrays
  - Advantage: shared arrays are implicitly distributed
  - Drawback: only 1D distributions are allowed
- Good trade-off between programmability and performance
- UPCBLAS parallel functions internally call BLAS routines to perform the sequential computations in each thread

SLIDE 6


UPCBLAS Matrix Vector Product

int upc_blas_sgemv(UPCBLAS_DIMMDIST dimmDist, int block_size,
                   int sec_block_size, UPCBLAS_TRANSPOSE transpose,
                   int m, int n, float alpha, shared void *A, int lda,
                   shared void *x, float beta, shared void *y);

- Syntax similar to sequential BLAS
- Pointers point to shared memory
- Additional parameters specify the distribution:
  - dimmDist: enumerated value that specifies the type of distribution (by rows or by columns)
  - The meaning of block_size and sec_block_size depends on the dimmDist value

SLIDE 10


UPCBLAS Matrix Vector Product

shared [16] float A[64];   /* 8x8 matrix: blocks of 2 rows (16 elements) dealt out cyclically to the threads */
shared [4] float x[8];     /* input vector: blocks of 4 elements */
shared [2] float y[8];     /* result vector: blocks of 2 elements */

upc_blas_sgemv(upcblas_rowDist, 2, 4, upcblas_noTrans, 8, 8, alpha,
               (shared void *)A, 8, (shared void *)x, beta,
               (shared void *)y);

SLIDE 11


UPCBLAS More Information

Described in:

J. González-Domínguez, M. J. Martín, G. L. Taboada, J. Touriño, R. Doallo, D. A. Mallón and B. Wibecan, “UPCBLAS: A Library for Parallel Matrix Computations in Unified Parallel C”, Concurrency and Computation: Practice and Experience, 2012 (in press), available at http://dx.doi.org/10.1002/cpe.1914

SLIDE 12


Description of the Problem

Solution of Systems of Equations using UPCBLAS: A * X = B, where
- A is an m x m matrix
- X and B are m x n matrices (X overwrites B)

First step: factorization
- Cholesky: A = L * L^T
- LU: A = L * U

Second step: two triangular solvers
- Cholesky: L * Y = B and L^T * X = Y
- LU: L * Y = B and U * X = Y
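
As a sequential point of reference only (a sketch, not the parallel UPCBLAS code): the same two-step solve written with LAPACKE, assuming a row-major m x m matrix A and an m x n right-hand side B that is overwritten with X. The helper names cholesky_solve and lu_solve are illustrative.

#include <lapacke.h>

/* Cholesky-based solve of A*X = B for a symmetric positive definite A:
 * factorization A = L*L^T followed by the two triangular solves.             */
void cholesky_solve(int m, int n, double *A, double *B) {
    LAPACKE_dpotrf(LAPACK_ROW_MAJOR, 'L', m, A, m);            /* A = L*L^T          */
    LAPACKE_dpotrs(LAPACK_ROW_MAJOR, 'L', m, n, A, m, B, n);   /* L*Y = B, L^T*X = Y */
}

/* LU-based solve of A*X = B for a general A:
 * factorization with partial pivoting followed by the two triangular solves. */
void lu_solve(int m, int n, double *A, double *B, lapack_int *ipiv) {
    LAPACKE_dgetrf(LAPACK_ROW_MAJOR, m, m, A, m, ipiv);              /* A = P*L*U        */
    LAPACKE_dgetrs(LAPACK_ROW_MAJOR, 'N', m, n, A, m, ipiv, B, n);   /* L*Y = B, U*X = Y */
}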

SLIDE 14


Cholesky Factorization
- Two different block algorithms were implemented; both are based on BLAS3 routines:
  - Algorithm based on gemm (LAPACK)
  - Algorithm based on syrk (ScaLAPACK)
- Only 1D distributions available: block-cyclic distribution by rows or by columns
- Block-cyclic distribution by rows: the A_{i,j} are submatrices
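
A minimal sketch (not taken from the paper) of what a block-cyclic distribution by rows looks like in UPC, and of one way to write the "MYTHREAD has affinity to block i" test used in the algorithms below. The sizes N and BLK and the helper owner_of_block are illustrative, and a static number of UPC threads is assumed.

#include <upc.h>
#include <stdio.h>

#define N   8                        /* matrix dimension                   */
#define BLK 2                        /* rows per block of the distribution */

/* blocks of BLK rows (BLK*N elements) are dealt out cyclically to the threads */
shared [BLK*N] double A[N*N];

/* the thread that owns block row i is the one with affinity to its first element */
int owner_of_block(int i) {
    return (int)upc_threadof(&A[i*BLK*N]);
}

int main(void) {
    for (int i = 0; i < N/BLK; i++)
        if (owner_of_block(i) == MYTHREAD)   /* "MYTHREAD has affinity to block i" */
            printf("thread %d owns block row %d (rows %d..%d)\n",
                   MYTHREAD, i, i*BLK, i*BLK + BLK - 1);
    return 0;
}

With this layout the owner of block i is simply i modulo THREADS, which is exactly what upc_threadof returns here.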

SLIDE 15


Cholesky Solver based on gemm

for i = 0; i < NB; i = i+1 do
    if MYTHREAD has affinity to block i then
        A_{i,i} = A_{i,i} - A_{i,0..i-1} * A_{i,0..i-1}^T                   → syrk
        Sequential Cholesky Factorization of A_{i,i}
    end
    A_{i+1..N,i} = A_{i+1..N,i} - A_{i+1..N,0..i-1} * A_{i,0..i-1}^T        → gemm
    Solve Z * A_{i,i}^T = A_{i+1..N,i}                                      → trsm
    A_{i+1..N,i} = Z
end
Solve Y * A^T = B                                                           → trsm
Solve X * A = Y                                                             → trsm
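
To make the mapping of each step to a BLAS routine concrete, the following is a sequential sketch of the same gemm-based blocked factorization written with CBLAS and LAPACKE on the lower triangle of a row-major n x n matrix. It only illustrates the per-block operations; it is not the UPC code, and chol_gemm_blocked is a hypothetical name.

#include <cblas.h>
#include <lapacke.h>

/* Left-looking blocked Cholesky (lower triangle), block size nb: for each block
 * column k, update the diagonal block (syrk), factorize it (potrf), update the
 * panel below it (gemm) and solve the panel against L(k,k)^T (trsm).          */
void chol_gemm_blocked(int n, int nb, double *A) {
    for (int k = 0; k < n; k += nb) {
        int b = (k + nb <= n) ? nb : n - k;      /* current block size          */
        int r = n - k - b;                       /* number of rows below it     */
        /* A[k:k+b, k:k+b] -= A[k:k+b, 0:k] * A[k:k+b, 0:k]^T            (syrk) */
        cblas_dsyrk(CblasRowMajor, CblasLower, CblasNoTrans, b, k,
                    -1.0, &A[k*n], n, 1.0, &A[k*n + k], n);
        /* sequential Cholesky factorization of the diagonal block      (potrf) */
        LAPACKE_dpotrf(LAPACK_ROW_MAJOR, 'L', b, &A[k*n + k], n);
        if (r > 0) {
            /* A[k+b:n, k:k+b] -= A[k+b:n, 0:k] * A[k:k+b, 0:k]^T        (gemm) */
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasTrans, r, b, k,
                        -1.0, &A[(k+b)*n], n, &A[k*n], n, 1.0, &A[(k+b)*n + k], n);
            /* solve Z * L(k,k)^T = panel and overwrite the panel with Z (trsm) */
            cblas_dtrsm(CblasRowMajor, CblasRight, CblasLower, CblasTrans,
                        CblasNonUnit, r, b, 1.0, &A[k*n + k], n, &A[(k+b)*n + k], n);
        }
    }
}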

SLIDE 20


Cholesky Solver based on syrk

for i = 0; i < NB; i = i+1 do
    if MYTHREAD has affinity to block i then
        Sequential Cholesky Factorization of A_{i,i}
    end
    Solve Z * A_{i,i}^T = A_{i+1..N,i}                                      → trsm
    A_{i+1..N,i} = Z
    A_{i+1..N,i+1..N} = A_{i+1..N,i+1..N} - A_{i+1..N,i} * A_{i+1..N,i}^T   → syrk
end
Solve Y * A^T = B                                                           → trsm
Solve X * A = Y                                                             → trsm

Algorithm based on syrk vs. algorithm based on gemm:
- Advantage: fewer computations are performed by only one thread
- Drawback: stronger data dependencies -> increase of the overhead due to synchronizations

SLIDE 23


Cholesky Solver

Optimization Techniques
- Appropriate choice of NB (synchronization-parallelism trade-off)
- Numerical computations within each UPC thread are parallelized to exploit NUMA systems:
  - Calls to multithreaded BLAS routines
  - Implementation of the sequential Cholesky factorization with OpenMP support
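
A rough sketch (an assumption about one possible implementation, not the authors' code) of how the sequential Cholesky factorization of a diagonal block can be given OpenMP support, so that each UPC process exploits all the cores of its NUMA node. The helper chol_block_omp is a hypothetical name, the loops follow the textbook unblocked algorithm, and OpenMP must be enabled at compile time (e.g. -fopenmp).

#include <math.h>

/* Unblocked Cholesky of the lower triangle of an n x n block stored with
 * leading dimension lda; the trailing update is split among OpenMP threads. */
static void chol_block_omp(double *A, int n, int lda) {
    for (int k = 0; k < n; k++) {
        A[k*lda + k] = sqrt(A[k*lda + k]);
        for (int i = k + 1; i < n; i++)
            A[i*lda + k] /= A[k*lda + k];
        /* rank-1 update of the trailing submatrix, parallelized over rows */
        #pragma omp parallel for schedule(static)
        for (int i = k + 1; i < n; i++)
            for (int j = k + 1; j <= i; j++)
                A[i*lda + j] -= A[i*lda + k] * A[j*lda + k];
    }
}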

SLIDE 25


Basic LU Solver

for i = 0; i < NB; i = i+1 do
    if MYTHREAD has affinity to block i then
        Sequential LU Factorization of A_{i,i..N}
    end
    Solve Z * A_{i,i}^T = A_{i+1..N,i}                                      → trsm
    A_{i+1..N,i} = Z
    A_{i+1..N,i+1..N} = A_{i+1..N,i+1..N} - A_{i+1..N,i} * A_{i,i+1..N}     → gemm
end
Solve A * Y = B                                                             → trsm
Solve A * X = Y                                                             → trsm
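
For orientation, an illustrative sequential fragment (a hypothetical helper, not the paper's code) of the per-iteration trsm and gemm updates above, written with CBLAS on a row-major n x n matrix. It follows the standard right-looking form, in which the column panel is solved against the upper-triangular block U(k,k) obtained from the already factorized block row.

#include <cblas.h>

/* One trailing update of the basic blocked LU: k is the offset of the current
 * block, b its size; the block row A[k:k+b, k:n] is assumed already factorized. */
static void lu_trailing_update(double *A, int n, int k, int b) {
    int r = n - k - b;                      /* size of the trailing part           */
    if (r <= 0) return;
    /* column panel: A[k+b:n, k:k+b] = A[k+b:n, k:k+b] * U(k,k)^{-1}        (trsm) */
    cblas_dtrsm(CblasRowMajor, CblasRight, CblasUpper, CblasNoTrans, CblasNonUnit,
                r, b, 1.0, &A[k*n + k], n, &A[(k+b)*n + k], n);
    /* trailing matrix: A[k+b:n, k+b:n] -= A[k+b:n, k:k+b] * A[k:k+b, k+b:n] (gemm) */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, r, r, b,
                -1.0, &A[(k+b)*n + k], n, &A[k*n + (k+b)], n,
                1.0, &A[(k+b)*n + (k+b)], n);
}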

SLIDE 30


LU Solver with Partial Pivoting

Partial Pivoting
- Avoids inconsistencies caused by dividing by 0
- If the matrix is distributed by rows, the pivoting is performed by columns -> it can be parallelized
- A vector P of size N is used to store the information about the pivoting
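
A small sketch of the idea (hypothetical helpers, not the library's API): with a row-wise distribution the pivot is searched within the current row, the chosen column index is recorded in the pivot vector P, and every thread can then apply the same column swap to the rows it owns without communication.

#include <math.h>

/* choose the column j >= col with the largest |A[row][j]| and record it in P */
static void pivot_columns(const double *A, int n, int lda, int row, int col, int *P) {
    int best = col;
    for (int j = col + 1; j < n; j++)
        if (fabs(A[row*lda + j]) > fabs(A[row*lda + best]))
            best = j;
    P[col] = best;
}

/* every thread swaps columns col and P[col] in the nrows rows it owns locally */
static void apply_pivot(double *Arows, int nrows, int lda, int col, const int *P) {
    for (int i = 0; i < nrows; i++) {
        double t = Arows[i*lda + col];
        Arows[i*lda + col]    = Arows[i*lda + P[col]];
        Arows[i*lda + P[col]] = t;
    }
}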

SLIDE 31


LU Solver with Partial Pivoting

for i = 0; i < NB; i = i+1 do
    if MYTHREAD has affinity to block i then
        P_i = Partial Pivoting of A_{i,i..N}
        Swap A_{i,i..N} according to P_i
        Sequential LU Factorization of A_{i,i..N}
    end
    Swap A_{0..N,i..N} according to P_i
    Solve Z * A_{i,i}^T = A_{i+1..N,i}                                      → trsm
    A_{i+1..N,i} = Z
    A_{i+1..N,i+1..N} = A_{i+1..N,i+1..N} - A_{i+1..N,i} * A_{i,i+1..N}     → gemm
end
Swap B according to P
Solve A * Y = B                                                             → trsm
Solve A * X = Y                                                             → trsm

SLIDE 34


LU Solver

Optimization Techniques
- Appropriate choice of the block size
- Numerical computations within each UPC thread are parallelized to exploit NUMA systems:
  - Calls to multithreaded BLAS routines
  - Implementation of the sequential LU factorization with OpenMP support

SLIDE 36


Characteristics of the Testbed

Carver Supercomputer
- Installed at the National Energy Research Scientific Computing Center (NERSC) of the Lawrence Berkeley National Laboratory
- 320 nodes with 2 quad-core Intel Xeon X5550 processors (8 cores and 24 GB per node)
- 4X QDR InfiniBand network (32 Gbps of theoretical effective bandwidth)

SLIDE 37


Characteristics of the Testbed

Software
- Berkeley UPC 2.12.2 compiler (internode communications through GASNet over InfiniBand)
- Multithreaded Intel Math Kernel Library (MKL) 10.2.2 (8 internal threads per node)
- ScaLAPACK 1.8.0 library

SLIDE 38


Characteristics of the Experiments

Type of Experiments
- Weak scaling
- Double-precision data
- Best block size
- 1 UPC thread per node and 8 OpenMP threads per UPC thread (one per core)

SLIDE 39


Characteristics of the Experiments

Measures
- Speedups relative to the ScaLAPACK times
- Comparisons with ScaLAPACK:
  - ScaLAPACK-2D: results using ScaLAPACK and a 2D distribution
  - ScaLAPACK-1D: results using ScaLAPACK and a 1D distribution

SLIDE 40


Experimental Evaluation of Cholesky

[Figure: Cholesky solver, 1200 GFLOP per thread. Speedup vs. number of cores (16-256) for UPC-row (based on gemm), UPC-col (based on gemm), UPC-row (based on syrk) and UPC-col (based on syrk).]

SLIDE 41


Experimental Evaluation of Cholesky

[Figure: Cholesky solver, 1200 GFLOP per thread. Speedup vs. number of cores (16-256) for ScaLAPACK-2D, ScaLAPACK-1D and UPC-row (based on gemm).]

SLIDE 42


Experimental Evaluation of LU with Partial Pivoting

[Figure: LU solver with pivoting, 1365 GFLOP per thread. Speedup vs. number of cores (16-256) for ScaLAPACK-2D, ScaLAPACK-1D and UPC-row.]

SLIDE 44


Summary
- UPC implementations of Cholesky and LU solvers using UPCBLAS
- UPCBLAS main advantage: easier to use than MPI-based libraries
- UPCBLAS main drawback: performance limited as shared arrays can only work with 1D distributions

Conclusions
- UPCBLAS is a good trade-off between programmability and performance
- UPCBLAS is a good alternative for increasing productivity

SLIDE 45


Design and Performance Issues of Cholesky and LU Solvers using UPCBLAS

Jorge González-Domínguez*, Osni A. Marques**, María J. Martín*, Guillermo L. Taboada*, Juan Touriño*

*Computer Architecture Group, University of A Coruña, Spain ({jgonzalezd,mariam,taboada,juan}@udc.es)
**Computational Research Division, Lawrence Berkeley National Laboratory, CA, USA (OAMarques@lbl.gov)

10th IEEE International Symposium on Parallel and Distributed Processing with Applications ISPA 2012
