Enabling Sparse Matrix Computation in Multi-locale Chapel

Tyler Simon, Laboratory for Physical Sciences, College Park, MD
Amer Tahir and Milton Halem, University of Maryland Baltimore County, MD

Motivation, Objectives & Related Work

Motivation
• Large sparse matrices often appear in science and engineering problems, social network analysis, topic modeling, graph analytics, sentiment analysis, cyber security and so on.
• Storing and processing these matrices is not possible on a single computer, so high performance computing (HPC) systems are used.
• HPC systems are evaluated with benchmarks.
• The Conjugate Gradient (CG) algorithm on sparse matrices is used in popular HPC benchmarks, HPCG and NAS CG.
• Chapel, an emerging PGAS (partitioned global address space) language built for parallel computation, offers flexibility and abstraction, with significantly fewer lines of code than existing solutions (MPI/OpenMP).

Aims
• "To provide a data structure in the Chapel programming language that enables the implementation of CG benchmark for compressed large sparse matrices"
• The Chapel programming language is currently unable to handle distributed compressed/sparse matrices over multiple locales.
• This work develops MSBD, a Multilocale Sparse Block Distribution for Chapel.

Related Work
• HPCG & NAS CG
  - Sequential, OpenMP and MPI reference implementations
• Chapel port of NAS CG, which uses CSR
  - Single-locale only: doesn't scale to multiple nodes!
• Unified Parallel C (UPC) and Titanium
  - The UPC implementation offers better speed than MPI but doesn't scale as well as MPI-based matrix multiplication
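The related work contrasts the Chapel NAS CG port, which stores the matrix in CSR, with MSBD's COO storage. As a hedged illustration (written in Python rather than Chapel, with a hypothetical function name `csr_to_coo` and 0-based indices), expanding CSR's row-pointer array recovers the explicit row index that every COO triple carries:

```python
from typing import List, Tuple

def csr_to_coo(row_ptr: List[int], col_idx: List[int],
               values: List[float]) -> List[Tuple[int, int, float]]:
    """Expand a 0-based CSR matrix into a list of (i, j, x) COO triples."""
    triples = []
    for i in range(len(row_ptr) - 1):
        # The non-zeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            triples.append((i, col_idx[k], values[k]))
    return triples

# Example: the 3 x 3 matrix [[3, 0, 1], [0, 4, 0], [0, 7, 5]] in CSR form.
print(csr_to_coo([0, 2, 3, 5], [0, 2, 1, 1, 2], [3.0, 1.0, 4.0, 7.0, 5.0]))
```

CSR stores one pointer per row instead of one row index per non-zero, which is compact for row-wise traversal; COO keeps each (i, j, x) entry self-contained, which plausibly makes it simpler to split non-zeros among 2-D blocks.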

MSBD Overview
• The proposed solution is a custom Chapel distribution for sparse matrices.
• It behaves like the Block distribution, but only the non-zeros are stored locally at the compute nodes, in Coordinate format (COO): matrix values are stored as [i, j, x], where x is the non-zero at row i and column j of the sparse matrix.
• Local-to-global mapping of indices and values is done at each node as a communication optimization.
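A minimal sketch of the [i, j, x] storage described above, written in Python for illustration only (the function names `dense_to_coo` and `coo_lookup` are hypothetical, and 1-based indices are chosen to match the slide's triples). Any index pair absent from the list is an implicit zero:

```python
from typing import List, Tuple

Triple = Tuple[int, int, float]

def dense_to_coo(matrix: List[List[float]]) -> List[Triple]:
    """Return the non-zeros of a dense matrix as 1-based (i, j, x) triples."""
    return [(i + 1, j + 1, x)
            for i, row in enumerate(matrix)
            for j, x in enumerate(row)
            if x != 0.0]

def coo_lookup(triples: List[Triple], i: int, j: int) -> float:
    """Return A[i, j]; entries not stored in the COO list are implicit zeros."""
    for (r, c, x) in triples:
        if r == i and c == j:
            return x
    return 0.0

A = [[3.0, 0.0, 1.0, 2.0],
     [0.0, 4.0, 0.0, 0.0]]
print(dense_to_coo(A))  # only the four non-zeros are kept
```

The linear scan in `coo_lookup` is deliberately naive; it only shows that zeros cost no storage, at the price of searching for an entry rather than indexing it directly.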

MSBD Overview
MSBD distributes a sparse matrix by partitioning it over nodes:
[Figure: a sparse matrix; its non-zeros grouped into block partitions as (i, j, x) triples, e.g. (1, 1, 3.0), (1, 3, 1.0), (1, 4, 2.0), (2, 2, 4.0), (3, 2, 7.0), (3, 3, 5.0); and the partitions assigned to locales Node 1 to Node 4]
• The sparse matrix is mapped to locales in fixed-boundary partitions.
• Sparse matrix values are accessed or modified only when required, which reduces extra communication overhead.
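The fixed-boundary partitioning can be sketched as follows. This is an assumption-laden Python model, not MSBD's Chapel code: locales are simulated as keys `(p, q)` of a `pr`-by-`pc` grid, block boundaries use even integer division (MSBD's exact boundary formula is not given in the slides), each locale stores its non-zeros with block-local offsets, and `to_global` performs the local-to-global mapping. All function names and example values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[int, int, float]
Locale = Tuple[int, int]

def block_bounds(n: int, parts: int) -> List[int]:
    """Fixed boundaries: block p covers global indices bounds[p] .. bounds[p+1]-1 (1-based)."""
    return [1 + (p * n) // parts for p in range(parts + 1)]

def owner(k: int, bounds: List[int]) -> int:
    """Return the block whose fixed boundaries contain global index k."""
    for p in range(len(bounds) - 1):
        if bounds[p] <= k < bounds[p + 1]:
            return p
    raise IndexError(f"index {k} outside 1..{bounds[-1] - 1}")

def partition(triples: List[Triple], n: int, pr: int, pc: int):
    """Assign each non-zero of an n-by-n matrix to a locale on a pr-by-pc grid."""
    rb, cb = block_bounds(n, pr), block_bounds(n, pc)
    local: Dict[Locale, List[Triple]] = defaultdict(list)
    for (i, j, x) in triples:
        p, q = owner(i, rb), owner(j, cb)
        # Store 0-based offsets from the block's first global row/column.
        local[(p, q)].append((i - rb[p], j - cb[q], x))
    return local, rb, cb

def to_global(p: int, q: int, li: int, lj: int,
              rb: List[int], cb: List[int]) -> Tuple[int, int]:
    """Local-to-global index mapping for an entry held by locale (p, q)."""
    return li + rb[p], lj + cb[q]

def spmv(local: Dict[Locale, List[Triple]], rb: List[int], cb: List[int],
         v: List[float], n: int) -> List[float]:
    """y = A v, visiting each locale's non-zeros in turn.

    Serial simulation: a real multi-locale run would process each locale's
    block in parallel and then combine the partial row sums.
    """
    y = [0.0] * n
    for (p, q), block in local.items():
        for (li, lj, x) in block:
            i, j = to_global(p, q, li, lj, rb, cb)
            y[i - 1] += x * v[j - 1]
    return y

# Illustrative 4 x 4 matrix split over a 2 x 2 grid of simulated locales.
t = [(1, 1, 3.0), (1, 3, 1.0), (1, 4, 2.0), (2, 2, 4.0),
     (3, 2, 7.0), (3, 3, 5.0), (4, 4, 9.0)]
local, rb, cb = partition(t, 4, 2, 2)
print(spmv(local, rb, cb, [1.0] * 4, 4))
```

With n = 4 and a 2 x 2 grid, rows and columns 1-2 form the first block and 3-4 the second, so the non-zero at (1, 4) lands on locale (0, 1) with local offsets (0, 1).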

Evaluation
• MSBD is evaluated using the NAS CG algorithm in Chapel.
• The sparse matrix A is a synthetic positive definite square matrix of 4% sparseness, where each non-zero element ∈ (0.0, 1.0).
• The CG algorithm consists of 25 iterations.
• At the end of the iterations, the final result is compared for error against the predefined values given in the NAS CG benchmark.
• The multi-locale CG algorithm that uses MSBD is run in parallel on 1 to 10 nodes, with separate tests for matrix sizes of 14000, 50000 and 100000. The objective is to show the scalability of the proposed MSBD.
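For reference, a textbook unpreconditioned CG loop capped at 25 iterations, matching the iteration count above, is sketched below in Python. It is not the NAS CG benchmark code or the authors' Chapel implementation: it uses 0-based COO triples, a serial `coo_matvec` standing in for the distributed MSBD product, and an added residual-tolerance guard (`tol`, a hypothetical parameter) that stops early once the residual vanishes, which avoids dividing by zero on small systems.

```python
import math
from typing import List, Tuple

Triple = Tuple[int, int, float]

def coo_matvec(triples: List[Triple], v: List[float]) -> List[float]:
    """y = A v for a square matrix stored as 0-based (i, j, x) triples."""
    y = [0.0] * len(v)
    for (i, j, x) in triples:
        y[i] += x * v[j]
    return y

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def conjugate_gradient(triples: List[Triple], b: List[float],
                       iters: int = 25, tol: float = 1e-12) -> List[float]:
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(iters):
        if math.sqrt(rs_old) < tol:
            break            # residual has vanished; stop early
        Ap = coo_matvec(triples, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction, A-conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

A = [(0, 0, 4.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 3.0)]
print(conjugate_gradient(A, [1.0, 2.0]))
```

For the 2 x 2 system A = [[4, 1], [1, 3]], b = [1, 2], the loop reaches x ≈ [1/11, 7/11] within two iterations, as expected: in exact arithmetic CG converges in at most n iterations for an n x n SPD system.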

Results

Conclusion
• This work presents MSBD, a generalized multi-locale sparse block distribution for Chapel.
• MSBD partitions 2-D sparse data into blocks, compressed in COO format, that are assigned to nodes in the cluster.
• Using a Chapel NAS CG algorithm, MSBD is evaluated on UMBC's Bluewave cluster and shown to be scalable.

Contact tasimon@lps.umd.edu
