SPARSITY: Optimization Framework For Sparse Matrix Kernels Eun-Jin - PowerPoint PPT Presentation

Dec 28, 2022

SPARSITY: Optimization Framework For Sparse Matrix Kernels
Eun-Jin Im, Katherine Yelick, Richard Vuduc
International Journal of High Performance Computing Applications 2004 18: 135
The online version of this article can be found at: http://hpc.sagepub.com/content/18/1/135
Published by: http://www.sagepublications.com

One Operation: y = A ⋅ x
Figure: MATLAB, file from http://www.cise.ufl.edu/research/sparse/matrices/Simon/venkat01.html

Motivation
Image sources:
http://3.bp.blogspot.com/-jwj51xaDhsk/Thk3KtjWwsI/AAAAAAAAAOA/P8eNt0_MJUQ/s1600/Challenger2.gif
http://www.erneuerbareenergiequellen.com/pictures/other/oil_some_questions/oil_rig.jpg
http://eu.art.com/products/p14342284-sa-i2886553/posters.htm?ui=BFBAB751660645AA8C02F859E5BAD142
http://www.aspsys.com/userfiles/image/fluent3.jpg
http://www.bloodhoundssc.com/_db/_images/airliner_resized.jpg
http://www.fft.be/images/documents/219.jpg
http://www.onu.edu/files/images/alumni/Flow_around_object.jpg
http://t0.gstatic.com/images?q=tbn:ANd9GcQDP4JEXQNigtR04rNdj2gBvI8QpO1Sf1k2hcOMF9yXWqP_PCQb

Machines
Processor                 Clock (MHz)  Data cache sizes                  DGEMV (MFLOPS)  DGEMM (MFLOPS)
Sun Ultra Sparc IIi       333          L1: 16 KB, L2: 2 MB               58              425
Intel Pentium III-Mobile  800          L1: 16 KB, L2: 256 KB             147             590
IBM Power 4               1300         L1: 64 KB, L2: 1.5 MB, L3: 32 MB  915             3500
Intel Itanium 2           900          L1: 16 KB, L2: 256 KB, L3: 3 MB   1330            3500

CSR: Compressed Sparse Row Format
3 0 0 5
0 1 7 0
0 0 2 0
0 0 0 4
Values: 3 5 1 7 2 4
Column Index: 0 3 1 2 2 3
Row start Index: 0 2 4 5 6
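
To make the CSR layout concrete, here is a minimal Python sketch of a sparse matrix-vector product over the three arrays for the 4×4 example matrix on this slide. The function name `csr_matvec` and the test vector `x` are illustrative assumptions, not part of the presentation, and the loop is written for readability rather than speed.

```python
def csr_matvec(values, col_idx, row_start, x):
    """Compute y = A * x for a matrix A stored in CSR form."""
    n_rows = len(row_start) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Non-zeros of this row occupy values[row_start[row]:row_start[row + 1]].
        for k in range(row_start[row], row_start[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# CSR arrays for the example matrix
#   3 0 0 5
#   0 1 7 0
#   0 0 2 0
#   0 0 0 4
values = [3, 5, 1, 7, 2, 4]
col_idx = [0, 3, 1, 2, 2, 3]
row_start = [0, 2, 4, 5, 6]

x = [1.0, 2.0, 3.0, 4.0]  # arbitrary test vector (assumption)
y = csr_matvec(values, col_idx, row_start, x)
print(y)  # [23.0, 23.0, 6.0, 16.0]
```

Each non-zero costs one indirect load through `col_idx`, which is the overhead that blocking tries to reduce.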

Register-Blocking
3 0 0 5
0 1 7 0
0 0 2 0
0 0 0 4
Values: 3 0 0 1 0 5 7 0 2 0 0 4
Column Index: 0 2 2
Row start Index: 0 2 3
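
The register-blocked arrays for the same 4×4 matrix can be read as follows: every 2×2 block that contains a non-zero is stored densely in row-major order, with one column index per block and one row-start entry per block row. The sketch below is an illustration under that reading. The name `bcsr_matvec` and the test vector are assumptions, and an optimized kernel would unroll the two inner loops so the block's partial sums stay in registers.

```python
def bcsr_matvec(values, block_col, block_row_start, x, r=2, c=2):
    """Compute y = A * x for A stored as dense r x c blocks (row-major per block)."""
    n_block_rows = len(block_row_start) - 1
    y = [0.0] * (n_block_rows * r)
    for bi in range(n_block_rows):
        for b in range(block_row_start[bi], block_row_start[bi + 1]):
            base = b * r * c       # offset of block b inside values
            col0 = block_col[b]    # first matrix column covered by block b
            for i in range(r):
                for j in range(c):
                    y[bi * r + i] += values[base + i * c + j] * x[col0 + j]
    return y


# Blocks: [3 0; 0 1] at column 0, [0 5; 7 0] at column 2, [2 0; 0 4] at column 2
values = [3, 0, 0, 1, 0, 5, 7, 0, 2, 0, 0, 4]
block_col = [0, 2, 2]
block_row_start = [0, 2, 3]

x = [1.0, 2.0, 3.0, 4.0]  # arbitrary test vector (assumption)
y = bcsr_matvec(values, block_col, block_row_start, x)
print(y)  # [23.0, 23.0, 6.0, 16.0]
```

Only three column indices are stored instead of six, at the price of six explicit zeros in `values`.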
Example for Register-Blocking
Example Results
Performance Model: Machine Profile

Performance Model: Fill-Overhead
3 0 0 5
0 1 7 0
0 0 2 0
0 0 0 4
Fill overhead: 12 / 6 = 2

Performance Model
Example on Intel Itanium 2 with 2×2 block-size:
3 0 0 5
0 1 7 0
0 0 2 0
0 0 0 4
Fill overhead: 12 / 6 = 2
2.54 / 2 = 1.27
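
The arithmetic on this slide can be reproduced with a short Python sketch. It counts the values stored after 2×2 blocking, divides by the number of true non-zeros to get the fill overhead, and then divides 2.54 by that overhead. Reading 2.54 as the machine-profile figure for 2×2 blocks on the Itanium 2 is an assumption based on the slide order, and the helper name `fill_overhead` is hypothetical.

```python
def fill_overhead(dense, r, c):
    """Ratio of stored values after r x c blocking to true non-zeros."""
    n_rows, n_cols = len(dense), len(dense[0])
    nnz = sum(1 for row in dense for v in row if v != 0)
    stored_blocks = 0
    for bi in range(0, n_rows, r):
        for bj in range(0, n_cols, c):
            # A block is stored as soon as it holds at least one non-zero.
            if any(dense[i][j] != 0
                   for i in range(bi, min(bi + r, n_rows))
                   for j in range(bj, min(bj + c, n_cols))):
                stored_blocks += 1
    return stored_blocks * r * c / nnz


A = [
    [3, 0, 0, 5],
    [0, 1, 7, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 4],
]

overhead = fill_overhead(A, 2, 2)
profile_2x2 = 2.54  # figure from the slide; its meaning is assumed here
estimate = profile_2x2 / overhead
print(overhead)            # 2.0
print(round(estimate, 2))  # 1.27
```

With 1×1 blocks nothing is filled in, so `fill_overhead(A, 1, 1)` is 1.0.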
Register-Blocking Speedup: Intel Pentium III-M
Register-Blocking Speedup: Intel Itanium 2

Cache-Blocking
3 0 0 5
0 1 7 0
0 0 2 0
0 0 0 4
Values: 3 1 5 7 2 4
Column Index: 0 1 3 2 2 3
Block start Index: 0 1 2 3 4 5 6
Block row start: 0 4 7
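
As a rough illustration of the idea, the Python sketch below splits the 4×4 example matrix into 2×2 cache blocks, stores each non-empty block as its own small CSR matrix, and multiplies block by block. The function names and the tuple layout are assumptions made for this sketch, not the SPARSITY data structure itself. It does reproduce the value and column-index ordering listed on the slide, but it does not model the two block-pointer arrays. Real cache blocks span far more rows and columns than 2×2.

```python
def cache_block(dense, br, bc):
    """Split a dense matrix into br x bc blocks, each kept as a small CSR matrix."""
    blocks = []
    for r0 in range(0, len(dense), br):
        for c0 in range(0, len(dense[0]), bc):
            vals, cols, starts = [], [], [0]
            for i in range(r0, min(r0 + br, len(dense))):
                for j in range(c0, min(c0 + bc, len(dense[0]))):
                    if dense[i][j] != 0:
                        vals.append(dense[i][j])
                        cols.append(j)      # global column index
                starts.append(len(vals))    # local row start inside the block
            if vals:                        # skip blocks with no non-zeros
                blocks.append((r0, vals, cols, starts))
    return blocks


def blocked_matvec(blocks, x, n_rows):
    """Compute y = A * x one cache block at a time."""
    y = [0.0] * n_rows
    for r0, vals, cols, starts in blocks:
        for i in range(len(starts) - 1):
            for p in range(starts[i], starts[i + 1]):
                y[r0 + i] += vals[p] * x[cols[p]]
    return y


A = [
    [3, 0, 0, 5],
    [0, 1, 7, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 4],
]
blocks = cache_block(A, 2, 2)
all_values = [v for _, vals, _, _ in blocks for v in vals]
all_cols = [c for _, _, cols, _ in blocks for c in cols]
y = blocked_matvec(blocks, [1.0, 2.0, 3.0, 4.0], len(A))
print(all_values)  # [3, 1, 5, 7, 2, 4]
print(all_cols)    # [0, 1, 3, 2, 2, 3]
```

Within one block only a narrow slice of `x` is touched, which is what keeps the source vector in cache when the blocks are sized to fit it.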
Cache-Blocking
Benchmark Cache-Blocking
Cache-Blocking Speedup

Multiple Vectors
[ y00 y01 ]   [ 3 0 0 5 ]   [ u0 v0 ]
[ y10 y11 ] = [ 0 1 7 0 ] ⋅ [ u1 v1 ]
[ y20 y21 ]   [ 0 0 2 0 ]   [ u2 v2 ]
[ y30 y31 ]   [ 0 0 0 4 ]   [ u3 v3 ]

Order of operations, one vector at a time:
3 ⋅ u0 + 0 ⋅ u1 = y00   (1)
0 ⋅ u0 + 1 ⋅ u1 = y10   (2)
⋯
3 ⋅ v0 + 0 ⋅ v1 = y01   (nz + 1)
0 ⋅ v0 + 1 ⋅ v1 = y11   (nz + 2)

Order of operations, both vectors together:
3 ⋅ u0 + 0 ⋅ u1 = y00   (1)
0 ⋅ u0 + 1 ⋅ u1 = y10   (2)
3 ⋅ v0 + 0 ⋅ v1 = y01   (3)
0 ⋅ v0 + 1 ⋅ v1 = y11   (4)

nz = number of non-zero elements in A
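
The reordering on this slide can be sketched in Python: instead of running a full matrix-vector product per vector, each matrix non-zero is loaded once and applied to all vectors before moving on. The sketch below uses plain CSR rather than register blocks to keep it short. The function name `csr_matmat` and the values chosen for u and v are assumptions for illustration.

```python
def csr_matmat(values, col_idx, row_start, X):
    """Compute Y = A * X, where X[j][k] is element j of vector k."""
    n_rows = len(row_start) - 1
    n_vec = len(X[0])
    Y = [[0.0] * n_vec for _ in range(n_rows)]
    for row in range(n_rows):
        for p in range(row_start[row], row_start[row + 1]):
            a = values[p]          # matrix entry loaded once
            x_row = X[col_idx[p]]
            for k in range(n_vec):
                Y[row][k] += a * x_row[k]  # reused for every vector
    return Y


values = [3, 5, 1, 7, 2, 4]
col_idx = [0, 3, 1, 2, 2, 3]
row_start = [0, 2, 4, 5, 6]

# Columns of X are the vectors u = (1, 2, 3, 4) and v = (1, 0, 0, 1) (assumed values).
X = [[1.0, 1.0], [2.0, 0.0], [3.0, 0.0], [4.0, 1.0]]
Y = csr_matmat(values, col_idx, row_start, X)
print(Y)  # [[23.0, 8.0], [23.0, 0.0], [6.0, 0.0], [16.0, 4.0]]
```

The matrix is streamed from memory once for both vectors, so the cost of reading `values` and `col_idx` is shared across them.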
Multiple Vectors Speedup: Intel Pentium III-M
Multiple Vectors Speedup: Intel Itanium 2
SPARSITY System Graph: Paper

Conclusion
• 4x improvement for register-blocking
• 2x for cache-blocking
• 10x for register-blocking combined with multiple vectors
• Many publications reference SPARSITY
