
SPARSITY: Optimization Framework For Sparse Matrix Kernels



  1. SPARSITY: Optimization Framework For Sparse Matrix Kernels. Eun-Jin Im, Katherine Yelick, Richard Vuduc. International Journal of High Performance Computing Applications, 2004, 18: 135. Online version: http://hpc.sagepub.com/content/18/1/135 (published by: http://www.sagepublications.com)

  2. One Operation: $y = A \cdot x$ (sparse matrix-vector multiplication). Figure: the example matrix plotted in MATLAB; matrix file from http://www.cise.ufl.edu/research/sparse/matrices/Simon/venkat01.html

  3. Motivation [Figure: images of application areas, e.g. a space shuttle, an oil rig, an airliner, and fluid-flow (CFD) simulations]

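  4. Machines
     | Processor                | Clock (MHz) | Data cache sizes                 | DGEMV (MFLOPS) | DGEMM (MFLOPS) |
     |--------------------------|-------------|----------------------------------|----------------|----------------|
     | Sun UltraSPARC IIi       | 333         | L1: 16 KB, L2: 2 MB              | 58             | 425            |
     | Intel Pentium III-Mobile | 800         | L1: 16 KB, L2: 256 KB            | 147            | 590            |
     | IBM Power4               | 1300        | L1: 64 KB, L2: 1.5 MB, L3: 32 MB | 915            | 3500           |
     | Intel Itanium 2          | 900         | L1: 16 KB, L2: 256 KB, L3: 3 MB  | 1330           | 3500           |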

  5. CSR: Compressed Sparse Row Format
     Example matrix A (4×4, 6 non-zeros):
       3 0 0 5
       0 1 7 0
       0 0 2 0
       0 0 0 4
     Values:          3 5 1 7 2 4
     Column Index:    0 3 1 2 2 3
     Row start Index: 0 2 4 5 6
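A minimal Python sketch of how CSR arrays are built and then used for $y = A \cdot x$, using the slide's example matrix (rows 3 0 0 5 / 0 1 7 0 / 0 0 2 0 / 0 0 0 4, which should give values 3 5 1 7 2 4, column indices 0 3 1 2 2 3, row starts 0 2 4 5 6). The function names `dense_to_csr` and `csr_spmv` are hypothetical and only illustrate the format; they are not SPARSITY's interface.

```python
def dense_to_csr(rows):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_idx, row_start = [], [], [0]
    for row in rows:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)     # the non-zero value
                col_idx.append(j)    # its column
        row_start.append(len(values))  # where the next row begins
    return values, col_idx, row_start


def csr_spmv(values, col_idx, row_start, x):
    """Compute y = A*x for A stored in CSR format."""
    n_rows = len(row_start) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Non-zeros of row i occupy positions row_start[i] .. row_start[i+1]-1.
        for k in range(row_start[i], row_start[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix A from the slide.
A = [[3, 0, 0, 5],
     [0, 1, 7, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 4]]

values, col_idx, row_start = dense_to_csr(A)
print(values, col_idx, row_start)
# [3, 5, 1, 7, 2, 4] [0, 3, 1, 2, 2, 3] [0, 2, 4, 5, 6]
print(csr_spmv(values, col_idx, row_start, [1, 2, 3, 4]))
# [23.0, 23.0, 6.0, 16.0]
```

Each non-zero costs one value load, one column-index load and one indirect access into x; that index overhead is what register blocking tries to amortize.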

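  6. Register-Blocking (same matrix A, 2×2 blocks; each block stored row-major, column index = first column of the block, row start counted in blocks)
     Values:          3 0 0 1 | 0 5 7 0 | 2 0 0 4
     Column Index:    0 2 2
     Row start Index: 0 2 3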
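A sketch of 2×2 register-blocked SpMV over this slide's arrays. It assumes our reading of the layout: each block stored contiguously in row-major order (3 0 0 1 | 0 5 7 0 | 2 0 0 4), the column index giving the first column of each block (0 2 2), and row starts counted in blocks (0 2 3). `bcsr_spmv` is a hypothetical name. In tuned C code the inner r×c loops would be fully unrolled with the accumulators held in registers; Python only mirrors the loop structure.

```python
def bcsr_spmv(values, block_col, block_row_start, x, r=2, c=2):
    """y = A*x for A in r x c block-CSR (register-blocked) form.

    Assumed layout (our reading of the slide):
      values          - each r x c block stored contiguously, row-major
      block_col[b]    - first column of block b
      block_row_start - index of the first block in each block row
    """
    n_block_rows = len(block_row_start) - 1
    y = [0.0] * (n_block_rows * r)
    for bi in range(n_block_rows):
        # r accumulators per block row; generated C code would keep these
        # (and the block entries) in registers.
        acc = [0.0] * r
        for b in range(block_row_start[bi], block_row_start[bi + 1]):
            base = b * r * c
            j0 = block_col[b]
            for di in range(r):
                for dj in range(c):
                    acc[di] += values[base + di * c + dj] * x[j0 + dj]
        y[bi * r:(bi + 1) * r] = acc
    return y


# 2x2 blocking of A = [[3,0,0,5],[0,1,7,0],[0,0,2,0],[0,0,0,4]]
values = [3, 0, 0, 1,   0, 5, 7, 0,   2, 0, 0, 4]  # explicit zeros are "fill"
block_col = [0, 2, 2]
block_row_start = [0, 2, 3]

print(bcsr_spmv(values, block_col, block_row_start, [1, 2, 3, 4]))
# [23.0, 23.0, 6.0, 16.0]
```

Only one column index is stored per block instead of one per non-zero, at the cost of multiplying the explicit zeros.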

  7. Example for Register-Blocking

  8. Example Results

  9. Performance Model: Machine Profile

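  10. Performance Model: Fill-Overhead
     Example matrix A (rows): [3 0 0 5], [0 1 7 0], [0 0 2 0], [0 0 0 4]
     With 2×2 blocks, three blocks are stored, i.e. 12 entries for 6 true non-zeros, so the fill ratio is $12 / 6 = 2$.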
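A small sketch that computes the fill ratio (stored entries divided by true non-zeros) for an r×c blocking of a dense test matrix. It assumes the block grid is aligned at multiples of r and c and that every r×c tile containing at least one non-zero is stored in full; `fill_ratio` is a hypothetical helper, and SPARSITY estimates this quantity on real sparse matrices rather than dense lists.

```python
def fill_ratio(rows, r, c):
    """Stored entries under r x c blocking divided by true non-zeros.

    Assumes blocks are aligned to multiples of r and c; every r x c tile
    holding at least one non-zero is stored in full.
    """
    nnz = 0
    tiles = set()
    for i, row in enumerate(rows):
        for j, a in enumerate(row):
            if a != 0:
                nnz += 1
                tiles.add((i // r, j // c))
    return len(tiles) * r * c / nnz


A = [[3, 0, 0, 5],
     [0, 1, 7, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 4]]

print(fill_ratio(A, 2, 2))  # 3 tiles * 4 entries / 6 non-zeros = 2.0
print(fill_ratio(A, 1, 1))  # 1.0: plain CSR has no fill
```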

  11. Performance Model Example on Intel Itanium 2 with 2×2 block size
     Same matrix A, fill ratio $12 / 6 = 2$; estimated performance $= 2.54 / 2 = 1.27$ (the 2×2 machine-profile value divided by the fill ratio).
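The arithmetic above suggests the model divides a machine-profile value for each block size by the matrix's fill ratio and picks the block size with the best estimate. A minimal sketch under that assumption follows; `choose_block_size` is a hypothetical name. Only the 2×2 profile value (2.54) is from the slide; the 1×1 and 1×2 profile entries are made-up placeholders in the slide's (unspecified) units, and the fill ratios are those of the 4×4 example matrix.

```python
def choose_block_size(profile, fill):
    """Return the (r, c) maximizing profile[(r, c)] / fill[(r, c)].

    profile - performance of r x c blocked SpMV on a dense matrix (machine profile)
    fill    - fill ratio of the target matrix for the same block sizes
    """
    estimates = {rc: profile[rc] / fill[rc] for rc in profile}
    best = max(estimates, key=estimates.get)
    return best, estimates


# Only the 2x2 profile value (2.54) comes from the slide; the 1x1 and 1x2
# profile values are HYPOTHETICAL placeholders. Fill ratios are those of
# the 4x4 example matrix A.
profile = {(1, 1): 1.0, (1, 2): 1.6, (2, 2): 2.54}
fill = {(1, 1): 1.0, (1, 2): 2.0, (2, 2): 2.0}

best, estimates = choose_block_size(profile, fill)
print(best, estimates[best])  # (2, 2) 1.27
```

The point of the split is that the profile is measured once per machine, while the fill ratio is cheap to estimate per matrix, so no search over block sizes has to be run on the actual matrix.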

  12. Register-Blocking Speedup: Intel Pentium III-M

  13. Register-Blocking Speedup: Intel Itanium 2

  14. Cache-Blocking (same matrix A)
     Values:            3 1 5 7 2 4
     Column Index:      0 1 3 2 2 3
     Block start Index: 0 1 2 3 4 5 6
     Block row start:   0 4 7
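On this slide the non-zeros of A appear grouped by two-column strip: 3 1 from columns 0 and 1, then 5 7 2 4 from columns 2 and 3 (column indices 0 1 3 2 2 3). The sketch below assumes exactly that: A split into vertical strips, each stored as its own CSR matrix and processed one after another. It does not try to reproduce the slide's block start arrays, whose exact meaning is not recoverable here, and SPARSITY's own cache blocking may also partition rows. The helper names are hypothetical.

```python
def build_column_strips(rows, width):
    """Split A into vertical strips of `width` columns, each stored as CSR."""
    n_cols = len(rows[0])
    strips = []
    for j0 in range(0, n_cols, width):
        values, col_idx, row_start = [], [], [0]
        for row in rows:
            for j in range(j0, min(j0 + width, n_cols)):
                if row[j] != 0:
                    values.append(row[j])
                    col_idx.append(j)  # global column index
            row_start.append(len(values))
        strips.append((values, col_idx, row_start))
    return strips


def cache_blocked_spmv(strips, x, n_rows):
    """y = A*x, processing one strip at a time.

    Each strip only reads x[j0 : j0 + width], so that part of the source
    vector can be reused from cache while the strip is processed.
    """
    y = [0.0] * n_rows
    for values, col_idx, row_start in strips:
        for i in range(n_rows):
            for k in range(row_start[i], row_start[i + 1]):
                y[i] += values[k] * x[col_idx[k]]
    return y


A = [[3, 0, 0, 5],
     [0, 1, 7, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 4]]

strips = build_column_strips(A, 2)
print([v for s in strips for v in s[0]])  # [3, 1, 5, 7, 2, 4]
print([j for s in strips for j in s[1]])  # [0, 1, 3, 2, 2, 3]
print(cache_blocked_spmv(strips, [1, 2, 3, 4], len(A)))  # [23.0, 23.0, 6.0, 16.0]
```

The trade-off is that y is swept once per strip, so the strip width has to be chosen so that the reuse of x outweighs the extra passes over y.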

  15. Cache-Blocking

  16. Benchmark Cache-Blocking

  17. Cache-Blocking Speedup

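  18. Multiple Vectors
     $$\begin{pmatrix} y_{00} & y_{01} \\ y_{10} & y_{11} \\ y_{20} & y_{21} \\ y_{30} & y_{31} \end{pmatrix} = \begin{pmatrix} 3 & 0 & 0 & 5 \\ 0 & 1 & 7 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} \cdot \begin{pmatrix} u_0 & v_0 \\ u_1 & v_1 \\ u_2 & v_2 \\ u_3 & v_3 \end{pmatrix}$$
     Operations for the first 2×2 block, one vector at a time (all of u, then all of v):
       (1) $3 \cdot u_0 + 0 \cdot u_1 = y_{00}$
       (2) $0 \cdot u_0 + 1 \cdot u_1 = y_{10}$
       ⋯
       (nz+1) $3 \cdot v_0 + 0 \cdot v_1 = y_{01}$
       (nz+2) $0 \cdot v_0 + 1 \cdot v_1 = y_{11}$
     Operations for the first 2×2 block, both vectors together:
       (1) $3 \cdot u_0 + 0 \cdot u_1 = y_{00}$
       (2) $0 \cdot u_0 + 1 \cdot u_1 = y_{10}$
       (3) $3 \cdot v_0 + 0 \cdot v_1 = y_{01}$
       (4) $0 \cdot v_0 + 1 \cdot v_1 = y_{11}$
     nz = number of non-zero elements in A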
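A minimal sketch of the multiple-vector idea on plain CSR: each non-zero of A is loaded once and applied to all k vectors before moving on, instead of streaming A once per vector. The slide combines this with 2×2 register blocking; this sketch deliberately leaves blocking out to isolate the reuse. `csr_spmm` is a hypothetical name, and the vectors u and v are stored interleaved (X[j] = [u_j, v_j]) as an assumed layout.

```python
def csr_spmm(values, col_idx, row_start, X):
    """Y = A*X for k vectors stored row-wise: X[j] = [u_j, v_j, ...]."""
    n_rows = len(row_start) - 1
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        yi = Y[i]
        for p in range(row_start[i], row_start[i + 1]):
            a = values[p]          # each non-zero of A is loaded once ...
            xj = X[col_idx[p]]
            for v in range(k):     # ... and applied to all k vectors
                yi[v] += a * xj[v]
    return Y


# CSR arrays of the example matrix A = [[3,0,0,5],[0,1,7,0],[0,0,2,0],[0,0,0,4]]
values = [3, 5, 1, 7, 2, 4]
col_idx = [0, 3, 1, 2, 2, 3]
row_start = [0, 2, 4, 5, 6]

# Two vectors u = (1, 1, 1, 1) and v = (1, 2, 3, 4), interleaved row by row.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
print(csr_spmm(values, col_idx, row_start, X))
# [[8.0, 23.0], [8.0, 23.0], [2.0, 6.0], [4.0, 16.0]]
```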

  19. Multiple Vectors Speedup: Intel Pentium III-M

  20. Multiple Vectors Speedup: Intel Itanium 2

  21. SPARSITY System Graph (figure from the paper)

  22. Conclusion
     - Up to 4x speedup from register-blocking
     - Up to 2x from cache-blocking
     - Up to 10x from register-blocking combined with multiple vectors
     - Many publications refer to SPARSITY
