Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency


  1. Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency
     Hatem Ltaief (1), Piotr Luszczek (2), Jack Dongarra (2)
     (1) KAUST Supercomputing Laboratory, Thuwal, Saudi Arabia
     (2) Innovative Computing Laboratory, University of Tennessee, Knoxville
     EnaHPC'11 Conference, Hamburg, Germany
     Ltaief, Luszczek, Dongarra (KAUST, UTK), Energy Profiling of DLA Algorithms, EnaHPC 2011

  2. Outline
     1. The "K" Computer
     2. A Look Back...
     3. LAPACK: Block Algorithms
     4. PLASMA: Tile Algorithms
     5. Power Analysis
     6. Summary and Future Work

  9. The "K" Computer: Motivations
     - 10 MW needed to feed the baby (the K computer)
     - Exascale roadmap says up to 20 MW
     - Huge challenge: gaining 2 orders of magnitude in performance while only doubling the power budget, i.e., roughly a 50-fold improvement in performance per watt
     - Co-designed hardware and software solutions

  13. A Look Back...
      Software infrastructure and algorithmic design follow hardware evolution in time:
      - 1970s: LINPACK, vector operations: Level-1 BLAS operations
      - 1980s: LAPACK, block, cache-friendly: Level-3 BLAS operations
      - 1990s: ScaLAPACK, distributed memory: PBLAS, message passing
      - 2000s: PLASMA, many-core friendly: DAG scheduler, tile data layout, some extra kernels

  17. LAPACK: Block Algorithms
      Principles:
      - Panel-Update sequence
      - Transformations are blocked/accumulated within the panel (Level 2 BLAS)
      - Transformations are applied at once to the trailing submatrix (Level 3 BLAS)
      - Parallelism is hidden inside the BLAS
      - Fork-join model
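
The panel-update sequence can be illustrated with a small blocked Cholesky factorization. This is only a sketch under stated assumptions: it uses a right-looking variant chosen for readability, plain Python loops stand in for the BLAS calls, and the names `factor_diagonal_block`, `blocked_cholesky`, and the panel width `nb` are hypothetical. It is not LAPACK's actual routine.

```python
import math


def factor_diagonal_block(a, lo, hi):
    """Unblocked Cholesky of the diagonal block a[lo:hi][lo:hi], in place."""
    for j in range(lo, hi):
        for k in range(lo, j):
            for i in range(j, hi):
                a[i][j] -= a[i][k] * a[j][k]
        a[j][j] = math.sqrt(a[j][j])
        for i in range(j + 1, hi):
            a[i][j] /= a[j][j]


def blocked_cholesky(matrix, nb=2):
    """Return the lower factor L with matrix = L * L^T, using panels of width nb.

    Assumes a symmetric positive definite input given as a list of rows.
    """
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]  # work on a copy
    for k in range(0, n, nb):
        kb = min(k + nb, n)
        # PANEL: factor the diagonal block, then solve for the rows below it.
        factor_diagonal_block(a, k, kb)
        for i in range(kb, n):
            for j in range(k, kb):
                s = a[i][j]
                for p in range(k, j):
                    s -= a[i][p] * a[j][p]
                a[i][j] = s / a[j][j]
        # UPDATE: apply the whole panel at once to the trailing submatrix.
        # In a real library this is one Level 3 BLAS call; a parallel BLAS
        # forks threads here and joins them before the next panel starts.
        for i in range(kb, n):
            for j in range(kb, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k, kb))
    # Only the lower triangle was computed; clear the rest.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

Calling `blocked_cholesky(A, nb=2)` on a small symmetric positive definite `A` walks through the same PANEL then UPDATE steps for each block column. The update cannot start until the panel is finished, and the next panel cannot start until the update is finished, which is the synchronization pattern of the fork-join model.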

  22. LAPACK: Block Algorithms
      LU, QR, Cholesky
      [Figure: Panel-update sequences for the LAPACK one-sided factorizations. Panels: (a) first step, (b) second step, (c) third step. Regions labeled PANEL, UPDATE, and FINAL.]

  23. LAPACK: Block Algorithms
      Hessenberg, TRD and BRD
      [Figure: Panel-update sequences for the LAPACK two-sided transformations. Panels: (a) first step, (b) second step, (c) third step. Regions labeled PANEL, UPDATE, and FINAL.]

  24. LAPACK: Block Algorithms
      Fork-Join Paradigm

  25. PLASMA: Tile Algorithms
      PLASMA: Parallel Linear Algebra for Scalable Multi-core Architectures ⇒ http://icl.cs.utk.edu/plasma/
      - Parallelism is brought to the fore
      - May require the redesign of linear algebra algorithms
      - Tile data layout translation
      - Remove unnecessary synchronization points between Panel-Update sequences
      - DAG execution where nodes represent tasks and edges define dependencies between them
      - Dynamic runtime system environment: QUARK
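
To make the DAG idea concrete, here is a minimal sketch that lists the tasks of a tile Cholesky factorization and derives the dependency edges from the tiles each task reads and writes. Everything in it is an assumption for illustration: the function names, the task naming (`POTRF`, `TRSM`, `SYRK`, `GEMM` after the usual kernel names), and the level-by-level grouping are hypothetical and do not reproduce the PLASMA or QUARK API.

```python
from collections import defaultdict


def tile_cholesky_tasks(t):
    """List tile Cholesky tasks on a t x t tile grid, in serial order.

    Each task is (name, reads, writes); tiles are (row, col) pairs.
    A written tile is also read by the task (read-modify-write).
    """
    tasks = []
    for k in range(t):
        tasks.append((f"POTRF({k})", set(), {(k, k)}))
        for i in range(k + 1, t):
            tasks.append((f"TRSM({i},{k})", {(k, k)}, {(i, k)}))
        for i in range(k + 1, t):
            tasks.append((f"SYRK({i},{k})", {(i, k)}, {(i, i)}))
            for j in range(k + 1, i):
                tasks.append((f"GEMM({i},{j},{k})", {(i, k), (j, k)}, {(i, j)}))
    return tasks


def build_dag(tasks):
    """Map each task name to the set of task names it must wait for."""
    last_writer = {}
    readers = defaultdict(list)  # readers of a tile since its last write
    deps = {}
    for name, reads, writes in tasks:
        waits = set()
        for tile in reads | writes:
            if tile in last_writer:
                waits.add(last_writer[tile])  # read/write after write
        for tile in writes:
            waits.update(readers[tile])       # write after read
        deps[name] = waits
        for tile in reads:
            readers[tile].append(name)
        for tile in writes:
            last_writer[tile] = name
            readers[tile] = []
    return deps


def schedule_levels(deps):
    """Group tasks into waves; tasks in one wave have no mutual dependencies."""
    level = {}
    for name in deps:  # dict order is serial order, so predecessors come first
        level[name] = 1 + max((level[p] for p in deps[name]), default=-1)
    waves = defaultdict(list)
    for name, lv in level.items():
        waves[lv].append(name)
    return [waves[lv] for lv in sorted(waves)]


if __name__ == "__main__":
    for wave in schedule_levels(build_dag(tile_cholesky_tasks(3))):
        print(wave)
```

Each printed wave is a set of tasks that could run concurrently. A dynamic runtime does not need such global waves: it can start any task as soon as its own predecessors have finished, which is how the synchronization points between panel and update steps are avoided.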
