Improving Power efficiency of Dense Linear Algebra Algorithms on - PowerPoint PPT Presentation

The 2011 International Conference on High Performance Computing & Simulation
Workshop on Optimization Issues in Energy Efficient Distributed Systems

Improving Power efficiency of Dense Linear Algebra Algorithms on Multi-Core Processors via Slack Control

Pedro Alonso, Manuel F. Dolz, Rafael Mayo, Enrique S. Quintana-Ortí

July 4–8, 2011, Istanbul (Turkey)

Introduction | Theoretical approach | Slack Reduction Algorithm | Experimental results | Conclusions and future work

Motivation

High performance computing:
  - Optimization of algorithms applied to solve complex problems
Technological advances ⇒ improved performance:
  - Processors work at higher frequencies
  - Higher number of cores per socket (processor)
Large number of processors and cores ⇒ high energy consumption
Methods, algorithms and techniques to reduce energy consumption applied to high performance computing:
  - Reduce the frequency of processors with the DVFS technique

Pedro Alonso et al., Improving Power efficiency of DLA Algorithms on Multi-Core Processors

Outline

1 Introduction
2 Theoretical approach
    The Critical Path Method
    Application to dense linear algebra algorithms
3 Slack Reduction Algorithm
    Previous steps
    Slack reduction
    Simulator
4 Experimental results
    Description
    Cholesky factorization
    QR factorization
5 Conclusions and future work
    Conclusions
    Future work

Introduction

Scheduling tasks of dense linear algebra algorithms
  - Examples: Cholesky, QR and LU factorizations
Energy saving tools available for multi-core processors
  - Example: Dynamic Voltage and Frequency Scaling (DVFS)

Scheduling tasks + DVFS ⇒ power-aware scheduling on multi-core processors

Our strategy: reduce the frequency of the cores that execute non-critical tasks, so that idle time decreases without sacrificing the total performance of the algorithm ⇒ energy saving

The Critical Path Method

[Figure: an edge from node i to node j representing a task of cost $C_{ij}$; node i is labelled with $ES_i$ and $LF_i$, node j with $ES_j$ and $LF_j$, and the edge with its slack $S_{ij}$.]

Concepts:
  - DAG of dependencies
      Nodes ⇒ temporal events
      Edges ⇒ tasks
  - Times: earliest and latest times to start and finish the execution of tasks
      $ES_i = \max_k (ES_k + C_{ki})$, taken over the predecessors k of node i
      $LF_j = \min_k (LF_k - C_{jk})$, taken over the successors k of node j
  - Total slack: amount of time that a task can be delayed without increasing the total execution time of the algorithm
      $S_{ij} = LF_j - ES_i - C_{ij}$
  - Critical path: a succession of tasks, from the initial to the final node of the graph, each with total slack = 0
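As a small worked example of these recurrences, consider a hypothetical three-node graph (not taken from the slides) with edges 0→1, 1→2 and 0→2 and assumed costs $C_{01} = 2$, $C_{12} = 3$, $C_{02} = 4$. The forward pass gives the earliest times, the backward pass the latest times, and the slack follows per edge:

```latex
\begin{align*}
ES_0 &= 0, & ES_1 &= ES_0 + C_{01} = 2, & ES_2 &= \max(ES_1 + C_{12},\ ES_0 + C_{02}) = \max(5, 4) = 5,\\
LF_2 &= ES_2 = 5, & LF_1 &= LF_2 - C_{12} = 2, & LF_0 &= \min(LF_1 - C_{01},\ LF_2 - C_{02}) = \min(0, 1) = 0,\\
S_{01} &= LF_1 - ES_0 - C_{01} = 0, & S_{12} &= LF_2 - ES_1 - C_{12} = 0, & S_{02} &= LF_2 - ES_0 - C_{02} = 1.
\end{align*}
```

Edges 0→1 and 1→2 have zero slack and form the critical path of length 5; the task on edge 0→2 could be delayed, or slowed down, by up to 1 time unit without lengthening the schedule.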

Application to dense linear algebra algorithms

Objective ⇒ obtain the dependency graph corresponding to the computation of a dense linear algebra algorithm, apply the Critical Path Method to analyze the slacks, and reduce them with our Slack Reduction Algorithm

Example: Cholesky factorization of a matrix consisting of 3 × 3 blocks (s × s blocks of size b × b in general; one unit of time, u.t., corresponds to $b^3$ flops)

for k = 1, 2, ..., s do
    $A_{kk} = L_{kk} L_{kk}^T$                    Cholesky factorization      $b^3/3$ flops → 0.33 u.t.
    for i = k+1, k+2, ..., s do
        $A_{ik} ← A_{ik} L_{kk}^{-T}$             Triangular system solve     $b^3$ flops → 1 u.t.
    end for
    for i = k+1, k+2, ..., s do
        for j = k+1, k+2, ..., i-1 do
            $A_{ij} ← A_{ij} - A_{ik} A_{jk}^T$   Matrix-matrix product       $2b^3$ flops → 2 u.t.
        end for
        $A_{ii} ← A_{ii} - A_{ik} A_{ik}^T$       Symmetric rank-b update     $b^3$ flops → 1 u.t.
    end for
end for
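The loop nest above can be turned into a task list mechanically. The following is a minimal sketch, assuming the naming pattern that the later slides appear to use (kind letter, block row, block column, iteration k) and the per-task costs quoted above; the function name `cholesky_tasks` and the `COST` table are illustrative, not part of the original work.

```python
from fractions import Fraction

# Costs in units of time (u.t.), one u.t. = b^3 flops, as quoted on the slide.
COST = {
    "P": Fraction(1, 3),  # Cholesky factorization of a diagonal block
    "T": Fraction(1),     # triangular system solve
    "G": Fraction(2),     # matrix-matrix product
    "S": Fraction(1),     # symmetric rank-b update
}


def cholesky_tasks(s):
    """Return (name, cost) for every task of the blocked Cholesky loop nest.

    Names are built as kind + block row + block column + iteration k
    (an assumed convention; it is only unambiguous for s < 10).
    """
    tasks = []

    def add(kind, i, j, k):
        tasks.append((f"{kind}{i}{j}{k}", COST[kind]))

    for k in range(1, s + 1):
        add("P", k, k, k)                  # A_kk = L_kk L_kk^T
        for i in range(k + 1, s + 1):
            add("T", i, k, k)              # A_ik <- A_ik L_kk^{-T}
        for i in range(k + 1, s + 1):
            for j in range(k + 1, i):
                add("G", i, j, k)          # A_ij <- A_ij - A_ik A_jk^T
            add("S", i, i, k)              # A_ii <- A_ii - A_ik A_ik^T
    return tasks


tasks = cholesky_tasks(3)
names = [name for name, _ in tasks]
total_work = sum(cost for _, cost in tasks)
print(names)
print(total_work)
```

For s = 3 this yields ten tasks with a total sequential work of 9 u.t.; exact fractions are used so that the 1/3 cost of the diagonal factorization does not accumulate rounding error.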

Application to dense linear algebra algorithms

Task-node DAG capturing the data dependencies in the computation of the Cholesky factorization of a matrix consisting of 3 × 3 blocks

[Figure: task-node DAG. Nodes, with cost in u.t.: P111 (0.33), T211 (1), T311 (1), S221 (1), G321 (2), S331 (1), P222 (0.33), T322 (1), S332 (1), P333 (0.33).]

Graph transformation in order to apply CPM: conversion from task-node to task-edge graph

[Figure: task-edge DAG on event nodes 0 to 9. The edges carry the same ten tasks plus two NULL edges.]
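One simple way to perform such a conversion is sketched below. It is a naive scheme, not the compact graph drawn on the slide: every task becomes an edge between two fresh event nodes, and every dependency becomes a zero-cost NULL edge, so it produces more nodes and NULL edges than the ten-node graph above while preserving the same precedence constraints. The dependency list `DEPS` is an assumption reconstructed from the data flow of the blocked algorithm, and `to_task_edge` is a hypothetical helper name.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

# Task costs in u.t. for the 3 x 3 blocked Cholesky factorization.
TASKS = {
    "P111": THIRD, "T211": 1, "T311": 1, "S221": 1, "G321": 2,
    "S331": 1, "P222": THIRD, "T322": 1, "S332": 1, "P333": THIRD,
}

# (before, after) pairs: assumed data dependencies between the tasks.
DEPS = [
    ("P111", "T211"), ("P111", "T311"),
    ("T211", "S221"), ("T211", "G321"), ("T311", "G321"), ("T311", "S331"),
    ("S221", "P222"), ("P222", "T322"), ("G321", "T322"),
    ("T322", "S332"), ("S331", "S332"), ("S332", "P333"),
]


def to_task_edge(tasks, deps):
    """Convert a task-node DAG into a task-edge DAG.

    Each task becomes one edge (name, i, j, cost) between two new event
    nodes; each dependency becomes a NULL edge of cost 0 from the end
    event of the earlier task to the start event of the later one.
    """
    start, end, edges = {}, {}, []
    for n, (name, cost) in enumerate(tasks.items()):
        start[name], end[name] = 2 * n, 2 * n + 1
        edges.append((name, start[name], end[name], cost))
    for before, after in deps:
        edges.append(("NULL", end[before], start[after], 0))
    return edges


edges = to_task_edge(TASKS, DEPS)
for edge in edges:
    print(edge)
```

Merging event nodes that are joined only by NULL edges would shrink this graph toward the compact form shown on the slide; that reduction step is left out here to keep the sketch short.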

Application to dense linear algebra algorithms

Application of CPM to the task-edge DAG of the Cholesky factorization of a matrix consisting of 3 × 3 blocks

Task   i-j   C_ij   ES_i   LF_j   S_ij
P111   0-1   0.33   0      0.33   0
T211   1-8   1      0.33   1.33   0
T311   1-2   1      0.33   1.33   0
NULL   2-3   0      1.33   1.33   0
S221   8-9   1      1.33   3      0.67
G321   3-4   2      1.33   3.33   0
S331   2-5   1      1.33   4.33   2
P222   9-4   0.33   2.33   3.33   0.67
T322   4-5   1      3.33   4.33   0
S332   5-6   1      4.33   5.33   0
P333   6-7   0.33   5.33   5.67   0
NULL   8-3   0      1.33   1.33   0

Critical path:

[Figure: task-edge DAG on event nodes 0 to 9 with the critical path highlighted; it consists of the edges with $S_{ij} = 0$ in the table.]

Objective: tune the slack of those tasks with $S_{ij} > 0$, reducing their execution frequency and yielding lower power usage → Slack Reduction Algorithm
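The table can be reproduced with a short script. This is a sketch under stated assumptions rather than the authors' implementation: the edge list is transcribed from the table, the recurrences are the standard CPM forward and backward passes, and the function names are illustrative. Exact fractions are used, so the slide's 0.33 and 0.67 appear as 1/3 and 2/3.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)

# (task, tail node i, head node j, cost C_ij), transcribed from the table.
EDGES = [
    ("P111", 0, 1, THIRD), ("T211", 1, 8, 1), ("T311", 1, 2, 1),
    ("NULL", 2, 3, 0), ("S221", 8, 9, 1), ("G321", 3, 4, 2),
    ("S331", 2, 5, 1), ("P222", 9, 4, THIRD), ("T322", 4, 5, 1),
    ("S332", 5, 6, 1), ("P333", 6, 7, THIRD), ("NULL", 8, 3, 0),
]


def topological_order(edges):
    """Kahn's algorithm over the event nodes of the edge list."""
    nodes = {n for _, i, j, _ in edges for n in (i, j)}
    indegree = {n: 0 for n in nodes}
    for _, _, j, _ in edges:
        indegree[j] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for _, i, j, _ in edges:
            if i == n:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    return order


def critical_path_method(edges):
    """Return (ES per node, LF per node, [(task, i, j, S_ij), ...])."""
    order = topological_order(edges)
    es = {n: Fraction(0) for n in order}
    for n in order:                      # forward pass: ES_j = max(ES_i + C_ij)
        for _, i, j, c in edges:
            if i == n:
                es[j] = max(es[j], es[i] + c)
    makespan = max(es.values())
    lf = {n: makespan for n in order}
    for n in reversed(order):            # backward pass: LF_i = min(LF_j - C_ij)
        for _, i, j, c in edges:
            if i == n:
                lf[i] = min(lf[i], lf[j] - c)
    slack = [(name, i, j, lf[j] - es[i] - c) for name, i, j, c in edges]
    return es, lf, slack


es, lf, slack = critical_path_method(EDGES)
for name, i, j, s in slack:
    print(f"{name} {i}-{j}  ES_i={float(es[i]):.2f}  LF_j={float(lf[j]):.2f}  S_ij={float(s):.2f}")
```

Only S221, P222 and S331 end up with positive slack, which makes them the candidates for running at a reduced frequency. Note that S221 and P222 lie on the same chain (8-9-4) and share its 0.67 u.t. of slack, so both cannot each consume the full amount.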
