DVFS-Control Techniques for Dense Linear Algebra Operations on - PowerPoint PPT Presentation

International Conference on Energy-Aware High Performance Computing
DVFS-Control Techniques for Dense Linear Algebra Operations on Multi-Core Processors
Pedro Alonso¹, Manuel F. Dolz², Francisco D. Igual², Rafael Mayo², Enrique S. Quintana-Ortí²
September 07–09, 2011, Hamburg (Germany)

Motivation
- High performance computing: optimization of the algorithms applied to solve complex problems.
- Technological advances improve performance: processors work at higher frequencies, and there are more cores per socket (processor).
- Large numbers of processors and cores ⇒ high energy consumption.
- Methods, algorithms and techniques to reduce energy consumption in high performance computing, e.g., reducing the frequency of processors with DVFS techniques.
Pedro Alonso et al., DVFS for Dense Linear Algebra Operations on Multi-Core Processors

Outline
1. Introduction
2. Dense linear algebra operations
3. Slack Reduction Algorithm: introduction, application, previous steps, slack reduction
4. Race-to-Idle Algorithm
5. Experimental results: simulator, benchmark algorithms, environment setup, results
6. Conclusions

Introduction
- Scheduling the tasks of dense linear algebra algorithms, e.g., the Cholesky, QR and LU factorizations.
- Energy-saving tools available on multi-core processors, e.g., Dynamic Voltage and Frequency Scaling (DVFS).
- Task scheduling + DVFS ⇒ power-aware scheduling on multi-core processors.
Our strategies:
- Reduce the frequency of the cores that will execute non-critical tasks, to decrease idle times without sacrificing the total performance of the algorithm (Slack Reduction Algorithm).
- Execute all tasks at the highest frequency to "enjoy" longer inactive periods (Race-to-Idle Algorithm).
⇒ Energy savings.

Dense linear algebra operations
LU factorization: factor $A = LU$, where $L \in \mathbb{R}^{n \times n}$ is unit lower triangular and $U \in \mathbb{R}^{n \times n}$ is upper triangular.
Two algorithms for the LU factorization:
- LU with partial (row) pivoting (traditional version).
- LU with incremental pivoting: T. Joffrain, E. S. Quintana, R. van de Geijn, “Rapid development of high-performance out-of-core solvers for electromagnetics”, State-of-the-Art in Scientific Computing (PARA 2004), Copenhagen (Denmark), June 2004. Later called “tile LU factorization” or “communication-avoiding LU factorization with flat tree”.
We consider a partitioning of matrix A into blocks of size b × b, i.e., s × s blocks with n = s·b.

Dense linear algebra operations: LU factorization with partial (row) pivoting
for k = 1 : s do
  LU factorization, $(s-k+\tfrac{2}{3})\,b^3$ flops: $A_{k:s,k} = L_{k:s,k} \cdot U_{kk}$
  for j = k + 1 : s do
    Triangular solve, $b^3$ flops: $A_{kj} \leftarrow L_{kk}^{-1} \cdot A_{kj}$
    Matrix-matrix product, $2(s-k)\,b^3$ flops: $A_{k+1:s,j} \leftarrow A_{k+1:s,j} - A_{k+1:s,k} \cdot A_{kj}$
  end for
end for
DAG with a matrix consisting of 3 × 3 blocks (figure not reproduced): tasks G11, T21, T31, M21, M31, G22, T32, M32, G33.
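A minimal NumPy sketch of this loop, assuming n is a multiple of b; the function name is hypothetical. The slide leaves the row interchanges implicit, so here each pivot swap is applied to the whole row during the panel factorization (as LAPACK-style codes do), which makes the result satisfy A[perm] = L·U:

```python
import numpy as np


def blocked_lu_partial_pivoting(A, b):
    """Right-looking blocked LU with partial pivoting (hypothetical helper).

    Returns (perm, LU): L is the unit lower part of LU, U its upper part,
    and A[perm] == L @ U up to round-off.
    """
    A = np.array(A, dtype=float)  # work on a copy
    n = A.shape[0]
    assert A.shape == (n, n) and n % b == 0, "n must be a multiple of b"
    perm = np.arange(n)
    for k in range(0, n, b):
        kb = k + b
        # Panel LU factorization: A[k:, k:kb] = L[k:, k:kb] U[k:kb, k:kb].
        for c in range(k, kb):
            p = c + int(np.argmax(np.abs(A[c:, c])))  # pivot row
            if p != c:
                A[[c, p], :] = A[[p, c], :]  # swap full rows (left and right blocks too)
                perm[[c, p]] = perm[[p, c]]
            A[c + 1:, c] /= A[c, c]
            A[c + 1:, c + 1:kb] -= np.outer(A[c + 1:, c], A[c, c + 1:kb])
        if kb < n:
            # Triangular solve: A_kj <- L_kk^{-1} A_kj for all j > k at once.
            L_kk = np.tril(A[k:kb, k:kb], -1) + np.eye(b)
            A[k:kb, kb:] = np.linalg.solve(L_kk, A[k:kb, kb:])
            # Matrix-matrix product: A_{k+1:s,j} -= A_{k+1:s,k} A_kj.
            A[kb:, kb:] -= A[kb:, k:kb] @ A[k:kb, kb:]
    return perm, A


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
perm, LU = blocked_lu_partial_pivoting(A, b=2)
L, U = np.tril(LU, -1) + np.eye(8), np.triu(LU)
print(np.allclose(A[perm], L @ U))
```

For clarity the triangular solve uses `np.linalg.solve`; `scipy.linalg.solve_triangular(..., lower=True, unit_diagonal=True)` would exploit the triangular structure of $L_{kk}$.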

Dense linear algebra operations: LU factorization with incremental pivoting
for k = 1 : s do
  LU factorization, $\tfrac{2}{3}b^3$ flops: $A_{kk} = L_{kk} \cdot U_{kk}$
  for j = k + 1 : s do
    Triangular solve, $b^3$ flops: $A_{kj} \leftarrow L_{kk}^{-1} \cdot A_{kj}$
  end for
  for i = k + 1 : s do
    2 × 1 LU factorization, $b^3$ flops: $\begin{bmatrix} A_{kk} \\ A_{ik} \end{bmatrix} = \begin{bmatrix} L_{kk} \\ L_{ik} \end{bmatrix} \cdot U_{ik}$
    for j = k + 1 : s do
      2 × 1 triangular solve, $2b^3$ flops: $\begin{bmatrix} A_{kj} \\ A_{ij} \end{bmatrix} \leftarrow \begin{bmatrix} L_{kk} & 0 \\ L_{ik} & I \end{bmatrix}^{-1} \cdot \begin{bmatrix} A_{kj} \\ A_{ij} \end{bmatrix}$
    end for
  end for
end for
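The following NumPy sketch runs this tile loop on a linear system A x = rhs. Assumptions not stated on the slide: each LU factorization keeps its row permutation, which is applied (together with the inverse of its L factor) to the blocks it updates; the transformations are applied to a right-hand side instead of storing the L factors; and the 2 × 1 LU uses ordinary partial pivoting over the 2b rows, without exploiting the upper triangular structure of the top block. Function names are hypothetical.

```python
import numpy as np


def lu_pp(M):
    """Unblocked LU with partial pivoting of an m x n block (m >= n): M[perm] = L @ U."""
    M = np.array(M, dtype=float)
    m, n = M.shape
    perm = np.arange(m)
    for c in range(n):
        p = c + int(np.argmax(np.abs(M[c:, c])))  # pivot row
        M[[c, p]] = M[[p, c]]
        perm[[c, p]] = perm[[p, c]]
        M[c + 1:, c] /= M[c, c]
        M[c + 1:, c + 1:] -= np.outer(M[c + 1:, c], M[c, c + 1:])
    L = np.tril(M, -1) + np.eye(m, n)  # unit lower trapezoidal, m x n
    return perm, L, np.triu(M[:n, :n])


def tile_lu_incremental_solve(A, rhs, b):
    """Solve A x = rhs with LU + incremental pivoting on b x b tiles (sketch)."""
    A = np.array(A, dtype=float)
    x = np.array(rhs, dtype=float)
    s = A.shape[0] // b
    assert A.shape == (s * b, s * b), "n must be a multiple of b"

    def blk(i):
        return slice(i * b, (i + 1) * b)

    for k in range(s):
        Kb = blk(k)
        # LU factorization of A_kk (pivoting inside the tile).
        perm, L, U = lu_pp(A[Kb, Kb])
        A[Kb, Kb] = U
        x[Kb] = np.linalg.solve(L, x[Kb][perm])
        for j in range(k + 1, s):  # triangular solve: A_kj <- L_kk^{-1} P A_kj
            A[Kb, blk(j)] = np.linalg.solve(L, A[Kb, blk(j)][perm])
        for i in range(k + 1, s):
            Ib = blk(i)
            # 2x1 LU of [U_kk; A_ik] = [L_kk; L_ik] U_ik, pivoting across both tiles.
            perm2, L2, U2 = lu_pp(np.vstack([A[Kb, Kb], A[Ib, Kb]]))
            A[Kb, Kb], A[Ib, Kb] = U2, 0.0
            # T = [[L_kk, 0], [L_ik, I]], a 2b x 2b lower triangular matrix.
            T = np.hstack([L2, np.vstack([np.zeros((b, b)), np.eye(b)])])
            for j in range(k + 1, s):  # 2x1 triangular solve on [A_kj; A_ij]
                Jb = blk(j)
                upd = np.linalg.solve(T, np.vstack([A[Kb, Jb], A[Ib, Jb]])[perm2])
                A[Kb, Jb], A[Ib, Jb] = upd[:b], upd[b:]
            upd = np.linalg.solve(T, np.concatenate([x[Kb], x[Ib]])[perm2])
            x[Kb], x[Ib] = upd[:b], upd[b:]
    # A now holds an upper triangular matrix: finish with back substitution.
    return np.linalg.solve(np.triu(A), x)


rng = np.random.default_rng(1)
A = rng.standard_normal((12, 12))
rhs = rng.standard_normal(12)
x = tile_lu_incremental_solve(A, rhs, b=4)
print(np.allclose(A @ x, rhs))
```

Every task is an invertible transformation of one or two block rows, so the loop ends with an upper triangular system; comparing against `np.linalg.solve(A, rhs)` is a quick correctness check.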

Dense linear algebra operations: LU factorization with incremental pivoting
DAG with a matrix consisting of 3 × 3 blocks (figure not reproduced). Task execution times in ms: G_111 (3.311), T_121 (4.273), T_131 (4.273), G2_211 (5.246), G2_311 (5.246), T2_221 (7.372), T2_231 (7.372), T2_321 (7.372), T2_331 (7.372), G_222 (3.311), T_232 (4.273), G2_322 (5.246), T2_332 (7.372), G_333 (3.311).
The nodes contain the execution times of the tasks (in milliseconds, ms) for a block size b = 256 on a single core of an AMD Opteron 6128 running at 2.00 GHz. We will use this information to illustrate the power-saving approach of the Slack Reduction Algorithm (SRA).

Slack Reduction Algorithm: Introduction
Idea: obtain the dependency graph corresponding to the computation of a dense linear algebra algorithm, apply the Critical Path Method (CPM) to analyze its slacks, and reduce them with our Slack Reduction Algorithm.
The Critical Path Method:
- DAG of dependencies: nodes ⇒ tasks; edges ⇒ dependencies.
- Times: earliest and latest times to start and finish the execution of each task T_i with cost C_i.
- Total slack: amount of time that a task can be delayed without increasing the total execution time of the algorithm.
- Critical path: a succession of tasks, from the initial to the final node of the graph, with total slack = 0.
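A minimal Python sketch of the two CPM passes (forward for the earliest start times ES, backward for the latest finish times LF, then total slack S = LF − ES − C), applied to the 3 × 3 incremental-pivoting example with the task costs measured on the Opteron 6128. Assumption: the DAG figure did not survive extraction, so the edge list below is hypothetical; it was chosen so that the computed ES, LF and S values agree, up to rounding, with the CPM table of the application slide.

```python
from collections import defaultdict, deque

# Task costs C_i in ms at 2.00 GHz (b = 256), by task type:
# G = LU, T = triangular solve, G2 = 2x1 LU, T2 = 2x1 triangular solve.
COST = {"G": 3.311, "T": 4.273, "G2": 5.246, "T2": 7.372}

# Hypothetical edge list, consistent with the tabulated ES/LF/S values.
EDGES = [
    ("G_111", "T_121"), ("G_111", "T_131"), ("G_111", "G2_211"), ("G_111", "G2_311"),
    ("T_121", "T2_221"), ("T_121", "T2_321"), ("T_131", "T2_231"), ("T_131", "T2_331"),
    ("G2_211", "T2_221"), ("G2_211", "T2_231"), ("G2_311", "T2_321"), ("G2_311", "T2_331"),
    ("T2_221", "G_222"), ("G_222", "T_232"), ("T2_231", "T_232"), ("G_222", "G2_322"),
    ("T2_321", "G2_322"), ("T_232", "T2_332"), ("G2_322", "T2_332"), ("T2_331", "T2_332"),
    ("T2_332", "G_333"),
]


def cost(task):
    return COST[task.split("_")[0]]  # e.g. "T2_221" -> type "T2"


def critical_path_method(edges, cost_of):
    """Return earliest start ES, latest finish LF, total slack S and makespan."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes |= {u, v}
    # Topological order (Kahn's algorithm).
    indeg = {t: len(pred[t]) for t in nodes}
    queue = deque(sorted(t for t in nodes if indeg[t] == 0))
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for v in succ[t]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    es = {}
    for t in order:  # forward pass
        es[t] = max((es[p] + cost_of(p) for p in pred[t]), default=0.0)
    makespan = max(es[t] + cost_of(t) for t in nodes)
    lf = {}
    for t in reversed(order):  # backward pass
        lf[t] = min((lf[v] - cost_of(v) for v in succ[t]), default=makespan)
    # Slack is never negative in exact arithmetic; clamp tiny round-off.
    slack = {t: max(0.0, lf[t] - es[t] - cost_of(t)) for t in nodes}
    return es, lf, slack, makespan


es, lf, slack, makespan = critical_path_method(EDGES, cost)
for t in sorted(es, key=lambda t: (es[t], t)):
    print(f"{t:7s} C={cost(t):.3f} ES={es[t]:.3f} LF={lf[t]:.3f} S={slack[t]:.3f}")
print("critical path:", [t for t in sorted(es, key=es.get) if slack[t] < 1e-9])
```

With these costs the makespan is about 35.17 ms, and the zero-slack tasks form the critical path G_111 → G2_211 → T2_221 → G_222 → G2_322 → T2_332 → G_333.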

Application to dense linear algebra algorithms
Application of CPM to the DAG of the LU factorization with incremental pivoting of a matrix consisting of 3 × 3 blocks (C: cost, ES: earliest start, LF: latest finish, S: total slack; all times in ms):

Task     C      ES      LF      S
G_111    3.311  0.000   3.311   0
T_121    4.273  3.311   8.558   0.973
G2_211   5.246  3.311   8.558   0
G2_311   5.246  3.311   11.869  3.311
T_131    4.273  3.311   12.842  5.257
T2_321   7.372  8.558   19.241  3.311
G2_322   5.246  19.241  24.488  0
T2_332   7.373  24.488  31.861  0
G_333    3.311  31.861  35.171  0
T2_331   7.372  8.558   24.488  8.558
T2_221   7.372  8.558   15.930  0
G_222    3.311  15.930  19.241  0
T_232    4.273  19.241  24.488  0.973
T2_231   7.372  8.558   20.214  4.284

Objective: tune the slack of the tasks with S > 0 by reducing their execution frequency, thus lowering power usage → Slack Reduction Algorithm.
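The objective can be illustrated with a simplified greedy variant; this is not the authors' Slack Reduction Algorithm, whose "previous steps" and "slack reduction" phases are not detailed in this extract. Assumptions: execution time scales as C · f_max / f; the frequency set {0.8, 1.0, 1.2, 1.5, 2.0} GHz is hypothetical; and the DAG edges are a hypothetical reconstruction consistent with the table above. Tasks with S > 0 are visited in decreasing order of slack, and each one gets the lowest frequency that does not increase the makespan. Because total slack is shared by the tasks on a path, the makespan is recomputed after every change instead of spending each task's S independently.

```python
from collections import defaultdict, deque

F_MAX = 2.0                         # GHz, frequency of the measurements
FREQS = [0.8, 1.0, 1.2, 1.5, 2.0]   # hypothetical DVFS levels, lowest first
COST = {"G": 3.311, "T": 4.273, "G2": 5.246, "T2": 7.372}  # ms at F_MAX
# Total slack S (ms) of the non-critical tasks, from the CPM table.
SLACK = {"T_121": 0.973, "G2_311": 3.311, "T_131": 5.257, "T2_321": 3.311,
         "T2_331": 8.558, "T_232": 0.973, "T2_231": 4.284}
# Hypothetical edge list (the DAG figure is not available).
EDGES = [
    ("G_111", "T_121"), ("G_111", "T_131"), ("G_111", "G2_211"), ("G_111", "G2_311"),
    ("T_121", "T2_221"), ("T_121", "T2_321"), ("T_131", "T2_231"), ("T_131", "T2_331"),
    ("G2_211", "T2_221"), ("G2_211", "T2_231"), ("G2_311", "T2_321"), ("G2_311", "T2_331"),
    ("T2_221", "G_222"), ("G_222", "T_232"), ("T2_231", "T_232"), ("G_222", "G2_322"),
    ("T2_321", "G2_322"), ("T_232", "T2_332"), ("G2_322", "T2_332"), ("T2_331", "T2_332"),
    ("T2_332", "G_333"),
]


def makespan(edges, duration):
    """Forward CPM pass: length of the longest path for the given task durations."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes |= {u, v}
    es = dict.fromkeys(nodes, 0.0)
    ready = deque(t for t in nodes if indeg[t] == 0)
    end = 0.0
    while ready:
        t = ready.popleft()
        finish = es[t] + duration[t]
        end = max(end, finish)
        for v in succ[t]:
            es[v] = max(es[v], finish)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return end


def greedy_slack_reduction(edges, base, slack):
    """Lower the frequency of tasks with S > 0, one at a time, keeping the makespan."""
    freq = dict.fromkeys(base, F_MAX)
    duration = dict(base)
    target = makespan(edges, duration)
    for t in sorted(slack, key=slack.get, reverse=True):
        for f in FREQS:
            trial = dict(duration)
            trial[t] = base[t] * F_MAX / f  # assumed: time scales as 1/f
            if makespan(edges, trial) <= target + 1e-9:
                freq[t], duration = f, trial
                break
    return freq


base = {t: COST[t.split("_")[0]] for edge in EDGES for t in edge}
freq = greedy_slack_reduction(EDGES, base, SLACK)
print({t: f for t, f in freq.items() if f < F_MAX})
```

Under these assumptions the sketch lowers T2_331 to 1.0 GHz and T_131, T2_231 and T2_321 to 1.5 GHz. G2_311 stays at 2.0 GHz because the slack it shares with T2_331 has already been consumed, and T_121 and T_232 stay at 2.0 GHz because 0.973 ms of slack is less than the roughly 1.42 ms that 1.5 GHz would add.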
