Binding Performance and Power of Dense Linear Algebra Operations - PowerPoint PPT Presentation

10th IEEE International Symposium on Parallel and Distributed Processing with Applications
Binding Performance and Power of Dense Linear Algebra Operations
Maria Barreda, Manuel F. Dolz, Rafael Mayo, Enrique S. Quintana-Ortí, Ruymán Reyes
July 11th, 2012, Leganés – Madrid (Spain)

Introduction · Tools for performance and power tracing · Experimental results · Conclusions

Motivation
High performance computing: optimization of algorithms applied to solve complex problems.
Technological advance ⇒ improved performance: higher number of cores per socket (processor).
Large number of processors and cores ⇒ high energy consumption.
Tools are needed to analyze performance and power in order to detect code inefficiencies and reduce energy consumption.

Manuel F. Dolz et al., Binding Performance and Power of Dense Linear Algebra Operations

Outline
1. Introduction
2. Tools for performance and power tracing
  Performance tracing framework
  Power tracing framework
  Example
3. Experimental results
  Environment setup
  LU factorization
  Cholesky factorization
  Reduction to tridiagonal form
  Results
4. Conclusions

Introduction
Parallel scientific applications: examples from dense linear algebra are the Cholesky, QR and LU factorizations.
Tools for power and energy analysis: power profiling in combination with the Extrae+Paraver tools.
Parallel applications + power profiling ⇒ environment to identify sources of power inefficiency ⇒ energy savings.

Tools for performance and power tracing
Why traces?
  Details and variability are important (along time, across processors, etc.).
  Traces are extremely useful to analyze the performance of applications, also at the power level!
[Figure: instrumentation workflow. An MPI/multi-threaded scientific application (app.c) is annotated with calls to the pm API (pm_start(), pm_stop(), ...) and the Extrae API (Extrae_init(), Extrae_fini(), ...), giving the application with annotated code (app'.c). The compiler and linker combine it with the pm library, the Extrae library and other libraries (computational, communication, ...) to produce the executable code (app.x).]

Tracing framework
Extrae: instrumentation and measurement package from BSC (Barcelona Supercomputing Center):
  Intercepts calls to MPI, OpenMP and PThreads.
  Records relevant information: time-stamped events, hardware counter values, etc.
  Dumps all information into a single trace file.
Paraver: graphical interface tool from BSC to analyze and visualize trace files:
  Inspection of parallelism and scalability.
  High number of metrics to characterize the program and the application performance.
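To make the notion of time-stamped events concrete, the following is a minimal sketch of an in-memory event log. It is not Extrae's API or trace format: the names trace_event, trace_record_at, trace_record and trace_time_in are hypothetical, and the convention that a nonzero value enters a region and a zero value leaves it is an assumption made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical in-memory trace buffer; not Extrae's data structures or API. */
#define MAX_EVENTS 1024

typedef struct {
    double t;     /* wall-clock time stamp, in seconds */
    int    type;  /* user-chosen region id, e.g. 1 = dgemm */
    int    value; /* nonzero = region entered, 0 = region left (assumed) */
} trace_event;

static trace_event trace_buf[MAX_EVENTS];
static size_t trace_len = 0;

/* Current wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Append one event with an explicit time stamp.
 * Returns 0 on success, -1 if the buffer is full. */
int trace_record_at(double t, int type, int value)
{
    assert(type >= 0); /* region ids are non-negative in this sketch */
    if (trace_len >= MAX_EVENTS)
        return -1;
    trace_buf[trace_len].t = t;
    trace_buf[trace_len].type = type;
    trace_buf[trace_len].value = value;
    trace_len++;
    return 0;
}

/* Append one event stamped with the current time. */
int trace_record(int type, int value)
{
    return trace_record_at(now_seconds(), type, value);
}

/* Total time spent inside regions of the given type,
 * summed over matching enter/leave pairs. */
double trace_time_in(int type)
{
    double total = 0.0, entered = 0.0;
    int inside = 0;
    for (size_t i = 0; i < trace_len; i++) {
        if (trace_buf[i].type != type)
            continue;
        if (trace_buf[i].value != 0 && !inside) {
            entered = trace_buf[i].t;
            inside = 1;
        } else if (trace_buf[i].value == 0 && inside) {
            total += trace_buf[i].t - entered;
            inside = 0;
        }
    }
    return total;
}
```

In an annotated code, trace_record(1, 1) and trace_record(1, 0) would bracket a kernel call; a real tracing package additionally records hardware counters and writes the events to a trace file for later visualization.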

Power measurement framework
pmlib library:
  Power measurement package of Jaume I University (Spain).
  Interface to interact with and use our own and commercial power meters.
[Figure: power tracing setup. Labels: application node, computer, power supply unit, mainboard, external powermeter, internal powermeter, power tracing server, power tracing daemon; links: USB, RS232, Ethernet.]
Server daemon: collects data from the power meters and sends it to clients.
Client library: enables communication with the server and synchronizes with start-stop primitives.
Power meter:
  ASIC-based power meter (own design!).
  LEM HXS 20-NP transducers with a PIC microcontroller.
  Sampling rate: 25 Hz.
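At a fixed sampling rate such as 25 Hz, a power trace is a sequence of equally spaced power readings, so the energy of a traced region can be estimated by integrating them over time. The sketch below assumes the samples are in watts and applies the trapezoidal rule; the function names energy_joules and average_power_w are hypothetical and are not part of pmlib.

```c
#include <assert.h>
#include <stddef.h>

/* Energy in joules from n equally spaced power samples (watts) taken at
 * rate_hz samples per second, using the trapezoidal rule.
 * Fewer than two samples span no time interval, so the result is 0. */
double energy_joules(const double *power_w, size_t n, double rate_hz)
{
    assert(rate_hz > 0.0);
    if (n < 2)
        return 0.0;
    double dt = 1.0 / rate_hz; /* seconds between consecutive samples */
    double sum = 0.0;
    for (size_t i = 1; i < n; i++)
        sum += 0.5 * (power_w[i - 1] + power_w[i]) * dt;
    return sum;
}

/* Average power in watts over the sampled interval. */
double average_power_w(const double *power_w, size_t n, double rate_hz)
{
    assert(rate_hz > 0.0);
    if (n < 2)
        return n == 1 ? power_w[0] : 0.0;
    double duration = (double)(n - 1) / rate_hz; /* seconds covered */
    return energy_joules(power_w, n, rate_hz) / duration;
}
```

For example, 26 samples of a constant 100 W taken at 25 Hz cover one second and integrate to about 100 J.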

Scientific application
LU factorization with partial pivoting: $PA = LU$, where
  $A \in \mathbb{R}^{n \times n}$ is a nonsingular matrix,
  $P \in \mathbb{R}^{n \times n}$ is a permutation matrix,
  $L, U \in \mathbb{R}^{n \times n}$ are unit lower triangular and upper triangular matrices, respectively.
Consider a partitioning of matrix $A$ into blocks of size $b \times b$.
For numerical stability, permutations are introduced to prevent operating with small pivot elements.
Example of performance and power tracing with the LU factorization:
  LAPACK routine dgetrf.
  Shared-memory parallelism is extracted by calling the multi-threaded implementations of the dgetf2, dlaswp, dtrsm and dgemm kernels from Intel MKL, AMD ACML or IBM ESSL.
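As a small illustration of $PA = LU$ with partial pivoting, the sketch below factors a column-major matrix in place, one column at a time. It is an unblocked reference variant written for this example, not the LAPACK or vendor implementation; the name lu_partial_pivot, the IDX macro and the 0-based pivot indices (LAPACK uses 1-based ones) are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major index of entry (i, j), 0-based, with leading dimension lda. */
#define IDX(i, j, lda) ((size_t)(j) * (size_t)(lda) + (size_t)(i))

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* In-place LU factorization with partial pivoting of the n x n matrix A.
 * On exit the strictly lower part of A holds L (its unit diagonal is
 * implicit) and the upper part holds U; ipiv[k] is the row exchanged with
 * row k at step k. Returns 0 on success, or k + 1 if the pivot found at
 * step k is exactly zero (singular matrix). */
int lu_partial_pivot(int n, double *A, int lda, int *ipiv)
{
    assert(n >= 0 && lda >= n);
    for (int k = 0; k < n; k++) {
        /* Pick the entry of largest magnitude in column k as the pivot. */
        int p = k;
        double maxabs = abs_d(A[IDX(k, k, lda)]);
        for (int i = k + 1; i < n; i++) {
            double v = abs_d(A[IDX(i, k, lda)]);
            if (v > maxabs) {
                maxabs = v;
                p = i;
            }
        }
        ipiv[k] = p;
        if (maxabs == 0.0)
            return k + 1;

        /* Swap rows k and p across all columns. */
        if (p != k) {
            for (int j = 0; j < n; j++) {
                double tmp = A[IDX(k, j, lda)];
                A[IDX(k, j, lda)] = A[IDX(p, j, lda)];
                A[IDX(p, j, lda)] = tmp;
            }
        }

        /* Compute the multipliers (column k of L). */
        for (int i = k + 1; i < n; i++)
            A[IDX(i, k, lda)] /= A[IDX(k, k, lda)];

        /* Rank-1 update of the trailing submatrix. */
        for (int j = k + 1; j < n; j++)
            for (int i = k + 1; i < n; i++)
                A[IDX(i, j, lda)] -= A[IDX(i, k, lda)] * A[IDX(k, j, lda)];
    }
    return 0;
}
```

The blocked algorithm performs the same steps on panels of b columns at a time, so that most of the work lands in the matrix-matrix kernels.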

Code annotation
LU factorization using LAPACK code:

    #define Aref(i, j) A[((j) - 1) * Alda + ((i) - 1)]

    void dgetrf(int m, int n, int b, double *A, int Alda, int *ipiv, int *info)
    {
        // Declaration of variables (omitted)
        for (j = 1; j <= min(m, n); j += b) {
            // Factor current panel
            dgetf2(m - j + 1, b, &Aref(j, j), Alda, &ipiv[j - 1], info);
            // Apply permutations to left and right of panel
            dlaswp(j - 1, A, Alda, j, j + b - 1, ipiv, 1);
            dlaswp(n - j - b + 1, &Aref(1, j + b), Alda, j, j + b - 1, ipiv, 1);
            // Triangular solve
            dtrsm("L", "L", "N", "U", b, n - j - b + 1, done,
                  &Aref(j, j), Alda, &Aref(j, j + b), Alda);
            // Update trailing submatrix: A22 := A22 - A21 * A12 (alpha = -1, beta = 1)
            dgemm("N", "N", m - j - b + 1, n - j - b + 1, b, -done,
                  &Aref(j + b, j), Alda, &Aref(j, j + b), Alda,
                  done, &Aref(j + b, j + b), Alda);
        }
    }

Code annotation
LU factorization using LAPACK code (Extrae routines):

    #define Aref(i, j) A[((j) - 1) * Alda + ((i) - 1)]

    void dgetrf(int m, int n, int b, double *A, int Alda, int *ipiv, int *info)
    {
        // Declaration of variables (omitted)
        Extrae_init();
        for (j = 1; j <= min(m, n); j += b) {
            // Factor current panel
            dgetf2(m - j + 1, b, &Aref(j, j), Alda, &ipiv[j - 1], info);
            // Apply permutations to left and right of panel
            dlaswp(j - 1, A, Alda, j, j + b - 1, ipiv, 1);
            dlaswp(n - j - b + 1, &Aref(1, j + b), Alda, j, j + b - 1, ipiv, 1);
            // Triangular solve
            dtrsm("L", "L", "N", "U", b, n - j - b + 1, done,
                  &Aref(j, j), Alda, &Aref(j, j + b), Alda);
            // Update trailing submatrix: A22 := A22 - A21 * A12 (alpha = -1, beta = 1)
            dgemm("N", "N", m - j - b + 1, n - j - b + 1, b, -done,
                  &Aref(j + b, j), Alda, &Aref(j, j + b), Alda,
                  done, &Aref(j + b, j + b), Alda);
        }
        Extrae_fini();
    }
