

SLIDE 1

VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING

Insightful Automatic Performance Modeling

Alexandru Calotoiu¹, Torsten Hoefler², Martin Schulz³, Sergei Shudler¹ and Felix Wolf¹

¹TU Darmstadt, ²ETH Zürich, ³Lawrence Livermore National Laboratory

SLIDE 2

Sponsors

INSIGHTFUL AUTOMATIC PERFORMANCE MODELING TUTORIAL 2

SLIDE 3

Virtual Institute – High Productivity Supercomputing

Association of HPC programming tool builders

Mission:

  • Development of portable programming tools that assist programmers in diagnosing programming errors and optimizing the performance of their applications
  • Integration of these tools
  • Organization of training events designed to teach the application of these tools
  • Organization of academic workshops to facilitate the exchange of ideas on tool development and to promote young scientists

www.vi-hps.org

SLIDE 4

Motivation – latent scalability bugs

[Chart: execution time vs. system size]

SLIDE 5

Learning objectives

  • Performance modeling background
  • Automatic performance modeling with Extra-P
    • How it works
    • When it doesn’t work
  • Practical experiences with
    • Prepared examples
    • Your own data

SLIDE 6

Talk structure

  • Introduction
    • Background
    • Automatic performance modeling
  • Theory
    • Performance Model Normal Form (PMNF)
    • Assumptions & limitations
  • Practice
    • Workflow
    • Model refinement
  • Examples
    • Case studies
    • Discussion

SLIDE 7

Introduction

SLIDE 8

Outline

  • Performance analysis methods
  • Analytical performance modeling
  • Automatic performance modeling
  • Scalability validation framework

SLIDE 9

Spectrum of performance analysis methods

[Chart: spectrum from benchmark to full simulation, model simulation, and model, trading the number of parameters against the model error]

SLIDE 10

Scaling model

[Chart: time vs. processes (2⁹–2¹³), with fitted model 3·10⁻⁴·p² + c]

  • Represents a performance metric as a function of the number of processes
  • Provides insight into the program behavior at scale

SLIDE 11

Pitfalls

Intuition is not enough

[Chart: competing fits 2.95·log₂(p) + 0.0871·p and 12.06·p]

SLIDE 12

Analytical performance modeling

Identify kernels
  • Parts of the program that dominate its performance at larger scales
  • Identified via small-scale tests and intuition

Create models
  • Laborious process
  • Still confined to a small community of skilled experts

Disadvantages:
  • Time consuming
  • Danger of overlooking unscalable code

Examples:
  • Hoisie et al.: Performance and scalability analysis of teraflop-scale parallel architectures using multi-dimensional wavefront applications. International Journal of High Performance Computing Applications, 2000
  • Bauer et al.: Analysis of the MILC Lattice QCD Application su3_rmd. CCGrid, 2012

SLIDE 13

Automatic performance modeling with Extra-P

[Diagram: an instrumented program (main, foo, bar, compute) yields performance measurements Mᵢ, Mⱼ as input to Extra-P; the output is human-readable performance models of all functions (e.g., t = c1*log(p) + c2)]

SLIDE 14

Automatic performance modeling with Extra-P

[Diagram: profiles measured at p₁ = 128, p₂ = 256, p₃ = 512, p₄ = 1,024, p₅ = 2,048, p₆ = 4,096 are input to Extra-P, which models all functions and ranks them (1. foo, 2. compute, 3. main, 4. bar, …) by target scale pₜ or asymptotic behavior]

SLIDE 15

Requirements modeling

[Diagram: program requirements split into computation (FLOPS, load, store) and communication (P2P, collective, …), each modeled alongside time; disagreement between requirement and time models may be indicative of wait states]

SLIDE 16

Algorithm engineering


Courtesy of Peter Sanders, KIT

SLIDE 17

How to validate scalability in practice?

Program: small textbook example vs. real application
Expectation: a verifiable analytical expression (#FLOPS = n²(2n − 1)) vs. asymptotic complexity (#FLOPS = O(n^2.8074))
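The textbook expression matches, for instance, a naïve n×n matrix multiplication (the concrete pairing with matrix multiplication is our illustration, not stated on the slide); a quick sketch counting every floating-point operation:

```python
def naive_matmul_flops(A, B):
    """Multiply two square matrices, counting every floating-point operation."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    flops = 0
    for i in range(n):
        for j in range(n):
            acc = A[i][0] * B[0][j]          # 1 multiplication
            flops += 1
            for k in range(1, n):
                acc += A[i][k] * B[k][j]     # 1 multiplication + 1 addition
                flops += 2
            C[i][j] = acc
    return C, flops

# Each output element costs 1 + 2(n-1) = 2n - 1 operations, n^2 elements total
n = 6
identity = [[float(i == j) for j in range(n)] for i in range(n)]
_, flops = naive_matmul_flops(identity, identity)
assert flops == n * n * (2 * n - 1)          # matches #FLOPS = n^2(2n - 1)
```

The asymptotic-complexity column (O(n^2.8074)) corresponds to algorithms whose operation count cannot be verified by such simple counting, which is exactly the validation problem the slide raises.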

SLIDE 18

Scalability evaluation framework

[Diagram: benchmark → performance measurements → model generation (over a generated search space) → scaling model; combined with an expectation (plus an optional deviation limit) this yields a divergence model, used for initial validation, comparing alternatives, and regression testing]

Shudler et al.: Exascaling Your Library: Will Your Implementation Meet Your Expectations? ICS, 2015

SLIDE 19

Theory

SLIDE 20

Outline

  • Goal – scaling trends
  • Model generation
  • Performance Model Normal Form (PMNF)
  • Statistical quality control & confidence intervals
  • Assumptions & limitations of the method

SLIDE 21

Automatic performance modeling

[Diagram: an instrumented program (main, foo, bar, compute) yields performance measurements Mᵢ, Mⱼ as input to Extra-P; the output is human-readable performance models of all functions (e.g., t = c1*log(p) + c2)]

SLIDE 22

Primary focus on scaling trend

[Chart: common performance analysis chart in a paper, with ranking 1. F2, 2. F1, 3. F3]

SLIDE 23

Primary focus on scaling trend

[Chart: the same ranking (1. F2, 2. F1, 3. F3) – common performance analysis chart in a paper vs. actual measurement in laboratory conditions]

SLIDE 24

Primary focus on scaling trend

[Chart: the same ranking (1. F2, 2. F1, 3. F3) – production reality]

SLIDE 25

Model building blocks

               Computation    Communication
Samplesort     t(p) ~ p²      t(p) ~ p²·log₂²(p)
Naïve N-body   t(p) ~ p       t(p) ~ p
FFT            t(p) ~ c       t(p) ~ log₂(p)
LU             t(p) ~ c       t(p) ~ c
…              …              …

SLIDE 26

Performance model normal form

f(p) = Σₖ₌₁ⁿ cₖ · p^(iₖ) · log₂^(jₖ)(p),   with n ∈ ℕ, iₖ ∈ I, jₖ ∈ J, I, J ⊂ ℚ

With n = 1, I = {0, 1, 2}, J = {0, 1}, the candidate models are:

c₁
c₁ · p
c₁ · p²
c₁ · log(p)
c₁ · p · log(p)
c₁ · p² · log(p)
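The single-term candidates above can be enumerated mechanically from I and J; a minimal sketch (the tuple representation of exponent pairs is our choice, not part of the PMNF definition):

```python
import math
from itertools import product

I = [0, 1, 2]   # exponents of p (the slide's example)
J = [0, 1]      # exponents of log2(p)

def term(i, j):
    """Return p^i * log2(p)^j as a callable (the coefficient c1 is omitted)."""
    return lambda p: p**i * math.log2(p)**j

# One candidate per (i, j) pair; (0, 0) is the constant model c1
candidates = {(i, j): term(i, j) for i, j in product(I, J)}

assert len(candidates) == 6                 # the six models listed above
assert candidates[(2, 1)](8) == 8**2 * 3    # p^2 * log2(p) at p = 8
```

During model generation, a coefficient c₁ is later fitted for each candidate and the best-fitting hypothesis is kept.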

SLIDE 27

Performance model normal form

f(p) = Σₖ₌₁ⁿ cₖ · p^(iₖ) · log₂^(jₖ)(p),   with n ∈ ℕ, iₖ ∈ I, jₖ ∈ J, I, J ⊂ ℚ

With I = {0, 1, 2}, J = {0, 1}, allowing n = 2 adds the two-term combinations to the six single-term candidates (c₁, c₁·p, c₁·p², c₁·log(p), c₁·p·log(p), c₁·p²·log(p)):

c₁ + c₂·p
c₁ + c₂·p²
c₁ + c₂·log(p)
c₁ + c₂·p·log(p)
c₁ + c₂·p²·log(p)
c₁·log(p) + c₂·p
c₁·log(p) + c₂·p·log(p)
c₁·log(p) + c₂·p²
c₁·log(p) + c₂·p²·log(p)
c₁·p + c₂·p·log(p)
c₁·p + c₂·p²
c₁·p + c₂·p²·log(p)
c₁·p·log(p) + c₂·p²
c₁·p·log(p) + c₂·p²·log(p)
c₁·p² + c₂·p²·log(p)

SLIDE 28

Weak vs. strong scaling

  • Wall-clock time is not necessarily monotonically increasing under strong scaling
    • Harder to capture the model automatically
  • Different invariants require different reductions across processes

              Weak scaling               Strong scaling
Invariant     Problem size per process   Overall problem size
Model target  Wall-clock time            Accumulated time
Reduction     Maximum / average          Sum

SLIDE 29

Statistical quality control

If the 95% confidence interval is too wide, the fit could be adversely influenced by noise, or overfitting might occur.

CI = f(mean, stddev). To improve the CI: increase repetitions, include different configurations.

[Chart: performance metric measured at x₁ … x₅ – unknown behavior of the kernel, data (with noise), confidence interval (t-test), and confidence band reflecting noise uncertainty]
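The CI = f(mean, stddev) idea can be sketched for a set of repeated measurements. This sketch uses the normal approximation to the t-interval (adequate for many repetitions; an exact t-quantile would be slightly wider for few runs), and the timing values are hypothetical:

```python
from statistics import mean, stdev, NormalDist

def confidence_interval(samples, level=0.95):
    """Approximate CI for the mean of repeated, noisy measurements.

    Uses z * s / sqrt(n) with a normal quantile; a t-quantile would be
    slightly wider for small n.
    """
    n = len(samples)
    m, s = mean(samples), stdev(samples)
    z = NormalDist().inv_cdf((1 + level) / 2)   # ~1.96 for 95%
    half = z * s / n**0.5
    return m - half, m + half

runs = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0]   # hypothetical timings [s]
lo, hi = confidence_interval(runs)
assert lo < mean(runs) < hi
```

More repetitions shrink the interval with 1/sqrt(n), which is why adding runs (or extra configurations) is the suggested remedy for noisy fits.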

SLIDE 30

Assumptions & limitations

  • Only one scaling behavior for all the measurements; no jumps
  • Some MPI collective operations switch their algorithm with scale – this results in inaccurate models
  • Example: the red model tries to model measurements of two different algorithms
    • First 4 points – one function
    • Last 4 points – another function (linear)
    • Adj. R² = 0.95085 (!)

SLIDE 31

Changing growth trends

Ranking according to growth rate is difficult: log₂(p) vs. p?

[Chart: two models whose relative order changes as p grows]

SLIDE 32

Changing growth trends (2)


SLIDE 33

Practice

SLIDE 34

Outline

  • Workflow
  • Performance measurements
  • Model refinement
  • Adjusted R²
  • Kernel ranking
  • Output representations

SLIDE 35

Workflow

[Diagram: performance measurements → performance profiles → model generation (with statistical quality control) → scaling models → performance extrapolation → ranking of kernels; if accuracy has not saturated, kernel refinement feeds back into model generation as a refinement loop]

SLIDE 36

Performance measurements

Different ways of collecting measurements:
  • Score-P (http://www.vi-hps.org/projects/score-p/)
  • Other profiling tools, e.g., HPCToolkit
  • Manual ad-hoc measurements

SLIDE 37

Performance measurements (2)

Our experience shows that at least 5 different measurements are required

[Diagram: performance measurements (profiles) at p₁ = 256, p₂ = 512, p₃ = 1024, p₄ = 2048, p₅ = 4096]

SLIDE 38

Performance measurements (3)

Our experience shows that at least 5 different measurements are required, and each measurement should be repeated multiple times

[Diagram: each configuration p₁ = 256 … p₅ = 4096 measured repeatedly – single runs can yield noisy results and too big a variance]

SLIDE 39

Model refinement

Iterative model refinement:

1. Start with n = 1 and R̄₀² = −∞
2. Generate all hypotheses of size n from the input data {(p₁,t₁), …, (p₆,t₆)}
3. Evaluate the hypotheses via cross-validation
4. Compute R̄ₙ² for the best hypothesis
5. If R̄ₙ₋₁² > R̄ₙ² or n = nₘₐₓ, stop and output the best hypothesis (e.g., c₁·log(p)) as the scaling model; otherwise increment n and repeat

Example search space: I = {0, 1, 2}, J = {0, 1}, nₘₐₓ = 2, with single-term candidates
c₁, c₁·p, c₁·p², c₁·log(p), c₁·p·log(p), c₁·p²·log(p)

R² = 1 − residualSumSquares / totalSumSquares
R̄² = 1 − (1 − R²) · (6 − 1) / (6 − n − 2)
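The single-term step of this loop can be sketched with ordinary least squares. The candidate set follows the slide; the noise-free synthetic data set (t = 3·10⁻⁴·p²) and the selection by plain R² (rather than the cross-validated adjusted R̄²) are simplifications of ours:

```python
import math

# Single-term PMNF candidates (i, j): t ~ c * p^i * log2(p)^j
CANDIDATES = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]

def fit_one_term(data, i, j):
    """Least-squares coefficient and R^2 for t ~ c * p^i * log2(p)^j."""
    xs = [p**i * math.log2(p)**j for p, _ in data]
    ts = [t for _, t in data]
    c = sum(x * t for x, t in zip(xs, ts)) / sum(x * x for x in xs)
    mean_t = sum(ts) / len(ts)
    rss = sum((t - c * x) ** 2 for x, t in zip(xs, ts))
    tss = sum((t - mean_t) ** 2 for t in ts)
    return c, 1 - rss / tss

# Synthetic measurements following t = 3e-4 * p^2 exactly
data = [(p, 3e-4 * p * p) for p in (512, 1024, 2048, 4096, 8192, 16384)]
best = max(CANDIDATES, key=lambda ij: fit_one_term(data, *ij)[1])
assert best == (2, 0)   # the p^2 hypothesis wins
```

A full implementation would then try two-term hypotheses and keep growing n only while the adjusted R̄² improves, exactly as the loop above describes.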

SLIDE 40

Adjusted R2

R² represents how well the determined function fits the M available measurements. The adjusted R² additionally accounts for N, the number of terms used:

  • Adj. R² decreases when useless variables are added
  • Adj. R² increases when useful variables are added

Rule of thumb: adj. R² > 0.95

R² = 1 − residualSumSquares / totalSumSquares
R̄² = 1 − (1 − R²) · (M − 1) / (M − N − 2)
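A minimal helper for the formula above shows the penalty in action; the sample R² values are hypothetical:

```python
def adjusted_r2(r2, m, n_terms):
    """Adjusted R^2 as defined on the slide: penalizes extra model terms.

    m: number of measurements (M), n_terms: number of model terms (N).
    """
    return 1 - (1 - r2) * (m - 1) / (m - n_terms - 2)

# A second term must raise R^2 noticeably to pay for its penalty:
# here R^2 improves only from 0.97 to 0.975, so the adjusted value drops.
assert adjusted_r2(0.97, m=6, n_terms=1) > adjusted_r2(0.975, m=6, n_terms=2)
```

This is what lets the refinement loop stop: once an extra term no longer improves R̄², the simpler hypothesis is kept.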

SLIDE 41

Ranking of kernels

Kernels are ranked according to the leading-order terms of their models, expressed in big-O notation. For example, O(x) comes before O(x²).

SLIDE 42

Extra-P – User interface


SLIDE 43

Extra-P – User interface

[Screenshot: call tree exploration, plot of the model, selected kernel(s)]

SLIDE 44

Extra-P – Text output

Callpath/Region: exp4
Metric: Test
Data:
  (  1000, 1e+06)     95% CI [1.00001e+06, 999989]
  (  2000, 4e+06)     95% CI [4.00003e+06, 3.99998e+06]
  (  4000, 1.6e+07)   95% CI [1.6e+07, 1.6e+07]
  (  8000, 6.4e+07)   95% CI [6.4e+07, 6.4e+07]
  ( 16000, 2.56e+08)  95% CI [2.56e+08, 2.56e+08]
Model: 0+1*(p^2)
RSS: 3.35017
Adjusted R^2: 1

SLIDE 45

Case studies

SLIDE 46

Case studies

Lulesh, JUSPIC, NEST, UG4, MP2C, BLAST, XNS, Sweep3D, MILC, HOMME

SLIDE 47

Sweep3D – Neutron transport simulation

LogGP model for communication developed by Hoisie et al. (pᵢ ≤ 8k; #bytes and #msg constant):
tcomm = [2(px + py − 2) + 4(nsweep − 1)] · tmsg  ~  tcomm = c·√p

Kernel [2 of 40]    Model [s] t = f(p)   Predictive error [%] at pt = 262k
sweep → MPI_Recv    4.03·√p              5.10
sweep               582.19               0.01

SLIDE 48

Sweep3D – Neutron transport simulation

Model:

4.03·√p

[Chart: time (s) and relative error (%) vs. processes (2⁶–2¹⁸), showing model, data, prediction, and relative error across the training and prediction ranges]

SLIDE 49

MILC

MILC/su3_rmd – from the MILC suite of QCD codes, with a performance model manually created by Hoefler et al. Time per process should remain constant except for a rather small logarithmic term caused by global convergence checks.

Kernel [3 of 479]                Model [s] t = f(p)    Predictive error [%] at pt = 64k
compute_gen_staple_field         2.40·10⁻²             0.43
g_vecdoublesum → MPI_Allreduce   6.30·10⁻⁶·log₂²(p)    0.01
mult_adj_su3_fieldlink_lathwec   3.80·10⁻³             0.04

(training runs pᵢ ≤ 16k)

SLIDE 50

HOMME – Climate

Core of the Community Atmospheric Model (CAM)
  • Spectral element dynamical core on a cubed sphere grid

Kernel [3 of 194]            Model [s] t = f(p)                         Predictive error [%] at pt = 130k
box_rearrange → MPI_Reduce   0.026 + 2.53·10⁻⁶·p^1.5 + 1.24·10⁻¹²·p³    57.02
vlaplace_sphere_vk           49.53                                      99.32
compute_and_apply_rhs        48.68                                      1.65

(training runs pᵢ ≤ 15k)

SLIDE 51

Core of the Community Atmospheric Model (CAM)
  • Spectral element dynamical core on a cubed sphere grid

Kernel [3 of 194]            Model [s] t = f(p)                 Predictive error [%] at pt = 130k
box_rearrange → MPI_Reduce   3.63·10⁻⁶·p^1.5 + 7.21·10⁻¹³·p³    30.34
vlaplace_sphere_vk           24.44 + 2.26·10⁻⁷·p²               4.28
compute_and_apply_rhs        49.09                              0.83

(training runs pᵢ ≤ 43k)

SLIDE 52

HOMME – Climate (3)

[Chart: time (s) vs. processes (2¹⁰–2²²) on a log scale – MPI_Reduce, vlaplace_sphere_wk, and compute_and_apply_rhs across the training and prediction ranges]

SLIDE 53

UG4

  • Numerical framework for grid-based solution of partial differential equations (~500,000 lines of C++ code, 2,000 kernels)
  • Application: drug diffusion through the human skin
  • In general, all kernels scale well
  • Multigrid solver kernel (MGM) scales logarithmically
  • Number of iterations needed by the unpreconditioned conjugate gradient (CG) method depends on the mesh size
    • Increases by a factor of two with each refinement
    • Will therefore suffer from iteration count increase in weak scaling

Kernel   Model (time [s])
CG       0.227 + 0.31 · p^0.5
MGM      0.219 + 0.0006 · log2(p)

SLIDE 54

XNS

Finite element flow simulation
Strong scaling analysis using accumulated time across processes as metric

Kernel                  Runtime [%] p=128   Runtime [%] p=4096   Model [s] t = f(p)
ewdgennprm → MPI_Recv   0.46                51.46                0.029·p²   (#bytes ~ p, #msg ~ p)
ewddot                  44.78               5.04                 37406.80 + 13.29·p·log(p)

SLIDE 55

MPI

Platform                Juqueen    Juropa            Piz Daint
Barrier [s]      Expectation: O(log p)
  Model                 O(log p)   O(p^0.67·log p)   O(p^0.33)
  R²                    0.99       0.99              0.99
  Divergence            O(1)       O(p^0.67)         O(p^0.33/log p)
  Match                 ✔          ✘                 ~
Bcast [s]        Expectation: O(log p)
  Model                 O(log p)   O(p^0.5)          O(p^0.5)
  R²                    0.86       0.98              0.94
  Divergence            O(1)       O(p^0.5/log p)    O(p^0.5/log p)
  Match                 ✔          ~                 ~
Reduce [s]       Expectation: O(log p)
  Model                 O(log p)   O(p^0.5·log p)    O(p^0.5·log p)
  R²                    0.93       0.99              0.94
  Divergence            O(1)       O(p^0.5)          O(p^0.5)
  Match                 ✔          ~                 ~
MPI memory [MB]  Expectation: O(log p)
  Model                 O(log p)   O(p)              O(log p)
  R²                    0.72       1                 0.23
  Divergence            O(1)       O(p/log p)        O(1)
  Match                 ✔          ✘                 ✔
Comm_create [B]  Expectation: O(p)
  Model                 O(p)       O(p)              O(p)
  R²                    1          1                 0.99
  Divergence            O(1)       O(1)              O(1)
  Match                 ✔          ✔                 ✔
Win_create [B]   Expectation: O(p)
  Model                 O(p)       O(p)              O(p)
  R²                    1          1                 0.99
  Divergence            O(1)       O(1)              O(1)
  Match                 ✔          ✔                 ✔

SLIDE 56

MPI (2)

Platform              Juqueen    Juropa           Piz Daint
Allreduce [s]   Expectation: O(log p)
  Model               O(log p)   O(p^0.5)         O(p^0.67·log p)
  R²                  0.87       0.99             0.99
  Divergence                     O(p^0.5/log p)   O(p^0.67)
  Match               ✔          ~                ✘!
Comm_dup [B]    Expectation: O(1)
  Model               2.2e5      256              3770 + 18p
  R²                  1          1                0.99
  Divergence          O(1)       O(1)             O(p)
  Match               ✔          ✔                ✘

SLIDE 57

MPI (3)

MPI_Allreduce 3 different machines – 3 different models


SLIDE 58

Mass-producing performance models


  • Is feasible
  • Offers insight
  • Requires low effort
  • Improves code coverage
SLIDE 59

Coming soon: Fast multi-parameter performance modeling

Multi-parameter performance models consider, e.g., process count (p), problem size (n), order of a solver (o), and result precision (ε).

Independent or compounded effect? Consider the performance difference:

f(p, n, o, ε) = 10 + p + 10·n + o²
f(p, n, o, ε) = 10 + p·10·n·o²
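The difference between the additive and the multiplicative form above becomes concrete when both are evaluated; the parameter values below are arbitrary examples of ours:

```python
def f_additive(p, n, o, eps=None):
    # f(p,n,o,e) = 10 + p + 10*n + o^2   (eps has no effect in this example)
    return 10 + p + 10 * n + o**2

def f_multiplicative(p, n, o, eps=None):
    # f(p,n,o,e) = 10 + p*10*n*o^2      (parameters compound)
    return 10 + p * 10 * n * o**2

# Same inputs, vastly different cost: compounded parameters dominate quickly
assert f_additive(64, 4, 2) == 10 + 64 + 40 + 4            # 118
assert f_multiplicative(64, 4, 2) == 10 + 64 * 10 * 4 * 4  # 10250
```

Distinguishing the two cases automatically is exactly what the multi-parameter model search has to do.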

SLIDE 60

Coming soon: Fast multi-parameter performance modeling

Expanded performance model normal form:

f(p₁, …, pₘ) = Σₖ₌₁ⁿ cₖ · Πₗ₌₁ᵐ pₗ^(iₖₗ) · log₂^(jₖₗ)(pₗ),   with n, m ∈ ℕ, iₖₗ ∈ I, jₖₗ ∈ J, I, J ⊂ ℚ

Example: n = 2, m = 2, I = {0/4, 1/4, …, 12/4}, J = {0, 1, 2}

Model candidates:
  • Constant: c₁
  • Single parameter: c₁ + c₂·p₁
  • Multiple parameters
    • Additive: c₁ + c₂·p₁ + c₃·p₂
    • Multiplicative: c₁ + c₂·p₁·p₂
    • Complex: e.g., c₁ + c₂·p₁ + c₃·p₂·log₂(p₂)

SLIDE 61

Coming soon: Fast multi-parameter performance modeling


  • Exhaustive search: C(59,319, 3) = 34,786,300,841,019 possible three-parameter models – at 300,000 models searched per second*, ~3.5 years per model generated
  • Hierarchical search – reduces the search from all possible models to combinations of the best single-parameter models of each parameter: 3·9,139 + 512 = 27,929 models, ~11 models/second
  • Hierarchical search + modified golden section search – orders the single-parameter search space and applies a modified binary search: 3·26 + 512 = 590 models, ~508 models/second

*This is a simplification; multi-parameter models take much longer to evaluate
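The quoted search-space sizes follow from simple combinatorics: 39 single-parameter terms (13 exponents of p times 3 exponents of log p), 39³ compound terms for three parameters, and hypotheses of up to three terms (the decomposition 13·3 and the up-to-three-terms reading are our interpretation of the numbers on the slide):

```python
from math import comb

# Per-parameter term pool: 13 exponents of p times 3 exponents of log p
single_terms = 13 * 3
assert single_terms == 39

# Up to three terms per single-parameter hypothesis
single_param_hypotheses = comb(single_terms, 3)
assert single_param_hypotheses == 9139

# Exhaustive 3-parameter search: every compound term picks one
# exponent pair per parameter, then choose 3 such terms
compound_terms = single_terms ** 3
assert compound_terms == 59_319
assert comb(compound_terms, 3) == 34_786_300_841_019

# Hierarchical search: best single-parameter models first,
# then 512 cross-parameter combinations
assert 3 * single_param_hypotheses + 512 == 27_929
```

At 300,000 evaluations per second, 27,929 candidates take under a tenth of a second, which reproduces the ~11 models/second figure.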

SLIDE 62

References

[1] Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf: Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes. In Proc. of the ACM/IEEE Conference on Supercomputing (SC13), Denver, CO, USA, pages 1-12, ACM, November 2013.
[2] Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Alexandre Strube, Felix Wolf: Exascaling Your Library: Will Your Implementation Meet Your Expectations? In Proc. of the International Conference on Supercomputing (ICS), Newport Beach, CA, USA, pages 1-11, ACM, June 2015.
[3] Andreas Vogel, Alexandru Calotoiu, Alexandre Strube, Sebastian Reiter, Arne Nägel, Felix Wolf, Gabriel Wittum: 10,000 Performance Models per Minute - Scalability of the UG4 Simulation Framework. In Proc. of the 21st Euro-Par Conference, Vienna, Austria, Lecture Notes in Computer Science, pages 519-531, Springer, August 2015.
[4] Christian Iwainsky, Sergei Shudler, Alexandru Calotoiu, Alexandre Strube, Michael Knobloch, Christian Bischof, Felix Wolf: How Many Threads will be too Many? On the Scalability of OpenMP Implementations. In Proc. of the 21st Euro-Par Conference, Vienna, Austria, Lecture Notes in Computer Science, pages 451-463, Springer, August 2015.
[5] Alexandru Calotoiu, David Beckingsale, Christopher W. Earl, Torsten Hoefler, Ian Karlin, Martin Schulz, Felix Wolf: Fast Multi-Parameter Performance Modeling. In Proc. of the 2016 IEEE International Conference on Cluster Computing (CLUSTER), Taipei, Taiwan, pages 1-10, IEEE Computer Society, September 2016 (to appear).

SLIDE 63

Thank You!

Get Extra-P at: http://www.scalasca.org/software/extra-p/download.html