1. Insightful Automatic Performance Modeling
   Alexandru Calotoiu 1, Torsten Hoefler 2, Martin Schulz 3, Sergei Shudler 1, and Felix Wolf 1
   1 TU Darmstadt, 2 ETH Zürich, 3 Lawrence Livermore National Laboratory

2. Sponsors

3. Virtual Institute – High Productivity Supercomputing
   Association of HPC programming tool builders. Mission:
   • Development of portable programming tools that assist programmers in diagnosing programming errors and optimizing the performance of their applications
   • Integration of these tools
   • Organization of training events designed to teach the application of these tools
   • Organization of academic workshops to facilitate the exchange of ideas on tool development and to promote young scientists
   www.vi-hps.org

4. Motivation: latent scalability bugs
   [Chart: execution time vs. system size]

5. Learning objectives
   • Performance modeling background
   • Automatic performance modeling with Extra-P
     - How it works
     - When it doesn’t work
   • Practical experiences with
     - Prepared examples
     - Your own data

6. Talk structure
   • Introduction
   • Background
   • Automatic performance modeling
     - Theory
       - Performance Model Normal Form (PMNF)
       - Assumptions & limitations
     - Practice
       - Workflow
       - Model refinement
       - Examples
   • Case studies
   • Discussion

7. Introduction

8. Outline
   • Performance analysis methods
   • Analytical performance modeling
   • Automatic performance modeling
   • Scalability validation framework

9. Spectrum of performance analysis methods
   [Chart: benchmark, full simulation, model simulation, and model arranged along axes of number of parameters and model error]

10. Scaling model
   • Represents a performance metric as a function of the number of processes
   • Provides insight into the program behavior at scale
   [Plot: time vs. number of processes (2^9 to 2^13) with a fitted scaling model]

11. Pitfalls
   Intuition is not enough.
   [Plot contrasting two models fitted to the same measurements: 2.95·log2(p) + 0.0871·p vs. 12.06·p]

12. Analytical performance modeling
   Identify kernels:
   • Parts of the program that dominate its performance at larger scales
   • Identified via small-scale tests and intuition
   Create models:
   • Laborious process
   • Still confined to a small community of skilled experts
   Disadvantages:
   • Time consuming
   • Danger of overlooking unscalable code
   Examples:
   • Hoisie et al.: Performance and scalability analysis of teraflop-scale parallel architectures using multi-dimensional wavefront applications. International Journal of High Performance Computing Applications, 2000
   • Bauer et al.: Analysis of the MILC Lattice QCD Application su3_rmd. CCGrid, 2012

13. Automatic performance modeling with Extra-P
   Input: performance measurements M_i ... M_j of an instrumented program (main, foo, bar, compute, ...)
   Output: human-readable performance models of all functions (e.g., t = c1·log(p) + c2)
   [Diagram: instrumentation feeds the measurements into Extra-P, which produces the models]

14. Automatic performance modeling with Extra-P
   Input: performance measurements (profiles) of all functions at p1 = 128, p2 = 256, p3 = 512, p4 = 1,024, p5 = 2,048, and p6 = 4,096 processes
   Output: ranking of all functions, e.g. by predicted cost at a target scale p_t or by asymptotic behavior (here: 1. foo, 2. compute, 3. main, 4. bar)
   [Diagram: instrumentation feeds the profiles into Extra-P, which produces the ranking]
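To make the output side of this workflow concrete, here is a minimal sketch, not Extra-P's actual API, of how functions could be ranked once per-function models exist. The model expressions and coefficients are invented placeholders, chosen only so that the resulting order matches the ranking on the slide.

```python
import math

# Hypothetical fitted models for four functions (coefficients are invented
# placeholders, not real Extra-P output).
models = {
    "foo":     lambda p: 0.8 * p * math.log2(p),   # c1 * p * log2(p)
    "compute": lambda p: 1.2 * p,                  # c1 * p
    "main":    lambda p: 4.0 * math.log2(p),       # c1 * log2(p)
    "bar":     lambda p: 7.5,                      # constant
}

p_target = 2**20  # target scale to extrapolate to

# Rank functions by the cost their models predict at the target scale.
ranking = sorted(models, key=lambda name: models[name](p_target), reverse=True)
for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name}: predicted {models[name](p_target):.2e}")
```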

15. Requirements modeling
   [Diagram: program requirements broken down into computation (FLOPS, stores, loads), communication (P2P, collective), and further categories, each modeled alongside time]
   Disagreement between the requirement models and the time model may be indicative of wait states.

16. Algorithm engineering
   [Figure courtesy of Peter Sanders, KIT]

17. How to validate scalability in practice?
   • Small text-book example: the expectation is a verifiable analytical expression, e.g. #FLOPS = n^2 (2n − 1)
   • Real application: the expectation is an asymptotic complexity, e.g. #FLOPS = O(n^2.8074)
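For the text-book case, the expectation can be checked exactly. The following sketch (a hypothetical helper, not part of any tool mentioned here) counts the floating-point operations of the textbook triple-loop matrix multiplication and verifies them against #FLOPS = n^2 (2n − 1); the O(n^2.8074) expectation of a real application can only be checked asymptotically.

```python
def matmul_with_flop_count(a, b):
    """Textbook triple-loop matrix multiplication that also counts FLOPs."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    flops = 0
    for i in range(n):
        for j in range(n):
            acc = a[i][0] * b[0][j]
            flops += 1
            for k in range(1, n):
                acc += a[i][k] * b[k][j]
                flops += 2  # one multiply, one add
            c[i][j] = acc
    return c, flops

n = 8
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i * j) for j in range(n)] for i in range(n)]
_, flops = matmul_with_flop_count(a, b)
# Textbook expectation from the slide: #FLOPS = n^2 * (2n - 1)
assert flops == n**2 * (2 * n - 1)
print(f"n = {n}: {flops} FLOPs, as expected")
```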

18. Scalability evaluation framework
   [Diagram: an expectation (plus an optional deviation limit) drives search space generation; benchmark runs provide performance measurements for model generation; the resulting scaling model is compared against the expectation to obtain a divergence model, which supports initial validation, comparing alternatives, and regression testing]
   Shudler et al.: Exascaling Your Library: Will Your Implementation Meet Your Expectations? 2015
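A minimal sketch of the comparison step at the end of this pipeline, with an invented expectation, generated model, and deviation limit; the real framework derives the divergence model from Extra-P output rather than from hand-written functions.

```python
import math

# Invented example (not the framework's real interface): a developer-supplied
# expectation and a model generated from measurements.
def expected(p):
    return 3.0 * math.log2(p)             # expectation: time grows like log p

def generated(p):
    return 2.8 * math.log2(p) + 1e-4 * p  # generated model contains a small linear term

deviation_limit = 2.0   # tolerate at most a 2x deviation from the expectation
p_target = 2**20        # scale at which the expectation should still hold

divergence = generated(p_target) / expected(p_target)
print(f"divergence at p = 2^20: {divergence:.2f}")
if divergence > deviation_limit:
    print("Expectation violated: the small linear term dominates at scale.")
```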

19. Theory

20. Outline
   • Goal: scaling trends
   • Model generation
   • Performance Model Normal Form (PMNF)
   • Statistical quality control & confidence intervals
   • Assumptions & limitations of the method

21. Automatic performance modeling
   Input: performance measurements M_i ... M_j of all functions of an instrumented program (main, foo, bar, compute, ...)
   Output: human-readable performance models of all functions (e.g., t = c1·log(p) + c2)

22. Primary focus on scaling trend
   [Chart: common performance analysis chart in a paper]
   Ranking: 1. F2, 2. F1, 3. F3

23. Primary focus on scaling trend
   [Charts: common performance analysis chart in a paper vs. actual measurement in laboratory conditions]
   Ranking: 1. F2, 2. F1, 3. F3

24. Primary focus on scaling trend
   [Chart: production reality]
   Ranking: 1. F2, 2. F1, 3. F3

25. Model building blocks
   Typical computation and communication scaling of well-known kernels:
   • LU: t(p) ~ c
   • FFT: t(p) ~ log2(p) and t(p) ~ c
   • Naïve N-body: t(p) ~ p
   • Samplesort: t(p) ~ p^2 log2^2(p) and t(p) ~ p^2
   • ...

26. Performance model normal form (PMNF)
   f(p) = \sum_{k=1}^{n} c_k \cdot p^{i_k} \cdot \log_2^{j_k}(p),  with n ∈ ℕ, i_k ∈ I, j_k ∈ J, and I, J ⊂ ℚ
   Example: n = 1, I = {0, 1, 2}, J = {0, 1} yields the candidate models
   c1, c1·log(p), c1·p, c1·p·log(p), c1·p^2, c1·p^2·log(p)
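A small sketch of how this n = 1 search space can be enumerated, assuming the exponent sets I = {0, 1, 2} and J = {0, 1} from the slide; the formatting helper is hypothetical, and the printed strings correspond to the six candidate models listed above.

```python
from itertools import product

# Enumerate single-term PMNF hypotheses c1 * p^i * log2(p)^j for the
# exponent sets shown on the slide: I = {0, 1, 2}, J = {0, 1}.
I = (0, 1, 2)
J = (0, 1)

def term_repr(i, j):
    """Render one candidate term as a human-readable string."""
    parts = []
    if i:
        parts.append("p" if i == 1 else f"p^{i}")
    if j:
        parts.append("log2(p)" if j == 1 else f"log2(p)^{j}")
    return "c1 * " + " * ".join(parts) if parts else "c1"

for i, j in product(I, J):
    print(term_repr(i, j))
```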

27. Performance model normal form (PMNF)
   f(p) = \sum_{k=1}^{n} c_k \cdot p^{i_k} \cdot \log_2^{j_k}(p),  with i_k ∈ I and j_k ∈ J
   With n up to 2, I = {0, 1, 2}, J = {0, 1}, the search space contains the six single-term models from the previous slide plus all fifteen two-term combinations:
   c1 + c2·log(p), c1 + c2·p, c1 + c2·p·log(p), c1 + c2·p^2, c1 + c2·p^2·log(p),
   c1·log(p) + c2·p, c1·log(p) + c2·p·log(p), c1·log(p) + c2·p^2, c1·log(p) + c2·p^2·log(p),
   c1·p + c2·p·log(p), c1·p + c2·p^2, c1·p + c2·p^2·log(p),
   c1·p·log(p) + c2·p^2, c1·p·log(p) + c2·p^2·log(p),
   c1·p^2 + c2·p^2·log(p)
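To illustrate how such a search space is used, here is a hedged sketch that fits every candidate with up to two terms to synthetic measurements using ordinary least squares. It is not Extra-P's implementation: the tool additionally relies on cross-validation and adjusted R^2 so that extra terms are only kept when they are justified, whereas the raw squared error used below always favors the larger model.

```python
import itertools
import numpy as np

# Candidate PMNF terms p^i * log2(p)^j for I = {0, 1, 2}, J = {0, 1}.
terms = [(i, j) for i in (0, 1, 2) for j in (0, 1)]

def design_column(p, i, j):
    return p**i * np.log2(p)**j

def fit(p, t, chosen):
    """Least-squares fit of t(p) = sum_k c_k * p^{i_k} * log2(p)^{j_k}."""
    X = np.column_stack([design_column(p, i, j) for i, j in chosen])
    coeffs, *_ = np.linalg.lstsq(X, t, rcond=None)
    sse = float(np.sum((t - X @ coeffs) ** 2))
    return coeffs, sse

# Synthetic measurements (invented for illustration): t(p) = 5*log2(p) + 0.01*p, plus noise.
p = np.array([128.0, 256.0, 512.0, 1024.0, 2048.0, 4096.0])
rng = np.random.default_rng(0)
t = 5 * np.log2(p) + 0.01 * p + rng.normal(0, 0.1, p.size)

# Try every hypothesis with one or two terms and keep the smallest-error one.
candidates = list(itertools.combinations(terms, 1)) + list(itertools.combinations(terms, 2))
best = min(candidates, key=lambda chosen: fit(p, t, chosen)[1])
print("best hypothesis, as (i, j) exponent pairs:", best)
print("fitted coefficients:", fit(p, t, best)[0])
```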

28. Weak vs. strong scaling
   • Wall-clock time is not necessarily monotonically increasing under strong scaling
   • Harder to capture the model automatically
   • Different invariants require different reductions across processes
   Weak scaling: invariant = problem size per process; model target = wall-clock time; reduction = maximum / average
   Strong scaling: invariant = overall problem size; model target = accumulated time; reduction = sum
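The reduction row of this comparison can be illustrated with a small sketch; the function below is an assumption about how per-process times might be reduced to one value per run, not Extra-P's exact behavior.

```python
def reduce_measurements(per_process_times, scaling):
    """Reduce per-process times to one value per run, following the table above.

    Weak scaling targets wall-clock time, so the maximum (or average) across
    processes is used; strong scaling targets accumulated time, so the values
    are summed. (Sketch only; the actual reductions may differ in detail.)
    """
    if scaling == "weak":
        return max(per_process_times)
    if scaling == "strong":
        return sum(per_process_times)
    raise ValueError(f"unknown scaling mode: {scaling}")

times = [1.9, 2.1, 2.0, 2.3]  # invented per-process times for one run
print(reduce_measurements(times, "weak"))    # 2.3 -> wall-clock proxy
print(reduce_measurements(times, "strong"))  # 8.3 -> accumulated time
```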
