Update on the Performance-Modeling Tool Extra-P
Felix Wolf, TU Darmstadt


SLIDE 1

Felix Wolf, TU Darmstadt

Update on the Performance-Modeling Tool Extra-P

SLIDE 2

7/10/18 | Department of Computer Science | Laboratory for Parallel Programming | Felix Wolf | 2

Acknowledgement

  • David Beckingsale
  • Alexandru Calotoiu
  • Christopher W. Earl
  • Torsten Hoefler
  • Kashif Ilyas
  • Ian Karlin
  • Daniel Lorenz
  • Patrick Reisert
  • Martin Schulz
  • Sergei Shudler
  • Andreas Vogel
SLIDE 3

Latent scalability bugs

[Illustration: wall time rising with system size]

SLIDE 4

Motivation

[Plot: time vs. processes (2⁹–2¹³) with fitted model t = 3·10⁻⁴ · p² + c]

Performance model = formula that expresses relevant performance metrics as a function of one or more execution parameters

Manual creation is challenging:

  • Identify kernels – incomplete coverage
  • Create models – laborious, difficult

SLIDE 5

Automatic empirical performance modeling

Performance model normal form (PMNF):

  f(p) = Σ_{k=1}^{n} c_k · p^{i_k} · log₂^{j_k}(p)

Generation of candidate models and selection of best fit

Example candidates (n = 2):
c₁ + c₂·p,  c₁ + c₂·p²,  c₁ + c₂·log(p),  c₁ + c₂·p·log(p),  c₁ + c₂·p²·log(p),
c₁·log(p) + c₂·p,  c₁·log(p) + c₂·p·log(p),  c₁·log(p) + c₂·p²,  c₁·log(p) + c₂·p²·log(p),
c₁·p + c₂·p·log(p),  c₁·p + c₂·p²,  c₁·p + c₂·p²·log(p),
c₁·p·log(p) + c₂·p²,  c₁·p·log(p) + c₂·p²·log(p),  c₁·p² + c₂·p²·log(p)

[Extra-P screenshot: per-kernel models t = f(p), e.g. kernel 2 of 40: sweep → MPI_Recv]

Small-scale measurements
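The candidate-selection idea above can be sketched in a few lines. This is a minimal illustration, not Extra-P's implementation: it assumes single-term hypotheses c₁ + c₂ · pⁱ · log₂ʲ(p), plain least squares, and the residual sum of squares as the selection criterion.

```python
# Sketch of PMNF model selection: fit every candidate hypothesis to the
# small-scale measurements and keep the one with the smallest residual error.
import itertools
import numpy as np

I = [0.5, 1, 1.5, 2]   # exponents i (illustrative subset of the search space)
J = [0, 1, 2]          # log exponents j

def fit(p, t, i, j):
    """Least-squares fit of c1 + c2 * p**i * log2(p)**j; returns (rss, coefficients)."""
    term = p**i * np.log2(p)**j
    A = np.column_stack([np.ones_like(p), term])
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    rss = np.sum((A @ coef - t) ** 2)
    return rss, coef

def best_model(p, t):
    """Return the (i, j) hypothesis with the smallest residual sum of squares."""
    return min(((fit(p, t, i, j)[0], (i, j)) for i, j in itertools.product(I, J)))[1]

# Small-scale measurements generated from t = 2 + 0.5 * p * log2(p)
p = np.array([8., 16., 32., 64., 128.])
t = 2 + 0.5 * p * np.log2(p)
print(best_model(p, t))   # -> (1, 1), i.e. c1 + c2 * p * log2(p)
```

With exact synthetic data the matching hypothesis fits with near-zero residual, so the selection recovers the generating model.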

SLIDE 6

Extra-P 3.0

  • GUI improvements, better stability, additional features
  • Tutorials available through VI-HPS and upon request

http://www.scalasca.org/software/extra-p/download.html

SLIDE 7

Recent developments

1. Performance models with multiple parameters
2. Automatic configuration of the search space
3. Segmented models
4. Iso-efficiency modeling
5. Lightweight requirements engineering for co-design

SLIDE 8

Models with more than one parameter

  f(x₁, …, x_m) = Σ_{k=1}^{n} c_k · Π_{l=1}^{m} x_l^{i_kl} · log₂^{j_kl}(x_l)

  n = 3, m = 3, I = {0/4, 1/4, …, 12/4}, J = {0, 1, 2}

Search space explosion

  • Total number of hypotheses to search: 34,786,300,841,019

  • Too slow for any practical purpose
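The total above can be reproduced with a short combinatorial calculation, under the assumption that a hypothesis consists of n = 3 distinct compound terms, each term choosing one of the |I| = 13 exponents and |J| = 3 log-exponents for every one of the m = 3 parameters:

```python
# Reproduce the search-space size: |I| = 13 exponent choices (0/4 ... 12/4),
# |J| = 3 log-exponent choices, m = 3 parameters, n = 3 terms per hypothesis.
from math import comb

terms_per_parameter = 13 * 3                 # 39 single-parameter term shapes
compound_terms = terms_per_parameter ** 3    # 59,319 three-parameter terms
hypotheses = comb(compound_terms, 3)         # choose 3 distinct terms

print(hypotheses)   # -> 34786300841019
```

The result, C(39³, 3) = 34,786,300,841,019, matches the number on the slide exactly.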
SLIDE 9

Search space reduction through heuristics

  • Hierarchical search – assumes the best multi-parameter model is created out of the combination of the best single-parameter hypothesis for each parameter
  • Modified golden section search – speeds up the single-parameter search by ordering the hypothesis space and then using a variant of binary search to find the model in logarithmic rather than linear time

Calotoiu et al. [6]
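The two heuristics can be sketched as follows. This is a toy illustration under simplifying assumptions (a small hypothesis list, exact least-squares residuals, and the assumption that the error is unimodal along the growth-ordered list); Extra-P's actual search differs in detail:

```python
# Sketch of the search heuristics: golden-section-style narrowing over an
# ordered hypothesis list, then hierarchical combination of per-parameter winners.
import numpy as np

# Single-parameter hypotheses c1 + c2 * x**i * log2(x)**j, ordered by growth
HYPOTHESES = sorted((i, j) for i in (0.25, 0.5, 1, 1.5, 2) for j in (0, 1, 2))

def rss(x, t, i, j):
    """Residual sum of squares of the least-squares fit for hypothesis (i, j)."""
    A = np.column_stack([np.ones_like(x), x**i * np.log2(x)**j])
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    return float(np.sum((A @ coef - t) ** 2))

def best_hypothesis(x, t):
    """Narrow the ordered list with logarithmically many fits, not a full scan."""
    lo, hi = 0, len(HYPOTHESES) - 1
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if rss(x, t, *HYPOTHESES[m1]) < rss(x, t, *HYPOTHESES[m2]):
            hi = m2          # minimum lies in the lower part of the list
        else:
            lo = m1          # minimum lies in the upper part of the list
    return min(HYPOTHESES[lo:hi + 1], key=lambda h: rss(x, t, *h))

# Hierarchical search: model each parameter separately, then combine the winners.
p = np.array([8., 16., 32., 64.])
d = np.array([2., 4., 8., 16.])
best_p = best_hypothesis(p, 3 + 0.1 * np.sqrt(p))   # vary p, hold d fixed
best_d = best_hypothesis(d, 3 + 2.0 * d)            # vary d, hold p fixed
print(best_p, best_d)   # -> (0.5, 0) (1, 0), i.e. c1 + c2 * sqrt(p) * d
```

The narrowing step is why the searched-hypothesis counts on the following slides drop so sharply compared to the exhaustive search.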

SLIDE 10

Search space reduction

n = 3, m = 3, I = {0/4, 1/4, …, 12/4}, J = {0, 1, 2}

  • Assuming 300,000 hypotheses searched per second*
  • 3-parameter models
SLIDE 11

Search space reduction

  • Assuming 300,000 hypotheses searched per second*
  • 3-parameter models

*This is optimistic

n = 3, m = 3, I = {0/4, 1/4, …, 12/4}, J = {0, 1, 2}

Exhaustive search: 34,786,300,841,019 hypotheses searched → ~1 model / 3.5 years

SLIDE 12

Search space reduction

  • Assuming 300,000 hypotheses searched per second*
  • 3-parameter models

*This is optimistic

n = 3, m = 3, I = {0/4, 1/4, …, 12/4}, J = {0, 1, 2}

Exhaustive search: 34,786,300,841,019 hypotheses searched → ~1 model / 3.5 years
Hierarchical search: 27,929 hypotheses searched → ~11 models / second

SLIDE 13

Search space reduction

Exhaustive search: 34,786,300,841,019 hypotheses searched → ~1 model / 3.5 years
Hierarchical search: 27,929 hypotheses searched → ~11 models / second
+ golden section search: 590 hypotheses searched → ~508 models / second

  • Assuming 300,000 hypotheses searched per second*
  • 3-parameter models

*This is optimistic

n = 3, m = 3, I = {0/4, 1/4, …, 12/4}, J = {0, 1, 2}

SLIDE 14

Evaluation with synthetic data (100,000 models with two parameters)

[Bar chart: distribution of generated models [%], 0–100 – optimal model identified / lead-order term identified / lead-order term not identified]

Exhaustive search: 107 hours – heuristics: 1.5 hours

SLIDE 15

Evaluation with application data

[Bar chart: distribution of generated models [%] for Blast (full), Blast (partial), CloverLeaf, and Kripke – identical models / lead-order terms identical / different lead-order terms]

SLIDE 16

Case study – Kripke

  • Neutron transport proxy code
  • Three parameters considered:
    • Process count – p
    • Number of directions – d
    • Number of groups – g
SLIDE 17

Expected behavior

SweepSolver (main computation kernel)
Expectation – performance depends on problem size: t ~ d · g

MPI_Testany (main communication kernel: 3D wave-front communication pattern)
Expectation – performance depends on cubic root of process count: t ~ ∛p

SLIDE 18

Expected behavior

SweepSolver (main computation kernel)
Expectation – performance depends on problem size: t ~ d · g
Actual model: t = 5 + d · g + 0.005 · ∛p · d · g

MPI_Testany (main communication kernel: 3D wave-front communication pattern)
Expectation – performance depends on cubic root of process count: t ~ ∛p
Actual model: t = 7 + ∛p + 0.005 · ∛p · d · g

Kernels must wait on each other – a smaller compounded effect discovered

SLIDE 19

How to find good PMNF parameters?

Option (1): Rely on default parameters

→ But what if they don't fit the problem?

Option (2): Try those parameters that you expect to fit

→ Requires prior expertise! Also, what if your expectation is wrong?

Option (3): Try very large sets I, J

→ Requires more resources (especially bad for multiple parameters)!

Option (4): Let Extra-P automatically refine the search space based on previous results.

SLIDE 20

Simplified PMNF

  • Use only a constant and a "lead order" term: f(p) = c₀ + c₁ · p^α · log₂^β(p)
  • Want to find values for c₀, c₁, α, and β such that the model error is minimized
  • c₀ and c₁ are determined by regression
  • What about α and β?
SLIDE 21

Simplified PMNF

We define four slices:

  • β = 0, α = ?
  • β = 1, α = ?
  • β = 2, α = ?
  • α = 0, β = ?

Goal: Unimodal error distribution along each slice
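The sliced search can be sketched as follows. For simplicity this scans a grid of α values on each slice instead of exploiting the unimodal error for a faster search, and the α grid is an illustrative assumption:

```python
# Sketch of the per-slice search for the simplified PMNF
# f(p) = c0 + c1 * p**a * log2(p)**b: fix beta on each slice, vary alpha
# (plus one slice with alpha = 0 and beta varying), keep the smallest error.
import numpy as np

def rss(p, t, a, b):
    """Residual sum of squares after fitting c0 and c1 by linear regression."""
    A = np.column_stack([np.ones_like(p), p**a * np.log2(p)**b])
    c, *_ = np.linalg.lstsq(A, t, rcond=None)
    return float(np.sum((A @ c - t) ** 2))

def best_on_slices(p, t):
    alphas = [0.25 * k for k in range(1, 13)]             # 0.25 ... 3.0
    slices = [(a, b) for b in (0, 1, 2) for a in alphas]  # beta fixed, alpha varies
    slices += [(0.0, b) for b in (1, 2)]                  # alpha = 0, beta varies
    return min(slices, key=lambda ab: rss(p, t, *ab))

p = np.array([16., 32., 64., 128., 256.])
t = 4 + 0.2 * p**1.5                    # synthetic kernel: alpha = 1.5, beta = 0
print(best_on_slices(p, t))             # -> (1.5, 0)
```

Because the error along each slice is (by design) unimodal, the grid scan can be replaced by the golden-section-style narrowing from the multi-parameter heuristics.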

SLIDE 22

Evaluation

Data from previous case studies

  • Sweep3D
  • MILC
  • UG4
  • MPI collective operations
  • BLAST
  • Kripke
  • 5–9 points available
  • Last data point (largest p) not used for modeling, but to evaluate prediction accuracy

Results

  • 4453 models
  • 49% remain unchanged
  • 39% get better
  • 12% get worse
  • Mean relative prediction error down from 45.7% to 13.0%
  • Improvements in every individual case study

Reisert et al. [3]

SLIDE 23

Segmented behavior

[Plot: runtime vs. number of processors (p)]

First behaviour: p²
Second behaviour: 30 + p
Model predicted by Extra-P: log₂²(p)

SLIDE 24

Divide data into subsets

[Plot: the same measurements (first behaviour p², second behaviour 30 + p) divided into subsets 1, 2, 3, …, 6 along the processor axis]

SLIDE 25

Model each subset and compute nRSS

nRSS = normalized residual sum of squares. High nRSS values indicate heterogeneous subsets.

SLIDE 26

Identify change point

Mark each subset with 1 if nRSS ≥ 0.1, else 0 – e.g. 01110

SLIDE 27

Identify change point

Valid patterns:
….000001110000…
….0000011110000…

Just noise:
….01000110010…

SLIDE 28

Identifying the change point

Pattern 001110 (nRSS ≥ 0.1): the start of the run of 1s marks the change point
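The binarization and pattern check above can be sketched in a few lines. This is a toy rule using the 0.1 threshold from the slides; the published heuristic in Ilyas et al. is more elaborate:

```python
# Sketch of change-point detection from per-subset nRSS values: binarize the
# subsets and accept only a single contiguous run of 1s as a change point.
def change_point(nrss, threshold=0.1):
    """Return the index of the first heterogeneous subset, or None for noise."""
    bits = [1 if v >= threshold else 0 for v in nrss]
    runs, start = [], None
    for k, b in enumerate(bits + [0]):   # sentinel 0 closes a trailing run
        if b and start is None:
            start = k
        elif not b and start is not None:
            runs.append((start, k - 1))
            start = None
    if len(runs) != 1:
        return None          # no heterogeneity, or scattered 1s (just noise)
    return runs[0][0]        # first heterogeneous subset marks the change

print(change_point([0.01, 0.02, 0.31, 0.28, 0.02]))  # '01110'-like pattern -> 2
print(change_point([0.01, 0.30, 0.01, 0.30, 0.01]))  # scattered 1s -> None
```

Valid patterns such as …0001110… yield exactly one run of 1s; noise patterns yield several and are rejected.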

SLIDE 29

HOMME

  • Dynamic core of Community Atmosphere Model (CAM)
  • Run for p ∈ {600; 1,176; …; 54,150}
  • 25 out of 664 kernels found segmented
  • Change point found between 15,000 and 16,224
  • Example: laplace_sphere_wk

Non-segmented model: Segmented model:

SLIDE 30

HOMME

Estimated Change Point

Ilyas et al. [4]

SLIDE 31

System upgrade

Examples

  • Double the racks
  • Double the sockets
  • Double the memory

Given a budget and a set of applications, how can we best invest in upgrades for a given hardware system?

SLIDE 32

Lightweight requirements engineering for (exascale) co-design

Collect portable requirement metrics → Derive requirement models → Extrapolate to new system

Resource                | Metric
Memory footprint        | # Bytes used (resident memory size)
Computation             | # Floating-point operations (#FLOP)
Network communication   | # Bytes sent / received
Memory access           | # Loads / stores; stack distance

SLIDE 33

Application demands for different resources scale differently

Calculate relative changes of resource demand by scaling p and n

  • n is a function of the memory size
  • p is a function of the number of cores / sockets

Lulesh (models are per process; p – number of processes, n – problem size per process):

#Bytes used             10⁵ · n log n
#FLOP                   10⁵ · n log n · p^0.25 log p
#Bytes sent & received  10³ · n · p^0.25 log p
#Loads & stores         10⁵ · n log n · log p
Stack distance          constant

SLIDE 34

Response of workload to system upgrades

Ratios (Apps.)             | Kripke | LULESH | MILC | Relearn | icoFoam | Baseline
System upgrade A: Double the racks
  Problem size per process |   1    |   1    |  1   |   1     |  0.5    |   1
  Overall problem size     |   2    |   2    |  2   |   2     |  1      |   2
  Computation              |   1    |  1.2   |  1   |   1     |  0.5    |   1
  Communication            |   1    |  1.2   |  1   |   1     |  0.7    |   1
  Memory access            |   2    |  1.2   | 2.8  |   2     |  0.7    |   1
System upgrade B: Double the sockets
  Problem size per process |  0.5   |  0.5   | 0.5  |  0.3    |  0.3    |  0.5
  Overall problem size     |   1    |   1    |  1   |  0.6    |  0.6    |   1
  Computation              |  0.5   |  0.6   | 0.5  |  0.3    |  0.2    |  0.5
  Communication            |  0.5   |  0.6   | 0.5  |  0.3    |  0.3    |  0.5
  Memory access            |  0.5   |   1    | 1.4  |   1     |  0.5    |  0.5
System upgrade C: Double the memory
  Problem size per process |   2    |  1.4   |  2   |  2.8    |  1.4    |   2
  Overall problem size     |   2    |  1.4   |  2   |  2.8    |  1.4    |   2
  Computation              |   2    |  1.4   |  2   |  2.8    |  1.7    |   2
  Communication            |   2    |  1.4   |  2   |  2.8    |  1.4    |   2
  Memory access            |   2    |  1.4   |  2   |  2.8    |  1.4    |   2

The table highlights the best and worst upgrade options for Lulesh. Calotoiu et al. [1]
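The ratios in the table come from evaluating requirement models at the baseline and at the upgraded configuration. The sketch below illustrates the idea with the Lulesh #FLOP model from the previous slide; the baseline values and the mapping "double the memory → n doubles, p unchanged" are illustrative assumptions, so the result is not meant to reproduce the table's exact numbers:

```python
# Sketch: evaluate a requirement model at baseline and upgraded (p, n)
# and take the ratio of the two demands.
from math import log2

def flop(p, n):
    # Lulesh #FLOP model from the slide: 1e5 * n log n * p^0.25 log p (per process)
    return 1e5 * n * log2(n) * p**0.25 * log2(p)

p0, n0 = 4096, 1_000_000           # hypothetical baseline system
# Upgrade C: double the memory -> problem size per process doubles, p unchanged
ratio = flop(p0, 2 * n0) / flop(p0, n0)
print(round(ratio, 2))             # -> 2.1
```

Because n log n grows superlinearly, doubling the problem size per process raises the computation demand by slightly more than a factor of two.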

SLIDE 35

Publications

[1] Alexandru Calotoiu, Alexander Graf, Torsten Hoefler, Daniel Lorenz, Felix Wolf: Lightweight Requirements Engineering for Exascale Co-design. In Proc. of the 2018 IEEE International Conference on Cluster Computing (CLUSTER), Belfast, UK (accepted)
[2] Sebastian Rinke, Markus Butz-Ostendorf, Marc-André Hermanns, Mikaël Naveau, Felix Wolf: A Scalable Algorithm for Simulating the Structural Plasticity of the Brain. Journal of Parallel and Distributed Computing, 2018.
[3] Patrick Reisert, Alexandru Calotoiu, Sergei Shudler, Felix Wolf: Following the Blind Seer – Creating Better Performance Models Using Less Information. In Proc. of the 23rd Euro-Par Conference, Santiago de Compostela, Spain
[4] Kashif Ilyas, Alexandru Calotoiu, Felix Wolf: Off-Road Performance Modeling – How to Deal with Segmented Data. In Proc. of the 23rd Euro-Par Conference, Santiago de Compostela, Spain
[5] Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Felix Wolf: Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications. In Proc. of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Austin, TX, USA
[6] Alexandru Calotoiu, David Beckingsale, Christopher W. Earl, Torsten Hoefler, Ian Karlin, Martin Schulz, Felix Wolf: Fast Multi-Parameter Performance Modeling. In Proc. of the 2016 IEEE International Conference on Cluster Computing (CLUSTER), Taipei, Taiwan
[7] Felix Wolf, Christian Bischof, Alexandru Calotoiu, Torsten Hoefler, Christian Iwainsky, Grzegorz Kwasniewski, Bernd Mohr, Sergei Shudler, Alexandre Strube, Andreas Vogel, Gabriel Wittum: Software for Exascale Computing – SPPEXA 2013–2015, chapter Automatic Performance Modeling of HPC Applications. Springer, pages 445–465, 2016.
[8] Andreas Vogel, Alexandru Calotoiu, Arne Nägel, Sebastian Reiter, Alexandre Strube, Gabriel Wittum, Felix Wolf: Software for Exascale Computing – SPPEXA 2013–2015, chapter Automated Performance Modeling of the UG4 Simulation Framework.
[9] Christian Iwainsky, Sergei Shudler, Alexandru Calotoiu, Alexandre Strube, Michael Knobloch, Christian Bischof, Felix Wolf: How Many Threads will be too Many? On the Scalability of OpenMP Implementations. In Proc. of the 21st Euro-Par Conference, Vienna, Austria
[10] Andreas Vogel, Alexandru Calotoiu, Alexandre Strube, Sebastian Reiter, Arne Nägel, Felix Wolf, Gabriel Wittum: 10,000 Performance Models per Minute – Scalability of the UG4 Simulation Framework. In Proc. of the 21st Euro-Par Conference, Vienna, Austria
[11] Sergei Shudler, Alexandru Calotoiu, Torsten Hoefler, Alexandre Strube, Felix Wolf: Exascaling Your Library: Will Your Implementation Meet Your Expectations? In Proc. of the International Conference on Supercomputing (ICS), Newport Beach, CA, USA
[12] Alexandru Calotoiu, Torsten Hoefler, Felix Wolf: Mass-producing Insightful Performance Models. In Workshop on Modeling & Simulation of Systems and Applications, University of Washington, Seattle, Washington, USA
[13] Felix Wolf, Christian Bischof, Torsten Hoefler, Bernd Mohr, Gabriel Wittum, Alexandru Calotoiu, Christian Iwainsky, Alexandre Strube, Andreas Vogel: Catwalk: A Quick Development Path for Performance Models. In Euro-Par 2014: Parallel Processing Workshops
[14] Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf: Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes. In Proc. of the ACM/IEEE Conference on Supercomputing (SC13), Denver, CO, USA

SLIDE 36

7th Workshop on Extreme Scale Programming Tools (ESPT'18)

  • Performance tools
  • Debugging and correctness tools
  • Program development tool chains (incl. IDEs)
  • Performance engineering
  • Tool technologies for extreme-scale challenges (e.g., scalability, resilience, power)

  • Tool support for accelerated architectures
  • Tools for networks and I/O
  • Tool infrastructures and environments
  • Application developer experiences

  Author stipends!