SLIDE 1

Automated Analysis of Time Series Data to Understand Parallel Program Behaviors

Lai Wei and John Mellor-Crummey, Rice University, Houston, TX, USA

SLIDE 2

Background

Parallel computers of increasing scale

  • Support scientific simulations of increasing ambition

Performance of many applications fails to scale accordingly

  • Load imbalance, serialization, network congestion, etc.

Performance tools to understand application behaviors

  • Measure and present performance data
  • Used by experts to manually identify performance inefficiencies

2

SLIDE 5

Profile

Breaks down application run time into sources of costs

[Figure: run time of processes P0-P3 broken down over the calling context main() → init(), solve() → compute(), sync(); each process spends 9s in main(): 1s in init() and 8s in solve(), split roughly evenly (~4s each) between compute() and sync()]

Performance loss, why?
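As an illustration of what a profile captures, the toy sketch below (Python; not any tool's actual API) sums the per-process times from the example into totals per calling context. The per-process numbers are taken from the figure.

```python
# Illustrative sketch: a profile aggregates each process's run time
# into its calling contexts, discarding when the time was spent.
per_process = {
    "P0": {"init()": 1.0, "compute()": 4.0, "sync()": 4.0},
    "P1": {"init()": 1.0, "compute()": 4.1, "sync()": 3.9},
    "P2": {"init()": 1.0, "compute()": 3.9, "sync()": 4.1},
    "P3": {"init()": 1.0, "compute()": 4.0, "sync()": 4.0},
}

def profile(per_process):
    """Sum costs per calling context across all processes."""
    totals = {}
    for costs in per_process.values():
        for ctx, t in costs.items():
            totals[ctx] = totals.get(ctx, 0.0) + t
    return totals

totals = profile(per_process)
```

The totals show that compute() and sync() each account for 16s across the four processes, but the profile alone cannot say *why* sync() costs that much, which motivates the time-series view on the next slides.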

SLIDE 6

Time series

Presents application behavior over time

[Figure: timeline for P0-P3 at call path depth 1 over a 1s-9s time axis, showing init() followed by solve()]

SLIDE 9

Time series

Presents application behavior over time

[Figure: timeline for P0-P3 at call path depth 2 over a 1s-9s time axis: init(), then alternating compute() and sync() phases; uneven phase lengths across processes expose load imbalance]

SLIDE 12

Motivation

Experts manually examine time series

  • Understand how and why performance inefficiencies arise

Time series of large-scale parallel executions

  • Vast in three dimensions
    • Process
    • Time
    • Call path depth
  • Manual analysis is difficult if not impractical

12

SLIDE 13

Related work -- automated analysis

Analysis of profiles [Huck, SC’05] [Tallent, SC’10]

  • Often insufficient for diagnosing how and why parallel inefficiencies arise

Analysis of execution traces

  • Collecting instrumentation-based traces is costly in time and space
    • Fine-grained traces explode at large scale
  • Analysis at coarse granularity [Gonzalez, IPDPS’09] [Llort, IPDPS’10]
    • Still needs lots of manual effort
  • Analysis at fine granularity for short intervals [Geimer, CCPE’10] [Böhme, TOPC’16]
    • Requires prior knowledge for selective tracing

13

SLIDE 14

Our contribution

Automated analysis of sample-based time-series data

  • Feasible for large-scale programs
    • Data volume is manageable
  • Derive compact top-down summaries
    • Uncover patterns and variance
  • Direct attention to potential performance losses
    • Attribute losses to code regions where they originate

14

SLIDE 15

Approach

  1. Collect and prepare sample-based time-series for further analysis
     • Collect a time series of call paths with HPCToolkit
     • Organize each time series as a tree of program calling contexts
     • Identify iterative behaviors in the time series
  2. Build clusters across threads and loop iterations
  3. Quantify performance losses and attribute them to call paths

15

SLIDE 45

Collect call path samples over time

[Figure: call path samples T1-T8 taken at successive timer interrupts along the time axis; T1 samples main() → foo@13 → C@5, while T2-T8 sample call paths under main() → foo@13 → loop@6 that end in A@7, B@8, or C@9 at depth 3]

SLIDE 48

Construct a Temporal Context Tree

[Figure: the samples T1-T8 merged into a temporal context tree; main() and foo@13 cover T1-T8, loop@6 covers T2-T8, and the leaves C@5, A@7, B@8, and C@9 record which samples observed them]

One tree per thread
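The merge step above can be sketched as follows. This is an illustrative dict-based tree, not HPCToolkit's actual representation, and the sample-to-path mapping is only approximated from the figure:

```python
# Merge sampled call paths into a temporal context tree: each frame on
# a sampled path becomes a node that records the samples observing it.
# The concrete paths below are hypothetical, approximated from the figure.
samples = {
    "T1": ["main()", "foo@13", "C@5"],
    "T2": ["main()", "foo@13", "loop@6", "A@7"],
    "T3": ["main()", "foo@13", "loop@6", "B@8"],
    "T4": ["main()", "foo@13", "loop@6", "A@7"],
    "T5": ["main()", "foo@13", "loop@6", "A@7"],
    "T6": ["main()", "foo@13", "loop@6", "A@7"],
    "T7": ["main()", "foo@13", "loop@6", "C@9"],
    "T8": ["main()", "foo@13", "loop@6", "A@7"],
}

def build_tct(samples):
    """Merge sampled call paths into one tree (per thread)."""
    root = {"frame": "<root>", "samples": [], "children": {}}
    for tid, path in samples.items():
        node = root
        for frame in path:
            node = node["children"].setdefault(
                frame, {"frame": frame, "samples": [], "children": {}})
            node["samples"].append(tid)
    return root

tct = build_tct(samples)
```

With this structure, main() and foo@13 end up covering T1-T8 while loop@6 covers T2-T8, mirroring the tree on the slide.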

SLIDE 52

Identify iterations

[Figure: loop@6 spans source lines 6-9: begin of loop@6, then A@7, B@8, C@9; the samples under loop@6 (T2-T8) are split into iterations #0, #1, #2]

Insert a new iteration when a back edge must have been taken

Use ParseAPI to analyze binaries
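A toy version of the back-edge rule is sketched below. The real analysis uses loop structure recovered from the binary by ParseAPI; this sketch only uses sampled source lines within one loop body:

```python
# Inside a loop body, a sample at a source line *before* the previous
# sample's line can only occur if the loop's back edge was taken, so it
# starts a new iteration. (Equal lines are conservatively kept in the
# same iteration, since no back edge is provably taken.)
def split_iterations(sampled_lines):
    iterations, current, prev = [], [], None
    for line in sampled_lines:
        if prev is not None and line < prev:  # back edge must have been taken
            iterations.append(current)
            current = []
        current.append(line)
        prev = line
    if current:
        iterations.append(current)
    return iterations

# Samples at lines 7, 8, 7, 9, 7, 8 inside loop@6 split into
# iterations #0, #1, #2.
```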

SLIDE 53

Approach

  1. Collect and prepare sample-based time-series for further analysis
     • Collect a time series of call paths with HPCToolkit
     • Organize each time series as a tree of program calling contexts
     • Identify iterative behaviors in the time series
  2. Build clusters across threads and loop iterations
  3. Quantify performance losses and attribute them to call paths

53

SLIDE 54

Clustering

Objective

  • Concisely summarize behaviors of a collection of threads executing many iterations
    • Represent a large set of instances with a few representatives
  • Identify variations across threads and iterations
    • Variations may indicate performance bottlenecks
    • Is there any variation? How large is it? Where does variation arise?

Steps

  • Quantify differences between Temporal Context Trees (TCTs)
  • K-farthest clustering [Bahmani, BIG DATA’15]
    • Time complexity = O(N*K*G); N = number of instances, K = number of clusters, G = size of TCTs
    • Multi-level clustering ✔
    • Parallelization ✔

54
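The greedy farthest-point scheme behind this step can be sketched as follows. This is a hedged illustration on scalars with a plain absolute-difference metric; the paper's version compares TCTs, where each distance evaluation costs O(G), giving the O(N*K*G) total quoted above:

```python
# Greedy farthest-point ("k-farthest") clustering: repeatedly pick the
# instance farthest from the centers chosen so far, then assign every
# instance to its nearest center. Caching each point's distance to its
# nearest center keeps this at O(N*K) distance evaluations.
def k_farthest(points, k, dist):
    centers = [0]  # start from an arbitrary instance
    nearest = [dist(p, points[0]) for p in points]
    while len(centers) < k:
        far = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(far)
        nearest = [min(nearest[i], dist(points[i], points[far]))
                   for i in range(len(points))]
    assignment = [min(centers, key=lambda c: dist(p, points[c]))
                  for p in points]
    return centers, assignment

# Hypothetical 1-D "instances": three natural groups
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 9.9]
centers, assignment = k_farthest(pts, 3, lambda a, b: abs(a - b))
```

On this toy data the three representatives land in the three groups, so a few representatives summarize all instances.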

SLIDE 55

Approach

  1. Collect and prepare sample-based time-series for further analysis
     • Collect a time series of call paths with HPCToolkit
     • Organize each time series as a tree of program calling contexts
     • Identify iterative behaviors in the time series
  2. Build clusters across threads and loop iterations
  3. Quantify performance losses and attribute them to call paths

55

SLIDE 56

Quantify performance losses

Variation across threads provides clues to performance losses

56

[Figure: timeline for P0-P2 over segments A, B, C; computation X takes 8s, 2s, and 6s across the processes, with synchronization Y = 7s and Z = 4s filling the slack; an optimized schedule that balances X (X = 4s, X = 2s with Y = 7s) is projected to improve run time by 37.5%]

SLIDE 59

Quantify imbalance

[Figure: same example timeline for P0-P2: computation X of 8s, 2s, and 6s, synchronization Y = 7s and Z = 4s, and the optimized schedule with X balanced]

imb(X) = max(X) - avg(X) = 8s - 4s = 4s

Assume X is computation and Y & Z are synchronization. For a computation node C:

  • imb(C) = projected reduction in execution time if work in C is balanced across threads
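The metric on this slide reduces to a one-liner, sketched below; the per-thread times are hypothetical values chosen so that max = 8s and avg = 4s, reproducing the slide's imb(X) = 4s:

```python
# Imbalance of a computation node: if its work were spread evenly,
# every thread would take the average time, so the projected saving is
# the gap between the slowest thread and the mean.
def imb(times_per_thread):
    return max(times_per_thread) - sum(times_per_thread) / len(times_per_thread)

# Hypothetical per-thread times for X (max 8s, avg 4s)
x_times = [8.0, 2.0, 2.0]
```

A perfectly balanced node has imb = 0, so only nodes with genuine variation across threads are flagged.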
SLIDE 61

Quantify imbalance

[Figure: same example timeline; the wait in synchronization Y mirrors the imbalance of the computation X that precedes it]

imb(Y) = imb(X) = 4s

Assume X is computation and Y & Z are synchronization. For a synchronization node S:

  • imb(S) = projected reduction in execution time if work between the prior synchronization and S is balanced

SLIDE 66

Attribute imbalance

[Figure: TCT node A with children X, Y, Z, annotated with imb(N) / sumImb(N): A = 0s / 8s, X = 4s / 4s, Y = 4s / 4s, Z = 0s / 0s]

Assume X is computation and Y & Z are synchronization. sumImb(N) for each node N in the TCT:

  • sumImb(N) = imb(N) if N is a leaf
  • sumImb(N) = Sum { sumImb(C) over every child C of N } otherwise
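The bottom-up attribution is a small recursion, sketched below with nodes modelled as (imb_seconds, children) pairs for illustration:

```python
# sumImb rolls losses up the temporal context tree: a leaf contributes
# its own imbalance, while an interior node aggregates its children.
def sum_imb(node):
    imb_n, children = node
    if not children:
        return imb_n          # leaf: its own imbalance
    return sum(sum_imb(child) for child in children)

# Slide example: A (imb 0s) with children X, Y, Z of imb 4s, 4s, 0s
a = (0.0, [(4.0, []), (4.0, []), (0.0, [])])
```

For the slide's example this yields sumImb(A) = 8s even though imb(A) itself is 0s: the loss originates in the children but is visible at the parent.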

SLIDE 67

Highlight significant call paths

Pick call paths that contribute significantly to imbalance

  • sumImb(N) / RunTime > significanceRatio (= 1%)
  • Avoid reporting call paths with tiny losses

Pick appropriate depths for significant call paths

  • imb(N) / sumImb(N) > appropriateDepthRatio (= 70%)
  • Avoid reporting too many children with small losses

Quantify and attribute waiting in a similar way

67

[Figure: example tree annotated with imb(N) / sumImb(N) as percentages of run time: the root at 0% / 10% and an interior node at 0.4% / 9.6% are traversed; nodes at 5% / 5%, 4% / 4%, 2.3% / 2.3%, and 1.8% / 1.8% are reported at this depth; nodes at 0.4% / 0.4% and 0.9% / 0.9% fall below the significance threshold]
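The two reporting rules combine into a short tree walk, sketched below; the node names and numbers are hypothetical, loosely mirroring the figure:

```python
# Prune subtrees whose total imbalance is at most significanceRatio of
# run time; stop descending (report the node itself) once its own imb
# accounts for more than appropriateDepthRatio of its subtree's sumImb.
# Nodes are (name, imb, sumImb, children) tuples for illustration.
def highlight(node, run_time, sig=0.01, depth_ratio=0.70, path=()):
    name, imb_n, sum_imb_n, children = node
    if sum_imb_n / run_time <= sig:
        return []                                # tiny loss: don't report
    if not children or imb_n / sum_imb_n > depth_ratio:
        return [path + (name,)]                  # appropriate depth reached
    out = []
    for child in children:
        out += highlight(child, run_time, sig, depth_ratio, path + (name,))
    return out

# Hypothetical tree with sumImb values measured in seconds of a 100s run
tree = ("main", 0.0, 10.0,
        [("solve", 0.4, 9.6,
          [("X", 5.0, 5.0, []),
           ("Y", 4.0, 4.0, []),
           ("U", 0.3, 0.3, []),
           ("V", 0.3, 0.3, [])]),
         ("init", 0.4, 0.4, [])])
reported = highlight(tree, run_time=100.0)
```

Here only the call paths ending at X and Y are reported: init, U, and V fall under the 1% significance threshold, and the walk stops at X and Y because their own imb dominates their subtrees.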

SLIDE 71

Experiments

Platform: Titan @ Oak Ridge National Laboratory

  • One 16-core AMD Opteron per node; one thread per core
  • Gemini interconnect (3D torus)

Applications:

  • PFLOTRAN @ 512 MPI ranks, ~178 seconds
    • Simulation of subsurface flow and reactive transport
    • Chemical reactions; environmental assessment
  • AMG2013 @ 512 MPI ranks, ~23 seconds
    • Parallel solver for structured/unstructured linear systems

71

SLIDE 74

Manual analysis of PFLOTRAN?

74

[Figure: time-series view of call paths (depth on the vertical axis) for P0-P511; run time = 178s]

Need to zoom in to examine the execution at a higher resolution. Need to select appropriate call path depths to derive detailed insights.

SLIDE 80

Visualization with automated insights

80

[Figure: clustered time-series view; run time = 178s; clusters #1 and #2 with processes P0, P1, P4, P60, P511 labeled and call path depth on the vertical axis]

Processes are ordered first by clusters and then by MPI rank. Height of each cluster is proportional to log2(size + 1). Each pixel shows a procedure frame on the call path; depths are selected by automated analysis. Insignificant call paths are colored grey. The execution is cut into several segments, which the user can zoom into.

SLIDE 87

Understand behavior of PFLOTRAN

87

[Figure: zoomed segment from 51.97s to 81.75s for clusters #1 and #2 (P0, P1, P4, P60, P511). Work is colored red, yellow, and brown; waiting is green and blue; synchronization is purple. The segment shows I/O work, MPI_Send, and MPI_Alltoall, with extreme load imbalance that suggests serialization]

SLIDE 91

Sketch of PFLOTRAN serialized I/O

If I’m P0

  • Write global grid to visualization file

For -- (runs 3 iterations)

  • MPI_Alltoall within local MPI group
  • If I’m not P0
    • MPI_Send my data to P0
  • Else
    • Write my data to visualization file
    • MPI_Recv data from P1 to P511 and write to visualization file

91

Annotations: the MPI_Send to P0 is the symptom of serialization; P0 alone writing (and receiving all ranks' data for) the visualization file is the cause. Cluster #1 (P4-P60) is in the same local MPI group as P0.

SLIDE 92

Conclusion of PFLOTRAN

Serialized I/O is causing performance loss

  • Automated analysis estimates the run time improvement: 178s → 66s
  • Replacing serial I/O with parallel I/O achieves: 178s → 70s

Presentation of automated insights of PFLOTRAN

  • Reduces analysis complexity of time series in three dimensions
    • Process (group into clusters), Time (split into segments), Depth (automatic selection)
  • Directs attention to potential performance losses
  • Helps users understand the causes of such losses

92

SLIDE 93

Summary

Automated analysis of parallel time-series performance data

  • Identifies potential inefficiencies in a large set of time series
  • Automation will be critical for analyzing performance on emerging exascale systems
  • Replace hours/days of manual effort with automated analysis

Future work

  • Visualize summarized iterative behaviors over time
  • Use semantic information for 1) MPMD applications; 2) more accurate diagnosis
  • Provide automated hints on how to fix highlighted performance losses

93