SLIDE 1

Detecting and Using Critical Paths at Runtime in Message Driven Parallel Programs

Isaac Dooley, Laxmikant V. Kale Department of Computer Science University of Illinois

idooley2@illinois.edu kale@illinois.edu

12th Workshop on Advances in Parallel and Distributed Computational Models IPDPS April 19, 2010

SLIDE 2

Motivation

  • Critical paths have historically been important in post-mortem (offline) parallel performance analysis.

  • Can they be computed online in message driven parallel languages?
  • Is critical path information useful in running parallel HPC programs?

SLIDE 3

Critical Paths in Parallel Programs

  • Existing algorithms for recording critical paths in a hybrid online/offline manner:
  • J Hollingsworth. An online computation of critical path profiling. In SPDT ʼ96: Proceedings of the SIGMETRICS symposium on Parallel and distributed tools, pages 11–20, New York, NY, USA, 1996
  • J Hollingsworth. Critical path profiling of message passing and shared-memory programs. Parallel and Distributed Systems, IEEE Transactions on, 9(10):1029–1040, Oct 1998
  • C Yang, B P Miller. Path Analysis for the Execution of Parallel and Distributed Programs. IEEE Transactions on Parallel and Distributed Systems, pages 1029–1040, Oct 1998
  • M Schulz. Extracting critical path graphs from MPI applications. Cluster Computing, 2005, pages 1–10, Sep 2005

  • For guiding expert post-mortem performance analysis
  • For visualizing parallel program execution to gain understanding

SLIDE 4

Critical Paths in Message Driven Parallel Programs

  • Message Driven Execution (as implemented in Charm++):
  • Tasks invoke methods asynchronously
  • An asynchronous method invocation results in:
  • New local task in queue, or
  • Message sent to remote processor, resulting in new task
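
The execution model above can be illustrated with a toy scheduler. This is only a sketch in Python, not Charm++ code: the names `Scheduler`, `invoke`, `deliver`, and `run_all` are hypothetical, and the "network" is just a dictionary of in-process queues. It shows the two outcomes of an asynchronous invocation: a new task in the local queue, or a message that becomes a task on the remote processor.

```python
from collections import deque


class Scheduler:
    """Toy per-processor scheduler holding a FIFO queue of pending tasks."""

    def __init__(self, pe, network):
        self.pe = pe
        self.network = network          # hypothetical: maps pe -> Scheduler
        self.queue = deque()
        network[pe] = self

    def invoke(self, target_pe, method, *args):
        """Asynchronous invocation: enqueue work, never run it inline."""
        if target_pe == self.pe:
            self.queue.append((method, args))              # new local task
        else:
            self.network[target_pe].deliver(method, args)  # message to remote processor

    def deliver(self, method, args):
        """A message arriving from another processor becomes a new task."""
        self.queue.append((method, args))

    def run_one(self):
        """Run the next queued task, if any. Returns True if a task ran."""
        if not self.queue:
            return False
        method, args = self.queue.popleft()
        method(self, *args)
        return True


def run_all(network):
    """Drain every queue, one task per processor per round."""
    while any([s.run_one() for s in network.values()]):
        pass


log = []


def hop(sched, hops):
    """Example task: record where it ran, then invoke itself on the other processor."""
    log.append((sched.pe, hops))
    if hops > 0:
        sched.invoke((sched.pe + 1) % 2, hop, hops - 1)


network = {}
p0 = Scheduler(0, network)
p1 = Scheduler(1, network)
p0.invoke(0, hop, 2)   # local invocation: queued, not executed yet
run_all(network)
print(log)             # [(0, 2), (1, 1), (0, 0)]
```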

SLIDE 5

Example Program Activity Graph

[Figure: example program activity graph for Processors 1 to 4 along a time axis, with labels A, B, and C. Legend: task (prefix), message, program-order dependency.]

  • Critical path profiles represent a path through the Program Activity Graph (PAG) composed of computation and messages.

SLIDE 6

Program Activity Graph

[Figure: the example PAG recorded as one table per processor (Processors 1 to 4). Each table row pairs a task prefix index with an in-edge written as (processor, index), such as (1,1) or (2,2), or "initial". Legend: task (prefix), message, program-order dependency; axis: time.]

  • The PAG can be recorded in a distributed graph as the program runs
  • Path weights include computation time, but not message send time

SLIDE 7

Finding Critical Paths

  • Record PAG as program runs
  • Augment each message with:
  • an identifier
  • path length
  • Record maximal incoming path for each task in a table
  • Requires compiler support or code modifications
  • Retrieve Critical Path for any task with a backwards traversal
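
The recording and retrieval steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Charm++ implementation: `Processor`, `run_task`, and `critical_path` are hypothetical names, each task has a single incoming message, and program-order dependencies are left out for brevity. Path lengths accumulate computation time only, as described for the PAG.

```python
class Processor:
    """Holds the per-processor table of (in_edge, path_length) entries."""

    def __init__(self, pe):
        self.pe = pe
        self.table = []   # table[i] describes the i-th task run on this processor

    def run_task(self, incoming, duration):
        """Record one task execution.

        incoming: header (src_pe, src_index, path_length) carried by the
                  triggering message, or None for an initial task.
        duration: computation time of this task (message time is not counted).
        Returns the header to attach to messages this task sends.
        """
        if incoming is None:
            in_edge, start = None, 0.0
        else:
            in_edge, start = (incoming[0], incoming[1]), incoming[2]
        path_length = start + duration
        self.table.append((in_edge, path_length))
        return (self.pe, len(self.table) - 1, path_length)


def critical_path(processors, pe, index):
    """Walk in-edges backwards from task (pe, index) to an initial task."""
    path = []
    edge = (pe, index)
    while edge is not None:
        path.append(edge)
        edge = processors[edge[0]].table[edge[1]][0]
    path.reverse()
    return path


procs = {p: Processor(p) for p in (1, 2)}
h1 = procs[1].run_task(None, 2.0)   # initial task on processor 1
h2 = procs[2].run_task(h1, 3.0)     # triggered by a message from processor 1
h3 = procs[1].run_task(h2, 1.5)     # triggered by a message from processor 2
print(h3)                           # (1, 1, 6.5)
print(critical_path(procs, 1, 1))   # [(1, 0), (2, 0), (1, 1)]
```

In a real distributed setting the backward traversal would itself hop between processors, since each table lives on the processor that ran the task.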

SLIDE 8

Implementation

  • Implemented in the Charm++ runtime system.
  • Supports multiple languages:
  • Charm++
  • Structured Dagger
  • Charisma
  • The tricky part is capturing multiple incoming dependencies:
  • Reductions
  • User maintains knowledge of dependencies satisfied by earlier tasks
  • Language specific dependency mechanisms
SLIDE 9

Costs of Recording Critical Paths

  • Cost of extra 8 bytes in message
  • Cost of adding table entries for each task execution
  • Cost of backwards traversal retrieval: Application Dependent

  • Microbenchmarks:

[Plot: overhead of recording the PAG (percentage, ticks 1 to 6) against tasks per second (ticks 1000 and 10000) for the 4-Neighbor (4pe) and Ring microbenchmarks.]
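
To make the 8-byte per-message cost concrete, here is one way an identifier and a path length could fit in 8 bytes. The layout is an assumption made for illustration (a 4-byte unsigned task index followed by a 4-byte float); the slides do not specify the actual format, and `add_header` and `split_header` are hypothetical helpers.

```python
import struct

# Assumed layout, not taken from the slides:
# little-endian 4-byte unsigned task index + 4-byte float path length.
HEADER = struct.Struct("<If")


def add_header(payload: bytes, index: int, path_length: float) -> bytes:
    """Prefix a message payload with the critical-path header."""
    return HEADER.pack(index, path_length) + payload


def split_header(message: bytes):
    """Return (index, path_length, payload) from a tagged message."""
    index, path_length = HEADER.unpack_from(message)
    return index, path_length, message[HEADER.size:]


msg = add_header(b"data", 12, 10.5)
print(HEADER.size)                    # 8
print(len(msg) - len(b"data"))        # 8 extra bytes on the wire
print(split_header(msg))              # (12, 10.5, b'data')
```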
SLIDE 10

Use: Automatic Task Priorities

  • Automatically Tuning Task Priorities:
  • OpenAtom Application
  • Record critical path for 20 iterations, then switch to new priorities based on observed critical path.
  • 10.2% speedup when prioritizing critical path task types
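
The idea of prioritizing critical path task types can be sketched with a priority queue. This is an illustrative Python sketch, not the OpenAtom or Charm++ mechanism: `priorities_from_path` and `PriorityTaskQueue` are hypothetical names, task types are plain strings, and a smaller number means a more urgent task.

```python
import heapq
from itertools import count


def priorities_from_path(critical_path_types, boosted=0):
    """Give every task type seen on the observed critical path a boosted priority."""
    return {task_type: boosted for task_type in set(critical_path_types)}


class PriorityTaskQueue:
    """Task queue ordered by per-type priority, FIFO within a priority level."""

    def __init__(self, priorities, default=1):
        self.priorities = priorities
        self.default = default
        self._heap = []
        self._seq = count()   # unique tie-breaker, preserves arrival order

    def push(self, task_type, task):
        prio = self.priorities.get(task_type, self.default)
        heapq.heappush(self._heap, (prio, next(self._seq), task_type, task))

    def pop(self):
        _prio, _seq, task_type, task = heapq.heappop(self._heap)
        return task_type, task


# Task types observed on the critical path during the recording iterations.
observed = ["a", "b", "b", "c"]
queue = PriorityTaskQueue(priorities_from_path(observed))
for task_type, task in [("z", "t1"), ("b", "t2"), ("y", "t3"), ("a", "t4")]:
    queue.push(task_type, task)

order = [queue.pop() for _ in range(4)]
print(order)   # [('b', 't2'), ('a', 't4'), ('z', 't1'), ('y', 't3')]
```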
SLIDE 11

Uses: Phase Detection

  • Critical path is retrieved
  • Frequently repeated subpaths are extracted
  • Cheap!

[Figure: retrieved critical paths written as sequences of task-type letters, each beginning "a b b b b b b c d e f g g g g g h i j ...". Subsequences such as "y v w b b b b" recur many times within each sequence.]
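
Extracting frequently repeated subpaths can be approximated by counting fixed-length windows over the critical path. This is a simplified sketch and may differ from the technique used in the talk: `frequent_subpaths` is a hypothetical helper, the window length is fixed by the caller, and overlapping occurrences are all counted.

```python
from collections import Counter


def frequent_subpaths(path, length, min_count=2):
    """Return (subpath, count) pairs for windows of `length` seen at least `min_count` times."""
    windows = (tuple(path[i:i + length]) for i in range(len(path) - length + 1))
    counts = Counter(windows)
    return [(sub, n) for sub, n in counts.most_common() if n >= min_count]


# Short made-up path of task types, in the style of the sequences on this slide.
path = "a b b c d b b c d b b c e".split()
repeated = frequent_subpaths(path, 3)
print(repeated[0])     # (('b', 'b', 'c'), 3)
print(len(repeated))   # 4
```

A single pass over the path with a hash table is enough, which is consistent with the slide's point that the extraction is cheap.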

SLIDE 12

Uses: Performance Analysis

  • Visualization:
SLIDE 13

Uses: Filter Performance Data

  • Reducing volume of performance analysis data
  • Filter out processors not on critical path
  • Performance analyst needs to manipulate and view fewer files
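
Filtering by critical path can be sketched as below. The example assumes one trace file per processor and a critical path given as (processor, index) pairs; the file names and the helpers `processors_on_path` and `filter_logs` are hypothetical.

```python
def processors_on_path(critical_path):
    """Set of processors that executed at least one task on the critical path."""
    return {pe for pe, _index in critical_path}


def filter_logs(log_files, critical_path):
    """Keep only the per-processor trace files whose processor is on the path."""
    keep = processors_on_path(critical_path)
    return {pe: name for pe, name in log_files.items() if pe in keep}


# Hypothetical per-processor trace files and a critical path touching two of them.
log_files = {1: "trace.1.log", 2: "trace.2.log", 3: "trace.3.log", 4: "trace.4.log"}
path = [(1, 0), (2, 0), (1, 1)]
kept = filter_logs(log_files, path)
print(kept)   # {1: 'trace.1.log', 2: 'trace.2.log'}
```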
SLIDE 14

Conclusion

  • Our Contribution:

Critical paths can be recorded and used in message driven parallel programs at runtime for tuning message priorities.

SLIDE 15

Thanks & Questions

Detecting and Using Critical Paths at Runtime in Message Driven Parallel Programs

Isaac Dooley, Laxmikant V. Kale Department of Computer Science University of Illinois

12th Workshop on Advances in Parallel and Distributed Computational Models April 19, 2010

SLIDE 16

Handling Multiple Input Dependencies

[Figure: messages between Processors 1, 2, 3, and 4, each labelled with a tuple (Source Processor, Source Index, Cumulative Path Duration).]

  • Incoming dependencies: (1,17,7.3), (2,12,10.5), (4,19,9.1)
  • Maximum duration incoming dependency = (2,12,10.5)
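
The selection shown on this slide amounts to taking the maximum over the cumulative path duration field. A minimal Python sketch using the three tuples from the slide; `max_incoming` is a hypothetical helper name.

```python
def max_incoming(dependencies):
    """Pick the dependency with the longest cumulative path duration.

    Each dependency is (source_processor, source_index, cumulative_path_duration).
    """
    return max(dependencies, key=lambda dep: dep[2])


incoming = [(1, 17, 7.3), (2, 12, 10.5), (4, 19, 9.1)]
best = max_incoming(incoming)
print(best)   # (2, 12, 10.5)
```

Only the winning tuple needs to be stored as the task's in-edge, so a task that waits on several messages can keep a running maximum instead of all incoming headers.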