
A Generic Adaptive Runtime Autotuning Framework

Isaac Dooley, 7th Annual Workshop on Charm++ and its Applications, Thursday, April 16th, 2009


  1. A Generic Adaptive Runtime Autotuning Framework
     Isaac Dooley
     7th Annual Workshop on Charm++ and its Applications
     Thursday, April 16th, 2009

  2. Existing Parallel Programming Models
     MPI Model: One Thread Per Processor
       Application
       Parallel Runtime System
     Charm++ Model: Overdecomposition
       Application
       Parallel Runtime System
       Dynamic Load Balancing of Chare Objects to Processors
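To make "Dynamic Load Balancing of Chare Objects to Processors" concrete, here is a minimal sketch of one common heuristic: place the heaviest objects first, each on the currently least-loaded processor. The slide does not say which strategy the runtime uses, so the greedy rule and the names `greedyAssign` and `processorLoads` are assumptions for illustration, not Charm++ code. The sketch also assumes `numProcs` is at least 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of Charm++: map each object to a processor,
// taking the heaviest objects first and giving each one to the processor
// that currently carries the least load.
std::vector<int> greedyAssign(const std::vector<double>& objectLoads, int numProcs) {
    // Object indices sorted by descending load.
    std::vector<std::size_t> order(objectLoads.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return objectLoads[a] > objectLoads[b];
                     });

    std::vector<double> procLoad(numProcs, 0.0);
    std::vector<int> assignment(objectLoads.size(), -1);
    for (std::size_t obj : order) {
        auto least = std::min_element(procLoad.begin(), procLoad.end());
        *least += objectLoads[obj];
        assignment[obj] = static_cast<int>(least - procLoad.begin());
    }
    return assignment;
}

// Sum the load that an assignment places on each processor.
std::vector<double> processorLoads(const std::vector<double>& objectLoads,
                                   const std::vector<int>& assignment,
                                   int numProcs) {
    std::vector<double> procLoad(numProcs, 0.0);
    for (std::size_t i = 0; i < objectLoads.size(); ++i) {
        procLoad[assignment[i]] += objectLoads[i];
    }
    return procLoad;
}
```

With many more objects than processors (overdecomposition), a heuristic like this has enough small pieces to even out the per-processor totals, which is not possible when there is exactly one unit of work per processor.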

  3. Runtime System Controls the Application
     [Diagram labels: Application; Control Points; Instrumented Performance Characteristics; Parallel Runtime System; Adaptive Control System; Knowledge of Control Points; Experiment History]

  4. Intelligent Tuning
     Measured Performance Metrics (Input to Controller)
       Processor Utilization
       Processor Overhead
       Memory Utilization
       Cache Performance
       Communication Volume
       Critical Path Profiling
     Descriptive Categorizations for Application Behavior as Control Point Values are Increased
       Task Decomposition Granularity
       Task Scheduling Priorities
       Degree of Pipeline Streaming
       Memory Usage
       Application Decomposition Granularity
       Prefetch / Lookahead Distance
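One way to read this slide is that a controller combines a measured metric with the declared effect of raising a control point to decide which direction to move it. The sketch below illustrates that idea only. The slide gives no rules, so the rule itself, the 20% thresholds, and every name (`EffectOfIncrease`, `ControlPointInfo`, `MeasuredMetrics`, `adjust`) are hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical categorization: what happens as the control point value grows.
enum class EffectOfIncrease { CoarserGranularity, FinerGranularity };

struct ControlPointInfo {
    std::string name;
    int lo;
    int hi;
    int value;
    EffectOfIncrease effect;
};

// Hypothetical measured metrics, both expressed as fractions of wall time.
struct MeasuredMetrics {
    double overheadFraction;  // time spent in per-task runtime overhead
    double idleFraction;      // time processors sat idle
};

// Assumed rule, not taken from the slides:
//   high overhead -> tasks are too small, so move toward coarser grains;
//   high idle time -> too little parallel slack, so move toward finer grains.
// Returns the next value to try, clamped to the control point's range.
int adjust(const ControlPointInfo& cp, const MeasuredMetrics& m) {
    int wantCoarser = 0;
    if (m.overheadFraction > 0.20) {
        wantCoarser = 1;
    } else if (m.idleFraction > 0.20) {
        wantCoarser = -1;
    }
    int step = (cp.effect == EffectOfIncrease::CoarserGranularity) ? wantCoarser
                                                                   : -wantCoarser;
    return std::max(cp.lo, std::min(cp.hi, cp.value + step));
}
```

The point of the categorization is that the controller never needs to know what "grain size" means inside the application. It only needs to know which way the observed behavior moves when the value goes up.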

  5. Control Point API
     Application Exposes Control Point Values:
       int controlPointValue = controlPoint("Control Point Name", 1, 50);
     Application Specified Performance:
       registerControlPointTiming(time);
     Control Point Framework Instructs Application to adapt:
       CkCallback myCallback(CkIndex_Main::controlPointChange(NULL), proxy);
       registerControlPointChangeCallback(myCallback);
     Describe Knowledge:
       controlPointPriorityArray("Control Point Name", ArrayProxy);
       controlPointPriorityEntry("Control Point Name", EntryMethod);
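The call pattern on this slide is: ask for a control point value, run a phase with it, then report how long the phase took. The sketch below mimics that loop with a stand-in class so it compiles without Charm++. `SingleControlPointSweep`, its one-pass sweep policy, and `simulatedPhaseTime` are all hypothetical. They are not the framework's implementation, and the real framework's search strategy is not described in the slides.

```cpp
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical stand-in for the framework, NOT the Charm++ implementation.
// It handles a single control point: it tries each value in [lo, hi] once,
// then keeps returning the value with the lowest reported time.
class SingleControlPointSweep {
public:
    // Same shape as controlPoint(name, lo, hi) on the slide. The name is
    // accepted only to mirror that call; this mock tracks one point.
    int controlPoint(const std::string& /*name*/, int lo, int hi) {
        if (!initialized_) {
            hi_ = hi;
            current_ = lo;
            best_ = lo;
            initialized_ = true;
        }
        return current_;
    }

    // Same role as registerControlPointTiming(time) on the slide: report
    // the time of the phase that ran with the most recently returned value.
    void registerTiming(double seconds) {
        if (!initialized_) return;
        if (seconds < bestTime_) {
            bestTime_ = seconds;
            best_ = current_;
        }
        if (exploring_ && current_ < hi_) {
            ++current_;            // still sweeping the range
        } else {
            exploring_ = false;    // sweep finished: settle on the best value
            current_ = best_;
        }
    }

private:
    int hi_ = 0;
    int current_ = 0;
    int best_ = 0;
    double bestTime_ = std::numeric_limits<double>::infinity();
    bool initialized_ = false;
    bool exploring_ = true;
};

// Hypothetical cost model standing in for a real timed phase:
// fastest when the control point equals 7.
double simulatedPhaseTime(int grain) {
    return 1.0 + std::abs(grain - 7);
}
```

An application loop would call `controlPoint("grain", 1, 10)` at the start of each phase and `registerTiming(...)` at the end. With the simulated cost above, the mock settles on 7 once it has swept the range.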

  6. Use Cases
     Adjust task/data granularity
     Adjust scheduling priorities
     Adjust load balancing parameters
     Choose algorithmic alternatives
     Apply various communication optimizations

  7. Tuning Critical Path Priorities

  8. Control Point Configuration Space: Pipelined Filtering
     [Figure: grid of configurations. Axes: Number of Worker Chares (Pipeline Stages), 1 to 64; Input Slice Size, 1 to 1024]
     Legend:
       Performance within 1.0% of best
       Performance within 2.0% of best
       Performance less than 98.0% of best
       Smaller Squares Represent Lower Performance

  9. Control Point Configuration Space: 2D Jacobi
     [Figure: grid of configurations. Axes: Number of Worker Chares (partitions) in X Dimension, 1 to 50; Number of Worker Chares (partitions) in Y Dimension, 1 to 50]
     Legend:
       Performance within 1.0% of best
       Performance within 2.0% of best
       Performance less than 98.0% of best
       Smaller Squares Represent Lower Performance
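The legend in the two configuration-space figures sorts every configuration into one of three bands relative to the best one. A minimal sketch of that classification follows. The slides do not define "performance", so this assumes it is the ratio of the best measured time to a configuration's time, with all times positive. `Sample`, `Band`, and `classify` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// The three bands from the figure legend.
enum class Band { Within1Percent, Within2Percent, Below98Percent };

// One measured configuration: two control point values and a run time.
struct Sample {
    int x;
    int y;
    double seconds;
};

// Classify each sample by its performance relative to the best sample.
// Assumption: relative performance = bestSeconds / seconds, seconds > 0.
std::vector<Band> classify(const std::vector<Sample>& samples) {
    std::vector<Band> bands;
    if (samples.empty()) return bands;

    double bestSeconds = samples.front().seconds;
    for (const Sample& s : samples) {
        bestSeconds = std::min(bestSeconds, s.seconds);
    }

    for (const Sample& s : samples) {
        double relative = bestSeconds / s.seconds;
        if (relative >= 0.99) {
            bands.push_back(Band::Within1Percent);
        } else if (relative >= 0.98) {
            bands.push_back(Band::Within2Percent);
        } else {
            bands.push_back(Band::Below98Percent);
        }
    }
    return bands;
}
```

Measuring every cell of a 50 by 50 grid this way is an exhaustive search, which is useful for drawing the figures but expensive at run time. That cost is the motivation for a controller that steers the search instead.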

  10. Future Work
      Improve critical path profiles.
      Detect & fix more patterns of known performance problems.
      Use with complicated applications & algorithms such as MD and LU.
      Find appropriate ways to expose application knowledge.
      Build an expert system combining all the patterns we discover.

  11. The End
      Questions? Suggestions?
      Isaac Dooley
      idooley2@uiuc.edu
