Opportunistic Computing: A New Paradigm for Scalable Realism on Many-Cores
Romain Cledat, Tushar Kumar, Jaswanth Sreeram, and Santosh Pande


  1. Opportunistic Computing: A New Paradigm for Scalable Realism on Many-Cores. Romain Cledat, Tushar Kumar, Jaswanth Sreeram, and Santosh Pande.

  2. Speedup Is Not Always the End-Goal
     - Immersive applications intend to provide the richest, most engrossing experience possible to the interactive user: gaming, multimedia, interactive visualization.
     - With a growing number of cores or increasing clock frequencies, these applications want to do MORE, not just do it FASTER.
     - Design goal: maximize realism. Merely finishing the same computation faster leaves CPUs idling for the rest of the per-frame time budget, with no benefit; the application must continually update the world and respond to the interactive user (30 frames per second).
     [Figure: per-frame computation on fewer cores vs. more, faster cores; more computation gives enhanced realism, merely faster computation gives idling CPUs.]

  3. What is Realism?
     - Realism consists of:
       - Sophistication in modeling. Example: render/animate as highly detailed a simulated world as possible.
       - Responsiveness. Example: update the world frequently, respond "instantly" to user inputs.
     - Unit of world update: the frame.
     - Typical programming goal: pick models/algorithms of as high a sophistication as possible that can execute within a frame deadline of 1/30 seconds.
     - Flexibility: probabilistic achievement of realism is sufficient. Most frames (say, >90%) must complete within 10% of the frame deadline; relatively few frames (<10%) may complete very early or very late.

  4. How do we Maximize Realism? Two complementary techniques:
     - #1: N-Version Parallelism: speed up hard-to-parallelize algorithms with high probability using more cores.
       - Applies to algorithms that make random choices.
       - Basic intuition: randomized algorithms (but not limited to them).
     - #2: Scalable Soft Real-Time Semantics (SRT): scale application semantics to the available compute resources.
       - Applies to algorithms whose execution time, multi-core resource requirements, and sophistication are parametric.
       - Basic intuition: real-time systems (but with different formal techniques).
     - Unified as the Opportunistic Computing paradigm: N-versions creates slack for SRT to utilize for realism.

  5. #1 N-Version Parallelism: Speed up Sequential Algorithms with High Probability

  6. Bottleneck for Speedup
     - Applications still have significant sequential parts.
     - Stagnation in processor clock frequencies makes the sequential parts the major bottleneck to speedup (Amdahl's Law).
     - A reduction in the expected execution time of the sequential parts of an application provides more slack to improve realism.
     [Figure: parallel parts of the application continue to speed up with more cores; the sequential parts form the speedup bottleneck.]
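     For reference, the bound that the Amdahl's Law bullet alludes to (standard material, not spelled out on the slide): with sequential fraction s and p cores,

        S(p) = \frac{1}{\,s + (1 - s)/p\,}, \qquad \lim_{p \to \infty} S(p) = \frac{1}{s}

     Even with unbounded cores, a 20% sequential fraction caps the speedup at 5x, which is why shortening the sequential parts themselves is what creates slack.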

  7. Intuition
     - Algorithms making random choices for a fixed input lead to varying completion times.
     - Run n instances in parallel, under isolation: the fastest among the n is faster than average with high probability.
     - Speedup S = E_1 / E_n, where E_1 is the expected completion time of a single instance and E_n is the expected completion time of the fastest of n concurrent instances.
     - There are big opportunities for expected speedup with increasing n, traded off against the number of cores consumed.
     - Requires knowledge of the distribution; a wider spread means more speedup.
     [Figures: completion-time distributions (uniform, bimodal) with E_1 through E_4 marked, and speedup vs. n (# of cores) compared against linear.]
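     A minimal sketch (not from the slides) of this intuition: estimate S = E_1 / E_n by Monte Carlo, racing n independent draws of a completion time and keeping the fastest. The exponential distribution below is an arbitrary stand-in for a real algorithm's completion-time distribution.

        // Monte Carlo estimate of the n-version speedup S = E_1 / E_n, where
        // E_n is the expected completion time of the fastest of n instances.
        // The exponential completion-time model is an illustrative assumption.
        #include <algorithm>
        #include <iostream>
        #include <random>

        int main() {
            std::mt19937 gen(42);
            std::exponential_distribution<double> completion_time(1.0);  // mean E_1 = 1
            const int trials = 100000;

            for (int n = 1; n <= 4; ++n) {
                double sum_min = 0.0;
                for (int t = 0; t < trials; ++t) {
                    double best = completion_time(gen);
                    for (int i = 1; i < n; ++i)
                        best = std::min(best, completion_time(gen));
                    sum_min += best;
                }
                double e_n = sum_min / trials;                  // estimate of E_n
                std::cout << "n=" << n << "  S=" << 1.0 / e_n << "\n";
            }
            return 0;
        }

     For the exponential case E_n = E_1 / n, so the race happens to track linear speedup; distributions with a wider spread relative to their mean are where the technique gains the most.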

  8. Application Use Scenario
     - Goal: find a reasonable n to reduce the expected completion time of algorithm A on the next input I_j.
     - Computing the speedup S requires knowledge of PDF[A(I_j)], the distribution of completion times of A on I_j. How do we get it?
     - Determine PDF[A(I_{j-1}) ... A(I_{j-M})] from the previous M inputs, and assume PDF[A(I_j)] ≈ PDF[A(I_{j-1}) ... A(I_{j-M})] (the stability condition).
     - The stability condition gives predictive power. When will it hold?
     - We want to determine the speedup S and the number of concurrent instances n for A(I_j) from this PDF, with no prior knowledge of the underlying distribution.
     [Figure: the program processes inputs I_{j-M} ... I_{j-1}; the observed completion times yield PDF[A(I_j)] with E_1 (mean) and E_2 marked.]
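     One way to act on the stability condition (an illustrative sketch, not the authors' runtime): keep the last M observed completion times and bootstrap an estimate of S(n) from them, with no parametric assumptions about the distribution.

        // Sliding-window, distribution-free estimate of the n-version speedup,
        // relying on PDF[A(I_j)] ≈ PDF[A(I_{j-1}) ... A(I_{j-M})].
        #include <algorithm>
        #include <deque>
        #include <numeric>
        #include <random>

        class SpeedupEstimator {
            std::deque<double> history_;   // completion times of the last M runs
            std::size_t M_;
            std::mt19937 gen_{123};
        public:
            explicit SpeedupEstimator(std::size_t M) : M_(M) {}

            void record(double completion_time) {
                history_.push_back(completion_time);
                if (history_.size() > M_) history_.pop_front();
            }

            // Bootstrap estimate of S(n) = E_1 / E_n, using the history as an
            // empirical PDF.
            double speedup(int n, int resamples = 10000) {
                if (history_.empty()) return 1.0;
                std::uniform_int_distribution<std::size_t> pick(0, history_.size() - 1);
                double e1 = std::accumulate(history_.begin(), history_.end(), 0.0) /
                            static_cast<double>(history_.size());
                double sum_min = 0.0;
                for (int r = 0; r < resamples; ++r) {
                    double best = history_[pick(gen_)];
                    for (int i = 1; i < n; ++i)
                        best = std::min(best, history_[pick(gen_)]);
                    sum_min += best;
                }
                return e1 / (sum_min / resamples);
            }
        };

     A caller would record() each run's completion time and query speedup(n) for increasing n, stopping at the smallest n whose marginal gain no longer justifies the extra core.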

  9. PDF and Stability Condition: PDF[A(I_j)] ≈ PDF[A(I_{j-1}) ... A(I_{j-M})]
     - Obtaining the PDF:
       - Randomized algorithms: the PDF is analytically known, e.g. graph algorithms in terms of V and E. It depends on input size and parameters (referred to as the "size"), and the "size" might be unknown.
       - Other algorithms: the PDF is analytically unknown or intractable, so it must come from runtime estimation.
     - When the stability condition holds:
       - Statically over j, for inputs of the same "size".
       - For sufficiently slow variations, i.e. |I_{j-M}| ≈ … ≈ |I_{j-1}| ≈ |I_j|. Example: TSP for trucks in the continental United States, with a fixed grid size and similar paths.

  10. N-Version Parallelism in C/C++
     - Render each instance side-effect free: local state is left as is, while non-local state is wrapped with an API call (C++ can eliminate the API wrappers).

        int a[];                // becomes:  Shared<int> a[];
        void f(Input) {
            int b = …;          // local state: leave as is
            a[k] = …;           // non-local state: wrap with API call
        }

     - Start n versions f(I) in parallel; the n-versions completion time is that of the fastest instance, and only that instance commits its non-local state.
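     A sketch of this execution model, using a plain per-instance write buffer in place of the slide's Shared<T> wrapper (the helper names here are illustrative, not the authors' API): every instance is side-effect free, and the first to finish commits.

        // n-version execution sketch: each instance buffers its non-local writes;
        // the first instance to finish wins a compare-and-swap and commits, the
        // others are discarded.
        #include <atomic>
        #include <future>
        #include <map>
        #include <random>
        #include <vector>

        std::vector<int> a(1024);            // non-local (shared) state
        std::atomic<bool> committed{false};  // has some instance committed yet?

        struct WriteBuffer {                 // per-instance buffer of non-local writes
            std::map<std::size_t, int> pending;
        };

        void commit(const WriteBuffer& buf) {
            for (const auto& [idx, val] : buf.pending) a[idx] = val;
        }

        // Side-effect-free version of f: local state stays local, non-local writes
        // are buffered instead of applied.
        void f_version(int input, unsigned seed) {
            std::mt19937 gen(seed);          // each instance makes its own random choices
            WriteBuffer buf;
            int b = static_cast<int>(gen() % 100);                       // local state
            buf.pending[static_cast<std::size_t>(input) % a.size()] = b; // non-local state
            bool expected = false;
            if (committed.compare_exchange_strong(expected, true))
                commit(buf);                 // only the fastest instance commits
        }

        void run_n_versions(int input, int n) {
            committed = false;
            std::vector<std::future<void>> versions;
            for (int i = 0; i < n; ++i)
                versions.push_back(std::async(std::launch::async, f_version,
                                              input, 1000u + static_cast<unsigned>(i)));
            for (auto& v : versions) v.get();  // losers' buffers are simply dropped
        }

     Calling run_n_versions(I, n) matches the slide's picture: n copies of f race on the same input, and exactly one result R_k reaches the shared state.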

  11. Current Avenues of Research
     - How broad is the class of algorithms that make random choices and satisfy the stability condition?
     - Exploring common randomized algorithms: TSP over a fixed grid, randomized graph algorithms.
     - Exploring the applicability of our technique to application-specific characteristics that indirectly benefit performance:
       - Reducing the number of iterations in a genetic algorithm by minimizing the expected score at each iteration.
       - Or achieving a better final score (higher quality of result), independent of performance gains.

  12. #2 Scalable Soft Real-Time Semantics (SRT): Scale Application Semantics to Available Compute Resources

  13. Applications with Scalable Semantics
     - Games, multimedia codecs, and interactive visualization possess scalable semantics.
     - Characteristic 1: user-responsiveness is crucial, so model/algorithmic complexity must be suitably adjusted and bounded.
     - Characteristic 2: dynamic variations in execution time over the data set.
     - To preserve responsiveness while maximizing sophistication, continually monitor time and scale algorithmic complexity (semantics): unused slack compromises realism by not maximizing sophistication, while a significantly missed deadline hurts responsiveness. A small monitor-and-scale sketch follows this slide.
     [Figure: game frames (AI + physics) at approx. 30 fps against the 1/30 s frame time. Frames 0-10: scale down AI complexity (think-frequency, vision-range). Frames 50-60: slack, so scale up AI & physics complexity (sim time-step, effects modeled). Frames 80-90: missed deadline, so scale down physics complexity.]
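     A minimal sketch of that monitor-and-scale loop (the knob, thresholds, and update functions are hypothetical stand-ins, not the SRT runtime):

        // Per-frame loop: measure elapsed time against the 1/30 s deadline and
        // nudge a complexity knob (think AI think-frequency or vision-range) down
        // on misses and up when there is slack.
        #include <chrono>
        #include <thread>

        constexpr double kFrameDeadline = 1.0 / 30.0;  // seconds
        int complexity = 5;                            // hypothetical knob: 1 (simple) .. 10 (rich)

        void update_ai(int level) {                    // stand-in for the real AI update
            std::this_thread::sleep_for(std::chrono::milliseconds(2 * level));
        }
        void update_physics_and_render() {             // stand-in for the rest of the frame
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }

        void run_frame() {
            auto start = std::chrono::steady_clock::now();
            update_ai(complexity);
            update_physics_and_render();
            std::chrono::duration<double> elapsed =
                std::chrono::steady_clock::now() - start;

            if (elapsed.count() > 1.10 * kFrameDeadline && complexity > 1)
                --complexity;   // deadline missed by >10%: scale down sophistication
            else if (elapsed.count() < 0.80 * kFrameDeadline && complexity < 10)
                ++complexity;   // visible slack: spend it on more realism
        }

     Driving run_frame() from the engine's main loop lets the knob settle near the largest value the hardware sustains at 30 fps.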

  14. Scaling Semantics with Multi-cores
     - Traditionally, benefiting from more cores required breaking up the same computation into more parallel parts, a difficult problem for many applications, including gaming and multimedia.
     - Scalable semantics provide an additional mechanism to utilize more cores (a small selection sketch follows this slide):
       - Scaling algorithms with resources: the same data D run through A_simple, A_medium, or A_sophisticated. Example: scripted game-world interactions with unbreakable objects vs. open-ended game-world interactions with dynamic fracture mechanics.
       - Scaling data sets with resources: the same algorithm A applied to D_1 (simple game objects), D_2, or D_3 (fine-grain polytope objects).
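     A small illustration of "scaling algorithms with resources"; the variant names and core thresholds are invented for this sketch.

        // Pick the most sophisticated algorithm variant the machine appears able
        // to sustain; the thresholds are arbitrary placeholders.
        #include <thread>

        enum class AlgoVariant { Simple, Medium, Sophisticated };

        AlgoVariant pick_variant() {
            unsigned cores = std::thread::hardware_concurrency();  // may be 0 if unknown
            if (cores >= 16) return AlgoVariant::Sophisticated;    // e.g. dynamic fracture mechanics
            if (cores >= 4)  return AlgoVariant::Medium;
            return AlgoVariant::Simple;                            // e.g. unbreakable objects
        }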

  15. Don't Real-Time Methods Solve This Already?
     - Games, multimedia, and interactive visualization could in principle be implemented as real-time applications: the application is decomposed into a real-time task graph (tasks T0 ... T7 with precedence constraints), and responsiveness is guaranteed by real-time semantics (hard or probabilistic).
     - In practice they are implemented with high-productivity, large-scale programming flows (C, C++, Java) as monolithic applications: 100Ks to millions of LoC, no analyzable structure for responsiveness and scaling, and responsiveness is entirely an emergent attribute (currently tuning it is an art).
     - We need a new bag of tricks to scale semantics in monolithic applications.

  16. Scaling Semantics in Monolithic Applications
     - Challenge for monolithic applications: C/C++/Java do not express user-responsiveness objectives or scalable semantics.
     - Our approach: let programmers specify a responsiveness policy and scaling hooks using the SRT API, and let the SRT runtime determine how to achieve that policy by manipulating the provided hooks.
     - The SRT API lets programmers specify policy and hooks based purely on their knowledge of the functional design of individual algorithms and application components, without requiring them to anticipate the emergent responsiveness behavior of interacting components.
     - The SRT runtime is based on machine learning and system identification (control theory), enabling it to infer the structure of the application, learn cause-effect relationships across that structure, and statistically predict how manipulating the hooks will scale semantics in the manner that best achieves the desired responsiveness policy.
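     What "specify policy and hooks" might look like in code. Every name below (srt::Runtime, set_objective, add_choice) is hypothetical and invented for illustration; the slides do not show the actual SRT API.

        // Hypothetical registration of a responsiveness objective and two scaling
        // hooks; an SRT-style runtime would then adjust the hooks via learning and
        // feedback control (omitted here).
        #include <vector>

        namespace srt {
        struct Objective { double min_fps, max_fps, probability; };

        class Runtime {
        public:
            void set_objective(const Objective& o) { objective_ = o; }
            // A "hook": an integer knob the runtime may move within [lo, hi].
            void add_choice(const char* name, int* knob, int lo, int hi) {
                hooks_.push_back({name, knob, lo, hi});
            }
        private:
            struct Hook { const char* name; int* knob; int lo, hi; };
            Objective objective_{};
            std::vector<Hook> hooks_;
        };
        }  // namespace srt

        int ai_detail = 3;            // scaling hooks exposed by the programmer
        int physics_substeps = 2;

        int main() {
            srt::Runtime rt;
            rt.set_objective({25.0, 40.0, 0.90});   // e.g. 25-40 fps with probability > 90%
            rt.add_choice("ai_detail", &ai_detail, 1, 10);
            rt.add_choice("physics_substeps", &physics_substeps, 1, 8);
            return 0;
        }

     The shape matters more than the names: the programmer describes only local knobs and a global objective, and deciding which knob to move, and when, is left to the runtime's learning and control machinery.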

  17. Case Study: Incorporating the SRT API & Runtime in a Gaming Application
     - Typical game engine: a "Game" frame (run_frame()) contains Physics, AI, and Rendering frames; each model exposes choices of user code (e.g. simple vs. complex/parallel) that affect frame times and objectives.
     - Responsiveness objectives: the "Game" frame achieves 25 to 40 fps with probability > 90%; a sub-frame's objective is to consume < 40% of the "Game" frame.
     - SRT Runtime:
       - Monitors the frame objectives, learns the application-wide average frame structure, and chooses between the user codes within each model.
       - Learns and caches statistical relations. Reinforcement learning: which models predominantly affect which objectives? (infers complex relationships, slowly). Feedback control: adjusts the choices in models (simple, medium, complex, …) to meet objectives (fast reaction).
