
Model-Driven Computational Sprinting
Nathaniel Morris, Christopher Stewart, Lydia Chen, Robert Birke, Jaimie Kelley

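Computational Sprinting
[Raghavan, 2012]: The processor improves application responsiveness by temporarily exceeding its sustainable thermal budget
Sprinting mechanisms:
● (1) DVFS: clock rate rises from 1.3 GHz to 2.2 GHz during the sprint
● (2) Core scaling: more cores (by ID) are active during the sprint
[Figure: clock rate vs. time under DVFS; active cores (by ID) vs. time under core scaling]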

Computational Sprinting (cont.)
Sprinting budget constrains total time in sprint mode
● For example, 6 minutes per hour (AWS Burstable)
Budget is defined by scarce resources
● Thermal capacitance (Raghavan, 2012)
● Energy (Zheng, 2015; Fan, 2016)
● Reserve CPU cycles in co-located contexts (AWS)
Sprinting policy = mechanism + budget + trigger
SLO-driven services use timeouts to trigger sprinting [Haque, 2012; Hsu, 2015]

Sprinting Example
Example SLO: Complete 99% of queries within 2 seconds
Example policy: Execute at 1.3 GHz. Time out after 1.5 seconds, then set DVFS to 2.2 GHz until (1) the query completes or (2) the 50 J budget is exhausted
Root causes of a timeout: (1) Slow execution (2) Long queuing delay
[Figure: two timelines from 0 to the 1.5 s timeout (TO), one per root cause, showing queuing, processing, query execution, sprinting, and energy used]
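The example policy can be traced for a single query with a short sketch. This is only an illustration under stated assumptions: the function name `run_query`, the work unit (giga-cycles), and `sprint_extra_watts` (the extra power drawn while sprinting) are hypothetical and not from the slides. It also assumes the timeout clock starts at arrival, that a sprint applies only while this query executes, and that the 50 J budget counts only the extra sprint energy.

```python
def run_query(work_gcycles, queue_delay, timeout=1.5, base_ghz=1.3,
              sprint_ghz=2.2, budget_j=50.0, sprint_extra_watts=20.0):
    """Return (response_time_s, sprint_energy_j) for one query.

    sprint_extra_watts is a made-up value used only for illustration.
    """
    start = queue_delay                      # processing begins after queuing
    finish_base = start + work_gcycles / base_ghz
    if finish_base <= timeout:               # done before the timeout: no sprint
        return finish_base, 0.0

    # Work completed at the base clock before the timeout fires
    # (zero if the query is still queued at that point).
    done = max(0.0, timeout - start) * base_ghz
    remaining = work_gcycles - done
    sprint_start = max(timeout, start)

    max_sprint_s = budget_j / sprint_extra_watts   # how long the budget lasts
    needed_s = remaining / sprint_ghz              # sprint time to finish the query
    if needed_s <= max_sprint_s:                   # (1) query completes while sprinting
        return sprint_start + needed_s, needed_s * sprint_extra_watts

    # (2) budget is exhausted first: finish the leftover work at the base clock.
    leftover = remaining - max_sprint_s * sprint_ghz
    return sprint_start + max_sprint_s + leftover / base_ghz, budget_j


# Slow execution: 2.6 giga-cycles would take 2.0 s at 1.3 GHz.
rt, energy = run_query(work_gcycles=2.6, queue_delay=0.0)
print(f"response time {rt:.3f} s, sprint energy {energy:.2f} J")
```

With these assumed numbers the query would miss the 2-second target without sprinting and finishes before it with sprinting.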

Sprinting Policies Are Hard to Set
With sprinting, dynamic runtime factors determine query execution time
● e.g., queue length, speedup from sprinting, remaining budget
How to set timeout policies and budgets?
● State of practice: Same sprinting policy for all workloads [AWS Burstable]
● State of the art: Target slower-than-expected query executions [Hsu, 2016]; target high utilization [Haque, 2015]
These approaches are heuristic-driven; they could perform poorly and are sensitive to parameter settings

Model-Driven Computational Sprinting
Model-driven computational sprinting predicts expected response time and uses the predictions to compare policies and discover high-performance settings
Our approach combines:
● First-principles modeling to capture sprinting fundamentals
● Machine learning to accurately characterize the effects of runtime factors on response time

Outline
● Introduction
● First Principles for Sprinting
● Effective Sprint Rate
● Model Evaluation & Model-Driven Management

Principles of Sprinting
Discrete-event queuing simulator for sprinting
● Traditional queuing parameters: arrival rate and service rate
● Sprinting accepts additional parameters: sprint rate, timeout, and budget
● Output: average response time
Principle: Compute the response time for each job given queuing delay, processing time, and timeout
[Figure: arrival rate, service rate, timeout, budget, and sprint rate feed a discrete-event queue simulation that outputs a response time per query, e.g., Q1 1.3, Q2 0.7, ..., QN 4.1]
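The principle above can be sketched as a minimal single-server simulation. This is not the authors' simulator: the function name `simulate`, the exponential inter-arrival and processing times, the FIFO single-server queue, and the budget modeled as a fixed pool of sprint-seconds with no replenishment are all assumptions made for illustration.

```python
import random


def simulate(arrival_rate, service_rate, sprint_rate, timeout, budget,
             n_jobs=10000, seed=0):
    """Return the average response time over n_jobs simulated queries.

    Rates are in queries per second; budget is in seconds of sprinting.
    """
    rng = random.Random(seed)
    speedup = sprint_rate / service_rate   # how much faster work runs in a sprint
    arrival = 0.0                          # arrival time of the current job
    server_free = 0.0                      # time the server finishes the previous job
    total_response = 0.0

    for _ in range(n_jobs):
        arrival += rng.expovariate(arrival_rate)
        demand = rng.expovariate(service_rate)   # processing time at the sustained rate
        start = max(arrival, server_free)        # queuing delay = start - arrival
        deadline = arrival + timeout             # the timeout fires here
        finish = start + demand

        if finish > deadline and budget > 0.0:
            # Sprint from the timeout (or from the start of processing, if the
            # job was still queued) until it completes or the budget runs out.
            sprint_start = max(start, deadline)
            remaining = finish - sprint_start            # sustained-rate seconds left
            sprint_time = min(remaining / speedup, budget)
            budget -= sprint_time
            finish = sprint_start + sprint_time + (remaining - sprint_time * speedup)

        server_free = finish
        total_response += finish - arrival

    return total_response / n_jobs


no_sprint = simulate(0.8, 1.0, 1.5, timeout=1.0, budget=0.0, n_jobs=2000)
with_sprint = simulate(0.8, 1.0, 1.5, timeout=1.0, budget=500.0, n_jobs=2000)
print(f"average response time: {no_sprint:.2f} s without sprinting, "
      f"{with_sprint:.2f} s with sprinting")
```

Because both runs use the same seed, they see the same arrivals and demands, so the difference comes only from the sprinting policy.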

Offline Workload Profiling
Profiling varies workload conditions and sprinting policies
The service rate (sustained processing time) and marginal sprint rate are calculated via profiling
Marginal sprint rate: Processing time when an entire query execution is sprinted offline

Runtime Factors Affect Sprinting
Offline profiling explains sprinting in isolation
System properties known only under live workload, i.e., at runtime, affect response time significantly
Why is offline profiling inaccurate?
Concurrency Paradox: A sprint that alters 1 query execution can affect response time for many queries
● The sprint reduces queuing backlog
Phase Paradox: For 1 query execution, sprinting can consistently yield less speedup under live workload
● Timeout triggers too late, missing execution phases amenable to the sprinting mechanism (e.g., seq phase under core scaling)

From Marginal to Effective Sprint Rate
Naive insight: Learn F(wrkld, sprint policy) → resp. time
● Complicated function, lots of training
Our insight: Learn F(wrkld, sprint policy) → eff. sprint rate
● Then use first principles to get response time
Which machine learning approach?
Random Decision Forest combines multiple, deep decision trees
● Deep → low bias
● Multiple → reduce variance

Evaluation Setup
Goals:
1. Compare how well our modeling approach generalizes. Do sprinting mechanisms affect accuracy? Workloads?
2. Contrast with alternative modeling approaches. Accuracy? Cost to set up?
3. Does a model-driven approach help discover better sprinting policies?
Setup:
● Set up 7 services (2 Spark + 5 NAS) and tested multiple sprint policies
● Tested DVFS, Core-Scale, ec2-DVFS
● Methodology: Given arrival rate and sprinting policy, predict response time. Error is the percent difference between prediction and observed response time
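The error metric in the methodology can be written down directly. This sketch assumes the percent difference is taken as an absolute value relative to the observed response time, which the slides do not spell out; the function names and the sample numbers are made up for illustration.

```python
from statistics import median


def percent_error(predicted, observed):
    """Absolute percent difference between predicted and observed response time."""
    return abs(predicted - observed) / observed * 100.0


def median_error(pairs):
    """Median percent error over (predicted, observed) pairs."""
    return median(percent_error(p, o) for p, o in pairs)


# Made-up (predicted, observed) response times in seconds.
samples = [(1.1, 1.0), (2.0, 2.0), (0.9, 1.0)]
print(f"median error: {median_error(samples):.1f}%")
```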

Accuracy Across Mechanisms/Workloads
[Figure: median error (axis 0 to 8) by workload (kmeans, knn, jacobi, mem, leuk, bfs); other labels: dvfs, ec2dvfs, arch, hybrid]
● Our approach is 93-97% accurate across sprinting mechanisms and a wide variety of workloads.

Hybrid Model vs ANN
[Figure: median error (axis 0 to 25) of the hybrid model and the ANN for kmeans, knn, jacobi, mem, leuk, and bfs]
● What if we just used machine learning? ANN: 5-layer Artificial Neural Network trained iteratively and tuned
● Our approach required 6x to 54x less data than the ANN with comparable accuracy

Model-Driven Management
CASE STUDY: Computational Sprinting & AWS Burstable Instances
● Service can access only a fraction of CPU resources during normal operation
● Service sprints (exclusive use of CPU) for 6 min/hour
Implementations
● Baseline: No sprint
● Big burst: 20% norm → 100% sprint
● Small burst: 20% norm → 60% sprint
[Figure: CPU 0 without sprinting and with sprinting]

Model-Driven Management (cont.)
Search for the best sprinting policy: example with the Jacobi service
● Scan timeouts until the policy with the lowest response time is found
● Try for a large and a small budget
The best timeout differs depending on budget and workload
Best policy improved response time by up to 1.4X
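The timeout scan can be sketched as a search over candidate settings with any response-time predictor plugged in. Here `toy_model` is a purely hypothetical stand-in, not the hybrid model from the talk: its shape is invented so that the best timeout shifts with the budget, which is the behavior reported for the Jacobi example.

```python
def best_timeout(predict_response_time, candidate_timeouts):
    """Return (timeout, predicted response time) for the best candidate."""
    best = min(candidate_timeouts, key=predict_response_time)
    return best, predict_response_time(best)


def toy_model(timeout, budget):
    """Hypothetical predictor: response time is lowest near budget / 10."""
    sweet_spot = budget / 10.0
    return 1.0 + (timeout - sweet_spot) ** 2


candidates = [0.5, 1.0, 1.5, 2.0]
for budget in (10.0, 20.0):   # a small and a large budget, in arbitrary units
    timeout, rt = best_timeout(lambda t: toy_model(t, budget), candidates)
    print(f"budget {budget}: best timeout {timeout} s, predicted response {rt} s")
```

Swapping `toy_model` for a real predictor keeps the search loop unchanged, since `best_timeout` only needs a callable from timeout to predicted response time.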

Model-Driven Management (cont.)
Use the hybrid model to search for the best sprinting policy
● Adrenaline: Sets timeout to the 85th percentile of non-sprinting response time [Hsu, HPCA, 2015]
● Few-to-Many: Finds the largest timeout setting that exhausts the budget (speeding up the slowest queries) [Haque, ASPLOS, 2015]
Response Time Improvement
              Our Approach   Adrenaline   Few-to-Many
Big Burst     1              1.26         1.06
Small Burst   1              1.45         1.36
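The Adrenaline-style rule, setting the timeout to the 85th percentile of non-sprinting response times, can be sketched as follows. The nearest-rank percentile definition and the function names are assumptions for illustration; the slides do not say which percentile method is used, and the sample data is made up.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("samples must not be empty")
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_timeout(non_sprinting_response_times, pct=85):
    """Timeout chosen as a percentile of response times measured without sprinting."""
    return percentile_nearest_rank(non_sprinting_response_times, pct)


# Made-up response times: 0.01 s, 0.02 s, ..., 1.00 s.
observed = [i / 100 for i in range(1, 101)]
print(f"timeout: {percentile_timeout(observed)} s")
```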

Conclusion
● Sprinting reduces SLO violations, but sprinting policies have complex effects on runtime execution and response time
● We combine machine learning and first principles to model response time quickly and accurately
● Our modeling approach introduces effective sprint rate, i.e., speedup given dynamic runtime conditions
● With our model, we discovered policies that outperformed state-of-the-art heuristics by up to 1.45X

Benefits of Good Sprinting Policies
● Better sprinting policy allows for more colocated workloads
● More workloads per node increases profit
● Profit increased by 1.6X
● Budgeting shrinks budget but increases sprint rate
● Our approach fixes the budget and selects a timeout
● Sprinting policies more efficient for all 3 combos
