SLIDE 1 Work Stealing for Interactive Services to Meet Target Latency
Jing Li∗, Kunal Agrawal∗, Sameh Elnikety†, Yuxiong He†, I-Ting Angelina Lee∗, Chenyang Lu∗, Kathryn S. McKinley† — ∗Washington University in St. Louis, †Microsoft Research
*This work was initiated and partly done during Jing Li's internship at Microsoft Research in summer 2014.
SLIDE 2
Interactive services must meet a target latency
Interactive services
• Search, ads, games, finance
• Users demand responsiveness
SLIDE 3
Interactive services must meet a target latency
Interactive services
• Search, ads, games, finance
• Users demand responsiveness
Problem setting
• Multiple requests arrive over time
• Each request is parallelizable
• Latency = completion time – arrival time
• Each request's latency should be less than a target latency T
Goal: maximize the number of requests that meet a target latency T
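To pin down the objective, here is a minimal sketch of the latency definition and the metric the scheduler maximizes; the Request fields and function names are illustrative, not from the paper's implementation.

```cpp
#include <vector>

// Hypothetical request record; field names are illustrative.
struct Request {
    double arrival;     // time the request arrived
    double completion;  // time the request finished
};

// Latency = completion time - arrival time.
double latency(const Request& r) { return r.completion - r.arrival; }

// Fraction of requests that meet the target latency T
// (the scheduler's goal is to maximize this).
double hitRatio(const std::vector<Request>& reqs, double T) {
    int met = 0;
    for (const Request& r : reqs)
        if (latency(r) <= T) ++met;
    return reqs.empty() ? 1.0 : static_cast<double>(met) / reqs.size();
}
```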
SLIDE 4 Latency in Internet search
• In industrial interactive services, thousands of servers together serve a single user query.
• End-to-end latency ≥ latency of the slowest server
[Figure: query pipeline — parsing a search query, then many parallel "doc lookup & ranking" servers, then result aggregation & snippet generation; the end-to-end response time (~100 ms for users to perceive the service as responsive) sets the target latency.]
SLIDE 5 Goal — Meet Target Latency in Single Server
• Goal: design a scheduler that maximizes the number of requests completed within the target latency, on a single server.
[Figure: the same query pipeline, highlighting one "doc lookup & ranking" server and its share of the target latency.]
SLIDE 6 Large requests must execute in parallel to meet the target latency
[Figure: distribution of request sequential execution time (work, in ms); requests whose work exceeds the target latency cannot finish in time on one core.]
Sequential execution is insufficient.
SLIDE 7 Full parallelism does not always work well
Target latency: 90ms
[Figure: two request types — a large request with 270 ms of work and small requests with 60 ms of work each.]
SLIDE 8 Full parallelism does not always work well
Target latency: 90ms Case 1: 1 large request + 3 small requests
[Figure: the large request (270 ms of work) arrives at time 0 and must finish by time 90; three small requests (60 ms of work each) arrive at time 20 and must finish by time 110.]
SLIDE 9 Full parallelism does not always work well
Target latency: 90ms Case 1: 1 large request + 3 small requests
[Figure: fully parallel schedule on 3 cores — the large request runs on all cores and finishes at time 90; the small requests wait behind it, then finish at times 110, 130, and 150.]
✖ Miss 2 requests — the small requests waited too long behind the large one.
SLIDE 10 Full parallelism does not always work well
Target latency: 90ms Case 1: 1 large request + 3 small requests
[Figure: two schedules for Case 1 on 3 cores. Serializing the large request: it runs on one core until time 270 while the small requests share the other cores and finish at times 50, 80, and 110 — all three meet their deadline. Full parallelism: as before, the small requests finish at times 110, 130, and 150.]
✔ Serializing the large request: miss 1 request (only the large one)
✖ Full parallelism: miss 2 requests
SLIDE 11 Some large requests require parallelism
Target latency: 90ms Case 2: 1 large request + 1 small request
[Figure: the large request (270 ms of work) arrives at time 0 and must finish by time 90; one small request (60 ms of work) arrives at time 20 and must finish by time 110.]
SLIDE 12 Some large requests require parallelism
Target latency: 90ms Case 2: 1 large request + 1 small request
[Figure: two schedules for Case 2 on 3 cores. Full parallelism: the large request runs on all cores and finishes at time 90; the small request then finishes at time 110 — both meet their deadlines. Serializing the large request: the small request finishes by time 80, but the large request runs on one core until time 270 and misses.]
✖ Serializing the large request: miss 1 request
✔ Full parallelism: miss 0 requests
SLIDE 13 Strategy: adapt scheduling to load
Case 1: we cannot afford to run all large requests in parallel.
Case 2: we do need to run some large requests in parallel.
[Figure: the winning schedule for each case — full parallelism misses 0 requests in Case 2; serializing the large request misses only 1 in Case 1.]
SLIDE 14 Strategy: adapt scheduling to load
High load (Case 1): run large requests sequentially — we cannot afford to run all large requests in parallel.
Low load (Case 2): run all requests in parallel — we do need to run some large requests in parallel.
[Figure: the same two winning schedules, now labeled by load level.]
SLIDE 15 Why does the adaptive strategy work?
Latency = processing time + waiting time
At low load, processing time dominates latency:
– Parallel execution reduces request processing time
– So all requests run in parallel
At high load, waiting time dominates latency:
– Executing a large request in parallel increases the waiting time of many more later-arriving requests
– Each large request that is sacrificed reduces the waiting time of many more later-arriving requests
SLIDE 16
Strategy: when load is low, run all requests in parallel; when load is high, run large requests sequentially
Challenge: which request to sacrifice?
SLIDE 17 Strategy: when load is low, run all requests in parallel; when load is high, run large requests sequentially
Challenge 1: non-clairvoyance
– We do not know the work of a request when it arrives
Challenge 2: no fixed definition of a large request
– Large is relative to the instantaneous load
Challenge: which request to sacrifice?
SLIDE 18 Strategy: when load is low, run all requests in parallel; when load is high, run large requests sequentially
Challenge 1: non-clairvoyance
– We do not know the work of a request when it arrives
Challenge 2: no fixed definition of a large request
– Large is relative to the instantaneous load; e.g. (see the lookup sketch below):
  load = 10 → large request > 180 ms
  load = 20 → large request > 80 ms
  load = 30 → large request > 20 ms
Challenge: which request to sacrifice?
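One way to picture the load-dependent threshold is as a load-indexed lookup. This sketch uses the example values from the slide; the data layout and the fallback rule for untabulated loads are assumptions, not the paper's implementation.

```cpp
#include <iterator>
#include <map>

// Illustrative threshold table keyed by instantaneous load (number of
// requests in the system). Values are the slide's examples, in ms.
const std::map<int, double> kLargeRequestThreshold = {
    {10, 180.0},  // load = 10 -> requests over 180 ms count as "large"
    {20,  80.0},
    {30,  20.0},
};

// Look up the threshold for the current load, falling back to the
// entry for the nearest smaller load (a simplifying assumption).
double thresholdForLoad(int load) {
    auto it = kLargeRequestThreshold.upper_bound(load);
    if (it == kLargeRequestThreshold.begin())
        return 1e9;  // below the smallest tabulated load: serialize nothing
    return std::prev(it)->second;
}
```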
SLIDE 19 Contributions
[Diagram: tail-control has two components — an offline threshold calculation and the tail-control scheduler.]
SLIDE 20 Contributions
[Diagram: inputs to the offline threshold calculation — the target latency T, the request work distribution, and the load in requests per second (RPS); all are readily available in highly engineered interactive services.]
SLIDE 21 Contributions
[Diagram: from these inputs, the offline calculation computes a large-request threshold for each load value, producing a large-request threshold table (search structure sketched below).]
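The structure of that offline search can be sketched as follows. The cost model — the expected miss count for a candidate threshold at a given load, which the paper derives from the work distribution and target latency — is the substance of the algorithm and is deliberately abstracted into a parameter here; all names are illustrative.

```cpp
#include <functional>
#include <map>
#include <vector>

// Cost model: expected number of missed requests if `threshold` is used
// at `load`. The paper computes this from the work distribution and the
// target latency T; this sketch only shows the shape of the search.
using MissModel = std::function<double(double threshold, int load)>;

// For each load value, pick the candidate threshold that minimizes
// expected misses. Assumes at least one candidate threshold.
std::map<int, double>
buildThresholdTable(const std::vector<double>& candidateThresholds,
                    int maxLoad, const MissModel& expectedMisses) {
    std::map<int, double> table;
    for (int load = 1; load <= maxLoad; ++load) {
        double best = candidateThresholds.front();
        for (double t : candidateThresholds)
            if (expectedMisses(t, load) < expectedMisses(best, load))
                best = t;
        table[load] = best;  // serialize requests exceeding `best` at this load
    }
    return table;
}
```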
SLIDE 22 Contributions
[Diagram: at runtime, the tail-control scheduler uses the threshold table to decide which requests to serialize.]
SLIDE 23 Contributions
We modify work stealing to implement tail-control scheduling in Intel Threading Building Blocks (TBB).
[Figure: preview of the results — tail-control improves performance over the baselines.]
SLIDE 24 Contributions
[Diagram: the full tail-control pipeline again; implementation details are in the paper.]
SLIDE 25 Tail-control scheduler
[Diagram: input → offline threshold calculation → threshold table → tail-control runtime.]
Runtime functionality (a sketch follows this list):
– Execute all requests in parallel to begin with
– Record the total computation time spent on each request so far
– Detect large requests by comparing their processing time so far against the current threshold
– Serialize large requests to limit their impact on other waiting requests
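A minimal sketch of the accounting and detection steps, assuming a per-request record and a periodic charge from each worker; this is not the paper's TBB code, and the names are illustrative.

```cpp
#include <atomic>

double thresholdForLoad(int load);  // table lookup, as sketched earlier

// Hypothetical per-request bookkeeping.
struct RequestState {
    std::atomic<long> processedMs{0};     // total CPU time consumed so far
    std::atomic<bool> serialized{false};  // demoted to a single worker?
};

// Called by a worker after executing a quantum of a request's work:
// charge the elapsed time, then compare the request's total processing
// time against the threshold for the *current* load.
void chargeAndCheck(RequestState& r, long quantumMs, int currentLoad) {
    long total = r.processedMs.fetch_add(quantumMs) + quantumMs;
    if (!r.serialized.load() && total > thresholdForLoad(currentLoad)) {
        // Large request detected: mark it so workers stop stealing its
        // tasks, confining its remaining work to one worker and freeing
        // the other cores for waiting requests.
        r.serialized.store(true);
    }
}
```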
SLIDE 26 Work Stealing for Single Request
• Workers have local queues
– Execute work, if there is any in the local queue
– Steal, to further parallelize the request (a skeleton follows)
[Figure: three workers; the local queues hold pieces of request A; an idle worker steals from a busy one to parallelize A.]
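For reference, here is a minimal work-stealing skeleton for a single parallel request: each worker pops tasks from its own deque and steals from a random victim when its deque is empty. The Task type and the locking scheme are simplified stand-ins for TBB's lock-free deques.

```cpp
#include <cstdlib>
#include <deque>
#include <mutex>
#include <vector>

struct Task { void (*run)(); };

struct Worker {
    std::deque<Task> deque;
    std::mutex lock;
};

// Loops forever; a real scheduler would also handle shutdown.
void workerLoop(std::vector<Worker>& workers, int self) {
    for (;;) {
        Task t;
        bool found = false;
        {   // 1) Execute from the local queue if it is non-empty.
            std::lock_guard<std::mutex> g(workers[self].lock);
            if (!workers[self].deque.empty()) {
                t = workers[self].deque.back();  // LIFO end, for locality
                workers[self].deque.pop_back();
                found = true;
            }
        }
        if (!found) {  // 2) Otherwise steal from a random victim's FIFO end.
            int victim = std::rand() % static_cast<int>(workers.size());
            std::lock_guard<std::mutex> g(workers[victim].lock);
            if (!workers[victim].deque.empty()) {
                t = workers[victim].deque.front();
                workers[victim].deque.pop_front();
                found = true;
            }
        }
        if (found) t.run();
    }
}
```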
SLIDE 27 Generalize Work Stealing to Multiple Requests
• Workers have local queues, plus a global queue
– Execute work, if there is any in the local queue
– Steal: further parallelize a request
– Admit: start executing a new request
[Figure: parallelizable requests B and C arrive at the global queue; workers either execute local work, steal to parallelize request A, or admit a new request.]
SLIDE 28 Implement Tail-Control in TBB
• Workers have local queues, plus a global queue
– Execute work, if there is any in the local queue
– Steal: further parallelize a request
– Admit: start executing a new request
• Steal-first (tries to reduce processing time)
• Admit-first (tries to reduce waiting time)
• Tail-control: steal-first, plus large-request detection and serialization (policy sketch below)
[Figure: the same worker/global-queue diagram as before.]
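The three policies differ only in what an idle worker tries first. A sketch, with the steal/admit helpers as hypothetical placeholders stubbed to always fail:

```cpp
enum class Policy { StealFirst, AdmitFirst, TailControl };

// Hypothetical stand-ins for the TBB mechanics.
bool stealFromVictim() { return false; }       // try to parallelize a running request
bool admitFromGlobalQueue() { return false; }  // try to start a newly arrived request

// What an idle worker (empty local deque) does under each policy.
void onIdle(Policy p) {
    switch (p) {
    case Policy::StealFirst:   // favor reducing processing time
        if (!stealFromVictim()) admitFromGlobalQueue();
        break;
    case Policy::AdmitFirst:   // favor reducing waiting time
        if (!admitFromGlobalQueue()) stealFromVictim();
        break;
    case Policy::TailControl:
        // Steal-first, except that requests already marked `serialized`
        // (see the detection sketch on slide 25) are skipped as steal
        // victims, so a large request keeps only one worker.
        if (!stealFromVictim()) admitFromGlobalQueue();
        break;
    }
}
```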
SLIDE 29 Evaluation
• Various request work distributions
– Bing search
– Finance server
– Log-normal
• Different request arrival processes
– Poisson
– Log-normal
• Each setting: 100,000 requests; we plot the target-latency miss ratio
• Two baselines, generalized from work stealing for a single job (trace-generation sketch below)
– Steal-first: tries to parallelize requests and reduce processing time
– Admit-first: tries to admit requests and reduce waiting time
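Traces like this setup's can be generated with standard-library distributions; in this sketch the arrival rate and log-normal parameters are arbitrary examples, not the paper's settings.

```cpp
#include <random>
#include <vector>

struct Arrival { double time; double workMs; };

// Synthetic trace: Poisson arrivals (exponential inter-arrival gaps)
// and log-normally distributed work. Parameters are illustrative only.
std::vector<Arrival> makeTrace(int n, double requestsPerSec) {
    std::mt19937 rng(42);
    std::exponential_distribution<double> gap(requestsPerSec);  // seconds
    std::lognormal_distribution<double> work(3.5, 1.0);         // median ~33 ms
    std::vector<Arrival> trace;
    double t = 0.0;
    for (int i = 0; i < n; ++i) {
        t += gap(rng);
        trace.push_back({t, work(rng)});
    }
    return trace;
}
```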
SLIDE 30 Improvement in target-latency miss ratio
[Figure: miss-ratio results across target latencies, from hard to easy to meet; lower miss ratio is better.]
SLIDE 31 Improvement in target-latency miss ratio
[Figure: the same axes, annotated with relative load from high to low — admit-first wins at high load, steal-first wins at low load.]
SLIDE 32 Improvement in target-latency miss ratio
[Figure: the same results, highlighting tail-control's improvement in miss ratio over both baselines.]
SLIDE 33
The inner workings of tail-control
[Figure: request latency distribution with the target latency marked.]
SLIDES 34–36
The inner workings of tail-control
Tail-control sacrifices a few large requests and reduces the latency of many more small requests so that they meet the target latency.
[Figure: the same distribution, annotated progressively across these three build slides; target latency marked.]
SLIDE 37
Tail-control performs well with inaccurate input
SLIDE 38
Tail-control performs well with inaccurate input
Slightly inaccurate input work distributions are still useful.
[Figure: miss ratio as the input work distribution varies from less to more inaccurate.]
SLIDE 39 Related work
Parallelizing a single job to reduce latency
– [Blumofe et al. 1995], [Arora et al. 2001], [Jung et al. 2005], [Ko et al. 2002], [Wang and O'Boyle 2009], ...
Interactive server parallelism optimizing for mean response time and tail latency
– [Raman et al. 2011], [Jeon et al. 2013], [Kim et al. 2015], [Haque et al. 2015]
Theoretical results on server scheduling for sequential and parallel jobs, optimizing for mean and maximum response time
– [Chekuri et al. 2004], [Torng and McCullough 2008], [Fox and Moseley 2011], [Becchetti et al. 2006], [Kalyanasundaram and Pruhs 1995], [Edmonds and Pruhs 2012], [Agrawal et al. 2016]
SLIDE 40 Take home
– For non-clairvoyant interactive services, the work distribution is helpful for designing schedulers.
– Given the work distribution, we can devise an offline algorithm that computes a large-request threshold for every value of instantaneous load.
– We have developed an adaptive scheduler that serializes requests according to the threshold table, and demonstrated that it works well in practice.
[Diagram: the tail-control pipeline — input → offline threshold calculation → threshold table → tail-control runtime.]