

1. Work Stealing for Interactive Services to Meet Target Latency
Jing Li∗, Kunal Agrawal∗, Sameh Elnikety†, Yuxiong He†, I-Ting Angelina Lee∗, Chenyang Lu∗, Kathryn S. McKinley†
∗ Washington University in St. Louis   † Microsoft Research
* This work was initiated and partly done during Jing Li's internship at Microsoft Research in summer 2014.

2. Interactive services must meet a target latency
• Interactive services: search, ads, games, finance
• Users demand responsiveness

3. Interactive services must meet a target latency
• Interactive services: search, ads, games, finance
• Users demand responsiveness
• Problem setting:
  – Multiple requests arrive over time
  – Each request is parallelizable
  – Latency = completion time – arrival time; each request's latency should be below a target latency T
• Goal: maximize the number of requests that meet the target latency T

4. Latency in Internet search
• In industrial interactive services, thousands of servers together serve a single user query.
• End-to-end latency ≥ latency of the slowest server.
[Diagram: a search query is parsed, fanned out to many parallel "doc lookup & ranking" servers, and the results go through result aggregation & snippet generation; the end-to-end response time is ~100 ms for the user to find the service responsive, which sets each server's target latency.]

5. Goal: meet the target latency in a single server
• Design a scheduler that maximizes the number of requests completed within the target latency on a single server.
[Diagram: the same search pipeline, highlighting one "doc lookup & ranking" server and its target latency.]

6. Sequential execution is insufficient
• A large request must execute in parallel to meet the target latency constraint.
[Diagram: a request whose sequential execution time (its work) exceeds the target latency.]

7. Full parallelism does not always work well
• Target latency: 90 ms
• Running example: a large request with 270 ms of work and small requests with 60 ms of work each.

8. Full parallelism does not always work well
• Target latency: 90 ms
• Case 1: 1 large request (270 ms of work, arrives at time 0, must finish by time 90) + 3 small requests (60 ms of work each, arriving at time 20, must finish by time 110).

9. Full parallelism does not always work well
• Fully parallel schedule: the large request occupies all 3 cores until time 90 while the small requests wait; the small requests then run one after another across the cores, finishing at times 110, 130, and 150.
• ✖ Misses 2 requests (two of the small requests exceed their deadline of 110).

10. Full parallelism does not always work well
• Alternative schedule: serialize the large request on core 1 (it finishes at time 270 and misses its deadline of 90) and run the small requests in parallel on cores 2 and 3 (they finish at times 50, 80, and 110, all meeting their deadlines).
• Fully parallel: ✖ misses 2 requests. Serializing the large request: ✔ misses only 1 request.
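To make the schedule arithmetic explicit (finish times reconstructed from the slide's timeline figure, so the arrival times are inferred rather than stated):

    % Fully parallel on 3 cores, one request at a time across all cores:
    t_{\mathrm{large}} = \tfrac{270}{3} = 90 \le 90, \qquad
    t_{s_1} = 90 + \tfrac{60}{3} = 110 \le 110,
    t_{s_2} = 110 + \tfrac{60}{3} = 130 > 110, \qquad
    t_{s_3} = 130 + \tfrac{60}{3} = 150 > 110.
    % Large request serialized on core 1, smalls split across cores 2-3:
    t_{s_1} = 20 + \tfrac{60}{2} = 50, \quad t_{s_2} = 80, \quad
    t_{s_3} = 110 \le 110, \qquad t_{\mathrm{large}} = 270 > 90.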

11. Some large requests require parallelism
• Target latency: 90 ms
• Case 2: 1 large request (270 ms of work, arrives at time 0, must finish by time 90) + 1 small request (60 ms of work, arrives at time 20, must finish by time 110).

12. Some large requests require parallelism
• Fully parallel schedule: the large request runs on all 3 cores and finishes at time 90 (meets its deadline); the small request then finishes by time 110 (meets its deadline). ✔ Misses 0 requests.
• Serializing the large request: the small request finishes by time 80, but the large request finishes at time 270 and misses its deadline. ✖ Misses 1 request.

13. Strategy: adapt scheduling to load
• Case 1 (✔ misses 1 request): we cannot afford to run all large requests in parallel.
• Case 2 (✔ misses 0 requests): we do need to run some large requests in parallel.

14. Strategy: adapt scheduling to load
• High load: run large requests sequentially. We cannot afford to run all large requests in parallel (Case 1: ✔ misses only 1 request).
• Low load: run all requests in parallel. We do need to run some large requests in parallel (Case 2: ✔ misses 0 requests).

15. Why does the adaptive strategy work?
• Latency = processing time + waiting time.
• At low load, processing time dominates latency:
  – Parallel execution reduces a request's processing time.
  – So all requests run in parallel.
• At high load, waiting time dominates latency:
  – Executing a large request in parallel increases the waiting time of many later-arriving requests.
  – Each large request that is sacrificed (serialized) reduces the waiting time of many more later-arriving requests.

16. Challenge: which request to sacrifice?
• Strategy: when load is low, run all requests in parallel; when load is high, run large requests sequentially.

17. Challenge: which request to sacrifice?
• Strategy: when load is low, run all requests in parallel; when load is high, run large requests sequentially.
• Challenge 1, non-clairvoyance: we do not know the work of a request when it arrives.
• Challenge 2, no absolute definition of a large request: "large" is relative to the instantaneous load.

18. Challenge: which request to sacrifice?
• Strategy: when load is low, run all requests in parallel; when load is high, run large requests sequentially.
• Challenge 1, non-clairvoyance: we do not know the work of a request when it arrives.
• Challenge 2, no absolute definition of a large request: "large" is relative to the instantaneous load. For example (see the sketch below):
  – load = 10: a large request has > 180 ms of work
  – load = 20: a large request has > 80 ms of work
  – load = 30: a large request has > 20 ms of work
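These numbers suggest a monotone mapping from instantaneous load to a work threshold. A minimal sketch of such a lookup table in C++ (the class and method names are hypothetical illustrations, not the paper's code; the thresholds are the example values above):

    #include <utility>
    #include <vector>

    // Maps instantaneous load (number of active requests) to the work
    // threshold, in milliseconds, beyond which a request counts as
    // "large" and is serialized. Entries are precomputed offline and
    // must be non-empty, sorted by ascending load.
    class ThresholdTable {
    public:
        // Each entry is {load, threshold_ms}.
        explicit ThresholdTable(std::vector<std::pair<int, double>> entries)
            : entries_(std::move(entries)) {}

        // Returns the threshold of the entry with the largest
        // load <= the current load (higher load => smaller threshold).
        double thresholdFor(int load) const {
            double threshold = entries_.front().second;
            for (const auto& [l, t] : entries_) {
                if (l <= load) threshold = t;
                else break;
            }
            return threshold;
        }

    private:
        std::vector<std::pair<int, double>> entries_;
    };

    // Example values from the slide:
    // ThresholdTable table({{10, 180.0}, {20, 80.0}, {30, 20.0}});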

19. Contributions: the tail-control scheduler
• Tail-control offline threshold calculation
• Tail-control online runtime

20. Contributions: the tail-control scheduler
• Input: target latency T, request work distribution, and requests per second (RPS)
  – These inputs are readily available in highly engineered interactive services.
• Tail-control offline threshold calculation
• Tail-control online runtime

21. Contributions: the tail-control scheduler
• From the input, the offline threshold calculation computes a large-request threshold for each load value, producing a large-request threshold table.

22. Contributions: the tail-control scheduler
• The online runtime uses the threshold table to decide which requests to serialize.

23. Contributions
• We modify work stealing to implement tail-control scheduling using Intel Threading Building Blocks (TBB), achieving better performance.

24. Contributions: the tail-control scheduler
• Pipeline: input → offline threshold calculation → large-request threshold table → online runtime.
• Implementation details are in the paper.

25. Tail-control scheduler
• Pipeline: input → offline threshold calculation → threshold table → online runtime.
• Runtime functionality (see the sketch below):
  – Execute all requests in parallel to begin with.
  – Record the total computation time spent on each request so far.
  – Detect large requests based on the current threshold and current processing time.
  – Serialize large requests to limit their impact on other waiting requests.
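A minimal sketch of that bookkeeping, reusing the hypothetical ThresholdTable above (the tracking granularity and all names here are assumptions for illustration, not the paper's implementation):

    #include <atomic>

    // Per-request state tracked by the runtime.
    struct Request {
        // Total computation time (ms) spent on this request so far,
        // accumulated by every worker that executes a piece of it.
        std::atomic<long> processing_ms{0};
        // Once set, workers stop stealing this request's tasks, so the
        // request continues on a single worker (serialized).
        std::atomic<bool> serialized{false};
    };

    // Called periodically (e.g., after each executed task): mark the
    // request as large once its processing time exceeds the threshold
    // for the current load (number of active requests).
    inline void maybeSerialize(Request& req, const ThresholdTable& table,
                               int active_requests) {
        if (!req.serialized.load(std::memory_order_relaxed) &&
            req.processing_ms.load(std::memory_order_relaxed) >
                table.thresholdFor(active_requests)) {
            req.serialized.store(true, std::memory_order_relaxed);
        }
    }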

26. Work stealing for a single request
• Each worker has a local queue:
  – Execute work from the local queue, if there is any.
  – Otherwise, steal work from another worker.
[Diagram: workers 1–3; worker 1 executes request A while the others steal pieces of A to parallelize it.]

27. Generalizing work stealing to multiple requests
• Workers' local queues + a global queue:
  – Execute work from the local queue, if there is any.
  – Steal: further parallelize an in-flight request.
  – Admit: start executing a new request.
[Diagram: parallelizable requests B and C arrive at the global queue while the workers execute and parallelize request A.]

28. Implementing tail-control in TBB
• Workers' local queues + a global queue, with execute, steal, and admit as above.
• Steal-first: steal before admitting (tries to reduce processing time).
• Admit-first: admit before stealing (tries to reduce waiting time).
• Tail-control: steal-first + large-request detection and serialization (see the sketch below).
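A schematic of how the three policies differ when a worker's local queue runs dry, in the same hedged spirit as the earlier sketches (the queue type and the trySteal callback are simplified stand-ins, not TBB's actual internals):

    #include <deque>
    #include <functional>
    #include <optional>

    using Task = std::function<void()>;
    struct GlobalQueue { std::deque<Task> q; };  // newly arrived requests

    enum class Policy { StealFirst, AdmitFirst, TailControl };

    // Pick the next task for an idle worker. Steal-first parallelizes
    // in-flight requests before admitting new ones (reduces processing
    // time); admit-first does the opposite (reduces waiting time).
    // Tail-control is steal-first, except it never steals tasks that
    // belong to a request marked serialized.
    std::optional<Task> nextTask(
        Policy policy, GlobalQueue& global,
        const std::function<std::optional<Task>(bool skipSerialized)>& trySteal) {
        auto admit = [&]() -> std::optional<Task> {
            if (global.q.empty()) return std::nullopt;
            Task t = std::move(global.q.front());
            global.q.pop_front();
            return t;
        };
        if (policy == Policy::AdmitFirst) {
            if (auto t = admit()) return t;
            return trySteal(false);
        }
        // Steal-first and tail-control: try stealing first.
        if (auto t = trySteal(policy == Policy::TailControl)) return t;
        return admit();
    }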

29. Evaluation
• Various request work distributions: Bing search, a finance server, and log-normal.
• Different request arrival distributions: Poisson and log-normal.
• Each setting runs 100,000 requests; we plot the target-latency miss ratio (see the sketch below).
• Two baselines, generalized from work stealing for a single job:
  – Steal-first: tries to parallelize requests and reduce processing time.
  – Admit-first: tries to admit requests and reduce waiting time.
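For concreteness, a small sketch of the measurement setup: Poisson arrivals are exponentially distributed inter-arrival gaps, and the reported metric is the fraction of requests whose latency exceeds the target (the function names and seed are placeholders, not the paper's harness):

    #include <random>
    #include <vector>

    // Generate n Poisson arrival times (in seconds) at a mean rate of
    // `rps` requests per second.
    std::vector<double> poissonArrivals(int n, double rps, unsigned seed = 42) {
        std::mt19937 gen(seed);
        std::exponential_distribution<double> gap(rps);  // inter-arrival gaps
        std::vector<double> arrivals(n);
        double t = 0.0;
        for (int i = 0; i < n; ++i) arrivals[i] = (t += gap(gen));
        return arrivals;
    }

    // Target-latency miss ratio: the fraction of requests whose latency
    // (completion time - arrival time) exceeds the target T.
    double missRatio(const std::vector<double>& latencies, double target) {
        if (latencies.empty()) return 0.0;
        int misses = 0;
        for (double l : latencies) misses += (l > target);
        return static_cast<double>(misses) / latencies.size();
    }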

30. Improvement in target-latency miss ratio
[Chart: miss ratio across settings, ordered from hard → easy to meet the target latency; the arrow marks better performance.]

31. Improvement in target-latency miss ratio
[Chart: same axes as the previous slide; admit-first wins when relative load is high (hard to meet the target), steal-first wins when relative load is low (easy to meet the target).]

32. Improvement in target-latency miss ratio
[Chart: overall comparison; the arrow marks better performance.]

33. The inner workings of tail-control
[Chart: latency distribution of requests, with the target latency marked.]

34. The inner workings of tail-control
• Tail-control sacrifices a few large requests and reduces the latency of many more small requests so that they meet the target latency.
[Chart: latency distribution with the target latency marked.]


  37. Tail-control performs well with inaccurate input

38. Tail-control performs well with inaccurate input
• A slightly inaccurate input work distribution is still useful.
[Chart: miss ratio as the input work distribution ranges from less → more inaccurate.]
