Kairos: Preemptive Data Center Scheduling Without Runtime Estimates

Symposium on Cloud Computing (SoCC). Kairos: Preemptive Data Center Scheduling Without Runtime Estimates. Pamela Delgado, Diego Didona, Florin Dinu and Willy Zwaenepoel. October 11, 2018.


  1. Symposium on Cloud Computing (SoCC). Kairos: Preemptive Data Center Scheduling Without Runtime Estimates. Pamela Delgado, Diego Didona, Florin Dinu and Willy Zwaenepoel. October 11, 2018.

  2. Kairos: Data center scheduling without task runtime estimates. Kairos: Preemptive Data Center Scheduling Without Runtime Estimates | SoCC’18

  3. Kairos key idea • New preemption approach ✓ No head-of-line blocking ✓ Good scheduling performance

  4. Data center scheduling challenge • Heavy-tailed workloads [Diagram: a scheduler in front of a cluster.]

  5. Problem: head-of-line blocking • Short tasks wait for long tasks • High likelihood …

  6. Historical use of runtime estimates • Categories: per-task estimations, dual classification, no estimations • Systems shown: Yarn’13, Sparrow’13, Apollo’14, Hawk’15, Mercury*’15, Borg’15, Yaq’16, Tetrisched’16, Eagle’16, Firmament’16, … • Callouts: "Do not avoid head-of-line!", "Depend on runtime estimates"

  7. Hard to obtain reliable estimates • Mis-estimations happen: unseen jobs, skewed input, failures/spikes • Consequences: poor scheduling decisions*, violated SLOs^, complex designs to compensate. *Job-aware Scheduling in Eagle: Divide and Stick to Your Probes (SoCC’16). ^Tetrisched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters (EuroSys’16).

  8. Can we dispense with task runtime estimates altogether?

  9. Can we dispense with task runtime estimates altogether? Kairos: ✓ Avoid head-of-line blocking ✓ No task runtime estimates

  10. Kairos insight: Use preemption!!

  11. Preemption in Kairos • Preempt long tasks! • Resuming a preempted task elsewhere is costly: do preemption locally!

  12. Kairos architecture [Diagram: the Kairos centralized scheduler (centralized component) performs load balancing across Node j, Node x, …, Node y; each node runs a Kairos node scheduler (distributed component) that performs local preemption.]

  15. Least-Attained Service (LAS) • Preemptive policy • Give resources to the task that has received the least service ✓ A new task runs immediately ✓ It runs as long as it is the task with the least received service

  16. LAS rationale • Good for heavy-tailed workloads* • Benefits: 1. Shorter tasks have priority (no head-of-line blocking) 2. Shorter tasks very likely execute until completion. *M. Harchol-Balter, Performance Modeling and Design of Computer Systems: Queueing Theory in Action, 2013.
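The LAS behavior described on slides 15 and 16 can be illustrated with a small single-core simulation. This is a minimal sketch, not the Kairos implementation: the function names `simulate_las` and `simulate_fifo`, the one-unit time quantum, and the two-task workload are all assumptions made for illustration. The scheduler only looks at attained service; the `remaining` table exists solely so the simulation knows when a task ends.

```python
def simulate_las(tasks, quantum=1):
    """Simulate Least-Attained Service on one core.

    tasks: dict mapping task name -> (arrival_time, service_demand).
    Returns a dict mapping task name -> completion time.
    """
    arrivals = sorted(tasks.items(), key=lambda kv: kv[1][0])
    attained, remaining, finish = {}, {}, {}
    clock, i = 0, 0
    while len(finish) < len(tasks):
        # Admit every task that has arrived by now, with zero attained service.
        while i < len(arrivals) and arrivals[i][1][0] <= clock:
            name, (_, demand) = arrivals[i]
            attained[name] = 0
            remaining[name] = demand
            i += 1
        if not remaining:
            clock = arrivals[i][1][0]  # idle until the next arrival
            continue
        # LAS decision: run the task with the least attained service.
        # Ties are broken by name only to keep the sketch deterministic.
        name = min(remaining, key=lambda n: (attained[n], n))
        run = min(quantum, remaining[name])
        clock += run
        attained[name] += run
        remaining[name] -= run
        if remaining[name] == 0:
            del remaining[name]
            finish[name] = clock
    return finish


def simulate_fifo(tasks):
    """Non-preemptive FIFO baseline on one core, for comparison."""
    clock, finish = 0, {}
    for name, (arrival, demand) in sorted(tasks.items(), key=lambda kv: kv[1][0]):
        clock = max(clock, arrival) + demand
        finish[name] = clock
    return finish


# Hypothetical workload: one long task, then a short task arriving just after.
workload = {"long": (0, 100), "short": (1, 2)}
las = simulate_las(workload)
fifo = simulate_fifo(workload)
print("LAS :", las)
print("FIFO:", fifo)
```

In this toy workload the short task completes at time 4 under LAS but at time 102 under FIFO, where it waits behind the long task. The long task finishes at 102 under LAS instead of 100, which is the price of preemption in this example.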

  17. Kairos distributed scheduling • Node schedulers • LAS at the nodes • How to dispatch tasks among nodes?

  18. Kairos architecture [Diagram: the Kairos centralized scheduler performs load balancing across Node j, Node x, …, Node y; each node runs a Kairos node scheduler that performs local preemption.]

  19. Kairos centralized scheduling • 1st: Load balancing • 2nd: Maximize LAS effectiveness [Diagram: the Kairos centralized scheduler choosing among the Kairos node schedulers on Node j, Node x, …, Node y.]

  20. Load balancing rationale 1. Lowest # tasks: no idle nodes • Bound max # tasks [Diagram, labeled "Avoid!": Node j with 0 tasks while Node y has 100 tasks.]

  21. Load balancing rationale 2. LAS-aware policy to break ties: • Heavy-tailed for each node • Maximize LAS effectiveness • Node with lowest attained service (AS) variance* [Diagram, labeled "Avoid!": Node j with only short tasks, Node y with only long tasks.] *Avrahami et al., Minimizing total flow time and total completion time with immediate dispatching, 2003; Down et al., Multi-layered round robin routing for parallel servers, 2006.
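The two dispatch rules on slides 20 and 21 (fewest tasks first, then lowest attained-service variance as a tie-breaker, with a bound on tasks per node) could be sketched as follows. This is an illustrative reading of the slides only: the function name `pick_node`, the `max_tasks_per_node` parameter, the choice of population variance, and the decision to return `None` when every node is at its bound are assumptions, not details confirmed by the deck.

```python
from statistics import pvariance


def pick_node(nodes, max_tasks_per_node):
    """Choose a node for a new task.

    nodes: dict mapping node id -> list of attained service times,
           one entry per task currently on that node.
    Returns the chosen node id, or None if every node is at its bound
    (assumption: the task then waits at the centralized scheduler).
    """
    # Rule "bound max # tasks": skip nodes that are already full.
    eligible = {n: s for n, s in nodes.items() if len(s) < max_tasks_per_node}
    if not eligible:
        return None

    def rank(node_id):
        service = eligible[node_id]
        # An empty node has no variance to compute; treat it as 0.0.
        variance = pvariance(service) if service else 0.0
        # 1st: fewest tasks. 2nd: lowest attained-service variance.
        # 3rd: node id, only to make the sketch deterministic.
        return (len(service), variance, node_id)

    return min(eligible, key=rank)


# Hypothetical cluster state: attained service (seconds) per running task.
cluster = {
    "j": [5.0, 5.0],
    "x": [1.0, 50.0],
    "y": [3.0, 4.0, 6.0],
}
print(pick_node(cluster, max_tasks_per_node=3))
```

Here node y is at its bound, nodes j and x tie on task count, and node j wins the tie because its tasks have the lower attained-service variance.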

  22. Kairos recap 1. Distributed: ✓ LAS at the node level 2. Centralized: ✓ LAS-aware load balancing technique

  23. Evaluation • Yarn and Docker containers • 120 cores in 30 nodes • Heavy-tailed workload (100 jobs) • Metrics: job runtime and slowdown • Compare to: Big-C [ATC’17], FIFO • Simulation: Google trace, compare to Eagle [SoCC’16]

  24. What is the slowdown? $\text{job slowdown} = \dfrac{\text{observed job runtime}}{\text{uncontended job runtime}}$. Best job slowdown = 1
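The slowdown metric on slide 24 is a simple ratio, shown here as a short worked example. The function name `job_slowdown` and the runtimes used are hypothetical and chosen only to illustrate the arithmetic.

```python
def job_slowdown(observed_runtime, uncontended_runtime):
    """Ratio of a job's observed runtime to its runtime on an uncontended cluster."""
    if uncontended_runtime <= 0:
        raise ValueError("uncontended runtime must be positive")
    return observed_runtime / uncontended_runtime


# Hypothetical job: 100 s when run alone, 180 s under contention.
contended = job_slowdown(180.0, 100.0)
# A job that is never delayed achieves the best possible slowdown.
ideal = job_slowdown(100.0, 100.0)
print(contended, ideal)
```

A slowdown of 1 means the job ran as fast as it would on an otherwise idle cluster; larger values mean more time lost to queueing or preemption.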

  25. Kairos vs Big-C and FIFO: job slowdown [Plot: CDF of job slowdown for Kairos, Big-C and FIFO; x-axis: job slowdown, 0 to 20; y-axis: CDF.]

  26. Kairos vs Big-C and FIFO: job slowdown • Slowdown in Kairos <1.8X [Plot: CDF of job slowdown for Kairos, Big-C and FIFO; x-axis: job runtime / expected job runtime, 0 to 20; y-axis: CDF.]

  27. Kairos vs Big-C and FIFO: job running times [Plot: CDF of job runtime for Kairos, Big-C and FIFO; x-axis: job runtime [s], 0 to 4000; y-axis: CDF; annotations: 1.6X, 2X, 2.3X.]

  28. Kairos vs Big-C and FIFO: job running times • Kairos better across the board [Plot: CDF of job runtime for Kairos, Big-C and FIFO; x-axis: job runtime [s], 0 to 4000; y-axis: CDF; annotations: 1.6X, 2X, 2.3X.]

  29. Kairos vs Eagle • Short jobs runtime • Google trace [Plot: Kairos/Eagle runtime ratio at the 50th, 90th and 99th percentiles; x-axis: cluster load, lower to higher; y-axis: Kairos/Eagle, 0 to 1.4.]

  30. Kairos vs Eagle • Short jobs runtime • Google trace • Kairos works well at large scale [Plot: Kairos/Eagle runtime ratio at the 50th, 90th and 99th percentiles; x-axis: cluster load, lower to higher; y-axis: Kairos/Eagle, 0 to 1.4.]
