SLIDE 1

Building and Optimizing Learning-augmented Computer Systems

Hongzi Mao, October 24, 2019

  • Learning Scheduling Algorithms for Data Processing Clusters. Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, Mohammad Alizadeh. ACM SIGCOMM, 2019.

SLIDE 2

Motivation

Scheduling is a fundamental task in computer systems

  • Cluster management (e.g., Kubernetes, Mesos, Borg)
  • Data analytics frameworks (e.g., Spark, Hadoop)
  • Machine learning (e.g., TensorFlow)

An efficient scheduler matters for large datacenters

  • A small improvement can save millions of dollars at scale

SLIDE 3

Designing Optimal Schedulers is Intractable

Must consider many factors for optimal performance:

  • Job dependency structure
  • Modeling complexity
  • Placement constraints
  • Data locality
  • …

Graphene [OSDI ’16], Carbyne [OSDI ’16], Tetris [SIGCOMM ’14], Jockey [EuroSys ’12], TetriSched [EuroSys ’16], device placement [NIPS ’17], Delayed Scheduling [EuroSys ’10], …

Practical deployment:

Ignore the complexity → resort to simple heuristics
Build a sophisticated system → complex configurations and tuning

No “one-size-fits-all” solution: the best algorithm depends on the specific workload and system

SLIDE 4

Can machine learning help tame the complexity of efficient schedulers for data processing jobs?

SLIDE 5

Decima: A Learned Cluster Scheduler

  • Learns workload-specific scheduling algorithms for jobs with dependencies (represented as DAGs)

[Diagram: job DAGs (Job 1, Job 2, Job 3) are submitted to the scheduler, which assigns their stages to Executor 1 … Executor m. “Stages” are groups of identical tasks that can run in parallel; edges are data dependencies.]

SLIDE 6

Decima: A Learned Cluster Scheduler

  • Learns workload-specific scheduling algorithms for jobs with dependencies (represented as DAGs)

[Diagram: Job 1’s DAG is submitted to the scheduler, which places its tasks on Server 1 … Server m.]

SLIDE 7

Demo

Scheduling policy: FIFO. Average job completion time: 225 sec

[Animation: number of servers working on each job over time.]

SLIDE 8

Scheduling policy: Shortest-Job-First. Average job completion time: 135 sec

SLIDE 9

Scheduling policy: Fair. Average job completion time: 120 sec

SLIDE 10

Side by side. Shortest-Job-First: average job completion time 135 sec; Fair: average job completion time 120 sec
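To make the gap concrete, here is a minimal sketch (not from the talk; job durations are hypothetical) of how the same jobs yield different average JCTs under FIFO and Shortest-Job-First on a single server:

```python
# Toy single-server run: all jobs arrive at t=0 and execute back-to-back.
def avg_jct(durations, order):
    t, total = 0.0, 0.0
    for job in order:
        t += durations[job]   # job finishes at time t
        total += t            # its completion time counts toward the average
    return total / len(order)

jobs = {"A": 100, "B": 20, "C": 15}               # hypothetical durations (sec)
print(avg_jct(jobs, ["A", "B", "C"]))              # FIFO: ~118.3 sec
print(avg_jct(jobs, sorted(jobs, key=jobs.get)))   # SJF:  ~61.7 sec
```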

SLIDE 11

Scheduling policy: Decima. Average job completion time: 98 sec

SLIDE 12

Side by side. Fair: average job completion time 120 sec; Decima: average job completion time 98 sec

SLIDE 13

Decima, it=0: 166 sec

  • 20 Spark jobs (TPC-H queries), 50 servers

SLIDE 14

Decima, it=3000: 160 sec

SLIDE 15

Decima, it=6000: 148 sec

SLIDE 16

Decima, it=9000: 145 sec

SLIDE 17

Decima, it=12000: 142 sec

SLIDE 18

Decima, it=15000: 126 sec

SLIDE 19

Decima, it=18000: 111 sec

SLIDE 20

Decima, it=21000: 108 sec

SLIDE 21

Decima, it=24000: 107 sec

SLIDE 22

Decima, it=27000: 93 sec

SLIDE 23

Decima, it=30000: 89 sec

SLIDE 24

Design

SLIDE 25

Design Overview

[Diagram: the environment (Job DAG 1 … Job DAG n, Executor 1 … Executor m) sends the state, an observation of jobs and cluster status, to the scheduling agent; the agent’s policy network (a graph neural network over the schedulable nodes) selects actions, and the objective defines the reward.]

SLIDE 26

Contributions

[Same design-overview diagram as the previous slide.]

  • 1. First RL-based scheduler for complex data processing jobs
  • 2. Scalable graph neural network to express scheduling policies
  • 3. New learning methods that enable training with online job arrivals

SLIDE 27

Encode Scheduling Decisions as Actions

[Diagram: Job DAG 1 … Job DAG n and Server 1 … Server m, with a set of identical free executors to assign.]

SLIDE 28

Option 1: Assign All Executors in One Action

Problem: huge action space

[Same diagram: Job DAG 1 … Job DAG n; Server 1 … Server m.]

SLIDE 29

Option 2: Assign One Executor Per Action

Problem: long action sequences

[Same diagram.]

SLIDE 30

Decima: Assign Groups of Executors per Action

[Diagram: e.g., “use 3 servers” for one node and “use 1 server” for two others.]

Action = (node, parallelism limit)
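A rough sketch of this action encoding (names are illustrative, not Decima’s actual code): each scheduling event picks one runnable DAG node and caps how many executors its job may hold.

```python
from typing import NamedTuple

class Action(NamedTuple):
    node_id: int      # which DAG node (stage) to schedule next
    parallelism: int  # upper limit on executors for that node's job

def apply_action(action, free_executors, node_to_job, executors_in_use):
    """Grant free executors to the chosen node, up to the parallelism limit."""
    job = node_to_job[action.node_id]
    room = max(0, action.parallelism - executors_in_use.get(job, 0))
    grant = min(room, len(free_executors))
    executors_in_use[job] = executors_in_use.get(job, 0) + grant
    return free_executors[:grant], free_executors[grant:]  # (assigned, still free)
```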

SLIDE 31

Process Job Information

Node features (for each node of Job DAG 1 … Job DAG n):

  • # of tasks
  • avg. task duration
  • # of servers currently assigned to the node
  • are free servers local to this job?

Must handle an arbitrary number of jobs.

SLIDE 32

Graph Neural Network

[Diagram: a job DAG with a score on each node (e.g., 6, 8, 3, 2).]
slide-33
SLIDE 33

Step 1 Step 1 Job DAG 1 Job DAG n Step 2 Step 2

Children of v !" = $ %", !' '∈) " ; +

Gr Grap aph Neural al Network

Same aggregation applied to all nodes for each DAG
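A minimal sketch of this aggregation (not the actual Decima implementation: f and g are tiny random MLPs standing in for trained networks, and the dimensions are made up). Embeddings are computed bottom-up, with the same f and g reused at every node:

```python
import numpy as np

def mlp(dims, rng=np.random.default_rng(0)):
    """Tiny random ReLU MLP, returned as a callable (illustrative only)."""
    Ws = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(dims, dims[1:])]
    def forward(x):
        for W in Ws[:-1]:
            x = np.maximum(x @ W, 0.0)
        return x @ Ws[-1]
    return forward

f = mlp([8, 16, 8])    # transforms a child's embedding into a message
g = mlp([16, 16, 8])   # combines a node's own features with the messages

def embed(v, children, features, memo):
    """e_v = g(x_v, sum of f(e_u) over children u of v)."""
    if v not in memo:
        msg = sum((f(embed(u, children, features, memo)) for u in children[v]),
                  np.zeros(8))
        memo[v] = g(np.concatenate([features[v], msg]))
    return memo[v]

# Usage: embed(root, children, features, {}) with children[v] a list of
# child node ids and features[v] an 8-dim vector per node.
```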

SLIDE 34

Graph Neural Network

The same aggregation is applied everywhere in the DAGs; for example, with max as the aggregation, the network can compute each job’s critical path.

[Diagram: max operations propagated along a DAG to find the critical path.]

SLIDE 35

Graph Neural Network

Use supervised learning to verify the representation, with the same aggregation applied everywhere in the DAGs.

[Figure: supervised-learning training curve, testing accuracy (40%–100%) vs. number of iterations (50–350), comparing a single non-linear aggregation against Decima’s two-level aggregation.]

SLIDE 36

Training

[Diagram: the Decima agent interacts with a (simulated) cluster, generating experience data for reinforcement learning training.]

SLIDE 37

Handle Online Job Arrival

The RL agent has to experience continuous job arrivals during training → inefficient if we simply feed it long sequences of jobs.

[Figure: number of backlogged jobs over time under the initial random policy.]

SLIDE 38

Handle Online Job Arrival

The RL agent has to experience continuous job arrivals during training → inefficient if we simply feed it long sequences of jobs.

[Figure: under the initial random policy the backlog keeps growing, wasting training time.]

SLIDE 39

Handle Online Job Arrival

The RL agent has to experience continuous job arrivals during training → inefficient if we simply feed it long sequences of jobs.

[Figure: early reset for initial training limits how long the backlog can grow under the initial random policy.]

SLIDE 40

Handle Online Job Arrival

The RL agent has to experience continuous job arrivals during training → inefficient if we simply feed it long sequences of jobs.

Curriculum learning: increase the reset time as training proceeds; a stronger policy keeps the queue stable.

[Figure: the backlog stays bounded as episodes grow longer.]
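One way to implement this curriculum (an assumed form; the exact schedule is not given on the slide) is to grow the training-episode horizon over iterations:

```python
def episode_horizon(iteration, start=500, growth=1.3, every=100, cap=50_000):
    """Episode length in environment steps: short early resets at first,
    gradually longer as the policy learns to keep the job queue stable."""
    return int(min(start * growth ** (iteration // every), cap))

# episode_horizon(0) == 500; episode_horizon(1_000) is roughly 6_900;
# the horizon saturates at 50_000 steps once training matures.
```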

SLIDE 41

Variance from Job Sequences

The RL agent needs to be robust to variation in job arrival patterns → huge variance can throw off the training process.

SLIDE 42

Review: Policy Gradient RL Methods

$\theta \leftarrow \theta + \alpha \, \nabla_\theta \log \pi_\theta(s_t, a_t) \left( \sum_{t'=t}^{T} r_{t'} - b(s_t) \right)$

Here $\sum_{t'=t}^{T} r_{t'}$ is the “return” from step $t$, and $b(s_t)$ is the “baseline”, the expected return from state $s_t$. The update increases the probability of actions with better-than-expected returns.
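A compact sketch of this update (REINFORCE with a baseline; `baseline` and `grad_log_pi` are assumed callables, not part of the talk’s code):

```python
import numpy as np

def reinforce_update(theta, trajectory, baseline, grad_log_pi, lr=1e-3):
    """trajectory: list of (state, action, reward) tuples for one episode."""
    rewards = np.array([r for (_, _, r) in trajectory])
    for t, (s, a, _) in enumerate(trajectory):
        ret = rewards[t:].sum()                    # return from step t
        advantage = ret - baseline(s)              # better than expected?
        theta = theta + lr * advantage * grad_log_pi(theta, s, a)
    return theta
```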

SLIDE 43

Variance from Job Sequences

[Figure: job size over time around action $a_t$, under two different future workloads (#1 and #2).]

Score for action $a_t$ = (return after $a_t$) − (average return) = $\sum_{t'=t}^{T} r_{t'} - b(s_t)$

Must consider the entire job sequence to score actions.

SLIDE 44

Input-Dependent Baseline

Standard: score for action $a_t$ = $\sum_{t'=t}^{T} r_{t'} - b(s_t)$

Input-dependent: score for action $a_t$ = $\sum_{t'=t}^{T} r_{t'} - b(s_t, z_t, z_{t+1}, \dots)$, where $b(s_t, z_t, z_{t+1}, \dots)$ is the average return for trajectories from state $s_t$ with job sequence $z_t, z_{t+1}, \dots$

Theorem: input-dependent baselines reduce variance without adding bias,

$\mathbb{E}\left[ \nabla_\theta \log \pi_\theta(s_t, a_t) \; b(s_t, z_t, z_{t+1}, \dots) \right] = 0$

  • Variance Reduction for Reinforcement Learning in Input-Driven Environments. Hongzi Mao, Shaileshh Bojja Venkatakrishnan, Malte Schwarzkopf, Mohammad Alizadeh. International Conference on Learning Representations (ICLR), 2019.
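A sketch of one way to estimate such a baseline (an assumed estimator, not the paper’s exact code): replay the same job arrival sequence z across several rollouts and average the per-step returns, so workload randomness cancels out of the advantage.

```python
import numpy as np

def input_dependent_baseline(returns_by_input):
    """returns_by_input: {z_id: array of shape (num_rollouts, T)} where every
    rollout under key z_id replayed the same job arrival sequence z.
    Returns {z_id: length-T array b(s_t, z)}: the mean return at each step."""
    return {z: rets.mean(axis=0) for z, rets in returns_by_input.items()}

def advantages(per_step_returns, z_id, baseline):
    """Advantage of each action: its return minus the z-conditioned baseline."""
    return per_step_returns - baseline[z_id]
```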

SLIDE 45

Input-Dependent Baseline

Broadly applicable to other systems with an external input process: adaptive video streaming, load balancing, caching, robotics with disturbances, …

[Figure: training curves with the standard baseline vs. the input-dependent baseline.]

  • Variance Reduction for Reinforcement Learning in Input-Driven Environments. Hongzi Mao, Shaileshh Bojja Venkatakrishnan, Malte Schwarzkopf, Mohammad Alizadeh. International Conference on Learning Representations (ICLR), 2019.

SLIDE 46

Evaluation

SLIDE 47

Decima vs. Baselines: Batched Arrivals

  • 20 TPC-H queries sampled at random; input sizes: 2, 5, 10, 20, 50, 100 GB
  • Decima trained on a simulator; tested on a real Spark cluster

Decima improves average job completion time by 21%–3.1× over baseline schemes.

SLIDE 48

Decima with Continuous Job Arrivals

1,000 jobs arrive as a Poisson process with an average inter-arrival time of 25 sec. Decima achieves 28% lower average JCT than the best heuristic, and 2× better JCT in overload.

[Figure: JCT comparison across schedulers; lower is better.]
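The workload generator implied here is straightforward to sketch: a Poisson arrival process means i.i.d. exponential inter-arrival times.

```python
import numpy as np

rng = np.random.default_rng(0)
inter_arrivals = rng.exponential(scale=25.0, size=1000)  # mean 25 sec
arrival_times = np.cumsum(inter_arrivals)  # timestamps of the 1,000 jobs
```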

SLIDE 49

Understanding Decima

[Figure: executor assignments under the tuned weighted-fair scheduler vs. Decima.]

SLIDE 50

Flexibility: Multi-Resource Scheduling

Industrial trace (Alibaba): 20,000 jobs from a production cluster, with multi-resource requirements (CPU cores + memory units).

SLIDE 51

Flexibility: Different Objectives & Systems

[Figure: executors in use vs. time (30–120 seconds), one panel per objective.]

  • Objective = Makespan: avg. JCT 74.5 sec, makespan 102.1 sec
  • Objective = Avg JCT: avg. JCT 67.3 sec, makespan 119.6 sec

SLIDE 52

Flexibility: Different Objectives & Systems

[Figure: executors in use vs. time (30–120 seconds), one panel per setting.]

  • Objective = Avg JCT: avg. JCT 67.3 sec, makespan 119.6 sec
  • Objective = Avg JCT with zero-cost migration: avg. JCT 61.4 sec, makespan 114.3 sec

SLIDE 53

Other Evaluations

  • Impact of each component in the learning algorithm
  • Generalization to different workloads
  • Training and inference speed
  • Handling missing features
  • Optimality gap

SLIDE 54

Real-world Video Bitrate Adaptation with RL

[Diagram: a simulator (§3.1) replays network and watch-time traces (§3.3); the RL agent’s policy neural network (§3.2) observes the state (predicted bandwidth and current buffer) and samples a bitrate action (240P, 360P, 720P, or 1080P), trained with reward shaping (§3.4).]

  • Real-world Video Adaptation with Reinforcement Learning. Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy. ICML Workshop, 2019.

SLIDE 55

Real-world Video Bitrate Adaptation with RL

[Diagram: on the back end, a simulator (§3.1) feeds RL training (§3.2–4, built on ELF), which stores experience and updates the model; policy translation (§3.5) produces a translated ABR model for the front end, which maps state observations from each video session to the next bitrate action.]

  • Real-world Video Adaptation with Reinforcement Learning. Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy. ICML Workshop, 2019.

SLIDE 56

Real-world Video Bitrate Adaptation with RL

[Diagram: state → policy network → action. The state includes a bandwidth estimate $x_t$, buffer occupancy, and the file sizes $n_t^1 \dots n_t^M$ of the M candidate bitrates; layers with shared parameters $\theta$ feed a softmax that outputs $\pi_\theta(a_t \mid s_t)$ over the bitrates.]

  • Real-world Video Adaptation with Reinforcement Learning. Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy. ICML Workshop, 2019.
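A minimal sketch of this policy network (assumed shapes, and a simple linear layer for illustration; the production model is a deeper network, and the file sizes below are hypothetical):

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

def abr_policy(theta, bandwidth_est, buffer_occupancy, file_sizes):
    """pi_theta(a_t | s_t): distribution over the M candidate bitrates."""
    state = np.concatenate([[bandwidth_est, buffer_occupancy], file_sizes])
    return softmax(state @ theta)

M = 4                                       # e.g. 240P, 360P, 720P, 1080P
theta = np.random.default_rng(0).normal(scale=0.01, size=(2 + M, M))
probs = abr_policy(theta, bandwidth_est=3.2, buffer_occupancy=12.0,
                   file_sizes=np.array([0.4, 0.9, 2.1, 4.5]))  # MB, made up
```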

SLIDE 57

Real-world Video Bitrate Adaptation with RL

  • Real-world Video Adaptation with Reinforcement Learning. Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy. ICML Workshop, 2019.
SLIDE 58

Real-world Video Bitrate Adaptation with RL

  • Real-world Video Adaptation with Reinforcement Learning. Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy. ICML Workshop, 2019.

SLIDE 59

Park: An Open Platform for Learning-Augmented Systems

[Diagram: a computer-system environment (C++, Java, HTML, Rust, …) triggers an MDP step and sends an RPC request to a listening server; through a common Python interface, the RL agent object (actor, experience storage, training) receives state and reward and replies with an action.]

  • 12 system environments for networking, databases, distributed systems, …
  • Contains real systems and simulations
  • Interact with the system through a standard API (see the sketch below)

  • Park: An Open Platform for Learning-Augmented Computer Systems. H. Mao, P. Negi, A. Narayan, H. Wang, J. Yang, H. Wang, R. Marcus, R. Addanki, M. Khani, S. He, V. Nathan, F. Cangialosi, S. Venkatakrishnan, W. Weng, S. Han, T. Kraska, M. Alizadeh. Neural Information Processing Systems (NeurIPS), 2019.
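The standard API is gym-style; a sketch of the interaction loop along the lines of the Park README (the environment name is one of Park’s environments, and random actions stand in for a learned agent):

```python
import park

env = park.make('load_balance')   # one of the 12 Park system environments

obs = env.reset()
done = False
while not done:
    act = env.action_space.sample()          # stand-in for a learned policy
    obs, reward, done, info = env.step(act)  # state, reward from the system
```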

SLIDE 60

Park: An Open Platform for Learning-Augmented Systems

  • Park: An Open Platform for Learning-Augmented Computer Systems. H. Mao, P. Negi, A. Narayan, H. Wang, J. Yang, H. Wang, R. Marcus, R. Addanki, M. Khani, S. He, V. Nathan, F. Cangialosi, S. Venkatakrishnan, W. Weng, S. Han, T. Kraska, M. Alizadeh. Neural Information Processing Systems (NeurIPS), 2019.

SLIDE 61

Park: An Open Platform for Learning-Augmented Systems

Some example challenges:

  • Infinite horizon
  • Representation of the states and actions
  • Simulation-reality gap
  • Needle-in-the-haystack problem

  • Park: An Open Platform for Learning-Augmented Computer Systems. H. Mao, P. Negi, A. Narayan, H. Wang, J. Yang, H. Wang, R. Marcus, R. Addanki, M. Khani, S. He, V. Nathan, F. Cangialosi, S. Venkatakrishnan, W. Weng, S. Han, T. Kraska, M. Alizadeh. Neural Information Processing Systems (NeurIPS), 2019.
SLIDE 62

Summary

  • Decima develops new RL algorithms to learn workload-specific cluster scheduling algorithms: http://web.mit.edu/decima/
  • ABRL conducts a large-scale production experiment applying RL to video bitrate adaptation: https://openreview.net/forum?id=SJlCkwN8iV
  • Park open-sources a platform for RL research in computer systems: https://github.com/park-project/park