Iroko: A Data Center Emulator for Reinforcement Learning - PowerPoint PPT Presentation



SLIDE 1

Iroko

A Data Center Emulator for Reinforcement Learning

Fabian Ruffy, Michael Przystupa, Ivan Beschastnikh University of British Columbia

https://github.com/dcgym/iroko

SLIDE 2

Reinforcement Learning and Networking


SLIDE 8

  • DC challenges are optimization problems: traffic control, resource management, routing
  • Operators have complete control, so automation is possible
  • Lots of data can be collected

The Data Center: A perfect use case

Cho, Inho, Keon Jang, and Dongsu Han. "Credit-scheduled delay-bounded congestion control for datacenters." SIGCOMM 2017

SLIDE 9

  • Typical reinforcement learning is not viable for data center operators: fragile stability, questionable reproducibility, unknown generalizability
  • Prototyping RL is complicated: operators cannot interfere with live production traffic, offline traces are limited in expressivity, and deployment is tedious and slow

Two problems…

SLIDE 10

  • Iroko: an open reinforcement learning gym for data center scenarios
  • Inspired by Pantheon* for WAN congestion control
  • Deployable on a local Linux machine
  • Scales to topologies with many hosts
  • Approximates real data center conditions
  • Allows arbitrary definitions of reward, state, and actions

Our work: A platform for RL in Data Centers

*Yan, Francis Y., et al. "Pantheon: the training ground for Internet congestion-control research." ATC 2018
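Iroko exposes its scenarios through the OpenAI Gym API, so an agent interacts with the emulated network via the usual reset()/step() loop. Below is a minimal sketch of that loop with a toy stand-in environment; the class, state, and reward are illustrative, not Iroko's actual implementation:

```python
import random

class ToyDCEnv:
    """Stand-in for a dc-gym environment with a Gym-style reset()/step().

    State: normalized queue length of a single bottleneck link.
    Action: a sending-rate fraction in [0, 1] for the hosts.
    Reward: utilization minus a queueing penalty (illustrative only).
    """

    def reset(self):
        self.queue = 0.0
        return [self.queue]

    def step(self, action):
        rate = min(max(action, 0.0), 1.0)  # clamp out-of-range actions
        # The queue grows when hosts send faster than the link drains.
        self.queue = min(max(self.queue + rate - 0.8, 0.0), 1.0)
        reward = rate - 2.0 * self.queue   # utilization vs. queueing delay
        done = False
        return [self.queue], reward, done, {}

env = ToyDCEnv()
obs = env.reset()
for _ in range(100):
    action = random.random()               # placeholder for an RL policy
    obs, reward, done, info = env.step(action)
```

The real environment would report collector statistics as state and apply actions as rate limits; only the shape of the interaction loop is the point here.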


SLIDE 17

Iroko in one slide

[Diagram: a Policy interacts with the emulated data center through the OpenAI Gym interface; configurable components: Topology (Fat-Tree, Rack, Dumbbell), Traffic Pattern, Action Model, State Model, Reward Model, Data Collectors]
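Since topology, traffic pattern, and the state/action/reward models are all swappable, an experiment amounts to composing a configuration. Here is a hedged sketch assuming a dict-based config; the keys, values, and the validate helper are hypothetical, not Iroko's real schema:

```python
# Illustrative composition of the components from the diagram; these keys
# and values are hypothetical, not Iroko's real configuration API.
dc_config = {
    "topology": "dumbbell",                # also: "fat_tree", "rack"
    "traffic_pattern": "all_to_all",       # traffic generator preset
    "state_model": ["backlog", "drops"],   # features the collectors report
    "action_model": "host_rates",          # what the agent controls
    "reward_model": "bw_minus_queue",      # scalar feedback definition
}

SUPPORTED_TOPOLOGIES = {"fat_tree", "rack", "dumbbell"}

def validate(config):
    """Basic sanity check before handing a config to an environment."""
    if config["topology"] not in SUPPORTED_TOPOLOGIES:
        raise ValueError("unknown topology: %s" % config["topology"])
    return config
```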

SLIDE 18

  • An ideal data center has low latency, high utilization, fairness, and no packet loss or queuing delay
  • Congestion control variants derive from reactive TCP: queueing latency dominates, frequent retransmits reduce goodput, and data center performance may be unstable

Use Case: Congestion Control
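These desiderata can be folded into a scalar reward that trades utilization against queuing. A toy version of such a reward follows; the weights and the functional form are illustrative choices, not the exact function Iroko uses:

```python
def congestion_reward(rates, queues, capacity):
    """Toy congestion-control reward: reward utilization, penalize queues.

    rates: per-host sending rates; queues: per-port backlog;
    capacity: bottleneck bandwidth. The 2.0 weight is an arbitrary choice.
    """
    utilization = sum(rates) / capacity
    queue_penalty = sum(q / capacity for q in queues)
    return utilization - 2.0 * queue_penalty
```

Full utilization with empty queues scores 1.0; any standing backlog pulls the reward down, which is the trade-off the slide describes.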

SLIDE 19

[Diagram: a Switch connects hosts driven by a Flow Pattern; Data Collection feeds a Policy that produces a Bandwidth Allocation; all links have capacity 10]

Predicting Networking Traffic


SLIDE 22

[Diagram: the Policy limits the three competing flows to 3.3, 3.3, and 3.4 of the bottleneck capacity of 10]

Predicting Networking Traffic
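The allocation shown above amounts to splitting the bottleneck capacity of 10 equally among three flows, with rounding that preserves the total. A small sketch of that arithmetic (purely illustrative):

```python
def equal_shares(capacity, n_flows, precision=1):
    """Split capacity equally, rounding so the shares still sum to capacity.

    The last flow absorbs the rounding remainder, mirroring the
    3.3 / 3.3 / 3.4 split in the diagram.
    """
    share = round(capacity / n_flows, precision)
    shares = [share] * (n_flows - 1)
    shares.append(round(capacity - share * (n_flows - 1), precision))
    return shares

# e.g. equal_shares(10, 3) -> [3.3, 3.3, 3.4]
```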


SLIDE 24

  • Two environments:
  • env_iroko: a centralized rate-limiting arbiter; the agent sets the sending rate of hosts (trained with PPO, DDPG, REINFORCE)
  • env_tcp: raw TCP; contains implementations of TCP algorithms (TCP Cubic, TCP New Vegas, DCTCP)
  • Goal: avoid congestion

Can we learn to allocate traffic fairly?
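In env_iroko the agent's continuous actions become per-host rate limits. Below is a hedged sketch of how a centralized arbiter might map normalized actions in [0, 1] onto rates, with a floor so no host is starved entirely; the function name and the floor value are illustrative assumptions, not Iroko's actual arbiter:

```python
def actions_to_rate_limits(actions, link_capacity, min_rate=0.1):
    """Map normalized agent actions in [0, 1] to per-host rate limits.

    A floor of min_rate * link_capacity keeps every host able to send,
    so the agent cannot silence a flow completely. Illustrative logic only.
    """
    limits = []
    for a in actions:
        a = min(max(a, 0.0), 1.0)                 # clamp out-of-range actions
        frac = min_rate + (1.0 - min_rate) * a    # rescale into [min_rate, 1]
        limits.append(frac * link_capacity)
    return limits
```

The floor is one way to keep exploration from deadlocking traffic early in training; the actual trade-off belongs to the action-model design.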

SLIDE 25

  • 50000 timesteps
  • Linux default UDP as base transport
  • 5 runs (~7 hours per run)
  • Bottleneck at central link

Experiment Setup

SLIDE 26

Results – Dumbbell UDP

SLIDE 27

  • A challenging real-time environment: noisy observations and a strong credit assignment problem
  • RL algorithms show the expected behavior in our gym
  • DDPG and PPO achieve near-optimal performance, better than TCP New Vegas
  • REINFORCE fails to learn a good policy; more robust algorithms are required

Results - Takeaways

SLIDE 28

  • Data center reinforcement learning is gaining traction

…but it is difficult to prototype and evaluate

  • Iroko is:
  • a platform to experiment with RL for data centers
  • intended to train on live traffic
  • early-stage work, but experiments are promising
  • available on GitHub:

https://github.com/dcgym/iroko

Contributions