SLIDE 1

Eagle: Refining Congestion Control by Learning from the Experts

Salma Emara (1), Baochun Li (1), Yanjiao Chen (2)

(1) University of Toronto, {salma, bli}@ece.utoronto.ca
(2) Wuhan University, chenyj.thu@gmail.com

SLIDE 2

Internet Congestion Control

Video Streaming Applications

[Timeline figure: Vegas (before 2000); Hybla, BIC, CUBIC, Illinois (2000-2010); PCC, Vivace, Indigo and BBR (2015-2020)]

SLIDE 3

Internet Congestion Control

[Timeline figure repeated, highlighting PCC and Vivace]

[Dong et al., 2015 & 2018]

  • Online learning
  • Utility framework
SLIDE 4

Internet Congestion Control

[Timeline figure repeated, now also highlighting BBR]

[Cardwell et al., 2016]

  • Heuristic
  • Estimate bottleneck bandwidth and minimum RTT

[Dong et al., 2015 & 2018]

  • Online learning
  • Utility framework
SLIDE 5

Internet Congestion Control

[Timeline figure repeated, now also highlighting Indigo]

[Cardwell et al., 2016]

  • Heuristic
  • Estimate bottleneck bandwidth and minimum RTT

[Dong et al., 2015 & 2018]

  • Online learning
  • Utility framework

[Yan et al., 2018]

  • Offline learning
  • Map states to actions
SLIDE 6

Existing Congestion Control Algorithms

  • Fixed mappings between events and control responses

[Illustration: is the bandwidth dynamic or stable? Shared with other flows? Lossy?]

SLIDE 7

Existing Congestion Control Algorithms

  • Fixed mappings between events and control responses
  • Mappings are tied to the environments the model was trained on

[Illustration: is the bandwidth dynamic or stable? Shared with other flows? Lossy?]

SLIDE 8

Existing Congestion Control Algorithms

  • Fixed mappings between events and control responses
  • Mappings are tied to the environments the model was trained on
  • Oblivious to earlier traffic patterns

[Illustration: is the bandwidth dynamic or stable? Shared with other flows? Lossy?]

SLIDE 9

Think of Congestion Control as a Game

SLIDE 10

Think of Congestion Control as a Game

1. No fixed way to play the game

SLIDE 11

Think of Congestion Control as a Game

1. No fixed way to play the game
2. Based on changes in the game, you make a move

SLIDE 12

Think of Congestion Control as a Game

1. No fixed way to play the game
2. Based on changes in the game, you make a move
3. Use history to understand your game environment

SLIDE 13

A Sender/Learner/Agent can be trained to play the Congestion Control Game

SLIDE 14

Earlier Success Stories of Training for Games

  • In 2016, AlphaGo became the first computer program to beat a human expert at the game of Go
  • It was trained using supervised and reinforcement learning


SLIDE 15

Contributions

  • Eagle is designed to
  • Train using reinforcement learning
  • Learn from an expert and explore on its own
  • Match the performance of the expert and outperform it on average


SLIDE 16

What do we need to play the congestion control game?

SLIDE 17

Target Solution Characteristics

  • Consider
  • Avoiding deterministic mappings between network states and actions by the sender

SLIDE 18

Target Solution Characteristics

  • Consider
  • Avoiding deterministic mappings between network states and actions by the sender
  • Generalizing well to many network environments

SLIDE 19

Target Solution Characteristics

  • Consider
  • Avoiding deterministic mappings between network states and actions by the sender
  • Generalizing well to many network environments
  • Adapting well to newly seen network environments

SLIDE 20

Target Solution Characteristics

  • Consider
  • Avoiding deterministic mappings between network states and actions by the sender
  • Generalizing well to many network environments
  • Adapting well to newly seen network environments

  • Areas of focus
  • Stochastic policy
  • A more general system design
  • Online learning


SLIDE 22

General Framework of Reinforcement Learning

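This slide shows the standard agent-environment loop: the agent observes a state, picks an action, and receives a reward along with the next state. A minimal self-contained sketch of that loop (the toy environment, the 5 Mbps capacity, and the random policy are illustrative assumptions, not Eagle's code):

```python
import random

class ToyNetworkEnv:
    """Hypothetical stand-in for a network environment (not Eagle's)."""
    def reset(self):
        self.rate = 1.0                      # sending rate in Mbps
        return self.rate

    def step(self, action):
        self.rate *= action                  # actions scale the sending rate
        reward = -abs(self.rate - 5.0)       # best reward at an assumed 5 Mbps capacity
        return self.rate, reward

env = ToyNetworkEnv()
state = env.reset()
for t in range(100):
    action = random.choice([2.89, 1.25, 1.0, 1 / 1.25, 1 / 2.89])  # placeholder policy
    state, reward = env.step(action)         # environment feeds back reward and next state
```

A trained agent replaces the random choice with a policy that maps the observed state to an action, which is exactly what the following slides build up.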

SLIDE 23

Challenges in using Deep Reinforcement Learning

slide-24
SLIDE 24

First-Cut: GOLD

  • Deep neural network with two hidden layers
  • Congestion window size (cwnd) as the control parameter
  • State space: [sending rate, loss rate, RTT gradient] over the past 4 steps
  • Action space: [× 2.89, × 1.5, × 1.05, 0 (no change), ÷ 2.89, ÷ 1.5, ÷ 1.05]
  • Reward function: r_t = goodness^a − b · goodness · (dRTT/dT) − c · goodness · L_t, following the utility framework u_t = x_t^a − b · x_t · (dRTT/dT) − c · x_t · L_t [Dong et al., 2015 & 2018]
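Written as code, one step of this reward computation might look like the sketch below; the coefficient values are Vivace's published defaults (a = 0.9, b = 900, c = 11.35), which GOLD may or may not have reused:

```python
def gold_reward(goodness, rtt_gradient, loss_rate, a=0.9, b=900.0, c=11.35):
    """Vivace-style reward: reward throughput ('goodness'), penalize
    RTT inflation (dRTT/dT) and packet loss. Coefficients are assumed."""
    return goodness ** a - b * goodness * rtt_gradient - c * goodness * loss_rate

# Example: decent goodness, but rising RTT and some loss pull the reward down.
r = gold_reward(goodness=4.0, rtt_gradient=0.001, loss_rate=0.01)
```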

SLIDE 25

Issues with GOLD

  • Overly aggressive action space, taking too much time to drain queues
  • Not considering delays in our reward function
  • Hard-coded the number of past steps considered to 4
  • Slow training convergence, since the step size was dependent on RTT

[Figure: a 5 Mbps link with 40 ms one-way delay]

SLIDE 26

Motivating Current System Design

  • Deep reinforcement learning:
  • Stochastic policy, hence we choose a policy-based algorithm
  • LSTM neural network to carry memory across time steps
  • Generalize the system design:
  • State space that applies across different environments
  • Reward tailored to different phases

SLIDE 27

Motivating Current System Design

  • Why do we need an expert?
  • To get out of bad states that slow training, since the step size depends on RTT
  • No need to try very bad actions when we can learn easy tasks quickly from the expert
  • To avoid local optima


SLIDE 28

Expert BBR Mechanism

  • Start-up phase: aggressive increase in sending rate until delay is seen
  • Queue draining phase: decrease sending rate to the last sending rate before delay
  • Bandwidth probing phase: increase sending rate slowly until delay is seen

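A toy rendering of this three-phase logic as a state machine (a deliberate simplification; real BBR also paces with a gain cycle and tracks windowed max-bandwidth and min-RTT estimates). The 2.89 and 1.25 multipliers are borrowed from Eagle's action space on the next slides:

```python
def bbr_like_expert(phase, rate, last_good_rate, delay_rising):
    """Toy three-phase controller mirroring the slide; not real BBR.
    Returns the next phase and the next sending rate."""
    if phase == "startup":
        if delay_rising:
            return "drain", last_good_rate    # delay seen: stop ramping up
        return "startup", rate * 2.89         # aggressive increase
    if phase == "drain":
        return "probe", last_good_rate        # fall back to the pre-delay rate
    # bandwidth probing phase
    if delay_rising:
        return "drain", last_good_rate
    return "probe", rate * 1.25               # slow increase until delay is seen
```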

SLIDE 29

Design Decisions

  • Reward function: accurate feedback to the agent
  • Start-up phase: r_t ∝ Δ delivery rate
  • Queue draining phase: r_t ∝ −Δ queueing delay
  • Bandwidth probing phase: r_t ∝ (Δ delivery rate − Δ queueing delay)
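One way to express these proportionalities in code (function and argument names are illustrative; the slide gives only proportional forms, so scaling constants are omitted):

```python
def eagle_reward(phase, d_delivery_rate, d_queueing_delay):
    """Phase-tailored reward following the slide's proportionalities."""
    if phase == "startup":
        return d_delivery_rate                       # reward growth in delivery rate
    if phase == "drain":
        return -d_queueing_delay                     # reward shrinking the queue
    return d_delivery_rate - d_queueing_delay        # probing: balance rate against delay
```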

SLIDE 30

Design Parameters

  • Step size: 3 RTT
  • State space (for the past 4 steps):
  • Experienced delay before?
  • Increase and decrease multiples
  • Percentage change in the exponentially weighted moving average (EWMA) of the delivery rate
  • Loss rate
  • EWMA of the queueing delay

  • Algorithm: cross-entropy method
  • Neural network: LSTM with 64 hidden units and 2 layers
  • Action space on the sending rate: × 2.89, × 1.25, do nothing, ÷ 1.25, ÷ 2.89
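Two of these parameters are easy to make concrete. Below is a sketch of the EWMA feature and the multiplicative action update; the smoothing factor alpha is an assumption, since the slide does not specify it:

```python
def ewma(prev, sample, alpha=0.3):
    """Exponentially weighted moving average; alpha = 0.3 is assumed."""
    return alpha * sample + (1 - alpha) * prev

# The five sending-rate actions from the slide, as multiplicative gains.
ACTIONS = [2.89, 1.25, 1.0, 1 / 1.25, 1 / 2.89]

def apply_action(sending_rate, action_index):
    """'Do nothing' is the gain of 1.0 in the middle of the list."""
    return sending_rate * ACTIONS[action_index]
```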

SLIDE 31

System Design

[System diagram: the agent, an LSTM followed by a softmax, receives state s_t and reward r_t from the network environment and emits action a_t, then observes r_t+1 and s_t+1. During training, actions may instead come from the synthesized BBR expert. Congestion signals flow in; sending rate adjustments flow out.]
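A sketch of that agent in PyTorch, matching the sizes on slide 30 (2-layer LSTM, 64 hidden units, softmax over the 5 actions); the state dimension of 5 features per step is our reading of the state-space list, not confirmed code:

```python
import torch
import torch.nn as nn

class EaglePolicy(nn.Module):
    """LSTM-plus-softmax policy as drawn in the system diagram."""
    def __init__(self, state_dim=5, num_actions=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=state_dim, hidden_size=64, num_layers=2)
        self.head = nn.Linear(64, num_actions)

    def forward(self, states, hidden=None):
        # states: (seq_len, batch, state_dim), e.g. the past 4 steps
        out, hidden = self.lstm(states, hidden)
        logits = self.head(out[-1])               # decide from the latest step
        return torch.softmax(logits, dim=-1), hidden

# Sampling an action from the stochastic policy:
policy = EaglePolicy()
probs, h = policy(torch.randn(4, 1, 5))           # 4 past steps, batch of 1
action = torch.multinomial(probs, num_samples=1)  # stochastic, not argmax
```

With the cross-entropy method from slide 30, training would repeatedly roll out episodes, keep the top-scoring ones, and fit the policy to their state-action pairs.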

SLIDE 32

Results: Pantheon LTE Environment


SLIDE 33

Results: Pantheon Constant Bandwidth Environment


SLIDE 34

Concluding Remarks

  • Eagle: a congestion control algorithm powered by deep reinforcement learning and a teacher, BBR
  • Generalizes well
  • Performed well on newly seen environments
  • A step forward toward self-learning congestion control
  • Future work:
  • Test the performance in the online-learning phase
  • Test fairness with other flows


SLIDE 35

Thank you!
