Eagle: Refining Congestion Control by Learning from the Experts


  1. Eagle: Refining Congestion Control by Learning from the Experts. Salma Emara¹, Baochun Li¹, Yanjiao Chen². ¹University of Toronto, {salma, bli}@ece.utoronto.ca; ²Wuhan University, chenyj.thu@gmail.com

  2. Internet Congestion Control: Video Streaming Applications. (Timeline figure: congestion control algorithms from before 2000 to 2020, including Vegas, Hybla, BIC, Illinois, CUBIC, PCC Vivace, BBR, and Indigo.)

  3. Internet Congestion Control ‣ PCC Vivace [Dong et al., 2015 & 2018]: online learning; utility framework. (Timeline figure as before.)

  4. Internet Congestion Control ‣ PCC Vivace [Dong et al., 2015 & 2018]: online learning; utility framework ‣ BBR [Cardwell et al., 2016]: heuristic; estimates the bottleneck bandwidth and minimum RTT. (Timeline figure as before.)

  5. Internet Congestion Control ‣ PCC Vivace [Dong et al., 2015 & 2018]: online learning; utility framework ‣ BBR [Cardwell et al., 2016]: heuristic; estimates the bottleneck bandwidth and minimum RTT ‣ Indigo [Yan et al., 2018]: offline learning; maps states to actions. (Timeline figure as before.)

  6. Existing Congestion Control Algorithms ‣ Fixed mappings between events and control responses. (Illustration: Is the bandwidth dynamic or stable? Is the link shared with other flows? Is it lossy?)

  7. Existing Congestion Control Algorithms ‣ Fixed mappings between events and control responses ‣ Mappings are fixed to the environments the model was trained on. (Illustration: Is the bandwidth dynamic or stable? Is the link shared with other flows? Is it lossy?)

  8. Existing Congestion Control Algorithms ‣ Fixed mappings between events and control responses ‣ Mappings are fixed to the environments the model was trained on ‣ Oblivious to earlier traffic patterns. (Illustration: Is the bandwidth dynamic or stable? Is the link shared with other flows? Is it lossy?)

  9. Think of Congestion Control as a Game

  10. Think of Congestion Control as a Game ‣ (1) There is no fixed way to play the game

  11. Think of Congestion Control as a Game ‣ (1) There is no fixed way to play the game ‣ (2) Based on changes in the game, you make a move

  12. Think of Congestion Control as a Game ‣ (1) There is no fixed way to play the game ‣ (2) Based on changes in the game, you make a move ‣ (3) Use history to understand your game environment

  13. A Sender/Learner/Agent can be trained to play the Congestion Control Game

  14. Earlier Success Stories of Training for Games ‣ In 2016, AlphaGo became the first program to beat a human expert at the game of Go ‣ It was trained using supervised and reinforcement learning

  15. Contributions ‣ Eagle is designed to: ‣ Train using reinforcement learning ‣ Learn from an expert and explore on its own ‣ Match the performance of the expert and outperform it on average

  16. What do we need to play the congestion control game?

  17. Target Solution Characteristics ‣ Consider ‣ Avoiding deterministic mappings between network states and actions by the sender

  18. Target Solution Characteristics ‣ Consider ‣ Avoiding deterministic mappings between network states and actions by the sender ‣ Generalizing well to many network environments

  19. Target Solution Characteristics ‣ Consider ‣ Avoiding deterministic mappings between network states and actions by the sender ‣ Generalizing well to many network environments ‣ Adapting well to newly seen network environments

  20. Target Solution Characteristics ‣ Considerations and corresponding areas of focus: ‣ Avoiding deterministic mappings between network states and actions by the sender → stochastic policy ‣ Generalizing well to many network environments → a more general system design ‣ Adapting well to newly seen network environments → online learning

  22. General Framework of Reinforcement Learning
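
As a rough illustration of this framework, the loop below shows how an agent and environment interact in reinforcement learning; the Environment and Agent interfaces are generic placeholders for illustration, not Eagle's actual implementation.

```python
# Minimal sketch of the agent-environment loop in reinforcement learning.
# `env` and `agent` stand for hypothetical placeholder objects.

def run_episode(env, agent, max_steps=1000):
    state = env.reset()                                   # initial observation s_0
    for _ in range(max_steps):
        action = agent.act(state)                         # policy maps state s_t to action a_t
        next_state, reward, done = env.step(action)       # environment returns s_{t+1}, r_{t+1}
        agent.observe(state, action, reward, next_state)  # store experience for learning
        state = next_state
        if done:
            break
```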

  23. Challenges in using Deep Reinforcement Learning

  24. First-Cut: GOLD ‣ Deep neural network with two hidden layers ‣ Congestion window size (cwnd) as the control parameter ‣ State space: [sending rate, loss rate, RTT gradient] over the past 4 steps ‣ Action space: [× 2.89, × 1.5, × 1.05, do nothing, ÷ 2.89, ÷ 1.5, ÷ 1.05] ‣ Reward function: r_t = goodness^a − b × goodness × (dRTT/dT) − c × goodness × L_t, mirroring the PCC Vivace utility u_t = x_t − b × x_t × (dRTT/dT) − c × x_t × L_t (x_t: sending rate, "goodness"; L_t: loss rate)
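
A minimal sketch of GOLD's reward, assuming placeholder values for the coefficients a, b, and c (the slide does not give the actual constants):

```python
def gold_reward(goodness, rtt_gradient, loss_rate, a=1.0, b=1.0, c=1.0):
    """Reward used by the GOLD first cut:
        r_t = goodness^a - b * goodness * (dRTT/dT) - c * goodness * L_t
    It rewards throughput ("goodness", the sending rate) and penalizes RTT
    growth and packet loss. The coefficient values a, b, c are placeholders."""
    return goodness ** a - b * goodness * rtt_gradient - c * goodness * loss_rate
```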

  25. Issues with GOLD (experiment: 5 Mbps link, 40 ms one-way delay) ‣ Overly aggressive action space, so draining queues takes a long time ‣ Delays were not considered in the reward function ‣ The number of past steps considered was hard-coded to 4 ‣ Slow training convergence, since the step size depended on the RTT

  26. Motivating Current System Design ‣ Deep reinforcement learning: ‣ Stochastic policy, hence we choose a policy-based algorithm ‣ LSTM neural network to carry information across time steps ‣ Generalize the system design: ‣ State space that carries over across different environments ‣ Reward tailored to different phases

  27. Motivating Current System Design ‣ Why do we need an expert? ‣ To get out of bad states that slow down training, since the step size depends on the RTT ‣ No need to try very bad actions when we can learn easy tasks quickly from the expert ‣ To avoid local optima

  28. Expert BBR Mechanism ‣ Start-up phase: aggressive increase in sending rate until delay is seen ‣ Queue-draining phase: decrease sending rate to the last sending rate before delay was seen ‣ Bandwidth-probing phase: increase sending rate slowly until delay is seen
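
A simplified sketch of how such a three-phase expert could adjust the sending rate; the gain factors, phase names, and transition logic are illustrative assumptions, not BBR's actual constants.

```python
def expert_rate(phase, rate, last_good_rate, delay_rising,
                startup_gain=2.0, probe_gain=1.25):
    """Simplified three-phase expert in the spirit of BBR, returning the next
    (phase, sending_rate). Gains and transition logic are illustrative."""
    if phase == "startup":
        if delay_rising:                        # delay observed: stop the aggressive ramp-up
            return "drain", last_good_rate      # fall back to the last rate before delay
        return "startup", rate * startup_gain   # aggressive multiplicative increase
    if phase == "drain":
        return "probe", last_good_rate          # queue drained: start probing for bandwidth
    if delay_rising:                            # probing has built a queue again
        return "drain", last_good_rate
    return "probe", rate * probe_gain           # probe: increase sending rate slowly
```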

  29. Design Decisions ‣ Reward function: accurate feedback to the agent ‣ Start-up phase: r_t ∝ Δ delivery rate ‣ Queue-draining phase: r_t ∝ −Δ queueing delay ‣ Bandwidth-probing phase: r_t ∝ (Δ delivery rate − Δ queueing delay)
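
A minimal sketch of the phase-dependent reward, assuming placeholder proportionality constants k1 and k2 (the slide only gives the proportional forms):

```python
def eagle_reward(phase, delta_delivery_rate, delta_queueing_delay,
                 k1=1.0, k2=1.0):
    """Phase-dependent reward; k1 and k2 are placeholder constants."""
    if phase == "startup":
        return k1 * delta_delivery_rate            # r_t proportional to delta delivery rate
    if phase == "drain":
        return -k2 * delta_queueing_delay          # r_t proportional to -delta queueing delay
    # bandwidth probing: balance throughput gains against queueing delay
    return k1 * delta_delivery_rate - k2 * delta_queueing_delay
```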

  30. Design Parameters ‣ Algorithm: cross-entropy method ‣ Step size: 3 × RTT ‣ Neural network: LSTM with 64 hidden units and 2 layers ‣ State space (for the past 4 steps): ‣ Experienced delay before? ‣ Percentage change in the exponentially weighted moving average (EWMA) of the delivery rate ‣ Loss rate ‣ EWMA of queueing delay ‣ Action space (increase/decrease multiples on the sending rate): × 2.89, × 1.25, do nothing, ÷ 1.25, ÷ 2.89
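
A minimal sketch of how this multiplicative action space could be applied to the sending rate and sampled from the softmax output of the policy; the function names are illustrative, not the paper's code.

```python
import random

# Multiplicative action space on the sending rate, as listed above.
ACTIONS = [2.89, 1.25, 1.0, 1 / 1.25, 1 / 2.89]   # x2.89, x1.25, do nothing, /1.25, /2.89

def apply_action(sending_rate, action_index):
    """Scale the current sending rate by the chosen multiplier."""
    return sending_rate * ACTIONS[action_index]

def sample_action(action_probs):
    """Draw an action index from the softmax output of the policy network,
    keeping the policy stochastic rather than taking a deterministic argmax."""
    return random.choices(range(len(ACTIONS)), weights=action_probs, k=1)[0]
```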

  31. System Design (diagram): the agent, an LSTM followed by a softmax, observes state s_t and reward r_t from the network environment; it outputs action a_t, a sending-rate adjustment, either synthesized by the policy or taken from the BBR expert; congestion signals from the environment then form the next state s_{t+1} and reward r_{t+1}.
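
A schematic sketch of one interaction step that mixes the learned policy with the BBR expert; the object and method names (env.apply, policy.forward, expert.act) and the use_expert switch are assumptions for illustration.

```python
import random

def eagle_step(env, policy, expert, state, use_expert):
    """One schematic interaction step of the Eagle loop. The action comes
    either from the BBR expert or from the stochastic LSTM policy; the
    environment returns the next state and reward. Names are illustrative."""
    if use_expert:
        action = expert.act(state)                 # sending-rate action suggested by the BBR expert
    else:
        probs = policy.forward(state)              # LSTM + softmax over the action space
        action = random.choices(range(len(probs)), weights=probs, k=1)[0]
    next_state, reward = env.apply(action)         # apply rate adjustment, read congestion signals
    return action, next_state, reward
```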

  32. Results: Pantheon LTE Environment (figure)

  33. Results: Pantheon Constant Bandwidth Environment (figure)

  34. Concluding Remarks ‣ Eagle: a congestion control algorithm powered by deep reinforcement learning and a teacher (BBR) ‣ Generalizes well ‣ Performed well on newly seen environments ‣ A step forward towards self-learning congestion control ‣ Future work: ‣ Test the performance in the online-learning phase ‣ Test fairness with other flows

  35. Thank you!
