Cellular Network Traffic Scheduling using Deep Reinforcement Learning
Sandeep Chinchali et al. (with Marco Pavone and Sachin Katti), Stanford University, AAAI 2018
Can we learn to optimally manage cellular networks?
Internet traffic at a cell splits into two classes:
- Delay-sensitive: real-time mobile traffic
- Delay-tolerant (DT): IoT map/software updates, pre-fetched content
Contending goals:
- Maximize IoT traffic utilization
- Keep congestion below the acceptable limit for regular traffic
[Plot: Congestion C vs. local time (09:00-21:00), Melbourne Central Business District, 1-min rolling average; cells: shopping center, office building, Southern Cross station, Melbourne Central station]
Diverse city-wide cell patterns
4 weeks, 10 diverse cells in Downtown Melbourne, Australia
Our live network experiments match MDP dynamics
Flexibly responds to operator reward functions
Control loop: the IoT scheduler observes the network state and sets the IoT rate, which feeds back into the network dynamics.
The IoT scheduler takes the network state plus forecasts (number of users, congestion, cell efficiency) and outputs an IoT rate.
Goal: maximize safe IoT traffic V_t over the day (subject to link quality)
State: the current network state, augmented with temporal features to form the full state.
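As a rough illustration, a minimal sketch of assembling such a state in Python; the feature set, window length, and the build_state helper are assumptions for illustration, not the paper's exact featurization.

```python
import numpy as np

def build_state(congestion_hist, num_users, cell_efficiency, forecast):
    """Assemble the full state: current measurements plus temporal
    features (a recent history window and a congestion forecast).
    Feature choice and window length are illustrative assumptions."""
    current = np.array([congestion_hist[-1], num_users, cell_efficiency])
    history = np.asarray(congestion_hist[-10:])   # short temporal window
    return np.concatenate([current, history, np.asarray(forecast)])

# e.g., 10-min history plus a 5-min forecast -> an 18-dim state vector
s = build_state(list(np.linspace(1.0, 1.4, 30)), 42, 0.8,
                [1.4, 1.5, 1.5, 1.4, 1.3])
```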
RL formulation: the agent observes the network state from the environment, takes an action, and receives a reward.
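To make the loop concrete, a self-contained toy sketch is below; the CellularEnv dynamics, reward shape, congestion limit, and placeholder policy are all illustrative stand-ins, not the paper's model.

```python
import numpy as np

class CellularEnv:
    """Toy stand-in for a cell (illustrative only): congestion is driven
    by background dynamics plus the load injected by our IoT rate."""
    def __init__(self, T=60):
        self.T, self.t, self.congestion = T, 0, 1.0
    def reset(self):
        self.t, self.congestion = 0, 1.0
        return np.array([self.congestion])
    def step(self, iot_rate):
        background = 0.1 * np.sin(2 * np.pi * self.t / self.T)
        self.congestion = max(0.0, 1.0 + background + 0.5 * iot_rate)
        self.t += 1
        # reward IoT volume, penalize congestion above an assumed limit
        reward = iot_rate - max(0.0, self.congestion - 1.4)
        return np.array([self.congestion]), reward, self.t >= self.T

env = CellularEnv()
state, done = env.reset(), False
while not done:
    action = 0.2 if state[0] < 1.4 else 0.0   # placeholder policy
    state, reward, done = env.step(action)
```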
Stochastic forecast via an LSTM.
Horizon: a day of T one-minute steps.
Key quantities (defined as equations on the slide): IoT traffic rate, IoT volume per minute, and utilization gain V_IoT/V_0.
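A minimal PyTorch sketch of such a forecaster is below; layer sizes, the 30-minute window, and the 5-minute horizon are assumptions, and it emits a point forecast (the paper's stochastic forecast would instead need a distributional output).

```python
import torch
import torch.nn as nn

class CongestionForecaster(nn.Module):
    """LSTM mapping a window of past congestion measurements to a
    short-horizon forecast. All sizes are illustrative assumptions."""
    def __init__(self, hidden=32, horizon=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)   # next `horizon` minutes
    def forward(self, x):                        # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])             # forecast from last hidden state

model = CongestionForecaster()
window = torch.randn(8, 30, 1)                   # batch of 30-minute windows
forecast = model(window)                         # shape (8, 5)
```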
[Plot: Congestion C vs. local time (20:10-20:20), range 1.0-1.6: controlled traffic vs. background dynamics]
Reward: an overall weighted reward combining (1) IoT traffic volume, (2) loss to regular users, and (3) keeping traffic below the network limit.
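A sketch of composing such a reward, where the weights and penalty forms are assumptions rather than the paper's exact terms:

```python
def reward(iot_volume, user_throughput_loss, congestion, limit,
           alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted reward sketch: reward IoT volume, penalize throughput
    loss to regular users, and penalize exceeding the congestion limit.
    Weights alpha/beta/gamma and penalty forms are assumptions."""
    over_limit = max(0.0, congestion - limit)
    return alpha * iot_volume - beta * user_throughput_loss - gamma * over_limit
```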
Goal: find the optimal operator policy, trained against a "what-if" model of the environment.
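The deck's results use DDPG control; below is a compact sketch of the standard DDPG update (critic regression to a TD target, actor ascent on the critic, soft target updates), with illustrative network sizes and hyperparameters rather than the authors' configuration.

```python
import copy
import torch
import torch.nn as nn

state_dim, action_dim, gamma = 16, 1, 0.99
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_tgt, critic_tgt = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2):
    """One DDPG step on a batch of transitions (s, a, r, s2)."""
    with torch.no_grad():                         # TD target from target nets
        q_next = critic_tgt(torch.cat([s2, actor_tgt(s2)], dim=1))
        target = r + gamma * q_next
    critic_loss = ((critic(torch.cat([s, a], dim=1)) - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):  # soft update
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(0.995).add_(0.005 * p.data)

# one update on a random batch of 32 transitions (illustrative)
ddpg_update(torch.randn(32, state_dim), torch.randn(32, action_dim),
            torch.randn(32, 1), torch.randn(32, state_dim))
```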
[Plot: Utilization gain V_IoT/V_0 (%) on train and test sets for reward weight settings (e.g., α = 1, 2)]
Responds to operator priorities, with significant gains. For context, spectrum is expensive: $4.5B for 10 MHz of spectrum (2016).
Richer LSTM forecasts
RL Benchmark
Controlled congestion and utilization gain
[Left plot: Congestion C vs. local time (9:00-16:00): original, heuristic control, DDPG control, with a transient dip marked. Right plot: Utilization gain V_IoT/V_0 (%) over the same period: heuristic vs. DDPG control]
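The baseline here is a heuristic controller whose exact form the deck does not spell out; one plausible threshold heuristic, stated purely as an assumption, could look like this:

```python
def heuristic_iot_rate(congestion, limit=1.4, max_rate=1.0, k=2.0):
    """Threshold heuristic baseline (form assumed, not from the paper):
    back off the IoT rate linearly as congestion approaches the limit."""
    headroom = max(0.0, limit - congestion)
    return min(max_rate, k * headroom)
```

Unlike the RL policy, such a rule reacts only to the current measurement and uses no forecast.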
Controlled congestion and resulting throughput
[Left plot: Congestion C vs. local time (9:00-16:00): original, heuristic control, DDPG control. Right plot: Throughput B (MBps) over the same period: original, heuristic control, DDPG control, with the throughput limit marked]
Modern networks are evolving
Data-driven optimal control
Future work:
Questions: csandeep@stanford.edu
Better forecasts enhance performance.
Discretized MDP for the offline optimal policy.
[Plot: Reward R vs. discretized state-space size |S̄| (50-250) for action-space sizes |Ā| ∈ {5, 20, 40, 60}]
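The offline optimal on such a discretized MDP is typically computed by dynamic programming; a standard value-iteration sketch is below, where the tabular encoding of P and R is an assumption about the setup.

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-6):
    """Value iteration on a discretized MDP.
    P: (|A|, |S|, |S|) transition tensor, R: (|S|, |A|) reward matrix.
    Returns the optimal value function and the greedy policy."""
    nA, nS, _ = P.shape
    V = np.zeros(nS)
    while True:
        Q = R + gamma * np.einsum('asn,n->sa', P, V)   # Q(s, a)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# tiny random MDP: 4 states, 2 actions (illustrative)
rng = np.random.default_rng(0)
P = rng.random((2, 4, 4)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((4, 2))
V, policy = value_iteration(P, R)
```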
Richer LSTM forecasts approach the performance of the continuous MDP.