SLIDE 1

Cellular Network Traffic Scheduling using Deep Reinforcement Learning

Sandeep Chinchali et al., with Marco Pavone and Sachin Katti, Stanford University, AAAI 2018

SLIDE 2

Can we learn to optimally manage cellular networks?

[Diagram: the Internet feeding a cell that carries delay-sensitive real-time mobile traffic alongside delay-tolerant (DT) traffic such as IoT map/software updates and pre-fetched content]

SLIDE 3

Why is IoT/DT traffic scheduling hard?

[Plot: IoT utilization over the day against an acceptable limit]

Contending goals:

  • Maximize IoT/DT data
  • Minimize loss to mobile traffic
  • Respect network limits

→ Optimal control

SLIDE 4

Why is IoT/DT traffic scheduling hard?

[Plot: Congestion C (10-50) vs. local time (09:00-21:00), Melbourne Central Business District, 1-min rolling average; cells: shopping center, office building, Southern Cross station, Melbourne Central station]

Diverse city-wide cell patterns

SLIDE 5

Our contributions

  • 1. Identify inefficiencies in real cellular networks
      4 weeks, 10 diverse cells in downtown Melbourne, Australia
  • 2. Data-driven, deep learning network model
      Our live network experiments match MDP dynamics
  • 3. Adaptive RL scheduler
      Flexibly responds to operator reward functions

[Diagram: IoT scheduler observes the network state and outputs an IoT rate]

SLIDE 6

Why Deep Learning?

  • 1. Learn time-variant network dynamics
  • 2. Adapt to high-level network operation goals
  • 3. Generalize to diverse cells
  • 4. Abundance of network data

[Plot: congestion vs. local time across Melbourne CBD cells, as on Slide 4]

SLIDE 7

Related Work

  • 1. Dynamic resource allocation
      Electricity grid (Reddy 2011), call admission (Marbach 1998), traffic control (Chu 2016)
  • 2. Data-driven optimal control + forecasting
      Deep RL (Mnih 2013, Silver 2014, Lillicrap 2015)
      LSTM networks (Hochreiter 1997, Laptev 2017, Shi 2015)
  • 3. Machine learning for computer networks
      Cluster resource management (Mao 2016)
      Mobile video streaming (Mao 2017, Yin 2015)


SLIDE 8

Data-driven problem formulation

  • 1. Network State Space
  • 2. IoT Scheduler Actions
  • 3. Time-variant dynamics
  • 4. Network operator policies

[Diagram: IoT scheduler maps network state + forecasts (num users, congestion, cell efficiency) to an IoT rate]

SLIDE 9

Primer on Cell Networks

Goal: Maximize safe IoT traffic V_t over the day (subject to link quality)

SLIDE 10

RL setup (1): State Space

  • Current network state; full state with temporal features
  • Stochastic forecast (LSTM)
  • Horizon: a day of T minutes

[Diagram: agent-environment loop (action, network state, reward)]
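A minimal sketch, under assumed feature names, of how such a state vector could be assembled; `num_users`, `cell_efficiency`, and the `forecast` argument (standing in for the LSTM output) are illustrative, not the paper's exact feature set:

```python
import numpy as np

def build_state(current, t, T, forecast):
    """Assemble the RL state: current cell measurements, a temporal
    feature, and an LSTM congestion forecast (illustrative fields)."""
    return np.concatenate([
        [current["num_users"],         # active users in the cell
         current["congestion"],        # congestion metric C
         current["cell_efficiency"]],  # link-quality proxy
        [t / T],                       # normalized minute of the day
        np.asarray(forecast),          # predicted congestion, next k minutes
    ])

# Example: minute 540 (09:00) of a T = 1440-minute horizon
s = build_state({"num_users": 30, "congestion": 12.0, "cell_efficiency": 0.8},
                t=540, T=1440, forecast=[12.5, 13.0, 13.2])
```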

SLIDE 11

RL setup (2): Action Space

  • Action: the IoT traffic rate
  • IoT volume per minute: V_t
  • Utilization gain: V_IoT / V_0

[Diagram: agent-environment loop]
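As a toy illustration with assumed units (rates in Mbit/s, volumes in MB), not the authors' code, here is how a rate schedule maps to IoT volume and the utilization gain V_IoT / V_0:

```python
def utilization_gain(iot_rates_mbps, v0_megabytes, minutes_per_step=1):
    """Utilization gain V_IoT / V_0 (%): IoT volume scheduled by the
    controller relative to a baseline volume V_0 (illustrative units)."""
    seconds = 60 * minutes_per_step
    v_iot = sum(r * seconds / 8 for r in iot_rates_mbps)  # Mbit -> MB
    return 100.0 * v_iot / v0_megabytes

# A flat 0.4 Mbit/s schedule for 3 minutes against a 10 MB baseline:
print(utilization_gain([0.4, 0.4, 0.4], v0_megabytes=10.0))  # 90.0
```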

SLIDE 12

RL setup (3): Transition Dynamics

[Plot: Congestion C (1.0-1.6) vs. local time (20:10-20:20), decomposed into controlled traffic and background dynamics]

[Diagram: agent-environment loop]
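One way to think about these dynamics is a learned one-step model in which the controlled traffic's effect is predicted on top of background load captured by the state; the sketch below assumes a small PyTorch network and is not the paper's exact model:

```python
import torch
import torch.nn as nn

class TransitionModel(nn.Module):
    """Sketch of a learned transition model: predicts next-minute
    congestion from the current state and the scheduled IoT rate.
    Background dynamics must be carried by the state features."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, iot_rate):
        # state: (batch, state_dim); iot_rate: (batch, 1)
        return self.net(torch.cat([state, iot_rate], dim=-1))
```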

SLIDE 13

RL setup (4): Operator Rewards

Overall weighted reward:

  • 1. IoT traffic volume
  • 2. Loss to regular users
  • 3. Traffic below the network limit

Goal: Find the optimal operator policy

[Diagram: agent-environment loop]

What-if model
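A sketch of one way to combine the three terms into a weighted reward; the weights and the penalty form are assumptions (the slides name only the terms, with α appearing later as an operator priority):

```python
def operator_reward(v_iot, user_loss, congestion, limit,
                    alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted operator reward: reward IoT volume delivered, penalize
    loss to regular users and congestion above the network limit.
    alpha/beta/gamma are illustrative knobs."""
    over_limit = max(0.0, congestion - limit)  # penalize violations only
    return alpha * v_iot - beta * user_loss - gamma * over_limit
```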

SLIDE 14

Evaluation


SLIDE 15

Evaluation Criteria

  • 1. Robust performance on diverse cell-day pairs
  • 2. Ability to exploit better forecasts
  • 3. Interpretability

[Diagram: IoT scheduler with network state + forecasts in, IoT rate out, as on Slide 8]

SLIDE 16
  • 1. RL generalizes to several cell-day pairs

[Plot: Utilization gain V_IoT/V_0 (%) on train and test cell-day pairs, for reward weight α = 1 and α = 2]

Responds to operator priorities. Significant gains:

  • FCC spectrum auction (Reardon 2016): $4.5B for 10 MHz of spectrum
  • 14.7% median gain for α = 2
  • Significant cost savings [simulated]

SLIDE 17
  • 2. RL effectively leverages forecasts

[Plot: reward for RL vs. a benchmark as LSTM forecasts become richer]
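A minimal sketch of what such an LSTM forecaster might look like in PyTorch; layer sizes and the feature set are assumptions:

```python
import torch
import torch.nn as nn

class CongestionForecaster(nn.Module):
    """Sketch of an LSTM forecaster: given a window of past cell
    measurements, predict the next k minutes of congestion."""
    def __init__(self, n_features, horizon_k, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon_k)

    def forward(self, window):        # window: (batch, time, n_features)
        out, _ = self.lstm(window)
        return self.head(out[:, -1])  # forecast: (batch, horizon_k)
```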

SLIDE 18
  • 3a. RL exploits transient dips in utilization

[Plots, 9:00-16:00 local time: (left) controlled congestion C (2-16) for original traffic, heuristic control, and DDPG control, with a transient dip marked; (right) utilization gain V_IoT/V_0 (%) for heuristic vs. DDPG control]
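The slides credit DDPG for the learned controller; below is a minimal actor-critic skeleton for a one-dimensional continuous action (the IoT rate), a sketch with assumed layer sizes rather than the authors' implementation:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the network state to a bounded IoT rate in [0, max_rate]."""
    def __init__(self, state_dim, max_rate=1.0, hidden=64):
        super().__init__()
        self.max_rate = max_rate
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, state):
        return self.max_rate * self.net(state)

class Critic(nn.Module):
    """Estimates Q(state, action) for the deterministic policy."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```

DDPG trains the critic by temporal-difference learning and the actor by ascending the critic's gradient; a deterministic, bounded action fits a rate-limiting knob like the IoT rate.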

SLIDE 19
  • 3b. RL smooths network throughput

[Plots, 9:00-16:00 local time: (left) controlled congestion C (2-16) for original traffic, heuristic control, and DDPG control; (right) resulting throughput B (0.0-1.6 MBps) for the same three schedules against the throughput limit]

SLIDE 20

Conclusion

Modern networks are evolving

  • Delay-tolerant traffic (IoT updates, pre-fetched content)

Data-driven optimal control

  • LSTM forecasts + RL controller
  • 14.7% simulated gain -> significant savings

Future work:

  • Operational network tests
  • Decouple prediction and control

Questions: csandeep@stanford.edu


SLIDE 21

Extra slides


SLIDE 22
  • 2. RL effectively leverages forecasts

Better forecasts enhance performance; a discretized MDP gives the offline optimal.

[Plot: Reward R (0.8-2.4) vs. discretized state-space size |S̄| (50-250), for action-space sizes |Ā| = 5, 20, 40, 60; richer LSTM forecasts approach the continuous-MDP reward]
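For intuition on the discretized offline optimum: with a finite state grid S̄ and action grid Ā, backward induction gives the best achievable reward. A generic finite-horizon value-iteration sketch, with placeholder transition and reward arrays rather than the paper's model:

```python
import numpy as np

def finite_horizon_value_iteration(P, R, T):
    """Offline optimal policy for a discretized MDP by backward induction.
    P: (S, A, S) transition probabilities, R: (S, A) rewards, T: horizon.
    Returns per-step greedy policies and the optimal start values."""
    S, A, _ = P.shape
    V = np.zeros(S)                    # value after the final step
    policies = []
    for _ in range(T):                 # sweep backward in time
        Q = R + P @ V                  # Q[s, a] = R[s, a] + E[V(s')]
        policies.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    policies.reverse()                 # index 0 = first minute of the day
    return policies, V
```

Finer grids (larger |S̄| and |Ā|) approach the continuous-MDP reward, which is the trend the plot above shows.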