SLIDE 1

Cellular Network Traffic Scheduling using Deep Reinforcement Learning

Sandeep Chinchali et al., with Marco Pavone and Sachin Katti, Stanford University, AAAI 2018

SLIDE 2

Can we learn to optimally manage cellular networks?

[Diagram: the Internet feeding a cell that carries delay-sensitive real-time mobile traffic alongside delay-tolerant (DT) traffic such as IoT map/software updates and pre-fetched content]

SLIDE 3

Why is IoT/DT traffic scheduling hard?

[Plot: IoT utilization over the day against an acceptable limit]

Contending goals:

  • Maximize IoT/DT data
  • Minimize loss to mobile traffic
  • Respect network limits

→ Optimal control

SLIDE 4

Why is IoT/DT traffic scheduling hard?

[Plot: Congestion C (10-50) vs. local time (09:00-21:00), Melbourne Central Business District, 1-min rolling average; cells: shopping center, office building, Southern Cross station, Melbourne Central station]

Diverse city-wide cell patterns

SLIDE 5

Our contributions

  • 1. Identify inefficiencies in real cellular networks
      4 weeks, 10 diverse cells in downtown Melbourne, Australia
  • 2. Data-driven, deep learning network model
      Our live network experiments match MDP dynamics
  • 3. Adaptive RL scheduler
      Flexibly responds to operator reward functions

[Diagram: IoT scheduler observes the network state and outputs an IoT rate]

SLIDE 6

Why Deep Learning?

  • 1. Learn time-variant network dynamics
  • 2. Adapt to high-level network operation goals
  • 3. Generalize to diverse cells
  • 4. Abundance of network data

[Plot: congestion vs. local time across Melbourne CBD cells, as on Slide 4]

SLIDE 7

Related Work

  • 1. Dynamic resource allocation
      Electricity grid (Reddy 2011), call admission (Marbach 1998), traffic control (Chu 2016)
  • 2. Data-driven optimal control + forecasting
      Deep RL (Mnih 2013, Silver 2014, Lillicrap 2015)
      LSTM networks (Hochreiter 1997, Laptev 2017, Shi 2015)
  • 3. Machine learning for computer networks
      Cluster resource management (Mao 2016)
      Mobile video streaming (Mao 2017, Yin 2015)


SLIDE 8

Data-driven problem formulation

  • 1. Network State Space
  • 2. IoT Scheduler Actions
  • 3. Time-variant dynamics
  • 4. Network operator policies

[Diagram: IoT scheduler maps network state + forecasts (num users, congestion, cell efficiency) to an IoT rate]

SLIDE 9

Primer on Cell Networks

Goal: Maximize safe IoT traffic V_t over the day (subject to link quality)

SLIDE 10

RL setup (1): State Space

  • Current network state; full state with temporal features
  • Stochastic forecast (LSTM)
  • Horizon: a day of T minutes

[Diagram: agent-environment loop (action, network state, reward)]
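A minimal sketch, under assumed feature names, of how such a state vector could be assembled; `num_users`, `cell_efficiency`, and the `forecast` argument (standing in for the LSTM output) are illustrative, not the paper's exact feature set:

```python
import numpy as np

def build_state(current, t, T, forecast):
    """Assemble the RL state: current cell measurements, a temporal
    feature, and an LSTM congestion forecast (illustrative fields)."""
    return np.concatenate([
        [current["num_users"],         # active users in the cell
         current["congestion"],        # congestion metric C
         current["cell_efficiency"]],  # link-quality proxy
        [t / T],                       # normalized minute of the day
        np.asarray(forecast),          # predicted congestion, next k minutes
    ])

# Example: minute 540 (09:00) of a T = 1440-minute horizon
s = build_state({"num_users": 30, "congestion": 12.0, "cell_efficiency": 0.8},
                t=540, T=1440, forecast=[12.5, 13.0, 13.2])
```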

SLIDE 11

RL setup (2): Action Space

  • Action: the IoT traffic rate
  • IoT volume per minute: V_t
  • Utilization gain: V_IoT / V_0

[Diagram: agent-environment loop]
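As a toy illustration with assumed units (rates in Mbit/s, volumes in MB), not the authors' code, here is how a rate schedule maps to IoT volume and the utilization gain V_IoT / V_0:

```python
def utilization_gain(iot_rates_mbps, v0_megabytes, minutes_per_step=1):
    """Utilization gain V_IoT / V_0 (%): IoT volume scheduled by the
    controller relative to a baseline volume V_0 (illustrative units)."""
    seconds = 60 * minutes_per_step
    v_iot = sum(r * seconds / 8 for r in iot_rates_mbps)  # Mbit -> MB
    return 100.0 * v_iot / v0_megabytes

# A flat 0.4 Mbit/s schedule for 3 minutes against a 10 MB baseline:
print(utilization_gain([0.4, 0.4, 0.4], v0_megabytes=10.0))  # 90.0
```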

SLIDE 12

RL setup (3): Transition Dynamics

[Plot: Congestion C (1.0-1.6) vs. local time (20:10-20:20), decomposed into controlled traffic and background dynamics]

[Diagram: agent-environment loop]
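One way to think about these dynamics is a learned one-step model in which the controlled traffic's effect is predicted on top of background load captured by the state; the sketch below assumes a small PyTorch network and is not the paper's exact model:

```python
import torch
import torch.nn as nn

class TransitionModel(nn.Module):
    """Sketch of a learned transition model: predicts next-minute
    congestion from the current state and the scheduled IoT rate.
    Background dynamics must be carried by the state features."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, iot_rate):
        # state: (batch, state_dim); iot_rate: (batch, 1)
        return self.net(torch.cat([state, iot_rate], dim=-1))
```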

SLIDE 13

RL setup (4): Operator Rewards

Overall weighted reward:

  • 1. IoT traffic volume
  • 2. Loss to regular users
  • 3. Traffic below the network limit

Goal: Find the optimal operator policy

[Diagram: agent-environment loop]

What-if model
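A sketch of one way to combine the three terms into a weighted reward; the weights and the penalty form are assumptions (the slides name only the terms, with α appearing later as an operator priority):

```python
def operator_reward(v_iot, user_loss, congestion, limit,
                    alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted operator reward: reward IoT volume delivered, penalize
    loss to regular users and congestion above the network limit.
    alpha/beta/gamma are illustrative knobs."""
    over_limit = max(0.0, congestion - limit)  # penalize violations only
    return alpha * v_iot - beta * user_loss - gamma * over_limit
```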

SLIDE 14

Evaluation


SLIDE 15

Evaluation Criteria

  • 1. Robust performance on diverse cell-day pairs
  • 2. Ability to exploit better forecasts
  • 3. Interpretability

[Diagram: IoT scheduler with network state + forecasts in, IoT rate out, as on Slide 8]

SLIDE 16
  • 1. RL generalizes to several cell-day pairs

[Plot: Utilization gain V_IoT/V_0 (%) on train and test cell-day pairs, for reward weight α = 1 and α = 2]

Responds to operator priorities. Significant gains:

  • FCC spectrum auction (Reardon 2016): $4.5B for 10 MHz of spectrum
  • 14.7% median gain for α = 2
  • Significant cost savings [simulated]

SLIDE 17
  • 2. RL effectively leverages forecasts

[Plot: reward for RL vs. a benchmark as LSTM forecasts become richer]
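A minimal sketch of what such an LSTM forecaster might look like in PyTorch; layer sizes and the feature set are assumptions:

```python
import torch
import torch.nn as nn

class CongestionForecaster(nn.Module):
    """Sketch of an LSTM forecaster: given a window of past cell
    measurements, predict the next k minutes of congestion."""
    def __init__(self, n_features, horizon_k, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon_k)

    def forward(self, window):        # window: (batch, time, n_features)
        out, _ = self.lstm(window)
        return self.head(out[:, -1])  # forecast: (batch, horizon_k)
```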

SLIDE 18
  • 3a. RL exploits transient dips in utilization

[Plots, 9:00-16:00 local time: (left) controlled congestion C (2-16) for original traffic, heuristic control, and DDPG control, with a transient dip marked; (right) utilization gain V_IoT/V_0 (%) for heuristic vs. DDPG control]
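The slides credit DDPG for the learned controller; below is a minimal actor-critic skeleton for a one-dimensional continuous action (the IoT rate), a sketch with assumed layer sizes rather than the authors' implementation:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the network state to a bounded IoT rate in [0, max_rate]."""
    def __init__(self, state_dim, max_rate=1.0, hidden=64):
        super().__init__()
        self.max_rate = max_rate
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, state):
        return self.max_rate * self.net(state)

class Critic(nn.Module):
    """Estimates Q(state, action) for the deterministic policy."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```

DDPG trains the critic by temporal-difference learning and the actor by ascending the critic's gradient; a deterministic, bounded action fits a rate-limiting knob like the IoT rate.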

SLIDE 19
  • 3b. RL smooths network throughput

[Plots, 9:00-16:00 local time: (left) controlled congestion C (2-16) for original traffic, heuristic control, and DDPG control; (right) resulting throughput B (0.0-1.6 MBps) for the same three schedules against the throughput limit]

SLIDE 20

Conclusion

Modern networks are evolving

  • Delay-tolerant traffic (IoT updates, pre-fetched content)

Data-driven optimal control

  • LSTM forecasts + RL controller
  • 14.7% simulated gain -> significant savings

Future work:

  • Operational network tests
  • Decouple prediction and control

Questions: csandeep@stanford.edu


SLIDE 21

Extra slides


SLIDE 22
  • 2. RL effectively leverages forecasts

Better forecasts enhance performance; a discretized MDP gives the offline optimal.

[Plot: Reward R (0.8-2.4) vs. discretized state-space size |S̄| (50-250), for action-space sizes |Ā| = 5, 20, 40, 60; richer LSTM forecasts approach the continuous-MDP reward]
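For intuition on the discretized offline optimum: with a finite state grid S̄ and action grid Ā, backward induction gives the best achievable reward. A generic finite-horizon value-iteration sketch, with placeholder transition and reward arrays rather than the paper's model:

```python
import numpy as np

def finite_horizon_value_iteration(P, R, T):
    """Offline optimal policy for a discretized MDP by backward induction.
    P: (S, A, S) transition probabilities, R: (S, A) rewards, T: horizon.
    Returns per-step greedy policies and the optimal start values."""
    S, A, _ = P.shape
    V = np.zeros(S)                    # value after the final step
    policies = []
    for _ in range(T):                 # sweep backward in time
        Q = R + P @ V                  # Q[s, a] = R[s, a] + E[V(s')]
        policies.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    policies.reverse()                 # index 0 = first minute of the day
    return policies, V
```

Finer grids (larger |S̄| and |Ā|) approach the continuous-MDP reward, which is the trend the plot above shows.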