Deep-Q: Traffic-driven QoS Inference using Deep Generative Network - - PowerPoint PPT Presentation



SLIDE 1

Deep-Q: Traffic-driven QoS Inference using Deep Generative Network

Shihan Xiao, Dongdong He, Zhibo Gong Network Technology Lab, Huawei Technologies Co., Ltd., Beijing, China

SLIDE 2
Background

  • What is a QoS Model?

[Figure: a QoS model maps the traffic entering a network to the QoS values it experiences: delay, jitter, packet loss…]

SLIDE 3
Background

  • Why is it important?

Online QoS Monitoring
[Figure: a monitor measures the delay of Paths A, B and C for SLA guarantee & anomaly detection]

Real-time active QoS measurements are costly! A QoS model helps remove most of that cost!

SLIDE 4

Background

  • Why is it important?

Online QoS Monitoring
[Figure: a monitor measures the delay of Paths A, B and C for SLA guarantee & anomaly detection]

Offline Traffic Analysis
[Figure: a recorded traffic trace plus the network feed delay inference for Paths A, B and C]

A QoS model can do QoS inference without QoS measurements.

SLIDE 5
Background

  • Why is it important?

Online QoS Monitoring
[Figure: a monitor measures the delay of Paths A, B and C for SLA guarantee & anomaly detection]

Offline Traffic Analysis
[Figure: a recorded traffic trace plus the network feed delay inference for Paths A, B and C]

“What if” Analysis
[Figure: delay prediction when a flow is moved from Path A to Path C]

How will QoS change if a flow switches from Path A to Path C?

SLIDE 6

Traditional Methods

  • 1. Network simulator

[Figure: traffic and the network are fed to a network simulator that outputs delay, jitter and packet loss]

NS2, NS3, OMNeT++…

Slow and inaccurate
SLIDE 7
Traditional Methods

  • 2. Mathematical modeling

[Figure: traffic and the network are modeled with queuing theory, under simplified assumptions, to estimate delay, jitter and packet loss]

Large human-analysis cost & inaccurate
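As a concrete instance of this kind of queuing-theory modeling (a standard textbook formula, not taken from the slides), the M/M/1 model assumes Poisson arrivals and exponential service times and gives a closed-form mean delay:

```python
def mm1_mean_delay(arrival_rate, service_rate):
    """Mean sojourn time (waiting + service) of an M/M/1 queue, in seconds.

    Assumes Poisson arrivals and exponential service times -- exactly the
    kind of simplified assumption that limits accuracy on real traffic.
    """
    assert arrival_rate < service_rate, "queue is unstable at load >= 1"
    return 1.0 / (service_rate - arrival_rate)

# At 80% load (80 pkt/s arrivals, 100 pkt/s service) the mean delay is 0.05 s.
print(mm1_mean_delay(80.0, 100.0))  # 0.05
```

Real traffic is bursty and correlated, which is why such closed forms need large human-analysis effort per scenario and still end up inaccurate.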

SLIDE 8
Traditional Methods

  • 2. Mathematical modeling

[Figure: traffic and the network are modeled with queuing theory, under simplified assumptions, to estimate delay, jitter and packet loss]

Large human-analysis cost & inaccurate

A fast, accurate & low-cost QoS model is helpful!

SLIDE 9

Key Observations

  • Observation 1: Traffic load per link is much easier to collect, and better supported by existing tools (e.g., SNMP), than QoS values per path

SLIDE 10

Key Observations

  • Observation 1: Traffic load per link is much easier to collect, and better supported by existing tools (e.g., SNMP), than QoS values per path

  • Observation 2: Traffic load is the key factor behind QoS changes

[Figure: the QoS model maps collected link-load matrices (indexed node-by-node) to delay, jitter and packet loss]

SLIDE 11

Key Observations

  • Observation 3: Different traffic loads lead to different QoS distributions

[Figure: testbed measurement of 40 traffic loads (one per 20 min) and the corresponding measured delay samples]

SLIDE 12

Key Observations

  • Target Problem: Given a set of traffic load matrices during time T, what are the distributions of the QoS values (delay, jitter, loss…) of each network path during T?

Different traffic loads lead to different QoS distributions.

SLIDE 13

Solution of Deep-Q

  • Why does deep learning help?

Data-driven vs. human-engineered models: low human-analysis cost & fast inference

[Figure: a network simulator turns packets into QoS values with a running time of hours; an auto-trained set of deep models (delay model, loss model, …) turns a traffic load matrix into delay/jitter/loss values with a running time of milliseconds]

SLIDE 14

Key Technology: Deep Generative Network

  • State-of-the-art DGNs in deep learning

– GAN (Generative Adversarial Network) & VAE (Variational Autoencoder)

(Conditional) GAN example: the input “this small bird has a pink breast and crown, and black primaries and secondaries” is used to infer matching bird images
Source: ICML 2016, “Generative Adversarial Text to Image Synthesis”

(Conditional) VAE example: the input “number 2” is used to infer images of the digit 2
Source: NIPS 2014, “Semi-supervised Learning with Deep Generative Models”

Deep-Q: the input traffic load matrices are used to infer a delay distribution (probability vs. delay in µs)

Image domain vs. network domain: so what is the difference?

SLIDE 15

Key Technology: Deep Generative Network

  • Differences

Image domain (GAN & VAE):
– Input: discrete labels (low/high dimensional); Output: image samples (discrete & high dimensional)
– Target: the generated image samples satisfy the “real” image distribution and match the label class
– Application: text labels to images

Network domain (Deep-Q):
– Input: traffic statistics (continuous & high dimensional); Output: QoS values (continuous & low dimensional)
– Target: the generated QoS values satisfy the real QoS distribution and match the traffic statistics
– Application: traffic load matrices to QoS values

Deep-Q requires high accuracy on the output distribution, so GAN & VAE do not directly apply!

SLIDE 16

Deep-Q Solution

  • 1. Handle the continuous high-dimensional input

– Extract traffic features from a sequence of high-dimensional traffic load matrices

[Figure: a chain of LSTM cells consumes the micro-load matrices M_t^1, M_t^2, M_t^3, …, M_t^n collected during time t, passing the hidden state from cell to cell; the final hidden state is the traffic feature vector]

LSTM (Long Short-Term Memory) module: a state-of-the-art deep learning method for learning features from a data sequence
SLIDE 17
Deep-Q Solution

  • 2. Handle the continuous low-dimensional output

– Challenge: high accuracy is required for QoS distribution inference
– Solution: a new metric, “Cinfer loss”, to accurately quantify the QoS distribution error

[Figure: the CDF (Cumulative Distribution Function) curves of X, the inferred QoS distribution, and Y, the target QoS distribution, plotted as cumulative probability vs. delay (ms); the metric measures their height difference]
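The height-difference idea can be sketched as follows (a NumPy illustration of comparing two empirical CDFs on a shared grid; the paper's exact Cinfer-loss definition may differ, so the function is named generically):

```python
import numpy as np

def empirical_cdf(samples, grid):
    """P(sample <= g) for each grid point g."""
    return np.mean(samples[None, :] <= grid[:, None], axis=1)

def cdf_height_error(inferred, target, n_points=50):
    """Mean absolute height difference between the two empirical CDFs."""
    lo = min(inferred.min(), target.min())
    hi = max(inferred.max(), target.max())
    grid = np.linspace(lo, hi, n_points)
    return np.mean(np.abs(empirical_cdf(inferred, grid) - empirical_cdf(target, grid)))

rng = np.random.default_rng(1)
target = rng.normal(10.0, 1.0, 1000)   # "real" delay samples (ms)
good = rng.normal(10.0, 1.0, 1000)     # inferred distribution close to the target
bad = rng.normal(13.0, 1.0, 1000)      # inferred distribution shifted by 3 ms
print(cdf_height_error(good, target) < cdf_height_error(bad, target))  # True
```

Unlike a point-estimate error, this penalizes any mismatch in the shape of the QoS distribution, not just in its mean.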

SLIDE 18

Deep-Q Solution

  • Deep-Q: a stable & accurate inference engine

– Built upon VAE (stable) and augmented with Cinfer loss (accurate)

A simple example of learning ability (target distribution vs. inferred distribution):
– L2 loss of VAE: stable but inaccurate
– KL loss of GAN: more accurate but unstable
– Cinfer loss of Deep-Q: stable & accurate

SLIDE 19

Deep-Q Solution

  • Cinfer-loss computation for training

– The exact computation is NP-hard
– The approximation must be fully differentiable to compute gradients for training

  • Step 1: Discretization

– From an integral to a discrete sum over bins

[Figure: the continuous CDF (cumulative probability vs. delay in ms) is discretized into bins]

SLIDE 20

Deep-Q Solution

  • Cinfer-loss computation for training

– The exact computation is NP-hard
– The approximation must be fully differentiable to compute gradients for training

  • Step 2: Bin height computation (required to be differentiable)
  • An intuitive method:

– Calculate the bin index each sample falls into & count the number of samples per bin

[Figure: cumulative probability vs. delay (ms)]

But the ceil function is non-differentiable & difficult to approximate!

SLIDE 21

Deep-Q Solution

  • Cinfer-loss computation for training

– The exact computation is NP-hard
– The approximation must be fully differentiable to compute gradients for training

  • Step 2: Bin height computation (required to be differentiable)
  • A differentiable method with some math tricks (borrowed from deep learning)

– Step 1): Use the sign function
– Step 2): Approximate the sign function with tanh (approximation error < 10⁻⁵ in experiments)

[Figure: cumulative probability vs. delay (ms)]
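The sign-to-tanh trick can be illustrated like this (a NumPy sketch with a made-up sharpness constant k; the paper's exact formulation may differ). The hard indicator 1[s ≤ e] equals (sign(e − s) + 1)/2, and replacing sign(·) with tanh(k·) makes the bin heights smooth in the samples:

```python
import numpy as np

def soft_cdf_heights(samples, edges, k=200.0):
    """Differentiable bin heights: replace sign(e - s) with tanh(k * (e - s))."""
    soft_indicator = (np.tanh(k * (edges[:, None] - samples[None, :])) + 1.0) / 2.0
    return soft_indicator.mean(axis=1)   # approximate CDF height at each edge

samples = np.array([0.12, 0.41, 0.47, 0.83])              # toy delay samples
edges = np.linspace(0.0, 1.0, 11)                         # bin edges
soft = soft_cdf_heights(samples, edges)
hard = (samples[None, :] <= edges[:, None]).mean(axis=1)  # ceil-style hard count
print(np.max(np.abs(soft - hard)) < 0.01)  # True: close, and now differentiable
```

Larger k sharpens the approximation toward the hard count while keeping gradients defined everywhere.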

SLIDE 22

Deep-Q Solution

  • Put it all together

[Figure: training phase of Deep-Q: traffic data collected from the underlay network forms a traffic load matrix X along time; the VAE encoder maps X to space-time traffic features and a latent variable Z sampled from N(0,1); the VAE decoder outputs the reconstruction X’ and the network QoS (delay, jitter, loss…). Inference phase: collected traffic is fed through the trained model to output the network QoS]

Automatic feature engineering & QoS modeling: end-to-end training using Cinfer loss
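The inference phase can be sketched as sampling z ~ N(0,1) and decoding it together with the traffic features (a toy NumPy decoder with random, untrained weights and hypothetical sizes, purely to illustrate the data flow):

```python
import numpy as np

rng = np.random.default_rng(2)

def decoder(x, W1, b1, W2, b2):
    """A toy one-hidden-layer net standing in for the trained VAE decoder."""
    return (W2 @ np.maximum(W1 @ x + b1, 0.0) + b2)[0]

feat_dim, z_dim, hidden = 8, 4, 16                 # hypothetical sizes
W1 = rng.normal(0, 0.1, (hidden, feat_dim + z_dim)); b1 = np.zeros(hidden)
W2 = rng.normal(0, 0.1, (1, hidden)); b2 = np.zeros(1)

traffic_features = rng.random(feat_dim)            # from the LSTM front end
# Repeatedly sample z ~ N(0,1) and decode: the set of generated values forms
# the inferred QoS distribution for this traffic condition.
delays = np.array([decoder(np.concatenate([traffic_features,
                                           rng.standard_normal(z_dim)]),
                           W1, b1, W2, b2)
                   for _ in range(500)])
print(delays.shape)  # (500,)
```

Because the output is a set of samples rather than a single number, the engine yields the full QoS distribution per path, which is exactly what Cinfer loss trains it to match.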

SLIDE 23
Experiment Setup

  • Testbed topology
[Figure: experiment topology of the data center network (routers r0–r4) and of the overlay IP network (CPEs across AS-1, the Internet and AS-2, instrumented with NEU200 probes)]

  • Traffic traces: WIDE backbone network [1]

– Training set: 24 hours of traffic traces on April 12, 2017
– Test set: 24 hours of traffic traces on April 13, 2017

  • Neural network: TensorFlow implementation with 2 hidden layers

[1] Traffic traces are publicly available at http://mawi.wide.ad.jp/mawi/

SLIDE 24

Experiment Results

  • Delay inference in the datacenter topology

[Figure: traffic, real delay, and the distribution / mean / 90th-percentile / 99th-percentile inference errors of queuing theory vs. deep learning methods]

  • 1. Deep learning methods achieve on average 3x higher accuracy than queuing theory
  • 2. Deep-Q achieves the lowest errors and the most stable performance across all cases
SLIDE 25

Experiment Results

  • Packet loss inference in the overlay IP topology

[Figure: inference errors of queuing theory vs. deep learning methods]

  • 1. Deep learning methods achieve on average 3x higher accuracy than queuing theory
  • 2. Deep-Q achieves the lowest errors and the most stable performance across all cases

Deep-Q inference speed: < 10 ms for network scales below 200 nodes

SLIDE 26

Conclusion

  • Deep-Q: an accurate, fast and low-cost QoS inference engine

– Automation: an LSTM module for automatic traffic feature extraction
– High stability: an extended VAE inference structure with encoder and decoder
– High accuracy: a new metric, “Cinfer loss”, to accurately quantify the QoS distribution error

  • Future vision:

– Learn device-level QoS models (routers/switches) → scalable network-level QoS models
– Learn high-level application QoE from traffic traces
