SLIDE 1

SIGCOMM MineNet06 1

SVM Learning of IP Address Structure for Latency Prediction

Rob Beverly, Karen Sollins and Arthur Berger

{rbeverly,sollins,awberger}@csail.mit.edu

SLIDE 2

SVM Learning of IP Address Structure for Latency Prediction

  • The case for Latency Prediction
  • The case for Machine Learning
  • Data and Methodology
  • Results
  • Going Forward

SLIDE 3

The Case for Latency Prediction

SLIDE 4

Latency Prediction (again?)

  • Significant prior work:
    – King [Gummadi 2002]
    – Vivaldi [Dabek 2004]
    – Meridian [Wong 2005]
    – Others: IDMaps, GNP, etc.
  • Prior methods:
    – Active queries
    – Synthetic coordinate systems
    – Landmarks
  • Our work seeks to provide an agent-centric (single-node) alternative

SLIDE 5

Why Predict Latency?

  • 1. Service Selection: balance load, optimize performance, P2P replication
  • 2. User-directed Routing: e.g. IPv6 with per-provider logical interfaces
  • 3. Resource Scheduling: Grid computing, etc.
  • 4. Network Inference: measure additional topological network properties

SLIDE 6

An Agent-Centric Approach

  • Hypothesis: two hosts within the same subnetwork likely have consistent congestion and latency
  • Registry allocation policies give network structure – but fragmented and discontinuous
  • Formulate as a supervised learning problem
  • Given latencies to a set of (random) destinations as training:
    – predict_latency(unseen IP)
    – error = |predict(IP) – actual(IP)|
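The error metric above can be sketched in Python (a hypothetical illustration; `predict_latency` stands in for whatever trained model the agent uses, and all names are illustrative):

```python
# Hypothetical sketch of the per-IP error metric from this slide.
def prediction_error(predict_latency, measured):
    """Per-IP absolute error: |predict(IP) - actual(IP)|."""
    return {ip: abs(predict_latency(ip) - actual)
            for ip, actual in measured.items()}

# Toy usage: a constant 100ms predictor against two measured latencies (ms).
errors = prediction_error(lambda ip: 100.0,
                          {"18.0.0.1": 90.0, "128.30.0.1": 120.0})
# errors == {"18.0.0.1": 10.0, "128.30.0.1": 20.0}
```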

SLIDE 7

The Case for Machine Learning

SLIDE 8

Why Machine Learning?

  • Internet-scale networks are:
    – Complex (high-dimensional)
    – Dynamic
  • Can accommodate and recover from infrequent errors in a probabilistic world
  • Traffic provides a large and continuous training base

SLIDE 9

Candidate Tool: Support Vector Machine

  • Supervised learning (but amenable to online learning)
  • Separates the training set into two classes in the most general way
  • Main insight: find the hyperplane separator that maximizes the minimum margin between the convex hulls of the classes
  • Second insight: if the data is not linearly separable, take it to a higher dimension
  • Result: generalizes well, is fast, and accommodates unknown data structure
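The max-margin idea can be sketched with scikit-learn (an assumed tool choice – the slides don't name an implementation; data and parameters are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters; a large C approximates a hard margin,
# so the fitted separator maximizes the minimum margin between classes.
X = np.array([[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1000.0).fit(X, y)

# Only points on the margin become support vectors; interior points
# could move without changing the solution.
print(clf.support_vectors_)
```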

SLIDE 10

SVMs – Maximize Minimum Margin

[Figure: the simplest case – two classes in two dimensions, linearly separable; the legend marks positive examples, negative examples, and support vectors]

SLIDE 11

SVMs – Redefining the Margin

[Figure: a single new positive example redefines the margin]

SLIDE 12

Non-SV Points don’t affect solution


SLIDE 13

IP Latency Non-Linearity

[Figure: 10ms and 200ms examples plotted against IP bit 0 and IP bit 1]

SLIDE 14

Higher Dimensions for Non-Linearity

[Figure: two classes (10ms vs. 200ms examples) in two dimensions, NOT linearly separable]

SLIDE 15

Kernel Function Φ

[Figure: after the kernel mapping, the two classes (10ms vs. 200ms examples) in three dimensions become linearly separable]

SLIDE 16

Support Vector Regression

  • Same idea as classification
  • ε-insensitive loss function

[Figure: latency regressed against the first octet, with the ε-insensitive tube after kernel mapping Φ]
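A sketch of ε-insensitive support vector regression with scikit-learn (an assumed tool; data and parameter values are illustrative, not the authors'):

```python
import numpy as np
from sklearn.svm import SVR

# First octet of the IP as the single input feature; latency (ms) as the
# target. Training points within `epsilon` of the fitted function incur
# no loss and do not become support vectors.
X = np.array([[18], [18], [24], [128], [128], [192]], dtype=float)
y = np.array([10.0, 12.0, 15.0, 180.0, 200.0, 190.0])

model = SVR(kernel="rbf", C=100.0, epsilon=5.0).fit(X, y)
pred = model.predict([[24.0]])  # predicted latency for an unseen octet
```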

SLIDE 17

Data and Methodology

SLIDE 18

Data Set

  • 30,000 random hosts responding to ping
  • Average latency to each over 5 pings
  • Non-trivial distribution for learning

SLIDE 19

Methodology

  • Average of 5 experiments:
    – Randomly permute the data set
    – Split the data set into training / test points

[Figure: the (IP, latency) data set is permuted, then split into training and test portions]

SLIDE 20

Methodology

  • Average of 5 experiments:
    – Training data defines the SVM
    – Measure performance on (unseen) test points
    – Each bit of the IP is an input feature

[Figure: permuted (IP, latency) pairs are split into training and test sets; the SVM is built on training data and performance measured on test data]
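The "each bit of the IP is an input feature" step and the permute/split procedure can be sketched as follows (function names are illustrative):

```python
import random

def ip_to_bits(ip):
    """Turn a dotted-quad IPv4 address into 32 binary input features,
    most significant bit first."""
    n = 0
    for octet in ip.split("."):
        n = (n << 8) | int(octet)
    return [(n >> (31 - i)) & 1 for i in range(32)]

def permute_and_split(dataset, train_frac=0.8, seed=0):
    """Randomly permute the data set, then split into train / test."""
    data = list(dataset)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]
```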

SLIDE 21

Results

SLIDE 22

Results

  • Spoiler: so, does it work?
  • Yes – within 30% for more than 75% of predictions
  • Performance varies with the selection of parameters (a multi-objective optimization problem):
    – Training size
    – Input dimension
    – Kernel

SLIDE 23

Training Size

SLIDE 24

Question: Are MSBs Better Predictors?

  • Determine error versus the number of most significant bits of the test input IPs

SLIDE 25

Question: Are MSBs Better Predictors?

  • Determine error versus the number of most significant bits of the test input IPs

Powerful result: the majority of the discriminatory power is in the first 8 bits
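One way to test this is to mask all but the k most significant feature bits before prediction – a sketch of an assumed mechanism (the slides don't show the exact procedure):

```python
def mask_msb(bits, k):
    """Zero out all but the k most significant bits of a feature vector,
    so only the leading address structure remains visible to the model."""
    return [b if i < k else 0 for i, b in enumerate(bits)]

# Keeping the first 8 bits retains only leading-octet structure.
assert mask_msb([1] * 32, 8) == [1] * 8 + [0] * 24
```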

SLIDE 26

Feature Selection

  • Use feature selection to determine which individual bits of the address contribute to the discriminatory power of the prediction

θᵢ ← argminⱼ V(f(θ, xⱼ), y)   ∀ xⱼ ∉ {θ₁, …, θᵢ₋₁}
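The rule above is a greedy forward search: at step i, add the not-yet-chosen feature xⱼ that minimizes the validation error V. A sketch, with `validation_error` as an assumed callback standing in for V(f(θ, xⱼ), y):

```python
def greedy_feature_selection(features, validation_error, k):
    """At each step, add the feature whose inclusion minimizes the
    validation error of the model trained on the chosen subset."""
    chosen = []
    for _ in range(k):
        remaining = [f for f in features if f not in chosen]
        best = min(remaining,
                   key=lambda f: validation_error(chosen + [f]))
        chosen.append(best)
    return chosen
```

Applied once per address bit, this produces a cumulative ranking of bit "features" by discriminatory power.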

SLIDE 27

Feature Selection

Note x-axis: cumulative bit “features”

SLIDE 28

Feature Selection

Matches Intuition and Registry Allocations

SLIDE 29

Performance

  • Given the empirically optimal training size and input features
  • How well can agents predict latency to unknown destinations?

SLIDE 30

Prediction Performance

[Figure: predicted vs. actual latency; the ideal line is shown]

SLIDE 31

Prediction Performance

Within 30% for >75% of predictions
SLIDE 33

Going Forward

SLIDE 34

Future Research

  • How agents select training data (random, BGP prefix, registry allocation, from TCP flows, etc.)
  • How performance decays over time and how often to retrain
  • Online, continuous learning

SLIDE 35

Summary - Questions?

  • Major results:
    – An agent-centric approach to latency prediction
    – Validation of SVMs and kernel functions as a means to learn on the basis of Internet addresses
    – Feature selection analysis of the informational content of IP addresses in predicting latency
    – Latency estimation accuracy within 30% of the true value for >75% of data points

SLIDE 36

Prediction Accuracy

Only ~25% of predictions are accurate to within 5% of the actual value

SLIDE 37

Prediction Accuracy

About 45% of predictions are accurate to within 10% of the true value
But >85% of predictions are accurate to within 50% of the true value

SLIDE 38

Prediction Accuracy

Bottom line: suitable for coarse-grained latency prediction