

slide-1
SLIDE 1

Routing State Distance: A Path-based Metric for Network Analysis

Gonca Gürsun

joint work with Natali Ruchansky, Evimaria Terzi, Mark Crovella

slide-2
SLIDE 2

Distance Metrics for Analyzing Routing

[Figure panels: Shortest Path; Similar Routing]

slide-3
SLIDE 3

A New Metric

A new path-based metric that can be used for:

– Visualization of networks and routes
– Characterizing routes
– Detecting significant patterns
– Gaining insight about routing

slide-4
SLIDE 4

We call this path-based distance metric:

Routing State Distance


slide-5
SLIDE 5

Measuring “Routing Similarity”

  • Conceptually, imagine capturing the entire routing state of a network in a matrix N
  • N(i,j) = next hop (next neighbor node) on path from i to j
  • Each row is actually the routing table of a single node
  • Now consider the columns
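As a rough illustration of the next-hop matrix, the sketch below builds N for a made-up four-node line topology using shortest-path routing. The function name `next_hop_matrix`, the topology, the tie-breaking rule, and the convention N(i,i) = i are all assumptions of this sketch; real routing (for example BGP) need not follow shortest paths.

```python
from collections import deque

def next_hop_matrix(adj):
    """Return N where N[i][j] is i's next hop on a shortest path to j.

    adj maps each node to a list of neighbours (undirected, unweighted).
    Ties between equally short paths are broken by BFS visiting order,
    which is an arbitrary choice made for this sketch.
    """
    nodes = sorted(adj)
    N = {i: {} for i in nodes}
    for dst in nodes:
        # BFS outward from the destination: the node we were reached from
        # is one hop closer to dst, so it is our next hop towards dst.
        N[dst][dst] = dst  # assumed convention for the diagonal
        queue = deque([dst])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dst not in N[v]:
                    N[v][dst] = u
                    queue.append(v)
    return N

# Hypothetical line topology: a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
N = next_hop_matrix(adj)
```

Here `N["b"]` is the routing table of node b (a row), while collecting `N[i]["d"]` over all i gives the column for destination d.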

slide-6
SLIDE 6

Routing State Distance (RSD)

rsd(a,b) = # of entries that differ in columns a and b of N

If rsd(a,b) is small, most nodes think a and b are ‘in the same direction’
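A minimal sketch of this column comparison, assuming N is stored as a list of rows and columns are addressed by index. The matrix values are invented for illustration and the function name `rsd` simply mirrors the slide's notation.

```python
def rsd(N, a, b):
    """Count the rows of N whose entries in columns a and b differ."""
    return sum(1 for row in N if row[a] != row[b])

# Hypothetical next-hop matrix: 4 source rows, 3 destination columns.
# Entries are next-hop labels; the values are made up for illustration.
N = [
    ["u", "u", "v"],
    ["w", "w", "w"],
    ["u", "v", "v"],
    ["x", "x", "y"],
]

print(rsd(N, 0, 1))  # 1: only the third row differs
print(rsd(N, 0, 2))  # 3: the first, third, and fourth rows differ
```

Columns 0 and 1 are close because three of the four sources forward to them through the same neighbor.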

slide-7
SLIDE 7

Formal Definition

Given a set of destinations $X$ and a next-hop matrix $N$ s.t. $N(x_i, x_j)$ is the next hop on the path from $x_i$ to $x_j$,

$$RSD(x_1, x_2) = |\{\, x_i \mid N(x_i, x_1) \neq N(x_i, x_2) \,\}|$$

RSD is a metric (obeys triangle inequality)
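The triangle inequality can be sketched in a few lines. The argument below is a reconstruction from the "count the differing rows" definition of RSD and is not necessarily the authors' own proof; the set $A_{ab}$ is notation introduced here for illustration.

```latex
Let $A_{ab} = \{\, x_i \mid N(x_i, x_a) \neq N(x_i, x_b) \,\}$, so that $RSD(x_a, x_b) = |A_{ab}|$.

If $x_i \notin A_{12}$ and $x_i \notin A_{23}$, then
$N(x_i, x_1) = N(x_i, x_2) = N(x_i, x_3)$, hence $x_i \notin A_{13}$.

Therefore $A_{13} \subseteq A_{12} \cup A_{23}$, and
$$RSD(x_1, x_3) = |A_{13}| \le |A_{12}| + |A_{23}| = RSD(x_1, x_2) + RSD(x_2, x_3).$$
```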

slide-8
SLIDE 8

RSD to BGP

In order to apply RSD to measured BGP paths we define N to have all ASes on rows and prefixes on columns.

N(a,p) = the next hop from AS a to prefix p

A few issues: missing and multiple next-hops.
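To make the two issues concrete, here is a small sketch that derives next hops from AS paths. The input layout (prefix plus an AS path ordered from monitor to origin), the function name, and the sample data are assumptions for illustration; the AS numbers and prefixes are taken from ranges reserved for documentation.

```python
from collections import defaultdict

def next_hops_from_paths(paths):
    """Map (AS, prefix) to the set of next hops seen in the given paths.

    Each path is (prefix, [as_1, ..., as_k]) with as_1 the monitor and
    as_k the origin; this input layout is an assumption of the sketch.
    """
    hops = defaultdict(set)
    for prefix, as_path in paths:
        for current, following in zip(as_path, as_path[1:]):
            if current != following:  # skip repeats caused by AS-path prepending
                hops[(current, prefix)].add(following)
    return hops

# Made-up paths for illustration only.
paths = [
    ("192.0.2.0/24", [64496, 64497, 64498]),
    ("192.0.2.0/24", [64496, 64499, 64498]),
    ("198.51.100.0/24", [64500, 64497, 64497, 64498]),
]
hops = next_hops_from_paths(paths)

# Cells of N with more than one observed next hop.
multiple = {key for key, seen in hops.items() if len(seen) > 1}
```

In this toy data AS 64496 shows two next hops for one prefix (a multiple next-hop cell), while AS 64500 has no entry at all for 192.0.2.0/24 (a missing cell).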

slide-9
SLIDE 9

Dataset

  • 48 million routing paths collected from

– Routeviews and RIPE projects (publicly available)
– Collected from 359 monitors

  • Some preprocessing (details omitted)

– 243 source ASes, 135K destinations, so N is 243 x 135K

  • From N compute D, our 135K x 135K distance matrix, where $D(x_1, x_2) = RSD(x_1, x_2)$
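The step from N to D can be sketched as a pairwise loop over columns. This is a toy version under the assumption that N fits in memory as a list of rows; at the scale quoted on the slide a dense 135K x 135K matrix would have roughly 1.8 × 10^10 entries, so the quadratic loop below is meant only for small illustrations. The name `rsd_matrix` and the sample values are hypothetical.

```python
def rsd_matrix(N):
    """Return D with D[a][b] = number of rows of N that differ in columns a and b."""
    n_cols = len(N[0])
    D = [[0] * n_cols for _ in range(n_cols)]
    for a in range(n_cols):
        for b in range(a + 1, n_cols):
            d = sum(1 for row in N if row[a] != row[b])
            D[a][b] = D[b][a] = d  # RSD is symmetric
    return D

# Hypothetical next-hop matrix: 4 source rows, 3 destination columns.
N = [
    ["u", "u", "v"],
    ["w", "w", "w"],
    ["u", "v", "v"],
    ["x", "x", "y"],
]
D = rsd_matrix(N)
```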

slide-10
SLIDE 10

Why is RSD appealing ?

Let’s look at its properties…


slide-11
SLIDE 11

RSD vs. Hop Distance

Varies smoothly, has a gradual slope. Allows fine granularity. Defines neighborhoods. No relation between RSD and hop distance.

slide-12
SLIDE 12

RSD for Visualization

From N compute D, our distance matrix, where $D(x_1, x_2) = RSD(x_1, x_2)$

Highly structured: allows 2D visualization!

slide-13
SLIDE 13

RSD for Visualization

Clear separation!

This happens with any random sample: an Internet-wide phenomenon!

slide-14
SLIDE 14

What Causes Clusters in RSD?

First think matrix-wise (N):

  • A cluster C corresponds to a set of columns
  • Columns C being close in RSD means they are similar in some positions S
  • N(S,C) is highly coherent

Now in routing terms:

  • Any row in N(S,C) must have the same next hop in nearly every cell
  • The set of ASes S make similar routing decisions w.r.t. destinations C
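One way to make "highly coherent" tangible is to measure, for each row in S, how often its most common next hop appears across the columns in C. This particular score, the function name `row_coherence`, and the sample matrix are assumptions of this sketch and are not taken from the slides.

```python
from collections import Counter

def row_coherence(N, S, C):
    """For each row in S, the fraction of columns in C that share its most common next hop."""
    result = {}
    for s in S:
        counts = Counter(N[s][c] for c in C)
        result[s] = counts.most_common(1)[0][1] / len(C)
    return result

# Hypothetical next-hop matrix: 3 source rows, 5 destination columns.
N = [
    ["h", "h", "h", "h", "x"],
    ["h", "h", "k", "h", "y"],
    ["p", "q", "r", "s", "t"],
]

coherent = row_coherence(N, S=[0, 1], C=[0, 1, 2, 3])
incoherent = row_coherence(N, S=[2], C=[0, 1, 2, 3])
```

Rows 0 and 1 send almost all of C through the same neighbor, which is the pattern the slide describes; row 2 does not.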

slide-15
SLIDE 15

[Figure labels: small cluster “C”; large cluster]

A local atom is a set of destinations that are routed similarly by a set of sources.

slide-16
SLIDE 16

Why these specific destinations?

For this, investigate S …

  • The sources in S prefer a specific AS for transit to these destinations: Hurricane Electric (HE)
  • If any path passes through HE

1. Source ASes prefer that path
2. Destination appears in the smaller cluster

[Figure labels: Level3, Hurricane Electric, Sprint]

slide-17
SLIDE 17

But why do sources always route through Hurricane Electric (HE) if the option exists?

HE has a relatively unique peering policy. It offers peering to ANY AS with presence in the same exchange point.

HE’s peers prefer using HE for ANY customer of HE.

S = networks that peer with HE
C = HE’s customers

slide-18
SLIDE 18

Can we find more clusters?

Analysis with RSD uncovered a macroscopic atom. Can we formulate a systematic study to uncover other small atoms?

Intuitively we would like a partitioning of the destinations such that RSD:

– In the same group is minimized
– Between different groups is maximized

slide-19
SLIDE 19

RS-Clustering Problem

Intuition: A partitioning of the destinations s.t. RSD:

– In the same group is minimized
– Between different groups is maximized

For a partition $P$:

$$P\text{-}Cost(P) = \sum_{x, x' :\, P(x) = P(x')} D(x, x') \;+ \sum_{x, x' :\, P(x) \neq P(x')} \bigl(m - D(x, x')\bigr)$$

Key Advantage: Parameter-free!!
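A small sketch of evaluating this cost for a candidate partition. It assumes that m is the largest possible RSD value (the number of rows of N) and that each unordered pair of destinations is counted once; both are assumptions made here, and the function name and data are illustrative.

```python
from itertools import combinations

def p_cost(D, labels, m):
    """Partition cost: D within groups plus (m - D) across groups.

    labels[x] is the group of destination x; m is assumed to be the
    maximum possible RSD value. Each unordered pair is counted once.
    """
    cost = 0
    for x, y in combinations(range(len(labels)), 2):
        if labels[x] == labels[y]:
            cost += D[x][y]        # penalize distance inside a group
        else:
            cost += m - D[x][y]    # penalize closeness across groups
    return cost

# Hypothetical distances among 3 destinations, with m = 4 source rows.
D = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]

print(p_cost(D, [0, 0, 1], m=4))  # 4: destinations 0 and 1 grouped together
print(p_cost(D, [0, 0, 0], m=4))  # 6: everything in one group
print(p_cost(D, [0, 1, 2], m=4))  # 6: every destination alone
```

No number of clusters is supplied: the cost itself trades off merging against splitting, which is one way to read the "parameter-free" remark.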

slide-20
SLIDE 20

RS-Clustering is a hard problem …

Finding the optimal solution is NP-hard. We propose two solutions:

  • 1. Pivot Clustering
  • 2. Overlap Clustering
slide-21
SLIDE 21

Pivot Clustering Algorithm

Given a set of destinations X, their RSD values, and a threshold parameter τ:

  • 1. Start from a random destination x_i (the pivot)
  • 2. Find all x_j that fall within τ of x_i and form a cluster
  • 3. Remove the cluster from X and repeat

Advantages:

– The algorithm is fast: O(|E|)
– Provable approximation guarantee
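The three steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a dense distance matrix D, reads "within τ" as D ≤ τ, and uses a seeded random generator so runs are repeatable. The function name and sample distances are hypothetical.

```python
import random

def pivot_clustering(D, tau, seed=0):
    """Greedy pivot clustering over a dense distance matrix D.

    Repeatedly pick a random remaining destination as pivot, group every
    remaining destination within distance tau of it (assumed to mean
    D <= tau), then remove that group and continue.
    """
    rng = random.Random(seed)
    remaining = set(range(len(D)))
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))
        # The pivot always joins its own cluster because D[pivot][pivot] == 0.
        cluster = {x for x in remaining if D[pivot][x] <= tau}
        clusters.append(sorted(cluster))
        remaining -= cluster
    return clusters

# Hypothetical distances: destinations {0, 1} and {2, 3} form two tight groups.
D = [[0, 1, 5, 6],
     [1, 0, 6, 5],
     [5, 6, 0, 1],
     [6, 5, 1, 0]]

clusters = pivot_clustering(D, tau=2)
```

With well-separated groups like these the result does not depend on which pivots are drawn; on less structured data different seeds can give different partitions.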

slide-22
SLIDE 22

5 largest clusters

Clusters show a clear separation

Each cluster corresponds to a local atom

slide-23
SLIDE 23

Interpreting Clusters

Cluster | Size of C | Size of S | Destinations
C1      | 150       | 16        | Ukraine 83%, Czech Rep. 10%
C2      | 170       | 9         | Romania 33%, Poland 33%
C3      | 126       | 7         | India 93%, US 2%
C4      | 484       | 8         | Russia 73%, Czech Rep. 10%
C5      | 375       | 15        | US 74%, Australia 16%

slide-24
SLIDE 24
Related Work

  • Reported that BGP tables provide an incomplete view of the AS graph [Roughan et al. ‘11]
  • Visualization based on AS degree and geo-location [Huffaker and k. claffy ‘10]
  • Small scale visualization through BGPlay and bgpviz
  • Clustering on the inferred AS graph [Gkantsidis et al. ‘03]
  • Grouping prefixes that share the same BGP paths into policy atoms [Broido and k. claffy ‘01]
  • Methods for calculating policy atoms and characteristics [Afek et al. ‘02]

slide-25
SLIDE 25

Future Directions

  • 1. Routing Instability Detection

Analyzing next-hop matrices over time

  • 2. Anomaly Detection

Leveraging low effective rank of RSD matrix

  • 3. BGP Root Cause Analysis

Monitoring migration of prefixes between clusters


slide-26
SLIDE 26

Take-Away

A new metric: Routing State Distance (RSD) to measure routing similarity of destinations.

– A path-based metric
– Capturing closeness useful for visualization
– In-depth analysis of AS-level routing
– Uncovering surprising patterns

slide-27
SLIDE 27

Code, data, and more information are available on our website at: csr.bu.edu/rsd

slide-28
SLIDE 28

THANKS!

Routing State Distance: A Path-based Metric for Network Analysis

Gonca Gürsun

joint work with Natali Ruchansky, Evimaria Terzi, Mark Crovella

slide-29
SLIDE 29

We ask ourselves whether a partition is really best.

Seek a clustering that captures overlap

To address this we propose a formalism called Overlap Clustering and show that it is capable of extracting such clusters.

slide-30
SLIDE 30

Missing Values

Issue:

Measured BGP data consists of paths from a set of monitor ASes to a large collection of prefixes.

For any given (a, p) the paths may not contain information about N(a, p).

Solution:

  • 1. Using only a set of high degree ASes on the rows of N
  • 2. Rescaling RSD(p_1, p_2) based on known entries in both N(:, p_1) and N(:, p_2)
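The slide does not spell out the rescaling formula, so the sketch below shows one plausible reading rather than the authors' method: count differences only over rows where both columns are known, then scale that count up to the full number of rows. Missing entries are represented as `None`; the function name and data are hypothetical.

```python
def rsd_rescaled(N, p1, p2):
    """RSD over rows known in both columns, scaled to the full row count.

    Assumed rule (not confirmed by the slides):
        differing_known_rows * total_rows / rows_known_in_both
    Returns None when no row is known in both columns.
    """
    known = [(row[p1], row[p2]) for row in N
             if row[p1] is not None and row[p2] is not None]
    if not known:
        return None
    differing = sum(1 for a, b in known if a != b)
    return differing * len(N) / len(known)

# Hypothetical matrix with missing entries (None).
N = [
    ["u", "u"],
    ["v", None],
    [None, "w"],
    ["x", "y"],
]

print(rsd_rescaled(N, 0, 1))  # 2.0: 1 of 2 known rows differs, scaled to 4 rows
```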

slide-31
SLIDE 31

Multiple Next-Hops

Issue:

An AS may use more than one next hop for a given prefix.

Solution:

Partition that AS by its quasi-routers [Muhlbauer et al. ‘07]

slide-32
SLIDE 32

RSD Metric Proof

slide-33
SLIDE 33

BGPlay snapshot

slide-34
SLIDE 34

Multi-Dimensional Scaling

slide-35
SLIDE 35

slide-36
SLIDE 36

Overlap Clustering

[Bonchi et al. ‘11]

slide-37
SLIDE 37

Details of Overlap Clustering

slide-38
SLIDE 38

Local Search of OC

slide-39
SLIDE 39

Post Processing of OC

slide-40
SLIDE 40

Cost Functions of OC

slide-41
SLIDE 41

Overlap Clustering

slide-42
SLIDE 42

Comparison with non-overlapping

slide-43
SLIDE 43

OC Visual

slide-44
SLIDE 44

Clustering Algorithm Comparison

slide-45
SLIDE 45

Motivating Problem

  • What paths pass through my network?

– If someone at Boston University were to send an email to Telefonica, would it go through my network?

  • Important for network planning, traffic management, security, business intelligence.

Inferring Visibility: Who is (not) Talking to Whom?, Gürsun, Ruchansky, Terzi, Crovella, in Proc. of SIGCOMM 2012.

Surprisingly hard!

slide-46
SLIDE 46

A New Metric

A new path-based metric that can be used for:

– Visualization of networks and routes

  • We only have an incomplete view of the AS graph [Roughan et al. ‘11]
  • Visualization based on AS degree and geo-location [Huffaker ‘10]
  • Small scale visualization through BGPlay and bgpviz

– Characterizing routes

  • Clustering on the inferred AS graph [Gkantsidis et al. ‘03]

– Detecting significant patterns
– Gaining insight about routing

slide-47
SLIDE 47

RSD in Practice

  • Key observation: we don’t need all of N to obtain a useful metric
  • Many (most?) nodes contribute little information to RSD

– Nodes at edges of network have nearly-constant rows in N

  • Sufficient to work with a small set of well-chosen rows of N
  • Such a set is obtainable from publicly available BGP measurements

– Note that public BGP measurements require some careful handling to use properly for computing RSD

slide-48
SLIDE 48

Seeking a metric for ‘neighborhoods’

  • Typical distance used in graphs is hop count
  • Not suitable in small worlds
  • 90% of destination pairs have hop distance < 5

– Clearly, typical distance metric is inappropriate

  • Need a metric that expresses ‘routed similarly in the Internet’ or other graph

[Figure: Prefix Pairs vs. Hop Distance]