SLIDE 1

Active Server Sibling Resolution

Robert Beverly, Arthur Berger∗

Naval Postgraduate School

∗MIT/Akamai

rbeverly@nps.edu, awberger@mit.edu

January 7, 2013 NPS IPv6 Measurement Meeting 2013

SLIDE 2

Sibling Resolution Intro

Outline

1. Sibling Resolution Intro
2. Methodology
3. Results

SLIDE 3

Sibling Resolution Intro

Sibling Resolution

New problem we term "Sibling Resolution": given a candidate (IPv4, IPv6) address pair, determine whether these addresses are assigned to the same cluster, device, or interface.

- Sibling resolution may be either active or passive
- Lots of prior work on passive sibling associations: e.g. web bugs, JavaScript, etc.
- Prior work focuses on clients (adoption, performance)
- This work:
  - Targeted, active test: on-demand for any given pair
  - Infrastructure: finding server siblings

SLIDE 4

Sibling Resolution Intro

Motivation

Why? IPv4 and IPv6 are expected to co-exist (for a long while?) → dual-stacked devices

- Track adoption (and dis-adoption); track IPv6 evolution
- Security:
  - Inter-dependence of IPv6 on IPv4 (and vice versa), e.g. an attack on an IPv6 resource affecting IPv4 service
- Performance:
  - Measurements of IPv4 vs. IPv6 performance
  - Desire to isolate path vs. host performance
  - Correlating geolocation, reputation, etc. with the IPv4 host counterpart

SLIDE 5

Methodology

Outline

1. Sibling Resolution Intro
2. Methodology
3. Results

SLIDE 6

Methodology

Targeted, Active Technique

Intuition: IPv4 and IPv6 share a common transport-layer (TCP) stack

- Leverage prior work on physical device fingerprinting using TCP timestamp clock skew [Kohno 2005]
- TCP timestamp option: "TCP Extensions for High Performance" [RFC 1323, May 1992]
- Universal support for TCP timestamps (modulo middleboxes, proxies); enabled by default

SLIDE 7

Methodology

TCP Timestamp Clock Skew

- TS value: 4 bytes containing the current clock
- Note: the RFC does not specify the TS clock resolution (assume milliseconds for now)
- Note: the TS clock ≠ the system clock
- Note: the TS clock is frequently unaffected by system clock adjustments (e.g. NTP)
- Basic idea: probe over time; the fingerprint is the clock skew (and the remote clock resolution). A probing sketch follows below.
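As a rough illustration of this probing step (not the authors' measurement tool), the following Python sketch repeatedly sends a SYN carrying the timestamp option, records the remote TSval together with the local receive time, and converts the samples into observed offsets like those plotted on the later slides. The target address, probe count, and interval are placeholders, and the millisecond TS clock rate is the assumption noted on this slide.

```python
# Minimal sketch: assumes scapy is installed, root privileges, and a 1 kHz TS clock.
import time
from scapy.all import IP, TCP, sr1

def probe_tsval(dst, dport=80):
    """Send a SYN with the timestamp option; return (local_time, remote TSval)."""
    syn = IP(dst=dst) / TCP(dport=dport, flags="S",
                            options=[("Timestamp", (0, 0))])
    synack = sr1(syn, timeout=2, verbose=False)
    now = time.time()
    if synack is None:
        return None
    for name, val in synack[TCP].options:
        if name == "Timestamp":
            return now, val[0]          # val = (TSval, TSecr)
    return None

def offsets(samples, hz=1000.0):
    """Observed offset (msec) relative to the first sample, as on the plots."""
    t0, ts0 = samples[0]
    return [((ts - ts0) / hz - (t - t0)) * 1000.0 for t, ts in samples]

if __name__ == "__main__":
    target = "192.0.2.1"                # placeholder address
    series = []
    for _ in range(30):                 # probe over time
        sample = probe_tsval(target)
        if sample:
            series.append(sample)
        time.sleep(10)
    print(offsets(series))
```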

SLIDE 8

Methodology

TCP Timestamp Clock Skew

Some Details

- Must be able to connect to a remote TCP service on each host
- Periodically connect to the TCP service
- Given a sequence of timestamp offsets, use linear programming to obtain a line that minimizes the distance to the points, constrained to lie under the data points (a fitting sketch follows below)
- Obtain: y4 = α4·x + β4 and y6 = α6·x + β6
- Angle between the lines is then: θ(α4, α6) = tan⁻¹( (α4 − α6) / (1 + α4·α6) )
- Siblings if: θ < τ
SLIDE 9

Methodology Examples

Example

Gather 4 timestamp series:

- www.caida.org (v4 and v6)
- www.ripe.net (v4 and v6)

SLIDE 10

Methodology Examples

Example

[Figure: observed offset (msec) vs. measurement time (sec); Host A (IPv6): α=0.029938, β=−3.519; Host B (IPv4): α=−0.058276, β=−1.139]
CAIDA IPv6 vs. RIPE IPv4:

- Observe different skew slopes (one negative)
- Different timestamp granularity
- y = 0.029938x equates to a skew of ≈ 1.8 ms per minute, or ≈ 15 minutes per year (quick check below)
- False siblings!
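A quick check of the conversion quoted above, assuming the slope is in msec of drift per second of measurement time (consistent with the plot's axes):

```python
slope_ms_per_s = 0.029938                        # msec of drift per second of measurement
print(slope_ms_per_s * 60)                       # ~1.8 msec per minute
print(slope_ms_per_s * 86400 * 365 / 1000 / 60)  # ~15.7 minutes of drift per year
```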

SLIDE 11

Methodology Examples

Example

[Figure, False Siblings: observed offset (msec) vs. measurement time (sec); Host A (IPv6): α=0.029938, β=−3.519; Host B (IPv4): α=−0.058276, β=−1.139]

[Figure, True Siblings: observed offset (msec) vs. measurement time (sec); Host A (IPv6): α=−0.058253, β=−1.178; Host A (IPv4): α=−0.058276, β=−1.139]

- True siblings: CAIDA IPv4 vs. CAIDA IPv6, identical slopes (θ = 0.0098)
- False siblings: CAIDA IPv6 vs. RIPE IPv4, different slopes (θ = 31.947)

SLIDE 12

Methodology Examples

Complications

[Figure: observed offset (msec) vs. measurement time (sec) for 193.110.128.199 and 2001:67c:2294:1000::f199]

www.marca.com (#6 on the Alexa IPv6 list)

- The difference is not always so distinct!
- Slope angle difference: θ = 2.046

SLIDE 13

Methodology Examples

Complications

[Figure: raw TCP timestamp value vs. TCP packet sample for apache.org (v4 and v6)]

www.apache.org:

- Raw TCP timestamps
- Deterministic and monotonic within a single connection
- Random across connections; looks like noise to us

SLIDE 14

Methodology Examples

Complications

[Figure: observed offset (msec) vs. measurement time (sec) for 203.5.76.12 and 2001:388:1:5062::cb05:4c0c]

What’s going on here?

SLIDE 15

Methodology Examples

Complications

[Figure: observed offset (msec) vs. measurement time (sec) for 209.85.225.160 and 2001:4860:b007::a0]

- Also detects load balancing among servers
- But how to deal with it?

SLIDE 16

Results

Outline

1. Sibling Resolution Intro
2. Methodology
3. Results

SLIDE 17

Results

Machine Sibling Inference

Methodology:

- Analyze the Alexa top 100,000 websites
- Pull A and AAAA records; 1398 (≈ 1.4%) have IPv6 DNS
- Repeatedly fetch the root HTML page via IPv4 and IPv6, using a deterministic IP address (a fetch sketch follows below)
- Record all packets
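A minimal sketch of this collection loop (illustrative, not the talk's measurement code): resolve the A and AAAA records once, then repeatedly fetch the root page while pinning each connection to one fixed address per family so every fetch is deterministic. Packet capture (e.g. tcpdump) is assumed to run separately, and the host name below is a placeholder.

```python
import socket

def fetch_root(ip, host, family):
    # Connect to one fixed address so repeated fetches always hit the same IP.
    s = socket.socket(family, socket.SOCK_STREAM)
    s.settimeout(10)
    s.connect((ip, 80))
    s.sendall(("GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n" % host).encode())
    body = b""
    while True:
        chunk = s.recv(4096)
        if not chunk:
            break
        body += chunk
    s.close()
    return body

host = "www.example.com"                                    # placeholder Alexa entry
v4 = socket.getaddrinfo(host, 80, socket.AF_INET)[0][4][0]  # A record
v6 = socket.getaddrinfo(host, 80, socket.AF_INET6)[0][4][0] # AAAA record
for _ in range(10):                                         # repeated fetches
    fetch_root(v4, host, socket.AF_INET)
    fetch_root(v6, host, socket.AF_INET6)
```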

SLIDE 18

Results

Machine Sibling Inference

Alexa 100K Targeted Machine-Sibling Inference

Case                                        | Count
v4 and v6 non-monotonic (possible siblings) | 109 (7.8%)
v4 or v6 non-monotonic (non-siblings)       | 140 (10.0%)
v4 and v6 no timestamps (possible siblings) | 94 (6.7%)
v4 or v6 no timestamps (non-siblings)       | 101 (7.2%)

- Our technique fails when timestamps are not monotonic across TCP flows (e.g. a load balancer or a BSD OS)
- Or when timestamps are not supported (e.g. a middlebox)
- Note: non-siblings can still be disambiguated

SLIDE 19

Results

Machine Sibling Inference

Alexa 100K Targeted Machine-Sibling Inference

Case                                        | Count
v4 and v6 non-monotonic (possible siblings) | 109 (7.8%)
v4 or v6 non-monotonic (non-siblings)       | 140 (10.0%)
v4 and v6 no timestamps (possible siblings) | 94 (6.7%)
v4 or v6 no timestamps (non-siblings)       | 101 (7.2%)
Skew-based siblings                         | 839 (60.0%)
Skew-based non-siblings                     | 115 (8.3%)
Total                                       | 1398 (100%)

- 25.5% (356) are non-siblings
- 43% of skew-based non-siblings are in different ASes

SLIDE 20

Results

DNS Machine Siblings

With respect to collecting DNS siblings, we would like to differentiate between machine siblings and equipment siblings. Tie passive and active DNS collection to the skew-based inference. For the addresses within a DNS equivalence class (greedy grouping; see the sketch after this list):

- Add an IP to a machine sibling group when the angle is small (θ < 1.0)
- Else (θ ≥ 1.0), create a new sibling group containing just that IP
- Repeat until all IPs of the equipment equivalence class are clustered
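A sketch of the greedy grouping just described (illustrative only): `theta` stands in for the pairwise skew-angle computation from the earlier slides, and comparing against a single representative per group is an assumption.

```python
THETA_THRESHOLD = 1.0   # from the slide: theta < 1.0 joins an existing group

def cluster_machine_siblings(equivalence_class, theta):
    """Greedily split one DNS/equipment equivalence class of IPs into
    machine-sibling groups using the pairwise skew angle theta(ip_a, ip_b)."""
    groups = []                                   # each group is a list of IPs
    for ip in equivalence_class:
        placed = False
        for group in groups:
            # Compare against the group's first member as its representative
            # (an assumption; one could instead require a small angle to every member).
            if theta(ip, group[0]) < THETA_THRESHOLD:
                group.append(ip)
                placed = True
                break
        if not placed:
            groups.append([ip])                   # theta >= 1.0 everywhere: new group
    return groups
```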

SLIDE 21

Results

DNS Machine Siblings

DNS Machine Siblings

[Figure: fraction of equipment equivalence classes vs. number of machine equivalence classes (1 through 6)]

Relationship between equipment siblings and machine siblings.

SLIDE 22

Results

Evaluating Sibling Inference Accuracy

Evaluating Inference Accuracy

- Seek to understand the accuracy of timestamp-based sibling inference
- Use ground-truth dual-stacked Akamai machines; no load balancers or middleboxes
- Experiment: 100 known siblings, 100 known non-siblings (random v4/v6 pairs drawn from the Akamai population)
- Hardest scenario: single organization, similar boxes, same operating system, etc.

SLIDE 23

Results

Evaluating Sibling Inference Accuracy

Evaluating Inference Accuracy

                   | Predicted sibling | Predicted non-sibling
Actual sibling     | 84 (TP)           | 13 (FN)
Actual non-sibling | 43 (FP)           | 54 (TN)

- Threshold τ = 0.002 gives the best results!
- 71% accuracy, 66% precision, 87% recall (F-score: 0.75)
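For reference, the reported figures follow directly from the confusion matrix above; a quick recomputation:

```python
tp, fn, fp, tn = 84, 13, 43, 54               # counts from the table above

accuracy  = (tp + tn) / (tp + fn + fp + tn)   # ~0.71
precision = tp / (tp + fp)                    # ~0.66
recall    = tp / (tp + fn)                    # ~0.87
f_score   = 2 * precision * recall / (precision + recall)   # ~0.75
print(accuracy, precision, recall, f_score)
```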

SLIDE 24

Results

Evaluating Sibling Inference Accuracy

Evaluating Inference Accuracy

                   | Predicted sibling | Predicted non-sibling
Actual sibling     | 97 (TP)           | 0 (FN)
Actual non-sibling | 94 (FP)           | 3 (TN)

- No false negatives with τ = 0.05 (but more false positives)
- 52% accuracy, 51% precision, 100% recall (F-score: 0.67)

SLIDE 25

Results

Current Work

- Quantify whether the vantage point imparts any difference on results
- Refine the inference algorithm to deal with load balancers
- Refine the algorithm to produce better accuracy and eliminate false positives
