
SLIDE 1

Diagnosing: Home Wireless & Wide-area Networks

Partha Kanuparthy, Constantine Dovrolis
Georgia Institute of Technology

Monday, February 13, 2012

SLIDE 2

Two Parts

Diagnosing home wireless networks [CCR’12]
  Joint work between GT, Telefonica, CMU

Diagnosing wide-area networks [in progress]
  Joint work with Constantine Dovrolis

And a quick update on ShaperProbe

SLIDE 3

Diagnosing Home Wireless

SLIDE 4

Home 802.11 Networks

Ubiquitous: most residential end-to-end paths start or end with an 802.11 hop

Use a shared channel across devices
  infrastructure, half-duplex

Co-exist with neighborhood wireless and non-802.11 devices (2.4 GHz cordless, microwave ovens, ...)

SLIDE 10

802.11 Performance Problems

Wireless clients see problems:
  Low signal strength (due to distance, fading and multipath)
  Congestion (due to shared channel)
  Hidden terminals (no carrier sense)
  Non-802.11 interference (microwave, cordless, ...)

SLIDE 11

WLAN-Probe

We diagnose 3 performance pathologies:
  congestion, low signal strength, hidden terminals

Tool: WLAN-Probe
  single 802.11 prober
  user-level: works with commodity NICs
  no special hardware or administrator requirements

SLIDE 17

Life of 802.11 Packet

Delays in a busy channel:
  channel busy-wait delay

Delays in presence of bit errors:
  L2 retransmissions
  random backoffs

Unavoidable variable delays:
  TX-delay(s) (based on L2 TX-rate)
  802.11 ACK receipt delay

Usually implemented in NIC firmware

Can we measure these delays? Yes!

SLIDE 21

Access Delay

Captures channel “busy-ness” and channel bit errors
  excludes 802.11 rate modulation effects

d = OWD - (TX delay), where OWD is the one-way delay

[Figure labels: busy-wait, re-TXs, backoffs, TX-delay, ACKs; annotations: “rate adaptation!”, “first L2 transmission”, “??”]
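The definition d = OWD - (TX delay) can be sketched as a small calculation. This is only an illustration: the function name and parameters are hypothetical, and the TX delay is approximated as payload bits divided by the L2 TX-rate, ignoring preamble and header overhead.

```python
def access_delay(owd_s, size_bytes, tx_rate_bps):
    """Access delay d = OWD - (TX delay), all times in seconds.

    Assumption: TX delay is approximated as payload bits / L2 TX-rate;
    PHY preamble and MAC header overhead are ignored.
    """
    tx_delay_s = size_bytes * 8 / tx_rate_bps
    return owd_s - tx_delay_s


# Example: a 1500-byte probe with a 2 ms one-way delay, sent at 54 Mbps.
d = access_delay(0.002, 1500, 54e6)
```

What is left after subtracting the TX delay is the part of the one-way delay caused by busy-waits, retransmissions and backoffs, which is why the metric is insensitive to the rate chosen by rate adaptation.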

SLIDE 22

Access Delay: TX delay

d = OWD - (TX delay)
  TX-rate?

Send a 50-packet train with a few tiny packets
Use packet-pair dispersion to get the TX-rate

[Figure callout: current busy-wait delays]

SLIDE 26

Access Delay: noise?

d = OWD - (TX delay)

Dispersion underestimates the TX-rate:
  due to re-TXs, busy-waits, etc.

Insight: TX-rate typically remains the same at timescales of a single train

Find a single rate for the train!
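One way to turn noisy per-pair dispersions into a single rate for the train is sketched below. The slides do not say how the single rate is chosen, so the approach here is an assumption: each packet-pair estimate is snapped to the nearest nominal 802.11a/g rate and the most frequent value wins. All names are hypothetical.

```python
from collections import Counter

# Nominal 802.11a/g rates in Mbps, used here only as snap-to candidates.
CANDIDATE_RATES_MBPS = [6, 9, 12, 18, 24, 36, 48, 54]


def pair_rate_mbps(size_bytes, dispersion_s):
    """Rate implied by one packet pair: bits sent divided by the arrival gap."""
    return size_bytes * 8 / dispersion_s / 1e6


def train_rate_mbps(size_bytes, dispersions_s, candidates=CANDIDATE_RATES_MBPS):
    """Pick one rate for the whole train (illustrative heuristic, not the authors' method)."""
    snapped = []
    for gap in dispersions_s:
        estimate = pair_rate_mbps(size_bytes, gap)
        # Snap the noisy estimate to the closest nominal rate.
        snapped.append(min(candidates, key=lambda r: abs(r - estimate)))
    # Pairs inflated by re-TXs or busy-waits land on lower rates and are outvoted.
    return Counter(snapped).most_common(1)[0][0]


# Three clean pairs at 54 Mbps and two pairs stretched by retransmissions.
gaps = [1500 * 8 / 54e6] * 3 + [0.0005, 0.001]
rate = train_rate_mbps(1500, gaps)
```

The vote only works when most pairs in the train are undisturbed; on a heavily loaded channel a different estimator would be needed.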

SLIDE 27

Diagnosis

SLIDE 28

Size-dependent Pathologies

[Diagram labels: Low signal strength, Hidden terminals, Congestion]

Bit errors increase with packet size:
  higher-percentile access delays show trends
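A size-dependence check along these lines can be sketched as follows. The percentile (90th), the nearest-rank definition and the strict-increase test are all assumptions made for illustration; the slides only state that higher-percentile access delays show trends with packet size.

```python
import math


def percentile(values, q):
    """Nearest-rank q-th percentile (0 < q <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def shows_size_trend(delays_by_size, q=90):
    """True if the q-th percentile access delay strictly increases with packet size.

    delays_by_size maps packet size in bytes to a list of access delays.
    """
    sizes = sorted(delays_by_size)
    tails = [percentile(delays_by_size[s], q) for s in sizes]
    return all(a < b for a, b in zip(tails, tails[1:]))


# Hypothetical access delays (ms) for three probe sizes.
samples = {100: [1, 1, 2, 1, 3], 500: [1, 2, 4, 1, 6], 1400: [2, 3, 9, 2, 12]}
trend = shows_size_trend(samples)
```

A rank-correlation test would be more tolerant of noise than a strict-increase check; the simple version is used here only to keep the idea visible.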

SLIDE 29

Hidden Terminals

Hidden terminals respond to frame corruption by random backoffs

Look at immediate neighbors of large-delay or lost (L3) packets
  hidden terminal: neighbor delays are small
  low SNR: neighbors are similar

SLIDE 30

Hidden Terminals

Define two measures:
  pu = P[ high delay or L3 loss ]
  pc = P[ neighbor is high delay or L3 loss | high delay or L3 loss ]

Hidden terminal:
  pc ≈ pu

[Figure: access delay vs. time, two panels: Hidden terminal(s), Low SNR]

SLIDE 31

Hidden Terminals

Hidden terminal: pc ≈ pu
Low SNR: pc ≫ pu
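The two measures can be estimated from a probe train as sketched below. The input is a list of per-packet flags (1 for "high delay or L3 loss", 0 otherwise). Treating the packets immediately before and after as the "neighbors" is an assumption, as is the function name; the slides give no numeric threshold for pc ≫ pu, so none is applied here.

```python
def neighbor_measures(bad):
    """Return (pu, pc) for a non-empty train of 0/1 flags.

    pu: fraction of packets that are bad (high delay or L3 loss).
    pc: fraction of immediate neighbors of bad packets that are also bad.
    """
    n = len(bad)
    pu = sum(bad) / n
    neighbor_total = 0
    neighbor_bad = 0
    for i, is_bad in enumerate(bad):
        if not is_bad:
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                neighbor_total += 1
                neighbor_bad += bad[j]
    pc = neighbor_bad / neighbor_total if neighbor_total else 0.0
    return pu, pc


# A burst of consecutive bad packets: neighbors of bad packets are mostly bad.
pu, pc = neighbor_measures([0, 0, 1, 1, 1, 1, 0, 0, 0, 0])
```

In this example pc is well above pu, the pattern the slides associate with low SNR; isolated bad packets scattered through the train keep pc close to or below pu, the pattern associated with hidden terminals.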

SLIDE 32

Summary

WLAN-Probe: tool for user-level diagnosis of 802.11 pathologies
  Single 802.11 probing point
  Commodity NICs
  No kernel/admin-level changes

Extensions:
  wide-area probing for 802.11 diagnosis? (“M-Lab”)
  passive (TCP) inference?

SLIDE 33

Pythia: Detection, Localization, Diagnosis of Wide-area Performance Problems

SLIDE 34

Pythia: one tool, three objectives

Data analysis tool (e.g., perfSONAR data)
Funded by DoE

Detection:
  “noticeable loss rate between ORNL and SLAC on 07/11/11 at 09:00:02 EDT”

Localization:
  “it happened at DENV-SLAC link”

Diagnosis:
  “it was due to insufficient router buffers”

SLIDE 35

Pythia: Approach

Existing diagnosis systems mine patterns and dependencies in large-scale network data (e.g., AT&T’s G-RCA)

Can we use domain knowledge?
  useful in inter-domain diagnosis where data is not available

Architecture:
  sensors do full-mesh measurements of network
  central server computes and renders results

Infrastructure: perfSONAR (ESnet & Internet2)

SLIDE 37

Detection

First step: “Is there a problem?”

Look for deviations from baseline
  Delay: nonparametric kernel density estimates to locate baseline
  Loss and reordering: empirical baseline estimates

[Figure panels: NY-CLEV, ALBU-ATL; annotations: “baseline”, “2.5s rise!”]

Network     Data                   Estimated events   Events / path / day
ESnet       12 days, 33 monitors   933                0.1
Internet2   22 days, 9 monitors    2268               1.4
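Locating a delay baseline with a kernel density estimate can be sketched as follows. This is a minimal illustration, not Pythia's implementation: the Gaussian kernel, the rule-of-thumb bandwidth, the grid search for the density peak and the fixed deviation threshold are all assumptions, and the function names are hypothetical.

```python
import math
import statistics


def kde_baseline(delays, grid_points=200):
    """Return the delay at which a Gaussian KDE of the samples peaks.

    Assumes at least two samples that are not all identical.
    """
    n = len(delays)
    # Rule-of-thumb bandwidth (an assumed choice, not taken from the slides).
    bandwidth = 1.06 * statistics.stdev(delays) * n ** -0.2
    lo, hi = min(delays), max(delays)
    grid = [lo + (hi - lo) * k / (grid_points - 1) for k in range(grid_points)]

    def density(x):
        # Unnormalized density; the constant factor does not move the peak.
        return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in delays)

    return max(grid, key=density)


def deviations(delays, baseline, threshold):
    """Indices of samples that exceed the baseline by more than threshold."""
    return [i for i, d in enumerate(delays) if d - baseline > threshold]


# Hypothetical one-way delays (ms): mostly near 10 ms, with a late rise.
delays = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 12.5, 12.6]
baseline = kde_baseline(delays)
events = deviations(delays, baseline, threshold=1.5)
```

Using the density peak rather than the mean keeps the baseline anchored to the typical delay even when a large share of samples falls in a raised-delay episode.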

SLIDE 38

Diagnosis

Follow-up to detection: “What is the root cause?”

Diagnosis types:
  congestion types
  routing effects
  loss nature
  reordering nature
  end-host effects

SLIDE 43

Congestion Nature

“Overload”: persistent queue build-up
“Bursty”: intermittent queues (high jitter)
Very small buffer
Excessive buffer

[Figure panels: Overload: ESnet; Bursty: PlanetLab; Bursty: Home link; Excessive buffer: Home link]

SLIDE 45

Loss Nature

Random losses: (the majority of) losses do not correlate with high delays
Otherwise: non-random losses

[Figure panels: Random losses: Home link; Non-random loss: ESnet]
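The random versus non-random distinction can be sketched as a majority test. How "correlate with high delays" is measured is not stated in the slides, so this sketch assumes that a loss coincides with high delay when an adjacent received packet has a delay above a caller-supplied threshold. The function name and threshold are hypothetical.

```python
def loss_nature(delays, high_delay):
    """Classify losses in a probe sequence.

    delays: per-packet delays in send order, with None marking a lost packet.
    high_delay: delay above which a received packet counts as "high delay".
    """
    lost = [i for i, d in enumerate(delays) if d is None]
    if not lost:
        return "no loss"

    def near_high_delay(i):
        # Assumption: only the packets immediately before and after are checked.
        for j in (i - 1, i + 1):
            if 0 <= j < len(delays) and delays[j] is not None and delays[j] > high_delay:
                return True
        return False

    coinciding = sum(near_high_delay(i) for i in lost)
    # Random if the majority of losses do not sit next to high delays.
    return "non-random" if coinciding > len(lost) / 2 else "random"


kind = loss_nature([10, 80, None, 90, None, 85, 10], high_delay=50)
```

Losses surrounded by high delays point to a queue that filled up before dropping, while losses amid normal delays point to causes unrelated to queueing.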

SLIDE 47

End-host Effects

Delays and losses induced due to:
  context switches
  clock synchronization (NTP)
  others (e.g., PlanetLab virtualization)

[Figure panels: Internet2: context switch; PlanetLab: end-host noise]

SLIDE 49

The Diagnosis Tree

Input: Detected Events (delay, loss, reordering)

[Diagram nodes: End-host effects; NTP vs. route events; Loss events; Congestion; Reordering nature]

Not shown: Unknown type

SLIDE 50

Pythia In-Progress

More performance problem types...

Unsupervised clustering to identify unknown events

Open-source system implementation:
  Detection, localization, diagnosis
  Interfacing with data: ESnet, I2, PL-testbed, broadband networks
  Front-end for operators

SLIDE 51

ShaperProbe: update

SLIDE 52

The FCC 2011 report

FCC broadband study (2011) found that:
  “many cable service tiers exceed 100% of the advertised upstream rate”

We revisit this statement
  FCC/SamKnows measured the sustained rate using a 30s TCP stream
  If shaping kicks in after 25s, the sustained speed can’t be measured

SLIDE 54

How long should we test?

Capacity (Mbps)   Shaping rate (Mbps)   Burst duration (s)   Measured/sustained (%)
3.5               1                     17                   100
5                 2                     15, 31               100, 250
9                 5.5                   26                   163
14.5              10                    19                   100

[Row-group label in the original: “Comcast upstream” (appears twice)]

Cox: FCC data

              Capacity (Mbps)   Shaping rate (Mbps)   Burst duration (s)   Capacity/sustained (%)
Cox upstream  1.5               2                     50                   133

[Annotations in the original: “Measured”, “Advertised”]
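The Measured/sustained percentages in the first table are consistent with a simple threshold rule, sketched below. The 25 s cutoff comes from the statement that the sustained speed cannot be measured if shaping kicks in after 25 s; treating the reported rate as exactly the capacity beyond that cutoff, and truncating the percentage, are simplifying assumptions. The function name is hypothetical.

```python
import math


def reported_percent(capacity_mbps, shaping_mbps, burst_s, cutoff_s=25):
    """Reported rate as a percentage of the true sustained (shaping) rate.

    Assumption: if the burst outlasts the cutoff, the test sees the link
    capacity; otherwise it sees the shaping rate.
    """
    measured = capacity_mbps if burst_s > cutoff_s else shaping_mbps
    return math.floor(100 * measured / shaping_mbps)


# Rows of the first table: (capacity, shaping rate, burst duration).
rows = [(3.5, 1, 17), (5, 2, 15), (5, 2, 31), (9, 5.5, 26), (14.5, 10, 19)]
percents = [reported_percent(c, r, b) for c, r, b in rows]
```

Under this rule a tier whose burst lasts 26 s or 31 s is reported well above 100%, while a tier with the same capacity and a shorter burst is reported at exactly 100%.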

SLIDE 55

Thank You!

SLIDE 57

Localization

Follow-up to detection: “Which link is bad?”

Link/path performance levels are discrete:
  e.g., high delay, medium delay, low delay

Localization: minimum number of bad links that can explain bad paths
  use greedy heuristic to solve iteratively

Internet2 event: 28th Feb 2011, 00:10:51 GMT
  path: CHIC to LOSA
  path: ATLA to KANS
  path: HOUS to LOSA

[Figure labels: ge-6-2-0.0-rtr.KANS, ge-6-1-0.0-rtr.LOSA]
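The "minimum number of bad links that can explain bad paths" objective with a greedy heuristic can be sketched as a set-cover loop. This is an illustration under stated assumptions rather than Pythia's algorithm: links that also appear on a good path are exonerated up front, ties are broken alphabetically, and the topology in the example is made up.

```python
def localize(bad_paths, good_paths=None):
    """Greedily pick links that explain all bad paths.

    bad_paths / good_paths: dict mapping a path name to the set of links on it.
    Returns the blamed links in the order they were chosen.
    """
    good_paths = good_paths or {}
    # Assumption: a link seen on any good path cannot be the bad one.
    exonerated = set().union(*good_paths.values())
    candidates = set().union(*bad_paths.values()) - exonerated
    unexplained = set(bad_paths)
    blamed = []
    while unexplained and candidates:
        def coverage(link):
            return sum(1 for p in unexplained if link in bad_paths[p])

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=coverage)
        if coverage(best) == 0:
            break  # remaining bad paths cannot be explained by any candidate
        blamed.append(best)
        candidates.discard(best)
        unexplained = {p for p in unexplained if best not in bad_paths[p]}
    return blamed


# Hypothetical topology: link "x" is shared by two bad paths.
bad = {"p1": {"a", "x"}, "p2": {"b", "x"}, "p3": {"c", "y"}}
good = {"p4": {"a", "b"}, "p5": {"c"}}
links = localize(bad, good)
```

The greedy choice does not guarantee the true minimum in every topology, but it gives a small explanation quickly, which matches the iterative heuristic the slide mentions.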