
TCP CONGESTION SIGNATURES

Srikanth Sundaresan (Princeton Univ.), Amogh Dhamdhere (CAIDA/UCSD), kc Claffy (CAIDA/UCSD), Mark Allman (ICSI)

www.caida.org

Typical Speed Tests Don’t Tell Us Much

  • Report only upload and download throughput, with no information beyond that


What type of congestion did the TCP flow experience?


Two Potential Sources of Congestion in the End-to-end Path

  • Self-induced congestion
      • Clear path; the flow itself induces the congestion
      • e.g., the last-mile access link
  • External congestion
      • The flow starts on an already congested path
      • e.g., a congested interconnect

Distinguishing the two cases has implications for users, ISPs, and regulators


How can we distinguish the two?

  • Cannot distinguish the two using throughput numbers alone
  • Access plan rates vary widely and are typically not available to content or speed test providers
      • e.g., a speed test reports 5 Mbps: is that the access link rate (DSL), or a congested path?

We can use the dynamics of TCP’s startup phase, i.e., Congestion Signatures

TCP’s RTT Congestion Signatures

  • Flows experiencing self-induced congestion fill up an empty buffer during slow start
      • Hence they increase the TCP flow’s RTT
  • Externally congested flows encounter an already full buffer
      • Less potential for RTT increases
  • Self-induced congestion therefore shows higher RTT variance than external congestion

We can quantify this using the Max-Min (maximum minus minimum) and the coefficient of variation (CoV) of RTT
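The two metrics are simple to compute once a flow's slow-start RTT samples are in hand. The sketch below is a minimal illustration, assuming the samples are already available as a list of milliseconds; the function names are hypothetical, and using the population (rather than sample) standard deviation for CoV is our own assumption.

```python
import statistics

def max_min_rtt(rtts_ms):
    """Spread of the RTT samples: maximum minus minimum (same units as input)."""
    if not rtts_ms:
        raise ValueError("need at least one RTT sample")
    return max(rtts_ms) - min(rtts_ms)

def cov_rtt(rtts_ms):
    """Coefficient of variation: population standard deviation over the mean."""
    mean = statistics.fmean(rtts_ms)
    if mean == 0:
        raise ValueError("mean RTT must be non-zero")
    return statistics.pstdev(rtts_ms) / mean

# Hypothetical slow-start RTT samples (ms), not measured data.
# Self-induced: RTT climbs as the flow fills an initially empty buffer.
self_induced = [20, 30, 45, 70, 100, 120]
# External: the buffer is already full, so RTT starts high and stays flat.
external = [115, 118, 120, 117, 119, 121]

for name, rtts in (("self-induced", self_induced), ("external", external)):
    print(name, max_min_rtt(rtts), round(cov_rtt(rtts), 3))
```

With these toy inputs the self-induced samples give a Max-Min of 100 versus 6 for the external ones, and a larger CoV as well, which is the ordering the slides describe.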


Example Controlled Experiment

  • 20 Mbps “access” link with 100 ms buffer
  • 1 Gbps “interconnect” link with 50 ms buffer
  • Self-induced congestion flows have higher values for both metrics and are clearly distinguishable

The two types of congestion exhibit widely contrasting behaviors

[Figure: CDFs of Max-Min RTT and CoV RTT (log-scale x-axes) for External and Self congested flows]
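Buffer sizes in this experiment are given as drain time at the link rate. A quick conversion (our own arithmetic, not from the slides) shows how many bytes each buffer holds and, more usefully, how much queuing delay a flow can add to its RTT.

```python
def buffer_bytes(rate_bps: float, drain_time_s: float) -> float:
    """Bytes held by a buffer that takes drain_time_s to empty at rate_bps."""
    return rate_bps * drain_time_s / 8

# Link parameters from the controlled experiment.
access = buffer_bytes(20e6, 0.100)       # 20 Mbps "access" link, 100 ms buffer
interconnect = buffer_bytes(1e9, 0.050)  # 1 Gbps "interconnect" link, 50 ms buffer

print(f"access buffer:       {access:,.0f} bytes")
print(f"interconnect buffer: {interconnect:,.0f} bytes")
```

This prints 250,000 bytes for the access buffer and 6,250,000 bytes for the interconnect buffer. A flow that fills the initially empty access buffer can raise its RTT by up to about 100 ms of queuing delay, whereas a flow that starts while the interconnect buffer is already full sees that delay from its first sample, leaving little room for the RTT to grow.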


Model

  • Max-min and CoV of RTT are derived from RTT samples taken during slow start
  • We feed the two metrics into a simple decision tree
      • We keep the tree depth low to minimize complexity
  • We build the decision tree classifier using controlled experiments and apply it to real-world data
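The slides do not say which decision-tree implementation was used; in practice a library classifier such as scikit-learn's `DecisionTreeClassifier` with its `max_depth` parameter would be a natural choice. To stay self-contained, the sketch below instead grows a small CART-style tree (Gini impurity, axis-aligned thresholds) over the two features, with a `max_depth` cap standing in for the low-depth setting. The training rows are hypothetical placeholders for labeled controlled-experiment flows.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (impurity, feature, threshold) minimizing weighted Gini, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, max_depth):
    """Leaves are majority labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:  # all feature values identical; cannot split further
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1))

def predict(tree, row):
    """Walk from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical training rows: (Max-Min RTT in ms, CoV of RTT) with labels.
X = [(90, 0.50), (75, 0.40), (110, 0.60), (8, 0.02), (12, 0.05), (5, 0.01)]
y = ["self", "self", "self", "external", "external", "external"]
tree = build_tree(X, y, max_depth=2)
print(predict(tree, (80, 0.45)))  # prints: self
print(predict(tree, (6, 0.03)))   # prints: external
```

On this toy data a single split on Max-Min RTT separates the classes, so the depth cap is never reached; with noisier training data the cap is what keeps the tree small.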

Validating the Method: Step 1- Controlled Experiments

[Figure: testbed diagram with Servers 1 to 4, Pi 1 and Pi 2, routers R1 and R2, and the Internet; links labeled 100 Mbps shaped “access” and 1 Gbps; traffic labeled throughput tests, background cross-traffic, and interconnect cross-traffic]


It’s Real

[Photos of the testbed, captioned “Fantastic cabling effort” and “Post-it defined networking”]


Validating the Method: Step 1- Controlled Experiments

  • Emulated access link + “core” link
  • Wide range of access link throughputs, buffer sizes, loss rates, and cross-traffic (background and congestion-inducing)
  • Can accurately label flows in the training data as “self” or “externally” congested


High accuracy: precision and recall > 80%, robust to model settings
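Precision and recall are computed per class by comparing the classifier's output with the testbed's ground-truth labels. A minimal sketch, treating “self” and “external” in turn as the positive class; the label lists are hypothetical, and returning 0.0 when a denominator is zero is our own convention.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical ground truth from the testbed vs. classifier output.
truth = ["self", "self", "self", "external", "external", "external"]
pred = ["self", "self", "external", "external", "external", "self"]
for cls in ("self", "external"):
    p, r = precision_recall(truth, pred, cls)
    print(f"{cls}: precision={p:.2f} recall={r:.2f}")
```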


Validating the Method: Step 2

  • From an Ark VP in ISP A, we identified a congested link with ISP B using TSLP*

[Figure: Ark VP in ISP A; congested link between ISP A and ISP B]

*Luckie et al., “Challenges in Inferring Internet Interdomain Congestion,” IMC 2014


  • Periodic NDT tests from the Ark VP to an M-Lab NDT server “behind” the congested interdomain link

[Figure: Ark VP in ISP A; M-Lab NDT server on the far side of the congested link to ISP B]


Validation of the Method: Step 2

[Figure: time series from 02/18 to 03/11 of download throughput (d/l Mbps) and far-side TSLP latency, with periods marked “externally” congested and “self” congested]

Strong correlation between throughput and TSLP latency: flows during elevated TSLP latency are labeled as “externally” congested


75%+ accuracy in detecting external congestion; 100% accuracy for self-induced congestion
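The labeling rule for this validation step (NDT tests run while far-side TSLP latency is elevated are labeled “external”, the rest “self”) can be sketched as below. The slides do not give the criterion for “elevated”; purely for illustration we assume a fixed margin above the minimum far-side latency in the window, and the (timestamp, latency) layout and `margin_ms` parameter are hypothetical.

```python
from bisect import bisect_right

def label_tests(tslp, test_times, margin_ms):
    """Label each NDT test using the latest far-side TSLP sample at or before it.

    tslp: list of (timestamp, far_side_latency_ms), sorted by timestamp.
    test_times: timestamps of NDT tests (same clock as tslp).
    margin_ms: hypothetical threshold above the baseline that counts as elevated.
    """
    times = [t for t, _ in tslp]
    baseline = min(lat for _, lat in tslp)
    labels = []
    for t in test_times:
        i = bisect_right(times, t) - 1
        if i < 0:
            labels.append(None)  # no latency sample yet; leave unlabeled
            continue
        elevated = tslp[i][1] > baseline + margin_ms
        labels.append("external" if elevated else "self")
    return labels

# Hypothetical hourly samples: far-side latency rises in the evening.
tslp = [(18, 20.0), (19, 22.0), (20, 55.0), (21, 60.0), (22, 25.0)]
print(label_tests(tslp, [18.5, 20.5, 21.9, 22.5], margin_ms=15))
```

Here the two tests that fall in the 20:00 to 22:00 latency bump are labeled “external” and the others “self”.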


Validation of the Method: Step 3

  • We use Measurement Lab’s NDT test data for real-world validation
  • Cogent interconnect issue in late 2013/early 2014
      • NDT tests to Cogent servers saw significant drops in throughput during peak hours
      • Several major U.S. ISPs were affected; Cox was not
      • The problem was identified as congested interconnects


Using the M-lab Data

[Figure: throughput (Mbps) vs. hour of day (local) for Comcast, Cox, TimeWarner, and Verizon; two panels, January 2014 and April 2014]

January 2014: drop in peak-hour throughput for Comcast, TWC, and Verizon; Cox not affected
April 2014: interconnection dispute resolved; no diurnal effect


Peak-hour tests in Jan/Feb 2014 are likely “externally” congested; off-peak tests in Mar/Apr 2014 are likely “self” congested

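The coarse labeling rule for the M-Lab tests can be written as a small function. The slides do not define the peak-hour window; the 19:00 to 23:59 local window below is a hypothetical placeholder, and tests that fall in neither group (e.g., off-peak in Jan/Feb) are simply left unlabeled in this sketch.

```python
from datetime import datetime

PEAK_HOURS = range(19, 24)  # hypothetical peak window: 19:00-23:59 local time

def coarse_label(local_time: datetime):
    """Coarse ground-truth label for an M-Lab NDT test, or None if in neither group."""
    in_peak = local_time.hour in PEAK_HOURS
    if local_time.year == 2014 and local_time.month in (1, 2) and in_peak:
        return "external"  # peak hour during the interconnection dispute
    if local_time.year == 2014 and local_time.month in (3, 4) and not in_peak:
        return "self"  # off-peak after the dispute was resolved
    return None

print(coarse_label(datetime(2014, 1, 15, 21, 0)))  # prints: external
print(coarse_label(datetime(2014, 4, 2, 10, 30)))  # prints: self
print(coarse_label(datetime(2014, 2, 3, 9, 0)))    # prints: None
```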


But didn’t you just say it’s hard to infer congestion using throughput tests??

  • Yes :)
  • For that reason, our labeling is broad and coarse: not all tests labeled “external” necessarily traverse congested interconnects
  • We do not expect the technique to identify all peak-hour tests as externally congested, or all off-peak tests as self-congested
      • We are looking for qualitative differences
  • The general observations about congestion were verified by other sources, e.g., CAIDA’s TSLP measurements


Applying the Model to M-lab data

[Figure: % self-induced congestion (y-axis 0.2 to 1.0) for Comcast, TimeWarner, Verizon, and Cox, for tests to Cogent (LAX), Cogent (LGA), and Level3 (ATL) servers, comparing Jan-Feb with Mar-Apr]


Much lower incidence of self-induced congestion for Cogent in Jan/Feb 2014 than in Mar/Apr


Level3 does not show significant differences; it was not affected by the interconnection disputes


Cox does not show significant differences; it was not affected by the interconnection disputes
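The plotted quantity, read here as the share of tests the classifier labels as self-induced congestion within each (server, ISP, period) group, is a grouped fraction. A minimal sketch of that aggregation, assuming a hypothetical record layout of (server, ISP, period, predicted label); the records below are toy values, not M-Lab data.

```python
from collections import defaultdict

def self_fraction(records):
    """Fraction of tests classified 'self' in each (server, isp, period) group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [self_count, total]
    for server, isp, period, label in records:
        c = counts[(server, isp, period)]
        c[0] += label == "self"  # bool adds as 0 or 1
        c[1] += 1
    return {group: s / n for group, (s, n) in counts.items()}

# Toy records for illustration only.
records = [
    ("Cogent (LAX)", "Comcast", "Jan-Feb", "external"),
    ("Cogent (LAX)", "Comcast", "Jan-Feb", "self"),
    ("Cogent (LAX)", "Comcast", "Mar-Apr", "self"),
    ("Cogent (LAX)", "Comcast", "Mar-Apr", "self"),
]
for group, frac in sorted(self_fraction(records).items()):
    print(group, f"{frac:.2f}")
```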


Looking at Throughput

  • What throughput should we observe for “self” and “externally” congested flows?
  • With congested interconnects affecting many flows, both “self” and “external” flows should see similar throughput
  • Without congested interconnects affecting many flows, “self” congested throughput should follow access link speeds, generally higher than “externally” congested throughput


Looking at Throughput

  • Avg. throughput of self-induced congestion flows is significantly higher than that of externally congested flows in Mar-Apr (no interconnection disputes)

[Figure: throughput (Mbps) for Comcast, TimeWarner, Verizon, and Cox; series Jan-Feb Self, Jan-Feb External, Mar-Apr Self, Mar-Apr External]


Jan-Feb: both “self” and “external” flows get similar throughput


Mar-Apr: “self” flows get higher throughput than “external” flows


Takeaways

  • It is possible to distinguish two kinds of congestion: self-induced vs. external
      • The difference is important for identifying the solution: upgrade the service plan, or talk to the ISP?
      • It also matters for regulatory purposes
  • Simple, accurate technique using RTT dynamics during TCP slow start
      • Can be easily computed using packet captures or other tools such as Web100 (future work)
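The slides say the metrics can be computed from packet captures but do not show how. One common approach, sketched here under simplifying assumptions, matches each cumulative ACK to the highest data segment it newly covers and skips segments that were sent more than once (Karn's rule), since an ACK for them is ambiguous. Parsing the capture itself is out of scope: the input is assumed to be already-extracted (send_time, seq_end) tuples for sender data segments and (recv_time, ack_no) tuples for ACKs, both hypothetical layouts, and sequence-number wraparound is ignored.

```python
def rtt_samples(data_segments, acks):
    """RTT samples (seconds) for one direction of a TCP flow.

    data_segments: list of (send_time, seq_end) in send order, where seq_end is
                   the sequence number just past the segment's last byte.
    acks: list of (recv_time, ack_no) in arrival order (cumulative ACKs).
    """
    first_sent = {}
    retransmitted = set()
    for sent, seq_end in data_segments:
        if seq_end in first_sent:
            retransmitted.add(seq_end)
        else:
            first_sent[seq_end] = sent
    pending = sorted(first_sent.items())  # (seq_end, first_send_time) by sequence
    samples = []
    i = 0
    for ack_time, ack_no in acks:
        newest = None
        # Advance past every outstanding segment this cumulative ACK covers.
        while i < len(pending) and pending[i][0] <= ack_no:
            newest = pending[i]
            i += 1
        # One sample per ACK, from the highest newly covered segment,
        # unless that segment was retransmitted (ambiguous under Karn's rule).
        if newest is not None and newest[0] not in retransmitted:
            samples.append(ack_time - newest[1])
    return samples

data = [(0.000, 1000), (0.010, 2000), (0.020, 3000)]
acks = [(0.030, 1000), (0.045, 3000)]
print([round(s, 3) for s in rtt_samples(data, acks)])
```

Restricting the samples to slow start (for instance, stopping at the first retransmission, one simple heuristic we assume rather than take from the slides) is left to the caller; real captures also need care with SACK, delayed ACKs, and sequence wraparound.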


Limitations

  • Relies on the buffering effect
      • May not work with TCP variants that minimize buffer occupancy, e.g., BBR
  • Only uses slow start dynamics
      • Might be confounded by flows that behave one way during slow start but differently afterward
  • Real-world validation relies on coarsely labeled data
      • It would be great to validate on more real-world data!


Thanks! Questions?
