Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail

SLIDE 1

Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail

Kevin P Dyer, Portland State University

Joint work with:

  • Scott Coull, RedJack LLC
  • Thomas Ristenpart, University of Wisconsin-Madison
  • Thomas Shrimpton, Portland State University

Wednesday, May 23, 2012

SLIDE 2

Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail...

...to prevent website fingerprinting.

SLIDE 3

The client makes a single request for a webpage over an encrypted link between Client and Proxy. The attacker’s goal is to identify the webpage requested.

Security intuition:

  • only the proxy’s IP address is revealed
  • encryption hides everything else

SLIDE 4

[Sun et al. ’02] [Bissias et al. ’05] [Liberatore and Levine ’06] [Herrmann et al. ’09] [Wright et al. ’09] [Lu et al. ’10] [Chen et al. ’10] [Luo et al. ’11] [Panchenko et al. ’11] show that the security intuition does not hold.

Attacker learns:

  • packet lengths
  • packet directions
  • packet timings

This enables traffic analysis attacks on the link between Client and Proxy.

SLIDE 5

[Liberatore and Levine ’06] Attack Scenario

Client and Proxy communicate over an SSH-protected link.

  • 1. Attacker knows what client software is used.
  • 2. Attacker knows the finite universe of webpages.
  • 3. Attacker has labeled training data.

SLIDE 6

[Liberatore and Levine ’06] Attack

Setting: Client and Proxy communicate over an SSH-protected link; k=1000 webpages.

Naive Bayes classifier over (packet direction, packet length) counts.

Attacker can identify a randomly chosen webpage with 68% accuracy! Packet lengths are a damaging side channel.
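
A minimal sketch of a naive Bayes classifier over (packet direction, packet length) counts. The slide does not give implementation details, so the multinomial model, the add-one smoothing, the uniform prior, and the toy traces below are all assumptions made for illustration, not the [Liberatore and Levine ’06] implementation.

```python
import math
from collections import Counter, defaultdict

def featurize(trace):
    """Count each (direction, length) pair in a trace of (direction, length) packets."""
    return Counter(trace)

def train(labeled_traces):
    """Accumulate per-page feature counts from (page, trace) training pairs."""
    counts = defaultdict(Counter)
    vocab = set()
    for page, trace in labeled_traces:
        feats = featurize(trace)
        counts[page].update(feats)
        vocab.update(feats)
    return counts, vocab

def classify(trace, counts, vocab):
    """Return the page with the highest smoothed log-likelihood (uniform prior assumed)."""
    feats = featurize(trace)
    best_page, best_score = None, -math.inf
    for page, page_counts in counts.items():
        # Add-one smoothing; the extra +1 reserves mass for features never seen in training.
        total = sum(page_counts.values()) + len(vocab) + 1
        score = sum(n * math.log((page_counts[f] + 1) / total) for f, n in feats.items())
        if score > best_score:
            best_page, best_score = page, score
    return best_page

# Hypothetical toy traces: "up"/"down" directions, lengths in bytes.
training = [
    ("page_a", [("up", 120), ("down", 1500), ("down", 1500), ("down", 640)]),
    ("page_b", [("up", 300), ("down", 800), ("up", 300), ("down", 800)]),
]
counts, vocab = train(training)
guess = classify([("up", 120), ("down", 1500), ("down", 640)], counts, vocab)
```

With these toy traces the unseen trace shares all of its (direction, length) pairs with page_a, so that page scores highest.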

SLIDE 7

A countermeasure is applied on the link between Client and Proxy.

Example countermeasures:

  • Pad to MTU
  • Pad to random-length
  • “Mice-elephants” padding
  • Traffic Morphing [Wright et al. ’09]
  • SSL RFC-compliant padding [SSL 3.0 RFC ’99]
  • ...

SLIDE 8

A countermeasure is applied on the link between Client and Proxy.

Example countermeasures:

  • Pad to MTU
  • Pad to random-length
  • “Mice-elephants” padding
  • Traffic Morphing [Wright et al. ’09]
  • SSL RFC-compliant padding [SSL 3.0 RFC ’99]
  • ...

Do these countermeasures prevent TA attacks?

SLIDE 9

Prior work does not provide a clear answer

Reported accuracies, No Countermeasure vs. Pad to MTU, by # of webpages k (axis from k=2 to k=1000):

  • k=1000 [LL]: 68% with no countermeasure, 8% with Pad to MTU

SLIDE 10

Prior work does not provide a clear answer

Reported accuracies, No Countermeasure vs. Pad to MTU, by # of webpages k (axis from k=2 to k=1000):

  • k=1000 [LL]: 68% with no countermeasure, 8% with Pad to MTU
  • k=2 [W]: 86% and 98%

SLIDE 11

Prior work does not provide a clear answer

Reported accuracies, No Countermeasure vs. Pad to MTU, by # of webpages k (axis from k=2 to k=1000):

  • k=1000 [LL]: 68% with no countermeasure, 8% with Pad to MTU
  • k=2 [W]: 86% and 98%
  • k=775 [H]: 98%

SLIDE 12

Prior work does not provide a clear answer

Reported accuracies, No Countermeasure vs. Pad to MTU, by # of webpages k (axis from k=2 to k=1000):

  • k=1000 [LL]: 68% with no countermeasure, 8% with Pad to MTU
  • k=2 [W]: 86% and 98%
  • k=775 [H]: 98%

What about other values of k?

SLIDE 13

Prior work does not provide a clear answer

Reported accuracies, No Countermeasure vs. Pad to MTU, by # of webpages k (axis from k=2 to k=1000):

  • k=1000 [LL]: 68% with no countermeasure, 8% with Pad to MTU
  • k=2 [W]: 86% and 98%
  • k=775 [H]: 98%

What about other values of k?

Does the data set used impact efficacy?

SLIDE 14

Prior work does not provide a clear answer

Reported accuracies, No Countermeasure vs. Pad to MTU, by # of webpages k (axis from k=2 to k=1000):

  • k=1000 [LL]: 68% with no countermeasure, 8% with Pad to MTU
  • k=2 [W]: 86% and 98%
  • k=775 [H]: 98%

What about other values of k?

Does the data set used impact efficacy?

What about other classification strategies?

SLIDE 15

Prior work does not provide a clear answer

Reported accuracies, No Countermeasure vs. Pad to MTU, by # of webpages k (axis from k=2 to k=1000):

  • k=1000 [LL]: 68% with no countermeasure, 8% with Pad to MTU
  • k=2 [W]: 86% and 98%
  • k=775 [H]: 98%

What about other values of k?

Does the data set used impact efficacy?

What about other classification strategies?

What about other countermeasures?

SLIDE 16

Our work

  • 1. Comprehensive evaluation of traffic analysis countermeasures.

No countermeasure works in the LL setting.

  • 2. In-depth analysis of traffic features.

Coarse features (e.g., time, bandwidth) enable high-accuracy attacks despite countermeasures.

SLIDE 17

Our work

  • 1. Comprehensive evaluation of traffic analysis countermeasures.

No countermeasure works in the LL setting.

  • 2. In-depth analysis of traffic features.

Coarse features (e.g., time, bandwidth) enable high-accuracy attacks despite countermeasures.

Pessimistic conclusion: efficient countermeasures can’t hide “coarse” features.

SLIDE 18

Our Comprehensive Analysis

9 countermeasures:

  • 5 padding schemes
  • 2 TLS/SSH “inspired” padding schemes
  • 2 versions of traffic morphing

6 classifiers:

  • [Liberatore and Levine] naive Bayes, Jaccard
  • [Wright et al.] naive Bayes
  • [Lu et al.] edit distance
  • [Herrmann et al.] multinomial naive Bayes
  • [Panchenko et al.] support vector machine

10 “universe” sizes: k=2,4,8,16,32,64,128,256,512,775

2 data sets:

  • Liberatore and Levine (2000 websites)
  • Herrmann et al. (775 websites)

SLIDE 19

The countermeasures

  • Session Random 255
  • Packet Random 255
  • Linear Padding
  • Exponential Padding
  • Mice-Elephants Padding
  • Pad to MTU
  • Packet Random MTU
  • Traffic Morphing
  • Direct Target Sampling

SLIDE 20

The countermeasures

  • Session Random 255
  • Packet Random 255
  • Linear Padding
  • Exponential Padding
  • Mice-Elephants Padding
  • Pad to MTU
  • Packet Random MTU
  • Traffic Morphing
  • Direct Target Sampling

Pad to MTU: every packet on the wire is padded to a fixed length.
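
A minimal sketch of Pad to MTU, where every packet on the wire is padded to a fixed length. The 1500-byte MTU, the (timestamp, direction, length) trace format, and the function name are assumptions for illustration; the slide gives none of them.

```python
MTU = 1500  # assumed MTU in bytes; the slide does not state a value

def pad_to_mtu(trace, mtu=MTU):
    """Replace every packet length with the MTU, keeping timestamp and direction."""
    padded = []
    for timestamp, direction, length in trace:
        if length > mtu:
            raise ValueError("packet longer than MTU")
        padded.append((timestamp, direction, mtu))
    return padded

# Hypothetical trace: (seconds since start, direction, length in bytes).
trace = [(0.00, "up", 120), (0.05, "down", 1500), (0.06, "down", 640)]
padded = pad_to_mtu(trace)

# Bandwidth overhead of the padded trace relative to the original.
overhead = sum(p[2] for p in padded) / sum(p[2] for p in trace)
```

In this sketch only the lengths change. The number of packets, their directions, and their timestamps are exactly what they were before padding.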

SLIDE 21

The countermeasures

  • Session Random 255
  • Packet Random 255
  • Linear Padding
  • Exponential Padding
  • Mice-Elephants Padding
  • Pad to MTU
  • Packet Random MTU
  • Traffic Morphing
  • Direct Target Sampling

Pad to MTU: every packet on the wire is padded to a fixed length.

Traffic Morphing [Wright et al. ’09]:

  • Pads packets
  • Chops packets
  • Sends dummy packets
  • Mimics packet-length distributions

SLIDE 22

Some representative results

Classifier accuracy at k=512

Classifier              None   Pad to MTU   Traffic Morphing
Herrmann et al.         99%    2%           3%
Liberatore and Levine   97%    41%          17%
Panchenko et al.        96%    82%          81%

SLIDE 23

Some representative results

Classifier accuracy at k=512

Classifier              None   Pad to MTU   Traffic Morphing
Herrmann et al.         99%    2%           3%
Liberatore and Levine   97%    41%          17%
Panchenko et al.        96%    82%          81%

Best performer with no countermeasure applied: Herrmann et al. (99%).

SLIDE 24

Some representative results

Classifier accuracy at k=512

Classifier              None   Pad to MTU   Traffic Morphing
Herrmann et al.         99%    2%           3%
Liberatore and Levine   97%    41%          17%
Panchenko et al.        96%    82%          81%

Best performer with no countermeasure applied: Herrmann et al. (99%).

Best performer with countermeasures applied: Panchenko et al. (82% and 81%).

SLIDE 25

Under the hood of the [Panchenko ’11] classifier

  • Pad to MTU: 82% at k=512
  • Traffic Morphing: 81% at k=512

SLIDE 26

Under the hood of the [Panchenko ’11] classifier

  • Pad to MTU: 82% at k=512
  • Traffic Morphing: 81% at k=512

WHY?

Support vector machine. Features used:

  • Packet lengths upstream
  • Packet lengths downstream
  • Burst bandwidth upstream
  • Burst bandwidth downstream
  • HTML marker downstream
  • Number markers upstream
  • Number markers downstream
  • Total bytes transmitted upstream
  • Total bytes transmitted downstream
  • Percentage of downstream packets
  • Total number of packets upstream
  • Total number of packets downstream
  • Occurring packet lengths downstream
  • Occurring packet lengths upstream

SLIDE 27

Under the hood of the [Panchenko ’11] classifier

  • Pad to MTU: 82% at k=512
  • Traffic Morphing: 81% at k=512

WHY?

Support vector machine. Features used:

  • Packet lengths upstream
  • Packet lengths downstream
  • Burst bandwidth upstream
  • Burst bandwidth downstream
  • HTML marker downstream
  • Number markers upstream
  • Number markers downstream
  • Total bytes transmitted upstream
  • Total bytes transmitted downstream
  • Percentage of downstream packets
  • Total number of packets upstream
  • Total number of packets downstream
  • Occurring packet lengths downstream
  • Occurring packet lengths upstream

[Annotation marks on the feature list: X, ?]

SLIDE 28

Under the hood of the [Panchenko ’11] classifier

  • Pad to MTU: 82% at k=512
  • Traffic Morphing: 81% at k=512

WHY?

Support vector machine. Features used:

  • Packet lengths upstream
  • Packet lengths downstream
  • Burst bandwidth upstream
  • Burst bandwidth downstream
  • HTML marker downstream
  • Number markers upstream
  • Number markers downstream
  • Total bytes transmitted upstream
  • Total bytes transmitted downstream
  • Percentage of downstream packets
  • Total number of packets upstream
  • Total number of packets downstream
  • Occurring packet lengths downstream
  • Occurring packet lengths upstream

[Annotation marks on the feature list: X, ?]

SLIDE 29

Digging deeper: Understanding the features

  • 1. Identify a “coarse” feature.
  • 2. Implement a feature-specific classifier.
  • 3. Run the classifier against all countermeasures.

Coarse features examined: Time, Bandwidth, Burst Bandwidth
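
A rough sketch of how the three coarse features (time, bandwidth, burst bandwidth) might be computed from a packet trace. The trace format and the burst definition used here (a maximal run of consecutive packets in the same direction) are simplifying assumptions; the slide does not define them.

```python
from itertools import groupby

def coarse_features(trace):
    """Compute total time, bytes per direction, and per-burst byte totals.

    trace: list of (timestamp_seconds, direction, length_bytes) in time order.
    A burst is assumed to be a maximal run of consecutive same-direction packets.
    """
    total_time = trace[-1][0] - trace[0][0] if trace else 0.0
    bandwidth = {"up": 0, "down": 0}
    for _, direction, length in trace:
        bandwidth[direction] += length
    bursts = [
        (direction, sum(length for _, _, length in run))
        for direction, run in groupby(trace, key=lambda pkt: pkt[1])
    ]
    return {"time": total_time, "bandwidth": bandwidth, "bursts": bursts}

# Hypothetical trace with two upstream requests and two downstream responses.
trace = [
    (0.0, "up", 120),
    (0.1, "down", 1500),
    (0.2, "down", 640),
    (0.5, "up", 80),
    (0.9, "down", 1500),
]
features = coarse_features(trace)
```

Each of the three values could then be handed to its own feature-specific classifier, as in step 2 of the list.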

SLIDE 30

“Coarse” Traffic Features with Pad to MTU

Example 1:

             None     Pad to MTU
time         2.8s     2.8s
bandwidth    277KB    347KB
bursts       13       13

Example 2:

             None     Pad to MTU
time         5.2s     5.2s
bandwidth    1794KB   2560KB
bursts       107      107

SLIDE 31

Feature: Time Elapsed

  • Useful for small values of k
  • “Pad to MTU”: 5% at k=512

SLIDE 32

Feature: Bandwidth

  • More robust to large values of k than the time classifier
  • Still a “coarse” measurement
  • “Pad to MTU”: 42% at k=512

SLIDE 33

Feature: Burst Bandwidth

  • “Pad to MTU”: 71% at k=512

SLIDE 34

Putting coarse features together: simple naive Bayes classifier using

  • Total download time
  • Total bandwidth
  • Burst bandwidth

80% at k=512

SLIDE 35

Putting coarse features together: simple naive Bayes classifier using

  • Total download time
  • Total bandwidth
  • Burst bandwidth

80% at k=512

82% at k=512

SLIDE 36

Putting coarse features together: simple naive Bayes classifier using

  • Total download time
  • Total bandwidth
  • Burst bandwidth

80% at k=512

82% at k=512

Coarse features are sufficient for high-accuracy classification.
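
A minimal sketch of a naive Bayes classifier over a few coarse features. The slide only says "simple naive Bayes classifier", so the Gaussian per-feature model, the variance floor, the uniform prior, the three-number summary (seconds, kilobytes, number of bursts), and the sample values are all assumptions for illustration rather than the classifier used in the talk.

```python
import math
from collections import defaultdict

def fit(samples, var_floor=1e-6):
    """Estimate a per-page mean and variance for each feature."""
    by_page = defaultdict(list)
    for page, features in samples:
        by_page[page].append(features)
    model = {}
    for page, rows in by_page.items():
        columns = list(zip(*rows))
        means = [sum(c) / len(c) for c in columns]
        variances = [
            max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
            for c, m in zip(columns, means)
        ]
        model[page] = (means, variances)
    return model

def log_likelihood(features, means, variances):
    """Sum of independent Gaussian log-densities, one per feature."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for x, m, v in zip(features, means, variances)
    )

def predict(model, features):
    """Pick the page with the highest log-likelihood (uniform prior assumed)."""
    return max(model, key=lambda page: log_likelihood(features, *model[page]))

# Hypothetical samples: (seconds, kilobytes, number of bursts) per page load.
samples = [
    ("small_page", (2.8, 277, 13)),
    ("small_page", (3.0, 281, 14)),
    ("small_page", (2.7, 275, 13)),
    ("large_page", (5.2, 1794, 107)),
    ("large_page", (5.5, 1801, 110)),
    ("large_page", (5.0, 1790, 105)),
]
model = fit(samples)
guess = predict(model, (2.9, 279, 13))
```

Treating the features as independent is what makes the classifier "naive"; each feature contributes one term to the log-likelihood.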

SLIDE 37

Can countermeasures obfuscate coarse features?

In theory we can obfuscate all features by sending:

  • fixed-length packets
  • packets at a fixed interval
  • packets for at least a fixed amount of time

... but this destroys efficiency

SLIDE 38

Can countermeasures obfuscate coarse features?

             Example 1   Example 2
time         2.8s        5.2s
bandwidth    277KB       1794KB
bursts       13          107

SLIDE 39

Can countermeasures obfuscate coarse features?

             Example 1   Example 2
time         2.8s        5.2s
bandwidth    277KB       1794KB
bursts       13          107

Bandwidth ratio: 1794/277 = 6.48
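
One way to read the 1794/277 ratio, offered as an interpretation rather than something the slide states: to make the two examples indistinguishable by total bandwidth alone, the smaller one would have to be padded up to the size of the larger. The sketch below computes that growth factor. The function name and the page labels are placeholders.

```python
def padding_overhead(sizes_kb):
    """For each page, the factor by which it grows when padded to the largest page."""
    target = max(sizes_kb.values())
    return {page: target / size for page, size in sizes_kb.items()}

# Bandwidth figures from the two examples on the slide; the labels are placeholders.
sizes_kb = {"first_example": 277, "second_example": 1794}
overhead = padding_overhead(sizes_kb)
```

Under this reading, the smaller example would send roughly 6.48 times its original bandwidth, while the larger one is unchanged.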

SLIDE 40

Where do we go from here?

Bad news: efficient countermeasures don’t work in the LL setting

SLIDE 41

Where do we go from here?

Bad news: efficient countermeasures don’t work in the LL setting

Open question 1: What is the impact of real-world artifacts?

  • Caching, interleaved downloading, hurdles to training

SLIDE 42

Where do we go from here?

Bad news: efficient countermeasures don’t work in the LL setting

Open question 1: What is the impact of real-world artifacts?

  • Caching, interleaved downloading, hurdles to training

Open question 2: Can we improve application-layer countermeasures?

  • HTTPOS [Luo et al. ’11], Camouflage [Panchenko et al. ’11]

SLIDE 43

Where do we go from here?

Bad news: efficient countermeasures don’t work in the LL setting

Open question 1: What is the impact of real-world artifacts?

  • Caching, interleaved downloading, hurdles to training

Open question 2: Can we improve application-layer countermeasures?

  • HTTPOS [Luo et al. ’11], Camouflage [Panchenko et al. ’11]

Open question 3: Do these countermeasures work for other settings?

  • VoIP [Wright et al. ’07, ’08] [White et al. ’11], Web App leaks [Chen et al. ’10] ...

SLIDE 44

Summary

  • 1. None of the countermeasures work (in the LL setting)
  • 2. Countermeasures fail because they don’t conceal “coarse” features
  • 3. Efficient countermeasures can’t hide “coarse” features

Coarse features are sufficient for high-accuracy classification.