A Critical Evaluation of Website Fingerprinting Attacks Marc Juarez - - PowerPoint PPT Presentation



SLIDE 1

A Critical Evaluation of Website Fingerprinting Attacks

Marc Juarez[1], Sadia Afroz[2], Gunes Acar[1], Claudia Diaz[1], Rachel Greenstadt[3]

[1] KU Leuven, ESAT/COSIC and iMinds, Leuven, Belgium; [2] UC Berkeley, US; [3] Drexel University, US

CCS 2014, Scottsdale, AZ, USA, November 4, 2014

SLIDE 2

Introduction: how does WF work?

[Diagram: User → Tor → Web, with the adversary observing the link between the user and the Tor network.]

User = Alice; Webpage = ??

SLIDE 3

Why is WF so important?

  • Tor is the most advanced anonymity network
  • WF allows an adversary to discover a user's browsing history
  • A series of successful attacks has been published
  • Low cost to the adversary

[Chart: number of top-conference publications on WF (25).]

SLIDE 4

Introduction: unrealistic assumptions

[Diagram: User → Tor → Web, with the adversary in between. Each assumption concerns one element of the picture.]

  • Client settings: e.g., browsing behaviour
  • Adversary: e.g., replicability
  • Web: e.g., staleness

SLIDE 7

Contributions

  • A critical analysis of the assumptions
  • Evaluation of variables that affect accuracy
  • An approach to reduce false positives
  • A model of the adversary’s cost

SLIDE 8

Methodology

  • Based on Wang and Goldberg's methodology

○ Batches and k-fold cross-validation
○ Fast-Levenshtein attack (SVM)

  • Comparative experiments

○ Key: isolate variable under evaluation (e.g., TBB version)

SLIDE 9

Comparative experiments: example

  • Step 1: Train on data with the default value; test on data with the default value → Acc. Control
  • Step 2: Train on data with the default value; test on data with the value of interest → Acc. Test
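The two-step comparison above can be sketched in code. The toy 1-NN classifier and the synthetic feature vectors below are illustrative stand-ins for the SVM-based attack and real Tor traces.

```python
# Sketch of a comparative experiment: train on traces collected under the
# default setting, then compare accuracy on held-out default traces
# ("control") against traces collected with one variable changed ("test").
# The 1-NN classifier and the feature values are hypothetical placeholders.

def nearest_label(train, x):
    """Return the label of the training sample closest to x (1-NN)."""
    return min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[1]

def accuracy(train, samples):
    return sum(nearest_label(train, x) == y for x, y in samples) / len(samples)

# Hypothetical feature vectors (e.g., packet counts per direction) per page.
train         = [((100, 40), "pageA"), ((30, 90), "pageB")]
control_test  = [((102, 41), "pageA"), ((29, 88), "pageB")]   # default setting
variable_test = [((60, 70), "pageA"), ((25, 95), "pageB")]    # e.g., new TBB version

acc_control = accuracy(train, control_test)
acc_test = accuracy(train, variable_test)
# A large drop from acc_control to acc_test indicates the variable matters.
```

Isolating one variable per experiment is what makes the Acc. Control vs. Acc. Test comparison meaningful.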

SLIDE 12

Datasets

  • Alexa Top Sites
  • Active Linguistic Authentication Dataset (ALAD)

○ Real-world users (80 users, 40K unique URLs)
○ Training on Alexa and testing on ALAD?


45% of visited pages are not in the Alexa top 100 → a prohibitive number of FPs

SLIDE 14

Experiments: multitab browsing

  • Experiment with 2 tabs, opened with a time gap of 0.5s, 3s or 5s
  • Firefox users use 2 or 3 tabs on average
  • Background page picked at random
  • Success: detection of either page
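The success criterion above (a prediction counts if it matches either of the two open pages) can be sketched as follows; the page labels are hypothetical.

```python
# Sketch of the multitab success criterion: classifying a two-tab trace
# counts as correct if the predicted page matches either the foreground
# or the background page. Page labels are illustrative placeholders.

def multitab_accuracy(results):
    """results: list of (predicted, foreground, background) labels."""
    hits = sum(pred in (fg, bg) for pred, fg, bg in results)
    return hits / len(results)

results = [
    ("pageA", "pageA", "pageC"),  # foreground detected: success
    ("pageC", "pageB", "pageC"),  # background detected: success
    ("pageD", "pageA", "pageB"),  # neither detected: failure
]
```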

SLIDE 19

Experiments: multitab browsing

Accuracy for different time gaps:

  Control: 77.08%
  Test (0.5s): 9.8%
  Test (3s): 7.9%
  Test (5s): 8.23%

SLIDE 20

Experiments: TBB versions

  • Coexisting Tor Browser Bundle (TBB) versions
  • Versions: 2.4.7, 3.5 and 3.5.2.1 (changes in randomized pipelining (RP), etc.)

  Control (3.5.2.1): 79.58%
  Test (2.4.7): 66.75%
  Test (3.5): 6.51% (latest version of RP)

SLIDE 22

Experiments: network conditions

[Diagram: VMs in New York, Leuven and Singapore; KU Leuven and DigitalOcean (virtual private servers).]

  Control (LVN): 66.95%; Test (NY): 8.83%
  Control (LVN): 66.95%; Test (SI): 9.33%
  Control (SI): 68.53%; Test (NY): 76.40%

SLIDE 26

Experiments: entry guard config.

  • Which entry guard configuration works best for training?
  • 3 configurations:

○ Fix 1 entry guard
○ Pick the entry from a list of 3 entry guards (default)
○ Pick the entry from all possible entry guards (Wang and Goldberg)

SLIDE 27

Experiments: entry guard config.

Accuracy for different entry guard configurations:

  any entry guard: 64.40%
  3 entry guards: 62.70%
  1 entry guard: 70.38%

SLIDE 28

Experiments: data staleness

[Plot: accuracy (%) vs. time (days), showing the staleness of our collected data over 90 days.]

Less than 50% accuracy after 9 days.

SLIDE 29

Summary

SLIDE 30
The base rate fallacy: example

  • Breathalyzer test:

○ 0.88 of truly drunk drivers are identified (true positives)
○ 0.05 false positives

  • Alice tests positive:

○ What is the probability that she is indeed drunk? (BDR)
○ Is it 0.95? Is it 0.88? Something in between?

Only 0.1!

SLIDE 32

The base rate fallacy: example

  • The circumference represents the world of drivers.
  • Each dot represents a driver.

SLIDE 33

The base rate fallacy: example

  • 1% of drivers are driving drunk (base rate or prior).

SLIDE 34

The base rate fallacy: example

  • Of the drunk drivers, 88% are identified as drunk by the test.

SLIDE 35

The base rate fallacy: example

  • Of the sober drivers, 5% are erroneously identified as drunk.

SLIDE 36
The base rate fallacy: example

  • Alice must be within the black circumference.
  • Ratio of red dots within the black circumference: BDR = 7/70 = 0.1!
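The same BDR can be computed directly with Bayes' rule. This is a sketch: with the stated rates and a 1% base rate, the formula gives roughly 0.15, the same order of magnitude as the 7/70 ≈ 0.1 read off the dot diagram, and in any case far below 0.88.

```python
# Bayesian detection rate (BDR): P(drunk | positive test) via Bayes' rule.
def bdr(tpr, fpr, prior):
    return (tpr * prior) / (tpr * prior + fpr * (1 - prior))

# Slide's rates: TPR = 0.88, FPR = 0.05; 1% of drivers are drunk (prior).
print(round(bdr(0.88, 0.05, 0.01), 3))
```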

SLIDE 37
The base rate fallacy in WF

  • The base rate must be taken into account
  • In WF:

○ Blue: webpages
○ Red: monitored webpages
○ Base rate?

SLIDE 38
The base rate fallacy in WF

  • Probability of visiting a monitored page?
  • “false positives matter a lot” [1]
  • Experiment: 35K world

[1] Mike Perry, “A Critique of Website Traffic Fingerprinting Attacks”, Tor Project Blog, 2013. https://blog.torproject.org/blog/critique-website-traffic-fingerprinting-attacks.

SLIDE 39

Experiment: BDR in a 35K world

[Plot: BDR vs. size of the world.]

  • Uniform world
  • Non-popular pages from ALAD
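To see why the BDR collapses in a 35K world, assume a uniform prior over 35,000 pages and plug plausible attack rates into Bayes' rule. The TPR/FPR values below are hypothetical placeholders, not the paper's measurements.

```python
# BDR for WF in a uniform 35K world: the prior of visiting any one
# monitored page is 1/35000, so even a strong classifier yields a tiny
# BDR. The TPR/FPR values are illustrative assumptions, not measured.

def bdr(tpr, fpr, prior):
    return (tpr * prior) / (tpr * prior + fpr * (1 - prior))

world = 35_000
prior = 1 / world              # uniform world: each page equally likely
tpr, fpr = 0.90, 0.01          # hypothetical attack rates

print(f"BDR = {bdr(tpr, fpr, prior):.4f}")   # well below 1%
```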

SLIDE 40

Classify, but verify

  • Verification step to test classifier confidence
  • Number of FPs reduced from 397 to 42 (out of 400)
  • But the BDR is still very low for non-popular pages
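A minimal version of the verification step might look like this: accept the classifier's output only when its confidence clears a threshold, trading some true positives for far fewer false positives. The score format and the threshold value are assumptions, not the paper's exact mechanism.

```python
# Sketch of "classify, but verify": output a prediction only when the
# classifier's confidence clears a threshold; otherwise say "unknown".
# The scores and the 0.9 threshold are illustrative assumptions.

def verify(scores, threshold=0.9):
    """scores: dict mapping page label -> confidence in [0, 1]."""
    label, conf = max(scores.items(), key=lambda kv: kv[1])
    return label if conf >= threshold else "unknown"

print(verify({"pageA": 0.95, "pageB": 0.03}))  # confident: "pageA"
print(verify({"pageA": 0.55, "pageB": 0.45}))  # uncertain: "unknown"
```

Rejecting low-confidence guesses is what cuts the FP count; it cannot, however, raise the prior, so the BDR stays low for rarely visited pages.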

SLIDE 41

Cost for the adversary

  • Adversary’s cost will depend on:

○ Number of pages

SLIDE 42

Versions of a page: St Valentine’s doodle

[Histogram: total trace size (KBytes, 700-1100) on 13 Feb 2013 vs. 14 Feb 2013.]

SLIDE 43

Cost for the adversary

  • Adversary’s cost will depend on:

○ Number of pages
○ Number of targets

SLIDE 44

Non-targeted attacks

[Diagram: many users behind an ISP router connecting through Tor to the web.]

SLIDE 45

Cost for the adversary

  • Adversary’s cost will depend on:

○ Number of pages
○ Number of targets
○ Training and testing complexities

SLIDE 46

Cost for the adversary

  • Adversary's cost will depend on:

○ Number of pages
○ Number of targets
○ Training and testing complexities

  • Maintaining a successful WF system is costly
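As a rough illustration of the cost factors listed above, one might model the adversary's total cost as data collection plus training time. Every constant and functional form below is a hypothetical placeholder, not the paper's cost model.

```python
# Toy cost model for maintaining a WF system: collection cost grows with
# the number of pages and samples (and must be repeated as data goes
# stale), and training cost grows with the training-set size. All
# constants here are hypothetical placeholders.

def collection_cost(pages, samples_per_page, secs_per_visit=30):
    return pages * samples_per_page * secs_per_visit      # seconds

def training_cost(pages, samples_per_page, secs_per_pair=1e-3):
    n = pages * samples_per_page
    return n * n * secs_per_pair   # assuming an O(n^2) distance matrix

pages, samples = 100, 40
total = collection_cost(pages, samples) + training_cost(pages, samples)
# Data staleness (accuracy under 50% after 9 days) forces periodic
# re-collection, multiplying the collection term over time.
```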

SLIDE 47

Limitations

  • We took samples, which may not be representative of all possible practical scenarios
  • Variables that are difficult to control:

○ Time gap
○ Tor circuit

SLIDE 48

Conclusions

  • WF attack fails in realistic conditions
  • We do not completely dismiss the attack
  • Attack can be enhanced at a greater cost
  • Defenses might be cheaper in practice

SLIDE 49

Questions?

Thank you for your attention.