Website Fingerprinting, Claudia Diaz, KU Leuven



slide-1
SLIDE 1

Website Fingerprinting

Claudia Diaz KU Leuven – COSIC

(With thanks to Marc Juarez and Bekah Overdorf)

Summer School on real-world crypto and privacy June 2017

slide-2
SLIDE 2

Outline

  • Website Fingerprinting for HTTPS sites
  • Website Fingerprinting for Tor
  • From the lab to reality: reviewing assumptions
  • Fingerprintability of hidden services
slide-3
SLIDE 3

HTTPS

slide-4
SLIDE 4

[Figure: HTTPS traffic, train and test phases]

slide-5
SLIDE 5

Side channel leaks in web applications (Chen et al, 2010)

  • Interactive pages that are responsive to user actions such as choices in drop-down menus, mouse clicks, typing
  • Examples: healthcare diagnosis, taxation, web search (auto-complete)
  • Characteristics:

– Stateful communication: transitions to next states depend both on the current state and on its input
– Low entropy input: small input space
– Uniqueness of traffic: disparate sizes and patterns for each possibility
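
The three characteristics above can be illustrated with a minimal sketch. It assumes a hypothetical stateful application in which every user input produces an encrypted response of a distinct size; the `APP` table, its states, and all byte counts are made up for illustration and do not come from Chen et al.

```python
# Hypothetical application model: state -> {input: (response size in bytes, next state)}.
# Every value here is invented for illustration.
APP = {
    "start": {"a": (512, "a"), "b": (730, "b")},
    "a": {"n": (410, "an"), "s": (655, "as")},
    "b": {"e": (598, "be"), "y": (333, "by")},
}


def infer_inputs(observed_sizes, app=APP, state="start"):
    """Recover the input sequence from observed response sizes alone."""
    inputs = []
    for size in observed_sizes:
        # Low-entropy input: only a handful of candidates per state.
        matches = [
            (inp, nxt)
            for inp, (sz, nxt) in app.get(state, {}).items()
            if sz == size
        ]
        if len(matches) != 1:
            return None  # unknown or ambiguous size: no unique match
        inp, state = matches[0]  # stateful: the next candidates depend on this state
        inputs.append(inp)
    return "".join(inputs)


print(infer_inputs([730, 333]))  # the eavesdropper sees only sizes
```

The eavesdropper never decrypts anything: the small input space and the distinct sizes per state are enough to walk the state machine.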


slide-6
SLIDE 6

“I know why you went to the clinic” (Miller et al, 2014)

  • Hidden Markov Models used to leverage link structure in websites
  • Impact of caching and cookies was 17% (train with one option, test with the other)
slide-7
SLIDE 7

Tor

[Figure: Tor circuit through Guard, Middle, and Exit relays; public (onion) keys are downloaded from the directory servers]

slide-8
SLIDE 8

Tor

Tor Web

slide-9
SLIDE 9

Website Fingerprinting

Tor Web

slide-10
SLIDE 10

Website Fingerprinting

Tor Web

slide-11
SLIDE 11

Website Fingerprinting

Tor Web

slide-12
SLIDE 12

Website Fingerprinting

Tor Web

Open world

slide-13
SLIDE 13

Tor Hidden (“Onion”) Services (HS)

[Figure: Client, Introduction Point (IP), Rendezvous Point (RP), HSDir, and xyz.onion, with the HS-IP, HS-RP, and Client-RP circuits]

  • HS-RP circuits are distinguishable from normal circuits (Kwon et al, 2015)
  • Size of the HS world is estimated at a few thousand (closed world!)

slide-14
SLIDE 14

State of the art attacks

  • kNN
  • CUMUL
  • k-Fingerprinting
slide-15
SLIDE 15

kNN classifier (Wang et al, 2014)

  • Features

– 3,000 features
– total size, total time, number of packets, packet ordering
– the lengths of the first 20 packets
– traffic bursts (sequences of packets in the same direction)

  • Classification

– k-NN
– Tune the weights of the distance metric to minimize the distance among instances that belong to the same site.

  • Results

– 90% - 95% accuracy on a closed world of 100 non-onion service websites.
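
A minimal sketch of this pipeline follows, under stated assumptions: a trace is a list of signed packet lengths (positive for outgoing, negative for incoming), only a small subset of the feature families is extracted, the distance is a weighted L1 distance, and the weights are supplied by the caller. The weight-tuning step is not shown, and all function names are illustrative rather than taken from Wang et al.

```python
from collections import Counter
from itertools import groupby


def bursts(trace):
    """Lengths of runs of consecutive packets travelling in the same direction."""
    return [len(list(run)) for _, run in groupby(trace, key=lambda p: p > 0)]


def features(trace, n_first=20):
    """Small illustrative feature vector: totals, burst statistics, first packet lengths."""
    first = [abs(p) for p in trace[:n_first]]
    first += [0] * (n_first - len(first))  # pad short traces with zeros
    b = bursts(trace)
    totals = [sum(abs(p) for p in trace), len(trace), len(b), max(b, default=0)]
    return totals + first


def weighted_l1(a, b, weights):
    """Weighted L1 distance; a weight of 0 makes that feature irrelevant."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def knn_predict(train, query, weights, k=3):
    """train is a list of (feature_vector, label); returns the majority label."""
    nearest = sorted(train, key=lambda item: weighted_l1(item[0], query, weights))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Features whose weight is driven towards zero stop influencing the neighbour search, which is how tuning the metric can pull instances of the same site closer together.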

slide-16
SLIDE 16

kNN

slide-17
SLIDE 17

CUMUL (Panchenko et al, 2016)

  • Features

– a 104-coordinate vector formed by the number of bytes and packets in each direction and 100 interpolation points of the cumulative sum of packet lengths (with direction)

  • Classification

– Radial Basis Function (RBF) kernel SVM

  • Results

– 90% - 93% for 100 non-HS sites
– Open world of 9,000 pages
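
The 104-coordinate vector can be sketched as below. This is an illustration, not the authors' code: it assumes a trace is a list of non-zero signed packet lengths (positive for outgoing, negative for incoming) and that the 100 points are sampled at equal steps along the cumulative absolute size. The SVM step is not shown.

```python
from itertools import accumulate


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation of the curve (xs, ys) at x; xs must be increasing."""
    for i in range(1, len(xs)):
        if x <= xs[i]:
            x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[-1]  # beyond the last point (or empty trace): clamp to the final value


def cumul_features(trace, n_points=100):
    """Return [in_bytes, out_bytes, in_pkts, out_pkts] followed by n_points samples."""
    out_bytes = sum(p for p in trace if p > 0)
    in_bytes = sum(-p for p in trace if p < 0)
    out_pkts = sum(1 for p in trace if p > 0)
    in_pkts = sum(1 for p in trace if p < 0)
    # x axis: cumulative absolute size; y axis: cumulative signed size.
    xs = [0] + list(accumulate(abs(p) for p in trace))
    ys = [0] + list(accumulate(trace))
    step = xs[-1] / n_points
    samples = [interpolate(xs, ys, step * (i + 1)) for i in range(n_points)]
    return [in_bytes, out_bytes, in_pkts, out_pkts] + samples


vector = cumul_features([100, -300, -100, 200])
print(len(vector))  # 4 totals plus 100 interpolation points
```

Sampling a fixed number of points gives every trace the same dimensionality regardless of its length, which is what a kernel SVM needs as input.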

slide-18
SLIDE 18

SVM

slide-19
SLIDE 19

k-Fingerprinting (Hayes et al, 2016)

  • Features

– 175 features
– Timing and size features such as #packets/second

  • Classification

– Random Forest (RF) + k-NN

  • Results

– 90% accuracy on 30 onion services
– Open world of 100,000 pages

slide-20
SLIDE 20

Random Forest

  • Train decision trees with web traffic features
  • Training set is randomized per tree
  • Random Forest is an ensemble of decision trees
  • Use Random Forest output as the fingerprint of a website download
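
The k-NN step on top of the forest output can be sketched as follows. The sketch assumes the fingerprint of a download is the tuple of leaf indices it reaches, one per tree, that fingerprints are compared by Hamming distance, and that a site label is returned only when all k neighbours agree. Training the forest is not shown, and the names and the "unmonitored" fallback label are illustrative.

```python
def hamming(a, b):
    """Number of trees in which two downloads end up in different leaves."""
    return sum(x != y for x, y in zip(a, b))


def kfp_classify(train, query_fp, k=3):
    """train is a list of (fingerprint, label) pairs built from the trained forest."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query_fp))[:k]
    labels = {label for _, label in nearest}
    # Require unanimity among the k neighbours to keep false positives down.
    return labels.pop() if len(labels) == 1 else "unmonitored"


train = [
    ((1, 4, 2), "siteA"),
    ((1, 4, 3), "siteA"),
    ((1, 5, 2), "siteA"),
    ((7, 8, 9), "siteB"),
]
print(kfp_classify(train, (1, 4, 2)))
```

Raising k makes the unanimity requirement stricter, trading true positives for fewer false positives in the open world.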

slide-21
SLIDE 21

Why Do We Care?

  • Tor is the most advanced anonymity network
  • WF allows an adversary to discover the browsing history
  • Can be deployed by a low-resource adversary (that Tor aims to protect against)
  • Series of successful attacks in the lab
  • … how concerned should we be about these attacks in practice?

– Critical review of WF attacks (Juarez et al, 2014)

slide-22
SLIDE 22

Assumptions

User Tor Web Adversary

Client settings: e.g., browser version, single tab browsing

slide-23
SLIDE 23

Effect of multi-tab browsing

  • FF users use 2 or 3 tabs on average
  • Experiment with 2 tabs and time gaps of 0.5s, 3s, 5s
  • Success: detection of either page
slide-24
SLIDE 24

Experiments: multi-tab

Accuracy for different time gaps

Control: 77.08%
Test (0.5s): 9.8%
Test (3s): 7.9%
Test (5s): 8.23%

[Figure: bandwidth (BW) over time for Tab 1 and Tab 2]

slide-25
SLIDE 25

Experiments: TBB version

  • TBB: Tor Browser Bundle
  • Several versions coexist at any given time

Control (3.5.2.1) Test (2.4.7) Test (3.5) 79.58% 66.75% 6.51%

slide-26
SLIDE 26

Assumptions

User Tor Web Adversary

Adversary: e.g., replicability

slide-27
SLIDE 27

Experiments: network conditions

[Figure: VMs in New York, Leuven, and Singapore; KU Leuven and DigitalOcean (virtual private servers)]

slide-28
SLIDE 28

Experiments: network conditions

[Figure: VMs in New York, Leuven, and Singapore]

Control (LVN): 66.95%
Test (NY): 8.83%

slide-29
SLIDE 29

Experiments: network conditions

[Figure: VMs in New York, Leuven, and Singapore]

Control (LVN): 66.95%
Test (SI): 9.33%

slide-30
SLIDE 30

Experiments: network conditions

[Figure: VMs in New York, Leuven, and Singapore]

Test (NY): 76.40%
Control (SI): 68.53%

slide-31
SLIDE 31

Assumptions

User Tor Web Adversary

Web: e.g., staleness

slide-32
SLIDE 32

Data staleness

[Figure: accuracy (%) versus time (days)]

Less than 50% after 9 days.

slide-33
SLIDE 33

Effect of false negatives: Base rate fallacy

  • Breathalyzer test:

– Identifies 0.88 of truly drunk drivers (true positives)
– 0.05 false positives

  • Alice tests positive

– What is the probability that she is indeed drunk? (BDR)
– Is it 0.95? Is it 0.88? Something in between?

Only 0.1!

slide-34
SLIDE 34

The base rate fallacy: example

  • Circumference represents the world of drivers.
  • Each dot represents a driver.

slide-35
SLIDE 35

The base rate fallacy: example

  • 1% of drivers are driving drunk (base rate or prior).

slide-36
SLIDE 36

The base rate fallacy: example

  • Of the drunk people, 88% are identified as drunk by the test

slide-37
SLIDE 37

The base rate fallacy: example

  • Of the people who are not drunk, 5% are erroneously identified as drunk

slide-38
SLIDE 38
The base rate fallacy: example

  • Alice must be within the black circumference
  • Ratio of red dots within the black circumference: BDR = 7/70 = 0.1 !
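
The Bayesian detection rate (BDR) in the breathalyzer example follows from Bayes' rule. The sketch below uses the rates quoted on the slides (0.88 true positives, 0.05 false positives, 1% base rate); the function name is illustrative. With these exact figures the rule gives roughly 0.15, in line with the 7/70 = 0.1 read off the dot diagram.

```python
def bayesian_detection_rate(tpr, fpr, base_rate):
    """P(actually positive | test says positive), by Bayes' rule."""
    true_pos = tpr * base_rate            # drunk and flagged by the test
    false_pos = fpr * (1.0 - base_rate)   # sober but flagged by the test
    return true_pos / (true_pos + false_pos)


# Breathalyzer example: 88% true positives, 5% false positives, 1% drunk drivers.
print(round(bayesian_detection_rate(0.88, 0.05, 0.01), 3))
```

Because sober drivers outnumber drunk ones by 99 to 1, even a 5% false positive rate produces far more false alarms than true detections.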

slide-39
SLIDE 39
The base rate fallacy in WF

  • Base rate must be taken into account
  • In WF:

– Blue: webpages
– Red: monitored
– Base rate?

slide-40
SLIDE 40

Experiment: BDR in a 35K world

  • World of 35K sites
  • 4 target pages
  • Uniform prior
  • For 30K sites BDR is 0.4%
slide-41
SLIDE 41

Disparate impact

  • WF attacks normally report average success
  • But…

– Are certain websites more susceptible to website fingerprinting attacks than others?
– What makes some sites more vulnerable to the attack than others?

slide-42
SLIDE 42

Misclassifications of onion services: Sites that are “safe”

slide-43
SLIDE 43

Misclassifications: Sites that are “safe”

Some sites are hidden from all methods!

slide-44
SLIDE 44

Median of total incoming packet size for misclassified instances

[Figure: True Site − Median versus Predicted Site − Median]

slide-45
SLIDE 45

Site-level Feature Analysis

  • Trace features are not always helpful
  • Can we determine what characteristics of a website affect its fingerprintability?
  • Site-Level Features:

– Total HTTP download size
– HTTP duration
– screenshot size
– number of scripts
– …

slide-46
SLIDE 46

Site Level Feature Analysis

slide-47
SLIDE 47

WF countermeasures

  • Network layer

– Add padding
    • Constant rate is unreasonable
    • Leakage: how to optimize padding?
– Add latency to disrupt the traffic pattern
    • Bad idea

  • Page design

– Small size
– Dynamism

slide-48
SLIDE 48

To conclude

  • WF can be deployed by adversaries with only local access to the communications network
  • WF seriously undermines the protection offered by HTTPS
  • WF threatens the anonymity properties of Tor

– Though it’s unclear to what extent lab results would hold in the wild
– The attack is costly in terms of resources

  • Disparate impact: some pages are more fingerprintable than others, which is not captured if you only look at average results
  • Countermeasures involve additional traffic and/or dynamism