Website Fingerprinting Attacks and Defenses in the Tor Onion Space - - PowerPoint PPT Presentation

SLIDE 1

Website Fingerprinting Attacks and Defenses in the Tor Onion Space

Marc Juarez

imec-COSIC KU Leuven

COSIC Seminar - 23rd October 2017, Leuven

SLIDE 2

Introduction

  • Contents of this presentation:
    – PETS’17: “Website Fingerprinting Defenses at the Application Layer”
    – CCS’17: “How Unique is Your Onion?”

SLIDE 3

What is Website Fingerprinting (WF)?

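[Figure: the User, the Tor network (Entry, Middle and Exit relays), the WWW, and the Adversary]

  • The adversary observes the encrypted traffic between the user and the Tor entry relay and tries to identify the website the user visits from traffic patterns such as packet sizes, directions and timing.
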
SLIDE 4

Website Fingerprinting: deployment

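[Figure: WF deployment, Training and Testing phases]

  • Training: the adversary collects traffic traces of the sites of interest over Tor and trains a classifier on them.
  • Testing: the adversary feeds the victim's traffic traces to the classifier to identify the visited site.
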
SLIDE 5

Why Do We Care?

  • Tor is the most popular anonymity network and aims to protect against such adversaries.
  • Series of successful attacks with accuracies greater than 90%
  • … how concerned should we be in practice?
    – Critical review of WF attacks (Juarez et al., 2014)

SLIDE 6

Closed vs Open World

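[Figure: Closed world vs. Open world]

  • Closed world: the user only visits sites the adversary has trained on.
  • Open world: the user may also visit sites the adversary has not trained on.
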
SLIDE 7

Tor Hidden Services (HS)

[Figure: a User connecting to xyz.onion]

  • HS: the user visits xyz.onion without resolving it to an IP address
  • Examples: WikiLeaks, GlobaLeaks, Facebook, ...

SLIDE 8

Website Fingerprinting on Hidden Services (HSes)

  • A WF adversary can distinguish HSes from regular sites
  • Website fingerprinting on HSes is more threatening:
    – Fewer sites make HSes more identifiable (close to a closed world)
    – HS users are more vulnerable because the content is sensitive

SLIDE 9

The SecureDrop case

  • Freedom of the Press Foundation
  • Whistleblowing platform
  • Vulnerable to website fingerprinting (?)

SLIDE 10

Website Fingerprinting Defenses at the Application Layer

Giovanni Cherubin¹, Jamie Hayes², Marc Juarez³

¹Royal Holloway University of London, ²University College London, ³imec-COSIC KU Leuven

Presented at PETS 2017, Minneapolis, MN, USA

SLIDE 11

Website Fingerprinting defenses

[Figure: the User's traffic through the Tor network (Entry, Middle) mixes Real and Dummy packets; these are TCP packets or Tor messages]

  • WF defenses: BuFLO, Tamaraw, CS-BuFLO, WTF-PAD, …

SLIDE 12
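Application-layer Defenses

  • Existing defenses are designed at the network layer
  • Key observation: the identifying information originates at the application layer!

[Figure: protocol stack (Web content, HTTP(S), TLS, Tor, TCP, ...). The identifying info originates in the web content as ‘latent’ features F1, …, Fn; after the last layer of encryption, the Adversary only sees observed features O1, …, On, obtained from the latent ones through a transformation T(·)]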
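
As a rough illustration of T(·), the sketch below maps latent object sizes to the bytes of Tor cells an on-path observer could count. It assumes each object travels in its own stream of RELAY_DATA cells carrying at most 498 bytes of payload, with 514-byte cells on the wire (Tor link protocol v4 and later), and it ignores HTTP headers, TLS records and compression. `cells_for_object` and `observed_bytes` are hypothetical helpers, not part of any Tor API.

```python
from typing import Iterable

# Assumed constants (Tor link protocol v4+): 514-byte cells, of which at most
# 498 bytes are application data in a RELAY_DATA cell.
CELL_SIZE = 514
RELAY_DATA_PAYLOAD = 498


def cells_for_object(size: int) -> int:
    """Number of RELAY_DATA cells needed to carry `size` bytes (ceiling division)."""
    return -(-size // RELAY_DATA_PAYLOAD)


def observed_bytes(object_sizes: Iterable[int]) -> int:
    """Toy T(·): latent object sizes -> bytes of cells seen by the adversary.

    Hypothetical model: one stream per object, no headers, TLS or compression.
    """
    return sum(cells_for_object(s) for s in object_sizes) * CELL_SIZE


# Growing an object from 400 to 498 bytes is invisible at cell granularity,
# while growing it to 499 bytes adds a whole cell.
print(observed_bytes([400]), observed_bytes([498]), observed_bytes([499]))  # 514 514 1028
```

Under this toy model, app-layer padding only changes what the adversary sees when it crosses a cell boundary.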

SLIDE 13

Pros and Cons of app-layer Defenses

Pros: the main advantage is that they are easier to implement:

  • they do not depend on Tor to be implemented

Cons:

  • padding runs end-to-end
  • they may require server collaboration… but HSes have incentives!

SLIDE 14

  • ALPaCA:
    – Server-side (first one)
    – Applied on hosted content
    – More bandwidth overhead
  • LLaMA:
    – Client-side (FF add-on)
    – Applied on website requests
    – More latency overhead

(Two different solutions, not a client-server solution.)

SLIDE 15

ALPaCA

  • Abstracts web pages as their number of objects and object sizes, and pads these to match a target page
  • Padding does not impact user experience: e.g., comments in HTML/JS, image metadata, “display: none” styles

[Figure: Original, Morphed and Target pages]
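
A minimal sketch of one padding mechanism listed above: appending an HTML comment so that a page reaches an exact target size without changing how it renders. The helper name `pad_html` and the choice of `x` as filler are illustrative assumptions; the slide does not describe how ALPaCA actually encodes its padding.

```python
def pad_html(html: str, target_size: int, encoding: str = "utf-8") -> str:
    """Return `html` padded with an HTML comment to exactly `target_size` bytes.

    Hypothetical helper: browsers ignore comments, so the page renders the same.
    Padding can only add bytes; shrinking a page is impossible.
    """
    gap = target_size - len(html.encode(encoding))
    if gap == 0:
        return html
    overhead = len("<!--") + len("-->")  # 7 bytes of comment delimiters
    if gap < overhead:
        raise ValueError(
            f"cannot pad by {gap} bytes with a comment (need 0 or at least {overhead})"
        )
    # 'x' can never form the '-->' terminator, so the comment cannot end early.
    return html + "<!--" + "x" * (gap - overhead) + "-->"


page = "<html><body>Hi</body></html>"
padded = pad_html(page, 100)
print(len(padded.encode("utf-8")))  # 100
```

The other mechanisms on the slide (JS comments, image metadata, elements hidden with “display: none”) follow the same idea, each with its own minimum overhead.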

SLIDE 16

ALPaCA strategies (1)

Example: protect a SecureDrop page

  • Strategy 1: the target page is Facebook

[Figure: the securedrop page (index.html, securedrop.png) receives padding, including a fake.css object, to match the facebook page (index.html, facebook.png, style.css)]
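
Strategy 1 can be sketched as a matching problem: each original object is padded up to some target object, and any target objects left over become fake objects (like fake.css). The greedy pairing below, which matches the largest original object with the largest target object, is an assumption for illustration; it ignores content types (a real implementation would also need to pad HTML with HTML, images with images, and so on), and the sizes in the example are made up.

```python
from typing import List, Optional, Tuple

Pairs = List[Tuple[int, int]]


def morph_to_target(original: List[int], target: List[int]) -> Optional[Tuple[Pairs, List[int]]]:
    """Match original object sizes to target object sizes by padding only.

    Returns (pairs, fake_objects): each pair is (original_size, padded_size) and
    fake_objects are target sizes with no original counterpart. Returns None when
    the target is unreachable, i.e. some object would have to be removed or shrunk.
    """
    if len(original) > len(target):
        return None  # padding can add objects, never remove them
    orig_sorted = sorted(original, reverse=True)
    tgt_sorted = sorted(target, reverse=True)
    pairs: Pairs = []
    for size, tgt in zip(orig_sorted, tgt_sorted):
        if size > tgt:
            return None  # if largest-to-largest fails, every pairing fails
        pairs.append((size, tgt))
    return pairs, tgt_sorted[len(orig_sorted):]


# Hypothetical sizes in bytes: SecureDrop page -> Facebook page.
securedrop = [3_000, 20_000]       # index.html, securedrop.png
facebook = [5_000, 40_000, 1_500]  # index.html, facebook.png, style.css
print(morph_to_target(securedrop, facebook))
# ([(20000, 40000), (3000, 5000)], [1500])
```

Whatever the pairing, the total padding equals the size difference between the two pages; the matching only decides whether the target is reachable and where the padding goes.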

SLIDE 17

ALPaCA strategies (2)

  • Strategy 2: pad to an “anonymity set” target page

[Figure: the securedrop and facebook pages are both padded to a common target page]

The target page's number of objects and object sizes are defined in one of two ways:

  • Deterministic: the next multiple of λ (number of objects) and of δ (object sizes)
  • Probabilistic: sampled from an empirical distribution
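
Both ways of defining the target can be sketched in a few lines. In the deterministic version we assume λ pads the number of objects and δ pads each object's size, and that fake objects get the smallest allowed size δ; in the probabilistic version we sample from empirical samples, keeping only values at least as large as what the page already has. Function names, parameters and sample data are hypothetical.

```python
import random
from typing import List, Sequence


def next_multiple(x: int, m: int) -> int:
    """Smallest multiple of m that is >= x (integer ceiling, no floats)."""
    return -(-x // m) * m


def deterministic_target(sizes: Sequence[int], lam: int, delta: int) -> List[int]:
    """Pad the object count to a multiple of lam and each size to a multiple of delta."""
    padded = [next_multiple(s, delta) for s in sizes]
    n_fake = next_multiple(len(sizes), lam) - len(sizes)
    return padded + [delta] * n_fake  # assumption: fake objects are delta bytes


def probabilistic_target(sizes: Sequence[int], count_samples: Sequence[int],
                         size_samples: Sequence[int], rng: random.Random) -> List[int]:
    """Sample a target from empirical distributions, conditioned on padding only."""
    counts = [c for c in count_samples if c >= len(sizes)]
    if not counts:
        raise ValueError("no sampled object count is large enough")
    padded = []
    for s in sizes:
        candidates = [t for t in size_samples if t >= s]
        if not candidates:
            raise ValueError(f"no sampled size is >= {s}")
        padded.append(rng.choice(candidates))
    n_target = rng.choice(counts)
    padded += [rng.choice(size_samples) for _ in range(n_target - len(sizes))]
    return padded


print(deterministic_target([1000, 2500, 4096], lam=5, delta=1024))
# [1024, 3072, 4096, 1024, 1024]
```

Intuitively, the deterministic variant makes many pages collapse onto the same few shapes, while the probabilistic one makes padded pages look like draws from the observed population.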

SLIDE 18
Evaluation: methodology

  • Collect traces with and without the defense: 100 HSes (cached)
  • Security: accuracy of the attacks k-NN, k-Fingerprinting (k-FP) and CUMUL
  • Performance: overheads
    – latency (extra delay)
    – bandwidth (extra padding/time)
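
The overheads can be expressed as relative increases over the undefended baseline. This is a minimal sketch assuming overhead is computed on mean page-load times (latency) and mean bytes transferred (bandwidth); the paper may aggregate differently, e.g., per site. The measurements in the example are invented.

```python
from statistics import fmean
from typing import Sequence


def relative_overhead(defended: Sequence[float], undefended: Sequence[float]) -> float:
    """(mean cost with defense - mean cost without) / mean cost without."""
    baseline = fmean(undefended)
    return (fmean(defended) - baseline) / baseline


# Hypothetical measurements for one site.
load_times_s = {"without": [4.0, 6.0], "with": [6.0, 9.0]}
bytes_in = {"without": [100_000, 100_000], "with": [185_000, 185_000]}

print(relative_overhead(load_times_s["with"], load_times_s["without"]))  # 0.5
print(relative_overhead(bytes_in["with"], bytes_in["without"]))  # 0.85
```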

SLIDE 19

ALPaCA: results

  • Accuracy decreases by 40% to 60%
  • 50% latency and 85% bandwidth overheads

SLIDE 20

SLIDE 21

How Unique is Your Onion? An Analysis of the Fingerprintability of Tor Onion Services

Rebekah Overdorf¹, Marc Juarez², Gunes Acar², Rachel Greenstadt¹, Claudia Diaz²

¹Drexel University, ²imec-COSIC KU Leuven

To be presented at CCS 2017, Dallas, TX, USA

SLIDE 22

Disparate impact

  • WF attacks normally report average success
  • But…
    – Are certain websites more susceptible to website fingerprinting attacks than others?
    – What makes some sites more vulnerable to the attack than others?

Credit: Claudia Diaz
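
The disparate-impact question can be made concrete by breaking the average down per site. This sketch assumes the classifier's output is a list of (true site, predicted site) pairs and uses per-site accuracy for simplicity; the site names and results are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, Tuple


def per_site_accuracy(results: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of each site's test instances that were classified correctly."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for true_site, predicted in results:
        total[true_site] += 1
        correct[true_site] += int(predicted == true_site)
    return {site: correct[site] / total[site] for site in total}


# Hypothetical results: site "a" is always recognized, site "b" rarely.
results = [("a", "a")] * 4 + [("b", "b")] + [("b", "a")] * 3
acc = per_site_accuracy(results)
print(acc)                  # {'a': 1.0, 'b': 0.25}
print(fmean(acc.values()))  # 0.625
```

An average of 62.5% hides that one site is always identified while the other mostly is not.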

SLIDE 23

State-of-the-Art Attacks

  • k-NN (Wang et al., 2015)
  • CUMUL (Panchenko et al., 2016)
  • k-Fingerprinting (Hayes and Danezis, 2016)

SLIDE 24

k-NN (Wang et al. 2015)

  • Features
    – number of outgoing packets in spans of 30 packets
    – the lengths of the first 20 packets
    – traffic bursts (sequences of packets in the same direction)

  • Classification
    – k-NN
    – the weights of the distance metric are tuned to minimize the distance among instances of the same site

  • Results
    – 90% to 95% accuracy on a closed world of 100 non-hidden-service websites

Credit: Bekah Overdorf
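
A minimal sketch of the three feature families and a weighted-distance k-NN vote. Assumptions: a trace is a list of signed packet sizes (positive = outgoing, negative = incoming); variable-length features are truncated or zero-padded to fixed lengths (`n_spans` and `n_bursts` are hypothetical parameters); and the weight-learning step is omitted, so weights default to uniform. This is not Wang et al.'s implementation.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def knn_features(trace: Sequence[int], span: int = 30, n_spans: int = 10,
                 first_n: int = 20, n_bursts: int = 10) -> List[int]:
    """Fixed-length vector: span counts, first packet lengths, burst lengths."""
    feats: List[int] = []
    # Number of outgoing packets in each of the first n_spans spans of `span` packets.
    for i in range(n_spans):
        chunk = trace[i * span:(i + 1) * span]
        feats.append(sum(1 for p in chunk if p > 0))
    # Signed lengths of the first `first_n` packets, zero-padded.
    head = list(trace[:first_n])
    feats.extend(head + [0] * (first_n - len(head)))
    # Burst lengths: number of consecutive packets in the same direction.
    bursts: List[int] = []
    prev_out: Optional[bool] = None
    for p in trace:
        out = p > 0
        if out == prev_out:
            bursts[-1] += 1
        else:
            bursts.append(1)
            prev_out = out
    bursts = bursts[:n_bursts]
    feats.extend(bursts + [0] * (n_bursts - len(bursts)))
    return feats


def weighted_l1(a: Sequence[float], b: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted L1 distance between two feature vectors."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def knn_predict(query: Sequence[float], train: List[Tuple[Sequence[float], str]],
                k: int = 3, weights: Optional[Sequence[float]] = None) -> str:
    """Majority vote among the k training instances closest to `query`."""
    if weights is None:
        weights = [1.0] * len(query)
    nearest = sorted(train, key=lambda item: weighted_l1(query, item[0], weights))[:k]
    return Counter(site for _, site in nearest).most_common(1)[0][0]


f = knn_features([100, -500, -500, 100, -500])
print(len(f), f[0], f[30:34])  # 40 2 [1, 2, 1, 1]
```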

SLIDE 25

CUMUL (Panchenko et al. 2016)

  • Features
    – 100 interpolation points of the cumulative sum of packet lengths (with direction)

  • Classification
    – Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel

  • Results
    – 90% to 93% accuracy on 100 non-HS sites

Credit: Bekah Overdorf
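
A minimal sketch of the CUMUL representation, assuming a trace of signed packet sizes (positive = outgoing). Following our reading of Panchenko et al., the cumulative signed size is plotted against the cumulative absolute size and sampled at 100 equidistant points by linear interpolation; the original feature vector also includes a few aggregate counts, and the RBF-SVM step is not shown. `cumul_features` is a hypothetical name.

```python
import bisect
from typing import List, Sequence


def cumul_features(trace: Sequence[int], n_points: int = 100) -> List[float]:
    """Sample the cumulative signed-size curve at n_points equidistant x-positions."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    xs, ys = [0.0], [0.0]  # x: cumulative |size|, y: cumulative signed size
    for p in trace:
        xs.append(xs[-1] + abs(p))
        ys.append(ys[-1] + p)
    total = xs[-1]
    if total == 0:
        return [0.0] * n_points
    feats: List[float] = []
    for k in range(n_points):
        x = total * k / (n_points - 1)
        i = bisect.bisect_left(xs, x)  # first breakpoint with xs[i] >= x
        if i == 0:
            feats.append(ys[0])
            continue
        i = min(i, len(xs) - 1)  # guard against float round-off past the end
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        t = (x - x0) / (x1 - x0) if x1 > x0 else 0.0
        feats.append(y0 + t * (y1 - y0))
    return feats


print(cumul_features([100, -200, 300], n_points=3))  # [0.0, -100.0, 200.0]
```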

SLIDE 26

k-Fingerprinting (Hayes and Danezis 2016)

  • Features
    – Timing and size features from the literature

  • Classification
    – Random Forest (RF) + k-NN

  • Results
    – 90% accuracy on 30 hidden services

Credit: Bekah Overdorf
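
In k-Fingerprinting, the random forest turns each trace into a fingerprint: the vector of leaf indices it reaches in every tree (with scikit-learn, for instance, `RandomForestClassifier.apply` returns these). A k-NN then compares fingerprints by Hamming distance. The sketch below assumes the leaf vectors are already available and applies the rule that all k neighbours must agree, otherwise the trace is left unlabelled, which is our understanding of how k-FP handles the open world; helper names and data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of trees in which two traces land in different leaves."""
    return sum(x != y for x, y in zip(a, b))


def kfp_predict(query: Sequence[int], train: List[Tuple[Sequence[int], str]],
                k: int = 3) -> Optional[str]:
    """Return a site only if all k nearest fingerprints agree; otherwise None."""
    nearest = sorted(train, key=lambda item: hamming(query, item[0]))[:k]
    labels = {site for _, site in nearest}
    return labels.pop() if len(labels) == 1 else None


# Hypothetical leaf vectors from a 3-tree forest.
train = [([1, 2, 3], "a"), ([1, 2, 4], "a"), ([1, 5, 3], "a"), ([7, 8, 9], "b")]
print(kfp_predict([1, 2, 3], train, k=3))  # a
print(kfp_predict([1, 2, 3], train, k=4))  # None
```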

SLIDE 27

Data

  • Crawled 790 sites over Tor (homepages)
  • Removed:
    – Offline sites
    – Failed visits
    – Duplicates
  • 482 sites met our criteria, with 70 visits each

Credit: Bekah Overdorf

SLIDE 28

Credit: Bekah Overdorf

SLIDE 29

SecureDrop sites

  • There was a SecureDrop site in our dataset:
    – Project On Government Oversight (POGO)
  • CUMUL achieved 99% on this site!
    – compared to 80% on average

SLIDE 30

Misclassifications of Hidden Services

Credit: Bekah Overdorf

SLIDE 31

Misclassifications of Hidden Services

Credit: Bekah Overdorf

SLIDE 32

Median of total incoming packet size for misclassified instances

Credit: Bekah Overdorf

SLIDE 33

Low-level Feature Analysis

  • Intra-class variance: how much instances of the same site differ from one another
    – Lower intra-class variance improves identification.
  • Inter-class variance: how much instances of different sites differ from one another
    – Higher inter-class variance improves identification.

Top features:

  1. Total size of all outgoing packets
  2. Total size of incoming packets
  3. Number of incoming packets
  4. Number of outgoing packets
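
The four top features are simple aggregates of a trace, and the intuition about intra- and inter-class variance can be captured by a Fisher-style ratio of the two. Assumptions: traces are lists of signed packet sizes (positive = outgoing), intra-class variance is averaged over sites, and inter-class variance is the variance of per-site means; the paper's exact feature-ranking method may differ. Names and data are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def top_features(trace: Sequence[int]) -> Dict[str, int]:
    """The four most informative low-level features listed on the slide."""
    outgoing = [p for p in trace if p > 0]
    incoming = [-p for p in trace if p < 0]
    return {
        "total_out_size": sum(outgoing),
        "total_in_size": sum(incoming),
        "num_in": len(incoming),
        "num_out": len(outgoing),
    }


def variance_ratio(values_by_site: Dict[str, List[float]]) -> float:
    """Inter-class variance / mean intra-class variance for one feature.

    Higher favours the attacker: sites differ a lot, visits to one site little.
    """
    intra = fmean(pvariance(v) for v in values_by_site.values())
    inter = pvariance([fmean(v) for v in values_by_site.values()])
    return inter / intra if intra > 0 else float("inf")


print(top_features([100, -500, -500, 100, -500]))
# {'total_out_size': 200, 'total_in_size': 1500, 'num_in': 3, 'num_out': 2}
print(variance_ratio({"a": [1, 3], "b": [5, 7]}))  # 4.0
```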

SLIDE 34

Site-level Feature Analysis

  • Can we determine which characteristics of a website affect its fingerprintability?
  • Site-level features:
    – Number of embedded resources
    – Number of fonts
    – Screenshot size
    – Use of a CMS?
    – …

SLIDE 35

Can we predict if a site will be fingerprintable?

  • “Meta-classifier”: a random forest regressor that predicts how fingerprintable a site is from its site-level features

SLIDE 36

Results: importance of site-level features

SLIDE 37
SLIDE 38

Takeaways

  • WF threatens Tor, especially its hidden services.
  • Disparate impact: some pages are more fingerprintable than others (there is a bias in reporting average results).
  • WF defenses that alter the website design (app layer) are easier to implement and as effective as network-layer defenses.
  • Changes to the page that protect against WF:
    – making it small (e.g., fewer resources) and dynamic.

SLIDE 39

Future work: re-design ALPaCA to follow these guidelines.

SLIDE 40
Software and Data

  • HSes have incentives to support server-side defenses:
    – SecureDrop has implemented a prototype of ALPaCA
  • ALPaCA is running on an HS: 3tmaadslguc72xc2.onion
  • Source code of the defenses: github.com/camelids
  • Source code and data for the fingerprintability analysis: cosic.esat.kuleuven.be/fingerprintability

SLIDE 41
The HS world

  • Exploratory crawl: 5,000 HSes (from Ahmia.fi)
  • Stats for the HS world (from intercepted HTTP headers):
    – Distribution of types, sizes and number of resources
  • Most HSes are small compared to an average website
  • Few HSes have any JS or 3rd-party content:
    – JS: less than 13% of HSes. Assumption: no JS
    – 3rd-party content: less than 20% of HSes. Assumption: no 3rd parties

SLIDE 42
Limitations and Future Work

  • ALPaCA can only make sites bigger, not smaller
  • What’s the optimal padding at the app layer? We lack a thorough feature analysis.
  • How do the distributions change over time? How do we update our defenses accordingly?
  • How does the strategy need to be adapted as HSes adopt our defense(s)?

SLIDE 43

LLaMA

  • Inspired by Randomized Pipelining
  • Goal: randomize HTTP requests
  • Same goal from a Firefox (FF) add-on:
    – Random delays (δ)
    – Repeat previous requests (C1)

[Figure: Client and Server exchanging requests C1 and C2, with a repeated request C1’ and a delay δ]
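
The two LLaMA mechanisms can be simulated offline as a schedule of requests: each real request is delayed by a random δ, and with some probability an earlier request is re-sent as cover. This is a simulation under assumed parameters (`max_delay`, `repeat_prob` and the uniform delay distribution are hypothetical), not the add-on's code, which intercepts requests inside Firefox.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ScheduledRequest:
    url: str
    send_time: float  # seconds since the page load started
    is_repeat: bool   # True for a cover copy of an earlier request


def llama_schedule(urls: Sequence[str], max_delay: float = 0.5, repeat_prob: float = 0.3,
                   rng: Optional[random.Random] = None) -> List[ScheduledRequest]:
    """Delay each request by a random δ and sometimes repeat an earlier one."""
    if rng is None:
        rng = random.Random()
    schedule: List[ScheduledRequest] = []
    t = 0.0
    for i, url in enumerate(urls):
        t += rng.uniform(0.0, max_delay)  # random delay δ before the real request
        schedule.append(ScheduledRequest(url, t, False))
        if i > 0 and rng.random() < repeat_prob:
            # Re-issue a previously sent request (like C1’) at the same time.
            schedule.append(ScheduledRequest(rng.choice(list(urls[:i])), t, True))
    return schedule


for req in llama_schedule(["/index.html", "/style.css", "/logo.png"], rng=random.Random(1)):
    print(f"{req.send_time:.3f}s {'repeat' if req.is_repeat else 'real  '} {req.url}")
```

The repeats add bandwidth while the delays add latency; both perturb the request sequence an adversary observes.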

SLIDE 44

LLaMA: results

  • Accuracy drops by 20% to 30%
  • Less than 10% latency and bandwidth overheads