  1. Website Fingerprinting Attacks and Defenses in the Tor Onion Space Marc Juarez imec-COSIC KU Leuven COSIC Seminar - 23rd October 2017, Leuven

  2. Introduction • Contents of this presentation: - PETS’17: “Website Fingerprinting Defenses at the Application Layer” - CCS’17: “How Unique is Your Onion?”

  3. What is Website Fingerprinting (WF)? [Diagram: User → Entry → Middle → Exit → WWW through the Tor network; the adversary observes the link between the user and the entry relay.]

  4. Website Fingerprinting: deployment [Diagram: the attack is deployed in two phases, training and testing.]

  5. Why Do We Care? • Tor is the most popular anonymity network and aims to protect against such adversaries. • There is a series of successful attacks with accuracies greater than 90%. • … but how concerned should we be in practice? – Critical review of WF attacks (Juarez et al., 2014)

  6. Closed vs Open World • Closed world: the user only visits sites the adversary has trained on. • Open world: the user may also visit sites outside the adversary’s training set.

  7. Tor Hidden Services (HS) • HS: the user visits xyz.onion without resolving it to an IP address • Examples: Wikileaks, GlobaLeaks, Facebook, ...

  8. Website Fingerprinting on Hidden Services (HSes) • A WF adversary can distinguish HSes from regular sites • Website Fingerprinting on HSes is more threatening: - Fewer sites make HSes more identifiable (~ closed world) - HS users are vulnerable because the content is sensitive

  9. The SecureDrop case • Freedom of the Press Foundation • Whistleblowing platform • Vulnerable to website fingerprinting (?)

  10. Website Fingerprinting Defenses at the Application Layer Giovanni Cherubin 1 Jamie Hayes 2 Marc Juarez 3 (1 Royal Holloway, University of London; 2 University College London; 3 imec-COSIC KU Leuven). Presented at PETS 2017, Minneapolis, MN, USA

  11. Website Fingerprinting defenses • Existing WF defenses: BuFLO, CS-BuFLO, Tamaraw, WTF-PAD, … [Diagram: the defense injects dummy packets alongside the real ones on the link between the user and the entry/middle relays; these are TCP packets or Tor messages.]

  12. Application-layer Defenses • Existing defenses are designed at the network layer. • Key observation: the identifying information originates at the application layer! [Diagram: web content defines ‘latent’ features F1, …, Fn at the HTTP(S) layer; the lower layers (Tor, TLS as the last layer of encryption, TCP) act as a transformation T(·), so the adversary only sees the observed features O1, …, On on the wire.]

  13. Pros and Cons of App-layer Defenses • The main advantage is that they are easier to implement: they do not depend on Tor. • Cons: padding runs end-to-end and may require server collaboration… but HSes have incentives to collaborate!

  14. LLaMA vs ALPaCA (two different solutions, not a client/server solution)
      LLaMA: client-side (FF add-on), applied on website requests, more latency overhead.
      ALPaCA: server-side (the first server-side defense), applied on hosted content, more bandwidth overhead.

  15. ALPaCA [Diagram: original page → target page → morphed page] • Abstract web pages as a number of objects and their sizes: pad them to match a target page. • Does not impact user experience: padding can be hidden in, e.g., HTML/JS comments, images’ metadata, “display: none” styles.
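
To make the abstraction concrete, here is a minimal sketch (illustrative only, not the actual ALPaCA code or API): a page is reduced to a list of object sizes in bytes, and a hypothetical helper morph_to_target decides how much each real object is padded and which dummy objects are added so the morphed page matches the target.

    # Minimal sketch of ALPaCA-style page morphing (illustrative; not the real
    # ALPaCA implementation). A page is abstracted as a list of object sizes in
    # bytes; we pad it so its object count and object sizes match a target page.

    def morph_to_target(page_sizes, target_sizes):
        # Pair real and target objects largest-first, since padding can only
        # add bytes: every real object must fit under its target object.
        page = sorted(page_sizes, reverse=True)
        target = sorted(target_sizes, reverse=True)
        if len(target) < len(page) or any(t < p for p, t in zip(page, target)):
            raise ValueError("target page is too small to cover the original page")
        padded = target[:len(page)]    # new (padded) size of each real object
        dummies = target[len(page):]   # remaining target objects become dummies
        return padded, dummies

    # Example: morph a 2-object page into a 3-object target page.
    padded, dummies = morph_to_target([1200, 5400], [2048, 8192, 4096])
    print(padded)   # [8192, 4096]
    print(dummies)  # [2048]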

  16. ALPaCA strategies (1) • Example: protect a SecureDrop page. • Strategy 1: the target page is an existing page (e.g., Facebook). [Diagram: the SecureDrop page (index.html, securedrop.png) is padded and a fake.css object is added so that it matches the Facebook page (index.html, facebook.png, style.css).]

  17. ALPaCA strategies (2) • Strategy 2: pad to an “anonymity set” target page. [Diagram: both the SecureDrop page (securedrop.png, index.html, fake.css) and the Facebook page (facebook.png, index.html, style.css) are padded towards a common target page.] • The target defines the number of objects and the object sizes, either: - Deterministic: rounded up to the next multiples of λ and δ - Probabilistic: sampled from an empirical distribution
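
A rough illustration of the two target-generation strategies follows; the helper and parameter names are assumptions, and the slide does not say which of λ and δ governs object counts versus object sizes, so the mapping below is only one possible reading.

    import math
    import random

    # Sketch of the two ALPaCA target-generation strategies (illustrative names
    # and parameters, not the published implementation).

    def deterministic_target(num_objects, object_sizes, lam=2, delta=512):
        # Round the object count up to the next multiple of lam and every
        # object size up to the next multiple of delta bytes.
        target_count = math.ceil(num_objects / lam) * lam
        target_sizes = [math.ceil(s / delta) * delta for s in object_sizes]
        # Newly added dummy objects get the minimum padded size.
        target_sizes += [delta] * (target_count - num_objects)
        return target_sizes

    def probabilistic_target(count_dist, size_dist):
        # Sample a target page from empirical distributions of object counts
        # and object sizes observed in a crawl.
        target_count = random.choice(count_dist)
        return [random.choice(size_dist) for _ in range(target_count)]

    # Toy usage.
    print(deterministic_target(3, [1200, 5400, 300]))           # [1536, 5632, 512, 512]
    print(probabilistic_target([2, 3, 4], [1024, 2048, 4096]))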

  18. Evaluation: methodology • Collect traces with and without the defense: 100 HSes (cached). • Security: accuracy of the kNN, k-Fingerprinting (kFP) and CUMUL attacks. • Performance: overheads - latency (extra delay) - bandwidth (extra padding/time)
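
For reference, the two overheads can be computed per page load roughly as below; the exact definitions used in the paper are not on the slide, so this assumes the usual "extra relative to the undefended baseline" formulation.

    # Overhead computation sketch (assumed definitions, not taken from the paper).

    def latency_overhead(load_time_defended, load_time_baseline):
        # Extra page-load delay as a fraction of the undefended load time.
        return (load_time_defended - load_time_baseline) / load_time_baseline

    def bandwidth_overhead(bytes_defended, bytes_baseline):
        # Extra (padding) bytes as a fraction of the undefended page size.
        return (bytes_defended - bytes_baseline) / bytes_baseline

    # Toy numbers that would correspond to the ALPaCA results on the next slide.
    print(latency_overhead(3.0, 2.0))        # 0.5  -> 50% latency overhead
    print(bandwidth_overhead(1850, 1000))    # 0.85 -> 85% bandwidth overhead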

  19. ALPaCA: results • From 40% to 60% decrease in accuracy • 50% latency and 85% bandwidth overheads

  21. How Unique is Your Onion? An Analysis of the Fingerprintability of Tor Onion Services Rebekah Overdorf 1 Marc Juarez 2 Gunes Acar 2 Rachel Greenstadt 1 Claudia Diaz 2 (1 Drexel University; 2 imec-COSIC KU Leuven). To be presented at CCS 2017, Dallas, TX, USA

  22. Disparate impact • WF attacks normally report average success • But… – Are certain websites more susceptible to website fingerprinting attacks than others? – What makes some sites more vulnerable to the attack than others? Credit: Claudia Diaz

  23. State-of-the-Art Attacks - k-NN (Wang et al., 2015) - CUMUL (Panchenko et al., 2016) - k-Fingerprinting (Hayes and Danezis, 2016)

  24. k-NN (Wang et al. 2015) • Features – number of outgoing packets in spans of 30 packets – the lengths of the first 20 packets – traffic bursts (sequences of packets in the same direction) • Classification – k-NN with a weighted distance metric; the weights are tuned to minimize the distance between instances of the same site. • Results – From 90% to 95% accuracy on a closed world of 100 non-hidden-service websites. Credit: Bekah Overdorf
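
A sketch of this feature extraction on a trace given as a sequence of signed packet lengths (positive = outgoing, negative = incoming). The published attack learns per-feature weights for its distance metric; here a stock scikit-learn k-NN stands in for that step, so this is only an illustration of the feature families listed above.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def knn_features(trace, span=30, first=20):
        directions = [1 if p > 0 else -1 for p in trace]
        # Number of outgoing packets in consecutive spans of `span` packets.
        spans = [sum(1 for d in directions[i:i + span] if d > 0)
                 for i in range(0, len(directions), span)]
        # Lengths of the first `first` packets (zero-padded).
        first_lens = (list(trace) + [0] * first)[:first]
        # Burst statistics: lengths of maximal runs of same-direction packets.
        bursts, run = [], 1
        for prev, cur in zip(directions, directions[1:]):
            if cur == prev:
                run += 1
            else:
                bursts.append(run)
                run = 1
        bursts.append(run)
        burst_stats = [len(bursts), max(bursts), float(np.mean(bursts))]
        # Pad/truncate the span counts so every vector has the same length.
        spans = (spans + [0] * 20)[:20]
        return spans + first_lens + burst_stats

    # Toy usage: one training trace per site, then classify a new trace.
    X = [knn_features([586, -1500, -1500, 586, -1500]),
         knn_features([586, 586, -1500, 586, -1500, -1500])]
    y = ["siteA", "siteB"]
    clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    print(clf.predict([knn_features([586, -1500, -1500, 586, -1500])]))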

  25. CUMUL (Panchenko et al. 2016) • Features – 100 interpolation points of the cumulative sum of packet lengths (with direction) • Classification – SVM with a Radial Basis Function (RBF) kernel • Results – From 90% to 93% accuracy for 100 non-HS sites. Credit: Bekah Overdorf
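
The CUMUL feature vector is simple enough to sketch directly: the cumulative sum of signed packet lengths, sampled at 100 equally spaced interpolation points and fed to an RBF-kernel SVM. The published attack additionally grid-searches the SVM hyperparameters, which is omitted here; the traces below are toy data.

    import numpy as np
    from sklearn.svm import SVC

    def cumul_features(trace, n_points=100):
        cumulative = np.cumsum(trace)                  # signed lengths: + out, - in
        x_old = np.linspace(0, 1, num=len(cumulative))
        x_new = np.linspace(0, 1, num=n_points)
        return np.interp(x_new, x_old, cumulative)     # 100 interpolation points

    # Toy usage with two fake traces per "site".
    X = np.array([cumul_features([586, -1500, -1500, -1500, 586]),
                  cumul_features([586, -1500, -600, 586]),
                  cumul_features([586, 586, -1500, -1500, -1500, -1500]),
                  cumul_features([586, 586, -1500, -1500, -1500])])
    y = ["siteA", "siteA", "siteB", "siteB"]
    clf = SVC(kernel="rbf").fit(X, y)
    print(clf.predict([cumul_features([586, -1500, -1500, -1500])]))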

  26. k-Fingerprinting (Hayes and Danezis 2016) • Features – timing and size features from the literature • Classification – Random Forest (RF) + k-NN • Results – 90% accuracy on 30 hidden services Credit: Bekah Overdorf
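
A sketch of the RF + k-NN idea, assuming the timing/size feature vectors have already been extracted: a Random Forest maps each trace to a "fingerprint" (the leaf reached in every tree), and a k-NN over those fingerprints (Hamming distance) does the final matching. The data below are random stand-ins.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier

    def kfp_classifier(X_train, y_train, n_trees=100, k=3):
        forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        forest.fit(X_train, y_train)
        fingerprints = forest.apply(X_train)           # (n_samples, n_trees) leaf ids
        knn = KNeighborsClassifier(n_neighbors=k, metric="hamming")
        knn.fit(fingerprints, y_train)
        return forest, knn

    def kfp_predict(forest, knn, X_test):
        return knn.predict(forest.apply(X_test))

    # Toy usage with random stand-in feature vectors for two sites.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (10, 20)), rng.normal(3, 1, (10, 20))])
    y = ["siteA"] * 10 + ["siteB"] * 10
    forest, knn = kfp_classifier(X, y)
    print(kfp_predict(forest, knn, rng.normal(3, 1, (1, 20))))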

  27. Data • Crawled 790 sites over Tor (homepages) • Removed – Offline sites – Failed visits – Duplicates • 482 sites fit our criteria with 70 visits each Credit: Bekah Overdorf

  28. [Figure-only slide] Credit: Bekah Overdorf

  29. SecureDrop sites • There was a SecureDrop site in our dataset: – Project On Gov’t Oversight (POGO) • CUMUL achieved 99% accuracy on it! – Compared to 80% on average

  30. Misclassifications of Hidden Services Credit: Bekah Overdorf

  31. Misclassifications of Hidden Services Credit: Bekah Overdorf

  32. Median of total incoming packet size for misclassified instances Credit: Bekah Overdorf

  33. Low-level Feature Analysis • Intra-class variance: similarity between instances of the same site. – Lower intra-class variance improves identification. • Inter-class variance: similarity between instances of different sites. – Higher inter-class variance improves identification. • Top features: 1. Total size of outgoing packets 2. Total size of incoming packets 3. Number of incoming packets 4. Number of outgoing packets
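
A sketch of how intra- and inter-class variance could be computed for a single low-level feature (e.g., total incoming bytes), given one value per instance and its site label. The exact normalization used in the paper is not on the slide, so plain variances are used here.

    import numpy as np

    def intra_inter_variance(values, labels):
        values, labels = np.asarray(values, float), np.asarray(labels)
        sites = np.unique(labels)
        site_means = {s: values[labels == s].mean() for s in sites}
        # Intra-class: how much same-site instances deviate from their site mean.
        intra = np.mean([(v - site_means[l]) ** 2 for v, l in zip(values, labels)])
        # Inter-class: how much the per-site means deviate from each other.
        inter = np.var([site_means[s] for s in sites])
        return intra, inter

    # Toy example: two sites whose total-incoming-bytes feature separates them well.
    vals = [100_000, 102_000, 98_000, 500_000, 505_000, 495_000]
    labs = ["siteA"] * 3 + ["siteB"] * 3
    print(intra_inter_variance(vals, labs))  # low intra, high inter => identifiable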

  34. Site-level Feature Analysis • Can we determine what characteristics of a website affect its fingerprintability? • Site-level features: – Number of embedded resources – Number of fonts – Screenshot size – Use of a CMS? – …

  35. Can we predict if a site will be fingerprintable? • “Meta-classifier”: a Random Forest regressor over the site-level features.
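
A sketch of such a meta-classifier with scikit-learn; the feature names, values and the choice of per-site attack accuracy as the fingerprintability score are illustrative assumptions, not the paper's exact setup.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    feature_names = ["num_embedded_resources", "num_fonts", "screenshot_size_kb", "uses_cms"]

    # One row per onion site: site-level features and the attack's per-site accuracy.
    X = np.array([[ 3, 1,  120, 0],
                  [45, 6, 2400, 1],
                  [ 8, 2,  300, 0],
                  [60, 9, 3100, 1]])
    y = np.array([0.99, 0.55, 0.90, 0.48])   # fingerprintability scores (toy values)

    regressor = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    # Predict how fingerprintable a new (hypothetical) site would be.
    print(regressor.predict([[5, 1, 200, 0]]))

    # Importance of each site-level feature (cf. the results slide that follows).
    for name, importance in zip(feature_names, regressor.feature_importances_):
        print(f"{name}: {importance:.2f}")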

  36. Results: importance of site-level features

  37. Takeaways • WF threatens Tor, especially its hidden services. • Disparate impact: some pages are more fingerprintable than others (there is a bias in reporting average results). • WF defenses that alter the website design (app layer) are easier to implement and as effective as network-layer defenses. • Changes to the page that protect against WF: small (e.g., fewer resources) and dynamic.

  38. Takeaways (same points as the previous slide, with one addition) • Future work: re-design ALPaCA to follow these guidelines.

  39. Software and Data • HSes have incentives to support server-side defenses: SecureDrop has implemented a prototype of ALPaCA • ALPaCA is running on a HS: 3tmaadslguc72xc2.onion • Source code of the defenses: github.com/camelids • Source code and data for the fingerprintability analysis: cosic.esat.kuleuven.be/fingerprintability

  40. The HS world • Exploratory crawl: 5,000 HSes (from Ahmia.fi) • Stats for the HS world (from intercepted HTTP headers): distribution of types, sizes and number of resources • Most HSes are small compared to the average website • Few HSes have any JS or 3rd-party content: - JS: less than 13% (assumption: no JS) - 3rd-party content: less than 20% (assumption: no 3rd parties)

  41. Limitations and Future Work • ALPaCA can only make sites bigger, not smaller • What is the optimal padding at the app layer? There is still no thorough feature analysis. • How do the distributions change over time? How do we update our defenses accordingly? - How does the strategy need to be adapted as HSes adopt our defense(s)?

  42. LLaMA • Inspired by Randomized Pipelining • Goal: randomize the HTTP requests, implemented as a FF add-on: - random delays (δ) before requests - repeating previous requests [Diagram: client/server timeline with requests C1 and C2; C2 is delayed by δ and C1 is re-sent as C1’.]
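
A minimal sketch of the same idea in plain Python rather than a Firefox add-on (function name, parameter names and values are illustrative): each request is preceded by a random delay and, with some probability, an earlier request is re-issued.

    import random
    import time
    import urllib.request

    def llama_fetch(urls, max_delay=1.0, repeat_prob=0.2):
        history = []
        for url in urls:
            # Random delay (the δ on the slide) before sending the request.
            time.sleep(random.uniform(0, max_delay))
            urllib.request.urlopen(url).read()
            history.append(url)
            # With some probability, repeat an earlier request (extra dummy traffic).
            if history and random.random() < repeat_prob:
                urllib.request.urlopen(random.choice(history)).read()

    # Example (would need to be routed through Tor to matter in practice):
    # llama_fetch(["http://example.onion/index.html", "http://example.onion/style.css"])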

  43. LLaMA: results • Accuracy drops between 20% and 30% • Less than 10% latency and bandwidth overheads
