SLIDE 1

Website fingerprinting on Tor: attacks and defenses

Claudia Diaz KU Leuven

Post-Snowden Cryptography Workshop, Brussels, December 10, 2015

Joint work with: Marc Juarez, Sadia Afroz, Gunes Acar, Rachel Greenstadt, Mohsen Imani, Mike Perry, Matthew Wright

SLIDE 2

Metadata

It's not just about communications content: SIGINT

Time, duration, size, identities, location, pattern

Exposed by default in communications protocols

Bulk collection:
  • Much smaller in size than content
  • Machine readable, cheap to analyze, highly revealing
  • Much lower level of legal protection

Dedicated systems to protect metadata: the Tor network (target of the NSA program "Egotistical Giraffe")

SLIDE 3

Introduction: how does WF work?

[Diagram: the user (Alice) connects through Tor to the web; a local adversary between the user and the Tor network observes the encrypted traffic and tries to infer the page. User = Alice, Webpage = ??]

SLIDE 4

Why is WF so important?

  • Tor is the most advanced anonymity network (according to the NSA)
  • WF allows an adversary to recover users' web browsing history
  • Series of successful attacks
  • Weak adversary model (a local adversary suffices)

[Chart: number of top-conference publications on WF over time]

SLIDE 5

Introduction: assumptions

Client settings and browsing behaviour: which pages the user visits, one page at a time

SLIDE 6

Introduction: assumptions

Adversary: can replicate the client's system configuration, can parse traces (detect the start/end of a page load), and obtains clean traces

SLIDE 7

Introduction: assumptions

Web: no personalisation or staleness

SLIDE 8

Methodology

Based on Wang and Goldberg's methodology:
  • Batches and k-fold cross-validation
  • Fast-Levenshtein attack (SVM)

Comparative experiments:
  • Key: isolate the variable under evaluation (e.g., TBB version)


SLIDE 11

Comparative experiments: example

  • Step 1 (Control): train on data with the default value, test on data with the default value, and measure the control accuracy
  • Step 2 (Test): train on data with the default value, test on data with the value of interest, and measure the test accuracy
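A minimal sketch of this two-step evaluation in Python, with synthetic stand-in data (the feature matrices, label counts, and SVM settings are illustrative assumptions, not the paper's exact pipeline):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-ins for traffic-trace features under two collection settings.
rng = np.random.default_rng(0)
X_default = rng.normal(size=(300, 20))     # traces under the default setting
y_default = rng.integers(0, 10, size=300)  # 10 pages in a toy closed world
X_variant = rng.normal(size=(300, 20))     # traces with the variable changed
y_variant = rng.integers(0, 10, size=300)

clf = SVC()
# Step 1 (Control): k-fold cross-validation on default-setting data only.
control = cross_val_score(clf, X_default, y_default, cv=10).mean()
# Step 2 (Test): train on default-setting data, test on the variant data.
test = clf.fit(X_default, y_default).score(X_variant, y_variant)
print(f"control accuracy: {control:.2%}, test accuracy: {test:.2%}")
```

A large gap between the control and test accuracies indicates that the variable under evaluation (TBB version, location, time gap, etc.) breaks the attack's lab assumptions.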


SLIDE 16

Experiments: multitab browsing

  • Firefox users use 2 or 3 tabs on average
  • Experiment with 2 tabs, opened with a time gap of 0.5s, 3s, or 5s
  • Background page picked at random for a batch
  • Success: detection of either page (see the sketch below)
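A minimal sketch of this success metric (the page names and classifier guesses below are illustrative):

```python
# Success in the multitab setting: a guess counts if it matches either the
# foreground or the background page loaded during the visit.
guesses    = ["page_a", "page_c", "page_d"]   # classifier output per visit
foreground = ["page_a", "page_b", "page_e"]   # page in the active tab
background = ["page_x", "page_c", "page_y"]   # page in the background tab

hits = sum(g in (f, b) for g, f, b in zip(guesses, foreground, background))
print(f"either-page accuracy: {hits / len(guesses):.0%}")  # 2/3 -> 67%
```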

SLIDE 17

Experiments: multitab browsing

Accuracy for different time gaps:

  • Control: 77.08%
  • Test (0.5s): 9.8%
  • Test (3s): 7.9%
  • Test (5s): 8.23%

[Diagram: bandwidth over time for Tab 1 and Tab 2, overlapping after the time gap]


SLIDE 19

Experiments: TBB versions

Coexisting Tor Browser Bundle (TBB) versions. Versions: 2.4.7, 3.5 and 3.5.2.1 (changes in randomized pipelining, etc.)

  • Control (3.5.2.1): 79.58%
  • Test (2.4.7): 66.75%
  • Test (3.5): 6.51%

SLIDE 20

Experiments: network conditions

Setup: VMs in New York, Leuven, and Singapore (KU Leuven and DigitalOcean virtual private servers)

SLIDE 21

Experiments: network conditions

  • Control (Leuven): 66.95%
  • Test (New York): 8.83%

SLIDE 22

Experiments: network conditions

  • Control (Leuven): 66.95%
  • Test (Singapore): 9.33%

SLIDE 23

Experiments: network conditions

  • Test (New York): 76.40%
  • Control (Singapore): 68.53%

SLIDE 24

Experiments: data staleness

[Plot: accuracy (%) vs. time (days). Staleness of our collected data over 90 days (Alexa Top 100)]

Accuracy drops to less than 50% after 9 days.

SLIDE 25

Summary


SLIDE 26

Closed vs Open world

Early WF works considered a closed world of pages users may browse (train and test on that world). In practice, in the Tor case there is an extremely large universe of web pages. How likely is the user (a priori) to visit a target web page?

  • If the adversary has a good prior, the attack becomes a "confirmation attack"
  • BUT it may be hard for the adversary to have a good prior, particularly for less popular pages
  • If the prior is not a good estimate: base rate fallacy, many false positives

"False positives matter a lot" [1]

[1] Mike Perry, "A Critique of Website Traffic Fingerprinting Attacks", Tor Project Blog, 2013. https://blog.torproject.org/blog/critique-website-traffic-fingerprinting-attacks


SLIDE 28

The base rate fallacy: example

Breathalyzer test:
  • 0.88 of truly drunk drivers are identified (true positives)
  • 0.05 false positives

Alice tests positive. What is the probability that she is indeed drunk (the Bayesian detection rate, BDR)? Is it 0.95? Is it 0.88? Something in between? Only about 0.1!
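The answer drops out of Bayes' rule; a worked computation (the 1% base rate comes from slide 30):

```python
# Bayesian detection rate (BDR) for the breathalyzer example:
# BDR = P(drunk | positive) by Bayes' rule.
tpr, fpr, prior = 0.88, 0.05, 0.01  # prior: 1% of drivers are drunk

bdr = (tpr * prior) / (tpr * prior + fpr * (1 - prior))
print(f"BDR = {bdr:.3f}")  # ~0.151; the slide's dot diagram rounds to 7/70 = 0.1
```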

SLIDE 29

The base rate fallacy: example

  • The circumference represents the world of drivers.
  • Each dot represents a driver.

SLIDE 30

The base rate fallacy: example

  • 1% of drivers are driving drunk (base rate or prior).

SLIDE 31

The base rate fallacy: example

  • Of the drunk drivers, 88% are identified as drunk by the test.

SLIDE 32

The base rate fallacy: example

  • Of the sober drivers, 5% are erroneously identified as drunk.

SLIDE 33

The base rate fallacy: example

  • Alice must be within the black circumference (those who tested positive).
  • Ratio of red dots within the black circumference: BDR = 7/70 = 0.1!

SLIDE 34

The base rate fallacy in WF

The base rate must be taken into account. In WF:
  • Blue dots: webpages
  • Red dots: monitored pages
  • What is the base rate?

SLIDE 35

The base rate fallacy in WF

Probability of visiting a monitored page? Experiment:

  • 4 monitored pages
  • Train on Alexa top 100, test on Alexa top 35K
  • Binary classification: monitored / non-monitored

Prior probability of visiting a monitored page:

  • Uniform in 35K
  • Priors estimated from the Active Linguistic Authentication Dataset (ALAD) (3.5%): real-world users (80 users, 40K unique URLs)

A sketch of the BDR computation under these two priors follows below.
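A minimal sketch, plugging the two priors above into Bayes' rule (the TPR/FPR values are illustrative placeholders, not the paper's measured rates):

```python
# BDR = P(monitored | classifier says monitored), by Bayes' rule.
def bdr(tpr, fpr, prior):
    return (tpr * prior) / (tpr * prior + fpr * (1 - prior))

tpr, fpr = 0.9, 0.01               # assumed classifier rates, for illustration
print(bdr(tpr, fpr, 1 / 35_000))   # uniform prior over 35K pages: BDR ~ 0.003
print(bdr(tpr, fpr, 0.035))        # ALAD prior (3.5%): BDR ~ 0.77
```

Even with a strong classifier, a tiny prior drags the BDR toward zero, which is the base rate fallacy at work.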

SLIDE 36

Experiment: BDR in a 35K world

[Chart: BDR as a function of the size of the world, for a uniform prior and for priors of non-popular pages estimated from ALAD]

SLIDE 37

Classify, but verify

A verification step tests the classifier's confidence. The number of false positives is reduced, but the BDR is still very low for non-popular pages. A sketch follows below.
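A minimal sketch of such a verification step, assuming a kNN classifier and a confidence threshold (the synthetic data and the 0.9 threshold are illustrative, not the paper's setup):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))    # stand-in traffic features
y_train = rng.integers(0, 4, size=200)  # 4 "monitored page" labels
X_test = rng.normal(size=(50, 10))

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
probs = clf.predict_proba(X_test)                  # per-class confidence
preds = clf.classes_[probs.argmax(axis=1)].astype(object)
preds[probs.max(axis=1) < 0.9] = "unknown"         # verify: reject weak guesses
```

Rejecting low-confidence guesses trades recall for fewer false positives, which is what matters for the BDR.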


SLIDE 41

Cost for the adversary

The adversary's cost will depend on:
  • Number of pages (versions, personalisation)
  • Number of target users (system configuration, location)
  • Training and testing complexities of the classifier

Maintaining a successful WF system is costly.

SLIDE 42

Defenses against WF attacks

  • High-level defenses (randomized pipelining, HTTPOS): ineffective
  • Supersequence approaches, traffic morphing (grouping pages to create anonymity sets): infeasible
  • BuFLO (constant-rate traffic): expensive (bandwidth) and hurts usability (latency)
  • Tamaraw, CS-BuFLO: still expensive (bandwidth) with usability costs (latency)

SLIDE 43

Requirements for defenses

  • Effectiveness
  • Do not increase latency
  • No need to compute or distribute auxiliary information
  • No server-side cooperation needed
  • Bandwidth: some increase is tolerable on the input connections to the network

SLIDE 44

Adaptive padding

  • Based on a proposal by Shmatikov and Wang as a defense against end-to-end traffic confirmation attacks
  • Generates traffic packets at random times, with inter-packet timings following the distribution of general web traffic
  • Does NOT introduce latency: real packets are never delayed
  • Disturbs the key traffic features exploited by classifiers (burst features, total size) in an unpredictable way, different for each visit to the same page

(A sketch of the core loop follows below.)
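A minimal sketch of the core idea, assuming a precomputed histogram of inter-packet delays (the bin values, queue-based interface, and dummy cell are illustrative, not the actual pluggable-transport code):

```python
import queue
import random

HISTOGRAM = {0.01: 40, 0.05: 30, 0.2: 20, 1.0: 10}  # delay bin (s) -> count
DUMMY = b"\x00" * 512                                # fixed-size padding cell

def sample_gap(hist):
    # Draw an inter-packet delay with probability proportional to bin counts.
    bins, counts = zip(*hist.items())
    return random.choices(bins, weights=counts)[0]

def padding_loop(real_packets: queue.Queue, send):
    # Real packets are forwarded the moment they arrive (no added latency);
    # if the link stays idle longer than a sampled gap, a dummy goes out
    # instead, so bursts and total size differ on every visit to a page.
    while True:
        gap = sample_gap(HISTOGRAM)
        try:
            pkt = real_packets.get(timeout=gap)  # real traffic wins the race
        except queue.Empty:
            pkt = DUMMY                          # idle too long: inject padding
        send(pkt)
```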

SLIDE 45

Adaptive padding implementation

  • Implemented as a pluggable transport
  • Implemented at both ends (OP and guard, or bridge)
  • Controlled by the client (OP)
  • Need to obtain the distribution of inter-packet delays: crawl

SLIDE 46

Adaptive padding

SLIDE 47

Modifications to adaptive padding

  • Interactivity: two additional histograms to generate dummies in response to a packet received from the other end
  • Control messages: let the client tell the server the parameters of the padding
  • Soft-stop condition: sampling an "infinity" value (probabilistic), as sketched below
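The soft stop can be sketched as an extra "infinity" bin in the delay histogram of the previous sketch; sampling it ends the padded burst instead of scheduling another dummy (bin weights are illustrative):

```python
import math
import random

# Histogram with an "infinity" bin: sampling it means "stop padding for now".
HISTOGRAM = {0.01: 40, 0.05: 30, 0.2: 20, 1.0: 10, math.inf: 5}

def sample_gap(hist):
    bins, counts = zip(*hist.items())
    return random.choices(bins, weights=counts)[0]

gap = sample_gap(HISTOGRAM)
if math.isinf(gap):
    pass  # soft stop: send no dummy; resume when the next real packet arrives
```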

SLIDE 48

Adaptive padding evaluation

Classifier: kNN (Wang et al.). Experimental setup:

  • Training: Alexa top 100
  • Monitored pages: 10, 25, 80
  • Open world: 5K-30K pages

SLIDE 49

Evaluation results

  • Comparison with other defenses
  • Closed world: 100 pages
  • Ideal attack conditions

SLIDE 50

Evaluation results (realistic)

  • 50 monitored pages
  • 5K pages (open world)
  • k-NN with k=5

SLIDE 51

Conclusions

  • Significant difference between the WF attack under ideal lab conditions and under more realistic conditions
  • Effect of false positives (base rate fallacy)
  • The attack is costly for the adversary
  • Adaptive padding is an effective defense with no latency and moderate bandwidth overhead

SLIDE 52

Questions?

Thank you for your attention.