
Website Fingerprinting at Internet Scale

Andriy Panchenko (1), Fabian Lanze (1), Andreas Zinnen (2), Martin Henze (3), Jan Pennekamp (1), Klaus Wehrle (3), Thomas Engel (1)

(1) Interdisciplinary Centre for Security, Reliability and Trust (SnT), Luxembourg
(2) RheinMain University of Applied Sciences, Germany
(3) RWTH Aachen University, Germany

Background

Why people use Tor:
- Privacy has become a general concern
- Access to the Internet is censored in many countries

Website Fingerprinting

[Figure: a client reaches a server through a circuit of Tor onion routers (OR): entry, middle, and exit]

Tor: The Onion Router
- Most popular low-latency anonymization network
- Many users rely on Tor to access unfiltered information

Website Fingerprinting

[Figure: the same Tor circuit (client, entry, middle, exit, server) with an unknown observer marked "?"]

What is website fingerprinting?
- Identify the website accessed without breaking cryptography
- Attacker is a passive observer
- Features based on packet size, direction, ordering, timing

Website Fingerprinting - state of the art

- Widely discussed, hot topic in anonymity research
- State-of-the-art approach: Wang et al. (USENIX Sec'14)
  - k-Nearest Neighbor approach
  - manually selected features (e.g., bursts, unique lengths)
  - about 4,000 features
  - recognition rates > 90%
- 2 scenarios for evaluation
  - Closed world: user visits only a fixed number of websites
  - Open world: monitor a set of sites (user may visit unknown sites)

Our method

Idea
- Don't try to guess which characteristics may be relevant
- Use a representation that implicitly covers all characteristics

Our feature set:

$$(\underbrace{N_{in}, N_{out}, S_{in}, S_{out}}_{\text{basic properties}},\ \underbrace{C_1, \dots, C_n}_{\text{cumulative features}})$$

[Figure: cumulative sum of packet sizes versus packet number for two traces, C(T1) and C(T2), with the sampled points C_i marked for each trace]
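The feature set above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a trace is a list of signed packet sizes (positive for incoming, negative for outgoing, a convention chosen here), that the basic properties are the packet counts and byte sums per direction, and that the n cumulative features are sampled by linear interpolation at equidistant positions along the packet-number axis, as the plot suggests. The function name `cumul_features` is hypothetical.

```python
from itertools import accumulate


def cumul_features(packets, n=100):
    """Return [N_in, N_out, S_in, S_out, C_1, ..., C_n] for one trace.

    `packets` holds signed packet sizes in bytes; the sign convention
    (positive = incoming) is an assumption of this sketch.
    """
    if n < 2:
        raise ValueError("n must be at least 2")

    incoming = [p for p in packets if p > 0]
    outgoing = [-p for p in packets if p < 0]
    basic = [len(incoming), len(outgoing), sum(incoming), sum(outgoing)]

    # Cumulative sum of signed packet sizes, starting at 0.
    cumulative = [0] + list(accumulate(packets))
    last = len(cumulative) - 1
    if last == 0:
        # Empty trace: the curve is a single point at 0.
        return basic + [0.0] * n

    # Sample n equidistant points on the piecewise linear curve.
    sampled = []
    for i in range(n):
        x = last * i / (n - 1)
        lo = min(int(x), last - 1)
        frac = x - lo
        sampled.append(cumulative[lo] + frac * (cumulative[lo + 1] - cumulative[lo]))
    return basic + sampled


trace = [-100, 500, 500, -50]      # made-up example trace
print(len(cumul_features(trace)))  # 4 basic + 100 sampled = 104
```

With n = 100 the vector has 104 entries regardless of trace length, which matches the feature count quoted for the method in these slides.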

Example

[Figure: feature value [kByte] versus feature index for about.com and google.de]

- Fixed number of distinctive characteristics from traces with varying lengths
- Fingerprints can be visualized
- Used as input for a Support Vector Machine

Layers of data representation

[Figure: the same traffic shown at three layers: Tor cells (Cell 1 to Cell 5), TLS records (Record 1, Record 2), and TCP packets (Packet 1 to Packet 3)]

- Information source for feature extraction: Cell vs. TLS vs. TCP
- Practically negligible effect on the classification accuracy

Comparison with state of the art – classification

Closed world: accuracy [%] for the 100 most popular websites

                              90 instances   40 instances
  k-NN (3736 features)        90.84          89.19
  Our method (104 features)   91.38          92.03

Open world: foreground of 100 blocked websites, background of 9,000 popular websites

               TPR [%]   FPR [%]
  k-NN         90.59     2.24
  Our method   96.92     1.98

Comparison of computational performance

[Figure: average processing time [h] (log scale, 10^-4 to 10^3) versus background set size (0 to 50,000) for k-NN, CUMUL, and CUMUL (parallelized)]

Computation time for 100 random monitored pages in open world

Website fingerprinting in reality

Critique
- Data sets used are not representative: too small, only popular websites / index pages
- Simplified assumptions, wrong metrics for evaluation

RND-WWW: How do people access the World Wide Web?
- Twitter
- Alexa-one-click
- Googling the trends
- Googling at random
- Censored in China
- Total: > 120,000 web pages

Tor-Exit: Which pages do users actually access over Tor?
- Monitor a Tor exit node ⇒ 211,148 web pages

Webpage fingerprinting at Internet scale

Question: Does the attack scale under realistic assumptions?

Which metric to evaluate?
- Accuracy: fraction of true results
- True Positive Rate / Recall: fraction of monitored pages detected
- False Positive Rate: fraction of false alarms
- Problem: misleading interpretation ⇒ base rate fallacy
- Precision: probability that the classifier is correct given it has detected a monitored page

Focus of evaluation
- Precision and recall for increasing background set sizes
- Random subset as foreground
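The base rate fallacy can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the base rates (the share of page loads that actually belong to the monitored set) are made-up values. The TPR and FPR are the open-world figures reported for the method earlier in these slides.

```python
def recall(tp, fn):
    """True positive rate: share of monitored page loads that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def false_positive_rate(fp, tn):
    """Share of unmonitored page loads that raised a false alarm."""
    return fp / (fp + tn) if fp + tn else 0.0


def precision(tp, fp):
    """Share of alarms that really were monitored page loads."""
    return tp / (tp + fp) if tp + fp else 0.0


def precision_at_base_rate(tpr, fpr, base_rate):
    """Precision implied by TPR and FPR when a fraction `base_rate`
    of all page loads belongs to the monitored set (Bayes' rule)."""
    true_alarms = tpr * base_rate
    false_alarms = fpr * (1.0 - base_rate)
    total = true_alarms + false_alarms
    return true_alarms / total if total else 0.0


# TPR and FPR from the open-world comparison; base rates are hypothetical.
for base_rate in (0.5, 0.01, 0.001):
    p = precision_at_base_rate(0.9692, 0.0198, base_rate)
    print(f"base rate {base_rate}: precision {p:.3f}")
```

The same TPR and FPR give very different precision once monitored pages become rare among all page loads, which is why TPR and FPR alone can be misleading.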

Webpage fingerprinting at Internet scale

Question: Does the attack scale under realistic assumptions?

Results for RND-WWW

[Figure: two panels showing the fraction of foreground pages [%] against recall and against precision (0 to 1), one curve per background set size b = 1000, 5000, 9000, 20000, 50000, 111884]

Webpage fingerprinting at Internet scale

Question: Does the attack scale under realistic assumptions?

Results for Tor-Exit

[Figure: two panels showing the fraction of foreground pages [%] against recall and against precision (0 to 1), one curve per background set size b = 1000, 5000, 9000, 20000, 50000, 111884, 211148]

Answer: No.

Webpage fingerprinting at Internet scale

Question: Is it at least possible for certain pages?

Minimum number of mistakenly confused pages

[Figure: fraction of foreground pages [%] against the number of webpage confusions (0 to 400), for b = 20,000, 50,000, and 100,000]

No single page without a confusingly similar page in a realistic universe.

How about fingerprinting web sites? (1/2)

- A website is a collection of web pages served under the same domain
- Is it possible to fingerprint a website when only a subset of its pages is available for training?

Experiment: 20 websites (ALJAZEERA, AMAZON, BBC, CNN, EBAY, FACEBOOK, IMDB, KICKASS, LOVESHACK, RAKUTEN, REDDIT, RT, SPIEGEL, STACKOVERFLOW, TMZ, TORPROJECT, TWITTER, WIKIPEDIA, XHAMSTER, XNXX)

[Figure: two confusion matrices over the 20 websites, color scale 0.0 to 1.0: (a) only index pages, (b) different pages]
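For readers unfamiliar with confusion matrices, the following sketch shows how one can be computed from classifier output. It is a generic illustration, not the evaluation code behind the figure. It assumes rows are the true website and columns the predicted website, and the labels and predictions in the example are made up.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix: entry [i][j] is how often class i was predicted as j."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def row_normalize(matrix):
    """Scale each row to sum to 1, giving values between 0.0 and 1.0."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Made-up example with two of the site names from the experiment.
classes = ["BBC", "CNN"]
truth = ["BBC", "BBC", "CNN", "CNN"]
predicted = ["BBC", "CNN", "CNN", "CNN"]

matrix = confusion_matrix(truth, predicted, classes)
print(matrix)                 # counts per (true, predicted) pair
print(row_normalize(matrix))  # per-site fractions
```

A site that is always recognized produces a row with all its mass on the diagonal; mass off the diagonal shows which other sites it is confused with.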

How about fingerprinting web sites? (2/2)

- Transition of results from closed-world to the realistic open-world setting is typically not trivial
- Website fingerprinting scales better than webpage fingerprinting

[Figure: two panels showing precision and recall (0.0 to 1.0) against background set size (0 to 120,000)]

Summary

- Our classifier with 104 features outperforms the state of the art
- Alarming results obtained under simplified assumptions can't be generalized
- Webpage fingerprinting does not scale to appropriate universe sizes for any webpage
- Website fingerprinting is not only more realistic but also significantly more effective
- Conclusions drawn need to be reconsidered

Scripts and RND-WWW dataset: http://lorre.uni.lu/~andriy/zwiebelfreunde/

We are hiring!

Our lab within the Interdisciplinary Centre for Security, Reliability and Trust (Uni Luxembourg) is looking for PhD candidates and PostDocs in the area of anonymity and privacy.

More information: http://secan-lab.uni.lu/jobs
