SLIDE 1

Reverse-engineering Online Tracking

From niche research field to easy-to-use tool

Steven Englehardt
webtap.princeton.edu

SLIDE 2

Source: Mayer & Mitchell; Third-Party Web Tracking: Policy and Technology

SLIDE 3
SLIDE 4

Evercookies

Respawn cookies using alternative storage locations:

○ Flash cookies, HTML5 localStorage, ETags, etc.
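
The respawning idea can be illustrated with a small simulation. This is only a sketch: the three storage locations are modeled as plain in-memory maps, and the names `makeStores`, `setEverywhere`, and `respawn` are hypothetical, not taken from any evercookie implementation.

```javascript
// Simulated storage locations; real evercookies use browser APIs instead.
function makeStores() {
  return {
    httpCookie: new Map(),
    localStorage: new Map(),
    flashLSO: new Map(),
  };
}

// Write the identifier to every available location.
function setEverywhere(stores, key, value) {
  for (const store of Object.values(stores)) {
    store.set(key, value);
  }
}

// Look for a surviving copy and, if one is found, restore it everywhere.
function respawn(stores, key) {
  for (const store of Object.values(stores)) {
    if (store.has(key)) {
      const value = store.get(key);
      setEverywhere(stores, key, value);
      return value;
    }
  }
  return null; // every copy was cleared, so the identifier is gone
}

const stores = makeStores();
setEverywhere(stores, "uid", "abc123");
stores.httpCookie.clear(); // the user clears HTTP cookies only
const respawned = respawn(stores, "uid");
console.log(respawned, stores.httpCookie.get("uid"));
```

Clearing a single location is not enough here: the identifier survives as long as any one copy remains.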

SLIDE 5

If you’re going to track me, please use cookies

Ed Felten, July 7th, 2009, freedom-to-tinker.com

https://freedom-to-tinker.com/blog/felten/if-youre-going-track-me-please-use-cookies/

SLIDE 6
SLIDE 7

Canvas Fingerprinting

SLIDE 8

2009

If you’re going to track me, please use cookies

SLIDE 9

2009

If you’re going to track me, please use cookies

2010

If you’re going to track me, please use browser storage

SLIDE 10
SLIDE 11

?

SLIDE 12
SLIDE 13
SLIDE 14
SLIDE 15
SLIDE 16

2009

If you’re going to track me, please use cookies

2010

If you’re going to track me, please use browser storage

SLIDE 17

2009

If you’re going to track me, please use cookies

2010

If you’re going to track me, please use browser storage

2015

If you’re going to track me, please limit it to one device

SLIDE 18

2015

If you’re going to track me, please limit it to one device

2020

If you’re going to track me, please ___________________

?

SLIDE 19

Measurement can help!

SLIDE 20
SLIDE 21

Web measurement hurdles

  1. Engineering Debt
SLIDE 22
SLIDE 23

Many Studies, Many Platforms

  • Automation:

○ 7 used Selenium (Full browser)
○ 4 used PhantomJS/CasperJS (Headless WebKit)

  • Instrumentation:

○ 5 used FourthParty
○ 9 used a Proxy

SLIDE 24

Many Studies, Many Platforms

  • Automation:

○ 7 used Selenium (Full browser)
○ 4 used PhantomJS/CasperJS (Headless WebKit)

  • Instrumentation:

○ 5 used FourthParty
○ 9 used a Proxy

FourthParty is the only shared code

SLIDE 25

Web measurement hurdles

  1. Engineering Debt
  2. Lasting Impact
SLIDE 26

Canvas Fingerprinting in May 2014

  • Acar et al. (2014)
  • 5% of Top 100k

The Web Never Forgets: Persistent Tracking Mechanisms in the Wild. Acar et al.

SLIDE 27

Canvas Fingerprinting in May 2014

  • Acar et al. (2014)
  • 5% of Top 100k

The Web Never Forgets: Persistent Tracking Mechanisms in the Wild. Acar et al.

SLIDE 28

Canvas Fingerprinting in October 2015

Over 100 first-party domains on the Top 100k

SLIDE 29

Canvas Fingerprinting in October 2015

Over 100 first-party domains on the Top 100k

The Web Never Forgets: Persistent Tracking Mechanisms in the Wild. Acar et al.

SLIDE 30

Overcoming these hurdles:

  1. A Common Platform
  2. A Web Privacy Census
SLIDE 31

OpenWPM

SLIDE 32

OpenWPM

Web

SLIDE 33

OpenWPM

Web

SLIDE 34

OpenWPM

Web

SLIDE 35

OpenWPM

Web Browser Instance

SLIDE 36
SLIDE 37

OpenWPM

  • Supports browsing with persistent state

○ Browser keeps profile through crashes and freezes.

  • Real Browser

○ Extensions
○ Privacy Features
○ WebRTC, Audio, Video, WebGL

  • Stable
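
A rough sketch of what "keeps profile through crashes" could mean in practice: checkpoint the profile after each successful visit, and restart from the last checkpoint when the browser dies. `FakeBrowser` and `crawl` are hypothetical stand-ins written for this illustration; they do not reflect OpenWPM's actual implementation.

```javascript
// Hypothetical stand-in for a browser; it "crashes" once on URLs in crashOn.
class FakeBrowser {
  constructor(profile, crashOn) {
    this.profile = { cookies: { ...profile.cookies } }; // load saved state
    this.crashOn = crashOn;
  }
  visit(url) {
    if (this.crashOn.has(url)) {
      this.crashOn.delete(url); // crash only the first time
      throw new Error("browser crashed on " + url);
    }
    const count = Object.keys(this.profile.cookies).length;
    this.profile.cookies[url] = "id-" + count; // the site sets a cookie
  }
}

function crawl(urls, crashOn) {
  let saved = { cookies: {} }; // last known-good profile
  let browser = new FakeBrowser(saved, crashOn);
  let restarts = 0;
  for (const url of urls) {
    try {
      browser.visit(url);
    } catch (err) {
      restarts += 1;
      browser = new FakeBrowser(saved, crashOn); // restart with saved profile
      browser.visit(url); // retry the failed page once
    }
    saved = { cookies: { ...browser.profile.cookies } }; // checkpoint
  }
  return { profile: saved, restarts };
}

const result = crawl(
  ["a.example", "b.example", "c.example"],
  new Set(["b.example"])
);
console.log(result.restarts, Object.keys(result.profile.cookies));
```

The cookie set before the crash is still present afterwards, which is what a stateful measurement needs.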
SLIDE 38

A Web Privacy Census

Monthly 1 Million Site Crawl

SLIDE 39

A Web Privacy Census

Monthly 1 Million Site Crawl

Collecting:

  • JavaScript calls
  • All JavaScript files
  • HTTP requests and responses
  • Storage (cookies, Flash, etc.)

SLIDE 40

Targeted Crawls

Type: Stateful
Use:

  • ID Cookies
  • Respawning
  • Cookie syncing

Type: Stateless
Use:

  • Ghostery
  • AdBlock Plus
  • HTTPS Everywhere
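
As one example of a stateful measurement, cookie syncing can be approximated by looking for one domain's ID cookie value inside a request sent to a different domain. The sketch below makes several assumptions: the records are invented, the field names (`owner`, `value`) are hypothetical, and plain substring matching stands in for whatever matching a real analysis would use.

```javascript
// Hypothetical crawl records: ID cookies and outgoing HTTP request URLs.
const idCookies = [
  { owner: "tracker-a.example", value: "u8f3k2m9q1" },
  { owner: "tracker-b.example", value: "zz71hh02pl" },
];
const requests = [
  "https://tracker-b.example/sync?partner=a&uid=u8f3k2m9q1",
  "https://cdn.example/lib.js",
  "https://tracker-a.example/pixel?own=u8f3k2m9q1",
];

// Flag requests whose URL carries an ID cookie value owned by another domain.
function findSyncs(cookies, urls) {
  const found = [];
  for (const raw of urls) {
    const host = new URL(raw).hostname;
    for (const cookie of cookies) {
      const sameOwner =
        host === cookie.owner || host.endsWith("." + cookie.owner);
      if (!sameOwner && raw.includes(cookie.value)) {
        found.push({ from: cookie.owner, to: host, id: cookie.value });
      }
    }
  }
  return found;
}

const syncs = findSyncs(idCookies, requests);
console.log(syncs);
```

Only the first request is flagged: it hands tracker-a's identifier to tracker-b, while the third request merely sends tracker-a its own cookie value.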
SLIDE 41

A Web Privacy Census

  1. Measure how effective tools are
  2. Quickly deploy new measurements
  3. Release data and analysis monthly
SLIDE 42

Detecting WebRTC Local IP Sniffing

SLIDE 43
  1. I saw a tweet that nytimes.com is IP sniffing
SLIDE 44
  2. I added code to JS Instrumentation for next crawl

// Access to WebRTC
instrumentPrototype(window.mozRTCPeerConnection.prototype, "mozRTCPeerConnection");
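
The slide shows only the call site, not the body of `instrumentPrototype`. A minimal sketch of what such a helper might do is to wrap every method on a prototype so that each call is logged before being forwarded. Everything below is an assumption made for illustration: the wrapper body, the `callLog` array, and the `FakePeerConnection` stand-in that lets the example run outside a browser.

```javascript
// Hypothetical re-implementation; the real instrumentation is not on the slide.
const callLog = [];

function instrumentPrototype(proto, name) {
  for (const prop of Object.getOwnPropertyNames(proto)) {
    if (prop === "constructor") continue;
    const desc = Object.getOwnPropertyDescriptor(proto, prop);
    if (typeof desc.value !== "function") continue; // only wrap plain methods
    const original = desc.value;
    proto[prop] = function (...args) {
      callLog.push({ symbol: name + "." + prop, args });
      return original.apply(this, args); // forward to the real method
    };
  }
}

// Stand-in class so the sketch runs without a browser.
class FakePeerConnection {
  createOffer() {
    return "offer";
  }
  createDataChannel(label) {
    return { label };
  }
}

instrumentPrototype(FakePeerConnection.prototype, "mozRTCPeerConnection");
const pc = new FakePeerConnection();
pc.createDataChannel("probe");
const offer = pc.createOffer();
console.log(offer, callLog.map((entry) => entry.symbol));
```

This version skips accessor properties such as `onicecandidate`; logging those would need the getter and setter wrapped as well.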

SLIDE 45
  3. I wrote some analysis code
  • Grab all URLs that execute:

○ mozRTCPeerConnection.onicecandidate
○ mozRTCPeerConnection.createDataChannel
○ mozRTCPeerConnection.createOffer

  • Check JS files to confirm
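
The filtering step might look something like the sketch below. The row shape (`scriptUrl`, `symbol`) and the sample data are hypothetical, and requiring all three symbols from the same script is this sketch's assumption about how to narrow the candidates before the manual check.

```javascript
// Hypothetical rows from a JS-call log: which script touched which symbol.
const jsCalls = [
  { scriptUrl: "https://t.example/fp.js", symbol: "mozRTCPeerConnection.createDataChannel" },
  { scriptUrl: "https://t.example/fp.js", symbol: "mozRTCPeerConnection.createOffer" },
  { scriptUrl: "https://t.example/fp.js", symbol: "mozRTCPeerConnection.onicecandidate" },
  { scriptUrl: "https://chat.example/app.js", symbol: "mozRTCPeerConnection.createOffer" },
  { scriptUrl: "https://cdn.example/ui.js", symbol: "window.navigator.userAgent" },
];

const TARGETS = [
  "mozRTCPeerConnection.onicecandidate",
  "mozRTCPeerConnection.createDataChannel",
  "mozRTCPeerConnection.createOffer",
];

// Return script URLs that used every target symbol at least once.
function candidateScripts(rows) {
  const seen = new Map(); // scriptUrl -> Set of target symbols used
  for (const { scriptUrl, symbol } of rows) {
    if (!TARGETS.includes(symbol)) continue;
    if (!seen.has(scriptUrl)) seen.set(scriptUrl, new Set());
    seen.get(scriptUrl).add(symbol);
  }
  return [...seen]
    .filter(([, symbols]) => symbols.size === TARGETS.length)
    .map(([url]) => url);
}

const candidates = candidateScripts(jsCalls);
console.log(candidates);
```

Scripts that survive this filter are still only candidates, which is why the slide adds a manual check of the JS files.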
SLIDE 46
  4. Results (October 2015)
  • 121 first-party sites

○ 29 in the top 10k

  • 24 unique scripts
  • Only 1 of which is blocked by EasyList/EasyPrivacy
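
A blocked-or-not check of this kind can be sketched as follows. The rules and script URLs are invented for illustration, and only the simple `||domain^` rule form is handled; real EasyList/EasyPrivacy rules use many more features, so this is not a faithful filter-list matcher.

```javascript
// Hypothetical rules and script URLs; only "||domain^" rules are understood.
const rules = ["||tracker.example^", "||ads.example^", "! a comment line"];
const scriptUrls = [
  "https://tracker.example/rtc.js",
  "https://static.news.example/ip.js",
  "https://cdn.ads.example/probe.js",
];

// Extract the domain from each "||domain^" rule, ignoring everything else.
function blockedHosts(ruleLines) {
  return ruleLines
    .filter((line) => line.startsWith("||") && line.endsWith("^"))
    .map((line) => line.slice(2, -1));
}

// A URL is blocked if its host equals a listed domain or is a subdomain of it.
function isBlocked(rawUrl, hosts) {
  const host = new URL(rawUrl).hostname;
  return hosts.some((h) => host === h || host.endsWith("." + h));
}

const hosts = blockedHosts(rules);
const blocked = scriptUrls.filter((url) => isBlocked(url, hosts));
console.log(`${blocked.length} of ${scriptUrls.length} scripts blocked`);
```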

SLIDE 47

With regular measurement we can:

  1. Inform the public
  2. Build block lists
  3. Change the incentives
SLIDE 48

2020

If you’re going to track me, ___________________

SLIDE 49

2020

If you’re going to track me, ___________________ I’ll know!

SLIDE 50

Help us make the web more private!

  • Contribute?

○ github.com/citp/OpenWPM

  • Collaborate?

○ webtap.princeton.edu

Image assets from the Noun Project: Microphone by Pavel N.; Megaphone by Piero Borgo; Smartphone by Aaron K. Kim; desktop computer and Database by Creative Stall; link by Hash Basheer; Spider Bot by Siwat Vatatiyaporn; Browser by Dirtyworks; programmer by Hadi Davodpour