Third-party Tracking on the Web: A Swedish Perspective (Joel Purra and Niklas Carlsson)

slide-1
SLIDE 1

Third-party Tracking on the Web: A Swedish Perspective

Joel Purra and Niklas Carlsson

Linköping University, Sweden

@ IEEE LCN, Dubai, Nov. 2016

slide-3
SLIDE 3

We are all tracked …

  • When browsing, information is recorded by the servers you communicate directly with
  • Resources from other services might be requested as well, with or without being visible.
  • Information can be passively recorded during transmission; some of which can't be avoided
  • Specialized tracking code can actively extract extended information

slide-4
SLIDE 4

Why is tracking used?

  • Information is collected and stored to gain knowledge about the visitors a website has.

  • Website owners: to improve/personalize content
  • Advertisement firms: to sell targeted ads
  • Media analytics firms: to verify statistics (for ads).
  • Data brokers: to package and sell (inferred) user data
slide-5
SLIDE 5

The downside

  • Users lose control over who they share information with. This can be considered an invasion of privacy.
  • Information is easily stored and easily retrieved.
  • Anything done online in the past can haunt you forever.
  • This can lead to self-censorship, effectively limiting freedom of speech.
  • What is illegal for governments to do, companies are allowed to do through user agreements. Governments still have control over companies within their jurisdiction.
  • The full scope of the tracking is still unknown.
  • Could become a historical thought police.
  • Could mean online companies have a grip on all current and future politicians, company leaders and celebrities.

slide-6
SLIDE 6

Passive tracking and HTTPS

slide-9
SLIDE 9

Passive vs active tracking

  • Passive tracking: Anyone can listen in anywhere along the network path ...
  • People are becoming increasingly aware of monitoring by ISPs and nation states ...
  • HTTPS prevents passive tracking of some information (e.g., exact page, browser model, OS, language settings, cookies)
  • Active tracking: A script or plugin executed in the browser to extract and collect extended information.
  • HTTPS does not prevent this.
  • Example info includes time spent on each page, window size, screen resolution, color depth, mouse movements, scrollbar location, installed fonts, plugins and extensions.
  • We focus on third-party tracking, but ask whether sites implementing HTTPS use less tracking themselves
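
As a rough illustration of the HTTPS bullet above, the sketch below splits one request into the parts a passive on-path observer can typically read. It assumes classic TLS, where the hostname is still exposed (for example via SNI or DNS). The function name `passive_view`, the URL and the header values are made up for illustration.

```python
from urllib.parse import urlsplit


def passive_view(url: str, headers: dict[str, str]) -> dict[str, object]:
    """Return what an on-path observer can typically read for one request.

    Assumption: classic TLS, where the hostname still leaks (e.g. via SNI
    or DNS) but the path, query string and headers are encrypted.
    """
    parts = urlsplit(url)
    visible: dict[str, object] = {"host": parts.hostname}
    if parts.scheme == "http":
        # Plain HTTP: the exact page and every header travel in the clear.
        visible["path"] = parts.path + ("?" + parts.query if parts.query else "")
        visible["headers"] = dict(headers)
    return visible


# Hypothetical request, for illustration only.
headers = {
    "User-Agent": "ExampleBrowser/1.0",
    "Accept-Language": "sv-SE",
    "Cookie": "id=123",
}
print(passive_view("http://www.example.se/news?id=7", headers))
print(passive_view("https://www.example.se/news?id=7", headers))
```

A script running inside the page is unaffected by this split, which is why active tracking works equally well over HTTPS.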

slide-13
SLIDE 13

This paper …

  • … presents measurement methodology and characterization of the current third-party tracking landscape
  • Third-party usage across a number of website classes, and a breakdown of the coverage of different tracker types
  • Aggregate analysis that combines the tracker services based on the organizations operating them, so as to gain insights into the big players' aggregate coverage
  • Try to answer whether websites that have adopted HTTPS are in fact more privacy conscious (on behalf of their users) and use less third-party tracking

slide-14
SLIDE 14

Methodology

  • Developed data collection tool
  • Headless PhantomJS browser
  • Visit the front page of a large number of sites
  • HTTP vs HTTPS (with and without www)
  • Measure redirects etc.
  • Process/execute scripts to build pages
  • No blocking
  • Extract URL, domain, and other info
  • Classify resources
  • Internal vs. external
  • Known trackers (using Disconnect.me)
  • Type of resource; e.g., advertising, analytics, content
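
The "HTTP vs HTTPS (with and without www)" step can be sketched as follows. This is only an assumption about how the four front-page variants might be enumerated. The slides do not show the tool's code, and `front_page_variants` is a hypothetical name.

```python
from itertools import product


def front_page_variants(domain: str) -> list[str]:
    """Build the four front-page URLs for one domain: HTTP/HTTPS x bare/www."""
    return [
        f"{scheme}://{prefix}{domain}/"
        for scheme, prefix in product(("http", "https"), ("", "www."))
    ]


for url in front_page_variants("example.se"):
    print(url)
```

Each variant would then be loaded in the headless browser, with redirects and requested resources recorded per variant.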
slide-15
SLIDE 15

Swedish perspective

  • Measurements performed from Sweden
  • Important and popular Swedish domains
  • Global baseline
slide-16
SLIDE 16

What are third-party resources?

  • A resource belonging to the origin's primary domain is called internal. Otherwise it's an external resource.
  • Assumption: Any external resource is a third-party resource.

Resource examples
  • Branded (videos, services, images)
  • Unbranded (fonts, useful scripts, images)
  • Ads (scripts, images, Flash)
  • Web beacons (hidden images, analytics scripts)

Domain examples
  • example.se (primary domain)
  • www.example.se (subdomain)
  • example.org (third-party domain)
  • doubleclick.net (known tracker domain)
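
A minimal sketch of the internal/external distinction, using the example domains from this slide. It assumes the primary domain is simply the last two labels of the hostname, which works for example.se but not for suffixes such as co.uk (a public-suffix list would be needed there). `KNOWN_TRACKERS`, `primary_domain` and `classify` are hypothetical names, not the authors' code.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for a tracker list; not real Disconnect data.
KNOWN_TRACKERS = {"doubleclick.net"}


def primary_domain(host: str) -> str:
    """Naive primary domain: the last two labels of the hostname.

    Assumption: good enough for example.se; suffixes such as co.uk
    need a public-suffix list instead.
    """
    return ".".join(host.lower().split(".")[-2:])


def classify(origin_url: str, resource_url: str) -> str:
    """Label a resource as internal, external, or known tracker."""
    origin = primary_domain(urlsplit(origin_url).hostname or "")
    resource = primary_domain(urlsplit(resource_url).hostname or "")
    if resource == origin:
        return "internal"
    if resource in KNOWN_TRACKERS:
        # Known trackers are a subset of the external resources.
        return "known tracker"
    return "external"


print(classify("http://example.se/", "http://www.example.se/logo.png"))
print(classify("http://example.se/", "https://example.org/font.woff"))
print(classify("http://example.se/", "https://ad.doubleclick.net/pixel.gif"))
```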

slide-19
SLIDE 19

Blocked domains on Disconnect.me

  • Many have few: 521 out of 980 organizations have 1 domain; 331 have 2 domains.
  • Some have many: Google has 271, Yahoo 71, AOL 40, Microsoft 32.

  • Spread over advertising, analytics, content
  • “Disconnect category”: Google, Facebook, Twitter
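
The "many have few, some have many" counts can be derived from an organization-to-domains mapping. The sketch below uses a made-up toy mapping (not the real Disconnect list) and assumed variable names, only to show the counting.

```python
from collections import Counter

# Toy organization-to-domains mapping; made up, not the real Disconnect list.
blocklist = {
    "OrgA": ["a-tracker.example"],
    "OrgB": ["b-ads.example"],
    "OrgC": ["c1.example", "c2.example"],
    "OrgD": ["d1.example", "d2.example", "d3.example"],
}

# How many organizations own exactly k domains?
orgs_by_domain_count = Counter(len(domains) for domains in blocklist.values())

# Organizations ranked by number of domains, largest first.
largest = sorted(blocklist, key=lambda org: len(blocklist[org]), reverse=True)

print(dict(orgs_by_domain_count))
print(largest[:2])
```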
slide-28
SLIDE 28

External third-party resources

  • Upper bound: Third parties typically have server logs and/or analytics software to record your online habits
  • Each third-party (external) resource leaks at least some info
  • External resource usage is high
  • Especially among the most popular domains (e.g., 93% use at least some)
  • HTTP and HTTPS results are similar (except for the random 100k .se domains)
slide-33
SLIDE 33

Known trackers

  • Lower bound
  • Use Disconnect’s tracker list (2,149 known domains: responsible for <10% external)
  • Only front page
  • Biggest differences: popular vs. less popular (e.g., advertising)
  • Popular domains have at least one known tracker in 95+% of cases
  • 70+% use at least 2; 10% more than 12; 1% allow 48

Global categories

slide-34
SLIDE 34

Known trackers

  • Lower bound
  • Use Disconnect’s tracker list (2,149 known domains: responsible for <10% external)
  • Only front page
  • Biggest differences: popular vs. less popular (e.g., advertising)
  • Popular domains have at least one known tracker in 95+% of cases
  • 70+% use at least 2; 10% more than 12; 1% allow 48
  • Other: Media worst (e.g., 50% have >7 trackers); content typically not blocked …

Swedish domain categories
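
Statements such as "70+% use at least 2" are shares of domains whose tracker count reaches a threshold. A minimal sketch of that calculation follows. The counts are made up rather than measured, and `share_with_at_least` is a hypothetical helper.

```python
def share_with_at_least(tracker_counts: list[int], threshold: int) -> float:
    """Fraction of domains whose front page loads at least `threshold` trackers."""
    if not tracker_counts:
        return 0.0
    return sum(count >= threshold for count in tracker_counts) / len(tracker_counts)


# Made-up tracker counts for ten domains; not measured data.
counts = [0, 1, 2, 2, 3, 5, 8, 13, 21, 48]

print(share_with_at_least(counts, 1))   # at least one known tracker
print(share_with_at_least(counts, 2))   # at least two
print(share_with_at_least(counts, 13))  # more than twelve
```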

slide-37
SLIDE 37

HTTP vs HTTPS

  • Small differences between HTTP and HTTPS
  • If anything, slightly higher for HTTPS …
  • You are just as tracked by third parties when using HTTPS as when using HTTP

slide-38
SLIDE 38

The big players

  • Google has 90+% coverage in popular domains
  • Even higher than the Disconnect category (Google owns domains outside that category)

  • Facebook and Twitter far behind

Swedish domain categories Global categories
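
Organization-level coverage can be sketched by mapping each tracker domain to its operator and counting every organization at most once per site. The data below is a toy example and the function name is hypothetical. It illustrates the aggregation idea and is not the authors' implementation.

```python
# Toy data, made up for illustration: tracker domain -> operating organization.
DOMAIN_OWNER = {
    "doubleclick.net": "Google",
    "google-analytics.com": "Google",
    "facebook.net": "Facebook",
}

# Tracker domains seen on each visited site's front page (hypothetical).
site_trackers = {
    "site-a.example": {"doubleclick.net", "facebook.net"},
    "site-b.example": {"google-analytics.com"},
    "site-c.example": set(),
    "site-d.example": {"doubleclick.net", "google-analytics.com"},
}


def organization_coverage(sites, owners):
    """Share of sites that load at least one domain from each organization."""
    hits = {}
    for domains in sites.values():
        # Count each organization once per site, however many domains it uses.
        for org in {owners[d] for d in domains if d in owners}:
            hits[org] = hits.get(org, 0) + 1
    return {org: n / len(sites) for org, n in hits.items()}


print(organization_coverage(site_trackers, DOMAIN_OWNER))
```

Grouping by organization rather than by domain is what lets an operator with many domains show higher coverage than any single one of its domains.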

slide-39
SLIDE 39

Conclusions

  • Measurement framework for automated, repeatable data collection of websites (tools made public)
  • Analysis of the third-party tracking landscape
  • Swedish perspective vs global baseline
  • Across domain categories
  • Breakdown based on tracker types
  • HTTP and HTTPS
  • HTTPS domains use at least as much (if not more) third-party tracking

slide-40
SLIDE 40

Niklas Carlsson (niklas.carlsson@liu.se)

Research overview and pubs: www.ida.liu.se/~nikca/

Thanks for listening!

Third-party Tracking on the Web: A Swedish Perspective