Tracing Cross Border Web Tracking Costas Iordanou Georgios - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Tracing Cross Border Web Tracking

Costas Iordanou Georgios Smaragdakis Ingmar Poese Nikolaos Laoutaris

slide-2
SLIDE 2

Web advertising fuels the web

slide-3
SLIDE 3

The rise of targeted ads

Why targeted ads?

  • Users get relevant ads
  • Increased user engagement
  • More efficient ad campaigns
  • Higher ROI for the advertisers
  • Better use of resources
  • Etc.

How does it work?

  • Tracking and profiling users
  • Real-time auctions of ads (real-time bidding, RTB)
  • Cookie synchronization
  • Etc.

Example: a user types in “used cars for sale”

slide-4
SLIDE 4

The reaction of users and regulators

Browsers, Regulators, Users, Browser extensions

slide-5
SLIDE 5

Users' and regulators' reaction

Browsers, Regulators, Users, Browser extensions

slide-6
SLIDE 6

General Data Protection Regulation - Details

One of the biggest changes with respect to privacy and regulation on the web in the last few years (enforcement date: 25 May 2018).

In general, the new legislation tries to regulate:

  • 1. how users’ data are collected, processed and stored, and
  • 2. whether they include any sensitive information about the user
slide-7
SLIDE 7

General Data Protection Regulation - Details

  • Implementation: per member state Data Protection Authority (DPA)
  • DPA: responsible for complaints, investigations & enforcement
  • Investigation starting point: the location of the entry-point servers of ad & tracking flows

RQ: How can we identify the physical locations of such servers?

slide-8
SLIDE 8
Challenges

  • 1. How to effectively detect ad and tracking related domains in the wild?
  • 2. How to ensure correct geolocation of infrastructure servers?
slide-9
SLIDE 9
Challenges

  • 1. How to effectively detect ad and tracking related domains in the wild?
  • 2. How to ensure correct geolocation of infrastructure servers?
  • 3. How to ensure that all possible ad and tracking servers are observed?
  • 4. How to maintain a balance between accuracy and scalability?
slide-10
SLIDE 10

Why real users instead of just Web crawling?

  • Real users
  • User interaction

slide-11
SLIDE 11

Why real users instead of just Web crawling?

  • Real users
  • User interaction
  • Geo load balancing

slide-12
SLIDE 12

Mapping 3rd party domains to IPs

Chrome Browser Extension

Visited page: http://www.example.com

Chrome API event listeners:

  • chrome.webRequest.onBeforeSendHeaders: tracker.com, analytics.com, …
  • chrome.webRequest.onCompleted: tracker.com 213.121.66.99

Mapping Table - example.com

Domain          IP
tracker.com     213.121.66.99
analytics.com
…

slide-13
SLIDE 13

Identify Ad and Tracking related domains

  • ABP Parser (easylist, easyprivacy)
  • Correction Script (custom keywords)

slide-14
SLIDE 14

Identify Ad and Tracking related domains

  • Input: url 1 + meta data, url 2 + meta data, url 3 + meta data, …
  • ABP Parser (easylist, easyprivacy): Should block? YES → AD + Tracking Domains; NO → Correction Script
  • Correction Script (custom keywords): Ad + Tracking related? YES → AD + Tracking Domains; NO → not ad or tracking related
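The two-stage decision on this slide can be sketched in Python. This is a minimal illustration, not the authors' ABP Parser or Correction Script: the rule list, the keyword list, and all function names are hypothetical, and only the simplest `||domain^` filter form is handled, whereas real easylist/easyprivacy rules are far richer.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins; a real run would load easylist/easyprivacy.
BLOCKLIST_RULES = ["||tracker.com^", "||analytics.com^"]
CUSTOM_KEYWORDS = ("pixel", "adserver", "beacon")  # illustrative keywords only


def rule_domains(rules):
    """Extract domains from simple '||domain^' rules; other rule types are ignored."""
    return {r[2:-1] for r in rules if r.startswith("||") and r.endswith("^")}


def host_matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify(url, domains, keywords=CUSTOM_KEYWORDS):
    """Stage 1: blocklist match. Stage 2: keyword check on what stage 1 missed."""
    host = urlparse(url).hostname or ""
    if host_matches(host, domains):
        return "ad+tracking"
    if any(k in url.lower() for k in keywords):
        return "ad+tracking"
    return "clean"


domains = rule_domains(BLOCKLIST_RULES)
print(classify("http://sub.tracker.com/x", domains))
print(classify("http://example.com/index.html", domains))
```

The second stage only sees URLs that the blocklist did not flag, which mirrors the YES/NO branches above.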

slide-15
SLIDE 15
Challenges

  • 1. How to effectively detect ad and tracking related domains in the wild?
  • 2. How to ensure correct geolocation of infrastructure servers?
  • 3. How to ensure that all possible ad and tracking servers are observed?
  • 4. How to maintain a balance between accuracy and scalability?
slide-16
SLIDE 16

Accurate geo-location of server IPs

prefix            region          service
46.51.128.0/18    eu-west-1       AMAZON
46.51.216.0/21    ap-southeast-1  AMAZON
13.73.232.0/21    japaneast       AZURE
20.19.14.128/25   koreacentral    AZURE
…                 …               …

Region maps:

  • eu-west-1: Ireland, Ireland
  • ap-southeast-1: Singapore, Singapore

RIPE IPmap validation process - infrastructure servers IPs: 99.6% match with the reported country
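A lookup against such a prefix table can be sketched with Python's standard `ipaddress` module. This is a minimal sketch under assumptions: the function name `lookup` and the returned dictionary shape are hypothetical, the table holds only the four rows shown on the slide, and only the two region-to-country entries given on the slide are mapped (other regions return `None` for the country).

```python
import ipaddress

# Rows copied from the slide: (prefix, region, service).
PREFIXES = [
    ("46.51.128.0/18", "eu-west-1", "AMAZON"),
    ("46.51.216.0/21", "ap-southeast-1", "AMAZON"),
    ("13.73.232.0/21", "japaneast", "AZURE"),
    ("20.19.14.128/25", "koreacentral", "AZURE"),
]
# Only the region mappings stated on the slide are included.
REGION_COUNTRY = {"eu-west-1": "Ireland", "ap-southeast-1": "Singapore"}


def lookup(ip):
    """Return region/service/country for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, region, service in PREFIXES:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, region, service)
    if best is None:
        return None
    _, region, service = best
    return {"region": region, "service": service,
            "country": REGION_COUNTRY.get(region)}


print(lookup("46.51.130.1"))
print(lookup("8.8.8.8"))
```

Preferring the longest matching prefix is a common convention when published ranges nest; the four sample rows here do not overlap.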

slide-17
SLIDE 17
Challenges

  • 1. How to effectively detect ad and tracking related domains in the wild?
  • 2. How to ensure correct geolocation of infrastructure servers?
  • 3. How to ensure that all possible ad and tracking servers are observed?
  • 4. How to maintain a balance between accuracy and scalability?
slide-18
SLIDE 18

Avoiding pitfalls…

  • Identify all domains behind each IP (Reverse DNS query)

Query: https://freeapi.robtex.com/pdns/reverse/93.184.216.34

Response:

rrname:example.org, rrdata:93.184.216.34, rrtype:A, time_first:1440526884, time_last:1535919774, count:18
rrname:www.example.org, rrdata:93.184.216.34, rrtype:A, time_first:1440723354, time_last:1527899734, count:18
rrname:www.example.com, rrdata:93.184.216.34, rrtype:A, time_first:1441108386, time_last:1535371292, count:18
rrname:www.example.net, rrdata:93.184.216.34, rrtype:A, time_first:1436692690, time_last:1527900018, count:18
rrname:imrek.org, rrdata:93.184.216.34, rrtype:A, time_first:1440827324, time_last:1508103356, count:18
rrname:example.net, rrdata:93.184.216.34, rrtype:A, time_first:1440526998, time_last:1533895598, count:18

slide-19
SLIDE 19

Avoiding pitfalls…

  • Identify all IPs for each domain (Forward DNS query)

Query: https://freeapi.robtex.com/pdns/forward/example.com

Response:

rrname:example.com, rrdata:2606:280::::::1946, rrtype:AAAA, time_first:1441278890, time_last:1535952170, count:18
rrname:example.com, rrdata:93.184.216.34, rrtype:A, time_first:1441278890, time_last:1535952170, count:18
rrname:example.com, rrdata:208.77.188.166, rrtype:A, time_first:1246678898, time_last:1246678898, count:1

slide-20
SLIDE 20

Avoiding pitfalls…

  • Identify all IPs for each domain (Forward DNS query)

Query: https://freeapi.robtex.com/pdns/forward/example.com

Response:

rrname:example.com, rrdata:a.iana-servers.net, rrtype:NS, time_first:1246678898, time_last:1535952170, count:2
rrname:example.com, rrdata:b.iana-servers.net, rrtype:NS, time_first:1246678898, time_last:1535952170, count:2
rrname:example.com, rrdata:2606:280::::::1946, rrtype:AAAA, time_first:1441278890, time_last:1535952170, count:18
rrname:example.com, rrdata:93.184.216.34, rrtype:A, time_first:1441278890, time_last:1535952170, count:18
rrname:example.com, rrdata:208.77.188.166, rrtype:A, time_first:1246678898, time_last:1246678898, count:1
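Records in the form displayed above can be parsed and filtered with a few lines of Python. This is a sketch under assumptions: it parses the `key:value, key:value` layout as printed on the slide (the live API may return a different wire format), it makes no network calls, and the function names and the optional recency cutoff `min_time_last` are hypothetical additions for illustration.

```python
def parse_record(line):
    """Parse one 'key:value, key:value' record as displayed on the slide."""
    fields = {}
    for part in line.split(", "):
        # Split on the first colon only, so IPv6 rrdata keeps its colons.
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    for key in ("time_first", "time_last", "count"):
        fields[key] = int(fields[key])
    return fields


def addresses(lines, rrtypes=("A", "AAAA"), min_time_last=0):
    """Return rrdata of address records last seen at or after min_time_last."""
    records = (parse_record(line) for line in lines)
    return [r["rrdata"] for r in records
            if r["rrtype"] in rrtypes and r["time_last"] >= min_time_last]


# Three of the forward-query records shown on the slide.
FORWARD = [
    "rrname:example.com, rrdata:a.iana-servers.net, rrtype:NS, "
    "time_first:1246678898, time_last:1535952170, count:2",
    "rrname:example.com, rrdata:93.184.216.34, rrtype:A, "
    "time_first:1441278890, time_last:1535952170, count:18",
    "rrname:example.com, rrdata:208.77.188.166, rrtype:A, "
    "time_first:1246678898, time_last:1246678898, count:1",
]

print(addresses(FORWARD))
print(addresses(FORWARD, min_time_last=1500000000))
```

The NS record is skipped because it does not carry an address, and the cutoff drops the record that was seen only once in 2009.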

slide-21
SLIDE 21

Joining everything together

Inputs: browser extension with real users, the per-site mapping table, ABP Parser & Correction Script, and RIPE IPmap (https://ipmap.ripe.net/).

Mapping Table - example.com

Domain          IP
tracker.com     213.121.66.99
analytics.com   130.12.88.110
…               …

Source country   3rd party flow       Mapping IP(s)    Filtering       Destination country
Spain            http://tracker.com   213.121.66.99    Ad + Tracking   Germany
France           http://example.com   145.100.210.5    Clean           USA
…                …                    …                …               …
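Once each flow carries a source country, a filtering label and a destination country, a confinement percentage follows directly. The sketch below is one plausible way to compute it, not the authors' code: the `Flow` class and `confinement` function are hypothetical names, the first two rows come from the slide, and the third row's destination country is invented for illustration.

```python
from dataclasses import dataclass

EU28 = {
    "Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czech Republic",
    "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary",
    "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta",
    "Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia",
    "Spain", "Sweden", "United Kingdom",
}


@dataclass
class Flow:
    source_country: str
    url: str
    ip: str
    label: str                # "Ad + Tracking" or "Clean"
    destination_country: str


def confinement(flows, region=EU28):
    """Percentage of ad/tracking flows starting in region that also end in it."""
    tracked = [f for f in flows
               if f.label == "Ad + Tracking" and f.source_country in region]
    if not tracked:
        return None
    inside = sum(f.destination_country in region for f in tracked)
    return 100.0 * inside / len(tracked)


flows = [
    Flow("Spain", "http://tracker.com", "213.121.66.99", "Ad + Tracking", "Germany"),
    Flow("France", "http://example.com", "145.100.210.5", "Clean", "USA"),
    # Hypothetical row: the destination country is assumed for illustration.
    Flow("Spain", "http://analytics.com", "130.12.88.110", "Ad + Tracking", "USA"),
]
print(confinement(flows))
```

Flows labelled "Clean" are excluded before the ratio is taken, so only ad and tracking traffic counts towards confinement.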

slide-22
SLIDE 22

Results - EU 28 member states confinement level

[Figure: confinement level per EU 28 member state, based on MaxMind geo-location]

slide-23
SLIDE 23

Results - EU 28 member states confinement level

[Figure: confinement level per EU 28 member state, MaxMind geo-location vs. RIPE IPmap geo-location]

slide-24
SLIDE 24

What about sensitive websites?

Sensitive categories as defined by GDPR:

  • Political beliefs
  • Race & ethnicity
  • Health
  • Sexual orientation
  • Religion
  • Genetic & biometric data

slide-25
SLIDE 25

Results - Sensitive websites based on EU 28 users

[Figure: destination continent per sensitive category]

slide-26
SLIDE 26
Challenges

  • 1. How to effectively detect ad and tracking related domains in the wild?
  • 2. How to ensure correct geolocation of infrastructure servers?
  • 3. How to ensure that all possible ad and tracking servers are observed?
  • 4. How to maintain a balance between accuracy and scalability?
slide-27
SLIDE 27


Scaling up – From real users to ISP flows

slide-28
SLIDE 28

Scaling up – From real users to ISP flows

Datasets: list of Ad + Tracking IPs (< 28k IPs) + ISPs datasets

slide-29
SLIDE 29

Scaling up – From real users to ISP flows

Datasets: list of Ad + Tracking IPs (< 28k IPs) + ISPs datasets

Four 24h daily snapshots:

  • 1. Wednesday, Nov. 8, 2017
  • 2. Wednesday, Apr. 4, 2018
  • 3. Wednesday, May 16, 2018
  • 4. Wednesday, June 20, 2018

slide-30
SLIDE 30

Scaling up – Continent level ISPs results

Destination of sampled tracking flows per ISP and snapshot (Nov 8 is 2017; the other snapshots are 2018):

ISP            Snapshot   #Sampled tracking flows (millions)   EU28    North America   Rest Europe   Asia   Rest World
DE-Broadband   Nov 8      1,057.0                              88.5%   10%             <1%           <1%    <1%
DE-Broadband   April 4    1,200.8                              87.7%   9.3%            1.7%          <1%    <1%
DE-Broadband   May 16     1,105.3                              86.5%   9.2%            2.9%          <1%    <1%
DE-Broadband   June 20    963.4                                88.3%   8.4%            1.8%          <1%    <1%
DE-Mobile      Nov 8      70.4                                 91.1%   6.9%            <1%           <1%    <1%
DE-Mobile      April 4    77.4                                 90.8%   6.6%            2%            <1%    <1%
DE-Mobile      May 16     70.8                                 89.9%   6.4%            3.1%          <1%    <1%
DE-Mobile      June 20    74.5                                 92.5%   5.1%            1.3%          <1%    <1%
PL             Nov 8      13.8                                 77.5%   19.8%           1.9%          <1%    <1%
PL             April 4    13.8                                 75.6%   21.5%           1.9%          <1%    <1%
PL             May 16     12.4                                 74.7%   22%             1.7%          <1%    1.1%
PL             June 20    11.9                                 75%     21.3%           3.4%          <1%    <1%
HU             Nov 8      43.3                                 89.5%   10.2%           <1%           <1%    <1%
HU             April 4    50.2                                 93.1%   6.3%            <1%           <1%    <1%
HU             May 16     39.3                                 92.4%   7%              <1%           <1%    <1%
HU             June 20    33.6                                 91.6%   7.7%            <1%           <1%    <1%

slide-31
SLIDE 31

Scaling up – Continent level ISPs results

[Figure: timeline of EU28 confinement (%), November 2017 to June 2018, with the GDPR activation date (May 25th) marked; snapshot shown: Nov 8]

slide-32
SLIDE 32

Scaling up – Continent level ISPs results

[Figure: timeline of EU28 confinement (%), November 2017 to June 2018, with the GDPR activation date (May 25th) marked; snapshots shown: Nov 8, April 4]

slide-33
SLIDE 33

Scaling up – Continent level ISPs results

[Figure: timeline of EU28 confinement (%), November 2017 to June 2018, with the GDPR activation date (May 25th) marked; snapshots shown: Nov 8, April 4, May 16]

slide-34
SLIDE 34

Scaling up – Continent level ISPs results

[Figure: timeline of EU28 confinement (%), November 2017 to June 2018, with the GDPR activation date (May 25th) marked; snapshots shown: Nov 8, April 4, May 16, June 20]

slide-35
SLIDE 35

Country level confinements

[Figure: country level confinement, ISPs dataset of April 4th]

slide-36
SLIDE 36

Can we further improve localization?

Two approaches:

  • 1. Using DNS optimization

Group server IPs (locations) based on:
a) Fully Qualified Domain Names (FQDN), e.g., sub_d.tracker.com
b) Top Level Domain plus one (TLD+1), e.g., tracker.com

  • 2. Using PoP Mirroring

Deploy/migrate PoP servers based on the availability of cloud services datacenters
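The difference between the two DNS grouping granularities can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the function names, the observation tuples, and the naive TLD+1 rule (last two labels), which real code would replace with a Public Suffix List lookup. It only checks whether some already-observed server for the same group lies inside the target region, which is the precondition for redirecting a flow there.

```python
from collections import defaultdict


def tld_plus_one(fqdn):
    """Naive TLD+1: the last two labels. Real code should use the Public Suffix List."""
    return ".".join(fqdn.split(".")[-2:])


def identity(fqdn):
    """Grouping key for the FQDN policy: the name itself."""
    return fqdn


def group_locations(observations, key=identity):
    """Map each group key to the set of countries where its servers were seen."""
    groups = defaultdict(set)
    for fqdn, country in observations:
        groups[key(fqdn)].add(country)
    return groups


def could_confine(fqdn, region, groups, key=identity):
    """True if any server in the same group is located inside region."""
    return bool(groups.get(key(fqdn), set()) & region)


# Hypothetical observations: (FQDN, country of the server IP).
observations = [("a.tracker.com", "USA"), ("b.tracker.com", "Germany")]
region = {"Germany"}

by_fqdn = group_locations(observations)
by_tld1 = group_locations(observations, key=tld_plus_one)
print(could_confine("a.tracker.com", region, by_fqdn))
print(could_confine("a.tracker.com", region, by_tld1, key=tld_plus_one))
```

Grouping by TLD+1 pools the servers of all subdomains, so it can only find at least as many in-region alternatives as grouping by FQDN.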

slide-37
SLIDE 37

EU 28 localization improvement

Overall confinement percentage per optimization policy:

Optimization policy             Country   EU 28
Default                         27.6%     88%
FQDN DNS (DNS Redirection)      52.15%    93.53%

slide-38
SLIDE 38

EU 28 localization improvement

Overall confinement percentage per optimization policy:

Optimization policy             Country   EU 28
Default                         27.6%     88%
FQDN DNS (DNS Redirection)      52.15%    93.53%
TLD+1 DNS (DNS Redirection)     66.13%    98.33%

slide-39
SLIDE 39

EU 28 localization improvement

Overall confinement percentage per optimization policy:

Optimization policy             Country   EU 28
Default                         27.6%     88%
FQDN DNS (DNS Redirection)      52.15%    93.53%
TLD+1 DNS (DNS Redirection)     66.13%    98.33%
PoP Mirroring                   30.79%    92.09%
TLD+1 & PoP Mirroring           68.12%    99.20%

slide-40
SLIDE 40

In the paper

  • Details on the methodology
  • More results
slide-41
SLIDE 41

Main takeaways

  • 1. ≈90% of tracking flows from EU 28 terminate within EU 28
  • 2. An incorrect geolocation approach can totally flip the results
  • 3. Country level confinement is correlated with the IT infrastructure
  • 4. DNS redirection & PoP Mirroring can improve confinement levels
  • 5. ≈3% of the tracking flows are in sensitive categories
slide-42
SLIDE 42

Tracing Cross Border Web Tracking

Costas Iordanou email: costas@ima.tu-berlin.de