slide-1
SLIDE 1

Tracing Information Flows Between Ad Exchanges Using Retargeted Ads

Muhammad Ahmad Bashir, Sajjad Arshad, William Robertson, Christo Wilson Northeastern University

slide-2
SLIDE 2

Your Privacy Footprint

2

slide-11
SLIDE 11

Real Time Bidding

  • RTB brings more flexibility to the ad ecosystem.
  • Each ad request is managed by an ad exchange, which holds an auction.
  • Advertisers bid on each ad impression.
  • RTB spending is projected to cross $20B by 2017 [1].
  • 49% annual growth.
  • RTB will account for 80% of US display ad spending by 2022.

3

[1] http://www.prnewswire.com/news-releases/new-idc-study-shows-real-time-bidding-rtb-display-ad-spend-to-grow-worldwide-to-208-billion-by-2017-228061051.html

[Diagram labels: Exchange, Advertiser]

Cookie matching is a prerequisite.

slide-18
SLIDE 18

Real Time Bidding (RTB)

4

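1) User → Publisher: GET, CNN’s Cookie
2) User → Ad Exchange: GET, DoubleClick’s Cookie
3) Ad Exchange → Advertisers: Solicit bids, DoubleClick’s Cookie
4) Advertisers → Ad Exchange: Bid
5) User → winning Advertiser: GET, RightMedia’s Cookie; the Advertisement is returned

Advertisers cannot read their own cookies during the auction!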
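The flow above can be sketched as a toy simulation. The bid request carries only the exchange's identifier (DoubleClick's cookie), so a bidder can recognize the user only if it already holds a mapping from that identifier to its own cookie. The `Bidder` class, the match-table layout, and the bid amounts are hypothetical. The rule that the highest bid wins is a simplifying assumption and does not describe any real exchange's auction format.

```python
from dataclasses import dataclass, field


@dataclass
class Bidder:
    name: str
    # Hypothetical match table: exchange cookie -> this bidder's own cookie.
    match_table: dict = field(default_factory=dict)
    # Users (keyed by the bidder's own cookie) this bidder wants to retarget.
    retarget_list: set = field(default_factory=set)

    def bid(self, exchange_cookie: str) -> float:
        own_id = self.match_table.get(exchange_cookie)
        if own_id is None:
            return 0.0  # user not recognized: no informed bid
        # Illustrative prices only.
        return 2.50 if own_id in self.retarget_list else 0.10


def run_auction(exchange_cookie: str, bidders: list) -> tuple:
    # The exchange forwards only its own cookie in the bid request;
    # bidders never see the cookies they set themselves.
    bids = {b.name: b.bid(exchange_cookie) for b in bidders}
    winner = max(bids, key=bids.get)  # simplifying assumption: highest bid wins
    return winner, bids


rightmedia = Bidder("rightmedia", {"12345": "ABCDE"}, {"ABCDE"})
unmatched = Bidder("unmatched")
print(run_auction("12345", [rightmedia, unmatched]))
# ('rightmedia', {'rightmedia': 2.5, 'unmatched': 0.0})
```

The bidder without a mapping can only bid blind. That gap is the one cookie matching is meant to close.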

slide-22
SLIDE 22

Cookie Matching

Key problem: Advertisers cannot read their cookies in the RTB auction

  • How can they submit reasonable bids if they cannot identify the user?

Solution: cookie matching

  • Also known as cookie synching
  • Process of linking the identifiers used by two ad exchanges

5

1) Browser → DoubleClick: GET, Cookie=12345
2) DoubleClick → Browser: 301 Redirect, Location=http://criteo.com/?dblclk_id=12345
3) Browser → Criteo: GET ?dblclk_id=12345, Cookie=ABCDE
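The redirect-based match in this example can be sketched in a few lines. The IDs and the Criteo URL come from the slide. The two handler functions are hypothetical stand-ins for the ad servers, and passing cookies as plain arguments replaces real `Cookie` headers for brevity.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def doubleclick_handle(cookie_id: str):
    # Answer the user's request with a 301 that carries DoubleClick's ID
    # to Criteo as a URL parameter.
    location = "http://criteo.com/?" + urlencode({"dblclk_id": cookie_id})
    return 301, {"Location": location}


# Hypothetical match table kept by Criteo: DoubleClick ID -> Criteo cookie.
criteo_matches = {}


def criteo_handle(url: str, criteo_cookie: str) -> int:
    # The browser follows the redirect and attaches Criteo's own cookie,
    # so Criteo sees both identifiers in one request and can link them.
    dblclk_id = parse_qs(urlparse(url).query)["dblclk_id"][0]
    criteo_matches[dblclk_id] = criteo_cookie
    return 200


status, headers = doubleclick_handle("12345")
criteo_handle(headers["Location"], "ABCDE")
print(criteo_matches)  # {'12345': 'ABCDE'}
```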

slide-26
SLIDE 26

Prior Work

  • Several studies have examined cookie matching
  • Acar et al. found hundreds of domains passing identifiers to each other
  • Olejnik et al. found 125 exchanges matching cookies
  • Falahrastegar et al. analyzed clusters of exchanges that share the exact same cookies

  • These studies rely on studying HTTP requests/responses.

6

slide-30
SLIDE 30

Challenge 1: Server Side Matching

7

1) Criteo observes the user. (IP: 207.91.160.7)

2) RightMedia observes the user. (IP: 207.91.160.7)

Behind the scenes, RightMedia and Criteo sync up. (IP: 207.91.160.7)

Because this sync happens server-to-server, it never appears in the browser's HTTP traffic.

slide-35
SLIDE 35

Challenge 2: Obfuscation

8

Page: amazon.com
Included script: dbclk.js
Request: GET %^$ck#&93#&, Cookie=XYZYX (the shared identifier is obfuscated)
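The following sketch shows why obfuscation defeats HTTP-based heuristics. The toy check flags a request when a known cookie value appears verbatim in its URL. The XOR-and-base64 encoding, the shared key, and the `partner.example` URL are all hypothetical stand-ins for whatever a tracker might actually do. The encoded ID no longer matches the raw value, but a partner holding the key can still recover it.

```python
import base64


def leaks_cookie(request_url: str, cookie_value: str) -> bool:
    # Naive heuristic in the spirit of "E" (exact cookie shared):
    # does the raw identifier appear verbatim in the request?
    return cookie_value in request_url


def _xor(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def obfuscate(cookie_value: str, key: bytes) -> str:
    # Hypothetical reversible encoding: XOR with a shared key, then base64.
    return base64.urlsafe_b64encode(_xor(cookie_value.encode(), key)).decode()


def deobfuscate(token: str, key: bytes) -> str:
    # A partner holding the same key can still recover the identifier.
    return _xor(base64.urlsafe_b64decode(token), key).decode()


KEY = b"k3y"  # hypothetical shared secret
cookie = "XYZYX"
plain_url = "http://partner.example/sync?uid=" + cookie
hidden_url = "http://partner.example/sync?uid=" + obfuscate(cookie, KEY)

print(leaks_cookie(plain_url, cookie))   # True
print(leaks_cookie(hidden_url, cookie))  # False
print(deobfuscate(hidden_url.split("uid=")[1], KEY))  # XYZYX
```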

slide-40
SLIDE 40

Goal

Develop a method to identify information flows (cookie matching) between ad exchanges

  • Mechanism agnostic: resilient to obfuscation
  • Platform agnostic: detect sharing on the client- and server-side

9

slide-44
SLIDE 44

Key Insight: Use Retargeted Ads

Retargeted ads are the most highly targeted form of online ads

10

Key insight: because retargets are so specific, they can be used to conduct controlled experiments

  • Information must be shared between ad exchanges to serve retargeted ads

[Figure: example retargeted ad, $15.99]

slide-45
SLIDE 45

Contributions

  • 1. Novel methodology for identifying information flows between ad exchanges
  • 2. Demonstrate the impact of ad network obfuscation in practice
      • 31% of cookie matching partners cannot be identified using heuristics
  • 3. Develop a method to categorize information sharing relationships
  • 4. Use graph analysis to infer the roles of actors in the ad ecosystem

11

slide-47
SLIDE 47

Outline: Data Collection | Classifying Ad Network Flows | Results

12

slide-52
SLIDE 52

Using Retargets as an Experimental Tool

13

Key observation: retargets are only served under very specific circumstances

1) Advertiser observes the user at a shop
2) Advertiser and the exchange must have matched cookies

This implies a causal flow of information from Exchange → Advertiser.

slide-60
SLIDE 60

Data Collection Overview

14

Single persona:

  • Visit persona: 10 websites/persona, 10 products/website
  • Visit publishers: 150 publishers, 15 pages/publisher
  • Store images, inclusion chains, HTTP requests/responses

90 personas in total: 571,636 images

  • Ad detection, then filter out images which appeared in > 1 persona: 31,850 potential targeted ads
  • Crowd sourcing: isolated retargeted ads
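The "> 1 persona" filter can be sketched as follows. It assumes each observation is recorded as a (persona, image) pair, which is a hypothetical data layout. Personas visited non-overlapping retailers, so an image seen by more than one persona cannot be a retarget and is dropped.

```python
from collections import defaultdict


def single_persona_images(observations):
    """Keep images observed by exactly one persona.

    observations: iterable of (persona_id, image_id) pairs (hypothetical format).
    """
    personas_per_image = defaultdict(set)
    for persona, image in observations:
        personas_per_image[image].add(persona)
    return {img for img, ps in personas_per_image.items() if len(ps) == 1}


obs = [
    ("p1", "shoe_ad"), ("p1", "shoe_ad"),  # same persona twice: still a candidate
    ("p1", "bank_ad"), ("p2", "bank_ad"),  # shown to two personas: dropped
    ("p3", "lamp_ad"),
]
print(sorted(single_persona_images(obs)))  # ['lamp_ad', 'shoe_ad']
```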

slide-62
SLIDE 62

Crowd Sourcing

15

We used Amazon Mechanical Turk (AMT) to label 31,850 ads.

  • 1,142 tasks in total.
  • 30 ads per task:
      • 27 unlabeled.
      • 3 labeled by us.
  • 2 workers per ad.
  • $415 spent.
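The labels could be aggregated as in the sketch below. The slides state that each task mixed 27 unlabeled ads with 3 ads labeled by us, that each ad went to 2 workers, and that ads identified as targeted were also inspected manually. Two rules are assumptions made for illustration: a task is discarded if the worker misses any control, and a single worker's flag is enough to send an ad to manual review. The data format is also hypothetical.

```python
def passes_controls(labels: dict, controls: set) -> bool:
    # labels maps ad_id -> True if the worker marked it as a retarget.
    return all(labels.get(ad) is True for ad in controls)


def ads_for_manual_review(task_results: list, controls: set) -> set:
    """task_results: one dict of ad_id -> bool per completed task (assumed format)."""
    flagged = set()
    for labels in task_results:
        if not passes_controls(labels, controls):
            continue  # assumed policy: ignore workers who miss a control image
        flagged |= {ad for ad, is_retarget in labels.items()
                    if is_retarget and ad not in controls}
    # Any ad marked as a retarget by at least one accepted worker is reviewed by hand.
    return flagged


controls = {"c1", "c2", "c3"}
worker_a = {"c1": True, "c2": True, "c3": True, "ad1": True, "ad2": False}
worker_b = {"c1": True, "c2": True, "c3": True, "ad1": False, "ad2": False}
careless = {"c1": False, "c2": True, "c3": True, "ad2": True}
print(ads_for_manual_review([worker_a, worker_b, careless], controls))  # {'ad1'}
```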
slide-66
SLIDE 66

Final Dataset

5,102 unique retargeted ads

  • From 281 distinct online retailers

35,448 publisher-side chains that served the retargets

  • We observed some retargets multiple times

16

slide-67
SLIDE 67

Outline: Data Collection | Classifying Ad Network Flows | Results

17

slide-73
SLIDE 73

A Look at Publisher Chains

18

[Figure: example shopper-side chain and publisher-side chain]

  • How does Criteo know to serve an ad on BBC?
  • In this case, the answer is trivial.
  • Criteo observed us directly on the shopper-side (retailer) site.
  • Can we classify all such publisher-side chains?
slide-86
SLIDE 86

What is a Chain?

19

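^pub .* e a$

[Figure: example chain with nodes labeled a and e]

Notation: chains are written as patterns over their nodes. pub and shop mark publisher-side and shopper-side chains, e denotes an ad exchange, a denotes the advertiser that serves the retarget, and .* matches any sequence of other nodes.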
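The slide notation can be turned into ordinary regular expressions by writing a chain as a space-separated sequence of node labels. This is a minimal sketch that assumes each domain has already been mapped to a symbol: `a` for the advertiser, `e` for the exchange, and `x` as a hypothetical label for any other node. It treats `.*` as "zero or more whole nodes", which is one reading of the slide notation. Bracket rules such as `[^ea]` are out of scope here.

```python
import re


def to_regex(slide_rule: str) -> re.Pattern:
    """Translate rules like '^pub .* e a$' into a token-level regex.

    Each space-separated token must match one whole chain node, and '.*'
    stands for zero or more arbitrary nodes. Bracket rules such as [^ea]
    are not handled by this sketch.
    """
    tokens = slide_rule.strip("^$").split()
    parts = [re.escape(tokens[0])]  # chain type marker: 'pub' or 'shop'
    for tok in tokens[1:]:
        parts.append(r"(?: \S+)*" if tok == ".*" else " " + re.escape(tok))
    return re.compile("^" + "".join(parts) + "$")


rule = to_regex("^pub .* e a$")
print(bool(rule.match("pub x x e a")))  # True: e directly includes a at the end
print(bool(rule.match("pub e x a")))    # False: another node sits between e and a
```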

slide-87
SLIDE 87

Four Classifications

Four possible ways for a retargeted ad to be served

1. Direct (Trivial) Matching
2. Cookie Matching
3. Indirect Matching
4. Latent (Server-side) Matching

20

slide-92
SLIDE 92

1) Direct (Trivial) Matching

21

  • Shopper-side example rule: ^shop .* a .*$
      • a must appear on the shopper-side, but other trackers may also appear
  • Publisher-side example rule: ^pub a$
      • a is the advertiser that serves the retarget
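Read as code, the direct-matching rule checks two conditions. This minimal sketch assumes chains are lists of domains that omit the pub/shop marker and that all shopper-side chains from the persona's retailer visits are available. `tracker.example` is a placeholder domain.

```python
def is_direct_match(shopper_chains: list, publisher_chain: list) -> bool:
    """Direct (trivial) match: publisher side ^pub a$, shopper side ^shop .* a .*$.

    Chains are lists of domains that omit the pub/shop marker (assumed layout).
    """
    if len(publisher_chain) != 1:
        return False  # ^pub a$: the publisher includes a with nothing in between
    advertiser = publisher_chain[0]
    # a must appear in some shopper-side chain; other trackers may appear too.
    return any(advertiser in chain for chain in shopper_chains)


shopper = [["tracker.example"], ["tracker.example", "criteo.com"]]
print(is_direct_match(shopper, ["criteo.com"]))                     # True
print(is_direct_match(shopper, ["doubleclick.net", "criteo.com"]))  # False
```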

slide-97
SLIDE 97

2) Cookie Matching

22

  • Shopper-side example rule: ^shop .* a .*$
      • a must appear on the shopper-side
  • Publisher-side example rule: ^pub .* e a$
      • e precedes a, which implies an RTB auction
  • Anywhere: ^* .* e a .*$
      • The transition e → a is where the cookie match occurs
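This sketch encodes the cookie-matching rules under the same assumed chain layout: lists of domains without the pub/shop marker. It returns the (e, a) pair, because the e → a transition is where the match occurs. That transition can then be searched for in any chain. The domains in the example are illustrative.

```python
def has_transition(chain: list, e: str, a: str) -> bool:
    # ^* .* e a .*$: e immediately followed by a, anywhere in the chain.
    return any(chain[i] == e and chain[i + 1] == a for i in range(len(chain) - 1))


def cookie_match_pair(shopper_chains: list, publisher_chain: list):
    """Return (e, a) if the chains fit the cookie-matching rule, else None.

    Publisher side ^pub .* e a$ and shopper side ^shop .* a .*$; chains are
    lists of domains that omit the pub/shop marker (assumed layout).
    """
    if len(publisher_chain) < 2:
        return None
    e, a = publisher_chain[-2], publisher_chain[-1]
    if any(a in chain for chain in shopper_chains):
        return (e, a)
    return None


shopper = [["criteo.com"]]
publisher = ["adnxs.com", "doubleclick.net", "criteo.com"]
pair = cookie_match_pair(shopper, publisher)
print(pair)  # ('doubleclick.net', 'criteo.com')
print(has_transition(["x.example", "doubleclick.net", "criteo.com", "y.example"], *pair))  # True
```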

slide-102
SLIDE 102

4) Latent (Server-side) Matching

23

  • Shopper-side example rule: ^shop [^ea]$
      • Neither e nor a appears on the shopper-side
  • Publisher-side example rule: ^pub .* e a$
      • a must receive information from some shopper-side tracker

We find latent matches in practice!
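The latent rule flips the shopper-side condition: neither e nor a may appear there. This sketch uses the same assumed chain layout as above, with illustrative domains. When it returns True, the retarget was still served, so information must have reached a through a channel the browser never saw.

```python
def is_latent_match(shopper_chains: list, publisher_chain: list) -> bool:
    """Latent (server-side) match: publisher side ^pub .* e a$, and neither
    e nor a appears anywhere on the shopper side (shopper side ^shop [^ea]$).

    Chains are lists of domains that omit the pub/shop marker (assumed layout).
    """
    if len(publisher_chain) < 2:
        return False
    e, a = publisher_chain[-2], publisher_chain[-1]
    shopper_domains = {d for chain in shopper_chains for d in chain}
    return e not in shopper_domains and a not in shopper_domains


print(is_latent_match([["tracker.example"]], ["doubleclick.net", "criteo.com"]))  # True
print(is_latent_match([["criteo.com"]], ["doubleclick.net", "criteo.com"]))       # False
```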

slide-103
SLIDE 103

Outline: Data Collection | Classifying Ad Network Flows | Results

24

slide-108
SLIDE 108

Categorizing Chains

Type | Raw Chains | % | Clustered Chains | %
Direct (Trivial) Match | 1770 | 5 | 8449 | 24
Cookie Match | 25049 | 71 | 25873 | 73
Latent (Server-side) Match | 5362 | 15 | 343 | 1
No Match | 775 | 2 | 183 | 1

25

Take away:

1- As expected, most retargets are due to cookie matching
2- Very few chains cannot be categorized
  • Suggests a low false positive rate for the AMT image labeling task
3- A surprisingly large number of latent matches…

slide-110
SLIDE 110

Categorizing Chains

26

Cluster together domains by “owner”

  • E.g. google.com, doubleclick.com, googlesyndication.com

Latent matches essentially disappear

  • The vast majority of these chains involve Google
  • Suggests that Google shares tracking data across its services
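Owner-based clustering can be sketched as mapping each domain to its owner and merging consecutive nodes that have the same owner. The owner map below contains only the Google grouping from the slide and is otherwise hypothetical. After clustering, a shopper-side google.com and a publisher-side doubleclick.com count as the same participant. A chain can then stop satisfying "neither e nor a appears on the shopper side", which is one way latent matches could vanish.

```python
# Hypothetical owner map; only the Google grouping comes from the slide.
OWNERS = {
    "google.com": "google",
    "doubleclick.com": "google",
    "googlesyndication.com": "google",
}


def cluster_chain(chain: list) -> list:
    """Replace domains by owner and merge consecutive nodes with the same owner."""
    clustered = []
    for domain in chain:
        owner = OWNERS.get(domain, domain)  # unknown domains are their own owner
        if not clustered or clustered[-1] != owner:
            clustered.append(owner)
    return clustered


print(cluster_chain(["doubleclick.com", "googlesyndication.com", "criteo.com"]))
# ['google', 'criteo.com']
```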
slide-117
SLIDE 117

Who is Cookie Matching?

Participant 1, Participant 2 | Chains | Ads | Heuristics
criteo ← googlesyndication | 9090 | 1887 | ← P
criteo ↔ doubleclick | 3610 | 1144 | → E, P; ← DC, P
criteo ← adnxs | 3263 | 1066 | ← E, P
criteo ← rubiconproject | 1586 | 749 | ← E, P
criteo ← servedbyopenx | 707 | 460 | ← P
doubleclick ↔ steelhousemedia | 362 | 27 | → P; ← E, P
mathtag → mediaforge | 360 | 124 | → E, P
netmng ↔ scene7 | 267 | 119 | → E; ← ?
googlesyndication → adsrvr | 107 | 29 | → P
rubiconproject → steelhousemedia | 86 | 30 | → E
googlesyndication → steelhousemedia | 47 | 22 | ?
adtechus → adacado | 36 | 18 | ?
atwola → adacado | 32 | 6 | ?
adroll → adnxs | 31 | 8 | ?

27

Heuristics Key (used by prior work)

  • E – share exact cookies
  • P – special URL parameters
  • DC – DoubleClick URL parameters
  • ? – unknown sharing method

31% of cookie matching partners would be missed by these heuristics.

slide-118
SLIDE 118

Summary

We develop a novel methodology to detect information flows between ad exchanges

  • Controlled methodology enables causal inference
  • Defeats obfuscation attempts
  • Detects client- and server-side flows

Dataset gives a better picture of the ad ecosystem

  • Reveals which ad exchanges are linking information about users
  • Allows us to reason about how information is being transferred

28

slide-119
SLIDE 119

Questions?

Muhammad Ahmad Bashir ahmad@ccs.neu.edu

29

slide-122
SLIDE 122

Inclusion Chains

  • Instrumented Chromium binary that records the provenance of page elements

  • Uses Information Flow Analysis techniques (IFA)
  • Handles Flash, exec(), setTimeout(), cross-frame, inline scripts, etc.

30

<html>
  <body>
    <script src="b.com/adlib.js"></script>
    <iframe src="c.net/adbox.html">
      <html>
        <script src="code.js"></script>
        <object data="d.org/flash.swf">
        </object>
      </html>
    </iframe>
  </body>
</html>

DOM: a.com/index.html

[Figure: inclusion chain over a.com/index.html, b.com/adlib.js, c.net/adbox.html, c.net/code.js, d.org/flash.swf]
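Once each element's including parent is known, an inclusion chain can be recovered by walking parent links back to the page. This minimal sketch uses hypothetical parent links that follow the static nesting of the HTML above. The relative `code.js` resolves against the iframe's origin, c.net. The instrumented browser also tracks dynamic inclusions, for example via setTimeout() or Flash, which static nesting alone would miss.

```python
# Hypothetical parent links for the example page, assuming the inclusion
# tree follows the static nesting of the HTML above.
PARENT = {
    "b.com/adlib.js": "a.com/index.html",
    "c.net/adbox.html": "a.com/index.html",
    "c.net/code.js": "c.net/adbox.html",
    "d.org/flash.swf": "c.net/adbox.html",
}


def inclusion_chain(resource: str) -> list:
    """Walk parent links from a resource up to the root page."""
    chain = [resource]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return list(reversed(chain))


print(inclusion_chain("d.org/flash.swf"))
# ['a.com/index.html', 'c.net/adbox.html', 'd.org/flash.swf']
```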

slide-127
SLIDE 127

3) Indirect Matching

31

  • Shopper-side example rule: ^shop e [^a]$
      • Only the exchange e appears on the shopper-side…
  • Publisher-side example rule: ^pub .* e a$
      • …so e must pass browsing history data to participants in the auction, and no cookie matching is necessary

We do not expect to find indirect matches in the data.
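The indirect rule differs from the latent one only in its shopper-side condition: e appears there but a does not. This sketch uses the same assumed chain layout as the other rule sketches, with illustrative domains.

```python
def is_indirect_match(shopper_chains: list, publisher_chain: list) -> bool:
    """Indirect match: publisher side ^pub .* e a$; on the shopper side the
    exchange e appears but the advertiser a does not (^shop e [^a]$).

    Chains are lists of domains that omit the pub/shop marker (assumed layout).
    """
    if len(publisher_chain) < 2:
        return False
    e, a = publisher_chain[-2], publisher_chain[-1]
    shopper_domains = {d for chain in shopper_chains for d in chain}
    return e in shopper_domains and a not in shopper_domains


print(is_indirect_match([["doubleclick.net"]], ["doubleclick.net", "criteo.com"]))                # True
print(is_indirect_match([["doubleclick.net", "criteo.com"]], ["doubleclick.net", "criteo.com"]))  # False
```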

slide-128
SLIDE 128

References

Gunes Acar, Christian Eubank, Steven Englehardt, Marc Juarez, Arvind Narayanan, Claudia Diaz. “The web never forgets: Persistent tracking mechanisms in the wild.” CCS, 2014.

Muhammad Ahmad Bashir, Sajjad Arshad, William Robertson, Christo Wilson. “Tracing Information Flows Between Ad Exchanges Using Retargeted Ads.” Usenix Security, 2016.

Marjan Falahrastegar, Hamed Haddadi, Steve Uhlig, Richard Mortier. “Tracking personal identifiers across the web.” PAM, 2016.

Lukasz Olejnik, Tran Minh-Dung, Claude Castelluccia. “Selling off privacy at auction.” NDSS, 2014.

32

slide-132
SLIDE 132

Filtering Images

Filter | Total Unique Images
All images from the crawlers | 571,636
Use EasyList to identify advertisements | 93,726
Remove ads that are shown to >1 persona | 31,850
Use crowdsourcing to locate retargets | 5,102

33

  • Personas visited non-overlapping retailers
  • By definition, retargets should only be shown to a single persona
  • Spent $415 uploading 1,142 HITs to Amazon Mechanical Turk
  • Each HIT asked the worker to label 30 ad images
  • 27 were unlabeled, 3 were known retargets (control images)
  • All ads were labeled by 2 workers
  • Any ad identified as targeted was also manually inspected by us