Quantity vs. Quality: Evaluating User Interest Profiles Using Ad Preference Managers



slide-1
SLIDE 1

Quantity vs. Quality: Evaluating User Interest Profiles Using Ad Preference Managers

Muhammad Ahmad Bashir, Umar Farooq, Maryam Shahid, Muhammad Fareed Zaffar, Christo Wilson

slide-5
SLIDE 5

Online Tracking

[Figure labels: "soccer shoes", "sports shoes", "men’s soccer shoes", "soccer shoes (phantom series)", "shoes"]

2

slide-6
SLIDE 6

Inferences Used For Targeted Ads

washingtonpost.com

3

slide-8
SLIDE 8

We Don’t Know What Ad Networks Infer

4

slide-10
SLIDE 10

Goals of the Study

  1. Who knows what and how much?
  2. How do users perceive interests inferred about them?
  3. How are the interests inferred?
  4. How do privacy practices impact the amount of inferences drawn?

5

slide-12
SLIDE 12

Ad Preference Managers (APMs)

  • Transparency tools
  • Let users control the inferred interests about them

6

slide-14
SLIDE 14

Overview

  1. Data collection
  2. Interests inferred by different APMs
  3. Perception of interests
  4. Limitations & Conclusion

7

slide-18
SLIDE 18

Data Collection

  • We recruited 220 participants
  • 82 from Pakistan (university students), 138 from the US (crowdsourced)
  • Participants used our browser extension to:
    A. Take a survey
    B. Contribute data from their APMs, plus historical data

Ethics

  • Obtained IRB approval from both LUMS and Northeastern University
  • Obtained informed consent

8

slide-25
SLIDE 25

Browser Extension

Foreground (survey)

  • Basic Demographics: Age, Location, Education
  • General Web Usage
  • Online Ads & Privacy Practices
  • Dynamic Questions (about randomly sampled interests)

Background

  • Ad Preference Managers
  • Historical Data: Browsing History, Search History

9

slide-26
SLIDE 26

Dynamic Questions

10

slide-27
SLIDE 27

Dynamic Questions

11

slide-31
SLIDE 31

Summary of Data Collection

220 participants (82 from Pakistan, 138 from US)

For each participant, we have:

Foreground

  • Survey
    1. Basic demographics
    2. General web usage
    3. Interaction with ads
    4. Privacy practices
    5. Knowledge about APMs
    6. Relevance of interests

Background

  • Interests from 4 APMs
    1. Facebook
    2. Google
    3. BlueKai
    4. eXelate
  • Browsing history (last 3 months)
  • Search term history (last 3 months)

12

slide-32
SLIDE 32

Goals of the Study

  1. Who knows what and how much?
    • What inferences are drawn by each APM?
    • Does every APM infer the same information?
  2. How do users perceive these interests inferred about them?

13

slide-38
SLIDE 38

Which APM Knows More?

Table: Interests gathered from 220 participants

APM        Users   Unique interests   Total interests   Avg. per user
Google       213                594             9,013            42.3
Facebook     208             25,818           108,930           523.7
BlueKai      220              3,522            92,926           422.4
eXelate      218                139             1,941             8.9

  • Facebook gathers the most interests, while eXelate has the fewest
  • BlueKai had a profile on every user

Fig: CDF of interests per user (x-axis: # Interests; y-axis: CDF). Plot annotation: Categories capped by Google

14
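
The "Avg. per user" column in the table on this slide appears to be the total number of inferred interests divided by the number of participants who had a profile with that APM. That reading is an assumption based on the numbers themselves, not a definition given in the deck. A minimal Python sketch that recomputes the column from the other figures:

```python
# Figures copied from the table on this slide: (users, unique, total) per APM.
apm_stats = {
    "Google": (213, 594, 9013),
    "Facebook": (208, 25818, 108930),
    "BlueKai": (220, 3522, 92926),
    "eXelate": (218, 139, 1941),
}


def avg_per_user(users: int, total: int) -> float:
    """Total inferred interests divided by users with a profile (assumed definition)."""
    return round(total / users, 1)


# Recompute the last column of the table for every APM.
averages = {apm: avg_per_user(users, total)
            for apm, (users, _unique, total) in apm_stats.items()}

for apm, avg in averages.items():
    print(f"{apm:<9}{avg:>7.1f}")
```

Under that assumed definition the recomputed values agree with the column as printed in the table.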

slide-43
SLIDE 43

Canonicalization of Interests

We cannot directly compare interests from different APMs

  • Synonyms: Real Estate, Property
  • Granularity: Sports, Tennis, Wimbledon

For fair comparison, we need to map interests to a common space

We used the Open Directory Project (ODP)

  • Manually mapped raw interests to 465 ODP categories

Example: Soccer (FB) and Softball (BlueKai) both map to the ODP category Sports

15
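
The canonicalization step can be pictured as a lookup from (APM, raw interest) pairs to ODP categories. The sketch below is illustrative only: the deck says the mapping was built manually, and `RAW_TO_ODP` and `canonicalize` are hypothetical names. Only the Soccer and Softball entries come from the slide; the other entries and their category labels are stand-ins.

```python
# Hypothetical hand-built lookup: (APM, raw interest) -> ODP category.
# Only Soccer (FB) and Softball (BlueKai) -> Sports come from the slide;
# the remaining entries are illustrative stand-ins.
RAW_TO_ODP = {
    ("Facebook", "Soccer"): "Sports",
    ("BlueKai", "Softball"): "Sports",
    ("Facebook", "Tennis"): "Sports",
    ("Google", "Real Estate"): "Real Estate",
    ("BlueKai", "Property"): "Real Estate",
}


def canonicalize(apm, raw_interests):
    """Return the set of ODP categories for one user's raw interests from one APM.

    Raw interests without an entry in the lookup are dropped.
    """
    return {RAW_TO_ODP[(apm, raw)]
            for raw in raw_interests
            if (apm, raw) in RAW_TO_ODP}


fb_profile = canonicalize("Facebook", ["Soccer", "Tennis", "Unmapped Interest"])
bk_profile = canonicalize("BlueKai", ["Softball", "Property"])
print(fb_profile, bk_profile)
```

Because the result is a set, synonyms and finer-grained interests that share a category collapse to a single entry, which is what makes profiles from different APMs comparable.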

slide-45
SLIDE 45

Inferred Interests After ODP Mapping

Fig: CDF of raw interests per user (x-axis: # Interests; y-axis: CDF)

Fig: CDF of ODP categories per user (x-axis: # ODP Categories; y-axis: CDF)

16

slide-49
SLIDE 49

Do APMs Infer Similar Interests?

Fig: Per-participant overlap of ODP-categorized interests (min, 5th, median, 95th, max); y-axis: Fractional Overlap; pairwise comparison of Google, FB, eXelate, and BlueKai

The median Google user’s interest profile has 20% overlap with BlueKai

17
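
The deck does not spell out how fractional overlap is computed. One plausible reading, assumed here, is the share of a participant's ODP categories in one APM that also appear in their profile from another APM. The function names and the toy profiles below are hypothetical, not data from the study.

```python
from statistics import median


def fractional_overlap(a: set, b: set) -> float:
    """Share of categories in `a` that also appear in `b` (assumed definition)."""
    return len(a & b) / len(a) if a else 0.0


def median_overlap(profiles, src, dst):
    """Median of the per-participant overlap of APM `src` with APM `dst`.

    Participants missing either profile are skipped.
    """
    values = [fractional_overlap(p[src], p[dst])
              for p in profiles
              if p.get(src) and dst in p]
    return median(values)


# Toy per-participant profiles (ODP categories per APM); invented for illustration.
profiles = [
    {"Google": {"Sports", "Arts", "News", "Games", "Home"},
     "BlueKai": {"Sports", "Health"}},
    {"Google": {"Sports", "Arts"}, "BlueKai": {"Arts", "Sports", "News"}},
    {"Google": {"Shopping"}, "BlueKai": {"Health"}},
]

result = median_overlap(profiles, "Google", "BlueKai")
print(result)
```

Note that this definition is asymmetric: the overlap of Google with BlueKai generally differs from the overlap of BlueKai with Google, which would be consistent with the figure listing the APMs on both axes.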

slide-50
SLIDE 50

Key Takeaways

18

  • Different APMs have different ‘portraits’ of users
  • Lack of overlap across APMs

slide-51
SLIDE 51

Goals of the Study

  1. Who knows what and how much?
    • What inferences are drawn by each APM?
    • Does everyone infer the same information?
  2. How do users perceive these interests inferred about them?
    • Do some APMs infer more relevant interests?
    • Do users find ads targeted against these interests relevant?

19

slide-52
SLIDE 52

“Half the money I spend on advertising is wasted; the trouble is I don't know which half.”

- John Wanamaker

20

slide-56
SLIDE 56

Relevant Interests According to Participants

Fig: Fractions of interests rated as relevant (on a 1-5 scale) by participants (x-axis: Fraction of Relevant Interests; y-axis: CDF; curves for ratings 4-5 and 3-5; plot annotations: 83%, 56%)

21
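
The figure on this slide counts an interest as relevant under two cutoffs, ratings 4-5 and ratings 3-5. A minimal sketch of the per-participant quantity on the x-axis, assuming each participant rated a sample of interests on the 1-5 scale; the function name and the ratings are made up for illustration.

```python
def fraction_relevant(ratings, threshold):
    """Fraction of a participant's rated interests with rating >= threshold."""
    if not ratings:
        return 0.0
    return sum(r >= threshold for r in ratings) / len(ratings)


# Invented ratings for one participant (1 = not relevant, 5 = very relevant).
ratings = [1, 2, 3, 4, 5, 1, 3, 2]

strict = fraction_relevant(ratings, threshold=4)   # ratings 4-5 count as relevant
lenient = fraction_relevant(ratings, threshold=3)  # ratings 3-5 count as relevant
print(strict, lenient)
```

Computing this value for every participant and sorting the results gives the points of the CDF for each cutoff.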

slide-61
SLIDE 61

Participants’ Ratings of Interests

Fig: Interest Relevance vs. Seeing Ads (panels: eXelate, Google, FB; axis labels: Seen Ads Related to ’X’?, Interested in ’X’?; ratings 1-5)

  • General trend of more ads seen for more relevant interests.
  • Similar distribution across all.

22

slide-62
SLIDE 62

Majority of Interests Marked Irrelevant

Fig: Fractions of interests rated as relevant (on a 1-5 scale) by participants (x-axis: Fraction of Relevant Interests; y-axis: CDF; curves for ratings 4-5 and 3-5; plot annotations: 83%, 56%)

23

slide-65
SLIDE 65

Majority of Interests Marked Irrelevant

Fig: CDF of the fraction of relevant interests (curves for ratings 4-5 and 3-5)

24

Fig: Interest Relevance vs. Seeing Relevant Ads (panels: eXelate, Google, FB; axis labels: Ads for ’X’ Relevant / Useful?, Interested in ’X’?; ratings 1-5)

Users marked ads targeted to low-relevance interests as less useful

slide-66
SLIDE 66

Key Takeaways

25

  • Majority of the interests marked as not relevant
  • Ads targeted to low-relevance interests marked as not useful

slide-67
SLIDE 67

Limitations & Challenges

  1. Participant sample is not representative of all web users.
  2. Single snapshot of APMs.
    • A better way would be to conduct a longitudinal study.
  3. Users can have biases in recalling relevant ads.

26

slide-68
SLIDE 68

Summary

  • First large-scale study of interest profiles from four APMs.
  • Different APMs have different ‘portraits’ of the user.
  • Participants rated fewer than 30% of interests as strongly relevant.

Q: Are the marginal utility gains from targeted ads justified at the cost of privacy?

27

slide-69
SLIDE 69

More Results in the Paper …

  1. Origin of Interests
    • What fraction of the interests could be explained by historical data?
    • A majority of interests could not be explained by recent browsing history
  2. Effect of privacy-conscious behaviors on interest profiles
    • No significant correlations

Questions?

ahmad@ccs.neu.edu


slide-70
SLIDE 70

Backup Slides

slide-71
SLIDE 71

Participants Dropping Out

  • Overall 9 participants refused to take the survey
  • 3 provided feedback.
  • 1 did not have time and 2 had privacy reservations

30

slide-72
SLIDE 72

Knowledge of APMs

Fig: Familiarity with the Google and Facebook APMs among participants in the United States and Pakistan (y-axis: Total (%); categories: Not Familiar, Never Visited, Visited, Edited)

slide-73
SLIDE 73

Goals of the Study

  1. Who knows what and how much?
    • What inferences are drawn by the APMs?
    • Does everyone infer the same information?
  2. How do users perceive these interests inferred about them?
    • Do some APMs draw better inferences?
  3. How are the inferences drawn?

32

slide-77
SLIDE 77

How Are The Inferences Drawn?

33

Fig: Amount of historical data collected from the participants (two panels: Browsing History, Search History; x-axis: Days of History; y-axis: % People in Bin)

  • 50% of people had 80-90 days of browsing history
  • 90% of people had 30-40 days of search history
slide-82
SLIDE 82

Domains From Browsing & Search History

34

Browsing

  • Out of 1.2M unique URLs, we extracted ~42K unique FQDNs
  • We used PhantomJS to collect trackers from these 42K FQDNs
  • We crawled the home page + 5 additional pages
  • Only considered those domains where any of the APM trackers were present

Search

  • Considered the URL of the first search result
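
Reducing a list of visited URLs to unique FQDNs can be sketched with Python's standard `urllib.parse`. This is an illustration, not the authors' pipeline: the deck does not say how the extraction was done, and `unique_fqdns` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def unique_fqdns(urls):
    """Collect the distinct host names found in an iterable of URLs.

    `hostname` is already lowercased and excludes any port; entries that do
    not parse as a URL with a host are skipped.
    """
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.rstrip("."))  # drop a trailing root dot, if any
    return hosts


# Placeholder history entries, not data from the study.
history = [
    "https://www.example.com/a",
    "https://www.example.com/b?x=1",
    "http://News.Example.org:8080/",
    "not a url",
]

fqdns = unique_fqdns(history)
print(sorted(fqdns))
```

Many URLs collapse onto a single host, which is consistent with the large gap the slide reports between unique URLs and unique FQDNs.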
slide-88
SLIDE 88

Domains Mapped to Common Space

35

Fig: Amount of historical data collected (x-axis: Days of Browsing History and Days of Search History; y-axis: % People in Bin)

51,500 unique domains

We use the SimilarWeb tool to map domains to 221 categories

  • 77% success rate
  • We then map each SimilarWeb category to an ODP category

Example: tennis.com, nba.com -> SimilarWeb category Sports -> ODP category Sports
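
The two-stage mapping on this slide (domain to SimilarWeb category, then SimilarWeb category to ODP category) and its success rate can be sketched with two lookup tables. The tables and `label_domains` are hypothetical stand-ins: no SimilarWeb API is called here, and only the tennis.com / nba.com / Sports entries come from the slide.

```python
# Stage 1 (stand-in for the SimilarWeb lookup): domain -> SimilarWeb category.
DOMAIN_TO_SIMILARWEB = {"tennis.com": "Sports", "nba.com": "Sports"}

# Stage 2 (hand-built in the study): SimilarWeb category -> ODP category.
SIMILARWEB_TO_ODP = {"Sports": "Sports"}


def label_domains(domains):
    """Map each domain to an ODP category and report the fraction labeled."""
    labeled = {}
    for domain in domains:
        sw_category = DOMAIN_TO_SIMILARWEB.get(domain)
        odp_category = SIMILARWEB_TO_ODP.get(sw_category) if sw_category else None
        if odp_category:
            labeled[domain] = odp_category
    success_rate = len(labeled) / len(domains) if domains else 0.0
    return labeled, success_rate


labels, rate = label_domains(["tennis.com", "nba.com", "unlabeled.example"])
print(labels, round(rate, 2))
```

A domain counts as labeled only when both stages succeed, so the overall success rate can be no higher than the coverage of the first lookup.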

slide-91
SLIDE 91

Origins of Interests

36

Fig: Overlap of participants’ history with each APM (min, 5th, median, 95th, max); y-axis: Fractional Overlap; x-axis: FB-A, BlueKai, eXelate, FB-I, Google

Key Takeaways

  • Browsing history explains <10% of interests, except for Google (30%)
  • Search history does not add much to the explanation on top of browsing history (BH)

slide-92
SLIDE 92

Browsing & Search History Domains

37

Fig: CDF of % Labeled URLs Per User (curves: Browsing History, Search & Click, Search)

  • More domains in Search as compared to Browsing
  • Very high label rate for Search
  • >75% of Browsing domains labeled for 80% of people

Fig: CDF of Unique Domains Per User (curves: Search, Search & Click, Browsing History)
