R-SUSCEPTIBILITY An IR-Centric Approach to Assessing Privacy Risks - - PowerPoint PPT Presentation


R-SUSCEPTIBILITY: An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities. Joanna Asia Biega, Krishna P. Gummadi, Ida Mele, Dragan Milchevski, Christos Tryfonopoulos, Gerhard Weikum. SIGIR 2016.


slide-1
SLIDE 1

R-SUSCEPTIBILITY

Joanna Asia Biega, Krishna P. Gummadi, Ida Mele,
Dragan Milchevski, Christos Tryfonopoulos,
Gerhard Weikum

An IR-Centric Approach to Assessing Privacy Risks
 for Users in Online Communities

SIGIR 2016

slide-2
SLIDE 2

APPROACHING PRIVACY

Prevent deanonymization, prevent attribute disclosure

Data publishing:

        gender  age  disease
user1   male    37   cancer
user2   male    37   heart disease
user3   female  42   cancer

Online communities: account linking; attribute inference (female, 20-30)

slide-3
SLIDE 3

APPROACHING PRIVACY

Prevent deanonymization, prevent attribute disclosure

Data publishing:

        gender  age  disease
user1   male    37   cancer
user2   male    37   heart disease
user3   female  42   cancer

Online communities: account linking; attribute inference (female, 20-30)

(not in this work)

slide-4
SLIDE 4

PRIVACY IN ONLINE COMMUNITIES WITH TEXTUAL DATA

Quantify, inform, and guide

17.05.2011, user2: Should I inform my potential employer during an interview that I am 3 months pregnant?
13.07.2011, user1: Studies show alarming depression rates among teenagers.
13.07.2011, user3: On a cocktail of antidepressants and getting crazy hallucinations :o

➤ Build reputation
➤ Get information
➤ Share information
➤ Not obvious how to apply noise

slide-5
SLIDE 5

IN THE (IR) WILD

Search:


slide-6
SLIDE 6

IN THE (IR) WILD

Search:

Great student party #sigir2016. Shouldn’t have drunk that much #wine. #drunk ;)

slide-7
SLIDE 7

IN THE (IR) WILD

Search (HR): drunk wine party

slide-8
SLIDE 8

IN THE (IR) WILD

Search (HR): drunk wine party

Results: user_1, user_2, user_3, user_4

Great student party #sigir2016. Shouldn’t have drunk that much #wine. #drunk ;)

slide-9
SLIDE 9

IN THE (IR) WILD - MORE EXAMPLES

Search:

  • drunk wine wasted party
  • bungee jump adrenaline
  • depressed anxiety antidepressant

HR: remote search or local crawl → U_1, U_2, …, U_k

slide-10
SLIDE 10

IN THE WILD


slide-11
SLIDE 11

PRIVACY RISKS VIA EXPOSURE IN A COMMUNITY

<topic>

Criterion:

U_1, U_2, …, U_k

slide-12
SLIDE 12

R-SUSCEPTIBILITY

<topic>

Criterion: Rank-Susceptibility

1. U_1
2. U_2
…
k. U_k

slide-13
SLIDE 13

R-SUSCEPTIBILITY: FRAMEWORK FOR TEXTUAL DATA

(1) Topics, (2) Sensitivity, (3) Risk scores (R-Susceptibility)

drug addiction: drug, addiction, addict, cocaine, …
  Sensitivity: high
  R-Susceptibility: U_1, U_2, U_k

depression: depression, suicide, depressed, suffer, …
  Sensitivity: high
  R-Susceptibility: U_7, U_13, U_14

financial debts: debt, loan, pay, student, …
  Sensitivity: medium
  R-Susceptibility: U_78, U_1, U_k
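The framework on this slide (topics, then per-topic risk scores, then a ranking of users) can be illustrated with a minimal sketch. The names `toy_risk` and `rank_users` are hypothetical, the profiles are made up, and the word-overlap score is only a stand-in for the risk measures discussed later in the talk.

```python
from typing import Callable, Dict, List, Set, Tuple


def toy_risk(posts: List[str], topic_words: Set[str]) -> float:
    """Hypothetical stand-in score: fraction of a user's tokens that are topic words."""
    tokens = [w for post in posts for w in post.lower().split()]
    if not tokens:
        return 0.0
    return sum(w in topic_words for w in tokens) / len(tokens)


def rank_users(
    profiles: Dict[str, List[str]],
    topic_words: Set[str],
    risk: Callable[[List[str], Set[str]], float] = toy_risk,
) -> List[Tuple[str, float]]:
    """Score every user against one topic and sort by descending risk."""
    scored = [(user, risk(posts, topic_words)) for user, posts in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up profiles and topic words, for illustration only.
profiles = {
    "U_1": ["xanax side effects", "feeling depressed"],
    "U_2": ["christmas recipes", "dog trainers"],
    "U_3": ["depression research career"],
}
depression = {"depression", "depressed", "xanax", "antidepressants"}
ranking = rank_users(profiles, depression)
for position, (user, score) in enumerate(ranking, start=1):
    print(position, user, round(score, 3))
```

Any scoring function with the same signature can be passed as `risk`, so the ranking step stays independent of how the score is defined.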

slide-14
SLIDE 14

OVERVIEW

➤ R-Susceptibility framework
  ➤ Topics
  ➤ Topic sensitivity
➤ Risk measures
  ➤ Baselines
  ➤ Topic-model-based
➤ Experiments
➤ Summary



slide-16
SLIDE 16

R-SUSCEPTIBILITY: TOPICS

drug addiction: drug, addiction, addict, cocaine, …
depression: depression, suicide, depressed, suffer, …
financial debts: debt, loan, pay, student, …

Topics: LDA
Quora: 500 topics, 600k posts
NYT: 500 topics, 700k articles



slide-18
SLIDE 18

R-SUSCEPTIBILITY: TOPIC SENSITIVITY

Topics:

drug addiction: drug, addiction, addict, cocaine, …
depression: depression, suicide, depressed, suffer, …
financial debts: debt, loan, pay, student, …

Sensitivity: high, high, medium

?

slide-19
SLIDE 19

IDENTIFYING SENSITIVE TOPICS

Topics:

drug addiction: drug, addiction, addict, cocaine, …
depression: depression, suicide, depressed, suffer, …
financial debts: debt, loan, pay, student, …

Sensitivity:

If a user’s post in an online community contained these words, would you consider it privacy sensitive?

slide-20
SLIDE 20

IDENTIFYING SENSITIVE TOPICS

Topics:

drug addiction: drug, addiction, addict, cocaine, …
depression: depression, suicide, depressed, suffer, …
financial debts: debt, loan, pay, student, …

Judgements: yes yes yes yes no yes no no no

Sensitivity: # yes / # judgements
(2 topic models * 500 topics * 7 judgements per topic)
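As a small illustration of the ratio on this slide, the sketch below computes a topic's sensitivity as the share of "yes" judgements. The function name `topic_sensitivity` and the example judgement list are hypothetical; the slides do not state how the ratio is mapped to labels such as "high" or "medium", so no threshold is assumed here.

```python
from typing import Sequence


def topic_sensitivity(judgements: Sequence[str]) -> float:
    """Share of 'yes' answers among all judgements collected for one topic."""
    if not judgements:
        raise ValueError("at least one judgement is required")
    yes = sum(1 for j in judgements if j.strip().lower() == "yes")
    return yes / len(judgements)


# Seven made-up judgements for one topic (the slide mentions 7 judgements per topic).
example = ["yes", "yes", "yes", "yes", "no", "yes", "no"]
print(round(topic_sensitivity(example), 3))
```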



slide-22
SLIDE 22

ENTROPY BASELINE

Salient attributes for topic X:
X1 = depression
X2 = anxiety
X3 = psychiatrist
X4 = paxil

P(0), P(1) for U and U∗ (slide labels: community without user U, community with user U)

Average KL-divergence of salient word distributions, summed over attributes and over values:

$$\mathrm{risk}(U_0, X) = \frac{1}{j} \sum_{i} \sum_{v=\{0,1\}} P_{U^*}[x_i = v] \, \log\left( \frac{P_U[x_i = v]}{P_{U^*}[x_i = v]} \right)$$
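A rough sketch of the entropy baseline follows, under several assumptions that are not stated on the slide: each salient word is treated as a binary attribute (present or absent in a user's profile), P_U is estimated over the community including the target user and P_U∗ over the community without that user, and add-one smoothing avoids taking the logarithm of zero. The sketch uses the standard non-negative form KL(P_U∗ || P_U); the orientation of the fraction in the extracted slide formula may differ. The names `presence_prob` and `entropy_risk` and the toy community are hypothetical.

```python
import math
from typing import Dict, List, Set


def presence_prob(profiles: List[Set[str]], word: str, alpha: float = 1.0) -> float:
    """Smoothed probability that a profile contains `word` (alpha is an assumed smoothing constant)."""
    hits = sum(1 for words in profiles if word in words)
    return (hits + alpha) / (len(profiles) + 2 * alpha)


def entropy_risk(community: Dict[str, Set[str]], target: str, salient: List[str]) -> float:
    """Average KL divergence over salient words, community without vs. with the target user."""
    with_user = list(community.values())
    without_user = [words for user, words in community.items() if user != target]
    total = 0.0
    for word in salient:
        p_with = presence_prob(with_user, word)
        p_without = presence_prob(without_user, word)
        # Sum over the two attribute values: v = 1 (present) and v = 0 (absent).
        for p_star, p in ((p_without, p_with), (1 - p_without, 1 - p_with)):
            total += p_star * math.log(p_star / p)
    return total / len(salient)


# Toy community, for illustration only.
community = {
    "a": {"depression", "anxiety", "paxil"},
    "b": {"dog", "cat"},
    "c": {"car"},
    "d": {"tree"},
}
salient = ["depression", "anxiety", "psychiatrist", "paxil"]
risk_a = entropy_risk(community, "a", salient)
risk_b = entropy_risk(community, "b", salient)
print(risk_a > risk_b)
```

In this toy setting, removing user `a` changes the salient-word distributions more than removing user `b`, so `a` receives the higher score.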

slide-23
SLIDE 23

DIFF-PRIV BASELINE

Inspired by the differential privacy principle

Salient attributes for topic X:
X1 = depression
X2 = anxiety
X3 = psychiatrist
X4 = paxil

P(0), P(1) for U and U∗ (slide labels: community without user U, community with user U)

Maximum over attributes, covering cases where the probability increases or decreases:

$$\mathrm{risk}(U_0, X) = \max_{x_i} \left( \max\left( \log\left( \frac{P_U[x_i]}{P_{U^*}[x_i]} \right), \log\left( \frac{P_{U^*}[x_i]}{P_U[x_i]} \right) \right) \right)$$

$$P_U[x_i] \le 2 P_{U^*}[x_i] \quad \text{and} \quad P_{U^*}[x_i] \le 2 P_U[x_i]$$
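The diff-priv baseline can be sketched along the same lines. The assumptions are again mine rather than the slide's: salient words are binary attributes, P_U is estimated with the target user in the community and P_U∗ without, and add-one smoothing keeps the ratios finite. Since max(log(a/b), log(b/a)) equals |log(a/b)|, the sketch uses the absolute log-ratio. The names `presence_prob` and `diff_priv_risk` and the toy community are hypothetical.

```python
import math
from typing import Dict, List, Set


def presence_prob(profiles: List[Set[str]], word: str, alpha: float = 1.0) -> float:
    """Smoothed probability that a profile contains `word` (alpha is an assumed smoothing constant)."""
    hits = sum(1 for words in profiles if word in words)
    return (hits + alpha) / (len(profiles) + 2 * alpha)


def diff_priv_risk(community: Dict[str, Set[str]], target: str, salient: List[str]) -> float:
    """Largest absolute log-ratio of word probabilities with vs. without the target user."""
    with_user = list(community.values())
    without_user = [words for user, words in community.items() if user != target]
    return max(
        abs(math.log(presence_prob(with_user, w) / presence_prob(without_user, w)))
        for w in salient
    )


# Toy community, for illustration only.
community = {
    "a": {"depression", "anxiety", "paxil"},
    "b": {"dog", "cat"},
    "c": {"car"},
    "d": {"tree"},
}
salient = ["depression", "anxiety", "psychiatrist", "paxil"]
risk_a = diff_priv_risk(community, "a", salient)
risk_b = diff_priv_risk(community, "b", salient)
print(risk_a > risk_b)
```

Unlike the averaged entropy baseline, this score is driven by the single attribute whose probability shifts the most when the user is removed.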

slide-24
SLIDE 24

OVERVIEW

➤ R-Susceptibility framework
  ➤ Topics
  ➤ Topic sensitivity
➤ Risk measures
  ➤ Baselines
  ➤ Topic-model-based
    ➤ Strength of interest
    ➤ Breadth of interest
    ➤ Temporal variation of interest
➤ Experiments
➤ Summary

Which aspects matter when it comes to human risk perception?

slide-25
SLIDE 25

TOPIC-MODEL RISK SCORE: BUILDING BLOCKS

R-Susceptibility topic model

antidepressant, depression, celebrity, oscar, psychiatrist

slide-26
SLIDE 26

TOPIC-MODEL RISK SCORE: BUILDING BLOCKS

Quantifying user interest in a topic

R-Susceptibility topic model

antidepressant, depression, celebrity, oscar, psychiatrist

X = Depression
U_X = User

Details in the paper

slide-27
SLIDE 27

TOPIC-MODEL RISK SCORE: STRENGTH OF INTEREST

Profile 1:
24.10.2012  misbehaving dog
24.10.2012  dog trainers
27.10.2012  dentists LA
29.10.2012  knitting tutorial
03.12.2012  christmas tree shop LA
10.12.2012  christmas recipes

Profile 2:
24.10.2012  anxiety
24.10.2012  feeling lonely
27.10.2012  psychiatrist nyc
29.10.2012  central park events
03.12.2012  antidepressants
10.12.2012  xanax side effects


slide-29
SLIDE 29

TOPIC-MODEL RISK SCORE: STRENGTH OF INTEREST

Three dimensions of user interest

Strength of interest

Example profile (by ranking position):
depression xanax psychiatrist

$$\mathrm{risk}(U, X) = \cos(\vec{U}, \vec{X})$$
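A minimal sketch of the strength-of-interest score, cos(U, X), is given below. It assumes plain bag-of-words vectors stored as dictionaries and a hand-picked, equally weighted topic vector; the deck also lists LDA and word2vec as vector spaces, which are not reproduced here. The helper names `bow` and `cosine` and the topic words are hypothetical, while the two query lists are taken from the example profiles shown in the slides.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def bow(texts: Iterable[str]) -> Dict[str, float]:
    """Bag-of-words vector: token -> count over all texts."""
    return dict(Counter(w for text in texts for w in text.lower().split()))


def cosine(u: Dict[str, float], x: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(value * x.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(v * v for v in u.values()))
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    if norm_u == 0.0 or norm_x == 0.0:
        return 0.0
    return dot / (norm_u * norm_x)


# Assumed topic vector for "depression": equal weight on a few salient words.
topic_x = {w: 1.0 for w in ["depression", "anxiety", "psychiatrist", "antidepressants", "xanax"]}

profile_1 = ["misbehaving dog", "dog trainers", "dentists LA",
             "knitting tutorial", "christmas tree shop LA", "christmas recipes"]
profile_2 = ["anxiety", "feeling lonely", "psychiatrist nyc",
             "central park events", "antidepressants", "xanax side effects"]

risk_1 = cosine(bow(profile_1), topic_x)
risk_2 = cosine(bow(profile_2), topic_x)
print(risk_1, round(risk_2, 3))
```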

slide-30
SLIDE 30

TOPIC-MODEL RISK SCORE: BREADTH OF INTEREST

Profile 1:
24.10.2012  anxiety
24.10.2012  clinical depression
03.11.2012  anatomy course book
07.11.2012  central park events
03.12.2012  liver cancer stats
10.12.2012  anorexia nervosa

Profile 2:
24.10.2012  anxiety
24.10.2012  feeling lonely
03.11.2012  psychiatrist nyc
07.11.2012  central park events
03.12.2012  antidepressants
10.12.2012  xanax side effects


slide-32
SLIDE 32

TOPIC-MODEL RISK SCORE: BUILDING BLOCKS REVISITED

Quantifying user interest in a topic

R-Susceptibility topic model

antidepressant, depression, celebrity, oscar, psychiatrist

X = Depression
U_X = User
D = Psychiatry

Details in the paper

slide-33
SLIDE 33

TOPIC-MODEL RISK SCORE: BREADTH OF INTEREST

Three dimensions of user interest

Strength of interest
Breadth of interest

Example profiles (by ranking position):
depression xanax psychiatrist
depression research career psychiatry

$$\mathrm{risk}(U, X, D) = \cos(\vec{U}, \vec{X}) - \cos(\vec{U}, \vec{D} - \vec{X})$$
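The breadth-adjusted score, cos(U, X) − cos(U, D − X), can be sketched as follows. The vectors are assumed to be sparse bag-of-words dictionaries, with D a broader domain vector (such as Psychiatry) that contains the topic X; the concrete words and weights, and the helper names `cosine`, `subtract`, and `breadth_risk`, are hypothetical.

```python
import math
from typing import Dict


def cosine(u: Dict[str, float], x: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(value * x.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(v * v for v in u.values()))
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    if norm_u == 0.0 or norm_x == 0.0:
        return 0.0
    return dot / (norm_u * norm_x)


def subtract(d: Dict[str, float], x: Dict[str, float]) -> Dict[str, float]:
    """Component-wise difference D - X over the union of keys."""
    return {k: d.get(k, 0.0) - x.get(k, 0.0) for k in set(d) | set(x)}


def breadth_risk(u: Dict[str, float], x: Dict[str, float], d: Dict[str, float]) -> float:
    """Interest in the topic, discounted by interest in the rest of the domain."""
    return cosine(u, x) - cosine(u, subtract(d, x))


# Assumed toy vectors: topic X (depression) and a broader domain D (psychiatry).
topic_x = {w: 1.0 for w in ["depression", "xanax", "psychiatrist"]}
domain_d = {w: 1.0 for w in ["depression", "xanax", "psychiatrist",
                             "research", "career", "psychiatry"]}

focused_user = {w: 1.0 for w in ["depression", "xanax", "psychiatrist"]}
broad_user = {w: 1.0 for w in ["depression", "research", "career", "psychiatry"]}

risk_focused = breadth_risk(focused_user, topic_x, domain_d)
risk_broad = breadth_risk(broad_user, topic_x, domain_d)
print(round(risk_focused, 3), round(risk_broad, 3))
```

In this toy example the user whose vocabulary spreads across the wider domain ends up with a lower score than the user concentrated on the sensitive topic itself.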

slide-34
SLIDE 34

TOPIC-MODEL RISK SCORE: TEMPORAL VARIATION OF INTEREST

Profile 1:
24.10.2012  anxiety
24.10.2012  feeling lonely
03.11.2012  psychiatrist nyc
07.11.2012  central park events
03.12.2012  antidepressants
10.12.2012  xanax side effects

Profile 2:
24.10.2012  anxiety
24.10.2012  feeling lonely
24.10.2012  psychiatrist nyc
24.10.2012  central park events
24.10.2012  antidepressants
24.10.2012  xanax side effects


slide-36
SLIDE 36

TOPIC-MODEL RISK SCORE: TEMPORAL VARIATION OF INTEREST

$$U = \{(v_1, t_1), \ldots, (v_k, t_k)\}$$

Axis: time


slide-38
SLIDE 38

TOPIC-MODEL RISK SCORE: TEMPORAL VARIATION OF INTEREST

$$U = \{(v_1, t_1), \ldots, (v_k, t_k)\}$$

Axis: time

$$\vec{U}_1, \vec{U}_2, \ldots, \vec{U}_j$$

slide-39
SLIDE 39

TOPIC-MODEL RISK SCORE: TEMPORAL VARIATION OF INTEREST

$$U = \{(v_1, t_1), \ldots, (v_k, t_k)\}$$

Axis: time

$$\vec{U}_1, \vec{U}_2, \ldots, \vec{U}_j$$

$$\mathrm{risk}(U_i, X, D)$$

slide-40
SLIDE 40

TOPIC-MODEL RISK SCORE: TEMPORAL VARIATION OF INTEREST

$$U = \{(v_1, t_1), \ldots, (v_k, t_k)\}$$

Axis: time

$$\vec{U}_1, \vec{U}_2, \ldots, \vec{U}_j$$

$$\mathrm{risk}(U_i, X, D)$$, top-n

slide-41
SLIDE 41

TOPIC-MODEL RISK SCORE: TEMPORAL VARIATION OF INTEREST

Three dimensions of user interest

Strength of interest
Breadth of interest
Temporal variation of interest

Example profiles (by ranking position):
depression xanax psychiatrist
depression research career psychiatry
lonely depressed xanax effects (over time)

$$\mathrm{risk}(U, X) = \min_{i=1..m} \left\{ \cos\left( \vec{U}^*_i, \vec{X} \right) \right\} - \cos\left( \vec{U}, \left( \vec{D} - \vec{X} \right) \right)$$
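One possible reading of the temporal score is sketched below. The slides do not specify how the time windows are formed, so this sketch assumes tumbling windows of `window_days` days counted from the user's first post, takes the `m` windows with the highest cos(U_i, X), and pads with 0.0 when a profile spans fewer than `m` windows; all three choices are assumptions. The helper names, topic words, and domain words are hypothetical, and the two profiles reuse the dated queries from the slides.

```python
import math
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def bow(texts: Iterable[str]) -> Dict[str, float]:
    """Bag-of-words vector: token -> count over all texts."""
    return dict(Counter(w for text in texts for w in text.lower().split()))


def cosine(u: Dict[str, float], x: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(value * x.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(v * v for v in u.values()))
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    return dot / (norm_u * norm_x) if norm_u and norm_x else 0.0


def subtract(d: Dict[str, float], x: Dict[str, float]) -> Dict[str, float]:
    """Component-wise difference D - X over the union of keys."""
    return {k: d.get(k, 0.0) - x.get(k, 0.0) for k in set(d) | set(x)}


def temporal_risk(posts: List[Tuple[str, date]], x: Dict[str, float],
                  d: Dict[str, float], window_days: int = 7, m: int = 2) -> float:
    """min over the m strongest windows of cos(U_i, X), minus cos(U, D - X)."""
    start = min(t for _, t in posts)
    windows = defaultdict(list)
    for text, t in posts:
        windows[(t - start).days // window_days].append(text)
    scores = sorted((cosine(bow(texts), x) for texts in windows.values()), reverse=True)
    strongest = (scores + [0.0] * m)[:m]  # assumed padding for short profiles
    whole_profile = bow(text for text, _ in posts)
    return min(strongest) - cosine(whole_profile, subtract(d, x))


queries = ["anxiety", "feeling lonely", "psychiatrist nyc",
           "central park events", "antidepressants", "xanax side effects"]
spread_dates = [date(2012, 10, 24), date(2012, 10, 24), date(2012, 11, 3),
                date(2012, 11, 7), date(2012, 12, 3), date(2012, 12, 10)]
spread = list(zip(queries, spread_dates))
burst = [(q, date(2012, 10, 24)) for q in queries]

topic_x = {w: 1.0 for w in ["depression", "anxiety", "psychiatrist", "antidepressants", "xanax"]}
domain_d = dict(topic_x, research=1.0, career=1.0, psychiatry=1.0)

risk_spread = temporal_risk(spread, topic_x, domain_d)
risk_burst = temporal_risk(burst, topic_x, domain_d)
print(round(risk_spread, 3), round(risk_burst, 3))
```

Under these assumptions, a profile whose on-topic queries recur across several windows scores higher than one where the same queries fall into a single window.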




slide-44
SLIDE 44

QUESTIONS

➤ Risk assessment: baselines vs. topic-model-based
➤ Topic-model-based risk assessment: dimensions of interest

slide-45
SLIDE 45

EXPERIMENTAL SETUP

Datasets: AOL (500), Health Forums (400), Quora (200)
Topics: depression, drugs, pregnancy, hiv, debts
Vector space: BOW, LDA, W2V
Risk scoring methods: entropy, diff-priv, topic-model:strength, topic-model:breadth, topic-model:time, topic-model:breadth+time

slide-46
SLIDE 46

RISK LABELS USER STUDY (GROUND TRUTH COLLECTION)

24.10.2012  anxiety
24.10.2012  feeling lonely
27.10.2012  psychiatrist nyc
29.10.2012  central park events
03.12.2012  antidepressants
10.12.2012  xanax side effects

Based on the contents of this profile, would you suspect that the user … <is depressed / is drug addicted / has financial debts / …>?

1100 profiles * 5 workers
Moderate agreement

slide-47
SLIDE 47

RESULTS: BASELINES VS TOPIC-MODEL-BASED

Method                            AOL             Health Q&A      Quora
                                  R-prec  NDCG    R-prec  NDCG    R-prec  NDCG
entropy                           0.495   0.819   0.560   0.870   0.239   0.632
diff-priv                         0.475   0.789   0.560   0.794   0.239   0.623
topic-model: strength (word2vec)  0.556   0.836   0.664   0.894   0.343   0.637

Results averaged over 5 topics
R - # of positive (sensitive) samples
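For readers unfamiliar with the two metrics in the table, here is a small sketch of how they are commonly computed for one ranked list. R-precision follows the slide's note that R is the number of positive (sensitive) samples. The NDCG variant shown (linear gain, log2 discount) is the common textbook form and is an assumption, since the slides do not say which variant was used. The function names are hypothetical.

```python
import math
from typing import Sequence


def r_precision(ranked_labels: Sequence[int]) -> float:
    """Precision at rank R, where R is the number of positive (1) labels in the list."""
    r = sum(ranked_labels)
    if r == 0:
        return 0.0
    return sum(ranked_labels[:r]) / r


def ndcg(ranked_gains: Sequence[float]) -> float:
    """DCG of the given order divided by the DCG of the ideal (descending) order."""
    def dcg(gains: Sequence[float]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


# Labels of users in ranked order: 1 = judged sensitive, 0 = not (made-up example).
labels = [1, 0, 1, 0]
print(r_precision(labels), round(ndcg(labels), 3))
```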

slide-48
SLIDE 48

RESULTS: TOPIC-MODEL-BASED CONFIGURATIONS

Method                            AOL             Health Q&A      Quora
                                  R-prec  NDCG    R-prec  NDCG    R-prec  NDCG
topic-model: strength (lda)       0.525   0.796   0.649   0.913   0.358   0.715
topic-model: breadth (lda)        0.566   0.803   0.716   0.921   0.493   0.752
topic-model: time (lda)           0.578   0.879   0.709   0.925   0.299   0.669
topic-model: breadth+time (lda)   0.616   0.859   0.716   0.957   0.418   0.751

Results averaged over 5 topics
R - # of positive (sensitive) samples

slide-49
SLIDE 49

SUMMING UP: NOVEL CONTRIBUTIONS

➤ Risk measures for sensitive topics
➤ Framework for quantifying R-Susceptibility in online communities with textual data
➤ R-Susceptibility paradigm: privacy risks through exposure in rankings

slide-50
SLIDE 50


THANKS!

Joanna Asia Biega - jbiega@mpii.de