R-SUSCEPTIBILITY
Joanna Asia Biega, Krishna P. Gummadi, Ida Mele, Dragan Milchevski, Christos Tryfonopoulos, Gerhard Weikum
An IR-Centric Approach to Assessing Privacy Risks for Users in Online Communities
SIGIR 2016
2
Prevent deanonymization, prevent attribute disclosure
        gender  age  disease
user1   male    37   cancer
user2   male    37   heart disease
user3   female  42   cancer
Data publishing
Online communities
Account linking
Attribute inference
female 20-30
(not in this work)
4
Quantify, inform, and guide
17.05.2011, user2: Should I inform my potential employer during an interview that I am 3 months pregnant?
13.07.2011, user1: Studies show alarming depression rates among teenagers.
13.07.2011, user3: On a cocktail of antidepressants and getting crazy hallucinations :o
Build reputation
Get information
Share information
Not obvious how to apply noise
Great student party #sigir2016. Shouldn’t have drunk that much #wine. #drunk ;)
HR
Search: drunk wine party
user_1 user_2 user_3 user_4
9
Remote search U_1 U_2 U_k … HR Local crawl
12
Criterion: <topic>
1. U_1
2. U_2
…
k. U_k
Rank-Susceptibility
13
(1) Topics and (2) Sensitivity:
drug addiction: drug, addiction, addict, cocaine, … (sensitivity: high)
depression: depression, suicide, depressed, suffer, … (sensitivity: high)
financial debts: debt, loan, pay, student, … (sensitivity: medium)
(3) Risk Scores: R-Susceptibility rankings
U_1 U_2 U_k …
U_7 U_13 U_14 …
U_78 U_1 U_k …
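The framework slide ends with one ranking of users per topic, ordered by risk score. A minimal sketch of that last step, assuming the risk scores have already been computed; the function name `rank_users`, the user IDs, and all numbers are hypothetical and only illustrate the idea.

```python
def rank_users(risk_scores):
    """Return user IDs sorted from highest to lowest risk for one topic."""
    return sorted(risk_scores, key=lambda user: risk_scores[user], reverse=True)


# Hypothetical risk scores per topic (illustrative values, not from the study).
scores = {
    "depression": {"U_1": 0.91, "U_2": 0.40, "U_7": 0.75},
    "financial debts": {"U_1": 0.10, "U_2": 0.55, "U_7": 0.30},
}

# One R-Susceptibility ranking per topic.
rankings = {topic: rank_users(users) for topic, users in scores.items()}
```

The same user can appear near the top for one topic and near the bottom for another, which is why the rankings are kept per topic.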
➤ R-Susceptibility framework ➤ Topics ➤ Topic sensitivity ➤ Risk measures ➤ Baselines ➤ Topic-model-based ➤ Experiments ➤ Summary
16
Topics (LDA):
drug addiction: drug, addiction, addict, cocaine, …
depression: depression, suicide, depressed, suffer, …
financial debts: debt, loan, pay, student, …
Quora: 500 topics, 600k posts
NYT: 500 topics, 700k articles
18
Topics, with sensitivity (?):
drug addiction: drug, addiction, addict, cocaine, … (high)
depression: depression, suicide, depressed, suffer, … (high)
financial debts: debt, loan, pay, student, … (medium)
19
If a user’s post in an online community contained these words, would you consider it privacy sensitive?
20
Sensitivity judgements: yes yes yes yes no yes no no no
Sensitivity: # yes / # judgements
(2 topic models × 500 topics × 7 judgements per topic)
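A minimal sketch of the sensitivity score, assuming it is the share of "yes" answers among the judgements collected for one topic. The function name `topic_sensitivity` and the example votes are hypothetical and are not data from the study.

```python
def topic_sensitivity(judgements):
    """Fraction of 'yes' answers among the judgements for one topic."""
    if not judgements:
        raise ValueError("need at least one judgement")
    yes_count = sum(1 for answer in judgements if answer == "yes")
    return yes_count / len(judgements)


# Hypothetical answers from 7 judges for a single topic.
votes = ["yes", "yes", "yes", "no", "yes", "yes", "no"]
score = topic_sensitivity(votes)  # 5 of the 7 answers are "yes"
```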
22
Salient attributes for topic X: x_1 = depression, x_2 = anxiety, x_3 = psychiatrist, x_4 = paxil
Distributions P(0), P(1) for U and U^*: (community without user U), (community with user U)
Average KL-divergence of salient word distributions:
$\mathrm{risk}(U_0, X) = \frac{1}{j} \sum_{i} \sum_{v \in \{0,1\}} P_{U^*}[x_i = v] \, \log\left( \frac{P_U[x_i = v]}{P_{U^*}[x_i = v]} \right)$
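A minimal sketch of an average KL-divergence over salient words, assuming each word is modelled as a Bernoulli variable (word present or absent) in two communities. It uses the textbook orientation KL(P || Q) = sum of p log(p / q), so which community plays P and which plays Q is left to the caller and should be checked against the paper. The names `bernoulli_kl` and `average_kl_risk` and all probabilities are hypothetical.

```python
import math


def bernoulli_kl(p, q):
    """KL(P || Q) for Bernoulli distributions with P(1) = p and Q(1) = q.

    Assumes 0 < q < 1 so that no division by zero occurs.
    """
    total = 0.0
    for pv, qv in ((p, q), (1.0 - p, 1.0 - q)):
        if pv > 0.0:  # terms with zero probability contribute nothing
            total += pv * math.log(pv / qv)
    return total


def average_kl_risk(p_words, q_words):
    """Average the per-word Bernoulli KL divergence over the salient words."""
    return sum(bernoulli_kl(p_words[w], q_words[w]) for w in p_words) / len(p_words)


# Hypothetical P[x_i = 1] values for the four salient words.
with_user = {"depression": 0.30, "anxiety": 0.20, "psychiatrist": 0.10, "paxil": 0.05}
without_user = {"depression": 0.25, "anxiety": 0.20, "psychiatrist": 0.05, "paxil": 0.05}

risk = average_kl_risk(with_user, without_user)
```

Identical distributions give a risk of zero, and the score grows as the user's presence shifts the word probabilities.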
23
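$\mathrm{risk}(U_0, X) = \max_{x_i} \left( \max\left( \log\frac{P_U[x_i]}{P_{U^*}[x_i]}, \; \log\frac{P_{U^*}[x_i]}{P_U[x_i]} \right) \right)$
Distributions P(0), P(1) for U and U^*
Inspired by the differential privacy principle: $P_U[x_i] \le 2 P_{U^*}[x_i]$ and $P_{U^*}[x_i] \le 2 P_U[x_i]$
(the probability increases or decreases)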
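A minimal sketch of the max-log-ratio measure, assuming strictly positive word probabilities in both communities. Taking the larger of the two log-ratios is the same as taking the absolute value of one of them, which is what the code does. The name `max_log_ratio_risk` and all probabilities are hypothetical.

```python
import math


def max_log_ratio_risk(p_with, p_without):
    """Largest absolute log-ratio of word probabilities across salient words.

    Assumes every probability is strictly positive.
    """
    return max(abs(math.log(p_with[w] / p_without[w])) for w in p_with)


# Hypothetical word probabilities, for illustration only.
with_user = {"depression": 0.30, "anxiety": 0.20, "psychiatrist": 0.12, "paxil": 0.05}
without_user = {"depression": 0.25, "anxiety": 0.20, "psychiatrist": 0.05, "paxil": 0.05}

risk = max_log_ratio_risk(with_user, without_user)

# The two factor-2 bounds on the slide hold exactly when risk <= log 2.
within_factor_two = risk <= math.log(2)
```

Here the word "psychiatrist" dominates the score because its probability changes by a factor of 2.4, so the factor-2 bounds are violated.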
24
➤ R-Susceptibility framework ➤ Topics ➤ Topic sensitivity ➤ Risk measures ➤ Baselines ➤ Topic-model-based ➤ Strength of interest ➤ Breadth of interest ➤ Temporal variation of interest ➤ Experiments ➤ Summary
Which aspects matter when it comes to human risk perception?
26
Quantifying user interest in a topic
R-Susceptibility topic model
antidepressant, depression, celebrity, psychiatrist
X = Depression
U_X = User
Details in the paper
27
Profile 1:
24.10.2012  misbehaving dog
24.10.2012  dog trainers
27.10.2012  dentists LA
29.10.2012  knitting tutorial
03.12.2012  christmas tree shop LA
10.12.2012  christmas recipes
Profile 2:
24.10.2012  anxiety
24.10.2012  feeling lonely
27.10.2012  psychiatrist nyc
29.10.2012  central park events
03.12.2012  antidepressants
10.12.2012  xanax side effects
29
Three dimensions of user interest
Strength of interest: depression xanax psychiatrist
Ranking position
$\mathrm{risk}(U, X) = \cos(\vec{U}, \vec{X})$
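A minimal sketch of the strength-of-interest score as plain cosine similarity, assuming the user and the topic are already represented as vectors in the same space. The names `cosine` and `strength_risk` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def strength_risk(user_vec, topic_vec):
    """risk(U, X) = cos(U, X)."""
    return cosine(user_vec, topic_vec)


# Hypothetical vectors over four dimensions of a shared space.
user = [1.0, 1.0, 0.0, 0.0]
topic = [1.0, 0.0, 0.0, 0.0]
risk = strength_risk(user, topic)
```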
30
Profile 1:
24.10.2012  anxiety
24.10.2012  clinical depression
03.11.2012  anatomy course book
07.11.2012  central park events
03.12.2012  liver cancer stats
10.12.2012  anorexia nervosa
Profile 2:
24.10.2012  anxiety
24.10.2012  feeling lonely
03.11.2012  psychiatrist nyc
07.11.2012  central park events
03.12.2012  antidepressants
10.12.2012  xanax side effects
32
Quantifying user interest in a topic
R-Susceptibility topic model
antidepressant, depression, celebrity, psychiatrist
X = Depression
U_X = User
D = Psychiatry
Details in the paper
33
Three dimensions of user interest
Strength of interest: depression xanax psychiatrist
Breadth of interest: depression research career psychiatry
Ranking position
$\mathrm{risk}(U, X, D) = \cos(\vec{U}, \vec{X}) - \cos(\vec{U}, \vec{D} - \vec{X})$
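A minimal sketch of the breadth-adjusted score, assuming user, topic X, and broader domain D are vectors in the same space. Interest in the rest of the domain (D minus X) lowers the score. The names `cosine` and `breadth_risk` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def breadth_risk(user_vec, topic_vec, domain_vec):
    """risk(U, X, D) = cos(U, X) - cos(U, D - X)."""
    rest_of_domain = [d - x for d, x in zip(domain_vec, topic_vec)]
    return cosine(user_vec, topic_vec) - cosine(user_vec, rest_of_domain)


# Hypothetical 3-dimensional vectors.
topic = [1.0, 0.0, 0.0]         # X, e.g. depression
domain = [1.0, 1.0, 0.0]        # D, e.g. psychiatry as a whole
focused_user = [1.0, 0.0, 0.0]  # interested only in the topic itself
broad_user = [1.0, 1.0, 0.0]    # equally interested in the whole domain

focused_risk = breadth_risk(focused_user, topic, domain)
broad_risk = breadth_risk(broad_user, topic, domain)
```

In this toy setup the focused user keeps the full score, while the broad user's interest in the rest of the domain cancels it out.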
34
Profile 1:
24.10.2012  anxiety
24.10.2012  feeling lonely
03.11.2012  psychiatrist nyc
07.11.2012  central park events
03.12.2012  antidepressants
10.12.2012  xanax side effects
Profile 2:
24.10.2012  anxiety
24.10.2012  feeling lonely
24.10.2012  psychiatrist nyc
24.10.2012  central park events
24.10.2012  antidepressants
24.10.2012  xanax side effects
36
$U = \{(v_1, t_1), \ldots, (v_k, t_k)\}$ over time
$\vec{U}_1, \vec{U}_2, \ldots, \vec{U}_j$
$\mathrm{risk}(U_i, X, D)$
top-n
41
Three dimensions of user interest
Strength of interest: depression xanax psychiatrist
Breadth of interest: depression research career psychiatry
Temporal variation of interest: lonely depressed xanax effects (over time)
Ranking position
$\mathrm{risk}(U, X) = \min_{i=1..m} \left\{ \cos\left(\vec{U}^*_i, \vec{X}\right) \right\} - \cos\left(\vec{U}, \left(\vec{D} - \vec{X}\right)\right)$
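A minimal sketch of the combined breadth and time score, assuming the per-window vectors have already been selected and are passed in directly; how the windows are chosen (the top-n step) is not reproduced here. The names `cosine` and `temporal_risk` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def temporal_risk(window_vecs, user_vec, topic_vec, domain_vec):
    """min over windows of cos(U*_i, X), minus cos(U, D - X)."""
    rest_of_domain = [d - x for d, x in zip(domain_vec, topic_vec)]
    weakest_window = min(cosine(w, topic_vec) for w in window_vecs)
    return weakest_window - cosine(user_vec, rest_of_domain)


# Hypothetical 3-dimensional vectors.
topic = [1.0, 0.0, 0.0]   # X
domain = [1.0, 1.0, 0.0]  # D

# A user who touches the topic in every window ...
persistent_windows = [[1.0, 0.0, 0.0], [1.0, 0.0, 1.0]]
persistent = temporal_risk(persistent_windows, [2.0, 0.0, 1.0], topic, domain)

# ... versus a user who touches it in only one window.
one_off_windows = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
one_off = temporal_risk(one_off_windows, [1.0, 0.0, 1.0], topic, domain)
```

Because the minimum is taken over windows, a single window without topical interest pulls the score down, so sustained interest ranks above a one-off burst.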
43
➤ R-Susceptibility framework ➤ Topics ➤ Topic sensitivity ➤ Risk measures ➤ Baselines ➤ Topic-model-based ➤ Experiments ➤ Summary
➤ Risk assessment: baselines vs. topic-model-based ➤ Topic-model-based risk assessment: dimensions of interest
44
Datasets: AOL (500), Health Forums (400), Quora (200)
Topics: depression, drugs, pregnancy, hiv, debts
Vector space: BOW, LDA, W2V
Risk scoring methods: entropy, diff-priv, topic-model:strength, topic-model:breadth, topic-model:time, topic-model:breadth+time
46
24.10.2012  anxiety
24.10.2012  feeling lonely
27.10.2012  psychiatrist nyc
29.10.2012  central park events
03.12.2012  antidepressants
10.12.2012  xanax side effects
Based on the contents of this profile, would you suspect that the user … <is depressed / is drug addicted / has financial debts / …>?
1100 profiles × 5 workers; moderate agreement
47
Method                             AOL             Health Q&A      Quora
                                   R-prec  NDCG    R-prec  NDCG    R-prec  NDCG
entropy                            0.495   0.819   0.560   0.870   0.239   0.632
diff-priv                          0.475   0.789   0.560   0.794   0.239   0.623
topic-model: strength (word2vec)   0.556   0.836   0.664   0.894   0.343   0.637
Results averaged over 5 topics. R = # of positive (sensitive) samples.
48
Method                             AOL             Health Q&A      Quora
                                   R-prec  NDCG    R-prec  NDCG    R-prec  NDCG
topic-model: strength (lda)        0.525   0.796   0.649   0.913   0.358   0.715
topic-model: breadth (lda)         0.566   0.803   0.716   0.921   0.493   0.752
topic-model: time (lda)            0.578   0.879   0.709   0.925   0.299   0.669
topic-model: breadth+time (lda)    0.616   0.859   0.716   0.957   0.418   0.751
Results averaged over 5 topics. R = # of positive (sensitive) samples.
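A minimal sketch of the two reported metrics, assuming binary labels (1 = judged sensitive, 0 = not) listed in the order a risk measure ranks the users, and the common log2 discount for NDCG. The gain and discount used in the actual evaluation may differ. The function names and the example ranking are hypothetical.

```python
import math


def r_precision(ranked_labels):
    """Precision within the top R results, where R is the number of positives."""
    r = sum(ranked_labels)
    if r == 0:
        return 0.0
    return sum(ranked_labels[:r]) / r


def ndcg(ranked_labels):
    """Normalized discounted cumulative gain with a log2 position discount."""
    def dcg(labels):
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(labels))

    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0


# Hypothetical ranking of four users; two of them were judged sensitive.
ranking = [1, 0, 1, 0]
rp = r_precision(ranking)
gain = ndcg(ranking)
```

R-precision only looks at the top R positions, while NDCG also rewards positives that appear lower in the ranking, with a smaller weight.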
Risk measures for sensitive topics
Framework for quantifying R-Susceptibility in online communities with textual data
R-Susceptibility paradigm: privacy risks through exposure in rankings
50
Joanna Asia Biega - jbiega@mpii.de