Privacy in Recommender Systems
CompSci 590.03 Instructor: Ashwin Machanavajjhala
Lecture 21: 590.03 Fall 12 1
Outline
– What is a Recommender System?
– Recommender Systems & Privacy Breaches
– Algorithms for private recommendations
  – Untrusted server
  – Trusted server
  – A theoretical trade-off between privacy and utility
Users interact with items in many ways:
– Rate items (movies, products, etc.)
– Click items (news articles, advertisements)
– Browse items (products, webpages)
The recommender collects a database of the history of activities from a number of users.
– Neighborhood-based: the utility of a new item to a user is proportional to its utility to similar users.
– Latent factor models: users and items are described by a latent model with a small number of dimensions (e.g., a user likes "science fiction action movies").
– Baselines: popular items have higher utility in general; items may also have higher utility due to presentation bias; …
Ratings matrix (users U1–U4 over items I1–I9; each user rates only a subset):
U1: 2, 1, 1, 4, 4, 3, 4, 3
U2: 1, 1, 5, 5, 4, 4, 3
U3: 4, 5, 5, 2, 3, 3, 2
U4: 5, 4, 1, 3, 2, 2
Average rating of User 1 = 2.8; average rating of User 2 = 3.3
Mean-centered ratings (each user's average subtracted; e.g., U1's rating of 4 becomes 4 - 2.8 = 1.2):
U1: -0.8, -1.8, -1.8, 1.2, 1.2, 0.2, 1.2, 0.2
U2: -2.3, -2.3, 1.7, 1.7, 0.7, 0.7, -0.3
U3: 0.6, 1.6, 1.6, -1.4, -0.4, -0.4, -1.4
U4: 2.2, 1.2, -1.8, 0.2, -0.8, -0.8
Rescale the users
Compute Similarities
        U1     U2     U3     U4
U1       1   0.78
U2    0.78      1
U3                     1   0.83
U4                  0.83      1
Predict a missing rating: a user's unknown rating for an item (e.g., I3) is estimated from the ratings of similar users, weighted by similarity.
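The pipeline on the last few slides (mean-center the ratings, compute similarities on the centered vectors, predict by a similarity-weighted average) can be sketched in a few lines. The toy ratings and function names below are illustrative, not the table from the slides.

```python
import math

# Toy user-based collaborative filtering: mean-center each user's
# ratings, compute cosine similarity on the centered vectors, then
# predict a missing rating as the user's average plus a similarity-
# weighted average of neighbors' centered ratings.
ratings = {
    "U1": {"I1": 2, "I2": 1, "I4": 4, "I5": 4, "I6": 3},
    "U2": {"I1": 1, "I2": 1, "I3": 5, "I4": 5, "I5": 5, "I6": 4},
    "U3": {"I1": 4, "I2": 5, "I3": 5, "I4": 2, "I5": 1, "I6": 2},
}

def average(u):
    return sum(ratings[u].values()) / len(ratings[u])

def centered(u):
    avg = average(u)
    return {i: r - avg for i, r in ratings[u].items()}

def similarity(u, v):
    # cosine similarity over the items both users rated
    cu, cv = centered(u), centered(v)
    common = set(cu) & set(cv)
    num = sum(cu[i] * cv[i] for i in common)
    den = math.sqrt(sum(cu[i] ** 2 for i in common)) * \
          math.sqrt(sum(cv[i] ** 2 for i in common))
    return num / den if den else 0.0

def predict(u, item):
    # neighbors who rated the item, weighted by similarity to u
    neighbors = [v for v in ratings if v != u and item in ratings[v]]
    num = sum(similarity(u, v) * centered(v)[item] for v in neighbors)
    den = sum(abs(similarity(u, v)) for v in neighbors)
    return average(u) + (num / den if den else 0.0)
```

Here `predict("U1", "I3")` fills U1's missing rating for I3 from U2 (a positively similar user) and U3 (a negatively similar one).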
– Untrusted server
– Trusted server
– A theoretical trade-off between privacy and utility
Privacy breaches can occur at several points:
– The (untrusted) server sees the ratings of every user.
– Recommendations can reveal who rated which items.
– Ratings can be actively inferred by a fake user (adversary).
– Untrusted server
– Trusted server
– A theoretical trade-off between privacy and utility
Untrusted server setting: users want useful recommendations without revealing their true set of ratings to the server.
Cryptographic approach [Canny, SIGIR 2002]
– Protects information from an untrusted server
– Does not protect against active attacks

Randomization approach [Evfimievski et al., PODS 2003]
– Protects information from an untrusted server
– Protects against active attacks
Example (randomization): Alice (J.S. Bach, painting, nasa.gov, …), Bob (baseball, cnn.com, …), and Chris (camping, linux.org, …) send their interests to a server, which builds a data-mining model used to make recommendations. With randomization, each user instead sends a perturbed list (Alice: Metallica, painting, nasa.gov, …; Bob: soccer, bbc.co.uk, …; Chris: camping, microsoft.com, …), and the server first applies statistics recovery before mining.
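A minimal sketch of the randomize-then-recover idea above, in the spirit of (but not identical to) the Evfimievski et al. scheme: each user reports whether they have a given interest truthfully only with probability p, and the server recovers the true aggregate count from the noisy reports. The flip probability and population are illustrative.

```python
import random

def perturb(has_interest, p, rng):
    # report the truth with probability p, the opposite otherwise
    return has_interest if rng.random() < p else not has_interest

def recover_count(reports, p):
    # E[#yes] = p*t + (1-p)*(n-t); solve for the true count t
    n, yes = len(reports), sum(reports)
    return (yes - (1 - p) * n) / (2 * p - 1)

rng = random.Random(0)
truth = [True] * 300 + [False] * 700   # 300 users truly like "baseball"
reports = [perturb(t, 0.75, rng) for t in truth]
estimate = recover_count(reports, 0.75)  # close to 300
```

The server never sees any individual's true bit, yet the aggregate count is recoverable, which is exactly the statistics-recovery step in the picture above.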
– Untrusted server
– Trusted server
– A theoretical trade-off between privacy and utility
Recommend ads based on private shopping histories of “friends” in the social network.
A product that is followed by your friends …
The fact that "Betty" liked "VistaPrint" is leaked to "Alice".
Alice is recommended 'X'. Can we provide accurate recommendations to Alice based on the social network, while ensuring that Alice cannot deduce that Betty likes 'X'?
– Agents: Yahoo/Facebook users, medical patients
– Items: other users (friends), advertisements, products (drugs)
– Data: social network, patient-doctor and patient-drug history
– Goal: recommend a new item i to agent a based on the network
Special case (friend recommendation): recommend a new friend i to target user a based on the social network.
Utility function u(a, i): the utility of recommending candidate i to target a.
(Figure: target node a and candidate recommendations i1, i2, i3 with utilities u(a, i1), u(a, i2), u(a, i3).)
Algorithm:
for each target node a
    for each candidate i
        compute p(a, i) that maximizes Σ_i u(a, i) · p(a, i)
    end for
    randomly pick one of the candidates with probability p(a, i)
end for
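The loop above can be sketched directly. With no privacy constraint, the distribution p(a, i) maximizing Σ_i u(a, i) · p(a, i) puts all probability mass on the highest-utility candidate, so the "random" pick is deterministic. The names and toy utility below are illustrative.

```python
import random

def choose(target, candidates, u):
    # the maximizing p(a, i) is a point mass on the best candidate
    best = max(candidates, key=lambda i: u(target, i))
    p = {i: (1.0 if i == best else 0.0) for i in candidates}
    # sample one candidate according to p(a, i)
    return random.choices(candidates, weights=[p[i] for i in candidates])[0]

toy_u = lambda a, i: {"i1": 2, "i2": 3, "i3": 1}[i]
pick = choose("a", ["i1", "i2", "i3"], toy_u)  # always "i2"
```

The differentially private versions discussed later keep this same loop but spread p(a, i) over all candidates.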
Two families of utility functions [Liben-Nowell, Kleinberg 2003]:
– 2-hop neighborhood (local)
– Holistic (based on the whole graph)
Utility function u(a, i): the utility of recommending candidate i to target a.
Common Neighbors utility: "Alice and Bob are likely to be friends if they have many common neighbors."
In the figure, u(a, i1) = f(2), u(a, i2) = f(3), u(a, i3) = f(1), so the non-private algorithm recommends i2.
– Adamic-Adar: two nodes are more similar if they have more common neighbors that have smaller degrees.
– Path-based (e.g., Katz): two nodes are similar if they are connected by shorter paths.
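The two utility families above can be sketched on a graph stored as adjacency sets: `adamic_adar` weights common neighbors inversely by the log of their degree, and `path_score` rewards shorter paths via breadth-first search. The example graph and function names are illustrative.

```python
import math

def adamic_adar(adj, a, i):
    # common neighbors with smaller degree count for more
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[a] & adj[i] if len(adj[z]) > 1)

def path_score(adj, a, i):
    # similarity decays with shortest-path distance (BFS by levels)
    frontier, seen, dist = {a}, {a}, 0
    while frontier:
        if i in frontier:
            return 1.0 / dist if dist else 0.0
        dist += 1
        frontier = {w for v in frontier for w in adj[v]} - seen
        seen |= frontier
    return 0.0
```

On a 4-node example where a and i share the two common neighbors b and c (each of degree 3), `adamic_adar` gives 2/log 3 and `path_score` gives 1/2.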
Tension: recommendations should not disclose the existence of private edges in the network, yet they must be based on those private edges.
Differential privacy [Dwork 2006]
An algorithm is differentially private if, for every pair of graphs G1 and G2 that differ in a single private edge (a, i), Pr[recommending (j, a) | G1] ≤ e^ε · Pr[recommending (j, a) | G2].
Accuracy: the algorithm recommends each candidate i with some probability p_i; its accuracy is measured by comparing its expected utility with the utility of the best non-private algorithm.
Theorem: No deterministic algorithm guarantees differential privacy.
Two standard randomized techniques:
– Exponential mechanism: sample the output space with probabilities based on a distance (utility) metric.
– Laplace mechanism: add noise from a Laplace distribution to query answers.
Exponential Mechanism [McSherry et al. 2007]: recommend candidate i with probability proportional to exp(ε · u(a, i) / 2Δ)
(Δ is the maximum change in utilities caused by changing one edge)
Must pick a node with non-zero probability even if u = 0.
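The exponential mechanism above can be sketched as follows; the candidate list and utilities are illustrative. Note that even a candidate with u = 0 gets non-zero weight, which is the source of the accuracy loss in the experiments that follow.

```python
import math
import random

def exponential_mechanism(candidates, utilities, eps, delta, rng=random):
    # weight each candidate by exp(eps * u / (2 * delta)),
    # then sample proportionally to the weights
    weights = [math.exp(eps * u / (2 * delta)) for u in utilities]
    total = sum(weights)
    r, acc = rng.random() * total, 0.0
    for cand, w in zip(candidates, weights):
        acc += w
        if r < acc:
            return cand
    return candidates[-1]
```

As eps → 0 the weights become uniform (pure noise, perfect privacy); as eps grows the highest-utility candidate dominates.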
Datasets (from the Stanford network analysis package):
– Wikipedia vote network: users casting votes for administrators; 7K nodes and 100K edges
– Follow relationships; 96K nodes and 490K edges
(Plot: fraction of nodes receiving recommendations at each accuracy level, for accuracies 0.0–1.0.) 60% of users have accuracy < 10%. [Machanavajjhala et al., VLDB 2011]
(Plot: fraction of nodes receiving recommendations at each accuracy level.) 98% of users have accuracy < 5%. [Machanavajjhala et al., VLDB 2011]
Is the poor accuracy an artifact of our choice of utility function or algorithm?
– Consider general utility functions that follow intuitive axioms.
– Consider any algorithm that satisfies differential privacy.
Random Predictor: Every node in the graph is an equally good recommendation for a target node.
Axiom (concentration): there exists a subset of nodes S in the graph such that |S| = β = o(n / log n) and S captures most of the utility: "Most of the utility of recommendation to a target is concentrated on a small number of candidates."
Impersonal Predictor: There is one (or a small set) of nodes that are good recommendations for every node.
Axiom (isomorphism): let G be a graph and h be an isomorphism of G that fixes a. If candidates i3 and i4 are identical with respect to a (one is mapped to the other by h), then u(a, i3) = u(a, i4).
Theorem: For a graph with maximum degree dmax, a differentially private algorithm can guarantee constant accuracy only if …
[Machanavajjhala et al., VLDB 2011]
Corollary: the bound applies to natural utility functions …
* under some mild assumptions on the Katz (weighted paths) utility
[Machanavajjhala et al., VLDB 2011]
(Plot: fraction of nodes receiving recommendations at each accuracy level.) 60% of users have accuracy < 55%. [Machanavajjhala et al., VLDB 2011]
(Plot: fraction of nodes receiving recommendations at each accuracy level.) 95% of users have accuracy < 5%. [Machanavajjhala et al., VLDB 2011]
For the majority of the nodes in the network, recommendations must either be inaccurate or violate differential privacy!
Proof sketch. Theorem: For a graph with maximum degree dmax = α log n, a differentially private algorithm can guarantee constant accuracy only if …
(Figure: neighboring graphs G1 and G2, each containing nodes a, i, j; the utilities u1(a, i), u1(a, j) and u2(a, i), u2(a, j) induce recommendation probabilities p1(a, i), p1(a, j) and p2(a, i), p2(a, j). Differential privacy bounds the ratio p1(a, i) / p2(a, i).)
(Figure: a chain of neighboring graphs G1, G2, G3, … obtained by single-edge changes; applying the privacy constraint along the chain relates the probabilities of recommending different candidates.)
A short sequence of edge changes (with respect to the utility function) turns a high-utility candidate i into a low-utility candidate j, so differential privacy forces p1(a, i) and p1(a, j) to be close. For the common-neighbors utility:
– 10s of nodes share a common neighbor with a
– Millions of nodes don't share a common neighbor with a
so most of the probability mass must go to low-utility candidates.
Summary: can we make accurate personalized recommendations while guaranteeing strong privacy conditions? For most nodes, no, even for:
– General utility functions satisfying natural axioms
– Any algorithm satisfying differential privacy
References
– A. Machanavajjhala, A. Korolova, A. Das Sarma. "Personalized Social Recommendations: Accurate or Private?" PVLDB 2011.
– A. Evfimievski, J. Gehrke, R. Srikant. "Limiting Privacy Breaches in Privacy Preserving Data Mining." PODS 2003.