
SLIDE 1

Privacy in Recommender Systems

CompSci 590.03, Lecture 21 (Fall 2012)
Instructor: Ashwin Machanavajjhala

SLIDE 2

Outline

  • What is a Recommender System?
  • Recommender Systems & Privacy Breaches
  • Algorithms for private recommender systems
      – Untrusted Server
      – Trusted Server
  • Social Recommendations: a theoretical trade-off between privacy and utility

SLIDE 3

Recommender systems appear in everyday life …

SLIDE 4

Recommendation Engine

  • User activity:
      – Rate items (movies, products, etc.)
      – Click items (news articles, advertisements)
      – Browse items (products, webpages)
  • Task: Predict the utility of items to a particular user, based on a database of past activity from many users.

SLIDE 5

Database of ratings

[Figure: a matrix of ratings, with users as rows and items as columns.]

SLIDE 6

Algorithms for Collaborative Filtering

  • Neighborhood-based
      – The utility of a new item to a user is proportional to its utility to similar users.
  • Latent Factor Models (see the sketch after this list)
      – Users and items are described by a latent model with a small number of dimensions.
      – e.g., a user likes “science fiction action movies”
  • Accounting for temporal dynamics and biases
      – Popular items have higher utility in general.
      – Items may have higher utility due to presentation bias.
      – …
  • See Yehuda Koren’s tutorial [ACM RecSys 2008]
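To make the latent-factor idea concrete, here is a minimal matrix-factorization sketch (Python; our illustration, not code from the lecture): each user and item gets a small latent vector, fit by stochastic gradient descent on the observed ratings. The hyperparameters and toy data are assumptions.

    import numpy as np

    # Learn k-dimensional user/item vectors from (user, item, rating) triples
    # so that P[u] @ Q[i] approximates the rating.
    def factorize(triples, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=500):
        rng = np.random.default_rng(0)
        P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
        Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
        for _ in range(epochs):
            for u, i, r in triples:
                err = r - P[u] @ Q[i]
                P[u] += lr * (err * Q[i] - reg * P[u])
                Q[i] += lr * (err * P[u] - reg * Q[i])
        return P, Q

    # toy usage: two users, three items, four observed ratings
    P, Q = factorize([(0, 0, 2.0), (0, 1, 1.0), (1, 0, 1.0), (1, 2, 5.0)], 2, 3)
    print(P[1] @ Q[2])  # approximately reconstructs user 1's rating of item 2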

SLIDE 7

Example: Neighborhood-based Algorithm

Ratings matrix (rows = users, columns = items I1 through I9; blanks = unrated):
U1: 2, 1, 1, 4, 4, 3, 4, 3
U2: 1, 1, 5, 5, 4, 4, 3
U3: 4, 5, 5, 2, 3, 3, 2
U4: 5, 4, 1, 3, 2, 2

Average rating of User 1 = 2.8; average rating of User 3 = 3.4.

SLIDE 8

Example: Neighborhood-based Algorithm

Rescale the users: subtract each user’s average rating from their ratings.

Mean-centered ratings (blanks = unrated):
U1: -0.8, -1.8, -1.8, 1.2, 1.2, 0.2, 1.2, 0.2
U2: -2.3, -2.3, 1.7, 1.7, 0.7, 0.7, -0.3
U3: 0.6, 1.6, 1.6, -1.4, -0.4, -0.4, -1.4
U4: 2.2, 1.2, -1.8, 0.2, -0.8, -0.8

SLIDE 9

Example: Neighborhood-based Algorithm

Mean-centered ratings (blanks = unrated):
U1: -0.8, -1.8, -1.8, 1.2, 1.2, 0.2, 1.2, 0.2
U2: -2.3, -2.3, 1.7, 1.7, 0.7, 0.7, -0.3
U3: 0.6, 1.6, 1.6, -1.4, -0.4, -0.4, -1.4
U4: 2.2, 1.2, -1.8, 0.2, -0.8, -0.8

Compute similarities between users from these centered vectors.
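A hedged sketch of the similarity step (Python). The exact item alignment on the slides is not recoverable, so the small matrix below is a stand-in with np.nan marking unrated cells; the Pearson-style score over co-rated items is the standard choice for this method, not necessarily the lecture’s exact formula.

    import numpy as np

    R = np.array([[2.0, 1.0, 1.0, 4.0, np.nan],
                  [1.0, 1.0, 5.0, np.nan, 4.0]])

    def similarity(r1, r2):
        both = ~np.isnan(r1) & ~np.isnan(r2)   # co-rated items only
        c1 = r1[both] - np.nanmean(r1)         # center by each user's mean
        c2 = r2[both] - np.nanmean(r2)
        return (c1 @ c2) / (np.linalg.norm(c1) * np.linalg.norm(c2))

    print(similarity(R[0], R[1]))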

SLIDE 10

Example: Neighborhood-based Algorithm

User-user similarities:

        U1     U2     U3     U4
U1     1      0.78  -0.96  -0.85
U2     0.78   1     -0.74  -0.77
U3    -0.96  -0.74   1      0.83
U4    -0.85  -0.77   0.83   1

Predict a missing rating: for item I3, the mean-centered ratings are U1 = -1.8, U2 = -2.3, U3 = 1.6, and U4 = ? is to be predicted.
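A hedged sketch of the final step: predict U4’s centered rating for I3 as a similarity-weighted average of the other users’ centered ratings, then add back U4’s mean of 2.8. The slide does not show the exact weighting, so this standard variant (weights normalized by total absolute similarity) is an assumption.

    import numpy as np

    # sim(U4, U1), sim(U4, U2), sim(U4, U3) from the similarity matrix above
    sims = np.array([-0.85, -0.77, 0.83])
    # the other users' mean-centered ratings of I3
    centered_i3 = np.array([-1.8, -2.3, 1.6])
    u4_mean = 2.8  # U4's average rating

    # weighted average of centered ratings, then undo the centering
    pred = u4_mean + (sims @ centered_i3) / np.abs(sims).sum()
    print(round(pred, 1))  # -> 4.7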

SLIDE 11

Outline

  • What is a Recommender System?
  • Recommender Systems & Privacy Breaches
  • Algorithms for private recommender systems
      – Untrusted Server
      – Trusted Server
  • Social Recommendations: a theoretical trade-off between privacy and utility

SLIDE 12

Active Privacy Attack

  • The adversary knows a subset of the items rated/purchased by the target user.
  • The adversary creates a new fake account and rates the same set of items.
  • Other items highly rated by the target user are then recommended to the fake user (the adversary).

Ratings matrix (rows = users, columns = items I1 through I9; blanks = unrated):
U1: 2, 1, 1, 4, 4, 3, 4, 3
U2: 1, 1, 5, 5, 4, 4, 3
U3: 4, 5, 5, 2, 3, 3, 2
U4: 5, 4, 1, 3, 2, 2

SLIDE 13

Outline

  • What is a Recommender System?
  • Recommender Systems & Privacy Breaches
  • Algorithms for private recommender systems
      – Untrusted Server
      – Trusted Server
  • Social Recommendations: a theoretical trade-off between privacy and utility

SLIDE 14

Untrusted Server

  • The users do not trust the server and do not want to disclose their true sets of ratings.
  • Distributed Recommendations [Canny, SIGIR 02]
      – Protects information from the untrusted server
      – Does not protect against the active attack
  • Randomized Response [Evfimievski et al., PODS 03]
      – Protects information from the untrusted server
      – Protects against the active attack

SLIDE 15

Randomized Response

[Figure: three users and a server. Alice’s private profile: B. Spears, baseball, cnn.com, …; Bob’s: J.S. Bach, painting, nasa.gov, …; Chris’s: B. Marley, camping, linux.org, …]

SLIDE 16

Randomized Response

[Figure: Alice, Bob, and Chris each send their true profile (B. Spears, baseball, cnn.com, …; J.S. Bach, painting, nasa.gov, …; B. Marley, camping, linux.org, …) to the server.]

SLIDE 17

Randomized Response

[Figure: as before, but the server now uses the collected profiles for data mining and model usage.]

SLIDE 18

Randomized Response

[Figure: the users now send randomized profiles instead. Alice reports “B. Spears, soccer, bbc.co.uk, …”, Bob reports “Metallica, painting, nasa.gov, …”, and Chris reports “B. Marley, camping, microsoft.com, …”. The server applies statistics recovery to the randomized data before data mining and model usage.]

SLIDE 19

One Algorithm: Select-a-size

  • Pick a number j at random
  • Select j of the original items
  • Insert each new item with probability ρ
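A minimal sketch of this randomizer (Python). The uniform choice of j and the explicit item universe are our simplifications; the paper tunes the size distribution.

    import random

    # Select-a-size sketch: keep a random-size subset of the true items,
    # then add each other item from the universe independently with
    # probability rho.
    def select_a_size(true_items, universe, rho):
        j = random.randint(0, len(true_items))  # pick a number j at random
        kept = random.sample(true_items, j)     # select j original items
        noise = [item for item in universe
                 if item not in true_items and random.random() < rho]
        return kept + noise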

SLIDE 20

Trusted Server

  • Differentially Private Recommendations [McSherry et al KDD 09]

SLIDE 21

Outline

  • What is a Recommender System?
  • Recommender Systems & Privacy Breaches
  • Algorithms for private recommender systems
      – Untrusted Server
      – Trusted Server
  • Social Recommendations: a theoretical trade-off between privacy and utility

SLIDE 22

Personalized Social Recommendations

Recommend ads based on the private shopping histories of “friends” in the social network.

[Figure: Alice and Betty in a social network, with shopping histories such as Armani, Gucci, Prada and Nikon, HP, Nike.]

SLIDE 23

Social Advertising … in the real world

[Figure: a real social ad: “A product that is followed by your friends …”]

Items (products/people) liked by Alice’s friends are better recommendations for Alice.

SLIDE 24

Social Advertising … the privacy problem

Only the items (products/people) liked by Alice’s friends are recommended to Alice, so the fact that “Betty” liked “VistaPrint” is leaked to “Alice”.

[Figure: Alice and Betty in the social network.]

SLIDE 25

Social Advertising … the privacy problem

Recommending irrelevant items sometimes improves privacy, but reduces accuracy.

SLIDE 26

Social Advertising … the privacy problem

Suppose Alice is recommended ‘X’. Can we provide accurate recommendations to Alice based on the social network, while ensuring that Alice cannot deduce that Betty likes ‘X’?

SLIDE 27

Social Recommendations

  • A set of agents
      – Yahoo/Facebook users, medical patients
  • A set of recommended items
      – Other users (friends), advertisements, products (drugs)
  • A network of edges connecting the agents and items
      – Social network, patient-doctor and patient-drug history
  • Problem:
      – Recommend a new item i to agent a based on the network

SLIDE 28

Social Recommendations (this talk)

  • A set of agents
      – Yahoo/Facebook users, medical patients
  • A set of recommended items
      – Other users (friends), advertisements, products (drugs)
  • A network of edges connecting the agents and items
      – Social network, patient-doctor and patient-drug history
  • Problem:
      – Recommend a new friend i to target user a based on the social network

SLIDE 29

Social Recommendations

[Figure: a target node a and candidate recommendations i1, i2, i3, with utilities u(a, i1), u(a, i2), u(a, i3).]

Utility function u(a, i): the utility of recommending candidate i to target a.

SLIDE 30

Non-Private Recommendation Algorithm

Utility function u(a, i): the utility of recommending candidate i to target a.

Algorithm:

    for each target node a:
        for each candidate i:
            compute the p(a, i) that maximize Σ_i u(a, i) · p(a, i)
        randomly pick one of the candidates with probability p(a, i)

SLIDE 31

Good utility functions for link prediction [Liben-Nowell & Kleinberg 2003]

  • 2-hop neighborhood
      – Common Neighbors
      – Adamic/Adar
  • Holistic
      – Katz (weighted paths)
      – Personalized PageRank

SLIDE 32

Example: Common Neighbors Utility

Common Neighbors utility: “Alice and Bob are likely to be friends if they have many common neighbors.” Here u(a, i) is a function of the number of neighbors that a and i share; in the example graph, u(a, i1) = f(2), u(a, i2) = f(3), u(a, i3) = f(1).

Non-private algorithm (sketched below):
  • Return the candidate with maximum u(a, i), or
  • Randomly pick a candidate with probability proportional to u(a, i)
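A minimal sketch of this utility and both non-private selection rules (Python); the adjacency layout `graph` (node → set of neighbors) is our assumption:

    import random

    def common_neighbors(graph, a, i):
        # number of neighbors shared by a and i
        return len(graph[a] & graph[i])

    def recommend(graph, a, candidates):
        utils = {i: common_neighbors(graph, a, i) for i in candidates}
        best = max(utils, key=utils.get)               # argmax rule
        sampled = random.choices(list(utils),          # proportional rule
                                 weights=list(utils.values()))[0]
        return best, sampled

    # toy graph matching the slide's counts: u(a,i1)=2, u(a,i2)=3, u(a,i3)=1
    graph = {"a": {"x", "y", "z"}, "i1": {"x", "y"},
             "i2": {"x", "y", "z"}, "i3": {"z"}}
    print(recommend(graph, "a", ["i1", "i2", "i3"]))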

SLIDE 33

Other utility functions

  • Adamic/Adar
      – Two nodes are more similar if they have more common neighbors of small degree.
  • Katz
      – Two nodes are more similar if they are connected by shorter paths.
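Hedged sketches of both scores, following the standard definitions from the link-prediction literature; the decay parameter beta and the precomputed path counts are our assumptions:

    import math

    # Adamic/Adar: common neighbors count more when their degree is small.
    def adamic_adar(graph, a, i):
        return sum(1.0 / math.log(len(graph[z]))
                   for z in graph[a] & graph[i] if len(graph[z]) > 1)

    # Katz: weighted path count; shorter paths contribute more.
    # paths_by_length[l] = number of length-l paths between the two nodes
    # (assumed precomputed); beta is a decay parameter of our choosing.
    def katz(paths_by_length, beta=0.05):
        return sum((beta ** l) * count
                   for l, count in paths_by_length.items())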

SLIDE 34

Privacy

The tension: the system should not disclose the existence of private edges in the network, yet it should still be allowed to make recommendations based on those private edges.

SLIDE 35

Differential Privacy [Dwork 2006]

For every pair of inputs D1, D2 that differ in one value, and for every output O, the adversary should not be able to distinguish between D1 and D2 based on O:

    log( Pr[D1 → O] / Pr[D2 → O] ) < ε    (ε > 0)

SLIDE 36

Differential Privacy for Social Recommendations

  • Sensitive edges: edges not connected to the target node a.

[Figure: a graph G1 containing the target node a, a candidate node i, and a sensitive edge incident to i.]

SLIDE 37

Differential Privacy for Social Recommendations

  • Sensitive edges: edges not connected to the target node a.
  • Recommending a node j should not disclose the existence of a sensitive edge to a:

    log( Pr[recommending (j, a) | G1] / Pr[recommending (j, a) | G2] ) < ε

[Figure: graphs G1 and G2, each containing target a and nodes i, j, differing in a single sensitive edge.]

SLIDE 38

Differential Privacy for Social Recommendations

  • Smaller values of ε mean more privacy.
  • We want ε to be a small constant.

    log( Pr[recommending (j, a) | G1] / Pr[recommending (j, a) | G2] ) < ε

SLIDE 39

Measuring the loss in utility due to privacy

  • Suppose algorithm A recommends node i of utility ui with probability pi.
  • The accuracy of (A, u) is defined by comparison with the utility of the best non-private algorithm (which always recommends a highest-utility node).
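The formula itself did not survive extraction. A plausible reconstruction, consistent with the bullets above (the expected utility of A normalized by the best achievable utility), is:

\[
  \mathrm{accuracy}(A, u) \;=\; \frac{\sum_i p_i\, u_i}{\max_i u_i}
\]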

SLIDE 40

Algorithms for Differential Privacy

Theorem: No deterministic algorithm guarantees differential privacy.

  • Exponential Mechanism
      – Sample the output space based on a distance metric.
  • Laplace Mechanism
      – Add noise from a Laplace distribution to query answers.
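A minimal sketch of the Laplace mechanism (the standard construction; the example query and parameters are illustrative):

    import numpy as np

    # Perturb a query answer with Laplace noise whose scale is the query's
    # sensitivity divided by the privacy budget epsilon.
    def laplace_mechanism(true_answer, sensitivity, eps):
        return true_answer + np.random.laplace(scale=sensitivity / eps)

    print(laplace_mechanism(true_answer=42, sensitivity=1.0, eps=0.5))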

SLIDE 41

Privacy-Preserving Recommendations

Exponential Mechanism [McSherry et al. 2007]: randomly pick a candidate with probability proportional to exp( ε · u(a, i) / Δ ), where Δ is the maximum change in the utilities caused by changing one edge.

  • This satisfies ε-differential privacy.
  • The mechanism must pick every node with non-zero probability, even nodes with u = 0.
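A minimal sketch of this sampler (Python), using the weight formula exactly as stated on the slide; the utility table is a stand-in:

    import math
    import random

    # Sample candidate i with probability proportional to
    # exp(eps * u(a, i) / delta).
    def exponential_mechanism(utilities, eps, delta):
        weights = [math.exp(eps * u / delta) for u in utilities.values()]
        return random.choices(list(utilities), weights=weights)[0]

    # even the zero-utility candidate keeps non-zero probability
    print(exponential_mechanism({"i1": 2, "i2": 3, "i3": 0}, eps=0.5, delta=1))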

SLIDE 42

Accuracy vs. Privacy in Real Graphs

  • WikiVote network
      – From the Stanford network analysis package
      – Users casting votes for administrators
      – 7K nodes and 100K edges
  • Sample of Twitter
      – Follow relationships
      – 96K nodes and 490K edges

SLIDE 43

Accuracy of Exponential Mechanism + Common Neighbors Utility [Machanavajjhala et al., VLDB 2011]

[Plot: % of nodes receiving recommendations of accuracy ≥ x, against accuracy x (0.0 to 1.0). WikiVote network, ε = 0.5.]

60% of users have accuracy < 10%.

SLIDE 44

Accuracy of Exponential Mechanism + Common Neighbors Utility [Machanavajjhala et al., VLDB 2011]

[Plot: % of nodes receiving recommendations of accuracy ≥ x, against accuracy x (0.0 to 1.0). Twitter sample, ε = 1.]

98% of users have accuracy < 5%.

SLIDE 45

Can we do better?

  • Maybe the common-neighbors utility is an especially non-private utility …
      – So consider general utility functions that follow intuitive axioms.
  • Maybe the Exponential Mechanism does not guarantee sufficient accuracy …
      – So consider any algorithm that satisfies differential privacy.

SLIDE 46

Example 1 of Private “Utility” Functions

Random Predictor: Every node in the graph is an equally good recommendation for a target node.

Not useful.

SLIDE 47

Axiom 1: Concentration

“Most of the utility of recommendation to a target is concentrated on a small number of candidates.”

Formally: there exists a subset of nodes S in the graph, with |S| = β = o(n / log n), that carries most of the total utility.

SLIDE 48

Example 1 of Private “Utility” Functions

Random Predictor: Every node in the graph is an equally good recommendation for a target node.

Does not satisfy the Concentration Axiom.

SLIDE 49

Example 2 of Private “Utility” Functions

Impersonal Predictor: There is one node (or a small set of nodes) that is a good recommendation for every node. Independent of the graph.

SLIDE 50

Axiom 2: Exchangeability

Let G be a graph and h an isomorphism on the nodes, resulting in graph Gh. Candidates that h maps to one another are identical with respect to a; hence, in the figure, u(a, i3) = u(a, i4).

[Figure: target a with candidates i1 to i4, where i3 and i4 are symmetric with respect to a.]

SLIDE 51

Example 2 of Private “Utility” Functions

Impersonal Predictor: There is one node (or a small set of nodes) that is a good recommendation for every node.

Does not satisfy the Exchangeability Axiom.

SLIDE 52

General Accuracy-Privacy Tradeoff [Machanavajjhala et al., VLDB 2011]

Theorem: For a graph with maximum degree dmax, a differentially private algorithm can guarantee constant accuracy only if …

SLIDE 53

Specific Accuracy-Privacy Tradeoff [Machanavajjhala et al., VLDB 2011]

Corollary: For the Common Neighbors, Adamic/Adar, and Katz* utility functions …

* under some mild assumptions on the Katz (weighted paths) utility

SLIDE 54

Implications of the Accuracy-Privacy Tradeoff [Machanavajjhala et al., VLDB 2011]

[Plot: % of nodes receiving recommendations of accuracy ≥ x, against accuracy x (0.0 to 1.0); curves for the Exponential Mechanism and the theoretical bound. WikiVote network, ε = 0.5.]

60% of users have accuracy < 55%.

SLIDE 55

Implications of the Accuracy-Privacy Tradeoff [Machanavajjhala et al., VLDB 2011]

[Plot: % of nodes receiving recommendations of accuracy ≥ x, against accuracy x (0.0 to 1.0); curves for the Exponential Mechanism and the theoretical bound. Twitter sample, ε = 1.]

95% of users have accuracy < 5%.

SLIDE 56

Takeaway …

For the majority of nodes in the network, recommendations must either be inaccurate or violate differential privacy!

SLIDE 57

Intuition behind the main result

Theorem: For a graph with maximum degree dmax = α log n, a differentially private algorithm can guarantee constant accuracy only if …

SLIDE 58

Intuition behind the main result

[Figure: graphs G1 and G2, differing in one sensitive edge, each with target a and candidates i and j; under Gk, the candidate utilities and recommendation probabilities are uk(a, i), pk(a, i) and uk(a, j), pk(a, j).]

Differential privacy requires:

    p1(a, i) / p2(a, i) < e^ε

SLIDE 59

Intuition behind the main result

[Figure: graphs G1, G2, G3, each with target a and candidates i and j; consecutive graphs differ in one edge.]

    p1(a, i) / p2(a, i) < e^ε        p3(a, j) / p1(a, j) < e^ε

SLIDE 60

Using Exchangeability

    p1(a, i) / p2(a, i) < e^ε        p3(a, j) / p1(a, j) < e^ε

G3 is an isomorphic copy of G2, so u2(a, i) = u3(a, j), which implies p2(a, i) = p3(a, j).

SLIDE 61

Using Exchangeability

G3 is an isomorphic copy of G2, so u2(a, i) = u3(a, j), which implies p2(a, i) = p3(a, j). Chaining the two inequalities then gives

    p1(a, i) / p1(a, j) < e^(2ε)

SLIDE 62

Using Exchangeability

  • In general, suppose any node i can be “transformed” into node j in t edge changes (with respect to the utility function).
  • Then

        p1(a, i) / p1(a, j) < e^(tε)

    i.e., the probability of recommending the highest-utility node is at most e^(tε) times the probability of recommending the worst-utility node.

SLIDE 63

Final Act: Using Concentration

  • Few nodes have high utility for target a (O(1))
      – e.g., tens of nodes share a common neighbor with a
  • Many nodes have low utility for target a (o(1/n))
      – e.g., millions of nodes don’t share a common neighbor with a
  • Thus, there exist i and j such that

        Ω(n) = p1(a, i) / p1(a, j) < e^(tε)
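Reading off the consequence (our reconstruction of the step the slide implies):

\[
  \Omega(n) \;=\; \frac{p_1(a, i)}{p_1(a, j)} \;<\; e^{t\varepsilon}
  \quad\Longrightarrow\quad
  \varepsilon \;=\; \Omega\!\left(\frac{\log n}{t}\right),
\]

so a small constant ε is impossible unless t, the number of edge changes needed to transform a high-utility node into a low-utility one, grows with log n.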

SLIDE 64

Summary of Social Recommendations

  • Question: “Can social recommendations be made while guaranteeing strong privacy conditions?”
      – For general utility functions satisfying natural axioms
      – For any algorithm satisfying differential privacy
  • Answer: “For the majority of nodes in the network, recommendations must either be inaccurate or violate differential privacy!”

SLIDE 65

Open Questions

  • Do we really need differential privacy to guard against the attack described?
  • Are there weaker notions of privacy that allow accurate recommendations?

SLIDE 66

References

  • A. Machanavajjhala, A. Korolova, A. Das Sarma, “Personalized Social Recommendations”, PVLDB 2011
  • F. McSherry, I. Mironov, “Differentially Private Recommender Systems”, KDD 2009
  • Y. Koren, “Recent Progress in Collaborative Filtering”, Tutorial, ACM RecSys 2008
  • A. Evfimievski, R. Srikant, J. Gehrke, “Limiting Privacy Breaches in Privacy Preserving Data Mining”, PODS 2003
  • J. Canny, “Collaborative Filtering with Privacy”, IEEE Security & Privacy, 2002
  • D. Liben-Nowell, J. Kleinberg, “The Link Prediction Problem for Social Networks”, CIKM 2003