SLIDE 1 The Price of Privacy in Untrusted Recommendation Engines
Siddhartha Banerjee, Nidhi Hegde & Laurent Massoulié
UT Austin / Technicolor
SLIDE 2 Privacy-efficiency trade-offs
• Google & Facebook track online browsing behaviour
• Apple & Android phones track geographical location
• Official reason for harvesting user data: better service results
• Amazon’s “You might also like”
• Netflix’s Cinematch engine
• Privacy ≠ Anonymity: Netflix was sued for disclosing its anonymized “Prize” dataset
→ What trade-offs exist between recommendation accuracy and user privacy when service providers are untrusted?
SLIDE 3 Roadmap
• Recommendation as Learning
• “Local” Differential Privacy
• Query Complexity Bounds
• Mutual Information and Fano’s Inequality
• Information-Rich Regime: Optimal Complexity via Spectral Clustering
• Information-Scarce Regime: Complexity Gap and Optimality of “MaxSense”
SLIDE 4 Recommendation
• Users watch and rate items (movies)
• Engine predicts unobserved ratings & recommends the items with the highest predicted ratings
[Figure: user-item rating matrix with unobserved entries shown as “?”]
SLIDE 5 A Simple Generative Model: The “Stochastic Block Model” [Holland et al. 83]
• Each user belongs to one of a small number of user classes
• Each movie belongs to one of L movie classes
• The rating of a user for a movie depends only on the user & movie classes
SLIDE 6 A Simple Generative Model: The “Stochastic Block Model”
[Figure: two user classes × two movie classes; a rating is + with probability depending only on the class pair: P(+) = b_{1,1}, b_{1,2}, b_{2,1}, b_{2,2}]
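A minimal simulation of this generative model may make it concrete; the class counts, probabilities, and variable names below are illustrative choices, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: two user classes, two movie classes (L = 2).
N_USERS, N_MOVIES = 1000, 200
user_class = rng.integers(0, 2, size=N_USERS)     # hidden user classes
movie_class = rng.integers(0, 2, size=N_MOVIES)   # hidden movie clusters

# b[k, l] = P(rating = +) for a class-k user and a class-l movie.
b = np.array([[0.9, 0.2],
              [0.2, 0.9]])

# A rating depends only on the (user class, movie class) pair.
p_plus = b[user_class[:, None], movie_class[None, :]]
ratings = (rng.random((N_USERS, N_MOVIES)) < p_plus).astype(int)  # 1 = "+"
```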
SLIDE 7 Minimal requirement for recommendation: learn the movie clusters
→ Can tell users “those who liked this have also liked…”
→ Can reveal the clusters and let users decide on their own their affinity to distinct clusters
Challenge: how to do so while respecting users’ privacy, without them having to trust you?
SLIDE 8 Roadmap
• Recommendation as Learning
• “Local” Differential Privacy
• Query Complexity Bounds
• Mutual Information and Fano’s Inequality
• Information-Rich Regime: Optimal Complexity via Spectral Clustering
• Information-Scarce Regime: Complexity Gap and Optimality of “MaxSense”
SLIDE 9 Formal definition: Differential Privacy [Dwork 06]
• Input (private) data X; x, x': any two possible values differing in just one user’s input
• Output (public) data Y; y: any possible value

Definition (ε-differential privacy):
$$P(Y = y \mid X = x) \;\le\; e^{\varepsilon}\, P(Y = y \mid X = x')$$

Key property: consider an attacker holding any side information S and trying to learn whether user u has some property A. Then the public data does not help:
$$e^{-\varepsilon} \;\le\; \frac{P(\text{user } u \text{ has } A \mid S \text{ and } Y)}{P(\text{user } u \text{ has } A \mid S)} \;\le\; e^{\varepsilon}$$
SLIDE 10
SLIDE 11 Differential Privacy: Centralized versus Local

Centralized model
• A trusted database aggregates users’ private data x_1, …, x_U
• DP applied at the egress of the database
→ learning is not affected by DP

Local model
• No trusted database
• DP applied locally at each user’s end: x_u → Priv_u(x_u)
→ learning is affected by DP
SLIDE 12 Example mechanisms: Laplacian noise and bit flipping

Centralized, Laplacian noise: to release a sum $S = \sum_i X_i$, publish
$$S' = S + N, \qquad P(N = n) = \frac{\varepsilon}{2}\, e^{-\varepsilon |n|}$$

Local, bit flipping: each user releases
$$X' = \begin{cases} X & \text{with prob. } \dfrac{e^{\varepsilon}}{1+e^{\varepsilon}} \\[4pt] 1 - X & \text{with prob. } \dfrac{1}{1+e^{\varepsilon}} \end{cases}$$
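Both mechanisms are easy to sketch in code; this is a minimal illustration (function names are mine), not the authors’ implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def laplace_release(x_bits, eps):
    """Centralized mechanism: publish the noisy sum S' = S + N,
    where N is Laplacian with density (eps/2) * exp(-eps * |n|)."""
    s = float(np.sum(x_bits))
    return s + rng.laplace(loc=0.0, scale=1.0 / eps)

def bit_flip(x, eps):
    """Local mechanism: keep each bit with probability e^eps / (1 + e^eps),
    flip it to 1 - x otherwise."""
    x = np.asarray(x)
    keep_prob = np.exp(eps) / (1.0 + np.exp(eps))
    flipped = rng.random(x.shape) >= keep_prob
    return np.where(flipped, 1 - x, x)
```

A quick sanity check of the ε-DP guarantee for the bit flip: P(X'=1 | X=1) / P(X'=1 | X=0) = (e^ε/(1+e^ε)) / (1/(1+e^ε)) = e^ε, exactly the bound in the definition.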
SLIDE 13 Local DP: historical perspective

Also known as the “randomized response technique” [Warner 1965], used to conduct polls on embarrassing questions: “Do you understand the impact of euro-bonds on Europe’s future?” Answer truthfully only if a (private) die roll scores > 2.
→ Specific answers are deniable
→ Empirical sums are still valid for learning a few parameters
Inadequate for learning many parameters: with k distinct ε-private sketch releases, the overall privacy guarantee degrades to kε.
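To illustrate why empirical sums remain valid for a single parameter, here is a sketch (true proportion and sample size are arbitrary) that inverts the bit-flip channel, reusing the bit_flip function above:

```python
import numpy as np

rng = np.random.default_rng(2)
eps, n, true_p = 1.0, 100_000, 0.3

answers = (rng.random(n) < true_p).astype(int)   # private "yes"/"no" bits
released = bit_flip(answers, eps)                # each user's deniable answer

# E[released] = (1 - keep) + (2 * keep - 1) * p, so invert the affine map.
keep = np.exp(eps) / (1 + np.exp(eps))
p_hat = (released.mean() - (1 - keep)) / (2 * keep - 1)
print(round(p_hat, 3))  # close to 0.3 despite per-answer deniability
```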
SLIDE 14 Roadmap
• Recommendation as Learning
• “Local” Differential Privacy
• Query Complexity Bounds
• Mutual Information and Fano’s Inequality
• Information-Rich Regime: Optimal Complexity via Spectral Clustering
• Information-Scarce Regime: Complexity Gap and Optimality of “MaxSense”
SLIDE 15 Learning, Mutual Information and DP
Want to learn a hypothesis H from M distinct possibilities (e.g. a clustering of N movies into L clusters: M ≈ L^N options), having observed G (e.g., the DP inputs of U distinct users).
Fano’s inequality: learning will fail with high probability unless the mutual information I(H;G) is close to log(M).
Mutual information:
$$I(H;G) \;=\; \sum_{h,g} P(H = h,\, G = g)\, \log \frac{P(H = h,\, G = g)}{P(H = h)\, P(G = g)}$$
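For reference, the standard form of Fano’s inequality behind this claim (stated here for H uniform over the M hypotheses and any estimator $\hat{H}(G)$):

$$P\big(\hat{H}(G) \ne H\big) \;\ge\; 1 - \frac{I(H;G) + \log 2}{\log M}$$

so the failure probability stays bounded away from zero whenever I(H;G) is not close to log M.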
SLIDE 16
Learning, Mutual Information and DP

Result: a DP sketch X' based on private data X satisfies, for any side information S,
$$I(X'; X \mid S) \;\le\; \varepsilon$$
→ Mutual information: I(H;G) is at most U·ε
→ “Query complexity”: at least N/ε users’ private inputs are needed to recover the hidden clusters

[Diagram: hypothesis H → private data X_1, …, X_U → local mechanisms Priv_1, …, Priv_U → public sketches G = (X'_1, …, X'_U)]
SLIDE 17 The Information-Rich and the Information-Scarce Regimes

Out of N items in total, each user rates W movies (assumed picked uniformly at random)
→ Information-rich regime: W = Ω(N)
→ Information-scarce regime: W = o(N)
Users’ “information wealth” affects the optimal query complexity.
SLIDE 18 The information-rich regime: Pairwise-preference algorithm

Each user u is asked about one item pair (i_u, j_u): “did you rate as + both items i_u and j_u?” The answer X_u is released via the local bit-flip mechanism as X'_u = bit-flip(X_u).

Construct the item affinity matrix A:
$$A_{ij} \;=\; \min\Big(1,\ \sum_{u:\,(i_u,\, j_u) = (i,j)} X'_u\Big)$$
then run spectral clustering of the items based on A.
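A compact sketch of this pipeline, reusing the simulated ratings and the bit_flip function from above; the uniform pair assignment and the scikit-learn k-means step are my illustrative choices (the slide only says “spectral clustering”):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

def pairwise_preference(ratings, eps, n_clusters=2):
    U, N = ratings.shape
    # Assign each user a uniformly random item pair (i_u, j_u), i_u != j_u.
    i_u = rng.integers(0, N, size=U)
    j_u = (i_u + 1 + rng.integers(0, N - 1, size=U)) % N
    # Private bit: "did you rate as + both items i_u and j_u?"
    both = ratings[np.arange(U), i_u] * ratings[np.arange(U), j_u]
    x_prime = bit_flip(both, eps)              # local DP release

    # Affinity A_ij = min(1, sum of positive answers about the pair (i, j)).
    A = np.zeros((N, N))
    np.add.at(A, (i_u, j_u), x_prime)
    A = np.minimum(1.0, A + A.T)

    # Spectral step: embed items via the top eigenvectors of A, then k-means.
    _, vecs = np.linalg.eigh(A)
    embedding = vecs[:, -n_clusters:]
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)
```

Comparing `pairwise_preference(ratings, eps=1.0)` against the hidden movie_class (up to label permutation) shows the recovery at work in the information-rich setting, where every user has rated every item.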
SLIDE 19 The information-rich regime: Pairwise-preference algorithm
Result: the algorithm finds the hidden clusters w.h.p. if U = Ω(N log N), under “block distinguishability” conditions on the underlying model
→ optimal, up to a logarithmic factor

Proof elements: A is the adjacency matrix of an ER-like graph, with
$$\mathbb{E}[A_{ij}] \;=\; \frac{2U}{N(N-1)}\left[\frac{1-\varepsilon}{2} \;+\; \varepsilon\,\frac{W(W-1)}{N(N-1)}\sum_k \pi_k\, b_{ik}\, b_{jk}\right]$$
When the prefactor is Ω(log N / N), the top eigenvectors determine the underlying block structure [Feige-Ofek 2005; Tomozei-M 2011].
SLIDE 20 The information-scarce regime: lower bounds
The composition of the two channels makes the end-to-end mutual information much lower than the minimum of the two channels’ individual mutual informations.
Intuition: to the question “did you rate item i with a +?”, a user’s answer is informative only with probability W/N
→ the information in the public sketch is “diluted” by a factor W/N

[Diagram: block structure H (movie clusters) → Channel 1 (user sampling & rating) → user’s private ratings → Channel 2 (local DP mechanism) → public sketch X'_1]
SLIDE 21 The information-scarce regime: lower bounds

Result: assume two item clusters, and that each user u observes the true type Z_i of W randomly picked items i. Then a user’s DP sketch X' satisfies I(H; X') = O(W/N).
Corollary: learning the hidden clustering of N items from parallel queries to U users requires U = Ω(N²/W).
e.g. N = 10⁴, W = 100 needs U = Ω(10⁶); N = 10⁶, W = 100 needs U = Ω(10¹⁰)
→ need to query non-humans!
SLIDE 22 Proof elements

1) Bound the mutual information by a convex quadratic form of the kernels p(I, Z | S)
2) Identify the extremal kernels
3) Some Euclidean geometry…
SLIDE 23 Information-scarce regime: MaxSense algorithm

User query: sense a random set S(u) of size N/W: “did you rate as + any item i in the set S(u)?” The answer X_u is released as X'_u = bit-flip(X_u).

Item representative:
$$T(i) \;=\; \sum_{u=1}^{U} \mathbf{1}_{\{i \in S(u)\}}\, X'_u$$
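A sketch of MaxSense on the same simulated data, reusing bit_flip; the loop form and sampling details are illustrative, and for simplicity every user here has a rating for every item:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

def max_sense(ratings, W, eps, n_clusters=2):
    U, N = ratings.shape
    set_size = max(1, N // W)               # sensing sets of size ~ N/W
    T = np.zeros(N)                         # item representatives T(i)
    for u in range(U):
        S_u = rng.choice(N, size=set_size, replace=False)
        # Private bit: "did you rate as + any item in S(u)?"
        x_u = int(ratings[u, S_u].max())
        x_prime = int(bit_flip(np.array([x_u]), eps)[0])
        T[S_u] += x_prime                   # add X'_u to every sensed item
    # k-means on the one-dimensional representatives recovers the clusters.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(T.reshape(-1, 1))
```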
SLIDE 24 Information-scarce regime: MaxSense algorithm

Result: under a separability assumption, k-means clustering of the item representatives finds the hidden clusters w.h.p. if U = Ω(N² log(N)/W)
→ optimal scaling, up to a logarithmic factor
SLIDE 25 Conclusions and Outlook

• Mutual information is adequate to characterize learning complexity under local DP constraints
• Accurate clustering, local differential privacy, low (linear) query complexity: leave one out!
• MaxSense achieves optimal complexity for parallel queries
• Can one beat its complexity with adaptive queries?
• Alternatives to differential privacy?
SLIDE 26
Questions?
SLIDE 27 Lower bounds for adaptive queries
Can one improve complexity by adapting queries based on previous users’ answers?
Result: for W = 1 and arbitrary side information S, a user’s DP sketch X'_u satisfies
$$I(H; X'_u \mid S) \;\le\; O\!\left(\frac{1}{N}\,\max\big(1,\ I(H; S)\big)\right)$$
→ adaptive query complexity at least Ω(N log(N)): larger than the initial lower bound by a logarithmic factor
Conjecture: the query complexity lower bound of N²/W still holds with adaptive queries