SLIDE 1 The Price of Privacy in Untrusted Recommendation Engines
Siddhartha Banerjee, Nidhi Hegde & Laurent Massoulié
UT Austin / Technicolor
SLIDE 2 Privacy-efficiency trade-offs
• Google & Facebook track online browsing behaviour
• Apple & Android phones track geographical location
• Official reason for harvesting user data: better service results
• Amazon’s “You might also like”
• Netflix’s Cinematch engine
• Privacy ≠ Anonymity: Netflix was sued for disclosing its anonymized “Prize” dataset
→ What trade-offs exist between recommendation accuracy and user privacy when service providers are untrusted?
SLIDE 3 Roadmap
• Recommendation as Learning
• “Local” Differential Privacy
• Query Complexity Bounds
• Mutual Information and Fano’s Inequality
• Information-Rich Regime: Optimal Complexity via Spectral Clustering
• Information-Scarce Regime: Complexity Gap and Optimality of “MaxSense”
SLIDE 4 Recommendation
• Users watch and rate items (movies)
• Engine predicts unobserved ratings & recommends the items with the highest predicted ratings
[Figure: user-item rating matrix with unobserved entries shown as “?”]
SLIDE 5 A Simple Generative Model: The “Stochastic Block Model” [Holland et al. 83]
• Each user belongs to one of a small number of user classes
• Each movie belongs to one of L movie classes
• The rating of a user for a movie depends only on the user & movie classes
SLIDE 6 A Simple Generative Model: The “Stochastic Block Model”
[Figure: two user classes × two movie classes; a rating is + with probability depending only on the class pair: P(+) = b_{1,1}, b_{1,2}, b_{2,1}, b_{2,2}]
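A minimal simulation of this generative model may make it concrete; the class counts, probabilities, and variable names below are illustrative choices, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: two user classes, two movie classes (L = 2).
N_USERS, N_MOVIES = 1000, 200
user_class = rng.integers(0, 2, size=N_USERS)     # hidden user classes
movie_class = rng.integers(0, 2, size=N_MOVIES)   # hidden movie clusters

# b[k, l] = P(rating = +) for a class-k user and a class-l movie.
b = np.array([[0.9, 0.2],
              [0.2, 0.9]])

# A rating depends only on the (user class, movie class) pair.
p_plus = b[user_class[:, None], movie_class[None, :]]
ratings = (rng.random((N_USERS, N_MOVIES)) < p_plus).astype(int)  # 1 = "+"
```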
SLIDE 7 Minimal requirement for recommendation: learn the movie clusters
→ Can tell users “those who liked this have also liked…”
→ Can reveal the clusters and let users decide on their own their affinity to distinct clusters
Challenge: how to do so while respecting users’ privacy, without them having to trust you?
SLIDE 8 Roadmap
• Recommendation as Learning
• “Local” Differential Privacy
• Query Complexity Bounds
• Mutual Information and Fano’s Inequality
• Information-Rich Regime: Optimal Complexity via Spectral Clustering
• Information-Scarce Regime: Complexity Gap and Optimality of “MaxSense”
SLIDE 9 Formal definition: Differential Privacy [Dwork 06]
• Input (private) data X; x, x': any two possible values differing in just one user’s input
• Output (public) data Y; y: any possible value

Definition (ε-differential privacy):
$$P(Y = y \mid X = x) \;\le\; e^{\varepsilon}\, P(Y = y \mid X = x')$$

Key property: consider an attacker holding any side information S and trying to learn whether user u has some property A. Then the public data does not help:
$$e^{-\varepsilon} \;\le\; \frac{P(\text{user } u \text{ has } A \mid S \text{ and } Y)}{P(\text{user } u \text{ has } A \mid S)} \;\le\; e^{\varepsilon}$$
SLIDE 10
SLIDE 11 Differential Privacy: Centralized versus Local

Centralized model
• A trusted database aggregates users’ private data x_1, …, x_U
• DP applied at the egress of the database
→ learning is not affected by DP

Local model
• No trusted database
• DP applied locally at each user’s end: x_u → Priv_u(x_u)
→ learning is affected by DP
SLIDE 12 Example mechanisms: Laplacian noise and bit flipping

Centralized, Laplacian noise: to release a sum $S = \sum_i X_i$, publish
$$S' = S + N, \qquad P(N = n) = \frac{\varepsilon}{2}\, e^{-\varepsilon |n|}$$

Local, bit flipping: each user releases
$$X' = \begin{cases} X & \text{with prob. } \dfrac{e^{\varepsilon}}{1+e^{\varepsilon}} \\[4pt] 1 - X & \text{with prob. } \dfrac{1}{1+e^{\varepsilon}} \end{cases}$$
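Both mechanisms are easy to sketch in code; this is a minimal illustration (function names are mine), not the authors’ implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def laplace_release(x_bits, eps):
    """Centralized mechanism: publish the noisy sum S' = S + N,
    where N is Laplacian with density (eps/2) * exp(-eps * |n|)."""
    s = float(np.sum(x_bits))
    return s + rng.laplace(loc=0.0, scale=1.0 / eps)

def bit_flip(x, eps):
    """Local mechanism: keep each bit with probability e^eps / (1 + e^eps),
    flip it to 1 - x otherwise."""
    x = np.asarray(x)
    keep_prob = np.exp(eps) / (1.0 + np.exp(eps))
    flipped = rng.random(x.shape) >= keep_prob
    return np.where(flipped, 1 - x, x)
```

A quick sanity check of the ε-DP guarantee for the bit flip: P(X'=1 | X=1) / P(X'=1 | X=0) = (e^ε/(1+e^ε)) / (1/(1+e^ε)) = e^ε, exactly the bound in the definition.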
SLIDE 13 Local DP: historical perspective

Also known as the “randomized response technique” [Warner 1965], used to conduct polls on embarrassing questions: “Do you understand the impact of euro-bonds on Europe’s future?” Answer truthfully only if a (private) die roll scores > 2.
→ Specific answers are deniable
→ Empirical sums are still valid for learning a few parameters
Inadequate for learning many parameters: with k distinct ε-private sketch releases, the overall privacy guarantee degrades to kε.
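To illustrate why empirical sums remain valid for a single parameter, here is a sketch (true proportion and sample size are arbitrary) that inverts the bit-flip channel, reusing the bit_flip function above:

```python
import numpy as np

rng = np.random.default_rng(2)
eps, n, true_p = 1.0, 100_000, 0.3

answers = (rng.random(n) < true_p).astype(int)   # private "yes"/"no" bits
released = bit_flip(answers, eps)                # each user's deniable answer

# E[released] = (1 - keep) + (2 * keep - 1) * p, so invert the affine map.
keep = np.exp(eps) / (1 + np.exp(eps))
p_hat = (released.mean() - (1 - keep)) / (2 * keep - 1)
print(round(p_hat, 3))  # close to 0.3 despite per-answer deniability
```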
SLIDE 14 Roadmap
• Recommendation as Learning
• “Local” Differential Privacy
• Query Complexity Bounds
• Mutual Information and Fano’s Inequality
• Information-Rich Regime: Optimal Complexity via Spectral Clustering
• Information-Scarce Regime: Complexity Gap and Optimality of “MaxSense”
SLIDE 15 Learning, Mutual Information and DP
Want to learn a hypothesis H from M distinct possibilities (e.g. a clustering of N movies into L clusters: M ≈ L^N options), having observed G (e.g., the DP inputs of U distinct users).
Fano’s inequality: learning will fail with high probability unless the mutual information I(H;G) is close to log(M).
Mutual information:
$$I(H;G) \;=\; \sum_{h,g} P(H = h,\, G = g)\, \log \frac{P(H = h,\, G = g)}{P(H = h)\, P(G = g)}$$
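For reference, the standard form of Fano’s inequality behind this claim (stated here for H uniform over the M hypotheses and any estimator $\hat{H}(G)$):

$$P\big(\hat{H}(G) \ne H\big) \;\ge\; 1 - \frac{I(H;G) + \log 2}{\log M}$$

so the failure probability stays bounded away from zero whenever I(H;G) is not close to log M.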
SLIDE 16
Learning, Mutual Information and DP

Result: a DP sketch X' based on private data X satisfies, for any side information S,
$$I(X'; X \mid S) \;\le\; \varepsilon$$
→ Mutual information: I(H;G) is at most U·ε
→ “Query complexity”: at least N/ε users’ private inputs are needed to recover the hidden clusters

[Diagram: hypothesis H → private data X_1, …, X_U → local mechanisms Priv_1, …, Priv_U → public sketches G = (X'_1, …, X'_U)]
SLIDE 17 The Information-Rich and the Information-Scarce Regimes

Out of N items in total, each user rates W movies (assumed picked uniformly at random)
→ Information-rich regime: W = Ω(N)
→ Information-scarce regime: W = o(N)
Users’ “information wealth” affects the optimal query complexity.
SLIDE 18 The information-rich regime: Pairwise-preference algorithm

Each user u is asked about one item pair (i_u, j_u): “did you rate as + both items i_u and j_u?” The answer X_u is released via the local bit-flip mechanism as X'_u = bit-flip(X_u).

Construct the item affinity matrix A:
$$A_{ij} \;=\; \min\Big(1,\ \sum_{u:\,(i_u,\, j_u) = (i,j)} X'_u\Big)$$
then run spectral clustering of the items based on A.
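A compact sketch of this pipeline, reusing the simulated ratings and the bit_flip function from above; the uniform pair assignment and the scikit-learn k-means step are my illustrative choices (the slide only says “spectral clustering”):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

def pairwise_preference(ratings, eps, n_clusters=2):
    U, N = ratings.shape
    # Assign each user a uniformly random item pair (i_u, j_u), i_u != j_u.
    i_u = rng.integers(0, N, size=U)
    j_u = (i_u + 1 + rng.integers(0, N - 1, size=U)) % N
    # Private bit: "did you rate as + both items i_u and j_u?"
    both = ratings[np.arange(U), i_u] * ratings[np.arange(U), j_u]
    x_prime = bit_flip(both, eps)              # local DP release

    # Affinity A_ij = min(1, sum of positive answers about the pair (i, j)).
    A = np.zeros((N, N))
    np.add.at(A, (i_u, j_u), x_prime)
    A = np.minimum(1.0, A + A.T)

    # Spectral step: embed items via the top eigenvectors of A, then k-means.
    _, vecs = np.linalg.eigh(A)
    embedding = vecs[:, -n_clusters:]
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)
```

Comparing `pairwise_preference(ratings, eps=1.0)` against the hidden movie_class (up to label permutation) shows the recovery at work in the information-rich setting, where every user has rated every item.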
SLIDE 19 The information-rich regime: Pairwise-preference algorithm
Result: the algorithm finds the hidden clusters w.h.p. if U = Ω(N log N), under “block distinguishability” conditions on the underlying model
→ optimal, up to a logarithmic factor

Proof elements: A is the adjacency matrix of an ER-like graph, with
$$\mathbb{E}[A_{ij}] \;=\; \frac{2U}{N(N-1)}\left[\frac{1-\varepsilon}{2} \;+\; \varepsilon\,\frac{W(W-1)}{N(N-1)}\sum_k \pi_k\, b_{ik}\, b_{jk}\right]$$
When the prefactor is Ω(log N / N), the top eigenvectors determine the underlying block structure [Feige-Ofek 2005; Tomozei-M 2011].
SLIDE 20 The information-scarce regime: lower bounds
The composition of the two channels makes the end-to-end mutual information much lower than the minimum of the two channels’ individual mutual informations.
Intuition: to the question “did you rate item i with a +?”, a user’s answer is informative only with probability W/N
→ the information in the public sketch is “diluted” by a factor W/N

[Diagram: block structure H (movie clusters) → Channel 1 (user sampling & rating) → user’s private ratings → Channel 2 (local DP mechanism) → public sketch X'_1]
SLIDE 21 The information-scarce regime: lower bounds

Result: assume two item clusters, and that each user u observes the true type Z_i of W randomly picked items i. Then a user’s DP sketch X' satisfies I(H; X') = O(W/N).
Corollary: learning the hidden clustering of N items from parallel queries to U users requires U = Ω(N²/W).
e.g. N = 10⁴, W = 100 needs U = Ω(10⁶); N = 10⁶, W = 100 needs U = Ω(10¹⁰)
→ need to query non-humans!
SLIDE 22 Proof elements

1) Bound the mutual information by a convex quadratic form of the kernels p(I, Z | S)
2) Identify the extremal kernels
3) Some Euclidean geometry…
SLIDE 23 Information-scarce regime: MaxSense algorithm

User query: sense a random set S(u) of size N/W: “did you rate as + any item i in the set S(u)?” The answer X_u is released as X'_u = bit-flip(X_u).

Item representative:
$$T(i) \;=\; \sum_{u=1}^{U} \mathbf{1}_{\{i \in S(u)\}}\, X'_u$$
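A sketch of MaxSense on the same simulated data, reusing bit_flip; the loop form and sampling details are illustrative, and for simplicity every user here has a rating for every item:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

def max_sense(ratings, W, eps, n_clusters=2):
    U, N = ratings.shape
    set_size = max(1, N // W)               # sensing sets of size ~ N/W
    T = np.zeros(N)                         # item representatives T(i)
    for u in range(U):
        S_u = rng.choice(N, size=set_size, replace=False)
        # Private bit: "did you rate as + any item in S(u)?"
        x_u = int(ratings[u, S_u].max())
        x_prime = int(bit_flip(np.array([x_u]), eps)[0])
        T[S_u] += x_prime                   # add X'_u to every sensed item
    # k-means on the one-dimensional representatives recovers the clusters.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(T.reshape(-1, 1))
```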
SLIDE 24 Information-scarce regime: MaxSense algorithm

Result: under a separability assumption, k-means clustering of the item representatives finds the hidden clusters w.h.p. if U = Ω(N² log(N)/W)
→ optimal scaling, up to a logarithmic factor
SLIDE 25 Conclusions and Outlook

• Mutual information is adequate to characterize learning complexity under local DP constraints
• Accurate clustering, local differential privacy, low (linear) query complexity: leave one out!
• MaxSense achieves optimal complexity for parallel queries
• Can one beat its complexity with adaptive queries?
• Alternatives to differential privacy?
SLIDE 26
Questions?
SLIDE 27 Lower bounds for adaptive queries
Can one improve complexity by adapting queries based on previous users’ answers?
Result: for W = 1 and arbitrary side information S, a user’s DP sketch X'_u satisfies
$$I(H; X'_u \mid S) \;\le\; O\!\left(\frac{1}{N}\,\max\big(1,\ I(H; S)\big)\right)$$
→ adaptive query complexity at least Ω(N log(N)): larger than the initial lower bound by a logarithmic factor
Conjecture: the query complexity lower bound of N²/W still holds with adaptive queries