 
Providing Input-Discriminative Protection for Local Differential Privacy
Xiaolan Gu*, Ming Li*, Li Xiong# and Yang Cao†
*University of Arizona, #Emory University, †Kyoto University
IEEE International Conference on Data Engineering (ICDE), April 2020
Overview
• Background on LDP
• Our Privacy Notion: ID-LDP
• Our Privacy Mechanism on ID-LDP
• Evaluation
• Conclusion
Background
• Companies are collecting our private data to provide better services (Google, Facebook, Apple, Yahoo, Uber, …)
• However, privacy concerns arise
  • Yahoo: massive data breaches impacted 3 billion user accounts, 2013
  • Facebook: 267 million users' data was reportedly leaked, 2019
  • …
• Possible solution: locally private data collection model
[Figure: each user $i$ perturbs the raw data $x_i$ with a randomized mechanism $M$ and uploads the perturbed data $y_i$; the untrusted server runs the analysis on $y_1, y_2, \cdots, y_n$]
Local Differential Privacy (LDP) [Duchi et al, FOCS'13]
A mechanism $M$ satisfies $\epsilon$-LDP if and only if for any pair of inputs $x, x'$ and any output $y$
$$\frac{\Pr(M(x) = y)}{\Pr(M(x') = y)} \leqslant e^{\epsilon}$$
• $x, x'$: the possible input (raw) data (generated by the user)
• $y$: the output (perturbed) data (public and known by the adversary)
• $\epsilon$: privacy budget (a smaller $\epsilon$ indicates stronger privacy)
An adversary cannot infer with high confidence (controlled by $\epsilon$) whether the input is $x$ or $x'$.
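The definition can be checked mechanically for any finite mechanism. The sketch below is illustrative only: it assumes the mechanism is given as a table `channel[x][y] = Pr(M(x) = y)`, and the helper name `satisfies_ldp` is hypothetical, not part of the talk.

```python
import math

def satisfies_ldp(channel, eps, tol=1e-12):
    """Return True if channel[x][y] = Pr(M(x) = y) satisfies eps-LDP.

    The check is written as Pr(M(x)=y) <= e^eps * Pr(M(x')=y) so that
    zero probabilities do not cause a division by zero.
    """
    bound = math.exp(eps)
    n_outputs = len(channel[0])
    for y in range(n_outputs):
        column = [row[y] for row in channel]
        for px in column:
            for px_prime in column:
                if px > bound * px_prime + tol:
                    return False
    return True

# Hypothetical example: a binary mechanism that reports the truth w.p. 0.75,
# so the worst-case probability ratio is 0.75 / 0.25 = 3.
p = 0.75
rr_channel = [[p, 1 - p], [1 - p, p]]
ok_at_ln3 = satisfies_ldp(rr_channel, math.log(3))
ok_at_ln2 = satisfies_ldp(rr_channel, math.log(2))
```

With these numbers the mechanism passes at $\epsilon = \ln 3$ and fails at $\epsilon = \ln 2$, since the largest ratio is 3.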
Applications of LDP
• [Figure] Apple: discovering popular Emojis under LDP. Source: https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html
• [Figure] Source: https://developers.googleblog.com/2019/09/enabling-developers-and-organizations.html
Limitations of LDP
• The LDP notion requires the same privacy budget for all pairs of possible inputs
• Existing LDP protocols perturb the data in the same way for all inputs
• However, in many practical scenarios, different inputs have different degrees of sensitivity and thus require distinct levels of privacy protection.

Scenarios             | High sensitivity | Low sensitivity
Website-click records | Politics-related | Facebook and Amazon
Medical records       | HIV and cancer   | Anemia and headache

• LDP protocols can provide excessive protection for some inputs that do not need such strong privacy (leading to an inferior privacy-utility tradeoff)
Our Privacy Notion: Input-Discriminative LDP (ID-LDP)
• Given a privacy budget set $\mathcal{E} = \{\epsilon_x\}_{x \in \mathcal{D}}$, where $\epsilon_x$ is the privacy budget of an input $x$, a randomized mechanism $M$ satisfies $\mathcal{E}$-ID-LDP if and only if for any pair of inputs $x, x' \in \mathcal{D}$ and output $y \in Range(M)$
$$\frac{\Pr(M(x) = y)}{\Pr(M(x') = y)} \leqslant e^{r(\epsilon_x, \epsilon_{x'})}$$
where $r(\cdot, \cdot)$ is a function of two privacy budgets
• In this paper, we focus on an instantiation called MinID-LDP with $r(\epsilon_x, \epsilon_{x'}) = \min\{\epsilon_x, \epsilon_{x'}\}$
Intuition: for any pair of inputs $x, x'$, MinID-LDP guarantees that the adversary's capability of distinguishing them does not exceed the bound controlled by both $\epsilon_x$ and $\epsilon_{x'}$ (thus achieving differentiated privacy protection for each pair)
MinID-LDP has Sequential Composition like LDP, which guarantees the overall privacy for a sequence of mechanisms.
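To make the pairwise bound concrete, here is a small sketch that checks MinID-LDP for a finite mechanism. The table layout `channel[x][y] = Pr(M(x) = y)`, the helper name, and the example numbers are all assumptions made for illustration; they do not come from the talk.

```python
import math

def satisfies_min_id_ldp(channel, budgets, tol=1e-12):
    """Check Pr(M(x)=y) <= e^{min(eps_x, eps_x')} * Pr(M(x')=y) for all x, x', y."""
    m = len(channel)
    for i in range(m):
        for j in range(m):
            bound = math.exp(min(budgets[i], budgets[j]))
            for y in range(len(channel[i])):
                if channel[i][y] > bound * channel[j][y] + tol:
                    return False
    return True

# Hypothetical 3-input mechanism: input 0 is the sensitive one.
channel = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.2, 0.2, 0.6],
]
discriminative = [math.log(2), math.log(4), math.log(4)]
uniform = [math.log(2)] * 3

meets_discriminative = satisfies_min_id_ldp(channel, discriminative)
meets_uniform = satisfies_min_id_ldp(channel, uniform)
```

In this example every pair involving input 0 has ratio at most 2, while inputs 1 and 2 are distinguishable up to a ratio of 3. The mechanism therefore meets the discriminative budgets but not the uniform budget $\ln 2$ (which is ordinary $\ln 2$-LDP).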
Relationships with LDP
1. If $\epsilon_x = \epsilon$ for all $x \in \mathcal{D}$, then $\mathcal{E}$-MinID-LDP $\Leftrightarrow$ $\epsilon$-LDP
2. If $\min\{\mathcal{E}\} \geqslant \epsilon$, then $\epsilon$-LDP $\Rightarrow$ $\mathcal{E}$-MinID-LDP
3. If $\epsilon \geqslant \min\{\max\{\mathcal{E}\}, 2\min\{\mathcal{E}\}\}$, then $\mathcal{E}$-MinID-LDP $\Rightarrow$ $\epsilon$-LDP
Factor 2 is due to the symmetric property of the indistinguishability definition.
MinID-LDP can be regarded as a relaxation compared with LDP. It captures users' fine-grained privacy requirements when LDP is too strong (i.e., provides overprotection).
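One way to see where the bound in relationship 3 comes from is to chain the MinID-LDP inequality through an input with the smallest budget. The derivation below is a sketch of that argument; the symbol $x^*$ (an input whose budget equals $\min\{\mathcal{E}\}$) is notation introduced here, and the inequalities are written in product form to avoid dividing by zero.

```latex
% x^* is an input with \epsilon_{x^*} = \min\{\mathcal{E}\} (notation introduced for this sketch)
\begin{align*}
\Pr(M(x) = y)
  &\leqslant e^{\min\{\epsilon_x, \epsilon_{x^*}\}} \Pr(M(x^*) = y)
   = e^{\min\{\mathcal{E}\}} \Pr(M(x^*) = y) \\
  &\leqslant e^{\min\{\mathcal{E}\}} \, e^{\min\{\epsilon_{x^*}, \epsilon_{x'}\}} \Pr(M(x') = y)
   = e^{2\min\{\mathcal{E}\}} \Pr(M(x') = y), \\
\Pr(M(x) = y)
  &\leqslant e^{\min\{\epsilon_x, \epsilon_{x'}\}} \Pr(M(x') = y)
   \leqslant e^{\max\{\mathcal{E}\}} \Pr(M(x') = y).
\end{align*}
```

Both bounds hold for every pair $x, x'$, so the ratio is at most $e^{\min\{\max\{\mathcal{E}\}, 2\min\{\mathcal{E}\}\}}$, which is $\epsilon$-LDP for any $\epsilon$ at least that large.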
Related Privacy Notions
• Personalized LDP (PLDP) [Chen et al, ICDE'16]
• Geo-indistinguishability (GI) [Andres et al, CCS'13]
• Condensed LDP (CLDP) [Gursoy et al, TDSC'19]
• Utility-optimized LDP (ULDP) [Murakami and Kawamoto, USENIX Security'19]
[Figure: Privacy budget of a pair of inputs in several related notions. Panels: User-discriminative (PLDP), Distance-discriminative (GI or CLDP), Input-discriminative (ID-LDP).]
• $\epsilon_u$: the privacy budget of a user $u$ for all pairs of inputs (different users may have different $\epsilon_u$)
• $\epsilon d_{ij}$: the privacy budget for a pair of inputs $x_i, x_j$ for all users, where $d_{ij}$ is the distance between $x_i, x_j$
• $\epsilon_i$: the privacy budget of $x_i$; $\epsilon_{ij}$: the privacy budget of a pair of inputs $x_i, x_j$
• MinID-LDP: $\epsilon_{ij} = \min\{\epsilon_i, \epsilon_j\}$
[Figure: ULDP maps sensitive inputs $\mathcal{X}_S$ and non-sensitive inputs $\mathcal{X}_N$ to outputs $\mathcal{Y}_P$ and $\mathcal{Y}_I$, with $\epsilon$-LDP on part of the mapping.]
ULDP does not guarantee the indistinguishability between the sensitive and non-sensitive inputs when observing some outputs, thus ULDP does not guarantee LDP.
Privacy Mechanism Design under ID-LDP
Problem Statement
• Data types: categorical (two cases: each user has only one item or an item-set)
• Analysis task/application: frequency estimation (the building block for many applications)
• Objective: minimize the MSE of frequency estimation while satisfying ID-LDP
ID-LDP protocols perturb inputs with different probabilities
Challenges
• The number of variables (perturbation parameters) and privacy constraints (to be satisfied for any $x, x', y$) can be very large (especially for a large domain or item-set data). Example: assume domain size $m$, then $m^2$ variables and $m^3$ constraints
• The objective function (MSE) depends on the unknown true frequencies
Preliminaries: LDP protocols
• Randomized Response
• Unary Encoding (our protocol satisfying ID-LDP is based on this)
LDP Protocol: Randomized Response
• Randomized Response (RR) [Warner, 1965]: reports the truth with some probability (for a binary answer: yes-or-no). Advanced versions: Unary Encoding, Generalized RR, …
• Example: Is your annual income more than 100k?

Truth x | Response y = 1 | Response y = 0
1       | w.p. $p$       | w.p. $1 - p$
0       | w.p. $1 - p$   | w.p. $p$

To satisfy $\epsilon$-LDP: $p = \frac{e^{\epsilon}}{e^{\epsilon} + 1}$ (since $\frac{p}{1 - p} = e^{\epsilon}$)
With true frequency $f^*$ and frequency of response $f$: $\mathbb{E}[f] = f^* p + (1 - f^*)(1 - p) = (2p - 1) f^* + (1 - p)$
Frequency estimation: $\hat{f} = \frac{f - (1 - p)}{2p - 1}$
Unbiasedness: $\mathbb{E}[\hat{f}] = f^*$
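The randomized response procedure and its estimator can be sketched in a few lines. This is a minimal illustration under assumptions: the population (30% "yes" among 10,000 users), the seed, and the function names are hypothetical choices, not values from the talk.

```python
import math
import random

def rr_perturb(x, p, rng):
    """Report the true bit x with probability p, the flipped bit otherwise."""
    return x if rng.random() < p else 1 - x

def rr_estimate(responses, p):
    """Unbiased estimate of the true frequency from perturbed responses."""
    f = sum(responses) / len(responses)
    return (f - (1 - p)) / (2 * p - 1)

eps = math.log(3)
p = math.exp(eps) / (math.exp(eps) + 1)  # satisfies p / (1 - p) = e^eps

rng = random.Random(0)                   # fixed seed, arbitrary choice
truth = [1] * 3000 + [0] * 7000          # hypothetical population, f* = 0.3
responses = [rr_perturb(x, p, rng) for x in truth]
estimate = rr_estimate(responses, p)
print(f"true frequency 0.3, estimated {estimate:.3f}")
```

The estimate fluctuates around 0.3 from run to run; a larger $\epsilon$ (larger $p$) shrinks the denominator's effect and reduces the variance.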
LDP Protocol: Unary Encoding (UE)
• To handle the more general case (domain size is $d$), UE represents the input/output by multiple bits.
• Step 1. encode the input $x = i$ into a vector $\mathbf{x} = [0, \cdots, 0, 1, 0, \cdots, 0]$ with length $d$
• Step 2. perturb each bit independently

x[k] → y[k]               | RAPPOR [Erlingsson et al, CCS'14]                   | OUE [Wang et al, USENIX Security'17]
1 → 1                     | w.p. $p$                                            | w.p. 0.5
1 → 0                     | w.p. $1 - p$                                        | w.p. 0.5
0 → 1                     | w.p. $1 - p$                                        | w.p. $q$
0 → 0                     | w.p. $p$                                            | w.p. $1 - q$
To satisfy $\epsilon$-LDP | $p = \frac{e^{\epsilon/2}}{e^{\epsilon/2} + 1}$ | $q = \frac{1}{e^{\epsilon} + 1}$

OUE obtains its probabilities by minimizing the approximate MSE of frequency estimation.
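The two steps and both parameter choices can be sketched as follows. Assumptions to note: `p` and `q` below denote Pr(y[k] = 1 | x[k] = 1) and Pr(y[k] = 1 | x[k] = 0), the count estimator $(\sum_u y_u[k] - nq)/(p - q)$ is the standard unbiased one for unary encoding and is not stated on this slide, and the population, seed, and function names are hypothetical.

```python
import math
import random

def ue_params(eps, variant):
    """Return (p, q) = (Pr[y=1 | x=1], Pr[y=1 | x=0]) for the given variant."""
    if variant == "rappor":
        p = math.exp(eps / 2) / (math.exp(eps / 2) + 1)
        return p, 1 - p
    if variant == "oue":
        return 0.5, 1 / (math.exp(eps) + 1)
    raise ValueError(f"unknown variant: {variant}")

def ue_privacy_ratio(p, q):
    """Worst-case output probability ratio between two different inputs."""
    return p * (1 - q) / (q * (1 - p))

def ue_perturb(item, d, p, q, rng):
    """Encode item as a one-hot vector of length d and perturb each bit."""
    return [1 if rng.random() < (p if k == item else q) else 0 for k in range(d)]

def ue_estimate(reports, p, q):
    """Unbiased count estimate for every item from the perturbed reports."""
    n, d = len(reports), len(reports[0])
    return [(sum(r[k] for r in reports) - n * q) / (p - q) for k in range(d)]

eps, d = math.log(4), 4
true_counts = [2500, 1500, 500, 500]     # hypothetical population
items = [k for k, c in enumerate(true_counts) for _ in range(c)]
rng = random.Random(1)                   # fixed seed, arbitrary choice
p, q = ue_params(eps, "oue")
reports = [ue_perturb(x, d, p, q, rng) for x in items]
estimates = ue_estimate(reports, p, q)
print([round(e) for e in estimates])
```

For both variants the worst-case ratio $\frac{p(1-q)}{q(1-p)}$ equals $e^{\epsilon}$, which is why each satisfies $\epsilon$-LDP.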
Overview of Our Protocol for ID-LDP
Recall the two challenges: 1) High complexity of the optimization problem. 2) MSE depends on unknown true frequencies.
For single-item data: IDUE (Input-Discriminative Unary Encoding)
1. We propose a Unary Encoding based protocol with only $2m$ variables and $m^2$ constraints
2. We address the second challenge by developing three variants of optimization models (some models can further reduce the problem complexity)
For item-set data: IDUE-PS (with Padding-and-Sampling protocol)
1. We extend IDUE for item-set data (by combining it with a sampling protocol) to solve the scalability issue
2. We show IDUE-PS also satisfies MinID-LDP (if the base protocol IDUE satisfies MinID-LDP)
Privacy Mechanism for Single-Item Data
• Step 1, encode the input $x = i$ into $\mathbf{x} = [0, \cdots, 0, 1, 0, \cdots, 0]$
• Step 2, perturb each bit independently (with different probabilities)

x[k] | y[k] = 1   | y[k] = 0
1    | w.p. $a_k$ | w.p. $1 - a_k$
0    | w.p. $b_k$ | w.p. $1 - b_k$

• Step 3, estimate frequency/counting by $\hat{c}_i = \frac{\sum_u y_u[i] - n b_i}{a_i - b_i}$
  • $n$: number of users
  • $a_i, b_i$: perturbation probabilities
  • $c_i^*$: true frequency
  • $\hat{c}_i$: estimated frequency
Privacy constraints: $\frac{a_i (1 - b_j)}{b_i (1 - a_j)} \leqslant e^{r(\epsilon_i, \epsilon_j)}$ $(\forall i, j)$
$$MSE_{\hat{c}_i} = Var[\hat{c}_i] = \frac{n b_i (1 - b_i)}{(a_i - b_i)^2} + c_i^* \frac{1 - a_i - b_i}{a_i - b_i}$$
Benefits
1. The optimization problem only has $2m$ variables and $m^2$ constraints
2. The frequency estimator is unbiased, and its MSE is composed of two terms, where only the second term depends on the true frequencies $c_i^*$
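The three steps, the privacy constraints, and the variance formula can be sketched as below. This is not the optimization from the talk: the probabilities `a` and `b` are hand-picked hypothetical values that happen to be feasible (not optimal), and the budgets, population, seed, and function names are likewise assumptions for illustration.

```python
import math
import random

def idue_feasible(a, b, budgets, tol=1e-9):
    """Check a_i(1-b_j) / (b_i(1-a_j)) <= e^{min(eps_i, eps_j)} for all i, j."""
    m = len(a)
    for i in range(m):
        for j in range(m):
            ratio = a[i] * (1 - b[j]) / (b[i] * (1 - a[j]))
            if ratio > math.exp(min(budgets[i], budgets[j])) * (1 + tol):
                return False
    return True

def idue_perturb(item, a, b, rng):
    """One-hot encode item, then set bit k to 1 w.p. a[k] (if k == item) or b[k]."""
    return [1 if rng.random() < (a[k] if k == item else b[k]) else 0
            for k in range(len(a))]

def idue_estimate(reports, a, b):
    """Unbiased count estimate (sum_u y_u[i] - n*b_i) / (a_i - b_i) per item."""
    n = len(reports)
    return [(sum(r[i] for r in reports) - n * b[i]) / (a[i] - b[i])
            for i in range(len(a))]

def idue_variance(n, true_count, a_i, b_i):
    """Variance of the estimate for one item, as given by the two-term formula."""
    return (n * b_i * (1 - b_i) / (a_i - b_i) ** 2
            + true_count * (1 - a_i - b_i) / (a_i - b_i))

budgets = [math.log(4), math.log(6), math.log(6)]  # item 0 is more sensitive
a = [0.5, 0.6, 0.6]                                # hypothetical, not optimized
b = [0.25, 0.3, 0.3]
feasible = idue_feasible(a, b, budgets)

# Uniform OUE-style parameters tuned for ln 6 violate the ln 4 budget of item 0.
oue_a, oue_b = [0.5] * 3, [1 / 7] * 3
oue_feasible = idue_feasible(oue_a, oue_b, budgets)

true_counts = [1000, 2000, 3000]                   # hypothetical population
items = [k for k, c in enumerate(true_counts) for _ in range(c)]
rng = random.Random(2)                             # fixed seed, arbitrary choice
reports = [idue_perturb(x, a, b, rng) for x in items]
estimates = idue_estimate(reports, a, b)
```

The hand-picked parameters only demonstrate that the constraints can be checked in $m^2$ comparisons; choosing `a` and `b` to minimize the MSE is the optimization problem the slides describe.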
Comparison with LDP Protocols
Example: a health organization is taking a survey which asks $n$ participants to return a response perturbed from the categories {HIV, anemia, headache, stomachache, toothache}, where HIV ($i = 1$) is more sensitive. We therefore set different privacy budgets, such as $\epsilon_1 = \ln 4$ and $\epsilon_i = \ln 6$ $(i = 2, \cdots, 5)$.
[Figure annotations: more perturbation noise for $i = 1$; less perturbation noise for $i \neq 1$]
The total variance of IDUE lies in a range because it depends on the distribution of the true input data; its upper bound is still less than the total variance of RAPPOR and OUE.
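For a sense of scale, the baseline totals in this example can be computed from the unary-encoding variance formula. The sketch assumes, as is not stated on the slide, that RAPPOR and OUE are run with the single budget $\epsilon = \min\{\mathcal{E}\} = \ln 4$ so that the most sensitive input is protected; the population counts and the function name are hypothetical. IDUE itself is not computed here because its parameters come from the optimization.

```python
import math

def ue_total_variance(n, true_counts, p, q):
    """Sum over items of n*q(1-q)/(p-q)^2 + c*(1-p-q)/(p-q)."""
    return sum(n * q * (1 - q) / (p - q) ** 2 + c * (1 - p - q) / (p - q)
               for c in true_counts)

eps = math.log(4)                          # assumed baseline budget: min of the budget set
true_counts = [100, 300, 200, 250, 150]    # hypothetical counts for the 5 categories
n = sum(true_counts)

p_rappor = math.exp(eps / 2) / (math.exp(eps / 2) + 1)
q_rappor = 1 - p_rappor
p_oue, q_oue = 0.5, 1 / (math.exp(eps) + 1)

rappor_total = ue_total_variance(n, true_counts, p_rappor, q_rappor)
oue_total = ue_total_variance(n, true_counts, p_oue, q_oue)
print(f"RAPPOR: {rappor_total:.1f}, OUE: {oue_total:.1f}")
```

Under these assumptions the baseline totals do not depend on how the inputs are distributed: RAPPOR gives $10n$ (its second term vanishes because $p + q = 1$) and OUE gives $\frac{80}{9}n + n = \frac{89}{9}n$, since the true counts sum to $n$.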