Providing Input-Discriminative Protection for Local Differential Privacy
Xiaolan Gu*, Ming Li*, Li Xiong# and Yang Cao†
*University of Arizona #Emory University †Kyoto University
IEEE International Conference on Data Engineering (ICDE), April 2020
(Figure: the LDP model. Each user applies a randomized mechanism M to the raw data and uploads the perturbed data to an untrusted server for analysis.)
An adversary cannot infer whether the input is x or x′ with high confidence (controlled by ϵ). [Duchi et al., FOCS '13]
Google: enabling developers and organizations to use differential privacy
Source: https://developers.googleblog.com/2019/09/enabling-developers-and-organizations.html
Apple: discovering popular Emojis under LDP
Source: https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html
Scenario                | High sensitiveness | Low sensitiveness
Website-click records   | Politics-related   | Facebook and Amazon
Medical records         | HIV and cancer     | Anemia and headache
r(⋅, ⋅) is a function of two privacy budgets.
Intuition: for any pair of inputs x, x′, MinID-LDP guarantees that the adversary's capability of distinguishing them does not exceed the bound controlled by both ϵx and ϵx′ (thus achieving differentiated privacy protection for each pair).
MinID-LDP satisfies sequential composition like LDP, which guarantees the overall privacy for a sequence of mechanisms.
ϵx is the privacy budget of input x. The factor 2 is due to the symmetric property.
[Murakami and Kawamoto, USENIX Security '19]
Privacy budget of a pair of inputs in several related notions:
- PLDP (user-discriminative): ϵv is the privacy budget of a user v for all pairs of inputs (different users may have different ϵv).
- GI or CLDP (distance-discriminative): ϵ·d is the privacy budget for a pair of inputs x, x′, where d is the distance between x and x′.
- ID-LDP (input-discriminative): ϵx is the privacy budget of input x; r(ϵx, ϵx′) is the privacy budget of a pair.
  MinID-LDP: r(ϵx, ϵx′) = min(ϵx, ϵx′).
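As a quick illustration of the MinID-LDP pairwise bound, a minimal sketch (the budget values and function name below are made up for illustration):

```python
import itertools

def minid_pairwise_bounds(budgets):
    """r(eps_i, eps_j) = min(eps_i, eps_j): the bound for each pair of distinct inputs."""
    return {(i, j): min(budgets[i], budgets[j])
            for i, j in itertools.permutations(range(len(budgets)), 2)}

# Hypothetical budgets: input 0 (e.g. HIV) is the most sensitive.
bounds = minid_pairwise_bounds([0.5, 1.0, 2.0])
print(bounds[(0, 1)], bounds[(1, 2)])  # 0.5 1.0
```

Every pair involving the most sensitive input is protected at that input's tighter budget, which is how the notion discriminates between inputs.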
ULDP does not guarantee indistinguishability between sensitive and non-sensitive inputs when observing some outputs; thus ULDP does not guarantee LDP.
(Figure: in ULDP, sensitive inputs 𝒴S map to protected outputs 𝒵P under budget ϵ, while non-sensitive inputs 𝒴N may map to invertible outputs 𝒵I.)
The optimization problem for a general ID-LDP mechanism can be very large (especially for a large domain or item-set data). Our protocol satisfying ID-LDP is based on this optimization.
Example: assume domain size m; then there are m² variables and m³ constraints.
ID-LDP protocols perturb inputs with different probabilities.
Randomized Response (RR): given true bit x, report y = x w.p. p and y = 1 − x w.p. 1 − p.
To satisfy ϵ-LDP: p = e^ϵ / (e^ϵ + 1) (since p / (1 − p) = e^ϵ).
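A minimal sketch of binary randomized response with p = e^ϵ / (e^ϵ + 1); the function name is illustrative, not from the paper:

```python
import math
import random

def rr_perturb(x: int, eps: float, rng=random) -> int:
    """Binary randomized response: keep the true bit w.p. p = e^eps / (e^eps + 1)."""
    p = math.exp(eps) / (math.exp(eps) + 1)
    return x if rng.random() < p else 1 - x

eps = 1.0
p = math.exp(eps) / (math.exp(eps) + 1)
print(round(p, 3))            # 0.731
print(round(p / (1 - p), 3))  # 2.718, i.e. e^eps, so the eps-LDP bound is tight
```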
Frequency estimation: let f* be the true frequency and f the frequency of response y = 1.
Unbiasedness: E[f] = f*·p + (1 − f*)(1 − p) = (2p − 1)f* + (1 − p), so f̂* = (f − (1 − p)) / (2p − 1) is an unbiased estimate.
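The unbiasedness identity E[f] = (2p − 1)f* + (1 − p) can be inverted directly; a small sketch (illustrative names, using an exact-expectation sanity check rather than sampling):

```python
import math

def rr_unbiased_estimate(f_observed: float, eps: float) -> float:
    """Invert E[f] = (2p - 1) f* + (1 - p) to get an unbiased estimate of f*."""
    p = math.exp(eps) / (math.exp(eps) + 1)
    return (f_observed - (1 - p)) / (2 * p - 1)

# Exact-expectation sanity check: with true frequency 0.3, the expected
# observed frequency is (2p - 1)*0.3 + (1 - p); inverting recovers 0.3.
eps, f_true = 1.0, 0.3
p = math.exp(eps) / (math.exp(eps) + 1)
f_expected = (2 * p - 1) * f_true + (1 - p)
print(round(rr_unbiased_estimate(f_expected, eps), 10))  # 0.3
```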
Advanced versions: Unary Encoding, Generalized RR, …
Encode input x into a one-hot vector of length m (the domain size): x[k] = 1 at the input's position and 0 elsewhere; then perturb each bit independently.
RAPPOR [Erlingsson et al., CCS '14]: report y[k] = x[k] w.p. p and y[k] = 1 − x[k] w.p. 1 − p.
OUE [Wang et al., USENIX Security '17]: if x[k] = 1, report y[k] = 1 w.p. 0.5; if x[k] = 0, report y[k] = 1 w.p. q.
To satisfy ϵ-LDP: p = e^(ϵ/2) / (e^(ϵ/2) + 1) and q = 1 / (e^ϵ + 1).
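A sketch of OUE's per-bit perturbation of a one-hot vector, assuming the probabilities above (names and parameters are illustrative):

```python
import math
import random

def oue_perturb(x_index: int, m: int, eps: float, rng=random):
    """OUE: a 1-bit stays 1 w.p. 1/2; a 0-bit flips to 1 w.p. q = 1/(e^eps + 1)."""
    q = 1 / (math.exp(eps) + 1)
    return [
        (1 if rng.random() < 0.5 else 0) if k == x_index
        else (1 if rng.random() < q else 0)
        for k in range(m)
    ]

# Encode item 2 of a domain of size 5, then perturb.
y = oue_perturb(x_index=2, m=5, eps=1.0, rng=random.Random(0))
print(len(y), all(bit in (0, 1) for bit in y))  # 5 True
```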
By minimizing the approximate MSE of frequency estimation, we obtain the perturbation probabilities; the resulting problem has only 2m variables and m² constraints (symmetry can further reduce the problem complexity).
Recall the two challenges: 1) high complexity of the optimization problem; 2) the MSE depends on the unknown true frequencies.
IDUE: if x[k] = 1, report y[k] = 1 w.p. ak; if x[k] = 0, report y[k] = 1 w.p. bk.
Privacy constraints: ai(1 − bj) / (bi(1 − aj)) ⩽ e^r(ϵi, ϵj) (∀i, j)
Estimator: ĉi = (Σu yu[i] − n·bi) / (ai − bi)
This has 2m variables and m² constraints.
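The estimator and the pairwise privacy constraint can be sketched as follows (the probabilities, budgets, and reports below are made up for illustration, not optimized):

```python
import math

def idue_estimate(reports, a, b):
    """Per-item estimator: c_hat_i = (sum_u y_u[i] - n*b_i) / (a_i - b_i)."""
    n, m = len(reports), len(a)
    return [(sum(y[i] for y in reports) - n * b[i]) / (a[i] - b[i]) for i in range(m)]

def satisfies_id_ldp(a, b, pair_bound):
    """Check a_i(1 - b_j) / (b_i(1 - a_j)) <= e^{r(eps_i, eps_j)} for all i, j."""
    m = len(a)
    return all(a[i] * (1 - b[j]) <= math.exp(pair_bound(i, j)) * b[i] * (1 - a[j])
               for i in range(m) for j in range(m))

# Hypothetical setup: m = 2 items, MinID-LDP bound r = min(eps_i, eps_j).
a, b, eps = [0.6, 0.62], [0.4, 0.38], [1.0, 2.0]
print(satisfies_id_ldp(a, b, lambda i, j: min(eps[i], eps[j])))  # True

reports = [[1, 0], [0, 1], [1, 0]]  # toy perturbed one-hot reports
est = idue_estimate(reports, a, b)
print(round(est[0], 6))  # 4.0
```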
MSE_i = Var[ĉi] = n·bi(1 − bi) / (ai − bi)² + c*i(1 − ai − bi) / (ai − bi)
where n is the number of users; ai, bi are the perturbation probabilities; c*i is the true frequency; ĉi is the estimated frequency.
The second term depends on the true frequencies.
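The closed form above can be cross-checked against the direct binomial variance of the estimator; a numerical sanity check (all parameters made up):

```python
def mse_closed_form(n, a, b, c_star):
    """MSE_i = n*b*(1 - b)/(a - b)**2 + c_star*(1 - a - b)/(a - b)."""
    return n * b * (1 - b) / (a - b) ** 2 + c_star * (1 - a - b) / (a - b)

def mse_from_binomials(n, a, b, c_star):
    """Direct variance of c_hat_i: c_star reports are Bernoulli(a), the other
    n - c_star are Bernoulli(b); dividing the sum by (a - b) scales the variance."""
    return (c_star * a * (1 - a) + (n - c_star) * b * (1 - b)) / (a - b) ** 2

n, a, b, c_star = 1000, 0.6, 0.38, 200  # made-up parameters
print(abs(mse_closed_form(n, a, b, c_star) - mse_from_binomials(n, a, b, c_star)) < 1e-6)
# True
```

The two forms agree because a(1 − a) − b(1 − b) = (a − b)(1 − a − b), which turns the frequency-dependent part into the second term of the closed form.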
Example: a health organization runs a survey asking participants to return a response perturbed from the categories {HIV, anemia, headache, stomachache, toothache}, where HIV (i = 1) is more sensitive; thus we set different privacy budgets, giving HIV a smaller budget than the other categories.
The total variance of IDUE lies in a range because it depends on the distribution of the true input data; even its upper bound is still less than the variance of RAPPOR and OUE.
More perturbation noise for i = 1; less perturbation noise for i ≠ 1.
(Figure: comparison of empirical (dashed lines) and theoretical (solid lines) results on synthetic data with single-item inputs; series: RAPPOR, OUE, IDUE-opt0, IDUE-opt1, IDUE-opt2; more private vs. more accurate.)
Empirical results are very close to the theoretical results, and IDUE has smaller MSE than RAPPOR and OUE.
We compare the frequency estimation results of our mechanisms (IDUE and IDUE-PS) with RAPPOR and OUE using two synthetic datasets and three real-world datasets.
(Figure: MSE and RE results on real-world datasets; series: RAPPOR, OUE, IDUE.)
IDUE has the smallest MSE and RE (relative error), better than RAPPOR and OUE.
(Figure: MSE and RE vs. the portion of sensitive inputs (2%–80%); series: MSE and RE for RAPPOR, OUE, IDUE.)
If only a small portion of inputs are more sensitive (i.e., have the smallest privacy budget), then IDUE has smaller estimation error; otherwise, IDUE has similar performance to OUE.
RE = (1/|S|) Σ_{i∈S} |ĉi − c*i| / c*i
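The RE definition can be computed directly; a small sketch with made-up counts and a hypothetical sensitive set S:

```python
def relative_error(c_hat, c_star, sensitive_set):
    """RE = (1/|S|) * sum_{i in S} |c_hat_i - c_star_i| / c_star_i."""
    return sum(abs(c_hat[i] - c_star[i]) / c_star[i]
               for i in sensitive_set) / len(sensitive_set)

c_star = [100, 200, 400]  # made-up true counts
c_hat = [110, 190, 440]   # made-up estimates
print(round(relative_error(c_hat, c_star, sensitive_set=[0, 1]), 6))  # 0.075
```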
(Figure: varying ℓ and ϵ on the Kosarak item-set data.)
The optimal ℓ (the parameter of the Padding-and-Sampling protocol) depends on both the data distribution and the privacy budget (the original paper only mentioned data dependence). We leave this as future work.
Future work: