LEARNING UNDER DIFFERENTIAL PRIVACY
Kobbi Nissim
BGU/Harvard
Caltech, Spring 2015
Based on joint work with: Amos Beimel, Avrim Blum, Hai Brenner, Mark Bun, Cynthia Dwork, Shiva Kasiviswanathan, Homin Lee, Frank McSherry, Sofya Raskhodnikova, Adam Smith, Uri Stemmer, and Salil Vadhan.
Scurvy: a problem throughout human history, caused by vitamin C deficiency. How much vitamin C is enough?
[Figure: vitamin C levels 2, 24, 57, 83, 121, 153, 176, 182 plotted on an axis 0, 1, 2, 3, …, T; low levels marked x. Thanks: Mark Bun]
c: a threshold function that is consistent with the data.
Theorem: if n > n0, then c also "agrees" with fresh examples from the same distribution.
- n0 depends on the learner's accuracy and success probability
- n0 examples suffice, independent of the domain size!
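The consistent-threshold learner sketched above fits in a few lines. This is an illustrative helper, not code from the talk; the function name and the convention that label 1 means "vitamin C level below the threshold" are my own choices.

```python
def consistent_threshold(examples):
    """Return a threshold t consistent with the labeled sample,
    under the convention c_t(x) = 1 iff x < t (label 1 = "low
    vitamin C level", as in the scurvy picture)."""
    pos = [x for x, y in examples if y == 1]   # need x <  t
    neg = [x for x, y in examples if y == 0]   # need x >= t
    t = (max(pos) + 1) if pos else 0           # smallest consistent t
    assert not neg or t <= min(neg), "no consistent threshold exists"
    return t
```

On the slide's data, any t strictly between 83 and 121 (inclusive) is consistent; the sketch returns the smallest one.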
But the hypothesis threshold reveals someone's data: the released threshold sits on an actual example.
With the right auxiliary information, that example could be identifying.
Idea: a "noisy" choice of threshold hides any single individual's contribution.
Brilliant, isn’t it? Time for coffee and cookies
But ...
Given:
- A dataset with sensitive information
How to:
- Compute and release functions of the dataset without compromising individual privacy
[Diagram: users (government, researchers, businesses, or a malicious adversary) send queries to a server/agency holding database X and receive answers]
Hospital: based on past patients, predict whether a patient is prone to scurvy from the vitamin C level in her blood.
Bank: based on past customers, predict whether a new customer is a good/bad credit risk from her attributes.
Example, label, and presence in the database may all be sensitive!
Evolved in [DN'03, EGS'03, DN'04, BDMN'05, DMNS'06, DKMMN'06]. Intuition: to protect an individual, make sure the output distribution barely changes when that individual's record changes.
Definition: A is (ε, δ)-differentially private if for all neighboring databases D, D' and all output sets S: Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D') ∈ S] + δ.
Pure: δ = 0. Approx.: δ > 0, with δ ≪ 1/n.
Note e^ε ≈ 1 + ε for small ε; take ε > 1/n, or there is no utility!
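For concreteness, a standard example of a pure ε-DP mechanism is the Laplace mechanism for numeric queries. This is a textbook sketch, not something specific to this talk:

```python
import random

def laplace_mechanism(dataset, query, sensitivity, eps):
    """Answer query(dataset) with Laplace(sensitivity/eps) noise.
    This is eps-DP whenever |query(D) - query(D')| <= sensitivity
    for all neighboring databases D, D'."""
    scale = sensitivity / eps
    # A Laplace sample is the difference of two i.i.d. exponentials.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return query(dataset) + noise
```

For example, counting queries ("how many records satisfy a predicate?") have sensitivity 1, so one Laplace(1/ε) noise sample suffices.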
LEARNING
Given distribution P over examples, labeled by a concept c.
Hypothesis h is α-good if error(h) = Pr_{x~P}[h(x) ≠ c(x)] ≤ α.
C: a set of concepts {c: {0,1}^d → {0,1}}; H: a set of hypotheses {h: {0,1}^d → {0,1}}.
Algorithm A PAC learns C with H if, given examples drawn from P and labeled by some c ∈ C, (x1, c(x1)), …, (xn, c(xn)):
- A outputs an α-good hypothesis h ∈ H w.p. 1−β
- So with high probability (over the randomness of the learner and the distribution), a fresh point drawn from P is classified correctly
Proper: C = H. Fact: Θ(VC(C)) samples suffice (and are necessary) for PAC learning C (properly).
- VC(C) ≤ log|C|
PRIVATE LEARNING
Party line [KLNRS'08]: private learning abstracts many of the computations done over collections of sensitive information.
A test-bed for ideas: problems and their mitigation.
- Learning-theory tools are useful for privacy [BLR'08, HR'10, …]
- Differential privacy implies generalization [M'?, DFHPRR'15, BSSU'15, NS'15]
  - In a sense, all differential privacy allows us to do is learn!
Definition [KLNRS'08]: Algorithm A private-PAC learns C with H if
- A PAC learns C with H, and
- A is (ε, δ)-differentially private
Two requirements for the learner: (1) differential privacy; (2) utility: if the sample S is consistent with some c_k, output a hypothesis close to c_k.
POINT_T = {c_k : k ∈ [T]}, where c_k(x) = 1 ⟺ x = k.
THRESHOLD_T = {c_k : k ∈ [T]}, where c_k(x) = 1 ⟺ x < k.
Theorem [KLNRS 08]: Every finite concept class C is properly pure-private PAC learnable with O(log|C|) examples.
Generic construction (based on the exponential mechanism):
- Define q(D,h) = # of xi's incorrectly classified by h
- Output hypothesis h from C w.p. ∝ e^{−εq(D,h)}
Privacy:
- Changing one example changes q(D,h) by at most 1
- So the probability of outputting any h changes by a factor of at most e^ε
Utility:
- If h has error > α, the probability of outputting h is at most ≈ e^{−εαn}
- Union bound: the probability of outputting some h with error > α is at most |C|·e^{−εαn}
- Suffices to take n = O(log|C|)
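The generic construction above can be sketched directly. This is a minimal illustrative version; I use the standard exponential-mechanism scaling e^{−εq/2} (the slides write e^{−εq}), and the function names are mine:

```python
import math
import random

def private_learn(data, hypotheses, eps):
    """Generic pure-DP learner: sample h with probability
    proportional to exp(-eps * q(D,h) / 2), where q(D,h) counts
    the examples h misclassifies (a sensitivity-1 score)."""
    def q(h):
        return sum(1 for x, y in data if h(x) != y)
    weights = [math.exp(-eps * q(h) / 2.0) for h in hypotheses]
    r = random.random() * sum(weights)
    acc = 0.0
    for h, w in zip(hypotheses, weights):
        acc += w
        if r <= acc:
            return h
    return hypotheses[-1]  # guard against float round-off
```

With `hypotheses` = all thresholds over [T], a hypothesis making k mistakes is e^{εk/2} times less likely than a consistent one, which is what drives the O(log|C|) bound.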
Brilliant, isn’t it? Time for coffee and cookies
But …
Fact: a proper point/threshold learner needs only O(1) examples non-privately (both classes have VC dimension 1).
The generic construction of private learners results in sample complexity O(log|C|): O(log T) over domain [T], O(d) over {0,1}^d.
Why do we care?
- We want private learners to be as efficient as non-private ones
- The generic construction fails when the domain is infinite
Thm [BKN 10]: Any proper pure-private learner of POINT_d requires Ω(d) samples (note |POINT_d| = 2^d).
Is this gap essential?
Recall: O(log|C|) examples were needed to beat the union bound in the generic construction.
Idea: what if we choose the output hypothesis from a smaller, carefully chosen hypothesis class?
Deterministic representation for class C:
- DRep: a hypothesis class H s.t. for every c ∈ C and every distribution P over examples, there exists a hypothesis h ∈ H with error_{P,c}(h) ≤ 1/4.
- The size of the DRep is ln|H|.
- POINT: DRep of size Θ(log log T)
  - Yields an improper learner with sample complexity O(log log T)
Probabilistic representation for class C:
- Rep: a list of hypothesis classes H1, H2, …, Hs s.t. for every c ∈ C and every distribution P over examples, w.p. 3/4 a randomly chosen Hj contains a hypothesis h with error_{P,c}(h) ≤ 1/4.
- The size of the Rep is defined as max_j (ln|Hj|).
Define RepDim(C): the size of C's minimal probabilistic representation.
Theorem: Θ(RepDim(C)) samples are necessary and sufficient for (improper) pure-private learning of C.
- Analogous to the VC dimension for non-private learners
Concept class   Learner    Sample complexity
thresholds      Proper     Θ(log T)       [KLNRS'08, BKN'10]
thresholds      Improper   Θ(log T)       [FX'13]
points          Proper     Θ(log T)       [KLNRS'08, BKN'10]
points          Improper   Θ(1)           [BKN'10, BNS'13]
C               Proper     O(log|C|)      [KLNRS'08]
C               Improper   Θ(RepDim(C))   [BNS'13]
MITIGATING THE SAMPLE COMPLEXITY OF PRIVATE LEARNERS
Improper learning [BKN'10, BNS'13]. When labeled examples are expensive, unlabeled examples may be cheap:
- Active learning [BF'15, BNS'15]
- Semi-supervised learning [BNS'15]
Approximate privacy [BNS'14, BNSV'15].
Semi-supervised setting. Input: batches of labeled and unlabeled samples. Generic construction:
- The construction uses O(|C|) unlabeled examples
Boosting the labeled sample complexity:
- While maintaining the unlabeled sample complexity
LEARNER B
Base learner B with sample complexity n.
Input: database S of size n, partially labeled.
1. Let H be the set of all dichotomies over S realized by the target concept class C
2. Choose h ∈ H using the exponential mechanism, scored on the labeled portion of S
3. Relabel S using h
4. Execute B on the relabeled S
[Figure: S = (x1, y1), …, (xm, ym), (x(m+1), ?), …, (xn, ?); step 3 fills in the missing labels]
Why it works: there exists h ∈ H with error_S(h) = 0. If S contains ≈ VC(C)·log|S| labeled examples, then the chosen h is close to the target concept, and B returns a hypothesis that is close to h.
Difficulty: H depends on S, so outputting h directly would breach privacy! Solution: use h only to relabel the sample, and analyze the distribution over relabeled databases.
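Steps 1–4 of the learner can be sketched as follows. This is a toy version under my own assumptions: `dichotomies` enumerates the realizable labelings, the scoring counts mistakes on the labeled portion, and the threshold specialization in the usage is mine:

```python
import math
import random

def relabel_learner(points, labels, eps, base_learner, dichotomies):
    """Semi-supervised private learner sketch. labels[i] is None
    for unlabeled points; dichotomies(points) enumerates the
    labelings of `points` realizable by the concept class."""
    H = dichotomies(points)                              # step 1
    labeled = [i for i, y in enumerate(labels) if y is not None]
    def mistakes(h):                                     # score on labeled part
        return sum(1 for i in labeled if h[i] != labels[i])
    weights = [math.exp(-eps * mistakes(h) / 2.0) for h in H]
    r = random.random() * sum(weights)                   # step 2: exp. mechanism
    acc, chosen = 0.0, H[-1]
    for h, w in zip(H, weights):
        acc += w
        if r <= acc:
            chosen = h
            break
    relabeled = list(zip(points, chosen))                # step 3
    return base_learner(relabeled)                       # step 4
```

For thresholds over sorted points there are only n+1 dichotomies, so the labeled sample pays for log(n+1) rather than log T.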
Thresholds seem fundamental and simple, yet disturbingly hard under privacy.
[BNSV'15] Get four problems for the price of one:
- Distribution learning: D is an unknown distribution over X with CDF F_D. Goal: given oracle access to D, find F: X → [0,1] with |F(x) − F_D(x)| small for all x ∈ X.
- Query release: given points (x1, …, xn) ∈ X, output a data structure approximating |{i : xi < z}|/n for all z ∈ X.
- (Approximate) median: given points (x1, …, xn) ∈ X, output z such that (approximately) half the points are smaller/greater than z.
- Interior point: given points (x1, …, xn) ∈ X, output z between the min and max points.
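Non-privately all four tasks are trivial from the sorted sample, which is what makes the private lower bounds below striking. A sketch for two of them (function names mine):

```python
def interior_point(xs):
    """Return some z with min(xs) <= z <= max(xs);
    the median is one valid choice."""
    return sorted(xs)[len(xs) // 2]

def cdf_release(xs):
    """Query-release structure: z -> |{i : x_i < z}| / n."""
    s = sorted(xs)
    n = len(s)
    def F(z):
        return sum(1 for x in s if x < z) / n
    return F
```

Note the reductions implicit in the slide: a median is in particular an interior point, and thresholded counts from `cdf_release` recover approximate medians.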
New results [BNSV'15] for the interior-point problem over [T]:
Pure privacy:
- Θ(log T) samples [KLNRS'08, BNS'10, FX'13]
Approx. privacy:
- O(8^{log* T}) samples [BNS'14]
Non-privately:
- O(1) samples
Is this gap essential?
Observe: it is impossible to have n = 1 when T ≥ 2: if M outputs some value w.p. ≥ 3/4 on one input, then on a neighboring input it outputs the same value w.p. ≥ (3/4 − δ)·e^{−ε} > 1/4, so M cannot be answering about its single input.
Induction: an approx.-DP mechanism M solving IP over [T(n+1)] with n+1 samples yields an approx.-DP mechanism M' solving IP over [T(n)] with n samples, where T(n+1) = b^{T(n)} for an alphabet size b = b(n).
M' on input x1, …, xn ∈ [T(n)]:
- Fix a reference string X0 = z0^1 z0^2 … z0^{T(n)} ∈ [b]^{T(n)}, and encode each xj as Xj = z0^1 … z0^{xj} zj^{xj+1} … zj^{T(n)}, i.e., Xj agrees with X0 on exactly its first xj coordinates. Read as numbers, X0, X1, …, Xn ∈ [T(n+1)].
- Run M on (X0, X1, …, Xn) to obtain X, and output y = |prefix(X, X0)|, the length of the longest common prefix of X and X0.
If M is approx. private, so is M'. Suppose M succeeds:
- X ≥ min(X1, …, Xn), hence y ≥ min(x1, …, xn).
- Let x = max(x1, …, xn). If y > x, then X reveals the symbol z0^{x+1}, which appears in no input string beyond position x; by approx. privacy this happens with probability at most e^ε/b + δ.
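A toy version of the encoding step of this reduction (the reference string `z0`, the deviating string `zj`, and the prefix helper are illustrative names, not from the talk):

```python
def encode(x, z0, zj):
    """Map x in {0, ..., len(z0)} to a string agreeing with the
    reference string z0 on exactly its first x coordinates
    (requires zj[i] != z0[i] for every position i)."""
    return z0[:x] + zj[x:]

def prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for u, v in zip(a, b):
        if u != v:
            break
        n += 1
    return n
```

By construction, `prefix_len(encode(x, z0, zj), z0) == x`, so M' can read an interior point of the x's off the common prefix of M's output with the reference string.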
The lower bound can be stated using a (generalization of) fingerprinting codes.
- Fingerprinting codes were used to prove other lower bounds in differential privacy [BUV14, DTTZ14, BST14]
Other bounds:
- Approx.-private learning of d-dimensional threshold functions over [T]^d requires Ω(d·log* T) = Ω(VC·log* T) samples
- Improper pure-private learning of POINT over countably infinite domains:
  - [BNS13]: O(1) samples, but the hypotheses have infinite description length
  - New: this is inherent!
[DMNS 05] First private learning algorithms; SQ.
[KLNRS 08] Defined private learning; characterization.
[BKN 10] Sample complexity of private learning.
[CH 11] Learning in continuous domains, label privacy.
[CM 08, …] Machine learning.
[BLR 08, DNRRV 09, …] Synthetic data.
[DRV 10] Private boosting.
Private PAC learning exhibits a lot of complexity:
- Even for simple concept classes like points and thresholds
  - It behaves very differently from non-private learning!
- A variety of applicable strategies:
  - Improper vs. proper for pure DP
    - RepDim characterizes the sample complexity
  - Semi-supervised
    - VC labeled samples suffice
  - Active learning
  - Label privacy
    - VC characterizes the sample complexity
  - Approx. vs. pure DP
    - δ > 0 helps!
    - Open: improper learning under approx. privacy
    - Open: characterization of sample complexity under approx. privacy
- Open: time complexity of private PAC learning
Oh, well ! Time for coffee and cookies
Order-revealing encryption: learn {<, >, =} but nothing else. Efficiently learnable, but not efficiently privately learnable.