
  1. Learning under Differential Privacy. Kobbi Nissim (BGU/Harvard). Caltech, Spring 2015. Based on joint work with: Amos Beimel, Avrim Blum, Hai Brenner, Mark Bun, Cynthia Dwork, Shiva Kasiviswanathan, Homin Lee, Frank McSherry, Sofya Raskhodnikova, Adam Smith, Uri Stemmer, and Salil Vadhan.

  2. Let's do some sweet science!
     - Scurvy: a problem throughout human history
     - Caused by vitamin C deficiency
     - How much vitamin C is enough?
     Thanks: Mark Bun

  3. So you have some data …
     [Figure: sample values 2, 24, 57, 83, 121, 153, 176, 182 placed on an axis of vitamin C levels 0, 1, 2, 3, ..., T.]
     Thanks: Mark Bun

  4. So you have some data …
     - c: a threshold function that is consistent with the data
     [Figure: sample values 2, 24, 57, 83, 121, 153, 176, 182 on the vitamin C level axis 0, ..., T, with four points marked x.]
     Thanks: Mark Bun

  6. So you have some data …
     - c: a threshold function that is consistent with the data
     - Theorem: if $n > n_0$ then c also "agrees" with the underlying distribution
       - $n_0$ depends on the learner's accuracy and success probability
       - $n_0$ examples suffice, independent of the domain size!
     [Figure: sorted sample 2, 24, 57, 83, 121, 153, ..., with four points marked x.]
     Thanks: Mark Bun
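
A minimal, non-private sketch of "pick a threshold consistent with the data" may help make the slide concrete. Everything here is an illustrative assumption rather than the speaker's code: the function name `consistent_threshold`, the integer domain starting at 0, the convention that label 1 means "below the threshold", and the labels attached to the sample values from the figure.

```python
def consistent_threshold(samples):
    """Return an integer t such that y == (x < t) for every (x, y), or None.

    Assumes integer points x >= 0 and labels y in {0, 1} (hypothetical convention).
    """
    positives = [x for x, y in samples if y == 1]
    negatives = [x for x, y in samples if y == 0]
    # Smallest threshold that places every positively labeled point below it.
    t = max(positives) + 1 if positives else 0
    # The sample is consistent only if no negatively labeled point falls below t.
    if any(x < t for x in negatives):
        return None
    return t


# Sample values from the figure; the 0/1 labels are assumed for illustration.
data = [(2, 1), (24, 1), (57, 1), (83, 1), (121, 0), (153, 0), (176, 0), (182, 0)]
print(consistent_threshold(data))  # 84
```

The returned threshold sits directly next to one individual's data point (83 here), so the output is a deterministic function of a single record.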

  7. What's the problem?
     - The hypothesis threshold reveals someone's data point!
     - With the right auxiliary information, it could be linked to Shiva!
     [Figure: sorted sample 2, 24, 57, 83, 121, 153, ..., with four points marked x.]
     Thanks: Mark Bun

  8. Saving Shiva's privacy
     - Idea: a "noisy" choice of threshold hides the individual contribution!
     [Figure: sample values 2, 24, 57, 83, 121, 153, 176, 182 on the vitamin C level axis 0, ..., T, with four points marked x.]
     Thanks: Mark Bun

  9. Had it been the year 2000 (AD) …

  11. Brilliant, isn’t it? Time for coffee and cookies. THANK YOU! But ...?


  13. Data Privacy: The Problem
      - Given:
        - A dataset with sensitive information
      - How to:
        - Compute and release functions of the dataset without compromising individual privacy
      [Figure: a server/agency holds database X = (x_1, x_2, x_3, ..., x_{n-1}, x_n) and runs algorithm A to answer queries from users: government, researchers, businesses, or a malicious adversary.]

  14. Data Privacy: The Problem
      - Given:
        - A dataset with sensitive information
      - How to:
        - Compute and release functions of the dataset without compromising individual privacy
      - Hospital: (based on past patients) predict whether a patient is prone to scurvy, based on the vitamin C level in her blood
      - Bank: (based on past customers) predict whether a new customer is a good or bad credit risk, based on her attributes
      - The example, its label, and its very presence in the database may all be sensitive!

  15. Differential Privacy [DMNS 06]
      - Evolved in [DN’03, EGS’03, DN’04, BDMN’05, DMNS’06, DKMMN’06]
      - Intuition: to protect an individual, make sure that changing her record does not change the output distribution (by too much)
      [Figure: neighboring databases D = (x_1, x_2, x_3, ..., x_n) and D’ = (x_1, x_2, x’_3, ..., x_n) are each fed to A, producing A(D) and A(D’).]

  16. Differential Privacy [DMNS 06]
      Definition: An algorithm A is $(\varepsilon, \delta)$-differentially private if, for all neighboring databases D, D’ and for all sets of answers S:
      $\Pr[A(D) \in S] \le e^{\varepsilon} \Pr[A(D') \in S] + \delta$
      [Figure: neighboring databases D = (x_1, x_2, x_3, ..., x_n) and D’ = (x_1, x_2, x’_3, ..., x_n) are each fed to A, producing A(D) and A(D’).]

  17. Differential Privacy [DMNS 06]
      Definition: An algorithm A is $(\varepsilon, \delta)$-differentially private if, for all neighboring databases D, D’ and for all sets of answers S:
      $\Pr[A(D) \in S] \le e^{\varepsilon} \Pr[A(D') \in S] + \delta$
      - $e^{\varepsilon} \approx 1 + \varepsilon$
        - Take $\varepsilon > 1/n$; otherwise, no utility!
      - $\delta \ll 1/n$
        - Pure: $\delta = 0$
        - Approx.: $\delta > 0$
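
To see the inequality in action, here is a small sketch that checks it exhaustively for randomized response on a one-record database. Randomized response is not mentioned on the slides; it is swapped in here only because its output distribution is easy to write down exactly. The function names and the tolerance `tol` are assumptions made for this example.

```python
import math
from itertools import chain, combinations


def randomized_response_dist(bit, eps):
    """Exact output distribution: report the true bit w.p. e^eps / (1 + e^eps)."""
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}


def satisfies_dp(dist_d, dist_d2, eps, delta=0.0, tol=1e-12):
    """Check Pr[A(D) in S] <= e^eps * Pr[A(D') in S] + delta for every set S.

    dist_d and dist_d2 map each possible answer to its probability under D and D'.
    tol absorbs floating-point rounding (an assumption of this sketch).
    """
    outcomes = sorted(set(dist_d) | set(dist_d2))
    subsets = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1)
    )
    for s in subsets:
        p = sum(dist_d.get(o, 0.0) for o in s)
        p2 = sum(dist_d2.get(o, 0.0) for o in s)
        if p > math.exp(eps) * p2 + delta + tol:
            return False
    return True


eps = 0.5
d0 = randomized_response_dist(0, eps)  # database D  = (0)
d1 = randomized_response_dist(1, eps)  # neighbor D' = (1)
# The definition quantifies over all neighboring pairs, so check both orders.
print(satisfies_dp(d0, d1, eps) and satisfies_dp(d1, d0, eps))
print(satisfies_dp(d0, d1, eps / 2))
```

The first check passes because the two output probabilities differ by a factor of exactly $e^{\varepsilon}$; the second fails because that factor exceeds $e^{\varepsilon/2}$.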


  19. PAC Model [Valiant 84]
      - A distribution P on X. Each point in X is labeled 0/1
      - Samples are drawn according to P
      - A fresh point is picked according to distribution P
      - With high probability (over the randomness of the learner and the distribution), a random point drawn according to P is "classified" correctly

  20. PAC Learning: Definition
      - Given distribution P over examples, labeled by c
      - Hypothesis h is $\alpha$-good if $\mathrm{error}(h) = \Pr_{x \sim P}[h(x) \neq c(x)] \le \alpha$
      - C: a set of concepts $\{c : \{0,1\}^d \to \{0,1\}\}$
      - H: a set of hypotheses $\{h : \{0,1\}^d \to \{0,1\}\}$
      - Algorithm A PAC learns C with H if, given examples $(x_1, c(x_1)), \dots, (x_n, c(x_n))$ drawn from P and labeled by some $c \in C$:
        - A outputs an $\alpha$-good hypothesis $h \in H$ w.p. $1 - \beta$
      - Proper: C = H
      - Fact: $\Theta(\mathrm{VC}(C))$ samples for PAC learning C (properly)
        - $\mathrm{VC}(C) \le \log|C|$


  22. Why Private Learning?
      - Party line [KLNRS’08]: abstracts many of the computations done over collections of sensitive information
      - Test-bed for ideas: problems and mitigation
      - Learning is intimately related to differential privacy
        - Learning theory tools are useful for privacy [BLR’08, HR’10, …]
        - Differential privacy implies generalization [M’?, DFHPRR’15, BSSU’15, NS’15]
          - In a sense, all that differential privacy allows us to do is learn!

  23. Private Learning
      - Definition [KLNRS’08]:
        - Algorithm A Private-PAC learns C with H if
          - A PAC learns C with H, and
          - A is $(\varepsilon, \delta)$-differentially private

  24. Example 1: Privately Learning Points
      - $\mathrm{POINT}_d$: the concepts $c_j$ over the domain $\{0, \dots, j, \dots, T\}$, where $c_j(x) = 1 \iff x = j$
      [Figure: $c_j(x)$ plotted over the domain 0, ..., j, ..., T.]
      - Given labeled examples $S = (x_i, y_i)_{i=1}^{m}$, provide:
        1) Differential privacy
        2) If S is consistent with some $c_j \in \mathrm{POINT}_d$, then w.h.p. outputs an h s.t. $\mathrm{error}_S(h) = \frac{1}{m} |\{i : h(x_i) \neq y_i\}| \le \alpha$

  25. Example 2: Privately Learning Thresholds
      - $\mathrm{THRESHOLD}_d$: the concepts $c_j$ over the domain $\{0, \dots, j, \dots, T\}$, where $c_j(x) = 1 \iff x < j$
      [Figure: $c_j(x)$ plotted over the domain 0, ..., j, ..., T.]
      - Given labeled examples $S = (x_i, y_i)_{i=1}^{m}$, provide:
        1) Differential privacy
        2) If S is consistent with some $c_j \in \mathrm{THRESHOLD}_d$, then w.h.p. outputs an h s.t. $\mathrm{error}_S(h) = \frac{1}{m} |\{i : h(x_i) \neq y_i\}| \le \alpha$
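
The two example classes and the empirical error can be written down in a few lines. This is a sketch under one stated assumption: it reads the two example slides as defining a point concept by $c_j(x) = 1 \iff x = j$ and a threshold concept by $c_j(x) = 1 \iff x < j$ over an integer domain. The helper names `point`, `threshold`, and `empirical_error` are hypothetical.

```python
def point(j):
    """Point concept: labels x with 1 exactly when x == j."""
    return lambda x: 1 if x == j else 0


def threshold(j):
    """Threshold concept: labels x with 1 exactly when x < j."""
    return lambda x: 1 if x < j else 0


def empirical_error(h, sample):
    """Fraction of labeled examples (x, y) in the sample that h misclassifies."""
    return sum(1 for x, y in sample if h(x) != y) / len(sample)


# A sample labeled by the threshold concept with j = 5.
S = [(1, 1), (4, 1), (5, 0), (9, 0)]
print(empirical_error(threshold(5), S))  # 0.0
print(empirical_error(threshold(2), S))  # 0.25
print(empirical_error(point(4), S))      # 0.25
```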

  26. A General Feasibility Result
      - Theorem [KLNRS 08]: Every finite concept class C can be learned privately (and properly), using $O(\log|C|)$ examples
      - Generic construction (based on the Exponential Mechanism of [MT07]):
        - Define q(D,h) = # of $x_i$'s correctly classified by h
        - Output hypothesis h from C w.p. $\approx e^{\varepsilon q(D,h)}$
      [Figure: two example hypotheses, one with q(D,h) = 4 and one with q(D,h) = 3.]

  27. A General Feasibility Result
      - Generic construction (based on the Exponential Mechanism of [MT07]):
        - Define q(D,h) = # of $x_i$'s incorrectly classified by h
        - Output hypothesis h from C w.p. $\approx e^{-\varepsilon q(D,h)}$
      - Privacy:
        - Changing one example changes q(D,h) by at most 1
        - Probability of outputting h changes by a factor of at most $e^{\varepsilon}$
      - Utility:
        - If h has error $> \alpha$, the probability of outputting h is at most $e^{-\varepsilon \alpha n}$
        - Union bound: the probability of outputting some h with error $> \alpha$ is at most $|C| e^{-\varepsilon \alpha n}$
        - Suffices to take $n = O(\log|C|)$
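
The generic construction can be sketched as a short sampler. This is an illustration rather than the authors' implementation: the names `exp_mech_probs` and `private_learner`, the threshold hypothesis class, and the toy sample are all assumptions. The weights follow the slide's form $e^{-\varepsilon q(D,h)}$; once the normalizing constant is accounted for, the standard analysis of this exact form gives $2\varepsilon$-differential privacy, which is the bound checked in the comments.

```python
import math
import random


def exp_mech_probs(sample, hypotheses, eps):
    """Output probabilities proportional to exp(-eps * q(D, h)),
    where q(D, h) is the number of examples in the sample that h misclassifies."""
    scores = [sum(1 for x, y in sample if h(x) != y) for h in hypotheses]
    best = min(scores)
    # Shifting all scores by the minimum avoids underflow; normalization cancels the shift.
    weights = [math.exp(-eps * (q - best)) for q in scores]
    total = sum(weights)
    return [w / total for w in weights]


def private_learner(sample, hypotheses, eps, rng=random):
    """Sample the index of one hypothesis from the exponential-mechanism distribution."""
    probs = exp_mech_probs(sample, hypotheses, eps)
    return rng.choices(range(len(hypotheses)), weights=probs, k=1)[0]


# Hypothetical finite class: thresholds h_j(x) = 1 iff x < j, for j = 0, ..., 11.
hypotheses = [lambda x, j=j: 1 if x < j else 0 for j in range(12)]
sample = [(1, 1), (4, 1), (5, 0), (9, 0)]
neighbor = [(1, 1), (4, 1), (5, 0), (9, 1)]  # differs from sample in one example

eps = 1.0
probs = exp_mech_probs(sample, hypotheses, eps)
probs_neighbor = exp_mech_probs(neighbor, hypotheses, eps)
# Each output probability moves by a factor of at most e^(2 * eps) between neighbors.
print(all(p <= math.exp(2 * eps) * p2 for p, p2 in zip(probs, probs_neighbor)))
print(private_learner(sample, hypotheses, eps, random.Random(0)))
```

Taking the slide's union bound at face value, requiring $|C| e^{-\varepsilon \alpha n} \le \beta$ gives $n \ge (\ln|C| + \ln(1/\beta)) / (\varepsilon \alpha)$, which is where the $O(\log|C|)$ sample size comes from for fixed $\varepsilon$, $\alpha$, and $\beta$.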

  28. Had it been the year 2008 …

  29. Brilliant, isn’t it? Time for coffee and cookies. THANK YOU! But …?

  30. Privately Learning Points/Thresholds
      - Fact: a proper Point/Threshold learner exists with O(1) samples
      - The generic construction of private learners results in $O(\log|C|) = O(\log T)$ samples
      - Is this gap essential?
      - Why do we care?
        - Want private learners to be as efficient as non-private ones
        - The generic construction fails when the domain is infinite
      - Thm [BKN 10]: Any proper pure-private learner of Points/Thresholds must use $\Omega(\log T)$ samples

  31. Can we do better?
      - Recall: $O(\log|C|)$ examples are needed to beat the union bound in the exponential mechanism analysis
      - Idea: what if we choose the outcome hypothesis from a set smaller than C?
