
Locally private learning without interaction requires separation



  1. Locally private learning without interaction requires separation. Vitaly Feldman, joint work with Amit Daniely (Hebrew University).

  2. Local Differential Privacy (LDP) [KLNRS '08]: a protocol is $\varepsilon$-LDP if for every user $i$, message $j$ is sent using a local $\varepsilon_{i,j}$-DP randomizer $A_{i,j}$, and $\sum_j \varepsilon_{i,j} \le \varepsilon$. [Diagram: users holding data $z_1, z_2, z_3, \ldots, z_n$ exchange randomized messages with a server.]
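
As a concrete illustration of a local randomizer (my own minimal sketch, not from the talk), binary randomized response satisfies $\varepsilon$-DP for each user's report, and the server can still debias the aggregate:

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    """epsilon-DP local randomizer for one bit: report the true bit
    with probability e^eps / (e^eps + 1), otherwise flip it. The
    likelihood of any output changes by at most a factor e^eps when
    the input flips, which is the local DP requirement."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

def estimate_mean(reports, epsilon):
    """Server-side debiasing: unbiased estimate of the users' true mean."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    avg = sum(reports) / len(reports)
    return (avg - (1 - p)) / (2 * p - 1)

# Each user sends one randomized message; the server only sees reports.
bits = [int(random.random() < 0.3) for _ in range(100_000)]
reports = [randomized_response(b, epsilon=1.0) for b in bits]
print(estimate_mean(reports, epsilon=1.0))  # close to 0.3
```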

  3. Non-interactive LDP. [Diagram: each of the users $z_1, \ldots, z_n$ sends a single randomized message to the server in one round, with no messages back from the server.]

  4. PAC learning. PAC model [Valiant '84]: let $C$ be a set of binary classifiers over $X$. $A$ is a PAC learning algorithm for $C$ if for every $f \in C$ and every distribution $D$ over $X$, given i.i.d. examples $(x_i, f(x_i))$ with $x_i \sim D$, $A$ outputs $h$ such that w.h.p. $\Pr_{x \sim D}[h(x) \ne f(x)] \le \alpha$. Distribution-specific learning: $D$ is fixed and known to $A$.

  5. Statistical query model [Kearns '93]. $P$ is a distribution over $Z = X \times \{\pm 1\}$; here $P$ is the distribution of $(x, f(x))$ for $x \sim D$. An SQ algorithm sends queries $\phi_1, \ldots, \phi_q$ with $\phi_i : Z \to [0,1]$ to an SQ oracle, which returns answers $v_1, \ldots, v_q$ satisfying $|v_i - \mathbf{E}_{z \sim P}[\phi_i(z)]| \le \tau$, where $\tau$ is the tolerance of the query; think $\tau = 1/\sqrt{n}$.
  [KLNRS '08] Simulation with success probability $1 - \beta$ (for $\varepsilon \le 1$):
  β€’ $\varepsilon$-LDP with $m$ messages $\Rightarrow$ $O(m)$ queries with tolerance $\tau = \Omega(\beta/m)$
  β€’ $q$ queries with tolerance $\tau$ $\Rightarrow$ $\varepsilon$-LDP with $n = O\!\left(\frac{q \log(q/\beta)}{\tau^2 \varepsilon^2}\right)$ samples/messages
  Non-interactive if and only if queries are non-adaptive.
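
A sketch of one direction of the simulation (illustrative names, building on the randomized-response sketch above): a single statistical query $\phi$ can be answered under $\varepsilon$-LDP by randomizing each user's value of $\phi(z_i)$, giving tolerance on the order of $1/(\varepsilon\sqrt{n})$:

```python
import math
import random

def ldp_answer_sq(phi, samples, epsilon):
    """Answer one statistical query phi: Z -> [0,1] under epsilon-LDP.

    Each user rounds phi(z_i) to a bit (unbiasedly), applies randomized
    response, and the server debiases the average; the resulting error
    is O(1/(epsilon * sqrt(n))) w.h.p., i.e. tolerance ~ 1/sqrt(n)."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    reports = []
    for z in samples:
        bit = 1 if random.random() < phi(z) else 0      # E[bit] = phi(z)
        sent = bit if random.random() < p else 1 - bit  # local randomizer
        reports.append(sent)
    avg = sum(reports) / len(reports)
    return (avg - (1 - p)) / (2 * p - 1)                # debias
```

If all queries $\phi_1, \ldots, \phi_q$ are fixed in advance (non-adaptive), every user can answer them in a single message by splitting the privacy budget, which is exactly a non-interactive protocol.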

  6. Known results. $C$ is SQ-learnable efficiently (non-adaptively) if and only if it is learnable efficiently with $\varepsilon$-LDP (non-interactively). Examples:
  β€’ Yes: halfspaces/linear classifiers [Dunagan, Vempala '04]
  β€’ No: parity functions [Kearns '93]
  β€’ Yes, non-adaptively: Boolean conjunctions
  [KLNRS '08] There exists a class $C$ (masked parity) that is (1) SQ/LDP-learnable efficiently over the uniform distribution on $\{0,1\}^d$, but (2) requires an exponential number of samples to learn non-interactively by an LDP algorithm. [KLNRS '08]: does the separation hold for distribution-independent learning?

  7. Margin complexity. Margin complexity of $C$ over $X$, $\mathbf{MC}(C)$: the smallest $M$ such that there exists an embedding $\Psi : X \to \mathbb{B}_d(1)$ (the unit ball) under which every $f \in C$ is linearly separable with margin $\gamma \ge 1/M$. [Figure: positive examples $\{\Psi(x) : f(x) = +1\}$ and negative examples $\{\Psi(x) : f(x) = -1\}$ separated by a hyperplane with margin.]
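
To make the definition concrete, a small illustrative check (not from the talk) of the margin a fixed separator $w$ achieves under an embedding $\Psi$; for parity on two bits, the multilinear embedding gives margin $1/\sqrt{3}$:

```python
import numpy as np

def margin(points, labels, w):
    """Margin of separator w on embedded points Psi(x) in the unit ball:
    min over x of f(x) * <w, Psi(x)> / ||w||. The class has margin
    complexity at most M if some single embedding achieves margin >= 1/M
    for every f in C."""
    w = w / np.linalg.norm(w)
    return min(y * np.dot(w, p) for p, y in zip(points, labels))

# Example: parity of two bits, embedded via normalized coordinate products.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Psi = [np.array([(-1) ** a, (-1) ** b, (-1) ** (a + b)]) / np.sqrt(3) for a, b in X]
f = [(-1) ** (a + b) for a, b in X]   # parity labels in {+1, -1}
w = np.array([0.0, 0.0, 1.0])         # separates parity via the third coordinate
print(margin(Psi, f, w))              # 0.577... = 1/sqrt(3)
```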

  8. Lower bound. Thm: Let $C$ be a negation-closed set of classifiers. Any non-interactive $1$-LDP algorithm that learns $C$ with error $\alpha < 1/2$ and success probability $\Omega(1)$ needs $n = \Omega\!\left(\mathbf{MC}(C)^{2/3}\right)$.
  Corollaries:
  β€’ Decision lists over $\{0,1\}^d$: $n = 2^{\Omega(d^{1/3})}$ [Buhrman, Vereshchagin, de Wolf '07]; (interactively) learnable with $n = \mathrm{poly}\!\left(\frac{d}{\alpha\varepsilon}\right)$ [Kearns '93]
  β€’ Linear classifiers over $\{0,1\}^d$: $n = 2^{\Omega(d)}$ [Goldmann, HΓ₯stad, Razborov '92; Sherstov '07]; (interactively) learnable with $n = \mathrm{poly}\!\left(\frac{d}{\alpha\varepsilon}\right)$ [Dunagan, Vempala '04]

  9. Upper bound. Thm: For any $C$ and distribution $D$ there exists a non-adaptive $\varepsilon$-LDP algorithm that learns $C$ over $D$ with error $\alpha$ and success probability $1 - \beta$ using $n = \mathrm{poly}\!\left(\frac{\mathbf{MC}(C) \cdot \log(1/\beta)}{\alpha\varepsilon}\right)$.
  Instead of a fixed $D$, it suffices to have:
  β€’ access to public unlabeled samples from $D$, or
  β€’ (interactive) LDP access to unlabeled samples from $D$.
  The lower bound holds against this hybrid model.

  10. Lower bound technique. Thm: Let $C$ be a negation-closed set of classifiers. If there exists a non-adaptive SQ algorithm that uses $q$ queries of tolerance $1/q$ to learn $C$ with error $\alpha < 1/2$ and success probability $\Omega(1)$, then $\mathbf{MC}(C) = O(q^{3/2})$.
  Correlation dimension of $C$, $\mathrm{CSQdim}(C)$ [F. '08]: the smallest $d$ for which there exist $d$ functions $h_1, \ldots, h_d : X \to [-1,1]$ such that for every $f \in C$ and every distribution $D$ there exists $i$ with $\left|\mathbf{E}_{x \sim D}[f(x)\, h_i(x)]\right| \ge \frac{1}{d}$.
  Thm [F. '08; Kallweit, Simon '11]: $\mathbf{MC}(C) \le \mathrm{CSQdim}(C)^{3/2}$.
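
A direct transcription of the correlation-dimension condition into code (illustrative only; a real bound must verify this for all $f \in C$ and all distributions $D$, here one pair is checked on a finite domain):

```python
import numpy as np

def has_correlated_h(f, hs, points, probs):
    """Correlation-dimension condition for one classifier f and one
    distribution D (given as points with probabilities probs): does
    some h_i in hs satisfy |E_{x~D}[f(x) h_i(x)]| >= 1/len(hs)?"""
    f_vals = np.array([f(x) for x in points])
    threshold = 1.0 / len(hs)
    return any(
        abs(np.dot(probs, f_vals * np.array([h(x) for x in points]))) >= threshold
        for h in hs
    )
```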

  11. Proof. If there exists a non-adaptive SQ algorithm $A$ that uses $q$ queries of tolerance $1/q$ to learn $C$ with error $\alpha < 1/2$, then $\mathrm{CSQdim}(C) \le q$.
  Let $\phi_1, \ldots, \phi_q : X \times \{\pm 1\} \to [0,1]$ be the (non-adaptive) queries of $A$. Decompose
  $$\phi(x, y) = \underbrace{\frac{\phi(x,1) + \phi(x,-1)}{2}}_{g(x)} + \underbrace{\frac{\phi(x,1) - \phi(x,-1)}{2}}_{h(x)} \cdot y,$$
  so that $\mathbf{E}_{x \sim D}[\phi_i(x, f(x))] = \mathbf{E}_{x \sim D}[g_i(x)] + \mathbf{E}_{x \sim D}[f(x)\, h_i(x)]$.
  If $\left|\mathbf{E}_{x \sim D}[f(x)\, h_i(x)]\right| \le \frac{1}{q}$, then $\mathbf{E}_{x \sim D}[\phi_i(x, f(x))] \approx \mathbf{E}_{x \sim D}[\phi_i(x, -f(x))]$ within the tolerance. If this holds for all $i \in [q]$, then the algorithm cannot distinguish between $f$ and $-f$, so it cannot achieve error $< 1/2$.
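
A quick numeric sanity check of the decomposition (my own illustration, with an arbitrary query $\phi$ and classifier $f$); the identity $\phi(x, y) = g(x) + h(x)\,y$ holds pointwise for $y \in \{\pm 1\}$, so it holds in expectation under any distribution:

```python
import random

def g(phi, x):
    return (phi(x, 1) + phi(x, -1)) / 2  # label-independent part

def h(phi, x):
    return (phi(x, 1) - phi(x, -1)) / 2  # label-correlated part

phi = lambda x, y: (x % 3 == 0) * 0.5 + (y == 1) * 0.25  # arbitrary query into [0,1]
f = lambda x: 1 if x % 2 == 0 else -1                    # arbitrary classifier

xs = [random.randrange(100) for _ in range(10_000)]      # samples from some D
lhs = sum(phi(x, f(x)) for x in xs) / len(xs)            # E[phi(x, f(x))]
rhs = sum(g(phi, x) + f(x) * h(phi, x) for x in xs) / len(xs)
print(abs(lhs - rhs) < 1e-12)                            # True: identity is exact
```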

  12. Upper bound. Thm: For any $C$ and distribution $D$ there exists a non-adaptive $\varepsilon$-LDP algorithm that learns $C$ over $D$ with error $\alpha$ and success probability $1 - \beta$ using $n = \mathrm{poly}\!\left(\frac{\mathbf{MC}(C) \cdot \log(1/\beta)}{\alpha\varepsilon}\right)$.
  Recall: the margin complexity of $C$ over $X$, $\mathbf{MC}(C)$, is the smallest $M$ such that there exists an embedding $\Psi : X \to \mathbb{B}_d(1)$ under which every $f \in C$ is linearly separable with margin $\gamma \ge 1/M$.
  Thm [Arriaga, Vempala '99; Ben-David, Eiron, Simon '02]: for every $f \in C$, a random projection into $\mathbb{B}_d(1)$ with $d = O(\mathbf{MC}(C)^2 \log(1/\beta))$ ensures that with probability $1 - \beta$, a $1 - \beta$ fraction of points is linearly separable with margin $\gamma \ge \frac{1}{3\,\mathbf{MC}(C)}$.
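
A minimal sketch of the random-projection step (a standard Gaussian Johnson-Lindenstrauss matrix; the rescaling into the unit ball is my own simplification):

```python
import numpy as np

def random_project(points, target_dim, rng=np.random.default_rng(0)):
    """Project embedded points Psi(x) in R^D down to d = target_dim
    dimensions with a Gaussian JL matrix. With high probability inner
    products, and hence margins, are preserved up to a constant factor
    when target_dim = O(MC(C)^2 log(1/beta))."""
    D = points.shape[1]
    R = rng.normal(size=(target_dim, D)) / np.sqrt(target_dim)
    projected = points @ R.T
    # Rescale into the unit ball B_d(1), as the embedding requires.
    return projected / np.max(np.linalg.norm(projected, axis=1))
```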

  13. Algorithm. Perceptron: if $\mathrm{sign}(\langle w_t, x \rangle) \ne y$ then update $w_{t+1} \leftarrow w_t + yx$.
  Expected update:
  $$\mathbf{E}_{(x,y) \sim P}\!\left[yx \mid \mathrm{sign}(\langle w_t, x \rangle) \ne y\right] = \frac{\mathbf{E}_{(x,y) \sim P}\!\left[yx \cdot \mathbb{1}(\mathrm{sign}(\langle w_t, x \rangle) \ne y)\right]}{\Pr_{(x,y) \sim P}\!\left[\mathrm{sign}(\langle w_t, x \rangle) \ne y\right]},$$
  where the scalar in the denominator is $\ge \alpha$ as long as the error exceeds $\alpha$, and the numerator equals
  $$\mathbf{E}_{(x,y) \sim P}\!\left[x \cdot \frac{y - \mathrm{sign}(\langle w_t, x \rangle)}{2}\right] = \frac{\mathbf{E}_{(x,y) \sim P}[xy]}{2} - \frac{\mathbf{E}_{(x,y) \sim P}\!\left[x\, \mathrm{sign}(\langle w_t, x \rangle)\right]}{2},$$
  where the first term is non-adaptive (the same in every round) and the second is independent of the label. Estimate the mean vector with $\ell_2$ error:
  β€’ LDP [Duchi, Jordan, Wainwright '13]
  β€’ SQs [F., Guzman, Vempala '15]
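
A sketch of the resulting algorithm (my own illustration; the talk only specifies the expected-update decomposition and private mean estimation). Gaussian noise is used here as a stand-in for the actual LDP mean-estimation mechanism of [Duchi, Jordan, Wainwright '13], modeling only its $\ell_2$ error:

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_mean(vectors, noise_scale):
    """Stand-in for an eps-LDP mean estimator: each user's vector is
    perturbed locally and the server averages. Real LDP mechanisms
    differ; this only models the l2 estimation error."""
    noisy = vectors + rng.normal(scale=noise_scale, size=vectors.shape)
    return noisy.mean(axis=0)

def private_perceptron(X, y, rounds, noise_scale=0.1):
    """Perceptron driven by privately estimated expected updates, using
    E[x(y - sign(<w_t, x>))/2] = E[xy]/2 - E[x sign(<w_t, x>)]/2.
    The E[xy] term is estimated once (a non-adaptive, labeled query);
    the second term depends on w_t but needs only unlabeled data."""
    n, d = X.shape
    label_term = noisy_mean(X * y[:, None], noise_scale)  # one-shot E[xy]
    w = np.zeros(d)
    for _ in range(rounds):
        unlabeled_term = noisy_mean(X * np.sign(X @ w)[:, None], noise_scale)
        w = w + (label_term - unlabeled_term) / 2
    return w
```

This split is what makes the protocol non-adaptive with respect to the labeled data: only the label-independent terms change across rounds, and those can be answered from public or LDP-accessed unlabeled samples, matching the hybrid model of the upper bound.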

  14. Conclusions
  β€’ New approach to lower bounds for non-interactive LDP
    o Reduction to margin-complexity lower bounds
  β€’ Lower bounds for classical learning problems
  β€’ Same results for communication-constrained protocols
    o Also equivalent to SQ
  β€’ Interaction is necessary for learning
  β€’ Open problems:
    o Distribution-independent learning in $\mathrm{poly}(\mathbf{MC}(C))$
    o Lower bounds against 2+ round protocols
    o Stochastic convex optimization
  https://arxiv.org/abs/1809.09165
