Learning from Untrusted Data


  1. Learning from Untrusted Data. Moses Charikar, Jacob Steinhardt, Gregory Valiant. Symposium on the Theory of Computing, June 19, 2017.

  2. Motivation: data poisoning attacks. Question: what concepts can be learned in the presence of arbitrarily corrupted data? (Icon credit: Annie Lin)

  3. Related Work: 60 years of work on robust statistics...
     • PCA: XCM '10, CLMW '11, CSPW '11
     • Mean estimation: LRV '16, DKKLMS '16, DKKLMS '17, L '17, DBS '17, SCV '17
     • Regression: NTN '11, NT '13, CCM '13, BJK '15
     • Classification: FHKP '09, GR '09, KLS '09, ABL '14
     • Semi-random graphs: FK '01, C '07, MMV '12, S '17
     • Other: HM '13, C '14, C '16, DKS '16, SCV '16

  4. Problem Setting. Observe n points x_1, ..., x_n. An unknown subset of αn points is drawn i.i.d. from p*; the remaining (1 − α)n points are arbitrary. Goal: estimate a parameter of interest θ(p*)
     • assuming p* ∈ P (e.g. bounded moments)
     • θ(p*) could be the mean, best-fit line, ranking, etc.
     New regime: α ≪ 1. (A concrete instance of this data model is sketched below.)
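To make the data model concrete, here is a minimal NumPy sketch of generating one instance. All sizes, the Gaussian choice of p*, and the adversary's fake-cluster strategy are illustrative assumptions for this sketch, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions for this sketch, not from the talk).
n, d, alpha, sigma = 1000, 20, 0.2, 1.0
mu = rng.normal(size=d)                  # unknown true mean of p*

n_good = int(alpha * n)
good = mu + sigma * rng.normal(size=(n_good, d))   # alpha*n i.i.d. samples from p*

# The remaining (1 - alpha)*n points are arbitrary; here the adversary
# plants them in a tight fake cluster far from mu.
bad = (mu + 50.0) + 0.1 * rng.normal(size=(n - n_good, d))

X = rng.permutation(np.vstack([good, bad]))        # observed, unlabeled mixture
print(X.shape)   # (1000, 20); which rows are genuine is unknown to the learner
```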

  5. Why Is This Possible? If e.g. α = 1/3, estimation seems impossible, but we can narrow down to 3 possibilities!
     • List-decodable learning [Balcan, Blum, Vempala '08]: output O(1/α) answers, one of which is approximately correct.
     • Semi-verified learning: observe O(1) verified points from p*.
     (A toy example of the list-decodable output format follows.)
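A toy illustration of the α = 1/3 situation: one third of the points come from p* and the adversary plants two equally plausible fake clusters, so no single answer can be guaranteed correct, yet a short list can contain a good one. Plain Lloyd's k-means with k = 1/α is used here only to exhibit the output format; it is not the BBV '08 or CSV '17 algorithm, and all parameters are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy instance with alpha = 1/3: the first cluster is p* (mean zero),
# the other two are adversarial but look just as plausible.
alpha, d = 1 / 3, 5
clusters = [np.zeros(d), np.full(d, 10.0), np.full(d, -10.0)]  # first is p*
X = np.vstack([c + rng.normal(size=(300, d)) for c in clusters])

# Plain Lloyd's k-means with k = 1/alpha centers, illustrating the
# list-decodable output format only (not the algorithm from the talk).
k = round(1 / alpha)
centers = X[rng.choice(len(X), k, replace=False)]
for _ in range(50):
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    centers = np.stack([X[labels == j].mean(0) if (labels == j).any() else centers[j]
                        for j in range(k)])

best = min(np.linalg.norm(c) for c in centers)  # best candidate's distance to the true mean (0)
print(f"{k} candidates; best error = {best:.2f}")
```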

  6. Why Care?
     • Practical problem: data poisoning attacks. How can we build learning algorithms that are provably secure against manipulation?
     • Fundamental problem in robust statistics: what can be learned in the presence of arbitrary outliers?
     • Agnostic learning of mixtures: when is it possible to learn about one mixture component, with no assumptions about the other components?

  7. Main Theorem. Observed functions: f_1, ..., f_n. Want to minimize an unknown target function f̄. Key quantity: a spectral norm bound on a subset I:

        max_{w ∈ ℝ^d} (1/√|I|) ‖[∇f_i(w) − ∇f̄(w)]_{i ∈ I}‖_op ≤ S.

     Meta-Theorem: given a spectral norm bound on an unknown subset of αn functions, learning is possible
     • in the semi-verified model (for convex f_i)
     • in the list-decodable model (for strongly convex f_i)
     All results are direct corollaries of the meta-theorem! (A numerical illustration of the key quantity follows.)
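A quick numerical illustration of the key quantity. For the quadratic losses f_i(w) = ½‖w − x_i‖₂² that arise in mean estimation, ∇f_i(w) − ∇f̄(w) = µ − x_i for every w, so the bound reduces to the operator norm of the centered data matrix. Sizes and the Gaussian choice of p* below are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# For f_i(w) = 0.5 * ||w - x_i||^2, grad f_i(w) - grad fbar(w) = mu - x_i,
# independent of w, so S is the operator norm of the centered data matrix
# over sqrt(|I|). (Illustrative sizes; Gaussian p* assumed.)
n_good, d, sigma = 500, 30, 1.0
mu = rng.normal(size=d)
X_good = mu + sigma * rng.normal(size=(n_good, d))   # the good subset I

G = (X_good - mu).T                                  # d x |I| matrix of gradient gaps
S = np.linalg.norm(G, ord=2) / np.sqrt(n_good)       # spectral norm over sqrt(|I|)
# For Gaussian data this concentrates near sigma * (1 + sqrt(d / |I|)).
print(f"S = {S:.2f}")
```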

  8. Corollary: Mean Estimation. Setting: a distribution p* on ℝ^d with mean µ and bounded 1st moments: E_{p*}[|⟨x − µ, v⟩|] ≤ σ‖v‖₂ for all v ∈ ℝ^d. Observe αn samples from p* and (1 − α)n arbitrary points, and want to estimate µ.

     Theorem (Mean Estimation). If αn ≥ d, it is possible to output estimates µ̂_1, ..., µ̂_m of the mean µ such that
     • m ≤ 2/α, and
     • min_{j=1,...,m} ‖µ̂_j − µ‖₂ = Õ(σ/√α) w.h.p.
     Alternately, it is possible to output a single estimate µ̂ given one verified point from p*. (A sketch of this selection step follows.)
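A minimal sketch of how the semi-verified model can collapse the list to a single answer: pick the candidate mean closest to one trusted sample. This is a plausible selection rule when the candidates are well separated, not necessarily the paper's exact procedure; the candidate values below are hypothetical.

```python
import numpy as np

def select_semi_verified(candidates, verified_point):
    """Pick the candidate mean closest to one verified sample from p*.

    If the list contains a candidate within O(sigma/sqrt(alpha)) of mu and
    the other candidates are far away, a single trusted draw from p*
    identifies the right one with high probability.
    (Illustrative rule; not necessarily the paper's exact procedure.)
    """
    candidates = np.asarray(candidates)
    dists = np.linalg.norm(candidates - verified_point, axis=1)
    return candidates[np.argmin(dists)]

# Hypothetical usage: three candidates from a list-decodable learner,
# one verified sample drawn near the true mean (zero here).
rng = np.random.default_rng(3)
cands = [np.full(4, 0.1), np.full(4, 8.0), np.full(4, -8.0)]
x_verified = rng.normal(size=4)          # one trusted draw from p*
print(select_semi_verified(cands, x_verified))
```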

  9. Comparisons.

     Mean estimation:

                   Bound       Regime    Assumption     Samples
       LRV '16     σ√(1−α)     α > 1−c   4th moments    d
       DKKLMS '16  σ(1−α)      α > 1−c   sub-Gaussian   d³
       CSV '17     σ/√α        α > 0     1st moments    d

     Estimating mixtures:

                   Separation     Robust?
       AM '05      σ(k + 1/√α)    no
       KK '10      σk             no
       AS '12      σ√k            no
       CSV '17     σ/√α           yes

  10. Other Results.

      Stochastic block model (sparse regime; cf. GV '14, LLV '15, RT '15, RV '16):

                   Average Degree   Robust?
        GV '14     1/α⁴             no
        AS '15     1/α²             no
        CSV '17    1/α³             yes

      Others:
      • discrete product distributions
      • exponential families
      • ranking

  11. Proof Overview (Mean Estimation). Recall the goal: given n points x_1, ..., x_n, of which αn are drawn from p*, estimate the mean µ of p*. Key tension: balance adversarial and statistical error. High-level strategy: solve a convex optimization problem
      • if the cost is low, estimation succeeds (via the spectral norm bound)
      • if the cost is high, identify and remove outliers

  12. Algorithm. First pass: minimize over µ the objective Σ_{i=1}^n ‖x_i − µ‖₂². (A sketch of this first pass, and why it needs the outlier-removal step, follows.)
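The first-pass objective has a closed-form minimizer, the grand mean of all n points; with (1 − α)n adversarial points this can be arbitrarily far off, which is exactly the "high cost" case where the strategy above identifies and removes outliers before re-solving. A minimal sketch, with a purely illustrative corrupted instance:

```python
import numpy as np

rng = np.random.default_rng(4)

# First pass: minimize over mu the objective sum_i ||x_i - mu||_2^2.
# Setting the gradient 2 * sum_i (mu - x_i) to zero gives the grand mean.
def first_pass(X):
    return X.mean(axis=0)

# Tiny corrupted instance (alpha = 0.5, purely illustrative): the plain
# least-squares fit is dragged toward the adversarial points, signaling
# that outliers must be identified and removed before re-solving.
mu = np.zeros(10)
good = mu + rng.normal(size=(50, 10))
bad = np.full((50, 10), 20.0)
X = np.vstack([good, bad])

mu_hat = first_pass(X)
print(f"error of first pass: {np.linalg.norm(mu_hat - mu):.1f}")   # ~30, far from 0
```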
