  1. Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers. Jacob Steinhardt, Moses Charikar, Gregory Valiant. ITCS 2018, January 14, 2018.

  2. Motivation: Robust Learning. Question: What concepts can be learned robustly, even if some of the data is arbitrarily corrupted?

  3–7. Example: Mean Estimation. Problem: Given data x_1, …, x_n ∈ R^d, of which (1 − ǫ)n come from p* (and the remaining ǫn are arbitrary outliers), estimate the mean µ of p*. Issue: high dimensions.
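To make the corruption model and the "high dimensions" issue concrete, here is a minimal simulation sketch (my own illustration, not from the talk; the adversary's placement of the outliers and the helper name naive_mean_error are assumptions for the example): an ǫ fraction of adversarial points drags the naive empirical mean away from µ by an amount that grows with the dimension d.

```python
import numpy as np

def naive_mean_error(n, d, eps, rng):
    """Clean data ~ N(0, I); the adversary piles eps*n outliers at a single point
    that is only sqrt(d) away from the true mean, i.e. as far as a typical clean point."""
    n_out = int(eps * n)
    clean = rng.standard_normal((n - n_out, d))
    bad_point = np.zeros(d)
    bad_point[0] = np.sqrt(d)
    X = np.vstack([clean, np.tile(bad_point, (n_out, 1))])
    return np.linalg.norm(X.mean(axis=0))   # distance of the naive mean from the true mean (0)

rng = np.random.default_rng(0)
for d in [10, 100, 1000]:
    print(d, naive_mean_error(n=5000, d=d, eps=0.1, rng=rng))
# error grows like eps * sqrt(d) (~0.3, ~1.0, ~3.2), not the dimension-free O(eps) we want
```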

  8–11. Mean Estimation: Gaussian Example. Suppose the clean data is Gaussian: x_i ∼ N(µ, I), i.e. mean µ and variance 1 in each coordinate. A typical clean point then satisfies ‖x_i − µ‖_2 ≈ √(1² + ⋯ + 1²) = √d, so an ǫ fraction of outliers placed at that same distance can shift the empirical mean by ≈ ǫ√d. We cannot filter points independently, even if we know the true density!
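The "cannot filter independently" point can also be checked numerically. The sketch below (again my own illustration, with an assumed outlier placement) shows that each outlier sits at distance ≈ √d from µ, exactly like a typical clean point, so no per-point distance test can reject it, yet together the outliers shift the mean by ≈ ǫ√d.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_clean, n_out = 400, 4500, 500            # eps = 0.1, true mean mu = 0
clean = rng.standard_normal((n_clean, d))
bad_point = np.zeros(d)
bad_point[0] = np.sqrt(d)                     # distance sqrt(d) from mu, like a clean point
X = np.vstack([clean, np.tile(bad_point, (n_out, 1))])

dist = np.linalg.norm(X, axis=1)              # per-point distance to the true mean
print(dist[:n_clean].mean(), dist[n_clean:].mean())   # both ~ sqrt(d) = 20: indistinguishable
print(np.linalg.norm(X.mean(axis=0)))         # ~ eps * sqrt(d) = 2: the mean is badly shifted
```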

  12–13. History. Progress in high dimensions only recently:
  • Tukey median [1975]: robust but NP-hard
  • Donoho estimator [1982]: high error
  • [DKKLMS16, LRV16]: first dimension-independent error bounds
  • large body of work since then [CSV17, DKKLMS17, L17, DBS17]
  • many other problems, including PCA [XCM10], regression [NTN11], classification [FHKP09], etc.

  14–15. This Talk. Question: What general and simple properties enable robust estimation? New information-theoretic criterion: resilience.

  16. Resilience. Suppose {x_i}_{i∈S} is a set of points in R^d. Definition (Resilience): A set S is (σ, ǫ)-resilient in a norm ‖·‖ around a point µ if for all subsets T ⊆ S of size at least (1 − ǫ)|S|, ‖ (1/|T|) Σ_{i∈T} (x_i − µ) ‖ ≤ σ. Intuition: all large subsets have similar mean.
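As a sanity check on the definition, here is a small sketch (my own illustration, not from the paper; the function name is hypothetical) that computes the resilience parameter σ of a one-dimensional sample: in 1-D, the subset of size (1 − ǫ)|S| whose mean deviates most from µ is obtained by dropping points from one end, so the worst case can be found after sorting.

```python
import numpy as np

def resilience_sigma_1d(x, eps, mu=None):
    """Smallest sigma such that the 1-D set x is (sigma, eps)-resilient around mu:
    the max over subsets T with |T| >= (1 - eps)*|x| of |mean(T) - mu|.
    In 1-D the maximizing T keeps the largest (or the smallest) (1 - eps)*|x| points."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    if mu is None:
        mu = x.mean()
    k = int(np.ceil((1 - eps) * n))     # size of the retained subset
    hi = x[n - k:].mean() - mu          # drop the smallest points: mean moves up
    lo = mu - x[:k].mean()              # drop the largest points: mean moves down
    return max(hi, lo)

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)        # bounded variance, so sigma should be O(sqrt(eps))
for eps in [0.01, 0.05, 0.2]:
    print(eps, resilience_sigma_1d(x, eps))   # roughly 0.03, 0.11, 0.35
```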

  17. Main Result. Let S ⊆ R^d be a set of (1 − ǫ)n “good” points, and let S_out be a set of ǫn arbitrary outliers. We observe S̃ = S ∪ S_out. Theorem: If S is (σ, ǫ/(1 − ǫ))-resilient around µ, then it is possible to output µ̂ such that ‖µ̂ − µ‖ ≤ 2σ. In fact, outputting the center of any resilient subset of S̃ of size (1 − ǫ)n will work!

  18–22. Pigeonhole Argument. Claim: If S and S′ are (σ, ǫ/(1 − ǫ))-resilient around µ and µ′ respectively, and both have size (1 − ǫ)n, then ‖µ − µ′‖ ≤ 2σ. Proof (the pigeonhole count is spelled out after this item):
  • Let µ_{S∩S′} be the mean of S ∩ S′.
  • By pigeonhole, |S ∩ S′| ≥ (1 − ǫ/(1 − ǫ)) |S′|.
  • Then ‖µ′ − µ_{S∩S′}‖ ≤ σ by resilience.
  • Similarly, ‖µ − µ_{S∩S′}‖ ≤ σ.
  • The result follows by the triangle inequality.
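For completeness, the pigeonhole count behind the second bullet can be written out as follows (my own expansion of the step, since S and S′ both live inside S̃, which has n points):

```latex
\[
|S \cap S'| \;\ge\; |S| + |S'| - |\tilde S|
  \;=\; (1-\epsilon)n + (1-\epsilon)n - n
  \;=\; (1-2\epsilon)n
  \;=\; \Bigl(1 - \tfrac{\epsilon}{1-\epsilon}\Bigr)(1-\epsilon)n
  \;=\; \Bigl(1 - \tfrac{\epsilon}{1-\epsilon}\Bigr)\,|S'| .
\]
```

Hence S ∩ S′ is a large enough subset of both S and S′ for (σ, ǫ/(1 − ǫ))-resilience to apply, and the triangle inequality gives ‖µ − µ′‖ ≤ ‖µ − µ_{S∩S′}‖ + ‖µ_{S∩S′} − µ′‖ ≤ 2σ.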

  23–26. Implication: Mean Estimation. Lemma: If a dataset has bounded covariance (say, covariance ⪯ I), it is (O(√ǫ), ǫ)-resilient in the ℓ2-norm. Proof sketch: if ǫn points were ≫ 1/√ǫ away from the mean in some direction, they alone would make the variance ≫ 1; therefore deleting any ǫn points changes the mean by at most ≈ ǫ · 1/√ǫ = √ǫ (a fuller derivation follows this item). Corollary: If the clean data has bounded covariance, its mean can be estimated to ℓ2-error O(√ǫ) in the presence of ǫn outliers. More generally, if the clean data has bounded k-th moments, its mean can be estimated to ℓ2-error O(ǫ^{1−1/k}).
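A slightly more careful version of this argument (my own write-up, not from the slides), assuming µ is the empirical mean of S and the empirical covariance of S satisfies Σ ⪯ I: take any T ⊆ S with |T| ≥ (1 − ǫ)|S|, let R = S \ T (so |R| ≤ ǫ|S|), and bound the mean shift in an arbitrary unit direction v.

```latex
\[
\Bigl|\Bigl\langle v,\ \tfrac{1}{|T|}\sum_{i\in T}(x_i-\mu)\Bigr\rangle\Bigr|
  \;=\; \frac{|R|}{|T|}\,\Bigl|\Bigl\langle v,\ \tfrac{1}{|R|}\sum_{i\in R}(x_i-\mu)\Bigr\rangle\Bigr|
  \;\le\; \frac{|R|}{|T|}\sqrt{\frac{1}{|R|}\sum_{i\in R}\langle v,\,x_i-\mu\rangle^2}
  \;\le\; \frac{|R|}{|T|}\sqrt{\frac{|S|}{|R|}}
  \;=\; \frac{\sqrt{|R|\,|S|}}{|T|}
  \;\le\; \frac{\sqrt{\epsilon}}{1-\epsilon}.
\]
```

The first equality uses that the deviations over all of S sum to zero, the next step is Cauchy–Schwarz, and the covariance bound controls the average squared deviation over R. Taking the supremum over unit vectors v gives ‖(1/|T|) Σ_{i∈T}(x_i − µ)‖_2 ≤ √ǫ/(1 − ǫ) = O(√ǫ), i.e. (O(√ǫ), ǫ)-resilience.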

  27–29. Implication: Learning Discrete Distributions. Suppose we observe samples from a distribution π on {1, …, m}. Samples come in r-tuples, which are either all good or all outliers. Corollary: The distribution π can be estimated (in TV distance) to error O(√(ǫ log(1/ǫ) / r)) in the presence of ǫn outliers.
  • follows from resilience in the ℓ1-norm
  • see also [Qiao & Valiant, 2018] later in this session!

  30–34. A Majority of Outliers. Can also handle the case where the clean set S has size only αn (α < 1/2):
  • cover S̃ by resilient sets
  • at least one set S′ must have high overlap with S...
  • ...and hence ‖µ′ − µ‖ ≤ 2σ as before.
  • This gives recovery in the list-decodable model [BBV08].

  35–38. Implication: Stochastic Block Models. There is a set of αn good vertices and (1 − α)n bad vertices:
  • good ↔ good: dense (avg. degree a)
  • good ↔ bad: sparse (avg. degree b)
  • bad ↔ bad: arbitrary
  Question: when can the good set be recovered (in terms of α, a, b)?

  39. Implication: Stochastic Block Models. Using resilience in a “truncated ℓ1-norm”, can show: Corollary: The set of good vertices can be approximately recovered whenever (a − b)²/a ≫ log(2/α)/α².
