

slide-1
SLIDE 1

Robustness Meets Algorithms

Ankur Moitra (MIT)

Robust Statistics Summer School

slide-2
SLIDE 2

CLASSIC PARAMETER ESTIMATION

Given samples from an unknown distribution in some class, e.g. a 1-D Gaussian, can we accurately estimate its parameters?

slide-3
SLIDE 3

CLASSIC PARAMETER ESTIMATION

Given samples from an unknown distribution in some class, e.g. a 1-D Gaussian, can we accurately estimate its parameters? Yes!

slide-4
SLIDE 4

CLASSIC PARAMETER ESTIMATION

Given samples from an unknown distribution in some class, e.g. a 1-D Gaussian N(μ, σ²), can we accurately estimate its parameters? Yes! Empirical mean: μ̂ = (1/N) Σᵢ Xᵢ. Empirical variance: σ̂² = (1/N) Σᵢ (Xᵢ − μ̂)².

slide-5
SLIDE 5

The maximum likelihood estimator is asymptotically efficient (1910-1920)

  • R. A. Fisher
slide-6
SLIDE 6

The maximum likelihood estimator is asymptotically efficient (1910-1920)

  • R. A. Fisher

What about errors in the model itself? (1960)

  • J. W. Tukey

slide-7
SLIDE 7

ROBUST PARAMETER ESTIMATION

Given corrupted samples from a 1-D Gaussian: can we accurately estimate its parameters?

observed model = ideal model + noise
slide-8
SLIDE 8

How do we constrain the noise?

slide-9
SLIDE 9

How do we constrain the noise? L1-norm of the noise at most O(ε).

slide-10
SLIDE 10

How do we constrain the noise? L1-norm of the noise at most O(ε). Equivalently: arbitrarily corrupt an O(ε)-fraction of samples (in expectation).
slide-11
SLIDE 11

How do we constrain the noise? L1-norm of the noise at most O(ε). Equivalently: arbitrarily corrupt an O(ε)-fraction of samples (in expectation). This generalizes Huber’s Contamination Model: an adversary can add an ε-fraction of samples.
slide-12
SLIDE 12

How do we constrain the noise? L1-norm of the noise at most O(ε). Equivalently: arbitrarily corrupt an O(ε)-fraction of samples (in expectation). This generalizes Huber’s Contamination Model: an adversary can add an ε-fraction of samples.

Outliers: points the adversary has corrupted. Inliers: points it hasn’t.
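As an illustration of this contamination model, here is a minimal Python sketch (my own, not from the talk; the outlier distribution and the 10% rate are arbitrary choices) that draws Huber-contaminated samples from a 1-D Gaussian:

```python
import numpy as np

def huber_contaminated_samples(n, mu=0.0, sigma=1.0, eps=0.1, rng=None):
    """Draw n samples; each is an outlier with probability eps (Huber's model).

    Inliers come from the ideal N(mu, sigma^2); outliers come from an
    adversarially chosen distribution (here a far-away cluster, purely
    for illustration).
    """
    rng = np.random.default_rng(rng)
    is_outlier = rng.random(n) < eps              # ~eps-fraction in expectation
    inliers = rng.normal(mu, sigma, size=n)
    outliers = 10.0 + 0.1 * rng.normal(size=n)    # arbitrary corruption
    return np.where(is_outlier, outliers, inliers), is_outlier

samples, mask = huber_contaminated_samples(10_000, eps=0.1, rng=0)
print("fraction corrupted:", mask.mean())
```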

slide-13
SLIDE 13

In what norm do we want the parameters to be close?

slide-14
SLIDE 14

In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx.

slide-15
SLIDE 15

In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx. From the bound on the L1-norm of the noise, we have: d_TV(observed, ideal) ≤ O(ε).

slide-16
SLIDE 16

In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx. Goal: Find a 1-D Gaussian (the estimate) that satisfies d_TV(estimate, ideal) ≤ O(ε).

slide-17
SLIDE 17

In what norm do we want the parameters to be close? Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx. Equivalently, find a 1-D Gaussian (the estimate) that satisfies d_TV(estimate, observed) ≤ O(ε).
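A short worked derivation (added here for completeness, using only the definition above): if the observed distribution is an ε-mixture of the ideal model and arbitrary noise, then it is within ε of the ideal model in total variation.

```latex
d_{\mathrm{TV}}(f,g) = \tfrac{1}{2}\int \bigl|f(x)-g(x)\bigr|\,dx .
\quad
\text{If } f_{\mathrm{obs}} = (1-\epsilon) f_{\mathrm{ideal}} + \epsilon f_{\mathrm{noise}}, \text{ then }
d_{\mathrm{TV}}(f_{\mathrm{obs}}, f_{\mathrm{ideal}})
= \tfrac{\epsilon}{2}\int \bigl|f_{\mathrm{noise}}(x) - f_{\mathrm{ideal}}(x)\bigr|\,dx
\le \epsilon .
```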

slide-18
SLIDE 18

Do the empirical mean and empirical variance work?

slide-19
SLIDE 19

Do the empirical mean and empirical variance work? No!

slide-20
SLIDE 20

Do the empirical mean and empirical variance work? No!

observed model = ideal model + noise
slide-21
SLIDE 21

Do the empirical mean and empirical variance work? No!

observed model = ideal model + noise

But the median and median absolute deviation do work

slide-22
SLIDE 22

Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and the median absolute deviation (MAD) recover estimates μ̂ and σ̂ that satisfy d_TV(N(μ̂, σ̂²), N(μ, σ²)) ≤ O(ε).

slide-23
SLIDE 23

Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and the median absolute deviation (MAD) recover estimates μ̂ and σ̂ that satisfy d_TV(N(μ̂, σ̂²), N(μ, σ²)) ≤ O(ε). Also called (properly) agnostically learning a 1-D Gaussian.
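A minimal sketch of these 1-D estimators (my own illustration, not code from the talk): the median estimates μ, and the MAD rescaled by the constant ≈ 0.6745 (the MAD of a standard Gaussian) estimates σ; corrupting 10% of the samples barely moves them, while it destroys the empirical mean and variance.

```python
import numpy as np

def robust_1d_gaussian(x):
    """Estimate (mu, sigma) of a 1-D Gaussian robustly via median and MAD."""
    mu_hat = np.median(x)
    mad = np.median(np.abs(x - mu_hat))
    sigma_hat = mad / 0.6745           # MAD of N(0,1) is about 0.6745
    return mu_hat, sigma_hat

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=1000)
x[:100] = 1e6                          # corrupt a 10% fraction of the samples
print("empirical mean/std:", x.mean(), x.std())       # destroyed by outliers
print("median/MAD estimate:", robust_1d_gaussian(x))  # stays close to (5, 2)
```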

slide-24
SLIDE 24

Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and the median absolute deviation (MAD) recover estimates μ̂ and σ̂ that satisfy d_TV(N(μ̂, σ̂²), N(μ, σ²)) ≤ O(ε). What about robust estimation in high dimensions?

slide-25
SLIDE 25

What about robust estimation in high dimensions, e.g. microarrays with 10k genes? Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and the MAD recover estimates μ̂ and σ̂ that satisfy d_TV(N(μ̂, σ̂²), N(μ, σ²)) ≤ O(ε).

slide-26
SLIDE 26

OUTLINE

Part I: Introduction
  • Robust Estimation in One Dimension
  • Robustness vs. Hardness in High Dimensions
  • Our Results

Part II: Agnostically Learning a Gaussian
  • Parameter Distance
  • Detecting When an Estimator is Compromised
  • A Win-Win Algorithm
  • Unknown Covariance

Part III: Experiments

slide-28
SLIDE 28

Main Problem: Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), give an efficient algorithm to find parameters μ̂, Σ̂ for which d_TV(N(μ̂, Σ̂), N(μ, Σ)) is small, ideally O(ε) with no dependence on the dimension d.

slide-29
SLIDE 29

Main Problem: Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), give an efficient algorithm to find parameters μ̂, Σ̂ for which d_TV(N(μ̂, Σ̂), N(μ, Σ)) is small, ideally O(ε) with no dependence on the dimension d. Special Cases: (1) Unknown mean: N(μ, I). (2) Unknown covariance: N(0, Σ).

slide-30
SLIDE 30

A COMPENDIUM OF APPROACHES

Unknown Mean:

  Estimator          Error Guarantee    Running Time
  Tukey Median       O(ε)               NP-Hard
  Geometric Median   O(ε√d)             poly(d, N)
  Tournament         O(ε)               N^O(d)
  Pruning            O(ε√d)             O(dN)

slide-40
SLIDE 40

The Price of Robustness? All known estimators are hard to compute or lose polynomial factors in the dimension

slide-41
SLIDE 41

The Price of Robustness? All known estimators are hard to compute or lose polynomial factors in the dimension. Equivalently: computationally efficient estimators can only handle an ε ≲ 1/√d fraction of errors and get non-trivial (TV < 1) guarantees.


slide-43
SLIDE 43

The Price of Robustness? All known estimators are hard to compute or lose polynomial factors in the dimension. Equivalently: computationally efficient estimators can only handle an ε ≲ 1/√d fraction of errors and get non-trivial (TV < 1) guarantees. Is robust estimation algorithmically possible in high dimensions?

slide-44
SLIDE 44

OUTLINE

Part I: Introduction
  • Robust Estimation in One Dimension
  • Robustness vs. Hardness in High Dimensions
  • Our Results

Part II: Agnostically Learning a Gaussian
  • Parameter Distance
  • Detecting When an Estimator is Compromised
  • A Win-Win Algorithm
  • Unknown Covariance

Part III: Experiments

slide-46
SLIDE 46

OUR RESULTS

Theorem [Diakonikolas, Kamath, Kane, Li, Moitra, Stewart ‘16]: There is an algorithm that, given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), finds parameters μ̂, Σ̂ satisfying d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε). Robust estimation in high dimensions is algorithmically possible! Moreover, the algorithm runs in time poly(N, d).

slide-47
SLIDE 47

OUR RESULTS

Theorem [Diakonikolas, Kamath, Kane, Li, Moitra, Stewart ‘16]: There is an algorithm that, given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), finds parameters μ̂, Σ̂ satisfying d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε). Robust estimation in high dimensions is algorithmically possible! Moreover, the algorithm runs in time poly(N, d). Extensions: Can weaken the assumptions to sub-Gaussian or bounded second moments (with weaker guarantees) for the mean.

slide-48
SLIDE 48

Simultaneously [Lai, Rao, Vempala ‘16] gave agnostic algorithms that achieve:

slide-49
SLIDE 49

Simultaneously [Lai, Rao, Vempala ‘16] gave agnostic algorithms that achieve: When the covariance is bounded, this translates to:

slide-50
SLIDE 50

Simultaneously [Lai, Rao, Vempala ‘16] gave agnostic algorithms that achieve: When the covariance is bounded, this translates to: Subsequently many works handling more errors via list decoding,

slide-51
SLIDE 51

Simultaneously [Lai, Rao, Vempala ‘16] gave agnostic algorithms that achieve: When the covariance is bounded, this translates to: Subsequently many works handling more errors via list decoding, giving lower bounds against statistical query algorithms,

slide-52
SLIDE 52

Simultaneously [Lai, Rao, Vempala ‘16] gave agnostic algorithms that achieve: When the covariance is bounded, this translates to: Subsequently many works handling more errors via list decoding, giving lower bounds against statistical query algorithms, weakening the distributional assumptions,

slide-53
SLIDE 53

Simultaneously [Lai, Rao, Vempala ‘16] gave agnostic algorithms that achieve: When the covariance is bounded, this translates to: Subsequently many works handling more errors via list decoding, giving lower bounds against statistical query algorithms, weakening the distributional assumptions, exploiting sparsity,

slide-54
SLIDE 54

Simultaneously, [Lai, Rao, Vempala ‘16] gave agnostic algorithms that achieve:

When the covariance is bounded, this translates to:

Subsequently, many works have:
  • handled more errors via list decoding
  • given lower bounds against statistical query algorithms
  • weakened the distributional assumptions
  • exploited sparsity
  • worked with more complex generative models

slide-55
SLIDE 55

A GENERAL RECIPE

Robust estimation in high-dimensions:
  • Step #1: Find an appropriate parameter distance
  • Step #2: Detect when the naïve estimator has been compromised
  • Step #3: Find good parameters, or make progress

Filtering: Fast and practical. Convex Programming: Better sample complexity.

Let’s see how this works for unknown mean…

slide-57
SLIDE 57

OUTLINE

Part I: Introduction
  • Robust Estimation in One Dimension
  • Robustness vs. Hardness in High Dimensions
  • Our Results

Part II: Agnostically Learning a Gaussian
  • Parameter Distance
  • Detecting When an Estimator is Compromised
  • A Win-Win Algorithm
  • Unknown Covariance

Part III: Experiments

slide-59
SLIDE 59

PARAMETER DISTANCE

Step #1: Find an appropriate parameter distance for Gaussians

slide-60
SLIDE 60

PARAMETER DISTANCE

Step #1: Find an appropriate parameter distance for Gaussians

A Basic Fact:  d_TV( N(μ1, I), N(μ2, I) ) ≤ O( ‖μ1 − μ2‖2 )     (1)

This can be proven using Pinsker’s inequality and the well-known formula for the KL divergence between Gaussians.

Corollary: If our estimate μ̂ (in the unknown-mean case) satisfies ‖μ̂ − μ‖2 ≤ O(ε), then d_TV( N(μ̂, I), N(μ, I) ) ≤ O(ε).

Our new goal is to be close in Euclidean distance.
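A one-line derivation of fact (1) (added here; constants are not tight), combining the closed-form KL divergence between Gaussians with Pinsker’s inequality:

```latex
\mathrm{KL}\bigl(\mathcal{N}(\mu_1, I)\,\|\,\mathcal{N}(\mu_2, I)\bigr)
  = \tfrac{1}{2}\,\|\mu_1-\mu_2\|_2^2 ,
\qquad
d_{\mathrm{TV}} \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}}
  = \tfrac{1}{2}\,\|\mu_1-\mu_2\|_2 .
```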

slide-65
SLIDE 65

OUTLINE

Part I: Introduction
  • Robust Estimation in One Dimension
  • Robustness vs. Hardness in High Dimensions
  • Our Results

Part II: Agnostically Learning a Gaussian
  • Parameter Distance
  • Detecting When an Estimator is Compromised
  • A Win-Win Algorithm
  • Unknown Covariance

Part III: Experiments

slide-67
SLIDE 67

DETECTING CORRUPTIONS

Step #2: Detect when the naïve estimator has been compromised

slide-68
SLIDE 68

DETECTING CORRUPTIONS

Step #2: Detect when the naïve estimator has been compromised. [Scatter plot: uncorrupted vs. corrupted points]

slide-69
SLIDE 69

DETECTING CORRUPTIONS

Step #2: Detect when the naïve estimator has been compromised. [Scatter plot: uncorrupted vs. corrupted points] There is a direction of large (> 1) variance.

slide-70
SLIDE 70

Key Lemma: If X1, X2, …, XN come from a distribution that is ε-close to N(μ, I), then for N sufficiently large, (1) and (2) hold with probability at least 1 − δ.

slide-71
SLIDE 71

Key Lemma: If X1, X2, …, XN come from a distribution that is ε-close to N(μ, I), then for N sufficiently large, (1) and (2) hold with probability at least 1 − δ. Take-away: An adversary needs to mess up the second moment in order to corrupt the first moment.
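An illustrative simulation (my own sketch, with arbitrary parameters) of this take-away: shifting an ε-fraction of the points far enough to move the empirical mean necessarily creates a large eigenvalue in the empirical covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, eps = 50, 20_000, 0.1

X = rng.normal(size=(n, d))                  # inliers ~ N(0, I)
k = int(eps * n)
shift = np.zeros(d); shift[0] = 8.0
X[:k] = rng.normal(size=(k, d)) + shift      # an eps-fraction shifted to move the mean

mu_hat = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
top = np.linalg.eigvalsh(cov)[-1]            # eigenvalues in ascending order

print("||mu_hat - mu||:", np.linalg.norm(mu_hat))      # roughly eps * 8 = 0.8, far from 0
print("top eigenvalue of empirical covariance:", top)  # noticeably larger than 1
```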

slide-72
SLIDE 72

OUTLINE

Part I: Introduction
  • Robust Estimation in One Dimension
  • Robustness vs. Hardness in High Dimensions
  • Our Results

Part II: Agnostically Learning a Gaussian
  • Parameter Distance
  • Detecting When an Estimator is Compromised
  • A Win-Win Algorithm
  • Unknown Covariance

Part III: Experiments

slide-74
SLIDE 74

A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers

slide-75
SLIDE 75

A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that:

slide-76
SLIDE 76

A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers. Filtering Approach: Suppose the naïve estimate has been compromised (as detected above). Then we can throw out more corrupted than uncorrupted points by projecting onto v, the direction of largest variance.

slide-77
SLIDE 77

A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers. Filtering Approach: Suppose the naïve estimate has been compromised (as detected above). Then we can throw out more corrupted than uncorrupted points by projecting onto v, the direction of largest variance, and the threshold T has an explicit formula.

slide-78
SLIDE 78

A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers. Filtering Approach: Suppose the naïve estimate has been compromised (as detected above). Then we can throw out more corrupted than uncorrupted points by projecting onto v, the direction of largest variance, and the threshold T has an explicit formula.

slide-79
SLIDE 79

A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points

slide-80
SLIDE 80

A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points If we continue too long, we’d have no corrupted points left!

slide-81
SLIDE 81

A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points If we continue too long, we’d have no corrupted points left! Eventually we find (certifiably) good parameters

slide-82
SLIDE 82

A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points If we continue too long, we’d have no corrupted points left! Eventually we find (certifiably) good parameters Running Time: Sample Complexity:

slide-83
SLIDE 83

A WIN-WIN ALGORITHM

Step #3: Either find good parameters, or remove many outliers Filtering Approach: Suppose that: We can throw out more corrupted than uncorrupted points If we continue too long, we’d have no corrupted points left! Eventually we find (certifiably) good parameters Running Time: Sample Complexity: Concentration of LTFs
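A simplified rendering of the filtering approach for the unknown-mean case (my own sketch; the paper's actual eigenvalue test and threshold T are more careful than the crude heuristics used here):

```python
import numpy as np

def filter_mean(X, eps, max_iter=50):
    """Crude filtering sketch for robust mean estimation (unknown mean, identity covariance).

    If the empirical covariance has no unusually large eigenvalue, the empirical
    mean is certifiably good; otherwise remove the most extreme points along the
    top eigenvector and repeat. Thresholds here are heuristic placeholders.
    """
    X = np.array(X, dtype=float)
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        vals, vecs = np.linalg.eigh(cov)
        if vals[-1] <= 1 + 10 * eps:           # heuristic stand-in for the real test
            return mu                          # win: certifiably good parameters
        v = vecs[:, -1]                        # direction of largest variance
        scores = np.abs((X - mu) @ v)
        cutoff = np.quantile(scores, 1 - eps)  # heuristic stand-in for the threshold T
        X = X[scores <= cutoff]                # win: removes mostly corrupted points
    return X.mean(axis=0)

# example usage: mu_hat = filter_mean(X, eps=0.1) on data contaminated as in the earlier sketch
```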

slide-84
SLIDE 84

OUTLINE

Part I: Introduction
  • Robust Estimation in One Dimension
  • Robustness vs. Hardness in High Dimensions
  • Our Results

Part II: Agnostically Learning a Gaussian
  • Parameter Distance
  • Detecting When an Estimator is Compromised
  • A Win-Win Algorithm
  • Unknown Covariance

Part III: Experiments

slide-86
SLIDE 86

A GENERAL RECIPE

Robust estimation in high-dimensions:
  • Step #1: Find an appropriate parameter distance
  • Step #2: Detect when the naïve estimator has been compromised
  • Step #3: Find good parameters, or make progress

Filtering: Fast and practical. Convex Programming: Better sample complexity.

How about for unknown covariance?

slide-88
SLIDE 88

PARAMETER DISTANCE

Step #1: Find an appropriate parameter distance for Gaussians

slide-89
SLIDE 89

PARAMETER DISTANCE

Step #1: Find an appropriate parameter distance for Gaussians

Another Basic Fact:  d_TV( N(0, Σ1), N(0, Σ2) ) ≤ O( ‖Σ1^(-1/2) Σ2 Σ1^(-1/2) − I‖_F )     (2)

Again, this can be proven using Pinsker’s inequality.

Our new goal is to find an estimate that is close to the true covariance in this relative Frobenius distance. The distance seems strange, but it’s the right one to use to bound total variation.
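A sketch of why this relative Frobenius distance controls total variation (added here; valid when Δ = Σ1^(-1/2) Σ2 Σ1^(-1/2) − I is small, constants not tight):

```latex
\mathrm{KL}\bigl(\mathcal{N}(0,\Sigma_1)\,\|\,\mathcal{N}(0,\Sigma_2)\bigr)
 = \tfrac{1}{2}\Bigl(\operatorname{tr}(\Sigma_2^{-1}\Sigma_1) - d - \log\det(\Sigma_2^{-1}\Sigma_1)\Bigr)
 = O\!\left(\|\Delta\|_F^2\right),
\qquad
d_{\mathrm{TV}} \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}} = O\!\left(\|\Delta\|_F\right).
```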

slide-93
SLIDE 93

UNKNOWN COVARIANCE

What if we are given samples from N(0, Σ)?

slide-94
SLIDE 94

UNKNOWN COVARIANCE

What if we are given samples from N(0, Σ)? How do we detect if the naïve estimator is compromised?

slide-95
SLIDE 95

UNKNOWN COVARIANCE

What if we are given samples from ? How do we detect if the naïve estimator is compromised? Key Fact: Let and Then restricted to flattenings of d x d symmetric matrices

slide-96
SLIDE 96

UNKNOWN COVARIANCE

What if we are given samples from ? How do we detect if the naïve estimator is compromised? Key Fact: Let and Then restricted to flattenings of d x d symmetric matrices Proof uses Isserlis’s Theorem

slide-97
SLIDE 97

UNKNOWN COVARIANCE

need to project out What if we are given samples from ? How do we detect if the naïve estimator is compromised? Key Fact: Let and Then restricted to flattenings of d x d symmetric matrices

slide-98
SLIDE 98

Key Idea: Transform the data, look for restricted large eigenvalues

slide-99
SLIDE 99

Key Idea: Transform the data, look for restricted large eigenvalues

slide-100
SLIDE 100

Key Idea: Transform the data, look for restricted large eigenvalues. If Σ̂ were the true covariance, we would have Σ̂^(-1/2) Xi ~ N(0, I) for the inliers.

slide-101
SLIDE 101

Key Idea: Transform the data, look for restricted large eigenvalues. If Σ̂ were the true covariance, we would have Σ̂^(-1/2) Xi ~ N(0, I) for the inliers, in which case the relevant fourth-moment matrix would have small restricted eigenvalues.

slide-102
SLIDE 102

Key Idea: Transform the data, look for restricted large eigenvalues. If Σ̂ were the true covariance, we would have Σ̂^(-1/2) Xi ~ N(0, I) for the inliers, in which case the relevant fourth-moment matrix would have small restricted eigenvalues. Take-away: An adversary needs to mess up the (restricted) fourth moment in order to corrupt the second moment.
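A rough, heavily simplified sketch (my own; thresholds are omitted and the restriction to symmetric flattenings is handled only implicitly) of the transform-and-look idea: whiten with the empirical covariance, flatten Y Yᵀ − I for each whitened point Y, and check whether the covariance of these flattenings has an unusually large eigenvalue.

```python
import numpy as np

def restricted_fourth_moment_score(X):
    """Heuristic corruption score for the unknown-covariance case.

    Whiten by the empirical covariance, flatten Y Y^T - I for each whitened
    point Y, and return the top eigenvalue of the empirical covariance of
    these flattenings (supported on symmetric flattenings by construction).
    For clean Gaussian data this statistic stays bounded (around 2 for this
    construction); a corrupted second moment pushes it up.
    """
    n, d = X.shape
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)                 # whitening via eigen-decomposition
    whiten = vecs @ np.diag(vals ** -0.5) @ vecs.T
    Y = (X - X.mean(axis=0)) @ whiten
    Z = np.stack([np.outer(y, y).ravel() - np.eye(d).ravel() for y in Y])
    return np.linalg.eigvalsh(np.cov(Z, rowvar=False))[-1]

rng = np.random.default_rng(0)
clean = rng.normal(size=(5000, 10))
corrupted = clean.copy()
corrupted[:500, 0] *= 6.0                            # 10% of points get inflated variance
print("clean score:    ", restricted_fourth_moment_score(clean))
print("corrupted score:", restricted_fourth_moment_score(corrupted))
```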

slide-103
SLIDE 103

ASSEMBLING THE ALGORITHM

Given samples that are ε-close in total variation distance to a d-dimensional Gaussian

slide-104
SLIDE 104

ASSEMBLING THE ALGORITHM

Given samples that are ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick

slide-105
SLIDE 105

ASSEMBLING THE ALGORITHM

Given samples that are ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick Now use algorithm for unknown covariance

slide-106
SLIDE 106

ASSEMBLING THE ALGORITHM

Given samples that are ε-close in total variation distance to a d-dimensional Gaussian Step #1: Doubling trick Now use algorithm for unknown covariance Step #2: (Agnostic) isotropic position

slide-107
SLIDE 107

ASSEMBLING THE ALGORITHM

Given samples that are ε-close in total variation distance to a d-dimensional Gaussian. Step #1: Doubling trick. Now use the algorithm for unknown covariance. Step #2: (Agnostic) isotropic position (the right distance, now in the general case).

slide-108
SLIDE 108

ASSEMBLING THE ALGORITHM

Given samples that are ε-close in total variation distance to a d-dimensional Gaussian:

Step #1: Doubling trick. Now use the algorithm for unknown covariance.
Step #2: (Agnostic) isotropic position (the right distance, now in the general case). Now use the algorithm for unknown mean.
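A schematic assembly of the two subroutines (my own sketch; robust_covariance and robust_mean are hypothetical stand-ins for the unknown-covariance and unknown-mean algorithms above):

```python
import numpy as np

def agnostically_learn_gaussian(X, eps, robust_covariance, robust_mean):
    """Schematic assembly for the general case N(mu, Sigma).

    Step #1 (doubling trick): differences of paired samples, divided by sqrt(2),
    are distributed as N(0, Sigma), so the unknown-covariance routine can be run
    without knowing mu.
    Step #2 (agnostic isotropic position): whiten with the covariance estimate,
    then run the unknown-mean routine on the (approximately isotropic) data.
    """
    n = (len(X) // 2) * 2
    diffs = (X[0:n:2] - X[1:n:2]) / np.sqrt(2.0)   # ~ N(0, Sigma) if X_i ~ N(mu, Sigma)
    sigma_hat = robust_covariance(diffs, eps)

    vals, vecs = np.linalg.eigh(sigma_hat)          # whitening transform
    whiten = vecs @ np.diag(vals ** -0.5) @ vecs.T
    mu_hat_iso = robust_mean(X @ whiten, eps)

    unwhiten = vecs @ np.diag(vals ** 0.5) @ vecs.T # back to original coordinates
    return unwhiten @ mu_hat_iso, sigma_hat
```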

slide-109
SLIDE 109

OUTLINE

Part I: Introduction
  • Robust Estimation in One Dimension
  • Robustness vs. Hardness in High Dimensions
  • Our Results

Part II: Agnostically Learning a Gaussian
  • Parameter Distance
  • Detecting When an Estimator is Compromised
  • A Win-Win Algorithm
  • Unknown Covariance

Part III: Experiments

slide-111
SLIDE 111

SYNTHETIC EXPERIMENTS

Error rates on synthetic data (unknown mean), with 10% noise added.

slide-112
SLIDE 112

SYNTHETIC EXPERIMENTS

Error rates on synthetic data (unknown mean):

[Plots: excess ℓ2 error vs. dimension (100 to 400). Methods: Filtering, LRVMean, Sample mean w/ noise, Pruning, RANSAC, Geometric Median.]

slide-113
SLIDE 113

SYNTHETIC EXPERIMENTS

Error rates on synthetic data (unknown covariance, isotropic): true covariance close to the identity, with 10% noise added.

slide-114
SLIDE 114

SYNTHETIC EXPERIMENTS

[Plots: excess ℓ2 error vs. dimension (20 to 100). Methods: Filtering, LRVCov, Sample covariance w/ noise, Pruning, RANSAC.]

Error rates on synthetic data (unknown covariance, isotropic):

slide-115
SLIDE 115

SYNTHETIC EXPERIMENTS

Error rates on synthetic data (unknown covariance, anisotropic): true covariance far from the identity, with 10% noise added.

slide-116
SLIDE 116

SYNTHETIC EXPERIMENTS

[Plots: excess ℓ2 error vs. dimension (20 to 100). Methods: Filtering, LRVCov, Sample covariance w/ noise, Pruning, RANSAC.]

Error rates on synthetic data (unknown covariance, anisotropic):

slide-117
SLIDE 117

REAL DATA EXPERIMENTS

Famous study of [Novembre et al. ‘08]: Take top two singular vectors of people x SNP matrix (POPRES)

slide-118
SLIDE 118

REAL DATA EXPERIMENTS

Famous study of [Novembre et al. ‘08]: Take top two singular vectors of people x SNP matrix (POPRES)

[Scatter plot of the top two singular vectors: Original Data]


slide-120
SLIDE 120

REAL DATA EXPERIMENTS

Famous study of [Novembre et al. ‘08]: Take top two singular vectors of people x SNP matrix (POPRES)

[Scatter plot of the top two singular vectors: Original Data]

“Genes Mirror Geography in Europe”

slide-121
SLIDE 121

REAL DATA EXPERIMENTS

Can we find such patterns in the presence of noise?

slide-122
SLIDE 122

REAL DATA EXPERIMENTS

Can we find such patterns in the presence of noise?

[Scatter plot, 10% noise: Pruning Projection (what PCA finds)]


slide-124
SLIDE 124
REAL DATA EXPERIMENTS

Can we find such patterns in the presence of noise?

[Scatter plot, 10% noise: RANSAC Projection (what RANSAC finds)]

slide-125
SLIDE 125
REAL DATA EXPERIMENTS

Can we find such patterns in the presence of noise?

[Scatter plot, 10% noise: XCS Projection (what robust PCA via SDPs finds)]

slide-126
SLIDE 126
REAL DATA EXPERIMENTS

Can we find such patterns in the presence of noise?

[Scatter plot, 10% noise: Filter Projection (what our methods find)]

slide-127
SLIDE 127
REAL DATA EXPERIMENTS

The power of provably robust estimation:

[Side-by-side scatter plots: Filter Projection on data with 10% noise (what our methods find) next to the Original Data with no noise]

slide-128
SLIDE 128

LOOKING FORWARD

Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high-dimensions?

slide-129
SLIDE 129

LOOKING FORWARD

Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high-dimensions? Isn’t this what we would have been doing with robust statistical estimators, if we had them all along?

slide-130
SLIDE 130

Thanks! Any Questions?

Summary:
  • Nearly optimal algorithm for agnostically learning a high-dimensional Gaussian
  • General recipe using restricted eigenvalue problems
  • Further applications to other mixture models
  • Is practical, robust statistics within reach?