 
Introduction · Sparsity oracle inequalities (SOI) · BIC and LASSO · Dantzig selector and LASSO for linear regression · Sparse exponential weighting (SEW)

Sparse Exponential Weighting as an alternative to LASSO and Dantzig selector
Alexandre Tsybakov
Laboratoire de Statistique, CREST and Laboratoire de Probabilités et Modèles Aléatoires, Université Paris 6
Vienna, July 24, 2008
Alexandre Tsybakov, Sparse Exponential Weighting

Nonparametric regression model
Assume that we observe the pairs $(X_1, Y_1), \dots, (X_n, Y_n) \in \mathbb{R}^d \times \mathbb{R}$, where
$$Y_i = f(X_i) + \xi_i, \qquad i = 1, \dots, n.$$
The regression function $f : \mathbb{R}^d \to \mathbb{R}$ is unknown.
The errors $\xi_i$ are independent Gaussian $\mathcal{N}(0, \sigma^2)$ random variables.
The $X_i \in \mathbb{R}^d$ are arbitrary fixed (non-random) points.
We want to estimate $f$ based on the data $(X_1, Y_1), \dots, (X_n, Y_n)$.
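A minimal simulation of this observation model can help fix notation. It is a sketch under assumptions that are ours, not the talk's: $d = 1$, the equispaced fixed design $X_i = (i - 1/2)/n$, and a hypothetical regression function `f_true`.

```python
import math
import random


def f_true(x: float) -> float:
    """Hypothetical regression function f: R -> R (illustration only)."""
    return math.sin(2 * math.pi * x) + 0.5 * x


def simulate(n: int, sigma: float, seed: int = 0):
    """Draw Y_i = f(X_i) + xi_i with a fixed design and i.i.d. xi_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # Fixed (non-random) design points X_i = (i - 1/2)/n, i = 1, ..., n.
    xs = [(i + 0.5) / n for i in range(n)]
    # Independent Gaussian errors with mean 0 and standard deviation sigma.
    ys = [f_true(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


xs, ys = simulate(n=100, sigma=0.3)
print(len(xs), len(ys))
```

Only the errors are random here; rerunning with another seed changes the $Y_i$ but not the $X_i$.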

Dictionary, linear approximation
Let $f_1, \dots, f_M$ be a finite dictionary of functions $f_j : \mathbb{R}^d \to \mathbb{R}$. We approximate the regression function $f$ by the linear combination
$$f_\lambda(x) = \sum_{j=1}^{M} \lambda_j f_j(x)$$
with weights $\lambda = (\lambda_1, \dots, \lambda_M)$. We believe that
$$f(x) \approx \sum_{j=1}^{M} \lambda_j f_j(x)$$
for some $\lambda = (\lambda_1, \dots, \lambda_M)$.
Possibly $M \gg n$.
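The linear combination $f_\lambda$ can be sketched as a higher-order function that takes the weights and the dictionary and returns $x \mapsto \sum_j \lambda_j f_j(x)$. The helper name `f_lambda` and the three-function dictionary (with $d = 1$) are hypothetical, chosen only for illustration.

```python
import math
from typing import Callable, Sequence

Func = Callable[[float], float]


def f_lambda(weights: Sequence[float], dictionary: Sequence[Func]) -> Func:
    """Return the function x -> sum_j lambda_j * f_j(x) for a finite dictionary f_1, ..., f_M."""
    if len(weights) != len(dictionary):
        raise ValueError("need exactly one weight per dictionary function")

    def combination(x: float) -> float:
        return sum(lam * fj(x) for lam, fj in zip(weights, dictionary))

    return combination


# Hypothetical dictionary of M = 3 functions on R.
dictionary = [lambda x: 1.0, lambda x: x, math.sin]
g = f_lambda([2.0, -1.0, 0.0], dictionary)
print(g(3.0))  # 2 * 1 - 1 * 3 + 0 * sin(3) = -1.0
```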

Scenarios
(LinReg) Exact equality: there exists $\lambda^* \in \mathbb{R}^M$ such that $f = f_{\lambda^*} = \sum_{j=1}^{M} \lambda^*_j f_j$ (linear regression, with possibly $M \gg n$ parameters);
(NPReg) $f_1, \dots, f_M$ are the first $M$ functions of a basis (usually orthonormal), $M \le n$, and there exists $\lambda^*$ such that $f - f_{\lambda^*}$ is small: nonparametric estimation of regression;
(Agg) aggregation of arbitrary estimators: in this case $f_1, \dots, f_M$ are preliminary estimators of $f$ based on a training sample independent of the observations $(X_1, Y_1), \dots, (X_n, Y_n)$;
(Weak) learning: $f_1, \dots, f_M$ are "weak learners", i.e., some rough approximations to $f$; $M$ is extremely large.
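For scenario (NPReg), one possible illustration (our choice, not the talk's) takes $d = 1$, the cosine basis $\varphi_1 \equiv 1$, $\varphi_j(x) = \sqrt{2}\cos(\pi (j-1) x)$, and the design $X_i = (i - 1/2)/n$. On this particular design the first $M \le n$ cosine functions are exactly orthonormal in the empirical inner product $\langle g, h\rangle_n = \frac{1}{n}\sum_{i=1}^n g(X_i)h(X_i)$; the sketch checks this numerically. The function names are hypothetical.

```python
import math


def cosine_dictionary(M: int):
    """First M cosine basis functions on [0, 1]: phi_1 = 1, phi_j(x) = sqrt(2) cos(pi (j - 1) x)."""

    def phi(k: int):
        # Each call creates its own closure, so every function keeps its own frequency k.
        if k == 0:
            return lambda x: 1.0
        return lambda x: math.sqrt(2.0) * math.cos(math.pi * k * x)

    return [phi(k) for k in range(M)]


def empirical_gram(dictionary, xs):
    """Matrix of empirical inner products (1/n) * sum_i f_j(X_i) f_k(X_i)."""
    n = len(xs)
    values = [[fj(x) for x in xs] for fj in dictionary]
    return [[sum(a * b for a, b in zip(u, v)) / n for v in values] for u in values]


n, M = 20, 8  # scenario (NPReg) requires M <= n
xs = [(i + 0.5) / n for i in range(n)]
G = empirical_gram(cosine_dictionary(M), xs)
max_dev = max(abs(G[j][k] - (1.0 if j == k else 0.0)) for j in range(M) for k in range(M))
print(f"largest deviation from the identity matrix: {max_dev:.1e}")
```

With an orthonormal dictionary the coefficients of $f_\lambda$ and the empirical norm decouple coordinate by coordinate, which is what makes this scenario convenient for analysis.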

Sparsity of a vector
The number of non-zero coordinates of $\lambda$:
$$M(\lambda) = \sum_{j=1}^{M} I\{\lambda_j \neq 0\}.$$
The value $M(\lambda)$ characterizes the sparsity of the vector $\lambda \in \mathbb{R}^M$: the smaller $M(\lambda)$, the "sparser" $\lambda$.
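A direct transcription of $M(\lambda)$; the function name `sparsity` is ours. It compares with zero exactly, as in the definition; for floating-point estimates one might prefer a small threshold, which would be an extra assumption.

```python
from typing import Sequence


def sparsity(weights: Sequence[float]) -> int:
    """M(lambda): the number of coordinates with lambda_j != 0."""
    return sum(1 for lam in weights if lam != 0)


print(sparsity([0.0, 1.5, 0.0, 0.0, -2.0]))  # 2
```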

Sparsity of the model
Intuitive formulation of the sparsity assumption:
$$f(x) \approx \sum_{j=1}^{M} \lambda_j f_j(x)$$
("$f$ is well approximated by $f_\lambda$"), where the vector $\lambda = (\lambda_1, \dots, \lambda_M)$ is sparse: $M(\lambda) \ll M$.

Strong sparsity
Strong sparsity: $f$ admits an exact sparse representation $f = f_{\lambda^*}$ for some $\lambda^* \in \mathbb{R}^M$ with $M(\lambda^*) \ll M$ ⇒ Scenario (LinReg).

Sparsity and dimension reduction
Let $\hat\lambda^{OLS}$ be the ordinary least squares (OLS) estimator. Elementary result:
$$\mathbb{E}\|f_{\hat\lambda^{OLS}} - f\|_n^2 \le \|f - f_\lambda\|_n^2 + \frac{\sigma^2 M}{n}$$
for any $\lambda \in \mathbb{R}^M$, where $\|\cdot\|_n$ is the empirical norm:
$$\|f\|_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n} f^2(X_i)}.$$
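The elementary bound can be checked numerically. The sketch below rests on assumptions of ours: $d = 1$, the design $X_i = (i - 1/2)/n$, a hypothetical polynomial dictionary $f_j(x) = x^{j-1}$, $j = 1, \dots, 5$, and a hypothetical $f(x) = e^x$. It computes OLS by solving the normal equations and estimates $\mathbb{E}\|f_{\hat\lambda^{OLS}} - f\|_n^2$ by Monte Carlo. When the $M$ vectors $(f_j(X_1), \dots, f_j(X_n))$ are linearly independent, OLS is the orthogonal projection of $Y$ onto their span, so the risk equals $\min_\lambda \|f - f_\lambda\|_n^2 + \sigma^2 M/n$ exactly, and the simulated value should be close to it.

```python
import math
import random


def empirical_norm(values):
    """||g||_n = sqrt((1/n) * sum_i g(X_i)^2), computed from the vector (g(X_1), ..., g(X_n))."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    m = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (aug[r][m] - sum(aug[r][c] * z[c] for c in range(r + 1, m))) / aug[r][r]
    return z


def ols_fitted(F, ys):
    """OLS fitted values for the n x M design F[i][j] = f_j(X_i), via the normal equations."""
    M = len(F[0])
    gram = [[sum(row[j] * row[k] for row in F) for k in range(M)] for j in range(M)]
    rhs = [sum(row[j] * y for row, y in zip(F, ys)) for j in range(M)]
    lam_hat = solve(gram, rhs)
    return [sum(lam * v for lam, v in zip(lam_hat, row)) for row in F]


n, M, sigma, reps = 50, 5, 1.0, 1000
xs = [(i + 0.5) / n for i in range(n)]
F = [[x ** j for j in range(M)] for x in xs]  # hypothetical polynomial dictionary
f = [math.exp(x) for x in xs]  # hypothetical regression function

# min over lambda of ||f - f_lambda||_n^2, attained by the OLS fit to the noiseless f.
bias2 = empirical_norm([a - b for a, b in zip(ols_fitted(F, f), f)]) ** 2
theory = bias2 + sigma ** 2 * M / n

rng = random.Random(1)
risk = 0.0
for _ in range(reps):
    ys = [v + rng.gauss(0.0, sigma) for v in f]
    risk += empirical_norm([a - b for a, b in zip(ols_fitted(F, ys), f)]) ** 2 / reps

print(f"Monte Carlo risk {risk:.4f} vs bias^2 + sigma^2 M / n = {theory:.4f}")
```

Solving the normal equations is the simplest route for a small $M$; with many or nearly collinear dictionary functions a QR-based least squares solver would be numerically safer.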

Sparsity and dimension reduction
For any $\lambda \in \mathbb{R}^M$, the "oracular" OLS that acts only on the relevant $M(\lambda)$ coordinates satisfies
$$\mathbb{E}\|f^{oracle}_{\hat\lambda^{OLS}} - f\|_n^2 \le \|f - f_\lambda\|_n^2 + \frac{\sigma^2 M(\lambda)}{n}.$$
This is only an OLS oracle, not an estimator: the set of relevant coordinates would have to be known in advance.
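To see what knowing the relevant coordinates buys, here is a Monte Carlo sketch comparing OLS on all $M$ coordinates with the OLS oracle restricted to the support of $\lambda^*$. The assumptions are ours: a cosine dictionary on the design $X_i = (i - 1/2)/n$, on which the columns are exactly orthonormal in the empirical inner product (so each OLS coefficient reduces to $\langle Y, f_j\rangle_n$), and a hypothetical sparse truth $f = f_{\lambda^*}$ with $M(\lambda^*) = 3$.

```python
import math
import random


def phi(k: int, x: float) -> float:
    """Hypothetical cosine dictionary on [0, 1]: phi_0 = 1, phi_k(x) = sqrt(2) cos(pi k x)."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(math.pi * k * x)


n, M, sigma = 64, 32, 1.0
xs = [(i + 0.5) / n for i in range(n)]
basis = [[phi(k, x) for x in xs] for k in range(M)]  # basis[k][i] = phi_k(X_i)

# Hypothetical sparse truth f = f_{lambda*} with M(lambda*) = 3.
lam_star = [0.0] * M
lam_star[1], lam_star[4], lam_star[9] = 2.0, -1.0, 0.5
support = [k for k in range(M) if lam_star[k] != 0]
f = [sum(lam_star[k] * basis[k][i] for k in range(M)) for i in range(n)]


def ols_loss(coords, ys):
    """||f_hat - f||_n^2 for OLS restricted to `coords`.

    The columns are orthonormal in <.,.>_n on this design, so the OLS
    coefficient of phi_k is simply <Y, phi_k>_n.
    """
    fit = [0.0] * n
    for k in coords:
        coef = sum(y * p for y, p in zip(ys, basis[k])) / n
        for i in range(n):
            fit[i] += coef * basis[k][i]
    return sum((a - b) ** 2 for a, b in zip(fit, f)) / n


rng = random.Random(2)
reps = 400
full_risk = oracle_risk = 0.0
for _ in range(reps):
    ys = [v + rng.gauss(0.0, sigma) for v in f]
    full_risk += ols_loss(range(M), ys) / reps  # OLS on all M coordinates
    oracle_risk += ols_loss(support, ys) / reps  # OLS oracle on the true support

print(f"full OLS:   {full_risk:.3f} vs sigma^2 M / n         = {sigma ** 2 * M / n:.3f}")
print(f"oracle OLS: {oracle_risk:.3f} vs sigma^2 M(lambda) / n = {sigma ** 2 * len(support) / n:.3f}")
```

The oracle takes `support` as an input, which is exactly the information a genuine estimator does not have.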

Sparsity oracle inequalities
Do there exist estimators with similar behavior? Choose some other data-driven weights $\hat\lambda = (\hat\lambda_1, \dots, \hat\lambda_M)$ and estimate $f$ by
$$\hat f(x) = f_{\hat\lambda}(x) = \sum_{j=1}^{M} \hat\lambda_j f_j(x).$$
Can we find $\hat\lambda$ such that
$$\mathbb{E}\|f_{\hat\lambda} - f\|_n^2 \lesssim \|f - f_\lambda\|_n^2 + \frac{\sigma^2 M(\lambda)}{n}, \qquad \forall \lambda\ ?$$

Sparsity oracle inequalities (SOI)
Realizable task: look for an estimator $f_{\hat\lambda}$ satisfying a sparsity oracle inequality (SOI)
$$\mathbb{E}\|f_{\hat\lambda} - f\|_n^2 \le \inf_{\lambda \in \mathbb{R}^M} \left\{ C\|f - f_\lambda\|_n^2 + C' \frac{M(\lambda)\log M}{n} \right\}$$
with some constants $C \ge 1$, $C' > 0$ and an inevitable extra $\log M$ factor in the variance term. $C = 1$ ⇒ sharp SOI.
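For intuition about the trade-off inside the infimum, the right-hand side can be evaluated in closed form in a special case. Assume (our assumption, not the talk's) an orthonormal dictionary and $f = \sum_j \theta_j f_j$, so that $\|f - f_\lambda\|_n^2 = \sum_j (\theta_j - \lambda_j)^2$. For a fixed support size $k$ the best $\lambda$ keeps the $k$ largest $|\theta_j|$ unchanged, and the infimum becomes a minimum over $k = 0, \dots, M$. The constants $C = C' = 1$ (with $C'$ absorbing $\sigma^2$), the function name `soi_rhs` and the coefficient vector are hypothetical.

```python
import math
from typing import Sequence, Tuple


def soi_rhs(theta: Sequence[float], n: int, C: float = 1.0, C_prime: float = 1.0) -> Tuple[float, int]:
    """inf over lambda of C * ||f - f_lambda||_n^2 + C' * M(lambda) * log(M) / n.

    Assumes an orthonormal dictionary with f = sum_j theta_j f_j. Returns the
    minimal value and the minimizing support size k.
    """
    M = len(theta)
    squares = sorted((t * t for t in theta), reverse=True)
    tail = sum(squares)  # bias term for lambda = 0 (k = 0)
    best = (C * tail, 0)
    for k in range(1, M + 1):
        tail -= squares[k - 1]  # the k-th largest coefficient is now kept
        value = C * tail + C_prime * k * math.log(M) / n
        best = min(best, (value, k))
    return best


theta = [3.0, -2.0, 0.1, 0.05] + [0.0] * 996  # hypothetical coefficients, M = 1000
value, k = soi_rhs(theta, n=100)
print(k, round(value, 4))
```

In this example the two large coefficients are worth keeping, while keeping $\theta_3 = 0.1$ would save only $0.01$ in bias at a cost of $\log(1000)/100 \approx 0.069$ in the variance term.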