

  1. Sparse Exponential Weighting as an alternative to LASSO and Dantzig selector
Alexandre Tsybakov
Laboratoire de Statistique, CREST, and Laboratoire de Probabilités et Modèles Aléatoires, Université Paris 6
Vienna, July 24, 2008

Outline:
- Introduction
- Sparsity oracle inequalities (SOI)
- BIC and LASSO
- Dantzig selector and LASSO for linear regression
- Sparse exponential weighting (SEW)

  2. Nonparametric regression model
Assume that we observe the pairs (X_1, Y_1), ..., (X_n, Y_n) ∈ R^d × R, where
    Y_i = f(X_i) + ξ_i,   i = 1, ..., n.
- The regression function f : R^d → R is unknown.
- The errors ξ_i are independent Gaussian N(0, σ²) random variables.
- The X_i ∈ R^d are arbitrary fixed (non-random) points.
We want to estimate f based on the data (X_1, Y_1), ..., (X_n, Y_n).
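As a concrete illustration (not from the slides), here is a minimal Python sketch of this observation model; the particular f, sample size, dimension, and noise level are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 100, 2, 0.5               # assumed sample size, dimension, noise level

# Fixed (non-random) design points X_i in R^d.
X = rng.uniform(-1.0, 1.0, size=(n, d))

def f(x):
    """Hypothetical regression function, unknown to the statistician."""
    return np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 1]

# Observations Y_i = f(X_i) + xi_i with i.i.d. N(0, sigma^2) errors.
xi = rng.normal(0.0, sigma, size=n)
Y = f(X) + xi
```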

  3. Dictionary, linear approximation
Let f_1, ..., f_M be a finite dictionary of functions, f_j : R^d → R. We approximate the regression function f by the linear combination
    f_λ(x) = Σ_{j=1}^M λ_j f_j(x),   with weights λ = (λ_1, ..., λ_M).
We believe that
    f(x) ≈ Σ_{j=1}^M λ_j f_j(x)
for some λ = (λ_1, ..., λ_M).

  4. Dictionary, linear approximation (continued)
Possibly M ≫ n: the dictionary may contain many more functions than there are observations.
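In matrix form, evaluating f_λ at the design points is a matrix-vector product between the n × M array of dictionary values and λ. A sketch, with an assumed cosine dictionary and an assumed sparse weight vector:

```python
import numpy as np

n, M = 50, 8                             # assumed sizes; in general possibly M >> n
X = np.linspace(0.0, 1.0, n)             # design points (d = 1 here)

# Dictionary f_1, ..., f_M: here an assumed cosine family, F[i, j-1] = f_j(X_i).
F = np.column_stack([np.cos(np.pi * j * X) for j in range(1, M + 1)])

lam = np.zeros(M)
lam[[0, 3]] = [1.0, -0.5]                # a sparse weight vector lambda

f_lam = F @ lam                          # f_lambda(X_i) = sum_j lambda_j f_j(X_i)
```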

  8. Scenarios
(LinReg) Exact equality: there exists λ* ∈ R^M such that f = f_{λ*} = Σ_{j=1}^M λ*_j f_j (linear regression, with possibly M ≫ n parameters).
(NPReg) f_1, ..., f_M are the first M functions of a basis (usually orthonormal), M ≤ n, and there exists λ* such that f − f_{λ*} is small: nonparametric estimation of regression.
(Agg) Aggregation of arbitrary estimators: f_1, ..., f_M are preliminary estimators of f based on a training sample independent of the observations (X_1, Y_1), ..., (X_n, Y_n).
(Weak) Learning: f_1, ..., f_M are "weak learners", i.e., rough approximations to f; M is extremely large.

  9. Sparsity of a vector
The number of non-zero coordinates of λ:
    M(λ) = Σ_{j=1}^M I{λ_j ≠ 0}.
The value M(λ) characterizes the sparsity of the vector λ ∈ R^M: the smaller M(λ), the "sparser" λ.
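Computationally, M(λ) is just a count of non-zero entries; a one-line sketch:

```python
import numpy as np

def sparsity(lam):
    """M(lambda): the number of non-zero coordinates of lambda."""
    return int(np.count_nonzero(lam))

lam = np.array([0.0, 1.3, 0.0, -0.2, 0.0])
print(sparsity(lam))   # -> 2
```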

  10. Sparsity of the model
Intuitive formulation of the sparsity assumption:
    f(x) ≈ Σ_{j=1}^M λ_j f_j(x)   ("f is well approximated by f_λ"),
where the vector λ = (λ_1, ..., λ_M) is sparse: M(λ) ≪ M.

  11. Strong sparsity
Strong sparsity: f admits an exact sparse representation f = f_{λ*} for some λ* ∈ R^M with M(λ*) ≪ M ⇒ Scenario (LinReg).

  12. Sparsity and dimension reduction
Let λ̂_OLS be the ordinary least squares (OLS) estimator. Elementary result:
    E ||f_{λ̂_OLS} − f||_n² ≤ ||f − f_λ||_n² + σ² M / n   for any λ ∈ R^M,
where ||·||_n is the empirical norm:
    ||f||_n = ( (1/n) Σ_{i=1}^n f²(X_i) )^{1/2}.
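A self-contained sketch of the full-dictionary OLS estimator and the empirical norm; the polynomial dictionary, the true f, and all sizes are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, sigma = 200, 10, 0.3                         # assumed sizes and noise level

X = np.linspace(0.0, 1.0, n)
F = np.column_stack([X ** j for j in range(M)])    # assumed polynomial dictionary

f_true = np.sin(2 * np.pi * X)                     # hypothetical regression function
Y = f_true + rng.normal(0.0, sigma, size=n)

def emp_norm(g_values):
    """Empirical norm ||g||_n = sqrt((1/n) * sum_i g(X_i)^2), from values g(X_i)."""
    return np.sqrt(np.mean(g_values ** 2))

# OLS over all M dictionary coefficients.
lam_ols, *_ = np.linalg.lstsq(F, Y, rcond=None)

# ||f_hat - f||_n^2 for this draw; its expectation obeys the bound on the slide.
risk = emp_norm(F @ lam_ols - f_true) ** 2
```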

  13. Sparsity and dimension reduction
For any λ ∈ R^M, the "oracular" OLS that acts only on the relevant M(λ) coordinates satisfies
    E ||f_{λ̂_oracle-OLS} − f||_n² ≤ ||f − f_λ||_n² + σ² M(λ) / n.
This is only an OLS oracle, not an estimator: the set of relevant coordinates would have to be known.
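The oracular OLS can be sketched by regressing only on the columns where λ is non-zero; the support is handed to the code directly, which is exactly why this is an oracle rather than an estimator. The design and coefficients below are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n, M, sigma = 200, 50, 0.3

F = rng.normal(size=(n, M))               # assumed dictionary values f_j(X_i)
lam_star = np.zeros(M)
lam_star[[2, 7, 11]] = [1.0, -2.0, 0.5]   # sparse lambda with M(lambda) = 3

f_true = F @ lam_star
Y = f_true + rng.normal(0.0, sigma, size=n)

support = np.flatnonzero(lam_star)        # known only to the oracle
coef, *_ = np.linalg.lstsq(F[:, support], Y, rcond=None)

f_oracle = F[:, support] @ coef
risk = np.mean((f_oracle - f_true) ** 2)  # on average ~ sigma^2 * M(lambda) / n
```

Because only M(λ) = 3 coefficients are fitted, the variance term scales as σ²·3/n rather than σ²·M/n.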

  14. Sparsity oracle inequalities
Do there exist estimators with similar behavior? Choose some other data-driven weights λ̂ = (λ̂_1, ..., λ̂_M) and estimate f by
    f̂(x) = f_{λ̂}(x) = Σ_{j=1}^M λ̂_j f_j(x).
Can we find λ̂ such that
    E ||f_{λ̂} − f||_n² ≲ ||f − f_λ||_n² + σ² M(λ) / n   for all λ?

  15. Sparsity oracle inequalities (SOI)
A realizable task: look for an estimator f_{λ̂} satisfying a sparsity oracle inequality (SOI)
    E ||f_{λ̂} − f||_n² ≤ inf_{λ ∈ R^M} ( C ||f − f_λ||_n² + C′ M(λ) (log M) / n )
with some constants C ≥ 1, C′ > 0 and an inevitable extra log M factor in the variance term. C = 1 ⇒ sharp SOI.
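The deck goes on to compare LASSO, the Dantzig selector, and SEW as candidate estimators. Purely as a point of reference (not the method of the talk), here is a minimal numpy sketch of the LASSO, solved by iterative soft-thresholding (ISTA); under suitable conditions on the dictionary, LASSO-type estimators are known to satisfy SOI-type bounds.

```python
import numpy as np

def soft_threshold(v, t):
    """Componentwise soft-thresholding, the proximal map of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(F, Y, reg, n_iter=500):
    """Minimize (1/n)||Y - F @ lam||^2 + 2*reg*||lam||_1 by ISTA (a sketch)."""
    n, M = F.shape
    step = n / (2.0 * np.linalg.norm(F, ord=2) ** 2)  # 1/L for the smooth part
    lam = np.zeros(M)
    for _ in range(n_iter):
        grad = -2.0 / n * F.T @ (Y - F @ lam)         # gradient of the quadratic term
        lam = soft_threshold(lam - step * grad, step * 2.0 * reg)
    return lam
```

With a well-conditioned design and a moderate regularization level, the iterates return a sparse λ̂ close to a sparse truth, at the cost of a small soft-thresholding bias on the active coordinates.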
