SLIDE 1

Sparse Exponential Weighting as an alternative to LASSO and Dantzig selector

Alexandre Tsybakov

Laboratoire de Statistique, CREST, and Laboratoire de Probabilités et Modèles Aléatoires, Université Paris 6

Vienna, July 24, 2008

SLIDE 2

Nonparametric regression model

Assume that we observe pairs (X1, Y1), . . . , (Xn, Yn) ∈ R^d × R, where

$Y_i = f(X_i) + \xi_i, \quad i = 1, \dots, n.$

The regression function f : R^d → R is unknown. The errors ξ_i are independent Gaussian N(0, σ²) random variables, and the X_i ∈ R^d are arbitrary fixed (non-random) points. We want to estimate f based on the data (X1, Y1), . . . , (Xn, Yn).
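To fix notation, a minimal sketch of this data-generating process; the design, the regression function and the noise level here are illustrative choices, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 200, 1, 0.5              # illustrative sizes, not from the talk

def f(x):                              # a hypothetical regression function
    return np.sin(2 * np.pi * x[:, 0])

X = rng.uniform(0, 1, size=(n, d))     # fixed design points X_i
xi = rng.normal(0, sigma, size=n)      # independent N(0, sigma^2) errors
Y = f(X) + xi                          # observations Y_i = f(X_i) + xi_i
```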

SLIDES 3-4

Dictionary, linear approximation

Let f1, . . . , fM be a finite dictionary of functions, fj : R^d → R. We approximate the regression function f by a linear combination

$f_\lambda(x) = \sum_{j=1}^{M} \lambda_j f_j(x)$

with weights λ = (λ1, . . . , λM). We believe that f(x) ≈ f_λ(x) for some λ = (λ1, . . . , λM). Possibly M ≫ n.

SLIDES 5-8

Scenarios

(LinReg) Exact equality: there exists λ* ∈ R^M such that $f = f_{\lambda^*} = \sum_{j=1}^{M} \lambda^*_j f_j$ (linear regression, with possibly M ≫ n parameters);

(NPReg) f1, . . . , fM are the first M functions of a basis (usually orthonormal) and M ≤ n; there exists λ* such that f − f_{λ*} is small: nonparametric estimation of regression;

(Agg) aggregation of arbitrary estimators: f1, . . . , fM are preliminary estimators of f based on a training sample independent of the observations (X1, Y1), . . . , (Xn, Yn);

(Weak) learning: f1, . . . , fM are “weak learners”, i.e., some rough approximations to f; M is extremely large.

SLIDE 9

Sparsity of a vector

The number of non-zero coordinates of λ:

$M(\lambda) = \sum_{j=1}^{M} \mathbb{1}\{\lambda_j \neq 0\}.$

The value M(λ) characterizes the sparsity of the vector λ ∈ R^M: the smaller M(λ), the “sparser” λ.

SLIDE 10

Sparsity of the model

Intuitive formulation of the sparsity assumption:

$f(x) \approx \sum_{j=1}^{M} \lambda_j f_j(x)$

(“f is well approximated by f_λ”), where the vector λ = (λ1, . . . , λM) is sparse: M(λ) ≪ M.

SLIDE 11

Strong sparsity

Strong sparsity: f admits an exact sparse representation $f = f_{\lambda^*}$ for some λ* ∈ R^M with M(λ*) ≪ M ⇒ Scenario (LinReg).

SLIDE 12

Sparsity and dimension reduction

Let λ̂^OLS be the ordinary least squares (OLS) estimator. Elementary result:

$\mathbb{E}\,\|\hat f_{\hat\lambda^{OLS}} - f\|_n^2 \;\le\; \|f - f_\lambda\|_n^2 + \frac{\sigma^2 M}{n} \quad \text{for any } \lambda \in \mathbb{R}^M,$

where ∥·∥_n is the empirical norm: $\|f\|_n^2 = \frac{1}{n}\sum_{i=1}^{n} f^2(X_i)$.
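A quick Monte Carlo check of this elementary bound in the exact case f = f_{λ*}, where the approximation term vanishes and the OLS risk equals σ²M/n; all sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, sigma = 100, 10, 1.0
X = rng.normal(size=(n, M))                # values f_j(X_i) of the dictionary
lam_star = rng.normal(size=M)
f_vec = X @ lam_star                       # f lies in the span: zero bias term

risks = []
for _ in range(2000):
    y = f_vec + sigma * rng.normal(size=n)
    lam_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    risks.append(np.mean((X @ lam_ols - f_vec) ** 2))   # ||f_hat - f||_n^2

print(np.mean(risks), sigma**2 * M / n)    # both approximately 0.1
```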

SLIDE 13

Sparsity and dimension reduction

For any λ ∈ R^M, the “oracular” OLS that acts only on the relevant M(λ) coordinates satisfies

$\mathbb{E}\,\|\hat f^{\,oracle}_{\hat\lambda^{OLS}} - f\|_n^2 \;\le\; \|f - f_\lambda\|_n^2 + \frac{\sigma^2 M(\lambda)}{n}.$

This is only an OLS oracle, not an estimator: the set of relevant coordinates would have to be known.

SLIDE 14

Sparsity oracle inequalities

Do there exist estimators with similar behavior? Choose some data-driven weights λ̂ = (λ̂1, . . . , λ̂M) and estimate f by

$\hat f(x) = \hat f_{\hat\lambda}(x) = \sum_{j=1}^{M} \hat\lambda_j f_j(x).$

Can we find λ̂ such that

$\mathbb{E}\,\|\hat f_{\hat\lambda} - f\|_n^2 \;\lesssim\; \|f - f_\lambda\|_n^2 + \frac{\sigma^2 M(\lambda)}{n}, \quad \forall\,\lambda\,?$

SLIDES 15-16

Sparsity oracle inequalities (SOI)

Realizable task: look for an estimator f̂_λ̂ satisfying a sparsity oracle inequality (SOI)

$\mathbb{E}\,\|\hat f_{\hat\lambda} - f\|_n^2 \;\le\; \inf_{\lambda\in\mathbb{R}^M}\Big\{ C\,\|f - f_\lambda\|_n^2 + C'\,\frac{M(\lambda)\log M}{n} \Big\}$

with some constants C ≥ 1, C′ > 0 and an inevitable extra log M factor in the variance term. C = 1 ⇒ sharp SOI.

“In probability” form of sparsity oracle inequalities: with probability close to 1,

$\|\hat f_{\hat\lambda} - f\|_n^2 \;\le\; \inf_{\lambda\in\mathbb{R}^M}\Big\{ C\,\|f - f_\lambda\|_n^2 + C'\,\frac{M(\lambda)\log M}{n} \Big\}.$

SLIDE 17

Implications of SOI: Scenario (LinReg)

Assume that we have found an estimator f̂_λ̂ satisfying a SOI. Some consequences for the different scenarios.

(LinReg) linear regression: f = f_{λ*} for some λ*. Using the SOI:

$\mathbb{E}\,\|\hat f_{\hat\lambda} - f\|_n^2 \;\le\; C\Big\{ \|f - f_{\lambda^*}\|_n^2 + \frac{M(\lambda^*)\log M}{n} \Big\} = \frac{C\,M(\lambda^*)\log M}{n}$

(the desired result for Scenario (LinReg)).

SLIDE 18

Implications of SOI: Scenario (NPReg)

(NPReg) nonparametric regression. If f belongs to standard smoothness classes of functions, $\min_{\lambda\in\Lambda_m}\|f - f_\lambda\|_n \le C m^{-\beta}$ for some β > 0 (Λ_m = the set of vectors with only the first m non-zero coefficients, m ≤ M). Using the SOI:

$\mathbb{E}\,\|\hat f_{\hat\lambda} - f\|_n^2 \;\le\; C\,\inf_{m\ge 1}\Big\{ \min_{\lambda\in\Lambda_m}\|f - f_\lambda\|_n^2 + \frac{m\log M}{n} \Big\} \;\le\; C\,\inf_{m\ge 1}\Big\{ \frac{1}{m^{2\beta}} + \frac{m\log M}{n} \Big\} = O\Big( \Big(\frac{\log n}{n}\Big)^{2\beta/(2\beta+1)} \Big)$

for M ≤ n (the optimal rate of convergence, up to logs, in Scenario (NPReg)).

SLIDE 19

Implications of SOI: Scenario (Agg)

(Agg) aggregation of arbitrary estimators: f1, . . . , fM are preliminary estimators of f based on a pilot (training) sample independent of the observations (X1, Y1), . . . , (Xn, Yn); the training sample is considered as frozen. Assume that the SOI holds with leading constant 1. Then:

$\mathbb{E}\,\|\hat f_{\hat\lambda} - f\|_n^2 \;\le\; \inf_{\lambda\in\mathbb{R}^M}\Big\{ \|f - f_\lambda\|_n^2 + \frac{C\,M(\lambda)\log M}{n} \Big\} \;\le\; \min_{1\le j\le M}\|f - f_j\|_n^2 + \frac{C\log M}{n}$

⇒ f̂_λ̂ attains the optimal rate of model-selection aggregation, (log M)/n (T., 2003).

SLIDE 20

Implications of SOI: Scenario (Agg)

A similar conclusion holds for convex aggregation. We restrict λ to the simplex $\Lambda^M = \{\lambda\in\mathbb{R}^M : \lambda_j \ge 0,\ \sum_{j=1}^{M}\lambda_j = 1\}$. From a SOI with leading constant 1 plus a “Maurey argument”:

$\mathbb{E}\,\|\hat f_{\hat\lambda} - f\|_n^2 \;\le\; \inf_{\lambda\in\mathbb{R}^M}\Big\{ \|f - f_\lambda\|_n^2 + \frac{C\,M(\lambda)\log M}{n} \Big\} \;\le\; \inf_{\lambda\in\Lambda^M}\|f - f_\lambda\|_n^2 + C'\sqrt{\frac{\log M}{n}}$

⇒ f̂_λ̂ attains the optimal rate of convex aggregation, $\sqrt{(\log M)/n}$ [Nemirovski (2000), Juditsky and Nemirovski (2000)].

SLIDE 21

Sparsity oracle inequalities

Conclusion: all these nice properties are simultaneously satisfied for one and the same procedure, whenever it obeys a SOI.

Ultimate target:
- no assumptions on the dictionary f1, . . . , fM
- SOI with leading constant 1
- computational feasibility

SLIDES 22-23

Definition of the BIC

First idea: penalize least squares directly by M(λ) (BIC criterion; Schwarz (1978), Foster and George (1994)):

$\hat\lambda^{BIC} = \arg\min_{\lambda\in\mathbb{R}^M}\Big\{ \|y - f_\lambda\|_n^2 + \gamma\,\frac{M(\lambda)\log M}{n} \Big\},$

where γ > 0 and $\|y - f_\lambda\|_n^2 = \frac{1}{n}\sum_{i=1}^{n}\big(Y_i - f_\lambda(X_i)\big)^2$, y = (Y1, . . . , Yn).

Remarks:
- If the matrix X = (fj(Xi))_{i,j} has orthonormal columns, BIC is equivalent to hard thresholding of the components of X^T y/n at the level $\sqrt{\gamma(\log M)/n}$.
- Non-convex, discontinuous minimization problem.
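A minimal sketch of this hard-thresholding form, assuming the columns of X are exactly orthonormal in the empirical norm (X^T X / n = I), in which case the BIC criterion decouples coordinate by coordinate:

```python
import numpy as np

def bic_orthonormal(X, y, gamma):
    """BIC with orthonormal columns: keep z_j = (X^T y / n)_j iff it
    exceeds the hard threshold sqrt(gamma * log(M) / n)."""
    n, M = X.shape
    z = X.T @ y / n
    thr = np.sqrt(gamma * np.log(M) / n)
    return np.where(np.abs(z) > thr, z, 0.0)
```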

SLIDES 24-25

Sparsity oracle inequality for BIC

Theorem [Bunea / T. / Wegkamp (2004)]. If γ > K0 σ² for an absolute constant K0, and with no assumption on the dictionary f1, . . . , fM, the BIC estimator satisfies, with probability close to 1,

$\|\hat f_{\hat\lambda^{BIC}} - f\|_n^2 \;\le\; (1+\varepsilon)\,\inf_{\lambda\in\mathbb{R}^M}\Big\{ \|f - f_\lambda\|_n^2 + \frac{C(\varepsilon)\,M(\lambda)\log M}{n} \Big\}, \quad \forall\,\varepsilon > 0.$

Remarks: the BIC is realizable only for small M (say, M ≤ 20); the leading constant is not 1, C(ε) ∼ 1/ε.

SLIDE 26

LASSO

Second popular idea: LASSO [Frank and Friedman (1993, bridge regression), Tibshirani (1996), Chen and Donoho (1998, basis pursuit)]. Instead of penalizing the residual sum of squares by M(λ), as in the BIC, penalize by the ℓ1 norm of λ:

$\hat\lambda^{L} = \arg\min_{\lambda\in\mathbb{R}^M}\Big\{ \|y - f_\lambda\|_n^2 + 2r\,|\lambda|_1 \Big\},$

where $|\lambda|_1 = \sum_{j=1}^{M}|\lambda_j|$ and r > 0 is a tuning constant. A sensible choice: $r = A\sqrt{(\log M)/n}$ for A > 0 large enough. If the matrix X = (fj(Xi))_{i,j} has orthonormal columns, LASSO is equivalent to soft thresholding of the components of X^T y/n at the level r.
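The next slide cites LARS; as an equally standard alternative, here is a minimal cyclic coordinate-descent sketch for the same objective (1/n)‖y − Xλ‖² + 2r|λ|₁, run for a fixed number of sweeps with no convergence check:

```python
import numpy as np

def soft(z, t):
    """Soft thresholding at level t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, r, n_sweeps=200):
    """Cyclic coordinate descent for (1/n)||y - X lam||^2 + 2 r |lam|_1."""
    n, M = X.shape
    lam = np.zeros(M)
    col_norm2 = (X ** 2).sum(axis=0) / n        # ||f_j||_n^2
    resid = y.astype(float).copy()
    for _ in range(n_sweeps):
        for j in range(M):
            resid += X[:, j] * lam[j]           # add back j-th contribution
            rho = X[:, j] @ resid / n
            lam[j] = soft(rho, r) / col_norm2[j]
            resid -= X[:, j] * lam[j]
    return lam
```

With orthonormal columns a single sweep reduces to soft thresholding of X^T y/n at level r, exactly as stated above.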

SLIDE 27

LASSO

LASSO is computationally feasible even for M ≫ n, via convex optimization algorithms such as LARS [Efron, Hastie, Johnstone, and Tibshirani (2004)].

“Selection of variables” property: λ̂^L always has some components λ̂^L_j that are exactly equal to zero. For linear regression (≡ Scenario (LinReg)) the selection is asymptotically correct: Bühlmann and Meinshausen (2006), Zhao and Yu (2006).

SLIDE 28

Restricted eigenvalue assumption

For a vector ∆ = (a_j)_{j=1,...,M} and a subset of indices J ⊆ {1, . . . , M}, write ∆_J = (a_j 1{j ∈ J})_{j=1,...,M}. The Gram matrix:

$\Psi_M = \big(\langle f_j, f_{j'}\rangle_n\big)_{1\le j,j'\le M} \;(= X^T X / n).$

Assumption RE(s, c0) (Bickel, Ritov and T., 2007). For an integer 1 ≤ s ≤ M and c0 > 0 there exists κ = κ(s, c0) > 0 such that

$\Delta^T \Psi_M \Delta \;\ge\; \kappa\,|\Delta_J|_2^2$

for all J ⊆ {1, . . . , M} with |J| ≤ s and all ∆ with $|\Delta_{J^c}|_1 \le c_0 |\Delta_J|_1$.

SLIDE 29

More specific assumptions

Assumption RE is more general than several other assumptions on the Gram matrix:
- Coherence assumption (Donoho/Elad/Temlyakov),
- “Uniform uncertainty principle” (Candes/Tao),
- Incoherent design assumption (Meinshausen/Yu, Zhang/Huang).
These papers focus on the linear regression scenario (LinReg).
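Of these conditions, mutual coherence is the only one that is cheap to compute directly from the design; a small sketch (this checks coherence, not Assumption RE itself, which involves a minimization over a cone):

```python
import numpy as np

def mutual_coherence(X):
    """max_{j != k} |<f_j, f_k>_n| after normalizing so ||f_j||_n = 1."""
    n, M = X.shape
    Xn = X / np.sqrt(np.mean(X ** 2, axis=0))   # empirical-norm normalization
    G = Xn.T @ Xn / n                           # Gram matrix Psi_M, unit diagonal
    return np.abs(G - np.eye(M)).max()
```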

SLIDE 30

Sparsity oracle inequality for the LASSO

Theorem [Bickel, Ritov and T., 2007]. Let ∥fj∥_n = 1, j = 1, . . . , M. Fix some ε > 0 and let Assumption RE(s, c0) be satisfied with c0 = 3 + 4/ε. Consider the LASSO estimator f̂_λ̂L with the tuning constant

$r = A\sigma\sqrt{\frac{\log M}{n}}$

for some A > 2√2. Then, for all M ≥ 3, n ≥ 1, with probability at least $1 - M^{1-A^2/8}$ we have: for all λ ∈ R^M with M(λ) = s,

$\|\hat f_{\hat\lambda^{L}} - f\|_n^2 \;\le\; (1+\varepsilon)\,\|f_\lambda - f\|_n^2 + \frac{C(\varepsilon)\,M(\lambda)\log M}{\kappa\, n}.$

SLIDES 31-32

Dantzig selector and LASSO for linear regression

Scenario (LinReg): f = f_{λ*} for some λ*, so we can rewrite our model as standard linear regression:

$y = X\lambda^* + \xi,$

where X = (fj(Xi))_{i,j}, i = 1, . . . , n, j = 1, . . . , M, and ξ is the Gaussian noise vector.

Dantzig selector (Candes and Tao, 2005):

$\hat\lambda^{D} \in \arg\min\Big\{ |\lambda|_1 : \Big|\frac{1}{n} X^T(y - X\lambda)\Big|_\infty \le r \Big\},$

where |·|_∞ is the ℓ∞ norm in R^M.
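The Dantzig selector is a linear program; a minimal sketch via scipy's LP solver, splitting λ = u − v with u, v ≥ 0:

```python
import numpy as np
from scipy.optimize import linprog

def dantzig(X, y, r):
    """min |lam|_1  s.t.  |X^T (y - X lam)|_inf <= n r, with lam = u - v."""
    n, M = X.shape
    A, b = X.T @ X, X.T @ y
    c = np.ones(2 * M)                           # objective sum(u) + sum(v)
    A_ub = np.block([[A, -A], [-A, A]])          # encodes -n r <= b - A lam <= n r
    b_ub = np.concatenate([b + n * r, n * r - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return res.x[:M] - res.x[M:]
```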

SLIDE 33

Theorem [Bickel, Ritov and T., 2007]. Let ∥fj∥_n = 1, j = 1, . . . , M. Let Assumption RE(s, 3) hold and let λ̂ be either the LASSO or the Dantzig selector with tuning parameter

$r = A\sigma\sqrt{\frac{\log M}{n}}$

and A > 2√2. Then, for all M ≥ 3, n ≥ 1, with probability at least $1 - M^{1-A^2/8}$ we have:

$\frac{1}{n}\,|X(\hat\lambda - \lambda^*)|_2^2 \;\le\; \frac{C'}{\kappa}\,\frac{M(\lambda^*)\log M}{n} \quad \text{(SOI for LASSO/Dantzig)},$

$|\hat\lambda - \lambda^*|_p^p \;\le\; \frac{C}{\kappa}\,M(\lambda^*)\Big(\sqrt{\frac{\log M}{n}}\Big)^p, \quad \forall\, 1 \le p \le 2.$

SLIDE 34

Selection of variables [Lounici (2008)]: under the coherence assumption, with probability close to 1,

$|\hat\lambda - \lambda^*|_\infty \;\le\; \frac{C}{\kappa}\sqrt{\frac{\log M}{n}},$

where λ̂ is the LASSO or Dantzig estimator. Their thresholded versions λ̃ satisfy

$\mathbb{P}\big(J_{\tilde\lambda} = J_{\lambda^*}\big) \to 1 \quad \text{if} \quad \min_{j\in J_{\lambda^*}} |\lambda^*_j| > \frac{C'}{\kappa}\sqrt{\frac{\log M}{n}}.$

SLIDE 35

Disadvantages of the LASSO:
- The SOI for the LASSO holds under very restrictive assumptions on the dictionary involving κ. Moreover, the assumptions depend on the (unknown) number s of non-zero components of the oracle vector, or eventually on an upper bound on this number. Such assumptions are unavoidable: Candes and Plan (2008).
- Bad behavior when κ is small.
- The leading constant in the SOI is not 1.
The same problems arise for the Dantzig selector: its properties are essentially the same as those of the LASSO, cf. Bickel, Ritov and T. (2007).

SLIDE 36

Sparse exponential weighting

Choose λ̂^EW according to:

$\hat\lambda^{EW}_j = \int_{\mathbb{R}^M} \lambda_j\, S_n(d\lambda), \quad j = 1, \dots, M,$

where the probability measure S_n is given by

$S_n(d\lambda) = \frac{\exp\big(-n\|y - f_\lambda\|_n^2/\beta\big)\,\pi(d\lambda)}{\int_{\mathbb{R}^M}\exp\big(-n\|y - f_w\|_n^2/\beta\big)\,\pi(dw)}$

with some β > 0 and some prior measure π. This is a Bayesian estimator if β = 2σ², but we need a larger β. Non-discrete π: is fast computation possible?

SLIDE 37

A PAC-Bayesian bound

Lemma [Dalalyan and T., 2007]. The estimator with exponential weights f̂_λ̂EW defined with β ≥ 4σ² and any prior π satisfies:

$\mathbb{E}\,\|\hat f_{\hat\lambda^{EW}} - f\|_n^2 \;\le\; \inf_{P}\Big\{ \int \|f_\lambda - f\|_n^2\, P(d\lambda) + \frac{\beta\,\mathcal{K}(P,\pi)}{n} \Big\},$

where the infimum is taken over all probability measures P on R^M and K(P, π) denotes the Kullback-Leibler divergence between P and π.

SLIDE 38

Sparsity prior

Choose a specific prior measure π with Lebesgue density q:

$q(\lambda) = \prod_{j=1}^{M} \tau^{-1} q_0(\lambda_j/\tau), \quad \forall\,\lambda\in\mathbb{R}^M,$

where q0 is the Student t3 density (q0(t) ∼ |t|^{−4} for large |t|) and τ ∼ (Mn)^{−1/2}. We call this the sparsity prior, and the resulting estimator f̂_λ̂EW the Sparse Exponential Weighting (SEW) estimator.
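A sketch of draws from this sparsity prior, to see why the heavy t3 tails matter: typical coordinates sit within a few τ of zero, yet a few draws come out much larger, which is what lets the posterior keep a handful of large coefficients while shrinking the rest:

```python
import numpy as np

rng = np.random.default_rng(2)
M, n = 500, 200
tau = 1 / np.sqrt(M * n)                  # tau ~ (Mn)^{-1/2} as on the slide

lam = tau * rng.standard_t(df=3, size=M)  # lam_j / tau ~ Student t3

print(np.quantile(np.abs(lam) / tau, [0.5, 0.99]))  # most mass near zero...
print(np.abs(lam).max() / tau)                      # ...with occasional outliers
```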

SLIDE 39

SOI for the SEW estimator

Theorem [Dalalyan and T., 2007]. Let max_{1≤j≤M} ∥fj∥_n ≤ c0 < ∞. Then the SEW estimator f̂_λ̂EW defined with β ≥ 4σ² and with the sparsity prior π satisfies:

$\mathbb{E}\,\|\hat f_{\hat\lambda^{EW}} - f\|_n^2 \;\le\; \inf_{\lambda\in\mathbb{R}^M}\Big\{ \|f_\lambda - f\|_n^2 + \frac{C\,M(\lambda)}{n}\,\log\Big(1 + \frac{|\lambda|_1\sqrt{Mn}}{M(\lambda)}\Big) \Big\},$

where |λ|1 is the ℓ1-norm of λ.

- No assumption on the dictionary.
- Leading constant 1.
- The ℓ1-norm of λ appears, but under the log.
- Fast computation for at least M ∼ 10³.

SLIDE 40

SEW estimator: discussion

SEW is not a penalized estimator. We have

$\hat\lambda^{EW}_j = \int_{\mathbb{R}^M} \lambda_j\, S_n(d\lambda) = \int_{\mathbb{R}^M} \lambda_j\, g_n(\lambda)\, d\lambda, \quad j = 1, \dots, M,$

with posterior density g_n(λ) = S_n(dλ)/dλ:

$g_n(\lambda) \propto \exp\Big( -n\|y - f_\lambda\|_n^2/\beta - C\sum_{j=1}^{M}\log\big(1 + \lambda_j^2/\tau^2\big) \Big).$

The maximizer of this density (the MAP estimator) is

$\hat\lambda^{MAP} = \arg\min_{\lambda\in\mathbb{R}^M}\Big\{ \|y - f_\lambda\|_n^2 + \frac{\gamma}{n}\sum_{j=1}^{M}\log\big(1 + \lambda_j^2/\tau^2\big) \Big\} \;\neq\; \hat\lambda^{EW}.$

SLIDE 41

SEW estimator: discussion

Precursors of SEW for the “diagonal” sequence model:

Rivoirard (2004): minimax Bayes priors with heavy tails; Johnstone and Silverman (2005): “quasi-Cauchy” prior.

SLIDE 42

Exponential weights: models with i.i.d. data

Consider an i.i.d. sample Z1, . . . , Zn from the distribution of an abstract random variable Z ∈ 𝒵, and let Q(Z, fλ) be a given real-valued (prediction) loss. Define the probability measure S_n on R^M by

$S_n(d\lambda) = \frac{\exp\big(-\sum_{i=1}^{n} Q(Z_i, f_\lambda)/\beta\big)\,\pi(d\lambda)}{\int_{\mathbb{R}^M}\exp\big(-\sum_{i=1}^{n} Q(Z_i, f_w)/\beta\big)\,\pi(dw)}$

with some β > 0 and some prior measure π. This generalizes the previous definition: we replace $n\|y - f_\lambda\|_n^2$ by $\sum_{i=1}^{n} Q(Z_i, f_\lambda)$.

SLIDE 43

Mirror averaging

Cumulative exponential weights (mirror averaging):

$\hat\lambda^{MA}_j = \int_{\mathbb{R}^M} \lambda_j\, \bar S(d\lambda), \quad j = 1, \dots, M, \qquad \text{with } \bar S = \frac{1}{n}\sum_{i=1}^{n} S_i,$

cf. Juditsky/Rigollet/T (2005) [an even more general method: Juditsky/Nazin/T/Vayatis (2005)]. In a particular case we get the “progressive mixture method” of Catoni and Yang; see the sketch below. Choose a prior measure π supported on a convex compact Λ ⊂ R^M (e.g., on an ℓ1 ball).
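A minimal sketch of the progressive-mixture special case, with the prior uniform on the M dictionary elements themselves; one common convention (an assumption here) is that the i-th posterior uses the losses of the first i−1 observations:

```python
import numpy as np

def mirror_averaging(losses, beta):
    """Mirror averaging with a uniform prior on {f_1, ..., f_M}.
    losses[i, j] = Q(Z_{i+1}, f_j); returns the averaged weights lam^MA."""
    n, M = losses.shape
    # cumulative losses of the first i-1 observations, i = 1, ..., n
    cum = np.vstack([np.zeros(M), np.cumsum(losses, axis=0)[:-1]])
    logw = -cum / beta
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)     # the posteriors S_1, ..., S_n
    return w.mean(axis=0)                 # average over time, a simplex point
```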

SLIDE 44

Assumption JRT (2005). The mapping λ ↦ Q(Z, fλ) is convex for all Z, and there exists β > 0 such that the function

$\lambda \;\mapsto\; \mathbb{E}\,\exp\Big( \frac{Q(Z, f_{\lambda'}) - Q(Z, f_\lambda)}{\beta} \Big)$

is concave on a convex compact set Λ ⊂ R^M for all λ′ ∈ Λ. Roughly: “strong convexity on average”.

SLIDE 45

PAC-Bayesian bound for mirror averaging

Define the average risk A(λ) = E Q(Z, fλ).

Lemma (PAC-Bayesian bound). Let f̂_λ̂MA be a mirror averaging estimator defined with β satisfying Assumption JRT and any prior π supported on a convex compact set Λ. Then

$\mathbb{E}\, A(\hat\lambda^{MA}) \;\le\; \inf_{P}\Big\{ \int A(\lambda)\, P(d\lambda) + \frac{\beta\,\mathcal{K}(P,\pi)}{n+1} \Big\},$

where the infimum is taken over all probability measures P on Λ and K(P, π) is the Kullback-Leibler divergence between P and π. The proof follows the scheme of Juditsky, Rigollet and T. (2005); cf. Rigollet and Zhao (2006), Audibert (2006), Lounici (2007).

SLIDE 46

SOI for Mirror Averaging

Theorem [Dalalyan, Rigollet and T., 2007]. Assume that $\sup_{|\lambda|_1\le 2R} \mathrm{Spec}\{\nabla^2 A(\lambda)\} < \infty$ for some R > 0. Let f̂_λ̂MA be a mirror averaging estimator satisfying the assumptions of the PAC lemma, with the sparsity prior π truncated to {λ : |λ|1 ≤ 2R} and $\tau \sim 1/\sqrt{M(n\vee M)}$. Then

$\mathbb{E}\, A(\hat\lambda^{MA}) \;\le\; \inf_{|\lambda|_1\le R}\Big\{ A(\lambda) + \frac{C R^2 M(\lambda)}{n}\,\log\Big( \frac{C' R\sqrt{M(n\vee M)}}{M(\lambda)} \Big) \Big\}.$

No restrictive assumption on the dictionary. Leading constant 1.

SLIDE 47

Comparison with SOI for the LASSO

The LASSO type estimators:

$\hat\lambda = \arg\min_{\lambda\in\mathbb{R}^M}\Big\{ \frac{1}{n}\sum_{i=1}^{n} Q(Z_i, f_\lambda) + r\sum_{j=1}^{M}|\lambda_j| \Big\}.$

van de Geer (2007), Koltchinskii (2007):

$\mathbb{E}\, A(\hat\lambda) \;\le\; \inf_{|\lambda|_1\le R}\Big\{ 3\, A(\lambda) + \frac{C R^2 M(\lambda)\log M}{\kappa(\lambda)\, n} \Big\},$

where κ(λ) is a quantity analogous to κ in Assumption RE (restricted eigenvalue). To get the correct rate, we need to consider only λ such that κ(λ) ≥ c, which is equivalent to RE.

SLIDES 48-50

Example: Gaussian regression, squared loss

Gaussian regression with random design: Z = (X, Y), X ∈ R^d, Y ∈ R, such that Y = f(X) + ξ, ξ|X ∼ N(0, σ²), X ∼ P_X, ∥f∥_∞ ≤ L.

Assumption on the dictionary: ∥fj∥_∞ ≤ L, j = 1, . . . , M.

The loss function: Q(Z, fλ) = (Y − fλ(X))², where $f_\lambda = \sum_{j=1}^{M}\lambda_j f_j$. Then

$A(\lambda) = \mathbb{E}\, Q(Z, f_\lambda) = \|f_\lambda - f\|_X^2 + \sigma^2, \qquad \|f\|_X^2 = \int f^2\, dP_X.$

SLIDE 51

SOI for regression with squared loss

Corollary. Under the conditions of this example, for all β ≥ 2σ² + 8L²,

$\mathbb{E}\,\|\hat f_{\hat\lambda^{MA}} - f\|_X^2 \;\le\; \inf_{\lambda\in\Lambda^M}\Big\{ \|f_\lambda - f\|_X^2 + \frac{C\,M(\lambda)}{n}\,\log\Big( \frac{C'\sqrt{M(n\vee M)}}{M(\lambda)} \Big) \Big\}.$

Here Λ^M is the simplex: $\Lambda^M = \{\lambda\in\mathbb{R}^M : \lambda_j \ge 0,\ \sum_{j=1}^{M}\lambda_j = 1\}$.

SLIDES 52-53

Example: density estimation with L2 loss

Z = X ∈ R^d with density f, such that ∥f∥_∞ ≤ L. Assumption on the dictionary: f1, . . . , fM are probability densities with ∥fj∥_∞ ≤ L.

The loss function: Q(X, fλ) = ∥fλ∥² − 2fλ(X), where $\|f\|^2 = \int f^2(x)\, dx$. The associated risk:

$A(\lambda) = \mathbb{E}\, Q(X, f_\lambda) = \|f - f_\lambda\|^2 - \|f\|^2.$

SLIDE 54

SOI for density estimation with L2 loss

Corollary. Under the conditions of this example, for all β > 12L,

$\mathbb{E}\,\|\hat f_{\hat\lambda^{MA}} - f\|^2 \;\le\; \inf_{\lambda\in\Lambda^M}\Big\{ \|f_\lambda - f\|^2 + \frac{C\,M(\lambda)}{n}\,\log\Big( \frac{C'\sqrt{M(n\vee M)}}{M(\lambda)} \Big) \Big\}.$

Here Λ^M is the simplex: $\Lambda^M = \{\lambda\in\mathbb{R}^M : \lambda_j \ge 0,\ \sum_{j=1}^{M}\lambda_j = 1\}$.

SLIDE 55

Computation of SEW estimators

Consider the linear regression scenario: y = Xλ + ξ, where X is an n × M deterministic design matrix, λ ∈ R^M is an unknown vector, and ξ ∈ R^n is a Gaussian vector with i.i.d. components of variance σ². The SEW estimator is

$\hat\lambda^{EW} = \int_{\mathbb{R}^M} u\, g(u)\, du,$

where the posterior density g(u) ∝ exp(−V(u)) with

$V(u) = \beta^{-1}\|y - Xu\|^2 + 2\sum_{j=1}^{M}\log\big(\tau^2 + u_j^2\big).$

SLIDE 56

Langevin Monte Carlo

Remark: the posterior density g(·) is the invariant density of the Langevin diffusion

$dL_t = -\nabla V(L_t)\, dt + \sqrt{2}\, dW_t, \quad L_0 = 0,\ t > 0,$

where W_t is the M-dimensional Brownian motion. Now let η1, η2, . . . be i.i.d. standard normal random vectors and discretize with step h > 0:

$L_0 = 0, \qquad L_{k+1} = L_k - h\,\nabla V(L_k) + \sqrt{2h}\,\eta_k, \quad k = 0, 1, \dots$

Then

$\frac{1}{[T h^{-1}]}\sum_{k=1}^{[T h^{-1}]} L_k \;\approx\; \frac{1}{T}\int_0^T L_t\, dt \;\xrightarrow[T\to\infty]{a.s.}\; \int_{\mathbb{R}^M} u\, g(u)\, du = \hat\lambda^{EW}.$
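A direct sketch of this discretized Langevin averaging for the SEW posterior, using the V(u) of the previous slide (X, y, β, τ, step h and horizon T as in the slides):

```python
import numpy as np

def sew_langevin(X, y, beta, tau, h, T, rng):
    """Euler-discretized Langevin diffusion for the SEW estimator, with
    V(u) = ||y - X u||^2 / beta + 2 sum_j log(tau^2 + u_j^2)."""
    n, M = X.shape
    K = int(T / h)                        # [T h^{-1}] steps
    L = np.zeros(M)                       # L_0 = 0
    avg = np.zeros(M)
    for _ in range(K):
        grad = -2.0 * X.T @ (y - X @ L) / beta + 4.0 * L / (tau**2 + L**2)
        L = L - h * grad + np.sqrt(2 * h) * rng.standard_normal(M)
        avg += L
    return avg / K                        # ergodic average, approx lambda^EW
```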

SLIDE 57

Simulations

Example 1: selection properties when the Gram matrix is nice. The entries of the matrix X are i.i.d. Rademacher random variables independent of the noise ξ, λ_j = 1{j ≤ S}, and σ² = S/(9n). We apply the SEW estimator using Langevin Monte Carlo with τ = 4σ/√M, β = 4σ², h = 10⁻⁴.
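A sketch of this setup, reusing the sew_langevin routine sketched after SLIDE 56; the data generation and tuning follow the slide, while the run length T = 5 is taken from the next slide's caption:

```python
import numpy as np

rng = np.random.default_rng(3)
n, M, S = 200, 500, 10
sigma2 = S / (9 * n)
sigma = np.sqrt(sigma2)

X = rng.choice([-1.0, 1.0], size=(n, M))              # i.i.d. Rademacher design
lam_star = (np.arange(1, M + 1) <= S).astype(float)   # lam_j = 1{j <= S}
y = X @ lam_star + sigma * rng.standard_normal(n)

lam_hat = sew_langevin(X, y, beta=4 * sigma2, tau=4 * sigma / np.sqrt(M),
                       h=1e-4, T=5, rng=rng)
print(np.mean((X @ (lam_hat - lam_star)) ** 2))       # (1/n)|X(lam_hat - lam*)|^2
```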

SLIDE 58

Simulations

[Figure omitted: estimates of the first 50 coefficients.]

Figure: Typical result for Example 1 with n = 200, M = 500, S = 10, h = 10⁻⁴, T = 5. The estimates of the first 50 coefficients are plotted. In this example, $\frac{1}{n}|X(\hat\lambda - \lambda)|^2 = 0.0021$. The computation of the estimator took about 30 seconds.

SLIDE 59

Simulations

Example 2: comparison with the LASSO/LARS. Choose X1, . . . , Xn i.i.d. uniformly distributed in [0, 1]² and set

$f_j(t) = \mathbb{1}\{[0, j_1/k]\times[0, j_2/k]\}(t), \quad j = (j_1, j_2) \in \{1, \dots, k\}^2,\ t \in [0, 1]^2.$

We get a matrix X = (fj(Xi))_{i,j} with k² columns, some of which are nearly collinear; the number of covariates is M = k². Set σ = 1, k = 15, n = 100, λ*_j = 0 for j ∈ {1, . . . , M} \ {87, 110, 200}, λ*_j = 1 for j ∈ {87, 110}, and λ*_{200} = −2. We apply the SEW estimator with Langevin Monte Carlo and

$\tau = \frac{4\sigma}{\sqrt{\sum_{j,i} f_j^2(X_i)}}, \qquad \beta = 4\sigma^2, \quad h = 0.0005.$

SLIDE 60

Simulations

Figure: Example 2 with n = 100, M = 225, M(λ∗) = 3, h = 5 · 10−4, T = 2.

SLIDE 61

Simulations

Figure: Typical result for Example 2 with n = 100, M = 225, M(λ*) = 3, h = 5 · 10⁻⁴, T = 2. In this example, $\frac{1}{n}|X(\hat\lambda - \lambda^*)|^2 = 0.28$ for our estimator and $\frac{1}{n}|X(\hat\lambda - \lambda^*)|^2 = 1.72$ for the LASSO. The computation of the SEW estimator took about 5 seconds.

SLIDE 62

References:

Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2007) Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics, to appear.

Bunea, F., Tsybakov, A.B. and Wegkamp, M.H. (2007) Aggregation for Gaussian regression. Annals of Statistics, v.35, 1674-1697.

Bunea, F., Tsybakov, A.B. and Wegkamp, M.H. (2007) Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics, v.1, 169-194.

Dalalyan, A. and Tsybakov, A.B. (2007) Aggregation by exponential weighting and sharp oracle inequalities. COLT-2007, 97-111.

Dalalyan, A. and Tsybakov, A.B. (2008) Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Machine Learning, v.72, 39-61.

Juditsky, A., Rigollet, P. and Tsybakov, A.B. Learning by mirror averaging. Annals of Statistics, to appear.
