  1. Exponentially weighted aggregation Laplace prior for linear regression
     Arnak Dalalyan, Edwin Grappin & Quentin Paris (edwin.grappin@ensae.fr)
     JPS, Les Houches, 2016
     Outline:
     - Introduction: prediction in high dimension
     - Penalization and Lasso
     - Exponentially weighted average

  2. Linear regression: goals & settings
     We observe $n$ labels $(Y_i)_{i \in \{1,\dots,n\}}$ and assume a linear relation between the labels and the $p$ features $(X_{i,j})_{j \in \{1,\dots,p\}}$:
     $$Y = X\beta^\star + \xi,$$
     where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $\beta^\star \in \mathbb{R}^p$, and $\xi \in \mathbb{R}^n$ is a random vector with $\xi_i \sim \mathcal{N}(0, \sigma^2)$.
     Our interests are:
     - a low prediction loss $\|X(\beta^\star - \hat{\beta})\|_2^2$ (fitting $\beta^\star$ itself is less important),
     - good quality when $p$ is large ($p \gg n$),
     - efficient use of the sparsity of $\beta^\star$ ($\beta^\star$ is $s$-sparse if at most $s$ of its entries are non-null).
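As a concrete illustration of this setting, here is a minimal simulation of the model above. This is a sketch only: the values of n, p, s and sigma are arbitrary choices made so that $p \gg n$, and all variable names are ours, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s, sigma = 50, 200, 5, 1.0      # illustrative sizes: p >> n, s-sparse signal

X = rng.standard_normal((n, p))       # feature matrix X in R^{n x p}
beta_star = np.zeros(p)               # s-sparse beta*: only s non-null entries
beta_star[:s] = rng.standard_normal(s)
xi = sigma * rng.standard_normal(n)   # noise with xi_i ~ N(0, sigma^2)
Y = X @ beta_star + xi                # observed labels: Y = X beta* + xi
```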

  3. Least squares method
     The ordinary least squares (OLS) estimator is defined by:
     $$\hat{\beta}_{\mathrm{OLS}} = \arg\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2.$$
     OLS minimizes the sum of the squared residuals.
     Overfitting. If $p$ is very large, OLS gives poor predictions (illustrated in the sketch below):
     - the solution is not unique when $p > n$,
     - it does not detect the meaningful features among all features,
     - its performance comes from fitting the data, not from predicting labels.
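Continuing the simulation above (reusing X, Y and beta_star), the overfitting is easy to observe numerically: when $p > n$ the least-squares problem has infinitely many minimizers, `np.linalg.lstsq` returns the minimum-norm one, and the training residual drops to zero while the prediction loss stays large.

```python
# OLS on the simulated data. With p > n the design matrix has rank n,
# so the noise is fit exactly and the training residual is ~0.
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

fit_loss = np.sum((Y - X @ beta_ols) ** 2)             # essentially 0: pure overfit
pred_loss = np.sum((X @ (beta_star - beta_ols)) ** 2)  # prediction loss, typically large
print(f"fit: {fit_loss:.3f}, prediction: {pred_loss:.3f}")
```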

  4. Penalized regression
     In our setting, a good estimator has the following properties:
     - guarantees on prediction results,
     - uses the sparsity assumption to manage $p > n$,
     - is computationally fast (of paramount importance when $p$ is large).
     Penalized regression combines the usual fitting term with a penalty term:
     $$\hat{\beta}_{\mathrm{pen}} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \|Y - X\beta\|_2^2 + \lambda P(\beta) \right\},$$
     where $P$ is the penalty function and $\lambda \geq 0$ controls the trade-off between the two terms.
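As a minimal sketch of this criterion (function names are ours), the penalized objective can be written with the penalty $P$ passed in as a function; the two penalties discussed on the next slides are shown as examples.

```python
import numpy as np

def penalized_objective(beta, X, Y, lam, P):
    # ||Y - X beta||_2^2 + lam * P(beta): fitting term plus weighted penalty.
    return np.sum((Y - X @ beta) ** 2) + lam * P(beta)

# The choice of P drives both the sparsity of the solution and the
# convexity of the optimization problem (see the following slides).
l1_norm = lambda b: np.sum(np.abs(b))    # Lasso penalty (q = 1)
l0_norm = lambda b: np.count_nonzero(b)  # l0 pseudo-norm (sparsity level)
```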

  5. Subset selection with an $\ell_0$ penalization
     An intuitive candidate is a penalization based on the $\ell_0$ pseudo-norm (the sparsity level):
     $$\|\beta\|_0 = \sum_{i=1}^{p} \mathbb{1}_{\{\beta_i \neq 0\}}, \qquad \hat{\beta}_{\ell_0} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \|Y - X\beta\|_2^2 + \lambda \|\beta\|_0 \right\}.$$
     The penalty forces many entries of $\hat{\beta}$ to be null: it selects the most important features.
     However, due to the $\ell_0$ pseudo-norm, the objective function is nonconvex, so the computational time grows exponentially with $p$ (see the brute-force sketch below).
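To make the exponential cost concrete, here is a hypothetical brute-force solver (our own illustration, not the authors' method) that searches supports exhaustively. It is only feasible for tiny $p$, which is exactly the point.

```python
from itertools import combinations
import numpy as np

def l0_penalized(X, Y, lam, max_size=3):
    # Exact l0-penalized least squares by exhaustive search over supports S.
    # The full search visits all 2^p subsets, hence the exponential cost;
    # max_size caps the support size so the demo finishes on small problems.
    n, p = X.shape
    best_val = np.sum(Y ** 2)    # value of the empty support (beta = 0)
    best_beta = np.zeros(p)
    for k in range(1, max_size + 1):
        for S in combinations(range(p), k):
            cols = list(S)
            b_S, *_ = np.linalg.lstsq(X[:, cols], Y, rcond=None)
            val = np.sum((Y - X[:, cols] @ b_S) ** 2) + lam * k
            if val < best_val:
                best_val = val
                best_beta = np.zeros(p)
                best_beta[cols] = b_S
    return best_beta
```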

  6. Choice of the penalization term: a trade-off between sparsity and convexity
     For $q > 0$, consider the estimators
     $$\hat{\beta}_q = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \|Y - X\beta\|_2^2 + \lambda \|\beta\|_q^q \right\}.$$
     - If $q < 1$, the solution is sparse but the problem is nonconvex.
     - If $q > 1$, the problem is convex but the solution is not sparse.
     - If $q = 1$ (the Lasso), the solution is sparse and the problem is convex.
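For the $q = 1$ case, fast convex solvers are readily available. A sketch using scikit-learn's Lasso on the simulated data above; alpha is chosen arbitrarily for the demo, and note the scaling difference between scikit-learn's objective and the slides' notation.

```python
import numpy as np
from sklearn.linear_model import Lasso

# q = 1 is the Lasso: convex (fast dedicated solvers) with a sparse solution.
# scikit-learn minimizes (1/(2n)) * ||Y - X b||_2^2 + alpha * ||b||_1,
# so alpha plays the role of lambda / (2n) in the slides' notation.
lasso = Lasso(alpha=0.1, max_iter=10_000)
lasso.fit(X, Y)                        # X, Y from the simulation above
support = np.flatnonzero(lasso.coef_)  # indices of the non-null coefficients
pred_loss = np.sum((X @ (beta_star - lasso.coef_)) ** 2)
print(f"support: {support}, prediction loss: {pred_loss:.3f}")
```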
