SLIDE 1

Exponentially weighted aggregation Laplace prior for linear regression

Arnak Dalalyan, Edwin Grappin & Quentin Paris
edwin.grappin@ensae.fr

JPS - Les Houches - 2016

Outline: introduction (prediction in high dimension); penalization and the Lasso; exponentially weighted average.

SLIDES 2-3

Goals & settings

We observe $n$ labels $(Y_i)_{i \in \{1,\dots,n\}}$ and assume a linear relation between the labels and the $p$ features $(X_{i,j})_{j \in \{1,\dots,p\}}$:
$$Y = X\beta^\star + \xi,$$
where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, $\beta^\star \in \mathbb{R}^p$, and $\xi \in \mathbb{R}^n$ is a random vector with $\xi_i \sim \mathcal{N}(0, \sigma^2)$.

Our interests are:

  • a low prediction loss $\|X(\beta^\star - \hat\beta)\|_2^2$ (fitting $\beta^\star$ itself is less important),
  • good quality when $p$ is large ($p \gg n$),
  • efficient use of the sparsity of $\beta^\star$ ($\beta^\star$ is $s$-sparse if at most $s$ of its entries are nonzero).
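A minimal numpy simulation of this setting may help fix ideas. The sizes $n$, $p$, $s$, the noise level and the seed below are illustrative choices, not values from the talk:

```python
import numpy as np

# Illustrative problem sizes: n observations, p features, s-sparse beta*.
rng = np.random.default_rng(0)
n, p, s, sigma = 50, 200, 5, 1.0

X = rng.standard_normal((n, p))          # design matrix, n x p
beta_star = np.zeros(p)
beta_star[:s] = rng.standard_normal(s)   # beta* is s-sparse: only s nonzero entries
xi = sigma * rng.standard_normal(n)      # noise, xi_i ~ N(0, sigma^2)
Y = X @ beta_star + xi                   # Y = X beta* + xi
```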

SLIDES 4-8

Least squares method

The ordinary least squares (OLS) estimator is defined by:
$$\hat\beta^{\mathrm{OLS}} = \arg\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2.$$

OLS minimizes the sum of the squares of the residuals.

  • Overfitting. If $p$ is very large, OLS has poor prediction results:
  • there is no unique solution when $p > n$,
  • it does not detect the meaningful features among all features,
  • its performance is focused on fitting the data, not on predicting labels.
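Continuing the simulation above, a short sketch of this failure mode. With $p > n$ the least-squares problem has infinitely many minimizers; np.linalg.lstsq returns the minimum-norm one, which fits the training data essentially exactly yet predicts poorly:

```python
# Minimum-norm least-squares solution (one of the many minimizers when p > n).
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

residual_loss = np.sum((Y - X @ beta_ols) ** 2) / n              # ~ 0 here: the data are interpolated
prediction_loss = np.sum((X @ (beta_star - beta_ols)) ** 2) / n  # typically large: poor prediction
```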

SLIDES 9-10

Penalized regression

In our case, a good estimator has the following properties:

  • guarantees on prediction results,
  • use of the sparsity assumption to manage $p > n$,
  • computational speed (of paramount importance when $p$ is large).

Penalized regression combines the usual fitting term with a penalty term:
$$\hat\beta^{\mathrm{pen}} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \|Y - X\beta\|_2^2 + \lambda P(\beta) \right\},$$
where $P$ is the penalty function and $\lambda \ge 0$ controls the trade-off between the two terms.
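As a sketch, the generic objective is easy to write down. The names penalized_objective, l0_penalty and l1_penalty below are illustrative helpers, not from the talk:

```python
# Generic penalized least-squares objective; P is any penalty function.
def penalized_objective(beta, X, Y, lam, P):
    return np.sum((Y - X @ beta) ** 2) + lam * P(beta)

# Two penalties discussed next: the l0 pseudo-norm and the l1 norm.
def l0_penalty(b):
    return np.count_nonzero(b)   # sparsity level (nonconvex)

def l1_penalty(b):
    return np.sum(np.abs(b))     # Lasso penalty (convex)
```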

SLIDES 11-13

Subset selection with an $\ell_0$ penalization

An intuitive candidate is a penalization based on the $\ell_0$ pseudo-norm (the sparsity level):
$$\|\beta\|_0 = \sum_{i=1}^{p} \mathbb{1}_{\{\beta_i \neq 0\}}, \qquad \hat\beta^{\ell_0} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \|Y - X\beta\|_2^2 + \lambda \|\beta\|_0 \right\}.$$

The penalty forces many entries of $\hat\beta$ to be null: it selects the most important features. However, due to the $\ell_0$ pseudo-norm, the objective function is nonconvex; hence, computational time grows exponentially with $p$.
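A brute-force sketch makes the combinatorial cost concrete. The helper l0_fit, the cap max_support, the value lam = 1.0 and the restriction to 15 features are all illustrative assumptions:

```python
from itertools import combinations

# Brute-force l0-penalized least squares: enumerate every support of size at
# most max_support and refit OLS on each. The number of supports grows
# combinatorially in p, which is why the call below only searches a handful
# of features.
def l0_fit(X, Y, lam, max_support=3):
    p = X.shape[1]
    best_obj, best_beta = float(np.sum(Y ** 2)), np.zeros(p)  # start from the empty support
    for k in range(1, max_support + 1):
        for S in combinations(range(p), k):
            idx = list(S)
            coef, *_ = np.linalg.lstsq(X[:, idx], Y, rcond=None)
            obj = float(np.sum((Y - X[:, idx] @ coef) ** 2)) + lam * k
            if obj < best_obj:
                best_obj = obj
                best_beta = np.zeros(p)
                best_beta[idx] = coef
    return best_beta

beta_l0 = l0_fit(X[:, :15], Y, lam=1.0)  # only 15 candidate features, to stay tractable
```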

SLIDES 14-16

Choice of the penalization term

For $q > 0$, consider the estimators
$$\hat\beta_q = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \|Y - X\beta\|_2^2 + \lambda \|\beta\|_q^q \right\}.$$

  • If $q < 1$, the solution is sparse but the problem is nonconvex.
  • If $q > 1$, the problem is convex but the solution is not sparse.
  • If $q = 1$, the solution is sparse and the problem is convex.

SLIDES 17-18

Lasso, the $\ell_1$ norm

The Lasso estimator is defined by:
$$\hat\beta^{L} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{\|Y - X\beta\|_2^2}{2n} + \lambda \|\beta\|_1 \right\}.$$

Theorem (Dalalyan et al. (2014), On the Prediction Performance of the Lasso)

Let $\lambda = 2\sigma\sqrt{\frac{2\log(p/\delta)}{n}}$. Then, with probability at least $1 - \delta$,
$$\frac{\|X(\beta^\star - \hat\beta^{L})\|_2^2}{n} \;\le\; \inf_{\substack{\beta \in \mathbb{R}^p \\ s\text{-sparse}}} \left\{ \frac{\|X(\beta^\star - \beta)\|_2^2}{n} + \frac{10\, s\, \sigma^2 \log(p/\delta)}{n\, \kappa} \right\},$$
where $\kappa$ is a constant depending on the design matrix $X$.
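A hedged sketch of this estimator in code: the slide's objective coincides with scikit-learn's Lasso objective, $\frac{1}{2n}\|Y - X\beta\|_2^2 + \alpha\|\beta\|_1$, with $\alpha = \lambda$. The confidence level $\delta = 0.05$ below is an illustrative choice, and $\sigma$ is assumed known:

```python
from sklearn.linear_model import Lasso

# Theorem's tuning: lam = 2 * sigma * sqrt(2 * log(p / delta) / n).
delta = 0.05
lam = 2 * sigma * np.sqrt(2 * np.log(p / delta) / n)

# scikit-learn's Lasso minimizes ||Y - X b||^2 / (2n) + alpha * ||b||_1,
# which matches the slide's objective with alpha = lam.
lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, Y)
beta_lasso = lasso.coef_

prediction_loss_lasso = np.sum((X @ (beta_star - beta_lasso)) ** 2) / n
support_size = np.count_nonzero(beta_lasso)  # typically close to s
```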

SLIDES 19-21

EWA: definition

The Lasso estimator is a maximum a posteriori estimator with a Laplace prior:
$$\hat\beta^{L} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{\|Y - X\beta\|_2^2}{2n} + \lambda \|\beta\|_1 \right\} = \arg\max_{\beta \in \mathbb{R}^p} \underbrace{\exp\!\left( -\frac{\|Y - X\beta\|_2^2}{2\sigma^2} \right)}_{\propto\, \mathcal{N}(X\beta,\, \sigma^2 I_n)} \; \underbrace{\exp\!\left( -\frac{\lambda n}{\sigma^2} \|\beta\|_1 \right)}_{\propto\, \pi_0(\beta):\ \text{Laplace prior}}.$$

Let
$$V(\beta) = \frac{1}{2\sigma^2}\|Y - X\beta\|_2^2 + \frac{\lambda n}{\sigma^2}\|\beta\|_1, \qquad \hat\pi_T(\beta) \propto \exp\!\left( -\frac{V(\beta)}{T} \right).$$

We define the exponentially weighted average (EWA) estimator with Laplace prior by
$$\hat\beta^{\mathrm{EWA}} = \int_{\mathbb{R}^p} \beta\, \hat\pi_T(\beta)\, d\beta.$$
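The integral defining $\hat\beta^{\mathrm{EWA}}$ has no closed form; one standard way to approximate it is Langevin Monte Carlo on $\hat\pi_T$. The sketch below is not the authors' algorithm: it uses $\mathrm{sign}(\beta)$ as a subgradient of the non-smooth $\ell_1$ term, and the step size h, temperature T, iteration count and burn-in are illustrative guesses:

```python
# Unadjusted Langevin sampler for pi_T(beta) ∝ exp(-V(beta) / T), averaging
# the post-burn-in iterates as a Monte Carlo estimate of beta_EWA.
def ewa_langevin(X, Y, lam, sigma, T=1e-3, h=1e-6, n_iter=20000, burn_in=5000, seed=1):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    total, kept = np.zeros(p), 0
    for k in range(n_iter):
        # (sub)gradient of V(beta) = ||Y - X beta||^2 / (2 sigma^2) + (lam n / sigma^2) ||beta||_1
        grad_V = X.T @ (X @ beta - Y) / sigma**2 + (lam * n / sigma**2) * np.sign(beta)
        # Langevin step for the tempered density exp(-V / T).
        beta = beta - (h / T) * grad_V + np.sqrt(2 * h) * rng.standard_normal(p)
        if k >= burn_in:
            total += beta
            kept += 1
    return total / kept  # estimate of beta_EWA = E_{pi_T}[beta]

beta_ewa = ewa_langevin(X, Y, lam, sigma)
```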

SLIDE 22

Results

Theorem. Let $\lambda = 2\sigma\sqrt{\frac{2\log(p/\delta)}{n}}$. Then, with probability at least $1 - \delta$,
$$\frac{\|X(\beta^\star - \hat\beta^{\mathrm{EWA}})\|_2^2}{n} \;\le\; \inf_{\substack{\beta \in \mathbb{R}^p \\ s\text{-sparse}}} \left\{ \frac{\|X(\beta^\star - \beta)\|_2^2}{n} + \frac{10\, s\, \sigma^2 \log(p/\delta)}{n\, \kappa} \right\} + 2H(T),$$
where
$$H(T) = pT - \int G(\beta)\, \hat\pi_T(\beta)\, d\beta + G(\hat\beta^{\mathrm{EWA}}), \qquad G(\beta) = \frac{1}{n}\|X\beta\|_2^2 + \lambda\|\beta\|_1.$$
Since $G$ is convex, Jensen's inequality gives $G(\hat\beta^{\mathrm{EWA}}) = G\big(\int \beta\, \hat\pi_T(\beta)\, d\beta\big) \le \int G(\beta)\, \hat\pi_T(\beta)\, d\beta$, hence $H(T) \le pT$.

SLIDES 23-27

The choice of T

  • If $T = 0$, then $\hat\beta^{L} = \hat\beta^{\mathrm{EWA}}$.
  • We are interested in $T < 1/p$; recall that $H(T) \le pT$.
  • The larger $T$ is, the larger the variance of the posterior $\hat\pi_T$.
  • We believe that this variance brings robustness to the choice of $\lambda$ (a small numerical scan is sketched after this list).
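A hypothetical way to probe this robustness claim numerically, reusing the Langevin sketch above. The deliberately misspecified $\lambda$ and the grid of temperatures are illustrative; the step size is scaled with T so the drift term (h/T) * grad_V keeps the same magnitude:

```python
# Fix the data, halve lam on purpose, and scan a few temperatures T.
for T in (1e-4, 1e-3, 1e-2):
    b = ewa_langevin(X, Y, lam=0.5 * lam, sigma=sigma, T=T, h=1e-3 * T)
    loss = np.sum((X @ (beta_star - b)) ** 2) / n
    print(f"T={T:g}: prediction loss = {loss:.3f}")
```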

SLIDES 28-30

Conclusion & questions

Results:

  • EWA with a Laplace prior defines a family of estimators that includes the Lasso.
  • A sharp oracle inequality holds for this family of estimators.

Questions:

  • What is a good value of $T$?
  • Can we prove a result on the robustness to the choice of $\lambda$?
  • Can we compute this estimator efficiently?

Thank you!