SLIDE 1

The Illusion of the Illusion of Sparsity²

Bruno Fava¹ and Hedibert F. Lopes²

¹Northwestern University, Illinois, USA
²Professor of Statistics and Econometrics, Head of the Center of Statistics, Data Science and Decision, INSPER, São Paulo, Brazil

August/September 2020

²Giannone, Lenza and Primiceri (2020) Economic predictions with big data: the illusion of sparsity. Our manuscript and these slides can be found on my page at hedibert.org
SLIDE 2

Outline

Motivation

Sparsity in static regressions
  • Ridge and lasso regressions
  • Spike and slab model (or SMN model)
  • SSVS and scaled SSVS priors
  • Other mixture priors
  • Toy example: R package bayeslm

Revisiting GLP
  • The sparse-inducing linear model
  • Their findings
  • An important drawback

Experiments
  • I. Adding meaningless variables
  • II. Fatter tails via Student's t
  • III. A simulation exercise
SLIDE 3

Outline

Motivation

Sparsity in static regressions
  • Ridge and lasso regressions
  • Spike and slab model (or SMN model)
  • SSVS and scaled SSVS priors
  • Other mixture priors
  • Toy example: R package bayeslm

Revisiting GLP
  • The sparse-inducing linear model
  • Their findings
  • An important drawback

Experiments
  • I. Adding meaningless variables
  • II. Fatter tails via Student's t
  • III. A simulation exercise
SLIDE 4

Sparsity in Economics

We revisit the paper Economic predictions with big data: the illusion of sparsity by Giannone, Lenza and Primiceri, whose July 2020 abstract says:

"We compare sparse and dense representations of predictive models in macroeconomics, microeconomics and finance. To deal with a large number of possible predictors, we specify a prior that allows for both variable selection and shrinkage. The posterior distribution does not typically concentrate on a single sparse model, but on a wide set of models that often include many predictors."

They conclude the paper saying:

"In economics, there is no theoretical argument suggesting that predictive models should in general include only a handful of predictors. As a consequence, the use of low-dimensional model representations can be justified only when supported by strong statistical evidence."

They add that:

"Empirical support for low-dimensional models is generally weak. Predictive model uncertainty seems too pervasive to be treated as statistically negligible. The right approach to scientific reporting is thus to assess and fully convey this uncertainty, rather than understating it through the use of dogmatic (prior) assumptions favoring low dimensional models."

SLIDE 5

Our contribution

We propose a revision of the methods adopted by Giannone, Lenza and Primiceri.

◮ We analyze the posterior distribution of the included coefficients of the linear model. This was not explored by Giannone, Lenza and Primiceri.

◮ We add bogus predictors and observe correct exclusion only in a subset of the data sets.

◮ We extend their analysis with a Student's t prior for the regression coefficients. The heavier-tailed distribution was more restrictive in selecting possible predictors, and the results once again corroborate the thesis that the original spike-and-slab prior is unable to correctly distinguish between shrinkage and sparsity.

◮ We develop a simulation exercise to check the performance of the original model and of the Student's t modification in a fully controlled environment. Posterior inference reinforces the belief that their prior incorrectly induces shrinkage.

Overall conclusion: Their spike-and-slab approach does not seem to be robust, leading to the illusion that sparsity is nonexistent, when it might in fact exist.

SLIDE 6

Outline

Motivation

Sparsity in static regressions
  • Ridge and lasso regressions
  • Spike and slab model (or SMN model)
  • SSVS and scaled SSVS priors
  • Other mixture priors
  • Toy example: R package bayeslm

Revisiting GLP
  • The sparse-inducing linear model
  • Their findings
  • An important drawback

Experiments
  • I. Adding meaningless variables
  • II. Fatter tails via Student's t
  • III. A simulation exercise
SLIDE 7

Ridge and lasso regressions

Throughout, we consider the standard Gaussian linear model
$$y_t = \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_q x_{qt} + \nu_t,$$
and let $\mathrm{RSS} = (y - X\beta)'(y - X\beta)$ denote the residual sum of squares.

◮ Ridge regression, Hoerl and Kennard [1970], with $\ell_2$ penalty on $\beta$:
$$\hat{\beta}_{ridge} = \arg\min_{\beta} \left\{ \mathrm{RSS} + \lambda_r^2 \sum_{j=1}^{q} \beta_j^2 \right\}, \qquad \lambda_r^2 \geq 0,$$
leading to the closed form $\hat{\beta}_{ridge} = (X'X + \lambda_r^2 I_q)^{-1} X'y$.

◮ Lasso regression, Tibshirani [1996], with $\ell_1$ penalty on $\beta$:
$$\hat{\beta}_{lasso} = \arg\min_{\beta} \left\{ \mathrm{RSS} + \lambda_l \sum_{j=1}^{q} |\beta_j| \right\}, \qquad \lambda_l \geq 0,$$
which has no closed form but can be solved by a coordinate gradient descent algorithm.

SLIDE 8

Ridge and lasso estimates are posterior modes!

The posterior mode, or maximum a posteriori (MAP) estimate, is
$$\tilde{\beta}_{mode} = \arg\min_{\beta} \{-2 \log p(y|\beta) - 2 \log p(\beta)\}.$$

The $\hat{\beta}_{ridge}$ estimate equals the posterior mode of the normal linear model with
$$p(\beta_j) \propto \exp\{-0.5 \lambda_r^2 \beta_j^2\},$$
which is a Gaussian distribution with location 0 and scale $1/\lambda_r^2$, i.e. $N(0, 1/\lambda_r^2)$. The mean is 0, the variance is $1/\lambda_r^2$ and the excess kurtosis is 0.

The $\hat{\beta}_{lasso}$ estimate equals the posterior mode of the normal linear model with
$$p(\beta_j) \propto \exp\{-0.5 \lambda_l |\beta_j|\},$$
which is a Laplace distribution with location 0 and scale $2/\lambda_l$, i.e. Laplace$(0, 2/\lambda_l)$. The mean is 0, the variance is $8/\lambda_l^2$ and the excess kurtosis is 3.
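To make the lasso equivalence explicit, here is the one-line check (a step we add for clarity, taking σ² = 1 so that −2 log p(y|β) = RSS + const):
$$-2 \log p(y|\beta) - 2 \log p(\beta) = \mathrm{RSS} + \lambda_l \sum_{j=1}^{q} |\beta_j| + \text{const},$$
which is exactly the lasso criterion. And since a Laplace distribution with scale $b$ has variance $2b^2$, the scale $b = 2/\lambda_l$ gives $V(\beta_j) = 2(2/\lambda_l)^2 = 8/\lambda_l^2$.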

SLIDE 9

Spike and slab model (or scale mixture of normals)

Ishwaran and Rao [2005] define a spike and slab model as a Bayesian model specified by the following prior hierarchy:
$$(y_t|x_t, \beta, \sigma^2) \sim N(x_t'\beta, \sigma^2), \qquad t = 1, \ldots, n,$$
$$(\beta|\psi) \sim N(0, \mathrm{diag}(\psi)),$$
$$\psi \sim \pi(d\psi),$$
$$\sigma^2 \sim \mu(d\sigma^2).$$

They go on to say that "Lempers [1988] and Mitchell and Beauchamp [1988] were among the earliest to pioneer the spike and slab method. The expression 'spike and slab' referred to the prior for β used in their hierarchical formulation."

SLIDE 10

Spike and slab model (or scale mixture of normals model)

Regularization and variable selection are done by assigning independent prior distributions from the SMN class to each coefficient $\beta_j$:
$$\beta_j|\psi_j \sim N(0, \psi_j) \quad \text{and} \quad \psi_j \sim p(\psi_j), \quad \text{so} \quad p(\beta_j) = \int p(\beta_j|\psi_j)\, p(\psi_j)\, d\psi_j.$$

Mixing density p(ψj)       Marginal density p(βj)       V(βj)         Ex.kurtosis(βj)
ψj = 1/λr² (point mass)    N(0, 1/λr²) (ridge)          1/λr²         0
IG(η/2, ητ²/2)             tη(0, τ²)                    η/(η−2)·τ²    6/(η−4)
G(1, λl²/8)                Laplace(0, 2/λl) (blasso)    8/λl²         3
G(ζ, 1/(2γ²))              NG(ζ, γ²)                    2ζγ²          3/ζ

Griffin and Brown [2010] Normal-Gamma prior:
$$p(\beta|\zeta, \gamma^2) = \frac{1}{\sqrt{\pi}\, 2^{\zeta-1/2}\, \gamma^{\zeta+1/2}\, \Gamma(\zeta)}\, |\beta|^{\zeta-1/2}\, K_{\zeta-1/2}(|\beta|/\gamma),$$
where $K$ is the modified Bessel function of the third kind.
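As a sanity check on the blasso row of the table, one can simulate from the scale mixture and compare moments against the Laplace(0, 2/λl) marginal; a minimal sketch in R (the value of λl is arbitrary):

  # Check: psi ~ Gamma(1, rate = lambda^2/8) mixed with beta|psi ~ N(0, psi)
  # gives the Laplace(0, 2/lambda) marginal: variance 8/lambda^2, excess kurtosis 3.
  set.seed(123)
  lambda <- 2       # arbitrary penalty value
  M <- 1e6          # Monte Carlo sample size
  psi  <- rgamma(M, shape = 1, rate = lambda^2 / 8)
  beta <- rnorm(M, 0, sqrt(psi))

  c(var.mc = var(beta), var.theory = 8 / lambda^2)
  exkurt <- mean((beta - mean(beta))^4) / var(beta)^2 - 3   # excess kurtosis
  c(exkurt.mc = exkurt, exkurt.theory = 3)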

SLIDE 11

Illustration

Ridge: λr² = 0.01 ⇒ excess kurtosis = 0
Student's t: η = 5, τ² = 60 ⇒ excess kurtosis = 6
Blasso: λl² = 0.08 ⇒ excess kurtosis = 3
NG: ζ = 0.5, γ² = 100 ⇒ excess kurtosis = 6

All variances are equal to 100.

[Figure: prior densities (left) and log densities (right) of the ridge, Student's t, blasso and NG priors.]

SLIDE 12

Stochastic search variable selection (SSVS) prior

SSVS, George and McCulloch [1993]: for small $\tau > 0$ and $c \gg 1$,
$$\beta|\omega, \tau^2, c^2 \sim \underbrace{(1 - \omega)\, N(0, \tau^2)}_{\text{spike}} + \underbrace{\omega\, N(0, c^2\tau^2)}_{\text{slab}}.$$

SMN representation:
$$\beta|\psi \sim N(0, \psi) \quad \text{and} \quad \psi|\omega, \tau^2, c^2 \sim (1 - \omega)\,\delta_{\tau^2}(\psi) + \omega\,\delta_{c^2\tau^2}(\psi).$$

SLIDE 13

Scaled SSVS prior = normal mixture of IG prior

NMIG prior of Ishwaran and Rao [2005]: for $\upsilon_0 \ll \upsilon_1$,
$$\beta|K, \tau^2 \sim N(0, K\tau^2), \quad K|\omega, \upsilon_0, \upsilon_1 \sim (1 - \omega)\,\delta_{\upsilon_0}(K) + \omega\,\delta_{\upsilon_1}(K), \quad \tau^2 \sim IG(a_\tau, b_\tau). \qquad (1)$$

◮ Large ω implies non-negligible effects.
◮ The scale $\psi = K\tau^2 \sim (1 - \omega)\, IG(a_\tau, \upsilon_0 b_\tau) + \omega\, IG(a_\tau, \upsilon_1 b_\tau)$.
◮ p(β) is a two-component mixture of scaled Student's t distributions.
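The last bullet is easy to verify by simulation; a small sketch in R, with arbitrary hyperparameter values of our own choosing (given K = υ, β is t with 2aτ degrees of freedom and scale √(υbτ/aτ)):

  # Monte Carlo check that the NMIG hierarchy (1) yields a two-component
  # mixture of scaled Student's t distributions.
  set.seed(42)
  M <- 1e6
  a.tau <- 5; b.tau <- 5; v0 <- 0.005; v1 <- 1; omega <- 0.5

  tau2 <- 1 / rgamma(M, shape = a.tau, rate = b.tau)   # tau^2 ~ IG(a.tau, b.tau)
  K    <- ifelse(runif(M) < omega, v1, v0)             # K ~ (1-w) d_{v0} + w d_{v1}
  beta <- rnorm(M, 0, sqrt(K * tau2))                  # beta | K, tau^2 ~ N(0, K tau^2)

  # Analytic mixture CDF: beta | K = v is t_{2 a.tau} scaled by sqrt(v * b.tau / a.tau)
  pmix <- function(x) {
    s0 <- sqrt(v0 * b.tau / a.tau); s1 <- sqrt(v1 * b.tau / a.tau)
    (1 - omega) * pt(x / s0, df = 2 * a.tau) + omega * pt(x / s1, df = 2 * a.tau)
  }
  x <- c(-2, -0.5, 0.5, 2)
  rbind(empirical = sapply(x, function(u) mean(beta <= u)), analytic = pmix(x))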

SLIDE 14

Other mixture priors

Frühwirth-Schnatter and Wagner [2011] consider absolutely continuous priors
$$\beta \sim (1 - \omega)\, p_{spike}(\beta) + \omega\, p_{slab}(\beta). \qquad (2)$$

Let $Q > 0$ be a scale parameter and $r = \mathrm{Var}_{spike}(\beta)/\mathrm{Var}_{slab}(\beta) \ll 1$. Then the mixing densities for ψ,

  • 1. IG: ψ ∼ (1 − ω) IG(ν, rQ) + ω IG(ν, Q),
  • 2. Exp: ψ ∼ (1 − ω) Exp(1/(2rQ)) + ω Exp(1/(2Q)),
  • 3. Gamma: ψ ∼ (1 − ω) G(a, 1/(2rQ)) + ω G(a, 1/(2Q)),

lead to the marginal densities for β,

  • 1. Scaled-t: β ∼ (1 − ω) t2ν(0, rQ/ν) + ω t2ν(0, Q/ν),
  • 2. Laplace: β ∼ (1 − ω) Lap(√(rQ)) + ω Lap(√Q),
  • 3. NG: β ∼ (1 − ω) NG(a, rQ) + ω NG(a, Q).
SLIDE 15

Inverted-Gamma prior for the variance of β

It is easy to see that, for a constant c, $\mathrm{Var}_{spike}(\beta) = cQr$ and $\mathrm{Var}_{slab}(\beta) = cQ$. Therefore, when
$$v_\beta = \mathrm{Var}(\beta) = (1 - \omega)\,\mathrm{Var}_{spike}(\beta) + \omega\,\mathrm{Var}_{slab}(\beta) \sim IG(c_0, C_0),$$
the implied distribution of Q is
$$Q \sim IG\left(c_0, \frac{C_0}{c\,((1 - \omega)r + \omega)}\right).$$

Spike-and-slab priors:

Prior           Spike          Slab          p(β)                                  Constant c
SSVS            ψ = rQ         ψ = Q         (1−ω) N(0, rQ) + ω N(0, Q)            1
NMIG            IG(ν, rQ)      IG(ν, Q)      (1−ω) t2ν(0, rQ/ν) + ω t2ν(0, Q/ν)    1/(ν−1)
Laplaces        Exp(1/(2rQ))   Exp(1/(2Q))   (1−ω) Lap(√(rQ)) + ω Lap(√Q)          2
Normal-Gammas   G(a, 1/(2rQ))  G(a, 1/(2Q))  (1−ω) NG(a, rQ) + ω NG(a, Q)          2a
Laplace-t       Exp(1/(2rQ))   IG(ν, Q)      (1−ω) Lap(√(rQ)) + ω t2ν(0, Q/ν)      c1 = 2, c2 = 1/(ν−1)

SLIDE 16

Toy example: R package bayeslm

For observation $i = 1, \ldots, n = 68$ and predictor $j = 1, \ldots, k = 16$, we simulate
$$x_{ij} \sim N(0, 1) \quad \text{and} \quad \varepsilon_i^* \sim N(0, 1).$$
We also fix $\beta_1 = -0.86$, $\beta_2 = 0.64$ and $\beta_3 = 0.89$, while the response variable is
$$y_i^{(s)} = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \sigma_\varepsilon^{(s)} \varepsilon_i^*, \qquad \sigma_\varepsilon^{(s)} = 0.75s, \quad s = 1, 2.$$

MCMC set-up: N = 2000 draws after a burn-in of 10000.
Monte Carlo error: R = 20 replicates.

SLIDE 17

Ridge, Laplace and horseshoe priors

[Figure: boxplots, across the R = 20 replicates, of the posterior 2.5%, 50% and 97.5% quantiles of β1, ..., β16 under the ridge, lasso and horseshoe priors, for σ = 0.75 (top row) and σ = 1.5 (bottom row).]

SLIDE 18

Toy example: R script

install.packages("bayeslm"); library("bayeslm")
n = 68; k = 16; betas = c(-0.86, 0.64, 0.89, rep(0, k - 3)); sigs = c(0.75, 1.5)
N = 2000; burnin = 10000; R = 20
qs = c(0.025, 0.5, 0.975)
J = length(sigs); quants = array(0, c(R, J, 3, k, 3))
set.seed(54321)
for (r in 1:R) {
  for (j in 1:J) {
    X = matrix(rnorm(n * k), n, k)
    y = rnorm(n, X %*% betas, sigs[j])
    fit.hs    = bayeslm(y, X, prior = 'horseshoe', N = N, burnin = burnin, icept = FALSE)
    fit.ridge = bayeslm(y, X, prior = 'ridge',     N = N, burnin = burnin, icept = FALSE)
    fit.lasso = bayeslm(y, X, prior = 'laplace',   N = N, burnin = burnin, icept = FALSE)
    # posterior 2.5%, 50% and 97.5% quantiles of each coefficient, per prior
    quants[r, j, 1, , ] = t(apply(fit.hs$beta,    2, quantile, qs))
    quants[r, j, 2, , ] = t(apply(fit.ridge$beta, 2, quantile, qs))
    quants[r, j, 3, , ] = t(apply(fit.lasso$beta, 2, quantile, qs))
  }
}
method = c("horseshoe", "ridge", "lasso")
par(mfrow = c(2, 3))
for (i in 1:2) for (j in c(2, 3, 1)) {
  boxplot(quants[, i, j, , 1], names = 1:k, ylim = c(-1.5, 1.5), outline = FALSE, col = gray(0.8),
          xlab = "Variable", main = paste(method[j], "\n sig=", sigs[i], sep = ""))
  abline(h = 0, col = 4, lwd = 2)
  for (l in 3:2) boxplot(quants[, i, j, , l], names = rep("", k), outline = FALSE, col = l, add = TRUE)
  points(1:3, betas[1:3], col = 5, pch = 16)   # true nonzero coefficients
}

SLIDE 19

A few additional references

Park and Casella (2008) The Bayesian lasso. JASA, 103(482), 681-686.
Carvalho, Polson and Scott (2010) The horseshoe estimator for sparse signals. Biometrika, 97(2), 465-480.
Polson and Scott (2010) Shrink globally, act locally: Sparse Bayesian regularization and prediction. Bayesian Statistics, Volume 9, 501-538.
Polson and Scott (2012) Local shrinkage rules, Lévy processes and regularized regression. JRSS-B, 74(2), 287-311.
van der Pas, Kleijn and van der Vaart (2014) The horseshoe estimator: Posterior concentration around nearly black vectors. Electronic Journal of Statistics, 8, 2585-2618.
Bhattacharya, Pati, Pillai and Dunson (2015) Dirichlet-Laplace priors for optimal shrinkage. JASA, 110, 1479-1490.
Makalic and Schmidt (2016) A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters, 23(1), 179-182.
Bhadra, Datta, Polson and Willard (2017) The Horseshoe+ estimator of ultra-sparse signals. Bayesian Analysis, 12(4), 1105-1131.
Ročková and George (2018) The Spike-and-Slab LASSO. JASA, 113(521), 431-444.
Hahn, He and Lopes (2019) Efficient sampling for Gaussian linear regression with arbitrary priors. JCGS, 28, 142-154.

SLIDE 20

Outline

Motivation

Sparsity in static regressions
  • Ridge and lasso regressions
  • Spike and slab model (or SMN model)
  • SSVS and scaled SSVS priors
  • Other mixture priors
  • Toy example: R package bayeslm

Revisiting GLP
  • The sparse-inducing linear model
  • Their findings
  • An important drawback

Experiments
  • I. Adding meaningless variables
  • II. Fatter tails via Student's t
  • III. A simulation exercise
SLIDE 21

GLP spike-and-slab prior

Let $y_t$ be the response variable and $x_t$ the k-dimensional vector of potential explanatory variables. The Gaussian linear model is
$$y_t = x_t'\beta + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2).$$
The prior specification for $\sigma^2$ is $p(\sigma^2) \propto 1/\sigma^2$, and the prior for $\beta_i$ is
$$\beta_i|\sigma^2, \gamma^2, q \sim \begin{cases} N(0, \sigma^2\gamma^2) & \text{with prob. } q \\ 0 & \text{with prob. } 1 - q \end{cases} \qquad i = 1, \ldots, k.$$

q governs the degree of sparsity. γ governs the degree of shrinkage.

SLIDE 22

Hyperprior of (q, γ2)

Instead of setting a hyperprior for $(q, \gamma^2)$ directly, GLP define a prior for the pair $(q, R^2)$, where
$$R^2(\gamma^2, q) \equiv \frac{qk\gamma^2}{qk\gamma^2 + 1}$$
is the coefficient of determination. The hyperprior distributions are
$$q \sim \mathrm{Beta}(1, 1) \quad \text{and} \quad R^2 \sim \mathrm{Beta}(1, 1).$$
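Inverting the definition gives the implied scale $\gamma^2 = R^2/(qk(1 - R^2))$, so draws of (q, R²) translate directly into draws of γ. A small sketch in R of the implied marginal prior of γ (presumably how figures like the next slide's are produced, though the exact recipe here is our own assumption):

  # Implied marginal prior of gamma under q ~ Beta(1,1), R2 ~ Beta(1,1),
  # using gamma^2 = R2 / (q * k * (1 - R2)).
  set.seed(10)
  M  <- 1e5
  q  <- rbeta(M, 1, 1)
  R2 <- rbeta(M, 1, 1)
  for (k in c(20, 100, 500)) {
    gam <- sqrt(R2 / (q * k * (1 - R2)))
    print(c(k = k, median = median(gam), q90 = unname(quantile(gam, 0.9))))
  }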

SLIDE 23

Marginal prior of γ: p(γ|k)

[Figure: marginal prior p(γ|k) as a function of the number of variables, k ∈ {20, 50, 100, 200, 500}, with γ on the vertical axis.]

SLIDE 24

p(1 − q|γ) and p(γ)

[Figure: joint prior of (γ, 1 − q), Pr(q|gamma, k=20) and Pr(q|gamma, k=500), together with the marginal prior of γ in each case.]

SLIDE 25

p(β|k, σ = 1)

[Figure: marginal prior density p(β|k, σ = 1) for k = 20, 50, 100 and 500.]

slide-26
SLIDE 26

GLP Macro and finance data sets

Table 1: Description of the datasets.

Macro 1. Dependent variable: monthly growth rate of US industrial production. Possible predictors: 130 lagged macroeconomic indicators. Sample: 659 monthly time-series observations, from February 1960 to December 2014.

Macro 2. Dependent variable: average growth rate of GDP over the sample 1960-1985. Possible predictors: 60 socio-economic, institutional and geographical characteristics, measured at pre-60s values. Sample: 90 cross-sectional country observations.

Finance 1. Dependent variable: US equity premium (S&P 500). Possible predictors: 16 lagged financial and macroeconomic indicators. Sample: 68 annual time-series observations, from 1948 to 2015.

Finance 2. Dependent variable: stock returns of US firms. Possible predictors: 144 dummies classifying a stock as very low, low, high or very high in terms of 36 lagged characteristics. Sample: 1,400k panel observations for an average of 2250 stocks over a span of 624 months, from July 1963 to June 2015.

Source: [Giannone et al., 2020, p. 15]

SLIDE 27

GLP Micro data sets

Table 1 (continued): Description of the datasets.

Micro 1. Dependent variable: per-capita crime (murder) rates. Possible predictors: effective abortion rate and 284 controls, including possible covariates of crime and their transformations. Sample: 576 panel observations for 48 US states over a span of 144 months, from January 1986 to December 1997.

Micro 2. Dependent variable: number of pro-plaintiff eminent domain decisions in a specific circuit and in a specific year. Possible predictors: characteristics of judicial panels capturing aspects related to gender, race, religion, political affiliation, education and professional history of the judges, together with some interactions among the latter, for a total of 138 regressors. Sample: 312 panel circuit/year observations, from 1975 to 2008.

Source: [Giannone et al., 2020, p. 15]

SLIDE 28

Their main remarks

The conclusion is that a clear pattern of sparsity is found only in the Micro 1 data set, in which only one variable is included most of the time. For all other data sets, one is incapable of determining which variables should be included, as many have a high estimated probability of inclusion ⇒ dense models.

Their conclusion: sparsity cannot be assumed for any economic data set, unless in the presence of strong statistical evidence; they suggest an "illusion of sparsity" arises when using statistical models that assume (and force) sparsity.

SLIDE 29

An important drawback

Finance 1 data set:

[Figure: per-predictor posterior summaries for the Finance 1 data set. Inc: probability of inclusion. G0: probability above zero.]

SLIDE 30

A drawback of their approach

The spike-and-slab prior, as defined, seems to be inducing shrinkage by including predictors with near-zero coefficients. Example: β5 and (β9, β12, β16).

◮ Probability of inclusion near 0.5, but also about 0.4/0.6 probability above/below zero.
◮ An economist drawing inference from this regression could thus very easily exclude variable 5, yet keep variables 9, 12 and 16.

SLIDE 31

Outline

Motivation

Sparsity in static regressions
  • Ridge and lasso regressions
  • Spike and slab model (or SMN model)
  • SSVS and scaled SSVS priors
  • Other mixture priors
  • Toy example: R package bayeslm

Revisiting GLP
  • The sparse-inducing linear model
  • Their findings
  • An important drawback

Experiments
  • I. Adding meaningless variables
  • II. Fatter tails via Student's t
  • III. A simulation exercise
SLIDE 32

I. Adding meaningless variables

We re-run the estimation algorithm for all five datasets, but now include two additional regressors that are completely randomly generated. Their inclusion probabilities are:

  • Micro 1: 1.6% and 3.9%
  • Macro 1: 12.2% and 21.1%
  • Micro 2: 20.0% and 18.7%
  • Macro 2: 56.1% and 55.2% (57th and 58th most included out of 62)
  • Finance 1: 71.0% and 48.4% (3rd and 18th most included out of 18)
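The mechanics are simple; a minimal sketch in R (the function name is ours, and the GLP estimation code itself is not shown in these slides):

  # Append two purely random ("bogus") regressors before re-running estimation.
  # X is the n x k predictor matrix of a given dataset.
  add_bogus <- function(X, n_bogus = 2) {
    cbind(X, matrix(rnorm(nrow(X) * n_bogus), nrow(X), n_bogus))
  }
  # The inclusion frequency of the last n_bogus columns is then monitored
  # across posterior draws.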

SLIDE 33

I. Adding meaningless variables

Finance 1 data set (n = 68): here x17 and x18 are meaningless.

Similar shapes: β18 and (β4, β5, β15). High inclusion: x17 is included 71% of the time.

SLIDE 34

II. Fatter tails via Student's t

New prior:
$$\beta_i|\sigma^2, \gamma^2, \lambda_i^2, q \sim \begin{cases} N(0, \sigma^2\gamma^2\lambda_i^2) & \text{with prob. } q \\ 0 & \text{with prob. } 1 - q \end{cases} \qquad i = 1, \ldots, k,$$
with an Inverse-Gamma prior for $\lambda_i^2$:
$$\lambda_i^2 \sim IG\left(\frac{\nu}{2}, \frac{\nu}{2}\right).$$

Therefore, $\beta_i$ follows a Student's t distribution:
$$\beta_i|\sigma^2, \gamma^2, q \sim \begin{cases} t_\nu(0, \sigma^2\gamma^2) & \text{with prob. } q \\ 0 & \text{with prob. } 1 - q \end{cases} \qquad i = 1, \ldots, k,$$
where
$$V(\beta_i|\sigma^2, \gamma^2, q) = \frac{\nu}{\nu - 2}\,\sigma^2\gamma^2.$$
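The scale-mixture representation above is easy to verify by simulation; a minimal sketch in R (the values of ν, σ and γ are arbitrary):

  # Check: lambda_i^2 ~ IG(nu/2, nu/2) mixed with N(0, sigma^2 gamma^2 lambda_i^2)
  # gives the t_nu(0, sigma^2 gamma^2) marginal.
  set.seed(99)
  M <- 1e6; nu <- 4; sigma <- 1; gamma <- 0.5
  lambda2 <- 1 / rgamma(M, shape = nu / 2, rate = nu / 2)   # IG(nu/2, nu/2) draws
  beta <- rnorm(M, 0, sigma * gamma * sqrt(lambda2))

  # Compare empirical quantiles with those of the scaled t_nu distribution
  p <- c(0.9, 0.99)
  rbind(empirical = quantile(beta, p), theory = sigma * gamma * qt(p, df = nu))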

SLIDE 35

II. Fatter tails via Student's t - Macro 1

x72 and x90 are both relevant for ν > 10; only x90 for ν ≤ 10 (sparsity re-emerges). The probability of inclusion decreases as ν increases.

Argument: the spike-and-slab prior, as originally defined, induces both selection and shrinkage, since for ν = 4 only 7 of the 130 available predictors are relevant, that is, included more than 50% of the time.

SLIDE 36

II. Fatter tails via Student's t - Micro 2

Gaussian prior: no pattern of variable selection; 106 of the 138 predictors are selected more than 50% of the time.

Student's t prior: sparsity in action; for ν = 4 only 30 predictors are selected, and for ν = 10 only 34.

SLIDE 37

II. Fatter tails via Student's t - Macro 2 & Finance 1

Results are similar across values of ν.

SLIDE 38

III. A simulation exercise

For observation $i = 1, \ldots, n = 68$ and predictor $j = 1, \ldots, k = 16$, we simulate
$$x_{ij} \sim N(0, 1) \quad \text{and} \quad \varepsilon_i^* \sim N(0, 1).$$
We also fix $\beta_1 = -0.86$, $\beta_2 = 0.64$ and $\beta_3 = 0.89$, while the response variable is
$$y_i^{(s)} = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \sigma_\varepsilon^{(s)} \varepsilon_i^*, \qquad \sigma_\varepsilon^{(s)} = 0.75s, \quad s = 1, 2, 3.$$

The priors for β are Gaussian or Student's t with ν = 4 degrees of freedom. We replicate the above simulation R = 20 times.

SLIDE 39

III. Probability of inclusion

◮ σ ↑: inclusion of x1, x2, x3 decreases, more so for the Student's t case.
◮ σ ↑: inclusion of x4, ..., x16 increases, more so for the Gaussian case.

[Figure: probability of inclusion against σ ∈ {0.75, 1.5, 3} under the Gaussian and Student's t priors, shown separately for x1, x2, x3 and for x4, ..., x16.]

SLIDE 40

III. Probability above zero

[Figure: probability above zero against σ ∈ {0.75, 1.5, 3} under the Gaussian and Student's t priors, shown separately for x1, x2, x3 and for x4, ..., x16.]

SLIDE 41

III. Proportion of β4, ..., β16 classified as relevant

For σ large, the Student's t prior performs better at shrinking towards zero.

[Figure: proportion classified as relevant against the G0 cut-off (0.70 to 1.00) used for classifying a coefficient as relevant, for small and large σ under the Gaussian and Student's t priors.]

SLIDE 42

References

Sylvia Frühwirth-Schnatter and Hedibert F. Lopes. Sparse Bayesian factor analysis when the number of factors is unknown. Technical report, 2018.
Sylvia Frühwirth-Schnatter and Helga Wagner. Bayesian variable selection for random intercept modeling of Gaussian and non-Gaussian data. Bayesian Statistics 9, 9:165, 2011.
Edward I. George and Robert E. McCulloch. Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881-889, 1993.
Domenico Giannone, Michele Lenza, and Giorgio Primiceri. Economic predictions with big data: The illusion of sparsity. SSRN Electronic Journal, July 2020. doi: 10.2139/ssrn.3166281.
Jim Griffin and Philip Brown. Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis, 5(1):171-188, 2010.
Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55-67, 1970.
Hemant Ishwaran and J. Sunil Rao. Spike and slab variable selection: frequentist and Bayesian strategies. Annals of Statistics, pages 730-773, 2005.
Gregor Kastner, Sylvia Frühwirth-Schnatter, and Hedibert F. Lopes. Efficient Bayesian inference for multivariate factor stochastic volatility models. Journal of Computational and Graphical Statistics, 26:905-917, 2017.
F. B. Lempers. Posterior Probabilities of Alternative Linear Models. Rotterdam University Press, 1988.
T. J. Mitchell and J. J. Beauchamp. Bayesian variable selection in linear regression (with discussion). Journal of the American Statistical Association, 83:1023-1036, 1988.
Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267-288, 1996.