Ratemaking application of Bayesian LASSO with conjugate hyperprior


SLIDE 1

Ratemaking application of Bayesian LASSO with conjugate hyperprior

Himchan Jeong and Emiliano A. Valdez

University of Connecticut

Actuarial Science Seminar Department of Mathematics University of Illinois at Urbana-Champaign 26 October 2018

Jeong/Valdez (U. of Connecticut) Bayesian LASSO with conjugate hyperprior 26 Oct 2018 1 / 31

SLIDE 2

Outline of talk

Introduction
  Regularization or penalized least squares
  Bayesian LASSO
Bayesian LASSO with conjugate hyperprior
  LAAD penalty
  Comparing the different penalty functions
  Optimization routine
Model calibration
  The two-part model
  Data
  Estimation results: frequency
  Validation: frequency
  Estimation results: average severity
  Validation: average severity
Conclusion

SLIDE 3

Introduction Regularization or penalized least squares

Regularization or least squares penalty

Lq penalty function:

β̃ = argmin_β { ||Y − Xβ||² + λ||β||_q },

where λ is the regularization (penalty) parameter and ||β||_q = Σ_{j=1}^p |β_j|^q.

Special cases include:

LASSO (Least Absolute Shrinkage and Selection Operator): q = 1
Ridge regression: q = 2

The interpretation is that unreasonable values of β are penalized. The LASSO optimization problem can equivalently be written in constrained form:

min_β ||Y − Xβ||² subject to Σ_{j=1}^p |β_j| = ||β||₁ ≤ t.

See Tibshirani (1996)
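Under an orthonormal design (X′X = I) both special cases reduce to coordinate-wise problems with closed-form solutions: ridge shrinks every coordinate proportionally, while LASSO soft-thresholds and can return exact zeros. A minimal sketch (Python, not from the slides; the λ/2 in the soft threshold comes from the unsquared objective ||Y − Xβ||² + λ||β||₁):

```python
import numpy as np

def ridge_coord(z, lam):
    """Minimizer of (z - b)^2 + lam * b^2: proportional shrinkage, never exactly zero."""
    return z / (1.0 + lam)

def lasso_coord(z, lam):
    """Minimizer of (z - b)^2 + lam * |b|: soft-thresholding, exact zero below lam/2."""
    return np.sign(z) * np.maximum(np.abs(z) - lam / 2.0, 0.0)

print(ridge_coord(2.0, 1.0))   # 1.0: shrunk but nonzero
print(lasso_coord(2.0, 1.0))   # 1.5: shifted toward zero
print(lasso_coord(0.3, 1.0))   # 0.0: small coefficients are dropped entirely
```

This exact-zero behavior is what makes LASSO a variable-selection device, which ridge is not.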

SLIDE 4

Introduction Regularization or penalized least squares

A motivation for regularization: correlated predictors

Let y be a response variable with potential predictors x1, x2, and x3 and consider the case when predictors are highly correlated.

> x1 <- rnorm(50); x2 <- rnorm(50, mean = x1, sd = 0.05); x3 <- rnorm(50, mean = -x1, sd = 0.02)
> y <- rnorm(50, mean = -2 + x1 + x2 - 2*x3)
> x <- as.matrix(data.frame(x1, x2, x3))
> # correlation matrix
> upper
        x1      x2      x3
x1   1
x2   0.9984   1
x3  -0.9997  -0.9982   1

Fitting the least squares regression:

> coef(lm(y ~ x1 + x2 + x3))
(Intercept)          x1          x2          x3
 -2.3347410 -16.5839237   0.2353327 -19.9617757

Fitting ridge regression and lasso:

> library(glmnet)
> lm.ridge <- glmnet(x, y, alpha = 0, lambda = 0.1, standardize = FALSE); t(coef(lm.ridge))
1 x 4 sparse Matrix of class "dgCMatrix"
   (Intercept)       x1       x2        x3
s0   -2.359547 1.114166 1.104729 -1.356508

> lm.lasso <- glmnet(x, y, alpha = 1, lambda = 0.1, standardize = FALSE); t(coef(lm.lasso))
1 x 4 sparse Matrix of class "dgCMatrix"
   (Intercept) x1 x2        x3
s0   -2.381575  .  . -3.496807

SLIDE 5

Introduction Bayesian LASSO

Bayesian interpretation of LASSO (Naive)

Park and Casella (2008) demonstrated that we may interpret LASSO in a Bayesian framework as follows:

Y|β ∼ N(Xβ, σ²Iₙ), β_i|λ ∼ Laplace(0, 2/λ), so that p(β_i|λ) = (λ/4) e^{−λ|β_i|}.

According to this specification, we may write out the likelihood for β as

L(β|Y, X, λ) ∝ exp( −(1/(2σ²)) Σ_{i=1}^n (y_i − X_iβ)² − λ||β||₁ )

and the log-likelihood as

ℓ(β|Y, X, λ) = −(1/(2σ²)) Σ_{i=1}^n (y_i − X_iβ)² − λ||β||₁ + Constant.
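This correspondence is easy to verify numerically: for a toy intercept-only model (hypothetical data, σ² = 1), the grid-maximized log-posterior under the Laplace prior coincides with the soft-thresholded least-squares solution. A sketch, following the slide's convention p(β|λ) ∝ e^{−λ|β|}:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, size=20)   # y_i = beta + noise, true beta = 1
lam, sigma2 = 5.0, 1.0

# log-posterior up to a constant: Gaussian log-likelihood + Laplace log-prior
grid = np.linspace(-3.0, 3.0, 200001)
log_post = (-0.5 / sigma2 * ((y[:, None] - grid[None, :]) ** 2).sum(axis=0)
            - lam * np.abs(grid))
map_estimate = grid[np.argmax(log_post)]

# the MAP solves min (1/(2*sigma2)) * sum_i (y_i - b)^2 + lam*|b|,
# i.e. a soft threshold of the sample mean
ybar = y.mean()
soft = np.sign(ybar) * max(abs(ybar) - lam * sigma2 / len(y), 0.0)
print(map_estimate, soft)   # the two estimates agree up to grid resolution
```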

SLIDE 6

Bayesian LASSO with conjugate hyperprior

Bayesian LASSO with conjugate hyperprior

Choice of the optimal λ is critical in penalized regression. Here, let us assume that

Y|β ∼ N(Xβ, σ²Iₙ), β_j|λ_j ∼ Laplace(0, 2/λ_j), λ_j|r i.i.d. ∼ Gamma(r/σ² − 1, 1).

In other words, the 'hyperprior' of each λ_j follows a gamma distribution, p(λ_j|r) = λ_j^{r/σ²−2} e^{−λ_j} / Γ(r/σ² − 1). Then we have

L(β, λ₁, …, λ_p | Y, X, r) ∝ exp( −(1/(2σ²)) Σ_{i=1}^n (y_i − X_iβ)² ) × Π_{j=1}^p exp(−λ_j [|β_j| + 1]) λ_j^{r/σ²−1}.

SLIDE 7

Bayesian LASSO with conjugate hyperprior LAAD penalty

Log adjusted absolute deviation (LAAD) penalty

Integrating out the λ_j and taking the log of the likelihood, we get

ℓ(β|Y, X, r) = −(1/(2σ²)) [ Σ_{i=1}^n (y_i − X_iβ)² + 2r Σ_{j=1}^p log(1 + |β_j|) ] + Const.

Therefore, we have a new formulation for our penalized least squares problem. This gives rise to what we call the LAAD penalty function:

||β||_L = Σ_{j=1}^p log(1 + |β_j|)

so that

β̂ = argmin_β ||y − Xβ||² + 2r||β||_L.
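The integration step can be checked numerically: with a = r/σ², the gamma integral gives ∫₀^∞ λ^{a−1} e^{−λ(1+|β|)} dλ = Γ(a)/(1+|β|)^a, whose logarithm contributes the −(r/σ²) log(1+|β|) term behind the LAAD penalty. A sketch with arbitrary test values:

```python
import numpy as np
from math import gamma

a, beta = 3.0, 2.0                      # a = r/sigma^2; arbitrary test point
lam = np.linspace(1e-9, 80.0, 400001)   # integration grid for lambda_j
integrand = lam ** (a - 1.0) * np.exp(-lam * (1.0 + abs(beta)))

# trapezoidal rule vs. the closed form Gamma(a) / (1 + |beta|)^a
numeric = float(np.sum((integrand[1:] + integrand[:-1]) / 2.0 * np.diff(lam)))
closed = gamma(a) / (1.0 + abs(beta)) ** a
print(numeric, closed)   # both approximately 2/27
```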

SLIDE 8

Bayesian LASSO with conjugate hyperprior LAAD penalty

Analytic solution for the univariate case

To understand the characteristics of the new penalty, consider the simple case where X′X = I; in other words, the design matrix is orthonormal, so it is enough to solve the coordinate-wise problems

θ̂_j = argmin_{θ_j} ½(z_j − θ_j)² + r log(1 + |θ_j|).

Setting ℓ(θ|r, z) = ½(z − θ)² + r log(1 + |θ|), we can show that the minimizer is given by θ̂ = θ∗ · 1{|z| ≥ z∗(r) ∧ r}, where z∗(r) is the unique solution of

∆(z|r) = ½(θ∗)² − θ∗z + r log(1 + |θ∗|) = 0,

θ∗ = ½ ( z + sgn(z) [ √((|z| − 1)² + 4|z| − 4r) − 1 ] ).

Note that θ̂ converges to z as |z| tends to ∞.
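The analytic solution can be packaged as a univariate thresholding rule. The sketch below (a hypothetical helper, not the authors' code) computes θ∗ from the closed form above and settles the θ∗-versus-0 comparison by direct evaluation of the objective, which reproduces cases (1) through (4) without tracking the boundary z∗(r) explicitly:

```python
import math

def laad_threshold(z, r):
    """argmin over t of 0.5*(z - t)**2 + r*log(1 + |t|), the univariate LAAD problem."""
    s = 1.0 if z >= 0 else -1.0
    a = abs(z)
    disc = (a - 1.0) ** 2 + 4.0 * a - 4.0 * r
    if disc < 0:                 # no real stationary point (case 3): estimate is 0
        return 0.0
    theta = 0.5 * s * (a - 1.0 + math.sqrt(disc))
    if s * theta < 0:            # stationary point on the wrong side (case 2): 0
        return 0.0
    obj = lambda t: 0.5 * (z - t) ** 2 + r * math.log(1.0 + abs(t))
    return theta if obj(theta) <= obj(0.0) else 0.0   # cases (1) and (4)

print(laad_threshold(0.5, 1.0))   # 0.0: small |z| is thresholded to zero
print(laad_threshold(10.0, 1.0))  # ~9.908: large |z| is barely shrunk
```

The last line illustrates the θ̂ → z behavior as |z| grows, i.e. the penalty's small bias on large coefficients.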

SLIDE 9

Bayesian LASSO with conjugate hyperprior LAAD penalty

Sketch of the proof

We have θ̂ × z ≥ 0, so we start from the case where z is nonnegative. We have

ℓ′(θ|r, z) = (θ − z) + r/(1 + θ),  ℓ′′(θ|r, z) = 1 − r/(1 + θ)²,

ℓ′(θ∗) = 0 ⇔ θ∗ = (z − 1)/2 + √((z − 1)² + 4z − 4r)/2.

Case (1): z ≥ r ⇒ ℓ′′(θ∗|r, z) > 0, so θ∗ is a local minimum. Moreover, ℓ′(0|r, z) ≤ 0 implies θ∗ is the global minimum.

Case (2): z < r and z < 1 ⇒ θ∗ < 0, so ℓ′(θ|r, z) > 0 for all θ ≥ 0. Therefore ℓ(θ|r, z) is strictly increasing and θ̂ = 0.

Case (3): r ≥ ((z+1)/2)² ⇒ in this case θ∗ ∉ ℝ. Moreover, ((z+1)/2)² ≥ z, so ℓ′(0|r, z) = r − z ≥ 0 and ℓ′(θ|r, z) > 0 for all θ > 0. Therefore θ̂ = 0.

SLIDE 10

Bayesian LASSO with conjugate hyperprior LAAD penalty

Contour map of θ̂

[Contour plot over the (z, r) plane showing the regions where θ̂ = θ∗ and where θ̂ = 0.]

Figure 1: Distribution of the optimizer for the three cases

SLIDE 11

Bayesian LASSO with conjugate hyperprior LAAD penalty

Sketch of the proof - continued

Case (4): 1 ≤ z < r < ((z+1)/2)² ⇒ First, we show that ℓ′′(θ∗|r, z) > 0, so that θ∗ is a local minimum of ℓ(θ|r, z) and θ̂ is either θ∗ or 0. In this case, we compute ∆(z|r) = ℓ(θ∗|r, z) − ℓ(0|r, z) and

θ̂ = θ∗ if ∆(z|r) < 0,  θ̂ = 0 if ∆(z|r) > 0,

∆′(z|r) = [θ∗ − z + r/(1 + θ∗)] ∂θ∗/∂z − θ∗ = −θ∗ < 0.

Thus ∆(z|r) is strictly decreasing w.r.t. z, and ∆(z|r) = 0 has a unique solution because ∆(z|r) < 0 (so θ̂ = θ∗) if z = r, and ∆(z|r) > 0 (so θ̂ = 0) if z = 2√r − 1.

SLIDE 12

Bayesian LASSO with conjugate hyperprior LAAD penalty

Sketch of the proof - continued

[Contour plot over the (z, r) plane showing the regions where θ̂ = θ∗ and where θ̂ = 0.]

Figure 2: Distribution of the optimizer for all the cases

SLIDE 13

Bayesian LASSO with conjugate hyperprior Comparing the different penalty functions

Estimate behavior

[Plots of the estimate beta_hat against beta for each penalty: "L2 Penalty - Ridge", "L1 Penalty - LASSO", and "LAAD Penalty".]

Figure 3: Estimate behavior for different penalties

SLIDE 14

Bayesian LASSO with conjugate hyperprior Comparing the different penalty functions

Penalty regions

[Plots of the penalty regions in the coefficient plane: "L2 penalty - Ridge", "L1 penalty - LASSO", and "LAAD penalty".]

Figure 4: Penalty regions for different penalties

SLIDE 15

Bayesian LASSO with conjugate hyperprior Optimization routine

Coordinate descent algorithm

Model estimation is an optimization problem. Coordinate descent algorithm: Luo and Tseng (1992), Wu and Lange (2008):

start with an initial estimate and then successively optimize along each coordinate or block of coordinates.

Do
  y(1) = y − Σ_{j=2}^p X_j β_j[old]
  β_1[new] = 1′y(1)/n
  for (j in 2:p) {
    y(j) = y(j−1) − X_{j−1} β_{j−1}[new] + X_j β_j[old]
    z(j) = X_j′ y(j)
    β_j[new] = argmin over {0, θ∗(z(j), r)}
  }
Until ||β[new] − β[old]|| / ||β[new]|| < ε
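A compact sketch of this routine (Python rather than the authors' implementation; columns of X are assumed to have unit norm so that the coordinate-wise step reduces to the univariate θ∗ rule, and no separate intercept step is taken):

```python
import numpy as np

def laad_univariate(z, r):
    """Univariate LAAD minimizer: choose between 0 and the stationary point."""
    s, a = (1.0 if z >= 0 else -1.0), abs(z)
    disc = (a - 1.0) ** 2 + 4.0 * a - 4.0 * r
    if disc < 0:
        return 0.0
    t = 0.5 * s * (a - 1.0 + np.sqrt(disc))
    if s * t < 0:
        return 0.0
    obj = lambda u: 0.5 * (z - u) ** 2 + r * np.log1p(abs(u))
    return t if obj(t) <= obj(0.0) else 0.0

def laad_coordinate_descent(X, y, r, tol=1e-8, max_iter=500):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + r * sum_j log(1 + |b_j|)."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    for _ in range(max_iter):
        beta_old = beta.copy()
        for j in range(p):
            resid += X[:, j] * beta[j]    # add coordinate j back: partial residual
            z = X[:, j] @ resid           # valid update when ||X_j|| = 1
            beta[j] = laad_univariate(z, r)
            resid -= X[:, j] * beta[j]
        if np.linalg.norm(beta - beta_old) <= tol * max(np.linalg.norm(beta), 1.0):
            break
    return beta
```

On an orthonormal design this converges in one pass, since each z(j) is exactly the coordinate-wise least-squares estimate.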

SLIDE 16

Model calibration The two-part model

The frequency-severity two-part model

For ratemaking, e.g., in auto insurance, we have to predict the aggregate claims

S = Σ_{k=1}^N C_k.

The traditional approach is Cost of Claims = Frequency × Average Severity. The joint density of the number of claims and the average claim size can be decomposed as

f(N, C|x) = f(N|x) × f(C|N, x),  i.e., joint = frequency × conditional severity.

This natural decomposition allows us to investigate/model each component separately, and it does not preclude us from assuming N and C are independent.
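Under independence of frequency and severity, this decomposition gives the familiar pure premium E[S] = E[N] × E[C], which is easy to confirm by simulation (toy Poisson/lognormal parameters, not the Singapore data):

```python
import numpy as np

rng = np.random.default_rng(42)
n_policies = 200_000
lam, mu, sig = 0.3, 7.0, 1.2          # Poisson rate; lognormal log-mean / log-sd

N = rng.poisson(lam, size=n_policies)            # claim counts per policy
claims = rng.lognormal(mu, sig, size=N.sum())    # individual claim sizes

empirical = claims.sum() / n_policies            # average aggregate cost per policy
theoretical = lam * np.exp(mu + sig ** 2 / 2)    # E[N] * E[C]
print(empirical, theoretical)   # close, up to Monte Carlo error
```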

SLIDE 17

Model calibration The two-part model

The two-part model specifications

For the frequency component:

N is assumed to follow a Poisson distribution, so that E[N|x] = e^{xα}.
typically used in practice
penalized log-likelihood for estimation

For the average severity component C|N:

We use the lognormal distribution, so that E[log C | N, x] = xβ and Var[log C | N, x] = σ².
penalized least squares for estimation

For both components, the log-adjusted absolute deviation (LAAD) penalty is used:

||β||_L = Σ_{j=1}^p log(1 + |β_j|)
SLIDE 18

Model calibration The two-part model

Penalized estimation for the two-part model

For the frequency part, α̂ from the penalized likelihood is given as follows:

α̂ = argmin_α Σ_{i,t} ( e^{X_{it}α} − n_{it} X_{it}α ) + r||α||_L.

For the average severity part, β̂ from the penalized likelihood is given as follows:

β̂ = argmin_β ½ ||log C − Xβ||² + r||β||_L.
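In code, the two penalized objectives are straightforward to write down (a sketch with hypothetical argument names; the Poisson part is expressed as a negative log-likelihood plus penalty, so both objectives are minimized):

```python
import numpy as np

def laad_penalty(coef, r):
    """LAAD penalty r * ||coef||_L = r * sum_j log(1 + |coef_j|)."""
    return r * np.sum(np.log1p(np.abs(coef)))

def freq_objective(alpha, X, counts, r):
    """Penalized Poisson negative log-likelihood (constants dropped)."""
    eta = X @ alpha
    return np.sum(np.exp(eta) - counts * eta) + laad_penalty(alpha, r)

def sev_objective(beta, X, log_sev, r):
    """Penalized least squares for the lognormal average severity part."""
    return 0.5 * np.sum((log_sev - X @ beta) ** 2) + laad_penalty(beta, r)
```

Either objective can then be handed to a coordinate descent routine or a generic optimizer.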

SLIDE 19

Model calibration Data

Observable policy characteristics used as covariates

Categorical variables (proportions):
  VehType — type of insured vehicle: Car 97.75%, MotorBike 1.64%, Others 0.6%
  Gender — insured's sex: Male = 1, 82.78%; Female = 0, 17.22%
  Cover — type of insurance cover: Comprehensive = 1, 74.57%; Others = 0, 25.43%

Continuous variables (minimum / mean / maximum):
  VehCapa — insured vehicle's capacity in cc: 10.00 / 1560.91 / 9990.00
  VehAge — age of vehicle in years: -1.00 / 7.84 / 46.00
  Age — the policyholder's issue age: 17.00 / 39.98 / 99.00
  NCD — No Claim Discount in %: 0.00 / 23.88 / 50.00

Singapore insurance data (1993–2000: training set, 2001: test set); 208,107 aggregated observations in the training set.

SLIDE 20

Model calibration Estimation results: frequency

Covariates for frequency estimation

Original: VTypeCar, VTypeMBike, logVehCapa, VehAge, SexM, Comp, NCD, Age, Age2, Age3
Interactions: MlogVehCapa, MVehAge, MAge, MAge2, MAge3

Even after adding the interaction terms, almost every covariate is significant for frequency estimation.

SLIDE 21

Model calibration Estimation results: frequency

Estimation results: frequency component

                 Reduced model     Full model    Naive LASSO  Bayesian LASSO
(Intercept)          -0.740957      -3.258836      -1.792429       -1.791314
VTypeCar             -0.585375      -0.566404      -0.000077       -0.000254
VTypeMBike           -2.085336      -2.102879      -0.000873       -0.000102
logVehCapa            0.214138       0.334423       0.000039        0.000001
VehAge               -0.009061      -0.000031      -0.000020       -0.000004
SexM                  0.105565       3.166574       0.000531        0.000341
Comp                  0.910381       0.909633       0.000517        0.000377
Age                  -0.150428      -0.055286       0.000005        0.000005
Age2                  0.002705       0.000936       0.000000        0.000000
Age3                 -0.000015      -0.000005       0.000000        0.000000
NCD                  -0.009976      -0.009943      -0.000004        0.000000
MlogVehCapa                          -0.140558       0.000100        0.000082
MVehAge                              -0.010818       0.000041        0.000018
MAge                                 -0.119687       0.000007        0.000003
MAge2                                 0.002232       0.000000        0.000000
MAge3                                -0.000013       0.000000        0.000000
Loglikelihood    -54811.696563  -54796.659753  -55542.756271   -55547.849702
AIC              109645.393127  109625.319506  111117.512543   111127.699405
BIC              109758.097011  109789.252428  111281.445465   111291.632327

(Blank cells: interaction terms are not included in the reduced model.)

SLIDE 22

Model calibration Estimation results: frequency

Tuning the frequency penalty parameter

[Plot of MSE against log(r) for the Poisson frequency model; panel title "Tuning penalty parameter: Poisson frequency".]

Figure 5: Tuning the penalty parameter: frequency component

SLIDE 23

Model calibration Validation: frequency

Validation results: Poisson frequency

Comparing the MAE and MSE for the various models:

         Reduced model   Full model   Naive LASSO   Bayesian LASSO
MAE            0.13343      0.13344       0.13883          0.13890
MSE            0.27873      0.27876       0.28043          0.28044
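These scores, together with the Gini index used on the following slides, can be computed in a few lines. A sketch (the Gini here is a simple ordered-Lorenz variant: policies sorted by prediction, twice the average gap between the equality line and the loss Lorenz curve, scaled to 0-100; the slides may use a slightly different definition):

```python
import numpy as np

def validation_metrics(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    out = {"MAE": float(np.mean(np.abs(actual - predicted))),
           "MSE": float(np.mean((actual - predicted) ** 2))}
    order = np.argsort(predicted)                        # rank policies by prediction
    cum_share = np.arange(1, actual.size + 1) / actual.size
    cum_loss = np.cumsum(actual[order]) / actual.sum()   # Lorenz curve of actual losses
    out["Gini"] = float(100.0 * 2.0 * np.mean(cum_share - cum_loss))
    return out
```

A model that ranks risks well concentrates losses late in the ordering, giving a larger Gini.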

SLIDE 24

Model calibration Validation: frequency

Frequency validation results - Gini index

[Ordered Lorenz curves of actual loss against predicted pure premium for each model: Reduced model (Gini index = 38.8), Full model (38.7), Naive LASSO (32.2), Bayesian LASSO (32.2).]

Figure 6: Gini indices for the Poisson frequency models

SLIDE 25

Model calibration Estimation results: average severity

Covariates for average severity estimation

Original: VTypeCar, VTypeMBike, logVehCapa, VehAge, Comp, NCD, Age, Age2, Age3, Count, SexM
Interactions: FintNCD, FintVehAge, FintComp, FintVTypeCar, FintlogVehCapa, FintSexM, FintAge, FintAge2, FintAge3

After adding the interaction terms, only some covariates are significant for the average severity estimation.

SLIDE 26

Model calibration Estimation results: average severity

Estimation results: average severity component

Coefficient estimates, in the column order Reduced model / Full model / Naive LASSO / Bayesian LASSO (a model that dropped the covariate contributes no entry; interaction terms are not in the reduced model):

(Intercept):        7.153653 / 7.444373 / 7.673586 / 7.533879
VTypeCar:          -0.613385 / -0.718301 / -0.579297
VTypeMBike:        -0.699988 / -0.804159 / -0.336990 / -0.680890
logVehCapa:         0.226679 / 0.238242 / 0.221583
VehAge:            -0.010845 / -0.012367 / -0.014867 / -0.011665
SexM:              -0.022395 / -0.034773 / -0.028442
Comp:               0.321747 / 0.279826 / 0.285615 / 0.292984
Age:               -0.072443 / -0.068476 / -0.077812
Age2:               0.001406 / 0.001330 / 0.001538
Age3:              -0.000008 / -0.000008 / 0.000000 / -0.000009
NCD:               -0.002662 / -0.002899 / -0.003135 / -0.002876
Count:              0.725876 / 0.453060 / 0.208421 / 0.451539
Fint VTypeCar:      1.144692 / 0.009752
Fint logVehCapa:   -0.151809
Fint VehAge:        0.019121 / 0.012669 / 0.010154
Fint SexM:          0.115636 / 0.046500 / 0.074754
Fint Comp:          0.642902 / 0.605135 / 0.481527
Fint Age:          -0.037246 / -0.010455
Fint Age2:          0.000723
Fint Age3:         -0.000004 / 0.000000 / 0.000001
Fint NCD:           0.003337 / 0.002222 / 0.003331
Loglikelihood:  -21589.565106 / -21569.896261 / -22832.531841 / -22049.633473
AIC:             43205.130212 / 43183.792522 / 45685.063682 / 44141.266945
BIC:             43303.550718 / 43350.350300 / 45760.771763 / 44300.253916

SLIDE 27

Model calibration Estimation results: average severity

Tuning the average severity penalty parameter

[Plot of MSE against log(r) for the lognormal severity model; panel title "Tuning penalty parameter: Lognormal severity".]

Figure 7: Tuning the penalty parameter

SLIDE 28

Model calibration Validation: average severity

Validation results: Lognormal average severity

Comparing the MAE and MSE for the various models:

         Reduced model   Full model   Naive LASSO   Bayesian LASSO
MAE           3002.512     2995.511      3112.826         2993.567
MSE           4985.503     4970.821      5396.835         4967.892

SLIDE 29

Model calibration Validation: average severity

Severity validation results - Gini index

[Ordered Lorenz curves of actual loss against predicted pure premium for each model: Reduced model (Gini index = 11), Full model (11), Naive LASSO (9.2), Bayesian LASSO (11.1).]

Figure 8: Gini indices for the Lognormal average severity models

SLIDE 30

Conclusion

Concluding remarks

We suggest a model that places a hyperprior on λ in the Bayesian LASSO, which yields a new penalty function with good properties: it performs variable selection while reverting to the true regression coefficients for large signals.

While our proposed LASSO model did not perform well for the frequency component, it was the optimal choice for the average severity component. Note that we could not obtain much sparsity when fitting the frequency component, but we did obtain a moderate degree of sparsity when fitting the average severity component.

Compared to the naive LASSO model, which uses the L1 penalty for regularization, our proposed LASSO model showed better performance with respect to the validation measures (MSE, MAE, and the Gini index). This supports the assertion that our proposed model enables variable selection with less bias in the regression coefficient estimates.

SLIDE 31

Conclusion

Acknowledgment

We thank the Society of Actuaries for its financial support through our CAE grant on data mining.

Thank you to all present here.
