Estimation and Model Selection in Dirichlet Regression Andr Camargo - PowerPoint PPT Presentation

Estimation and Model Selection in Dirichlet Regression André Camargo 1 , Julio Michael Stern 1 , Marcelo de Souza Lauretto 2 1 Institute of Mathematis and Statistics, 2 School of Arts, Sciences and Humanities, University of Sao Paulo Conference on Inductive Statistics A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Introduction ◮ Compositional data: vectors whose components are the proportions or percentages of some whole. ◮ Sample Space: S D , ( D − 1 ) − dimensional simplex: S D = { z = ( z 1 , z 2 . . . z D ) : z > 0 , z1 = 1 } . ◮ Many applications, e.g: ◮ Market share analysis ◮ Election forecasts ◮ Soil composition analysis ◮ Household expenses composition ◮ Aitchison(1986) developed a methodology for compositional data analysis based on logistic normal distributions. ◮ Here we focus on Dirichlet Regression. A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Dirichlet Regression ◮ Let X = [ x 1 • ; x 2 • ; . . . ; x n • ] , Y = [ y 1 • ; y 2 • ; . . . ; y n • ] be a sample of observations where y i • ∈ S D and x i • ∈ R C , i = 1 , 2 , . . . , n . ◮ The goal is to build a regression predictor for y i • as a function of x i • . ◮ We assume that y i • ∼ D ( α 1 ( x i • ) , . . . , α D ( x i • )) , where each α j ( x i • ) is a positive function of x i • . ◮ In this work: α j ( x i • ) = x i , 1 β 1 , j + x i , 2 β 2 , j + ... + x i , C β C , j = x i • β • j . ◮ Parameters to be estimated: β = ( β k , j , k = 1 . . . C , j = 1 . . . D ) , subject to the constraint α ( x i • ) > 0. ◮ Model selection can be done by testing β k , j = 0 for some pairs ( k , j ) ∈ { 1 . . . C } × { 1 . . . D } . A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

x 1 , 1 x 1 , 2 x 1 , C y 1 , 1 y 1 , 2 y 1 , D  . . .   . . .  x 2 , 1 x 2 , 2 x 2 , C y 2 , 1 y 2 , 2 y 2 , D . . . . . .     X =  Y =  . . .   . . .  ... ... . . . . . .     . . . . . .    x n , 1 x n , 2 x n , C y n , 1 y n , 2 y n , D . . . . . .  β 1 , 1 β 1 , 2 . . . β 1 , D  β 2 , 1 β 2 , 2 . . . β 2 , D   β = α = X β  . . .  ... . . .   . . .   β C , 1 β C , 2 . . . β C , D A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Case study ◮ Arctic Lake Sediments dataset (Coakley & Rust, 1968): compositions of sand, silt and clay ( y ) for 39 sediment samples at different water depths ( x ). ◮ Interest in submodels of the complete second-order polynomial model on x , α j ( x ) = β 1 , j + β 2 , j x + β 3 , j x 2 , j = 1 . . . 3 . A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Case study A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Parameters Estimation ◮ Likelihood function: Given y 1 • . . . , y n • c.i.i.d. given β :   α j ( x i • ) − 1 y � n � D ij  , L ( β | X , Y ) =  Γ(Λ i ( x i • )) Γ( α j ( x i • )) i = 1 j = 1 where Λ i ( x i • ) = � D j = 1 α j ( x i • ) . ◮ Gradients: ∂ log L � n = � Γ ′ (Λ i ( x i • )) − Γ ′ ( α j ( x i • )) + x i , k log y i , j � ∂β k , j i = 1 Γ ′ : digamma function, Γ ′ ( u ) = ∂ log Γ ∂ u ( u ) . A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

◮ Fitting Dirichlet Distributions with constant parameters is straightforward via standard numerical methods. ◮ The difficulty arises when we attempt to extend the estimation to Dirichlet Regression. ◮ Starting values and regularization policies must be carefully chosen to assure the optimization convergence. ◮ Hijazi and Jernigan (2009) proposed a method for choosing starting values for the coefficients, which is based on: ◮ Drawing resamples of the original data; ◮ Fitting the resamples by least squares method. A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Hijazi and Jernigan’s Method ◮ Hijazi and Jernigan’s method: 1. Draw r resamples with replacement from X and Y , each of size m ( m < n ). 2. For each resample l : - fit a Dirichlet model with constant parameters; and - compute the mean of the corresponding covariates. This will result in matrices A r × D , W r × C where row a l • and w l • represent, respectively, the ML estimates and the covariates mean of resample l . 3. Fit by least squares D models of the form A i , j = α j ( w i • ) = � C k = 1 w ik β kj . ˆ 4. Use the fitted coefficients beta k , j as starting values. ◮ Drawback: This method does not guarantee that the starting values ˆ beta k , j yield positive values for α j ( x i ) . A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Our Proposal ◮ We propose a regularization approach anchored by the constant (without covariates) Dirichlet model. ◮ We extend the initial model to include the constant (intercept) terms as artificial variables, in case they are not present. ◮ Finally, we solve a sequence of optimization problems that drive the artificial variables back to zero. A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

◮ Algorithm: 1 Include a constant vector 1 as the first column of X , in case it is not present in the original model. 2 Define a boolean matrix M indicating the non-zero parameters of the original model, namely: � 1 if β k , j is a model parameter; M k , j = 0 if β k , j = 0. 3 Fit Y by a Dirichlet distribution with constant parameters (via MLE). Notice that this corresponds to the solution β 0 of a basic model whose boolean matrix model M is: � 1 if k = 1 M 0 k , j = 0 if k � = 1 Moreover, this solution is a feasible point for the (possible extended) model including the intercept. A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

◮ (cont.) 4 Build a supermodel joining all variables present either in the anchor or in the original model, namely: k , j = max ( M 0 M ∗ k , j , M k , j ) , k = 1 . . . C , j = 1 . . . D . 5 Solve the sequence of optimization problems g ( β | X , Y ) = − K b β 2 + log L ( β | X , Y ) . max β Boolean vector b indicates which of the β 1 , j are “artificial” variables: � 1 if M 1 , j = 0 ; b j = 1 − M 1 , j = 0 otherwise. ◮ − K b β 2 : penalty term for artificial variables. ◮ Repeating step 5 with a sequence of increasing scalars, K t , drives these artificial variables to zero, converging to the optimal solution (best fit) of the original model. A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Prediction using Dirichlet Regression ◮ Having obtained the estimate ˆ β , the expected composition proportions in y given the vector x of covariates values is the mean of the distribution D (ˆ α ( x )) : � � α 1 ( x ) ˆ , ˆ α 2 ( x ) . . . ˆ α D ( x ) ˆ y = ˆ ˆ ˆ Λ( x ) Λ( x ) Λ( x ) where ˆ Λ( x ) = � D j = 1 ˆ α j ( x ) . A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Results - Parameter Estimation Procedures ◮ Random subsamples of arctic lake dataset, n ∈ { 20 , 27 } ◮ We try to fit each subsample with an incomplete polynomial model described by a random structural matrix M ( q ) : M ( q ) k , j ∼ Ber ( p ) Fill-in probability, p ∈ { 0 . 33 , 0 . 5 , 0 . 66 } . ◮ Performance measures: 1. Failure rate; 2. Computational processing time. A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Falhas Tempo de processamento 20 Hijazi 6 Nosso Método 4 15 Segundos (Log 2 ) 2 10 % 0 5 −2 Hijazi Nosso Método −4 0 0.33 0.5 0.66 0.33 0.5 0.66 Completude da Matriz de modelo: Pr(m jk = 1) Completude da Matriz de modelo: Pr(m jk = 1) A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Full Bayesian Significance Test (FBST) ◮ FBST: proposed by Pereira & Stern (1999); a review in Pereira et al (2008). ◮ Notation and assumptions: ◮ Parameter space: Θ ⊆ R n ◮ Hypothesis H : θ ∈ Θ H , where H ≡ Θ H = { θ ∈ Θ | g ( θ ) ≤ 0 ∧ h ( θ ) = 0 } ; dim ( H ) < dim (Θ) ◮ f x ( θ ) denotes the posterior probability density function. A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

◮ Computation of the evidence measure used on the FBST: 1. Optimization step: find the maximum (supremum) of posterior under the hypothesis: θ ∗ = arg sup f x ( θ ) , f ∗ = f x ( θ ∗ ) H 2. Integration step: integrate the posteriori density over the tangential set: T = { θ ∈ Θ : f ( θ ) > f ∗ } � Ev ( H ) = Pr ( θ ∈ T | x ) = f ( θ ) d θ T ◮ Ev ( H ) “large” ⇒ T “heavy” ⇒ hypothesis set in a region of “low” posterior density ⇒ “strong” evidence against H . ◮ Ev ( H ) : evidence against H ; Ev ( H ) = 1 − Ev ( H ) : evidence in favor of H . A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

Estimation and Model Selection in Dirichlet Regression Andr Camargo - PowerPoint PPT Presentation

Estimation and Model Selection in Dirichlet Regression Andr Camargo 1 , Julio Michael Stern 1 , Marcelo de Souza Lauretto 2 1 Institute of Mathematis and Statistics, 2 School of Arts, Sciences and Humanities, University of Sao Paulo Conference

The Dirichlet-Bohr radius Manuel Maestre April 13, 2014 Kent State University Content

Model selection and parameter estimation with covariates in logistic regression missing

STARTS: STARTS: STARTS: STARTS: STAtic STAtic Regression Test Selection Regression Test

Applied Statistics Lecturer: Serena Arima Regression model Model estimation Properties OLS

Ridge/Lasso Regression, Model selection Xuezhi Wang Computer Science Department Carnegie Mellon

Regression 1: Linear Regression Marco Baroni Practical Statistics in R Outline Classic linear

Hierarchical Dirichlet Processes Presenters: Micah Hodosh, Yizhou Sun 4/7/2010 1 Content

Perspective Hierarchical Dirichlet Process for Perspective Hierarchical Dirichlet Process for

Boundary Representation of Dirichlet Forms on Canonically Compactifiable Graphs Michael Schwarz

Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic

Regression Methods 1. Linear Regression and Logistic Regression: definitions, and a common

ERP Selection KIRTANE & PANDIT Suhas Deshpande Why ERP Selection is important ?

Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt

STAT 213 Model Selection II Colin Reimer Dawson Oberlin College March 30, 2018 1 / 13 Outline

STAT 213 Simple Linear Regression I Colin Reimer Dawson Oberlin College 5 October 2016 Outline

Variational Model Selection for Sparse Gaussian Process Regression Michalis K. Titsias School of

Continuous-Time Random Matching Darrell Duffie Lei Qiao Yeneng Sun Stanford S.U.F.E. N.U.S.

Continuous-Time Random Matching Darrell Duffie Lei Qiao Yeneng Sun Stanford S.U.F.E. N.U.S.

Longevity and Mortality Models Andrs M. Villegas white School of Risk and Actuarial Studies,

This page has been left blank deliberately. . . Chemnitz, CMS2013, September of 2013 p. 1

In colaboration with: Jacek Dobaczewski, Pawe Bczyk, Maciek Konieczka, Koichi Sato, Takashi

Axion Phenomenology from Unquenched La7ce QCD C. Bona=,

Survivors Association, Inc. (www.thyca.org) October 20, 2010 Executive Director of ThyCa:

Building Relationships 2 1 10/8/2020 The Importance of Relationships Acts 2:42 They

Estimation and Model Selection in Dirichlet Regression Andr Camargo - PowerPoint PPT Presentation

Estimation and Model Selection in Dirichlet Regression Andr Camargo 1 , Julio Michael Stern 1 , Marcelo de Souza Lauretto 2 1 Institute of Mathematis and Statistics, 2 School of Arts, Sciences and Humanities, University of Sao Paulo Conference

The Dirichlet-Bohr radius Manuel Maestre April 13, 2014 Kent State University Content

Model selection and parameter estimation with covariates in logistic regression missing

STARTS: STARTS: STARTS: STARTS: STAtic STAtic Regression Test Selection Regression Test

Applied Statistics Lecturer: Serena Arima Regression model Model estimation Properties OLS

Ridge/Lasso Regression, Model selection Xuezhi Wang Computer Science Department Carnegie Mellon

Regression 1: Linear Regression Marco Baroni Practical Statistics in R Outline Classic linear

Hierarchical Dirichlet Processes Presenters: Micah Hodosh, Yizhou Sun 4/7/2010 1 Content

Perspective Hierarchical Dirichlet Process for Perspective Hierarchical Dirichlet Process for

Boundary Representation of Dirichlet Forms on Canonically Compactifiable Graphs Michael Schwarz

Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic

Regression Methods 1. Linear Regression and Logistic Regression: definitions, and a common

ERP Selection KIRTANE &amp; PANDIT Suhas Deshpande Why ERP Selection is important ?

Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt

STAT 213 Model Selection II Colin Reimer Dawson Oberlin College March 30, 2018 1 / 13 Outline

STAT 213 Simple Linear Regression I Colin Reimer Dawson Oberlin College 5 October 2016 Outline

Variational Model Selection for Sparse Gaussian Process Regression Michalis K. Titsias School of

Continuous-Time Random Matching Darrell Duffie Lei Qiao Yeneng Sun Stanford S.U.F.E. N.U.S.

Continuous-Time Random Matching Darrell Duffie Lei Qiao Yeneng Sun Stanford S.U.F.E. N.U.S.

Longevity and Mortality Models Andrs M. Villegas white School of Risk and Actuarial Studies,

This page has been left blank deliberately. . . Chemnitz, CMS2013, September of 2013 p. 1

In colaboration with: Jacek Dobaczewski, Pawe Bczyk, Maciek Konieczka, Koichi Sato, Takashi

Axion Phenomenology from Unquenched La7ce QCD C. Bona=,

Survivors Association, Inc. (www.thyca.org) October 20, 2010 Executive Director of ThyCa:

Building Relationships 2 1 10/8/2020 The Importance of Relationships Acts 2:42 They

ERP Selection KIRTANE & PANDIT Suhas Deshpande Why ERP Selection is important ?