1. Estimation and Model Selection in Dirichlet Regression
André Camargo 1, Julio Michael Stern 1, Marcelo de Souza Lauretto 2
1 Institute of Mathematics and Statistics, 2 School of Arts, Sciences and Humanities, University of São Paulo
Conference on Inductive Statistics

2. Introduction
◮ Compositional data: vectors whose components are the proportions or percentages of some whole.
◮ Sample space: the (D − 1)-dimensional simplex S^D = { z = (z_1, z_2, …, z_D) : z_j > 0, Σ_{j=1}^{D} z_j = 1 }.
◮ Many applications, e.g.:
  ◮ Market share analysis
  ◮ Election forecasts
  ◮ Soil composition analysis
  ◮ Household expenses composition
◮ Aitchison (1986) developed a methodology for compositional data analysis based on logistic normal distributions.
◮ Here we focus on Dirichlet regression.
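For concreteness (an illustration of ours, not from the slides): a sand/silt/clay composition is a point of S³, and membership is easy to check numerically:

```python
import numpy as np

# A hypothetical sediment composition (sand, silt, clay): an element of S^3.
z = np.array([0.55, 0.30, 0.15])

# Membership in the simplex: strictly positive components summing to 1.
assert np.all(z > 0) and np.isclose(z.sum(), 1.0)
```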

3. Dirichlet Regression
◮ Let X = [x_1•; x_2•; …; x_n•], Y = [y_1•; y_2•; …; y_n•] be a sample of observations where y_i• ∈ S^D and x_i• ∈ R^C, i = 1, 2, …, n.
◮ The goal is to build a regression predictor for y_i• as a function of x_i•.
◮ We assume that y_i• ∼ D(α_1(x_i•), …, α_D(x_i•)), where each α_j(x_i•) is a positive function of x_i•.
◮ In this work: α_j(x_i•) = x_{i,1} β_{1,j} + x_{i,2} β_{2,j} + … + x_{i,C} β_{C,j} = x_i• β_{•j}.
◮ Parameters to be estimated: β = (β_{k,j}, k = 1…C, j = 1…D), subject to the constraint α(x_i•) > 0.
◮ Model selection can be done by testing β_{k,j} = 0 for some pairs (k, j) ∈ {1…C} × {1…D}.

4. In matrix form, X (n × C) stacks the covariate rows x_i•, Y (n × D) stacks the compositions y_i•, β (C × D) collects the coefficients β_{k,j}, and the Dirichlet parameters are obtained as α = Xβ (n × D).
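A minimal sketch of this linear parametrization (dimensions and values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

n, C, D = 39, 3, 3                          # observations, covariates, composition parts
X = rng.uniform(0.5, 2.0, size=(n, C))      # covariate matrix (hypothetical values)
beta = rng.uniform(0.1, 1.0, size=(C, D))   # coefficient matrix

alpha = X @ beta                            # n x D matrix of Dirichlet parameters

# The model is only valid where every alpha_j(x_i) is positive.
assert np.all(alpha > 0)
```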

5. Case study
◮ Arctic Lake Sediments dataset (Coakley & Rust, 1968): compositions of sand, silt and clay (y) for 39 sediment samples at different water depths (x).
◮ Interest lies in submodels of the complete second-order polynomial model on x: α_j(x) = β_{1,j} + β_{2,j} x + β_{3,j} x², j = 1…3.
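The corresponding design matrix stacks 1, x, x² as columns (a sketch; the depth values are made up for illustration):

```python
import numpy as np

depth = np.array([10.4, 21.7, 33.1, 49.4])   # hypothetical water depths

# Columns 1, x, x^2 give C = 3 covariates per observation,
# so alpha_j(x) = beta_{1,j} + beta_{2,j} x + beta_{3,j} x^2 is a row of X @ beta.
X = np.column_stack([np.ones_like(depth), depth, depth**2])
```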

6. Case study [figure slide: plot of the Arctic lake dataset; image not recoverable]

7. Parameter Estimation
◮ Likelihood function, given that y_1•, …, y_n• are conditionally i.i.d. given β:

  L(β | X, Y) = ∏_{i=1}^{n} [ Γ(Λ(x_i•)) / ∏_{j=1}^{D} Γ(α_j(x_i•)) ] ∏_{j=1}^{D} y_{i,j}^{α_j(x_i•) − 1},

where Λ(x_i•) = Σ_{j=1}^{D} α_j(x_i•).
◮ Gradient:

  ∂ log L / ∂β_{k,j} = Σ_{i=1}^{n} x_{i,k} [ Γ′(Λ(x_i•)) − Γ′(α_j(x_i•)) + log y_{i,j} ]

Γ′: digamma function, Γ′(u) = ∂ log Γ(u) / ∂u.
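These two formulas translate almost line-for-line into Python (a sketch; the function names are ours, and it assumes all y_{i,j} > 0 and all α_j(x_i•) > 0):

```python
import numpy as np
from scipy.special import gammaln, psi  # log-gamma and digamma

def dirichlet_reg_loglik(beta, X, Y):
    """log L(beta | X, Y) for the linear link alpha = X @ beta."""
    alpha = X @ beta                      # n x D matrix of alpha_j(x_i)
    lam = alpha.sum(axis=1)               # Lambda(x_i)
    return np.sum(gammaln(lam)
                  - gammaln(alpha).sum(axis=1)
                  + ((alpha - 1.0) * np.log(Y)).sum(axis=1))

def dirichlet_reg_grad(beta, X, Y):
    """Gradient: sum_i x_{i,k} [psi(Lambda_i) - psi(alpha_ij) + log y_ij]."""
    alpha = X @ beta
    lam = alpha.sum(axis=1, keepdims=True)
    inner = psi(lam) - psi(alpha) + np.log(Y)   # n x D
    return X.T @ inner                          # C x D, matching beta
```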

8.
◮ Fitting Dirichlet distributions with constant parameters is straightforward via standard numerical methods.
◮ The difficulty arises when we attempt to extend the estimation to Dirichlet regression.
◮ Starting values and regularization policies must be chosen carefully to ensure convergence of the optimization.
◮ Hijazi and Jernigan (2009) proposed a method for choosing starting values for the coefficients, based on:
  ◮ drawing resamples of the original data;
  ◮ fitting the resamples by the least squares method.

9. Hijazi and Jernigan's Method
◮ Hijazi and Jernigan's method:
1. Draw r resamples with replacement from X and Y, each of size m (m < n).
2. For each resample l: fit a Dirichlet model with constant parameters, and compute the mean of the corresponding covariates. This results in matrices A (r × D) and W (r × C), whose rows a_l• and w_l• hold, respectively, the ML estimates and the covariate means of resample l.
3. Fit by least squares D models of the form A_{l,j} = α_j(w_l•) = Σ_{k=1}^{C} w_{l,k} β_{k,j}.
4. Use the fitted coefficients β̂_{k,j} as starting values.
◮ Drawback: this method does not guarantee that the starting values β̂_{k,j} yield positive values for α_j(x_i•).
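A sketch of this procedure (names are ours; to keep it short we substitute a method-of-moments fit for the ML fit of step 2, which is only a stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)

def dirichlet_mom(Y):
    """Method-of-moments Dirichlet fit (stand-in for the ML fit of step 2)."""
    mean = Y.mean(axis=0)
    v = Y[:, 0].var(ddof=1)
    s = mean[0] * (1.0 - mean[0]) / v - 1.0   # precision estimate
    return s * mean

def hijazi_jernigan_start(X, Y, r=50, m=None):
    n = X.shape[0]
    m = m or n // 2                       # resample size, m < n
    A, W = [], []
    for _ in range(r):
        idx = rng.choice(n, size=m, replace=True)
        A.append(dirichlet_mom(Y[idx]))   # fitted constant alphas
        W.append(X[idx].mean(axis=0))     # covariate means
    A, W = np.array(A), np.array(W)
    # Step 3: least squares fit of A = W beta, all D columns at once.
    beta0, *_ = np.linalg.lstsq(W, A, rcond=None)
    return beta0                          # C x D starting values
```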

10. Our Proposal
◮ We propose a regularization approach anchored at the constant (covariate-free) Dirichlet model.
◮ We extend the initial model to include the constant (intercept) terms as artificial variables, in case they are not already present.
◮ Finally, we solve a sequence of optimization problems that drive the artificial variables back to zero.

11. ◮ Algorithm:
1. Include a constant vector 1 as the first column of X, in case it is not present in the original model.
2. Define a boolean matrix M indicating the non-zero parameters of the original model: M_{k,j} = 1 if β_{k,j} is a model parameter, and M_{k,j} = 0 if β_{k,j} = 0.
3. Fit Y by a Dirichlet distribution with constant parameters (via MLE). Notice that this corresponds to the solution β⁰ of a basic model whose boolean matrix M⁰ is: M⁰_{k,j} = 1 if k = 1, and M⁰_{k,j} = 0 if k ≠ 1. Moreover, this solution is a feasible point for the (possibly extended) model including the intercept.

12. ◮ (cont.)
4. Build a supermodel joining all variables present either in the anchor or in the original model: M*_{k,j} = max(M⁰_{k,j}, M_{k,j}), k = 1…C, j = 1…D.
5. Solve the sequence of optimization problems

  max_β g(β | X, Y) = − K Σ_{j=1}^{D} b_j β_{1,j}² + log L(β | X, Y).

The boolean vector b indicates which of the β_{1,j} are "artificial" variables: b_j = 1 − M_{1,j}, i.e., b_j = 1 if M_{1,j} = 0 and b_j = 0 otherwise.
◮ − K Σ_j b_j β_{1,j}²: penalty term for the artificial variables.
◮ Repeating step 5 with a sequence of increasing scalars K_t drives these artificial variables to zero, converging to the optimal solution (best fit) of the original model.
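A sketch of steps 4-5 under our reading of the slide; the optimizer, the schedule of K_t values, and the masking of non-model coefficients are our choices, and loglik would be, e.g., the dirichlet_reg_loglik function sketched after slide 7:

```python
import numpy as np
from scipy.optimize import minimize

def fit_with_anchor(X, Y, M, beta_anchor, loglik, Ks=(1.0, 10.0, 100.0, 1e4)):
    """Steps 4-5: optimize over the supermodel M* = max(M0, M), penalizing
    the artificial intercept coefficients with increasing weights K_t."""
    C, D = M.shape
    M0 = np.zeros_like(M)
    M0[0] = 1                                 # anchor model: intercepts only
    Mstar = np.maximum(M0, M)                 # supermodel
    b = 1 - M[0]                              # artificial intercept indicators
    beta = beta_anchor.copy()                 # feasible starting point (step 3)
    for K in Ks:                              # increasing penalty schedule (ours)
        def neg_g(v, K=K):
            B = v.reshape(C, D) * Mstar       # keep non-model coefficients at zero
            # NB: no safeguard against alpha <= 0; a real implementation needs one.
            return K * np.sum(b * B[0]**2) - loglik(B * 1.0, X, Y)
        beta = minimize(neg_g, beta.ravel(), method="Nelder-Mead").x.reshape(C, D)
    return beta * M                           # final fit of the original model
```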

13. Prediction using Dirichlet Regression
◮ Having obtained the estimate β̂, the expected composition y, given the vector x of covariate values, is the mean of the distribution D(α̂(x)):

  ŷ = ( α̂_1(x)/Λ̂(x), α̂_2(x)/Λ̂(x), …, α̂_D(x)/Λ̂(x) ),

where Λ̂(x) = Σ_{j=1}^{D} α̂_j(x).
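In code the prediction step is a one-liner (continuing the hypothetical names used in the earlier sketches):

```python
import numpy as np

def predict_composition(beta_hat, x):
    """Mean of the fitted Dirichlet: y_hat_j = alpha_hat_j(x) / Lambda_hat(x)."""
    alpha = x @ beta_hat          # length-D vector of alpha_hat_j(x)
    return alpha / alpha.sum()    # normalize by Lambda_hat(x)
```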

14. Results - Parameter Estimation Procedures
◮ Random subsamples of the Arctic lake dataset, n ∈ {20, 27}.
◮ We try to fit each subsample with an incomplete polynomial model described by a random structural matrix M(q), with entries M(q)_{k,j} ∼ Ber(p) and fill-in probability p ∈ {0.33, 0.5, 0.66}.
◮ Performance measures:
1. Failure rate;
2. Computational processing time.
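Sampling such a structural matrix is a one-line Bernoulli draw (an illustrative sketch of ours):

```python
import numpy as np

rng = np.random.default_rng(42)

C, D, p = 3, 3, 0.5
M = rng.binomial(1, p, size=(C, D))   # M_{k,j} ~ Bernoulli(p)
```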

15. [Figure: failure rate (%) and processing time (log₂ seconds) for Hijazi's method vs. our method, as a function of model-matrix fill-in Pr(m_{j,k} = 1) ∈ {0.33, 0.5, 0.66}; original labels in Portuguese, plot itself not recoverable.]

16. Full Bayesian Significance Test (FBST)
◮ FBST: proposed by Pereira & Stern (1999); reviewed in Pereira et al. (2008).
◮ Notation and assumptions:
  ◮ Parameter space: Θ ⊆ R^n.
  ◮ Hypothesis H: θ ∈ Θ_H, where Θ_H = {θ ∈ Θ | g(θ) ≤ 0 ∧ h(θ) = 0}, with dim(H) < dim(Θ).
  ◮ f_x(θ) denotes the posterior probability density function.

17. ◮ Computation of the evidence measure used in the FBST:
1. Optimization step: find the maximum (supremum) of the posterior under the hypothesis:

  θ* = arg sup_{θ ∈ Θ_H} f_x(θ),  f* = f_x(θ*)

2. Integration step: integrate the posterior density over the tangential set T = {θ ∈ Θ : f_x(θ) > f*}:

  Ēv(H) = Pr(θ ∈ T | x) = ∫_T f_x(θ) dθ

◮ Ēv(H) "large" ⇒ T "heavy" ⇒ the hypothesis set lies in a region of "low" posterior density ⇒ "strong" evidence against H.
◮ Ēv(H): evidence against H; Ev(H) = 1 − Ēv(H): evidence in favor of H.
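A Monte Carlo illustration of the two steps, our toy example rather than the authors' implementation: with posterior samples available, Ēv(H) can be approximated by the fraction of samples whose density exceeds f*. Here a bivariate Gaussian stands in for the posterior f_x, and the hypothesis is θ₁ = 0:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import minimize_scalar

# Toy posterior: bivariate normal (stand-in for f_x).
post = multivariate_normal(mean=[1.0, 0.5], cov=np.eye(2))

# 1. Optimization step: sup of f_x on the hypothesis set {theta_1 = 0}.
res = minimize_scalar(lambda t2: -post.pdf([0.0, t2]))
f_star = -res.fun

# 2. Integration step, Monte Carlo version:
# Ev_bar(H) = Pr(f_x(theta) > f_star | x), estimated from posterior draws.
rng = np.random.default_rng(7)
draws = post.rvs(size=100_000, random_state=rng)
ev_bar = np.mean(post.pdf(draws) > f_star)
print(f"Evidence against H: {ev_bar:.3f}, in favor: {1 - ev_bar:.3f}")
```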
