SLIDE 1

Estimation and Model Selection in Dirichlet Regression

André Camargo¹, Julio Michael Stern¹, Marcelo de Souza Lauretto²

¹Institute of Mathematics and Statistics, ²School of Arts, Sciences and Humanities,
University of São Paulo

Conference on Inductive Statistics

A.Camargo,J.M.Stern,M.S.Lauretto Estimation and Model Selection in Dirichlet Regression

SLIDE 2

Introduction

◮ Compositional data: vectors whose components are the proportions or percentages of some whole.
◮ Sample space: SD, the (D − 1)-dimensional simplex:
  SD = {z = (z1, z2, ..., zD) : zj > 0, j = 1...D, ∑_{j=1}^D zj = 1}.
◮ Many applications, e.g.:
  ◮ Market share analysis
  ◮ Election forecasts
  ◮ Soil composition analysis
  ◮ Household expenses composition
◮ Aitchison (1986) developed a methodology for compositional data analysis based on logistic normal distributions.
◮ Here we focus on Dirichlet regression.
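For concreteness, membership in SD can be checked with a few lines of NumPy. This is a minimal sketch; the function name and the sample values (sand/silt/clay proportions) are ours, for illustration only:

```python
import numpy as np

def in_simplex(z, tol=1e-9):
    """True if z lies in the (D-1)-dimensional simplex S^D:
    strictly positive components that sum to one."""
    z = np.asarray(z, dtype=float)
    return bool(np.all(z > 0) and abs(z.sum() - 1.0) < tol)

# Hypothetical sand/silt/clay proportions of one sediment sample:
print(in_simplex([0.30, 0.52, 0.18]))  # True
print(in_simplex([0.70, 0.40]))        # False: components sum to 1.10
```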

SLIDE 3

Dirichlet Regression

◮ Let X = [x1•; x2•; ...; xn•], Y = [y1•; y2•; ...; yn•] be a sample of n observations, where yi• ∈ SD and xi• ∈ RC, i = 1, 2, ..., n.
◮ The goal is to build a regression predictor for yi• as a function of xi•.
◮ We assume that yi• ∼ D(α1(xi•), ..., αD(xi•)), where each αj(xi•) is a positive function of xi•.
◮ In this work:
  αj(xi•) = xi,1β1,j + xi,2β2,j + ... + xi,CβC,j = xi•β•j.
◮ Parameters to be estimated: β = (βk,j, k = 1...C, j = 1...D), subject to the constraint α(xi•) > 0 (componentwise).
◮ Model selection can be done by testing βk,j = 0 for some pairs (k, j) ∈ {1...C} × {1...D}.

SLIDE 4

X = [xi,k] (n × C matrix of covariates), Y = [yi,j] (n × D matrix of compositions), β = [βk,j] (C × D coefficient matrix), and α = Xβ (n × D matrix of Dirichlet parameters).
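The matrix form α = Xβ, together with the feasibility requirement α > 0, is a one-liner in NumPy. A sketch with hypothetical numbers, shaped like the case study's second-order polynomial design (depths and coefficients are invented for illustration):

```python
import numpy as np

# Design matrix for alpha_j(x) = b1j + b2j*x + b3j*x^2 at hypothetical depths:
x = np.array([10.4, 24.9, 49.4, 103.7])
X = np.column_stack([np.ones_like(x), x, x**2])   # n x C, here C = 3

# Hypothetical C x D coefficient matrix (D = 3 composition parts):
beta = np.array([[1.00,  2.000, 0.500],
                 [0.05, -0.010, 0.020],
                 [0.00,  0.001, 0.000]])

alpha = X @ beta                 # n x D matrix of Dirichlet parameters
print(alpha.shape)               # (4, 3)
print(bool(np.all(alpha > 0)))   # True: feasibility alpha(x_i) > 0 holds here
```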

SLIDE 5

Case study

◮ Arctic Lake Sediments dataset (Coakley & Rust, 1968): compositions of sand, silt and clay (y) for 39 sediment samples at different water depths (x).
◮ Interest in submodels of the complete second-order polynomial model in x:
  αj(x) = β1,j + β2,jx + β3,jx², j = 1...3.

SLIDE 6

Case study

SLIDE 7

Parameter Estimation

◮ Likelihood function: y1•, ..., yn• are conditionally i.i.d. given β:
  L(β | X, Y) = ∏_{i=1}^n [ Γ(Λ(xi•)) / ∏_{j=1}^D Γ(αj(xi•)) ] ∏_{j=1}^D yi,j^{αj(xi•)−1},
  where Λ(xi•) = ∑_{j=1}^D αj(xi•).
◮ Gradients:
  ∂ log L / ∂βk,j = ∑_{i=1}^n xi,k [ ψ(Λ(xi•)) − ψ(αj(xi•)) + log yi,j ],
  where ψ is the digamma function, ψ(u) = ∂ log Γ(u)/∂u.
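The log-likelihood and its gradient translate directly into code via SciPy's `gammaln` and `digamma`. A sketch (function and variable names are ours), with the analytic gradient checked against a finite-difference derivative on tiny hypothetical data:

```python
import numpy as np
from scipy.special import gammaln, digamma

def loglik(beta, X, Y):
    """log L(beta | X, Y) for the linear-link Dirichlet regression."""
    alpha = X @ beta                         # n x D
    lam = alpha.sum(axis=1)                  # Lambda(x_i.) = sum_j alpha_ij
    return np.sum(gammaln(lam) - gammaln(alpha).sum(axis=1)
                  + ((alpha - 1.0) * np.log(Y)).sum(axis=1))

def grad(beta, X, Y):
    """d log L / d beta_kj = sum_i x_ik (psi(Lam_i) - psi(a_ij) + log y_ij)."""
    alpha = X @ beta
    lam = alpha.sum(axis=1, keepdims=True)
    return X.T @ (digamma(lam) - digamma(alpha) + np.log(Y))   # C x D

# Finite-difference sanity check on hypothetical data:
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])
Y = np.array([[0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]])
beta = np.array([[1.0, 1.2, 0.8], [0.3, 0.1, 0.2]])
eps = 1e-6
b1, b2 = beta.copy(), beta.copy()
b1[0, 0] += eps; b2[0, 0] -= eps
fd = (loglik(b1, X, Y) - loglik(b2, X, Y)) / (2 * eps)
print(abs(fd - grad(beta, X, Y)[0, 0]) < 1e-5)   # True
```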

SLIDE 8

◮ Fitting Dirichlet distributions with constant parameters is straightforward via standard numerical methods.
◮ The difficulty arises when we extend the estimation to Dirichlet regression.
◮ Starting values and regularization policies must be chosen carefully to ensure convergence of the optimization.
◮ Hijazi and Jernigan (2009) proposed a method for choosing starting values for the coefficients, based on:
  ◮ Drawing resamples of the original data;
  ◮ Fitting the resamples by the least squares method.

SLIDE 9

Hijazi and Jernigan’s Method

◮ Hijazi and Jernigan’s method:
  1. Draw r resamples with replacement from X and Y, each of size m (m < n).
  2. For each resample l: fit a Dirichlet model with constant parameters, and compute the mean of the corresponding covariates. This yields matrices A (r × D) and W (r × C), whose rows al• and wl• hold, respectively, the ML estimates and the covariate means of resample l.
  3. Fit by least squares D models of the form Al,j = αj(wl•) = wl,1β1,j + ... + wl,CβC,j.
  4. Use the fitted coefficients β̂k,j as starting values.
◮ Drawback: this method does not guarantee that the starting values β̂k,j yield positive values for αj(xi•).
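The four steps can be sketched compactly. One simplification is ours: where step 2 prescribes an ML fit of the constant Dirichlet model, this sketch substitutes a quick method-of-moments estimate to keep the code short; all function names are illustrative:

```python
import numpy as np

def dirichlet_moments(Y):
    """Method-of-moments Dirichlet fit (our stand-in for the ML fit of step 2):
    matches the sample mean and the variance of the first component."""
    m, v = Y.mean(axis=0), Y.var(axis=0)
    s = m[0] * (1.0 - m[0]) / max(v[0], 1e-12) - 1.0   # common precision
    return m * max(s, 1e-6)

def starting_values(X, Y, r=100, m=None, seed=0):
    """Steps 1-4: resample, fit constant models, regress the fitted
    parameters on the mean covariates, return C x D starting coefficients."""
    rng = np.random.default_rng(seed)
    n, C = X.shape
    m = m if m is not None else max(C + 1, n // 2)     # resample size m < n
    A, W = [], []
    for _ in range(r):
        idx = rng.choice(n, size=m, replace=True)      # step 1
        A.append(dirichlet_moments(Y[idx]))            # step 2: parameter fit
        W.append(X[idx].mean(axis=0))                  # step 2: mean covariates
    beta0, *_ = np.linalg.lstsq(np.array(W), np.array(A), rcond=None)  # step 3
    return beta0                                       # step 4: starting values

# Demo on synthetic (hypothetical) data:
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 3.0, 60)
X = np.column_stack([np.ones(60), x])
Y = rng.dirichlet([2.0, 3.0, 5.0], 60)
beta0 = starting_values(X, Y)
print(beta0.shape)        # (2, 3)
```

Note that, as the slide's drawback says, nothing here forces X @ beta0 to be positive at every observation.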

SLIDE 10

Our Proposal

◮ We propose a regularization approach anchored at the constant (covariate-free) Dirichlet model.
◮ We extend the initial model to include the constant (intercept) terms as artificial variables, in case they are not present.
◮ Finally, we solve a sequence of optimization problems that drive the artificial variables back to zero.

SLIDE 11

◮ Algorithm:

1. Include a constant vector 1 as the first column of X, in case it is not present in the original model.
2. Define a boolean matrix M indicating the non-zero parameters of the original model, namely: Mk,j = 1 if βk,j is a model parameter; Mk,j = 0 if βk,j = 0.
3. Fit Y by a Dirichlet distribution with constant parameters (via MLE). Notice that this corresponds to the solution β⁰ of a basic model whose boolean matrix M⁰ is
   M⁰k,j = 1 if k = 1; 0 if k ≠ 1.
   Moreover, this solution is a feasible point for the (possibly extended) model including the intercept.

SLIDE 12

◮ (cont.)

4. Build a supermodel joining all variables present either in the anchor or in the original model, namely:
   M*k,j = max(M⁰k,j, Mk,j), k = 1...C, j = 1...D.
5. Solve the sequence of optimization problems
   max_β g(β | X, Y) = −K ∑_{j=1}^D bj β1,j² + log L(β | X, Y).
   The boolean vector b indicates which of the β1,j are “artificial” variables: bj = 1 − M1,j, i.e. bj = 1 if M1,j = 0, and 0 otherwise.
◮ −K ∑j bj β1,j²: penalty term for the artificial variables.
◮ Repeating step 5 with an increasing sequence of scalars Kt drives the artificial variables to zero, converging to the optimal solution (best fit) of the original model.
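The whole continuation scheme (steps 1-5) can be sketched as follows. This is our illustrative reading, under simplifying assumptions: linear link, the intercept already in column 0 of X, derivative-free Nelder-Mead as the inner optimizer, a fixed ladder of penalties Kt, and the all-ones Dirichlet taken as a simple feasible anchor instead of the constant-model MLE:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_objective(bvec, X, Y, M_star, b_art, K):
    """-g(beta) = K * sum_j b_j beta_{1j}^2 - log L(beta | X, Y)."""
    beta = bvec.reshape(X.shape[1], Y.shape[1]) * M_star
    alpha = X @ beta
    if np.any(alpha <= 0):
        return np.inf                       # outside the feasible region
    lam = alpha.sum(axis=1)
    ll = np.sum(gammaln(lam) - gammaln(alpha).sum(axis=1)
                + ((alpha - 1.0) * np.log(Y)).sum(axis=1))
    return K * np.sum((b_art * beta[0]) ** 2) - ll

def fit(X, Y, M, Ks=(0.0, 1.0, 10.0, 100.0)):
    C, D = X.shape[1], Y.shape[1]
    M_star = M.astype(float).copy()
    M_star[0, :] = 1.0                      # step 4: supermodel with intercepts
    b_art = 1.0 - M[0].astype(float)        # artificial intercept indicators
    beta = np.zeros((C, D))
    beta[0, :] = 1.0                        # feasible anchor point (steps 1-3)
    for K in Ks:                            # step 5: increasing penalties K_t
        res = minimize(neg_objective, beta.ravel(),
                       args=(X, Y, M_star, b_art, K), method="Nelder-Mead")
        beta = res.x.reshape(C, D) * M_star
    return beta

# Demo on synthetic data: the intercept of part 3 is an artificial variable.
rng = np.random.default_rng(2)
x = rng.uniform(0.5, 2.0, 40)
X = np.column_stack([np.ones(40), x])
Y = rng.dirichlet([2.0, 3.0, 4.0], 40)
M = np.array([[1, 1, 0],
              [1, 0, 1]])                   # original model's structure
beta_hat = fit(X, Y, M)
print(np.all(X @ beta_hat > 0))             # feasible fit
```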

SLIDE 13

Prediction using Dirichlet Regression

◮ Having obtained the estimate β̂, the expected composition in y given the covariate vector x is the mean of the distribution D(α̂(x)):
  ŷ = ( α̂1(x)/Λ̂(x), α̂2(x)/Λ̂(x), ..., α̂D(x)/Λ̂(x) ),  where Λ̂(x) = ∑_{j=1}^D α̂j(x).
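The prediction rule is a one-liner; a minimal sketch with hypothetical fitted coefficients:

```python
import numpy as np

def predict(beta, x):
    """Predicted composition: the mean of D(alpha(x)), i.e. alpha_j / Lambda."""
    alpha = x @ beta
    return alpha / alpha.sum()

beta_hat = np.array([[1.0, 2.0, 1.0],    # hypothetical fitted C x D matrix
                     [0.1, 0.0, 0.3]])
x = np.array([1.0, 5.0])                 # intercept and one covariate value
yhat = predict(beta_hat, x)
print(yhat)     # -> 0.25, 1/3, 5/12; the components sum to one
```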

SLIDE 14

Results - Parameter Estimation Procedures

◮ Random subsamples of the Arctic lake dataset, n ∈ {20, 27}.
◮ We try to fit each subsample with an incomplete polynomial model described by a random structural matrix M⁽q⁾: M⁽q⁾k,j ∼ Ber(p), with fill-in probability p ∈ {0.33, 0.5, 0.66}.
◮ Performance measures:
  1. Failure rate;
  2. Computational processing time.

SLIDE 15

[Figure: failure rates and processing times (log₂ seconds) of Hijazi’s method vs. our method, as functions of the model-matrix fill-in probability Pr(mk,j = 1) ∈ {0.33, 0.5, 0.66}.]

SLIDE 16

Full Bayesian Significance Test (FBST)

◮ FBST: proposed by Pereira & Stern (1999); reviewed in Pereira et al. (2008).
◮ Notation and assumptions:
  ◮ Parameter space: Θ ⊆ Rⁿ.
  ◮ Hypothesis H : θ ∈ ΘH, where ΘH = {θ ∈ Θ | g(θ) ≤ 0 ∧ h(θ) = 0}; dim(ΘH) < dim(Θ).
  ◮ fx(θ) denotes the posterior probability density function.

SLIDE 17

◮ Computation of the evidence measure used in the FBST:
  1. Optimization step: find the maximum (supremum) of the posterior under the hypothesis:
     θ* = arg sup_{θ∈ΘH} fx(θ),  f* = fx(θ*).
  2. Integration step: integrate the posterior density over the tangential set T = {θ ∈ Θ : fx(θ) > f*}:
     Ēv(H) = Pr(θ ∈ T | x) = ∫_T fx(θ) dθ.
◮ Ēv(H) “large” ⇒ T “heavy” ⇒ the hypothesis set lies in a region of “low” posterior density ⇒ “strong” evidence against H.
◮ Ēv(H): evidence against H; Ev(H) = 1 − Ēv(H): evidence in favor of H.

SLIDE 18

◮ Example: Hardy-Weinberg equilibrium (Lauretto et al., 2009).
  n: sample size; x11, x22: homozygote counts; x12 = n − x11 − x22: heterozygote count.
  θ = (θ11, θ22, θ12): population genotype proportions.
  fx(θ) ∝ θ11^{x11} θ22^{x22} θ12^{x12}, i.e. the posterior is Dirichlet(x11+1, x22+1, x12+1).
  Θ = S², H = {θ ∈ Θ : θ22 = (1 − √θ11)²}.

[Figure: posterior contours over (θ11, θ22), showing the maximizer θ*, the tangential set T and the hypothesis curve ΘH, for the samples X = [5, 5, 10] and X = [3, 7, 10].]
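Both FBST steps are easy to mimic numerically for this example. A Monte Carlo sketch of ours, assuming the uniform prior above: a grid search over ΘH for the optimization step, and direct posterior sampling for the integration step (the slides use Metropolis-Hastings for the regression model):

```python
import numpy as np
from scipy.special import gammaln

x11, x22, n = 5, 5, 20                     # the X = [5, 5, 10] panel
x12 = n - x11 - x22
a = np.array([x11 + 1, x22 + 1, x12 + 1])  # posterior Dirichlet parameters

def logpost(theta):
    """Dirichlet(a) log-density at theta (last axis: components)."""
    return (gammaln(a.sum()) - gammaln(a).sum()
            + ((a - 1.0) * np.log(theta)).sum(axis=-1))

# 1. Optimization step: sup of the posterior on H, by a fine grid over theta11,
#    with theta22 = (1 - sqrt(theta11))^2 on the hypothesis curve.
t11 = np.linspace(1e-6, 1.0 - 1e-6, 200_000)
t22 = (1.0 - np.sqrt(t11)) ** 2
H = np.stack([t11, t22, 1.0 - t11 - t22], axis=-1)
log_fstar = logpost(H).max()

# 2. Integration step: Pr(f(theta) > f*) estimated from posterior draws.
rng = np.random.default_rng(0)
ev_against = np.mean(logpost(rng.dirichlet(a, 200_000)) > log_fstar)
print(1.0 - ev_against)   # evidence in favor of H; near 1 for these counts
```

For X = [5, 5, 10] the posterior mode (0.25, 0.25, 0.5) lies exactly on the Hardy-Weinberg curve, so the tangential set is nearly empty and the evidence in favor is close to one.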

SLIDE 19

FBST for Dirichlet Regression

◮ In this work:
  ◮ θ = (βk,j, j = 1...D, k = 1...C).
  ◮ We assume an improper uniform prior for βk,j on R^{D×C}: fx(β) ∝ L(β | X, Y).
  ◮ Numerical integration: Metropolis-Hastings.

SLIDE 20

Results - Hypotheses Tests

◮ Complete model: second-order polynomials αj(x) = β1,j + β2,jx + β3,jx², j = 1...3.
◮ Hypothesis H : β3,j = 0, j = 1, 2, 3 (the assumption that αj(x) may be suitably modelled as a first-order polynomial).
◮ Analysis of Type I, Type II and average errors.
◮ Acceptance/rejection threshold obtained by:
  1. Asymptotic approximation;
  2. Empirical power analysis.

SLIDE 21

◮ (a), (b), (c): asymptotic acceptance/rejection thresholds;
◮ (d): empirical thresholds.

SLIDE 22

Current work

◮ Avoiding the explicit positiveness constraint on αj(xi•) via positive link functions g:
  αj(xi•) = g(xi,1β1,j + xi,2β2,j + ... + xi,CβC,j) = g(xi•β•j),
  where g : R → (0, +∞) is continuous, differentiable and injective (Melo et al., 2009).
  ◮ Special case: αj(xi•) = exp(xi,1β1,j + xi,2β2,j + ... + xi,CβC,j).
  ◮ Drawback: the stability of maximum likelihood estimation decreases rapidly with the number of covariates.
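With g = exp, positivity of α is automatic for any real β, which is exactly what removes the constraint. A two-line sketch with hypothetical numbers:

```python
import numpy as np

def alpha_exp(X, beta):
    """Exponential link: alpha_j(x_i.) = exp(x_i. beta_.j), positive always."""
    return np.exp(X @ beta)

X = np.array([[1.0, -2.0],
              [1.0,  3.0]])
beta = np.array([[ 0.5, -1.0,  2.0],    # arbitrary signs are now allowed
                 [-0.3,  0.4, -0.8]])
alpha = alpha_exp(X, beta)
print(bool(np.all(alpha > 0)))          # True for any real beta
```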

◮ FBST integration convergence:
  ◮ Alternative priors;
  ◮ Other Monte Carlo methods: nested sampling (Skilling, 2006).

SLIDE 23

Acknowledgments

◮ University of São Paulo
◮ CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
◮ CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico
◮ FAPESP - Fundação de Amparo à Pesquisa do Estado de São Paulo

SLIDE 24

References

1. Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall, London. Reprinted (2003) with additional material by The Blackburn Press, Caldwell, NJ.
2. Campbell, G. and Mosimann, J. (1987). Multivariate methods for proportional shape. ASA Proceedings of the Section on Statistical Graphics, 10-17.
3. Hijazi, R.H. and Jernigan, R.W. (2009). Modelling compositional data using Dirichlet regression models. Journal of Applied Probability & Statistics 4, 77-91.
4. Lauretto, M.S., Nakano, F., Faria Jr., S.R., Pereira, C.A.B., Stern, J.M. (2009). A straightforward multiallelic significance test for the Hardy-Weinberg equilibrium law. Genetics and Molecular Biology 32(3), 619-625.
5. Melo, T.F.N., Vasconcellos, K.L.P., Lemonte, A.J. (2009). Some restriction tests in a new class of regression models for proportions. Computational Statistics and Data Analysis 53, 3972-3979.
6. Pereira, C.A.B., Stern, J.M., Wechsler, S. (2008). Can a significance test be genuinely Bayesian? Bayesian Analysis 3(1), 79-100.