Exploring Large Regression Model Spaces via Trans-dimensional Genetic Algorithms

Exploring Large Regression Model Spaces via Trans-dimensional Genetic Algorithms. Ricardo S. Ehlers, ICMC - USP, http://www.icmc.usp.br/~ehlers, ehlers@icmc.usp.br. Joint work with Marco A.R. Ferreira, University of Missouri. UFSCar, April 2009.


SLIDE 1

Exploring Large Regression Model Spaces via Trans-dimensional Genetic Algorithms Ricardo S. Ehlers

ICMC - USP http://www.icmc.usp.br/~ehlers ehlers@icmc.usp.br

Joint work with Marco A.R. Ferreira, University of Missouri.

SLIDE 2

UFSCar, April 2009

Searching for the “Best” Model(s)

  • Suppose that the number M of alternative models is quite large.

E.g. a linear model with 19 possible covariates: $2^{19} = 524288$ alternative models (with no interactions).

  • Enumerating every possible model, estimating it and attaching a measure of fit and parsimony to it may not be the best strategy.
  • How to compare competing models?
  • How to make model-averaged inference using the competing models (or a subset of them)?

Ricardo Ehlers Exploring Large Regression Model Spaces 2

SLIDE 3

Bayesian Approach

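  • Models $M_1, \dots, M_k$ are assigned a priori probabilities $p(M_i)$.
  • For each model, $\theta_i \in \mathbb{R}^{n_i}$ with:

– a likelihood function $p(y \mid \theta_i, M_i)$
– a prior distribution $p(\theta_i \mid M_i)$.

  • By Bayes' theorem,

\[ \pi(M_i, \theta_i) \propto p(y \mid \theta_i, M_i)\, p(\theta_i \mid M_i)\, p(M_i) \]
\[ p(M_i \mid y) \propto p(y \mid M_i)\, p(M_i) \]
\[ p(y \mid M_i) = \int p(y \mid \theta_i, M_i)\, p(\theta_i \mid M_i)\, d\theta_i \]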
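
The posterior model probabilities above are what model-averaged inference would weight by. As a hedged illustration only (the symbol $\Delta$ for a quantity of interest is an assumption and does not appear in the slides), the standard Bayesian model averaging identity can be written as:

```latex
% Delta is a hypothetical quantity of interest (e.g. a prediction); not from the slides.
p(\Delta \mid y) = \sum_{i=1}^{k} p(\Delta \mid M_i, y)\, p(M_i \mid y),
\qquad
p(M_i \mid y) = \frac{p(y \mid M_i)\, p(M_i)}{\sum_{l=1}^{k} p(y \mid M_l)\, p(M_l)} .
```

The normalising sum runs over all $k$ models, which is exactly what becomes impractical when the model space is very large.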

SLIDE 4

Approaches

  • Akaike (1974): $\mathrm{AIC}(\hat\theta_i, M_i) = -2 \log p(y \mid \hat\theta_i, M_i) + 2 n_i$
  • Schwarz (1978): $\mathrm{BIC}(\hat\theta_i, M_i) = -2 \log p(y \mid \hat\theta_i, M_i) + n_i \log T$
  • Spiegelhalter et al. (2002): $\mathrm{DIC}(\theta_i, M_i) = -2 \log p(y \mid \theta_i, M_i) + 2 p_D$
  • Gelfand and Ghosh (1998): $D_\gamma = \frac{\gamma}{\gamma + 1} \sum_{i=1}^{n} (\mu_i - y_{i,\mathrm{obs}})^2 + \sum_{i=1}^{n} \sigma_i^2$
  • George and McCulloch (1993): SSVS
  • Chen (2005), Chib (1995), Chib and Jeliazkov (2001), Friel and Pettitt (2008): estimating the marginal likelihood.
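
The first two criteria are simple functions of the maximised log-likelihood. The following is a minimal sketch, assuming the log-likelihood at the maximum likelihood estimate is already available; the function and argument names are illustrative and not taken from the talk.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """AIC = -2 log p(y | theta_hat, M) + 2 n, with n the number of parameters."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 log p(y | theta_hat, M) + n log T, with T the sample size."""
    return -2.0 * log_lik + n_params * math.log(n_obs)
```

For the same fit, BIC penalises each extra parameter more heavily than AIC whenever log T > 2, that is, for samples of 8 or more observations.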

SLIDE 5

Genetic Algorithms

Holland (1975), Chatterjee, Laudato, and Lynch (1996)

A population of M individuals, each of dimension L:

\[
\begin{pmatrix}
x_{11} & \cdots & x_{1k} & \cdots & x_{1L} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{ik} & \cdots & x_{iL} \\
\vdots &        & \vdots &        & \vdots \\
x_{j1} & \cdots & x_{jk} & \cdots & x_{jL} \\
\vdots &        & \vdots &        & \vdots \\
x_{M1} & \cdots & x_{Mk} & \cdots & x_{ML}
\end{pmatrix}
\]

Apply genetic operators to transform the population: selection, crossover, mutation.
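
To make the population picture concrete, here is a minimal sketch of a binary population with two textbook operators. It assumes 0/1 genes, one-point crossover and independent bit-flip mutation; these are generic choices for illustration, not necessarily the operators used later in the talk, and all function names are hypothetical.

```python
import random


def random_population(m, length, rng):
    """Return m individuals, each a list of `length` random 0/1 genes."""
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(m)]


def one_point_crossover(a, b, rng):
    """Cut both parents at one random interior point and exchange the tails."""
    cut = rng.randrange(1, len(a))  # requires len(a) >= 2
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip_mutation(x, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in x]
```

A typical call would be `one_point_crossover(pop[0], pop[1], random.Random(0))`; passing the generator explicitly keeps runs reproducible.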

SLIDE 6

Trans-dimensional Jumps

Green (1995)

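  • Propose a jump from model $M_i$ to model $M_j$ w.p. $r_{ij}$,
  • generate a vector $u$ of dimension $n_j - n_i$ from $q(\cdot)$,
  • set $\theta_j = f_{ij}(\theta_i, u)$, where $f_{ij} : \Theta_i \times \mathbb{R}^{n_j - n_i} \to \Theta_j$ denotes a bijective function.
  • Accept the jump w.p. $\min(1, A)$, where

\[
A = \underbrace{\frac{\pi(\theta_j, M_j)}{\pi(\theta_i, M_i)}}_{\text{target ratio}}
\times
\underbrace{\frac{r_{ji}}{r_{ij}\, q(u)} \left| \frac{\partial f_{ij}(\theta_i, u)}{\partial(\theta_i, u)} \right|}_{\text{proposal ratio}}
\]

The choice of the proposal distribution $q$ is crucial to cover the model and parameter spaces.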
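
The acceptance ratio A is usually evaluated on the log scale to avoid overflow. The sketch below assumes the caller supplies the log target values, the jump probabilities, the log density of u and the log absolute Jacobian; the function name and argument layout are hypothetical.

```python
import math


def rj_acceptance_probability(log_target_new, log_target_old,
                              r_forward, r_reverse,
                              log_q_u, log_abs_jacobian=0.0):
    """Return min(1, A) for a jump from M_i to M_j.

    log_target_new / log_target_old: log pi(theta_j, M_j) and log pi(theta_i, M_i)
    r_forward, r_reverse: r_ij and r_ji, both assumed strictly positive
    log_q_u: log q(u) for the auxiliary vector u
    log_abs_jacobian: log |d f_ij(theta_i, u) / d(theta_i, u)|
    """
    log_a = (log_target_new - log_target_old
             + math.log(r_reverse) - math.log(r_forward)
             - log_q_u + log_abs_jacobian)
    return 1.0 if log_a >= 0 else math.exp(log_a)
```

When $f_{ij}$ simply appends u to $\theta_i$, the Jacobian is 1 and the default `log_abs_jacobian=0.0` applies.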

SLIDE 7

We assume that:

  • $\theta_i \mid M_i$ is easy to estimate using standard methods and software.
  • The posterior distribution on model space is well approximated by

\[ P(M_k \mid y) \propto \exp\{-\mathrm{BIC}(\hat\theta_k, k)/2\}, \]
\[ \mathrm{BIC}(\hat\theta_k, k) = -2 \log p(y \mid \hat\theta_k, k) + n_k \log T, \]

where $\hat\theta_k$ is the maximum likelihood estimate under model $M_k$.
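
When only a handful of models is being compared, the approximation can be normalised directly. This is a minimal sketch, assuming a plain list of BIC values and equal prior model probabilities; subtracting the smallest BIC first is a numerical safeguard added here, and the function name is illustrative.

```python
import math


def bic_model_probabilities(bics):
    """Normalise exp(-BIC/2) over the supplied models."""
    best = min(bics)
    # Shifting by the best BIC leaves the ratios unchanged and avoids underflow.
    weights = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]
```

Two models whose BIC values differ by 2 have approximate posterior odds of e to 1 in favour of the model with the smaller BIC.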

SLIDE 8

RJMCMC + Genetic Algorithms

\[ g(E(Y)) = \beta_0 + \beta_{j_1} x_{j_1} + \cdots + \beta_{j_k} x_{j_k}, \qquad k = 0, \dots, k_{\max} \]

Given a population of models $Z = (z_1, \dots, z_M)$, where $z_{ij} \in \{0, 1\}$,

  • 1. propose a new population $z'$ via genetic operators (esp. mutation and crossover),
  • 2. accept the new population with probability

\[ \min\left\{ 1, \frac{\exp\{-\mathrm{BIC}(z')/2\}}{\exp\{-\mathrm{BIC}(z)/2\}} \, \frac{P(z', z)}{P(z, z')} \right\}, \]

where $P(z, z') = \Pr(\text{proposing a jump from population } z \text{ to } z')$.
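
The population-level acceptance step can be sketched as follows. It assumes that the BIC of a population is the sum of the BICs of its members (which is how the crossover slide writes the ratio), that each member is a 0/1 list with 1 marking an included covariate, and that `bic` is a user-supplied function; all names are hypothetical.

```python
import math


def population_log_target(population, bic):
    """Log of the unnormalised target: -sum_i BIC(z_i) / 2."""
    return -0.5 * sum(bic(z) for z in population)


def acceptance_probability(pop_new, pop_old, bic,
                           log_p_reverse=0.0, log_p_forward=0.0):
    """min(1, target ratio * P(z', z) / P(z, z')), computed on the log scale."""
    log_ratio = (population_log_target(pop_new, bic)
                 - population_log_target(pop_old, bic)
                 + log_p_reverse - log_p_forward)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)
```

Leaving both proposal terms at their default of 0.0 corresponds to a symmetric proposal, where the ratio of proposal probabilities is 1.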

SLIDE 9

Crossover Move

Combine pairs of models to generate offspring, which are more likely to be accepted if they perform well. Randomly choose a pair of individuals $z_i$, $z_j$ and propose a new population as follows:

  • 1. select those elements with different values, $K = \{k : z_{ik} \neq z_{jk}\}$
  • 2. randomly choose $k \in K$
  • 3. set $z'_{ik} = z_{jk}$ and $z'_{jk} = z_{ik}$
  • 4. accept this new population with probability

\[ \min\left\{ 1, \frac{\exp(-\mathrm{BIC}(z'_i)/2 - \mathrm{BIC}(z'_j)/2)}{\exp(-\mathrm{BIC}(z_i)/2 - \mathrm{BIC}(z_j)/2)} \, \frac{P(z', z)}{P(z, z')} \right\} \]

  • Repeat this updating scheme for all $[M/2]$ pairs selected without replacement from the population.
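
The crossover steps can be sketched for a single pair as below. The sketch assumes k is drawn uniformly from K; under that assumption the swapped pair still differs at exactly the same positions, so the forward and reverse proposal probabilities are equal and their ratio drops out. The `bic` argument is a user-supplied function and all names are hypothetical.

```python
import math
import random


def crossover_move(zi, zj, bic, rng):
    """Swap one differing gene between zi and zj, then accept or reject."""
    differing = [k for k in range(len(zi)) if zi[k] != zj[k]]
    if not differing:
        # Identical individuals: nothing to exchange.
        return list(zi), list(zj)
    k = rng.choice(differing)
    new_i, new_j = list(zi), list(zj)
    new_i[k], new_j[k] = zj[k], zi[k]
    # Log acceptance ratio; the proposal ratio is 1 under the uniform choice of k.
    log_ratio = -0.5 * (bic(new_i) + bic(new_j)) + 0.5 * (bic(zi) + bic(zj))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return new_i, new_j
    return list(zi), list(zj)
```

A full sweep would shuffle the population, split it into pairs and call `crossover_move` once per pair.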

SLIDE 10

Mutation Move

Include a new regressor w.p. $w$, or delete an existing one w.p. $1 - w$. Suppose we are updating $z_i$ and propose an inclusion. Define $R_0 = \{j : z_{ij} = 0\}$ and $R_1 = \{j : z_{ij} = 1\}$. Then,

  • 1. randomly choose $j \in R_0$ and set $z'_{ij} = 1$
  • 2. accept this move w.p. $\min(1, A)$, where

\[ A = \frac{\exp(-\mathrm{BIC}(z'_i)/2)}{\exp(-\mathrm{BIC}(z_i)/2)} \, \frac{(1 - w)\, |R_0|}{w\, (|R_1| + 1)} \]

and $|J|$ denotes the cardinality of $J$. Likewise, if a deletion is proposed,

  • 1. choose $j \in R_1$ and set $z'_{ij} = 0$.
  • 2. accept the move w.p. $\min(1, A^{-1})$.

Repeat this updating scheme for all $z_1, \dots, z_M$.
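
A single mutation update can be sketched as follows. It assumes 0 < w < 1, a uniform choice of j within R0 or R1, and that a proposal which cannot be carried out (no regressor left to add or to delete) simply leaves the model unchanged; that boundary rule is an assumption made here, not something stated on the slide. The `bic` argument is user-supplied and all names are hypothetical.

```python
import math
import random


def mutation_move(z, bic, w, rng):
    """Propose adding (w.p. w) or deleting (w.p. 1 - w) one regressor."""
    r0 = [j for j, bit in enumerate(z) if bit == 0]
    r1 = [j for j, bit in enumerate(z) if bit == 1]
    if rng.random() < w:                      # inclusion proposed
        if not r0:
            return list(z)
        j = rng.choice(r0)
        # log of (1 - w) |R0| / (w (|R1| + 1))
        log_prop = (math.log(1 - w) + math.log(len(r0))
                    - math.log(w) - math.log(len(r1) + 1))
    else:                                     # deletion proposed
        if not r1:
            return list(z)
        j = rng.choice(r1)
        # reverse of an inclusion: w |R1| / ((1 - w) (|R0| + 1))
        log_prop = (math.log(w) + math.log(len(r1))
                    - math.log(1 - w) - math.log(len(r0) + 1))
    proposal = list(z)
    proposal[j] = 1 - proposal[j]
    log_a = -0.5 * (bic(proposal) - bic(z)) + log_prop
    if log_a >= 0 or rng.random() < math.exp(log_a):
        return proposal
    return list(z)
```

The deletion branch is the inclusion ratio A inverted and evaluated from the other side of the move, which is why the roles of R0 and R1 are exchanged.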

SLIDE 11

Example - linear regression

Effect of punishment regimes on crime rates in 47 US states, 15 potential regressors (Raftery, Painter, and Volinsky 2005).

M: percentage of males aged 14-24
So: indicator variable for a southern state
Ed: mean years of schooling
Po1: police expenditure in 1960
Po2: police expenditure in 1959
LF: labour force participation rate
M.F: number of males per 1000 females
Pop: state population
NW: number of nonwhites per 1000 people
U1: unemployment rate of urban males 14-24
U2: unemployment rate of urban males 35-39
GDP: gross domestic product per head
Ineq: income inequality
Prob: probability of imprisonment
Time: average time served in state prisons

SLIDE 12

Posterior probabilities (Probs) of the ten models listed: 0.209, 0.122, 0.060, 0.055, 0.053, 0.036, 0.026, 0.025, 0.023, 0.022.

Variable   Included in (of the 10 models)   Prob.inc
M          10                               0.9890
So          0                               0.0549
Ed         10                               1.0000
Po1         7                               0.7714
Po2         3                               0.2459
LF          0                               0.0290
M.F         0                               0.0347
Pop         1                               0.2049
NW          9                               0.9227
U1          1                               0.0889
U2          9                               0.8891
GDP         2                               0.2414
Ineq       10                               1.0000
Prob       10                               0.9956
Time        6                               0.4963
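
The Prob.inc column can be read as the share of posterior probability carried by the models that contain a given variable. The following is a minimal sketch of that calculation, assuming each visited model is stored as a 0/1 list alongside its estimated probability; the function name and the toy numbers in the usage note are hypothetical and are not the values from the slide.

```python
def inclusion_probabilities(models, probs):
    """Sum the (renormalised) model probabilities over models containing each variable."""
    total = sum(probs)
    n_vars = len(models[0])
    return [
        sum(p for z, p in zip(models, probs) if z[j] == 1) / total
        for j in range(n_vars)
    ]
```

For example, three models `[1, 0]`, `[1, 1]`, `[0, 1]` with probabilities 0.5, 0.3 and 0.2 put the first variable in models carrying 0.5 + 0.3 of the total mass.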

SLIDE 13

Models visited by GA-MCMC

[Figure: covariates (M, So, Ed, Po1, Po2, LF, M.F, Pop, NW, U1, U2, GDP, Ineq, Prob, Time) plotted against model number (axis ticks 1, 2, 3, 4, 5, 7, 12, 21, 54).]

SLIDE 14

Example - Logistic Regression

Risk factors associated with low infant birth weight (Hosmer and Lemeshow 1989). $y_i \sim \mathrm{Bernoulli}(\pi_i)$, where $\pi_i$ is the probability that the $i$th baby has low weight at birth. Under model $k$ this is associated with the covariates as

\[ \log\left( \frac{\pi_i}{1 - \pi_i} \right) = X_i' \theta. \]
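
Inverting the logit link gives the probability for a given covariate vector. This is a minimal sketch, assuming the covariates and coefficients are plain lists of equal length (including any intercept term); the function names are illustrative only.

```python
import math


def inv_logit(eta):
    """Numerically stable inverse of log(pi / (1 - pi)) = eta."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def low_weight_probability(x, theta):
    """pi_i for covariate vector x and coefficient vector theta."""
    return inv_logit(sum(a * b for a, b in zip(x, theta)))
```

Branching on the sign of `eta` keeps `math.exp` from overflowing for large linear predictors of either sign.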

SLIDE 15

Model indicator   Covariates included   Model probability
35                2                     0.0962
99                3                     0.0673
51                3                     0.0600
43                3                     0.0599
107               4                     0.0333
3                 1                     0.0294
115               4                     0.0287
17                1                     0.0239
19                2                     0.0202
47                4                     0.0202

Covariate   Inclusion probability
age         0.190
lwt         0.696
race        0.140
smoke       0.381
ptl         0.349
ht          0.659
ui          0.376
ftv         0.081

SLIDE 16

Models visited by GA-MCMC

[Figure: covariates (age, lwt, race, smoke, ptl, ht, ui, ftv) plotted against model number (axis ticks 2, 4, 6, 9, 14, 22, 33, 50, 84).]

SLIDE 17

Example - Censored Survival Models

Survival times of patients with primary biliary cirrhosis,

\[ h(t) = h_0(t) \exp(X_i' \theta). \]

age: in years
alb: serum albumin
alkphos: alkaline phosphatase
ascites: presence of ascites
bili: serum bilirubin
edtrt: edema treatment
hepmeg: enlarged liver
platelet: platelet count
protime: standardised blood clotting time
sex: 1=male
sgot: liver enzyme (now called AST)
spiders: blood vessel malformations in the skin
stage: histologic stage of disease (needs biopsy)
trt: 1/2/-9 for control, treatment, not randomised
copper: urine copper

SLIDE 18

Posterior probabilities (Probs) of the ten models listed: 0.081, 0.037, 0.036, 0.029, 0.026, 0.025, 0.021, 0.018, 0.017, 0.017.

Variable   Included in (of the 10 models)   Prob.inc
age        10                               0.999
alb        10                               0.997
alkphos     1                               0.038
ascites     0                               0.012
bili       10                               1.000
edtrt       9                               0.916
hepmeg      0                               0.012
platelet    0                               0.014
protime     8                               0.848
sex         0                               0.020
sgot        3                               0.101
spiders     0                               0.007
stage       3                               0.258
trt         0                               0.011
copper      7                               0.863

SLIDE 19

Models visited by GA-MCMC

[Figure: covariates (age, alb, alkphos, ascites, bili, edtrt, hepmeg, platelet, protime, sex, sgot, spiders, stage, trt, copper) plotted against model number (axis ticks 2, 5, 9, 16, 28, 52, 91, 185, 538).]

SLIDE 20

A Few Comments

  • A suite of R functions was written for linear regression, logistic regression and Cox proportional hazards models.
  • How to assess convergence?
  • How to specify the population size?
  • The method is being extended to include quadratic terms and interactions in the model, which leads to huge model spaces.
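
A rough count shows how quickly the model space grows with such an extension. The sketch below assumes one quadratic term per covariate, all pairwise interactions, and that every subset of candidate terms is an admissible model (no hierarchy constraint); these counting rules are assumptions for illustration, and the function names are hypothetical.

```python
from math import comb


def n_candidate_terms(p, quadratic=True, interactions=True):
    """Main effects, plus optional squares and pairwise interactions."""
    return p + (p if quadratic else 0) + (comb(p, 2) if interactions else 0)


def model_space_size(p, quadratic=True, interactions=True):
    """Number of subsets of the candidate terms."""
    return 2 ** n_candidate_terms(p, quadratic, interactions)
```

With main effects only, 19 covariates give the 524288 models quoted earlier; with 15 covariates, adding squares and pairwise interactions gives 135 candidate terms and therefore 2^135 models under these assumptions.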

SLIDE 21

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.
Chatterjee, S., M. Laudato, and L. Lynch (1996). Genetic algorithms and their statistical applications: an introduction. Computational Statistics and Data Analysis 22, 633–651.
Chen, M.-H. (2005). Computing marginal likelihoods from a single MCMC output. Statistica Neerlandica 59(1), 16–29.
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association 90(432), 1313–1321.
Chib, S. and I. Jeliazkov (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association 96, 270–281.
Friel, N. and A. N. Pettitt (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society, Series B 70, 589–607.
Gelfand, A. E. and S. K. Ghosh (1998). Model choice: A posterior predictive loss approach. Biometrika 85, 1–11.
George, E. I. and R. E. McCulloch (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association 88(423), 881–889.

SLIDE 22

Green, P. J. (1995). Reversible jump MCMC computation and Bayesian model determination. Biometrika 82, 711–732.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press.
Hosmer, D. and S. Lemeshow (1989). Applied Logistic Regression. New York: Wiley.
Raftery, A. E., I. S. Painter, and C. T. Volinsky (2005). BMA: An R package for Bayesian model averaging. R News 5(2), 2–8.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–464.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B 64, 1–34.
