Exploring Large Regression Model Spaces via Trans-dimensional Genetic Algorithms
Ricardo S. Ehlers
ICMC - USP
http://www.icmc.usp.br/ ehlers
ehlers@icmc.usp.br
Joint work with Marco A.R. Ferreira, University of Missouri.
UFSCar, April 2009
Searching for the “Best” Model(s)
- Suppose that the number M of alternative models is quite large.
E.g. a linear model with 19 possible covariates has $2^{19} = 524288$ alternative models (without interactions).
- Enumerating and estimating every possible model, and attaching to each one a measure of fit and parsimony, may not be the best strategy.
- How should competing models be compared?
- How can inference be averaged over the competing models (or a subset of them)?
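To make the size of the search space concrete, a small Python sketch counting candidate models for p covariates. The function name `n_models` and the optional cap on model size are illustrative assumptions, not part of the original slides.

```python
from math import comb


def n_models(p, max_size=None):
    """Count candidate models built from p covariates (intercept always included).

    With no size limit every subset of covariates is a model, giving 2**p models;
    with max_size set, only subsets of at most max_size covariates are counted.
    """
    if max_size is None:
        return 2 ** p
    return sum(comb(p, k) for k in range(max_size + 1))


print(n_models(19))               # 524288, as on the slide
print(n_models(19, max_size=5))   # models with at most 5 covariates
# Treating all 19*18/2 = 171 pairwise interactions as extra candidates:
print(n_models(19 + comb(19, 2)))
```

Even a modest set of interaction terms pushes exhaustive enumeration far out of reach, which motivates a stochastic search.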
Bayesian Approach
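- Models $M_1, \dots, M_k$ are assigned prior probabilities $p(M_i)$.
- Each model has a parameter vector $\theta_i \in \mathbb{R}^{n_i}$ with
  – a likelihood function $p(y \mid \theta_i, M_i)$,
  – a prior distribution $p(\theta_i \mid M_i)$.
- By Bayes' theorem,
$$\pi(M_i, \theta_i) \propto p(y \mid \theta_i, M_i)\, p(\theta_i \mid M_i)\, p(M_i),$$
$$p(M_i \mid y) \propto p(y \mid M_i)\, p(M_i), \qquad p(y \mid M_i) = \int p(y \mid \theta_i, M_i)\, p(\theta_i \mid M_i)\, d\theta_i.$$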
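Once the log marginal likelihoods $\log p(y \mid M_i)$ are available (for example from one of the marginal likelihood estimators cited in this talk), the posterior model probabilities follow by normalisation. A minimal Python sketch, assuming those log marginal likelihoods are already computed; it works on the log scale because log marginal likelihoods are often hundreds or thousands in absolute value and their exponentials would underflow.

```python
import math


def posterior_model_probs(log_marglik, prior):
    """p(M_i | y) proportional to p(y | M_i) p(M_i), normalised on the log scale.

    log_marglik: log p(y | M_i) for each model; prior: p(M_i) for each model.
    """
    logs = [lm + math.log(pr) for lm, pr in zip(log_marglik, prior)]
    m = max(logs)                              # subtract the maximum (log-sum-exp trick)
    w = [math.exp(v - m) for v in logs]
    s = sum(w)
    return [wi / s for wi in w]


print(posterior_model_probs([0.0, math.log(2.0)], [0.5, 0.5]))   # about [1/3, 2/3]
print(posterior_model_probs([-1000.0, -1001.0], [0.5, 0.5]))     # no underflow
```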
Approaches
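- Akaike (1974): $\mathrm{AIC}(\hat\theta_i, M_i) = -2 \log p(y \mid \hat\theta_i, M_i) + 2 n_i$
- Schwartz (1978): $\mathrm{BIC}(\hat\theta_i, M_i) = -2 \log p(y \mid \hat\theta_i, M_i) + n_i \log T$,
  where $\hat\theta_i$ is the maximum likelihood estimate under $M_i$, $n_i$ the number of parameters and $T$ the sample size.
- Spiegelhalter et al. (2002): $\mathrm{DIC}(\bar\theta_i, M_i) = -2 \log p(y \mid \bar\theta_i, M_i) + 2 p_D$,
  where $\bar\theta_i$ is the posterior mean and $p_D$ the effective number of parameters.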
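- Gelfand and Ghosh (1998):
$$D_\gamma = \frac{\gamma}{\gamma+1} \sum_{i=1}^{n} (\mu_i - y_{i,\mathrm{obs}})^2 + \sum_{i=1}^{n} \sigma_i^2,$$
  where $\mu_i$ and $\sigma_i^2$ are the posterior predictive mean and variance of $y_i$.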
- George and McCulloch (1993): stochastic search variable selection (SSVS).
- Chen (2005), Chib (1995), Chib and Jeliazkov (2001), Friel and Pettit (2008): estimating the marginal likelihood.
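As an illustration of AIC, BIC and the Gelfand and Ghosh loss, a Python sketch for a Gaussian linear model fitted by least squares. Assumptions not stated on the slide: $n_i$ counts the regression coefficients plus the error variance, and the predictive means and variances passed to `gelfand_ghosh` come from some posterior predictive simulation that is not shown here.

```python
import numpy as np


def aic_bic_linear(X, y):
    """AIC and BIC of y = X beta + e, e ~ N(0, sigma^2 I), at the maximum likelihood estimate."""
    T, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = float(np.sum((y - X @ beta) ** 2)) / T     # ML estimate of sigma^2 (divides by T)
    loglik = -0.5 * T * (np.log(2 * np.pi * sigma2) + 1.0)
    n_params = k + 1                                     # coefficients plus sigma^2 (assumption)
    aic = -2.0 * loglik + 2 * n_params
    bic = -2.0 * loglik + n_params * np.log(T)
    return aic, bic


def gelfand_ghosh(mu, var, y_obs, gamma):
    """D_gamma from posterior predictive means mu_i and variances sigma_i^2."""
    mu, var, y_obs = map(np.asarray, (mu, var, y_obs))
    return gamma / (gamma + 1) * np.sum((mu - y_obs) ** 2) + np.sum(var)
```

Because $\gamma/(\gamma+1)$ increases with $\gamma$, larger values of $\gamma$ give more weight to the goodness-of-fit term relative to the predictive variance term.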
Genetic Algorithms
Holland (1975), Chatterjee, Laudato, and Lynch (1996).
- A population of M individuals, each of dimension L:
$$\begin{pmatrix}
x_{11} & \cdots & x_{1k} & \cdots & x_{1L} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{ik} & \cdots & x_{iL} \\
\vdots &        & \vdots &        & \vdots \\
x_{j1} & \cdots & x_{jk} & \cdots & x_{jL} \\
\vdots &        & \vdots &        & \vdots \\
x_{M1} & \cdots & x_{Mk} & \cdots & x_{ML}
\end{pmatrix}$$
- Apply genetic operators (selection, crossover, mutation) to transform the population.
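The three operators can be illustrated with a textbook GA on binary strings. The specific choices below (binary tournament selection, one-point crossover, bit-flip mutation, and the parameter names `p_cross` and `p_mut`) are illustrative assumptions. Note that this plain GA is an optimiser: unlike the MCMC version described later in the talk, it has no accept/reject step targeting a posterior distribution.

```python
import random


def ga_generation(pop, fitness, p_cross=0.7, p_mut=0.01, rng=random):
    """One generation of a textbook GA on 0/1 lists; higher fitness is better."""
    M, L = len(pop), len(pop[0])
    scores = [fitness(x) for x in pop]

    def select():
        # binary tournament: the fitter of two randomly drawn individuals
        a, b = rng.randrange(M), rng.randrange(M)
        return pop[a] if scores[a] >= scores[b] else pop[b]

    new_pop = []
    while len(new_pop) < M:
        x, y = select()[:], select()[:]          # copies, so parents are not altered
        if L > 1 and rng.random() < p_cross:
            c = rng.randrange(1, L)              # one-point crossover at position c
            x, y = x[:c] + y[c:], y[:c] + x[c:]
        for child in (x, y):
            for k in range(L):
                if rng.random() < p_mut:
                    child[k] = 1 - child[k]      # bit-flip mutation
            new_pop.append(child)
    return new_pop[:M]
```

For example, repeatedly calling `ga_generation(pop, fitness=sum)` drives a random population of 0/1 strings towards all-ones strings.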
Trans-dimensional Jumps
Green (1995)
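- Propose a jump from model $M_i$ to model $M_j$ w.p. $r_{ij}$,
- generate a vector $u$ of dimension $n_j - n_i$ from $q(\cdot)$,
- set $\theta_j = f_{ij}(\theta_i, u)$, where $f_{ij} : \Theta_i \times \mathbb{R}^{n_j - n_i} \to \Theta_j$ is a bijective function.
- Accept the jump w.p. $\min(1, A)$, where
$$A = \underbrace{\frac{\pi(\theta_j, M_j)}{\pi(\theta_i, M_i)}}_{\text{target ratio}} \; \underbrace{\frac{r_{ji}}{r_{ij}\, q(u)}}_{\text{proposal ratio}} \; \left| \frac{\partial f_{ij}(\theta_i, u)}{\partial (\theta_i, u)} \right|$$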
The choice of the proposal distribution q is crucial for covering both the model and the parameter spaces.
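To make the acceptance ratio concrete, a minimal Python sketch of reversible jump between two nested models. This is a toy example, not one from the talk: $M_1$: $y_t \sim N(0,1)$ with no parameters ($n_1 = 0$); $M_2$: $y_t \sim N(\mu,1)$ with prior $\mu \sim N(0,1)$ ($n_2 = 1$); equal prior model probabilities. The jump $M_1 \to M_2$ draws $u$ from a normal $q$ centred at the sample mean (an arbitrary choice) and sets $\theta_2 = f_{12}(u) = u$, so the Jacobian is 1, and with $r_{12} = r_{21} = 1$ the proposal ratio reduces to $1/q(u)$. Here $p(y \mid M_2)/p(y \mid M_1) = (1+T)^{-1/2} \exp\{S^2/(2(1+T))\}$ with $S = \sum_t y_t$, so the output can be checked against the exact posterior model probability.

```python
import math
import random


def norm_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_target(model, mu, y):
    """log pi(theta, M) up to a constant; p(M1) = p(M2) = 1/2 cancels."""
    if model == 1:                                   # M1: y_t ~ N(0, 1), no parameters
        return sum(norm_logpdf(v, 0.0, 1.0) for v in y)
    # M2: y_t ~ N(mu, 1), prior mu ~ N(0, 1)
    return sum(norm_logpdf(v, mu, 1.0) for v in y) + norm_logpdf(mu, 0.0, 1.0)


def rj_sampler(y, n_iter, rng):
    """Return the proportion of iterations spent in M2."""
    T, S = len(y), sum(y)
    q_mean, q_var = S / T, 1.0 / T                   # proposal q(u), an arbitrary choice
    model, mu, in_m2 = 1, 0.0, 0
    for _ in range(n_iter):
        if model == 1:
            u = rng.gauss(q_mean, math.sqrt(q_var))  # theta_2 = f_12(u) = u, |J| = 1
            log_a = (log_target(2, u, y) - log_target(1, None, y)
                     - norm_logpdf(u, q_mean, q_var))   # r_21 / r_12 = 1
            if log_a >= 0 or rng.random() < math.exp(log_a):
                model, mu = 2, u
        else:
            # reverse move M2 -> M1 drops mu; accepted w.p. min(1, 1/A)
            log_a = (log_target(2, mu, y) - log_target(1, None, y)
                     - norm_logpdf(mu, q_mean, q_var))
            if log_a <= 0 or rng.random() < math.exp(-log_a):
                model = 1
        if model == 2:
            # within-model Gibbs update: mu | y, M2 ~ N(S / (T + 1), 1 / (T + 1))
            mu = rng.gauss(S / (T + 1), math.sqrt(1.0 / (T + 1)))
            in_m2 += 1
    return in_m2 / n_iter


y = [0.3, -0.1, 0.5, 0.2, 0.4]
T, S = len(y), sum(y)
bf = (1 + T) ** -0.5 * math.exp(S ** 2 / (2 * (1 + T)))
print(rj_sampler(y, 20000, random.Random(1)), bf / (1 + bf))
```

The two printed numbers (MCMC estimate and exact $P(M_2 \mid y)$) should agree up to Monte Carlo error; a poor choice of q would show up as slow mixing between the models.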
We assume that:
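- $\theta_i \mid M_i$ is easy to estimate using standard methods and software.
- The posterior distribution on the model space is well approximated by
$$P(M_k \mid y) \propto \exp\{-\mathrm{BIC}(\hat\theta_k, M_k)/2\}, \qquad \mathrm{BIC}(\hat\theta_k, M_k) = -2 \log p(y \mid \hat\theta_k, M_k) + n_k \log T,$$
  where $\hat\theta_k$ is the maximum likelihood estimate under model $M_k$.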
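When p is small enough to enumerate all $2^p$ models, this BIC approximation can be evaluated directly; it is the exhaustive baseline that a stochastic search avoids for large p. A Python sketch for Gaussian linear regression, assuming an intercept in every model. Counting the error variance in $n_k$ (an assumption, the slide does not say) adds the same $\log T$ to every BIC and so cancels on normalisation.

```python
import itertools

import numpy as np


def bic_linear(X, y):
    """BIC of a Gaussian linear model at the MLE; n_k = columns of X + 1 (variance)."""
    T, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = float(np.sum((y - X @ beta) ** 2)) / T
    loglik = -0.5 * T * (np.log(2 * np.pi * sigma2) + 1.0)
    return -2.0 * loglik + (k + 1) * np.log(T)


def bic_model_probs(X, y):
    """P(M_k | y) proportional to exp(-BIC_k / 2) over all 2**p subsets of the columns of X."""
    T, p = X.shape
    models = list(itertools.product([0, 1], repeat=p))
    bics = np.array([
        bic_linear(np.column_stack([np.ones(T)] + [X[:, j] for j in range(p) if z[j]]), y)
        for z in models
    ])
    w = np.exp(-(bics - bics.min()) / 2.0)   # shift by the minimum BIC to avoid underflow
    return models, w / w.sum()
```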
RJMCMC + Genetic Algorithms
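$$g(E(Y)) = \beta_0 + \beta_{j_1} x_{j_1} + \cdots + \beta_{j_k} x_{j_k}, \qquad k = 0, \dots, k_{\max},$$
where $g$ is the link function. Given a population of models $z = (z_1, \dots, z_M)$, where $z_{ij} = 1$ if covariate $j$ is included in model $i$ and $z_{ij} = 0$ otherwise,
1. propose a new population $z'$ via genetic operators (especially mutation and crossover);
2. accept the new population with probability
$$\min\left\{1, \frac{\exp\{-\mathrm{BIC}(z')/2\}}{\exp\{-\mathrm{BIC}(z)/2\}}\, \frac{P(z', z)}{P(z, z')}\right\},$$
where $P(z, z') = \Pr(\text{proposing a jump from population } z \text{ to } z')$.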
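The accept/reject step can be sketched in Python as follows. Two assumptions: the BIC of a population is taken to be the sum of its members' BICs (consistent with the crossover move, whose ratio involves $\mathrm{BIC}(z'_i) + \mathrm{BIC}(z'_j)$), and the log proposal probabilities are supplied by the caller. Working on the log scale matters: with BIC values around 1500, $\exp\{-\mathrm{BIC}/2\}$ underflows to 0 in double precision, so the ratio of exponentials cannot be evaluated literally.

```python
import math
import random


def log_accept_ratio(bic_new, bic_old, log_p_reverse, log_p_forward):
    """log of exp{-BIC(z')/2} / exp{-BIC(z)/2} * P(z', z) / P(z, z').

    bic_new, bic_old: per-individual BICs of the proposed and current populations;
    the population BIC is taken to be their sum (assumption).
    """
    return (sum(bic_old) - sum(bic_new)) / 2.0 + log_p_reverse - log_p_forward


def accept(log_ratio, rng=random):
    """Metropolis-Hastings decision: accept with probability min(1, exp(log_ratio))."""
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


# exp(-1500 / 2) == 0.0 in double precision, but the log ratio is well defined:
r = log_accept_ratio([1498.0, 1510.0], [1500.0, 1510.0], 0.0, 0.0)   # 1.0
print(r, accept(r))
```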
Crossover Move
Combine pairs of models to generate offspring, which are more likely to be accepted if they perform well. Randomly choose a pair of individuals $z_i, z_j$ and propose a new population as follows:
1. find the positions where the two models differ, $K = \{k : z_{ik} \neq z_{jk}\}$;
2. randomly choose $k \in K$;
3. set $z'_{ik} = z_{jk}$ and $z'_{jk} = z_{ik}$;
4. accept this new population with probability
$$\min\left\{1, \frac{\exp\{-\mathrm{BIC}(z'_i)/2 - \mathrm{BIC}(z'_j)/2\}}{\exp\{-\mathrm{BIC}(z_i)/2 - \mathrm{BIC}(z_j)/2\}}\, \frac{P(z', z)}{P(z, z')}\right\}.$$
Repeat this updating scheme for all $\lfloor M/2 \rfloor$ pairs selected without replacement from the population.
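A Python sketch of one crossover sweep. The BIC function is supplied by the caller (hypothetical here) and pairs are formed by shuffling the population. In this sketch the proposal ratio $P(z', z)/P(z, z')$ equals 1: swapping $z_{ik}$ and $z_{jk}$ leaves the set K of differing positions unchanged, so the reverse swap is proposed with the same probability. A size limit such as $k_{\max}$ can be enforced by having the BIC function return infinity for oversized models, which makes their acceptance probability zero.

```python
import math
import random


def crossover_sweep(pop, bic, rng=random):
    """One crossover sweep over floor(M/2) disjoint random pairs (pop updated in place).

    pop: list of 0/1 lists; bic: caller-supplied function giving the BIC of a model.
    """
    order = list(range(len(pop)))
    rng.shuffle(order)                              # pairs drawn without replacement
    for a in range(0, len(order) - 1, 2):
        i, j = order[a], order[a + 1]
        zi, zj = pop[i], pop[j]
        K = [k for k in range(len(zi)) if zi[k] != zj[k]]
        if not K:
            continue                                # identical models: nothing to swap
        k = rng.choice(K)
        new_i, new_j = zi[:], zj[:]
        new_i[k], new_j[k] = zj[k], zi[k]
        # The swap leaves K unchanged, so the reverse move has the same proposal
        # probability and P(z', z) / P(z, z') = 1.
        log_ratio = (bic(zi) + bic(zj) - bic(new_i) - bic(new_j)) / 2.0
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            pop[i], pop[j] = new_i, new_j
    return pop
```

A property that is easy to check: crossover never changes how many models in the population include each covariate, because it only swaps entries between two individuals.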
Mutation Move
Include a new regressor w.p. $w$, or delete an existing one w.p. $1 - w$. Suppose we are updating $z_i$ and propose an inclusion. Define $R_0 = \{j : z_{ij} = 0\}$ and $R_1 = \{j : z_{ij} = 1\}$. Then,
1. randomly choose $j \in R_0$ and set $z'_{ij} = 1$;
2. accept this move w.p. $\min(1, A)$, where
$$A = \frac{\exp\{-\mathrm{BIC}(z'_i)/2\}}{\exp\{-\mathrm{BIC}(z_i)/2\}}\, \frac{(1 - w)\, |R_0|}{w\, (|R_1| + 1)}$$
and $|J|$ denotes the cardinality of $J$. Likewise, if a deletion is proposed,
1. randomly choose $j \in R_1$ and set $z'_{ij} = 0$;
2. accept the move w.p. $\min(1, A^{-1})$, with $A$ computed for the reverse inclusion from $z'_i$ to $z_i$.
Repeat this updating scheme for all $z_1, \dots, z_M$.
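A Python sketch of the mutation sweep, covering both the inclusion ratio $A$ and the deletion ratio $A^{-1}$. The BIC function and $w$ are supplied by the caller; requiring $0 < w < 1$ and simply staying put on impossible proposals (an inclusion when every covariate is already in, a deletion from the null model) are assumptions of this sketch. With a single individual and a toy BIC, the visit frequencies should approach $\exp\{-\mathrm{BIC}/2\}$ normalised, which gives a quick check of the acceptance ratio.

```python
import math
import random


def mutation_sweep(pop, bic, w=0.5, rng=random):
    """One inclusion/deletion update for every individual z_1, ..., z_M (in place)."""
    assert 0.0 < w < 1.0, "w must lie strictly between 0 and 1"
    for i, z in enumerate(pop):
        R0 = [j for j, v in enumerate(z) if v == 0]
        R1 = [j for j, v in enumerate(z) if v == 1]
        new = z[:]
        if rng.random() < w:                       # propose an inclusion
            if not R0:
                continue                           # full model: nothing to add, stay
            new[rng.choice(R0)] = 1
            log_prop = math.log((1 - w) * len(R0)) - math.log(w * (len(R1) + 1))
        else:                                      # propose a deletion
            if not R1:
                continue                           # null model: nothing to delete, stay
            new[rng.choice(R1)] = 0
            # proposal part of A^{-1}, with A computed for the reverse inclusion
            log_prop = math.log(w * len(R1)) - math.log((1 - w) * (len(R0) + 1))
        log_ratio = (bic(z) - bic(new)) / 2.0 + log_prop
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            pop[i] = new
    return pop


# Toy check: BIC(z) = 2 * sum(z) targets P(z) proportional to exp(-sum(z)).
rng = random.Random(7)
pop, hits, n = [[0, 0]], 0, 20000
for _ in range(n):
    mutation_sweep(pop, lambda z: 2.0 * sum(z), w=0.5, rng=rng)
    hits += pop[0] == [0, 0]
print(hits / n, 1.0 / (1.0 + math.exp(-1.0)) ** 2)
```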
Example - linear regression
Effect of punishment regimes on crime rates in 47 US states, with 15 potential regressors (Raftery, Painter, and Volinsky 2005):
- M: percentage of males aged 14-24
- So: indicator variable for a southern state
- Ed: mean years of schooling
- Po1: police expenditure in 1960
- Po2: police expenditure in 1959
- LF: labour force participation rate
- M.F: number of males per 1000 females
- Pop: state population
- NW: number of nonwhites per 1000 people
- U1: unemployment rate of urban males 14-24
- U2: unemployment rate of urban males 35-39
- GDP: gross domestic product per head
- Ineq: income inequality
- Prob: probability of imprisonment
- Time: average time served in state prisons
Probs (ten most probable models): 0.209, 0.122, 0.060, 0.055, 0.053, 0.036, 0.026, 0.025, 0.023, 0.022

Variable   Top-10 models including it   Prob.inc
M          10                           0.9890
So          0                           0.0549
Ed         10                           1.0000
Po1         7                           0.7714
Po2         3                           0.2459
LF          0                           0.0290
M.F         0                           0.0347
Pop         1                           0.2049
NW          9                           0.9227
U1          1                           0.0889
U2          9                           0.8891
GDP         2                           0.2414
Ineq       10                           1.0000
Prob       10                           0.9956
Time        6                           0.4963
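The Prob.inc column is a marginal posterior inclusion probability: the total posterior probability of the models that contain the covariate. The table itself shows that more than the ten listed models contribute (So appears in none of them yet has inclusion probability 0.0549). A Python sketch, assuming a list of visited models as 0/1 vectors together with their estimated posterior probabilities, renormalised over the models supplied.

```python
def inclusion_probs(models, probs):
    """Marginal inclusion probability of each covariate.

    models: list of 0/1 tuples (one entry per covariate); probs: their posterior
    probabilities, renormalised here over the models supplied.
    """
    total = float(sum(probs))
    p = len(models[0])
    return [sum(pr for z, pr in zip(models, probs) if z[j]) / total for j in range(p)]


print(inclusion_probs([(1, 0), (1, 1), (0, 1)], [0.5, 0.3, 0.2]))   # about [0.8, 0.5]
```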
Models visited by GA-MCMC
[Figure: visited models (horizontal axis: Model) against the covariates M, So, Ed, Po1, Po2, LF, M.F, Pop, NW, U1, U2, GDP, Ineq, Prob, Time (vertical axis).]
Example - Logistic Regression
Risk factors associated with low infant birth weight (Hosmer and Lemeshow 1989). $y_i \sim \mathrm{Bernoulli}(\pi_i)$, where $\pi_i$ is the probability that the $i$th baby has low weight at birth. Under model $k$ this probability is related to the covariates by
$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = X_i'\theta.$$
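Under the BIC approximation each candidate logistic model needs its maximised log-likelihood. A Python/NumPy sketch fitting the model by Newton-Raphson; the function name and convergence settings are illustrative assumptions. The iteration can fail when the data are (quasi-)separated, in which case the maximum likelihood estimate does not exist.

```python
import numpy as np


def logistic_bic(X, y, max_iter=50, tol=1e-10):
    """Fit log(pi/(1-pi)) = X theta by Newton-Raphson; return (theta, loglik, BIC).

    X: (T, n) design matrix, including an intercept column; y: 0/1 responses.
    """
    T, n = X.shape
    theta = np.zeros(n)
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ theta)))
        score = X.T @ (y - pi)                          # gradient of the log-likelihood
        info = X.T @ (X * (pi * (1.0 - pi))[:, None])   # Fisher information X'WX
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ theta
    # log-likelihood sum(y*eta - log(1 + exp(eta))), with logaddexp for stability
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))
    return theta, loglik, -2.0 * loglik + n * np.log(T)
```

Here $n_k$ is just the number of columns of the design matrix, since the logistic model has no separate variance parameter.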
Model indicator   Number of covariates   Model probability
35                2                      0.0962
99                3                      0.0673
51                3                      0.0600
43                3                      0.0599
107               4                      0.0333
3                 1                      0.0294
115               4                      0.0287
17                1                      0.0239
19                2                      0.0202
47                4                      0.0202

Inclusion probability: age 0.190, lwt 0.696, race 0.140, smoke 0.381, ptl 0.349, ht 0.659, ui 0.376, ftv 0.081
Models visited by GA-MCMC
[Figure: visited models (horizontal axis: Model) against the covariates age, lwt, race, smoke, ptl, ht, ui, ftv (vertical axis).]
Example - Censored Survival Models
Survival times of patients with primary biliary cirrhosis, modelled by $h(t) = h_0(t) \exp(X_i'\theta)$. Candidate covariates:
- age: in years
- alb: serum albumin
- alkphos: alkaline phosphatase
- ascites: presence of ascites
- bili: serum bilirubin
- edtrt: edema treatment
- hepmeg: enlarged liver
- platelet: platelet count
- protime: standardised blood clotting time
- sex: 1 = male
- sgot: liver enzyme (now called AST)
- spiders: blood vessel malformations in the skin
- stage: histologic stage of disease (needs biopsy)
- trt: 1/2/-9 for control, treatment, not randomised
- copper: urine copper
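In the proportional hazards model the baseline hazard $h_0(t)$ is left unspecified, so the likelihood term is usually the Cox partial likelihood. A Python/NumPy sketch of its logarithm, using Breslow's handling of tied event times (an assumption; the slides do not say how ties were treated). Plugging its maximised value into BIC also requires a choice of $T$; the slides do not state whether the number of patients or the number of uncensored events was used.

```python
import numpy as np


def cox_partial_loglik(theta, time, event, X):
    """Log partial likelihood for h(t) = h0(t) exp(X_i' theta).

    time: observed times; event: 1 if the time is an observed death, 0 if censored.
    Tied event times use Breslow's approximation: each death uses the full risk
    set {j : time_j >= time_i}, including the other tied deaths.
    """
    eta = X @ theta
    ll = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        ll += eta[i] - np.logaddexp.reduce(eta[at_risk])   # log of sum of exp, stably
    return float(ll)
```

With $\theta = 0$ and no ties, each death contributes minus the log of its risk-set size, which is a handy sanity check.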
Probs (ten most probable models): 0.081, 0.037, 0.036, 0.029, 0.026, 0.025, 0.021, 0.018, 0.017, 0.017

Variable   Top-10 models including it   Prob.inc
age        10                           0.999
alb        10                           0.997
alkphos     1                           0.038
ascites     0                           0.012
bili       10                           1.000
edtrt       9                           0.916
hepmeg      0                           0.012
platelet    0                           0.014
protime     8                           0.848
sex         0                           0.020
sgot        3                           0.101
spiders     0                           0.007
stage       3                           0.258
trt         0                           0.011
copper      7                           0.863
Models visited by GA-MCMC
[Figure: visited models (horizontal axis: Model) against the covariates age, alb, alkphos, ascites, bili, edtrt, hepmeg, platelet, protime, sex, sgot, spiders, stage, trt, copper (vertical axis).]
A Few Comments
- A suite of R functions was written for linear regression, logistic regression and Cox proportional hazards models.
- How to assess convergence?
- How to specify the population size?
- The method is being extended to include quadratic terms and interactions in the model, which leads to huge model spaces.
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.
Chatterjee, S., M. Laudato, and L. Lynch (1996). Genetic algorithms and their statistical applications: an introduction. Computational Statistics and Data Analysis 22, 633–651.
Chen, M.-H. (2005). Computing marginal likelihoods from a single MCMC output. Statistica Neerlandica 59(1), 16–29.
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association 90(432), 1313–1321.
Chib, S. and I. Jeliazkov (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association 96, 270–281.
Friel, N. and A. N. Pettit (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society, Series B 70, 589–607.
Gelfand, A. E. and S. K. Ghosh (1998). Model choice: A posterior predictive loss approach. Biometrika 8, 1–11.
George, E. I. and R. E. McCulloch (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association 88(423), 881–889.
Green, P. J. (1995). Reversible jump MCMC computation and Bayesian model determination. Biometrika 82, 711–732.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press.
Hosmer, D. and S. Lemeshow (1989). Applied Logistic Regression. New York: Wiley.
Raftery, A. E., I. S. Painter, and C. T. Volinsky (2005). BMA: An R package for Bayesian model averaging. R News 5(2), 2–8.
Schwartz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6, 461–464.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B 64, 1–34.