SLIDE 1

Bayesian Methods for Variable Selection with Applications to High-Dimensional Data

Part 4: Non-linear Models via Gaussian Processes Marina Vannucci

Rice University, USA

ABS13-Italy 06/17-21/2013

Marina Vannucci (Rice University, USA) Bayesian Variable Selection (Part 4) ABS13-Italy 06/17-21/2013 1 / 16

SLIDE 2

Part 4: Non-linear Models via Gaussian Processes

  • 1. Gaussian processes for nonlinear models
  • 2. Methods for variable selection and computational strategies
  • 3. Simulated and real data examples

SLIDE 3

Nonlinear Models via Gaussian Processes

Gaussian processes describe nonparametric relationships between a response and a set of predictors. In regression, replace Xβ with z(X):

y = z(X) + ε,  ε ∼ N(0, σ² I_n)

and wrap X in a GP:

z(X) ∼ N(0, C),  C = Cov(z(X))

Marginalize over z,

y | C, r ∼ N_n(0, (1/r) I_n + C)

to obtain a nonparametric regression model where the covariance matrix varies with the predictors.

Diggle et al. (JRSSC, 1998), Neal (1999); Linkletter et al. (Tech,2006)
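To make the marginalization concrete, here is a minimal numerical sketch in NumPy. The toy covariance C is an arbitrary illustrative kernel (the specific kernels used in the slides follow); the point shown is only that the marginal covariance of y is (1/r) I_n + C.

```python
import numpy as np

def gp_marginal_cov(C, r):
    """Marginal covariance of y after integrating out z: (1/r) I_n + C."""
    return np.eye(C.shape[0]) / r + C

# toy covariance on 5 one-dimensional inputs (illustrative kernel choice)
x = np.linspace(0.0, 1.0, 5)
C = np.exp(-(x[:, None] - x[None, :]) ** 2)
Sigma = gp_marginal_cov(C, r=4.0)      # noise variance 1/r = 0.25

rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(5), Sigma)  # one draw from y | C, r
```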

SLIDE 4

Choice of the Covariance Matrix

Exponential form:

C = Cov(z(X)) = (1/λ_a) 1_n 1_n′ + (1/λ_z) exp(−G)

g_ij = (x_i − x_j)′ P (x_i − x_j),  P = diag(−log ρ_1, …, −log ρ_p),  ρ_k ∈ [0, 1]
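A sketch of this covariance in NumPy (function name and the unit values of λ_a, λ_z are my own choices for illustration). Note that ρ_k = 1 gives −log ρ_k = 0, so predictor k drops out of G entirely; this is the mechanism the selection prior exploits later.

```python
import numpy as np

def exp_covariance(X, rho, lam_a=1.0, lam_z=1.0):
    """Exponential GP covariance: C = (1/lam_a) 1 1' + (1/lam_z) exp(-G),
    with g_ij = (x_i - x_j)' P (x_i - x_j), P = diag(-log rho_k)."""
    X = np.asarray(X, dtype=float)
    P = -np.log(np.asarray(rho, dtype=float))  # per-predictor weights
    diff = X[:, None, :] - X[None, :, :]       # (n, n, p) pairwise differences
    G = np.einsum('ijk,k->ij', diff ** 2, P)
    n = X.shape[0]
    return np.ones((n, n)) / lam_a + np.exp(-G) / lam_z

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
C = exp_covariance(X, rho=[0.5, 1.0])  # rho_2 = 1 switches predictor 2 off
```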

[Figure: four panels of draws Y vs. x from the GP prior under the exponential covariance; axis ticks omitted.]

SLIDE 5

General Covariance Formulation: Matérn

Employs an explicit smoothing parameter ν ∈ [0, ∞):

C(z(x_i), z(x_j)) = [1 / (2^{ν−1} Γ(ν))] (2√ν d(x_i, x_j))^ν K_ν(2√ν d(x_i, x_j))

Parameterize d(x_i, x_j) = (x_i − x_j)′ P (x_i − x_j)

Recall P = diag(−log ρ_1, …, −log ρ_p). Matérn ≈ exponential for ν > 7/2.
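A sketch of the Matérn formula above using SciPy's modified Bessel function K_ν (`scipy.special.kv`); the argument d is whatever d(x_i, x_j) evaluates to. With this parameterization, ν = 0.5 reduces algebraically to exp(−√2·d), an exponential-type decay.

```python
import numpy as np
from scipy.special import kv, gamma

def matern_cov(d, nu):
    """Matérn covariance as a function of the distance d(x_i, x_j) >= 0."""
    d = np.asarray(d, dtype=float)
    out = np.ones_like(d)                      # limit as d -> 0 is 1
    pos = d > 0
    a = 2.0 * np.sqrt(nu) * d[pos]
    out[pos] = a ** nu * kv(nu, a) / (2.0 ** (nu - 1.0) * gamma(nu))
    return out

vals = matern_cov(np.array([0.0, 0.1, 1.0]), nu=0.5)
# for nu = 0.5 this equals exp(-sqrt(2) * d)
```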

[Figure: draws Y vs. x under the Matérn covariance. (a) ν = 0.5, ρ = 0.05; (b) ν = 0.5, ρ = 0.95; (c) ν = 4.0, ρ = 0.05.]

SLIDE 6

Nonlinear Models

y = f(x) + ε

  • GP models are contained in the class of nonparametric kernel regression with exponential family observations, Rasmussen & Williams (2006).
  • Kernel models include spline models and models that use regularized methods.
  • Compared with nonparametric spline regression models, GP models are less interpretable but better suited for prediction.
  • The predictive performance of GP models is competitive with ensemble learning methods such as bagging, boosting and random forests, Hastie et al. (2001).
  • Variable selection can easily be achieved within GP models.

SLIDE 7

Mixture Priors for Variable Selection

Extract a cell from C:

C_ij = 1/λ_a + (1/λ_z) ∏_{k=1}^{p} ρ_k^{(x_ik − x_jk)²}

  • ρ_k ∈ (0, 1]; ρ_k = 1 → x_k does not influence y (via C)
  • Selection parameters γ = {γ_1, …, γ_p}
  • Select {ρ_k} with {γ_k}: π(ρ_k | γ_k) = γ_k U(0, 1) + (1 − γ_k) δ_1(ρ_k)
  • γ_k ∼ Bernoulli(α), λ_a ∼ G(1, 1), λ_z ∼ G(1, 1)
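Drawing from this mixture prior can be sketched as follows (NumPy; `draw_rho` is my own name). γ_k = 0 pins ρ_k at the point mass δ_1, switching predictor k off in the covariance; γ_k = 1 draws ρ_k from the uniform slab.

```python
import numpy as np

def draw_rho(gamma, rng):
    """Mixture prior: rho_k ~ U(0,1) if gamma_k = 1, else point mass at 1."""
    gamma = np.asarray(gamma)
    rho = np.ones(gamma.shape)          # delta_1 component (predictor off)
    sel = gamma == 1
    rho[sel] = rng.uniform(0.0, 1.0, size=sel.sum())
    return rho

rng = np.random.default_rng(1)
gamma = rng.binomial(1, 0.1, size=10)   # gamma_k ~ Bernoulli(alpha), alpha = 0.1
rho = draw_rho(gamma, rng)
```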

SLIDE 8

MCMC for posterior inference

Similar to the MC³ scheme, but here we traverse both model and parameter spaces. Randomly choose among 3 between-model moves:

  • Add: randomly choose k with γ_k = 0, set γ′_k = 1 and propose q(ρ′_k | ρ_k) = q(ρ′_k) ∼ U(0, 1)
  • Delete: randomly choose k with γ_k = 1, set (γ′_k = 0, ρ′_k = 1)
  • Swap: jointly propose (Add, Delete) moves

Accept the proposed value (γ′, ρ′_{γ′}) jointly.

Add a within-model move to speed convergence: for all γ_k = 1 propose q(ρ′′_k) ∼ U(0, 1).
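The between-model proposal mechanics can be sketched as below (NumPy; names are mine). This shows only the Add/Delete bookkeeping; the Swap move and the Metropolis–Hastings accept/reject step, which requires the GP marginal likelihood, are omitted.

```python
import numpy as np

def propose_move(gamma, rho, rng):
    """One between-model proposal: Add (gamma_k 0->1, rho_k ~ U(0,1))
    or Delete (gamma_k 1->0, rho_k -> 1), chosen at random."""
    gamma, rho = gamma.copy(), rho.copy()
    zeros = np.where(gamma == 0)[0]
    ones = np.where(gamma == 1)[0]
    if ones.size == 0 or (zeros.size > 0 and rng.random() < 0.5):
        k = rng.choice(zeros)              # Add move
        gamma[k], rho[k] = 1, rng.uniform()
    else:
        k = rng.choice(ones)               # Delete move
        gamma[k], rho[k] = 0, 1.0
    return gamma, rho

rng = np.random.default_rng(2)
gamma = np.zeros(5, dtype=int)
rho = np.ones(5)
gamma2, rho2 = propose_move(gamma, rho, rng)   # must be an Add move here
```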

SLIDE 9

Generalized formulation

GLM with link function g(η_i) = z(x_i), z(X) ∼ N(0, C). Regression, logit and probit models. Poisson canonical link function for count data:

π(s_i | λ_i) = λ_i^{s_i} exp(−λ_i) / s_i! ∝ exp(s_i log(λ_i) − λ_i)

and define the Poisson GP regression model g(η) = log(λ) = z(X).
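The resulting Poisson GP log-likelihood can be written directly (a minimal sketch; the function name is mine):

```python
import numpy as np
from math import lgamma

def poisson_gp_loglik(s, z):
    """Log-likelihood of counts s_i under log(lambda_i) = z(x_i):
    sum_i [ s_i * z_i - exp(z_i) - log(s_i!) ]."""
    s = np.asarray(s, dtype=float)
    z = np.asarray(z, dtype=float)
    log_fact = np.array([lgamma(si + 1.0) for si in s])
    return float(np.sum(s * z - np.exp(z) - log_fact))

ll = poisson_gp_loglik([0, 2, 1], [0.0, 0.5, -0.3])
```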

SLIDE 10

Cox formulation for survival data

Define the hazard rate function as h(t_i | z(x_i)) = h_0(t_i) exp(z(x_i)), i = 1, 2, …, n.

  • Fits the spirit of the semi-parametric construction of Cox (1972)
  • The partial likelihood avoids baseline hazard estimation
  • Use the likelihood formulation of Kalbfleisch (1978) with a Gamma process prior on the baseline hazard
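The Cox partial log-likelihood for the GP log-risk z can be sketched as below (assuming no tied event times; names are mine). Each event contributes its log-risk minus the log-sum of risks over the subjects still at risk.

```python
import numpy as np

def cox_partial_loglik(t, delta, z):
    """Cox partial log-likelihood with hazard h(t_i) = h0(t_i) exp(z_i).
    t: event/censoring times, delta: 1 = event observed, z: GP log-risk."""
    t, delta, z = map(np.asarray, (t, delta, z))
    ll = 0.0
    for i in np.where(delta == 1)[0]:
        at_risk = t >= t[i]                # risk set at time t_i
        ll += z[i] - np.log(np.sum(np.exp(z[at_risk])))
    return float(ll)

ll = cox_partial_loglik([2.0, 1.0, 3.0], [1, 1, 0], [0.1, -0.2, 0.3])
```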

SLIDE 11

Simulation: Count Data (n = 100, p = 1000)

y_i = 1.6(x_i,1 + x_i,2 + x_i,3 + x_i,4) + sin(3x_i,5) + sin(5x_i,6) + ε,  s_i ∼ Pois(exp(y_i))
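This simulation can be reproduced as a sketch. The slide does not state the design distribution of X or the noise scale, so uniform X on [0, 1] and a small Gaussian ε are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 1000
X = rng.uniform(0.0, 1.0, size=(n, p))          # assumed design distribution
eps = rng.normal(0.0, 0.05, size=n)             # assumed noise scale
y = (1.6 * (X[:, 0] + X[:, 1] + X[:, 2] + X[:, 3])
     + np.sin(3 * X[:, 4]) + np.sin(5 * X[:, 5]) + eps)
s = rng.poisson(np.exp(y))                      # s_i ~ Pois(exp(y_i))
```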

[Figure: posterior probabilities P(γ_k = 1 | D) for the variable selection parameters γ_1, …, γ_20, with predictors selected based on EFDR = 0; boxplots of posterior samples of ρ_k by predictor.]

Low-order, polynomial-like association: ρ_1, …, ρ_4 close to 1. Higher-order/non-linear association: ρ_5, ρ_6 closer to 0.

SLIDE 12

Simulation: Cox GP model (n = 100, p = 1000)

y_i = (3x_i,1 − 2.5x_i,2 + 3.5x_i,3 − 3x_i,4) + sin(3x_i,5) − sin(5x_i,6) + ε

Event time observations from a Cox model with survivor function S(t|y) = exp[−H_0(t) exp(y)], H_0(t) = λt, λ = 0.2, so t = M / (λ exp(y)), M ∼ Exp(1), with 5% randomly censored.
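Event-time generation via the inverse-CDF identity on the slide can be sketched as follows (the distribution of y is not restated here, so the standard-normal stand-in for the linear predictor is an assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 0.2
y = rng.normal(0.0, 1.0, size=100)        # stand-in linear predictor (assumption)
M = rng.exponential(1.0, size=100)        # M ~ Exp(1)
t = M / (lam * np.exp(y))                 # so that S(t | y) = exp(-lam * t * e^y)
censored = rng.random(100) < 0.05         # 5% randomly censored
delta = (~censored).astype(int)           # 1 = observed event
```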

[Figure: posterior probabilities P(γ_k = 1 | D) for γ_1, …, γ_20, with predictors selected based on EFDR = 0.01; boxplots of posterior samples of ρ_k by predictor; estimated survival probability vs. log survival time.]

SLIDE 13

Application: Ozone Count Data

Integer particle counts per one million particles of air near Los Angeles for n = 330 days and an associated set of 8 meteorological predictors. We held out a randomly chosen set of 165 observations for validation.

[Figure: posterior probabilities P(γ_k = 1 | D) for γ_1, …, γ_8, with predictors selected based on EFDR = 0.09; boxplots of posterior samples of ρ_k by predictor.]

SLIDE 14

Analyzed by Liang et al (2007) with a linear regression model including all linear and quadratic terms (p = 44).

Prior on g              Mγ                                 pγ   RMSE(Mγ)
Local Empirical Bayes   X5, X6, X7, X6², X7², X3X5         6    4.5
Hyper-g (a = 4)         X5, X6, X7, X6², X7², X3X5         6    4.5
Fixed (BIC)             X5, X6, X7, X6², X7², X3X5         6    4.5
Brown et al (2002)      X1X6, X1X7, X6X7, X1², X3², X7²    6    4.5
GP model                X3, X6, X7                         3    3.7

SLIDE 15

Application: Wisconsin Breast Cancer

Time-to-recurrence in n = 194 subjects, 76% censored. p = 32 characteristics of the cell nuclei present in the breast mass (e.g. shape, size, texture), obtained from a digitized Fine Needle Aspiration (FNA) image.

[Figure: posterior probabilities P(γ_k = 1 | D) for γ_1, …, γ_32, with predictors selected based on EFDR = 0.03; boxplots of posterior samples of ρ_k for the selected predictors; estimated survival probability vs. log survival time.]

Note that the boxplots show a mix of lower- and higher-order covariate associations.

SLIDE 16

Summary

  • GP priors to obtain nonparametric regression models where the covariance matrix varies with the predictors.
  • Mixture priors for Bayesian variable selection.
  • Continuous, categorical, count and survival responses.

Savitsky, Vannucci and Sha (2011, Statistical Science)
