SLIDE 1

Bayesian Methods for Variable Selection with Applications to High-Dimensional Data

Part 4: Non-linear Models via Gaussian Processes Marina Vannucci

Rice University, USA

ABS13-Italy 06/17-21/2013

Marina Vannucci (Rice University, USA) Bayesian Variable Selection (Part 4) ABS13-Italy 06/17-21/2013 1 / 16

SLIDE 2

Part 4: Non-linear Models via Gaussian Processes

  • 1. Gaussian processes for nonlinear models
  • 2. Methods for variable selection and computational strategies
  • 3. Simulated and real data examples

SLIDE 3

Nonlinear Models via Gaussian Processes

Gaussian processes describe nonparametric relationships between a response and a set of predictors. In regression, replace Xβ with z(X):

y = z(X) + ε,  ε ∼ N(0, σ² I_n)

and wrap X in a GP:

z(X) ∼ N(0, C),  C = Cov(z(X))

Marginalize over z,

y | C, r ∼ N_n(0, (1/r) I_n + C)

to obtain a nonparametric regression model where the covariance matrix varies with the predictors.

Diggle et al. (JRSSC, 1998), Neal (1999); Linkletter et al. (Tech,2006)
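To make the marginalization concrete, here is a minimal numerical sketch in NumPy. The toy covariance C is an arbitrary illustrative kernel (the specific kernels used in the slides follow); the point shown is only that the marginal covariance of y is (1/r) I_n + C.

```python
import numpy as np

def gp_marginal_cov(C, r):
    """Marginal covariance of y after integrating out z: (1/r) I_n + C."""
    return np.eye(C.shape[0]) / r + C

# toy covariance on 5 one-dimensional inputs (illustrative kernel choice)
x = np.linspace(0.0, 1.0, 5)
C = np.exp(-(x[:, None] - x[None, :]) ** 2)
Sigma = gp_marginal_cov(C, r=4.0)      # noise variance 1/r = 0.25

rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(5), Sigma)  # one draw from y | C, r
```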

SLIDE 4

Choice of the Covariance Matrix

Exponential form:

C = Cov(z(X)) = (1/λ_a) 1_n 1_n′ + (1/λ_z) exp(−G)

g_ij = (x_i − x_j)′ P (x_i − x_j),  P = diag(−log ρ_1, …, −log ρ_p),  ρ_k ∈ [0, 1]
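A sketch of this covariance in NumPy (function name and the unit values of λ_a, λ_z are my own choices for illustration). Note that ρ_k = 1 gives −log ρ_k = 0, so predictor k drops out of G entirely; this is the mechanism the selection prior exploits later.

```python
import numpy as np

def exp_covariance(X, rho, lam_a=1.0, lam_z=1.0):
    """Exponential GP covariance: C = (1/lam_a) 1 1' + (1/lam_z) exp(-G),
    with g_ij = (x_i - x_j)' P (x_i - x_j), P = diag(-log rho_k)."""
    X = np.asarray(X, dtype=float)
    P = -np.log(np.asarray(rho, dtype=float))  # per-predictor weights
    diff = X[:, None, :] - X[None, :, :]       # (n, n, p) pairwise differences
    G = np.einsum('ijk,k->ij', diff ** 2, P)
    n = X.shape[0]
    return np.ones((n, n)) / lam_a + np.exp(-G) / lam_z

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
C = exp_covariance(X, rho=[0.5, 1.0])  # rho_2 = 1 switches predictor 2 off
```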

[Figure: four panels of draws Y vs. x from the GP prior under the exponential covariance; axis ticks omitted.]

SLIDE 5

General Covariance Formulation: Matérn

Employs an explicit smoothing parameter ν ∈ [0, ∞):

C(z(x_i), z(x_j)) = [1 / (2^{ν−1} Γ(ν))] (2√ν d(x_i, x_j))^ν K_ν(2√ν d(x_i, x_j))

Parameterize d(x_i, x_j) = (x_i − x_j)′ P (x_i − x_j)

Recall P = diag(−log ρ_1, …, −log ρ_p). Matérn ≈ exponential for ν > 7/2.
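A sketch of the Matérn formula above using SciPy's modified Bessel function K_ν (`scipy.special.kv`); the argument d is whatever d(x_i, x_j) evaluates to. With this parameterization, ν = 0.5 reduces algebraically to exp(−√2·d), an exponential-type decay.

```python
import numpy as np
from scipy.special import kv, gamma

def matern_cov(d, nu):
    """Matérn covariance as a function of the distance d(x_i, x_j) >= 0."""
    d = np.asarray(d, dtype=float)
    out = np.ones_like(d)                      # limit as d -> 0 is 1
    pos = d > 0
    a = 2.0 * np.sqrt(nu) * d[pos]
    out[pos] = a ** nu * kv(nu, a) / (2.0 ** (nu - 1.0) * gamma(nu))
    return out

vals = matern_cov(np.array([0.0, 0.1, 1.0]), nu=0.5)
# for nu = 0.5 this equals exp(-sqrt(2) * d)
```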

[Figure: draws Y vs. x under the Matérn covariance. (a) ν = 0.5, ρ = 0.05; (b) ν = 0.5, ρ = 0.95; (c) ν = 4.0, ρ = 0.05.]

SLIDE 6

Nonlinear Models

y = f(x) + ε

  • GP models are contained in the class of nonparametric kernel regression with exponential family observations, Rasmussen & Williams (2006).
  • Kernel models include spline models and models that use regularized methods.
  • Compared with nonparametric spline regression models, GP models are less interpretable but better suited for prediction.
  • The predictive performance of GP models is competitive with ensemble learning methods such as bagging, boosting and random forests, Hastie et al. (2001).
  • Variable selection can easily be achieved within GP models.

SLIDE 7

Mixture Priors for Variable Selection

Extract a cell from C:

C_ij = 1/λ_a + (1/λ_z) ∏_{k=1}^{p} ρ_k^{(x_ik − x_jk)²}

  • ρ_k ∈ (0, 1]; ρ_k = 1 → x_k does not influence y (via C)
  • Selection parameters γ = {γ_1, …, γ_p}
  • Select {ρ_k} with {γ_k}: π(ρ_k | γ_k) = γ_k U(0, 1) + (1 − γ_k) δ_1(ρ_k)
  • γ_k ∼ Bernoulli(α), λ_a ∼ G(1, 1), λ_z ∼ G(1, 1)
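Drawing from this mixture prior can be sketched as follows (NumPy; `draw_rho` is my own name). γ_k = 0 pins ρ_k at the point mass δ_1, switching predictor k off in the covariance; γ_k = 1 draws ρ_k from the uniform slab.

```python
import numpy as np

def draw_rho(gamma, rng):
    """Mixture prior: rho_k ~ U(0,1) if gamma_k = 1, else point mass at 1."""
    gamma = np.asarray(gamma)
    rho = np.ones(gamma.shape)          # delta_1 component (predictor off)
    sel = gamma == 1
    rho[sel] = rng.uniform(0.0, 1.0, size=sel.sum())
    return rho

rng = np.random.default_rng(1)
gamma = rng.binomial(1, 0.1, size=10)   # gamma_k ~ Bernoulli(alpha), alpha = 0.1
rho = draw_rho(gamma, rng)
```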

SLIDE 8

MCMC for posterior inference

Similar to the MC³ scheme, but here we traverse both model and parameter spaces. Randomly choose among 3 between-model moves:

  • Add: randomly choose k with γ_k = 0, set γ′_k = 1 and propose q(ρ′_k | ρ_k) = q(ρ′_k) ∼ U(0, 1)
  • Delete: randomly choose k with γ_k = 1, set (γ′_k = 0, ρ′_k = 1)
  • Swap: jointly propose (Add, Delete) moves

Accept the proposed value (γ′, ρ′_{γ′}) jointly.

Add a within-model move to speed convergence: for all γ_k = 1 propose q(ρ′′_k) ∼ U(0, 1).
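The between-model proposal mechanics can be sketched as below (NumPy; names are mine). This shows only the Add/Delete bookkeeping; the Swap move and the Metropolis–Hastings accept/reject step, which requires the GP marginal likelihood, are omitted.

```python
import numpy as np

def propose_move(gamma, rho, rng):
    """One between-model proposal: Add (gamma_k 0->1, rho_k ~ U(0,1))
    or Delete (gamma_k 1->0, rho_k -> 1), chosen at random."""
    gamma, rho = gamma.copy(), rho.copy()
    zeros = np.where(gamma == 0)[0]
    ones = np.where(gamma == 1)[0]
    if ones.size == 0 or (zeros.size > 0 and rng.random() < 0.5):
        k = rng.choice(zeros)              # Add move
        gamma[k], rho[k] = 1, rng.uniform()
    else:
        k = rng.choice(ones)               # Delete move
        gamma[k], rho[k] = 0, 1.0
    return gamma, rho

rng = np.random.default_rng(2)
gamma = np.zeros(5, dtype=int)
rho = np.ones(5)
gamma2, rho2 = propose_move(gamma, rho, rng)   # must be an Add move here
```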

SLIDE 9

Generalized formulation

GLM with link function g(η_i) = z(x_i), z(X) ∼ N(0, C). Regression, logit and probit models. Poisson canonical link function for count data:

π(s_i | λ_i) = λ_i^{s_i} exp(−λ_i) / s_i! ∝ exp(s_i log(λ_i) − λ_i)

and define the Poisson GP regression model g(η) = log(λ) = z(X).
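The resulting Poisson GP log-likelihood can be written directly (a minimal sketch; the function name is mine):

```python
import numpy as np
from math import lgamma

def poisson_gp_loglik(s, z):
    """Log-likelihood of counts s_i under log(lambda_i) = z(x_i):
    sum_i [ s_i * z_i - exp(z_i) - log(s_i!) ]."""
    s = np.asarray(s, dtype=float)
    z = np.asarray(z, dtype=float)
    log_fact = np.array([lgamma(si + 1.0) for si in s])
    return float(np.sum(s * z - np.exp(z) - log_fact))

ll = poisson_gp_loglik([0, 2, 1], [0.0, 0.5, -0.3])
```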

SLIDE 10

Cox formulation for survival data

Define the hazard rate function as h(t_i | z(x_i)) = h_0(t_i) exp(z(x_i)), i = 1, 2, …, n.

  • Fits the spirit of the semi-parametric construction of Cox (1972)
  • The partial likelihood avoids baseline hazard estimation
  • Use the likelihood formulation of Kalbfleisch (1978) with a Gamma process prior on the baseline hazard
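The Cox partial log-likelihood for the GP log-risk z can be sketched as below (assuming no tied event times; names are mine). Each event contributes its log-risk minus the log-sum of risks over the subjects still at risk.

```python
import numpy as np

def cox_partial_loglik(t, delta, z):
    """Cox partial log-likelihood with hazard h(t_i) = h0(t_i) exp(z_i).
    t: event/censoring times, delta: 1 = event observed, z: GP log-risk."""
    t, delta, z = map(np.asarray, (t, delta, z))
    ll = 0.0
    for i in np.where(delta == 1)[0]:
        at_risk = t >= t[i]                # risk set at time t_i
        ll += z[i] - np.log(np.sum(np.exp(z[at_risk])))
    return float(ll)

ll = cox_partial_loglik([2.0, 1.0, 3.0], [1, 1, 0], [0.1, -0.2, 0.3])
```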

SLIDE 11

Simulation: Count Data (n = 100, p = 1000)

y_i = 1.6(x_i,1 + x_i,2 + x_i,3 + x_i,4) + sin(3x_i,5) + sin(5x_i,6) + ε,  s_i ∼ Pois(exp(y_i))
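This simulation can be reproduced as a sketch. The slide does not state the design distribution of X or the noise scale, so uniform X on [0, 1] and a small Gaussian ε are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 1000
X = rng.uniform(0.0, 1.0, size=(n, p))          # assumed design distribution
eps = rng.normal(0.0, 0.05, size=n)             # assumed noise scale
y = (1.6 * (X[:, 0] + X[:, 1] + X[:, 2] + X[:, 3])
     + np.sin(3 * X[:, 4]) + np.sin(5 * X[:, 5]) + eps)
s = rng.poisson(np.exp(y))                      # s_i ~ Pois(exp(y_i))
```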

[Figure: posterior probabilities P(γ_k = 1 | D) for the variable selection parameters γ_1, …, γ_20, with predictors selected based on EFDR = 0; boxplots of posterior samples of ρ_k by predictor.]

Low-order, polynomial-like association: ρ_1, …, ρ_4 close to 1. Higher-order/non-linear association: ρ_5, ρ_6 closer to 0.

SLIDE 12

Simulation: Cox GP model (n = 100, p = 1000)

y_i = (3x_i,1 − 2.5x_i,2 + 3.5x_i,3 − 3x_i,4) + sin(3x_i,5) − sin(5x_i,6) + ε

Event time observations from a Cox model with survivor function S(t|y) = exp[−H_0(t) exp(y)], H_0(t) = λt, λ = 0.2, so t = M / (λ exp(y)), M ∼ Exp(1), with 5% randomly censored.
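Event-time generation via the inverse-CDF identity on the slide can be sketched as follows (the distribution of y is not restated here, so the standard-normal stand-in for the linear predictor is an assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 0.2
y = rng.normal(0.0, 1.0, size=100)        # stand-in linear predictor (assumption)
M = rng.exponential(1.0, size=100)        # M ~ Exp(1)
t = M / (lam * np.exp(y))                 # so that S(t | y) = exp(-lam * t * e^y)
censored = rng.random(100) < 0.05         # 5% randomly censored
delta = (~censored).astype(int)           # 1 = observed event
```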

[Figure: posterior probabilities P(γ_k = 1 | D) for γ_1, …, γ_20, with predictors selected based on EFDR = 0.01; boxplots of posterior samples of ρ_k by predictor; estimated survival probability vs. log survival time.]

SLIDE 13

Application: Ozone Count Data

Integer particle counts per one million particles of air near Los Angeles for n = 330 days and an associated set of 8 meteorological predictors. We held out a randomly chosen set of 165 observations for validation.

[Figure: posterior probabilities P(γ_k = 1 | D) for γ_1, …, γ_8, with predictors selected based on EFDR = 0.09; boxplots of posterior samples of ρ_k by predictor.]

SLIDE 14

Analyzed by Liang et al (2007) with a linear regression model including all linear and quadratic terms (p = 44).

Prior on g              Mγ                                 pγ   RMSE(Mγ)
Local Empirical Bayes   X5, X6, X7, X6², X7², X3X5         6    4.5
Hyper-g (a = 4)         X5, X6, X7, X6², X7², X3X5         6    4.5
Fixed (BIC)             X5, X6, X7, X6², X7², X3X5         6    4.5
Brown et al (2002)      X1X6, X1X7, X6X7, X1², X3², X7²    6    4.5
GP model                X3, X6, X7                         3    3.7

SLIDE 15

Application: Wisconsin Breast Cancer

Time-to-recurrence in n = 194 subjects, 76% censored. p = 32 characteristics of the cell nuclei present in the breast mass (e.g. shape, size, texture), obtained from a digitized Fine Needle Aspiration (FNA) image.

[Figure: posterior probabilities P(γ_k = 1 | D) for γ_1, …, γ_32, with predictors selected based on EFDR = 0.03; boxplots of posterior samples of ρ_k for the selected predictors; estimated survival probability vs. log survival time.]

Note that the boxplots show a mix of lower- and higher-order covariate associations.

SLIDE 16

Summary

  • GP priors to obtain nonparametric regression models where the covariance matrix varies with the predictors.
  • Mixture priors for Bayesian variable selection.
  • Continuous, categorical, count and survival responses.

Savitsky, Vannucci and Sha (2011, Statistical Science)
