  1. Case Study: Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations BIOSTAT830: Graphical Models December 08, 2016

  2. Introduction - INLA
  ◮ Inference for latent Gaussian Markov random field (GMRF) models, avoiding MCMC simulation
  ◮ Fast Bayesian inference using accurate, multiple types of approximations to:
    ◮ pr(θ | y): marginal posterior density of the model hyperparameters
    ◮ pr(x_i | y): marginal posterior densities of one (or more) latent variables
  ◮ Can be used for model criticism:
    1. Fast cross-validation
    2. Bayes factors and the deviance information criterion (DIC) can be calculated efficiently for model comparison
  ◮ Software inla available from R (the R-INLA package); very easy to use

  3. Supported Models
  ◮ Hierarchical GMRF of the form:
    likelihood: y_j | η_j, θ_1 ∼ pr(y_j | η_j, θ_1), j ∈ J,
    linear predictor: η_i = Offset_i + Σ_{k=0}^{n_f − 1} w_ki f_k(c_ki) + z_i′ β + ε_i, i = 0, …, n_η − 1.
  ◮ J ⊂ {0, 1, …, n_η − 1}, i.e., not all latent η_i are observed through data y
  ◮ pr(y_j | η_j, θ_1): likelihood of the data, with a known link function
  ◮ ε = (ε_0, …, ε_{n_η − 1})′ | λ_η ∼ N(0, λ_η^{−1} I), where λ_η denotes the precision
  ◮ η = {η_i}: a vector of linear predictors, e.g., as in a GLM
  ◮ w_k: the known weights in the k-th nonlinear effect, one for each observed data point
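A minimal numpy sketch of this likelihood/linear-predictor structure may help fix ideas. All names, dimensions, and the choice of sin/cos effects here are illustrative only; in INLA the f_k carry GMRF priors rather than being fixed functions, and the likelihood below is Poisson with a log link as one concrete example:

```python
import numpy as np

rng = np.random.default_rng(0)

n_eta, n_f, n_beta = 100, 2, 3   # observations, nonlinear terms, linear covariates

offset = np.zeros(n_eta)
w = np.ones((n_f, n_eta))                  # known weights w_ki
c = rng.uniform(0, 1, size=(n_f, n_eta))   # covariates entering nonlinearly
z = rng.normal(size=(n_eta, n_beta))       # covariates entering linearly
beta = np.array([0.5, -1.0, 0.25])         # linear effects
lam_eta = 1e4                              # precision of the small noise term
eps = rng.normal(0.0, lam_eta ** -0.5, n_eta)

# stand-in smooth effects f_k (in INLA these would be GMRF-distributed vectors)
f = [np.sin, np.cos]

# linear predictor: eta_i = offset_i + sum_k w_ki f_k(c_ki) + z_i' beta + eps_i
eta = offset + sum(w[k] * f[k](c[k]) for k in range(n_f)) + z @ beta + eps

# one possible likelihood with a known link: y_j | eta_j ~ Poisson(exp(eta_j))
y = rng.poisson(np.exp(eta))
```

The point of the sketch is only the additive decomposition of η and the conditional-independence of the y_j given η; R-INLA assembles the same structure from a model formula.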

  4. Supported Models (continued)
  ◮ Hierarchical GMRF of the form:
    likelihood: y_j | η_j, θ_1 ∼ pr(y_j | η_j, θ_1), j ∈ J,
    linear predictor: η_i = Offset_i + Σ_{k=0}^{n_f − 1} w_ki f_k(c_ki) + z_i′ β + ε_i, i = 0, …, n_η − 1.
  ◮ f_k(c_ki): nonlinear effect of covariate k for observation i
    1. Nonlinear effects: time trends and seasonal effects, two-dimensional surfaces, iid random intercepts, slopes, and spatial random effects.
    2. The unknown functions f_k = (f_{0k}, …, f_{m_k − 1, k})′ are modelled as GMRFs given some parameter θ_{f_k}: f_k | θ_{f_k} ∼ N(0, Q_k^{−1})
  ◮ z_i: a vector of n_β covariates assumed to have a linear effect
  ◮ β: the vector of unknown regression parameters

  5. Supported Models (continued)
  ◮ x = (η′, f_0′, …, f_{n_f − 1}′, β′)′: full vector of latent variables; dimension: n = n_η + Σ_{j=0}^{n_f − 1} m_j + n_β; note we parameterize x by η instead of ε
  ◮ All the elements of the vector x are defined as GMRFs:
    pr(x | θ_2) = Π_{i=0}^{n_η − 1} pr(η_i | f_0, …, f_{n_f − 1}, β, λ_η) × Π_{k=0}^{n_f − 1} pr(f_k | θ_{f_k}) × Π_{m=0}^{n_β − 1} pr(β_m),
    where
    η_i | f_0, …, f_{n_f − 1}, β ∼ N( Σ_{k=0}^{n_f − 1} f_k(c_ki) + z_i′ β, λ_η^{−1} ),  β_m ∼ iid N(0, λ_β^{−1}),
    and θ_2 = {log λ_η, θ_{f_0}, …, θ_{f_{n_f − 1}}} is a vector of unknown hyperparameters.

  6. Prior
  ◮ Specify priors on the hyperparameters: θ_2 = {log λ_η, θ_{f_0}, …, θ_{f_{n_f − 1}}, log λ_β}
  ◮ Need not be Gaussian

  7. Examples
  ◮ Time series model: c_k = t for time, f_k for nonlinear trends or seasonal effects:
    η_t = f^trend(t) + f^seasonal(t) + z_t′ β
  ◮ Generalized additive models (GAM):
    1. pr(y_i | η_i, θ_1) belongs to an exponential family
    2. the c_ki are univariate, continuous covariates
    3. the f_k are smooth functions

  8. Examples
  ◮ Generalized additive mixed models (GAMM) for longitudinal data
  ◮ Individuals i = 0, …, n_i − 1 are observed at time points t_0, t_1, …. A GAMM extends a GAM by introducing individual-specific random effects:
    η_it = f_0(c_{it0}) + … + f_{n_f − 1}(c_{it, n_f − 1}) + b_{0i} w_{it0} + … + b_{n_b − 1, i} w_{it, n_b − 1},
    where η_it is the linear predictor for individual i at time t; c_itk, k = 0, …, n_f − 1, and w_itq, q = 0, …, n_b − 1, are covariate values for individual i at time t; and b_{0i}, …, b_{n_b − 1, i} is a vector of n_b individual-specific random intercepts (if w_itq = 1) or slopes.
  ◮ This fits the general form: just define r = (i, t) and c_kr = c_kit for k = 0, …, n_f − 1, and c_{n_f − 1 + q, r} = w_qit with f_{n_f − 1 + q}(c_{n_f − 1 + q, r}) = b_qi w_qit for q = 1, …, n_b.

  9. Examples
  ◮ Geoadditive models (Kammann and Wand, 2003, JRSS-C):
    η_i = f_0(c_{0i}) + … + f_{n_f − 1}(c_{n_f − 1, i}) + f_spatial(s_i) + z_i′ β,
    where s_i indicates the location of observation i and f_spatial is a spatially correlated effect.

  10. Examples
  ◮ ANOVA-type interaction model: for the effect of two continuous covariates w and v:
    η_i = f_1(w_i) + f_2(v_i) + f_{1,2}(w_i, v_i) + …,
    where f_1, f_2 are the main effects and f_{1,2} is a two-dimensional interaction surface. This is a special case of the general form: just define c_1i = w_i, c_2i = v_i, and c_3i = (w_i, v_i).
  ◮ Univariate stochastic volatility model: a time series model with Gaussian likelihood where the variance (not the mean) of the observed data is part of the latent GMRF model:
    y_i | η_i ∼ N(0, exp(η_i)),
    and, for example, the latent field η is modelled as an autoregressive process of order 1.
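The stochastic volatility setup is easy to simulate. The following sketch draws a latent AR(1) log-variance process and then the observations; the values of φ and the innovation precision τ are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

n, phi, tau = 500, 0.95, 10.0   # series length, AR(1) coefficient, innovation precision

# latent log-variance field: eta_t = phi * eta_{t-1} + innovation
eta = np.zeros(n)
for t in range(1, n):
    eta[t] = phi * eta[t - 1] + rng.normal(0.0, tau ** -0.5)

# observations: y_t | eta_t ~ N(0, exp(eta_t)), so sd = exp(eta_t / 2)
y = rng.normal(0.0, np.exp(eta / 2.0))
```

Here η enters the variance through the exponential link, which is what makes the model a latent Gaussian model amenable to INLA despite the non-Gaussian role of η in the likelihood.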

  11. Bayesian for Spatial and Spatio-temporal Models (Blangiardo and Cameletti, 2015, Wiley)

  12. INLA for Spatial Area Data: Suicides in London
  ◮ Disease mapping is commonly used in small-area studies to assess the spatial pattern of a disease
  ◮ Goal: to identify areas characterized by unusually high or low relative risk (Lawson 2009)

  13. London Cholera Outbreak in 1854
  ◮ John Snow's cholera map in dot style: dots represent deaths from cholera in London in 1854, used to detect the source of the disease

  14. Example: Suicide Mortality
  ◮ 32 boroughs in London; 1989-1993
  ◮ For the i-th area, the number of suicides y_i:
    y_i ∼ Poisson(λ_i), where λ_i = ρ_i E_i, the product of the rate ρ_i and the expected number of suicides E_i
  ◮ Linear predictor defined on the logarithmic scale:
    η_i = log(ρ_i) = α + v_i + ν_i,
    where α is the intercept and v_i = f_1(i), ν_i = f_2(i) are two area-specific effects.
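A small sketch of the ingredients of this model, with entirely hypothetical populations and rates (the real analysis uses observed borough data): the expected counts E_i come from indirect standardization, and y_i / E_i is the raw standardized mortality ratio, the maximum-likelihood estimate of ρ_i before any spatial smoothing.

```python
import numpy as np

rng = np.random.default_rng(2)

n_areas = 32                                   # e.g., the 32 London boroughs
pop = rng.integers(150_000, 350_000, n_areas)  # hypothetical borough populations
overall_rate = 1e-4                            # hypothetical overall suicide rate

# expected counts under indirect standardization: E_i = pop_i * overall rate
E = pop * overall_rate

# simulate observed counts with true relative risks rho_i scattered around 1
rho = np.exp(rng.normal(0.0, 0.2, n_areas))
y = rng.poisson(rho * E)

smr = y / E   # raw standardized mortality ratio: unsmoothed estimate of rho_i
```

The BYM model on the next slide replaces the raw SMR with a smoothed estimate that borrows strength across neighboring areas.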

  15. Besag-York-Mollie (BYM) model (Besag et al. 1991)
  ◮ v_i: spatially structured residual, modeled using an intrinsic conditional autoregressive (iCAR) structure:
    v_i | v_{j ≠ i} ∼ Normal(m_i, s_i²),
    m_i = Σ_{j ∈ N(i)} v_j / |N(i)|,
    s_i² = σ_v² / |N(i)|,
    where |N(i)| is the number of areas which share a boundary with the i-th one.
  ◮ ν_i: unstructured residual, modeled with an exchangeable prior: ν_i ∼ Normal(0, σ_ν²)
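The iCAR conditionals above are simple enough to compute directly. This toy sketch (a made-up four-area map on a line, with arbitrary values for v and σ_v²) shows how the conditional mean is the neighbor average and the conditional variance shrinks with the number of neighbors:

```python
import numpy as np

# toy adjacency for 4 areas arranged on a line: 0-1-2-3
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

sigma2_v = 0.5                     # variance parameter of the iCAR
v = np.array([0.2, -0.1, 0.4, 0.0])

def icar_conditional(i, v, neighbors, sigma2_v):
    """Conditional mean and variance of v_i given its neighbors."""
    nb = neighbors[i]
    m_i = v[nb].mean()             # average of the neighboring values
    s2_i = sigma2_v / len(nb)      # variance shrinks with neighbor count
    return m_i, s2_i

m1, s2_1 = icar_conditional(1, v, neighbors, sigma2_v)
# m1 = (0.2 + 0.4) / 2 = 0.3, s2_1 = 0.5 / 2 = 0.25
```

These full conditionals are exactly what make the joint distribution of v a GMRF with a sparse precision matrix determined by the adjacency structure.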

  16. Priors
  ◮ log τ_ν ∼ log-Gamma(1, 0.0005)
  ◮ log τ_v ∼ log-Gamma(1, 0.0005)

  17. Goal
  1. Posterior for borough-specific relative risks of suicide, compared to the whole of London: pr(exp(v_i + ν_i) | y)
  2. Posterior exceedance probability: pr(exp(v_i + ν_i) > 1 | y)
  3. Fraction of the total variance explained by the spatially structured component
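Given posterior draws of v_i + ν_i for a borough (here faked with a Gaussian for illustration; in practice they would come from the fitted model), the exceedance probability in item 2 is just the Monte Carlo fraction of draws whose relative risk exceeds 1:

```python
import numpy as np

rng = np.random.default_rng(3)

# stand-in posterior draws of v_i + nu_i for one borough (illustrative only)
draws = rng.normal(0.1, 0.15, 10_000)

rr = np.exp(draws)             # draws of the relative risk exp(v_i + nu_i)
exceed = (rr > 1.0).mean()     # estimate of pr(exp(v_i + nu_i) > 1 | y)
```

Boroughs with exceedance probability near 1 (or near 0) are flagged as having unusually high (or low) risk.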

  18. Incorporating Risk Factors
  ◮ Extension: when risk factors are available and the aim of the study is to evaluate their effect on the risk of death (or disease)
  ◮ Ecological regression model
  ◮ For example: an index of social deprivation (x_1) and an index of social fragmentation, describing lack of social connections and of a sense of community (x_2)
  ◮ Model: η_i = α + β_1 x_{1i} + β_2 x_{2i} + v_i + ν_i
  ◮ Can be fitted using the R-INLA package

  19. London Suicide Rates Mapping Figure 1: suicide_rates

  20. Other Spatial Examples (FiveThirtyEight) Figure 2: Stephen Curry

  21. Background: Gaussian Markov Random Fields (GMRF)
  ◮ GMRF: x = (x_1, …, x_n)′ with the Markov property that for some i ≠ j, x_i ⊥ x_j | x_{−ij}
  ◮ Can be encoded by the precision matrix Q: Q_ij = 0 if and only if x_i ⊥ x_j | x_{−ij}
  ◮ Density function with mean vector µ:
    pr(x) = (2π)^{−n/2} |Q|^{1/2} exp{ −(1/2) (x − µ)′ Q (x − µ) }
  ◮ In most cases Q is sparse: only O(n) of the n² entries are nonzero
  ◮ Can handle extra linear constraints: Ax = e for a k × n matrix A of rank k
  ◮ Computational note: simulation is usually based on the lower Cholesky decomposition Q = LL′, with L preserving much of the sparseness of Q. See Section 2.1 in Rue et al. (2009) for more details.
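The Cholesky-based sampler in the computational note can be sketched in a few lines. With Q = LL′ and z ∼ N(0, I), solving L′x = z gives x with covariance L^{−T} L^{−1} = Q^{−1}, i.e., x ∼ N(0, Q^{−1}). The small tridiagonal Q below is an arbitrary example of a sparse precision matrix (dense linear algebra here; at scale one would use a sparse Cholesky):

```python
import numpy as np

rng = np.random.default_rng(4)

# small tridiagonal (hence sparse) precision matrix of a chain-structured GMRF
n = 5
Q = (np.diag(np.full(n, 2.0))
     + np.diag(np.full(n - 1, -0.9), 1)
     + np.diag(np.full(n - 1, -0.9), -1))

# lower Cholesky factor: Q = L L'
L = np.linalg.cholesky(Q)

# sample x ~ N(0, Q^{-1}): draw z ~ N(0, I) and solve L' x = z
z = rng.normal(size=n)
x = np.linalg.solve(L.T, z)
```

Because L inherits (much of) the sparsity of Q, this backsolve costs far less than forming Q^{−1} explicitly, which is the key to fast GMRF computations.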

  22. Background: Gaussian approximation (under regularity conditions)
  ◮ Find a Gaussian density q(z) to approximate a density p(z) = f(z)/Z, where Z = ∫ f(z) dz
  ◮ One-dimensional case; multi-dimensional case
  ◮ Need to find the mode z_0 (Newton or quasi-Newton methods)
  ◮ Need not know the normalizing constant Z
  ◮ By the central limit theorem, the approximation becomes better as the sample size n increases when f(z; Data) is a posterior distribution of model parameters
  ◮ Typically better for marginal and conditional posteriors than for joint posteriors (marginals are averages across other distributions!)
  ◮ Can use transformations (e.g., logit or log) to approximate a distribution over a constrained space
  ◮ Not so useful if there is skewness, or if interested in extreme values that are far from the mode
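The one-dimensional recipe is: find the mode z_0 of log f, then use the negative second derivative there as the Gaussian precision. A minimal sketch on an unnormalized Gamma density (the target and its parameters are chosen only for illustration; note the normalizing constant is never needed):

```python
import numpy as np

# unnormalized target: a Gamma(shape=a, rate=b) density without its constant
a, b = 5.0, 2.0
log_f = lambda z: (a - 1) * np.log(z) - b * z

# 1. locate the mode z0 by Newton's method on d/dz log f(z) = (a-1)/z - b
z0 = 1.0
for _ in range(50):
    grad = (a - 1) / z0 - b
    hess = -(a - 1) / z0**2
    z0 -= grad / hess                 # converges to the analytic mode (a-1)/b = 2

# 2. negative second derivative of log f at the mode gives the precision
precision = (a - 1) / z0**2

# Gaussian approximation q(z) = N(mean, var); here N(2, 1)
mean, var = z0, 1.0 / precision
```

For this skewed target the approximation is accurate near the mode but underestimates the right tail, which is exactly the weakness noted in the last bullet above.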

  23. Background: Gaussian approximation - Density at Extreme Values (Bishop CM, 2006, Sec 4.4)
