conjugate bayesian models for massive spatial data
play

Conjugate Bayesian Models for Massive Spatial Data Abhi Datta 1 , - PowerPoint PPT Presentation

Conjugate Bayesian Models for Massive Spatial Data Abhi Datta 1 , Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland. 2 Department


  1. Conjugate Bayesian Models for Massive Spatial Data Abhi Datta 1 , Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland. 2 Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles. 3 Departments of Forestry and Geography, Michigan State University, East Lansing, Michigan.

  2. Case Study: Alaska Tanana Valley Forest Height Dataset Forest height and tree cover Forest fire history • Forest height (red lines) data from LiDAR at 5 × 10 6 locations • Knowledge of forest height is important for biomass assessment, carbon management etc 1

  3. Case Study: Alaska Tanana Valley Forest Height Dataset Forest height and tree cover Forest fire history • Goal: High-resolution domainwide prediction maps of forest height • Covariates: Domainwide tree cover (grey) and forest fire history (red patches) in the last 20 years 1

  4. Analyzing the data Models used: • Non-spatial regression: y FH ( s ) = β 0 + β tree x tree + β fire x fire + ǫ ( s ) Figure: Variogram of the residuals from non-spatial regression indicates strong spatial pattern 2

  5. NNGP models • Collapsed NNGP: • y FH ( s ) = β 0 + β tree x tree + β fire x fire + w ( s ) + ǫ ( s ) • w ( s ) ∼ NNGP (0 , C ( · , · | σ 2 , φ )) • y FH ∼ N ( X β, ˜ C + τ 2 I ) where ˜ C is the NNGP covariance matrix derived from C • Response NNGP: • y FH ( s ) ∼ NNGP ( β 0 + β tree x tree + β fire x fire , Σ( · , · | σ 2 , φ, τ 2 )) • y FH ∼ N ( X β, ˜ Σ) where ˜ Σ is the NNGP covariance matrix derived from Σ = C + τ 2 I 3

  6. NNGP models Non-spatial regression Collapsed NNGP Response NNGP CRPS 2.3 0.86 0.86 RMSPE 4.2 1.73 1.72 CP 93% 94% 94% CIW 16.3 6.6 6.6 Table: Model comparison metrics for the Tanana valley dataset • NNGP models perform significantly better than the non-spatial model • MCMC run time for the NNGP models: • Collapsed model: 319 hours • Response model: 38 hours • For massive spatial data, full Bayesian output for even NNGP models require substantial time 4

  7. Another look at the response model • Original full GP model: y ( s ) ind ∼ N ( x ( s ) ′ β + w ( s ) , τ 2 ) • w ( s ) ∼ GP with a stationary covariance function C ( · , · | σ 2 , φ ) • Cov ( w ) = σ 2 R ( φ ) • Full GP model: y ∼ N ( X β, Σ) where Σ = σ 2 M • M = R ( φ ) + α I • α = τ 2 /σ 2 is the ratio of the noise to signal variance • Response NNGP model: y ∼ N ( X β, ˜ Σ) Σ = σ 2 ˜ • ˜ M where ˜ M is the NNGP approximation for M 5

  8. Conjugate NNGP • y ∼ N ( X β, σ 2 ˜ M ) • If φ and α are known, M , and hence ˜ M , are known matrices • The model becomes a standard Bayesian linear model • Assume a Normal Inverse Gamma (NIG) prior for ( β, σ 2 ) ′ • ( β, σ 2 ) ′ ∼ NIG ( µ β , V β , a σ , b σ ), i.e., β | σ 2 ∼ N ( µ β , σ 2 V β ) and σ 2 ∼ IG ( a σ , b σ ) 6

  9. Conjugate NNGP • y ∼ N ( X β, σ 2 ˜ M ), ˜ M is known Joint likelihood: N ( y | X β, σ 2 ˜ M ) × N ( β | µ β , σ 2 V β ) × IG ( σ 2 | a σ , b σ ) 7

  10. Conjugate NNGP • y ∼ N ( X β, σ 2 ˜ M ), ˜ M is known Joint likelihood: N ( y | X β, σ 2 ˜ M ) × N ( β | µ β , σ 2 V β ) × IG ( σ 2 | a σ , b σ ) • Conjugate posterior distribution ( β, σ 2 ) | y ∼ NIG ( µ ∗ β , V ∗ β , a ∗ σ , b ∗ σ ) • Expressions for µ ∗ β , V ∗ β , a ∗ σ and b ∗ σ can be calculated in O ( n ) time 7

  11. Conjugate NNGP • ( β, σ 2 ) | y ∼ NIG ( µ ∗ β , V ∗ β , a ∗ σ , b ∗ σ ) β , b ∗ • Marginal posterior: β | y ∼ MVt 2 a ∗ σ ( µ ∗ σ V ∗ β ) σ a ∗ • MVt k ( m , V ) is the multivariate t distribution with degrees of k , mean m and scale matrix V b ∗ • E ( β | y ) = µ ∗ σ − 1 V ∗ β , Var ( β | y ) = σ β a ∗ • Marginal posterior: σ 2 | y ∼ IG ( a ∗ σ , b ∗ σ ) b ∗ 2 • E ( σ 2 | y ) = σ − 1 , Var ( σ 2 | y ) = b ∗ σ σ σ − 1) 2 ( a ∗ a ∗ ( a ∗ σ − 2) • Exact posterior distributions of β and σ 2 are available 8

  12. Predictive distributions σ ( m ( s ) , b ∗ • y ( s ) | y ∼ t 2 a ∗ σ v ( s )) σ a ∗ b ∗ • E ( y ( s ) | y ) = m ( s ), Var ( y ( s ) | y ) = σ − 1 v ( s ) σ a ∗ • m ( s ) and v ( s ) can be computed using O ( m ) flops • Exact posterior predictive distributions of y ( s ) | y for any s • No MCMC required for parameter estimation or prediction 9

  13. Choosing α and φ • φ and α are chosen using K -fold cross validation over a grid of possible values • Unlike MCMC, cross-validation can be completely parallelized • Resolution of the grid for φ and α can be decided based on computing resources available • In practice, a reasonably coarse grid often suffices 10

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend