Univariate areal data modeling In spatial epidemiology, interest in - PowerPoint PPT Presentation

Univariate areal data modeling In spatial epidemiology, interest in disease mapping, where we have data observed number of cases of disease in county i Y i = E i = expected number of cases of disease in county i Y i are random, but the E i are thought of as fixed and known functions of n i , e.g., �� i y i � E i = n i ¯ r ≡ n i , i n i what we expect under a constant disease rate across i This process is called internal standardization, since it centers the disease rates, but uses the observed data to do so. Hierarchical Modeling for Univariate Spatial Data – p. 1

External standardization Internal standardization is “cheating.” We estimate the grand rate r from our current data but model as if it were fixed Better approach: use an existing standard table of age-adjusted rates for the disease. Then after stratifying the population by age group, the E i emerge as � E i = n ij r j , j where n ij is the person-years at risk in area i for age group j , and r j is the disease rate in age group j (taken from the standard table). This process is called external standardization Hierarchical Modeling for Univariate Spatial Data – p. 2

Traditional models and methods If E i are not too large (disease is rare or regions i are small), we often assume Y i | η i ∼ Po ( E i η i ) , where η i is the true relative risk of disease in region i . The maximum likelihood estimate (MLE) of η i is η i ≡ SMR i = Y i ˆ , E i the standardized morbidity (or mortality) ratio (SMR), i.e., the ratio of observed to expected disease cases (or deaths). Hierarchical Modeling for Univariate Spatial Data – p. 3

A different parametrization Rather than λ i = E i η i , we could write λ i = n i p i where n i is the number at risk and p i is the incidence probability (usual definition in Poisson approximation to Binomial) Now, no E i . Now, model p i and then can introduce E i after the model fitting to infer about η i by solving for η i = n i p i /E i . The problem: Model log p i as linear in risk factors. What if the right hand side is positive, i.e., then p i > 1 ? That’s why we use the logit model for p ’s But, in practice, p i is very small. Usually, we are modeling rare diseases. Hierarchical Modeling for Univariate Spatial Data – p. 4

Traditional models and methods (cont’d) Note that V ar ( SMR i ) = V ar ( Y i ) /E 2 i = η i /E i , and so we might take � η i /E i = Y i /E 2 V ar ( SMR i ) = ˆ i ... To find a confidence interval for η i , easiest to assume that log SMR i is roughly normally distributed. Using the delta method (Taylor series expansion), V ar ( SMR i ) = E 2 1 × Y i = 1 i V ar [log( SMR i )] ≈ . SMR 2 Y 2 E 2 Y i i i i An approximate 95% CI for log η i is thus log SMR i ± 1 . 96 / √ Y i , and so (transforming back) an approximate 95% CI for η i is � � � � SMR i exp( − 1 . 96 / Y i ) , SMR i exp(1 . 96 / Y i ) . Hierarchical Modeling for Univariate Spatial Data – p. 5

Traditional models and methods (cont’d) Now suppose we wish to test whether the true relative risk in county i is elevated or not, i.e., H 0 : η i = 1 versus H A : η i > 1 . Under the null hypothesis, Y i ∼ Po ( E i ) , so the p -value for this test is Y i − 1 � exp( − E i ) E x i Pr ( X ≥ Y i | E i ) = 1 − Pr ( X < Y i | E i ) = 1 − . x ! x =0 This is the (one-sided) p -value; if it is less than 0.05 the traditional approach would reject H 0 , and conclude that there is a statistically significant excess risk in county i . Hierarchical Modeling for Univariate Spatial Data – p. 6

Hierarchical Bayesian methods Now think of the true underlying relative risks η i as random effects, to allow “borrowing of strength" across regions Appropriate if we want to estimate and map the underlying risk surface The random effects here can be high dimensional, and are incorporated into a nonnormal (Poisson) likelihood... ⇒ most naturally handled using hierarchical Bayesian modeling! Hierarchical Modeling for Univariate Spatial Data – p. 7

Poisson-gamma model A standard hierarchical model is: ind Y i | η i ∼ Po ( E i η i ) , i = 1 , . . . , I, iid and η i ∼ G ( a, b ) , where G ( a, b ) denotes the gamma distribution with mean µ = a/b and variance σ 2 = a/b 2 (this is the WinBUGS parametrization of the gamma) Solving these two equations for a and b we get a = µ 2 /σ 2 and b = µ/σ 2 . Setting µ = 1 (the “null” value) and σ 2 = (0 . 5) 2 , panel (a) of the next slide shows a sample of size 1000 from the resulting (fairly vague) G (4 , 4) prior... Hierarchical Modeling for Univariate Spatial Data – p. 8

Gamma prior and posterior 150 150 100 100 50 50 0 0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 1.0 1.5 2.0 a) sample of size 1000 from the G(4,4) prior b) sample of size 1000 from the G(31,25) posterior Note vertical reference line at η i = µ = 1 (the “null" value) Hierarchical Modeling for Univariate Spatial Data – p. 9

Gamma prior and posterior Thanks to the conjugacy of the gamma prior with the Poisson likelihood, the posterior distribution for η i emerges as another gamma, namely a G ( y i + a, E i + b ) Point estimate of η i : the posterior mean, E i + b = y i + µ 2 E ( η i | y i ) = y i + a E ( η i | y ) σ 2 = E i + µ σ 2 � y i � � µ � E i µ E i σ 2 = + = w i SMR i + (1 − w i ) µ , E i + µ E i + µ σ 2 σ 2 where w i = E i / [ E i + ( µ/σ 2 )] , so that 0 ≤ w i ≤ 1 . Thus our point estimate is a weighted average of the data-based SMR for region i and the prior mean µ . Hierarchical Modeling for Univariate Spatial Data – p. 10

Poisson-gamma data example Suppose in county i we observe y i = 27 disease cases, when we expected only E i = 21 Under our G (4 , 4) prior we obtain a G (27 + 4 , 21 + 4) = G (31 , 25) posterior distribution; panel (b) of the figure (two slides ago) shows a sample of size 1000 drawn from this distribution. This distribution has mean 31 / 25 = 1 . 24 (consistent with the figure), indicating slightly elevated risk (24%). However, the posterior probability that the true risk is bigger than 1 is P ( η i > 1 | y i ) = . 863 , which we can get: exactly, using, say, 1 − pgamma(25,31) in R or empirically, as the proportion of samples in the posterior histogram that are greater than 1. Hierarchical Modeling for Univariate Spatial Data – p. 11

Poisson-gamma data example (cont’d) A 100 × (1 − α ) % confidence interval for η i would simply take the upper and lower α/ 2 -points of the G (31 , 25) posterior. In our case, taking α = . 05 we obtain this 95% equal-tail credible interval as ( η ( L ) , η ( U ) ) = ( . 842 , 1 . 713) , again i i indicating no “significant” elevation in risk for this county. Note: In R , the command is qgamma(.975, 31)/25 . To summarize I (instead of 1) posterior distributions (one for each county), we might use a choropleth map (say, in WinBUGS or ArcView ) of the posterior means or 95% CI interval widths Hierarchical Modeling for Univariate Spatial Data – p. 12

Poisson-lognormal (spatial) model The gamma prior is very convenient computationally, but fails to allow for spatial correlation among the η i . It is also awkward for regression. Could contemplate a multivariate version of the gamma distribution, but instead we place a multivariate normal distribution on the ψ i ≡ log η i , the log-relative risks. Specifically, we augment our basic model to � E i e ψ i � ind Y i | ψ i ∼ Po , x ′ where ψ i i β + φ i + θ i using = fixed effects β (for spatial covariates x i ) iid heterogeneity random effects θ i ∼ N (0 , 1 /τ h ) spatial clustering random effects φ ∼ CAR ( τ c ) Hierarchical Modeling for Univariate Spatial Data – p. 13

Comments Identifiability - φ i + θ i ; redundancy Impropriety in CAR requires an identifiability constraint, e.g., � φ i = 0 Specifying prior on pure error variance, equivalently on the precision τ h , specifying prior on CAR "variance", equivalently, on the precision τ c Reparametrization - "centering" to ψ i = θ i + φ i and θ i Not much learning about the φ ’s or the θ ’s Perhaps just focus on the spatial story Other first stage specifications - general linear areal data modeling Hierarchical Modeling for Univariate Spatial Data – p. 14

Comparison Comparing point-referenced and areal data models Second stage random effects in both cases A process model vs. n -dimensional distribution Gaussian process vs. CAR (Markov random field) Model Σ Y vs. Σ − 1 Y Prediction/interpolation vs explanation and smoothing Likelihood evaluation; big “n” probelm Hierarchical Modeling for Univariate Spatial Data – p. 15

Univariate areal data modeling In spatial epidemiology, interest in - PowerPoint PPT Presentation

Univariate areal data modeling In spatial epidemiology, interest in disease mapping, where we have data observed number of cases of disease in county i Y i = E i = expected number of cases of disease in county i Y i are random, but the E i are

Lecture 18 Models for areal data Colin Rundel 03/22/2017 1 areal / lattice data 2 Example -

Hierarchical Modeling for Large Univariate Areal Data Abhi Datta 1 , Sudipto Banerjee 2 and Andrew

Denmark Areal 34.o94 km2 Inhabitants 5.511.451 Agriculture areal 2.589.000.000 ha. Fruit and

Lionel Riou Fransca Univariate & bivariate Two kind of analysis Univariate

Outline Univariate Trees 1 Decision Trees Classification Regression Pruning Steven J Zeil

Fusing point and areal level space-time data with application to wet deposition Alan Gelfand

Areal Unit Data Regular Grids or Lattices Large Point-referenced Datasets Is there spatial

Language Typology and Areal Linguistics Yiru July 13, 2016 Yiru Language Typology July 13,

Numeral classifiers in areal perspective: Khmer and Thai 'syntactic borrowing' revisited RIKKER

Robust Statistics Part 1: Introduction and univariate data Peter Rousseeuw LARS-IASC School, May

Univariate Graphics STAT 133 Gaston Sanchez Department of Statistics, UCBerkeley

Univariate Time Series Analysis; ARIMA Models Heino Bohn Nielsen 1 of 40 Outline of the Lecture

Fisher scoring for some univariate discrete distributions Thomas Yee University of Auckland 26

On Univariate Extreme Value Statistics and the Estimation of Reinsurance Premiums B. Vandewalle

Statistics 762 Nonlinear Statistical Models for Univariate and Multivariate Response Instructor:

Spatio-temporal Models Again point-referenced vs. areal unit data Continuous time vs. discretized

JUST THE MATHS SLIDES NUMBER 19.7 PROBABILITY 7 (The Poisson distribution) by

Problems in spatial statistics Barry Rowlingson Research Fellow, Lancaster University DataCamp

Statistical aspects of determinantal point processes eric Lavancier , Fr ed Laboratoire de

GDPP Learning Diverse Generations using Determinantal Point Process Mohamed Elfeki , Camille

Statistical Distributions in Simulation R.B. Lenin (rblenin@daiict.ac.in) Autumn 2007 R.B.

Math 186: 4.2 Poisson Distribution: Counting Crossovers in Meiosis 4.2 Exponential and 4.6 Gamma

The Poisson Distribution Bernd Schr oder logo1 Bernd Schr oder Louisiana Tech University,

Invariant transports of random measures and the extra head problem G unter Last (Karlsruhe)

Sambuz

Useful Links

Newsletter

Mail Us

Univariate areal data modeling In spatial epidemiology, interest in - PowerPoint PPT Presentation

Univariate areal data modeling In spatial epidemiology, interest in disease mapping, where we have data observed number of cases of disease in county i Y i = E i = expected number of cases of disease in county i Y i are random, but the E i are

Lecture 18 Models for areal data Colin Rundel 03/22/2017 1 areal / lattice data 2 Example -

Hierarchical Modeling for Large Univariate Areal Data Abhi Datta 1 , Sudipto Banerjee 2 and Andrew

Denmark Areal 34.o94 km2 Inhabitants 5.511.451 Agriculture areal 2.589.000.000 ha. Fruit and

Lionel Riou Fransca Univariate &amp; bivariate Two kind of analysis Univariate

Outline Univariate Trees 1 Decision Trees Classification Regression Pruning Steven J Zeil

Fusing point and areal level space-time data with application to wet deposition Alan Gelfand

Areal Unit Data Regular Grids or Lattices Large Point-referenced Datasets Is there spatial

Language Typology and Areal Linguistics Yiru July 13, 2016 Yiru Language Typology July 13,

Numeral classifiers in areal perspective: Khmer and Thai 'syntactic borrowing' revisited RIKKER

Robust Statistics Part 1: Introduction and univariate data Peter Rousseeuw LARS-IASC School, May

Univariate Graphics STAT 133 Gaston Sanchez Department of Statistics, UCBerkeley

Univariate Time Series Analysis; ARIMA Models Heino Bohn Nielsen 1 of 40 Outline of the Lecture

Fisher scoring for some univariate discrete distributions Thomas Yee University of Auckland 26

On Univariate Extreme Value Statistics and the Estimation of Reinsurance Premiums B. Vandewalle

Statistics 762 Nonlinear Statistical Models for Univariate and Multivariate Response Instructor:

Spatio-temporal Models Again point-referenced vs. areal unit data Continuous time vs. discretized

JUST THE MATHS SLIDES NUMBER 19.7 PROBABILITY 7 (The Poisson distribution) by

Problems in spatial statistics Barry Rowlingson Research Fellow, Lancaster University DataCamp

Statistical aspects of determinantal point processes eric Lavancier , Fr ed Laboratoire de

GDPP Learning Diverse Generations using Determinantal Point Process Mohamed Elfeki , Camille

Statistical Distributions in Simulation R.B. Lenin (rblenin@daiict.ac.in) Autumn 2007 R.B.

Math 186: 4.2 Poisson Distribution: Counting Crossovers in Meiosis 4.2 Exponential and 4.6 Gamma

The Poisson Distribution Bernd Schr oder logo1 Bernd Schr oder Louisiana Tech University,

Invariant transports of random measures and the extra head problem G unter Last (Karlsruhe)

Sambuz

Useful Links

Newsletter

Mail Us

Lionel Riou Fransca Univariate & bivariate Two kind of analysis Univariate