Statistical Modelling Approaches to Disease Mapping Peter J Diggle - - PowerPoint PPT Presentation

SLIDE 1

Statistical Modelling Approaches to Disease Mapping

Peter J Diggle Lancaster University and University of Liverpool

combining health information, computation and statistics

CHICAS

SLIDE 2

Spatial statistics according to Cressie (1991)

[Figure: one example map for each of the three classes below]

  • Lattice data
  • Geostatistics
  • Point patterns

Cressie, N.A.C. (1991). Statistics for Spatial Data. Wiley.

SLIDE 3

Lattice data: Scottish lip cancer incidence

[Figure: map of lip cancer incidence by Scottish county; axes Eastings (km), Northings (km)]

  • Data: county-level incidences $Y_i$, $i = 1, \ldots, n$
  • Model: Markov random field, specified through the full conditionals $[Y_i \mid \{Y_j : j \neq i\}]$, $i = 1, \ldots, n$
    • risks in near-neighbouring counties are positively correlated
    • incidences $Y_i$ are noisy versions of risk × population
  • Scientific interest confined to a specified set of counties?

SLIDE 4

Geostatistics: Loa loa prevalence in Cameroon

[Figure: map of Loa loa prevalence survey locations in Cameroon; axes X Coord, Y Coord]

  • Data: empirical prevalences $Y_i$ at sample locations $x_i$, $i = 1, \ldots, n$
  • Model: spatially continuous stochastic process $S(x)$, $x \in \mathbb{R}^2$
    • correlation between $S(u)$ and $S(v)$ specified as a function of the distance between $u$ and $v$
    • $Y_i \mid S(x_i) \sim \text{Binomial}$
  • Scientific interest extends to $S(x)$ at non-sampled locations

SLIDE 5

Point pattern: gastro-enteric illness in Hampshire

[Figure: map of case locations in Hampshire]

  • Data: outcomes $(x_i, t_i)$ are the locations and dates of calls to NHS Direct recorded as "vomiting and/or diarrhoea"
  • Model: $\{(x_i, t_i) : i = 1, 2, \ldots\}$ is a stochastic point process with intensity $\lambda(x, t)$
    • successive cases independent?
  • Scientific interest is in the locations themselves

SLIDE 6

Disease mapping

Context
  • region of interest $A$
  • disease risk $\rho(x)$, $x \in A$
  • data relating to variation in disease prevalence over $A$

Objective
  • estimate $\rho(x)$?
  • calculate $P\{\rho(x) > c \mid \text{data}\}$?

"The answer to any prediction problem is a probability distribution." Peter McCullagh, FRS
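Once draws from the predictive distribution of $\rho(x)$ at a location are available, $P\{\rho(x) > c \mid \text{data}\}$ can be approximated by the proportion of draws that exceed $c$. A minimal Python sketch of that step; the draws below come from an assumed lognormal distribution chosen purely for illustration, not from any fitted disease-mapping model.

```python
import math
import random


def exceedance_probability(samples, c):
    """Monte Carlo estimate of P(rho > c | data): the share of
    predictive draws that exceed the threshold c."""
    if not samples:
        raise ValueError("need at least one predictive draw")
    return sum(1 for s in samples if s > c) / len(samples)


# Hypothetical predictive draws for rho(x) at a single location.
# log rho ~ N(0, 0.5^2) is an illustrative assumption, not a fitted model.
rng = random.Random(1)
draws = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(20000)]

for c in (1.0, 1.5, 2.0):
    print(f"P(rho > {c}) ~ {exceedance_probability(draws, c):.3f}")
```

Repeating this at every location on a grid, for a fixed $c$, gives a probabilistic exceedance map.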

SLIDE 7

Markov Random Field (MRF) models

Random variables $S = (S_1, \ldots, S_n)$

Joint distribution $[S]$ fully specified by the full conditionals $[S_i \mid \{S_j : j \neq i\}]$, $i = 1, \ldots, n$

Neighbourhood of $i$ is $N(i) \subset \{1, 2, \ldots, n\}$:

$$[S_i \mid \{S_j : j \neq i\}] = [S_i \mid \{S_j : j \in N(i)\}], \quad i = 1, \ldots, n$$
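For a Gaussian MRF the Markov property can be read off the precision matrix $Q$: the full conditional of $S_i$ has mean $-\sum_{j \neq i} Q_{ij} S_j / Q_{ii}$ and variance $1/Q_{ii}$, and $Q_{ij} = 0$ whenever $j \notin N(i)$. A minimal sketch, assuming a proper CAR precision $Q = D - \alpha W$ (with $\tau = 1$) on a hypothetical five-area map; the function names are illustrative only.

```python
def car_precision(adjacency, alpha):
    """Precision matrix Q = D - alpha * W of a proper CAR model (tau = 1).
    W is the 0/1 adjacency matrix and D = diag(number of neighbours);
    Q is positive definite for |alpha| < 1 when every area has a neighbour."""
    n = len(adjacency)
    Q = [[0.0] * n for _ in range(n)]
    for i, nbrs in adjacency.items():
        Q[i][i] = float(len(nbrs))
        for j in nbrs:
            Q[i][j] = -alpha
    return Q


def full_conditional(Q, s, i):
    """Mean and variance of S_i given all other S_j, for a zero-mean Gaussian
    MRF with precision Q. Only the j with Q_ij != 0, i.e. the neighbours
    N(i), contribute to the mean."""
    mean = -sum(Q[i][j] * s[j] for j in range(len(s)) if j != i) / Q[i][i]
    return mean, 1.0 / Q[i][i]


# Hypothetical map: five areas in a chain, 0 - 1 - 2 - 3 - 4.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
Q = car_precision(adjacency, alpha=0.9)
s = [0.2, -0.1, 0.5, 0.3, -0.4]

print(full_conditional(Q, s, 2))   # depends on s[1] and s[3] only
s_changed = list(s)
s_changed[0] = 5.0                 # area 0 is not a neighbour of area 2
print(full_conditional(Q, s_changed, 2))
```

Changing the value in a non-neighbouring area leaves the full conditional of area 2 untouched, which is exactly the neighbourhood statement on the slide.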

SLIDE 8

Hierarchical Poisson/Gaussian MRF

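Latent Gaussian MRF $S = (S_1, \ldots, S_n)$:

$$S_i \mid \{S_j : j \neq i\} \sim N(\bar{S}_i, \tau^2/m_i)$$

where $\bar{S}_i$ is the mean of the $S_j$ over the $m_i$ neighbours of $i$

Conditionally independent counts:

$$Y_i \mid S \sim \text{Poisson}(\mu_i), \quad \log \mu_i = z_i'\beta + \gamma S_i$$

Risk map: $E[S_i \mid Y]$

Besag, York and Mollié (1991)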
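The intrinsic full conditional above needs only each area's neighbour list: its mean is the average of the neighbouring $S_j$ and its variance shrinks as $\tau^2/m_i$ when an area has more neighbours. A minimal sketch on a hypothetical five-area map; the helper names are illustrative, and the draw uses the prior full conditional only (a fit to the counts $Y$ would also bring in the Poisson likelihood).

```python
import math
import random


def icar_full_conditional(adjacency, s, i, tau2):
    """Full conditional of S_i in the intrinsic Gaussian MRF:
    S_i | {S_j : j != i} ~ N(S_bar_i, tau2 / m_i), where m_i is the number
    of neighbours of area i and S_bar_i is the mean of their S_j."""
    nbrs = adjacency[i]
    m_i = len(nbrs)
    if m_i == 0:
        raise ValueError(f"area {i} has no neighbours")
    s_bar = sum(s[j] for j in nbrs) / m_i
    return s_bar, tau2 / m_i


def draw_full_conditional(adjacency, s, i, tau2, rng):
    """One draw of S_i from its full conditional under the prior alone;
    a posterior update would also involve the Poisson likelihood for Y_i."""
    mean, var = icar_full_conditional(adjacency, s, i, tau2)
    return rng.gauss(mean, math.sqrt(var))


# Hypothetical five-area map and current values of S.
adjacency = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
s = [0.1, 0.4, -0.2, 0.3, 0.0]

print(icar_full_conditional(adjacency, s, 2, tau2=0.5))
print(draw_full_conditional(adjacency, s, 2, 0.5, random.Random(0)))
```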

SLIDE 9

Cancer atlases

Raw and spatially smoothed relative risk estimates for lip cancer in 56 Scottish counties

[Figure: two county maps, raw and smoothed estimates; axes Eastings (km), Northings (km)]

Wakefield (2007)

SLIDE 10

Limitations of MRF models for spatial data

  • MRFs are just multivariate probability distributions, parameterised in a way that has a spatial interpretation
    • but specific to a fixed set of locations $x_1, \ldots, x_n$
  • Neighbourhood specification can be problematic
    • natural hierarchy of models on regular lattices
    • not so for irregular lattices
    • and arguably unnatural for spatially aggregated data, $Y_i = \int_{A_i} Y(x)\,dx$

SLIDE 11

Geostatistical models

  • Stochastic process $S(x)$, $x \in A \subset \mathbb{R}^2$
  • Data $\{(Y_i, x_i) : i = 1, \ldots, n\}$
  • Stationary Gaussian model
    • $E[S(x)] = 0$
    • $\text{Cov}\{S(x), S(x - u)\} = \sigma^2 \rho(u)$
  • $[Y \mid S] = [Y_1 \mid S(x_1)] \cdots [Y_n \mid S(x_n)]$
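A realisation of the stationary Gaussian process at a finite set of locations is a draw from a multivariate normal with covariance matrix $\sigma^2 \rho(\|x_i - x_j\|)$. A minimal sketch using a Cholesky factor, assuming the exponential correlation $\rho(u) = \exp(-|u|/\phi)$ (the form used for the Loa loa model); locations and parameter values are hypothetical.

```python
import math
import random


def exp_corr(u, phi):
    """Exponential correlation rho(u) = exp(-|u| / phi)."""
    return math.exp(-abs(u) / phi)


def covariance_matrix(points, sigma2, phi):
    """Cov{S(x_i), S(x_j)} = sigma2 * rho(||x_i - x_j||): stationary, isotropic."""
    return [[sigma2 * exp_corr(math.dist(p, q), phi) for q in points] for p in points]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_s(points, sigma2, phi, rng):
    """One realisation of zero-mean S(x) at the points: S = L z, z ~ N(0, I)."""
    L = cholesky(covariance_matrix(points, sigma2, phi))
    z = [rng.gauss(0.0, 1.0) for _ in points]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(points))]


# Hypothetical sample locations (unit square corners), sigma2 = 2, phi = 0.5.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(simulate_s(points, sigma2=2.0, phi=0.5, rng=random.Random(42)))
```

The dense Cholesky step costs $O(n^3)$, so this direct approach only suits modest numbers of locations.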

SLIDE 12

A geostatistical data-set: Loa loa prevalence surveys

[Figure: map of survey locations; axes X Coord, Y Coord]

SLIDE 13

Loa loa: generalised linear model

Latent spatially correlated process
  • $S(x) \sim \text{SGP}\{0, \sigma^2, \rho(u)\}$, with $\rho(u) = \exp(-|u|/\phi)$

Linear predictor (regression model)
  • $d(x)$ = environmental variables at location $x$
  • $\eta(x) = d(x)'\beta + S(x)$
  • $\log[p(x)/\{1 - p(x)\}] = \eta(x)$

Conditional distribution for positive proportion $Y_i/n_i$
  • $Y_i \mid S(\cdot) \sim \text{Bin}\{n_i, p(x_i)\}$ (binomial sampling)
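Given $S(x_i)$, each village contributes a binomial likelihood term through the logit link. A minimal sketch of that chain, linear predictor then inverse logit then binomial log-likelihood; the covariate values, coefficients and latent value are hypothetical.

```python
import math


def linear_predictor(d, beta, s):
    """eta(x) = d(x)' beta + S(x)."""
    return sum(dk * bk for dk, bk in zip(d, beta)) + s


def inv_logit(eta):
    """Inverse of eta = log{p / (1 - p)}: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_loglik(y, n, p):
    """log Bin(y; n, p), including the binomial coefficient."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log1p(-p))


# Hypothetical village: intercept plus one environmental covariate,
# illustrative coefficients and latent value S(x_i).
eta = linear_predictor(d=[1.0, 0.3], beta=[-1.0, 2.0], s=0.4)
p = inv_logit(eta)
print(eta, p, binomial_loglik(y=3, n=10, p=p))
```

Summing these terms over villages gives $\log [Y \mid S]$; inference for $\beta$ and the covariance parameters still requires integrating over $S$.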

SLIDE 14

Probabilistic exceedance map for Cameroon (Diggle et al., 2007)

SLIDE 15

Point process models (log-Gaussian Cox processes)

  • Stochastic process $S(x)$, $x \in A \subset \mathbb{R}^2$
  • Data $X = \{x_i : i = 1, \ldots, n\}$
  • Stationary Gaussian model
    • $E[S(x)] = 0$
    • $\text{Cov}\{S(x), S(x - u)\} = \sigma^2 \rho(u)$
  • $[X \mid S]$ = Poisson process with intensity $\Lambda(x) = \exp\{S(x)\}$
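Conditional on $S$, simulating the point pattern is just simulating a Poisson process with intensity $\exp\{S(x)\}$. A minimal sketch, assuming $S$ is held constant within each square grid cell so that each cell count is Poisson with mean $\exp(S)$ × cell area; the grid values are hypothetical stand-ins for one realisation of $S$, and the Poisson sampler is Knuth's method, adequate only for small means.

```python
import math
import random


def poisson_draw(mean, rng):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_lgcp_given_s(s_grid, cell_size, rng):
    """Given S, points form a Poisson process with intensity exp{S(x)}.
    Treating S as constant within each square cell, the cell count is
    Poisson(exp(S) * area) and the points are uniform within the cell."""
    area = cell_size ** 2
    points = []
    for r, row in enumerate(s_grid):
        for c, s in enumerate(row):
            for _ in range(poisson_draw(math.exp(s) * area, rng)):
                points.append(((c + rng.random()) * cell_size,
                               (r + rng.random()) * cell_size))
    return points


def expected_count(s_grid, cell_size):
    """Integral of Lambda(x) = exp{S(x)} over the grid, cell by cell."""
    area = cell_size ** 2
    return sum(math.exp(s) * area for row in s_grid for s in row)


# Hypothetical S values on a 3 x 3 grid of 2 x 2 cells (in practice these
# would be one realisation of the Gaussian process S).
s_grid = [[-1.0, 0.0, 0.5], [0.2, 1.0, -0.3], [0.0, -0.5, 0.8]]
pts = simulate_lgcp_given_s(s_grid, cell_size=2.0, rng=random.Random(7))
print(len(pts), expected_count(s_grid, 2.0))
```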

SLIDE 16

Real-time spatial surveillance: spatio-temporal point process

AEGISS: Ascertainment and Enhancement of Gastroenteric Infection Surveillance Statistics

  • largely sporadic incidence pattern
  • concentration in population centres
  • occasional "clusters" of cases

Can spatial statistical modelling enable earlier detection of “clusters”?

SLIDE 17

AEGISS: log-Gaussian Cox process model

Intensity = expected × unexpected:

$$\Lambda(x, t) = \lambda_0(x) \times \mu_0(t) \times R(x, t)$$

Objective: use incident data up to time $t$ to construct the predictive distribution for the current "anomaly" surface, $R(x, t)$

Model:
  • spatio-temporal point process $P$
  • $\log R(x, t) \sim$ latent Gaussian process
  • $P \mid R \sim$ Poisson process
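The multiplicative decomposition keeps the "expected" part, $\lambda_0(x)\mu_0(t)$, separate from the anomaly $R(x, t)$. A minimal sketch, assuming (our choice, not stated on the slide) that $\log R \sim N(-\sigma^2/2, \sigma^2)$ so that $E[R] = 1$ and $R = 1$ reads as "exactly as expected"; the baseline values are hypothetical.

```python
import math
import random


def intensity(lambda0_x, mu0_t, r_xt):
    """Lambda(x, t) = lambda0(x) * mu0(t) * R(x, t): expected x unexpected."""
    return lambda0_x * mu0_t * r_xt


def draw_anomaly(sigma2, rng):
    """One draw of R(x, t) with log R ~ N(-sigma2 / 2, sigma2), so E[R] = 1.
    This mean-one normalisation is an assumption made for illustration."""
    return math.exp(rng.gauss(-0.5 * sigma2, math.sqrt(sigma2)))


# Hypothetical values: baseline spatial intensity lambda0(x) = 0.8 and
# temporal factor mu0(t) = 1.25 at one location and day.
rng = random.Random(11)
r_draws = [draw_anomaly(0.5, rng) for _ in range(50000)]
print(sum(r_draws) / len(r_draws))            # close to 1 by construction
print(intensity(0.8, 1.25, r_draws[0]))
```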

SLIDE 18

Spatial prediction: 6 March 2003

c = 2

SLIDE 19

Spatial prediction: 6 March 2003

c = 4

SLIDE 20

Spatial prediction: 6 March 2003

c = 8

SLIDE 21

Synthesis

  • $S$ = state of nature
  • $Y$ = all relevant data
  • $T = F(S)$ = target for prediction
  • Model: $[S, Y] = [S][Y \mid S]$
  • Prediction: $[S, Y] \Rightarrow [S \mid Y] \Rightarrow [T \mid Y]$

Diggle, P.J., Moraga, P., Rowlingson, B. and Taylor, B. (2013). Spatial and spatio-temporal log-Gaussian Cox processes: extending the geostatistical paradigm. Statistical Science (to appear).
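The chain $[S, Y] \Rightarrow [S \mid Y] \Rightarrow [T \mid Y]$ can be traced end to end in the simplest conjugate case. A minimal sketch, assuming a scalar $S \sim N(0, \sigma^2)$, independent measurements $Y_k \mid S \sim N(S, \tau^2)$ and a nonlinear target $T = \exp(S)$; $[T \mid Y]$ is obtained by pushing draws from $[S \mid Y]$ through $F$. All values are hypothetical.

```python
import math
import random


def posterior_s(y, sigma2, tau2):
    """[S | Y] for S ~ N(0, sigma2) and Y_k | S ~ N(S, tau2) independently:
    Gaussian with precision 1/sigma2 + m/tau2."""
    prec = 1.0 / sigma2 + len(y) / tau2
    mean = (sum(y) / tau2) / prec
    return mean, 1.0 / prec


def predictive_target(y, sigma2, tau2, f, n_draws, rng):
    """[T | Y] for T = F(S), by pushing draws from [S | Y] through F."""
    mean, var = posterior_s(y, sigma2, tau2)
    sd = math.sqrt(var)
    return [f(rng.gauss(mean, sd)) for _ in range(n_draws)]


# Hypothetical data: two noisy measurements of S; target T = exp(S).
y = [1.0, 1.0]
t_draws = predictive_target(y, 1.0, 1.0, math.exp, 50000, random.Random(5))
print(posterior_s(y, 1.0, 1.0))                       # mean 2/3, variance 1/3
print(sum(t_draws) / len(t_draws))                    # Monte Carlo E[T | Y]
print(sum(t > 2.0 for t in t_draws) / len(t_draws))   # P(T > 2 | Y)
```

Because $F$ is nonlinear, $E[T \mid Y] \neq F(E[S \mid Y])$, which is why the whole distribution $[T \mid Y]$, rather than a plug-in estimate, is the natural answer.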

SLIDE 22

Pau da Lima, Salvador, Brazil

SLIDE 23

Pau da Lima, Salvador, Brazil

SLIDE 24

Leptospirosis cohort study: Pau da Lima

  • subjects $i$ at locations $x_i$
  • blood samples taken at times $t_{ij} \approx 0, 6, 12, 18, 24$ months
  • seroconversion defined as a change from zero to positive, or at least a four-fold increase in concentration
  • data consist of:
    • $Y_{ij} = 0/1$, $j = 1, 2, 3, 4$ (seroconversion no/yes)
    • $r_i(t)$: known and hypothesised risk factors

SLIDE 25

Leptospirosis cohort study: analysing the data

Longitudinal data, binary outcome ⇒ standard problem?

id | follow-up 1 2 3 4 | age
1 | 1 | 57
2 | | 34
3 | 1 X | 38
4 | 1 1 1 | 28
... | ... | ...
950 | 1 1 | 40

Logistic regression for binary response:

$$\log\{p_{it}/(1 - p_{it})\} = \alpha + \beta \times \text{age}$$

Need to account for correlation amongst repeated outcomes on the same individual:
  • generalized estimating equations
  • generalized linear mixed models
  • ...

SLIDE 26

Leptospirosis cohort study: analysing the problem

[Figure: time axis for one subject, with follow-up times t1 to t4 marked and binary outcomes for the intervals between them]

  • infection events on each individual form a point process with time-varying intensity $\Lambda_i(t)$
  • follow-up times partially censor the point process record
  • reduction to binary data represents additional censoring

SLIDE 27

Leptospirosis cohort study: model formulation

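Data: $Y_{ij} = 0/1$, $j = 1, 2, 3, 4$, $i = 1, 2, \ldots, n$
  • $Y_{ij} = 1 \Leftrightarrow$ at least one infection event in $(t_{i,j-1}, t_{ij}]$
  • model infection events as person-specific, inhomogeneous Cox processes,

$$\Lambda_i(t) = \exp\{r_i(t)'\beta + U_i + S(x_i)\}$$

so that

$$P\{Y_{ij} = 1 \mid \Lambda_i(\cdot)\} = 1 - \exp\left\{-\int_{t_{i,j-1}}^{t_{ij}} \Lambda_i(u)\,du\right\}$$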
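For a given intensity function, the seroconversion probability for each follow-up interval needs only the integrated intensity over that interval. A minimal sketch; the midpoint-rule quadrature is our choice, and the covariate function, coefficients, $U_i$ and $S(x_i)$ values are hypothetical.

```python
import math


def integrated_intensity(lam, t0, t1, n_steps=1000):
    """Midpoint-rule approximation to the integral of lam(u) over [t0, t1]."""
    h = (t1 - t0) / n_steps
    return h * sum(lam(t0 + (k + 0.5) * h) for k in range(n_steps))


def prob_seroconversion(lam, t0, t1):
    """P(at least one infection event in (t0, t1] | Lambda_i) = 1 - exp(-integral)."""
    return 1.0 - math.exp(-integrated_intensity(lam, t0, t1))


def make_intensity(r, beta, u_i, s_xi):
    """Lambda_i(t) = exp{r_i(t)' beta + U_i + S(x_i)}; r maps t to a covariate vector."""
    return lambda t: math.exp(sum(rk * bk for rk, bk in zip(r(t), beta)) + u_i + s_xi)


# Hypothetical subject: intercept plus a covariate that grows with time (months),
# illustrative beta, person effect U_i and spatial effect S(x_i).
lam_i = make_intensity(lambda t: [1.0, t / 12.0], beta=[-3.0, 0.5], u_i=0.1, s_xi=-0.2)
follow_up = [0.0, 6.0, 12.0, 18.0, 24.0]   # t_i0, ..., t_i4 in months
for j in range(1, len(follow_up)):
    print(j, prob_seroconversion(lam_i, follow_up[j - 1], follow_up[j]))
```

Note that the binary outcome discards how many events occurred in an interval, which is the extra censoring referred to above.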

SLIDE 28

Inference: likelihood rules OK?

The likelihood principle
  • Two data-sets $x$ and $y$ that generate identical likelihood functions are equivalent as evidence.

The law of likelihood
  • If $H_A \Rightarrow p_A(x)$ and $H_B \Rightarrow p_B(x)$, then data $x$ constitute evidence in favour of $A$ over $B$ iff $p_A(x) > p_B(x)$, and the likelihood ratio $p_A(x)/p_B(x)$ measures the strength of the evidence.
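A small worked illustration (our own example, not from the talk): 7 successes and 3 failures, with $H_A: p = 0.7$ against $H_B: p = 0.5$. The data arise either from 10 trials fixed in advance (binomial) or from sampling until the third failure (negative binomial). The two likelihoods differ only by a constant factor in $p$, so the likelihood ratio, and hence the evidence under the law of likelihood, is the same.

```python
import math


def binomial_lik(p, successes, n):
    """Likelihood of p when the number of trials n was fixed in advance."""
    return math.comb(n, successes) * p ** successes * (1 - p) ** (n - successes)


def neg_binomial_lik(p, successes, failures):
    """Likelihood of p when sampling stopped at the given number of failures
    (so the final trial was a failure)."""
    return (math.comb(successes + failures - 1, successes)
            * p ** successes * (1 - p) ** failures)


# Hypothetical data x: 7 successes and 3 failures in 10 trials.
# H_A: p = 0.7 versus H_B: p = 0.5.
lr_binomial = binomial_lik(0.7, 7, 10) / binomial_lik(0.5, 7, 10)
lr_negbin = neg_binomial_lik(0.7, 7, 3) / neg_binomial_lik(0.5, 7, 3)
print(lr_binomial, lr_negbin)   # equal up to rounding: the constants cancel
```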

SLIDE 29

Inference: what’s the question?

  • Bayesian: What should I believe?
  • Decision-theoretic: What should I do?
  • Classical: What do the data tell me?

Royall, R. (1997). Statistical Evidence: a Likelihood Paradigm. London: Chapman and Hall.

SLIDE 30

Acknowledgements

  • CHICAS, Lancaster University: Paula Moraga, Barry Rowlingson, Ben Taylor
  • APOC: Madeleine Thomson, Hans Remme, Honorat Zoure, ...
  • Yale University/Fiocruz, Brazil: Federico Costa, Jose Hagan, Albert Ko
  • MRC: Methodology Research Grant G0902153