

slide-1
SLIDE 1

BaySTDetect: Detecting unusual temporal patterns in small area disease rates using Bayesian posterior model probabilities

Guangquan Li, Sylvia Richardson, Lea Fortunato, Ismaïl Ahmed, Anna Hansell, Mireille Toledano and Nicky Best

Department of Epidemiology and Biostatistics, Imperial College London

ISBA2012 Kyoto, June 29, 2012

1 / 25

slide-2
SLIDE 2

Outline

Motivation
BaySTDetect: Bayesian model choice for detecting unusual temporal patterns in small area data
Simulation study
Application 1: Policy assessment
Application 2: Data mining cancer incidence
Conclusions

2 / 25

slide-4
SLIDE 4

Motivation

◮ For many areas of application, such as small area estimates of income, unemployment, crime rates and rates of chronic diseases, there is typically a general time trend that affects most areas similarly.

◮ However, abrupt changes may occur in a particular area due to, for example,

  ◮ emergence of localized risk factor(s);
  ◮ local policy implementation (e.g., health awareness or screening campaigns);
  ◮ changes to health care provision or social structure of the local population;
  ◮ local variations in diagnostic or coding practice;
  ◮ ...

◮ Detection of areas with unusual temporal patterns is therefore important as a screening tool for further investigation.

4 / 25

slide-7
SLIDE 7

Motivation: Two applications

1. COPD (chronic obstructive pulmonary disease): Policy assessment

◮ Industrial Injuries Disablement Benefit was made available for miners developing COPD from 1992 onwards in the UK.

◮ There was a debate on whether this policy may have differentially increased the likelihood of a COPD diagnosis in mining areas, as miners with other respiratory problems with similar symptoms (e.g., asthma) could potentially have benefited from this scheme.

2. TCR (Thames Cancer Registry): Retrospective surveillance on cancer incidence

◮ The aim is to highlight areas with a potential need for further investigation and/or intervention.

5 / 25

slide-9
SLIDE 9

Problems in small area detection

1. Sparse data (small number of cases)

◮ BaySTDetect employs the Bayesian multilevel modelling framework to allow appropriate information borrowing.

2. Multiple comparisons are made

◮ A Bayesian procedure is used in BaySTDetect to derive decision rules which enable the control of the false discovery rate (FDR).

6 / 25

slide-12
SLIDE 12

BaySTDetect: Model specification

Data level: $y_{it} \sim \text{Poisson}(\mu_{it} \cdot E_{it})$

Modelling underlying risks (diagram labels): $\log(\mu_{it})$; common time trend; area-specific time trends; common spatial pattern.

Model 1: Time trend pattern is the same for all areas.
Model 2: Time trends are estimated independently for each area.

Selection: A model indicator $z_i$ indicates for each area whether Model 1 ($z_i = 1$) or Model 2 ($z_i = 0$) is supported by the data.

$$\mu_{it} = z_i \cdot \mu_{it}^{(M1)} + (1 - z_i) \cdot \mu_{it}^{(M2)}$$

8 / 25

slide-16
SLIDE 16

BaySTDetect: Model specification

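$$y_{it} \sim \text{Poisson}(E_{it} \cdot \mu_{it})$$

$$\log(\mu_{it}) = \begin{cases} \alpha_0 + \eta_i + \gamma_t & \text{Model 1, for all } i, t \\ u_i + \xi_{it} & \text{Model 2, for all } i, t \end{cases}$$

Model 1
  $\eta_i \sim$ spatial BYM model (common spatial pattern)
  $\gamma_t \sim \text{RW}(\sigma^2_{\gamma})$, a random walk (common temporal pattern)

Model 2
  $u_i \sim N(0, 1000)$
  $\xi_{it} \sim \text{RW}(\sigma^2_{\xi,i})$, a random walk (area-specific temporal pattern)

Selection
  $z_i \sim \text{Bern}(0.95)$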
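To make the two-component structure concrete, the following is a minimal forward-simulation sketch in Python, not the authors' R/WinBUGS implementation. Several pieces are assumptions made only for illustration: the random walks are taken to be first order, the BYM spatial term is replaced by independent normal draws, and the intercept, standard deviations and the function names (`random_walk`, `poisson_draw`, `simulate`) are hypothetical. The vague N(0, 1000) prior on $u_i$ is also swapped for a tight normal so that simulated rates stay plausible.

```python
import math
import random


def random_walk(n_steps, sd, rng):
    """First-order Gaussian random walk starting at 0 (the order is an assumption)."""
    path = [0.0]
    for _ in range(n_steps - 1):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate(n_areas, n_periods, expected, seed=1):
    """Draw model indicators z_i and counts y_it for a constant expected count E_it."""
    rng = random.Random(seed)
    alpha0 = 0.0                                  # hypothetical intercept
    gamma = random_walk(n_periods, 0.1, rng)      # common temporal pattern
    z, y = [], []
    for _ in range(n_areas):
        z_i = 1 if rng.random() < 0.95 else 0     # z_i ~ Bern(0.95)
        if z_i == 1:
            eta_i = rng.gauss(0.0, 0.2)           # iid stand-in for the BYM spatial term
            log_mu = [alpha0 + eta_i + g for g in gamma]      # Model 1
        else:
            u_i = rng.gauss(0.0, 0.2)             # illustrative; slides use a vague prior
            xi = random_walk(n_periods, 0.3, rng) # area-specific temporal pattern
            log_mu = [u_i + x for x in xi]                    # Model 2
        z.append(z_i)
        y.append([poisson_draw(expected * math.exp(m), rng) for m in log_mu])
    return z, y
```

Calling `simulate(50, 8, 20.0)` returns one indicator per area and an 8-period count series for each, with roughly 5% of areas expected to follow their own trend.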

9 / 25

slide-20
SLIDE 20

A detection rule based on FDR

◮ Define $f_i = P(z_i = 1 \mid \text{data})$, the posterior probability that area i belongs to the common trend model (Model 1).

◮ A small $f_i$ suggests that area i is unlikely to follow the common trend.

◮ We need to set a suitable cut-off value, C, such that areas with $f_i < C$ are declared to be unusual.

◮ Put another way, if we declare area i to be unusual, then $f_i$ can be thought of as the probability of false detection for that area.

◮ We choose C so that the average probability of false detection (i.e. the average value of $f_i$) amongst areas declared to be unusual is less than some pre-set level α.

◮ This procedure ensures that, on average, the number of false positives is no more than k × α, where k is the number of areas declared unusual.
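The rule above can be sketched in a few lines of Python. This is one straightforward way to realise it, assuming the posterior probabilities $f_i$ are already available as a list: sort the areas by $f_i$ and keep adding them while the running average stays below α. The function name `flag_unusual_areas` is hypothetical, and the slides do not say how the authors compute C in practice.

```python
def flag_unusual_areas(f, alpha):
    """Return indices of areas declared unusual, given f[i] = P(z_i = 1 | data)."""
    order = sorted(range(len(f)), key=lambda i: f[i])  # smallest f_i first
    flagged, running_sum = [], 0.0
    for i in order:
        # The running mean is non-decreasing in sorted order, so the first
        # violation marks the largest admissible set of areas.
        if (running_sum + f[i]) / (len(flagged) + 1) >= alpha:
            break
        running_sum += f[i]
        flagged.append(i)
    return flagged
```

For example, with `f = [0.01, 0.9, 0.03, 0.2, 0.6]` and `alpha = 0.05`, areas 0 and 2 are flagged (average 0.02); adding area 3 would raise the average to 0.08. The implied cut-off C is any value between the largest flagged $f_i$ and the smallest unflagged one.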

10 / 25

slide-22
SLIDE 22

Simulation: Setup

◮ Simulated data were based on the observed COPD mortality data (see Li et al. 2012).

◮ Three departure patterns were considered.

◮ When simulating the data, either the original set of expected counts from the COPD data or a reduced set (the original multiplied by 1/5) was used.

◮ 15 areas (approx. 4%) were chosen to have the unusual trend patterns.

  ◮ These areas were chosen to cover a wide range of expected count values and overall spatial risks.

◮ Results were compared to those from the popular SaTScan space-time scan statistic.

12 / 25

slide-23
SLIDE 23

Simulation: Unusual patterns

Figure: Illustration of the three departure patterns (red) with the common trend (black). Panels: Pattern 1, Pattern 2, Pattern 3.

Two departure magnitudes, q = 1.5 and q = 2, were considered.

13 / 25

slide-24
SLIDE 24

Simulation: Sensitivity

Figure: Sensitivity of detecting the 15 truly unusual areas

14 / 25

slide-26
SLIDE 26

COPD application: Detected areas (FDR=0.05)

16 / 25

slide-28
SLIDE 28

COPD application: Interpretation

◮ Results provide little support for the hypothesis regarding the industrial injuries policy:

  ◮ only 3 out of 40 mining districts were detected (Barnsley, Carmarthenshire and Rotherham);
  ◮ the unusual trend patterns in these areas are not consistent.

◮ Two unusual districts in inner London (Lewisham and Tower Hamlets) were identified, with an increasing trend against a decreasing national trend.

  ◮ These areas are very deprived, with high in-migration and ethnic minority populations, so they might be expected to show different trends from the rest of the country.
  ◮ In fact, Tower Hamlets has been commissioning various local enhanced services to tackle high rates of COPD mortality since 2008.
  ◮ This rising trend could potentially have been recognised earlier, in the 1990s, by using BaySTDetect as a surveillance tool.

17 / 25

slide-30
SLIDE 30

TCR application: Data

◮ The Thames Cancer Registry (TCR) collects data on newly diagnosed cases of cancer in the population of London and South East England.

◮ It is one of the largest cancer registries in Europe, covering a population of over 12 million, and holds nearly 3 million cancer registration records.

◮ We perform a retrospective surveillance of time trends for several cancer types using BaySTDetect. The aims are to:

  ◮ provide a screening tool to detect areas with unusual temporal patterns;
  ◮ automatically flag up areas warranting further investigation.

19 / 25

slide-31
SLIDE 31

TCR application: results

Figure: Melanoma, FDR = 0.05. Relative risk (axis from 0.5 to 2.0) by time period (81−84, 85−88, 89−92, 93−96, 97−00, 01−04, 05−08), showing the overall trend and five groups of detected areas (4, 4, 3, 9 and 6 areas).

20 / 25

slide-32
SLIDE 32

Post-processing the detected trends

Figure: Breast cancer, FDR = 0.2. Cluster dendrogram of the 54 detected areas (hclust, "complete"), followed by five panels of relative risk by period (1 to 7) for 1 to 5 clusters. Cluster sizes: 54 areas; 42 and 12; 30, 12 and 12; 20, 12, 10 and 12; 20, 12, 10, 5 and 7.

Black line = common trend. Coloured lines = average local trend in each cluster.
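The dendrogram on this slide is labelled hclust (*, "complete"), which is complete-linkage hierarchical clustering in R. The following Python sketch shows that linkage rule applied to trend vectors. It assumes Euclidean distance between the estimated trends, which the slides do not state, and the function name `complete_linkage` is hypothetical.

```python
def complete_linkage(trends, n_clusters):
    """Agglomerative clustering of trend vectors; returns lists of row indices."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    def dist(a, b):
        # Euclidean distance between two trend vectors (an assumption).
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[i] for i in range(len(trends))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist(trends[i], trends[j])
                        for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]   # merge the closest pair
        del clusters[q]
    return clusters
```

Cutting the tree at 1 to 5 clusters, as in the panels, corresponds to calling the function with `n_clusters` from 1 to 5 and averaging the trends within each returned group.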

21 / 25

slide-34
SLIDE 34

Conclusions

◮ We have proposed a Bayesian space-time method for retrospective detection of unusual time trends.

◮ A simulation study has shown good performance of the model in detecting various realistic departures with relatively modest sample sizes.

◮ We have demonstrated the use of BaySTDetect in policy assessment and in data mining.

◮ Implemented in R and WinBUGS, BaySTDetect enables real-time analysis of routinely collected data.

◮ Papers and WinBUGS code for this model are available at www.bias-project.org.uk.

23 / 25

slide-35
SLIDE 35

Acknowledgement

◮ This project is funded by the ESRC National Centre for Research Methods through the BIAS II project.

◮ Thanks to the Thames Cancer Registry and the Small Area Health Statistics Unit (SAHSU) for providing the cancer incidence data.

Thank you!!

24 / 25

slide-36
SLIDE 36

References

1. Li G, Best N, Hansell A, Ahmed I and Richardson S. BaySTDetect: Detecting unusual temporal patterns in small area data via Bayesian model choice. Biostatistics, 2012.

2. Li G, Richardson S, Fortunato L, Ahmed I, Hansell A, Toledano M and Best N. Data mining cancer registries: retrospective surveillance of small area time trends in cancer incidence using BaySTDetect. ICDM 2011 Workshops Proceedings.

25 / 25