SLIDE 1


Research problem Data Bayesian modelling Summary

Using follow-up data to adjust for selective non-participation in cross-sectional setting

Juho Kopra

University of Jyväskylä, Department of Mathematics and Statistics, NoPaHES project

30th August 2017

Juho Kopra 30th August 2017

SLIDE 2

Research problem

The data come from cross-sectional surveys (FINRISK studies). No re-contact data are available for 1972-2002.

→ Previous solution cannot be used.

Instead, we use follow-up data on smoking-related diseases: lung cancer and chronic obstructive pulmonary disease (COPD; keuhkoahtaumatauti in Finnish).

SLIDE 3

Data we used from the FINRISK studies:

  • People aged 25-59 years (30-59 years for 1972 and 1977).
  • Data from 1972, 1977, 1982, 1987, 1992, 1997, 2002 and 2007.
  • Two areas of Finland: North Karelia and Northern Savonia.
  • In total, the data contain 52,325 persons, including 9,928 persons with a missing smoking indicator.

SLIDE 4

Variables provided by the FINRISK survey samples:

  • Background information for both participants and non-participants: area, age, gender and study year.
  • Self-reported indicator of daily smoking.

SLIDE 5

Combining information from the National Hospitalization Register and the Cause of Death Register, we build a follow-up:

  • Up to the end of 2012.
  • Available for both participants and non-participants.
  • Records the person's age at the time of diagnosis (lung cancer or COPD).
  • Death from other causes and the end of the follow-up are treated as censoring.

→ Persons with no diagnosis are censored.
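The censoring rule above can be sketched for a single person as follows. This is only an illustration: the function name and its arguments are hypothetical and are not taken from the FINRISK data or the registers.

```python
from typing import Optional, Tuple


def follow_up_outcome(
    diagnosis_age: Optional[float],
    death_age: Optional[float],
    age_at_end_of_2012: float,
) -> Tuple[float, bool]:
    """Return (age at end of observation, event indicator).

    The event indicator is True when a lung cancer or COPD diagnosis was
    observed. Otherwise the person is censored at death from another cause
    or at the end of the follow-up, whichever comes first.
    (Hypothetical helper; argument names are assumptions.)
    """
    censoring_age = age_at_end_of_2012
    if death_age is not None:
        censoring_age = min(censoring_age, death_age)
    if diagnosis_age is not None and diagnosis_age <= censoring_age:
        return diagnosis_age, True
    return censoring_age, False
```

A person diagnosed at 61.5 contributes an event at that age; a person with no diagnosis contributes only a censoring age.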

SLIDE 6

Bayesian modelling

  • Bayesian methodology (very) briefly
  • Modelling using survival data and Bayesian modelling

SLIDE 7

Bayesian methodology (very) briefly

The Bayesian approach combines the information provided by the data (via the likelihood function) with subjective information about the model parameters (via the prior distribution). The analyst chooses the prior distributions. The result is the posterior distribution, which represents the combination of the prior and the data.
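As a small illustration of how a prior and data combine into a posterior, here is a conjugate Beta-binomial update. It is a textbook example, not the model used in this talk, and all numbers are invented for illustration.

```python
def beta_binomial_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Invented example: Beta(2, 2) prior (mean 0.5) and 30 smokers among 100 people.
post_a, post_b = beta_binomial_posterior(2.0, 2.0, 30, 70)
posterior_mean = beta_mean(post_a, post_b)  # lies between 0.30 (data) and 0.50 (prior)
```

The posterior mean falls between the observed proportion and the prior mean, and moves towards the data as the sample grows.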

SLIDE 8

We used uninformative priors for most, but not all, of the parameters. The informative priors make the model identifiable and rule out unrealistic posterior prevalences.

SLIDE 9

Modelling 1/2

Use Bayesian modelling to estimate smoking prevalence based on the survival (follow-up) data.

Build a model from three submodels:

  1. Participation M given smoking Y and background information X: P(M | X, Y).
  2. Smoking Y given the background information X: P(Y | X).
  3. Survival model for the lung cancer or COPD diagnosis age T given smoking Y and background information X: P(T | X, Y).

Define an informative prior for submodel 1 to allow identifiability, and estimate the posterior for smoking prevalence.

Fit the model and simultaneously impute the missing smoking indicators: Ỹ ∼ P(Y | M = 0, X, T).
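For fixed parameter values, the imputation distribution follows from the three submodels by Bayes' rule: P(Y = 1 | M = 0, X, T) is proportional to P(M = 0 | X, Y = 1) P(Y = 1 | X) P(T | X, Y = 1). The sketch below assumes that T does not depend on M once X and Y are given, which is how the submodels are written; the function and argument names are hypothetical.

```python
def prob_smoker_given_nonparticipation(
    p_smoker: float,
    p_participate_if_smoker: float,
    p_participate_if_nonsmoker: float,
    lik_t_if_smoker: float,
    lik_t_if_nonsmoker: float,
) -> float:
    """P(Y = 1 | M = 0, X, T) for one non-participant, by Bayes' rule.

    Inputs are the submodel values at fixed parameters:
    P(Y = 1 | X), P(M = 1 | X, Y) and the survival likelihood P(T | X, Y).
    """
    weight_smoker = (1.0 - p_participate_if_smoker) * p_smoker * lik_t_if_smoker
    weight_nonsmoker = (
        (1.0 - p_participate_if_nonsmoker) * (1.0 - p_smoker) * lik_t_if_nonsmoker
    )
    return weight_smoker / (weight_smoker + weight_nonsmoker)
```

If smokers participate less often than non-smokers, a non-participant is more likely to be a smoker than the population prevalence P(Y = 1 | X) alone would suggest.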

SLIDE 10

Modelling 2/2

  • Participation P(M | X, Y) is modelled using a logistic regression model explained by gender, study year, age, region and smoking.
  • Smoking P(Y | X) is modelled using a logistic regression model explained by year of birth. The coefficients vary by gender, region and study year.
  • The survival model for the follow-up data, P(T | X, Y), is a piecewise constant hazard model. Survival is explained by gender and smoking.
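A minimal sketch of the log-likelihood contribution of one person under a piecewise constant hazard model. The cut points, hazard values and the proportional-hazards form of the covariate effect are assumptions made for illustration; the sketch also integrates the hazard from the first cut point and ignores delayed entry at the survey age.

```python
import math
from typing import Sequence


def piecewise_log_likelihood(
    age: float,
    event: bool,
    cut_points: Sequence[float],
    hazards: Sequence[float],
    log_hazard_ratio: float = 0.0,
) -> float:
    """Log-likelihood of one (age, event) observation.

    hazards[k] applies from cut_points[k] up to cut_points[k + 1];
    the last hazard applies from the last cut point onwards.
    log_hazard_ratio is the linear predictor (e.g. gender and smoking effects).
    """
    multiplier = math.exp(log_hazard_ratio)
    cumulative = 0.0
    current = hazards[0]
    for k, start in enumerate(cut_points):
        if age <= start:
            break
        end = cut_points[k + 1] if k + 1 < len(cut_points) else math.inf
        cumulative += hazards[k] * (min(age, end) - start)
        current = hazards[k]  # hazard of the interval containing `age`
    log_lik = -multiplier * cumulative  # log survival up to `age`
    if event:
        log_lik += math.log(multiplier * current)  # log hazard at diagnosis
    return log_lik
```

A censored person contributes only the log survival term, while a diagnosed person also contributes the log hazard at the diagnosis age.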

SLIDE 11

Prior distributions

  • Participation model: an informative prior is required for η, which models how smoking affects participation: η ∼ Logistic(µ = 0, s = 2.05⁻¹).
  • Risk factor model: uninformative priors, N(0, 1000).
  • Survival model: the baseline hazard is a priori monotonically increasing. The other parameters have uninformative priors, N(0, 1000).
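To get a feel for how informative the Logistic(µ = 0, s = 2.05⁻¹) prior is, its central 95 % interval can be computed from the logistic quantile function, Q(p) = µ + s ln(p / (1 − p)). Reading η as a log odds ratio is an assumption made here for illustration, based on the logistic form of the participation model.

```python
import math


def logistic_quantile(p: float, mu: float = 0.0, s: float = 1.0) -> float:
    """Quantile function of the logistic distribution with location mu and scale s."""
    return mu + s * math.log(p / (1.0 - p))


scale = 1.0 / 2.05
lower = logistic_quantile(0.025, s=scale)
upper = logistic_quantile(0.975, s=scale)
# If eta is a log odds ratio (assumption), these are the matching odds-ratio bounds.
odds_ratio_bounds = (math.exp(lower), math.exp(upper))
```

The interval is roughly ±1.79 for η, which under that reading corresponds to odds ratios between about 0.17 and 6.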

SLIDE 12

Model fitting

  • The models were implemented with the Just Another Gibbs Sampler software (JAGS) (Plummer, 2003).
  • The imputations for the smoking indicator Yi are drawn from the full conditional distribution P(Yi | Mi = 0, Xi, Ti).
  • The model fitting took 107 hours to complete (about four and a half days).
  • The high absolute number of missing values (9,928) and the computationally intensive algorithm (MCMC) explain the long running time.

SLIDE 13

Simulation experiment

We randomly generated one data set from the model we use. The model appears to be able to recover the original trends from the simulated data.
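The following toy simulation is much simpler than the experiment described here and is not the simulation reported in the talk. It only generates smoking and participation from logistic-type submodels, with invented parameter values, to show why a participants-only estimate is biased when smoking lowers participation.

```python
import math
import random


def simulate_survey(n, prevalence, intercept, eta, seed=0):
    """Simulate smoking Y and participation M; return (true, participants-only) prevalence.

    Participation follows logit P(M = 1 | Y) = intercept + eta * Y.
    All parameter values passed in are illustrative assumptions.
    """
    rng = random.Random(seed)
    smokers = 0
    participants = 0
    participating_smokers = 0
    for _ in range(n):
        y = rng.random() < prevalence
        p_participate = 1.0 / (1.0 + math.exp(-(intercept + eta * y)))
        m = rng.random() < p_participate
        smokers += y
        participants += m
        participating_smokers += y and m
    return smokers / n, participating_smokers / participants


true_prevalence, naive_prevalence = simulate_survey(100_000, 0.35, 1.0, -1.0, seed=1)
```

With a negative eta, smokers are under-represented among participants, so the participants-only prevalence falls below the true prevalence.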

SLIDE 14

Trend estimates for the simulated data:

[Figure: four panels (North Karelia men, North Karelia women, Northern Savonia men, Northern Savonia women). x-axis: years 1972 to 2002; y-axis: proportion of smokers. Legend: participants only, true trends, Bayesian modelling, 95 % credible interval.]

SLIDE 15

Trend estimates for the FINRISK data:

[Figure: four panels (North Karelia men, North Karelia women, Northern Savonia men, Northern Savonia women). x-axis: years 1972 to 2002; y-axis: proportion of smokers. Legend: participants only, Bayesian modelling, 95 % credible interval.]

SLIDE 16

Summary

  • Follow-up data can be used in Bayesian modelling to estimate the prevalence of smoking even though the survey data suffer from selective non-participation.
  • A long register-based follow-up is required. For the later years, which do not have a lengthy follow-up, modelling assumptions can be made to provide different scenarios. (2007 and 2012 luckily have re-contact data.)
  • Bayesian model fitting requires an informative prior and is computationally very demanding when the absolute number of missing values is large.

SLIDE 17

THANKS

SLIDE 18

References

Kopra, J., Karvanen, J. and Härkänen, T. Bayesian models for data missing not at random in health examination surveys. Accepted for publication in Statistical Modelling. https://arxiv.org/abs/1610.03687

Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing, 124, p. 125. Wien, Austria: Technische Universität Wien.
