
Public Health and Statistics In India IISA-Harvard-SAMSI May 2016 Supported by NIH R01 ES009411 Donna Spiegelman, Sc.D. Professor of Epidemiologic Methods Departments of Epidemiology, Biostatistics, Nutrition and Global Health


SLIDE 1

Donna Spiegelman, Sc.D.

Professor of Epidemiologic Methods

Departments of Epidemiology, Biostatistics, Nutrition and Global Health

stdls@hsph.harvard.edu

www.hsph.harvard.edu/donna-spiegelman/

Public Health and Statistics In India IISA-Harvard-SAMSI May 2016 Supported by NIH R01 ES009411


SLIDE 3

Introduction

Over the past 10 years, our group has developed methods that adjust for exposure measurement error in point and interval estimates of relative risk and other measures of association:

  • Regression calibration for main study/external validation study designs
  • Regression calibration for multiple surrogates for the same exposure
  • Regression calibration with heteroscedastic error
  • Regression calibration for main study/internal validation study designs
  • Regression calibration for survival data analysis with baseline exposures, time-varying point exposures, and exposure metrics that are functions of the exposure history

Methods have been motivated by studies in environmental and occupational epidemiology conducted at the Harvard School of Public Health.

SLIDE 4

Time permits a brief overview of a few of these:

  • Regression calibration for main study/external validation study designs
  • Regression calibration for main study/internal validation study designs
  • Regression calibration for multiple surrogates for the same exposure
  • Regression calibration with heteroscedastic error

In the future, I can discuss:

  • Regression calibration for survival data analysis with baseline exposures, time-varying point exposures, and exposure metrics that are functions of the exposure history

SLIDE 5

Notation

  • $n_1$: number of participants in the main study
  • $n_2$: number of participants in the validation study
  • $D$: binary health outcome
  • $X$: "true" exposure
  • $Z$: surrogate exposure
  • $U$: $t$ perfectly measured covariates (e.g., age, race, smoking status), measured on all participants in the main and validation studies

Data available under each design:

  • Main study: $(D_i, Z_i, U_i)$, $i = 1, \ldots, n_1$
  • External validation study: $(X_i, Z_i, U_i)$, $i = n_1 + 1, \ldots, n_1 + n_2$
  • Internal validation study: $(D_i, X_i, Z_i, U_i)$, $i = n_1 + 1, \ldots, n_1 + n_2$

SLIDE 6

Assumptions

  • True exposure ($X$) and the $t$-vector of covariates ($U$) are related to the probability of the binary outcome ($D$) by the logistic function:

    $\operatorname{logit} \Pr(D = 1) = \beta_0 + X\beta_1 + U'\beta_2$, where $\beta_2' = (\beta_{21}, \beta_{22}, \ldots, \beta_{2t})$.

  • A linear regression model is appropriate to relate the surrogates ($Z$) and the $t$ covariates ($U$) to the true exposure:

    $X = \gamma_0 + Z\gamma_1 + U'\gamma_2 + \epsilon$, where $E(\epsilon) = 0$ and $\operatorname{Var}(\epsilon) = \sigma^2_{X|Z,U}$.

  • $Z$ is a surrogate if $\Pr(D \mid X, U, Z) = \Pr(D \mid X, U)$, that is, knowledge of the surrogates provides no additional information if the true exposure is known.

  • $\epsilon \sim N(0, \sigma^2_{X|Z,U})$ and $\Pr(D)$ is small, or $\beta_1^2 \sigma^2_{X|Z,U}$ is small.
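It may help to see what these assumptions buy in one line. The following is a sketch of the standard approximation behind regression calibration, written with $D$ for the outcome, $X$ for the true exposure, $Z$ for the surrogate, and $U$ for the covariates; it is not taken from the slides themselves. Replacing $X$ by its conditional mean given $(Z, U)$ in the logistic model gives:

```latex
\begin{align*}
\operatorname{logit}\Pr(D = 1 \mid Z, U)
  &\approx \beta_0 + \beta_1\, E(X \mid Z, U) + U'\beta_2 \\
  &= (\beta_0 + \beta_1\gamma_0) + (\beta_1\gamma_1)\, Z + U'(\beta_2 + \beta_1\gamma_2)
\end{align*}
```

Under this approximation, a logistic regression of $D$ on $Z$ and $U$ estimates roughly $\beta_1\gamma_1$ rather than $\beta_1$, which is why dividing by the validation-study slope $\gamma_1$ recovers the exposure effect.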

SLIDE 7

Rosner et al. regression calibration method for MS/EVS

The Rosner et al. version of regression calibration for the main study/external validation study (MS/EVS) design (Rosner, Willett, Spiegelman, 1989; Rosner, Spiegelman, Willett, 1990; Rosner, Spiegelman, Willett, 1992):

3-step algorithm:

1. In the main study, regress $Y$ on $Z$ and $U$ to obtain $\hat\beta_0^*, \hat\beta_1^*, \hat\beta_2^*$, where now $Z$ is an $s \times 1$ vector of mis-measured continuous covariates and $U$ is a $t \times 1$ vector of perfectly measured covariates.

SLIDE 8

Rosner et al. regression calibration method for MS/EVS

2. In the validation study, regress $X$ on $Z$ and $U$ to obtain $\hat\gamma_0, \hat\Gamma_1, \hat\Gamma_2$, where $\hat\gamma_0$ is an $s \times 1$ vector of regression intercepts, $\hat\Gamma_1$ is an $s \times s$ matrix of slopes for the regression of $X$ on $Z$, adjusted for $U$, and $\hat\Gamma_2$ is an $s \times t$ matrix of slopes for the regression of $X$ on $U$, adjusted for $Z$.

SLIDE 9

Rosner et al. regression calibration method for MS/EVS

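3. Correct the estimates of effect for measurement error by

    $\hat\beta_1 = \hat\beta_1^* / \hat\gamma_1, \quad \hat\beta_0 = \hat\beta_0^* - \hat\beta_1 \hat\gamma_0, \quad \hat\beta_2 = \hat\beta_2^* - \hat\beta_1 \hat\gamma_2$

or

    $\begin{pmatrix} \hat\Gamma_1^T & 0 \\ \hat\Gamma_2^T & I \end{pmatrix}^{-1} \begin{pmatrix} \hat\beta_1^{*T} \\ \hat\beta_2^{*T} \end{pmatrix} = \begin{pmatrix} \hat\beta_1^T \\ \hat\beta_2^T \end{pmatrix}$

where $0$ is an $s \times t$ matrix of 0's and $I$ is the $t \times t$ identity matrix,

    $I_{t \times t} = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}$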
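For a single mis-measured exposure, the correction step is three lines of arithmetic. The sketch below is illustrative only: the function name `rsw_correct` and every numeric input are hypothetical values chosen for the example, not results from the talk. It takes the uncorrected main-study coefficients (starred) and the validation-study coefficients (gamma) and returns the corrected coefficients.

```python
def rsw_correct(beta0_star, beta1_star, beta2_star, gamma0, gamma1, gamma2):
    """Univariate correction: one mis-measured exposure, t covariates.

    beta*_star: uncorrected coefficients from the main-study regression.
    gamma*:     coefficients from the validation-study regression of the
                true exposure on the surrogate and covariates.
    beta2_star and gamma2 are sequences of length t.
    """
    beta1 = beta1_star / gamma1                 # exposure slope
    beta0 = beta0_star - beta1 * gamma0         # intercept
    beta2 = [b - beta1 * g for b, g in zip(beta2_star, gamma2)]  # covariates
    return beta0, beta1, beta2


# Hypothetical inputs, for illustration only.
b0, b1, b2 = rsw_correct(
    beta0_star=-2.0, beta1_star=0.4, beta2_star=[0.3, 0.1],
    gamma0=1.0, gamma1=0.5, gamma2=[0.1, -0.2],
)
print(b0, b1, b2)
```

With a validation slope of 0.5, the uncorrected exposure coefficient doubles; the covariate coefficients shift by the corrected exposure slope times the corresponding validation-study slope.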

SLIDE 10

Rosner et al. regression calibration method for MS/EVS

4. Use the multivariate delta method to derive the variance, e.g.,

    $\operatorname{Var}(\hat\beta_1) = \dfrac{\operatorname{Var}(\hat\beta_1^*)}{\hat\gamma_1^2} + \dfrac{(\hat\beta_1^*)^2 \operatorname{Var}(\hat\gamma_1)}{\hat\gamma_1^4}$

See Appendices 2 and 3 of Rosner et al., 1990 for a derivation of the variance of $\begin{pmatrix} \hat\beta_1^T \\ \hat\beta_2^T \end{pmatrix}$, again using the multivariate delta method.
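The univariate variance formula is easy to evaluate by hand or in a few lines of code. The following is a minimal sketch, assuming the main-study and validation-study estimates are independent (as in an external validation design); the function names and the numbers are hypothetical, and the interval uses an ordinary normal approximation with z = 1.96.

```python
import math


def rsw_variance(beta1_star, var_beta1_star, gamma1, var_gamma1):
    """Delta-method variance of beta1 = beta1_star / gamma1."""
    return (var_beta1_star / gamma1 ** 2
            + beta1_star ** 2 * var_gamma1 / gamma1 ** 4)


def wald_interval(estimate, variance, z=1.96):
    """Normal-approximation interval: estimate +/- z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical inputs, for illustration only.
beta1_star, var_beta1_star = 0.4, 0.01
gamma1, var_gamma1 = 0.5, 0.0025

beta1 = beta1_star / gamma1
var_beta1 = rsw_variance(beta1_star, var_beta1_star, gamma1, var_gamma1)
lower, upper = wald_interval(beta1, var_beta1)
print(beta1, var_beta1, (lower, upper))
```

The first term is the main-study uncertainty rescaled by the calibration slope; the second term adds the uncertainty from estimating that slope in the validation study, which standard regression output for the main study does not include.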

SLIDE 11

Regression calibration (Carroll et al.)

Given validation or reliability data, the Carroll et al. version of the regression calibration estimator follows (when $n_{ri} = n_{RI} = 2$):

Sketch of algorithm (univariate case)

1. Estimate $\gamma_0$ and $\gamma_1$ in the validation study from the regression of $X_i$ on $Z_i$, $i = 1, \ldots, n_V$, or in the reliability study from the regression of $Z_{i1}$ on $Z_{i2}$, $i = 1, \ldots, n_R$, where $n_{ri} = n_{RI} = 2$.

2. Estimate $\hat X_i = \hat\gamma_0 + \hat\gamma_1 Z_i + e_i$, $i = 1, \ldots, n_M$, in the main study.

SLIDE 12

Regression calibration (Carroll et al.)

3. Run the usual regression model for $Y$ on $\hat X$ in the main study to obtain estimates of effect adjusted for measurement error, i.e., fit the model

    $g[E(Y_i \mid \hat X_i)] = \beta_0 + \beta_1 \hat X_i$

in the main study, where $g[\cdot]$ is a link function (e.g., identity for linear regression, log for Poisson and log-binomial regression, logit for logistic regression, probit for probit regression), to obtain estimates of $\beta_1$ and $\beta_0$ that are corrected for measurement error, at least 'approximately'.

4. The variance must be adjusted as well and cannot be obtained from standard regression software.

The RSW and Carroll et al. versions are identical in GLMs (Thurston SW, Spiegelman D, Ruppert D. "Equivalence of regression calibration methods for main study/external validation study designs". Journal of Statistical Planning and Inference, 2003; 113:527-539).
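The equivalence of the two versions can be checked numerically in the simplest setting. The sketch below is an illustration under stated assumptions, not the cited paper's proof: it uses the identity link (ordinary linear regression), one surrogate, no covariates, and small made-up data sets. The helper `ols` and all variable names are hypothetical. It imputes the exposure and regresses the outcome on it (Carroll et al. version), then divides the uncorrected slope by the calibration slope (Rosner et al. version).

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up validation study: true exposure X and surrogate Z both observed.
x_val = [1.0, 2.0, 3.0, 4.0, 5.0]
z_val = [1.2, 1.9, 3.4, 3.8, 5.3]
g0, g1 = ols(z_val, x_val)          # regress X on Z

# Made-up main study: only surrogate Z and outcome Y observed.
z_main = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
y_main = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

# Carroll et al. version: impute X, then regress Y on the imputed values.
x_hat = [g0 + g1 * z for z in z_main]
b0_carroll, b1_carroll = ols(x_hat, y_main)

# Rosner et al. version: regress Y on Z, then correct the coefficients.
b0_star, b1_star = ols(z_main, y_main)
b1_rsw = b1_star / g1
b0_rsw = b0_star - b1_rsw * g0

print(b1_carroll, b1_rsw)
print(b0_carroll, b0_rsw)
```

In this linear case the two point estimates agree up to floating-point error, because the imputed exposure is an exact linear function of the surrogate. Neither route gives a valid standard error directly from the final regression output.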

SLIDE 13

An example

Home Endotoxin Exposure and Wheeze in Infants: Correction for Bias Due to Exposure Measurement Error

Nora Horick, Edie Weller, Donald K. Milton, Diane R. Gold, Ruifeng Li, and Donna Spiegelman

Department of Biostatistics and Department of Environmental Health, Harvard School of Public Health, Boston, Massachusetts, USA; Channing Laboratory, Harvard Medical School, Boston, Massachusetts, USA; Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA

Environmental Health Perspectives Volume 114, Number 1, January 2006

SLIDE 16

Download %blinplus SAS macro at

http://www.hsph.harvard.edu/donna-spiegelman/software/blinplus-macro/

SLIDE 17

Regression calibration for logistic regression with multiple surrogates for one exposure

Edie A. Weller, Donna Spiegelman, Don Milton, Ellen Eisen

Departments of Biostatistics, Epidemiology, and Environmental Health, Harvard School of Public Health and Dana Farber Cancer Institute

Journal of Statistical Planning and Inference, 2007; 137:449-461

  • Occupational exposures are often characterized by numerous factors of the workplace and work duration in a particular area ==> multiple surrogates describe one exposure.
  • Validation study: personal exposure is commonly measured on a subset of the subjects, and these values are then used to estimate average exposure by job or exposure zone.
  • No adjustment for bias or uncertainty in the exposure estimates.
  • Standard methods typically assume that there is one surrogate for each exposure (for example, Rosner et al., 1989, 1990).
  • Propose an adjustment method which allows for multiple surrogates for one exposure using a regression calibration approach.

SLIDE 18

Main Study

  • To assess the relationship between exposure to metal working fluids (MWF) and respiratory function (United Automobile Workers Union and General Motors Corporation sponsored study, Greaves et al., 1997).
  • Outcome here is prevalence of wheeze.
  • Job characteristics include metal working fluid (MWF) type, plant, and machine operation (grinding or not).

  • Assembly workers are considered the non-exposed group.
  • Possible confounders include age, smoking status and race.

SLIDE 19

Exposure Assessment Study (generically, the validation study)

  • Exposure was measured in various job zones (Woskie et al., 1994).
  • Intensity of exposure to MWF aerosol was measured by the thoracic aerosol fraction (i.e., the sum of the two smallest size fractions measured with the personal monitors).
  • Full-shift (8 hour) personal samples of aerosol exposure in the breathing zone of automobile workers were collected in various job zones.

SLIDE 20

Assumptions

  • True exposure ($X$) and the $t$-vector of covariates ($U$) are related to the probability of the binary outcome ($D$) by the logistic function:

    $\operatorname{logit} \Pr(D = 1) = \beta_0 + X\beta_1 + U'\beta_2$, where $\beta_2' = (\beta_{21}, \beta_{22}, \ldots, \beta_{2t})$.

  • A linear regression model is appropriate to relate the $r$ surrogates ($W$) and the $t$ covariates ($U$) to the true exposure:

    $X = \gamma_0 + W'\gamma_1 + U'\gamma_2 + \epsilon$, where $E(\epsilon) = 0$ and $\operatorname{Var}(\epsilon) = \sigma^2_{X|W,U}$.

  • $W$ is a surrogate if $\Pr(D \mid X, W, U) = \Pr(D \mid X, U)$, that is, knowledge of the surrogates provides no additional information if the true exposure is known.

  • $\epsilon \sim N(0, \sigma^2_{X|W,U})$ and $\Pr(D)$ is small, or $\beta_1^2 \sigma^2_{X|W,U}$ is small.

SLIDE 21

Goal: to obtain point and interval estimates of $\beta$ and $e^{\beta}$ relating exposure ($X$) to outcome ($D$), adjusting for the covariates ($U$)

Problem

  • Quantitative measure of exposure ($X$) is not measured on all subjects
      - $W$ is measured on all $n_1$ of the subjects
      - $X$ and $W$ are measured on $n_2$ subjects
  • Multiple surrogates, $W$, describe exposure

Solution: An extension to two closely related approaches

  • Rosner, Spiegelman and Willett (RSW, 1989, 1990)
  • Carroll, Ruppert and Stefanski (CRS, 1995)

SLIDE 22

Procedure

Propose the following approach, which follows RSW and assumes normality of $\epsilon$ and rare disease, or that $\beta_1^2 \sigma^2_{X|W,U}$ is small.

1. Estimate $\alpha$ from a logistic regression model of $D$ on $W$ and $U$ in the $n_1$ subjects in the main study:

    $\operatorname{logit} \Pr(D = 1) = \hat\alpha_0 + W'\hat\alpha_1 + U'\hat\alpha_2$

2. Estimate $\gamma$ from a measurement error model among the $n_2$ validation study subjects using ordinary least squares regression:

    $X = \hat\gamma_0 + W'\hat\gamma_1 + U'\hat\gamma_2$

SAS PROC GENMOD or PROC LOGISTIC can be used for step 1, and PROC REG for step 2.

SLIDE 23

3. Optimally combine the adjusted estimates for each surrogate, $\hat\beta_W$, where

    $\hat\beta_W = \hat\Gamma_1^{-1} \hat\alpha_1, \quad \hat\Gamma_1 = \operatorname{diag}(\hat\gamma_1)$

The combined estimate is

    $\hat\beta = \hat\tau' \hat\beta_W, \quad \hat\tau' = (1' \hat\Sigma_{\beta_W}^{-1} 1)^{-1} 1' \hat\Sigma_{\beta_W}^{-1}, \quad 1 = (1, 1, \ldots, 1)'$

$\hat\Sigma_{\beta_W}$ is the estimated variance-covariance matrix of $\hat\beta_W$:

    $\hat\Sigma_{\beta_W} = \left[ \dfrac{\partial \beta_W}{\partial(\alpha_1, \gamma_1)} \right]'_{\hat\alpha_1, \hat\gamma_1} \begin{pmatrix} \hat\Sigma_{\alpha_1} & 0 \\ 0 & \hat\Sigma_{\gamma_1} \end{pmatrix} \left[ \dfrac{\partial \beta_W}{\partial(\alpha_1, \gamma_1)} \right]_{\hat\alpha_1, \hat\gamma_1}$

A SAS macro that accomplishes step 3 can be downloaded from my website; the input to the macro is the output from PROC LOGISTIC and PROC REG:

http://www.hsph.harvard.edu/donna-spiegelman/software/multsurr-method/
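The weighted combination in step 3 can be sketched with NumPy. This is an illustration under stated assumptions, not the SAS macro: the function name `combine_surrogates` and all inputs are hypothetical, and the main-study and validation-study estimates are assumed independent, so their joint covariance is block diagonal. Each surrogate's adjusted estimate is its main-study coefficient divided by its validation-study slope, and the weights are the generalized least squares weights built from the delta-method covariance.

```python
import numpy as np


def combine_surrogates(alpha1, gamma1, cov_alpha1, cov_gamma1):
    """Combine per-surrogate adjusted estimates with GLS weights.

    alpha1, gamma1: length-r coefficient vectors for the r surrogates.
    cov_alpha1, cov_gamma1: their r x r covariance matrices.
    Returns (combined estimate, weights, per-surrogate estimates).
    """
    alpha1 = np.asarray(alpha1, dtype=float)
    gamma1 = np.asarray(gamma1, dtype=float)
    cov_alpha1 = np.asarray(cov_alpha1, dtype=float)
    cov_gamma1 = np.asarray(cov_gamma1, dtype=float)

    # Per-surrogate adjusted estimates: diag(gamma1)^{-1} alpha1.
    beta_w = alpha1 / gamma1

    # Jacobians of beta_w with respect to alpha1 and gamma1 (both diagonal).
    jac_alpha = np.diag(1.0 / gamma1)
    jac_gamma = np.diag(-alpha1 / gamma1 ** 2)

    # Delta-method covariance, assuming independent alpha1 and gamma1.
    sigma = (jac_alpha @ cov_alpha1 @ jac_alpha.T
             + jac_gamma @ cov_gamma1 @ jac_gamma.T)

    # GLS weights: tau = Sigma^{-1} 1 / (1' Sigma^{-1} 1).
    ones = np.ones_like(beta_w)
    sigma_inv_ones = np.linalg.solve(sigma, ones)
    tau = sigma_inv_ones / (ones @ sigma_inv_ones)

    return float(tau @ beta_w), tau, beta_w


# Hypothetical inputs for two surrogates, for illustration only.
beta, tau, beta_w = combine_surrogates(
    alpha1=[0.4, 0.9],
    gamma1=[0.5, 1.0],
    cov_alpha1=[[0.01, 0.002], [0.002, 0.04]],
    cov_gamma1=[[0.0025, 0.0], [0.0, 0.01]],
)
print(beta_w, tau, beta)
```

The weights always sum to one, and a surrogate whose adjusted estimate is imprecise receives little weight, which matches the pattern of very unequal weights reported for the wheeze example.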

SLIDE 24

Results from logistic regression model for wheeze. GM/UAW main study (n1 = 1040). "True" Exposure (X) is thoracic aerosol fraction (mg/m³) measured on n2 = 83 workers.

| Variable | Uncorrected | P-value | Corrected | P-value |
|---|---|---|---|---|
| Exposure¹ (mg/m³) | | | 2.875 (1.353, 6.108) | 0.006 |
| Surrogates (W) | | | | |
| Plant 2 | 2.109 (1.391, 3.198) | < 0.001 | | |
| Grinding | 0.706 (0.374, 1.332) | 0.282 | | |
| Straight | 1.641 (1.119, 2.407) | 0.011 | | |
| Synthetic | 1.851 (1.200, 2.854) | 0.005 | | |
| Covariates (Z) | | | | |
| Age 30-39 | 0.897 (0.615, 1.307) | 0.571 | 0.965 (0.648, 1.437) | 0.861 |
| Age 40-49 | 0.834 (0.512, 1.358) | 0.465 | 0.853 (0.513, 1.418) | 0.540 |
| Age 50+ | 0.912 (0.544, 1.528) | 0.726 | 0.914 (0.535, 1.561) | 0.741 |
| Race | 1.173 (0.796, 1.728) | 0.420 | 1.166 (0.782, 1.740) | 0.451 |
| Current Smoker | 3.042 (2.210, 4.188) | < 0.001 | 2.978 (2.144, 4.137) | < 0.001 |

¹ Estimated GLS weights are 0.857 for straight, 0.127 for synthetic, 0.15 for grinding, and 0.0001 for plant
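As a reading aid, the corrected exposure row can be sanity-checked with a few lines of arithmetic. The slide does not say how the entries were computed, so the sketch below assumes they are an odds ratio with a 95% Wald interval formed on the log scale; the function name is hypothetical. Under that assumption, the standard error of the log odds ratio is the interval width on the log scale divided by 2 x 1.96, and the two-sided p-value follows from the normal tail.

```python
import math


def wald_p_from_interval(estimate, lower, upper, z_crit=1.96):
    """Back out SE, z, and a two-sided p-value from a ratio estimate and
    its interval, ASSUMING a Wald interval built on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return se, z, p


# Corrected exposure entry from the table: 2.875 (1.353, 6.108).
se, z, p = wald_p_from_interval(2.875, 1.353, 6.108)
print(round(se, 3), round(z, 2), round(p, 3))

# On the log scale, a Wald interval is symmetric about the estimate,
# so the geometric mean of the limits should be close to the estimate.
print(round(math.sqrt(1.353 * 6.108), 3))
```

The p-value recovered this way rounds to the 0.006 shown in the table, which is consistent with, but does not prove, the assumed construction.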

SLIDE 25

Regression Calibration With Heteroscedastic Variance

Donna Spiegelman, Roger Logan, Douglas Grove

International Journal of Biostatistics, 2011; Vol. 7, Issue 1, Article 4. PMCID: PMC3404553

Conclusion: For all practical purposes, there is no need to worry about heteroscedasticity; that is, if $\operatorname{Var}(X_i \mid Z_i, U_i)$ varies with $i$, there is little impact on the bias or efficiency of the RC method.

SLIDE 26

A comparison of regression calibration methods for designs with internal validation data

Sally W. Thurston, Paige L. Williams, Russ Hauser, Howard Hu, Mauricio Hernandez-Avila, and Donna Spiegelman

Department of Biostatistics and Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, P.O. Box 630, Rochester, NY 14642, USA
Department of Biostatistics, Harvard School of Public Health, USA
Department of Environmental Health, Harvard School of Public Health, USA
Centro de Investigaciones en Salud Poblacional, Instituto Nacional de Salud Publica, Cuernavaca, Morelos, Mexico
Department of Epidemiology, Harvard School of Public Health, USA
Channing Laboratory, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, USA

Journal of Statistical Planning and Inference, 2005; 131:175-190.

SLIDE 27

ARE of optimal method compared to Carroll method

SLIDE 28

Conclusions I

We can accommodate the following situations:

  • multiple surrogates for a single mis-measured exposure
  • heteroscedastic measurement error
  • internal and hybrid validation study designs
  • cumulative exposure variables and other functions of the exposure history in cohort studies

User-friendly SAS macros are available to implement many of these procedures:

  • http://www.hsph.harvard.edu/donna-spiegelman/software/blinplus-macro/
  • http://www.hsph.harvard.edu/donna-spiegelman/software/multsurr-method/
  • http://www.hsph.harvard.edu/donna-spiegelman/software/rrc-macro/
  • http://www.mep.ki.se/%7Emarrei/software/ (for optimal main study / validation study design)

SLIDE 29

Conclusions II

  • Bias due to exposure measurement error is a major limitation to the validity of occupational and environmental studies
  • Methods have been developed which accommodate the features of study design and data distributions found in such studies
  • These methods implement explicit adjustments for this source of bias, using the exposure validation study to characterize the magnitude and other features of the measurement error
  • Point and interval estimates of effect are adjusted
  • Papers have been published applying these methods to the analysis of occupational and environmental studies: you won't be the first!
  • Just as we routinely adjust for confounding, we can routinely adjust for measurement error

SLIDE 30

Acknowledgements

  • NIEHS
  • Edie Weller, Ruifeng Li, Don Milton, Ellen Eisen, Barbara Valanis, Sally Thurston, Jon Samet, Paige Williams, Russ Hauser, Roger Logan, Doug Grove, Doug Dockery, Lucas Neas, Nora Horick, Diane Gold, Mauricio Hernandez, Howard Hu, Aparna Keshaviah
  • Xiaomei Liao, Molin Wang, Biling Hong
  • Francine Laden, Helen Suh, Jaime E. Hart, Joel Kaufman, Adam Szpiro, Lianne Sheppard, Ronald Williams, Robin C. Puett, Marianthi-Anna Kioumourtzoglou
  • Alan Berkeley, Emily Long

Thank you!