Case-control studies
C&H 16
Bendix Carstensen
Steno Diabetes Center
& Department of Biostatistics, University of Copenhagen bxc@steno.dk http://BendixCarstensen.com
PhD-course in Epidemiology, Department of Biostatistics, Tuesday 31
Case-control studies (C&H 16) 2/ 59
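◮ In a follow-up study, rates among exposed and non-exposed are estimated by D1/Y1 and D0/Y0, where Y1 and Y0 are the person-years at risk,
◮ and hence the rate ratio by:
\[ \frac{D_1/Y_1}{D_0/Y_0} = \frac{D_1/D_0}{Y_1/Y_0} \]
◮ In a case-control study we use the same cases, but select controls so that H1/H0 represents the distribution of risk time, Y1/Y0.
◮ Therefore the rate ratio is estimated by:
\[ \frac{D_1/D_0}{H_1/H_0} \]
◮ Controls represent risk time, not disease-free persons.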
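Probability tree: Exposure → Failure → Selection
Exposure: E1 with probability p, E0 with probability 1 − p.
Failure: F with probability π1 (exposed) or π0 (unexposed); S with probability 1 − π1 or 1 − π0.
Selection: failures enter the study with probability 0.97 (0.03 not selected); survivors with probability 0.01 (0.99 not selected).

Outcome        Exposure   Failure   Probability
Case (D1)      E1         F         p π1 × 0.97
Control (H1)   E1         S         p (1 − π1) × 0.01
Case (D0)      E0         F         (1 − p) π0 × 0.97
Control (H0)   E0         S         (1 − p)(1 − π0) × 0.01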
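The tree can be checked numerically. The sketch below, in Python, multiplies the branch probabilities and suggests that the selection fractions 0.97 and 0.01 cancel from the odds ratio, so that the odds ratio among selected subjects equals the odds ratio of failure in the population. The values of p, π1 and π0 are hypothetical and chosen only for illustration; the function name is an assumption as well.

```python
import math

def study_probabilities(p, pi1, pi0, f_case=0.97, f_control=0.01):
    """Joint probabilities of entering the study as D1, H1, D0 or H0."""
    return {
        "D1": p * pi1 * f_case,
        "H1": p * (1 - pi1) * f_control,
        "D0": (1 - p) * pi0 * f_case,
        "H0": (1 - p) * (1 - pi0) * f_control,
    }

# Hypothetical values, not taken from the slides
p, pi1, pi0 = 0.3, 0.02, 0.01

pr = study_probabilities(p, pi1, pi0)

# Odds ratio as seen in the case-control study
study_or = (pr["D1"] / pr["H1"]) / (pr["D0"] / pr["H0"])

# Odds ratio of failure in the underlying population
population_or = (pi1 / (1 - pi1)) / (pi0 / (1 - pi0))

print(study_or, population_or)
```

Changing `f_case` or `f_control` leaves `study_or` unchanged, which is the point of the exercise: only the ratio of the two odds is identified, not p, π1 or π0 separately.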
◮ The proportion of cases who smoke compared
◮ The mean age of cases compared to controls
Probability tree: Selection → Failure → Exposure (the "Not in study" branch is not followed)

Selected subject    Probability
F, E1 (Cases)       p × π1 × 0.97
F, E0 (Cases)       (1 − p) × π0 × 0.97
S, E1 (Controls)    p × (1 − π1) × 0.01
S, E0 (Controls)    (1 − p) × (1 − π0) × 0.01

Note: the parameters of the previous tree are not the branch probabilities of this tree.
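Probability tree: Selection → Exposure → Failure (the "Not in study" branch is not followed)
Exposure: E1 with probability p, E0 with probability 1 − p.
Failure: F with probability π1 or π0; S with probability 1 − π1 or 1 − π0.

Exposure   Failure   Probability
E1         F         p × π1 × 0.97
E1         S         p × (1 − π1) × 0.01
E0         F         (1 − p) × π0 × 0.97
E0         S         (1 − p) × (1 − π0) × 0.01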
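\[ \text{s.d.}[\log(\mathrm{OR})] = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{101} + \frac{1}{46028} + \frac{1}{159} + \frac{1}{34594}} = 0.127 \]
\[ \mathrm{OR} \overset{\times}{\div} \exp(1.96 \times 0.127) = 0.48 \overset{\times}{\div} 1.28 = (0.37, 0.61) \]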
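The arithmetic behind this interval can be reproduced in a few lines. The sketch below, in Python, takes the four counts D1 = 101, H1 = 46028, D0 = 159, H0 = 34594 from the formula above; the function name and its return layout are assumptions made for illustration.

```python
import math

def odds_ratio_ci(d1, h1, d0, h0, z=1.96):
    """Odds ratio, s.d. of log(OR), error factor and approximate 95% interval."""
    odds_ratio = (d1 / h1) / (d0 / h0)
    sd_log_or = math.sqrt(1 / d1 + 1 / h1 + 1 / d0 + 1 / h0)
    error_factor = math.exp(z * sd_log_or)
    interval = (odds_ratio / error_factor, odds_ratio * error_factor)
    return odds_ratio, sd_log_or, error_factor, interval

odds_ratio, sd_log_or, error_factor, (lower, upper) = odds_ratio_ci(101, 46028, 159, 34594)
print(f"OR = {odds_ratio:.2f}, s.d. = {sd_log_or:.3f}, "
      f"error factor = {error_factor:.2f}, 95% c.i. = ({lower:.2f}, {upper:.2f})")
```

Because the two control counts are large, their contribution to the standard deviation is negligible here; the precision is driven almost entirely by the numbers of cases.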
\[ \mathrm{OR} \overset{\times}{\div} \exp(1.96 \times 0.142) = 0.51 \overset{\times}{\div} 1.32 = (0.39, 0.68) \]
Level of      Pulmonary consumption   Other diseases   Case/control   OR relative
exertion      (Cases)                 (Controls)       ratio          to (3)
Little (0)    125                     385              0.325          1.643
Varied (1)     41                     136              0.301          1.526
More (2)      142                     630              0.225          1.141
Great (3)      33                     167              0.198          1.000
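The last two columns of the table follow directly from the counts: the case/control ratio is cases divided by controls at each level, and the odds ratio is that ratio divided by the ratio at the reference level (3). A minimal sketch in Python, with variable names chosen here for illustration:

```python
# Counts from the table: (cases, controls) for each level of exertion
counts = {
    "Little (0)": (125, 385),
    "Varied (1)": (41, 136),
    "More (2)": (142, 630),
    "Great (3)": (33, 167),
}
reference = "Great (3)"

# Case/control ratio at each level
ratios = {level: cases / controls for level, (cases, controls) in counts.items()}

# Odds ratio relative to the reference level
odds_ratios = {level: ratio / ratios[reference] for level, ratio in ratios.items()}

for level in counts:
    print(f"{level:<12} {ratios[level]:.3f} {odds_ratios[level]:.3f}")
```

Choosing another reference level rescales every odds ratio by the same constant, so the ordering of the levels is unaffected.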
◮ Retrospective: Four possible outcomes
◮ Prospective: Two possible outcomes
◮ But the probability model is still a binary
◮ Prospective argument applicable in deriving a
◮ If the disease probability, π, in the study period
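◮ For small π, 1 − π ≈ 1, so the odds are close to the probability, π/(1 − π) ≈ π, and the odds ratio is close to the ratio of disease probabilities:
\[ \mathrm{OR} = \frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} \approx \frac{\pi_1}{\pi_0} \]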
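How small π must be for the approximation to hold can be seen from a quick comparison. The sketch below, in Python, contrasts the odds ratio with the ratio of disease probabilities for a rare and a common outcome; both pairs of probabilities are hypothetical and not taken from the slides.

```python
def odds_ratio(pi1, pi0):
    """Odds ratio comparing disease probabilities pi1 (exposed) and pi0 (unexposed)."""
    return (pi1 / (1 - pi1)) / (pi0 / (1 - pi0))

def risk_ratio(pi1, pi0):
    """Ratio of the two disease probabilities."""
    return pi1 / pi0

# Hypothetical probabilities: a rare and a common outcome, both with ratio 2
rare = (0.002, 0.001)
common = (0.4, 0.2)

for label, (pi1, pi0) in (("rare", rare), ("common", common)):
    print(f"{label:<7} OR = {odds_ratio(pi1, pi0):.3f}, ratio = {risk_ratio(pi1, pi0):.3f}")
```

For the rare outcome the two measures agree to about two decimals, while for the common outcome the odds ratio is clearly further from 1 than the ratio of probabilities.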
◮ no censorings.
◮ no delayed entries.
◮ Can be achieved simultaneously with small π
◮ Subdivide calendar time into small time bands.
◮ New case-control study in each time band.
◮ Only one case in each time band.
◮ No delayed entry or censoring.
◮ If the fraction of exposed does not vary much
◮ This is effectively matching on calendar time.
◮ Study base = “large” cohort
◮ Expensive to get covariate information for all
◮ Covariate information only for cases and time
◮ To each case, choose one or more (usually
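As many controls as cases (H1 = D1, H0 = D0):
\[ \frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0} = \frac{1}{D_1} + \frac{1}{D_1} + \frac{1}{D_0} + \frac{1}{D_0} = 2\left(\frac{1}{D_1} + \frac{1}{D_0}\right) \]
Twice as many controls as cases:
\[ \frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0} = \frac{1}{D_1} + \frac{1}{2D_1} + \frac{1}{D_0} + \frac{1}{2D_0} = \frac{3}{2}\left(\frac{1}{D_1} + \frac{1}{D_0}\right) \]
m times as many controls as cases:
\[ \frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0} = \left(1 + \frac{1}{m}\right)\left(\frac{1}{D_1} + \frac{1}{D_0}\right) \]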
◮ The standard deviation of the log[OR] is
◮ Therefore, 5 controls per case is normally
◮ But if cases and controls cost the same — and
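The gain from extra controls can be tabulated. Assuming m controls per case and controls with the same exposure distribution as the cases (H1 = m D1, H0 = m D0), the variance of log[OR] is (1 + 1/m)(1/D1 + 1/D0), so the standard deviation carries a factor √(1 + 1/m) relative to having unlimited controls. The Python sketch below evaluates that factor; the function name is an assumption for illustration.

```python
import math

def sd_inflation(m):
    """Factor by which s.d.[log(OR)] exceeds sqrt(1/D1 + 1/D0) with m controls per case."""
    return math.sqrt(1 + 1 / m)

for m in (1, 2, 5, 10, 100):
    print(f"{m:>3} controls per case: factor {sd_inflation(m):.3f}")
```

Going from 1 to 5 controls per case brings the factor from about 1.41 down to about 1.10; beyond that, the remaining gain is under 10% however many controls are added.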
◮ Display manager (programming):
  ◮ program, log, output windows
  ◮ reproducible
  ◮ easy to document
◮ SAS ANALYST
  ◮ menu-oriented interface
  ◮ writes and runs programs for you
  ◮ no learning by heart, no syntax errors
  ◮ not everything is included
  ◮ it is cumbersome to use in the long run
OBS   SEX      OBESE   BP
  1   male     1.31    130
  2   male     1.31    148
  3   male     1.19    146
  4   male     1.11    122
  .   .        .       .
  .   .        .       .
  .   .        .       .
101   female   1.64    136
102   female   1.73    208
SAS-intro () 38/ 59
◮ SEX: Character variable ($)
◮ OBESE: weight/ideal weight
◮ BP: systolic blood pressure
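data bp;
  /* read the file directly from the web, skipping the header line */
  filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
  infile bpfile firstobs=2;
  input sex $ obese bp;   /* $ marks sex as a character variable */
run;

proc print data=bp;
  var sex obese bp;
run;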
◮ data-step:
data bp;
  ( reading ) ;
  ( data manipulations ) ;
run;
◮ proc-step:
proc xx data=bp ;
  ( procedure statements ) ;
run;
◮ NB: No data manipulations after run;
data bp;
  filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
  infile bpfile firstobs=2;
  input sex $ obese bp;   /* $ is needed because sex is a character variable */
run;

data bp;
  set bp;
  if bp<125 then highbp=0;
  if bp>=125 then highbp=1;
  /* an alternative way of creating the new variable highbp is:
     highbp = (bp>=125); */
run;

proc freq data=bp;
  tables sex * highbp ;
run;
data bp;
  filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
  infile bpfile firstobs=2;
  input sex $ obese bp;   /* $ is needed because sex is a character variable */
  if bp < 125 then highbp=0;
  if bp >= 125 then highbp=1;
  /* an alternative way of creating the new variable highbp is:
     highbp = (bp>=125); */
run;

proc freq data=bp;
  tables sex * highbp ;
run;
◮ Program Editor window:
  ◮ Works like all other text editors: arrow keys, backspace, delete etc.
  ◮ When the program is submitted (click on Submit
◮ Log-window:
  ◮ Here you can see how things went:
  ◮ how many observations you have,
  ◮ how many variables you have,
  ◮ if there were any errors,
  ◮ which pages were written by which procedures
◮ Output-window (perhaps):
  ◮ In this window you will find the results (if there are any)
◮ Graph-window (which we won’t use on this
  ◮ Here plots are stored in order
◮ You can move between the windows by clicking
  ◮ F5 is editor window,
  ◮ F6 is log window,
  ◮ F7 is output window.
◮ Go back to the Program-window
◮ The Log-, Output- and Graph-windows
◮ Clear by choosing Clear under Edit (or press
◮ Don’t print!
◮ Remember to save the program from time to time
Simple statistical models (Proportions and rates) 49/ 59
◮ SAS: proc genmod
◮ R: glm
◮ Stata: glm
data p ;
  input x n ;
  datalines ;
4 10
;
run ;

proc genmod data= p ;
  /* logit link: the intercept is the log-odds */
  model x/n = / dist=bin link=logit ;
  estimate "4 versus 6" intercept 1 / exp ;
run ;

                            Standard   Wald 95% Confidence
Parameter   DF   Estimate   Error      Limits
Intercept   1               0.6455                0.8597
Scale            1.0000     0.0000     1.0000     1.0000

Contrast Estimate Results
                    L’Beta     Standard   L’Beta                Chi-
Label               Estimate   Error      Confidence Limits     Square
4 versus 6                     0.6455                0.8597     0.39
Exp(4 versus 6)     0.6667     0.4303     0.1881     2.3624
proc genmod data= p ;
  /* log link: the intercept is the log of the proportion */
  model x/n = / dist=bin link=log ;
  estimate "4 out of 10" intercept 1 / exp ;
run ;

                            Standard   Wald 95% Confidence
Parameter   DF   Estimate   Error      Limits
Intercept   1               0.3873
Scale            1.0000     0.0000     1.0000     1.0000

Contrast Estimate Results
                     L’Beta     Standard   L’Beta                Chi-
Label                Estimate   Error      Confidence Limits     Square
4 out of 10                     0.3873                           5.60
Exp(4 out of 10)     0.4000     0.1549     0.1872     0.8545
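The Exp(...) rows of the two proc genmod outputs for 4 events out of 10 can be checked by hand. The sketch below, in Python, assumes the usual large-sample standard deviations, √(1/x + 1/(n − x)) for the log-odds and √(1/x − 1/n) for the log-proportion, and uses 1.96 as the normal quantile, so the last printed digit may differ slightly from SAS. The function names are illustrative and not part of any SAS or Python library.

```python
import math

def odds_ci(x, n, z=1.96):
    """Odds x/(n-x) with an interval computed on the log-odds scale."""
    odds = x / (n - x)
    sd = math.sqrt(1 / x + 1 / (n - x))     # s.d. of log(odds)
    ef = math.exp(z * sd)                    # error factor
    return odds, sd, odds / ef, odds * ef

def proportion_ci(x, n, z=1.96):
    """Proportion x/n with an interval computed on the log scale."""
    prop = x / n
    sd = math.sqrt(1 / x - 1 / n)            # s.d. of log(proportion)
    ef = math.exp(z * sd)
    return prop, sd, prop / ef, prop * ef

odds, sd_odds, odds_lo, odds_hi = odds_ci(4, 10)
prop, sd_prop, prop_lo, prop_hi = proportion_ci(4, 10)

print(f"odds       {odds:.4f}  s.d. {sd_odds:.4f}  ({odds_lo:.4f}, {odds_hi:.4f})")
print(f"proportion {prop:.4f}  s.d. {sd_prop:.4f}  ({prop_lo:.4f}, {prop_hi:.4f})")
```

The log-scale estimates themselves are `math.log(odds)` and `math.log(prop)`, both negative here since the odds and the proportion are below 1.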
data bissau;
  filename bisfile url "http://www.biostat.ku.dk/~pka/epidata/bissau.txt";
  infile bisfile firstobs=2;
  input id fuptime dead bcg dtp age agem;
run;

title "Estimate odds - Bissau" ;
proc genmod data=bissau descending ;
  /* descending: model the probability that dead=1 */
  model dead = / dist=bin link=logit ;
  estimate "odds of dying" intercept 1 / exp ;
run ;

Contrast Estimate Results
                       L’Beta     Standard   L’Beta                Chi-
Label                  Estimate   Error      Confidence Limits     Square
odds of dying                     0.0686                           2076.5
Exp(odds of dying)     0.0439     0.0030     0.0384     0.0503
title "Estimate proportion - Bissau" ;
proc genmod data=bissau descending ;
  model dead = / dist=bin link=log ;
  estimate "prob of dying" intercept 1 / exp ;
run ;

Contrast Estimate Results
                       L’Beta     Standard   L’Beta                Chi-
Label                  Estimate   Error      Confidence Limits     Square
prob of dying                     0.0657                           2325.8
Exp(prob of dying)     0.0421     0.0028     0.0370     0.0479
data r ;
  input d y ;
  ly = log(y) ;          /* offset for a rate per 1 year */
  my = log(y/1000) ;     /* offset for a rate per 1000 years */
  datalines ;
30 261.9
;
run ;

title "Estimate a rate per 1 year" ;
proc genmod data= r ;
  model d = / dist=poisson link=log offset=ly ;
  estimate "30 during 261.9 - per 1 year" intercept 1 / exp ;
run ;

Contrast Estimate Results
                                     L’Beta     Standard   L’Beta
Label                                Estimate   Error      Confidence
30 during 261.9 - per 1 year                    0.1826
Exp(30 during 261.9 - per 1 year)    0.1145     0.0209     0.0801
title "Estimate a rate per 1000 years" ;
proc genmod data= r ;
  model d = / dist=poisson link=log offset=my ;
  estimate "30 during 261.9 - per 1000 years" intercept 1 / exp ;
run ;

Contrast Estimate Results
                                         L’Beta     Standard
Label                                    Estimate   Error      Alpha
30 during 261.9 - per 1000 years         4.7410     0.1826     0.05
Exp(30 during 261.9 - per 1000 years)    114.5475   20.9134    0.05
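The two rate estimates differ only in the scale of the offset, which can be seen by computing them directly. The sketch below, in Python, assumes the usual large-sample result that the standard deviation of the log-rate is 1/√D and uses 1.96 as the normal quantile; the function name and its `per` argument are illustrative.

```python
import math

def rate_ci(d, y, per=1.0, z=1.96):
    """Rate d/y per `per` units of person-time, with an interval on the log-rate scale."""
    rate = d / y * per
    sd = 1 / math.sqrt(d)        # s.d. of log(rate); unaffected by the choice of scale
    ef = math.exp(z * sd)        # error factor
    return rate, sd, rate / ef, rate * ef

# 30 events during 261.9 person-years
rate1, sd1, lo1, hi1 = rate_ci(30, 261.9)                  # per 1 year
rate1000, sd1000, lo1000, hi1000 = rate_ci(30, 261.9, per=1000)   # per 1000 years

print(f"per 1 year:     {rate1:.4f}  s.d. {sd1:.4f}  ({lo1:.4f}, {hi1:.4f})")
print(f"per 1000 years: {rate1000:.4f}  s.d. {sd1000:.4f}  ({lo1000:.4f}, {hi1000:.4f})")
```

Rescaling the person-time shifts the log-rate by log(1000) and leaves its standard deviation at 0.1826, which is why both outputs report the same Standard Error on the L’Beta row.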
data bissau ;
  set bissau ;
  ld = log(fuptime) ;           /* offset for a rate per 1 day */
  ly = log(fuptime/36525) ;     /* offset for a rate per 100 years (36525 days) */
run ;

title "Estimate a rate per 1 day" ;
proc genmod data=bissau ;
  model dead = / dist=poisson link=log offset=ld ;
  estimate "mortality rate - per 1 day" intercept 1 / exp ;
run ;

Contrast Estimate Results
                                   L’Beta     Standard
Label                              Estimate   Error      Alpha   Confidence
mortality rate - per 1 day                    0.0671     0.05
Exp(mortality rate - per 1 day)    0.0003     0.0000     0.05
title "Estimate a rate per 100 years" ;
proc genmod data=bissau ;
  /* ly = log(fuptime/36525), so the rate is per 100 years */
  model dead = / dist=poisson link=log offset=ly ;
  estimate "mortality rate - per 100 years" intercept 1 / exp ;
run ;

Contrast Estimate Results
                                       L’Beta     Standard
Label                                  Estimate   Error      Alpha
mortality rate - per 100 years         2.2205     0.0671     0.05
Exp(mortality rate - per 100 years)    9.2123     0.6183     0.05