Epidemiology for PhD students
Case-control studies, SAS-intro

Bendix Carstensen
Steno Diabetes Center Copenhagen, Gentofte, Denmark
http://BendixCarstensen.com/EpiPhD/F2017

Department of Biostatistics, University of Copenhagen, Spring 2017



From /home/bendix/teach/Epi/KU-epi/slides/slides.tex, Monday 29th January, 2018, 16:34

Case-control studies

Tuesday 30 January 2018


Relationship between follow-up studies and case-control studies

In a cohort study, the relationship between exposure and disease incidence is investigated by following the entire cohort and measuring the rate of occurrence of new cases in the different exposure groups. The follow-up allows the investigator to register those subjects who develop the disease during the study period and to identify those who remain free of the disease.


Case-control study

In a case-control study the subjects who develop the disease (the cases) are registered by some other mechanism than follow-up, and a group of healthy subjects (the controls) is used to represent the subjects who do not develop the disease.


Rationale behind case-control studies

◮ In a follow-up study, rates among exposed and non-exposed are estimated by:

$$\frac{D_1}{Y_1} \quad\text{and}\quad \frac{D_0}{Y_0}$$

◮ and hence the rate ratio by:

$$\frac{D_1/Y_1}{D_0/Y_0} = \frac{D_1/D_0}{Y_1/Y_0}$$


◮ In a case-control study we use the same cases, but select controls to represent the distribution of risk time between exposed and unexposed:

$$\frac{H_1}{H_0} \approx \frac{Y_1}{Y_0}$$

◮ Therefore the rate ratio is estimated by:

$$\frac{D_1/D_0}{H_1/H_0}$$

◮ Controls represent risk time, not disease-free persons.


Choice of controls (I)

[Figure: subject timelines labelled Failures and Healthy, with the study period marked]

The period over which failures are registered as cases is called the study period. A group of subjects who remain healthy over the study period is chosen to represent the healthy part of the source population. But this is an oversimplification ...


What about censoring and late entry?

[Figure: subject timelines labelled Failures, Healthy, Censored and Late entry, with the study period marked]

Choosing controls who remain healthy throughout takes no account of censoring or late entry. Instead, choose controls who are in the study and healthy at the times the cases are registered.


Choice of controls (II)

[Figure: subject timelines labelled Failures, Healthy, Censored and Late entry, with the study period marked]

This is called incidence density sampling. Subjects can be chosen as controls more than once, and a subject who is chosen as a control can later become a case. This is equivalent to sampling observation time from vertical bands drawn to enclose each case.


Case-control probability tree

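The tree has three levels: exposure, failure and selection into the study.

| Exposure | Failure | Selection | Outcome | Probability |
|---|---|---|---|---|
| E1 (p) | F (π1) | selected (0.97) | Case (D1) | p π1 × 0.97 |
| E1 (p) | S (1 − π1) | selected (0.01) | Control (H1) | p(1 − π1) × 0.01 |
| E0 (1 − p) | F (π0) | selected (0.97) | Case (D0) | (1 − p)π0 × 0.97 |
| E0 (1 − p) | S (1 − π0) | selected (0.01) | Control (H0) | (1 − p)(1 − π0) × 0.01 |

Failures (F) that are not selected (probability 0.03) and survivors (S) that are not selected (probability 0.99) do not enter the study.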


Prospective analysis of case-control studies

Compare the case/control ratio between exposed and non-exposed subjects or, more generally: how does the case-control ratio vary with exposure? The point is that in the study it varies in the same way as in the population.


The prospective argument

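| Selection | Exposure | Failure | Probability |
|---|---|---|---|
| In study | E1 (p) | F (π1) | p × π1 × 0.97 |
| In study | E1 (p) | S (1 − π1) | p × (1 − π1) × 0.01 |
| In study | E0 (1 − p) | F (π0) | (1 − p) × π0 × 0.97 |
| In study | E0 (1 − p) | S (1 − π0) | (1 − p) × (1 − π0) × 0.01 |
| Not in study | | | |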


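$$\text{Odds of disease} = \frac{P\{\text{Case given inclusion}\}}{P\{\text{Control given inclusion}\}}$$

$$\omega_1 = \frac{p \times \pi_1 \times 0.97}{p \times (1-\pi_1) \times 0.01} = \frac{0.97}{0.01} \times \frac{\pi_1}{1-\pi_1}$$

$$\omega_0 = \frac{(1-p) \times \pi_0 \times 0.97}{(1-p) \times (1-\pi_0) \times 0.01} = \frac{0.97}{0.01} \times \frac{\pi_0}{1-\pi_0}$$

$$\mathrm{OR} = \frac{\omega_1}{\omega_0} = \frac{\pi_1}{1-\pi_1} \bigg/ \frac{\pi_0}{1-\pi_0} = \mathrm{OR(disease)}_{\text{population}}$$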


What is the case-control ratio?

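$$\frac{D_1}{H_1} = \frac{0.97}{0.01} \times \frac{\pi_1}{1-\pi_1} = \frac{s_{1,\mathrm{cas}}}{s_{1,\mathrm{ctr}}} \times \frac{\pi_1}{1-\pi_1}$$

$$\frac{D_0}{H_0} = \frac{0.97}{0.01} \times \frac{\pi_0}{1-\pi_0} = \frac{s_{0,\mathrm{cas}}}{s_{0,\mathrm{ctr}}} \times \frac{\pi_0}{1-\pi_0}$$

$$\frac{D_1/H_1}{D_0/H_0} = \frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} = \mathrm{OR}_{\text{population}}$$

This holds only if the sampling fractions are identical: $s_{1,\mathrm{cas}} = s_{0,\mathrm{cas}}$ and $s_{1,\mathrm{ctr}} = s_{0,\mathrm{ctr}}$.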
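The invariance of the odds ratio under case-control sampling can be checked numerically. The following is a minimal sketch in a SAS data step; the values of p, π1 and π0 are hypothetical and chosen only for illustration, while the sampling fractions 0.97 and 0.01 are those used in the probability tree.

```sas
/* Sketch: OR in the study vs. OR in the population.          */
/* p, pi1 and pi0 are hypothetical illustration values.       */
data cc_tree;
  p     = 0.30;   /* proportion exposed (assumed)             */
  pi1   = 0.02;   /* disease probability, exposed (assumed)   */
  pi0   = 0.01;   /* disease probability, unexposed (assumed) */
  s_cas = 0.97;   /* sampling fraction for cases              */
  s_ctr = 0.01;   /* sampling fraction for controls           */

  /* probabilities of entering the study, by exposure and status */
  d1 = p       * pi1       * s_cas;
  h1 = p       * (1 - pi1) * s_ctr;
  d0 = (1 - p) * pi0       * s_cas;
  h0 = (1 - p) * (1 - pi0) * s_ctr;

  or_study = (d1 / h1) / (d0 / h0);
  or_pop   = (pi1 / (1 - pi1)) / (pi0 / (1 - pi0));

  put or_study= or_pop=;   /* written to the log */
run;
```

The two values written to the log should agree up to rounding, because the sampling fractions cancel when they are the same in both exposure groups.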


Log-likelihood for case-control studies

The log-likelihood (conditional on being included) is a binomial likelihood with odds parameters ω0 and ω1:

$$D_0\log(\omega_0) - N_0\log(1+\omega_0) + D_1\log(\omega_1) - N_1\log(1+\omega_1)$$

where N0 = D0 + H0 and N1 = D1 + H1.

Exposed: D1 cases, H1 controls
Unexposed: D0 cases, H0 controls
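A short derivation, added here as a sketch, shows why this likelihood leads to the empirical odds as estimates. Each exposure group contributes a term of the form D log(ω) − N log(1 + ω) with N = D + H, and the two terms can be maximised separately. Setting the derivative of one term to zero gives:

```latex
\[
\frac{\partial}{\partial\omega}\Bigl[D\log(\omega) - N\log(1+\omega)\Bigr]
  = \frac{D}{\omega} - \frac{N}{1+\omega} = 0
\quad\Longrightarrow\quad
\hat{\omega} = \frac{D}{N-D} = \frac{D}{H}
\]
```

So the maximum likelihood estimate of the odds in each group is the observed case/control ratio.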


The odds ratio (θ) is the ratio of the odds ω1 to ω0, so:

$$\log(\theta) = \log\left(\frac{\omega_1}{\omega_0}\right) = \log(\omega_1) - \log(\omega_0)$$

Estimates of log(ω1) and log(ω0) are just the logs of the empirical odds:

$$\log\left(\frac{D_1}{H_1}\right) \quad\text{and}\quad \log\left(\frac{D_0}{H_0}\right)$$

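The standard errors of the log-odds are estimated by:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1}} \quad\text{and}\quad \sqrt{\frac{1}{D_0} + \frac{1}{H_0}}$$

Exposed and unexposed form two independent bodies of data (they are sampled independently), so the estimate of log(θ) [= log(OR)] is:

$$\log\left(\frac{D_1}{H_1}\right) - \log\left(\frac{D_0}{H_0}\right),$$

with

$$\mathrm{s.e.}\{\log(\mathrm{OR})\} = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}$$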


Confidence interval for OR

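First a confidence interval for log(OR):

$$\log(\mathrm{OR}) \pm 1.96 \times \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}$$

Take the exponential:

$$\mathrm{OR} \;\overset{\times}{\div}\; \underbrace{\exp\left(1.96 \times \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}\right)}_{\text{error factor}}$$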


BCG vaccination and leprosy

Does BCG vaccination in early childhood protect against leprosy? New cases of leprosy were examined for presence or absence of the BCG scar. During the same period, a 100% survey of the population of this area, which included examination for BCG scar, had been carried out. The tabulated data refer only to subjects under 35, because vaccination was not widely available when older persons were children.


Exercise I

| BCG scar | Leprosy cases | Population survey |
|---|---|---|
| Present | 101 | 46 028 |
| Absent | 159 | 34 594 |

Estimate the odds of BCG vaccination for leprosy cases and for the controls. Estimate the odds ratio and hence the extent of protection against leprosy afforded by vaccination. Give a 95% c.i. for the OR.

Use SAS for this: Exercise from the notes.


Solution to I

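$$\mathrm{OR} = \frac{D_1/H_1}{D_0/H_0} = \frac{101/46028}{159/34594} = \frac{0.002194}{0.004596} = 0.48$$

$$\mathrm{s.e.}(\log[\mathrm{OR}]) = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{101} + \frac{1}{46028} + \frac{1}{159} + \frac{1}{34594}} = 0.127$$

The 95% limits for the odds-ratio are:

$$\mathrm{OR} \;\overset{\times}{\div}\; \exp(1.96 \times 0.127) = 0.48 \;\overset{\times}{\div}\; 1.28 = (0.37, 0.61)$$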
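The hand calculation for Exercise I can be reproduced in a SAS data step. This is a minimal sketch rather than the solution from the course notes; the data set name `bcg` and the variable names are chosen here for illustration.

```sas
/* Sketch: OR, s.e. of log(OR) and 95% limits for Exercise I */
data bcg;
  d1 = 101; h1 = 46028;   /* scar present: cases, survey population */
  d0 = 159; h0 = 34594;   /* scar absent:  cases, survey population */

  oddsratio = (d1 / h1) / (d0 / h0);
  se_logor  = sqrt(1/d1 + 1/h1 + 1/d0 + 1/h0);
  errfac    = exp(1.96 * se_logor);   /* error factor */
  ci_lo     = oddsratio / errfac;
  ci_hi     = oddsratio * errfac;

  put oddsratio= se_logor= errfac= ci_lo= ci_hi=;   /* written to the log */
run;
```

The values appear in the log window and can be compared with the rounded figures in the solution.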


Exercise II

| BCG scar | Leprosy cases | Population controls |
|---|---|---|
| Present | 101 | 554 |
| Absent | 159 | 446 |

The table shows the results of a computer-simulated study which picked 1000 controls at random. What is the odds ratio estimate in this study? Give a 95% c.i. for the OR.

Use SAS for this: Exercise from the notes.


Solution to II

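$$\mathrm{OR} = \frac{D_1/H_1}{D_0/H_0} = \frac{101/554}{159/446} = \frac{0.1823}{0.3565} = 0.51$$

$$\mathrm{s.e.}(\log[\mathrm{OR}]) = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{101} + \frac{1}{554} + \frac{1}{159} + \frac{1}{446}} = 0.142$$

The 95% limits for the odds-ratio are:

$$\mathrm{OR} \;\overset{\times}{\div}\; \exp(1.96 \times 0.142) = 0.51 \;\overset{\times}{\div}\; 1.32 = (0.39, 0.68)$$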
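One possible way to get the same numbers from a procedure is PROC FREQ with the counts entered as weights. This is a sketch, not the solution from the course notes; the data set name `leprosy` and its variable names are assumptions made here. The RELRISK option on the TABLES statement requests the odds ratio with 95% confidence limits for a 2×2 table, and ORDER=DATA keeps the rows and columns in the order they are typed, so that "present" and "case" come first.

```sas
/* Sketch: Exercise II as a 2x2 table of counts */
data leprosy;
  input scar $ status $ n;
  datalines;
present case    101
present control 554
absent  case    159
absent  control 446
;
run;

proc freq data=leprosy order=data;
  weight n;                          /* n holds the cell counts     */
  tables scar * status / relrisk;    /* odds ratio with 95% limits  */
run;
```

The odds-ratio line of the output should correspond to the hand calculation; the relative-risk lines in the same table are not meaningful for case-control data.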


More levels of exposure (William Guy)

Physical exertion at work of 1659 outpatients: 341 pulmonary consumption, 1318 other diseases.

| Level of exertion in occupation | Pulmonary consumption (Cases) | Other diseases (Controls) | Case/control ratio | OR relative to (3) |
|---|---|---|---|---|
| Little (0) | 125 | 385 | 0.325 | 1.643 |
| Varied (1) | 41 | 136 | 0.301 | 1.526 |
| More (2) | 142 | 630 | 0.225 | 1.141 |
| Great (3) | 33 | 167 | 0.198 | 1.000 |

The relationship between the case-control ratios is what matters.
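The last two columns of the table can be recomputed from the counts. A minimal sketch in SAS, with the data set name `guy` and the variable names chosen here for illustration:

```sas
/* Sketch: case/control ratio and OR relative to level 3 (Great) */
data guy;
  input exertion cases controls;
  cc_ratio = cases / controls;         /* case/control ratio        */
  or_vs_3  = cc_ratio / (33 / 167);    /* ratio relative to level 3 */
  datalines;
0 125 385
1 41 136
2 142 630
3 33 167
;
run;

proc print data=guy noobs;
run;
```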


The retro/prospective argument

◮ Retrospective: Four possible outcomes (little/varied/more/great),
◮ Prospective: Two possible outcomes (case/control), but a large number of comparisons (between any two exposure levels).
◮ But the probability model is still a binary model, and the argument for the analysis is still the same as before.
◮ Prospective argument applicable in deriving a logistic regression model for case-control studies.
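As an illustration of the prospective argument, the exertion data can be fitted as a logistic regression with case/control status as the binary outcome. This is a sketch under assumptions: the data set name `guy_lr` and the variable `total` are introduced here, and the choice of PROC LOGISTIC with events/trials syntax and level 3 as reference is one option among several, not something prescribed by the slides.

```sas
/* Sketch: logistic regression of case status on exertion level */
data guy_lr;
  input exertion cases controls;
  total = cases + controls;            /* cases + controls per level */
  datalines;
0 125 385
1 41 136
2 142 630
3 33 167
;
run;

proc logistic data=guy_lr;
  class exertion (ref='3') / param=ref;   /* level 3 (Great) as reference */
  model cases/total = exertion;           /* events/trials syntax         */
run;
```

The odds ratio estimates in the output compare each exertion level with level 3 and should match the empirical ratios of the case/control ratios, since the model has one parameter per level.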


Odds-ratio and rate ratio

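◮ If the disease probability, π, in the study period is small:

$$\pi = \text{cumulative risk} \approx \text{cumulative rate} = \lambda T$$

◮ For small π, 1 − π ≈ 1, so:

$$\mathrm{OR} = \frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} \approx \frac{\pi_1}{\pi_0} \approx \frac{\lambda_1}{\lambda_0} = \mathrm{RR}$$

π small ⇒ OR is an estimate of RR.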
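How good the approximation is depends on the length of the study period. The sketch below uses hypothetical constant rates and study durations, and it assumes the exact relation π = 1 − exp(−λT) for a constant rate (of which π ≈ λT is the small-π approximation). All names and numbers are illustrative.

```sas
/* Sketch: OR compared with the rate ratio for increasing study duration */
data or_vs_rr;
  lambda0 = 0.01;                     /* hypothetical rate per year, unexposed */
  lambda1 = 0.02;                     /* hypothetical rate per year, exposed   */
  rr = lambda1 / lambda0;             /* rate ratio                            */
  do t = 1, 5, 20, 50;                /* hypothetical study durations (years)  */
    pi0 = 1 - exp(-lambda0 * t);      /* cumulative risk, unexposed            */
    pi1 = 1 - exp(-lambda1 * t);      /* cumulative risk, exposed              */
    oddsratio = (pi1 / (1 - pi1)) / (pi0 / (1 - pi0));
    output;
  end;
  keep t pi0 pi1 rr oddsratio;
run;

proc print data=or_vs_rr noobs;
run;
```

With these assumed rates the rate ratio is 2; the odds ratio is close to 2 for short durations and moves away from it as t grows.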


Important assumption behind rate ratio interpretation

The entire “study base” must have been available throughout:

◮ no censorings.
◮ no delayed entries.

This will clearly not always be the case, but it may be achieved in carefully designed studies.


Avoiding censoring and delayed entry

◮ Can be achieved simultaneously with small π by incidence density sampling:
  ◮ Subdivide calendar time into small time bands.
  ◮ New case-control study in each time band.
  ◮ Only one case in each time band.
  ◮ No delayed entry or censoring.
◮ If the fraction of exposed does not vary much over time, all the small studies can be analysed together as one.
◮ This is effectively matching on calendar time.


The rare disease assumption

Necessary to make the approximation:

$$\frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} \approx \frac{\pi_1}{\pi_0}$$

This is more appropriately termed “the short study duration assumption”: each of the small studies we imagine as components of the entire study should be sufficiently short in relation to disease occurrence that π (the disease probability) is small.


Nested case-control studies

◮ Study base = “large” cohort
◮ Expensive to get covariate information for all persons (expensive analyses, tracing of histories, ...)
◮ Covariate information only for cases and time-matched controls:
  ◮ To each case, choose one or more (usually ≤ 5) controls from the risk set.


How many controls per case?

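The standard deviation of log(OR):

Equal number of cases and controls:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{D_1} + \frac{1}{D_1} + \frac{1}{D_0} + \frac{1}{D_0}} = \sqrt{\left(\frac{1}{D_1} + \frac{1}{D_0}\right) \times (1 + 1)}$$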


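Twice as many controls as cases:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{D_1} + \frac{1}{2D_1} + \frac{1}{D_0} + \frac{1}{2D_0}} = \sqrt{\left(\frac{1}{D_1} + \frac{1}{D_0}\right) \times (1 + 1/2)}$$

m times as many controls as cases:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\left(\frac{1}{D_1} + \frac{1}{D_0}\right) \times (1 + 1/m)}$$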


How many controls per case?

◮ The standard deviation of the log[OR] is $\sqrt{1 + 1/m}$ times larger in a case-control study, compared to the corresponding cohort study.
◮ Therefore, 5 controls per case is normally sufficient. (Only relevant if controls are “cheap” compared to cases.)
◮ But if cases and controls cost the same, and are available, the most efficient is to have the same number of cases and controls.
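The inflation factor √(1 + 1/m) can be tabulated to see how quickly the gain from extra controls levels off. A minimal sketch in SAS; the data set and variable names are chosen here for illustration.

```sas
/* Sketch: inflation of s.d.(log OR) with m controls per case */
data controls_per_case;
  do m = 1 to 10;
    sd_factor = sqrt(1 + 1/m);   /* relative to the corresponding cohort study */
    output;
  end;
run;

proc print data=controls_per_case noobs;
run;
```

With m = 5 the factor is about 1.10, so further controls per case can reduce the standard deviation by at most roughly 10%.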


SAS-intro

Tuesday 30 January, 2018


SAS

◮ Display manager (programming):
  ◮ program, log, output windows
  ◮ reproducible
  ◮ easy to document
◮ SAS ANALYST
  ◮ menu-oriented interface
  ◮ writes and runs programs for you
  ◮ no learning by heart, no syntax errors
  ◮ not everything is included
  ◮ it is heavy to use in the long run

Data set example:

Blood pressure and obesity

OBESE: weight/ideal weight
BP: systolic blood pressure

| OBS | SEX | OBESE | BP |
|---|---|---|---|
| 1 | male | 1.31 | 130 |
| 2 | male | 1.31 | 148 |
| 3 | male | 1.19 | 146 |
| 4 | male | 1.11 | 122 |
| ... | ... | ... | ... |
| 101 | female | 1.64 | 136 |
| 102 | female | 1.73 | 208 |


Data

The data are in the text file BP.TXT, located at www.biostat.ku.dk/~pka/epidata. It contains the following variables:

◮ SEX: Character variable ($)
◮ OBESE: weight/ideal weight
◮ BP: systolic blood pressure

3 variables and 102 observations


Printing in SAS

We read the file bp.txt directly from www and skip the first line containing variable names (firstobs=2).

    /* read bp.txt from the web, skipping the header line */
    data bp;
      filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
      infile bpfile firstobs=2;
      input sex $ obese bp;
    run;

    /* list the data set */
    proc print data=bp;
      var sex obese bp;
    run;

This creates a temporary data set bp, which only exists within the current SAS session. (Permanent data sets may be saved, but we will not use this feature in this course.)


SAS programming

◮ data-step:

    data bp;
      ( reading ) ;
      ( data manipulations ) ;
    run;

◮ proc-step:

    proc xx data=bp ;
      ( procedure statements ) ;
    run;

◮ NB: No data manipulations after run;
  ◮ only if we make a new data-step.
  ◮ better to revise the first data-step.
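A concrete instance of the two kinds of step may help. The sketch below is self-contained: it types in the six observations shown in the data set example instead of reading the file, and the data set name `bp_demo`, the derived variable `overweight` and its cutoff of 1.2 are hypothetical choices made here for illustration.

```sas
/* data-step: reading and data manipulation */
data bp_demo;
  input sex $ obese bp;
  overweight = (obese > 1.2);   /* 1 if true, 0 if false; cutoff is hypothetical */
  datalines;
male 1.31 130
male 1.31 148
male 1.19 146
male 1.11 122
female 1.64 136
female 1.73 208
;
run;

/* proc-step: summary statistics by sex */
proc means data=bp_demo;
  class sex;
  var obese bp;
run;
```

All manipulation of variables happens inside the data-step; the proc-step only uses the finished data set.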


Example

    data bp;
      filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
      infile bpfile firstobs=2;
      input sex $ obese bp;   /* $ marks sex as a character variable */
    run;

    data bp;
      set bp;
      if bp<125 then highbp=0;
      if bp>=125 then highbp=1;
      /* an alternative way of creating the new variable highbp is:
         highbp = (bp>=125); */
    run;

    proc freq data=bp;
      tables sex * highbp ;
    run;


Example, simplified

    data bp;
      filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
      infile bpfile firstobs=2;
      input sex $ obese bp;   /* $ marks sex as a character variable */
      if bp < 125 then highbp=0;
      if bp >= 125 then highbp=1;
      /* an alternative way of creating the new variable highbp is:
         highbp = (bp>=125); */
    run;

    proc freq data=bp;
      tables sex * highbp ;
    run;


Typing of programs is done in the

◮ Program Editor window:
  ◮ Works like all other text editors: arrow keys, backspace, delete etc.
  ◮ When the program is submitted (click on Submit or press F3), the results are in the
◮ Log-window:
  ◮ Here you can see how things went:
  ◮ how many observations you have,
  ◮ how many variables you have
  ◮ if there were any errors
  ◮ which pages were written by which procedures


◮ Output-window (perhaps):
  ◮ In this window you will find the results (if there are any)
◮ Graph-window (which we won’t use on this course)
  ◮ Here plots are stored in order

Making life simpler

◮ You can move between the windows by clicking Windows in the command bar, or use the function keys:
  ◮ F5 is editor window,
  ◮ F6 is log window,
  ◮ F7 is output window.

Modifications in the program

When the program has been executed and you want to make changes:

◮ Go back to the Program-window
◮ The Log-, Output- and Graph-windows accumulate, that is, output is stored consecutively
◮ Clear by choosing Clear under Edit (or press Ctrl-E, for “erase”)
◮ Don’t print!
◮ Remember to save the program from time to time before SAS crashes!
