Modelling Rates: Mark Lunt, Centre for Epidemiology Versus Arthritis

SLIDE 1

Outline: Introduction; Poisson Regression; Negative Binomial Regression; Additional topics

Modelling Rates

Mark Lunt

Centre for Epidemiology Versus Arthritis, University of Manchester

15/12/2020

SLIDE 2

Modelling Rates

  • Can model prevalence (proportion) with logistic regression
  • Cannot model incidence in this way
  • Need to allow for time at risk (exposure)
  • Exposure often measured in person-years
  • Model a rate (incidents per unit time)

SLIDE 3

Assumptions

  • There is a rate at which events occur
  • This rate may depend on covariates
  • Rate must be ≥ 0
  • Expected number of events = rate × exposure
  • Events are independent
  • Then the number of events observed will follow a Poisson distribution

SLIDE 4

Poisson Regression outline: Introduction; Example; Goodness of Fit; Constraints; Other considerations

Poisson Regression

  • Negative numbers of events are meaningless
  • Model log(rate), so that the rate can range from 0 to ∞
  • rate = r (events per unit exposure)
  • Count = C (number of events)
  • Exposure time = T
  • C ∼ Poisson(rT)
  • E[C] = rT
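For reference, the Poisson assumption above can be written out explicitly. The block below gives the standard Poisson probability mass function for a count C with mean rT. It is textbook material rather than something shown on the slide.

```latex
\[
  P(C = c) \;=\; \frac{e^{-rT}\,(rT)^{c}}{c!}, \qquad c = 0, 1, 2, \ldots
\]
\[
  E[C] \;=\; \operatorname{Var}(C) \;=\; rT
\]
```

The mean and the variance are both equal to rT, so a single parameter controls both the expected count and its spread.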

SLIDE 5

The Poisson Regression Model

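\begin{align*}
\log(\hat{r}) &= \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p \\
\hat{r} &= e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p} \\
E[C] &= T\hat{r} = T \times e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p} = e^{\log(T) + \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p} \\
\log(E[C]) &= \log(T) + \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p
\end{align*}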

SLIDE 6

Parameter Interpretation

  • When $x_i$ increases by 1, $\log(r)$ increases by $\beta_i$
  • Therefore, $r$ is multiplied by $e^{\beta_i}$
  • As with logistic regression, coefficients are less interesting than their exponentials
  • $e^{\beta}$ is the Incidence Rate Ratio
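The multiplicative interpretation follows directly from the model equation. As a short worked step, holding every other covariate fixed and increasing $x_i$ by one unit gives:

```latex
\[
  \frac{r(x_i + 1)}{r(x_i)}
  = \frac{e^{\beta_0 + \ldots + \beta_i (x_i + 1) + \ldots + \beta_p x_p}}
         {e^{\beta_0 + \ldots + \beta_i x_i + \ldots + \beta_p x_p}}
  = e^{\beta_i}
\]
```

For instance, a hypothetical coefficient of $\beta_i = 0.3545$ (an illustrative value) corresponds to an incidence rate ratio of $e^{0.3545} \approx 1.43$, i.e. a rate about 43% higher per unit increase in $x_i$.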

SLIDE 7

Poisson Regression in Stata

  • Command poisson will do Poisson regression
  • Enter the exposure with the option exposure(varname)
  • Can also use offset(lvarname), where lvarname is the log of the exposure
  • To obtain Incidence Rate Ratios, use the option irr

SLIDE 8

Poisson Regression Example: Doctors' Study

              Smokers                  Non-smokers
Age       Deaths  Person-Years     Deaths  Person-Years
35–44         32        52,407          2        18,790
45–54        104        43,248         12        10,673
55–64        206        28,612         28         5,710
65–74        186        12,663         28         2,585
75–84        102         5,317         31         1,462
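To try the commands from the slides, the table above has to be in memory as one row per age-by-smoking cell. The sketch below enters it by hand and fits the same main-effects model twice, once with exposure() and once with offset(). The variable names follow the slides' commands. The numeric coding (agecat 1 to 5, smokes 0/1) and the helper variables rate1000 and lnpy are assumptions made for illustration.

```stata
* Doctors' Study table, one row per cell.
* Assumed coding: agecat 1 = 35-44 ... 5 = 75-84; smokes 1 = smoker, 0 = non-smoker
clear
input agecat smokes deaths pyears
1 1  32 52407
2 1 104 43248
3 1 206 28612
4 1 186 12663
5 1 102  5317
1 0   2 18790
2 0  12 10673
3 0  28  5710
4 0  28  2585
5 0  31  1462
end

* Crude rate per 1,000 person-years in each cell
generate rate1000 = 1000 * deaths / pyears
list agecat smokes deaths pyears rate1000, sepby(smokes)

* Same model fitted two ways: exposure() takes the raw person-years,
* offset() takes their logarithm
poisson deaths i.agecat i.smokes, exposure(pyears) irr
generate lnpy = ln(pyears)
poisson deaths i.agecat i.smokes, offset(lnpy) irr
```

Both poisson calls should report identical incidence rate ratios, since exposure(pyears) simply adds ln(pyears) to the linear predictor with its coefficient fixed at 1.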

SLIDE 9

. poisson deaths i.agecat i.smokes, exp(pyears) irr

Poisson regression                              Number of obs   =         10
                                                LR chi2(5)      =     922.93
                                                Prob > chi2     =     0.0000
Log likelihood = -33.600153                     Pseudo R2       =     0.9321

------------------------------------------------------------------------------
      deaths |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      agecat |
      45-54  |   4.410584   .8605197     7.61   0.000     3.009011    6.464997
      55-64  |    13.8392   2.542638    14.30   0.000     9.654328    19.83809
      65-74  |   28.51678   5.269878    18.13   0.000     19.85177    40.96395
      75-84  |   40.45121   7.775511    19.25   0.000     27.75326    58.95885
             |
      smokes |
        Yes  |   1.425519   .1530638     3.30   0.001     1.154984    1.759421
       _cons |   .0003636   .0000697   -41.30   0.000     .0002497    .0005296
  ln(pyears) |          1  (exposure)
------------------------------------------------------------------------------

SLIDE 10


Using predict after poisson

Options available:
  • n (default): expected number of events (rate × duration of exposure)
  • ir: incidence rate
  • xb: linear predictor
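The three predict options are closely related, and a short check can make the relationship concrete. The sketch below re-enters the Doctors' Study table (assumed coding: agecat 1 to 5, smokes 0/1), fits the main-effects model, and confirms that the expected count equals the predicted rate times the person-years. The names pred_ir and pred_xb are illustrative.

```stata
clear
input agecat smokes deaths pyears
1 1  32 52407
2 1 104 43248
3 1 206 28612
4 1 186 12663
5 1 102  5317
1 0   2 18790
2 0  12 10673
3 0  28  5710
4 0  28  2585
5 0  31  1462
end

quietly poisson deaths i.agecat i.smokes, exposure(pyears)

predict double pred_n            // default (n): expected number of events
predict double pred_ir, ir       // incidence rate, events per person-year
predict double pred_xb, xb       // linear predictor; includes ln(pyears) unless nooffset is given

* Expected count = rate x exposure
assert reldif(pred_n, pred_ir * pyears) < 1e-8
* Expected count = exp(linear predictor, offset included)
assert reldif(pred_n, exp(pred_xb)) < 1e-8

list agecat smokes deaths pred_n pred_ir pred_xb
```

If either assert fails, Stata stops with an error, so a silent run means both identities hold for every cell.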

SLIDE 11

Example: predict

. predict pred_n

              Smokers            Non-smokers
Age       Deaths   pred_n     Deaths   pred_n
35–44         32     27.2          2      6.8
45–54        104     98.9         12     17.1
55–64        206    205.3         28     28.7
65–74        186    187.2         28     26.8
75–84        102    111.5         31     21.5

SLIDE 12

Goodness of Fit

  • Command estat gof compares observed and expected (from the model) counts
  • Can detect whether the Poisson model is reasonable
  • If not, this could be due to:
      • Systematic part of model poorly specified
      • Random variation not really Poisson
  • Degrees of freedom for the test = number of categories of observations − number of coefficients in the model (including _cons)

SLIDE 13

Goodness of Fit Example

. estat gof

         Deviance goodness-of-fit =  12.13244
         Prob > chi2(4)           =    0.0164

         Pearson goodness-of-fit  =  11.15533
         Prob > chi2(4)           =    0.0249
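The degrees of freedom and p-values in this output can be reproduced by hand, which helps show what the test is doing. The sketch below uses only the statistics printed above and Stata's chi2tail() function. The count of six coefficients is read off the main-effects model (four age contrasts, one smoking contrast, and _cons).

```stata
* Residual degrees of freedom: 10 cells minus 6 coefficients
display 10 - 6

* Upper-tail chi-squared probabilities for the two statistics above
display %6.4f chi2tail(4, 12.13244)   // deviance: output above reports 0.0164
display %6.4f chi2tail(4, 11.15533)   // Pearson:  output above reports 0.0249
```

Both p-values are below 0.05, so the main-effects model does not reproduce the observed counts adequately.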

SLIDE 14

Improving the fit of the model

If the model fit is poor, it can be improved by:

  • Allowing for non-linearity of associations
  • Introducing interaction terms
  • Including other variables

SLIDE 15

Example: Improving fit of the model

. poisson deaths i.agecat##i.smokes, exp(pyears) irr

Poisson regression                              Number of obs   =         10
                                                LR chi2(9)      =     935.07
                                                Prob > chi2     =     0.0000
Log likelihood = -27.53397                      Pseudo R2       =     0.9444

-------------------------------------------------------------------------------
       deaths |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       agecat |
       45-54  |    10.5631   8.067701     3.09   0.002     2.364153    47.19623
       55-64  |   46.07004   33.71981     5.23   0.000     10.97496    193.3901
       65-74  |    101.764   74.48361     6.32   0.000     24.24256    427.1789
       75-84  |   199.2099   145.3356     7.26   0.000     47.67693    832.3648
              |
       smokes |
         Yes  |   5.736637   4.181256     2.40   0.017     1.374811    23.93711
              |
agecat#smokes |
   45-54#Yes  |   .3728337   .2945619    -1.25   0.212     .0792525    1.753951
   55-64#Yes  |   .2559409   .1935392    -1.80   0.072     .0581396    1.126697
   65-74#Yes  |   .2363859   .1788334    -1.91   0.057     .0536612    1.041316
   75-84#Yes  |   .1577109   .1194146    -2.44   0.015     .0357565    .6956154
              |
        _cons |   .0001064   .0000753   -12.94   0.000     .0000266    .0004256
   ln(pyears) |          1  (exposure)
-------------------------------------------------------------------------------

SLIDE 16

. testparm i.agecat#i.smokes

           chi2(  4) =   10.20
         Prob > chi2 =    0.0372

. lincom 1.smokes + 5.age#1.smokes, eform

 ( 1)  [deaths]1.smokes + [deaths]5.agecat#1.smokes = 0

------------------------------------------------------------------------------
      deaths |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .9047304   .1855513    -0.49   0.625     .6052658     1.35236
------------------------------------------------------------------------------

. estat gof

         Deviance goodness-of-fit =  .0000694
         Prob > chi2(0)           =         .

         Pearson goodness-of-fit  =  1.14e-13
         Prob > chi2(0)           =         .
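The lincom result can be checked by hand from the interaction model's output. Adding coefficients on the log scale is the same as multiplying rate ratios, so the smoking effect in the 75–84 age group is the main-effect IRR for smoking times the 75–84 interaction IRR:

```latex
\[
  \mathrm{IRR}_{\text{smoking}\,\mid\,75\text{--}84}
  = e^{\beta_{\text{smokes}} + \beta_{75\text{--}84 \times \text{smokes}}}
  = 5.736637 \times 0.1577109
  \approx 0.9047
\]
```

This matches the exp(b) value reported by lincom. The goodness-of-fit test has 0 degrees of freedom because the interaction model has 10 coefficients for 10 cells, so it reproduces the observed counts exactly and its fit cannot be tested.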

SLIDE 17

Constraints

  • Can force parameters to be equal to each other or to a specified value
  • Can be useful in reducing the number of parameters in a model
  • Simplifies description of model
  • Enables goodness of fit test
  • Syntax: constraint define n varname = expression

SLIDE 18

Constraint Example

. constraint define 1 3.agecat#1.smokes = 4.agecat#1.smokes
. poisson deaths i.agecat##i.smokes, exp(pyears) irr constr(1)

Poisson regression                              Number of obs   =         10
                                                Wald chi2(8)    =     632.14
Log likelihood = -27.572645                     Prob > chi2     =     0.0000

 ( 1)  [deaths]3.agecat#1.smokes - [deaths]4.agecat#1.smokes = 0

-------------------------------------------------------------------------------
       deaths |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       agecat |
       45-54  |    10.5631   8.067701     3.09   0.002     2.364153    47.19623
       55-64  |     47.671   34.37409     5.36   0.000     11.60056    195.8978
       65-74  |   98.22765   70.85012     6.36   0.000     23.89324    403.8244
       75-84  |   199.2099   145.3356     7.26   0.000     47.67693    832.3648
              |
       smokes |
         Yes  |   5.736637   4.181256     2.40   0.017     1.374811    23.93711
              |
agecat#smokes |
   45-54#Yes  |   .3728337   .2945619    -1.25   0.212     .0792525    1.753951
   55-64#Yes  |   .2461772    .182845    -1.89   0.059     .0574155    1.055521
   65-74#Yes  |   .2461772    .182845    -1.89   0.059     .0574155    1.055521
   75-84#Yes  |   .1577109   .1194146    -2.44   0.015     .0357565    .6956154
              |
        _cons |   .0001064   .0000753   -12.94   0.000     .0000266    .0004256
   ln(pyears) |          1  (exposure)
-------------------------------------------------------------------------------

SLIDE 19

Constraint Example Cont.

. estat gof

         Deviance goodness-of-fit =  .0774185
         Prob > chi2(1)           =    0.7808

         Pearson goodness-of-fit  =  .0773882
         Prob > chi2(1)           =    0.7809

SLIDE 20

Predicted Numbers from Poisson Regression Model

              Smokers                       Non-smokers
Age       Observed  Pred 1  Pred 2      Observed  Pred 1  Pred 2
35–44           32    27.2    32.0             2     6.8     2.0
45–54          104    98.9   104.0            12    17.1    12.0
55–64          206   205.3   205.0            28    28.7    29.0
65–74          186   187.2   187.0            28    26.8    27.0
75–84          102   111.5   102.0            31    21.5    31.0

Pred 1: No interaction
Pred 2: Interaction & constraint

SLIDE 21

Zeros

  • May be structural (exposure = 0, so count had to be 0)
  • Don't count towards DOF
  • Lead to problems in estimation
      • IRR is huge or tiny
      • SE is huge
      • Confidence interval is undefined
  • Stata may be unable to produce a confidence interval

SLIDE 22

Overdispersion

  • Adding predictors to the model may not lead to an adequate fit
  • There may be variation between individuals in the rate that is not included in the model
  • Variance is equal to the mean for a Poisson distribution
  • The variation between individuals means there is more variation than expected: overdispersion
  • If there is overdispersion, standard errors will be too small

SLIDE 23

Negative Binomial Regression

  • Allows for extra variation
  • Assumes a mixture of Poisson variables, with the means having a given distribution
  • Two possible models:
      • Var(Y) = µ(1 + δ)
      • Var(Y) = µ(1 + αµ)
  • α or δ is the overdispersion parameter
  • α = 0 or δ = 0 gives the Poisson model
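The second variance formula can be derived from the mixture idea in a few lines. The sketch below assumes the usual construction in which each subject's Poisson mean is µ multiplied by a random factor ν with mean 1 and variance α (typically gamma-distributed). The slide says only that the means follow "a given distribution", so the choice of ν is an assumption.

```latex
\[
  Y \mid \nu \sim \mathrm{Poisson}(\mu\nu), \qquad E[\nu] = 1, \qquad \operatorname{Var}(\nu) = \alpha
\]
\[
  \operatorname{Var}(Y)
  = E\big[\operatorname{Var}(Y \mid \nu)\big] + \operatorname{Var}\big(E[Y \mid \nu]\big)
  = E[\mu\nu] + \operatorname{Var}(\mu\nu)
  = \mu + \alpha\mu^{2}
  = \mu(1 + \alpha\mu)
\]
```

With α = 0 there is no variation in ν and the variance collapses to µ, the Poisson case.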

SLIDE 24

Negative Binomial Regression in Stata

  • Command nbreg
  • Syntax similar to poisson
  • Default gives Var(Y) = µ(1 + αµ)
  • Option dispersion(constant) gives Var(Y) = µ(1 + δ)

SLIDE 25

Negative Binomial Regression Example

. poisson deaths i.cohort, exposure(exposure) irr

Poisson regression                              Number of obs   =         21
                                                LR chi2(2)      =      49.16
                                                Prob > chi2     =     0.0000
Log likelihood = -2159.5158                     Pseudo R2       =     0.0113

------------------------------------------------------------------------------
      deaths |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cohort |
  1960-1967  |   .7393079   .0423859    -5.27   0.000     .6607305      .82723
  1968-1976  |   1.077037   .0635156     1.26   0.208      .959474    1.209005
             |
       _cons |   .0202523   .0008331   -94.80   0.000     .0186836    .0219527
ln(exposure) |          1  (exposure)
------------------------------------------------------------------------------

. estat gof

         Deviance goodness-of-fit =  4190.689
         Prob > chi2(18)          =    0.0000

         Pearson goodness-of-fit  =  15387.67
         Prob > chi2(18)          =    0.0000

SLIDE 26

. nbreg deaths i.cohort, exposure(exposure) irr

Negative binomial regression                    Number of obs   =         21
                                                LR chi2(2)      =       0.40
Dispersion     = mean                           Prob > chi2     =     0.8171
Log likelihood = -131.3799                      Pseudo R2       =     0.0015

------------------------------------------------------------------------------
      deaths |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cohort |
  1960-1967  |   .7651995   .5537904    -0.37   0.712     .1852434    3.160869
  1968-1976  |   .6329298   .4580292    -0.63   0.527     .1532395    2.614209
             |
       _cons |   .1240922   .0635173    -4.08   0.000     .0455042    .3384052
ln(exposure) |          1  (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |   .5939963   .2583615                       .087617    1.100376
-------------+----------------------------------------------------------------
       alpha |   1.811212   .4679475                       1.09157    3.005294
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 4056.27  Prob>=chibar2 = 0.000

SLIDE 27

Additional topics outline: Log-linear Models; Standardisation; Generalized Linear Models; Setting Reference Category for Categorical Variables

Log-Linear Models

  • An R × C table is simply a series of counts
  • The counts have two predictor variables (rows and columns)
  • Can fit a Poisson model to such a table
  • Association between the two variables is given by the interaction between the variables
  • Model: $\log(p) = \beta_0 + \beta_r x_r + \beta_c x_c + \beta_{rc} x_{rc}$
  • For a 2 × 2 table, such a model is exactly equivalent to logistic regression

SLIDE 28

Log-Linear Modelling Example

                 Exposure
Outcome      Exposed  Unexposed
Cases             20         10
Non-cases         10         20

OR = 4
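The odds ratio for this table, and a confidence interval for it, can be computed directly from the four cell counts. The sketch below uses the cross-product ratio and the usual large-sample (Woolf) standard error for the log odds ratio. The scalar names are illustrative, and the Woolf formula is a standard result rather than something stated on the slide.

```stata
* Cell counts from the 2 x 2 table
scalar n_case_exp   = 20    // cases, exposed
scalar n_case_unexp = 10    // cases, unexposed
scalar n_non_exp    = 10    // non-cases, exposed
scalar n_non_unexp  = 20    // non-cases, unexposed

* Cross-product odds ratio
scalar oddsratio = (n_case_exp * n_non_unexp) / (n_case_unexp * n_non_exp)

* Large-sample standard error of log(OR)
scalar se_lnor = sqrt(1/n_case_exp + 1/n_case_unexp + 1/n_non_exp + 1/n_non_unexp)
scalar zcrit   = invnormal(0.975)

display "OR     = " oddsratio
display "95% CI = " exp(ln(oddsratio) - zcrit * se_lnor) " to " exp(ln(oddsratio) + zcrit * se_lnor)
```

The odds ratio is (20 × 20) / (10 × 10) = 4, and the interval comes out at roughly 1.37 to 11.70.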

SLIDE 29

Log-linear modelling example: Stata output

     +---------------------------+
     | outcome   exposure   freq |
     |---------------------------|
  1. |       0          0     20 |
  2. |       0          1     10 |
  3. |       1          0     10 |
  4. |       1          1     20 |
     +---------------------------+

. xi: poisson freq i.exp*i.out, irr

Poisson regression                              Number of obs   =          4
                                                LR chi2(3)      =       6.80
                                                Prob > chi2     =     0.0787
Log likelihood = -8.9990653                     Pseudo R2       =     0.2741

------------------------------------------------------------------------------
        freq |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iexposure_1 |         .5   .1936492    -1.79   0.074     .2340459    1.068166
 _Ioutcome_1 |         .5   .1936492    -1.79   0.074     .2340459    1.068166
_IexpXout_~1 |          4    2.19089     2.53   0.011     1.367218     11.7026
------------------------------------------------------------------------------

. logistic outcome exposure [fw=freq]

Logistic regression                             Number of obs   =         60
                                                LR chi2(1)      =       6.80
                                                Prob > chi2     =     0.0091
Log likelihood = -38.19085                      Pseudo R2       =     0.0817

------------------------------------------------------------------------------
     outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    exposure |          4    2.19089     2.53   0.011     1.367218     11.7026
------------------------------------------------------------------------------

SLIDE 30

Direct & Indirect Standardisation

  • Used for comparing rates between populations
  • Assumes covariates differ between populations
  • What would the rates be if the covariates were the same?
      • I.e. same proportion of subjects in each stratum
      • Proportions from standard population = direct standardisation
      • Proportions from this population = indirect standardisation

SLIDE 31

Direct Standardisation

  • Calculate rate in each stratum
  • Standardised rate = weighted mean of these rates
  • Weights = proportions of subjects in each stratum of standard population
  • Standardised rate = what the rate would be in the standard population if it had the same stratum-specific rates as our population
  • Different standard = different standardised rate
  • Can compare directly adjusted rates (adjusted to same population)
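The steps above can be sketched by hand using the Doctors' Study table from earlier in the talk. The choice of standard population here (the pooled person-years of smokers and non-smokers in each age group) is an assumption made purely for illustration, as are the numeric coding and the helper variable names. Stata also provides a dstdize command for this task.

```stata
clear
input agecat smokes deaths pyears
1 1  32 52407
2 1 104 43248
3 1 206 28612
4 1 186 12663
5 1 102  5317
1 0   2 18790
2 0  12 10673
3 0  28  5710
4 0  28  2585
5 0  31  1462
end

* Step 1: rate in each stratum
generate double rate = deaths / pyears

* Step 2: weights = share of the (assumed) standard population in each age group
bysort agecat: egen double std_py = total(pyears)
egen double tot_py = total(pyears)
generate double w = std_py / tot_py

* Step 3: standardised rate = weighted mean of the stratum rates, per group
generate double wrate = w * rate
bysort smokes: egen double dsr = total(wrate)
generate double dsr1000 = 1000 * dsr      // per 1,000 person-years

tabstat dsr1000, by(smokes) statistics(mean)
```

Because both groups are weighted to the same age distribution, the two standardised rates can be compared directly. A different standard population would give different values.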

SLIDE 32

Indirect Standardisation

  • Per-stratum rates are unavailable/unreliable
  • Use known rates from a standard population
  • Weight known rates according to stratum sizes in our population
  • Produce expected number of events if standard rates apply
  • Ratio Observed / Expected = SMR (standardised mortality ratio)
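As a small worked illustration of the Observed / Expected calculation, the sketch below treats the non-smokers in the Doctors' Study table as the standard population and asks how many deaths the smokers would have had at non-smoker rates. Using non-smokers as the standard is an assumption chosen for illustration, and the variable and scalar names are hypothetical.

```stata
* Doctors' Study in wide form: smokers (_s) and non-smokers (_ns) side by side
clear
input agecat d_s py_s d_ns py_ns
1  32 52407  2 18790
2 104 43248 12 10673
3 206 28612 28  5710
4 186 12663 28  2585
5 102  5317 31  1462
end

* Expected deaths among smokers if the standard (non-smoker) rates applied
generate double expected = (d_ns / py_ns) * py_s

quietly summarize d_s
scalar obs_total = r(sum)
quietly summarize expected
scalar exp_total = r(sum)

display "Observed = " obs_total "   Expected = " %6.1f exp_total "   SMR = " %5.3f (obs_total / exp_total)
```

With these data the SMR is approximately 1.42: smokers had about 42% more deaths than expected at non-smoker rates, given their own age distribution.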

SLIDE 33

Standardisation vs. Adjustment

Direct standardisation
  • Poisson regression assumes same RR in each stratum
  • D.S. assumes different RR in each stratum
  • Both give weighted mean RR: weights differ

Indirect standardisation
  • Good measure of causal effect in this sample
  • Can be useful in e.g. observational study of treatment effect
  • Do not compare SMRs
      • They tell you what happened in the observed group
      • They do not tell you what might happen in a different group

SLIDE 34

Generalized Linear Models

  • We have met a number of regression models
  • All have the form:
      $g(\mu) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$
      $Y = \mu + \varepsilon$
  • where
      • µ is the expected value of Y
      • ε has a known distribution (normal, binomial etc.)
      • g() is called the link function

SLIDE 35

Components of a GLM

  • You can choose the link function for yourself
  • It should:
      • Map −∞ to ∞ onto reasonable values for µ
      • Have parameters that are easy to interpret
  • Error distribution is determined by the data
  • Only certain distributions are allowed

SLIDE 36

Examples of GLMs

Model                  Range of µ    Link                       Error Distribution
Linear Regression      −∞ to ∞       g(µ) = µ                   Normal
Logistic Regression    0 to 1        g(µ) = log(µ / (1 − µ))    Binomial
Poisson Regression     0 to ∞        g(µ) = log(µ)              Poisson

SLIDE 37

GLMs in Stata

  • Command glm
  • Option family() sets the error distribution
  • Option link() sets the link function
  • There are more options to predict after glm
  • E.g.

        glm yvar xvars, family(binomial) link(logit)

    is equivalent to

        logistic yvar xvars
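The same equivalence holds for the Poisson model used throughout this talk. The sketch below re-enters the Doctors' Study table (assumed coding: agecat 1 to 5, smokes 0/1) and fits the main-effects model with both poisson and glm. The eform option asks glm to display exponentiated coefficients.

```stata
clear
input agecat smokes deaths pyears
1 1  32 52407
2 1 104 43248
3 1 206 28612
4 1 186 12663
5 1 102  5317
1 0   2 18790
2 0  12 10673
3 0  28  5710
4 0  28  2585
5 0  31  1462
end

* Dedicated command
poisson deaths i.agecat i.smokes, exposure(pyears) irr

* Same model as a GLM: Poisson errors, log link, person-years as exposure
glm deaths i.agecat i.smokes, family(poisson) link(log) exposure(pyears) eform
```

The two commands should report the same rate ratios. The glm output additionally shows the deviance and Pearson statistics that estat gof reports after poisson.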

SLIDE 38

Setting Reference Category for Categorical Variables: New Way

  • For one model: ib#.varname
  • Permanently: fvset base # varname
  • Alternatives to #: first, last, frequent
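As a brief illustration of both approaches, the sketch below makes the oldest age group the reference category in the Doctors' Study model, first for a single model and then permanently. The data entry and the numeric coding (agecat 5 = 75–84) are assumptions carried over from the earlier example table.

```stata
clear
input agecat smokes deaths pyears
1 1  32 52407
2 1 104 43248
3 1 206 28612
4 1 186 12663
5 1 102  5317
1 0   2 18790
2 0  12 10673
3 0  28  5710
4 0  28  2585
5 0  31  1462
end

* One model only: agecat == 5 as the reference category
poisson deaths ib5.agecat i.smokes, exposure(pyears) irr

* Same thing without hard-coding the value
poisson deaths ib(last).agecat i.smokes, exposure(pyears) irr

* Permanently: every later i.agecat uses 5 as its base
fvset base 5 agecat
poisson deaths i.agecat i.smokes, exposure(pyears) irr

* Remove the setting again
fvset clear agecat
```

All three fits describe the same model. Only the category against which the age rate ratios are expressed changes.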

SLIDE 39

Setting Reference Category for Categorical Variables: Old Way

    char variable[omit] #

  where
    char        Characteristic
    variable    Name of variable to set reference category for
    #           Value of reference category