Analysis of variance and regression: Other types of regression models


  1. Analysis of variance and regression Other types of regression models

  2. Other types of regression models
     • Counts: Poisson models
     • Ordinal data: Proportional odds models
     • Survival analysis (censored, time-to-event data): Cox proportional hazards model
     • (Other types of censored data)

  3. Other types of regression
     Until now, we have been looking at
     • regression for normally distributed data, where parameters describe
       – differences between groups
       – the expected difference in outcome for a one-unit difference in an explanatory variable
     • regression for binary data (logistic regression), where parameters describe
       – odds ratios for a one-unit difference in an explanatory variable

  4. What about something 'in between'?
     • counts (Poisson distribution)
       – number of cancer cases in each municipality per year
       – number of positive pneumococcus swabs
     • ordered categorical variables with more than 2 categories, e.g.,
       – degree of pain (none/mild/moderate/serious)
       – degree of liver fibrosis

  5. Generalised linear models: multiple regression models, on a scale suitable for the data
     Mean: $M$
     Link function: $g(M)$ is linear in the covariates, that is,
       $g(M) = b_0 + b_1 x_1 + \cdots + b_k x_k$
     Some standard distributions (and link functions):
     • Normal distribution (link=IDENTITY): the general linear model
     • Binomial distribution (link=LOGIT): logistic regression
     • Poisson distribution (link=LOG)
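As a rough illustration of the three link functions listed above, the following Python sketch maps a linear predictor back to the mean $M$. The dictionary `LINKS`, the helper `mean_from_covariates` and the coefficient values are all made up for this example; nothing here comes from the slides' data.

```python
import math

# Link functions g(M) for the three standard cases listed above,
# each paired with its inverse (linear predictor -> mean).
LINKS = {
    "identity": (lambda m: m, lambda eta: eta),
    "logit": (lambda m: math.log(m / (1 - m)), lambda eta: 1 / (1 + math.exp(-eta))),
    "log": (lambda m: math.log(m), lambda eta: math.exp(eta)),
}

def mean_from_covariates(link, b0, coefs, xs):
    """Return M such that g(M) = b0 + b1*x1 + ... + bk*xk."""
    eta = b0 + sum(b * x for b, x in zip(coefs, xs))
    _, inverse = LINKS[link]
    return inverse(eta)

# Hypothetical coefficients, chosen only for illustration.
print(mean_from_covariates("identity", 1.0, [0.5], [2.0]))
print(mean_from_covariates("logit", 0.0, [1.0], [0.0]))
print(mean_from_covariates("log", 0.0, [1.0], [0.0]))
```

The same linear predictor gives an unrestricted mean under the identity link, a probability between 0 and 1 under the logit link, and a positive mean under the log link.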

  6. Poisson distribution:
     • distribution on the numbers 0, 1, 2, 3, ...
     • limit of the binomial distribution for $N$ large and $p$ small, with mean $M = Np$
       – e.g., CNS cancer cases among registered cell phone users
     • probability of $k$ events: $P(Y = k) = e^{-M} \frac{M^k}{k!}$
     Example: Positive swabs for 90 individuals from 18 families
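The binomial limit can be checked numerically. The sketch below compares binomial and Poisson probabilities for a hypothetical $N$ and $p$ (the numbers are invented for illustration, not taken from the cell phone example); the function names are likewise assumptions of this sketch.

```python
import math

def poisson_pmf(k, mean):
    """P(Y = k) = exp(-M) * M**k / k! for a Poisson variable with mean M."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(Y = k) for a binomial variable with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical numbers: N = 100000 trials, p = 0.00003, so M = N * p is about 3.
n, p = 100_000, 0.00003
mean = n * p
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, mean), 5))
```

With $N$ this large and $p$ this small, the two columns should agree to several decimal places.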

  7. 5 Other types of regression

  8. Illustration of family profiles (figure; points plotted with the symbols O, U and C)

  9. We observe counts (we ignore the grouping of families here):
       $Y_{fn} \sim \mathrm{Poisson}(M_{fn})$
     Additive model, corresponding to two-way ANOVA in family and name:
       $\log(M_{fn}) = M + a_f + b_n$
     where $M$ without subscripts is the intercept, $a_f$ the family effect and $b_n$ the name effect.
       PROC GENMOD;
         CLASS family name;
         MODEL swabs=family name / DIST=POISSON LINK=LOG CL;
       RUN;

  10. The GENMOD Procedure

                  Model Information
        Data Set              WORK.A0
        Distribution          Poisson
        Link Function         Log
        Dependent Variable    swabs
        Observations Used     90
        Missing Values        1

                  Class Level Information
        Class     Levels   Values
        family    18       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
        name      5        child1 child2 child3 father mother

  11. Analysis Of Parameter Estimates

                                        Standard   Wald 95% Confidence        Chi-
        Parameter          DF  Estimate     Error         Limits            Square   Pr > ChiSq
        Intercept           1    1.5263    0.1845     1.1647     1.8879      68.43       <.0001
        family    1         1    0.4636    0.2044     0.0630     0.8641       5.14       0.0233
        family    2         1    0.9214    0.1893     0.5503     1.2925      23.68       <.0001
        family    3         1    0.4473    0.2050     0.0455     0.8492       4.76       0.0291
        ...
        family    16        1    0.2283    0.2146    -0.1923     0.6488       1.13       0.2875
        family    17        1   -0.5725    0.2666    -1.0951    -0.0499       4.61       0.0318
        family    18        0    0.0000    0.0000     0.0000     0.0000          .            .
        name      child1    1    0.3228    0.1281     0.0716     0.5739       6.34       0.0118
        name      child2    1    0.8990    0.1158     0.6721     1.1259      60.31       <.0001
        name      child3    1    0.9664    0.1147     0.7417     1.1912      71.04       <.0001
        name      father    1    0.0095    0.1377    -0.2604     0.2793       0.00       0.9451
        name      mother    0    0.0000    0.0000     0.0000     0.0000          .            .
        Scale               0    1.0000    0.0000     1.0000     1.0000

        NOTE: The scale parameter was held fixed.

  12. Interpretation of Poisson analysis:
      • The family parameters are uninteresting
      • The name parameters are interesting
      • The mothers serve as the reference group
      • The model is additive on a logarithmic scale, that is, multiplicative on the original scale
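To see what "multiplicative on the original scale" means in practice, the following Python sketch plugs three estimates from the GENMOD output (intercept, family 2, child3) into the model. The helper name `expected_swabs` is made up for this illustration, and only the levels needed for the example are included.

```python
import math

# Estimates copied from the GENMOD output (reference levels have estimate 0).
INTERCEPT = 1.5263
FAMILY = {2: 0.9214, 18: 0.0}             # family 18 is the reference family
NAME = {"child3": 0.9664, "mother": 0.0}  # mother is the reference name

def expected_swabs(family, name):
    """Expected count: exp(intercept + family effect + name effect)."""
    return math.exp(INTERCEPT + FAMILY[family] + NAME[name])

# The child3/mother ratio is the same in every family: exp(0.9664).
ratio_family_18 = expected_swabs(18, "child3") / expected_swabs(18, "mother")
ratio_family_2 = expected_swabs(2, "child3") / expected_swabs(2, "mother")
print(round(ratio_family_18, 2), round(ratio_family_2, 2))
```

Because the family effect cancels in the ratio, the name parameters can be interpreted without reference to any particular family.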

  13. Parameter estimates:

        name     estimate (CI)               ratio (CI)
        child1   0.3228 ( 0.0716, 0.5739)    1.38 (1.07, 1.78)
        child2   0.8990 ( 0.6721, 1.1259)    2.46 (1.96, 3.08)
        child3   0.9664 ( 0.7417, 1.1912)    2.63 (2.10, 3.29)
        father   0.0095 (-0.2604, 0.2793)    1.01 (0.77, 1.32)
        mother   -                           -

      Interpretation: The youngest children have a 2-3 fold higher expected number of positive swabs than their mother, i.e. a 2-3 fold increased rate of infection.
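The ratio column is obtained by exponentiating the log-scale estimates and their confidence limits. A minimal Python sketch of that arithmetic, with the numbers copied from the table and a helper name (`ratio_with_ci`) that is only an assumption of this example:

```python
import math

# Log-scale estimates and 95% limits, copied from the table above.
ESTIMATES = {
    "child1": (0.3228, 0.0716, 0.5739),
    "child2": (0.8990, 0.6721, 1.1259),
    "child3": (0.9664, 0.7417, 1.1912),
    "father": (0.0095, -0.2604, 0.2793),
}

def ratio_with_ci(estimate, lower, upper):
    """Exponentiate a log-scale estimate and its limits to get a ratio and its CI."""
    return tuple(math.exp(v) for v in (estimate, lower, upper))

for name, values in ESTIMATES.items():
    ratio, lo, hi = ratio_with_ci(*values)
    print(f"{name}: {ratio:.2f} ({lo:.2f}, {hi:.2f})")
```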

  14. Ordinal data, e.g., level of pain
      • data on a rank (ordered) scale
      • the distance between response categories is unknown or undefined
      • often there is an imaginary underlying continuous scale
      Covariates are intended to describe the probability of each response category, and the effect of each covariate is likely to be a general shift upwards or downwards (in contrast to, e.g., increasing or decreasing the probabilities of both extremes simultaneously).

  15. Possibilities based on our knowledge so far:
      • We can pretend that we are dealing with normally distributed data
        – most reasonable, of course, when there are many response categories
      • We may reduce to a two-category outcome and use logistic regression
        – but there are several possible cutpoints/thresholds
      Alternative: proportional odds

  16. Example on liver fibrosis (degree 0, 1, 2 or 3) (Julia Johansen, KKHH)
      3 blood markers related to fibrosis:
      • ha
      • ykl40
      • pIIInp
      Problem: What can we say about the degree of fibrosis from the knowledge of these 3 blood markers?

  17. The MEANS Procedure

        Variable        N          Mean       Std Dev       Minimum       Maximum
        -------------------------------------------------------------------------
        degree_fibr   129     1.4263566     0.9903850             0     3.0000000
        ykl40         129   533.5116279   602.2934049    50.0000000       4850.00
        pIIInp        127    13.4149606    12.4887192     1.7000000    70.0000000
        ha            128   318.4531250   658.9499624    21.0000000       4730.00
        -------------------------------------------------------------------------

  18. $Y_i$: the observed degree of fibrosis for the $i$'th patient.
      We wish to specify the probabilities
        $p_{ik} = P(Y_i = k), \quad k = 0, 1, 2, 3$
      and their dependence on certain covariates.
      Since $p_{i0} + p_{i1} + p_{i2} + p_{i3} = 1$, we have a total of 3 free parameters for each individual.

  19. We start by defining the cumulative probabilities from the top:
      • split between 2 and 3: model for $q_{i3} = p_{i3}$
      • split between 1 and 2: model for $q_{i2} = p_{i2} + p_{i3}$
      • split between 0 and 1: model for $q_{i1} = p_{i1} + p_{i2} + p_{i3}$
      Logistic regression model for each threshold.

  20. We start out simple, with one single blood marker $x_i$ for the $i$'th patient (here: $i = 1, \ldots, 126$).
      Proportional odds model, a model for 'cumulative logits':
        $\operatorname{logit}(q_{ik}) = \log\left(\frac{q_{ik}}{1 - q_{ik}}\right) = a_k + b \times x_i,$
      or, on the original probability scale:
        $q_{ik} = q_k(x_i) = \frac{\exp(a_k + b x_i)}{1 + \exp(a_k + b x_i)}, \quad k = 1, 2, 3$

  21. Properties of the proportional odds model:
      • the odds ratio does not depend on the cut point, only on the covariates:
          $\log\left(\frac{q_k(x_1)/(1 - q_k(x_1))}{q_k(x_2)/(1 - q_k(x_2))}\right) = b \times (x_1 - x_2)$
      • reversing the ordering of the categories only implies a change of sign for the log odds parameters
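Both properties follow directly from the model $\operatorname{logit}(q_k(x)) = a_k + b x$. A short derivation, using only the symbols already defined in the slides:

```latex
\begin{align*}
\log\frac{q_k(x_1)/(1-q_k(x_1))}{q_k(x_2)/(1-q_k(x_2))}
  &= \operatorname{logit}(q_k(x_1)) - \operatorname{logit}(q_k(x_2)) \\
  &= (a_k + b x_1) - (a_k + b x_2) \\
  &= b\,(x_1 - x_2), \\[1ex]
\operatorname{logit}(1 - q_k(x))
  &= \log\frac{1 - q_k(x)}{q_k(x)}
   = -\operatorname{logit}(q_k(x))
   = -a_k - b x .
\end{align*}
```

The intercept $a_k$ cancels in the first calculation, which is why the odds ratio is the same at every cut point. The second line shows that cumulating from the opposite end flips the sign of both $a_k$ and $b$.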

  22. Probabilities for each degree of fibrosis ($k$) can be calculated as successive differences:
        $p_3(x) = q_3(x) = \frac{\exp(a_3 + b x)}{1 + \exp(a_3 + b x)}$
        $p_k(x) = q_k(x) - q_{k+1}(x), \quad k = 0, 1, 2$
      with $q_0(x) = 1$.
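The successive differences can be sketched in Python. The intercepts and slope below are copied from the PROC LOGISTIC output for the marker ha that appears later in these slides, reading "Intercept k" as $a_k$; the function names and the choice of ha = 318 (close to the sample mean) are assumptions of this example.

```python
import math

# Fitted values copied from the LOGISTIC output for the marker ha.
A = {1: 1.0945, 2: -0.4597, 3: -2.3175}  # intercepts a_k
B = 0.00140                               # slope b for ha

def cumulative_prob(k, x):
    """q_k(x) = P(Y >= k | x) = exp(a_k + b*x) / (1 + exp(a_k + b*x))."""
    eta = A[k] + B * x
    return math.exp(eta) / (1 + math.exp(eta))

def category_probs(x):
    """p_k(x) = q_k(x) - q_{k+1}(x), with q_0 = 1 and q_4 = 0."""
    q = {0: 1.0, 1: cumulative_prob(1, x), 2: cumulative_prob(2, x),
         3: cumulative_prob(3, x), 4: 0.0}
    return {k: q[k] - q[k + 1] for k in range(4)}

# ha = 318 is close to the sample mean of ha in the MEANS output.
for k, p in category_probs(318).items():
    print(k, round(p, 3))
```

Since the intercepts increase from $a_3$ to $a_1$, the cumulative probabilities are ordered and every category probability is positive.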

  23. We start out using only the marker HA.
      The distributions are very skewed, but the model does not demand anything about the distribution of the covariates.

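  24. Proportional odds model in SAS:

        /* Read the data, skipping the header line, and drop records with a negative degree */
        DATA fibrosis;
          INFILE 'julia.tal' FIRSTOBS=2;
          INPUT id degree_fibr ykl40 pIIInp ha;
          IF degree_fibr<0 THEN DELETE;
        RUN;

        /* DESCENDING: cumulate probabilities from the highest degree downwards */
        /* CLODDS=PL: profile likelihood confidence limits for the odds ratios  */
        PROC LOGISTIC DATA=fibrosis DESCENDING;
          MODEL degree_fibr=ha / LINK=LOGIT CLODDS=PL;
        RUN;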

  25. The LOGISTIC Procedure

                  Model Information
        Data Set                     WORK.FIBROSIS
        Response Variable            degree_fibr
        Number of Response Levels    4
        Number of Observations       128
        Model                        cumulative logit
        Optimization Technique       Fisher's scoring

                  Response Profile
        Ordered                       Total
          Value    degree_fibr    Frequency
              1    3                     20
              2    2                     42
              3    1                     40
              4    0                     26

        Probabilities modeled are cumulated over the lower Ordered Values.
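The note about cumulation can be made concrete with the observed frequencies. With the DESCENDING option the lower ordered values are the higher degrees, so the modelled probabilities are $P(Y \ge k)$. The Python sketch below computes the observed versions of these cumulative proportions and their logits, ignoring the covariate; the function name is an assumption of this example.

```python
import math

# Frequencies from the Response Profile, keyed by degree of fibrosis.
FREQ = {3: 20, 2: 42, 1: 40, 0: 26}
TOTAL = sum(FREQ.values())  # 128 observations

def observed_cumulative(k):
    """Observed proportion with degree >= k (cumulated from the top)."""
    return sum(n for degree, n in FREQ.items() if degree >= k) / TOTAL

for k in (3, 2, 1):
    q = observed_cumulative(k)
    print(k, round(q, 3), round(math.log(q / (1 - q)), 3))
```

These marginal logits are not the fitted intercepts, which refer to ha = 0, but they show the three thresholds that the model describes.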

  26. Score Test for the Proportional Odds Assumption

        Chi-Square    DF    Pr > ChiSq
            5.1766     2        0.0751

      Analysis of Maximum Likelihood Estimates

                                     Standard        Wald
        Parameter      DF  Estimate     Error  Chi-Square  Pr > ChiSq
        Intercept 3     1   -2.3175    0.3113     55.4296      <.0001
        Intercept 2     1   -0.4597    0.2029      5.1349      0.0234
        Intercept 1     1    1.0945    0.2334     21.9935      <.0001
        ha              1   0.00140  0.000383     13.3099      0.0003

      Odds Ratio Estimates

                    Point        95% Wald
        Effect   Estimate   Confidence Limits
        ha          1.001     1.001     1.002

      Profile Likelihood Confidence Interval for Adjusted Odds Ratios

        Effect     Unit   Estimate   95% Confidence Limits
        ha       1.0000      1.001     1.001     1.002
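An odds ratio of 1.001 per single unit of ha is hard to read, given that ha ranges from 21 to 4730. A minimal Python sketch, assuming a Wald interval with the usual 1.96 multiplier, reproduces the per-unit odds ratio from the estimate and standard error above and rescales it to a difference of 100 units; the function name and the choice of 100 units are assumptions of this example.

```python
import math

# Estimate and standard error for ha, copied from the output above.
B_HA = 0.00140
SE_HA = 0.000383
Z = 1.96  # approximate 97.5% quantile of the standard normal

def odds_ratio(units):
    """Odds ratio and Wald 95% limits for a difference of `units` in ha."""
    lower = (B_HA - Z * SE_HA) * units
    upper = (B_HA + Z * SE_HA) * units
    return math.exp(B_HA * units), math.exp(lower), math.exp(upper)

print("per 1 unit:    %.3f (%.3f, %.3f)" % odds_ratio(1))
print("per 100 units: %.2f (%.2f, %.2f)" % odds_ratio(100))
```

In SAS itself, a UNITS statement in PROC LOGISTIC serves the same purpose.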
