Analysis of variance and regression: Other types of regression models

Other types of regression models
• Counts: Poisson models
• Ordinal data: Proportional odds models
• Survival analysis (censored, time-to-event data): Cox proportional hazards model
• (Other types of censored data)

Other types of regression

Until now, we have been looking at
• regression for normally distributed data, where the parameters describe
  – differences between groups
  – the expected difference in outcome for one unit's difference in an explanatory variable
• regression for binary data (logistic regression), where the parameters describe
  – odds ratios for one unit's difference in an explanatory variable

What about something 'in between'?
• counts (Poisson distribution)
  – number of cancer cases in each municipality per year
  – number of positive pneumococcal swabs
• ordered categorical variables with more than 2 categories, e.g.,
  – degree of pain (none/mild/moderate/serious)
  – degree of liver fibrosis

Generalised linear models: multiple regression models on a scale suitable for the data.
• Mean: $\mu$
• Link function: $g(\mu)$ is linear in the covariates, that is,
$$g(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$
Some standard distributions (and link functions):
• Normal distribution (link=IDENTITY): the general linear model
• Binomial distribution (link=LOGIT): logistic regression
• Poisson distribution (link=LOG)
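A minimal Python sketch (Python is our choice here; the lecture's own code is SAS) of how a link function ties the linear predictor to the mean: the model is linear in $g(\mu)$, and the mean is recovered with the inverse link $g^{-1}$. The coefficient values are hypothetical, chosen only for illustration.

```python
import math

# Each entry: (link g, inverse link g^{-1}) for the three standard GLMs above.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),               # normal
    "logit": (lambda mu: math.log(mu / (1 - mu)),                # binomial
              lambda eta: 1 / (1 + math.exp(-eta))),
    "log": (math.log, math.exp),                                 # Poisson
}

def linear_predictor(beta, x):
    """eta = beta_0 + beta_1 * x_1 + ... + beta_k * x_k."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))

def mean_from_covariates(link, beta, x):
    """mu = g^{-1}(eta): the mean implied by the linear predictor."""
    _, inverse = LINKS[link]
    return inverse(linear_predictor(beta, x))

beta = [0.5, 0.2]  # hypothetical coefficients, not taken from the lecture
for name in LINKS:
    print(name, mean_from_covariates(name, beta, [1.0]))
```

The same linear predictor (here 0.7) gives a mean of 0.7, a probability between 0 and 1, or a positive expected count, depending on the link.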

Poisson distribution:
• distribution on the numbers 0, 1, 2, 3, ...
• limit of the binomial distribution for $N$ large and $p$ small, with mean $\mu = Np$
  – e.g., CNS cancer cases among registered cell phone users
• probability of $k$ events:
$$P(Y = k) = \frac{e^{-\mu}\mu^k}{k!}$$
Example: Positive swabs for 90 individuals from 18 families
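A minimal Python sketch of the Poisson probability formula and of the binomial limit it comes from. The values N = 10,000 and p = 0.0003 (so mu = 3) are arbitrary illustrative choices, not data from the lecture.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) = exp(-mu) * mu**k / k! for a Poisson(mu) variable."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(Y = k) for a Binomial(n, p) variable."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Binomial(N, p) with N large and p small is close to Poisson(mu = N * p).
N, p = 10_000, 0.0003   # illustrative values only
mu = N * p
for k in range(6):
    print(k, round(binomial_pmf(k, N, p), 4), round(poisson_pmf(k, mu), 4))
```

The two columns agree to about three decimals, which is the sense in which the Poisson distribution is the limit of the binomial.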


[Figure: Illustration of family profiles]

We observe counts (we ignore the grouping of families here):
$$Y_{fn} \sim \text{Poisson}(\mu_{fn})$$
Additive model, corresponding to a two-way ANOVA in family and name:
$$\log(\mu_{fn}) = \mu + \alpha_f + \beta_n$$

PROC GENMOD;
  CLASS family name;   /* both explanatory variables are categorical */
  MODEL swabs=family name / DIST=POISSON LINK=LOG CL;   /* Poisson distribution, log link */
RUN;

The GENMOD Procedure

Model Information
  Data Set              WORK.A0
  Distribution          Poisson
  Link Function         Log
  Dependent Variable    swabs
  Observations Used     90
  Missing Values        1

Class Level Information
  Class    Levels   Values
  family   18       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
  name     5        child1 child2 child3 father mother

Analysis Of Parameter Estimates
Parameter     DF  Estimate  Std Error  Wald 95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept      1    1.5263     0.1845     1.1647     1.8879            68.43      <.0001
family 1       1    0.4636     0.2044     0.0630     0.8641             5.14      0.0233
family 2       1    0.9214     0.1893     0.5503     1.2925            23.68      <.0001
family 3       1    0.4473     0.2050     0.0455     0.8492             4.76      0.0291
...
family 16      1    0.2283     0.2146    -0.1923     0.6488             1.13      0.2875
family 17      1   -0.5725     0.2666    -1.0951    -0.0499             4.61      0.0318
family 18      0    0.0000     0.0000     0.0000     0.0000              .          .
name child1    1    0.3228     0.1281     0.0716     0.5739             6.34      0.0118
name child2    1    0.8990     0.1158     0.6721     1.1259            60.31      <.0001
name child3    1    0.9664     0.1147     0.7417     1.1912            71.04      <.0001
name father    1    0.0095     0.1377    -0.2604     0.2793             0.00      0.9451
name mother    0    0.0000     0.0000     0.0000     0.0000              .          .
Scale          0    1.0000     0.0000     1.0000     1.0000
NOTE: The scale parameter was held fixed.
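The Wald columns can be reproduced from the estimate and its standard error: the limits are estimate ± 1.96 × SE, and the chi-square is (estimate / SE)² on 1 degree of freedom. A small Python sketch, checked against the name child1 row; the last digit may differ from the SAS output because SAS works with unrounded estimates.

```python
from statistics import NormalDist

def wald_summary(estimate, se, level=0.95):
    """Wald confidence limits, Wald chi-square and two-sided p-value."""
    z = NormalDist().inv_cdf((1 + level) / 2)          # about 1.96 for 95%
    lower, upper = estimate - z * se, estimate + z * se
    chisq = (estimate / se) ** 2                        # 1 degree of freedom
    # two-sided p-value: P(|Z| > |estimate / se|) for a standard normal Z
    p_value = 2 * (1 - NormalDist().cdf(abs(estimate / se)))
    return lower, upper, chisq, p_value

# name child1 row of the GENMOD output: estimate 0.3228, standard error 0.1281
print(wald_summary(0.3228, 0.1281))
```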

Interpretation of the Poisson analysis:
• The family parameters are uninteresting
• The name parameters are interesting
• The mothers serve as the reference group
• The model is additive on a logarithmic scale, that is, multiplicative on the original scale
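To see what "multiplicative on the original scale" means, this sketch plugs the printed estimates into $\mu_{fn} = \exp(\mu + \alpha_f + \beta_n)$ for the families shown in the GENMOD output (families 4 to 15 are elided there, so they are left out). The expected counts differ between families, but the ratio child3/mother equals $\exp(\beta_{child3})$ in every family.

```python
import math

# Estimates copied from the GENMOD output (reference levels family 18 and
# name mother are fixed at 0). Only the families printed there are included.
INTERCEPT = 1.5263
FAMILY = {1: 0.4636, 2: 0.9214, 3: 0.4473, 16: 0.2283, 17: -0.5725, 18: 0.0}
NAME = {"child1": 0.3228, "child2": 0.8990, "child3": 0.9664,
        "father": 0.0095, "mother": 0.0}

def fitted_mean(family, name):
    """Expected number of positive swabs: exp(mu + alpha_f + beta_n)."""
    return math.exp(INTERCEPT + FAMILY[family] + NAME[name])

# Additive on the log scale, so the child3 / mother ratio is the same everywhere.
for f in FAMILY:
    ratio = fitted_mean(f, "child3") / fitted_mean(f, "mother")
    print(f, round(fitted_mean(f, "mother"), 2), round(ratio, 3))
```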

Parameter estimates:
name     estimate (95% CI)            ratio (95% CI)
child1   0.3228 (0.0716, 0.5739)      1.38 (1.07, 1.78)
child2   0.8990 (0.6721, 1.1259)      2.46 (1.96, 3.08)
child3   0.9664 (0.7417, 1.1912)      2.63 (2.10, 3.29)
father   0.0095 (-0.2604, 0.2793)     1.01 (0.77, 1.32)
mother   -                            -
The ratio is exp(estimate), and its confidence limits are the exponentiated limits for the estimate.
Interpretation: The youngest children have a 2-3 fold higher expected number of positive swabs than their mother.
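The ratio column comes from exponentiating the estimate and both confidence limits. A short Python sketch that redoes this for the table above:

```python
import math

# (estimate, lower, upper) on the log scale, copied from the table above
ESTIMATES = {
    "child1": (0.3228, 0.0716, 0.5739),
    "child2": (0.8990, 0.6721, 1.1259),
    "child3": (0.9664, 0.7417, 1.1912),
    "father": (0.0095, -0.2604, 0.2793),
}

def to_ratio_scale(est, lower, upper):
    """Exponentiate a log-scale estimate and its confidence limits."""
    return tuple(math.exp(v) for v in (est, lower, upper))

for name, values in ESTIMATES.items():
    ratio, lo, hi = to_ratio_scale(*values)
    print(f"{name}: {ratio:.2f} ({lo:.2f}, {hi:.2f})")
```

Exponentiating the limits (rather than computing estimate ± 1.96 × SE on the ratio scale) keeps the interval asymmetric and strictly positive, as a ratio should be.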

Ordinal data, e.g., level of pain
• data on a rank (ordered) scale
• the distance between response categories is unknown or undefined
• often there is an imaginary underlying continuous scale
Covariates are intended to describe the probability of each response category, and the effect of each covariate is likely to be a general shift upwards or downwards (in contrast to, e.g., increasing or decreasing the probabilities of both extremes simultaneously).

Possibilities based on our knowledge so far:
• We can pretend that we are dealing with normally distributed data
  – most reasonable, of course, when there are many response categories
• We may reduce the outcome to two categories and use logistic regression
  – but there are several possible cutpoints (thresholds)
Alternative: proportional odds

Example on liver fibrosis (degree 0, 1, 2 or 3) (Julia Johansen, KKHH)
Three blood markers related to fibrosis:
• ha
• ykl40
• pIIInp
Problem: What can we say about the degree of fibrosis from knowledge of these three blood markers?

The MEANS Procedure
Variable       N    Mean          Std Dev       Minimum       Maximum
degree_fibr   129   1.4263566     0.9903850     0             3.0000000
ykl40         129   533.5116279   602.2934049   50.0000000    4850.00
pIIInp        127   13.4149606    12.4887192    1.7000000     70.0000000
ha            128   318.4531250   658.9499624   21.0000000    4730.00

$Y_i$: the observed degree of fibrosis for the $i$'th patient. We wish to specify the probabilities
$$p_{ik} = P(Y_i = k), \quad k = 0, 1, 2, 3$$
and their dependence on certain covariates. Since $p_{i0} + p_{i1} + p_{i2} + p_{i3} = 1$, we have a total of 3 free parameters for each individual.

We start by defining the cumulative probabilities from the top:
• split between 2 and 3: model for $q_{i3} = p_{i3}$
• split between 1 and 2: model for $q_{i2} = p_{i2} + p_{i3}$
• split between 0 and 1: model for $q_{i1} = p_{i1} + p_{i2} + p_{i3}$
A logistic regression model is used for each threshold.
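A sketch of the bookkeeping between the category probabilities $p_k$ and the cumulative probabilities from the top, $q_k = P(Y \geq k)$. As example input it uses the observed frequencies of degree 0, 1, 2, 3 (26, 40, 42 and 20 of 128 patients, from the Response Profile in the PROC LOGISTIC output); the helper names are our own.

```python
def cumulative_from_top(p):
    """Given p = [p0, p1, p2, p3] summing to 1, return {k: q_k} for k = 1..3,
    where q_k = P(Y >= k) = p_k + ... + p_3."""
    return {k: sum(p[k:]) for k in range(1, len(p))}

def probabilities_from_cumulative(q, n_levels=4):
    """Invert the above: p_k = q_k - q_{k+1}, using q_0 = 1 and q_4 = 0."""
    full = {0: 1.0, **q, n_levels: 0.0}
    return [full[k] - full[k + 1] for k in range(n_levels)]

# Observed relative frequencies of degree 0, 1, 2, 3 among 128 patients
p = [26 / 128, 40 / 128, 42 / 128, 20 / 128]
q = cumulative_from_top(p)
print(q)
print(probabilities_from_cumulative(q))
```

Only the three $q_k$ need to be modelled: $q_0 = 1$ always, which is why there are 3 free parameters per individual.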

We start out simple, with one single blood marker $x_i$ for the $i$'th patient (here: $i = 1, \ldots, 126$). Proportional odds model, a model for 'cumulative logits':
$$\operatorname{logit}(q_{ik}) = \log\left(\frac{q_{ik}}{1 - q_{ik}}\right) = \alpha_k + \beta \times x_i,$$
or, on the original probability scale:
$$q_{ik} = q_k(x_i) = \frac{\exp(\alpha_k + \beta x_i)}{1 + \exp(\alpha_k + \beta x_i)}, \quad k = 1, 2, 3$$
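A minimal Python sketch of the cumulative logit model for one covariate. The intercepts $\alpha_1, \alpha_2, \alpha_3$ and the slope $\beta$ are hypothetical values, chosen only so that the output is easy to check.

```python
import math

def expit(eta):
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(q):
    """log(q / (1 - q))."""
    return math.log(q / (1.0 - q))

def cumulative_probs(alphas, beta, x):
    """q_k(x) = P(Y >= k | x) = expit(alpha_k + beta * x) for each threshold k.

    Each threshold has its own intercept alpha_k, while the slope beta is
    shared by all thresholds. The alphas must decrease in k so that
    q_1(x) >= q_2(x) >= q_3(x).
    """
    return {k: expit(a + beta * x) for k, a in alphas.items()}

# Hypothetical parameter values, for illustration only
alphas = {1: 1.0, 2: -0.5, 3: -2.0}
q = cumulative_probs(alphas, beta=0.5, x=1.0)
print(q)
```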

Properties of the proportional odds model:
• the odds ratio does not depend on the cut point, only on the covariates:
$$\log\left(\frac{q_k(x_1)/(1 - q_k(x_1))}{q_k(x_2)/(1 - q_k(x_2))}\right) = \beta \times (x_1 - x_2)$$
• reversing the ordering of the categories only changes the sign of the log odds parameters
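Both properties can be checked numerically. Whatever threshold $k$ is used, the log odds ratio between two covariate values equals $\beta(x_1 - x_2)$ because $\alpha_k$ cancels; and the reversed ordering models $1 - q_k(x)$, whose logit is $-(\alpha_k + \beta x)$. A sketch with hypothetical parameter values:

```python
import math

def expit(eta):
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_odds_ratio(alpha_k, beta, x1, x2):
    """log[odds(q_k(x1)) / odds(q_k(x2))], computed from the probabilities."""
    q1, q2 = expit(alpha_k + beta * x1), expit(alpha_k + beta * x2)
    return math.log((q1 / (1 - q1)) / (q2 / (1 - q2)))

def reversed_logit(alpha_k, beta, x):
    """Logit of 1 - q_k(x) = P(Y <= k - 1 | x), i.e. the reversed ordering."""
    q = expit(alpha_k + beta * x)
    return math.log((1 - q) / q)

# Hypothetical values: beta = 0.5, x1 - x2 = 3, so every threshold gives 1.5
alphas = {1: 1.0, 2: -0.5, 3: -2.0}
for k, a in alphas.items():
    print(k, round(log_odds_ratio(a, 0.5, 4.0, 1.0), 10))
```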

Probabilities for each degree of fibrosis ($k$) can be calculated as successive differences:
$$p_3(x) = q_3(x) = \frac{\exp(\alpha_3 + \beta x)}{1 + \exp(\alpha_3 + \beta x)}$$
$$p_k(x) = q_k(x) - q_{k+1}(x), \quad k = 0, 1, 2$$
where $q_0(x) = 1$, since every patient has degree at least 0.
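A sketch of the successive-difference calculation. Besides $q_0(x) = 1$, it appends $q_4(x) = 0$ as a convenience of our own so that $p_3 = q_3 - q_4$ fits the same formula; the parameter values are hypothetical.

```python
import math

def expit(eta):
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def category_probs(alphas, beta, x):
    """p_k(x) for k = 0..K as successive differences of q_k(x) = P(Y >= k).

    alphas = [alpha_1, ..., alpha_K] must be decreasing so that all p_k >= 0.
    q_0 = 1 and q_{K+1} = 0 close the sequence.
    """
    q = [1.0] + [expit(a + beta * x) for a in alphas] + [0.0]
    return [q[k] - q[k + 1] for k in range(len(alphas) + 1)]

probs = category_probs([1.0, -0.5, -2.0], 0.5, 1.0)   # hypothetical values
print([round(p, 3) for p in probs])
```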

We start out using only the marker HA. The distributions are very skewed, but the model does not require anything of the covariate distributions!?

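Proportional odds model in SAS:

DATA fibrosis;
  INFILE 'julia.tal' FIRSTOBS=2;   /* start reading at line 2 of the file */
  INPUT id degree_fibr ykl40 pIIInp ha;
  IF degree_fibr<0 THEN DELETE;    /* also drops missing values, which SAS treats as smaller than any number */
RUN;

PROC LOGISTIC DATA=fibrosis DESCENDING;   /* DESCENDING: model P(degree_fibr >= k) */
  MODEL degree_fibr=ha / LINK=LOGIT CLODDS=PL;   /* CLODDS=PL: profile-likelihood CI for the odds ratio */
RUN;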

The LOGISTIC Procedure

Model Information
  Data Set                    WORK.FIBROSIS
  Response Variable           degree_fibr
  Number of Response Levels   4
  Number of Observations      128
  Model                       cumulative logit
  Optimization Technique      Fisher's scoring

Response Profile
  Ordered Value   degree_fibr   Total Frequency
  1               3             20
  2               2             42
  3               1             40
  4               0             26

Probabilities modeled are cumulated over the lower Ordered Values.

Score Test for the Proportional Odds Assumption
  Chi-Square   DF   Pr > ChiSq
  5.1766       2    0.0751

Analysis of Maximum Likelihood Estimates
  Parameter     DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept 3   1    -2.3175    0.3113           55.4296           <.0001
  Intercept 2   1    -0.4597    0.2029            5.1349           0.0234
  Intercept 1   1     1.0945    0.2334           21.9935           <.0001
  ha            1     0.00140   0.000383         13.3099           0.0003

Odds Ratio Estimates
  Effect   Point Estimate   95% Wald Confidence Limits
  ha       1.001            1.001   1.002

Profile Likelihood Confidence Interval for Adjusted Odds Ratios
  Effect   Unit     Estimate   95% Confidence Limits
  ha       1.0000   1.001      1.001   1.002
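A sketch that turns these estimates into predicted probabilities for each degree of fibrosis. It assumes (our reading of the output, consistent with "cumulated over the lower Ordered Values" under DESCENDING) that the intercept labelled $k$ belongs to $q_k = P(Y \geq k)$. It also shows why the printed odds ratio of 1.001 looks small: it refers to one unit of HA, which ranges from 21 to 4730; per 100 units the odds ratio is $\exp(100 \times 0.0014) \approx 1.15$.

```python
import math

def expit(eta):
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

# Estimates from the PROC LOGISTIC output, keyed by the intercept label k
ALPHA = {1: 1.0945, 2: -0.4597, 3: -2.3175}
BETA_HA = 0.00140

def fibrosis_probs(ha):
    """Estimated P(degree = 0, 1, 2, 3) for a given HA value."""
    q = {0: 1.0, **{k: expit(a + BETA_HA * ha) for k, a in ALPHA.items()}, 4: 0.0}
    return [q[k] - q[k + 1] for k in range(4)]

for ha in (50, 318, 1000):   # 318 is close to the sample mean of HA
    print(ha, [round(p, 3) for p in fibrosis_probs(ha)])

# Odds ratio for 1 unit and for 100 units of HA
print(round(math.exp(BETA_HA), 4), round(math.exp(100 * BETA_HA), 3))
```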

