
Analysis of variance and regression November 27, 2007

Other types of regression models
• Counts (Poisson models)
• Ordinal data
  – proportional odds models
  – model control
  – model interpretation
• Survival analysis

Lene Theil Skovgaard
Dept. of Biostatistics, Institute of Public Health, University of Copenhagen
e-mail: L.T.Skovgaard@biostat.ku.dk
http://staff.pubhealth.ku.dk/~lts/regression07_2

Other types of regression, November 2007

Until now, we have been looking at
• regression for normally distributed data, where the parameters describe
  – differences between groups
  – the effect of a one-unit increase in an explanatory variable
• regression for binary data (logistic regression), where the parameters describe
  – odds ratios for a one-unit increase in an explanatory variable

What about something 'in between'?
• counts (Poisson distribution)
  – number of cancer cases in each municipality per year
  – number of positive pneumococcal swabs
• categorical variables with more than 2 categories, e.g.
  – degree of pain (none/mild/moderate/serious)
  – degree of liver fibrosis
• non-normal quantitative measurements
  – censored data, survival analysis

Generalised linear models: multiple regression models on a scale suitable for the data.
• Mean: $\mu$
• Link function: $g(\mu)$, linear in the covariates, i.e.
$$g(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$
An important class of distributions for these models is the exponential families, including
• Normal distribution (link = identity): the general linear model
• Binomial distribution (link = logit): logistic regression
• Poisson distribution (link = log)
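To make the role of the link concrete, here is a minimal Python sketch (the `LINKS` table and the `mean_from_covariates` helper are hypothetical names for illustration, not part of SAS or any library): the same linear predictor is mapped back to a mean through the inverse of each of the three links listed above.

```python
import math

# Hypothetical helper table: link name -> (g, g_inverse).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),                 # normal
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),               # binomial
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "log": (math.log, math.exp),                                  # Poisson
}


def mean_from_covariates(link, beta, x):
    """Return mu = g^{-1}(beta_0 + beta_1*x_1 + ... + beta_k*x_k)."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    _, g_inverse = LINKS[link]
    return g_inverse(eta)


# Same linear predictor (eta = 0.5 + 0.25 * 2 = 1.0), three different means.
for link in LINKS:
    print(link, mean_from_covariates(link, [0.5, 0.25], [2.0]))
```

The linear predictor is identical in all three cases; only the scale on which it acts on the mean differs.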

Poisson distribution:
• a distribution on the numbers 0, 1, 2, 3, ...
• the limit of the Binomial distribution for N large and p small, with mean $\mu = Np$
  – e.g. cancer events in a certain region
• probability of k events:
$$P(Y = k) = \frac{e^{-\mu}\mu^k}{k!}$$
Example: positive swabs for 90 individuals from 18 families
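A small Python sketch of the probability formula above, together with a numerical check of the binomial limit (the function names are hypothetical; `math.comb` needs Python 3.8 or later). With the mean $Np$ held fixed, the binomial probabilities approach the Poisson ones as $N$ grows.

```python
import math


def poisson_pmf(k, mu):
    """P(Y = k) = exp(-mu) * mu**k / k! for Y ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Keep the mean N * p = mu fixed and let N grow (p = mu / N shrinks):
# the binomial probabilities approach the Poisson ones.
mu = 2.0
for n in (10, 100, 10_000):
    print(n, binomial_pmf(3, n, mu / n), poisson_pmf(3, mu))
```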


[Figure: Illustration of family profiles (we ignore the grouping of families here)]

We observe counts $y_{fn} \sim \text{Poisson}(\mu_{fn})$.
Additive model, corresponding to a two-way ANOVA in family and name:
$$\log(\mu_{fn}) = \mu + \alpha_f + \beta_n$$

proc genmod;
  class family name;
  model swabs=family name / dist=poisson link=log cl;
run;
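For readers who want to see what such a fit involves, the following is a minimal sketch, assuming NumPy, of fitting a Poisson log-linear model by iteratively reweighted least squares (Fisher scoring). It is not a description of how PROC GENMOD is implemented internally, but for the log link it converges to the same maximum likelihood estimates. The function name and the 2 × 3 table of counts are hypothetical; the table is chosen to be exactly multiplicative, so the additive model on the log scale fits it perfectly.

```python
import numpy as np


def fit_poisson_log(X, y, max_iter=100, tol=1e-10):
    """Fit log(mu) = X @ beta for Poisson counts y by iteratively
    reweighted least squares (Fisher scoring for the log link).

    Returns the estimates and their Wald standard errors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                 # starting values; the 0.5 avoids log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu  # working response for the log link
        XtW = X.T * mu           # X^T W with working weights W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    cov = np.linalg.inv((X.T * mu) @ X)  # inverse Fisher information
    return beta, np.sqrt(np.diag(cov))


# Hypothetical 2 x 3 table of counts that is exactly multiplicative;
# rows play the role of "family", columns the role of "name".
counts = np.array([[2, 3, 4],
                   [4, 6, 8]])
rows, cols = np.indices(counts.shape).reshape(2, -1)

# Reference coding: the last level of each factor is the reference,
# as family 18 and mother are in the GENMOD output.
X = np.column_stack([
    np.ones(rows.size),           # intercept
    (rows == 0).astype(float),    # row 0 versus row 1
    (cols == 0).astype(float),    # column 0 versus column 2
    (cols == 1).astype(float),    # column 1 versus column 2
])
beta, se = fit_poisson_log(X, counts.ravel())
print(np.exp(beta))   # approximately [8, 0.5, 0.5, 0.75]
```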

The GENMOD Procedure

Model Information
Data Set             WORK.A0
Distribution         Poisson
Link Function        Log
Dependent Variable   swabs
Observations Used    90
Missing Values       1

Class Level Information
Class    Levels   Values
family       18   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
name          5   child1 child2 child3 father mother

Analysis Of Parameter Estimates

                               Standard Wald 95% Confidence     Chi-
Parameter         DF  Estimate    Error        Limits         Square  Pr > ChiSq
Intercept          1    1.5263   0.1845    1.1647    1.8879    68.43      <.0001
family 1           1    0.4636   0.2044    0.0630    0.8641     5.14      0.0233
family 2           1    0.9214   0.1893    0.5503    1.2925    23.68      <.0001
family 3           1    0.4473   0.2050    0.0455    0.8492     4.76      0.0291
...
family 16          1    0.2283   0.2146   -0.1923    0.6488     1.13      0.2875
family 17          1   -0.5725   0.2666   -1.0951   -0.0499     4.61      0.0318
family 18          0    0.0000   0.0000    0.0000    0.0000        .           .
name child1        1    0.3228   0.1281    0.0716    0.5739     6.34      0.0118
name child2        1    0.8990   0.1158    0.6721    1.1259    60.31      <.0001
name child3        1    0.9664   0.1147    0.7417    1.1912    71.04      <.0001
name father        1    0.0095   0.1377   -0.2604    0.2793     0.00      0.9451
name mother        0    0.0000   0.0000    0.0000    0.0000        .           .
Scale              0    1.0000   0.0000    1.0000    1.0000

NOTE: The scale parameter was held fixed.
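The Wald columns of the table follow directly from the estimate and its standard error: the limits are estimate ± 1.96 × SE, and the chi-square is (estimate/SE)² on 1 degree of freedom. A short Python check on three of the printed rows (the helper name is hypothetical); small differences in the last digit are expected, because the printed estimates and standard errors are rounded.

```python
import math

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution


def wald_summary(estimate, std_error):
    """Wald 95% limits, Wald chi-square (1 df) and its p-value."""
    lower = estimate - Z_975 * std_error
    upper = estimate + Z_975 * std_error
    chi_square = (estimate / std_error) ** 2
    # P(chi^2_1 > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2))
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return lower, upper, chi_square, p_value


# Rows from the table above: (label, estimate, standard error).
for label, est, se in [("Intercept", 1.5263, 0.1845),
                       ("family 1", 0.4636, 0.2044),
                       ("name child1", 0.3228, 0.1281)]:
    lo, hi, chi2, p = wald_summary(est, se)
    print(f"{label}: ({lo:.4f}, {hi:.4f})  chi2 = {chi2:.2f}  p = {p:.4f}")
```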

Interpretation of the Poisson analysis:
• The family parameters are of no interest in themselves; they only adjust for differences between families
• The name parameters are the interesting ones
• The mothers serve as the reference group
• The model is additive on a logarithmic scale, i.e. multiplicative on the original scale
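The multiplicative structure can be checked with a few of the printed estimates: on the original scale the fitted mean is a product of an overall factor, a family factor and a name factor, so ratios between names do not depend on the family. A sketch in Python (the constant and function names are hypothetical; only the numbers come from the output above):

```python
import math

# A few log-scale estimates copied from the GENMOD output above.
# Reference levels (estimate 0): family 18 and mother.
INTERCEPT = 1.5263
FAMILY = {1: 0.4636, 2: 0.9214, 17: -0.5725, 18: 0.0}
NAME = {"child1": 0.3228, "child3": 0.9664, "father": 0.0095, "mother": 0.0}


def fitted_mean(family, name):
    """exp(mu + alpha_f + beta_n), written as a product of three factors."""
    return math.exp(INTERCEPT) * math.exp(FAMILY[family]) * math.exp(NAME[name])


# Additive on the log scale => the child3/mother ratio is the same
# in every family, namely exp(0.9664).
for f in FAMILY:
    print(f, fitted_mean(f, "child3") / fitted_mean(f, "mother"))
```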

Parameter estimates (reference: mother):

name     estimate (95% CI)           ratio (95% CI)
child1   0.3228 (0.0716, 0.5739)     1.38 (1.07, 1.78)
child2   0.8990 (0.6721, 1.1259)     2.46 (1.96, 3.08)
child3   0.9664 (0.7417, 1.1912)     2.63 (2.10, 3.29)
father   0.0095 (-0.2604, 0.2793)    1.01 (0.77, 1.32)
mother   -                           -

Interpretation: the youngest children (child2 and child3) have a 2-3 fold higher expected number of positive swabs than their mother.
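The ratio column is obtained by exponentiating the estimate and both confidence limits; note that the resulting intervals are not symmetric around the ratio. A minimal Python sketch reproducing the table (the helper name is hypothetical):

```python
import math

# Estimates and Wald 95% limits for name (reference: mother), from the output above.
estimates = {
    "child1": (0.3228, 0.0716, 0.5739),
    "child2": (0.8990, 0.6721, 1.1259),
    "child3": (0.9664, 0.7417, 1.1912),
    "father": (0.0095, -0.2604, 0.2793),
}


def to_ratio_scale(est, lower, upper):
    """Back-transform a log-scale estimate and its limits by exponentiation."""
    return math.exp(est), math.exp(lower), math.exp(upper)


for name, values in estimates.items():
    ratio, lo, hi = to_ratio_scale(*values)
    print(f"{name}: {ratio:.2f} ({lo:.2f}, {hi:.2f})")
```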

Ordinal data, e.g. level of pain
• data on a rank scale
• the distance between response categories is not known / is undefined
• often an imaginary (latent) underlying quantitative scale
The model must describe how the covariates affect the probability of each single response category.

We are faced with a dilemma:
• We may reduce the outcome to a binary one and use logistic regression
  – but there are several possible 'cuts'/thresholds
• We can 'pretend' that we are dealing with normally distributed data
  – of course most reasonable when there are many response categories

Example: liver fibrosis (degree 0, 1, 2 or 3) (Julia Johansen, KKHH)
3 blood markers related to fibrosis:
• HA
• YKL40
• PIIINP
Problem: What can we say about the degree of fibrosis from knowledge of these 3 blood markers?

The MEANS Procedure

Variable       N         Mean      Std Dev      Minimum      Maximum
--------------------------------------------------------------------
degree_fibr  129    1.4263566    0.9903850            0    3.0000000
ykl40        129  533.5116279  602.2934049   50.0000000      4850.00
piiinp       127   13.4149606   12.4887192    1.7000000   70.0000000
ha           128  318.4531250  658.9499624   21.0000000      4730.00
--------------------------------------------------------------------

We start out simply, with a single blood marker $x_p$ for the $p$'th patient (here: $p = 1, \dots, 126$).
$Y_p$: the observed degree of fibrosis for the $p$'th patient.
We wish to specify the probabilities
$$\pi_{pk} = P(Y_p = k), \quad k = 0, 1, 2, 3,$$
and their dependence on certain covariates. Since $\pi_{p0} + \pi_{p1} + \pi_{p2} + \pi_{p3} = 1$, we have a total of 3 free parameters for each individual.

We start by defining the cumulative probabilities 'from the top':
• divide between 2 and 3: model for $\gamma_{p3} = \pi_{p3}$
• divide between 1 and 2: model for $\gamma_{p2} = \pi_{p2} + \pi_{p3}$
• divide between 0 and 1: model for $\gamma_{p1} = \pi_{p1} + \pi_{p2} + \pi_{p3}$
This gives one logistic regression for each threshold.
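A small Python sketch of the cumulative probabilities 'from the top' and of the binary response used at each threshold (the function names and the probabilities for the single patient are hypothetical):

```python
def cumulative_from_top(pi):
    """Map [pi_0, pi_1, pi_2, pi_3] to {k: gamma_k} with
    gamma_k = P(Y >= k) = pi_k + ... + pi_3, for k = 1, 2, 3."""
    return {k: sum(pi[k:]) for k in range(1, len(pi))}


def threshold_indicators(degree, n_levels=4):
    """Binary responses 1{Y >= k}, k = 1, ..., n_levels - 1: one per cut,
    i.e. the outcome of each of the separate logistic regressions."""
    return [int(degree >= k) for k in range(1, n_levels)]


# Hypothetical category probabilities for one patient (they sum to 1).
pi = [0.20, 0.30, 0.35, 0.15]
print(cumulative_from_top(pi))      # gamma_1 >= gamma_2 >= gamma_3
print(threshold_indicators(2))      # degree 2: above cuts 1 and 2, not 3
```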

Proportional odds model, a model for 'cumulative logits':
$$\operatorname{logit}(\gamma_{pk}) = \log\left(\frac{\gamma_{pk}}{1 - \gamma_{pk}}\right) = \alpha_k + \beta x_p,$$
or, on the original probability scale:
$$\gamma_{pk} = \gamma_k(x_p) = \frac{\exp(\alpha_k + \beta x_p)}{1 + \exp(\alpha_k + \beta x_p)}, \quad k = 1, 2, 3$$
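The model can be written as a few lines of Python (a sketch; the function names and the parameter values are hypothetical, not estimates from the fibrosis data). Because the same $\beta$ is used for every cut, the curves $\gamma_k(x)$ are ordered only if $\alpha_1 \ge \alpha_2 \ge \alpha_3$.

```python
import math


def expit(eta):
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def cumulative_probs(x, alpha, beta):
    """gamma_k(x) = expit(alpha_k + beta * x) for each cut k in alpha.

    The covariate effect beta is shared by all cuts (proportional odds);
    gamma_1 >= gamma_2 >= gamma_3 requires alpha_1 >= alpha_2 >= alpha_3.
    """
    return {k: expit(a_k + beta * x) for k, a_k in alpha.items()}


# Hypothetical parameter values, not estimates from the fibrosis data.
alpha = {1: 1.2, 2: 0.1, 3: -1.7}
beta = 0.8
print(cumulative_probs(0.5, alpha, beta))
```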

Properties of the proportional odds model:
• odds ratios do not depend on the cutpoint, only on the covariates:
$$\log\left(\frac{\gamma_k(x_1)/(1 - \gamma_k(x_1))}{\gamma_k(x_2)/(1 - \gamma_k(x_2))}\right) = \beta (x_1 - x_2)$$
• changing the ordering of the categories only implies a change of sign for the parameters
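Both properties can be checked numerically. The sketch below (hypothetical parameter values, not fibrosis estimates) verifies that the log odds ratio is $\beta(x_1 - x_2)$ at every cut, and that reversing the categories gives the same model with $\beta$ replaced by $-\beta$; the cut intercepts also change sign and are taken in reverse order, $\alpha'_k = -\alpha_{4-k}$.

```python
import math


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    return math.log(p / (1.0 - p))


# Hypothetical parameters (not estimates from the fibrosis data).
alpha = {1: 1.2, 2: 0.1, 3: -1.7}
beta = 0.8
x1, x2 = 3.0, 1.0

# 1) The log odds ratio between x1 and x2 is beta * (x1 - x2) = 1.6
#    at every cut k.
log_odds_ratios = {
    k: logit(expit(a_k + beta * x1)) - logit(expit(a_k + beta * x2))
    for k, a_k in alpha.items()
}
print(log_odds_ratios)

# 2) Reversing the categories (Y' = 3 - Y) gives
#    P(Y' >= k) = 1 - P(Y >= 4 - k) = expit(-alpha_{4-k} - beta * x),
#    i.e. the same model with beta' = -beta and alpha'_k = -alpha_{4-k}.
alpha_reversed = {k: -alpha[4 - k] for k in alpha}
beta_reversed = -beta
for k in alpha:
    direct = 1.0 - expit(alpha[4 - k] + beta * x1)
    via_model = expit(alpha_reversed[k] + beta_reversed * x1)
    print(k, direct, via_model)
```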

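Probabilities for each degree of fibrosis ($k$) can be calculated as successive differences:
$$\pi_3(x) = \gamma_3(x) = \frac{\exp(\alpha_3 + \beta x)}{1 + \exp(\alpha_3 + \beta x)}$$
$$\pi_k(x) = \gamma_k(x) - \gamma_{k+1}(x), \quad k = 0, 1, 2,$$
with the convention $\gamma_0(x) = 1$.
The cumulative probabilities $\gamma_k(x)$ are logistic curves; the probabilities of the middle categories ($\pi_1$, $\pi_2$) are differences between two such curves.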
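A sketch of the successive differences in Python (hypothetical parameter values and function names). The four probabilities are positive and sum to 1 for any $x$ as long as $\alpha_1 > \alpha_2 > \alpha_3$, and a positive $\beta$ moves probability towards the higher degrees as $x$ increases.

```python
import math


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def category_probs(x, alpha, beta):
    """pi_k(x) = gamma_k(x) - gamma_{k+1}(x) for k = 0, 1, 2, 3,
    using the conventions gamma_0(x) = 1 and gamma_4(x) = 0."""
    gamma = [1.0] + [expit(alpha[k] + beta * x) for k in (1, 2, 3)] + [0.0]
    return [gamma[k] - gamma[k + 1] for k in range(4)]


# Hypothetical parameter values, not estimates from the fibrosis data.
alpha = {1: 1.2, 2: 0.1, 3: -1.7}
beta = 0.8
for x in (-2.0, 0.0, 2.0):
    print(x, [round(p, 3) for p in category_probs(x, alpha, beta)])
```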

[Figure: Cumulative probabilities]

We start out using only the marker HA.
The marker distributions are very skewed, but we do not demand anything about the distribution of the covariates!?

Proportional odds model in SAS:

data fibrosis;
  infile 'julia.tal' firstobs=2;   /* start reading at line 2 */
  input id degree_fibr ykl40 piiinp ha;
  if degree_fibr<0 then delete;    /* drop records with a negative degree code */
run;

proc logistic data=fibrosis descending;   /* model P(degree >= k), i.e. from the top */
  model degree_fibr=ha / link=logit clodds=pl;   /* pl: profile-likelihood limits for the odds ratio */
run;

The LOGISTIC Procedure

Model Information
Data Set                    WORK.FIBROSIS
Response Variable           degree_fibr
Number of Response Levels   4
Number of Observations      128
Model                       cumulative logit
Optimization Technique      Fisher's scoring

Response Profile
Ordered                        Total
  Value    degree_fibr     Frequency
      1              3            20
      2              2            42
      3              1            40
      4              0            26

Probabilities modeled are cumulated over the lower Ordered Values.
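The response profile alone gives the empirical cumulative proportions 'from the top', which is what DESCENDING models. Under an intercept-only cumulative logit model (no covariate), the maximum likelihood estimates of $\alpha_3$, $\alpha_2$, $\alpha_1$ would coincide with the empirical cumulative logits, since that model has exactly as many free parameters as the response has free category probabilities. A Python sketch using the printed frequencies (the variable names are hypothetical):

```python
import math

# Response profile copied from the PROC LOGISTIC output above.
frequency = {3: 20, 2: 42, 1: 40, 0: 26}   # degree_fibr -> count
n = sum(frequency.values())                # 128 observations

empirical = {}
for k in (3, 2, 1):
    # Cumulate from the top, as DESCENDING does: P(degree_fibr >= k).
    gamma_k = sum(count for degree, count in frequency.items() if degree >= k) / n
    empirical[k] = (gamma_k, math.log(gamma_k / (1.0 - gamma_k)))
    print(k, round(gamma_k, 4), round(empirical[k][1], 4))
```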
