Applied Statistics and Data Modeling

Part 3: Analysis of Variance - One-way ANOVA
Luc Duchateau (1) and Paul Janssen (2)
(1) Faculty of Veterinary Medicine, Ghent University, Belgium
(2) Center for Statistics, Hasselt University, Belgium
UGent STATS VM, 2020

Analysis of Variance: overview
- One-way ANOVA
- Two-way ANOVA
- ANOVA for blocked experiments

One-way ANOVA: overview
- Data
- Models
- Parameter estimation
- Hypothesis testing: general test, contrast, set of contrasts

One-way ANOVA: model
One-way: one factor with more than 2 levels.
ANOVA = ANalysis Of VAriance.

Example 1: weight gain in 4 breeds of chicken (5 observations per breed)

Breed 1  Breed 2  Breed 3  Breed 4
   1.56     1.38     1.49     1.46
   1.54     1.41     1.54     1.49
   1.50     1.44     1.48     1.44
   1.49     1.37     1.51     1.52
   1.51     1.40     1.48     1.49
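To follow along without the file `chicken.csv` read on the later slides, the table can be typed in directly. This is a minimal sketch that assumes the column names `weight` and `breed` and the breed labels `a` to `d`. These names are inferred from the `lm` output further on (coefficients `breeda` to `breedd`) and are not read from the file itself.

```r
# Example 1 entered row by row, in the same order as the table.
# Assumption: columns `weight`/`breed` and breed labels "a" to "d" (inferred
# from the coefficient names in the lm output, not read from chicken.csv).
chicken <- data.frame(
  breed  = factor(rep(c("a", "b", "c", "d"), times = 5)),
  weight = c(1.56, 1.38, 1.49, 1.46,
             1.54, 1.41, 1.54, 1.49,
             1.50, 1.44, 1.48, 1.44,
             1.49, 1.37, 1.51, 1.52,
             1.51, 1.40, 1.48, 1.49)
)

# Number of observations per breed (n_i) and in total (n.)
n_i <- table(chicken$breed)
print(n_i)
print(nrow(chicken))
```

Because `rep(..., times = 5)` cycles through `a, b, c, d`, each row of the table maps onto four consecutive observations, one for each breed.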

Model specification: cell means model
$$Y_{ij} = \mu_i + e_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, n_i, \quad n_{.} = \sum_{i=1}^{a} n_i$$
with
- $Y_{ij}$: the $j$th observation of breed $i$
- $\mu_i$: the population mean of breed $i$
- $e_{ij}$: the random error, independent and $N(0, \sigma^2)$

Cell means model: model assumptions
$Y_{ij} = \mu_i + e_{ij}$
- Homogeneity of the variance: $\mathrm{Var}(Y_{ij}) = \mathrm{Var}(e_{ij}) = \sigma^2$
- Normality of the observations:
  - $Y_{ij}$ is normally distributed because $e_{ij}$ is normally distributed
  - $E(Y_{ij}) = \mu_i$ because $E(e_{ij}) = 0$
  - $\mathrm{Var}(Y_{ij}) = \sigma^2$
  - $\Rightarrow Y_{ij} \sim N(\mu_i, \sigma^2)$
- Independence of the observations: the $Y_{ij}$'s are independent because the $e_{ij}$'s are independent
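Generating data from the cell means model makes these assumptions concrete. In the minimal sketch below, the values of $\mu_i$ and $\sigma$ are illustrative assumptions chosen close to the chicken data. The large $n_i$ is there only so that the simulated group means and standard deviations settle visibly near $\mu_i$ and $\sigma$.

```r
set.seed(2020)

# Illustrative parameter values (assumptions, not estimates from the slides)
mu    <- c(a = 1.52, b = 1.40, c = 1.50, d = 1.48)  # population mean per group
sigma <- 0.03                                      # common standard deviation
n_i   <- 10000                                     # observations per group

group <- factor(rep(names(mu), each = n_i))
# Y_ij = mu_i + e_ij with independent e_ij ~ N(0, sigma^2)
e <- rnorm(length(group), mean = 0, sd = sigma)
y <- mu[as.character(group)] + e

# Each group is centred on its own mu_i, with (about) the same spread sigma
sim_means <- tapply(y, group, mean)
sim_sds   <- tapply(y, group, sd)
print(round(cbind(mean = sim_means, sd = sim_sds), 4))
```

All four groups share the single `sigma`, which is the homogeneity-of-variance assumption. Every error is drawn independently, which is the independence assumption.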

[Figure: graphic representation of the cell means model]

Normal distribution assumption
Throughout the course we assume $Y \sim N(\mu, \sigma^2)$, with density
$$f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right\}$$
and
$$E(Y) = \int_{-\infty}^{+\infty} y\, f_Y(y)\, dy = \mu, \qquad \mathrm{Var}(Y) = \int_{-\infty}^{+\infty} (y - \mu)^2 f_Y(y)\, dy = \sigma^2.$$
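The two moment integrals can be checked numerically with base R's adaptive quadrature, `integrate()`. This is a sketch under stated assumptions. The values of $\mu$ and $\sigma$ are arbitrary. The infinite limits are replaced by $\mu \pm 12\sigma$, and the tolerances are tightened. These are choices of this sketch that keep the quadrature robust, not part of the slide.

```r
# Illustrative values (assumptions): any mu and sigma > 0 would do
mu    <- 2
sigma <- 0.5

# Normal density written out as on the slide
f_Y <- function(y) 1 / sqrt(2 * pi * sigma^2) * exp(-0.5 * ((y - mu) / sigma)^2)

# Integrate over mu +/- 12 sigma: the mass outside is about 4e-33, so this
# stands in for (-Inf, +Inf) while keeping the numerical integration robust.
lo <- mu - 12 * sigma
hi <- mu + 12 * sigma
num_int <- function(g) integrate(g, lo, hi, rel.tol = 1e-8, abs.tol = 0)$value

total <- num_int(f_Y)                               # should be 1
EY    <- num_int(function(y) y * f_Y(y))            # should be mu
VarY  <- num_int(function(y) (y - mu)^2 * f_Y(y))   # should be sigma^2
print(c(total = total, EY = EY, VarY = VarY))

# The hand-written density agrees with R's built-in dnorm()
stopifnot(isTRUE(all.equal(f_Y(1.7), dnorm(1.7, mean = mu, sd = sigma))))
```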

One-way ANOVA: estimation

Estimation using the least squares (LS) technique
Notation:
$$Y_{i.} = \sum_{j=1}^{n_i} Y_{ij}, \qquad \bar{Y}_{i.} = \frac{Y_{i.}}{n_i} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$$
$$Y_{..} = \sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij}, \qquad \bar{Y}_{..} = \frac{Y_{..}}{n_{.}} = \frac{1}{n_{.}}\sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij}$$
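In the dot notation, a dot replaces an index that has been summed over, and a bar marks division by the number of terms. The sketch below computes each quantity for Example 1. The data are typed in from the table, and the breed labels `a` to `d` are an assumption inferred from the coefficient names in the later `lm` output.

```r
# Example 1 data (breed labels "a" to "d" assumed; 5 observations per breed)
weight <- c(1.56, 1.38, 1.49, 1.46, 1.54, 1.41, 1.54, 1.49, 1.50, 1.44,
            1.48, 1.44, 1.49, 1.37, 1.51, 1.52, 1.51, 1.40, 1.48, 1.49)
breed  <- factor(rep(c("a", "b", "c", "d"), times = 5))

n_i      <- tapply(weight, breed, length)  # n_i
Y_i_dot  <- tapply(weight, breed, sum)     # Y_i.   (group totals)
Ybar_i   <- Y_i_dot / n_i                  # Ybar_i. (group means)
n_dot    <- sum(n_i)                       # n.
Y_dotdot <- sum(weight)                    # Y..    (grand total)
Ybar_dd  <- Y_dotdot / n_dot               # Ybar.. (overall mean)

print(rbind(total = Y_i_dot, mean = Ybar_i))
print(c(Y.. = Y_dotdot, Ybar.. = Ybar_dd))
```

$\bar{Y}_{..}$ is the $n_i$-weighted average of the $\bar{Y}_{i.}$. Here all $n_i = 5$, so it is also the plain average of the four breed means.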

LS technique for the cell means model
The estimator of $\mu_i$ is found by minimising the LS criterion
$$Q = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2.$$
First take the partial derivative with respect to $\mu_i$. Only the terms of group $i$ involve $\mu_i$:
$$\frac{\partial Q}{\partial \mu_i} = (-2)\sum_{j=1}^{n_i} (Y_{ij} - \mu_i).$$
Setting it equal to 0 gives
$$-2\sum_{j=1}^{n_i} (Y_{ij} - \hat{\mu}_i) = 0 \quad\Rightarrow\quad \sum_{j=1}^{n_i} Y_{ij} = n_i\hat{\mu}_i.$$
The second derivative, $2n_i$, is positive, so this stationary point is a minimum. The parameter estimator for $\mu_i$ is therefore $\hat{\mu}_i = \bar{Y}_{i.}$.
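The closed-form result can be checked numerically. If $Q$ is minimised directly, one breed at a time, the minimisers should land on the sample means. This minimal sketch uses base R's one-dimensional optimiser `optimize()`. The data are typed in from Example 1, and the breed labels `a` to `d` are assumed.

```r
weight <- c(1.56, 1.38, 1.49, 1.46, 1.54, 1.41, 1.54, 1.49, 1.50, 1.44,
            1.48, 1.44, 1.49, 1.37, 1.51, 1.52, 1.51, 1.40, 1.48, 1.49)
breed  <- factor(rep(c("a", "b", "c", "d"), times = 5))

# Contribution of breed i to Q; only these terms involve mu_i, so Q can be
# minimised separately for each breed.
Q_i <- function(mu_i, y) sum((y - mu_i)^2)

mu_hat_numeric <- sapply(split(weight, breed), function(y)
  optimize(Q_i, interval = range(y), y = y, tol = 1e-10)$minimum)
mu_hat_closed <- tapply(weight, breed, mean)   # closed form: Ybar_i.

print(rbind(numeric = mu_hat_numeric, closed_form = mu_hat_closed))
```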

The corresponding R code

```r
# Folder containing chicken.csv (adjust to your own location)
setwd("c:/users/lduchate/docs/OC/onderwijs/ADEKUS2020/data")
# Semicolon-separated file with a header row; columns weight and breed
chicken <- read.table(file = "chicken.csv", header = TRUE, sep = ";")
# "- 1" drops the intercept, so each coefficient estimates one breed mean mu_i
cellmeans.chicken <- lm(weight ~ breed - 1, data = chicken)
summary(cellmeans.chicken)
##
## Call:
## lm(formula = weight ~ breed - 1, data = chicken)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.0400 -0.0200 -0.0050  0.0125  0.0400
##
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)
## breeda  1.52000    0.01265   120.2   <2e-16 ***
## breedb  1.40000    0.01265   110.7   <2e-16 ***
## breedc  1.50000    0.01265   118.6   <2e-16 ***
## breedd  1.48000    0.01265   117.0   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
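Two columns of this output can be reproduced by hand. The estimates are the breed sample means $\bar{Y}_{i.}$. Each standard error equals $\sqrt{\hat\sigma^2 / n_i}$, where $\hat\sigma^2 = \sum_i \sum_j (Y_{ij} - \bar{Y}_{i.})^2 / (n_{.} - a)$ is the residual mean square. In this sketch, the data are typed in from Example 1 rather than read from `chicken.csv`, and the breed labels `a` to `d` are assumed.

```r
weight <- c(1.56, 1.38, 1.49, 1.46, 1.54, 1.41, 1.54, 1.49, 1.50, 1.44,
            1.48, 1.44, 1.49, 1.37, 1.51, 1.52, 1.51, 1.40, 1.48, 1.49)
breed  <- factor(rep(c("a", "b", "c", "d"), times = 5))
fit <- lm(weight ~ breed - 1)   # cell means model, no intercept

ybar_i  <- tapply(weight, breed, mean)              # Ybar_i. per breed
n_i     <- tapply(weight, breed, length)            # n_i per breed
sse     <- sum((weight - ave(weight, breed))^2)     # residual sum of squares
s2      <- sse / (length(weight) - nlevels(breed))  # sigma^2 estimate on n. - a = 16 df
se_hand <- sqrt(s2 / n_i)                           # Std. Error of each mu_i-hat

print(cbind(estimate = coef(fit), by_hand = as.vector(ybar_i)))
print(cbind(se_lm = coef(summary(fit))[, "Std. Error"], se_hand = as.vector(se_hand)))
```

Every breed has $n_i = 5$, so all four standard errors are equal. With unequal group sizes they would differ.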

Alternative model specification: factor effects model
$$Y_{ij} = \mu + \alpha_i + e_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, n_i, \quad n_{.} = \sum_{i=1}^{a} n_i$$
with
- $Y_{ij}$: the $j$th observation of breed $i$
- $\mu$: a constant, common to all observations
- $\alpha_i$: a constant, the effect of the $i$th factor level
- $e_{ij}$: the random error term, independent and $N(0, \sigma^2)$

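Overparameterisation of the factor effects model
The cell means model has $a$ parameters, the $\mu_i$. The factor effects model has $a + 1$ parameters, $\mu$ and the $\alpha_i$. The data determine only the $a$ group means $\mu_i = \mu + \alpha_i$, so the factor effects model is overparameterised. For example, adding any constant $c$ to $\mu$ and subtracting $c$ from every $\alpha_i$ leaves every $\mu_i$ unchanged. A parameter restriction makes the meaning of the parameters clear and unique:
$$\sum_{i=1}^{a} \alpha_i = 0 \quad\Rightarrow\quad \mu = \frac{1}{a}\sum_{i=1}^{a} \mu_i$$
as
$$\sum_{i=1}^{a} \mu_i = \sum_{i=1}^{a} (\mu + \alpha_i) = a\mu + \sum_{i=1}^{a} \alpha_i = a\mu.$$
This is the sum restriction, and it is the one we will use. Under it, $\alpha_i = \mu_i - \mu$ is the deviation of the mean of level $i$ from the average of the $a$ level means.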
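The following worked example applies the sum restriction to the estimated breed means of Example 1 (1.52, 1.40, 1.50 and 1.48, from the cell means fit above):

```latex
\begin{aligned}
\hat\mu &= \tfrac{1}{4}(1.52 + 1.40 + 1.50 + 1.48) = \tfrac{5.90}{4} = 1.475\\
\hat\alpha_i &= \hat\mu_i - \hat\mu:\quad \hat\alpha_1 = 0.045,\ \hat\alpha_2 = -0.075,\ \hat\alpha_3 = 0.025,\ \hat\alpha_4 = 0.005\\
\textstyle\sum_{i=1}^{4}\hat\alpha_i &= 0.045 - 0.075 + 0.025 + 0.005 = 0
\end{aligned}
```

Every breed has $n_i = 5$, so $\hat\mu$ here also equals the overall mean $\bar{Y}_{..}$. With unequal $n_i$, the average of the level means and the overall mean would generally differ.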

The corresponding R code

```r
# Sum-to-zero contrasts (the sum restriction) for unordered and ordered factors;
# this setting stays in effect for the rest of the R session
options(contrasts = rep("contr.sum", 2))
# With an intercept, lm() now fits the factor effects model with sum(alpha_i) = 0
cellmeanscm.chicken <- lm(weight ~ breed, data = chicken)
summary(cellmeanscm.chicken)
##
## Call:
## lm(formula = weight ~ breed, data = chicken)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.0400 -0.0200 -0.0050  0.0125  0.0400
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.475000   0.006325 233.218  < 2e-16 ***
## breed1       0.045000   0.010954   4.108 0.000823 ***
## breed2      -0.075000   0.010954  -6.847 3.93e-06 ***
## breed3       0.025000   0.010954   2.282 0.036501 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

(Intercept) estimates $\mu$, and breed1 to breed3 estimate $\alpha_1$ to $\alpha_3$ (breeds a to c). The effect of the fourth breed is not printed. Under the sum restriction it is $\hat\alpha_4 = -(\hat\alpha_1 + \hat\alpha_2 + \hat\alpha_3) = -(0.045 - 0.075 + 0.025) = 0.005$.
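The factor effects fit is a reparameterisation of the cell means fit, not a different model. Both give the same fitted values, and the unprinted fourth effect follows from the sum restriction. The sketch below uses data typed in from Example 1, with the breed labels `a` to `d` assumed. It passes `contrasts = list(breed = "contr.sum")` to this one `lm()` call instead of changing the global option. That is a design choice of the sketch, not the approach of the slides.

```r
weight <- c(1.56, 1.38, 1.49, 1.46, 1.54, 1.41, 1.54, 1.49, 1.50, 1.44,
            1.48, 1.44, 1.49, 1.37, 1.51, 1.52, 1.51, 1.40, 1.48, 1.49)
breed  <- factor(rep(c("a", "b", "c", "d"), times = 5))

fit_cm <- lm(weight ~ breed - 1)                                    # cell means
fit_fe <- lm(weight ~ breed, contrasts = list(breed = "contr.sum"))  # factor effects, sum restriction

# Same fitted values: the two fits differ only in how the parameters are defined
print(all.equal(fitted(fit_cm), fitted(fit_fe)))

# lm() reports mu and alpha_1 .. alpha_3; alpha_4 follows from sum(alpha_i) = 0
b      <- coef(fit_fe)
mu_hat <- b[["(Intercept)"]]
alpha  <- c(b[-1], breed4 = -sum(b[-1]))
print(alpha)

# mu_hat + alpha_i recovers the cell means mu_i
print(mu_hat + alpha)
```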

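One-way ANOVA: hypothesis testing

General hypothesis test: sums of squares
$H_0: \mu_1 = \mu_2 = \ldots = \mu_a$ versus $H_a$: not all $\mu_i$ are equal.
ANOVA is based on sums of squares. The starting point is the deviation of an observation from the overall mean. It splits into the deviation of the group mean from the overall mean and the deviation of the observation from its group mean:
$$Y_{ij} - \bar{Y}_{..} = (\bar{Y}_{i.} - \bar{Y}_{..}) + (Y_{ij} - \bar{Y}_{i.})$$
Square and sum over all observations:
$$\sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{..})^2 = \sum_{i=1}^{a} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.})^2$$
The cross-product term drops out because $\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.}) = 0$ for each $i$, so $2\sum_{i=1}^{a} (\bar{Y}_{i.} - \bar{Y}_{..}) \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.}) = 0$.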
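The decomposition can be checked numerically on Example 1. The total sum of squares should equal the between-breed part plus the within-breed part. The same two pieces appear in the `Sum Sq` column of R's `anova()` table. In this sketch, the data are typed in from Example 1 and the breed labels `a` to `d` are assumed.

```r
weight <- c(1.56, 1.38, 1.49, 1.46, 1.54, 1.41, 1.54, 1.49, 1.50, 1.44,
            1.48, 1.44, 1.49, 1.37, 1.51, 1.52, 1.51, 1.40, 1.48, 1.49)
breed  <- factor(rep(c("a", "b", "c", "d"), times = 5))

ybar_dd <- mean(weight)          # Ybar..
ybar_i  <- ave(weight, breed)    # Ybar_i. repeated for every observation of breed i

ss_total   <- sum((weight - ybar_dd)^2)   # left-hand side
ss_between <- sum((ybar_i - ybar_dd)^2)   # = sum_i n_i (Ybar_i. - Ybar..)^2
ss_within  <- sum((weight - ybar_i)^2)    # = sum_i sum_j (Y_ij - Ybar_i.)^2

print(c(total = ss_total, between = ss_between, within = ss_within))

# The cross-product term vanishes, so the two parts add up to the total
stopifnot(isTRUE(all.equal(ss_total, ss_between + ss_within)))

# Same two pieces in the Sum Sq column of the ANOVA table
print(anova(lm(weight ~ breed))[["Sum Sq"]])
```

Summing $(\bar{Y}_{i.} - \bar{Y}_{..})^2$ over all observations counts each breed $n_i$ times. That is why the code matches the $n_i$ factor in the between-breed term.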
