
Lecture 16. Linear model with a classifying factor 2020

(1) A simple anova

(2) Two breeds of sheep

A measurement is taken on a random sample of five animals from each of two breeds. Is there evidence for a difference between breeds?

    Black   Welsh
    6.6     10.4
    8.1     9.8
    7.6     11.0
    6.9     10.6
    8.3     9.2

(3) An indicator variable

    Breed   X   Y
    Black   0   6.6
    Black   0   8.1
    Black   0   7.6
    Black   0   6.9
    Black   0   8.3
    Welsh   1   10.4
    Welsh   1   9.8
    Welsh   1   11.0
    Welsh   1   10.6
    Welsh   1   9.2

X is an indicator variable for the Welsh breed.

(4) An indicator variable

Set X = 0 for each measurement on a Blackface animal and X = 1 for each measurement on a Welsh animal. The linear model $E(Y) = \beta_0 + \beta_1 X$ assigns means to breeds as follows:

    Breed   X   $\beta_0 + \beta_1 X$
    Black   0   $\beta_0$
    Welsh   1   $\beta_0 + \beta_1$

$\beta_0$: mean value for the Blackface breed
$\beta_1$: difference between Welsh and Blackface

The null hypothesis $\beta_1 = 0$ is equivalent to "no difference between breeds".
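A minimal plain-Python check of the table above (an illustration, not how lm works internally): fitting $E(Y) = \beta_0 + \beta_1 X$ by least squares with the closed-form simple-regression formulas should return the Blackface mean as $\hat\beta_0$ and the Welsh minus Blackface difference as $\hat\beta_1$. The helper name simple_ls is hypothetical.

```python
# Sheep measurements from slide (3): five Blackface (X = 0) then five Welsh (X = 1).
y = [6.6, 8.1, 7.6, 6.9, 8.3, 10.4, 9.8, 11.0, 10.6, 9.2]
x = [0] * 5 + [1] * 5


def simple_ls(x, y):
    """Hypothetical helper: least-squares intercept and slope for E(Y) = b0 + b1 * X."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = simple_ls(x, y)
mean_black = sum(y[:5]) / 5
mean_welsh = sum(y[5:]) / 5
print(f"b0 = {b0:.2f}, Blackface mean = {mean_black:.2f}")
print(f"b1 = {b1:.2f}, Welsh - Blackface = {mean_welsh - mean_black:.2f}")
```

Both lines should print matching pairs (7.50 and 2.70), which is the content of the indicator-variable table.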

(5) Using lm (with indicator variable)

    Y <- 0.1 * c(66, 81, 76, 69, 83, 104, 98, 110, 106, 92)
    X <- rep(0:1, each = 5)
    fit <- lm(Y ~ X)
    summary(fit)
    anova(fit)

The t statistic (from summary) or the F statistic (from anova) tests $\beta_1 = 0$ (no difference between breeds).

(6) Using lm (with factor)

    Y <- 0.1 * c(66, 81, 76, 69, 83, 104, 98, 110, 106, 92)
    Breed <- gl(2, 5, labels = c("Black", "Welsh"))
    fit <- lm(Y ~ Breed)
    summary(fit)
    anova(fit)

The t statistic (from summary) or the F statistic (from anova) tests $\beta_1 = 0$ (no difference between breeds). The output is identical to that obtained using an indicator variable, apart from the labelling of the estimates.

(7) Output from summary and anova

Output from summary:

                  Estimate   SE       t
    Intercept     7.5        0.3233   23.201
    BreedWelsh    2.7        0.4572   5.906

Output from anova:

                  DF   SSQ      MSQ       F
    Breed         1    18.225   18.2250   34.88
    Residuals     8    4.180    0.5225
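The numbers in this output can be reproduced by hand from the breed means and sums of squares. The sketch below does so in plain Python as an illustrative check (it is not how lm computes them internally); it also shows that $t^2 = F$ for this single-d.f. comparison.

```python
import math

black = [6.6, 8.1, 7.6, 6.9, 8.3]
welsh = [10.4, 9.8, 11.0, 10.6, 9.2]


def mean(v):
    return sum(v) / len(v)


def ss(v):
    """Mean-corrected sum of squares of a list."""
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v)


n = len(black)                                   # 5 animals per breed
grand = mean(black + welsh)
ss_breed = n * ((mean(black) - grand) ** 2 + (mean(welsh) - grand) ** 2)
ss_resid = ss(black) + ss(welsh)
df_resid = len(black) + len(welsh) - 2           # 8 d.f.
ms_resid = ss_resid / df_resid                   # residual mean square

se_intercept = math.sqrt(ms_resid / n)           # SE of the Blackface mean
se_diff = math.sqrt(ms_resid * (1 / n + 1 / n))  # SE of Welsh - Blackface
t_diff = (mean(welsh) - mean(black)) / se_diff
f_ratio = (ss_breed / 1) / ms_resid

print(round(ss_breed, 3), round(ss_resid, 3), round(ms_resid, 4))
print(round(se_intercept, 4), round(se_diff, 4), round(t_diff, 3), round(f_ratio, 2))
print(round(t_diff ** 2, 2))                     # equals F for a single-d.f. test
```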

(8) Three breeds of sheep

Three breeds of sheep: Blackface, Welsh, and the Blackface × Welsh cross. A measurement is taken on a random sample of five animals from each breed. Is there evidence for a difference between breeds?

    Black   Welsh   Cross
    6.6     10.4    7.0
    8.1     9.8     9.3
    7.6     11.0    8.4
    6.9     10.6    7.6
    8.3     9.2     9.7

(9) Two indicator variables

X and Z are indicators for the Welsh and Cross breeds. The linear model $E(Y) = \beta_0 + \beta_1 X + \beta_2 Z$ assigns means to breeds as follows:

    Breed   X   Z   $\beta_0 + \beta_1 X + \beta_2 Z$
    Black   0   0   $\beta_0$
    Welsh   1   0   $\beta_0 + \beta_1$
    Cross   0   1   $\beta_0 + \beta_2$

$\beta_0$: mean value for the Blackface breed
$\beta_1$: difference between Welsh and Blackface
$\beta_2$: difference between Cross and Blackface

The null hypothesis $\beta_1 = \beta_2 = 0$ is equivalent to "no differences among breeds".

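(10) Parameter estimates

The parameter estimates (writing $\hat\beta_W$ and $\hat\beta_C$ for the estimates of $\beta_1$ and $\beta_2$) are

$\hat\beta_0 = \bar Y_B$,  $\hat\beta_W = \bar Y_W - \bar Y_B$,  $\hat\beta_C = \bar Y_C - \bar Y_B$.

The fitted value $\hat\beta_0 + \hat\beta_W X + \hat\beta_C Z$ is

$\bar Y_B = \hat\beta_0$ (Blackface observation)
$\bar Y_W = \hat\beta_0 + \hat\beta_W$ (Welsh observation)
$\bar Y_C = \hat\beta_0 + \hat\beta_C$ (Cross observation)

The fitted value for an observation is the mean of the observations for its breed.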
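To see these estimates emerge from least squares rather than being asserted, the sketch below builds the design matrix with columns (1, X, Z) for the three-breed data, solves the normal equations $(X^\top X)\beta = X^\top y$ with a small Gauss-Jordan routine, and compares the result with the breed means. The helper solve is hypothetical and written only for this illustration; in practice lm does this for you.

```python
# Sheep data from slide (8): Blackface, Welsh, Cross (five animals each).
data = {
    "Black": [6.6, 8.1, 7.6, 6.9, 8.3],
    "Welsh": [10.4, 9.8, 11.0, 10.6, 9.2],
    "Cross": [7.0, 9.3, 8.4, 7.6, 9.7],
}

# Design matrix rows (1, X, Z): X indicates Welsh, Z indicates Cross.
rows, y = [], []
for breed, values in data.items():
    for v in values:
        rows.append([1.0, float(breed == "Welsh"), float(breed == "Cross")])
        y.append(v)


def solve(a, b):
    """Hypothetical helper: solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    size = len(m)
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


# Normal equations: (X'X) beta = X'y.
p = len(rows[0])
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
b0, b_w, b_c = solve(xtx, xty)

means = {k: sum(v) / len(v) for k, v in data.items()}
fitted = [b0 + b_w * r[1] + b_c * r[2] for r in rows]   # one fitted value per animal
print(f"b0  = {b0:.2f}  (Blackface mean {means['Black']:.2f})")
print(f"b_W = {b_w:.2f}  (Welsh - Blackface {means['Welsh'] - means['Black']:.2f})")
print(f"b_C = {b_c:.2f}  (Cross - Blackface {means['Cross'] - means['Black']:.2f})")
```

Every entry of fitted equals its breed mean (7.5, 10.2 or 8.4), as the slide states.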

(11) Sums of squares

Anova equation (for multiple regression):

$$\sum (Y - \bar Y)^2 = \sum (\hat Y - \bar Y)^2 + \sum (Y - \hat Y)^2$$

The regression sum of squares ($S_B$) is the mean-corrected sum of squares of the three fitted values, each weighted by the size of its sample. The residual sum of squares ($S_W$) is the sum of the mean-corrected sums of squares within each breed.
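The decomposition can be checked numerically for the three-breed sheep data. In this sketch each observation's fitted value is its breed mean, so summing over observations automatically weights each breed mean by its sample size.

```python
data = {
    "Black": [6.6, 8.1, 7.6, 6.9, 8.3],
    "Welsh": [10.4, 9.8, 11.0, 10.6, 9.2],
    "Cross": [7.0, 9.3, 8.4, 7.6, 9.7],
}
ys = [v for values in data.values() for v in values]
grand = sum(ys) / len(ys)

# Fitted value for each observation = mean of its breed (same order as ys).
fitted = [sum(values) / len(values) for values in data.values() for _ in values]

ss_total = sum((y - grand) ** 2 for y in ys)
ss_between = sum((f - grand) ** 2 for f in fitted)           # S_B
ss_within = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # S_W
print(round(ss_total, 2), round(ss_between, 2), round(ss_within, 2))
```

The three values should be 28.18, 18.9 and 9.28, the entries of the sheep anova table.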

(12) Anova table

With k groups, and a total of N = nk observations, the regression and residual sums of squares have k − 1 and N − k d.f. The anova table is calculated in the usual way:

    Source    DF      SSQ     MSQ     F
    Between   k − 1   $S_B$   $M_B$   $M_B/M_W$
    Within    N − k   $S_W$   $M_W$

$M_W$ (previously $s^2$) estimates the residual variance.
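A sketch of the table calculation as a small function, assuming only a list of samples as input. The name one_way_anova is hypothetical; it is written for possibly unequal group sizes, although the slide assumes N = nk.

```python
def one_way_anova(groups):
    """Hypothetical helper: return (df_B, df_W, S_B, S_W, M_B, M_W, F) for a list of samples."""
    k = len(groups)
    big_n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / big_n
    means = [sum(g) / len(g) for g in groups]
    s_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    s_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_b, df_w = k - 1, big_n - k
    m_b, m_w = s_b / df_b, s_w / df_w
    return df_b, df_w, s_b, s_w, m_b, m_w, m_b / m_w


sheep = [
    [6.6, 8.1, 7.6, 6.9, 8.3],     # Blackface
    [10.4, 9.8, 11.0, 10.6, 9.2],  # Welsh
    [7.0, 9.3, 8.4, 7.6, 9.7],     # Cross
]
df_b, df_w, s_b, s_w, m_b, m_w, f = one_way_anova(sheep)
print(f"Between  {df_b:2d}  {s_b:6.2f}  {m_b:6.3f}  {f:5.2f}")
print(f"Within   {df_w:2d}  {s_w:6.2f}  {m_w:6.3f}")
```

Applied to the sheep data it gives F = 12.22 on 2 and 12 d.f., matching the anova table for the sheep data.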

(13) The "sheep" data frame

    Breed   Cu     X   Z
    Black   6.6    0   0
    Black   8.1    0   0
    Black   7.6    0   0
    Black   6.9    0   0
    Black   8.3    0   0
    Welsh   10.4   1   0
    Welsh   9.8    1   0
    Welsh   11.0   1   0
    Welsh   10.6   1   0
    Welsh   9.2    1   0
    Cross   7.0    0   1
    Cross   9.3    0   1
    Cross   8.4    0   1
    Cross   7.6    0   1
    Cross   9.7    0   1

(14) Using lm (with factor)

    library(sda)  # if necessary
    fit <- lm(Cu ~ Breed, data = sheep)
    anova(fit)

Alternatively:

    fit <- aov(Cu ~ Breed, data = sheep)
    summary(fit)

(15) ANOVA

The anova table for the sheep data is:

    Source           DF   SSQ     MSQ     F
    Between breeds    2   18.90   9.450   12.22
    Within breeds    12    9.28   0.773
    Total            14   28.18

F = 12.22 on 2 and 12 d.f. (P < 0.01). Differences between breeds are established beyond reasonable doubt.

END OF LECTURE

Lecture 17. Comparisons among means 2020

(16) ANOVA

The anova table for the sheep data is:

    Source           DF   SSQ     MSQ     F
    Between breeds    2   18.90   9.450   12.22
    Within breeds    12    9.28   0.773
    Total            14   28.18

F = 12.22 on 2 and 12 d.f. (P < 0.01). Differences between breeds are established beyond reasonable doubt.

(17) Comparing two means

Mean values for each breed:

    Black   Welsh   Cross
    7.5     10.2    8.4

The estimated standard error of a difference between two means based on $n_1$ and $n_2$ observations is $\sqrt{M_W (1/n_1 + 1/n_2)}$, where $M_W$ is the within-group mean square. For the sheep data, $M_W = 0.773$ and $n_1 = n_2 = 5$, so the standard error of the difference between any two breed means is 0.556.
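The arithmetic behind 0.556, as a short sketch. It takes $M_W$ as $S_W/(N - k) = 9.28/12$ from the sheep anova table rather than the rounded 0.773.

```python
import math

m_w = 9.28 / 12        # within-breed mean square S_W / (N - k), about 0.773
n1 = n2 = 5            # animals per breed

se_diff = math.sqrt(m_w * (1 / n1 + 1 / n2))
print(f"SE of a difference between two breed means = {se_diff:.3f}")
```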

(18) Output from summary

                  Estimate   Std.Error   t.value
    (Intercept)   7.5000     0.3933      19.071
    BreedCross    0.9000     0.5562      1.618
    BreedWelsh    2.7000     0.5562      4.855

(19) Comparing two means

    Comparison          Estimate   SE      t
    Welsh − Blackface   2.7        0.556   4.86
    Cross − Blackface   0.9        0.556   1.62
    Welsh − Cross       1.8        0.556   3.24

(The upper 2.5% point of t on 12 d.f. is 2.179.) A 95% confidence interval for Welsh − Blackface is 2.7 ± 2.179 × 0.556, or (1.49, 3.91).
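A sketch that regenerates the table and the interval. The critical value 2.179 is copied from the slide rather than computed, since computing it needs a t quantile function (for example scipy.stats.t.ppf(0.975, 12)), which this dependency-free sketch avoids.

```python
import math

means = {"Blackface": 7.5, "Welsh": 10.2, "Cross": 8.4}
m_w, n, t_crit = 0.773, 5, 2.179        # M_W, animals per breed, upper 2.5% point of t on 12 d.f.
se = math.sqrt(m_w * (1 / n + 1 / n))   # same SE for every pair (equal group sizes)

results = {}
for a, b in [("Welsh", "Blackface"), ("Cross", "Blackface"), ("Welsh", "Cross")]:
    est = means[a] - means[b]
    lower, upper = est - t_crit * se, est + t_crit * se
    results[(a, b)] = (est, est / se, lower, upper)
    print(f"{a} - {b}: estimate {est:.1f}, t = {est / se:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Only the Welsh − Blackface interval is quoted on the slide; the other two rows show the same recipe applied to the remaining pairs.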

(20) More general comparisons

The "contrast" $C = a_1 \bar Y_1 + a_2 \bar Y_2 + \cdots + a_k \bar Y_k$ has estimated standard error

$$\sqrt{M_W \left( \frac{a_1^2}{n_1} + \frac{a_2^2}{n_2} + \cdots + \frac{a_k^2}{n_k} \right)}.$$

Under $H_0: E(C) = 0$, the statistic obtained by dividing C by its estimated standard error has a t distribution with N − k degrees of freedom. Usually $a_1 + a_2 + \cdots + a_k = 0$.
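The standard-error formula translates directly into a few lines of code. In this sketch the function name contrast is hypothetical; it is checked on the special case $a = (-1, 1, 0)$, which is the pairwise Welsh − Blackface comparison and should reproduce the SE of 0.556.

```python
import math


def contrast(coefs, means, sizes, m_w):
    """Hypothetical helper: value and estimated SE of C = sum(a_i * Ybar_i), given M_W."""
    value = sum(a * m for a, m in zip(coefs, means))
    se = math.sqrt(m_w * sum(a * a / n for a, n in zip(coefs, sizes)))
    return value, se


means = [7.5, 10.2, 8.4]   # Blackface, Welsh, Cross
sizes = [5, 5, 5]
m_w = 0.773

# A pairwise comparison is the special case a = (-1, 1, 0): Welsh - Blackface.
value, se = contrast([-1, 1, 0], means, sizes, m_w)
print(round(value, 2), round(se, 3), round(value / se, 2))
```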

(21) Example of a contrast

A possible contrast of interest is $0.5(\bar Y_B + \bar Y_W) - \bar Y_C$. The value of this contrast is $0.5 \times (7.5 + 10.2) - 8.4 = 0.45$, with estimated standard error $\sqrt{0.773\,(1/20 + 1/20 + 1/5)} = 0.48155$ (since $0.5^2/5 = 1/20$ and $1^2/5 = 1/5$). The t statistic is 0.45/0.48155 = 0.93 with 12 d.f. The test result is not significant. A 95% interval estimate for the contrast is 0.45 ± 2.179 × 0.48155, or (−0.60, +1.50).
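The same arithmetic as a self-contained sketch, with coefficients (0.5, 0.5, −1) on the Blackface, Welsh and Cross means. The critical value 2.179 is taken from the slide, not computed.

```python
import math

means = {"B": 7.5, "W": 10.2, "C": 8.4}
coefs = {"B": 0.5, "W": 0.5, "C": -1.0}   # 0.5 * (Ybar_B + Ybar_W) - Ybar_C
n, m_w, t_crit = 5, 0.773, 2.179          # group size, M_W, upper 2.5% point of t on 12 d.f.

value = sum(coefs[g] * means[g] for g in means)
se = math.sqrt(m_w * sum(a ** 2 / n for a in coefs.values()))
t_stat = value / se
ci = (value - t_crit * se, value + t_crit * se)
print(f"C = {value:.2f}, SE = {se:.4f}, t = {t_stat:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```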

(22) Fruit flies

Bristle counts on 20 fruit flies, five flies in each of four genotype classes.

    A    B    C    D
    16   8    9    5
    12   12   12   8
    16   7    11   8
    11   9    10   7
    15   12   8    9

Genotype class means:

    A      B     C      D
    14.0   9.6   10.0   7.4
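Reading the table column by column (one list per genotype class) and recomputing the class means is a quick check that the layout is right.

```python
# Bristle counts from the table above, one list per genotype class.
flies = {
    "A": [16, 12, 16, 11, 15],
    "B": [8, 12, 7, 9, 12],
    "C": [9, 12, 11, 10, 8],
    "D": [5, 8, 8, 7, 9],
}
class_means = {g: sum(v) / len(v) for g, v in flies.items()}
print(class_means)
```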

(23) Fruit flies

ANOVA of the fruit fly data shows highly significant differences among genotype classes (P < 0.001). We could continue the analysis by producing a table of six pairwise comparisons (A versus B, etc.), but this would not shed much light on the data. Additional information on the genotype classes allows a more informative analysis.
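The slide reports only P < 0.001. The sketch below computes the one-way anova quantities behind that statement; turning F into a P-value needs an F distribution function (for example scipy.stats.f.sf(F, 3, 16)), which is deliberately not used here.

```python
flies = {
    "A": [16, 12, 16, 11, 15],
    "B": [8, 12, 7, 9, 12],
    "C": [9, 12, 11, 10, 8],
    "D": [5, 8, 8, 7, 9],
}
groups = list(flies.values())
k = len(groups)                                   # number of genotype classes
big_n = sum(len(g) for g in groups)               # total number of flies
grand = sum(sum(g) for g in groups) / big_n
class_means = [sum(g) / len(g) for g in groups]
s_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, class_means))
s_w = sum((y - m) ** 2 for g, m in zip(groups, class_means) for y in g)
m_b, m_w = s_b / (k - 1), s_w / (big_n - k)
f_ratio = m_b / m_w
print(f"Between {k - 1} d.f.: S_B = {s_b:.2f}; Within {big_n - k} d.f.: S_W = {s_w:.2f}; F = {f_ratio:.2f}")
```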

(24) Cy and Me mutations

The key to a more informative analysis is knowing that genotype classes A to D are determined by the presence (+) or absence (−) of two mutations, Cy and Me.

         A   B   C   D
    Cy   −   −   +   +
    Me   −   +   −   +

(25) Informative contrasts

             A   B   C   D
    Cy       −   −   +   +
    Me       −   +   −   +
    Cy × Me  +   −   −   +

(C − A) estimates the Cy effect when Me is absent, and (D − B) estimates the Cy effect when Me is present. The Cy contrast estimates the sum (or average) of these two conditional effects. (C − A + B − D) = (C − A) − (D − B) estimates the difference between the two conditional effects (the interaction Cy × Me); it is the Cy × Me row with its signs reversed.
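Plugging the genotype class means into the coefficient rows makes the relationships concrete: the Cy contrast equals the sum of the two conditional Cy effects, and the Cy × Me row equals minus their difference (the overall sign of a contrast does not affect its test).

```python
means = {"A": 14.0, "B": 9.6, "C": 10.0, "D": 7.4}   # genotype class means

cy_when_me_absent = means["C"] - means["A"]    # C - A
cy_when_me_present = means["D"] - means["B"]   # D - B

# Coefficient rows from the table (order A, B, C, D).
rows = {
    "Cy": [-1, -1, 1, 1],
    "Me": [-1, 1, -1, 1],
    "Cy x Me": [1, -1, -1, 1],
}
values = {name: sum(a * means[g] for a, g in zip(coefs, "ABCD")) for name, coefs in rows.items()}
print(round(cy_when_me_absent, 2), round(cy_when_me_present, 2))
print({name: round(v, 2) for name, v in values.items()})
```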

(26) An interaction plot

[Figure: mean number of bristles (vertical axis, 8 to 14) plotted against mutation Cy (absent, present), with separate lines for mutation Me absent and present.]

(27) Anova with two factors

Instead of setting up one factor (genotype) with four levels, set up two factors (Cy and Me), each with two levels ("present", "absent"). Anova based on the model formula bristles ~ Cy + Me + Cy:Me has three single-d.f. sums of squares: for the average Cy effect, the average Me effect, and the interaction. The F ratios are the squares of the t statistics obtained from the contrasts, and the three sums of squares add up to the "genotypes" sum of squares with 3 d.f. The model formula can also be written bristles ~ Cy * Me.
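A plain-Python sketch of the two claims on this slide, computed from the contrasts rather than from a fitted two-factor model: each single-d.f. sum of squares is $C^2 / \sum a_i^2/n_i$, its F ratio equals the squared contrast t statistic, and the three add up to the genotypes sum of squares. For these balanced data they should correspond to the three lines of the two-factor anova, but the sketch does not call R.

```python
import math

# Bristle counts by genotype class (order A, B, C, D).
flies = {
    "A": [16, 12, 16, 11, 15],
    "B": [8, 12, 7, 9, 12],
    "C": [9, 12, 11, 10, 8],
    "D": [5, 8, 8, 7, 9],
}
n = 5                                             # flies per class
means = {g: sum(v) / n for g, v in flies.items()}
grand = sum(means.values()) / len(means)          # valid because classes are equal-sized
m_w = sum((y - means[g]) ** 2 for g, v in flies.items() for y in v) / (20 - 4)
ss_genotypes = n * sum((m - grand) ** 2 for m in means.values())

# Contrast coefficients for A, B, C, D.
contrasts = {"Cy": [-1, -1, 1, 1], "Me": [-1, 1, -1, 1], "Cy:Me": [1, -1, -1, 1]}
ss, f_ratio, t_squared = {}, {}, {}
for name, a in contrasts.items():
    c = sum(ai * means[g] for ai, g in zip(a, "ABCD"))
    weight = sum(ai ** 2 / n for ai in a)         # sum of a_i^2 / n_i
    ss[name] = c ** 2 / weight                    # single-d.f. sum of squares
    f_ratio[name] = ss[name] / m_w
    t_squared[name] = (c / math.sqrt(m_w * weight)) ** 2
    print(f"{name:6s} SS = {ss[name]:6.2f}  F = {f_ratio[name]:5.2f}  t^2 = {t_squared[name]:5.2f}")

print(f"sum of single-d.f. SS = {sum(ss.values()):.2f}; genotypes SS = {ss_genotypes:.2f}")
```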

END OF LECTURE

Lecture 18. The random effects model 2020

(28) Random effects model

As previously, we have k groups of size n, and the total number of observations is N = nk.

Random effects model: $Y_i = \mu + U_r + e_i$, where r is the group which contains observation i. $U_1, \ldots, U_k$ and $e_1, \ldots, e_N$ are independent random variables, normally distributed with zero mean and

$\mathrm{var}(U) = \sigma_B^2$,  $\mathrm{var}(e) = \sigma_W^2$.

The total variance of a single observation, $\sigma_B^2 + \sigma_W^2$, is partitioned into components $\sigma_B^2$ and $\sigma_W^2$.
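A small simulation can make the variance partition concrete. The parameter values below ($\mu = 10$, $\sigma_B = 2$, $\sigma_W = 1$) are hypothetical, chosen only for illustration. Each replicate draws one group effect and two observations that share it, so the sample variance of one observation should be close to $\sigma_B^2 + \sigma_W^2$, and the covariance between two observations in the same group close to $\sigma_B^2$.

```python
import random

random.seed(1)
mu, sigma_b, sigma_w = 10.0, 2.0, 1.0   # hypothetical parameter values
reps = 100_000

# Each replicate draws one group effect U_r and two observations sharing it:
# Y = mu + U_r + e_i, with U_r ~ N(0, sigma_b^2) and e_i ~ N(0, sigma_w^2).
first, second = [], []
for _ in range(reps):
    u = random.gauss(0.0, sigma_b)
    first.append(mu + u + random.gauss(0.0, sigma_w))
    second.append(mu + u + random.gauss(0.0, sigma_w))

m1 = sum(first) / reps
m2 = sum(second) / reps
var_single = sum((a - m1) ** 2 for a in first) / (reps - 1)
cov_same_group = sum((a - m1) * (b - m2) for a, b in zip(first, second)) / (reps - 1)

print(f"var(Y)                   = {var_single:.2f}  (theory {sigma_b**2 + sigma_w**2:.2f})")
print(f"cov(Y_i, Y_j) same group = {cov_same_group:.2f}  (theory {sigma_b**2:.2f})")
```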

(29) Expected mean squares

    Source           DF      MSQ     E(MSQ)
    Between groups   k − 1   $M_B$   $\sigma_W^2 + n\sigma_B^2$
    Within groups    N − k   $M_W$   $\sigma_W^2$

$F = M_B/M_W$ tests $H_0: \sigma_B^2 = 0$.

$\bar Y$ estimates $\mu$, with variance $(\sigma_W^2 + n\sigma_B^2)/N$. The estimated standard error is $E = \sqrt{M_B/N}$. An interval estimate for $\mu$ is $\bar Y \pm tE$, where $t$ is an upper quantile of the t distribution with k − 1 d.f.
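A Monte Carlo check of the E(MSQ) column and of the variance of $\bar Y$, as a sketch rather than a derivation. The parameter values and design ($\sigma_B = 1.5$, $\sigma_W = 1$, k = 6 groups of n = 4) are hypothetical. Averaged over many simulated data sets, $M_B$ should be near $\sigma_W^2 + n\sigma_B^2 = 10$, $M_W$ near $\sigma_W^2 = 1$, and the variance of $\bar Y$ near $10/24$.

```python
import random

random.seed(2)
mu, sigma_b, sigma_w = 10.0, 1.5, 1.0   # hypothetical parameter values
k, n = 6, 4                             # hypothetical design: 6 groups of 4
N = n * k


def simulate_once():
    """Simulate one data set from Y = mu + U_r + e_i; return (M_B, M_W, Ybar)."""
    groups = []
    for _ in range(k):
        u = random.gauss(0.0, sigma_b)
        groups.append([mu + u + random.gauss(0.0, sigma_w) for _ in range(n)])
    means = [sum(g) / n for g in groups]
    ybar = sum(means) / k                                 # equal group sizes
    m_b = n * sum((m - ybar) ** 2 for m in means) / (k - 1)
    m_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (N - k)
    return m_b, m_w, ybar


reps = 4000
sims = [simulate_once() for _ in range(reps)]
avg_mb = sum(s[0] for s in sims) / reps
avg_mw = sum(s[1] for s in sims) / reps
ybars = [s[2] for s in sims]
mean_ybar = sum(ybars) / reps
var_ybar = sum((y - mean_ybar) ** 2 for y in ybars) / (reps - 1)

print(f"average M_B = {avg_mb:.2f}  (theory {sigma_w**2 + n * sigma_b**2:.2f})")
print(f"average M_W = {avg_mw:.2f}  (theory {sigma_w**2:.2f})")
print(f"var(Ybar)   = {var_ybar:.3f} (theory {(sigma_w**2 + n * sigma_b**2) / N:.3f})")
```

Because $E(M_B) = \sigma_W^2 + n\sigma_B^2$ is exactly $N \cdot \mathrm{var}(\bar Y)$, $M_B/N$ is the natural estimate of $\mathrm{var}(\bar Y)$, which is why the standard error uses $M_B$ and k − 1 d.f.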
