SLIDE 1

PS 405 – Week 4 Section: Difference of means, ANOVA, and Matrix Algebra

D.J. Flynn February 4, 2014

SLIDE 2

t-tests

◮ test whether two group means are equal
◮ hypotheses:

H0: no difference between the two population means
HA: the two population means differ

◮ calculating the t-stat:

$$t = \frac{\text{statistic} - \text{hypothesized difference}}{\text{SE of estimate}}$$
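The t-stat formula can be checked by hand in R. The sketch below is illustrative only: it assumes the built-in mtcars data and compares mpg for automatic and manual cars, a choice made here rather than taken from the slides. It uses the unequal-variance (Welch) standard error, which is what t.test() uses by default.

```r
# Illustrative groups from the built-in mtcars data (not from the slides)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# t = (statistic - hypothesized difference) / SE of estimate
statistic    <- mean(manual) - mean(auto)
hypothesized <- 0
se <- sqrt(var(manual) / length(manual) + var(auto) / length(auto))
t_by_hand <- (statistic - hypothesized) / se

# Same test with the built-in function (Welch test by default)
welch <- t.test(manual, auto)
print(c(by_hand = t_by_hand, t.test = unname(welch$statistic)))
```

The two printed values should agree, which shows that t.test() is doing nothing more than the formula above.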

SLIDE 3

Gender/partisanship example

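Question: are men and women equally likely to be Democrats?

t-stat for difference in proportions:

$$t = \frac{P_m - P_f}{\sqrt{\dfrac{P_m(1-P_m)}{n_m} + \dfrac{P_f(1-P_f)}{n_f}}}$$

where P_m and P_f are the sample proportions of Democrats among men and women, and n_m and n_f are the two group sizes.

The p-value that R reports is for the null of no difference; the confidence interval is for the difference between the two group means (here, proportions).

Interpretation of the p-value: "if the null hypothesis is true, how often would we observe a difference at least this large under repeated sampling?" It is NOT "there is a p% chance that the true difference is equal to X."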
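A minimal sketch of the difference-in-proportions calculation in R. The counts below are hypothetical (the slides do not give the data), and the variable names are chosen here for illustration. The standard error is the unpooled one, with each group's P(1−P) divided by its own sample size.

```r
# Hypothetical counts, invented for illustration (not from the slides)
dem_m <- 180; n_m <- 400   # Democrats among men, number of men
dem_f <- 230; n_f <- 420   # Democrats among women, number of women

p_m <- dem_m / n_m
p_f <- dem_f / n_f

# Unpooled standard error of the difference in proportions
se <- sqrt(p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f)
t_stat <- (p_m - p_f) / se

# Two-sided p-value from the normal approximation
p_value <- 2 * pnorm(-abs(t_stat))

# 95% confidence interval for the difference, by hand and via prop.test()
ci_by_hand <- (p_m - p_f) + c(-1, 1) * qnorm(0.975) * se
pt <- prop.test(c(dem_m, dem_f), c(n_m, n_f), correct = FALSE)

print(c(t = t_stat, p = p_value))
print(rbind(by_hand = ci_by_hand, prop.test = as.numeric(pt$conf.int)))
```

Note that prop.test() reports a chi-squared statistic based on a pooled standard error, so its test statistic is not simply the square of t_stat; the confidence interval, however, uses the same unpooled standard error as above.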

SLIDE 4

Logic of ANOVA and the F test

◮ running theme: experiments with > 2 groups
◮ does assignment to a particular group (X) affect some continuous outcome (Y)?
◮ this question can be answered with one-way ANOVA (AKA F-test)
◮ two sources of variation in DV:
    ◮ intended: independent variable/factor
    ◮ unintended: error/residual
◮ goal of ANOVA: determine share of variance explained by X

SLIDE 5

ANOVA table

◮ Go through table quickly
◮ F statistic (sometimes called F-act):

$$F = \frac{\text{explained variance}}{\text{unexplained variance}} = \frac{MSA}{MSE}$$

◮ look up critical F-stat based on numerator df, denominator df, and confidence level
◮ if F-act > F-critical, then we reject the null of independence
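The F statistic and the critical value can be computed by hand as a check on the logic. This sketch uses the built-in chickwts data (weight by feed type); the variable names and the 95% confidence level are choices made here for illustration.

```r
# One-way ANOVA by hand on the built-in chickwts data
y <- chickwts$weight
g <- chickwts$feed

grand_mean <- mean(y)
group_mean <- ave(y, g)            # each observation's own group mean

ssa <- sum((group_mean - grand_mean)^2)   # explained (between-group) sum of squares
sse <- sum((y - group_mean)^2)            # unexplained (within-group) sum of squares

df_a <- nlevels(g) - 1                    # numerator df
df_e <- length(y) - nlevels(g)            # denominator df

f_act  <- (ssa / df_a) / (sse / df_e)     # MSA / MSE
f_crit <- qf(0.95, df_a, df_e)            # critical F at the 95% confidence level

print(c(F_act = f_act, F_crit = f_crit))
```

Here F-act is far above F-critical, so the null would be rejected; qf() replaces the lookup in a printed F table.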

SLIDE 6

ANOVA in R

1. identify independent and dependent variables
2. determine variable structures (and change if necessary)
3. estimate ANOVA and call up results

SLIDE 7

Determining variable structure

◮ str(variable) returns the structure of a variable: integer, factor, character, numeric, logical
◮ important because ANOVAs are used for categorical IVs
◮ practice:

library(datasets)        # ships with base R, so no install.packages() call is needed
names(chickwts)          # variable names in the data frame
str(chickwts$weight)     # structure of the DV
str(chickwts$feed)       # structure of the IV
levels(chickwts$feed)    # categories of the IV

SLIDE 8

Estimating ANOVAs in R

anova<-aov(weight~feed,data=chickwts)
summary(anova)
            Df Sum Sq Mean Sq F value   Pr(>F)
feed         5 231129   46226   15.37 5.94e-10 ***
Residuals   65 195556    3009

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SLIDE 9

What happens if we instead estimate aov(feed~weight)?

wrong.model<-aov(feed~weight,data=chickwts)
Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : '-' not meaningful for factors

R warns because aov() needs a numeric DV, and feed is a factor.

SLIDE 10

Another example

We have data on which undergraduate institution people attended and mid-life satisfaction (0-100):

names(my.data)
[1] "school"       "satisfaction"
table(my.data$school)
school
fsu  uf  um
  5   5   5
my.anova<-aov(satisfaction~school,data=my.data)
summary(my.anova)
            Df Sum Sq Mean Sq F value  Pr(>F)
school       2   7216    3608   11.85 0.00144 **
Residuals   12   3655     305

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SLIDE 11

fsu<-subset(my.data,school=="fsu")
uf<-subset(my.data,school=="uf")
um<-subset(my.data,school=="um")
mean(fsu$satisfaction)
[1] 92.6
mean(uf$satisfaction)
[1] 39.2
mean(um$satisfaction)
[1] 60.8
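The same group means can be obtained in one call with tapply(). Since my.data is not included in the slides, the sketch below builds a small made-up data frame (toy.data, with invented values) purely to show the pattern.

```r
# Hypothetical stand-in for my.data; values are invented for illustration
toy.data <- data.frame(
  school       = factor(rep(c("fsu", "uf", "um"), each = 3)),
  satisfaction = c(90, 95, 85, 40, 35, 45, 60, 65, 55)
)

# Mean of satisfaction within each level of school
group.means <- tapply(toy.data$satisfaction, toy.data$school, mean)
print(group.means)
```

This avoids creating one subset per group, which becomes tedious as the number of groups grows.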

SLIDE 12

Changing variable structure

◮ Current structure:

is.factor, is.numeric, is.character, is.vector, ... will return TRUE or FALSE

◮ New structure:

as.factor, as.numeric, as.character, as.vector, ... will change object to desired structure
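A short sketch of the is.* and as.* functions on a made-up vector (the values are invented for illustration). It also shows a common pitfall: calling as.numeric() directly on a factor returns the internal level codes, not the original numbers.

```r
# Made-up character vector of numbers
x <- c("10", "20", "30")
is.character(x)                  # TRUE
is.numeric(x)                    # FALSE

# Convert to a factor, then check the structure
f <- as.factor(x)
is.factor(f)                     # TRUE

# Pitfall: this returns the level codes, not 10, 20, 30
codes <- as.numeric(f)

# Convert through character to recover the original values
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Note that the as.* functions return a new object; the original variable changes only if the result is assigned back to it.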

SLIDE 13

Generalizations of the one-way ANOVA

1. two-way ANOVA: if we have more than 1 explanatory factor (e.g., soil type + type of potato = potato yield)
2. ANCOVA: ANOVA with a continuous covariate (e.g., soil type + type of potato + weather = potato yield)
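Both generalizations use the same aov() call with more terms on the right-hand side. The sketch below is illustrative and uses built-in datasets rather than potato data: warpbreaks (two factors, wool and tension) for the two-way ANOVA, and mtcars (factor cyl plus continuous wt) for the ANCOVA. These dataset choices are assumptions made here, not part of the slides.

```r
# Two-way ANOVA: two categorical explanatory factors
two_way <- aov(breaks ~ wool + tension, data = warpbreaks)
print(summary(two_way))

# ANCOVA: one factor plus one continuous covariate
ancova <- aov(mpg ~ factor(cyl) + wt, data = mtcars)
print(summary(ancova))

# Degrees of freedom: (levels - 1) per factor, 1 per continuous covariate
df_two_way <- summary(two_way)[[1]]$Df
df_ancova  <- summary(ancova)[[1]]$Df
```

A factor with k levels uses k − 1 degrees of freedom, while a continuous covariate always uses 1; checking the Df column is a quick way to confirm that each variable entered the model with the intended structure.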

SLIDE 14

Example of two-way ANOVA in R

Does income depend on type of profession and education?

library(car)
names(Prestige)
[1] "education" "income"    "women"     "prestige"  "census"    "type"
str(Prestige$education)
 num [1:102] 13.1 12.3 12.8 11.4 14.6 ...
str(Prestige$type)
 Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 ...

SLIDE 15

summary(Prestige$education)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  6.380   8.445  10.540  10.740  12.650  15.970
Prestige$education.recoded<-recode(Prestige$education,
  "6.38:8.445=1;8.446:10.54=2;10.55:10.74=3;10.75:12.65=4;12.66:15.97=5;else=NA")
table(Prestige$education.recoded)
 1  2  3  4  5
26 25  2 23 26
as.factor(Prestige$education.recoded)
[1] 5 4 5 4 5 5 5 5 5 5 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4
[37] 4 3 4 2 4 4 2 2 2 4 4 4 4 2 4 2 2 2 4 4 4 2 4 1 2 3 2
[73] 1 1 1 2 2 1 1 1 2 2 2 1 1 2 1 2 2 1 1 1 1 1 1 4 2 1 1
Levels: 1 2 3 4 5

Note: as.factor() here only prints the converted values. Because the result is not assigned back, Prestige$education.recoded itself stays numeric.

SLIDE 16

my.two.way<-aov(income~type+education.recoded, data=Prestige)
summary(my.two.way)
                  Df    Sum Sq   Mean Sq F value   Pr(>F)
type               2 5.960e+08 297978078  25.266 1.65e-09 ***
education.recoded  1 2.952e+07  29520188   2.503    0.117
Residuals         94 1.109e+09  11793647

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
4 observations deleted due to missingness

Note: education.recoded uses 1 Df in this table, so it entered the model as a numeric variable rather than as a five-level factor (which would use 4 Df).
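The Df column shows how R treated each variable. In the table above, education.recoded has 1 Df. The sketch below illustrates, on the built-in mtcars data (a stand-in chosen here, since it needs no extra packages), how the same variable gives different degrees of freedom depending on whether it is numeric or a factor.

```r
# cyl takes the values 4, 6, 8 and is stored as numeric in mtcars
fit_numeric <- aov(mpg ~ cyl, data = mtcars)          # treated as one continuous term
fit_factor  <- aov(mpg ~ factor(cyl), data = mtcars)  # treated as a 3-level factor

df_numeric <- summary(fit_numeric)[[1]]$Df   # term Df, residual Df
df_factor  <- summary(fit_factor)[[1]]$Df

print(rbind(numeric = df_numeric, factor = df_factor))
```

To run a two-way ANOVA with a recoded variable as a categorical factor, the conversion has to be stored, for example by assigning the result of as.factor() back to the column before calling aov().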

SLIDE 17

Matrix algebra terms

◮ scalar
◮ vector
◮ matrix

SLIDE 18

Matrix algebra operations

◮ addition
◮ subtraction
◮ multiplication
◮ inverse
◮ transpose
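Each of these operations has a direct counterpart in base R. The sketch below uses two small made-up 2×2 matrices (A and B are invented for illustration).

```r
# Two made-up 2x2 matrices (filled column by column)
A <- matrix(c(2, 1, 1, 3), nrow = 2)
B <- matrix(c(1, 0, 2, 1), nrow = 2)

A_plus_B  <- A + B        # addition (element by element)
A_minus_B <- A - B        # subtraction (element by element)
A_times_B <- A %*% B      # matrix multiplication (note: A * B is element-wise)
A_inv     <- solve(A)     # inverse
B_t       <- t(B)         # transpose

print(A_times_B)
print(A %*% A_inv)        # should be the 2x2 identity matrix
```

The %*% operator is the one to remember: plain * multiplies matching elements and is not matrix multiplication.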

SLIDE 19

Why we care: the linear model

◮ Scalar form:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \epsilon_i$$

◮ Matrix form:

$$Y_i = X_i \beta + \epsilon_i$$

Benefits of matrix form:

1. more parsimonious expression of models with lots of covariates
2. understand what's going on behind the scenes. For example, the parameter β is estimated by calculating $(X^T X)^{-1} X^T y$
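The estimator (XᵀX)⁻¹Xᵀy can be computed directly and compared with lm(). This is a sketch under assumptions made here: it uses the built-in mtcars data with mpg as the outcome and wt and hp as covariates, none of which come from the slides.

```r
# Outcome vector and design matrix (first column of 1s is the intercept)
y <- mtcars$mpg
X <- cbind(intercept = 1, wt = mtcars$wt, hp = mtcars$hp)

# beta_hat = (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# Same model estimated with lm()
fit <- lm(mpg ~ wt + hp, data = mtcars)

print(cbind(by_hand = as.numeric(beta_hat), lm = unname(coef(fit))))
```

The two columns should match. In practice lm() uses a numerically more stable decomposition rather than forming the inverse explicitly, but the result is the same estimator.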

This is the linear model in matrix form:

$$Y_i = X_i \beta + \epsilon_i$$

For each term in this equation...
