Model diagnostics Treatment Comparisons Multiple comparisons
Practical Considerations for ANOVA
Applied Statistics and Experimental Design, Chapter 5
Peter Hoff
Statistics, Biostatistics and the CSSS
University of Washington
Model diagnostics
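Our model is $y_{ij} = \mu_i + \epsilon_{ij}$, where $i$ indexes the treatment group and $j$ the observation within the group. We have shown that if
- A1: the $\epsilon_{ij}$'s are independent;
- A2: $\mathrm{Var}[\epsilon_{ij}] = \sigma^2$ for all $i, j$;
- A3: the $\epsilon_{ij}$'s are normally distributed,

then $F = MST/MSE \sim F_{m-1,\,N-m,\,\lambda}$, where $\lambda$ is the noncentrality parameter. If in addition
- H0: $\mu_i = \mu$ for all $i = 1, \ldots, m$,

then the noncentrality parameter is zero and $F = MST/MSE \sim F_{m-1,\,N-m}$. We make these assumptions to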
- do power calculations when designing a study,
- test hypotheses after having gathered the data, and
- make confidence intervals comparing the different treatments.
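As a rough illustration of the power-calculation use, the sketch below evaluates the power of the F-test from the noncentral F distribution. It assumes a balanced design with n observations per group and takes the noncentrality parameter to be n times the sum of squared deviations of the group means from their average, divided by sigma squared; that formula, the function name `anova_power`, and the group means are assumptions made for the example, not taken from the slides.

```r
# Power of the one-way ANOVA F-test for a balanced design (a sketch).
# Assumption: lambda = n * sum((mu_i - mean(mu))^2) / sigma^2.
anova_power <- function(mu, sigma, n, alpha = 0.05) {
  m <- length(mu)                       # number of treatment groups
  N <- m * n                            # total sample size
  lambda <- n * sum((mu - mean(mu))^2) / sigma^2
  f_crit <- qf(1 - alpha, df1 = m - 1, df2 = N - m)  # cutoff under H0
  # P(F > f_crit) when F follows the noncentral F(m-1, N-m, lambda)
  pf(f_crit, df1 = m - 1, df2 = N - m, ncp = lambda, lower.tail = FALSE)
}

mu_hyp <- c(10, 12, 14)                 # hypothetical group means
power_by_n <- sapply(c(5, 10, 20),
                     function(n) anova_power(mu_hyp, sigma = 4, n = n))
print(round(power_by_n, 3))
```

When all group means are equal, lambda is zero and the function returns the level alpha, which matches the statement that F is then a central F.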
What should we do if the model assumptions are not correct?
Residual analysis
These assumptions could be written more compactly as

A0: $\{\epsilon_{ij}\} \sim$ i.i.d. normal$(0, \sigma^2)$.

However, some assumptions are more important than others. Statistical folklore says the order of importance is A1, A2, then A3. We will discuss A1 in Chapter 6. For now we will talk about A2 and A3.
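One way to see what A0 buys us is to simulate data for which it holds. The sketch below draws i.i.d. normal errors with equal group means and checks how often F = MST/MSE exceeds the 0.95 quantile of the central F distribution. The design (4 groups of 10), the common mean, sigma, and the seed are hypothetical choices for the example.

```r
set.seed(1)                     # arbitrary seed for reproducibility
m <- 4; n <- 10; N <- m * n     # hypothetical design: 4 groups of 10
group <- factor(rep(1:m, each = n))

# One data set with i.i.d. normal(0, sigma^2) errors and equal group means
# (so H0 is true), reduced to its F statistic.
sim_F <- function() {
  y <- 5 + rnorm(N, mean = 0, sd = 2)
  ybar_i <- tapply(y, group, mean)                 # group means
  mst <- n * sum((ybar_i - mean(y))^2) / (m - 1)   # treatment mean square
  mse <- sum((y - ave(y, group))^2) / (N - m)      # error mean square
  mst / mse
}
F_sim <- replicate(5000, sim_F())

# Fraction of simulated F statistics beyond the 0.95 quantile of F(m-1, N-m)
reject_rate <- mean(F_sim > qf(0.95, m - 1, N - m))
print(reject_rate)
```

If the assumptions hold, the printed rate should be near 0.05; replacing `rnorm` with a skewed or unequal-variance error distribution gives a quick feel for how much the level can drift.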
Detecting violations with residuals
Violations of assumptions can be checked via residual analysis.

Parameter estimates:

$$y_{ij} = \bar y_{\cdot\cdot} + (\bar y_{i\cdot} - \bar y_{\cdot\cdot}) + (y_{ij} - \bar y_{i\cdot}) = \hat\mu + \hat\tau_i + \hat\epsilon_{ij}$$

Our fitted value for any observation in group $i$ is $\hat y_{ij} = \hat\mu + \hat\tau_i = \bar y_{i\cdot}$.
Our estimate of the error is $\hat\epsilon_{ij} = y_{ij} - \bar y_{i\cdot}$; $\hat\epsilon_{ij}$ is called the residual for observation $i, j$.
Assumptions about $\epsilon_{ij}$ can be checked by examining the values of the $\hat\epsilon_{ij}$'s.
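The decomposition into grand mean, treatment effect, and residual can be sketched directly in R. The data values and group labels below are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical data: 3 groups with 4 observations each
y <- c(4.1, 5.0, 3.8, 4.6,  6.2, 5.9, 7.1, 6.4,  5.0, 4.7, 5.5, 5.2)
group <- factor(rep(c("a", "b", "c"), each = 4))

mu_hat  <- mean(y)            # grand mean
ybar_i  <- ave(y, group)      # group mean, repeated for each observation
tau_hat <- ybar_i - mu_hat    # estimated effect of each observation's group
res     <- y - ybar_i         # residuals: observation minus its group mean

# The three pieces add back up to the data
decomposition_ok <- isTRUE(all.equal(y, mu_hat + tau_hat + res))

# The same residuals come out of a fitted linear model
fit <- lm(y ~ group)
matches_lm <- isTRUE(all.equal(unname(residuals(fit)), res))
print(c(decomposition_ok, matches_lm))
```

Because each residual is measured from its own group mean, the residuals sum to zero within every group.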
Checking normality assumptions:
Two standard graphical ways of assessing normality are the following:
- Histogram: Make a histogram of the $\hat\epsilon_{ij}$'s.
  - This should look approximately bell-shaped if the population is really normal and there are enough observations.
  - If there are enough observations, graphically compare the histogram to a $N(0, s^2)$ distribution.
  - In small samples, the histograms need not look particularly bell-shaped.
- Normal probability plot, or qq-plot: If $\epsilon_{ij} \sim N(0, \sigma^2)$, then the ordered residuals $(\hat\epsilon_{(1)}, \ldots, \hat\epsilon_{(mn)})$ should correspond linearly with quantiles of a standard normal distribution.
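Both checks can be sketched in base R. The data below are simulated (3 hypothetical groups of 20 with normal errors), and the example takes $s^2$ to be the mean squared error of the fit, which is an assumption since the slides do not define $s$.

```r
set.seed(2)                                     # arbitrary seed
group <- factor(rep(1:3, each = 20))            # hypothetical: 3 groups of 20
y <- c(10, 12, 15)[as.integer(group)] + rnorm(60, sd = 3)

res <- residuals(lm(y ~ group))                 # y_ij minus its group mean
s <- sqrt(sum(res^2) / (length(y) - nlevels(group)))  # assumed: s^2 = MSE

op <- par(mfrow = c(1, 2))
hist(res, freq = FALSE, main = "Residuals", xlab = "residual")
curve(dnorm(x, mean = 0, sd = s), add = TRUE)   # overlay the N(0, s^2) density
qq <- qqnorm(res)                               # ordered residuals vs normal quantiles
qqline(res)                                     # reference line through the quartiles
par(op)
```

Points that track the reference line support A3; systematic curvature or heavy tails in the qq-plot suggest otherwise.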
Checking normality assumptions:
[Figure: three examples, each showing a histogram of y (Density scale) beside a normal qq-plot of Sample Quantiles against Theoretical Quantiles.]
How non-normal can a sample from a normal population look? You can always check yourself by simulating data in R.
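A minimal sketch of such a simulation follows. The sample size of 20, the number of samples, and the seed are arbitrary choices for the example.

```r
set.seed(3)                     # arbitrary seed; change it to see other samples
n <- 20                         # hypothetical small sample size

# Three independent samples from a standard normal population
samples <- replicate(3, rnorm(n), simplify = FALSE)

op <- par(mfrow = c(2, 3))
for (y in samples) hist(y, freq = FALSE, main = "")   # top row: histograms
for (y in samples) {                                   # bottom row: qq-plots
  qqnorm(y, main = "")
  qqline(y)
}
par(op)
```

Rerunning with different seeds and sample sizes shows how ragged histograms and qq-plots can look even when the population is exactly normal.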
Example (Hermit Crab Data):
Is there variability in hermit crab population across six different coastline sites? Counts were made in 25 randomly sampled transects in each of the six sites.

Data: $y_{ij}$ = count total in transect $j$ of site $i$.
Model: $Y_{ij} = \mu + \tau_i + \epsilon_{ij}$

Note that the data are counts, so they cannot be exactly normally distributed.

Data description:

site  sample mean  sample median  sample std dev
1     33.80        17             50.39
2     68.72        10             125.35
3     50.64        5              107.44
4     9.24         2              17.39
5     10.00        2              19.84
6     12.64        4              23.01
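The summary statistics already hint at trouble with A2 and A3. The sketch below only re-enters the six rows of the table (the raw counts are not reproduced here) and computes two simple comparisons; the object name `crab_summary` is made up for the example.

```r
# Site-level summaries copied from the data description table
crab_summary <- data.frame(
  site   = 1:6,
  mean   = c(33.80, 68.72, 50.64, 9.24, 10.00, 12.64),
  median = c(17, 10, 5, 2, 2, 4),
  sd     = c(50.39, 125.35, 107.44, 17.39, 19.84, 23.01)
)

# Mean above the median at every site points to right-skewed counts
mean_exceeds_median <- all(crab_summary$mean > crab_summary$median)

# Ratio of the largest to the smallest sample standard deviation
sd_ratio <- max(crab_summary$sd) / min(crab_summary$sd)

print(mean_exceeds_median)
print(round(sd_ratio, 2))
```

A large spread in the standard deviations bears on the equal-variance assumption A2, while the gap between means and medians bears on normality A3.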
Data description
[Figure: histograms of population (Density scale) for site1 through site6.]
ANOVA for crab data
> anova(lm(crab[,2] ~ as.factor(crab[,1])))
Analysis of Variance Table

Response: crab[, 2]
                      Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(crab[, 1])   5  76695   15339  2.9669 0.01401 *
Residuals            144 744493    5170
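The F value and p-value in the table can be recomputed from the sums of squares and degrees of freedom alone. The sketch below uses only the numbers printed in the table; because those sums of squares are rounded, the results should agree with the table only up to rounding.

```r
# Entries copied from the ANOVA table
ss_trt <- 76695;  df_trt <- 5     # treatment: m - 1 = 6 - 1
ss_res <- 744493; df_res <- 144   # residual:  N - m = 150 - 6

mst <- ss_trt / df_trt            # treatment mean square
mse <- ss_res / df_res            # error mean square
F_stat <- mst / mse

# Upper tail of the central F(5, 144) distribution, valid under A1-A3 and H0
p_val <- pf(F_stat, df_trt, df_res, lower.tail = FALSE)

print(c(MST = mst, MSE = mse, F = F_stat, p = p_val))
```

The p-value is only as trustworthy as assumptions A1 to A3, which is why the residual checks matter for these data.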
Residual diagnostics
$$\hat\epsilon_{ij} = y_{ij} - \hat\mu_i = y_{ij} - \bar y_{i\cdot}$$
[Figure: histogram of the residuals (Density scale).]