Correlation and Regression: Lecture 5
Objectives
Outline: situations when our explanatory variable is more continuous than categorical.
By actively following the lecture and practical and carrying out the independent study, the successful student will be able to:
- Explain the principles of correlation and of regression
- Apply (appropriately), interpret and evaluate the legitimacy of both in R
- Summarise and illustrate test results scientifically, with appropriate R figures
Choosing tests…
Test of differences
Explanatory variables: discrete
Response variable: continuous
Test: t-tests, ANOVA
Test of relationship
Explanatory variables: continuous
Response variable: continuous
Test: regression/correlation
Correlation and Regression
Related but DIFFERENT techniques
- Correlation: association
- Regression: predictive relationship
- Both are linear (always do a scatterplot first!)
Correlation
- Linear association
- No cause and effect
- Axes could be swapped
Regression
- Linear relationship
- Cause and effect (explanatory and response)
- Axes cannot be swapped
Correlation vs regression
Correlation coefficients
- Measures how strong an association is between two variables.
- Several types of correlation coefficient
- Most commonly used parametric correlation coefficient: Pearson’s Product Moment
- Denoted by ‘r’
- Ranges from -1 to +1
Types of correlation
- r ≈ 1: highest scores on one axis associated with highest scores on the other
- r ≈ -1: lowest scores on one axis associated with highest scores on the other
Types of correlation
- r ≈ 0: no correlation!
- r ≈ 0: no LINEAR correlation! (r measures only linear association, so a strong curved relationship can still give r ≈ 0)
How is r calculated?
x = variable 1
y = variable 2
n = sample size
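The formula itself did not survive on this slide. As a minimal sketch, assuming the standard definition of Pearson's r (the sum of the products of deviations from the means, divided by the square root of the product of the two sums of squared deviations), r can be computed by hand and compared with R's built-in `cor()`. The vectors `x` and `y` are made-up illustration values, not the lecture data.

```r
# Hypothetical illustration data (not the lecture's leaf samples)
x <- c(2.1, 3.4, 4.0, 5.2, 6.8, 7.1, 8.5, 9.9)
y <- c(1.8, 2.9, 4.4, 4.1, 6.0, 7.7, 7.9, 9.2)

# Deviations of each value from its variable's mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson's r: sum of cross-products scaled by the two sums of squares
r_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_by_hand
cor(x, y)  # built-in Pearson correlation, for comparison
```

Swapping `x` and `y` leaves the result unchanged, which is the sense in which the axes could be swapped for correlation.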
Correlation: example
20 leaf samples: 10 compounds measured
Investigate correlation between c1 and c2
ALWAYS DO A SCATTER PLOT
Correlation: example
Investigate normality
Not great, but we will carry on for demonstration purposes
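The slide does not say how normality was investigated. One possible approach, sketched here as an assumption, is a normal Q-Q plot together with a Shapiro-Wilk test. The data frame `comp_demo` is simulated stand-in data, not the lecture's `comp`.

```r
set.seed(1)  # make the simulated stand-in data reproducible
comp_demo <- data.frame(c1 = rnorm(20, mean = 10, sd = 2),
                        c2 = rnorm(20, mean = 5, sd = 1))

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(comp_demo$c1)
qqline(comp_demo$c1)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(comp_demo$c1)
sw$p.value
```

With only 20 observations both checks are rough guides, which fits the slide's "not great but we will carry on".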
Correlation: Example
Running and interpreting the test
cor.test(comp$c2, comp$c1, method = "pearson")
data: comp$c2 and comp$c1
t = 2.3132, df = 18, p-value = 0.03274
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.04589555 0.76018365
sample estimates:
cor
0.4786942
- method = c("pearson", "kendall", "spearman"): "pearson" is the default
- t, df, p-value: t-test of whether r is different from zero
- 95 percent confidence interval: r could be between 0.046 and 0.76. A wide margin, but positive
- cor: the correlation coefficient, r
Correlation: Example
Reporting the result
data: comp$c2 and comp$c1
t = 2.3132, df = 18, p-value = 0.03274
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.04589555 0.76018365
sample estimates:
cor
0.4786942
There was a significant positive correlation between compounds 1 and 2 (r = 0.48; t = 2.31; d.f. = 18; p = 0.033).
The report gives the significance, the direction and the statistics.
Correlation: Example
Understanding the significance test
It is a t-test and works in the same way as all t-tests.
Problem: it is sensitive to sample size: big n -> small s.e. -> big t -> small p
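The sample-size sensitivity can be made concrete. This is a sketch, assuming the usual t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The helper `t_from_r` is a hypothetical name introduced for illustration. The first call uses the r and n from the example output; the second keeps r fixed and pretends the sample were ten times larger.

```r
# Hypothetical helper: t statistic and two-sided p-value for Pearson's r
t_from_r <- function(r, n) {
  t <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p <- 2 * pt(abs(t), df = n - 2, lower.tail = FALSE)
  c(t = t, df = n - 2, p = p)
}

res_20  <- t_from_r(0.4786942, 20)   # r and n from the example output
res_200 <- t_from_r(0.4786942, 200)  # same r, ten times the sample size

res_20
res_200
```

The strength of the association is identical in both calls; only the amount of evidence changes, so p shrinks as n grows.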
Correlation: Example
Consider both the p value and the r value
Correlation - nonparametric alternative
Not sure about normality? Spearman’s Rank Correlation
Running and interpreting the test
cor.test(comp$c2, comp$c1, method = "spearman")
Spearman's rank correlation rho
data: comp$c2 and comp$c1
S = 664, p-value = 0.02605
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.5007519
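Spearman's rho is Pearson's r applied to the ranks of the data rather than to the raw values. A minimal sketch with made-up vectors `x` and `y` (chosen to have no ties) shows this, together with the classical no-ties formula rho = 1 - 6S / (n(n^2 - 1)), where S is the sum of squared rank differences. The last line applies the same formula to the S = 664 and n = 20 from the output above as a consistency check.

```r
# Hypothetical illustration data without ties
x <- c(3.1, 1.2, 5.6, 4.4, 2.8, 7.9, 6.3, 9.0)
y <- c(2.0, 1.5, 4.9, 6.1, 3.3, 7.2, 5.5, 8.8)

# Spearman's rho is Pearson's r computed on the ranks
rho_ranks   <- cor(rank(x), rank(y))
rho_builtin <- cor(x, y, method = "spearman")

# Classical no-ties formula: S is the sum of squared rank differences
n <- length(x)
S <- sum((rank(x) - rank(y))^2)
rho_from_S <- 1 - 6 * S / (n * (n^2 - 1))

# The same formula applied to the S and n reported in the example
rho_example <- 1 - 6 * 664 / (20 * (20^2 - 1))
rho_example
```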
Regression
Prediction
One variable (the explanatory) causes the other (the response)
Develops a best fitting straight line: y = b1*x + b0
b1 = gradient
b0 = y intercept
Regression
y = b1*x + b0
yi: the observed y for xi
ŷi: the predicted y for xi
Residual: yi - ŷi
Best fitting: minimising the sum of squared residuals
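To illustrate what "best fitting" means, here is a sketch that computes the gradient and intercept by hand using the standard least-squares estimates and compares them with `lm()`. The vectors `x` and `y` are made-up illustration values, not the stag beetle data.

```r
# Hypothetical illustration data (not the stag beetle data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.7)

# Least-squares estimates: gradient b1 and intercept b0
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Predicted values and residuals (observed minus predicted)
y_hat <- b0 + b1 * x
res <- y - y_hat
ss_res <- sum(res^2)  # the quantity the best-fitting line minimises

# lm() should report the same intercept and gradient
coef(lm(y ~ x))
```

Any other gradient or intercept gives a larger sum of squared residuals for the same data.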
Regression
Null hypothesis can be expressed as:
- b1 = 0
- x does not explain y
- Regression line doesn’t explain variance in y
These all mean the same thing
Regression: example
Concentration of juvenile hormone (JH) and mandible length in stag beetles
ALWAYS DO A SCATTER PLOT
Regression: Example
Running the test
mod <- lm(data = stag, mand ~ jh)
summary(mod)
The model formula is written response ~ explanatory
Interpreting the test
Regression: Example
Summary statistics about residuals
Interpreting the test
b0 (y intercept)
b1 (gradient)
Regression: Example
- t-test of b0 = 0: often not important
- t-test of b1 = 0: always of interest
- Test of ‘model’: same as the t-test of b1 = 0 for single regression
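The equivalence of the model test and the t-test of b1 can be checked directly: with one explanatory variable, the F statistic equals the square of the t statistic for the gradient, and the two p-values agree. The sketch below uses simulated data in a hypothetical data frame `demo`, not the lecture's `stag` data.

```r
set.seed(42)  # reproducible simulated data, not the lecture's stag data
demo <- data.frame(x = runif(16, 0, 150))
demo$y <- 0.4 + 0.03 * demo$x + rnorm(16, sd = 1)

mod_demo <- lm(y ~ x, data = demo)
s <- summary(mod_demo)

t_b1    <- s$coefficients["x", "t value"]   # t-test of b1 = 0
p_b1    <- s$coefficients["x", "Pr(>|t|)"]
f_model <- s$fstatistic[["value"]]          # F-test of the whole model
p_model <- pf(f_model, s$fstatistic[["numdf"]], s$fstatistic[["dendf"]],
              lower.tail = FALSE)

c(t_squared = t_b1^2, F = f_model)
c(p_t = p_b1, p_F = p_model)
```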
Interpreting the test
Regression: Example
R-squared: proportion of the variance in y explained by x
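As a sketch of where this proportion comes from, R-squared can be computed as one minus the residual sum of squares over the total sum of squares. In single regression it also equals the squared Pearson correlation between x and y. The data frame `demo` is simulated for illustration and is not the lecture data.

```r
set.seed(7)  # reproducible simulated data, not the lecture's stag data
demo <- data.frame(x = runif(16, 0, 150))
demo$y <- 0.4 + 0.03 * demo$x + rnorm(16, sd = 1)
mod_demo <- lm(y ~ x, data = demo)

# Variation left over after fitting the line, and total variation in y
ss_res <- sum(residuals(mod_demo)^2)
ss_tot <- sum((demo$y - mean(demo$y))^2)
r2_by_hand <- 1 - ss_res / ss_tot

r2_by_hand
summary(mod_demo)$r.squared  # value reported by summary()
cor(demo$x, demo$y)^2        # squared correlation, single regression only
```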
Regression: Example
Reporting the result
The concentration of juvenile hormone explained a significant amount of the variation (r² = 0.54) in stag beetle mandible length (F = 16.6; d.f. = 1,14; p = 0.00113). The regression line is mandible length = (0.032*Jhconc) + 0.419.
The report gives the significance, the direction and the statistics.
Regression: Example
Illustrating the result
Include the data and the line (the model)
library(ggplot2)
ggplot(data = reg, aes(x = QTL, y = pheno)) +
  geom_point(size = 2) +
  xlim(0, 15) +
  ylim(0, 40) +
  xlab("Number of QTL") +
  ylab("Percentage of phenotype") +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
Regression: Example
Checking assumptions after running the regression
Use the residuals: plot(mod)
- Residuals vs fitted plot: spread should be similar along the fitted values (equal variance)
- Normal Q-Q plot: should be approx 1:1 for normality
- Not that useful here: small dataset
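A sketch of these residual checks on simulated data follows. The `which` argument of `plot()` for an `lm` object selects individual diagnostic plots; the Shapiro-Wilk test on the residuals is an optional numerical companion that the slide does not mention. The data frame `demo` is a hypothetical stand-in for the lecture data.

```r
set.seed(3)  # reproducible simulated data, not the lecture's stag data
demo <- data.frame(x = runif(16, 0, 150))
demo$y <- 0.4 + 0.03 * demo$x + rnorm(16, sd = 1)
mod_demo <- lm(y ~ x, data = demo)

# Residuals vs fitted: look for a similar spread along the x axis
plot(mod_demo, which = 1)

# Normal Q-Q: points near the line suggest normally distributed residuals
plot(mod_demo, which = 2)

# Optional: Shapiro-Wilk test on the residuals
sw_res <- shapiro.test(residuals(mod_demo))
sw_res$p.value
```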
Regression: Example
Predicting from the model
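The content of this slide did not survive. One way to predict from a fitted model, sketched here as an assumption, is `predict()` with a `newdata` data frame whose column name matches the explanatory variable. The data frame `stag_demo` is simulated around the reported line and is not the real `stag` data; the range of `jh` values and the helper `reported_line` are assumptions for illustration.

```r
set.seed(11)  # simulated stand-in data; the jh range 0-150 is an assumption
stag_demo <- data.frame(jh = runif(16, 0, 150))
stag_demo$mand <- 0.419 + 0.032 * stag_demo$jh + rnorm(16, sd = 0.5)
mod_demo <- lm(mand ~ jh, data = stag_demo)

# New explanatory values must be in a data frame with the same column name
new_jh <- data.frame(jh = c(50, 100))
pred <- predict(mod_demo, newdata = new_jh)

# The same predictions by hand from the fitted coefficients
by_hand <- coef(mod_demo)[["(Intercept)"]] + coef(mod_demo)[["jh"]] * new_jh$jh

# Hypothetical helper using the regression line reported in the lecture
reported_line <- function(jh) 0.032 * jh + 0.419
reported_line(100)
```

Predictions are most trustworthy inside the range of explanatory values that were used to fit the model.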
Summary
Both
- Linear
- Scatterplot first
Correlation
- No cause and effect
- Axes could be swapped
- Do not include line
- Quote r and its test
Regression
- Explanatory and response
- Axes cannot be swapped
- Include line
- Quote model test and line, and possibly r²