Correlation and Regression: Lecture 5



slide-1
SLIDE 1

Correlation and Regression

Lecture 5

slide-2
SLIDE 2

Objectives

Outline: Situations when our explanatory variable is more continuous than categorical.

By actively following the lecture and practical and carrying out the independent study the successful student will be able to:

  • Explain the principles of correlation and of regression
  • Apply (appropriately), interpret and evaluate the legitimacy of, both in R
  • Summarise and illustrate with appropriate R figures test results scientifically

slide-3
SLIDE 3

Choosing tests…

Test of differences

  • Explanatory variables: discrete
  • Response variable: continuous
  • Test: t-tests, ANOVA

Test of relationship

  • Explanatory variables: continuous
  • Response variable: continuous
  • Test: regression/correlation

slide-4
SLIDE 4

Correlation and Regression

Related but DIFFERENT techniques

  • Correlation: association
  • Regression: predictive relationship
  • Both are linear (always do a scatterplot first!)

slide-5
SLIDE 5

Correlation vs regression

Correlation

  • Linear association
  • No cause and effect
  • Axes could be swapped

Regression

  • Linear relationship
  • Cause and effect (explanatory and response)
  • Axes cannot be swapped

slide-6
SLIDE 6

Correlation coefficients

  • Measures how strong an association is between two variables
  • Several types of correlation coefficient
  • Most commonly used parametric correlation coefficient: Pearson’s Product Moment
  • Denoted by ‘r’
  • Ranges from -1 to +1

slide-7
SLIDE 7

Types of correlation

  • Positive correlation (r ≈ 1): highest scores on one axis associated with highest scores on the other
  • Negative correlation (r ≈ -1): lowest scores on one axis associated with highest scores on the other

slide-8
SLIDE 8

Types of correlation

slide-9
SLIDE 9

Types of correlation

No correlation!

r ≈ 0

No LINEAR correlation!

r ≈ 0

slide-10
SLIDE 10

How is r calculated?

x = variable 1
y = variable 2
n = sample size
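The formula itself did not survive on this slide, so the following is only a sketch of one standard way to compute Pearson's r from x, y and n. The data are simulated for illustration (they are not the lecture's data), and the names `r_by_hand`, `r_comp` and `r_builtin` are hypothetical.

```r
# Hypothetical data: 20 paired measurements (not the lecture's data)
set.seed(1)
x <- rnorm(20, mean = 10, sd = 2)
y <- 0.5 * x + rnorm(20)
n <- length(x)

# Sum of cross-products of deviations divided by the square root of the
# product of the two sums of squared deviations
r_by_hand <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Equivalent "computational" form written in terms of n
r_comp <- (n * sum(x * y) - sum(x) * sum(y)) /
  sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))

r_builtin <- cor(x, y)  # Pearson is the default method of cor()

all.equal(r_by_hand, r_builtin)
all.equal(r_comp, r_builtin)
```

Both hand calculations should agree with `cor()`; in practice `cor()` or `cor.test()` is what you would actually use.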

slide-11
SLIDE 11

Correlation: example

  • 20 leaf samples: 10 compounds measured
  • Investigate correlation between c1 and c2
  • ALWAYS DO A SCATTER PLOT
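A scatter plot of the two compounds might be drawn as follows. This is a minimal sketch assuming ggplot2 is installed; the data frame `comp` here is simulated as a stand-in for the lecture's data, and the axis labels are assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for the lecture's data frame `comp`
set.seed(1)
comp <- data.frame(c1 = rnorm(20, mean = 5, sd = 1))
comp$c2 <- 0.5 * comp$c1 + rnorm(20)

# For correlation the axes could be swapped, and no line is added
p <- ggplot(data = comp, aes(x = c1, y = c2)) +
  geom_point() +
  xlab("Compound 1") +
  ylab("Compound 2") +
  theme_bw()
p
```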

slide-12
SLIDE 12

Correlation: example

Investigate normality

Not great but we will carry on for demonstration purposes
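The slide does not say how normality was investigated. One possible approach, shown here only as a sketch on simulated data, is a Shapiro-Wilk test together with a normal Q-Q plot for each variable.

```r
# Hypothetical stand-in for the lecture's data frame `comp`
set.seed(1)
comp <- data.frame(c1 = rnorm(20, mean = 5, sd = 1))
comp$c2 <- 0.5 * comp$c1 + rnorm(20)

# Shapiro-Wilk test: a small p-value suggests a departure from normality
sw_c1 <- shapiro.test(comp$c1)
sw_c2 <- shapiro.test(comp$c2)
sw_c1$p.value
sw_c2$p.value

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(comp$c1)
qqline(comp$c1)
```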

slide-13
SLIDE 13

Correlation: Example

Running and interpreting the test

cor.test(comp$c2, comp$c1, method = "pearson")

data: comp$c2 and comp$c1
t = 2.3132, df = 18, p-value = 0.03274
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.04589555 0.76018365
sample estimates:
      cor
0.4786942

  • method: one of c("pearson", "kendall", "spearman"); "pearson" is the default
  • t, df, p-value: t-test of whether r is different from zero
  • 95 percent confidence interval: r could be between 0.046 and 0.76. A wide margin, but positive
  • cor: the correlation coefficient, r

slide-14
SLIDE 14

Correlation: Example

Reporting the result

data: comp$c2 and comp$c1
t = 2.3132, df = 18, p-value = 0.03274
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.04589555 0.76018365
sample estimates:
      cor
0.4786942

There was a significant positive correlation between compounds 1 and 2 (r = 0.48; t = 2.31; d.f. = 18; p = 0.033).

The report states the significance, the direction and the statistics.

slide-15
SLIDE 15

Correlation: Example

Understanding the significance test

It is a t-test and works in the same way as all t-tests.

Problem: it is sensitive to sample size. Big n -> small s.e. -> big t -> small p
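The chain from n to p can be made concrete with the usual t statistic for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The sketch below uses the r and n reported in the cor.test() output for this example; the helper name `r_test` is hypothetical.

```r
# t statistic and two-sided p-value for a Pearson correlation r from n pairs
r_test <- function(r, n) {
  df <- n - 2
  t  <- r * sqrt(df) / sqrt(1 - r^2)
  p  <- 2 * pt(-abs(t), df)
  c(t = t, df = df, p = p)
}

# Values from this example: r = 0.4786942, n = 20
r_test(0.4786942, 20)   # t is about 2.313 and p about 0.033

# Same r with different sample sizes: t grows and p shrinks as n increases
sapply(c(10, 20, 50, 200), function(n) r_test(0.4786942, n))
```

The same r of about 0.48 is not significant at n = 10 but is highly significant at n = 200, which is why both r and p should be considered.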

slide-16
SLIDE 16

Correlation: Example

Consider both the p value and the r value

slide-17
SLIDE 17

Correlation - nonparametric alternative

Not sure about normality? Spearman’s Rank Correlation

Running and interpreting the test

cor.test(comp$c2, comp$c1, method = "spearman")

Spearman's rank correlation rho

data: comp$c2 and comp$c1
S = 664, p-value = 0.02605
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.5007519
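As a check on how the reported S and rho relate: when there are no ties, rho = 1 - 6S / (n(n^2 - 1)), where S is the sum of squared rank differences. The sketch below applies this to the reported S = 664 with n = 20, then shows on simulated data (not the lecture's) that Spearman's rho is Pearson's r computed on ranks.

```r
# rho from the reported S statistic (valid when there are no ties)
n <- 20
S <- 664
rho_from_S <- 1 - 6 * S / (n * (n^2 - 1))
rho_from_S   # about 0.50075, matching the reported rho

# Spearman's rho is Pearson's r computed on the ranks (hypothetical data)
set.seed(1)
x <- rnorm(20)
y <- x + rnorm(20)
all.equal(cor(x, y, method = "spearman"), cor(rank(x), rank(y)))
```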

slide-18
SLIDE 18
slide-19
SLIDE 19

Regression

  • Prediction
  • One variable (the explanatory) causes the other (the response)
  • Develops a best fitting straight line: y = b1x + b0
  • b1 = gradient
  • b0 = y intercept

slide-20
SLIDE 20

Regression

y = b1x + b0

  • yi: the observed y for xi
  • ŷi: the predicted y for xi
  • Residual: yi - ŷi
  • Best fitting: minimising the sum of squared residuals
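For a single explanatory variable, the values of b1 and b0 that minimise the sum of squared residuals have a closed form. The sketch below computes them by hand on simulated data (an assumption, not the lecture's data) and compares them with lm().

```r
# Hypothetical data for illustration
set.seed(1)
x <- runif(15, min = 0, max = 100)
y <- 0.03 * x + 0.4 + rnorm(15, sd = 0.5)

# Closed-form least-squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # gradient
b0 <- mean(y) - b1 * mean(x)                                     # y intercept

y_hat <- b1 * x + b0     # predicted y for each x
res   <- y - y_hat       # residuals: observed minus predicted
sum(res^2)               # the quantity that least squares minimises

# lm() gives the same line
mod <- lm(y ~ x)
all.equal(unname(coef(mod)), c(b0, b1))
```

Any other line, for example one with a slightly different gradient, gives a larger sum of squared residuals.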

slide-21
SLIDE 21

Regression

Null hypothesis can be expressed as:

  • b1 = 0
  • x does not explain y
  • Regression line doesn’t explain variance in y

But all mean the same

slide-22
SLIDE 22

Regression: example

Concentration of juvenile hormone (JH) and mandible length in stag beetles

ALWAYS DO A SCATTER PLOT

slide-23
SLIDE 23

Regression: Example

Running the test

mod <- lm(data = stag, mand ~ jh)
summary(mod)

The formula has the form response ~ explanatory
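To try this call without the lecture's data, a runnable sketch is given below. The `stag` data frame is simulated (the sample size, slope and noise level are assumptions), so the numbers will not match the lecture's output.

```r
# Hypothetical stand-in for the lecture's `stag` data frame
set.seed(1)
stag <- data.frame(jh = runif(16, min = 0, max = 150))
stag$mand <- 0.03 * stag$jh + 0.4 + rnorm(16, sd = 1)

mod <- lm(data = stag, mand ~ jh)   # response ~ explanatory
s <- summary(mod)

coef(s)          # estimates, standard errors, t values and p values
s$r.squared      # proportion of variation in mand explained by jh
s$fstatistic     # F value with its numerator and denominator d.f.
```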

slide-24
SLIDE 24

Interpreting the test

Regression: Example

Summary statistics about residuals

slide-25
SLIDE 25

Interpreting the test

Regression: Example

  • b0 (y intercept): t-test of b0 = 0. Often not important.
  • b1 (gradient): t-test of b1 = 0. Always of interest.
  • Test of ‘model’: same as the t-test of b1 = 0 for single regression.
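The equivalence of the model test and the t-test of b1 can be checked directly: with one explanatory variable, F equals t squared and the two p-values are identical. A minimal sketch on simulated data (a hypothetical stand-in for `stag`):

```r
# Hypothetical stand-in for the lecture's `stag` data frame
set.seed(1)
stag <- data.frame(jh = runif(16, min = 0, max = 150))
stag$mand <- 0.03 * stag$jh + 0.4 + rnorm(16, sd = 1)
s <- summary(lm(mand ~ jh, data = stag))

# t-test of b1 = 0
t_b1 <- coef(s)["jh", "t value"]
p_b1 <- coef(s)["jh", "Pr(>|t|)"]

# F-test of the model
f   <- s$fstatistic
p_f <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

all.equal(t_b1^2, f[["value"]])  # F equals t squared
all.equal(p_b1, p_f)             # so the p-values agree
```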

slide-26
SLIDE 26

Interpreting the test

Regression: Example

Proportion of the variation in y explained by x
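This proportion is the R-squared value, which can be computed as 1 minus the residual sum of squares over the total sum of squares. For a single explanatory variable it also equals the square of Pearson's r. A sketch on simulated data (not the lecture's):

```r
# Hypothetical stand-in for the lecture's `stag` data frame
set.seed(1)
stag <- data.frame(jh = runif(16, min = 0, max = 150))
stag$mand <- 0.03 * stag$jh + 0.4 + rnorm(16, sd = 1)
mod <- lm(mand ~ jh, data = stag)

ss_res <- sum(residuals(mod)^2)                  # variation left unexplained
ss_tot <- sum((stag$mand - mean(stag$mand))^2)   # total variation in y
r2 <- 1 - ss_res / ss_tot

all.equal(r2, summary(mod)$r.squared)        # matches "Multiple R-squared"
all.equal(r2, cor(stag$jh, stag$mand)^2)     # equals r squared here
```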

slide-27
SLIDE 27

Regression: Example

Reporting the result

The concentration of juvenile hormone explained a significant amount of the variation (r² = 0.54) in stag beetle mandible length (F = 16.6; d.f. = 1,14; p = 0.00113). The regression line is mandible length = (0.032 * JH concentration) + 0.419.

The report states the significance, the direction and the statistics.

slide-28
SLIDE 28

Regression: Example

Illustrating result

Include the data and the line (the model)

library(ggplot2)

ggplot(data = reg, aes(x = QTL, y = pheno)) +
  geom_point(size = 2) +
  xlim(0, 15) +
  ylim(0, 40) +
  xlab("Number of QTL") +
  ylab("Percentage of phenotype") +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()

slide-29
SLIDE 29

Regression: Example

Checking assumptions after running the regression

Use the residuals

plot(mod)

  • Residuals vs fitted: spread should be similar across the fitted values (equal variance)
  • Normal Q-Q: should be approx 1:1 for normality
  • Not that useful here: small dataset
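The two diagnostic plots most relevant here can be requested individually with the `which` argument of plot(). This is a sketch on simulated data; the Shapiro-Wilk test on the residuals is an optional extra that the slide does not mention.

```r
# Hypothetical stand-in for the lecture's `stag` data frame
set.seed(1)
stag <- data.frame(jh = runif(16, min = 0, max = 150))
stag$mand <- 0.03 * stag$jh + 0.4 + rnorm(16, sd = 1)
mod <- lm(mand ~ jh, data = stag)

plot(mod, which = 1)   # residuals vs fitted: look for an even spread
plot(mod, which = 2)   # normal Q-Q: points should follow the line

# Optional formal check of the normality of the residuals
sw <- shapiro.test(residuals(mod))
sw$p.value
```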

slide-30
SLIDE 30

Regression: Example

Predicting from the model
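The slide's own code is not preserved. One way to predict from a fitted model is predict() with a `newdata` data frame whose column name matches the explanatory variable. The sketch below uses simulated data, and the JH concentrations in `new_jh` are hypothetical.

```r
# Hypothetical stand-in for the lecture's `stag` data frame
set.seed(1)
stag <- data.frame(jh = runif(16, min = 0, max = 150))
stag$mand <- 0.03 * stag$jh + 0.4 + rnorm(16, sd = 1)
mod <- lm(mand ~ jh, data = stag)

# New JH concentrations to predict for (hypothetical values)
new_jh <- data.frame(jh = c(25, 50, 100))
pred <- predict(mod, newdata = new_jh)
pred

# The same predictions from the fitted line y = b1x + b0
b <- coef(mod)
manual <- b[["(Intercept)"]] + b[["jh"]] * new_jh$jh
all.equal(unname(pred), manual)

# With the line reported in this lecture,
# mandible length = (0.032 * JH concentration) + 0.419,
# a JH concentration of 50 gives:
0.032 * 50 + 0.419   # 2.019
```

Predictions are only trustworthy within the range of explanatory values that were observed.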

slide-31
SLIDE 31

Summary

Both

  • Linear
  • Scatterplot first

Correlation

  • No cause and effect
  • Axes could be swapped
  • Do not include line
  • Quote r and its test

Regression

  • Explanatory and response
  • Axes cannot be swapped
  • Include line
  • Quote model test and line and possibly r²

slide-32
SLIDE 32

Objectives

Outline: Situations when our explanatory variable is more continuous than categorical.

By actively following the lecture and practical and carrying out the independent study the successful student will be able to:

  • Explain the principles of correlation and of regression
  • Apply (appropriately), interpret and evaluate the legitimacy of, both in R
  • Summarise and illustrate with appropriate R figures test results scientifically