
Course 02402 Introduction to Statistics
Lecture 11: Regression Analysis (Chapter 11)
Per Bruun Brockhoff

DTU Informatics, Building 305 - room 110
Technical University of Denmark, 2800 Lyngby, Denmark
e-mail: pbb@imm.dtu.dk

Per Bruun Brockhoff (pbb@imm.dtu.dk) Introduction to Statistics, Lecture 11 Fall 2012 1 / 32

Overview

1. Running example: Height and weight
2. Correlation
3. Regression Analysis (Chapter 11)
4. The Method of Least Squares
5. Inferences for the Regression Model
   - Inference for intercept and slope
   - Confidence interval for the line
   - Prediction interval for the line
6. Correlation and Regression
7. R (R note 10)

Running example: Height and weight

Height and weight of young men
X = Height, Y = Weight, n = 10

Correlation

The correlation coefficient r describes the strength of the linear relationship between the variables x and y. The correlation coefficient between two variables x and y is estimated as

    r = 1/(n − 1) · Σ_{i=1}^n ((xi − x̄)/sx) · ((yi − ȳ)/sy)

It is assumed that the data points (xi, yi) are values of a pair of random variables. The following holds: r ∈ [−1, 1].
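As an illustration of this formula (the course's examples use R; this is a small Python sketch, and the data here are made up purely for illustration):

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r = 1/(n-1) * sum(((xi - xbar)/sx) * ((yi - ybar)/sy))."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)          # sample standard deviations (n - 1 divisor)
    return sum((xi - xbar) / sx * (yi - ybar) / sy
               for xi, yi in zip(x, y)) / (n - 1)

# made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)                     # about 0.775, and always in [-1, 1]
```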



Correlation computations

Regression Analysis (Chapter 11)

We assume that Y is a stochastic variable and are interested in modelling Y's dependency on an explanatory variable x. We consider a linear relationship between Y and x, that is, a regression model of the form

    Y = α + βx + ε


Simple Linear Regression

    Y = α + βx + ε

where α + βx is the model part and ε the residual:

    Y   dependent variable
    x   independent variable
    α   intercept with the Y-axis
    β   slope
    ε   residual (random error)


[Figure: data points with the fitted line, illustrating simple linear regression]



The Method of Least Squares

Assume we have the following observations:

    x |  1 |  2 |  3 |  4 |  5 |  6 |  7  |  8  |  9  |  10 |  11 |  12
    y | 16 | 35 | 45 | 64 | 86 | 96 | 106 | 124 | 134 | 156 | 164 | 182

Is there a relationship between x and y? We propose a model of the form

    ŷ = a + bx

How do we estimate a and b?


[Figure: scatter plot of x vs. Y]


[Figure: scatter plot of x vs. Y with the fitted regression line]


We define

    Sxx = Σ_{i=1}^n (xi − x̄)²
    Syy = Σ_{i=1}^n (yi − ȳ)²
    Sxy = Σ_{i=1}^n (xi − x̄)(yi − ȳ)
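These sums of squares are straightforward to compute directly; a minimal Python sketch using the example data from the previous slides (the course itself uses R):

```python
from statistics import mean

# the example data from the slides
x = list(range(1, 13))
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]

xbar, ybar = mean(x), mean(y)
Sxx = sum((xi - xbar) ** 2 for xi in x)                        # 143
Syy = sum((yi - ybar) ** 2 for yi in y)                        # about 31533
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # 2119
```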



a and b are determined by

    b = Sxy/Sxx
    a = ȳ − b·x̄

a and b are the values giving the regression line that minimizes the sum of squared vertical distances between the points and the line. a is an estimate of α and b is an estimate of β.


In the example we get

    Sxx = Σ_{i=1}^n (xi − x̄)² = 143
    Syy = Σ_{i=1}^n (yi − ȳ)² = 31533
    Sxy = Σ_{i=1}^n (xi − x̄)(yi − ȳ) = 2119

along with x̄ = 6.50 and ȳ = 100.67.


Estimates of α and β:

    b = Sxy/Sxx = 2119/143 = 14.82
    a = ȳ − b·x̄ = 100.67 − 14.82 · 6.50 = 4.34

The fitted model is: ŷ = 4.34 + 14.82 · x
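The whole least-squares fit above can be reproduced in a few lines; a Python sketch of the same computation (the course itself uses R's lm for this):

```python
from statistics import mean

x = list(range(1, 13))
y = [16, 35, 45, 64, 86, 96, 106, 124, 134, 156, 164, 182]

xbar, ybar = mean(x), mean(y)
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b = Sxy / Sxx        # slope estimate, 2119/143, about 14.82
a = ybar - b * xbar  # intercept estimate, about 4.35 (the slide's 4.34 comes from rounding b first)
yhat = [a + b * xi for xi in x]   # fitted values on the line
```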

Inferences for the Regression Model

We assume that the observed data (Yi, xi) can be described by the model

    Yi = α + βxi + εi

where the εi are assumed to be independent, normally distributed stochastic variables with mean 0 and constant variance σ². An estimate of σ² is

    se² = (Syy − Sxy²/Sxx) / (n − 2)
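Plugging in the worked-example numbers gives the residual variance directly; a Python sketch using the slide's values:

```python
# values from the worked example on the slides
n = 12
Sxx, Syy, Sxy = 143.0, 31533.0, 2119.0

se2 = (Syy - Sxy ** 2 / Sxx) / (n - 2)   # estimate of sigma^2, about 13.3
se = se2 ** 0.5                          # residual standard deviation, about 3.65
```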



Inference for intercept and slope

We want to test hypotheses about the intercept with the y-axis:

    H0: α = α0    vs.    H1: α ≠ α0

The test statistic is

    t = (a − α0)/se · √( n·Sxx / (Sxx + n·x̄²) )

The critical value is found in the t-distribution, t_{α/2}(n − 2).


We want to test a hypothesis about the slope β:

    H0: β = β0    vs.    H1: β ≠ β0

The test statistic is

    t = (b − β0)/se · √Sxx

The critical value is found in the t-distribution, t_{α/2}(n − 2).
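For the running example, the slope test against H0: β = 0 can be sketched in Python (the quantile t_{0.025}(10) = 2.228 is hard-coded, since Python's standard library has no t-distribution; the course would use R for this):

```python
n = 12
Sxx, Syy, Sxy = 143.0, 31533.0, 2119.0   # worked-example values from the slides

b = Sxy / Sxx
se = ((Syy - Sxy ** 2 / Sxx) / (n - 2)) ** 0.5

t = (b - 0) / (se / Sxx ** 0.5)   # test statistic for H0: beta = 0, about 48.5
t_crit = 2.228                    # t_{0.025}(n - 2) = t_{0.025}(10)
reject = abs(t) > t_crit          # True: the slope is clearly nonzero
```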


Confidence Intervals for α and β

Confidence interval for α:

    a ± t_{α/2} · se · √(1/n + x̄²/Sxx)

Confidence interval for β:

    b ± t_{α/2} · se / √Sxx

Confidence interval for the line

Confidence Interval for α + βx0

A confidence interval for α + βx0 corresponds to a confidence interval for the line at the point x0:

    (a + b·x0) ± t_{α/2} · se · √( 1/n + (x0 − x̄)²/Sxx )
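For example, a 95% confidence interval for the line at x0 = 6.5 (the mean of x in the running example), sketched in Python with t_{0.025}(10) = 2.228 hard-coded:

```python
n = 12
Sxx, Syy, Sxy = 143.0, 31533.0, 2119.0   # worked-example values from the slides
xbar, ybar = 6.5, 100.67

b = Sxy / Sxx
a = ybar - b * xbar
se = ((Syy - Sxy ** 2 / Sxx) / (n - 2)) ** 0.5
tq = 2.228                               # t_{0.025}(10)

x0 = 6.5                                 # point at which we want the interval
half = tq * se * (1 / n + (x0 - xbar) ** 2 / Sxx) ** 0.5
ci = (a + b * x0 - half, a + b * x0 + half)   # about (98.3, 103.0)
```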



Prediction Interval for the line

Prediction Interval for α + βx0

A prediction interval for α + βx0 corresponds to a prediction interval for the model (a new observation of Y) at the point x0:

    (a + b·x0) ± t_{α/2} · se · √( 1 + 1/n + (x0 − x̄)²/Sxx )

For a fixed significance level α, the prediction interval is wider than the confidence interval.
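The difference between the two intervals at the same point can be seen numerically; a Python sketch under the same assumptions as before (worked-example values, t quantile 2.228 hard-coded):

```python
n = 12
Sxx, Syy, Sxy = 143.0, 31533.0, 2119.0   # worked-example values from the slides
xbar = 6.5

se = ((Syy - Sxy ** 2 / Sxx) / (n - 2)) ** 0.5
tq = 2.228                               # t_{0.025}(10)

x0 = 6.5
half_ci = tq * se * (1 / n + (x0 - xbar) ** 2 / Sxx) ** 0.5       # about 2.35
half_pi = tq * se * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx) ** 0.5   # about 8.47
# the extra "1 +" term is the variance of one new observation,
# so the prediction interval is always the wider of the two
```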

Correlation and Regression

Correlation coefficient and slope:

    r = √(Sxx/Syy) · b,    r² = (Sxx/Syy) · b²

The correlation r describes the strength of a linear relation. The squared correlation r² expresses the proportion of the variability in y explained by the linear relation:

    Syy = variation explained by the line + unexplained variation
    Syy = Sxy²/Sxx + (Syy − Sxy²/Sxx)
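The identities above are easy to verify numerically for the running example; a Python sketch:

```python
Sxx, Syy, Sxy = 143.0, 31533.0, 2119.0   # worked-example values from the slides

b = Sxy / Sxx
r = (Sxx / Syy) ** 0.5 * b      # same as Sxy / sqrt(Sxx * Syy), about 0.998
r2 = r ** 2                     # proportion of y-variability explained

explained = Sxy ** 2 / Sxx      # variation explained by the line
unexplained = Syy - explained   # residual variation
# decomposition: Syy == explained + unexplained, and r2 == explained / Syy
```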


Inference for Correlation

It is assumed that both y and x are stochastic (NOT only y). r is an estimate of ρ, the true correlation between y and x. Pages 340-341 (7th ed.: 380-381) give formulae for hypothesis tests and confidence intervals for the correlation coefficient. ρ = 0 corresponds to β = 0, and r = 0 corresponds to b = 0, so a hypothesis test for ρ = 0 can be carried out by testing β = 0.


R (R note 10)

> fit.evap <- lm(evap ~ velocity)
> summary(fit.evap)

Call: lm(formula = evap ~ velocity)

Residuals:
     Min       1Q   Median      3Q     Max
  -0.201  -0.1467  0.05261  0.1232  0.1747

Coefficients:
              Value  Std. Error  t value  Pr(>|t|)
(Intercept)  0.0692      0.1010   0.6857    0.5123
velocity     0.0038      0.0004   8.7460    0.0000

Residual standard error: 0.1591 on 8 degrees of freedom
Multiple R-Squared: 0.9053
F-statistic: 76.49 on 1 and 8 degrees of freedom, the p-value is 2.286e-05

