SLIDE 1
Marcel Dettling, Zurich University of Applied Sciences

Applied Statistical Regression

HS 2011 – Week 03

Marcel Dettling

Institut für Datenanalyse und Prozessdesign, Zürcher Hochschule für Angewandte Wissenschaften

marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling

ETH Zürich, October 10, 2011

SLIDE 2

Simple Linear Regression

Example: In India, it was observed that alkaline soil hampers plant growth. This gave rise to a search for tree species which show high tolerance against these conditions. An outdoor trial was performed, where 120 trees of a particular species were planted on a big field with considerable soil pH-value variation. After 3 years of growth, every tree's height was measured. Additionally, the pH-value of the soil in the vicinity of each tree was determined and recorded.

SLIDE 3

Scatterplot: Tree Height vs. pH-value

[Figure: scatterplot "Tree Height vs. pH-Value"]

SLIDE 4

Systematic Relation

What is a good description of the systematic relation between pH-value and tree height?

1) a line connecting all the data points?

[Figure: "Tree Height vs. pH-Value"]

SLIDE 5

Systematic Relation

What is a good description of the systematic relation between pH-value and tree height?

1) a line connecting all the data points?
2) a smooth line that tries to follow the data?

[Figure: "Tree Height vs. pH-Value"]

SLIDE 6

Systematic Relation

What is a good description of the systematic relation between pH-value and tree height?

1) a line connecting all the data points?
2) a smooth line that tries to follow the data?
3) a straight line?

[Figure: "Tree Height vs. pH-Value"]

SLIDE 7

Simple Linear Regression

The higher the pH-value, the smaller the trees tend to be. The relation seems to be linear, which is of course also the mathematically most simple way of describing the relation:

f(x) = β₀ + β₁·x,  resp.  height = β₀ + β₁·(pH-value)

Name/meaning of the two parameters in the equation: β₀ is the "Intercept", β₁ is the "Slope". Fitting a straight line into a 2-dimensional scatterplot is known as simple linear regression. This is because:

  • there is just one single predictor variable ("simple").
  • the relation is linear in the parameters ("linear").

SLIDE 8

Model, Data & Random Errors

Now we are bringing the data into play. The regression line will not run through all the data points. Thus, there are random errors:

y_i = β₀ + β₁·x_i + E_i,  for all i = 1, ..., n

Meaning of the variables/parameters:

  • y_i is the response variable (height) of observation i.
  • x_i is the predictor variable (pH-value) of observation i.
  • β₀, β₁ are the regression coefficients. They are unknown previously, and need to be estimated from the data.
  • E_i is the residual or error, i.e. the random difference between observation and regression line.
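As a sketch of this data-generating mechanism (Python here rather than the course's R; the parameter values β₀ = 28.7, β₁ = −3.0, σ_E = 1.0 are made up for illustration), one could simulate from the model:

```python
import random

# Hypothetical "true" parameters -- illustrative, not estimated from the data
beta0, beta1, sigma_E = 28.7, -3.0, 1.0
n = 120                                   # one observation per tree

random.seed(1)                            # reproducible example
x = [7.0 + 1.5 * random.random() for _ in range(n)]   # pH-values in [7.0, 8.5]
# y_i = beta0 + beta1 * x_i + E_i,  with E_i ~ N(0, sigma_E^2)
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma_E) for xi in x]
```

Each simulated y_i scatters around the line β₀ + β₁·x_i with standard deviation σ_E.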

SLIDE 9

Least Squares Fitting

Interactive demo: http://hspm.sph.sc.edu/courses/J716/demos/LeastSquares/LeastSquaresDemo.html

We need a straight line that fits the data well. Many possible solutions exist, some are good, some are worse. Our paradigm is to fit the line such that the squared errors are minimal.

SLIDE 10

Least Squares: Mathematics

The paradigm in formulas: given a set of data points (x_i, y_i), i = 1, ..., n, the goal is to fit the regression line such that the sum of squared differences between observed value and regression line is minimal:

Q(β₀, β₁) = Σ_{i=1}^{n} r_i² = Σ_{i=1}^{n} (y_i − ŷ_i)² = Σ_{i=1}^{n} (y_i − (β₀ + β₁·x_i))² → min!

The function Q measures how well the regression line, defined by (β₀, β₁), fits the data. The goal is to minimize this function. Solution: see next slide...

SLIDE 11

Solution Idea: Partial Derivatives

  • We take partial derivatives of the function Q(β₀, β₁) with respect to both arguments β₀ and β₁. As we are after the minimum of the function, we set them to zero:

    ∂Q/∂β₀ = 0  and  ∂Q/∂β₁ = 0

  • This results in a linear equation system, which (here) has two unknowns β₀, β₁, but also two equations. These are also known under the name normal equations.

  • The solution for β₀, β₁ can be written explicitly as a function of the data pairs (x_i, y_i), i = 1, ..., n, see next slide...

SLIDE 12

Least Squares: Solution

According to the least squares paradigm, the best fitting regression line is the one with the optimal coefficients:

β̂₁ = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²  and  β̂₀ = ȳ − β̂₁·x̄

  • For a given set of data points (x_i, y_i), i = 1, ..., n, we can determine the solution with a pocket calculator (...or better, with R).
  • The solution for our example "Tree Height":

    > lm(height ~ phvalue, data=treeheight)

    β̂₀ = 28.723,  β̂₁ = −3.003
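The closed-form solution can be sketched in a few lines (Python here instead of R; the six data points are made up for illustration, not the tree-height data):

```python
# Least-squares estimates from the closed-form solution above,
# on a small made-up data set.
x = [7.2, 7.5, 7.8, 8.0, 8.3, 8.6]
y = [5.1, 4.4, 3.9, 3.1, 2.6, 1.8]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# beta1_hat = sum (x_i - x_bar)(y_i - y_bar) / sum (x_i - x_bar)^2
beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
# beta0_hat = y_bar - beta1_hat * x_bar, so the line runs through (x_bar, y_bar)
beta0_hat = y_bar - beta1_hat * x_bar
```

In R, `lm(y ~ x)` computes exactly these two quantities (plus much more).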

SLIDE 13

Least Squares Regression Line

[Figure: "Tree Height vs. pH-Value" with the least squares regression line]

SLIDE 14

Is This a Good Model for Predicting the Tree Height from the Soil pH-Value?

a) Beyond the range of observed data: unknown, but most likely not...
b) Within the range of observed data: yes, under the following conditions:

  • the relation is in truth a straight line, i.e. E[E_i] = 0
  • the scatter of the errors is constant, i.e. Var(E_i) = σ_E²
  • the data are uncorrelated (from a representative sample)
  • the errors are approximately normally distributed

Food for thought: irrigation, shaded corners...?

SLIDE 15

Model Diagnostics

For assessing the quality of the regression line, we need to (at least roughly) check whether the assumptions are met. E[E_i] = 0 and Var(E_i) = σ_E² can be reviewed by residual plots:

[Figures: "Residuals vs. pH-Value" and "Residuals vs. Fitted Values"]

SLIDE 16

Model Diagnostics

For assessing the quality of the regression line, we need to (at least roughly) check whether the assumptions are met. The Gaussian distribution of the errors can be reviewed by a normal plot:

[Figure: "Normal Plot" — residuals vs. quantiles of the Gaussian distribution]

We will revisit model diagnostics later in this course, where they will be discussed more deeply. "Residuals vs. Fitted" and the "Normal Plot" will always stay at the heart of model diagnostics.

SLIDE 17

Why Least Squares?

History... Within a few years (1801, 1805), the method was developed independently by Gauss and Legendre. Both were after solving applied problems in astronomy...

Source: http://de.wikipedia.org/wiki/Methode_der_kleinsten_Quadrate

Carl Friedrich Gauss


Adrien-Marie Legendre

SLIDE 18

Why Least Squares?

Mathematics...

  • Least squares is simple in the sense that the solution β̂₀, β̂₁ is known in closed form as a function of the data (x_i, y_i), i = 1, ..., n.
  • The line runs through the center of gravity (x̄, ȳ).
  • The sum of residuals adds up to zero: Σ_{i=1}^{n} r_i = 0.
  • Some deeper mathematical optimality can be shown when analyzing the large sample properties of the estimates β̂₀, β̂₁. This is especially true under the assumption of normally distributed errors E_i.

SLIDE 19

Gauss-Markov-Theorem

Mathematical optimality result for the least squares line. It only holds if the following conditions are met:

  • the relation is in truth a straight line, i.e. E[E_i] = 0
  • the scatter of the errors is constant, i.e. Var(E_i) = σ_E²
  • the errors are uncorrelated, i.e. Cov(E_i, E_j) = 0 for i ≠ j

Not yet required:

  • the errors are normally distributed: E_i ~ N(0, σ_E²)

Gauss-Markov-Theorem:

  • Least squares yields the best linear unbiased estimates.

SLIDE 20

Properties of the Least Square Estimates

Under the conditions above, the estimates are unbiased:

E[β̂₀] = β₀  and  E[β̂₁] = β₁

The variances of the estimates are as follows:

Var(β̂₀) = σ_E² · (1/n + x̄² / Σ_{i=1}^{n} (x_i − x̄)²)  and  Var(β̂₁) = σ_E² / Σ_{i=1}^{n} (x_i − x̄)²

Precise estimates are obtained with:

  • a large number of observations n
  • a good scatter in the predictor x_i
  • an informative/useful predictor, making σ_E² small

SLIDE 21

R²: How Useful is the Regression Line?

Intuitively: the smaller the yellow range is compared to the blue one, the more useful the model is.

[Figure: "Tree Height vs. pH-Value"]

SLIDE 22

Coefficient of Determination

The "predictivity" of a regression line can be measured with R², named coefficient of determination. It is the ratio between the yellow and blue range, and tells the portion of variation that is explained by the regression line:

R² = Σ_{i=1}^{n} (ŷ_i − ȳ)² / Σ_{i=1}^{n} (y_i − ȳ)² ∈ [0, 1]

What is a good value for R²? In observational studies, a value of 0.6 can mostly be considered as good. There are no formal criteria for judging this, however...
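Since for a least squares fit with intercept the total variation splits into explained plus residual variation, R² can equivalently be computed as explained/total or as 1 − RSS/TSS. A small Python sketch with made-up data:

```python
# R^2 computed both as "explained / total variation" and as 1 - RSS/TSS;
# the two must agree for a least squares fit with intercept. Data are made up.
x = [7.2, 7.5, 7.8, 8.0, 8.3, 8.6]
y = [5.1, 4.4, 3.9, 3.1, 2.6, 1.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                     # total variation
r2_explained = sum((yh - y_bar) ** 2 for yh in y_hat) / tss  # explained / total
r2_one_minus = 1 - sum((yi - yh) ** 2
                       for yi, yh in zip(y, y_hat)) / tss    # 1 - RSS/TSS
```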

SLIDE 23

Inference on the Slope

Aim: is the relation between response and predictor significant? Here, we require E_i ~ N(0, σ_E²), i.i.d.

We are then testing the following null hypothesis:

H₀: β₁ = 0  vs.  H_A: β₁ ≠ 0

Test statistic:

T = β̂₁ / se(β̂₁),  where se(β̂₁) = sqrt(σ̂_E² / Σ_{i=1}^{n} (x_i − x̄)²),  and T ~ t_{n−2} under H₀.

SLIDE 24

Estimating the Error Variance

Besides the regression coefficients, we also need to estimate the error variance. We require it for doing inference on the estimated parameters. The estimate is based on the residual sum of squares (abbreviation: RSS), in particular:

σ̂_E² = (1 / (n − 2)) · Σ_{i=1}^{n} (y_i − ŷ_i)²

  • this is (almost) the "usual" variance estimator!
  • be careful, the R output shows σ̂_E, and not σ̂_E²
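A Python sketch of the estimator on made-up data: note the n − 2 in the denominator, and how the slope's t-statistic from the previous slide builds directly on σ̂_E²:

```python
# Error-variance estimate with the n-2 denominator, and the resulting
# t-statistic for the slope (small made-up data set, not the tree data).
x = [7.2, 7.5, 7.8, 8.0, 8.3, 8.6]
y = [5.1, 4.4, 3.9, 3.1, 2.6, 1.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma2_hat = rss / (n - 2)             # estimate of sigma_E^2 (denominator n-2!)
se_b1 = (sigma2_hat / sxx) ** 0.5      # standard error of the slope
T = b1 / se_b1                         # ~ t_{n-2} under H0: beta1 = 0
```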

SLIDE 25

Output of Statistical Software Packages

> summary(fit)

Call:
lm(formula = height ~ phvalue, data = treeheight)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  28.7227     2.2395   12.82   <2e-16 ***
phvalue      -3.0034     0.2844  -10.56   <2e-16 ***

Residual standard error: 1.008 on 121 degrees of freedom
Multiple R-squared: 0.4797, Adjusted R-squared: 0.4754
F-statistic: 111.5 on 1 and 121 DF, p-value: < 2.2e-16

SLIDE 26

Prediction

The regression line can now be used for predicting the target value at an arbitrary (new) predictor value x*. We simply plug in:

ŷ* = β̂₀ + β̂₁·x*

Example: for a pH-value of 8.0, we expect a tree height of

28.7227 + (−3.0034 · 8.0) = 4.6955

A word of caution: doing interpolation is usually fine, but extrapolation (i.e. giving the tree height for pH-value 5.0) is generally "dangerous".
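Checking the arithmetic with the coefficients from the R output (a small Python sketch):

```python
# Plugging the fitted coefficients into the prediction equation
# y* = beta0_hat + beta1_hat * x*:
beta0_hat, beta1_hat = 28.7227, -3.0034
x_star = 8.0
y_star = beta0_hat + beta1_hat * x_star   # expected tree height at pH 8.0
```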

SLIDE 27

Confidence and Prediction Intervals

95% confidence interval (this is for the fitted value!):

β̂₀ + β̂₁·x* ± t_{0.975, n−2} · σ̂_E · sqrt(1/n + (x* − x̄)² / Σ_{i=1}^{n} (x_i − x̄)²)

95% prediction interval (this is for future observations!):

β̂₀ + β̂₁·x* ± t_{0.975, n−2} · σ̂_E · sqrt(1 + 1/n + (x* − x̄)² / Σ_{i=1}^{n} (x_i − x̄)²)

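A Python sketch of both interval half-widths at x* = 8.0 on made-up data; to stay dependency-free, the quantile t_{0.975,4} ≈ 2.776 is hard-coded (an assumption for n − 2 = 4 df). The prediction interval is wider because of the extra "1 +" under the square root:

```python
# Half-widths of the 95% confidence and prediction intervals at x*.
x = [7.2, 7.5, 7.8, 8.0, 8.3, 8.6]
y = [5.1, 4.4, 3.9, 3.1, 2.6, 1.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sigma_hat = (sum((yi - (b0 + b1 * xi)) ** 2
                 for xi, yi in zip(x, y)) / (n - 2)) ** 0.5

t_q = 2.776                                 # t_{0.975, 4}, hard-coded
x_star = 8.0
leverage = 1 / n + (x_star - x_bar) ** 2 / sxx
ci_half = t_q * sigma_hat * leverage ** 0.5         # for the fitted value
pi_half = t_q * sigma_hat * (1 + leverage) ** 0.5   # for a new observation
```

In R, `predict(fit, newdata, interval="confidence")` and `interval="prediction"` produce the corresponding intervals directly.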
SLIDE 28

Confidence and Prediction Intervals

[Figure: "Tree Height vs. pH-Value" with confidence and prediction bands]

SLIDE 29

Regression: More Than Straight Lines

[Figure: "Curvilinear Relation", yy vs. xx]

A straight line is not a good fit at all in this problem. Simple linear regression is still appropriate, though!

SLIDE 30

Regression: More Than Straight Lines

[Figure: "Model Diagnostics: Residuals vs. Fitted"]

Constant scatter, i.e. Var(E_i) = σ_E², but non-zero expectation, i.e. E[E_i] ≠ 0.

SLIDE 31

Regression: More Than Straight Lines

[Figure: "Curvilinear Relation", yy vs. xx]

Y_i = β₀ + β₁·ln(x_i) + E_i

This is a simple linear regression model, just use the transformed predictor x_i' = ln(x_i).

SLIDE 32

Curvilinear Fitting

All models such as:

Y_i = β₀ + β₁·ln(x_i) + E_i
Y_i = β₀ + β₁·√x_i + E_i
Y_i = β₀ + β₁·x_i⁻¹ + E_i

are simple linear regression models. There is only one single predictor, and the relation is linear in the parameters. None of these models fits a straight line in the scatterplot; these are all curvilinear relations – linear regression is very versatile!

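A sketch of the transformation trick (Python; the data are made up, generated roughly as y ≈ 4 + 2·ln(x)): transform the predictor, then fit the ordinary least squares line:

```python
import math

# A curvilinear model Y = beta0 + beta1*ln(x) + E is still *simple linear*
# regression: transform the predictor, then fit a straight line as usual.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [4.1, 5.3, 6.9, 8.1, 9.6]            # roughly 4 + 2*ln(x), made up

u = [math.log(xi) for xi in x]           # x' = ln(x)
n = len(u)
u_bar, y_bar = sum(u) / n, sum(y) / n
b1 = (sum((ui - u_bar) * (yi - y_bar) for ui, yi in zip(u, y))
      / sum((ui - u_bar) ** 2 for ui in u))
b0 = y_bar - b1 * u_bar                  # fit in the transformed coordinates
```

The same recipe works for √x or x⁻¹: only the predictor changes, the least squares machinery stays identical.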
SLIDE 33

[Figure: "Infant vs. Income"]

Infant Mortality Rate vs. Income

Does a curvilinear fit such as Y_i = β₀ + β₁·x_i⁻¹ + E_i solve the regression problem here?

SLIDE 34

Infant Mortality Rate vs. Income

[Figure: "Infant vs. Income"]

SLIDE 35

Infant Mortality Rate vs. Income

The problem with the previous fit is that the power −1 is not correct. Adjusting it by hand is very laborious. And if we try a model such as

Y_i = β₀ + β₁·x_i^β₂ + E_i

we no longer have a problem that is linear in the parameters, and so least squares fitting cannot be applied. Yet there is a simple but very powerful trick that often helps:

Y_i = β₀ · x_i^β₁ · E_i

can be linearized, see the blackboard...

SLIDE 36

Log-Transformation Helps!

[Figure: "log(infant) vs. log(income)"]

SLIDE 37

Infant Mortality Rate vs. Income


SLIDE 38

Daily Cost in Rehabilitation

[Figures: "Daily Cost in Rehab vs. ADL" and "Residuals vs. Fitted Values"]

SLIDE 39

Logged Response Model

We transform the response variable and try to explain it using a linear model with our previous predictors:

Y' = log(Y) = β₀ + β₁·x + E

In the original scale, we can write the logged response model using the same predictors:

Y = exp(β₀ + β₁·x) · exp(E)

  • Multiplicative model!
  • E ~ N(0, σ_E²), and thus, exp(E) has a lognormal distribution.
slide-40
SLIDE 40

40

Marcel Dettling, Zurich University of Applied Sciences

Applied Statistical Regression

HS 2011 – Week 03

Also This Transformation Works!

[Figures: "Residuals vs Fitted" and "Normal Q-Q" after the log-transformation]

SLIDE 41

Dealing with Zero Response

  • The logged response model is only applicable when the response is strictly positive...
  • What if there are some cases with Y_i = 0?
    • never omit these
    • additive shifting is possible, i.e. use log(Y_i + c)
  • How to additively shift?
    • the usual choice is c = 1
    • this is not good, because the effect is scale-dependent
    • shift with the value of the smallest positive observation instead!

SLIDE 42

Back Transforming the Fitted Values

  • In principle, we can "simply back transform": ŷ = exp(ŷ')
  • This is an estimate for the median, but not the mean!
  • If unbiased estimation is required, then use: ŷ = exp(ŷ' + σ̂_E²/2)
  • Confidence/prediction intervals are not problematic: [l, u] → [exp(l), exp(u)]
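A small Python illustration with made-up numbers for ŷ' and σ̂_E² (illustrative, not from the rehabilitation data):

```python
import math

# Back-transforming a logged-response fit: exp(y_hat') estimates the
# *median*; the unbiased mean estimate carries the exp(sigma^2/2) factor.
y_hat_log = 6.2          # fitted value on the log scale (made up)
sigma2_hat = 0.25        # estimated error variance on the log scale (made up)

median_est = math.exp(y_hat_log)
mean_est = math.exp(y_hat_log + sigma2_hat / 2)

# Interval endpoints transform directly: [l, u] -> [exp(l), exp(u)]
l, u = 5.8, 6.6
interval = (math.exp(l), math.exp(u))
```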

SLIDE 43

Back Transforming: Example

[Figure: "Daily Cost in Rehabilitation vs. ADL-Score"]

SLIDE 44

Interpretation of the Coefficients

Important: there is no back transformation for the coefficients to the original scale, but still a good interpretation:

log(ŷ) = β̂₀ + β̂₁·x₁ + ... + β̂_p·x_p
ŷ = exp(β̂₀) · exp(β̂₁·x₁) · ... · exp(β̂_p·x_p)

An increase by one unit in x₁ would multiply the fitted value in the original scale with exp(β̂₁). Coefficients are interpreted multiplicatively!
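A quick numerical check of the multiplicative interpretation (Python; the coefficients are illustrative):

```python
import math

# In the logged-response model, raising x1 by one unit multiplies the
# fitted value on the original scale by exp(beta1_hat). Illustrative numbers.
b0, b1 = 2.0, 0.3        # made-up coefficients on the log scale
x1 = 5.0
fit_a = math.exp(b0 + b1 * x1)          # fitted value at x1
fit_b = math.exp(b0 + b1 * (x1 + 1))    # fitted value at x1 + 1
factor = fit_b / fit_a                  # equals exp(b1), regardless of x1
```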

SLIDE 45

First-Aid Transformations

These are intended to stabilize the variance.

First-Aid Transformations: do always apply these (if no practical reasons speak against it), to both response and predictors.

  • Absolute values and concentrations – log-transformation: y' = log(y)
  • Count data – square-root transformation: y' = √y
  • Proportions – arcsine transformation: y' = arcsin(√y)
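The three transformations applied element-wise, as a Python sketch on toy values:

```python
import math

# First-aid transformations on toy data (illustrative values only):
conc   = [0.5, 1.0, 4.0]     # concentrations -> log-transformation
counts = [0, 1, 4, 9]        # counts -> square-root transformation
props  = [0.0, 0.25, 1.0]    # proportions -> arcsine transformation

log_t    = [math.log(v) for v in conc]
sqrt_t   = [math.sqrt(v) for v in counts]
arcsin_t = [math.asin(math.sqrt(v)) for v in props]
```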