Bus 701: Advanced Statistics (Harald Schmidbauer)


  1. Bus 701: Advanced Statistics. Harald Schmidbauer. © Harald Schmidbauer & Angi Rösch, 2007

  2. 13.1 Simple Linear Regression: Goals
Goals of Simple Linear Regression. Once again, given are points (x_i, y_i) from a bivariate metric variable (X, Y). How can we establish a functional relationship between X and Y? Most importantly:
• Which straight line is “good”? — What does “good” mean?
• How can the parameters of a “good” line be computed?

  3. 13.1 Simple Linear Regression: Goals
Goals of Simple Linear Regression. Why would we want to fit a line to a cloud of points?
• In order to quantify the relationship between X and Y, using a simple model.
• In order to forecast Y for a given value of X.

  4. 13.2 The Regression Line
Finding a “good” line . . . [Figure: scatterplot of a cloud of points in the (x, y) plane.] . . . and how can we find a “good” line? — A criterion is needed!

  5. 13.2 The Regression Line
A very simple scatterplot. [Figure: three observed points at x_1, x_2, x_3, with the corresponding points ŷ_1, ŷ_2, ŷ_3 marked on a fitted line.]
• observed points: (x_i, y_i)
• points on the line: (x_i, ŷ_i)

  6. 13.2 The Regression Line
Definition. Define ŷ_i = a + b·x_i and e_i = y_i − ŷ_i. The regression line of Y with respect to X is the line y = a + bx with parameters a and b such that
$$Q(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
attains its minimum. The parameter b thus obtained is called the regression coefficient. This way to find a and b is called the method of least squares.
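To make the criterion concrete, here is a minimal Python sketch (not part of the original slides; the data are the toy example from slide 12): Q sums the squared vertical distances between the observed y_i and the line a + b·x_i, and of two candidate lines the one with the smaller Q is the “better” one in the least-squares sense.

    # Least-squares criterion Q(a, b): sum of squared vertical distances.
    def Q(a, b, x, y):
        return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

    x = [5, 10, 15, 20]          # toy example data (see slide 12)
    y = [15, 8, 12, 5]

    print(Q(16.5, -0.52, x, y))  # least-squares line: Q = 24.2
    print(Q(14.0, -0.30, x, y))  # some other candidate line: Q = 30.5 (worse)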

  7. 13.2 The Regression Line
Regression: some first comments.
• “Good” means: the sum of squared distances, parallel to the y-axis, is minimized.
• This procedure is asymmetric!
• It conforms to the idea: given X, what is Y?
• X: “independent variable”, Y: “dependent variable”

  8. 13.2 The Regression Line
Regression is asymmetric. The regression lines . . .
• . . . of Y w.r.t. X and
• . . . of X w.r.t. Y
are usually different. [Figure: scatterplot in the (x, y) plane.]

  9. 13.2 The Regression Line
Y w.r.t. X, or rather X w.r.t. Y? Example:
X = body-height of a person; Y = body-weight of a person
Here, a regression of Y w.r.t. X looks quite natural, while a regression of X w.r.t. Y would be strange.

  10. 13.2 The Regression Line
Y w.r.t. X, or rather X w.r.t. Y? Example: Consider the change in percent of price indices, relative to the corresponding month of the previous year:
X = change of the housing price index; Y = change of the clothing price index
Here, neither of the regressions — Y w.r.t. X nor X w.r.t. Y — looks very meaningful, because it is neither convincing to say that X influences (or even causes) Y, nor vice versa. In this example, a symmetric procedure is more appropriate than regression.

  11. 13.2 The Regression Line
Computing the regression line. Minimizing Q leads to the following equations for the slope b and the intercept a:
$$b = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)}, \qquad a = \bar{y} - b\,\bar{x}.$$
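These formulas translate directly into code; a short sketch (the function name is my own choice, not from the slides):

    # Slope b and intercept a by the least-squares formulas above.
    def regression_line(x, y):
        n = len(x)
        xbar = sum(x) / n
        ybar = sum(y) / n
        b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
            / sum((xi - xbar) ** 2 for xi in x)
        a = ybar - b * xbar
        return a, b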

  12. 13.2 The Regression Line
Example: (This is a toy example . . . )

  i   x_i   y_i   x_i^2   y_i^2   x_i*y_i   ŷ_i     e_i
  1     5    15      25     225        75   13.9    1.1
  2    10     8     100      64        80   11.3   -3.3
  3    15    12     225     144       180    8.7    3.3
  4    20     5     400      25       100    6.1   -1.1
  Σ    50    40     750     458       435   40.0    0.0

Then,
$$b = \frac{4 \cdot 435 - 50 \cdot 40}{4 \cdot 750 - 50^2} = -0.52, \qquad a = \frac{40}{4} - (-0.52) \cdot \frac{50}{4} = 16.5.$$
The regression line is: y = 16.5 − 0.52·x. Using this regression line, the ŷ_i and the e_i can be computed. We observe: ȳ equals the mean of the ŷ_i, and the mean of the e_i is 0. (This is always the case.)
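The numbers in this toy example can be verified with a few lines of Python (a self-contained sketch, using the first formula for b from slide 11):

    x = [5, 10, 15, 20]
    y = [15, 8, 12, 5]
    n = len(x)

    b = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) \
        / (n * sum(xi ** 2 for xi in x) - sum(x) ** 2)        # -0.52
    a = sum(y) / n - b * sum(x) / n                            # 16.5

    y_hat = [a + b * xi for xi in x]                 # 13.9, 11.3, 8.7, 6.1
    e = [yi - yh for yi, yh in zip(y, y_hat)]        # 1.1, -3.3, 3.3, -1.1
    print(sum(y) / n, sum(y_hat) / n, sum(e))        # 10.0, 10.0, 0.0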

  13. 13.2 The Regression Line
A plot of the toy example. [Figure: scatterplot of the four observed points, with x from 0 to 25 and y from 0 to 20.]
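A plot of this kind could be produced roughly as follows (a sketch assuming matplotlib is available; the axis ranges follow the slide, and the drawn line is the regression line of the toy example):

    import matplotlib.pyplot as plt

    x = [5, 10, 15, 20]
    y = [15, 8, 12, 5]
    a, b = 16.5, -0.52                    # regression line of the toy example

    plt.scatter(x, y)                     # the four observed points
    plt.plot([0, 25], [a, a + b * 25])    # regression line over the x-range
    plt.xlabel("x")
    plt.ylabel("y")
    plt.xlim(0, 25)
    plt.ylim(0, 20)
    plt.show()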

  14. 13.3 Explanatory Power of the Model
Next, we look at the explanatory power of the regression model. [Figure: the simple scatterplot from slide 5, with observed values y_i and fitted values ŷ_i at x_1, x_2, x_3.]

  15. 13.3 Explanatory Power of the Model
The explanatory power of the regression model . . . We observe:
• There is (in general) less variability in the ŷ_i than in the y_i! — That is, the regression line cannot explain the entire variability in the observed y_i.
• The regression could provide a complete explanation if all points (x_i, y_i) were on the regression line.

  16. 13.3 Explanatory Power of the Model
Decomposition of variance.
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
$$\text{SST} = \text{SSR} + \text{SSE}$$
Here, SST: total sum of squares; SSR: regression sum of squares; SSE: error sum of squares.
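For the toy example, the decomposition can be checked directly (a small sketch, not from the slides):

    x = [5, 10, 15, 20]
    y = [15, 8, 12, 5]
    a, b = 16.5, -0.52
    y_hat = [a + b * xi for xi in x]
    ybar = sum(y) / len(y)

    SST = sum((yi - ybar) ** 2 for yi in y)                  # 58.0
    SSR = sum((yh - ybar) ** 2 for yh in y_hat)              # 33.8
    SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # 24.2
    print(SST, SSR + SSE)                                    # both 58.0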

  17. 13.3 Explanatory Power of the Model
The coefficient of determination. It is defined as SSR/SST.
• The coefficient of determination is the share of variability in the data which is explained by the regression.
• It holds that SSR/SST = r² = cor²(X, Y).
• r² = 100% if and only if all observed points are on the regression line.
• r² = 0% if and only if X and Y are uncorrelated.
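Again for the toy example, both expressions for the coefficient of determination give the same value (an illustrative sketch):

    x = [5, 10, 15, 20]
    y = [15, 8, 12, 5]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n

    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)    # = SST

    b = sxy / sxx
    SSR = b ** 2 * sxx                         # = sum((yhat_i - ybar)^2)
    print(SSR / syy)                           # about 0.583
    print(sxy ** 2 / (sxx * syy))              # cor^2(X, Y), the same value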

  18. 13.3 Explanatory Power of the Model
Example: Overseas Shipholding Group, Inc. (“OSG”) is a marine transportation company whose stock is listed on the New York Stock Exchange (NYSE). Let monthly returns in percent be defined as
osg.ret = monthly return in percent on OSG stock (black in the figure below);
nyse.ret = monthly return in percent on the NYSE Composite Index (red)
[Figure: time series of the two monthly return series, 2001 to 2005, ranging roughly from −20 to 20 percent.]
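Monthly returns in percent of this kind can be computed as month-over-month percentage changes of the price (or index) level; a sketch with made-up placeholder prices, not the actual OSG or NYSE Composite data:

    # Percent return from one month to the next: 100 * (P_t / P_{t-1} - 1).
    prices = [20.1, 21.0, 19.8, 20.5, 22.3]    # hypothetical monthly prices
    returns = [100 * (p1 / p0 - 1) for p0, p1 in zip(prices, prices[1:])]
    print(returns)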

  19. 13.3 Explanatory Power of the Model
Scatterplot and regression results. [Figure: scatterplot of the return on OSG against the return on NYSE.]
• regression line: osg.ret = 1.50 + 1.47 · nyse.ret
• coefficient of determination: r² = 29%
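A fit of this kind can be obtained with standard tools; a sketch using NumPy, with short made-up return series standing in for nyse.ret and osg.ret (the actual series are not reproduced here):

    import numpy as np

    nyse_ret = np.array([1.2, -0.5, 2.3, 0.8, -1.9, 3.1])   # placeholder data
    osg_ret  = np.array([2.0, -1.5, 4.9, 1.1, -0.8, 6.0])   # placeholder data

    b, a = np.polyfit(nyse_ret, osg_ret, 1)            # slope, intercept
    r2 = np.corrcoef(nyse_ret, osg_ret)[0, 1] ** 2     # coefficient of determination
    print(a, b, r2)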

  20. 13.3 Explanatory Power of the Model
An interpretation of our results. Why are there fluctuations in OSG stock price?
• It is not by pure chance that OSG stock price fluctuates.
• It is because the market index NYSE Composite fluctuates!
• Is this the only reason? — No, but fluctuations in NYSE Composite explain about 29% of the variability in OSG stock price.
• So what might be other reasons? This is not investigated here . . . (a guess: import/export quantities, decisions of the CEO, condition of competitors, . . . )

  21. 13.4 A Stochastic SLR Model
SLR in descriptive and inductive statistics.
• So far, we have seen SLR from a purely descriptive point of view. (There were no probabilities, no stochastic models.)
• Advantage of this approach: simplicity.
• Disadvantage: we obtain no insight into the mechanism which created the data — for this purpose, we need a stochastic model and the methods of inductive statistics!

  22. 13.4 A Stochastic SLR Model
A stochastic simple linear regression model:
$$Y_i = \alpha + \beta x_i + \epsilon_i, \qquad i = 1, \ldots, n$$
• The random variable Y_i represents the observation belonging to x_i.
• α and β are unknown parameters (to be estimated).
• x_i is the observation of the independent variable X.
• ε_i is a random variable; it contains everything not accounted for in the equation y = α + βx.

  23. 13.4 A Stochastic SLR Model
Assumptions about ε. We shall assume that the ε_i in
$$Y_i = \alpha + \beta x_i + \epsilon_i, \qquad i = 1, \ldots, n$$
are a sequence of independent and identically distributed random variables:
$$\epsilon_i \sim N(0, \sigma_\epsilon^2) \quad \text{iid}$$
The “normality assumption” is very strong.
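To see what the model with iid normal errors means in practice, one can simulate data from it and re-estimate the parameters by least squares (an illustrative sketch; the parameter values are arbitrary, not from the slides):

    import numpy as np

    rng = np.random.default_rng(1)
    alpha, beta, sigma = 2.0, 0.5, 1.0           # arbitrary "true" parameters

    x = np.linspace(0, 10, 50)
    eps = rng.normal(0.0, sigma, size=x.size)    # iid N(0, sigma^2) errors
    y = alpha + beta * x + eps                   # Y_i = alpha + beta*x_i + eps_i

    b_hat, a_hat = np.polyfit(x, y, 1)           # least-squares estimates
    print(a_hat, b_hat)                          # close to alpha and beta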
