

  1. Bus 701: Advanced Statistics. Harald Schmidbauer. © Harald Schmidbauer & Angi Rösch, 2008

  2. Chapter 14: Multiple Regression

  3. 14.1 Introduction SLR and Multiple Linear Regression. • Goal of SLR: Explain the variability in Y, using a single variable X. • Goal of multiple linear regression: Explain the variability in Y, using a set of variables X1, X2, ..., Xk.

  4. 14.1 Introduction The problem. Given are points (x1i, x2i, ..., xki, yi), where: • yi: observations of a variable Y, the dependent variable; • xji: observations of a variable Xj, an independent variable. Given a (k+1)-dimensional cloud of points, how can we fit a hyperplane?

  5. 14.1 Introduction Outlook on Chapter 14. • 14.2 An Intuitive Approach: three-dimensional scatterplots and a regression plane • 14.3 The Regression Plane: the method of least squares • 14.4 Explanatory Power of the Model: decomposition of variance; coefficient of determination • 14.5 A Stochastic Model of Multiple Regression: stochastic model and statistical inference • 14.6 Examples • 14.7 Prediction Based on Multiple Regression: point prediction and prediction intervals

  6. 14.2 An Intuitive Approach The case of three variables: X1, X2, Y. We shall now see a three-dimensional scatterplot in two perspectives with: • black points, representing the observations, • a plane, which somehow fits these points, • red points, the projection of the black points onto the plane, • the distance between the black and the red points.

  7. 14.2 An Intuitive Approach Observed points and their projections onto the plane.

  8. 14.2 An Intuitive Approach Observed points and their projections onto the plane.

  9. 14.2 An Intuitive Approach How to find that plane. In order to find a "good" plane to represent the cloud of points, we need: • the equation of a plane, depending on parameters, • a distance function, • to find the parameter values such that the distance function is minimized.

  10. 14.3 The Regression Plane A plane and the observations. • Plane in 3-dimensional space: y = a + b1 x1 + b2 x2 • With observations (x1i, x2i, yi), i = 1, ..., n:
      ŷ1 = a + b1 x11 + b2 x21,   e1 = y1 − ŷ1
      ŷ2 = a + b1 x12 + b2 x22,   e2 = y2 − ŷ2
      ...
      ŷn = a + b1 x1n + b2 x2n,   en = yn − ŷn
  • The ŷi are called the fitted values.

  11. 14.3 The Regression Plane Using matrices. The last relations can be written as:
      ŷ = Xb,   e = y − ŷ = y − Xb,
  where
      y = (y1, y2, ..., yn)′,   ŷ = (ŷ1, ŷ2, ..., ŷn)′,   e = (e1, e2, ..., en)′,   b = (a, b1, b2)′,
  and X is the n × 3 matrix whose i-th row is (1, x1i, x2i).

  12. 14.3 The Regression Plane Definition. • Define ŷi = a + b1 x1i + b2 x2i and ei = yi − ŷi. • The regression plane of Y with respect to X1 and X2 is the plane y = a + b1 x1 + b2 x2 with a, b1 and b2 such that
      Q(a, b1, b2) = Σi ei² = Σi (yi − ŷi)² = Σi (yi − a − b1 x1i − b2 x2i)²
  attains its minimum. • b1 and b2: regression coefficients.
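The criterion Q can be sketched numerically. Below is a minimal Python example with made-up toy data (five observations, not the slides' used-cars sample): it evaluates the sum of squared errors for any candidate plane (a, b1, b2); the regression plane is whichever triple minimizes this value.

```python
import numpy as np

# Hypothetical toy data: n = 5 observations of (x1, x2, y).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.0])

def Q(a, b1, b2):
    """Sum of squared errors for the candidate plane y = a + b1*x1 + b2*x2."""
    residuals = y - (a + b1 * x1 + b2 * x2)
    return np.sum(residuals ** 2)

# Any candidate plane gets a Q value; least squares picks the minimizer.
print(Q(0.0, 1.0, 1.0))  # ≈ 2.5 for this toy data
```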

  13. 14.3 The Regression Plane Regression: some first comments. • This procedure is asymmetric — like SLR! • It conforms to the idea: Given X1 and X2, what is Y? • X1, X2: "independent variables", Y: "dependent variable" • This procedure can be easily generalized to k > 2 independent variables. • The case k > 2 cannot be easily visualized in terms of a scatterplot.

  14. 14.3 The Regression Plane Example: Used cars. • For a set of used cars, consider these variables: – mileage (km) – age (months) – price (€) • A natural choice is: – dependent variable: price – independent variables: mileage, age

  15. 14.3 The Regression Plane Example: Used cars. • Important: The so-called "independent variables" need not be uncorrelated. • For our sample of 400 cars (VW Golf 1.8): [Scatterplot of mileage (in 1000 km) against age (in months) – correlation: 0.43 – red points: cars with ac]

  16. 14.3 The Regression Plane Computing the regression plane. • Minimizing Q leads to the following vector equation: b = (X′X)⁻¹X′y • The fitted values are: ŷ = Xb = X(X′X)⁻¹X′y • These formulas apply to any number k of independent variables. • For k = 1, the formulas of SLR are obtained.
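The vector equation b = (X′X)⁻¹X′y translates directly into code. The sketch below uses the same made-up toy data as before (not the slides' sample); it builds the design matrix X and solves the normal equations rather than inverting X′X explicitly, which is the numerically preferred route.

```python
import numpy as np

# Hypothetical toy data (not the used-cars sample from the slides).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.0])

# Design matrix X: a column of ones, then one column per regressor.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal equations (X'X) b = X'y, i.e. b = (X'X)^{-1} X'y.
b = np.linalg.solve(X.T @ X, X.T @ y)
a, b1, b2 = b

# Fitted values and residuals.
y_hat = X @ b
e = y - y_hat
```

The same b would come out of `np.linalg.lstsq(X, y, rcond=None)`, and the formula applies unchanged for any number k of regressor columns in X.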

  17. 14.3 The Regression Plane Multiple regression: some properties in the context of descriptive statistics. • The vector of arithmetic means (x̄1, x̄2, ȳ) is on the regression plane. • The average error ē equals zero. • The matrix X(X′X)⁻¹X′ in ŷ = Xb = X(X′X)⁻¹X′y is a projection matrix: y is projected onto a subspace of ℝⁿ.
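All three properties can be checked numerically. A small sketch, again with made-up toy data: the mean residual is zero, the point of means lies on the fitted plane, and H = X(X′X)⁻¹X′ is idempotent (H·H = H), the defining property of a projection matrix.

```python
import numpy as np

# Hypothetical toy data, fitted as on the previous slide.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.0])
X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# Property 1: the average error is zero.
print(np.isclose(e.mean(), 0.0))                      # True

# Property 2: (x1-bar, x2-bar, y-bar) lies on the plane.
a_, b1, b2 = b
print(np.isclose(y.mean(), a_ + b1 * x1.mean() + b2 * x2.mean()))  # True

# Property 3: H = X (X'X)^{-1} X' is idempotent, hence a projection.
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(H @ H, H))                          # True
```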

  18. 14.3 The Regression Plane Example: Used cars. • Data from 400 used cars (VW Golf 1.8, age at least 5 years, mileage at most 200000 km). • The fitted regression plane is: price = 14146.2 − 24.61 · mileage − 49.13 · age (Price in €, mileage in 1000 km, age in months.) • According to this result: What is the average price of a car with mileage 100000 km and age 10 years? • By how much will this decrease if the car is used for another year, for another 12000 km?
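The two questions on this slide are plain plug-in arithmetic with the fitted coefficients (mind the units: mileage in 1000 km, age in months). A short check:

```python
# Fitted plane from the slide: price in EUR, mileage in 1000 km, age in months.
def price(mileage, age):
    return 14146.2 - 24.61 * mileage - 49.13 * age

p_now = price(100, 120)    # 100000 km, 10 years = 120 months
p_later = price(112, 132)  # plus 12000 km and 12 months

print(p_now)               # ≈ 5789.6 EUR
print(p_now - p_later)     # ≈ (24.61 + 49.13) * 12 = 884.88 EUR
```

So the model puts the average price of such a car at about 5790 €, and another year of use with 12000 more km lowers the predicted price by about 885 €.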

  19. 14.3 The Regression Plane Example: Used cars. Scatterplot:

  20. 14.3 The Regression Plane Example: Used cars. Scatterplot:

  21. 14.4 Explanatory Power of the Model Decomposition of variance. As in SLR, it holds that:
      Σ (yi − ȳ)² = Σ (ŷi − ȳ)² + Σ (yi − ŷi)²,
      SST = SSR + SSE
  where SST: total sum of squares; SSR: regression sum of squares; SSE: error sum of squares.
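The decomposition can be verified on any fitted regression. A minimal sketch, reusing the made-up toy data from the earlier examples: compute SST, SSR and SSE and confirm that SST = SSR + SSE; the ratio SSR/SST is the coefficient of determination.

```python
import numpy as np

# Hypothetical toy data, fitted by least squares as before.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.0])
X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b

SST = np.sum((y - y.mean()) ** 2)      # total sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
SSE = np.sum((y - y_hat) ** 2)         # error sum of squares

print(np.isclose(SST, SSR + SSE))      # True: the decomposition holds
print(SSR / SST)                       # coefficient of determination R^2
```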
