Multiple Linear Regression
ST 430/514 Introduction to Regression Analysis / Statistics for Management and the Social Sciences II


  1. General Form

Recall: a regression model describes how a dependent variable (or response) $Y$ is affected, on average, by one or more independent variables (or factors, or covariates). The general equation is

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$

I shall sometimes write $E(Y)$ as $E(Y \mid x_1, x_2, \ldots, x_k)$, to emphasize that $E(Y)$ changes with the values of the terms $x_1, x_2, \ldots, x_k$:

$$E(Y \mid x_1, x_2, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$

  2. General Form

As always, we can write $\epsilon = Y - E(Y)$, or $Y = E(Y) + \epsilon$, where the random error $\epsilon$ has expected value zero:

$$E(\epsilon) = E(\epsilon \mid x_1, x_2, \ldots, x_k) = 0.$$

So the general equation can also be written

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon.$$

  3. General Form

Each term on the right-hand side may be an independent variable, or a function of one or more independent variables. For instance,

$$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$$

has two terms on the right-hand side (not counting the intercept $\beta_0$), but only one independent variable. We write it in the general form as $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$, with $x_1 = x$ and $x_2 = x^2$.
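As a minimal sketch of how such a model is fit in R (the data frame `dat` and its columns `x` and `y` are hypothetical, invented for illustration), `I()` protects the squared term from being interpreted as formula syntax:

    # Hypothetical data: y depends on a single independent variable x
    dat <- data.frame(x = runif(50, 0, 10))
    dat$y <- 2 + 1.5 * dat$x - 0.1 * dat$x^2 + rnorm(50)

    # One independent variable, two terms: x1 = x and x2 = x^2
    quad.fit <- lm(y ~ x + I(x^2), data = dat)
    summary(quad.fit)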

  4. Interpreting the parameters: $\beta_0$

$\beta_0$ is still called the intercept, but now its interpretation is the expected value of $Y$ when all independent variables are zero:

$$\beta_0 = E(Y \mid x_1 = 0, x_2 = 0, \ldots, x_k = 0).$$

In some cases, these values cannot all be achieved at the same time; in those cases, $\beta_0$ has only a hypothetical meaning.

  5. Interpreting the parameters: $\beta_i$, $i > 0$

For $1 \le i \le k$, $\beta_i$ measures the change in $E(Y)$ as $x_i$ increases by 1 with all the other independent variables held fixed. Again, in some cases it is not possible to change one variable while holding all the others fixed, so $\beta_i$ may also have only a hypothetical meaning.

You will sometimes find, for instance, some $\beta_i < 0$ when you expect that $Y$ should increase, not decrease, as $x_i$ increases. That is usually because, when $x_i$ changes, other variables also change.

  6. Quantitative and Qualitative Variables

Some variables are measured quantities (i.e., on an interval or ratio scale), and are called quantitative. Others are the result of classification into categories (i.e., on a nominal or ordinal scale), and are called qualitative.

Some terms may be functions of independent variables: distance and distance$^2$, or the sine and cosine of (month / 12).

The simplest case is when all variables are quantitative and no mathematical functions appear: the first-order model.

  7. Example: Grandfather clocks

Dependence of the auction price of antique clocks on their age and the number of bidders at the auction. Data for 32 clocks.

Get the data and plot them:

    clocks = read.table("Text/Exercises&Examples/GFCLOCKS.txt", header = TRUE)
    pairs(clocks[, c("PRICE", "AGE", "NUMBIDS")])

The first-order model is

$$E(\mathrm{PRICE}) = \beta_0 + \beta_1 \times \mathrm{AGE} + \beta_2 \times \mathrm{NUMBIDS}.$$
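A natural next step is to fit this first-order model; a sketch using R's standard `lm` function and the `clocks` data frame read in above (the name `clocks.fit` is my own choice):

    # Fit the first-order model E(PRICE) = b0 + b1*AGE + b2*NUMBIDS
    clocks.fit <- lm(PRICE ~ AGE + NUMBIDS, data = clocks)

    # Coefficient estimates, standard errors, t statistics, and the overall F test
    summary(clocks.fit)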

  8. Fitting the model: least squares

As in the case $k = 1$, the most common way of fitting a multiple regression model is by least squares. That is, find $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ so that

$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_k x_k$$

minimizes

$$SS_E = \sum (y_i - \hat{y}_i)^2.$$

As noted earlier, other criteria such as $\sum |y_i - \hat{y}_i|$ are sometimes used instead.
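For the clocks fit, $SS_E$ is the sum of squared residuals; a sketch assuming the `clocks.fit` object from the example above:

    # Residuals are y_i - yhat_i; SSE is their sum of squares
    SSE <- sum(residuals(clocks.fit)^2)

    # Equivalent: deviance() returns the residual sum of squares for lm fits
    deviance(clocks.fit)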

  9. Fitting the model: least squares

Calculus leads to $k + 1$ linear equations in the $k + 1$ estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$.

These equations are always consistent; that is, they always have a solution. Usually they are also non-singular; that is, the solution is unique. If they are singular, we can find a unique solution by either imposing constraints on the parameters or leaving out redundant variables.

  10. Fitting the model: least squares

The equations are:

$$\begin{aligned}
n\hat\beta_0 + \Bigl(\textstyle\sum x_{i,1}\Bigr)\hat\beta_1 + \cdots + \Bigl(\textstyle\sum x_{i,k}\Bigr)\hat\beta_k &= \textstyle\sum y_i \\
\Bigl(\textstyle\sum x_{i,1}\Bigr)\hat\beta_0 + \Bigl(\textstyle\sum x_{i,1}^2\Bigr)\hat\beta_1 + \cdots + \Bigl(\textstyle\sum x_{i,1} x_{i,k}\Bigr)\hat\beta_k &= \textstyle\sum x_{i,1} y_i \\
&\vdots \\
\Bigl(\textstyle\sum x_{i,k}\Bigr)\hat\beta_0 + \Bigl(\textstyle\sum x_{i,1} x_{i,k}\Bigr)\hat\beta_1 + \cdots + \Bigl(\textstyle\sum x_{i,k}^2\Bigr)\hat\beta_k &= \textstyle\sum x_{i,k} y_i
\end{aligned}$$

where $x_{i,j}$ is the value in the $i$th observation of the $j$th variable, $1 \le i \le n$, $1 \le j \le k$. We usually write these more compactly using matrix notation, and solve them using matrix methods.
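A direct sketch of these normal equations in R, using `model.matrix` to build the design matrix (including its column of 1's) from the `clocks` example above:

    # Design matrix X (first column all 1's) and response vector y
    X <- model.matrix(~ AGE + NUMBIDS, data = clocks)
    y <- clocks$PRICE

    # Solve the normal equations X'X beta = X'y directly
    beta.hat <- solve(crossprod(X), crossprod(X, y))
    beta.hat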

  11. Matrix formulation of least squares

Write $X$ for the $n \times (k + 1)$ matrix of values of the independent variables (including a column of 1's for the intercept):

$$X = \begin{pmatrix}
1 & x_{1,1} & x_{1,2} & \ldots & x_{1,k} \\
1 & x_{2,1} & x_{2,2} & \ldots & x_{2,k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & x_{n,2} & \ldots & x_{n,k}
\end{pmatrix}.$$

Also write $y$ for the $n \times 1$ vector of values of the dependent variable:

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$$

  12. Matrix formulation of least squares

Finally, write $\hat\beta$ for the $(k + 1) \times 1$ vector of parameter estimates:

$$\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix}.$$

Then the equations for the parameter estimates can be written

$$X'X\hat\beta = X'y.$$

  13. Matrix formulation of least squares

The equations are non-singular when $(X'X)^{-1}$ exists, and the solution may be written

$$\hat\beta = (X'X)^{-1}X'y.$$

However, computing first $X'X$ and then its inverse $(X'X)^{-1}$ can lead to large numerical errors. Using a transformation of $X$ such as the QR decomposition or the singular value decomposition gives better numerical performance.
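A sketch of the numerically stabler QR route, assuming the `X`, `y`, `beta.hat`, and `clocks.fit` objects from the sketches above (R's `lm` itself uses a QR decomposition internally):

    # QR-based solution: factor X = QR, then solve R beta = Q'y by back-substitution
    beta.qr <- qr.coef(qr(X), y)

    # Compare with the normal-equations solution and lm's coefficients
    cbind(normal.eq = beta.hat, qr = beta.qr, lm = coef(clocks.fit))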

  14. Model Assumptions

No assumptions are needed to find least squares estimates. To use them to make statistical inferences, we need these assumptions:

- The random errors $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are uncorrelated and have common variance $\sigma^2$;
- For small-sample validity, the random errors are normally distributed, at least approximately.
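These assumptions are commonly checked with residual plots; a sketch using the hypothetical `clocks.fit` object and R's standard `plot` method for `lm` objects:

    # Residuals vs fitted values (constant variance) and normal Q-Q plot (normality)
    par(mfrow = c(1, 2))
    plot(clocks.fit, which = 1:2)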

  15. Estimating Error Variance

As before, we estimate $\sigma^2$ using

$$SS_E = \sum (y_i - \hat{y}_i)^2.$$

We can show that $E[SS_E] = (n - p)\sigma^2$, where $p = k + 1$ is the number of $\beta$s in the model, so the unbiased estimator is

$$s^2 = \frac{SS_E}{df_E} = \frac{SS_E}{n - p} = \frac{SS_E}{n - (k + 1)}.$$
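In R, `summary` reports $s$ (the square root of $s^2$) as the residual standard error; a sketch computing it by hand from the hypothetical `clocks.fit`:

    n <- nrow(clocks)
    p <- length(coef(clocks.fit))                  # p = k + 1
    s2 <- sum(residuals(clocks.fit)^2) / (n - p)   # unbiased estimator of sigma^2

    # summary(clocks.fit)$sigma is s, the square root of s2
    all.equal(sqrt(s2), summary(clocks.fit)$sigma)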

  16. Hypothesis Tests

Usually, the first test is an overall test of the model:

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \quad \text{versus} \quad H_a: \text{at least one } \beta_i \ne 0.$$

$H_0$ asserts that none of the independent variables affects $Y$; if this hypothesis is not rejected, the model is worthless. For instance, its predictions perform no better than $\bar{y}$.

The test statistic is usually denoted $F$, and $P$-values are found from the $F$-distribution with $k$ and $n - p = n - (k + 1)$ degrees of freedom.
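The $F$ statistic appears in the last line of `summary(clocks.fit)`; a sketch extracting it and recomputing the $P$-value from the $F$-distribution directly:

    # Named vector: value, numerator df (k), denominator df (n - p)
    fstat <- summary(clocks.fit)$fstatistic
    fstat

    # P-value from the F distribution with k and n - p degrees of freedom
    pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)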

  17. Inferences About Individual Parameters

Individual parameters may also be tested: $H_0: \beta_i = 0$ versus $H_a: \beta_i \ne 0$. The test statistic is

$$t = \frac{\hat\beta_i}{\text{standard error of } \hat\beta_i},$$

and it is tested using the $t$-distribution with $n - p$ degrees of freedom.
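In R these $t$ statistics and their two-sided $P$-values form the coefficient table; a sketch using the hypothetical `clocks.fit`:

    # Columns: estimate, standard error, t value, Pr(>|t|)
    summary(clocks.fit)$coefficients

    # 95% confidence intervals for the individual beta_i
    confint(clocks.fit)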
