SLIDE 1
MULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES
Business Statistics
CONTENTS
▪ Multiple regression
▪ Dummy regressors
▪ Assumptions of regression analysis
▪ Predicting with regression analysis
▪ Old exam question
▪ Further study
SLIDE 2
SLIDE 3
The regression model so far is for one dependent variable ($Y$) and one independent (explanatory) variable ($X$)
▪ There are many cases where several explanatory variables might play a role
  ▪ ... and might "explain" the dependent variable $Y$
▪ Example: house prices depend on
  ▪ floor area
  ▪ ground area (first floor + garden)
  ▪ number of rooms
  ▪ age of the house
  ▪ etc.
MULTIPLE REGRESSION
SLIDE 4
Generalize the simple regression model
▪ from $Y = \beta_0 + \beta_1 X_1 + \varepsilon$
▪ to $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
▪ or even to $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
Multiple regression
▪ a quite obvious extension
▪ we can reuse much of the theory of simple regression
▪ still based on OLS, $R^2$, the $F$-test, and the $t$-test
MULTIPLE REGRESSION
Now you'll understand why we used a subscript 0 for the constant in $\beta_0$ ...
SLIDE 5
SPSS output
Estimated model: $\hat{Y} = -217603 + 5347 X_1 + 225 X_2$
MULTIPLE REGRESSION
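As a minimal sketch of how an estimated model is used, the snippet below plugs values into the fitted equation from the SPSS output. The function name `predict` and the input values 100 and 200 are illustrative assumptions; the slide does not say which variables $X_1$ and $X_2$ stand for.

```python
def predict(coefs, xs):
    """Return b0 + b1*x1 + ... + bk*xk for coefficients [b0, b1, ..., bk]."""
    b0, slopes = coefs[0], coefs[1:]
    if len(slopes) != len(xs):
        raise ValueError("need exactly one x value per slope coefficient")
    return b0 + sum(b * x for b, x in zip(slopes, xs))

# Coefficients taken from the SPSS output; the x values are hypothetical.
spss_coefs = [-217603, 5347, 225]
y_hat = predict(spss_coefs, [100, 200])
print(y_hat)
```

The same function works for any number of regressors, which mirrors the step from simple to multiple regression.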
SLIDE 6
"Step 0" (statistical model): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$
Step 1:
▪ $H_0: \beta_1 = \beta_2 = 0$; $H_1$: at least one of these is not 0
Step 2:
▪ Sample statistic: $F = \frac{MSR}{MSE}$; reject for "too large" values
Step 3:
▪ Under $H_0$: $F \sim F_{2,\,n-3}$; assumption: see model (step 0)
Step 4:
▪ $F_{\text{calc}} = \cdots$; $F_{\text{crit}} = F_{2,\,n-3;\,\alpha}$
Step 5:
▪ reject/do not reject $H_0$
MULTIPLE REGRESSION
With $k$ regressors: $df_1 = k$ and $df_2 = n - k - 1$. Mind that the null hypothesis does not include the constant (intercept) $\beta_0$.
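The calculation in steps 2 to 4 can be sketched as follows. This is an illustration under stated assumptions: `ssr` denotes the regression (explained) sum of squares, the function name `overall_f` is made up for this example, and the sums of squares are hypothetical numbers. The critical value $F_{\text{crit}}$ still has to come from a table or from statistical software.

```python
def overall_f(ssr, sse, n, k):
    """Overall F statistic and its degrees of freedom for k regressors."""
    df1 = k              # numerator df: number of slope coefficients
    df2 = n - k - 1      # denominator df: n minus all estimated coefficients
    if df2 <= 0:
        raise ValueError("need n > k + 1 observations")
    msr = ssr / df1      # mean square regression
    mse = sse / df2      # mean square error
    return msr / mse, df1, df2

# Hypothetical sums of squares, only to show the mechanics.
f_calc, df1, df2 = overall_f(ssr=900.0, sse=100.0, n=23, k=2)
print(f_calc, df1, df2)
```

With $k = 2$ the degrees of freedom reduce to the $F_{2,\,n-3}$ used in step 3.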
SLIDE 7
Rejecting the null hypothesis of the $F$-test in multiple regression means:
▪ at least one of the slope coefficients differs from 0
  ▪ "not $\beta_1 = \beta_2 = 0$"
▪ which one differs (or which ones differ) from 0 must be investigated by separate $t$-tests
So,
▪ while in simple regression the overall $F$-test and the $t$-test for $\beta_1$ do exactly the same thing ...
▪ ... the two tests have complementary roles in multiple regression
  ▪ first look at the overall $F$, then go to the individual $t$s
MULTIPLE REGRESSION
SLIDE 8
First, test the overall model, using the $F$-test
Next, test each slope coefficient, using $k$ separate $t$-tests
MULTIPLE REGRESSION
not interesting
SLIDE 9
What does it mean when in multiple regression
a. the overall $F$-test yields a significant result?
b. a $t$-test of an individual coefficient $\beta_3$ yields a significant result?
EXERCISE 1
SLIDE 10
Example:
▪ overall $F$-test: highly significant
▪ both regression slopes: highly significant
▪ coefficient of determination ($R^2$): very high (90%)
▪ a very useful model
▪ in fact: better than the simple regression model with $R^2 = 82\%$
MULTIPLE REGRESSION
SLIDE 11
Observe:
▪ including more explanatory variables will in general improve the fit of the model
▪ $R^2$ will not decrease (and typically increases), even if we include "nonsense" variables (e.g., street number of the house)
▪ $R^2_{\text{adj}}$ ("R-square-adjusted") penalizes for including "too many" regressors
▪ $R^2_{\text{adj}} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$, while $R^2 = 1 - \frac{SSE}{SST}$
MULTIPLE REGRESSION
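A small sketch of the two formulas shows the penalty at work. All numbers are hypothetical: a third regressor is assumed to lower $SSE$ only from 100 to 99, so $R^2$ rises slightly while $R^2_{\text{adj}}$ falls.

```python
def r_squared(sse, sst):
    """R^2 = 1 - SSE/SST."""
    return 1 - sse / sst

def r_squared_adj(sse, sst, n, k):
    """Adjusted R^2 = 1 - (SSE/(n-k-1)) / (SST/(n-1))."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

# Hypothetical sample: n = 25 observations, SST = 1000.
n, sst = 25, 1000.0
r2_two, adj_two = r_squared(100.0, sst), r_squared_adj(100.0, sst, n, k=2)
r2_three, adj_three = r_squared(99.0, sst), r_squared_adj(99.0, sst, n, k=3)

print(r2_two, r2_three)    # R^2 goes up with the extra regressor
print(adj_two, adj_three)  # adjusted R^2 goes down
```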
SLIDE 12
House prices (numerical) depend on:
▪ numerical variables (floor area, ground area, etc.)
▪ binary categorical variables (with/without garage, etc.)
▪ other categorical variables (no/free/paid parking, etc.)
However:
▪ regression is for numerical $X$ and numerical $Y$
▪ ANOVA is for categorical $X$ and numerical $Y$
So, how to combine numerical $X_1$ and categorical $X_2$?
Solution: dummy variables for the categorical variable
▪ dummy regressors/dummy regression
DUMMY REGRESSORS
SLIDE 13
We can include dummy variables in multiple regression
▪ Recoding a binary variable as one dummy
  ▪ original variable: garage = no/yes
  ▪ garage: 0=no; 1=yes
▪ Splitting a non-binary variable into several binary dummies
  ▪ original variable: parking = no/free/paid
  ▪ free_parking: 0=no; 1=yes
  ▪ paid_parking: 0=no; 1=yes
▪ Dummy variables only for independent ($X$) variables
  ▪ never for the dependent ($Y$) variable
  ▪ $Y$ must be numerical (think about $\varepsilon \sim N$)
DUMMY REGRESSORS
Omitted variable: no_parking (redundant): free=0, paid=0
Omitted variable: no_garage (redundant): garage=0
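The recoding can be sketched in a few lines. The helper name `dummy_code` and the four sample values are assumptions for illustration; the reference category ("no") gets no column of its own, because it is implied when all dummies are 0.

```python
def dummy_code(values, reference):
    """Turn a categorical column into 0/1 dummy columns, omitting `reference`."""
    levels = []
    for v in values:
        if v != reference and v not in levels:
            levels.append(v)
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical parking data for four houses.
parking = ["no", "free", "paid", "free"]
dummies = dummy_code(parking, reference="no")
print(dummies)  # {'free': [0, 1, 0, 1], 'paid': [0, 0, 1, 0]}
```

A variable with three categories yields two dummies, and a binary variable yields one.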
SLIDE 14
Example
▪ House price ($Y$) as a function of
  ▪ floor area ($X_1$)
  ▪ dummy for garden ($X_2$; 0=No, 1=Yes)
▪ $\text{Price} = -261741 + 6040\,\text{FloorArea} + 21825\,\text{Garden}$
DUMMY REGRESSORS
meaning that a house with a garden is estimated to cost 21825 € more than a house of the same floor area without one (whatever the size of the garden)
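To see that the dummy only shifts the intercept, the estimated model can be evaluated with and without a garden. The function name `estimated_price` and the floor area of 85 m² are assumptions chosen for illustration.

```python
def estimated_price(floor_area, has_garden):
    """Estimated price from the slide's model with a garden dummy."""
    garden = 1 if has_garden else 0  # dummy regressor: 0 = no, 1 = yes
    return -261741 + 6040 * floor_area + 21825 * garden

without_garden = estimated_price(85, has_garden=False)
with_garden = estimated_price(85, has_garden=True)
print(with_garden - without_garden)  # 21825, the dummy coefficient
```

The difference is the same for every floor area, because the slope for floor area does not depend on the dummy.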
SLIDE 15
▪ Use dummy variables only for the independent (explanatory) variables
  ▪ not for the dependent variable (that requires logistic regression, not in this course!)
▪ It is quite common to indicate dummy explanatory variables with a $D$ instead of an $X$
  ▪ for instance: $Y = \beta_0 + \beta_1 X_1 + \beta_2 D_2 + \beta_3 D_3 + \varepsilon$
DUMMY REGRESSORS
SLIDE 16
We want to explain car prices in terms of 1) engine power, 2) number of seats, and 3) fuel type (gas/diesel/electric). What is the theoretical model?
EXERCISE 2
SLIDE 17
The OLS equations always find coefficients $b_0, b_1, \ldots$ that minimize the residual sum of squares ($SSE$)
▪ so no assumptions are required for that part
But when testing the model (and when testing the coefficients $\beta_1, \beta_2, \ldots$)
▪ we need to assume a statistical model with $\varepsilon \sim N(0, \sigma^2)$:
  ▪ the residual terms should be normally distributed
  ▪ the residual terms should come from a distribution with constant variance
  ▪ the residual terms should be independent of each other
  ▪ there should be a linear relationship between the $X$-variable(s) and $Y$
ASSUMPTIONS OF REGRESSION ANALYSIS
SLIDE 18
A final word on the residual $\varepsilon \sim N(0, \sigma^2)$
▪ Theoretical regression model
  ▪ $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
▪ Estimated regression model
  ▪ $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$
▪ Observations
  ▪ $y_i = b_0 + b_1 x_{1,i} + b_2 x_{2,i} + \cdots + b_k x_{k,i} + e_i$
▪ And the standard deviation of the residual term, $\sigma = \sqrt{\sigma^2}$
  ▪ is estimated by $s = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{MSE}$
  ▪ is known as the standard error of the regression or standard error of the estimate
ASSUMPTIONS OF REGRESSION ANALYSIS
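The estimate $s$ can be computed directly from the residuals $e_i$. The sketch below assumes a hypothetical set of five residuals from a model with $k = 1$ regressor; the function name is made up for this example.

```python
import math

def standard_error_of_regression(residuals, k):
    """s = sqrt(SSE / (n - k - 1)) for a model with k regressors."""
    n = len(residuals)
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    sse = sum(e * e for e in residuals)  # residual sum of squares
    return math.sqrt(sse / (n - k - 1))

# Hypothetical residuals; SSE = 10 and n - k - 1 = 3.
residuals = [2.0, -1.0, -2.0, 1.0, 0.0]
s = standard_error_of_regression(residuals, k=1)
print(s)
```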
SLIDE 19
Given a sample of data $x_{1i}, x_{2i}, \ldots, y_i$ with $i = 1, \ldots, n$
▪ we can use OLS to estimate the regression model $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots$
▪ subsequently, given the floor area, we can estimate the price of the house
Now, a new "incomplete" observation arrives
▪ for instance, a new house with known floor area ($x_{n+1}$), but with unknown price (no $y_{n+1}$)
We can use the regression model to estimate the house price
▪ so to predict $\hat{y}_{n+1}$
PREDICTION WITH REGRESSION ANALYSIS
SLIDE 20
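Example:
▪ $\hat{Y} = -264749 + 6152 X$
▪ a house with floor area $x = 85$ m² has an estimated price $\hat{y} = -264749 + 6152 \times 85 \approx 258142$ (€)
  ▪ the rounded coefficients shown here give 258171; the value 258142 presumably comes from the unrounded estimates
PREDICTION WITH REGRESSION ANALYSIS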
SLIDE 21
So, we can predict a value of $\hat{y}$
▪ for a given $x$ (or $x_1, x_2, \ldots$)
▪ and given estimated regression coefficients ($b_0, b_1, \ldots$)
The quality of this estimate obviously depends on the quality of the regression model
▪ try to find a confidence interval for the estimated $\hat{y}$-value
▪ two types:
  ▪ the confidence interval for the average price of houses of 85 m²
  ▪ the confidence interval for the price of a particular house of 85 m² (often called a prediction interval)
PREDICTION WITH REGRESSION ANALYSIS
SLIDE 22
Point prediction: 258142
Case 1: confidence interval (95%) for prediction of mean price
▪ [212866, 303419]
Case 2: confidence interval (95%) for individual prediction
▪ [-96372, 612658]
PREDICTION WITH REGRESSION ANALYSIS
Individual predictions are always less accurate → wider confidence interval (this one even includes 0)
Price ($Y$) unknown, area ($X$) known
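Why the individual interval is wider can be seen from the textbook formulas for simple regression: the half-width for the mean is $t \cdot s \sqrt{1/n + (x_0 - \bar{x})^2 / SS_{xx}}$, and the half-width for an individual observation adds 1 under the square root. The sketch below is an illustration only: the data are hypothetical (not the data set behind the slide's intervals), the function name is made up, and the critical value 2.776 is the two-sided 95% $t$-value for 4 degrees of freedom taken from a $t$-table.

```python
import math

def prediction_intervals(xs, ys, x0, t_crit):
    """Point prediction plus mean and individual intervals (simple regression)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # OLS slope
    b0 = y_bar - b1 * x_bar             # OLS intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))        # standard error of the regression (k = 1)
    y_hat = b0 + b1 * x0
    core = 1 / n + (x0 - x_bar) ** 2 / sxx
    half_mean = t_crit * s * math.sqrt(core)        # interval for the mean
    half_indiv = t_crit * s * math.sqrt(1 + core)   # interval for one observation
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_indiv, y_hat + half_indiv))

# Hypothetical data: floor area (m2) and price (thousand euro) for six houses.
areas = [50, 60, 70, 80, 90, 100]
prices = [60, 95, 170, 210, 300, 330]
y_hat, mean_ci, indiv_ci = prediction_intervals(areas, prices, x0=85, t_crit=2.776)
print(y_hat, mean_ci, indiv_ci)
```

Both intervals are centred on the same point prediction; only the width differs.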
SLIDE 23
26 March 2015, Q3a
OLD EXAM QUESTION
SLIDE 24