Business Statistics: Multiple Regression Analysis and Other Issues (PowerPoint presentation)



SLIDE 1

MULTIPLE REGRESSION ANALYSIS AND OTHER ISSUES

Business Statistics

SLIDE 2

CONTENTS

▪ Multiple regression
▪ Dummy regressors
▪ Assumptions of regression analysis
▪ Predicting with regression analysis
▪ Old exam question
▪ Further study

SLIDE 3

MULTIPLE REGRESSION

The regression model so far has one dependent variable ($Y$) and one independent (explanatory) variable ($X$)
▪ There are many cases where several explanatory variables might play a role
  ▪ ... and might “explain” the dependent variable $Y$
▪ Example: house prices depend on
  ▪ floor area
  ▪ ground area (first floor + garden)
  ▪ number of rooms
  ▪ age of the house
  ▪ etc.

SLIDE 4

MULTIPLE REGRESSION

Generalize the simple regression model
▪ from $Y = \beta_0 + \beta_1 X_1 + \varepsilon$
▪ to $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
▪ or even to $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$

Multiple regression
▪ a quite obvious extension
▪ we can reuse much of the theory of simple regression
▪ still based on OLS, $R^2$, the $F$-test, and the $t$-test

Now you'll understand why we used a subscript 0 for the constant in $\beta_0$ ...
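As a rough illustration of how OLS carries over from one regressor to several, the sketch below solves the normal equations $(X^\top X)\,b = X^\top Y$ in plain Python. The helper names (`solve_linear_system`, `ols_fit`) and the data are hypothetical and not taken from the slides; statistical software such as SPSS does this internally with more robust numerics.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_fit(xs: List[List[float]], y: List[float]) -> List[float]:
    """Return [b0, b1, ..., bk] minimizing the residual sum of squares."""
    # Design matrix with a leading column of ones for the intercept b0.
    design = [[1.0] + list(row) for row in xs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no noise),
# so the fit should recover these coefficients up to rounding error.
xs = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in xs]
b0, b1, b2 = ols_fit(xs, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Adding a third regressor only adds a column to `xs`; the fitting code itself does not change, which is the sense in which the extension is "quite obvious".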

SLIDE 5

MULTIPLE REGRESSION

SPSS output
Estimated model: $\hat{Y} = -217603 + 5347 X_1 + 225 X_2$

SLIDE 6

MULTIPLE REGRESSION

“Step 0” (statistical model): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$
Step 1:
▪ $H_0$: $\beta_1 = \beta_2 = 0$; $H_1$: at least one of these is not 0
Step 2:
▪ Sample statistic: $F = \frac{MSR}{MSE}$; reject for “too large” values
Step 3:
▪ Under $H_0$: $F \sim F_{2,\,n-3}$; assumption: see model (step 0)
Step 4:
▪ $F_{\text{calc}} = \cdots$; $F_{\text{crit}} = F_{2,\,n-3;\,\alpha}$
Step 5:
▪ reject/do not reject $H_0$

With $k$ regressors: $df_1 = k$ and $df_2 = n - k - 1$. Mind that the null hypothesis does not include the constant (intercept) $\beta_0$.
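The test statistic and its degrees of freedom can be computed directly from the sums of squares in the ANOVA table. The following is a minimal sketch; the function name and the numbers are hypothetical, and the critical value $F_{k,\,n-k-1;\,\alpha}$ still has to come from a table or from statistical software.

```python
from typing import Tuple


def overall_f_statistic(ssr: float, sse: float, n: int, k: int) -> Tuple[float, int, int]:
    """Return (F, df1, df2) for H0: beta_1 = ... = beta_k = 0."""
    df1 = k            # one degree of freedom per slope coefficient
    df2 = n - k - 1    # observations minus slopes minus the intercept
    msr = ssr / df1    # mean square due to regression
    mse = sse / df2    # mean square error
    return msr / mse, df1, df2


# Hypothetical sums of squares for n = 23 houses and k = 2 regressors.
f_calc, df1, df2 = overall_f_statistic(ssr=900.0, sse=100.0, n=23, k=2)
print(f_calc, df1, df2)
```

With $k = 2$ the second degrees of freedom equal $n - 3$, matching the distribution in step 3.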

SLIDE 7

MULTIPLE REGRESSION

Rejecting $H_0$ with the $F$-test in multiple regression means:
▪ at least one of the slope coefficients differs from 0
  ▪ “not $\beta_1 = \beta_2 = 0$”
▪ which one differs (or which ones differ) from 0 must be investigated by separate $t$-tests

So,
▪ while in simple regression the overall $F$-test and the $t$-test for $\beta_1$ do exactly the same thing ...
▪ ... the two tests have complementary roles in multiple regression
  ▪ first look at the overall $F$, then go to the individual $t$s

SLIDE 8

MULTIPLE REGRESSION

First, test the overall model, using the $F$-test
Next, test each slope coefficient, using $k$ separate $t$-tests

not interesting

SLIDE 9

EXERCISE 1

What does it mean when in multiple regression

  • a. the overall $F$-test yields a significant result?
  • b. a $t$-test of an individual coefficient $\beta_3$ yields a significant result?

SLIDE 10

MULTIPLE REGRESSION

Example:
▪ overall $F$-test: highly significant
▪ both regression slopes: highly significant
▪ coefficient of determination ($R^2$): very high (90%)
▪ a very useful model
▪ in fact: better than the simple regression model with $R^2 = 82\%$

SLIDE 11

MULTIPLE REGRESSION

Observe:
▪ including more explanatory variables will in general improve the model
▪ $R^2$ will increase, even if we include “nonsense” variables (e.g., street number of the house)
▪ $R^2_{\text{adj}}$ (“R-square-adjusted”) penalizes for including “too many” regressors
▪ $R^2_{\text{adj}} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$ while $R^2 = 1 - \frac{SSE}{SST}$
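A small numerical sketch can make the penalty visible. The sums of squares below are hypothetical: adding a third, nearly useless regressor lowers $SSE$ only slightly, so $R^2$ rises while $R^2_{\text{adj}}$ falls.

```python
def r_squared(sse: float, sst: float) -> float:
    """Coefficient of determination: share of total variation explained."""
    return 1.0 - sse / sst


def adjusted_r_squared(sse: float, sst: float, n: int, k: int) -> float:
    """R-square-adjusted: compares SSE and SST per degree of freedom."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


n, sst = 20, 1000.0  # hypothetical sample size and total sum of squares

# Model with k = 2 sensible regressors.
r2_base = r_squared(100.0, sst)
adj_base = adjusted_r_squared(100.0, sst, n, k=2)

# Same model plus one "nonsense" regressor that barely reduces SSE.
r2_extra = r_squared(99.5, sst)
adj_extra = adjusted_r_squared(99.5, sst, n, k=3)

print(r2_base, r2_extra)    # R^2 goes up
print(adj_base, adj_extra)  # adjusted R^2 goes down
```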

SLIDE 12

DUMMY REGRESSORS

House prices (numerical) depend on:
▪ numerical variables (floor area, ground area, etc.)
▪ binary categorical variables (with/without garage, etc.)
▪ other categorical variables (no/free/paid parking, etc.)

However:
▪ regression is for numerical $X$ and numerical $Y$
▪ ANOVA is for categorical $X$ and numerical $Y$

So, how to combine numerical $X_1$ and categorical $X_2$?

Solution: dummy variables for the categorical variable
▪ dummy regressors/dummy regression

SLIDE 13

DUMMY REGRESSORS

We can include dummy variables in multiple regression
▪ Recoding a binary variable as one 0/1 dummy
  ▪ original variable: garage = no/yes
  ▪ garage: 0=no; 1=yes
  ▪ omitted variable: no_garage (redundant): garage=0
▪ Splitting a non-binary variable into several 0/1 dummies
  ▪ original variable: parking = no/free/paid
  ▪ free_parking: 0=no; 1=yes
  ▪ paid_parking: 0=no; 1=yes
  ▪ omitted variable: no_parking (redundant): free=0, paid=0
▪ Dummy variables only for independent ($X$) variables
  ▪ never for the dependent ($Y$) variable
  ▪ $Y$ must be numerical (think about $\varepsilon \sim N$)
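The recoding of the three-level parking variable can be sketched as follows. The function name is hypothetical; the variable names `free_parking` and `paid_parking` follow the slide. The category "no" is the omitted baseline, so it is represented by both dummies being 0.

```python
from typing import Tuple


def parking_dummies(parking: str) -> Tuple[int, int]:
    """Map 'no' / 'free' / 'paid' to the pair (free_parking, paid_parking)."""
    if parking not in ("no", "free", "paid"):
        raise ValueError(f"unknown parking category: {parking!r}")
    free_parking = int(parking == "free")
    paid_parking = int(parking == "paid")
    return free_parking, paid_parking


for category in ("no", "free", "paid"):
    print(category, parking_dummies(category))
```

A categorical variable with $m$ levels therefore needs $m - 1$ dummies; a separate `no_parking` dummy would be fully determined by the other two.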

SLIDE 14

DUMMY REGRESSORS

Example
▪ House price ($Y$) as a function of
  ▪ floor area ($X_1$)
  ▪ dummy for garden ($X_2$; 0=No, 1=Yes)
▪ $\text{Price} = -261741 + 6040\,\text{FloorArea} + 21825\,\text{Garden}$
  ▪ meaning € 21825 extra when there is a garden (whatever the size)
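To see how the dummy coefficient acts as a fixed shift, the estimated equation can be evaluated for the same floor area with and without a garden. The coefficients are the ones on the slide; the floor area of 100 m² is a hypothetical input.

```python
def predicted_price(floor_area: float, garden: int) -> float:
    """Estimated price in euros; garden is the 0/1 dummy from the slide."""
    return -261741 + 6040 * floor_area + 21825 * garden


without_garden = predicted_price(100, garden=0)
with_garden = predicted_price(100, garden=1)
print(without_garden, with_garden, with_garden - without_garden)
```

The difference equals the dummy coefficient regardless of the floor area chosen, because the model has no interaction term between the two regressors.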

SLIDE 15

DUMMY REGRESSORS

▪ Use dummy variables only for the independent (explanatory) variables
  ▪ not for the dependent variable (that requires logistic regression, not in this course!)
▪ It is quite common to denote dummy explanatory variables with a $D$ instead of an $X$
  ▪ for instance: $Y = \beta_0 + \beta_1 X_1 + \beta_2 D_2 + \beta_3 D_3 + \varepsilon$

SLIDE 16

EXERCISE 2

We want to explain car prices in terms of
1) engine power
2) number of seats
3) gas/diesel/electric.
What is the theoretical model?

SLIDE 17

ASSUMPTIONS OF REGRESSION ANALYSIS

The OLS equations always find coefficients $b_0, b_1, \ldots$ that minimize the residual sum of squares ($SSE$)
▪ so no assumptions are required for that part

But when testing the model (and when testing the coefficients $\beta_1, \beta_2, \ldots$)
▪ we need to assume a statistical model with $\varepsilon \sim N(0, \sigma^2)$:
  ▪ the residual terms should be normally distributed
  ▪ the residual terms should come from a distribution with constant variance
  ▪ the residual terms should be independent of each other
  ▪ there should be a linear relationship between the $X$-variable(s) and $Y$

SLIDE 18

ASSUMPTIONS OF REGRESSION ANALYSIS

A final word on the residual $\varepsilon \sim N(0, \sigma^2)$
▪ Theoretical regression model
  ▪ $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
▪ Estimated regression model
  ▪ $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$
▪ Observations
  ▪ $Y_i = b_0 + b_1 X_{1,i} + b_2 X_{2,i} + \cdots + b_k X_{k,i} + e_i$
▪ And the standard deviation of the residual term, $\sigma = \sqrt{\sigma^2}$,
  ▪ is estimated by $s = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{MSE}$
  ▪ is known as the standard error of the regression or standard error of the estimate
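The standard error of the regression follows directly from the residuals $e_i = Y_i - \hat{Y}_i$. The sketch below uses hypothetical observed and fitted values; they are chosen for easy arithmetic and are not the result of an actual OLS fit.

```python
import math
from typing import Sequence


def standard_error_of_regression(y: Sequence[float], y_hat: Sequence[float], k: int) -> float:
    """Return s = sqrt(SSE / (n - k - 1)) for a model with k regressors."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    return math.sqrt(sse / (n - k - 1))


# Hypothetical observed and fitted values; residuals are -1, 0, -1, 1, 0, 1.
y_obs = [3.0, 5.0, 7.0, 10.0, 12.0, 15.0]
y_fit = [4.0, 5.0, 8.0, 9.0, 12.0, 14.0]
s = standard_error_of_regression(y_obs, y_fit, k=1)
print(s)
```

Here $SSE = 4$ and $n - k - 1 = 4$, so $s = 1$, expressed in the same units as $Y$.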

SLIDE 19

PREDICTION WITH REGRESSION ANALYSIS

Given a sample of data $x_{1i}, x_{2i}, \ldots, y_i$ with $i = 1, \ldots, n$
▪ we can use OLS to estimate the regression model $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots$
▪ subsequently, given the floor area, we can estimate the price of the house

Now, a new “incomplete” observation arrives
▪ for instance, a new house with known floor area ($x_{n+1}$), but with unknown price (no $y_{n+1}$)

We can use the regression model to estimate the house price
▪ that is, to predict $\hat{y}_{n+1}$

SLIDE 20

PREDICTION WITH REGRESSION ANALYSIS

Example:
▪ $\hat{Y} = -264749 + 6152 X$
▪ a house with floor area $x = 85$ m² has an estimated price $\hat{y} = -264749 + 6152 \times 85 = 258142$ (€)

SLIDE 21

PREDICTION WITH REGRESSION ANALYSIS

So, we can predict a value of $\hat{y}$
▪ for a given $x$ (or $x_1, x_2, \ldots$)
▪ and given estimated regression coefficients ($b_0, b_1, \ldots$)

The quality of this estimate obviously depends on the quality of the regression model
▪ try to find a confidence interval for the estimated $\hat{y}$-value
▪ two types:
  ▪ the confidence interval for the average price of a house of 85 m²
  ▪ the confidence interval for a particular house of 85 m²

SLIDE 22

PREDICTION WITH REGRESSION ANALYSIS

Price ($Y$) unknown, area ($X$) known

Point prediction: 258142
Case 1: confidence interval (95%) for prediction of mean price
▪ (212866, 303419)
Case 2: confidence interval (95%) for individual prediction
▪ (−96372, 612658)

Individual predictions are always less accurate → wider confidence interval (this one even includes 0)
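The slides report the two intervals from software output without the formulas. For simple regression the standard textbook expressions are $\hat{y}_0 \pm t \cdot s \sqrt{\tfrac{1}{n} + \tfrac{(x_0 - \bar{x})^2}{S_{xx}}}$ for the mean and $\hat{y}_0 \pm t \cdot s \sqrt{1 + \tfrac{1}{n} + \tfrac{(x_0 - \bar{x})^2}{S_{xx}}}$ for an individual observation. The sketch below applies them to hypothetical data (floor area in m², price in thousands of euros); the function name is an assumption, and the critical value $t_{0.025;\,4} = 2.776$ is passed in from a t-table rather than computed.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def prediction_intervals(
    x: Sequence[float], y: Sequence[float], x0: float, t_crit: float
) -> Tuple[float, Interval, Interval]:
    """Return (point prediction, interval for the mean, interval for an individual)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # OLS slope
    b0 = y_bar - b1 * x_bar     # OLS intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the regression (k = 1)
    y0 = b0 + b1 * x0
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    half_mean = t_crit * s * math.sqrt(leverage)
    half_ind = t_crit * s * math.sqrt(1.0 + leverage)  # extra 1 for the new residual
    return y0, (y0 - half_mean, y0 + half_mean), (y0 - half_ind, y0 + half_ind)


# Hypothetical sample of six houses.
area = [60.0, 70.0, 80.0, 90.0, 100.0, 110.0]
price = [110.0, 160.0, 230.0, 270.0, 350.0, 400.0]
point, ci_mean, ci_individual = prediction_intervals(area, price, x0=85.0, t_crit=2.776)
print(point, ci_mean, ci_individual)
```

Both intervals are centred on the same point prediction; the individual interval is wider because it also accounts for the residual of the new observation.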

SLIDE 23

OLD EXAM QUESTION

26 March 2015, Q3a

SLIDE 24

FURTHER STUDY

Doane & Seward 5/E 12.7, 13.1-13.5
Tutorial exercises week 4

multiple regression
dummy regression
prediction interval