Chapter 13 Multiple Regression and Model Building Multiple - - PowerPoint PPT Presentation

SLIDE 1

Chapter 13

Multiple Regression and Model Building

SLIDE 2

Multiple Regression Models

The General Multiple Regression Model

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$

where
  • $y$ is the dependent variable
  • $x_1, x_2, \dots, x_k$ are the independent variables
  • $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$ is the deterministic portion of the model
  • $\beta_i$ determines the contribution of the independent variable $x_i$

SLIDE 3

Multiple Regression Models

Analyzing a Multiple Regression Model

1. Hypothesize the deterministic component of the model.
2. Use sample data to estimate β0, β1, β2, …, βk.
3. Specify the probability distribution of the random error ε and estimate its standard deviation σ.
4. Check that the assumptions on ε are satisfied.
5. Statistically evaluate the usefulness of the model.
6. If the model is useful, use it for prediction, estimation, and other purposes.

SLIDE 4

The First-Order Model: Estimating and Interpreting the β-Parameters

For a chosen model, such as $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$, the least squares fitted model

$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \dots + \hat\beta_k x_k$

minimizes $SSE = \sum (y - \hat{y})^2$.
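The least squares criterion above can be sketched with NumPy (assumed to be available). `fit_least_squares` is a hypothetical helper name, and the data are synthetic: they are generated without noise from made-up coefficients, so the fit should recover those coefficients. This is an illustration, not the output of the software used on the slides.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return (beta_hat, SSE) for E(y) = b0 + b1*x1 + ... + bk*xk.

    X is an (n, k) array of independent variables; a column of ones is
    prepended so that beta_hat[0] is the intercept estimate.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    # lstsq chooses beta to minimize ||y - design @ beta||^2, i.e. SSE
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta_hat
    sse = float(residuals @ residuals)
    return beta_hat, sse

# Hypothetical noise-free data generated from E(y) = 2 + 0.5*x1 - 1.5*x2 + 3*x3
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(20, 3))
y = 2 + 0.5 * X[:, 0] - 1.5 * X[:, 1] + 3 * X[:, 2]
beta_hat, sse = fit_least_squares(X, y)
print(np.round(beta_hat, 6))  # recovers 2, 0.5, -1.5, 3 up to rounding
```

With real (noisy) data, SSE is positive and the estimates differ from the true β's; statistics packages perform the same minimization and add standard errors and tests.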

SLIDE 5

The First-Order Model: Estimating and Interpreting the β-Parameters

y = β0 + β1x1 + β2x2 + β3x3 + ε

where
  • y = sale price (dollars)
  • x1 = appraised land value (dollars)
  • x2 = appraised improvements (dollars)
  • x3 = area (square feet)

SLIDE 6

The First-Order Model: Estimating and Interpreting the β-Parameters

Plot of data for sample size n=20

SLIDE 7

The First-Order Model: Estimating and Interpreting the β-Parameters

Fit model to data

SLIDE 8

The First-Order Model: Estimating and Interpreting the β-Parameters

Interpret the β estimates

  • $\hat\beta_1 = 0.8145$: E(y), the mean sale price of the property, is estimated to increase by 0.8145 dollars for every 1-dollar increase in appraised land value, holding the other variables constant.
  • $\hat\beta_2 = 0.8204$: E(y) is estimated to increase by 0.8204 dollars for every 1-dollar increase in appraised improvements, holding the other variables constant.
  • $\hat\beta_3 = 13.53$: E(y) is estimated to increase by 13.53 dollars for each additional square foot of living area, holding the other variables constant.

SLIDE 9

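The First-Order Model: Estimating and Interpreting the β-Parameters

Given the model E(y) = 1 + 2x1 + x2, the effect of x2 on E(y), holding x1 constant, is β2 = 1: for any fixed value of x1, E(y) increases by 1 for each 1-unit increase in x2. For example, E(y) = 1 + x2 when x1 = 0 and E(y) = 3 + x2 when x1 = 1, two parallel lines with slope 1.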
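A quick numeric check of the example model E(y) = 1 + 2x1 + x2 in plain Python (`mean_response` is an illustrative name): with x1 held at several fixed values, a 1-unit increase in x2 always changes E(y) by β2 = 1, while x1 only shifts the intercept of the line.

```python
def mean_response(x1, x2):
    """E(y) = 1 + 2*x1 + x2, the example model on this slide."""
    return 1 + 2 * x1 + x2

# Effect of x2 with x1 held fixed: change in E(y) per 1-unit increase in x2
for x1 in (0, 1, 2):
    change = mean_response(x1, x2=5) - mean_response(x1, x2=4)
    print(f"x1 = {x1}: E(y) = {mean_response(x1, 0)} + x2, slope in x2 = {change}")
```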
SLIDE 10

The First-Order Model: Estimating and Interpreting the β-Parameters

SLIDE 11

Model Assumptions

Assumptions about Random Error ε

1. For any given set of values of x1, x2, …, xk, the random error ε has a normal probability distribution with mean 0 and variance σ².
2. The random errors are independent.

Estimator of σ² for a Multiple Regression Model with k Independent Variables

$s^2 = \frac{SSE}{n - \text{Number of estimated } \beta \text{ parameters}} = \frac{SSE}{n - (k+1)}$
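A minimal sketch of the s² estimator, assuming NumPy. `estimate_sigma2` is a hypothetical helper, and the four data points are made up so the arithmetic can be checked by hand: the least squares line is ŷ = 0.2 + 1.2x, SSE = 0.8, and n − (k + 1) = 4 − 2 = 2.

```python
import numpy as np

def estimate_sigma2(X, y):
    """s^2 = SSE / (n - (k + 1)) for a first-order model with an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)  # shape (n, k)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = float(np.sum((y - design @ beta_hat) ** 2))
    return sse / (n - (k + 1))

# Small hypothetical data set: one x, four observations
s2 = estimate_sigma2([0, 1, 2, 3], [0, 2, 2, 4])
print(s2)  # approximately 0.4
```

Dividing by n − (k + 1) rather than n accounts for the k + 1 β parameters estimated from the same data.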

SLIDE 12

Inferences about the β-Parameters

Two types of inferences can be made: confidence intervals and hypothesis tests. For either type of inference to be valid, the assumptions about the random error term ε (normal distribution with mean 0 and variance σ², independence of errors) must be met.

SLIDE 13

Inferences about the β-Parameters

A 100(1-α)% Confidence Interval for a β-Parameter

$\hat\beta_i \pm t_{\alpha/2}\, s_{\hat\beta_i}$

where $t_{\alpha/2}$ is based on n - (k+1) degrees of freedom, and
  • n = number of observations
  • k + 1 = number of β parameters in the model
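A sketch of the confidence interval above, assuming NumPy and using the usual standard-error formula s_β̂i = s·sqrt([(X'X)⁻¹]ii), which the slide does not spell out. `coefficient_ci` is a hypothetical helper. The critical value is passed in rather than computed, to avoid assuming a statistics library; 4.303 is t.025 with 2 degrees of freedom from a standard t table. The data are the same made-up four points as in the s² sketch.

```python
import numpy as np

def coefficient_ci(X, y, i, t_crit):
    """100(1 - alpha)% CI for beta_i: beta_hat_i +/- t_crit * s_{beta_hat_i}.

    t_crit must be t_{alpha/2} with n - (k + 1) degrees of freedom,
    looked up in a t table.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    sse = float(np.sum((y - design @ beta_hat) ** 2))
    s = np.sqrt(sse / (n - (k + 1)))
    se_i = s * np.sqrt(xtx_inv[i, i])  # standard error of beta_hat_i
    return beta_hat[i] - t_crit * se_i, beta_hat[i] + t_crit * se_i, se_i

# 95% CI for the slope (i = 1) in a small hypothetical data set: n = 4, k = 1, df = 2
lo, hi, se = coefficient_ci([0, 1, 2, 3], [0, 2, 2, 4], i=1, t_crit=4.303)
print(round(lo, 3), round(hi, 3))  # about -0.017 and 2.417
```

Because this interval contains 0, the data do not establish a nonzero slope at α = .05, even though β̂1 = 1.2.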

SLIDE 14

Inferences about the β-Parameters

A Test of an Individual Parameter Coefficient

One-Tailed Test
  • H0: βi = 0
  • Ha: βi < 0 (or Ha: βi > 0)
  • Rejection region: t < -tα (or t > tα when Ha: βi > 0)

Two-Tailed Test
  • H0: βi = 0
  • Ha: βi ≠ 0
  • Rejection region: |t| > tα/2

Test statistic: $t = \frac{\hat\beta_i}{s_{\hat\beta_i}}$

where tα and tα/2 are based on n - (k+1) degrees of freedom.
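The decision rules above can be sketched as follows, assuming NumPy only for the square root. `t_test_coefficient` is a hypothetical helper, and the critical values (4.303 = t.025 and 2.920 = t.05, both with 2 df) are taken from a standard t table rather than computed. The estimate and standard error come from the small made-up data set used in the confidence-interval sketch.

```python
import numpy as np

def t_test_coefficient(beta_hat_i, se_i, t_crit, alternative="two-sided"):
    """Test H0: beta_i = 0 with t = beta_hat_i / s_{beta_hat_i}.

    t_crit is t_{alpha/2} (two-sided) or t_alpha (one-sided) with
    n - (k + 1) degrees of freedom, taken from a t table.
    """
    t = beta_hat_i / se_i
    if alternative == "two-sided":   # Ha: beta_i != 0
        reject = abs(t) > t_crit
    elif alternative == "less":      # Ha: beta_i < 0
        reject = t < -t_crit
    elif alternative == "greater":   # Ha: beta_i > 0
        reject = t > t_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return t, reject

# Hypothetical slope estimate 1.2 with standard error sqrt(0.08), df = 2
t, reject = t_test_coefficient(1.2, np.sqrt(0.08), t_crit=4.303)
print(round(t, 3), reject)  # |t| does not exceed t_.025 = 4.303
print(t_test_coefficient(1.2, np.sqrt(0.08), t_crit=2.920, alternative="greater"))
```

The same t = 4.243 leads to rejection in the one-tailed test (Ha: β1 > 0) at α = .05 but not in the two-tailed test, which is why the alternative must be chosen before looking at the data.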

SLIDE 15

Inferences about the β-Parameters

An Excel Analysis

[Excel regression output, annotated: "Use for confidence intervals"; "Use for hypotheses about parameter coefficients"]

SLIDE 16

Checking the Overall Utility of a Model

Three tests:

1. Multiple coefficient of determination
$R^2 = 1 - \frac{SSE}{SS_{yy}} = \frac{SS_{yy} - SSE}{SS_{yy}} = \frac{\text{Explained variability}}{\text{Total variability}}$

2. Adjusted multiple coefficient of determination
$R_a^2 = 1 - \left[\frac{n-1}{n-(k+1)}\right]\left(\frac{SSE}{SS_{yy}}\right) = 1 - \left[\frac{n-1}{n-(k+1)}\right](1 - R^2)$

3. Global F-test
Test statistic: $F = \frac{(SS_{yy} - SSE)/k}{SSE/[n-(k+1)]} = \frac{R^2/k}{(1 - R^2)/[n-(k+1)]}$
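R² and the adjusted R² can be computed directly from the formulas above. This is a minimal sketch assuming NumPy; `r2_and_adjusted` is a hypothetical helper, and the observed and fitted values come from a made-up four-point data set whose least squares line is ŷ = 0.2 + 1.2x (SSE = 0.8, SSyy = 8).

```python
import numpy as np

def r2_and_adjusted(y, y_hat, k):
    """R^2 = 1 - SSE/SS_yy and R_a^2 = 1 - [(n-1)/(n-(k+1))] * (1 - R^2)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)
    ss_yy = np.sum((y - y.mean()) ** 2)
    r2 = 1 - sse / ss_yy
    r2_adj = 1 - (n - 1) / (n - (k + 1)) * (1 - r2)
    return r2, r2_adj

# Observed values and least squares fitted values for a hypothetical data set
r2, r2_adj = r2_and_adjusted([0, 2, 2, 4], [0.2, 1.4, 2.6, 3.8], k=1)
print(round(r2, 4), round(r2_adj, 4))  # 0.9 and 0.85
```

Ra² is smaller than R² because the factor (n − 1)/(n − (k + 1)) penalizes the number of β parameters relative to the sample size.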

SLIDE 17

Checking the Overall Utility of a Model

Testing Global Usefulness of the Model: The Analysis of Variance F-test

H0: β1 = β2 = … = βk = 0
Ha: At least one βi ≠ 0

Test statistic: $F = \frac{\text{Mean Square (Model)}}{\text{Mean Square (Error)}} = \frac{(SS_{yy} - SSE)/k}{SSE/[n-(k+1)]} = \frac{R^2/k}{(1 - R^2)/[n-(k+1)]}$

where n is the sample size and k is the number of terms in the model.

Rejection region: F > Fα, with k numerator degrees of freedom and n - (k+1) denominator degrees of freedom.
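The global F-test in its R² form, as a plain-Python sketch. `global_f_test` is a hypothetical helper; the critical values 18.51 (F.05) and 8.53 (F.10) with 1 and 2 degrees of freedom are taken from a standard F table, and R² = 0.9 with n = 4, k = 1 is a made-up example.

```python
def global_f_test(r2, n, k, f_crit):
    """F = (R^2/k) / ((1 - R^2)/(n - (k + 1))); reject H0 if F > f_crit.

    f_crit is F_alpha with k numerator and n - (k + 1) denominator
    degrees of freedom, taken from an F table.
    """
    f = (r2 / k) / ((1 - r2) / (n - (k + 1)))
    return f, f > f_crit

# Hypothetical values: R^2 = 0.9 from a model with k = 1 term fitted to n = 4 points
f, reject = global_f_test(r2=0.9, n=4, k=1, f_crit=18.51)  # alpha = .05
print(round(f, 4), reject)
print(global_f_test(r2=0.9, n=4, k=1, f_crit=8.53))       # alpha = .10
```

With a single x (k = 1), the global F statistic equals the square of that x's t statistic: here F = 18 and t ≈ 4.243, and the critical values agree in the same way (18.51 ≈ 4.303²).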

SLIDE 18

Checking the Overall Utility of a Model

Checking the Utility of a Multiple Regression Model
1. Conduct a test of overall model adequacy using the F-test. If H0 is rejected, proceed to step 2.
2. Conduct t-tests on the β parameters of particular interest.

SLIDE 19

Using the Model for Estimation and Prediction

As in simple linear regression, the prediction interval for a new individual value of y is wider than the confidence interval for the mean value E(y) at the same values of the x's. Most statistical packages print both confidence and prediction intervals.
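A sketch of why the prediction interval is wider, assuming NumPy and the standard textbook formulas, which the slide does not state: the confidence interval for E(y) uses s·sqrt(x0'(X'X)⁻¹x0), while the prediction interval for y uses s·sqrt(1 + x0'(X'X)⁻¹x0). The extra 1 accounts for the random error of a single new observation. `interval_half_widths` is a hypothetical helper, the data are made up, and 4.303 is t.025 with 2 df from a t table.

```python
import numpy as np

def interval_half_widths(X, y, x_new, t_crit):
    """Half-widths of the CI for E(y) and the PI for y at x_new.

    CI: t * s * sqrt(q),  PI: t * s * sqrt(1 + q),
    with q = x0' (X'X)^-1 x0 and x0 = [1, x_new...].
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    s = np.sqrt(np.sum((y - design @ beta_hat) ** 2) / (n - (k + 1)))
    x0 = np.concatenate([[1.0], np.atleast_1d(np.asarray(x_new, dtype=float))])
    q = float(x0 @ xtx_inv @ x0)
    ci = t_crit * s * np.sqrt(q)
    pi = t_crit * s * np.sqrt(1 + q)
    return ci, pi

ci, pi = interval_half_widths([0, 1, 2, 3], [0, 2, 2, 4], x_new=1.5, t_crit=4.303)
print(ci < pi)  # the prediction interval is the wider one
```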

SLIDE 20

Model Building: Interaction Models

An Interaction Model Relating E(y) to Two Quantitative Independent Variables

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

where
  • $(\beta_1 + \beta_3 x_2)$ represents the change in E(y) for every 1-unit increase in x1, holding x2 fixed
  • $(\beta_2 + \beta_3 x_1)$ represents the change in E(y) for every 1-unit increase in x2, holding x1 fixed
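A numeric check of the slope formula β1 + β3x2, in plain Python. The coefficients are hypothetical, chosen only for illustration, and `interaction_mean` is an illustrative name. Because of the x1x2 term, the effect of a 1-unit increase in x1 changes as x2 changes.

```python
def interaction_mean(x1, x2, b0, b1, b2, b3):
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

b = dict(b0=1.0, b1=2.0, b2=0.5, b3=-0.25)  # hypothetical coefficients

for x2 in (0, 2, 4):
    # Change in E(y) for a 1-unit increase in x1, with x2 held fixed
    change = interaction_mean(3, x2, **b) - interaction_mean(2, x2, **b)
    print(x2, change, b["b1"] + b["b3"] * x2)  # both columns agree
```

With β3 = 0 the printed change would be the same for every x2, which is the no-interaction case.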

SLIDE 21

Model Building: Interaction Models

When the relationship between y and xi is not affected by a second x (no interaction)

When the linear relationship between y and xi depends on another x (interaction)

SLIDE 22

Model Building: Interaction Models

SLIDE 23

Model Building: Quadratic and Other Higher-Order Models

A Quadratic (Second-Order) Model

$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$

where
  • $\beta_0$ is the y-intercept of the curve
  • $\beta_1$ is a shift parameter
  • $\beta_2$ is the rate of curvature

SLIDE 24

Model Building: Quadratic and Other Higher-Order Models

Home Size-Electrical Usage Data

Size of Home, x (sq. ft.)    Monthly Usage, y (kilowatt-hours)
1,290                        1,182
1,350                        1,172
1,470                        1,264
1,600                        1,493
1,710                        1,571
1,840                        1,711
1,980                        1,804
2,230                        1,840
2,400                        1,95
2,930                        1,954

SLIDE 25

Model Building: Quadratic and Other Higher-Order Models

$\hat{y} = -1{,}216.1 + 2.3989x - 0.00045x^2$
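The fitted quadratic ŷ = -1,216.1 + 2.3989x - 0.00045x² can be evaluated directly, in plain Python. `predicted_usage` is an illustrative name, and the coefficient signs are as reconstructed for this slide. The negative x² coefficient means the curve bends downward, so predicted usage levels off as homes get larger.

```python
def predicted_usage(x):
    """Fitted quadratic: y_hat = -1,216.1 + 2.3989x - 0.00045x^2 (kWh per month)."""
    return -1216.1 + 2.3989 * x - 0.00045 * x ** 2

# Predictions at the smallest and largest home sizes in the sample
for size in (1290, 2930):
    print(size, round(predicted_usage(size), 1))

# beta_2 < 0: concave curve; y_hat peaks where its derivative 2.3989 - 0.0009x is zero
peak = 2.3989 / (2 * 0.00045)
print(round(peak))  # about 2,665 sq. ft.
```

The predictions (about 1,129.6 and 1,949.5 kWh) are close to the observed usages of 1,182 and 1,954 kWh for those two homes.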

SLIDE 26

Model Building: Quadratic and Other Higher-Order Models

A Complete Second-Order Model with Two Quantitative Independent Variables

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

where
  • $\beta_0$ is the y-intercept, the value of E(y) when x1 = x2 = 0
  • changes in $\beta_1$ and $\beta_2$ cause the surface to shift along the x1 and x2 axes
  • $\beta_3$ controls the rotation of the surface
  • $\beta_4$ and $\beta_5$ control the type of surface and the rates of curvature
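The complete second-order model is still linear in the β's, so it can be fitted by ordinary least squares once the extra columns x1x2, x1², and x2² are built. This is a minimal sketch assuming NumPy; `second_order_design` is a hypothetical helper, and the data are noise-free values on a made-up 4 × 4 grid, generated from made-up coefficients.

```python
import numpy as np

def second_order_design(x1, x2):
    """Columns for E(y) = b0 + b1x1 + b2x2 + b3x1x2 + b4x1^2 + b5x2^2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

# Hypothetical noise-free data on a 4 x 4 grid of (x1, x2) values
g1, g2 = np.meshgrid(np.arange(4.0), np.arange(4.0))
x1, x2 = g1.ravel(), g2.ravel()
true_beta = np.array([5.0, 1.0, -2.0, 0.5, 0.3, -0.1])
y = second_order_design(x1, x2) @ true_beta

beta_hat, *_ = np.linalg.lstsq(second_order_design(x1, x2), y, rcond=None)
print(np.round(beta_hat, 6))  # recovers true_beta
```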

SLIDE 27

Model Building: Quadratic and Other Higher-Order Models
SLIDE 28

Model Building: Qualitative (Dummy) Variable Models

Dummy variables – coded qualitative variables
  • Codes are in the form (1, 0): 1 indicates the presence of a condition, 0 its absence
  • Create one fewer dummy variable than there are categories of the qualitative variable of interest

Example: a gender dummy variable is coded x = 1 if male, x = 0 if female. If the model is E(y) = β0 + β1x, then β1 captures the effect of being male (relative to female) on the dependent variable.
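A minimal sketch of dummy coding, assuming NumPy; `make_dummies` is a hypothetical helper and the four responses are made up. With E(y) = β0 + β1x, the least squares estimate of β0 is the mean response for the base level (female), and β1 is the difference between the male and female means.

```python
import numpy as np

def make_dummies(values, levels):
    """One 0/1 dummy column per level except the first (base) level,
    so a qualitative variable with c levels gets c - 1 dummies."""
    return np.array([[1.0 if v == lev else 0.0 for lev in levels[1:]] for v in values])

# Hypothetical data: x = 1 if male, x = 0 if female (base level "female")
gender = ["female", "female", "male", "male"]
y = np.array([10.0, 12.0, 15.0, 17.0])

x = make_dummies(gender, ["female", "male"])[:, 0]
design = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(design, y, rcond=None)

# b0 estimates the female mean (11); b1 estimates male mean minus female mean (5)
print(round(b0, 6), round(b1, 6))
```

Using c − 1 dummies avoids a column that is an exact linear combination of the intercept column, which would make the β's impossible to estimate uniquely.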

SLIDE 29

Model Building: Models with both Quantitative and Qualitative Variables

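Start with a first-order model with one quantitative variable, E(y) = β0 + β1x. Adding a three-level qualitative variable (coded with dummy variables x2 and x3) with no interaction, and writing the quantitative variable as x1, gives E(y) = β0 + β1x1 + β2x2 + β3x3.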

SLIDE 30

Model Building: Models with both Quantitative and Qualitative Variables

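Adding interaction terms: E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x2 + β5x1x3
  • β1x1: main effect of the quantitative variable x1
  • β2x2 + β3x3: main effect of the qualitative variable (x2 and x3)
  • β4x1x2 + β5x1x3: interaction between x1 and the qualitative variable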
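A plain-Python sketch of what the interaction terms do. The coefficients are hypothetical and `mixed_mean` is an illustrative name. Each level of the qualitative variable gets its own straight line in x1: the dummies shift the intercept (β2, β3), and the interaction terms change the slope (β1 + β4 or β1 + β5).

```python
def mixed_mean(x1, x2, x3, b):
    """E(y) = b0 + b1x1 + b2x2 + b3x3 + b4x1x2 + b5x1x3,
    with x2, x3 dummy variables for a three-level qualitative variable."""
    b0, b1, b2, b3, b4, b5 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x3 + b4 * x1 * x2 + b5 * x1 * x3

b = (10.0, 2.0, 3.0, -1.0, 0.5, -0.5)  # hypothetical coefficients
levels = {"base (x2=0, x3=0)": (0, 0), "x2=1": (1, 0), "x3=1": (0, 1)}

lines = {}
for name, (x2, x3) in levels.items():
    intercept = mixed_mean(0, x2, x3, b)
    slope = mixed_mean(1, x2, x3, b) - intercept
    lines[name] = (intercept, slope)
    print(f"{name}: intercept {intercept}, slope in x1 {slope}")
```

Setting β4 = β5 = 0 (the model without interaction) leaves the three intercepts different but gives all three lines the common slope β1, i.e. parallel lines.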

SLIDE 31

Model Building: Comparing Nested Models

Models are nested if one model contains all the terms of the other model and at least one additional term.

Complete (full) model – the more complex model
Reduced model – the simpler model

SLIDE 32

Model Building: Comparing Nested Models

Complete model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

Reduced model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

SLIDE 33

Model Building: Comparing Nested Models

F-Test for Comparing Nested Models

Reduced model: $E(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g$

Complete model: $E(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g + \beta_{g+1} x_{g+1} + \dots + \beta_k x_k$

H0: βg+1 = βg+2 = … = βk = 0
Ha: At least one of the β parameters under test is nonzero.

Test statistic: $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k+1)]} = \frac{(SSE_R - SSE_C)/\#\,\beta\text{'s tested in } H_0}{MSE_C}$

where $SSE_R$ and $SSE_C$ are the sums of squared errors of the reduced and complete models.

Rejection region: F > Fα, with k - g numerator degrees of freedom and n - (k+1) denominator degrees of freedom.
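A plain-Python sketch of the nested-model F statistic. `nested_f_test` is a hypothetical helper and the SSE values are made up. The term counts follow the complete and reduced second-order models on the previous slide (k = 5, g = 3), and 3.49 is F.05 with 2 and 20 degrees of freedom, taken from an F table.

```python
def nested_f_test(sse_reduced, sse_complete, n, k, g, f_crit):
    """F = [(SSE_R - SSE_C)/(k - g)] / [SSE_C/(n - (k + 1))].

    k = number of beta parameters (excluding beta_0) in the complete model,
    g = the number in the reduced model; f_crit is F_alpha with
    k - g numerator and n - (k + 1) denominator degrees of freedom.
    """
    numerator = (sse_reduced - sse_complete) / (k - g)
    mse_complete = sse_complete / (n - (k + 1))
    f = numerator / mse_complete
    return f, f > f_crit

# Hypothetical SSE values: reduced model with g = 3 terms, complete model with k = 5, n = 26
f, reject = nested_f_test(sse_reduced=20.0, sse_complete=12.0, n=26, k=5, g=3, f_crit=3.49)
print(round(f, 3), reject)  # reject H0: the quadratic terms add explanatory power
```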

SLIDE 34

Model Building: Stepwise Regression

Used when there is a large set of candidate independent variables. Software packages add variables to the model in order of explanatory value, with decisions based on the largest t-values at each step. The procedure is best used only as a screening procedure.
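A simplified forward-selection sketch of the idea above, assuming NumPy. The function names, the fixed entry threshold |t| > 2.0, and the synthetic data are all assumptions for illustration. Software stepwise procedures also re-check variables already in the model for removal and often use p-value thresholds instead of a fixed t cutoff.

```python
import numpy as np

def t_stats(design, y):
    """t = beta_hat_i / s_{beta_hat_i} for every column of the design matrix."""
    n, p = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    s2 = np.sum((y - design @ beta_hat) ** 2) / (n - p)  # p = k + 1
    return beta_hat / np.sqrt(s2 * np.diag(xtx_inv))

def forward_select(X, y, t_enter=2.0):
    """Repeatedly add the candidate x whose t-value (when added to the current
    model) is largest in absolute value; stop when none exceeds t_enter."""
    n, m = X.shape
    selected = []
    while len(selected) < m:
        best, best_t = None, t_enter
        for j in range(m):
            if j in selected:
                continue
            design = np.column_stack([np.ones(n), X[:, selected + [j]]])
            t_j = abs(t_stats(design, y)[-1])  # t for the candidate (last column)
            if t_j > best_t:
                best, best_t = j, t_j
        if best is None:
            break
        selected.append(best)
    return selected

# Hypothetical data: y depends on columns 2 and 0; columns 1 and 3 are pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 5 * X[:, 2] + 2 * X[:, 0] + rng.normal(size=200)
print(forward_select(X, y))  # column 2 enters first, then column 0
```

Because many t-tests are run along the way, a noise variable can occasionally pass the threshold; this is one reason the slide recommends stepwise regression only as a screening tool.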

SLIDE 35

Residual Analysis: Checking the Regression Assumptions

Regression residual – the difference between an observed y value and its corresponding predicted value: $\hat\varepsilon = y - \hat{y}$

Properties of Regression Residuals
  • The mean of the residuals equals zero
  • The standard deviation of the residuals is equal to the standard deviation of the fitted regression model
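The zero-mean property can be checked numerically with NumPy (assumed available) on made-up data. The property holds for least squares fits that include the intercept β0, which is what the column of ones in the design matrix provides.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: intercept column plus two x's, with random errors
X = np.column_stack([np.ones(30), rng.uniform(0, 10, size=(30, 2))])
y = 4 + 1.5 * X[:, 1] - 0.7 * X[:, 2] + rng.normal(scale=2.0, size=30)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat  # epsilon_hat = y - y_hat
print(abs(residuals.mean()) < 1e-10)  # True: the residuals average to zero
```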

SLIDE 36

Residual Analysis: Checking the Regression Assumptions

Analyzing Residuals

The top plot of residuals reveals a non-random, curved pattern. The second plot, from the model with a second-order term added, shows a random pattern, indicating a better model.

SLIDE 37

Residual Analysis: Checking the Regression Assumptions

Identifying Outliers

Residual plots can reveal outliers. Outliers should be checked to determine whether an error is involved. If an error is involved, or the observation is not representative, the analysis can be rerun after deleting the data point to assess its effect.

[Figure: residual plot with one point labeled "Outlier"]
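One way to screen residuals for outliers numerically, sketched with NumPy. The cutoff of 3s (residuals more than three estimated standard deviations from zero) is a common rule of thumb and an assumption here, not something stated on the slide. `flag_outliers` is a hypothetical helper, and the data are a made-up straight line with one corrupted observation.

```python
import numpy as np

def flag_outliers(x, y, cutoff=3.0):
    """Indices whose residual is more than `cutoff` * s away from zero
    in a straight-line least squares fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta_hat
    n, p = design.shape
    s = np.sqrt(np.sum(residuals ** 2) / (n - p))
    return np.flatnonzero(np.abs(residuals) > cutoff * s).tolist()

x = np.arange(20)
y = 2.0 * x + 1.0
y[10] += 50.0  # hypothetical recording error in one observation
print(flag_outliers(x, y))  # [10]
```

A flagged point is only a candidate for checking, as the slide says; it should be deleted only if it turns out to be an error or not representative.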

SLIDE 38

Residual Analysis: Checking the Regression Assumptions

[Figure panels: "With Outlier" and "Without Outlier"]

Checking for Normal Errors

SLIDE 39

Residual Analysis: Checking the Regression Assumptions

Checking for Equal Variances

A pattern in the residuals indicates a violation of the equal-variance assumption. Such a pattern can point to the use of a transformation on the dependent variable to stabilize the variance.

SLIDE 40

Residual Analysis: Checking the Regression Assumptions

Steps in Residual Analysis

1. Check for a misspecified model by plotting the residuals against each quantitative independent variable.
2. Examine the residual plots for outliers.
3. Check for non-normal errors using a frequency distribution of the residuals.
4. Check for unequal error variances using plots of the residuals against the predicted values.
SLIDE 41

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

Estimability – the number of levels of observed x-values must be at least one more than the order of the polynomial in x that you want to fit.

Multicollinearity – when two or more independent variables are correlated.
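The estimability rule can be seen in the rank of the design matrix, sketched with NumPy. `polynomial_design` is a hypothetical helper and the x-values are made up. With only two distinct levels of x, the columns 1, x, x² are linearly dependent, so a quadratic (order 2, needing at least 3 levels) cannot be estimated, while a straight line can.

```python
import numpy as np

def polynomial_design(x, order):
    """Columns 1, x, x^2, ..., x^order."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x ** p for p in range(order + 1)])

x = [1, 1, 1, 5, 5, 5]  # hypothetical data with only two distinct levels of x
for order in (1, 2):
    design = polynomial_design(x, order)
    estimable = np.linalg.matrix_rank(design) == order + 1  # full column rank?
    print(order, len(set(x)), estimable)
```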

SLIDE 42

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

Multicollinearity – when two or more independent variables are correlated. It leads to confusing, misleading results, such as incorrect signs on the parameter estimates.

Can be identified by
  – checking correlations among the x's
  – non-significant t-tests for most or all of the x's
  – signs opposite from those expected in the estimated β parameters

Can be addressed by
  – dropping one or more of the correlated variables from the model
  – restricting inferences to the range of the sample data and not making inferences about individual β parameters based on t-tests
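The first identification check, correlations among the x's, can be sketched with NumPy's `corrcoef`. The three variables are synthetic: x2 is constructed as a near copy of x1, so that pair should show a correlation close to 1.

```python
import numpy as np

rng = np.random.default_rng(7)
x1 = rng.uniform(0, 10, size=50)
x2 = 2 * x1 + rng.normal(scale=0.1, size=50)  # hypothetical: nearly a copy of x1
x3 = rng.uniform(0, 10, size=50)              # hypothetical: unrelated to x1, x2

# rowvar=False: each column is a variable, each row an observation
corr = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print(np.round(corr, 2))
# A pair with |r| near 1 (here x1 and x2) signals possible multicollinearity
```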

SLIDE 43

Some Pitfalls: Estimability, Multicollinearity, and Extrapolation

Extrapolation – using the model to predict outside the range of the sample data is dangerous.

Correlated errors – most common when working with time series data, where the values of y and the x's are observed over a period of time. The solution is to develop a time series model.
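A simple guard against extrapolation, sketched with NumPy: before predicting, check whether each new x-value falls inside the range observed in the sample. `outside_sample_range` is a hypothetical helper. It checks each variable separately, which is only a rough screen. The home sizes are the x-values from the electrical-usage example.

```python
import numpy as np

def outside_sample_range(X_sample, x_new):
    """Indices of variables whose new value lies outside the sample's [min, max]."""
    X_sample = np.asarray(X_sample, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    lo, hi = X_sample.min(axis=0), X_sample.max(axis=0)
    return np.flatnonzero((x_new < lo) | (x_new > hi)).tolist()

# Home sizes (sq. ft.) from the electrical-usage example: 1,290 to 2,930
sizes = np.array([[1290], [1350], [1470], [1600], [1710],
                  [1840], [1980], [2230], [2400], [2930]])
print(outside_sample_range(sizes, [2000]))  # [] : inside the sample range
print(outside_sample_range(sizes, [4000]))  # [0]: extrapolation
```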