Case Study 2 Price of Residential Property


SLIDE 1

ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Case Study 2 Price of Residential Property

How does the sale price of a property relate to the appraised values of the land and improvements on the land, and the neighborhood it is in?

Two questions:

  • Do the data indicate that price can be predicted based on these variables?
  • Is the relationship the same in different neighborhoods?

1 / 18 Case Study 2 Sale Prices of Residential Properties

SLIDE 2

Available data for 176 sales between May 2008 and June 2009:

  • Sale price, y;
  • Appraised land value, in thousands of dollars, x1;
  • Appraised improvement value, in thousands of dollars, x2;
  • Neighborhood, three indicator variables x3, x4, and x5.

SLIDE 3

Get the data and plot them:

    # Build the path to the data file and read it (first row holds column names)
    path <- file.path("Text", "Cases", "TAMSALES4.txt")
    prices <- read.table(path, header = TRUE)
    # Scatterplot matrix of sale price against the two appraised values
    pairs(prices[, c("SALES", "LAND", "IMP")])

Consider four (nested) models:

  • Model 1: First order in x1 and x2, no neighborhood effect;
  • Model 2: First order, additive neighborhood effect;
  • Model 3: First order, interactive neighborhood effect;
  • Model 4: Interaction model in x1 and x2, and interactive neighborhood effect.

SLIDE 4

Model 1:

    l1 <- lm(SALES ~ LAND + IMP, prices)

Model 2:

    l2 <- lm(SALES ~ LAND + IMP + NBHD, prices)

Model 3:

    l3 <- lm(SALES ~ (LAND + IMP) * NBHD, prices)

Model 4:

    l4 <- lm(SALES ~ (LAND * IMP) * NBHD, prices)
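To see which terms each formula contributes, the four formulas can be expanded without touching the data. The sketch below is an illustration rather than part of the original analysis: it uses only base R's `terms()` and the variable names shown above, and the helper `term_labels` is a hypothetical name introduced here.

```r
# Expand each model formula into its individual terms (no data needed).
f1 <- SALES ~ LAND + IMP
f2 <- SALES ~ LAND + IMP + NBHD
f3 <- SALES ~ (LAND + IMP) * NBHD
f4 <- SALES ~ (LAND * IMP) * NBHD

# Hypothetical helper: the term labels R derives from a formula
term_labels <- function(f) attr(terms(f), "term.labels")

labels_by_model <- lapply(list(m1 = f1, m2 = f2, m3 = f3, m4 = f4), term_labels)
print(labels_by_model)

# The models are nested if each model's terms are contained in the next one's.
nested <- all(labels_by_model$m1 %in% labels_by_model$m2,
              labels_by_model$m2 %in% labels_by_model$m3,
              labels_by_model$m3 %in% labels_by_model$m4)
print(nested)
```

Because NBHD has four levels, each term involving NBHD contributes three coefficients, which is why the fitted models have many more coefficients than terms.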

SLIDE 5

Summary of the models

    Model       s   R^2_a   s_jackknife   R^2_jackknife        AIC
    1       112.9   .9233         118.3           .9154   2168.194
    2       111.3   .9256         117.9           .9159   2165.956
    3       108.7   .9290         121.3           .9111   2163.340
    4       103.1   .9361         130.7           .8967   2148.515

SLIDE 6

Notes

  • Small $s$ and high $R^2_a$ are desirable. Here, Model 4 is optimal for both. The best model for one is always the best model for the other.
  • $s_{\text{jackknife}}$ is the square root of
    $$s^2_{\text{jackknife}} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2,$$
    which is not the same as $\text{MSE}_{\text{jackknife}}$ (Chapter 5).
  • Small $s_{\text{jackknife}}$ and high $R^2_{\text{jackknife}}$ are also desirable. Here, Model 2 is optimal for both. Again, the best model for one is always the best model for the other.
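The jackknife quantities can be computed without refitting the model n times, because for least squares the deleted residual satisfies y_i − ŷ_(i) = e_i / (1 − h_ii), where h_ii is the i-th hat value. The sketch below illustrates this on R's built-in `cars` data as a stand-in (it is not the property sales data), and it assumes the definition R^2_jackknife = 1 − Σ(y_i − ŷ_(i))² / Σ(y_i − ȳ)², which is consistent with the values in the summary table but is not stated on the slides.

```r
# Stand-in data: the built-in `cars` data set (NOT the property sales data).
fit <- lm(dist ~ speed, data = cars)
y <- cars$dist
n <- length(y)

# Leave-one-out (deleted) residuals y_i - yhat_(i), via the hat-value identity.
loo_resid <- residuals(fit) / (1 - hatvalues(fit))

s_jack  <- sqrt(sum(loo_resid^2) / n)                    # divisor n, as on the slide
R2_jack <- 1 - sum(loo_resid^2) / sum((y - mean(y))^2)   # assumed definition

# Cross-check against brute-force refitting with each observation left out.
loo_brute <- sapply(seq_len(n), function(i) {
  fit_i <- lm(dist ~ speed, data = cars[-i, ])
  y[i] - predict(fit_i, newdata = cars[i, , drop = FALSE])
})

print(c(s = summary(fit)$sigma, s_jackknife = s_jack, R2_jackknife = R2_jack))
print(all.equal(unname(loo_resid), unname(loo_brute)))
```

Each deleted residual is at least as large in magnitude as the ordinary residual, so the jackknife R^2 cannot exceed the ordinary R^2; the gap widens as a model spends more parameters fitting individual points.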

SLIDE 7

Information Criteria

AIC is Akaike's Information Criterion:

$$\text{AIC} = n \log \hat{\sigma}^2 + 2(k + 1) \quad (+ \dots)$$

where $\hat{\sigma}^2$ is the biased estimator of $\sigma^2$:

$$\hat{\sigma}^2 = \frac{n - (k + 1)}{n} \, s^2.$$

BIC (not shown in the table) is the Bayesian Information Criterion:

$$\text{BIC} = n \log \hat{\sigma}^2 + (k + 1) \log n \quad (+ \dots)$$

Small values of both are desirable.
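As a check on these formulas, the sketch below rebuilds AIC and BIC by hand and compares them with R's `AIC()` and `BIC()`. It uses the built-in `cars` data as a stand-in, and it assumes that the terms hidden in "(+ . . . )" follow R's convention for `lm` fits, where the error variance counts as one extra parameter; other software may use different constants.

```r
fit <- lm(dist ~ speed, data = cars)   # stand-in data, k = 1 predictor
n <- nobs(fit)
k <- length(coef(fit)) - 1
s2 <- summary(fit)$sigma^2
sigma2_hat <- (n - (k + 1)) / n * s2   # biased estimator, equal to RSS / n

# The parts written out on the slide
aic_core <- n * log(sigma2_hat) + 2 * (k + 1)
bic_core <- n * log(sigma2_hat) + (k + 1) * log(n)

# Assumed content of "(+ ...)" under R's convention for lm objects
aic_full <- aic_core + n * (1 + log(2 * pi)) + 2
bic_full <- bic_core + n * (1 + log(2 * pi)) + log(n)

print(c(by_hand = aic_full, AIC = AIC(fit)))
print(c(by_hand = bic_full, BIC = BIC(fit)))
```

The omitted terms depend only on n, so they cancel when models fitted to the same data are compared.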

SLIDE 8

$R^2_a$ and AIC suggest using Model 4, but $R^2_{\text{jackknife}}$ suggests using the simpler Model 2. The other criterion, BIC, suggests using Model 1!

We can also use the nested model F-test approach to decide between any pair of the models.
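The BIC ranking can be recovered from the AIC column of the summary table, since the two criteria differ only in the penalty per parameter: BIC = AIC + p (log n − 2). The sketch below is a rough check under the assumption that the tabulated AIC values come from R's `AIC()`, which counts the error variance as a parameter (so p is the number of regression coefficients plus one); the coefficient counts follow from the residual degrees of freedom in the ANOVA tables.

```r
n <- 176
aic <- c(M1 = 2168.194, M2 = 2165.956, M3 = 2163.340, M4 = 2148.515)

# Regression coefficients per model (intercept included, NBHD as 3 indicators)
n_coef <- c(M1 = 3, M2 = 6, M3 = 12, M4 = 16)
p <- n_coef + 1   # assumption: error variance counted as a parameter

bic <- aic + p * (log(n) - 2)
print(round(bic, 2))
print(names(which.min(aic)))   # model preferred by AIC
print(names(which.min(bic)))   # model preferred by BIC
```

With n = 176, log n is about 5.17, so BIC charges each extra parameter more than twice what AIC does, which is why it favors the smallest model here.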

SLIDE 9

Model 1 versus Model 2

Model 2 differs from Model 1 only by including NBHD, so the ANOVA table provides the test:

    summary(aov(l2))

                 Df   Sum Sq  Mean Sq  F value Pr(>F)
    LAND          1 16418670 16418670 1326.237 <2e-16 ***
    IMP           1 10475902 10475902  846.203 <2e-16 ***
    NBHD          3   100859    33620    2.716 0.0464 *
    Residuals   170  2104582    12380
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The NBHD line shows that we reject Model 1 in favor of Model 2 at the 5% level, but not at the 1% level.

SLIDE 10

Model 2 versus Model 3

Model 3 differs from Model 2 by including the interactions LAND:NBHD and IMP:NBHD, so we need to do some arithmetic:

    summary(aov(l3))

                 Df   Sum Sq  Mean Sq  F value Pr(>F)
    LAND          1 16418670 16418670 1390.212 <2e-16 ***
    IMP           1 10475902 10475902  887.022 <2e-16 ***
    NBHD          3   100859    33620    2.847 0.0393 *
    LAND:NBHD     3    65732    21911    1.855 0.1392
    IMP:NBHD      3   101979    33993    2.878 0.0377 *
    Residuals   164  1936871    11810

F = (3 × 1.855 + 3 × 2.878)/6 = 2.366 with a P-value of .0322, so we also reject Model 2 in favor of Model 3 at the 5% level.
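The arithmetic works because every row of a sequential ANOVA table shares the same residual mean square, so the degrees-of-freedom-weighted average of the F values for the added terms equals the usual nested-model F statistic. The sketch below illustrates this on R's built-in `iris` data as a stand-in (not the property sales data) and compares the pooled value with the direct comparison from `anova(reduced, full)`.

```r
# Stand-in models on `iris`, mirroring the Model 2 vs Model 3 structure.
reduced <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)
full    <- lm(Sepal.Length ~ (Sepal.Width + Petal.Length) * Species, data = iris)

# Sequential ANOVA table of the larger model
tab <- anova(full)
extra <- c("Sepal.Width:Species", "Petal.Length:Species")   # terms added to `reduced`

# Pool the F values of the added terms, weighting by their degrees of freedom
df_extra <- tab[extra, "Df"]
f_pooled <- sum(df_extra * tab[extra, "F value"]) / sum(df_extra)
p_pooled <- pf(f_pooled, sum(df_extra), tab["Residuals", "Df"], lower.tail = FALSE)

# Direct nested-model F-test
cmp <- anova(reduced, full)
f_direct <- cmp[2, "F"]
p_direct <- cmp[2, "Pr(>F)"]

print(c(pooled = f_pooled, direct = f_direct))
print(c(pooled = p_pooled, direct = p_direct))
```

For the models in this case study, the same comparison would presumably be `anova(l2, l3)`, which avoids the hand arithmetic.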

SLIDE 11

Model 3 versus Model 4

Model 4 differs from Model 3 by including the interactions LAND:IMP and LAND:IMP:NBHD. ANOVA table in which these are the last two rows:

    summary(aov(SALES ~ NBHD * LAND * IMP, data = prices))

                    Df   Sum Sq  Mean Sq  F value   Pr(>F)
    NBHD             3  6045891  2015297  189.531  < 2e-16 ***
    LAND             1 11413704 11413704 1073.417  < 2e-16 ***
    IMP              1  9535835  9535835  896.810  < 2e-16 ***
    NBHD:LAND        3    65732    21911    2.061   0.1076
    NBHD:IMP         3   101979    33993    3.197   0.0251 *
    LAND:IMP         1   185809   185809   17.475 4.78e-05 ***
    NBHD:LAND:IMP    3    49773    16591    1.560   0.2011
    Residuals      160  1701289    10633

SLIDE 12

F = (1 × 17.475 + 3 × 1.560)/4 = 5.539 with a P-value of .0003, so we also reject Model 3 in favor of Model 4 at the 5% level, and at the 1% and 0.1% levels.

But note: each of these tests answers the question: Is there enough evidence against the simpler model to reject it?

That is not the same question as: Which of these models will give the best predictions?
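The pooled F statistics and P-values quoted for the two interaction tests can be re-derived from the printed ANOVA rows alone. The sketch below copies the degrees of freedom and F values from the tables for Models 3 and 4; `pooled_f` is a hypothetical helper name, and results can differ from the quoted figures in the last digit because the inputs are already rounded.

```r
# Hypothetical helper: df-weighted average of F values that share one residual MS
pooled_f <- function(df, f) sum(df * f) / sum(df)

# Model 2 vs Model 3: rows LAND:NBHD and IMP:NBHD, residual df 164
f23 <- pooled_f(c(3, 3), c(1.855, 2.878))
p23 <- pf(f23, df1 = 6, df2 = 164, lower.tail = FALSE)

# Model 3 vs Model 4: rows LAND:IMP and NBHD:LAND:IMP, residual df 160
f34 <- pooled_f(c(1, 3), c(17.475, 1.560))
p34 <- pf(f34, df1 = 4, df2 = 160, lower.tail = FALSE)

print(c(F23 = f23, F34 = f34))
print(c(p23 = p23, p34 = p34))
```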

SLIDE 13

Interpreting The Model

    summary(l4)

    Call:
    lm(formula = SALES ~ (LAND * IMP) * NBHD, data = prices)

    Residuals:
        Min      1Q  Median      3Q     Max
    -373.04  -46.44   -3.40   34.69  562.02

SLIDE 14

    Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
    (Intercept)                1.552e+02  1.089e+02   1.425   0.1560
    LAND                      -8.272e-01  1.227e+00  -0.674   0.5010
    IMP                        9.609e-01  4.959e-01   1.938   0.0544 .
    NBHDDAVISISLES            -6.017e+01  1.200e+02  -0.501   0.6168
    NBHDHUNTERSGREEN          -1.325e+02  1.319e+02  -1.004   0.3168
    NBHDHYDEPARK              -2.659e+02  1.459e+02  -1.823   0.0702 .
    LAND:IMP                   5.176e-03  3.611e-03   1.434   0.1536
    LAND:NBHDDAVISISLES        2.012e+00  1.233e+00   1.631   0.1048
    LAND:NBHDHUNTERSGREEN      9.361e-01  1.841e+00   0.509   0.6118
    LAND:NBHDHYDEPARK          2.534e+00  1.303e+00   1.945   0.0536 .
    IMP:NBHDDAVISISLES        -1.977e-01  5.081e-01  -0.389   0.6978
    IMP:NBHDHUNTERSGREEN       2.525e-01  5.877e-01   0.430   0.6680
    IMP:NBHDHYDEPARK           4.192e-01  5.603e-01   0.748   0.4555
    LAND:IMP:NBHDDAVISISLES   -4.278e-03  3.617e-03  -1.183   0.2386
    LAND:IMP:NBHDHUNTERSGREEN -2.465e-04  5.211e-03  -0.047   0.9623
    LAND:IMP:NBHDHYDEPARK     -5.198e-03  3.662e-03  -1.419   0.1578
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SLIDE 15

    Residual standard error: 103.1 on 160 degrees of freedom
    Multiple R-squared:  0.9415,	Adjusted R-squared:  0.9361
    F-statistic: 171.8 on 15 and 160 DF,  p-value: < 2.2e-16

SLIDE 16

The base neighborhood is Cheval, so the equation for that neighborhood is

    E(Y) = 155.2 − 0.83x1 + 0.96x2 + 0.0052x1x2,

a two-variable interaction model. For each other neighborhood, the equation is also a two-variable interaction model, but with different coefficients.

SLIDE 17

For another neighborhood, say Davis Isles, we must add the corresponding Davis Isles terms to the Cheval coefficients:

  • NBHDDAVISISLES = -60.17 to the intercept;
  • LAND:NBHDDAVISISLES = 2.01 to the coefficient of x1;
  • IMP:NBHDDAVISISLES = -0.20 to the coefficient of x2;
  • LAND:IMP:NBHDDAVISISLES = -0.0043 to the coefficient of x1x2.

We find

    E(Y) = 95.03 + 1.19x1 + 0.76x2 + 0.0009x1x2.
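The Davis Isles coefficients can be reproduced by adding the printed estimates term by term. The sketch below copies the four-significant-digit values from the coefficient table, so the last displayed digit may differ from figures computed at full precision; the example property (land appraised at 200, improvements at 400, in thousands of dollars) and the helper `predict_price` are hypothetical, chosen only to show how the equation is used.

```r
# Estimates copied from the printed coefficient table (4 significant digits)
base  <- c(intercept = 155.2,  x1 = -0.8272, x2 = 0.9609,  x1x2 = 0.005176)   # Cheval
shift <- c(intercept = -60.17, x1 = 2.012,   x2 = -0.1977, x1x2 = -0.004278)  # Davis Isles terms

# Davis Isles equation: base coefficient plus the matching neighborhood term
davis <- base + shift
print(davis)

# Hypothetical helper: evaluate a two-variable interaction model
predict_price <- function(b, x1, x2) {
  unname(b["intercept"] + b["x1"] * x1 + b["x2"] * x2 + b["x1x2"] * x1 * x2)
}

# Hypothetical property with x1 = 200 and x2 = 400
print(predict_price(davis, x1 = 200, x2 = 400))
```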

SLIDE 18

Because all the coefficients are different in each equation, we could also get these four equations by fitting the interaction model separately to the data for each neighborhood. From the ANOVA table for Model 4, note that LAND:IMP:NBHD is not significant. If we refitted the model without that term, the four neighborhood equations would again be two-variable interaction models, but now with the same interaction coefficient in each equation. These equations cannot be obtained from separate analyses.
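Both claims can be checked numerically. The sketch below uses R's built-in `iris` data as a stand-in (not the property sales data), with Species playing the role of NBHD: it rebuilds one group's equation from the full interaction model, compares it with a separate fit to that group alone, and then drops the three-way term with `update()` so that all groups share one interaction coefficient.

```r
# Full model: a separate two-variable interaction model within each group
full <- lm(Sepal.Length ~ (Sepal.Width * Petal.Length) * Species, data = iris)
b <- coef(full)

main_terms <- c("(Intercept)", "Sepal.Width", "Petal.Length",
                "Sepal.Width:Petal.Length")
vers_terms <- c("Speciesversicolor", "Sepal.Width:Speciesversicolor",
                "Petal.Length:Speciesversicolor",
                "Sepal.Width:Petal.Length:Speciesversicolor")

# Equation for the versicolor group, assembled from the full model
from_full <- unname(b[main_terms] + b[vers_terms])

# The same equation from a separate fit to that group only
separate <- lm(Sepal.Length ~ Sepal.Width * Petal.Length,
               data = subset(iris, Species == "versicolor"))
from_separate <- unname(coef(separate))

print(rbind(from_full, from_separate))

# Without the three-way term, one interaction coefficient is shared by all
# groups, so this fit cannot be reproduced by separate per-group analyses.
common <- update(full, . ~ . - Sepal.Width:Petal.Length:Species)
print(coef(common)["Sepal.Width:Petal.Length"])
```

The separate fits reproduce the coefficients but not the standard errors, since the pooled model estimates a single error variance across all groups.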
