
Case Study 2: Price of Residential Property



  1. ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

     Case Study 2: Price of Residential Property

     How does the sale price of a property relate to the appraised values of the land and improvements on the land, and the neighborhood it is in?

     Two questions:

     - Do the data indicate that price can be predicted based on these variables?
     - Is the relationship the same in different neighborhoods?

  2. Available data for 176 sales between May 2008 and June 2009:

     - Sale price, $y$;
     - Appraised land value, in thousands of dollars, $x_1$;
     - Appraised improvement value, in thousands of dollars, $x_2$;
     - Neighborhood, three indicator variables $x_3$, $x_4$, and $x_5$.

  3. Get the data and plot them:

         path <- file.path("Text", "Cases", "TAMSALES4.txt")
         prices <- read.table(path, header = TRUE)
         pairs(prices[, c("SALES", "LAND", "IMP")])

     Consider four (nested) models:

     - Model 1: First order in $x_1$ and $x_2$, no neighborhood effect;
     - Model 2: First order, additive neighborhood effect;
     - Model 3: First order, interactive neighborhood effect;
     - Model 4: Interaction model in $x_1$ and $x_2$, and interactive neighborhood effect.

  4. Model 1:

         l1 <- lm(SALES ~ LAND + IMP, prices)

     Model 2:

         l2 <- lm(SALES ~ LAND + IMP + NBHD, prices)

     Model 3:

         l3 <- lm(SALES ~ (LAND + IMP) * NBHD, prices)

     Model 4:

         l4 <- lm(SALES ~ (LAND * IMP) * NBHD, prices)
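The formula shorthand above can be expanded without touching the data. The following is a minimal sketch, using only base R's `terms()`, that lists the terms each formula generates and checks that each model's terms are contained in the next model's; the list names `model1` to `model4` are illustrative labels, not names from the slides.

```r
# Expand each model formula into its individual terms (no data required).
models <- list(
  model1 = SALES ~ LAND + IMP,
  model2 = SALES ~ LAND + IMP + NBHD,
  model3 = SALES ~ (LAND + IMP) * NBHD,
  model4 = SALES ~ (LAND * IMP) * NBHD
)
term_labels <- lapply(models, function(f) attr(terms(f), "term.labels"))
print(term_labels)

# The models are nested if each term set is contained in the next one.
nested <- c(
  all(term_labels$model1 %in% term_labels$model2),
  all(term_labels$model2 %in% term_labels$model3),
  all(term_labels$model3 %in% term_labels$model4)
)
print(nested)
```

Model 4 adds `LAND:IMP` and `LAND:IMP:NBHD` to the terms of Model 3, which is what the later comparison of those two models tests.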

  5. Summary of the models

         Model      s    R^2_a   s_jackknife   R^2_jackknife        AIC
             1  112.9    .9233         118.3           .9154   2168.194
             2  111.3    .9256         117.9           .9159   2165.956
             3  108.7    .9290         121.3           .9111   2163.340
             4  103.1    .9361         130.7           .8967   2148.515

  6. Notes

     Small $s$ and high $R^2_a$ are desirable. Here, Model 4 is optimal for both. The best model for one is always the best model for the other.

     $s_{\text{jackknife}}$ is the square root of

     $$s^2_{\text{jackknife}} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2,$$

     which is not the same as $\text{MSE}_{\text{jackknife}}$ (Chapter 5).

     Small $s_{\text{jackknife}}$ and high $R^2_{\text{jackknife}}$ are also desirable. Here, Model 2 is optimal for both. Again, the best model for one is always the best model for the other.
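The jackknife quantities can be computed without refitting the model n times, because for least squares the deleted residual equals the ordinary residual divided by one minus the leverage. The sketch below uses simulated data (the data frame `sim` and its coefficients are made up for illustration, not the property data) and assumes that the jackknife R-squared is one minus the sum of squared deleted residuals over the total sum of squares. That definition is not stated on the slide, but it reproduces the tabulated value for Model 1.

```r
# Simulated stand-in data; NOT the TAMSALES4 data set.
set.seed(1)
n <- 60
sim <- data.frame(x1 = runif(n, 50, 300), x2 = runif(n, 100, 800))
sim$y <- 20 + 1.2 * sim$x1 + 0.9 * sim$x2 + rnorm(n, sd = 40)

fit <- lm(y ~ x1 + x2, data = sim)

# Deleted residuals y_i - yhat_(i) via the leverage shortcut e_i / (1 - h_ii).
deleted_resid <- residuals(fit) / (1 - hatvalues(fit))

# The same quantities by brute force: refit without observation i, predict it.
brute <- sapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x1 + x2, data = sim[-i, ])
  sim$y[i] - predict(fit_i, newdata = sim[i, ])
})

s_jack <- sqrt(mean(deleted_resid^2))
# Assumed definition of the jackknife R-squared (see lead-in).
r2_jack <- 1 - sum(deleted_resid^2) / sum((sim$y - mean(sim$y))^2)

# Consistency check against the slides: Model 1 has s_jackknife = 118.3, and
# the total sum of squares from the Model 2 ANOVA table is 29100013.
r2_jack_model1 <- 1 - 176 * 118.3^2 / 29100013
print(c(s_jack = s_jack, r2_jack = r2_jack, r2_jack_model1 = r2_jack_model1))
```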

  7. Information Criteria

     AIC is Akaike's Information Criterion:

     $$\text{AIC} = n \log \hat{\sigma}^2 + 2(k + 1) \; (+ \dots)$$

     where $\hat{\sigma}^2$ is the biased estimator of $\sigma^2$:

     $$\hat{\sigma}^2 = \frac{n - (k + 1)}{n} s^2.$$

     BIC (not shown in the table) is the Bayesian Information Criterion:

     $$\text{BIC} = n \log \hat{\sigma}^2 + (k + 1) \log n \; (+ \dots)$$

     Small values of both are desirable.
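The omitted "(+ ...)" terms do not affect model comparisons because they depend only on n. As a hedged illustration on simulated data (not the property data), the sketch below checks the formulas against base R's `AIC()` and `BIC()`, which also count the error variance as a parameter; under that convention the omitted part is n(1 + log 2π) + 2 for AIC and n(1 + log 2π) + log n for BIC.

```r
# Simulated stand-in data with k = 2 predictors.
set.seed(2)
n <- 80
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = sim)
k <- 2
s <- summary(fit)$sigma

# Biased variance estimator: RSS / n = (n - (k + 1)) / n * s^2.
sigma2_hat <- (n - (k + 1)) / n * s^2

# Slide formula plus the constants that R includes.
aic_formula <- n * log(sigma2_hat) + 2 * (k + 1) + n * (1 + log(2 * pi)) + 2
bic_formula <- n * log(sigma2_hat) + (k + 1) * log(n) + n * (1 + log(2 * pi)) + log(n)

print(c(aic_formula = aic_formula, AIC = AIC(fit)))
print(c(bic_formula = bic_formula, BIC = BIC(fit)))
```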

  8. $R^2_a$ and AIC suggest using Model 4, but $R^2_{\text{jackknife}}$ suggests using the simpler Model 2.

     The other criterion, BIC, suggests using Model 1!

     We can also use the nested model $F$-test approach to decide between any pair of the models.
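Although BIC is not tabulated, it can be recovered from the AIC column, since the two criteria differ only in the penalty per parameter (log n instead of 2). The sketch below assumes the AIC values come from R's `AIC()`, which counts k + 2 parameters (k slopes, the intercept, and the error variance); using k + 1 instead would shift every BIC by the same constant and leave the ranking unchanged. The values of k are read off the model definitions (2, 5, 11, and 15 slope coefficients).

```r
n <- 176
models <- data.frame(
  model = 1:4,
  k     = c(2, 5, 11, 15),                          # slope coefficients per model
  aic   = c(2168.194, 2165.956, 2163.340, 2148.515) # AIC column from the summary table
)

# Assumption: parameter count as used by R's AIC()/BIC().
p <- models$k + 2

# BIC = AIC + (log(n) - 2) * p, because only the per-parameter penalty changes.
models$bic <- models$aic + (log(n) - 2) * p
print(models)

best_aic <- models$model[which.min(models$aic)]
best_bic <- models$model[which.min(models$bic)]
print(c(best_aic = best_aic, best_bic = best_bic))
```

With 176 observations, each extra parameter costs about 5.17 under BIC instead of 2 under AIC, which is why the ranking shifts toward the smallest model.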

  9. Model 1 versus Model 2

     Model 2 differs from Model 1 only by including NBHD, so the ANOVA table provides the test:

         summary(aov(l2))

                      Df   Sum Sq  Mean Sq  F value Pr(>F)
         LAND          1 16418670 16418670 1326.237 <2e-16 ***
         IMP           1 10475902 10475902  846.203 <2e-16 ***
         NBHD          3   100859    33620    2.716 0.0464 *
         Residuals   170  2104582    12380
         ---
         Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     The NBHD line shows that we reject Model 1 in favor of Model 2 at the 5% level, but not at the 1% level.

  10. Model 2 versus Model 3

      Model 3 differs from Model 2 by including the interactions LAND:NBHD and IMP:NBHD, so we need to do some arithmetic:

          summary(aov(l3))

                       Df   Sum Sq  Mean Sq  F value Pr(>F)
          LAND          1 16418670 16418670 1390.212 <2e-16 ***
          IMP           1 10475902 10475902  887.022 <2e-16 ***
          NBHD          3   100859    33620    2.847 0.0393 *
          LAND:NBHD     3    65732    21911    1.855 0.1392
          IMP:NBHD      3   101979    33993    2.878 0.0377 *
          Residuals   164  1936871    11810

      $F = (3 \times 1.855 + 3 \times 2.878) / 6 = 2.366$ with a $P$-value of .0322, so we also reject Model 2 in favor of Model 3 at the 5% level.
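The arithmetic above pools the two added interaction terms into a single F statistic. A small sketch using only the numbers printed in the ANOVA table shows that the degrees-of-freedom-weighted average of the two F values is the same as the pooled sum of squares divided by its degrees of freedom and by the residual mean square; the variable names are illustrative.

```r
# Values copied from the ANOVA table for Model 3.
ss_added <- c("LAND:NBHD" = 65732, "IMP:NBHD" = 101979)
df_added <- c(3, 3)
f_added  <- c(1.855, 2.878)
mse_full <- 11810   # residual mean square of Model 3
df_resid <- 164

# Pooled F: extra sum of squares per added degree of freedom, over the full-model MSE.
f_pooled <- sum(ss_added) / sum(df_added) / mse_full

# Equivalent form used on the slide: df-weighted average of the individual F values.
f_weighted <- sum(df_added * f_added) / sum(df_added)

# Upper-tail probability of the F distribution on (6, 164) degrees of freedom.
p_value <- pf(f_pooled, sum(df_added), df_resid, lower.tail = FALSE)
print(c(f_pooled = f_pooled, f_weighted = f_weighted, p_value = p_value))
```

With the fitted objects available, `anova(l2, l3)` would carry out the same nested-model comparison directly.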

  11. Model 3 versus Model 4

      Model 4 differs from Model 3 by including the interactions LAND:IMP and LAND:IMP:NBHD.

      ANOVA table in which these are the last two rows:

          summary(aov(SALES ~ NBHD * LAND * IMP, data = prices))

                          Df   Sum Sq  Mean Sq  F value   Pr(>F)
          NBHD             3  6045891  2015297  189.531  < 2e-16 ***
          LAND             1 11413704 11413704 1073.417  < 2e-16 ***
          IMP              1  9535835  9535835  896.810  < 2e-16 ***
          NBHD:LAND        3    65732    21911    2.061   0.1076
          NBHD:IMP         3   101979    33993    3.197   0.0251 *
          LAND:IMP         1   185809   185809   17.475 4.78e-05 ***
          NBHD:LAND:IMP    3    49773    16591    1.560   0.2011
          Residuals      160  1701289    10633

  12. $F = (1 \times 17.475 + 3 \times 1.560) / 4 = 5.539$ with a $P$-value of .0003, so we also reject Model 3 in favor of Model 4 at the 5% level, and at the 1% and 0.1% levels.

      But note: each of these tests answers the question:

      Is there enough evidence against the simpler model to reject it?

      That is not the same question as:

      Which of these models will give the best predictions?
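The pooled F statistic only equals the nested-model F-test when the added terms are the last rows of the sequential ANOVA table, which is why the formula was reordered for this comparison. The sketch below illustrates this on simulated data (the data frame `sim`, the factor `g`, and all coefficients are made up, not the property data) by comparing the pooled value with the result of `anova()` on the two fits, and then repeats the slide's arithmetic from the printed sums of squares.

```r
# Simulated stand-in data with a four-level grouping factor.
set.seed(3)
n <- 120
sim <- data.frame(
  x1 = runif(n, 50, 300),
  x2 = runif(n, 100, 800),
  g  = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)
sim$y <- 50 + 1.1 * sim$x1 + 0.9 * sim$x2 + 0.002 * sim$x1 * sim$x2 +
  rnorm(n, sd = 60)

small <- lm(y ~ (x1 + x2) * g, data = sim)   # analogue of Model 3
big   <- lm(y ~ g * x1 * x2, data = sim)     # analogue of Model 4, reordered

# Direct nested-model F-test.
f_direct <- anova(small, big)$F[2]

# Pooled F from the sequential table; the added terms are its last two rows.
seq_tab <- anova(big)
added <- c("x1:x2", "g:x1:x2")
f_pooled <- sum(seq_tab[added, "Sum Sq"]) / sum(seq_tab[added, "Df"]) /
  seq_tab["Residuals", "Mean Sq"]
print(c(f_direct = f_direct, f_pooled = f_pooled))

# The slide's arithmetic, from the printed table for the property data.
f_slide <- (185809 + 49773) / 4 / 10633
p_slide <- pf(f_slide, 4, 160, lower.tail = FALSE)
print(c(f_slide = f_slide, p_slide = p_slide))
```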

  13. Interpreting The Model

          summary(l4)

          Call:
          lm(formula = SALES ~ (LAND * IMP) * NBHD, data = prices)

          Residuals:
              Min      1Q  Median      3Q     Max
          -373.04  -46.44   -3.40   34.69  562.02

  14.     Coefficients:
                                       Estimate Std. Error t value Pr(>|t|)
          (Intercept)                 1.552e+02  1.089e+02   1.425   0.1560
          LAND                       -8.272e-01  1.227e+00  -0.674   0.5010
          IMP                         9.609e-01  4.959e-01   1.938   0.0544 .
          NBHDDAVISISLES             -6.017e+01  1.200e+02  -0.501   0.6168
          NBHDHUNTERSGREEN           -1.325e+02  1.319e+02  -1.004   0.3168
          NBHDHYDEPARK               -2.659e+02  1.459e+02  -1.823   0.0702 .
          LAND:IMP                    5.176e-03  3.611e-03   1.434   0.1536
          LAND:NBHDDAVISISLES         2.012e+00  1.233e+00   1.631   0.1048
          LAND:NBHDHUNTERSGREEN       9.361e-01  1.841e+00   0.509   0.6118
          LAND:NBHDHYDEPARK           2.534e+00  1.303e+00   1.945   0.0536 .
          IMP:NBHDDAVISISLES         -1.977e-01  5.081e-01  -0.389   0.6978
          IMP:NBHDHUNTERSGREEN        2.525e-01  5.877e-01   0.430   0.6680
          IMP:NBHDHYDEPARK            4.192e-01  5.603e-01   0.748   0.4555
          LAND:IMP:NBHDDAVISISLES    -4.278e-03  3.617e-03  -1.183   0.2386
          LAND:IMP:NBHDHUNTERSGREEN  -2.465e-04  5.211e-03  -0.047   0.9623
          LAND:IMP:NBHDHYDEPARK      -5.198e-03  3.662e-03  -1.419   0.1578
          ---
          Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  15.     Residual standard error: 103.1 on 160 degrees of freedom
          Multiple R-squared: 0.9415, Adjusted R-squared: 0.9361
          F-statistic: 171.8 on 15 and 160 DF, p-value: < 2.2e-16

  16. The base neighborhood is Cheval, so the equation for that neighborhood is

      $$E(Y) = 155.2 - 0.83 x_1 + 0.96 x_2 + 0.0052 x_1 x_2,$$

      a two-variable interaction model.

      For each other neighborhood, the equation is also a two-variable interaction model, but with different coefficients.

  17. For another neighborhood, say Davis Isles, we must add the corresponding neighborhood and interaction terms:

      - NBHDDAVISISLES = -60.17 to the intercept;
      - LAND:NBHDDAVISISLES = 2.01 to the coefficient of $x_1$;
      - IMP:NBHDDAVISISLES = -0.20 to the coefficient of $x_2$;
      - LAND:IMP:NBHDDAVISISLES = -0.0043 to the coefficient of $x_1 x_2$.

      We find

      $$E(Y) = 95.09 + 1.19 x_1 + 0.76 x_2 + 0.0009 x_1 x_2.$$
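The Davis Isles equation can be reproduced by adding the neighborhood adjustments to the Cheval coefficients. The sketch below uses the four-significant-digit estimates printed in the coefficient table, so its results can differ from the slide's equation in the last digit (for example about 95.03 rather than 95.09 for the intercept), presumably because the slide used unrounded estimates. The helper `expected_price()` and the example property (land 100, improvements 300, in thousands of dollars) are hypothetical.

```r
# Cheval (base neighborhood) coefficients, as printed in the summary output.
cheval <- c(intercept = 155.2, x1 = -0.8272, x2 = 0.9609, x1x2 = 0.005176)

# Davis Isles adjustments: NBHD main effect and its interactions.
davis_adjust <- c(intercept = -60.17, x1 = 2.012, x2 = -0.1977, x1x2 = -0.004278)

davis <- cheval + davis_adjust
print(round(davis, 4))

# Hypothetical helper: expected sale price for given appraised values.
expected_price <- function(coefs, land, imp) {
  unname(coefs["intercept"] + coefs["x1"] * land + coefs["x2"] * imp +
           coefs["x1x2"] * land * imp)
}

# Hypothetical property with the same appraisals in both neighborhoods.
print(c(
  cheval = expected_price(cheval, land = 100, imp = 300),
  davis  = expected_price(davis, land = 100, imp = 300)
))
```

In Davis Isles the coefficient of the product term is close to zero, so the fitted surface there is nearly additive in land and improvement value.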
