Revenue Prediction of House Resale
Bairong Lei
University of Waterloo
November 6, 2012

Overview
• Motivation
• Previous Works
• Project Goals
• Dataset
• Plan for Analysis

Motivation
• People favor owning valuable property.
• Home investment is treated as a hedge against inflation.
• Reselling a house is expected to make a profit.

[Figure: Census structure of private homes by household type] - Source: Statistics Canada

Motivation Cont’d
• New home purchasing vs. resale home purchasing:

Issues of Concern                | New Homes       | Resold Homes
Registration for a home builder  | Needed          | Not needed
List Prices                      | Unknown         | Known
Renovation                       | Cost of Upgrade | Usually Included in Price
Appliance                        | May or May Not  | Usually Included in Price
Locations                        | Unpredictable   | Fixed
Offer Presentation               | Not needed      | Needed

Previous Work
• Basu, S. and Thibodeau, T. Analysis of Spatial Auto-correlation in House Prices. Journal of Real Estate Finance and Economics, Vol. 17:1, 61-85 (1998).
  • Structural characteristics increase the accuracy of hedonic house price prediction.
• Chopra, S., Thampy, T., Leahy, J., Caplin, A., and LeCun, Y. Machine Learning and the Spatial Structure of House Prices and Housing Returns (2008).
  • Applies a linear regression model that accounts for geographic factors to reduce price prediction error over a long period.
• Question: How can the revenue from selling a house be predicted? Which factors affect that revenue?

Project Goals
• Predict the difference between the sold price and the listing price of resold houses (regression problem)
• Predict whether the sold price is greater than the asking price (classification problem)
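The two goals can be read as two targets derived from the same pair of prices. The sketch below is one possible reading, not the author's code: the function name `resale_targets` is hypothetical, and the sign convention (sold price minus ask price) is an assumption. The numbers are the ask and sold prices of the example record shown on the Raw Dataset slides.

```python
def resale_targets(ask_price, sold_price):
    """Return (regression target, classification target) for one resold house.

    Assumption: the "difference" is sold price minus ask price, so a
    negative value means the house sold below its listing price.
    """
    difference = sold_price - ask_price            # regression target
    sold_above_ask = int(sold_price > ask_price)   # classification target (1 or 0)
    return difference, sold_above_ask


# Example record from the deck: Ask Price 329777, Sold Price 320000
print(resale_targets(329777, 320000))  # (-9777, 0)
```

A house that sells exactly at its asking price falls into class 0 under this definition, because the goal is stated as "greater than".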

Raw Dataset - Source: realmarketwatch.com

Raw Dataset
• Source: realmarketwatch.com
• Description: Records of resold houses in the Greater Toronto Area over the most recent two weeks
• Fields include:
  • MLS Number, City, Street Number, Street Name, Street Type, Area, House Type, House Style, Number of Bedrooms, Number of Bathrooms, Contract Date, Sold Date, Ask Price, Sold Price

Raw Dataset Cont’d
• Overview of the raw dataset
  • Number of Records: 4194
  • City: 54 distinct names
  • Area: 340 districts
  • Street Types: 38 distinct types
  • House Types: 15
  • House Styles: 10
  • No. of Bedrooms: 0 ~ 9
  • No. of Washrooms: 0 ~ 11
  • Ask Price: 89900 ~ 7995000
  • Sold Price: 2200 ~ 7025000

Raw Dataset Cont’d
• Example:
  • City: Aurora
  • St. No.: 51
  • St. Name: Cashel
  • St. Type: Crt
  • Area: Aurora Hig
  • Ask Price: 329777
  • Contract Date: 10/09/2012
  • Sold Price: 320000
  • Sold Date: 24/09/2012
  • House Type: Att/Row/Tw
  • House Style: 2-Storey
  • Bedroom: 3
  • Washroom: 2

Challenges of Raw Data
• No house records for Halton Region in the GTA
• Fields with invalid data
  • 0 Bedrooms
  • 0 Bathrooms
• Ambiguous data
  • House style listed as “Vacant land”
• Any suggestions on imputation for these misleading values? (mean, hot-deck, or machine learning methods?)
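Of the imputation options raised above, mean imputation is the simplest to illustrate. The following is a minimal sketch under stated assumptions: a count of 0 is treated as a missing value, the helper name `impute_zero_with_mean` is hypothetical, and the bedroom counts are made-up values rather than records from the dataset.

```python
from statistics import mean


def impute_zero_with_mean(values):
    """Replace 0 entries (treated here as missing) with the rounded mean
    of the non-zero entries. Returns a new list; the input is unchanged."""
    known = [v for v in values if v != 0]
    if not known:              # nothing to estimate the mean from
        return list(values)
    fill = round(mean(known))  # bedroom counts must stay whole numbers
    return [fill if v == 0 else v for v in values]


# Hypothetical bedroom counts; the two zeros are the invalid entries.
bedrooms = [3, 0, 4, 2, 0, 3]
print(impute_zero_with_mean(bedrooms))  # [3, 3, 4, 2, 3, 3]
```

Hot-deck imputation would instead copy the value from a similar record (for example, one with the same house type and area), and a model-based approach would predict the missing count from the other fields.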

Plan for Analysis
• Overview
• Pre-processing raw data
• Potential machine learning methods
• Validation

Overview
• Feature reconstruction from raw data
  • Goal: group categorical and qualitative data into levels that machine learning methods can use (feature encoding)
  • Focus on the features City, house type, house style, number of bedrooms, and number of bathrooms
• Primarily focus on supervised learning methods
  • Regression method
  • Classification method

Pre-processing raw data
• Feature Encoding
  • Apply dummy variables to construct features for qualitative variables
  • City names are categorized into four regions (City of Toronto, Peel, York, Durham), with City of Toronto as the baseline (all dummies 0):
    Z_1 = 1 if the house is in Peel, else Z_1 = 0;
    Z_2 = 1 if the house is in York, else Z_2 = 0;
    Z_3 = 1 if the house is in Durham, else Z_3 = 0;
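The region encoding can be sketched in a few lines. The names `DUMMY_REGIONS`, `CITY_TO_REGION`, and `encode_region` are hypothetical, and the city lookup is a small illustrative fragment, not the full mapping of the 54 city names in the dataset.

```python
BASELINE = "City of Toronto"
# The order fixes the meaning of (Z_1, Z_2, Z_3).
DUMMY_REGIONS = ("Peel", "York", "Durham")

# Illustrative fragment of a city-to-region lookup (assumed, incomplete).
CITY_TO_REGION = {
    "Toronto": "City of Toronto",
    "Mississauga": "Peel",
    "Aurora": "York",
    "Oshawa": "Durham",
}


def encode_region(region):
    """Return (Z_1, Z_2, Z_3) for a region; the baseline maps to (0, 0, 0)."""
    if region != BASELINE and region not in DUMMY_REGIONS:
        raise ValueError(f"unknown region: {region}")
    return tuple(int(region == r) for r in DUMMY_REGIONS)


print(encode_region(CITY_TO_REGION["Aurora"]))   # (0, 1, 0)
print(encode_region(CITY_TO_REGION["Toronto"]))  # (0, 0, 0)
```

Three dummies are enough for four regions: giving the baseline its own indicator would make the columns sum to 1 for every record and duplicate the intercept.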

Machine Learning Methods
• Regression problem
  • Multivariate Linear Regression
• Classification problem
  • Support Vector Machine
  • Decision Tree

Multivariate Linear Regression
• Use the encoded qualitative variables to build models
• Recall: City names are categorized into four regions (City of Toronto, Peel, York, Durham)
    Z_1 = 1 if the house is in Peel, else Z_1 = 0;
    Z_2 = 1 if the house is in York, else Z_2 = 0;
    Z_3 = 1 if the house is in Durham, else Z_3 = 0;
• Y_i = α_0 + α_1 Z_1 + α_2 Z_2 + α_3 Z_3
• Train the models on the training samples
• Draw conclusions from the results on the test sets
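When the three region dummies are the only predictors, the least-squares fit of the model above has a closed form: α_0 is the mean response in the baseline region (City of Toronto), and each α_k is the mean response in region k minus α_0. The sketch below uses that property so it runs with the standard library alone. The function names and the training values (sold price minus ask price) are hypothetical, and a model with further features such as bedrooms would need a general least-squares solver instead.

```python
from collections import defaultdict
from statistics import mean

BASELINE = "City of Toronto"
DUMMY_REGIONS = ("Peel", "York", "Durham")  # order of Z_1, Z_2, Z_3


def fit_region_dummies(samples):
    """samples: iterable of (region, y) pairs, at least one per region.

    Returns [alpha_0, alpha_1, alpha_2, alpha_3], the least-squares
    coefficients of Y = alpha_0 + alpha_1*Z_1 + alpha_2*Z_2 + alpha_3*Z_3.
    """
    by_region = defaultdict(list)
    for region, y in samples:
        by_region[region].append(y)
    alpha_0 = mean(by_region[BASELINE])
    return [alpha_0] + [mean(by_region[r]) - alpha_0 for r in DUMMY_REGIONS]


def predict(alphas, region):
    """Evaluate the fitted model for a house in the given region."""
    z = [int(region == r) for r in DUMMY_REGIONS]
    return alphas[0] + sum(a * zi for a, zi in zip(alphas[1:], z))


# Hypothetical training samples: (region, sold price - ask price)
train = [
    ("City of Toronto", -5000), ("City of Toronto", 3000),
    ("Peel", -2000), ("Peel", -4000),
    ("York", 1000), ("York", 5000),
    ("Durham", -6000), ("Durham", -2000),
]
alphas = fit_region_dummies(train)
print(alphas)
print(predict(alphas, "York"))
```

With these made-up samples, the prediction for each region is simply that region's training mean, which is all a dummy-only model can express.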

Support Vector Machine
• Select features and generate feature subsets
• Build models with the various feature subsets and a Gaussian radial basis function kernel
• Apply K-fold cross-validation when training the models
• Compare the averaged misclassification rates across feature subsets
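The evaluation loop described above can be sketched independently of the classifier. The block below shows the Gaussian RBF kernel formula and a K-fold estimate of the averaged misclassification rate. To stay runnable with the standard library only, it plugs in a stand-in majority-class model, which is not an SVM; in practice an SVM trainer from a library would be passed as `fit`. All function names and the toy data are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    if k > n:
        raise ValueError("k must not exceed the number of samples")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_fit(X, y):
    """Stand-in model (not an SVM): always predicts the most common label."""
    label = max(set(y), key=y.count)
    return lambda x: label


def cv_error(X, y, fit, k=5):
    """Average misclassification rate over k folds."""
    rates = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors = sum(model(X[i]) != y[i] for i in fold)
        rates.append(errors / len(fold))
    return sum(rates) / k


# Toy data: one feature per record, label 1 = sold above ask (made up).
X = [[i] for i in range(10)]
y = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print(cv_error(X, y, majority_fit, k=5))
```

To compare feature subsets, `cv_error` would be called once per subset on the correspondingly reduced feature vectors, and the subset with the lowest averaged rate would be preferred.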

Decision Tree
• Build the tree with the C4.5 algorithm
• Handle training records with unknown attribute values by evaluating the gain or gain ratio for an attribute using only the records where that attribute is known
• Prune after the tree is created
• Pseudocode of the C4.5 algorithm:
  1. Check for base cases
  2. For each attribute a, find the normalized information gain from splitting on a
  3. Let a_best be the attribute with the highest normalized information gain
  4. Create a decision node that splits on a_best
  5. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of the node
- Source: C4.5 Algorithm http://en.wikipedia.org/wiki/C4.5_algorithm
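Steps 2 and 3 of the pseudocode depend on the normalized information gain (gain ratio). The sketch below computes it for categorical attributes only and picks a_best. It is not a full C4.5 implementation: recursion, pruning, continuous attributes, and missing values are left out. The function names, the records, and the "Bungalow" style value are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain from splitting on rows[i][attr], divided by the
    split information of that partition."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attrs):
    """Step 3: the attribute with the highest gain ratio."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))


# Made-up records; label 1 = sold above ask, 0 = not.
rows = [
    {"region": "York", "style": "2-Storey"},
    {"region": "York", "style": "Bungalow"},
    {"region": "Peel", "style": "2-Storey"},
    {"region": "Peel", "style": "Bungalow"},
]
labels = [1, 1, 0, 0]
print(best_attribute(rows, labels, ["region", "style"]))  # region
```

In this toy set the region separates the two classes perfectly (gain ratio 1), while the style carries no information (gain ratio 0), so the root node would split on region.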

Validation
• Test data set
  • New real data released on the website will be used to test the prediction accuracy of the models built with these machine learning methods

Reference
• Statistics Canada. Distribution (in percentage) of private households by household type, 2001 to 2011. http://www12.statcan.gc.ca/census-recensement/2011/as-sa/98-312-x/2011003/fig/fig3_2-1-eng.cfm
• Basu, S. and Thibodeau, T. Analysis of Spatial Auto-correlation in House Prices. Journal of Real Estate Finance and Economics, Vol. 17:1, 61-85 (1998).
• Chopra, S., Thampy, T., Leahy, J., Caplin, A., and LeCun, Y. Machine Learning and the Spatial Structure of House Prices and Housing Returns (2008).
• Antipov, E. and Pokryshevskaya, E. Mass appraisal of residential apartments: An application of Random forest for valuation and a CART-based approach for model diagnostics. Working Paper (2010).
• RealMarketWatch http://realmarketwatch.com/
• C4.5 Algorithm http://en.wikipedia.org/wiki/C4.5_algorithm

Thank You!
