
House Prices: Advanced Regression Techniques (Haiyang Shi)



  1. House Prices: Advanced Regression Techniques. Haiyang Shi, Apr. 17, 2018

  2. Outline
     • Introduction
     • ML Techniques
     • Feature Engineering
     • Experiments
     • Observations
     The Ohio State University

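  3. Introduction
     • Goal: predict the final sale price of each house using advanced regression techniques.
     • Data: a Kaggle competition dataset, based on property data in Ames, Iowa from 2006 to 2010.
     • Evaluation: root-mean-square error (RMSE) between the logarithm of the predicted price and the logarithm of the observed price. Taking logs keeps errors on expensive houses from dominating the score.

       $$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log \hat{y}_i - \log y_i\right)^2}$$

       where $n$ is the number of houses, $\hat{y}_i$ is the predicted price, and $y_i$ is the observed price.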
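The evaluation metric on slide 3 can be sketched in a few lines of NumPy. This is a minimal illustration, not the competition's scoring code; the function name `rmsle` and the sample prices are made up for the example.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root-mean-square error between the logs of predicted and observed prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Prices must be strictly positive for the logarithm to be defined.
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


# Hypothetical prices, for illustration only.
actual = [100_000, 200_000, 400_000]
predicted = [110_000, 190_000, 400_000]
print(rmsle(actual, predicted))
```

Because the error is computed on logs, a 10% miss on a cheap house costs the same as a 10% miss on an expensive one.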

  4. ML Techniques
     • Linear Regression: Ridge, Lasso
     • Support Vector Regression
     • Random Forest
     • Adaptive Boosting
     • Gradient Boosted Decision Tree
     • K Nearest Neighbors
     • Neural Network

  5. Feature Engineering
     • Impute missing values
     • Clean outliers
     • Categorize categorical attributes
     • Transform skewed attributes
     • Generate features*
     • Select feature subset
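Two of the steps on slide 5, imputing missing values and encoding categorical attributes, might look like the pandas sketch below. The slides do not say which imputation or encoding was used, so median imputation and one-hot encoding are assumptions here, and the toy values are made up (only the column names come from the Ames data).

```python
import numpy as np
import pandas as pd

# Toy frame: column names come from the Ames data, values are invented.
df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, 60.0],
    "GarageCars": [2.0, 1.0, np.nan, 2.0],
    "Neighborhood": ["NAmes", "CollgCr", "NAmes", "OldTown"],
})

# Impute: fill numeric gaps with the column median (an assumed strategy).
numeric_cols = df.select_dtypes(include="number").columns
imputed = df.copy()
imputed[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Categorize: one-hot encode the categorical column (an assumed encoding).
encoded = pd.get_dummies(imputed, columns=["Neighborhood"])
print(encoded.columns.tolist())
```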

  6. Missing Values and Highly Correlated Attributes

     | Attribute    | Missing values |
     |--------------|----------------|
     | BsmtFullBath | 2              |
     | BsmtHalfBath | 2              |
     | GarageYrBlt  | 159            |
     | GarageCars   | 1              |
     | LotFrontage  | 486            |
     | MasVnrArea   | 23             |
     | BsmtFinSF1   | 1              |
     | BsmtFinSF2   | 1              |
     | BsmtUnfSF    | 1              |
     | TotalBsmtSF  | 1              |
     | GarageArea   | 1              |

     | Attribute 1 | Attribute 2  | Correlation |
     |-------------|--------------|-------------|
     | MSSubClass  | BldgType     | 0.75        |
     | OverallQual | ExterQual    | 0.72        |
     | OverallQual | SalePrice    | 0.82        |
     | YearBuilt   | GarageYrBlt  | 0.78        |
     | Exterior1st | Exterior2nd  | 0.86        |
     | ExterQual   | KitchenQual  | 0.72        |
     | TotalBsmtSF | 1stFlrSF     | 0.78        |
     | GrLivArea   | TotRmsAbvGrd | 0.82        |
     | GrLivArea   | SalePrice    | 0.73        |
     | Fireplaces  | FireplaceQu  | 0.80        |
     | GarageCars  | GarageArea   | 0.89        |
     | GarageQual  | GarageCond   | 0.90        |
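Tables like the two on slide 6 can be produced with pandas. The sketch below is one possible way to do it, not the author's code: the helper names `missing_counts` and `correlated_pairs`, the 0.7 threshold, and the toy values are all assumptions. It only handles numeric columns; the slide also lists categorical pairs such as Exterior1st and Exterior2nd, which would need to be encoded first.

```python
import numpy as np
import pandas as pd


def missing_counts(df):
    """Return the number of missing values per column, for columns with any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


def correlated_pairs(df, threshold=0.7):
    """List numeric column pairs whose absolute Pearson correlation meets the threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            value = float(corr.iloc[i, j])
            if value >= threshold:
                pairs.append((cols[i], cols[j], round(value, 2)))
    return pairs


# Toy data: column names from the Ames data, values invented.
toy = pd.DataFrame({
    "GarageCars": [1, 2, 2, 3, np.nan],
    "GarageArea": [250, 480, 500, 800, 400],
    "YrSold": [2008, 2006, 2010, 2007, 2009],
})
print(missing_counts(toy))
print(correlated_pairs(toy))
```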

  7. Outliers

  8. Skewness
     • Log Transformation
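A log transformation of skewed attributes, as named on slide 8, could be sketched as follows. The skewness threshold of 0.75, the use of `log1p` (so that zeros stay finite), and the helper name `log_transform_skewed` are assumptions, not details from the slides. The sketch also assumes the skewed columns are non-negative.

```python
import numpy as np
import pandas as pd


def log_transform_skewed(df, threshold=0.75):
    """Apply log(1 + x) to numeric columns whose absolute skewness exceeds the threshold."""
    out = df.copy()
    skewness = out.select_dtypes(include="number").skew()
    skewed_cols = list(skewness[skewness.abs() > threshold].index)
    if skewed_cols:
        out[skewed_cols] = np.log1p(out[skewed_cols])
    return out, skewed_cols


# Synthetic data: a right-skewed "LotArea" and a roughly symmetric "OverallQual".
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "LotArea": np.exp(rng.normal(9.0, 1.0, 500)),
    "OverallQual": rng.integers(1, 11, 500),
})
transformed, cols = log_transform_skewed(toy)
print(cols, toy["LotArea"].skew(), transformed["LotArea"].skew())
```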

  9. Skewness (Cont.): figure panels "Before" and "After"

  10. Bivariate Relationship Analysis

  11. Experiments
     • GridSearchCV to select hyperparameters
     • 10-fold cross validation
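The setup on slide 11 corresponds to scikit-learn's `GridSearchCV` with a 10-fold splitter. The sketch below uses synthetic data and a Lasso model; the alpha grid, the shuffling, and the RMSE scorer are assumptions chosen for illustration. If the target were the log of the sale price, this RMSE would match the competition metric.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the prepared feature matrix and target.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

search = GridSearchCV(
    estimator=Lasso(max_iter=10_000),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},  # assumed grid
    scoring="neg_root_mean_squared_error",
    cv=KFold(n_splits=10, shuffle=True, random_state=0),  # 10-fold cross validation
)
search.fit(X, y)

# scikit-learn maximizes scores, so the RMSE is reported negated.
print(search.best_params_, -search.best_score_)
```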

  12. Experiments
     • Lasso Regression
       – Most important feature engineering steps
         • Transformation of skewed data
         • Categorization of categorical attributes
         • Imputation of missing values
       – Score
         • 0.12789
       – Most important features
         • Above-grade (ground) living area in square feet (GrLivArea)
         • Lot size in square feet (LotArea)
         • Overall material and finish quality of the house (OverallQual)
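The slides do not say how the most important Lasso features were identified. One common approach, assumed here, is to standardize the inputs and rank features by the absolute value of their fitted coefficients. The data below is synthetic; only the first three names echo the slide, and `noise_a` and `noise_b` are hypothetical filler columns.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
names = ["GrLivArea", "LotArea", "OverallQual", "noise_a", "noise_b"]

# Synthetic target that depends on the first three columns only.
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.1, size=300)

# Standardize so that coefficient magnitudes are comparable across features.
model = Lasso(alpha=0.05).fit(StandardScaler().fit_transform(X), y)

# Rank features by absolute coefficient, largest first.
ranking = sorted(zip(names, model.coef_), key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranking:
    print(f"{name}: {coef:.3f}")
```

The L1 penalty shrinks the coefficients of uninformative columns toward zero, which is why Lasso doubles as a rough feature selector.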

  13. Experiments
     • Random Forest
       – Main hyperparameters
         • n_estimators (800): number of trees in the forest
         • max_features (0.3): fraction of features to consider when looking for the best split
         • max_depth (20): maximum depth of each tree
       – Deeper trees with smaller max_features perform better
       – With smaller max_features, results are less sensitive to data preprocessing
       – Score
         • 0.14169
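In scikit-learn, the hyperparameters on slide 13 map onto `RandomForestRegressor` as sketched below. The helper name `make_forest`, the synthetic data, and the reduced tree count in the demo fit are assumptions to keep the example quick. A float `max_features` is interpreted as a fraction of the available features.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def make_forest(n_estimators=800):
    """Build a forest with the hyperparameters listed on the slide."""
    return RandomForestRegressor(
        n_estimators=n_estimators,
        max_features=0.3,  # consider 30% of the features at each split
        max_depth=20,
        random_state=0,
    )


# Synthetic stand-in data; the demo uses 100 trees instead of 800 for speed.
X, y = make_regression(n_samples=300, n_features=30, noise=5.0, random_state=0)
forest = make_forest(n_estimators=100).fit(X, y)
print(forest.predict(X[:3]))
```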

  14. Experiments
     • Gradient Boosted Decision Tree
       – Main hyperparameters: n_estimators (3000), learning_rate (0.05), max_features (log2) and max_depth (3)
       – Score: 0.12365
     • Support Vector Regression
       – Main hyperparameters: kernel (linear)
       – Score: 0.15413
     • Adaptive Boosting
       – Main hyperparameters: base_estimator (DecisionTree(max_features=0.3))
       – Score: 0.14149
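The three models on slide 14 could be instantiated in scikit-learn roughly as follows. The helper name `slide_models`, the synthetic data, and the reduced number of boosting stages in the demo fit are assumptions. The base tree is passed to `AdaBoostRegressor` positionally because the keyword was renamed from `base_estimator` to `estimator` in scikit-learn 1.2.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def slide_models(gbdt_trees=3000):
    """Build the three regressors with the hyperparameters listed on the slide."""
    return {
        "gbdt": GradientBoostingRegressor(
            n_estimators=gbdt_trees,
            learning_rate=0.05,
            max_features="log2",
            max_depth=3,
            random_state=0,
        ),
        "svr": SVR(kernel="linear"),
        # First positional argument is the base learner in old and new releases.
        "adaboost": AdaBoostRegressor(
            DecisionTreeRegressor(max_features=0.3), random_state=0
        ),
    }


# Synthetic stand-in data; 50 boosting stages instead of 3000 keep the demo fast.
X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)
fitted = {name: model.fit(X, y) for name, model in slide_models(gbdt_trees=50).items()}
for name, model in fitted.items():
    print(name, model.predict(X[:2]))
```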

  15. Experiments
     • K Nearest Neighbors
       – Main hyperparameters: n_neighbors (11)
       – Score: 0.24084
     • Neural Network
       – Main hyperparameters: hidden_layer_sizes ((30, 30, 30, 30))
       – Score: 0.23495
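The two models on slide 15 map onto scikit-learn's `KNeighborsRegressor` and `MLPRegressor`. The slides do not mention feature scaling; the `StandardScaler` step below is an assumption, added because both models are sensitive to feature scale. The data is synthetic and `max_iter=500` is chosen only to keep the demo short.

```python
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Scaling is an assumption; hyperparameters follow the slide.
knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=11))
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(30, 30, 30, 30), max_iter=500, random_state=0),
)

knn.fit(X, y)
mlp.fit(X, y)
print(knn.predict(X[:2]), mlp.predict(X[:2]))
```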

  16. Observations
     • Feature engineering is very important
       – Feature selection
       – Feature creation
         • Transforming the neighborhood attribute into a geographical location
       – Feature combination
     • Overfitting is harmful, and cross validation alone is not enough to prevent it
     • Tuning hyperparameters is very time-consuming
     More details: http://www.shihaiyang.me/2018/04/16/house-prices/

  17. Thank You!
