
Hazardous Models and Risk Mitigation in Real Estate

DataEngConf SF, April 2018. David Lundgren & Xinlu Huang


  1. Hazardous Models and Risk Mitigation in Real Estate DataEngConf SF, April 2018 David Lundgren & Xinlu Huang

  2. Who has modeled time-to-event data before?

  3. Who has modeled time-to-event data before? What’s the half-life of a startup in Silicon Valley?

  4. Who has modeled time-to-event data before? What’s the half-life of a startup in Silicon Valley? When’s my team going to score another goal?

  5. Did you use survival analysis?

  6. Introduction: Xinlu Huang, David Lundgren

  7. Talk Structure
     ● Real Estate 100 and Opendoor 101
       ○ Modeling Liquidity via Days-on-market
       ○ Home Sale Case Studies
     ● Pay Attention to the Negative Space (Model 1)
     ● Solve a Simpler Problem (Model 2)
     ● A General Recipe for Survival Analysis (Model 3)
     ● Q & A

  8. Real Estate 100 and Opendoor 101: How a home’s duration on the market impacts Opendoor
     ● Opendoor bears the risk in reselling the home
     ● Time-on-market varies substantially by home
     ● Our unit costs are driven by how long it takes us to find a buyer for a home

  9. The Problem How long will it take us to find a buyer for a home?

  10. Home Sale Case Studies Home 1 Listed ~$800k

  11. Home Sale Case Studies Home 1 Listed ~$800k 6+ months on the market

  12. Home Sale Case Studies Home 2 Listed ~$300k

  13. Home Sale Case Studies Home 2 Listed ~$300k 1 month on the market

  14. Framing the Problem

  15. Framing the Problem

      | Home | List Price | Square Feet | Other Features | Days-on-market (y) |
      |---|---|---|---|---|
      | 423 Main Street | $200k | 2000 | ... | 30 |
      | 111 Side Road | $200k | 2200 | ... | 100 |
      | ... | | | | |
      | 52 Downtown Ave | $400k | 1945 | ... | n/a |
      | 90 Outskirts Lane | $300k | 2100 | ... | n/a |

  16. Model #1: Linear Regression

      | Home | List Price | Square Feet | Other Features | Days-on-market (y) |
      |---|---|---|---|---|
      | 423 Main Street | $200k | 2000 | ... | 30 |
      | 111 Side Road | $200k | 2200 | ... | 100 |
      | ... | | | | |

  17. Does it work?

  18. Results

  19. Results

  20. Results

  21. Results

  22. Results

  23. Censoring

  24. Model #1: Linear Regression

      | Home | List Price | Square Feet | ... | Days-on-market (y) | Explanation |
      |---|---|---|---|---|---|
      | 423 Main Street | $200k | 2000 | ... | 30 | |
      | 111 Side Road | $200k | 2200 | ... | 100 | |
      | ... | | | | | |
      | 52 Downtown Ave | $400k | 1945 | ... | n/a | Still on market after 200 days |
      | 90 Outskirts Lane | $300k | 2100 | ... | n/a | Delisted after 300 days |

  25. Model #1: Takeaway Pay attention to the negative space
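As a rough illustration of why the censored rows matter, the sketch below uses the four example homes from the table and compares the average over sold homes only with a lower bound that also counts the days already accumulated by unsold homes. The tuple layout and variable names are assumptions made for this example, not something shown in the talk.

```python
# Toy data from the slide: (home, days observed, sold?).
# The record layout is an assumption made for illustration.
listings = [
    ("423 Main Street", 30, True),
    ("111 Side Road", 100, True),
    ("52 Downtown Ave", 200, False),    # still on market: censored
    ("90 Outskirts Lane", 300, False),  # delisted: censored
]

# Naive target: average days-on-market over the homes that sold.
sold_days = [days for _, days, sold in listings if sold]
naive_mean = sum(sold_days) / len(sold_days)

# A censored home has taken at least its observed number of days,
# so averaging observed durations gives a lower bound on the true mean.
lower_bound = sum(days for _, days, _ in listings) / len(listings)

print(f"sold-only mean: {naive_mean}, lower bound on true mean: {lower_bound}")
```

Dropping the two unsold homes makes the market look much faster than even the most optimistic reading of the full data allows.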

  26. Reframing the Problem

  27. Model #2: Classify “closed before 100 days-on-market” 100 days ? days-on-market

  28. Model #2: Classify “closed before 100 days-on-market”

      | Home | List Price | ... | Days-on-market | Closed Within 100 Days (y) |
      |---|---|---|---|---|
      | 423 Main Street | $200k | ... | 30 | 1 |
      | 111 Side Road | $200k | ... | 100 | 0 |
      | ... | | | | |
      | 52 Downtown Ave | $400k | ... | n/a | 0 (still on market after 200 days) |
      | 90 Outskirts Lane | $300k | ... | n/a | 0 (delisted after 300 days) |

  29. Does it Work?

  30. Pros

  31. Pro: Easy to Implement ? days-on-market

  32. Pro: Easy to Implement - Just Set a Threshold 100 days ? days-on-market

  33. Pro: Easy-to-interpret Output Predicted Probability 0-100 days 100+ days

  34. Pro: Uses Censored Data 100 days ✔ ? days-on-market

  35. Cons

  36. Easy to Implement - Just Set a Threshold 100 days ? days-on-market

  37. Easy to Implement - Just Set a Threshold - But Which One? 10 days 45 days 100 days 120 days ? days-on-market

  38. Easy-to-interpret Output Predicted Probability 0-100 days 100+ days

  39. Easy-to-interpret Output: Wrong API. [Predicted probability of 0-100 days] × 50 + 150 × [predicted probability of 100+ days] = ??

  40. Easy-to-interpret Output: Ideal API. [Figure: the two-bucket predicted probability chart (0-100 days, 100+ days) next to a predicted probability curve over days-on-market, annotated with 60 days]

  41. Uses Censored Data 100 days ✔ ? days-on-market

  42. Uses Censored Data (Partially), But Discards Recent Observations [Figure: two days-on-market timelines, each with a 100-day threshold]
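One way to picture the labeling rule behind Model #2, including the observations it has to throw away, is the sketch below. The function name, its arguments, and the use of `None` for "cannot be labeled yet" are assumptions for illustration. The strict `<` comparison follows the table above, where a home that closed at exactly 100 days is labeled 0.

```python
from typing import Optional

def label_closed_before(days_observed: int, sold: bool,
                        threshold: int = 100) -> Optional[int]:
    """Hypothetical labeling rule for 'closed before `threshold` days'.

    Returns 1 or 0 when the label is known, and None when the listing
    is censored before the threshold and so cannot be labeled.
    """
    if sold:
        return 1 if days_observed < threshold else 0
    if days_observed >= threshold:
        # Unsold, but already past the threshold: a known negative.
        return 0
    # Unsold and younger than the threshold: outcome still unknown.
    return None

examples = [(30, True), (100, True), (200, False), (300, False), (40, False)]
labels = [label_closed_before(days, sold) for days, sold in examples]
print(labels)
```

The last example, a listing that has been active for only 40 days, is the kind of recent observation this model has to discard.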

  43. Model #2: Takeaway Solve a Simpler Problem

  44. Attempt #3 Survival Analysis

  45. When stuck, see if someone has already solved the problem...
      Actuaries & medical professionals are interested in
      ● What is the life expectancy of the population of city A?
      ● What is the probability of person B surviving the next decade?
      ● Given person C is 70 years old, what is his/her life expectancy?
      Censored data is always an issue.

  46. In this analogy, “death” is a happy event of finding a buyer:

      | Actuaries & medical professionals are interested in | Opendoor is interested in |
      |---|---|
      | What is the life expectancy of the population of city A? | What is the expected days on market for all listings in city A? |
      | What is the probability of person B surviving the next decade? | What is the probability of listing B taking 10 more days to sell? |
      | Given person C is 70 years old, what is his/her life expectancy? | Given listing C was on market for 70 days, how much longer until we expect to find a buyer? |

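  47. Previously: a point prediction (Predicted Days-on-market = 45) or a predicted probability for the 0-100 days and 100+ days buckets. With survival analysis: a predicted probability over time (days-on-market). [Figure: probability curve annotated at 60 days-on-market]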

  48. Model #3: Takeaway 1 Look for Existing Solutions to Similar Problems

  49. We found the right approach, but...

  50. Hurdle #1: It’s not easy to explain
      ● The fundamental concepts require calculus to explain well
      ● Limited intuition and tie-ins to tangible concepts for decision makers

  51. Hurdle #2: Scaling is hard with existing tools
      ● Lots of R packages
      ● Limited options for production-ready languages
      ● Works great for small datasets; breaks down with larger ones

  52. Hurdle #3: Modeling flexibility is hard with existing tools
      ● Off-the-shelf packages: model choices are limited (proportional or additive hazard models)
        ○ Non-flexible feature specification
        ○ Hard to implement time-varying features
        ○ …
      ● Markov Chain Monte Carlo (Stan): complete freedom of model specification, but
        ○ Took hours to train on a tiny dataset
        ○ Hard to maintain

  53. Let’s try to reformulate the problem

  54. Survival analysis made easy
      Instead of telling you about S(t), λ(t), Cox Proportional Models, Kaplan-Meier, ... we will show you a reformulation that is
      ● Easily scalable to large datasets
      ● More concretely tied to real-life numbers
      ● Equivalent*
      ● Open to flexible modeling extensions
      * With some hand-waving. Rigorous proof left to mathematicians in the audience as an exercise.

  55. Changing target again

      | Home | Ini. List Price | ... | Days-on-market |
      |---|---|---|---|
      | 423 Main Street | $200k | ... | 30 |

  56. Changing target again

      | Home | Ini. List Price | ... | Days-on-market | “Current” days on market | Sold in the next day (y) |
      |---|---|---|---|---|---|
      | 423 Main Street | $200k | ... | 30 | 0 | 0 |
      | 423 Main Street | $200k | ... | 30 | 1 | 0 |
      | 423 Main Street | $200k | ... | 30 | 2 | 0 |
      | ... | | | | | |
      | 423 Main Street | $200k | ... | 30 | 28 | 0 |
      | 423 Main Street | $200k | ... | 30 | 29 | 1 |

      (30 new data rows for 423 Main Street)

  57. Changing target again

      | Home | Ini. List Price | ... | Days-on-market | “Current” days on market | Sold in the next day (y) |
      |---|---|---|---|---|---|
      | 423 Main Street | $200k | ... | 30 | 0 | 0 |
      | 423 Main Street | $200k | ... | 30 | 1 | 0 |
      | 423 Main Street | $200k | ... | 30 | 2 | 0 |
      | ... | | | | | |
      | 423 Main Street | $200k | ... | 30 | 28 | 0 |
      | 423 Main Street | $200k | ... | 30 | 29 | 1 |
      | 52 Downtown Ave | $400k | ... | Still on market after 200 days | | |

      (30 rows for 423 Main Street)

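  58. Changing target again

      | Home | Ini. List Price | ... | Days-on-market | “Current” days on market | Sold in the next day (y) |
      |---|---|---|---|---|---|
      | 423 Main Street | $200k | ... | 30 | 0 | 0 |
      | 423 Main Street | $200k | ... | 30 | 1 | 0 |
      | 423 Main Street | $200k | ... | 30 | 2 | 0 |
      | ... | | | | | |
      | 423 Main Street | $200k | ... | 30 | 28 | 0 |
      | 423 Main Street | $200k | ... | 30 | 29 | 1 |
      | 52 Downtown Ave | $400k | ... | n/a | 0 | 0 |
      | ... | | | | | |
      | 52 Downtown Ave | $400k | ... | n/a | 199 | 0 |

      (30 rows for 423 Main Street, 200 rows for 52 Downtown Ave)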

  59. Change fundamental unit of data: listings ⇒ listing-days. All listing data are used: closed, active, delisted...
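The listing-to-listing-days expansion can be sketched in a few lines. This is a minimal illustration under assumed names (`to_listing_days`, `current_days_on_market`, `sold_next_day`), not the speakers' actual pipeline. It reproduces the row counts from the tables above: 30 rows for the sold home and 200 rows for the home that is still on the market.

```python
from typing import Dict, List

def to_listing_days(features: Dict, days_observed: int, sold: bool) -> List[Dict]:
    """Expand one listing into one row per day it was on the market.

    The target is 1 only on the final observed day of a listing that
    sold; censored listings (active or delisted) contribute all zeros.
    """
    rows = []
    for day in range(days_observed):
        row = dict(features)  # copy the static listing features
        row["current_days_on_market"] = day
        row["sold_next_day"] = int(sold and day == days_observed - 1)
        rows.append(row)
    return rows

main_st = to_listing_days({"home": "423 Main Street", "list_price": 200_000}, 30, True)
downtown = to_listing_days({"home": "52 Downtown Ave", "list_price": 400_000}, 200, False)

print(len(main_st), main_st[-1]["sold_next_day"])    # rows for the sold home
print(len(downtown), downtown[-1]["sold_next_day"])  # rows for the censored home
```

Because each row carries its own `current_days_on_market`, time-varying features can in principle be attached per row in the same way.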

  60. Binary classification to the rescue, again
      We transformed the problem into vanilla binary classification
      ● Pick your favorite binary classifier, as long as it offers
        ○ Log-loss minimizing
        ○ Calibrated probabilities
      ● Scalability ✔ (even though we made the dataset larger!)
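To see why the log-loss requirement matters, consider the simplest possible "classifier": one constant probability for every listing-day. The sketch below is an illustration under assumptions (a feature-free model and a coarse grid search), not the model used in the talk. It checks numerically that the log-loss minimizer sits at the empirical daily closing rate, here 1 sale over 230 listing-days from the two example homes.

```python
import math

def log_loss(y, p):
    """Average binary log-loss of a constant prediction p on labels y."""
    return -sum(math.log(p) if yi else math.log(1.0 - p) for yi in y) / len(y)

# Labels for the 230 listing-day rows of the two example homes:
# 30 rows for the sold home (last one positive), 200 for the unsold one.
y = [0] * 29 + [1] + [0] * 200

base_rate = sum(y) / len(y)  # empirical daily hazard: sales / listing-days

# Coarse grid search over constant predictions (illustrative only).
candidates = [k / 1000 for k in range(1, 1000)]
best = min(candidates, key=lambda p: log_loss(y, p))

print(f"empirical rate: {base_rate:.5f}, grid minimizer: {best:.3f}")
```

A classifier trained with a different objective can rank listings well and still produce probabilities that do not match observed closing rates, which is why calibration is listed as a requirement.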

  61. How to interpret?
      Prediction = probability of listing closing in the next day (hazard rate in survival analysis parlance)
      Prediction = housing clearance rate, a.k.a. inventory turnover rate: if we start with 100 homes on market today, how many will close before the end of the day/week/month/year?
      ✔ Model output ties directly to real-world numbers, no calculus needed!
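The "100 homes" question can be answered directly from the daily predictions. The sketch below assumes, purely for illustration, that every home shares the same sequence of daily hazards. In practice each listing would have its own predictions and the per-listing probabilities would be summed. The function names and the 2% daily rate are hypothetical.

```python
def prob_close_within(hazards):
    """P(listing closes within len(hazards) days), given daily hazards."""
    survive = 1.0
    for h in hazards:
        survive *= 1.0 - h  # probability of still being unsold after this day
    return 1.0 - survive

def expected_closes(n_homes, hazards):
    """Expected number of closes among n_homes sharing the same hazards."""
    return n_homes * prob_close_within(hazards)

daily_hazard = 0.02  # hypothetical: 2% chance of closing on any given day
for label, horizon in [("day", 1), ("week", 7), ("month", 30)]:
    closes = expected_closes(100, [daily_hazard] * horizon)
    print(f"expected closes within a {label}: {closes:.1f}")
```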

  62. How to interpret? (cont’d)
      Prediction, a.k.a. the hazard rate, is the building block: hazard rate + laws of probabilities = everything we want to know.

      Example: expected days on market. For each listing, we have a series of predictions $(h_1, h_2, h_3, h_4, \ldots)$, one for each day:

      $$E[y] = \sum_{y} y \, P(y) = 1 \cdot h_1 + 2 \cdot (1 - h_1)\, h_2 + 3 \cdot (1 - h_1)(1 - h_2)\, h_3 + 4 \cdot \ldots + \ldots$$

      Here $h_1$ = P(closing on day 1), and P(days-on-market = 2) = P(not closing on day 1) × P(closing on day 2) = $(1 - h_1)\, h_2$.
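The expected days-on-market sum translates almost line for line into code. The sketch below is a minimal version under the assumption that predictions are available for a finite horizon only. It returns both the truncated expectation and the probability mass left beyond the horizon, so the caller can judge how much of the sum is missing. The function name is hypothetical.

```python
from typing import List, Tuple

def expected_days_on_market(hazards: List[float]) -> Tuple[float, float]:
    """Truncated E[y] = sum_k k * P(y = k), with P(y = k) built from hazards.

    hazards[k-1] is the predicted probability of closing on day k given
    that the listing has not closed before. Returns (expectation over the
    given horizon, probability of not closing within the horizon).
    """
    expected = 0.0
    survive = 1.0  # P(not closed before the current day)
    for day, h in enumerate(hazards, start=1):
        expected += day * survive * h  # day * P(days-on-market = day)
        survive *= 1.0 - h
    return expected, survive

# With a constant hazard of 0.5, days-on-market is geometric with mean 1 / 0.5.
expected, leftover = expected_days_on_market([0.5] * 60)
print(expected, leftover)
```

For a listing that has already been on the market for 70 days, passing only the hazards from day 71 onward gives the expected additional days under the same assumptions.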

  63. Model #3: Takeaway 2 Complex modeling technique doesn’t always need complex implementation
