Alternative Data in Finance


  1. Alternative Data in Finance

  2. What is Alternative Data in Finance?

  3. Example: Lodging
     Key metrics: Occupancy x Room Rate ~ Revenues

  4. Example: Lodging
     Key metrics: Occupancy x Room Rate ~ Revenues
     • Number of lights on

  5. Example: Lodging
     Key metrics: Occupancy x Room Rate ~ Revenues
     • Number of lights on
     • Online room rates

  6. Example: Lodging
     Key metrics: Occupancy x Room Rate ~ Revenues
     Alternative data:
     • Number of lights on
     • Online room rates
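
The lodging relationship can be sketched as simple arithmetic. Every number below is hypothetical, and treating the share of lit windows as an occupancy estimate is an assumption made only for illustration; the slides do not say how the two alternative datasets are mapped to the key metrics.

```r
# Hypothetical inputs; none of these numbers come from the slides.
rooms            <- 200    # rooms in the property
days             <- 90     # days in the quarter
lit_window_share <- 0.72   # share of windows lit at night (assumed occupancy proxy)
online_rate      <- 150    # average nightly rate scraped from booking sites, USD

# Occupancy x Room Rate ~ Revenues, scaled by the room-nights available
room_nights_available <- rooms * days
revenue_estimate <- room_nights_available * lit_window_share * online_rate
revenue_estimate
```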

  7. Alternative Dataset Examples
     1. Point of sale transactions
     2. Online behavior
     3. Purchases
        1. Online
        2. Brick and mortar
     4. Obscure public records
     5. Satellite imagery
     6. Etc.

  8. Data Sourcing • Direct data gathering • Data vendors • Just download the data (JDD)

  9. Free Datasets
     http://aws.amazon.com/datasets
     http://databib.org
     http://datacite.org
     http://figshare.com
     http://linkeddata.org
     http://reddit.com/r/datasets
     http://thedatahub.org alias http://ckan.net
     http://quandl.com
     http://enigma.io
     More here: http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public

  10. And even MORE Free-ish Datasets
      http://knoema.com
      http://www.google.com/publicdata/directory
      http://datahub.io
      http://datamob.org
      http://www.freebase.com
      http://www.xdayta.com
      http://www.redliondata.com
      http://opendata.arcgis.com
      http://www.bigdataexchange.com
      https://www.opensciencedatacloud.org/publicdata/
      https://opendata.socrata.com/
      http://www.data.gov
      http://www.factual.com/

  11. Generating value with alternative data
      • Revenue surprise estimates
      • Operating GAAP measures
      • Non-GAAP measures
      • Churn, etc.
      • Fully or partially automated quant strategies
      • Non-equity asset classes
      • PE could benefit from the same operating metrics for diligence
      • PM Development and Big Data Thought Leadership
      • Strategic Investments
      • Marketing Tool for Raising Capital and Talent Recruitment

  12. The small sample problem of big data
      • The predictors are big data
      • The response variable is small data
      • Control a model
      • Use common sense when modeling
      • Parsimonious models
      • Penalize complexity
      • Test out of sample
      • Bootstrap
      • Cross-validation
      • Increase data points per model
      • Reduce the number of models
      • Use one model to describe the behavior of multiple companies
      • Estimate intra-quarter data
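
A minimal sketch of the out-of-sample testing idea, using simulated data rather than anything from the slides. It assumes 16 quarterly observations and 8 candidate predictors, of which only `x1` actually drives the response, and compares a parsimonious model with a full one by leave-one-out cross-validation. The helper `loocv_rmse` is a hypothetical name introduced here.

```r
set.seed(1)

# Simulated small sample: 16 quarters, 8 candidate predictors, only x1 matters.
n <- 16
X <- as.data.frame(matrix(rnorm(n * 8), nrow = n))
names(X) <- paste0("x", 1:8)
X$y <- 2 * X$x1 + rnorm(n)

# Leave-one-out cross-validation: refit without row i, then predict row i.
loocv_rmse <- function(formula, data) {
  errs <- sapply(seq_len(nrow(data)), function(i) {
    fit <- lm(formula, data = data[-i, ])
    data$y[i] - predict(fit, newdata = data[i, , drop = FALSE])
  })
  sqrt(mean(errs^2))
}

rmse_small <- loocv_rmse(y ~ x1, X)  # parsimonious model
rmse_large <- loocv_rmse(y ~ ., X)   # all 8 predictors
c(parsimonious = rmse_small, complex = rmse_large)
```

With so few response observations, the larger model will often look better in sample and worse out of sample, although that ordering is not guaranteed for any single random draw.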

  13. Increase sample size by interpolating monthly revenues
      • Using only the time series in question
      • Linear interpolation doesn’t capture intra-quarter variation
      • Spline interpolation is better, but curvature is controlled via arbitrary polynomial and knot settings, not informed by any external data
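
The difference between the two interpolation methods can be seen on a toy cumulative revenue series. The quarter-end values below are hypothetical; `approx()` and `spline()` are base R functions.

```r
# Hypothetical cumulative revenue at quarter boundaries (months 0, 3, 6, 9, 12).
quarter_end_month <- c(0, 3, 6, 9, 12)
cumulative_rev    <- c(0, 300, 630, 990, 1320)

months <- 0:12

# Linear interpolation of the cumulative series
lin <- approx(quarter_end_month, cumulative_rev, xout = months)$y
# Cubic spline interpolation of the same points
spl <- spline(quarter_end_month, cumulative_rev, xout = months)$y

# Differencing the cumulative curve gives the implied monthly revenues
monthly_linear <- diff(lin)
monthly_spline <- diff(spl)

monthly_linear  # three identical values within each quarter
monthly_spline  # varies within a quarter, driven only by the spline's shape
```

Both versions add up to the same quarterly totals; they differ only in how each total is spread across its three months.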

  14. Why R?
      • Native and near-native handling of time series data
      • zoo, xts
      • Date format nightmares fixed with lubridate
      • Rob J Hyndman's forecast package
      • TSclust and TSdist
      • Very specialized spline, constrained spline, localized regression, state space and other methods

  15. Real example, Fastenal
      Fastenal Company sells industrial and construction supplies to end-users (business-to-business), and also has some walk-in retail business. The Company's product offerings include fasteners and other industrial and construction supplies, many of which are sold under the Fastenal product name.

      Quarter | Revenues (MM)
      2011 Q1 | $641
      2011 Q2 | $702
      2011 Q3 | $727
      2011 Q4 | $698
      2012 Q1 | $769
      2012 Q2 | $805
      2012 Q3 | $803
      2012 Q4 | $757
      2013 Q1 | $806
      2013 Q2 | $848
      2013 Q3 | $858
      2013 Q4 | $814
      2014 Q1 | $877
      2014 Q2 | $950
      2014 Q3 | $981
      2014 Q4 | $926

  16. FAST Actuals – Monthly Reported

  17. FAST Actuals – Monthly Cumulative

  18. FAST Actuals – Monthly Cumulative De-trended

  19. FAST Actuals – Quarterly Cumulative De-trended

  20. FAST Actuals – Quarterly Cumulative De-trended – Interpolated (library(zoo), na.spline())
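
The cumulate, de-trend, and interpolate sequence from the preceding slides might look roughly like this. It is a sketch under two assumptions not stated in the slides: the trend is a straight line fitted by `lm()`, and base R's `spline()` stands in for `na.spline()` from zoo so that the example has no package dependency. The quarterly revenues are the ones listed on the Fastenal slide.

```r
# Quarterly revenues ($MM), 2011 Q1 to 2014 Q4, from the Fastenal slide
rev_q <- c(641, 702, 727, 698, 769, 805, 803, 757,
           806, 848, 858, 814, 877, 950, 981, 926)

# Quarter-end month index (3, 6, ..., 48) and cumulative revenue
t_q   <- seq(3, by = 3, length.out = length(rev_q))
cum_q <- cumsum(rev_q)

# De-trend: remove a straight line fitted to the cumulative series (assumed form)
trend       <- lm(cum_q ~ t_q)
detrended_q <- residuals(trend)

# Spline-interpolate the de-trended quarterly points at monthly steps
t_m         <- 3:48
detrended_m <- spline(t_q, detrended_q, xout = t_m)$y

# Add the trend back and difference to get estimated monthly revenues
cum_m <- detrended_m + predict(trend, newdata = data.frame(t_q = t_m))
rev_m <- diff(cum_m)  # estimates for months 4 to 48
```

Because the spline passes through every quarterly point, the three monthly estimates in each quarter still sum to the reported quarterly revenue.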

  21. FAST Actuals – Monthly Cumulative De-trended – Interpolated Errors

  22. FAST Actuals – Monthly Actuals vs. Interpolated – Errors

      Date   | FAST Actual | Linear Interpolation | % Error
      Apr-14 | $315,148    | $313,166             | 1%
      May-14 | $313,535    | $323,605             | 3%
      Jun-14 | $321,255    | $353,166             | 5%
      Jul-14 | $323,696    | $330,492             | 2%
      Aug-14 | $326,897    | $330,492             | 1%
      Sep-14 | $330,221    | $319,831             | 3%
      Oct-14 | $356,300    | $312,107             | 12%
      Nov-14 | $286,292    | $302,039             | 6%
      Dec-14 | $283,662    | $312,107             | 10%
      Jan-15 | $313,474    | $328,365             | 5%
      Feb-15 | $298,189    | $296,588             | 1%
      Mar-15 | $341,654    | $328,365             | 4%

      MAPE: 7%
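
For reference, the absolute percentage error and its mean (MAPE) can be computed as below. The helper name `mape` is introduced here for illustration, and the inputs are only the Jul-14 to Sep-14 rows of the table, so the result is not the full-sample figure reported on the slide.

```r
# Mean absolute percentage error, in percent
mape <- function(actual, estimate) {
  100 * mean(abs(estimate - actual) / abs(actual))
}

# Jul-14 to Sep-14 rows of the table
actual   <- c(323696, 326897, 330221)
estimate <- c(330492, 330492, 319831)

# Per-row absolute % error, rounded as in the table
row_error <- round(100 * abs(estimate - actual) / actual)
row_error

# MAPE over these three rows only
mape(actual, estimate)
```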

  23. Increase sample size by interpolating monthly revenues • Adding external datasets to help estimate monthly series • Fit a localized regression using external datasets as predictors • Use a combination approach of interpolation and localized regression

  24. Interpolation: Using external data
      • Compile a library of potentially related monthly time series from Quandl (could use other data feed providers such as Bloomberg)
      • Aggregate to a quarterly frequency
      • Compute a time series distance metric between company revenues and the Quandl time series (must account for low-frequency, non-stationary data)
      • Use distance-weighted monthly data to compute restricted spline curves between the quarterly data points with COBS
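
The aggregation and comparison steps can be sketched on simulated data. The slides do not say which distance metric was used, so the correlation of quarter-over-quarter log growth rates below is a stand-in chosen only because differencing removes a shared trend. All series and variable names are hypothetical.

```r
set.seed(42)

# Hypothetical data: 24 months of a candidate external series
months      <- 1:24
quarter_id  <- (months - 1) %/% 3 + 1
candidate_m <- 100 + 2 * months + 10 * sin(months / 2) + rnorm(24)

# Aggregate the monthly series to quarterly frequency
candidate_q <- as.numeric(tapply(candidate_m, quarter_id, sum))

# Hypothetical company revenue that loosely tracks the candidate series
company_q <- 2.5 * candidate_q + rnorm(8, sd = 20)

# Stand-in metric: compare growth rates rather than levels, so that a
# common upward trend does not by itself make two series look similar
growth     <- function(x) diff(log(x))
similarity <- cor(growth(company_q), growth(candidate_q))
distance   <- 1 - similarity
c(similarity = similarity, distance = distance)
```

In practice this calculation would be repeated for every candidate series, and the closest ones kept as predictors.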

  25. Quandl

  26. Search Quandl for matching data

  27. Relevant datasets from Quandl

      quandl_source | quandl_code | quandl_name | quandl_frequency | Distance Metric
      Federal Reserve | FRED/S4235SM144NCEN | Merchant Wholesalers, Except Manufacturers' Sales Branches and Offices Sales: Durable Goods: Metals | monthly | 0.93
      Federal Reserve | FRED/U26SVS | Value of Manufacturers' Shipments for Nondurable Goods Industries: Plastics and Rubber Products | monthly | 0.92
      Federal Reserve | FRED/U31AVS | Value of Manufacturers' Shipments for Durable Goods Industries: Primary Metals: Iron and Steel Mills | monthly | 0.92
      U.S. Census | USCENSUS/BI_MWTS_4246_SM | Chemicals and Allied Products: U.S. Total Not Seasonally Adjusted Sales - Monthly [Millions of Dolla | monthly | 0.90
      BLS | BLS/JTU30000000HIR | Job Openings and Labor Turnover Survey - Hires rate, Manufacturing (NSA) | monthly | 0.88

  28. Helper function to normalize external datasets

          library(dplyr)

          # Transform a few monthly predictor datasets into one dataset.
          # The transformation is based on each month's share of its quarter.
          # df_idxs_m_top and df_cie_q are assumed to be built earlier in the analysis.
          df_idxs_m_top_pred <- df_idxs_m_top %>%
            mutate(month_ratio = value / quartval) %>%   # month's share of the quarter
            inner_join(df_cie_q, by = "quart") %>%       # join the company's quarterly data
            select(quandl_code, var, pos, date, quart,
                   idx_value = value,
                   idx_cumval = cumval.x,
                   idx_quartval = quartval.x,
                   idx_month_ratio = month_ratio,
                   cie_quartval = quartval.y) %>%
            mutate(cie_value_pred = cie_quartval * idx_month_ratio) %>%  # monthly estimate
            ungroup()
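
The core of the helper is a month-ratio allocation, shown here for a single quarter in base R. The three index values are hypothetical; the quarterly revenue is Fastenal's 2011 Q2 figure from the earlier slide, and the variable names follow the helper's column names.

```r
# Hypothetical monthly values of one external index for a single quarter
idx_value    <- c(95, 100, 105)
idx_quartval <- sum(idx_value)

# Each month's share of the quarter for the index
idx_month_ratio <- idx_value / idx_quartval

# Fastenal 2011 Q2 revenue ($MM)
cie_quartval <- 702

# Allocate the company's quarterly revenue using the index's monthly shares
cie_value_pred <- cie_quartval * idx_month_ratio
cie_value_pred
```

The monthly estimates always sum back to the reported quarterly figure, since the ratios sum to one.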

  29. FAST Actuals Quarterly with External Monthly
      • Merchant Wholesalers, Except Manufacturers' Sales Branches and Offices Sales: Durable Goods: Metals
      • Value of Manufacturers' Shipments for Nondurable Goods Industries: Plastics and Rubber Products
      • Value of Manufacturers' Shipments for Durable Goods Industries: Primary Metals: Iron and Steel Mills
      • Chemicals and Allied Products: U.S. Total Not Seasonally Adjusted Sales – Monthly [Millions of Dollars]

  30. Monthly Revenue Interpolation
      Desired monthly revenue interpolation system:
      1. Guarantee a monotonically increasing solution
      2. Capture local density like a kernel smoother or local regression
      3. Guarantee that the solution passes through certain points, like an interpolation
      4. Penalize relative changes in curvature, not just total curvature
      5. Allow predictor weighting in the localized regressions

  31. COBS in action • Pointwise constrain the solutions to pass through quarterly cumulative points from target company • Mandate monotonically increasing solution • Use correlations as weights for the local regressions
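
The sketch below does not use COBS. As a simpler stand-in, it uses base R's `splinefun()` with `method = "hyman"`, which covers only the first two bullets: the curve passes through the quarterly cumulative points and is monotonically increasing. It has no weighting by external predictors, so it should not be read as the method behind the results on these slides. The revenues are Fastenal's 2011 quarters from the earlier slide.

```r
# Quarterly revenues ($MM) for 2011, from the Fastenal slide
rev_q <- c(641, 702, 727, 698)
t_q   <- c(0, 3, 6, 9, 12)     # month index at each quarter boundary
cum_q <- c(0, cumsum(rev_q))   # cumulative revenue at each boundary

# Monotone cubic spline that interpolates the quarterly cumulative points
f <- splinefun(t_q, cum_q, method = "hyman")

t_m   <- 0:12
cum_m <- f(t_m)
rev_m <- diff(cum_m)  # implied monthly revenues, never negative
rev_m
```

A monotone cumulative curve matters here because any dip in it would imply a negative monthly revenue.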

  32. COBS Solution

      date       | actuals | pred     | error
      4/1/2011   | 222.566 | 220.4966 | -1%
      5/1/2011   | 232.046 | 235.5108 | 1%
      6/1/2011   | 247.118 | 244.0034 | -1%
      7/1/2011   | 222.344 | 225.3296 | 1%
      8/1/2011   | 259.393 | 259.6066 | 0%
      9/1/2011   | 245.005 | 246.9389 | 1%
      10/1/2011  | 246.735 | 246.8107 | 0%
      11/1/2011  | 234.357 | 235.8555 | 1%
      12/1/2011  | 216.712 | 216.1679 | 0%
      1/1/2012   | 245.97  | 248.2574 | 1%
      2/1/2012   | 247.248 | 243.2598 | -2%
      3/1/2012   | 275.657 | 274.6227 | 0%
      4/1/2012   | 261.093 | 263.568  | 1%
      5/1/2012   | 274.836 | 274.8073 | 0%
      6/1/2012   | 268.961 | 270.5066 | 1%
      7/1/2012   | 261.796 | 257.2763 | -2%
      8/1/2012   | 290.447 | 292.2593 | 1%
      9/1/2012   | 250.335 | 245.8927 | -2%
      10/1/2012  | 288.54  | 284.7738 | -1%
      11/1/2012  | 253.493 | 253.2018 | 0%

      LOESS MAPE: 4.9%
      COBS MAPE: 1.9%

  33. To do:
      • Move the monotonically increasing constraint to a Bayesian prior
      Code available soon via GitHub
      Questions: gene.ekster@gmail.com
