Alternative Data in Finance
What is Alternative Data in Finance?
Example: Lodging Key Metrics
Occupancy x Room Rate ~ Revenues
Alternative Data
- Number of lights on (occupancy)
- Online room rates (room rate)
Alternative Dataset Examples
- 1. Point of sale transactions
- 2. Online behavior
- 3. Purchases
  - Online
  - Brick and mortar
- 4. Obscure public records
- 5. Satellite imagery
- 6. Etc.
Data Sourcing
- Direct data gathering
- Data vendors
- Just download the data (JDD)
Free Datasets
- http://aws.amazon.com/datasets
- http://databib.org
- http://datacite.org
- http://figshare.com
- http://linkeddata.org
- http://reddit.com/r/datasets
- http://thedatahub.org alias http://ckan.net
- http://quandl.com
- http://enigma.io
More here: http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
And even MORE Free-ish Datasets
- http://knoema.com
- http://www.google.com/publicdata/directory
- http://datahub.io
- http://datamob.org
- http://www.freebase.com
- http://www.xdayta.com
- http://www.redliondata.com
- http://opendata.arcgis.com
- http://www.bigdataexchange.com
- https://www.opensciencedatacloud.org/publicdata/
- https://opendata.socrata.com/
- http://www.data.gov
- http://www.factual.com/
Generating value with alternative data
- Revenue surprise estimates
- Operating GAAP measures
- Non-GAAP measures
- Churn, etc.
- Fully or partially automated quant strategies
- Non-equity asset classes
- PE could benefit from the same operating metrics for diligence
- PM Development and Big Data Thought Leadership
- Strategic Investments
- Marketing Tool for Raising Capital and Talent Recruitment
The small sample problem of big data
- The predictors are big data
- The response variable is small data
- Control model complexity
- Use common sense when modeling
- Parsimonious models
- Penalize complexity
- Test out of sample
- Bootstrap
- Cross validation
- Increase data points per model
- Reduce the number of models
- Use one model to describe the behavior of multiple companies
- Estimate intra-quarter data
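The out-of-sample testing idea above can be sketched in a few lines of base R. This is an illustrative sketch only: it reuses the 16 quarterly Fastenal revenues listed later in this deck, and the two candidate models (a straight-line trend and a 6th-degree polynomial) and the helper name `loocv_mape` are assumptions chosen to contrast a parsimonious model with a complex one.

```r
# Quarterly Fastenal revenues ($MM), 2011 Q1 to 2014 Q4, as listed in this deck
rev <- c(641, 702, 727, 698, 769, 805, 803, 757,
         806, 848, 858, 814, 877, 950, 981, 926)
dat <- data.frame(t = seq_along(rev), rev = rev)

# Leave-one-out cross validation: refit without point i, then predict point i
# (loocv_mape is a hypothetical helper name)
loocv_mape <- function(degree, data) {
  errs <- vapply(seq_len(nrow(data)), function(i) {
    fit  <- lm(rev ~ poly(t, degree), data = data[-i, ])
    pred <- predict(fit, newdata = data[i, , drop = FALSE])
    unname(abs(pred - data$rev[i]) / data$rev[i])
  }, numeric(1))
  100 * mean(errs)
}

# Assumed candidates: a straight-line trend versus a 6th-degree polynomial
cv_results <- c(linear = loocv_mape(1, dat), degree6 = loocv_mape(6, dat))
print(round(cv_results, 1))
```

With only 16 response points, each held-out quarter is a large share of the sample, which is why the comparison is made on held-out error rather than in-sample fit.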
Increase sample size by interpolating monthly revenues
- Using only the time series in question
- Linear interpolation doesn’t capture intra-quarter variation
- Spline interpolation is better, but curvature is controlled via arbitrary polynomial degree and knot settings, not informed by any external data
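The difference between the two interpolation methods can be seen in a small base R sketch. It assumes the interpolation is applied to cumulative revenue known only at quarter ends, and uses the four 2014 quarterly figures from the Fastenal table in this deck; the natural cubic spline is one possible spline choice, not necessarily the one used for the charts.

```r
# Quarterly revenues ($MM) for 2014, taken from the Fastenal table in this deck
q_rev <- c(877, 950, 981, 926)

# Cumulative revenue is known only at quarter ends (months 0, 3, 6, 9, 12)
q_month <- c(0, 3, 6, 9, 12)
q_cum   <- c(0, cumsum(q_rev))
months  <- 0:12

# Linear interpolation of the cumulative curve
cum_linear <- approx(q_month, q_cum, xout = months)$y
# Natural cubic spline through the same points (an assumed spline type)
cum_spline <- spline(q_month, q_cum, xout = months, method = "natural")$y

# Monthly revenue is the first difference of the cumulative curve
monthly <- data.frame(month  = 1:12,
                      linear = diff(cum_linear),
                      spline = diff(cum_spline))
print(round(monthly, 1))
```

The linear column repeats one value three times per quarter, so it carries no intra-quarter information. The spline column varies within the quarter, but only because of the curve's smoothness assumptions.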
Why R?
- Native and near native handling of time series data
- zoo, xts
- Date format nightmares fixed with lubridate
- Rob J Hyndman's forecast package
- TSclust and TSdist
- Very specialized spline, constrained spline, localized regression, state space and other methods
Fastenal Company sells industrial and construction supplies to end-users (business-to-business), and also has some walk-in retail business. The Company's product offerings include fasteners and other industrial and construction supplies, many of which are sold under the Fastenal product name.
Real example, Fastenal
Quarter    Revenues (MM)
2011 Q1    $641
2011 Q2    $702
2011 Q3    $727
2011 Q4    $698
2012 Q1    $769
2012 Q2    $805
2012 Q3    $803
2012 Q4    $757
2013 Q1    $806
2013 Q2    $848
2013 Q3    $858
2013 Q4    $814
2014 Q1    $877
2014 Q2    $950
2014 Q3    $981
2014 Q4    $926
FAST Actuals – Monthly Reported
FAST Actuals – Monthly Cumulative
FAST Actuals – Monthly Cumulative Detrended
FAST Actuals – Quarterly Cumulative Detrended
FAST Actuals – Quarterly Cumulative Detrended – Interpolated
library(zoo)
na.spline()
FAST Actuals – Monthly Cumulative Detrended – Interpolated Errors
FAST Actuals – Monthly Actuals vs. Interpolated - Errors
Date      FAST Actual   Linear Interpolation   % Error
Apr-14    $315,148      $313,166               1%
May-14    $313,535      $323,605               3%
Jun-14    $321,255      $353,166               5%
Jul-14    $323,696      $330,492               2%
Aug-14    $326,897      $330,492               1%
Sep-14    $330,221      $319,831               3%
Oct-14    $356,300      $312,107               12%
Nov-14    $286,292      $302,039               6%
Dec-14    $283,662      $312,107               10%
Jan-15    $313,474      $328,365               5%
Feb-15    $298,189      $296,588               1%
Mar-15    $341,654      $328,365               4%
MAPE                                           7%
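For reference, the error columns can be computed with a short helper. This sketch assumes the standard definition of mean absolute percentage error, which the deck does not spell out, and the function name `mape` is hypothetical. It is applied to three rows of the table above (Jul-14 to Sep-14) only, so it is not meant to reproduce the aggregate figure.

```r
# Mean absolute percentage error, in percent (standard definition, assumed)
mape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted), all(actual != 0))
  100 * mean(abs(predicted - actual) / abs(actual))
}

# Jul-14 to Sep-14 rows of the table above
actual <- c(323696, 326897, 330221)
interp <- c(330492, 330492, 319831)

# Per-month absolute percentage errors, then their mean
pct_err <- 100 * abs(interp - actual) / actual
print(round(pct_err, 1))
print(round(mape(actual, interp), 1))
```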
Increase sample size by interpolating monthly revenues
- Adding external datasets to help estimate monthly series
- Fit a localized regression using external datasets as predictors
- Use a combination approach of interpolation and localized regression
Interpolation: Using external data
- Compile a library of potentially related monthly time series from Quandl (could use other data feed providers such as Bloomberg)
- Aggregate to a quarterly frequency
- Compute a TS distance metric between company revenues and each Quandl time series (must account for low frequency and non-stationarity)
- Use distance-weighted monthly data to compute restricted spline curves between the quarterly data points with COBS
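The aggregation and distance steps above can be sketched in base R as follows. Everything here is hypothetical: the data are simulated rather than pulled from Quandl, the helper names `to_quarterly` and `ts_distance` are made up for illustration, and the distance (one minus the correlation of first differences) is an assumption, since the deck does not name the metric it used.

```r
set.seed(1)

# Hypothetical stand-ins: 16 quarters of company revenue and 48 months
# of two candidate external series (none of this is real Quandl data)
n_q <- 16
company_q <- 600 + 20 * seq_len(n_q) + rnorm(n_q, sd = 15)
candidates_m <- list(
  related   = rep(company_q / 3, each = 3) + rnorm(3 * n_q, sd = 5),
  unrelated = cumsum(rnorm(3 * n_q, sd = 10)) + 300
)

# Aggregate a monthly series to quarterly totals
to_quarterly <- function(x_m) {
  quarter_id <- rep(seq_len(length(x_m) / 3), each = 3)
  as.numeric(tapply(x_m, quarter_id, sum))
}

# Distance on first differences, so a shared trend alone does not make two
# series look close (assumption: 1 - correlation)
ts_distance <- function(a_q, b_q) {
  1 - cor(diff(a_q), diff(b_q))
}

distances <- vapply(candidates_m,
                    function(x_m) ts_distance(company_q, to_quarterly(x_m)),
                    numeric(1))
print(round(sort(distances), 2))
```

Differencing is one simple way to handle non-stationarity with so few quarterly points. The TSclust and TSdist packages mentioned earlier offer other dissimilarity measures.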
Quandl
Search Quandl for matching data
Relevant datasets from Quandl
quandl_source | quandl_code | quandl_name | quandl_frequency | Distance Metric
Federal Reserve | FRED/S4235SM144NCEN | Merchant Wholesalers, Except Manufacturers' Sales Branches and Offices Sales: Durable Goods: Metals | monthly | 0.93
Federal Reserve | FRED/U26SVS | Value of Manufacturers' Shipments for Nondurable Goods Industries: Plastics and Rubber Products | monthly | 0.92
Federal Reserve | FRED/U31AVS | Value of Manufacturers' Shipments for Durable Goods Industries: Primary Metals: Iron and Steel Mills | monthly | 0.92
U.S. Census | USCENSUS/BI_MWTS_4246_SM | Chemicals and Allied Products: U.S. Total Not Seasonally Adjusted Sales - Monthly [Millions of Dolla | monthly | 0.90
BLS | BLS/JTU30000000HIR | Job Openings and Labor Turnover Survey - Hires rate, Manufacturing (NSA) | monthly | 0.88
Helper function to normalize external datasets
# Transform a few monthly predictor datasets into one dataset.
# The transformation is based on each month's ratio within its quarter:
# calculate monthly ratios of quarterly results, then apply them to the
# company's quarterly values.
library(dplyr)

df_idxs_m_top_pred <- df_idxs_m_top %>%
  mutate(month_ratio = value / quartval) %>%   # Calculate month ratio in quarter
  inner_join(df_cie_q, by = c("quart")) %>%    # Join original quarter data
  select(quandl_code, var, pos, date, quart,
         idx_value = value,
         idx_cumval = cumval.x,
         idx_quartval = quartval.x,
         idx_month_ratio = month_ratio,
         cie_quartval = quartval.y) %>%
  mutate(cie_value_pred = cie_quartval * idx_month_ratio) %>%  # Calculate monthly data estimation
  ungroup()
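The ratio logic in the helper can be illustrated on a single quarter with base R. The numbers below are hypothetical, and the sketch assumes that `quartval` on the index side is the index's total for the quarter, which the helper does not state explicitly.

```r
# Hypothetical inputs: one quarter of company revenue and the three
# monthly values of an external index within that quarter
cie_quartval <- 950                  # company quarterly revenue ($MM)
idx_value    <- c(410, 455, 435)     # external monthly index values

# Share of the quarter that falls in each month, according to the index
# (assumes the index quarterly value is the sum of its three months)
idx_month_ratio <- idx_value / sum(idx_value)

# Monthly estimate: company quarterly total times the index's monthly share
cie_value_pred <- cie_quartval * idx_month_ratio
print(round(cie_value_pred, 1))
```

Because the shares sum to one, the three monthly estimates always add back to the reported quarterly figure.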
FAST Actuals Quarterly with External Monthly
- Merchant Wholesalers, Except Manufacturers' Sales Branches and Offices Sales: Durable Goods: Metals
- Value of Manufacturers' Shipments for Nondurable Goods Industries: Plastics and Rubber Products
- Value of Manufacturers' Shipments for Durable Goods Industries: Primary Metals: Iron and Steel Mills
- Chemicals and Allied Products: U.S. Total Not Seasonally Adjusted Sales – Monthly [Millions of Dollars]
Monthly Revenue Interpolation
Desired monthly revenue interpolation system:
- 1. Guarantee a monotonically increasing solution
- 2. Capture local density like a kernel smoother or local regression
- 3. Guarantee that the solution passes through certain points, like an interpolation
- 4. Penalize relative changes in curvature, not just total curvature
- 5. Allow predictor weighting in the localized regressions
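As a point of comparison, requirements 1 and 3 alone can be met with base R. The sketch below is not COBS: it is a simpler stand-in that uses monotone cubic interpolation (the Hyman filter in `splinefun`), and it ignores requirements 2, 4 and 5 because it uses no external predictors. The data are the 2014 quarterly revenues from the Fastenal table in this deck.

```r
# Quarterly revenues ($MM) for 2014, from the Fastenal table in this deck
q_rev   <- c(877, 950, 981, 926)
q_month <- c(0, 3, 6, 9, 12)
q_cum   <- c(0, cumsum(q_rev))   # cumulative revenue at quarter ends

# Monotone cubic interpolation (Hyman filter) from base R's stats package;
# a stand-in for illustration, not the COBS method
cum_fun <- splinefun(q_month, q_cum, method = "hyman")

months      <- 0:12
cum_monthly <- cum_fun(months)
monthly_rev <- diff(cum_monthly)   # implied monthly revenue, never negative
print(round(monthly_rev, 1))
```

The curve passes through every quarter-end point and never decreases, but its shape within each quarter is still driven only by the quarterly data, which is the gap the remaining requirements are meant to close.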
COBS in action
- Pointwise constrain the solutions to pass through quarterly cumulative points from target company
- Mandate monotonically increasing solution
- Use correlations as weights for the local regressions
COBS Solution
date         actuals    pred       error
4/1/2011     222.566    220.4966   -1%
5/1/2011     232.046    235.5108   1%
6/1/2011     247.118    244.0034   -1%
7/1/2011     222.344    225.3296   1%
8/1/2011     259.393    259.6066   0%
9/1/2011     245.005    246.9389   1%
10/1/2011    246.735    246.8107   0%
11/1/2011    234.357    235.8555   1%
12/1/2011    216.712    216.1679   0%
1/1/2012     245.97     248.2574   1%
2/1/2012     247.248    243.2598   -2%
3/1/2012     275.657    274.6227   0%
4/1/2012     261.093    263.568    1%
5/1/2012     274.836    274.8073   0%
6/1/2012     268.961    270.5066   1%
7/1/2012     261.796    257.2763   -2%
8/1/2012     290.447    292.2593   1%
9/1/2012     250.335    245.8927   -2%
10/1/2012    288.54     284.7738   -1%
11/1/2012    253.493    253.2018   0%
LOESS MAPE: 4.9%
COBS MAPE: 1.9%
Code available soon via GitHub
Questions: gene.ekster@gmail.com
To do:
- Move monotonically increasing constraint to a Bayesian prior