Alternative Data in Finance


slide-1
SLIDE 1

Alternative Data in Finance

slide-2
SLIDE 2

What is Alternative Data in Finance?

slide-3
SLIDE 3

Example: Lodging Key Metrics

Occupancy x Room Rate ~ Revenues

slide-4
SLIDE 4

Example: Lodging Key Metrics

Occupancy x Room Rate ~ Revenues

Number of lights on

slide-5
SLIDE 5

Example: Lodging Key Metrics

Occupancy x Room Rate ~ Revenues

Number of lights on

Online Room Rates

slide-6
SLIDE 6

Example: Lodging Key Metrics

Occupancy x Room Rate ~ Revenues

Number of lights on

Online Room Rates

Alternative Data
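As a rough illustration of the lodging identity, the sketch below plugs two alternative-data proxies into Occupancy x Room Rate ~ Revenues. Every number and variable name is hypothetical and chosen only to show the arithmetic.

```r
# Hypothetical inputs, not taken from any real hotel
rooms_available <- 200    # rooms in the property
lit_share       <- 0.72   # share of windows lit at night, used as an occupancy proxy
online_rate     <- 140    # average scraped nightly room rate in dollars
nights          <- 90     # nights in the quarter

# Occupancy x Room Rate ~ Revenues, scaled by capacity and the length of the period
est_revenue <- rooms_available * lit_share * online_rate * nights
print(est_revenue)
```

The estimate is only as good as the proxies: lit windows and listed rates approximate, but do not equal, sold rooms and realized rates.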

slide-7
SLIDE 7

Alternative Dataset Examples

  • 1. Point of sale transactions
  • 2. Online behavior
  • 3. Purchases
      • 1. Online
      • 2. Brick and mortar
  • 4. Obscure public records
  • 5. Satellite imagery
  • 6. Etc.
slide-8
SLIDE 8

Data Sourcing

  • Direct data gathering
  • Data vendors
  • Just download the data (JDD)
slide-9
SLIDE 9

Free Datasets

http://aws.amazon.com/datasets
http://databib.org
http://datacite.org
http://figshare.com
http://linkeddata.org
http://reddit.com/r/datasets
http://thedatahub.org alias http://ckan.net
http://quandl.com
http://enigma.io
More here: http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public

slide-10
SLIDE 10

And even MORE Free-ish Datasets

http://knoema.com
http://www.google.com/publicdata/directory
http://datahub.io
http://datamob.org
http://www.freebase.com
http://www.xdayta.com
http://www.redliondata.com
http://opendata.arcgis.com
http://www.bigdataexchange.com
https://www.opensciencedatacloud.org/publicdata/
https://opendata.socrata.com/
http://www.data.gov
http://www.factual.com/

slide-11
SLIDE 11

Generating value with alternative data

  • Revenue surprise estimates
  • Operating GAAP measures
  • Non-GAAP measures
  • Churn, etc.
  • Fully or partially automated quant strategies
  • Non-equity asset classes
  • Private equity (PE) could benefit from the same operating metrics for due diligence
  • PM Development and Big Data Thought Leadership
  • Strategic Investments
  • Marketing Tool for Raising Capital and Talent Recruitment
slide-12
SLIDE 12
slide-13
SLIDE 13

The small sample problem of big data

  • The predictors are big data
  • The response variable is small data
  • Control a model
  • Use common sense when modeling
  • Parsimonious models
  • Penalize complexity
  • Test out of sample
  • Bootstrap
  • Cross-validation
  • Increase data points per model
  • Reduce the number of models
  • Use one model to describe the behavior of multiple companies
  • Estimate intra-quarter data
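A minimal sketch of why parsimony and cross-validation matter when the response has only a handful of observations. It compares a linear fit with a degree-6 polynomial using leave-one-out cross-validation. The data are simulated, and the function name `loocv_rmse` is made up for this example.

```r
set.seed(1)
n   <- 16                                   # e.g. four years of quarterly observations
x   <- seq_len(n)
y   <- 600 + 20 * x + rnorm(n, sd = 25)     # simulated response, not real revenue
dat <- data.frame(x = x, y = y)

# Leave-one-out cross-validation: refit without point i, then predict point i
loocv_rmse <- function(degree, dat) {
  errs <- vapply(seq_len(nrow(dat)), function(i) {
    fit <- lm(y ~ poly(x, degree), data = dat[-i, ])
    dat$y[i] - predict(fit, newdata = dat[i, , drop = FALSE])
  }, numeric(1))
  sqrt(mean(errs^2))
}

cv <- c(linear = loocv_rmse(1, dat), degree6 = loocv_rmse(6, dat))
print(round(cv, 1))
```

With so few points, the flexible model has little data per parameter, so its out-of-sample error is the number to watch rather than its in-sample fit.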
slide-14
SLIDE 14

Increase sample size by interpolating monthly revenues

  • Using only the time series in question
  • Linear interpolation doesn’t capture intra-quarter variation
  • Spline interpolation is better, but curvature is controlled via arbitrary polynomial and knot settings, not informed by any external data
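To see why linear interpolation misses intra-quarter variation, the sketch below interpolates a cumulative quarterly series to monthly points with base R's approx(). The revenue figures are made up for illustration.

```r
# Hypothetical quarterly revenues ($MM); quarter ends fall at months 3, 6, 9 and 12
quarterly <- c(300, 330, 360, 330)
q_month   <- c(0, 3, 6, 9, 12)
cum_q     <- c(0, cumsum(quarterly))   # cumulative revenue, starting from zero

# Linear interpolation of the cumulative series at every month end
cum_m   <- approx(q_month, cum_q, xout = 0:12)$y
monthly <- diff(cum_m)                 # implied monthly revenues

print(matrix(monthly, nrow = 3,
             dimnames = list(paste0("m", 1:3), paste0("Q", 1:4))))
```

Every month simply receives one third of its quarter's revenue, so the interpolated series adds observations without adding information.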

slide-15
SLIDE 15

Why R?

  • Native and near-native handling of time series data
  • zoo, xts
  • Date format nightmares fixed with lubridate
  • Rob J Hyndman's forecast package
  • TSclust and TSdist
  • Very specialized spline, constrained spline, localized regression, state space and other methods

slide-16
SLIDE 16

Real example, Fastenal

Fastenal Company sells industrial and construction supplies to end-users (business-to-business), and also has some walk-in retail business. The Company's product offerings include fasteners and other industrial and construction supplies, many of which are sold under the Fastenal product name.

Revenues (MM)

Quarter   2011   2012   2013   2014
Q1        $641   $769   $806   $877
Q2        $702   $805   $848   $950
Q3        $727   $803   $858   $981
Q4        $698   $757   $814   $926

slide-17
SLIDE 17

FAST Actuals – Monthly Reported

slide-18
SLIDE 18

FAST Actuals – Monthly Cumulative

slide-19
SLIDE 19

FAST Actuals – Monthly Cumulative De-trended

slide-20
SLIDE 20

FAST Actuals – Quarterly Cumulative De-trended

slide-21
SLIDE 21

FAST Actuals – Quarterly Cumulative De-trended – Interpolated

library(zoo)
na.spline()
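A minimal sketch of spline interpolation on the cumulative series, using the Fastenal quarterly revenues from slide 16. It skips the de-trending step shown on the slides and uses base R's spline() in place of zoo's na.spline() so that the example has no package dependencies. The variable names are illustrative.

```r
# Fastenal quarterly revenues ($MM), 2011 Q1 to 2014 Q4, from slide 16
rev_q <- c(641, 702, 727, 698, 769, 805, 803, 757,
           806, 848, 858, 814, 877, 950, 981, 926)

q_end <- seq(3, by = 3, length.out = length(rev_q))  # month index of each quarter end
cum_q <- cumsum(rev_q)

# Cubic spline through the quarter-end cumulative points, evaluated at every month end.
# A zero starting point is added so the first quarter can be split as well.
cum_m <- spline(c(0, q_end), c(0, cum_q), xout = 0:max(q_end), method = "fmm")$y
rev_m <- diff(cum_m)                                  # estimated monthly revenues

# The spline passes through the knots, so the three monthly estimates
# in each quarter add back to the reported quarterly figure
check <- tapply(rev_m, rep(seq_along(rev_q), each = 3), sum)
print(round(head(rev_m, 6), 1))
```

The monthly split inside each quarter comes entirely from the curvature of the spline, which is the weakness the external-data approach later in the deck tries to address.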

slide-22
SLIDE 22

FAST Actuals – Monthly Cumulative De-trended – Interpolated Errors

slide-23
SLIDE 23

FAST Actuals – Monthly Actuals vs. Interpolated - Errors

Date     FAST Actual  Linear Interpolation  % Error
Apr-14   $315,148     $313,166              1%
May-14   $313,535     $323,605              3%
Jun-14   $321,255     $353,166              5%
Jul-14   $323,696     $330,492              2%
Aug-14   $326,897     $330,492              1%
Sep-14   $330,221     $319,831              3%
Oct-14   $356,300     $312,107              12%
Nov-14   $286,292     $302,039              6%
Dec-14   $283,662     $312,107              10%
Jan-15   $313,474     $328,365              5%
Feb-15   $298,189     $296,588              1%
Mar-15   $341,654     $328,365              4%
MAPE                                        7%

slide-24
SLIDE 24

Increase sample size by interpolating monthly revenues

  • Adding external datasets to help estimate monthly series
  • Fit a localized regression using external datasets as predictors
  • Use a combination approach of interpolation and localized regression
slide-25
SLIDE 25

Interpolation: Using external data

  • Compile a library of potentially related monthly time series from Quandl (could use other data feed providers such as Bloomberg)
  • Aggregate to a quarterly frequency
  • Compute a TS distance metric between company revenues and the Quandl time series (must account for low frequency and non-stationarity)
  • Use distance-weighted monthly data to compute restricted spline curves between the quarterly data points with COBS
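The first three steps can be sketched as follows. This is not the deck's actual metric: it assumes a simple similarity score, the correlation of quarter-over-quarter changes, as a stand-in for the TSdist/TSclust measures. The two external series are hypothetical substitutes for Quandl downloads.

```r
# First eight Fastenal quarterly revenues ($MM) from slide 16
rev_q   <- c(641, 702, 727, 698, 769, 805, 803, 757)
quarter <- rep(seq_along(rev_q), each = 3)      # quarter index of each month

# Two hypothetical monthly external series (stand-ins for Quandl downloads)
ext_m <- data.frame(
  related   = rep(rev_q, each = 3) / 6 + rep(c(1, -1), 12),
  unrelated = 100 + 10 * sin(2 * pi * (1:24) / 12)
)

# Step 1: aggregate each monthly series to quarterly totals
ext_q <- sapply(ext_m, function(v) tapply(v, quarter, sum))

# Step 2: compare quarter-over-quarter changes; differencing removes a shared
# trend that would otherwise inflate the correlation of non-stationary levels
similarity <- apply(ext_q, 2, function(v) cor(diff(v), diff(rev_q)))
print(round(similarity, 2))
```

Series with a high score would then be kept as predictors and weighted accordingly in the interpolation step.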

slide-26
SLIDE 26

Quandl

slide-27
SLIDE 27

Search Quandl for matching data

slide-28
SLIDE 28

Relevant datasets from Quandl

quandl_source    quandl_code               quandl_frequency  Distance Metric  quandl_name
Federal Reserve  FRED/S4235SM144NCEN       monthly           0.93             Merchant Wholesalers, Except Manufacturers' Sales Branches and Offices Sales: Durable Goods: Metals
Federal Reserve  FRED/U26SVS               monthly           0.92             Value of Manufacturers' Shipments for Nondurable Goods Industries: Plastics and Rubber Products
Federal Reserve  FRED/U31AVS               monthly           0.92             Value of Manufacturers' Shipments for Durable Goods Industries: Primary Metals: Iron and Steel Mills
U.S. Census      USCENSUS/BI_MWTS_4246_SM  monthly           0.90             Chemicals and Allied Products: U.S. Total Not Seasonally Adjusted Sales - Monthly [Millions of Dolla
BLS              BLS/JTU30000000HIR        monthly           0.88             Job Openings and Labor Turnover Survey - Hires rate, Manufacturing (NSA)

slide-29
SLIDE 29

Helper function to normalize external datasets

library(dplyr)  # provides %>%, mutate(), inner_join(), select(), ungroup()

# Combine several monthly predictor datasets into one dataset.
# The transformation is based on each month's share of its quarter.
df_idxs_m_top_pred <- df_idxs_m_top %>%
  mutate(month_ratio = value / quartval) %>%   # each month's ratio within its quarter
  inner_join(df_cie_q, by = c("quart")) %>%    # join the company's original quarterly data
  select(quandl_code, var, pos, date, quart,
         idx_value = value,
         idx_cumval = cumval.x,
         idx_quartval = quartval.x,
         idx_month_ratio = month_ratio,
         cie_quartval = quartval.y) %>%
  mutate(cie_value_pred = cie_quartval * idx_month_ratio) %>%  # monthly revenue estimate
  ungroup()
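A small worked example of the month-ratio idea in base R, separate from the helper's real inputs. The index values are hypothetical. The quarterly revenues are Fastenal's 2014 Q2 and Q3 figures from slide 16. The column names mirror the helper but the data frame itself is made up.

```r
# Hypothetical monthly index values for two quarters
idx <- data.frame(
  quart = rep(c("2014Q2", "2014Q3"), each = 3),
  value = c(98, 100, 102, 101, 103, 106)
)
cie_q <- c("2014Q2" = 950, "2014Q3" = 981)   # company quarterly revenues ($MM)

# Share of each month in its quarter's index total
idx$quartval    <- ave(idx$value, idx$quart, FUN = sum)
idx$month_ratio <- idx$value / idx$quartval

# Allocate the company's quarterly revenue using those shares
idx$cie_value_pred <- unname(cie_q[as.character(idx$quart)]) * idx$month_ratio
print(idx)
```

By construction the monthly estimates add back to each reported quarter, so the external index only shapes the split within the quarter.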

slide-30
SLIDE 30

FAST Actuals Quarterly with External Monthly

  • Merchant Wholesalers, Except Manufacturers' Sales Branches and Offices Sales: Durable Goods: Metals
  • Value of Manufacturers' Shipments for Nondurable Goods Industries: Plastics and Rubber Products
  • Value of Manufacturers' Shipments for Durable Goods Industries: Primary Metals: Iron and Steel Mills
  • Chemicals and Allied Products: U.S. Total Not Seasonally Adjusted Sales – Monthly [Millions of Dollars]

slide-31
SLIDE 31

Monthly Revenue Interpolation

Desired monthly revenue interpolation system:

  • 1. Guarantee a monotonically increasing solution
  • 2. Capture local density like a kernel smoother or local regression
  • 3. Guarantee that the solution passes through certain points, like an interpolation
  • 4. Penalize relative changes in curvature, not just total curvature
  • 5. Allow predictor weighting in the localized regressions
slide-32
SLIDE 32

COBS in action

  • Pointwise constrain the solutions to pass through the quarterly cumulative points from the target company
  • Mandate a monotonically increasing solution
  • Use correlations as weights for the local regressions
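The sketch below is not COBS. It uses base R's splinefun() with method = "hyman" as a dependency-free stand-in that meets only two of the requirements: a monotonically increasing curve and exact passage through the quarterly cumulative points. It has no weights and no external predictors. In the cobs package the corresponding controls are, to my understanding, the `constraint = "increase"`, `pointwise` and `w` arguments of cobs().

```r
# Fastenal quarterly revenues for 2011 ($MM), accumulated from a zero starting point
q_end <- c(0, 3, 6, 9, 12)
cum_q <- c(0, cumsum(c(641, 702, 727, 698)))

# Hyman-filtered cubic spline: passes through every knot and preserves monotonicity
f <- splinefun(q_end, cum_q, method = "hyman")

cum_m <- f(0:12)        # cumulative revenue at each month end
rev_m <- diff(cum_m)    # implied monthly revenues, non-negative because f is monotone
print(round(rev_m, 1))
```

Monotonicity of the cumulative curve is what keeps every implied monthly revenue non-negative, which an unconstrained spline does not guarantee.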

slide-33
SLIDE 33

COBS Solution

date        actuals   pred      error
4/1/2011    222.566   220.4966  -1%
5/1/2011    232.046   235.5108  1%
6/1/2011    247.118   244.0034  -1%
7/1/2011    222.344   225.3296  1%
8/1/2011    259.393   259.6066  0%
9/1/2011    245.005   246.9389  1%
10/1/2011   246.735   246.8107  0%
11/1/2011   234.357   235.8555  1%
12/1/2011   216.712   216.1679  0%
1/1/2012    245.97    248.2574  1%
2/1/2012    247.248   243.2598  -2%
3/1/2012    275.657   274.6227  0%
4/1/2012    261.093   263.568   1%
5/1/2012    274.836   274.8073  0%
6/1/2012    268.961   270.5066  1%
7/1/2012    261.796   257.2763  -2%
8/1/2012    290.447   292.2593  1%
9/1/2012    250.335   245.8927  -2%
10/1/2012   288.54    284.7738  -1%
11/1/2012   253.493   253.2018  0%

LOESS MAPE: 4.9%
COBS MAPE: 1.9%
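For reference, a minimal helper for mean absolute percentage error, assuming the MAPE figures here follow the standard definition. The function name `mape` is chosen for this example. The sample covered by the deck's reported figures is not stated, so the call below only illustrates usage on the first three rows of the COBS table.

```r
# Mean absolute percentage error, in percent
mape <- function(actual, pred) {
  stopifnot(length(actual) == length(pred), all(actual != 0))
  100 * mean(abs((actual - pred) / actual))
}

# First three rows of the COBS solution table (actuals, pred)
print(mape(c(222.566, 232.046, 247.118), c(220.4966, 235.5108, 244.0034)))
```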

slide-34
SLIDE 34

Code available soon via GitHub
Questions: gene.ekster@gmail.com

To do:

  • Move monotonically increasing constraint to a Bayesian prior