Alternative Data in Finance

Example: Lodging
Key Metrics: Occupancy x Room Rate ~ Revenues
Alternative Data:
• Number of lights on (proxy for Occupancy)
• Online room rates (proxy for Room Rate)
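The relationship on this slide can be sketched in a few lines. This is only an illustration: the function name, the window counts, the room count and the room rate below are all hypothetical, and a real estimate would need many properties and a calibration against reported revenue.

```python
def estimate_room_revenue(rooms_available, nights, occupancy, avg_room_rate):
    """Rough revenue estimate: rooms x nights x occupancy x room rate."""
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must be a fraction between 0 and 1")
    return rooms_available * nights * occupancy * avg_room_rate


# Hypothetical alternative-data inputs (not from the presentation).
lit_windows = 140        # windows observed with lights on
total_windows = 200      # windows observed in total
occupancy_proxy = lit_windows / total_windows

scraped_room_rate = 150.0  # hypothetical average online room rate, in dollars

quarterly_estimate = estimate_room_revenue(
    rooms_available=200,
    nights=90,
    occupancy=occupancy_proxy,
    avg_room_rate=scraped_room_rate,
)
print(f"Estimated quarterly room revenue: {quarterly_estimate:,.0f}")
```

The point of the sketch is that each factor of the key metric is replaced by an observable proxy, so the estimate can be updated before the company reports.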

Alternative Data
1. Point of sale transactions
2. Online behavior
3. Purchases
   1. Online
   2. Brick and mortar
4. Obscure public records
5. Drone footage analysis ;)
6. Etc etc etc

Supply Chain
1. Data Vendors / Suppliers
2. Aggregators and Analysts
3. Clients / Funds

Outline
• Basic Example (done)
• What's Alternative Big Data (done)
• Sourcing
• Compliance and ethics
• Predicting revenue and other uses
• Walk-through of common technical challenges
• Basic trading strategy
• Q & A

Data Sourcing
• Direct data gathering
• Data vendors
• Just download the data (JDD)

Data gathering / Sourcing
• Harvest the web
• Primary Research

Harvesting: Build or Buy?
Build:
• Control over compliance procedures
• All IP and harvesting target information stays in house
• Complete control over costs
Buy:
• Faster to scale
• Back data
• Risk mitigated by an intermediary
• Some structuring of the data done by vendor
• Leverage vendors’ expertise in the data and spidering
* Tip for finding web harvesting firms: Look on LinkedIn for folks with web scraping skills and see who they work for.

Harvesting: Semantic web
• Diffbot recognizes the content of web pages
• Compares against schema.org’s structures
• Automatically collects structured data without explicit structure definitions
• Adjusts for changes in page layouts

Primary Research
• Expert networks
• Surveys
• New ways to look at the world
  • Receipts
  • Serial numbers
  • Alexa or other web monitoring tools
  • Google Trends
  • Classified
  • Drone footage

Evaluating Datasets
• Scarcity
  • How widely used or marketed is it?
• Granularity
  • Time
  • Aggregation levels
• How structured is it?
• Coverage
  • Sectors / Stocks
    – Hedge fund motels?
  • Geo
* Creating a standardized quantitative scoring system or ROI matrix to evaluate datasets based on these criteria is a worthwhile endeavor
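One way such a scoring system could look is a weighted average over the criteria on this slide. This is a minimal sketch: the weights, the 1 to 5 rating scale, the function name and the example ratings are all assumptions made for illustration, not values from the presentation.

```python
# Hypothetical weights per criterion; they need not sum to 1.
WEIGHTS = {
    "scarcity": 0.4,
    "granularity": 0.2,
    "structure": 0.1,
    "coverage": 0.3,
}


def score_dataset(ratings, weights=WEIGHTS):
    """Weighted average of per-criterion ratings (for example on a 1-5 scale)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    weighted_sum = sum(weights[c] * ratings[c] for c in weights)
    return weighted_sum / sum(weights.values())


# Hypothetical dataset: scarce, moderately granular, poorly structured.
example_ratings = {"scarcity": 5, "granularity": 3, "structure": 2, "coverage": 4}
print(round(score_dataset(example_ratings), 2))
```

Dividing by the sum of the weights keeps scores comparable if the weights are later changed.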

Evaluating Vendors
• Companies monetizing their exhaust data
  • High quality high margin revenue
  • Upstream insights from buyer
• Traditional data vendors
  • Survey data
  • Financial data aggregation
• Hybrids
  • 1010 / ITG

Free Datasets
http://aws.amazon.com/datasets
http://databib.org
http://datacite.org
http://figshare.com
http://linkeddata.org
http://reddit.com/r/datasets
http://thedatahub.org alias http://ckan.net
http://quandl.com
http://enigma.io
Hundreds more!
http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public

High opportunity datasets
• International
  • Asia
  • Latam
• Insight into margins
  • Companies are more EPS surprise sensitive than revenue surprise sensitive
  • COGS
  • SG&A
  • Etc
• B2B

Compliance overview
• Intent / Ethics
• Regulatory

Compliance overview
[Diagram: Data Vendor, Restricted Environment, PII Scrubbing Process / Encrypted Archiving, Production Environment, Organization]

Compliance overview: Guidelines / Control Frameworks
• NIST 800-122
• GLBA (Gramm-Leach-Bliley Act)
• COBIT 5
• COSO 2013

Compliance overview
• Just use regular expressions
^(?:(?=.*\d)(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[^A-Za-z0-9])(?=.*[a-z])|(?=.*[^A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9]))(?!.*(.)\1{2,})[A-Za-z0-9!~<>,;:_=?*+#."&§%°()\|\[\]\-\$\^\@\/]{8,32}
[a-zA-Z]:|\\)\\)?(((\.)|(\.\.)|([^\\/:*?"|<>. ](([^\\/:*?"|<>. ])|([^\\/:*?"|<>]*[^\\/:*?"|<>. ]))?))\\)*[^\\/:*?"|<>. ](([^\\/:*?"|<>. ])|([^\\/:*?"|<>]*[^\\/:*?"|<>. ]))?
((25[0-5]|2[0-4][0-9]|19[0-1]|19[3-9]|18[0-9]|17[0-1]|17[3-9]|1[0-6][0-9]|1[1-9]|[2-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9%[0-9A-Fa-f]{2}|[-()_.!~*';/?:@&=+$,A-Za-z0-9])+)([).!';/?:
^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$
* Use RegexBuddy.
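A much smaller version of regex-based PII scrubbing might look like the sketch below. The two patterns, the placeholder tokens and the function name are assumptions chosen for readability; they are deliberately simpler than the patterns on the slide, and regular expressions alone are unlikely to catch every form of PII.

```python
import re

# Simplified email pattern (an assumption; not the slide's pattern).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# IPv4 address with each octet limited to 0-255.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b")


def scrub_pii(text):
    """Replace email addresses and IPv4 addresses with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = IPV4_RE.sub("[IP]", text)
    return text


print(scrub_pii("Contact jane.doe@example.com from 192.168.0.1"))
```

Running the scrubber inside the restricted environment, before data reaches production, keeps raw identifiers out of the research datasets.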

Compliance overview: Web Harvesting Precedent Cases
• Major cases (and the majority of cases). It's uncharted territory
• Feist Publications, Inc. v. Rural Telephone Service Co.
• Ryanair scraping cases
• eBay v. Bidder's Edge
• Intel v. Hamidi
• Cases discussing browsewrap vs clickwrap
• Cvent, Inc. v. Eventbrite, Inc.
• Craigslist v. 3Taps
• These do not apply to investment research

Compliance overview
• Respect the website’s TOS, especially if it is presented as a clickwrap
• Have a sensible web harvesting policy
• Address incoming complaints
• Limit the number of HTTP requests
• Stay current on laws and cases
• Explicitly address headline risk and regulatory risk; create a cost-benefit analysis for headline risk
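Limiting the number of HTTP requests can be as simple as enforcing a minimum delay between calls. The class below is a minimal sketch under that assumption; the class name, the two-second interval and the injectable clock and sleep functions are all hypothetical choices, and the actual HTTP call is left as a comment.

```python
import time


class RequestThrottle:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval_s
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = RequestThrottle(min_interval_s=2.0)  # hypothetical policy
    for page in ("page-1", "page-2"):
        throttle.wait()
        # The actual HTTP request for `page` would go here.
        print("fetching", page)
```

A fixed delay is the simplest policy; a written harvesting policy might instead set a per-site budget agreed with compliance.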

Generating value with alternative data
• Revenue surprise estimates
• Operating GAAP measures
• Non-GAAP measures
  • Churn, etc
• Fully or partially automated quant strategies
• Non-equity asset classes
• PE could benefit from the same operating metrics for diligence
• PM Development and Big Data Thought Leadership
• Strategic Investments
• Marketing Tool for Raising Capital and Talent Recruitment

Workflow and Process
• Data Partners
• Web Collection
• Storage optimization
• Cleansing
• Benchmarking
• De-biasing, Enrichment, Normalization
• GAAP / Operating Metrics
• Quant Signals
• Investment Thesis
• Metrics Reporting
• R&D
• Published Signal Deliverable
[Diagram labels: Third Party Data Sources, Data Vendors, Data Acquisition, Raw Data, High Performance Computing, Data Analysts & Sector Research, Quant Research, Interpretive Modeling, Published Signal Production, Visualizations, Metrics, Insights, L/S Teams, Quant Teams, R&D Portfolio]

The shifting bias longitudinal panel problem
[Figure: two user-by-month grids, Users 1 to 10 across Jan to Dec. Top: Full Panel. Bottom: Panel with user add and churn (missing data MAR).]

The 200k and the ~800k are different
Solutions:
• Imputation
• Complete case analysis
• Weighting methods
[Chart: Total Spend Index (0 to 5) by month, Jan to Dec. Series: the complete panel (~200k users); users who have the second year of data, but not the first. Dashed line: 95% confidence, N(μ, σ²).]
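To make the difference between complete case analysis and a weighting method concrete, the sketch below compares a naive panel mean with a post-stratified mean. Post-stratification is one weighting method chosen here for illustration; the presentation does not say which method was used, and the strata, spend values and population shares are invented.

```python
from collections import defaultdict


def naive_mean(panel):
    """Unweighted mean spend over the users who happen to be in the panel."""
    return sum(spend for _, spend in panel) / len(panel)


def poststratified_mean(panel, population_share):
    """Mean spend with each stratum reweighted to its population share."""
    by_stratum = defaultdict(list)
    for stratum, spend in panel:
        by_stratum[stratum].append(spend)
    missing = set(population_share) - set(by_stratum)
    if missing:
        raise ValueError(f"no panel users in strata: {sorted(missing)}")
    return sum(
        share * sum(by_stratum[s]) / len(by_stratum[s])
        for s, share in population_share.items()
    )


# Hypothetical complete panel that over-represents heavy spenders.
panel = [("heavy", 100.0), ("heavy", 120.0), ("heavy", 110.0), ("light", 20.0)]
# Hypothetical shares of each stratum in the population of interest.
shares = {"heavy": 0.5, "light": 0.5}

print(naive_mean(panel))                   # biased toward heavy spenders
print(poststratified_mean(panel, shares))  # reweighted estimate
```

The gap between the two numbers is the kind of bias that appears when the users who stay in the panel differ from those who join or churn.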

Complete Panel and the rest of users are different
[Figure: Panels 1 to 10 laid out across Jan to Dec. Panel 1: >200K Users (680K). Panel 2: 720K. Panel 3: 740K. Panel 4: 760K.]
• Many users are the same, ~90% overlap
• The further apart the panels, the less user overlap. P1 – P22: only ~32% overlap, most users different
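The overlap figures quoted for the panels can be computed directly from user IDs. A minimal sketch, assuming each panel is available as a collection of user IDs; the function name and the toy panels are hypothetical.

```python
def overlap_share(panel_a, panel_b):
    """Share of panel_a's users who also appear in panel_b."""
    users_a, users_b = set(panel_a), set(panel_b)
    if not users_a:
        raise ValueError("panel_a has no users")
    return len(users_a & users_b) / len(users_a)


# Toy panels of user IDs (hypothetical): each later panel adds and loses users.
panel_1 = range(0, 10)
panel_2 = range(1, 11)    # adjacent panel, one user churned and one added
panel_8 = range(7, 17)    # distant panel, most users are different

print(overlap_share(panel_1, panel_2))
print(overlap_share(panel_1, panel_8))
```

Tracking this share for every pair of panels shows how quickly the user base drifts and how far apart two panels can be before they are no longer comparable.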

Multivariate Time Series Clustering
[Figure: four dual-axis charts, one each for User A, User B, User C and User D, plotting Sum of Cnt and Sum of DPT over time.]

Multivariate Time Series Clustering
The pdc package uses the permutation distribution, which is a measure of the complexity of a time series. The similarity of two time series is constructed as the distance between their permutation distributions. It allows us to make groupings, based on multiple variables, over time.
library(pdc)
clust <- pdclust(datamatrix, m = 4)  # m is the embedding dimension
plot(clust, cols = c("red", "blue", "red", "blue"))
[Figure: dendrogram with leaves User A, User B, User C, User D]
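To show what a permutation distribution is, here is a small univariate sketch in plain Python. It is not the pdc implementation: the function names are hypothetical, the time delay is fixed at 1, and the Hellinger distance is used as one reasonable way to compare two distributions.

```python
import math
from collections import Counter
from itertools import permutations


def permutation_distribution(series, m=3):
    """Relative frequency of each ordinal pattern of length m in the series."""
    if len(series) < m:
        raise ValueError("series is shorter than the embedding dimension m")
    counts = Counter()
    for i in range(len(series) - m + 1):
        window = series[i:i + m]
        # Ordinal pattern: the window positions ordered by their values.
        pattern = tuple(sorted(range(m), key=lambda k: window[k]))
        counts[pattern] += 1
    total = sum(counts.values())
    return {p: counts[p] / total for p in permutations(range(m))}


def hellinger(p, q):
    """Hellinger distance between two distributions over the same patterns."""
    return math.sqrt(sum((math.sqrt(p[k]) - math.sqrt(q[k])) ** 2 for k in p) / 2)


rising = [1, 2, 3, 4, 5, 6]
also_rising = [10, 20, 25, 40, 41, 60]
falling = [6, 5, 4, 3, 2, 1]

d_rising = permutation_distribution(rising)
print(hellinger(d_rising, permutation_distribution(also_rising)))
print(hellinger(d_rising, permutation_distribution(falling)))
```

Because only the ordering of values inside each window matters, two series with very different scales but the same shape end up with the same distribution, which is why the approach suits comparing users with different spend levels.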

User dropout in a longitudinal panel
• We cluster each panel
• Can use multivariate time series clustering like pdclust
• Cluster on number of transactions and avg transaction amount, low covariance features
• Each panel’s cluster boundaries are independently defined
[Figure: Panel 1 and Panel 2, each split into Clusters 1 to 5, across January to October.]
