
Predicting Risk from Financial Reports with Regression
Shimon Kogan, University of Texas at Austin
Dimitry Levin, Carnegie Mellon University
Bryan R. Routledge, Carnegie Mellon University
Jacob S. Sagi, Vanderbilt University
Noah A. Smith, Carnegie Mellon University

Talk In A Nutshell
financial risk = f(financial report)
• financial risk: volatility of returns
• f: SV regression
• financial report: Form 10-K, Item 7

What This Talk Isn’t and Is New statistical models for NLP ... Exciting text domains like political blogs ... Advances in applications like translation and summarization ...

What This Talk Isn’t and Is
• New statistical models for NLP ... (Shay Cohen, 10:40 am yesterday)
• Exciting text domains like political blogs ... (Tae Yano, 10:40 am tomorrow)
• Advances in applications like translation and summarization ... (Ashish Venugopal, right now; André Martins, 11 am Thursday)


What This Talk Isn’t and Is
• Isn’t: New statistical models for NLP ... Is: Bag of terms representation and SVR model.
• Isn’t: Exciting text domains like political blogs ... Is: Boring (to read) text domain of financial reports.
• Isn’t: Advances in applications like translation and summarization ... Is: Under-explored application: forecasting.

See Also ... • Lavrenko et al. (2000), Koppel and Shtrimberg (2004), and others: prices • Blei and McAuliffe (2007): popularity • Lerman et al. (2008): prediction markets

Outline • Mini-lesson in finance • A new text-driven forecasting task • Regression models trained on text • Experimental results and analysis • Outlook

Finance Allocation of wealth (e.g., money) across time and risk (states of nature).

Finance From an NLP perspective: crucial information about your investments that’s buried in documents you’d rather not read.

financial risk = f(financial report)

financial risk = f(financial report), where financial risk is the volatility of returns

What is Risk?
• Return on day t:
$$r_t = \frac{\text{closingprice}_t + \text{dividends}_t}{\text{closingprice}_{t-1}} - 1$$
• Sample standard deviation from day t - τ to day t:
$$v_{[t-\tau,\,t]} = \sqrt{\sum_{i=0}^{\tau} \frac{(r_{t-i} - \bar{r})^2}{\tau}}$$
• This is called measured volatility.
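The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the price and dividend numbers are made up, and the window is taken to be whatever list of returns is passed in (τ + 1 returns, divided by τ).

```python
import math

def daily_returns(closing_prices, dividends):
    """r_t = (closingprice_t + dividends_t) / closingprice_{t-1} - 1."""
    return [
        (closing_prices[t] + dividends[t]) / closing_prices[t - 1] - 1.0
        for t in range(1, len(closing_prices))
    ]

def measured_volatility(returns):
    """Sample standard deviation over tau + 1 returns (divisor tau)."""
    tau = len(returns) - 1
    if tau < 1:
        raise ValueError("need at least two returns")
    mean = sum(returns) / len(returns)
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / tau)

# Hypothetical closing prices and dividends for four trading days.
prices = [100.0, 102.0, 101.0, 103.0]
divs = [0.0, 0.0, 0.5, 0.0]

rets = daily_returns(prices, divs)
vol = measured_volatility(rets)
```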

Why Not Predict Returns, Get Rich, Retire Early? • Hard: predicting a stock’s performance. • To predict returns, we would need to find new information. • Our reports probably don’t contain new information (10-Ks do not precede big price changes).

Will This Talk Make Anyone Rich? • Some people think you can exploit accurate volatility predictions. • I’m not really qualified to give financial advice. • Consulting to portfolio/wealth managers is a huge industry.

So Then Why Do Finance Researchers Care? • Models of economics and finance treat information simplistically. • No notion of extracting information from large amounts of raw data. • These reports are produced at huge expense. Are they worth it?

Important Property of Volatility • Autoregressive conditional heteroscedasticity: volatility tends to be stable (over horizons like ours). • v[t - τ, t] is a strong predictor of v[t, t + τ]. • This is our strong baseline.

financial risk = f(financial report), where financial risk is the volatility of returns and the financial report is Form 10-K, Item 7

Form 10-K, Item 7 (General Motors Corp., March 5, 2009)
Item 7. Management’s Discussion and Analysis of Financial Condition and Results of Operations
Overview
We are primarily engaged in the worldwide production and marketing of cars and trucks. We operate in two businesses, consisting of our automotive operations, which we also refer to as Automotive, GM Automotive or GMA, that includes our four automotive segments consisting of GMNA, GME, GMLAAM and GMAP, and our financing and insurance operations (FIO). Our finance and insurance operations are primarily conducted through GMAC, a wholly-owned subsidiary through November 2006. On November 30, 2006, we sold a 51% controlling ownership interest in GMAC to a consortium of investors. After the sale, we have accounted for our 49% ownership interest in GMAC under the equity method. GMAC provides a broad range of financial services, including consumer vehicle financing, automotive dealership and other commercial financing, residential mortgage services, automobile service contracts, personal automobile insurance coverage and selected commercial insurance coverage.
Automotive Industry
In 2008, the global automotive industry has been severely affected by the deepening global credit crisis, volatile oil prices and the recession in North America and Western Europe, decreases in the employment rate and lack of consumer confidence. The industry continued to show growth in Eastern Europe, the LAAM region and in Asia Pacific, although the growth in these areas moderated from previous levels and is beginning to show the effects of the credit market crisis which began in the United States and has since spread to Western Europe and the rest of the world. Global industry vehicle sales to retail and fleet customers were 67.1 million units in 2008, representing a 5.1% decrease compared to 2007. We expect industry sales to be approximately 57.5 million units in 2009.

Our Corpus • Edgar database at http://www.sec.gov • 26,806 examples of Item 7, 1996-2006 • 247.7 million words in total • http://www.ark.cs.cmu.edu/10K

“Annotation” • For each report at time t, we gathered • “Historical” volatility: v [t - 1y, t] • “Future” volatility: v [t, t + 1y] • Source: Center for Research in Security Prices U.S. Stocks Databases

Methodology • Input: Item 7 and/or historical volatility • Output: predicted future volatility • Test on (input, output) pairs from year Y • Train on (input, output) from years < Y • Evaluation: MSE of (log) volatility
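As a rough illustration of this protocol, the sketch below splits examples by year and scores the history-only predictor (future volatility predicted as the historical value) with MSE of log volatility. The record layout (`year`, `hist`, `future`) and all numbers are hypothetical, not taken from the corpus.

```python
import math

def mse_log_volatility(predicted, actual):
    """Mean squared error between log-predicted and log-actual volatility."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(
        (math.log(p) - math.log(a)) ** 2 for p, a in zip(predicted, actual)
    ) / len(actual)

def split_by_year(examples, test_year):
    """Train on years before test_year, test on test_year itself."""
    train = [e for e in examples if e["year"] < test_year]
    test = [e for e in examples if e["year"] == test_year]
    return train, test

# Hypothetical (historical, future) volatility records.
examples = [
    {"year": 2001, "hist": 0.030, "future": 0.028},
    {"year": 2002, "hist": 0.028, "future": 0.035},
    {"year": 2003, "hist": 0.035, "future": 0.035},
    {"year": 2003, "hist": 0.020, "future": 0.010},
]

train, test = split_by_year(examples, 2003)

# History-only predictor: next year's volatility equals last year's.
baseline_mse = mse_log_volatility(
    [e["hist"] for e in test], [e["future"] for e in test]
)
```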

financial risk = f(financial report), where financial risk is the volatility of returns, f is SV regression, and the financial report is Form 10-K, Item 7

Support-Vector Regression (Drucker et al., 1997)
• Predicted future volatility is a function of a document (Item 7), d, and a weight vector w:
$$\hat{v} = f(d; w)$$
• The training criterion:
$$\min_{w \in \mathbb{R}^d} \; \frac{1}{2}\|w\|^2 + \frac{C}{N} \sum_{i=1}^{N} \max\left(0, \left|v_i - f(d_i; w)\right| - \epsilon\right)$$
The first term regularizes; each term in the sum is zero when the prediction is within ε of correct.
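To make the training criterion concrete, here is a minimal sketch that only evaluates the objective for a given w; it does not solve the minimization. It assumes the linear form f(d; w) = h(d)ᵀw, and the feature vectors, targets, C, and ε below are made-up values.

```python
def svr_objective(w, features, targets, C, epsilon):
    """0.5 * ||w||^2 + (C / N) * sum_i max(0, |v_i - h_i . w| - epsilon)."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    total_loss = 0.0
    for h, v in zip(features, targets):
        prediction = sum(wj * hj for wj, hj in zip(w, h))
        # Epsilon-insensitive loss: errors smaller than epsilon cost nothing.
        total_loss += max(0.0, abs(v - prediction) - epsilon)
    return regularizer + C * total_loss / len(targets)

# Hypothetical toy problem with two documents and two features.
w = [1.0, 0.0]
features = [[1.0, 0.0], [2.0, 1.0]]
targets = [1.05, 3.0]

objective = svr_objective(w, features, targets, C=2.0, epsilon=0.1)
```

The first document is predicted within ε of its target and contributes no loss; only the second one does.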

Representation
$$f(d; w) = h(d)^\top w = \sum_{i=1}^{N} \alpha_i \, h(d)^\top h(d_i) = \sum_{i=1}^{N} \alpha_i \, K(d, d_i)$$
• Vector-space model (tf, tfidf, etc.)
• So far, unigrams and bigrams
• Linear kernel (for interpretability)
Dual form of the weights:
$$w = \sum_{i=1}^{N} \alpha_i \, h(d_i)$$
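The point of the linear kernel is that the dual coefficients collapse into one explicit weight per term, which is what makes the model inspectable. The sketch below checks, on made-up vectors and coefficients, that the primal prediction h(d)ᵀw and the dual prediction Σ αᵢ K(d, dᵢ) agree when K is the plain dot product.

```python
def dot(a, b):
    """Linear kernel: K(d, d_i) = h(d) . h(d_i)."""
    return sum(x * y for x, y in zip(a, b))

def primal_weights(alphas, train_vectors):
    """w = sum_i alpha_i * h(d_i)."""
    dim = len(train_vectors[0])
    return [
        sum(a * h[j] for a, h in zip(alphas, train_vectors))
        for j in range(dim)
    ]

def predict_dual(alphas, train_vectors, h_d):
    """f(d) = sum_i alpha_i * K(d, d_i)."""
    return sum(a * dot(h_d, h_i) for a, h_i in zip(alphas, train_vectors))

# Hypothetical training vectors h(d_i) and dual coefficients alpha_i.
train_vectors = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
alphas = [0.5, -0.25]
h_new = [2.0, 1.0, 1.0]

w = primal_weights(alphas, train_vectors)
primal = dot(h_new, w)
dual = predict_dual(alphas, train_vectors, h_new)
```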

Experiment • Test on year Y. • Train on (Y - 5, Y - 4, Y - 3, Y - 2, Y - 1). • Six such splits. • Compare history-only baseline, text-only SVR, combined SVR.
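The rolling splits can be enumerated as below. The helper name is hypothetical, and the test years 2001 through 2006 are taken from the results slide.

```python
def rolling_splits(test_years, window=5):
    """Pair each test year Y with its training years Y - window .. Y - 1."""
    return [(list(range(y - window, y)), y) for y in test_years]

splits = rolling_splits(range(2001, 2007))

for train_years, test_year in splits:
    print(f"train on {train_years[0]}-{train_years[-1]}, test on {test_year}")
```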

MSE of Log-Volatility
[Figure: bar chart of MSE (lower is better) for History, Text, and Text + History on test years 2001 through 2006 and the micro-average; y-axis from 0.120 to 0.210; some bars are marked with *.]
Using “log(1+freq.)” representation on all unigrams and bigrams. See paper.
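A “log(1+freq.)” representation over unigrams and bigrams might look like the sketch below. Tokenization by whitespace and the function name are assumptions for illustration; the actual preprocessing is described in the paper, not here.

```python
import math
from collections import Counter

def log1p_ngram_features(tokens):
    """Map each unigram and bigram to log(1 + its count in the document)."""
    counts = Counter(tokens)                   # unigrams, keyed by string
    counts.update(zip(tokens, tokens[1:]))     # bigrams, keyed by (w1, w2)
    return {term: math.log(1 + c) for term, c in counts.items()}

# Hypothetical snippet of report text.
tokens = "net loss and net loss".split()
features = log1p_ngram_features(tokens)
```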

Dominant Weights (2000-4)
high volatility words        low volatility words
loss             0.025       net income           -0.021
net loss         0.017       rate                 -0.017
year #           0.016       properties           -0.014
expenses         0.015       dividends            -0.013
going concern    0.014       lower interest       -0.012
a going          0.013       critical accounting  -0.012
administrative   0.013       insurance            -0.011
personnel        0.013       distributions        -0.011


Changes Over Time
[Figure: average length of Item 7 by year, ‘96 through ‘06; y-axis from 0 to 13,000.]

2002 • Enron and other accounting scandals • Sarbanes-Oxley Act of 2002 • Longer reports • Are the reports more informative after 2002? Because of Sarbanes-Oxley?

Changes In w
[Figure: change from previous weights for training windows ‘97-’01 through ‘01-’05; y-axis from 50 to 62.]
Measured in L1 distance; based on unigram model with “log(1 + freq.)” representation.
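The L1 distance between two weight vectors is the sum of absolute per-term differences. A small sketch, assuming weights are stored as term-to-weight dictionaries and that a term missing from one model has weight 0 there (both are assumptions; the weights below are illustrative only):

```python
def l1_distance(w_a, w_b):
    """Sum of |w_a[t] - w_b[t]| over all terms; missing terms count as 0."""
    terms = set(w_a) | set(w_b)
    return sum(abs(w_a.get(t, 0.0) - w_b.get(t, 0.0)) for t in terms)

# Hypothetical weights from two consecutive training windows.
w_prev = {"loss": 0.025, "rate": -0.017}
w_next = {"loss": 0.020, "dividends": -0.013}

change = l1_distance(w_prev, w_next)
```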

Language Over Time
[Figure: weight w (axis from -0.015 to 0.005) and ave. term frequency (axis from 0 to 8) for the terms “estimates” and “accounting policies”, over training windows 96-00 through 01-05.]

Language Over Time
[Figure: weight w (axis from -0.010 to 0.005) and ave. term frequency (axis from 0 to 0.8) for the terms “mortgages” and “reit” (“Real Estate Investment Trust”), over training windows 96-00 through 01-05.]

Language Over Time
[Figure: weight w (axis from -0.010 to 0.010) and ave. term frequency (axis from 0 to 0.20) for the terms “higher margin” and “lower margin”, over training windows 96-00 through 01-05.]

Delisting
• Rare (4%) event: delisting due to dissolution after bankruptcy, merger, violation of rules.
• bulletin, creditors, dip, otc, court
[Figure: precision at 10 and precision at 100 (y-axis from 0 to 100) for test years 01 through 06.]
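Precision at k, the metric plotted here, is the fraction of the k highest-scored firms that were in fact delisted. The sketch below assumes each firm gets a single model score and a boolean label; the scores and labels are made up, and how the talk's model produced its ranking is not shown on the slide.

```python
def precision_at_k(scores, delisted, k):
    """Fraction of the k highest-scoring firms that were actually delisted."""
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of firms")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:k]
    return sum(1 for i in top if delisted[i]) / k

# Hypothetical model scores and delisting outcomes for five firms.
scores = [0.9, 0.1, 0.8, 0.4, 0.7]
delisted = [True, False, False, True, True]

p_at_3 = precision_at_k(scores, delisted, 3)
```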
