
Text Selection
Bryan Kelly (Yale University), Asaf Manela (Washington University in St. Louis), Alan Moreira (University of Rochester)
October 2018



  1. Text Selection. Bryan Kelly, Yale University; Asaf Manela, Washington University in St. Louis; Alan Moreira, University of Rochester. October 2018.

  2. Motivation
  ◮ Digital text is increasingly available to social scientists
    ◮ Newspapers, blogs, regulatory filings, congressional records, ...
  ◮ Unlike data often used by economists
    ◮ Text is ultra high-dimensional
    ◮ Phrase counts are sparse
  ◮ Statistical learning from text requires
    ◮ Machine learning techniques
    ◮ Scalable algorithms

  3. This paper
  ◮ Text is often selected by journalists, speechwriters, and others who cater to an audience with limited attention
  ◮ Hurdle Distributed Multiple Regression (HDMR)
    ◮ Highly scalable approach to inference from big count data
    ◮ Includes an economically motivated selection equation
    ◮ Especially useful when the cover/no-cover choice is separate from, or more interesting than, coverage quantity
  ◮ Applications using newspaper coverage for prediction
    1. Backcast the intermediary capital ratio (He-Kelly-Manela 2017 JFE)
    2. Forecast macroeconomic series (Stock-Watson 2012 JBES)

  4. Related literature
  ◮ We extend machinery developed by Taddy (2012, 2015, 2016) to text selection
    ◮ Layer an economically motivated hurdle/selection equation on his Distributed Multinomial Regression (DMR)
    ◮ Find that the advantage of HDMR over DMR increases with sparsity
  ◮ Provide new tools to literatures in economics and finance
    ◮ Finance and media: Antweiler-Frank (2004), Tetlock (2007, 2011), Fang-Peress (2009), Engelberg-Parsons (2011), Dougal et al (2012), Peress (2014), Manela (2014), Fedyk (2018)
    ◮ Text-based uncertainty: Baker-Bloom-Davis (2016), Manela-Moreira (2017), Hassan et al (2017)
    ◮ Polarization: Gentzkow-Shapiro (2006), Gentzkow-Shapiro-Taddy
    ◮ Can better control for and learn from high-dimensional content

  5. Text data is inherently high-dimensional
  Two short documents and their bigram document-term matrix C:

    Document 1: "Digital text is available."
    Document 2: "Text is selected!"

             digital text   is available   is selected   text is
    Doc 1:        1              1              0            1
    Doc 2:        0              0              1            1
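A minimal sketch of how such a bigram document-term matrix can be built in practice, using scikit-learn's CountVectorizer. The tooling is illustrative; the slides do not prescribe an implementation:

```python
# Illustrative only: building the slide's bigram document-term matrix with
# scikit-learn's CountVectorizer (not specified by the slides themselves).
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Digital text is available.", "Text is selected!"]

# ngram_range=(2, 2) counts bigrams such as "digital text" and "is selected".
vectorizer = CountVectorizer(ngram_range=(2, 2))
C = vectorizer.fit_transform(docs)  # sparse (n_documents x n_phrases) counts

print(vectorizer.get_feature_names_out())
# ['digital text' 'is available' 'is selected' 'text is']
print(C.toarray())
# [[1 1 0 1]
#  [0 0 1 1]]
```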


  6. Text regression is prone to overfit
  ◮ $c_i$: vector of counts in $d$ categories for observation $i$
    ◮ e.g., $c_{ij}$ is date $i$'s newspaper mentions of phrase $j$ ("world war")
  ◮ $v_i$: vector of $p$ covariates
    ◮ e.g., intermediary capital ratio, realized variance on date $i$
  ◮ Let $v_{iy} \in v_i$ be a target variable
    ◮ e.g., intermediary capital ratio
  ◮ Because $d \gg n$, we cannot run the OLS regression
    $v_{iy} = \beta_0 + [c_i, v_{i,-y}]' \beta + \varepsilon_i$
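A toy illustration (synthetic data, not from the paper) of why $d \gg n$ defeats OLS: with more regressors than observations, least squares can fit pure noise exactly in sample.

```python
# Toy illustration (synthetic data): with d >> n, OLS fits noise perfectly.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10_000                      # far more phrases than observations
C = rng.poisson(0.1, size=(n, d))      # sparse phrase counts
v_y = rng.normal(size=n)               # target is pure noise by construction

# Minimum-norm least-squares solution: residuals are numerically zero,
# a "perfect" in-sample fit with no predictive content out of sample.
beta, *_ = np.linalg.lstsq(C, v_y, rcond=None)
print(np.abs(v_y - C @ beta).max())    # ~1e-12: perfect overfit
```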

  7. Text inverse regression
  ◮ A text inverse regression approach would instead:
    1. Regress word counts on covariates (backward regression):
       $c_{ij} = \lambda(\alpha_j + v_i' \varphi_j) + \upsilon_{ij}$
    2. Construct a low-dimensional projection in the $v_{iy}$ direction (sufficient-reduction projection):
       $z_{iy} \equiv \sum_j \hat{\varphi}_{jy} c_{ij}$
    3. Regress the target variable on $z_{iy}$ and the other covariates (forward regression):
       $v_{iy} = \beta_0 + [z_{iy}, v_{i,-y}]' \beta + \varepsilon_i$
  ◮ A $(d + p - 1)$-dimensional regression is reduced to a $(p + 1)$-dimensional one!
  ◮ $z_{iy}$ summarizes all textual information relevant for prediction

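A schematic Python sketch of the three steps above. The variable names and the use of scikit-learn are illustrative; Taddy's DMR estimates the backward regression as distributed, penalized Poisson regressions with a document-verbosity offset, which this plain loop omits:

```python
# Schematic sketch of the three inverse-regression steps (illustrative only;
# DMR's distributed, penalized estimation is only approximated here).
import numpy as np
from sklearn.linear_model import LinearRegression, PoissonRegressor

def inverse_regression(C, V, y_col):
    """C: (n, d) phrase counts; V: (n, p) covariates; y_col: target's index in V."""
    d = C.shape[1]
    phi_y = np.zeros(d)
    # Step 1 (backward): regress each phrase's counts on all covariates.
    for j in range(d):
        phi_y[j] = PoissonRegressor(alpha=1.0).fit(V, C[:, j]).coef_[y_col]
    # Step 2: sufficient-reduction projection in the target direction.
    z_y = C @ phi_y
    # Step 3 (forward): regress the target on z_y and the other covariates.
    X = np.column_stack([z_y, np.delete(V, y_col, axis=1)])
    return LinearRegression().fit(X, V[:, y_col]), z_y

# Example on synthetic data where phrase counts load on the first covariate:
rng = np.random.default_rng(0)
V = rng.normal(size=(200, 3))
C = rng.poisson(np.exp(0.5 * V[:, [0]]), size=(200, 50))
forward, z = inverse_regression(C, V, y_col=0)
```

Because the $d$ backward regressions are independent across phrases, they can be estimated in parallel, which is what makes the approach scalable.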

  8. Why would we need a hurdle?
  [Figure: distribution of phrase counts across documents; panels "Full Range" and "Positive Range"; x-axis: counts of phrase j across documents (0-10); y-axis: mean across phrases. Wall Street Journal, monthly front-page text, July 1926 to February 2016.]
  ◮ Statistics: a hurdle better describes text data
    ◮ Text data often has many more zeros than predicted by a Poisson model
  ◮ Economics: text is selected
    ◮ Publishers cater to a boundedly rational reader (Gabaix, 2014)
    ◮ Politicians select phrases that resonate with voters (Gentzkow-Shapiro-Taddy, 2017)
    ◮ Censored or socially taboo words (Michel et al, 2011)
    ◮ Fixed cost of introducing new terms, low marginal cost

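The excess-zeros claim is easy to check on any document-term matrix: compare each phrase's observed share of zero counts with the share $e^{-\mu_j}$ implied by a Poisson with the same mean. An illustrative sketch on synthetic zero-inflated counts (not the paper's data or code):

```python
# Illustrative check of excess zeros: compare the observed share of zero
# counts per phrase with the share a Poisson of the same mean implies.
import numpy as np

def excess_zeros(C):
    """C: (n, d) document-term count matrix (dense ndarray)."""
    mu = C.mean(axis=0)                    # per-phrase mean count
    observed_zero = (C == 0).mean(axis=0)  # per-phrase observed zero share
    poisson_zero = np.exp(-mu)             # P(count = 0) under Poisson(mu)
    return observed_zero - poisson_zero    # positive => more zeros than Poisson

rng = np.random.default_rng(1)
# Zero-inflated toy data: each phrase appears in only 10% of documents.
C = rng.poisson(3.0, size=(1000, 200)) * (rng.random((1000, 200)) < 0.1)
print(excess_zeros(C).mean())              # clearly positive
```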

  9. Text selection model
  With sparse text, the extensive margin may be more informative than the intensive margin.
  ◮ We suggest a text selection model instead:
    1. Two-part text selection model for counts:
       $h^*_{ij} = f(\kappa_j + w_i' \delta_j) + \omega_{ij}$  (inclusion)
       $c^*_{ij} = \lambda(\alpha_j + v_i' \varphi_j) + \upsilon_{ij}$  (repetition)
       $c_{ij} = c^*_{ij} \times 1(h^*_{ij} > 0) = c^*_{ij} \times h_{ij}$  (observation)
    2. Construct two low-dimensional projections in the $v_{iy}$ $(= w_{iy})$ direction (SR projections):
       $z^0_{iy} \equiv \sum_j \hat{\delta}_{jy} h_{ij}$  (inclusion)
       $z^+_{iy} \equiv \sum_j \hat{\varphi}_{jy} c_{ij}$  (repetition)
    3. Regress the target variable on $z^0_{iy}$, $z^+_{iy}$, and the other covariates (forward regression):
       $v_{iy} = \beta_0 + [z^0_{iy}, z^+_{iy}, w_{i,-y}, v_{i,-y}]' \beta + \varepsilon_i$
  ◮ A $(d + p - 1)$-dimensional regression is reduced to a $(p + 2)$-dimensional one!

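A schematic sketch of the two-part estimation for a single target covariate. Everything here is illustrative: the paper's estimator is penalized and distributed across phrases, and its repetition equation is a positive (zero-truncated) count model, which the plain Poisson fit below only approximates.

```python
# Schematic hurdle (two-part) estimation: logistic inclusion on h_ij = 1(c_ij > 0),
# Poisson repetition on positive counts, then the two SR projections and the
# forward regression. Illustrative only, not the paper's estimator.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression, PoissonRegressor

def hdmr_projections(C, W, V, y_col):
    """C: (n, d) counts; W, V: (n, p) inclusion/repetition covariates;
    y_col: index of the target variable in V (and W)."""
    n, d = C.shape
    H = (C > 0).astype(int)               # inclusion indicators h_ij
    delta_y = np.zeros(d)
    phi_y = np.zeros(d)
    for j in range(d):
        if 0 < H[:, j].sum() < n:         # inclusion: logit of cover/no-cover
            delta_y[j] = LogisticRegression(C=1.0).fit(W, H[:, j]).coef_[0, y_col]
        pos = H[:, j] == 1                # repetition: fit on positive counts
        if pos.sum() > 1:                 # (zero-truncated Poisson approximated
            phi_y[j] = PoissonRegressor(alpha=1.0).fit(   # by a plain Poisson)
                V[pos], C[pos, j]).coef_[y_col]
    z0 = H @ delta_y                      # inclusion projection z0_iy
    zp = C @ phi_y                        # repetition projection z+_iy
    X = np.column_stack([z0, zp, np.delete(V, y_col, axis=1)])
    return LinearRegression().fit(X, V[:, y_col]), z0, zp
```

As in the slide, the forward regression then uses only the two projections and the remaining covariates, so the text's contribution enters through two scalars per observation.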
