
Interpretable Proximate Factors for Large Dimensions
Markus Pelger (Stanford University) and Ruoxuan Xiong (Stanford University)
February 1, 2018, Risk Management Seminar, UC Berkeley

Outline: Intro, Illustration, Model, Simulation, Empirical Results, Conclusion, Appendix
Motivation: What are the factors?
Statistical factor analysis
- Factor models are widely used in big data settings
- They reduce data dimensionality
- Factors are traded extensively
Problem: Which factors should be used?
- Statistical (latent) factors perform well
- Factors are estimated by principal component analysis (PCA)
- They are weighted averages of all features/assets
Problem: Hard to interpret
Goals of this paper:
- Create interpretable proximate factors
- Shrink most assets' weights to zero to get proximate factors
  ⇒ More interpretable
  ⇒ Significantly lower transaction costs when trading factors

Contribution of this paper
This paper: Estimation of interpretable proximate factors
Key elements of the estimator:
1. Statistical factors instead of pre-specified (and potentially misspecified) factors
2. Uses information from large panel data sets: many assets with many time observations
3. Proximate factors approximate latent factors very well with a few assets, without assuming a sparse structure in the population factors
4. Only the 5-10% of cross-sectional observations with the largest exposure are needed for proximate factors

Contribution
Theoretical results
- Asymptotic probabilistic lower bound for the generalized correlations of proximate factors with population factors
- Guidance on how to construct proximate factors
Empirical results
- Very good approximation to population factors with 5-10% of the portfolios, measured by generalized correlation, variance explained, pricing error and Sharpe ratio
- Interpretation of statistical latent factors for:
  - Double-sorted portfolio data
  - 370 single-sorted anomaly portfolios
  - High-frequency returns of S&P 500 companies

Literature (partial list)
Large-dimensional factor models with PCA
- Bai (2003): Distribution theory
- Fan et al. (2013): Sparse matrices in factor modeling
- Fan et al. (2016): Projected PCA for time-varying loadings
- Pelger (2016), Aït-Sahalia and Xiu (2015): High-frequency
Large-dimensional factor models with penalty term
- Bai and Ng (2017): Robust PCA with ridge shrinkage
- Lettau and Pelger (2017): Risk-Premium PCA with pricing penalty
- Zou et al. (2006): Sparse PCA

Empirical example: Double-sorted portfolios
Daily data of 25 double-sorted Fama-French portfolios
Figure: Sum of generalized correlations $\hat{\rho}$ between 3 estimated PCA factors and 3 proximate factors. Panels: (a) Size and Book-to-Market, (b) Size and Investment.
Problem in interpreting factors: Factors are only identified up to invertible linear transformations.
Generalized correlation measures how many factors two sets have in common.

Empirical Application: Size and Book-to-Market Portfolios
25 portfolios formed on size and book-to-market (07/1963-10/2017, 3 factors, daily data)
Figure panels: (a) Generalized correlation, (b) Variance explained, (c) RMS pricing error, (d) Max Sharpe ratio

Empirical Application: Size and Book-to-Market Portfolios
Figure: Portfolio weights of the 1st statistical factor
⇒ Equally weighted market factor

Empirical Application: Size and Book-to-Market Portfolios
Figure: Portfolio weights of the 2nd statistical factor
⇒ Small-minus-big size factor
⇒ The proximate factor built from the 4 largest weights has correlation 0.88 with the size factor

Empirical Application: Size and Book-to-Market Portfolios
Figure: Portfolio weights of the 3rd statistical factor
⇒ High-minus-low value factor
⇒ The proximate factor built from the 4 largest weights has correlation 0.91 with the value factor

Empirical Application: Size and Investment Portfolios
25 portfolios formed on size and investment (07/1963-10/2017, 3 factors, daily data)
Figure panels: (a) Generalized correlation, (b) Variance explained, (c) RMS pricing error, (d) Max Sharpe ratio

Empirical Application: Size and Investment Portfolios
Figure: Portfolio weights of the 1st statistical factor
⇒ Equally weighted market factor

Empirical Application: Size and Investment Portfolios
Figure: Portfolio weights of the 2nd statistical factor
⇒ Small-minus-big size factor
⇒ The proximate factor built from the 4 largest weights has correlation 0.97 with the size factor

Empirical Application: Size and Investment Portfolios
Figure: Portfolio weights of the 3rd statistical factor
⇒ High-minus-low value factor
⇒ The proximate factor built from the 4 largest weights has correlation 0.79 with the investment factor

The Model: Approximate Factor Model
Observe excess returns of N assets over T time periods:
  $X_{t,i} = F_t \Lambda_i^\top + e_{t,i}$,  i = 1, ..., N,  t = 1, ..., T
where $F_t$ (1 × K) are the factors, $\Lambda_i^\top$ (K × 1) the loadings and $e_{t,i}$ the idiosyncratic component.
Matrix notation:
  $X = F \Lambda^\top + e$
with X of dimension T × N, F of dimension T × K, $\Lambda^\top$ of dimension K × N and e of dimension T × N.
- N assets (large)
- T time-series observations (large)
- K systematic factors (fixed)
- F, Λ and e are unknown

The Model: Approximate Factor Model
Systematic and non-systematic risk (F and e uncorrelated):
  $\mathrm{Var}(X) = \underbrace{\Lambda \, \mathrm{Var}(F) \, \Lambda^\top}_{\text{systematic}} + \underbrace{\mathrm{Var}(e)}_{\text{non-systematic}}$
⇒ Systematic factors should explain a large portion of the variance
⇒ Idiosyncratic risk can be weakly correlated
Estimation: PCA (principal component analysis)
- Apply PCA to the sample covariance matrix $\frac{1}{T} X^\top X - \bar{X} \bar{X}^\top$, where $\bar{X}$ is the sample mean of asset excess returns
- Eigenvectors of the largest eigenvalues estimate the loadings $\hat{\Lambda}$
- Estimator for the factors: $\hat{F} = \frac{1}{N} X \hat{\Lambda} = X \hat{\Lambda} (\hat{\Lambda}^\top \hat{\Lambda})^{-1}$
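The PCA estimation step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name pca_factors, the simulated panel and the normalisation $\hat{\Lambda}^\top \hat{\Lambda} / N = I$ (which makes the two expressions for $\hat{F}$ coincide) are assumptions made for the example.

```python
import numpy as np

def pca_factors(X, K):
    """Estimate loadings and factors from a T x N panel X by PCA (illustrative sketch)."""
    T, N = X.shape
    x_bar = X.mean(axis=0)                       # sample mean of each asset, length N
    S = X.T @ X / T - np.outer(x_bar, x_bar)     # N x N sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)         # symmetric matrix, ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:K]          # indices of the K largest eigenvalues
    Lambda_hat = eigvecs[:, top] * np.sqrt(N)    # assumed normalisation: Lambda' Lambda / N = I
    F_hat = X @ Lambda_hat / N                   # T x K estimated factors
    return Lambda_hat, F_hat

# Simulated panel (hypothetical sizes and noise level, for illustration only)
rng = np.random.default_rng(0)
T, N, K = 500, 50, 2
F = rng.standard_normal((T, K))
Lam = rng.standard_normal((N, K))
X = F @ Lam.T + 0.5 * rng.standard_normal((T, N))

Lambda_hat, F_hat = pca_factors(X, K)
```

Because the eigenvectors are orthonormal, scaling them by the square root of N gives $\hat{\Lambda}^\top \hat{\Lambda} = N I$, so $X \hat{\Lambda} / N$ and $X \hat{\Lambda} (\hat{\Lambda}^\top \hat{\Lambda})^{-1}$ return the same matrix.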

The Model: Proximate Factors
Sparse loadings $\tilde{\Lambda}$ are obtained as follows:
- For each column $\hat{\Lambda}_k$, select finitely many ($m_N$) loadings with the largest absolute values
- Shrink the estimated loadings $\hat{\Lambda}$ to 0 except for the $m_N$ largest values
- Divide by the column norms, i.e. $\tilde{\lambda}_k^\top \tilde{\lambda}_k = 1$
Proximate factors: $\tilde{F} = X \tilde{\Lambda}$ (with X the T × N matrix of excess returns)
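The three construction steps can be written as a short NumPy sketch. The function name proximate_loadings, the parameter m (standing in for $m_N$) and the toy loading matrix are hypothetical choices for illustration; ties in absolute value are broken arbitrarily by the sort.

```python
import numpy as np

def proximate_loadings(Lambda_hat, m):
    """Keep the m largest |loadings| per column, zero the rest, normalise each column."""
    N, K = Lambda_hat.shape
    Lambda_tilde = np.zeros((N, K))
    for k in range(K):
        keep = np.argsort(np.abs(Lambda_hat[:, k]))[-m:]   # rows with the m largest |loadings|
        Lambda_tilde[keep, k] = Lambda_hat[keep, k]        # all other entries stay at 0
        Lambda_tilde[:, k] /= np.linalg.norm(Lambda_tilde[:, k])  # unit column norm
    return Lambda_tilde

# Toy example: 4 assets, 2 factors, keep 2 assets per factor
Lambda_hat = np.array([[0.9, 0.1],
                       [-1.2, 0.3],
                       [0.2, -0.8],
                       [0.05, 0.7]])
Lambda_tilde = proximate_loadings(Lambda_hat, m=2)

# Proximate factors from a T x N return panel (random placeholder data)
X = np.random.default_rng(0).standard_normal((10, 4))
F_tilde = X @ Lambda_tilde
```

In this toy example the first proximate factor is a portfolio of assets 1 and 2 only, and the second of assets 3 and 4 only, which is what makes the factors easy to read as long-short portfolios.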

The Model: Closeness measure
For a 1-factor model: correlation between $\tilde{F}$ and $F$.
Problem for multiple factors: Factors are only identified up to invertible linear transformations
⇒ Need a measure of closeness between the spaces spanned by two sets of factors
For a multi-factor model: the "closeness" between $\tilde{F}$ and $F$ is measured by generalized correlation.
Total generalized correlation measure:
  $\rho = \mathrm{trace}\left( (F^\top F / T)^{-1} (F^\top \tilde{F} / T) (\tilde{F}^\top \tilde{F} / T)^{-1} (\tilde{F}^\top F / T) \right)$
- ρ = 0: $\tilde{F}$ and $F$ are orthogonal
- ρ = K: $\tilde{F}$ and $F$ span the same space
Alternative measure: the element-wise generalized correlations are the eigenvalues, instead of the trace, of the above matrix
- Element-wise generalized correlations close to 1 measure how many factors are well approximated
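A minimal NumPy sketch of the generalized correlation measure follows, together with the two boundary cases from the slide. The function name generalized_correlations and the test matrices are assumptions for illustration; the matrix inverses are applied through np.linalg.solve rather than formed explicitly.

```python
import numpy as np

def generalized_correlations(F, F_tilde):
    """Return (total generalized correlation, element-wise values) for T x K factor matrices."""
    T = F.shape[0]
    A = np.linalg.solve(F.T @ F / T, F.T @ F_tilde / T)               # (F'F/T)^{-1} (F'F~/T)
    B = np.linalg.solve(F_tilde.T @ F_tilde / T, F_tilde.T @ F / T)   # (F~'F~/T)^{-1} (F~'F/T)
    M = A @ B
    elementwise = np.sort(np.linalg.eigvals(M).real)[::-1]            # eigenvalues, largest first
    return np.trace(M), elementwise

rng = np.random.default_rng(1)

# Case 1: same span. F_tilde is an invertible linear transformation of F, so rho = K.
F = rng.standard_normal((200, 2))
H = np.array([[2.0, 1.0], [0.0, 3.0]])
rho_same, eig_same = generalized_correlations(F, F @ H)

# Case 2: orthogonal factors. Columns of Q are orthonormal, so rho = 0.
Q, _ = np.linalg.qr(rng.standard_normal((200, 4)))
rho_orth, eig_orth = generalized_correlations(Q[:, :2], Q[:, 2:])
```

The first case shows why the measure suits factor models: rotating or rescaling the factors leaves ρ unchanged, so only the spanned space matters.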

Intuition: Why does picking the largest elements of $\hat{\Lambda}$ work?
Consider one factor and one nonzero element in $\tilde{\Lambda}$:
  $F = [f_{1,t}] \in \mathbb{R}^{T \times 1}$, $\Lambda = [\lambda_{1,i}] \in \mathbb{R}^{N \times 1}$
$\tilde{\Lambda} = [\tilde{\lambda}_{1,i}]$ is sparse. Assume the nonzero element of $\tilde{\Lambda}$ is $\tilde{\lambda}_{1,1}$, which equals 1 after normalisation. With X the T × N matrix of excess returns:
  $\tilde{F} = X \tilde{\Lambda} = F \Lambda^\top \tilde{\Lambda} + e \tilde{\Lambda} = f_1 \lambda_{1,1} + e_1$
Assume iid $f_{1,t} \sim (0, \sigma_f^2)$ and $e_{1,t} \sim (0, \sigma_e^2)$. Then
  $\frac{f_1^\top f_1}{T} \to \sigma_f^2$, $\frac{e_1^\top e_1}{T} \to \sigma_e^2$
Define the signal-to-noise ratio $s = \sigma_f / \sigma_e$

Intuition: Why pick the largest elements of $\hat{\Lambda}$?
  $\rho = \mathrm{tr}\left( (F^\top F / T)^{-1} (F^\top \tilde{F} / T) (\tilde{F}^\top \tilde{F} / T)^{-1} (\tilde{F}^\top F / T) \right)$
  $= \left( \frac{f_1^\top (f_1 \lambda_{1,1} + e_1) / T}{(f_1^\top f_1 / T)^{1/2} \left( (f_1 \lambda_{1,1} + e_1)^\top (f_1 \lambda_{1,1} + e_1) / T \right)^{1/2}} \right)^2$
  $\to \frac{\lambda_{1,1}^2}{\lambda_{1,1}^2 + 1/s^2}$
- The (generalized) correlation increases in the size of the loading $|\lambda_{1,1}|$.
- The (generalized) correlation increases in the signal-to-noise ratio s.
- No sparsity in the population loadings is assumed!
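The limit above can be checked numerically with a small simulation. This is a sketch under added assumptions: Gaussian draws for the factor and the idiosyncratic term (the slide only assumes iid with the stated variances), independence between the two, and the hypothetical function names limit_rho and simulated_rho.

```python
import numpy as np

def limit_rho(lam, s):
    """Limit of the generalized correlation: lam^2 / (lam^2 + 1 / s^2)."""
    return lam**2 / (lam**2 + 1.0 / s**2)

def simulated_rho(lam, sigma_f, sigma_e, T, seed=0):
    """Squared sample correlation (without demeaning) between f_1 and f_1 * lam + e_1."""
    rng = np.random.default_rng(seed)
    f = sigma_f * rng.standard_normal(T)    # factor, variance sigma_f^2 (Gaussian assumed)
    e = sigma_e * rng.standard_normal(T)    # idiosyncratic term, variance sigma_e^2
    f_tilde = lam * f + e                   # one-asset proximate factor
    return (f @ f_tilde) ** 2 / ((f @ f) * (f_tilde @ f_tilde))

# Loading 1 and signal-to-noise ratio s = 2 / 1 = 2
rho_sim = simulated_rho(lam=1.0, sigma_f=2.0, sigma_e=1.0, T=200_000)
rho_lim = limit_rho(lam=1.0, s=2.0)
```

With a loading of 1 and s = 2 the limit is 1 / (1 + 1/4) = 0.8, and the simulated value should be close to it for large T. Raising either the loading or s in limit_rho moves the value toward 1, in line with the two monotonicity statements on the slide.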
