Feature Selection Risk Alex Chinco University of Illinois at - - PowerPoint PPT Presentation

SLIDE 1
Feature Selection Risk
Alex Chinco
University of Illinois at Urbana-Champaign
September 15, 2014
SLIDE 2

Our model allows us to identify and interpret events faster than more traditional methods used by other investors. —Quant. Fund Pitch Book
SLIDE 4
Imagine you're a trader. Each stock can have yes/no exposure to 7 features, i.e., whether or not:
  • 1. It's involved in a crowded trade
  • 2. It's mentioned in M&A rumors
  • 3. Its major supplier closed down
  • 4. Its labor force unionized
  • 5. It belongs to the alcohol/tobacco/gaming (ATG) industry
  • 6. It's referenced in a scientific article
  • 7. It's been added to the S&P 500
At most one of the 7 features might have realized a shock. Having this mystery feature raises a stock's demand by α > 0 shares. Question: How many observations do you need to see in order to decide which (if any) of the 7 features has realized a shock?
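SLIDE 6
Answer: Only 3!
◮ Stock 1: crowded trade, supplier closure, ATG industry, S&P 500 addition.
◮ Stock 2: M&A rumor, supplier closure, scientific article, S&P 500 addition.
◮ Stock 3: labor unionization, ATG industry, scientific article, S&P 500 addition.
The data matrix $X$ ($3\times 7$) tells you whether stock $n$ has attribute $q$: $x_{n,q} = 1$ if yes and $x_{n,q} = 0$ if no. Demand is
$$
\underbrace{\begin{pmatrix} d_1\\ d_2\\ d_3 \end{pmatrix}}_{d\ (3\times 1)}
=
\underbrace{\begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1\\
0 & 1 & 1 & 0 & 0 & 1 & 1\\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}}_{X\ (3\times 7)}
\underbrace{\begin{pmatrix} \alpha_1\\ \vdots\\ \alpha_7 \end{pmatrix}}_{\alpha\ (7\times 1)}
+
\underbrace{\begin{pmatrix} \epsilon_1\\ \epsilon_2\\ \epsilon_3 \end{pmatrix}}_{\epsilon\ (3\times 1)}
$$
with $\epsilon_n \overset{\text{iid}}{\sim} N(0, \sigma_\epsilon^2)$ and $\alpha \gg \sigma_\epsilon$, where $\alpha_q = \alpha$ for the shocked feature (if any) and $\alpha_q = 0$ otherwise. Because every column of $X$ is a distinct nonzero 0/1 pattern, the set of stocks with $d_n \approx \alpha$ pins down the shocked feature, and $d_1 \approx d_2 \approx d_3 \approx 0$ means no feature was shocked. E.g., if only $d_1 \approx \alpha$, then the crowded trade has realized a shock; if $d_1 \approx d_2 \approx d_3 \approx \alpha$, then the S&P 500 addition has.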
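The decoding logic behind "Only 3!" can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: it assumes α is known and the noise is small relative to α, and it reads off each stock's exposure by thresholding $d_n$ at α/2, a choice made here purely for illustration. The helpers `simulate_demand` and `decode` are hypothetical names.

```python
import numpy as np

# Exposure matrix from the 7-feature example: row n is stock n, column q is feature q.
# Column order: crowded trade, M&A rumor, supplier closure, labor unionization,
# ATG industry, scientific article, S&P 500 addition.
X = np.array([
    [1, 0, 1, 0, 1, 0, 1],  # stock 1
    [0, 1, 1, 0, 0, 1, 1],  # stock 2
    [0, 0, 0, 1, 1, 1, 1],  # stock 3
])


def simulate_demand(shocked, alpha=1.0, sigma_eps=0.05, seed=0):
    """Draw d = X @ a + eps, where a[shocked] = alpha (no shock if shocked is None)."""
    rng = np.random.default_rng(seed)
    a = np.zeros(X.shape[1])
    if shocked is not None:
        a[shocked] = alpha
    return X @ a + rng.normal(0.0, sigma_eps, size=X.shape[0])


def decode(d, alpha=1.0):
    """Return the index of the shocked feature, or None if no stock shows a shock.

    Thresholding at alpha / 2 is an illustrative choice that works when
    alpha >> sigma_eps: it recovers each stock's 0/1 exposure to the shock.
    """
    pattern = (d > alpha / 2).astype(int)
    if not pattern.any():
        return None
    # The 7 columns are the 7 distinct nonzero 0/1 patterns, so exactly one matches.
    return next(q for q in range(X.shape[1]) if np.array_equal(X[:, q], pattern))


print(decode(simulate_demand(shocked=6)))  # 6, i.e. the S&P 500 addition
```

The 3×7 matrix works because its columns are the binary codes of 1 through 7, so three yes/no readings are exactly enough to distinguish eight cases (seven features or no shock).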
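SLIDE 9
Key insight: The inference problem changes character at $N^\star = 3$.
First, imagine you've seen $N = 4$ observations, with $d_1 \approx d_4 \approx \alpha$ and $d_2 \approx d_3 \approx 0$:
$$
\underbrace{d}_{4\times 1} = \underbrace{X}_{4\times 7}\,\underbrace{\alpha}_{7\times 1} + \underbrace{\epsilon}_{4\times 1},
$$
where the first three rows of $X$ are the $3\times 7$ matrix from Slide 6 and stock 4 is also exposed to the crowded trade. The estimate of $\alpha$ is now $(d_1 + d_4)/2 \approx \alpha \pm \sigma_\epsilon/\sqrt{2}$: the extra observation averages away noise.
Now, imagine you've instead seen only $N = 2$ observations (stocks 1 and 2), with $d_1 \approx \alpha$ and $d_2 \approx 0$:
$$
\underbrace{\begin{pmatrix} d_1\\ d_2 \end{pmatrix}}_{d\ (2\times 1)}
=
\underbrace{\begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1\\
0 & 1 & 1 & 0 & 0 & 1 & 1
\end{pmatrix}}_{X\ (2\times 7)}
\underbrace{\begin{pmatrix} \alpha_1\\ \vdots\\ \alpha_7 \end{pmatrix}}_{\alpha\ (7\times 1)}
+
\underbrace{\begin{pmatrix} \epsilon_1\\ \epsilon_2 \end{pmatrix}}_{\epsilon\ (2\times 1)}
$$
The shock could be either the crowded trade or the ATG industry, since both columns equal $(1, 0)'$. So how should you value the 3rd asset, $x_3 = (0, 0, 0, 1, 1, 1, 1)$? Its demand is $\approx 0$ under a crowded-trade shock but $\approx \alpha$ under an ATG shock.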
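Both halves of this slide can be checked with a small Python sketch. It assumes α is known and that unexposed features contribute exactly zero. The first part lists the features consistent with $d_1 \approx \alpha$, $d_2 \approx 0$ when only stocks 1 and 2 are observed. The second part checks by simulation that averaging two independent observations gives a standard error of $\sigma_\epsilon/\sqrt{2}$. The values α = 1 and σ_ε = 0.2, and the helper `consistent_features`, are illustrative assumptions.

```python
import numpy as np

# Rows for stocks 1 and 2 of the 3x7 exposure matrix, plus stock 3's exposures.
X2 = np.array([
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
])
x3 = np.array([0, 0, 0, 1, 1, 1, 1])
FEATURES = ["crowded trade", "M&A rumor", "supplier closure", "labor unionization",
            "ATG industry", "scientific article", "S&P 500 addition"]


def consistent_features(X, pattern):
    """Features whose exposure column equals the observed 0/1 shock pattern."""
    return [q for q in range(X.shape[1]) if np.array_equal(X[:, q], pattern)]


# N = 2: d1 ~ alpha, d2 ~ 0 is consistent with two different features.
candidates = consistent_features(X2, np.array([1, 0]))
print([FEATURES[q] for q in candidates])  # ['crowded trade', 'ATG industry']
# Stock 3's predicted demand (in units of alpha) depends on which one it is.
print([int(x3[q]) for q in candidates])   # [0, 1]

# N = 4: averaging the two stocks exposed to the shock shrinks the noise by sqrt(2).
rng = np.random.default_rng(0)
alpha, sigma_eps, n_sims = 1.0, 0.2, 100_000
d1 = alpha + rng.normal(0.0, sigma_eps, n_sims)
d4 = alpha + rng.normal(0.0, sigma_eps, n_sims)
estimate = (d1 + d4) / 2
print(estimate.std(), sigma_eps / np.sqrt(2))  # both close to 0.141
```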
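SLIDE 10
This is a stylized example, but... the problem scales! Suppose $Q = 400$ features, $K = 5$ of which realize shocks, and $x_{n,q} \overset{\text{iid}}{\sim} N(0, 1)$:
$$
d_n = \tilde{d}_n - \mathrm{E}[\tilde{d}_n \mid f] = \sum_{q=1}^{400} \alpha_q \cdot x_{n,q} + \epsilon_n
$$
[Figure: three panels (Bonferroni Threshold, FDR Threshold, LASSO), each plotting $\frac{1}{25}\cdot\big(\sum_{q=1}^{400} 1\{\alpha_q = \hat{\alpha}_q\}\big)^2$ (0.00 to 1.00) against $\log(N)$ (3 to 6), with $N^\star \approx 22$ marked in each panel.]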
SLIDE 11
1) Derive the feature selection bound
2) Embed it in an equilibrium asset-pricing model
3) Outline empirical predictions:
◮ Noise trader and feature selection risks are substitutes.
◮ Derivatives are more informative than Arrow securities.
Slogan: There are fundamental limits on how quickly even the most sophisticated trader can interpret market signals.
Related literature. Sparse bounded rationality: Gabaix (2012). Compressed sensing: Candès, Romberg, and Tao (2004); Candès and Tao (2005); Donoho (2006). Cognitive control: Chinco (2014). High-dimensional inference: Chinco and Clark-Joseph (2014). Information-based asset pricing: Grossman and Stiglitz (1980); Kyle (1985); Veldkamp (2006). Behavioral finance: Barberis, Shleifer, and Wurgler (2005); Gârleanu and Pedersen (2012).
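SLIDE 12
Consider sequences of Kyle (1985)-type markets where:
$$
\lim_{N\to\infty} Q_N = \lim_{N\to\infty} K_N = \infty, \qquad N \geq K_N, \qquad \lim_{N\to\infty} K_N/Q_N = 0.
$$
Agents must use a feature selection rule, $\varphi(d, X)$, to identify shocks:
$$
\varphi : \mathbb{R}^N \times \mathbb{R}^{N\times Q} \to \mathbb{R}^Q,
$$
where $\mathrm{FSE}[\varphi]$ is the probability that $\varphi$ identifies the wrong features and $\Phi$ is the set of such rules.
Proposition (Feature Selection Bound): If there exists some constant $C > 0$ such that
$$
N < C \times K_N \cdot \log(Q_N/K_N) \quad \text{as } N \to \infty,
$$
then there exists some constant $c > 0$ such that
$$
\min_{\varphi \in \Phi} \mathrm{FSE}[\varphi] > c.
$$
$N^\star(Q, K) \asymp K \cdot \log(Q/K)$ is the feature selection bound.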
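To make $\mathrm{FSE}[\varphi]$ concrete, here is a Monte Carlo sketch for the simplest case, $K = 1$. This is illustrative only. The rule $\varphi$ used here picks the single feature that best fits $d$ by least squares, and that rule, the Gaussian design, and all parameter values are assumptions. $K = 1$ also does not satisfy the slide's $K_N \to \infty$ asymptotics. The function name `estimate_fse` is hypothetical.

```python
import numpy as np


def estimate_fse(N, Q=200, alpha=1.0, sigma_eps=0.5, n_sims=500, seed=0):
    """Monte Carlo estimate of FSE[phi] when exactly one of Q features is shocked (K = 1).

    phi picks the single feature that best fits d by least squares, i.e. the q
    maximizing |x_q' d| / ||x_q||.
    """
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(n_sims):
        X = rng.standard_normal((N, Q))  # x_{n,q} iid N(0, 1)
        q_true = rng.integers(Q)
        d = alpha * X[:, q_true] + rng.normal(0.0, sigma_eps, N)
        scores = np.abs(X.T @ d) / np.linalg.norm(X, axis=0)
        errors += int(np.argmax(scores) != q_true)
    return errors / n_sims


for N in (2, 5, 10, 20, 40):
    print(N, estimate_fse(N))
```

With few observations the estimated error rate stays far from zero no matter how the rule is tuned. It falls toward zero only once $N$ grows past a multiple of $\log Q$, which mirrors the shape of the bound.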
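SLIDE 13
Static Kyle (1985)-type model with N assets. N informed traders each get a private signal about the value of a single asset. A single market maker (MM) views aggregate demand for the N assets and estimates α using the LASSO:
$$
\hat{\alpha} = \arg\min_{\alpha \in \mathbb{R}^Q} \; \big\| X\alpha - (1/\theta)\cdot d \big\|_2^2 + \gamma \cdot \|\alpha\|_1
$$
◮ Informed trader demand rule: $y_n = \theta \cdot v_n$
◮ Market maker pricing rule: $p_n = \lambda \cdot d_n$
Proposition (Equilibrium Using the LASSO): If the MM uses the LASSO and $N > N^\star$, then there exists an equilibrium with
$$
\lambda = \frac{1}{2\cdot\theta} \qquad\text{and}\qquad \theta = C \cdot \sqrt{\log(Q) \times \frac{K}{N}} \cdot \frac{\sigma_z}{\sigma_v}
$$
for $C > 0$ and $\gamma = 2 \cdot (\sigma_z/\theta) \cdot \sqrt{2 \cdot \log(Q)}$.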
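The equilibrium formulas can be evaluated directly. In this minimal sketch, the constant $C > 0$ is not pinned down on the slide, so $C = 1$ is a placeholder. The parameter values and the helper name `lasso_equilibrium` are also assumptions.

```python
import math


def lasso_equilibrium(Q, K, N, sigma_v, sigma_z, C=1.0):
    """Return (theta, lam, gamma) from the equilibrium proposition.

    C > 0 is not pinned down on the slide; C = 1.0 is a placeholder. The
    proposition is stated for N above the feature selection bound N*.
    """
    theta = C * math.sqrt(math.log(Q) * K / N) * sigma_z / sigma_v  # informed demand intensity
    lam = 1.0 / (2.0 * theta)                                        # market maker's price impact
    gamma = 2.0 * (sigma_z / theta) * math.sqrt(2.0 * math.log(Q))   # LASSO penalty
    return theta, lam, gamma


for N in (100, 400, 1600):
    theta, lam, gamma = lasso_equilibrium(Q=400, K=5, N=N, sigma_v=1.0, sigma_z=1.0)
    print(f"N={N}: theta={theta:.3f}, lambda={lam:.3f}, gamma={gamma:.2f}")
```

Because $\theta \propto 1/\sqrt{N}$, quadrupling $N$ halves $\theta$ and doubles $\lambda$, while $\lambda\cdot\theta = 1/2$ throughout.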
SLIDE 15
Informed trader expected profit:
$$
\frac{C}{2} \cdot \sqrt{\frac{K}{N} \cdot \log(Q)} \times \sigma_v \cdot \sigma_z
$$
Question: What exchange rate between feature count ($Q$) and noise-trader demand volatility ($\sigma_z$) leaves informed traders indifferent?
Consider the transformations $Q \to Q' = Q + \Delta Q$ and $\sigma_z \to \sigma_z' = \sigma_z + \Delta\sigma_z$.
Proposition (Substituting Risks): If $\sigma_z$ decreases by $\Delta\sigma_z < 0$, then informed trader expected profits are unchanged (to first order) if $Q$ increases by $\Delta Q > 0$:
$$
\Delta Q = 2 \cdot \log(Q) \cdot \frac{Q}{\sigma_z} \times (-\Delta\sigma_z)
$$
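A quick numerical check of the substitution result is below. It assumes additive changes in $Q$ and $\sigma_z$ and treats $Q$ as continuous. It shrinks $\sigma_z$ slightly, raises $Q$ by the proposition's $\Delta Q$, and compares expected profits. $C = 1$, the parameter values, and the helper names are placeholders.

```python
import math


def informed_profit(Q, K, N, sigma_v, sigma_z, C=1.0):
    """Expected profit C/2 * sqrt(K/N * log Q) * sigma_v * sigma_z (C = 1.0 is a placeholder)."""
    return C / 2.0 * math.sqrt(K / N * math.log(Q)) * sigma_v * sigma_z


def compensating_delta_q(Q, sigma_z, delta_sigma_z):
    """First-order increase in Q that offsets a change delta_sigma_z < 0 in sigma_z."""
    return 2.0 * math.log(Q) * Q / sigma_z * (-delta_sigma_z)


Q, K, N, sigma_v, sigma_z = 400, 5, 100, 1.0, 1.0
d_sigma = -0.001  # small decrease in noise-trader demand volatility
d_Q = compensating_delta_q(Q, sigma_z, d_sigma)
before = informed_profit(Q, K, N, sigma_v, sigma_z)
after = informed_profit(Q + d_Q, K, N, sigma_v, sigma_z + d_sigma)
print(d_Q, after / before)  # profit ratio is close to 1 for small changes
```

The match is only first order. For large changes in $\sigma_z$ the profit ratio drifts away from 1, because profit depends on $\sqrt{\log Q}$, not on $Q$ linearly.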
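SLIDE 18
Question: What kind of asset reveals shocks using the fewest observations?
Could look at Arrow securities:
$$
\begin{pmatrix} d^{(A)}_1\\ d^{(A)}_2\\ d^{(A)}_3\\ \vdots\\ d^{(A)}_Q \end{pmatrix}
=
\underbrace{\begin{pmatrix}
1 & 0 & 0 & \cdots & 0\\
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}}_{X^{(A)}}
\begin{pmatrix} \alpha_1\\ \alpha_2\\ \alpha_3\\ \vdots\\ \alpha_Q \end{pmatrix}
+ \text{``Noise''}
$$
...but this is overkill!
Could also look at N derivatives constructed by financial engineering from the Q Arrow securities:
$$
\underset{N\times Q}{X} = \underset{N\times Q}{D}\;\underset{Q\times Q}{X^{(A)}}
$$
The derivatives can't have independent exposures to all Q features since $N \ll Q$; e.g., all derivatives must have similar exposures to, say, a crowded trade and S&P 500 inclusion.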
SLIDE 19
Key insight: Don't need complete independence! If any $(2\cdot K)$ columns of $X$ are linearly independent, then any $K$-sparse signal $\alpha \in \mathbb{R}^Q$ can be reconstructed uniquely from $X\alpha$.
Why? Suppose not, i.e., there exist distinct $K$-sparse $\alpha, \alpha' \in \mathbb{R}^Q$ with $X\alpha = X\alpha'$. But this implies $X(\alpha - \alpha') = 0$, which is a contradiction: $\alpha - \alpha'$ is nonzero and at most $(2\cdot K)$-sparse, and by assumption there can't be a linear dependence between $(2\cdot K)$ columns of $X$.
Proposition (Seemingly Redundant Assets): If $N \geq N^\star(Q, K)$, then an MM studying derivatives using the LASSO can identify $K$-sparse shocks with probability greater than $1 - C_1 \cdot e^{-C_2\cdot K}$, for constants $C_1, C_2 > 0$, using only a fraction $\Theta[(K/Q)\cdot\log(Q/K)]$ as many assets as an MM studying Arrow securities.
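The rank condition can be checked by brute force on a small matrix. This sketch applies it to the 3×7 exposure matrix from the 7-feature example, using a hypothetical helper `every_subset_independent`. Every pair of columns is linearly independent, so any 1-sparse α is identified, which is consistent with the earlier answer of 3 observations. Some triples of columns are dependent. Enumerating subsets is only feasible for tiny Q, so this is for intuition, not a practical test.

```python
import itertools

import numpy as np


def every_subset_independent(X, size):
    """True if every set of `size` columns of X is linearly independent."""
    return all(
        np.linalg.matrix_rank(X[:, list(cols)]) == size
        for cols in itertools.combinations(range(X.shape[1]), size)
    )


# The 3x7 exposure matrix from the 7-feature example.
X = np.array([
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
])

print(every_subset_independent(X, 2))  # True: any 1-sparse alpha is recoverable
print(every_subset_independent(X, 3))  # False: e.g. column 3 = column 1 + column 2 (1-indexed)
```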
SLIDE 20 Thanks!