SLIDE 1 Feature Selection Risk
Alex Chinco
University of Illinois at Urbana-Champaign September 15, 2014
SLIDE 2
SLIDE 3
“Our model allows us to identify and interpret events faster than more traditional methods used by other investors.” (Quant. Fund Pitch Book)
SLIDE 4 Imagine you’re a trader. Each stock can have Y/N exposure to 7 features. Whether or not. . .
1. It’s involved in a crowded trade
2. It’s mentioned in M&A rumors
3. Its major supplier closed down
4. Its labor force unionized
5. It belongs to the alcohol/tobacco/gaming industry
6. It’s referenced in a scientific article
7. It’s been added to the S&P 500
SLIDE 5 Answer: Only 3!
SLIDE 6 Answer: Only 3!
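◮ Stock 1: crowded trade, supplier closure, ATG industry, S&P 500 addition.
◮ Stock 2: M&A rumor, supplier closure, scientific article, S&P 500 addition.
◮ Stock 3: labor unionization, ATG industry, scientific article, S&P 500 addition.
Data matrix $(X)_{3 \times 7}$ tells you if stock $n$ has attribute $q$:
$$x_{n,q} = \begin{cases} 1 & \text{if stock } n \text{ has attribute } q \\ 0 & \text{otherwise} \end{cases}$$
$$\underbrace{d}_{3 \times 1} = \underbrace{\begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}}_{(X)_{3 \times 7}} \, \underbrace{\alpha}_{7 \times 1}$$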
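A minimal sketch of why three stocks can be enough, assuming exactly one of the 7 features has realized a shock (that assumption, and the helper name `identify_shocked_feature`, are illustrative and not from the slides). Each feature's column in X is a distinct nonzero 0/1 pattern, so the pattern of nonzero entries in d points to a single feature.

```python
# Exposure matrix built from the three stocks listed on the slide.
# Rows are stocks 1-3, columns are features 1-7 in the order of the list.
FEATURES = [
    "crowded trade", "M&A rumor", "supplier closure", "labor unionization",
    "alcohol/tobacco/gaming industry", "scientific article", "S&P 500 addition",
]
X = [
    [1, 0, 1, 0, 1, 0, 1],  # stock 1
    [0, 1, 1, 0, 0, 1, 1],  # stock 2
    [0, 0, 0, 1, 1, 1, 1],  # stock 3
]


def identify_shocked_feature(d, X):
    """Return the index of the single shocked feature, or None if ambiguous.

    Assumes (hypothetically) that exactly one entry of alpha is nonzero.
    """
    pattern = [1 if abs(value) > 1e-12 else 0 for value in d]
    matches = [
        q for q in range(len(X[0]))
        if [row[q] for row in X] == pattern
    ]
    return matches[0] if len(matches) == 1 else None


# Example: shock the "scientific article" feature (index 5) and decode it.
alpha = [0.0] * 7
alpha[5] = 0.5
d = [sum(x * a for x, a in zip(row, alpha)) for row in X]
print(FEATURES[identify_shocked_feature(d, X)])
```

The decoding works only because no two columns of X coincide; it says nothing about cases with several simultaneous shocks.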
SLIDE 7 Key Insight: Inference problem changes character at N ⋆ = 3.
SLIDE 8 Key Insight: Inference problem changes character at N⋆ = 3. First, imagine you’ve seen N = 4 observations:
$$\underbrace{d}_{4 \times 1} \approx \underbrace{X}_{4 \times 7} \, \underbrace{\alpha}_{7 \times 1}$$
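SLIDE 9 Key Insight: Inference problem changes character at N⋆ = 3. First, imagine you’ve seen N = 4 observations:
$$\underbrace{d}_{4 \times 1} \approx \underbrace{X}_{4 \times 7} \, \underbrace{\alpha}_{7 \times 1}$$
Next, with only N = 2 observations:
$$\underbrace{d}_{2 \times 1} = \underbrace{X}_{2 \times 7} \, \underbrace{\alpha}_{7 \times 1} + \underbrace{\epsilon}_{2 \times 1}$$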
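A hedged sketch of the change at N⋆ = 3 in the stylized example, again assuming a single shocked feature and 0/1 exposures (the helper names are hypothetical). With only the first two stocks, several features share the same exposure pattern, so the shock cannot be pinned down. In general, N binary observations give at most 2^N − 1 distinct nonzero patterns, so 7 features need at least 3.

```python
# Exposure matrix for the three stocks in the stylized example
# (rows: stocks 1-3, columns: features 1-7).
X = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def candidate_features(d, X):
    """All feature indices whose exposure pattern matches the nonzero pattern of d."""
    pattern = [1 if abs(value) > 1e-12 else 0 for value in d]
    return [q for q in range(len(X[0])) if [row[q] for row in X] == pattern]


def min_binary_observations(n_features):
    """Smallest N with 2**N - 1 >= n_features (distinct nonzero 0/1 columns)."""
    n_obs = 0
    while 2 ** n_obs - 1 < n_features:
        n_obs += 1
    return n_obs


# Shock feature 1 (crowded trade, index 0) and look at 2 versus 3 stocks.
alpha = [0.5, 0, 0, 0, 0, 0, 0]
d = [sum(x * a for x, a in zip(row, alpha)) for row in X]
print("N = 2 candidates:", candidate_features(d[:2], X[:2]))
print("N = 3 candidates:", candidate_features(d, X))
print("minimum N for 7 features:", min_binary_observations(7))
```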
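SLIDE 10 This is a stylized example, but. . . the problem scales! Suppose $Q = 400$, $K = 5$, and $x_{n,q} \overset{\text{iid}}{\sim} \mathrm{N}(0, 1)$:
$$d_n = \tilde{d}_n - \mathrm{E}[\tilde{d}_n \mid f] = \sum_{q=1}^{400} \alpha_q \cdot x_{n,q} + \epsilon_n$$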
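To make the scaled-up setting concrete, here is a minimal simulation sketch with Q = 400 features, K = 5 shocked features, and iid standard normal exposures. The shock sizes (drawn from N(0, 1)), the optional noise term, and the function name `simulate_market` are assumptions for illustration; the slide does not specify them.

```python
import random


def simulate_market(n_obs, n_features=400, n_shocks=5, noise_sd=0.0, seed=0):
    """Draw exposures X, a K-sparse shock vector alpha, and observations d.

    Assumptions not in the slides: shock sizes are N(0, 1) draws and the
    noise term is N(0, noise_sd**2), switched off by default.
    """
    rng = random.Random(seed)
    shocked = sorted(rng.sample(range(n_features), n_shocks))
    alpha = [0.0] * n_features
    for q in shocked:
        alpha[q] = rng.gauss(0.0, 1.0)
    # x_{n,q} iid N(0, 1)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_obs)]
    # d_n = sum_q alpha_q * x_{n,q} + noise
    d = [
        sum(a * x for a, x in zip(alpha, row)) + rng.gauss(0.0, noise_sd)
        for row in X
    ]
    return X, d, alpha, shocked


X, d, alpha, shocked = simulate_market(n_obs=20)
print("shocked features:", shocked)
```

Only 5 of the 400 columns matter for d, but an observer who sees X and d does not know which 5.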
SLIDE 11
1) Derive feature selection bound
2) Embed in eqm. asset-pricing model
3) Outline empirical predictions:
◮ Noise trader and feature selection risks are substitutes.
◮ Derivatives more informative than Arrow securities.
Slogan: There are fundamental limits on how quickly even the most sophisticated trader can interpret market signals.
Sparse Bounded Rationality: Gabaix (2012)
Compressed Sensing: Candes, Romberg, and Tao (2004); Candes and Tao (2005); Donoho (2006)
Cognitive Control: Chinco (2014)
High-Dimensional Inference: Chinco and Clark-Joseph (2014)
Info-Based Asset Pricing: Grossman and Stiglitz (1980); Kyle (1985); Veldkamp (2006)
Behavioral Finance: Barberis, Shleifer, and Wurgler (2005); Garleanu and Pedersen (2012)
SLIDE 12 Consider sequences of Kyle (1985)-type markets where:
$$\lim_{N \to \infty} Q_N, K_N = \infty, \qquad N \geq K_N, \qquad \lim_{N \to \infty} K_N / Q_N = 0$$
Agents must use a feature selection rule, $\phi(d, X)$, to identify shocks:
$$\phi : \mathbb{R}^N \times \mathbb{R}^{N \times Q} \to \mathbb{R}^Q$$
where $\mathrm{FSE}[\phi]$ is the probability that $\phi$ identifies the wrong features.
Proposition (Feature Selection Bound)
If there exists some constant $C > 0$ such that
$$N < C \times K_N \cdot \log(Q_N / K_N)$$
as $N \to \infty$, then there exists some constant $c > 0$ such that
$$\min_{\phi \in \Phi} \mathrm{FSE}[\phi] > c$$
$N^\star(Q, K) \asymp K \cdot \log(Q / K)$ is the feature selection bound.
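As a rough worked example of the bound's order of magnitude, the sketch below evaluates K · log(Q/K). The ≍ relation hides a constant factor, so setting that constant to 1 and using the natural log are assumptions here; the numbers indicate scale only.

```python
import math


def feature_selection_bound(n_features, n_shocks):
    """K * log(Q / K) with natural log and constant factor 1 (both assumptions)."""
    return n_shocks * math.log(n_features / n_shocks)


# Q = 400, K = 5 are the values from the scaled-up example.
for Q, K in [(400, 5), (4000, 5), (400, 50)]:
    print(f"Q={Q}, K={K}: K*log(Q/K) = {feature_selection_bound(Q, K):.1f}")
```

For Q = 400 and K = 5 this gives about 22, far below the 400 candidate features: the bound grows linearly in the number of shocks but only logarithmically in the number of features.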
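SLIDE 13 Static Kyle (1985)-type model with N assets. N informed traders each get a private signal about the value of a single asset. A single market maker (MM) views aggregate demand for the N assets:
$$\hat{\alpha} = \arg\min_{\alpha \in \mathbb{R}^Q} \{\, \ldots X \ldots \,\}$$
◮ Informed trader demand rule: $y_n = \theta \cdot v_n$
$$\ldots \log(Q) \times K \ldots \quad \text{for } C > 0 \text{ and } \gamma = 2 \cdot (\sigma_z / \ldots) \ldots 2 \cdot \log(Q)$$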
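The arg min on this slide is only partly legible. As a hedged illustration of penalized feature selection with a penalty weight γ, the sketch below assumes the standard LASSO objective, (1/2)·‖d − Xα‖² + γ·‖α‖₁, solved by coordinate descent. The objective's exact form, the solver, and the function names are assumptions, not the slide's specification.

```python
def soft_threshold(value, gamma):
    """Shrink value toward zero by gamma; exact zero inside [-gamma, gamma]."""
    if value > gamma:
        return value - gamma
    if value < -gamma:
        return value + gamma
    return 0.0


def lasso_coordinate_descent(X, d, gamma, sweeps=200):
    """Minimize 0.5 * ||d - X a||^2 + gamma * ||a||_1 over a (assumed objective)."""
    n_obs, n_features = len(X), len(X[0])
    alpha = [0.0] * n_features
    residual = list(d)  # tracks d - X @ alpha
    col_sq = [sum(X[n][q] ** 2 for n in range(n_obs)) for q in range(n_features)]
    for _ in range(sweeps):
        for q in range(n_features):
            if col_sq[q] == 0:
                continue
            # Correlation of column q with the residual that excludes feature q.
            rho = sum(X[n][q] * residual[n] for n in range(n_obs)) + col_sq[q] * alpha[q]
            new_value = soft_threshold(rho, gamma) / col_sq[q]
            delta = new_value - alpha[q]
            if delta != 0.0:
                for n in range(n_obs):
                    residual[n] -= X[n][q] * delta
                alpha[q] = new_value
    return alpha


# With an identity exposure matrix the solution is plain soft-thresholding of d.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(lasso_coordinate_descent(identity, [3.0, -0.5, 0.0, 1.5], gamma=1.0))
```

The penalty sets small coefficients exactly to zero, which is what lets the estimator act as a feature selection rule rather than only a regression.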
SLIDE 14 Informed trader expected profit: C/2 · K/…
SLIDE 15 Informed trader expected profit: C/2 · K/… × −∆σz
SLIDE 16 Question: What kind of asset reveals shocks using fewest obs.?
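SLIDE 17 Question: What kind of asset reveals shocks using fewest obs.? Could look at Arrow securities:
$$\begin{pmatrix} d^{(A)}_1 \\ d^{(A)}_2 \\ d^{(A)}_3 \\ \vdots \\ d^{(A)}_Q \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}}_{X^{(A)}} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \vdots \\ \alpha_Q \end{pmatrix}$$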
SLIDE 18 Question: What kind of asset reveals shocks using fewest obs.? Could look at Arrow securities: $d^{(A)} = X^{(A)} \alpha$, where $X^{(A)}$ is the $Q \times Q$ identity matrix, so each security is exposed to exactly one feature.
A derivative must have simultaneous exposure to, say, a crowded trade and S&P 500 inclusion.
SLIDE 19 Key insight: Don’t need complete independence! If any (2 · K) columns of X are linearly independent, then any K-sparse signal $\alpha \in \mathbb{R}^Q$ can be reconstructed uniquely from $X\alpha$. Why? Suppose not, i.e., there exist distinct K-sparse $\alpha, \alpha' \in \mathbb{R}^Q$ with $X\alpha = X\alpha'$. This implies $X(\alpha - \alpha') = 0$ with $\alpha - \alpha' \neq 0$. But $\alpha - \alpha'$ is at most (2 · K)-sparse, so some (2 · K) columns of X would be linearly dependent, which contradicts the assumption.
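The column condition can be checked by brute force on small matrices. The sketch below tests whether every set of 2·K columns has full column rank, using exact rational arithmetic. The function names are hypothetical, and the enumeration is only practical for small Q.

```python
from fractions import Fraction
from itertools import combinations


def matrix_rank(rows):
    """Rank via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(rank, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][c] / m[rank][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def every_2k_columns_independent(X, n_shocks):
    """True if every set of 2*K columns of X is linearly independent."""
    size = 2 * n_shocks
    for cols in combinations(range(len(X[0])), size):
        sub = [[row[c] for c in cols] for row in X]
        if matrix_rank(sub) < size:
            return False
    return True


# The 3 x 7 exposure matrix from the stylized example.
X = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
print("K = 1:", every_2k_columns_independent(X, 1))
print("K = 2:", every_2k_columns_independent(X, 2))
```

For the stylized matrix the condition holds for K = 1 but must fail for K = 2, since four columns in three dimensions are always linearly dependent.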
Proposition (Seemingly Redundant Assets)
If $N \geq N^\star(Q, K)$, then an MM studying derivatives using the LASSO can identify K-sparse shocks with probability greater than $1 - C_1 \cdot e^{-C_2 \cdot K}$, with $C_1, C_2 > 0$, using only a fraction
$$\Theta\left[ K / Q \cdot \log(Q / K) \right]$$
of the assets needed by an MM studying Arrow securities.
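To get a feel for the size of this ratio, the sketch below evaluates (K/Q) · log(Q/K) for a fixed K and growing Q. The Θ[·] notation hides constants, so taking the constant as 1 and using the natural log are assumptions; the values show scaling, not exact asset counts.

```python
import math


def asset_ratio(n_features, n_shocks):
    """(K / Q) * log(Q / K), natural log, constant factor 1 (both assumptions)."""
    return (n_shocks / n_features) * math.log(n_features / n_shocks)


K = 5
for Q in (100, 400, 10_000):
    print(f"Q={Q}: ratio = {asset_ratio(Q, K):.4f}")
```

The ratio shrinks as Q grows with K fixed, so the advantage of studying derivatives over Arrow securities is largest when shocks are sparse relative to the number of candidate features.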
SLIDE 20 Thanks!