Graphlet Screening (GS) Achieves Optimal Rate in Variable Selection

Jiashun Jin (Carnegie Mellon University)
Collaborated with Cun-Hui Zhang (Rutgers) and Qi Zhang (Univ. of Pittsburgh)
Variable selection: Y = Xβ + z, z ∼ N(0, In)
◮ p ≫ n ≫ 1
◮ signals are rare and weak
◮ let G = X′X be the Gram matrix
◮ diagonals of G are normalized to 1
◮ G is sparse (few large entries in each row)
Subset selection: minimize (1/2)‖Y − Xβ‖² + (λ²/2)‖β‖₀
◮ L0-penalization method
◮ Variants: Cp, AIC, BIC, RIC
◮ Computationally challenging
Mallows (1973), Akaike (1974), Schwarz (1978), Foster & George (1994)
The lasso: minimize (1/2)‖Y − Xβ‖² + λ‖β‖₁
◮ L1-penalization method; Basis Pursuit
◮ Widely used
◮ computationally efficient even when p is large
◮ in the noiseless case, recovers β exactly if the signals are sufficiently sparse
Chen et al. (1998); Tibshirani (1996); Donoho (2006)
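As a concrete illustration of the L1-penalized objective above, here is a minimal sketch that minimizes it by iterative soft-thresholding (ISTA). The function names and the toy problem are our own, not the talk's implementation:

```python
import numpy as np

def soft_threshold(x, t):
    """Entrywise soft-thresholding: the proximal map of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, Y, lam, n_iter=500):
    """Minimize (1/2)||Y - X beta||^2 + lam * ||beta||_1 by ISTA."""
    L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the smooth part
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - Y)
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

# Toy example: orthonormal design, two strong signals, no noise
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((50, 20)))   # orthonormal columns
beta_true = np.zeros(20)
beta_true[3], beta_true[10] = 5.0, -4.0
Y = X @ beta_true
beta_hat = lasso_ista(X, Y, lam=0.5)
```

With an orthonormal design the lasso solution is exactly the soft-thresholded marginal fit, so the estimate keeps the true support while shrinking each nonzero coordinate toward 0 by lam.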
◮ I. No signal
◮ II. One signal
◮ III. Two signals
◮ one-stage method
◮ one tuning parameter
◮ does not exploit ‘local’ graphical structure
‘local’: neighboring nodes in geodesic distance of a graph (TBD)
Graph of Strong Dependence (GOSD)
◮ V = {1, 2, . . . , p}: each variable is a node
◮ An edge between nodes i and j iff |G(i, j)| exceeds a threshold
◮ G = X′X sparse =⇒ GOSD is sparse
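To make the definition concrete, here is a minimal sketch of building the GOSD adjacency from the Gram matrix. The function name and the cutoff value δ are our own choices (the talk leaves the exact threshold to a later tuning discussion):

```python
import numpy as np

def gosd_adjacency(G, delta):
    """Nodes i, j are connected iff |G(i, j)| >= delta (i != j)."""
    A = np.abs(G) >= delta
    np.fill_diagonal(A, False)   # diagonals of G are normalized to 1; no self-loops
    return A

# Example: a penta-diagonal-style Gram matrix is sparse, so the GOSD is sparse too
p = 6
G = np.eye(p)
for i in range(p - 1):
    G[i, i + 1] = G[i + 1, i] = 0.4
A = gosd_adjacency(G, delta=0.2)
```

On this example each node is linked only to its immediate neighbors, so the GOSD is a path graph on 6 nodes.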
◮ Despite its sparsity, G is usually complicated
◮ Denote the support of β by S(β) = {j : βj ≠ 0}
◮ Key insight: restricted to the support, the GOSD decomposes into many disconnected small-size components
◮ gs-step: graphlet screening by sequential χ²-tests
◮ gc-step: graphlet cleaning by Penalized MLE
◮ Focus: rare and weak signals
Y = Xβ + z, X = Xn,p, z ∼ N(0, In); G : GOSD
◮ Fix m ≥ 1 (small)
◮ Let {Gt : 1 ≤ t ≤ T} be all connected subgraphs of the GOSD with size ≤ m
◮ arranged by size, ties broken lexicographically
Example (a 10-node graph): p = 10, m = 3, T = 30; {Gt, 1 ≤ t ≤ T}: {1}, {2}, . . . , {10}, {1, 2}, {1, 7}, . . . , {9, 10}, {1, 2, 4}, {1, 2, 7}, . . . , {8, 9, 10}
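The collection {Gt} can be enumerated by growing connected node sets one neighbor at a time. This is our own helper (the graph is passed as an adjacency dict, an assumption for the sketch); it sorts by size with lexicographic tie-breaking as on the slide:

```python
def connected_subgraphs(adj, m):
    """All connected node subsets of size <= m, sorted by size then
    lexicographically. `adj` maps each node to the set of its neighbors."""
    found = {frozenset([v]) for v in adj}      # size-1 subsets
    frontier = set(found)
    for _ in range(m - 1):
        nxt = set()
        for S in frontier:
            # nodes adjacent to S but not yet in S
            neighbors = set().union(*(adj[v] for v in S)) - S
            for u in neighbors:
                nxt.add(S | {u})
        found |= nxt
        frontier = nxt
    return sorted((tuple(sorted(S)) for S in found), key=lambda t: (len(t), t))

# Example: a path graph 1 - 2 - 3 - 4, with m = 3
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
subgraphs = connected_subgraphs(adj, m=3)
```

For the 4-node path this yields 4 singletons, 3 edges, and 2 connected triples (T = 9); disconnected sets such as {1, 3} are never generated.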
X = [x1, x2, . . . , xp]; {Gt}, 1 ≤ t ≤ T: all connected subgraphs with size ≤ m
◮ St−1: set of retained indices in the last stage
◮ F = Gt ∩ St−1: nodes accepted previously
◮ D = Gt \ F: nodes currently under investigation
◮ PF: projection from Rn to the subspace spanned by {xj : j ∈ F}
◮ Define T(Y; D, F) = ‖PGtY‖² − ‖PFY‖²
◮ Add the nodes in D to St−1 iff T(Y; D, F) exceeds a preset threshold
Once accepted, a node is kept until the end of the gs-step
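In code, the statistic compares the fit of Gt against the fit of the previously accepted nodes F alone. The helpers below are our own minimal sketch; the acceptance threshold (set on a later slide via 2qρ∗j log p) is left as a plain parameter:

```python
import numpy as np

def proj_sq_norm(X, cols, Y):
    """||P_S Y||^2, where P_S projects onto span{x_j : j in S}."""
    if not cols:
        return 0.0
    Xs = X[:, sorted(cols)]
    coef, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
    return float((Xs @ coef) @ Y)      # Y' P Y = ||P Y||^2 since P is a projection

def screening_stat(X, Y, Gt, S_prev):
    """T(Y; D, F) = ||P_{Gt} Y||^2 - ||P_F Y||^2, with F = Gt ∩ S_prev."""
    F = set(Gt) & set(S_prev)
    return proj_sq_norm(X, set(Gt), Y) - proj_sq_norm(X, F, Y)

def gs_update(X, Y, Gt, S_prev, threshold):
    """Accept D = Gt \\ F iff T(Y; D, F) exceeds the threshold."""
    if screening_stat(X, Y, Gt, S_prev) > threshold:
        return set(S_prev) | set(Gt)
    return set(S_prev)

# Toy check with an orthonormal design: Y loads only on x_0
X = np.eye(5)
Y = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
retained = gs_update(X, Y, Gt={0}, S_prev=set(), threshold=4.0)
```

Here T(Y; {0}, ∅) = 9 > 4, so node 0 is retained; adding node 1 afterwards would contribute nothing, since T(Y; {0, 1}, {0}) = 0.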
◮ Marginal screening
◮ ineffective (neglects ‘local’ graphical structure)
◮ ‘brute-force’ m-variate screening is computationally infeasible (about p^m subsets)
◮ gs-step
◮ only screens connected subgraphs of G
◮ if the maximum degree of G is ≤ K, then there are only ≤ Cp(eK)^m connected subgraphs with size ≤ m
Fan & Lv (2008), Wasserman & Roeder (2009), Frieze & Molloy (1999)
◮ Sure Screening (SS): S∗ retains all but a negligible fraction of the signals
◮ Separable After Screening (SAS): as a subgraph of the GOSD, S∗ splits into many disconnected small-size components
G = X′X; I0 ⊂ S∗: a component. G^{I0}: row restriction; G^{I0,I0}: row & column restriction
◮ Restrict the regression X′Y = Gβ + X′z to I0
◮ (X′z)^{I0} ∼ N(0, G^{I0,I0}) since z ∼ N(0, In)
◮ Key: (Gβ)^{I0} ≈ G^{I0,I0} β^{I0}
◮ Result: many small-size regressions: (X′Y)^{I0} ≈ G^{I0,I0} β^{I0} + N(0, G^{I0,I0})
◮ I0, J0 ⊂ S∗: distinct components
◮ By the SS property, β ≈ 0 outside S∗
◮ By the SAS property, G^{I0,J0} ≈ 0
Y = Xβ + z, z ∼ N(0, In)
◮ I0: a component of S∗; S∗: set of all survived nodes
◮ β^{I0}: restriction of β to rows (indices) in I0
◮ X^{∗,I0}: restriction of X to columns in I0
◮ j ∉ S∗: set β̂j = 0
◮ j ∈ S∗: estimate β^{I0} by minimizing (1/2)‖Y − X^{∗,I0}ξ‖² + (u^gs)²/2 · ‖ξ‖₀ over ξ whose nonzero coordinates all have magnitude ≥ v^gs
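Because each component I0 is small, the penalized MLE can be solved by brute force over supports. The sketch below is our own rendering of the objective above (the talk's exact formulation may differ, e.g. in how Y is localized to the component):

```python
import numpy as np
from itertools import combinations

def gc_step(X, Y, I0, u, v):
    """Minimize (1/2)||Y - X^{*,I0} xi||^2 + (u^2/2)||xi||_0 over xi whose
    nonzero entries all have magnitude >= v, by brute force over supports."""
    I0 = sorted(I0)
    best_obj, best_xi = 0.5 * float(Y @ Y), np.zeros(len(I0))   # xi = 0 candidate
    for k in range(1, len(I0) + 1):
        for supp in combinations(range(len(I0)), k):
            Xs = X[:, [I0[j] for j in supp]]
            coef, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
            if np.min(np.abs(coef)) < v:
                continue                       # violates the |xi_j| >= v constraint
            resid = Y - Xs @ coef
            obj = 0.5 * float(resid @ resid) + 0.5 * u ** 2 * k
            if obj < best_obj:
                xi = np.zeros(len(I0))
                xi[list(supp)] = coef
                best_obj, best_xi = obj, xi
    return dict(zip(I0, best_xi))

# Toy component: orthogonal design, two true signals inside I0 = {0, 2}
X = np.eye(4)
Y = np.array([3.0, 0.0, 2.5, 0.0])
beta_hat = gc_step(X, Y, I0={0, 2}, u=2.0, v=1.0)
```

Enumerating all 2^|I0| supports is exactly why the SAS property matters: the cleaning cost is exponential in the component size, not in p.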
Random design: the rows Xi iid∼ N(0, (1/n)Ω), 1 ≤ i ≤ n
◮ Ω: unknown correlation matrix
◮ Ex: Compressive Sensing, Computer Security
Dinur and Nissim (2004), Nowak et al. (2007)
Rare/Weak signal model: β = b ◦ µ, bj iid∼ Bernoulli(ǫp), µ ∈ Θ∗p(τ, a)
◮ b ◦ µ ∈ Rp: (b ◦ µ)j = bjµj
◮ Θ∗p(τ, a) = {µ ∈ Rp : τ ≤ |µj| ≤ aτ}, a > 1
◮ Two key parameters: the signal rarity ǫp and the signal strength τ
◮ Signal rarity: ǫp = p−ϑ, 0 < ϑ < 1
◮ Signal weakness: τp = √(2r log p), r > 0
◮ Sample size: n = np = pθ, 0 < θ < 1
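A quick numeric illustration of these calibrations; the specific exponent values below are our own example, not from the talk:

```python
import numpy as np

# Calibrations from the Rare/Weak model:
#   eps_p = p^{-vartheta}   (signal rarity)
#   tau_p = sqrt(2 r log p) (minimum signal strength)
#   n     = p^theta         (sample size)
p, vartheta, r, theta = 10 ** 4, 0.5, 3.0, 0.9

eps_p = p ** (-vartheta)                 # expected fraction of signals
tau_p = np.sqrt(2 * r * np.log(p))      # weak: grows only like sqrt(log p)
n = int(round(p ** theta))              # p >> n regime when theta < 1
expected_signals = p * eps_p            # rare: sublinear in p
```

With p = 10⁴, ϑ = 1/2, r = 3, θ = 0.9 this gives about 100 expected signals of strength ≈ 7.4 among 10⁴ variables, with n ≈ 3981 < p.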
Minimax Hamming distance:
Hamm∗p(ϑ, θ, r, a, Ω) = inf over β̂ of sup over µ ∈ Θ∗p(τp, a) of the expected Hamming distance between sgn(β̂) and sgn(β)
For each j, define the exponent
ρ∗j = ρ∗j(ϑ, r; Ω) = min over {(S0, S1) : j ∈ S0 ∪ S1} of ρ(S0, S1, ϑ, r, a, Ω)
◮ not dependent on (θ, a) (mild regularity cond.)
◮ computable; has explicit form for some Ω
◮ (S∗0j, S∗1j): the least favorable pair (S0, S1) with j ∈ S0 ∪ S1 attaining ρ∗j
◮ Regularity: (S∗0j ∪ S∗1j) ∩ (S∗0k ∪ S∗1k) = ∅ for j ≠ k
Lower bound: suppose β = b ◦ µ, bj iid∼ Bernoulli(ǫp), µ ∈ Θ∗p(τp, a), with ǫp = p−ϑ and τp = √(2r log p). Then
Hamm∗p(ϑ, θ, r, a, Ω) ≥ Lp Σ_{j=1}^p p−ρ∗j
where Lp denotes a generic multi-log(p) term.
◮ Assume Σ_{j=1}^p |Ω(i, j)|^γ ≤ C for all i (sparse rows)
◮ gs-step: set thresholds at 2qρ∗j log p, 0 < q < 1
◮ gc-step: set u^gs = √(2ϑ log p) and v^gs = τp
◮ Both the SS and SAS properties hold
◮ Maximum degree of GOLF ≤ Lp
◮ GS achieves the optimal rate of convergence:
sup over µ ∈ Θ∗p(τp, a) of the expected Hamming error of GS ≤ Lp Σ_{j=1}^p p−ρ∗j
◮ (δ, m): flexible (e.g. δ = 1/ log(p), m = 3)
◮ Q: only needs to be in a certain range
◮ u^gs is relatively easy to estimate
◮ v^gs is relatively hard to estimate
◮ For certain Ω, ρ∗j and Hamm∗p(ϑ, θ, r, a, Ω) have explicit forms
◮ the phase boundaries involve the constants 5 − 2√6 ≈ 0.1 and 19 − 8√6 ≈ −0.6
◮ I. Region of No Recovery
◮ II. Region of Almost Full Recovery
◮ III. Region of Exact Recovery
[Figure: phase diagram in the (ϑ, r) plane, partitioned into the Exact Recovery, Almost Full Recovery, and No Recovery regions.]
[Figure: phase diagrams. Left: GS. Middle: subset selection (regions: Exact Recovery, Optimal, Non-optimal, No Recovery). Right: lasso (y-axis is prolonged). ǫp = p−ϑ, τp = √(2r log p), each signal ≥ τp.]
[Figure: simulated errors of LASSO vs. Graphlet Screening for τp = 6, 7, . . . , 12; p = 5000, n = 4000, pǫp = 250. Left to right: G is 2-by-2 block-wise, penta-diagonal, randomly generated (‘sprandsym’ in Matlab).]
◮ Main results are not tied to the Rare/Weak model
◮ Extension to non-random designs is mostly straightforward
◮ Successfully extended to cases where G is non-sparse:
◮ change-point problem
◮ long-memory time series
◮ factor model
Ke, Jin, Fan (2012)
◮ Proposed Graphlet Screening (GS) for variable selection
◮ Proved optimality of GS
◮ Key insight:
◮ the original model is decomposable, due to the interaction between signal sparsity and graph sparsity
◮ the minimax rate depends on X ‘locally’, so we have to exploit the local graphical structure
◮ Exposed intuition for the non-optimality of one-stage methods such as subset selection and the lasso