OPTIMAL DETECTION OF SPARSE PRINCIPAL COMPONENTS
Philippe Rigollet (joint with Quentin Berthet)
High dimensional data
Cloud of $n$ points in $\mathbb{R}^p$
Principal component
Principal component = direction of largest variance
Principal component analysis (PCA)
- Tool for dimension reduction
- Spectrum of covariance matrix
- Main tool for exploratory data analysis.
We study only the first principal component.
This talk: high-dimensional ($p \gg n$), finite-sample framework.
Testing for sphericity under rank-one alternative
$H_0 : \Sigma = I_p$ (isotropic)
$H_1 : \Sigma = I_p + \theta v v^\top$, $|v|_2 = 1$ (principal component $v$)
The model
- Observations: $X_1, \dots, X_n \sim \mathcal{N}_p(0, \Sigma)$ i.i.d.
- Estimator: empirical covariance matrix $\hat\Sigma = \frac{1}{n} \sum_{i=1}^{n} X_i X_i^\top$
If $n \gg p$, it is a consistent estimator.
If $n \simeq cp$, it is inconsistent (Nadler, Paul, Onatski, ...): eigenvectors are orthogonal.
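A minimal sketch of the model in plain Python may help fix ideas. It draws from $\mathcal{N}_p(0, I_p + \theta v v^\top)$ using the representation $X = Z + \sqrt{\theta}\, g\, v$ with $Z \sim \mathcal{N}_p(0, I_p)$ and $g \sim \mathcal{N}(0, 1)$ independent, which is one convenient sampling choice and is not taken from the slides. The function names and the values of `n`, `p`, `theta` are illustrative assumptions.

```python
import math
import random

def sample_spiked(n, theta, v, rng):
    """Draw n points X_i ~ N_p(0, I_p + theta * v v^T), assuming |v|_2 = 1.

    Uses X = Z + sqrt(theta) * g * v with Z ~ N_p(0, I_p), g ~ N(0, 1).
    Setting theta = 0 gives the null hypothesis.
    """
    p = len(v)
    xs = []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)  # scalar factor along the direction v
        xs.append([rng.gauss(0.0, 1.0) + math.sqrt(theta) * g * v[j] for j in range(p)])
    return xs

def empirical_covariance(xs):
    """Sigma_hat = (1/n) * sum_i X_i X_i^T (no centering: the mean is 0 in the model)."""
    n, p = len(xs), len(xs[0])
    return [[sum(x[a] * x[b] for x in xs) / n for b in range(p)] for a in range(p)]

rng = random.Random(0)
v = [1.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical unit-norm direction
sigma_hat = empirical_covariance(sample_spiked(2000, 2.0, v, rng))
```

Here $n$ is much larger than $p$, so `sigma_hat` should be close to $I_p + \theta v v^\top$; the regime of interest in the talk is the opposite one.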
Empirical spectrum under the null
Spectrum of $\hat\Sigma$ under $H_0 : \Sigma = I_p$: Marchenko-Pastur distribution, as $p/n \to \alpha > 0$.
[Figure: empirical spectrum of $\hat\Sigma$ under the null.]
The BBP (Baik, Ben Arous, Péché) transition
Empirical spectrum under the alternative
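$H_1 : \Sigma = I_p + \theta v v^\top$, $|v|_2 = 1$
- $\theta \le \sqrt{\alpha}$: indistinguishable from the null
- $\theta > \sqrt{\alpha}$: detection possible
Detection requires $\theta > \sqrt{p/n}$: only if very strong signal!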
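The transition can be made concrete with the classical almost-sure limits of the largest empirical eigenvalue in the spiked covariance model. These closed forms come from the standard literature on the BBP transition and are not written on the slide, so treat the function below as a sketch under that assumption; the function name is hypothetical.

```python
import math

def bbp_top_eigenvalue_limit(theta, alpha):
    """Limit of lambda_max(Sigma_hat) when p/n -> alpha and Sigma = I_p + theta v v^T."""
    if theta <= math.sqrt(alpha):
        # Below the threshold: the top eigenvalue sticks to the
        # right edge of the Marchenko-Pastur bulk, as under the null.
        return (1.0 + math.sqrt(alpha)) ** 2
    # Above the threshold: the top eigenvalue separates from the bulk.
    return (1.0 + theta) * (1.0 + alpha / theta)

print(bbp_top_eigenvalue_limit(0.1, 0.25))  # below threshold sqrt(0.25) = 0.5
print(bbp_top_eigenvalue_limit(1.0, 0.25))  # above threshold
```

The two branches agree at $\theta = \sqrt{\alpha}$, which is why the largest eigenvalue carries no information about the spike below that level.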
Testing for sparse principal component
$H_0 : \Sigma = I_p$ (isotropic)
$H_1 : \Sigma = I_p + \theta v v^\top$, $|v|_2 = 1$, $|v|_0 \le k$ (sparse principal direction)
What is the minimum detection level $\theta$?
Goal: find a statistic $\varphi : \mathbf{S}_p^+ \to \mathbb{R}$ such that
- $\varphi(\hat\Sigma)$ is small under $H_0$: $\mathbb{P}_{H_0}(\varphi(\hat\Sigma) < \tau_0) \ge 1 - \delta$
- $\varphi(\hat\Sigma)$ is large under $H_1$: $\mathbb{P}_{H_1}(\varphi(\hat\Sigma) > \tau_1) \ge 1 - \delta$
If $\tau_0 \le \tau \le \tau_1$, take the test $\psi(\hat\Sigma) = \mathbf{1}\{\varphi(\hat\Sigma) > \tau\}$. It satisfies:
$$\mathbb{P}_{H_0}(\psi = 1) \vee \max_{|v|_2 = 1,\ |v|_0 \le k} \mathbb{P}_{H_1}(\psi = 0) \le \delta$$
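The error bound for the thresholded test follows in two lines from the two probability bounds on the statistic. The derivation below is a short verification sketch that uses only those bounds and the ordering $\tau_0 \le \tau \le \tau_1$.

```latex
\begin{align*}
\mathbb{P}_{H_0}(\psi = 1)
  &= \mathbb{P}_{H_0}\big(\varphi(\hat\Sigma) > \tau\big)
   \le \mathbb{P}_{H_0}\big(\varphi(\hat\Sigma) \ge \tau_0\big)
   = 1 - \mathbb{P}_{H_0}\big(\varphi(\hat\Sigma) < \tau_0\big) \le \delta, \\
\mathbb{P}_{H_1}(\psi = 0)
  &= \mathbb{P}_{H_1}\big(\varphi(\hat\Sigma) \le \tau\big)
   \le \mathbb{P}_{H_1}\big(\varphi(\hat\Sigma) \le \tau_1\big)
   = 1 - \mathbb{P}_{H_1}\big(\varphi(\hat\Sigma) > \tau_1\big) \le \delta.
\end{align*}
```

The second line holds for every $k$-sparse unit vector $v$, which gives the maximum over $v$ in the displayed bound.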
Sparse eigenvalue
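$k$-sparse eigenvalue:
$$\varphi(\hat\Sigma) = \lambda^k_{\max}(\hat\Sigma) = \max_{|x|_2 = 1,\ |x|_0 \le k} x^\top \hat\Sigma\, x = \max_{|S| = k} \lambda_{\max}(\hat\Sigma_S)$$
Note that: $\lambda^k_{\max}(I_p) = 1$ and $\lambda^k_{\max}(I_p + \theta v v^\top) = 1 + \theta$.
Smaller fluctuations than the largest eigenvalue $\lambda_{\max}(\hat\Sigma)$.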
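The second expression for the $k$-sparse eigenvalue, a maximum over supports $S$ of size $k$, translates directly into a brute-force computation. The sketch below uses only the standard library; the power iteration is a simple stand-in for a linear algebra routine (it assumes a symmetric positive semidefinite input), and all function names are illustrative.

```python
import itertools
import random

def lambda_max_psd(m, iters=500, seed=0):
    """Largest eigenvalue of a symmetric PSD matrix, by power iteration."""
    rng = random.Random(seed)
    d = len(m)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(iters):
        y = [sum(m[i][j] * x[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in y) ** 0.5
        if norm == 0.0:  # zero matrix
            return 0.0
        x = [c / norm for c in y]
    # Rayleigh quotient x^T m x, with |x|_2 = 1
    return sum(x[i] * m[i][j] * x[j] for i in range(d) for j in range(d))

def k_sparse_eigenvalue(a, k):
    """max over supports S with |S| = k of lambda_max(a restricted to S x S)."""
    best = 0.0
    for s in itertools.combinations(range(len(a)), k):
        sub = [[a[i][j] for j in s] for i in s]  # principal submatrix a_S
        best = max(best, lambda_max_psd(sub))
    return best

theta = 0.5
v = [2 ** -0.5, 2 ** -0.5, 0.0, 0.0, 0.0]  # 2-sparse unit vector
a = [[(1.0 if i == j else 0.0) + theta * v[i] * v[j] for j in range(5)] for i in range(5)]
print(k_sparse_eigenvalue(a, 2))  # should be close to 1 + theta, per the identity above
```

The loop visits every support of size $k$, so this is only usable for very small $p$ and $k$.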
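Upper bounds w.p. $1 - \delta$
Under the null hypothesis:
$$\lambda^k_{\max}(\hat\Sigma) \le 1 + 8\sqrt{\frac{k \log(9ep/k) + \log(1/\delta)}{n}} =: \tau_0$$
Under the alternative hypothesis:
$$\lambda^k_{\max}(\hat\Sigma) \ge 1 + \theta - 2(1 + \theta)\sqrt{\frac{\log(1/\delta)}{n}} =: \tau_1$$
Can detect as soon as $\tau_0 < \tau_1$, which yields
$$\theta \ge C\sqrt{\frac{k \log(p/k)}{n}}$$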
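To see how the condition $\tau_0 < \tau_1$ turns into a bound on $\theta$, the two thresholds can be evaluated numerically. Writing $s = \sqrt{\log(1/\delta)/n}$, the inequality $\tau_1 \ge \tau_0$ rearranges to $\theta \ge (\tau_0 - 1 + 2s)/(1 - 2s)$ when $2s < 1$. The sketch below implements this with the constants shown on the slide; the function names and the sample values of `n`, `p`, `k`, `delta` are illustrative assumptions.

```python
import math

def tau0_sparse(n, p, k, delta):
    """Upper threshold for lambda^k_max(Sigma_hat) under the null."""
    return 1.0 + 8.0 * math.sqrt((k * math.log(9 * math.e * p / k) + math.log(1 / delta)) / n)

def tau1(n, theta, delta):
    """Lower threshold for lambda^k_max(Sigma_hat) under the alternative."""
    return 1.0 + theta - 2.0 * (1.0 + theta) * math.sqrt(math.log(1 / delta) / n)

def min_detectable_theta(n, p, k, delta):
    """Smallest theta with tau1 >= tau0, or None if 2*sqrt(log(1/delta)/n) >= 1."""
    s = math.sqrt(math.log(1 / delta) / n)
    if 2 * s >= 1:
        return None  # tau1 does not increase with theta in this regime
    return (tau0_sparse(n, p, k, delta) - 1.0 + 2 * s) / (1 - 2 * s)

print(min_detectable_theta(10000, 500, 10, 0.05))
```

For large $n$ the factor $1 - 2s$ is close to one, and the bound is driven by the term $\sqrt{k \log(p/k)/n}$.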
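Minimax lower bound
Fix $\nu > 0$ (small). Then there exists a constant $C_\nu > 0$ such that if
$$\theta < \bar\theta := \sqrt{\frac{k \log(C_\nu p/k^2 + 1)}{n}} \wedge \frac{1}{\sqrt{2}}$$
then
$$\inf_{\psi} \Big\{ \mathbb{P}^n_0(\psi = 1) \vee \max_{|v|_2 = 1,\ |v|_0 \le k} \mathbb{P}^n_v(\psi = 0) \Big\} \ge \frac{1}{2} - \nu$$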
See also Arias-Castro, Bubeck and Lugosi (12)
Computational issues
To compute $\lambda^k_{\max}(\hat\Sigma)$, need to compute $\binom{p}{k}$ eigenvalues.
Can be used to find cliques in graphs: NP-complete problem. Need an approximation...
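The size of the search space is easy to check with the standard library. The values of $p$ and $k$ below are illustrative choices, not figures from the talk.

```python
import math

def num_supports(p, k):
    """Number of supports S of size k among p coordinates, i.e. binomial(p, k)."""
    return math.comb(p, k)

print(num_supports(50, 7))    # one eigenvalue problem per support
print(num_supports(500, 22))  # already far beyond exhaustive enumeration
```

Each support requires the largest eigenvalue of a $k \times k$ matrix, so the total cost grows combinatorially in $p$.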
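Semidefinite relaxation 101
$$\lambda^k_{\max}(A) = \max\ x^\top A x \quad \text{subject to} \quad x^\top x = 1,\ |x|_0 \le k$$
Substitute $Z = x x^\top$ and relax: $|x|_0 \le k$ implies $|Z|_1 \le k$ (Cauchy-Schwarz), and the constraint $\mathrm{rank}(Z) = 1$ is dropped.
$$\mathrm{SDP}_k(A) = \max\ \mathrm{Tr}(AZ) \quad \text{subject to} \quad \mathrm{Tr}(Z) = 1,\ |Z|_1 \le k,\ Z \succeq 0$$
Semidefinite program (SDP) introduced by d'Aspremont, El Ghaoui, Jordan and Lanckriet (2004).
Testing procedure: $\mathbf{1}\{\mathrm{SDP}_k(\hat\Sigma) > \tau\}$
Defined even if the solution of the SDP has rank > 1.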
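The Cauchy-Schwarz step can be spelled out. The short derivation below checks that $Z = x x^\top$ satisfies the relaxed constraints whenever $|x|_2 = 1$ and $|x|_0 \le k$, with $|Z|_1$ read as the sum of the absolute values of the entries of $Z$.

```latex
\begin{align*}
\mathrm{Tr}(Z) &= \mathrm{Tr}(x x^\top) = x^\top x = 1, \qquad
\mathrm{Tr}(AZ) = \mathrm{Tr}(A x x^\top) = x^\top A x, \\
|Z|_1 &= \sum_{i,j} |x_i|\,|x_j| = |x|_1^2
  \le |x|_0\,|x|_2^2 \le k \quad \text{(Cauchy-Schwarz on the support of } x\text{)}.
\end{align*}
```

Every feasible $x$ of the original problem therefore gives a feasible $Z$ with the same objective value, so $\mathrm{SDP}_k(A) \ge \lambda^k_{\max}(A)$.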
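Performance of SDP
For the null: use the dual (Bach et al. 2010)
$$\mathrm{SDP}_k(A) = \min_{U \in \mathbf{S}_p^+} \{\lambda_{\max}(A + U) + k|U|_\infty\}$$
For any $U \in \mathbf{S}_p^+$ this gives an upper bound on $\mathrm{SDP}_k(\hat\Sigma)$.
Enough to look only at the minimum dual perturbation
$$\mathrm{MDP}_k(\hat\Sigma) = \min_{z \ge 0} \{\lambda_{\max}(\mathrm{st}_z(\hat\Sigma)) + kz\}$$
For the alternative: $\mathrm{SDP}_k(\hat\Sigma)$ is a relaxation of $\lambda^k_{\max}(\hat\Sigma)$, so $\mathrm{SDP}_k(\hat\Sigma) \ge \lambda^k_{\max}(\hat\Sigma)$.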
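The minimum dual perturbation only needs a one-dimensional search over $z$, which the following sketch approximates on a grid. It assumes that $\mathrm{st}_z$ denotes entrywise soft-thresholding at level $z$ (the slide does not define it). The grid search and the shifted power iteration are simplifications chosen to keep the example dependency-free, so the result is approximate; all function names are hypothetical.

```python
import random

def soft_threshold(a, z):
    """Entrywise soft-thresholding: sign(a_ij) * max(|a_ij| - z, 0)."""
    def st(t):
        return (abs(t) - z if abs(t) > z else 0.0) * (1.0 if t >= 0 else -1.0)
    return [[st(t) for t in row] for row in a]

def lambda_max_sym(m, iters=500, seed=0):
    """Approximate largest eigenvalue of a symmetric matrix.

    Power iteration on m + shift * I; by Gershgorin's theorem the shift
    (largest absolute row sum) makes the shifted matrix PSD.
    """
    d = len(m)
    shift = max(sum(abs(t) for t in row) for row in m)
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        y = [sum(m[i][j] * x[j] for j in range(d)) + shift * x[i] for i in range(d)]
        norm = sum(c * c for c in y) ** 0.5
        if norm == 0.0:  # only happens for the zero matrix
            return 0.0
        x = [c / norm for c in y]
    return sum(x[i] * m[i][j] * x[j] for i in range(d) for j in range(d))

def mdp(a, k, grid=100):
    """Grid approximation of MDP_k(a) = min_{z >= 0} lambda_max(st_z(a)) + k * z."""
    z_max = max(abs(t) for row in a for t in row)  # st_z(a) = 0 for z >= z_max
    zs = [z_max * i / grid for i in range(grid + 1)]
    return min(lambda_max_sym(soft_threshold(a, z)) + k * z for z in zs)

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(mdp(identity, 2))  # for the identity the minimum is attained at z = 0
```

Restricting the grid to $[0, z_{\max}]$ loses nothing, since beyond $z_{\max}$ the thresholded matrix is zero and the objective $kz$ only increases.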
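Upper bounds w.p. $1 - \delta$, for $*\mathrm{DP} \in \{\mathrm{SDP}, \mathrm{MDP}\}$
Under the null hypothesis:
$$*\mathrm{DP}_k(\hat\Sigma) \le 1 + 10\sqrt{\frac{k^2 \log(ep/\delta)}{n}} =: \tau_0$$
Under the alternative hypothesis:
$$*\mathrm{DP}_k(\hat\Sigma) \ge 1 + \theta - 2(1 + \theta)\sqrt{\frac{\log(1/\delta)}{n}} =: \tau_1$$
Can detect as soon as $\tau_0 < \tau_1$, which yields
$$\theta \ge C\sqrt{\frac{k^2 \log(p/k)}{n}}$$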
[Figure: ratio H1/H0 versus $\theta$, with curves for $\mathrm{SDP}_k$, $\mathrm{MDP}_k$ and $\lambda_{\max}(\cdot)$.]
Ratio of 5% quantile under $H_1$ over 95% quantile under $H_0$, versus signal strength $\theta$. When this ratio is larger than one, both type I and type II errors are below 5%.
A. d'Aspremont, HDR defense (Soutenance HDR), ENS Cachan, Nov. 2012.
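Summary
Detection levels for $\theta$:
- below $\sqrt{\frac{k}{n}\log\left(\frac{p}{k}\right)}$: no detection
- above $\sqrt{\frac{k}{n}\log\left(\frac{p}{k}\right)}$: detection with $\lambda^k_{\max}$
- above $\sqrt{\frac{k^2}{n}\log\left(\frac{p}{k}\right)}$: detection with $*\mathrm{DP}_k$
Can we tighten the gap?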
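The size of the gap is easiest to see numerically: the two detection levels differ by a factor $\sqrt{k}$. The sketch below evaluates both rates with the unspecified constant $C$ set to 1, which is an assumption made purely for illustration, as are the function names and sample values.

```python
import math

def rate_sparse_eigenvalue(n, p, k):
    """Detection level of lambda^k_max, up to a constant: sqrt(k * log(p/k) / n)."""
    return math.sqrt(k * math.log(p / k) / n)

def rate_convex_relaxation(n, p, k):
    """Detection level of SDP_k / MDP_k, up to a constant: sqrt(k^2 * log(p/k) / n)."""
    return math.sqrt(k * k * math.log(p / k) / n)

n, p, k = 1000, 500, 22
gap = rate_convex_relaxation(n, p, k) / rate_sparse_eigenvalue(n, p, k)
print(gap, math.sqrt(k))  # the ratio of the two rates is sqrt(k)
```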
Numerical evidence
Fix type I error at 1%, plot type II error of $\mathrm{MDP}_k$ for $p \in \{50, 100, 200, 500\}$, $k = \sqrt{p}$.
[Figure: two panels of type II error, one plotted against $\frac{k^2}{n}\log\left(\frac{p}{k}\right)$ (proved scaling), the other against $\frac{k}{n}\log\left(\frac{p}{k}\right)$ (minimax optimal scaling).]
Random graphs
A random (Erdős-Rényi) graph on $N$ vertices is obtained by drawing edges at random with probability 1/2.
The largest clique is of size $2 \log N$ asymptotically almost surely ($\approx 7.8$ for $N = 50$).
Hidden clique
We can hide a clique (here of size 10) in this graph: choose points arbitrarily, draw a clique, and embed it in the original random graph.
Question: is there a hidden clique in this graph?
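The construction can be sketched in a few lines of standard-library Python. The slide says the clique vertices are chosen arbitrarily; the sketch picks them uniformly at random, which is one possible choice. The function names are illustrative, and the sizes (50 vertices, clique of size 10) are the ones shown on the slides.

```python
import itertools
import random

def random_graph(n_vertices, rng):
    """Erdos-Renyi graph with edge probability 1/2, as a set of edges (i, j), i < j."""
    return {e for e in itertools.combinations(range(n_vertices), 2) if rng.random() < 0.5}

def plant_clique(edges, n_vertices, size, rng):
    """Pick `size` vertices at random and connect every pair among them."""
    members = sorted(rng.sample(range(n_vertices), size))
    clique_edges = set(itertools.combinations(members, 2))
    return edges | clique_edges, members

rng = random.Random(0)
graph = random_graph(50, rng)
planted, members = plant_clique(graph, 50, 10, rng)
```

The testing question is then whether `graph` and `planted` can be told apart without knowing `members`.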
Hidden clique problem
It is believed that it is hard to find/test the presence of a clique in a random graph (Alon, Arora, Feige, Hazan, Krauthgamer, ...). Cryptosystems are based on this fact!
Conjecture: it is hard to find cliques of size between $2 \log N$ and $\sqrt{N}$.
- Alon, Krivelevich, Sudakov 98
- Feige and Krauthgamer 00
- Dekel et al. 10
- Feige and Ron 10
- Ames and Vavasis 11
Canonical example of average-case complexity.
Hidden clique problem
It seems related to our problem but not trivially (the randomness structure is very fragile).
Note that all our results extend to sub-Gaussian r.v.
- Theorem. If we could prove that there exists $C > 0$ such that under the null hypothesis it holds
$$\mathrm{SDP}_k(\hat\Sigma) \le 1 + C\sqrt{\frac{k^\alpha \log(ep/\delta)}{n}}$$
for some $\alpha \in (1, 2)$, then it can be used to test the presence of a clique of size $\mathrm{polylog}(N)\, N^{\frac{1}{4 - \alpha}}$.
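A quick arithmetic check shows where this clique size sits relative to the conjectured hard range. It uses only the exponent in the theorem and ignores the polylogarithmic factor.

```latex
\[
\alpha \in (1, 2)
\;\Longrightarrow\;
\frac{1}{4 - \alpha} \in \Big(\frac{1}{3}, \frac{1}{2}\Big)
\;\Longrightarrow\;
N^{1/3} < N^{\frac{1}{4 - \alpha}} < N^{1/2} = \sqrt{N}.
\]
```

Up to polylogarithmic factors, such a bound would therefore give a test for cliques of size below $\sqrt{N}$, inside the range where the problem is conjectured to be hard.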