Stay on path: PCA along graph paths

Megasthenis Asteris, Anastasios Kyrillidis, Alexandros Dimakis (Electrical and Computer Engineering)
Han-Gyol Yi, Bharath Chandrasekaran (Communication Sciences and Disorders)
PCA: direction of maximum variance

Given n observations (datapoints) y_1, ..., y_n of p variables, find a new variable (feature) that captures most of the variance:

    maximize  xᵀ Σ̂ x   subject to  ‖x‖₂ = 1,

where Σ̂ = (1/n) · Σ_{i=1}^n y_i y_iᵀ is the empirical covariance matrix.
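As a concrete sketch (NumPy, synthetic data of my own choosing, not from the slides), the empirical covariance and its leading eigenvector can be computed as:

```python
import numpy as np

rng = np.random.default_rng(0)

# n observations of p variables; the data is deliberately
# stretched along the first coordinate axis
n, p = 500, 5
Y = rng.normal(size=(n, p)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])

# Empirical covariance: Sigma_hat = (1/n) * sum_i y_i y_i^T
Sigma_hat = (Y.T @ Y) / n

# The first principal component is the leading eigenvector (||x||_2 = 1);
# np.linalg.eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)
x = eigvecs[:, -1]
```

Here `x` ends up dominated by the first coordinate, the direction of largest variance.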
Sparse PCA: sparse direction of maximum variance

    maximize  xᵀ Σ̂ x   subject to  ‖x‖₂ = 1,  ‖x‖₀ = k.

The cardinality constraint makes the problem NP-hard.
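To make the combinatorial nature of the constraint concrete, here is a hedged brute-force sketch (mine, not the paper's algorithm) that exhausts all k-subsets of coordinates; its cost grows exponentially in p, which is exactly why sparse PCA needs approximation methods:

```python
import itertools
import numpy as np

def sparse_pca_bruteforce(Sigma, k):
    """Maximize x^T Sigma x s.t. ||x||_2 = 1, ||x||_0 <= k, by trying
    every k-subset of coordinates (C(p, k) subproblems, so this is
    only feasible for tiny p)."""
    p = Sigma.shape[0]
    best_val, best_x = -np.inf, None
    for support in itertools.combinations(range(p), k):
        idx = list(support)
        # restricted to a fixed support, the problem is an ordinary
        # leading-eigenvector computation on the k x k submatrix
        vals, vecs = np.linalg.eigh(Sigma[np.ix_(idx, idx)])
        if vals[-1] > best_val:
            best_val = vals[-1]
            best_x = np.zeros(p)
            best_x[idx] = vecs[:, -1]
    return best_x, best_val
```

For example, with a diagonal covariance the optimum simply picks the largest diagonal entry inside the best support.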
Why sparsity?

[ Statistician ] Recovery of the "true" PC in high dimensions, where # observations ≪ # variables: better sample complexity.
[ Engineer ] The extracted feature is more interpretable; it depends on only a few of the original variables.

More structure...? Real signals often carry structure beyond plain sparsity: e.g., wavelets of natural images, block structures, periodic neuronal spikes, ...
[Baraniuk et al., 2008; Kyrillidis et al., 2014; Friedman et al., 2010; ...]
Graph Path PCA

Impose structure through a directed acyclic graph G on the p variables x_1, ..., x_p, with a designated source S and target T: the active variables (the support of x) must form an S→T path in G.
Example: a stock universe divided in sectors, with the constraint of picking 1 stock per sector (e.g., Chase, BofA, or UBS from BANKS; Chevron or Shell from ENERGY). Arranging the sectors as consecutive layers between S and T, every valid selection corresponds to an S→T path.
The (p,k,d)-layer graph

A source vertex S, a target vertex T, and k layers of (p−2)/k vertices each, numbered 2, 3, ..., p−1 between S = 1 and T = p; every vertex has in- and out-degree d, with edges only between consecutive layers.
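One way such a layer graph could be realized in code (the cyclic-shift wiring below, and connecting S to the whole first layer, are illustrative assumptions; the paper's exact construction may differ):

```python
def layer_graph(p, k, d):
    """Build adjacency lists for a (p,k,d)-layer DAG: source S = 0,
    target T = p-1, and k layers of m = (p-2)/k vertices, each vertex
    pointing to d vertices of the next layer via cyclic shifts."""
    assert (p - 2) % k == 0, "p - 2 must be divisible by k"
    m = (p - 2) // k
    assert d <= m, "out-degree cannot exceed layer width"
    S, T = 0, p - 1
    layers = [[1 + l * m + j for j in range(m)] for l in range(k)]
    adj = {v: [] for v in range(p)}
    adj[S] = list(layers[0])          # S feeds the first layer
    for l in range(k - 1):
        for j, v in enumerate(layers[l]):
            # vertex j connects to d consecutive vertices (mod m)
            adj[v] = [layers[l + 1][(j + t) % m] for t in range(d)]
    for v in layers[-1]:
        adj[v] = [T]                  # last layer drains into T
    return adj
```

Every S→T path then visits exactly one vertex per layer, i.e., has k internal vertices.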
Spike along a path

Samples: y_i = √β · u_i · x* + z_i, where z_i is i.i.d. Gaussian noise and the signal x* is supported on a path of G; equivalently, the y_i are i.i.d. samples from N(0, β · x*x*ᵀ + I).

[ Theorem 1 ]  Let G be a (known) (p,k,d)-layer graph and x* an (unknown) signal supported on an S→T path of G. Observe a sequence y_1, ..., y_n of i.i.d. samples from N(0, β · x*x*ᵀ + I). Then

    n = O( log(p/k) + k·log d )

samples suffice for recovery of x*, versus Ω( k·log(p/k) ) for sparse PCA.

[ Theorem 2 ]  That many samples are also necessary.
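A small sampler for this spiked model, following the slide's generative form (the specific β, n, and x* below are arbitrary choices for illustration):

```python
import numpy as np

def sample_spiked(x_star, beta, n, rng):
    """Draw n i.i.d. samples y_i = sqrt(beta) * u_i * x_star + z_i,
    with scalar u_i ~ N(0, 1) and z_i ~ N(0, I). For unit x_star this
    is equivalent to y_i ~ N(0, beta * x_star x_star^T + I)."""
    p = x_star.size
    u = rng.normal(size=(n, 1))       # one spike coefficient per sample
    Z = rng.normal(size=(n, p))       # isotropic Gaussian noise
    return np.sqrt(beta) * u * x_star + Z
```

Sanity check: projecting the samples onto x* should show variance ≈ β + 1, while orthogonal directions have variance ≈ 1.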
A Power Method-based approach

Power iteration with a projection step:

    Input: Σ̂, initial x_0; i ← 0
    Repeat: w_i ← Σ̂ x_i;  x_{i+1} ← projection of w_i onto X(G);  i ← i + 1
    Output: x̂ ← x_{i+1}
The projection step

Project a p-dimensional w onto the feasible set:

    arg min_{x ∈ X(G)} ‖x − w‖₂.

Due to the constraints and Cauchy-Schwarz, this reduces to a longest (weighted) path problem on G with special weights; and since G is acyclic, that problem is solved efficiently by dynamic programming.
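A minimal sketch of this idea, under the assumption that the "special weights" are the squared entries w_j² on vertices (so the projection keeps the S→T path with maximum captured energy and normalizes w on that support); the function names and wiring are mine, not the paper's:

```python
import numpy as np

def project_path(w, adj, S, T, order):
    """Project w onto unit vectors supported on an S->T path of a DAG:
    find the path maximizing sum of w_j^2 over its vertices by dynamic
    programming in topological order, then normalize w on that support."""
    best = {S: w[S] ** 2}     # best path weight ending at each vertex
    pred = {S: None}          # predecessor pointers to recover the path
    for v in order:           # 'order' must be a topological order from S
        if v not in best:
            continue          # unreachable from S
        for u in adj.get(v, []):
            cand = best[v] + w[u] ** 2
            if cand > best.get(u, -np.inf):
                best[u] = cand
                pred[u] = v
    # walk back from T to recover the optimal path
    path, v = [], T
    while v is not None:
        path.append(v)
        v = pred[v]
    x = np.zeros_like(w, dtype=float)
    x[path] = w[path]
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 0 else x

def graph_power_method(Sigma, adj, S, T, order, iters=50, seed=0):
    """Power iteration with a projection onto X(G) after every step."""
    rng = np.random.default_rng(seed)
    x = project_path(rng.normal(size=Sigma.shape[0]), adj, S, T, order)
    for _ in range(iters):
        x = project_path(Sigma @ x, adj, S, T, order)
    return x
```

On a toy DAG with two S→T paths, the projection picks the path carrying most of w's energy and zeroes everything else.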
Synthetic experiments

Data generated according to the (p,k,d)-layer graph model (p = 1000, k = 50, d = 10; 100 Monte Carlo iterations).

[Figure: estimation error ‖x̂x̂ᵀ − x*x*ᵀ‖_F versus number of samples n (1000 to 5000), comparing the Graph Power Method against Low-D Sampling.]
Application: neuroimaging data*

ROIs extracted based on the Harvard-Oxford Atlas [Desikan et al., 2006]; the graph is built from distances between centers of mass of ROIs. The method identified core neural components.
*[Human Connectome Project, WU-Minn Consortium]

[ Future ]