Learning graphs from data:
A signal processing perspective
Graph Signal Processing Workshop Pittsburgh, PA, May 2017
Xiaowen Dong
MIT Media Lab
Introduction
What is the problem of graph learning?

Given a data matrix of M samples (observations) of N variables, together with a prior on the data (distribution, model, etc.), infer the graph behind the data: its topology/operator or an equivalent representation. The observations can then be viewed as graph signals x : V → R^N on the vertex set V = {v1, ..., vN}, and edges may carry positive or negative weights.

[Figure: an example graph on vertices v1–v9 carrying a signal, and a two-node example v1–v2 with negative and positive edge weights]
Motivating examples
- Objective: functional connectivity between brain regions. Input: fMRI recordings in these regions.
- Objective: behavioral similarity/influence between people. Input: individual history of activities.

How do we build/learn the graph?

Image credit: http://blog.myesr.org/mri-reveals-the-human-connectome/ and https://www.iconexperience.com
Classical tools: probabilistic graphical models
- Undirected graphical models: Markov random fields (MRF)
- Directed graphical models: Bayesian networks (BN)
- Factor graphs
Markov random fields

[Figure: a pairwise MRF on vertices v1–v5 with associated variables x1–x5]

Conditional independence: $(i, j) \notin E \Leftrightarrow x_i \perp x_j \mid \mathbf{x} \setminus \{x_i, x_j\}$

Pairwise MRF:
$$P(\mathbf{x} \mid \Theta) = \frac{1}{Z(\Theta)} \exp\Big( \sum_{i \in V} \theta_{ii} x_i^2 + \sum_{(i,j) \in E} \theta_{ij} x_i x_j \Big)$$

Gaussian graphical model with precision $\Theta$:
$$P(\mathbf{x} \mid \Theta) = \frac{|\Theta|^{1/2}}{(2\pi)^{N/2}} \exp\Big( -\frac{1}{2} \mathbf{x}^T \Theta \mathbf{x} \Big)$$

Learning the graph amounts to learning a sparse $\Theta$.
1972: covariance selection (Dempster)

Prune the smallest elements of the precision (inverse covariance) matrix: given a data matrix $X \sim \mathcal{N}(0, \Theta^{-1})$, where $\Theta$ is the ground-truth precision, compute $S^{-1}$, the inverse of the sample covariance, and threshold its smallest entries.

Not applicable when the sample covariance is not invertible!
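A minimal numerical sketch of this idea, assuming synthetic data from a chain-graph precision and an arbitrary pruning threshold (both illustrative choices, not from the talk):

```python
import numpy as np

# Illustrative sketch of Dempster-style covariance selection:
# the chain-graph precision and the pruning threshold are arbitrary choices.
rng = np.random.default_rng(0)
N, M = 10, 5000                       # variables, samples (M >> N, so S is invertible)

Theta = 2.0 * np.eye(N)               # ground-truth precision: a chain graph
for i in range(N - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = -0.9

X = rng.multivariate_normal(np.zeros(N), np.linalg.inv(Theta), size=M)

S = np.cov(X, rowvar=False)           # sample covariance (N x N)
Theta_hat = np.linalg.inv(S)          # only meaningful because M >> N

# "Covariance selection": keep only the largest off-diagonal entries.
pruned = np.where(np.abs(Theta_hat) > 0.3, Theta_hat, 0.0)
print("estimated edges:", np.count_nonzero(np.triu(pruned, k=1)))
```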
2006: ℓ1-regularized neighborhood regression (Meinshausen & Bühlmann)

Learning a graph = learning the neighborhood of each node.

[Figure: node v1 regressed on its potential neighbors v2–v5 with coefficients β12, β13, β14, β15]

LASSO regression for node 1:
$$\min_{\beta_1} \; \|X_1 - X_{\setminus 1} \beta_1\|_2^2 + \lambda \|\beta_1\|_1$$
where $X_1$ collects the observations at node 1 and $X_{\setminus 1}$ those at all other nodes.
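A minimal sketch of neighborhood regression with scikit-learn's Lasso, one regression per node; the OR-rule symmetrization and the penalty value are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hedged sketch of Meinshausen-Buhlmann neighborhood regression:
# regress each variable on all the others with an l1 penalty, then
# symmetrize the support to obtain the estimated edge set.
def neighborhood_selection(X, lam=0.1):
    """X: (M, N) data matrix, one row per sample. Returns a 0/1 adjacency."""
    M, N = X.shape
    B = np.zeros((N, N))
    for i in range(N):
        others = [j for j in range(N) if j != i]
        lasso = Lasso(alpha=lam, fit_intercept=False)
        lasso.fit(X[:, others], X[:, i])
        B[i, others] = lasso.coef_
    # OR rule: keep an edge if either regression selects it.
    A = ((np.abs(B) > 1e-8) | (np.abs(B.T) > 1e-8)).astype(int)
    np.fill_diagonal(A, 0)
    return A
```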
2008: ℓ1-regularized log-determinant (Banerjee; Friedman)

Estimation of a sparse precision matrix. The graphical LASSO maximizes the Gaussian likelihood of the data $X$ (with sample covariance $S$),
$$p(X \mid \Theta) \propto |\Theta|^{M/2} \exp\Big( -\frac{1}{2} \sum_{m=1}^{M} X^{(m)T} \Theta X^{(m)} \Big),$$
which leads to
$$\max_{\Theta} \; \log\det\Theta - \operatorname{tr}(S\Theta) - \rho \|\Theta\|_1$$
where the first two terms form the log-likelihood and the ℓ1 term promotes sparsity.

2011: quadratic approximation of the log-likelihood (Hsieh)
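The graphical LASSO is available in scikit-learn; a minimal sketch in which `alpha` plays (roughly) the role of ρ and the data are a placeholder:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Hedged sketch: sparse precision estimation via scikit-learn's GraphicalLasso.
X = np.random.default_rng(1).standard_normal((500, 10))  # placeholder data (M=500, N=10)

model = GraphicalLasso(alpha=0.1)
model.fit(X)                      # approximately maximizes log det(Theta) - tr(S Theta) - alpha*||Theta||_1

Theta_hat = model.precision_      # sparse precision estimate
adjacency = (np.abs(Theta_hat) > 1e-6).astype(int)
np.fill_diagonal(adjacency, 0)    # off-diagonal support = estimated edges
```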
2010: ℓ1-regularized logistic regression (Ravikumar)

Neighborhood learning for discrete variables.

[Figure: node v1 and its potential neighbors v2–v5 with coefficients β12, β13, β14, β15]

Regularized logistic regression for node 1:
$$\max_{\beta_1} \; \sum_{m} \log P_{\beta}(X_{1m} \mid X_{\setminus 1, m}) - \lambda \|\beta_1\|_1$$
where $P_\beta$ is the logistic function, $X_{1m}$ is the value at node 1 in sample $m$, and $X_{\setminus 1, m}$ collects the values at the other nodes.
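A minimal sketch for binary data, using one ℓ1-regularized logistic regression per node (scikit-learn's `C` is the inverse of the regularization strength λ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hedged sketch of Ravikumar-style neighborhood selection for binary data:
# one l1-regularized logistic regression per node, support = neighborhood.
def ising_neighborhoods(X, lam=0.1):
    """X: (M, N) matrix with entries in {-1, +1}. Returns a 0/1 adjacency."""
    M, N = X.shape
    B = np.zeros((N, N))
    for i in range(N):
        others = [j for j in range(N) if j != i]
        clf = LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear")
        clf.fit(X[:, others], X[:, i])
        B[i, others] = clf.coef_.ravel()
    A = ((np.abs(B) > 1e-8) | (np.abs(B.T) > 1e-8)).astype(int)
    np.fill_diagonal(A, 0)
    return A
```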
From an arbitrary precision matrix to a graph Laplacian!

In an attractive GMRF (Slawski and Hein 2015), the off-diagonal entries of the precision are non-positive, so the precision is equivalent to a graph topology with non-negative edge weights.
2011: ℓ1-regularized log-determinant with a Laplacian constraint (Lake)

The graph Laplacian L can itself serve as the precision, BUT it is singular. Hence:
$$\max_{\Theta} \; \log\det\Theta - \operatorname{tr}(S\Theta) - \rho \|\Theta\|_1 \quad \text{s.t.} \quad \Theta = L + \frac{1}{\sigma^2} I$$

[Figure: the precision estimated by the graphical LASSO vs. the Laplacian estimated by Lake et al.]

See also Slawski and Hein (2015).
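A hedged convex-programming sketch of this kind of Laplacian-constrained estimator, written with CVXPY; parametrizing the Laplacian by non-negative weights W and penalizing the weights instead of the full ||Θ||₁ are simplifications of this sketch, not necessarily the exact formulation of Lake et al.:

```python
import cvxpy as cp
import numpy as np

# Hedged sketch: precision = Laplacian + (1/sigma^2) I, with the Laplacian
# parametrized through non-negative edge weights W.
def learn_laplacian_lake(S, rho=0.1, sigma=1.0):
    """S: (N, N) sample covariance. Returns an estimated combinatorial Laplacian."""
    N = S.shape[0]
    W = cp.Variable((N, N), symmetric=True)          # candidate edge weights
    L = cp.diag(cp.sum(W, axis=1)) - W               # combinatorial Laplacian
    Theta = L + (1.0 / sigma**2) * np.eye(N)
    objective = cp.Maximize(
        cp.log_det(Theta) - cp.trace(S @ Theta) - rho * cp.sum(cp.abs(W))
    )
    constraints = [W >= 0, cp.diag(W) == 0]
    cp.Problem(objective, constraints).solve()
    W_hat = np.maximum(W.value, 0)
    return np.diag(W_hat.sum(axis=1)) - W_hat
```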
2009: quadratic-form criterion (Daitch): $\|LX\|_F^2 = \operatorname{tr}(X^T L^2 X)$

2013: quadratic-form criterion (Hu): $\operatorname{tr}(X^T L^s X) - \beta \|W\|_F$

Related: locally linear embedding [Roweis00]
2015–2016: the signal processing (GSP) perspective

Dong, Segarra, Pasdeloup, Egilmez, Mei, Kalofolias, Thanou, Baingana, Chepuri.

The observations are treated as graph signals $x : V \to \mathbb{R}^N$ living on the vertices of the unknown graph.

[Figure: a graph on vertices v1–v9 carrying a signal]
A signal processing perspective on the graph learning problem:
- topologies with non-negative weights
- domain interpretation
- learning problem
- non-Gaussian behavior
Signal representations on graphs

A graph signal can be represented by graph Fourier/wavelet atoms and the corresponding Fourier/wavelet coefficients [Coifman06, Narang09, Hammond11, Shuman13, Sandryhaila13], or by trained graph dictionary atoms and the corresponding dictionary coefficients [Zhang12, Thanou14].

[Figure: a signal on a five-vertex graph expressed as a combination of atoms]
Global smoothness of graph signals

The Laplacian quadratic form measures how smooth a signal $x : V \to \mathbb{R}^N$ is on the graph:
$$x^T L x = \frac{1}{2} \sum_{i,j} W_{ij} \big(x(i) - x(j)\big)^2$$

[Figure: the same signal on two different nine-vertex graphs, with $x^T L x = 1$ on one and $x^T L x = 21$ on the other]

Similar to previous approaches:
- Daitch (2009): $\min_L \operatorname{tr}(X^T L^2 X)$
- Hu (2013): $\min_L \operatorname{tr}(X^T L^s X) - \beta \|W\|_F$
- Lake (2010): $\max_{\Theta = L + \frac{1}{\sigma^2} I} \; \log\det\Theta - \frac{1}{M}\operatorname{tr}(X X^T \Theta) - \rho\|\Theta\|_1$
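A quick sanity check of the quadratic form on a small path graph (the graph and the two signals are illustrative choices):

```python
import numpy as np

# Computing the Laplacian quadratic form x^T L x for a smooth and a
# non-smooth signal on a simple path graph v1 - v2 - ... - v9.
def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

N = 9
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = laplacian(W)

x_smooth = np.linspace(0.0, 1.0, N)                            # varies slowly
x_rough = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=float)   # oscillates

print("smooth signal:", x_smooth @ L @ x_smooth)   # small quadratic form
print("rough  signal:", x_rough @ L @ x_rough)     # much larger value
```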
Learning graphs under a global smoothness model (Dong et al.)

The dictionary is the graph Fourier basis, $D(G) = \chi$, so each signal is modeled as $x = \chi c$.

[Figure: x = χ × c on a five-vertex graph G]

MAP estimate of the coefficients:
$$\min_{c} \; \|x - \chi c\|_2^2 - \log P_c(c)$$
which, rewritten as a quadratic form in the Laplacian, leads to the joint problem
$$\min_{L, Y} \; \|X - Y\|_F^2 + \alpha \operatorname{tr}(Y^T L Y) + \beta \|L\|_F^2$$
(data fidelity + smoothness on Y + regularization).

[Figure: a noisy observed signal x and its smooth version y on a nine-vertex graph]

Learning enforces a signal property (global smoothness)!
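A hedged sketch of alternating minimization for this objective: the Y-step is closed form, and the L-step is a small convex program; the Laplacian constraint set and the normalization tr(L) = N (used here to rule out the trivial solution L = 0) are assumptions of this sketch:

```python
import cvxpy as cp
import numpy as np

# Alternating minimization sketch for
#   min_{L,Y} ||X - Y||_F^2 + alpha*tr(Y^T L Y) + beta*||L||_F^2
def learn_graph_smooth(X, alpha=1.0, beta=1.0, n_iter=20):
    N = X.shape[0]                                  # X: (N, M), one signal per column
    L_hat = np.eye(N)
    for _ in range(n_iter):
        # Y-step: closed form, since the objective is quadratic in Y.
        Y = np.linalg.solve(np.eye(N) + alpha * L_hat, X)
        # L-step: convex program over valid combinatorial Laplacians.
        L = cp.Variable((N, N), symmetric=True)
        obj = cp.Minimize(alpha * cp.trace((Y @ Y.T) @ L) + beta * cp.sum_squares(L))
        cons = [cp.trace(L) == N,                   # normalization (assumption)
                L @ np.ones(N) == 0,                # rows sum to zero
                L - cp.diag(cp.diag(L)) <= 0]       # non-positive off-diagonals
        cp.Problem(obj, cons).solve()
        L_hat = L.value
    return L_hat, Y
```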
Structured precision matrices (Egilmez et al.)

$$\min_{\Theta} \; \operatorname{tr}(\Theta K) - \log\det\Theta \quad \text{s.t.} \quad K = S - \frac{\alpha}{2}(\mathbf{1}\mathbf{1}^T - I)$$

with $\Theta$ constrained to one of three classes (non-negative diagonal, non-positive off-diagonal entries):
- generalized Laplacian: $\Theta = L + V = \mathrm{Deg} - W + V$
- diagonally dominant generalized Laplacian: $\Theta = L + V = \mathrm{Deg} - W + V$ with $V \geq 0$
- combinatorial Laplacian: $\Theta = L = \mathrm{Deg} - W$

Generalizes the graphical LASSO and Lake; adding priors on the edge weights leads to an interpretation as MAP estimation.
Learning an unweighted graph

[Figures: ground-truth and learned graphs on nine vertices, and the x = χ × c representation]

- Similar in spirit to Dempster
- Good for learning an unweighted graph
- Explicit edge handling is desirable in some applications
Beyond global smoothness: observations generated by diffusion (local smoothness rather than the global one!)

[Figure: a signal at its initial stage on a nine-vertex graph, after heat diffusion, and after a general graph shift (e.g., A)]
Diffusion by powers of the normalized adjacency matrix (Pasdeloup et al.)

Each observation is a diffused version of an initial signal $c_m$:
$$D(G) = T_{k(m)} = W_{\mathrm{norm}}^{k(m)} \quad \text{(a polynomial of } W_{\mathrm{norm}}\text{)}$$

[Figure: x = D(G) × c on a five-vertex graph]

The covariance of the observations is then also a polynomial of $W_{\mathrm{norm}}$:
$$\Sigma = \mathbb{E}\Big[ \sum_{m=1}^{M} X^{(m)} X^{(m)T} \Big] = \sum_{m=1}^{M} W_{\mathrm{norm}}^{2k(m)}$$
so the sample covariance and $W_{\mathrm{norm}}$ share the same eigenvectors, which is what drives the inference.

More of a "graph-centric" learning framework: the cost function is on graph components instead of signals.
Diffusion by a graph shift operator (Segarra et al.)

$$D(G) = H(S_G) = \sum_{l=0}^{L-1} h_l S_G^{\,l}$$
(diffusion defined by a graph shift operator $S_G$ that can be arbitrary, but is practically W or L).

[Figure: x = H(S_G) × c on a five-vertex graph]

With white inputs the covariance is $\Sigma = H H^T$, so its eigenvectors $\{v_n\}$ also diagonalize $S_G$ and serve as "spectral templates":
$$\min_{S_G, \lambda} \; \|S_G\|_0 \quad \text{s.t.} \quad S_G = \sum_{n=1}^{N} \lambda_n v_n v_n^T$$

- Similar in spirit to Pasdeloup: same stationarity assumption, but a different inference framework due to a different D
- Can handle noisy or incomplete information on the spectral templates
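A hedged sketch of inference from spectral templates, with ℓ1 replacing ℓ0 and an arbitrary normalization to exclude the all-zero shift (both choices are assumptions of this sketch, not necessarily those of the paper):

```python
import cvxpy as cp
import numpy as np

# Sparsest shift operator sharing the eigenvectors of the sample covariance.
def learn_shift_from_templates(X):
    """X: (N, M) matrix of graph signals, one per column."""
    N = X.shape[0]
    Sigma = np.cov(X)                              # sample covariance
    _, V = np.linalg.eigh(Sigma)                   # spectral templates (eigenvectors)

    lam = cp.Variable(N)                           # unknown eigenvalues of the shift
    S = V @ cp.diag(lam) @ V.T                     # shift constrained to the templates
    constraints = [S - cp.diag(cp.diag(S)) >= 0,   # non-negative off-diagonals
                   cp.diag(S) == 0,                # no self-loops
                   cp.sum(S[0, :]) == 1]           # rule out the trivial solution (assumption)
    cp.Problem(cp.Minimize(cp.sum(cp.abs(S))), constraints).solve()
    return S.value
```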
Heat diffusion dictionaries (Thanou et al.)

$$D(G) = e^{-\tau L}$$

[Figure: x = e^{−τL} × c on a five-vertex graph]

Each signal is a sparse combination of heat kernels at several diffusion times $\tau_s$:
$$\min_{L, C, \tau} \; \|X - D(L) C\|_F^2 + \alpha \sum_{m=1}^{M} \|c_m\|_1 + \beta \|L\|_F^2 \quad \text{s.t.} \quad D = [e^{-\tau_1 L}, \ldots, e^{-\tau_S L}]$$
(data fidelity + sparsity on c + regularization)

- Still a diffusion-based model, but more "signal-centric"
- No assumption on eigenvectors/stationarity, but on signal structure and sparsity
- Can be extended to the general polynomial case (Maretic et al. 2017)
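A minimal sketch of the sparse-coding half of this model for a fixed Laplacian (the diffusion times and the penalty are illustrative; the joint update of L is not shown):

```python
import numpy as np
from scipy.linalg import expm
from sklearn.linear_model import Lasso

# Build a heat-kernel dictionary D = [expm(-tau_1 L), ..., expm(-tau_S L)]
# for a *given* Laplacian and sparse-code the signals against it.
def heat_dictionary(L, taus):
    return np.hstack([expm(-tau * L) for tau in taus])   # (N, N*S)

def sparse_code(X, L, taus=(0.5, 1.0, 2.0), alpha=0.05):
    """X: (N, M) graph signals, one per column. Returns the coefficient matrix C."""
    D = heat_dictionary(L, taus)
    C = np.zeros((D.shape[1], X.shape[1]))
    for m in range(X.shape[1]):
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(D, X[:, m])                 # min ||x_m - D c||^2 + alpha*||c||_1
        C[:, m] = lasso.coef_
    return C
```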
Time-varying interactions, e.g., influence between friends at different timestamps: structural equation models (SEM).
Causal dependence on past observations (Mei et al.)

$$D_s(G) = P_s(W), \qquad x[t] = \sum_{s=1}^{S} P_s(W)\, x[t-s]$$
i.e., each observation is explained by polynomials of W applied to the previous S observations.

[Figure: x[t] written as a sum of contributions from x[t−1], ..., x[t−S] on a nine-vertex graph]

$$\min_{W, a} \; \frac{1}{2} \sum_{k=S+1}^{K} \Big\| x[k] - \sum_{s=1}^{S} P_s(W)\, x[k-s] \Big\|_2^2 + \lambda_1 \|\operatorname{vec}(W)\|_1 + \lambda_2 \|a\|_1$$
(data fidelity + sparsity on W + sparsity on a, the polynomial coefficients of the $P_s$)

- Polynomial design similar in spirit to Pasdeloup and Segarra
- Good for inferring causal relations between signals
- Kernelized (nonlinear) version: Shen et al. (2016)
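As a heavily simplified, hedged sketch: with S = 1 and P₁(W) = W the model reduces to a sparse first-order vector autoregression, which can be fit node by node with the LASSO (the full polynomial model above is a harder, non-convex joint problem in W and a):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sparse VAR(1) approximation: x[k] ~ W x[k-1], fit row by row.
def sparse_var1(X, lam=0.05):
    """X: (N, K) multivariate time series, one time step per column."""
    N, K = X.shape
    past, present = X[:, :-1].T, X[:, 1:].T     # (K-1, N) design and targets
    W_hat = np.zeros((N, N))
    for i in range(N):
        lasso = Lasso(alpha=lam, fit_intercept=False)
        lasso.fit(past, present[:, i])          # predict node i from all nodes at t-1
        W_hat[i, :] = lasso.coef_
    return W_hat                                # directed, sparse weight estimate
```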
Switching structural equation models (Baingana et al.)

$$D(G) = W_{s(t)}$$
(topologies switch at each time between S discrete states)

$$x[t] = W_{s(t)}\, x[t] + B_{s(t)}\, y[t]$$
internal (neighbors) + external inputs.

[Figure: x expressed as W x plus an external term]

$$\min_{\{W_{s(t)}, B_{s(t)}\}} \; \frac{1}{2} \sum_{t=1}^{T} \big\| x[t] - W_{s(t)} x[t] - B_{s(t)} y[t] \big\|_F^2 + \sum_{s=1}^{S} \lambda_s \|W_s\|_1$$
(data fidelity + sparsity on W)

Good for inferring causal relations between signals as well as dynamic topologies.
Summary

| Methods | Signal model | Assumption | Learning output | Edge direction | Inference |
|---|---|---|---|---|---|
| Dong (2015) | Global smoothness | Gaussian | Laplacian | Undirected | Signal-centric |
| Kalofolias (2016) | Global smoothness | Gaussian | Adjacency | Undirected | Signal-centric |
| Egilmez (2016) | Global smoothness | Gaussian | Generalized Laplacian | Undirected | Signal-centric |
| Chepuri (2016) | Global smoothness | Gaussian | Adjacency | Undirected | Graph-centric |
| Pasdeloup (2015) | Diffusion by Adj. | Stationary | Normalized Adj./Laplacian | Undirected | Graph-centric |
| Segarra (2016) | Diffusion by graph shift operator | Stationary | Graph shift operator | Undirected | Graph-centric |
| Thanou (2016) | Heat diffusion | Sparsity | Laplacian | Undirected | Signal-centric |
| Mei (2015) | Time-varying | Dependent on previous states | Adjacency | Directed | Signal-centric |
| Baingana (2016) | Time-varying | Dependent on current int/ext info | Time-varying adjacency | Directed | Signal-centric |
GSP for graph learning: open directions

[Figure: learning input and learned graph, each on nine vertices]

- Learning input: e.g., obtained by sampling
- Learning output: graph representations, "ego-networks" (Shen 2017, Kalofolias 2017)
- Signal/graph model: localization in the vertex-frequency domain, bandlimited signals (Sardellitti 2017)
- Theoretical considerations (Rabbat 2017)
- Learning objective: e.g., clustering, dimensionality reduction, ranking; classification (Yankelevsky 2016), coding and compression (Rotondo 2015, Fracastoro 2016)
References

O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 9:485-516, 2008.
arXiv:1609.03448, 2016.
D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83-98, 2013.
S. I. Daitch, J. A. Kelner, and D. A. Spielman. Fitting a graph to vector data. In Proceedings of the International Conference on Machine Learning, 201-208, 2009.
X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst. Laplacian matrix learning for smooth graph signal representation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 3736-3740, 2015.
X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst. Learning Laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal Processing, 64(23):6160-6173, 2016.
In Proceedings of the Picture Coding Symposium, 2016.
J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.
D. K. Hammond, P. Vandergheynst, and R. Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129-150, 2011.
C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, and P. Ravikumar. Sparse inverse covariance matrix estimation using quadratic approximation. In Advances in Neural Information Processing Systems 24, 2330-2338, 2011.
for brain connectivity learning of Alzheimer's disease. In Proceedings of the IEEE International Symposium on Biomedical Imaging, 616-619, 2013.
In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2826-2830, 2017.
In Proceedings of the Annual Cognitive Science Conference, 2010.
In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 6523-6527, 2017.
In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 5495-5499, 2015.
N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34(3):1436-1462, 2006.
In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 441-444, 2009.
weighted graph topologies from observations of diffused signals. arXiv:1605.02569, 2016.
P. Ravikumar, M. J. Wainwright, and J. D. Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Annals of Statistics, 38(3):1287-1319, 2010.
block transform coding of images. In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 571-574, 2015.
A. Sandryhaila and J. M. F. Moura. Discrete signal processing on graphs. IEEE Transactions on Signal Processing, 61(7):1644-1656, 2013.
inferring effective brain network connectivity. arXiv:1610.06551, 2016.
Y. Shen, B. Baingana, and G. B. Giannakis. Kernel-based structural equation models for topology identification of directed networks. IEEE Transactions on Signal Processing, 65(10):2503-2516, 2017.
M. Slawski and M. Hein. Estimation of positive definite M-matrices and structure learning for attractive Gaussian Markov random fields. Linear Algebra and its Applications, 473:145-179, 2015.
D. Thanou, D. I. Shuman, and P. Frossard. Learning parametric dictionaries for signals on graphs. IEEE Transactions on Signal Processing, 62(15):3849-3862, 2014.
2016.
IEEE Transactions on Signal and Information Processing over Networks, 2(4):611-624, 2016.
X. Zhang, X. Dong, and P. Frossard. Learning of structured graph dictionaries. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 3373-3376, 2012.