CANDIDACY EXAM
Electrical Engineering Doctoral program (EDEE)
Deep Learning on Graphs
for Advanced Big Data Analysis
Student: Michaël Defferrard
Supervisor: Xavier Bresson
Advisor: Pierre Vandergheynst
EPFL LTS2 Laboratory, August 30, 2016
Outline
◮ State of Research
◮ Performed Research
◮ Further Research
◮ Objective: analyze and extract information for decision-making from large-scale and high-dimensional datasets
◮ Method: Deep Learning (DL), especially Convolutional Neural Networks (CNNs), on graphs
◮ Fields: Deep Learning and Graph Signal Processing (GSP)
◮ Important and growing class of data lies on irregular domains
◮ Natural graphs / networks
◮ Constructed (feature / data) graphs
◮ Modeling versatility: graphs model heterogeneous pairwise relationships
◮ Important problem: recent works, high demand
◮ Reproduce the breakthrough of DL beyond Computer Vision!
Formulate DL components on graphs (& discover alternatives)
Convolutional Neural Networks (CNNs)
◮ Localization: compact filters for low complexity
◮ Stationarity: translation invariance
◮ Compositionality: analysis with a filterbank
Challenges
◮ Generalize convolution, downsampling and pooling to graphs
◮ Evaluate the assumptions on graph signals
Gregor and LeCun 2010; Coates and Ng 2011; Bruna et al. 2013
◮ Group features based upon similarity
◮ Reduce the number of learned parameters
◮ Can use the graph adjacency matrix
◮ No weight-sharing / convolution / stationarity
Niepert, Ahmed, and Kutzkov 2016; Vialatte, Gripon, and Mercier 2016
Masci et al. 2015
◮ Generalization of CNNs to non-Euclidean manifolds
◮ Local geodesic system of polar coordinates to extract patches
◮ Tailored for geometry analysis and processing
Scarselli et al. 2009
◮ Recurrent Neural Networks (RNNs) on graphs
◮ Propagate node representations until convergence
◮ Representations used as features
Atwood and Towsley 2015
◮ Multiplication with powers (0 to H) of the transition matrix
◮ Diffused features multiplied by a weight vector of support H
◮ No pooling; followed by a fully connected layer

Tasks: node classification, graph classification, edge classification
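As a rough illustration of this diffusion scheme (my own sketch from the bullets above, not the authors' code; the toy graph, features and weights are made-up assumptions):

```python
import numpy as np

# Hypothetical toy adjacency matrix and node features.
W = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
X = np.array([[1.], [2.], [3.]])           # n x F node features

P = W / W.sum(axis=1, keepdims=True)       # transition matrix D^-1 W
H = 2                                      # diffusion support

# Stack the diffused features P^k X for k = 0..H, then weight each
# (hop, feature) pair, as the slide describes.
diffused = [X]
for _ in range(H):
    diffused.append(P @ diffused[-1])      # one more diffusion step
Z = np.stack(diffused, axis=1)             # n x (H+1) x F
weights = np.random.randn(H + 1, 1)        # trainable in a real model
node_repr = np.tanh((Z * weights).sum(axis=1))   # n x F representations
```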
Bruna et al. 2013; Henaff, Bruna, and LeCun 2015
◮ First spectral definition
◮ Introduced a supervised graph estimation strategy
◮ Experiments on image recognition, text categorization and bioinformatics
◮ Spline filter parametrization
◮ Agglomerative method for coarsening
Build on (Bruna et al. 2013) and (Henaff, Bruna, and LeCun 2015)
◮ Spectral formulation
◮ Computational complexity
◮ Localization
◮ Ad hoc coarsening & pooling
Proposed an efficient spectral generalization of CNNs to graphs
Main contributions
“Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering” Defferrard, Bresson, and Vandergheynst 2016
◮ Accepted for publication at NIPS 2016
◮ Presented by Xavier at SUTD and the University of Bergen
Peer Reviews
◮ “extend ... data driven, end-to-end learning with excellent learning complexity”
◮ “very clean, efficient parametrization [for] efficient learning and evaluation”
◮ “highly promising paper ... shows how to efficiently generalize the [convolution]”
◮ “the potential for significant impact is high”
◮ “new and upcoming area with only a few recent works”
Chung 1997
◮ G = (V, E, W): undirected and connected graph
◮ W ∈ R^{n×n}: weighted adjacency matrix
◮ D_{ii} = Σ_j W_{ij}: diagonal degree matrix
◮ x: V → R, x ∈ R^n: graph signal
◮ L = D − W ∈ R^{n×n}: combinatorial graph Laplacian
◮ L = I_n − D^{−1/2} W D^{−1/2}: normalized graph Laplacian
◮ L = UΛU^T, U = [u_0, …, u_{n−1}] ∈ R^{n×n}: graph Fourier basis
◮ x̂ = U^T x ∈ R^n: graph Fourier transform
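The definitions above translate directly into a few lines of NumPy; a minimal sketch on an assumed toy adjacency matrix W:

```python
import numpy as np

# Toy weighted adjacency matrix (assumed symmetric, nonnegative).
W = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
n = W.shape[0]

d = W.sum(axis=1)                      # degrees D_ii = sum_j W_ij
L_comb = np.diag(d) - W                # combinatorial Laplacian D - W
D_inv_sqrt = np.diag(1. / np.sqrt(d))
L_norm = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt  # normalized Laplacian

# Graph Fourier basis: eigenvectors of the symmetric Laplacian.
lam, U = np.linalg.eigh(L_norm)        # L = U diag(lam) U^T

x = np.random.randn(n)                 # a graph signal
x_hat = U.T @ x                        # graph Fourier transform
x_rec = U @ x_hat                      # inverse transform recovers x
```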
y = g_θ(L) x = g_θ(UΛU^T) x = U g_θ(Λ) U^T x

Non-parametric filter: g_θ(Λ) = diag(θ)

◮ Non-localized in vertex domain
◮ Learning complexity in O(n)
◮ Computational complexity in O(n²) (& memory)
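A sketch of this non-parametric filtering on a toy graph; the dense multiplications with U are exactly the O(n²) cost (and memory) listed above:

```python
import numpy as np

# Toy 4-node cycle graph; U as defined on the previous slide.
W = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W       # combinatorial Laplacian
lam, U = np.linalg.eigh(L)

n = 4
theta = np.random.randn(n)           # one free parameter per eigenvalue
x = np.random.randn(n)

# Forward and inverse transforms with the dense Fourier basis:
# y = U diag(theta) U^T x, at O(n^2) operations.
y = U @ (theta * (U.T @ x))
```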
g_θ(Λ) = Σ_{k=0}^{K−1} θ_k Λ^k

◮ Value at j of g_θ centered at i: (g_θ(L) δ_i)_j = (g_θ(L))_{i,j} = Σ_k θ_k (L^k)_{i,j}
◮ d_G(i, j) > K implies (L^K)_{i,j} = 0 (Hammond, Vandergheynst, and Gribonval 2011, Lemma 5.2)
◮ K-localized
◮ Learning complexity in O(K)
◮ Computational complexity in O(n²)
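A small sketch of this monomial filter on an assumed path graph; the print at the end illustrates the K-localization stated by the lemma:

```python
import numpy as np

# Monomial filter g_θ(L) = Σ_k θ_k L^k applied with repeated
# mat-vecs on a toy path graph.
W = np.diag(np.ones(4), 1)
W = W + W.T                                # path graph on 5 nodes
L = np.diag(W.sum(axis=1)) - W

K = 3
theta = np.random.randn(K)

x = np.zeros(5)
x[0] = 1.                                  # Kronecker delta at vertex 0
y, x_k = np.zeros(5), x.copy()
for k in range(K):
    y += theta[k] * x_k                    # accumulate θ_k L^k x
    x_k = L @ x_k                          # next power, one mat-vec

# Since (L^k)_{i,j} = 0 whenever d_G(i,j) > k, the filtered delta is
# supported within K-1 hops of vertex 0: entries 3 and 4 are zero.
print(y)
```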
g_θ(Λ) = Σ_{k=0}^{K−1} θ_k T_k(Λ̃),  Λ̃ = 2Λ/λ_max − I_n

◮ Chebyshev polynomials: T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x) with T_0 = 1 and T_1 = x
◮ Filtering: y = g_θ(L) x = Σ_{k=0}^{K−1} θ_k T_k(L̃) x
◮ Recurrence: y = g_θ(L) x = [x̄_0, …, x̄_{K−1}] θ, with x̄_k = T_k(L̃) x = 2 L̃ x̄_{k−1} − x̄_{k−2}, x̄_0 = x and x̄_1 = L̃ x
◮ K-localized
◮ Learning complexity in O(K)
◮ Computational complexity in O(K|E|)
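The recurrence maps directly to K−1 sparse mat-vecs; a minimal SciPy sketch on an assumed random sparse graph:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Chebyshev filtering y = Σ_k θ_k T_k(L~) x with sparse mat-vecs,
# hence O(K|E|) operations.
n = 100
W = sp.random(n, n, density=0.05)
W = W + W.T                                # symmetric toy adjacency
L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W

lmax = eigsh(L, k=1, return_eigenvectors=False)[0]
L_tilde = 2. * L / lmax - sp.identity(n)   # rescale spectrum to [-1, 1]

K = 5
theta = np.random.randn(K)
x = np.random.randn(n)

x0, x1 = x, L_tilde @ x                    # x̄_0 = x, x̄_1 = L~ x
y = theta[0] * x0 + theta[1] * x1
for k in range(2, K):
    x0, x1 = x1, 2. * (L_tilde @ x1) - x0  # x̄_k = 2 L~ x̄_{k-1} - x̄_{k-2}
    y += theta[k] * x1
```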
y_{s,j} = Σ_{i=1}^{F_in} g_{θ_{i,j}}(L) x_{s,i} ∈ R^n

◮ x_{s,i}: feature map i of sample s
◮ θ_{i,j}: trainable parameters (F_in × F_out vectors of Chebyshev coefficients)

Gradients for backpropagation:
◮ ∂E/∂θ_{i,j} = Σ_{s=1}^{S} [x̄_{s,i,0}, …, x̄_{s,i,K−1}]^T ∂E/∂y_{s,j}
◮ ∂E/∂x_{s,i} = Σ_{j=1}^{F_out} g_{θ_{i,j}}(L) ∂E/∂y_{s,j}

Overall cost of O(K|E| F_in F_out S) operations
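A sketch of one such layer (function and variable names are my own): stacking the Chebyshev bases x̄_k of all input maps turns the layer into a single matrix multiplication, from which the gradient formulas above follow:

```python
import numpy as np
import scipy.sparse as sp

def graph_conv(L_tilde, X, theta, K):
    """L_tilde: rescaled Laplacian, X: n x F_in signal,
    theta: (K * F_in) x F_out weights. Assumes K >= 2."""
    Xbar = [X, L_tilde @ X]                  # T_0 x and T_1 x
    for _ in range(2, K):
        Xbar.append(2. * (L_tilde @ Xbar[-1]) - Xbar[-2])
    Xbar = np.concatenate(Xbar, axis=1)      # n x (K * F_in)
    return Xbar @ theta                      # n x F_out
    # Note: dE/dtheta = Xbar^T dE/dY, as on the slide.

# Hypothetical usage with a stand-in for 2L/lmax - I:
n, F_in, F_out, K = 50, 3, 8, 5
L_tilde = sp.identity(n) * 0.5
X = np.random.randn(n, F_in)
Y = graph_conv(L_tilde, X, np.random.randn(K * F_in, F_out), K)
```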
◮ Coarsening: Graclus / Metis
◮ Normalized cut minimization
◮ Pooling: as regular 1D signals
◮ Suits parallel architectures such as GPUs
◮ Activation: ReLU (or tanh, sigmoid)
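A sketch of the pooling step under the assumptions above: once coarsening has reordered the vertices so that siblings (real or fake) are adjacent, graph pooling is plain 1D max-pooling; fake nodes get a neutral value so they never win:

```python
import numpy as np

def pool_1d(x):
    """x: reordered signal of even length, with fake nodes set to
    -inf; returns the signal on the coarsened graph."""
    return x.reshape(-1, 2).max(axis=1)

# Hypothetical reordered signal: each real node is paired with its
# sibling or with a fake node (-inf) added by the coarsening.
x = np.array([0.5, 1.2, -np.inf, 0.7, 2.0, 0.1])
print(pool_1d(x))        # -> [1.2, 0.7, 2.0]
```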
[Figure: filtering time (ms) vs. number of features (words), comparing the Non-Param / Spline and Chebyshev parametrizations]
Make CNNs practical for graph signals!
Spline: g_θ(Λ) = Bθ (Bruna et al. 2013; Henaff, Bruna, and LeCun 2015)
[Figure: validation accuracy vs. training step for the Chebyshev, Non-Param and Spline parametrizations]
[Figure: training loss vs. training step for the Chebyshev, Non-Param and Spline parametrizations]
Faster convergence!
Model                 Architecture             Accuracy
Classical CNN         C32-P4-C64-P4-FC512      99.33
Proposed graph CNN    GC32-P4-GC64-P4-FC512    99.14

Table: Comparison to classical CNNs.

Comparable to classical CNNs and better than other parametrizations!

Architecture             Non-Param   Spline   Chebyshev
GC10                     95.75       97.26    97.48
GC32-P4-GC64-P4-FC512    96.28       97.15    99.14

Table: Comparison between spectral filters, K = 25.
Filters:
◮ Polynomial of the Laplacian
◮ Krylov subspace methods

Coarsening:
◮ Contraction-based schemes
◮ Kron reduction
◮ Algebraic Multigrid methods (AMG)
◮ Multi-level label propagation
◮ Multi-level graph embedding
◮ Spectral clustering
◮ Rotation invariance for Computer Vision
◮ Topic categorization on Wikipedia
◮ Collaborations in the social & biological sciences