Graph Convolutional Network (Heting Gao, University of Illinois at Urbana-Champaign)


SLIDE 1

Graph Convolutional Network

Heting Gao

University of Illinois at Urbana-Champaign hgao17@illinois.edu

November 10, 2018

Heting Gao (UIUC) Short title November 10, 2018 1 / 27

SLIDE 2

Overview

1. Graph Convolution
   - Preliminary
   - Graph Fourier Transform
   - Graph Spectral Filtering
   - Fast Localized Spectral Filtering
   - Convolutional Graph Network

SLIDE 3

Preliminary

A connected undirected graph is represented as G = {V, E, W}

V is the set of vertices and |V| = N. E is the set of edges. W is the weighted adjacency matrix.

W_{i,j} is the weight of the edge e = (i, j) connecting vertices i and j, and W_{i,j} = 0 if the edge does not exist. If the weights of the graph are not naturally defined, a common choice is the thresholded Gaussian kernel

W_{i,j} = exp(−[dist(i, j)]² / (2θ²)) if dist(i, j) ≤ κ, and 0 otherwise,

for some parameters κ and θ. Here dist(i, j) can be the actual distance on the graph between vertices i and j, or the distance between the features of vertices i and j.
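As a quick illustration, here is a minimal NumPy sketch of this thresholded Gaussian kernel; the 2-D vertex features and the values of κ and θ are hypothetical choices, not from the slides:

```python
import numpy as np

def gaussian_weights(coords, kappa=1.0, theta=0.5):
    """W[i,j] = exp(-dist(i,j)^2 / (2 theta^2)) if dist(i,j) <= kappa, else 0."""
    n = len(coords)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops
            d = np.linalg.norm(coords[i] - coords[j])
            if d <= kappa:
                W[i, j] = np.exp(-d**2 / (2 * theta**2))
    return W

# Three hypothetical vertices with 2-D features; vertex 2 is far away.
coords = np.array([[0.0, 0.0], [0.3, 0.0], [2.0, 2.0]])
W = gaussian_weights(coords)
# W is symmetric, and the far-away vertex 2 ends up disconnected.
```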

(IEEE-2012) [David I. Shuman] The Emerging Field of Signal Processing on Graphs

SLIDE 4

Preliminary

A signal or a function on the graph, f : V → R, can be represented as a vector f ∈ R^N, where f_i = f(v_i) is the function value on vertex v_i ∈ V.

The non-normalized graph Laplacian is L = D − W, where W is the weight matrix and D is the degree matrix: a diagonal matrix whose diagonal elements are the sums of the incident edge weights, D_{i,i} = Σ_{j=1}^N W_{i,j}.

The graph Laplacian L is a difference operator: ∀ f ∈ R^N,

(Lf)_i = Σ_{j∈N_i} W_{i,j} (f_i − f_j)

where N_i denotes the set of neighbor nodes of vertex i.
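The difference-operator identity above can be checked numerically; the small weight matrix here is an arbitrary example, not from the slides:

```python
import numpy as np

# Check that L = D - W acts as (Lf)_i = sum_j W_ij (f_i - f_j).
W = np.array([[0., 2., 1.],
              [2., 0., 0.],
              [1., 0., 0.]])
D = np.diag(W.sum(axis=1))       # degree matrix D_ii = sum_j W_ij
L = D - W                        # non-normalized graph Laplacian

f = np.array([1.0, -2.0, 3.0])   # a signal on the 3 vertices
Lf = L @ f

# Element-wise difference form from the slide:
diff = np.array([sum(W[i, j] * (f[i] - f[j]) for j in range(3))
                 for i in range(3)])
```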

SLIDE 5

Laplacian

The 1-D Laplacian operator ∆:

f′(t) = lim_{h→0} [f(t + h) − f(t)] / h

∆f(t) = (∂²/∂t²) f(t) = (∂/∂t) f′(t) = lim_{h→0} [f′(t + h) − f′(t)] / h

SLIDE 6

Laplacian

The 1-D discrete Laplacian operator ∆:

f′[n] = f[n + 1] − f[n]

∆f[n] = f′[n] − f′[n − 1] = (f[n + 1] − f[n]) − (f[n] − f[n − 1]) = f[n + 1] + f[n − 1] − 2f[n]

The 2-D discrete Laplacian operator ∆:

∆f[n, m] = f[n + 1, m] + f[n − 1, m] + f[n, m + 1] + f[n, m − 1] − 4f[n, m]

SLIDE 7

Laplacian

The graph Laplacian L is a discrete Laplacian operator on graph signals:

(Lf)_i = Σ_{j∈N_i} W_{i,j} (f_i − f_j)

−∆f = Lf
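A small sanity check of this correspondence, using a path graph with unit edge weights (an example of mine, not from the slides): at interior vertices, Lf reproduces the negated 1-D second difference 2f[n] − f[n+1] − f[n−1].

```python
import numpy as np

# Path graph on 6 vertices with unit weights.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

f = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 25.0])  # f[n] = n^2
Lf = L @ f
# For f[n] = n^2 the second difference is 2 everywhere, so the
# interior entries of Lf (= -Δf) should all equal -2.
```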

SLIDE 8

Fourier Transform

For a given function f(t), its Fourier transform F at a given frequency Ω is

F(Ω) = ⟨f(t), e^{jΩt}⟩ = ∫_R f(t) e^{−jΩt} dt

The Laplacian of the basis e^{jΩt} is a multiple of the basis itself:

−∆e^{jΩt} = −(∂²/∂t²) e^{jΩt} = Ω² e^{jΩt}

For the graph Fourier transform, we also want to find an analogous set of basis vectors. Let u ∈ R^N be a basis vector for the graph transform; we want

−∆u = Lu = λu

This is an eigenvalue decomposition.

SLIDE 9

Graph Fourier Transform

Let U = [u_l]_{l=1,...,N} denote the matrix of eigenvectors of L, and let Λ = diag([λ_l]_{l=1,...,N}) denote the diagonal matrix of eigenvalues of L. For a given signal f, its Fourier transform F(λ_l) at the frequency λ_l is

F(λ_l) = ⟨f, u_l⟩ = Σ_{i=1}^N f_i u*_{l,i}

The inverse Fourier transform is then

f_i = Σ_{l=1}^N F(λ_l) u_{l,i}

Let F ∈ R^N denote the Fourier transform vector of the graph signal f ∈ R^N. In matrix form:

F = U^T f,    f = U F
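The transform pair can be sketched with a symmetric eigendecomposition; the 4-vertex graph below (a triangle plus a pendant vertex) is illustrative:

```python
import numpy as np

# Graph Fourier transform: F = U^T f, inverse f = U F.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W

# eigh gives ascending eigenvalues and orthonormal eigenvectors.
lam, U = np.linalg.eigh(L)

f = np.array([0.5, -1.0, 2.0, 0.0])
F = U.T @ f          # forward graph Fourier transform
f_rec = U @ F        # inverse transform recovers the signal
```

Because the graph is connected, the smallest graph frequency λ_1 is 0 (the constant eigenvector).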

SLIDE 10

Graph Spectral Filtering

Let F : R^N → R^N denote the graph Fourier transform and F^(−1) : R^N → R^N its inverse. Let h ∈ R^N denote the filter function on the graph, H ∈ R^N the Fourier transform of the filter function, and y ∈ R^N the function after filtering on the graph. Then

y = h ∗ f = F^(−1)[F(h) ⊙ F(f)] = U[U^T h ⊙ U^T f] = U[H ⊙ U^T f] = U diag(H(λ_1), ..., H(λ_N)) U^T f

SLIDE 11

Graph Spectral Filtering

Define the spectral filter H(L) as

H(L) = U diag(H(λ_1), ..., H(λ_N)) U^T

The adjustable parameters are [H(λ_l)]_{l=1,...,N}. Let θ = [H(λ_l)]_{l=1,...,N} and g_θ(Λ) = diag(θ). We can then define the convolutional layer as

y = σ(U g_θ(Λ) U^T f)
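A minimal sketch of this spectral layer, with an arbitrary low-pass choice of θ (H(λ) = e^{−λ}) and σ = ReLU; both choices, and the graph, are illustrative assumptions:

```python
import numpy as np

# Spectral filtering layer: y = sigma(U g_theta(Lambda) U^T f).
W = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

theta = np.exp(-lam)                 # H(lambda_l): damp high graph frequencies
relu = lambda x: np.maximum(x, 0.0)  # sigma

f = np.array([1.0, 0.0, -1.0, 0.0])
y = relu(U @ (np.diag(theta) @ (U.T @ f)))
```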

SLIDE 12

Fast Localized Spectral Filtering

If we define the convolutional layer as y = σ(U g_θ(Λ) U^T f), there are, however, three limitations:

1. The convolution is not localized. With arbitrary θ, the signal f can be propagated to any other node.
2. θ ∈ R^N means that we need N parameters.
3. The eigendecomposition has a computational complexity of O(N³), and every forward propagation has complexity O(N²).

(NIPS-2016) [Michaël Defferrard] Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

SLIDE 13

Fast Localized Spectral Filtering

We can instead define

g_θ(Λ) = Σ_{k=1}^K θ_k Λ^k

We get the new convolutional layer as

y = σ(U g_θ(Λ) U^T f) = σ(U (Σ_{k=1}^K θ_k Λ^k) U^T f) = σ(Σ_{k=1}^K θ_k U Λ^k U^T f) = σ(Σ_{k=1}^K θ_k L^k f)
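The identity U Λ^k U^T = L^k, which makes the last step work, can be checked numerically; the graph and the coefficients θ_k here are arbitrary examples:

```python
import numpy as np

W = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

# U Lambda^k U^T equals L^k, so no eigendecomposition is needed.
k = 3
lhs = U @ np.diag(lam**k) @ U.T
rhs = np.linalg.matrix_power(L, k)

# Polynomial filter applied directly in the vertex domain:
theta = [0.5, -0.2, 0.1]             # theta_1..theta_3, arbitrary
f = np.array([1.0, 2.0, 3.0])
y = sum(t * np.linalg.matrix_power(L, i + 1) @ f
        for i, t in enumerate(theta))
```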

SLIDE 14

Fast Localized Spectral Filtering

The new definition of a convolutional layer is

y = σ(Σ_{k=1}^K θ_k L^k f)

It has three advantages:

1. The convolution is localized, and is exactly K-hop localized: we use at most the K-th power of L.
2. We need only K parameters.
3. We do not need to decompose L, and the forward propagation can be approximated using Chebyshev polynomials (I do not fully understand this part, but I will still try to describe the steps given in the paper).

SLIDE 15

Chebyshev Polynomial

Chebyshev polynomial expansion:

T_0(y) = 1
T_1(y) = y
T_k(y) = 2y T_{k−1}(y) − T_{k−2}(y)

These polynomials form an orthogonal basis for L²([−1, 1], dy/√(1 − y²)), the Hilbert space of square-integrable functions with respect to the measure dy/√(1 − y²):

∫_{−1}^{1} T_l(y) T_m(y) / √(1 − y²) dy = δ_{l,m} π/2 if l, m > 0; π if l = m = 0

SLIDE 16

Chebyshev Polynomial

In particular, ∀ h ∈ L²([−1, 1], dy/√(1 − y²)), h has the following Chebyshev polynomial expansion:

h(y) = (1/2)c_0 + Σ_{k=1}^∞ c_k T_k(y)

Since λ ∈ [0, λ_max], substitute y = U(2Λ/λ_max − I)U^T = 2L/λ_max − I, so that

g_θ(L) = Σ_{k=1}^K θ_k L^k ≈ (1/2)c_0 + Σ_{k=1}^K c_k T_k(y)

where T_k(y) can be computed recursively as T_k(y) = 2y T_{k−1}(y) − T_{k−2}(y)

SLIDE 17

Chebyshev Polynomial

Let f̄_k = T_k(y)f. It can be computed recursively:

f̄_k = T_k(y)f = 2y T_{k−1}(y)f − T_{k−2}(y)f = 2y f̄_{k−1} − f̄_{k−2} = 2(2L/λ_max − I) f̄_{k−1} − f̄_{k−2}

The approximated convolutional layer is then

y = σ(g_θ(L)f) = σ(Σ_{k=0}^K θ_k f̄_k)

with f̄_0 = f and f̄_1 = yf = (2L/λ_max − I)f
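The recursion can be sketched directly; the graph, the coefficients θ_k, and the use of the exact λ_max are illustrative choices of mine:

```python
import numpy as np

# Chebyshev evaluation of the filtered signal:
#   fbar_k = 2 (2L/lam_max - I) fbar_{k-1} - fbar_{k-2},
# so filtering needs only matrix-vector products with L.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W
lam_max = np.linalg.eigvalsh(L).max()
Lt = 2.0 * L / lam_max - np.eye(3)   # rescaled Laplacian, spectrum in [-1, 1]

f = np.array([1.0, 0.0, -1.0])
theta = [0.3, 0.5, -0.1]             # theta_0..theta_2, arbitrary

fbar_prev, fbar = f, Lt @ f          # fbar_0 = f, fbar_1 = (2L/lam_max - I) f
y = theta[0] * fbar_prev + theta[1] * fbar
for k in range(2, len(theta)):
    fbar_prev, fbar = fbar, 2.0 * Lt @ fbar - fbar_prev
    y = y + theta[k] * fbar
```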

SLIDE 18

Convolutional Graph Network

Instead of using a K-hop localized filter, set K = 1, but stack multiple layers. Use the symmetric normalized Laplacian

L_sym = D^(−1/2) L D^(−1/2) = I − D^(−1/2) W D^(−1/2)

The entries of L_sym are

(L_sym)_{i,j} = 1 if i = j; −W_{i,j}/√(d_i d_j) if i ≠ j and vertices i and j are connected; 0 otherwise

Equivalently,

(L_sym f)_i = (1/√d_i) Σ_{j∈N_i} W_{i,j} (f_i/√d_i − f_j/√d_j)

The eigenvalues [λ_l]_{l=1,...,N} of L_sym lie in the range [0, 2].
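The [0, 2] eigenvalue range can be spot-checked on a random weighted graph (an empirical check, not a proof; the graph is arbitrary):

```python
import numpy as np

# Symmetric normalized Laplacian of a random weighted graph.
rng = np.random.default_rng(0)
A = rng.random((8, 8))
W = np.triu(A, 1) + np.triu(A, 1).T      # symmetric weights, no self-loops
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(8) - D_inv_sqrt @ W @ D_inv_sqrt

lam = np.linalg.eigvalsh(L_sym)
# All eigenvalues should fall in [0, 2].
```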

Note: Kipf’s paper uses A to represent the weight matrix; I will stick to W to be consistent in this presentation.
(ICLR-2017) [Thomas N. Kipf] Semi-Supervised Classification with Graph Convolutional Networks

SLIDE 19

Convolutional Graph Network

Then the convolutional layer can be approximated as

y = σ(g_θ(L)f)
  ≈ σ(θ_0 f + θ_1 (2L_sym/λ_max − I) f)
  = σ[(θ_0 − θ_1) f + θ_1 L_sym f]    (assume λ_max = 2)
  = σ[(θ_0 − θ_1) f + θ_1 (I − D^(−1/2) W D^(−1/2)) f]
  = σ(θ_0 f − θ_1 D^(−1/2) W D^(−1/2) f)

SLIDE 20

Convolutional Graph Network

The approximated output layer is

y = σ(θ_0 f − θ_1 D^(−1/2) W D^(−1/2) f)

The number of parameters is further reduced to one in the paper by assuming θ = θ_0 = −θ_1:

y = σ(θ (I + D^(−1/2) W D^(−1/2)) f)

The matrix I + D^(−1/2) W D^(−1/2) has eigenvalues λ ∈ [0, 2]. Repeated application of this matrix can result in numerical instability. Renormalize I + D^(−1/2) W D^(−1/2) to D̃^(−1/2) W̃ D̃^(−1/2), where

W̃ = W + I
D̃_{i,i} = Σ_j W̃_{i,j}
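The renormalization trick can be sketched as follows; the example graph is arbitrary, and the claim that the renormalized matrix has spectrum in (−1, 1] is an empirical check here, not a derivation:

```python
import numpy as np

def renormalize(W):
    """Renormalization trick: D~^{-1/2} (W + I) D~^{-1/2}."""
    W_tilde = W + np.eye(W.shape[0])         # add self-loops
    d_tilde = W_tilde.sum(axis=1)            # D~_ii = sum_j W~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
    return D_inv_sqrt @ W_tilde @ D_inv_sqrt

W = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
W_hat = renormalize(W)
lam = np.linalg.eigvalsh(W_hat)
# The spectrum is now bounded away from -1 and capped at 1, which is
# better behaved under repeated application than [0, 2].
```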

SLIDE 21

Convolutional Graph Network

The final version of the graph convolutional network is

Z = D̃^(−1/2) W̃ D̃^(−1/2) F Θ

where

F ∈ R^(N×C) is the signal matrix.
Θ ∈ R^(C×F) is the matrix of learnable filter parameters.
Z ∈ R^(N×F) is the output matrix.
C is the number of input feature channels.
F is the number of output feature channels.
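The shapes of this layer can be sketched as follows; the graph, the node features, and Θ are random placeholders:

```python
import numpy as np

# One GCN layer: Z = W_hat @ F @ Theta, with W_hat = D~^{-1/2} W~ D~^{-1/2}.
rng = np.random.default_rng(1)
N, C, F_out = 4, 3, 2                 # nodes, input channels, output channels

W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [0., 1., 1., 0.]])
W_tilde = W + np.eye(N)
d = W_tilde.sum(axis=1)
W_hat = np.diag(d**-0.5) @ W_tilde @ np.diag(d**-0.5)

F_in = rng.standard_normal((N, C))    # signal matrix F (N x C)
Theta = rng.standard_normal((C, F_out))
Z = W_hat @ F_in @ Theta              # output matrix (N x F_out)
```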

SLIDE 22

Convolutional Graph Network

The paper performed semi-supervised node classification using the following GCN architecture:

Z = f(F, W) = softmax(Ŵ ReLU(Ŵ F Θ^(0)) Θ^(1))

where Ŵ = D̃^(−1/2) W̃ D̃^(−1/2).

The loss function L is defined on all the labeled nodes:

L = − Σ_{l∈Y_L} Σ_{f=1}^F Y_{l,f} ln Z_{l,f}

where Y_L is the set of labeled node indices and Y is the matrix of true labels.
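The two-layer forward pass and the masked cross-entropy loss can be sketched as follows; all sizes, weights, and the labeled set are hypothetical placeholders:

```python
import numpy as np

# Z = softmax(W_hat ReLU(W_hat F Theta0) Theta1), loss over labeled nodes.
rng = np.random.default_rng(2)
N, C, H, K = 5, 4, 8, 3               # nodes, input dims, hidden dims, classes

A = (rng.random((N, N)) < 0.5).astype(float)
W = np.triu(A, 1) + np.triu(A, 1).T   # random undirected graph
W_tilde = W + np.eye(N)
d = W_tilde.sum(axis=1)
W_hat = np.diag(d**-0.5) @ W_tilde @ np.diag(d**-0.5)

F = rng.standard_normal((N, C))       # node features
Theta0 = rng.standard_normal((C, H))
Theta1 = rng.standard_normal((H, K))

def softmax(X):
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

Z = softmax(W_hat @ np.maximum(W_hat @ F @ Theta0, 0.0) @ Theta1)

labeled = [0, 2]                       # Y_L: indices of labeled nodes
Y = np.zeros((N, K)); Y[0, 1] = Y[2, 0] = 1.0   # one-hot true labels
loss = -sum(Y[l] @ np.log(Z[l]) for l in labeled)
```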

SLIDE 23

Convolutional Graph Network

The experiments are performed on the following datasets, and the results are as follows. [Dataset and result tables omitted.]

SLIDE 24

Convolutional Graph Network

Model comparison. [Comparison table omitted.]

SLIDE 25

Convolutional Graph Network

The limitations of this paper:

Memory requirement is high: each iteration requires the entire adjacency matrix, so it does not work for large dense graphs.

Directed edges and edge features cannot be naturally incorporated into this model.

Adding I to the adjacency matrix W assumes equal importance of the self-connection and the edges to neighbor nodes. It might be useful to introduce a weight λ on the self-loop:

W̃ = W + λI

The convolutional layer only captures 1-hop locality; the expressiveness of the filter is still limited.

SLIDE 26

Reference

(IEEE-2012) [David I. Shuman] The Emerging Field of Signal Processing on Graphs
(NIPS-2016) [Michaël Defferrard] Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
(ICLR-2017) [Thomas N. Kipf] Semi-Supervised Classification with Graph Convolutional Networks

This presentation follows the outline of this post: https://www.zhihu.com/question/54504471

SLIDE 27

The End
