Graph Convolutional Networks


SLIDE 1

12/11/19, 8:56 AM Graph_Convolutional_Networks slides Page 1 of 30 http://127.0.0.1:8000/Graph_Convolutional_Networks.slides.html?print-pdf#/

Thanks for joining me for a presentation on ...

Graph Convolutional Networks

SLIDE 2

Your Presenter:

Christian McDaniel

Data Scientist & Software Engineer,

Graph Convolutional Networks

Background

Graph Convolutional Networks are both simple and complex.

They borrow from multiple domains to arrive at an elegant analysis algorithm:

+ Deep Learning - Convolutional Neural Networks
+ Spectral Graph Clustering - Graph Laplacian
+ Signal Processing - Fourier Transform

SLIDE 3

Graph Convolutional Networks

Background

Imagine a dataset:

+ Made up of many points
+ Each point can be described by p features and falls into one of c classes
+ These points are interconnected

SLIDE 4

Graph Convolutional Networks

Background - Convolutional Neural Networks

+ Great success with computer vision-based applications
+ Some advantages of CNNs:
  + Computational efficiency (∼ O(V + E))
  + Fixed number of parameters (independent of input size)
  + Localisation: acts on a local neighborhood
  + Learns the importance of different neighbors
+ Images have a highly regular connectivity pattern
  + Each pixel is "connected" to its eight neighboring pixels
  + Convolving a kernel matrix across the "nodes" is trivial
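To make the "image as a graph" view concrete, here is a minimal sketch (not from the original slides) that builds the neighbor structure of a small image grid, linking each pixel to its up-to-eight surrounding pixels. The function name `grid_adjacency` and the row-major pixel indexing are assumptions chosen for illustration.

```python
def grid_adjacency(height, width):
    """Return {pixel_index: sorted neighbour indices} for an 8-connected grid."""
    neighbours = {}
    for r in range(height):
        for c in range(width):
            adjacent = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a pixel is not listed as its own neighbour
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        adjacent.append(rr * width + cc)  # row-major index
            neighbours[r * width + c] = sorted(adjacent)
    return neighbours


adj = grid_adjacency(3, 3)
print(len(adj[4]))  # centre pixel of a 3x3 image
print(adj[0])       # a corner pixel has fewer neighbours
```

Interior pixels all share the same neighborhood shape, which is what lets a single fixed-size kernel slide across an image; border pixels are the only irregular cases.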

SLIDE 5

Images as Well-Behaved Graphs

SLIDE 6

Images as Well-Behaved Graphs

Graph Convolutional Networks

+ Generalizing the convolution operation to arbitrary graph structures is much trickier
+ We will use some very convenient (and awesome) rules from Signal Processing and Spectral Graph Theory
+ Next we'll discuss recent advances improving performance and computational efficiency (Semi-Supervised Classification with Graph Convolutional Networks, Kipf & Welling 2016)

SLIDE 7

Graph Convolutional Networks

Before we learn the intuitions, let's first look at what a simple GCN may look like:

The graph-based convolution:

$$Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta$$

Add in a Nonlinear Activation

Graph structure is encoded directly into the neural network model $f(X, A)$ by incorporating the adjacency matrix. We can do so by wrapping the above equation in a nonlinear activation function:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right)$$

Where $W^{(l)}$ is a layer-specific trainable weight matrix, $\sigma(\cdot)$ is an activation function, $H^{(l)} \in \mathbb{R}^{N \times F}$ is the matrix of activations from the $l^{th}$ layer, and $H^{(0)} = X$

A two-layer Graph Convolutional Network may look something like

$$Z = f(X, A) = \text{softmax}\left(\hat{A} \, \text{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right)$$

Where $\hat{A}$ is the precalculated $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$

So... where did this implementation come from?

SLIDE 8

1. Generalizing the Convolution

+ A convolution operation combines one function $f$ with another function $g$ such that the first function is transformed by the other
+ In Convolutional Neural Networks, the second function $g$ is learned for a given data set so that the transformations on $f$ are meaningful w.r.t. some class values
  + e.g., a set of pixels showing a dog, $f_d$, may be transformed to values near 0, while a set of pixels showing a cat, $f_c$, may be transformed to values near 1
+ Let's see how we might do this for our graph-based data...
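As a small illustration of one function being transformed by another, the sketch below implements a plain 1-D discrete convolution. The averaging kernel is a fixed, hand-picked stand-in for the filter $g$; in a CNN those values would be learned. The helper name `convolve` is hypothetical.

```python
def convolve(f, g):
    """Full discrete convolution: (f * g)[n] = sum over m of f[m] * g[n - m]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, f_i in enumerate(f):
        for j, g_j in enumerate(g):
            out[i + j] += f_i * g_j
    return out


signal = [1.0, 2.0, 3.0]   # f: the data being transformed
kernel = [0.5, 0.5]        # g: a fixed averaging filter (a CNN would learn this)
smoothed = convolve(signal, kernel)
print(smoothed)
```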

SLIDE 9

Graph Convolutional Networks

Signal Processing

+ With the nodes of the graph representing individual examples from the dataset,
+ And some data at each node, e.g.,
  + scalar intensity values at each pixel of an image
  + feature vectors for higher dimensional data
+ We can consider the data values as signals on each node

SLIDE 10

Graph Convolutional Networks

Signal Processing

+ With the nodes of the graph representing individual examples from the dataset,
+ And some data at each node
  + scalar intensity values at each pixel of an image
  + feature vectors for higher dimensional data
+ We can consider the data values as signals on each node
+ The changes in the signals across the edges of the graph resemble the fluctuation of a signal over time
+ Borrowing from signal processing, these oscillations can be characterized by their component frequencies, via the Fourier transform
+ It just so happens that graphs have their own version of the FT...

SLIDE 11

Graph Convolutional Networks

From Spectral Graph Theory ... The Normalized Graph Laplacian

$$L = I - D^{-1/2} A D^{-1/2}$$

Where $I$ is the identity matrix, $D$ is the diagonal degree matrix and $A$ is the weighted adjacency matrix

Graph Convolutional Networks

The Normalized Graph Laplacian in the Spectral Domain

+ The Normalized Graph Laplacian $L$ is a real symmetric positive semidefinite matrix: $z^T L z \geq 0$ for all column vectors $z$
+ it has a complete set of orthonormal eigenvectors $\{u_l\}_{l=0}^{n-1} \in \mathbb{R}^n$, a.k.a. the graph Fourier modes
+ and it has the associated ordered real nonnegative eigenvalues $\{\lambda_l\}_{l=0}^{n-1}$, the frequencies of the graph
+ $L$ is diagonalized in the Fourier basis $U = [u_0, u_1, ..., u_{n-1}] \in \mathbb{R}^{n \times n}$ such that $L = U \Lambda U^T$, where $\Lambda = \text{diag}([\lambda_0, \lambda_1, ..., \lambda_{n-1}]) \in \mathbb{R}^{n \times n}$

I.e.,

$$L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^T$$
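The properties listed above can be checked numerically. The following is a minimal NumPy sketch, assuming a small undirected toy graph (a 4-node path) that is not part of the original slides; `normalized_laplacian` is a hypothetical helper name.

```python
import numpy as np


def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix A."""
    degree = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(degree, dtype=float)
    connected = degree > 0
    d_inv_sqrt[connected] = degree[connected] ** -0.5  # isolated nodes keep 0
    return np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


# Toy undirected graph (an assumption for illustration): the path 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

L = normalized_laplacian(A)

# eigh is meant for symmetric matrices: it returns ascending real eigenvalues
# and orthonormal eigenvectors (the columns of U).
eigenvalues, U = np.linalg.eigh(L)

print(np.round(eigenvalues, 3))                               # graph "frequencies"
print(np.allclose(U @ np.diag(eigenvalues) @ U.T, L))         # L = U Lambda U^T
print(np.allclose(U.T @ U, np.eye(4)))                        # orthonormal modes
```

For the normalized Laplacian the eigenvalues are nonnegative and never exceed 2, which is what later makes the rescaling by $\lambda_{max}$ well defined.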

SLIDE 12

Graph Convolutional Networks

The Normalized Graph Laplacian in the Spectral Domain

$$L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^T$$

The Graph Fourier Transform

+ The Fourier transform in the graph domain, where eigenvectors denote Fourier modes and eigenvalues denote frequencies of the graph, is defined as $\hat{x} = U^T x \in \mathbb{R}^n$
+ As on Euclidean spaces, this transform enables the formulation of fundamental operations.
+ E.g., the convolution operation becomes multiplication, and we can define a convolution of a data vector $x$ with a filter $g_\Theta$ as

$$g_\Theta * x = U\left((U^T g_\Theta) \odot (U^T x)\right) = \left(U g_\Theta(\Lambda) U^T\right) x$$
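A short NumPy sketch of the transform and the spectral convolution, under stated assumptions: the toy graph is a 4-node cycle, and the filter's Fourier coefficients `g_hat` (playing the role of $U^T g_\Theta$) are a hand-picked low-pass function of the eigenvalues rather than learned values.

```python
import numpy as np

# Toy graph (assumed): the cycle 0 - 1 - 2 - 3 - 0, so every node has degree 2.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
d_inv_sqrt = A.sum(axis=1) ** -0.5
L = np.eye(4) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

lam, U = np.linalg.eigh(L)             # frequencies and Fourier modes

x = np.array([1.0, 0.0, 0.0, 0.0])     # a signal: one value per node
x_hat = U.T @ x                        # graph Fourier transform
x_back = U @ x_hat                     # inverse transform recovers x

# Hypothetical filter: damp high frequencies (any function of lam would do).
g_hat = np.exp(-2.0 * lam)

# Convolution = element-wise product in the spectral domain, then transform back.
y = U @ (g_hat * x_hat)

# The same operation written as the matrix product (U g(Lambda) U^T) x.
y_matrix = U @ np.diag(g_hat) @ U.T @ x

print(np.round(y, 3))
```

Both forms need the full eigendecomposition of $L$, which is the cost the later approximations are designed to avoid.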

SLIDE 13

Graph Convolutional Networks

The Normalized Graph Laplacian in the Spectral Domain

$$L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^T$$

The Graph Fourier Transform

$$g_\Theta * x = U\left((U^T g_\Theta) \odot (U^T x)\right) = \left(U g_\Theta(\Lambda) U^T\right) x$$

Signal Processing

As computing the eigenspectrum of a matrix can be computationally expensive, we can approximate the Fourier coefficients using a $K^{th}$-order Chebyshev polynomial:

$$g_\Theta * x = \left(U g_\Theta(\Lambda) U^T\right) x = U \hat{G} U^T x$$

$$g_{\Theta'} \approx \sum_{k=0}^{K} \Theta'_k T_k(\tilde{\Lambda})$$

Where the rescaled $\tilde{\Lambda} = \frac{2}{\lambda_{max}} \Lambda - I$
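For reference, the Chebyshev polynomials $T_k$ can be evaluated with the standard recurrence $T_0(x) = 1$, $T_1(x) = x$, $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$. The sketch below (helper names are hypothetical) also shows the rescaling step, which maps eigenvalues from $[0, \lambda_{max}]$ into $[-1, 1]$, the interval on which the $T_k$ stay bounded.

```python
import math


def chebyshev(k, x):
    """Evaluate T_k(x) via T_0 = 1, T_1 = x, T_k = 2 x T_{k-1} - T_{k-2}."""
    t_prev, t_curr = 1.0, x
    if k == 0:
        return t_prev
    for _ in range(2, k + 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr


def rescale(lam, lam_max):
    """Map an eigenvalue from [0, lam_max] to [-1, 1]."""
    return 2.0 * lam / lam_max - 1.0


# On [-1, 1] the recurrence agrees with the closed form T_k(cos t) = cos(k t).
value = chebyshev(3, 0.5)
closed_form = math.cos(3 * math.acos(0.5))
print(value, closed_form)
```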

SLIDE 14

Graph Convolutional Networks

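The Normalized Graph Laplacian in the Spectral Domain

$$L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^T$$

The Graph Fourier Transform

$$g_\Theta * x = U\left((U^T g_\Theta) \odot (U^T x)\right) = \left(U g_\Theta(\Lambda) U^T\right) x$$

Signal Processing

$$g_\Theta * x = \left(U g_\Theta(\Lambda) U^T\right) x = U \hat{G} U^T x$$

$$g_{\Theta'} \approx \sum_{k=0}^{K} \Theta'_k T_k(\tilde{\Lambda})$$

This lets us define

$$g_\Theta * x \approx \sum_{k=0}^{K} \Theta'_k T_k(\tilde{L}) x$$

with $\tilde{L} = \frac{2}{\lambda_{max}} L - I$

The convolution is now $K$-localized, operating only on nodes at most $K$ steps away from any given node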
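The $K$-localization can be seen directly in a small experiment. This is a sketch under assumptions: the toy graph is a 6-node path, the coefficients in `theta` are arbitrary rather than learned, and $\lambda_{max}$ is replaced by its upper bound 2 so that no eigendecomposition is needed.

```python
import numpy as np


def chebyshev_filter(L, x, theta, lam_max=2.0):
    """Apply sum_k theta[k] * T_k(L_tilde) to x using only matrix-vector products."""
    L_tilde = (2.0 / lam_max) * L - np.eye(L.shape[0])
    t_prev, t_curr = x, L_tilde @ x              # T_0(L~) x and T_1(L~) x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * (L_tilde @ t_curr) - t_prev
        out = out + theta[k] * t_curr
    return out


# Toy graph (assumed): the path 0 - 1 - 2 - 3 - 4 - 5.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
d_inv_sqrt = A.sum(axis=1) ** -0.5
L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

x = np.zeros(n)
x[0] = 1.0                                       # impulse on node 0
theta = np.array([0.5, -0.3, 0.2])               # K = 2, arbitrary coefficients
y = chebyshev_filter(L, x, theta)

print(np.round(y, 3))   # only nodes within 2 steps of node 0 receive signal
```

Because $T_k(\tilde{L})$ is a degree-$k$ polynomial in $L$, and $L$ only couples adjacent nodes, an impulse cannot travel further than $K$ edges.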

SLIDE 15

Graph Convolutional Networks

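The Normalized Graph Laplacian in the Spectral Domain

$$L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^T$$

The Graph Fourier Transform and Signal Processing

$$g_\Theta * x = U\left((U^T g_\Theta) \odot (U^T x)\right) = \left(U g_\Theta(\Lambda) U^T\right) x = U \hat{G} U^T x$$

$$g_\Theta * x \approx \sum_{k=0}^{K} \Theta'_k T_k(\tilde{L}) x$$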

SLIDE 16

Graph Convolutional Networks

The Normalized Graph Laplacian in the Spectral Domain

$$L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^T$$

The Graph Fourier Transform and Signal Processing

$$g_\Theta * x = U\left((U^T g_\Theta) \odot (U^T x)\right) = \left(U g_\Theta(\Lambda) U^T\right) x = U \hat{G} U^T x$$

$$g_\Theta * x \approx \sum_{k=0}^{K} \Theta'_k T_k(\tilde{L}) x$$

Subsequent Computational Advancements

1) $K = 1$

2) When calculating the rescaled $\tilde{\Lambda}$ and $\tilde{L}$, we can approximate $\lambda_{max} \approx 2$, expecting the neural network parameters to adapt accordingly during training.

$$\tilde{L} = \frac{2}{\lambda_{max}} L - I$$

where

$$L = I - D^{-1/2} A D^{-1/2} \rightarrow \tilde{L} \approx L - I = -D^{-1/2} A D^{-1/2}$$

$$g_\Theta * x \approx \theta'_0 x + \theta'_1 (L - I) x = \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x$$

with two free parameters $\theta'_0$ and $\theta'_1$
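Written out step by step, the two simplifications combine as follows. This is a worked restatement of the first-order approximation from Kipf & Welling, using the Chebyshev facts $T_0(y) = 1$ and $T_1(y) = y$:

```latex
\begin{aligned}
\tilde{L} &= \frac{2}{\lambda_{max}} L - I \approx L - I
          && (\lambda_{max} \approx 2) \\
          &= \left(I - D^{-1/2} A D^{-1/2}\right) - I = -D^{-1/2} A D^{-1/2} \\[4pt]
g_\Theta * x &\approx \theta'_0 \, T_0(\tilde{L}) \, x + \theta'_1 \, T_1(\tilde{L}) \, x
          && (K = 1) \\
          &= \theta'_0 x + \theta'_1 \tilde{L} x \\
          &\approx \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x
\end{aligned}
```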

SLIDE 17

SLIDE 18

Graph Convolutional Networks

The Normalized Graph Laplacian in the Spectral Domain

$$L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^T$$

The Graph Fourier Transform and Signal Processing

$$g_\Theta * x = U\left((U^T g_\Theta) \odot (U^T x)\right) = \left(U g_\Theta(\Lambda) U^T\right) x = U \hat{G} U^T x$$

$$g_\Theta * x \approx \sum_{k=0}^{K} \Theta'_k T_k(\tilde{L}) x$$

Subsequent Computational Advancements

1) $K = 1$

2) $g_\Theta * x \approx \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x$

3) We can further constrain the problem to learning a single parameter (per dimension in $x$): $\theta = \theta'_0 = -\theta'_1$

This further prevents overfitting and reduces computations
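Substituting the single parameter into the two-parameter expression gives the operator that the next step renormalizes:

```latex
\begin{aligned}
g_\Theta * x &\approx \theta'_0 x - \theta'_1 D^{-1/2} A D^{-1/2} x \\
             &= \theta x + \theta D^{-1/2} A D^{-1/2} x
             && (\theta = \theta'_0 = -\theta'_1) \\
             &= \theta \left(I + D^{-1/2} A D^{-1/2}\right) x
\end{aligned}
```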

SLIDE 19

Subsequent Computational Advancements

4) The previous revisions have left the operator $I + D^{-1/2} A D^{-1/2} = 2I - L$ with eigenvalues in $[0, 2]$, potentially leading to numerical instabilities and exploding/vanishing gradients due to repeated operations

A "renormalization trick" has been devised:

$$I + D^{-1/2} A D^{-1/2} \rightarrow \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$$

With $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ (i.e., self-loops have been added)

We generalize this to a signal $X \in \mathbb{R}^{N \times P}$ with $P$-dimensional vectors at each node and $F$ filters as

$$Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta$$
SLIDE 20

Graph Convolutional Networks

The graph-based convolution:

$$Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta$$

Nonlinear Activation

Graph structure is encoded directly into the neural network model $f(X, A)$ by incorporating the adjacency matrix. We can do so by wrapping the above equation in a nonlinear activation function:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right)$$

Where $W^{(l)}$ is a layer-specific trainable weight matrix, $\sigma(\cdot)$ is an activation function, $H^{(l)} \in \mathbb{R}^{N \times F}$ is the matrix of activations from the $l^{th}$ layer, and $H^{(0)} = X$

A two-layer Graph Convolutional Network may look something like

$$Z = f(X, A) = \text{softmax}\left(\hat{A} \, \text{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right)$$

Where $\hat{A}$ is the precalculated $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$
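Before turning to a full framework, the two-layer forward pass can be sketched in a few lines of NumPy. The weights here are random rather than trained, the graph is a made-up 4-node example, and the function names are hypothetical; the point is only to show the shapes and the order of operations.

```python
import numpy as np


def renormalized_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}, precalculated once per graph."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = A_tilde.sum(axis=1) ** -0.5
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def gcn_forward(X, A_hat, W0, W1):
    hidden = np.maximum(A_hat @ X @ W0, 0.0)   # ReLU(A_hat X W0)
    return softmax(A_hat @ hidden @ W1)        # softmax(A_hat H W1)


rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)     # toy graph (assumed)
X = rng.normal(size=(4, 5))                   # 4 nodes, 5 input features
W0 = rng.normal(size=(5, 8))                  # 8 hidden units
W1 = rng.normal(size=(8, 3))                  # 3 classes

Z = gcn_forward(X, renormalized_adjacency(A), W0, W1)
print(Z.shape)   # one row of class probabilities per node
```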

SLIDE 21

In [13]:
"""
Github: tkipf/pygcn
"""
import math

import torch
from torch.nn.parameter import Parameter
from torch.nn.modules.module import Module


class GraphConvolution(Module):
    """
    Simple GCN layer, similar to https://arxiv.org/abs/1609.02907
    """

    def __init__(self, in_features, out_features, bias=True):
        super(GraphConvolution, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.FloatTensor(in_features, out_features))
        if bias:
            self.bias = Parameter(torch.FloatTensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, input, adj):
        support = torch.mm(input, self.weight)
        output = torch.spmm(adj, support)
        if self.bias is not None:
            return output + self.bias
        else:
            return output

SLIDE 22

In [ ]:
"""
Github: tkipf/pygcn
"""
import torch.nn as nn
import torch.nn.functional as F
from layers import GraphConvolution


class GCN(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout):
        super(GCN, self).__init__()
        self.gc1 = GraphConvolution(nfeat, nhid)
        self.gc2 = GraphConvolution(nhid, nclass)
        self.dropout = dropout

    def forward(self, x, adj):
        x = F.relu(self.gc1(x, adj))
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.gc2(x, adj)
        return F.log_softmax(x, dim=1)

Graph Convolutional Networks

Time for Some Examples!

SLIDE 23

Graph Convolutional Networks

Example

The Cora Dataset

+ 2708 Machine Learning papers --> the Nodes
+ 7 classes:
  + Case_Based
  + Genetic_Algorithms
  + Neural_Networks
  + Probabilistic_Methods
  + Reinforcement_Learning
  + Rule_Learning
  + Theory
+ Each paper cites or is cited by at least one other paper --> the Edges
+ Each paper is described by the frequency of 1433 unique and meaningful words --> the Features
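To show how a citation list becomes the adjacency matrix a GCN consumes, here is a minimal sketch on a made-up four-paper example. The paper IDs and citation pairs are hypothetical stand-ins, not values read from the Cora files, and treating citations as undirected edges is a modelling assumption.

```python
import numpy as np

# Hypothetical paper IDs and (cited, citing) pairs, for illustration only.
paper_ids = [11, 22, 33, 44]
citations = [(11, 22), (11, 33), (11, 44), (22, 44)]

# Map arbitrary paper IDs to consecutive row/column indices.
index = {paper_id: i for i, paper_id in enumerate(paper_ids)}

A = np.zeros((len(paper_ids), len(paper_ids)))
for cited, citing in citations:
    i, j = index[cited], index[citing]
    A[i, j] = A[j, i] = 1.0   # symmetric: the citation direction is dropped

print(A)
print(A.sum(axis=1))          # node degrees
```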

SLIDE 24

In [18]:
import pandas as pd

# cora.cites: the first column is the paper being cited, the second the paper citing it
edges = pd.read_csv('/Users/cm185255/Documents/pygcn/data/cora/cora.cites',
                    sep='\t', header=None,
                    names=["cited paper ID", "citing paper ID"])
features = pd.read_csv('/Users/cm185255/Documents/pygcn/data/cora/cora.content',
                       sep='\t', header=None)

print('Table for the Edges (citations between papers)')
print(edges.head())
print('\nTable for the Features (word counts for each paper; last column = class)')
print(features.head())

...
Table for the Edges (citations between papers)

+ ManiReg = manifold regularization (Belkin et al., 2006)
+ SemiEmb = semi-supervised embedding (Weston et al., 2012)
+ LP = label propagation (Zhu et al., 2003)
+ DeepWalk = skip-gram graph embeddings (Perozzi et al., 2014)
+ ICA = iterative classification algorithm (Lu & Getoor, 2003)
+ Planetoid (Yang et al., 2016)

SLIDE 25

Graph Convolutional Networks

Example #2 - Adding Attention coefficients for whole-graph classification

Graph Attention Networks (2018)

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio

+ Utilizes self-attention to compute a concise representation of a signal sequence

SLIDE 26

Graph Convolutional Networks

Example #2:

Developing a Graph Convolution-Based Analysis Pipeline for Multi-Modal Neuroimage Data: An Application to Parkinson's Disease

Christian McDaniel & Shannon Quinn

SLIDE 27

[Figure panels: Anatomical MRI, Diffusion MRI]

Hypothesis:

+ Parkinson's disease results in structural and functional brain changes related to decreased dopamine production
+ Neuroimaging has shown promise (though limited) for PD detection and research
+ Combining insights from multiple modalities may help reveal new discoveries and improve detection

The data

+ Anatomical MRI helps identify regions of the brain; doesn't say much about PD pathology
+ There are multiple tractography algorithms; each offers unique information about structural connectivity
+ Each tractography algorithm generates a new set of features for the same graph

The GCN Implementation

+ The algorithm will need to consolidate data from multiple tractography algorithms
+ The outputs from the GCN on each node will need to be consolidated to a single output (PD vs HC) for each graph
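The slides do not spell out how the per-node outputs are consolidated, so the following is only a sketch of two generic readout options: a plain average over nodes, and a softmax-weighted (attention-style) sum. The function names, the random node outputs `H`, and the scoring vector are hypothetical, not the authors' implementation.

```python
import numpy as np


def mean_readout(H):
    """Average the node outputs: (N, F) -> (F,)."""
    return H.mean(axis=0)


def attention_readout(H, score_vector):
    """Softmax-weighted sum of node outputs.

    `score_vector` (shape (F,)) stands in for a learned parameter; it is a
    hypothetical choice used only to produce one score per node.
    """
    scores = H @ score_vector
    weights = np.exp(scores - scores.max())   # shift for numerical stability
    weights = weights / weights.sum()
    return weights @ H, weights


rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))                   # 5 nodes, 4 output features each

pooled_mean = mean_readout(H)
pooled_attn, weights = attention_readout(H, rng.normal(size=4))
print(pooled_mean.shape, pooled_attn.shape)
print(np.round(weights, 3))                   # per-node importance, sums to 1
```

Either pooled vector could then feed a final classifier that emits one prediction per graph; the attention weights additionally indicate which nodes contributed most.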

SLIDE 28

Graph Convolutional Networks

SLIDE 29

Graph Convolutional Networks

Example #2

Results Attentions

SLIDE 30

Fin! Questions?