Kipf, T., Welling, M.: Semi-Supervised Classification with Graph Convolutional Networks


SLIDE 1

Kipf, T., Welling, M.: Semi-Supervised Classification with Graph Convolutional Networks Radim Ε petlΓ­k

Czech Technical University in Prague

SLIDE 2

Overview


  • Kipf and Welling
  • use first order approximation in Fourier-domain to
  • btain an efficient linear-time graph-CNNs
  • apply the approximation to the semi-supervised graph node

classification problem

SLIDE 3

Graph Adjacency Matrix 𝑨


  • symmetric,

square matrix

  • π΅π‘—π‘˜ = 1 iff vertices

𝑀𝑗 and π‘€π‘˜ are incident

  • π΅π‘—π‘˜ = 0 otherwise

http://mathworld.wolfram.com/AdjacencyMatrix.html

SLIDE 4

Graph Convolutional Network


  • given a graph 𝐻 = π‘Š, 𝐹 , graph-CNN is a function which:
  • takes as input:
  • feature description π’šπ’‹ ∈ ℝ𝐸 for every node 𝑗;

summarized as π‘Œ ∈ ℝ𝑂×𝐸, where 𝑂 is number of nodes, 𝐸 is number of input features

  • description of the graph structure in matrix form, typically

an adjacency matrix 𝐡

  • produces:
  • node-level output π‘Ž ∈ ℝ𝑂×𝐺, where 𝐺 is the number of
  • utput features per node
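To make the shapes concrete, here is a minimal NumPy sketch of this interface; the graph, feature values, and the single weight matrix are made-up illustration values, not taken from the paper:

```python
import numpy as np

N, D, F = 4, 3, 2  # toy sizes: nodes, input features, output features

# adjacency matrix of a small undirected graph (symmetric, 0/1)
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [0., 1., 1., 0.]])

X = np.random.randn(N, D)  # one D-dimensional feature vector per node
W = np.random.randn(D, F)  # a single weight matrix, only to realize the shapes

Z = A @ X @ W              # node-level output, shape (N, F)
```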
SLIDE 5

Graph Convolutional Network


  • is composed of non-linear functions

𝐼(π‘š+1) = 𝑔(𝐼 π‘š , 𝐡), where 𝐼 0 = π‘Œ, and 𝐼(𝑀) = π‘Ž, and 𝑀 is the number of layers.

SLIDE 6

Graph Convolutional Network


  • graphically:

https://tkipf.github.io/graph-convolutional-networks/

SLIDE 7

Graph Convolutional Network


Let’s start with a simple layer-wise propagation rule f(H^(l), A) = Οƒ(A H^(l) W^(l)), where W^(l) ∈ ℝ^{D_l Γ— D_{l+1}} is a weight matrix for the l-th neural network layer, Οƒ(β‹…) is a non-linear activation function, A ∈ ℝ^{NΓ—N} is the adjacency matrix, N is the number of nodes, and H^(l) ∈ ℝ^{NΓ—D_l}.

https://samidavies.wordpress.com/2016/09/20/whats-up-with-the-graph-laplacian/
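A minimal NumPy version of this rule, assuming ReLU as the activation Οƒ (the helper name gcn_layer_simple is ours, not from the paper):

```python
import numpy as np

def gcn_layer_simple(H, A, W):
    """Naive propagation rule sigma(A H W), with sigma = ReLU."""
    return np.maximum(A @ H @ W, 0.0)
```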

SLIDE 8

Graph Convolutional Network


Multiplication with A is not enough; we are missing the node itself in f(H^(l), A) = Οƒ(A H^(l) W^(l)). We fix it with f(H^(l), A) = Οƒ(AΜ‚ H^(l) W^(l)), where AΜ‚ = A + I and I is the identity matrix.
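Continuing the sketch above, adding the self-loops is one line:

```python
A_hat = A + np.eye(A.shape[0])  # A_hat = A + I: each node also aggregates its own features
```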

SLIDE 9

Graph Convolutional Network


AΜ‚ is typically not normalized; the multiplication in f(H^(l), A) = Οƒ(AΜ‚ H^(l) W^(l)) would therefore change the scale of the feature vectors H^(l). We fix that by symmetric normalization, i.e. DΜ‚^{βˆ’1/2} AΜ‚ DΜ‚^{βˆ’1/2}, where DΜ‚ is the diagonal node degree matrix of AΜ‚, DΜ‚_ii = Ξ£_j AΜ‚_ij, producing

f(H^(l), A) = Οƒ(DΜ‚^{βˆ’1/2} AΜ‚ DΜ‚^{βˆ’1/2} H^(l) W^(l)).

SLIDE 10

Graph Convolutional Network


Examining a single layer, with a single filter ΞΈ ∈ ℝ and a single node feature vector x ∈ ℝ^D.

SLIDE 11

Graph Convolutional Network


AΜ‚ = A + I, DΜ‚_ii = Ξ£_j AΜ‚_ij … the renormalization trick

SLIDE 12

Graph Convolutional Network


πœ„ = πœ„0

β€²=- πœ„1 β€²

πœ„0

β€²π’š + πœ„1 β€² 𝑀 βˆ’ 𝐽 π’š

SLIDE 13

Graph Convolutional Network


πœ„0

β€²π’š + πœ„1 β€² 𝑀 βˆ’ 𝐽 π’š

Inverse Fourier transform – filtering – Fourier transform

ΰ·¨ 𝑀 = 𝑑 𝑀 βˆ’ 𝐽 , 𝑑 ∈ ℝ

π’‰πœΎ ⋆ π’š = π‘‰π’‰πœΎπ‘‰βŠ€π’š

SLIDE 14

Graph Convolutional Network


An efficient approximation of the graph convolution was obtained by interpreting the multiplication as a convolution in the Fourier domain, approximated with Chebyshev polynomials. The resulting layer costs O(|E| Β· D_l Β· D_{l+1}) operations, i.e. linear in the number of edges, where |E| is the number of edges, D_l is the number of input channels, and D_{l+1} is the number of output channels.
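For reference, a sketch of the Chebyshev-polynomial filtering that the first-order rule truncates; the order K = len(theta) and the coefficient values are illustrative:

```python
def chebyshev_filter(L_rescaled, x, theta):
    """Filter a signal x with sum_k theta[k] * T_k(L_rescaled) x (K >= 2 assumed)."""
    Tx_prev, Tx = x, L_rescaled @ x                # T_0 x and T_1 x
    out = theta[0] * Tx_prev + theta[1] * Tx
    for k in range(2, len(theta)):
        Tx_prev, Tx = Tx, 2 * (L_rescaled @ Tx) - Tx_prev  # T_k = 2 L T_{k-1} - T_{k-2}
        out += theta[k] * Tx
    return out
```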

SLIDE 15

Overview


  • Kipf and Welling
  • use first order approximation in Fourier-domain to obtain an

efficient linear-time graph-CNNs

  • apply the approximation to the semi-supervised graph

node classification problem

SLIDE 16

Semi-supervised Classification Task

β–ͺ given a point set X = {x_1, …, x_l, x_{l+1}, …, x_n}
β–ͺ and a label set L = {1, …, c}, where

– the first l points have labels y_1, …, y_l ∈ L
– the remaining points are unlabeled
– c is the number of classes

β–ͺ the goal is to

– predict the labels of the unlabeled points

SLIDE 17

Semi-supervised Classification Task

β–ͺ graphically:

https://papers.nips.cc/paper/2506-learning-with-local-and-global-consistency.pdf

SLIDE 18

graph-CNN EXAMPLE

β–ͺ example:

– two-layer graph-CNN: Z = f(X, A) = softmax(AΜ‚ ReLU(AΜ‚ X W^(0)) W^(1)), where AΜ‚ = DΜ‚^{βˆ’1/2} (A + I) DΜ‚^{βˆ’1/2} is the renormalized adjacency matrix, W^(0) ∈ ℝ^{CΓ—H} with C input channels and H feature maps, and W^(1) ∈ ℝ^{HΓ—F} with F output features per node
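A minimal NumPy sketch of this two-layer forward pass, reusing normalize_adjacency from the earlier sketch; weight shapes W0: (C, H) and W1: (H, F) are as above:

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))  # row-wise, numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

def two_layer_gcn(X, A, W0, W1):
    """Z = softmax(A_hat ReLU(A_hat X W0) W1) with the renormalized A_hat."""
    A_hat = normalize_adjacency(A + np.eye(A.shape[0]))  # from the earlier sketch
    H = np.maximum(A_hat @ X @ W0, 0.0)                  # hidden layer, ReLU
    return softmax(A_hat @ H @ W1)                       # per-node class probabilities
```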

SLIDE 19

Graph Convolutional Network


  • graphically:

https://arxiv.org/pdf/1609.02907.pdf

SLIDE 20

graph-CNN EXAMPLE

β–ͺ objective function:

– cross-entropy over the labeled examples: β„’ = βˆ’Ξ£_{l ∈ Y_L} Ξ£_{f=1}^{F} Y_{lf} ln Z_{lf}, where Y_L is the set of node indices that have labels, Z_{lf} is the element in the l-th row, f-th column of the output matrix Z, and the ground truth Y_{lf} is 1 iff node l belongs to class f
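A sketch of this masked loss, assuming one-hot ground-truth rows Y and an index list of labeled nodes:

```python
import numpy as np

def masked_cross_entropy(Z, Y, labeled_idx):
    """L = -sum_{l in Y_L} sum_f Y_lf ln Z_lf, over labeled nodes only."""
    probs = Z[labeled_idx]    # predicted class distributions of labeled nodes
    targets = Y[labeled_idx]  # one-hot ground truth for the same nodes
    return -np.sum(targets * np.log(probs + 1e-12))  # small eps guards log(0)
```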

SLIDE 21

graph-CNN EXAMPLE - RESULTS

β–ͺ weights trained with gradient descent

SLIDE 22

graph-CNN EXAMPLE - RESULTS

β–ͺ different variants of propagation models

SLIDE 23

graph-CNN another EXAMPLE

β–ͺ 3-layer GCN, β€œkarate-club” problem, one labeled example per class:

300 training iterations

SLIDE 24

Limitations


  • Memory grows linearly with data
  • only works with undirected graph
  • assumption of locality
  • assumption of equal importance of self-connections vs.

edges to neighboring nodes መ 𝐡 = 𝐡 + πœ‡π½ where πœ‡ is a learnable parameter.
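A sketch of the proposed trade-off parameter; in practice Ξ» would be learned by gradient descent together with the weights, here it is just a scalar:

```python
lam = 1.0                             # learnable scalar; lam = 1 recovers A_hat = A + I
A_hat = A + lam * np.eye(A.shape[0])  # weighs self-connections vs. neighboring edges
```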

SLIDE 25

Summary


  • Kipf and Welling
  • use first order approximation in Fourier-domain to obtain an

efficient linear-time graph-CNNs

  • apply the approximation to the semi-supervised graph node

classification problem

SLIDE 26


Thank you very much for your time…

SLIDE 27

Answers to Questions


ሚ 𝐡 = 𝐡 + πœ‡π½π‘‚

  • The lambda parameter would control the influence of

neighbouring edges vs. self-connections.

  • How (or why) would the lambda parameter trade-off also

between supervised and unsupervised learning?