Kipf, T., Welling, M.: Semi-Supervised Classification with Graph Convolutional Networks


SLIDE 1

Kipf, T., Welling, M.: Semi-Supervised Classification with Graph Convolutional Networks Radim Ε petlΓ­k

Czech Technical University in Prague

SLIDE 2

Overview


  • Kipf and Welling
  • use first order approximation in Fourier-domain to
  • btain an efficient linear-time graph-CNNs
  • apply the approximation to the semi-supervised graph node

classification problem

SLIDE 3

Graph Adjacency Matrix 𝑨


  • symmetric,

square matrix

  • π΅π‘—π‘˜ = 1 iff vertices

𝑀𝑗 and π‘€π‘˜ are incident

  • π΅π‘—π‘˜ = 0 otherwise

http://mathworld.wolfram.com/AdjacencyMatrix.html

SLIDE 4

Graph Convolutional Network


  • given a graph 𝐻 = π‘Š, 𝐹 , graph-CNN is a function which:
  • takes as input:
  • feature description π’šπ’‹ ∈ ℝ𝐸 for every node 𝑗;

summarized as π‘Œ ∈ ℝ𝑂×𝐸, where 𝑂 is number of nodes, 𝐸 is number of input features

  • description of the graph structure in matrix form, typically

an adjacency matrix 𝐡

  • produces:
  • node-level output π‘Ž ∈ ℝ𝑂×𝐺, where 𝐺 is the number of
  • utput features per node
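To make the shapes concrete, here is a minimal NumPy sketch of this interface; the graph, feature values, and the single weight matrix are made-up illustration values, not taken from the paper:

```python
import numpy as np

N, D, F = 4, 3, 2  # toy sizes: nodes, input features, output features

# adjacency matrix of a small undirected graph (symmetric, 0/1)
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [0., 1., 1., 0.]])

X = np.random.randn(N, D)  # one D-dimensional feature vector per node
W = np.random.randn(D, F)  # a single weight matrix, only to realize the shapes

Z = A @ X @ W              # node-level output, shape (N, F)
```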
SLIDE 5

Graph Convolutional Network


  • is composed of non-linear functions

𝐼(π‘š+1) = 𝑔(𝐼 π‘š , 𝐡), where 𝐼 0 = π‘Œ, and 𝐼(𝑀) = π‘Ž, and 𝑀 is the number of layers.

SLIDE 6

Graph Convolutional Network


  • graphically:

https://tkipf.github.io/graph-convolutional-networks/

SLIDE 7

Graph Convolutional Network


Let’s start with a simple layer-wise propagation rule f(H^(l), A) = Οƒ(A H^(l) W^(l)), where W^(l) ∈ ℝ^{D_l Γ— D_{l+1}} is a weight matrix for the l-th neural network layer, Οƒ(β‹…) is a non-linear activation function, A ∈ ℝ^{NΓ—N} is the adjacency matrix, N is the number of nodes, and H^(l) ∈ ℝ^{NΓ—D_l}.

https://samidavies.wordpress.com/2016/09/20/whats-up-with-the-graph-laplacian/
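A minimal NumPy version of this rule, assuming ReLU as the activation Οƒ (the helper name gcn_layer_simple is ours, not from the paper):

```python
import numpy as np

def gcn_layer_simple(H, A, W):
    """Naive propagation rule sigma(A H W), with sigma = ReLU."""
    return np.maximum(A @ H @ W, 0.0)
```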

SLIDE 8

Graph Convolutional Network


Multiplication with A is not enough; we are missing the node itself in f(H^(l), A) = Οƒ(A H^(l) W^(l)). We fix it with f(H^(l), A) = Οƒ(AΜ‚ H^(l) W^(l)), where AΜ‚ = A + I and I is the identity matrix.
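Continuing the sketch above, adding the self-loops is one line:

```python
A_hat = A + np.eye(A.shape[0])  # A_hat = A + I: each node also aggregates its own features
```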

SLIDE 9

Graph Convolutional Network


AΜ‚ is typically not normalized; the multiplication in f(H^(l), A) = Οƒ(AΜ‚ H^(l) W^(l)) would therefore change the scale of the feature vectors H^(l). We fix that by symmetric normalization, i.e. DΜ‚^{βˆ’1/2} AΜ‚ DΜ‚^{βˆ’1/2}, where DΜ‚ is the diagonal node degree matrix of AΜ‚, DΜ‚_ii = Ξ£_j AΜ‚_ij, producing

f(H^(l), A) = Οƒ(DΜ‚^{βˆ’1/2} AΜ‚ DΜ‚^{βˆ’1/2} H^(l) W^(l)).

SLIDE 10

Graph Convolutional Network


Examining a single layer, with a single filter ΞΈ ∈ ℝ and a single node feature vector x ∈ ℝ^D.

SLIDE 11

Graph Convolutional Network


AΜ‚ = A + I, DΜ‚_ii = Ξ£_j AΜ‚_ij … the renormalization trick

SLIDE 12

Graph Convolutional Network


πœ„ = πœ„0

β€²=- πœ„1 β€²

πœ„0

β€²π’š + πœ„1 β€² 𝑀 βˆ’ 𝐽 π’š

SLIDE 13

Graph Convolutional Network


πœ„0

β€²π’š + πœ„1 β€² 𝑀 βˆ’ 𝐽 π’š

Inverse Fourier transform – filtering – Fourier transform

ΰ·¨ 𝑀 = 𝑑 𝑀 βˆ’ 𝐽 , 𝑑 ∈ ℝ

π’‰πœΎ ⋆ π’š = π‘‰π’‰πœΎπ‘‰βŠ€π’š

SLIDE 14

Graph Convolutional Network


An efficient approximation of the graph convolution was obtained by interpreting the multiplication as a convolution in the Fourier domain, approximated with Chebyshev polynomials. The resulting layer costs O(|E| Β· D_l Β· D_{l+1}) operations, i.e. linear in the number of edges, where |E| is the number of edges, D_l is the number of input channels, and D_{l+1} is the number of output channels.
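For reference, a sketch of the Chebyshev-polynomial filtering that the first-order rule truncates; the order K = len(theta) and the coefficient values are illustrative:

```python
def chebyshev_filter(L_rescaled, x, theta):
    """Filter a signal x with sum_k theta[k] * T_k(L_rescaled) x (K >= 2 assumed)."""
    Tx_prev, Tx = x, L_rescaled @ x                # T_0 x and T_1 x
    out = theta[0] * Tx_prev + theta[1] * Tx
    for k in range(2, len(theta)):
        Tx_prev, Tx = Tx, 2 * (L_rescaled @ Tx) - Tx_prev  # T_k = 2 L T_{k-1} - T_{k-2}
        out += theta[k] * Tx
    return out
```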

SLIDE 15

Overview


  • Kipf and Welling
  • use first order approximation in Fourier-domain to obtain an

efficient linear-time graph-CNNs

  • apply the approximation to the semi-supervised graph

node classification problem

SLIDE 16

Semi-supervised Classification Task

β–ͺ given a point set X = {x_1, …, x_l, x_{l+1}, …, x_n}
β–ͺ and a label set L = {1, …, c}, where

– the first l points have labels y_1, …, y_l ∈ L
– the remaining points are unlabeled
– c is the number of classes

β–ͺ the goal is to

– predict the labels of the unlabeled points

SLIDE 17

Semi-supervised Classification Task

β–ͺ graphically:

https://papers.nips.cc/paper/2506-learning-with-local-and-global-consistency.pdf

SLIDE 18

graph-CNN EXAMPLE

β–ͺ example:

– two-layer graph-CNN: Z = f(X, A) = softmax(AΜ‚ ReLU(AΜ‚ X W^(0)) W^(1)), where AΜ‚ = DΜ‚^{βˆ’1/2} (A + I) DΜ‚^{βˆ’1/2} is the renormalized adjacency matrix, W^(0) ∈ ℝ^{CΓ—H} with C input channels and H feature maps, and W^(1) ∈ ℝ^{HΓ—F} with F output features per node
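A minimal NumPy sketch of this two-layer forward pass, reusing normalize_adjacency from the earlier sketch; weight shapes W0: (C, H) and W1: (H, F) are as above:

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))  # row-wise, numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

def two_layer_gcn(X, A, W0, W1):
    """Z = softmax(A_hat ReLU(A_hat X W0) W1) with the renormalized A_hat."""
    A_hat = normalize_adjacency(A + np.eye(A.shape[0]))  # from the earlier sketch
    H = np.maximum(A_hat @ X @ W0, 0.0)                  # hidden layer, ReLU
    return softmax(A_hat @ H @ W1)                       # per-node class probabilities
```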

SLIDE 19

Graph Convolutional Network


  • graphically:

https://arxiv.org/pdf/1609.02907.pdf

SLIDE 20

graph-CNN EXAMPLE

β–ͺ objective function:

– cross-entropy over the labeled examples: β„’ = βˆ’Ξ£_{l ∈ Y_L} Ξ£_{f=1}^{F} Y_{lf} ln Z_{lf}, where Y_L is the set of node indices that have labels, Z_{lf} is the element in the l-th row, f-th column of the output matrix Z, and the ground truth Y_{lf} is 1 iff node l belongs to class f
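A sketch of this masked loss, assuming one-hot ground-truth rows Y and an index list of labeled nodes:

```python
import numpy as np

def masked_cross_entropy(Z, Y, labeled_idx):
    """L = -sum_{l in Y_L} sum_f Y_lf ln Z_lf, over labeled nodes only."""
    probs = Z[labeled_idx]    # predicted class distributions of labeled nodes
    targets = Y[labeled_idx]  # one-hot ground truth for the same nodes
    return -np.sum(targets * np.log(probs + 1e-12))  # small eps guards log(0)
```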

SLIDE 21

graph-CNN EXAMPLE - RESULTS

β–ͺ weights trained with gradient descent

SLIDE 22

graph-CNN EXAMPLE - RESULTS

β–ͺ different variants of propagation models

SLIDE 23

graph-CNN another EXAMPLE

β–ͺ 3-layer GCN, β€œkarate-club” problem, one labeled example per class:

300 training iterations

SLIDE 24

Limitations


  • Memory grows linearly with data
  • only works with undirected graph
  • assumption of locality
  • assumption of equal importance of self-connections vs.

edges to neighboring nodes መ 𝐡 = 𝐡 + πœ‡π½ where πœ‡ is a learnable parameter.
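A sketch of the proposed trade-off parameter; in practice Ξ» would be learned by gradient descent together with the weights, here it is just a scalar:

```python
lam = 1.0                             # learnable scalar; lam = 1 recovers A_hat = A + I
A_hat = A + lam * np.eye(A.shape[0])  # weighs self-connections vs. neighboring edges
```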

SLIDE 25

Summary


  • Kipf and Welling
  • use first order approximation in Fourier-domain to obtain an

efficient linear-time graph-CNNs

  • apply the approximation to the semi-supervised graph node

classification problem

SLIDE 26


Thank you very much for your time…

SLIDE 27

Answers to Questions


ሚ 𝐡 = 𝐡 + πœ‡π½π‘‚

  • The lambda parameter would control the influence of

neighbouring edges vs. self-connections.

  • How (or why) would the lambda parameter trade-off also

between supervised and unsupervised learning?