SLIDE 1

On the Experimental Transferability of Spectral Graph Convolutional Networks

Master’s project presentation, 6/7/2020, Axel Nilsson

SLIDE 2
Outline

  • 1. Introduction
    • Spectral graph convolutional networks
    • ChebNet
  • 2. Benchmarking
    • Benchmarking GNNs
    • OGB
  • 3. Structural edge dropout
  • 4. Questions (20 minutes)

SLIDE 3
  • 1. Introduction - Graphs


Examples of graphs: meshes, social networks, molecules, the World Wide Web.

Notation: G - graph; N - set of nodes; E - set of edges; A - adjacency matrix; D - degree matrix; h - node features; e - edge features; g - graph features.

SLIDE 4


Graph Convolutional Networks (GCNs)

Convolutional neural networks do not translate well to graphs:

  • No ordering of nodes
  • No orientation
  • Varying neighbourhood sizes

The Laplacian operator:

  Δ_u = D − A  (unnormalised)
  Δ_n = I_n − D^{−1/2} A D^{−1/2}  (normalised)

Spectral decomposition into n eigenvalues λ (collected in Λ) and eigenvectors Φ:

  Δ = Φ Λ Φ^T

Vanilla spectral GCN:

  h^{ℓ+1} = ξ(Φ θ̂(Λ) Φ^T h^ℓ) = ξ(θ̂(Δ) h^ℓ)

where h are the node features, ξ is a non-linear activation function, θ̂ is a matrix of learnable weights, and Φ are the eigenvectors of the Laplacian.
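To make the layer concrete, here is a minimal NumPy sketch (my illustration, not the project's code), realising θ̂(Λ) as one learned coefficient per eigenvalue and using the unnormalised Laplacian for brevity:

```python
import numpy as np

def vanilla_spectral_gcn_layer(h, A, theta, xi=np.tanh):
    """One vanilla spectral GCN layer: h' = xi(Phi thetahat(Lambda) Phi^T h).

    h:     (n, f) node features
    A:     (n, n) symmetric adjacency matrix
    theta: (n,) learnable filter coefficients, one per eigenvalue
    """
    D = np.diag(A.sum(axis=1))           # degree matrix
    delta = D - A                        # unnormalised Laplacian, Delta_u
    _, phi = np.linalg.eigh(delta)       # eigenvectors Phi of the Laplacian
    # Transform to the spectral domain, filter, transform back, activate.
    return xi(phi @ np.diag(theta) @ phi.T @ h)
```

The explicit eigendecomposition here is exactly what ChebNet, on the next slide, avoids.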

SLIDE 5

ChebNet: a fast spectral GCN


Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. arXiv:1606.09375

Re-normalised Laplacian (re-scales the eigenvalues to [−1, 1]):

  Δ̃ = 2 λ_max^{−1} Δ_n − I

Chebyshev polynomials (recursively compute a basis up to the order k):

  T_0 = h
  T_1 = Δ̃ h
  T_n = 2 Δ̃ T_{n−1} − T_{n−2},  n ≥ 2

Learned filters:

  g_θ(Δ̃) h = Σ_{j=0}^{k} θ_j T_j

  • O(1) parameters per layer (independent of the graph size)
  • Filters are localised
  • No eigendecomposition
  • Filters are basis dependent
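A minimal NumPy sketch of this filter (illustrative names; practical implementations use sparse matrix products), showing that only repeated multiplications by Δ̃ are needed:

```python
import numpy as np

def cheb_filter(h, delta_t, theta):
    """g_theta(delta_t) h = sum_{j=0}^{k} theta_j T_j, via the recursion above.

    h:       (n, f_in) node features
    delta_t: (n, n) re-normalised Laplacian (eigenvalues in [-1, 1])
    theta:   list of k+1 learnable (f_in, f_out) weight matrices
    """
    t_prev, out = h, h @ theta[0]              # T_0 = h
    if len(theta) > 1:
        t_curr = delta_t @ h                   # T_1 = delta_t h
        out = out + t_curr @ theta[1]
        for theta_j in theta[2:]:              # T_n = 2 delta_t T_{n-1} - T_{n-2}
            t_prev, t_curr = t_curr, 2 * delta_t @ t_curr - t_prev
            out = out + t_curr @ theta_j
    return out
```

No eigendecomposition appears anywhere, which is the point of the bullets above.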
SLIDE 6


A proof of transferability

Levie et al. (2019), Transferability of Spectral Graph Convolutional Networks

The work of Levie et al. debunked the prejudices against vanilla spectral GCNs: “If two graphs discretise the same continuous metric space, then a spectral GCN has approximately the same repercussion on both graphs.”

Spectral GCNs should therefore work well on sets of graphs.

SLIDE 7

Objective

  • Give experimental proof of the transferability of spectral GCNs on datasets with sets of graphs
  • Try to improve the transferability of spectral GCNs -> structural edge dropout

SLIDE 8
  • 2. Benchmarking

  • Several benchmarks aim at comparing GCNs
  • They provide a series of different tasks with large datasets
  • A framework fixing the training hyperparameters ensures replicability
  • None include spectral GCNs!

SLIDE 9


Graph classification - MNIST & CIFAR10 superpixels

Data from the MNIST Superpixels dataset - label: 0

  • Task: graph classification on images converted to superpixel graphs with the SLIC transform (a sketch of the conversion follows)
  • Result: average performance on MNIST and CIFAR10 compared to similar models
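As referenced in the task bullet, here is a rough sketch of one plausible image-to-superpixel-graph conversion. The SLIC call is real (scikit-image), but n_segments, k, and the k-NN construction over centroids are my assumptions, not the benchmark's exact pipeline:

```python
import numpy as np
from scipy.spatial import cKDTree
from skimage.segmentation import slic

def image_to_superpixel_graph(image, n_segments=75, k=8):
    """Turns an (H, W, C) image into a k-NN graph over SLIC superpixels."""
    seg = slic(image, n_segments=n_segments)
    ids = np.unique(seg)
    # Node features: mean colour of each superpixel; positions: mean pixel coords.
    feats = np.stack([image[seg == i].mean(axis=0) for i in ids])
    coords = np.stack([np.argwhere(seg == i).mean(axis=0) for i in ids])
    # Connect each superpixel to its k nearest neighbours (skipping itself).
    _, nbrs = cKDTree(coords).query(coords, k=k + 1)
    edges = np.array([(i, j) for i, row in enumerate(nbrs) for j in row[1:]])
    return feats, coords, edges.T
```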

SLIDE 10

Graph regression - ZINC

Data from the ZINC dataset; node colours indicate the atom type - label: -0.2070

  • Task: graph regression, predicting the solubility of each molecule
  • Result: best performance among models learning isotropic filters; good performance overall
  • Questionable whether the train/val/test sets are representative of any underlying space
  • Unlikely that each molecule is a sample of a continuous space

SLIDE 11

Node classification - SBM

Data from the SBM CLUSTER dataset; the colours of the nodes represent their labels.

  • Task: predict the node label among six communities of various sizes, where a node connects with probability p to the other nodes of its community and with probability q to the rest (a minimal sampler is sketched below)
  • Result: very good performance
  • All graphs describe a non-Euclidean continuous underlying manifold
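To make the generative model concrete, here is a minimal sampler for the SBM described above (the function name and representation are illustrative):

```python
import numpy as np

def sample_sbm(sizes, p, q, rng=None):
    """Samples an undirected SBM graph: edge probability p inside a
    community, q across communities. Returns (adjacency, node labels)."""
    rng = rng or np.random.default_rng()
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]          # same-community mask
    upper = np.triu(rng.random(same.shape) < np.where(same, p, q), k=1)
    adj = (upper | upper.T).astype(np.int8)            # symmetric, no self-loops
    return adj, labels

# Six communities of various sizes, e.g.:
# adj, y = sample_sbm([20, 15, 25, 10, 30, 20], p=0.5, q=0.05)
```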

SLIDE 12

OGB: Result Summary


Link: https://ogb.stanford.edu/docs/leader_graphprop/

  • Task: graph regression, prediction of the properties of each molecule
  • Result: above-average performance overall; good performance relative to the classical models GCN and GIN on both tasks
  • The test/train/val splitting is more equitable than ZINC's
  • Relatively better performance on the larger dataset
  • New models showing greater performance have been added to the leaderboard since the report

SLIDE 13


  • 3. Structural Edge Dropout

[Figures: an MNIST image on a 4-NN lattice; the same image on a 4-NN lattice with structural edge dropout]

Structural augmentations are particular to graphs: cut a random set of edges, at a variable rate between 0 and r% of all the edges, for every graph during training (see the sketch below).
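A minimal sketch of this augmentation, assuming edges are stored as a (2, E) array; the function name and the per-graph uniform rate are my reading of the slide:

```python
import numpy as np

def structural_edge_dropout(edges, r=0.2, rng=None):
    """Drops a random subset of edges during training only.

    edges: (2, E) array of edge endpoints. The drop rate is re-drawn
    uniformly in [0, r] for every graph; node features are untouched.
    """
    rng = rng or np.random.default_rng()
    rate = rng.uniform(0.0, r)                 # variable rate per graph
    keep = rng.random(edges.shape[1]) >= rate  # Bernoulli mask over edges
    return edges[:, keep]
```

For undirected graphs stored with both edge directions, the mask would be drawn per undirected pair instead, to keep the graph symmetric.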
SLIDE 14

Structural edge dropout

[Plots: % accuracy against % of deleted edges, for a ChebNet trained on the 4-NN lattice and a ChebNet trained with structural edge dropout on the 4-NN lattice]

  • The node features are not changed, only the graph is
  • Shows improved transferability outside the region of training
SLIDE 15

Structural edge dropout - on the benchmarking tasks


  • The performance of ChebNet is improved in every case
  • Most significantly on the CIFAR10 dataset
  • Does not work for ZINC -> a limitation of the technique
SLIDE 16


Conclusion

  • ChebNet provides state-of-the-art performance on the ZINC and CLUSTER tasks of the ‘benchmarking-GNNs’ suite, and good performance on two of OGB’s datasets
  • This supports experimentally the argument that spectral GCNs have good performance and transferability
  • Structural edge dropout can not only increase the performance of a spectral GCN but also its transferability

SLIDE 17
  • 4. Questions


SLIDE 18

Benchmarking-GNNs: Result Summary
