Geometric Deep Learning: going beyond Euclidean data
Michael Bronstein, Imperial College London / Twitter


SLIDE 1

Geometric Deep Learning
going beyond Euclidean data

Michael Bronstein
Imperial College London / Twitter

SLIDE 2

$y = \mathrm{sign}\big(\mathbf{w}^\top\mathbf{x}\big) = \mathrm{sign}\Big(\sum_{j=1}^{d} w_j x_j\Big)$

Rosenblatt 1957

Perceptron

simplest neural network
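To make the formula concrete, a minimal NumPy sketch of the perceptron forward pass (the particular weights and inputs are illustrative, not from the slides):

```python
import numpy as np

def perceptron(x, w):
    """Rosenblatt's perceptron: the sign of a weighted sum of the inputs."""
    return np.sign(w @ x)

x = np.array([1.0, -2.0, 0.5])   # input features x_1, ..., x_d
w = np.array([0.4, 0.1, -0.7])   # weights w_1, ..., w_d
print(perceptron(x, w))          # -> -1.0
```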

SLIDE 3

Multilayer perceptron

! " #$ %$

&'( )(

*$ "

+(

+

= *$sign #$!+%$

+

!

two-layer perceptron

SLIDE 4

Multilayer perceptron

! " #$ %$

&'( )(

*$ " !

+(

+

= *$sign #$!+%$

+

#2 %2 −*$

+

−*$sign #2!+%2

&'4 )4

SLIDE 5

Multilayer perceptron

! " " !

+ + + + +

⋮ ⋮

Cybenko 1989; Hornik 1991

multilayer perceptron can approximate a continuous function to any desired accuracy = universal approximator

SLIDE 6

Curse of dimensionality

! " = $

% &

Γ %

&() 2%

Volume of ball inscribed in a unit hypercube

Figure: Vision Dummy

$ 1 2

&

≈ 0.785

2 3$ 1

2

&

≈ 0.52 ≈ 0.0159 ~6 78%

! " dimension "

to approximate a continuous function 9: ℝ% → ℝ with = accuracy

  • ne needs 6 =8% samples
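A quick numerical check of the shrinking volume, as a small Python sketch of the formula above:

```python
import math

def inscribed_ball_volume(d):
    """Volume of the ball of radius 1/2 inscribed in the unit hypercube in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d

for d in (2, 3, 8, 20):
    print(d, inscribed_ball_volume(d))
# 2 -> 0.7853..., 3 -> 0.5235..., 8 -> 0.0158..., 20 -> ~2.5e-08
```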
SLIDE 7

Computer vision problems

input image → input vector

(figure: an image flattened into a vector and fed to a multilayer perceptron)

SLIDE 8

Computer vision problems: shift invariance must be learned from the data!

input image → input vector

(figure: a shifted image produces an entirely different input vector)

SLIDE 9

Take advantage of structure!

Hubel, Wiesel 1962, 1981; Fukushima 1980; LeCun et al. 1989

SLIDE 10

Convolutional Neural Networks

  • Take advantage of self-similar structures at different scales
  • Local operations with shared weights
  • Shift-equivariant convolutional filters + pooling = shift invariance
  • $\mathcal{O}(1)$ parameters per filter
  • $\mathcal{O}(n)$ complexity per layer

LeCun et al. 1989

SLIDE 11

Inductive biases:

  • Multilayer perceptrons: "universal approximators"
  • CNNs: translation equivariance (LeCun et al. 1989)
  • G-CNNs: group equivariance (Cohen, Welling 2016)

SLIDE 12

energy U = ?

SLIDE 13

energy U = ?

SLIDE 14

Social networks Interaction networks Functional networks Meshes Molecules

SLIDE 15
SLIDE 16

Graph neural networks Relational inductive biases Graph representation learning

SLIDE 17

Manifolds Graphs

“Differential geometry and graph theory […] are insufficiently known in the signal processing community. One of our goals is to provide an accessible overview of these models”


SLIDE 18

Images:
  • Constant number of neighbors
  • Fixed ordering of neighbors
  • Shift invariance

Graphs:
  • Different number of neighbors
  • No ordering of neighbors
  • Permutation invariance

Pooling? Convolution? Efficient computation?

SLIDE 19

Convolution

SLIDE 20

!" !"#$ !"%$ &'

&' = )

',$!$ + ⋯ + ) ',-!-

dense weights: n2 parameters Fully connected layer

SLIDE 21

!" !"#$ !"%$ &'

&" = )

',"%$!"%$ + ) ',"!"+ ) ',"#$!"#$

position-dependent weights: 3n parameters Sparsely connected layer

SLIDE 22

!" !"#$ !"%$ &'

&' = )%$!"%$ + )+!"+ )#$!"#$ shared weights: 3 parameters Convolutional layer

SLIDE 23

(figure: convolution $\mathbf{y} = C(\mathbf{w})\,\mathbf{x}$ written as a matrix-vector product)

SLIDE 24

Convolution = circulant matrix

(figure: $\mathbf{y} = C(\mathbf{w})\,\mathbf{x}$, where $C(\mathbf{w})$ is a circulant matrix)

SLIDE 25

$C(\mathbf{w}_1)\,C(\mathbf{w}_2) = C(\mathbf{w}_2)\,C(\mathbf{w}_1)$

circulant matrices commute

SLIDE 26

Shift-equivariance

!" #(%)

=

!" #(%)

circulant matrices commute convolution commutes with shift

shift operator shift operator

SLIDE 27

Joint diagonalization

!" #(%)

=

!" #(%)

commuting matrices are jointly diagonalizable convolution is diagonalized by eigenvectors of !"

shift operator shift operator

SLIDE 28

Eigenvectors of $S$

$S = \Phi\,\mathrm{diag}\big(e^{\frac{2\pi i \cdot 0}{n}}, \dots, e^{\frac{2\pi i (n-1)}{n}}\big)\,\Phi^*, \qquad \phi_k = \frac{1}{\sqrt{n}}\big(1,\ e^{\frac{2\pi i k}{n}},\ \dots,\ e^{\frac{2\pi i (n-1)k}{n}}\big)^\top$

$\Phi = (\phi_0, \phi_1, \dots, \phi_{n-1})$ = Fourier transform ($\Phi$ = inverse DFT, $\Phi^*$ = forward DFT)

The shift operator is diagonalized by the Fourier transform.
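A short NumPy check of this fact (a sketch, with an illustrative choice of shift direction): the columns of the DFT matrix are eigenvectors of the cyclic shift.

```python
import numpy as np

n = 6
# Cyclic shift operator: (S x)[i] = x[(i + 1) mod n]
S = np.roll(np.eye(n), -1, axis=0)

# Fourier basis: phi_k[j] = exp(2*pi*i*j*k/n) / sqrt(n), stacked as columns
k = np.arange(n)
Phi = np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

# Each column phi_k is an eigenvector of S with eigenvalue exp(2*pi*i*k/n)
assert np.allclose(S @ Phi, Phi * np.exp(2j * np.pi * k / n))
```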

SLIDE 29

Convolution in spectral domain

$C(\mathbf{w}) = \Phi\,\hat{C}(\mathbf{w})\,\Phi^*$ ($\Phi$ = inverse DFT, $\Phi^*$ = forward DFT)

SLIDE 30

Convolution in spectral domain

$C(\mathbf{w}) = \Phi\,\mathrm{diag}(\hat{w}_1, \dots, \hat{w}_n)\,\Phi^*$

The eigenvalues of $C(\mathbf{w})$ are the Fourier transform of the filter: $\hat{\mathbf{w}} = \Phi^*\mathbf{w}$.

SLIDE 31

! " # " ⋆ # Circulant matrix

Convolution theorem

SLIDE 32

Convolution theorem

! " ⋆ $ % &' ⋱ % &) * $ + Element-wise product $ " ⋆ $ IDFT +∗ DFT

SLIDE 33

Convolution theorem

spatial domain: $\mathbf{w} \star \mathbf{x} = C(\mathbf{w})\,\mathbf{x}$ (circulant matrix)

frequency domain: $\mathbf{w} \star \mathbf{x} = \Phi\,\mathrm{diag}(\hat{w}_1, \dots, \hat{w}_n)\,\Phi^*\,\mathbf{x}$ (DFT, element-wise product, IDFT)

SLIDE 34

Key insights

  • Convolution in spatial domain: circulant matrix
  • Local aggregation on adjacent nodes with shared parameters
  • Special structure due to underlying grid
  • All circulant matrices diagonalized by DFT (eigenvectors of shift)
  • Convolution in frequency domain: apply DFT, apply element-wise product, apply inverse DFT
  • Efficient computation: $\mathcal{O}(n)$ with a small filter, $\mathcal{O}(n \log n)$ using the FFT (see the sketch below)

SLIDE 35

From grids to graphs

SLIDE 36

Graphs

  • Vertices or nodes: $\mathcal{V} = \{1, \dots, n\}$
  • Edges: $\mathcal{E} = \{(i,j) : i, j \in \mathcal{V}\} \subseteq \mathcal{V} \times \mathcal{V}$ (directed)

SLIDE 37

Graphs

  • Vertices or nodes: $\mathcal{V} = \{1, \dots, n\}$
  • Edges: ordered pairs $(i,j) \in \mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ (directed) or unordered pairs $\{i,j\}$ (undirected)

SLIDE 38

Graphs

  • Neighbourhood: $\mathcal{N}(i) = \{j : (i,j) \in \mathcal{E}\}$
  • Degree: $d_i = |\mathcal{N}(i)|$

SLIDE 39

Attributes

  • Node features: $\mathbf{x} : \mathcal{V} \to \mathbb{R}^{d}$, stacked as $\mathbf{X} = (\mathbf{x}_1, \dots, \mathbf{x}_n)^\top$
  • Edge features: $\mathbf{e} : \mathcal{E} \to \mathbb{R}^{d'}$

(figure: node features $\mathbf{x}_i, \mathbf{x}_j$ and edge feature $\mathbf{e}_{ij}$)

SLIDE 40

Attributes

  • Node features: $\mathbf{x} : \mathcal{V} \to \mathbb{R}^{d}$, stacked as $\mathbf{X} = (\mathbf{x}_1, \dots, \mathbf{x}_n)^\top$
  • Edge features: $\mathbf{e} : \mathcal{E} \to \mathbb{R}^{d'}$; particular case $w : \mathcal{E} \to \mathbb{R}_+$ (weighted graph)

SLIDE 41

Adjacency matrix

! " # ! " # ! " # $ $ $

SLIDE 42

Adjacency matrix

! " # ! " # ! " # $ $ $

SLIDE 43

Weighted adjacency matrix

! " # ! " # ! " # $%& $%' ( ( ( $)%

SLIDE 44

Adjacency matrix

$a_{ij} = 1 \iff (i,j) \in \mathcal{E}$

The adjacency matrix is symmetric for undirected graphs.

SLIDE 45

Graph Laplacian

! " #$%

&' $ = '$ − 1 +$ ,

%∈. $

#$%'% “Local difference operator”

SLIDE 46

Graph Laplacian

! " #$%

&' $ = 1 *$ +

%∈- $

#$% '$ − '% (normalized) Laplacian matrix & = /01 / − 2 = 3 − /01 2 Degree matrix / = *1 ⋱ *5

SLIDE 47

Dirichlet energy

! " #$%

& ' = 1 *$ +

%∈- $

#$% .$ − .% = trace '6'7 measures how smooth the signal is on the graph
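A small NumPy sketch (toy graph, illustrative weights) of the Laplacian construction and the Dirichlet energy identity:

```python
import numpy as np

# Toy undirected weighted graph on 4 nodes
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)                          # degrees d_i
Delta = np.eye(4) - np.diag(1 / d) @ W     # normalized Laplacian I - D^{-1} W

# Dirichlet energy via the unnormalized Laplacian L = D - W:
# x^T L x = 1/2 * sum_ij w_ij (x_i - x_j)^2, small for smooth signals
L = np.diag(d) - W
x = np.array([0.0, 0.1, 0.2, 1.0])
energy = x @ L @ x
pairwise = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2 for i in range(4) for j in range(4))
assert np.isclose(energy, pairwise)
```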

SLIDE 48

Laplacian on manifolds and meshes

!" # = 1 &# '

(∈* #

cot.#( + cot0#( 2 "# − "(

Laplace-Beltrami operator Cotangent Laplacian

&# .#( 0#(

Duffin 1959; Pinkal, Polthier 1993; Desbrun et al. 1999 Laplace 1787; Beltrami 1902

3 4 Δ6 = div ∇6 ; ℳ =

>ℳ

SLIDE 49

Convolution on graphs?

local aggregation on adjacent nodes with shared parameters

Spatial domain Frequency domain

Graph Fourier transform

SLIDE 50

1D grid = ring graph; the adjacency matrix $\mathbf{A}$ of the ring graph = shift operator $S$

(figure: ring graph with nodes $1, 2, 3, \dots, n$)

Ortega et al. 2017

SLIDE 51

From Grid to Graph Fourier Transform

  • Key idea: use the eigenvectors of the adjacency matrix $\mathbf{A}$ or the graph Laplacian $\Delta$ as the analogy of the Fourier transform
  • The two are equivalent on grids (all circulant matrices are jointly diagonalizable) but different on graphs
  • Undirected graphs (with symmetric $\mathbf{A}$ and $\Delta$) have orthogonal eigenvectors
  • Directed graphs (with asymmetric $\mathbf{A}$ and $\Delta$) require more elaborate analysis using the Jordan decomposition / generalized eigenvectors

Taubin 1995; Karni, Gotsman 2000; Levy 2006; B et al. 2010-2014; Shuman et al. 2013; Sandryhaila, Moura 2013

SLIDE 52

Graph Fourier Transform

First eigenvectors of the ring graph Laplacian = classical Fourier basis
First eigenvectors of the Minnesota road network Laplacian

SLIDE 53

Spectral graph convolution

$\mathbf{x} \star \mathbf{w} = \Phi\,\mathrm{diag}(\hat{w}_1, \dots, \hat{w}_n)\,\Phi^\top\mathbf{x}$

To compute the convolution $\mathbf{x} \star \mathbf{w}$:

  • Graph FT: $\hat{\mathbf{x}} = \Phi^\top\mathbf{x}$
  • Apply the filter: $\hat{\mathbf{x}} \circ \hat{\mathbf{w}} = \mathrm{diag}(\hat{w}_1, \dots, \hat{w}_n)\,\hat{\mathbf{x}}$
  • Inverse graph FT: $\mathbf{x} \star \mathbf{w} = \Phi\,(\hat{\mathbf{x}} \circ \hat{\mathbf{w}})$
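A minimal NumPy sketch of these three steps, with an illustrative heat-kernel filter $\hat{w}(\lambda) = e^{-\lambda}$ on a toy graph:

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W        # any symmetric graph Laplacian works here

lam, Phi = np.linalg.eigh(L)          # columns of Phi form the graph Fourier basis
x = np.random.randn(4)                # a signal on the nodes

x_hat = Phi.T @ x                     # 1. graph Fourier transform
y_hat = np.exp(-lam) * x_hat          # 2. apply the spectral filter
y = Phi @ y_hat                       # 3. inverse graph Fourier transform
```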

SLIDE 54

Spectral graph convolution: drawbacks

  • Computational complexity at least $\mathcal{O}(n^2)$, vs $\mathcal{O}(n)$ or $\mathcal{O}(n \log n)$ on grids
  • Number of parameters $\mathcal{O}(n)$, vs $\mathcal{O}(1)$
  • No guarantee of spatial localization
  • Isotropic filters
  • Filters are basis-dependent and do not generalize across graphs
SLIDE 55

Basis dependence

Original signal $\mathbf{x}$

SLIDE 56

Basis dependence

Filter output $\mathbf{x} \star \mathbf{w} = \Phi\,\mathrm{diag}(\hat{w}_1, \dots, \hat{w}_n)\,\Phi^\top\mathbf{x}$

SLIDE 57

Basis dependence

Filter output $\mathbf{x} \star \mathbf{w} = \Phi\,\mathrm{diag}(\hat{w}_1, \dots, \hat{w}_n)\,\Phi^\top\mathbf{x}$

SLIDE 58

Isotropic filters on graphs

1 " 1 2 3 4

Grid Graph

2 3 4 5 6 "

SLIDE 59

Isotropic filters on graphs

3 " 1 2 3 4

Grid Graph

5 1 5 4 6

local permutation invariance (no “directions” on graphs)

"

SLIDE 60

Isotropic filters on graphs

3 "

Graph

5 1 5 4 6

'( ) = 1 +) ,

  • ∈/ )

0)- () − (-

graph Laplacian is permutation-invariant hence isotropic

SLIDE 61

Anisotropic filters on meshes

(figure: local neighbourhoods on a mesh vs. on a graph)

SLIDE 62

Anisotropic filters on meshes

(figure: on a mesh, the neighbour ordering changes only by rotation; on a graph, by an arbitrary permutation)

SLIDE 63

Anisotropic filters on meshes

Anisotropic spectral filters on meshes

Boscaini et B 2016

SLIDE 64

Spectral graph convolution, take 2

! " # = % ! &' ⋱ ! &) %*#

  • Matrix function applied to "
  • Interpret eigenvalues &', … , &) as frequencies and ! & as

spectral transfer function

  • Make ! & parametric with - 1 parameters
  • Make ! & expressible in terms of simple matrix operations, to

avoid explicit computation of %

  • Possible to guarantee stability under graph perturbations
  • Possible to guarantee localization

Levie et B 2019; Levie, Monti et B 2018

SLIDE 65

Polynomial filter (ChebNet)

! " = $% + $'"' + ⋯ + $)")

  • Number of learnable parameters * 1

Defferard et al. 2016

SLIDE 66

Polynomial filter (ChebNet)

! " = $%& + $("( + ⋯ + $*"*

  • Number of learnable parameters + 1
  • Efficient computation + ℰ ~+ / avoiding graph FT altogether

1 2 3 4 5 6 7 8 9 10 11

+ ℰ non-zeros

Defferard et al. 2016

SLIDE 67

Polynomial filter (ChebNet)

! " = $%& + $("( + ⋯ + $*"*

  • Number of learnable parameters + 1
  • Efficient computation + ℰ ~+ / avoiding graph FT altogether
  • Localization to p-hops since "* is localized to p-hops

1 2 3 4 5 6 7 8 9 10 11

Defferard et al. 2016

SLIDE 68

Polynomial filter (ChebNet)

! " = $%& + $("( + ⋯ + $*"*

  • Number of learnable parameters + 1
  • Efficient computation + ℰ ~+ / avoiding graph FT altogether
  • Localization to p-hops since "* is localized to p-hops
  • Generalization across graphs (stability under graph perturbation)
  • Can be used with other operators, e.g. 0
  • Can be used with directed graphs

Defferard et al. 2016
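The sketch referenced above, using plain monomials for readability (ChebNet itself uses the Chebyshev polynomial basis for numerical stability; the graph and coefficients are illustrative). Each extra term needs only one more sparse matrix-vector product:

```python
import numpy as np

def poly_filter(L, x, theta):
    """tau(L) x = (theta_0 I + theta_1 L + ... + theta_p L^p) x,
    computed with p matrix-vector products and no eigendecomposition."""
    y = theta[0] * x
    Lkx = x
    for t in theta[1:]:
        Lkx = L @ Lkx            # L^k x from L^{k-1} x
        y = y + t * Lkx
    return y

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
y = poly_filter(L, np.random.randn(4), theta=[0.5, -0.2, 0.1])   # p = 2
```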

SLIDE 69

Spatial graph convolution

SLIDE 70

1D grid = ring graph

adjacency matrix = shift operator $S$

(figure: ring graph with nodes $1, 2, 3, \dots, n$)

SLIDE 71

Convolution, revisited

!(#) %&'&

+

%)')

+

%*'*

= , ' = %&- + %)') + %*'*

Sandryhaila, Moura 2013

SLIDE 72

Graph Convolutional Networks (GCN)

Node-wise features


Node-wise transform.

Kipf, Welling 2016

SLIDE 73

Graph Convolutional Networks (GCN)

Node-wise features

Graph diffusion + node-wise transformation:

$\mathbf{Y} = \mathrm{ReLU}(\mathbf{A}\mathbf{X}\mathbf{W})$

Kipf, Welling 2016

SLIDE 74

Graph Convolutional Networks (GCN)

Node-wise features

Graph diffusion + node-wise transformation:

$\mathbf{Y} = \mathrm{softmax}\big(\mathbf{A}\,\mathrm{ReLU}(\mathbf{A}\mathbf{X}\mathbf{W}^{(1)})\,\mathbf{W}^{(2)}\big)$

Kipf, Welling 2016
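A NumPy sketch of this two-layer forward pass (random weights, and a symmetric random matrix standing in for the adjacency; in the paper $\mathbf{A}$ is the degree-normalized adjacency with self-loops):

```python
import numpy as np

def relu(Z):
    return np.maximum(Z, 0)

def softmax_rows(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def gcn_forward(A, X, W1, W2):
    """Two-layer GCN: Y = softmax(A ReLU(A X W1) W2)."""
    return softmax_rows(A @ relu(A @ X @ W1) @ W2)

n, d, h, c = 5, 3, 8, 2                       # nodes, input dim, hidden dim, classes
A = np.random.rand(n, n); A = (A + A.T) / 2   # stand-in for the normalized adjacency
X = np.random.randn(n, d)
Y = gcn_forward(A, X, np.random.randn(d, h), np.random.randn(h, c))
```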

SLIDE 75

Graph Attention Networks (GAT)

! "

# = % &, (, ) &

Monti et B 2017; Veličković et al. 2018

*+ = ,

  • ∈/ +

0+-1- attention score 0+- = exp ξ 1+(, 1-( ) ∑7∈/ + exp ξ 1+(, 17( )
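A sketch of this attention-weighted aggregation, with a plain dot product standing in for the learned scoring function ξ (graph and features illustrative):

```python
import numpy as np

def attention_aggregate(X, neighbors, score):
    """y_i = sum over j in N(i) of alpha_ij x_j, alpha = softmax of the scores."""
    Y = np.zeros_like(X)
    for i, nbrs in neighbors.items():
        s = np.array([score(X[i], X[j]) for j in nbrs])
        alpha = np.exp(s - s.max()); alpha /= alpha.sum()   # softmax over N(i)
        Y[i] = sum(a * X[j] for a, j in zip(alpha, nbrs))
    return Y

X = np.random.randn(4, 3)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
Y = attention_aggregate(X, neighbors, score=lambda xi, xj: xi @ xj)
```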

SLIDE 76

Message Passing Neural Network (MPNN)

! "

# = % &, (, )

Gilmer et al. 2017 (MPNN); Wang et B 2018 (EdgeConv)

General aggregation function *+ = ,

  • .∈0 +

1 2+, 2., 3+., )
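A sketch of one message-passing layer with sum aggregation and an illustrative message function ψ (the edge list and features are made up for the example):

```python
import numpy as np

def mpnn_layer(X, edges, edge_feat, psi):
    """One message-passing layer: y_i = sum over j in N(i) of psi(x_i, x_j, e_ij)."""
    Y = np.zeros_like(X)
    for (i, j), e in zip(edges, edge_feat):
        Y[i] += psi(X[i], X[j], e)
    return Y

X = np.random.randn(4, 3)
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]   # directed pairs
edge_feat = np.ones(len(edges))                            # scalar edge features
Y = mpnn_layer(X, edges, edge_feat, psi=lambda xi, xj, e: e * (xj - xi))
```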

SLIDE 77

Images:
  • Convolution
  • Local operations (window)
  • Constant number of neighbours
  • Fixed ordering of neighbours
  • Shift equivariance
  • O(n) complexity

Graphs:
  • Message passing
  • Local operations (1-hop)
  • Different number of neighbours
  • No ordering of neighbours
  • Permutation invariance
  • O(n) complexity

SLIDE 78

Pooling

SLIDE 79

Graph coarsening

SLIDE 80

Graph coarsening

(figure: a graph with adjacency matrix $\mathbf{A}^{(0)} = \mathbf{A}$ and its coarsened version $\mathbf{A}^{(1)}$)

  • Sequence of coarser graphs with adjacency matrices $\mathbf{A}^{(0)}, \mathbf{A}^{(1)}, \dots$
SLIDE 81

Graph coarsening

  • Sequence of coarser graphs with adjacency matrices $\mathbf{A}^{(0)}, \mathbf{A}^{(1)}, \dots$
  • Pooling of features on collapsed vertices: $\mathbf{X}^{(0)}, \mathbf{X}^{(1)}, \dots$
  • Interleave convolutional / pooling layers
  • Learnable pooling (see the sketch below)

Ying et al. 2018 (DiffPool)
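The sketch referenced above: DiffPool coarsens a graph with a cluster-assignment matrix $\mathbf{S}$, pooling features as $\mathbf{S}^\top\mathbf{X}$ and edges as $\mathbf{S}^\top\mathbf{A}\mathbf{S}$. In DiffPool, $\mathbf{S}$ is soft and produced by a GNN with a softmax; here a fixed hard assignment illustrates the algebra:

```python
import numpy as np

def coarsen(A, X, S):
    """Pool a graph: features X' = S^T X, adjacency A' = S^T A S."""
    return S.T @ A @ S, S.T @ X

n, m = 6, 2                                  # n nodes pooled into m clusters
A = np.random.rand(n, n) > 0.6
A = np.triu(A, 1); A = (A + A.T).astype(float)
X = np.random.randn(n, 4)
S = np.zeros((n, m)); S[:3, 0] = 1; S[3:, 1] = 1   # hard assignment matrix
A2, X2 = coarsen(A, X, S)                    # 2-node coarse graph
```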

SLIDE 82

“…we might be witnessing a new field being born.”

SLIDE 83

Plot: Pau Rodríguez López

ICLR 2020 submissions keyword statistics

∆ between 2020 and 2019 in %

SLIDE 84

“We expect the following years to bring exciting new methods and applications”

SLIDE 85

Vosoughi et al. 2018

SLIDE 86

Did 'Muslim Migrants' Attack a Catholic Church?

A video of pro-migrant protesters being removed from the Basilica of Saint-Denis in France was shared with the inflammatory and incorrect claim that it shows Muslim immigrants attacking a church.

Monti et B 2019

SLIDE 87

Acquired by Twitter in 2019

SLIDE 88

Recommender systems and link prediction

(figure: input graph → graph encoder → node embedding → graph decoder → reconstructed graph)

SLIDE 89

High-energy physics

Jet image: LHC

SLIDE 90

LHC: stop pair production

Abdughani et al. 2018

GNN architecture for event graph classification

SLIDE 91

LHC: Particle reconstruction

Ju et al. 2019

SLIDE 92

IceCube: neutrino detection

SLIDE 93

IceCube: neutrino detection

Choma et B 2018

ROC curve comparing different methods for neutrino detection
Light deposition for a high-energy single muon in the IceCube detector

SLIDE 94

LHAASO CR experiment

Jin, Chen, He 2019

ROC of classifying the light component from the background
Graph-structured LHAASO-KM2A detectors activated by a 500-TeV EAS event (red = EM detectors, blue = muon detectors)

SLIDE 95

Astrophysics: Redshift regression

Beck et al 2019

Predicted galaxy redshift from photometric observations using a MoNet-style GNN vs. ground-truth spectroscopic measurement

SLIDE 96

Neutrino detection

Choma et B 2018

ROC curve comparing different methods for neutrino detection
Light deposition for a high-energy single muon in the IceCube detector

SLIDE 97

Computational chemistry and drug design

(figure: the "computational funnel" of drug discovery, narrowing from $10^{60}$ synthesizable molecules through $10^{12}$, $10^{9}$, $10^{6}$ to $10^{3}$ candidates as the computational cost per candidate grows from $10^{-2}$ to $10^{5}$; methods along the funnel include graph NNs, DFT, the Schrödinger equation, and experiment)

Duvenaud et al. 2015; Gilmer et al. 2017; Jin et al. 2020; Stokes et al. 2020

SLIDE 98

Hyperfoods

Veselkov et B 2019

SLIDE 99

Hyperfoods

Veselkov et B 2019

SLIDE 100

Hyperfoods

Veselkov et B 2019

SLIDE 101

Hyperfoods

Veselkov et B 2019

SLIDE 102
SLIDE 103

Combinatorial drug therapy

Zitnik et al 2018

Drug-Drug Interaction Protein-Protein Interaction Drug-Protein Interaction

SLIDE 104

PD-1 PD-L1

Protein science and cancer immunotherapy

SLIDE 105
SLIDE 106

De novo protein design

Gainza et B 2020

(figure pipeline: cancer target → predicted interface with interface score → protein database search → top match based on descriptor similarity → predicted complex)

SLIDE 107

Video: FaceShift 2015

3D vision and graphics

SLIDE 108

Analysis Synthesis

Images: Faceshift

Shape analysis and synthesis

SLIDE 109

Masci et B 2015

Classical (extrinsic) convolution Geometric (intrinsic) convolution

SLIDE 110

Deformable shape correspondence

Masci et B 2015; Monti et B 2017; Litany et B 2017; Verma et al. 2017

Correspondence between 3D shapes

SLIDE 111

3D generative models

Litany et B 2018; Ranjan et al. 2018; Bouritsas et B 2019; Gong et B 2019; Gong, Basri et B 2020

3D shape calculus in latent space

SLIDE 112

3D hand reconstruction from 2D images

Kulon et B 2019; Kulon et B 2020

Mixed 2D/3D encoder/decoder model

SLIDE 113

3D hand reconstruction from 2D images

Kulon et B 2019; Kulon et B 2020

Examples of reconstructed 3D hands in the wild

SLIDE 114

Video: Ariel AI 2020

SLIDE 115

Conclusions

  • Graphs are very general abstractions, useful everywhere
  • Deep learning models capable of accounting for graph structure ("relational inductive bias")
  • Cross-disciplinary research
  • Very hot field
  • Several success stories
  • Some first industrial applications
SLIDE 116

Open problems

  • Theoretical guarantees
  • Standardized benchmarks
  • Efficient computation and scaling to web-size graphs
  • Higher order structures (e.g. motifs)
  • Dynamic graphs
  • Graph generating models
  • Incorporating problem-specific knowledge
SLIDE 117