MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing (PowerPoint PPT Presentation)



SLIDE 1

Abu-El-Haija et al, MixHop, ICML’19

Poster #88

Code: http://github.com/samihaija/mixhop
Slides: http://sami.haija.org/icml19

MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing

Sami Abu-El-Haija1, Bryan Perozzi2, Amol Kapoor2, Nazanin Alipourfard1, Kristina Lerman1, Hrayr Harutyunyan1, Greg Ver Steeg1, Aram Galstyan1

SLIDE 2

Agenda

  • Review Graph Convolutional Networks (GCN)
    ○ Application: Semi-Supervised Node Classification (SSNC)
    ○ Shortcoming of GCN
  • MixHop: Higher-Order GCN
    ○ Sparsification
  • MixHop Results on SSNC
SLIDE 4

Graph Convolutional Network (GCN) [1]

[1] Kipf & Welling, ICLR 2017

SLIDE 5

Graph Convolutional Network (GCN) [1]

[Figure: graph with nodes x1–x6]

SLIDE 6

Graph Convolutional Network (GCN) [1]

[Figure: graph with input features x1–x6]

SLIDE 7

Graph Convolutional Network (GCN) [1]

[Figure: GC Layer 1 applied to input features x1–x6]

SLIDE 8

Graph Convolutional Network (GCN) [1]

[Figure: GC Layer 1 maps input features x1–x6 to latent features h1(1)–h6(1)]

SLIDE 9

Graph Convolutional Network (GCN) [1]

[Figure: GC Layer 1 maps input features x1–x6 to latent features h1(1)–h6(1)]

SLIDE 10

Graph Convolutional Network (GCN) [1]

[Figure: stacked network; GC Layer 1 maps input features x1–x6 to latent features h1(1)–h6(1), and GC Layer L produces output features h1(L)–h6(L)]
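The propagation rule behind these slides can be sketched in a few lines of NumPy (a minimal sketch; the 6-node toy graph, its edges, and the random features below are illustrative stand-ins for the x1–x6 example, not data from the paper):

```python
import numpy as np

def normalized_adjacency(A):
    """Renormalized adjacency of Kipf & Welling: D^-1/2 (A + I) D^-1/2."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gc_layer(A_hat, H, W):
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)

# Toy 6-node graph standing in for the slides' x1..x6 example (edges made up).
rng = np.random.default_rng(0)
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A_hat = normalized_adjacency(A)

X = rng.normal(size=(6, 4))    # input features x1..x6
W1 = rng.normal(size=(4, 8))   # GC Layer 1 weights
H1 = gc_layer(A_hat, X, W1)    # latent features h1(1)..h6(1), shape (6, 8)
```

Stacking `gc_layer` calls with fresh weight matrices gives the multi-layer network of the following slides.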

SLIDE 11

Graph Convolutional Network (GCN) [1]

[Figure: output features h1(L)–h6(L), with labels y2 and y4 shown for the labeled nodes]

Train on semi-supervised node classification:

  • Measure loss on labeled nodes (y4, y2)
SLIDE 12

Graph Convolutional Network (GCN) [1]

[Figure: output features h1(L)–h6(L), with labels y2 and y4 shown for the labeled nodes]

Train on semi-supervised node classification:

  • Measure loss on labeled nodes (y4, y2)
  • Backprop to learn the GC layers.
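The semi-supervised part is just a mask over the loss: only labeled nodes contribute. A sketch (the helper name `masked_cross_entropy`, the toy logits, and the choice of labeled nodes are illustrative assumptions, not the paper's code):

```python
import numpy as np

def masked_cross_entropy(logits, labels, labeled_idx):
    """Semi-supervised loss: only the labeled nodes (e.g. y2, y4) contribute."""
    z = logits[labeled_idx]
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labeled_idx)), labels].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 3))     # output features for 6 nodes, 3 classes
labeled_idx = [1, 3]                 # nodes 2 and 4 carry labels
labels = np.array([0, 2])            # their class labels y2, y4
loss = masked_cross_entropy(logits, labels=labels, labeled_idx=labeled_idx)
```

Gradients of this scalar with respect to all layer weights drive the SGD updates shown on the next slide.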


SLIDE 13

Graph Convolutional Network (GCN) [1]

[Figure: losses at y2 and y4 backpropagate via SGD to update GC Layer 1 through GC Layer L]

SLIDE 14

Graph Convolutional Network (GCN) [1]

[Figure: the same network, with a question mark over what a GC layer computes]

SLIDE 15

Graph Convolutional Network (GCN) [1]

[Figure: GC Layer 1 maps input features x1–x6 to latent features h1(1)–h6(1)]

SLIDE 16

Graph Convolutional Network (GCN) [1]

[Figure: GC Layer 1 decomposed: h3(1) is computed by averaging x3 with its neighbors' features (Avg), then applying a shared fully connected layer (fc)]
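The "Avg" box is easy to verify numerically: with a row-normalized adjacency plus self-loops, row i of the propagated features is exactly the mean over node i's closed neighborhood (a sketch on the same toy graph; GCN itself uses the symmetric D^-1/2 normalization, which is a weighted rather than plain average):

```python
import numpy as np

# Row-normalized adjacency with self-loops: row i of P @ X is the mean of
# node i's own features and its neighbors' features (the "Avg" box).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

A_tilde = A + np.eye(6)
P = A_tilde / A_tilde.sum(axis=1, keepdims=True)

node = 2  # x3 in the slides' 1-based labels
closed_nbhd = [j for j in range(6) if A_tilde[node, j] == 1]
assert np.allclose((P @ X)[node], X[closed_nbhd].mean(axis=0))
```

The "fc" box is then just the shared weight multiplication `@ W` applied after this averaging.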

SLIDE 17

Graph Convolutional Network (GCN) [1]

[Figure: the Avg + fc decomposition of GC Layer 1, shown both as a tensor computation and on the graph]


SLIDE 22

Shortcoming of Vanilla GCN

[Figure: the vanilla GC layer]

SLIDE 23

Shortcoming of Vanilla GCN

😁 fc is shared ⇒ inductive

[Figure: the vanilla GC layer]

SLIDE 24

Shortcoming of Vanilla GCN

😁 fc is shared ⇒ inductive
😣 The appendix experiments of [1] show no gains beyond 2 layers

[Figure: the vanilla GC layer]

SLIDE 25

Shortcoming of Vanilla GCN

😁 fc is shared ⇒ inductive
😣 The appendix experiments of [1] show no gains beyond 2 layers
😣 Cannot mix neighbors from various distances in arbitrary linear combinations

[Figure: the vanilla GC layer]

SLIDE 26

Shortcoming of Vanilla GCN

😁 fc is shared ⇒ inductive
😣 The appendix experiments of [1] show no gains beyond 2 layers
😣 Cannot mix neighbors from various distances in arbitrary linear combinations, e.g. cannot learn Gabor filters!

[Figure: the vanilla GC layer]
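One concrete feature a single vanilla GC layer cannot produce is a difference between hop distances (a Gabor-like sharpening filter on the graph), since the layer only ever sees one application of the adjacency. A sketch on the toy graph (the graph and features are illustrative):

```python
import numpy as np

# A Gabor-like "delta" feature: the difference between one-hop and two-hop
# propagated features. A single vanilla GC layer only computes A_hat @ H @ W,
# so it cannot express this contrast; it needs access to A_hat^2 as well.
rng = np.random.default_rng(0)
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A_tilde = A + np.eye(6)
d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

X = rng.normal(size=(6, 4))
delta = A_hat @ X - A_hat @ A_hat @ X  # contrast of 1-hop vs. 2-hop information
```

MixHop's layer, introduced next, exposes exactly these higher powers so such contrasts become learnable.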

SLIDE 27

Shortcoming of Vanilla GCN

😁 fc is shared ⇒ inductive
😣 The appendix experiments of [1] show no gains beyond 2 layers
😣 Cannot mix neighbors from various distances in arbitrary linear combinations, e.g. cannot learn Gabor filters!

[Figure: the vanilla GC layer, marked with a question mark]

SLIDE 28

Detour: Review Gabor Filters

[2] Daugman, Vision Research, 1980
[3] Daugman, Journal of the Optical Society of America, 1985
[4] Honglak Lee et al, ICML, 2009
[5] Alex Krizhevsky et al, NeurIPS, 2012

Neuroscientists discover their importance in the primate visual cortex [2, 3]:

SLIDE 29

Detour: Review Gabor Filters

Neuroscientists discover their importance in the primate visual cortex [2, 3]. Further, they are automatically recovered by training CNNs on images [4, 5].

SLIDE 30

Main Motivation

SLIDE 31

Main Motivation

Extend the class of representations realizable by GCNs, e.g. to learn Gabor filters.

SLIDE 32

Agenda

  • Review Graph Convolutional Networks (GCN)
    ○ Application: Semi-Supervised Node Classification (SSNC)
    ○ Shortcoming of GCN
  • MixHop: Higher-Order GCN
    ○ Sparsification
  • MixHop Results on SSNC
SLIDE 33

Our Model: MixHop

[Figure: the vanilla GC layer next to the MixHop GC layer]

SLIDE 34

Our Model: MixHop

[Figure: the vanilla GC layer next to the MixHop GC layer]

A couple of code lines implement the concatenation.

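Those few lines look roughly like this (a NumPy sketch of the layer's structure, not the released TensorFlow code; the `weights` dict and channel sizes are illustrative):

```python
import numpy as np

def mixhop_layer(A_hat, H, weights):
    """MixHop GC layer sketch: concatenate ReLU(A_hat^j @ H @ W_j) over powers j.

    `weights` maps each adjacency power j to its own weight matrix W_j;
    power 0 keeps the node's own features, untouched by propagation.
    """
    outputs = []
    propagated = H
    prev_power = 0
    for j, W in sorted(weights.items()):
        for _ in range(j - prev_power):
            propagated = A_hat @ propagated  # reuse lower powers incrementally
        prev_power = j
        outputs.append(propagated @ W)
    return np.maximum(np.concatenate(outputs, axis=1), 0.0)

# usage: powers {0, 1, 2} with 8 output channels each -> 24 channels total
rng = np.random.default_rng(0)
A_hat = np.eye(6)              # stand-in; use the normalized adjacency here
H = rng.normal(size=(6, 4))
weights = {j: rng.normal(size=(4, 8)) for j in (0, 1, 2)}
out = mixhop_layer(A_hat, H, weights)  # shape (6, 24)
```

Because each power gets its own weight block before the concatenation, a subsequent layer can combine hops in arbitrary linear combinations, including the delta-style differences a vanilla layer cannot express.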

SLIDE 36

Our Model: MixHop

[Figure: the MixHop GC layer]

SLIDE 37

Our Model: MixHop

[Figure: the MixHop GC layer]

😁 Inductive

SLIDE 38

Our Model: MixHop

[Figure: the MixHop GC layer]

😁 Inductive
😁 Can incorporate distant nodes

SLIDE 39

Our Model: MixHop

[Figure: the MixHop GC layer]

😁 Inductive
😁 Can incorporate distant nodes
😁 Can mix neighbors across distances in arbitrary linear combinations

SLIDE 40

Our Model: MixHop

[Figure: the MixHop GC layer]

😁 Inductive
😁 Can incorporate distant nodes
😁 Can mix neighbors across distances in arbitrary linear combinations, i.e. can learn Gabor filters!

SLIDE 41

Sparsification

We add group-lasso (L2) regularization to drop out columns of the feature matrices, similar to [6].

[6] Gordon et al, CVPR, 2018


SLIDE 42

Sparsification

We add group-lasso (L2) regularization to drop out columns of the feature matrices, similar to [6]. The 2nd layer on Cora drops out the zeroth power completely.

[Figure: learned weight matrices after sparsification]
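The regularizer itself is a sum of L2 norms over column groups, one group per adjacency power's block in the concatenated representation (a sketch; the helper name `group_lasso_penalty`, the grouping by consecutive rows, and the strength are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def group_lasso_penalty(W, group_size, strength=1e-3):
    """Group-lasso sketch: one L2 group per block of `group_size` input rows.

    Each block corresponds to one adjacency power's channels in the
    concatenated MixHop output; the penalty pushes whole blocks toward zero,
    effectively pruning that power (as with the zeroth power on Cora).
    """
    n_groups = W.shape[0] // group_size
    blocks = W.reshape(n_groups, group_size * W.shape[1])
    return strength * np.sqrt((blocks ** 2).sum(axis=1)).sum()

# usage: next-layer weights acting on a 24-dim input made of three 8-dim blocks
rng = np.random.default_rng(0)
W = rng.normal(size=(24, 5))
penalty = group_lasso_penalty(W, group_size=8)
```

Adding this penalty to the training loss makes entire blocks shrink together, which is what lets whole powers be dropped rather than individual weights.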

SLIDE 43

Agenda

  • Review Graph Convolutional Networks (GCN)
    ○ Application: Semi-Supervised Node Classification (SSNC)
    ○ Shortcoming of GCN
  • MixHop: Higher-Order GCN
    ○ Sparsification
  • MixHop Results on SSNC
SLIDE 44

Results on Citation Datasets

SLIDE 45

Results on (Synthetic) Homophily Datasets

With less homophily, our performance gap increases

SLIDE 46

Results on (Synthetic) Homophily Datasets

With less homophily, our performance gap increases. With less homophily, our method also learns more feature differences (i.e. Gabor-like filters).

SLIDE 47

References

[1] Kipf & Welling, "Semi-Supervised Classification with Graph Convolutional Networks", ICLR, 2017
[2] Daugman, "Two-dimensional spectral analysis of cortical receptive field profiles", Vision Research, 1980
[3] Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters", Journal of the Optical Society of America, 1985
[4] Honglak Lee et al, "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations", ICML, 2009
[5] Alex Krizhevsky et al, "ImageNet Classification with Deep Convolutional Neural Networks", NeurIPS, 2012
[6] Gordon et al, "MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks", CVPR, 2018

SLIDE 48

Conclusion

  • With just a couple of lines, Kipf's model can be extended to incorporate powers of the (normalized) adjacency matrix
  • Allowing it to learn general neighborhood mixing, and its special cases: Gabor-like filters and delta ops
  • Inspection shows delta ops are indeed learned with lower levels of homophily.

Slides at: http://sami.haija.org/icml19

Thank you for listening! Poster #88