Simplifying Graph Convolutional Networks Amauri Holanda Felix Wu* - - PowerPoint PPT Presentation

simplifying graph convolutional networks
SMART_READER_LITE
LIVE PREVIEW

Simplifying Graph Convolutional Networks Amauri Holanda Felix Wu* - - PowerPoint PPT Presentation

Cornell University Simplifying Graph Convolutional Networks Amauri Holanda Felix Wu* Tianyi Zhang* Christopher Fifty Tao Yu Kilian Q. Weinberger de Souza Jnior* *: Equal contribution Cornell University Graph Convolutional Networks


slide-1
SLIDE 1

Cornell University

Simplifying Graph Convolutional Networks

Christopher Fifty Amauri Holanda de Souza Júnior* Kilian Q. Weinberger Tianyi Zhang* Felix Wu* *: Equal contribution Tao Yu

slide-2
SLIDE 2

Cornell University

Graph Convolutional Networks Applications

  • Social Networks & Citation Networks
  • Applied Chemistry
  • Natural Language Processing
  • Computer Vision

Image source: https://tkipf.github.io/graph-convolutional-networks/images/karate.png

(Kipf and Welling, ICLR 2017)

slide-3
SLIDE 3

Cornell University

Node Classification

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

Input Graph

slide-4
SLIDE 4

Cornell University

Node Classification

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

Input Graph Features: X = [x1, …, xn]⊤ x1 x2 x3 x4 x5 x6 x7

slide-5
SLIDE 5

Cornell University

Node Classification

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

Input Graph Features: X = [x1, …, xn]⊤ x1 x2 x3 x4 x5 x6 x7 (Partially labelled)

slide-6
SLIDE 6

Cornell University

Node Classification

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

Input Graph Features: X = [x1, …, xn]⊤ x1 x2 x3 x4 x5 x6 x7 (Partially labelled)

slide-7
SLIDE 7

Cornell University Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(0) = X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Graph Convolutional Network (GCN)

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

slide-8
SLIDE 8

Cornell University Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(0) = X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Graph Convolutional Network (GCN)

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

¯ H(k) ← SH(k−1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Feature Propagation

slide-9
SLIDE 9

Cornell University Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(0) = X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Graph Convolutional Network (GCN)

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

¯ H(k) ← SH(k−1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Feature Propagation

Linear Transformation

¯ H(k) ← ¯ H(k)Θ(k)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>
slide-10
SLIDE 10

Cornell University Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(0) = X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Graph Convolutional Network (GCN)

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

¯ H(k) ← SH(k−1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Feature Propagation

Linear Transformation ¯ H(k) ← ¯ H(k)Θ(k)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Nonlinearity

H(k) ← ReLU( ¯ H(k))

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>
slide-11
SLIDE 11

Cornell University Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(0) = X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Graph Convolutional Network (GCN)

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

¯ H(k) ← SH(k−1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Feature Propagation

Linear Transformation ¯ H(k) ← ¯ H(k)Θ(k)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Nonlinearity

H(k) ← ReLU( ¯ H(k))

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

× (K − 1)

slide-12
SLIDE 12

Cornell University

Graph Convolutional Network (GCN)

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

Nonlinearity Linear Transformation

GCN

¯ H(k) ← SH(k−1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

¯ H(k) ← ¯ H(k)Θ(k)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Predictions

ˆ YGCN = softmax(SH(K−1)Θ(K))

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

×(K − 1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Feature Propagation

Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(0) = X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(k) ← ReLU( ¯ H(k))

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>
slide-13
SLIDE 13

Cornell University

Graph Convolutional Network (GCN)

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

Nonlinearity Linear Transformation

GCN

¯ H(k) ← SH(k−1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

¯ H(k) ← ¯ H(k)Θ(k)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Predictions

ˆ YGCN = softmax(SH(K−1)Θ(K))

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

×(K − 1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Feature Propagation

Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(0) = X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(k) ← ReLU( ¯ H(k))

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Is the nonlinearity necessary?

slide-14
SLIDE 14

Cornell University

Graph Convolutional Network (GCN)

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

Nonlinearity Linear Transformation

GCN

¯ H(k) ← SH(k−1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

¯ H(k) ← ¯ H(k)Θ(k)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Predictions

ˆ YGCN = softmax(SH(K−1)Θ(K))

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

×(K − 1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Feature Propagation

Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(0) = X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(k) ← ReLU( ¯ H(k))

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Is the nonlinearity necessary?

̂ Y = softmax (S…SSXΘ(1)Θ(2)…Θ(K))

slide-15
SLIDE 15

Cornell University

Graph Convolutional Network (GCN)

  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

Nonlinearity Linear Transformation

GCN

¯ H(k) ← SH(k−1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

¯ H(k) ← ¯ H(k)Θ(k)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Predictions

ˆ YGCN = softmax(SH(K−1)Θ(K))

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

×(K − 1)

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Feature Propagation

Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(0) = X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

H(k) ← ReLU( ¯ H(k))

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Is the nonlinearity necessary?

̂ Y = softmax (S…SSXΘ(1)Θ(2)…Θ(K))

slide-16
SLIDE 16

Cornell University

Simple Graph Convolution (SGC)

Logistic Regression

ˆ YSGC = softmax ¯ XΘ

  • <latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Predictions

K-step Feature Propagation ¯ X ← SKX

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

SGC

Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>
  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

slide-17
SLIDE 17

Cornell University

Simple Graph Convolution (SGC)

Logistic Regression

ˆ YSGC = softmax ¯ XΘ

  • <latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

Predictions

K-step Feature Propagation ¯ X ← SKX

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

SGC

Input Graph

x1

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x2

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x3

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x4

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x5

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x6

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

x7

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>

X = [x1, . . . , xn]>

<latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit><latexit sha1_base64="(nul)">(nul)</latexit>
  • 1

+1 Feature Value: Class +1: Class -1: Feature Vector:

1.Interpretability 2.Mini-batch training on large graphs 3.Second-order opimization methods

slide-18
SLIDE 18

Cornell University

Performance vs. Training Time on Reddit Dataset

100 101 102 Relative Training Time 92 93 94 95 96 97 Test F1 (%) SGC 1x SAGE-mean 29x SAGE-GCN 32x FastGCN 100x SAGE-LSTM 180x DGI GaAN

Reddit

(Hamilton et al., NeurIPS 2017) Reddit dataset contains 233K nodes and 11.6M edges Within 2.7 seconds.

slide-19
SLIDE 19

Cornell University

Comparable or Better Results

  • SGC performs on par with or better than GCN across 5 tasks including 14 datasets.
  • Graph classification on citation and social networks: Cora, Citeseer, Pubmed, and Reddit datasets
  • Text classification: 20NG, R8, R52, Ohsumed, MR
  • Semi-supervised user geolocation: GEOTEXT, TWITTER-US, TWITTER-WORLD
  • Relation extraction: TACRED
  • Zero-shot image classification: ImageNet
slide-20
SLIDE 20

Cornell University

Comparable or Better Results

  • SGC performs on par with or better than GCN across 5 tasks including 14 datasets.
  • Graph classification on citation and social networks: Cora, Citeseer, Pubmed, and Reddit datasets
  • Text classification: 20NG, R8, R52, Ohsumed, MR
  • Semi-supervised user geolocation: GEOTEXT, TWITTER-US, TWITTER-WORLD
  • Relation extraction: TACRED
  • Zero-shot image classification: ImageNet

But not on graph classification task! (Nonlinearity is still beneficial on this task)

slide-21
SLIDE 21

Cornell University

Thank You!

  • A linear model (SGC) is sufficient on many graph tasks.
  • Our official code is available at: https://github.com/Tiiiger/SGC
  • We thank Deep Graph Library, PyTorch Geometric, Spektral, and

StellarGraph for including SGC in their library.

  • Please feel free to come by our poster section if you have questions:

Tonight 06:30~09:00 PM @ Pacific Ballroom #267