

SLIDE 1

CSE 6240: Web Search and Text Mining. Spring 2020

Graph Neural Networks

  • Prof. Srijan Kumar

http://cc.gatech.edu/~srijan

SLIDE 2

Today’s Lecture

  • Introduction to deep graph embeddings
  • Graph convolution networks
  • GraphSAGE

SLIDE 3

Goal: Node Embeddings

similarity(u, v) ≈ z_v^T z_u

Goal: we need to define the similarity function!

[Figure: nodes of the input network mapped into a d-dimensional embedding space]

SLIDE 4

Deep Graph Encoders

  • Encoder: Map a node to a low-dimensional vector:

    enc(v) = z_v

  • Deep encoder methods are based on graph neural networks:

    enc(v) = multiple layers of non-linear transformations of the graph structure

  • The graph encoder idea is inspired by CNNs on images

(Animation: Vincent Dumoulin)

[Figure: convolution on a regular image grid vs. message passing on an irregular graph]

SLIDE 5

Idea from Convolutional Networks

  • In a CNN, a pixel's representation is created by transforming the representations of its neighboring pixels
    – In a GNN, a node's representation is created by transforming the representations of its neighboring nodes
  • But graphs are irregular, unlike images
    – So, generalize convolutions beyond simple lattices, and leverage node features/attributes
  • Solution: deep graph encoders

SLIDE 6

Deep Graph Encoders

Output: node embeddings; the same machinery can also embed larger network structures, subgraphs, and entire graphs

  • Once an encoder is defined, multiple layers of encoders can be stacked

SLIDE 7

Graph Encoder: A Naïve Approach

  • Join the adjacency matrix and the node features
  • Feed them into a deep neural network
  • Issues with this idea (made concrete in the sketch below):
    – O(|V|) parameters
    – Not applicable to graphs of different sizes
    – Not invariant to node ordering

[Figure: 5-node graph (A–E), its adjacency matrix concatenated with node features, fed into a neural network]
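A toy sketch of the naïve encoder (my own setup, not code from the slides) that makes the last two issues concrete: the input width is tied to |V|, and permuting the node order changes the output.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0, 1, 1, 1, 0],          # adjacency matrix of a 5-node graph
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 1],
              [1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0]])
X = rng.normal(size=(5, 2))             # 2 features per node

inp = np.hstack([A, X])                  # shape (5, 5 + 2): width depends on |V|!
W = rng.normal(size=(inp.shape[1], 4))   # O(|V|) parameters per hidden unit
H = np.maximum(0, inp @ W)               # one ReLU MLP layer

# Problem 1: a 6-node graph yields rows of length 6 + 2, so W no longer fits.
# Problem 2: permuting the node order permutes the columns of A, changing the
# output even though the graph itself is unchanged.
perm = [1, 0, 2, 3, 4]
A_perm = A[perm][:, perm]
inp_perm = np.hstack([A_perm, X[perm]])
H_perm = np.maximum(0, inp_perm @ W)
print(np.allclose(H[perm], H_perm))      # False: not invariant to node ordering
```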

SLIDE 8

Graph Encoders: Two Instantiations

  • 1. Graph convolutional networks (GCN): one of the first frameworks to learn node embeddings in an end-to-end manner
    – Different from random walk methods, which are not end-to-end
  • 2. GraphSAGE: generalizes GCNs to various neighborhood aggregations

SLIDE 9

Today’s Lecture

  • Introduction to deep graph embeddings
  • Graph convolution networks (GCN)
  • GraphSAGE

Main paper: “Semi-Supervised Classification with Graph Convolutional Networks”, Kipf and Welling, ICLR 2017

SLIDE 10

Content

  • Local network neighborhoods:
    – Describe aggregation strategies
    – Define computation graphs
  • Stacking multiple layers:
    – Describe the model, parameters, and training
    – How to fit the model?
    – Simple examples of unsupervised and supervised training

SLIDE 11

Setup

  • Assume we have a graph G:
    – V is the vertex set
    – A is the adjacency matrix (assume binary)
    – X ∈ ℝ^{m×|V|} is a matrix of node features
      » Social networks: user profiles, user images
      » Biological networks: gene expression profiles
      » If there are no features, use indicator vectors (one-hot encoding of a node) or a vector of constant 1s: [1, 1, …, 1]
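A minimal sketch of this setup (variable names are my own; rows index nodes here, whereas the slide writes X as m×|V| with nodes as columns):

```python
import numpy as np

# Binary adjacency matrix of a toy 3-node path graph (|V| = 3).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# Node features: one row per node. If no real features exist, fall back to
# indicator vectors (one-hot) or a constant-1 feature, as the slide suggests.
X = np.eye(A.shape[0])                  # indicator vectors (one-hot per node)
# X = np.ones((A.shape[0], 1))          # alternative: vector of constant 1s
```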

SLIDE 12

Graph Convolutional Networks

  • Idea: Generate node embeddings based on local network neighborhoods
    – A node's neighborhood defines its computation graph
  • Learn how to aggregate information from the neighborhood to learn node embeddings
    – Transform information from the neighbors and combine it:
      » Transform "messages" h_i from the neighbors: W_i h_i
      » Combine them, e.g., add them up: Σ_i W_i h_i

SLIDE 13

Idea: Aggregate Neighbors

  • Intuition: Generate node embeddings based on local network neighborhoods
  • Nodes aggregate information from their neighbors using neural networks

[Figure: input graph with target node A; A's computation graph aggregates messages from its neighbors B, C, D via neural network boxes, which in turn aggregate from their own neighbors]

SLIDE 14

Idea: Aggregate Neighbors

  • Intuition: The network neighborhood defines a computation graph

Every node defines a computation graph based on its neighborhood

SLIDE 15

Deep Model: Many Layers

  • The model can be of arbitrary depth:
    – Nodes have embeddings at each layer
    – The layer-0 embedding of node v is its input feature, x_v
    – The layer-K embedding gets information from nodes that are at most K hops away

[Figure: A's computation graph annotated by layer: layer-0 inputs x_A, x_B, x_C, x_E, x_F feed layer-1 aggregations, which feed the layer-2 output for A]

SLIDE 16

Neighborhood Aggregation

  • Neighborhood aggregation: the key distinction between approaches is how they aggregate information across the layers

[Figure: A's computation graph with the aggregation boxes left blank: what is in the box?]

SLIDE 17

Neighborhood Aggregation

  • Basic approach: Average information from the neighbors and apply a neural network

[Figure: in each box of A's computation graph, (1) average messages from the neighbors, then (2) apply a neural network]

SLIDE 18

The Math: Deep Encoder

  • Basic approach: Average neighbor messages and apply a neural network
    – Note: Apply L2 normalization to each node embedding at every layer

    h_v^0 = x_v                     (initial 0-th layer embeddings are the node features)

    h_v^k = σ( W_k Σ_{u∈N(v)} h_u^{k-1} / |N(v)| + B_k h_v^{k-1} ),   ∀k ∈ {1, …, K}

    z_v = h_v^K                     (embedding after K layers of neighborhood aggregation)

  Here σ is a non-linearity (e.g., ReLU), the sum is the average of the neighbors' previous-layer embeddings, and B_k h_v^{k-1} carries over v's own previous-layer embedding.
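A minimal NumPy sketch of this update (shapes, names, and the random toy graph are my own assumptions, not code from the lecture):

```python
import numpy as np

def gcn_layer(H_prev, A, W, B):
    """One layer of the update above.

    H_prev: (n, d_in) previous-layer embeddings h^{k-1}
    A:      (n, n) binary adjacency matrix
    W, B:   (d_in, d_out) trainable weights for the neighbor and self terms
    """
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)   # |N(v)| per node
    neigh_mean = (A @ H_prev) / deg                     # average neighbor h^{k-1}
    H = np.maximum(0, neigh_mean @ W + H_prev @ B)      # sigma = ReLU
    # L2-normalize each node embedding at every layer, as the slide notes
    return H / np.maximum(np.linalg.norm(H, axis=1, keepdims=True), 1e-12)

rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T           # random undirected toy graph
H = rng.normal(size=(6, 3))              # h^0 = node features x_v
for _ in range(2):                       # K = 2 layers (weights would be learned)
    H = gcn_layer(H, A, rng.normal(size=(3, 3)), rng.normal(size=(3, 3)))
z = H                                    # z_v = h_v^K
```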

SLIDE 19

GCN: Matrix Form

  • H^(l) is the matrix of node representations at the l-th layer
  • W_0^(l) and W_1^(l) are the matrices to be learned at each layer
  • A = adjacency matrix, D = diagonal degree matrix
  • The GCN update rewritten in matrix form (D^{-1}A computes the neighbor average, matching the per-node update above):

    H^(l+1) = σ( D^{-1} A H^(l) W_0^(l) + H^(l) W_1^(l) )
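The same update vectorized, as a sketch (my reconstruction; up to naming, W_0 plays the role of W_k and W_1 the role of B_k from the per-node form):

```python
import numpy as np

def gcn_layer_matrix(H, A, W0, W1):
    """H: (n, d) layer-l embeddings; returns the layer-(l+1) embeddings."""
    D_inv = np.diag(1.0 / np.maximum(A.sum(axis=1), 1))  # inverse degree matrix
    return np.maximum(0, D_inv @ A @ H @ W0 + H @ W1)    # sigma = ReLU
```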

SLIDE 20

Training the Model

  • How do we train the model?

– Need to define a loss function on the embeddings

SLIDE 21

Model Parameters

  • We can feed these embeddings into any loss function and run stochastic gradient descent to train the weight parameters
    – Once we have the weight matrices, we can calculate the node embeddings

    h_v^0 = x_v

    h_v^k = σ( W_k Σ_{u∈N(v)} h_u^{k-1} / |N(v)| + B_k h_v^{k-1} ),   ∀k ∈ {1, …, K}

    z_v = h_v^K

  W_k and B_k are the trainable weight matrices (i.e., what we learn).

SLIDE 22

Unsupervised Training

  • Training can be unsupervised or supervised
  • Unsupervised training:
    – Use only the graph structure: "similar" nodes have similar embeddings
    – A common unsupervised loss function is edge existence (sketched below)
  • The unsupervised loss function can be anything from the last section, e.g., a loss based on:
    – Node proximity in the graph
    – Random walks
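A hedged sketch of an edge-existence loss (my formulation, one of several reasonable choices): score node pairs by the dot product z_u^T z_v, and apply binary cross-entropy with observed edges as positives and sampled non-edges as negatives.

```python
import numpy as np

def edge_existence_loss(Z, pos_pairs, neg_pairs):
    """Z: (n, d) node embeddings; pos/neg_pairs: lists of (u, v) index pairs."""
    def bce(u, v, label):
        p = 1.0 / (1.0 + np.exp(-(Z[u] @ Z[v])))   # sigmoid of dot-product score
        p = np.clip(p, 1e-9, 1.0 - 1e-9)
        return -(label * np.log(p) + (1 - label) * np.log(1 - p))
    total = sum(bce(u, v, 1.0) for u, v in pos_pairs)   # edges should score high
    total += sum(bce(u, v, 0.0) for u, v in neg_pairs)  # non-edges should score low
    return total / (len(pos_pairs) + len(neg_pairs))
```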

SLIDE 23

Supervised Training

  • Train the model for a supervised task (e.g., node classification)
  • Two ways to define the total loss:
    – Total loss = supervised loss
    – Total loss = supervised loss + unsupervised loss

E.g., is a node normal or anomalous?
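A sketch of the supervised route (assumed setup: a linear classifier on top of the embeddings, with cross-entropy over labeled nodes, e.g., normal vs. anomalous):

```python
import numpy as np

def supervised_loss(Z, labels, W_cls):
    """Z: (n, d) embeddings of labeled nodes; labels: (n,) int class ids;
    W_cls: (d, n_classes) classifier weights trained jointly with the GNN."""
    logits = Z @ W_cls
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

# The second option on the slide would combine both objectives:
# total_loss = supervised_loss(...) + edge_existence_loss(...)
```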

SLIDE 24

Model Design: Overview

(1) Define a neighborhood aggregation function
(2) Define a loss function on the embeddings

SLIDE 25

Model Design: Overview

(3) Train on a set of nodes, i.e., a batch of compute graphs

SLIDE 26

Model Design: Overview

(4) Generate embeddings for nodes as needed, even for nodes we never trained on!

SLIDE 27

GCN: Inductive Capability

  • The same aggregation parameters are shared across all nodes:
    – The number of model parameters is sublinear in |V|, and we can generalize to unseen nodes

[Figure: the compute graphs for nodes A and B in the input graph share the same parameters W_k and B_k]

SLIDE 28

Inductive Capability: New Nodes

[Figure: train on a snapshot of the graph; when a new node u arrives, generate its embedding z_u on the fly]

  • Many application settings constantly encounter previously unseen nodes
    – E.g., Reddit, YouTube, Google Scholar
  • Need to generate new embeddings "on the fly" (see the sketch below)
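A one-layer sketch of this (my own wiring, reusing the shapes from the earlier gcn_layer sketch): once W and B are trained, embedding a new node needs only its features and its neighbors' features, with no retraining.

```python
import numpy as np

def embed_new_node(x_new, neighbor_feats, W, B):
    """x_new: (d_in,) features of the new node; neighbor_feats: (m, d_in)
    features of its neighbors; W, B: trained (d_in, d_out) weights."""
    neigh_mean = neighbor_feats.mean(axis=0)          # average the neighbors' h^0
    h = np.maximum(0, W.T @ neigh_mean + B.T @ x_new)
    return h / np.maximum(np.linalg.norm(h), 1e-12)   # z_u for the new node
```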

SLIDE 29

Inductive Capability: New Graphs

  • Inductive node embeddings generalize to entirely unseen graphs
    – E.g., train on a protein interaction graph from model organism A and generate embeddings on newly collected data about organism B

[Figure: train on one graph, then generalize to a new graph and produce embeddings z_u for its nodes]

SLIDE 30

Summary So Far

  • Recap: Generate node embeddings by aggregating neighborhood information
    – We saw a basic variant of this idea
    – The key distinction between approaches is how they aggregate information across the layers
  • Next: the GraphSAGE graph neural network architecture

SLIDE 31

Today’s Lecture

  • Introduction to deep graph embeddings
  • Graph convolution networks
  • GraphSAGE
  • Main paper: "Inductive Representation Learning on Large Graphs", William L. Hamilton, Rex Ying, Jure Leskovec. NeurIPS 2017.

SLIDE 32

GraphSAGE Idea

  • In GCN, we aggregated the neighbors' messages as the (weighted) average of all neighbors. How can we generalize this?

[Figure: A's computation graph with the aggregation boxes left open]

[Hamilton et al., NIPS 2017]

SLIDE 33

GraphSAGE Idea

[Figure: input graph and target node A's computation graph]

    h_v^k = σ( [ W_k · AGG({h_u^{k-1}, ∀u ∈ N(v)}),  B_k h_v^{k-1} ] )

AGG can be any differentiable function that maps the set of vectors in N(v) to a single vector.

SLIDE 34

Neighborhood Aggregation

  • Simple neighborhood aggregation (GCN):

    h_v^k = σ( W_k Σ_{u∈N(v)} h_u^{k-1} / |N(v)| + B_k h_v^{k-1} )

  • GraphSAGE: generalized aggregation; concatenate the aggregated neighbor embedding with the self embedding:

    h_v^k = σ( [ W_k · AGG({h_u^{k-1}, ∀u ∈ N(v)}),  B_k h_v^{k-1} ] )
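A minimal sketch of the GraphSAGE update (my own shapes; assumes every node has at least one neighbor), with the aggregator passed in as a function so it can be swapped, which is exactly the generalization the slide describes:

```python
import numpy as np

def graphsage_layer(H_prev, A, W, B, agg):
    """H_prev: (n, d_in); A: (n, n) adjacency; W, B: (d_in, d_out);
    agg: maps an (m, d_in) array of neighbor embeddings to a (d_in,) vector."""
    rows = []
    for v in range(A.shape[0]):
        neigh = H_prev[A[v] > 0]                        # h_u^{k-1}, u in N(v)
        combined = np.concatenate([W.T @ agg(neigh),    # aggregated neighbors
                                   B.T @ H_prev[v]])    # self embedding
        rows.append(np.maximum(0, combined))            # sigma = ReLU
    return np.stack(rows)                               # (n, 2 * d_out)

mean_agg = lambda neigh: neigh.mean(axis=0)             # recovers GCN-style mean
```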

SLIDE 35

Neighbor Aggregation: Variants

  • Mean: Take a weighted average of the neighbors:

    AGG = Σ_{u∈N(v)} h_u^{k-1} / |N(v)|

  • Pool: Transform the neighbor vectors and apply a symmetric vector function γ (element-wise mean/max):

    AGG = γ({ Q h_u^{k-1}, ∀u ∈ N(v) })

  • LSTM: Apply an LSTM to a random permutation π of the neighbors:

    AGG = LSTM([ h_u^{k-1}, ∀u ∈ π(N(v)) ])
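Sketches of the three aggregators to plug into graphsage_layer above (γ, Q, and π follow the formulas; the LSTM variant is stubbed via a user-supplied lstm_step, since a full recurrent cell is out of scope here):

```python
import numpy as np

def mean_agg(neigh):                         # Mean: average of the neighbors
    return neigh.mean(axis=0)

def pool_agg(neigh, Q):                      # Pool: transform, then symmetric max
    return np.maximum(0, neigh @ Q).max(axis=0)   # gamma = element-wise max

def lstm_agg(neigh, lstm_step, h0, c0):      # LSTM over a reshuffled neighbor list
    order = np.random.permutation(len(neigh))     # pi: random permutation
    h, c = h0, c0
    for i in order:
        h, c = lstm_step(neigh[i], h, c)          # one recurrent step per neighbor
    return h
```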

SLIDE 36

Experiments: Dataset

  • Dynamic datasets:
    – Citation network: predict the paper category
      » Data from 2000-2005
      » 302,424 nodes
      » Train: data up to 2004; test: 2005 data
    – Reddit post network: predict the subreddit of a post
      » Nodes = posts
      » Edges between posts if the same user comments on both posts
      » 232,965 posts
      » Train: 20 days of data; test: the next 10 days of data

SLIDE 37

Experiments: Results

SLIDE 38

Summary: GCN and GraphSAGE

  • Key idea: Generate node embeddings based on local neighborhoods
    – Nodes aggregate "messages" from their neighbors using neural networks
  • Graph convolutional networks:
    – Basic variant: average neighborhood information and stack neural network layers
  • GraphSAGE:
    – Generalized neighborhood aggregation

SLIDE 39

Today’s Lecture

  • Introduction to deep graph embeddings
  • Graph convolution networks
  • GraphSAGE