SLIDE 1

CS224W: Machine Learning with Graphs. Jure Leskovec, Jiaxuan You, Stanford University

http://cs224w.stanford.edu

HW2 deadline postponed to next Thu, Oct 31! We are releasing an improved version of the starter code for HW2.Q4 -- keep an eye on Piazza!

SLIDE 2

Output: Node embeddings.

We can also embed larger network structures: subgraphs, whole graphs

SLIDE 3

¡ Key idea: Generate node embeddings based on local network neighborhoods

[Figure: input graph with a target node and its local neighborhood]

SLIDE 4

¡ Intuition: Nodes aggregate information from their neighbors using neural networks

[Figure: input graph with a target node; neural networks aggregate information from its neighborhood]

SLIDE 5

¡ Intuition: Network neighborhood defines a computation graph

Every node defines a computation graph based on its neighborhood!

SLIDE 6

Key idea: Generate node embeddings based on local network neighborhoods

§ Nodes aggregate “messages” from their neighbors using neural networks

¡ Graph Convolutional Neural Networks:

§ Basic variant: Average neighborhood information and stack neural networks

¡ GraphSAGE:

§ Generalized neighborhood aggregation

[Figure: neighbor embeddings $h_u^{k-1}$ are aggregated into the new embedding $h_v^k$]

SLIDE 7

Output: Vector embeddings

SLIDE 8

Output: Graph Structure!

SLIDE 9
  • 1. Problem of Graph Generation
  • 2. ML Basics for Graph Generation
  • 3. GraphRNN
  • 4. Applications and Open Questions


SLIDE 10

SLIDE 11

¡ We want to generate realistic graphs
¡ What is a good model?
¡ How can we fit the model and generate the graph using it?

[Figure: given a large real graph, generate a synthetic graph]

SLIDE 12

¡ Generation – gives insight into the graph formation process
¡ Anomaly detection – abnormal behavior, evolution
¡ Predictions – predicting the future from the past
¡ Simulations of novel graph structures
¡ Graph completion – many graphs are partially observed
¡ “What if” scenarios

SLIDE 13

Task 1: Realistic graph generation
¡ Generate graphs that are similar to a given set of graphs [focus of this lecture]

Task 2: Goal-directed graph generation
¡ Generate graphs that optimize given objectives/constraints
§ Drug molecule generation/optimization

SLIDE 14

Drug discovery

¡ Discover highly drug-like molecules

[Figure: a graph generative model outputs a molecule with drug_likeness=0.94]

SLIDE 15

Drug discovery

¡ Complete an existing molecule to optimize a desired property

[Figure: complete and improve a partial molecule, raising solubility from -5.55 to -1.78]

SLIDE 16

Discovering novel structures

[Figure: train GraphRNN on grid, community, and ego graphs]

SLIDE 17

Network Science

¡ Null models for realistic networks

[Figure: a real network approximated by null models, e.g., Barabasi_Albert(n=50, m=2) and NeuralNet_X(n=50, p=3, q=5)]

SLIDE 18

¡ Large and variable output space
§ For $n$ nodes we need to generate $n^2$ values
§ Graph size (nodes, edges) varies

[Figure: adjacency matrices; 5 nodes: 25 values, 1K nodes: 1M values]

SLIDE 19

¡ Non-unique representations:
§ An $n$-node graph can be represented in up to $n!$ ways
§ Hard to compute/optimize objective functions (e.g., reconstruction error)

[Figure: the same 5-node graph under two different node orderings yields very different adjacency matrices]

SLIDE 20

¡ Complex dependencies:
§ Edge formation has long-range dependencies

Example: Generate a ring graph on 6 nodes.

[Figure: partially built adjacency matrices for a 6-node ring; the next entry should have an edge in one case and shouldn't in the other]

Existence of an edge may depend on the entire graph!

SLIDE 21
  • 1. Problem of Graph Generation
  • 2. ML Basics for Graph Generation
  • 3. GraphRNN
  • 4. Applications and Open Questions


SLIDE 22

SLIDE 23

¡ Given: Graphs sampled from $p_{data}(G)$
¡ Goal:
§ Learn the distribution $p_{model}(G)$
§ Sample from $p_{model}(G)$

[Figure: learn $p_{model}(G)$ from samples of $p_{data}(G)$, then sample from it]

SLIDE 24

Setup:

¡ Assume we want to learn a generative model from a set of data points (i.e., graphs) $\{x_i\}$
§ $p_{data}(x)$ is the data distribution, which is never known to us, but we have samples $x_i \sim p_{data}(x)$
§ $p_{model}(x; \theta)$ is the model, parametrized by $\theta$, that we use to approximate $p_{data}(x)$

¡ Goal:
§ (1) Make $p_{model}(x; \theta)$ close to $p_{data}(x)$
§ (2) Make sure we can sample from $p_{model}(x; \theta)$, i.e., generate examples (graphs) from it

SLIDE 25

(1) Make $p_{model}(x; \theta)$ close to $p_{data}(x)$

¡ Key principle: Maximum Likelihood
¡ The fundamental approach to modeling distributions:
§ Find parameters $\theta^*$ such that, for observed data points $x_i \sim p_{data}$, the log-likelihood $\sum_i \log p_{model}(x_i; \theta^*)$ is highest among all possible choices of $\theta$
§ That is, find the model that is most likely to have generated the observed data
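To make the maximum-likelihood principle concrete, here is a minimal sketch (not from the slides): it fits the single parameter of a Bernoulli model to observed binary data by gradient ascent on $\sum_i \log p_{model}(x_i; \theta)$. The data, model family, and learning rate are all illustrative assumptions.

```python
import numpy as np

# Observed data points x_i ~ p_data (binary outcomes, made up for illustration)
x = np.array([1, 0, 1, 1, 1, 0, 1, 1])

# Model: p_model(x; theta) = theta^x * (1 - theta)^(1 - x), a Bernoulli
theta = 0.5                      # initial guess
lr = 0.1                         # learning rate (illustrative)
for _ in range(200):
    # Gradient of sum_i log p_model(x_i; theta) with respect to theta
    grad = np.sum(x / theta - (1 - x) / (1 - theta))
    theta += lr * grad / len(x)  # gradient ascent on the log-likelihood
    theta = np.clip(theta, 1e-6, 1 - 1e-6)

print(theta)  # converges to the sample mean, the MLE for a Bernoulli
```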

SLIDE 26

(2) Sample from $p_{model}(x; \theta)$

¡ Goal: Sample from a complex distribution
¡ The most common approach:
§ (1) Sample from a simple noise distribution: $z_i \sim N(0, 1)$
§ (2) Transform the noise via $f(\cdot)$: $x_i = f(z_i; \theta)$. Then $x_i$ follows a complex distribution
¡ Q: How to design $f(\cdot)$?
¡ A: Use a deep neural network, and train it using the data we have!
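A minimal sketch of the noise-transformation recipe, assuming a small feed-forward network as $f(\cdot; \theta)$; the architecture and sizes are illustrative, and the network is untrained here, so this only shows the sampling mechanics.

```python
import torch
import torch.nn as nn

# f(.; theta): a small neural network that transforms noise into samples.
# Architecture and sizes are illustrative, not from the lecture.
f = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

z = torch.randn(1000, 1)   # (1) sample from a simple noise distribution N(0, 1)
x = f(z)                   # (2) transform the noise: x_i = f(z_i; theta)
# After training theta on data, x would follow the complex target distribution.
print(x.mean().item(), x.std().item())
```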

SLIDE 27

[Goodfellow, NeurIPS 2016]

Taxonomy of deep generative models. This lecture: auto-regressive models.

An autoregressive (AR) model predicts future behavior based on past behavior.

SLIDE 28

Auto-regressive models

¡ $p_{model}(x; \theta)$ is used for both density estimation and sampling (from the probability density)
§ (Other models, like Variational Auto-Encoders (VAEs) and Generative Adversarial Nets (GANs), have 2 or more models, each playing one of the roles)
§ Apply the chain rule: the joint distribution is a product of conditional distributions:

$$p_{model}(x; \theta) = \prod_{t=1}^{n} p_{model}(x_t \mid x_1, \ldots, x_{t-1}; \theta)$$

§ E.g., if $x$ is a vector, $x_t$ is its $t$-th dimension; if $x$ is a sentence, $x_t$ is its $t$-th word
§ In our case: $x_t$ will be the $t$-th action (add node, add edge)
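The chain-rule factorization can be evaluated directly: the joint log-probability of a sequence is the sum of the conditional log-probabilities. A toy sketch, with a hypothetical hand-written conditional model standing in for $p_{model}$:

```python
import math

def log_p_conditional(x_t, history):
    """Hypothetical p_model(x_t | x_1..x_{t-1}); a toy rule for illustration:
    the next binary value is 1 with prob. 0.8 if the history sum is even, else 0.3."""
    p1 = 0.8 if sum(history) % 2 == 0 else 0.3
    return math.log(p1 if x_t == 1 else 1.0 - p1)

def log_p_joint(xs):
    # log p(x) = sum_t log p(x_t | x_1, ..., x_{t-1})
    return sum(log_p_conditional(x_t, xs[:t]) for t, x_t in enumerate(xs))

print(log_p_joint([1, 0, 1, 1]))
```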

SLIDE 29
  • 1. Problem of Graph Generation
  • 2. ML Basics for Graph Generation
  • 3. GraphRNN
  • 4. Applications and Open Questions


SLIDE 30

GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. J. You, R. Ying, X. Ren, W. L. Hamilton, J. Leskovec. International Conference on Machine Learning (ICML), 2018.

SLIDE 31

Generating graphs via sequentially adding nodes and edges

[You et al., ICML 2018]

[Figure: graph $G$ and its sequential generation process $S^{\pi}$, adding nodes 1 through 5 one at a time]

SLIDE 32

Graph $G$ with node ordering $\pi$ can be uniquely mapped into a sequence of node and edge additions $S^{\pi}$.

[Figure: graph $G$ with node ordering $\pi$ mapped to the sequence $S^{\pi} = (S_1^{\pi}, S_2^{\pi}, S_3^{\pi}, S_4^{\pi}, S_5^{\pi})$]

SLIDE 33

The sequence $S^{\pi}$ has two levels ($S^{\pi}$ is a sequence of sequences):
§ Node-level: add nodes, one at a time
§ Edge-level: add edges between existing nodes

¡ Node-level: at each step, a new node is added

[Figure: the node-level steps $S_1^{\pi}$ (“Add node 1”) through $S_5^{\pi}$ (“Add node 5”)]

SLIDE 34

The sequence $S^{\pi}$ has two levels:
¡ Each node-level step is an edge-level sequence
¡ Edge-level: at each step, add a new edge

[Figure: node-level step $S_4^{\pi}$ is the edge-level sequence $S_4^{\pi} = (S_{4,1}^{\pi}, S_{4,2}^{\pi}, S_{4,3}^{\pi})$: “Don't connect 4 and 1”, “Connect 4 and 2”, “Connect 4 and 3”]

SLIDE 35

¡ Summary: a graph + a node ordering = a sequence of sequences!
¡ The node ordering is randomly selected (we will come back to this)

[Figure: graph $G$, its adjacency matrix, and the corresponding node-level and edge-level sequences]

SLIDE 36

¡ We have transformed the graph generation problem into a sequence generation problem
¡ We need to model two processes:
§ Generate a state for a new node (node-level sequence)
§ Generate edges for the new node based on its state (edge-level sequence)
¡ Approach: use RNNs to model these processes!

SLIDE 37

¡ GraphRNN has a node-level RNN and an edge-level RNN
¡ Relationship between the two RNNs:
§ The node-level RNN generates the initial state for the edge-level RNN
§ The edge-level RNN generates edges for the new node, then updates the node-level RNN state using the generated results

SLIDE 38

[Figure: the node-level RNN generates the initial state for the edge-level RNN; the edge-level RNN generates edges for the new node, then updates the node-level RNN state using the generated results]

SLIDE 39

[Figure: same diagram as the previous slide: the node-level RNN initializes the edge-level RNN, which generates the new node's edges]

Next: How to generate a sequence with RNN?

SLIDE 40

¡ $s_t$: state of the RNN after time $t$
¡ $x_t$: input to the RNN at time $t$
¡ $y_t$: output of the RNN at time $t$
¡ $W, U, V$: parameter matrices; $\sigma(\cdot)$: a non-linearity
¡ More expressive cells: GRU, LSTM, etc.

RNN cell: (1) $s_t = \sigma(W \cdot x_t + U \cdot s_{t-1})$; (2) $y_t = V \cdot s_t$
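A direct NumPy transcription of the two cell equations, with illustrative dimensions and tanh as the non-linearity; in practice one would use the GRU/LSTM cells mentioned above from a deep-learning library.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_state, d_out = 4, 8, 4            # sizes are illustrative

W = rng.normal(size=(d_state, d_in))      # input-to-state weights
U = rng.normal(size=(d_state, d_state))   # state-to-state weights
V = rng.normal(size=(d_out, d_state))     # state-to-output weights
sigma = np.tanh                           # the non-linearity sigma(.), tanh as one choice

def rnn_cell(x_t, s_prev):
    s_t = sigma(W @ x_t + U @ s_prev)     # (1) s_t = sigma(W.x_t + U.s_{t-1})
    y_t = V @ s_t                         # (2) y_t = V.s_t
    return s_t, y_t

s = np.zeros(d_state)                     # initial state s_0
for x in rng.normal(size=(5, d_in)):      # run the cell over a length-5 input
    s, y = rnn_cell(x, s)
```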

slide-41
SLIDE 41

¡ Q: How can we use an RNN to generate sequences?
¡ A: Let $x_{t+1} = y_t$!
¡ Q: How do we initialize $s_0$ and $x_1$? When do we stop generating?
¡ A: Use start/end-of-sequence tokens (SOS, EOS), e.g., the zero vector
¡ This is good, but the model is deterministic

[Figure: unrolled RNN; $x_1 = \text{SOS}$, $s_0 = \text{SOS}$, each output is fed back as the next input ($x_2 = y_1$, $x_3 = y_2$, ...), and generation stops when $y_n = \text{EOS}$]

SLIDE 42

¡ Remember our goal: use the RNN to model $\prod_{t=1}^{n} p_{model}(x_t \mid x_1, \ldots, x_{t-1}; \theta)$
¡ Let $y_t = p_{model}(x_t \mid x_1, \ldots, x_{t-1}; \theta)$
¡ Then $x_{t+1}$ is a sample from $y_t$: $x_{t+1} \sim y_t$
§ Each step of the RNN outputs a probability vector
§ We then sample from that vector, and feed the sample to the next step

[Figure: unrolled RNN; $x_1 = \text{SOS}$, then $x_2 \sim y_1$, $x_3 \sim y_2$, ...]
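A minimal sketch of this sampling loop, assuming (as on the next slide) that each output parameterizes a Bernoulli distribution over the next binary value; the network sizes and the stop rule standing in for EOS are illustrative.

```python
import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=1, hidden_size=16)  # sizes are illustrative
head = nn.Linear(16, 1)                          # maps state s_t to output y_t

def generate(max_len=20):
    s = torch.zeros(1, 16)          # s_0 = SOS (zero vector)
    x = torch.zeros(1, 1)           # x_1 = SOS (zero vector)
    seq = []
    for _ in range(max_len):
        s = cell(x, s)
        y = torch.sigmoid(head(s))  # y_t: probability that the next value is 1
        x = torch.bernoulli(y)      # sample x_{t+1} ~ y_t and feed it back
        seq.append(int(x.item()))
        # Illustrative stand-in for an EOS token: stop after a run of zeros
        if len(seq) >= 3 and seq[-3:] == [0, 0, 0]:
            break
    return seq

print(generate())
```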

SLIDE 43

Suppose we have already trained the model:
§ $y_t$ follows a Bernoulli distribution (our choice of $p_{model}$)
§ $y_t = p$ means value 1 has probability $p$, value 0 has probability $1 - p$

¡ Right now everything is generated by the model
¡ How do we use the training data $x_1, x_2, \ldots, x_n$?

[Figure: unrolled RNN at inference time; outputs $y_1 = 0.9$, $y_2 = 0.4$, $y_3 = 0.7$, with each next input sampled from the previous output ($x_2 \sim y_1$, $x_3 \sim y_2$)]

SLIDE 44

Training the model

¡ We observe a sequence $y^*$ of edges, e.g., $[1, 0, \ldots]$
¡ Principle: Teacher Forcing -- replace the input and output by the real sequence

[Figure: unrolled RNN at training time; the ground-truth values $y_1^*, y_2^*, y_3^*$ replace the sampled inputs ($x_2 = y_1^*$, $x_3 = y_2^*$), and each prediction ($y_1 = 0.9$, $y_2 = 0.4$, $y_3 = 0.7$) is compared against the corresponding $y_t^*$ to compute the loss]

SLIDE 45

¡ Loss $L$: binary cross entropy
¡ Minimize:

$$L = -\left[ y_1^* \log(y_1) + (1 - y_1^*) \log(1 - y_1) \right]$$

¡ If $y_1^* = 1$, we minimize $-\log(y_1)$, pushing $y_1$ higher
¡ If $y_1^* = 0$, we minimize $-\log(1 - y_1)$, pushing $y_1$ lower
¡ This way, $y_1$ fits the data samples $y_1^*$
¡ Reminder: $y_1$ is computed by the RNN, so this loss adjusts the RNN parameters accordingly, via backpropagation!

[Figure: prediction $y_1 = 0.9$ compared against ground truth $y_1^* = 1$ to compute the loss]
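A minimal teacher-forcing training step under the same assumptions: the observed sequence $y^*$ replaces the sampled inputs, and binary cross entropy compares each prediction $y_t$ against $y_t^*$. Sizes, optimizer, and the example sequence are illustrative.

```python
import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=1, hidden_size=16)   # illustrative sizes
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(cell.parameters()) + list(head.parameters()), lr=1e-2)
bce = nn.BCELoss()

y_star = torch.tensor([1.0, 0.0, 1.0])            # observed edge sequence (illustrative)

s = torch.zeros(1, 16)                            # s_0 = SOS
x = torch.zeros(1, 1)                             # x_1 = SOS
loss = 0.0
for t in range(len(y_star)):
    s = cell(x, s)
    y = torch.sigmoid(head(s))                    # prediction y_t in (0, 1)
    loss = loss + bce(y, y_star[t].view(1, 1))    # -[y* log y + (1 - y*) log(1 - y)]
    x = y_star[t].view(1, 1)                      # teacher forcing: feed the REAL value

opt.zero_grad()
loss.backward()                                   # backprop through time
opt.step()
```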

SLIDE 46

[Figure: observed graph; assuming Node 1 is in the graph, the node RNN (fed SOS) now adds Node 2]

SLIDE 47

[Figure: the edge RNN predicts how Node 2 connects to Node 1 (probability 0.5)]

SLIDE 48

[Figure: the edge RNN gets supervision from the ground truth (the true edge value is 1)]

SLIDE 49

[Figure: the new edges are used to update the node RNN]

SLIDE 50

[Figure: the edge RNN predicts how Node 3 connects to Node 2]

SLIDE 51

[Figure: the edge RNN gets supervision from the ground truth]

SLIDE 52

[Figure: the new edges are used to update the node RNN]

SLIDE 53

[Figure: Node 4 doesn't connect to any nodes, so generation stops]

SLIDE 54

[Figure: backprop through time; all gradients are accumulated across time steps]

SLIDE 55

[Figure: at test time, the ground truth is replaced by GraphRNN's own sampled predictions]
slide-56
SLIDE 56

Quick summary of GraphRNN:
§ Generate a graph by generating a two-level sequence
§ Use RNNs to generate the sequences

¡ Next: making GraphRNN tractable, and proper evaluation

[Figure: graph $G$, its adjacency matrix, and the node-level and edge-level RNNs]

SLIDE 57

¡ Any node can connect to any prior node
¡ Too many steps for edge generation:
§ Need to generate the full adjacency matrix
§ Complex, too-long edge dependencies

[Figure: under a random node ordering, Node 5 may connect to any/all previous nodes]

How do we limit this complexity?

“Recipe” to generate the left graph:

  • Add node 1
  • Add node 2
  • Add node 3
  • Connect 3 with 1 and 2
  • Add node 4
SLIDE 58

¡ Breadth-First Search (BFS) node ordering
¡ With a BFS node ordering:
§ Since Node 4 doesn't connect to Node 1,
§ we know all of Node 1's neighbors have already been traversed,
§ so Node 5 and all following nodes will never connect to Node 1
§ We only need memory of 2 “steps” rather than $n - 1$ steps

[Figure: the same graph under a BFS ordering]

“Recipe” to generate the left graph:

  • Add node 1
  • Add node 2
  • Connect 2 with 1
  • Add node 3
  • Connect 3 with 1
  • Add node 4
  • Connect 4 with 2 and 3
SLIDE 59

¡ Breadth-First Search node ordering
¡ Benefits:
§ Reduces the possible node orderings: from $O(n!)$ to the number of distinct BFS orderings
§ Reduces the number of steps for edge generation: fewer previous nodes to look at

[Figure: BFS node ordering; Node 5 will never connect to Node 1, so we only need memory of 2 “steps” rather than $n - 1$ steps]
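A small sketch of why BFS ordering helps, using networkx: compare how far back in the ordering any edge has to reach under a random ordering versus a BFS ordering. The example graph is an arbitrary choice; under BFS the maximum "look-back" is typically far smaller than $n - 1$.

```python
import random
import networkx as nx

G = nx.grid_2d_graph(5, 5)                     # an example graph (illustrative)
G = nx.convert_node_labels_to_integers(G)

def max_lookback(G, order):
    """Largest gap, over all edges, between a node's position and a neighbor's."""
    pos = {v: i for i, v in enumerate(order)}
    return max(abs(pos[u] - pos[v]) for u, v in G.edges())

random_order = list(G.nodes())
random.shuffle(random_order)

bfs_order = [0] + [v for _, v in nx.bfs_edges(G, source=0)]  # BFS visit order

print("random ordering look-back:", max_lookback(G, random_order))
print("BFS ordering look-back:   ", max_lookback(G, bfs_order))
```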

SLIDE 60

¡ BFS reduces the number of steps for edge generation

[Figure: adjacency matrices under a random ordering vs. a BFS ordering]
slide-61
SLIDE 61

¡ Task: Compare two sets of graphs
¡ Goal: Define similarity metrics for graphs
¡ Challenge: There is no efficient graph isomorphism test that can be applied to any class of graphs!
¡ Solution:
§ Visual similarity
§ Graph-statistics similarity

[Figure: two graphs; how similar are they?]
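A minimal sketch of statistics-based comparison. (The GraphRNN paper evaluates with MMD between distributions of statistics such as degree; the simpler total-variation distance between degree histograms used here is an illustrative stand-in.)

```python
import numpy as np
import networkx as nx

def degree_histogram(G, max_deg=20):
    """Normalized degree histogram, padded/truncated to a fixed length."""
    counts = np.zeros(max_deg)
    for _, d in G.degree():
        counts[min(d, max_deg - 1)] += 1
    return counts / counts.sum()

def degree_tv_distance(G1, G2):
    # Total variation distance between the two degree distributions
    h1, h2 = degree_histogram(G1), degree_histogram(G2)
    return 0.5 * np.abs(h1 - h2).sum()

real = nx.barabasi_albert_graph(100, 2, seed=0)       # stand-ins for a real graph
generated = nx.erdos_renyi_graph(100, 0.04, seed=0)   # and a generated graph
print(degree_tv_distance(real, generated))
```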

SLIDE 62

SLIDE 63

SLIDE 64

SLIDE 65
  • 1. Problem of Graph Generation
  • 2. ML Basics for Graph Generation
  • 3. GraphRNN
  • 4. Applications and Open Questions


SLIDE 66

Question: Can we learn a model that can generate valid and realistic molecules with high value of a given chemical property?

[Figure: the model outputs a molecule that optimizes a given property, e.g., drug_likeness=0.95] [You et al., NeurIPS 2018]

Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. J. You, B. Liu, R. Ying, V. Pande, J. Leskovec. Neural Information Processing Systems (NeurIPS), 2018.

SLIDE 67

Generating graphs that:

¡ Optimize a given objective (High scores)

§ e.g., drug-likeness (black box)

¡ Obey underlying rules (Valid)

§ e.g., chemical valency rules

¡ Are learned from examples (Realistic)

§ e.g., imitating a molecule graph dataset


SLIDE 68

Graph Convolutional Policy Network (GCPN) combines graph representation + RL:

¡ A graph neural network captures complex structural information, and enables a validity check in each state transition (Valid)
¡ Reinforcement learning optimizes intermediate/final rewards (High scores)
¡ Adversarial training imitates examples in the given datasets (Realistic)


SLIDE 69

Visualization of GCPN graphs: Generate graphs with high property scores


SLIDE 70

Visualization of GCPN graphs: Edit given graph for higher property scores

[Figure: starting structure and finished structure]

SLIDE 71

¡ Generating graphs in other domains
§ 3D shapes, point clouds, scene graphs, etc.
¡ Scaling up to large graphs
§ Hierarchical action space, allowing high-level actions like adding a whole structure at a time
¡ Other applications: anomaly detection
§ Use generative models to estimate the probability of real graphs vs. fake graphs

SLIDE 72
  • 1. Problem of Graph Generation
  • 2. ML Basics for Graph Generation
  • 3. GraphRNN
  • 4. Applications and Open Questions
