HW2 deadline postponed to next Thu, Oct 31! We are releasing an improved version of the starter code for HW2.Q4 -- keep an eye on Piazza!
CS224W: Machine Learning with Graphs. Jure Leskovec, Jiaxuan You, Stanford University.
Output: Node embeddings. We can also embed larger network structures (subgraphs, graphs).
- Key idea: Generate node embeddings based on local network neighborhoods
[Figure: input graph and the computation graph of target node A, built from its network neighborhood]
- Intuition: Nodes aggregate information from their neighbors using neural networks
[Figure: the same computation graph, with neural networks placed at each aggregation step]
- Intuition: The network neighborhood defines a computation graph
Every node defines a computation graph based on its neighborhood!
Key idea: Generate node embeddings based on local network neighborhoods
  - Nodes aggregate "messages" from their neighbors using neural networks
- Graph Convolutional Neural Networks:
  - Basic variant: Average neighborhood information and stack neural networks
- GraphSAGE:
  - Generalized neighborhood aggregation
The basic GCN update, averaging the neighbors' embeddings at layer k:

  h_v^k = σ( W_k · Σ_{u ∈ N(v)} h_u^{k-1} / |N(v)| + B_k · h_v^{k-1} )
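For concreteness, here is a minimal PyTorch sketch of such an averaging layer (our own illustrative code, not the course starter code; W and B correspond to W_k and B_k in the formula above, and ReLU stands in for σ):

import torch
import torch.nn as nn

class MeanAggGCNLayer(nn.Module):
    """One GCN layer: average neighbor embeddings, add a transformed self-embedding."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # neighbor transform W_k
        self.B = nn.Linear(in_dim, out_dim, bias=False)  # self transform B_k

    def forward(self, H, A):
        # H: (num_nodes, in_dim) embeddings h^{k-1}; A: (num_nodes, num_nodes) 0/1 adjacency
        deg = A.sum(dim=1, keepdim=True).clamp(min=1)    # |N(v)|, guarded for isolated nodes
        neigh_mean = (A @ H) / deg                       # average neighbor embedding
        return torch.relu(self.W(neigh_mean) + self.B(H))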
So far -- Output: Vector embeddings
This lecture -- Output: Graph structure!
- 1. Problem of Graph Generation
- 2. ML Basics for Graph Generation
- 3. GraphRNN
- 4. Applications and Open Questions
- We want to generate realistic graphs
- What is a good model?
- How can we fit the model and generate graphs with it?
Given a large real graph, generate a synthetic graph.
- Generation: gives insight into the graph formation process
- Anomaly detection: abnormal behavior, evolution
- Predictions: predicting the future from the past
- Simulations of novel graph structures
- Graph completion: many graphs are only partially observed
- "What if" scenarios
Task 1: Realistic graph generation
- Generate graphs that are similar to a given set of graphs [focus of this lecture]
Task 2: Goal-directed graph generation
- Generate graphs that optimize given objectives/constraints
  - e.g., drug molecule generation/optimization
Drug discovery:
- Discover highly drug-like molecules
[Figure: a graph generative model outputs a molecule with drug_likeness = 0.94]
Drug discovery:
- Complete an existing molecule to optimize a desired property
[Figure: completing a partial molecule improves solubility from -5.55 to -1.78]
Discovering novel structures:
[Figure: GraphRNN trained on grid, community, and ego graphs generates similar novel graphs]
Network Science:
- Null models for realistic networks
[Figure: a classical null model, Barabasi_Albert(n=50, m=2), alongside a learned neural null model, NeuralNet_X(n=50, p=3, q=5)]
- Large and variable output space:
  - For n nodes we need to generate n^2 values
  - Graph size (nodes, edges) varies
[Figure: a 5-node graph and its adjacency matrix -- 5 nodes: 25 values; 1K nodes: 1M values]
- Non-unique representations:
  - An n-node graph can be represented in up to n! ways
  - Hard to compute/optimize objective functions (e.g., reconstruction error)
[Figure: the same 5-node graph under orderings (1, 2, 3, 4, 5) and (1, 4, 3, 5, 2) -- same graph, very different adjacency-matrix representations!]
- Complex dependencies:
  - Edge formation has long-range dependencies
Example: generate a ring graph on 6 nodes. Two partially generated adjacency matrices can look alike while one should have a given edge and the other shouldn't: the existence of an edge may depend on the entire graph!
- 1. Problem of Graph Generation
- 2. ML Basics for Graph Generation
- 3. GraphRNN
- 4. Applications and Open Questions
- Given: graphs sampled from p_data(G)
- Goal:
  - Learn the distribution p_model(G)
  - Sample from p_model(G)
[Figure: p_data(G) → learn → p_model(G) → sample]
Setup:
- Assume we want to learn a generative model from a set of data points (i.e., graphs) {x_i}
  - p_data(x) is the data distribution, which is never known to us, but we have samples x_i ~ p_data(x)
  - p_model(x; θ) is the model, parametrized by θ, that we use to approximate p_data(x)
- Goal:
  - (1) Make p_model(x; θ) close to p_data(x)
  - (2) Make sure we can sample from p_model(x; θ)
    - We need to generate examples (graphs) from p_model(x; θ)
(1) Make p_model(x; θ) close to p_data(x)
- Key principle: Maximum Likelihood
- The fundamental approach to modeling distributions:

  θ* = argmax_θ Σ_i log p_model(x_i; θ)

  - That is, find parameters θ* such that, for the observed data points x_i ~ p_data, Σ_i log p_model(x_i; θ*) has the highest value among all possible choices of θ: the model most likely to have generated the observed data x
(2) Sample from p_model(x; θ)
- Goal: Sample from a complex distribution
- The most common approach:
  - (1) Sample from a simple noise distribution: z_i ~ N(0, 1)
  - (2) Transform the noise z_i via f(⋅): x_i = f(z_i; θ); then x_i follows a complex distribution
- Q: How to design f(⋅)?
- A: Use a deep neural network, and train it using the data we have!
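A toy NumPy sketch of this recipe (the specific f below is our own invention; in practice f is a deep neural network whose parameters θ are trained):

import numpy as np

rng = np.random.default_rng(0)

def f(z, theta=(2.0, 0.5)):
    # stand-in transform; a real model would be a trained deep network
    a, b = theta
    return np.tanh(a * z) + b * z ** 2

z = rng.normal(0.0, 1.0, size=10_000)  # (1) simple noise: z_i ~ N(0, 1)
x = f(z)                               # (2) transform: x_i = f(z_i; theta)
print(x.mean(), x.std())               # x_i now follows a more complex distribution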
Taxonomy of Deep Generative Models [Goodfellow, NeurIPS 2016]
This lecture: Auto-regressive models
- An autoregressive (AR) model predicts future behavior based on past behavior.
Auto-regressive models
- p_model(x; θ) is used for both density estimation and sampling (from the probability density)
  - (Other models, such as Variational Auto-Encoders (VAEs) and Generative Adversarial Nets (GANs), use two or more models, each playing one of these roles)
- Apply the chain rule: the joint distribution is a product of conditional distributions (a toy example follows):

  p_model(x; θ) = ∏_{t=1}^{n} p_model(x_t | x_1, …, x_{t-1}; θ)

  - E.g., x is a vector and x_t is its t-th dimension; or x is a sentence and x_t is its t-th word
  - In our case: x_t will be the t-th action (add node, add edge)
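A tiny worked instance of this factorization for a 3-step binary sequence (all probabilities below are made-up toy numbers):

# p_model(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x1, x2)
p_x1 = 0.9                                  # p(x1 = 1)
p_x2 = {1: 0.4, 0: 0.2}                     # p(x2 = 1 | x1)
p_x3 = {(1, 1): 0.7, (1, 0): 0.3,
        (0, 1): 0.5, (0, 0): 0.1}           # p(x3 = 1 | x1, x2)

x = (1, 1, 0)                               # evaluate the likelihood of this sequence
joint = p_x1 * p_x2[x[0]] * (1 - p_x3[(x[0], x[1])])
print(joint)                                # 0.9 * 0.4 * 0.3 = 0.108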
- 1. Problem of Graph Generation
- 2. ML Basics for Graph Generation
- 3. GraphRNN
- 4. Applications and Open Questions
GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models. J. You, R. Ying, X. Ren, W. L. Hamilton, J. Leskovec. International Conference on Machine Learning (ICML), 2018.
Idea: Generating graphs via sequentially adding nodes and edges
[You et al., ICML 2018]
[Figure: step-by-step generation of a 5-node graph]
A graph G with node ordering π can be uniquely mapped into a sequence of node and edge additions S^π:

  S^π = (S_1^π, S_2^π, S_3^π, S_4^π, S_5^π)
The sequence S^π has two levels (S^π is a sequence of sequences):
  - Node-level: add nodes, one at a time
  - Edge-level: add edges between existing nodes
- Node-level: At each step, a new node is added
[Figure: the node-level steps S_1^π, …, S_5^π: "Add node 1", …, "Add node 5"]
The sequence S^π has two levels:
- Each node-level step is an edge-level sequence
- Edge-level: At each step, add a new edge
Example: the edge-level sequence for node 4 is S_4^π = (S_{4,1}^π, S_{4,2}^π, S_{4,3}^π): "Don't connect 4 to 1", "Connect 4 to 2", "Connect 4 to 3"
- Summary: A graph + a node ordering = a sequence of sequences!
- The node ordering is randomly selected (we will come back to this)
[Figure: graph G, its adjacency matrix, and the corresponding node-level and edge-level sequences]
- We have transformed the graph generation problem into a sequence generation problem (a sketch of the mapping follows)
- We need to model two processes:
  - Generate a state for a new node (node-level sequence)
  - Generate edges for the new node based on its state (edge-level sequence)
- Approach: Use an RNN to model these processes!
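A small plain-Python sketch of this mapping (the function name is our own, and the example graph is our reconstruction of the 5-node graph on these slides); step i records the new node's 0/1 connections to all nodes added before it:

import networkx as nx

def graph_to_sequence(G, ordering):
    """S^pi: for each newly added node, the 0/1 vector of its edges
    to all nodes added before it."""
    seq = []
    for i, v in enumerate(ordering):
        prev = ordering[:i]
        seq.append([1 if G.has_edge(v, u) else 0 for u in prev])
    return seq

G = nx.Graph([(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)])
print(graph_to_sequence(G, [1, 2, 3, 4, 5]))
# -> [[], [1], [1, 0], [0, 1, 1], [0, 0, 1, 1]]  (node 4: skip 1, connect 2, connect 3)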
- GraphRNN has a node-level RNN and an edge-level RNN
- Relationship between the two RNNs:
  - The node-level RNN generates the initial state for the edge-level RNN
  - The edge-level RNN generates the edges for the new node, then updates the node-level RNN state using the generated results
Next: How do we generate a sequence with an RNN?
- s_t: state of the RNN after step t
- x_t: input to the RNN at step t
- y_t: output of the RNN at step t
- W, U, V: parameter matrices; σ(⋅): a non-linearity
- More expressive cells can be used: GRU, LSTM, etc.
The RNN cell computes:

  (1) s_t = σ(W · x_t + U · s_{t-1})
  (2) y_t = V · s_t
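A minimal NumPy sketch of this vanilla RNN cell (W, U, V, s, x, y follow the slide's notation; dimensions, initialization, and σ = tanh are our own assumptions):

import numpy as np

class RNNCell:
    def __init__(self, input_dim, state_dim, output_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.1, (state_dim, input_dim))   # input-to-state
        self.U = rng.normal(0, 0.1, (state_dim, state_dim))   # state-to-state
        self.V = rng.normal(0, 0.1, (output_dim, state_dim))  # state-to-output

    def step(self, x_t, s_prev):
        s_t = np.tanh(self.W @ x_t + self.U @ s_prev)  # (1) with sigma = tanh
        y_t = self.V @ s_t                             # (2)
        return s_t, y_t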
- Q: How do we use an RNN to generate sequences?
- A: Let x_{t+1} = y_t!
- Q: How do we initialize s_0 and x_1? When do we stop generating?
- A: Use start/end-of-sequence tokens (SOS, EOS), e.g., the zero vector
- This is good, but the model is deterministic
[Figure: unrolled RNN: x_1 = SOS, s_0 = SOS; each output is fed back as the next input (x_2 = y_1, x_3 = y_2, …) until y_n = EOS]
- Remember our goal: use the RNN to model ∏_{t=1}^{n} p_model(x_t | x_1, …, x_{t-1}; θ)
- Let y_t = p_model(x_t | x_1, …, x_{t-1}; θ)
- Then x_{t+1} is a sample from y_t: x_{t+1} ~ y_t
  - Each step of the RNN outputs a probability vector
  - We then sample from that vector and feed the sample into the next step (see the sketch below)
[Figure: unrolled RNN: x_1 = SOS; x_2 ~ y_1, x_3 ~ y_2, …]
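A sketch of this stochastic rollout for binary (edge / no-edge) outputs, reusing the hypothetical RNNCell above with input_dim = output_dim = 1; squashing the output through a sigmoid to get a Bernoulli probability, and treating an all-zeros sample as EOS, are our own conventions:

import numpy as np

def generate(cell, state_dim, max_len=20, seed=0):
    rng = np.random.default_rng(seed)
    s = np.zeros(state_dim)                  # s_0 = SOS (zero vector)
    x = np.ones(1)                           # x_1 = SOS token
    samples = []
    for _ in range(max_len):
        s, out = cell.step(x, s)
        p = 1.0 / (1.0 + np.exp(-out))       # y_t: probability of emitting a 1
        x = (rng.random(p.shape) < p) * 1.0  # sample x_{t+1} ~ y_t
        if x.sum() == 0:                     # all zeros: treat as EOS
            break
        samples.append(x.copy())
    return samples

For example, generate(RNNCell(1, 8, 1), state_dim=8) produces a random binary sequence.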
Suppose we have already trained the model:
- y_t follows a Bernoulli distribution (our choice of p_model): y_t = p means value 1 has probability p and value 0 has probability 1 - p
- Right now everything is generated by the model
- How do we use the training data x_1, x_2, …, x_n?
[Figure: sampling rollout with predicted probabilities y_1 = 0.9, y_2 = 0.4, y_3 = 0.7; e.g., x_2 = 1 is sampled from y_1]
Training the model
- We observe a sequence y* of edges, e.g., [1, 0, …]
- Principle: Teacher forcing -- replace the input and output by the real sequence
[Figure: teacher-forced rollout: the inputs are the ground-truth values (x_2 = y_1*, x_3 = y_2*); the predictions y_1 = 0.9, y_2 = 0.4, y_3 = 0.7 are compared against the ground-truth labels to compute the loss]
- Loss L: binary cross entropy
- Minimize:

  L = -[ y_1* log(y_1) + (1 - y_1*) log(1 - y_1) ]

- If y_1* = 1, we minimize -log(y_1), pushing y_1 higher
- If y_1* = 0, we minimize -log(1 - y_1), pushing y_1 lower
- Either way, y_1 is fitted to the data samples y_1*
- Reminder: y_1 is computed by the RNN, so this loss adjusts the RNN parameters accordingly, using backpropagation! (A training-step sketch follows.)
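A hedged PyTorch sketch of one teacher-forced training step (nn.GRU plus a linear head stand in for the RNN; the observed sequence and all variable names are our own):

import torch
import torch.nn as nn

rnn = nn.GRU(input_size=1, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)                           # maps state s_t to edge probability y_t
opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()))

y_star = torch.tensor([[1., 0., 1.]])             # observed edge sequence y*
sos = torch.ones(1, 1)                            # SOS token
inputs = torch.cat([sos, y_star[:, :-1]], dim=1)  # teacher forcing: feed ground truth
states, _ = rnn(inputs.unsqueeze(-1))             # states: (batch, time, hidden)
y_pred = torch.sigmoid(head(states)).squeeze(-1)  # predicted probabilities y_t

loss = nn.functional.binary_cross_entropy(y_pred, y_star)  # BCE over all steps
opt.zero_grad()
loss.backward()                                   # backprop through time
opt.step()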
Worked example (the original slides animate this on an observed graph over nodes 1, 2, 3):
1. Assume node 1 is in the graph; the node-level RNN (initialized with SOS) now adds node 2.
2. The edge RNN predicts how node 2 connects to node 1 (e.g., probability 0.5).
3. The edge RNN gets supervision from the ground-truth edge.
4. The new edges are used to update the node RNN.
5. The edge RNN predicts how node 3 connects to the previous nodes.
6. The edge RNN again gets supervision from the ground truth.
7. The new edges again update the node RNN.
8. Node 4 doesn't connect to any node, so generation stops (EOS).
9. Backprop through time: all gradients are accumulated across time steps.
At test time, replace the ground truth by GraphRNN's own (sampled) predictions!
Quick summary of GraphRNN:
  - Generate a graph by generating a two-level sequence
  - Use RNNs to generate the sequences (a compact sketch follows)
- Next: Making GraphRNN tractable, and proper evaluation
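Putting the two levels together, an untrained PyTorch sketch of the generation loop (our own simplification, not the authors' implementation; the bound M on how far back a node may connect anticipates the BFS trick discussed below):

import torch
import torch.nn as nn

class TinyGraphRNN(nn.Module):
    """Illustrative two-level GraphRNN-style generator (untrained sketch)."""
    def __init__(self, M=3, hidden=16):
        super().__init__()
        self.M, self.hidden = M, hidden
        self.node_rnn = nn.GRUCell(M, hidden)  # input: padded edge vector of the last node
        self.edge_rnn = nn.GRUCell(1, hidden)  # input: previous edge sample (0/1)
        self.head = nn.Linear(hidden, 1)       # edge-level state -> edge probability

    @torch.no_grad()
    def generate(self, max_nodes=10):
        h_node = torch.zeros(1, self.hidden)             # node-level state (SOS)
        adj = [[]]                                       # node 1 has no previous nodes
        for i in range(2, max_nodes + 1):
            h_edge = h_node.clone()                      # node RNN seeds the edge RNN
            x, edges = torch.ones(1, 1), []              # SOS for the edge sequence
            for _ in range(min(i - 1, self.M)):          # one decision per previous node
                h_edge = self.edge_rnn(x, h_edge)
                p = torch.sigmoid(self.head(h_edge))
                x = torch.bernoulli(p)                   # sample edge: x_{t+1} ~ y_t
                edges.append(int(x.item()))
            if sum(edges) == 0:                          # isolated node: treat as EOS
                break
            adj.append(edges)
            pad = edges + [0.0] * (self.M - len(edges))  # feed generated edges back
            h_node = self.node_rnn(torch.tensor([pad], dtype=torch.float), h_node)
        return adj                                       # adj[i]: edges of node i+1 to earlier nodes

print(TinyGraphRNN().generate())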
- Any node can connect to any prior node
- Too many steps for edge generation:
  - We'd need to generate the full adjacency matrix,
  - with complex, too-long edge dependencies
[Figure: adjacency matrix of the graph under a random node ordering]
[Figure: the graph under a random node ordering 1, 5, 4, 3, 2]
Random node ordering: node 5 may connect to any/all previous nodes. How do we limit this complexity?
"Recipe" to generate the left graph:
- Add node 1
- Add node 2
- Add node 3
- Connect 3 with 1 and 2
- Add node 4
- …
- Breadth-First Search (BFS) node ordering
- With a BFS node ordering:
  - since node 4 doesn't connect to node 1,
  - we know all of node 1's neighbors have already been traversed,
  - therefore node 5 and all following nodes will never connect to node 1
  - We only need "memory" of 2 steps rather than n - 1 steps
[Figure: the same graph under BFS ordering 1, 2, 3, 4, 5]
“Recipe” to generate the left graph:
- Add node 1
- Add node 2
- Connect 2 with 1
- Add node 3
- Connect 3 with 1
- Add node 4
- Connect 4 with 2 and 3
- Breadth-First Search node ordering -- benefits:
  - Reduces the number of possible node orderings: from O(n!) to the number of distinct BFS orderings
  - Reduces the number of steps for edge generation: fewer previous nodes to look at
[Figure: BFS ordering 1, 2, 3, 4, 5]
BFS node ordering: node 5 will never connect to node 1 (we only need memory of 2 "steps" rather than n - 1 steps). (A networkx sketch follows.)
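Getting a BFS node ordering is a one-liner with networkx (a sketch; the choice of start node is our own, and in GraphRNN the underlying node permutation is randomized before BFS):

import networkx as nx

def bfs_ordering(G, start):
    """Nodes of G in breadth-first order from `start`."""
    order = [start]
    order += [v for _, v in nx.bfs_edges(G, start)]
    return order

G = nx.Graph([(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)])
print(bfs_ordering(G, 1))  # [1, 2, 3, 4, 5]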
- BFS reduces the number of steps for edge generation
[Figure: adjacency matrices before and after BFS ordering; with BFS, the possible edges are confined to a band of the matrix]
- Task: Compare two sets of graphs
- Goal: Define similarity metrics for graphs
- Challenge: There is no efficient graph isomorphism test that can be applied to any class of graphs!
- Solution:
  - Visual similarity
  - Similarity of graph statistics (see the example below)
How similar?
[Figures: visual comparisons of real vs. generated graphs]
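One simple instance of statistics-based comparison (an illustrative metric of our own; the GraphRNN paper compares such statistics more rigorously, using Maximum Mean Discrepancy over distributions of degrees, clustering coefficients, and orbit counts): the L1 distance between average degree histograms of two graph sets.

import numpy as np
import networkx as nx

def degree_hist(graphs, max_deg=20):
    """Average normalized degree histogram over a set of graphs."""
    hists = []
    for G in graphs:
        counts = np.bincount([d for _, d in G.degree()], minlength=max_deg + 1)
        hists.append(counts[: max_deg + 1] / max(1, counts.sum()))
    return np.mean(hists, axis=0)

real = [nx.grid_2d_graph(4, 4) for _ in range(5)]                # "real" graphs
fake = [nx.gnp_random_graph(16, 0.2, seed=s) for s in range(5)]  # "generated" graphs
dist = np.abs(degree_hist(real) - degree_hist(fake)).sum()       # L1 distance
print(f"degree-distribution distance: {dist:.3f}")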
- 1. Problem of Graph Generation
- 2. ML Basics for Graph Generation
- 3. GraphRNN
- 4. Applications and Open Questions
Question: Can we learn a model that can generate valid and realistic molecules with a high value of a given chemical property?
[Figure: model → output molecule that optimizes the property, e.g., drug_likeness = 0.95]
[You et al., NeurIPS 2018] Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation. J. You, B. Liu, R. Ying, V. Pande, J. Leskovec. Neural Information Processing Systems (NeurIPS), 2018.
Generating graphs that:
- Optimize a given objective (High scores)
  - e.g., drug-likeness (a black-box objective)
- Obey underlying rules (Valid)
  - e.g., chemical valency rules
- Are learned from examples (Realistic)
  - e.g., imitating a molecule graph dataset
Graph Convolutional Policy Network (GCPN) combines graph representation learning and reinforcement learning (RL):
- A graph neural network captures complex structural information, and enables a validity check in each state transition (Valid)
- Reinforcement learning optimizes intermediate/final rewards (High scores)
- Adversarial training imitates the examples in a given dataset (Realistic)
Visualization of GCPN-generated graphs:
- Generating graphs with high property scores
- Editing a given graph for higher property scores
[Figure: starting structure → finished structure]
- Generating graphs in other domains:
  - 3D shapes, point clouds, scene graphs, etc.
- Scaling up to large graphs:
  - Hierarchical action space, allowing high-level actions like adding a whole structure at a time
- Other applications, e.g., anomaly detection:
  - Use generative models to estimate the probability of real vs. fake graphs
- 1. Problem of Graph Generation
- 2. ML Basics for Graph Generation
- 3. GraphRNN
- 4. Applications and Open Questions