CS224W: Machine Learning with Graphs Jure Leskovec, Weihua Hu, Stanford University
12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu 2
Output: Node embeddings.
We can also embed larger network structures, subgraphs, graphs.
¡ Key idea: Generate node embeddings based on local network neighborhoods.
[Figure: input graph; the target node A aggregates information from its neighbors B, C, D, which in turn aggregate from their own neighbors]
¡ Intuition: Nodes aggregate information from
their neighbors using neural networks
Scarselli et al., 2009b; Battaglia et al., 2016; Defferrard et al., 2016; Duvenaud et al., 2015; Hamilton et al., 2017a; Kearnes et al., 2016; Kipf & Welling, 2017; Lei et al., 2017; Li et al., 2016; Velickovic et al., 2018; Verma & Zhang, 2018; Ying et al., 2018; Zhang et al., 2018
What’s inside the box?
¡ Many model variants have been proposed, with different choices of neural networks.
Graph Convolutional Networks (GCN) [Kipf & Welling, ICLR 2017]: average the neighbor messages (Mean), then apply a linear transformation followed by ReLU (Linear + ReLU).
GraphSAGE [Hamilton+, NeurIPS 2017]: transform each neighbor message with an MLP, then aggregate with an element-wise max (MLP + Max).
¡ Intuition: Network neighborhood defines a
computation graph
Every node defines a computation graph based on its neighborhood!
¡ Obtain node representation by neighbor
aggregation
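The neighbor-aggregation idea can be sketched in a few lines of numpy. Everything below (the toy path graph, the layer sizes, the random weights) is illustrative only, not the lecture's actual implementation:

```python
import numpy as np

def mean_aggregate_layer(H, adj, W):
    """One round of neighbor aggregation (illustrative sketch):
    average the neighbors' embeddings, then apply a linear map and ReLU."""
    deg = adj.sum(axis=1, keepdims=True)               # neighbor counts
    neighbor_mean = (adj @ H) / np.maximum(deg, 1.0)   # mean over each node's neighbors
    return np.maximum(neighbor_mean @ W, 0.0)          # linear + ReLU

# Toy graph: path 0-1-2-3 with 2-d input features per node.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((2, 4)), rng.standard_normal((4, 3))

H1 = mean_aggregate_layer(X, adj, W1)   # each embedding sees the 1-hop neighborhood
H2 = mean_aggregate_layer(H1, adj, W2)  # two layers: each embedding sees 2 hops
print(H2.shape)  # (4, 3): one embedding per node
```

Stacking k such layers gives each node a computation graph over its k-hop neighborhood.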
¡ Obtain graph representation by pooling node representations
Pool (e.g., Sum, Average)
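The readout step is a single reduction over the node embeddings. A minimal sketch, with made-up node embeddings:

```python
import numpy as np

# Hypothetical node embeddings for a 5-node graph (one row per node).
H = np.array([[1., 0.], [0., 2.], [3., 1.], [1., 1.], [0., 0.]])

graph_emb_sum = H.sum(axis=0)    # sum pooling
graph_emb_mean = H.mean(axis=0)  # average pooling
print(graph_emb_sum, graph_emb_mean)
```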
Graph Neural Networks have achieved state-of-the-art performance on:
¡ Node classification [Kipf+, ICLR 2017]
¡ Graph classification [Ying+, NeurIPS 2018]
¡ Link prediction [Zhang+, NeurIPS 2018]
Are GNNs perfect? What are the limitations of GNNs?
¡ Some simple graph structures cannot be
distinguished by conventional GNNs.
GCN and GraphSAGE fail to distinguish the two graphs. Assume: Input node features are uniform (denoted by the same node color)
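One concrete pair with this flavor (an assumption for illustration, not necessarily the slide's figure) is two disjoint triangles vs. a single hexagon: both graphs are 2-regular, so with uniform input features a mean-aggregation GNN assigns every node the same embedding in both graphs:

```python
import numpy as np

def mean_agg_embed(adj, X, W1, W2):
    """Two rounds of mean neighbor aggregation + linear + ReLU,
    followed by a mean readout (GCN-flavored sketch)."""
    def layer(H, W):
        deg = adj.sum(axis=1, keepdims=True)
        return np.maximum((adj @ H) / np.maximum(deg, 1.0) @ W, 0.0)
    return layer(layer(X, W1), W2).mean(axis=0)

def cycle(n):
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
    return adj

# Graph 1: two disjoint triangles; Graph 2: one hexagon. Both are 2-regular.
tri = cycle(3)
g1 = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
g2 = cycle(6)

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
X = np.ones((6, 4))  # uniform node features (same "color" everywhere)

# Mean aggregation cannot tell the two graphs apart:
print(np.allclose(mean_agg_embed(g1, X, W1, W2), mean_agg_embed(g2, X, W1, W2)))  # True
```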
¡ GNNs are not robust to noise in graph data.
[Figure: a graph with uniform node features is fed to a GNN, which outputs a class prediction (Class 1 / Class 2 / Class 3)]
Noise in graph:
- 1. Node feature perturbation
- 2. Edge addition/deletion
- 1. Limitations of conventional GNNs in capturing graph structure
- 2. Vulnerability of GNNs to noise in graph data
- 3. Open questions & future directions
¡ Given two different graphs, can GNNs map
them into different graph representations?
¡ This is an important condition for the classification scenario.
¡ Essentially, this is the graph isomorphism test problem.
¡ No polynomial-time algorithm is known for the general case.
¡ GNNs may not perfectly distinguish arbitrary graphs!
How well can GNNs perform the graph isomorphism test?
Requires rethinking the mechanism of how GNNs capture graph structure.
¡ GNNs use different computational graphs to
distinguish different graphs.
[Figure: two graphs with nodes 1–4 and 1′–4′, and the computation graph rooted at each node]
¡ Node representation captures rooted subtree
structure.
¡ Most discriminative GNNs map different subtrees
into different node representations (denoted by different colors).
Injectivity
¡ A function is injective if it maps different elements to different outputs.
¡ The entire neighbor aggregation is injective if every step of neighbor aggregation is injective.
Injective → Injective → Entire function is injective!
¡ Neighbor aggregation is essentially a function over a multi-set (a set with repeating elements).
[Figure: neighbor aggregation is equivalent to a multi-set function; example multi-sets shown, where the same color indicates the same node features]
Discriminative Power of GNNs can be characterized by that of multi-set functions
Next: Analyzing GCN, GraphSAGE
Recall: GCN uses mean pooling (Mean pooling + Linear + ReLU).
GCN will fail to distinguish proportionally equivalent multi-sets. Not injective!
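A two-line numpy check of this failure mode (the vectors a and b are arbitrary placeholders for node features):

```python
import numpy as np

# Mean pooling collapses "proportionally equivalent" multi-sets:
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

m1 = np.mean([a, b], axis=0)        # multi-set {a, b}
m2 = np.mean([a, a, b, b], axis=0)  # multi-set {a, a, b, b}

print(np.allclose(m1, m2))  # True: the two different multi-sets are not distinguished
```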
Recall: GraphSAGE uses max pooling (MLP + Max pooling).
GraphSAGE will even fail to distinguish multi-sets with the same set of distinct elements. Not injective!
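The same kind of check for max pooling, contrasted with sum pooling (placeholder vectors again):

```python
import numpy as np

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Max pooling only sees the set of distinct elements, not their multiplicities:
x1 = np.max([a, b], axis=0)     # multi-set {a, b}
x2 = np.max([a, a, b], axis=0)  # multi-set {a, a, b}
print(np.allclose(x1, x2))      # True: not distinguished

# Sum pooling keeps multiplicity information:
s1 = np.sum([a, b], axis=0)
s2 = np.sum([a, a, b], axis=0)
print(np.allclose(s1, s2))      # False: distinguished
```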
How can we design an injective multi-set function using neural networks?
Theorem: Any injective multi-set function can be expressed as
φ( Σ_{y ∈ S} g(y) )
where S is the multi-set, φ and g are some non-linear functions, and the sum runs over the elements of the multi-set.
We can model φ and g using a Multi-Layer Perceptron (MLP). Note: the MLP is a universal approximator.
Graph Isomorphism Network (GIN) [Xu+, ICLR 2019]: MLP + sum pooling.
The GIN's neighbor aggregation is injective!
¡ Graph pooling is also a function over a multi-set.
Sum pooling can give injective graph pooling!
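A minimal, untrained sketch of a GIN-style layer with hand-picked weights (the bias value, the path/star graph pair, and the single feature dimension are all illustrative choices, not from the paper). With uniform features, sum aggregation separates two graphs that mean aggregation collapses:

```python
import numpy as np

def gin_layer(H, adj, W1, b1, W2, eps=0.0):
    """One GIN-style layer (sketch): h_v <- MLP((1 + eps) * h_v + sum over neighbors)."""
    agg = (1.0 + eps) * H + adj @ H         # injective sum aggregation over the multi-set
    return np.maximum(agg @ W1 + b1, 0.0) @ W2

def mean_layer(H, adj, W1, b1, W2):
    """Same MLP, but with (non-injective) mean aggregation, for contrast."""
    agg = (adj @ H) / adj.sum(axis=1, keepdims=True)
    return np.maximum(agg @ W1 + b1, 0.0) @ W2

# Path 0-1-2-3 vs. star with center 0: same node count, different structure.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
star = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], dtype=float)

X = np.ones((4, 1))  # uniform node features
W1, b1, W2 = np.array([[1.0]]), np.array([-2.5]), np.array([[1.0]])

# Sum-pool node embeddings into a graph embedding (also injective on multi-sets):
e_path = gin_layer(X, path, W1, b1, W2).sum(axis=0)
e_star = gin_layer(X, star, W1, b1, W2).sum(axis=0)
print(e_path, e_star)  # [1.] [1.5]: sum aggregation separates the two graphs

m_path = mean_layer(X, path, W1, b1, W2).sum(axis=0)
m_star = mean_layer(X, star, W1, b1, W2).sum(axis=0)
print(np.allclose(m_path, m_star))  # True: mean aggregation collapses them
```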
So far: GIN achieves maximal discriminative power by using injective neighbor aggregation. How powerful is this?
¡ GIN is closely related to Weisfeiler-Lehman
(WL) Graph Isomorphism Test (1968).
¡ The WL test is known to be capable of distinguishing most real-world graphs.
¡ Next: We will show GIN is as discriminative
as the WL test.
¡ WL first maps different rooted subtrees to different colors.
Similar to Injective neighbor aggregation in GIN
¡ WL then counts the different colors.
Similar to graph pooling in GNN
¡ Finally, WL compares the counts.
WL test and GIN are operationally equivalent: the graphs that the WL test can distinguish are exactly the graphs that GIN can distinguish.
Observation
¡ GINs have the same discriminative power as
the WL graph isomorphism test.
¡ The WL test has been known to distinguish most graphs, except for some corner cases.
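The WL color-refinement loop is short enough to sketch directly (the adjacency-list encoding and the round count are illustrative choices; refinement runs on the disjoint union of the two graphs so that color ids are comparable across them):

```python
from collections import Counter

def wl_refine(adj, rounds=3):
    """1-WL color refinement: repeatedly relabel each node by its own color
    together with the sorted multi-set of its neighbors' colors."""
    colors = {v: 0 for v in adj}  # uniform initial colors
    for _ in range(rounds):
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        ids = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: ids[sigs[v]] for v in adj}
    return colors

def wl_maybe_isomorphic(g1, g2, rounds=3):
    """Refine the disjoint union of g1 and g2, then compare per-graph color counts.
    Assumes nodes are labeled 0..n-1."""
    n1 = len(g1)
    union = {v: list(nb) for v, nb in g1.items()}
    union.update({v + n1: [u + n1 for u in nb] for v, nb in g2.items()})
    colors = wl_refine(union, rounds)
    h1 = Counter(c for v, c in colors.items() if v < n1)
    h2 = Counter(c for v, c in colors.items() if v >= n1)
    return h1 == h2

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wl_maybe_isomorphic(path, star))        # False: WL distinguishes them

# Corner case: two triangles vs. a hexagon (both 2-regular) fool the WL test.
two_tri = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(wl_maybe_isomorphic(two_tri, hexagon))  # True: WL cannot tell them apart
```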
Example corner case: circular skip-link graphs Gskip(11, 2) and Gskip(11, 3). The two graphs look the same to the WL test because all the nodes have the same local subtree structure!
Follow-up work resolves such corner cases, but with exponential time complexity [Murphy+, ICML 2019].
¡ Graph classification: social and bio/chem graphs
Training accuracy of different GNN architectures. GIN fits training data much better than GCN, GraphSAGE.
Same trend across datasets!
GIN outperforms existing GNNs also in terms of test accuracy because it can better capture graph structure.
Data / Model | GIN (Powerful) | Sum-1 | GCN (Mean) | GraphSAGE (Max)
RDT-B | 92.4 | 90.0 | 50.0 | 50.0
IMDB-B | 75.1 | 74.1 | 74.0 | 72.3
NCI1 | 82.7 | 82.0 | 80.2 | 77.7
MUTAG | 89.4 | 90.0 | 85.6 | 85.1
Reddit dataset does not have node features!
¡ Existing GNNs use non-injective neighbor
aggregation, thus have low discriminative power
¡ GIN uses injective neighbor aggregation, and
is as discriminative as the WL graph isomorphism test
¡ GIN achieves state-of-the-art test
performance in graph classification
¡ Deep Neural Networks are vulnerable to
adversarial attacks!
¡ Attacks are often implemented as imperceptible
noise that changes the prediction
[Figure: imperceptible noise along the gradient direction changes the prediction]
¡ Adversaries are very common in applications of graph neural networks, e.g., search engines, recommender systems, social networks, etc.
¡ These adversaries will exploit any exposed
vulnerabilities!
Are GNNs robust to adversarial attacks?
¡ Here we focus on semi-supervised node
classification using Graph Convolutional Neural Networks (GCN) [Kipf+ ICLR’2017].
[Figure: GCN message passing on a partially labeled graph; ? = unlabeled nodes]
Input: Partially labeled attributed graph Goal: Predict labels of unlabeled nodes
A ∈ {0,1}^(N×N): adjacency matrix
X ∈ {0,1}^(N×D): (binary) node attributes
f(A, X) = softmax( Â ReLU( Â X W^(1) ) W^(2) )
Â ≡ D̃^(−1/2) (A + I) D̃^(−1/2): renormalized adjacency matrix
Classification Model:
Two-step GCN message passing
Training:
Minimize cross entropy loss on labeled data
Testing:
Apply the model to predict unlabeled data
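The classification model above can be sketched in numpy (the toy graph, the features, and the random weight matrices are placeholders for trained parameters):

```python
import numpy as np

def renormalized_adj(A):
    """Renormalized adjacency of Kipf & Welling: D̃^(-1/2) (A + I) D̃^(-1/2)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def gcn_forward(A, X, W1, W2):
    """Two-step GCN message passing: softmax(Â ReLU(Â X W1) W2)."""
    A_hat = renormalized_adj(A)
    H = np.maximum(A_hat @ X @ W1, 0.0)
    return softmax(A_hat @ H @ W2)

# Toy instance: 4 nodes, 3 binary attributes, 2 classes.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)
rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((3, 8)), rng.standard_normal((8, 2))

probs = gcn_forward(A, X, W1, W2)
print(probs.shape)  # (4, 2): a class distribution per node
```

Training would fit W1, W2 by minimizing the cross-entropy loss on the labeled rows of `probs`.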
Target node v ∈ V: node whose classification label we want to change. Attacker nodes S ⊂ V: nodes the attacker can modify.
¡ What are the attack possibilities in the real world?
Direct attack (S = {v}):
¡ Modify the target's features
¡ Add connections to the target
¡ Remove connections from the target
Examples: change website content; buy likes/followers; unfollow untrusted users.
Indirect attack (v ∉ S):
- Modify the attackers' features
- Add connections to the attackers
- Remove connections from the attackers
Examples: hijack friends of the target; create a link/spam farm.
How to mathematically formalize these attack possibilities?
¡ Zügner+, Adversarial Attacks on Neural Networks for
Graph Data, KDD’18
Maximize (Change of predicted labels of target node) Subject to (Limited noise in the graph)
[Figure: adding small graph noise around the target node flips the GCN's predicted class (Class 1 / 2 / 3)]
argmax_{A′, X′}  max_{c ≠ c_old}  [ log Z*_{v,c} − log Z*_{v,c_old} ]
where Z* = f_{θ*}(A′, X′) = softmax( Â′ ReLU( Â′ X′ W^(1) ) W^(2) ),  with θ* = argmin_θ L(θ; A′, X′)
s.t. (A′, X′) ≈ (A, X)
¡ Find a modified graph that maximizes the change of
predicted labels of target node
Let’s parse the objective function!
A′: modified adjacency matrix; X′: modified node features; c_old: original prediction; c: new prediction (specified by the attacker); v: target node.
Increase the log-likelihood of target node v being predicted as c; decrease the log-likelihood of target node v being predicted as c_old.
GCN is trained on modified graph, which is then used to predict labels of the target node.
The modified graph should be close to the original graph, e.g., a fixed budget of edge deletions/additions or node attribute perturbations.
¡ In practice, we cannot exactly solve the optimization problem because…
§ Graph modification is discrete (we cannot use simple gradient descent to optimize)
§ The inner loop involves expensive re-training of the GCN
¡ Some heuristics have been proposed to
efficiently obtain an approximate solution.
¡ For example:
§ Greedily choosing the graph modification step by step
§ Simplifying the GCN by removing the ReLU activation (to work in closed form)
§ Etc.
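A hedged sketch combining both heuristics: use the ReLU-free linearized surrogate Â²XW, and greedily flip whichever single edge most decreases the target's classification margin. This is a simplification in the spirit of Zügner+ KDD'18, not their exact procedure, and all numbers below are placeholders:

```python
import numpy as np

def renorm(A):
    At = A + np.eye(len(A))
    d = 1.0 / np.sqrt(At.sum(1))
    return At * d[:, None] * d[None, :]

def surrogate_logits(A, X, W):
    """Linearized two-layer GCN surrogate (ReLU removed): Â² X W."""
    Ah = renorm(A)
    return Ah @ Ah @ X @ W

def margin(A, X, W, v, c_old):
    """c_old logit minus the best other logit for target v (negative = misclassified)."""
    z = surrogate_logits(A, X, W)[v]
    return z[c_old] - np.delete(z, c_old).max()

def greedy_structure_attack(A, X, W, v, c_old, budget=3):
    """Greedily flip the single edge that most decreases the target's margin."""
    A = A.copy()
    for _ in range(budget):
        best, best_m = None, margin(A, X, W, v, c_old)
        for i in range(len(A)):
            for j in range(i + 1, len(A)):
                A[i, j] = A[j, i] = 1 - A[i, j]  # try flipping edge (i, j)
                m = margin(A, X, W, v, c_old)
                if m < best_m:
                    best, best_m = (i, j), m
                A[i, j] = A[j, i] = 1 - A[i, j]  # undo the trial flip
        if best is None:
            break                                # no single flip helps any more
        i, j = best
        A[i, j] = A[j, i] = 1 - A[i, j]          # commit the best flip
    return A

# Toy instance (random graph, features, and weights as placeholders):
rng = np.random.default_rng(1)
A = (rng.random((6, 6)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T
X = rng.random((6, 4))
W = rng.standard_normal((4, 3))
v = 0
c_old = int(np.argmax(surrogate_logits(A, X, W)[v]))

A_attacked = greedy_structure_attack(A, X, W, v, c_old, budget=3)
print(margin(A_attacked, X, W, v, c_old) <= margin(A, X, W, v, c_old))  # True
```

Because a flip is only committed when it strictly lowers the margin, the attacked graph's margin can never exceed the original one.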
More details in Zügner+ KDD’2018!
¡ Semi-supervised node classification with GCN
§ Class predictions for a single node, produced by 5 GCNs with different random initializations
[Figure: predicted class probabilities (clean) over classes 1–7; class 6 is correct. Classification margin > 0: correct classification; < 0: incorrect classification]
¡ The GCN prediction is easily manipulated by only 5 modifications of the graph structure (|V| ≈ 2k, |E| ≈ 5k)
[Figure: predicted class probabilities after the attack (5 modifications)]
GNNs are not robust to adversarial attacks!
¡ Chemistry: Molecular graphs
§ Molecular property prediction
¡ Biology: Protein-Protein Interaction Networks
§ Protein function prediction
GNN(molecule) = toxic?
GNN(protein) = adenylate cyclase activity?
¡ Scarcity of labeled data
§ Labels require expensive experiments → models overfit to small training datasets
¡ Out-of-distribution prediction
§ In scientific discovery, test examples are very different from training examples → models typically perform poorly
¡ Pre-training GNNs [Hu+ 2019]
[Figure: pre-train a GNN on relevant, easy-to-obtain graph data (unlabeled molecules from chemistry databases), endowing it with chemical knowledge; then fine-tune on downstream tasks 1…N]
¡ We have seen how to attack GNNs
¡ Open question: how to defend against the attacks?
Challenges:
¡ Tractable optimization on discrete graph data
¡ Achieving a good trade-off between accuracy and robustness