

SLIDE 1

CS224W: Machine Learning with Graphs
Jure Leskovec, Weihua Hu, Stanford University
http://cs224w.stanford.edu

SLIDE 2

12/3/19 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu

Output: Node embeddings. We can also embed larger network structures: subgraphs and whole graphs.

SLIDE 3

¡ Key idea: Generate node embeddings based on local network neighborhoods

[Figure: input graph; the target node's embedding is computed from its network neighborhood]

SLIDE 4

¡ Intuition: Nodes aggregate information from their neighbors using neural networks

[Figure: the target node's neighborhood unrolled into a tree of neural network aggregations]

SLIDE 5

Scarselli et al., 2009b; Battaglia et al., 2016; Defferrard et al., 2016; Duvenaud et al., 2015; Hamilton et al., 2017a; Kearnes et al., 2016; Kipf & Welling, 2017; Lei et al., 2017; Li et al., 2016; Velickovic et al., 2018; Verma & Zhang, 2018; Ying et al., 2018; Zhang et al., 2018

What's inside the box?

¡ Many model variants have been proposed with different choices of neural networks.

SLIDE 6

¡ Many model variants have been proposed with different choices of neural networks.

Graph Convolutional Networks [Kipf & Welling, ICLR'2017]:
Aggregation = Mean over neighbors → Linear → ReLU
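As a concrete (unofficial) sketch of this mean-aggregation update, a single GCN-style layer can be written in a few lines of NumPy; the self-loop handling and the all-ones weight matrix below are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN-style layer: mean over neighbors (incl. self), then Linear + ReLU.

    A: (n, n) adjacency matrix; H: (n, d_in) node features; W: (d_in, d_out) weights.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)  # neighborhood sizes
    H_agg = (A_hat @ H) / deg               # Mean aggregation over the neighborhood
    return np.maximum(H_agg @ W, 0.0)       # Linear transform + ReLU

# Toy 3-node path graph 0 - 1 - 2 with one-hot input features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H1 = gcn_layer(A, np.eye(3), np.ones((3, 2)))
```

With all-ones weights, each output entry is the sum of a degree-normalized row, i.e., exactly 1.0, which makes the layer easy to sanity-check.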

SLIDE 7

¡ Many model variants have been proposed with different choices of neural networks.

GraphSAGE [Hamilton+ NeurIPS'2017]:
Aggregation = MLP on each neighbor → element-wise Max
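For contrast, a max-pooling aggregator in the spirit of GraphSAGE might look like the following sketch; the single Linear + ReLU standing in for the MLP is a simplification:

```python
import numpy as np

def sage_max_layer(A, H, W):
    """GraphSAGE-style aggregation sketch: "MLP" on each neighbor, then element-wise max.

    A single Linear + ReLU stands in for the MLP; details are illustrative.
    """
    n = A.shape[0]
    msgs = np.maximum(H @ W, 0.0)      # "MLP" applied to every node's features
    out = np.zeros((n, W.shape[1]))
    for v in range(n):
        nbrs = np.flatnonzero(A[v])    # indices of v's neighbors
        out[v] = msgs[nbrs].max(axis=0)  # element-wise max over the neighborhood
    return out

# Star graph: node 0 connected to nodes 1 and 2.
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 3.0]])
Z = sage_max_layer(A, H, W=np.eye(2))
```

Here node 0 keeps only the element-wise maximum of its two neighbors' features, illustrating how max pooling discards multiplicity information.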

SLIDE 8

¡ Intuition: Network neighborhood defines a computation graph

Every node defines a computation graph based on its neighborhood!

SLIDE 9

¡ Obtain node representation by neighbor aggregation

SLIDE 10

¡ Obtain graph representation by pooling node representations

Pool (e.g., Sum, Average)
SLIDE 11

Graph Neural Networks have achieved state-of-the-art performance on:

¡ Node classification [Kipf+ ICLR'2017]
¡ Graph classification [Ying+ NeurIPS'2018]
¡ Link prediction [Zhang+ NeurIPS'2018]

SLIDE 12

Graph Neural Networks have achieved state-of-the-art performance on:

¡ Node classification [Kipf+ ICLR'2017]
¡ Graph classification [Ying+ NeurIPS'2018]
¡ Link prediction [Zhang+ NeurIPS'2018]

Are GNNs perfect? What are the limitations of GNNs?

SLIDE 13

¡ Some simple graph structures cannot be distinguished by conventional GNNs.

GCN and GraphSAGE fail to distinguish the two graphs. Assumption: input node features are uniform (denoted by the same node color).

SLIDE 14

¡ Some simple graph structures cannot be distinguished by conventional GNNs.
¡ GNNs are not robust to noise in graph data.

GCN and GraphSAGE fail to distinguish the two graphs. Assumption: input node features are uniform (denoted by the same node color).

Noise in graph:
  • 1. Node feature perturbation
  • 2. Edge addition/deletion

[Figure: a GNN's class prediction (Class 1 / Class 2 / Class 3) flips under small graph noise]

SLIDE 15

  • 1. Limitations of conventional GNNs in capturing graph structure
  • 2. Vulnerability of GNNs to noise in graph data
  • 3. Open questions & future directions

SLIDE 16

SLIDE 17

¡ Given two different graphs, can GNNs map them into different graph representations?
¡ This is an important condition for the classification scenario.

[Figure: two graphs fed through a GNN; are the two output representations different?]

SLIDE 18

¡ This is essentially the graph isomorphism test problem.
¡ No polynomial-time algorithm exists for the general case.
¡ GNNs may not perfectly distinguish all graphs!

SLIDE 19

¡ This is essentially the graph isomorphism test problem.
¡ No polynomial-time algorithm exists for the general case.
¡ GNNs may not perfectly distinguish all graphs.

How well can GNNs perform the graph isomorphism test?

This requires rethinking the mechanism by which GNNs capture graph structure.

SLIDE 20

¡ GNNs use different computational graphs to distinguish different graphs.

[Figure: nodes 1-4 of two graphs and their unrolled computational graphs]

SLIDE 21

¡ Node representations capture rooted subtree structures.

[Figure: each node's computational graph corresponds to the rooted subtree around it]

SLIDE 22

¡ The most discriminative GNNs map different subtrees into different node representations (denoted by different colors).

SLIDE 23

¡ The most discriminative GNNs map different subtrees into different node representations (denoted by different colors).

Key property: Injectivity

SLIDE 24

¡ A function is injective if it maps different elements to different outputs.

SLIDE 25

¡ The entire neighbor aggregation is injective if every step of neighbor aggregation is injective.

Injective → Injective → Entire function is injective!

SLIDE 26

¡ Neighbor aggregation is essentially a function over a multi-set (a set with repeating elements).

Neighbor aggregation ≡ Multi-set function
(In the examples of multi-sets, the same color indicates the same node features.)

SLIDE 27

¡ Neighbor aggregation is essentially a function over a multi-set (a set with repeating elements).

The discriminative power of GNNs can be characterized by that of multi-set functions.

Next: Analyzing GCN and GraphSAGE

SLIDE 28

Recall: GCN uses mean pooling (Mean pooling + Linear + ReLU).

SLIDE 29

Recall: GCN uses mean pooling (Mean pooling + Linear + ReLU).

GCN will fail to distinguish proportionally equivalent multi-sets.

Not injective!

SLIDE 30

Recall: GraphSAGE uses max pooling (MLP + Max pooling).

SLIDE 31

Recall: GraphSAGE uses max pooling (MLP + Max pooling).

GraphSAGE will even fail to distinguish multi-sets with the same distinct elements.

Not injective!

SLIDE 32

Recall: GraphSAGE uses max pooling (MLP + Max pooling).

GraphSAGE will even fail to distinguish multi-sets with the same distinct elements.

Not injective!

How can we design an injective multi-set function using neural networks?
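Both failure modes above, and why sum pooling avoids them, can be checked with a toy computation on scalar "features" (the values are illustrative):

```python
# Two different multi-sets of 1-D neighbor features.
a = [1.0, 2.0]              # multi-set {1, 2}
b = [1.0, 1.0, 2.0, 2.0]    # proportionally equivalent multi-set {1, 1, 2, 2}

# Mean pooling (GCN) collapses proportionally equivalent multi-sets:
same_under_mean = (sum(a) / len(a)) == (sum(b) / len(b))   # both means are 1.5

# Max pooling (GraphSAGE) collapses multi-sets with the same distinct elements:
c = [1.0, 2.0, 2.0]         # same distinct elements as a, different multiplicities
same_under_max = max(a) == max(c)                          # both maxima are 2.0

# Sum pooling keeps all three apart:
distinct_under_sum = (sum(a) != sum(b)) and (sum(a) != sum(c))
```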

SLIDE 33

Theorem: Any injective multi-set function can be expressed as
$\varphi\left(\sum_{y \in S} g(y)\right)$
where $S$ is the multi-set, $\varphi$ and $g$ are some non-linear functions, and the sum runs over the elements of the multi-set.
SLIDE 34

Theorem: Any injective multi-set function can be expressed as
$\varphi\left(\sum_{y \in S} g(y)\right)$

We can model $\varphi$ and $g$ using a Multi-Layer Perceptron (MLP). Note: the MLP is a universal approximator.

$\mathrm{MLP}_{\varphi}\left(\sum_{y \in S} \mathrm{MLP}_{g}(y)\right)$

SLIDE 35

Graph Isomorphism Network (GIN) [Xu+ ICLR'2019]: MLP + sum pooling
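A minimal sketch of the GIN-style update $h_v = \mathrm{MLP}\big((1+\epsilon)\,h_v + \sum_{u \in N(v)} h_u\big)$; here $\epsilon$ is fixed to 0 and a single Linear + ReLU stands in for the MLP, both simplifications of the published model:

```python
import numpy as np

def gin_layer(A, H, W, eps=0.0):
    """GIN-style update: h_v = MLP((1 + eps) * h_v + sum of neighbor features)."""
    H_agg = (1.0 + eps) * H + A @ H      # injective sum aggregation over neighbors
    return np.maximum(H_agg @ W, 0.0)    # stand-in "MLP": Linear + ReLU

# Triangle graph with identical scalar features on every node.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
Z = gin_layer(A, H=np.ones((3, 1)), W=np.ones((1, 1)))
```

Each node sees its own feature plus two neighbors' features, so every output is 3.0; unlike mean pooling, the sum preserves neighborhood size.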

SLIDE 36

Graph Isomorphism Network (GIN) [Xu+ ICLR'2019]: MLP + sum pooling

The GIN's neighbor aggregation is injective!

SLIDE 37

¡ Graph pooling is also a function over a multi-set. Sum pooling can give injective graph pooling!

[Figure: the node representations of each graph are combined by sum pooling]

SLIDE 38

Graph Isomorphism Network (GIN) [Xu+ ICLR'2019]: MLP + sum pooling

The GIN's neighbor aggregation is injective!

So far: GIN achieves maximal discriminative power by using injective neighbor aggregation. How powerful is this?

SLIDE 39

¡ GIN is closely related to the Weisfeiler-Lehman (WL) graph isomorphism test (1968).
¡ The WL test is known to be capable of distinguishing most real-world graphs.

SLIDE 40

¡ GIN is closely related to the Weisfeiler-Lehman (WL) graph isomorphism test (1968).
¡ The WL test is known to be capable of distinguishing most real-world graphs.
¡ Next: We will show GIN is as discriminative as the WL test.

SLIDE 41

¡ WL first maps different rooted subtrees to different colors.

This is similar to the injective neighbor aggregation in GIN.

SLIDE 42

¡ WL then counts the different colors.

This is similar to graph pooling in a GNN.

[Figure: color counts for the two graphs, e.g., 4x, 1x, 1x, 2x]

SLIDE 43

¡ Finally, WL compares the counts.

[Figure: the color histograms of the two graphs are compared]

SLIDE 44

¡ Finally, WL compares the counts.

The WL test and GIN are operationally equivalent: the graphs that the WL test can distinguish are exactly the graphs that GIN can distinguish.
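The three WL steps above (recolor by neighborhood, count colors, compare counts) can be sketched as a short color-refinement routine; it also reproduces the corner case discussed on the next slide, where regular graphs with identical local subtree structure are indistinguishable:

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """1-dimensional Weisfeiler-Lehman color refinement.

    adj: dict mapping node -> list of neighbors. Returns the final color histogram.
    """
    colors = {v: 0 for v in adj}  # uniform initial colors
    for _ in range(rounds):
        # New color = (own color, sorted multi-set of neighbor colors), compressed.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {v: palette[sigs[v]] for v in adj}
    return Counter(colors.values())

# WL distinguishes a triangle from a 3-node path:
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
wl_distinguishes = wl_histogram(triangle) != wl_histogram(path3)

# Corner case: a 6-cycle vs. two disjoint triangles (both 2-regular) look identical.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
wl_blind = wl_histogram(cycle6) == wl_histogram(two_triangles)
```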

SLIDE 45

Observation:
¡ GINs have the same discriminative power as the WL graph isomorphism test.
¡ The WL test has been known to distinguish most graphs, except for some corner cases.

[Figure: corner-case graphs Gskip(11, 2) and Gskip(11, 3)]

The two graphs look the same to the WL test because all the nodes have the same local subtree structure!

Follow-up work resolves such corner cases, but with exponential time complexity [Murphy+ ICML 2019].

SLIDE 46

¡ Graph classification: social and bio/chem graphs

Training accuracy of different GNN architectures: GIN fits the training data much better than GCN and GraphSAGE.

[Figure: training accuracy curves for the WL kernel, GIN, GCN, and GraphSAGE]

SLIDE 47

¡ Graph classification: social and bio/chem graphs

Training accuracy of different GNN architectures: the same trend holds across datasets!

[Figure: training accuracy curves for GIN, GraphSAGE, and GCN on multiple datasets]

SLIDE 48

¡ Graph classification: social and bio/chem graphs

GIN also outperforms existing GNNs in terms of test accuracy because it can better capture graph structure.

Data / Model | GIN (Powerful) | Sum-1 | GCN (Mean) | GraphSAGE (Max)
RDT-B | 92.4 | 90.0 | 50.0 | 50.0
IMDB-B | 75.1 | 74.1 | 74.0 | 72.3
NCI1 | 82.7 | 82.0 | 80.2 | 77.7
MUTAG | 89.4 | 90.0 | 85.6 | 85.1

(The Reddit dataset does not have node features!)

SLIDE 49

¡ Existing GNNs use non-injective neighbor aggregation and thus have low discriminative power.
¡ GIN uses injective neighbor aggregation and is as discriminative as the WL graph isomorphism test.
¡ GIN achieves state-of-the-art test performance in graph classification.

SLIDE 50

SLIDE 51

¡ Deep neural networks are vulnerable to adversarial attacks!
¡ Attacks are often implemented as imperceptible noise that changes the prediction.

[Figure: an adversarial perturbation along the gradient direction that changes the prediction]

SLIDE 52

¡ Adversaries are very common in applications of graph neural networks, e.g., search engines, recommender systems, social networks, etc.
¡ These adversaries will exploit any exposed vulnerabilities!

SLIDE 53

¡ Adversaries are very common in applications of graph neural networks, e.g., search engines, recommender systems, social networks.
¡ These adversaries will exploit any exposed vulnerabilities.

Are GNNs robust to adversarial attacks?

SLIDE 54

¡ Here we focus on semi-supervised node classification using Graph Convolutional Networks (GCN) [Kipf+ ICLR'2017].

Input: a partially labeled, attributed graph ("?" denotes unlabeled nodes)
Goal: predict the labels of the unlabeled nodes

[Figure: GCN message passing over a partially labeled graph]

SLIDE 55

$A \in \{0,1\}^{N \times N}$: adjacency matrix
$X \in \{0,1\}^{N \times D}$: (binary) node attributes

Classification model (two-step GCN message passing):
$f_{\theta}(A, X) = \mathrm{softmax}\left(\hat{A}\,\mathrm{ReLU}\left(\hat{A} X W^{(1)}\right) W^{(2)}\right)$
$\hat{A} \equiv \tilde{D}^{-1/2} (A + I)\, \tilde{D}^{-1/2}$: renormalized adjacency matrix

Training: minimize the cross-entropy loss on the labeled data
Testing: apply the model to predict the labels of unlabeled data
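The two-step model above can be sketched as a toy NumPy forward pass (random, untrained weights on a 3-node graph, purely for illustration):

```python
import numpy as np

def normalize_adj(A):
    """Renormalized adjacency: A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A, X, W1, W2):
    """Two-step GCN message passing: softmax(A_hat ReLU(A_hat X W1) W2)."""
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W1, 0.0)
    logits = A_hat @ H @ W2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # row-wise softmax over classes

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path
X = np.eye(3)                                                 # one-hot attributes
Z = gcn_forward(A, X, rng.normal(size=(3, 4)), rng.normal(size=(4, 2)))
```

Each row of `Z` is a probability distribution over the two classes for one node.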

SLIDE 56

Target node $v \in V$: the node whose classification label we want to change
Attacker nodes $S \subset V$: the nodes the attacker can modify

¡ What are the attack possibilities in the real world?

[Figure: a target node and two attacker nodes in the graph]

SLIDE 57

Target node $v \in V$: the node whose classification label we want to change
Attacker nodes $S \subset V$: the nodes the attacker can modify

Direct attack ($S = \{v\}$):
¡ Modify the target's features (e.g., change website content)
¡ Add connections to the target (e.g., buy likes/followers)
¡ Remove connections from the target (e.g., unfollow untrusted users)

¡ What are the attack possibilities in the real world?

SLIDE 58

Target node $v \in V$: the node whose classification label we want to change
Attacker nodes $S \subset V$: the nodes the attacker can modify

Direct attack ($S = \{v\}$):
¡ Modify the target's features (e.g., change website content)
¡ Add connections to the target (e.g., buy likes/followers)
¡ Remove connections from the target (e.g., unfollow untrusted users)

Indirect attack ($v \notin S$):
  • Modify the attackers' features (e.g., hijack friends of the target)
  • Add connections to the attackers (e.g., create a link/spam farm)
  • Remove connections from the attackers

¡ What are the attack possibilities in the real world?

SLIDE 59

Target node $v \in V$: the node whose classification label we want to change
Attacker nodes $S \subset V$: the nodes the attacker can modify

Direct attack ($S = \{v\}$):
¡ Modify the target's features (e.g., change website content)
¡ Add connections to the target (e.g., buy likes/followers)
¡ Remove connections from the target (e.g., unfollow untrusted users)

Indirect attack ($v \notin S$):
  • Modify the attackers' features (e.g., hijack friends of the target)
  • Add connections to the attackers (e.g., create a link/spam farm)
  • Remove connections from the attackers

How can we mathematically formalize these attack possibilities?

SLIDE 60

¡ Zügner+, Adversarial Attacks on Neural Networks for Graph Data, KDD'18

Maximize (the change of the predicted label of the target node)
Subject to (limited noise in the graph)

[Figure: adding small graph noise around the target node changes the GCN's predicted class]

SLIDE 61

¡ Find a modified graph that maximizes the change of the predicted label of the target node:

$\operatorname*{arg\,max}_{A', X'} \; \max_{c \neq c_{\mathrm{old}}} \; \log Z^{*}_{v,c} - \log Z^{*}_{v,c_{\mathrm{old}}}$

where $Z^{*} = f_{\theta^{*}}(A', X') = \mathrm{softmax}\left(\hat{A}'\,\mathrm{ReLU}\left(\hat{A}' X' W^{(1)}\right) W^{(2)}\right)$,
with $\theta^{*} = \operatorname*{arg\,min}_{\theta} \mathcal{L}(\theta; A', X')$,
subject to $(A', X') \approx (A, X)$.

Let's parse the objective function!

SLIDE 62

¡ Find a modified graph that maximizes the change of the predicted label of the target node.

In the objective: $A'$ is the modified adjacency matrix, $X'$ the modified node features, $c_{\mathrm{old}}$ the original prediction, $c$ the new prediction (specified by the attacker), and $v$ the target node.

SLIDE 63

¡ Find a modified graph that maximizes the change of the predicted label of the target node.

The objective increases the log-likelihood of the target node $v$ being predicted as $c$, and decreases the log-likelihood of $v$ being predicted as $c_{\mathrm{old}}$.

SLIDE 64

¡ Find a modified graph that maximizes the change of the predicted label of the target node.

Note that the GCN is trained on the modified graph ($\theta^{*} = \arg\min_{\theta} \mathcal{L}(\theta; A', X')$), and this trained model is then used to predict the label of the target node.

SLIDE 65

¡ Find a modified graph that maximizes the change of the predicted label of the target node.

The constraint $(A', X') \approx (A, X)$ means the modified graph should be close to the original graph, e.g., a fixed budget of edge deletions/additions or node attribute perturbations.

SLIDE 66

¡ In practice, we cannot exactly solve the optimization problem because:
§ Graph modification is discrete (we cannot use simple gradient descent to optimize)
§ The inner loop involves expensive re-training of the GCN

SLIDE 67

¡ Some heuristics have been proposed to efficiently obtain an approximate solution. For example:
§ Greedily choosing the step-by-step graph modification
§ Simplifying the GCN by removing the ReLU activation (to work in closed form)
§ Etc.

More details in Zügner+ KDD'2018!
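As a rough illustration of these heuristics (not Zügner et al.'s actual algorithm), the sketch below combines both ideas: a linearized two-layer GCN surrogate with the ReLU removed, and a greedy search over single-edge flips. The margin function, the budget, and the fixed (non-retrained) surrogate weights are all simplifying assumptions:

```python
import numpy as np

def surrogate_margin(A, X, W, target, c_old):
    """Linearized 2-layer GCN surrogate (ReLU removed): logits = A_hat^2 X W.

    Returns the target node's margin: logit of c_old minus the best other class.
    """
    A_tilde = A + np.eye(A.shape[0])
    d = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d[:, None] * d[None, :]
    logits = A_hat @ A_hat @ X @ W
    others = np.delete(logits[target], c_old)
    return logits[target, c_old] - others.max()

def greedy_attack(A, X, W, target, c_old, budget=2):
    """Greedily flip, one step at a time, the edge that most lowers the margin."""
    A = A.copy()
    n = A.shape[0]
    for _ in range(budget):
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                A2 = A.copy()
                A2[i, j] = A2[j, i] = 1 - A2[i, j]  # flip edge (i, j)
                m = surrogate_margin(A2, X, W, target, c_old)
                if best is None or m < best[0]:
                    best = (m, i, j)
        _, i, j = best
        A[i, j] = A[j, i] = 1 - A[i, j]
    return A

rng = np.random.default_rng(1)
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0],
              [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)  # 4-node path graph
X = np.eye(4)
W = rng.normal(size=(4, 3))                 # illustrative surrogate weights
m_before = surrogate_margin(A, X, W, target=0, c_old=0)  # pretend class 0 is clean prediction
A_adv = greedy_attack(A, X, W, target=0, c_old=0, budget=2)
m_after = surrogate_margin(A_adv, X, W, 0, 0)
```

With an even budget, undoing the previous flip is always among the candidates in the next step, so the final margin can never exceed the clean one.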

SLIDE 68

¡ Semi-supervised node classification with GCN
§ Class predictions for a single node, produced by 5 GCNs with different random initializations

[Figure: predicted probabilities (clean) over Classes 1-7, with Class 6 correct; classification margin > 0 means correct classification, < 0 incorrect]

SLIDE 69

¡ The GCN's prediction is easily manipulated by only 5 modifications of the graph structure (|V| ≈ 2k, |E| ≈ 5k).

[Figure: predicted probabilities after the attack (5 modifications), Classes 1-7]

SLIDE 70

¡ The GCN's prediction is easily manipulated by only 5 modifications of the graph structure.

GNNs are not robust to adversarial attacks!

SLIDE 71

SLIDE 72

¡ Chemistry: molecular graphs
§ Molecular property prediction: GNN(molecule) = toxic?

¡ Biology: protein-protein interaction networks
§ Protein function prediction: GNN(protein) = adenylate cyclase activity?

SLIDE 73

¡ Scarcity of labeled data
§ Labels require expensive experiments → models overfit to small training datasets

¡ Out-of-distribution prediction
§ In scientific discovery, test examples are very different from training examples → models typically perform poorly

SLIDE 74

¡ Pre-training GNNs [Hu+ 2019]

Pre-train GNNs on relevant, easy-to-obtain graph data (e.g., unlabeled molecules from a chemistry database), endowing the GNN with chemical knowledge; then fine-tune on downstream tasks 1, 2, …, N.

SLIDE 75

¡ We have seen how to attack GNNs
¡ Open question: how to defend against the attacks?

Challenges:
¡ Tractable optimization on discrete graph data
¡ Achieving a good trade-off between accuracy and robustness