Graph Representation Learning with Graph Convolutional Networks - PowerPoint PPT Presentation


slide-1
SLIDE 1

Graph Representation Learning with Graph Convolutional Networks

Jure Leskovec

slide-2
SLIDE 2

Networks: Common Language

Jure Leskovec, Stanford University

Peter Mary Albert Tom

co-worker friend brothers friend

Protein 1 Protein 2 Protein 5 Protein 9

Movie 1 Movie 3 Movie 2

Actor 3 Actor 1 Actor 2 Actor 4

|N|=4 |E|=4

slide-3
SLIDE 3

Example: Node Classification

Many possible ways to create node features: § Node degree, PageRank score, motifs, … § Degree of neighbors, PageRank of neighbors, …
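As a toy illustration of the hand-engineered features listed above, here is a minimal sketch (names and the 4-node graph are made up) that computes node degree and PageRank with plain numpy:

```python
# Hand-engineered node features for a toy graph: degree and PageRank,
# two of the feature types listed above. Illustrative sketch only.
import numpy as np

# Tiny undirected graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def degree_features(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}

def pagerank(adj, damping=0.85, iters=100):
    """Basic power-iteration PageRank over the adjacency list."""
    n = len(adj)
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = np.full(n, (1 - damping) / n)
        for v, nbrs in adj.items():
            share = damping * pr[v] / len(nbrs)  # spread v's mass to neighbors
            for u in nbrs:
                new[u] += share
        pr = new
    return pr

deg = degree_features(adj)
pr = pagerank(adj)
# Each node's feature vector: [degree, PageRank].
features = {v: [deg[v], pr[v]] for v in adj}
```

The point of the slide is exactly that such features must be designed by hand, per task.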


Machine Learning

slide-4
SLIDE 4

Machine Learning Lifecycle


Network Data → (Feature Engineering) → Node Features → Learning Algorithm → Model → Downstream prediction task

Automatically learn the features

(Supervised) Machine Learning Lifecycle: This feature, that feature. Every single time!


slide-5
SLIDE 5

Feature Learning in Graphs

This talk: Feature learning for networks!


Map each node u to a d-dimensional vector: f : u → ℝ^d

Feature representation, embedding

slide-6
SLIDE 6


GraphSAGE: Graph Convolutional Networks


Inductive Representation Learning on Large Graphs.

  • W. Hamilton, R. Ying, J. Leskovec. Neural Information Processing Systems (NIPS), 2017.

Representation Learning on Graphs: Methods and Applications.

  • W. Hamilton, R. Ying, J. Leskovec. IEEE Data Engineering Bulletin, 2017.
slide-7
SLIDE 7

From Images to Networks

Single CNN layer with 3x3 filter:

(Animation: Vincent Dumoulin)

Image | Graph

Transform information at the neighbors and combine it:

§ Transform “messages” h_u from neighbors: W h_u

§ Add them up: Σ_u W h_u
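The neighborhood aggregation above can be sketched in a few lines of numpy (the dimensions and random features are made up for illustration):

```python
# Transform each neighbor "message" h_u with a shared weight matrix W,
# then add the transformed messages up: sum_u W @ h_u.
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=(d, d))                    # shared transform for this layer
h = {u: rng.normal(size=d) for u in range(5)}  # node features h_u
neighbors_of_v = [1, 2, 4]

# Aggregate the neighborhood of v.
combined = sum(W @ h[u] for u in neighbors_of_v)
```

Because the transform is linear, summing transformed messages equals transforming the summed messages, which is why a single shared W suffices.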

slide-8
SLIDE 8

Real-World Graphs

But what if your graphs look like this?


§ Examples:

Social networks, Information networks, Knowledge graphs, Communication networks, Web graph, …

slide-9
SLIDE 9

A Naïve Approach

§ Join adjacency matrix and features
§ Feed them into a deep neural net
§ Issues with this idea:

§ O(N) parameters
§ Not applicable to graphs of different sizes
§ Not invariant to node ordering
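The node-ordering problem is easy to demonstrate. In this sketch (toy 3-node graph, made-up features), relabeling the nodes of the *same* graph changes the flattened "[adjacency | features]" input the naive model would see:

```python
# Why the naive "flatten [A | X] and feed to a neural net" idea fails:
# relabeling the nodes changes the input vector, so the model is not
# invariant to node ordering.
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])            # adjacency matrix
X = np.array([[1.0], [2.0], [3.0]])  # one feature per node

perm = [2, 0, 1]                     # relabel the same graph
A_perm = A[np.ix_(perm, perm)]       # permute rows and columns together
X_perm = X[perm]

flat = np.concatenate([A, X], axis=1).ravel()
flat_perm = np.concatenate([A_perm, X_perm], axis=1).ravel()
# Same graph, different input vector:
assert not np.array_equal(flat, flat_perm)
```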

[Figure: 5×5 adjacency matrix over nodes A–E with per-node features appended, fed into a neural net. Done?]

slide-10
SLIDE 10

Graph Convolutional Networks

§ Graph Convolutional Networks:


Niepert, Mathias, Mohamed Ahmed, and Konstantin Kutzkov. "Learning convolutional neural networks for graphs." ICML. 2016. (image source)

§ Problem: For a given subgraph, how do we come up with a canonical node ordering?

slide-11
SLIDE 11

Desiderata

§ Invariant to node ordering

§ No graph isomorphism problem

§ Locality – operations depend on the neighbors of a given node

§ Number of model parameters should be independent of graph size

§ Model should be independent of graph structure and we should be able to transfer the model across graphs

slide-12
SLIDE 12

GraphSAGE

§ Adapt the GCN idea to inductive node embedding
§ Generalize beyond simple convolutions
§ Demonstrate that this generalization
§ Leads to significant performance gains
§ Allows the model to learn about local structures


slide-13
SLIDE 13

Idea: Graph defines computation

Learn how to propagate information across the graph to compute node features

§ Determine node computation graph
§ Propagate and transform information

Idea: Node’s neighborhood defines a computation graph

Semi-Supervised Classification with Graph Convolutional Networks. T. N. Kipf, M. Welling, ICLR 2017

slide-14
SLIDE 14

Our Approach: GraphSAGE


[Computation graph with weight matrices W(1), Q(1) at level 1 and W(2), Q(2) at level 2]

§ Each node defines its own computational graph

§ Each edge in this graph is a transformation/aggregation function

slide-15
SLIDE 15

Our Approach: GraphSAGE


[Computation graph with weight matrices W(1), Q(1) at level 1 and W(2), Q(2) at level 2]

Update for node v:

h_v^(ℓ+1) = σ( W^(ℓ) h_v^(ℓ) ,  σ( Q^(ℓ) Σ_{u∈N(v)} h_u^(ℓ) ) )

§ h_v^(0) = attributes of node v
§ Σ(⋅): Aggregator function (e.g., avg., LSTM, max-pooling)
§ The first term transforms v’s own features from level ℓ; the second transforms and aggregates the features of the neighbors N(v); together they give the (ℓ+1)-st level features of node v

Semi-Supervised Classification with Graph Convolutional Networks. T. N. Kipf, M. Welling, ICLR 2017
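A minimal numpy sketch of the node update above, assuming sum aggregation, ReLU for σ, and concatenation of the two terms (the aggregator and nonlinearity are pluggable; this is not the paper's reference code):

```python
# One GraphSAGE-style update for a single node v:
#   h_v^(l+1) = relu([ W @ h_v^(l) , relu(Q @ sum_{u in N(v)} h_u^(l)) ])
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sage_update(v, adj, h, W, Q):
    agg = np.sum([h[u] for u in adj[v]], axis=0)   # sum-aggregate the neighbors
    return relu(np.concatenate([W @ h[v], relu(Q @ agg)]))

rng = np.random.default_rng(1)
d = 3
adj = {0: [1, 2], 1: [0], 2: [0]}
h = {v: rng.normal(size=d) for v in adj}           # h^(0): node attributes
W, Q = rng.normal(size=(d, d)), rng.normal(size=(d, d))

h0_next = sage_update(0, adj, h, W, Q)             # 2*d-dimensional level-1 features
```

Note the output dimension doubles because the self term and the neighbor term are concatenated.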

slide-16
SLIDE 16

GraphSAGE Algorithm

K = “search depth”
§ Initialize representations as features
§ Aggregate information from neighbors
§ Concatenate neighborhood info with current representation and propagate
§ Classification (cross-entropy) loss
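The forward pass of those steps can be sketched as follows (hedged sketch: sum aggregation and ReLU are assumptions, and the cross-entropy training loop is omitted):

```python
# K-step GraphSAGE-style forward pass: initialize h^0 with features,
# then for each depth aggregate neighbors, concatenate with the current
# representation, transform, and propagate.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def graphsage_forward(adj, feats, Ws, Qs):
    """K = len(Ws) is the 'search depth'; one (W, Q) pair per depth."""
    h = {v: np.asarray(feats[v], dtype=float) for v in adj}   # h^0 = features
    for W, Q in zip(Ws, Qs):
        h_new = {}
        for v in adj:
            agg = np.sum([h[u] for u in adj[v]], axis=0)      # aggregate neighbors
            h_new[v] = relu(np.concatenate([W @ h[v], relu(Q @ agg)]))
        h = h_new                                             # propagate to next depth
    return h
```

In the full algorithm these embeddings would feed a classifier trained with the cross-entropy loss mentioned above.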

slide-17
SLIDE 17

WL isomorphism test

§ The classic Weisfeiler-Lehman graph isomorphism test is a special case of GraphSAGE § We replace the hash function with trainable neural nets:
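For reference, a WL color-refinement step looks like this (minimal sketch): each node's label is rehashed together with the multiset of its neighbors' labels, which is exactly the slot where GraphSAGE substitutes a trainable aggregate-and-transform.

```python
# One round of Weisfeiler-Lehman color refinement: relabel each node by
# hashing its own label with the sorted multiset of neighbor labels.
def wl_iteration(adj, labels):
    return {v: hash((labels[v], tuple(sorted(labels[u] for u in adj[v]))))
            for v in adj}

def wl_colors(adj, init_labels, rounds=3):
    labels = dict(init_labels)
    for _ in range(rounds):
        labels = wl_iteration(adj, labels)
    return labels
```

On a path graph with uniform initial labels, one round already separates the endpoints (one neighbor) from the middle node (two neighbors).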


Shervashidze, Nino, et al. "Weisfeiler-Lehman graph kernels." Journal of Machine Learning Research (2011).


slide-18
SLIDE 18

GraphSAGE: Training

§ Assume parameter sharing:

[Computation graph: the same W(1), Q(1) and W(2), Q(2) matrices are shared across all nodes at each level]

§ Two types of parameters:
§ Aggregate function can have params.
§ Matrix W(k)

§ Adapt to inductive setting (e.g., unsupervised loss, neighborhood sampling, minibatch optimization)
§ Generalized notion of “aggregating neighborhood”
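Of the adaptations listed, neighborhood sampling is the simplest to show: cap each node's neighborhood at a fixed fan-out so a minibatch's cost is bounded on large graphs (hedged sketch; the fan-out value here is made up):

```python
# Sample at most `fanout` neighbors per node, so minibatch computation
# graphs stay bounded regardless of node degree.
import random

def sample_neighbors(adj, v, fanout, rng):
    nbrs = adj[v]
    if len(nbrs) <= fanout:
        return list(nbrs)
    return rng.sample(nbrs, fanout)    # uniform sample without replacement

rng = random.Random(0)
adj = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
sampled = sample_neighbors(adj, 0, fanout=2, rng=rng)
```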

slide-19
SLIDE 19

GraphSAGE: Benefits

§ Can use different aggregators:

§ Mean (simple element-wise mean), LSTM (applied to a random order of nodes), Max-pooling (element-wise max)

§ Can use different loss functions:

§ Cross entropy, hinge loss, ranking loss

§ Model has a constant number of parameters
§ Fast scalable inference
§ Can be applied to any node in any network


slide-20
SLIDE 20

GraphSAGE Performance: Experiments

§ Compare GraphSAGE to alternative methods

§ Logistic regression on features (no network information)
§ node2vec, extended node2vec with features

§ Task: Node classification, transfer learning

§ Citation graph: 302,424 papers from 2000-05
§ Predict 6 subject codes; train on 2000-04, test on ‘05

§ Reddit posts: 232,965 posts, 50 communities, Sep ‘14
§ What community does a post belong to? Train on first 20 days, test on remaining 10 days

§ Protein-protein interaction networks: 24 PPI networks from different tissues
§ Transfer learning of protein function: train on 20 networks, test on 2

DARPA SIMPLEX PI Meeting, February 6, 2018, MINER Project

slide-21
SLIDE 21

GraphSAGE Performance: Results

GraphSAGE performs best in all experiments. Achieves ~40% average improvement over raw features.


slide-22
SLIDE 22

Application: Pinterest

Human curated collection of pins


§ Pin: A visual bookmark someone has saved from the internet to a board they’ve created (image, text, link).
§ Board: A greater collection of ideas (pins having sth. in common).

slide-23
SLIDE 23

Large-Scale Application

§ Semi-supervised node embedding for graph-based recommendations
§ Graph: 2B pins, 1B boards, 20B edges

[Figure: bipartite graph of Pins and Boards; query pin Q]

slide-24
SLIDE 24

Pinterest Graph

§ Graph is dynamic: need to apply to new nodes without model retraining
§ Rich node features: content, image

slide-25
SLIDE 25

Task: Item-Item Recs

Related Pin recommendations:

§ Given a user is looking at pin Q, what pin X are they going to save next?

[Training examples: Query, Positive, Hard negative, Random negative]

slide-26
SLIDE 26

GraphSAGE Training

§ Leverage inductive capability, and train on individual subgraphs

§ 300 million nodes, 1 billion edges, 1.2 billion pin pairs (Q, X)

§ Large batch size: 2048 per minibatch


slide-27
SLIDE 27

GraphSAGE: Inference

§ Use MapReduce for model inference
§ Avoids repeated computation
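The "avoids repeated computation" point can be sketched independently of MapReduce: at inference, compute every node's layer-k embedding exactly once per layer and let all neighbors reuse it, instead of recomputing it inside each node's own computation graph (hedged sketch; the toy layer function is made up):

```python
# Layer-wise inference: one pass per layer over all nodes; each node's
# embedding at layer k is computed once and shared by every neighbor.
def layerwise_inference(adj, h0, layer_fn, K):
    h = dict(h0)
    for _ in range(K):
        # All updates in a layer read the previous layer's embeddings.
        h = {v: layer_fn(h[v], [h[u] for u in adj[v]]) for v in adj}
    return h

# Toy layer: add the sum of neighbor values to your own.
adj = {0: [1], 1: [0, 2], 2: [1]}
h = layerwise_inference(adj, {0: 1.0, 1: 2.0, 2: 3.0},
                        lambda hv, nbrs: hv + sum(nbrs), K=1)
```

In a MapReduce setting each layer pass becomes one map-and-join over the node table.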


slide-28
SLIDE 28

Experiments

Related Pin recommendations:

§ Given a user is looking at pin Q, predict what pin X they are going to save next
§ Baselines for comparison:

§ Visual: VGG-16 visual features
§ Annotation: Word2Vec model
§ Combined: combine visual and annotation
§ RW: Random-walk based algorithm
§ GraphSAGE

§ Setup: Embed 2B pins, perform nearest neighbor to generate recommendations


slide-29
SLIDE 29

Results: Ranking

Task: Given Q, rank X as high as possible among 2B pins

§ Hit-rate: percentage of times the positive pin X was among the top k
§ MRR: Mean reciprocal rank
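The two ranking metrics just defined are easy to pin down in code (sketch; variable names are made up): hit-rate@k is the fraction of queries whose positive item lands in the top k, and MRR averages 1/rank of the positive item.

```python
# ranked_lists[i] is the ranked candidate list for query i;
# positives[i] is the held-out positive item for query i.
def hit_rate_at_k(ranked_lists, positives, k):
    hits = sum(1 for r, p in zip(ranked_lists, positives) if p in r[:k])
    return hits / len(positives)

def mean_reciprocal_rank(ranked_lists, positives):
    # rank is 1-based: an item at index 0 contributes 1/1.
    return sum(1.0 / (r.index(p) + 1)
               for r, p in zip(ranked_lists, positives)) / len(positives)
```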


Method      Hit-rate  MRR
Visual      17%       0.23
Annotation  14%       0.19
Combined    27%       0.37
GraphSAGE   46%       0.56

slide-30
SLIDE 30

Example Recommendations


slide-31
SLIDE 31

GraphSAGE: Summary

§ Graph Convolution Networks

§ Generalize beyond simple convolutions

§ Fuses node features & graph info

§ State-of-the-art accuracy for node classification and link prediction.

§ Model size independent of graph size; can scale to billions of nodes

§ Largest embedding to date (3B nodes, 20B edges)

§ Leads to significant performance gains


slide-32
SLIDE 32


How can this technology be used for biomedical problems?

§ Two examples:

§ Pairs of nodes: Predicting side-effects of drug combinations
§ Subgraph prediction: Predicting which drug treats what disease

Modeling polypharmacy side effects with graph convolutional networks. M. Zitnik, M. Agrawal, J. Leskovec. BioArxiv, 2017.

slide-33
SLIDE 33

Polypharmacy Side Effects

[Figure: a patient’s medications (drug combination) and the patient’s side effects (polypharmacy side effect)]

slide-34
SLIDE 34

Polypharmacy Side Effects

[Figure: a patient’s medications and the patient’s side effects]

§ Polypharmacy is common to treat complex diseases and co-existing conditions
§ High risk of side effects due to interactions
§ 15% of the U.S. population affected
§ Annual costs exceed $177 billion

§ Difficult to identify manually:

§ Rare, occur only in a subset of patients
§ Not observed in clinical testing

slide-35
SLIDE 35

Network & Indications Data

§ Idea: Construct a heterogeneous graph of drugs and proteins
§ Train: Fit a model to predict known associations of drug pairs and side effects
§ Test: Given a query drug pair, predict candidate polypharmacy side effects

Data:

§ Protein-protein interaction network [Menche et al. Science 15]
§ 19K nodes, 350K edges

§ Drug-protein and disease-protein links:
§ 9K proteins, 800K drug-protein links

§ Drug side effects: SIDER, OFFSIDES, TWOSIDES

slide-36
SLIDE 36

Heterogeneous Graph

slide-37
SLIDE 37

Link Prediction Task

§ Predict labeled edges between drugs
§ Given a drug pair (c, d), predict how likely an edge (c, r_i, d) exists
§ Meaning: Drug combination (c, d) leads to polypharmacy side effect r_i

slide-38
SLIDE 38

Neural Architecture: Encoder

Graph encoder:

§ Input: graph, additional node features
§ Output: node embeddings

slide-39
SLIDE 39

Neural Architecture: Decoder

Graph decoder:

§ Input: Query drug pairs and their embeddings
§ Output: predicted links
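A hedged sketch of such a decoder (the actual model's decoder may differ): given the encoder's embeddings z_c, z_d of a query drug pair, score the candidate edge (c, r_i, d) with a per-side-effect bilinear form D_i.

```python
# Bilinear link decoder: probability-like score for side effect r_i
# between drugs c and d, from their embeddings and a learned matrix D_i.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score_edge(z_c, z_d, D_i):
    return sigmoid(z_c @ D_i @ z_d)   # score in (0, 1)
```

One matrix per side-effect type lets the same drug embeddings support many edge labels.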

slide-40
SLIDE 40

Prediction Performance

§ Up to 54% improvement over baselines
§ First approach to computationally identify side effects of drug combinations


slide-41
SLIDE 41


How can this technology be used for biomedical problems?

§ Two examples:

§ Pairs of nodes: Predicting side-effects of drug combinations
§ Subgraph prediction: Predicting which drug treats what disease

slide-42
SLIDE 42

Prediction Problem

Goal: Predict which diseases a new drug (molecule) could treat

Graph convolutional drug repurposing


slide-43
SLIDE 43

Insight: Networks

§ Subgraphs of disease-associated proteins § Subgraphs of drug target proteins


slide-44
SLIDE 44

A Rationale for Graphs

A drug is likely to treat a disease if they are nearby in “pharmacological space”

[Menche et al. Science 2015; Guney et al. Nat Commun 2016; Hodos et al. Systems Biology and Medicine 2016]

slide-45
SLIDE 45

Link Prediction on Subgraphs

§ Drug repurposing: Link prediction problem on subgraphs
§ Predict new indications:

§ Obtain subgraphs by projecting drug and disease on the graph
§ Predict links between subgraphs


slide-46
SLIDE 46

SUGAR: Message Passing

Embedding for a subgraph:


slide-47
SLIDE 47

Neural Network Model


slide-48
SLIDE 48

Network & Indications Data

§ Protein-protein interaction network culled from 15 knowledge databases [Menche et al. Science 15]
§ 19K nodes, 350K edges

§ Drug-protein and disease-protein links:
§ DrugBank, OMIM, DisGeNET, STITCH DB and others
§ 5K drugs, 20K diseases
§ 20K drug-protein links, 560K disease-protein links

§ Drug medical indications:
§ DrugBank, MEDI-HPS, DailyMed, RepoDB and others
§ 6K drug-disease indications

§ Side information: Molecular pathways, disease symptoms, side effects

slide-49
SLIDE 49

Experimental Setup

§ Disease-centric cross-validation
§ For each cross-validation fold:

§ Exclude all indications of test diseases
§ Use the remaining data to train a model

§ Query: Given a disease, rank all drugs based on scores returned by the model


slide-50
SLIDE 50

Experimental Results

Comparison to current state of the art:
§ Up to 49% improvement over methods for drug repurposing
§ Up to 172% improvement over methods for scoring drug-disease pairs

slide-51
SLIDE 51

Integrating Side Information

Including additional biomedical knowledge:

genetics, molecular pathways, metabolic pathways

slide-52
SLIDE 52

Drug Repurposing @ SPARK


Drug               Disease                           Rank
N-acetyl-cysteine  cystic fibrosis                   14/5000
Xamoterol          neurodegeneration                 26/5000
Plerixafor         cancer                            54/5000
Sodium selenite    cancer                            36/5000
Ebselen            C difficile                       10/5000
Itraconazole       cancer                            26/5000
Bestatin           lymphedema                        11/5000
Bestatin           pulmonary arterial hypertension   16/5000
Ketaprofen         lymphedema                        28/5000
Sildenafil         lymphatic malformation            26/5000
Tacrolimus         pulmonary arterial hypertension   46/5000
Benzamil           psoriasis                         114/5000
Carvedilol         Chagas’ disease                   9/5000
Benserazide        BRCA1 cancer                      41/5000
Pioglitazone       interstitial cystitis             13/5000
Sirolimus          dystrophic epidermolysis bullosa  46/5000

Given C difficile, where does Ebselen rank among all approved drugs?

slide-53
SLIDE 53

SUGAR’s Predictions

Drug               Disease                           Rank
N-acetyl-cysteine  cystic fibrosis                   14/5000
Xamoterol          neurodegeneration                 26/5000
Plerixafor         cancer                            54/5000
Sodium selenite    cancer                            36/5000
Ebselen            C difficile                       10/5000
Itraconazole       cancer                            26/5000
Bestatin           lymphedema                        11/5000
Bestatin           pulmonary arterial hypertension   16/5000
Ketaprofen         lymphedema                        28/5000
Sildenafil         lymphatic malformation            26/5000
Tacrolimus         pulmonary arterial hypertension   46/5000
Benzamil           psoriasis                         114/5000
Carvedilol         Chagas’ disease                   9/5000
Benserazide        BRCA1 cancer                      41/5000
Pioglitazone       interstitial cystitis             13/5000
Sirolimus          dystrophic epidermolysis bullosa  46/5000

Higher rank is better. Example: SUGAR predicted Ebselen as the 10th most likely candidate drug for C difficile.

slide-54
SLIDE 54

Conclusion

Results from the past 1-2 years have shown:

§ Representation learning paradigm can be extended to graphs
§ No feature engineering necessary
§ Can effectively combine node attribute data with the network information
§ State-of-the-art results in a number of domains/tasks
§ Use end-to-end training instead of multi-stage approaches for better performance


slide-55
SLIDE 55

Conclusion

Next steps:
§ Multimodal & dynamic/evolving settings
§ Domain-specific adaptations (e.g. for recommender systems)
§ Graph generation
§ Prediction beyond simple pairwise edges

§ Multi-hop edge prediction

§ Theory


slide-56
SLIDE 56


PhD Students Post-Doctoral Fellows Funding Collaborators Industry Partnerships

Claire Donnat Mitchell Gordon David Hallac Emma Pierson Himabindu Lakkaraju Rex Ying Tim Althoff Will Hamilton David Jurgens Marinka Zitnik Michele Catasta Srijan Kumar Stephen Bach Rok Sosic

Research Staff

Peter Kacin Dan Jurafsky, Linguistics, Stanford University Christian Danescu-Miculescu-Mizil, Information Science, Cornell University Stephen Boyd, Electrical Engineering, Stanford University David Gleich, Computer Science, Purdue University VS Subrahmanian, Computer Science, University of Maryland Sarah Kunz, Medicine, Harvard University Russ Altman, Medicine, Stanford University Jochen Profit, Medicine, Stanford University Eric Horvitz, Microsoft Research Jon Kleinberg, Computer Science, Cornell University Sendhill Mullainathan, Economics, Harvard University Scott Delp, Bioengineering, Stanford University Jens Ludwig, Harris Public Policy, University of Chicago Geet Sethi


slide-57
SLIDE 57

References

§ node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. KDD 2016.
§ Predicting multicellular function through multi-layer tissue networks. M. Zitnik, J. Leskovec. Bioinformatics, 2017.
§ Inductive Representation Learning on Large Graphs. W. Hamilton, R. Ying, J. Leskovec. NIPS 2017.
§ Representation Learning on Graphs: Methods and Applications. W. Hamilton, R. Ying, J. Leskovec. IEEE Data Engineering Bulletin, 2017.
§ Modeling polypharmacy side effects with graph convolutional networks. M. Zitnik, M. Agrawal, J. Leskovec. BioArxiv, 2017.

§ Code:
§ http://snap.stanford.edu/node2vec
§ http://snap.stanford.edu/graphsage
