Graph Representation Learning: Embedding, GNNs, and Pre-Training - PowerPoint PPT Presentation

Graph Representation Learning: Embedding, GNNs, and Pre-Training Yuxiao Dong https://ericdongyx.github.io/ Microsoft Research, Redmond

Joint Work with Jiezhong Qiu Ziniu Hu Hongxia Yang Jing Zhang Tsinghua UCLA Alibaba Renmin U. of China (Jie Tang) (Yizhou Sun) Jie Tang Yizhou Sun Hao Ma Kuansan Wang Tsinghua UCLA Facebook AI Microsoft Research

Why Graphs?

Graphs Office/ Of ice/So Social cial Gr Graph aph Bi Biol ologi ogical Ne Neura ral Networks ks Acad Academ emic Gr Graph aph Knowledge Graph Kno In Internet Transp Tr sportation figure credit: Web

The Graph Mining Paradigm 𝑦 !" : node 𝑤 ! ’s 𝑘 #$ feature, e.g., 𝑤 ! ’s pagerank value Graph & Network applications • Node classification • Link prediction X • Community detection • Anomaly detection • Social influence • Graph evolution hand-crafted feature matrix • … … feature engineering machine learning models Structural Diversity and Homophily: A Study Across More Than One Hundred Big Networks. KDD 2017.

Graph Representation Learning Graph & Network applications • Node classification • Link prediction Z • Community detection • Anomaly detection • Social influence • Graph evolution hand-crafted latent feature matrix • … … machine learning models Feature engineering learning Input: a network 𝐻 = (𝑊, 𝐹) • Output: 𝒂 ∈ 𝑆 ! ×# , 𝑙 ≪ |𝑊| , 𝑙 -dim vector 𝒂 $ for each node v . •

Application: Embedding Heterogeneous Academic Graph Graph Representation Learning Academic Graph 1. https://academic.microsoft.com/ 2. Kuansan Wang et al. Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies 1 (1), 396-413, 2020 3. Dong et al. metapath2vec: scalable representation learning for heterogeneous networks. In KDD 2017. 4. Code & data for metapath2vec: https://ericdongyx.github.io/metapath2vec/m2v.html

Application: Similarity Search & Recommendation Johns Hopkins Harvard Stanford UChicago Yale Columbia 1. https://academic.microsoft.com/ 2. Kuansan Wang et al. Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies 1 (1), 396-413, 2020 3. Dong et al. metapath2vec: scalable representation learning for heterogeneous networks. In KDD 2017. 4. Code & data for metapath2vec: https://ericdongyx.github.io/metapath2vec/m2v.html

Application: Reasoning about Diabetes from MAG Symptom Cause Treatment

Application: Reasoning about COVID-19 from MAG SARS-CoV-2 Oseltamivir Wasting Asymptomatic Diarrhea Lamivudine Coronavirus Azithromycin COVID-19 Antiviral drug MERS Zika Virus Rash Post-exposure prophylaxis Abdominal pain Ebola Virus Symptom Cause Treatment

Graph Representation Learning Network Embedding Matrix Factorization GNNs Pre-Training

Network Embedding Feature learning 𝑤 !"# 𝑤 !"$ 𝑤 ! 𝑤 !%$ 𝑤 !%# Sequences of objects Skip-Gram Words in Text • Nodes in graphs • 1. Mikolov, et al. Efficient estimation of word representations in vector space. In ICLR 2013 . 2. Perozzi et al. DeepWalk: Online learning of social representations. In KDD’ 14 , pp. 701–710.

Distributional Hypothesis of Harris • Word embedding : words in similar contexts have similar meanings (e.g., skip-gram in word embedding) • Node embedding : nodes in similar structural contexts are similar • DeepWalk: structural contexts are defined by co-occurrence over random walk paths Harris, Z. (1954). Distributional structure. Word , 10(23): 146-162.

hide The Objective 3 𝒜 4 ) exp(𝒜 2 ℒ = # # −log(𝑄(𝑑|𝑤)) 𝑞 𝑑 𝑤 = 3 𝒜 5 ) ∑ 5∈6 exp(𝒜 2 !∈# $∈% !" (!) ℒ à to maximize the likelihood of node co-occurrence on a random walk path % 𝒜 & à the possibility that node 𝑤 and context 𝑑 appear on a random walk path 𝒜 $

Network Embedding: Random Walk + Skip-Gram 𝑤 !"# 𝑤 !"$ 𝑤 ! 𝑤 !%$ 𝑤 !%# Radom Walk Strategies: o DeepWalk (walk length > 1) o LINE (walk length = 1) o PTE (walk length = 1) o node2vec (biased random walk) o metapath2vec (heterogeneous rw) 1. Perozzi et al. DeepWalk : Online learning of social representations. In KDD’ 14 . Most Cited Paper in KDD’14. 2. Tang et al. LINE : Large scale information network embedding. In WWW’15 . Most Cited Paper in WWW’15. Grover and Leskovec. node2vec : Scalable feature learning for networks. In KDD’16. 2 nd Most Cited Paper in KDD’16. 3. 4. Dong et al. metapath2vec : scalable representation learning for heterogeneous networks. In KDD 2017. Most Cited Paper in KDD’17.

Graph Representation Learning Network Embedding Matrix Factorization GNNs Pre-Training DeepWalk • LINE • Node2vec • PTE • … • metapath2vec •

hide NetMF: Network Embedding as Matrix Factorization • DeepWalk • LINE • PTE • node2vec 𝑩 Adjacency matrix b : #negative samples T : context window size 𝑬 Degree matrix 𝑤𝑝𝑚 𝐻 = ( ( 𝐵 !" ! " 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18.

Understanding Random Walk + Skip Gram 𝑥 !"# 𝑥 !"$ 𝑥 ! 𝑥 !%$ 𝑥 !%# Skip-Gram Graph Language NLP Language • 𝐻: graph • #(w,c): co-occurrence of w & c log(#(𝒙, 𝒅)|𝒠| 𝑐#(𝑥)#(𝑑)) • 𝑩 : adjacency matrix • #(w): occurrence of word w • 𝑬: degree matrix • #(c): occurrence of context c • 𝑤𝑝𝑚 𝐻 : volume of 𝐻 • 𝒠: word−context pair (w, c) multi−set • |𝒠| : number of word-context pairs Levy and Goldberg. Neural word embeddings as implicit matrix factorization. In NIPS 2014

Understanding Random Walk + Skip Gram NLP Language Distinguish direction and distance • #(w,c): co-occurrence of w & c • #(w): occurrence of word w • #(c): occurrence of context c Partition the multiset 𝒠 into several sub-multisets • • 𝒠: word−context pair (w, c) multi−set according to the way in which each node and its • |𝒠| : number of word-context pairs context appear in a random walk node sequence. More formally, for 𝑠 = 1, 2, ⋯ , 𝑈 , we define •

Understanding Random Walk + Skip Gram the length of random walk 𝑀 → ∞

Understanding Random Walk + Skip Gram Graph Language 𝑩 Adjacency matrix 𝑬 Degree matrix 𝑤𝑝𝑚 𝐻 = ( ( 𝐵 !" ! " b : #negative samples T : context window size

Understanding Random Walk + Skip Gram 𝑥 !"# 𝑥 !"$ 𝑥 ! 𝑥 !%$ 𝑥 !%# DeepWalk is asymptotically and implicitly factorizing 𝑩 Adjacency matrix 𝑬 Degree matrix 𝑤𝑝𝑚 𝐻 = ( ( 𝐵 !" ! " b : #negative samples T : context window size 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18.

Unifying DeepWalk, LINE, PTE, & node2vec as Matrix Factorization • DeepWalk • LINE • PTE • node2vec Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18. The most cited paper in WSDM’18 as of May 2019

NetMF: Explicitly Factorizing the DeepWalk Matrix Matrix 𝑥 !"# 𝑥 !"$ 𝑥 ! Factorization 𝑥 !%$ 𝑥 !%# DeepWalk is asymptotically and implicitly factorizing 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18. 2. Code &data for NetMF: https://github.com/xptree/NetMF

NetMF 1. Construction 2. Factorization 𝑻 = 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18. 2. Code &data for NetMF: https://github.com/xptree/NetMF

Results Explicit matrix factorization (NetMF) offers performance gains over implicit matrix factorization (DeepWalk & LINE) 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18. 2. Code &data for NetMF: https://github.com/xptree/NetMF

Network Embedding Random Walk Skip Gram DeepWalk, LINE, node2vec, metapath2vec (dense) Matrix Output: 𝑻 = 𝑔(𝑩) Input: Factorization Vectors Adjacency Matrix NetMF 𝒂 𝑩 𝑔 𝑩 = 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18. 2. Code &data for NetMF: https://github.com/xptree/NetMF

Challenge? 𝑻 = 𝑜 " non-zeros Dense!! Time complexity 𝑃(𝑜 # )

NetMF How can we solve this issue? 1. Construction 2. Factorization 𝑻 = 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019. 2. Code & data for NetSMF: https://github.com/xptree/NetSMF

NetSMF--Sparse How can we solve this issue? 1. Sparse Construction 2. Sparse Factorization 𝑻 = 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019. 2. Code & data for NetSMF: https://github.com/xptree/NetSMF

Graph Representation Learning: Embedding, GNNs, and Pre-Training - PowerPoint PPT Presentation

Graph Representation Learning: Embedding, GNNs, and Pre-Training Yuxiao Dong https://ericdongyx.github.io/ Microsoft Research, Redmond Joint Work with Jiezhong Qiu Ziniu Hu Hongxia Yang Jing Zhang Tsinghua UCLA Alibaba Renmin U. of

GRAPH MINING AND GRAPH KERNELS Part I: Graph Mining Karsten Borgwardt^ and Xifeng Yan*

8.3 GRAPH REPRESENTATIONS AND GRAPH ISOMORPHISM INCIDENCE TABLE REPRESENTATION def: An incidence

GRAPH MINING AND GRAPH KERNELS Part II: Graph Kernels Karsten Borgwardt^ and Xifeng Yan*

Graph: representation and traversal CISC4080, Computer Algorithms CIS, Fordham Univ.

Graph Representation and Traversals Mark Redekopp David Kempe Sandra Batista 2 GRAPH

Graph Indexing: Tree + Delta Delta >= Graph >= Graph Graph Indexing: Tree + Peixian Zhao,

Graph Mining Marco Serafini COMPSCI 532 Lecture 11 Classes of Graph Systems Graph

Stable and Efficient Representation Learning with Nonnegativity Constraints Tsung-Han Lin and

Unsupervised Language Learning: Representation Learning for NLP Katia Shutova ILLC University

K K Knowledge Knowledge l d l d Representation Representation Representation

Graph-based Clustering Transform the data into a graph representation Vertices are the

Mria Markoov Graph definition Degree, in, out degree, oriented graph.

Graph Algorithms What is a graph? V - vertices E V x V - edges directed / undirected

Graph- -based Learning based Learning Graph Larry Holder Larry Holder Computer Science and

Graph Representation Learning with Graph Convolutional Networks Jure Leskovec Networks: Common

Graph and Knowledge Graph Representation Learning Prof. Srijan Kumar

KDIGO Clinical Practice Guidelines R EASONS FOR G LOBAL G UIDELINES Keeping up with new

Elevated LDL-C Cardiovascular Pathobiology MEDX Dallas/Fort Worth November 17, 2018 THE

Keynote Address Chronic Kidney Disease; Global and Sri Lankan Perspectives Dr Shanthi Mendis

2018 2018 AH AHA/ A/ACC/AA AACVPR/AAP AAPA/ A/AB ABC/ACPM/AD ADA/ A/AGS/AP APhA/A /ASPC

Children and Young Adults in Poverty: A look by Race and Geography November 17, 2015

1-855-337-6227 www.marylandMACS.org Substance Use Disorders in Older Adults Mi Mich chae ael

Addressing Substance Use Disorder During & After Pregnancy Donna Gorman, LCMFT, LCAC May

Relative Scope of the Problem: Opioids versus Alcohol Courtesy of Dr. Aaron White, NIAAA Alcohol

Graph Representation Learning: Embedding, GNNs, and Pre-Training - PowerPoint PPT Presentation

Graph Representation Learning: Embedding, GNNs, and Pre-Training Yuxiao Dong https://ericdongyx.github.io/ Microsoft Research, Redmond Joint Work with Jiezhong Qiu Ziniu Hu Hongxia Yang Jing Zhang Tsinghua UCLA Alibaba Renmin U. of

GRAPH MINING AND GRAPH KERNELS Part I: Graph Mining Karsten Borgwardt^ and Xifeng Yan*

8.3 GRAPH REPRESENTATIONS AND GRAPH ISOMORPHISM INCIDENCE TABLE REPRESENTATION def: An incidence

GRAPH MINING AND GRAPH KERNELS Part II: Graph Kernels Karsten Borgwardt^ and Xifeng Yan*

Graph: representation and traversal CISC4080, Computer Algorithms CIS, Fordham Univ.

Graph Representation and Traversals Mark Redekopp David Kempe Sandra Batista 2 GRAPH

Graph Indexing: Tree + Delta Delta &gt;= Graph &gt;= Graph Graph Indexing: Tree + Peixian Zhao,

Graph Mining Marco Serafini COMPSCI 532 Lecture 11 Classes of Graph Systems Graph

Stable and Efficient Representation Learning with Nonnegativity Constraints Tsung-Han Lin and

Unsupervised Language Learning: Representation Learning for NLP Katia Shutova ILLC University

K K Knowledge Knowledge l d l d Representation Representation Representation

Graph-based Clustering Transform the data into a graph representation Vertices are the

Mria Markoov Graph definition Degree, in, out degree, oriented graph.

Graph Algorithms What is a graph? V - vertices E V x V - edges directed / undirected

Graph- -based Learning based Learning Graph Larry Holder Larry Holder Computer Science and

Graph Representation Learning with Graph Convolutional Networks Jure Leskovec Networks: Common

Graph and Knowledge Graph Representation Learning Prof. Srijan Kumar

KDIGO Clinical Practice Guidelines R EASONS FOR G LOBAL G UIDELINES Keeping up with new

Elevated LDL-C Cardiovascular Pathobiology MEDX Dallas/Fort Worth November 17, 2018 THE

Keynote Address Chronic Kidney Disease; Global and Sri Lankan Perspectives Dr Shanthi Mendis

2018 2018 AH AHA/ A/ACC/AA AACVPR/AAP AAPA/ A/AB ABC/ACPM/AD ADA/ A/AGS/AP APhA/A /ASPC

Children and Young Adults in Poverty: A look by Race and Geography November 17, 2015

1-855-337-6227 www.marylandMACS.org Substance Use Disorders in Older Adults Mi Mich chae ael

Addressing Substance Use Disorder During &amp; After Pregnancy Donna Gorman, LCMFT, LCAC May

Relative Scope of the Problem: Opioids versus Alcohol Courtesy of Dr. Aaron White, NIAAA Alcohol

Graph Indexing: Tree + Delta Delta >= Graph >= Graph Graph Indexing: Tree + Peixian Zhao,

Addressing Substance Use Disorder During & After Pregnancy Donna Gorman, LCMFT, LCAC May