Graph Representation Learning:
Embedding, GNNs, and Pre-Training
Yuxiao Dong
https://ericdongyx.github.io/
Microsoft Research, Redmond
Joint work with: Jiezhong Qiu (Tsinghua, advised by Jie Tang), Jie Tang (Tsinghua), Yizhou Sun (UCLA), Ziniu Hu (UCLA, advised by Yizhou Sun), Hongxia Yang (Alibaba), Hao Ma (Facebook AI), Kuansan Wang (Microsoft Research), Jing Zhang (Renmin U. of China)
Graphs are everywhere: office/social graphs, the Internet, knowledge graphs, biological neural networks, transportation networks, and academic graphs. (figure credit: Web)
The traditional pipeline: feature engineering produces a hand-crafted feature matrix $X$, where $X_{ij}$ is node $v_i$'s $j$-th feature (e.g., $v_i$'s PageRank value); $X$ then feeds machine learning models for graph & network applications.
Structural Diversity and Homophily: A Study Across More Than One Hundred Big Networks. KDD 2017.
Graph Representation Learning: feature engineering is replaced by feature learning. A latent feature matrix $Z$ is learned automatically, no longer hand-crafted, and then feeds machine learning models for graph & network applications, e.g., on an academic graph.
[Figure: metapath2vec embeddings of institutions from the Microsoft Academic Graph, e.g., Harvard, Stanford, Columbia, Yale, UChicago, Johns Hopkins]
1. https://academic.microsoft.com/ 2. Kuansan Wang et al. Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies 1 (1), 396-413, 2020. 3. Dong et al. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD 2017. 4. Code & data for metapath2vec: https://ericdongyx.github.io/metapath2vec/m2v.html
[Figure: a biomedical knowledge graph with Cause, Symptom, and Treatment entities around COVID-19, e.g., causes: SARS-CoV-2, Coronavirus, Zika Virus, MERS, Ebola Virus; symptoms: rash, wasting, asymptomatic, abdominal pain, diarrhea; treatments: antiviral drugs, Azithromycin, Lamivudine, Oseltamivir, post-exposure prophylaxis]
Agenda: Network Embedding | Matrix Factorization | GNNs | Pre-Training
Feature learning: from words to nodes
1. Mikolov, et al. Efficient estimation of word representations in vector space. In ICLR 2013. 2. Perozzi et al. DeepWalk: Online learning of social representations. In KDD 2014, pp. 701-710.
The distributional hypothesis: objects that appear in similar contexts have similar meanings (e.g., skip-gram in word embedding). Skip-Gram slides a context window $(w_{t-2}, w_{t-1}, w_t, w_{t+1}, w_{t+2})$ over sequences of objects: sentences in language, random walk paths in graphs.
Harris, Z. (1954). Distributional structure. Word, 10(23): 146-162.
→ Objective: maximize the likelihood of node co-occurrence on a random walk path:
$\max \sum_{v \in V} \sum_{c \in N(v)} \log p(c \mid v)$
→ $p(c \mid v)$: the probability that node $v$ and context $c$ appear on a random walk path, modeled with a softmax over node embeddings:
$p(c \mid v) = \frac{\exp(\mathbf{c}^\top \mathbf{v})}{\sum_{u \in V} \exp(\mathbf{u}^\top \mathbf{v})}$
Random Walk Strategies:
→ DeepWalk: truncated (unbiased) random walks
→ LINE: walk length = 1
→ node2vec: biased random walks
→ metapath2vec: meta-path guided random walks for heterogeneous networks
(a minimal DeepWalk-style sketch follows the references below)
1. Perozzi et al. DeepWalk: Online learning of social representations. In KDDβ 14. Most Cited Paper in KDDβ14. 2. Tang et al. LINE: Large scale information network embedding. In WWWβ15. Most Cited Paper in WWWβ15. 3. Grover and Leskovec. node2vec: Scalable feature learning for networks. In KDDβ16. 2nd Most Cited Paper in KDDβ16. 4. Dong et al. metapath2vec: scalable representation learning for heterogeneous networks. In KDD 2017. Most Cited Paper in KDDβ17.
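To make the walk-plus-Skip-Gram recipe concrete, here is a minimal DeepWalk-style sketch in Python. It assumes networkx and gensim (4.x) are installed; the walk counts, walk lengths, and dimensions are illustrative choices, not values from the talk.

```python
# Minimal DeepWalk sketch: truncated unbiased random walks + skip-gram.
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(G, start, walk_length):
    """One truncated random walk starting at `start` (DeepWalk-style)."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(v) for v in walk]          # gensim expects string tokens

G = nx.karate_club_graph()
walks = [random_walk(G, v, 10) for _ in range(10) for v in G.nodes()]

# Skip-gram with negative sampling over the walk "sentences".
model = Word2Vec(sentences=walks, vector_size=64, window=5,
                 min_count=0, sg=1, negative=5, epochs=5)
emb = model.wv["0"]                        # embedding of node 0
```

Swapping the walk generator yields the other methods: biased second-order walks give node2vec, and meta-path guided walks give metapath2vec.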
Agenda: Network Embedding | Matrix Factorization | GNNs | Pre-Training
Skip-Gram as implicit matrix factorization (Levy & Goldberg): Skip-Gram with negative sampling over a word corpus $\mathcal{D}$ implicitly factorizes the shifted PMI matrix
$\log\left(\frac{\#(w,c)\,|\mathcal{D}|}{b\,\#(w)\,\#(c)}\right)$
where $\#(w,c)$ counts co-occurrences of word $w$ and context $c$ within the sliding window, $b$ is the number of negative samples, and $T$ is the context window size. (For the graph analogue below: $A$ is the adjacency matrix and $D$ the degree matrix.)
Levy and Goldberg. Neural word embeddings as implicit matrix factorization. In NIPS 2014
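As a quick illustration of the Levy-Goldberg result, the sketch below builds the shifted (positive) PMI matrix from a co-occurrence count matrix. The function name and the PPMI-style truncation are my additions, and every word and context is assumed to occur at least once.

```python
import numpy as np

def shifted_ppmi(C, b=1):
    """C[w, c] = #(w, c): word-context co-occurrence counts."""
    total = C.sum()                           # |D|, the corpus size
    cw = C.sum(axis=1, keepdims=True)         # #(w)
    cc = C.sum(axis=0, keepdims=True)         # #(c)
    with np.errstate(divide="ignore"):        # log(0) -> -inf is fine here
        M = np.log(C * total / (b * cw * cc))
    return np.maximum(M, 0)                   # shifted *positive* PMI
```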
Graph language vs. NLP language: count $\#(v,c)$ according to the way in which each node and its context appear in a random-walk node sequence, exactly as Skip-Gram counts word-context pairs in sentences. Distinguishing the direction and distance of each co-occurrence, and letting the length of the random walks go to infinity, yields a closed form.
DeepWalk is asymptotically and implicitly factorizing
$M = \log\left(\frac{\operatorname{vol}(G)}{bT}\left(\sum_{r=1}^{T}\left(D^{-1}A\right)^{r}\right)D^{-1}\right)$
with $A$ the adjacency matrix, $D$ the degree matrix, $b$ the number of negative samples, and $T$ the context window size.
1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDMβ18.
Unifying DeepWalk, LINE, PTE, & node2vec as Matrix Factorization: each method asymptotically and implicitly factorizes a closed-form matrix built from $A$ and $D$; DeepWalk's matrix is the one above, and LINE's corresponds to the special case $T = 1$.
1. Qiu et al. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM 2018. The most cited paper in WSDM 2018 as of May 2019. 2. Code & data for NetMF: https://github.com/xptree/NetMF
Explicit matrix factorization (NetMF) offers performance gains over implicit matrix factorization (DeepWalk & LINE)
Input: adjacency matrix $A$
→ DeepWalk, LINE, node2vec, metapath2vec: random walk + Skip-Gram
→ NetMF: construct $M = f(A)$, then (dense) matrix factorization
Output: embedding vectors $Z$
(a small NetMF-style sketch follows the reference below)
1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDMβ18. 2. Code &data for NetMF: https://github.com/xptree/NetMF
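A small, dense NetMF-style sketch following the closed-form matrix above. It uses a cubic-time SVD, so it only makes sense for small graphs; parameter names are illustrative, and the graph is assumed to have no isolated nodes (all degrees positive).

```python
import numpy as np

def netmf_embed(A, T=10, b=1, dim=64):
    """Embed via M = log(max(vol(G)/(bT) * (sum_r (D^-1 A)^r) * D^-1, 1))."""
    n = A.shape[0]
    vol, d = A.sum(), A.sum(axis=1)
    P = A / d[:, None]                       # D^{-1} A, row-stochastic
    S, Pr = np.zeros((n, n)), np.eye(n)
    for _ in range(T):                       # sum_{r=1}^{T} (D^{-1} A)^r
        Pr = Pr @ P
        S += Pr
    M = (vol / (b * T)) * S / d[None, :]     # right-multiply by D^{-1}
    M = np.log(np.maximum(M, 1.0))           # truncated logarithm
    U, s, _ = np.linalg.svd(M)               # dense SVD: O(n^3)
    return U[:, :dim] * np.sqrt(s[:dim])     # rank-dim factorization
```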
But $M = f(A)$ is dense, with $O(n^2)$ non-zeros, and factorizing it takes $O(n^3)$ time.
Spectral sparsification (Cheng, Cheng, Liu, Peng & Teng): for a random-walk matrix polynomial $L = \sum_{r=1}^{T} \alpha_r D (D^{-1}A)^r$, where the coefficients $\alpha_r$ are non-negative and sum to one, one can construct a $(1+\epsilon)$-spectral sparsifier $\tilde{L}$ with $O(n \log n / \epsilon^2)$ non-zeros in nearly linear time, for undirected graphs.
1. Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Efficient Sampling for Gaussian Graphical Models via Spectral Sparsification. COLT 2015. 2. Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Spectral sparsification of random-walk matrix polynomials. arXiv:1502.03496.
NetSMF then factorizes the constructed sparse matrix with randomized truncated SVD.
Effectiveness: NetSMF (sparse MF) ≈ NetMF (explicit MF) > DeepWalk/LINE (implicit MF), with up to 30-100% improvements on billion-scale graphs. Efficiency: NetSMF (sparse MF) can handle billion-scale network embedding.
Input: adjacency matrix $A$
→ DeepWalk, LINE, node2vec, metapath2vec: random walk + Skip-Gram
→ NetMF: $M = f(A)$, then (dense) matrix factorization
→ NetSMF: $M = f(A)$, sparsify $M$, then (sparse) matrix factorization
Output: embedding vectors $Z$
Incorporate the network structure $A$ into the similarity matrix $M$, and then factorize $M$.
ProNE's spectral propagation:
$R' \leftarrow D^{-1}A\,(I_n - \tilde{L})\,R$
where $\tilde{L}$ is the spectral filter of $L = I_n - D^{-1}A$, so $D^{-1}A(I_n - \tilde{L})$ is $D^{-1}A$ modulated by the filter in the spectrum; this propagation is the idea of graph neural networks.
On a 1.1M-node graph, ProNE takes ~10 minutes on 1 thread, versus 98 minutes to 19 hours for baselines on 20 threads: ProNE offers 10-400X speedups (1 thread vs. 20 threads). Embedding 100,000,000 nodes with 1 thread takes 29 hours, with performance superiority.
Input: adjacency matrix $A$
→ DeepWalk, LINE, node2vec, metapath2vec: random walk + Skip-Gram
→ NetMF: $M = f(A)$, then (dense) matrix factorization
→ NetSMF: $M = f(A)$, sparsify $M$, then (sparse) matrix factorization
→ ProNE: (sparse) matrix factorization of $A$, then spectral propagation $Z = g(Z')$
ProNE factorizes $A$ first, and then incorporates network structures via spectral propagation (a simplified sketch follows).
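A deliberately simplified sketch of the ProNE idea: first factorize the structure matrix (here a plain truncated SVD of $A$ as a stand-in), then smooth the embeddings by propagating them over the graph. The real ProNE modulates the Laplacian with a Chebyshev-expanded band-pass filter; this version uses plain $D^{-1}A$ propagation purely for illustration.

```python
import numpy as np

def prone_sketch(A, dim=32, steps=2):
    # Step 1 (stand-in): cheap factorization of the graph structure.
    U, s, _ = np.linalg.svd(A)
    dim = min(dim, len(s))
    Z = U[:, :dim] * np.sqrt(s[:dim])
    # Step 2: spectral propagation, approximated by D^{-1} A smoothing.
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
    for _ in range(steps):
        Z = P @ Z                     # aggregate each node's neighborhood
    return Z
```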
Agenda: Network Embedding | Matrix Factorization | GNNs | Pre-Training
Graph neural networks: propagation-based network embedding
→ $z_v = f(z_v, z_a, z_b, z_c, z_d, z_e)$: node $v$'s representation is a function of its own and its neighbors' ($a, b, c, d, e$) representations
→ Neighborhood aggregation: aggregate neighbor information and pass it into a neural network
→ Compare ProNE's propagation $R' \leftarrow D^{-1}A(I_n - \tilde{L})R$: the same idea.
Neighborhood Aggregation: a CNN convolves over selected (grid) neighbors; graph convolution aggregates over a node's graph neighbors:
$h_v^{(k)} = \sigma\left(W^{(k)} \sum_{u \in N(v) \cup \{v\}} \frac{h_u^{(k-1)}}{\sqrt{|N(u)||N(v)|}}\right)$
where $N(v)$ is the set of neighbors of node $v$, $h_v^{(k)}$ is node $v$'s embedding at layer $k$, $\sigma$ is a non-linear activation function (e.g., ReLU), and $W^{(k)}$ are the parameters in layer $k$.
Aggregate information from the neighborhood via the normalized Laplacian matrix. Splitting the sum:
$h_v^{(k)} = \sigma\left(W^{(k)} \sum_{u \in N(v)} \frac{h_u^{(k-1)}}{\sqrt{|N(u)||N(v)|}} + W^{(k)} \frac{h_v^{(k-1)}}{|N(v)|}\right)$
The first term aggregates from $v$'s neighbors and the second from $v$ itself, with the same parameters $W^{(k)}$ for both.
Kipf et al. Semi-Supervised Classification with Graph Convolutional Networks. ICLR 2017.
In matrix form, with input $G = (V, E, A)$, initial features $H^{(0)} = X$, and output embeddings $Z$:
→ Aggregate from neighbors: $\tilde{D}^{-1/2} A \tilde{D}^{-1/2} H^{(k-1)} W^{(k-1)}$
→ Aggregate from itself: $\tilde{D}^{-1/2} I \tilde{D}^{-1/2} H^{(k-1)} W^{(k-1)}$
→ Together: $H^{(k)} = \sigma\left(\tilde{D}^{-1/2} (A + I) \tilde{D}^{-1/2} H^{(k-1)} W^{(k-1)}\right)$
The whole framework is trained end-to-end with a supervised task (e.g., node classification); a NumPy sketch of the forward pass follows below.
Kipf et al. Semi-Supervised Classification with Graph Convolutional Networks. ICLR 2017.
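A minimal NumPy forward pass of the matrix-form GCN above. The weights here are random placeholders; in practice they are trained end-to-end with a supervised loss (e.g., cross-entropy over labeled nodes), as the slide notes.

```python
import numpy as np

def normalize_adj(A):
    A_hat = A + np.eye(A.shape[0])           # add self-loops: A + I
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A_hat @ D_inv_sqrt   # D^{-1/2}(A+I)D^{-1/2}

def gcn_forward(A, X, W1, W2):
    A_norm = normalize_adj(A)
    H1 = np.maximum(A_norm @ X @ W1, 0)      # layer 1 + ReLU
    return A_norm @ H1 @ W2                  # layer 2 (output logits)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = rng.normal(size=(3, 8))                  # node features H^{(0)} = X
Z = gcn_forward(A, X, rng.normal(size=(8, 16)), rng.normal(size=(16, 4)))
```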
GCN vs. GraphSAGE:
→ GCN: $h_v^{(k)} = \sigma\left(W^{(k)} \sum_{u \in N(v) \cup \{v\}} \frac{h_u^{(k-1)}}{\sqrt{|N(u)||N(v)|}}\right)$
→ GraphSAGE: $h_v^{(k)} = \sigma\left(\left[W^{(k)} \cdot \mathrm{AGG}\left(\{h_u^{(k-1)}, \forall u \in N(v)\}\right),\; h_v^{(k-1)}\right]\right)$
→ Instead of summation, GraphSAGE concatenates the neighbor & self embeddings
→ Generalized aggregation: any differentiable function that maps a set of vectors to a single vector
Hamilton et al. Inductive Representation Learning on Large Graphs. NeurIPS 2017. Slide adapted from "Hamilton & Tang, AAAI 2019 Tutorial on Graph Representation Learning".
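A sketch of one GraphSAGE layer with mean aggregation, matching the concatenate-then-activate form above. The self and neighbor parts are projected with separate matrices W_self and W_neigh; these names are mine, not from the paper.

```python
import numpy as np

def sage_layer(A, H, W_self, W_neigh):
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
    H_neigh = (A @ H) / deg                       # mean over neighbors (AGG)
    H_cat = np.concatenate([H_neigh @ W_neigh,    # projected neighbor part
                            H @ W_self], axis=1)  # concatenated self part
    return np.maximum(H_cat, 0)                   # ReLU
```

Replacing the mean with any other differentiable set-to-vector function (max-pooling, an LSTM over a permutation of the neighbors, etc.) gives the generalized aggregators.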
GCN vs. Graph Attention:
→ GCN aggregates info from the neighborhood via the normalized Laplacian matrix: $h_v^{(k)} = \sigma\left(W^{(k)} \sum_{u \in N(v) \cup \{v\}} \frac{h_u^{(k-1)}}{\sqrt{|N(u)||N(v)|}}\right)$
→ Graph Attention aggregates info from the neighborhood via learned attention coefficients $\alpha_{vu}$: $h_v^{(k)} = \sigma\left(\sum_{u \in N(v) \cup \{v\}} \alpha_{vu}\, W^{(k)} h_u^{(k-1)}\right)$
and there are many ways to define attention!
Velickovic et al. Graph Attention Networks. ICLR 2018.
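One concrete way to define the attention, sketched below as a single-head GAT-style layer: $\alpha_{vu} = \mathrm{softmax}_u(\mathrm{LeakyReLU}(a^\top [W h_v \,\|\, W h_u]))$. The per-node loop is for clarity rather than speed, and the trailing nonlinearity is omitted.

```python
import numpy as np

def gat_layer(A, H, W, a, slope=0.2):
    """Single attention head; `a` has length 2 * output_dim."""
    n = A.shape[0]
    HW = H @ W                                     # project: W h_u
    A_hat = A + np.eye(n)                          # include self-loops
    out = np.zeros((n, W.shape[1]))
    for v in range(n):
        nbrs = np.nonzero(A_hat[v])[0]
        # e_{vu} = LeakyReLU(a^T [W h_v || W h_u]) for u in N(v) ∪ {v}
        e = np.array([np.concatenate([HW[v], HW[u]]) @ a for u in nbrs])
        e = np.where(e > 0, e, slope * e)          # LeakyReLU
        att = np.exp(e - e.max()); att /= att.sum()   # softmax over neighbors
        out[v] = att @ HW[nbrs]                    # attention-weighted sum
    return out
```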
Heterogeneous graphs, e.g., a heterogeneous academic graph or a heterogeneous office graph, contain typed nodes and typed edges, described by meta relations such as (Author, Write, Paper).
1. Ziniu Hu, et al. Heterogeneous Graph Transformer. WWW 2020. 2. Code & Data for HGT: https://github.com/acbull/pyHGT
1. Ziniu Hu, et al. Heterogeneous Graph Transformer. WWW 2020. 2. Difan Zou, et al. Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks. In NeurIPSβ19.
HGT offers ~9β21% improvements over existing (heterogeneous) GNNs
Case study (experiments done in 2019): learned research-interest profiles for authors, e.g., "DB + Networking + IR", "DM + Networking + IR + DB", "DB + DM", "ML + DB + Web + AI + NLP", "CV + ML + AI", "ML + CV + DL + NLP". HGT learns meta-paths & their weights implicitly! (A simplified sketch of type-aware attention follows.)
1. Ziniu Hu, et al. Heterogeneous Graph Transformer. WWW 2020. 2. Code & Data for HGT: https://github.com/acbull/pyHGT
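A heavily simplified, hypothetical sketch of the type-aware attention idea: giving each meta relation its own projection lets attention weights differ per relation, which is how meta-paths and their weights can emerge implicitly. Real HGT uses multi-head attention with relation-specific key/query/value and message matrices; none of the names below come from the pyHGT code.

```python
import numpy as np

def hetero_attend(h_tgt, neighbors, W_by_rel, tau=1.0):
    """neighbors: list of (h_src, relation_name); W_by_rel: dict of matrices."""
    keys = np.stack([h @ W_by_rel[rel] for h, rel in neighbors])
    scores = keys @ h_tgt / tau                    # relation-aware similarity
    att = np.exp(scores - scores.max())
    att /= att.sum()                               # softmax over neighbors
    return att @ keys                              # aggregated message
```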
Agenda: Network Embedding | Matrix Factorization | GNNs | Pre-Training
→ Motivation: recent progress of pre-training models in NLP & CV
→ Goal: pre-train on one graph, then fine-tune for unseen tasks on the same graph or graphs of the same domain
→ Design questions: at the model level, which GNNs? For the pre-training task, which self-supervised tasks on graphs?
1.Ziniu Hu et al. GPT-GNN: Generative Pre-Training of Graph Neural Networks. KDD 2020. 2.Code & data for GPT-GNN: https://github.com/acbull/GPT-GNN
[Framework: Pre-Training feeds an input graph and a graph pre-training task into a GNN to produce a Pre-Trained Model; Fine-Tuning then adapts the pre-trained model to node classification, link prediction, recommendation, and other tasks on the same input graph or graphs of the same domain.]
1.Ziniu Hu et al. GPT-GNN: Generative Pre-Training of Graph Neural Networks. KDD 2020. 2.Code & data for GPT-GNN: https://github.com/acbull/GPT-GNN
GPT-GNN pre-trains a GNN by learning to reconstruct the input graph:
→ mask attributes and edges of the input graph
→ factorize the likelihood $p$ of the graph into two terms: attribute generation and edge generation
1.Ziniu Hu et al. GPT-GNN: Generative Pre-Training of Graph Neural Networks. KDD 2020. 2.Code & data for GPT-GNN: https://github.com/acbull/GPT-GNN
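A toy sketch of the two self-supervised signals, under the assumption that a GNN encoder has already produced node vectors: squared error for attribute generation on masked nodes, and a pairwise logistic score for edge generation against negative samples. GPT-GNN's actual decoders and factorized objective are richer; these two functions are stand-ins.

```python
import numpy as np

def attribute_loss(h_masked, x_true):
    """Reconstruct masked node attributes from their embeddings."""
    return ((h_masked - x_true) ** 2).mean()

def edge_loss(h_src, h_dst, h_neg):
    """Score observed edges above negatively sampled ones."""
    pos = (h_src * h_dst).sum(-1)                  # score of a masked edge
    neg = (h_src * h_neg).sum(-1)                  # score of a negative edge
    return np.log1p(np.exp(-(pos - neg))).mean()   # pairwise logistic loss
```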
[Framework: GPT-GNN is pre-trained with attribute generation and edge generation on the attribute- and edge-masked input graph; it is then fine-tuned for node classification, link prediction, recommendation, and other tasks on the same input graph or graphs of the same domain.]
1.Ziniu Hu et al. GPT-GNN: Generative Pre-Training of Graph Neural Networks. KDD 2020. 2.Code & data for GPT-GNN: https://github.com/acbull/GPT-GNN
Pre-Train & Fine-Tune setup. Base GNN model: Heterogeneous Graph Transformer (HGT). Transfer settings (pre-train → fine-tune):
→ No Transfer: CS Academic Graph → CS Academic Graph
→ Field Transfer: Med, Bio, Physics… → CS Academic Graph
→ Time Transfer: CS before 2014 → CS after 2014
→ Time + Field Transfer: Med, Bio, Physics… before 2014 → CS after 2014
1.Ziniu Hu et al. GPT-GNN: Generative Pre-Training of Graph Neural Networks. KDD 2020. 2.Code & data for GPT-GNN: https://github.com/acbull/GPT-GNN
Pre-training improves the performance of GNNs: a relative performance gain of 9.1% over the base model without pre-training, and both generation tasks help the pre-training framework.
Label efficiency: the GNN model with pre-training needs only 10-20% of the training data to match the GNN model without pre-training using 100% of the training data.
Graphs are everywhere: office/social graphs, the Internet, knowledge graphs, biological neural networks, transportation networks… can we pre-train across them? (figure credit: Web)
→ Goal: pre-train on some graphs, then fine-tune for unseen tasks on unseen graphs
→ Design questions: at the model level, which GNNs? For the pre-training task, which self-supervised tasks across graphs?
1.Jiezhong Qiu et al. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. KDD 2020. 2.Code & Data for GCC: https://github.com/THUDM/GCC
[Framework: Pre-Training runs a graph pre-training task over multiple graphs (e.g., Facebook, IMDB, DBLP) to produce a Pre-Trained GNN; Fine-Tuning then adapts it to node classification (e.g., US-Airport), graph classification, and other tasks on unseen graphs.]
1.Jiezhong Qiu et al. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. KDD 2020. 2.Code & Data for GCC: https://github.com/THUDM/GCC
→ Structural similarity: map vertices with similar local network topologies close to each other in the vector space
→ Transferability: be compatible with vertices and graphs unseen by the pre-training algorithm
1.Jiezhong Qiu et al. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. KDD 2020. 2.Code & Data for GCC: https://github.com/THUDM/GCC
Contrastive learning in CV: capturing the similarities between instances.
1. Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In CVPR β18. 2. Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In CVPR β20.
GCC's pre-training task: subgraph instance discrimination (trained with a contrastive loss; see the sketch below).
1.Jiezhong Qiu et al. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. KDD 2020. 2.Code & Data for GCC: https://github.com/THUDM/GCC
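Subgraph instance discrimination is typically trained with an InfoNCE-style contrastive loss: pull a query subgraph's embedding toward another view of the same subgraph (the positive key) and away from other subgraphs (negative keys). A minimal sketch, with an illustrative temperature tau:

```python
import numpy as np

def info_nce(q, k_pos, k_negs, tau=0.07):
    """q, k_pos: (d,) embeddings; k_negs: (m, d) negative key embeddings."""
    q = q / np.linalg.norm(q)
    keys = np.vstack([k_pos, k_negs])
    keys /= np.linalg.norm(keys, axis=1, keepdims=True)
    logits = keys @ q / tau                  # cosine similarities as logits
    logits -= logits.max()                   # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```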
[Framework: GCC pre-trains with subgraph instance discrimination on Facebook, IMDB, and DBLP, then fine-tunes for node classification (e.g., US-Airport), graph classification, and more.]
1.Jiezhong Qiu et al. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. KDD 2020. 2.Code & Data for GCC: https://github.com/THUDM/GCC
Fine-tuning tasks and datasets:
→ Node classification: US-Airport & AMiner academic graph
→ Graph classification: COLLAB, RDT-B, RDT-M, IMDB-B, & IMDB-M
→ Similarity search: AMiner academic graph
Base GNN model: Graph Isomorphism Network (GIN)
1.Jiezhong Qiu et al. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. KDD 2020. 2.Code & Data for GCC: https://github.com/THUDM/GCC
GCC: universal patterns? [Framework recap: subgraph instance discrimination pre-training on Facebook, IMDB, and DBLP; fine-tuning for node classification (US-Airport), graph classification, and more.]
1.Jiezhong Qiu et al. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. KDD 2020. 2.Code & Data for GCC: https://github.com/THUDM/GCC
Does the pre-training of GNNs learn the universal structural patterns across networks?
Agenda: Network Embedding | Matrix Factorization | GNNs | Pre-Training
https://ogb.stanford.edu/
1. Kuansan Wang et al. Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies 1 (1), 2020 2. Fanjin Zhang et al. OAG: Toward Linking Large-scale Heterogeneous Entity Graphs. KDD 2019. 3. Jie Tang et al. Arnetminer: extraction and mining of academic social networks. In KDD 2008.
From 1800 to 2019, the number of publications doubles every 13 years.
1. Ziniu Hu et al. GPT-GNN: Generative Pre-Training of Graph Neural Networks. KDD 2020.
2. Jiezhong Qiu et al. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. KDD 2020.
3. Kuansan Wang et al. Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies 1 (1), 396-413, 2020.
4. Weihua Hu et al. Open Graph Benchmark: Datasets for Machine Learning on Graphs. arXiv 2020.
5. Feng et al. Graph Random Neural Networks. arXiv 2020.
6. Ziniu Hu et al. Heterogeneous Graph Transformer. WWW 2020.
7. Yuxiao Dong et al. Heterogeneous Network Representation Learning. IJCAI 2020.
8. Jiezhong Qiu et al. NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization. WWW 2019.
9. Jie Zhang et al. ProNE: Fast and Scalable Network Representation Learning. IJCAI 2019.
10. Fanjin Zhang et al. OAG: Toward Linking Large-scale Heterogeneous Entity Graphs. KDD 2019.
11. Xian Wu et al. Neural Tensor Decomposition. WSDM 2019.
12. Jiezhong Qiu et al. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. WSDM 2018.
13. Yuxiao Dong et al. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. KDD 2017.
14. Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.
15. Tang et al. LINE: Large-scale Information Network Embedding. WWW 2015.
16. Grover and Leskovec. node2vec: Scalable Feature Learning for Networks. KDD 2016.
17. Harris, Z. Distributional Structure. Word, 10(23): 146-162, 1954.
18. Kipf et al. Semi-Supervised Classification with Graph Convolutional Networks. ICLR 2017.
19. Velickovic et al. Graph Attention Networks. ICLR 2018.
20. Hamilton et al. Inductive Representation Learning on Large Graphs. NeurIPS 2017.
21. Defferrard et al. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. NeurIPS 2016.
22. Jacob Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019.
23. Justin Gilmer et al. Neural Message Passing for Quantum Chemistry. arXiv 2017.
24. Kaiming He et al. Momentum Contrast for Unsupervised Visual Representation Learning. arXiv 2019.
25. Tomas Mikolov et al. Distributed Representations of Words and Phrases and Their Compositionality. NeurIPS 2013.
26. Petar Velickovic et al. Deep Graph Infomax. ICLR 2019.
27. Zhen Yang et al. Understanding Negative Sampling in Graph Representation Learning. KDD 2020.
Thanks to collaborators: Jiezhong Qiu (Tsinghua, advised by Jie Tang), Jie Tang (Tsinghua), Yizhou Sun (UCLA), Ziniu Hu (UCLA, advised by Yizhou Sun), Hongxia Yang (Alibaba), Hao Ma (Facebook AI), Kuansan Wang (Microsoft Research), Jing Zhang (Renmin U. of China)
Papers, data & code available at https://ericdongyx.github.io/ | ericdongyx@gmail.com