Representation Learning on Networks. Yuxiao Dong, Microsoft Research (PowerPoint presentation)



SLIDE 1

Representation Learning on Networks

Yuxiao Dong

Microsoft Research, Redmond Joint work with Jiezhong Qiu, Jie Zhang, Jie Tang (Tsinghua University) Hao Ma (MSR & Facebook AI) and Kuansan Wang (MSR)

SLIDE 2

Networks

Economic networks, social networks, networks of neurons, biomedical networks, the Internet, information networks

Slides credit: Jure Leskovec

SLIDE 3

The Network & Graph Mining Paradigm

feature engineering → hand-crafted feature matrix X → machine learning models

Graph & network applications:

  • Node label inference;
  • Link prediction;
  • User behavior modeling;
  • ...

x_jk: node v_j's k-th feature, e.g., v_j's PageRank value
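As a concrete illustration of this paradigm, here is a minimal sketch that assembles a hand-crafted feature matrix X for a toy graph (the 4-node graph is hypothetical, and the two features, degree and PageRank, are illustrative choices; only PageRank is named on the slide):

```python
import numpy as np

# Toy undirected graph as an adjacency matrix A (hypothetical 4-node example).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

def pagerank(A, d=0.85, iters=100):
    """Power iteration for PageRank on the row-normalized adjacency matrix."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (P.T @ r)
    return r

# Hand-crafted feature matrix X: one row per node, one column per feature.
X = np.column_stack([A.sum(axis=1),    # feature 1: degree
                     pagerank(A)])     # feature 2: PageRank value
```

X (here 4 nodes x 2 features) is then fed to downstream machine learning models, which is exactly the pipeline this slide depicts.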

SLIDE 4

Representation Learning for Networks

feature learning → latent feature matrix Z → machine learning models

Graph & network applications:

  • Node label inference;
  • Node clustering;
  • Link prediction;
  • ...

  • Input: a network G = (V, E)
  • Output: Z ∈ R^(|V| × k), k ≪ |V|, a k-dim vector z_v for each node v.
SLIDE 5

π‘₯𝑗 π‘₯π‘—βˆ’2 π‘₯π‘—βˆ’1 π‘₯𝑗+1 π‘₯𝑗+2

Network Embedding: Random Walk + Skip-Gram

Perozzi et al. DeepWalk: Online learning of social representations. In KDD '14, pp. 701-710.

  • sentences in NLP
  • vertex-paths in Networks

skip-gram (word2vec)
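The "vertex paths as sentences" analogy can be sketched in a few lines of plain Python. The toy graph, walk count, and walk length below are illustrative assumptions, not DeepWalk's defaults:

```python
import random

# Toy graph as an adjacency list (hypothetical example).
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

def random_walk(graph, start, length, rng):
    """One truncated random walk: the network analogue of an NLP sentence."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def build_corpus(graph, walks_per_node=10, walk_length=8, seed=0):
    """Treat walks as sentences; each node id plays the role of a word."""
    rng = random.Random(seed)
    corpus = []
    for node in graph:
        for _ in range(walks_per_node):
            corpus.append(random_walk(graph, node, walk_length, rng))
    return corpus

corpus = build_corpus(graph)
```

The resulting corpus can then be fed to a skip-gram model (word2vec) to produce one embedding vector per node, which is the DeepWalk recipe this slide describes.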

SLIDE 6

Random Walk Strategies

  • Random Walk

– DeepWalk (walk length > 1)
– LINE (walk length = 1)

  • Biased Random Walk
  • 2nd order Random Walk

– node2vec

  • Metapath guided Random Walk

– metapath2vec
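A metapath2vec-style walk differs from a plain random walk only in that each step must land on a node of the type prescribed by the meta-path. A hedged sketch on a tiny hypothetical academic graph (node names, types, and the helper below are all illustrative, not the paper's code):

```python
import random

# Hypothetical heterogeneous academic graph: node -> (type, neighbors).
# Types: "A" = author, "P" = paper, "V" = venue.
nodes = {
    "a1": ("A", ["p1", "p2"]), "a2": ("A", ["p2"]),
    "p1": ("P", ["a1", "v1"]), "p2": ("P", ["a1", "a2", "v1"]),
    "v1": ("V", ["p1", "p2"]),
}

def metapath_walk(nodes, start, metapath, length, rng):
    """Walk that only follows neighbors whose type matches the meta-path.

    metapath is a cyclic type pattern, e.g. "APVPA" for
    Author-Paper-Venue-Paper-Author walks (metapath2vec style).
    """
    pattern = metapath[:-1]              # last type repeats the first
    assert nodes[start][0] == pattern[0]
    walk = [start]
    for i in range(1, length):
        want = pattern[i % len(pattern)]
        cands = [n for n in nodes[walk[-1]][1] if nodes[n][0] == want]
        if not cands:
            break                        # dead end: no neighbor of that type
        walk.append(rng.choice(cands))
    return walk

walk = metapath_walk(nodes, "a1", "APVPA", 9, random.Random(0))
```

The walks are then fed to skip-gram exactly as in DeepWalk; the meta-path constraint is what injects the heterogeneous semantics.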

SLIDE 7

Application: Embedding Heterogeneous Academic Graph

Microsoft Academic Graph

metapath2vec

  • https://academic.microsoft.com/
  • https://www.openacademic.ai/oag/
  • metapath2vec: scalable representation learning for heterogeneous networks. In KDD 2017.
SLIDE 8

Application 1: Related Venues

  • https://academic.microsoft.com/
  • https://www.openacademic.ai/oag/
  • metapath2vec: scalable representation learning for heterogeneous networks. In KDD 2017.
SLIDE 9

(example similarity results: Harvard, Stanford, Columbia, Yale, UChicago, Johns Hopkins, MIT, CMU, Microsoft, Google, AT&T Labs, Facebook)

Application 2: Similarity Search (Institution)

  • https://academic.microsoft.com/
  • https://www.openacademic.ai/oag/
  • metapath2vec: scalable representation learning for heterogeneous networks. In KDD 2017.
SLIDE 10

Network Embedding

Input: adjacency matrix A → random walk → skip-gram → Output: vectors Z

  • Random Walk

– DeepWalk (walk length > 1)
– LINE (walk length = 1)

  • Biased Random Walk
  • 2nd order Random Walk

– node2vec

  • Metapath guided Random Walk

– metapath2vec

SLIDE 11

Unifying DeepWalk, LINE, PTE, & node2vec as Matrix Factorization

  • 1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18.
  • DeepWalk
  • LINE
  • PTE
  • node2vec

π‘€π‘π‘š 𝐻 = ෍

𝑗

෍

π‘˜

π΅π‘—π‘˜ 𝑩 Adjacency matrix 𝑬 Degree matrix b: #negative samples T: context window size

SLIDE 12

π‘₯𝑗 π‘₯π‘—βˆ’2 π‘₯π‘—βˆ’1 π‘₯𝑗+1 π‘₯𝑗+2

log( #(w, c) · |D| / (b · #(w) · #(c)) )


𝐻 = (π‘Š, 𝐹)

  • Adjacency matrix 𝑩
  • Degree matrix 𝑬
  • Volume of 𝐻: π‘€π‘π‘š 𝐻

Levy and Goldberg. Neural word embeddings as implicit matrix factorization. In NIPS 2014

  • (π‘₯, 𝑑): co-occurrence of w & c
  • (π‘₯): occurrence of node w
  • (𝑑): occurrence of context c
  • 𝒠: nodeβˆ’context pair (w, c) multiβˆ’set
  • |𝒠|: number of node-context pairs

Understanding Random Walk + Skip Gram
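All the quantities in the Levy-Goldberg matrix entry can be counted directly from a walk corpus. A minimal sketch (the toy walks, window size T, and negative-sample count b are illustrative assumptions):

```python
import math
from collections import Counter

# Toy node-context multiset D built from walks with context window T = 2.
walks = [[0, 1, 2, 1, 3], [2, 0, 1, 3, 1]]
T, b = 2, 1   # context window size, number of negative samples

pairs = Counter()                         # #(w, c)
for walk in walks:
    for i, w in enumerate(walk):
        for j in range(max(0, i - T), min(len(walk), i + T + 1)):
            if i != j:
                pairs[(w, walk[j])] += 1

n_pairs = sum(pairs.values())             # |D|
occ_w, occ_c = Counter(), Counter()       # #(w), #(c)
for (w, c), k in pairs.items():
    occ_w[w] += k
    occ_c[c] += k

def shifted_pmi(w, c):
    """log( #(w,c) * |D| / (b * #(w) * #(c)) ): the matrix entry that
    skip-gram with negative sampling implicitly factorizes."""
    return math.log(pairs[(w, c)] * n_pairs / (b * occ_w[w] * occ_c[c]))
```

Entries of this shifted-PMI matrix are exactly what the slide's formula denotes; DeepWalk's contribution is that the walks generating D come from the graph.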

SLIDE 13

Understanding Random Walk + Skip Gram

log( #(w, c) · |D| / (b · #(w) · #(c)) )

  • (π‘₯, 𝑑): co-occurrence of w & c
  • (π‘₯): occurrence of node w
  • (𝑑): occurrence of context c
  • 𝒠: nodeβˆ’context pair (w, c) multiβˆ’set
  • |𝒠|: number of node-context pairs
SLIDE 14

Understanding Random Walk + Skip Gram

  • Partition the multiset D into sub-multisets according to the way in which each node and its context appear in a random walk node sequence.
  • More formally, for r = 1, 2, ⋯, T, we define sub-multisets of D by the direction and distance r between node and context.

Distinguish direction and distance

log( #(w, c) · |D| / (b · #(w) · #(c)) )

  • (π‘₯, 𝑑): co-occurrence of w & c
  • (π‘₯): occurrence of node w
  • (𝑑): occurrence of context c
  • 𝒠: nodeβˆ’context pair (w, c) multiβˆ’set
  • |𝒠|: number of node-context pairs
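The partition step above can be sketched directly on toy walks (walks and window size are illustrative): each pair is bucketed by the distance r at which the context was sampled, with left and right contexts counted separately.

```python
from collections import Counter

# Toy walks; context window T = 2, as in the earlier counting sketch.
walks = [[0, 1, 2, 1, 3], [2, 0, 1, 3, 1]]
T = 2

# One sub-multiset per distance r = 1, ..., T.
sub_multisets = {r: Counter() for r in range(1, T + 1)}
for walk in walks:
    for i, w in enumerate(walk):
        for r in range(1, T + 1):
            if i - r >= 0:                 # context r steps to the left
                sub_multisets[r][(w, walk[i - r])] += 1
            if i + r < len(walk):          # context r steps to the right
                sub_multisets[r][(w, walk[i + r])] += 1

# The union over all r recovers the full multiset D.
total = sum(sum(c.values()) for c in sub_multisets.values())
```

This bookkeeping by direction and distance is what lets the WSDM'18 analysis take each sub-multiset to its closed form as the walk length grows.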
SLIDE 15

Understanding Random Walk + Skip Gram

the length of the random walk L → ∞

  • (π‘₯, 𝑑): co-occurrence of w & c
  • 𝒠: (w, c) multiβˆ’set
SLIDE 16

Understanding Random Walk + Skip Gram

the length of the random walk L → ∞

SLIDE 17

π‘₯𝑗 π‘₯π‘—βˆ’2 π‘₯π‘—βˆ’1 π‘₯𝑗+1 π‘₯𝑗+2

DeepWalk is asymptotically and implicitly factorizing

log( vol(G)/(bT) · (Σ_{r=1}^T (D^{-1}A)^r) · D^{-1} )

1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18.

Understanding Random Walk + Skip Gram

π‘€π‘π‘š 𝐻 = ෍

𝑗

෍

π‘˜

π΅π‘—π‘˜ 𝑩 Adjacency matrix 𝑬 Degree matrix b: #negative samples T: context window size

SLIDE 18

Unifying DeepWalk, LINE, PTE, & node2vec as Matrix Factorization

Qiu et al. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In WSDM'18 (the most cited paper of WSDM'18 as of May 2019).

  • DeepWalk
  • LINE
  • PTE
  • node2vec
SLIDE 19

NetMF: explicitly factorizing the DeepWalk matrix

π‘₯𝑗 π‘₯π‘—βˆ’2 π‘₯π‘—βˆ’1 π‘₯𝑗+1 π‘₯𝑗+2

DeepWalk is asymptotically and implicitly factorizing

log( vol(G)/(bT) · (Σ_{r=1}^T (D^{-1}A)^r) · D^{-1} )

1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18.

Matrix Factorization

SLIDE 20

The NetMF algorithm:

1. Construction: build the (dense) DeepWalk matrix M
2. Factorization: factorize the element-wise truncated logarithm of M (e.g., via SVD)

1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18.
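A minimal end-to-end sketch of the two NetMF steps on a toy graph, assuming the element-wise truncated logarithm log(max(·, 1)) and a rank-d SVD (graph and hyperparameters are illustrative; the real algorithm adds approximations for scale):

```python
import numpy as np

# Toy undirected graph (hypothetical).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
T, b, d = 3, 1, 2                       # window, #negatives, embedding dim
D_inv = np.diag(1.0 / A.sum(axis=1))
P = D_inv @ A

# 1. Construction: the (dense) DeepWalk matrix with an element-wise
#    truncated logarithm, log(max(x, 1)), so every entry is well defined.
M = A.sum() / (b * T) * sum(np.linalg.matrix_power(P, r)
                            for r in range(1, T + 1)) @ D_inv
logM = np.log(np.maximum(M, 1.0))

# 2. Factorization: rank-d SVD; embeddings are U_d * sqrt(S_d).
U, S, _ = np.linalg.svd(logM)
Z = U[:, :d] * np.sqrt(S[:d])           # one d-dim vector per node
```

The truncated log keeps the matrix non-negative and finite; the SVD step is the "explicit factorization" the results slides compare against DeepWalk and LINE.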

SLIDE 21

Results

  • Predictive performance on varying the ratio of training data;
  • The x-axis represents the ratio of labeled data (%)

1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18.

SLIDE 22

Results

1. Qiu et al. Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM’18.

Explicit matrix factorization (NetMF) offers performance gains over implicit matrix factorization (DeepWalk & LINE)

SLIDE 23

Network Embedding

Input: adjacency matrix A
  • DeepWalk, LINE, node2vec, metapath2vec: random walk + skip-gram
  • NetMF: M = g(A), then (dense) matrix factorization
Output: vectors Z

Incorporate network structures A into the similarity matrix M, and then factorize M.

SLIDE 24

Challenges

NetMF is not practical for very large networks: the constructed matrix M is dense.

SLIDE 25

NetMF

How can we solve this issue?

1. Construction 2. Factorization

  • 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019


SLIDE 26

NetSMF: Sparse

How can we solve this issue?

1. Sparse Construction 2. Sparse Factorization

  • 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019


SLIDE 27

Sparsify M

For a random-walk matrix polynomial of the form Σ_{r=1}^T α_r D (D^{-1}A)^r, where Σ_{r=1}^T α_r = 1 and the coefficients α_r are non-negative, one can construct a (1+ε)-spectral sparsifier M̃ with O(n log n / ε²) non-zeros in nearly linear time for undirected graphs.

  • Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng, Efficient Sampling for Gaussian Graphical Models via Spectral Sparsification, COLT 2015.
  • Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Spectral sparsification of random-walk matrix polynomials. arXiv:1502.03496.
SLIDE 28

Sparsify M

For a random-walk matrix polynomial of the form Σ_{r=1}^T α_r D (D^{-1}A)^r, where Σ_{r=1}^T α_r = 1 and the coefficients α_r are non-negative, one can construct a (1+ε)-spectral sparsifier M̃ with O(n log n / ε²) non-zeros in nearly linear time.

  • 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019


SLIDE 29

NetSMF: Sparse

Factorize the constructed sparse matrix

  • 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019
SLIDE 30

NetSMF: bounded approximation error

(the approximation error between M and the sparsified M̃ is bounded)

  • 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019
SLIDE 31

#non-zeros: ~4.5 quadrillion → 45 billion

  • 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019
SLIDE 32
  • 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019
SLIDE 33
  • 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019

Effectiveness:

  • NetSMF (sparse MF) ≈ NetMF (explicit MF) > DeepWalk/LINE (implicit MF)

Efficiency:

  • Sparse MF can handle billion-scale network embedding
SLIDE 34

Embedding Dimension?

  • 1. Qiu et al. NetSMF: Network embedding as sparse matrix factorization. In WWW 2019
SLIDE 35

Network Embedding

Input: adjacency matrix A
  • DeepWalk, LINE, node2vec, metapath2vec: random walk + skip-gram
  • NetMF: M = g(A), then (dense) matrix factorization
  • NetSMF: sparsify M, then (sparse) matrix factorization
Output: vectors Z

Incorporate network structures A into the similarity matrix M, and then factorize M.

SLIDE 36

ProNE: faster and more scalable network embedding

  • 1. Zhang et al. ProNE: Fast and Scalable Network Representation Learning. In IJCAI 2019
SLIDE 37

Embedding enhancement via spectral propagation

𝑆𝑒 ← πΈβˆ’1𝐡(π½π‘œ βˆ’ ΰ·¨ 𝑀) 𝑆𝑒 is the spectral filter of 𝑀 = π½π‘œ βˆ’ πΈβˆ’1𝐡 πΈβˆ’1𝐡(π½π‘œ βˆ’ ΰ·¨ 𝑀) is πΈβˆ’1𝐡 modulated by the filter in the spectrum

  • 1. Zhang et al. ProNE: Fast and Scalable Network Representation Learning. In IJCAI 2019

The idea of Graph Neural Networks
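A hedged sketch of this propagation step on a toy graph. Assumptions: the graph is tiny, so the spectral filter is applied via an explicit eigendecomposition of the symmetric normalized Laplacian (the actual ProNE paper avoids this cost with a Chebyshev expansion of L = I − D^{-1}A), the band-pass kernel below is an illustrative choice, and R is a random stand-in for a pre-computed embedding:

```python
import numpy as np

# Toy undirected graph (hypothetical).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
n, d = A.shape[0], 2
rng = np.random.default_rng(0)
R = rng.normal(size=(n, d))                # stand-in initial embeddings

deg = A.sum(axis=1)
D_inv = np.diag(1.0 / deg)
# Symmetric normalized Laplacian, so eigh gives a clean eigendecomposition.
D_half = np.diag(deg ** -0.5)
L = np.eye(n) - D_half @ A @ D_half
lam, U = np.linalg.eigh(L)

def spectral_filter(lam, mu=0.5, theta=1.0):
    """A band-pass kernel in the Laplacian spectrum (illustrative choice)."""
    return np.exp(-theta * ((lam - mu) ** 2 - 1))

L_filt = U @ np.diag(spectral_filter(lam)) @ U.T   # filtered Laplacian L-tilde
# Propagation step: R <- D^{-1} A (I_n - L_filtered) R
R_enh = D_inv @ A @ (np.eye(n) - L_filt) @ R
```

The enhanced embeddings mix each node's vector with spectrally filtered neighborhood information, which is the graph-neural-network flavor the slide alludes to.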

SLIDE 38

Performance

ProNE on a single thread offers 10-400x speedups over baselines on 20 threads (example runtimes from the figure: 19 hours vs. 98 mins vs. 10 mins).

  • 1. Zhang et al. ProNE: Fast and Scalable Network Representation Learning. In IJCAI 2019

ProNE embeds 100,000,000 nodes on a single thread in 29 hours, while outperforming the baselines.

SLIDE 39

A general embedding enhancement framework

  • 1. Zhang et al. ProNE: Fast and Scalable Network Representation Learning. In IJCAI 2019
SLIDE 40

Network Embedding

Input: adjacency matrix A
  • DeepWalk, LINE, node2vec, metapath2vec: random walk + skip-gram
  • NetMF: M = g(A), then (dense) matrix factorization
  • NetSMF: sparsify M, then (sparse) matrix factorization
  • ProNE: (sparse) matrix factorization, then Z = g(Z′)
Output: vectors Z

ProNE: factorize A first, and then incorporate network structures via spectral propagation.

SLIDE 41

Network Embedding

Input: adjacency matrix A
  • DeepWalk, LINE, node2vec, metapath2vec: random walk + skip-gram
  • NetMF: theory & better accuracy (M = g(A), dense matrix factorization)
  • NetSMF: handles billion-scale graphs (sparsify M, sparse matrix factorization)
  • ProNE: 10-400x speedups (sparse matrix factorization, then Z = g(Z′))
Output: vectors Z

SLIDE 42

References

  • 1. Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Chi Wang, Kuansan Wang, and Jie Tang. NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization. WWW 2019.
  • 2. Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, and Ming Ding. ProNE: Fast and Scalable Network Representation Learning. IJCAI 2019.
  • 3. Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. WSDM 2018.
  • 4. Yuxiao Dong, Nitesh V. Chawla, Ananthram Swami. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. KDD 2017.
  • 5. Fanjin Zhang, et al. OAG: Toward Linking Large-scale Heterogeneous Entity Graphs. ACM KDD 2019.
  • 6. Wu, Shi, Dong, Huang, Chawla. Neural Tensor Decomposition. WSDM 2019.
SLIDE 43

Microsoft Academic Graph

https://academic.microsoft.com (as of Sep. 2019). The graph data is open!

  • 230 million authors
  • 25,570 institutions
  • 48,757 journals
  • 4,307 conferences
  • 664,862 fields of study
  • 228 million papers/patents/books/preprints

Coverage: 1800 - 2019

SLIDE 44

Thank you!

Papers & data & code available at https://ericdongyx.github.io/ ericdongyx@gmail.com Joint work with Jiezhong Qiu, Jie Zhang, Jie Tang (Tsinghua University) Hao Ma (MSR & Facebook AI) and Kuansan Wang (MSR)