

  1. NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization
Jiezhong Qiu, Tsinghua University. June 17, 2019.
Joint work with Yuxiao Dong (MSR), Hao Ma (Facebook AI), Jian Li (IIIS, Tsinghua), Chi Wang (MSR), Kuansan Wang (MSR), and Jie Tang (DCST, Tsinghua).

  2. Motivation and Problem Formulation
Problem formulation: given a network G = (V, E), the aim is to learn a function f : V → R^p that captures neighborhood similarity and community membership.
Applications:
◮ link prediction
◮ community detection
◮ node label classification
Figure 1: A toy example (figure from DeepWalk).

  3. Two Genres of Network Embedding Algorithms
◮ Local context methods: LINE, DeepWalk, node2vec, metapath2vec.
  ◮ Usually formulated as a skip-gram-like problem and optimized with SGD.
◮ Global matrix factorization methods: NetMF, GraRep, HOPE.
  ◮ Leverage global statistics of the input network.
  ◮ Not necessarily a gradient-based optimization problem.
  ◮ Usually require explicit construction of the matrix to be factorized.

  4. Notations
Consider an undirected weighted graph G = (V, E), where |V| = n and |E| = m.
◮ Adjacency matrix $A \in \mathbb{R}^{n \times n}_{+}$:
$$A_{i,j} = \begin{cases} a_{i,j} > 0 & (i,j) \in E \\ 0 & (i,j) \notin E \end{cases}$$
◮ Degree matrix $D = \operatorname{diag}(d_1, \cdots, d_n)$, where $d_i$ is the generalized degree of vertex $i$.
◮ Volume of the graph G: $\operatorname{vol}(G) = \sum_i \sum_j A_{i,j}$.
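As a concrete illustration of these notations, here is a minimal NumPy sketch on a toy 4-vertex graph (the graph is our own example, not one from the talk):

```python
import numpy as np

# Toy undirected graph with n = 4 vertices (hypothetical example).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # adjacency matrix A

d = A.sum(axis=1)   # generalized degrees d_1, ..., d_n
D = np.diag(d)      # degree matrix D = diag(d_1, ..., d_n)
vol_G = A.sum()     # vol(G) = sum_i sum_j A_ij

print(d)      # [2. 2. 3. 1.]
print(vol_G)  # 8.0
```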

  5. Contents
◮ Revisit DeepWalk and NetMF
◮ NetSMF: Network Embedding as Sparse Matrix Factorization
◮ Experimental Results

  6. DeepWalk and NetMF
[Pipeline figure] Input G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding.

  7-8. DeepWalk and NetMF
[Pipeline figure] Input G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding.
Levy & Goldberg (NIPS '14) notation:
◮ b: number of negative samples
◮ #(w, c): co-occurrence count of word w and context c
◮ #(w): occurrence count of word w
◮ #(c): occurrence count of context c
◮ |D|: total number of word-context pairs
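Levy & Goldberg's result is that skip-gram with negative sampling implicitly factorizes the shifted PMI matrix $\log\frac{\#(w,c)\cdot|\mathcal{D}|}{\#(w)\cdot\#(c)} - \log b$. A minimal sketch of that matrix, computed from a small hypothetical co-occurrence count matrix C (zero counts produce $-\infty$ entries, which is what motivates truncation later on):

```python
import numpy as np

# Hypothetical co-occurrence counts C[w, c] = #(w, c).
C = np.array([[10., 2., 0.],
              [ 2., 8., 3.],
              [ 0., 3., 6.]])
b = 5                                 # number of negative samples

n_w = C.sum(axis=1, keepdims=True)    # #(w), row sums
n_c = C.sum(axis=0, keepdims=True)    # #(c), column sums
D_total = C.sum()                     # |D|, total number of pairs

with np.errstate(divide="ignore"):
    # Shifted PMI: log( #(w,c) * |D| / (#(w) * #(c)) ) - log b
    spmi = np.log(C * D_total / (n_w * n_c)) - np.log(b)
```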


  9. DeepWalk and NetMF
[Figure build] The Levy & Goldberg (NIPS '14) view annotated with network quantities: the adjacency matrix, the degree matrix, and b, the number of negative samples.

  10. DeepWalk and NetMF
[Figure build] Conclusion of the Levy & Goldberg (NIPS '14) analysis: the skip-gram step amounts to an implicit matrix factorization, which NetMF performs explicitly.

  11-12. Contents
◮ Revisit DeepWalk and NetMF
◮ NetSMF: Network Embedding as Sparse Matrix Factorization
◮ Experimental Results


  13-15. Computation Challenges of NetMF
For small-world networks,
$$\frac{\operatorname{vol}(G)}{bT}\left(\underbrace{\sum_{r=1}^{T}\left(D^{-1}A\right)^{r}}_{\text{matrix power}}\right)D^{-1} \quad \text{is always a dense matrix.}$$
Why?
◮ In a small-world network, every pair of vertices (i, j) can reach each other within a small number of hops.
◮ This makes the corresponding matrix entry a positive value.
Idea
◮ Sparse matrices are easier to handle.
◮ Can we construct a matrix that is sparse but still 'good enough'?
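To make the density issue concrete, here is a sketch (our own NumPy code, dense arithmetic, only feasible on small graphs) of the matrix above:

```python
import numpy as np

def netmf_matrix_dense(A, T=10, b=1):
    """Dense construction of vol(G)/(bT) * (sum_{r=1}^T (D^{-1} A)^r) D^{-1}."""
    n = A.shape[0]
    d = A.sum(axis=1)
    vol_G = A.sum()
    P = A / d[:, None]                  # random-walk matrix D^{-1} A
    S = np.zeros((n, n))
    P_r = np.eye(n)
    for _ in range(T):
        P_r = P_r @ P                   # matrix power (D^{-1} A)^r
        S += P_r
    return (vol_G / (b * T)) * (S / d[None, :])   # right-multiply by D^{-1}

# On a small-world graph, (D^{-1}A)^r has a positive entry wherever j is
# reachable from i in r hops, so for moderate T nearly every entry of the
# result is positive: the matrix is dense even though A is sparse.
```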

  16. Observation
Definition: For $\sum_{r=1}^{T}\alpha_r = 1$ with $\alpha_r$ non-negative,
$$L = D - \sum_{r=1}^{T} \alpha_r D \left(D^{-1}A\right)^{r} \qquad (1)$$
is a T-degree random-walk matrix polynomial.
Observation: For $\alpha_1 = \cdots = \alpha_T = \frac{1}{T}$,
$$\log^{\circ}\left(\frac{\operatorname{vol}(G)}{bT}\sum_{r=1}^{T}\left(D^{-1}A\right)^{r} D^{-1}\right) = \log^{\circ}\left(\frac{\operatorname{vol}(G)}{b}\, D^{-1}(D-L)D^{-1}\right) \approx \log^{\circ}\left(\frac{\operatorname{vol}(G)}{b}\, D^{-1}(D-\tilde{L})D^{-1}\right)$$
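The equality inside the log can be verified numerically; a quick sanity-check sketch on a triangle graph:

```python
import numpy as np

A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
d = A.sum(axis=1)
D, D_inv = np.diag(d), np.diag(1.0 / d)
T = 3
P = D_inv @ A                         # D^{-1} A

# L = D - sum_r (1/T) * D (D^{-1}A)^r, i.e. alpha_r = 1/T in Eq. (1)
L = D - sum((1.0 / T) * D @ np.linalg.matrix_power(P, r) for r in range(1, T + 1))

lhs = (1.0 / T) * sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1)) @ D_inv
rhs = D_inv @ (D - L) @ D_inv
assert np.allclose(lhs, rhs)          # the two matrices inside log agree exactly
```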

  17. Random-Walk Matrix Polynomial Sparsification
Theorem [CCL+15]: For a random-walk matrix polynomial $L = D - \sum_{r=1}^{T}\alpha_r D (D^{-1}A)^{r}$, one can construct, in time $O(T^2 m \epsilon^{-2} \log^2 n)$, a $(1+\epsilon)$-spectral sparsifier $\tilde{L}$ with $O(n \log n\, \epsilon^{-2})$ non-zeros. For unweighted graphs, the time complexity can be reduced to $O(T^2 m \epsilon^{-2} \log n)$.

  18. NetSMF — Algorithm
The proposed NetSMF algorithm consists of three steps:
◮ Construct a random-walk matrix polynomial sparsifier $\tilde{L}$ by calling the PathSampling algorithm proposed in [CCL+15].
◮ Construct a NetMF matrix sparsifier:
$$\operatorname{trunc\_log}^{\circ}\left(\frac{\operatorname{vol}(G)}{b}\, D^{-1}(D-\tilde{L})D^{-1}\right)$$
◮ Apply truncated randomized singular value decomposition; a sketch of steps 2 and 3 follows below.
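A minimal sketch of steps 2 and 3, assuming the sparsified Laplacian from step 1 is already available as a SciPy sparse matrix; trunc_log is taken elementwise as max(log x, 0), and the helper name netsmf_embed is ours:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils.extmath import randomized_svd

def netsmf_embed(L_tilde, d, vol_G, b=1, dim=128):
    """Steps 2-3: build the NetMF matrix sparsifier, then factorize it."""
    D = sp.diags(d)
    D_inv = sp.diags(1.0 / d)
    # Sparse NetMF matrix: (vol(G)/b) * D^{-1} (D - L~) D^{-1}
    M = (vol_G / b) * (D_inv @ (D - L_tilde) @ D_inv)
    # trunc_log applied elementwise to the stored (sparse) entries:
    # trunc_log(x) = max(log x, 0), so entries <= 1 vanish.
    M = M.tocoo()
    vals = np.log(np.maximum(M.data, 1.0))
    M_log = sp.coo_matrix((vals, (M.row, M.col)), shape=M.shape).tocsr()
    # Truncated randomized SVD; embedding is U_d * sqrt(Sigma_d).
    U, S, _ = randomized_svd(M_log, n_components=dim)
    return U * np.sqrt(S)
```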

  19. Algorithm Details
PathSampling:
◮ Sample an edge (u, v) from the edge set.
◮ Start a very short random walk from u, arriving at u'.
◮ Start a very short random walk from v, arriving at v'.
◮ Record the vertex pair (u', v').
Randomized SVD:
◮ Project the original matrix onto a low-dimensional space with a Gaussian random matrix.
◮ Factorize the resulting small projected matrix.
A much-simplified sketch of the sampling loop follows.
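This sketch is our own simplification: unweighted graph, uniform edge sampling, and unit weight for each recorded pair, whereas the actual PathSampling of [CCL+15] assigns each sampled pair a path-dependent weight.

```python
import random
from collections import defaultdict

def short_walk(adj, start, steps):
    """Take `steps` uniform random-walk steps from `start`."""
    cur = start
    for _ in range(steps):
        cur = random.choice(adj[cur])
    return cur

def path_sampling(adj, T, num_samples):
    """adj: dict vertex -> list of neighbors. Returns pair counts that
    approximate the non-zero pattern of the sparsifier."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    counts = defaultdict(float)
    for _ in range(num_samples):
        u, v = random.choice(edges)      # sample an edge (u, v)
        r = random.randint(1, T)         # degree of the polynomial term
        k = random.randint(1, r)         # position of (u, v) on a length-r path
        u_p = short_walk(adj, u, k - 1)  # very short walk from u, arrive at u'
        v_p = short_walk(adj, v, r - k)  # very short walk from v, arrive at v'
        counts[(u_p, v_p)] += 1.0        # record the vertex pair (u', v')
    return counts
```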

  20. NetSMF — System Design
Figure 2: The system design of NetSMF.

  21. Contents
◮ Revisit DeepWalk and NetMF
◮ NetSMF: Network Embedding as Sparse Matrix Factorization
◮ Experimental Results

  22. Setup
Label classification:
◮ Datasets: BlogCatalog, PPI, Flickr, YouTube, OAG.
◮ Classifier: logistic regression.
◮ Methods: NetSMF (T = 10), NetMF (T = 10), DeepWalk, LINE.
Table 1: Statistics of datasets.
Dataset      |V|         |E|          #Labels
BlogCatalog  10,312      333,983      39
PPI          3,890       76,584       50
Flickr       80,513      5,899,882    195
YouTube      1,138,499   2,990,443    47
OAG          67,768,244  895,368,962  19
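The evaluation protocol can be sketched with scikit-learn: a one-vs-rest logistic regression on the learned embeddings, scored by Micro- and Macro-F1. X and Y below are placeholders for the embedding matrix and the binary label-indicator matrix:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

def evaluate(X, Y, train_ratio=0.5, seed=0):
    """Label classification with one-vs-rest logistic regression."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        X, Y, train_size=train_ratio, random_state=seed)
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X_tr, Y_tr)
    pred = clf.predict(X_te)
    return (f1_score(Y_te, pred, average="micro"),
            f1_score(Y_te, pred, average="macro"))
```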

  23. Experimental Results
Figure 3: Predictive performance on varying the ratio of training data, comparing DeepWalk, LINE, node2vec, NetMF, and NetSMF on BlogCatalog, PPI, Flickr, YouTube, and OAG. The x-axis represents the ratio of labeled data (%); the y-axes in the top and bottom rows denote the Micro-F1 and Macro-F1 scores, respectively.

  24. Running Time
Table 2: Running time.
             LINE       DeepWalk   node2vec   NetMF     NetSMF
BlogCatalog  40 mins    12 mins    56 mins    2 mins    13 mins
PPI          41 mins    4 mins     4 mins     16 secs   10 secs
Flickr       42 mins    2.2 hours  21 hours   2 hours   48 mins
YouTube      46 mins    1 day      4 days     ×         4.1 hours
OAG          2.6 hours  –          –          ×         24 hours

  25. Conclusion and Future Work
We propose NetSMF, a scalable, efficient, and effective network embedding algorithm.
Future work:
◮ A distributed-memory implementation.
◮ Extensions to directed, dynamic, and heterogeneous graphs.

  26. Thanks
◮ Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec (WSDM '18)
◮ NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization (WebConf '19)
Code for NetMF available at github.com/xptree/NetMF
Code for NetSMF available at github.com/xptree/NetSMF
Q&A

  27. On the Large-Dimensionality Assumption of [LG14]
Recall the objective of the skip-gram model:
$$\min_{X, Y}\ \mathcal{L}(X, Y)$$
where
$$\mathcal{L}(X, Y) = \sum_{w}\sum_{c}\left(\frac{\#(w,c)}{|\mathcal{D}|}\log g(x_w^{\top} y_c) + b\,\frac{\#(w)}{|\mathcal{D}|}\frac{\#(c)}{|\mathcal{D}|}\log g(-x_w^{\top} y_c)\right)$$
Theorem: For DeepWalk, when the length of the random walk $L \to \infty$,
$$\frac{\#(w,c)}{|\mathcal{D}|} \xrightarrow{p} \frac{1}{2T}\sum_{r=1}^{T}\left(\frac{d_w}{\operatorname{vol}(G)}(P^{r})_{w,c} + \frac{d_c}{\operatorname{vol}(G)}(P^{r})_{c,w}\right)$$
and
$$\frac{\#(w)}{|\mathcal{D}|} \xrightarrow{p} \frac{d_w}{\operatorname{vol}(G)}, \qquad \frac{\#(c)}{|\mathcal{D}|} \xrightarrow{p} \frac{d_c}{\operatorname{vol}(G)}.$$

  28. NetSMF — Approximation Error
Denote $M = D^{-1}(D-L)D^{-1}$ in $\operatorname{trunc\_log}^{\circ}\left(\frac{\operatorname{vol}(G)}{b}\, D^{-1}(D-\tilde{L})D^{-1}\right)$, and let $\tilde{M}$ be the sparsifier we constructed.
Theorem: The singular values of $\tilde{M} - M$ satisfy
$$\sigma_i(\tilde{M} - M) \le \frac{4\epsilon}{\sqrt{d_i\, d_{\min}}}, \quad \forall i \in [n].$$
Theorem: Let $\|\cdot\|_F$ be the matrix Frobenius norm. Then
$$\left\|\operatorname{trunc\_log}^{\circ}\left(\frac{\operatorname{vol}(G)}{b}\tilde{M}\right) - \operatorname{trunc\_log}^{\circ}\left(\frac{\operatorname{vol}(G)}{b}M\right)\right\|_F \le \frac{4\epsilon \operatorname{vol}(G)}{b\sqrt{d_{\min}}}\sqrt{\sum_{i=1}^{n}\frac{1}{d_i}}.$$

  29. Spectrally Similar
Definition: Suppose $G = (V, E, A)$ and $\tilde{G} = (V, \tilde{E}, \tilde{A})$ are two weighted undirected networks. Let $L = D_G - A$ and $\tilde{L} = D_{\tilde{G}} - \tilde{A}$ be their Laplacian matrices, respectively. We say $G$ and $\tilde{G}$ are $(1+\epsilon)$-spectrally similar if
$$\forall x \in \mathbb{R}^{n}, \quad (1-\epsilon)\cdot x^{\top}\tilde{L}x \le x^{\top}Lx \le (1+\epsilon)\cdot x^{\top}\tilde{L}x.$$
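The definition quantifies over all x, so it cannot be verified by sampling alone, but random test vectors give a useful spot check. A sketch with dense NumPy Laplacians:

```python
import numpy as np

def spectral_similarity_spot_check(L, L_tilde, eps, trials=1000, seed=0):
    """Spot-check (1-eps) x'L~x <= x'Lx <= (1+eps) x'L~x on random vectors."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    for _ in range(trials):
        x = rng.standard_normal(n)
        q, q_t = x @ L @ x, x @ L_tilde @ x
        if not ((1 - eps) * q_t <= q <= (1 + eps) * q_t):
            return False
    return True
```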
