

  1. NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization
Jiezhong Qiu, Tsinghua University. June 17, 2019.
Joint work with Yuxiao Dong (MSR), Hao Ma (Facebook AI), Jian Li (IIIS, Tsinghua), Chi Wang (MSR), Kuansan Wang (MSR), and Jie Tang (DCST, Tsinghua).

  2. Motivation and Problem Formulation
Problem formulation: given a network G = (V, E), the aim is to learn a function f : V → R^p that captures neighborhood similarity and community membership.
Applications:
◮ link prediction
◮ community detection
◮ node label classification
Figure 1: A toy example (figure from DeepWalk).

  3. Two Genres of Network Embedding Algorithms
◮ Local context methods: LINE, DeepWalk, node2vec, metapath2vec.
  ◮ Usually formulated as a skip-gram-like problem and optimized with SGD.
◮ Global matrix factorization methods: NetMF, GraRep, HOPE.
  ◮ Leverage global statistics of the input network.
  ◮ Not necessarily a gradient-based optimization problem.
  ◮ Usually require explicit construction of the matrix to be factorized.

  4. Notations
Consider an undirected weighted graph G = (V, E), where |V| = n and |E| = m.
◮ Adjacency matrix $A \in \mathbb{R}^{n \times n}_{+}$:
$$A_{i,j} = \begin{cases} a_{i,j} > 0 & (i,j) \in E \\ 0 & (i,j) \notin E \end{cases}$$
◮ Degree matrix $D = \operatorname{diag}(d_1, \cdots, d_n)$, where $d_i$ is the generalized degree of vertex $i$.
◮ Volume of the graph G: $\operatorname{vol}(G) = \sum_i \sum_j A_{i,j}$.
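As a concrete illustration of these notations, here is a minimal NumPy sketch on a toy 4-vertex graph (the graph is our own example, not one from the talk):

```python
import numpy as np

# Toy undirected graph with n = 4 vertices (hypothetical example).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # adjacency matrix A

d = A.sum(axis=1)   # generalized degrees d_1, ..., d_n
D = np.diag(d)      # degree matrix D = diag(d_1, ..., d_n)
vol_G = A.sum()     # vol(G) = sum_i sum_j A_ij

print(d)      # [2. 2. 3. 1.]
print(vol_G)  # 8.0
```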

  5. Contents
◮ Revisit DeepWalk and NetMF
◮ NetSMF: Network Embedding as Sparse Matrix Factorization
◮ Experimental Results

  6. DeepWalk and NetMF
[Pipeline figure] Input G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding.

  7-8. DeepWalk and NetMF
[Pipeline figure] Input G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding.
Levy & Goldberg (NIPS '14) notation:
◮ b: number of negative samples
◮ #(w, c): co-occurrence count of word w and context c
◮ #(w): occurrence count of word w
◮ #(c): occurrence count of context c
◮ |D|: total number of word-context pairs
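Levy & Goldberg's result is that skip-gram with negative sampling implicitly factorizes the shifted PMI matrix $\log\frac{\#(w,c)\cdot|\mathcal{D}|}{\#(w)\cdot\#(c)} - \log b$. A minimal sketch of that matrix, computed from a small hypothetical co-occurrence count matrix C (zero counts produce $-\infty$ entries, which is what motivates truncation later on):

```python
import numpy as np

# Hypothetical co-occurrence counts C[w, c] = #(w, c).
C = np.array([[10., 2., 0.],
              [ 2., 8., 3.],
              [ 0., 3., 6.]])
b = 5                                 # number of negative samples

n_w = C.sum(axis=1, keepdims=True)    # #(w), row sums
n_c = C.sum(axis=0, keepdims=True)    # #(c), column sums
D_total = C.sum()                     # |D|, total number of pairs

with np.errstate(divide="ignore"):
    # Shifted PMI: log( #(w,c) * |D| / (#(w) * #(c)) ) - log b
    spmi = np.log(C * D_total / (n_w * n_c)) - np.log(b)
```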


  9. DeepWalk and NetMF
[Figure build] The Levy & Goldberg (NIPS '14) view annotated with network quantities: the adjacency matrix, the degree matrix, and b, the number of negative samples.

  10. DeepWalk and NetMF
[Figure build] Conclusion of the Levy & Goldberg (NIPS '14) analysis: the skip-gram step amounts to an implicit matrix factorization, which NetMF performs explicitly.

  11-12. Contents
◮ Revisit DeepWalk and NetMF
◮ NetSMF: Network Embedding as Sparse Matrix Factorization
◮ Experimental Results


  13-15. Computation Challenges of NetMF
For small-world networks,
$$\frac{\operatorname{vol}(G)}{bT}\left(\underbrace{\sum_{r=1}^{T}\left(D^{-1}A\right)^{r}}_{\text{matrix power}}\right)D^{-1} \quad \text{is always a dense matrix.}$$
Why?
◮ In a small-world network, every pair of vertices (i, j) can reach each other within a small number of hops.
◮ This makes the corresponding matrix entry a positive value.
Idea
◮ Sparse matrices are easier to handle.
◮ Can we construct a matrix that is sparse but still 'good enough'?
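To make the density issue concrete, here is a sketch (our own NumPy code, dense arithmetic, only feasible on small graphs) of the matrix above:

```python
import numpy as np

def netmf_matrix_dense(A, T=10, b=1):
    """Dense construction of vol(G)/(bT) * (sum_{r=1}^T (D^{-1} A)^r) D^{-1}."""
    n = A.shape[0]
    d = A.sum(axis=1)
    vol_G = A.sum()
    P = A / d[:, None]                  # random-walk matrix D^{-1} A
    S = np.zeros((n, n))
    P_r = np.eye(n)
    for _ in range(T):
        P_r = P_r @ P                   # matrix power (D^{-1} A)^r
        S += P_r
    return (vol_G / (b * T)) * (S / d[None, :])   # right-multiply by D^{-1}

# On a small-world graph, (D^{-1}A)^r has a positive entry wherever j is
# reachable from i in r hops, so for moderate T nearly every entry of the
# result is positive: the matrix is dense even though A is sparse.
```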

  16. Observation
Definition: For $\sum_{r=1}^{T}\alpha_r = 1$ with $\alpha_r$ non-negative,
$$L = D - \sum_{r=1}^{T} \alpha_r D \left(D^{-1}A\right)^{r} \qquad (1)$$
is a T-degree random-walk matrix polynomial.
Observation: For $\alpha_1 = \cdots = \alpha_T = \frac{1}{T}$,
$$\log^{\circ}\left(\frac{\operatorname{vol}(G)}{bT}\sum_{r=1}^{T}\left(D^{-1}A\right)^{r} D^{-1}\right) = \log^{\circ}\left(\frac{\operatorname{vol}(G)}{b}\, D^{-1}(D-L)D^{-1}\right) \approx \log^{\circ}\left(\frac{\operatorname{vol}(G)}{b}\, D^{-1}(D-\tilde{L})D^{-1}\right)$$
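The equality inside the log can be verified numerically; a quick sanity-check sketch on a triangle graph:

```python
import numpy as np

A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
d = A.sum(axis=1)
D, D_inv = np.diag(d), np.diag(1.0 / d)
T = 3
P = D_inv @ A                         # D^{-1} A

# L = D - sum_r (1/T) * D (D^{-1}A)^r, i.e. alpha_r = 1/T in Eq. (1)
L = D - sum((1.0 / T) * D @ np.linalg.matrix_power(P, r) for r in range(1, T + 1))

lhs = (1.0 / T) * sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1)) @ D_inv
rhs = D_inv @ (D - L) @ D_inv
assert np.allclose(lhs, rhs)          # the two matrices inside log agree exactly
```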

  17. Random-Walk Matrix Polynomial Sparsification
Theorem [CCL+15]: For a random-walk matrix polynomial $L = D - \sum_{r=1}^{T}\alpha_r D (D^{-1}A)^{r}$, one can construct, in time $O(T^2 m \epsilon^{-2} \log^2 n)$, a $(1+\epsilon)$-spectral sparsifier $\tilde{L}$ with $O(n \log n\, \epsilon^{-2})$ non-zeros. For unweighted graphs, the time complexity can be reduced to $O(T^2 m \epsilon^{-2} \log n)$.

  18. NetSMF — Algorithm
The proposed NetSMF algorithm consists of three steps:
◮ Construct a random-walk matrix polynomial sparsifier $\tilde{L}$ by calling the PathSampling algorithm proposed in [CCL+15].
◮ Construct a NetMF matrix sparsifier:
$$\operatorname{trunc\_log}^{\circ}\left(\frac{\operatorname{vol}(G)}{b}\, D^{-1}(D-\tilde{L})D^{-1}\right)$$
◮ Apply truncated randomized singular value decomposition; a sketch of steps 2 and 3 follows below.
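A minimal sketch of steps 2 and 3, assuming the sparsified Laplacian from step 1 is already available as a SciPy sparse matrix; trunc_log is taken elementwise as max(log x, 0), and the helper name netsmf_embed is ours:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils.extmath import randomized_svd

def netsmf_embed(L_tilde, d, vol_G, b=1, dim=128):
    """Steps 2-3: build the NetMF matrix sparsifier, then factorize it."""
    D = sp.diags(d)
    D_inv = sp.diags(1.0 / d)
    # Sparse NetMF matrix: (vol(G)/b) * D^{-1} (D - L~) D^{-1}
    M = (vol_G / b) * (D_inv @ (D - L_tilde) @ D_inv)
    # trunc_log applied elementwise to the stored (sparse) entries:
    # trunc_log(x) = max(log x, 0), so entries <= 1 vanish.
    M = M.tocoo()
    vals = np.log(np.maximum(M.data, 1.0))
    M_log = sp.coo_matrix((vals, (M.row, M.col)), shape=M.shape).tocsr()
    # Truncated randomized SVD; embedding is U_d * sqrt(Sigma_d).
    U, S, _ = randomized_svd(M_log, n_components=dim)
    return U * np.sqrt(S)
```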

  19. Algorithm Details
PathSampling:
◮ Sample an edge (u, v) from the edge set.
◮ Start a very short random walk from u, arriving at u'.
◮ Start a very short random walk from v, arriving at v'.
◮ Record the vertex pair (u', v').
Randomized SVD:
◮ Project the original matrix onto a low-dimensional space with a Gaussian random matrix.
◮ Factorize the resulting small projected matrix.
A much-simplified sketch of the sampling loop follows.
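This sketch is our own simplification: unweighted graph, uniform edge sampling, and unit weight for each recorded pair, whereas the actual PathSampling of [CCL+15] assigns each sampled pair a path-dependent weight.

```python
import random
from collections import defaultdict

def short_walk(adj, start, steps):
    """Take `steps` uniform random-walk steps from `start`."""
    cur = start
    for _ in range(steps):
        cur = random.choice(adj[cur])
    return cur

def path_sampling(adj, T, num_samples):
    """adj: dict vertex -> list of neighbors. Returns pair counts that
    approximate the non-zero pattern of the sparsifier."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    counts = defaultdict(float)
    for _ in range(num_samples):
        u, v = random.choice(edges)      # sample an edge (u, v)
        r = random.randint(1, T)         # degree of the polynomial term
        k = random.randint(1, r)         # position of (u, v) on a length-r path
        u_p = short_walk(adj, u, k - 1)  # very short walk from u, arrive at u'
        v_p = short_walk(adj, v, r - k)  # very short walk from v, arrive at v'
        counts[(u_p, v_p)] += 1.0        # record the vertex pair (u', v')
    return counts
```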

  20. NetSMF — System Design
Figure 2: The system design of NetSMF.

  21. Contents
◮ Revisit DeepWalk and NetMF
◮ NetSMF: Network Embedding as Sparse Matrix Factorization
◮ Experimental Results

  22. Setup
Label classification:
◮ Datasets: BlogCatalog, PPI, Flickr, YouTube, OAG.
◮ Classifier: logistic regression.
◮ Methods: NetSMF (T = 10), NetMF (T = 10), DeepWalk, LINE.
Table 1: Statistics of datasets.
Dataset      |V|         |E|          #Labels
BlogCatalog  10,312      333,983      39
PPI          3,890       76,584       50
Flickr       80,513      5,899,882    195
YouTube      1,138,499   2,990,443    47
OAG          67,768,244  895,368,962  19
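The evaluation protocol can be sketched with scikit-learn: a one-vs-rest logistic regression on the learned embeddings, scored by Micro- and Macro-F1. X and Y below are placeholders for the embedding matrix and the binary label-indicator matrix:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

def evaluate(X, Y, train_ratio=0.5, seed=0):
    """Label classification with one-vs-rest logistic regression."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        X, Y, train_size=train_ratio, random_state=seed)
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X_tr, Y_tr)
    pred = clf.predict(X_te)
    return (f1_score(Y_te, pred, average="micro"),
            f1_score(Y_te, pred, average="macro"))
```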

  23. Experimental Results
Figure 3: Predictive performance on varying the ratio of training data, comparing DeepWalk, LINE, node2vec, NetMF, and NetSMF on BlogCatalog, PPI, Flickr, YouTube, and OAG. The x-axis represents the ratio of labeled data (%); the y-axes in the top and bottom rows denote the Micro-F1 and Macro-F1 scores, respectively.

  24. Running Time
Table 2: Running time.
             LINE       DeepWalk   node2vec   NetMF     NetSMF
BlogCatalog  40 mins    12 mins    56 mins    2 mins    13 mins
PPI          41 mins    4 mins     4 mins     16 secs   10 secs
Flickr       42 mins    2.2 hours  21 hours   2 hours   48 mins
YouTube      46 mins    1 day      4 days     ×         4.1 hours
OAG          2.6 hours  –          –          ×         24 hours

  25. Conclusion and Future Work
We propose NetSMF, a scalable, efficient, and effective network embedding algorithm.
Future work:
◮ A distributed-memory implementation.
◮ Extensions to directed, dynamic, and heterogeneous graphs.

  26. Thanks
◮ Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec (WSDM '18)
◮ NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization (WebConf '19)
Code for NetMF available at github.com/xptree/NetMF
Code for NetSMF available at github.com/xptree/NetSMF
Q&A

  27. On the Large-Dimensionality Assumption of [LG14]
Recall the objective of the skip-gram model:
$$\min_{X, Y}\ \mathcal{L}(X, Y)$$
where
$$\mathcal{L}(X, Y) = \sum_{w}\sum_{c}\left(\frac{\#(w,c)}{|\mathcal{D}|}\log g(x_w^{\top} y_c) + b\,\frac{\#(w)}{|\mathcal{D}|}\frac{\#(c)}{|\mathcal{D}|}\log g(-x_w^{\top} y_c)\right)$$
Theorem: For DeepWalk, when the length of the random walk $L \to \infty$,
$$\frac{\#(w,c)}{|\mathcal{D}|} \xrightarrow{p} \frac{1}{2T}\sum_{r=1}^{T}\left(\frac{d_w}{\operatorname{vol}(G)}(P^{r})_{w,c} + \frac{d_c}{\operatorname{vol}(G)}(P^{r})_{c,w}\right)$$
and
$$\frac{\#(w)}{|\mathcal{D}|} \xrightarrow{p} \frac{d_w}{\operatorname{vol}(G)}, \qquad \frac{\#(c)}{|\mathcal{D}|} \xrightarrow{p} \frac{d_c}{\operatorname{vol}(G)}.$$

  28. NetSMF — Approximation Error
Denote $M = D^{-1}(D-L)D^{-1}$ in $\operatorname{trunc\_log}^{\circ}\left(\frac{\operatorname{vol}(G)}{b}\, D^{-1}(D-\tilde{L})D^{-1}\right)$, and let $\tilde{M}$ be the sparsifier we constructed.
Theorem: The singular values of $\tilde{M} - M$ satisfy
$$\sigma_i(\tilde{M} - M) \le \frac{4\epsilon}{\sqrt{d_i\, d_{\min}}}, \quad \forall i \in [n].$$
Theorem: Let $\|\cdot\|_F$ be the matrix Frobenius norm. Then
$$\left\|\operatorname{trunc\_log}^{\circ}\left(\frac{\operatorname{vol}(G)}{b}\tilde{M}\right) - \operatorname{trunc\_log}^{\circ}\left(\frac{\operatorname{vol}(G)}{b}M\right)\right\|_F \le \frac{4\epsilon \operatorname{vol}(G)}{b\sqrt{d_{\min}}}\sqrt{\sum_{i=1}^{n}\frac{1}{d_i}}.$$

  29. Spectrally Similar
Definition: Suppose $G = (V, E, A)$ and $\tilde{G} = (V, \tilde{E}, \tilde{A})$ are two weighted undirected networks. Let $L = D_G - A$ and $\tilde{L} = D_{\tilde{G}} - \tilde{A}$ be their Laplacian matrices, respectively. We say $G$ and $\tilde{G}$ are $(1+\epsilon)$-spectrally similar if
$$\forall x \in \mathbb{R}^{n}, \quad (1-\epsilon)\cdot x^{\top}\tilde{L}x \le x^{\top}Lx \le (1+\epsilon)\cdot x^{\top}\tilde{L}x.$$
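The definition quantifies over all x, so it cannot be verified by sampling alone, but random test vectors give a useful spot check. A sketch with dense NumPy Laplacians:

```python
import numpy as np

def spectral_similarity_spot_check(L, L_tilde, eps, trials=1000, seed=0):
    """Spot-check (1-eps) x'L~x <= x'Lx <= (1+eps) x'L~x on random vectors."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    for _ in range(trials):
        x = rng.standard_normal(n)
        q, q_t = x @ L @ x, x @ L_tilde @ x
        if not ((1 - eps) * q_t <= q <= (1 + eps) * q_t):
            return False
    return True
```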
