
Spaceland Embedding of Sparse Stochastic Graphs



  1. Spaceland Embedding of Sparse Stochastic Graphs
     IEEE High Performance Extreme Computing, September 25, 2019
     Nikos Pitsianis (1,2), Alexandros-S. Iliopoulos (2), Dimitris Floros (1), Xiaobai Sun (2)
     (1) Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki
     (2) Department of Computer Science, Duke University
     Pitsianis Iliopoulos Floros Sun (AUTh|Duke) | Embedding of Sparse Stochastic Graphs | IEEE HPEC19 | Sep 25, 2019 | 1 / 21

  2. Outline
     1. Introduction
     2. Contribution A: SG-t-SNE
     3. Contribution B: SG-t-SNE-Π
     4. Key references

  3. 1. Introduction
        - Graph embedding
        - Precursor work
        - Significant impact
        - Main limitations
     2. Contribution A: SG-t-SNE
     3. Contribution B: SG-t-SNE-Π
     4. Key references

  4. Introduction: graphs & graph embedding
     Graph/network $G(V, E)$: relational data increasingly arise in various applications: biological, social, friend networks, food webs, co-author networks, word co-occurrence networks, product co-purchase networks, ...
     Graph (vertex) embedding: a mapping/encoding $V = \mathcal{X} \Longrightarrow \mathcal{Y} \subseteq \mathbb{R}^d$, to facilitate many tasks of graph data analysis
     - word embedding (of a co-occurrence graph)
     - image embedding (of a nearest-similarity graph)
     - product embedding (of a co-purchase graph)
     - user embedding (of a friend network)
     Figure: social network orkut with n = 3,072,441 user nodes and m = 237,442,607 friendship links; degree distribution (top) and 2D embedding (bottom).

  5. SNE: stochastic neighbor embedding algorithm
     Pipeline: $X = \{x_i\}_{i=1}^{n}$ → kNN graph $G(V, E_k)$ → cast to stochastic weights on $E_k$, giving $G(V, E_k, W_k)$ → distribution matching → $Y = \{y_i\}_{i=1}^{n} \in \mathbb{R}^d$
     Here $x_i$ is an RNA sequence, and the output is a sequence embedding in $\mathbb{R}^2$.
     Figure: SNE [1] pipeline illustrated with spatial embedding of n = 1,306,127 RNA sequences of E18 mouse brain cells (10x Genomics, App Note, 2017).
     [1] Hinton and Roweis, NIPS, 2003

  6. t-SNE: t-distributed SNE
     - From input vertex data $\mathcal{X} = \{x_i\}_{i=1}^{n}$
     - Find kNNs among $D = [d^2(x_i, x_j)]_{n \times n}$
     - Cast $D_{kNN}$ to stochastic $P = [p_{j|i} + p_{i|j}] / 2$, with (Gaussians)
       $p_{j|i}(\sigma_i) = \frac{1}{Z_i} \exp\left[ -d_{ij}^2 / (2\sigma_i^2) \right]$
     - with $\sigma_i$ determined by the perplexity equations
       $-\sum_j a_{ij}\, p_{j|i}(\sigma_i) \log(p_{j|i}(\sigma_i)) = \log(u), \quad \forall i \qquad (1)$
       $u$: perplexity parameter chosen by the user
     - Vertex embedding coordinates $\mathcal{Y} = \{y_i\}_{i=1}^{n} \in \mathbb{R}^d$, $d = 1, 2, 3, \ldots$
     - Follow t-distribution (Cauchy kernel) $Q$: $q_{ij} = \frac{1}{Z} (1 + \|y_i - y_j\|^2)^{-1}$
     - Determined by the best distribution matching, measured by KL divergence:
       $\mathcal{Y}^* = \arg\min_{\mathcal{Y}} \mathrm{KL}(P \,\|\, Q(\mathcal{Y}))$
     Reference: van der Maaten and Hinton, JMLR, 2008
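The perplexity equations (1) can be solved one vertex at a time with a scalar root search. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the function names (`conditional_probs`, `entropy`, `solve_sigma`), the search bracket, and the toy distances are hypothetical. Natural logarithms are used, and the target is assumed reachable, i.e. 1 < u < k with a unique nearest neighbor.

```python
import numpy as np

def conditional_probs(d2, sigma):
    """Gaussian weights p_{j|i}(sigma) over the k neighbors of one vertex."""
    logits = -np.asarray(d2, dtype=float) / (2.0 * sigma ** 2)
    logits -= logits.max()            # shift for stability; cancels in the ratio
    w = np.exp(logits)
    return w / w.sum()                # division by Z_i

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def solve_sigma(d2, u, tol=1e-10, max_iter=200):
    """Bisection for sigma_i such that entropy(p) = log(u), as in equation (1)."""
    k = len(d2)
    if not (1.0 < u < k):
        # Entropy over k neighbors stays below log(k); this sketch assumes 1 < u < k.
        raise ValueError("this sketch assumes 1 < u < k")
    target = np.log(u)
    lo, hi = 1e-8, 1e8                # hypothetical search bracket for sigma
    for _ in range(max_iter):
        mid = np.sqrt(lo * hi)        # bisect on a logarithmic scale
        h = entropy(conditional_probs(d2, mid))
        if abs(h - target) < tol:
            break
        if h < target:
            lo = mid                  # entropy grows with sigma
        else:
            hi = mid
    return mid

# Toy example: squared distances from one vertex to its k = 6 nearest neighbors.
d2 = np.array([0.5, 1.0, 1.5, 2.0, 4.0, 9.0])
sigma = solve_sigma(d2, u=3.0)
p = conditional_probs(d2, sigma)
perplexity = np.exp(entropy(p))
```

In the toy example, `perplexity` should match the requested u = 3 up to the bisection tolerance, and asking for u ≥ 6 raises an error, which is the coupling between k and u that the later slides address.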

  7. t-SNE: iterative embedding process
     Pipeline: $X = \{x_i\}_{i=1}^{n}$ → kNN graph $G(V, E_k)$ → cast to stochastic weights on $E_k$, giving $G(V, E_k, W_k)$ → distribution matching → $Y = \{y_i\}_{i=1}^{n} \in \mathbb{R}^d$
     Here $x_i$ holds the pixels of a digit image, and the output is a digit embedding in $\mathbb{R}^2$.
     Figure: SNE [1] pipeline illustrated with spatial embedding of n = 60,000 handwritten digits (MNIST dataset; LeCun et al., Proc IEEE, 1998).
     [1] Hinton and Roweis, NIPS, 2003
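The distribution matching step is iterative: the embedding coordinates are updated repeatedly to reduce KL(P ‖ Q). Below is a minimal Python sketch using the standard t-SNE gradient, 4 Σ_j (p_ij − q_ij)(y_i − y_j)(1 + ‖y_i − y_j‖²)⁻¹, with plain gradient descent. It is an illustration only: the function names, step size, iteration count, and the toy matrix P are hypothetical, and practical implementations typically add momentum and early exaggeration, which are omitted here.

```python
import numpy as np

def student_t_affinities(Y):
    """Return Q (normalized), the unnormalized Cauchy kernel W, and pairwise differences."""
    diff = Y[:, None, :] - Y[None, :, :]          # diff[i, j] = y_i - y_j
    W = 1.0 / (1.0 + (diff ** 2).sum(axis=-1))    # (1 + ||y_i - y_j||^2)^{-1}
    np.fill_diagonal(W, 0.0)                      # exclude self-pairs
    return W / W.sum(), W, diff

def kl_divergence(P, Q):
    """KL(P || Q), summed over the nonzero entries of P."""
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / Q[mask])).sum())

def kl_gradient(P, Y):
    """Gradient of KL(P || Q(Y)) with respect to every row y_i of Y."""
    Q, W, diff = student_t_affinities(Y)
    coeff = (P - Q) * W
    return 4.0 * (coeff[:, :, None] * diff).sum(axis=1)

def embed(P, dim=2, steps=500, lr=0.1, seed=0):
    """Plain gradient descent from a small random start (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    Y = 1e-2 * rng.standard_normal((P.shape[0], dim))
    history = []
    for _ in range(steps):
        history.append(kl_divergence(P, student_t_affinities(Y)[0]))
        Y = Y - lr * kl_gradient(P, Y)
    return Y, history

# Toy symmetric P: two groups of 5 vertices, strong links within a group.
group = np.repeat([0, 1], 5)
P = np.where(group[:, None] == group[None, :], 1.0, 0.05)
np.fill_diagonal(P, 0.0)
P /= P.sum()

Y, history = embed(P)
```

`history` records the KL value before each update, so it can be inspected to confirm that the objective goes down over the iterations for this small step size.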

  8. Significant impacts
     With low-dimensional spatial embedding in particular, the SNE/t-SNE algorithm family has enabled
     - visual inspection, identification of connections/separations
     - network-based analysis for hidden connections
     - hypothesis generation and scientific discoveries
     References: Amir et al., Nat Biotechnol, 2013; Abdelmoula et al., PNAS, 2016; van Unen et al., Nat Commun, 2017

  9. Main limitations
     ⊲ Restricted to data in a metric space
       - Vertices of a network do not necessarily readily reside in a metric space
     ⊲ Restricted to kNN-based stochastic graphs
       - Degree $k$ and perplexity $u$ are coupled by the condition $0 < u < k$ implied in (1)
       - A typical economic phenomenon: low-degree nodes in majority, hub nodes in minority
       - Irregular in degree distribution, defying the parameter condition $u < \deg(i)$
     Figure (Amazon, DBLP, orkut): irregular degree distribution for each of three real-world networks; low-degree nodes (including leaf nodes) in majority, high-degree nodes in minority.
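The coupling between degree and perplexity follows from the standard upper bound on Shannon entropy. A short worked step, where $H_i$ is shorthand introduced here for the left-hand side of (1):

```latex
% Vertex i has deg(i) neighbors (a_{ij} = 1), and its weights sum to one.
\[
  H_i(\sigma_i) \;=\; -\sum_j a_{ij}\, p_{j|i}(\sigma_i) \log p_{j|i}(\sigma_i)
  \;\le\; \log \deg(i),
\]
% with equality only for uniform weights p_{j|i} = 1/\deg(i).
% Equation (1) requires H_i(\sigma_i) = \log(u), hence
\[
  \log(u) \;\le\; \log \deg(i)
  \quad\Longrightarrow\quad
  u \;\le\; \deg(i).
\]
% For a kNN graph, deg(i) = k for every i, which gives the condition u < k
% whenever the neighbor distances are not all equal.
% For a leaf node, deg(i) = 1 forces H_i = 0, so (1) cannot be met for any u > 1.
```

This is why a single perplexity value cannot be applied to a graph whose degrees range from 1 (leaf nodes) up to those of hub nodes.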

  10. Main limitations
      ⊲ Existing software programs ⋆ are limited, due to slow computation speed, to
        - small graphs, or
        - 1D/2D embedding
      Many networks are large; spaceland (3D) embedding has much greater potential in preserving/encoding more structural information.
      Figure: (Left) kNN graph (k = 150) for a Möbius strip on a 256 × 32 lattice, with n = 8,192 nodes; (Middle) 2D embedding with missed/unresolved connections; (Right) 3D embedding with correct connections, also offering multiple or steerable views.
      ⋆ van der Maaten, JMLR, 2014, https://lvdmaaten.github.io/tsne; Linderman et al., Nat Methods, 2019, https://github.com/KlugerLab/FIt-SNE

  11. 1. Introduction
      2. Contribution A: SG-t-SNE
         - Admitting arbitrary stochastic graph (SG)
         - Enabled embeddings of real-world graphs
      3. Contribution B: SG-t-SNE-Π
      4. Key references

  12. SG-t-SNE: stochastic graph t-SNE
      Pipeline, admitting two types of input:
      - vertex data $X = \{x_i\}_{i=1}^{n}$ → kNN graph $G(V, E_k)$ → $G(V, E_k, W_k)$, or
      - a graph $G$ with weights on $E$ (admit arbitrary stochastic graph)
      then cast/scale to the stochastic graph $G(V, E, P(\lambda))$ → distribution matching → $Y = \{y_i\}_{i=1}^{n} \in \mathbb{R}^d$ (embedding in $\mathbb{R}^2$)
      Figure: SG-SNE pipeline admitting two types of input; (top) embedding of n = 1,306,127 RNA sequences of E18 mouse brain cells (10x Genomics, App Note, 2017); (bottom) embedding of n = 8,381 peripheral blood mononuclear cells (Zheng et al., Nat Commun, 2017).

  13. SG-t-SNE: distinctive extension & the keystone
      Distinctions:
      ◇ Admitting arbitrary stochastic graph $P = [p_{j|i}]$, i.e., extend the embedding to the entire family of stochastic graphs
      ◇ Making it feasible to exploit sparse connection pattern for
        - investigative/explorative data analysis
        - higher computation efficiency
      Key: the stochastic reshaping/rescaling equations, $\forall i$:
        $\sum_j a_{ij}\, \phi\left( p_{j|i}^{\gamma_i} \right) = \lambda \quad\Longrightarrow\quad p_{j|i}(\lambda) = \frac{a_{ij}\, \phi\left( p_{j|i}^{\gamma_i} \right)}{\lambda}$
      - $\phi \ge 0$: reshaping function, monotonically increasing [1]
      - $\lambda > 0$: re-scaling parameter
      - $A = [a_{ij}]$: the binary-valued adjacency matrix
      - Solutions $\gamma_i$ exist unconditionally
      [1] We used $\phi(x) = x$ for the presented embeddings
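With φ(x) = x, as used for the presented embeddings, the rescaling equation for one vertex reduces to finding the exponent γ_i for which the powered weights sum to λ. The following Python sketch shows one way to do this by bisection; it is an illustration under stated assumptions rather than the authors' code. The function names (`solve_gamma`, `rescale_row`) and the toy row are hypothetical, and the sketch assumes each row has at least two positive entries, all strictly below 1; how other rows are treated is not stated on the slide.

```python
import numpy as np

def solve_gamma(p_row, lam, tol=1e-12, max_iter=200):
    """Find gamma with sum_j p_j**gamma = lam over the positive entries of one row."""
    p = np.asarray(p_row, dtype=float)
    p = p[p > 0]                                  # entries where a_ij = 1
    if p.size < 2 or np.any(p >= 1.0):
        raise ValueError("sketch assumes >= 2 positive entries, each below 1")
    log_p = np.log(p)

    def f(g):
        # sum_j p_j**g; strictly decreasing in g because every log p_j < 0
        return float(np.exp(g * log_p).sum())

    lo, hi = -1.0, 1.0
    while f(lo) < lam:                            # widen until f(lo) >= lam
        lo *= 2.0
    while f(hi) > lam:                            # widen until f(hi) <= lam
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > lam:
            lo = mid                              # need a larger exponent
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def rescale_row(p_row, lam):
    """Return the rescaled row p_{j|i}(lambda) = p_{j|i}**gamma_i / lambda and gamma_i."""
    p = np.asarray(p_row, dtype=float)
    gamma = solve_gamma(p, lam)
    out = np.zeros_like(p)
    nz = p > 0
    out[nz] = p[nz] ** gamma / lam                # zero entries stay zero
    return out, gamma

# Toy row of a stochastic matrix with four neighbors (third entry is a non-edge).
row = np.array([0.5, 0.3, 0.0, 0.15, 0.05])
rescaled, gamma = rescale_row(row, lam=2.0)
rescaled_wide, gamma_wide = rescale_row(row, lam=10.0)
```

For this row the sum of powered weights equals 4 at γ = 0 and 1 at γ = 1, so λ = 2 gives an exponent between 0 and 1, while λ = 10 exceeds the number of neighbors and needs a negative exponent. In both cases the rescaled row sums to 1, with no upper limit on λ tied to the vertex degree.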

  14. Enabled embedding of Amazon product co-purchase network
      ID      n_sub   e_in   e_out   w_in    w_out
      (534)   44      374    20      71.7    2.4
      (678)   70      506    19      114.6   3.3
      Amazon product sale network: n = 334,863 products, m = 1,851,744 edges for co-purchase connectivity, irregular degree distribution.
      Figure: (Left) 2D product embedding enabled by SG-t-SNE; (Right) two product clusters/subgraphs, (534) and (678); the vertices of each are embedded closer together, with denser intra-connections.
      Yang and Leskovec, K&IS, 2015

  15. Enabled embedding of social network orkut
      Social network orkut: n = 3,072,441 user nodes, m = 237,442,607 friendship links.
      Figure: (Left & Middle) 3D and 2D embeddings enabled by SG-t-SNE; (Right) findings.
      Findings: there is a weak-link zone (easier to observe in the 3D embedding), and calibrated communities reside on one or the other side; the rich structure reflects/decodes information of geophysical regions and cultural diversities.
      Yang and Leskovec, K&IS, 2015

  16. SG-t-SNE: exploiting sparse patterns
      ⊲ Vertex data: 8k peripheral blood mononuclear cells (PBMCs)
      ⊲ PBMC embedding via kNN graphs by a cell similarity measure
      ⊲ SG-t-SNE can use a much sparser neighbor graph
      Figure panels: kNN graph $P_k$, k = 30; t-SNE: k = 150, u = 50; SG-t-SNE: k = 30, λ = 80. PBM cells are color coded by the labels provided with the data.
      Zheng et al., Nat Commun, 2017

  17. 1. Introduction
      2. Contribution A: SG-t-SNE
      3. Contribution B: SG-t-SNE-Π
         - Challenges in gradient updates
         - Fast calculation of sparse interactions
         - Fast calculation of dense interactions
         - Fast data translocation
         - Comparisons in performance
      4. Key references
