Orion: Shortest Path Estimation for Large Social Graphs


  1. Orion: Shortest Path Estimation for Large Social Graphs
      Xiaohan Zhao, Alessandra Sala, Christo Wilson, Haitao Zheng and Ben Y. Zhao
      Department of Computer Science, UC Santa Barbara, USA

  2. Super Large Social Graphs
      [Figure: three social networks with 450 million, 70 million and 150 million users]

  3. Maximizing Social Influence
      - Product advertisement in OSNs (online social networks)
        - e.g., Bill Gates “likes” Windows Mobile 7
      - Propagate information starting at specific nodes
      - Goal: find the most influential nodes in the graph
        - Nodes with shorter average distances to the rest of the graph
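The influence criterion above (shorter average distance to the rest of the graph) can be sketched as a closeness-style ranking. This is a minimal illustration, not Orion's code: `most_central`, `star_dist` and the toy star graph are hypothetical, and `dist(u, v)` stands for any node-distance oracle, exact or estimated.

```python
from typing import Callable, Hashable, List, Sequence

Node = Hashable


def most_central(nodes: Sequence[Node], dist: Callable[[Node, Node], float], k: int) -> List[Node]:
    """Return the k nodes with the smallest average distance to all other nodes."""
    def avg_dist(u: Node) -> float:
        others = [v for v in nodes if v != u]
        return sum(dist(u, v) for v in others) / len(others)
    return sorted(nodes, key=avg_dist)[:k]


# Hypothetical toy graph: a star with hub "h" and leaves "x", "y", "z".
def star_dist(u: Node, v: Node) -> int:
    if u == v:
        return 0
    return 1 if "h" in (u, v) else 2  # leaf-to-leaf paths go through the hub


print(most_central(["x", "y", "h", "z"], star_dist, 1))  # ['h']
```

The ranking calls `dist` for every pair of nodes, so it is only practical on large graphs when each distance query is cheap.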

  4. Ranked Social Search
      - Search for specific friends in a social network
      - Rank search results based on social distance
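A minimal sketch of ranked social search, assuming each user already has coordinates so that social distance can be estimated as a Euclidean distance. The substring match, `ranked_search` and the sample users are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, ...]


def ranked_search(query: str, searcher: int, names: Dict[int, str],
                  coords: Dict[int, Coord]) -> List[int]:
    """Return users whose name contains `query` (case-insensitive),
    ordered so that the socially closest users come first."""
    matches = [u for u, name in names.items()
               if u != searcher and query.lower() in name.lower()]
    return sorted(matches, key=lambda u: math.dist(coords[searcher], coords[u]))


# Hypothetical users and 2-D coordinates.
names = {1: "Alice Smith", 2: "Alicia Keys", 3: "Bob Jones"}
coords = {1: (3.0, 4.0), 2: (1.0, 0.0), 3: (0.0, 0.0)}
print(ranked_search("ali", 3, names, coords))  # [2, 1]
```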


  6. Node Distance Algorithms
      For a graph with n nodes and m edges:

      Algorithm                    Time complexity for all node pairs
      Breadth-First Search (BFS)   O(mn)
      Dijkstra                     O(n² log n + mn)
      Floyd-Warshall               Θ(n³)
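For unweighted graphs, the BFS row of the table amounts to running one BFS from every node: n runs of O(n + m) each, i.e. O(mn) on a connected graph. A minimal sketch; the adjacency-dict representation and the function names are illustrative choices.

```python
from collections import deque
from typing import Dict, Hashable, List

Node = Hashable
Graph = Dict[Node, List[Node]]


def bfs_distances(adj: Graph, source: Node) -> Dict[Node, int]:
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def all_pairs_bfs(adj: Graph) -> Dict[Node, Dict[Node, int]]:
    """One BFS per node: the all-pairs cost that grows with n * m."""
    return {s: bfs_distances(adj, s) for s in adj}


# Example: path a-b-c plus the edges b-d and c-d.
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "d"], "d": ["b", "c"]}
dists = all_pairs_bfs(adj)
print(dists["a"]["d"])  # 2
```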

  7. Problem of Node Distance Algorithms

  8. A More Scalable Solution?
      - Design a scalable system for large graphs
        - Real-time queries are important
        - Desired query time: O(1)
        - Do preprocessing
      - How to achieve O(1) query time?
        - Represent the distance between two graph nodes as the distance between two points in Euclidean space
        - Map all graph nodes into Euclidean space
        - A graph coordinate system

  9. Orion
      - A graph coordinate system
      - Embedding: “capture” node distances using Euclidean positions
      - Estimate node distances from coordinates in constant time
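Once every node has a coordinate vector, answering a distance query is just a Euclidean distance computation. Its cost depends on the number of dimensions D, not on the graph size. A minimal sketch; the coordinate values and `estimate_distance` are made up for illustration.

```python
import math
from typing import Dict, Tuple

# Hypothetical coordinates: node id -> point in D-dimensional Euclidean space,
# as an offline embedding step might produce (values here are invented).
coords: Dict[str, Tuple[float, ...]] = {
    "alice": (0.0, 0.0, 0.0),
    "bob": (1.0, 2.0, 2.0),
}


def estimate_distance(coords: Dict[str, Tuple[float, ...]], u: str, v: str) -> float:
    """Estimated node distance = Euclidean distance between the two coordinates.
    Costs O(D), i.e. constant with respect to the number of nodes and edges."""
    return math.dist(coords[u], coords[v])


print(estimate_distance(coords, "alice", "bob"))  # 3.0
```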

  10. Outline
      - Motivation
      - Designing Orion
      - Experimental Results
      - Using Orion in Graph Applications
      - Conclusion

  11. Design Goals of Orion
      - Scalability (preprocessing time)
        - Preprocessing time scales linearly with graph size
        - Minimize the number of BFS operations
      - Accuracy
        - Distance estimates approximate the ground truth
      - Fast convergence
        - Individual node calibration should not oscillate

  12. Approaches for Embedding
      - Physical spring system
        - Each node needs to do a BFS computation
        - Multiple iterations
      - Landmark-based approach (our choice)
        - Distances to a fixed number of nodes
        - Computed once for each node
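In the landmark-based approach, graph traversals are limited to the landmarks. With k landmarks, k BFS runs give every node its vector of distances to the fixed landmark set, and that node is then calibrated against this vector once. A minimal sketch, assuming an unweighted, undirected graph stored as an adjacency dict; all names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List

Node = Hashable
Graph = Dict[Node, List[Node]]


def bfs_distances(adj: Graph, source: Node) -> Dict[Node, int]:
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def landmark_table(adj: Graph, landmarks: Iterable[Node]) -> Dict[Node, Dict[Node, int]]:
    """One BFS per landmark: k BFS runs in total instead of one per node."""
    return {l: bfs_distances(adj, l) for l in landmarks}


def landmark_vector(table: Dict[Node, Dict[Node, int]], node: Node) -> List[int]:
    """Distances from `node` to each landmark (valid because the graph is undirected)."""
    return [table[l][node] for l in table]


# Hypothetical path graph 0-1-2-3-4 with landmarks at both ends.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
table = landmark_table(adj, [0, 4])
print(landmark_vector(table, 1))  # [1, 3]
```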

  13. How to Select Landmarks?
      - Intuition: use the highest-degree nodes as landmarks
        - They form the “backbone” of the social graph
      - Landmark separation
        - The highest-degree nodes are often connected to each other
        - Need to avoid clusters of landmarks
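One way to combine the degree heuristic with landmark separation is a greedy scan in decreasing-degree order. The slide only says clusters of landmarks should be avoided, so the concrete rule used here (skip any candidate closer than `min_separation` hops to an already chosen landmark) is an assumption, as are all names in the sketch.

```python
from collections import deque
from typing import Dict, Hashable, List, Set

Node = Hashable
Graph = Dict[Node, List[Node]]


def within_hops(adj: Graph, source: Node, max_hops: int) -> Set[Node]:
    """Nodes within max_hops of source (BFS truncated at max_hops)."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if seen[u] == max_hops:
            continue  # do not expand beyond the hop limit
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return set(seen)


def select_landmarks(adj: Graph, k: int, min_separation: int = 2) -> List[Node]:
    """Greedy: take nodes by decreasing degree, skipping any node that is fewer
    than min_separation hops from a landmark already chosen (hypothetical rule)."""
    blocked: Set[Node] = set()
    landmarks: List[Node] = []
    for node in sorted(adj, key=lambda n: len(adj[n]), reverse=True):
        if node in blocked:
            continue
        landmarks.append(node)
        if len(landmarks) == k:
            break
        blocked |= within_hops(adj, node, min_separation - 1)
    return landmarks


# Hypothetical graph: hubs A and B are adjacent; hub C sits elsewhere.
adj = {
    "A": ["B", "a1", "a2", "a3"], "B": ["A", "b1", "b2"], "C": ["c1", "c2", "c3"],
    "a1": ["A", "c1"], "a2": ["A"], "a3": ["A"], "b1": ["B"], "b2": ["B"],
    "c1": ["C", "a1"], "c2": ["C"], "c3": ["C"],
}
print(select_landmarks(adj, 2))  # ['A', 'C']
```

Without the separation rule (`min_separation=1`), the two adjacent hubs A and B would both be chosen.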

  14. How to Position Landmarks?
      - Naïve solution: global simplex downhill
        - O(k²D) for k landmarks in D-dimensional space
        - However, k can be large for large graphs
      - Incremental approach
        - Divide the k landmarks into two groups
          - A small initial group L (16 of the k landmarks)
        - Two-step computation
          - Initial group: global simplex downhill
          - Remaining landmarks are added one by one, using the initial landmarks to calibrate distances
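The second step of the incremental approach, adding one landmark against coordinates that are already fixed, reduces to minimizing the mismatch between Euclidean and graph distances for a single point. A minimal sketch: it swaps in plain gradient descent for the simplex downhill method named on the slide, and `place_node`, the starting point, the step count and the learning rate are illustrative assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def embedding_error(x: Point, anchors: Sequence[Point], target: Sequence[float]) -> float:
    """Sum of squared differences between Euclidean and graph (hop) distances."""
    return sum((math.dist(x, p) - d) ** 2 for p, d in zip(anchors, target))


def place_node(anchors: Sequence[Point], target: Sequence[float], dims: int,
               steps: int = 2000, lr: float = 0.01) -> List[float]:
    """Position one point against fixed anchor coordinates by gradient descent
    on embedding_error (a stand-in for simplex downhill)."""
    x = [0.1 * (i + 1) for i in range(dims)]  # arbitrary deterministic start
    for _ in range(steps):
        grad = [0.0] * dims
        for p, d in zip(anchors, target):
            r = math.dist(x, p)
            if r < 1e-12:
                continue  # gradient undefined exactly at an anchor; skip that term
            coef = 2.0 * (r - d) / r
            for j in range(dims):
                grad[j] += coef * (x[j] - p[j])
        x = [xj - lr * gj for xj, gj in zip(x, grad)]
    return x


# Hypothetical 2-D example: three already-placed landmarks and target
# distances that are exactly consistent with the point (1, 1).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
target = [math.sqrt(2), math.sqrt(10), math.sqrt(10)]
x = place_node(anchors, target, dims=2)
print([round(c, 3) for c in x])  # [1.0, 1.0]
```

Real hop distances are integers and rarely embed exactly, so in practice the minimized error stays above zero.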

  15. Experimental Setup
      - Datasets: four Facebook regional networks
      - Evaluation metrics
        - Relative error: $E = \frac{|d_m - d_p|}{d_m}$, where d_m is the actual distance and d_p is the distance estimated by Orion
        - Computational time

      Network       Nodes   Edges    Avg. Path Len.
      Norway        293K    5,589K   4.2
      Egypt         246K    1,618K   5.0
      Los Angeles   275K    2,115K   5.1
      India         363K    1,556K   6.1
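The relative-error metric, averaged over a set of node pairs, can be computed as below. A minimal sketch; the (actual, estimated) pairs are invented for illustration and are not results from the experiments.

```python
from typing import Iterable, Tuple


def relative_error(d_m: float, d_p: float) -> float:
    """E = |d_m - d_p| / d_m for one node pair (requires d_m > 0)."""
    return abs(d_m - d_p) / d_m


def average_relative_error(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean relative error over (actual, estimated) distance pairs."""
    errors = [relative_error(d_m, d_p) for d_m, d_p in pairs]
    return sum(errors) / len(errors)


# Invented pairs: (actual hop distance, estimated distance).
pairs = [(4, 3.5), (2, 2.5), (5, 5.0)]
print(average_relative_error(pairs))  # 0.125
```

Dividing by d_m means that the same absolute error counts more for short paths (0.5 hops is 25% of a 2-hop path but 12.5% of a 4-hop path).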

  16. Dimensionality of Coordinates
      - Error < 0.2 when dimension > 6
      - Higher dimensions improve accuracy
        - But they also increase computational time
      [Figure: average relative error (0 to 0.4) vs. number of dimensions (2 to 14) for India, Egypt, LA and Norway]

  17. Computational Time

      Time                  India         Egypt          L.A.           Norway
      Orion preprocessing   9493 s        6156 s         6967 s         7506 s
      Orion response        0.0000002 s   0.00000002 s   0.00000018 s   0.00000019 s
      BFS response          1.028 s       0.75 s         1.027 s        1.44 s

      - Orion preprocessing: computing coordinates for all nodes
        - One-time cost
        - About 2 hours for a 300K-node graph on one cheap commodity server
        - Time scales linearly with graph size
        - Easily parallelized across clusters
      - Response: average time per node-distance query
        - Orion is roughly 7 orders of magnitude faster than BFS
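A quick check of the speedup claim: dividing each BFS response time by the corresponding Orion response time gives ratios from about 5 × 10^6 (India) to about 4 × 10^7 (Egypt), i.e. roughly 7 orders of magnitude. The snippet below only redoes that arithmetic on the table's numbers.

```python
import math

# Average per-query response times in seconds, taken from the table.
bfs_s = {"India": 1.028, "Egypt": 0.75, "L.A.": 1.027, "Norway": 1.44}
orion_s = {"India": 0.0000002, "Egypt": 0.00000002, "L.A.": 0.00000018, "Norway": 0.00000019}

# Speedup = BFS time / Orion time; log10 gives the orders of magnitude.
speedups = {net: bfs_s[net] / orion_s[net] for net in bfs_s}
for net, s in speedups.items():
    print(f"{net}: {s:.1e}x faster, about 10^{math.log10(s):.1f}")
```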

  18. Application: Node Separation Metrics
      - Node separation metrics
        - A common tool for analyzing graphs
        - Include radius, diameter and average path length
      [Figure: average path length (hops), actual vs. Orion, for India, Egypt, L.A. and Norway]
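Given any distance function, exact or estimated from coordinates, the three metrics follow from node eccentricities: radius is the smallest eccentricity, diameter the largest, and average path length the mean over all ordered node pairs. A minimal sketch assuming a connected graph; `separation_metrics` and the three-node path example are hypothetical.

```python
from itertools import permutations
from typing import Callable, Hashable, Sequence, Tuple

Node = Hashable


def separation_metrics(nodes: Sequence[Node],
                       dist: Callable[[Node, Node], float]) -> Tuple[float, float, float]:
    """Return (radius, diameter, average path length) for a connected graph."""
    # Eccentricity: the largest distance from a node to any other node.
    ecc = {u: max(dist(u, v) for v in nodes if v != u) for u in nodes}
    pair_dists = [dist(u, v) for u, v in permutations(nodes, 2)]
    return min(ecc.values()), max(ecc.values()), sum(pair_dists) / len(pair_dists)


# Hypothetical path graph a-b-c with exact hop distances.
table = {
    "a": {"a": 0, "b": 1, "c": 2},
    "b": {"a": 1, "b": 0, "c": 1},
    "c": {"a": 2, "b": 1, "c": 0},
}
print(separation_metrics(["a", "b", "c"], lambda u, v: table[u][v]))
```

With Orion-style estimates, each `dist` call becomes a constant-time coordinate lookup instead of a table built from BFS runs.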

  19. Conclusion
      - We propose Orion, a scalable graph coordinate system for node distance computation
      - Low time complexity
        - Preprocessing: 2 hours for a 300K-node graph
        - Can be parallelized across machine clusters
        - Query response: 0.2 µs per node-distance estimate
      - Orion can accurately support node-distance-based applications

  20. Future / Ongoing Work
      - Dynamics in social graphs
        - Investigate the impact of graph dynamics on node distances
        - Use heuristics to incrementally update graph embeddings at run time
      - Weighted graphs
        - Examine the use of graph coordinate systems in applications on weighted graphs

  21. Thank You. Questions?
