Orion: Shortest Path Estimation for Large Social Graphs (PowerPoint presentation)

SLIDE 1

Orion: Shortest Path Estimation for Large Social Graphs

Xiaohan Zhao, Alessandra Sala, Christo Wilson, Haitao Zheng and Ben Y. Zhao Department of Computer Science, UC Santa Barbara, USA

SLIDE 2

Super Large Social Graphs

[Figure: user counts of large social networks: 450 million, 70 million, and 150 million]

SLIDE 3

Maximizing Social Influence

• Product advertisement in online social networks (OSNs)
  • Example: Bill Gates “likes” Windows Mobile 7
• Propagate information starting at specific nodes
• Goal: find the most influential nodes in the graph
  • Nodes with shorter average distances to the rest of the graph
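The criterion above ranks nodes by their average hop distance to every other node, which is essentially closeness centrality. The sketch below is a minimal illustration, not Orion's code. It assumes an unweighted, undirected, connected graph stored as an adjacency dict, and the toy graph and function names are hypothetical. Exact ranking needs one BFS per node, which is the cost the later slides set out to avoid.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distance from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_distance(graph, node):
    """Mean hop distance from node to all other nodes (assumes a connected graph)."""
    dist = bfs_distances(graph, node)
    others = [d for n, d in dist.items() if n != node]
    return sum(others) / len(others)

def most_influential(graph, k=1):
    """The k nodes with the smallest average distance to the rest of the graph."""
    return sorted(graph, key=lambda n: average_distance(graph, n))[:k]

# Hypothetical toy graph: "hub" is one hop from most nodes.
toy = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub", "b"],
    "b": ["hub", "a"],
    "c": ["hub"],
    "d": ["hub", "e"],
    "e": ["d"],
}
print(most_influential(toy))  # ['hub']
```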

SLIDE 4

Ranked Social Search

• Search for specific friends in a social network
• Rank search results based on their social distance from the searcher
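Ranked social search can be illustrated as follows. Filter candidates by name, then sort them by hop distance from the person searching. This is a minimal sketch assuming an adjacency-dict graph and a separate id-to-name mapping, and all names here are hypothetical. It uses BFS for the distances. A coordinate system like Orion would replace that BFS with constant-time coordinate lookups.

```python
from collections import deque

def hop_distances(graph, source):
    """Hop distances from source via unweighted BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def ranked_search(graph, names, searcher, query):
    """Nodes whose name contains `query` (case-insensitive), closest to the
    searcher first; unreachable matches go last, ties broken by node id."""
    dist = hop_distances(graph, searcher)
    matches = [n for n, name in names.items()
               if n != searcher and query.lower() in name.lower()]
    return sorted(matches, key=lambda n: (dist.get(n, float("inf")), n))

# Hypothetical data: path 1-2-3-4 plus an isolated node 5.
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3], 5: []}
names = {1: "Alice", 2: "John Smith", 3: "Bob", 4: "John Doe", 5: "John Roe"}
print(ranked_search(graph, names, searcher=1, query="john"))  # [2, 4, 5]
```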

SLIDE 5

SLIDE 6

Node Distance Algorithms

Algorithm                     Time complexity for all node pairs
Breadth-First Search (BFS)    O(mn)
Dijkstra                      O(n² log n + mn)
Floyd-Warshall                Θ(n³)

For a graph with n nodes and m edges
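The O(mn) entry for BFS comes from running one BFS per node. Each run scans every adjacency entry once, which is 2m entries in an undirected connected graph, plus O(n) bookkeeping. The minimal sketch below counts the scanned entries to make that total visible. It assumes an unweighted, undirected, connected graph as an adjacency dict, and the function names are hypothetical.

```python
from collections import deque

def bfs(graph, source):
    """Single-source BFS; returns (hop distances, adjacency entries scanned)."""
    dist = {source: 0}
    queue = deque([source])
    scanned = 0
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            scanned += 1
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist, scanned

def all_pairs_bfs(graph):
    """One BFS per node: n runs of O(n + m) each, i.e. O(mn) for connected graphs."""
    table, total_scanned = {}, 0
    for s in graph:
        table[s], scanned = bfs(graph, s)
        total_scanned += scanned
    return table, total_scanned

# 5-node cycle: n = 5, m = 5, so every BFS scans 2m = 10 entries.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
table, total = all_pairs_bfs(cycle)
print(table[0][2], total)  # 2 50
```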

SLIDE 7

Problem with Node Distance Algorithms

SLIDE 8

A More Scalable Solution?

• Design a scalable system for large graphs
  • Real-time queries are important
  • Desired query time: O(1)
  • Do preprocessing
• How to achieve O(1) query time?
  • Represent the distance between two graph nodes as the distance between two points in Euclidean space
  • Map all graph nodes into Euclidean space
  • A graph coordinate system
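Once every node has a coordinate, a distance query reduces to a Euclidean distance. That costs O(D) for D dimensions, which is constant for a fixed D and does not depend on graph size. The coordinates below are hypothetical placeholders standing in for the output of an offline embedding step.

```python
import math

# Hypothetical precomputed coordinates in D = 3 dimensions.
coords = {
    "alice": (0.0, 0.0, 0.0),
    "bob": (3.0, 4.0, 0.0),
    "carol": (1.0, 2.0, 2.0),
}

def estimate_distance(coords, u, v):
    """Estimated graph distance = Euclidean distance between node coordinates.
    O(D) work per query, independent of the number of nodes and edges."""
    return math.dist(coords[u], coords[v])

print(estimate_distance(coords, "alice", "bob"))    # 5.0
print(estimate_distance(coords, "alice", "carol"))  # 3.0
```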

SLIDE 9

Orion

• A graph coordinate system
  • Embedding: “capture” node distances using Euclidean positions
  • Estimate node distances from coordinates in constant time

SLIDE 10

Outline

 Motivation  Designing Orion  Experimental Results  Using Orion in Graph Applications  Conclusion

SLIDE 11

Design Goals of Orion

• Scalability (preprocessing time)
  • Preprocessing time scales linearly with graph size
  • Minimize the number of BFS operations
• Accuracy
  • Distance estimates approximate the ground truth
• Fast convergence
  • Individual node calibration should not oscillate

SLIDE 12

Approaches for Embedding

• Physical spring system
  • Each node needs to do a BFS computation
  • Multiple iterations
• Landmark-based approach (our choice)
  • Distances to a fixed number of nodes
  • Computed once for each node
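The landmark approach needs only k BFS runs, one per landmark, instead of one per node. Every node can then read off its vector of distances to the landmarks. The minimal sketch below shows that step. It assumes an unweighted, connected graph as an adjacency dict, and the function names are hypothetical. Fitting each node's coordinates to its vector is a separate calibration step not shown here.

```python
from collections import deque

def bfs(graph, source):
    """Hop distances from source (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def landmark_distance_vectors(graph, landmarks):
    """k BFS runs in total (one per landmark) instead of one per node.
    Returns each node's hop distances to the landmarks, in landmark order.
    Assumes a connected graph, so every BFS reaches every node."""
    per_landmark = [bfs(graph, lm) for lm in landmarks]
    return {node: [dist[node] for dist in per_landmark] for node in graph}

# Hypothetical path graph 0-1-2-3-4 with landmarks at both ends.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
vectors = landmark_distance_vectors(path, landmarks=[0, 4])
print(vectors[1], vectors[2])  # [1, 3] [2, 2]
```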

SLIDE 13

How to Select Landmarks?

• Intuition: use the highest-degree nodes as landmarks
  • They form the “backbone” of the social graph
• Landmark separation
  • Highest-degree nodes are often connected to each other
  • Need to avoid clusters of landmarks
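One simple way to combine "highest degree" with "avoid clusters" is a greedy pass. Walk nodes in descending degree order and skip any candidate adjacent to an already chosen landmark. The separation rule here (not adjacent, i.e. at least 2 hops apart) is an assumption for illustration. The slides do not state Orion's exact criterion, and the graph and names are hypothetical.

```python
def select_landmarks(graph, k):
    """Greedy landmark choice: highest degree first, skipping any node adjacent
    to an already chosen landmark (assumed separation rule, not Orion's spec).
    `graph` maps each node to a set of neighbors (undirected)."""
    chosen = []
    # Descending degree; ties broken by node name for determinism.
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        if len(chosen) == k:
            break
        if any(node in graph[lm] for lm in chosen):
            continue  # too close to an existing landmark
        chosen.append(node)
    return chosen

# Hypothetical graph: hubs A and B are adjacent; C is a smaller, separate hub.
g = {
    "A": {"B", "a1", "a2", "a3"}, "B": {"A", "b1", "b2", "b3"},
    "C": {"c1", "c2", "a1"},
    "a1": {"A", "C"}, "a2": {"A"}, "a3": {"A"},
    "b1": {"B"}, "b2": {"B"}, "b3": {"B"},
    "c1": {"C"}, "c2": {"C"},
}
print(select_landmarks(g, 2))  # ['A', 'C'] (plain top-degree would pick A and B)
```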

SLIDE 14

How to Position Landmarks?

• Naïve solution: global simplex downhill
  • O(k²D) for k landmarks in a D-dimensional space
  • However, k can be large for large graphs
• Incremental approach
  • Divide the k landmarks into two groups
    • Small initial group Lk (16 landmarks)
  • Two-step computation
    • Initial group: global simplex downhill
    • Remaining landmarks added one by one
    • Use the initial landmarks to calibrate distances
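The incremental step places one new landmark against landmarks whose coordinates are already fixed. It does so by minimizing the mismatch between Euclidean and measured hop distances with the downhill simplex (Nelder-Mead) method. Below is a minimal sketch, assuming a squared-error objective and a simplified Nelder-Mead variant written out by hand. That variant uses reflection, expansion, inside contraction and shrink. Orion's exact objective and parameters are not given on the slide, and all names and numbers here are hypothetical.

```python
import math

def nelder_mead(f, x0, step=1.0, iters=500, tol=1e-10):
    """Simplified Nelder-Mead (downhill simplex): reflection, expansion,
    inside contraction and shrink. Minimizes f over lists of floats."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if f(worst) - f(best) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best vertex.
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)

def place_landmark(fixed_coords, hop_distances, guess):
    """Position one new landmark so its Euclidean distances to the already
    placed landmarks match its measured hop distances (squared error)."""
    def error(x):
        return sum((math.dist(x, c) - d) ** 2
                   for c, d in zip(fixed_coords, hop_distances))
    return nelder_mead(error, guess)

# Hypothetical initial group already placed in D = 2 dimensions.
initial = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
measured = [5, 3, 4]  # BFS hop distances from the new landmark to each of them
x = place_landmark(initial, measured, guess=[3.0, 2.0])
print(x)  # approximately [4.0, 3.0]
```

Placing landmarks one at a time keeps each optimization small (D free variables) instead of optimizing all k landmark positions jointly.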

SLIDE 15

Experimental Setup

• Datasets
  • Four datasets from Facebook regional networks

Network        Nodes   Edges     Avg. Path Len.
Norway         293K    5,589K    4.2
Egypt          246K    1,618K    5.0
Los Angeles    275K    2,115K    5.1
India          363K    1,556K    6.1

• Evaluation metrics
  • Relative error: $E = \frac{|d_m - d_p|}{d_m}$, where d_m is the actual distance and d_p is the distance estimated by Orion
  • Computational time
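The relative error metric can be computed per node pair and then averaged over a sample of pairs. A minimal sketch follows. The sample pairs are hypothetical, not data from the experiments, and d_m is assumed to be positive (distinct nodes).

```python
def relative_error(actual, estimated):
    """E = |d_m - d_p| / d_m for one node pair (requires d_m > 0)."""
    return abs(actual - estimated) / actual

def average_relative_error(pairs):
    """Mean relative error over (actual, estimated) distance pairs."""
    return sum(relative_error(dm, dp) for dm, dp in pairs) / len(pairs)

# Hypothetical sample of (actual hop distance, estimated distance).
sample = [(4, 3.6), (5, 5.5), (2, 2.0)]
print(round(average_relative_error(sample), 3))  # 0.067
```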

SLIDE 16

Dimensionality of Coordinates

[Figure: average relative error vs. number of dimensions for India, Egypt, LA, and Norway]

  • Error < 0.2 when the number of dimensions > 6
  • Higher dimensions → improved accuracy
  • But higher dimensions also increase computational time
SLIDE 17

Computational Time

Time                   India          Egypt           L.A.            Norway
Orion preprocessing    9493 s         6156 s          6967 s          7506 s
Orion response         0.0000002 s    0.00000002 s    0.00000018 s    0.00000019 s
BFS response           1.028 s        0.75 s          1.027 s         1.44 s

• Orion preprocessing: computes coordinates for all nodes
  • One-time cost
  • 2 hours for a 300K-node graph on one cheap commodity server
  • Time scales linearly with graph size
  • Easily parallelized across clusters
• Orion and BFS response: average time per node-distance query
  • Orion is roughly 7 orders of magnitude faster than BFS
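The "orders of magnitude" claim can be checked directly from the response times in the Computational Time table. This short calculation simply divides the BFS time by the Orion time for each network and takes log10 of the ratio. No new data is involved.

```python
import math

# Average per-query response times from the Computational Time table (seconds).
bfs_s = {"India": 1.028, "Egypt": 0.75, "L.A.": 1.027, "Norway": 1.44}
orion_s = {"India": 0.0000002, "Egypt": 0.00000002,
           "L.A.": 0.00000018, "Norway": 0.00000019}

def speedup_orders(bfs, orion):
    """log10 of the BFS-to-Orion response-time ratio for each network."""
    return {net: math.log10(bfs[net] / orion[net]) for net in bfs}

for net, orders in speedup_orders(bfs_s, orion_s).items():
    print(f"{net}: about 10^{orders:.1f} times faster")
```

With these values the speedup ranges from about 10^6.7 (India) to about 10^7.6 (Egypt), consistent with "roughly 7 orders of magnitude". The preprocessing times (6156 s to 9493 s) likewise correspond to about 1.7 to 2.6 hours.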

SLIDE 18

Application: Node Separation Metrics

• Node separation metrics
  • A common tool for analyzing graphs
  • Include radius, diameter, and average path length

[Figure: average path length (hops), actual vs. Orion estimate, for India, Egypt, L.A., and Norway]
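With coordinates in hand, the separation metrics can be estimated from coordinate distances alone, with no BFS at all. The sketch below computes average path length, diameter and radius (via each node's eccentricity) over all pairs. That is O(n²D), so on large graphs one would sample pairs instead. The coordinates are hypothetical. They embed the path 0-1-2-3 exactly, so the estimates equal the true values here.

```python
import itertools
import math

def separation_metrics(coords):
    """Estimate average path length, diameter and radius from node coordinates."""
    nodes = list(coords)
    dists = {}
    for u, v in itertools.combinations(nodes, 2):
        d = math.dist(coords[u], coords[v])
        dists[(u, v)] = dists[(v, u)] = d  # both orders: the mean is unchanged
    avg_path = sum(dists.values()) / len(dists)
    # Eccentricity: largest estimated distance from a node to any other node.
    ecc = {u: max(dists[(u, v)] for v in nodes if v != u) for u in nodes}
    return {"average_path_length": avg_path,
            "diameter": max(ecc.values()),
            "radius": min(ecc.values())}

# Hypothetical 1-D coordinates for the path graph 0-1-2-3.
coords = {0: (0.0,), 1: (1.0,), 2: (2.0,), 3: (3.0,)}
print(separation_metrics(coords))
```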

SLIDE 19

Conclusion

• We propose Orion, a scalable graph coordinate system for node distance computation
• Low time complexity
  • Preprocessing: 2 hours for a 300K-node graph
  • Can be parallelized across machine clusters
  • Query response: 0.2 µs per node-distance estimate
• Orion can accurately support node-distance-based applications

SLIDE 20

Future / Ongoing Work

• Dynamics in social graphs
  • Investigate the impact of graph dynamics on node distances
  • Use heuristics to incrementally update graph embeddings at run time
• Weighted graphs
  • Examine the use of graph coordinate systems in applications on weighted graphs

SLIDE 21

Thank You. Questions?
