GraphWalker: An I/O-Efficient and Resource-Friendly Graph Analytic System for Fast and Scalable Random Walks


  1. GraphWalker: An I/O-Efficient and Resource-Friendly Graph Analytic System for Fast and Scalable Random Walks. Rui Wang†, Yongkun Li†, Hong Xie‡, Yinlong Xu†, John C.S. Lui*. †University of Science and Technology of China, ‡Chongqing University, *The Chinese University of Hong Kong. USENIX ATC 2020.

  2. Social networks, webpage links, recommendation systems. Graph analytics is one of the top 10 data and analytics technology trends [1]. [1] Gartner's top 10 data and analytics technology trends for 2019.

  3. Random walks enable approximate computation on large graphs. [2] The 2012 common crawl graph. http://webdatacommons.org.

  4. Massive walks situations!

  5. ① Load a sub-graph from disk into memory. ② Update the walks in the sub-graph. Loop until all walks are finished.
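
The load/update loop on this slide can be sketched as a small in-memory simulation. This is only an illustration under assumptions: the graph is a plain adjacency dictionary, `block_of` is a hypothetical partition function, blocks are visited round-robin, and a "load" is just a counter increment. It is not the implementation of any of the systems discussed.

```python
import random

def run_walks(adj, block_of, num_blocks, starts, walk_len, seed=0):
    """Simulate the loop: load a sub-graph, update its walks, repeat until done.

    adj      -- dict: vertex -> list of out-neighbours (no dead ends assumed)
    block_of -- hypothetical partition function: vertex -> block id
    """
    if walk_len < 1:
        raise ValueError("walk_len must be at least 1")
    rng = random.Random(seed)
    walks = [(v, 0) for v in starts]   # (current vertex, steps taken)
    finished = []                      # end vertices of completed walks
    loads = 0                          # number of simulated sub-graph loads
    while walks:
        for b in range(num_blocks):
            here = [w for w in walks if block_of(w[0]) == b]
            if not here:
                continue               # no walks in this block: skip the load
            loads += 1                 # step 1: "load" sub-graph b
            rest = [w for w in walks if block_of(w[0]) != b]
            moved = []
            for cur, steps in here:    # step 2: move each walk by one step
                nxt = rng.choice(adj[cur])
                if steps + 1 == walk_len:
                    finished.append(nxt)
                else:
                    moved.append((nxt, steps + 1))
            walks = rest + moved
    return finished, loads
```

Counting `loads` makes the cost model visible: every simulated load advances each resident walk by exactly one step, so a walk of length `walk_len` needs at least `walk_len` loads.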

  6. How many walkers are in each loaded sub-graph? How many steps can each walker update?

  7. Start 10^6 walks from a source in Friendster (68.3M vertices) on DrunkardMob. $\text{I/O utilization} = \frac{\#\text{used edges}}{\#\text{total edges in a subgraph}}$. [Figure: walk distribution among subgraphs after 4 steps; annotations: "More than 200000 walks", "Only 12 walks", "0.03%".] Some loaded blocks contain only a few walkers, which results in low I/O utilization.
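
The I/O utilization ratio on this slide can be computed directly. The sketch below is a minimal illustration; the function name and the example numbers are made up for this note and are not measurements from the talk.

```python
def io_utilization(used_edges, total_edges_in_subgraph):
    """Fraction of a loaded sub-graph's edges that walks actually used."""
    if total_edges_in_subgraph <= 0:
        raise ValueError("the loaded sub-graph must contain at least one edge")
    if not 0 <= used_edges <= total_edges_in_subgraph:
        raise ValueError("used_edges must be between 0 and the total")
    return used_edges / total_edges_in_subgraph

# Hypothetical numbers: 12 walkers each use one edge of a block
# that holds 1,000,000 edges.
ratio = io_utilization(12, 1_000_000)
```

A block that is loaded for only a handful of walkers yields a ratio close to zero, which is the situation the slide describes.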

  8. [Figure: example graph with vertices 0-9 partitioned into blocks b0, b1, and b2.] Let each I/O move as many walkers as possible.
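
One way to read the goal on this slide is: before each I/O, count the walkers currently sitting in every block and load the block with the most. The sketch below assumes that reading. `pick_block` and the `v // 4` partition are hypothetical names and choices for illustration, not the system's API.

```python
from collections import Counter

def pick_block(walk_positions, block_of):
    """Return the id of the block that currently holds the most walkers.

    walk_positions -- iterable of the current vertex of every unfinished walk
    block_of       -- hypothetical partition function: vertex -> block id
    Ties are broken in favour of the smallest block id; None if no walks.
    """
    counts = Counter(block_of(v) for v in walk_positions)
    if not counts:
        return None
    return min(counts, key=lambda b: (-counts[b], b))

# Hypothetical layout: vertices 0-9 split into blocks of four vertices.
chosen = pick_block([0, 1, 5, 6, 7, 9], lambda v: v // 4)
```

In this example blocks 0, 1, and 2 hold 2, 3, and 1 walkers, so block 1 is selected.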

  9. Start 10^6 walks of 10 steps from a source in Friendster (68.3M vertices) on DrunkardMob. [Figure: walk steps update rate in each I/O after each iteration; annotations: "Avg: 0.2%", "walks in 1st subgraph".] Many walks remain in the current subgraph after walking one step. Synchronized walk updating leads to a low walk updating rate.

  10. Successively update each walk until it moves out of the current subgraph, which maximizes the I/O utilization of a loaded subgraph. Straggler problem: use a probabilistic approach, in which with probability p we choose to load the subgraph containing the shortest walker.
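
The two ideas on this slide can be sketched together. The code below is a minimal illustration under assumptions: "shortest walker" is read as the walk that has taken the fewest steps so far, the fallback choice (the block holding the most walks) is an assumption of this sketch, and `advance_until_exit` and `choose_block` are hypothetical names.

```python
import random

def advance_until_exit(adj, block_of, block, cur, steps, walk_len, rng):
    """Keep stepping one walk while it stays inside `block` and has steps left."""
    while steps < walk_len and block_of(cur) == block:
        cur = rng.choice(adj[cur])
        steps += 1
    return cur, steps

def choose_block(walks, block_of, p, rng):
    """Pick the next block to load.

    walks -- non-empty list of (current vertex, steps taken)
    With probability p, return the block of the walk with the fewest steps
    (the straggler); otherwise return the block holding the most walks.
    """
    if rng.random() < p:
        cur, _ = min(walks, key=lambda w: w[1])
        return block_of(cur)
    counts = {}
    for cur, _ in walks:
        b = block_of(cur)
        counts[b] = counts.get(b, 0) + 1
    return min(counts, key=lambda b: (-counts[b], b))
```

Unlike a one-step-per-load update, `advance_until_exit` returns only when the walk has finished or has crossed into another block, so each load extracts as many steps as the walk can take inside the loaded data.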

  11. [Figure: a bucket array (entries 0, 1, …, P−1) and walk arrays (0, 128, 256, …, |v|−128, |v|).] Too many walk arrays. The number of walks is limited, as it is hard to flush all walks to disk when there are too many files.

  12. [Figure: a bucket array (entries 0, 1, …, P−1) and dynamic walk arrays (0, 128, 256, …, |v|−128, |v|).] Dynamic walk arrays: frequent memory re-allocation and memory waste. Dynamic arrays bring a high cost for storing walk data.

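  13. [Figure: a block array with entries 0, 1, …, P for block 0, block 1, …, block P; labels: "Walk pool", "Fixed-length array", "in disk file"; each walk record has the fields source, current, and step, with bit positions labeled 63, 40, 39, 14, 13, 0.] Low storage cost, low I/O cost, less memory re-allocation.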
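
The field names and bit positions on this slide suggest that a walk is packed into a single 64-bit word. The sketch below assumes that layout, inferred only from the labels: source in bits 63-40 (24 bits), current in bits 39-14 (26 bits), and step in bits 13-0 (14 bits). The function names are hypothetical, and the slide does not say what exactly "current" stores.

```python
SOURCE_BITS, CURRENT_BITS, STEP_BITS = 24, 26, 14   # assumed widths, 64 in total
CURRENT_SHIFT = STEP_BITS                           # 14
SOURCE_SHIFT = STEP_BITS + CURRENT_BITS             # 40

def encode_walk(source, current, step):
    """Pack (source, current, step) into one 64-bit integer."""
    if not 0 <= source < (1 << SOURCE_BITS):
        raise ValueError("source does not fit in 24 bits")
    if not 0 <= current < (1 << CURRENT_BITS):
        raise ValueError("current does not fit in 26 bits")
    if not 0 <= step < (1 << STEP_BITS):
        raise ValueError("step does not fit in 14 bits")
    return (source << SOURCE_SHIFT) | (current << CURRENT_SHIFT) | step

def decode_walk(word):
    """Unpack a 64-bit walk record into (source, current, step)."""
    source = word >> SOURCE_SHIFT
    current = (word >> CURRENT_SHIFT) & ((1 << CURRENT_BITS) - 1)
    step = word & ((1 << STEP_BITS) - 1)
    return source, current, step
```

A fixed 8-byte record is what makes a fixed-length array practical: the pool's size in bytes is known up front, and flushing it to a file is a single sequential write.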

  14. Other optimizations: data conflict optimization in multi-threads, graph block size configuration, light-weighted blocking, walk-conscious cache strategy. More details are in the paper. Prototype system: GraphWalker.

  15. Largest dataset

  16. DrunkardMob: optimizes the walk management. Graphene, GraFSoft: fine-grained I/O management. KnightKing: optimizes the walk forwarding process.

  17. Performance of random walks with different numbers of walks (walk length fixed at 10). GraphWalker achieves 16x-70x speedup. GraphWalker is also capable of supporting huge graphs and massive walks: it finishes running 10^10 walks on the largest dataset, CrawlWeb, within around one hour.

  18. Performance of random walks with different walk lengths (number of walks fixed at 10^5). GraphWalker achieves a speedup of more than three orders of magnitude in the best case. GraphWalker also achieves 7-10x speedup for Kron30.

  19. I/O utilization and walk updating rate: RWD (|V| * 6 walks) in YahooWeb (1.6B vertices). DrunkardMob needs 150 I/Os, while GraphWalker needs only 46 I/Os. GraphWalker achieves 2-4x the I/O utilization.

  20. Comparison with the single-machine systems Graphene and GraFSoft (R * 10). GraphWalker achieves 2-40x speedup compared to Graphene and 1-37x speedup compared to GraFSoft.

  21. Comparison with the distributed system KnightKing. Run |v| walks, with each vertex starting one walk. Each walk terminates with probability 0.15 at each step. GraphWalker (1 node) achieves performance comparable to KnightKing (8 nodes).
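
The termination rule in this workload gives walks a geometrically distributed length. The sketch below samples that length, assuming the termination check happens after every step (so each walk takes at least one step); `sample_walk_length` is a hypothetical helper written for this note, not part of either system.

```python
import random

def sample_walk_length(p_stop, rng):
    """Number of steps a walk takes when it stops with probability p_stop
    after each step (assumed reading of the termination rule)."""
    if not 0 < p_stop <= 1:
        raise ValueError("p_stop must be in (0, 1]")
    steps = 0
    while True:
        steps += 1
        if rng.random() < p_stop:
            return steps

rng = random.Random(42)
lengths = [sample_walk_length(0.15, rng) for _ in range(20000)]
mean_length = sum(lengths) / len(lengths)
```

Under this reading the expected length is 1 / 0.15, about 6.7 steps, so the workload consists of many short walks of varying length rather than walks of one fixed length.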

  22. https://github.com/ustcadsl/graphwalker
