CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs


  1. CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs • Mo Li (1,2), Farhana M. Choudhury (2), Renata Borovica-Gajic (2), Zhiqiong Wang (1), Junchang Xin (1,*), Jianxin Li (3) • (1) Northeastern University, CN; (2) University of Melbourne, AU; (3) Deakin University, AU • April 22, 2020

  2. Outline • Background: SimRank Overview; Motivation • Problem Definition: Preliminaries; Problem Definition • Our Approach: CrashSim Algorithm --- static graphs; CrashSim-T Algorithm --- temporal graphs • Experiments and Conclusion: Experimental Evaluation; Conclusion

  3. Background • SimRank Overview • Motivation

  4. Background • Similarity assessment plays a vital role in our lives. • Examples (figure): Recommender System, Citation Graph, Collaboration Network

  5. Background • SimRank • Node-to-node similarity measure based on the topology of graphs (KDD’02) • Basic assumption • Two nodes are similar if they are both highly relevant to similar nodes • Two forms • Original definition (KDD’02) • $\sqrt{c}$-walk (SIGMOD’16)
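
For reference, the original definition sets sim(u, u) = 1 and, for u ≠ v, sim(u, v) = c / (|I(u)| |I(v)|) times the sum of sim(x, y) over all in-neighbors x of u and y of v, where c is a decay factor and I(·) is the in-neighbor set. The following is a minimal sketch of the naive fixed-point iteration for that definition; the function name `simrank`, the dictionary-based graph format, and the default values of `c` and `iterations` are assumptions made for illustration, and this is not the CrashSim algorithm itself.

```python
def simrank(in_neighbors, c=0.6, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration.

    in_neighbors maps every node to the list of its in-neighbors.
    """
    nodes = list(in_neighbors)
    # Iteration 0: a node is only similar to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # A node without in-neighbors is similar to no other node.
                    new[(a, b)] = 0.0
                    continue
                total = sum(sim[(x, y)] for x in ia for y in ib)
                new[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new
    return sim


# Toy graph (hypothetical): a -> b and a -> c, so b and c share one in-neighbor.
toy = {"a": [], "b": ["a"], "c": ["a"]}
scores = simrank(toy, c=0.6, iterations=5)
```

On this toy graph, b and c have the single common in-neighbor a, so their score equals the decay factor c. The quadratic cost per iteration is what motivates sampling-based methods such as those discussed in these slides.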

  6. Background • Temporal Graph (figure: three snapshots of a graph over nodes A to H) • Temporal SimRank queries: threshold and trend

  7. Problem Definition • Preliminaries • Problem Definition

  8. Preliminaries • SLING algorithm (SIGMOD’16) • ProbeSim algorithm (VLDB’17)

  9. Problem Definition • Problem (Temporal SimRank Queries) Given: $G$, $u$, $[t_1, t_2]$. Return: node set $\Omega$ such that the SimRank of $u$ and each node $v \in \Omega$ continuously meets a certain query requirement during the entire query interval $[t_1, t_2]$ • Problem (Temporal SimRank Trend Query) Given: $G$, $u$, $[t_1, t_2]$. Return: node set $\Omega$ such that the SimRank of $u$ and each node $v \in \Omega$ is continuously increasing (or decreasing) during the entire query interval $[t_1, t_2]$ • Problem (Temporal SimRank Threshold Query) Given: $G$, $u$, $[t_1, t_2]$, $\theta$. Return: node set $\Omega$ such that the SimRank of $u$ and each node $v \in \Omega$ is greater than $\theta$ during the entire query interval $[t_1, t_2]$
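
The two query types can be read as predicates over the per-snapshot SimRank values of each node. The sketch below states that reading directly; the helper names, the input format (one list of scores per node, ordered by time), and the choice of strict inequalities for "increasing" and "decreasing" are assumptions for illustration, not definitions taken from the slides.

```python
def threshold_holds(series, theta):
    """True if every score in the interval is greater than theta."""
    return all(score > theta for score in series)


def trend_holds(series, increasing=True):
    """True if the scores strictly increase (or decrease) over the interval."""
    pairs = list(zip(series, series[1:]))
    if increasing:
        return all(a < b for a, b in pairs)
    return all(a > b for a, b in pairs)


def answer_query(scores_by_node, predicate):
    """Return the nodes whose whole score series satisfies the predicate."""
    return {v for v, series in scores_by_node.items() if predicate(series)}


# Hypothetical SimRank values of three nodes w.r.t. a source over 3 snapshots.
scores_by_node = {
    "b": [0.20, 0.25, 0.30],
    "c": [0.30, 0.05, 0.40],
    "d": [0.50, 0.40, 0.30],
}
above = answer_query(scores_by_node, lambda s: threshold_holds(s, 0.1))
rising = answer_query(scores_by_node, trend_holds)
```

This brute-force form needs every score at every time instant, which is the cost the pruning rules later in the deck aim to avoid.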

  10. Our Approach • CrashSim Algorithm --- static graphs • CrashSim-T Algorithm --- temporal graphs

  11. CrashSim Algorithm • Motivation • ProbeSim (VLDB’17) is the state-of-the-art algorithm for SimRank computation over static graphs (figure: example graph over nodes A to H) • Drawbacks • Redundant computations • The length of the $\sqrt{c}$-walk determines the computation cost

  12. CrashSim Algorithm • Key idea • Constrain the length of the $\sqrt{c}$-walk to $l_{\max}$ • Build a reverse reachable tree of the source $u$, limited to the $\sqrt{c}$-walk length $l_{\max}$ • Still obtain SimRank estimators with the same guaranteed error bound as ProbeSim • Problem (Approximation Guarantee) Given: $G$, $u$, $\varepsilon$, $\delta$. Return: $s(u, v)$ such that $|s(u, v) - sim(u, v)| \le \varepsilon$ with probability at least $1 - \delta$
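
A $\sqrt{c}$-walk, in its usual definition, stops at each step with probability $1 - \sqrt{c}$ and otherwise moves to a uniformly random in-neighbor of the current node. The sketch below samples such a walk and truncates it at $l_{\max}$ steps, as the key idea describes; the function name, the graph format, and the default parameter values are assumptions for illustration.

```python
import random


def sample_sqrt_c_walk(in_neighbors, start, c=0.6, l_max=5, rng=None):
    """Sample one sqrt(c)-walk from `start`, truncated to at most l_max steps."""
    rng = rng or random.Random()
    sqrt_c = c ** 0.5
    walk = [start]
    while len(walk) - 1 < l_max:
        preds = in_neighbors.get(walk[-1], [])
        # Stop at a node without in-neighbors, or with probability 1 - sqrt(c).
        if not preds or rng.random() >= sqrt_c:
            break
        walk.append(rng.choice(preds))
    return walk


# Toy graph (hypothetical), given as node -> list of in-neighbors.
toy = {"a": ["b", "c"], "b": ["a"], "c": []}
walks = [sample_sqrt_c_walk(toy, "a", c=0.6, l_max=3, rng=random.Random(seed))
         for seed in range(50)]
```

Every sampled walk has at most `l_max + 1` nodes, so the cost per trial is bounded regardless of how long an untruncated walk would have run.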

  13. CrashSim Algorithm • Example (figure: reverse reachable tree of source A; Level 0: A; Level 1: B, C; Level 2: E, B, D; Level 3: H, A, E, B) • $U(0, A) = 1$ • $U(1, B) = U(0, A) \times \frac{\sqrt{c}}{|I(B)|} = \frac{1 \times 0.5}{2} = 0.25$ • $U(1, C) = U(0, A) \times \frac{\sqrt{c}}{|I(C)|} = \frac{1 \times 0.5}{3} = 0.167$ • $U(2, E) = 0.0625$, $U(2, B) = 0.0417$, $U(2, D) = 0.0417$ • $U(3, H) = 0.0156$, $U(3, A) = 0.0104$, $U(3, E) = 0.0104$, $U(3, B) = 0.0104$ • In the $k$-th trial, $W(C) = (C, D, B, A)$ and $s_k(A, C) = U(0, C) + U(1, D) + U(2, B) + U(3, A) = 0 + 0 + 0.0417 + 0.0104 = 0.0521$
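
The arithmetic of this example can be replayed in a few lines. The sketch below assumes, as the numbers suggest, that a node at level l+1 receives its parent's value multiplied by 0.5 and divided by a per-node divisor (2 for one level-1 node, 3 for the other), and that a trial's score is the sum of the level-i value of the i-th node on the sampled walk. The names `level_weights` and `walk_score`, the partial edge set, and the node labels attached to the values are assumptions chosen to reproduce the slide's figures; the full example graph is not recoverable from the text.

```python
from collections import defaultdict


def level_weights(children, divisor, source, sqrt_c, l_max):
    """Return [level_0, ..., level_lmax], each a dict node -> weight."""
    levels = [{source: 1.0}]
    for _ in range(l_max):
        nxt = defaultdict(float)
        for parent, weight in levels[-1].items():
            for child in children.get(parent, []):
                # Assumed rule: parent's weight * sqrt(c) / divisor of the child.
                nxt[child] += weight * sqrt_c / divisor[child]
        levels.append(dict(nxt))
    return levels


def walk_score(levels, walk):
    """Sum the level-i weight of the i-th walk node (0 if absent from that level)."""
    return sum(levels[i].get(node, 0.0)
               for i, node in enumerate(walk) if i < len(levels))


# Partial, hypothetical tree that reproduces some of the slide's values.
children = {"A": ["B", "C"], "B": ["E"], "C": ["B"]}
divisor = {"B": 2, "C": 3, "E": 2}
levels = level_weights(children, divisor, "A", sqrt_c=0.5, l_max=2)

# Values as printed on the slide, used to replay the final sum.
slide_levels = [
    {"A": 1.0},
    {"B": 0.25, "C": 0.167},
    {"E": 0.0625, "B": 0.0417, "D": 0.0417},
    {"H": 0.0156, "A": 0.0104, "E": 0.0104, "B": 0.0104},
]
trial = walk_score(slide_levels, ["C", "D", "B", "A"])
```

With the slide's values, the walk (C, D, B, A) contributes nothing at levels 0 and 1 and picks up 0.0417 and 0.0104 at levels 2 and 3, which gives the 0.0521 shown in the example.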

  14. CrashSim Algorithm • Time Complexity

  15. CrashSim-T Algorithm • Two opportunities • It is unnecessary to compute the SimRank between $u$ and the candidate node set $\Omega$ at each time instant • The size of the node set $\Omega$ can only shrink over time • CrashSim naturally supports computing the SimRank between the source $u$ and a partial set of nodes

  16. CrashSim-T Algorithm --- Delta Pruning • Affected area of a changed edge $x \to y$ • The altered nodes in the reverse reachable tree of $u$ • The $(l_{\max} - 1)$-length reachable nodes of $y$ • Delta pruning: ignore the nodes of an unaffected area • Example (figure): after one edge is deleted, the reverse reachable tree of A remains unchanged
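
One way to picture the "$(l_{\max} - 1)$-length reachable nodes of $y$" is a breadth-first search from $y$ that stops after $l_{\max} - 1$ hops. The sketch below does only that and then splits a candidate set into nodes inside and outside the resulting area. It assumes that "reachable" means following edges forward from $y$; the slides do not state the direction, and the function names and graph format are hypothetical.

```python
from collections import deque


def nodes_within(out_neighbors, start, max_hops):
    """Nodes reachable from `start` in at most `max_hops` forward hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in out_neighbors.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return set(hops)


def split_candidates(candidates, out_neighbors, y, l_max):
    """Split candidates into (inside, outside) the bounded area around y."""
    area = nodes_within(out_neighbors, y, l_max - 1)
    inside = {v for v in candidates if v in area}
    return inside, set(candidates) - inside


# Toy graph (hypothetical), given as node -> list of out-neighbors.
toy = {"y": ["p"], "p": ["q"], "q": ["r"], "r": []}
inside, outside = split_candidates({"p", "q", "r", "z"}, toy, "y", l_max=3)
```

Under this reading, only the candidates in `inside` would need their SimRank recomputed after the edge change; those in `outside` could be ignored.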

  17. CrashSim-T Algorithm --- Difference Pruning • Related area: the $l_{\max}$-length reverse reachable trees of $u$ and $v$ • Difference pruning: filter out those nodes whose related area is unchanged • Example (figure): after one edge is added, the reverse reachable trees of A and E remain unchanged

  18. CrashSim-T Algorithm • Main idea • Check whether the conditions of delta pruning and difference pruning are satisfied • If so, disregard those nodes in the candidate node set • Invoke the CrashSim algorithm to compute the SimRank between $u$ and the residual nodes • Filter out the nodes that do not satisfy the query requirement
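
The steps above can be sketched as a loop over snapshots for the threshold query. This is a minimal illustration under stated assumptions, not the authors' implementation: `simrank_to` and `can_skip` are hypothetical caller-supplied callbacks standing in for CrashSim and for the two pruning checks, and a skipped node is assumed to keep its score from the previous snapshot.

```python
def temporal_threshold_query(snapshots, u, candidates, theta, simrank_to, can_skip):
    """Return the candidates whose score stays above theta in every snapshot.

    simrank_to(snapshot, u, nodes) -> dict node -> score   (hypothetical)
    can_skip(node, prev_snapshot, snapshot) -> bool        (hypothetical)
    """
    omega = set(candidates)
    scores = {}
    prev = None
    for snap in snapshots:
        if prev is None:
            residual = set(omega)
        else:
            # Pruned nodes keep their previous score; only the rest is recomputed.
            residual = {v for v in omega if not can_skip(v, prev, snap)}
        scores.update(simrank_to(snap, u, residual))
        # The candidate set can only shrink over time.
        omega = {v for v in omega if scores[v] > theta}
        prev = snap
        if not omega:
            break
    return omega


# Toy stand-ins: each "snapshot" directly stores the score of every node.
snaps = [{"b": 0.5, "c": 0.3, "d": 0.05}, {"b": 0.5, "c": 0.08, "d": 0.9}]
recomputed = []


def toy_simrank_to(snap, u, nodes):
    recomputed.append(set(nodes))
    return {v: snap[v] for v in nodes}


def toy_can_skip(v, prev, snap):
    return prev[v] == snap[v]


result = temporal_threshold_query(snaps, "a", {"b", "c", "d"}, 0.1,
                                  toy_simrank_to, toy_can_skip)
```

In the toy run, node d is dropped after the first snapshot and never recomputed, and node b is skipped in the second snapshot because nothing relevant to it changed.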

  19. Experiments and Conclusion • Experimental Evaluation • Conclusion

  20. Experimental Evaluation • Datasets • Comparison baselines • SLING (SIGMOD’16), ProbeSim (VLDB’17), READS (VLDB’17) • Setting and metrics • $\varepsilon$ takes the values 0.0125, 0.025, 0.05, and 0.1 • Maximum error: $ME = \max_{v \in V} |s(u, v) - sim(u, v)|$ • Precision: the size of the intersection of two node sets divided by a normalizing term
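
The maximum-error metric is the largest absolute gap between an estimated score and the exact SimRank over all nodes. A minimal sketch follows; the function name and the dictionary inputs are assumptions for illustration, and the scores are made-up values rather than results from the experiments.

```python
def max_error(estimated, exact):
    """Largest absolute difference between estimated and exact scores.

    Nodes missing from `estimated` are treated as having an estimate of 0.
    """
    return max(abs(estimated.get(v, 0.0) - exact[v]) for v in exact)


# Hypothetical scores of two nodes with respect to some source u.
me = max_error({"b": 0.30, "c": 0.10}, {"b": 0.32, "c": 0.05})
```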

  21. Experimental Evaluation (figure: results on (a) AS-733, (b) AS-Caida, (c) Wiki-Vote, (d) HepTh, (e) HepPh)

  22. Experimental Evaluation • The impact of the query interval on the response time of the algorithms (figure)

  23. Conclusion • We propose CrashSim, an index-free algorithm for single-source and partial SimRank computation on static graphs • We introduce CrashSim-T, an extension of CrashSim that answers SimRank queries over temporal graphs • Experiments show that both CrashSim and CrashSim-T outperform the state-of-the-art algorithms

  24. Thanks. Mo Li limo_neucse@hotmail.com
