Mining Heavy Subgraphs in Time-Evolving Networks (Petko Bogdanov)

  1. Mining Heavy Subgraphs in Time-Evolving Networks. Petko Bogdanov (petko@cs.ucsb.edu), Misael Mongiovì, Ambuj Singh. Department of Computer Science, University of California, Santa Barbara

  2. Traffic networks ICDM 2011 2 Images from http://www.dot.ca.gov

  3. Transformation to a Dynamic Network. Images from http://www.dot.ca.gov

  4. Temporal subgraph scoring. Images from http://www.dot.ca.gov

  5. Various Application Domains

  6. Problem definition
     • Time-Evolving Graph
     • Temporal subgraph
       o Connected
       o Contiguous in time
       o Score is the sum of the scores of the involved edges
     • Values can also be in nodes instead of edges
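For reference, the objective on this slide can be written as below. The notation is ours, not the slides': t is the number of time slices (as in the complexity bounds later), w_i(e) is the score of edge e in slice i, and the sum runs over the edges of a connected subgraph S and the slices of a contiguous interval [t_1, t_2].

```latex
\max_{S,\,[t_1, t_2]} \; \mathrm{score}(S, t_1, t_2)
  = \sum_{i = t_1}^{t_2} \sum_{e \in E_S} w_i(e),
\qquad S = (V_S, E_S) \text{ connected},\quad 1 \le t_1 \le t_2 \le t .
```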

  8. Outline
     • Motivation and problem definition
     • Complexity and previous work
     • Mining Edge Dynamic Networks (MEDEN)
     • Datasets and Results

  9. Previous Work
     • Dynamic graph mining
       o Evolutionary clustering [Lin 08, Kim 09, Yang 11, Sun 07]
       o Pattern mining [Lin 09, Oshino 10, McGlohon 07]
       o Anomaly detection [Abello 10, Akoglu 10]
     • Static graphs: Prize-Collecting Steiner Tree (PCST) [Lee 96, Johnson 00, Ljubic 05, Dittrich 08]

  10. Complexity
     • HDS (the Heaviest Dynamic Subgraph problem) is NP-hard
       o Reduction from Thumbnail Rectilinear Steiner Tree [Ganley 95]
     • The problem remains NP-hard
       o For a single time slice
       o For a simple {-1, 1} scoring

  11. Naive solution
     • Consider all time intervals
       o Transform HDS to PCST
       o Solve PCST
     • Return the best solution
     • Complexity: O(t^2 |V|^2 log|V|)
       o We have to enumerate all O(t^2) sub-intervals
       o We have to apply a super-quadratic heuristic for PCST (such as [Johnson 2000])
     • Can we filter infeasible solutions fast?
       o O(t log^2(t) |E|) filtering
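A toy version of this naive scheme is sketched below in Python. It is not the authors' code: the input format (a dict mapping each edge to its list of per-slice scores) is an assumption, and instead of a PCST transformation plus a heuristic such as [Johnson 2000] it uses exhaustive search over connected edge subsets, which is exact but exponential and only usable on tiny graphs. What it does show is the outer loop over all O(t^2) intervals.

```python
from itertools import combinations

def is_connected(edges):
    """True if a non-empty edge set forms a single connected component."""
    if not edges:
        return False
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adj)

def best_connected_subgraph(agg):
    """Brute-force stand-in for the PCST step (exponential in |E|; toy sizes only)."""
    best, best_edges = float("-inf"), ()
    edges = list(agg)
    for k in range(1, len(edges) + 1):
        for subset in combinations(edges, k):
            if is_connected(subset):
                score = sum(agg[e] for e in subset)
                if score > best:
                    best, best_edges = score, subset
    return best, best_edges

def naive_hds(weights, t):
    """weights[e][i] = score of edge e in slice i (0-indexed); t = number of slices."""
    best = (float("-inf"), (), None)
    for t1 in range(t):
        for t2 in range(t1, t):
            # Aggregate each edge's score over the interval [t1, t2].
            agg = {e: sum(w[t1:t2 + 1]) for e, w in weights.items()}
            score, edges = best_connected_subgraph(agg)
            if score > best[0]:
                best = (score, edges, (t1, t2))
    return best
```

On a three-edge path (0, 1), (1, 2), (2, 3) with scores [-1, 2, 3], [1, 2, -5] and [-2, -1, 3], the sketch returns interval (1, 2) with the single edge (0, 1), scoring 2 + 3 = 5.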

  13. A better solution: Basic
     • For a fixed time slice:
       o Prune by structural upper bounds (UB)
       o Use TopDown, a fast and accurate heuristic
     • Filtering solution:
       o Obtain a lower bound (LB)
       o Filter every interval based on its UB
       o Verify the unfiltered intervals
     • Filtering: O(t^2 |E|)
     • Can we filter multiple intervals at a time?
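The filtering step can be sketched as follows, again in Python and under our own assumptions. Precomputing per-edge prefix sums over time makes each interval's aggregated edge score an O(1) lookup, so bounding all O(t^2) intervals costs O(t^2 |E|), matching the slide. The bounds are stand-ins, not necessarily the authors': the lower bound is the best single edge on its best interval (Kadane's algorithm), which is always a feasible connected solution, and the upper bound is the sum of positive aggregated edge scores, which no connected subgraph can exceed.

```python
from itertools import accumulate

def prefix_sums(weights):
    """P[e][i] = w_0(e) + ... + w_{i-1}(e), so any interval sum is O(1)."""
    return {e: [0, *accumulate(w)] for e, w in weights.items()}

def single_edge_lower_bound(weights):
    """Best single edge on its best interval (Kadane); a single edge is connected."""
    best = float("-inf")
    for w in weights.values():
        run = float("-inf")
        for x in w:
            run = max(x, run + x)
            best = max(best, run)
    return best

def upper_bound(P, t1, t2):
    """Sum of positive aggregated edge scores on [t1, t2]."""
    return sum(max(0, p[t2 + 1] - p[t1]) for p in P.values())

def filter_intervals(weights, t):
    """Intervals whose upper bound reaches the lower bound; O(t^2 |E|) overall."""
    P = prefix_sums(weights)
    lb = single_edge_lower_bound(weights)
    return [(t1, t2) for t1 in range(t) for t2 in range(t1, t)
            if upper_bound(P, t1, t2) >= lb]
```

For example, with edges (0, 1), (1, 2), (2, 3) scored [-1, 2, 3], [1, 2, -5] and [-2, -1, 3] over three slices, the lower bound is 5 and only the intervals (1, 2) and (2, 2) survive filtering.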

  14. Grouping similar intervals
     • Combine overlapping intervals into groups
     • High overlap: similar solutions
     • Filter interval groups as a whole, without considering individual members

  16. Groups of minimum overlap
     • Group intervals with a common starting point
     • Ensure a minimum overlap alpha (0.5 in the example)
     • A total of O(t log(t)) groups, when alpha < 1
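One way to build such groups is sketched below (our construction, not necessarily the authors'). Intervals sharing a start s are nested, so we take the overlap of two members to be the shorter length divided by the longer one; a group then spans lengths lo..hi with lo/hi >= alpha. Since lengths grow by a factor of about 1/alpha from one group to the next, each start yields O(log t) groups, or O(t log(t)) in total.

```python
def min_overlap_groups(t, alpha):
    """Partition all intervals [s, e] over t slices into same-start groups.

    Each group is (start, first end, last end). Its lengths lo..hi satisfy
    lo / hi >= alpha, so any two (nested) members overlap by at least a
    fraction alpha of the longer one.
    """
    assert 0 < alpha < 1
    groups = []
    for s in range(t):
        max_len = t - s
        lo = 1
        while lo <= max_len:
            # Longest length that still keeps lo / hi >= alpha.
            hi = min(max_len, int(lo / alpha))
            groups.append((s, s + lo - 1, s + hi - 1))
            lo = hi + 1
    return groups
```

With t = 8 and alpha = 0.5, the intervals starting at slice 0 fall into three groups (end slices 0 to 1, 2 to 5, and 6 to 7), and all 36 intervals are covered by 16 groups.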

  17. MEDEN: filter and verify

  22. Filter whole groups
     • We define a Dominating Graph (DG) for each group
       o Edge weights are the maximum over all group members
       o A solution's score in the DG is at least its score in any member (the DG dominates)
     • Composing a DG: O(log(t)) per edge, i.e. O(|E| log(t)) per group
     • Time to build the index: O(t |E|)
     • Composition time for all dominating graphs: O(t log^2(t) |E|)
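The complexities on this slide are consistent with the construction below, sketched in Python as an illustration rather than the authors' implementation. For a group with start s and end slices e_lo..e_hi, the DG weight of edge e is the maximum over x in [e_lo, e_hi] of P[x + 1] - P[s], where P holds e's prefix sums over time. Our assumed index is a range-maximum segment tree over each edge's prefix sums: building it takes O(t) per edge, O(t |E|) overall, and each DG weight is one O(log t) query, which over O(t log(t)) groups gives O(t log^2(t) |E|).

```python
from itertools import accumulate

class MaxSegmentTree:
    """Iterative segment tree: O(n) build, O(log n) range-maximum query."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [float("-inf")] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo, hi):
        """Maximum of values[lo..hi], inclusive."""
        res = float("-inf")
        lo += self.n
        hi += self.n + 1
        while lo < hi:
            if lo & 1:
                res = max(res, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                res = max(res, self.tree[hi])
            lo //= 2
            hi //= 2
        return res

def build_index(weights):
    """Per edge: prefix sums P and a range-max tree over P; O(t |E|) total."""
    index = {}
    for e, w in weights.items():
        P = [0, *accumulate(w)]
        index[e] = (P, MaxSegmentTree(P))
    return index

def dominating_weights(index, group):
    """DG edge weights for group (s, e_lo, e_hi): max interval score over members."""
    s, e_lo, e_hi = group
    return {e: tree.query(e_lo + 1, e_hi + 1) - P[s]
            for e, (P, tree) in index.items()}
```

Because every member's edge weights are at most the DG's, any upper bound computed on the DG (for instance the sum of its positive weights) holds for every interval in the group, which is what allows a whole group to be pruned at once.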

  23. Putting things together: MEDEN
     • Steps of MEDEN:
       o Obtain an LB on the solution
       o Filter groups
       o Filter members
       o Verify using TopDown
     • Running time
       o Filtering takes O(t log^2(t) |E|)
       o Verification takes O(|C| |E| log|V|), where |C| is the number of unpruned intervals
       o |C| is small (linear in t in our experiments)
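The four steps can be strung together as in the Python sketch below. Everything here is an assumption for illustration: the groups and bounds follow the simple choices described in the comments, the dominating-graph weights are computed by a direct scan over members rather than through an index, and `verify` is a hypothetical caller-supplied solver (TopDown on the slides) that returns the best connected-subgraph score for a dict of aggregated edge weights.

```python
from itertools import accumulate

def sop_bound(edge_weights):
    """Sum of positive edge weights: an upper bound for any connected subgraph."""
    return sum(max(0, w) for w in edge_weights.values())

def groups_of(t, alpha):
    """Yield same-start groups (s, first end, last end) with lo / hi >= alpha."""
    for s in range(t):
        lo = 1
        while lo <= t - s:
            hi = min(t - s, int(lo / alpha))
            yield s, s + lo - 1, s + hi - 1
            lo = hi + 1

def meden_sketch(weights, t, alpha, verify):
    """Return (surviving intervals, (best verified score, its interval))."""
    P = {e: [0, *accumulate(w)] for e, w in weights.items()}

    def agg(s, x):
        # Aggregated score of every edge over slices s..x, via prefix sums.
        return {e: p[x + 1] - p[s] for e, p in P.items()}

    # Step 1: LB = best single edge on its best interval (Kadane), standing in for TopDown.
    lb = float("-inf")
    for w in weights.values():
        run = float("-inf")
        for x in w:
            run = max(x, run + x)
            lb = max(lb, run)

    candidates = []
    for s, e_lo, e_hi in groups_of(t, alpha):
        # Step 2: dominating graph = per-edge max over the group's members.
        dg = {e: max(p[x + 1] for x in range(e_lo, e_hi + 1)) - p[s]
              for e, p in P.items()}
        if sop_bound(dg) < lb:
            continue  # the whole group is pruned at once
        # Step 3: filter the individual members of a surviving group.
        candidates += [(s, x) for x in range(e_lo, e_hi + 1)
                       if sop_bound(agg(s, x)) >= lb]

    # Step 4: verify the survivors with the caller-supplied solver.
    best = max((verify(agg(s, x)), (s, x)) for s, x in candidates)
    return candidates, best
```

For instance, on three edges (0, 1), (1, 2), (2, 3) scored [-1, 2, 3], [1, 2, -5] and [-2, -1, 3], with alpha = 0.5, both groups starting at slice 0 are pruned whole and only intervals (1, 2) and (2, 2) reach verification.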

  25. Scalability with time

  26. Graph size and overlap for grouping

  27. Conclusion
     • We are the first to introduce the Heaviest Dynamic Subgraph (HDS) problem
     • Our approach, MEDEN, scales to real-world graphs and outperforms a basic approach by an order of magnitude thanks to interval grouping
     • Future directions
       o Extend to scalable top-k
       o Allow smoothly changing patterns

  28. Thank you. Questions?

  29. NP-hardness [Figure: reduction from Thumbnail Rectilinear Steiner Tree, with edge scores of -1, 1 and 4n]

  30. Upper bounds: UB_sop, UB_str

  31. TopDown heuristic

  32. Results: Twitter
     • Twitter sub-network
       o Nodes: 2605
       o Edges: 14871
       o Slices: 204
       o Resolution: 1 day
       o Cutoff: cosine similarity 0.004

  33. Can we improve the evaluation in time?
     • Still O(t^2) sub-intervals need to be considered
     • Infeasible on long time spans
     • Combine overlapping intervals into groups
     • Ensure:
       1. High overlap of intervals in a group
       2. Sub-quadratic number of groups
       3. Sub-quadratic time to compute bounds for groups
     • Prune groups as a whole, without considering individual members

  34. Some References
     • Hwang J-W, Lee Y-S, Cho S-B. Structure evolution of dynamic Bayesian network for traffic accident detection. IEEE Congress on Evolutionary Computation (CEC), 5-8 June 2011:1655-1671.
     • Borgwardt KM, Kriegel H-P, Wackersreuther P. Pattern Mining in Frequent Dynamic Subgraphs. Sixth International Conference on Data Mining (ICDM '06), 18-22 Dec. 2006:818-822.
     • Johnson DS, Minkoff M, Phillips S. The Prize Collecting Steiner Tree Problem: Theory and Practice. SODA. 2000.
     • Kwon J, Murphy K. Modeling Freeway Traffic with Coupled HMMs. 2004.
     • Berlingerio M, Bonchi F. Mining graph evolution rules. Machine Learning and Knowledge Discovery in Databases. 2009:115-130.
     • Stoev SA, Michailidis G, Vaughan J. Global Modeling and Prediction of Computer Network Traffic. 2009:1-32.
     • Wackersreuther B, Wackersreuther P, Oswald A, Böhm C, Borgwardt KM. Frequent Subgraph Discovery in Dynamic Networks. Developmental Biology. 2010:155-162.
     • Macropol K, Singh AK. Content-based Modeling and Prediction of Information Dissemination. ASONAM. 2011.
