Mining Heavy Subgraphs in Time-Evolving Networks
Petko Bogdanov (petko@cs.ucsb.edu), Misael Mongiovì, Ambuj Singh
Department of Computer Science, University of California Santa Barbara
ICDM 2011
Traffic networks
Images from http://www.dot.ca.gov
Transformation to a Dynamic Network
Temporal subgraph scoring
Various Application Domains
Problem definition
- Time-Evolving Graph
- Temporal subgraph
  - Connected
  - Contiguous in time
  - Score is the sum of scores of involved edges
Values can also be in nodes instead of edges
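The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout (`slices[t]` maps an edge to its score at time slice `t`), the function names, and the reading that a temporal subgraph keeps one fixed edge set across an inclusive interval are all hypothetical choices, not taken from the slides.

```python
from collections import defaultdict


def is_connected(edges):
    """Return True if the undirected edge set forms one connected component."""
    if not edges:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adj)


def temporal_score(slices, edges, start, end):
    """Score of the temporal subgraph (edges, [start, end]).

    Assumed layout: slices[t][(u, v)] is the score of edge (u, v) at slice t.
    The interval is inclusive, so it is contiguous by construction.
    """
    if not 0 <= start <= end < len(slices):
        raise ValueError("interval must lie inside the time span")
    if not is_connected(edges):
        raise ValueError("edge set must be connected")
    return sum(slices[t][e] for t in range(start, end + 1) for e in edges)


# Toy network: path a-b-c-d observed over three time slices.
slices = [
    {("a", "b"): 1.0, ("b", "c"): -0.5, ("c", "d"): -1.0},
    {("a", "b"): 2.0, ("b", "c"): 1.5, ("c", "d"): -1.0},
    {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 0.5},
]
print(temporal_score(slices, [("a", "b"), ("b", "c")], 1, 2))  # 5.5
```

Node-valued networks could be handled the same way by summing node scores instead of edge scores.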
Outline
- Motivation and problem definition
- Complexity and previous work
- Mining Edge Dynamic Networks (MEDEN)
- Datasets and Results
Previous Work
- Dynamic graph mining
  - Evolutionary clustering [Lin 08, Kim 09, Yang 11, Sun 07]
  - Pattern mining [Lin 09, Oshino 10, McGlohon 07]
  - Anomaly detection [Abello 10, Akoglu 10]
- Static graphs: Prize-Collecting Steiner Tree (PCST) [Lee 96, Johnson 00, Ljubic 05, Dittrich 08]
Complexity
- The Heaviest Dynamic Subgraph (HDS) problem is NP-hard
  - Reduction from Thumbnail Rectilinear Steiner Tree [Ganley 95]
- The problem remains NP-hard
  - For a single time slice
  - For a simple {-1, 1} scoring
Naive solution
- Consider all time intervals
- Transform HDS to PCST
- Solve PCST
- Return the best solution
- Complexity: $O(t^2 |V|^2 \log |V|)$
  - We have to enumerate all sub-intervals
  - We have to apply a super-quadratic heuristic for PCST (such as [Johnson 00])
- Can we filter infeasible solutions fast?
  - $O(t \log^2(t) |E|)$ filtering
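The enumeration behind the naive solution can be sketched as follows. This is a minimal sketch, not the method from the talk: it assumes edge scores are summed over the slices of an interval, and it swaps the PCST heuristic for an exhaustive search (`heaviest_connected_subgraph`, a hypothetical stand-in) that is only usable on tiny graphs. All names and the data layout are illustrative.

```python
from itertools import combinations


def is_connected(edges):
    """True if the non-empty undirected edge set forms one connected component."""
    nodes = {n for e in edges for n in e}
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(v if u == node else u for u, v in edges if node in (u, v))
    return seen == nodes


def heaviest_connected_subgraph(weights):
    """Exhaustive stand-in for a PCST solver (exponential, tiny graphs only)."""
    best_score, best_edges = float("-inf"), ()
    for k in range(1, len(weights) + 1):
        for subset in combinations(weights, k):
            if is_connected(subset):
                score = sum(weights[e] for e in subset)
                if score > best_score:
                    best_score, best_edges = score, subset
    return best_score, best_edges


def naive_hds(slices):
    """Try every interval [start, end] and keep the best static solution."""
    edges, t = list(slices[0]), len(slices)
    # prefix[i][e] is the total score of edge e over slices 0..i-1.
    prefix = [dict.fromkeys(edges, 0)]
    for s in slices:
        prefix.append({e: prefix[-1][e] + s[e] for e in edges})
    best = (float("-inf"), (), None)
    for start in range(t):
        for end in range(start, t):  # all O(t^2) sub-intervals
            agg = {e: prefix[end + 1][e] - prefix[start][e] for e in edges}
            score, sub = heaviest_connected_subgraph(agg)
            if score > best[0]:
                best = (score, sub, (start, end))
    return best


slices = [
    {("a", "b"): 1, ("b", "c"): -2, ("c", "d"): -1},
    {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): -1},
    {("a", "b"): -1, ("b", "c"): 2, ("c", "d"): 1},
]
print(naive_hds(slices))  # (6, (('a', 'b'), ('b', 'c')), (1, 2))
```

The prefix sums make each interval's aggregated graph cheap to build, so the cost is dominated by the per-interval solver, which is the part the filtering step tries to avoid.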
A better solution: Basic
- For a fixed time slice:
  - Prune by structure upper bounds (UB)
  - Use a fast and accurate heuristic, TopDown
- Filtering solution
  - Obtain a lower bound
  - Filter every interval based on UB
  - Verify unfiltered intervals
- Filtering: $O(t^2 |E|)$
- Can we filter multiple intervals at a time?
Grouping similar intervals
- Combine overlapping intervals into groups.
- High overlap: similar solutions
- Filter interval groups as a whole, without considering individual members
Groups of minimum overlap
- Group intervals with common starting point
- Ensure a minimum overlap $\alpha$ (0.5 in the example)
- A total of $O(t \log t)$ groups when $\alpha < 1$
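One way to build such groups is sketched below. It assumes that the overlap of two intervals with a common start is the ratio of the shorter length to the longer length, so a group may hold lengths `lo..hi` as long as `lo / hi >= alpha`. That reading, the function name, and the `(start, lo, hi)` encoding are assumptions for illustration, not details given on the slides.

```python
def interval_groups(t, alpha=0.5):
    """Partition all intervals of a length-t time span into groups.

    Each group is (start, lo, hi): the intervals that begin at `start`
    and have lengths lo..hi, with lo / hi >= alpha.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    groups = []
    for start in range(t):
        max_len = t - start
        lo = 1
        while lo <= max_len:
            # Longest length whose overlap with the shortest member is >= alpha.
            hi = min(max_len, int(lo / alpha))
            groups.append((start, lo, hi))
            lo = hi + 1
    return groups


groups = interval_groups(8, alpha=0.5)
print(len(groups))  # 16 groups cover all 36 intervals
print(groups[:3])   # [(0, 1, 2), (0, 3, 6), (0, 7, 8)]
```

Because the lengths in consecutive groups grow geometrically by a factor of about 1/alpha, each start point contributes a logarithmic number of groups, which matches the stated $O(t \log t)$ total for $\alpha < 1$.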
MEDEN: filter and verify
Filter whole groups
- We define a Dominating Graph (DG) for each group
  - Edge weights are the maximum over all group members
  - A solution in the DG dominates the solution in any member
- Compose a DG: $O(\log t)$
- Time to build the index: $O(t |E|)$
- Time to compose all dominating graphs: $O(t \log^2(t) |E|)$
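A dominating graph can be sketched directly from its definition. The version below takes the per-edge maximum by scanning every member of a group, so it does not reproduce the indexed $O(\log t)$ composition mentioned above. The `(start, lo, hi)` group encoding, the function names, and the simple sum-of-positive-weights bound are assumptions for illustration and are not the bounds used in the talk.

```python
def prefix_sums(slices):
    """prefix[i][e] is the total score of edge e over slices 0..i-1."""
    edges = list(slices[0])
    prefix = [dict.fromkeys(edges, 0)]
    for s in slices:
        prefix.append({e: prefix[-1][e] + s[e] for e in edges})
    return prefix


def dominating_graph(prefix, start, lo, hi):
    """Per-edge maximum weight over the intervals starting at `start`
    with lengths lo..hi (a direct scan, not the indexed composition)."""
    return {
        e: max(prefix[start + n][e] - prefix[start][e] for n in range(lo, hi + 1))
        for e in prefix[0]
    }


def positive_weight_bound(weights):
    """Loose upper bound on any subgraph score: the total positive weight."""
    return sum(w for w in weights.values() if w > 0)


slices = [
    {("a", "b"): 1, ("b", "c"): -2, ("c", "d"): -1},
    {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): -1},
    {("a", "b"): -1, ("b", "c"): 2, ("c", "d"): 1},
]
prefix = prefix_sums(slices)
dg = dominating_graph(prefix, start=0, lo=1, hi=2)
print(dg)                         # {('a', 'b'): 3, ('b', 'c'): 1, ('c', 'd'): -1}
print(positive_weight_bound(dg))  # 4
```

Since every DG edge weight is at least the corresponding weight in each member, any bound computed on the DG also bounds every member, so a group whose DG bound falls below the current lower bound can be discarded in one step.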
Putting things together: MEDEN
- Steps of MEDEN:
  - Obtain a lower bound (LB) on the solution
  - Filter groups
  - Filter members
  - Verify using TopDown
- Running time
  - Filtering takes $O(t \log^2(t) |E|)$
  - Verification takes $O(|C| |E| \log |V|)$, where $|C|$ is the number of unpruned intervals
  - $|C|$ is small (linear in $t$ in our experiments)
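The four steps can be strung together in a small filter-and-verify loop. This is a sketch under several assumptions, not the MEDEN implementation: the lower bound is taken from the best single-slice solution, the upper bound is the total positive weight, groups share a start point with lengths `lo..hi` where `lo / hi >= alpha`, and verification uses an exhaustive search as a stand-in for TopDown. Every name here is hypothetical.

```python
from itertools import combinations


def is_connected(edges):
    """True if the non-empty undirected edge set forms one connected component."""
    nodes = {n for e in edges for n in e}
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(v if u == node else u for u, v in edges if node in (u, v))
    return seen == nodes


def verify(weights):
    """Exhaustive stand-in for TopDown: best connected edge subset score."""
    best = float("-inf")
    for k in range(1, len(weights) + 1):
        for subset in combinations(weights, k):
            if is_connected(subset):
                best = max(best, sum(weights[e] for e in subset))
    return best


def upper_bound(weights):
    """Assumed bound: no subgraph can score more than the total positive weight."""
    return sum(w for w in weights.values() if w > 0)


def filter_and_verify(slices, alpha=0.5):
    edges, t = list(slices[0]), len(slices)
    prefix = [dict.fromkeys(edges, 0)]
    for s in slices:
        prefix.append({e: prefix[-1][e] + s[e] for e in edges})

    def aggregate(start, length):
        return {e: prefix[start + length][e] - prefix[start][e] for e in edges}

    # Step 1: lower bound (assumed here: the best single-slice solution).
    best, best_interval = max((verify(aggregate(s, 1)), (s, s)) for s in range(t))
    verified = 0
    for start in range(t):
        lo = 1
        while lo <= t - start:
            hi = min(t - start, int(lo / alpha))
            members = [aggregate(start, n) for n in range(lo, hi + 1)]
            dominating = {e: max(m[e] for m in members) for e in edges}
            # Step 2: filter the whole group through its dominating graph.
            if upper_bound(dominating) > best:
                for n, member in zip(range(lo, hi + 1), members):
                    # Step 3: filter individual members.
                    if upper_bound(member) > best:
                        # Step 4: verify the surviving interval.
                        verified += 1
                        score = verify(member)
                        if score > best:
                            best, best_interval = score, (start, start + n - 1)
            lo = hi + 1
    return best, best_interval, verified


slices = [
    {("a", "b"): 1, ("b", "c"): -2, ("c", "d"): -1},
    {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): -1},
    {("a", "b"): -1, ("b", "c"): 2, ("c", "d"): 1},
]
print(filter_and_verify(slices))  # (6, (1, 2), 1)
```

On this toy input, three of the four groups are discarded by their bounds and only one interval reaches verification after the lower-bound step, which is the effect the count $|C|$ measures.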
Scalability with time
Graph size and overlap for grouping
Conclusion
- We are the first to introduce the Heaviest Dynamic Subgraph (HDS) problem
- Our approach, MEDEN, scales to real-world graphs and outperforms a basic approach by an order of magnitude due to interval grouping
- Future directions
  - Extend to scalable top-k
  - Allow smoothly changing patterns
THANK YOU
Questions?
NP-hardness
Reduction from Thumbnail Rectilinear Steiner Tree
[Figure: reduction construction, with labels 1, -1, and 4n]
Upper Bounds
UBsop UBstr
TopDown heuristic
Results - Twitter
Twitter sub-network
- Nodes: 2605
- Edges: 14871
- Slices: 204
- Resolution: 1 day
- Cutoff: cosine similarity 0.004
Can we improve the evaluation in time?
- Still $O(t^2)$ sub-intervals need to be considered
  - Infeasible on long time spans
- Combine overlapping intervals into groups
- Ensure:
  - High overlap of intervals in a group
  - Sub-quadratic number of groups
  - Sub-quadratic time to compute bounds for groups
- Prune groups as a whole, without considering individual members
Some References
- Hwang J-W, Lee Y-S, Cho S-B. Structure evolution of dynamic Bayesian network for traffic accident detection. IEEE Congress on Evolutionary Computation (CEC). 2011:1655-1671.
- Borgwardt KM, Kriegel H-P, Wackersreuther P. Pattern Mining in Frequent Dynamic Subgraphs. Sixth International Conference on Data Mining (ICDM '06). 2006:818-822.
- Johnson DS, Minkoff M, Phillips S. The Prize Collecting Steiner Tree Problem: Theory and Practice. SODA. 2000.
- Kwon J, Murphy K. Modeling Freeway Traffic with Coupled HMMs. 2004.
- Berlingerio M, Bonchi F. Mining graph evolution rules. Machine Learning and Knowledge Discovery in Databases. 2009:115-130.
- Stoev SA, Michailidis G, Vaughan J. Global Modeling and Prediction of Computer Network Traffic. 2009:1-32.
- Wackersreuther B, Wackersreuther P, Oswald A, Böhm C, Borgwardt KM. Frequent Subgraph Discovery in Dynamic Networks. Developmental Biology. 2010:155-162.
- Macropol K, Singh AK. Content-based Modeling and Prediction of Information Dissemination. ASONAM. 2011.