Mining Heavy Subgraphs in Time-Evolving Networks - Petko Bogdanov

SLIDE 1

Mining Heavy Subgraphs in Time-Evolving Networks

Petko Bogdanov (petko@cs.ucsb.edu) Misael Mongiovì Ambuj Singh

Department of Computer Science University of California Santa Barbara

SLIDE 2

ICDM 2011

Traffic networks

Images from http://www.dot.ca.gov

SLIDE 3

Transformation to a Dynamic Network

Images from http://www.dot.ca.gov

SLIDE 4

Temporal subgraph scoring

Images from http://www.dot.ca.gov

SLIDE 5

Various Application Domains

SLIDE 6

Problem definition

  • Time-Evolving Graph
  • Temporal subgraph
  • Connected
  • Contiguous in time
  • Score is the sum of scores of involved edges

Values can also be in nodes instead of edges
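
The definition above can be sketched in Python. This is a minimal illustration under assumptions the slide does not state: each edge maps to a list of per-slice scores, and a temporal subgraph is an edge set plus an inclusive range of slices. The names `is_connected` and `temporal_score` and the toy data are hypothetical.

```python
from collections import defaultdict

def is_connected(edges):
    """Return True if the undirected edge set forms a single connected component."""
    if not edges:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == len(adj)

def temporal_score(weights, edges, start, end):
    """Sum the scores of the given edges over slices start..end (inclusive)."""
    return sum(sum(weights[e][start:end + 1]) for e in edges)

# Toy time-evolving graph (assumed data): 3 edges, 4 time slices.
weights = {
    ("a", "b"): [1, -1, 1, 1],
    ("b", "c"): [-1, 1, 1, -1],
    ("c", "d"): [-1, -1, -1, 1],
}
subgraph = [("a", "b"), ("b", "c")]
score = temporal_score(weights, subgraph, 1, 2)
```

Here the edge set is connected and the slice range 1..2 is contiguous, so it is a valid temporal subgraph under these assumptions.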

SLIDE 8

Outline

  • Motivation and problem definition
  • Complexity and previous work
  • Mining Edge Dynamic Networks (MEDEN)
  • Datasets and Results

SLIDE 9

Previous Work

  • Dynamic graph mining
  • Evolutionary clustering [Lin 08, Kim 09, Yang 11, Sun 07]
  • Pattern mining [Lin 09, Oshino 10, McGlohon 07]
  • Anomaly detection [Abello 10, Akoglu 10]
  • Static graphs: Prize-Collecting Steiner Tree (PCST) [Lee 96, Johnson 00, Ljubic 05, Dittrich 08]

SLIDE 10

Complexity

  • HDS is NP-hard
  • Reduction from Thumbnail Rectilinear Steiner Tree [Ganley 95]

  • The problem remains NP-hard
  • For one time slice
  • For a simple {-1,1} scoring

SLIDE 11

Naive solution

  • Consider all time intervals
  • Transform HDS to PCST
  • Solve PCST
  • Return best solution
  • Complexity: O(t^2 |V|^2 log|V|)
  • We have to enumerate all sub-intervals
  • We have to apply a super-quadratic heuristic for PCST (such as [Johnson 00])
  • Can we filter infeasible solutions fast?
  • O(t log^2(t) |E|) filtering
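
The naive procedure can be sketched as follows. This is an illustration only: `heaviest_static` is a hypothetical exhaustive search standing in for a PCST heuristic such as [Johnson 00], it is exponential in the number of edges, and the data layout (edge mapped to a list of per-slice scores) is an assumption.

```python
from itertools import combinations

def prefix_sums(series):
    """Prefix sums so that any interval total is one subtraction."""
    out = [0]
    for x in series:
        out.append(out[-1] + x)
    return out

def connected(edges):
    """Grow a component from the first edge; True if every edge gets absorbed."""
    nodes, rest = set(edges[0]), list(edges[1:])
    grew = True
    while grew:
        grew = False
        for e in rest[:]:
            if e[0] in nodes or e[1] in nodes:
                nodes.update(e)
                rest.remove(e)
                grew = True
    return not rest

def heaviest_static(agg):
    """Exhaustive stand-in for a PCST solver (tiny graphs only)."""
    best = (float("-inf"), ())
    for k in range(1, len(agg) + 1):
        for subset in combinations(agg, k):
            score = sum(agg[e] for e in subset)
            if score > best[0] and connected(subset):
                best = (score, subset)
    return best

def naive_hds(weights, t):
    """Try all O(t^2) intervals and solve a static problem for each."""
    pre = {e: prefix_sums(w) for e, w in weights.items()}
    best = (float("-inf"), None, ())
    for i in range(t):
        for j in range(i, t):
            agg = {e: p[j + 1] - p[i] for e, p in pre.items()}
            score, subset = heaviest_static(agg)
            if score > best[0]:
                best = (score, (i, j), subset)
    return best

# Toy data (assumed): 3 edges, 4 time slices.
weights = {
    ("a", "b"): [1, 1, 1, -1],
    ("b", "c"): [-1, 1, 1, -1],
    ("c", "d"): [-1, -1, 1, 1],
}
best_score, best_interval, best_edges = naive_hds(weights, 4)
```

Every one of the t(t+1)/2 intervals triggers a full static solve, which is the cost the filtering question on this slide aims to avoid.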

SLIDE 12

Outline

  • Motivation and problem definition
  • Complexity and previous work
  • Mining Edge Dynamic Networks (MEDEN)
  • Datasets and Results

SLIDE 13

A better solution: Basic

  • For a fixed time slice:
  • Prune by structure upper bounds (UB)
  • Use a fast and accurate heuristic TopDown
  • Filtering solution
  • Obtain a lower bound
  • Filter every interval based on UB
  • Verify unfiltered intervals
  • Filtering: O(t^2 |E|)
  • Can we filter multiple intervals at a time?
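
A sketch of per-interval filtering, assuming a lower bound is already known. The slide does not define its structure upper bounds, so this example substitutes a deliberately loose bound: the sum of all positive edge scores in the interval, which no connected subgraph can exceed. The function names and the toy data are hypothetical.

```python
def aggregate(weights, i, j):
    """Per-edge score summed over slices i..j (inclusive)."""
    return {e: sum(w[i:j + 1]) for e, w in weights.items()}

def positive_sum_bound(agg):
    """Loose upper bound (an assumption, not the talk's bound)."""
    return sum(w for w in agg.values() if w > 0)

def filter_intervals(weights, t, lower_bound):
    """Keep only intervals whose upper bound exceeds the known lower bound."""
    survivors = []
    for i in range(t):
        for j in range(i, t):
            if positive_sum_bound(aggregate(weights, i, j)) > lower_bound:
                survivors.append((i, j))
    return survivors

# Toy data (assumed): 3 edges, 4 time slices.
weights = {
    ("a", "b"): [1, 1, 1, -1],
    ("b", "c"): [-1, 1, 1, -1],
    ("c", "d"): [-1, -1, 1, 1],
}
# Suppose a heuristic already found a solution of score 3.
candidates = filter_intervals(weights, 4, lower_bound=3)
```

Only the surviving intervals need the expensive verification step; all others cannot beat the lower bound. The loop still touches every interval once, which matches the quadratic filtering cost noted on the slide.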

SLIDE 14

Grouping similar intervals

  • Combine overlapping intervals into groups.
  • High overlap: similar solutions
  • Filter interval groups as a whole, without considering individual members

[Figure: intervals along the time axis]

SLIDE 16

Groups of minimum overlap

  • Group intervals with common starting point
  • Ensure minimum overlap alpha (0.5 in the example)
  • A total of O(t log(t)) groups, when alpha < 1
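
One way to build such groups, sketched under an assumption about what "overlap" means: for intervals sharing a start point, the overlap of a group is taken as the length of its shortest member divided by the length of its longest member. The function names are hypothetical.

```python
import math

def groups_for_start(start, t, alpha):
    """Split all intervals [start, end] with end < t into groups of overlap >= alpha."""
    assert 0 < alpha < 1
    groups = []
    shortest, max_len = 1, t - start
    while shortest <= max_len:
        # Longest length that still keeps shortest / longest >= alpha.
        longest = min(max_len, math.floor(shortest / alpha))
        groups.append([(start, start + n - 1) for n in range(shortest, longest + 1)])
        shortest = longest + 1
    return groups

def all_groups(t, alpha):
    """Groups for every starting point 0..t-1."""
    return [g for s in range(t) for g in groups_for_start(s, t, alpha)]

first = groups_for_start(0, 8, 0.5)
groups = all_groups(8, 0.5)
```

With alpha = 0.5 the group lengths roughly double at each step (1-2, 3-6, 7-14, ...), so each starting point contributes a logarithmic number of groups, which is consistent with the O(t log(t)) total on the slide.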

SLIDE 17

MEDEN: filter and verify

SLIDE 22

Filter whole groups

  • We define a Dominating Graph (DG) for each group
  • Edge weights are maximum over all group members
  • Solution in DG dominates solution in any member
  • Compose a DG: O(log(t))
  • Time to build index: O(t |E|)
  • Time to compose all dominating graphs: O(t log^2(t) |E|)
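
A direct sketch of the Dominating Graph construction: each edge takes its maximum aggregated score over the intervals in the group. This version recomputes every aggregate from scratch and does not use the index that gives the O(log(t)) composition time on the slide. The names and the toy data are assumptions.

```python
def aggregate(weights, i, j):
    """Per-edge score summed over slices i..j (inclusive)."""
    return {e: sum(w[i:j + 1]) for e, w in weights.items()}

def dominating_graph(weights, group):
    """Edge-wise maximum of the aggregated graphs of all intervals in the group."""
    members = [aggregate(weights, i, j) for i, j in group]
    return {e: max(m[e] for m in members) for e in weights}

# Toy data (assumed): 3 edges, 4 time slices.
weights = {
    ("a", "b"): [1, 1, 1, -1],
    ("b", "c"): [-1, 1, 1, -1],
    ("c", "d"): [-1, -1, 1, 1],
}
group = [(0, 2), (0, 3)]
dg = dominating_graph(weights, group)
```

Because every DG edge is at least as heavy as the same edge in any member, any subgraph scores at least as much in the DG as in a member. An upper bound computed on the DG therefore bounds the whole group at once.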

SLIDE 23

Putting things together: MEDEN

  • Steps of MEDEN:
  • Obtain a LB to the solution
  • Filter groups
  • Filter members
  • Verify using TopDown
  • Running time
  • Filtering takes O(t log^2(t) |E|)
  • Verification takes O(|C||E|log|V|), where |C| is the number of unpruned intervals
  • |C| is small (linear in t in our experiments)
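
The four steps can be strung together in a small sketch. It is not the authors' implementation: `verify` is an exhaustive stand-in for TopDown, `upper_bound` is a loose positive-sum bound, the initial lower bound comes from verifying the full time span, and the grouping rule (shortest length divided by longest length at least alpha) is an assumption. All names are hypothetical.

```python
import math
from itertools import combinations

def aggregate(weights, i, j):
    """Per-edge score summed over slices i..j (inclusive)."""
    return {e: sum(w[i:j + 1]) for e, w in weights.items()}

def upper_bound(agg):
    """Loose bound (an assumption): sum of the positive edge scores."""
    return sum(w for w in agg.values() if w > 0)

def connected(edges):
    """Grow a component from the first edge; True if every edge gets absorbed."""
    nodes, rest = set(edges[0]), list(edges[1:])
    grew = True
    while grew:
        grew = False
        for e in rest[:]:
            if e[0] in nodes or e[1] in nodes:
                nodes.update(e)
                rest.remove(e)
                grew = True
    return not rest

def verify(agg):
    """Exhaustive stand-in for TopDown; exponential, tiny graphs only."""
    best = (float("-inf"), ())
    for k in range(1, len(agg) + 1):
        for subset in combinations(agg, k):
            score = sum(agg[e] for e in subset)
            if score > best[0] and connected(subset):
                best = (score, subset)
    return best

def interval_groups(t, alpha):
    """Common-start groups whose shortest/longest length ratio is >= alpha."""
    for start in range(t):
        shortest, max_len = 1, t - start
        while shortest <= max_len:
            longest = min(max_len, math.floor(shortest / alpha))
            yield [(start, start + n - 1) for n in range(shortest, longest + 1)]
            shortest = longest + 1

def meden_sketch(weights, t, alpha=0.5):
    best_interval = (0, t - 1)
    lower, best_edges = verify(aggregate(weights, 0, t - 1))  # step 1: obtain a LB
    verified = 0
    for group in interval_groups(t, alpha):
        members = {iv: aggregate(weights, *iv) for iv in group}
        dominating = {e: max(m[e] for m in members.values()) for e in weights}
        if upper_bound(dominating) <= lower:                  # step 2: filter groups
            continue
        for iv, agg in members.items():
            if upper_bound(agg) <= lower:                     # step 3: filter members
                continue
            verified += 1
            score, edges = verify(agg)                        # step 4: verify
            if score > lower:
                lower, best_interval, best_edges = score, iv, edges
    return lower, best_interval, best_edges, verified

# Toy data (assumed): 3 edges, 4 time slices.
weights = {
    ("a", "b"): [1, 1, 1, -1],
    ("b", "c"): [-1, 1, 1, -1],
    ("c", "d"): [-1, -1, 1, 1],
}
score, interval, edges, verified = meden_sketch(weights, 4)
```

The returned `verified` counter plays the role of |C|: it counts the intervals that survive both filters and reach the verification step.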

SLIDE 24

Outline

  • Motivation and problem definition
  • Complexity and previous work
  • Mining Edge Dynamic Networks (MEDEN)
  • Datasets and Results

SLIDE 25

Scalability with time

SLIDE 26

Graph size and overlap for grouping

SLIDE 27

Conclusion

  • We are the first to introduce the Heaviest Dynamic Subgraph (HDS) problem
  • Our approach MEDEN scales to real-world graphs and outperforms a basic approach by an order of magnitude due to interval grouping
  • Future directions
  • Extend to scalable top-k
  • Allow smoothly changing patterns

SLIDE 28

THANK YOU

Questions?

SLIDE 29

NP-hardness

Reduction from Thumbnail Rectilinear Steiner Tree

[Figure: reduction construction with numeric labels 1 and 4n; layout lost in extraction]

SLIDE 30

Upper Bounds

UBsop UBstr

SLIDE 31

TopDown heuristic

SLIDE 32

Results - Twitter

Twitter sub-network

  • Nodes: 2605
  • Edges: 14871
  • Slices: 204
  • Resolution: 1 day
  • Cutoff: cosine similarity 0.004

SLIDE 33

Can we improve the evaluation in time?

  • Still O(t^2) sub-intervals need to be considered
  • Infeasible on long time-spans
  • Combine overlapping intervals into groups.
  • Ensure:
  • High overlap of intervals in a group
  • Sub-quadratic number of groups
  • Sub-quadratic time to compute bounds for groups
  • Prune groups as a whole, without considering individual members

SLIDE 34

Some References

  • Hwang J-W, Lee Y-S, Cho S-B. Structure evolution of dynamic Bayesian network for traffic accident detection. IEEE Congress on Evolutionary Computation (CEC). 2011:1655-1671.
  • Borgwardt KM, Kriegel H-P, Wackersreuther P. Pattern Mining in Frequent Dynamic Subgraphs. Sixth International Conference on Data Mining (ICDM '06). 2006:818-822.
  • Johnson DS, Minkoff M, Phillips S. The Prize Collecting Steiner Tree Problem: Theory and Practice. SODA. 2000.
  • Kwon J, Murphy K. Modeling Freeway Traffic with Coupled HMMs. 2004.
  • Berlingerio M, Bonchi F. Mining graph evolution rules. Machine Learning and Knowledge Discovery in Databases. 2009:115-130.
  • Stoev SA, Michailidis G, Vaughan J. Global Modeling and Prediction of Computer Network Traffic. 2009:1-32.
  • Wackersreuther B, Wackersreuther P, Oswald A, Böhm C, Borgwardt KM. Frequent Subgraph Discovery in Dynamic Networks. Developmental Biology. 2010:155-162.
  • Macropol K, Singh AK. Content-based Modeling and Prediction of Information Dissemination. ASONAM. 2011.