
Reductions for Frequency-Based Data Mining Problems (Stefan Neumann)



  1. Reductions for Frequency-Based Data Mining Problems
     Stefan Neumann & Pauli Miettinen

  2. Maximal Frequent Patterns
     • A pattern is a subset of the data entities
       • itemset, subgraph, subsequence, …
     • A pattern is frequent if it appears sufficiently often in the data
     • A frequent pattern is maximal if it is not contained in any other frequent pattern
     • Studied since the 1990s
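The definitions above can be made concrete with a small brute-force sketch for the itemset case. The toy transactions, the absolute support threshold, and the function names `frequent_itemsets` and `maximal` are illustrative assumptions, not taken from the talk; the exhaustive search is only meant to mirror the definitions, not to be efficient.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every non-empty itemset with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            # Support = number of transactions that contain the whole candidate.
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                result[cand] = support
    return result

def maximal(patterns):
    """Keep only the patterns that are not strictly contained in another pattern."""
    return {p for p in patterns if not any(p < q for q in patterns)}

# Hypothetical toy data: four transactions over the items a, b, c, d.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("acd"), frozenset("ab")]
freq = frequent_itemsets(transactions, min_support=2)
max_freq = maximal(freq)
print(sorted("".join(sorted(p)) for p in max_freq))
```

Here seven itemsets are frequent, but only {a, b}, {a, c} and {a, d} are maximal, since every other frequent itemset is contained in one of them.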

  3. Computational Complexity
     • The computational complexity of maximal pattern mining is surprisingly unknown
     • Potentially exponentially many maximal patterns ⇒ mining can take exponential time
     • More fine-grained answers:
       • Time w.r.t. input and output (enumeration complexity, Johnson et al. 1988)
       • Time spent to count the number of maximal patterns (counting complexity, Valiant 1979)
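To see why the number of maximal patterns can be exponential in the input size, consider a standard textbook-style construction (an assumption of this sketch, not taken from the talk): n items and n transactions, where transaction i contains every item except item i. An itemset of size k then has support n − k, so with a support threshold of n/2 the maximal frequent itemsets are exactly the itemsets of size n/2, and there are C(n, n/2) of them. The function name `count_maximal_frequent` is hypothetical.

```python
from itertools import combinations
from math import comb

def count_maximal_frequent(n, min_support):
    """Brute-force count of maximal frequent itemsets in the 'all but one item' dataset."""
    items = range(n)
    # Transaction i contains every item except item i.
    transactions = [frozenset(items) - {i} for i in items]
    frequent = [
        frozenset(c)
        for size in range(1, n + 1)
        for c in combinations(items, size)
        if sum(1 for t in transactions if frozenset(c) <= t) >= min_support
    ]
    maximal = [p for p in frequent if not any(p < q for q in frequent)]
    return len(maximal)

n = 6
num_maximal = count_maximal_frequent(n, min_support=n // 2)
print(num_maximal, comb(n, n // 2))
```

The input has only n(n − 1) item occurrences, while the output grows like C(n, n/2), which is why running time is measured against input and output together.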

  4. Reductions
     • A can be reduced to B if we can solve A effectively with an algorithm that solves B
       • “B is at least as hard as A”
     • In this talk: maximality-preserving reductions between frequent pattern mining problems
       • “Maximal X mining is at least as hard as maximal Y mining”

  5. State of the Art
     [Figure: diagram of known reductions between the problems listed below; A → B = A can be reduced to B]
     • MaxSQS: sequences with no repetition
     • MaxFS(DAG): directed acyclic graphs
     • MaxFS(BTW₃): undir. graphs with treewidth ≤ 3
     • MaxFS(BDG₃): undir. graphs with degree ≤ 3
     • MaxFS(T): undir. trees
     • MaxFS(PLN): planar undir. graphs
     • MaxFS(DirG): directed graphs
     • MaxFIS: itemsets
     • MaxFS(G): uniquely labelled undirected graphs

  6. Maximality-Preserving Reductions
     [Figure: reduction diagram over MaxSQS, MaxFS(DAG), MaxFS(BTW₃), MaxFS(BDG₃), MaxFS(T), MaxFS(PLN), MaxFS(DirG), MaxFIS, MaxFS(G); A → B = A can be reduced to B]
     • These reductions preserve enumeration and counting complexity

  7. Impressed?
     • Why no more reductions?
     • Example: from MaxFS(G) to MaxFIS
       • Each edge {u, v} has a unique label (l(u), l(v))
       • Treat the edges as items and the graphs as transactions
       • Mine maximal frequent itemsets
     • This doesn’t (quite) work!

  8. What’s Wrong?

     tid  graph     A–B  A–D  B–C  B–D  C–D
     1    A–B–C–D    1    0    1    0    1
     2    A–D–C–B    0    1    1    0    1
     3    A–B–D–C    1    0    0    1    1

     Frequent itemsets (minfreq 2/3), with supports:
     • {C–D} (3)
     • {A–B} (2)
     • {B–C} (2)
     • {B–C, C–D} (2): the path B–C–D
     • {A–B, C–D} (2): not connected!
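The example on this slide can be replayed in a few lines. The sketch below encodes the three path graphs as edge-set transactions, mines all frequent itemsets by brute force, and flags those whose edges do not form a connected graph. The names `graphs`, `is_connected` and `frequent` are illustrative, and the exhaustive search merely stands in for a real itemset miner.

```python
from itertools import combinations

# The three graphs from the slide, each given as a set of edges (one transaction each).
graphs = {
    1: {("A", "B"), ("B", "C"), ("C", "D")},
    2: {("A", "D"), ("B", "C"), ("C", "D")},
    3: {("A", "B"), ("B", "D"), ("C", "D")},
}

def is_connected(edges):
    """Check by graph search whether the given non-empty edge set forms one connected graph."""
    edges = list(edges)
    vertices = {v for e in edges for v in e}
    seen = {edges[0][0]}
    frontier = [edges[0][0]]
    while frontier:
        u = frontier.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y not in seen:
                    seen.add(y)
                    frontier.append(y)
    return seen == vertices

min_support = 2  # minfreq 2/3 with three transactions
all_edges = sorted(set().union(*graphs.values()))
frequent = {}
for size in range(1, len(all_edges) + 1):
    for cand in combinations(all_edges, size):
        support = sum(1 for g in graphs.values() if set(cand) <= g)
        if support >= min_support:
            frequent[frozenset(cand)] = support

disconnected = [set(p) for p in frequent if not is_connected(p)]
print(len(frequent), disconnected)
```

Five itemsets are frequent, and the only disconnected one is {A–B, C–D}: it is a perfectly valid itemset, yet it does not correspond to a connected subgraph pattern.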

  9. Feasible Patterns
     • To be able to encode connectedness, we need to constrain the feasible patterns
     • We can adjust our reductions to work with these constraints. E.g.:
       • maximal graph patterns must map to maximal feasible itemsets, and
       • it must be easy to compute the graph patterns from the maximal feasible itemsets
     • These constraints are transitive
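As a rough illustration of the idea, the sketch below restricts the search to feasible itemsets, taking "the edges form a connected graph" as the feasibility predicate, and then keeps the maximal ones. The data are the three edge-set transactions from the previous slide; `maximal_feasible_itemsets` and `is_connected` are hypothetical helper names, and the brute-force search is an assumption of this sketch rather than the method from the talk.

```python
from itertools import combinations

def is_connected(edges):
    """Feasibility test assumed here: the chosen edges form a single connected graph."""
    components = []  # disjoint vertex sets built up edge by edge
    for u, v in edges:
        touching = [c for c in components if u in c or v in c]
        merged = {u, v}.union(*touching)
        components = [c for c in components if c not in touching] + [merged]
    return len(components) == 1

def maximal_feasible_itemsets(transactions, min_support, feasible):
    """Brute force: frequent AND feasible itemsets, maximal among themselves."""
    items = sorted(set().union(*transactions))
    candidates = [
        frozenset(c)
        for size in range(1, len(items) + 1)
        for c in combinations(items, size)
    ]
    good = [
        c for c in candidates
        if feasible(c) and sum(1 for t in transactions if c <= t) >= min_support
    ]
    return [p for p in good if not any(p < q for q in good)]

transactions = [
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "D"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("B", "D"), ("C", "D")},
]
result = maximal_feasible_itemsets(transactions, min_support=2, feasible=is_connected)
print([sorted(p) for p in result])
```

With the constraint in place, the maximal feasible itemsets are {A–B} and {B–C, C–D}, which are exactly the maximal frequent connected subgraphs (the edge A–B and the path B–C–D). Reading the graph pattern back from an itemset is trivial here, because each item is an edge.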

  10. Maximality-Preserving Reductions for Feasible Patterns
      [Figure: reduction diagram over MaxSQS, MaxFS(DAG), MaxFS(BTW₃), MaxFS(BDG₃), MaxFS(T), MaxFS(PLN), MaxFS(DirG), MaxFIS, MaxFS(G); A → B = A can be reduced to B]
      • The complexity collapses under these reductions!

  11. Maximality-Preserving Reductions for Feasible Patterns
      [Figure: rearranged reduction diagram over MaxFS(BTW₃), MaxSQS, MaxFS(T), MaxFS(DAG), MaxFS(BDG₃), MaxFS(DirG), MaxFS(PLN), MaxFIS, MaxFS(G); A → B = A can be reduced to B]
      • The complexity collapses under these reductions!

  12. Summary
      • For all feasible pattern versions of the problems:
        • Enumerating all feasible patterns is #P-hard
        • Given a set of feasible patterns, deciding whether there are any more feasible patterns is NP-hard
          • Even if only two patterns are given
        • For any fixed minfreq threshold τ, the enumeration can be done in polynomial time

  13. Conclusions
      • Most maximal pattern mining problems are essentially equally hard
      • Methods for one type of problem can be used to solve other types as well
      • Feasible patterns usually admit constraints that are amenable to standard level-wise algorithms
      • Notable exceptions: MaxFS on general graphs and sequences with repetitions
        • Subgraph isomorphism is NP-hard

      Thank You!
