H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases - PowerPoint PPT Presentation

H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases, by J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang. Paper's goals: introduce a new data structure, H-struct, and a new mining algorithm, H-mine.


  1. H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases
     J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang
     Int. Conf. on Data Mining (ICDM'01), San Jose, CA
     Presented by Leonid Mocofan

     Paper’s goals
     ■ Introduce a new data structure: H-struct
     ■ Introduce a new mining algorithm: H-mine
     ■ Introduce a new data mining methodology: space-preserving mining

     Why a new algorithm?
     ■ Two current algorithm categories:
       – Candidate generation-and-test approach:
         • E.g., Apriori algorithm
       – Pattern growth methods:
         • E.g., FP-growth, TreeProjection
     ■ They have performance bottlenecks:
       – Huge space required for mining
       – Real databases contain all the cases
       – Large applications need more scalability

     H-mine characteristics
     ■ It has limited and precisely predictable space overhead.
     ■ It can scale up to very large databases by using database partitioning.
     ■ When the data sets are dense, it can switch to FP-trees to continue the mining process.

  2. Frequent pattern mining introduction
     ■ set of items: I = {x_1, …, x_n}
     ■ itemset X: subset of items (X ⊆ I)
     ■ transaction: T = (tid, X)
     ■ transaction database: TDB
     ■ support(X): number of transactions in TDB containing X

     Frequent pattern mining definitions
     Frequent pattern: For a transaction database TDB and a support threshold min_sup, X is a frequent pattern if and only if sup(X) ≥ min_sup.
     Frequent pattern mining: Finding the complete set of frequent patterns in a given transaction database with respect to a given support threshold.

     H-mine algorithm
     1. H-mine(Mem): a memory-based, efficient pattern-growth algorithm
     2. H-mine: based on H-mine(Mem), handles large databases by first partitioning the database
     3. For dense data sets, H-mine is integrated with FP-growth dynamically

     H-mine(Mem) – Example
     Minimum support threshold is 2.

     Trans ID   Items          Frequent-item projection
     100        c,d,e,f,g,i    c,d,e,g
     200        a,c,d,e,m      a,c,d,e
     300        a,b,d,e,g,k    a,d,e,g
     400        a,c,d,h        a,c,d

     F-list: a-c-d-e-g

     Header table H
     Item    a   c   d   e   g
     Count   3   3   4   3   2

     H-struct (frequent projections)
     100   c d e g
     200   a c d e
     300   a d e g
     400   a c d
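The definitions and the example table above can be checked with a short script. This is a minimal sketch rather than code from the presentation: the names `support` and `build_header_and_projections` are hypothetical, and the alphabetical ordering of the F-list is an assumption taken from the slide's F-list a-c-d-e-g.

```python
from collections import Counter

# Example transaction database from the slides (transaction ID -> items).
TDB = {
    100: ["c", "d", "e", "f", "g", "i"],
    200: ["a", "c", "d", "e", "m"],
    300: ["a", "b", "d", "e", "g", "k"],
    400: ["a", "c", "d", "h"],
}
MIN_SUP = 2


def support(itemset, tdb):
    """Number of transactions in tdb that contain every item of itemset."""
    wanted = set(itemset)
    return sum(1 for items in tdb.values() if wanted <= set(items))


def build_header_and_projections(tdb, min_sup):
    """Return (f_list, header_counts, frequent_item_projections)."""
    counts = Counter(item for items in tdb.values() for item in set(items))
    # Assumption: the F-list is kept in alphabetical order, as on the slide.
    f_list = sorted(item for item, c in counts.items() if c >= min_sup)
    header = {item: counts[item] for item in f_list}
    # Keep only frequent items in each transaction, in F-list order.
    projections = {
        tid: [item for item in f_list if item in items]
        for tid, items in tdb.items()
    }
    return f_list, header, projections


f_list, header, projections = build_header_and_projections(TDB, MIN_SUP)
print("F-list:", "-".join(f_list))
print("Header table H:", header)
for tid, proj in projections.items():
    print(tid, " ".join(proj))
```

An itemset X is then frequent exactly when `support(X, TDB) >= MIN_SUP`, which mirrors the definition sup(X) ≥ min_sup.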

  3. H-mine(Mem) – Example (continued)
     The four slides in this group show the same header table H and the same frequent projections, with different hyper-links drawn between them.

     Header table H
     Item    a   c   d   e   g
     Count   3   3   4   3   2

     Frequent projections
     100   c d e g
     200   a c d e
     300   a d e g
     400   a c d

     Header table H_a and ac-queue
     Header table H_a
     Item    c   d   e   g
     Count   2   3   2   1

     Header table H_ac
     Header table H_ac
     Item    d   e
     Count   2   1

     Header table H_a and ad-queue
     Header table H_a
     Item    c   d   e   g
     Count   2   3   2   1

     Adjusted hyper-links after mining a-projected database
     Only header table H and the frequent projections are shown.
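The header tables in this example can be reproduced with a simplified pattern-growth sketch. It is not the authors' H-mine(Mem) implementation: instead of threading hyper-links through the H-struct and re-linking them after each projected database is mined, it builds every queue directly as a list of (transaction ID, index) pointers into the unchanged projections. The names `build_header` and `h_mine_mem` are hypothetical. What it shares with the idea on the slides is that projected databases are never copied, only pointed into.

```python
from collections import defaultdict

# Frequent-item projections from the slide example, ordered by the F-list a-c-d-e-g.
PROJECTIONS = {
    100: ["c", "d", "e", "g"],
    200: ["a", "c", "d", "e"],
    300: ["a", "d", "e", "g"],
    400: ["a", "c", "d"],
}


def build_header(pointers):
    """Build a header table for a projected database.

    pointers is a list of (tid, start) pairs: the suffix of transaction tid
    that begins at index start belongs to the projected database.
    Returns {item: [(tid, next_start), ...]}. Each list plays the role of the
    item's queue, and its length is the item's local support count.
    """
    header = defaultdict(list)
    for tid, start in pointers:
        row = PROJECTIONS[tid]
        for pos in range(start, len(row)):
            header[row[pos]].append((tid, pos + 1))
    return header


def h_mine_mem(pointers, prefix, min_sup, out):
    """Recursively grow patterns from prefix using only (tid, index) pointers."""
    header = build_header(pointers)
    for item in sorted(header):
        queue = header[item]
        if len(queue) >= min_sup:
            pattern = prefix + (item,)
            out[pattern] = len(queue)
            # The queue of `item` is exactly the pattern-projected database.
            h_mine_mem(queue, pattern, min_sup, out)


patterns = {}
h_mine_mem([(tid, 0) for tid in PROJECTIONS], (), 2, patterns)

# Header table H_a: counts of items that follow a in transactions containing a.
h_a = build_header([(200, 1), (300, 1), (400, 1)])
print({item: len(queue) for item, queue in sorted(h_a.items())})
for pattern, sup in sorted(patterns.items()):
    print("".join(pattern), sup)
```

With a minimum support of 2, the a-projected header comes out as c:2, d:3, e:2, g:1, and the ac-projected header as d:2, e:1, matching the tables above.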

  4. H-mine: Mining large databases
     ■ TDB: transaction database (size n)
     ■ Minimum support threshold min_sup
     ■ Find L, the set of frequent items
     ■ TDB is partitioned into k parts (TDB_i, 1 ≤ i ≤ k)

     H-mine: Mining large databases (continued)
     ■ Apply H-mine(Mem) to TDB_i with minimum support threshold ⌊min_sup ∗ n_i / n⌋, where n_i is the size of TDB_i
     ■ Combine F_i, the set of locally frequent patterns in TDB_i, to get the globally frequent patterns.

     H-mine – Example
     ■ TDB split into P1, P2, P3, P4
     ■ Minimum support threshold 100

     Local freq. pat.   Partitions        Accumulated sup. cnt
     ab                 P1, P2, P3, P4    280
     ac                 P1, P2, P3, P4    320
     ad                 P1, P2, P3, P4    260
     abc                P1, P3, P4        120
     abcd               P1, P4            40
     …                  …                 …

     ■ Frequent patterns: ab, ac, ad, abc

     Performance
     ■ H-mine has better runtime performance on both sparse and dense data than FP-growth and Apriori
     ■ H-mine has better space usage on both sparse and dense data than FP-growth and Apriori
     ■ H-mine performs well with very large databases too
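The partition-and-combine steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: `mine_local` is a brute-force stand-in for H-mine(Mem), rounding the scaled local threshold down is an assumption of the sketch, the initial pass that finds the frequent items L is skipped, and the combine step simply recounts every locally frequent pattern over the whole database. The data is the four-transaction example from the earlier slides, split into two partitions.

```python
from itertools import combinations
from math import floor

# The four example transactions, stored as sets of items.
TDB = [
    {"c", "d", "e", "f", "g", "i"},
    {"a", "c", "d", "e", "m"},
    {"a", "b", "d", "e", "g", "k"},
    {"a", "c", "d", "h"},
]


def mine_local(partition, local_min_sup):
    """Brute-force stand-in for H-mine(Mem): count every itemset in the partition."""
    counts = {}
    for items in partition:
        ordered = sorted(items)
        for size in range(1, len(ordered) + 1):
            for combo in combinations(ordered, size):
                counts[combo] = counts.get(combo, 0) + 1
    return {p for p, c in counts.items() if c >= local_min_sup}


def mine_partitioned(tdb, min_sup, k):
    """Mine each of k partitions locally, then verify candidates globally."""
    n = len(tdb)
    size = -(-n // k)  # ceiling division: transactions per partition
    partitions = [tdb[i:i + size] for i in range(0, n, size)]
    candidates = set()
    for part in partitions:
        # Local threshold scaled to the partition size (rounded down: an assumption).
        local_min_sup = floor(min_sup * len(part) / n)
        candidates |= mine_local(part, local_min_sup)
    # Second scan: count each locally frequent pattern over the whole database.
    result = {}
    for pattern in candidates:
        sup = sum(1 for items in tdb if set(pattern) <= items)
        if sup >= min_sup:
            result[pattern] = sup
    return result


frequent = mine_partitioned(TDB, min_sup=2, k=2)
for pattern, sup in sorted(frequent.items()):
    print("".join(pattern), sup)
```

Scaling the threshold by n_i / n means a globally frequent pattern must be locally frequent in at least one partition, so the union of the local results contains every global answer and the second scan only has to discard false positives.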

  5. Conclusions
     H-mine:
     ■ has high performance
     ■ is scalable in all kinds of data
     ■ has very small space overhead
     ■ can dynamically adapt to input data
     ■ introduces structure- and space-preserving mining methodology

     Bibliography
     ■ “H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases”, J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, Int. Conf. on Data Mining (ICDM'01), San Jose, CA, Nov. 2001.
     ■ “Mining Frequent Patterns without Candidate Generation”, J. Han, J. Pei, and Y. Yin, ACM-SIGMOD 2000, Dallas, TX, May 2000.
     ■ “Data Mining: Concepts and Techniques”, Jiawei Han and Micheline Kamber, The Morgan Kaufmann Pub., 2001.
