
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases - PowerPoint PPT Presentation



  1. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
     Authors: Doug Burdick, Manuel Calimlim, Johannes Gehrke
     Presented by: Benjamin Chu
     CMPUT 695 Fall 2004

     Outline
     - Introduction
     - Related Work
     - Algorithmic Components
     - Database Representation
     - Experimental Results
     - Comparison - DepthProject
     - Conclusion

     Introduction - Problem
     - Association Rule Mining, Two Phases
       1) Find Frequent Itemsets (FI)
       2) Generate "interesting" patterns
     - Most time in association rule mining is spent in finding the frequent itemsets
     - When itemsets are long (more than 15-20 items), finding the entire FI can be infeasible.

     Introduction - Solutions
     - Alternatives to "FI"
       - Frequent Closed Itemsets (FCI): Itemset X is closed if no proper superset of X has the same support.
       - Maximal Frequent Itemsets (MFI): Itemset X is maximally frequent if no proper superset of X is frequent.
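The FI, FCI, and MFI definitions from the introduction slides can be checked by brute force on a tiny database. The sketch below is only an illustration of the definitions, not MAFIA itself; the function names and the four toy transactions are assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute force: test every non-empty combination of items (exponential)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


def closed_itemsets(fi):
    """Frequent itemsets with no proper superset of equal support."""
    return {x for x, s in fi.items()
            if not any(x < y and fi[y] == s for y in fi)}


def maximal_itemsets(fi):
    """Frequent itemsets with no frequent proper superset."""
    return {x for x in fi if not any(x < y for y in fi)}


# Hypothetical toy database: four transactions over items 1..5.
toy = [{1, 2, 4}, {1, 2, 5}, {1, 2, 3, 4}, {1, 4}]
fi = frequent_itemsets(toy, min_support=2)
fci = closed_itemsets(fi)
mfi = maximal_itemsets(fi)
```

On this data the MFI is a subset of the FCI, which in turn is a subset of the FI. That containment is why mining only the MFI can stay feasible when the full FI is too large.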

  2. Introduction - MAFIA
     - Integrates new and old ideas into a practical algorithm for solving the MFI problem
     - Problem of mining frequent itemsets viewed as finding a cut through the itemset lattice
       - All itemsets above the cut are frequent
       - All itemsets below the cut are infrequent

     Introduction - Item Subset Lattice / Tree
     - Head(N): itemset identifying node N
     - Tail(N): set of all possible extensions of the node N
     - HUT: Head U(nion) Tail

     Related/Prior Work on MFI
     - Apriori
     - MaxMiner
     - DepthProject
     - MaxClique
     - MaxEclat
     - Pincer-Search
     - VIPER

  3. Algorithmic Components
     - MAFIA: depth-first traversal of the item subset lattice with search space pruning:
       - PEP
       - FHUT
       - HUTMFI
       - Dynamic reordering

     Search Space Pruning - PEP
     - Parent Equivalence Pruning
     - Given the current node in the itemset tree with head x and tail element y, t(x) ⊆ t(y) means any transaction containing x also contains y
     - Since we only want maximal frequent itemsets, we can move y to the head if t(x) ⊆ t(y) holds

     Search Space Pruning - FHUT
     - Frequent Head Union Tail
     - For a node n, the largest possible frequent itemset contained in the subtree rooted at n is n's HUT (Head Union Tail).
     - If n's HUT is found to be frequent, do not explore any subsets of the HUT.
     - The subtree rooted at n can be pruned away.
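The PEP and FHUT checks described on these slides can be sketched as two small helpers. This is a minimal illustration that assumes transaction-ID sets stored as plain Python sets (MAFIA itself uses bitmaps); the names `tidset`, `parent_equivalence_prune`, and `hut_is_frequent` are hypothetical.

```python
def tidset(itemset, transactions):
    """t(X): indices of the transactions that contain every item of X."""
    return {tid for tid, t in enumerate(transactions) if set(itemset) <= t}


def parent_equivalence_prune(head, tail, transactions):
    """PEP: move each tail item y with t(head) a subset of t(y) into the head.

    Moving y does not change t(head), so the same t(head) is reused
    for every remaining tail item.
    """
    t_head = tidset(head, transactions)
    new_head, new_tail = set(head), []
    for y in tail:
        if t_head <= tidset({y}, transactions):
            new_head.add(y)       # y occurs wherever the head occurs
        else:
            new_tail.append(y)    # y still has to be explored as an extension
    return frozenset(new_head), new_tail


def hut_is_frequent(head, tail, transactions, min_support):
    """FHUT test: if head ∪ tail is frequent, the whole subtree can be pruned."""
    hut = set(head) | set(tail)
    return len(tidset(hut, transactions)) >= min_support


# Hypothetical toy database, transaction indices 0..3.
toy = [{1, 2, 4}, {1, 2, 5}, {1, 2, 3, 4}, {1, 4}]
head, tail = parent_equivalence_prune({2}, [1, 4, 5], toy)
```

Here item 1 is absorbed into the head because every transaction containing 2 also contains 1, so the node never needs to branch on it.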

  4. Search Space Pruning - HUTMFI
     - Head Union Tail Maximal Frequent Itemset
     - If a superset of the HUT for the current node is already in the MFI, then the HUT is frequent.
     - The subtree rooted at this node can be pruned away.

     Search Space Pruning - Dynamic Reordering
     - The tail of a node is trimmed so that it contains only frequent extensions of the current node
     - Tail elements are ordered by increasing support (keeps search space as small as possible).

     MAFIA Algorithm
     (from: http://himalaya-tools.sourceforge.net/mafiappt_files/800x600/Slide26.html)
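The HUTMFI check and dynamic reordering can be sketched in a few lines. This is a hedged illustration under simple assumptions: the MFI found so far is a list of frozensets, support is counted by scanning the transactions, and the helper names are hypothetical.

```python
def hut_in_mfi(head, tail, mfi):
    """HUTMFI: True if head ∪ tail is contained in an already found maximal set."""
    hut = frozenset(head) | frozenset(tail)
    return any(hut <= m for m in mfi)


def reorder_tail(head, tail, transactions, min_support):
    """Drop infrequent extensions, then sort the rest by increasing support."""
    scored = []
    for y in tail:
        extended = set(head) | {y}
        s = sum(1 for t in transactions if extended <= t)
        if s >= min_support:          # keep only frequent extensions
            scored.append((s, y))
    scored.sort()                     # increasing support, ties broken by item
    return [y for _, y in scored]


# Hypothetical toy database and a previously found maximal itemset.
toy = [{1, 2, 4}, {1, 2, 5}, {1, 2, 3, 4}, {1, 4}]
known_mfi = [frozenset({1, 2, 4})]
root_tail = reorder_tail(set(), [1, 2, 3, 4, 5], toy, min_support=2)
```

Trimming the tail first matters for the other pruning steps too: a shorter tail gives a smaller HUT, which makes the FHUT and HUTMFI tests more likely to succeed.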

  5. Database Representation
     - Vertical Bitmap
       - Each item is allocated a set of bits, one bit for each transaction in the database
       - If item X appears in transaction j, then the j-th bit of item X is set to one

       TID   Items
       1     1, 2, 4
       2     1, 2, 5
       3     1, 2, 3, 4
       4     1, 4
       …

     Database Representation
     - Vertical bitmap representation allows for optimized support counting and efficient itemset generation
     [Figure: Itemset Generation (X & Y = XY) and Support Counting (lookup pregenerated 'onecount' for 219)]

     Database Compression
     - Problem: Sparse bitmaps at low support levels
     - Solution: Remove bits that don't matter
       - To count support of the subtree rooted at a node N, only need transactions containing itemset X at node N
     - Product: projected bit vector
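A vertical bitmap, AND-based itemset generation, byte-wise 'onecount' lookup, and projection can be sketched with Python integers standing in for bit vectors. This is an assumption-laden illustration rather than MAFIA's actual implementation: bit j represents the transaction at index j, and all function names are hypothetical.

```python
# Pregenerated lookup table: number of one-bits in every possible byte value.
ONECOUNT = [bin(b).count("1") for b in range(256)]


def build_bitmaps(transactions):
    """One integer per item; bit j is set if the item occurs in transaction j."""
    bitmaps = {}
    for j, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << j)
    return bitmaps


def count_ones(bitmap):
    """Support counting: sum table lookups over the bitmap, one byte at a time."""
    total = 0
    while bitmap:
        total += ONECOUNT[bitmap & 0xFF]
        bitmap >>= 8
    return total


def project(bitmap, head_bitmap):
    """Keep only the bits at positions where head_bitmap is one, packed together."""
    out, pos, j = 0, 0, 0
    while head_bitmap >> j:
        if (head_bitmap >> j) & 1:
            out |= ((bitmap >> j) & 1) << pos
            pos += 1
        j += 1
    return out


# Hypothetical toy database; index 0 corresponds to TID 1.
toy = [{1, 2, 4}, {1, 2, 5}, {1, 2, 3, 4}, {1, 4}]
bm = build_bitmaps(toy)
xy = bm[2] & bm[4]                 # itemset generation: bitmap of {2, 4}
support_xy = count_ones(xy)        # support of {2, 4}
projected = project(bm[4], bm[2])  # item 4 restricted to transactions containing 2
```

The projected vector is shorter than the original but has the same number of one-bits as the AND, so support counts in the subtree below item 2 are unchanged while fewer bits are scanned.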

  6. Experimental Results

     Experimental Results - Compression

     Comparison - DepthProject
     - DepthProject: "state-of-the-art" maximal pattern algorithm
     - Differences:
       - Uses horizontal database layout
       - Alternate pruning: bucketing

  7. Comparison - DepthProject
     - Influence of PEP
     [Figure: Reduction Factor of Nodes Considered Due to PEP Pruning]

     Comparison - DepthProject: Time Comparison on Chess.data
     - MAFIA: Only a factor of two better than DepthProject on this dataset

     Comparison - DepthProject: Scaleup of Chess.data
     - MAFIA: Scales very well with the number of transactions

  8. Conclusions
     - Increased efficiency of MAFIA over DepthProject due to:
       - Fast itemset generation and support counting
       - Parent-equivalence pruning

     Conclusions - MAFIA flexibility
     - MAFIA can also be used to find all FI
     - To find FI:
       - Suppress all pruning tools (PEP, FHUT, HUTMFI).
       - Add all frequent nodes in the itemset lattice to FI without superset checking

     Conclusions - MAFIA flexibility
     - MAFIA can be used to mine FCI
     - To find FCI:
       - Only use PEP for pruning
       - Still check for supersets in previously discovered FCI
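The difference between the FI and MFI modes can be sketched with a plain depth-first search over the itemset tree. This is a simplified illustration with no PEP, FHUT, or HUTMFI pruning and with the FCI mode left out; the function `mine` and its `maximal` flag are hypothetical names, not MAFIA's interface.

```python
def mine(transactions, min_support, maximal=True):
    """Depth-first search over the itemset tree in a fixed item order.

    maximal=False: add every frequent node, with no superset checking (FI).
    maximal=True:  add a node only if it has no frequent extension and no
                   superset among the itemsets found so far (MFI).
    """
    items = sorted(set().union(*transactions))
    found = []

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def dfs(head, tail):
        extended = False
        for i, y in enumerate(tail):
            new_head = head | {y}
            if sup(new_head) >= min_support:
                extended = True
                dfs(new_head, tail[i + 1:])   # later items form the new tail
        if not head:
            return                             # the root is not an itemset
        if not maximal:
            found.append(head)
        elif not extended and not any(head <= m for m in found):
            found.append(head)

    dfs(frozenset(), items)
    return found


# Hypothetical toy database.
toy = [{1, 2, 4}, {1, 2, 5}, {1, 2, 3, 4}, {1, 4}]
all_fi = mine(toy, min_support=2, maximal=False)
only_mfi = mine(toy, min_support=2, maximal=True)
```

In the maximal mode the superset check is what rejects leaves such as {1, 4}, which have no frequent extension in their own subtree but are covered by a maximal itemset found earlier in the traversal.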

  9. Conclusion - Followup
     - After the original paper, a new version of MAFIA uses the progressive focusing technique introduced in GenMax [Gouda, Zaki]: LMFI update
     - MAFIA and GenMax are both useful

     Conclusions
     - MAFIA shines when:
       - Data is dense and contains long itemsets
       - Database is large
     - MAFIA is not so good when:
       - Minimum support is high (short itemsets)

     Thank you! Questions?
