
Mining Free Itemsets under Constraints - PowerPoint PPT Presentation



  1. Mining Free Itemsets under Constraints
     By Jean-Francois Boulicaut and Baptiste Jeudy
     International Database Engineering and Application Symposium

     Content
     • Introduction
     • Constrained itemset mining
     • Apriori revisit
     • Anti-monotone constraints
     • Monotone constraints
     • Generic algorithm
     • Frequent closed itemset mining
     • CLOSE algorithm
     • Incorporating constraints into Apriori
     • Conclusion

     Introduction
     • Frequent itemset mining
       - A set of items is referred to as an itemset
       - X is an item (or itemset), $\mathrm{Support}(X) = \#X / n$
       - Support is bounded by a threshold $r$
       - A frequent itemset is an itemset with a support larger than the minimum support
       - Given a database, find all the frequent itemsets

     Introduction
     • Problems with frequent itemset mining algorithms
       - The computation may be intractable for a user-given frequency threshold: the number of frequent itemsets may explode
       - Lack of focus leads to a huge output of frequent itemsets
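
The support definition above can be illustrated with a minimal Python sketch. It assumes that #X is the number of transactions containing every item of X and that n is the total number of transactions, which is the usual reading but is not spelled out on the slide. The function names, the six-transaction list, and the threshold value are illustrative choices, and "frequent" is tested as support >= r.

```python
def support_count(itemset, transactions):
    """#X: number of transactions that contain every item of X (assumed reading)."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= set(t))


def support(itemset, transactions):
    """Support(X) = #X / n, with n the number of transactions."""
    return support_count(itemset, transactions) / len(transactions)


# Illustrative data: each string is one transaction, each letter one item.
transactions = ["ABCD", "AC", "AC", "ABCD", "BC", "ABC"]
r = 0.6  # illustrative minimum support threshold

for x in ["A", "AB", "AC"]:
    s = support(x, transactions)
    print(x, round(s, 2), "frequent" if s >= r else "infrequent")

# The search space doubles with every extra item: n items give 2**n - 1
# non-empty itemsets, which is why the output may explode.
print(2 ** 4 - 1, 2 ** 30 - 1)
```

With only 4 items there are 15 candidate itemsets; with 30 items there are over a billion, which is the explosion the slide refers to.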

  2. Introduction
     • Two approaches to tackle these problems
       - Constraint-based extraction of the frequent itemsets: only a subset of the collection of frequent itemsets is interesting.
       - Condensed representation of frequent itemsets: extract a subset of the frequent patterns and regenerate the whole collection when necessary.

     Introduction
     • Constraint-based extraction of frequent itemsets
       - Syntactic constraints, e.g. an item must not appear in the itemsets
       - Constraints related to objective measures of interestingness, e.g. the itemsets must be frequent
       - Benefits: decrease the size of the output; improve user guidance
     • Push constraint checking into algorithms
       - Anti-monotone constraints
       - Monotone constraints

     Introduction
     • Condensed representation of frequent itemsets
       - Extract a particular subset of the frequent itemset collection
       - The condensed subset is much smaller than the original collection
       - Can be extracted efficiently
       - The whole collection of frequent itemsets can be regenerated

     Introduction
     • Main idea of the paper
       - Combine the above two approaches into one algorithm
       - This algorithm is based on the structure of Apriori

  3. Summary of paper
     • Definition of constraints
       - $T$: transactional database
       - $2^{Items}$: set of all itemsets
       - $C$: constraint
       - $S \in 2^{Items}$: itemset
       - $I$: subset of $2^{Items}$
       - $S$ satisfies $C$ in $T$ iff $C(S, T) = \mathit{true}$
       - $SAT_C(I) = \{S \in I,\ S \text{ satisfies } C\}$
       - $SAT_C$ denotes $SAT_C(2^{Items})$

     Summary of paper
     • Constrained itemset mining
       - $T$: transactional database
       - $C$: constraint
       - Computation of the collection of itemsets that satisfy $C$ together with their frequencies: $R = \{(S, F(S)),\ S \in SAT_C\}$
       - $C_{freq}(S) \equiv F(S) \geq r$: an itemset must have a frequency of at least $r$.
     • Use Apriori for constrained itemset mining

     Summary of paper
     • Example database

       TID   Items
       1     ABCD
       2     AC
       3     AC
       4     ABCD
       5     BC
       6     ABC

       Itemset   Support      Frequency
       A         1,2,3,4,6    0.83
       B         1,4,5,6      0.67
       AB        1,4,6        0.5
       AC        1,2,3,4,6    0.83
       CD        1,4          0.33
       ACD       1,4          0.33

     • $SAT_{C_{freq}} = \{A, B, C, AC, BC\}$ where $r = 0.6$
     • If $C_{size}(S) \equiv |S| \leq 2$ and $C_{miss}(S) \equiv B \notin S$, then $SAT_{C_{size} \wedge C_{miss}} = \{A, C, D, AC, AD, CD\}$ and $SAT_{C_{size} \wedge C_{miss} \wedge C_{freq}} = \{A, C, AC\}$
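
The $SAT_C$ sets in the example can be checked with a small brute-force sketch in Python. It encodes the six-transaction database from the slide and treats each constraint as a predicate on an itemset. The helper names (`frequency`, `all_itemsets`, `sat`) are illustrative, not names used by the authors, and the empty itemset is left out to match the sets listed on the slide.

```python
from itertools import combinations

DB = ["ABCD", "AC", "AC", "ABCD", "BC", "ABC"]  # TIDs 1..6 from the slide
ITEMS = "ABCD"


def frequency(itemset, db=DB):
    """F(S): fraction of transactions containing every item of S."""
    s = set(itemset)
    return sum(1 for t in db if s <= set(t)) / len(db)


def all_itemsets(items):
    """Every non-empty itemset over `items`, written as a sorted string."""
    return ["".join(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)]


def sat(constraint, collection):
    """SAT_C(I): the itemsets of the collection I that satisfy constraint C."""
    return {s for s in collection if constraint(s)}


def c_freq(s):
    return frequency(s) >= 0.6      # r = 0.6 as on the slide


def c_size(s):
    return len(s) <= 2


def c_miss(s):
    return "B" not in s


universe = all_itemsets(ITEMS)
print(sorted(sat(c_freq, universe)))
print(sorted(sat(lambda s: c_size(s) and c_miss(s), universe)))
print(sorted(sat(lambda s: c_size(s) and c_miss(s) and c_freq(s), universe)))
```

Enumerating all of $2^{Items}$ is only workable for a toy item set; avoiding that enumeration is the point of pushing constraints into a levelwise algorithm.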

  4. Summary of paper - Apriori
     • Apriori algorithm
       1. $C^g_1 := Items$; $L_0 := \{\emptyset\}$
       2. $k := 1$
       3. while $C^g_k \neq \emptyset$ do
       4.   $C_k := \text{safe-pruning-on}(C^g_k, L_{k-1})$
       5.   $L_k := SAT_{C_{freq}}(C_k)$
       6.   $C^g_{k+1} := \mathrm{generate}_{apriori}(L_k)$
       7.   $k := k + 1$
       8. output $\bigcup_{i=0}^{k-1} L_i$
     • Phase 1 (step 4), candidate safe pruning: eliminate candidates for which a subset of length $k-1$ is not frequent
     • Phase 2 (step 5), frequency constraint (database scan)
     • Phase 3 (step 6), candidate generation for level $k+1$: fuse two elements that share the same $k-1$ first items, where $\mathrm{generate}_{apriori}(L_k) = \{A \cup B$, where $A, B \in L_k$, $A$ and $B$ share the $k-1$ first items (in lexicographic order)$\}$
     • The completeness of Apriori relies on the anti-monotonicity of the constraint

     Anti-monotone constraints
     • Definition: an anti-monotone constraint is a constraint $C$ such that for all itemsets $S, S'$: $(S' \subseteq S \wedge S \text{ satisfies } C) \Rightarrow S' \text{ satisfies } C$
     • If $S$ does not satisfy $C_{am}$, no superset of $S$ satisfies $C_{am}$
     • Examples: $\mathrm{sum}(S, price) \leq v$; $C_{freq}$
     • A disjunction or conjunction of anti-monotone constraints is an anti-monotone constraint

     Anti-monotone constraints
     • Apriori can be changed:
       - Let $C_{am}$ be an anti-monotone constraint. If Step 5 of Apriori is replaced by $L_k := SAT_{C_{am}}(C_k)$, it is still correct and complete.
     • Apriori can be used to mine constrained itemsets when the given constraint is anti-monotone
     • What about monotone constraints?

     Monotone Constraints
     • Definition: for $S \in 2^{Items}$, $C_m(S)$ is true $\Rightarrow \forall S' \supset S$, $C_m(S')$ is true
     • Example: $\mathrm{sum}(S, price) \geq v$
     • Given a monotone constraint $C_m$, simply replacing Step 5 in Apriori with $L_k := SAT_{C_m}(C_k)$ leads to the loss of the completeness of Apriori.
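
The levelwise loop with an anti-monotone constraint plugged into Step 5 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: itemsets are represented as sorted tuples, the constraint is passed in as a predicate `c_am(itemset, transactions)`, and all function names are assumptions made for the example.

```python
from itertools import combinations


def frequency(itemset, transactions):
    """F(S): fraction of transactions containing every item of S."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= set(t)) / len(transactions)


def generate_apriori(level):
    """Phase 3: fuse two k-itemsets (sorted tuples) sharing their first k-1 items."""
    return {a + (b[-1],)
            for a in level for b in level
            if a[:-1] == b[:-1] and a[-1] < b[-1]}


def safe_pruning(candidates, prev_level):
    """Phase 1: keep candidates whose (k-1)-subsets all belong to L_{k-1}."""
    return {c for c in candidates
            if all(sub in prev_level for sub in combinations(c, len(c) - 1))}


def apriori(transactions, c_am):
    """Return {itemset: frequency} for every itemset satisfying c_am."""
    items = sorted({i for t in transactions for i in t})
    candidates = {(i,) for i in items}   # C^g_1
    prev_level = {()}                    # L_0 = {empty itemset}
    result = {}
    while candidates:
        candidates = safe_pruning(candidates, prev_level)
        # Step 5 with an arbitrary anti-monotone constraint in place of C_freq
        level = {c for c in candidates if c_am(c, transactions)}
        for c in level:
            result[c] = frequency(c, transactions)
        candidates = generate_apriori(level)
        prev_level = level
    return result


db = ["ABCD", "AC", "AC", "ABCD", "BC", "ABC"]


def c_freq(s, t):
    return frequency(s, t) >= 0.6


def c_freq_and_miss(s, t):
    # Conjunction of two anti-monotone constraints: frequent and B not in S
    return c_freq(s, t) and "B" not in s


print(sorted("".join(s) for s in apriori(db, c_freq)))
print(sorted("".join(s) for s in apriori(db, c_freq_and_miss)))
```

Because `c_freq_and_miss` is a conjunction of anti-monotone constraints, it is itself anti-monotone, so the same loop remains complete for it.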

  5. Monotone Constraints
     • The generation step in Apriori must be complete, i.e., it must not miss any itemset satisfying $C$
     • The pruning step (Phase 1) must be correct, i.e., it must not prune an itemset that verifies $C$
     • Example
       - Assume $C(S) \equiv (\text{item } C \in S)$. Itemset ABC should be generated by $\mathrm{generate}_{apriori}$ from AB and AC, but since $C(AB) = \mathit{false}$, ABC is not generated, whereas $C(ABC) = \mathit{true}$.
       - Assume $C(S) \equiv (\text{item } A \in S)$. Itemset ABC is correctly generated by $\mathrm{generate}_{apriori}$ from AB and AC, but since $C(BC) = \mathit{false}$, ABC is incorrectly pruned, whereas $C(ABC) = \mathit{true}$.
     • The generation step and the pruning step need to be modified in order to include monotone constraints

     Monotone Constraints
     • Some definitions used in the modified generation procedure
       - Negative border: if $C_{am}$ denotes an anti-monotone constraint, $Bd^-_{C_{am}}$ is the collection of the minimal itemsets that do not satisfy $C_{am}$
       - $C_m$ denotes a monotone constraint; it is the negation of an anti-monotone constraint $C'_{am}$, so $C_m$ equals $\neg C'_{am}$

     Monotone Constraints
     • Generation procedure
       - $\mathrm{generate}_1(L_k) = \{A \cup B$, where $A \in L_k$ and $B$ is a 1-itemset$\}$
       - $\mathrm{generate}_2(L_k) = \{A \cup B$, where $A, B \in L_k\}$
       - Assume $C = C_{am} \wedge \neg C'_{am}$ and $ms = \max_{S \in Bd^-_{C'_{am}}} |S|$
       - $\mathrm{generate}_m(L_0) = Bd^-_{C'_{am}} \cap Items_1$
       - For $k \geq 1$:
         If $k < ms$, $\mathrm{generate}_m(L_k) = \mathrm{generate}_1(L_k) \cup (Bd^-_{C'_{am}} \cap Items_{k+1})$
         If $k = ms$, $\mathrm{generate}_m(L_k) = \mathrm{generate}_1(L_k)$
         If $k > ms$, $\mathrm{generate}_m(L_k) = \mathrm{generate}_2(L_k)$
     • This generation procedure is complete and ensures that every candidate itemset verifies $C_m$ ($\neg C'_{am}$)
     • We do not need to verify the monotone constraint after this generation procedure
     • Notation: $C_{am}$ denotes an anti-monotone constraint; $\neg C'_{am}$ denotes a monotone constraint; $Bd^-_{C'_{am}}$ denotes the collection of the minimal itemsets that do not satisfy $C'_{am}$

     Monotone Constraints
     • Pruning procedure $\mathrm{prune}_m$
       - For all $S \in C^g_{k+1}$ and for all $S' \subset S$ such that $|S'| = k$, do: if $S' \notin L_k$ and $C_m(S') = \mathit{true}$, then delete $S$ from $C^g_{k+1}$
     • $\mathrm{prune}_m$ is correct and complete
       - It is correct because it does not prune any itemset that verifies $C = C_{am} \wedge \neg C'_{am}$.
       - Its completeness means that if an itemset is not pruned, then every proper subset of that itemset that verifies $C_m$ also verifies $C_{am}$.
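
The modified generation and pruning steps can be tried out with the following Python sketch. It rests on several assumptions that go beyond the slides: $Items_j$ is read as the set of itemsets of size $j$, $\mathrm{generate}_2$ keeps only unions of size $k+1$, the negative border of $C'_{am}$ is supplied by the caller instead of being computed, and the driver loop `mine` and all function names are illustrative. The monotone constraint used in the example ("contains A, or contains both B and D") is made up for the demonstration.

```python
from itertools import combinations


def frequency(itemset, transactions):
    """F(S): fraction of transactions containing every item of S."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= set(t)) / len(transactions)


def generate_1(level, items):
    """Extend each k-itemset of L_k by one item it does not already contain."""
    return {s | {i} for s in level for i in items if i not in s}


def generate_2(level, k):
    """Union of two k-itemsets of L_k, kept when it has exactly k+1 items."""
    return {a | b for a in level for b in level if len(a | b) == k + 1}


def generate_m(level, k, items, border, ms):
    """Candidate generation for level k+1 under C = C_am and not C'_am."""
    if k == 0:
        return {s for s in border if len(s) == 1}
    if k < ms:
        return generate_1(level, items) | {s for s in border if len(s) == k + 1}
    if k == ms:
        return generate_1(level, items)
    return generate_2(level, k)


def prune_m(candidates, level, k, c_m):
    """Delete S when some k-subset S' satisfies C_m but is missing from L_k."""
    return {s for s in candidates
            if not any(sub not in level and c_m(sub)
                       for sub in map(frozenset, combinations(s, k)))}


def mine(transactions, items, c_am, c_m, border):
    """Illustrative levelwise driver built around generate_m and prune_m."""
    ms = max(len(s) for s in border)
    level, k, result = {frozenset()}, 0, set()
    while level or k < ms:      # border itemsets may still seed levels up to ms
        candidates = prune_m(generate_m(level, k, items, border, ms), level, k, c_m)
        level = {s for s in candidates if c_am(s, transactions)}
        result |= level
        k += 1
    return result


db = ["ABCD", "AC", "AC", "ABCD", "BC", "ABC"]


def c_m(s):
    # Hypothetical monotone constraint: contains A, or contains both B and D
    return "A" in s or {"B", "D"} <= s


# Minimal itemsets satisfying c_m, i.e. the negative border of its negation
border = {frozenset("A"), frozenset("BD")}


def c_am(s, t):
    return frequency(s, t) >= 0.5


found = mine(db, "ABCD", c_am, c_m, border)
print(sorted("".join(sorted(s)) for s in found))  # ['A', 'AB', 'ABC', 'AC']
```

In this run ABD and ACD are generated from AB and AC at level 3 but pruned, because their subset AD satisfies the monotone constraint and is missing from $L_2$; ABC survives because its missing subset BC does not satisfy the monotone constraint.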
