High Confidence Rule Mining for Microarray Analysis

High Confidence Rule Mining for Microarray Analysis: Row Enumeration - PowerPoint PPT Presentation



  1. 2007-11-26, Kang Deng, University of Alberta
  Outline: • Introduction • Row Enumeration • Confidence-based Prune Strategy • MAXCONF Algorithm • Evaluation • References
  Source paper: "High Confidence Rule Mining for Microarray Analysis", Tara McIntosh, Sanjay Chawla, 2006 (27 pages, 14 definitions, 2 lemmas, 4 tables, plus figures and formulas).
  What is a microarray? A DNA microarray is a collection of microscopic DNA spots, commonly representing single genes, arrayed on a solid surface by covalent attachment to a chemical matrix. In the mining setting, genes play the role of items and samples play the role of transactions.
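The genes-as-items, samples-as-transactions mapping above can be made concrete with a small sketch. The gene and sample names and the expression threshold below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: discretize a tiny expression matrix so that each sample
# (transaction) contains the genes (items) that are over-expressed in it.
expression = {
    "sample1": {"geneA": 2.3, "geneB": 0.4, "geneC": 1.9},
    "sample2": {"geneA": 0.1, "geneB": 1.7, "geneC": 2.2},
}
THRESHOLD = 1.0  # hypothetical cut-off for calling a gene "expressed"

transactions = {
    sample: {gene for gene, level in levels.items() if level >= THRESHOLD}
    for sample, levels in expression.items()
}
print(transactions)
# e.g. {'sample1': {'geneA', 'geneC'}, 'sample2': {'geneB', 'geneC'}}
```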

  2. Our Task
  One main objective of molecular biology is to develop a deeper understanding of how genes are functionally related. We do not mine association rules, but confidence rules.
  Traditional dataset vs. microarray dataset: a traditional transaction dataset is narrow and long (width about 12, length about 10,000), while a microarray dataset is wide and short (width < 500, length >> 6,000). The number of transactions is much smaller than the average number of items in one transaction. How can we make the right rectangle behave like the left one?
  Explosive increase of candidates: with minimum support = 30% and minimum confidence = 80%, enumerating 1-length, 2-length, 3-length, ... item combinations explodes on such data.
  Row Enumeration: transposed table and tree. In the transposed table, each item maps to the set of transactions that contain it:
  A: 1,2,5,6   B: 1,4,8   C: 1,2,3,4,5,8   D: 1,2,3,4,6,7,8   E: 1,2,3,4,5
  F: 3   G: 1,2,3,4,8   H: 3   I: 3,5,6,7   J: 7
  Outline: • Introduction • Row Enumeration • Confidence-based Prune Strategy • MAXCONF Algorithm • Evaluation • References
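A minimal sketch of the transposed representation used for row enumeration, built from the example table on this slide. The `transpose` helper and variable names are assumptions for illustration; only the item-to-row data comes from the slide.

```python
# The transposed table maps each item (gene) to the set of transactions
# (samples) that contain it; row enumeration searches over sets of rows
# instead of sets of items, which is cheaper when rows are few.
def transpose(transactions):
    """transactions: dict of transaction id -> set of items."""
    item_to_rows = {}
    for tid, items in transactions.items():
        for item in items:
            item_to_rows.setdefault(item, set()).add(tid)
    return item_to_rows

# Transposed table from this slide (item -> rows), written down directly:
transposed = {
    "A": {1, 2, 5, 6}, "B": {1, 4, 8}, "C": {1, 2, 3, 4, 5, 8},
    "D": {1, 2, 3, 4, 6, 7, 8}, "E": {1, 2, 3, 4, 5}, "F": {3},
    "G": {1, 2, 3, 4, 8}, "H": {3}, "I": {3, 5, 6, 7}, "J": {7},
}

# A node in the row enumeration tree is a set of rows; its itemset is every
# item whose row set contains those rows.
rows = {1, 2, 3, 4}
itemset = {item for item, rs in transposed.items() if rows <= rs}
print(sorted(itemset))  # ['C', 'D', 'E', 'G'] -> node (1234) on the later slides
```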

  3. Row Enumeration Tree
  If the current parent node n is completely contained within a sibling node, the corresponding child node is not constructed (for example, node 2).
  RER II ("Mining frequent closed patterns in microarray data", G. Cong, K.-L. Tan, A. Tung, F. Pan, 2004) uses a support-based pruning strategy (here: minimum support = 30%, minimum confidence = 80%). In biology, however, we care about confidence rules, not support, which motivates a confidence-based strategy.
  Confidence-based Strategy, Prune #1 (example continued on the next slide): the maximum support of node 5 is σ_max(5) = 1 + 2 = 3.
  Outline: • Introduction • Row Enumeration • Confidence-based Prune Strategy • MAXCONF Algorithm • Evaluation • References
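A sketch of the sibling-containment check described above, assuming each node in the row enumeration tree is identified by its set of rows; the helper name and the example row sets are illustrative, not from the slide.

```python
# If the row set of the current node is completely contained in the row set
# of a sibling node, the corresponding child is not constructed, because the
# same itemsets would already be reached under that sibling (the "node 2"
# situation on the slide).
def contained_in_sibling(node_rows, sibling_row_sets):
    """node_rows: frozenset of row ids; sibling_row_sets: iterable of frozensets."""
    return any(node_rows < sib for sib in sibling_row_sets)

# Hypothetical example: node 2's rows are a proper subset of a sibling's rows.
node2 = frozenset({1, 2})
siblings = [frozenset({1, 2, 3}), frozenset({4, 5})]
print(contained_in_sibling(node2, siblings))  # True -> do not construct the child
```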

  4. Confidence-based Strategy
  Prune #1:
  confidence = σ(itemset) / σ(antecedent)
  (I) → (ACEG): this rule has the highest confidence. What about (AI) → (CEG)? As an itemset grows, its support stays the same or becomes smaller, so σ(AI) ≤ σ(I), and the confidence cannot increase.
  In the itemset {A,B,C}, if Support(A) ≤ Support(B) and Support(A) ≤ Support(C), then A is the minimum feature of {A,B,C}; (A) → (B,C) is an I-spanning rule, while (B,C) → (A) is not.
  Example at node 5: the maximum support of node 5 is σ_max(5) = 1 + 2 = 3, and the minimum feature of its itemset is I with σ(I) = 4, so the maximum confidence of node 5 is conf_max(5) = σ_max(5) / σ(I) = 3/4. If the minimum confidence is 4/5, the child of node 5 is pruned.
  Prune #2:
  The itemset {CDEG} generates the rules C → DEG, E → CDG, G → CDE. The maximum feature set of CDEG is CEG.
  Prune Strategy #2: if the maximum feature set M of an itemset at node n is not empty, we can prune all child nodes of n whose itemsets are subsets of M.
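A minimal sketch of the Prune #1 check implied by the formulas above: the best confidence any rule below a node can reach is the node's maximum support divided by the support of its minimum feature. The function and variable names are assumptions; the numbers are the node-5 example from this slide.

```python
# Prune #1 sketch: the best confidence obtainable from a node's itemset is
#   conf_max(node) = sigma_max(node) / sigma(minimum feature),
# since the antecedent with the smallest support yields the largest ratio.
def prune_by_max_confidence(sigma_max_node, sigma_min_feature, min_conf):
    """Return True if the node's subtree cannot produce a confident enough rule."""
    conf_max = sigma_max_node / sigma_min_feature
    return conf_max < min_conf

# Numbers from the slide's node-5 example:
sigma_max_5 = 1 + 2   # maximum support of node 5
sigma_I = 4           # support of the minimum feature I
print(sigma_max_5 / sigma_I)                                  # 0.75
print(prune_by_max_confidence(sigma_max_5, sigma_I, 4 / 5))   # True -> child of node 5 pruned
```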

  5. Confidence-based Strategy, Prune #2, and the MAXCONF Algorithm
  Node (1234) has itemset {CDEG} and generates the rules C → DEG, E → CDG, G → CDE; its maximum feature set is {CEG}. Its child node (12345) has itemset {CEG} and would generate C → EG, E → CG, G → CE, which are sub-rules of the rules above, so the child is pruned.
  The MAXCONF algorithm applies Pruning #1 (conf_max(5) = σ_max(5) / σ(I) = 3/4) and Pruning #2 during row enumeration.
  Outline: • Introduction • Row Enumeration • Confidence-based Prune Strategy • MAXCONF Algorithm • Evaluation • Conclusion • References
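A sketch of the Prune #2 subset test described above, using the (1234)/(12345) example from this slide; the helper name is an assumption.

```python
# Prune #2 sketch: if the maximum feature set M of the itemset at node n is
# non-empty, every child node whose itemset is a subset of M would only
# generate sub-rules of rules already produced at n, so it can be pruned.
def should_prune_child(child_itemset, max_feature_set):
    return bool(max_feature_set) and child_itemset <= max_feature_set

# Example from the slide: node (1234) has itemset {C,D,E,G} with maximum
# feature set {C,E,G}; its child (12345) has itemset {C,E,G}, a subset of M.
parent_itemset = {"C", "D", "E", "G"}
max_feature_set = {"C", "E", "G"}
child_itemset = {"C", "E", "G"}
print(should_prune_child(child_itemset, max_feature_set))  # True

# Why this is safe: the child's rules C->EG, E->CG, G->CE are sub-rules of
# the parent's rules C->DEG, E->CDG, G->CDE.
```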

  6. Evaluation: MAXCONF vs. RER II
  Two aspects: 1. Rule Generation, 2. Scalability.
  Scalability: the performance of RER II is not affected by the minimum confidence. In most cases MAXCONF performs better than RER II; RER II only outperforms MAXCONF when the minimum support is higher than 40%.
  Rule Generation: when the minimum support is 0, RER II runs out of memory, while MAXCONF generates more rules than RER II.
  References:
  • MAXCONF: "High Confidence Rule Mining for Microarray Analysis", Tara McIntosh, Sanjay Chawla, 2006.
  • RER II: "Mining frequent closed patterns in microarray data", G. Cong, K.-L. Tan, A. Tung, F. Pan, 2004.

  7. Any questions? Thank you for your attention.
