
High Confidence Rule Mining for Microarray Analysis

Kang Deng, University of Alberta
2007-11-26



High Confidence Rule Mining for Microarray Analysis

Kang Deng, University of Alberta

“High Confidence Rule Mining for Microarray Analysis”, by Tara McIntosh and Sanjay Chawla, 2006

  • 27 pages
  • 14 definitions
  • 2 lemmas
  • 4 tables

  • Figures, formulas, etc.
  • Its core: the Confidence-based Prune Strategy

Outline

  • Introduction
  • Row Enumeration
  • Confidence‐based Prune Strategy
  • MAXCONF Algorithm
  • Evaluation
  • References

What are Microarrays?

  • A DNA microarray is a collection of microscopic DNA spots, commonly representing single genes, arrayed on a solid surface by covalent attachment to a chemical matrix.
  • In the association-mining view, genes are the items and samples are the transactions.


Our Task

  • One main objective of molecular biology is to develop a deeper understanding of how genes are functionally related.

Example thresholds: Minimum Support = 30%, Minimum Confidence = 80%

We do not mine association rules, but confidence rules: rules selected by a confidence threshold alone, without a minimum support.
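On a hypothetical toy dataset, the two measures can be sketched in Python (the data and item names here are illustrative, not from the paper):

```python
# Hypothetical toy data: each transaction (sample) is a set of items (genes).
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "B"},
    {"B", "C"},
    {"A", "B", "C"},
]

def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """confidence(X -> Y) = support(X union Y) / support(X)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

print(support({"A", "B"}, transactions))       # 3 of 5 transactions
print(confidence({"A"}, {"B"}, transactions))  # 3/4 = 0.75
```

Mining by confidence alone means the 3/5 support value above is never used as a filter; only the 0.75 is compared against the threshold.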

Outline

  • Introduction
  • Row Enumeration
  • Confidence‐based Prune Strategy
  • MAXCONF Algorithm
  • Evaluation
  • References

Row Enumeration

  • Traditional dataset: few items and many transactions (width: 12, length: 10000). Enumerating candidate itemsets by length (1-length, 2-length, 3-length, ...) is feasible.
  • Microarray dataset: many items (often >>6000 genes) and few transactions (often <500 samples). Item enumeration causes an explosive increase of candidates, even though the length of the mined patterns is much less than the average number of items in one transaction.

How can we make the wide microarray table look like the narrow traditional one? Transpose it.

Transposed Table & Tree

In the transposed table, each item lists the transactions that contain it:

  Item  Transactions
  A     1,2,5,6
  B     1,4,8
  C     1,2,3,4,5,8
  D     1,2,3,4,6,7,8
  E     1,2,3,4,5
  F     3
  G     1,2,3,4,8
  H     3
  I     3,5,6,7
  J     7

Row enumeration then explores combinations of transactions (rows) in a tree, instead of combinations of items.
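A minimal sketch of the transposition in Python, using the table above (the function name and dict layout are mine):

```python
def transpose(table):
    """Swap a {key: set_of_values} table into {value: set_of_keys}."""
    out = {}
    for key, values in table.items():
        for v in values:
            out.setdefault(v, set()).add(key)
    return out

# The slide's transposed table: item -> transactions containing it.
tt = {
    "A": {1, 2, 5, 6}, "B": {1, 4, 8}, "C": {1, 2, 3, 4, 5, 8},
    "D": {1, 2, 3, 4, 6, 7, 8}, "E": {1, 2, 3, 4, 5}, "F": {3},
    "G": {1, 2, 3, 4, 8}, "H": {3}, "I": {3, 5, 6, 7}, "J": {7},
}

# Transposing again recovers the original orientation: sample -> genes.
original = transpose(tt)
print(sorted(original[1]))  # ['A', 'B', 'C', 'D', 'E', 'G']
```

The same helper converts either orientation, since transposition is its own inverse.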


Row Enumeration Tree

If the current parent node n is completely contained within a sibling node, a child node is not constructed. For example, node 2.
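A rough sketch of row enumeration with this containment check, reusing the item-to-transactions table from the earlier slide (Python; the recursion shape and the exact form of the containment test are my reading of the slide, not the paper's pseudocode):

```python
# Transposed table from the previous slide: item -> transactions.
tt = {
    "A": {1, 2, 5, 6}, "B": {1, 4, 8}, "C": {1, 2, 3, 4, 5, 8},
    "D": {1, 2, 3, 4, 6, 7, 8}, "E": {1, 2, 3, 4, 5}, "F": {3},
    "G": {1, 2, 3, 4, 8}, "H": {3}, "I": {3, 5, 6, 7}, "J": {7},
}

def items_of(rows, tt):
    """Items present in every row of `rows` (all items for the empty set)."""
    return {item for item, trans in tt.items() if rows <= trans}

def enumerate_rows(node, candidates, tt, visit):
    """Depth-first row enumeration. `node` is a frozenset of row ids,
    `candidates` the larger row ids still available as extensions."""
    if node:
        visit(node, items_of(node, tt))
    children = [(r, i, items_of(node | {r}, tt))
                for i, r in enumerate(candidates)]
    children = [(r, i, ci) for r, i, ci in children if ci]
    for r, i, ci in children:
        # Containment check from the slide: if this node's itemset is
        # completely contained in an earlier sibling's itemset, visit it
        # but construct no children (e.g. node 2 under the root).
        if any(ci <= cj for rj, _, cj in children if rj < r):
            visit(node | {r}, ci)
        else:
            enumerate_rows(node | {r}, candidates[i + 1:], tt, visit)

visited = set()
enumerate_rows(frozenset(), list(range(1, 9)), tt,
               lambda n, items: visited.add(n))
print(frozenset({2, 3}) in visited)  # False: node 2 constructs no children
```

Node {2}'s itemset is contained in node {1}'s, so node 2 is visited but never extended, matching the slide's example.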

Outline

  • Introduction
  • Row Enumeration
  • Confidence‐based Prune Strategy
  • MAXCONF Algorithm
  • Evaluation
  • References

Confidence‐based Strategy

RER II (“Mining frequent closed patterns in microarray data”, by G. Cong, K.-L. Tan, A. Tung, and F. Pan, 2004) uses a support-based pruning strategy. In biology, however, we care about confidence rules, not support.

Minimum Support = 30%, Minimum Confidence = 80%

Confidence‐based Strategy

Prune #1

    σ_max(5) = 1 + 2 = 3


Confidence‐based Strategy

Prune #1

In the itemset {A,B,C}, suppose Support(A) <= Support(B) and Support(A) <= Support(C). Then A is the minimum feature of {A,B,C}: (A) -> (B,C) is an I-spanning rule; (B,C) -> (A) is not.

Confidence‐based Strategy

Prune #1

Maximum support of node 5:

    σ_max(5) = 1 + 2 = 3

The minimum feature in this itemset is I, with σ(I) = 4.

Maximum confidence of node 5:

    conf_max(5) = σ_max(5) / σ(I) = 3/4

If the minimum confidence is 4/5, the children of node 5 will be pruned: the best possible antecedent, σ(antecedent), is that of the minimum feature I.

Confidence‐based Strategy

Prune #1

(I) -> (A,C,E,G): this rule has the highest confidence.

What about (A,I) -> (C,E,G)? As an itemset becomes larger, its support stays the same or becomes smaller:

    σ(AI) <= σ(I)

Since confidence = σ(itemset) / σ(antecedent), choosing the minimum feature as the antecedent maximizes the confidence.
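The Prune #1 test can be sketched in Python against the transposed table from the row-enumeration slides (σ_max here is taken as rows covered so far plus rows that could still be added, which is one reading of the slide's 1 + 2 = 3; function names are mine):

```python
# Transposed table from the row-enumeration slides: item -> transactions.
tt = {
    "A": {1, 2, 5, 6}, "B": {1, 4, 8}, "C": {1, 2, 3, 4, 5, 8},
    "D": {1, 2, 3, 4, 6, 7, 8}, "E": {1, 2, 3, 4, 5}, "F": {3},
    "G": {1, 2, 3, 4, 8}, "H": {3}, "I": {3, 5, 6, 7}, "J": {7},
}

def conf_max(node_rows, remaining_rows, itemset, tt):
    """Upper bound on the confidence of any rule grown below this node:
    sigma_max / sigma(minimum feature)."""
    sigma_max = len(node_rows) + len(remaining_rows)
    sigma_min_feature = min(len(tt[x]) for x in itemset)
    return sigma_max / sigma_min_feature

def prune1(node_rows, remaining_rows, itemset, tt, minconf):
    """Prune the node's children when even the bound misses minconf."""
    return conf_max(node_rows, remaining_rows, itemset, tt) < minconf

# Slide example: one row so far, two more possible, minimum feature I with
# sigma(I) = 4, so conf_max = 3/4; with minconf = 4/5 the children go.
print(conf_max({5}, {6, 7}, {"A", "C", "E", "I"}, tt))     # 0.75
print(prune1({5}, {6, 7}, {"A", "C", "E", "I"}, tt, 0.8))  # True
```

With a lower threshold such as 0.7 the same node survives, since 3/4 >= 0.7.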

Confidence‐based Strategy

Prune #2

Itemset {C,D,E,G} generates the I-spanning rules:

    C -> DEG,  E -> CDG,  G -> CDE

The maximum feature set of {C,D,E,G} is {C,E,G}.

Prune Strategy #2: if the maximum feature set M of an itemset at node n is not empty, we can prune all child nodes of n whose itemsets are subsets of M.


Confidence‐based Strategy

Prune #2

At node (1234) the itemset is {C,D,E,G}, whose maximum feature set is {C,E,G}. The node generates:

    C -> DEG,  E -> CDG,  G -> CDE

The child node (12345), whose itemset is {C,E,G}, would only generate:

    C -> EG,  E -> CG,  G -> CE

These are sub-rules of the parent's rules, so the child is pruned.
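A sketch of the Prune #2 test in Python, on the same transposed table (here I assume the maximum feature set is the set of items whose I-spanning rule meets the confidence threshold, and the 0.6 threshold is chosen so that the slide's {C,E,G} comes out; both are assumptions, not the paper's definitions):

```python
# Transposed table from the row-enumeration slides: item -> transactions.
tt = {
    "A": {1, 2, 5, 6}, "B": {1, 4, 8}, "C": {1, 2, 3, 4, 5, 8},
    "D": {1, 2, 3, 4, 6, 7, 8}, "E": {1, 2, 3, 4, 5}, "F": {3},
    "G": {1, 2, 3, 4, 8}, "H": {3}, "I": {3, 5, 6, 7}, "J": {7},
}

def max_feature_set(itemset, tt, minconf):
    """Items x whose I-spanning rule x -> itemset minus {x} meets minconf
    (an assumed reading of the slide's 'maximum feature set')."""
    rows = set.intersection(*(tt[x] for x in itemset))
    return {x for x in itemset if len(rows) / len(tt[x]) >= minconf}

def prune2(child_itemset, max_features):
    """Prune #2: a child whose itemset is a subset of the maximum feature
    set would only generate sub-rules of its parent's rules."""
    return bool(max_features) and child_itemset <= max_features

# Node (1234), itemset {C,D,E,G}: C->DEG, E->CDG, G->CDE pass 0.6 but
# D->CEG does not, so M = {C,E,G} and child (12345) with {C,E,G} is cut.
M = max_feature_set({"C", "D", "E", "G"}, tt, 0.6)
print(sorted(M))                   # ['C', 'E', 'G']
print(prune2({"C", "E", "G"}, M))  # True
```

A child containing an item outside M, e.g. {C,D}, fails the subset test and is kept.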

Outline

  • Introduction
  • Row Enumeration
  • Confidence‐based Prune Strategy
  • MAXCONF Algorithm
  • Evaluation
  • References

MAXCONF Algorithm

Pruning #1: bound the best achievable confidence below a node:

    conf_max(5) = σ_max(5) / σ(I) = 3/4

Pruning #2: at node (1234) with itemset {C,D,E,G}, the generated rules are:

    C -> DEG,  E -> CDG,  G -> CDE

The maximum feature set {C,E,G} equals the itemset of the child node, so the child is pruned.

Outline

  • Introduction
  • Row Enumeration
  • Confidence‐based Prune Strategy
  • MAXCONF Algorithm
  • Evaluation
  • References

Evaluation

MAXCONF vs. RER II, compared in two aspects: 1. Rule Generation; 2. Scalability.

Evaluation

Rule Generation: when the minimum support is 0, RER II runs out of memory. MAXCONF generates more rules than RER II.

Evaluation

Scalability: the performance of RER II is not affected by the minimum confidence. In most cases MAXCONF is better than RER II; RER II only outperforms MAXCONF when the minimum support is higher than 40%.

References

  • MAXCONF: “High Confidence Rule Mining for Microarray Analysis”, by Tara McIntosh and Sanjay Chawla, 2006
  • RER II: “Mining frequent closed patterns in microarray data”, by G. Cong, K.-L. Tan, A. Tung, and F. Pan, 2004


Any Questions?

Thanks for your attention.