
Mining Free Itemsets under Constraints

By Jean-Francois Boulicaut and Baptiste Jeudy International Database Engineering and Application Symposium

Content

  • Introduction
  • Constrained itemset mining
      • Apriori revisited
      • Anti-monotone constraints
      • Monotone constraints
      • Generic algorithm
  • Frequent closed itemset mining
      • CLOSE algorithm
      • Incorporating constraints into Apriori
  • Conclusion

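Introduction

Frequent itemset mining

  • A set of items is referred to as an itemset; X denotes an item (or itemset).
  • The support of X is $Support(X) = \#X / n$, where $\#X$ is the number of transactions that contain X and $n$ is the total number of transactions.
  • Support is bounded below by a threshold r: a frequent itemset is an itemset whose support is at least the minimum support r.
  • Task: given a database, find all the frequent itemsets.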
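The support definition can be sketched in a few lines of Python. This is only an illustration: the transactions are the six used in the deck's running example, and the helper names `support` and `is_frequent` are assumptions, not names from the paper.

```python
from fractions import Fraction

# Transactions of the deck's running example (TID -> items).
DB = {1: "ABCD", 2: "AC", 3: "AC", 4: "ABCD", 5: "BC", 6: "ABC"}

def support(itemset, db=DB):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in db.values() if items <= set(t))
    return Fraction(hits, len(db))

def is_frequent(itemset, r, db=DB):
    """An itemset is frequent when its support reaches the threshold r."""
    return support(itemset, db) >= r

print(support("A"), support("AB"), is_frequent("CD", Fraction(1, 2)))
```

Exact fractions are used instead of floats so that comparisons against a threshold such as 1/2 are not affected by rounding.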

Introduction

Problems with frequent itemset mining algorithms

  • The computation may be intractable for a user-given frequency threshold: the number of frequent itemsets may explode.
  • Lack of focus leads to a huge output of frequent itemsets.


Introduction

Two approaches to tackle these problems

  • Constraint-based extraction of the frequent itemsets: only a subset of the collection of frequent itemsets is interesting.
  • Condensed representation of frequent itemsets: extract a subset of the frequent patterns and regenerate the whole collection when necessary.

Introduction

Constraint-based extraction of frequent itemsets

  • Syntactic constraints
      • e.g. an item must not appear in the itemsets
  • Constraints related to objective measures of interestingness
      • e.g. the itemsets must be frequent
  • Push constraint checking into the algorithms
      • Anti-monotone constraints
      • Monotone constraints
  • Benefits: decrease the size of the output and improve user guidance

Introduction

Condensed representation of frequent itemsets

  • Extract a particular subset of the frequent itemset collection.
  • The condensed subset is much smaller than the original collection.
  • It can be extracted efficiently.
  • The whole collection of frequent itemsets can be regenerated from it.

Introduction

Main idea of the paper

  • Combine the above two approaches into one algorithm.
  • This algorithm is based on the structure of Apriori.


Summary of paper

Definition of constraints

  • $T$: transactional database
  • $2^{Items}$: set of all itemsets
  • $C$: constraint
  • $S$: itemset, $S \in 2^{Items}$
  • $I$: subset of $2^{Items}$
  • $S$ satisfies $C$ in $T$ iff $C(S, T) = true$
  • $SAT_C(I) = \{S \in I,\ S \text{ satisfies } C\}$
  • $SAT_C$ denotes $SAT_C(2^{Items})$

Summary of paper

Examples of constraints

  • $C_{freq}(S) \equiv F(S) \geq r$: an itemset must have a frequency of at least $r$. If $r = 0.6$ and $T$ is the database below, then $SAT_{C_{freq}} = \{A, B, C, AC, BC\}$.
  • $C_{size}(S) \equiv |S| \leq 2$ and $C_{miss}(S) \equiv B \notin S$, then $SAT_{C_{size} \wedge C_{miss}} = \{A, C, D, AC, AD, CD\}$.
  • $SAT_{C_{size} \wedge C_{miss} \wedge C_{freq}} = \{A, C, AC\}$.

Database $T$:

    TID   Items
    1     ABCD
    2     AC
    3     AC
    4     ABCD
    5     BC
    6     ABC

Some itemsets with their supporting transactions and frequencies:

    Itemset   Support (TIDs)   Frequency
    A         1,2,3,4,6        0.83
    B         1,4,5,6          0.67
    AB        1,4,6            0.5
    AC        1,2,3,4,6        0.83
    CD        1,4              0.33
    ACD       1,4              0.33
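The $SAT$ sets of this example can be recomputed by brute force over all non-empty itemsets. The sketch below is only a check of the definitions on the six-transaction database; the function names (`freq`, `sat`, `c_freq`, ...) are illustrative assumptions, and the empty itemset is left out, as in the example.

```python
from itertools import combinations

# The six transactions of the running example.
DB = ["ABCD", "AC", "AC", "ABCD", "BC", "ABC"]
ITEMS = "ABCD"

def freq(s):
    """F(S): fraction of transactions containing every item of S."""
    return sum(1 for t in DB if set(s) <= set(t)) / len(DB)

def all_itemsets():
    """All non-empty itemsets over ITEMS, written as sorted strings."""
    return ["".join(c) for k in range(1, len(ITEMS) + 1)
            for c in combinations(ITEMS, k)]

def sat(constraint, collection):
    """SAT_C(I): the itemsets of the collection that satisfy the constraint."""
    return {s for s in collection if constraint(s)}

def c_freq(s): return freq(s) >= 0.6      # C_freq with r = 0.6
def c_size(s): return len(s) <= 2         # C_size
def c_miss(s): return "B" not in s        # C_miss

every = all_itemsets()
print(sorted(sat(c_freq, every)))
print(sorted(sat(lambda s: c_size(s) and c_miss(s), every)))
print(sorted(sat(lambda s: c_size(s) and c_miss(s) and c_freq(s), every)))
```

A conjunction of constraints is simply a predicate that calls each constraint in turn, which mirrors the $\wedge$ notation of the slides.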

Summary of paper

Constrained itemset mining

  • $T$: transactional database
  • $C$: constraint
  • Task: compute the collection of itemsets that satisfy $C$, together with their frequencies: $R = \{(S, F(S)),\ S \in SAT_C\}$
  • Use Apriori for constrained itemset mining where $C$ is $C_{freq}$


Summary of paper - Apriori

Apriori Algorithm

    1. $C^g_1 := Items_1$; $L_0 := \{\emptyset\}$
    2. $k := 1$
    3. while $C^g_k \neq \emptyset$ do
    4.     $C_k :=$ safe-pruning-on($C^g_k$, $L_{k-1}$)
    5.     $L_k := SAT_{C_{freq}}(C_k)$
    6.     $C^g_{k+1} := generate_{apriori}(L_k)$
    7.     $k := k + 1$
    8. output $\bigcup_{i=1}^{k-1} L_i$

  • Phase 1 (step 4), candidate safe pruning: eliminate the candidates for which a subset of length $k-1$ is not frequent.
  • Phase 2 (step 5), frequency constraint: requires a database scan.
  • Phase 3 (step 6), candidate generation for level $k+1$: fuse two elements that share the same $k-1$ first items:
    $generate_{apriori}(L_k) = \{A \cup B$, where $A, B \in L_k$ and $A$ and $B$ share the $k-1$ first items (in lexicographic order)$\}$

The completeness of Apriori relies on the anti-monotonicity of the constraint $C_{freq}$.
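The three phases can be sketched in Python as follows. This is a minimal illustration of the level-wise structure described above, not the authors' implementation; the function names and the use of `frozenset` are assumptions, and frequencies are recomputed naively instead of being counted in a single database scan per level.

```python
from itertools import combinations

def freq(s, db):
    """Fraction of transactions (frozensets) that contain itemset s."""
    return sum(1 for t in db if s <= t) / len(db)

def generate_apriori(level, k):
    """Phase 3: fuse pairs of k-itemsets sharing their first k-1 items."""
    ordered = sorted(tuple(sorted(s)) for s in level)
    return {frozenset(a) | frozenset(b)
            for a, b in combinations(ordered, 2) if a[:k - 1] == b[:k - 1]}

def safe_pruning(cands, prev, k):
    """Phase 1: drop candidates that have a (k-1)-subset outside L_{k-1}."""
    return {c for c in cands
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}

def apriori(transactions, r):
    db = [frozenset(t) for t in transactions]
    cand = {frozenset([i]) for i in set().union(*db)}   # C^g_1 := Items_1
    prev, result, k = {frozenset()}, {}, 1              # L_0 := {empty set}
    while cand:
        cand = safe_pruning(cand, prev, k)
        level = {c for c in cand if freq(c, db) >= r}   # Phase 2
        result.update({c: freq(c, db) for c in level})
        cand = generate_apriori(level, k)
        prev, k = level, k + 1
    return result

found = apriori(["ABCD", "AC", "AC", "ABCD", "BC", "ABC"], 0.6)
print(sorted("".join(sorted(s)) for s in found))
```

On the six-transaction example with $r = 0.6$ the sketch stops after level 2, because AC and BC do not share their first item and no 3-candidate is generated.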

Anti-monotone constraints

  • Definition: an anti-monotone constraint is a constraint $C$ such that for all itemsets $S$, $S'$:
    $(S' \subseteq S \wedge S \text{ satisfies } C) \Rightarrow S' \text{ satisfies } C$
  • If $S$ does not satisfy $C_{am}$, then no superset of $S$ satisfies $C_{am}$.
  • Example: $sum(S, price) \leq v$
  • A disjunction or conjunction of anti-monotone constraints is an anti-monotone constraint.
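On a small item universe the definition can be checked exhaustively. The sketch below is only an illustration: the price table is hypothetical (with non-negative prices), and `is_anti_monotone` is an assumed helper name.

```python
from itertools import combinations

ITEMS = "ABCD"
PRICE = {"A": 3, "B": 5, "C": 2, "D": 7}   # hypothetical, non-negative prices

def powerset(items):
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def is_anti_monotone(constraint, items=ITEMS):
    """Check: whenever S satisfies C, every subset S' of S satisfies C too."""
    sets = powerset(items)
    return all(constraint(sub)
               for s in sets if constraint(s)
               for sub in sets if sub <= s)

def sum_le(s): return sum(PRICE[i] for i in s) <= 8   # sum(S, price) <= v
def miss_b(s): return "B" not in s                    # B must not appear

print(is_anti_monotone(sum_le))
print(is_anti_monotone(lambda s: sum_le(s) and miss_b(s)))
print(is_anti_monotone(lambda s: sum_le(s) or miss_b(s)))
```

The exhaustive check is exponential in the number of items, so it is only meant for validating a constraint on toy examples.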

Anti-monotone constraints

  • Apriori can be changed: let $C_{am}$ be an anti-monotone constraint. If Step 5 of Apriori is replaced by $L_k := SAT_{C_{am}}(C_k)$, the algorithm is still correct and complete.
  • Apriori can therefore be used to mine constrained itemsets when the given constraint is anti-monotone.
  • What about monotone constraints?

Monotone Constraints

  • Definition: $C_m$ is monotone if
    $\forall S \in 2^{Items},\ C_m(S) \text{ is true} \Rightarrow \forall S' \supset S,\ C_m(S') \text{ is true}$
  • Example: $sum(S, price) \geq v$
  • Given a monotone constraint $C_m$, simply replacing Step 5 in Apriori with $L_k := SAT_{C_m}(C_k)$ leads to the loss of the completeness of Apriori.


Monotone Constraints

Example

  • Incomplete generation: assume $C(S) \equiv C \in S$. Itemset ABC should be generated by $generate_{apriori}$ from AB and AC, but since $C(AB) = false$, ABC is not generated, whereas $C(ABC) = true$.
  • Incorrect pruning: assume $C(S) \equiv A \in S$. Itemset ABC is correctly generated by $generate_{apriori}$ from AB and AC, but since $C(BC) = false$, ABC is incorrectly pruned, whereas $C(ABC) = true$.

Consequences

  • The generation step in Apriori must be complete, i.e., it must not miss any itemset satisfying $C$.
  • The pruning step (Phase 1) must be correct, i.e., it must not prune an itemset that satisfies $C$.
  • The generation step and the pruning step need to be modified in order to handle monotone constraints.
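The loss of completeness can be reproduced with a deliberately naive level-wise search that filters each level with a monotone constraint. The sketch is a simplification under stated assumptions: it omits the pruning phase and the database entirely, uses only the items A, B, C, and the function names are illustrative.

```python
from itertools import combinations

ITEMS = "ABC"

def c_m(s):
    """Monotone constraint C(S): item C belongs to S."""
    return "C" in s

def naive_levelwise(constraint):
    """Apriori-style generation with the level filter replaced by `constraint`."""
    level = {frozenset(i) for i in ITEMS if constraint(frozenset(i))}
    found, k = set(), 1
    while level:
        found |= level
        ordered = sorted(tuple(sorted(s)) for s in level)
        cand = {frozenset(a) | frozenset(b)
                for a, b in combinations(ordered, 2) if a[:k - 1] == b[:k - 1]}
        level = {c for c in cand if constraint(c)}
        k += 1
    return found

def brute_force(constraint):
    """Every non-empty itemset that satisfies the constraint."""
    return {frozenset(c) for k in range(1, len(ITEMS) + 1)
            for c in combinations(ITEMS, k) if constraint(frozenset(c))}

missed = brute_force(c_m) - naive_levelwise(c_m)
print(sorted("".join(sorted(s)) for s in missed))
```

Only the 1-itemset C survives the first level, so no pair can be fused and AC, BC and ABC are never produced even though they satisfy the constraint.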

Monotone Constraints

Some definitions used in the modified generation procedure

  • Negative border: if $C_{am}$ denotes an anti-monotone constraint, $Bd_{C_{am}}$ is the collection of the minimal itemsets that do not satisfy $C_{am}$.
  • $C_m$ denotes a monotone constraint. It is the negation of an anti-monotone constraint $C'_{am}$, so $C_m$ equals $\neg C'_{am}$.
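The negative border can be computed directly from its definition on a small item universe. This is a brute-force sketch for illustration only: `negative_border` is an assumed helper name, and enumerating the whole powerset would not be practical on real data.

```python
from itertools import combinations

ITEMS = "ABCD"

def powerset(items):
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def negative_border(c_am, items=ITEMS):
    """Minimal itemsets (w.r.t. inclusion) that do not satisfy c_am."""
    failing = [s for s in powerset(items) if not c_am(s)]
    return {s for s in failing if not any(t < s for t in failing)}

# C'_am(S): B is not in S, whose negation is the monotone constraint "B in S".
border = negative_border(lambda s: "B" not in s)
print(sorted("".join(sorted(s)) for s in border))

# C_size(S): |S| <= 2; the border is made of the 3-itemsets.
size_border = negative_border(lambda s: len(s) <= 2)
print(max(len(s) for s in size_border))
```

The maximal size of a border element, printed last, is the quantity that the generation procedure uses to switch between its generation modes.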

Monotone Constraints

Generation procedure

Notation:

  • $C_{am}$ denotes an anti-monotone constraint.
  • $C_m$ ($= \neg C'_{am}$) denotes a monotone constraint.
  • $Bd_{C'_{am}}$ denotes the collection of the minimal itemsets that do not satisfy $C'_{am}$.

Definitions:

  • $generate_1(L_k) = \{A \cup B$, where $A \in L_k$ and $B$ is a 1-itemset$\}$
  • $generate_2(L_k) = \{A \cup B$, where $A, B \in L_k\}$
  • Assume $C = C_{am} \wedge \neg C'_{am}$ and $ms = \max_{S \in Bd_{C'_{am}}} |S|$
  • $generate_m(L_0) = Bd_{C'_{am}} \cap Items_1$
  • For $k \geq 1$:
      • if $k < ms$: $generate_m(L_k) = generate_1(L_k) \cup (Bd_{C'_{am}} \cap Items_{k+1})$
      • if $k = ms$: $generate_m(L_k) = generate_1(L_k)$
      • if $k > ms$: $generate_m(L_k) = generate_2(L_k)$

This generation procedure $generate_m$ is complete and ensures that every candidate itemset verifies $C_m$ ($= \neg C'_{am}$). We do not need to verify the monotone constraint $C_m$ after this generation procedure.

Monotone Constraints

Pruning procedure $prune_m$

  • For all $S \in C^g_{k+1}$ and for all $S' \subset S$ such that $|S'| = k$ do:
    if $S' \notin L_k$ and $C_m(S') = true$ then delete $S$ from $C^g_{k+1}$.
  • $prune_m$ is correct and complete.

The procedure is correct because it does not prune any itemset that verifies $C = C_{am} \wedge \neg C'_{am}$. Its completeness means that if an itemset is not pruned, then every proper subset of that itemset verifies $C_{am}$.


Generic Algorithm

For a constraint $C = C_{am} \wedge C_m = C_{am} \wedge \neg C'_{am}$, the generic algorithm uses the structure of Apriori and the procedures $generate_m$ and $pruning_m$ (the pruning procedure $prune_m$).

    1. $C^g_1 := Bd_{C'_{am}} \cap Items_1$; $L_0 := \{\emptyset\}$
    2. $k := 1$
    3. while $C^g_k \neq \emptyset$ do
    4.     Phase 1, candidate safe pruning: $C_k := pruning_m(C^g_k, L_{k-1})$
    5.     Phase 2, anti-monotone constraint checking: $L_k := SAT_{C_{am}}(C_k)$
    6.     Phase 3, candidate generation for level k+1: $C^g_{k+1} := generate_m(L_k)$
    7.     $k := k + 1$
    8. output $\bigcup_{i=1}^{k-1} L_i$

For comparison, the Apriori algorithm:

    1. $C^g_1 := Items_1$; $L_0 := \{\emptyset\}$
    2. $k := 1$
    3. while $C^g_k \neq \emptyset$ do
    4.     Phase 1, candidate safe pruning: $C_k :=$ safe-pruning-on($C^g_k$, $L_{k-1}$)
    5.     Phase 2, frequency constraint: $L_k := SAT_{C_{freq}}(C_k)$
    6.     Phase 3, candidate generation: $C^g_{k+1} := generate_{apriori}(L_k)$
    7.     $k := k + 1$
    8. output $\bigcup_{i=1}^{k-1} L_i$

Generic Algorithm - example

Constraints: $C_{am} \equiv A \notin S$, $C_m \equiv B \in S$

Database:

    TID   Items
    1     ABCD
    2     AC
    3     AC
    4     ABCD
    5     BC
    6     ABC

Procedures used:

  • $generate_1(L_k) = \{A \cup B$, where $A \in L_k$ and $B$ is a 1-itemset$\}$
  • $generate_2(L_k) = \{A \cup B$, where $A, B \in L_k\}$
  • if $k < ms$: $generate_m(L_k) = generate_1(L_k) \cup (Bd_{C'_{am}} \cap Items_{k+1})$
  • if $k = ms$: $generate_m(L_k) = generate_1(L_k)$
  • if $k > ms$: $generate_m(L_k) = generate_2(L_k)$
  • Pruning: for all $S \in C^g_{k+1}$ and for all $S' \subset S$ such that $|S'| = k$ do: if $S' \notin L_k$ and $C_m(S') = true$ then delete $S$ from $C^g_{k+1}$.

Trace:

  • $Bd_{C'_{am}} = \{B, AB\}$, $ms = \max_{S \in Bd_{C'_{am}}} |S| = 2$
  • $C_1 = \{B\}$, $L_1 = \{B\}$
  • $C_2 = \{AB, BC, BD\} \cup \{AB\} = \{AB, BC, BD\}$, $L_2 = \{BC, BD\}$
  • $C_3 = \{ABC, BCD, ABD\}$, $L_3 = \{BCD\}$
  • Output: $\{B, BC, BD, BCD\}$
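The whole procedure can be sketched in Python for this example. The sketch makes several assumptions: all function names are illustrative, the border is computed from its definition (minimal itemsets violating $C'_{am}$, i.e. minimal itemsets satisfying $C_m$), so intermediate candidate sets may differ from the trace above, and, as in the trace, no frequency constraint is applied.

```python
from itertools import combinations

ITEMS = "ABCD"

def c_am(s): return "A" not in s    # anti-monotone constraint C_am
def c_m(s): return "B" in s         # monotone constraint C_m = not C'_am

def border():
    """Bd of C'_am: the minimal itemsets that satisfy C_m."""
    sat = [frozenset(c) for k in range(1, len(ITEMS) + 1)
           for c in combinations(ITEMS, k) if c_m(frozenset(c))]
    return {s for s in sat if not any(t < s for t in sat)}

BD = border()
MS = max(len(s) for s in BD)

def generate_m(level, k):
    gen1 = {a | {i} for a in level for i in ITEMS if i not in a}   # generate_1
    if k < MS:
        return gen1 | {s for s in BD if len(s) == k + 1}
    if k == MS:
        return gen1
    return {a | b for a in level for b in level if len(a | b) == k + 1}  # generate_2

def prune_m(cands, prev, k):
    """Delete S if a (k-1)-subset satisfies C_m but is missing from L_{k-1}."""
    def pruned(s):
        return any(c_m(frozenset(sub)) and frozenset(sub) not in prev
                   for sub in combinations(s, k - 1))
    return {s for s in cands if not pruned(s)}

def generic():
    cand = {s for s in BD if len(s) == 1}
    prev, out, k = {frozenset()}, set(), 1
    while cand:
        cand = prune_m(cand, prev, k)
        level = {s for s in cand if c_am(s)}   # anti-monotone check
        out |= level
        cand = generate_m(level, k)
        prev, k = level, k + 1
    return out

print(sorted("".join(sorted(s)) for s in generic()))
```

The monotone constraint is never tested on the candidates themselves: it holds by construction because every candidate is grown from a border element.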


CLOSE algorithm - frequent closed itemset mining

  • The closure of an itemset S, closure(S), is the maximal superset of S which has the same support as S.
  • A closed itemset is an itemset that is equal to its closure.
  • The set of closed itemsets forms a lattice called the closed itemset lattice.
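A closure can be computed as the intersection of all transactions that contain the itemset. The sketch below applies this to the six-transaction example used throughout the deck; the names `closure` and `is_closed` are illustrative, and returning the full item set for an itemset that occurs in no transaction is a convention assumed here.

```python
# The six transactions of the running example.
DB = ["ABCD", "AC", "AC", "ABCD", "BC", "ABC"]
ITEMS = "ABCD"

def closure(s):
    """Largest superset of s contained in every transaction that contains s."""
    covering = [set(t) for t in DB if set(s) <= set(t)]
    if not covering:                 # assumed convention for unsupported itemsets
        return set(ITEMS)
    return set.intersection(*covering)

def is_closed(s):
    """A closed itemset is equal to its closure."""
    return closure(s) == set(s)

print(sorted(closure("AB")), sorted(closure("A")), is_closed("AC"))
```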


CLOSE algorithm

  • We can consider CLOSE as an exploration of the classical itemset lattice with a new constraint.
  • A constraint for CLOSE: free itemsets are itemsets that are not included in the closure of any of their proper subsets. Equivalently, free itemsets are the itemsets that verify $C'_{Free}$:
    $C'_{Free}(S) \equiv S' \subset S \Rightarrow S \not\subseteq closure(S')$
  • We can use this constraint in our generic algorithm together with other constraints to achieve constrained free-set mining.

CLOSE algorithm

Example

    TID   Items
    1     ABCD
    2     AC
    3     AC
    4     ABCD
    5     BC
    6     ABC

  • closure(AB): items A and B are simultaneously present in transactions 1, 4 and 6. Item C is the only other item that is also present in these three transactions, thus closure(AB) = ABC.
  • closure(A) = AC and closure(B) = BC, so $AB \not\subseteq closure(A)$ and $AB \not\subseteq closure(B)$. Therefore $C'_{Free}(AB)$ is true.
  • If the frequency threshold is $r = 1/3$, then $SAT_{C_{freq} \wedge C'_{Free}} = \{\emptyset^{C}, A^{C}, B^{C}, D^{ABC}, AB^{C}\}$, where $AB^{C}$ means that $C'_{Free}(AB) = true$ and $closure(AB) = ABC$.
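Freeness can be tested by brute force on the example database, which makes it easy to list the frequent free itemsets for any threshold. The sketch is illustrative only: the helper names are assumptions, itemsets are written as sorted strings with the empty string standing for the empty itemset, and exact fractions avoid rounding issues at the threshold.

```python
from fractions import Fraction
from itertools import combinations

DB = ["ABCD", "AC", "AC", "ABCD", "BC", "ABC"]
ITEMS = "ABCD"

def cover(s):
    """Transactions (as sets) that contain every item of s."""
    return [set(t) for t in DB if set(s) <= set(t)]

def closure(s):
    ts = cover(s)
    return set.intersection(*ts) if ts else set(ITEMS)

def is_free(s):
    """C'_Free: s is not included in the closure of any proper subset."""
    s = set(s)
    proper = (set(c) for k in range(len(s)) for c in combinations(sorted(s), k))
    return all(not s <= closure(sub) for sub in proper)

def frequent_free(r):
    """Itemsets that are both frequent (frequency >= r) and free."""
    sets = ("".join(c) for k in range(len(ITEMS) + 1)
            for c in combinations(ITEMS, k))
    return {s for s in sets
            if Fraction(len(cover(s)), len(DB)) >= r and is_free(s)}

print(is_free("AB"), is_free("AC"))
print(sorted(frequent_free(Fraction(1, 3))))
```

AC is not free because it is included in closure(A) = AC, while AB is free because none of closure(∅), closure(A) and closure(B) contains both A and B.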

CLOSE algorithm

  • The constraint $C'_{Free}$ is anti-monotone, and it needs a database pass to be checked.
  • Checking this constraint seems expensive if the closure of every subset of S has to be computed.
  • We can use an equivalent constraint:
    $C_{Free}(S) \equiv (S' \subset S \wedge |S'| = |S| - 1) \Rightarrow S \not\subseteq closure(S')$
    The equivalence means that $C_{Free}(S)$ is true iff $C'_{Free}(S)$ is true.
  • We only need the closure of every subset $S'$ of S of size $|S| - 1$, and then check whether $S \subseteq closure(S')$.
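The claimed equivalence can be checked exhaustively on the example database. This is a sanity check on one small database rather than a proof; the function names are assumptions introduced for the illustration.

```python
from itertools import combinations

DB = ["ABCD", "AC", "AC", "ABCD", "BC", "ABC"]
ITEMS = "ABCD"

def closure(s):
    covering = [set(t) for t in DB if set(s) <= set(t)]
    return set.intersection(*covering) if covering else set(ITEMS)

def c_free_prime(s):
    """C'_Free: test the closure of every proper subset of s."""
    s = set(s)
    subs = (set(c) for k in range(len(s)) for c in combinations(sorted(s), k))
    return all(not s <= closure(sub) for sub in subs)

def c_free(s):
    """C_Free: test only the subsets of size |s| - 1."""
    s = set(s)
    if not s:                        # the empty itemset has no proper subset
        return True
    subs = (set(c) for c in combinations(sorted(s), len(s) - 1))
    return all(not s <= closure(sub) for sub in subs)

itemsets = ["".join(c) for k in range(len(ITEMS) + 1)
            for c in combinations(ITEMS, k)]
print(all(c_free(s) == c_free_prime(s) for s in itemsets))
```

For an itemset of size n the second test looks at n subsets instead of 2^n - 1, which is why the level-wise algorithm only needs the closures computed at the previous level.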

Incorporating constraints into Apriori

Directly using $C = C_{Free} \wedge C_{am} \wedge C_m$ causes two problems:

  • The closures of some candidates of level k are not computed, so it is impossible to check $C_{Free}$ at level k+1.
  • $SAT_{C_{Free} \wedge C_{am} \wedge C_m}$ no longer makes it possible to compute $SAT_{C_{am} \wedge C_m}$.


Incorporating constraints

Assume we replace

  • $C'_{Free}$ with $C'_{Free \wedge C_m}$, where
    $C'_{Free \wedge C_m}(S) \equiv (S' \subset S \wedge C_m(S')) \Rightarrow S \not\subseteq closure(S')$
  • $C_{Free}$ with $C_{Free \wedge C_m}$, where
    $C_{Free \wedge C_m}(S) \equiv (S' \subset S \wedge |S'| = |S| - 1 \wedge C_m(S')) \Rightarrow S \not\subseteq closure(S')$

Then:

  • The constraints $C'_{Free \wedge C_m}$ and $C_{Free \wedge C_m}$ are equivalent and anti-monotone.
  • The set $SAT_{C_{am} \wedge C_m}$ can be efficiently computed, using the same method as in CLOSE, from $SAT_{C_{am} \wedge C_m \wedge C_{Free \wedge C_m}}$, i.e., the output of the generic algorithm with the constraint $C = C_{Free \wedge C_m} \wedge C_{am} \wedge C_m$.

Now we can find free itemsets that verify conjunctions of anti-monotone and monotone constraints.
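The modified freeness constraint only looks at the subsets that satisfy $C_m$. The sketch below illustrates it on the six-transaction example with the hypothetical choice $C_m \equiv B \in S$; the function names are assumptions, and the final check of the equivalence is exhaustive on this one database only.

```python
from itertools import combinations

DB = ["ABCD", "AC", "AC", "ABCD", "BC", "ABC"]
ITEMS = "ABCD"

def closure(s):
    covering = [set(t) for t in DB if set(s) <= set(t)]
    return set.intersection(*covering) if covering else set(ITEMS)

def c_m(s):
    """Hypothetical monotone constraint: B belongs to S."""
    return "B" in s

def c_free_cm(s):
    """C_{Free and Cm}: only (|s|-1)-subsets that satisfy C_m are tested."""
    s = set(s)
    if not s:
        return True
    subs = (set(c) for c in combinations(sorted(s), len(s) - 1))
    return all(not s <= closure(sub) for sub in subs if c_m(sub))

def c_free_cm_prime(s):
    """C'_{Free and Cm}: every proper subset that satisfies C_m is tested."""
    s = set(s)
    subs = (set(c) for k in range(len(s)) for c in combinations(sorted(s), k))
    return all(not s <= closure(sub) for sub in subs if c_m(sub))

itemsets = ["".join(c) for k in range(len(ITEMS) + 1)
            for c in combinations(ITEMS, k)]
print(c_free_cm("C"), c_free_cm("BC"))
print(all(c_free_cm(s) == c_free_cm_prime(s) for s in itemsets))
```

The itemset C is not free in the plain sense, since it lies in closure(∅), but it satisfies the modified constraint because ∅ does not satisfy $C_m$ and is therefore ignored.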


Conclusion

  • Frequent itemset mining can be intractable for a given support threshold and a particular database.
  • Two approaches address this problem: constraint-based itemset mining and condensed representations of frequent itemsets.
  • The generic algorithm can be used to achieve constrained free-set mining when $C = C_{am} \wedge C_m$.