Constrained Frequent Pattern Mining: A Pattern-Growth View

by Jian Pei and Jiawei Han presentation by Rafal Rak

CMPUT 695 presentation

November 16, 2004 Rafal Rak, CMPUT695 presentation 2

Scope

Constrained Frequent Pattern Mining: A Pattern-Growth View

Frequent Pattern Mining
Constraints
Pattern-Growth Approach

Outline

Background
Categories of Constraints
Pattern-growth Method
  Constrained Frequent Pattern Mining
  Constrained Sequential Pattern Mining
Conclusion

Pushing Constraints

[Diagram: DB -> Pattern Mining -> Rules -> Constraints -> Interesting Rules]

Drawbacks

Pattern mining: inefficient
Rules: ineffective

Pushing Constraints (2)

[Diagram: DB + Constraints -> Pattern Mining -> Interesting Rules]

Efficient and effective

Feasible?

Apriori Approach

Apriori anti-monotone property: if a pattern is not frequent, its super-pattern can never be frequent.

Transaction DB

TID  Items
10   a, b, c, d, f
20   b, c, d, f, g, h
30   a, c, d, e, f
40   c, e, f, g

Item values

Item   a   b  c    d   e    f   g   h
Value  40  0  -20  10  -30  30  20  -10

min(S)>15 is anti-monotone: min(df)<15 => min(adf)<15

max(S)>35 is not anti-monotone: max(df)<35 but max(adf)>35
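The two checks on this slide can be reproduced by brute force. The following is a minimal sketch, not part of the original deck: the helper name `anti_monotone_counterexample` is hypothetical, and the value b = 0 is an assumption that is consistent with b's position between d and h in the ordered item tables.

```python
from itertools import combinations

# Item values from the slide; b = 0 is an assumption (see lead-in).
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def all_itemsets(items):
    """Yield every non-empty itemset as a frozenset."""
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            yield frozenset(combo)

def anti_monotone_counterexample(constraint):
    """Return (S, S plus one item) where S violates but the superset satisfies, else None."""
    for s in all_itemsets(VALUE):
        if constraint(s):
            continue
        for extra in sorted(set(VALUE) - s):
            if constraint(s | {extra}):
                return s, s | {extra}
    return None

def min_gt_15(s):
    return min(VALUE[i] for i in s) > 15

def max_gt_35(s):
    return max(VALUE[i] for i in s) > 35

print(anti_monotone_counterexample(min_gt_15))  # None: a violation carries over to all supersets
print(anti_monotone_counterexample(max_gt_35))  # a pair (S, superset): not anti-monotone
```

Checking single-item extensions is enough here, because any superset can be reached by adding one item at a time.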

Categories of Constraints

Application point of view

Item constraint

e.g. dairy products in a grocery store

Length constraint

e.g. at least 5 keywords in documents

Model-based constraint

e.g. travel agency: after visiting Washington and NYC, what’s next?

Aggregate constraint

e.g. avg(price of items) > $100

Categories of Constraints (2)

Properties point of view

Anti-monotone

e.g. min(S)>v

Monotone

e.g. max(S)>v

Succinct

Convertible Constraints

Anti-monotone Constraint

When an itemset violates the constraint, so does every superset of it.

Transaction DB

TID  Items
10   a, b, c, d, f
20   b, c, d, f, g, h
30   a, c, d, e, f
40   c, e, f, g

Item values

Item   a   b  c    d   e    f   g   h
Value  40  0  -20  10  -30  30  20  -10

min(S)>15 is anti-monotone: min(df)<15 => min(adf)<15

min(S)<15 is not anti-monotone: min(af)>15, but min(adf)<15
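An anti-monotone constraint lets a miner stop growing an itemset as soon as it violates the constraint. The sketch below illustrates that pruning on the item values alone (support is ignored); the function name `grow_with_pruning` is hypothetical and b = 0 is an assumed value.

```python
# Item values from the slide; b = 0 is an assumption.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def grow_with_pruning(constraint, items):
    """Enumerate satisfying itemsets, never extending an itemset that violates the constraint."""
    ordered = sorted(items)
    found, checks = [], 0
    stack = [((), 0)]  # (current itemset, index of the next item that may be appended)
    while stack:
        current, start = stack.pop()
        for i in range(start, len(ordered)):
            candidate = current + (ordered[i],)
            checks += 1
            if constraint(candidate):
                found.append(candidate)
                stack.append((candidate, i + 1))
            # A violating candidate is dropped: none of its extensions is generated.
    return found, checks

def min_gt_15(itemset):
    return min(VALUE[i] for i in itemset) > 15

found, checks = grow_with_pruning(min_gt_15, VALUE)
print(sorted("".join(s) for s in found))
print(checks, "constraint checks instead of", 2 ** len(VALUE) - 1)
```

Only a, f and g have values above 15, so the satisfying itemsets are the non-empty subsets of {a, f, g}, and far fewer than all 255 candidates are ever tested.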

Monotone Constraint

When an itemset satisfies the constraint, so does every superset of it.

Transaction DB

TID  Items
10   a, b, c, d, f
20   b, c, d, f, g, h
30   a, c, d, e, f
40   c, e, f, g

Item values

Item   a   b  c    d   e    f   g   h
Value  40  0  -20  10  -30  30  20  -10

max(S)>35 is monotone: max(af)>35 => max(adf)>35

max(S)<35 is not monotone: max(df)<35, but max(adf)>35
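The monotone property can be verified exhaustively on this small item set. This is an illustrative sketch only; `is_monotone` is a hypothetical helper and b = 0 is an assumed value.

```python
from itertools import combinations

# Item values from the slide; b = 0 is an assumption.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def is_monotone(constraint):
    """True if every satisfying itemset still satisfies after adding any single item."""
    items = sorted(VALUE)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if constraint(s) and not all(constraint(s | {x}) for x in items):
                return False
    return True

def max_gt_35(s):
    return max(VALUE[i] for i in s) > 35

def max_lt_35(s):
    return max(VALUE[i] for i in s) < 35

print(is_monotone(max_gt_35))  # True
print(is_monotone(max_lt_35))  # False, e.g. df satisfies but adf does not
```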

Succinct Constraint

If it is possible to explicitly and precisely generate all the itemsets satisfying the constraint, then the constraint is succinct.

Transaction DB

TID  Items
10   a, b, c, d, f
20   b, c, d, f, g, h
30   a, c, d, e, f
40   c, e, f, g

Item values

Item   a   b  c    d   e    f   g   h
Value  40  0  -20  10  -30  30  20  -10

max(S)>15 is succinct: the satisfying itemsets are exactly those containing at least one of a, f, g

avg(S)<10 is not succinct
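For max(S)>15 the satisfying itemsets can be written down directly from the item values, without testing candidates one by one. The sketch below shows one way to do that; `generate_max_gt` is a hypothetical name and b = 0 is an assumed value.

```python
from itertools import chain, combinations

# Item values from the slide; b = 0 is an assumption.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

def powerset(items):
    """All subsets of items, including the empty one, as tuples."""
    ordered = sorted(items)
    return chain.from_iterable(combinations(ordered, k) for k in range(len(ordered) + 1))

def generate_max_gt(threshold):
    """Build every itemset with max(S) > threshold: at least one 'strong' item plus anything else."""
    strong = {i for i, v in VALUE.items() if v > threshold}  # a, f, g for threshold 15
    weak = set(VALUE) - strong
    return {
        frozenset(core) | frozenset(rest)
        for core in powerset(strong) if core
        for rest in powerset(weak)
    }

generated = generate_max_gt(15)
print(len(generated))  # (2**3 - 1) * 2**5 itemsets
```

No comparable recipe exists for avg(S)<10, because whether an itemset qualifies depends on the combination of all its values rather than on membership of particular items.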

Convertible Anti-monotone Constraint

A constraint is convertible anti-monotone if there is an order on items such that whenever an itemset satisfies the constraint, so does every prefix of it.

Transaction DB

TID  Items
10   a, b, c, d, f
20   b, c, d, f, g, h
30   a, c, d, e, f
40   c, e, f, g

Item values, in value-descending order

Item   a   f   g   d   b  h    c    e
Value  40  30  20  10  0  -10  -20  -30

avg(S)>25 is convertible anti-monotone under this order: avg(afg)>25 => avg(af)>25, avg(a)>25, avg(f)>25

avg(S)<32 is not anti-monotone under this order: avg(afg)<32, but avg(af)>32
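The prefix condition can be checked by brute force once the items are listed in value-descending order. This is a sketch under assumptions: the helper name `satisfying_sets_keep_prefixes` is hypothetical and b = 0 is an assumed value.

```python
from itertools import combinations

# Item values from the slide; b = 0 is an assumption.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
DESC = sorted(VALUE, key=VALUE.get, reverse=True)  # a, f, g, d, b, h, c, e

def avg(items):
    return sum(VALUE[i] for i in items) / len(items)

def satisfying_sets_keep_prefixes(constraint, order):
    """True if every itemset that satisfies the constraint has only satisfying prefixes."""
    for k in range(1, len(order) + 1):
        for itemset in combinations(order, k):  # tuples are already sorted by `order`
            if constraint(itemset) and not all(constraint(itemset[:n]) for n in range(1, k)):
                return False
    return True

print(satisfying_sets_keep_prefixes(lambda s: avg(s) > 25, DESC))  # True
print(satisfying_sets_keep_prefixes(lambda s: avg(s) < 32, DESC))  # False: afg passes, af fails
```

Under a descending order each prefix holds the largest values of the itemset, so its average can only be equal or higher, which is why avg(S)>25 passes and avg(S)<32 does not.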

Convertible Monotone Constraint

A constraint is convertible monotone if there is an order on items such that whenever an itemset violates the constraint, so does every prefix of it.

Transaction DB

TID  Items
10   a, b, c, d, f
20   b, c, d, f, g, h
30   a, c, d, e, f
40   c, e, f, g

Item values, in value-ascending order

Item   e    c    h    b  d   g   f   a
Value  -30  -20  -10  0  10  20  30  40

avg(S)>0 is convertible monotone under this order: avg(ech)<0 => avg(ec)<0, avg(e)<0, avg(c)<0

avg(S)<-22 is not monotone under this order: avg(ech)>-22, but avg(ec)<-22
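The mirrored condition, that violations carry over to prefixes, can be tested the same way with the items in value-ascending order. Again a sketch only: `violating_sets_keep_prefixes` is a hypothetical helper and b = 0 is an assumed value.

```python
from itertools import combinations

# Item values from the slide; b = 0 is an assumption.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
ASC = sorted(VALUE, key=VALUE.get)  # e, c, h, b, d, g, f, a

def avg(items):
    return sum(VALUE[i] for i in items) / len(items)

def violating_sets_keep_prefixes(constraint, order):
    """True if every itemset that violates the constraint has only violating prefixes."""
    for k in range(1, len(order) + 1):
        for itemset in combinations(order, k):  # tuples are already sorted by `order`
            if not constraint(itemset) and any(constraint(itemset[:n]) for n in range(1, k)):
                return False
    return True

print(violating_sets_keep_prefixes(lambda s: avg(s) > 0, ASC))    # True
print(violating_sets_keep_prefixes(lambda s: avg(s) < -22, ASC))  # False: ech fails, ec passes
```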

Classification of Constraints

[Figure: diagram of overlapping constraint classes: anti-monotone, monotone, succinct, convertible anti-monotone, convertible monotone, strongly convertible]

Pattern-growth Method

min_sup=2

Transaction DB

TID  Items
10   a, b, c, d, f
20   b, c, d, f, g, h
30   a, c, d, e, f
40   c, e, f, g

freq. items: a, b, c, d, e, f, g

a-projected DB

TID  Items
10   b, c, d, f
30   c, d, e, f

freq. items: c, d, f

Projection continues with: ac-projected DB, ad-projected DB, af-projected DB, b-projected DB, c-projected DB

final patterns:

a, b, c, d, e, f, g,
ac, ad, af, bc, bd, bf, cd, ce, cf, cg, df, ef, fg,
acd, acf, adf, bcd, bcf, bdf, cdf, cef, cfg,
acdf, bcdf
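The recursion behind the projected databases can be sketched in a few lines. This is a simplified illustration of projection-based pattern growth on plain Python lists, not FP-growth with its FP-tree; the function name `pattern_growth` is hypothetical.

```python
from collections import Counter

# Transaction DB from the slide (TID -> items).
DB = {10: "abcdf", 20: "bcdfgh", 30: "acdef", 40: "cefg"}

def pattern_growth(transactions, min_sup, prefix=()):
    """Recursively mine frequent itemsets from a (projected) list of item tuples."""
    counts = Counter(item for t in transactions for item in t)
    patterns = []
    for item in sorted(i for i, c in counts.items() if c >= min_sup):
        pattern = prefix + (item,)
        patterns.append("".join(pattern))
        # Projected DB: transactions containing `item`, keeping only the items after it.
        projected = [tuple(x for x in t if x > item) for t in transactions if item in t]
        patterns += pattern_growth(projected, min_sup, pattern)
    return patterns

patterns = pattern_growth([tuple(t) for t in DB.values()], min_sup=2)
print(len(patterns), patterns)
```

Each recursive call works on a smaller database that contains only the transactions supporting the current prefix, so no candidate is generated unless its items actually co-occur.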

Constrained Frequent Pattern Mining

avg(S)≥25, min_sup=2

Item values, in value-descending order

Item   a   f   g   d   b  h    c    e
Value  40  30  20  10  0  -10  -20  -30

Transaction DB, items listed in value-descending order

TID  Items
10   a, f, d, b, c
20   f, g, d, b, h, c
30   a, f, d, c, e
40   f, g, c, e

freq. items: a, f, g, d, b, c, e
C(a)=true, C(f)=true, C(g)=false

a-projected DB

TID  Items
10   f, d, b, c
30   f, d, c, e

freq. items: f, d, c
C(af)=true, C(ad)=true, C(ac)=false

f-projected DB

TID  Items
10   d, b, c
20   g, d, b, c
30   d, c, e
40   g, c, e

freq. items: g, d, b, c, e
C(fg)=true, C(fd)=false

af-projected DB

TID  Items
10   d, c
30   d, c

freq. items: d, c
C(afd)=true, C(afc)=false

ad-projected DB

TID  Items
10   c
30   c

freq. items: c
C(adc)=false

fg-projected DB

TID  Items
20   d, b, c
40   c, e

freq. items: c
C(fgc)=false

final patterns: a, f, af, ad, fg, afd
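The walkthrough above can be reproduced with a small recursive miner that orders items by descending value and stops as soon as a prefix fails avg(S)≥25. This is a sketch under assumptions, not the authors' implementation: `mine_avg_at_least` is a hypothetical name and b = 0 is an assumed value.

```python
from collections import Counter

# Item values and transaction DB from the slides; b = 0 is an assumption.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}
DB = {10: "abcdf", 20: "bcdfgh", 30: "acdef", 40: "cefg"}

def mine_avg_at_least(transactions, min_sup, threshold, prefix=()):
    """Pattern growth with items in value-descending order and prefix pruning on avg."""
    counts = Counter(i for t in transactions for i in t)
    frequent = sorted((i for i, c in counts.items() if c >= min_sup),
                      key=VALUE.get, reverse=True)
    rank = {item: r for r, item in enumerate(frequent)}
    patterns = []
    for item in frequent:
        pattern = prefix + (item,)
        if sum(VALUE[i] for i in pattern) / len(pattern) < threshold:
            # C(pattern) = false. Every later item has a smaller value, so neither the
            # remaining siblings nor any extension can satisfy the constraint.
            break
        patterns.append("".join(pattern))
        # Projected DB: transactions with `item`, keeping frequent items ranked after it.
        projected = [tuple(x for x in t if x in rank and rank[x] > rank[item])
                     for t in transactions if item in t]
        patterns += mine_avg_at_least(projected, min_sup, threshold, pattern)
    return patterns

result = mine_avg_at_least([tuple(t) for t in DB.values()], min_sup=2, threshold=25)
print(result)
```

The early `break` is valid only because the order makes the constraint convertible anti-monotone; with an arbitrary item order the same shortcut would miss patterns.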

Sequential Pattern Mining

[Table: Customer ID, Transaction Time, Items Bought. Each customer's transactions (dated Oct. 25 to Nov. 16), taken in time order, form one sequence.]

SID  Sequence
10   <a(bc)e>
20   <e(ab)(bc)dd>
30   <c(aef)(abc)dd>
40   <addcb>

<e(ab)(bc)dd> is a 5-sequence: length = number of transactions, with transactions in order

Sequential Pattern Mining (2)

Goal: Find the complete set of sequential patterns w.r.t. a given sequence DB and a support threshold.

<(ab)d> is a subsequence of <e(ab)(bc)dd> and <c(aef)(abc)dd>, but not of <c(aef)(bc)dd>.

Given a support threshold = 2, <(ab)d> is a sequential pattern.

SID  Sequence
10   <a(bc)e>
20   <e(ab)(bc)dd>
30   <c(aef)(abc)dd>
40   <addcb>
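The subsequence test and the support count can be sketched as follows. The representation (a sequence as a list of item sets) and the names `is_subsequence` and `support` are choices made for this illustration, not taken from the slides.

```python
# Sequence DB from the slide; each sequence is a list of transactions (item sets).
SEQ_DB = {
    10: [{"a"}, {"b", "c"}, {"e"}],
    20: [{"e"}, {"a", "b"}, {"b", "c"}, {"d"}, {"d"}],
    30: [{"c"}, {"a", "e", "f"}, {"a", "b", "c"}, {"d"}, {"d"}],
    40: [{"a"}, {"d"}, {"d"}, {"c"}, {"b"}],
}

def is_subsequence(pattern, sequence):
    """Greedy match: each pattern element must be a subset of a later sequence element."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a strictly later transaction
    return True

def support(pattern, db):
    """Number of sequences in db that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in db.values())

ab_d = [{"a", "b"}, {"d"}]  # the pattern <(ab)d>
print(support(ab_d, SEQ_DB))  # 2, reached through SIDs 20 and 30
```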

More Constraints

Regular expression constraint

e.g. web click stream starting from Yahoo's home page and reaching hotels in NYC: "Travel (New York | New York City) (Hotels | Motels)"

Duration constraint

e.g. long-term investment patterns based on the duration of 1 year between the first and the last items

Gap constraint

e.g. basketball players regularly (every week) on the field, i.e., gap < 2 weeks
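Duration and gap constraints are simple functions of the timestamps of the matched transactions. The sketch below uses hypothetical day numbers and hypothetical helper names (`duration`, `max_gap`); the thresholds of 14 and 365 days are one possible reading of "2 weeks" and "1 year".

```python
def duration(times):
    """Time between the first and the last transaction of an occurrence."""
    return times[-1] - times[0]

def max_gap(times):
    """Largest time difference between two adjacent transactions."""
    return max((later - earlier for earlier, later in zip(times, times[1:])), default=0)

# Hypothetical occurrence of a pattern: day numbers of its matched transactions.
game_days = [0, 7, 13, 20]

satisfies_gap = max_gap(game_days) < 14           # gap < 2 weeks
satisfies_duration = duration(game_days) >= 365   # at least 1 year from first to last

print(satisfies_gap, satisfies_duration)
```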

Constrained Sequential Pattern Mining

<a+{bb|(bc)d|dd}>, min_sup = 2

Sequence DB

SID  Sequence
10   <a(bc)e>
20   <e(ab)(bc)dd>
30   <c(aef)(abc)dd>
40   <addcb>

length-1 patterns: <a>, <b>, <c>, <d>, <e>

<a>-projected DB

SID  Sequence
20   <(_b)(bc)dd>
30   <(_e)(abc)dd>
40   <ddcb>

length-2 patterns: <(ab)>, <ab>, <ac>, <ad>

<ab>-projected DB

SID  Sequence
20   <(_c)dd>
30   <(_c)dd>

length-3 patterns: <a(bc)>, <abd>

<ac>-projected DB

SID  Sequence
20   <dd>
30   <dd>
40   <b>

<ad>-projected DB

SID  Sequence
20   <d>
30   <d>
40   <dcb>

length-3 patterns: <add>

<a(bc)>-projected DB

SID  Sequence
20   <dd>
30   <dd>

length-4 patterns: <a(bc)d>

final patterns: <a(bc)d>, <add>

Conclusion

Pushing constraints deep into the mining process is efficient and effective

Constraints can be classified

Pattern-growth approach is more powerful than the Apriori-based one w.r.t. "tough" constraints

Pattern-growth methods: FP-growth, PrefixSpan

"Tough" constraints can also be applied to depth-first algorithms: First Search, MAFIA, CHARM

Pattern-growth approach can be extended to structured pattern mining (trees, graphs), classification, clustering, outlier analysis

References

1. J. Pei, J. Han. Constrained Frequent Pattern Mining: A Pattern-growth View. ACM SIGKDD Explorations (Special Issue on Constrained Data Mining), 2002.

2. J. Pei, J. Han, and L.V.S. Lakshmanan. Mining Frequent Itemsets with Convertible Constraints. Proc. 2001 Int. Conf. Data Engineering, 2001.

3. J. Pei, J. Han, and W. Wang. Constraint-based Sequential Pattern Mining in Large Databases. Proc. 11th Int. Conf. on Information and Knowledge Management (CIKM'02), 2002.

4. J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. Proc. 2001 Int. Conf. Data Engineering (ICDE'01), 2001.

Thank you! Questions?