Mind the Gap: Large-Scale Frequent Sequence Mining
Iris Miliaraki Klaus Berberich Rainer Gemulla Spyros Zoupanos Max Planck Institute for Informatics Saarbrücken, Germany
SIGMOD 2013 27th June 2013, New York
Central Park is the best place to be on a sunny day in New York. It was a sunny, beautiful New York City afternoon.
Central Park is the best place to be on a sunny day in New York. Monday was a sunny day in New York. It was a sunny, beautiful New York City afternoon.
Frequent n-gram for σ = 2, γ = 0, λ = 5: sunny day in New York
Frequent n-gram for σ = 3, γ = 0, λ = 2: New York
Frequent subsequence for σ = 3, γ ≥ 2, λ = 3: sunny New York
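The three results above can be reproduced with a brute-force sketch. It reads σ as the minimum number of sentences that must contain a sequence, γ as the maximum number of tokens skipped between consecutive items, and λ as the maximum sequence length; the slides do not spell these meanings out, so that reading is an assumption. The names `tokenize`, `gap_subsequences` and `mine` are illustrative only, and the enumeration is deliberately naive rather than the method presented in the talk.

```python
import re
from collections import Counter

DOCS = [
    "Central Park is the best place to be on a sunny day in New York.",
    "Monday was a sunny day in New York.",
    "It was a sunny, beautiful New York City afternoon.",
]

def tokenize(text):
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def gap_subsequences(tokens, gamma, lam):
    """Distinct subsequences of length 1..lam whose consecutive items
    skip at most gamma tokens in between."""
    found = set()

    def extend(last, seq):
        found.add(seq)
        if len(seq) == lam:
            return
        # The next item may sit at most gamma positions beyond last + 1.
        for nxt in range(last + 1, min(last + gamma + 2, len(tokens))):
            extend(nxt, seq + (tokens[nxt],))

    for start in range(len(tokens)):
        extend(start, (tokens[start],))
    return found

def mine(docs, sigma, gamma, lam):
    """Map each sequence to its support (number of documents containing it),
    keeping only sequences with support >= sigma."""
    support = Counter()
    for doc in docs:
        support.update(gap_subsequences(tokenize(doc), gamma, lam))
    return {seq: count for seq, count in support.items() if count >= sigma}
```

With γ = 0 only contiguous n-grams are counted, so `("sunny", "new", "york")` is missing from `mine(DOCS, 3, 0, 3)`; allowing a gap of 2 makes it appear in all three sentences.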
[Figure: the sequence database D is partitioned by item (item a, item b, ..., item k); each partition runs FSM mining followed by a filter, and the filtered outputs together form the frequent sequences F]
Partition for item a: includes a but not b, c, ..., k
Partition for item b: includes b but not c, d, ..., k
Partition for item k: includes k
Disjoint subsequence sets computed in-parallel and independently
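A minimal sketch of this item-based partitioning follows. It assumes items are ordered by decreasing frequency (ties broken alphabetically, which is my choice) and that each sequence is sent to the partition of every item it contains. `TOY_DB`, `item_order`, `partition` and `pivot` are hypothetical names, and `TOY_DB` is a made-up database, not the one shown on the slides.

```python
from collections import Counter

# Hypothetical toy database; each string is one sequence of single-letter items.
TOY_DB = ["ABC", "CAB", "AAD", "CAD", "BAC", "AB"]

def item_order(db):
    """Rank items by decreasing frequency (number of sequences containing them)."""
    freq = Counter(item for seq in db for item in set(seq))
    ranked = sorted(freq, key=lambda it: (-freq[it], it))
    return {item: rank for rank, item in enumerate(ranked)}

def partition(db, rank):
    """Send each sequence to the partition of every distinct item it contains."""
    parts = {item: [] for item in rank}
    for seq in db:
        for item in set(seq):
            parts[item].append(seq)
    return parts

def pivot(seq, rank):
    """Least frequent item of a sequence; it names the one partition
    responsible for reporting that sequence."""
    return max(seq, key=lambda it: rank[it])
```

Because every sequence has exactly one pivot, the sets of sequences reported by different partitions cannot overlap, which is what allows the partitions to be processed in parallel and independently.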
Example, support σ = 2
[Figure: example sequence database over the items A, B, C, D and its partitions]
Item frequencies: A:6 B:4 C:4 D:2
Frequent sequences: A, B, C, D, AA, AB, AC, AD, BA, BC, CA
Partition D: frequent sequences with D
Partition C: frequent sequences with C but not D
Partition B: frequent sequences with B but not C, D
Partition A: frequent sequences with A but not B, C, D
Sequence sets listed for the partitions: A, D, AD / A, B, C, AC, BC, CA / A, B, C, AA, AB, AC, BA, BC
ensuring each w-partition generates the set of pivot sequences for w
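The requirement can be checked on a small scale. The sketch below assumes that the pivot sequences for w are the frequent sequences whose least frequent item is w, mines every w-partition with a naive gap-constrained miner, and keeps only those pivot sequences. The database `DB` and all function names are made up for illustration; the miner is a stand-in, not the one used in the talk.

```python
from collections import Counter

# Hypothetical toy database; each string is one sequence of single-letter items.
DB = ["ABC", "CAB", "AAD", "CAD", "BAC", "AB"]

def subsequences(seq, gamma, lam):
    """Distinct subsequences of length 1..lam with at most gamma skipped items."""
    found = set()

    def extend(last, sub):
        found.add(sub)
        if len(sub) < lam:
            for nxt in range(last + 1, min(last + gamma + 2, len(seq))):
                extend(nxt, sub + seq[nxt])

    for start in range(len(seq)):
        extend(start, seq[start])
    return found

def mine(db, sigma, gamma, lam):
    """Set of sequences contained in at least sigma sequences of db."""
    support = Counter()
    for seq in db:
        support.update(subsequences(seq, gamma, lam))
    return {s for s, count in support.items() if count >= sigma}

def mine_by_partition(db, sigma, gamma, lam):
    """Mine each w-partition separately and keep only the pivot sequences for w."""
    freq = Counter(item for seq in db for item in set(seq))
    ranked = sorted(freq, key=lambda it: (-freq[it], it))
    rank = {item: r for r, item in enumerate(ranked)}
    result = {}
    for w in rank:
        w_partition = [seq for seq in db if w in seq]
        mined = mine(w_partition, sigma, gamma, lam)
        # Filter step: a sequence belongs to w only if w is its least frequent item.
        result[w] = {s for s in mined if max(s, key=rank.get) == w}
    return result
```

Every input sequence that supports a pivot sequence for w must itself contain w, so the w-partition sees the full support of that sequence; this is why the union of the per-partition results equals the result of mining the whole database at once.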
Partition C (γ = 1, λ = 3): frequent sequences with C but not D
Replace irrelevant items (i.e., items less frequent than the partition's item) with a special blank symbol (_)
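As a rough illustration of this rewrite, the sketch below blanks every item that is less frequent than the partition's item w. The function name `blank_irrelevant` and the hard-coded ordering `RANK` (A most frequent, D least frequent, as in the example's item frequencies) are assumptions made for the example.

```python
# Assumed item order: lower rank means more frequent.
RANK = {"A": 0, "B": 1, "C": 2, "D": 3}

def blank_irrelevant(seq, w, rank=RANK, blank="_"):
    """Replace every item less frequent than w with the blank symbol,
    leaving all positions in place."""
    return "".join(item if rank[item] <= rank[w] else blank for item in seq)

# Partition C: D is irrelevant and becomes a blank.
print(blank_irrelevant("CAD", "C"))
# Partition B: both C and D are irrelevant.
print(blank_irrelevant("ADBC", "B"))
```

Keeping a blank instead of deleting the item preserves positions, which matters under a gap constraint: in "ADC" with γ = 0, deleting D would make A and C look adjacent, whereas "A_C" still shows the gap.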
Example after rewriting, support σ = 2
[Figure: the example sequence database after partition rewriting; irrelevant items appear as blanks (_), and one rewritten sequence carries a count, "A _ A : 2"]
Item frequencies: A:6 B:4 C:4 D:2
Partitions: with D; with C but not D; with B but not C, D; with A but not B, C, D
Sequence sets listed on the slide: A, D, AD / AA / AD / AC, BC, CA / AA, AB, BA
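The entry "A _ A : 2" in the rewritten example suggests that identical rewritten sequences are stored once together with a count. That reading is an assumption, and the sketch below only illustrates the idea; `rewrite_and_aggregate`, `RANK` and the three input sequences are hypothetical.

```python
from collections import Counter

# Assumed item order: lower rank means more frequent.
RANK = {"A": 0, "B": 1, "C": 2, "D": 3}

def rewrite_and_aggregate(partition, w, rank=RANK, blank="_"):
    """Blank out items less frequent than w, then count identical results."""
    rewritten = (
        "".join(item if rank[item] <= rank[w] else blank for item in seq)
        for seq in partition
    )
    return Counter(rewritten)

# In partition A, every item other than A is irrelevant.
aggregated = rewrite_and_aggregate(["ABA", "ACA", "AB"], "A")
```

Here "ABA" and "ACA" both rewrite to "A_A", so the partition stores that sequence once with a count of 2, and a miner can add the count to the support instead of scanning two copies.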