MFI-TransSW+ : Efficiently Mining Frequent Itemsets in Clickstreams - - PowerPoint PPT Presentation


17th International Conference on Electronic Commerce and Web Technologies - EC-Web 2016 MFI-TransSW+ : Efficiently Mining Frequent Itemsets in Clickstreams Franklin Anderson de Amorim Bernardo Pereira Nunes Gisele Rabello Lopes Marco Antonio


slide-1
SLIDE 1

MFI-TransSW+: Efficiently Mining Frequent Itemsets in Clickstreams

Franklin Anderson de Amorim, Bernardo Pereira Nunes, Gisele Rabello Lopes, Marco Antonio Casanova

17th International Conference on Electronic Commerce and Web Technologies - EC-Web 2016

slide-3
SLIDE 3

Agenda

  • 1. Frequent Itemsets and Data Streams
  • 2. MFI-TransSW+ algorithm
  • 3. ClickRec Recommendation System
  • 4. Experiments and results.
slide-4
SLIDE 4

Frequent Itemsets

Transactions: {bread, milk, coffee}, {bread, milk, cheese}, {bread, cheese}

Itemsets (k = 2)    Support
bread, milk         2
bread, coffee       1
milk, coffee        1
bread, cheese       2
milk, cheese        1

An itemset X is frequent if and only if sup(X) ≥ N · s, where N is the number of transactions and s is a user-defined threshold called the minimum support.

With N = 3 and s = 0.5, the threshold is N · s = 1.5, so the frequent 2-itemsets are {bread, milk} and {bread, cheese}.

If a set I of items is frequent, then so is every subset of I.
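
The definition on this slide can be checked with a small script. This is a minimal sketch, not part of the original deck: the function name `frequent_itemsets` and its parameters are assumptions chosen for illustration, and it counts supports by brute force rather than with the bit vectors introduced later.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, k, s):
    """Return {itemset: support} for all k-itemsets with support >= N * s."""
    counts = Counter()
    for transaction in transactions:
        # Every k-subset of a transaction contributes 1 to that subset's support.
        for combo in combinations(sorted(transaction), k):
            counts[frozenset(combo)] += 1
    threshold = len(transactions) * s  # N * s
    return {itemset: c for itemset, c in counts.items() if c >= threshold}


transactions = [
    {"bread", "milk", "coffee"},
    {"bread", "milk", "cheese"},
    {"bread", "cheese"},
]
result = frequent_itemsets(transactions, k=2, s=0.5)
for itemset, support in result.items():
    print(sorted(itemset), support)
```

With the three transactions from the slide, only {bread, milk} and {bread, cheese} reach the threshold of 1.5.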

slide-5
SLIDE 5

Data Streams

{a,b,c},{a,b,d}, {c,d,f},{a,d},{d,e},{b,c,e},{a,f},{a,c,f},{a,b,c}...

Data stream

slide-6
SLIDE 6

Data Stream - Sliding Windows

Data stream: {a,b,c},{a,b,d},{c,d,f},{a,d},{d,e},{b,c,e},{a,f},{a,c,f}, with {a,b,c} arriving next

Sliding window: window size = 6
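
A transaction-based sliding window can be illustrated with a bounded queue. This is a sketch under stated assumptions: `sliding_windows` is a hypothetical helper, transactions are written as strings of item letters, and the window holds the 6 most recent transactions.

```python
from collections import deque


def sliding_windows(stream, size):
    """Yield a snapshot of the window after each arrival once the window is full."""
    window = deque(maxlen=size)
    for transaction in stream:
        # When the deque is full, appending drops the oldest transaction.
        window.append(transaction)
        if len(window) == size:
            yield list(window)


stream = ["abc", "abd", "cdf", "ad", "de", "bce", "af", "acf", "abc"]
windows = list(sliding_windows(stream, size=6))
for w in windows:
    print(w)
```

Each new arrival pushes the oldest transaction out, so only the most recent 6 transactions are ever considered for mining.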

slide-7
SLIDE 7

MFI-TransSW & MFI-TransSW+

slide-8
SLIDE 8

MFI-TransSW (original algorithm)

  • Processes sliding windows
  • Uses bit vectors

bit(x)=101001

slide-9
SLIDE 9

MFI-TransSW

Phases

  • 1. Load window
  • 2. Slide window
  • 3. Generate frequent itemsets
slide-10
SLIDE 10

MFI-TransSW

Loading and sliding window

Data stream: T1=(acd), T2=(bce), T3=(abce), T4=(be)

window size=3

After loading T1: bit(a)=1 bit(b)=0 bit(c)=1 bit(d)=1 bit(e)=0

slide-11
SLIDE 11

MFI-TransSW

Loading and sliding window

Data stream: T1=(acd), T2=(bce), T3=(abce), T4=(be)

window size=3

After loading T1: bit(a)=1 bit(b)=0 bit(c)=1 bit(d)=1 bit(e)=0

Each new transaction is appended with a left bit-shift of every bit vector.
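
The load and slide phases can be sketched with Python integers as bit vectors. This is an illustrative reading of the slides, not the authors' code: `add_transaction` is a hypothetical name, the newest transaction is assumed to occupy the rightmost bit, and items whose vector becomes all zeros are assumed to be dropped.

```python
def add_transaction(bits, transaction, w):
    """Shift every bit vector left by one, keep only w bits, then set the
    rightmost bit for each item of the incoming transaction."""
    mask = (1 << w) - 1
    for item in list(bits):
        bits[item] = (bits[item] << 1) & mask  # the oldest bit falls off
        if bits[item] == 0:
            del bits[item]  # item no longer occurs in the window
    for item in transaction:
        bits[item] = bits.get(item, 0) | 1


bits = {}
for t in ["acd", "bce", "abce"]:  # T1, T2, T3
    add_transaction(bits, t, w=3)
loaded = {item: format(v, "03b") for item, v in sorted(bits.items())}
print(loaded)

add_transaction(bits, "be", w=3)  # T4 arrives, T1 slides out
slid = {item: format(v, "03b") for item, v in sorted(bits.items())}
print(slid)
```

After T1 to T3 the vectors match the next slide (for example bit(a)=101). After T4 arrives, item d disappears because it only occurred in T1.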

slide-12
SLIDE 12

bit(a)=101 bit(b)=011 bit(c)=111 bit(d)=100 bit(e)=011

freq(a)=2 freq(b)=2 freq(c)=3 freq(d)=1 freq(e)=2

MFI-TransSW

window size=3 s=0.5

Mining frequent itemsets

slide-13
SLIDE 13

bit(a)=101 bit(b)=011

MFI-TransSW

freq(a)=2 freq(b)=2

bitwise AND: bit(a <and> b) = bit(a) AND bit(b) = 001, so freq(a <and> b)=1

window size=3 s=0.5

Mining frequent itemsets
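
The mining phase can be sketched as a level-wise search in which the support of an itemset is the number of 1 bits in the AND of its items' bit vectors. This is a minimal sketch under assumptions: the function `mine` and its candidate generation (an Apriori-style join with subset pruning) are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def mine(bits, w, s):
    """Return {itemset: support} for all itemsets with support >= w * s."""
    min_count = w * s

    def support(itemset):
        vector = (1 << w) - 1
        for item in itemset:
            vector &= bits[item]  # bitwise AND of the items' bit vectors
        return bin(vector).count("1")

    frequent = {}
    level = [frozenset([i]) for i in sorted(bits) if support([i]) >= min_count]
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k = len(level[0]) + 1
        # Join frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Keep a candidate only if all its (k-1)-subsets are frequent.
        level = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_count
        ]
    return frequent


bits = {"a": 0b101, "b": 0b011, "c": 0b111, "d": 0b100, "e": 0b011}
frequent = mine(bits, w=3, s=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print("".join(sorted(itemset)), sup)
```

With window size 3 and s=0.5, the itemset {a, b} has support 1 and is rejected, as on the slide, while {b, c, e} has support 2 and is kept.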

slide-14
SLIDE 14
  • Fast
  • Finds all frequent itemsets
  • No false positives or false negatives
  • On-demand generation of frequent itemsets

  • Small memory footprint

MFI-TransSW

slide-15
SLIDE 15

MFI-TransSW+

Clickstream: (user-1,a),(user-2,b),(user-3,a),(user-2,c),(user-3,b)
Transactions: ({a}),({b,c}),({a,b})

List of UIDs: user-1, user-2, user-3

bit(a) 1 0 1
bit(b) 0 1 1
bit(c) 0 1 0

After the click (user-2,a) arrives:

Transactions: ({a}),({a,b,c}),({a,b})

bit(a) 1 1 1

slide-16
SLIDE 16

MFI-TransSW+

Clickstream: (user-1,a),(user-2,b),(user-3,a),(user-2,c),(user-3,b),(user-2,a), with (user-4,b) arriving next

window size=3

List of UIDs: user-1, user-2, user-3; user-4 takes the place of user-1

List of Bit Vectors per User: user-2: 0,1,2; user-3: 0,1

[Figure: bit vectors bit(a), bit(b), bit(c) while the position of user-1 is cleaned and reused for user-4]

slide-17
SLIDE 17
  • Processes clickstreams
  • Uses bit vectors as circular lists
  • More efficient “clean and update”
  • Faster

MFI-TransSW+
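
The slides give only an outline of MFI-TransSW+, so the following is a hedged sketch of one way the three structures (list of UIDs, list of bit vectors per user, circular bit vectors) could fit together. The class `ClickWindow`, its method names, and the eviction policy (the oldest position is reused, and a repeated click does not refresh a user's position) are all assumptions made for illustration.

```python
class ClickWindow:
    """Window over the w most recently admitted users of a clickstream."""

    def __init__(self, w):
        self.w = w
        self.slot_uid = [None] * w                   # list of UIDs, by position
        self.slot_items = [set() for _ in range(w)]  # bit vectors touched per user
        self.uid_slot = {}                           # UID -> position
        self.bits = {}                               # item -> bit vector (int)
        self.next_slot = 0                           # circular write position

    def click(self, uid, item):
        slot = self.uid_slot.get(uid)
        if slot is None:
            slot = self.next_slot
            self._clean(slot)
            self.slot_uid[slot] = uid
            self.uid_slot[uid] = slot
            self.next_slot = (slot + 1) % self.w     # advance circularly, no shifting
        self.bits[item] = self.bits.get(item, 0) | (1 << slot)
        self.slot_items[slot].add(item)

    def _clean(self, slot):
        old_uid = self.slot_uid[slot]
        if old_uid is None:
            return
        # Only the vectors the evicted user touched need to be updated.
        for item in self.slot_items[slot]:
            self.bits[item] &= ~(1 << slot)
            if self.bits[item] == 0:
                del self.bits[item]
        self.slot_items[slot].clear()
        del self.uid_slot[old_uid]

    def vector(self, item):
        """Bits of an item, listed by position 0..w-1."""
        return [(self.bits.get(item, 0) >> i) & 1 for i in range(self.w)]


window = ClickWindow(w=3)
for uid, item in [("user-1", "a"), ("user-2", "b"), ("user-3", "a"),
                  ("user-2", "c"), ("user-3", "b"), ("user-2", "a")]:
    window.click(uid, item)
before = {i: window.vector(i) for i in "abc"}
print(before)

window.click("user-4", "b")  # window is full: user-1's position is reused
after = {i: window.vector(i) for i in "abc"}
print(after)
```

In this sketch no vector is shifted when the window slides. Only the evicted user's bits are cleared, which is one plausible reading of the “clean and update” bullet.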

slide-18
SLIDE 18

ClickRec

slide-19
SLIDE 19

A real-time news article recommendation system based on web clickstreams and semantic annotations.

ClickRec

slide-20
SLIDE 20

ClickRec

Input: Clickstream

  • 1) Data Streams Processor (MFI-TransSW+)
  • 2) Frequent Itemsets Miner (MFI-TransSW+)
  • 3) Recommender (ClickRec)

slide-21
SLIDE 21

(user-1, {a,b,c})

ClickRec

(user-1, {<tag1>, <tag2>,<tag3>,<tag4>})

slide-22
SLIDE 22

(user-1, {a,b,c})

ClickRec

(user-1, {<neymar>, <messi>,<c.ronaldo>,<barcelona>})

slide-23
SLIDE 23

ClickRec

[Figure: TF-IDF applied to tag sets; frequent itemsets over tags such as <neymar>, <messi>, <c.ronaldo>, <barcelona>, <chelsea>, <robben>]

slide-24
SLIDE 24

Experiments

slide-25
SLIDE 25
  • 1. Real-world clickstream from one of the largest news Web sites in Brazil
  • 2. Total = 24 hours of clickstream = 25 million “clicks” (pageviews)
  • 3. Two editorial sections: sports and entertainment

Experiments

slide-26
SLIDE 26
  • 1. Load a window with w transactions
  • 2. Slide the window 10k times
  • 3. Measure the time taken by step 2

Experiments

MFI-TransSW vs MFI-TransSW+

slide-27
SLIDE 27

MFI-TransSW vs MFI-TransSW+

Execution time (seconds), Window Size = 1,000

MFI-TransSW     41.45
MFI-TransSW+    0.41

102x faster

slide-28
SLIDE 28

MFI-TransSW vs MFI-TransSW+

Window Size    Times faster
1,000          102x
2,000          216x
3,000          286x
4,000          337x
5,000          413x
6,000          476x
7,000          521x
8,000          623x
9,000          666x
10,000         816x

slide-29
SLIDE 29

Experiments

Execution time (seconds)

Window size    MFI-TransSW    MFI-TransSW+
1,000          41.45          0.41
2,000          136.74         0.63
3,000          272.24         0.95
4,000          395.55         1.18
5,000          533.10         1.29
6,000          761.31         1.60
7,000          996.10         1.91
8,000          1,295.16       2.08
9,000          1,484.10       2.23
10,000         1,928.76       2.36

MFI-TransSW vs MFI-TransSW+
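
The speedup factors follow from the reported execution times by simple division. The snippet below recomputes them as a sanity check. Because the published times are rounded to two decimals, the recomputed ratios can differ by one or two units from the factors shown on the slides.

```python
# Execution times in seconds: window size -> (MFI-TransSW, MFI-TransSW+)
times = {
    1000: (41.45, 0.41),
    2000: (136.74, 0.63),
    3000: (272.24, 0.95),
    4000: (395.55, 1.18),
    5000: (533.10, 1.29),
    6000: (761.31, 1.60),
    7000: (996.10, 1.91),
    8000: (1295.16, 2.08),
    9000: (1484.10, 2.23),
    10000: (1928.76, 2.36),
}

speedups = {w: old / new for w, (old, new) in times.items()}
for w, factor in speedups.items():
    print(f"{w:>6}: {factor:.0f}x")
```

The ratio grows with the window size, from roughly 100x at 1,000 transactions to roughly 800x at 10,000.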

slide-30
SLIDE 30
  • 1. Divide the clickstream into pairs of two consecutive hours
  • A. The first hour is used to mine the frequent itemsets
  • B. The second hour is used to extract a sample of 10k users (the sampled users must have accessed more than one page)
  • 2. Test recommendations
  • C. Feed the first page accessed by the user to ClickRec, which recommends 10 pages to the user
  • D. Verify whether the user accessed one of the recommendations

Experiments

ClickRec
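
The evaluation protocol above can be sketched as a hit-rate computation. This is an illustration only: `hit_rate`, the `recommend` callback, and the toy lookup table standing in for ClickRec are hypothetical, and each session is assumed to be the ordered list of pages one sampled user accessed in the second hour.

```python
def hit_rate(sessions, recommend, k=10):
    """Fraction of users who later accessed at least one of the k pages
    recommended from the first page they accessed."""
    if not sessions:
        return 0.0
    hits = 0
    for pages in sessions:
        first_page, later_pages = pages[0], set(pages[1:])
        recommendations = recommend(first_page)[:k]
        if later_pages.intersection(recommendations):
            hits += 1
    return hits / len(sessions)


# Toy stand-in for the recommender: a fixed lookup table (hypothetical data).
toy_table = {"p1": ["p2", "p7"], "p4": ["p9"]}
sessions = [["p1", "p2"], ["p1", "p3"], ["p4", "p5", "p6"]]
rate = hit_rate(sessions, lambda page: toy_table.get(page, []))
print(f"hit rate: {rate:.1%}")
```

A user counts as a hit if any later page appears among the recommendations, which matches step D.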

slide-31
SLIDE 31

Experiments

ClickRec

[Chart: hit rate (axis from 0% to 40%) for the sports editorial, for the hour pairs 0:00 vs 1:00, 6:00 vs 7:00, 12:00 vs 13:00 and 18:00 vs 19:00; periods: Morning, Afternoon, Night, Late Night]

slide-32
SLIDE 32

Experiments

ClickRec

[Chart: hit rate (axis from 0% to 50%) for the entertainment editorial, for the hour pairs 0:00 vs 1:00, 6:00 vs 7:00, 12:00 vs 13:00 and 18:00 vs 19:00; periods: Morning, Afternoon, Night, Late Night]

slide-33
SLIDE 33

Conclusion

slide-34
SLIDE 34

Conclusion

MFI-TransSW+

  • Processes clickstreams
  • Uses bit vectors as circular lists
  • Up to 2 orders of magnitude faster than the original algorithm (MFI-TransSW)

slide-35
SLIDE 35

Conclusion

ClickRec

  • Based on MFI-TransSW+
  • Uses semantic annotations
  • Generates recommendations in real time

  • Hit rate > 20%
slide-36
SLIDE 36

References

[Agrawal et al. 1994] AGRAWAL, R.; SRIKANT, R. Fast Algorithms for Mining Association Rules. Proc. 20th Int. Conf. Very Large Data Bases, VLDB, p. 1–32, 1994.

[Chi et al. 2006] CHI, Y.; WANG, H.; YU, P. S.; MUNTZ, R. R. Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowledge and Information Systems, 10(3):265–294, 2006.

[Li et al. 2009] LI, H.-F.; LEE, S.-Y. Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Systems with Applications, 36(2):1466–1477, 2009.

slide-37
SLIDE 37

Thanks!