MFI-TransSW+: Efficiently Mining Frequent Itemsets in Clickstreams
Franklin Anderson de Amorim, Bernardo Pereira Nunes, Gisele Rabello Lopes, Marco Antonio Casanova
17th International Conference on Electronic Commerce and Web Technologies - EC-Web 2016
Itemsets (k=2) and their support:

Itemset          Support
bread, milk      2
bread, coffee    1
milk, coffee     1
bread, cheese    2
milk, cheese     1
An itemset X is frequent if and only if sup(X) ≥ N · s, where N is the number of transactions and s is a user-defined threshold called the minimum support.
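Example: with N = 3 and s = 0.5, an itemset is frequent if its support is at least N · s = 1.5, that is, if it appears in at least 2 transactions.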
If a set I of items is frequent, then so is every subset of I.
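The definition and the subset property above can be checked with a short sketch. The three transactions below are hypothetical (the slides do not list them); they were chosen only so that the 2-itemset supports agree with the values given earlier. The function names `support` and `frequent_itemsets` are illustrative as well.

```python
from itertools import combinations

# Hypothetical transactions (not given in the slides), chosen so that the
# supports of the 2-itemsets match the values listed earlier.
transactions = [
    {"bread", "milk", "coffee"},
    {"bread", "cheese"},
    {"bread", "milk", "cheese"},
]

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, k, s):
    """All k-itemsets X with sup(X) >= N * s, mapped to their support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for combo in combinations(items, k):
        sup = support(set(combo), transactions)
        if sup >= n * s:
            result[frozenset(combo)] = sup
    return result

singles = frequent_itemsets(transactions, k=1, s=0.5)
pairs = frequent_itemsets(transactions, k=2, s=0.5)

# Subset property: every item of a frequent pair is itself frequent.
subsets_ok = all(frozenset({item}) in singles for pair in pairs for item in pair)
```

With N = 3 and s = 0.5, only {bread, milk} and {bread, cheese} reach the threshold, and each of their single-item subsets is frequent too.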
[Figure: a data stream processed with a sliding window (window size = 3, s = 0.5). The window slides with a left bit-shift, and support is counted with a bitwise AND.]
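The two operations named on these slides, left bit-shift and bitwise AND, can be sketched as follows. This is a minimal illustration assuming one bit vector per item, stored as a Python integer with the newest transaction in the lowest bit; the class name `BitWindow` and its methods are hypothetical and not taken from the paper.

```python
class BitWindow:
    """Sliding window over the last `size` transactions, one bit vector per item."""

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1   # keeps only the lowest `size` bits
        self.count = 0                # transactions currently in the window
        self.bits = {}                # item -> int used as a bit vector

    def add(self, transaction):
        # Left bit-shift every vector: the oldest transaction falls off the
        # high end (cut by the mask) and a fresh 0 enters at the low end.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]   # item no longer occurs in the window
        # Set the lowest bit for each item of the new transaction.
        for item in transaction:
            self.bits[item] = self.bits.get(item, 0) | 1
        self.count = min(self.count + 1, self.size)

    def support(self, itemset):
        # Bitwise AND of the item vectors, then count the 1 bits.
        acc = (1 << self.count) - 1
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")

    def is_frequent(self, itemset, s):
        return self.support(itemset) >= self.count * s

w = BitWindow(size=3)
for t in [{"a"}, {"b", "c"}, {"a", "b"}]:
    w.add(t)
before = dict(w.bits)          # a=101, b=011, c=010 (oldest bit on the left)
sup_ab = w.support({"a", "b"}) # 101 AND 011 = 001
w.add({"b"})                   # window slides: the transaction {a} is dropped
```

Storing vectors as integers makes the shift and the AND single machine-level operations per item, which is the appeal of the bit-vector representation.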
Clickstream: (user-1,a), (user-2,b), (user-3,a), (user-2,c), (user-3,b)
Transactions: ({a}), ({b,c}), ({a,b})
List of UIDs: user-1, user-2, user-3
bit(a) = 1 0 1
bit(b) = 0 1 1
bit(c) = 0 1 0

After the click (user-2,a) arrives:
Clickstream: (user-1,a), (user-2,b), (user-3,a), (user-2,c), (user-3,b), (user-2,a)
Transactions: ({a}), ({a,b,c}), ({a,b})
bit(a) = 1 1 1

[Figure: the List of UIDs and the List of Bit Vectors per User, before and after the click (user-4,b) arrives with window size = 3.]
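The clickstream example can be traced with a small sketch. It assumes that each user in the window owns one bit position, that a click by a user already in the window extends that user's transaction, and that a new user arriving at a full window evicts the oldest user. The eviction policy and the names `ClickWindow`, `click`, `bit_vector` and `support` are assumptions made for illustration, not the authors' implementation.

```python
class ClickWindow:
    """Window over the last `size` distinct users of a clickstream (a sketch)."""

    def __init__(self, size):
        self.size = size
        self.uids = []    # list of UIDs, oldest first; index = bit position
        self.items = {}   # uid -> set of items clicked by that user

    def click(self, uid, item):
        if uid not in self.items:
            if len(self.uids) == self.size:
                # Assumed policy: a full window slides by dropping the oldest user.
                oldest = self.uids.pop(0)
                del self.items[oldest]
            self.uids.append(uid)
            self.items[uid] = set()
        # A click by a user already in the window extends that user's
        # transaction instead of creating a new one.
        self.items[uid].add(item)

    def bit_vector(self, item):
        """One bit per user in window order, as a list of 0/1."""
        return [1 if item in self.items[u] else 0 for u in self.uids]

    def support(self, itemset):
        # AND the item vectors position by position, then count the 1 bits.
        vectors = [self.bit_vector(i) for i in itemset]
        return sum(all(bits) for bits in zip(*vectors))

cw = ClickWindow(size=3)
stream = [("user-1", "a"), ("user-2", "b"), ("user-3", "a"),
          ("user-2", "c"), ("user-3", "b")]
for uid, item in stream:
    cw.click(uid, item)
a_before = cw.bit_vector("a")   # 1 0 1, as on the slide
cw.click("user-2", "a")
a_after = cw.bit_vector("a")    # 1 1 1, as on the slide
cw.click("user-4", "b")         # new user, window full: user-1 is dropped
```

Lists of 0/1 are used here instead of packed integers only to keep the per-user positions easy to read.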
ClickRec components (input: clickstream):
1) Data Streams Processor
2) Frequent Itemsets Miner
3) Recommender
[Figure labels: MFI-TransSW+, ClickRec]
(user-1, {<neymar>, <messi>, <c.ronaldo>, <barcelona>})
[Figure: pages annotated with entities such as <neymar>, <messi>, <c.ronaldo>, <barcelona>, <chelsea> and <robben>, weighted with TF-IDF and mined for frequent itemsets.]
Execution time (seconds) by window size

Window size   MFI-TransSW   MFI-TransSW+   Times faster
 1,000            41.45         0.41          102x
 2,000           136.74         0.63          216x
 3,000           272.24         0.95          286x
 4,000           395.55         1.18          337x
 5,000           533.10         1.29          413x
 6,000           761.31         1.60          476x
 7,000           996.10         1.91          521x
 8,000         1,295.16         2.08          623x
 9,000         1,484.10         2.23          666x
10,000         1,928.76         2.36          816x
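The "times faster" figures can be re-derived from the execution times as a simple ratio. The sketch below uses only the timings reported on the slide; the computed ratios differ from the reported figures by about one unit in places, presumably because the slide values were computed from unrounded timings.

```python
# (window size, MFI-TransSW seconds, MFI-TransSW+ seconds), as reported.
timings = [
    (1000, 41.45, 0.41),
    (2000, 136.74, 0.63),
    (3000, 272.24, 0.95),
    (4000, 395.55, 1.18),
    (5000, 533.10, 1.29),
    (6000, 761.31, 1.60),
    (7000, 996.10, 1.91),
    (8000, 1295.16, 2.08),
    (9000, 1484.10, 2.23),
    (10000, 1928.76, 2.36),
]

# Speedup = time of the baseline divided by time of MFI-TransSW+.
speedups = {window: old / new for window, old, new in timings}

for window, ratio in speedups.items():
    print(f"{window:>6}: {ratio:.0f}x")
```

The ratio grows with the window size, from roughly 100x at 1,000 to roughly 800x at 10,000.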
(the sample users must have accessed more than one page)
which recommends 10 pages to the user
[Figure: hit rate (0% to 40%) for the Sports editorial, comparing 0:00 vs 1:00, 6:00 vs 7:00, 12:00 vs 13:00 and 18:00 vs 19:00; periods: Morning, Afternoon, Night, Late Night.]
[Figure: hit rate (0% to 50%) for the Entertainment editorial, comparing 0:00 vs 1:00, 6:00 vs 7:00, 12:00 vs 13:00 and 18:00 vs 19:00; periods: Morning, Afternoon, Night, Late Night.]
[Agrawal et al. 1994] AGRAWAL, R.; SRIKANT, R. Fast Algorithms for Mining Association Rules. Proc. 20th Int. Conf. Very Large Data Bases, VLDB, p. 1–32, 1994.
[Chi et al. 2006] CHI, Y.; WANG, H.; YU, P. S.; MUNTZ, R. R. Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowledge and Information Systems, 10(3):265–294, 2006.
[Li et al. 2009] LI, H.-F.; LEE, S.-Y. Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Systems with Applications, 36(2):1466–1477, 2009.