WindMine: Fast and Effective Mining of Web-click Sequences
SDM 2011 Y . Sakurai et al. 1
Yasushi Sakurai (NTT) Lei Li (Carnegie Mellon Univ.) Yasuko Matsubara (Kyoto Univ.) Christos Faloutsos (Carnegie Mellon Univ.)
access count from a business news site
Original web-click sequence
[Figure panels: Source, Mix]
ICA recognizes the components successfully and separately
[Figure panels: PCA, ICA]
Divide a sequence into subsequences of length w
Compute the local components from the window matrix
[Figure: the sequence X = (a, b, c, d, e, f, g, h) over time is divided with w = 2 into the window matrix X̂, from which the local components B are computed]
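The division step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `window_matrix` is hypothetical, and the sketch assumes that each subsequence becomes one row of the window matrix and that trailing values that do not fill a complete window are dropped.

```python
def window_matrix(sequence, w):
    """Split `sequence` into non-overlapping subsequences of length `w`.

    Returns a list of M = len(sequence) // w rows. Dropping an incomplete
    trailing window and storing subsequences as rows are assumptions of
    this sketch, not details taken from the slides.
    """
    if w <= 0:
        raise ValueError("window size w must be positive")
    m = len(sequence) // w  # number of complete subsequences
    return [list(sequence[i * w:(i + 1) * w]) for i in range(m)]


# The slide's example: X = (a, ..., h) with w = 2.
x = list("abcdefgh")
x_hat = window_matrix(x, 2)
print(x_hat)  # [['a', 'b'], ['c', 'd'], ['e', 'f'], ['g', 'h']]
```

With w = 2 the eight-value sequence yields M = 4 subsequences. The local components would then be computed from these rows; that step is not shown here.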
k: number of components
M: number of subsequences
[Equations indexed by i, j and window size w, for i = 1, …, M; j = 1, …, k]
[Figure: the window matrix is partitioned into sub-matrices; partition and ICA steps are applied at Level 1 and again at Level 2 to obtain the local components]
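The two-level flow in this figure can be sketched structurally as follows. This is only an illustration of the partition-then-summarize pattern under stated assumptions: the helper names `partition_rows`, `column_means`, and `two_level` are hypothetical, the partition is assumed to be consecutive blocks of rows, and `column_means` is a simple stand-in that is not ICA. A real ICA routine would take its place.

```python
def partition_rows(matrix, g):
    """Split the rows of `matrix` into consecutive blocks of at most `g` rows."""
    if g <= 0:
        raise ValueError("block size g must be positive")
    return [matrix[i:i + g] for i in range(0, len(matrix), g)]


def column_means(block):
    """Stand-in for the per-block ICA step (NOT ICA): one summary row per block."""
    width = len(block[0])
    return [sum(row[c] for row in block) / len(block) for c in range(width)]


def two_level(matrix, g, summarize):
    """Apply partition + summarize at Level 1, then again on the Level 1 output."""
    level1 = [summarize(b) for b in partition_rows(matrix, g)]
    level2 = [summarize(b) for b in partition_rows(level1, g)]
    return level1, level2


window = [[1, 2], [3, 4], [5, 6], [7, 8]]
level1, level2 = two_level(window, 2, column_means)
print(level1)  # [[2.0, 3.0], [6.0, 7.0]]
print(level2)  # [[4.0, 5.0]]
```

Each block is processed independently of the others at a given level. How the actual method chooses its partitions and merges results across levels is not recoverable from the slide text.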
access count of users
Original sequence
PCA: failed
Anomaly spikes
Weekly pattern
Daily pattern
Q & A site
Weekly pattern
Low activity during sleeping time
Dip at dinner time
Activity increases from morning to night and reaches a peak
job-seeking site
High activity on weekdays (daily access decreases as the weekend approaches)
Workers arrive at their office
Job seeking during a short break
Large spike during the lunch break
Educational site for kids (they visit after school, 3pm)
Website for baby nursery (the main users will be the parents, rather than the babies!)
High activity 8am-11pm on weekdays (business purposes)
The users visit three times a day (early morning, noon, early evening)
The users rarely visit late in the evening (which is indeed good for their health!)
Access count is still high at night, 0am-1am (a healthy diet should include an earlier bedtime!)
Access count increases after meal times