Automatic Sequential Pattern Mining in Data Streams
Koki Kawabata*, Yasuko Matsubara & Yasushi Sakurai
ISIR-AIRC, Osaka University
*Supported by SIGIR Student Travel Grants
Motivation
Given: time-evolving data streams, e.g., IoT
CIKM2019 Sakurai Lab. K.Kawabata et al. 2
[Figure: Value vs. Time (ticks 1000 to 5000) for four series: L-arm, R-arm, L-leg, R-leg; patterns labeled 1 to 4]
Pattern sequence: 1, 1, 2, 2, 2, 4, 4, 3,
[Diagram: activities (stand, run, walk) modeled as states state1, state2, state3, with a transition probability between every pair of states, including self-transitions]
[Diagram: the same three-state model plus (+) a second set of nine transition probabilities]
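The slides above describe activities such as stand, run, and walk as states linked by transition probabilities. As a rough illustration only (not the authors' implementation), probabilities of this kind can be estimated from a labeled state sequence by counting consecutive pairs. The function name `estimate_transitions` and the toy sequence are assumptions made for this sketch.

```python
from collections import Counter


def estimate_transitions(sequence, states):
    """Estimate row-normalized transition probabilities by counting pairs."""
    # Count how often each (current, next) pair occurs in the sequence.
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for src in states:
        total = sum(counts[(src, dst)] for dst in states)
        # A state with no observed outgoing transition gets an all-zero row.
        matrix[src] = {
            dst: (counts[(src, dst)] / total if total else 0.0)
            for dst in states
        }
    return matrix


# Hypothetical toy data, not taken from the slides.
states = ["stand", "walk", "run"]
toy_sequence = ["stand", "stand", "walk", "walk", "run", "walk", "stand"]
probs = estimate_transitions(toy_sequence, states)
```

Each row of `probs` sums to 1 whenever the source state has at least one observed outgoing transition.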
[Figure: CostM, CostC, and CostT plotted against (# of r, m), x-axis 1 to 10]
A regime, a segment, a state
How many regimes? How many segments?
Keep compact!
Details in paper
StreamScope (Main)
Optimize/update parameter set
Identify regime transitions & segments
Estimate new regimes
[Diagram: data stream]
Increase segments? (SegmentAssignment) vs. Increase states/regimes? (RegimeGeneration)
Dynamic programming algorithm
[Diagram: sequence of observations with candidate cut points, e.g., candidate sets (2, 3) and (4, 2)]
Keep all candidate cut points
Guarantee
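The slides above mention a dynamic programming algorithm that keeps candidate cut points. The sketch below is a generic Viterbi-style dynamic program, not StreamScope's actual algorithm: it assumes a per-time-step cost for each regime (for example, a negative log-likelihood) and a non-negative penalty for switching regimes. The names `segment_by_dp`, `costs`, and `switch_penalty`, and the toy numbers, are hypothetical.

```python
def segment_by_dp(costs, switch_penalty):
    """Assign one regime per time step, minimizing cost plus switch penalties.

    costs[t][r] is the assumed cost of explaining time step t with regime r.
    switch_penalty must be non-negative.
    Returns (labels, cut_points).
    """
    n, k = len(costs), len(costs[0])
    best = list(costs[0])  # best[r]: minimum cost of a path ending in regime r
    back = []              # back[t - 1][r]: regime at t - 1 on that best path
    for t in range(1, n):
        cheapest = min(range(k), key=lambda r: best[r])
        new_best, pointers = [], []
        for r in range(k):
            stay = best[r]
            switch = best[cheapest] + switch_penalty
            if stay <= switch:
                new_best.append(stay + costs[t][r])
                pointers.append(r)
            else:
                new_best.append(switch + costs[t][r])
                pointers.append(cheapest)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path backwards from the last time step.
    r = min(range(k), key=lambda j: best[j])
    labels = [r]
    for pointers in reversed(back):
        r = pointers[r]
        labels.append(r)
    labels.reverse()

    # A cut point is any time step whose regime differs from the previous one.
    cut_points = [t for t in range(1, n) if labels[t] != labels[t - 1]]
    return labels, cut_points


# Hypothetical toy costs for two regimes over six time steps.
toy_costs = [[0.1, 2.0], [0.2, 1.5], [0.1, 1.8],
             [2.0, 0.1], [1.9, 0.2], [2.2, 0.1]]
labels, cut_points = segment_by_dp(toy_costs, switch_penalty=1.0)
```

With these toy costs the cheapest path stays in regime 0 for three steps and then switches to regime 1, giving a single cut point at time step 3. A larger `switch_penalty` yields fewer, longer segments.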
Phase 1 Phase 2
How successful is it in discovering patterns?
How well does it find cut-points & regimes?
How does it scale in terms of time & memory consumption?
[Figure: Value vs. Time (ticks 1000 to 5000) for L-arm, R-arm, L-leg, R-leg, with discovered patterns 1 to 4]
#1 Going straight
#2 Stretching arms
#3 Stretching right arm
#4 Stretching left arm
Acceleration (X, Y, Z)
[Figure: discovered patterns #1 to #5]
[Figure: Macro-F1 score on #Mocap, #Bicycle, #Workout for StreamScope, AutoPlait, TICC-2, TICC-4, TICC-8, pHMM]
[Figure: Accuracy on #Mocap, #Bicycle, #Workout for StreamScope, AutoPlait, TICC-2, TICC-4, TICC-8, pHMM]
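The comparison above reports Macro-F1 score and accuracy. As a minimal sketch of how these two metrics are commonly computed from per-time-step labels, the code below assumes that predicted pattern labels have already been aligned with the ground-truth labels; the slides do not show how that alignment is done. The function names and toy label sequences are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive for any
        # label that appears in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of time steps whose predicted label matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels, not results from the slides.
toy_true = [1, 1, 2, 2, 2, 4, 4, 3]
toy_pred = [1, 1, 2, 2, 4, 4, 4, 3]
toy_macro_f1 = macro_f1(toy_true, toy_pred)
toy_accuracy = accuracy(toy_true, toy_pred)
```

Macro-F1 weights every label equally, so a rare pattern that is missed lowers the score more than it lowers accuracy.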
[Figure: Wall clock time (s), log scale from 10^-4 to 10^4, vs. Time (x10^4) for StreamScope, AutoPlait, TICC, pHMM]
[Figure: Memory space (byte), log scale from 10^4 to 10^6, vs. Time (x10^4); legend: StreamScope, O(n)]
Find optimal segments/regimes
Automatic and incremental
It does not depend on data length