Automatic Sequential Pattern Mining in Data Streams


  1. Automatic Sequential Pattern Mining in Data Streams Koki Kawabata*, Yasuko Matsubara & Yasushi Sakurai ISIR-AIRC, Osaka University *Supported by SIGIR Student Travel Grants

  2. Motivation Given: time-evolving data streams • e.g., IoT sensors/Web click logs • contain multiple patterns CIKM2019 Sakurai Lab. K.Kawabata et al. 2

  3. Motivation Given: time-evolving data streams • e.g., IoT sensors/Web click logs • contain multiple patterns Answer the following questions: 1. What kind of patterns? 2. How many patterns? 3. When do patterns change?

  4. Motivation Given: time-evolving data streams • e.g., IoT sensors/Web click logs • contain multiple patterns Requirements: • Incremental: we cannot access all historical data • Automatic: the number of patterns is unknown in advance, without any parameter tuning

  5. Motivation Given: time-evolving data streams • e.g., IoT sensors/Web click logs • contain multiple patterns Requirements: • Incremental: we cannot access all historical data • Automatic: the number of patterns is unknown in advance, without any parameter tuning StreamScope: an automatic & incremental approach

  6. Demo movie

  7. Demo movie #1: Arm curl #2: Rowing

  8. Demo movie

  9. Demo movie #1: Arm curl #2: Rowing #3: Intervals #4: side raise #5: Push up

  10. Outline 1. Motivation 2. Problem definition 3. Model 4. Streaming Algorithm 5. Experiments 6. Conclusions

  11. Problem definition Given: • Data stream X Find: 1. Segment: S 2. Regime: Θ 3. Segment-membership: F [Figure: sensor values (L-arm, R-arm, L-leg, R-leg) plotted against time, 0 to 5000]

  12. Problem definition Data stream (given): set of d-dimensional vectors X = {x_1, …, x_t} [Figure: sensor values (L-arm, R-arm, L-leg, R-leg) plotted against time, 0 to 5000; d = 4]

  13. Problem definition Segment (hidden): start/end positions of each pattern S = {s_1, …, s_m} [Figure: the same sensor stream split into m = 8 segments s_1, …, s_8]

  14. Problem definition Regime (hidden): segment groups Θ = {θ_1, …, θ_r, Φ} [Figure: the same sensor stream with its segments grouped into r = 4 regimes, labeled 1 to 4]

  15. Problem definition Segment-membership (hidden): regime assignment F = {f(1), …, f(m)}, e.g., f(3) = 4 [Figure: the same sensor stream with regimes 1 to 4; F = {1, 2, 4, 2, 3, 4, 2, 1}]

  16. Problem definition Given: d-dimensional data stream X = {x_1, …, x_t} Find: compact description C = {m, r, S, Θ, F} of X • m segments S • r regimes Θ • segment-membership F
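A minimal sketch of how the compact description C = {m, r, S, Θ, F} from slide 16 could be held in memory. The class and field names (`Segment`, `CompactDescription`, `regime_at`) and the evenly spaced segment boundaries are illustrative assumptions, not taken from the slides; only the membership list 1, 2, 4, 2, 3, 4, 2, 1 comes from slide 15.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Segment:
    start: int  # first time tick of the segment (inclusive)
    end: int    # one past the last time tick (exclusive)


@dataclass
class CompactDescription:
    segments: List[Segment]  # S: the m segments
    regimes: List[Any]       # Theta: one model per regime (placeholders here)
    membership: List[int]    # F: membership[i] is the 1-based regime of segment i

    @property
    def m(self) -> int:
        return len(self.segments)

    @property
    def r(self) -> int:
        return len(self.regimes)

    def regime_at(self, t: int) -> int:
        """Return the regime that explains time tick t."""
        for seg, regime in zip(self.segments, self.membership):
            if seg.start <= t < seg.end:
                return regime
        raise ValueError(f"time {t} is outside every segment")


# Hypothetical boundaries: eight equal segments over ticks 0..5000.
bounds = [625 * i for i in range(9)]
desc = CompactDescription(
    segments=[Segment(a, b) for a, b in zip(bounds, bounds[1:])],
    regimes=["theta_1", "theta_2", "theta_3", "theta_4"],
    membership=[1, 2, 4, 2, 3, 4, 2, 1],
)
print(desc.m, desc.r, desc.regime_at(1300))  # 8 4 4
```

Keeping m and r as derived properties rather than stored fields avoids the counts drifting out of sync with S and Θ as segments and regimes are added.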

  17. Outline 1. Motivation 2. Problem definition 3. Model 4. Streaming Algorithm 5. Experiments 6. Conclusions

  18. Proposed model Goal: find compact description C in a streaming setting Challenges: Q1. How can we represent regimes? Idea (1): Hierarchical probabilistic model Q2. How can we decide # of segments/regimes? Idea (2): Model description cost

  19. Idea (1): hierarchical probabilistic model Q. How to describe patterns? Idea: HMM-based probabilistic model • ‘within-regime’ transitions: a hidden Markov model θ = {π, A, B} • ‘across-regime’ transitions: regime transition matrix Φ = {φ_ij} (i, j = 1, …, r) [Figure: a data stream over time t, modeled by regimes such as stand, walk and run]

  20. Idea (1): hierarchical probabilistic model Full model: Θ = {θ_1, …, θ_r, Φ} Single HMM parameters: θ_i = {π_i, A_i, B_i} [Figure: regimes stand, walk and run; the HMM θ_i of one regime has three states (state1, state2, state3) connected by transition probabilities a_jk]

  21. Idea (1): hierarchical probabilistic model Full model: Θ = {θ_1, …, θ_r, Φ} Single HMM parameters: θ_i = {π_i, A_i, B_i} Regime transition matrix: Φ = {φ_ij} (i, j = 1, …, r) [Figure: regimes stand, walk and run connected by regime transitions φ_ij; the HMM θ_i of one regime has three states (state1, state2, state3) connected by transition probabilities a_jk]
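To make the within-regime model concrete, the sketch below scores a sequence under a single HMM θ = {π, A, B} with the standard forward algorithm in log space. The slides do not say what form the emission parameters B take, so one-dimensional Gaussian emissions (`means`, `variances`) are an assumption here, as are all function names and the toy "stand" and "walk" parameters.

```python
import math


def safe_log(p):
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_gauss(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(vals):
    top = max(vals)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in vals))


def hmm_log_likelihood(xs, pi, A, means, variances):
    """Forward algorithm: log P(xs | theta) for a Gaussian-emission HMM."""
    k = len(pi)
    # alpha[j] = log P(x_1..x_t, state_t = j)
    alpha = [safe_log(pi[j]) + log_gauss(xs[0], means[j], variances[j])
             for j in range(k)]
    for x in xs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(A[i][j]) for i in range(k)])
            + log_gauss(x, means[j], variances[j])
            for j in range(k)
        ]
    return logsumexp(alpha)


# Toy regimes (assumed values): "stand" stays near 0, "walk" alternates -1/+1.
stand = dict(pi=[1.0], A=[[1.0]], means=[0.0], variances=[0.1])
walk = dict(pi=[0.5, 0.5], A=[[0.1, 0.9], [0.9, 0.1]],
            means=[-1.0, 1.0], variances=[0.1, 0.1])

xs = [-1.0, 1.1, -0.9, 1.0, -1.1, 0.9]
ll_stand = hmm_log_likelihood(xs, **stand)
ll_walk = hmm_log_likelihood(xs, **walk)
print(ll_walk > ll_stand)  # True: the alternating data fits "walk" better
```

Comparing such likelihoods across regimes is what lets a segment be assigned to the regime that explains it best.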

  22. Idea (2): Incremental encoding scheme Q. How to decide # of segments/regimes? Idea: Minimum description length (MDL) • Minimize the total description cost of a data stream • Update ‘optimal’ # of segments/regimes

  23. Idea (2): Incremental encoding scheme Idea: minimize the total encoding cost, min ( Cost_M(C) + Cost_C(X|C) ) • Cost_M(C): model cost (good compression) • Cost_C(X|C): coding cost (good description) [Figure: Cost_M, Cost_C and the total Cost_T plotted against the number of regimes/segments (# of r, m), from 1 to 10]
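The trade-off between model cost and coding cost can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the cost function of the slides: each segment is described by a single Gaussian (two parameters), every parameter is charged a fixed `FLOAT_BITS`, and the coding cost is the negative log2-likelihood of the data (which can be negative for densities). Costs for segment boundaries and regime counts are left out.

```python
import math

FLOAT_BITS = 32  # assumed cost, in bits, of storing one real-valued parameter


def fit(xs, min_var=1e-2):
    """Maximum-likelihood Gaussian with a variance floor."""
    mean = sum(xs) / len(xs)
    var = max(sum((x - mean) ** 2 for x in xs) / len(xs), min_var)
    return mean, var


def coding_cost(xs, mean, var):
    """Cost_C: negative log2-likelihood of xs under N(mean, var)."""
    nats = sum(0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in xs)
    return nats / math.log(2)


def total_cost(parts):
    """Cost_T = Cost_M + Cost_C when each part gets its own Gaussian."""
    model = len(parts) * 2 * FLOAT_BITS
    coding = sum(coding_cost(p, *fit(p)) for p in parts)
    return model + coding


low = [0.0, 0.1, -0.1, 0.05, -0.05] * 20   # 100 points around 0
high = [5.0, 5.1, 4.9, 5.05, 4.95] * 20    # 100 points around 5

# A real pattern change: paying for a second model reduces the total cost.
print(total_cost([low, high]) < total_cost([low + high]))      # True
# No change: splitting one pattern in half only adds model cost.
print(total_cost([low]) < total_cost([low[:50], low[50:]]))    # True
```

The second comparison shows why the description stays compact: an extra component is only accepted when the coding cost it saves exceeds the model cost it adds.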

  24. Idea (2): Incremental encoding scheme Q. How many new components does C need? A state? A segment? A regime? How many? Keep C compact! [Figure: incoming data X compared against the current description C: new segment? new regime?]

  25. Idea (2): Incremental encoding scheme Q. How many new components does C need? A state? A segment? A regime? How many? Keep C compact! (Details in paper) [Figure: incoming data X compared against the current description C: new segment? new regime?]

  26. Outline 1. Motivation 2. Problem definition 3. Model 4. Streaming Algorithm 5. Experiments 6. Conclusions

  27. Streaming algorithms • StreamScope (main): optimize/update the parameter set C 1. SegmentAssignment: identify regime transitions & segments 2. RegimeGeneration: estimate new regimes θ

  28. StreamScope • Overview 1. Keep current window X^c = s_m ∪ x_t: • The latest segment, s_m • New observations, x_t, … [Figure: data stream X over time t, with the current description C]

  29. StreamScope • Overview 1. Keep current window X^c = s_m ∪ x_t: • The latest segment, s_m • New observations, x_t, … 2. Update model set C: • Minimize ΔCost_T(X^c | C) • Increase segments? (SegmentAssignment) vs. increase states/regimes? (RegimeGeneration) [Figure: data stream X over time t, with the current description C]

  30. StreamScope • Overview 1. Keep current window X^c = s_m ∪ x_t: • The latest segment, s_m • New observations, x_t, … 2. Update model set C: • Minimize ΔCost_T(X^c | C) • Increase segments? (SegmentAssignment) vs. increase states/regimes? (RegimeGeneration) 3. Update X^c: • If the pattern has changed [Figure: data stream X over time t, with the current description C]

  31. 1. SegmentAssignment Given: • Observation x_t • Model parameter set Θ = {θ_1, …, θ_r, Φ} Find: • Optimal cut point between regimes: m, S, F

  32. 1. SegmentAssignment Overview: dynamic programming algorithm to compute P(x_t | Θ) [Figure: observations x_1, …, x_7 along time t, scored against regimes θ_1, θ_2, θ_3 with regime transitions φ]

  33. 1. SegmentAssignment Overview: dynamic programming algorithm to compute P(x_t | Θ) • Keep all candidate cut points ℒ = {L_1, L_2, …} [Figure: observations x_1, …, x_7 along time t, scored against regimes θ_1, θ_2, θ_3; possible regime transitions φ are marked "switch??" with candidate cut-point entries such as L = {4, 2} and L = {2, 3}]

  34. 1. SegmentAssignment Overview: dynamic programming algorithm to compute P(x_t | Θ) • Keep all candidate cut points ℒ = {L_1, L_2, …} • γ-guarantee: γ ∝ mean(s) [Figure: observations x_1, …, x_7 along time t, scored against regimes θ_1, θ_2, θ_3, with a candidate cut-point entry L = {2, 3} and a window of length γ]
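The cut-point search can be sketched as a Viterbi-style dynamic program over regimes. This is a simplified batch illustration, not the incremental procedure of the slides: each regime is reduced to a single Gaussian instead of a full HMM, the whole path is recovered by backtracking rather than by maintaining the candidate set of cut points, and the names `assign_segments`, `regimes` and `phi` are assumptions. All entries of `phi` must be positive.

```python
import math


def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def assign_segments(xs, regimes, phi):
    """Most likely regime path for xs.

    regimes: list of (mean, var); phi[i][j]: probability of moving from
    regime i to regime j. Returns (segments, membership), where segments
    are half-open (start, end) index pairs and membership holds the
    0-based regime of each segment.
    """
    r = len(regimes)
    # Uniform prior over the starting regime.
    score = [log_gauss(xs[0], *regimes[j]) - math.log(r) for j in range(r)]
    back = []
    for x in xs[1:]:
        prev, score, ptr = score, [], []
        for j in range(r):
            best = max(range(r), key=lambda i: prev[i] + math.log(phi[i][j]))
            ptr.append(best)
            score.append(prev[best] + math.log(phi[best][j])
                         + log_gauss(x, *regimes[j]))
        back.append(ptr)

    # Backtrack from the best final regime.
    j = max(range(r), key=lambda k: score[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()

    # Collapse runs of equal regimes into segments; run ends are cut points.
    segments, membership, start = [], [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((start, t))
            membership.append(path[start])
            start = t
    return segments, membership


regimes = [(0.0, 0.25), (5.0, 0.25)]   # assumed toy regimes
phi = [[0.95, 0.05], [0.05, 0.95]]     # switching is possible but costly
xs = [0.1, -0.2, 0.0, 5.1, 4.8, 5.2, 0.1, 0.0]
segments, membership = assign_segments(xs, regimes, phi)
print(segments)    # [(0, 3), (3, 6), (6, 8)]
print(membership)  # [0, 1, 0]
```

The off-diagonal entries of `phi` act as a penalty on switching, so isolated noisy points do not create spurious segments.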

  35. 2. RegimeGeneration Given: • Current window X^c Find: • New regimes: parameter set {m, r, S, Θ, F} for X^c

  36. 2. RegimeGeneration 1. Two-phase iterative approach on X^c • Phase 1: split segments into 2 groups, S_1 and S_2 • Phase 2: update 2 model parameters, θ_1 and θ_2 (with Φ) [Figure: Phase 1 and Phase 2 alternating in a loop]
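The two-phase loop can be sketched as follows. This is a simplified illustration under stated assumptions: each regime model is a single Gaussian rather than an HMM, the regime transition matrix Φ is ignored, the initialization rule (first segment versus the segment it explains worst) is a choice made for this example, and the function names are hypothetical.

```python
import math


def fit(xs, min_var=1e-2):
    """Maximum-likelihood Gaussian with a variance floor."""
    mean = sum(xs) / len(xs)
    var = max(sum((x - mean) ** 2 for x in xs) / len(xs), min_var)
    return mean, var


def cost(xs, model):
    """Negative log-likelihood (in nats) of xs under a Gaussian model."""
    mean, var = model
    return sum(0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in xs)


def split_into_two_regimes(segments, max_iter=20):
    """Alternate Phase 1 (group segments) and Phase 2 (refit two models)."""
    # Assumed initialization: model 1 from the first segment, model 2 from
    # the segment that model 1 explains worst per observation.
    theta1 = fit(segments[0])
    worst = max(segments, key=lambda s: cost(s, theta1) / len(s))
    theta2 = fit(worst)

    groups = None
    for _ in range(max_iter):
        # Phase 1: give each segment to the model that encodes it more cheaply.
        new_groups = [0 if cost(s, theta1) <= cost(s, theta2) else 1
                      for s in segments]
        if new_groups == groups:
            break  # assignments are stable
        groups = new_groups
        # Phase 2: re-estimate each model from the segments in its group.
        g1 = [x for s, g in zip(segments, groups) if g == 0 for x in s]
        g2 = [x for s, g in zip(segments, groups) if g == 1 for x in s]
        if g1:
            theta1 = fit(g1)
        if g2:
            theta2 = fit(g2)
    return groups, theta1, theta2


segments = [[0.0, 0.1, -0.1], [5.0, 5.1, 4.9],
            [0.05, -0.05, 0.0], [4.95, 5.05, 5.0]]
groups, theta1, theta2 = split_into_two_regimes(segments)
print(groups)  # [0, 1, 0, 1]
```

Whether the resulting two regimes are kept would then be decided by comparing total description costs with and without the split.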
