

  1. Online Clustering of High-Dimensional Trajectories under Concept Drift
     2011-09-07, ECMLPKDD 2011, Athens, Greece
     Georg Krempl (1,2), Zaigham Siddiqui (2), Myra Spiliopoulou (2)
     1: University of Graz, georg.krempl@uni-graz.at
     2: University of Magdeburg, {myra,siddiqui,krempl}@iti.cs.uni-magdeburg.de

  2. Outline
     ◮ Problem Description
       ◮ Motivation and Objectives
       ◮ Modelling Trajectories as Gaussian Mixtures
       ◮ Trajectory Clustering with Expectation Maximisation (offline)
     ◮ TRACER Algorithm (online)
       ◮ Overview
       ◮ Initialisation
       ◮ Update, Clustering and Prediction
     ◮ Experiments
       ◮ Settings
       ◮ Results
     ◮ Conclusion



  5. ◮ CRM Application
       ◮ Customers are shopping online
       ◮ Money is spent on different product groups in a basket
       ◮ Multiple visits per customer
       ◮ Behaviour changes over time (recession, new products)
       ◮ Can we cluster customers? Can we predict the values in the next basket?
     ◮ Trajectory Clustering Problem
       ◮ Customers: a population of individuals
       ◮ Each visit: a measurement; the money spent in all product groups: a measurement vector
       ◮ Customer history: a trajectory
       ◮ Subpopulations of customers: clusters
       ◮ Multiple measurements per individual
       ◮ Measurements are not taken at equidistant times
       ◮ The distribution of measurements is subject to drift

  6. Clustering Trajectories under Drift: Objective
     ◮ Cluster individuals
     ◮ Track clusters over time
     ◮ Predict/extrapolate cluster movements
     [Figure: example cluster movements over the time points t_0, t_1, t_2, t_3]



  11. Clustering Trajectories under Drift
      ◮ Formulation as a Gaussian Mixture Model
        ◮ z_i = (z_{i1}, z_{i2}, ..., z_{in_i}) are the n_i observations of the i-th individual
        ◮ K clusters, with
          ◮ mixing proportions α_k
          ◮ distribution parameters θ_k: the mean depends on time via the regression
            coefficients β_k; the covariance matrix Σ_k is static for the k-th cluster
      ◮ Likelihood of observing the trajectory of individual i:

          p(z_i; Θ) = ∏_{l=1}^{n_i} ∑_{k=1}^{K} α_k p(z_{il}; θ_k)    (1)
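The per-trajectory likelihood of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the polynomial parameterisation of the time-dependent mean are assumptions made here for concreteness.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density, implemented with plain numpy."""
    d = len(x)
    diff = x - mean
    expo = -0.5 * diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(expo) / norm

def trajectory_likelihood(z, t, alphas, betas, sigmas):
    """Likelihood of one individual's trajectory (Eq. 1): the product over the
    individual's observations of the mixture density, where each cluster's mean
    is a polynomial in time with coefficients beta_k.
    z: (n_i, D) observations, t: (n_i,) observation times,
    alphas: (K,) mixing proportions, betas: (K, D, O+1) regression
    coefficients, sigmas: (K, D, D) static covariance per cluster."""
    likelihood = 1.0
    for z_il, t_il in zip(z, t):
        p = 0.0
        for alpha_k, beta_k, sigma_k in zip(alphas, betas, sigmas):
            # Time-dependent cluster mean: f_d(t) = beta_d0 + beta_d1*t + ...
            # (np.polyval expects highest-degree coefficients first, so reverse)
            mean = np.array([np.polyval(b[::-1], t_il) for b in beta_k])
            p += alpha_k * gaussian_pdf(z_il, mean, sigma_k)
        likelihood *= p  # product over the n_i observations
    return likelihood
```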

  12. EM Trajectory Clustering
      ◮ EM algorithm for the general likelihood maximisation problem: Dempster et al., 1977
      ◮ Offline EM trajectory clustering algorithm: Gaffney and Smyth, 1999
        ◮ provides an initial clustering
      ◮ Problem: it is an offline algorithm; how can it be used on a stream, and how
        robust is it against sudden change (non-smooth trajectories)?

  13. Outline
      ◮ Problem Description
        ◮ Motivation and Objectives
        ◮ Modelling Trajectories as Gaussian Mixtures
        ◮ Trajectory Clustering with Expectation Maximisation (offline)
      ◮ TRACER Algorithm (online)
        ◮ Overview
        ◮ Initialisation
        ◮ Update, Clustering and Prediction
      ◮ Experiments
        ◮ Settings
        ◮ Results
      ◮ Conclusion


  15. TRACER Algorithm Overview
      ◮ Make an initial clustering using EM
      ◮ Update the clustering:
        ◮ estimate the new positions of the clusters
        ◮ assign new individuals to clusters
      ◮ Assumptions:
        ◮ static number of clusters, K
        ◮ static covariance matrices, Σ_k
      ◮ Approach: Kálmán filter (Kálmán, 1959)


  17. Kálmán Filter
      ◮ State transition: new state x_s

          x_s = A x_{s−1} + w_s    (2)

      ◮ State-to-signal: measurement z ∈ R^D

          z_s = H x_s + v_s    (3)

      ◮ States: the true (unobservable) cluster centroids, vectors of length D·(O+1)
      ◮ The Kálmán filter computes at each discrete time step s:
        ◮ a state estimate for each cluster, x̂_s
        ◮ an error estimate on the cluster state, P_s
      ◮ Questions:
        ◮ How to choose x̂_0, A, Q, H, R?
        ◮ How to assign individuals to clusters?
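The recursion implied by Eqs. (2) and (3) is the standard Kalman predict/update cycle, with process noise w_s ~ N(0, Q) and measurement noise v_s ~ N(0, R). A generic textbook sketch in numpy (not TRACER's code) looks like this:

```python
import numpy as np

def kalman_step(x_hat, P, z, A, Q, H, R):
    """One predict/update cycle of the Kalman filter for the model
    x_s = A x_{s-1} + w_s (Eq. 2) and z_s = H x_s + v_s (Eq. 3)."""
    # Predict: propagate the state and the error covariance through A
    x_pred = A @ x_hat
    P_pred = A @ P @ A.T + Q
    # Update: correct the prediction with the new measurement z
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_new, P_new
```

With equal prior and measurement uncertainty, a single step moves the estimate halfway towards the measurement, which is a quick sanity check on the gain computation.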

  18. TRACER Initialisation
      Initial State of Each Cluster
      ◮ The state is initialised from the β-coefficients obtained via EM
      ◮ State vector µ_0 of size (D·(O+1)) × 1 at t = 0: f(0) = (f_1(0), ..., f_D(0))
      ◮ d-th coordinate estimate: f_d(t) = β_{d0} + β_{d1} t + ... + β_{dO} t^O
      ◮ Covariance matrix Σ_0: the identity matrix
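One plausible way to build the initial state from the EM β-coefficients can be sketched as follows. The slide does not spell out the layout of the D·(O+1) entries; the assumption made here is that the state stacks, for each derivative order o = 0..O, the o-th derivative of every coordinate's mean polynomial at t = 0 (i.e. f_d^(o)(0) = o!·β_{d,o}), which matches the Taylor-style transition matrix A defined on the following slide. The function name and layout are hypothetical.

```python
import math
import numpy as np

def initial_state(betas):
    """Hypothetical construction of one cluster's initial Kalman state from
    its EM regression coefficients. betas has shape (D, O+1): for each of the
    D dimensions, the polynomial coefficients of f_d(t).
    Assumed layout: entry o*D + d holds f_d^(o)(0) = o! * beta_{d,o}."""
    D, P = betas.shape
    state = np.empty(D * P)
    for o in range(P):
        for d in range(D):
            state[o * D + d] = math.factorial(o) * betas[d, o]
    return state
```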


  20. TRACER Initialisation
      State Transition Matrix A
      ◮ Matrix A = [a_{ij}] with

          a_{ij} = δ_q = ∆^q / q!   if ∃ q ∈ N_0 : i − j + D·q = 0
          a_{ij} = 0                otherwise

      ◮ Example for D = 2 and O = 2:

              ⎡ a_0  0    a_1  0    a_2  0   ⎤
              ⎢ 0    a_0  0    a_1  0    a_2 ⎥
          A = ⎢ 0    0    a_0  0    a_1  0   ⎥
              ⎢ 0    0    0    a_0  0    a_1 ⎥
              ⎢ 0    0    0    0    a_0  0   ⎥
              ⎣ 0    0    0    0    0    a_0 ⎦

          with a_0 = 1, a_1 = ∆, a_2 = ∆²/2
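The definition of A translates directly into code. A small sketch (with 0-based indices, the slide's 1-based condition i − j + D·q = 0 becomes q = (j − i)/D):

```python
import math
import numpy as np

def transition_matrix(D, O, delta):
    """Build A = [a_ij] with a_ij = delta^q / q! whenever a q in N_0 exists
    with i - j + D*q = 0 (1-based indices on the slide; 0-based here), and
    a_ij = 0 otherwise."""
    n = D * (O + 1)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n, D):       # nonzero columns: j = i, i+D, i+2D, ...
            q = (j - i) // D
            A[i, j] = delta**q / math.factorial(q)
    return A
```

For D = 2, O = 2 this reproduces the 6x6 example above: ones on the diagonal, ∆ on the D-th superdiagonal, and ∆²/2 on the 2D-th superdiagonal.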


  23. TRACER Initialisation
      Process Noise Covariance Matrix Q
      ◮ The identity matrix multiplied by a process noise factor q̂: Q = I · q̂
      Measurement (State-to-Signal) Matrix H
      ◮ Set equal to the identity matrix: H = I
      Measurement Noise Covariance Matrix R
      ◮ Computed as the covariance matrix of the EM clustering

  24. TRACER Update and Clustering
      [Flowchart] For each new measurement z:
      ◮ Is z from a known individual?
        ◮ Yes: calculate meta-features (speed, acceleration, etc.) and update the
          individual's cluster membership probability
        ◮ No: calculate the measurement's cluster membership probability
      ◮ Update the matrices A, R
      ◮ Estimate the new cluster positions
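The speed and acceleration meta-features for a known individual can be sketched with finite differences. The slide does not give the exact formulas, so the difference-quotient version below is an assumption; it does handle the non-equidistant measurement times mentioned earlier.

```python
import numpy as np

def meta_features(z, t):
    """Illustrative speed/acceleration meta-features for one individual's
    trajectory, via difference quotients over non-equidistant times.
    z: (n, D) measurements, t: (n,) measurement times, n >= 3."""
    dt = np.diff(t)[:, None]                 # time gaps between visits
    speed = np.diff(z, axis=0) / dt          # first difference quotient
    accel = np.diff(speed, axis=0) / dt[1:]  # second difference quotient
    return speed, accel
```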

  25. Outline
      ◮ Problem Description
        ◮ Motivation and Objectives
        ◮ Modelling Trajectories as Gaussian Mixtures
        ◮ Trajectory Clustering with Expectation Maximisation (offline)
      ◮ TRACER Algorithm (online)
        ◮ Overview
        ◮ Initialisation
        ◮ Update, Clustering and Prediction
      ◮ Experiments
        ◮ Settings
        ◮ Results
      ◮ Conclusion


  27. Objective
      ◮ Similar clustering quality of EM and TRACER?
      ◮ Robustness against sudden shift
      ◮ Speed and suitability for online processing
      Synthetic Data Streams with Drift
      ◮ 5 types of synthetic data sets:
        ◮ different state transition noise (A: high, C: low)
        ◮ different numbers of dimensions (A, ..., C: one; D, E: two)
      ◮ 10 data sets per type
      ◮ 1500 individuals with on average 2 measurements per individual; 1000
        measurements for training, 1000 for testing before the shift, and 1000
        for testing after the shift

  28. Update Strategies

      Method       | Description
      -------------+-------------------------------------------------------------
      EM           | Expectation Maximisation (multivariate variant of
                   | [Gaffney and Smyth, 1999])
      Kalman K-1   | Confidence proportional to the squared membership probability
             K-2   | Confidence ∈ {0, 1}, winner-takes-all
             K-3   | Confidence proportional to the membership probability
             K-4   | As K-1, but with a 10x higher state transition noise factor estimate
             K-5   | As K-1, but with a 10x smaller state transition noise factor estimate
             K-6   | As K-1, but using speed and acceleration as meta-features
                   | for the membership probability estimation

  29. Measure
      ◮ Cluster purity:

          purity = (1/N) ∑_{j=1}^{K} max_i C_{ij}

          C_{ij}: number of elements in the i-th true and the j-th predicted cluster
          N: total number of elements
      ◮ Wilcoxon signed-rank test: significance of differences in clustering quality
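The purity measure above is straightforward to compute from two label vectors; a short sketch (the function name is illustrative, and labels are assumed to be non-negative integers):

```python
import numpy as np

def cluster_purity(true_labels, pred_labels):
    """Cluster purity: for each predicted cluster j, take the size of its
    largest overlap with any true cluster i (max_i C_ij), sum over j, and
    divide by the total number of elements N."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    total = 0
    for j in np.unique(pred_labels):
        members = true_labels[pred_labels == j]
        # counts[i] = C_ij, the overlap of true cluster i with predicted cluster j
        counts = np.bincount(members)
        total += counts.max()
    return total / len(true_labels)
```

A perfect clustering yields purity 1.0; assigning every element to one cluster yields the relative size of the largest true cluster.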

  30. Accuracy of State Estimation over Time
      [Figure: Feature 1 plotted over time across the Train, Valid 1, and Valid 2
       phases; estimated components A, B, C tracking the true clusters 1, 2, 3]
