 
Online Clustering of High-Dimensional Trajectories under Concept Drift
ECMLPKDD 2011, Athens, Greece, 2011-09-07
Georg Krempl (1, 2), Zaigham Siddiqui (2), Myra Spiliopoulou (2)
(1) University of Graz, (2) University of Magdeburg
georg.krempl@uni-graz.at, {myra,siddiqui,krempl}@iti.cs.uni-magdeburg.de
Outline
◮ Problem Description
  ◮ Motivation and Objectives
  ◮ Modelling Trajectories as Gaussian Mixtures
  ◮ Trajectory Clustering with Expectation Maximisation (offline)
◮ TRACER Algorithm (online)
  ◮ Overview
  ◮ Initialisation
  ◮ Update, Clustering and Prediction
◮ Experiments
  ◮ Settings
  ◮ Results
◮ Conclusion
◮ CRM Application
  ◮ Customers are shopping online
  ◮ Money is spent on different product groups in a basket
  ◮ Multiple visits per customer
  ◮ Behaviour changing over time (recession, new product)
  ◮ Can we cluster customers? Can we predict values in the next basket?
◮ Trajectory Clustering Problem
  ◮ Customers: population of individuals
  ◮ Each visit: measurement; money spent in all product groups: measurement vector
  ◮ Customer history: trajectory
  ◮ Subpopulations of customers: clusters
  ◮ Multiple measurements per individual
  ◮ Measurements are not taken at equidistant times
  ◮ Distribution of measurements is subject to drift
Clustering Trajectories under Drift: Objective
◮ Cluster individuals
◮ Track clusters over time
◮ Predict/extrapolate cluster movements
[Figure: three panels, each over time points $t_0, t_1, t_2, t_3$]
Clustering Trajectories under Drift
◮ Formulation as Gaussian Mixture Model
  ◮ $z_i = z_{i1}, z_{i2}, \dots, z_{in_i}$ are the $n_i$ observations of the $i$-th individual
  ◮ $K$ clusters, with
    ◮ mixing proportions $\alpha_k$
    ◮ distribution parameters $\theta_k$: the mean depends on time via regression coefficients $\beta_k$; the covariance matrix $\Sigma_k$ is static for the $k$-th cluster
◮ Likelihood of observing the trajectory of individual $i$:
  $$p(z_i; \Theta) = \prod_{l=1}^{n_i} \sum_{k=1}^{K} \alpha_k \, p(z_{il}; \theta_k) \qquad (1)$$
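A minimal NumPy sketch of Eq. (1), assuming Gaussian cluster densities whose mean is a polynomial in time with coefficients $\beta_k$ and whose covariance $\Sigma_k$ is static. The helpers `gaussian_pdf`, `cluster_mean` and `trajectory_likelihood` are hypothetical names. The layout of each `beta` as an array of shape `(D, O+1)`, with column `o` multiplying $t^o$, is likewise an assumption made for illustration.

```python
import numpy as np

def gaussian_pdf(z, mean, cov):
    """Density of a multivariate normal N(mean, cov) evaluated at z."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    diff = z - mean
    d = z.shape[0]
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm)

def cluster_mean(beta, t):
    """Polynomial cluster mean at time t; beta has shape (D, O+1)."""
    powers = float(t) ** np.arange(beta.shape[1])  # (1, t, t^2, ..., t^O)
    return beta @ powers

def trajectory_likelihood(times, observations, alphas, betas, covs):
    """Eq. (1): product over the n_i observations of the K-component mixture density."""
    likelihood = 1.0
    for t, z in zip(times, observations):
        mixture = sum(
            alpha * gaussian_pdf(z, cluster_mean(beta, t), cov)
            for alpha, beta, cov in zip(alphas, betas, covs)
        )
        likelihood *= mixture
    return likelihood
```

For long trajectories the product underflows quickly. Summing log-densities instead is the usual remedy.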
EM Trajectory Clustering
◮ EM algorithm for the general likelihood maximisation problem: Dempster et al., 1977
◮ Offline EM trajectory clustering algorithm:
  ◮ Gaffney and Smyth, 1999
  ◮ Provides an initial clustering
◮ Problem: it is an offline algorithm. How can it be used on a stream? How robust is it against sudden change (non-smooth trajectories)?
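The slides do not spell out the EM updates. As a hedged illustration only, the sketch below fits a $K$-component mixture of polynomial regressions in time by EM, treating each observation $z_{il}$ as drawn from the mixture, in line with Eq. (1). It is not the authors' multivariate variant of Gaffney and Smyth (1999). The random soft initialisation, the small ridge and covariance regularisers, and the name `em_regression_mixture` are all assumptions.

```python
import numpy as np

def em_regression_mixture(times, Z, K, order, n_iter=100, seed=0):
    """EM for a K-component mixture of polynomial regressions in time.

    times: shape (N,); Z: shape (N, D), one row per observation z_il.
    Returns alphas (K,), betas (K, D, order+1) and covs (K, D, D).
    """
    rng = np.random.default_rng(seed)
    # Design matrix with rows (1, t, t^2, ..., t^order).
    X = np.vander(np.asarray(times, dtype=float), order + 1, increasing=True)
    N, D = Z.shape
    # Assumed initialisation: random soft assignments, normalised per observation.
    resp = rng.random((N, K))
    resp /= resp.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: mixing proportions, weighted least squares for beta_k, weighted covariance.
        alphas = resp.mean(axis=0)
        betas = np.empty((K, D, order + 1))
        covs = np.empty((K, D, D))
        for k in range(K):
            w = resp[:, k]
            XtW = X.T * w                                  # scales column n of X.T by w_n
            gram = XtW @ X + 1e-9 * np.eye(order + 1)      # tiny ridge guards against singularity
            B = np.linalg.solve(gram, XtW @ Z)             # shape (order+1, D)
            betas[k] = B.T
            resid = Z - X @ B
            covs[k] = (resid.T * w) @ resid / max(w.sum(), 1e-12) + 1e-6 * np.eye(D)
        # E-step: posterior responsibility of each cluster for each observation.
        log_p = np.empty((N, K))
        for k in range(K):
            resid = Z - X @ betas[k].T
            _, logdet = np.linalg.slogdet(covs[k])
            maha = np.einsum("nd,nd->n", resid @ np.linalg.inv(covs[k]), resid)
            log_p[:, k] = np.log(alphas[k]) - 0.5 * (maha + logdet + D * np.log(2.0 * np.pi))
        log_p -= log_p.max(axis=1, keepdims=True)          # stabilise before exponentiating
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return alphas, betas, covs
```

The fitted `betas` are the kind of regression coefficients that the TRACER initialisation starts from.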
TRACER Algorithm Overview
◮ Make an initial clustering using EM
◮ Update clustering:
  ◮ Estimate new position of clusters
  ◮ Assign new individuals to clusters
◮ Assumptions:
  ◮ Static number of clusters, $K$
  ◮ Static covariance matrices, $\Sigma_k$
◮ Approach: Kálmán filter (Kálmán, 1959)
Kálmán filter
◮ State transition: new state $x_s$, with process noise $w_s$ of covariance $Q$
  $$x_s = A x_{s-1} + w_s \qquad (2)$$
◮ State-to-signal: measurement $z \in \mathbb{R}^D$, with measurement noise $v_s$ of covariance $R$
  $$z_s = H x_s + v_s \qquad (3)$$
◮ States: true (unobservable) cluster centroids, vector of length $D \cdot (O+1)$, where $O$ is the order of the polynomial trajectory model
◮ The Kálmán filter computes at each discrete time step $s$:
  ◮ State estimate for each cluster: $\hat{x}_s$
  ◮ Error estimate on cluster state: $P_s$
◮ Questions:
  ◮ How to choose $\hat{x}_0$, $A$, $Q$, $H$, $R$?
  ◮ How to assign individuals to clusters?
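Equations (2) and (3) lead to the textbook Kálmán predict/update recursion. The sketch below is a generic NumPy version that assumes $w_s \sim \mathcal{N}(0, Q)$ and $v_s \sim \mathcal{N}(0, R)$. The function names are hypothetical. In TRACER one state estimate $\hat{x}_s$ with error estimate $P_s$ is kept per cluster, so this step would be applied to each cluster's filter.

```python
import numpy as np

def kalman_predict(x, P, A, Q):
    """Time update for x_s = A x_{s-1} + w_s with w_s ~ N(0, Q)."""
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    return x_pred, P_pred

def kalman_update(x_pred, P_pred, z, H, R):
    """Measurement update for z_s = H x_s + v_s with v_s ~ N(0, R)."""
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R                    # innovation covariance
    gain = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + gain @ innovation
    P_new = (np.eye(P_pred.shape[0]) - gain @ H) @ P_pred
    return x_new, P_new
```

With a scalar state, $P = 1$ and $R = 1$, a measurement of 2 pulls an estimate of 0 halfway to 1 and halves the error estimate.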
TRACER Initialisation
Initial State of Each Cluster
◮ State is initialised from the $\beta$-coefficients obtained via EM
◮ State vector $\mu_0$ of size $(D \cdot (O+1)) \times 1$ at $t = 0$: $f(0) = (f_1(0), \dots, f_D(0))$
◮ $d$-th coordinate estimate: $f_d^{(0)}(t) = \beta_{d0} + t\beta_{d1} + \dots + t^{O}\beta_{dO}$
◮ Covariance matrix $\Sigma_0$: identity matrix
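Here is a sketch of the initial state under one reading of the slide. The transition matrix $A$ applies Taylor terms $\Delta^q / q!$, so we assume the state stores the $q$-th time derivatives of the polynomial mean at $t = 0$. Those derivatives equal $q!\,\beta_{dq}$. We also assume they are grouped by order $q$: all $D$ coordinates of order 0 first, then order 1, and so on. Both the layout and the function names are assumptions.

```python
import numpy as np
from math import factorial

def initial_state(beta):
    """Initial state mu_0 from EM coefficients beta of shape (D, O+1).

    Coordinate d is f_d(t) = sum_o beta[d, o] * t**o, whose q-th derivative
    at t = 0 is q! * beta[d, q]. Blocks are ordered by derivative order q.
    """
    n_orders = beta.shape[1]
    blocks = [factorial(q) * beta[:, q] for q in range(n_orders)]
    return np.concatenate(blocks)  # length D * (O + 1)

def initial_error_covariance(D, O):
    """Sigma_0 is the identity matrix, as stated on the slide."""
    return np.eye(D * (O + 1))
```

For $D = 2$ and $O = 2$, `beta = [[1, 2, 3], [4, 5, 6]]` yields the state `[1, 4, 2, 5, 6, 12]`.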
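TRACER Initialisation
State Transition Matrix A
◮ Matrix $A = [a_{ij}]$ with
  $$a_{ij} = \begin{cases} \delta_q = \dfrac{\Delta^q}{q!} & \text{if } \exists\, q \in \mathbb{N}_0 : i - j + D \cdot q = 0 \\ 0 & \text{otherwise} \end{cases}$$
◮ Example for $D = 2$ and $O = 2$:
  $$A = \begin{pmatrix} a_0 & 0 & a_1 & 0 & a_2 & 0 \\ 0 & a_0 & 0 & a_1 & 0 & a_2 \\ 0 & 0 & a_0 & 0 & a_1 & 0 \\ 0 & 0 & 0 & a_0 & 0 & a_1 \\ 0 & 0 & 0 & 0 & a_0 & 0 \\ 0 & 0 & 0 & 0 & 0 & a_0 \end{pmatrix}$$
  with $a_0 = 1$, $a_1 = \Delta$, $a_2 = \frac{\Delta^2}{2}$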
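The rule for $a_{ij}$ translates directly into a matrix builder. Indices are 0-based here, which leaves the condition $j - i = D \cdot q$ unchanged. $\Delta$ is taken to be the time elapsed since the previous step. That is an assumption, since the slide does not define it. Under it, $A$ has to be rebuilt whenever measurements arrive at non-equidistant times. The name `transition_matrix` is hypothetical.

```python
import numpy as np
from math import factorial

def transition_matrix(D, O, delta):
    """A = [a_ij] with a_ij = delta**q / q! if j - i = D*q for some q >= 0, else 0."""
    n = D * (O + 1)
    A = np.zeros((n, n))
    for i in range(n):
        for q in range(O + 1):
            j = i + D * q          # same condition as i - j + D*q = 0
            if j < n:
                A[i, j] = delta ** q / factorial(q)
    return A
```

Assume a state ordered as positions, then first derivatives, then second derivatives. Applying this $A$ to it gives the second-order Taylor extrapolation $f + \Delta f' + \frac{\Delta^2}{2} f''$ for each coordinate.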
TRACER Initialisation
Process Noise Covariance Matrix Q
◮ Identity matrix multiplied by the process noise factor $\hat{q}$: $Q = \hat{q} \cdot I$
Measurement (or State-to-Signal) Matrix H
◮ Set equal to the identity matrix, $H = I$
Measurement Noise Covariance Matrix R
◮ Computed as the covariance matrix of the EM clustering
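Here is a minimal sketch of the remaining matrices. Because $H = I$ is square, the measurement must have the same length as the state, and the shape check below makes that explicit. Passing `em_cov` as an already computed covariance matrix from the EM clustering is an assumption about how $R$ is obtained. The function name is hypothetical.

```python
import numpy as np

def noise_and_measurement_matrices(state_dim, q_hat, em_cov):
    """Return (Q, H, R): Q = q_hat * I, H = I, R taken from the EM clustering."""
    Q = q_hat * np.eye(state_dim)        # process noise: identity scaled by q_hat
    H = np.eye(state_dim)                # state-to-signal matrix: identity
    R = np.asarray(em_cov, dtype=float)  # measurement noise: EM covariance matrix
    if R.shape != (state_dim, state_dim):
        raise ValueError("with H = I, R must be a state_dim x state_dim matrix")
    return Q, H, R
```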
TRACER Update and Clustering
[Flowchart] Boxes for processing each new measurement $z$:
◮ New measurement $z$
◮ Is $z$ of a known individual? If yes (Y): calculate meta-features (speed, acceleration, etc.)
◮ Calculate the measurement's cluster membership probability
◮ Update the individual's cluster membership probability
◮ Update matrices $A$, $R$
◮ Estimate new cluster positions*
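The step "Calc. measurement's cluster membership probability" can be illustrated with the standard Bayes posterior over $K$ Gaussian clusters. In this sketch, `means` would be the predicted cluster positions, for example $H\hat{x}_s$ from each cluster's Kálmán filter. `covs` and `alphas` are the cluster covariances and mixing proportions. How TRACER combines these probabilities across an individual's measurements is not shown on the slide, so it is left out. The function name is hypothetical.

```python
import numpy as np

def membership_probabilities(z, means, covs, alphas):
    """Posterior probability of each of the K clusters for one measurement z."""
    z = np.asarray(z, dtype=float)
    log_w = []
    for mean, cov, alpha in zip(means, covs, alphas):
        diff = z - np.asarray(mean, dtype=float)
        _, logdet = np.linalg.slogdet(cov)
        maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
        log_w.append(np.log(alpha) - 0.5 * (maha + logdet + len(z) * np.log(2.0 * np.pi)))
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())  # subtract the maximum for numerical stability
    return w / w.sum()
```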
Objective
◮ Similar clustering quality of EM and TRACER?
◮ Robustness against sudden shift
◮ Speed and suitability for online processing
Synthetic Data Streams with Drift
◮ 5 types of synthetic data sets:
  ◮ Different state transition noise (type A: high, type C: low)
  ◮ Different number of dimensions (types A to C: one; types D and E: two)
◮ 10 data sets per type
◮ 1500 individuals, on average 2 measurements per individual: 1000 measurements for training, 1000 for testing before the shift, 1000 for testing after the shift
Update Strategies

Group    Method   Description
EM       EM       Expectation Maximisation (multivariate variant of [Gaffney and Smyth, 1999])
Kalman   K-1      Confidence proportional to squared membership probability
         K-2      Confidence ∈ {0; 1}, winner-takes-all
         K-3      Confidence proportional to membership probability
         K-4      As K-1, but 10× higher state transition (ST) noise factor estimate
         K-5      As K-1, but 10× smaller ST noise factor estimate
         K-6      As K-1, but using speed and acceleration as meta-features for membership probability estimation
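The confidence rules of K-1 to K-3 map the membership probabilities $p_k$ of one measurement to per-cluster confidences. K-4 to K-6 reuse the K-1 rule and differ only in the noise factor or the features used. The sketch below assumes a proportionality constant of 1. It does not show how the confidence enters the Kálmán update, since the table does not say. The function name is hypothetical.

```python
import numpy as np

def update_confidence(p, strategy):
    """Per-cluster confidence for one measurement, given membership probabilities p."""
    p = np.asarray(p, dtype=float)
    if strategy in ("K-1", "K-4", "K-5", "K-6"):
        return p ** 2                       # proportional to squared membership probability
    if strategy == "K-2":
        c = np.zeros_like(p)
        c[np.argmax(p)] = 1.0               # winner-takes-all: confidence in {0, 1}
        return c
    if strategy == "K-3":
        return p.copy()                     # proportional to membership probability
    raise ValueError(f"unknown strategy: {strategy}")
```

Squaring in K-1 damps the influence of ambiguous measurements more strongly than K-3. K-2 discards it entirely for every cluster but the most probable one.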
Measure
◮ Cluster purity:
  $$\text{purity} = \frac{1}{N} \sum_{j=1}^{K} \max_{i=1,\dots,K} C_{ij}$$
  $C_{ij}$: number of elements in the $i$-th true and $j$-th predicted cluster; $N$: total number of elements
◮ Wilcoxon signed-rank test: significance of differences in clustering quality
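Purity can be computed from the contingency counts $C_{ij}$. For every predicted cluster $j$, take the largest overlap with any true cluster, sum these maxima, and divide by $N$. The sketch below does this; the function name is hypothetical, and labels may be any values NumPy can compare.

```python
import numpy as np

def purity(true_labels, pred_labels):
    """Cluster purity: (1/N) * sum over predicted clusters j of max_i C_ij."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids = np.unique(true_labels)
    pred_ids = np.unique(pred_labels)
    # C[i, j]: number of elements in the i-th true and j-th predicted cluster.
    C = np.array([[np.sum((true_labels == i) & (pred_labels == j)) for j in pred_ids]
                  for i in true_ids])
    return C.max(axis=0).sum() / len(true_labels)
```

For example, true labels `[0, 0, 0, 1, 1, 1]` and predictions `[0, 0, 1, 1, 1, 1]` give a purity of 5/6.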
Accuracy of State Estimation over Time
[Figure: Feature 1 (y-axis) against Time (x-axis, 0 to 3) across the phases Train, Valid 1 and Valid 2; legend: Component A, B, C and Cluster 1, 2, 3]