
slide-1
SLIDE 1

Online Clustering of High-Dimensional Trajectories under Concept Drift

2011-09-07, ECMLPKDD 2011 Athens, Greece

Georg Krempl 1,2, Zaigham Siddiqui 2, Myra Spiliopoulou 2

1 University of Graz

georg.krempl@uni-graz.at

2 University of Magdeburg

{myra,siddiqui,krempl} @iti.cs.uni-magdeburg.de

1 / 25

slide-2
SLIDE 2

Outline

◮ Problem Description
  ◮ Motivation and Objectives
  ◮ Modeling Trajectories as Gaussian Mixtures
  ◮ Trajectory Clustering with Expectation Maximization (offline)
◮ TRACER Algorithm (online)
  ◮ Overview
  ◮ Initialisation
  ◮ Update, Clustering and Prediction
◮ Experiments
  ◮ Settings
  ◮ Results
◮ Conclusion

2 / 25

slide-3
SLIDE 3

Outline

◮ Problem Description
  ◮ Motivation and Objectives
  ◮ Modeling Trajectories as Gaussian Mixtures
  ◮ Trajectory Clustering with Expectation Maximization (offline)
◮ TRACER Algorithm (online)
  ◮ Overview
  ◮ Initialisation
  ◮ Update, Clustering and Prediction
◮ Experiments
  ◮ Settings
  ◮ Results
◮ Conclusion

3 / 25

slide-4
SLIDE 4

◮ CRM Application
  ◮ Customers are shopping online
  ◮ Money is spent on different product groups in a basket
  ◮ Multiple visits per customer
  ◮ Behaviour changing over time (recession, new product)
  ◮ Can we cluster customers?
    Can we predict values in the next basket?

4 / 25

slide-5
SLIDE 5

◮ CRM Application
  ◮ Customers are shopping online
  ◮ Money is spent on different product groups in a basket
  ◮ Multiple visits per customer
  ◮ Behaviour changing over time (recession, new product)
  ◮ Can we cluster customers?
    Can we predict values in the next basket?
◮ Trajectory Clustering Problem
  ◮ Customers: population of individuals
  ◮ Each visit: measurement; money spent in all product groups: measurement vector
  ◮ Customer history: trajectory
  ◮ Subpopulations of customers: clusters
  ◮ Multiple measurements per individual
  ◮ Measurements are not taken at equidistant times
  ◮ Distribution of measurements is subject to drift

4 / 25

slide-6
SLIDE 6

Clustering Trajectories under Drift: Objective

◮ Cluster individuals
◮ Track clusters over time
◮ Predict/Extrapolate cluster movements

[Figure: cluster positions at times t0–t3]

5 / 25

slide-7
SLIDE 7

Clustering Trajectories under Drift: Objective

◮ Cluster individuals
◮ Track clusters over time
◮ Predict/Extrapolate cluster movements

[Figure: cluster positions at times t0–t3]

6 / 25

slide-8
SLIDE 8

Clustering Trajectories under Drift: Objective

◮ Cluster individuals
◮ Track clusters over time
◮ Predict/Extrapolate cluster movements

[Figure: cluster positions at times t0–t3]

6 / 25

slide-9
SLIDE 9

Clustering Trajectories under Drift: Objective

◮ Cluster individuals
◮ Track clusters over time
◮ Predict/Extrapolate cluster movements

[Figure: cluster positions at times t0–t3]

6 / 25

slide-10
SLIDE 10

Clustering Trajectories under Drift

◮ Formulation as Gaussian Mixture Model
  ◮ zi = zi1, zi2, · · · , zini are the ni observations of the i-th individual
  ◮ K clusters with, for the k-th cluster:
    ◮ mixing proportion αk
    ◮ distribution parameters θk: the mean depends on time via regression coefficients βk, the covariance matrix Σk is static

7 / 25

slide-11
SLIDE 11

Clustering Trajectories under Drift

◮ Formulation as Gaussian Mixture Model
  ◮ zi = zi1, zi2, · · · , zini are the ni observations of the i-th individual
  ◮ K clusters with, for the k-th cluster:
    ◮ mixing proportion αk
    ◮ distribution parameters θk: the mean depends on time via regression coefficients βk, the covariance matrix Σk is static
◮ Likelihood of observing the trajectory of individual i:

  p(zi; Θ) = ∏_{l=1}^{ni} ∑_{k=1}^{K} αk p(zil; θk)   (1)

7 / 25
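The likelihood in Equation (1) multiplies, over an individual's observations, the mixture density of each observation. A minimal sketch, assuming univariate Gaussian components for brevity (the paper's components are multivariate with time-dependent means; the function names are illustrative):

```python
import math

def gaussian_pdf(z, mean, var):
    # Univariate Gaussian density; the paper uses multivariate
    # Gaussians with covariance Sigma_k per cluster.
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def trajectory_likelihood(observations, alphas, means, variances):
    """Eq. (1): p(z_i; Theta) = prod over l of sum over k of
    alpha_k * p(z_il; theta_k)."""
    likelihood = 1.0
    for z in observations:                        # product over the n_i observations
        mixture = sum(a * gaussian_pdf(z, m, v)   # sum over the K clusters
                      for a, m, v in zip(alphas, means, variances))
        likelihood *= mixture
    return likelihood
```

In practice one would work with log-likelihoods to avoid underflow on long trajectories.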

slide-12
SLIDE 12

EM Trajectory Clustering

◮ EM algorithm for the general likelihood maximisation problem: Dempster et al., 1977
◮ Offline EM Trajectory Clustering algorithm:
  ◮ Gaffney and Smyth, 1999
  ◮ Provides an initial clustering
  ◮ Problem: offline algorithm; how to use it in a stream?
    How robust is it against sudden change (non-smooth trajectories)?

8 / 25

slide-13
SLIDE 13

Outline

◮ Problem Description
  ◮ Motivation and Objectives
  ◮ Modelling Trajectories as Gaussian Mixtures
  ◮ Trajectory Clustering with Expectation Maximisation (offline)
◮ TRACER Algorithm (online)
  ◮ Overview
  ◮ Initialisation
  ◮ Update, Clustering and Prediction
◮ Experiments
  ◮ Settings
  ◮ Results
◮ Conclusion

9 / 25

slide-14
SLIDE 14

TRACER Algorithm

Overview

◮ Make an initial clustering using EM
◮ Update clustering:
  ◮ Estimate new position of clusters
  ◮ Assign new individuals to clusters
◮ Assumptions:
  ◮ Static number of clusters, K
  ◮ Static covariance matrices, Σk

10 / 25

slide-15
SLIDE 15

TRACER Algorithm

Overview

◮ Make an initial clustering using EM
◮ Update clustering:
  ◮ Estimate new position of clusters
  ◮ Assign new individuals to clusters
◮ Assumptions:
  ◮ Static number of clusters, K
  ◮ Static covariance matrices, Σk
◮ Approach: Kálmán filter (Kalman, 1960)

10 / 25

slide-16
SLIDE 16

Kálmán filter

◮ State transition: New state xs

xs = Axs−1 + ws (2)

◮ State-to-signal: Measurement z ∈ RD

zs = Hxs + vs (3)

11 / 25
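Given Equations (2) and (3), one predict/update cycle of the filter can be sketched as follows. This is a generic textbook Kalman step, not TRACER-specific code; the matrix names match the slides:

```python
import numpy as np

def kalman_step(x_hat, P, z, A, H, Q, R):
    """One Kalman predict/update cycle for the model
    x_s = A x_{s-1} + w_s  (Eq. 2),  z_s = H x_s + v_s  (Eq. 3)."""
    # Predict: propagate the state estimate and its error covariance.
    x_pred = A @ x_hat
    P_pred = A @ P @ A.T + Q
    # Update: blend the prediction with the new measurement z.
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_new, P_new
```

With equal prior and measurement uncertainty the updated estimate lands halfway between prediction and measurement, which is a quick sanity check on the gain computation.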

slide-17
SLIDE 17

Kálmán filter

◮ State transition: New state xs

xs = Axs−1 + ws (2)

◮ State-to-signal: Measurement z ∈ RD

zs = Hxs + vs (3)

◮ States: true (unobservable) cluster centroids, a vector of length D ∗ (O + 1)
◮ The Kálmán filter computes at each discrete time step s:
  ◮ a state estimate for each cluster: x̂s
  ◮ an error estimate on the cluster state: Ps
◮ Questions:
  ◮ How to choose x̂0, A, Q, H, R?
  ◮ How to assign individuals to clusters?

11 / 25

slide-18
SLIDE 18

TRACER Initialisation

Initial State of Each Cluster

State is initialised from the β-coefficients obtained via EM

◮ State vector µ0 of size (D ∗ (O + 1)) × 1 at t = 0:

  f(0) = (f1(0), · · · , fD(0))

◮ d-th coordinate estimate:

  fd(t) = βd0 + t βd1 + · · · + t^O βdO

◮ Covariance matrix Σ0: identity matrix

12 / 25
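The polynomial coordinate estimate above can be evaluated directly from the regression coefficients. A sketch that builds only the position block of the state (the full state vector µ0 also stacks the derivative terms up to order O, omitted here; the function names are illustrative):

```python
def coordinate_estimate(beta, t):
    """f_d(t) = beta_d0 + t*beta_d1 + ... + t^O * beta_dO for one
    coordinate d, given its coefficients beta = (beta_d0, ..., beta_dO)."""
    return sum(b * t ** o for o, b in enumerate(beta))

def initial_position(betas):
    """f(0) = (f_1(0), ..., f_D(0)): the position block of the
    initial state vector, one entry per dimension d."""
    return [coordinate_estimate(beta, 0.0) for beta in betas]
```

At t = 0 each coordinate reduces to its intercept βd0, so the position block is just the vector of intercepts.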

slide-19
SLIDE 19

TRACER Initialisation

State Transition Matrix A

◮ Matrix A = [aij] with

  aij = δq = ∆^q / q!   if ∃ q ∈ N0 : i − j + D ∗ q = 0
  aij = 0               otherwise

13 / 25

slide-20
SLIDE 20

TRACER Initialisation

State Transition Matrix A

◮ Matrix A = [aij] with

  aij = δq = ∆^q / q!   if ∃ q ∈ N0 : i − j + D ∗ q = 0
  aij = 0               otherwise

◮ Example for D = 2 and O = 2:

      [ a0  0  a1  0  a2  0 ]
      [  0 a0   0 a1   0 a2 ]
  A = [  0  0  a0  0  a1  0 ]   with a0 = 1, a1 = ∆, a2 = ∆²/2
      [  0  0   0 a0   0 a1 ]
      [  0  0   0  0  a0  0 ]
      [  0  0   0  0   0 a0 ]

13 / 25
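The definition of A translates into code directly; the entries ∆^q/q! are the Taylor coefficients that propagate position, speed, and higher derivatives one time step ∆ forward. A sketch using 0-based indices, whereas the slide's formula is 1-based:

```python
import math

def transition_matrix(D, O, delta):
    """Build A = [a_ij] with a_ij = delta**q / q! whenever
    j = i + D*q for some q in N0 (the slide's i - j + D*q = 0),
    and 0 otherwise. The state vector has length D*(O+1)."""
    n = D * (O + 1)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for q in range(O + 1):
            j = i + D * q
            if j < n:
                A[i][j] = delta ** q / math.factorial(q)
    return A
```

For D = 2 and O = 2 this reproduces the slide's example with a0 = 1, a1 = ∆, a2 = ∆²/2.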

slide-21
SLIDE 21

TRACER Initialisation

Process Noise Covariance Matrix Q

◮ Identity matrix multiplied by the process noise factor q̂: Q = q̂ ∗ I

14 / 25

slide-22
SLIDE 22

TRACER Initialisation

Process Noise Covariance Matrix Q

◮ Identity matrix multiplied by the process noise factor q̂: Q = q̂ ∗ I

Measurement (or state-to-signal) Matrix H

◮ Set equal to the identity matrix, H = I

14 / 25

slide-23
SLIDE 23

TRACER Initialisation

Process Noise Covariance Matrix Q

◮ Identity matrix multiplied by the process noise factor q̂: Q = q̂ ∗ I

Measurement (or state-to-signal) Matrix H

◮ Set equal to the identity matrix, H = I

Measurement Noise Covariance Matrix R

◮ Computed as covariance matrix of EM clustering

14 / 25

slide-24
SLIDE 24

TRACER Update and Clustering

On arrival of a new measurement z:

◮ Calculate meta-features (speed, acceleration, etc.)
◮ Update matrices A and R
◮ Estimate new cluster positions
◮ Calculate the measurement's cluster membership probability
◮ If z belongs to a known individual: update the individual's cluster membership probability

15 / 25
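The membership-probability part of this update step can be sketched as follows, assuming univariate Gaussian clusters and a simple multiplicative rule for folding a new measurement into an individual's running membership estimate (both are simplifications; the paper works with the full multivariate densities):

```python
import math

def membership_probabilities(z, centroids, alphas, var=1.0):
    """Responsibility of each cluster for measurement z
    (1-D Gaussian sketch with shared variance)."""
    weights = [a * math.exp(-(z - c) ** 2 / (2 * var))
               for a, c in zip(alphas, centroids)]
    total = sum(weights)
    return [w / total for w in weights]

def update_individual(history_probs, p_new):
    """Fold a new measurement's membership probabilities into an
    individual's running cluster-membership estimate (simple
    multiplicative update with renormalisation, an assumption)."""
    combined = [h * p for h, p in zip(history_probs, p_new)]
    s = sum(combined)
    return [c / s for c in combined]
```

A measurement close to one centroid concentrates nearly all membership mass on that cluster, which is what then drives the confidence-weighted Kalman update.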

slide-25
SLIDE 25

Outline

◮ Problem Description

◮ Motivation and Objectives ◮ Modelling Trajectories as Gaussian Mixtures ◮ Trajectory Clustering with Expectation Maximisation (offline)

◮ TRACER Algorithm (online)

◮ Overview ◮ Initialisation ◮ Update, Clustering and Prediction

◮ Experiments

◮ Settings ◮ Results

◮ Conclusion

16 / 25

slide-26
SLIDE 26

Objective

◮ Similar clustering quality of EM and TRACER?
◮ Robustness against sudden shift
◮ Speed and suitability for online processing

17 / 25

slide-27
SLIDE 27

Objective

◮ Similar clustering quality of EM and TRACER?
◮ Robustness against sudden shift
◮ Speed and suitability for online processing

Synthetic Data Streams with Drift

◮ 5 types of synthetic data sets:
  ◮ Different state transition noise (A: high, C: low)
  ◮ Different number of dimensions (A, · · · , C: one; D, E: two)
◮ 10 data sets per type
◮ 1500 individuals, on average 2 measurements per individual
◮ 1000 measurements for training, 1000 for test before shift, 1000 for test after shift

17 / 25

slide-28
SLIDE 28

Update Strategies

Method | Description
EM     | Expectation Maximisation (multivariate variant of [Gaffney and Smyth, 1999])
K-1    | Kalman; confidence proportional to squared membership probability
K-2    | Kalman; confidence ∈ {0, 1}, winner-takes-all
K-3    | Kalman; confidence proportional to membership probability
K-4    | As K-1, but 10x higher state-transition noise factor estimate
K-5    | As K-1, but 10x smaller state-transition noise factor estimate
K-6    | As K-1, but uses speed and acceleration as meta-features for membership probability estimation p

18 / 25
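One plausible reading of the confidence factors c in the strategy table (an interpretation, not verbatim from the paper) is that c scales the measurement-noise covariance fed into the Kalman update, so confident assignments (large membership probability p) let a measurement move the cluster estimate more. The 0.5 threshold for winner-takes-all is likewise an assumption:

```python
def effective_noise(R, p, strategy):
    """Effective measurement-noise scaling per update strategy.
    Interpretation of the slides' c = 1/p^2 (K-1) and c = 1/p (K-3)
    notation: multiply the noise R by c, so large p shrinks the
    noise; K-2 either uses the measurement fully or ignores it."""
    if p <= 0:
        return float("inf")                      # no membership: ignore measurement
    if strategy == "K-1":
        return R / p ** 2                        # confidence ~ p^2
    if strategy == "K-2":
        return R if p >= 0.5 else float("inf")   # winner-takes-all (0.5 is an assumption)
    if strategy == "K-3":
        return R / p                             # confidence ~ p
    raise ValueError(f"unknown strategy: {strategy}")
```

An infinite effective noise makes the Kalman gain vanish, which is the standard way to express "this measurement does not update this cluster".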

slide-29
SLIDE 29

Measure

◮ Cluster Purity:

  purity = (1/N) · ∑_{j=1}^{K} max_{i=1,...,K} Cij

  Cij: number of elements in the i-th true and j-th predicted cluster
  N: total number of elements

◮ Wilcoxon signed rank sum test: significance of differences in clustering quality

19 / 25
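The purity measure can be coded directly from its definition: for each predicted cluster j, count the largest overlap with any true cluster (the max over i of Cij), sum these counts, and divide by N:

```python
def purity(true_labels, pred_labels):
    """purity = (1/N) * sum over predicted clusters j of
    max over true clusters i of C_ij."""
    members = {}                 # predicted cluster -> true labels of its members
    for t, p in zip(true_labels, pred_labels):
        members.setdefault(p, []).append(t)
    matched = sum(max(group.count(t) for t in set(group))
                  for group in members.values())
    return matched / len(true_labels)
```

For example, purity([0, 0, 1, 1], [0, 1, 1, 1]) gives 0.75: the second predicted cluster's best-matching true cluster covers two of its three members.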

slide-30
SLIDE 30

Accuracy of State Estimation over Time

[Figure: estimated positions on Feature 1 over time; Components A, B, C track Clusters 1, 2, 3 through the Train, Valid 1 and Valid 2 phases]

20 / 25

slide-31
SLIDE 31

Dependence of Purity on Shift, and of Speed on Dataset Size

[Figure: purity of EM and K-1 to K-6, with and without shift]

21 / 25

slide-32
SLIDE 32

Dependence of Purity on Shift, and of Speed on Dataset Size

[Figures: purity of EM and K-1 to K-6 with and without shift; runtime of EM and K-1 over dataset sizes 2000 and 3000]

Method | Purity (No Shift) | Purity (Shift) | Time (2000) | Time (3000) | Description
EM     | 0.93 | 0.91 | 72.74 | 95.42 | Offline Expectation Maximisation
K-1    | 0.82 | 0.88 |  5.84 |  5.92 | Squared membership prob., c = 1/p²
K-2    | 0.77 | 0.80 |  5.54 |  5.68 | Winner-takes-all
K-3    | 0.82 | 0.88 |  5.82 |  6.10 | Membership prob. as weights, c = 1/p
K-4    | 0.81 | 0.86 |  5.76 |  5.92 | As K-1, but ST noise estimated 10x higher
K-5    | 0.77 | 0.79 |  5.72 |  6.12 | As K-1, but ST noise estimated 10x lower
K-6    | 0.79 | 0.84 |  5.84 |  5.92 | As K-1, but speed and acceleration as features for p estimation

21 / 25

slide-33
SLIDE 33

Outline

◮ Problem Description
  ◮ Motivation and Objectives
  ◮ Modelling Trajectories as Gaussian Mixtures
  ◮ Trajectory Clustering with Expectation Maximisation (offline)
◮ TRACER Algorithm (online)
  ◮ Overview
  ◮ Initialisation
  ◮ Update, Clustering and Prediction
◮ Experiments
  ◮ Settings
  ◮ Results
◮ Conclusion

22 / 25

slide-34
SLIDE 34

Conclusion

Summary

◮ Trajectory clustering: e.g. customers with purchase histories
◮ TRACER Algorithm: online trajectory clustering and tracking
◮ Compared to offline EM: competitive quality, much faster, robust against shift; of particular interest when clustering streams

23 / 25

slide-35
SLIDE 35

Conclusion

Summary

◮ Trajectory clustering: e.g. customers with purchase histories
◮ TRACER Algorithm: online trajectory clustering and tracking
◮ Compared to offline EM: competitive quality, much faster, robust against shift; of particular interest when clustering streams

Outlook

◮ Real-world application and experiments
◮ Dynamic covariance matrices (changing R over time), dynamic number of clusters (changing K over time)
◮ Smoothness of prediction
◮ Consider the case where individuals change their cluster membership over time

23 / 25

slide-36
SLIDE 36

Conclusion

Questions? Thank you!

Source code available online: https://bitbucket.org/geos/tracer-trajectory-tracking/overview

24 / 25

slide-37
SLIDE 37

Bibliography

A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.

S. Gaffney and P. Smyth. Trajectory clustering with mixtures of regression models. In KDD '99, pages 63–72. ACM, 1999.

Y. Han, J. de Veth, and L. Boves. Trajectory clustering for automatic speech recognition, 2005.

X. Jiang and N. Petkov, editors. Computer Analysis of Images and Patterns, 13th International Conference, CAIP 2009, Münster, Germany, September 2–4, 2009. Proceedings, volume 5702 of Lecture Notes in Computer Science. Springer, 2009.

R. E. Kalman. A new approach to linear filtering and prediction problems. Trans. of the ASME – Journal of Basic Engineering, 82(Series D):35–45, 1960.

G. Xiong, C. Feng, and L. Ji. Dynamical Gaussian mixture model for tracking elliptical living objects. Pattern Recognition Letters, 27:838–842, May 2006.

25 / 25