Time-focused density-based clustering of trajectories of moving objects
Margherita D’Auria, Mirco Nanni, Dino Pedreschi
Introduction
  Motivations
  Problem & context
  Density-based clustering (OPTICS)
Density-based clustering on trajectories
  Trajectory data model
  Distance measure
  Results
Temporal Focusing
  A clustering quality measure
  Heuristics for the optimal temporal interval
Conclusions & future work
Data mining methods: which kinds of patterns/models?
A better understanding of the application domain; an improvement of private and public services
Devices: PDAs, mobile phones, LBS-enabled devices (which may include the two above)
Discovering groups of individuals that (approximately) move together in some period of time
E.g.: detection of traffic jams during rush hours
Clustering of individuals’ trajectories
Non-spherical clusters should be allowed
E.g.: the individuals in a traffic jam along a road form a “snake-shaped” cluster, and it should be represented as such
Requirements: tolerance to noise; low computational cost; applicability to complex, possibly non-vectorial data
In particular, we adopt OPTICS
A density threshold is defined through two parameters:
ε: a neighborhood radius
MinPts: a minimum number of points
Key concepts:
Core objects: objects whose ε-neighborhood contains at least MinPts objects
Reachability-distance reach-d(p, q), simplified definition: the distance between objects p and q
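To make the key concepts concrete, here is a minimal Python sketch, assuming 2-D points and Euclidean distance. It uses the standard (non-simplified) OPTICS definitions: the core-distance of o is the distance to the MinPts-th closest object in its ε-neighborhood (o itself included), and reach-d(p, o) = max(core-distance(o), d(o, p)) when o is a core object. The function names are illustrative only.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def eps_neighborhood(data: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all objects within distance eps of data[i] (data[i] included)."""
    return [j for j, q in enumerate(data) if math.dist(data[i], q) <= eps]


def core_distance(data: List[Point], i: int, eps: float, min_pts: int) -> Optional[float]:
    """Distance to the MinPts-th closest object in the eps-neighborhood of data[i]
    (counting data[i] itself), or None when data[i] is not a core object."""
    neigh = eps_neighborhood(data, i, eps)
    if len(neigh) < min_pts:
        return None
    return sorted(math.dist(data[i], data[j]) for j in neigh)[min_pts - 1]


def reach_distance(data: List[Point], p: int, o: int, eps: float, min_pts: int) -> Optional[float]:
    """Standard OPTICS reach-d(p, o) = max(core-distance(o), d(o, p));
    undefined (None) when o is not a core object."""
    cd = core_distance(data, o, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(data[o], data[p]))
```

With MinPts = 2 and ε = 1, the point (0, 0) in `[(0, 0), (0.5, 0), (3, 0)]` is a core object with core-distance 0.5, while the isolated point (3, 0) is not.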
Example:
Object q is a core object if MinPts = 2; object p is not. Their reach-d(p, q) is shown.
[Figure: ε-neighborhood of q and reach-d(p, q), plotted on X and Y axes]
The algorithm:
1. Repeatedly choose a random non-visited object, until a core object is selected
2. Select the non-visited object having the smallest reachability distance from the visited core objects; if none can be found, go to step 1
Output: reach-d() of all visited points, in order of visit (reachability plot)
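A compact Python sketch of this ordering loop, under stated assumptions: range queries are naive linear scans, step 1 scans objects in index order instead of at random, and `optics`, `dist`, `eps`, `min_pts` are illustrative names. It follows standard OPTICS: any non-visited object in the seed list can be selected next, but only core objects add new seeds. Objects never reached from a core object keep reach-d = infinity, which is what produces the "jumps" in a reachability plot.

```python
import math
from typing import Any, Callable, List, Optional, Sequence, Set, Tuple


def optics(data: Sequence[Any], dist: Callable[[Any, Any], float],
           eps: float, min_pts: int) -> List[Tuple[int, float]]:
    """Return the reachability plot as (object index, reach-d) pairs in visit order."""
    n = len(data)
    processed = [False] * n
    reach = [math.inf] * n          # inf = not (yet) reachable from a core object
    order: List[Tuple[int, float]] = []

    def neighbors(i: int) -> List[int]:
        # Range query as a naive O(n) scan (no spatial index).
        return [j for j in range(n) if dist(data[i], data[j]) <= eps]

    def core_dist(i: int, neigh: List[int]) -> Optional[float]:
        if len(neigh) < min_pts:
            return None             # data[i] is not a core object
        return sorted(dist(data[i], data[j]) for j in neigh)[min_pts - 1]

    def expand(i: int, seeds: Set[int]) -> None:
        # If data[i] is a core object, lower the reach-d of its unvisited neighbours.
        neigh = neighbors(i)
        cd = core_dist(i, neigh)
        if cd is None:
            return
        for j in neigh:
            if not processed[j]:
                new_reach = max(cd, dist(data[i], data[j]))
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    seeds.add(j)

    for start in range(n):          # step 1 (index order instead of random)
        if processed[start]:
            continue
        processed[start] = True
        order.append((start, reach[start]))
        seeds = set()
        expand(start, seeds)
        while seeds:                # step 2: smallest reach-d first
            q = min(seeds, key=lambda j: reach[j])
            seeds.remove(q)
            processed[q] = True
            order.append((q, reach[q]))
            expand(q, seeds)
    return order
```

For example, `optics([0.0, 0.1, 0.3, 5.0, 5.2], lambda a, b: abs(a - b), 1.0, 2)` visits the objects in index order, with infinite reach-d for objects 0 and 3: the start of each of the two groups.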
[Figure: reachability plot. A “jump” separates the left group from the right one; a reachability threshold identifies Cluster 1 and Cluster 2]
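Reading clusters off a reachability plot with a threshold can be sketched as follows. This is a simplified cut and an assumption of this sketch: an object whose reach-d exceeds the threshold opens a new group, following objects below the threshold join it, and single-member groups are treated as noise (core-distances are ignored). `extract_clusters` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def extract_clusters(order: Sequence[Tuple[int, float]],
                     threshold: float) -> Tuple[List[List[int]], List[int]]:
    """Cut a reachability plot (index, reach-d pairs in visit order) at `threshold`."""
    groups: List[List[int]] = []
    for idx, r in order:
        if r > threshold or not groups:
            groups.append([idx])    # a "jump": start a new group
        else:
            groups[-1].append(idx)  # low reach-d: same dent as the previous object
    clusters = [g for g in groups if len(g) > 1]
    noise = [g[0] for g in groups if len(g) == 1]
    return clusters, noise
```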
A suitable representation for trajectories is needed: which data model for trajectories?
A means of comparing trajectories has to be provided: which distance between objects? OPTICS needs one to perform range queries
Each trajectory is represented as a set of time-stamped coordinates T = (t1, x1, y1), …, (tn, xn, yn), meaning that the object's position at time ti was (xi, yi)
Parametric-spaghetti: linear interpolation between consecutive points
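Under the parametric-spaghetti model, the position of an object at any time between two samples follows from linear interpolation. A minimal Python sketch, assuming samples are stored as (t, x, y) tuples with strictly increasing timestamps; `position_at` is a hypothetical helper name.

```python
from bisect import bisect_right
from typing import List, Tuple

Trajectory = List[Tuple[float, float, float]]  # (t, x, y), strictly increasing t


def position_at(traj: Trajectory, t: float) -> Tuple[float, float]:
    """Position at time t by linear interpolation between consecutive samples."""
    times = [p[0] for p in traj]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the trajectory's time span")
    k = bisect_right(times, t)      # first sample strictly after t
    if k == len(times):             # t is the last timestamp
        return traj[-1][1], traj[-1][2]
    t0, x0, y0 = traj[k - 1]
    t1, x1, y1 = traj[k]
    a = (t - t0) / (t1 - t0)        # fraction of the segment elapsed at time t
    return x0 + a * (x1 - x0), y0 + a * (y1 - y0)
```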
$$D(\tau_1, \tau_2)\big|_T = \frac{\int_T d\big(\tau_1(t), \tau_2(t)\big)\,dt}{|T|}$$
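A numerical approximation shows what D measures: the distance between the two objects averaged over the interval T. This Python sketch assumes d is the Euclidean distance, T = [t_start, t_end] lies within both trajectories' time spans, and it approximates the integral with the trapezoidal rule on a uniform grid (a swapped-in numerical method, not necessarily how the authors evaluate D). `_pos` and `avg_distance` are hypothetical names.

```python
import math
from bisect import bisect_right


def _pos(traj, t):
    # Linear interpolation on (t, x, y) samples with strictly increasing t (>= 2 samples).
    times = [p[0] for p in traj]
    k = min(bisect_right(times, t), len(traj) - 1)
    (t0, x0, y0), (t1, x1, y1) = traj[k - 1], traj[k]
    a = (t - t0) / (t1 - t0)
    return x0 + a * (x1 - x0), y0 + a * (y1 - y0)


def avg_distance(tr1, tr2, t_start, t_end, steps=1000):
    """Approximate D(tr1, tr2)|_T = (1/|T|) * integral over T of d(tr1(t), tr2(t)) dt."""
    h = (t_end - t_start) / steps
    vals = [math.dist(_pos(tr1, t_start + i * h), _pos(tr2, t_start + i * h))
            for i in range(steps + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoidal rule
    return integral / (t_end - t_start)
```

Two objects moving in parallel 3 units apart get D = 3 over any common interval, whatever their speed.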
E.g.: objects that are close to each other within a time interval
E.g.: rush hours vs. low-traffic times
An automated mechanism is needed to find such intervals
1. High-density clusters separated by low-density noise are preferred
2. High-density clusters correspond to low dents in the reachability plot
=> Evaluate the global quality Q of the clustering of dataset D as the (negated) average reachability within clusters, noise being discarded (ε': reachability threshold):
$$Q_{D,\varepsilon'} = -R(D, \varepsilon') = -\operatorname{AVG}_{o \in D'}\ \text{reach-d}(o), \qquad D' = D \setminus \{\text{noise objects}\}$$
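The quality measure can be computed directly from a reachability plot. A minimal Python sketch; the assumption here is that an object counts as noise when its reach-d is above the threshold ε', so cluster "entry" points with a large reach-d are excluded as well. `clustering_quality` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence, Tuple


def clustering_quality(order: Sequence[Tuple[int, float]], threshold: float) -> float:
    """Q = -R: minus the average reach-d of the non-noise objects.

    Assumption: objects with reach-d > threshold (eps') are treated as noise."""
    kept = [r for _, r in order if r <= threshold]
    if not kept:
        return float("-inf")  # everything is noise: worst possible quality
    return -mean(kept)
```

Denser clusters (lower dents in the plot) yield a smaller average reach-d and hence a higher Q.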
Step 1: trajectory segments outside I are clipped away
Step 2: OPTICS is run on the clipped trajectories
Step 3: Q(I) is computed on the output reachability plot
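Step 1 can be sketched as follows: keep the samples strictly inside I and add interpolated boundary points at I's endpoints. This assumes (t, x, y) samples with strictly increasing timestamps and at least two samples per trajectory; `clip` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Trajectory = List[Tuple[float, float, float]]  # (t, x, y), strictly increasing t


def clip(traj: Trajectory, t_start: float, t_end: float) -> Optional[Trajectory]:
    """Restrict traj to I = [t_start, t_end]; None if traj does not overlap I."""
    lo, hi = max(t_start, traj[0][0]), min(t_end, traj[-1][0])
    if lo > hi:
        return None

    def interp(t: float) -> Tuple[float, float, float]:
        # Linear interpolation on the segment containing t.
        for (t0, x0, y0), (t1, x1, y1) in zip(traj, traj[1:]):
            if t0 <= t <= t1:
                a = (t - t0) / (t1 - t0)
                return (t, x0 + a * (x1 - x0), y0 + a * (y1 - y0))
        raise ValueError("t outside the trajectory's time span")

    inner = [p for p in traj if lo < p[0] < hi]
    return [interp(lo)] + inner + ([interp(hi)] if hi > lo else [])
```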
A reachability threshold is needed in order to locate clusters (and noise):
the threshold for the largest interval I is set manually by the user;
thresholds for other intervals I' ⊂ I are computed from the first one by proportionally rescaling w.r.t. the average reachability
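One reading of "proportionally rescaling w.r.t. average reachability" is to keep the ratio threshold / average reachability constant across intervals. The Python sketch below assumes that reading and skips infinite reach-d values when averaging; both choices, and the function names, are assumptions.

```python
import math
from statistics import mean
from typing import Sequence


def average_reachability(reach: Sequence[float]) -> float:
    # Infinite reach-d values (objects not reachable from any core object) are skipped.
    return mean(r for r in reach if math.isfinite(r))


def rescale_threshold(user_threshold: float, reach_largest: Sequence[float],
                      reach_sub: Sequence[float]) -> float:
    """Threshold for a sub-interval I' of the largest interval I."""
    return user_threshold * average_reachability(reach_sub) / average_reachability(reach_largest)
```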
=> A small decrease in Q(I) is accepted when it yields a much larger I
A more complex sample dataset (generated by CENTRE)
Clear clusters in the central time interval vs. dispersion on the borders
Find the optimal Q() by plotting its values for all time intervals
The optimum corresponds to the central time interval
Each Q() value computation requires a run of the OPTICS algorithm
Computing all O(N²) values is too expensive (N = number of elementary time sub-intervals)
Alternative approaches are needed
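The cost of the exhaustive approach is easy to see in code: with N elementary sub-intervals there are N(N+1)/2 candidate intervals, each requiring one OPTICS run. In this Python sketch `quality(s, e)` is a hypothetical stand-in for steps 1 to 3 (clip, run OPTICS, compute Q) on the interval made of sub-intervals s to e-1.

```python
from typing import Callable, Optional, Tuple


def exhaustive_search(n: int, quality: Callable[[int, int], float]
                      ) -> Tuple[Optional[Tuple[int, int]], float, int]:
    """Evaluate every interval [s, e) over n elementary sub-intervals.

    Returns the best interval, its Q value and the number of quality() calls."""
    best, best_q, calls = None, float("-inf"), 0
    for s in range(n):
        for e in range(s + 1, n + 1):
            q = quality(s, e)   # one OPTICS run per interval
            calls += 1
            if q > best_q:
                best, best_q = (s, e), q
    return best, best_q, calls
```

For n = 10 this already means 55 OPTICS runs; the count grows quadratically with n.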
Preliminary tests with a hill-climbing (i.e., greedy) approach:
[Figure: hill-climbing runs in the (Tstart, Tend) plane, both axes from 20 to 100; legend: starting points, local, global]
Test on the same dataset:
Global optimum found in 70.7% of runs
Avg. number of steps: 17
Avg. OPTICS runs: 49
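A greedy search over (Tstart, Tend) can be sketched in Python as follows. The neighbourhood (moving either endpoint by one elementary sub-interval) and the caching of Q values are assumptions of this sketch; with the cache, the number of distinct intervals evaluated equals the number of OPTICS runs. `quality` is the same hypothetical stand-in for clip + OPTICS + Q.

```python
from typing import Callable, Dict, Tuple

Interval = Tuple[int, int]


def hill_climb(start: Interval, n: int, quality: Callable[[int, int], float]
               ) -> Tuple[Interval, float, int, int]:
    """Greedy ascent over intervals [s, e) with 0 <= s < e <= n.

    Returns the final interval, its Q, the number of steps and the number of
    distinct intervals evaluated (OPTICS runs)."""
    cache: Dict[Interval, float] = {}

    def q(iv: Interval) -> float:
        if iv not in cache:
            cache[iv] = quality(*iv)  # one OPTICS run per new interval
        return cache[iv]

    current, steps = start, 0
    while True:
        s, e = current
        moves = [(s - 1, e), (s + 1, e), (s, e - 1), (s, e + 1)]
        candidates = [(a, b) for a, b in moves if 0 <= a < b <= n]
        best = max(candidates, key=q, default=current)
        if q(best) <= q(current):     # no improving neighbour: local optimum
            return current, q(current), steps, len(cache)
        current, steps = best, steps + 1
```

Like any hill-climbing method it can stop at a local optimum, which is why the outcome depends on the starting point.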
Conclusions:
Extension of OPTICS to a trajectory data model & distance
Definition of the Temporal Focusing problem
Definition of a clustering quality measure
(Preliminary) tests with exhaustive & greedy optimization
Future work:
Experimental validation over broader benchmarks
Tighter integration between OPTICS and the search strategy
Alternative, domain-specific definitions of quality measures