Trajectory Clustering: Visual Analytics Approaches
Gennady Andrienko & Natalia Andrienko
http://geoanalytics.net
Outline
- Similarity measures for trajectories
- Density-based clustering of trajectories
- Using Space-Time Cube for interpreting clusters
- Progressive clustering
- Using projection for assessing clustering results
- Sammon’s projection for trajectory clustering
- Clustering of large trajectory data sets
- Analysis of trajectory attributes in clusters
Projection
[Figure: a set of objects mapped onto a projection space]
Basic idea: similar objects map to close points; dissimilar objects map to distant points
This requires a numeric measure of (dis)similarity
Measuring (dis)similarity
- Approach 1: use feature vectors
- Describe objects by values of N numeric attributes (features) chosen according to the analysis goals
- Feature vector == list of N attribute values
- Dissimilarity == distance between the vectors in the N-dimensional abstract space of the possible combinations of the attribute values (e.g. Euclidean distance)
- However, objects may have complex properties that cannot be adequately represented by numeric features
- Approach 2: devise an ad-hoc distance function, i.e. an algorithm to measure dissimilarity
[Figure labels, example object attributes: Purpose, Material, Colour, Size, Shape, Weight, Producer, …]
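As a rough illustration of Approach 1, the sketch below computes the Euclidean distance between two feature vectors. The function name and the example attributes (size, weight) are hypothetical and chosen only for illustration.

```python
import math

def euclidean_distance(u, v):
    """Distance between two feature vectors of equal length N."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical objects described by two numeric features (size, weight).
object_a = (3.0, 4.0)
object_b = (0.0, 0.0)
d = euclidean_distance(object_a, object_b)
```

In practice the features would first be brought to comparable scales, since an attribute with a large numeric range otherwise dominates the distance.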
Using feature vectors
- Clustering of table data: standard clustering methods
- Which trajectory attributes to use?
- N points
- Duration
- Travelled distance
- Displacement
- Direction
- Sinuosity
- Tortuosity
- Average / Median / Max speed
- N intermediate stops
- Total / Max duration of intermediate stops
- …
- How to avoid redundancy?
- How to normalize the selected attributes?
- Which clustering method to apply?
- How many clusters are expected?
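A minimal sketch of how some of the listed attributes could be derived from a trajectory, assuming points are (t, x, y) tuples in planar coordinates sorted by time. The function names, the straightness ratio used as a sinuosity-style measure, and min-max scaling as the normalization are all assumptions, not the authors' choices.

```python
import math

def trajectory_features(points):
    """points: list of (t, x, y) tuples sorted by time; planar coordinates assumed."""
    n = len(points)
    duration = points[-1][0] - points[0][0]
    # Sum of segment lengths along the path.
    travelled = sum(
        math.dist(points[i][1:], points[i + 1][1:]) for i in range(n - 1)
    )
    # Straight-line distance between first and last positions.
    displacement = math.dist(points[0][1:], points[-1][1:])
    return {
        "n_points": n,
        "duration": duration,
        "travelled_distance": travelled,
        "displacement": displacement,
        # Straightness ratio, used here as one possible sinuosity-style measure (assumption).
        "straightness": displacement / travelled if travelled > 0 else 1.0,
        "average_speed": travelled / duration if duration > 0 else 0.0,
    }

def min_max_normalize(values):
    """Scale one attribute column to [0, 1]; one common normalization option."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

track = [(0, 0.0, 0.0), (10, 3.0, 4.0), (20, 3.0, 0.0)]
features = trajectory_features(track)
```

The resulting dictionaries can be assembled into a table and passed to any standard clustering method; the questions on the slide about redundancy and the number of clusters remain open design choices.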
Trajectories are complex geographical objects:
- Trajectory: sequence (t1,s1), (t2,s2), … (ti: time moment, si: position in space)
- Similarity: routes (ignoring time), dynamics (relative time), coincidence (absolute time); no adequate representation by feature vectors
- Complexities: different sequence lengths, different time spacing, measurement errors and gaps, …
Distance functions for trajectories
- A library of easily understandable distance functions oriented to different properties
- More sophisticated distance functions are available through loose integration with HERMES TDW
- N.Pelekis, G.Andrienko, N.Andrienko, I.Kopanakis, G.Marketos, Y.Theodoridis. Visually Exploring Movement Data via Similarity-based Analysis. Journal of Intelligent Information Systems, 2012, v.38 (2), pp.343-391 http://dx.doi.org/10.1007/s10844-011-0159-2
Example of specific distance function: route similarity
- Finds corresponding points in two trajectories
- Computes the average distance between the corresponding points
- Accumulates the length of the corresponding parts
- Accumulates the deviations of non-corresponding points
- Penalty factor = (cumulative deviation) / (corresponding length)
- Penalty distance = (cumulative deviation) * (penalty factor)
- Final distance = average distance + penalty distance
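The last three formulas can be sketched as a small function. It covers only the final combination step: how corresponding points are found is not specified on the slide, so the three inputs are assumed to be already computed. The function name and the handling of a zero corresponding length are assumptions.

```python
import math

def route_distance(average_distance, cumulative_deviation, corresponding_length):
    """Combine the accumulated quantities into the final route-similarity distance."""
    if corresponding_length <= 0:
        # Assumption: with no corresponding parts the trajectories are treated as maximally distant.
        return math.inf
    penalty_factor = cumulative_deviation / corresponding_length
    penalty_distance = cumulative_deviation * penalty_factor
    return average_distance + penalty_distance

# Illustrative numbers only: average distance 10, deviation 20, corresponding length 100.
example = route_distance(10.0, 20.0, 100.0)
```

Because the penalty grows with the square of the cumulative deviation, trajectories that share only a short stretch of route end up far apart even when the shared stretch matches closely.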
Density-based Clustering Algorithm OPTICS
Ankerst, M., Breunig, M., Kriegel, H.-P., & Sander, J. (1999). OPTICS: Ordering Points to Identify the Clustering Structure. Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD'99). Philadelphia: ACM, 49-60
The algorithm assigns a reachability distance to each object. The first object is chosen randomly. At each following step, the object with the smallest reachability distance to the previously chosen objects is selected.
Eps: a distance threshold
MinPts: minimum number of neighbours for a core object of a cluster
[Figure: reachability plot; horizontal axis: the order of choosing the objects; Cluster 1 and Cluster 2 marked. Legend: a core object (has MinPts neighbours); r(p1), r(p2): reachability distances of p1 and p2]
Property of the algorithm: separates clusters from “noise”, typical from peculiar
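The ordering step described above can be sketched in plain Python on a precomputed distance matrix, which is how an arbitrary trajectory distance function would be plugged in. This is a simplified illustration, not the published implementation: the function name is hypothetical, objects are started in index order rather than randomly, and neighbours are counted excluding the object itself.

```python
import math

def optics_order(dist, eps, min_pts):
    """Return (visiting order, reachability distances) for a square distance matrix."""
    n = len(dist)
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def core_distance(p):
        # Distance to the min_pts-th nearest neighbour within eps, or None if p is not a core object.
        near = sorted(dist[p][q] for q in range(n) if q != p and dist[p][q] <= eps)
        return near[min_pts - 1] if len(near) >= min_pts else None

    for start in range(n):
        if processed[start]:
            continue
        seeds = {start: math.inf}  # candidate objects with their current reachability
        while seeds:
            p = min(seeds, key=seeds.get)  # smallest reachability goes next
            del seeds[p]
            processed[p] = True
            order.append(p)
            core = core_distance(p)
            if core is None:
                continue
            for q in range(n):
                if processed[q] or dist[p][q] > eps:
                    continue
                candidate = max(core, dist[p][q])
                if candidate < reach[q]:
                    reach[q] = candidate
                    seeds[q] = candidate
    return order, reach

# Two well-separated groups of points on a line.
xs = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in xs] for a in xs]
order, reach = optics_order(dist, eps=2, min_pts=2)
```

Plotting `reach` in the visiting order gives the reachability plot: valleys correspond to clusters, and an infinite value marks the start of a new group or a noise object.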
Clustering of trajectories
Spatial summarization of trajectory clusters
Details: N.Andrienko, G.Andrienko. Spatial Generalization and Aggregation of Massive Movement Data. IEEE Transactions on Visualization and Computer Graphics (TVCG), 2011, v.17 (2), pp.205-219 http://doi.ieeecomputersociety.org/10.1109/TVCG.2010.44
Space-time cube for cluster interpretation?
Time transformation in space-time cube
- Transformations with respect to temporal cycles, which include
- bringing the times of the trajectories to the same year or season,
- the same month,
- week,
- day,
- hour
- Transformations with respect to the individual lifelines of the trajectories, which include
- bringing the trajectories to a common start moment,
- a common end moment,
- common start and end moments
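Two of these transformations can be sketched with the standard `datetime` module: aligning timestamps to a daily or weekly cycle, and shifting a trajectory to a common start moment. The function names and the reference date (an arbitrary Monday) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

REFERENCE_DAY = datetime(2000, 1, 3)  # arbitrary reference Monday (assumption)

def to_same_day(t):
    """Keep only the time of day, so all trajectories fall into one reference day."""
    return REFERENCE_DAY + timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)

def to_same_week(t):
    """Keep day of week and time of day, so all trajectories fall into one reference week."""
    return to_same_day(t) + timedelta(days=t.weekday())

def to_common_start(times):
    """Shift a trajectory's timestamps so that it starts at time zero."""
    start = times[0]
    return [t - start for t in times]

t = datetime(2012, 5, 16, 8, 30)
same_day = to_same_day(t)
same_week = to_same_week(t)
relative = to_common_start([t, t + timedelta(minutes=5)])
```

After such a transformation the trajectories can be drawn in one space-time cube, where trips made at similar times of day or at similar stages of their lifelines line up vertically.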
Transformations with respect to temporal cycles: days
Transformations with respect to temporal cycles: weeks
Transformations with respect to individual lifelines
Progressive clustering
- Combine functions by applying one to results of another
- For example, cluster trajectories by common starts and ends, continue with route similarity {& dynamics} for clusters of interest
- The approach reduces computational cost by applying complex functions only to the results of cheap functions
- Each step brings easily interpretable results; following steps refine the results and our knowledge
- Details: S.Rinzivillo, D.Pedreschi, M.Nanni, F.Giannotti, N.Andrienko, G.Andrienko. Visually-driven analysis of movement data by progressive clustering. Information Visualization, 2008, v.7 (3/4), pp.225-239 http://dx.doi.org/10.1057/palgrave.ivs.9500183
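The two-step idea can be sketched as follows. This is a simplified stand-in, not the method from the cited paper: step 1 uses grid bucketing of start and end points instead of density-based clustering, step 2 uses single-link grouping, and `mean_point_distance` is a hypothetical placeholder for a costlier route-similarity function.

```python
import math
from collections import defaultdict

def group_by_start_end(trajectories, cell_size):
    """Cheap first step: bucket trajectories by the grid cells of their first and last points."""
    groups = defaultdict(list)
    for traj_id, pts in trajectories.items():
        (x0, y0), (x1, y1) = pts[0], pts[-1]
        key = (int(x0 // cell_size), int(y0 // cell_size),
               int(x1 // cell_size), int(y1 // cell_size))
        groups[key].append(traj_id)
    return dict(groups)

def refine(group_ids, trajectories, expensive_distance, threshold):
    """Second step: single-link grouping inside one chosen group using a costlier distance."""
    clusters = []
    for tid in group_ids:
        linked = [c for c in clusters
                  if any(expensive_distance(trajectories[tid], trajectories[o]) <= threshold
                         for o in c)]
        merged = [tid] + [o for c in linked for o in c]
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters

def mean_point_distance(a, b):
    # Placeholder for a route-similarity function; assumes equal-length point lists.
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

trajs = {
    "a": [(0, 0), (5, 0), (10, 0)],
    "b": [(1, 0), (5, 1), (11, 0)],
    "c": [(0, 1), (5, 9), (10, 1)],
    "d": [(50, 50), (60, 60)],
}
groups = group_by_start_end(trajs, cell_size=20)
refined = refine(groups[(0, 0, 0, 0)], trajs, mean_point_distance, threshold=2.0)
```

The costly distance is evaluated only among the three trajectories that share start and end cells; "d" is never compared with them, which is where the saving comes from.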
Projection techniques
- Principal components analysis (PCA)
- Self-organizing map (SOM): projection onto a discrete space (regular grid)
- PCA and SOM require input in form of feature vectors
- Multi-dimensional scaling (MDS)
- Sammon’s projection (a.k.a. Sammon’s mapping)
- MDS and Sammon’s projection can be applied to a pre-defined distance matrix, which can be computed using an arbitrary distance function
- Suitable for complex objects requiring specific distance functions
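Sammon's projection looks for point positions that minimize a stress value comparing the original distances with the distances in the projection. The sketch below computes that stress for a given layout; it does not perform the optimization itself. The function name and the skipping of zero distances are assumptions.

```python
import math

def sammon_stress(dist, points):
    """dist: symmetric matrix of original distances; points: projected coordinates."""
    n = len(points)
    total = 0.0   # sum of original distances
    error = 0.0   # sum of squared differences, each weighted by 1 / original distance
    for i in range(n):
        for j in range(i + 1, n):
            original = dist[i][j]
            if original == 0:
                continue  # coincident objects skipped to avoid division by zero (assumption)
            projected = math.dist(points[i], points[j])
            total += original
            error += (original - projected) ** 2 / original
    return error / total if total > 0 else 0.0

dist = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
exact = sammon_stress(dist, [(0, 0), (3, 0), (0, 4)])
distorted = sammon_stress(dist, [(0, 0), (3, 0), (0, 0)])
```

Dividing each squared difference by the original distance gives small distances more weight, so local neighbourhoods are preserved better than in a plain least-squares fit.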
Use of Sammon’s projection to explore clustering results
Density-based clusters of trajectories by route similarity
Exploring clustering results
- Are the clusters well-separated?
- Are they compact?
- How different are the clusters?
- How far are the clusters from the noise?
- How sensitive are the results to the parameters?
Exploring selected clusters
Use of projection to define clusters
Tessellation of the projection area
- Each polygon defines a cluster
Choosing cluster sizes
The user may interactively change the sizes (radii) of the clusters.
Further possible extension: interactive refinement or joining of selected clusters by direct manipulation in the projection display.
Assigning colours to clusters
Clusters of trajectories defined by means of projection
Progressive clustering with the use of projection
[Figure panels: Step 1, Step 2]
Results of step 2
Clustering of a very large dataset
- Problem: clustering of complex objects (such as trajectories) involving non-trivial distance functions (such as “route similarity”) can only be done in RAM, i.e. for a relatively small dataset
- Our approach:
- 1. Take a subset (sample) of the objects suitable for processing in RAM.
- 2. Discover clusters in the subset.
- 3. Load the remaining objects into RAM by portions. Classify each object = identify to which of the discovered clusters the object belongs. Store the result of the classification in the database.
- 4. Take the objects that remained unclassified and apply steps 1 to 3 to them. Repeat the procedure until no meaningful new clusters can be discovered.
- Question: how to identify the cluster where an object belongs?
- Details: G.Andrienko, N.Andrienko, S.Rinzivillo, M.Nanni, D.Pedreschi, F.Giannotti. Interactive Visual Clustering of Large Collections of Trajectories. IEEE Visual Analytics Science and Technology (VAST 2009) Proceedings, IEEE Computer Society Press, 2009, pp.3-10 http://dx.doi.org/10.1109/VAST.2009.5332584
Classifier, the main idea
- From each cluster Ci select one or more representative objects (prototypes) and respective distance thresholds: { (pt1, d1), …, (ptn, dn) } such that ∀ o ∈ Ci ∃ k, 1 ≤ k ≤ n: distance(o, ptk) < dk
- The set of all cluster prototypes with the respective distance thresholds defines the classifier
- A new object o may be ascribed to the cluster if the same condition holds for it.
- For each object from a large database:
- compute the distances to all prototypes;
- take the closest prototype among those with the distances below the thresholds and ascribe the object to the respective cluster;
- if no such prototypes found, label the object as unclassified.
- To select prototypes:
- Divide the cluster into “round” subclusters
- Take the medoid of each subcluster as one of the prototypes
- Take the maximum of the distances from the subcluster medoid to the subcluster members as the distance threshold for this prototype
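The prototype selection and classification rules can be sketched as below, assuming the subclusters are already given and the distance function is supplied by the caller. All function names are hypothetical. One deliberate deviation is labeled in the code: the sketch accepts distances equal to the threshold, so that the farthest subcluster member is still covered by its own prototype.

```python
def medoid(members, distance):
    """Member with the smallest total distance to all other members."""
    return min(members, key=lambda m: sum(distance(m, o) for o in members))

def build_classifier(subclusters, distance):
    """subclusters: list of (cluster_label, members); returns (prototype, threshold, label) triples."""
    classifier = []
    for label, members in subclusters:
        proto = medoid(members, distance)
        threshold = max(distance(proto, o) for o in members)
        classifier.append((proto, threshold, label))
    return classifier

def classify(obj, classifier, distance):
    """Label of the closest prototype within its threshold, or None if unclassified."""
    best = None
    for proto, threshold, label in classifier:
        d = distance(obj, proto)
        # Assumption: "<=" instead of the strict "<" on the slide, so the farthest member is covered.
        if d <= threshold and (best is None or d < best[0]):
            best = (d, label)
    return best[1] if best else None

# Illustration on one-dimensional point data with absolute difference as the distance.
def absolute_difference(a, b):
    return abs(a - b)

subclusters = [("A", [1, 2, 3]), ("A", [7, 8, 9]), ("B", [20, 22, 24])]
classifier = build_classifier(subclusters, absolute_difference)
labels = [classify(x, classifier, absolute_difference) for x in (2.5, 7, 23, 14)]
```

Cluster "A" is represented by two prototypes (2 and 8, each with threshold 1), which is how a non-round cluster is covered by several round subclusters; the value 14 falls outside every threshold and stays unclassified.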
Illustration of the idea on point data: represent a cluster by round sub-clusters and choose prototypes & radii
Division of a cluster into “round” subclusters
Is this all that is needed for a good analysis?
To obtain meaningful results, the analyst needs to review and, possibly, edit the classifier
- Some trajectories are not very similar to the others. Should such trajectories be in the cluster?
- Should I keep the three branches in one cluster? Or should I divide the cluster into two or three clusters?
- Is it good to have this prototype? This is not a core trajectory of the cluster.
Interactive editing of the classifier
Example of interactive editing: remove selected objects from a cluster
Example of interactive editing: move selected subclusters to a new cluster
Example of interactive editing: merge and refine subclusters
Application of the classifier to the whole database
Application of the classifier to the whole database: results
Selected clusters loaded from the database
Next steps of the analysis
- Put the unclassified trajectories in a new database table
- Select a subset (sample) from these trajectories
- Apply clustering to the sample
- Build a classifier for the discovered clusters
- Apply the classifier to the table with the unclassified trajectories
- Repeat until no interesting clusters can be found
- Observations:
- the number and sizes of the discovered clusters decrease with each round
- smaller clusters are cleaner: less editing of the classifier is needed; clusters are represented by fewer prototypes in the classifier; classification of the whole DB takes less time
Table columns: Round; Number of clusters found in the subset; Maximum cluster size in the subset; Maximum cluster size in the database
1 28 37 87 74 1525
2 23 22 488
3 17 27 418
4 18 16 289
Important properties of the method
- Synergistic cooperation between human and computer
- Computer finds clusters, summarizes, visualizes, makes draft classifiers
- Human not only takes the results but also directs computer’s work: selects appropriate data subsets, finds suitable clustering parameters; edits the suggested draft classifiers (involves interpretation, evaluation, adaptation to the goals of analysis)
- Scalability to very large datasets
- Documented VA process: all operations are logged; an annotated report is generated (as a collection of HTML pages and PNG images)
- Tangible analysis result: the classifier
- Describes the clusters without listing all their members
- May be applied to other data sets and to data streams
- It is possible to adapt this method to other types of data and problems
Analyzing trajectory attributes in clusters
- The trajectory wall.
- Details: C.Tominski, H.Schumann, G.Andrienko, N.Andrienko. Stacking-Based Visualization of Trajectory Attribute Data. IEEE Transactions on Visualization and Computer Graphics (Proceedings IEEE Information Visualization 2012), vol. 18(12), pp.???, Dec. 2012
Trajectory attributes: Space-Time Cube ?
Time ordering
Joint work with C.Tominski & H.Schumann, Univ. Rostock; details: http://dx.doi.org/10.1109/TVCG.2012.265
Traffic jam patterns in 4,000+ trajectories, 7 days
- tortuosity
Radiation between Fukushima & Tokyo
- tortuosity
Reminder
- Similarity measures for trajectories
- Density-based clustering of trajectories
- Using Space-Time Cube for interpreting clusters
- Progressive clustering
- Using projection for assessing clustering results
- Sammon’s projection for trajectory clustering
- Clustering of large trajectory data sets
- Analysis of trajectory attributes in clusters
Want to learn more? http://geoanalytics.net
Papers, presentations, demonstrators, ICA GeoVisualization community, …
- Key reviews:
- 1. Space, time and visual analytics http://dx.doi.org/10.1080/13658816.2010.508043 (IJGIS 2010)
- 2. A conceptual framework and taxonomy of techniques for analyzing movement http://dx.doi.org/10.1016/j.jvlc.2011.02.003 (JVLC 2011)
- 3. An event-based conceptual model for context-aware movement analysis http://dx.doi.org/10.1080/13658816.2011.556120 (IJGIS 2011)
- 4. Visual Analytics of Movement: an Overview of Methods, Tools, and Procedures http://geoanalytics.net/and/papers/ivs12.pdf (IVS 2012)
- 5. Exploratory Analysis of Spatial and Temporal Data. A Systematic Approach (Springer, 2006) ISBN 978-3-540-25994-7
- Software:
- http://geoanalytics.net/V-Analytics