Trajectory Clustering: Visual Analytics Approaches, Gennady Andrienko (PowerPoint presentation)

SLIDE 1

Trajectory Clustering: Visual Analytics Approaches

Gennady Andrienko & Natalia Andrienko, http://geoanalytics.net

SLIDE 2

Outline

  • Similarity measures for trajectories
  • Density-based clustering of trajectories
  • Using Space-Time Cube for interpreting clusters
  • Progressive clustering
  • Using projection for assessing clustering results
  • Sammon’s projection for trajectory clustering
  • Clustering of large trajectory data sets
  • Analysis of trajectory attributes in clusters

SLIDE 3

Projection

[Figure: a set of objects and the projection space]

Basic idea: similar objects → close points; dissimilar objects → distant points

This requires a numeric measure of (dis)similarity

SLIDE 4

Measuring (dis)similarity

  • Approach 1: use feature vectors
  • Describe objects by values of N numeric attributes (features) chosen according to the analysis goals
  • Feature vector == list of N attribute values
  • Dissimilarity == distance between the vectors in the N-dimensional abstract space of the possible combinations of the attribute values (e.g. Euclidean distance)
  • However, objects may have complex properties that cannot be adequately represented by numeric features
  • Approach 2: devise an ad-hoc distance function, i.e. an algorithm to measure dissimilarity

[Figure: an object described by attributes such as purpose, material, colour, size, shape, weight, producer]

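A minimal sketch of Approach 1, assuming made-up objects described by two numeric features (the names `objects`, `euclidean` and `min_max_normalize` are illustrative, not from the slides). Min-max normalization is assumed here so that no feature dominates merely because of its unit; other normalizations are equally possible.

```python
import math

# Hypothetical objects, each described by N = 2 numeric features
# (say, weight in kg and size in cm), chosen for illustration only.
objects = {"a": [1.0, 20.0], "b": [1.2, 22.0], "c": [5.0, 80.0]}


def euclidean(u, v):
    """Dissimilarity = Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def min_max_normalize(vectors):
    """Rescale every feature to [0, 1] so that no feature dominates because of its unit."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(v, lows, highs)]
        for v in vectors
    ]


names = list(objects)
normalized = dict(zip(names, min_max_normalize([objects[n] for n in names])))
d_ab = euclidean(normalized["a"], normalized["b"])  # small: similar objects
d_ac = euclidean(normalized["a"], normalized["c"])  # large: dissimilar objects
```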

SLIDE 5

Using feature vectors

  • Clustering of table data → standard clustering methods
  • Which trajectory attributes to use?
  • Number of points
  • Duration
  • Travelled distance
  • Displacement
  • Direction
  • Sinuosity
  • Tortuosity
  • Average / Median / Max speed
  • Number of intermediate stops
  • Total / Max duration of intermediate stops
  • How to avoid redundancy?
  • How to normalize the selected attributes?
  • Which clustering method to apply?
  • How many clusters are expected?
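Several of the attributes listed above can be derived directly from the recorded positions. The sketch assumes a trajectory stored as (t, x, y) tuples with planar coordinates in metres; `trajectory_features` is a hypothetical helper, and the sinuosity formula is one of several definitions in use. Stops and tortuosity are left out for brevity.

```python
import math
from typing import List, Tuple

# Assumed representation: a time-ordered list of (t, x, y) with t in seconds
# and x, y in metres in a planar (projected) coordinate system.
Trajectory = List[Tuple[float, float, float]]


def trajectory_features(traj: Trajectory) -> dict:
    """Compute a few of the candidate attributes for one trajectory."""
    steps = [math.dist(a[1:], b[1:]) for a, b in zip(traj, traj[1:])]
    step_times = [b[0] - a[0] for a, b in zip(traj, traj[1:])]
    travelled = sum(steps)
    duration = traj[-1][0] - traj[0][0]
    dx = traj[-1][1] - traj[0][1]
    dy = traj[-1][2] - traj[0][2]
    displacement = math.hypot(dx, dy)
    speeds = [s / dt for s, dt in zip(steps, step_times) if dt > 0]
    return {
        "n_points": len(traj),
        "duration": duration,
        "travelled_distance": travelled,
        "displacement": displacement,
        # Direction of the start-to-end vector, degrees counter-clockwise from the x-axis.
        "direction": math.degrees(math.atan2(dy, dx)) % 360,
        # Here: travelled distance / displacement (one of several definitions in use).
        "sinuosity": travelled / displacement if displacement > 0 else math.inf,
        "avg_speed": travelled / duration if duration > 0 else 0.0,
        "max_speed": max(speeds, default=0.0),
    }


example = trajectory_features([(0, 0, 0), (10, 30, 40), (20, 60, 0)])
```

Before clustering, such features would still need the redundancy check and normalization raised in the questions above.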

SLIDE 6

Trajectories are complex geographical objects:

  • Trajectory: sequence (t1,s1), (t2,s2), … (ti: time moment, si: position in space)
  • Similarity may refer to routes (ignoring time), dynamics (relative time), or coincidence (absolute time); none of these can be adequately represented by feature vectors
  • Complexities: different sequence lengths, different time spacing, measurement errors and gaps, …

SLIDE 7

Distance functions for trajectories

  • A library of easily understandable distance functions oriented to different properties
  • More sophisticated distance functions are available through loose integration with HERMES TDW
  • N.Pelekis, G.Andrienko, N.Andrienko, I.Kopanakis, G.Marketos, Y.Theodoridis. Visually Exploring Movement Data via Similarity-based Analysis. Journal of Intelligent Information Systems, 2012, v.38 (2), pp.343-391. http://dx.doi.org/10.1007/s10844-011-0159-2

SLIDE 8

Example of specific distance function: route similarity

  • Finds corresponding points in two trajectories
  • Computes the average distance between the corresponding points
  • Accumulates the length of the corresponding parts
  • Accumulates the deviations of non-corresponding points
  • Penalty factor = (cumulative deviation) / (corresponding length)
  • Penalty distance = (cumulative deviation) * (penalty factor)
  • Final distance = average distance + penalty distance
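A rough sketch of the route-similarity steps above. The slide does not say how corresponding points are found; here, as an assumption, a point corresponds to the other route when its nearest point there lies within a hypothetical `tolerance`, and a segment counts towards the corresponding length when both of its end points correspond. This is an illustration, not the authors' implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def route_distance(a: List[Point], b: List[Point], tolerance: float) -> float:
    """Route similarity: average distance of corresponding points + penalty distance.

    Assumption: a point "corresponds" to the other route if its nearest
    point there lies within `tolerance`.
    """
    matched = []        # distances between corresponding points
    deviation = 0.0     # cumulative deviation of non-corresponding points
    corr_length = 0.0   # cumulative length of the corresponding parts
    for route, other in ((a, b), (b, a)):
        nearest = [min(math.dist(p, q) for q in other) for p in route]
        corresponds = [d <= tolerance for d in nearest]
        for d, ok in zip(nearest, corresponds):
            if ok:
                matched.append(d)
            else:
                deviation += d
        # A segment counts as corresponding when both of its end points correspond.
        for i in range(len(route) - 1):
            if corresponds[i] and corresponds[i + 1]:
                corr_length += math.dist(route[i], route[i + 1])
    if not matched or corr_length == 0:
        return math.inf  # no common part: treat the routes as maximally dissimilar
    average = sum(matched) / len(matched)
    penalty_factor = deviation / corr_length
    penalty_distance = deviation * penalty_factor
    return average + penalty_distance


straight = [(0, 0), (10, 0), (20, 0)]
shifted = [(0, 0.5), (10, 0.5), (20, 0.5)]
branching = [(0, 0), (10, 0), (20, 50)]
```

Since the penalty distance equals (cumulative deviation)² / (corresponding length), two routes that share only a short part and then diverge get a large final distance even when the shared part coincides exactly: `straight` and `branching` above end up at 180, while the parallel `shifted` route is at 0.5.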

SLIDE 9

Density-based Clustering Algorithm OPTICS

Ankerst, M., Breunig, M., Kriegel, H.-P., & Sander, J. (1999). OPTICS: Ordering Points to Identify the Clustering Structure. Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD'99). Philadelphia: ACM, 49-60.

The algorithm assigns a reachability distance to each object. The first object is chosen randomly. At each following step, the object with the smallest reachability distance to the previously chosen objects is selected.

[Figure: reachability plot, i.e. the objects in the order of choosing with their reachability distances, with Cluster 1 and Cluster 2 marked]

  • Eps: a distance threshold
  • MinPts: minimum number of neighbours for a core object of a cluster
  • Core object: an object that has MinPts neighbours within distance Eps
  • r(p1), r(p2): reachability distances of objects p1 and p2

Property of the algorithm: it separates clusters from “noise”, i.e. typical objects from peculiar ones
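To make the ordering concrete, here is a compact OPTICS sketch over a precomputed distance matrix, so any trajectory distance function can be plugged in. Assumptions: the object itself counts towards MinPts (implementations differ on this), the first object is taken in index order rather than at random, and clusters are extracted by a simple horizontal cut of the reachability plot at a hypothetical `eps_cut`. It is not the authors' implementation.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def core_distance(D: Matrix, p: int, eps: float, min_pts: int) -> Optional[float]:
    """Distance to the MinPts-th closest object within eps (p itself included), or None."""
    within = sorted(d for d in D[p] if d <= eps)
    return within[min_pts - 1] if len(within) >= min_pts else None


def optics(D: Matrix, eps: float, min_pts: int) -> Tuple[List[int], List[float], List[Optional[float]]]:
    n = len(D)
    core = [core_distance(D, p, eps, min_pts) for p in range(n)]
    reach = [math.inf] * n
    processed = [False] * n
    order: List[int] = []

    def update_seeds(p: int, seeds: dict) -> None:
        # Reachability of o from core object p = max(core distance of p, distance(p, o)).
        for o in range(n):
            if not processed[o] and D[p][o] <= eps:
                r = max(core[p], D[p][o])
                if r < reach[o]:
                    reach[o] = r
                    seeds[o] = r

    for start in range(n):  # the slide chooses the first object randomly; index order here
        if processed[start]:
            continue
        seeds = {}
        q = start
        while True:
            processed[q] = True
            order.append(q)
            if core[q] is not None:
                update_seeds(q, seeds)
            if not seeds:
                break
            # Next object: smallest reachability distance to the objects chosen so far.
            q = min(seeds, key=seeds.get)
            del seeds[q]
    return order, reach, core


def extract_clusters(order: List[int], reach: List[float], core: List[Optional[float]],
                     eps_cut: float) -> List[int]:
    """Cut the reachability plot at eps_cut; label -1 marks noise."""
    labels = [-1] * len(order)
    cluster = -1
    for p in order:
        if reach[p] > eps_cut:
            # A jump above the cut: p starts a new cluster if it is a core object, else noise.
            if core[p] is not None and core[p] <= eps_cut:
                cluster += 1
                labels[p] = cluster
        else:
            labels[p] = cluster
    return labels


pts = [0, 1, 2, 10, 11, 12, 50]  # toy 1-D objects
D = [[abs(a - b) for b in pts] for a in pts]
order, reach, core = optics(D, eps=3, min_pts=2)
labels = extract_clusters(order, reach, core, eps_cut=3)
```

In the toy data, the isolated object 50 has no neighbour within Eps, so it is never a core object and ends up as noise, while the two tight groups become clusters 0 and 1.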

SLIDE 10

Clustering of trajectories

SLIDE 11

Spatial summarization of trajectory clusters

Details: N.Andrienko, G.Andrienko. Spatial Generalization and Aggregation of Massive Movement Data. IEEE Transactions on Visualization and Computer Graphics (TVCG), 2011, v.17 (2), pp.205-219. http://doi.ieeecomputersociety.org/10.1109/TVCG.2010.44

SLIDE 12

Space-time cube for cluster interpretation?

SLIDE 13

Time transformation in space-time cube

  • Transformations with respect to temporal cycles, which include bringing the times of the trajectories to the same year or season, the same month, week, day, or hour
  • Transformations with respect to the individual lifelines of the trajectories, which include bringing the trajectories to a common start moment, a common end moment, or common start and end moments
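A sketch of several of these transformations applied to the timestamps of one trajectory, using only Python's `datetime` module. The function names are hypothetical. The daily and weekly variants assume that a trajectory does not cross midnight (or the week boundary); otherwise the transformed times would no longer be increasing.

```python
from datetime import datetime, timedelta
from typing import List


def to_daily_cycle(times: List[datetime]) -> List[float]:
    """Same day: seconds since midnight of each timestamp's own day."""
    return [(t - t.replace(hour=0, minute=0, second=0, microsecond=0)).total_seconds()
            for t in times]


def to_weekly_cycle(times: List[datetime]) -> List[float]:
    """Same week: seconds since Monday 00:00 of each timestamp's own week."""
    result = []
    for t in times:
        monday = (t - timedelta(days=t.weekday())).replace(hour=0, minute=0, second=0, microsecond=0)
        result.append((t - monday).total_seconds())
    return result


def to_common_start(times: List[datetime]) -> List[float]:
    """Common start moment: seconds elapsed since the trajectory's first position."""
    return [(t - times[0]).total_seconds() for t in times]


def to_common_end(times: List[datetime]) -> List[float]:
    """Common end moment: seconds relative to the last position (values <= 0)."""
    return [(t - times[-1]).total_seconds() for t in times]


def to_common_start_and_end(times: List[datetime]) -> List[float]:
    """Common start and end: start -> 0, end -> 1, times in between scaled linearly."""
    total = (times[-1] - times[0]).total_seconds()
    return [(t - times[0]).total_seconds() / total if total > 0 else 0.0 for t in times]


times = [datetime(2012, 10, 17, 8, 0), datetime(2012, 10, 17, 8, 30), datetime(2012, 10, 17, 9, 0)]
```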

SLIDE 14

Transformations with respect to temporal cycles: days

SLIDE 15

Transformations with respect to temporal cycles: weeks

SLIDE 16

Transformations with respect to individual lifelines

SLIDE 17

Progressive clustering

  • Combine distance functions by applying one to the results of another
  • For example, cluster trajectories by common starts and ends, then continue with route similarity (& dynamics) for clusters of interest
  • The approach reduces the computational cost by applying complex functions only to the results of cheap functions
  • Each step brings easily interpretable results; the following steps refine the results and our knowledge
  • Details: S.Rinzivillo, D.Pedreschi, M.Nanni, F.Giannotti, N.Andrienko, G.Andrienko. Visually-driven analysis of movement data by progressive clustering. Information Visualization, 2008, v.7 (3/4), pp.225-239. http://dx.doi.org/10.1057/palgrave.ivs.9500183
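A two-step sketch of progressive clustering under stated assumptions: step 1 groups routes by a cheap start/end distance, and step 2 applies a more expensive, caller-supplied route distance (`fine`) only inside each step-1 group. For brevity the grouping is single-linkage (connected components under a threshold) rather than OPTICS, and "clusters of interest" is reduced to a hypothetical minimum size. All names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Route = List[Point]


def start_end_distance(a: Route, b: Route) -> float:
    """Cheap distance: mean of the distances between the starts and between the ends."""
    return (math.dist(a[0], b[0]) + math.dist(a[-1], b[-1])) / 2


def group(items: Sequence[int], dist: Callable[[int, int], float], threshold: float) -> List[List[int]]:
    """Single-linkage grouping (connected components of 'dist <= threshold');
    a simple stand-in for a density-based method such as OPTICS."""
    remaining = list(items)
    groups = []
    while remaining:
        stack, component = [remaining.pop(0)], []
        while stack:
            i = stack.pop()
            component.append(i)
            near = [j for j in remaining if dist(i, j) <= threshold]
            for j in near:
                remaining.remove(j)
            stack.extend(near)
        groups.append(sorted(component))
    return groups


def progressive(routes: List[Route], cheap_threshold: float,
                fine: Callable[[Route, Route], float], fine_threshold: float,
                min_size: int = 2) -> List[List[int]]:
    # Step 1: group all routes with the cheap start/end distance.
    step1 = group(range(len(routes)),
                  lambda i, j: start_end_distance(routes[i], routes[j]), cheap_threshold)
    result = []
    for g in step1:
        if len(g) < min_size:  # hypothetical criterion: too small to be worth refining
            result.append(g)
            continue
        # Step 2: the expensive distance is evaluated only within each step-1 group.
        result.extend(group(g, lambda i, j: fine(routes[i], routes[j]), fine_threshold))
    return result


routes = [
    [(0, 0), (5, 0), (10, 0)],
    [(0, 0.2), (5, 0.2), (10, 0.2)],
    [(0, 0), (5, 8), (10, 0)],           # same start and end, different route
    [(100, 100), (105, 100), (110, 100)],
]
pointwise_max = lambda a, b: max(math.dist(p, q) for p, q in zip(a, b))  # toy "expensive" distance
```

Here `progressive(routes, 1.0, pointwise_max, 1.0)` first puts routes 0, 1 and 2 together (same start and end), then the finer distance separates route 2, giving `[[0, 1], [2], [3]]`.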

SLIDE 18

Projection techniques

  • Principal components analysis (PCA) and self-organizing map (SOM, a projection onto a discrete space, i.e. a regular grid) require input in the form of feature vectors
  • Multi-dimensional scaling (MDS) and Sammon’s projection (a.k.a. Sammon’s mapping) can be applied to a pre-defined distance matrix, which can be computed using an arbitrary distance function; they are therefore suitable for complex objects requiring specific distance functions
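Because Sammon's projection needs only a distance matrix, it can be fed the output of any trajectory distance function. The sketch minimizes Sammon's stress, E = (1/c) * sum over i<j of (D_ij - d_ij)^2 / D_ij, where D_ij are the given distances, d_ij the distances in the projection and c the sum of all D_ij (i<j). It uses plain gradient descent; Sammon's original method uses a pseudo-Newton step instead, and the step-size heuristic here is an assumption.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def sammon_stress(Y: Matrix, D: Matrix) -> float:
    """Sammon's stress of the configuration Y with respect to the distance matrix D."""
    n = len(D)
    total, err = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if D[i][j] > 0:  # pairs of identical objects are skipped
                d = math.dist(Y[i], Y[j])
                total += D[i][j]
                err += (D[i][j] - d) ** 2 / D[i][j]
    return err / total


def sammon(D: Matrix, dim: int = 2, iters: int = 300, step: float = 0.3, seed: int = 0) -> Matrix:
    """Place n objects in `dim` dimensions so that their distances approximate D."""
    n = len(D)
    scale = sum(map(sum, D)) / (n * (n - 1))  # mean off-diagonal distance
    rng = random.Random(seed)
    Y = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        new_Y = []
        for i in range(n):
            move = [0.0] * dim  # proportional to the negative stress gradient for point i
            for j in range(n):
                if j == i or D[i][j] <= 0:
                    continue
                d = max(math.dist(Y[i], Y[j]), 1e-12)
                # Positive when the points are too close (d < D): push i away from j, and vice versa.
                w = (D[i][j] - d) / (D[i][j] * d)
                for k in range(dim):
                    move[k] += w * (Y[i][k] - Y[j][k])
            # Step scaled by the typical distance so the update does not depend on units.
            new_Y.append([Y[i][k] + step * scale * move[k] / (n - 1) for k in range(dim)])
        Y = new_Y
    return Y


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]  # toy objects with known positions
D = [[math.dist(p, q) for q in pts] for p in pts]
Y = sammon(D)
```

For trajectories, D would be filled with, e.g., route-similarity distances; each object then becomes a point in the projection display.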

SLIDE 19

Use of Sammon’s projection to explore clustering results

SLIDE 20

Density-based clusters of trajectories by route similarity

SLIDE 21

Exploring clustering results

  • Are the clusters well-separated? Are they compact?
  • How different are the clusters?
  • How far are the clusters from the noise?
  • How sensitive are the results to the parameters?

SLIDE 22

Exploring selected clusters

SLIDE 23

Use of projection to define clusters

SLIDE 24

Tessellation of the projection area

  • Each polygon defines a cluster

SLIDE 25

Choosing cluster sizes

The user may interactively change the sizes (radii) of the clusters.

Further possible extension: interactive refinement or joining of selected clusters by direct manipulation in the projection display.
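One way to express "each polygon defines a cluster" together with adjustable radii, assuming each polygon is the Voronoi cell of a seed point in the projection display: a projected object belongs to the cell of its nearest seed, but joins that cluster only if it lies within the seed's radius; otherwise it stays unassigned (-1). The function and its parameters are hypothetical, not the tool's actual implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def assign_to_seeds(points: Sequence[Point], seeds: Sequence[Point],
                    radii: Sequence[float]) -> List[int]:
    """Cluster index of the Voronoi cell containing each point, or -1 if outside the radius."""
    labels = []
    for p in points:
        dists = [math.dist(p, s) for s in seeds]
        k = min(range(len(seeds)), key=dists.__getitem__)  # nearest seed = containing cell
        labels.append(k if dists[k] <= radii[k] else -1)
    return labels
```

Enlarging a radius interactively would simply mean re-running the assignment with the new value.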

SLIDE 26

Assigning colours to clusters

SLIDE 27

Clusters of trajectories defined by means of projection

SLIDE 28

Progressive clustering with the use of projection

[Figure panels: Step 1, Step 2]

SLIDE 29

Results of step 2

SLIDE 30

Clustering of a very large dataset

  • Problem: clustering of complex objects (such as trajectories) involving non-trivial distance functions (such as “route similarity”) can only be done in RAM, i.e. for a relatively small dataset
  • Our approach:
  • 1. Take a subset (sample) of the objects suitable for processing in RAM.
  • 2. Discover clusters in the subset.
  • 3. Load the remaining objects into RAM by portions. Classify each object, i.e. identify to which of the discovered clusters the object belongs. Store the result of the classification in the database.
  • 4. Take the objects that remained unclassified and apply steps 1 to 3 to them. Repeat the procedure until no meaningful new clusters can be discovered.
  • Question: how to identify the cluster where an object belongs?

G.Andrienko, N.Andrienko, S.Rinzivillo, M.Nanni, D.Pedreschi, F.Giannotti. Interactive Visual Clustering of Large Collections of Trajectories. IEEE Visual Analytics Science and Technology (VAST 2009) Proceedings, IEEE Computer Society Press, 2009, pp.3-10. http://dx.doi.org/10.1109/VAST.2009.5332584
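Steps 1 to 4 above can be sketched as a loop. `discover` and `classify` are hypothetical callbacks standing for the clustering method (e.g. OPTICS with route similarity) and for the classifier; the toy 1-D versions exist only to make the example runnable. In the real setting, the remaining objects would be read from the database in portions rather than held in a list.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def cluster_large(objects: Sequence, sample_size: int,
                  discover: Callable[[list], list],
                  classify: Callable[[object, list], Optional[int]],
                  max_rounds: int = 10, seed: int = 0) -> Tuple[Dict[int, int], List[int]]:
    """Returns {object index: cluster id} and the indices that remained unclassified."""
    rng = random.Random(seed)
    labels: Dict[int, int] = {}
    remaining = list(range(len(objects)))
    next_id = 0
    for _ in range(max_rounds):
        if not remaining:
            break
        # Step 1: a sample small enough for in-memory clustering.
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        # Step 2: discover clusters in the sample.
        clusters = discover([objects[i] for i in sample])
        if not clusters:
            break  # no meaningful new clusters: stop
        # Step 3: classify every remaining object.
        unclassified = []
        for i in remaining:
            k = classify(objects[i], clusters)
            if k is None:
                unclassified.append(i)
            else:
                labels[i] = next_id + k
        next_id += len(clusters)
        # Step 4: repeat with the objects that remained unclassified.
        remaining = unclassified
    return labels, remaining


def toy_discover(values: list) -> list:
    """Toy 1-D 'clustering': split sorted values at gaps > 2, keep groups of 2+ as (min, max)."""
    groups, current = [], []
    for v in sorted(values):
        if current and v - current[-1] > 2:
            groups.append(current)
            current = []
        current.append(v)
    if current:
        groups.append(current)
    return [(min(g), max(g)) for g in groups if len(g) >= 2]


def toy_classify(v, clusters) -> Optional[int]:
    """Index of the first cluster whose range (widened by 0.5) contains v, else None."""
    for k, (lo, hi) in enumerate(clusters):
        if lo - 0.5 <= v <= hi + 0.5:
            return k
    return None


labels, unclassified = cluster_large([0, 0.5, 1, 10, 10.5, 11, 100], 7, toy_discover, toy_classify)
```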

SLIDE 31

Classifier, the main idea

  • From each cluster $C_i$ select one or more representative objects (prototypes) and respective distance thresholds $\{(pt_1, d_1), \ldots, (pt_n, d_n)\}$ such that $\forall o \in C_i\ \exists k,\ 1 \le k \le n:\ \mathrm{distance}(o, pt_k) < d_k$
  • The set of all cluster prototypes with the respective distance thresholds defines the classifier
  • A new object o may be ascribed to the cluster if the same condition holds for it
  • For each object from a large database:
  • compute the distances to all prototypes;
  • take the closest prototype among those with the distances below the thresholds and ascribe the object to the respective cluster;
  • if no such prototype is found, label the object as unclassified.
  • To select prototypes:
  • divide the cluster into “round” subclusters;
  • take the medoid of each subcluster as one of the prototypes;
  • take the maximum of the distances from the subcluster medoid to the subcluster members as the distance threshold for this prototype.

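A sketch of building and applying such a classifier, assuming the division into “round” subclusters is already given. Names are hypothetical. One deliberate deviation: the sketch accepts distance ≤ threshold rather than <, because the threshold is the largest medoid-to-member distance and a strict comparison would exclude that very member.

```python
import math
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Prototype = Tuple[Hashable, object, float]  # (cluster id, prototype object, distance threshold)


def make_prototype(members: Sequence, dist: Callable) -> Tuple[object, float]:
    """Medoid of a 'round' subcluster and the maximum medoid-to-member distance."""
    medoid = min(members, key=lambda m: sum(dist(m, o) for o in members))
    threshold = max(dist(medoid, o) for o in members)
    return medoid, threshold


def build_classifier(clusters: Dict[Hashable, List[Sequence]], dist: Callable) -> List[Prototype]:
    """clusters maps a cluster id to its subclusters (each a list of member objects)."""
    return [(cid, *make_prototype(sub, dist)) for cid, subs in clusters.items() for sub in subs]


def classify(obj, classifier: List[Prototype], dist: Callable) -> Optional[Hashable]:
    """Closest prototype among those within their thresholds; None means 'unclassified'."""
    best, best_d = None, math.inf
    for cid, proto, threshold in classifier:
        d = dist(obj, proto)
        # <= (not <) so that the member defining the threshold is itself covered.
        if d <= threshold and d < best_d:
            best, best_d = cid, d
    return best


dist = lambda a, b: abs(a - b)  # toy distance on numbers, standing in for route similarity
clf = build_classifier({"A": [[0, 1, 2]], "B": [[10, 11, 12], [20, 21]]}, dist)
```

Cluster B here is represented by two prototypes, one per subcluster, which is how an elongated or branching cluster can be covered by “round” pieces.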

SLIDE 32

Illustration of the idea on point data: represent a cluster by round sub-clusters and choose prototypes & radii

SLIDE 33

Division of a cluster into “round” subclusters

SLIDE 34


Is this all that is needed for a good analysis?

SLIDE 35

To obtain meaningful results, the analyst needs to review and, possibly, edit the classifier

  • Some trajectories are not very similar to the others. Should such trajectories be in the cluster?
  • Should I keep the three branches in one cluster? Or should I divide the cluster into two or three clusters?
  • Is it good to have this prototype? This is not a core trajectory of the cluster.

SLIDE 36

Interactive editing of the classifier

SLIDE 37

Example of interactive editing: remove selected objects from a cluster

SLIDE 38

Example of interactive editing: move selected subclusters to a new cluster

SLIDE 39

Example of interactive editing: merge and refine subclusters

SLIDE 40

Application of the classifier to the whole database

SLIDE 41

Application of the classifier to the whole database: results

SLIDE 42

SLIDE 43

Selected clusters loaded from the database

SLIDE 44

Next steps of the analysis

  • Put the unclassified trajectories in a new database table
  • Select a subset (sample) from these trajectories
  • Apply clustering to the sample
  • Build a classifier for the discovered clusters
  • Apply the classifier to the table with the unclassified trajectories
  • Repeat until no interesting clusters can be found
  • Observations:
  • the number and sizes of the discovered clusters decrease with each round
  • smaller clusters are cleaner → less editing of the classifier is needed
  • clusters are represented by fewer prototypes in the classifier → classification of the whole DB takes less time

Round | Number of clusters found in the subset | Maximum cluster size in the subset | Maximum cluster size in the database
1 | 28 → 37 | 87 → 74 | 1525
2 | 23 | 22 | 488
3 | 17 | 27 | 418
4 | 18 | 16 | 289

SLIDE 45

Important properties of the method

  • Synergistic cooperation between human and computer
  • Computer finds clusters, summarizes, visualizes, makes draft classifiers
  • Human not only takes the results but also directs the computer’s work:
  • selects appropriate data subsets, finds suitable clustering parameters;
  • edits the suggested draft classifiers (involves interpretation, evaluation, adaptation to the goals of analysis)
  • Scalability to very large datasets
  • Documented VA process: all operations are logged; an annotated report is generated (as a collection of HTML pages and PNG images)
  • Tangible analysis result: the classifier
  • Describes the clusters without listing all their members
  • May be applied to other data sets and to data streams
  • It is possible to adapt this method to other types of data and problems

SLIDE 46

Analyzing trajectory attributes in clusters

  • The trajectory wall.

Details: C.Tominski, H.Schumann, G.Andrienko, N.Andrienko. Stacking-Based Visualization of Trajectory Attribute Data. IEEE Transactions on Visualization and Computer Graphics (Proceedings IEEE Information Visualization 2012), vol. 18(12), pp.???, Dec. 2012

SLIDE 47

Trajectory attributes: Space-Time Cube?

SLIDE 48

Time  ordering

Joint work with C.Tominski & H.Schumann, Univ. Rostock. Details: http://dx.doi.org/10.1109/TVCG.2012.265

SLIDE 49

Traffic jam patterns in 4,000+ trajectories over 7 days

SLIDE 50

SLIDE 51

  • tortuosity

SLIDE 52

Radiation between Fukushima & Tokyo

  • tortuosity

SLIDE 53

Reminder

  • Similarity measures for trajectories
  • Density-based clustering of trajectories
  • Using Space-Time Cube for interpreting clusters
  • Progressive clustering
  • Using projection for assessing clustering results
  • Sammon’s projection for trajectory clustering
  • Clustering of large trajectory data sets
  • Analysis of trajectory attributes in clusters

SLIDE 54

Want to learn more? http://geoanalytics.net

Papers, presentations, demonstrators, ICA GeoVisualization community, …

  • Key reviews:
  • 1. Space, time and visual analytics. http://dx.doi.org/10.1080/13658816.2010.508043 (IJGIS 2010)
  • 2. A conceptual framework and taxonomy of techniques for analyzing movement. http://dx.doi.org/10.1016/j.jvlc.2011.02.003 (JVLC 2011)
  • 3. An event-based conceptual model for context-aware movement analysis. http://dx.doi.org/10.1080/13658816.2011.556120 (IJGIS 2011)
  • 4. Visual Analytics of Movement: an Overview of Methods, Tools, and Procedures. http://geoanalytics.net/and/papers/ivs12.pdf (IVS 2012)
  • 5. Exploratory Analysis of Spatial and Temporal Data. A Systematic Approach (Springer, 2006), ISBN 978-3-540-25994-7

  • Software:
  • http://geoanalytics.net/V-Analytics
