Analysis and Mining of Moving Object Trajectory Data (Analyse et fouille de données de trajectoires d’objets mobiles): thesis presentation



slide-1
SLIDE 1

Analysis and Mining of Moving Object Trajectory Data
(Analyse et fouille de données de trajectoires d’objets mobiles)

PhD thesis presented and publicly defended by

Mohamed Khalil EL MAHRSI

30 September 2013, before a jury composed of:

  • Mr. Talel ABDESSALEM (President of the jury)
  • Ms. Barbara HAMMER (Reviewer)
  • Ms. Karine ZEITOUNI (Reviewer)
  • Mr. Pierre BORGNAT (Examiner)
  • Mr. Etienne CÔME (Examiner)
  • Mr. Ludovic DENOYER (Examiner)
  • Mr. Cédric DU MOUZA (Examiner)
  • Mr. Fabrice ROSSI (Thesis advisor)

slide-2
SLIDE 2

The Traffic Congestion Problem

Traffic congestion and road jams

  • Frustrating travel delays
  • Economic losses
  • Environmental damage

Countermeasures are needed

  • Infrastructure improvement
  • Prohibiting/favoring specific routes

Countermeasures should be based on the analysis of drivers’ behavior

Context and Motivations 1 / 41

slide-3
SLIDE 3

How is Road Traffic Monitored?

Traffic counters/recorders

  • Expensive
  • Partially deployed
  • Count traffic on their local section only

Consequences:

  • Incomplete vision of traffic
  • Valuable information is missed: vehicles’ identities

slide-4
SLIDE 4

Main Motivation: Trajectory Analysis as a Complement?

Why not collect the trajectories of vehicles moving on the road network...

  • Many fleet management companies already do this
  • Commuters can contribute their trajectories

slide-5
SLIDE 5

Main Motivation: Trajectory Analysis as a Complement?

... and analyze them to discover

  • Groups of vehicles that followed the same routes
  • Groups of roads that are often traveled together during a considerable number of commutes
  • Etc.

slide-6
SLIDE 6

But...

Modern devices can sample their positions at high rates

At such rates, the data are inherently redundant

Transmitting and storing entire trajectories is impractical

  • Significant storage space requirements
  • Computational overhead

We have to intelligently reduce the size of the data

slide-7
SLIDE 7

Research Problems Explored in this Thesis

Main objective: Clustering Trajectory Data in Road Network Environments

  • How to discover meaningful groupings of “similar” trajectories and road segments in the specific context of road networks?

But first, a small detour: Sampling Trajectory Data Streams

  • How to reduce the size of trajectory data streams while preserving as many of their spatiotemporal features as possible?

slide-8
SLIDE 8

Outline

  1. Context and Motivations
  2. Sampling Trajectory Data Streams
  3. Graph-Based Clustering of Network-Constrained Trajectory Data
  4. Co-Clustering Network-Constrained Trajectory Data
  5. Conclusions, Future Work and Open Issues

slide-10
SLIDE 10

Anatomy of a Trajectory Data Stream

(Raw) Trajectory

A trajectory T is a series of discrete, timestamped positions:

$$T = \langle id, \{P_1(t_1, x_1, y_1), P_2(t_2, x_2, y_2), \ldots, P_i(t_i, x_i, y_i), \ldots\} \rangle$$

  • $id$: identifier
  • $t_i$: timestamp (time of capture)
  • $(x_i, y_i)$: coordinates (in the Euclidean space)

Figure: Illustration of a raw trajectory (points P1 to P7)

Sampling Trajectory Data Streams 7 / 41

slide-11
SLIDE 11

Anatomy of a Trajectory Data Stream

(Raw) Trajectory

A trajectory T is a series of discrete, timestamped positions:

$$T = \langle id, \{P_1(t_1, x_1, y_1), P_2(t_2, x_2, y_2), \ldots, P_i(t_i, x_i, y_i), \ldots\} \rangle$$

  • $id$: identifier
  • $t_i$: timestamp (time of capture)
  • $(x_i, y_i)$: coordinates (in the Euclidean space)
  • Interpolation is used to approximate missing positions

Figure: Illustration of a linearly-interpolated trajectory (points P1 to P7)
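The trajectory definition and the interpolation step can be sketched in Python. This is a minimal illustration, assuming points are stored sorted by strictly increasing timestamp; the names `Point`, `Trajectory` and `position_at` are hypothetical and do not come from the thesis.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Point:
    t: float  # timestamp (time of capture)
    x: float  # Euclidean coordinates
    y: float


@dataclass
class Trajectory:
    id: str
    points: List[Point]  # assumed sorted by strictly increasing timestamp

    def position_at(self, t: float) -> Tuple[float, float]:
        """Linearly interpolate the position at time t between the two
        recorded points that bracket it."""
        pts = self.points
        if not pts[0].t <= t <= pts[-1].t:
            raise ValueError("t lies outside the recorded time span")
        k = bisect_right([p.t for p in pts], t)
        if k == len(pts):  # t equals the last timestamp
            return (pts[-1].x, pts[-1].y)
        a, b = pts[k - 1], pts[k]
        r = (t - a.t) / (b.t - a.t)
        return (a.x + r * (b.x - a.x), a.y + r * (b.y - a.y))


traj = Trajectory("T1", [Point(0, 0, 0), Point(10, 100, 0), Point(20, 100, 50)])
```

Here `traj.position_at(5)` falls halfway along the first leg, and `traj.position_at(15)` halfway along the second.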

slide-12
SLIDE 12

Problem Formulation, Objectives, and Constraints

Compressed (Sampled) Trajectory

Given a trajectory $T$, a compressed trajectory $T_C$ of $T$ is a subset of the original points forming $T$, such that:

  • $T_C$ covers $T$ from start to finish
  • $\forall P_i \in T_C,\ P_i \in T$

Objectives

  • Reduce data size (obviously)
  • Small, preferably configurable approximation errors

Constraints

  • On-the-fly processing
  • Low computational complexity
  • Low in-memory complexity

slide-13
SLIDE 13

Previous Work

Classic sampling techniques are inadequate

  • They overlook the spatiotemporal properties of the trajectories

Two types of trajectory-oriented sampling techniques

  • Configurable approximation errors but high complexity
  • Low complexity but no guarantees for approximation errors

To the best of our knowledge: no approaches combining low complexity and configurable approximation errors

slide-14
SLIDE 14

The Spatiotemporal Stream Sampling (STSS) Algorithm

[El Mahrsi et al., 2010]

  • Intuition: use linear prediction to guess forthcoming positions
  • The accuracy of the prediction (w.r.t. a threshold dThres) guides the sampling process

Figure: Linear prediction of incoming positions (from Pi(ti, xi, yi) and Pj(tj, xj, yj), the position of Pk(tk, xk, yk) is predicted as P′k(tk, x′k, y′k) and compared with Pk using Distance(Pk, P′k))

slide-15
SLIDE 15

STSS: How it Works

Figure: Illustration of the functioning of the STSS algorithm (legend: real trajectory, sampled trajectory, prediction). Successive states shown on the slide:

  • P1
  • P1, P2
  • P1, P2, P3, with predicted position P3′: Distance(P3, P3′) ≤ dThres
  • P1, P3
  • P1, P3, P4, with predicted position P4′
  • P1, P4
  • P1, P4, P5, with predicted position P5′: Distance(P5, P5′) > dThres
  • P1, P4, P5
  • P1, P4, P8, P9, with predicted position P9′
  • P1, P4, P8, P9
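The walkthrough above suggests a simple prediction-driven sampler. The Python sketch below is one plausible reading of the illustration, not the published STSS procedure: it assumes the prediction extrapolates linearly from the last retained point through the most recent point, and it makes no attempt to reproduce the error guarantees claimed for STSS. The names `predict` and `sample_stream` are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float, float]  # (t, x, y)


def predict(anchor: Point, last: Point, t: float) -> Tuple[float, float]:
    """Extrapolate linearly from anchor through last to time t."""
    dt = last[0] - anchor[0]
    if dt == 0:
        return (last[1], last[2])
    vx = (last[1] - anchor[1]) / dt
    vy = (last[2] - anchor[2]) / dt
    return (anchor[1] + vx * (t - anchor[0]), anchor[2] + vy * (t - anchor[0]))


def sample_stream(points: Iterable[Point], d_thres: float) -> List[Point]:
    """Single pass over the stream; only two points of state are kept."""
    sample: List[Point] = []
    anchor: Optional[Point] = None  # last retained point
    last: Optional[Point] = None    # most recent point, not yet retained
    for p in points:
        if anchor is None:
            anchor = p
            sample.append(p)         # the first point is always kept
        elif last is None:
            last = p                 # two points are needed before predicting
        else:
            px, py = predict(anchor, last, p[0])
            if math.hypot(p[1] - px, p[2] - py) > d_thres:
                sample.append(last)  # prediction failed: retain the previous point
                anchor = last
            last = p
    if last is not None:
        sample.append(last)          # close the trajectory with its final point
    return sample


corner = [(0, 0, 0), (1, 1, 0), (2, 2, 0), (3, 2, 1), (4, 2, 2)]
straight = [(t, 10 * t, 0) for t in range(10)]
```

On `corner`, a right-angle turn, the sampler keeps the start, the turning point and the end; on `straight`, a constant-speed line, it keeps only the two endpoints.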

slide-27
SLIDE 27

STSS in Action

Figure: Example of a trajectory sampled with different error tolerances (axes: x (m), y (m))

  (a) Original trajectory (228 points)
  (b) Tolerated error: 10 m (117 points, compression ratio 1.9:1)
  (c) Tolerated error: 50 m (72 points, compression ratio 3.2:1)
  (d) Tolerated error: 100 m (49 points, compression ratio 4.6:1)
  (e) Tolerated error: 150 m (40 points, compression ratio 5.7:1)
  (f) Tolerated error: 200 m (32 points, compression ratio 7.1:1)

slide-28
SLIDE 28

STSS: Properties

  • Single-pass, on-the-fly algorithm
  • Linear computational complexity
  • Constant in-memory complexity
  • Easy to configure (only one parameter)
  • Guaranteed upper bound for compression errors

slide-29
SLIDE 29

Experimental Results: Comparison with TD-TR and OPW-TR [Meratnia and de By, 2004]

Dataset

  • 5263 trajectories
  • 367691 data points (1 position/15 sec)

The competition

  • TD-TR: offline, recursive partitioning, quadratic complexity
  • OPW-TR: on-the-fly, opening window, quadratic complexity

Evaluation criteria

  • $\text{Percentage of retained data} = \dfrac{\text{size of the output data}}{\text{size of the input data}}$
  • Approximation error (distance between real points and their approximation)

slide-30
SLIDE 30

Experimental Results: Percentage of Retained Data

Figure: Percentages of retained data achieved by STSS, TD-TR and OPW-TR for different error tolerances (axes: theoretical error bound (m), retained data (%))

slide-31
SLIDE 31

Experimental Results: Approximation Errors

Figure: Distribution of the approximation errors resulting from applying STSS, TD-TR and OPW-TR for different error tolerances

slide-32
SLIDE 32

Outline

  1. Context and Motivations
  2. Sampling Trajectory Data Streams
  3. Graph-Based Clustering of Network-Constrained Trajectory Data
  4. Co-Clustering Network-Constrained Trajectory Data
  5. Conclusions, Future Work and Open Issues

slide-33
SLIDE 33

Existing Work on Trajectory Clustering

Two main research areas

  • Distance and similarity measures
  • Clustering algorithms

In both areas

  • For trajectories moving freely in a Euclidean space
  • For network-constrained trajectories

Observations on existing trajectory clustering techniques

  • Density-based clustering
  • Flat clustering
  • A promising new trend: graph-based analysis [Guo et al., 2010]

Figure: Effect of the underlying network on trajectory similarity (trajectories T1, T2, T3)

Graph-Based Clustering of Network-Constrained Trajectory Data 17 / 41

slide-35
SLIDE 35

Data Representation: Road Network

Road Network

The road network is represented as a directed graph $G = (V, S)$

  • Vertices ($V$): intersections and terminal points
  • Edges ($S$): road segments (with travel direction)

Figure: A road network and its graph representation (vertices v1 to v5, segments s1 to s9)

slide-36
SLIDE 36

Data Representation: Trajectories

(Network-Constrained) Trajectory

A trajectory T is represented symbolically, as the sequence of traveled road segments:

$$T = \langle id, \{s_1, s_2, \ldots, s_l\} \rangle, \qquad \forall\, 1 \le i < l,\ s_i \text{ and } s_{i+1} \text{ are connected}$$

Example on a road network with segments s1 to s14:

  • T1 = {s1, s7, s11, s12, s13}
  • T2 = {s1, s4, s3}
  • T3 = {s10, s11, s8, s5, s6}

Figure: Example of three trajectories moving on a road network
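The two representations (directed graph and segment sequence) can be sketched together. The snippet below is a minimal illustration: it assumes that "connected" means the head vertex of one segment is the tail vertex of the next, and the small network used as an example is invented for the purpose, not taken from the slides.

```python
from typing import Dict, List, Tuple

# Directed graph G = (V, S): each road segment maps to (tail vertex, head vertex).
RoadNetwork = Dict[str, Tuple[str, str]]


def is_connected_sequence(path: List[str], network: RoadNetwork) -> bool:
    """True if every segment exists and each one starts where the previous ends."""
    if not path or any(s not in network for s in path):
        return False
    return all(network[a][1] == network[b][0] for a, b in zip(path, path[1:]))


# Hypothetical toy network: s1 and s2 are the two directions of the same road.
network: RoadNetwork = {
    "s1": ("v1", "v2"),
    "s2": ("v2", "v1"),
    "s3": ("v2", "v3"),
    "s4": ("v3", "v4"),
}
```

With this network, `["s1", "s3", "s4"]` is a valid network-constrained trajectory, whereas `["s1", "s4"]` is not because s1 ends at v2 and s4 starts at v3.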

slide-37
SLIDE 37

Measuring the Similarity Between Trajectories

[El Mahrsi and Rossi, 2012a, El Mahrsi and Rossi, 2012c]

Cosine similarity is used to measure the resemblance between trajectories:

$$\text{Similarity}(T_i, T_j) = \frac{T_i \cdot T_j}{\|T_i\| \, \|T_j\|} = \frac{\sum_{s \in S} \omega_{s,T_i} \times \omega_{s,T_j}}{\sqrt{\sum_{s \in S} \omega_{s,T_i}^2} \times \sqrt{\sum_{s \in S} \omega_{s,T_j}^2}}$$

Road segments are weighted based on:

  • Their spatial length
  • Their frequency in the set of trajectories $\mathcal{T}$

$$\omega_{s,T} = \frac{n_{s,T} \times \text{length}(s)}{\sum_{s' \in T} n_{s',T} \times \text{length}(s')} \times \log \frac{|\mathcal{T}|}{|\{T_i : s \in T_i\}|}$$
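The weighting and similarity formulas translate fairly directly into code. The sketch below assumes trajectories are given as lists of segment identifiers and that segment lengths are available in a dictionary; the function names and the unit lengths in the example are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def segment_weights(trajectories: Dict[str, List[str]],
                    length: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Length-weighted frequency times log inverse trajectory frequency."""
    n_traj = len(trajectories)
    # Number of trajectories that visit each segment.
    df = Counter(s for path in trajectories.values() for s in set(path))
    weights = {}
    for tid, path in trajectories.items():
        counts = Counter(path)  # n_{s,T}
        total = sum(n * length[s] for s, n in counts.items())
        weights[tid] = {
            s: (n * length[s] / total) * math.log(n_traj / df[s])
            for s, n in counts.items()
        }
    return weights


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v[s] for s, w in u.items() if s in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# The three example trajectories from the slides, with assumed unit lengths.
trajs = {
    "T1": ["s1", "s7", "s11", "s12", "s13"],
    "T2": ["s1", "s4", "s3"],
    "T3": ["s10", "s11", "s8", "s5", "s6"],
}
unit = {s: 1.0 for path in trajs.values() for s in path}
w = segment_weights(trajs, unit)
```

T1 and T2 share s1 and therefore get a positive similarity, while T2 and T3 share no segment and get zero. A segment visited by every trajectory receives a weight of zero, since the logarithm vanishes.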

slide-38
SLIDE 38

Trajectory Similarity Graph

A weighted graph $G_T(\mathcal{T}, E_T, W_T)$ is used to model relationships between trajectories

Figure: Example of a trajectory similarity graph (trajectories T1 to T5 as vertices; edges weighted by similarity, e.g. Similarity(T1, T3))
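A similarity graph of this kind could be assembled as sketched below. Two assumptions are made here that the slides do not state: an edge is created only when the similarity is strictly positive, and candidate pairs are found through an inverted index because two trajectories with no common segment have zero cosine similarity. The `jaccard` function is only a stand-in for the weighted cosine similarity.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def build_similarity_graph(
    trajectories: Dict[str, List[str]],
    similarity: Callable[[str, str], float],
) -> Dict[Tuple[str, str], float]:
    """Return weighted undirected edges {(Ti, Tj): similarity}."""
    # Inverted index: segment -> trajectories that visit it.
    visitors = defaultdict(set)
    for tid, path in trajectories.items():
        for s in path:
            visitors[s].add(tid)
    # Only trajectories sharing at least one segment are candidate edges.
    candidates = set()
    for tids in visitors.values():
        candidates.update(combinations(sorted(tids), 2))
    edges = {}
    for a, b in sorted(candidates):
        weight = similarity(a, b)
        if weight > 0:
            edges[(a, b)] = weight
    return edges


trajs = {
    "T1": ["s1", "s7", "s11", "s12", "s13"],
    "T2": ["s1", "s4", "s3"],
    "T3": ["s10", "s11", "s8", "s5", "s6"],
}


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: shared segments over all segments of both."""
    sa, sb = set(trajs[a]), set(trajs[b])
    return len(sa & sb) / len(sa | sb)


graph = build_similarity_graph(trajs, jaccard)
```

For the three example trajectories, the graph has two edges, T1 to T2 (through s1) and T1 to T3 (through s11), and no edge between T2 and T3.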

slide-39
SLIDE 39

Clustering the Similarity Graph

We used an implementation of the algorithm in [Noack and Rotta, 2009]

  • Based on modularity optimization [Newman, 2006]
  • Greedy hierarchical agglomerative clustering
  • Combined with multi-level refinement

Input: trajectory similarity graph

Output: a hierarchy of nested vertex (trajectory) clusters
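The quantity being optimized can be illustrated without reproducing the multi-level algorithm itself. The sketch below only evaluates the standard modularity of a given partition of an undirected weighted graph; it is not the Noack and Rotta implementation, and the edge-dictionary format is an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple


def modularity(edges: Dict[Tuple[Hashable, Hashable], float],
               cluster_of: Dict[Hashable, int]) -> float:
    """Modularity Q of a partition: for each cluster, the fraction of edge
    weight inside it minus the fraction expected from vertex degrees alone."""
    total = sum(edges.values())  # total edge weight m
    if total == 0:
        return 0.0
    degree = defaultdict(float)  # weighted degree of each vertex
    inside = defaultdict(float)  # edge weight inside each cluster
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
        if cluster_of[u] == cluster_of[v]:
            inside[cluster_of[u]] += w
    cluster_degree = defaultdict(float)
    for node, k in degree.items():
        cluster_degree[cluster_of[node]] += k
    return sum(
        inside[c] / total - (cluster_degree[c] / (2 * total)) ** 2
        for c in cluster_degree
    )


# Two triangles joined by a single edge (c, d).
edges = {(u, v): 1.0 for u, v in [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
```

Splitting the graph into its two triangles gives Q = 5/14, while putting all vertices into one cluster gives Q = 0, which is why a modularity optimizer would prefer the split.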

slide-40
SLIDE 40

Case Study: The Data

Figure: The case study dataset consists of 85 artificial trajectories divided into 5 pre-established and interacting clusters

  (a) 14 trajectories
  (b) 19 trajectories
  (c) 20 trajectories
  (d) 20 trajectories
  (e) 12 trajectories

slide-41
SLIDE 41

Case Study: Hierarchy of Trajectory Clusters

Figure: Hierarchy of trajectory clusters discovered through graph-based clustering. Nodes shown in the figure:

  • Dataset (85 trajectories)
  • Cluster 1 (39 trajectories)
  • Cluster 2 (14 trajectories)
  • Cluster 3 (32 trajectories)
  • Cluster 4 (12 trajectories)
  • Cluster 5 (19 trajectories)
  • Cluster 6 (8 trajectories)
  • Cluster 7 (7 trajectories)
  • Cluster 8 (3 trajectories)
  • Cluster 9 (4 trajectories)
  • Cluster 10 (12 trajectories)
  • Cluster 11 (20 trajectories)
  • Cluster 12 (3 trajectories)
  • Cluster 13 (9 trajectories)

slide-42
SLIDE 42

Case Study: High Level Trajectory Clusters

Figure: Trajectory clusters in the highest level of the hierarchy

  (a) Cluster 1 (39 trajectories)
  (b) Cluster 2 (14 trajectories)
  (c) Cluster 3 (32 trajectories)

slide-43
SLIDE 43

Case Study: Refinement of Trajectory Clusters

Figure: Refinement of cluster 1 into its three sub-clusters

  (a) Cluster 1 (39 trajectories)
  (b) Cluster 4 (12 trajectories)
  (c) Cluster 5 (19 trajectories)
  (d) Cluster 6 (8 trajectories)

slide-44
SLIDE 44

Comparison with NNCluster [Roh and Hwang, 2010]

Experimental setting

  • 9 artificial datasets containing labeled clusters
  • Clusters can present interactions with each other

Evaluation based on external criteria

  • Adjusted Rand Index [Hubert and Arabie, 1985]
  • Purity and entropy [Zhao and Karypis, 2002]

Table: Characteristics of the labeled datasets

Dataset  Clusters  Trajectories  Road network
1        9         158           Oldenburg
2        10        163           Oldenburg
3        11        141           Oldenburg
4        6         86            Oldenburg
5        6         91            Oldenburg
6        6         110           Oldenburg
7        12        205           San Joaquin
8        11        190           San Joaquin
9        12        203           San Joaquin
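The three external criteria can be computed from a list of true labels and a list of predicted cluster labels. The sketch below uses the usual pair-counting form of the Adjusted Rand Index; for entropy it assumes the size-weighted average of per-cluster entropies normalized by the logarithm of the number of classes, a normalization the slides do not spell out.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def adjusted_rand_index(truth: Sequence, pred: Sequence) -> float:
    n = len(truth)
    if n < 2:
        return 1.0
    cells = sum(math.comb(c, 2) for c in Counter(zip(truth, pred)).values())
    rows = sum(math.comb(c, 2) for c in Counter(truth).values())
    cols = sum(math.comb(c, 2) for c in Counter(pred).values())
    expected = rows * cols / math.comb(n, 2)
    max_index = (rows + cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (cells - expected) / (max_index - expected)


def _class_counts(truth: Sequence, pred: Sequence) -> Dict:
    """Class counts inside each predicted cluster."""
    by_cluster: Dict = {}
    for t, p in zip(truth, pred):
        by_cluster.setdefault(p, Counter())[t] += 1
    return by_cluster


def purity(truth: Sequence, pred: Sequence) -> float:
    by_cluster = _class_counts(truth, pred)
    return sum(max(c.values()) for c in by_cluster.values()) / len(truth)


def entropy(truth: Sequence, pred: Sequence) -> float:
    q = len(set(truth))
    if q < 2:
        return 0.0
    total = 0.0
    for counts in _class_counts(truth, pred).values():
        size = sum(counts.values())
        h = -sum((c / size) * math.log(c / size) for c in counts.values())
        total += (size / len(truth)) * h / math.log(q)
    return total


truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
```

A perfect clustering scores 1 on ARI and purity and 0 on entropy. In the example, one misplaced item out of six lowers purity to 5/6.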

slide-45
SLIDE 45

Comparison with NNCluster [Roh and Hwang, 2010]

Table: Adjusted Rand Index

Dataset  Discovered clusters (actual)  NNCluster (baseline)  Modularity
1        9 (9)                         0.902                 1
2        10 (10)                       0.881                 1
3        11 (11)                       0.764                 0.873
4        6 (6)                         1                     1
5        6 (6)                         1                     1
6        6 (6)                         1                     1
7        14 (12)                       0.618                 0.961
8        12 (11)                       0.921                 0.971
9        10 (12)                       0.752                 0.889

Table: Purity and entropy

Dataset  Discovered clusters (actual)  Purity: NNCluster  Purity: Modularity  Entropy: NNCluster  Entropy: Modularity
1        9 (9)                         0.924              1                   0.062               0
2        10 (10)                       0.902              1                   0.059               0
3        11 (11)                       0.823              0.915               0.113               0.064
4        6 (6)                         1                  1                   0                   0
5        6 (6)                         1                  1                   0                   0
6        6 (6)                         1                  1                   0                   0
7        14 (12)                       0.712              1                   0.185               0
8        12 (11)                       0.942              1                   0.038               0
9        10 (12)                       0.778              0.872               0.136               0.075

slide-46
SLIDE 46

Extension to Road Segment Clustering

Clustering road segments is equally important

Motivations:

  • Characterize the roles they play in the road network
  • Predict how traffic congestion propagates

Figure: Trajectory clusters are clearly “supported” by groups of road segments

  (a) Cluster 4 (12 trajectories)
  (b) Cluster 5 (19 trajectories)
  (c) Cluster 6 (8 trajectories)

slide-47
SLIDE 47

Road Segment Clustering

[El Mahrsi and Rossi, 2012b, El Mahrsi and Rossi, 2013]

We proceed by analogy to the trajectory case

  • Cosine similarity is used to measure segment resemblances
  • A weighted graph $G_S(S, E_S, W_S)$ depicts segment interactions
  • The same clustering algorithm is used to cluster the graph

Figure: Example of a road segment similarity graph (road segments as vertices; edges weighted by similarity, e.g. Similarity(s1, s3))

slide-48
SLIDE 48

How to Interpret Road Segment Clusters?

We did discover clusters, but...

Figure: Examples of road segment clusters discovered through graph-based segment clustering (panels (a) to (f))

slide-49
SLIDE 49

Observations

  • Duality between trajectory clustering and segment clustering
  • Road segment clusters are hard to interpret “on their own”
      • Due to lack of context
      • Easier to interpret in the light of trajectory clusters
      • Left to the initiative of the user

Instead of considering trajectories and road segments separately, consider clustering both at the same time

slide-50
SLIDE 50

Outline

  1. Context and Motivations
  2. Sampling Trajectory Data Streams
  3. Graph-Based Clustering of Network-Constrained Trajectory Data
  4. Co-Clustering Network-Constrained Trajectory Data
  5. Conclusions, Future Work and Open Issues

slide-51
SLIDE 51

Co-Clustering Network-Constrained Trajectory Data

Joint work with Romain Guigourès and Marc Boullé (Orange Labs) [El Mahrsi et al., 2013]

  • Objective: cluster trajectories and road segments simultaneously
  • Equivalent to considering a bipartite graph $G(\mathcal{T}, S, E)$ representing interactions between trajectories and segments

Figure: Bipartite graph of interactions between trajectories and road segments

Co-Clustering Network-Constrained Trajectory Data 33 / 41
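The bipartite graph can be stored as a trajectory-by-segment adjacency matrix. The sketch below assumes each entry counts how many times a trajectory visits a segment (a binary matrix would be another reasonable choice); the function name and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def adjacency_matrix(
    trajectories: Dict[str, List[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Rows are trajectories, columns are road segments, entries are visit counts."""
    rows = sorted(trajectories)
    cols = sorted({s for path in trajectories.values() for s in path})
    col_index = {s: j for j, s in enumerate(cols)}
    matrix = [[0] * len(cols) for _ in rows]
    for i, tid in enumerate(rows):
        for s in trajectories[tid]:
            matrix[i][col_index[s]] += 1
    return rows, cols, matrix


rows, cols, matrix = adjacency_matrix({
    "T1": ["s1", "s2"],
    "T2": ["s2", "s3", "s2"],
})
```

Co-clustering then amounts to permuting the rows and columns of `matrix` so that dense blocks appear.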

slide-52
SLIDE 52

MODL Co-Clustering [Boullé, 2011]

MODL co-clustering is applied to the adjacency matrix of the bipartite graph

  • Based on Bayesian model selection with a hierarchical prior
  • Rearranges rows and columns into homogeneously dense blocks

Output: a set of co-clusters, each being the intersection of

  • A trajectory cluster
  • A road segment cluster

slide-53
SLIDE 53

Back to the Case Study

Figure: Adjacency matrix of the bipartite graph, rearranged based on the clusters discovered by both approaches (axes: trajectories, segments)

  (a) Modularity-based approach
  (b) Co-clustering approach

slide-54
SLIDE 54

Characterizing Traffic Using Trajectory/Segment Co-Clusters

We use the discovered co-clusters’ contribution to mutual information to guide the interpretation

Figure : Contribution to mutual information of the co-clusters discovered in the case study dataset. Trajectory clusters (7 clusters) are depicted on the rows and road segment clusters (12 clusters) on the columns

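A per-co-cluster contribution to mutual information can be computed as sketched below. The sketch assumes the usual decomposition of mutual information into one term per cell of the block matrix, where each cell holds the number of trajectory/segment interactions falling in that co-cluster; whether the thesis uses exactly this normalization is an assumption.

```python
import math
from typing import List


def mi_contributions(block_counts: List[List[float]]) -> List[List[float]]:
    """Term p_ij * log(p_ij / (p_i. * p_.j)) for every co-cluster (i, j)."""
    total = sum(map(sum, block_counts))
    row_p = [sum(row) / total for row in block_counts]
    col_p = [sum(col) / total for col in zip(*block_counts)]
    contributions = []
    for i, row in enumerate(block_counts):
        out = []
        for j, count in enumerate(row):
            p = count / total
            out.append(p * math.log(p / (row_p[i] * col_p[j])) if p > 0 else 0.0)
        contributions.append(out)
    return contributions


diagonal = mi_contributions([[10, 0], [0, 10]])
uniform = mi_contributions([[5, 5], [5, 5]])
```

A positive term marks a co-cluster with more interactions than independence would predict, a negative term one with fewer. In the `diagonal` example the terms sum to log 2, while the `uniform` example carries no information.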

slide-55
SLIDE 55

Characterizing Traffic: Peripheral Road Segments

Figure: Examples of “secondary” road segment clusters leading to peripheral areas of the road network and visited exclusively by single groups of trajectories

  (a) 34 segments
  (b) 40 segments
  (c) 77 segments

slide-56
SLIDE 56

Characterizing Traffic: Hub Road Segments

Figure: A hub road segment cluster traveled by two different trajectory clusters with different departures and destinations

  (a) Hub segment cluster (11 segments)
  (b) Trajectory cluster (20 trajectories)
  (c) Trajectory cluster (12 trajectories)

slide-57
SLIDE 57

Outline

  1. Context and Motivations
  2. Sampling Trajectory Data Streams
  3. Graph-Based Clustering of Network-Constrained Trajectory Data
  4. Co-Clustering Network-Constrained Trajectory Data
  5. Conclusions, Future Work and Open Issues

slide-58
SLIDE 58

Main Contributions

  • STSS, a fast on-the-fly algorithm for sampling trajectory streams with configurable approximation errors [El Mahrsi et al., 2010]
  • Graph-based approaches to clustering trajectories in road networks [El Mahrsi and Rossi, 2012c, El Mahrsi and Rossi, 2012a, El Mahrsi and Rossi, 2012b, El Mahrsi and Rossi, 2013]
  • An approach to simultaneous co-clustering of trajectories and road segments [El Mahrsi et al., 2013]

Conclusions, Future Work and Open Issues 39 / 41

slide-59
SLIDE 59

Future Work and Open Issues: Trajectory Sampling

  • Noise sensitivity
  • Presence of the road network
  • Effect on querying

slide-60
SLIDE 60

Future Work and Open Issues: Trajectory Clustering

  • Better evaluation of the approaches
      • On real datasets
      • With more realistic data generators
  • Effect of varying the clustering algorithms
  • Integration of time in the clustering process
  • “Social-oriented” clustering of mobility data

slide-61
SLIDE 61

List of Publications

[1] M. K. El Mahrsi, C. Potier, G. Hébrail, and F. Rossi, “Spatiotemporal sampling for trajectory streams,” in SAC’10: Proceedings of the 2010 ACM Symposium on Applied Computing, (New York, NY, USA), pp. 1627-1628, ACM, 2010. (Poster)

[2] M. K. El Mahrsi and F. Rossi, “Modularity-Based Clustering for Network-Constrained Trajectories,” in Proceedings of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2012), (Bruges, Belgium), pp. 471-476, Apr. 2012.

[3] M. K. El Mahrsi and F. Rossi, “Graph-Based Approaches to Clustering Network-Constrained Trajectory Data,” in Proceedings of the Workshop on New Frontiers in Mining Complex Patterns (NFMCP 2012), (Bristol, UK), pp. 184-195, Sept. 2012.

[4] M. K. El Mahrsi and F. Rossi, “Clustering par optimisation de la modularité pour trajectoires d’objets mobiles,” in Actes des 8èmes journées francophones Mobilité et Ubiquité, (Anglet, France), pp. 12-22, Cépaduès Éditions, Jun. 2012.

[5] M. K. El Mahrsi, R. Guigourès, F. Rossi, and M. Boullé, “Classifications croisées de données de trajectoires contraintes par un réseau routier,” in Actes de 13ème Conférence Internationale Francophone sur l’Extraction et gestion des connaissances (EGC’2013), vol. RNTI-E-24, (Toulouse, France), pp. 341-352, Hermann-Éditions, Feb. 2013.

[6] M. K. El Mahrsi and F. Rossi, “Graph-based approaches to clustering network-constrained trajectory data,” in New Frontiers in Mining Complex Patterns, vol. 7765 of Lecture Notes in Computer Science, pp. 124-137, Springer Berlin Heidelberg, 2013.

[?] M. K. El Mahrsi, R. Guigourès, F. Rossi, and M. Boullé, “Co-Clustering Network-Constrained Trajectory Data,” Submitted to AKDM-5 (Advances in Knowledge Discovery and Management Vol. 5).

slide-62
SLIDE 62

References I

Boullé, M. (2011). Data grid models for preparation and modeling in supervised learning. In Hands-On Pattern Recognition: Challenges in Machine Learning, vol. 1, pages 99–130. Microtome.

El Mahrsi, M. K., Guigourès, R., Rossi, F., and Boullé, M. (2013). Classifications croisées de données de trajectoires contraintes par un réseau routier. In Vrain, C., Péninou, A., and Sedes, F., editors, Actes de 13ème Conférence Internationale Francophone sur l’Extraction et gestion des connaissances (EGC’2013), volume RNTI-E-24, pages 341–352, Toulouse, France. Hermann-Éditions.

El Mahrsi, M. K., Potier, C., Hébrail, G., and Rossi, F. (2010). Spatiotemporal sampling for trajectory streams. In SAC ’10: Proceedings of the 2010 ACM Symposium on Applied Computing, pages 1627–1628, New York, NY, USA. ACM.

El Mahrsi, M. K. and Rossi, F. (2012a). Clustering par optimisation de la modularité pour trajectoires d’objets mobiles. In UbiMob’12, pages 12–22.

El Mahrsi, M. K. and Rossi, F. (2012b). Graph-Based Approaches to Clustering Network-Constrained Trajectory Data. In Proceedings of the Workshop on New Frontiers in Mining Complex Patterns (NFMCP 2012), pages 184–195, Bristol, UK.

slide-63
SLIDE 63

References II

El Mahrsi, M. K. and Rossi, F. (2012c). Modularity-Based Clustering for Network-Constrained Trajectories. In Proceedings of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2012), pages 471–476, Bruges, Belgium.

El Mahrsi, M. K. and Rossi, F. (2013). Graph-based approaches to clustering network-constrained trajectory data. In Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., and Ras, Z., editors, New Frontiers in Mining Complex Patterns, volume 7765 of Lecture Notes in Computer Science, pages 124–137. Springer Berlin Heidelberg.

Guo, D., Liu, S., and Jin, H. (2010). A graph-based approach to vehicle trajectory analysis. J. Locat. Based Serv., 4:183–199.

Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2:193–218.

Meratnia, N. and de By, R. A. (2004). Spatiotemporal compression techniques for moving point objects. In Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., and Ferrari, E., editors, EDBT, volume 2992 of Lecture Notes in Computer Science, pages 765–782. Springer.

Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582.

slide-64
SLIDE 64

References III

Noack, A. and Rotta, R. (2009). Multi-level algorithms for modularity clustering. In Proceedings of the 8th International Symposium on Experimental Algorithms, SEA ’09, pages 257–268, Berlin, Heidelberg. Springer-Verlag.

Potamias, M., Patroumpas, K., and Sellis, T. (2006). Sampling trajectory streams with spatiotemporal criteria. In Proceedings of the 18th International Conference on Scientific and Statistical Database Management, SSDBM ’06, pages 275–284, Washington, DC, USA. IEEE Computer Society.

Roh, G.-P. and Hwang, S.-w. (2010). NNCluster: An efficient clustering algorithm for road network trajectories. In Database Systems for Advanced Applications, volume 5982 of Lecture Notes in Computer Science, pages 47–61. Springer Berlin Heidelberg.

Zhao, Y. and Karypis, G. (2002). Criterion functions for document clustering: Experiments and analysis. Technical report.

slide-65
SLIDE 65

STSS Vs. STTrace [Potamias et al., 2006]

Athens trucks dataset

  • 276 trajectories
  • 112203 data points (1 position/30 sec)

STTrace: on-the-fly, no error guarantees (but storage space guarantee)

Comparison for the same percentage of retained data

Evaluation criteria

  • Average approximation error

$$\text{Average Approximation Error} = \frac{1}{\sum_{T \in \mathcal{T}} |T|} \times \sum_{T \in \mathcal{T}} \sum_{P_i \in T} \text{distance}(P_i, P'_i)$$

  • Maximum approximation error

$$\text{Maximum Approximation Error} = \max_{T \in \mathcal{T}} \left( \max_{P_i \in T} \text{distance}(P_i, P'_i) \right)$$
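Both criteria can be computed in one pass over the data. The sketch below assumes that the approximation P′i of an original point Pi is the position, at time ti, on the linearly interpolated compressed trajectory, and that each compressed trajectory has at least two points and covers the original from start to finish; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (t, x, y)


def approximate(compressed: List[Point], t: float) -> Tuple[float, float]:
    """Position at time t on the linearly interpolated compressed trajectory."""
    for a, b in zip(compressed, compressed[1:]):
        if a[0] <= t <= b[0]:
            r = 0.0 if b[0] == a[0] else (t - a[0]) / (b[0] - a[0])
            return (a[1] + r * (b[1] - a[1]), a[2] + r * (b[2] - a[2]))
    raise ValueError("t is not covered by the compressed trajectory")


def approximation_errors(
    pairs: List[Tuple[List[Point], List[Point]]],
) -> Tuple[float, float]:
    """Return (average, maximum) error over (original, compressed) pairs."""
    errors = []
    for original, compressed in pairs:
        for t, x, y in original:
            ax, ay = approximate(compressed, t)
            errors.append(math.hypot(x - ax, y - ay))
    return sum(errors) / len(errors), max(errors)


original = [(0, 0, 0), (1, 1, 1), (2, 2, 0)]
compressed = [(0, 0, 0), (2, 2, 0)]
average, maximum = approximation_errors([(original, compressed)])
```

In this example, dropping the middle point moves its approximation to (1, 0), one unit away from the real position, so the maximum error is 1 and the average over the three original points is 1/3.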

slide-66
SLIDE 66

STSS Vs. STTrace: Average Approximation Error

Figure: Average approximation errors resulting from STSS and STTrace sampling (axes: retained data (%), average approximation error (meters))

slide-67
SLIDE 67

STSS Vs. STTrace: Maximum Approximation Error

Figure: Maximum approximation errors resulting from STSS and STTrace sampling (axes: retained data (%), maximum approximation error (meters))

slide-68
SLIDE 68

Why Modularity-Based Community Detection?

  • Efficiency and effectiveness observed in practice
  • Non-parametric
  • Robustness to the presence of high degrees
  • The implementation we used produces a hierarchy of nested clusters
      • Recursive descent based on the statistical significance of the partitions

slide-69
SLIDE 69

How Do We Generate Our Labeled Datasets?

When generating a cluster

  • A set of neighbor vertices is selected as the starting area
  • A set of neighbor vertices is selected as the destination area
  • For each trajectory, a vertex is chosen randomly in each set and the trajectory is generated as the shortest path between them

Clusters are generated based on patterns we considered relevant
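The generation procedure can be sketched as follows. The slides do not say which cost the shortest path minimizes, so this sketch assumes hop count (breadth-first search) on a directed adjacency list, and returns paths as vertex lists rather than segment sequences; all names and the toy graph are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Sequence


def shortest_path(adj: Dict[str, List[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search: shortest path in number of hops, as a vertex list."""
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def generate_cluster(adj: Dict[str, List[str]], start_area: Sequence[str],
                     dest_area: Sequence[str], size: int,
                     rng: random.Random) -> List[List[str]]:
    """Draw a random start and destination vertex for each trajectory."""
    cluster = []
    for _ in range(size):
        path = shortest_path(adj, rng.choice(start_area), rng.choice(dest_area))
        if path is not None:  # skip unreachable pairs
            cluster.append(path)
    return cluster


adj = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
cluster = generate_cluster(adj, ["a", "b"], ["e"], 5, random.Random(0))
```

Passing a seeded `random.Random` keeps the generated dataset reproducible, which matters when the labels are later used as ground truth.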

slide-70
SLIDE 70

Cluster Patterns: Inverted Clusters

The starting area of one cluster is the destination area of the other

Figure: Example of inverted clusters (panels (a) and (b))

slide-71
SLIDE 71

Cluster Patterns: Converging Clusters

The clusters depart from different areas and arrive at the same destination area

Figure: Example of converging clusters (panels (a) and (b))

slide-72
SLIDE 72

Cluster Patterns: Diverging Clusters

The clusters depart from the same area and arrive at different destinations

Figure: Example of diverging clusters (panels (a) and (b))

slide-73
SLIDE 73

Modularity Vs. Spectral Clustering (Trajectory Case)

Table: Adjusted Rand Index

Dataset  Discovered clusters (actual)  Spectral  Modularity
1        9 (9)                         1         1
2        10 (10)                       1         1
3        11 (11)                       0.802     0.873
4        6 (6)                         1         1
5        6 (6)                         0.974     1
6        6 (6)                         1         1
7        14 (12)                       0.961     0.961
8        12 (11)                       0.942     0.971
9        10 (12)                       0.889     0.889

Table: Entropy and Purity

Dataset  Discovered clusters (actual)  Purity: Spectral  Purity: Modularity  Entropy: Spectral  Entropy: Modularity
1        9 (9)                         1                 1                   0                  0
2        10 (10)                       1                 1                   0                  0
3        11 (11)                       0.837             0.915               0.106              0.064
4        6 (6)                         1                 1                   0                  0
5        6 (6)                         0.989             1                   0.0233             0
6        6 (6)                         1                 1                   0                  0
7        14 (12)                       1                 1                   0                  0
8        12 (11)                       0.963             1                   0.021              0
9        10 (12)                       0.872             0.872               0.075              0.075

slide-74
SLIDE 74

Internal Quality Criteria

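Inspired by Intra-Cluster Inertia

Sum of average trajectory intra-cluster overlaps

$$Q(\mathcal{C}_T) = \sum_{C \in \mathcal{C}_T} \frac{1}{|C|} \sum_{T_i, T_j \in C} \frac{\sum_{s \in T_i, s \in T_j} \text{length}(s)}{\sum_{s \in T_i} \text{length}(s)}$$

Sum of average road segment intra-cluster overlaps

$$Q(\mathcal{C}_S) = \sum_{C \in \mathcal{C}_S} \frac{1}{|C|} \sum_{s_i, s_j \in C} \frac{|\{T \in \mathcal{T} : s_i \in T \wedge s_j \in T\}|}{|\{T \in \mathcal{T} : s_i \in T \vee s_j \in T\}|}$$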
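Both criteria share the same outer structure, which the sketch below factors into a single `quality` function taking an overlap measure. Two points are assumptions because the formulas leave them open: the inner sum is taken over ordered pairs of distinct elements, and a trajectory's length is summed over its distinct segments.

```python
from itertools import permutations
from typing import Callable, Dict, Iterable, List


def quality(clusters: Iterable[List[str]],
            overlap: Callable[[str, str], float]) -> float:
    """Sum over clusters of (pairwise overlaps inside the cluster) / |C|."""
    return sum(
        sum(overlap(a, b) for a, b in permutations(cluster, 2)) / len(cluster)
        for cluster in clusters if cluster
    )


def trajectory_overlap(trajs: Dict[str, List[str]],
                       length: Dict[str, float]) -> Callable[[str, str], float]:
    """Length of the segments Ti shares with Tj, relative to the length of Ti."""
    def overlap(ti: str, tj: str) -> float:
        si, sj = set(trajs[ti]), set(trajs[tj])
        return sum(length[s] for s in si & sj) / sum(length[s] for s in si)
    return overlap


def segment_overlap(trajs: Dict[str, List[str]]) -> Callable[[str, str], float]:
    """Jaccard index of the sets of trajectories visiting two segments."""
    visitors: Dict[str, set] = {}
    for tid, path in trajs.items():
        for s in path:
            visitors.setdefault(s, set()).add(tid)

    def overlap(si: str, sj: str) -> float:
        a, b = visitors.get(si, set()), visitors.get(sj, set())
        return len(a & b) / len(a | b) if a | b else 0.0
    return overlap


trajs = {"T1": ["s1", "s2"], "T2": ["s2", "s3"], "T3": ["s4"]}
unit = {"s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 1.0}
q_traj = quality([["T1", "T2"], ["T3"]], trajectory_overlap(trajs, unit))
q_seg = quality([["s1", "s2"]], segment_overlap(trajs))
```

Higher values indicate clusters whose members overlap more; the criterion needs no ground-truth labels, which is what makes it usable for the segment case.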

slide-75
SLIDE 75

Similarity Between Road Segments

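Road segments are considered as bags-of-trajectories

Weights are assigned to trajectories based on the number of segments they visit

$$\omega_{T,s} = \frac{n_{s,T}}{\sum_{T' \in \mathcal{T}} n_{s,T'}} \times \log \frac{|S|}{|\{s' \in S : s' \in T\}|}$$

Segment resemblance is measured through cosine similarity

$$\text{Similarity}(s_i, s_j) = \frac{\sum_{T \in \mathcal{T}} \omega_{T,s_i} \times \omega_{T,s_j}}{\sqrt{\sum_{T \in \mathcal{T}} \omega_{T,s_i}^2} \times \sqrt{\sum_{T \in \mathcal{T}} \omega_{T,s_j}^2}}$$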

slide-76
SLIDE 76

Modularity Vs. Spectral Clustering (Segment Case)

  • Comparison on 5 artificial datasets (composed of 100 trajectories each)
  • Based on the sum of average road segment intra-cluster overlaps

$$Q(\mathcal{C}_S) = \sum_{C \in \mathcal{C}_S} \frac{1}{|C|} \sum_{s_i, s_j \in C} \frac{|\{T \in \mathcal{T} : s_i \in T \wedge s_j \in T\}|}{|\{T \in \mathcal{T} : s_i \in T \vee s_j \in T\}|}$$

Table: Characteristics of the five synthetic datasets

Dataset  Number of segments  Number of edges in the similarity graph
1        2562                79811
2        2394                100270
3        2587                110095
4        2477                87023
5        2348                80659

slide-77
SLIDE 77

Modularity Vs. Spectral Clustering (Segment Case)

Table: Sum of average segment intra-cluster overlaps

Dataset  Number of discovered clusters  Intra-cluster overlaps: Spectral  Intra-cluster overlaps: Modularity
1        23                             685.82                            657.20
2        21                             556.22                            524.46
3        20                             623.21                            561.09
4        22                             647.56                            594.76
5        26                             684.81                            666.24