Time-focused density-based clustering of trajectories of moving objects

Margherita D'Auria, Mirco Nanni, Dino Pedreschi


slide-1
SLIDE 1

Time-focused density-based clustering of trajectories of moving objects

Margherita D'Auria, Mirco Nanni, Dino Pedreschi

slide-2
SLIDE 2
Plan of the talk

  • Introduction
      • Motivations
      • Problem & context
      • Density-based clustering (OPTICS)
  • Density-based clustering on trajectories
      • Trajectory data model
      • Distance measure
      • Results
  • Temporal focusing
      • A clustering quality measure
      • Heuristics for optimal temporal interval
  • Conclusions & future work

slide-3
SLIDE 3
Motivations

  • Plenty of current and future data sources for spatio-temporal data
  • Sophisticated analysis methods are required in order to fully exploit them
      • Data mining methods
      • Which kind of patterns/models?
  • Main objectives
      • A better understanding of the application domain
      • An improvement for private and public services

slide-4
SLIDE 4
Problem & context

  • A distinguishing case: mobile devices
      • PDAs
      • Mobile phones
      • LBS-enabled devices (may include the two above)
  • They (can) yield traces of their movement
  • An important problem:
      • Discovering groups of individuals that (approximately) move together in some period of time
      • E.g.: detection of traffic jams during rush hours
  • A candidate data mining reformulation of the problem
      • Clustering of individuals' trajectories

slide-5
SLIDE 5
Which kind of clustering?

  • Several alternatives are available
  • General requirements:
      • Non-spherical clusters should be allowed
          • E.g.: a traffic jam along a road should be represented as a cluster whose individuals form a "snake-shaped" group
      • Tolerance to noise
      • Low computational cost
      • Applicability to complex, possibly non-vectorial data
  • A suitable candidate: density-based clustering
      • In particular, we adopt OPTICS

slide-6
SLIDE 6
A crash intro to OPTICS

  • A density threshold is defined through two parameters:
      • ε: a neighborhood radius
      • MinPts: minimum number of points
  • Key concepts:
      • Core objects
          • Objects with an ε-neighborhood that contains at least MinPts objects
      • Reachability-distance reach-d(p, q)
          • (Simplified definition:) distance between objects p and q
  • Example:
      • Object "q" is a core object if MinPts = 2
      • Object "p" is not
      • Their reach-d() is shown

[Figure: objects q and p, the ε-neighborhood of q, and reach-d(p, q)]
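The slide gives only a simplified reach-d(). As a minimal sketch, the code below follows the definition from the original OPTICS formulation instead, where reach-d(p, q) is the larger of q's core-distance and the plain distance d(q, p), and is undefined when q is not a core object. The function names, the use of Euclidean points, and the convention that an object counts as a member of its own ε-neighborhood are assumptions made for illustration, not taken from the slides.

```python
import math


def core_distance(points, i, eps, min_pts, dist=math.dist):
    """Distance from points[i] to its min_pts-th nearest object, or None.

    Assumption: the object itself counts as one of its neighbours, so the
    sorted list starts with 0.0. None means points[i] is not a core object.
    """
    distances = sorted(dist(points[i], other) for other in points)
    if len(distances) < min_pts or distances[min_pts - 1] > eps:
        return None
    return distances[min_pts - 1]


def reachability_distance(points, p, q, eps, min_pts, dist=math.dist):
    """reach-d(p, q): reachability of object p with respect to object q."""
    core = core_distance(points, q, eps, min_pts, dist)
    if core is None:
        return None  # undefined when q is not a core object
    return max(core, dist(points[q], points[p]))


points = [(0, 0), (1, 0), (5, 0)]
print(core_distance(points, 0, eps=1.5, min_pts=2))
print(reachability_distance(points, 2, 0, eps=1.5, min_pts=2))
```

Passing a different `dist` callable is what would let the same code work on non-vectorial objects such as trajectories.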

slide-7
SLIDE 7
A crash intro to OPTICS

  • The algorithm:
      1. Repeatedly choose a random non-visited object, until a core object is selected
      2. Select the core object having the smallest reachability distance from all the visited core objects. If none can be found, go to step 1
  • Output: reach-d() of all visited points, in order of visit (reachability plot)

[Figure: sample points (X axis, Y axis) and the resulting reachability plot over the order of visit. Annotations: a "jump" from the left-hand group (0-9) to the right-hand one (10-18); the reachability threshold; Cluster 1; Cluster 2]
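The two-step loop above can be sketched as follows. This is a quadratic-time illustration of the simplified description on the slide (reach-d taken as the plain distance from the closest visited core object), not the full OPTICS algorithm with core-distances and a spatial index. The function name, the `seed` parameter, and the choice to leave unreachable non-core objects out of the ordering are assumptions.

```python
import math
import random


def simplified_optics(points, eps, min_pts, dist=math.dist, seed=0):
    """Return (order of visit, reach-d of each visited object)."""
    n = len(points)
    # eps-neighbourhoods; an object is counted in its own neighbourhood.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    visited = [False] * n
    reach = [math.inf] * n  # smallest distance from any visited core object
    order = []

    def visit(i):
        visited[i] = True
        order.append(i)
        if is_core[i]:
            for j in neighbours[i]:
                if not visited[j]:
                    reach[j] = min(reach[j], dist(points[i], points[j]))

    candidates = list(range(n))
    random.Random(seed).shuffle(candidates)
    for start in candidates:
        # Step 1: pick a random non-visited core object.
        if visited[start] or not is_core[start]:
            continue
        visit(start)
        # Step 2: keep expanding to the closest reachable object.
        while True:
            frontier = [j for j in range(n)
                        if not visited[j] and reach[j] < math.inf]
            if not frontier:
                break  # nothing reachable: go back to step 1
            visit(min(frontier, key=reach.__getitem__))
    return order, [reach[i] for i in order]


points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
order, plot = simplified_optics(points, eps=1.5, min_pts=2)
print(order, plot)
```

In the returned reachability values, each new group starts with an infinite value, which plays the role of the "jump" in the reachability plot.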

slide-8
SLIDE 8
Applying OPTICS to trajectories

  • Two key issues have to be solved
      • A suitable representation for trajectories is needed
          • Which data model for trajectories?
      • A means for comparing trajectories has to be provided
          • Which distance between objects?
          • OPTICS needs one to perform range queries

slide-9
SLIDE 9
A trajectory data model

  • Raw input data:
      • Each trajectory is represented as a set of time-stamped coordinates
      • T = (t_1, x_1, y_1), ..., (t_n, x_n, y_n) => object position at time t_i was (x_i, y_i)
  • Data model
      • Parametric-spaghetti: linear interpolation between consecutive points
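A minimal sketch of the linear-interpolation data model: given a trajectory stored as a time-ordered list of `(t, x, y)` tuples, it estimates the position at an arbitrary time inside the observed span. The function name `position_at` and the list-of-tuples layout are illustrative assumptions.

```python
from bisect import bisect_right


def position_at(trajectory, t):
    """Linearly interpolate the (x, y) position at time t.

    `trajectory` is a list of (t_i, x_i, y_i) tuples sorted by time,
    with at least two samples and strictly increasing timestamps.
    """
    times = [sample[0] for sample in trajectory]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the observed time span")
    # Index of the segment [t_i, t_{i+1}] that contains t.
    i = min(bisect_right(times, t) - 1, len(trajectory) - 2)
    t0, x0, y0 = trajectory[i]
    t1, x1, y1 = trajectory[i + 1]
    a = (t - t0) / (t1 - t0)
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


print(position_at([(0, 0.0, 0.0), (10, 10.0, 20.0)], 5))
```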

slide-10
SLIDE 10
A distance between trajectories

  • Adopted distance = average distance

$$ D(\tau_1, \tau_2)|_T = \frac{\int_T d(\tau_1(t), \tau_2(t))\, dt}{|T|} $$

  • It is a metric => efficient indexing methods allowed
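The average distance can be approximated numerically. The sketch below is an illustration under assumptions: each trajectory is passed as a callable mapping a time to an `(x, y)` position, the pointwise distance d is Euclidean, and the integral over the interval is estimated with the trapezoidal rule on `steps` equal sub-intervals. None of these choices comes from the slides.

```python
import math


def average_distance(pos_a, pos_b, t_start, t_end, steps=1000):
    """Approximate the time-averaged distance between two moving objects.

    pos_a and pos_b are callables t -> (x, y); the integral of their
    Euclidean distance over [t_start, t_end] is divided by the interval
    length.
    """
    h = (t_end - t_start) / steps
    total = 0.0
    for k in range(steps + 1):
        t = t_start + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal end weights
        total += weight * math.dist(pos_a(t), pos_b(t))
    return total * h / (t_end - t_start)


# Two objects moving in parallel, always 3 units apart.
print(average_distance(lambda t: (t, 0.0), lambda t: (t, 3.0), 0, 10))
```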

slide-11
SLIDE 11
A sample dataset

  • Set of trajectories forming 4 clusters + noise
  • Generated by the CENTRE system (KDDLab software)

slide-12
SLIDE 12
OPTICS vs. HAC & K-means

[Figure panels: K-means, OPTICS, HAC-average]

slide-13
SLIDE 13
Temporal focusing

  • Different time intervals can show different behaviours
      • E.g.: objects that are close to each other within a time interval can be far apart in other periods of time
  • The time interval becomes a parameter
      • E.g.: rush hours vs. low traffic times
  • Problem: significant time intervals are not always known a priori
      • An automated mechanism is needed to find them

slide-14
SLIDE 14
Temporal focusing

  • The proposed method
      1. Provide a notion of interestingness to be associated with time intervals
          • We define it in terms of estimated quality of the clustering extracted on the given time interval
      2. Formalize the temporal focusing task as an optimization problem
          • Discover the time interval that maximizes the interestingness measure

slide-15
SLIDE 15
A quality measure for density-based clustering

  • General principle
      • High-density clusters separated by low-density noise are preferred
  • The method
      • High-density clusters correspond to low dents in the reachability plot
      • => Evaluate the global quality Q of the clustering output as the average reachability within clusters (noise is discarded)
  • Definition: given ε' and dataset D, compute $Q_{D,\varepsilon'}$ as:

$$ Q_{D,\varepsilon'} = -R(D, \varepsilon') = -\mathrm{AVG}_{o \in D'}\ \text{reach-d}(o), \qquad D' = D \setminus \{\text{noise objects}\} $$
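As a rough sketch of the definition, the function below computes Q from a list of reachability values. It assumes that an object counts as noise exactly when its reachability exceeds the threshold, and it returns negative infinity when every object is noise; both conventions, and the function name, are illustrative choices rather than something the slides specify.

```python
import math


def clustering_quality(reachability, threshold):
    """Q = minus the average reach-d over non-noise objects.

    Assumption: objects with reach-d above `threshold` are treated as noise
    and discarded. Returns -inf if no object is left.
    """
    clustered = [r for r in reachability if r <= threshold]
    if not clustered:
        return -math.inf
    return -sum(clustered) / len(clustered)


print(clustering_quality([0.1, 0.2, 5.0, 0.3], threshold=1.0))
```

Denser clusters give smaller reachability values and therefore a Q closer to zero, so a higher Q means a better clustering.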

slide-16
SLIDE 16
FAQs

  • How is Q() computed for a given time interval I?
      • Step 1: trajectory segments outside I are clipped away
      • Step 2: OPTICS is run on the clipped trajectories
      • Step 3: Q(I) is computed on the output reachability plot
  • How is the reachability threshold set for each interval?
      • A reachability threshold is needed in order to locate clusters (and noise)
      • The threshold for the largest I is manually set by the user
      • Thresholds for other intervals I' ⊆ I are computed from the first one by proportionally rescaling w.r.t. average reachability
  • Is the optimal Q(I) biased towards tiny intervals?
      • Yes. The problem has been fixed by defining Q'(I) = Q(I) / log |I|
      • => A small decrease in Q(I) is accepted when it yields a much larger I
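The clipping step, the threshold rescaling, and the corrected measure Q'(I) can be sketched as below. All names are hypothetical. The rescaling formula is one plausible reading of "proportionally rescaling w.r.t. average reachability" (new threshold = base threshold times the ratio of average reachabilities) and should be treated as an assumption; the natural logarithm and |I| > 1 are assumed as well.

```python
import math


def clip(trajectory, t_start, t_end):
    """Keep the part of a piecewise-linear trajectory inside [t_start, t_end].

    `trajectory` is a time-sorted list of (t, x, y) tuples; the endpoints of
    the clipped part are obtained by linear interpolation.
    """
    def at(t, p, q):
        a = (t - p[0]) / (q[0] - p[0])
        return (t, p[1] + a * (q[1] - p[1]), p[2] + a * (q[2] - p[2]))

    clipped = []
    for p, q in zip(trajectory, trajectory[1:]):
        lo, hi = max(p[0], t_start), min(q[0], t_end)
        if lo >= hi:
            continue  # segment lies outside the interval
        for point in (at(lo, p, q), at(hi, p, q)):
            if not clipped or clipped[-1] != point:
                clipped.append(point)
    return clipped


def rescale_threshold(base_threshold, base_avg_reach, avg_reach):
    """Assumed rule: scale the threshold by the ratio of average reachabilities."""
    return base_threshold * avg_reach / base_avg_reach


def adjusted_quality(q, interval_length):
    """Q'(I) = Q(I) / log|I|, assuming natural log and |I| > 1."""
    return q / math.log(interval_length)


print(clip([(0, 0, 0), (10, 10, 0), (20, 10, 10)], 5, 15))
```

Because Q is negative, dividing by a larger log |I| moves Q' closer to zero, which is how a slightly worse Q on a much longer interval can still win.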

slide-17
SLIDE 17
Experiments

  • A more complex sample dataset (generated by CENTRE)
      • Clear clusters in the central time interval vs. dispersion on the borders

slide-18
SLIDE 18
Optimizing Q()

  • Find the optimal Q() by plotting values for all time intervals
      • The optimum corresponds to the central time interval
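Exhaustive optimization amounts to scoring every sub-interval and keeping the best one. The sketch below assumes a discretized time axis and a caller-supplied `quality(t_start, t_end)` function, which in a real pipeline would clip the trajectories, run OPTICS, and compute Q. Here a toy function with a known peak stands in for it.

```python
from itertools import combinations


def best_interval(time_points, quality):
    """Score every (t_start, t_end) pair with t_start < t_end; return the best.

    `time_points` must be sorted in increasing order.
    """
    return max(combinations(time_points, 2), key=lambda iv: quality(*iv))


def toy_quality(t_start, t_end):
    """Stand-in quality function peaking at the interval (40, 60)."""
    return -(abs(t_start - 40) + abs(t_end - 60))


print(best_interval(range(0, 101, 20), toy_quality))
```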

slide-19
SLIDE 19
Heuristics for optimum search

  • Each Q() value computation requires a run of the OPTICS algorithm
  • Computing all O(N^2) values is too expensive (N = |{sub-intervals}|)
  • Alternative approaches are needed
  • Preliminary tests with hill-climbing (i.e., greedy) approach:
      • Test on the same dataset
      • Global optimum found in 70.7% of runs
      • Avg. number of steps: 17
      • Avg. OPTICS runs: 49

[Figure: hill-climbing runs in the (Tstart, Tend) plane, both axes from 20 to 100, showing starting points, local optima, and the global optimum]
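A greedy search of this kind can be sketched as follows. The neighbourhood (moving either endpoint by one `step` in either direction), the bounds, and the caching of evaluations are assumptions made for illustration; the slides do not specify how the hill-climbing moves were defined. `quality` is a caller-supplied function that would trigger one OPTICS run per new interval.

```python
def hill_climb(start, quality, step=1, lo=0, hi=100):
    """Greedy ascent over intervals (t_start, t_end).

    Returns (best interval found, number of moves, number of quality
    evaluations). Evaluations are cached, so each interval is scored once.
    """
    cache = {}

    def q(interval):
        if interval not in cache:
            cache[interval] = quality(*interval)
        return cache[interval]

    current, moves = start, 0
    while True:
        s, e = current
        neighbours = [
            (s + ds, e + de)
            for ds in (-step, 0, step)
            for de in (-step, 0, step)
            if (ds, de) != (0, 0) and lo <= s + ds < e + de <= hi
        ]
        best = max(neighbours, key=q, default=current)
        if q(best) <= q(current):
            return current, moves, len(cache)  # local optimum reached
        current, moves = best, moves + 1


def toy_quality(t_start, t_end):
    """Stand-in quality function with a single peak at (40, 60)."""
    return -(abs(t_start - 40) + abs(t_end - 60))


print(hill_climb((10, 90), toy_quality, step=10))
```

On a quality surface with several local optima, as in the experiment above, the result depends on the starting point, which is why only a fraction of runs reach the global optimum.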
slide-20
SLIDE 20
Conclusions & future work

  • Summary of the work
      • Extension of OPTICS to a trajectory data model & distance
      • Definition of the temporal focusing problem
      • Definition of a clustering quality measure
      • (Preliminary) tests with exhaustive & greedy optimization
  • Future work
      • Experimental validation over broader benchmarks
      • Tighter integration between OPTICS and search strategy
      • Alternative, domain-specific definition of quality measures