Similarity-based Analysis for Trajectory Data (Kevin Zheng)



slide-1
SLIDE 1

Similarity-based Analysis for Trajectory Data

Kevin Zheng

25/04/2014 DASFAA 2014 Tutorial 1

slide-2
SLIDE 2

Outline

  • Background

– What is a trajectory
– Where do they come from
– Why are they useful
– Characteristics

  • Trajectory similarity search

– Query classification
– Trajectory similarity measures
– Trajectory index

  • Similarity-based trajectory mining

– Popular route mining
– Co-traveller discovery
– Trajectory clustering

slide-4
SLIDE 4

What is a trajectory?

  • Historical location records of moving objects
  • In mathematics

– Continuous function: time → location
– Location can be of any dimension

  • In real applications

– Locations are sampled periodically
– A finite sequence of time-stamped locations: <p1, t1>, <p2, t2>, …, <pn, tn>
– p: two or three dimensions, e.g. (longitude, latitude)
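The sampled representation above can be sketched as a small data structure. This is only an illustration: the names `Sample`, `Trajectory` and `is_time_ordered` are hypothetical and do not come from the tutorial.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    """One time-stamped location <p, t> (hypothetical name)."""
    lon: float  # longitude
    lat: float  # latitude
    t: float    # timestamp, e.g. seconds since some reference time


# A trajectory is a finite sequence of samples.
Trajectory = List[Sample]


def is_time_ordered(traj: Trajectory) -> bool:
    """Return True if timestamps strictly increase along the sequence."""
    return all(a.t < b.t for a, b in zip(traj, traj[1:]))
```

Later sketches use plain tuples such as `(x, y)` or `(x, y, t)` instead, so that each one stays self-contained.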

slide-5
SLIDE 5

Where is it from?


slide-6
SLIDE 6

Where is it from?

  • GPS modules on moving objects

– Vehicles, mobile phone users, animals

  • Online social networks

– Twitter, Flickr, Facebook, Weibo

  • Sensors

– Surveillance cameras, RFID, WiFi

  • More …

slide-7
SLIDE 7

Who cares about it?

  • Government

– Traffic pattern analysis
– Public transportation management
– Urban planning

  • Business

– Location-based services
– Personalized advertisement & recommendation
– Taxi companies, logistics companies

  • Scientists & Researchers

– Zoologists, meteorologists, astronomers
– Open problems, challenging tasks

  • More …

slide-8
SLIDE 8

Trajectory data are BIG

  • Volume
  • Velocity
  • Variety


slide-9
SLIDE 9

Volume

  • In 2010, 1 billion vehicles

– Taxi and logistics companies keep tracking their vehicles
– Self-driving cars in the near future?

  • In 2012, 1.08 billion smartphone users
  • In 2013, 20 million surveillance cameras in China
  • They are all data generators!

– The data keep accumulating

slide-10
SLIDE 10

Velocity

  • Not just huge: the data are also being generated quickly
  • Vehicle tracking & navigation

– Re-position every few seconds

  • Geo-tagged social media

– 2 million Flickr photos per day, 5% geo-tagged
– 100 million posts on Sina Weibo per day, 1-2% geo-tagged
– 400 million tweets per day, 1% geo-tagged

  • Sensors

– How many cars pass a road camera every day?

slide-11
SLIDE 11

Geo-tagged tweets

Images courtesy of Twitter

slide-12
SLIDE 12

Variety

  • Data source
  • Tracking devices

– Car GPS, smartphones, sensors

  • Tracking methods

– Sampling strategy, sampling rate

  • Spatial length & temporal duration
  • Data quality


slide-13
SLIDE 13

Research directions

  • Scalable, real-time data processing
  • Flexible database storage and index
  • Effective similarity measures
  • Uncertainty management
  • Data compression

Key and fundamental research problem: similarity-based analysis

slide-15
SLIDE 15

Similarity-based analysis for trajectories

  • Core problem: trajectory similarity search

– Input: a trajectory dataset D, a query Q
– Output: the subset of trajectories in D that are ‘similar’ to Q

  • Foundation

– Trajectory similarity measures

  • Approach

– Index and search algorithm

  • Application

– Popular route mining (route recommendation)
– Co-traveller discovery, clustering, classification, etc.

slide-16
SLIDE 16

Similarity query classification

  • P-query

– Query: point(s)

  • R-query

– Query: region (spatial & temporal dimension)

  • T-query

– Query: trajectory


slide-17
SLIDE 17

P-query (single point)

Query location: q

Temporal constraint (optional): tc = [ts, te]

$D(q, T) = \min_{p \in T} \mathrm{dist}(q, p)$, where p ∈ T and p satisfies tc

dist(q, p):

  • Lp-norm
  • Network distance

[Tao2002] Tao Y., Papadias D. and Shen Q., Continuous nearest neighbour search, VLDB, 2002
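A minimal sketch of the single-point distance, assuming trajectories are lists of `(x, y, t)` tuples and that dist is the Euclidean (L2) distance; the function name is hypothetical, and a network distance would need a road graph instead.

```python
import math


def point_to_trajectory_distance(q, traj, tc=None):
    """D(q, T): distance from query point q to the closest sample of traj.

    q    -- (x, y) query location
    traj -- list of (x, y, t) samples
    tc   -- optional (ts, te); only samples with ts <= t <= te are considered
    """
    candidates = [
        (x, y) for x, y, t in traj
        if tc is None or tc[0] <= t <= tc[1]
    ]
    if not candidates:
        # No sample satisfies the temporal constraint.
        return math.inf
    return min(math.dist(q, p) for p in candidates)
```

A P-query then ranks all trajectories in the dataset by this value and returns the closest ones.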

slide-18
SLIDE 18

P-query (multiple points)

Query locations Q: q1, q2, q3, q4

D(Q, T) is an aggregate function of D(q, T) over all q in Q

[Chen2010] Chen Z., Shen H.T., Zhou X., Zheng Y. and Xie X., Searching trajectories by locations: an efficiency study. SIGMOD, 2010
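As a rough illustration of the aggregation, the sketch below sums the per-point distances. The choice of `sum` is an assumption made here for simplicity; the slide only says "aggregate function", and the cited paper may define it differently. Trajectories are assumed to be lists of `(x, y)` tuples.

```python
import math


def point_distance(q, traj):
    """D(q, T): Euclidean distance from q to the closest sample of traj."""
    return min(math.dist(q, p) for p in traj)


def multi_point_distance(queries, traj, aggregate=sum):
    """D(Q, T): aggregate of D(q, T) over all query locations q in Q.

    aggregate -- any function over an iterable of numbers, e.g. sum or max
                 (the default is an assumption, not the paper's definition).
    """
    return aggregate(point_distance(q, traj) for q in queries)
```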

slide-19
SLIDE 19

R-query

  • Spatial region: R
  • Temporal interval: [ts, te]

Ask for trajectories in a given region during a time interval

[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000
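A linear-scan sketch of an R-query, assuming a rectangular region and trajectories stored as lists of `(x, y, t)` samples keyed by a trajectory id. It only tests the sampled points, so a trajectory that crosses R between two samples is missed; the names are hypothetical.

```python
def r_query(dataset, region, interval):
    """Return ids of trajectories with a sample inside region during interval.

    dataset  -- dict: trajectory id -> list of (x, y, t) samples
    region   -- (xmin, ymin, xmax, ymax) axis-aligned rectangle R
    interval -- (ts, te)
    """
    xmin, ymin, xmax, ymax = region
    ts, te = interval
    return [
        tid for tid, traj in dataset.items()
        if any(
            xmin <= x <= xmax and ymin <= y <= ymax and ts <= t <= te
            for x, y, t in traj
        )
    ]
```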

slide-20
SLIDE 20

T-query

  • Query: Tq

How to measure the distance between Tq and another trajectory?

slide-21
SLIDE 21

Trajectory similarity measures

  • Many-to-many mapping
  • Different semantics/applications
  • Different lengths
  • Different sampling rates
  • Noise
  • Temporal dimension?

slide-22
SLIDE 22

Classification

Two axes:

  • Spatial-only (consider location only) vs. spatial-temporal (consider both location and time)
  • Discrete (based on location samples) vs. continuous (based on line segments or curves)

Measures in each class:

  • Spatial-only, discrete: Lp-norm, DTW, LCSS, EDR
  • Spatial-only, continuous: OWD, LIP
  • Spatial-temporal, discrete: DTW, LCSS, EDR with time constraint
  • Spatial-temporal, continuous: Synchronous Euclidean Distance

slide-23
SLIDE 23

Classification

[Figure: classification of similarity measures; this part covers the spatial-only, discrete measures: Lp-norm, DTW, LCSS, EDR]

slide-24
SLIDE 24

Lp-norm

  • Average Lp-norm distance of all matched locations
  • 1-to-1 mapping
  • Trajectories are of the same length
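A minimal sketch of the average Lp-norm distance, assuming trajectories are equal-length lists of coordinate tuples and that the i-th points are matched to each other; the function names are hypothetical.

```python
def lp_point_distance(a, b, p=2):
    """Lp-norm distance between two points given as coordinate tuples."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def lp_norm_distance(t1, t2, p=2):
    """Average Lp-norm distance over the 1-to-1 matched locations."""
    if len(t1) != len(t2) or not t1:
        raise ValueError("trajectories must be non-empty and of the same length")
    return sum(lp_point_distance(a, b, p) for a, b in zip(t1, t2)) / len(t1)
```

The length check makes the 1-to-1 mapping requirement explicit: two trajectories sampled at different rates cannot be compared directly with this measure.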

slide-25
SLIDE 25

Lp-norm

  • Cannot detect similar trajectories with different sampling rates
  • Sensitive to noise

slide-26
SLIDE 26

DTW

  • Dynamic Time Warping distance

– Adaptation from time series distance measure
– Used to handle time shift and scale in time series

  • Optimal order-aware alignment between two sequences

– Goal: minimize the aggregate distance between matched points

  • 1-to-many mapping

Yi, Byoung-Kee, Jagadish, HV and Faloutsos, Christos, Efficient retrieval of similar time sequences under time warping. ICDE 1998
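The alignment described above is usually computed with dynamic programming. The sketch below is the textbook DTW recurrence, assuming Euclidean point distance and a sum as the aggregate; it is not taken from the cited paper.

```python
import math


def dtw(t1, t2):
    """Dynamic Time Warping distance between two lists of (x, y) points."""
    n, m = len(t1), len(t2)
    # dp[i][j] = cost of the best alignment of t1[:i] with t2[:j]
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(t1[i - 1], t2[j - 1])
            # A point may be matched to several points of the other
            # trajectory (1-to-many), but the order is preserved.
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]
```

Because every point must be matched to something, a single outlier adds its full distance to the result, which is why DTW is sensitive to noise.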

slide-27
SLIDE 27

DTW for trajectories

  • Nothing to do with ‘time’ at all
  • Useful when detecting similar trajectories with different sampling rates
  • Sensitive to noise

slide-28
SLIDE 28

LCSS

  • Longest Common Sub-Sequence
  • Adaptation of string similarity

– LCSS(‘abcde’, ‘bd’) = 2

  • Threshold-based equality relationship

– Two locations are regarded as equal if they’re ‘close’ (compared to a threshold)

  • 1-to-(1 or null) mapping

Vlachos, M., Gunopulos, D. and Kollios, G., Discovering similar multidimensional trajectories. ICDE 2002
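A sketch of threshold-based LCSS on point sequences. It assumes two locations are "equal" when their Euclidean distance is at most `eps`; the cited paper may use a different closeness test, so treat the predicate as an assumption.

```python
import math


def lcss(t1, t2, eps):
    """Length of the longest common subsequence of two lists of (x, y) points.

    Two points match if they are within eps of each other. Each point is
    matched to at most one point of the other trajectory (1-to-(1 or null)).
    """
    n, m = len(t1), len(t2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if math.dist(t1[i - 1], t2[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]
```

Unmatched points are simply skipped, so an isolated outlier lowers the count by at most one.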

slide-29
SLIDE 29

LCSS

  • Insensitive to noise
  • Not easy to define threshold
  • May return dissimilar trajectories

[Figure: trajectory p1, …, p5 compared with trajectory p’1, p’2, p’3]

slide-30
SLIDE 30

EDR

  • Edit Distance on Real sequence
  • Adaptation from Edit Distance on strings

– Number of insert, delete, and replace operations needed to convert A into B

  • Threshold-based equality relationship

– Two locations are regarded as equal if they’re ‘close’ (compared to a threshold)

Lei Chen, M. Tamer Ozsu, Vincent Oria, Robust and Fast Similarity Search for Moving Object Trajectories. SIGMOD 2005
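A sketch of EDR with unit costs, assuming two locations match when every coordinate differs by at most `eps`. The matching rule and the unit costs are stated here as assumptions for illustration.

```python
def edr(t1, t2, eps):
    """Edit Distance on Real sequence between two lists of coordinate tuples."""
    n, m = len(t1), len(t2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all remaining points of t1
    for j in range(m + 1):
        dp[0][j] = j  # insert all remaining points of t2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = all(abs(a - b) <= eps for a, b in zip(t1[i - 1], t2[j - 1]))
            subcost = 0 if match else 1  # replace costs 1 only on a mismatch
            dp[i][j] = min(
                dp[i - 1][j - 1] + subcost,  # match or replace
                dp[i - 1][j] + 1,            # delete
                dp[i][j - 1] + 1,            # insert
            )
    return dp[n][m]
```

The result is a count of operations, so a far-away outlier costs the same as a slightly misplaced point.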

slide-31
SLIDE 31

EDR

  • Value means the number of operations, not “distance between locations”

– Insensitive to noise

[Figure: converting p’1, p’2, p’3 into p1, …, p5 takes two inserts and one replace]

slide-32
SLIDE 32

LCSS and EDR

  • They are both count-based

– LCSS counts the number of matched pairs
– EDR counts the cost of operations needed to fix the unmatched pairs

  • Higher LCSS, lower EDR
  • If cost(replace) = cost(insert) + cost(delete):

$EDR(X, Y) = L(X) + L(Y) - 2\,LCSS(X, Y)$, where L(X) is the length of X
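The relation between the two counts can be checked numerically. The sketch below uses plain strings with exact equality instead of thresholded locations, and sets cost(insert) = cost(delete) = 1 and cost(replace) = 2; these choices are assumptions made to keep the check short.

```python
def lcss_len(a, b):
    """Length of the longest common subsequence of two sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def weighted_edit_distance(a, b, ins=1, dele=1, rep=2):
    """Edit distance with configurable operation costs."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = i * dele
    for j in range(1, len(b) + 1):
        dp[0][j] = j * ins
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else rep
            dp[i][j] = min(dp[i - 1][j - 1] + sub,
                           dp[i - 1][j] + dele,
                           dp[i][j - 1] + ins)
    return dp[-1][-1]


def identity_holds(x, y):
    """Check EDR(X, Y) == L(X) + L(Y) - 2 * LCSS(X, Y) for one pair."""
    return weighted_edit_distance(x, y) == len(x) + len(y) - 2 * lcss_len(x, y)
```

For the earlier string example, LCSS(‘abcde’, ‘bd’) = 2 and the weighted edit distance is 3 (delete a, c and e), which equals 5 + 2 − 2·2.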

slide-33
SLIDE 33

Classification

[Figure: classification of similarity measures; this part covers the spatial-only, continuous measures: OWD, LIP]

slide-34
SLIDE 34

OWD

  • One Way Distance from T1 to T2 is:

– Integral of the distance from points of T1 to T2
– Divided by the length of T1

  • Can be made into a symmetric measure

Bin Lin, Jianwen Su, One Way Distance: For Shape Based Similarity Search of Moving Object Trajectories. In Geoinformatica (2008)

slide-35
SLIDE 35

OWD example

  • Consider one trajectory as a sequence of piecewise line segments, and the other as discrete samples
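A discrete approximation of OWD along these lines, assuming 2D points. Two simplifications are made here and are not from the cited paper: the integral is approximated by averaging over the sample points of the first trajectory, and the symmetric version is taken as the mean of the two one-way distances.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.dist(p, a)  # degenerate segment
    # Projection parameter of p onto the segment, clamped to [0, 1].
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.dist(p, (ax + u * dx, ay + u * dy))


def point_polyline_distance(p, polyline):
    """Distance from p to the closest segment of a polyline."""
    if len(polyline) == 1:
        return math.dist(p, polyline[0])
    return min(point_segment_distance(p, a, b)
               for a, b in zip(polyline, polyline[1:]))


def one_way_distance(samples, polyline):
    """Approximate OWD: mean distance from the samples to the polyline."""
    return sum(point_polyline_distance(p, polyline) for p in samples) / len(samples)


def symmetric_owd(t1, t2):
    """Symmetric variant, assumed here to be the mean of both directions."""
    return 0.5 * (one_way_distance(t1, t2) + one_way_distance(t2, t1))
```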

slide-36
SLIDE 36

LIP distance

  • Locality In-between Polylines

– The polygons are those formed between the intersection points of the two polylines

Nikos Pelekis et al, Similarity Search in Trajectory Databases. Symposium on Temporal Representation and Reasoning 2007

slide-37
SLIDE 37

LIP distance

  • Only works for 2-dimensional trajectories
  • Polygon → polyhedron: a non-trivial change

slide-38
SLIDE 38

Spatial-temporal similarity measure

  • All the spatial-only similarity measures can incorporate time information in trajectories
  • Lp-norm and DTW can apply a temporal constraint for more synchronized alignment
  • EDR and LCSS can apply a temporal threshold on top of the spatial threshold

slide-39
SLIDE 39

Classification

[Figure: classification of similarity measures; this part covers the spatial-temporal, discrete measures: DTW, LCSS, EDR with time constraint]

slide-40
SLIDE 40

DTW with temporal constraint

[Figure: DTW alignment example with time tolerance = 2]

slide-41
SLIDE 41

Spatial-temporal LCSS

  • A temporal threshold controls how far in time we can go in order to match two locations

Example (time threshold = 2):

– Trajectory p1, …, p5 with timestamps t1=1, t2=2, t3=4, t4=7, t5=11
– Trajectory p’1, p’2, p’3 with timestamps t’1=1, t’2=3, t’3=4

Vlachos, M., Gunopulos, D. and Kollios, G., Discovering similar multidimensional trajectories. ICDE 2002
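A sketch of LCSS with an added temporal threshold, assuming samples are `(x, y, t)` tuples, Euclidean closeness in space, and a match only when the timestamps differ by at most `tau`. The parameter names are hypothetical.

```python
import math


def st_lcss(t1, t2, eps, tau=None):
    """LCSS on (x, y, t) samples with spatial threshold eps.

    tau -- optional temporal threshold; None means spatial-only matching.
    """
    n, m = len(t1), len(t2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        xa, ya, ta = t1[i - 1]
        for j in range(1, m + 1):
            xb, yb, tb = t2[j - 1]
            close_in_space = math.dist((xa, ya), (xb, yb)) <= eps
            close_in_time = tau is None or abs(ta - tb) <= tau
            if close_in_space and close_in_time:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]
```

With made-up coordinates and the timestamps from the example above, a pair that is close in space but 7 time units apart is no longer counted once `tau=2` is applied.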

slide-42
SLIDE 42

Classification

[Figure: classification of similarity measures; this part covers the spatial-temporal, continuous measure: Synchronous Euclidean Distance]

slide-43
SLIDE 43

SED

  • Synchronous Euclidean Distance

– Euclidean distance between locations at the same time instance of two trajectories

  • Regard a trajectory as a continuous function of time

Mirco Nanni, Dino Pedreschi, Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems (2006)

Potamias, M., Patroumpas, K. and Sellis, T. K., Sampling trajectory streams with spatiotemporal criteria. SSDBM 2006

slide-44
SLIDE 44

SED

[Figure: two trajectories sampled at different timestamps; a sample point is virtually created at t=3 on the trajectory that has no sample at that time]
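A sketch of SED on sampled trajectories. It assumes linear interpolation to create the "virtual" samples and averages the synchronous distances over all timestamps present in either trajectory within their common time span; both choices are assumptions, and other aggregates (sum, max, integral) are possible.

```python
import math
from bisect import bisect_right


def position_at(traj, t):
    """Linearly interpolated (x, y) of a trajectory at time t.

    traj -- list of (x, y, t) samples with strictly increasing timestamps
    """
    times = [s[2] for s in traj]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the trajectory's time span")
    i = bisect_right(times, t) - 1
    if i == len(traj) - 1:
        return traj[-1][:2]
    x0, y0, t0 = traj[i]
    x1, y1, t1 = traj[i + 1]
    r = (t - t0) / (t1 - t0)
    return (x0 + r * (x1 - x0), y0 + r * (y1 - y0))


def sed(t1, t2):
    """Mean Euclidean distance between positions at the same time instants."""
    lo = max(t1[0][2], t2[0][2])
    hi = min(t1[-1][2], t2[-1][2])
    if lo > hi:
        raise ValueError("trajectories do not overlap in time")
    stamps = sorted({s[2] for s in t1 + t2 if lo <= s[2] <= hi})
    return sum(
        math.dist(position_at(t1, t), position_at(t2, t)) for t in stamps
    ) / len(stamps)
```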

slide-45
SLIDE 45

Trajectory index

  • Similarity measures give the way to calculate the distance between two trajectories
  • However …

– Huge amount of trajectories
– Linear scan is inefficient

slide-46
SLIDE 46

Trajectory index

  • 3D R-tree
  • STR-tree (Spatio-temporal R-tree)
  • TB-tree (Trajectory Bundle)
  • Multi-version R-tree

– Partition temporal dimension

  • Grid-based index

– Partition spatial dimension


slide-47
SLIDE 47

3D R-tree

  • Indexing position samples only

– Cannot answer queries about movements in between those samples

  • Indexing line segments

[Figure: trajectory data in 3D space with axes x, y and time]

slide-48
SLIDE 48

Problem of R-tree

  • Large “dead space”

– The minimum bounding box (MBB) covers a large portion of the space with no data
– Low pruning power

  • Trajectory preservation

– Line segments are grouped merely by spatial proximity
– Regardless of which trajectory they belong to
– Retrieving a trajectory requires visiting different paths in the tree

slide-49
SLIDE 49

Augmented 3D R-tree

  • Augment leaf nodes with orientation information
  • Distance to line segments can be approximated more accurately

[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000

slide-50
SLIDE 50

STR-tree

  • Extension of augmented 3D R-tree
  • Insertion

– Try to keep the line segments belonging to the same trajectory together
– Find the leaf node containing the predecessor

  • Node split

– Put disconnected segments into a new node
– Put the most recent backward-connected segment into a new node

[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000

slide-51
SLIDE 51

TB-tree (Trajectory Bundle)

  • A leaf node only contains segments belonging to the same trajectory
  • Leaf nodes containing the same trajectory are linked
  • Strictly preserves trajectories

[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis T., Novel approaches to the indexing of moving object trajectories. VLDB, 2000

slide-52
SLIDE 52

TB-tree (Trajectory Bundle)


slide-53
SLIDE 53

Reflection

  • The previous indexes treat spatial and temporal dimensions equally

– 3D tree structure

  • In a trajectory database, the temporal dimension is more dynamic

– New segments are appended to existing trajectories
– Archived trajectories are rarely updated

  • Indexing spatial and temporal dimensions separately

– Partition temporal dimension first
– Partition spatial dimension first

slide-54
SLIDE 54

Multi-version R-tree

HR-tree

  • For each timestamp, an R-tree is created. So, there are many R-trees.
  • These R-trees are indexed.

Query for trajectories in a given region and in a given time interval:

  • 1. The R-tree at the timestamp is found first
  • 2. The trajectories in the specified region are retrieved from the R-tree

[Nascimento1998] Nascimento, M., Silva, J., Towards Historical R-trees. ACM SAC, 1998

[Tao2001a] Tao, Y., Papadias, D., Efficient Historical R-trees. SSDBM, p. 0223, IEEE Computer Society, 2001

[Xu2005] Xu, X., Han, J., Lu, W., RT-tree: An improved R-tree indexing structure for temporal spatial databases. Int. Symp. on Spatial Data Handling, 2005

[Tao2001b] Tao, Y., Papadias, D., MV3R-tree: A spatio-temporal access method for timestamp and interval queries. VLDB, pp. 431-440, 2001

slide-55
SLIDE 55

Grid-based index

  • Partition space into non-overlapping cells
  • Trajectory segments in each cell are indexed on the temporal dimension
  • Query processing:

– Spatial filtering
– Temporal filtering

[Prasad2003] V. Prasad Chakka, Adam C. Everspaugh, Jignesh M. Patel, Indexing Large Trajectory Data Sets With SETI, CIDR 2003
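A toy sketch of the two-step filtering, assuming a uniform grid and a plain per-cell list of time intervals instead of a real temporal index. This is not the SETI implementation: segments are registered in every cell their bounding box overlaps, and the query returns candidates without an exact refinement step. All names are hypothetical.

```python
from collections import defaultdict


class GridIndex:
    """Uniform spatial grid; each cell keeps (t_start, t_end, traj_id) entries."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cells_overlapping(self, xmin, ymin, xmax, ymax):
        c = self.cell_size
        for col in range(int(xmin // c), int(xmax // c) + 1):
            for row in range(int(ymin // c), int(ymax // c) + 1):
                yield (col, row)

    def insert_segment(self, traj_id, start, end):
        """Register the segment start -> end; both are (x, y, t) samples."""
        (x1, y1, t1), (x2, y2, t2) = start, end
        box = (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
        for cell in self._cells_overlapping(*box):
            self.cells[cell].append((t1, t2, traj_id))

    def query(self, region, interval):
        """Candidate trajectory ids for region (xmin, ymin, xmax, ymax) and (ts, te)."""
        ts, te = interval
        result = set()
        # Spatial filtering: visit only the cells that overlap the region.
        for cell in self._cells_overlapping(*region):
            # Temporal filtering: keep segments whose time span overlaps [ts, te].
            for t1, t2, traj_id in self.cells.get(cell, ()):
                if t1 <= te and t2 >= ts:
                    result.add(traj_id)
        return result
```

Because new segments only ever append to the per-cell lists, this layout reflects the observation that the temporal dimension is the one that keeps growing.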

slide-56
SLIDE 56

Grid-based index

[Prasad2003] V. Prasad Chakka, Adam C. Everspaugh, Jignesh M. Patel, Indexing Large Trajectory Data Sets With SETI, CIDR 2003

slide-58
SLIDE 58

Popular route mining

  • The shortest path may not be the most favourable one
  • Find the most popular/desirable path using the GPS trajectories of past travellers
  • Classification

– No specific source/destination: hot route discovery
– Specific source/destination: popular route search

slide-59
SLIDE 59

T-pattern

  • A set of individual trajectories that share the property of visiting the same sequence of places with similar travel times

  • 1. Spatial discretization: discretize space into a finite set of regions of interest (RoI)
  • 2. Translate trajectories into sequences of RoIs
  • 3. Adapt sequential pattern mining algorithms with a time constraint

F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory pattern mining. In SIGKDD, pages 330–339, 2007

slide-60
SLIDE 60

Periodic pattern

  • Find the repeating patterns in an individual’s trajectory

  • 1. Pre-defined and synchronized timestamps
  • 2. Adaptive region: density-based cluster
  • 3. Trajectories are translated to sequences of regions at pre-defined time instances

N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou, Y. Tao, and D. W. Cheung. Mining, indexing, and querying historical spatiotemporal data. In SIGKDD, pages 236–245, 2004.

slide-61
SLIDE 61

Interesting travel sequence

  • A sequence of interesting locations that is frequently travelled by experienced drivers
  • Interestingness of a location?

– How many experienced travellers visited it

  • Experience of a traveller?

– How many interesting locations she has visited

Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma. Mining interesting locations and travel sequences from GPS trajectories. In WWW, pages 791–800, 2009.

slide-62
SLIDE 62

Interesting travel sequence

  • Hyperlink-Induced Topic Search (HITS)
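The mutual reinforcement between location interest and traveller experience follows the hub/authority idea of HITS. The sketch below is a plain power iteration on a traveller-to-location visit relation; it is a simplified illustration with hypothetical names, not the model used in the cited paper.

```python
import math


def hits_interest(visits, iterations=50):
    """Return (interest, experience) scores from a visit relation.

    visits -- dict: traveller -> set of visited locations
    Locations play the role of authorities, travellers the role of hubs.
    """
    locations = {loc for locs in visits.values() for loc in locs}
    experience = {u: 1.0 for u in visits}
    interest = {loc: 1.0 for loc in locations}
    for _ in range(iterations):
        # A location is interesting if experienced travellers visited it.
        interest = {
            loc: sum(experience[u] for u, locs in visits.items() if loc in locs)
            for loc in locations
        }
        norm = math.sqrt(sum(v * v for v in interest.values())) or 1.0
        interest = {loc: v / norm for loc, v in interest.items()}
        # A traveller is experienced if she visited interesting locations.
        experience = {u: sum(interest[loc] for loc in locs)
                      for u, locs in visits.items()}
        norm = math.sqrt(sum(v * v for v in experience.values())) or 1.0
        experience = {u: v / norm for u, v in experience.items()}
    return interest, experience
```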


slide-63
SLIDE 63

Popular route search

  • Given a source/destination, find/estimate the most popular route in between
  • Ideally, we can just count the number of trajectories on different paths connecting the two locations

Z. Chen, H. T. Shen, and X. Zhou. Discovering popular routes from trajectories. In ICDE, pages 900–911, 2011

slide-64
SLIDE 64

Popular route discovery

  • In real scenarios, it’s not easy to find such well-divided groups
  • Even worse, there may be no trajectory connecting the two locations at all!

slide-65
SLIDE 65

Popular route discovery

  • Construct a transfer network from raw trajectories as an intermediate result to capture the moving behaviors between locations

– Node: cluster of turning points
– Edge: trajectories passing two nodes

slide-66
SLIDE 66

Popular route discovery

  • Transfer probability
  • Find the route with the highest joint transfer probability with respect to the destination
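A heavily simplified sketch of searching a transfer network for the route with the highest joint probability. Here the probability of an edge is estimated from trajectory counts on outgoing edges, which is an assumption made for illustration; the cited paper defines transfer probability with respect to the destination, and that definition is not reproduced here. Maximizing a product of probabilities is done by running Dijkstra's algorithm on the negative logarithms.

```python
import heapq
import math
from collections import defaultdict


def transition_probabilities(edge_counts):
    """edge_counts: dict (u, v) -> positive number of trajectories on u -> v."""
    out_total = defaultdict(int)
    for (u, _v), count in edge_counts.items():
        out_total[u] += count
    return {(u, v): count / out_total[u] for (u, v), count in edge_counts.items()}


def most_probable_route(edge_counts, source, dest):
    """Return (route, joint probability), or (None, 0.0) if dest is unreachable."""
    graph = defaultdict(list)
    for (u, v), p in transition_probabilities(edge_counts).items():
        graph[u].append((v, -math.log(p)))  # non-negative edge weight
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best.get(u, math.inf):
            continue  # stale heap entry
        if u == dest:
            break
        for v, w in graph[u]:
            new_cost = cost + w
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if dest not in best:
        return None, 0.0
    route = [dest]
    while route[-1] != source:
        route.append(prev[route[-1]])
    route.reverse()
    return route, math.exp(-best[dest])
```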

slide-67
SLIDE 67

Find popular route from uncertain trajectories

  • Low-sampling-rate trajectories have a high degree of uncertainty
  • Uncertain trajectories are prevalent!
  • Is it possible to recover the original route given a low-sampling-rate trajectory?
  • What if…

– There is a historical set of uncertain trajectories

Kai Zheng, Yu Zheng, Xing Xie and Xiaofang Zhou. Reducing Uncertainty of Low-Sampling-Rate Trajectories. ICDE 2012

slide-68
SLIDE 68

Find popular route from uncertain trajectories

  • Can we find popular routes for a specific source/destination (s/d) using low-sampling-rate trajectories?

Infrequent samples on the same path can reinforce each other, and they collectively form a more ‘dense’ trajectory

slide-69
SLIDE 69

Find popular route from uncertain trajectories

  • Find the popular route (PR) for a given sequence of locations
  • Two-phase approach

– Local PR construction
– Global PR search

slide-70
SLIDE 70

Find popular route from uncertain trajectories without road networks

  • A road network is not always available or applicable

  • 1. Discretize space into disjoint cells
  • 2. Derive the transfer graph using cells
  • 3. Infer frequent ‘virtual edges’ on the graph

L.-Y. Wei, Y. Zheng, and W.-C. Peng. Constructing popular routes from uncertain trajectories. In ACM SIGKDD, pages 195–203, 2012.

slide-71
SLIDE 71

Time period-based most frequent path (TPMFP)

  • Find the most frequent path (MFP) for a specific s/d and time period
  • Desired properties for an MFP

– Suffix optimal: a suffix of an MFP is also an MFP
– Length insensitive: an MFP shouldn’t favor long or short routes
– Bottleneck free: an MFP shouldn’t contain infrequent edges

Wuman Luo, Haoyu Tan, Lei Chen, Lionel M. Ni. Finding Time Period-Based Most Frequent Path in Big Trajectory Data. SIGMOD 2013

slide-72
SLIDE 72

Time period-based most frequent path (TPMFP)

  • Footmark graph

– A weighted sub-graph
– Edge frequency: number of trajectories reaching vd during T
– Path frequency: non-decreasingly sorted sequence of edge frequencies

Example:

– Edge frequencies: v1 → v2: 14, v2 → v3: 10, v3 → v12: 10, v2 → v12: 8
– v1 → v2 → v12: (8, 14)
– v1 → v2 → v3 → v12: (10, 10, 14)
– v1 → v10 → v11 → v12: (1, 21, 21)

slide-73
SLIDE 73

Time period-based most frequent path (TPMFP)

  • Compare path frequency

– More-frequent-than relation (>)
– F > F’ if, at the first position j where they differ, fj > fj’
– (10,10,14) > (8,14) > (1,21,21)

  • Properties

– A total order, so an MFP always exists
– Guarantees suffix optimality
– The length of a path does not matter much
– Paths containing infrequent edges are disadvantaged
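The comparison can be sketched with sorted tuples, since Python compares tuples position by position. Two caveats: how the cited paper treats one sequence being a prefix of another is not stated on the slides (Python ranks the longer tuple higher), and the individual edge frequencies on the v1 → v10 → v11 → v12 path used in the test are made up so that they sort to (1, 21, 21). Function names are hypothetical.

```python
def path_frequency(path, edge_freq):
    """Non-decreasingly sorted tuple of the edge frequencies along a path.

    path      -- list of node names
    edge_freq -- dict (u, v) -> edge frequency
    """
    return tuple(sorted(edge_freq[(u, v)] for u, v in zip(path, path[1:])))


def most_frequent_path(paths, edge_freq):
    """Pick the path whose frequency is largest under the > relation."""
    return max(paths, key=lambda p: path_frequency(p, edge_freq))
```

Sorting puts the least frequent edge first, so a path with a bottleneck edge loses the comparison early regardless of how frequent its other edges are.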

slide-74
SLIDE 74

PR search is more challenging

  • Specific source/destination (and time constraint)

– Data are sparse
– How to utilize more relevant data?

  • Online performance

– Efficiency is critical

slide-75
SLIDE 75

Summary

  • Background

– What is a trajectory
– Where do they come from
– Why are they useful
– Characteristics

  • Trajectory similarity search

– Query classification
– Trajectory similarity measures
– Trajectory index

  • Similarity-based trajectory mining

– Popular route mining
– Co-traveller discovery
– Trajectory clustering

slide-76
SLIDE 76

Thank you

  • Questions

– kevinz@itee.uq.edu.au

  • DKE group at UQ

– http://www.itee.uq.edu.au/dke/dke-lab

  • My research

– http://staff.itee.uq.edu.au/kevinz/
