

SLIDE 1

Distance Measure for Querying Arrangements of Temporal Intervals, by Panagiotis Papapetrou.

Distance Measure for Querying Arrangements of Temporal Intervals

Orestis Kostakis, Panagiotis Papapetrou, and Jaakko Hollmén

Department of Information and Computer Science, Aalto University.

May 27, 2011

SLIDE 2

Motivation

Sign Language similarity search

SLIDE 3

Motivation

An expression in sign language contains a set of event-channels that are on or off over time.
Each event is characterized by:
- a label, e.g., eyebrow raise;
- a duration, defined by a start and an end point.

Figure: An example of a Wh-question expressed in sign language.

SLIDE 4

Motivation

Problem: How to assess the similarity of such representations?

Figure: Two examples, (a) and (b).

SLIDE 5

Outline

- Background
- Method
- Experiments
- Conclusions
- Discussion

SLIDE 6

Background: Definitions

Sequences of interval-based events allow the representation of a wide range of real-world sequences.
Formally, an e-sequence is defined as an ordered set $S = \{S_1, \ldots, S_n\}$, where each $S_i = (E_i, t^i_{start}, t^i_{end})$ is called an event-interval and $E_i \in \sigma$ is its event label.

Figure: S = {(A, 1, 10), (B, 5, 13), (C, 17, 30), (A, 20, 26), (D, 24, 30)}, drawn as labeled intervals along a time axis.
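The definition above can be made concrete with a small data structure. This is only a sketch: the names `EventInterval` and `is_ordered_by_start` are hypothetical, and ordering the e-sequence by start time is an assumption, since the slide only says the set is ordered.

```python
from typing import List, NamedTuple


class EventInterval(NamedTuple):
    """One event-interval S_i = (E_i, t_start, t_end)."""
    label: str
    start: int
    end: int


# The e-sequence from the figure caption.
S = [
    EventInterval("A", 1, 10),
    EventInterval("B", 5, 13),
    EventInterval("C", 17, 30),
    EventInterval("A", 20, 26),
    EventInterval("D", 24, 30),
]


def is_ordered_by_start(seq: List[EventInterval]) -> bool:
    """Check that every interval is well formed (start <= end) and that
    the sequence is sorted by start time (an assumed ordering convention)."""
    well_formed = all(iv.start <= iv.end for iv in seq)
    starts = [iv.start for iv in seq]
    return well_formed and starts == sorted(starts)


# The alphabet sigma is the set of labels that occur in the data.
sigma = {iv.label for iv in S}
```

A named tuple keeps each event-interval immutable and unpackable as `(label, start, end)`, which is convenient when comparing pairs of intervals later.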

SLIDE 7

Background: Related Work

Existing work (e.g., Papapetrou et al. 2009, Moerchen 2010, Hoeppner 2001) focuses mainly on:
- mining frequent patterns of interval-based events;
- mining association rules involving interval-based events;
- mining semi-interval partial order events.
So far, no robust distance or similarity metric of any type has been formulated.

SLIDE 8

Problem: Example

Problem: how to assess the similarity of two e-sequences?

Figure: How similar are these two e-sequences, (a) and (b)?

SLIDE 9

Problem: Formulation

Problem Formulation
Given two e-sequences S and T, define a distance measure D such that, for all S and T:
$$D(S, T) \geq 0 \tag{1}$$
$$D(S, S) = 0 \tag{2}$$
$$D(S, T) = D(T, S) \tag{3}$$
The degree to which the two e-sequences differ should be reflected in the value of D(S, T) and should be in accordance with the knowledge obtained from domain experts.

SLIDE 10

Problem: Solutions

Problem: how to assess the similarity of two e-sequences?


Some options: map them to traditional sequences of instantaneous events?

SLIDE 11

Problem: Solutions

Sequences of instantaneous events do not capture all the important information:

[Figure: two different arrangements, each consisting of two event-intervals labeled A.]

Transforming the above sequences into sequences of instantaneous events would yield the same result: $A_{start}, A_{start}, A_{end}, A_{end}$.
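A short sketch of why the mapping loses information. The two concrete arrangements used here (one overlapping, one nested) and the helper `to_endpoint_sequence` are illustrative assumptions, not taken from the slides.

```python
def to_endpoint_sequence(intervals):
    """Flatten (label, start, end) intervals into a time-ordered list of
    instantaneous start/end tags."""
    points = []
    for label, start, end in intervals:
        points.append((start, 0, label + "_start"))
        points.append((end, 1, label + "_end"))
    points.sort()  # order by time; at equal times, starts come before ends
    return [tag for _, _, tag in points]


# Second A begins inside the first and outlasts it.
overlapping = [("A", 1, 5), ("A", 3, 8)]
# Second A lies entirely inside the first.
nested = [("A", 1, 8), ("A", 3, 5)]

print(to_endpoint_sequence(overlapping))
print(to_endpoint_sequence(nested))
```

Both calls print the same four tags, so the instantaneous-event view cannot tell which start belongs to which end.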

SLIDE 12

Problem: Solutions

Problem: how to assess the similarity of two e-sequences?

Some options:
- map them to traditional sequences of instantaneous events? ×
- compare event-labels? √
- compare event-interval relations? √

SLIDE 13

Problem: Solutions

Problem: how to assess the similarity of two e-sequences?

What about event durations? For simplicity, we ignore them.
Arrangement: an e-sequence where the start and end “tags” are dropped [Papapetrou et al. 2009].

SLIDE 14

Method: Key Idea

Our approach: Focus on the relations between pairs of intervals.

Figure: Allen’s temporal model [Allen et al. 1983]. Relations shown between two intervals A and B: Follow(A,B), Meet(A,B), Overlap(A,B), Match(A,B), Right Contain(A,B), Left Contain(A,B), Contain(A,B).
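The seven relation names can be turned into a small classifier. This is a sketch under stated assumptions: the exact boundary conditions are inferred from the relation names, endpoints are compared with exact equality (no tolerance), and interval `a` is assumed to come first (earlier start, or equal start and an end that is not earlier). The function name `relation` is hypothetical.

```python
def relation(a, b):
    """Name the relation between intervals a and b, each a (start, end) pair.

    Assumes a precedes b: a starts earlier, or both start together and
    a does not end before b.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start == b_start and a_end == b_end:
        return "match"           # identical endpoints
    if a_start == b_start:
        return "left-contain"    # shared start, b ends first
    if a_end == b_end:
        return "right-contain"   # b starts later, shared end
    if a_end < b_start:
        return "follow"          # gap between a and b
    if a_end == b_start:
        return "meet"            # b starts exactly where a ends
    if b_end < a_end:
        return "contain"         # b strictly inside a
    return "overlap"             # b starts inside a and ends after it


print(relation((1, 10), (5, 13)))   # the (A, 1, 10) / (B, 5, 13) pair
```

The order of the tests matters: equal-endpoint cases are handled first so that the remaining branches can assume strict inequalities.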

SLIDE 15

Method: Relation Matrix

The solution:
Given an event-interval sequence S, create its relation matrix $M_A$.

Columns (label pairs): {A,A}, {A,B}, {B,A}, {B,B}
Rows (relations) and their nonzero entries:
meet: 1
match: 1
overlap: 1, 2, 1
contain:
left-contain:
right-contain:
follow:
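One possible way to build such a matrix is sketched here. Several details are assumptions rather than the authors' stated procedure: every pair of event-intervals is counted once, the earlier interval supplies the first label of the column pair, rows follow the order of the table above, and the helper names `relation` and `relation_matrix` are hypothetical.

```python
from itertools import product

RELATIONS = ["meet", "match", "overlap", "contain",
             "left-contain", "right-contain", "follow"]


def relation(a, b):
    """Relation between (start, end) pairs a and b, with a assumed first."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start == b_start:
        return "match" if a_end == b_end else "left-contain"
    if a_end == b_end:
        return "right-contain"
    if a_end < b_start:
        return "follow"
    if a_end == b_start:
        return "meet"
    return "contain" if b_end < a_end else "overlap"


def relation_matrix(esequence, alphabet):
    """Count, for each relation (row) and ordered label pair (column),
    how many pairs of event-intervals stand in that relation."""
    pairs = list(product(sorted(alphabet), repeat=2))
    col = {pair: j for j, pair in enumerate(pairs)}
    row = {name: i for i, name in enumerate(RELATIONS)}
    matrix = [[0] * len(pairs) for _ in RELATIONS]
    # Earlier start first; on ties the longer interval first.
    ordered = sorted(esequence, key=lambda iv: (iv[1], -iv[2]))
    for x in range(len(ordered)):
        for y in range(x + 1, len(ordered)):
            (la, sa, ea), (lb, sb, eb) = ordered[x], ordered[y]
            name = relation((sa, ea), (sb, eb))
            matrix[row[name]][col[(la, lb)]] += 1
    return matrix, pairs


S = [("A", 1, 10), ("B", 5, 13), ("C", 17, 30), ("A", 20, 26), ("D", 24, 30)]
M, pairs = relation_matrix(S, {"A", "B", "C", "D"})
```

The result has 7 rows and |σ|² columns, which matches the index ranges used by the distance formulas.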

SLIDE 16

Method: Distance

Arrangement Distance
$$\delta_p(A, B) = \left( \sum_{i=1}^{|I|} \sum_{j=1}^{|\sigma|^2} |M_A(i, j) - M_B(i, j)|^p \right)^{1/p}, \quad p \in \mathbb{N}^* \tag{4}$$
Question: What would be a suitable value for p?

Manhattan Distance
For p = 1, Eq. 4 corresponds to the Manhattan distance:
$$\delta_1(A, B) = \sum_{i=1}^{|I|} \sum_{j=1}^{|\sigma|^2} |M_A(i, j) - M_B(i, j)| \tag{5}$$

SLIDE 17

Method: Distance

Arrangement Distance
$$\delta_p(A, B) = \left( \sum_{i=1}^{|I|} \sum_{j=1}^{|\sigma|^2} |M_A(i, j) - M_B(i, j)|^p \right)^{1/p}, \quad p \in \mathbb{N}^* \tag{6}$$
Question: What would be a suitable value for p?

Frobenius Norm
For p = 2, Eq. 6 corresponds to the Frobenius norm of $M_A - M_B$:
$$\delta_2(A, B) = \sqrt{\sum_{i=1}^{|I|} \sum_{j=1}^{|\sigma|^2} |M_A(i, j) - M_B(i, j)|^2} \tag{7}$$
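The arrangement distance for a given p can be computed directly from two relation matrices of equal shape. A minimal sketch, assuming the matrices are plain nested lists; the function name `arrangement_distance` and the toy matrices are hypothetical.

```python
def arrangement_distance(m_a, m_b, p=1):
    """Entry-wise p-norm of the difference of two relation matrices.

    p = 1 gives the Manhattan distance, p = 2 the Frobenius norm.
    """
    if p < 1:
        raise ValueError("p must be a positive integer")
    total = sum(
        abs(x - y) ** p
        for row_a, row_b in zip(m_a, m_b)
        for x, y in zip(row_a, row_b)
    )
    return total ** (1.0 / p)


# Toy 2x2 matrices, for illustration only.
m_a = [[1, 0], [2, 1]]
m_b = [[0, 0], [0, 1]]

manhattan = arrangement_distance(m_a, m_b, p=1)   # |1| + |2| = 3
frobenius = arrangement_distance(m_a, m_b, p=2)   # sqrt(1 + 4)
```

Both variants are zero for identical matrices and symmetric in their arguments, in line with the requirements of the problem formulation.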

SLIDE 18

Method: Distance

Normalized arrangement distance
$$\delta_{norm}(A, B) = \sum_{i=1}^{|I|} \sum_{j=1}^{|\sigma|^2} \frac{|M_A(i, j) - M_B(i, j)|}{M_A(i, j) + M_B(i, j)} \tag{8}$$
- based on the L1 norm;
- normalized over the total possible number of relations where A and B can differ;
- non-metric: $\delta_{norm}(A, B) = 0$ iff $A = B$ (identity of indiscernibles) is violated.
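A sketch of the normalized distance, reading the formula as a sum of per-cell ratios. Two points are assumptions: that reading itself, and the handling of cells that are zero in both matrices, which are skipped here because the ratio is undefined for them. The function name `normalized_distance` is hypothetical.

```python
def normalized_distance(m_a, m_b):
    """Sum over all cells of |a - b| / (a + b), for non-negative counts."""
    total = 0.0
    for row_a, row_b in zip(m_a, m_b):
        for x, y in zip(row_a, row_b):
            if x + y > 0:  # assumption: cells empty in both matrices add nothing
                total += abs(x - y) / (x + y)
    return total


# Toy 2x2 matrices, for illustration only.
m_a = [[1, 0], [2, 1]]
m_b = [[0, 0], [0, 1]]

d = normalized_distance(m_a, m_b)   # 1/1 + 2/2 + 0/2
```

Each cell contributes at most 1, so a relation that occurs in only one of the two arrangements weighs the same regardless of how often it occurs.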

SLIDE 19

Experiments: ASL Dataset

SignStream Database: by the National Center for Sign Language and Gesture Resources at Boston University.
- # of e-sequences: 873
- # of intervals: 15675
- Min size: 4
- Max size: 41
- Average size: 18
- Labels: 216
- Classes: 5

SLIDE 20

Experiments: Setup

We tested:
- robustness against artificial noise;
- classification accuracy.
Artificial noise:
- shift probability s: each event-interval in S is shifted with probability s;
- distortion level d: the start point of each event-interval is shifted by ±d% of |S|.
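The noise model can be sketched as follows. The slide leaves several details open, so these are assumptions: |S| is read as the time span of the e-sequence, the whole interval moves so that its duration is preserved, the sign of the shift is chosen uniformly at random, and d is given as a fraction (0.1 for 10%). The function name `add_noise` is hypothetical.

```python
import random


def add_noise(esequence, s, d, rng):
    """Shift each (label, start, end) interval with probability s by
    +/- d times the time span of the whole e-sequence."""
    first_start = min(start for _, start, _ in esequence)
    last_end = max(end for _, _, end in esequence)
    offset = d * (last_end - first_start)
    noisy = []
    for label, start, end in esequence:
        if rng.random() < s:
            shift = rng.choice((-offset, offset))
            start, end = start + shift, end + shift
        noisy.append((label, start, end))
    return noisy


S = [("A", 1, 10), ("B", 5, 13), ("C", 17, 30), ("A", 20, 26), ("D", 24, 30)]
rng = random.Random(0)  # fixed seed so runs are repeatable
untouched = add_noise(S, s=0.0, d=0.5, rng=rng)
distorted = add_noise(S, s=1.0, d=0.1, rng=rng)
```

Passing the random generator in explicitly keeps the experiment reproducible across the different shift probabilities and distortion levels.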

SLIDE 21

Experiments: Robustness

Robustness to noise: we compared the Normalized, the Manhattan, and the Frobenius distance in terms of:
A. nearest neighbor retrieval accuracy: the fraction of noisy queries for which the originating sequence is retrieved.
B. rank of nearest neighbor: for each query, the number of database sequences with distance less than or equal to that of the originating counterpart.
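Both measures can be computed in one pass over the queries. A sketch, assuming each query carries the index of its originating sequence and that a tie for the minimum distance still counts as a successful retrieval; the function name `retrieval_metrics` and the numeric toy data are hypothetical.

```python
def retrieval_metrics(queries, database, distance):
    """Return (retrieval accuracy, list of ranks).

    queries: list of (noisy_item, index_of_original_in_database).
    distance: any function taking two items and returning a number.
    """
    hits = 0
    ranks = []
    for noisy, origin in queries:
        dists = [distance(noisy, item) for item in database]
        target = dists[origin]
        # Rank: how many database items are at least as close as the original
        # (the original itself is included in the count).
        ranks.append(sum(1 for dist in dists if dist <= target))
        if target == min(dists):
            hits += 1
    return hits / len(queries), ranks


# Toy example with numbers standing in for e-sequences.
database = [0, 10, 20]
queries = [(1, 0), (14, 2)]
accuracy, ranks = retrieval_metrics(queries, database, lambda a, b: abs(a - b))
```

A rank of 1 means the original was the unique closest match; dividing the ranks by the database size gives the ratio used on the x-axis of a cumulative rank histogram.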

SLIDE 22

Experiments: Robustness

Figure: Retrieval accuracy: success ratio of matching the noisy sequences to their original counterpart. Panels: (a) Manhattan, (b) Normalized. Axes: distortion vs. retrieval accuracy, with one curve per shift probability (0.2, 0.4, 0.6, 0.8, 1.0).

SLIDE 23

Experiments: Robustness

Figure: Comparison of the cumulative histograms for the rank of the 1-NN for each distance measure (Normalized, Frobenius, Manhattan). Ranks are denoted as a ratio of the database size. Panels: (a) Probability 0.6, distortion 50%; (b) Probability 1.0, distortion 50%. Axes: rank of NN (ratio) vs. database ratio.

SLIDE 24

Experiments: 1-NN Classification Accuracy

1-NN classification accuracy: the fraction of e-sequences for which their class is the same as that of their 1-NN.
Data:
- # of classes: 5
- # of e-sequences: 873
1-NN Classification Accuracy ≈ 88%.
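The accuracy defined above can be sketched as a leave-one-out loop. The leave-one-out protocol is an assumption, since the slide does not state how the nearest neighbor was selected beyond the definition; the function name `one_nn_accuracy` and the numeric toy data are hypothetical.

```python
def one_nn_accuracy(items, labels, distance):
    """Fraction of items whose nearest other item has the same class."""
    correct = 0
    for i, item in enumerate(items):
        others = [j for j in range(len(items)) if j != i]
        nearest = min(others, key=lambda j: distance(item, items[j]))
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(items)


# Toy example with numbers standing in for e-sequences.
items = [0, 1, 10, 11, 6]
labels = ["x", "x", "y", "y", "x"]
acc = one_nn_accuracy(items, labels, lambda a, b: abs(a - b))
```

In the toy data the last item is closest to an item of the other class, so four of five items are classified correctly.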

SLIDE 25

Conclusions

- Reduced a problem related to assistive environments to the problem of comparing temporal interval sequences.
- Proposed a distance measure for temporal interval sequences, which allows similarity among sequences to be quantified.
- The distance measure relies on creating relation matrices and comparing them.
- Experimented with three methods to compare sequences.
- One of the methods proved very robust against artificial noise.

SLIDE 26

Directions for Future work

- Formulate more robust distance metrics (ongoing work).
- Examine the applicability of temporal interval sequences in other domains.
- Evaluate the proposed distance measure in typical machine learning tasks (e.g., clustering).
- Study the subsequence matching problem for e-sequences.
- Build an auto-complete recommendation system (like Google) for e-sequences.