Uncertain Time-Series Similarity: Return to the Basics Dallachiesa - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Uncertain Time-Series Similarity: Return to the Basics

Dallachiesa et al., VLDB 2012 Li Xiong, CS730

slide-2
SLIDE 2

Problem

  • Problem: uncertain time-series similarity
  • Applications:

– location tracking of moving objects
– traffic monitoring
– remote sensing

  • Uncertain time series are pervasive

– Imprecision of sensor observations
– Privacy-preserving transformations

  • Similarity matching is the basis for many analysis and mining tasks

– Clustering
– Shapelet
– Motif
– …

slide-3
SLIDE 3

Overview

  • Review of 3 state-of-the-art techniques for similarity matching in uncertain time series

– MUNICH, PROUD, DUST

  • Experimental comparison of the techniques for similarity matching on 17 real (perturbed) datasets
  • Two additional (simple) similarity measures that unexpectedly outperform the state of the art
  • Discussion of research directions
slide-4
SLIDE 4

Modeling/Representing uncertain time-series

  • Repeated measurements (samples)
  • Probability density function (pdf) over the uncertain values

slide-5
SLIDE 5

Modeling/Representing uncertain time-series

  • Repeated measurements (samples)
slide-6
SLIDE 6

Modeling/Representing uncertain time-series

  • Probability density function (pdf) over the uncertain values

slide-7
SLIDE 7

Similarity metrics

  • Euclidean Distance (ED)
  • Dynamic Time Warping (DTW)
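
As a point of reference for the two metrics on this slide, here is a minimal Python sketch for certain (exact-valued) series. The function names `euclidean` and `dtw` are illustrative, and the DTW variant shown (squared point cost, no warping-window constraint, square root of the accumulated cost) is one common choice assumed here, not necessarily the one used in the paper.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x, y):
    """Dynamic Time Warping distance (assumed variant: squared point cost,
    unconstrained warping, square root of the accumulated cost)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # x advances
                                 cost[i][j - 1],      # y advances
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])
```

Unlike ED, DTW accepts sequences of different lengths and tolerates local shifts along the time axis, so `dtw([1, 2, 3], [1, 2, 2, 3])` is zero.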
slide-8
SLIDE 8

Similarity based range query

  • Range query: given a collection of time series C and a query sequence Q, find the series S in C that are similar to Q

  • Probabilistic range query
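
The two query types can be sketched as follows. This is an illustration under assumptions: "similar" is taken to mean distance at most a threshold `eps`, and the probabilistic variant keeps a series when the probability of being within `eps` is at least `tau`. The names `range_query`, `probabilistic_range_query` and the `prob_within` callback are hypothetical; each technique in the talk would supply its own way of computing that probability.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def range_query(collection, query, eps, dist=euclidean):
    """Return every series S in the collection with dist(Q, S) <= eps."""
    return [s for s in collection if dist(query, s) <= eps]


def probabilistic_range_query(collection, query, eps, tau, prob_within):
    """Return every series S with Pr(dist(Q, S) <= eps) >= tau.

    prob_within(query, s, eps) is a caller-supplied (hypothetical) function
    that returns the probability for one candidate series.
    """
    return [s for s in collection if prob_within(query, s, eps) >= tau]
```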
slide-9
SLIDE 9

State-of-the-Art

  • MUNICH

– Repeated observation model

  • PROUD

– Random variable model

  • DUST

– Random variable model

slide-10
SLIDE 10

MUNICH

  • Repeated observation model
  • Euclidean distance (Lp-norm) and Dynamic Time Warping (DTW)


slide-11
SLIDE 11

MUNICH

  • Materialize uncertain sequences X and Y to all possible certain sequences
  • Define the set of distances between all possible sequences

  • Uncertain distance
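
The materialization idea can be sketched in a few lines of Python. The assumptions here: an uncertain series is a list with one list of samples per timestamp, every materialization is equally likely, and the probability that the distance is within `eps` is the fraction of materialized pairs that satisfy it. The function name `munich_naive_probability` is illustrative.

```python
import itertools
import math


def munich_naive_probability(x_samples, y_samples, eps):
    """Fraction of materialized (X, Y) pairs with Euclidean distance <= eps.

    x_samples[i] and y_samples[i] hold the repeated observations at
    timestamp i. Every combination of one sample per timestamp is one
    possible certain sequence, so the cost grows exponentially with length.
    """
    if len(x_samples) != len(y_samples):
        raise ValueError("series must have the same number of timestamps")
    hits = total = 0
    for xs in itertools.product(*x_samples):
        for ys in itertools.product(*y_samples):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)))
            total += 1
            if d <= eps:
                hits += 1
    return hits / total
```

With s samples per timestamp and n timestamps, the loops visit s^(2n) pairs, which is why the later slides turn to bounds instead of full enumeration.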
slide-12
SLIDE 12

MUNICH

  • Naïve computation: exponential computation cost (note the typo)

12 CAO Chen, DB Group, CSE, HKUST 21/2/2011

slide-13
SLIDE 13

MUNICH

  • Lower bounding and upper bounding the distance/probability
  • Approximate the samples using minimum bounding intervals

slide-14
SLIDE 14

MUNICH

  • Minimum bounding interval
slide-15
SLIDE 15

MUNICH

  • Compute upper bound and lower bound of distances between all possible interval sequences
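
A minimal sketch of the interval bounds for the Euclidean case, assuming each timestamp's samples are summarized by their (min, max) interval. The names `bounding_intervals` and `interval_distance_bounds` are illustrative; the paper's own bounds may be tighter or organized differently.

```python
import math


def bounding_intervals(samples):
    """Minimum bounding interval (min, max) of the samples at each timestamp."""
    return [(min(s), max(s)) for s in samples]


def interval_distance_bounds(x_samples, y_samples):
    """Lower and upper bounds on the Euclidean distance between any
    materialization of X and any materialization of Y."""
    if len(x_samples) != len(y_samples):
        raise ValueError("series must have the same number of timestamps")
    lower_sq = upper_sq = 0.0
    pairs = zip(bounding_intervals(x_samples), bounding_intervals(y_samples))
    for (xl, xh), (yl, yh) in pairs:
        # Smallest possible gap: zero if the intervals overlap.
        gap = max(0.0, xl - yh, yl - xh)
        # Largest possible gap: between opposite interval ends.
        span = max(xh - yl, yh - xl)
        lower_sq += gap ** 2
        upper_sq += span ** 2
    return math.sqrt(lower_sq), math.sqrt(upper_sq)
```

Both bounds cost time linear in the series length, in contrast to enumerating every materialization.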

slide-16
SLIDE 16

MUNICH

  • Recall uncertain distance and probabilistic range query

  • Compute lower bound and upper bound for Pr
slide-17
SLIDE 17

MUNICH

  • Pruning based on lower and upper bound
  • Stepwise refinement

True Hit True Drop
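
The pruning step can be sketched as a three-way decision on the probability bounds. The names `classify_candidate`, `prob_lower`, `prob_upper` and `tau` are illustrative, and the rule is an assumption about how bounds and threshold interact: a candidate whose lower bound already reaches the threshold is accepted, one whose upper bound falls short is discarded, and the rest go to refinement.

```python
def classify_candidate(prob_lower, prob_upper, tau):
    """Decide a candidate from bounds on Pr(distance <= eps).

    Returns "true hit" when even the lower bound reaches tau,
    "true drop" when even the upper bound stays below tau,
    and "refine" when the bounds are still inconclusive.
    """
    if prob_lower > prob_upper:
        raise ValueError("lower bound exceeds upper bound")
    if prob_lower >= tau:
        return "true hit"
    if prob_upper < tau:
        return "true drop"
    return "refine"
```

Stepwise refinement then tightens the bounds only for the "refine" candidates until each one falls into one of the other two classes.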

slide-18
SLIDE 18

PROUD

  • Pdf model and Euclidean distance
  • Probabilistic distance model
slide-19
SLIDE 19

PROUD

  • Probabilistic distance model
  • The distance approaches a normal distribution when the number of time points is sufficiently large (central limit theorem)

slide-20
SLIDE 20

PROUD

  • Recall probabilistic range query
  • CDF of normal distribution expressed as error function and compute

  • Compute normalized epsilon and test
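
A sketch of the normal-approximation test, under assumptions that are not stated on the slides: each point is an independent random variable given by a mean and a standard deviation, the errors are normal (which fixes the variance of each squared difference), and the test is applied to the squared distance. The function names are illustrative.

```python
import math


def proud_moments(x_mean, x_std, y_mean, y_std):
    """Mean and variance of the squared Euclidean distance, assuming
    independent, normally distributed points (assumption of this sketch)."""
    mean = var = 0.0
    for mx, sx, my, sy in zip(x_mean, x_std, y_mean, y_std):
        mu = mx - my              # mean of the difference at this timestamp
        s2 = sx ** 2 + sy ** 2    # variance of the difference
        mean += mu ** 2 + s2
        var += 2 * s2 ** 2 + 4 * mu ** 2 * s2
    return mean, var


def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proud_matches(x_mean, x_std, y_mean, y_std, eps, tau):
    """Test Pr(distance <= eps) >= tau using the normal approximation."""
    mean, var = proud_moments(x_mean, x_std, y_mean, y_std)
    if var == 0.0:
        # No uncertainty at all: the distance is exact.
        return mean <= eps ** 2
    eps_norm = (eps ** 2 - mean) / math.sqrt(var)  # normalized epsilon
    return normal_cdf(eps_norm) >= tau
```

The approximation relies on the central limit theorem, so it is only reasonable for series with many time points.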
slide-21
SLIDE 21

DUST

  • Probability model
  • DUST similarity metric
  • Bayesian probability computation
slide-22
SLIDE 22

DUST: A Generalized Notion of Similarity between Uncertain Time Series

Smruti R. Sarangi and Karin Murthy IBM Research Labs, Bangalore, India

slide-23
SLIDE 23

Resolving the Question

  • T2 should be closer to T1 than T3

– This is because it is possible that T2 and T1 are the same time series. T2 just has some additional error.
– T3 and T1 can never be the same time series because the last value has a very large divergence

[Figure: time series T1, T2 and T3, value plotted against time. Which is closer to T1, T2 or T3? Euclidean distance (EUCL) and Dynamic Time Warping (DTW): T3. DUST: T2.]

slide-24
SLIDE 24

Extending Prior Work

Prior work: two time series are considered similar if

P(DIST(T1,T2) ≤ ε) ≥ τ
DIST(T1,T2) = sqrt(Σi dist(T1[i], T2[i])^2)
dist(x,y) = |x-y|

Assumption:

P(DIST(T1,T2) ≤ ε) = p(DIST(T1,T2) = 0) · ε (irrespective of the size of ε)

slide-25
SLIDE 25

Some Algebra

P(DIST(T1,T2) ≤ ε) > P(DIST(T1,T3) ≤ ε)
p(DIST(T1,T2) = 0) > p(DIST(T1,T3) = 0)
Πi p(dist(T1[i], T2[i]) = 0) > Πi p(dist(T1[i], T3[i]) = 0)
Σi –log(p(dist(T1[i], T2[i]) = 0)) < Σi –log(p(dist(T1[i], T3[i]) = 0))

φ(x) = p(dist(0,x) = 0)

dist(x,y) is only dependent on |x-y| (proved in the paper), so each term of the sum is –log(φ(|T1[i] – T2[i]|))

Definition

dust(x,y) = –log(φ(|x-y|)) + log(φ(0))
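
To make the definition concrete, here is a small Python sketch. It takes dust(x,y) exactly as written on this slide and plugs in one particular φ, which is an assumption of the sketch: independent zero-mean normal errors with known standard deviations and a flat (uniform) distribution of the true values, for which φ is a normal density in |x-y|. The names `dust_point`, `dust_sum` and `normal_phi` are illustrative.

```python
import math


def dust_point(x, y, phi):
    """dust(x, y) = -log(phi(|x - y|)) + log(phi(0)), as on the slide."""
    return -math.log(phi(abs(x - y))) + math.log(phi(0.0))


def normal_phi(sigma_x, sigma_y):
    """phi under this sketch's assumptions: normal errors, uniform values.

    phi(d) is then a normal density with variance sigma_x^2 + sigma_y^2.
    For very large d the density underflows to 0 and the log would fail.
    """
    var = sigma_x ** 2 + sigma_y ** 2

    def phi(d):
        return math.exp(-d ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

    return phi


def dust_sum(t1, t2, phi):
    """Sum of the per-point terms, the quantity compared in the algebra."""
    return sum(dust_point(a, b, phi) for a, b in zip(t1, t2))
```

Under these assumptions the per-point value reduces to (x-y)^2 / (2(σx^2 + σy^2)), so with identical normal errors at every point the ranking agrees with Euclidean distance. Differences appear when the error distribution is not normal or varies between points.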

slide-26
SLIDE 26

DUST

  • Compute
  • Bayes Theorem
  • Require

– Data distribution (uniform)
– Error distribution

slide-27
SLIDE 27

Comparison

  • Common assumption: the value at each timestamp is independent

– Correlations neglected

slide-28
SLIDE 28

Comparison

                     | MUNICH                      | PROUD                       | DUST
Uncertainty modeling | Multiple observations       | Random variable             | Random variable
A priori knowledge   |                             | Mean and standard deviation | Data distribution and error distribution
Distance metric      | Euclidean, DTW              | Euclidean                   | DUST, Euclidean, DTW
Similarity queries   | Probabilistic range queries | Probabilistic range queries | kNN queries

slide-29
SLIDE 29

Experimental Study

  • Data

– 17 real datasets from UCR: time series with exact values as ground truth
– (not real) Perturbation with uniform, normal and exponential error distributions

  • Similarity matching: probabilistic range queries

  • Metric: F1 metric
  • Baseline: Euclidean distance
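
For reference, the F1 metric combines precision and recall of a query result. The sketch below assumes the ground truth is the set of series returned by the same query on the exact (unperturbed) data, and that results are identified by hashable ids. The name `f1_score` is illustrative.

```python
def f1_score(retrieved, relevant):
    """Harmonic mean of precision and recall for one query result.

    retrieved: ids returned by the technique on the perturbed data.
    relevant:  ids in the ground-truth answer (assumed to come from the
               exact data).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)
```

A technique that returns too many series loses precision, and one that returns too few loses recall. F1 penalizes both.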
slide-30
SLIDE 30

Moving average filters

  • Uncertain moving average (UMA) – give less weight to observations with larger error standard deviation
  • Uncertain exponential moving average (UEMA) – give more weight to the nearest neighbors
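
One way to realize both filters is a single weighted average over a window of half-width `w`, sketched below. The weighting is an assumption of this sketch: each observation is weighted by the inverse of its error standard deviation, optionally multiplied by an exponential decay `exp(-lam * |j - i|)` in the distance to the center, and the weights are normalized to sum to one. The paper's exact formulas and normalization may differ. The name `uncertain_filter` and the parameters `w` and `lam` are illustrative.

```python
import math


def uncertain_filter(values, stds, w, lam=0.0):
    """Smooth a series with uncertainty-aware weights.

    values: observed values, one per timestamp.
    stds:   error standard deviation of each observation (must be > 0).
    w:      half-width of the window (window covers i-w .. i+w).
    lam:    decay rate; 0 gives UMA-style weights, > 0 gives UEMA-style
            weights that favor timestamps close to i.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        num = den = 0.0
        # The window is truncated at the ends of the series.
        for j in range(max(0, i - w), min(n, i + w + 1)):
            weight = math.exp(-lam * abs(j - i)) / stds[j]
            num += weight * values[j]
            den += weight
        smoothed.append(num / den)
    return smoothed
```

The filtered series can then be compared with plain Euclidean distance, which is what makes these measures simple compared with MUNICH, PROUD and DUST.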

slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33

Discussion

  • Experiment and Analysis track paper
  • Good analytical and experimental survey
  • Unexpected results
slide-34
SLIDE 34

Discussion

  • What’s realistic prior knowledge to assume?
  • How to model correlations between time points?