

SLIDE 1

A Bayesian Hybrid Approach to Unsupervised Time Series Discretization

Yoshitaka Kameya (Tokyo Institute of Technology)
Gabriel Synnaeve (Grenoble University)
Andrei Doncescu (LAAS-CNRS)
Katsumi Inoue (National Institute of Informatics)
Taisuke Sato (Tokyo Institute of Technology)

20/Nov/2010, TAAI-2010

SLIDE 2

Outline

  • Review: Unsupervised discretization of time series data

– Preliminary experimental results

  • Hybrid discretization method based on variational Bayes
  • Experimental results
  • Summary and future work


SLIDE 3

Discretization

  • ... converts numeric data into symbolic data
  • ... is a preprocessing task in knowledge discovery
  • ... may lead to noise reduction and a good data abstraction

– We wish to have interpretable discrete levels

  • ... may help symbolic data mining

– Frequent pattern mining
– Inductive logic programming

e.g., 3.2 2.8 0.1 6.4 ... → medium medium low high ...

[Figure: the KDD process (Fayyad et al. 1995): Selection → Preprocessing → Transformation → Data mining → Interpretation/Evaluation, turning target data into preprocessed data, transformed data, patterns, and finally knowledge]

SLIDE 4

Unsupervised discretization of time series data

  • Binning:

– Equal width binning
– Equal frequency binning
– ... (a minimal binning sketch follows after this list)

  • Clustering:

– Hierarchical clustering [Dimitrova et al. 05]
– K-means
– Gaussian mixture models [Mörchen et al. 05b]
– ...

  • Smoothing:

– Regression trees [Geurts 01]
– Smoothing filters

  • Moving average filters
  • Savitzky-Golay filters [Mörchen et al. 05b]

– ...

  • All-in-one methods:

– SAX [Lin et al. 07] (a minimal SAX sketch follows below)
– Persist [Mörchen et al. 05a]
– Continuous hidden Markov models [Mörchen et al. 05a]
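For reference, the two basic binning schemes above can be written in a few lines of NumPy. This is a minimal illustrative sketch (the data values and level names are made up), not code from the paper:

import numpy as np

def equal_width_bins(x, k):
    """Equal width binning: k intervals of identical width over [min, max]."""
    edges = np.linspace(x.min(), x.max(), k + 1)
    return np.digitize(x, edges[1:-1])  # interior edges -> levels 0..k-1

def equal_freq_bins(x, k):
    """Equal frequency binning: breakpoints at the 1/k, 2/k, ... quantiles."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(x, edges)

x = np.array([3.2, 2.8, 0.1, 6.4, 5.9, 3.0])
print(equal_width_bins(x, 3))  # [1 1 0 2 2 1] -> medium medium low high high medium
print(equal_freq_bins(x, 3))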

Common strategy:

– Smoothing along the time (x) axis
– Binning or clustering along the measurement (y) axis

combined sequentially or simultaneously.

[Figure: SAX example; time on the x axis, measurement on the y axis, discretized into the symbol sequence b a a b c c b c]
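A minimal sketch of the SAX pipeline (z-normalize, average over PAA frames, then symbolize with breakpoints that make each symbol equiprobable under a standard normal). The frame size and alphabet are illustrative choices, and this is a simplified reading of [Lin et al. 07], not the reference implementation:

import numpy as np
from scipy.stats import norm

def sax(x, frame_size, alphabet="abc"):
    """Minimal SAX: z-normalize, PAA, then map frame means to symbols."""
    z = (x - x.mean()) / x.std()
    n_frames = len(z) // frame_size
    paa = z[: n_frames * frame_size].reshape(n_frames, frame_size).mean(axis=1)
    k = len(alphabet)
    breakpoints = norm.ppf(np.arange(1, k) / k)  # k-1 standard-normal quantiles
    return "".join(alphabet[i] for i in np.digitize(paa, breakpoints))

x = np.sin(np.linspace(0, 4 * np.pi, 400)) + 0.1 * np.random.randn(400)
print(sax(x, frame_size=50))  # a symbol string such as "bcbabcba"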

SLIDE 5

Persist [Mörchen et al. 05a]

  • Assumption: the time series tries to stay at one of the discrete levels (= states) as long as possible
  • Persist greedily chooses the breakpoints so that fewer state changes occur → a role of smoothing (a simplified sketch follows below)

[Figure: breakpoints partition the value range into states S1-S4; state changes are marked where the series crosses a breakpoint]
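The sketch below illustrates the greedy breakpoint selection. The score follows our reading of [Mörchen et al. 05a]: a signed symmetric KL divergence between each state's self-transition probability and its marginal probability, so that persistent states score high. Treat it as an approximation of the idea, not the reference implementation:

import numpy as np

def skl_bernoulli(p, q, eps=1e-12):
    """Symmetric KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return (p - q) * (np.log(p / q) - np.log((1 - p) / (1 - q)))

def persistence(states, k):
    """Mean signed SKL between self-transition and marginal probabilities."""
    total = 0.0
    for j in range(k):
        marginal = np.mean(states == j)
        from_j = states[:-1] == j
        if marginal == 0.0 or from_j.sum() == 0:
            continue  # state j unused under this candidate discretization
        self_trans = np.mean(states[1:][from_j] == j)
        total += np.sign(self_trans - marginal) * skl_bernoulli(self_trans, marginal)
    return total / k

def persist_breakpoints(x, k, n_candidates=100):
    """Greedily add the candidate breakpoint that most increases persistence."""
    candidates = list(np.quantile(x, np.linspace(0.01, 0.99, n_candidates)))
    chosen = []
    for _ in range(k - 1):
        best = max(candidates, key=lambda b: persistence(
            np.digitize(x, np.sort(chosen + [b])), k))
        chosen.append(best)
        candidates.remove(best)
    return np.sort(chosen)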

SLIDE 6

Continuous hidden Markov models

  • Two-step procedure (a runnable sketch follows below):

– Train the HMM
– Find the most probable state sequence by the Viterbi algorithm → the state sequence is the discrete time series

Positions and shapes of the Gaussians are adjusted by EM.

[Figure: HMM with discrete states S1-S8 emitting Gaussian outputs X1-X8 (the measurements); three states, with means at State 1, State 2, and State 3]
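A minimal sketch of the two-step procedure using the third-party hmmlearn library (our choice for illustration; the paper has its own implementation). The toy data stand in for a three-level series:

import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
levels = np.repeat([0.0, 5.0, 10.0, 5.0], 200)        # toy three-level signal
x = (levels + rng.normal(0.0, 1.0, levels.size)).reshape(-1, 1)

# Step 1: EM adjusts the positions and shapes of the Gaussians.
model = GaussianHMM(n_components=3, covariance_type="diag", n_iter=100)
model.fit(x)

# Step 2: Viterbi decoding yields the most probable state sequence,
# i.e., the discretized time series.
_, states = model.decode(x, algorithm="viterbi")
print(states[:20])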

SLIDE 7

Preliminary experiment [Mörchen et al. 05]

  • Comparison of the predictive performance among the discretizers:

– SAX
– Persist
– HMMs
– Equal width binning (EQW)
– Equal frequency binning (EQF)
– Gaussian mixture model (GMM)

  • We used an artificial dataset called the “enduring-state” dataset
  • How well do the discretizers recover the answers?

[Figure: the enduring-state dataset, with noise and outliers marked]
SLIDE 8

Preliminary experiment (Cont’d)


  • Error analysis: Persist

– Levels are correctly identified
– However, much noise crosses the level boundaries

[Figure: 5 levels, 5% outliers]

SLIDE 9

Preliminary experiment (Cont’d)


  • Error analysis: HMMs

– Some levels are misidentified
– Small noise is correctly smoothed out

[Figure: 5 levels, 5% outliers]

SLIDE 10
Motivation

  • From the preliminary experiments, we can see:

– Persist: robust in identifying the discrete levels (its heuristic score captures the global behavior of the time series)
– HMMs: good at local smoothing

Our proposal: Hybridization of heterogeneous discretizers based on variational Bayes

SLIDE 11
Variational Bayes

  • Efficient technique for Bayesian learning [Beal 03]

– Empirically known to be robust against outliers
– Gives a principled way of determining the # of discrete levels

  • An HMM is modeled as: p(x, z, θ) = p(θ) p(x, z | θ)

– x: input time series
– z: hidden state sequence (the discretized time series)
– θ: parameters
– p(θ): prior
– p(x, z | θ): likelihood

  • Prior of the means and variances in HMMs: a Normal-Gamma distribution (conjugate prior) with hyperparameters (the standard form is written out below)

[Figure: HMM with hidden states S1-S8 (z) emitting outputs X1-X8 (x)]
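For concreteness, the Normal-Gamma conjugate prior over a Gaussian's mean μ and precision λ factorizes as follows; the hyperparameter names m_0, τ, a_0, b_0 are generic textbook conventions, not necessarily the paper's notation:

p(\mu, \lambda) = \mathcal{N}\left(\mu \mid m_0, (\tau \lambda)^{-1}\right) \, \mathrm{Gam}(\lambda \mid a_0, b_0)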

SLIDE 12
Variational Bayes (Cont’d)

  • Variational Bayesian EM in general form:

– We try to find q = q* that maximizes the variational free energy F[q]:

  F[q] = \sum_z \int q(z, \theta) \log \frac{p(x, z, \theta)}{q(z, \theta)} \, d\theta

– F[q] is a lower bound of the log marginal likelihood L(x) (a one-line derivation follows below):

  L(x) = \log p(x) = \log \sum_z \int p(x, z, \theta) \, d\theta

  → F[q*] is a good approximation of L(x)

– To get q*, assuming q(z, θ) ≈ q(z) q(θ), we iterate the two steps alternately:

  VB E-step:  q(z) \propto \exp\left( \int q(\theta) \log p(x, z \mid \theta) \, d\theta \right)

  VB M-step:  q(\theta) \propto p(\theta) \exp\left( \sum_z q(z) \log p(x, z \mid \theta) \right)

– From L(x) − F[q*] = KL(q*(z, θ) ‖ p(z, θ | x)), q* is a good approximation of the posterior distribution, and so q* is used for discretization
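The lower bound follows from Jensen's inequality (a standard step, not specific to this paper):

\log p(x) = \log \sum_z \int q(z, \theta) \, \frac{p(x, z, \theta)}{q(z, \theta)} \, d\theta
\;\ge\; \sum_z \int q(z, \theta) \log \frac{p(x, z, \theta)}{q(z, \theta)} \, d\theta = F[q],

with equality iff q(z, θ) = p(z, θ | x); the gap is exactly KL(q(z, θ) ‖ p(z, θ | x)).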

SLIDE 13
Hybridization

  • The means of the Gaussians are updated as a pseudo-count-weighted average of the prior mean and the data mean:

  \bar{m}_k = \frac{\tau \, m_k + N_k \, \bar{x}_k}{\tau + N_k}

  where m_k is the prior mean of the Gaussian for level k, τ is the weight (pseudo count), N_k is the expected count of staying at level k, and \bar{x}_k is the expected mean of the Gaussian for level k

  • We simply set the prior means m_k from the breakpoints b_k obtained by Persist
  • In a similar way, we can also combine HMMs with SAX
  • We aim to control the HMM by the settings of τ and the m_k (a numeric sketch follows below)

[Figure: Persist breakpoints defining states S1-S4; prior weights 1/3 1/3 1/3 placed between breakpoints]
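A numeric sketch of the update above. The breakpoint values, the choice of interval midpoints for the prior means m_k, and all counts are illustrative assumptions, not the paper's settings:

import numpy as np

def posterior_means(tau, m, N, xbar):
    """Pseudo-count-weighted average of prior means m and data means xbar."""
    return (tau * m + N * xbar) / (tau + N)

b = np.array([1.0, 3.0, 5.0])              # breakpoints b_k from Persist (assumed)
edges = np.concatenate(([0.0], b, [6.0]))  # assumed data range [0, 6]
m = (edges[:-1] + edges[1:]) / 2           # prior means: interval midpoints (one plausible choice)

N = np.array([120.0, 40.0, 90.0, 60.0])    # expected counts of staying at each level
xbar = np.array([0.4, 2.1, 3.8, 5.6])      # expected means per level
print(posterior_means(tau=10.0, m=m, N=N, xbar=xbar))

A larger τ pulls the posterior means toward the Persist-derived priors; a smaller τ lets the data dominate, which is exactly the control knob the slide describes.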

SLIDES 14-18

Experiment 1: “Enduring-state” dataset

Weight τ = 0.5, 1, 5, 10, 20, 50, 70, 100

[Figures, one per slide: raw time series (input) and discretization results for each weight setting and each ratio of outliers]

SLIDE 19

Experiment 1: “Enduring-state” dataset


Weight τ = 0.5, 1, 5, 10, 20, 50, 70, 100

– Under accuracy, HMM+Persist is significantly better than Persist, except in several cases with a large # of levels and many outliers
– Under NMI, HMM+Persist is significantly better than Persist in all cases, according to Wilcoxon’s rank-sum test (p = 0.01; a sketch of the metrics follows below)

[Figure: results for each ratio of outliers]
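For reference, accuracy, NMI, and the rank-sum test can be computed with standard libraries; every number below is a placeholder, not a result from the paper:

import numpy as np
from scipy.stats import ranksums
from sklearn.metrics import normalized_mutual_info_score

truth = np.array([0, 0, 1, 1, 2, 2, 1, 0])   # ground-truth levels (toy)
pred = np.array([0, 0, 1, 2, 2, 2, 1, 0])    # discretizer output (toy)

print(np.mean(truth == pred))                      # accuracy
print(normalized_mutual_info_score(truth, pred))   # NMI

# Wilcoxon rank-sum test over per-run scores of the two methods:
hmm_persist = np.array([0.91, 0.88, 0.93, 0.90, 0.89])  # placeholder NMI scores
persist = np.array([0.80, 0.78, 0.84, 0.79, 0.81])      # placeholder NMI scores
print(ranksums(hmm_persist, persist))  # significant if p < 0.01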

SLIDE 20

Experiment 2: Background

  • Also based on [Mörchen et al. 05a]
  • Data on muscle activation of a professional inline speed skater

– Nearly 30,000 points, recorded on a log scale


SLIDE 21
Experiment 2: Goal

  • Estimating a plausible # of discrete levels automatically with variational Bayes
  • An expert prefers to have 3 levels [Mörchen et al. 05a]

[Figure: skating phases: "last kick to the ground to move forward" and "gliding phase (muscle is used to keep stability)"]

SLIDE 22

Experiment 2: Settings

  • Having so many (30,000) data points, we need to:

– Use large pseudo counts (500)
– Use PAA (as in SAX) to compress the time series, with frame size = 50 (a PAA sketch follows below)
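A minimal sketch of PAA, the compression step borrowed from SAX, using the slide's frame size of 50:

import numpy as np

def paa(x, frame_size=50):
    """Piecewise Aggregate Approximation: replace each frame of
    frame_size consecutive points by its mean."""
    n_frames = len(x) // frame_size
    return x[: n_frames * frame_size].reshape(n_frames, frame_size).mean(axis=1)

x = np.random.randn(30_000)   # stand-in for the ~30,000-point recording
print(paa(x).shape)           # (600,): a 50x compression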

SLIDE 23

Experiment 2: Discretization by CHMMs (Cont’d)

  • PAA disabled
  • Savitzky-Golay filter enabled with half-window size = 100 (a filtering sketch follows below)
  • Pseudo counts = 1

[Figure: discretization result and the estimated # of levels]
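The smoothing step can be reproduced with SciPy's Savitzky-Golay filter. A half-window of 100 corresponds to a window length of 2 × 100 + 1 = 201 samples; polyorder=3 is our assumption, as the slides do not state the polynomial order:

import numpy as np
from scipy.signal import savgol_filter

x = np.random.randn(30_000)   # stand-in for the skater recording
smoothed = savgol_filter(x, window_length=201, polyorder=3)  # half-window = 100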

SLIDE 24

Experiment 2: Discretization by CHMMs (Cont’d)

  • PAA disabled
  • Pseudo counts = 1000

[Figure: discretization result and the estimated # of levels]

SLIDE 25

Experiment 2: Discretization by CHMMs (Cont’d)

  • PAA enabled with frame size = 10
  • Pseudo counts = 1

[Figure: discretization result and the estimated # of levels]

SLIDE 26

Experiment 2: Discretization by CHMMs (Cont’d)

  • PAA enabled with frame size = 20
  • Pseudo counts = 1

[Figure: discretization result and the estimated # of levels]

SLIDE 27

Summary

  • Unsupervised discretization of time series data
  • Hybridizing heterogeneous discretizers via variational Bayes

– Fast approximate Bayesian inference
– Robust against noise
– Automatically finds a plausible number of discrete levels

Future work

  • Histogram-based discretizer