

SLIDE 1

A Bayesian Hybrid Approach to Unsupervised Time Series Discretization

Yoshitaka Kameya (Tokyo Institute of Technology)
Gabriel Synnaeve (Grenoble University)
Andrei Doncescu (LAAS-CNRS)
Katsumi Inoue (National Institute of Informatics)
Taisuke Sato (Tokyo Institute of Technology)

20/Nov/2010, TAAI-2010

SLIDE 2

Outline

  • Review: Unsupervised discretization of time series data

– Preliminary experimental results

  • Hybrid discretization method based on variational Bayes
  • Experimental results
  • Summary and future work


SLIDE 3

Discretization

  • ... converts numeric data into symbolic data
  • ... is a preprocessing task in knowledge discovery
  • ... may lead to noise reduction and a good data abstraction

– We wish to have interpretable discrete levels

  • ... may help symbolic data mining

– Frequent pattern mining
– Inductive logic programming

e.g., 3.2 2.8 0.1 6.4 ... → medium medium low high ...

[Figure: the KDD process (Fayyad et al. 1995): Selection → Preprocessing → Transformation → Data mining → Interpretation/Evaluation, turning target data into preprocessed data, transformed data, patterns, and finally knowledge]

SLIDE 4

Unsupervised discretization of time series data

  • Binning:

– Equal width binning
– Equal frequency binning
– ... (a minimal binning sketch follows after this list)

  • Clustering:

– Hierarchical clustering [Dimitrova et al. 05]
– K-means
– Gaussian mixture models [Mörchen et al. 05b]
– ...

  • Smoothing:

– Regression trees [Geurts 01]
– Smoothing filters

  • Moving average filters
  • Savitzky-Golay filters [Mörchen et al. 05b]

– ...

  • All-in-one methods:

– SAX [Lin et al. 07] (a minimal SAX sketch follows below)
– Persist [Mörchen et al. 05a]
– Continuous hidden Markov models [Mörchen et al. 05a]
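For reference, the two basic binning schemes above can be written in a few lines of NumPy. This is a minimal illustrative sketch (the data values and level names are made up), not code from the paper:

import numpy as np

def equal_width_bins(x, k):
    """Equal width binning: k intervals of identical width over [min, max]."""
    edges = np.linspace(x.min(), x.max(), k + 1)
    return np.digitize(x, edges[1:-1])  # interior edges -> levels 0..k-1

def equal_freq_bins(x, k):
    """Equal frequency binning: breakpoints at the 1/k, 2/k, ... quantiles."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(x, edges)

x = np.array([3.2, 2.8, 0.1, 6.4, 5.9, 3.0])
print(equal_width_bins(x, 3))  # [1 1 0 2 2 1] -> medium medium low high high medium
print(equal_freq_bins(x, 3))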

Common strategy:

– Smoothing along the time (x) axis
– Binning or clustering along the measurement (y) axis

combined sequentially or simultaneously.

[Figure: SAX example; time on the x axis, measurement on the y axis, discretized into the symbol sequence b a a b c c b c]
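A minimal sketch of the SAX pipeline (z-normalize, average over PAA frames, then symbolize with breakpoints that make each symbol equiprobable under a standard normal). The frame size and alphabet are illustrative choices, and this is a simplified reading of [Lin et al. 07], not the reference implementation:

import numpy as np
from scipy.stats import norm

def sax(x, frame_size, alphabet="abc"):
    """Minimal SAX: z-normalize, PAA, then map frame means to symbols."""
    z = (x - x.mean()) / x.std()
    n_frames = len(z) // frame_size
    paa = z[: n_frames * frame_size].reshape(n_frames, frame_size).mean(axis=1)
    k = len(alphabet)
    breakpoints = norm.ppf(np.arange(1, k) / k)  # k-1 standard-normal quantiles
    return "".join(alphabet[i] for i in np.digitize(paa, breakpoints))

x = np.sin(np.linspace(0, 4 * np.pi, 400)) + 0.1 * np.random.randn(400)
print(sax(x, frame_size=50))  # a symbol string such as "bcbabcba"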

SLIDE 5

Persist [Mörchen et al. 05a]

  • Assumption: the time series tries to stay at one of the discrete levels (= states) as long as possible
  • Persist greedily chooses the breakpoints so that fewer state changes occur → a role of smoothing (a simplified sketch follows below)

[Figure: breakpoints partition the value range into states S1-S4; state changes are marked where the series crosses a breakpoint]
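The sketch below illustrates the greedy breakpoint selection. The score follows our reading of [Mörchen et al. 05a]: a signed symmetric KL divergence between each state's self-transition probability and its marginal probability, so that persistent states score high. Treat it as an approximation of the idea, not the reference implementation:

import numpy as np

def skl_bernoulli(p, q, eps=1e-12):
    """Symmetric KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return (p - q) * (np.log(p / q) - np.log((1 - p) / (1 - q)))

def persistence(states, k):
    """Mean signed SKL between self-transition and marginal probabilities."""
    total = 0.0
    for j in range(k):
        marginal = np.mean(states == j)
        from_j = states[:-1] == j
        if marginal == 0.0 or from_j.sum() == 0:
            continue  # state j unused under this candidate discretization
        self_trans = np.mean(states[1:][from_j] == j)
        total += np.sign(self_trans - marginal) * skl_bernoulli(self_trans, marginal)
    return total / k

def persist_breakpoints(x, k, n_candidates=100):
    """Greedily add the candidate breakpoint that most increases persistence."""
    candidates = list(np.quantile(x, np.linspace(0.01, 0.99, n_candidates)))
    chosen = []
    for _ in range(k - 1):
        best = max(candidates, key=lambda b: persistence(
            np.digitize(x, np.sort(chosen + [b])), k))
        chosen.append(best)
        candidates.remove(best)
    return np.sort(chosen)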

SLIDE 6

Continuous hidden Markov models

  • Two-step procedure (a runnable sketch follows below):

– Train the HMM
– Find the most probable state sequence by the Viterbi algorithm → the state sequence is the discrete time series

Positions and shapes of the Gaussians are adjusted by EM.

[Figure: HMM with discrete states S1-S8 emitting Gaussian outputs X1-X8 (the measurements); three states, with means at State 1, State 2, and State 3]
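A minimal sketch of the two-step procedure using the third-party hmmlearn library (our choice for illustration; the paper has its own implementation). The toy data stand in for a three-level series:

import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
levels = np.repeat([0.0, 5.0, 10.0, 5.0], 200)        # toy three-level signal
x = (levels + rng.normal(0.0, 1.0, levels.size)).reshape(-1, 1)

# Step 1: EM adjusts the positions and shapes of the Gaussians.
model = GaussianHMM(n_components=3, covariance_type="diag", n_iter=100)
model.fit(x)

# Step 2: Viterbi decoding yields the most probable state sequence,
# i.e., the discretized time series.
_, states = model.decode(x, algorithm="viterbi")
print(states[:20])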

SLIDE 7

Preliminary experiment [Mörchen et al. 05]

  • Comparison of the predictive performance among the discretizers:

– SAX
– Persist
– HMMs
– Equal width binning (EQW)
– Equal frequency binning (EQF)
– Gaussian mixture model (GMM)

  • We used an artificial dataset called the “enduring-state” dataset
  • How well do the discretizers recover the answers?

[Figure: the enduring-state dataset, with noise and outliers marked]
SLIDE 8

Preliminary experiment (Cont’d)


  • Error analysis: Persist

– Levels are correctly identified
– However, much noise crosses the level boundaries

[Figure: 5 levels, 5% outliers]

SLIDE 9

Preliminary experiment (Cont’d)


  • Error analysis: HMMs

– Some levels are misidentified
– Small noise is correctly smoothed out

[Figure: 5 levels, 5% outliers]

SLIDE 10
Motivation

  • From the preliminary experiments, we can see:

– Persist: robust in identifying the discrete levels (its heuristic score captures the global behavior of the time series)
– HMMs: good at local smoothing

Our proposal: Hybridization of heterogeneous discretizers based on variational Bayes

SLIDE 11
Variational Bayes

  • Efficient technique for Bayesian learning [Beal 03]

– Empirically known to be robust against outliers
– Gives a principled way of determining the # of discrete levels

  • An HMM is modeled as: p(x, z, θ) = p(θ) p(x, z | θ)

– x: input time series
– z: hidden state sequence (the discretized time series)
– θ: parameters
– p(θ): prior
– p(x, z | θ): likelihood

  • Prior of the means and variances in HMMs: a Normal-Gamma distribution (conjugate prior) with hyperparameters (the standard form is written out below)

[Figure: HMM with hidden states S1-S8 (z) emitting outputs X1-X8 (x)]
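For concreteness, the Normal-Gamma conjugate prior over a Gaussian's mean μ and precision λ factorizes as follows; the hyperparameter names m_0, τ, a_0, b_0 are generic textbook conventions, not necessarily the paper's notation:

p(\mu, \lambda) = \mathcal{N}\left(\mu \mid m_0, (\tau \lambda)^{-1}\right) \, \mathrm{Gam}(\lambda \mid a_0, b_0)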

SLIDE 12
Variational Bayes (Cont’d)

  • Variational Bayesian EM in general form:

– We try to find q = q* that maximizes the variational free energy F[q]:

  F[q] = \sum_z \int q(z, \theta) \log \frac{p(x, z, \theta)}{q(z, \theta)} \, d\theta

– F[q] is a lower bound of the log marginal likelihood L(x) (a one-line derivation follows below):

  L(x) = \log p(x) = \log \sum_z \int p(x, z, \theta) \, d\theta

  → F[q*] is a good approximation of L(x)

– To get q*, assuming q(z, θ) ≈ q(z) q(θ), we iterate the two steps alternately:

  VB E-step:  q(z) \propto \exp\left( \int q(\theta) \log p(x, z \mid \theta) \, d\theta \right)

  VB M-step:  q(\theta) \propto p(\theta) \exp\left( \sum_z q(z) \log p(x, z \mid \theta) \right)

– From L(x) − F[q*] = KL(q*(z, θ) ‖ p(z, θ | x)), q* is a good approximation of the posterior distribution, and so q* is used for discretization
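The lower bound follows from Jensen's inequality (a standard step, not specific to this paper):

\log p(x) = \log \sum_z \int q(z, \theta) \, \frac{p(x, z, \theta)}{q(z, \theta)} \, d\theta
\;\ge\; \sum_z \int q(z, \theta) \log \frac{p(x, z, \theta)}{q(z, \theta)} \, d\theta = F[q],

with equality iff q(z, θ) = p(z, θ | x); the gap is exactly KL(q(z, θ) ‖ p(z, θ | x)).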

SLIDE 13
Hybridization

  • The means of the Gaussians are updated as a pseudo-count-weighted average of the prior mean and the data mean:

  \bar{m}_k = \frac{\tau \, m_k + N_k \, \bar{x}_k}{\tau + N_k}

  where m_k is the prior mean of the Gaussian for level k, τ is the weight (pseudo count), N_k is the expected count of staying at level k, and \bar{x}_k is the expected mean of the Gaussian for level k

  • We simply set the prior means m_k from the breakpoints b_k obtained by Persist
  • In a similar way, we can also combine HMMs with SAX
  • We aim to control the HMM by the settings of τ and the m_k (a numeric sketch follows below)

[Figure: Persist breakpoints defining states S1-S4; prior weights 1/3 1/3 1/3 placed between breakpoints]
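A numeric sketch of the update above. The breakpoint values, the choice of interval midpoints for the prior means m_k, and all counts are illustrative assumptions, not the paper's settings:

import numpy as np

def posterior_means(tau, m, N, xbar):
    """Pseudo-count-weighted average of prior means m and data means xbar."""
    return (tau * m + N * xbar) / (tau + N)

b = np.array([1.0, 3.0, 5.0])              # breakpoints b_k from Persist (assumed)
edges = np.concatenate(([0.0], b, [6.0]))  # assumed data range [0, 6]
m = (edges[:-1] + edges[1:]) / 2           # prior means: interval midpoints (one plausible choice)

N = np.array([120.0, 40.0, 90.0, 60.0])    # expected counts of staying at each level
xbar = np.array([0.4, 2.1, 3.8, 5.6])      # expected means per level
print(posterior_means(tau=10.0, m=m, N=N, xbar=xbar))

A larger τ pulls the posterior means toward the Persist-derived priors; a smaller τ lets the data dominate, which is exactly the control knob the slide describes.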

SLIDES 14-18

Experiment 1: “Enduring-state” dataset

Weight τ = 0.5, 1, 5, 10, 20, 50, 70, 100

[Figures, one per slide: raw time series (input) and discretization results for each weight setting and each ratio of outliers]

SLIDE 19

Experiment 1: “Enduring-state” dataset


Weight τ = 0.5, 1, 5, 10, 20, 50, 70, 100

– Under accuracy, HMM+Persist is significantly better than Persist, except in several cases with a large # of levels and many outliers
– Under NMI, HMM+Persist is significantly better than Persist in all cases, according to Wilcoxon’s rank-sum test (p = 0.01; a sketch of the metrics follows below)

[Figure: results for each ratio of outliers]
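For reference, accuracy, NMI, and the rank-sum test can be computed with standard libraries; every number below is a placeholder, not a result from the paper:

import numpy as np
from scipy.stats import ranksums
from sklearn.metrics import normalized_mutual_info_score

truth = np.array([0, 0, 1, 1, 2, 2, 1, 0])   # ground-truth levels (toy)
pred = np.array([0, 0, 1, 2, 2, 2, 1, 0])    # discretizer output (toy)

print(np.mean(truth == pred))                      # accuracy
print(normalized_mutual_info_score(truth, pred))   # NMI

# Wilcoxon rank-sum test over per-run scores of the two methods:
hmm_persist = np.array([0.91, 0.88, 0.93, 0.90, 0.89])  # placeholder NMI scores
persist = np.array([0.80, 0.78, 0.84, 0.79, 0.81])      # placeholder NMI scores
print(ranksums(hmm_persist, persist))  # significant if p < 0.01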

SLIDE 20

Experiment 2: Background

  • Also based on [Mörchen et al. 05a]
  • Data on muscle activation of a professional inline speed skater

– Nearly 30,000 points, recorded on a log scale


SLIDE 21
Experiment 2: Goal

  • Estimating a plausible # of discrete levels automatically with variational Bayes
  • An expert prefers to have 3 levels [Mörchen et al. 05a]

[Figure: skating phases: "last kick to the ground to move forward" and "gliding phase (muscle is used to keep stability)"]

SLIDE 22

Experiment 2: Settings

  • Having so many (30,000) data points, we need to:

– Use large pseudo counts (500)
– Use PAA (as in SAX) to compress the time series, with frame size = 50 (a PAA sketch follows below)
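A minimal sketch of PAA, the compression step borrowed from SAX, using the slide's frame size of 50:

import numpy as np

def paa(x, frame_size=50):
    """Piecewise Aggregate Approximation: replace each frame of
    frame_size consecutive points by its mean."""
    n_frames = len(x) // frame_size
    return x[: n_frames * frame_size].reshape(n_frames, frame_size).mean(axis=1)

x = np.random.randn(30_000)   # stand-in for the ~30,000-point recording
print(paa(x).shape)           # (600,): a 50x compression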

SLIDE 23

Experiment 2: Discretization by CHMMs (Cont’d)

  • PAA disabled
  • Savitzky-Golay filter enabled with half-window size = 100 (a filtering sketch follows below)
  • Pseudo counts = 1

[Figure: discretization result and the estimated # of levels]
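The smoothing step can be reproduced with SciPy's Savitzky-Golay filter. A half-window of 100 corresponds to a window length of 2 × 100 + 1 = 201 samples; polyorder=3 is our assumption, as the slides do not state the polynomial order:

import numpy as np
from scipy.signal import savgol_filter

x = np.random.randn(30_000)   # stand-in for the skater recording
smoothed = savgol_filter(x, window_length=201, polyorder=3)  # half-window = 100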

SLIDE 24

Experiment 2: Discretization by CHMMs (Cont’d)

  • PAA disabled
  • Pseudo counts = 1000

[Figure: discretization result and the estimated # of levels]

SLIDE 25

Experiment 2: Discretization by CHMMs (Cont’d)

  • PAA enabled with frame size = 10
  • Pseudo counts = 1

[Figure: discretization result and the estimated # of levels]

SLIDE 26

Experiment 2: Discretization by CHMMs (Cont’d)

  • PAA enabled with frame size = 20
  • Pseudo counts = 1

[Figure: discretization result and the estimated # of levels]

SLIDE 27

Summary

  • Unsupervised discretization of time series data
  • Hybridizing heterogeneous discretizers via variational Bayes

– Fast approximate Bayesian inference
– Robust against noise
– Automatically finds a plausible number of discrete levels

Future work

  • Histogram-based discretizer