SLIDE 1

SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models

Yasuhiro Fujiwara (NTT Cyber Space Labs)
Yasushi Sakurai (NTT Communication Science Labs)
Masashi Yamamuro (NTT Cyber Space Labs)

Speaker: Yasushi Sakurai


SLIDE 2

Motivation

  • HMM (Hidden Markov Model)
    – Mental task classification
      • Understand human brain functions with EEG signals
    – Biological analysis
      • Predict organism functions with DNA sequences
    – Many other applications
      • Speech recognition, image processing, etc.
  • Goal
    – Fast and exact identification of the highest-likelihood model for large datasets

SLIDE 3

Mini-introduction to HMM

  • Observation sequence $X = (x_1, x_2, \ldots, x_n)$ is a probabilistic function of the states
  • Consists of three sets of parameters:
    – Initial state probability: $\pi = \{\pi_i\}\ (1 \le i \le m)$
      • Probability of being in state $u_i$ at time $t = 1$
    – State transition probability: $a = \{a_{ij}\}\ (1 \le i, j \le m)$
      • Probability of the transition from state $u_i$ to state $u_j$
    – Symbol probability: $b = \{b_i(v)\}\ (1 \le i \le m)$
      • Probability of emitting symbol $v$ in state $u_i$
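To make the notation concrete, here is a minimal Python container for these three parameter sets (a sketch; the class and attribute names are ours, not from the paper):

```python
import numpy as np

class HMM:
    """Discrete-output HMM with m states u_1..u_m and s output symbols.

    pi[i]   : initial probability pi_i of state u_i          -- shape (m,)
    A[i, j] : transition probability a_ij from u_i to u_j    -- shape (m, m)
    B[i, v] : probability b_i(v) of emitting symbol v in u_i -- shape (m, s)
    """
    def __init__(self, pi, A, B):
        self.pi = np.asarray(pi, dtype=float)
        self.A = np.asarray(A, dtype=float)
        self.B = np.asarray(B, dtype=float)
        self.m = self.pi.shape[0]   # number of states
        self.s = self.B.shape[1]    # number of output symbols
```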

SLIDE 4

Mini-introduction to HMM

  • HMM types
    – Ergodic HMM
      • Every state can be reached from every other state
    – Left-right HMM
      • Transitions to lower-numbered states are prohibited
      • Always begins with the first state
      • Transitions are limited to a small number of states

[Figure: an ergodic HMM topology and a left-right HMM topology]

SLIDE 5

Mini-introduction to HMM

  • Viterbi path in the trellis structure
    – Trellis structure: states lie on the vertical axis; the sequence is aligned along the horizontal axis
    – Viterbi path: the state sequence that gives the maximum likelihood

[Figure: trellis structure with states $u_1, u_2, \ldots, u_m$ on the vertical axis, symbols $x_1, x_2, \ldots, x_n$ on the horizontal axis, and the Viterbi path marked]

SLIDE 6

Mini-introduction to HMM

  • Viterbi algorithm
    – Dynamic programming approach
    – Maximize the probabilities from the previous states

$$P = \max_{1 \le i \le m}(p_{in}), \qquad p_{it} = \begin{cases} \max_{1 \le j \le m}(p_{j(t-1)} \cdot a_{ji}) \cdot b_i(x_t) & (2 \le t \le n) \\ \pi_i \cdot b_i(x_1) & (t = 1) \end{cases}$$

$p_{it}$: the maximum probability of state $u_i$ at time $t$
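The recursion translates directly into a few lines of NumPy. This sketch reuses the HMM container from above (our naming) and returns the Viterbi likelihood $P$ of a sequence of symbol indices:

```python
def viterbi_likelihood(hmm, X):
    """Viterbi likelihood P = max_i p_{i,n} of symbol sequence X under hmm."""
    # t = 1: p_{i,1} = pi_i * b_i(x_1)
    p = hmm.pi * hmm.B[:, X[0]]
    # t = 2..n: p_{i,t} = max_j (p_{j,t-1} * a_{ji}) * b_i(x_t)
    for x in X[1:]:
        p = (p[:, None] * hmm.A).max(axis=0) * hmm.B[:, x]
    return p.max()
```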

SLIDE 7

Problem Definition

  • Given
    – HMM dataset
    – Sequence $X = (x_1, x_2, \ldots, x_n)$ of arbitrary length
  • Find
    – The highest-likelihood model in the dataset, estimated with respect to $X$

SLIDE 8

Why not ‘Naïve’?

  • Naïve solution
    1. Compute the likelihood of every model using the Viterbi algorithm
    2. Then choose the highest-likelihood model
  • But…
    – High search cost: $O(nm^2)$ time for every model ($m$: number of states, $n$: sequence length of $X$)
    – Prohibitive for large HMM datasets
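As a baseline, the naïve solution is just a loop over the dataset (a sketch built on the viterbi_likelihood helper above):

```python
def naive_search(models, X):
    """Score every model with the full Viterbi algorithm and keep the best.
    Costs O(n m^2) per model -- prohibitive for large HMM datasets."""
    best_model, best_p = None, -1.0
    for hmm in models:
        p = viterbi_likelihood(hmm, X)
        if p > best_p:
            best_model, best_p = hmm, p
    return best_model, best_p
```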

SLIDE 9

Our Solution, SPIRAL

  • Requirements:

– High-speed search

  • Identify the model efficiently

– Exactness

  • Accuracy is not sacrificed

– No restriction on model type

  • Achieves high search performance for any type of model

SLIDE 10

Likelihood Approximation

[Figure: reminder of the naïve Viterbi computation on the original model]

SLIDE 11

Likelihood Approximation

  • Create compact models (reduce the model size)
    – For given $m$ states and granularity $g$, create $m/g$ states by merging ‘similar’ states

[Figure: an $m$-state model and its $m/g$-state compact model on a length-$n$ trellis]

SLIDE 12

Likelihood Approximation

  • Use the vector $F_i$ of state $u_i$ for clustering ($s$: number of symbols)

$$F_i = (\pi_i;\ a_{i1}, \ldots, a_{im}, a_{1i}, \ldots, a_{mi};\ b_i(v_1), \ldots, b_i(v_s))$$

  • Merge all the states $u_i$ in a cluster $C$ and create a new state $u_C$
  • Choose the highest probability among the probabilities of the $u_i$:

$$\pi'_C = \max_{u_i \in C} \pi_i,\quad a'_{Cj} = \max_{u_i \in C,\, u_j \notin C} a_{ij},\quad a'_{CC} = \max_{u_i, u_k \in C} a_{ik},\quad a'_{jC} = \max_{u_i \in C,\, u_j \notin C} a_{ji},\quad b'_C(v) = \max_{u_i \in C} b_i(v)$$

→ Obtain the upper-bounding likelihood
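Given a partition of the states into clusters, the merge step is a handful of max-reductions. A sketch follows; how the clusters are found (e.g. by running a standard clustering method on the vectors $F_i$) is left out:

```python
import numpy as np

def merge_states(hmm, clusters):
    """Build the compact model for a partition of hmm's states.
    clusters: list of lists of state indices.  Every merged parameter takes
    the max over its cluster, which is what makes the compact model's
    likelihood an upper bound on the original model's likelihood."""
    k = len(clusters)
    pi = np.empty(k)
    A = np.empty((k, k))
    B = np.empty((k, hmm.s))
    for c, Sc in enumerate(clusters):
        pi[c] = hmm.pi[Sc].max()          # pi'_C = max_{u_i in C} pi_i
        B[c] = hmm.B[Sc].max(axis=0)      # b'_C(v) = max_{u_i in C} b_i(v)
        for d, Sd in enumerate(clusters):
            # covers a'_{CC} (d == c) as well as the cross-cluster maxima
            A[c, d] = hmm.A[np.ix_(Sc, Sd)].max()
    return HMM(pi, A, B)
```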

SLIDE 13

Likelihood Approximation

  • Compute the approximate likelihood $P'$ from the compact model (with $m'$ merged states):

$$P' = \max_{1 \le i \le m'}(p'_{in}), \qquad p'_{it} = \begin{cases} \max_{1 \le j \le m'}(p'_{j(t-1)} \cdot a'_{ji}) \cdot b'_i(x_t) & (2 \le t \le n) \\ \pi'_i \cdot b'_i(x_1) & (t = 1) \end{cases}$$

$p'_{it}$: the maximum probability of merged state $u_i$ at time $t$

  • Upper-bounding likelihood
    – For the approximate likelihood $P'$, $P' \ge P$ holds
    – Exploit this property to guarantee exactness in search processing
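In code, the approximate likelihood is just the Viterbi recursion run on the merged model (a sketch combining the two helpers above):

```python
def approximate_likelihood(hmm, clusters, X):
    """Upper-bounding likelihood P' of X from the compact model.
    P' >= P by construction, so a model whose P' already falls below the
    search threshold can be discarded without computing its exact P."""
    return viterbi_likelihood(merge_states(hmm, clusters), X)
```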

SLIDE 14

Likelihood Approximation

Advantages

  • The best model cannot be pruned
    – The approximation gives the upper-bounding likelihood of the original model
  • Supports any model type
    – No probabilistic constraint is imposed on the approximation

SLIDE 15

Multi-granularities

  • The likelihood approximation trades off accuracy against computation time
    – As the model size increases, accuracy improves
    – But the likelihood computation cost also increases
  • Q: How to choose the granularity $g$?

SLIDE 16

Multi-granularities

  • The likelihood approximation trades off accuracy against computation time
    – As the model size increases, accuracy improves
    – But the likelihood computation cost also increases
  • Q: How to choose the granularity $g$?
  • A: Use multiple granularities (see the sketch below)
    – $h + 1$ distinct granularities that form a geometric progression $g_i = 2^i\ (i = 0, 1, \ldots, h)$, where $h = \lfloor \log_2 m \rfloor$
    – Geometrically increase the model size
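The granularity schedule is easy to precompute. This sketch lists the granularities from coarsest to finest, which is the order the search visits them in:

```python
import math

def granularities(m):
    """Granularities g_i = 2^i for i = 0..h with h = floor(log2 m),
    returned coarsest first (largest g first); g = 1 is the original model.
    E.g. m = 16 -> [16, 8, 4, 2, 1]."""
    h = int(math.log2(m))
    return [2 ** i for i in range(h, -1, -1)]
```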

SLIDE 17

Multi-granularities

  • Compute the approximate likelihood from the coarsest model as the first step
    – Coarsest model has $\lfloor m/2^h \rfloor = 1$ state
  • Prune the model if $P' < \theta$ ($\theta$: threshold); otherwise, move on to a finer model

SLIDE 18

Multi-granularities

  • Compute the approximate likelihood from the second coarsest model
    – Second coarsest model has $\lfloor m/2^{h-1} \rfloor$ states
  • Prune the model if $P' < \theta$

SLIDE 19

Multi-granularities

  • Threshold $\theta$
    – Exploit the fact that we have already found a good model of high likelihood
    – $\theta$: exact likelihood of the best-so-far candidate during search processing
    – $\theta$ is updated and increases whenever a promising model is found
    – Use $\theta$ for model pruning

SLIDE 20

Multi-granularities

  • Compute the approximate likelihood from the second coarsest model
    – Second coarsest model has $\lfloor m/2^{h-1} \rfloor$ states
  • Prune the model if $P' < \theta$; otherwise, move on to a finer model
    – $\theta$: exact likelihood of the best-so-far candidate

SLIDE 21

Multi-granularities

  • Compute the likelihood from a more accurate model
  • Prune the model if $P' < \theta$

SLIDE 22

Multi-granularities

  • Repeat until the finest granularity (the original model) is reached
  • Update the answer candidate and the best-so-far likelihood $\theta$ if $P \ge \theta$ (the whole loop is sketched below)
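Putting the pieces together, the coarse-to-fine search over the dataset looks roughly like this. A sketch: `partition(hmm, g)` stands for the precomputed clustering of a model's states at granularity g and is our placeholder, not the paper's API:

```python
def spiral_search(models, X, partition):
    """Return the highest-likelihood model for X, exactly.
    theta: exact likelihood of the best-so-far candidate (never decreases)."""
    best, theta = None, -1.0
    for hmm in models:
        p, pruned = -1.0, False
        for g in granularities(hmm.m):        # coarsest model first
            if g == 1:                         # finest level: original model
                p = viterbi_likelihood(hmm, X)
            else:
                p = approximate_likelihood(hmm, partition(hmm, g), X)
                if p < theta:                  # upper bound below threshold:
                    pruned = True              # this model cannot be the answer
                    break
        if not pruned and p >= theta:
            best, theta = hmm, p               # update best-so-far candidate
    return best, theta
```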

SLIDE 23

Multi-granularities

  • Optimize the trade-off between accuracy and computation time
    – Low-likelihood models are pruned by coarse-grained models
    – Fine-grained approximation is applied only to high-likelihood models
  • Efficiently find the best model for a large dataset
    – Exact likelihood computations are limited to the minimum necessary

SLIDE 24

Transition Pruning

  • Trellis structure has too many transitions
  • Q: How to exclude unlikely paths?

SLIDE 25

Transition Pruning

  • Trellis structure has too many transitions
  • Q: How to exclude unlikely paths?
  • A: Use two properties
    – Likelihood is monotone non-increasing (during likelihood computation)
    – Threshold is monotone non-decreasing (during search processing)

SLIDE 26

Transition Pruning

  • In likelihood computation, compute the estimate $e_{it}$
    – $e_{it}$: conservative estimate of the likelihood $p_{it}$ of state $u_i$ at time $t$

$$e_{it} = \begin{cases} p_{it} \cdot (a_{\max})^{n-t} \cdot \prod_{j=t+1}^{n} b_{\max}(x_j) & (1 \le t \le n-1) \\ p_{in} & (t = n) \end{cases}$$

where $a_{\max} = \max_{1 \le i, j \le m} a_{ij}$ and $b_{\max}(v) = \max_{1 \le i \le m} b_i(v)$

  • If $e_{it} < \theta$, prune all paths that pass through $u_i$ at time $t$
    – $\theta$: exact likelihood of the best-so-far candidate
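A sketch of the Viterbi recursion with this pruning rule folded in; pruned states are zeroed out, which removes every path through them:

```python
import numpy as np

def viterbi_with_pruning(hmm, X, theta):
    """Viterbi likelihood with transition pruning against threshold theta.
    Returns 0.0 early if every path is excluded (model cannot beat theta)."""
    n = len(X)
    a_max = hmm.A.max()
    b_max = hmm.B.max(axis=0)           # b_max(v) for each symbol v
    # future[t] = a_max^(n-1-t) * prod_{j>t} b_max(x_j): the largest factor
    # any path can still collect after (0-indexed) time t, so the estimate
    # is e_{it} = p_{it} * future[t]
    future = np.ones(n)
    for t in range(n - 2, -1, -1):
        future[t] = future[t + 1] * a_max * b_max[X[t + 1]]
    p = hmm.pi * hmm.B[:, X[0]]
    p[p * future[0] < theta] = 0.0      # prune states with e_{i1} < theta
    for t in range(1, n):
        if not p.any():
            return 0.0                  # all paths excluded: terminate early
        p = (p[:, None] * hmm.A).max(axis=0) * hmm.B[:, X[t]]
        p[p * future[t] < theta] = 0.0  # prune states with e_{it} < theta
    return p.max()
```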

SLIDE 27

Transition Pruning

  • Terminate the likelihood computation if all the paths are excluded
  • Especially efficient for long sequences
  • Also applicable to the approximate likelihood computation

SLIDE 28

Accuracy and Complexity

  • SPIRAL needs the same order of memory space, while it can be up to $m^2$ times faster

|         | Accuracy             | Memory space  | Computation time                   |
|---------|----------------------|---------------|------------------------------------|
| Viterbi | Guarantees exactness | $O(m^2 + ms)$ | $O(nm^2)$                          |
| SPIRAL  | Guarantees exactness | $O(m^2 + ms)$ | At least $O(n)$, at most $O(nm^2)$ |

SLIDE 29

Experimental Evaluation

  • Setup
    – Intel Core 2 1.66 GHz, 2 GB memory
  • Datasets
    – EEG, Chromosome, Traffic
  • Evaluation
    – Mainly computation time
    – Ergodic HMMs
    – Compared with the Viterbi algorithm and beam search
      • Beam search: a popular technique, but it does not guarantee exactness

SLIDE 30

Experimental Evaluation

  • Evaluation
    – Wall clock time versus number of states
    – Wall clock time versus number of models
    – Effect of likelihood approximation
    – Effect of transition pruning
    – SPIRAL vs. beam search

SLIDE 31

Experimental Evaluation

  • Wall clock time versus number of states

– EEG: up to 200 times faster


SLIDE 32

Experimental Evaluation

  • Wall clock time versus number of states

– Chromosome: up to 150 times faster


SLIDE 33

Experimental Evaluation

  • Wall clock time versus number of states

– Traffic: up to 500 times faster


SLIDE 34

Experimental Evaluation

  • Evaluation
    – Wall clock time versus number of states
    – Wall clock time versus number of models
    – Effect of likelihood approximation
    – Effect of transition pruning
    – SPIRAL vs. beam search

SLIDE 35

Experimental Evaluation

  • Wall clock time versus number of models

– EEG: up to 200 times faster


SLIDE 36

Experimental Evaluation

  • Evaluation
    – Wall clock time versus number of states
    – Wall clock time versus number of models
    – Effect of likelihood approximation
    – Effect of transition pruning
    – SPIRAL vs. beam search

SLIDE 37

Experimental Evaluation

  • Effect of likelihood approximation
    – Most models are pruned by the coarser approximations

SLIDE 38

Experimental Evaluation

  • Evaluation
    – Wall clock time versus number of states
    – Wall clock time versus number of models
    – Effect of likelihood approximation
    – Effect of transition pruning
    – SPIRAL vs. beam search

SLIDE 39

Experimental Evaluation

  • Effect of transition pruning
    – SPIRAL finds the highest-likelihood model more efficiently with transition pruning

SLIDE 40

Experimental Evaluation

  • Evaluation
    – Wall clock time versus number of states
    – Wall clock time versus number of models
    – Effect of likelihood approximation
    – Effect of transition pruning
    – SPIRAL vs. beam search

SLIDE 41

Experimental Evaluation

  • SPIRAL vs. beam search
    – SPIRAL is significantly faster while it guarantees exactness

[Figure: wall clock time (SPIRAL is up to 27 times faster) and likelihood error ratio (note: SPIRAL gives no error)]

SLIDE 42

Conclusion

  • Design goals:
    – High-speed search
      • SPIRAL is significantly (up to 500 times) faster
    – Exactness
      • We prove that it guarantees exactness
    – No restriction on model type
      • It can handle any type of HMM
  • SPIRAL achieves all of these goals