SLIDE 1

SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models

Yasuhiro Fujiwara (NTT Cyber Space Labs)
Yasushi Sakurai (NTT Communication Science Labs)
Masashi Yamamuro (NTT Cyber Space Labs)

Speaker: Yasushi Sakurai


SLIDE 2

Motivation

  • HMM (Hidden Markov Model)
    – Mental task classification
      • Understand human brain functions with EEG signals
    – Biological analysis
      • Predict organism functions with DNA sequences
    – Many other applications
      • Speech recognition, image processing, etc.
  • Goal
    – Fast and exact identification of the highest-likelihood model for large datasets

SLIDE 3

Mini-introduction to HMM

  • Observation sequence $X = (x_1, x_2, \ldots, x_n)$ is a probabilistic function of the states
  • Consists of three sets of parameters:
    – Initial state probability: $\pi = \{\pi_i\}\ (1 \le i \le m)$
      • Probability of being in state $u_i$ at time $t = 1$
    – State transition probability: $a = \{a_{ij}\}\ (1 \le i, j \le m)$
      • Probability of the transition from state $u_i$ to state $u_j$
    – Symbol probability: $b = \{b_i(v)\}\ (1 \le i \le m)$
      • Probability of emitting symbol $v$ in state $u_i$
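To make the notation concrete, here is a minimal Python container for these three parameter sets (a sketch; the class and attribute names are ours, not from the paper):

```python
import numpy as np

class HMM:
    """Discrete-output HMM with m states u_1..u_m and s output symbols.

    pi[i]   : initial probability pi_i of state u_i          -- shape (m,)
    A[i, j] : transition probability a_ij from u_i to u_j    -- shape (m, m)
    B[i, v] : probability b_i(v) of emitting symbol v in u_i -- shape (m, s)
    """
    def __init__(self, pi, A, B):
        self.pi = np.asarray(pi, dtype=float)
        self.A = np.asarray(A, dtype=float)
        self.B = np.asarray(B, dtype=float)
        self.m = self.pi.shape[0]   # number of states
        self.s = self.B.shape[1]    # number of output symbols
```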

SLIDE 4

Mini-introduction to HMM

  • HMM types
    – Ergodic HMM
      • Every state can be reached from every other state
    – Left-right HMM
      • Transitions to lower-numbered states are prohibited
      • Always begins with the first state
      • Transitions are limited to a small number of states

[Figure: an ergodic HMM topology and a left-right HMM topology]

SLIDE 5

Mini-introduction to HMM

  • Viterbi path in the trellis structure
    – Trellis structure: states lie on the vertical axis; the sequence is aligned along the horizontal axis
    – Viterbi path: the state sequence that gives the maximum likelihood

[Figure: trellis structure with states $u_1, u_2, \ldots, u_m$ on the vertical axis, symbols $x_1, x_2, \ldots, x_n$ on the horizontal axis, and the Viterbi path marked]

SLIDE 6

Mini-introduction to HMM

  • Viterbi algorithm
    – Dynamic programming approach
    – Maximize the probabilities from the previous states

$$P = \max_{1 \le i \le m}(p_{in}), \qquad p_{it} = \begin{cases} \max_{1 \le j \le m}(p_{j(t-1)} \cdot a_{ji}) \cdot b_i(x_t) & (2 \le t \le n) \\ \pi_i \cdot b_i(x_1) & (t = 1) \end{cases}$$

$p_{it}$: the maximum probability of state $u_i$ at time $t$
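The recursion translates directly into a few lines of NumPy. This sketch reuses the HMM container from above (our naming) and returns the Viterbi likelihood $P$ of a sequence of symbol indices:

```python
def viterbi_likelihood(hmm, X):
    """Viterbi likelihood P = max_i p_{i,n} of symbol sequence X under hmm."""
    # t = 1: p_{i,1} = pi_i * b_i(x_1)
    p = hmm.pi * hmm.B[:, X[0]]
    # t = 2..n: p_{i,t} = max_j (p_{j,t-1} * a_{ji}) * b_i(x_t)
    for x in X[1:]:
        p = (p[:, None] * hmm.A).max(axis=0) * hmm.B[:, x]
    return p.max()
```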

SLIDE 7

Problem Definition

  • Given
    – HMM dataset
    – Sequence $X = (x_1, x_2, \ldots, x_n)$ of arbitrary length
  • Find
    – The highest-likelihood model in the dataset, estimated with respect to $X$

SLIDE 8

Why not ‘Naïve’?

  • Naïve solution
    1. Compute the likelihood of every model using the Viterbi algorithm
    2. Then choose the highest-likelihood model
  • But…
    – High search cost: $O(nm^2)$ time for every model ($m$: number of states, $n$: sequence length of $X$)
    – Prohibitive for large HMM datasets
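As a baseline, the naïve solution is just a loop over the dataset (a sketch built on the viterbi_likelihood helper above):

```python
def naive_search(models, X):
    """Score every model with the full Viterbi algorithm and keep the best.
    Costs O(n m^2) per model -- prohibitive for large HMM datasets."""
    best_model, best_p = None, -1.0
    for hmm in models:
        p = viterbi_likelihood(hmm, X)
        if p > best_p:
            best_model, best_p = hmm, p
    return best_model, best_p
```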

SLIDE 9

Our Solution, SPIRAL

  • Requirements:

– High-speed search

  • Identify the model efficiently

– Exactness

  • Accuracy is not sacrificed

– No restriction on model type

  • Achieves high search performance for any type of model

SLIDE 10

Likelihood Approximation

[Figure: reminder of the naïve Viterbi computation on the original model]

SLIDE 11

Likelihood Approximation

  • Create compact models (reduce the model size)
    – For given $m$ states and granularity $g$, create $m/g$ states by merging ‘similar’ states

[Figure: an $m$-state model and its $m/g$-state compact model on a length-$n$ trellis]

SLIDE 12

Likelihood Approximation

  • Use the vector $F_i$ of state $u_i$ for clustering ($s$: number of symbols)

$$F_i = (\pi_i;\ a_{i1}, \ldots, a_{im}, a_{1i}, \ldots, a_{mi};\ b_i(v_1), \ldots, b_i(v_s))$$

  • Merge all the states $u_i$ in a cluster $C$ and create a new state $u_C$
  • Choose the highest probability among the probabilities of the $u_i$:

$$\pi'_C = \max_{u_i \in C} \pi_i,\quad a'_{Cj} = \max_{u_i \in C,\, u_j \notin C} a_{ij},\quad a'_{CC} = \max_{u_i, u_k \in C} a_{ik},\quad a'_{jC} = \max_{u_i \in C,\, u_j \notin C} a_{ji},\quad b'_C(v) = \max_{u_i \in C} b_i(v)$$

→ Obtain the upper-bounding likelihood
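Given a partition of the states into clusters, the merge step is a handful of max-reductions. A sketch follows; how the clusters are found (e.g. by running a standard clustering method on the vectors $F_i$) is left out:

```python
import numpy as np

def merge_states(hmm, clusters):
    """Build the compact model for a partition of hmm's states.
    clusters: list of lists of state indices.  Every merged parameter takes
    the max over its cluster, which is what makes the compact model's
    likelihood an upper bound on the original model's likelihood."""
    k = len(clusters)
    pi = np.empty(k)
    A = np.empty((k, k))
    B = np.empty((k, hmm.s))
    for c, Sc in enumerate(clusters):
        pi[c] = hmm.pi[Sc].max()          # pi'_C = max_{u_i in C} pi_i
        B[c] = hmm.B[Sc].max(axis=0)      # b'_C(v) = max_{u_i in C} b_i(v)
        for d, Sd in enumerate(clusters):
            # covers a'_{CC} (d == c) as well as the cross-cluster maxima
            A[c, d] = hmm.A[np.ix_(Sc, Sd)].max()
    return HMM(pi, A, B)
```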

SLIDE 13

Likelihood Approximation

  • Compute the approximate likelihood $P'$ from the compact model (with $m'$ merged states):

$$P' = \max_{1 \le i \le m'}(p'_{in}), \qquad p'_{it} = \begin{cases} \max_{1 \le j \le m'}(p'_{j(t-1)} \cdot a'_{ji}) \cdot b'_i(x_t) & (2 \le t \le n) \\ \pi'_i \cdot b'_i(x_1) & (t = 1) \end{cases}$$

$p'_{it}$: the maximum probability of merged state $u_i$ at time $t$

  • Upper-bounding likelihood
    – For the approximate likelihood $P'$, $P' \ge P$ holds
    – Exploit this property to guarantee exactness in search processing
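In code, the approximate likelihood is just the Viterbi recursion run on the merged model (a sketch combining the two helpers above):

```python
def approximate_likelihood(hmm, clusters, X):
    """Upper-bounding likelihood P' of X from the compact model.
    P' >= P by construction, so a model whose P' already falls below the
    search threshold can be discarded without computing its exact P."""
    return viterbi_likelihood(merge_states(hmm, clusters), X)
```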

SLIDE 14

Likelihood Approximation

Advantages

  • The best model cannot be pruned
    – The approximation gives the upper-bounding likelihood of the original model
  • Supports any model type
    – No probabilistic constraint is imposed on the approximation

SLIDE 15

Multi-granularities

  • The likelihood approximation trades off accuracy against computation time
    – As the model size increases, accuracy improves
    – But the likelihood computation cost also increases
  • Q: How to choose the granularity $g$?

SLIDE 16

Multi-granularities

  • The likelihood approximation trades off accuracy against computation time
    – As the model size increases, accuracy improves
    – But the likelihood computation cost also increases
  • Q: How to choose the granularity $g$?
  • A: Use multiple granularities (see the sketch below)
    – $h + 1$ distinct granularities that form a geometric progression $g_i = 2^i\ (i = 0, 1, \ldots, h)$, where $h = \lfloor \log_2 m \rfloor$
    – Geometrically increase the model size
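The granularity schedule is easy to precompute. This sketch lists the granularities from coarsest to finest, which is the order the search visits them in:

```python
import math

def granularities(m):
    """Granularities g_i = 2^i for i = 0..h with h = floor(log2 m),
    returned coarsest first (largest g first); g = 1 is the original model.
    E.g. m = 16 -> [16, 8, 4, 2, 1]."""
    h = int(math.log2(m))
    return [2 ** i for i in range(h, -1, -1)]
```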

SLIDE 17

Multi-granularities

  • Compute the approximate likelihood from the coarsest model as the first step
    – Coarsest model has $\lfloor m/2^h \rfloor = 1$ state
  • Prune the model if $P' < \theta$ ($\theta$: threshold); otherwise, move on to a finer model

SLIDE 18

Multi-granularities

  • Compute the approximate likelihood from the second coarsest model
    – Second coarsest model has $\lfloor m/2^{h-1} \rfloor$ states
  • Prune the model if $P' < \theta$

SLIDE 19

Multi-granularities

  • Threshold $\theta$
    – Exploit the fact that we have already found a good model of high likelihood
    – $\theta$: exact likelihood of the best-so-far candidate during search processing
    – $\theta$ is updated and increases whenever a promising model is found
    – Use $\theta$ for model pruning

SLIDE 20

Multi-granularities

  • Compute the approximate likelihood from the second coarsest model
    – Second coarsest model has $\lfloor m/2^{h-1} \rfloor$ states
  • Prune the model if $P' < \theta$; otherwise, move on to a finer model
    – $\theta$: exact likelihood of the best-so-far candidate

SLIDE 21

Multi-granularities

  • Compute the likelihood from a more accurate model
  • Prune the model if $P' < \theta$

SLIDE 22

Multi-granularities

  • Repeat until the finest granularity (the original model) is reached
  • Update the answer candidate and the best-so-far likelihood $\theta$ if $P \ge \theta$ (the whole loop is sketched below)
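Putting the pieces together, the coarse-to-fine search over the dataset looks roughly like this. A sketch: `partition(hmm, g)` stands for the precomputed clustering of a model's states at granularity g and is our placeholder, not the paper's API:

```python
def spiral_search(models, X, partition):
    """Return the highest-likelihood model for X, exactly.
    theta: exact likelihood of the best-so-far candidate (never decreases)."""
    best, theta = None, -1.0
    for hmm in models:
        p, pruned = -1.0, False
        for g in granularities(hmm.m):        # coarsest model first
            if g == 1:                         # finest level: original model
                p = viterbi_likelihood(hmm, X)
            else:
                p = approximate_likelihood(hmm, partition(hmm, g), X)
                if p < theta:                  # upper bound below threshold:
                    pruned = True              # this model cannot be the answer
                    break
        if not pruned and p >= theta:
            best, theta = hmm, p               # update best-so-far candidate
    return best, theta
```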

SLIDE 23

Multi-granularities

  • Optimize the trade-off between accuracy and computation time
    – Low-likelihood models are pruned by coarse-grained models
    – Fine-grained approximation is applied only to high-likelihood models
  • Efficiently find the best model for a large dataset
    – Exact likelihood computations are limited to the minimum necessary

SLIDE 24

Transition Pruning

  • Trellis structure has too many transitions
  • Q: How to exclude unlikely paths?

SLIDE 25

Transition Pruning

  • Trellis structure has too many transitions
  • Q: How to exclude unlikely paths?
  • A: Use two properties
    – Likelihood is monotone non-increasing (during likelihood computation)
    – Threshold is monotone non-decreasing (during search processing)

SLIDE 26

Transition Pruning

  • In likelihood computation, compute the estimate $e_{it}$
    – $e_{it}$: conservative estimate of the likelihood $p_{it}$ of state $u_i$ at time $t$

$$e_{it} = \begin{cases} p_{it} \cdot (a_{\max})^{n-t} \cdot \prod_{j=t+1}^{n} b_{\max}(x_j) & (1 \le t \le n-1) \\ p_{in} & (t = n) \end{cases}$$

where $a_{\max} = \max_{1 \le i, j \le m} a_{ij}$ and $b_{\max}(v) = \max_{1 \le i \le m} b_i(v)$

  • If $e_{it} < \theta$, prune all paths that pass through $u_i$ at time $t$
    – $\theta$: exact likelihood of the best-so-far candidate
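A sketch of the Viterbi recursion with this pruning rule folded in; pruned states are zeroed out, which removes every path through them:

```python
import numpy as np

def viterbi_with_pruning(hmm, X, theta):
    """Viterbi likelihood with transition pruning against threshold theta.
    Returns 0.0 early if every path is excluded (model cannot beat theta)."""
    n = len(X)
    a_max = hmm.A.max()
    b_max = hmm.B.max(axis=0)           # b_max(v) for each symbol v
    # future[t] = a_max^(n-1-t) * prod_{j>t} b_max(x_j): the largest factor
    # any path can still collect after (0-indexed) time t, so the estimate
    # is e_{it} = p_{it} * future[t]
    future = np.ones(n)
    for t in range(n - 2, -1, -1):
        future[t] = future[t + 1] * a_max * b_max[X[t + 1]]
    p = hmm.pi * hmm.B[:, X[0]]
    p[p * future[0] < theta] = 0.0      # prune states with e_{i1} < theta
    for t in range(1, n):
        if not p.any():
            return 0.0                  # all paths excluded: terminate early
        p = (p[:, None] * hmm.A).max(axis=0) * hmm.B[:, X[t]]
        p[p * future[t] < theta] = 0.0  # prune states with e_{it} < theta
    return p.max()
```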

SLIDE 27

Transition Pruning

  • Terminate the likelihood computation if all the paths are excluded
  • Especially efficient for long sequences
  • Also applicable to the approximate likelihood computation

SLIDE 28

Accuracy and Complexity

  • SPIRAL needs the same order of memory space, while it can be up to $m^2$ times faster

|         | Accuracy             | Memory space  | Computation time                   |
|---------|----------------------|---------------|------------------------------------|
| Viterbi | Guarantees exactness | $O(m^2 + ms)$ | $O(nm^2)$                          |
| SPIRAL  | Guarantees exactness | $O(m^2 + ms)$ | At least $O(n)$, at most $O(nm^2)$ |

SLIDE 29

Experimental Evaluation

  • Setup
    – Intel Core 2 1.66 GHz, 2 GB memory
  • Datasets
    – EEG, Chromosome, Traffic
  • Evaluation
    – Mainly computation time
    – Ergodic HMMs
    – Compared with the Viterbi algorithm and beam search
      • Beam search: a popular technique, but it does not guarantee exactness

SLIDE 30

Experimental Evaluation

  • Evaluation
    – Wall clock time versus number of states
    – Wall clock time versus number of models
    – Effect of likelihood approximation
    – Effect of transition pruning
    – SPIRAL vs. beam search

SLIDE 31

Experimental Evaluation

  • Wall clock time versus number of states

– EEG: up to 200 times faster


SLIDE 32

Experimental Evaluation

  • Wall clock time versus number of states

– Chromosome: up to 150 times faster


SLIDE 33

Experimental Evaluation

  • Wall clock time versus number of states

– Traffic: up to 500 times faster


SLIDE 34

Experimental Evaluation

  • Evaluation
    – Wall clock time versus number of states
    – Wall clock time versus number of models
    – Effect of likelihood approximation
    – Effect of transition pruning
    – SPIRAL vs. beam search

SLIDE 35

Experimental Evaluation

  • Wall clock time versus number of models

– EEG: up to 200 times faster


SLIDE 36

Experimental Evaluation

  • Evaluation
    – Wall clock time versus number of states
    – Wall clock time versus number of models
    – Effect of likelihood approximation
    – Effect of transition pruning
    – SPIRAL vs. beam search

SLIDE 37

Experimental Evaluation

  • Effect of likelihood approximation
    – Most models are pruned by the coarser approximations

SLIDE 38

Experimental Evaluation

  • Evaluation
    – Wall clock time versus number of states
    – Wall clock time versus number of models
    – Effect of likelihood approximation
    – Effect of transition pruning
    – SPIRAL vs. beam search

SLIDE 39

Experimental Evaluation

  • Effect of transition pruning
    – SPIRAL finds the highest-likelihood model more efficiently with transition pruning

SLIDE 40

Experimental Evaluation

  • Evaluation
    – Wall clock time versus number of states
    – Wall clock time versus number of models
    – Effect of likelihood approximation
    – Effect of transition pruning
    – SPIRAL vs. beam search

SLIDE 41

Experimental Evaluation

  • SPIRAL vs. beam search
    – SPIRAL is significantly faster while it guarantees exactness

[Figure: wall clock time (SPIRAL is up to 27 times faster) and likelihood error ratio (note: SPIRAL gives no error)]

SLIDE 42

Conclusion

  • Design goals:
    – High-speed search
      • SPIRAL is significantly (up to 500 times) faster
    – Exactness
      • We prove that it guarantees exactness
    – No restriction on model type
      • It can handle any type of HMM
  • SPIRAL achieves all of these goals