Pattern Recognition – Part 8: Hidden Markov Models (HMMs)


SLIDE 1

Pattern Recognition

Gerhard Schmidt | Christian-Albrechts-Universität zu Kiel | Faculty of Engineering | Institute of Electrical and Information Engineering | Digital Signal Processing and System Theory

Part 8: Hidden Markov Models (HMMs)

SLIDE 2

Contents

❑ Motivation
❑ Fundamentals
    ❑ The "hidden" part of the model
    ❑ The inner family of random processes
❑ Fundamental problems of Hidden Markov Models
    ❑ Efficient calculation of sequence probabilities
    ❑ Efficient calculation of the most probable sequence
    ❑ Calculation (estimation) of the model parameters

SLIDE 3

Motivation

Modeling of temporal dependencies

❑ In the previous approaches (vector quantization, Gaussian mixture models), only the probability distribution of multi-dimensional data vectors was analyzed and used. Their temporal progression was assumed to be uncorrelated.
❑ If the temporal progression of the observed data vectors is also to be analyzed, the previous models can be extended by a temporal component. This new component will again be derived on a statistical basis.
❑ In hidden Markov models, two (or three) statistical components are nested.
❑ While both discrete and continuous probability distributions can be used for the multivariate amplitude distributions, the temporal modeling is done discretely.

SLIDE 4

Literature

Hidden Markov Models

❑ B. Pfister, T. Kaufmann: Sprachverarbeitung, Springer, 2008 (in German)
❑ C. M. Bishop: Pattern Recognition and Machine Learning, Springer, 2006
❑ L. Rabiner, B. H. Juang: Fundamentals of Speech Recognition, Prentice Hall, 1993
❑ B. Gold, N. Morgan: Speech and Audio Signal Processing, Wiley, 2000

SLIDE 5

Common definitions – Part 1

Hidden part of the model (random process) in the Markov model

❑ The hidden part of the model is assumed to be a Markov process with N states S_1, …, S_N. These states are not observable. For the state transitions from one discrete state to another, probabilities are specified.
❑ The hidden states govern a second family of random processes, which result in the observable sequence of vectors x_1, x_2, …, x_T.
❑ The sequence of hidden states is denoted as
   s = (s_1, s_2, …, s_T),
where the elements each correspond to one of the hidden states, respectively: s_n ∈ {S_1, …, S_N}.

SLIDE 6

Common definitions – Part 2

Hidden part of the model (random process) in the Markov model

❑ As soon as the model gets into a new state, the model generates an observation vector. Its distribution is dependent only on the new state s_n = S_j, but not on previous ones. In the following, this emission probability is denoted as
   b_j(x_n) = P(x_n | s_n = S_j).
❑ The state transitions are specified (surprise!) by probabilities. These transition probabilities depend only on the current transition's source and target state, but not on previous states.

(Figure: hidden part of the model with its transition probabilities and emission probabilities.)

SLIDE 7

Common definitions – Part 3

Hidden part of the model (random process) in the Markov model

❑ The transition probabilities are abbreviated as follows,
   a_ij = P(s_{n+1} = S_j | s_n = S_i).
❑ The initial and final states of an HMM (here denoted as S_1 and S_N) are called initial state and final state. Both states are modeled as "non-emitting". The direct transition from the initial to the final state is forbidden – no observation would be created in this case. I.e., for the transition probabilities, the following holds:
   a_1N = 0   (direct transition from initial to final state),
   a_Nj = 0 for all j   (transitions that leave the final state),
   a_i1 = 0 for all i   (transitions that enter the initial state).

SLIDE 8

Common definitions – Part 4

(Figure: states of the hidden part of the model (random process) with their transition probabilities and emission probabilities.)

SLIDE 9

Common definitions – Part 5

Hidden part of the model (random process) in the Markov model

❑ The transition probabilities of the model are combined in a transition matrix
   A = (a_ij),   i, j ∈ {1, …, N}.
❑ The constraints are:
   a_ij ≥ 0   and   Σ_j a_ij = 1 for every state S_i.

SLIDE 10

Types of hidden Markov models – Part 1

Hidden Markov models of the type “left to right”

(Figure: structure of a left-to-right Markov model and its transition matrix.)

❑ Initial, final, and three emitting states are shown.
❑ Transitions from right to left are not possible; the transition matrix is therefore upper triangular.

SLIDE 11

Types of hidden Markov models – Part 2

Linear hidden Markov models

(Figure: structure of a linear hidden Markov model and its transition matrix; a concrete example is sketched below.)

❑ Initial, final, and three emitting states are shown.
❑ Only transitions from a state to itself and to its right neighbor are possible. Consequently, a sequence of observations must have at least 3 observations (one per emitting state).
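To make the banded structure concrete, a linear model with three emitting states could, for example, use the following (hypothetical) probabilities, where a_init collects the transitions out of the non-emitting initial state and a_final the transitions into the non-emitting final state:

   A = [ 0.6  0.4  0.0
         0.0  0.7  0.3
         0.0  0.0  0.8 ],   a_init = [1.0, 0.0, 0.0],   a_final = [0.0, 0.0, 0.2].

Each row of A together with the corresponding entry of a_final sums to one, as required by the constraints from Part 5.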

SLIDE 12

Common definitions – Part 6

Generation of observations by a random process

❑ In order to generate the observation vectors, another random process is assigned to each state. It can be modeled either as a discrete or as a continuous process.
❑ If the generation of the observations is modeled as N−2 discrete processes (one per emitting state) and each process may have K discrete observation states, then the applied probabilities can again be combined in a matrix
   B = (b_jk) with b_jk = P(observation symbol k | state S_j).
Again, the following constraints hold:
   b_jk ≥ 0   and   Σ_k b_jk = 1 for every emitting state S_j.

SLIDE 13

Common definitions – Part 7

Generation of observations by a random process

❑ If the generation of observations is modeled as continuous processes using multivariate Gaussian densities (GMMs), then the applied probabilities can be defined as follows,
   b_j(x) = Σ_{k=1…K} c_jk · N(x; μ_jk, Σ_jk),
assuming that per state K Gaussian distributions are used. The Gaussian distributions N(x; μ_jk, Σ_jk) are defined as in the GMM lecture, with mean vectors μ_jk and covariance matrices Σ_jk; the weights c_jk are non-negative and sum to one per state.
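As a small illustration, the emission probability b_j(x) of one state can be evaluated as in the following minimal sketch (hypothetical parameter names, using SciPy's Gaussian density):

    from scipy.stats import multivariate_normal

    def gmm_emission(x, weights, means, covs):
        """Evaluate b_j(x) = sum_k c_jk N(x; mu_jk, Sigma_jk) for one state."""
        return float(sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
                         for w, m, c in zip(weights, means, covs)))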

SLIDE 14

Common definitions – Part 8

Generation of observations by a random process

(Figure: example HMM with initial state, final state, and the Gaussian mixture models of the first and second (non-initial) states.)
SLIDE 15

Trellis diagrams – Part 1

We assume an HMM of this structure. The initial state always leads to the first (non-initial) state. (Trellis diagram; axes: time index vs. state.)

SLIDE 16

Trellis diagrams – Part 2

Based on state 1, only transitions to the states 1, 2, and 3 are possible. (Trellis diagram; axes: time index vs. state.)

SLIDE 17

Trellis diagrams – Part 3

All possible transitions based on the first state are plotted. (Trellis diagram; axes: time index vs. state.)

SLIDE 18

Trellis diagrams – Part 4

All possible transitions based on the second state are plotted. (Trellis diagram; axes: time index vs. state.)

SLIDE 19

Trellis diagrams – Part 5

All possible transitions based on the third state are plotted. (Trellis diagram; axes: time index vs. state.)

SLIDE 20

Trellis diagrams – Part 6

All possible transitions from time index 2 to time index 3 are plotted. (Trellis diagram; axes: time index vs. state.)

SLIDE 21

Trellis diagrams – Part 7

Now, all possible transitions of an observation sequence of length 10 are plotted. (Trellis diagram; axes: time index vs. state.)

SLIDE 22

Trellis diagrams – Part 8

Meaning of edges and nodes

❑ The transition probabilities are usually denoted at the edges.
❑ The emission probability, i.e. the probability that the observed vector is produced by the corresponding state, is denoted at the nodes.

SLIDE 23

Essential problems of hidden Markov models

Evaluation problem
❑ The probability that the hidden Markov model creates the (given) observation sequence is to be calculated.
❑ In order to calculate this probability, all possible state sequences have to be taken into account. The direct calculation (summing over all possible state sequences) would thus be very time consuming.

Decoding problem
❑ Besides the probability calculated above, the state sequence that creates the observation sequence with the highest probability is also of interest.

Estimation problem
❑ Based on a huge database, all parameters of the hidden Markov model are to be estimated.

SLIDE 24

Evaluation problem – Part 1

Evaluation problem
❑ The probability P(X | λ) that the hidden Markov model λ creates the (given) observation sequence X is to be found.
❑ The wanted probability can be calculated by summing up the conditional production probabilities over all possible state sequences,
   P(X | λ) = Σ_s P(X, s | λ).
❑ This can be written as follows,
   P(X, s | λ) = P(X | s, λ) · P(s | λ).
❑ In the following, we will try to calculate the two conditional probabilities separately.

SLIDE 25

Evaluation problem – Part 2

Evaluation problem
❑ In a first step, the production probability is calculated that results from the assumption that the state sequence is known. We use the fact that the probability of an observation depends only on the current state of the HMM – but not on previous or subsequent states:
   P(X | s, λ) = Π_{n=1…T} b_{s_n}(x_n).
❑ The probability that the sequence s has been selected can be evaluated as follows:
   P(s | λ) = a_{1,s_1} · Π_{n=2…T} a_{s_{n−1},s_n} · a_{s_T,N}.

SLIDE 26

Evaluation problem – Part 3

Evaluation problem
❑ The production probability results in
   P(X | λ) = Σ_s a_{1,s_1} b_{s_1}(x_1) · Π_{n=2…T} a_{s_{n−1},s_n} b_{s_n}(x_n) · a_{s_T,N}.
❑ The problem when directly calculating the production probability is the fact that per time index, there are N−2 possible (emitting) states. As a result, for the overall sequence, (N−2)^T possible paths exist, so the number of summands is no longer manageable.
❑ As a remedy, the so-called forward algorithm is used. For this purpose, the so-called forward probability is defined in a first step,
   α_i(n) = P(X^(n), s_n = S_i | λ).
This is the probability that at time index n, the state S_i is active and the "shortened" observation sequence X^(n) could be observed up to now.

SLIDE 27

Evaluation problem – Part 4

Evaluation problem
❑ The upper indices specify the shortened versions of the observation matrix and of the state sequence, respectively:
   X^(n) = (x_1, …, x_n),   s^(n) = (s_1, …, s_n).
❑ The forward probability can be determined by summing over all possible shortened state sequences that end in state S_i at time index n,
   α_i(n) = Σ_{s^(n−1)} P(X^(n), s^(n−1), s_n = S_i | λ).

SLIDE 28

Evaluation problem – Part 5

Evaluation problem

(Figure: illustration of the forward probabilities in the trellis diagram; axes: time index vs. state.)

SLIDE 29

Evaluation problem – Part 6

Evaluation problem
❑ Because of the independence from the previous states, the forward probabilities can be calculated recursively as follows,
   α_j(n) = [ Σ_i α_i(n−1) · a_ij ] · b_j(x_n).
❑ The initialization is done as follows,
   α_j(1) = a_1j · b_j(x_1).
❑ Hereby, the production probability of the observed sequence can be determined by summation over the forward probabilities at the last time index,
   P(X | λ) = Σ_i α_i(T) · a_iN.
❑ Note that the computational complexity now grows only linearly with the sequence length (instead of exponentially as with the direct calculation).
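As an illustration, here is a minimal sketch of this recursion in Python (a hypothetical helper, not from the lecture; it assumes the non-emitting initial and final states are represented by the vectors a_init and a_final, and that B collects the emission probabilities b_j(x_n) of the N emitting states):

    import numpy as np

    def forward(a_init, A, a_final, B):
        """Forward algorithm for one observation sequence.

        a_init  : (N,)   probabilities of entering each emitting state from the initial state
        A       : (N, N) transition probabilities between the N emitting states
        a_final : (N,)   probabilities of moving from each emitting state to the final state
        B       : (T, N) B[n, j] = b_j(x_n), emission probability of observation n in state j
        Returns P(X | lambda).
        """
        T, N = B.shape
        alpha = np.zeros((T, N))
        alpha[0] = a_init * B[0]                  # initialization: alpha_j(1) = a_1j b_j(x_1)
        for n in range(1, T):                     # recursion over the time index
            alpha[n] = (alpha[n - 1] @ A) * B[n]  # alpha_j(n) = [sum_i alpha_i(n-1) a_ij] b_j(x_n)
        return float(alpha[-1] @ a_final)         # termination: sum_i alpha_i(T) a_iN

In practice the products quickly underflow for long sequences; real implementations scale the α values per time index or work with log probabilities.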

SLIDE 30

Decoding problem – Part 1

Decoding problem
❑ Besides the probability that the hidden Markov model created the observation vector sequence X, some applications require the most probable state sequence. The latter can be defined as follows,
   s_opt = argmax_s P(s | X, λ).
❑ The conditional probability mentioned above can be rewritten,
   P(s | X, λ) = P(X, s | λ) / P(X | λ).
❑ Because P(X | λ) depends only on the (given) observation sequence, P(X, s | λ) can be optimized instead. By this modification of the cost function, quantities similar to those of the previous problem can be used.

SLIDE 31

Decoding problem – Part 2

Decoding problem
❑ The most probable state sequence can be calculated efficiently using the so-called Viterbi algorithm. In analogy to the treatment of the evaluation problem, the joint probability of the shortened observation vector sequence and the optimal shortened state sequence is defined,
   δ_i(n) = max_{s^(n−1)} P(X^(n), s^(n−1), s_n = S_i | λ).
❑ This probability can again be computed in a recursive way,
   δ_j(n) = max_i [ δ_i(n−1) · a_ij ] · b_j(x_n).
❑ For each time index and each state, the index of the state that induced the maximum probability has to be stored, so that the optimal path can be backtracked later on.

SLIDE 32

Decoding problem – Part 3

Summary of the Viterbi algorithm

❑ Initialization:
   δ_j(1) = a_1j · b_j(x_1).
❑ Recursion (iteration):
   δ_j(n) = max_i [ δ_i(n−1) · a_ij ] · b_j(x_n),   ψ_j(n) = argmax_i [ δ_i(n−1) · a_ij ].
❑ Termination:
   P* = max_i [ δ_i(T) · a_iN ],   s*_T = argmax_i [ δ_i(T) · a_iN ].
❑ Backtracking of the optimal state sequence:
   s*_n = ψ_{s*_{n+1}}(n+1) for n = T−1, …, 1.
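A matching sketch of the Viterbi algorithm, using the same hypothetical conventions as the forward() sketch above (and, like it, usually run in the log domain in practice):

    import numpy as np

    def viterbi(a_init, A, a_final, B):
        """Most probable state sequence for one observation sequence."""
        T, N = B.shape
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)        # back-pointers
        delta[0] = a_init * B[0]                 # initialization
        for n in range(1, T):                    # recursion
            scores = delta[n - 1][:, None] * A   # scores[i, j] = delta_i(n-1) * a_ij
            psi[n] = scores.argmax(axis=0)       # best predecessor per state
            delta[n] = scores.max(axis=0) * B[n]
        final_scores = delta[-1] * a_final       # termination
        path = np.zeros(T, dtype=int)
        path[-1] = final_scores.argmax()
        for n in range(T - 2, -1, -1):           # backtracking
            path[n] = psi[n + 1, path[n + 1]]
        return float(final_scores.max()), path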

SLIDE 33

Decoding problem – Part 4

Initialization. (Trellis diagram; axes: time index vs. state.)

SLIDE 34

Decoding problem – Part 5

Recursion for the first (non-initial) state. (Trellis diagram; axes: time index vs. state.)

SLIDE 35

Decoding problem – Part 6

Recursion for the first (non-initial) state (continued). (Trellis diagram; axes: time index vs. state.)

SLIDE 36

Decoding problem – Part 7

Recursion for the second state. (Trellis diagram; axes: time index vs. state.)

SLIDE 37

Decoding problem – Part 8

Recursion for the second state (continued). (Trellis diagram; axes: time index vs. state.)

SLIDE 38

Decoding problem – Part 9

Recursion for the third state. (Trellis diagram; axes: time index vs. state.)

SLIDE 39

Decoding problem – Part 10

Recursion for the third state (continued). (Trellis diagram; axes: time index vs. state.)

SLIDE 40

Decoding problem – Part 11

Recursion for the fourth state. (Trellis diagram; axes: time index vs. state.)

SLIDE 41

Decoding problem – Part 12

Recursion for the fourth state (continued). (Trellis diagram; axes: time index vs. state.)

SLIDE 42

Decoding problem – Part 13

Complete recursion. (Trellis diagram; axes: time index vs. state.)

SLIDE 43

Decoding problem – Part 14

Termination. (Trellis diagram; axes: time index vs. state.)

SLIDE 44

Decoding problem – Part 15

Backtracking of the optimal state sequence. (Trellis diagram; axes: time index vs. state.)

SLIDE 45

Generating feature vectors using a hidden Markov model – Part 1

Basics

(Figure: example HMM with initial state, final state, the Gaussian mixture models of the first and second states, transition probabilities, and emission probabilities.)
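The following slides walk through the generation process step by step. As a compact companion, here is a minimal sampling sketch under the same hypothetical conventions as the earlier sketches, with gmms[j] holding the mixture weights, mean vectors, and covariance matrices of emitting state j:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_hmm(a_init, A, a_final, gmms):
        """Generate one observation sequence from an HMM with GMM emissions.

        gmms[j] = (weights, means, covs) for emitting state j; a transition into
        the non-emitting final state terminates the sequence.
        """
        N = len(a_init)
        X = []
        state = rng.choice(N, p=a_init)              # first transition out of the initial state
        while True:
            weights, means, covs = gmms[state]
            k = rng.choice(len(weights), p=weights)  # draw a mixture component ...
            X.append(rng.multivariate_normal(means[k], covs[k]))  # ... and emit a vector
            # next transition: either another emitting state or the final state
            p = np.append(A[state], a_final[state])
            nxt = rng.choice(N + 1, p=p / p.sum())
            if nxt == N:                             # final state reached
                return np.array(X)
            state = nxt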

SLIDE 46

Generating feature vectors using a hidden Markov model – Part 2

(Figure: the model starts in the initial state; the so-far generated observation sequence is still empty.)

SLIDE 47

Generating feature vectors using a hidden Markov model – Part 3

Determining the first transition

(Figure: initial state with its outgoing transition probabilities.)

SLIDE 48

Generating feature vectors using a hidden Markov model – Part 4

Generating the first observation vector

(Figure: Gaussian mixture model of the first state, emission probabilities, and the so-far generated observation sequence.)
SLIDE 49

Generating feature vectors using a hidden Markov model – Part 5

Determining the second transition

(Figure: Gaussian mixture model of the first state and the outgoing transition probabilities.)
SLIDE 50

Generating feature vectors using a hidden Markov model – Part 6

Generation of the second observation vector

(Figure: Gaussian mixture model of the second state, emission probabilities, and the so-far generated observation sequence.)
SLIDE 51

Generating feature vectors using a hidden Markov model – Part 7

Determining the third transition

(Figure: Gaussian mixture model of the second state and the outgoing transition probabilities.)

SLIDE 52

Generating feature vectors using a hidden Markov model – Part 8

Generation of the third observation vector

(Figure: Gaussian mixture model of the second state, emission probabilities, and the so-far generated observation sequence.)
SLIDE 53

Generating feature vectors using a hidden Markov model – Part 9

Determining the fourth transition

(Figure: Gaussian mixture model of the second state and the outgoing transition probabilities.)
SLIDE 54

Generating feature vectors using a hidden Markov model – Part 10

Final state

(Figure: the final state is reached; the overall observation sequence is complete.)
SLIDE 55

The three problems with hidden Markov models – Part 1

❑ After the model topology has been defined, the model parameters are to be estimated. (Main subject of the next slides.)

(Figure: initial state, first and second model states, and final state, with emission and transition probabilities.)

SLIDE 56

The three problems with hidden Markov models – Part 2

❑ After the model topology has been defined, the model parameters are to be estimated.
❑ The probability that a model generates an observed feature sequence has to be calculated in an efficient way. (Subject of the previous slides.)

(Figure: an observation sequence evaluated against Model 1 and Model 2.)

SLIDE 57

The three problems with hidden Markov models – Part 3

❑ After the model topology has been defined, the model parameters are to be estimated.
❑ The probability that a model generates an observed feature sequence has to be calculated in an efficient way.
❑ The state sequence that generates the observed feature sequence with the highest probability has to be calculated efficiently. (Also subject of the previous slides!)

(Figure: overall observation sequence.)

SLIDE 58

Lecture Evaluation

❑ Please help to improve the lecture by filling out our survey …

SLIDE 59

Solving the estimation problem – Part 1

Estimation problem
❑ For one or more given observation sequences, the parameters (transition and emission probabilities) are to be found in such a way that the production probability P(X | λ) is maximized.
❑ To do so, we assume that an initial HMM already exists. This model is optimized iteratively until a certain optimization criterion is fulfilled or a maximum number of iterations has been performed.
❑ The iteration methods known so far are only able to find local maxima.
❑ The most common method is based on a maximum likelihood estimation and is called the Baum-Welch or forward-backward algorithm.

SLIDE 60

Solving the estimation problem – Part 2

Backward probability
❑ In analogy to the forward probability (see previous slides)
   α_i(n) = P(X^(n), s_n = S_i | λ),
we now introduce the backward probability
   β_i(n) = P(X̄^(n+1) | s_n = S_i, λ).
The partial observation sequence X̄^(n+1) describes all observations that follow time index n, up to the end of the sequence,
   X̄^(n+1) = (x_{n+1}, …, x_T).
❑ The backward probability, similar to the forward probability, can be calculated recursively,
   β_i(n) = Σ_j a_ij · b_j(x_{n+1}) · β_j(n+1).
❑ The initialization is done as follows,
   β_i(T) = a_iN.
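A minimal sketch of this backward recursion, matching the conventions of the forward() sketch above (hypothetical helper):

    import numpy as np

    def backward(A, a_final, B):
        """Backward algorithm, mirroring forward() above."""
        T, N = B.shape
        beta = np.zeros((T, N))
        beta[-1] = a_final                         # initialization: beta_i(T) = a_iN
        for n in range(T - 2, -1, -1):             # recursion, backwards in time
            beta[n] = A @ (B[n + 1] * beta[n + 1]) # beta_i(n) = sum_j a_ij b_j(x_{n+1}) beta_j(n+1)
        return beta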

SLIDE 61

Solving the estimation problem – Part 3

Forward and backward probability

(Figure: forward and backward probabilities in the trellis diagram; axes: time index vs. state.)

SLIDE 62

Solving the estimation problem – Part 4

Probability distribution over states

❑ Using the forward and backward probabilities, we can calculate the probability that the state S_i is active at time index n,
   γ_i(n) = P(s_n = S_i | X, λ) = α_i(n) · β_i(n) / P(X | λ).
❑ The "normalization" P(X | λ) can be calculated using either the forward or the backward probabilities,
   P(X | λ) = Σ_i α_i(n) · β_i(n)   (for any time index n).
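Combining the two passes gives the state posteriors; a sketch under the same hypothetical conventions, reusing the backward() helper from the previous sketch (the row sums of α·β reproduce P(X | λ) at every time index, which serves as the normalization):

    import numpy as np

    def state_posteriors(a_init, A, a_final, B):
        """gamma[n, i] = P(s_n = S_i | X, lambda)."""
        T, N = B.shape
        alpha = np.zeros((T, N))
        alpha[0] = a_init * B[0]                 # forward pass (as above)
        for n in range(1, T):
            alpha[n] = (alpha[n - 1] @ A) * B[n]
        beta = backward(A, a_final, B)           # backward pass from the previous sketch
        gamma = alpha * beta                     # proportional to P(s_n = S_i, X | lambda)
        return gamma / gamma.sum(axis=1, keepdims=True)  # normalize by P(X | lambda)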

SLIDE 63

Solving the estimation problem – Part 5

Probability distribution over states

(Figure: trellis diagram highlighting that the state S_i is active at time index n; axes: time index vs. state.)

SLIDE 64

Solving the estimation problem – Part 6

Transition probabilities

❑ Using the forward and backward probabilities, we can also easily calculate the probability that the state of the hidden Markov model changes from state S_i to state S_j at time index n,
   ξ_ij(n) = P(s_n = S_i, s_{n+1} = S_j | X, λ) = α_i(n) · a_ij · b_j(x_{n+1}) · β_j(n+1) / P(X | λ).

SLIDE 65

Solving the estimation problem – Part 7

Transition probabilities

(Figure: trellis diagram with state S_i active at time index n and state S_j active at time index n+1; axes: time index vs. state.)

SLIDE 66

Solving the estimation problem – Part 8

Estimation of the Markov transition probabilities

❑ For the next iteration, the following transition probabilities are used,
   â_ij = Σ_n ξ_ij(n) / Σ_n γ_i(n),
i.e. the expected average number of state transitions from state S_i to state S_j, divided by the expected average number of state transitions that start in state S_i.
❑ Additionally, the parameters mentioned above are to be calculated based on multiple observation sequences X and averaged before being used in the next step.
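A sketch of this re-estimation step for a single observation sequence (same hypothetical conventions as before; the re-estimation of a_init and a_final and the averaging over multiple sequences are omitted for brevity):

    import numpy as np

    def reestimate_transitions(a_init, A, a_final, B):
        """One Baum-Welch update of the transition matrix A."""
        T, N = B.shape
        alpha = np.zeros((T, N))
        alpha[0] = a_init * B[0]                  # forward pass
        for n in range(1, T):
            alpha[n] = (alpha[n - 1] @ A) * B[n]
        beta = backward(A, a_final, B)            # backward pass from the earlier sketch
        p_x = float(alpha[-1] @ a_final)          # P(X | lambda)
        # xi[n, i, j] = P(s_n = S_i, s_{n+1} = S_j | X, lambda)
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[1:] * beta[1:])[:, None, :]) / p_x
        gamma = alpha * beta / p_x                # gamma[n, i] = P(s_n = S_i | X, lambda)
        # expected transitions i -> j divided by expected transitions leaving i
        return xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]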

SLIDE 67

Solving the estimation problem – Part 9

Emission probabilities

❑ In order to determine the individual parameters of the Gaussian densities, in a first step each state with multiple Gaussians is partitioned into multiple states with just one Gaussian each.

SLIDE 68

Solving the estimation problem – Part 10

Emission probabilities

❑ In analogy to the first approach, individual transition probabilities can be calculated for this extended model: the probability that a transition from state S_i into state S_j was performed at time index n while the k-th Gaussian of the state S_j was creating the observation vector.
❑ These can again be expressed by forward and backward probabilities,
   ξ_ijk(n) = α_i(n) · a_ij · c_jk · N(x_{n+1}; μ_jk, Σ_jk) · β_j(n+1) / P(X | λ).
SLIDE 69

Solving the estimation problem – Part 11

Emission probabilities

❑ Summing these transition probabilities over the outgoing states results in the probability that the k-th Gaussian of the j-th state generated the observed vector at time index n,
   γ_jk(n) = Σ_i ξ_ijk(n).
❑ Now, analogously to the "main transition probabilities", the GMM parameters can also be determined by iteration.

SLIDE 70

Solving the estimation problem – Part 12

Adaption of the GMM parameters

❑ The emission probability was defined as follows,
   b_j(x) = Σ_k c_jk · N(x; μ_jk, Σ_jk).
❑ The adaptation of the weights is done as follows,
   ĉ_jk = Σ_n γ_jk(n) / Σ_n Σ_k' γ_jk'(n),
i.e. the average number of transitions into the k-th Gaussian of state S_j, divided by the average number of all state transitions that enter state S_j.
❑ The adaptation of the mean vectors is done as follows,
   μ̂_jk = Σ_n γ_jk(n) · x_n / Σ_n γ_jk(n).

SLIDE 71

Solving the estimation problem – Part 13

Adaption of the GMM parameters

❑ The adaptation of the covariance matrices is performed as follows,
   Σ̂_jk = Σ_n γ_jk(n) · (x_n − μ̂_jk)(x_n − μ̂_jk)^T / Σ_n γ_jk(n).
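A sketch of these three update formulas for one state, given the component posteriors γ_jk(n) (hypothetical helper; the posteriors would come from the forward-backward quantities above):

    import numpy as np

    def update_gmm(gamma_jk, X):
        """Re-estimate weights, means, and covariances of one state's GMM.

        gamma_jk : (T, K) gamma_jk[n, k] = posterior of mixture component k at time index n
        X        : (T, D) observation vectors
        """
        norm = gamma_jk.sum(axis=0)                  # (K,) sum_n gamma_jk(n)
        weights = norm / norm.sum()                  # new c_jk
        means = (gamma_jk.T @ X) / norm[:, None]     # new mu_jk
        K = norm.shape[0]
        T, D = X.shape
        covs = np.zeros((K, D, D))
        for k in range(K):
            d = X - means[k]                         # centered observations
            covs[k] = (gamma_jk[:, k, None] * d).T @ d / norm[k]  # new Sigma_jk
        return weights, means, covs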

SLIDE 72

Solving the estimation problem – Part 14

Viterbi training

❑ The method to estimate the model parameters that was described above is called the Baum-Welch algorithm. It is a special case of the EM algorithm that was described in the GMM lecture.
❑ Alternatively, the so-called Viterbi training can be applied. To do so, in a first step the state sequence with the highest probability,
   s_opt = argmax_s P(X, s | λ),
is computed.
❑ Then it is assumed that this path was taken with certainty, i.e., it holds
   P(s_n = S_i | X, λ) = 1 if S_i lies on the optimal path at time index n, and 0 otherwise.

SLIDE 73

Solving the estimation problem – Part 15

Viterbi training

❑ For the internal transitions, it consequently holds that ξ_ij(n) is 1 if the optimal path moves from state S_i at time index n to state S_j at time index n+1, and 0 otherwise; the expected counts thus become simple frequency counts along the optimal path.
❑ The subsequent iterations to optimize the model parameters are performed as described for the Baum-Welch algorithm.
❑ Similar to the Baum-Welch algorithm, the iterations are performed until the probability that the model generates the observation sequence no longer increases significantly, or the maximum number of iterations is reached.
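A sketch of the hard-assignment statistics for one Viterbi-training iteration, reusing the viterbi() helper from the earlier sketch (same hypothetical conventions):

    import numpy as np

    def viterbi_training_counts(a_init, A, a_final, B):
        """New internal transition probabilities from the single best path."""
        _, path = viterbi(a_init, A, a_final, B)   # most probable state sequence
        N = A.shape[0]
        counts = np.zeros((N, N))
        for i, j in zip(path[:-1], path[1:]):      # count transitions along the path
            counts[i, j] += 1
        # row-normalize to obtain the new internal transition probabilities
        row = counts.sum(axis=1, keepdims=True)
        return np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)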
SLIDE 74

Solving the estimation problem – Part 16

Initializing a hidden Markov model

❑ In a first step, the number of states and their topology is defined (forbidden transitions are marked, i.e. their probability is set to zero).
❑ Per state, just one Gaussian distribution is used.
❑ While the training is running, the number of Gaussian distributions is gradually increased. For example, the Gaussian distributions are doubled, e.g. by duplicating each component, halving its weight, and perturbing the two resulting mean vectors slightly in opposite directions.
❑ This is repeated until the probability that the model generates the training sequences no longer increases significantly, or a maximum number of parameters is reached.
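One common splitting heuristic could look like the following sketch (an assumption for illustration, not the lecture's exact rule; the perturbation size eps is a hypothetical tuning parameter):

    import numpy as np

    def split_components(weights, means, covs, eps=0.2):
        """Double the number of Gaussians by splitting each component."""
        new_w, new_m, new_c = [], [], []
        for w, m, c in zip(weights, means, covs):
            offset = eps * np.sqrt(np.diag(c))     # perturbation along the standard deviations
            new_w += [w / 2, w / 2]                # halve the weight
            new_m += [m + offset, m - offset]      # push the two copies apart
            new_c += [c.copy(), c.copy()]
        return np.array(new_w), np.array(new_m), np.array(new_c)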
SLIDE 75

"Intermezzo"

Partner exercise:

❑ Please answer (in groups of two people) the questions that you will get during the lecture!

SLIDE 76

Summary and Outlook

Summary:

❑ Motivation
❑ Basics
    ❑ The "hidden" part of the model
    ❑ The "inner" random processes
❑ Basic problems of Hidden Markov Models
    ❑ Efficient computation of the probabilities of state sequences
    ❑ Efficient computation of the most probable sequence
    ❑ Computation (estimation) of the parameters of the model

Next week:

❑ Speaker and speech recognition