Automatic Speech Recognition: From the Beginning to the Portuguese Language

systems, that is, the state transitions have a temporal order, usually left to right. Thus, HMMs for speech applications have a final state ($s_F$), altering the termination step of the forward algorithm to $P(O \mid \lambda) = \alpha_T(s_F)$.
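As a concrete illustration, the forward recursion with this modified termination step can be sketched as follows. This is a minimal sketch, not an implementation from the text: the function name, matrix layout, and the toy left-to-right model are illustrative assumptions.

```python
import numpy as np

def forward_likelihood(A, B, pi, obs, final_state):
    """Forward algorithm for a discrete-emission HMM with a designated
    final state: termination reads a single forward variable,
    P(O | lambda) = alpha_T(s_F), instead of summing over all states."""
    alpha = pi * B[:, obs[0]]          # initialization: alpha_1(j) = pi_j * b_j(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction: sum over predecessor states
    return alpha[final_state]          # termination: alpha_T(s_F)

# Toy left-to-right HMM (illustrative numbers): 3 states, 2 symbols.
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])
pi = np.array([1.0, 0.0, 0.0])
likelihood = forward_likelihood(A, B, pi, obs=[0, 1, 1], final_state=2)
```

Because the model is left to right, only paths that reach the final state by time $T$ contribute to the likelihood.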
6.1.2 Decoding Problem

An approach to finding the optimal state sequence for a given observation sequence is to choose the state $s_t$ that is individually most likely at each time $t$. Even though this approach maximizes the expected number of correct states, the estimated state sequence can contain transitions that are unlikely or even impossible (i.e., $a_{ij} = 0$), because the approach does not take the transition probabilities into account. A modified version of the forward algorithm, known as the Viterbi algorithm, can be used to estimate the optimal state sequence. The Viterbi algorithm estimates the probability that the HMM is in state $j$ after seeing the first $t$ observations, as in the forward algorithm, but only over the most likely state sequence $s_1, s_2, \ldots, s_{t-1}$, given the model, that is,

$$\delta_t(j) = \max_{s_1, s_2, \ldots, s_{t-1}} P(s_1, s_2, \ldots, s_{t-1}, s_t = j, o_1, o_2, \ldots, o_t \mid \lambda),$$

where $\delta_t(j)$ is the probability of the most likely state sequence ending in state $j$ at time $t$ after seeing the first $t$ observations. An array $\psi_t(j)$ is used to keep track of the previous state with the highest probability, so that the state sequence can be retrieved at the end of the algorithm. The Viterbi algorithm can be defined as follows:
Initialization:
$$\delta_1(j) = \pi_j \, b_j(o_1), \quad 1 \le j \le N,$$
$$\psi_1(j) = 0.$$

Recursion:
$$\delta_t(k) = \max_{1 \le j \le N} \left[ \delta_{t-1}(j) \, a_{jk} \right] b_k(o_t),$$
$$\psi_t(k) = \operatorname*{argmax}_{1 \le j \le N} \left[ \delta_{t-1}(j) \, a_{jk} \right], \quad 2 \le t \le T, \; 1 \le k \le N.$$

Termination:
$$P^* = \max_{1 \le j \le N} \delta_T(j),$$
$$s_T^* = \operatorname*{argmax}_{1 \le j \le N} \delta_T(j).$$

Backtracking:
$$s_t^* = \psi_{t+1}(s_{t+1}^*), \quad t = T-1, T-2, \ldots, 1.$$
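The four steps above can be sketched in code. The following minimal NumPy implementation is an illustrative sketch, not the text's own implementation; the function name, array layout, and toy two-state model are assumptions.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi decoding for a discrete-emission HMM, following the four
    steps above: initialization, recursion with backpointers psi,
    termination, and backtracking."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))                # delta_t(j): best-path probability
    psi = np.zeros((T, N), dtype=int)       # psi_t(j): best predecessor state
    delta[0] = pi * B[:, obs[0]]            # initialization; psi_1(j) = 0
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # delta_{t-1}(j) * a_{jk}
        psi[t] = scores.argmax(axis=0)                # argmax over predecessor j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]  # recursion
    best_prob = float(delta[-1].max())                # termination: P*
    path = [int(delta[-1].argmax())]                  # s_T*
    for t in range(T - 1, 0, -1):                     # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], best_prob

# Toy 2-state HMM (illustrative numbers), observation sequence of length 3.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
path, prob = viterbi(A, B, pi, obs=[0, 1, 0])
```

In practice the products are usually computed in the log domain, since multiplying many probabilities underflows for long observation sequences.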
6.1.3 Learning Problem

The estimation of the model parameters $\lambda = (A, B, \pi)$ is the most difficult of the three problems, because there is no known analytical method to maximize the probability of the observation sequence in closed form. However, the parameters can be estimated