Parametric Models Part III: Hidden Markov Models
Selim Aksoy
Department of Computer Engineering, Bilkent University
saksoy@cs.bilkent.edu.tr
CS 551, Spring 2010
◮ The goal is to make a sequence of decisions where a particular decision may be influenced by earlier decisions.
◮ Consider a system that can be described at any time as being in one of a set of N distinct states $w_1, \ldots, w_N$.
◮ Let w(t) denote the actual state at time t where t = 1, 2, . . ..
◮ The probability of the system being in state w(t) is, in general, $P(w(t) \mid w(t-1), \ldots, w(1))$.
◮ We assume that the state w(t) is conditionally independent of all earlier states given the immediately preceding state, i.e., $P(w(t) \mid w(t-1), \ldots, w(1)) = P(w(t) \mid w(t-1))$ (the first-order Markov assumption).
◮ We also assume that the Markov chain defined by $P(w(t) \mid w(t-1))$ is time homogeneous, i.e., independent of t.
◮ A particular sequence of states of length T is denoted by $W^T = \{w(1), w(2), \ldots, w(T)\}$.
◮ The model for the production of any sequence is described by the transition probabilities $a_{ij} = P(w(t) = w_j \mid w(t-1) = w_i)$ where $a_{ij} \geq 0$ and $\sum_{j=1}^{N} a_{ij} = 1, \; \forall i$.
◮ There is no requirement that the transition probabilities are symmetric ($a_{ij} \neq a_{ji}$ in general).
◮ Also, a particular state may be visited in succession ($a_{ii}$ need not be zero) and not every state need be visited.
◮ This process is called an observable Markov model because the output of the process is the sequence of states at each instant of time, where each state corresponds to a physical (observable) event; a simulation sketch is given below.
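As an illustration, here is a minimal sketch that samples a state sequence from a first-order, time-homogeneous Markov chain. The transition matrix matches the classic weather example of Rabiner (1989) used on the next slides; the uniform initial distribution is an assumption for the demo.

```python
import numpy as np

def sample_markov_chain(pi, A, T, rng=None):
    """Sample a length-T state sequence w(1), ..., w(T).

    pi : (N,) initial state distribution
    A  : (N, N) transition matrix; row i is P(next state | current = i)
    """
    rng = np.random.default_rng() if rng is None else rng
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(len(pi), p=pi)
    for t in range(1, T):
        # First-order assumption: the next state depends only on the current one.
        states[t] = rng.choice(A.shape[1], p=A[states[t - 1]])
    return states

pi = np.array([1/3, 1/3, 1/3])          # assumed uniform start
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
print(sample_markov_chain(pi, A, T=10))
```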
◮ Consider the following 3-state first-order Markov model of the weather:
◮ w1: rain/snow ◮ w2: cloudy ◮ w3: sunny
◮ with the transition probability matrix (from Rabiner, 1989)
$$A = \{a_{ij}\} = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}.$$
◮ We can use this model to answer the following question: Starting with sunny weather on day 1, what is the probability that the weather for the next seven days will be “sun, sun, rain, rain, sun, cloudy, sun”?
◮ Solution: Each day's weather depends only on the previous day's, so the probability is the product of the corresponding transition probabilities:
$$P = a_{33} \, a_{33} \, a_{31} \, a_{11} \, a_{13} \, a_{32} \, a_{23} = (0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2) = 1.536 \times 10^{-4}.$$
◮ Consider another question: Given that the model is in a known state, what is the probability that it stays in that state for exactly d days?
◮ Solution: The probability of staying in state $w_i$ for exactly d days is
$$p_i(d) = (a_{ii})^{d-1}(1 - a_{ii}),$$
and the expected duration is
$$\bar{d}_i = \sum_{d=1}^{\infty} d \, p_i(d) = \frac{1}{1 - a_{ii}}.$$
◮ For example, the expected number of consecutive days of sunny weather is $1/(1 - 0.8) = 5$; see the sketch below.
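A minimal numeric check of both calculations, assuming the Rabiner (1989) transition matrix shown above:

```python
import numpy as np

# Weather transition matrix from the example (Rabiner, 1989).
A = np.array([[0.4, 0.3, 0.3],   # w1: rain/snow
              [0.2, 0.6, 0.2],   # w2: cloudy
              [0.1, 0.1, 0.8]])  # w3: sunny

# Probability of "sun, sun, rain, rain, sun, cloudy, sun"
# given that day 1 is sunny (0-indexed states: sunny = 2).
path = [2, 2, 2, 0, 0, 2, 1, 2]
p = np.prod([A[i, j] for i, j in zip(path[:-1], path[1:])])
print(f"sequence probability = {p:.4e}")  # 1.5360e-04

# Expected number of consecutive days in each state: 1 / (1 - a_ii).
print("expected durations:", 1.0 / (1.0 - np.diag(A)))  # [1.67 2.5 5.]
```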
◮ We can extend this model to the case where the observation (output) of the system is a probabilistic function of the state.
◮ The resulting model, called a Hidden Markov Model (HMM), is a doubly embedded stochastic process: the underlying state process is not observable (it is hidden) and can only be observed through another set of stochastic processes that produce the sequence of observations.
◮ We denote the observation at time t as v(t) and the probability of producing that observation in state w(t) as $P(v(t) \mid w(t))$.
◮ There are many possible state-conditioned observation distributions.
◮ When the observations are discrete, the distributions are probability mass functions
$$b_{jk} = P(v(t) = v_k \mid w(t) = w_j)$$
where $b_{jk} \geq 0$ and $\sum_{k=1}^{M} b_{jk} = 1, \; \forall j$.
◮ When the observations are continuous, the distributions are typically specified using a parametric model family such as a mixture of Gaussians, with mixing weights satisfying $\alpha_{jk} \geq 0$ and $\sum_{k=1}^{M_j} \alpha_{jk} = 1, \; \forall j$, where $M_j$ is the number of components for state $w_j$.
◮ We will restrict ourselves to discrete observations, where a particular sequence of observations of length T is denoted by $V^T = \{v(1), v(2), \ldots, v(T)\}$.
◮ An HMM is characterized by:
◮ N, the number of hidden states
◮ M, the number of distinct observation symbols per state
◮ $\{a_{ij}\}$, the state transition probability distribution
◮ $\{b_{jk}\}$, the observation symbol probability distribution
◮ $\{\pi_i = P(w(1) = w_i)\}$, the initial state distribution
◮ $\Theta = (\{a_{ij}\}, \{b_{jk}\}, \{\pi_i\})$, the complete parameter set of the model; a small container for these parameters is sketched below.
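As a concrete sketch, the complete parameter set Θ of a discrete HMM can be held in three arrays. The specific numbers below are hypothetical placeholders (the weather transition matrix plus made-up activity probabilities for the friend example that follows), not values given in the lecture.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DiscreteHMM:
    """Complete parameter set Theta = ({a_ij}, {b_jk}, {pi_i})."""
    A: np.ndarray   # (N, N) state transition probabilities a_ij
    B: np.ndarray   # (N, M) observation symbol probabilities b_jk
    pi: np.ndarray  # (N,)   initial state distribution

    def __post_init__(self):
        # Each row of A and B, and pi itself, must be a distribution.
        assert np.allclose(self.A.sum(axis=1), 1.0)
        assert np.allclose(self.B.sum(axis=1), 1.0)
        assert np.isclose(self.pi.sum(), 1.0)

# Hypothetical example: N = 3 weather states, M = 3 activities.
hmm = DiscreteHMM(
    A=np.array([[0.4, 0.3, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]),
    B=np.array([[0.1, 0.4, 0.5], [0.3, 0.4, 0.3], [0.6, 0.3, 0.1]]),
    pi=np.array([1/3, 1/3, 1/3]),
)
```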
◮ Consider the “urn and ball” example (Rabiner, 1989):
◮ There are N large urns in the room.
◮ Within each urn, there are a large number of colored balls, where the number of distinct colors is M.
◮ An initial urn is chosen according to some random process, and a ball is chosen at random from it.
◮ The ball’s color is recorded as the observation and the ball is put back into the urn.
◮ A new urn is then selected according to a random selection process associated with the current urn, and the ball selection step is repeated.
◮ The simplest HMM that corresponds to the urn and ball process is the one in which
◮ each state corresponds to a specific urn, and
◮ a ball color probability is defined for each state.
◮ Let’s extend the weather example.
◮ Assume that you have a friend who lives in İstanbul.
◮ Your friend has a list of activities that she/he does every day, and her/his choice of activity depends on the weather of that day.
◮ Assume that İstanbul’s weather behaves like the three-state Markov model given earlier.
◮ You have no information about the weather where your friend lives, but you talk every day and learn what she/he did that day.
◮ This process can be modeled using an HMM where the state of the weather is the hidden variable and the activity of your friend is the observation.
◮ Given the model and the activity sequence of your friend, you can make inferences about the weather in İstanbul.
◮ For example, if your friend says that she/he played sports on the first day, and you also learn the activities of the following days, you can ask:
◮ What is the overall probability of this sequence of observations under the model?
◮ What is the most likely weather sequence that would explain these observations?
◮ Speech recognition
◮ Optical character recognition
◮ Natural language processing (e.g., text summarization)
◮ Bioinformatics (e.g., protein sequence modeling)
◮ Image time series (e.g., change detection)
◮ Video analysis (e.g., story segmentation, motion tracking)
◮ Robot planning (e.g., navigation)
◮ Economics and finance (e.g., time series, customer behavior)
◮ Evaluation problem: Given the model, compute the probability that a particular observation sequence was produced by that model.
◮ Decoding problem: Given the model, find the most likely sequence of hidden states that could have generated a given observation sequence.
◮ Learning problem: Given a set of observation sequences, find the model parameters that are most likely to have produced them.
◮ A particular sequence of observations of length T is denoted by $V^T = \{v(1), v(2), \ldots, v(T)\}$.
◮ The probability of observing this sequence can be computed by enumerating every possible state sequence of length T:
$$P(V^T \mid \Theta) = \sum_{\text{all } W^T} P(V^T \mid W^T, \Theta) \, P(W^T \mid \Theta).$$
◮ This summation includes $N^T$ terms of the form
$$P(V^T \mid W^T) \, P(W^T) = \prod_{t=1}^{T} P(v(t) \mid w(t)) \, P(w(t) \mid w(t-1)).$$
◮ Brute-force evaluation is infeasible, with computational complexity $O(N^T T)$; a sketch of the enumeration is shown below.
◮ However, a computationally simpler algorithm called the forward algorithm computes the same quantity recursively.
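For intuition, here is a minimal sketch of the brute-force enumeration; it is only usable for tiny N and T, which is exactly the point of the complexity argument above.

```python
from itertools import product
import numpy as np

def evaluate_brute_force(A, B, pi, obs):
    """P(V^T | Theta) by summing over all N^T state sequences."""
    N, T = A.shape[0], len(obs)
    total = 0.0
    for W in product(range(N), repeat=T):     # all N^T state sequences
        p = pi[W[0]] * B[W[0], obs[0]]        # P(w(1)) * P(v(1)|w(1))
        for t in range(1, T):
            p *= A[W[t - 1], W[t]] * B[W[t], obs[t]]
        total += p
    return total
```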
◮ Define $\alpha_j(t)$ as the probability that the HMM is in state $w_j$ at time t having generated the first t observations in $V^T$.
◮ $\alpha_j(t)$, $j = 1, \ldots, N$, can be computed recursively as
$$\alpha_j(1) = \pi_j \, b_{j v(1)}, \qquad \alpha_j(t) = b_{j v(t)} \sum_{i=1}^{N} \alpha_i(t-1) \, a_{ij}, \quad t = 2, \ldots, T.$$
◮ Then, $P(V^T \mid \Theta) = \sum_{j=1}^{N} \alpha_j(T)$; a sketch follows below.
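A minimal vectorized sketch of this recursion, assuming the parameters are numpy arrays as in the DiscreteHMM sketch earlier (no scaling; real implementations rescale α or work in log space to avoid underflow):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: returns alpha (T, N) and P(V^T | Theta)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]              # alpha_j(1) = pi_j * b_{j,v(1)}
    for t in range(1, T):
        # alpha_j(t) = b_{j,v(t)} * sum_i alpha_i(t-1) * a_ij
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)
    return alpha, alpha[-1].sum()
```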
◮ Similarly, we can define a backward algorithm where $\beta_i(t)$ is the probability that the HMM, given that it is in state $w_i$ at time t, will generate the remainder of the observation sequence $v(t+1), \ldots, v(T)$.
◮ $\beta_i(t)$, $i = 1, \ldots, N$, can be computed recursively as
$$\beta_i(T) = 1, \qquad \beta_i(t) = \sum_{j=1}^{N} \beta_j(t+1) \, a_{ij} \, b_{j v(t+1)}, \quad t = T-1, \ldots, 1.$$
◮ Then, $P(V^T \mid \Theta) = \sum_{i=1}^{N} \beta_i(1) \, \pi_i \, b_{i v(1)}$; a matching sketch follows below.
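A matching sketch of the backward recursion; both passes should return the same likelihood, which makes a handy sanity check:

```python
import numpy as np

def backward(A, B, pi, obs):
    """Backward algorithm: returns beta (T, N) and P(V^T | Theta)."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                            # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j beta_j(t+1) * a_ij * b_{j,v(t+1)}
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta, (beta[0] * pi * B[:, obs[0]]).sum()
```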
◮ The computations of both $\alpha_j(t)$ and $\beta_i(t)$ have complexity $O(N^2 T)$.
◮ For classification, we can compute the posterior probability of each class given the observation sequence, $P(\Theta \mid V^T) \propto P(V^T \mid \Theta) \, P(\Theta)$, where $P(\Theta)$ is the prior of the class and $P(V^T \mid \Theta)$ is computed with the forward algorithm using that class’s HMM.
◮ Then, we can select the class with the highest posterior.
◮ Given a sequence of observations $V^T$, we would like to find the most probable sequence of hidden states.
◮ One possible solution is to enumerate every possible hidden state sequence and pick the one that maximizes the probability of the observations, but this is again computationally infeasible.
◮ We can also define the optimal state sequence as the one consisting of the states that are individually most likely at each time step.
◮ This also corresponds to maximizing the expected number of correctly decoded individual states.
◮ Define $\gamma_i(t)$ as the probability that the HMM is in state $w_i$ at time t given the observation sequence $V^T$:
$$\gamma_i(t) = \frac{\alpha_i(t) \, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t) \, \beta_j(t)}$$
where $\sum_{i=1}^{N} \gamma_i(t) = 1$.
◮ Then, the individually most likely state at time t is $w(t) = w_{i'}$ where $i' = \arg\max_{i=1,\ldots,N} \gamma_i(t)$; a sketch follows below.
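A short sketch of this posterior decoder, reusing the `forward` and `backward` sketches defined above (those helper names are from this document, not a library):

```python
import numpy as np

def posterior_decode(A, B, pi, obs):
    """Individually most likely state at each time step."""
    alpha, _ = forward(A, B, pi, obs)    # forward sketch above
    beta, _ = backward(A, B, pi, obs)    # backward sketch above
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)  # normalize at each t
    return gamma.argmax(axis=1), gamma
```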
◮ One problem is that the resulting state sequence may not be consistent with the underlying model, because it may include transitions that have zero probability ($a_{ij} = 0$ for some i and j).
◮ One possible solution is the Viterbi algorithm, which finds the single best state sequence $W^T$ by maximizing $P(W^T \mid V^T, \Theta)$ (equivalently, $P(W^T, V^T \mid \Theta)$); see the sketch below.
◮ This algorithm recursively computes the highest probability of any state sequence ending in each state at time t, and keeps back-pointers to recover the best sequence at time T (see Rabiner, 1989 for details).
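A minimal log-space sketch of the Viterbi algorithm (log probabilities avoid underflow, and log(0) = -inf naturally rules out zero-probability transitions):

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most likely state sequence: argmax_W P(W, V | Theta)."""
    N, T = A.shape[0], len(obs)
    with np.errstate(divide="ignore"):    # log(0) -> -inf is intended
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = np.zeros((T, N))              # best log-prob ending in each state
    psi = np.zeros((T, N), dtype=int)     # back-pointers
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # scores[i, j]: come from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    # Backtrack from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1][path[t + 1]]
    return path
```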
◮ The goal is to determine the model parameters $\{a_{ij}\}$, $\{b_{jk}\}$ and $\{\pi_i\}$ from a collection of training sequences.
◮ Define $\xi_{ij}(t)$ as the probability that the HMM is in state $w_i$ at time $t-1$ and in state $w_j$ at time t, given the observation sequence:
$$\xi_{ij}(t) = \frac{\alpha_i(t-1) \, a_{ij} \, b_{j v(t)} \, \beta_j(t)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i(t-1) \, a_{ij} \, b_{j v(t)} \, \beta_j(t)}.$$
◮ $\gamma_i(t)$ defined in the decoding problem and $\xi_{ij}(t)$ defined here are related as
$$\gamma_i(t-1) = \sum_{j=1}^{N} \xi_{ij}(t).$$
◮ Then, the transition probabilities can be re-estimated as the expected number of transitions from $w_i$ to $w_j$ divided by the expected total number of transitions out of $w_i$:
$$\hat{a}_{ij} = \frac{\sum_{t=2}^{T} \xi_{ij}(t)}{\sum_{t=2}^{T} \gamma_i(t-1)}.$$
◮ Similarly, the observation probabilities can be re-estimated as
$$\hat{b}_{jk} = \frac{\sum_{t=1}^{T} \delta_{v(t), v_k} \, \gamma_j(t)}{\sum_{t=1}^{T} \gamma_j(t)},$$
where $\delta_{v(t), v_k}$ is the Kronecker delta, i.e., the expected number of times symbol $v_k$ is observed in state $w_j$ divided by the expected number of times the HMM is in $w_j$.
◮ Finally, the initial state distribution can be re-estimated as $\hat{\pi}_i = \gamma_i(1)$, the expected relative frequency of state $w_i$ at time t = 1; a re-estimation sketch follows below.
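A compact sketch of one Baum-Welch re-estimation pass for a single training sequence, reusing the `forward` and `backward` sketches above (unscaled, so only suitable for short sequences; multi-sequence training would sum the statistics across sequences):

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One EM re-estimation of (A, B, pi) from a single sequence."""
    N, M, T = A.shape[0], B.shape[1], len(obs)
    alpha, _ = forward(A, B, pi, obs)
    beta, _ = backward(A, B, pi, obs)

    # gamma[t, i] = P(w(t) = w_i | V^T, Theta)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # xi[t-1, i, j] = P(w(t-1) = w_i, w(t) = w_j | V^T, Theta), t = 2..T
    xi = np.zeros((T - 1, N, N))
    for t in range(1, T):
        xi[t - 1] = alpha[t - 1][:, None] * A * B[:, obs[t]] * beta[t]
        xi[t - 1] /= xi[t - 1].sum()

    # Baum-Welch re-estimation equations.
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros((N, M))
    for k in range(M):
        B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    pi_new = gamma[0]
    return A_new, B_new, pi_new
```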
◮ These are called the Baum-Welch equations (also known as the forward-backward re-estimation formulas); they are an instance of the EM algorithm and are iterated until the parameter estimates converge.
◮ See (Bilmes, 1998) for the corresponding estimates when the observations are continuous and are modeled using mixtures of Gaussians.