SLIDE 1

Hidden Markov Models

Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi

SLIDE 2

Sequential Data

  • Time-series: Stock market, weather, speech, video
  • Ordered: Text, genes
SLIDE 3

Sequential Data: Tracking

Observe noisy measurements of missile location

Where is the missile now? Where will it be in 1 minute?

SLIDE 4

Sequential Data: Weather

  • Predict the weather tomorrow using previous information
  • If it rained yesterday and the previous day, and historically it has rained 7 times in the past 10 years on this date, does this affect my prediction?
SLIDE 5

Sequential Data: Weather

  • Use product rule for joint distribution of a sequence
  • How do I solve this?
  • Model how weather changes over time
  • Model how observations are produced
  • Reason about the model
SLIDE 6

Markov Chain

  • Set S is called the state space
  • Process moves from one state to another, generating a sequence of states: x1, x2, …, xt
  • Markov chain property: the probability of each subsequent state depends only on the previous state:

P(xt | x1, …, xt−1) = P(xt | xt−1)
SLIDE 7

Markov Chain: Parameters

  • State transition matrix A (|S| × |S|); A is a stochastic matrix (all rows sum to one)
  • Time-homogeneous Markov chain: the transition probability between two states does not depend on time
  • Initial (prior) state probabilities
SLIDE 8

  • Two states: ‘Rain’ and ‘Dry’
  • Transition probabilities:
    P(‘Rain’|‘Rain’) = 0.3, P(‘Dry’|‘Rain’) = 0.7
    P(‘Rain’|‘Dry’) = 0.2, P(‘Dry’|‘Dry’) = 0.8
  • Initial probabilities:
    P(‘Rain’) = 0.4, P(‘Dry’) = 0.6

Example of Markov Model

SLIDE 9

Example: Weather Prediction

  • Compute the probability of tomorrow’s weather using the Markov property
  • Evaluation: given today is dry, what’s the probability that tomorrow is dry and the next day is rainy?
  • Learning: given some observations, determine the transition probabilities

P({‘Dry’,’Dry’,’Rain’}) = P(‘Rain’|’Dry’) P(‘Dry’|’Dry’) P(‘Dry’) = 0.2 × 0.8 × 0.6 = 0.096
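The chain-rule calculation above can be checked with a short script. This is a minimal sketch; the integer state encoding and the function name are mine, with the numbers taken from the slides.

```python
import numpy as np

# Encoding (mine): state 0 = 'Rain', state 1 = 'Dry'
A = np.array([[0.3, 0.7],    # P('Rain'|'Rain'), P('Dry'|'Rain')
              [0.2, 0.8]])   # P('Rain'|'Dry'),  P('Dry'|'Dry')
pi = np.array([0.4, 0.6])    # P('Rain'), P('Dry')

def chain_prob(states):
    """Joint probability of a state sequence under the Markov chain (pi, A)."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

print(chain_prob([1, 1, 0]))  # {'Dry','Dry','Rain'}: 0.6 * 0.8 * 0.2 = 0.096
```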

SLIDE 10

Hidden Markov Model (HMM)

  • Stochastic model where the states of the model are hidden
  • Each state can emit an output which is observed
SLIDE 11

HMM: Parameters

  • State transition matrix A
  • Emission / observation conditional output probabilities B
  • Initial (prior) state probabilities
SLIDE 12

[State diagram: hidden states ‘Low’ and ‘High’ with transition probabilities 0.7, 0.3, 0.2, 0.8, emitting observations ‘Dry’ and ‘Rain’]

Example of Hidden Markov Model
SLIDE 13
  • Two states: ‘Low’ and ‘High’ atmospheric pressure
  • Two observations: ‘Rain’ and ‘Dry’
  • Transition probabilities:
    P(‘Low’|‘Low’) = 0.3, P(‘High’|‘Low’) = 0.7
    P(‘Low’|‘High’) = 0.2, P(‘High’|‘High’) = 0.8
  • Observation probabilities:
    P(‘Rain’|‘Low’) = 0.6, P(‘Dry’|‘Low’) = 0.4
    P(‘Rain’|‘High’) = 0.4, P(‘Dry’|‘High’) = 0.6
  • Initial probabilities:
    P(‘Low’) = 0.4, P(‘High’) = 0.6

Example of Hidden Markov Model

SLIDE 14
  • Suppose we want to calculate the probability of a sequence of observations in our example, {‘Dry’,’Rain’}
  • Consider all possible hidden state sequences:

P({‘Dry’,’Rain’}) = P({‘Dry’,’Rain’}, {‘Low’,’Low’}) + P({‘Dry’,’Rain’}, {‘Low’,’High’}) + P({‘Dry’,’Rain’}, {‘High’,’Low’}) + P({‘Dry’,’Rain’}, {‘High’,’High’})

where the first term is:

P({‘Dry’,’Rain’}, {‘Low’,’Low’}) = P({‘Dry’,’Rain’} | {‘Low’,’Low’}) P({‘Low’,’Low’}) = P(‘Dry’|’Low’) P(‘Rain’|’Low’) P(‘Low’) P(‘Low’|’Low’) = 0.4 × 0.6 × 0.4 × 0.3 = 0.0288

Calculation of observation sequence probability
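The sum over all hidden state sequences can be written directly as a brute-force enumeration. A minimal sketch; the encoding and function name are mine, the probabilities come from the example slides.

```python
import itertools
import numpy as np

# Encoding (mine): states 0 = 'Low', 1 = 'High'; observations 0 = 'Dry', 1 = 'Rain'
A  = np.array([[0.3, 0.7], [0.2, 0.8]])   # transition probabilities
B  = np.array([[0.4, 0.6], [0.6, 0.4]])   # B[state, obs]
pi = np.array([0.4, 0.6])                 # initial probabilities

def brute_force_likelihood(obs):
    """Sum the joint P(obs, states) over every possible hidden state sequence."""
    total = 0.0
    for states in itertools.product(range(2), repeat=len(obs)):
        p = pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t-1], states[t]] * B[states[t], obs[t]]
        total += p
    return total

print(brute_force_likelihood([0, 1]))  # P({'Dry','Rain'}) = 0.232
```

Note the cost: the loop visits every one of the 2^T state sequences, which is exactly the exponential blow-up the forward algorithm later avoids.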

SLIDE 15

Example: Dishonest Casino

  • A casino has two dice that it switches between with 5% probability
  • Fair die
  • Loaded die
SLIDE 16

Example: Dishonest Casino

  • Initial probabilities
  • State transition matrix
  • Emission probabilities
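The slide's numeric values did not survive extraction. A common parameterization of this example can be sketched as follows; only the 5% switching probability comes from the slides, and the loaded-die emission probabilities and uniform start are assumptions for illustration.

```python
import numpy as np

# States: 0 = fair die, 1 = loaded die; 5% switch probability (from the slide)
pi = np.array([0.5, 0.5])                 # assumed uniform initial probabilities
A  = np.array([[0.95, 0.05],
               [0.05, 0.95]])             # state transition matrix
# Emission probabilities over faces 1..6; a loaded die favoring six is an
# assumption, not a value from the slide
B = np.array([[1/6] * 6,
              [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])
print(A.sum(axis=1), B.sum(axis=1))  # each row is a distribution
```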
SLIDE 17

Example: Dishonest Casino

  • Given a sequence of rolls by the casino player
  • How likely is this sequence given our model of how the casino works? – evaluation problem
  • What portion of the sequence was generated with the fair die, and what portion with the loaded die? – decoding problem
  • How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded and back? – learning problem
SLIDE 18

HMM: Problems

  • Evaluation: Given parameters and an observation sequence, find the probability (likelihood) of the observed sequence
  • Forward algorithm
  • Decoding: Given HMM parameters and an observation sequence, find the most probable sequence of hidden states
  • Viterbi algorithm
  • Learning: Given an HMM with unknown parameters and an observation sequence, find the parameters that maximize the likelihood of the data
  • Forward-Backward algorithm
SLIDE 19

HMM: Evaluation Problem

  • Given: HMM parameters and an observation sequence
  • Probability of observed sequence: sum the joint probability over all possible hidden state values at all times, which gives K^T terms (exponential in T)
SLIDE 20

[Trellis diagram: one column of states s1 … sK per time step (Time = 1, …, t, t+1, …, T); edges into state sj are weighted by transition probabilities a1j, a2j, …, aKj; observations o1 … oT appear below the columns]

Trellis representation of an HMM
SLIDE 21

HMM: Forward Algorithm

  • Instead, pose evaluation as a recursive problem
  • Dynamic program to compute the forward probability of being in state St = k after observing the first t observations: α_t(k) = P(o_1, …, o_t, St = k)
  • Algorithm:
  • Initialize: t = 1: α_1(k) = π_k B(k, o_1)
  • Iterate with recursion: t = 2, …, T: α_t(k) = B(k, o_t) Σ_i α_{t−1}(i) A(i, k)
  • Terminate: t = T: P(o_1, …, o_T) = Σ_k α_T(k)
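The recursion above can be implemented in a few lines. A minimal numpy sketch (variable names are mine); on the Low/High weather example it reproduces the brute-force sum over all hidden sequences.

```python
import numpy as np

def forward(obs, pi, A, B):
    """Forward algorithm: alpha[t, k] = P(o_1..o_t, S_t = k)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]              # initialize at t = 1
    for t in range(1, T):                     # recurse for t = 2..T
        alpha[t] = B[:, obs[t]] * (alpha[t-1] @ A)
    return alpha, alpha[-1].sum()             # terminate: sum over final states

# Low/High weather example (states 0='Low', 1='High'; obs 0='Dry', 1='Rain')
A  = np.array([[0.3, 0.7], [0.2, 0.8]])
B  = np.array([[0.4, 0.6], [0.6, 0.4]])
pi = np.array([0.4, 0.6])
alpha, lik = forward([0, 1], pi, A, B)
print(lik)  # 0.232, matching the enumeration over all hidden sequences
```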

SLIDE 22

HMM: Problems

  • Evaluation: Given parameters and an observation sequence, find the probability (likelihood) of the observed sequence
  • Forward algorithm
  • Decoding: Given HMM parameters and an observation sequence, find the most probable sequence of hidden states
  • Viterbi algorithm
  • Learning: Given an HMM with unknown parameters and an observation sequence, find the parameters that maximize the likelihood of the data
  • Forward-Backward algorithm
SLIDE 23

HMM: Decoding Problem 1

  • Given: HMM parameters and an observation sequence
  • Probability that the hidden state at time t was k: P(St = k | o_1, …, o_T) ∝ α_t(k) β_t(k)

We know how to compute the first part, the forward probability α_t(k), using the forward algorithm
SLIDE 24

HMM: Backward Probability

  • Similar to the forward probability, we can express the backward probability β_t(k) = P(o_{t+1}, …, o_T | St = k) as a recursion
  • Dynamic program
  • Initialize: β_T(k) = 1
  • Iterate using recursion: β_t(k) = Σ_j A(k, j) B(j, o_{t+1}) β_{t+1}(j)
SLIDE 25

HMM: Decoding Problem 1

  • Probability that the hidden state at time t was k: γ_t(k) = α_t(k) β_t(k) / Σ_j α_t(j) β_t(j)
  • Most likely state assignment: argmax_k γ_t(k)

Forward-backward algorithm
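Combining the forward and backward passes gives the per-time-step posteriors. A sketch under the same Low/High example parameters (function and variable names are mine):

```python
import numpy as np

def posteriors(obs, pi, A, B):
    """gamma[t, k] = P(S_t = k | o_1..o_T) via the forward-backward algorithm."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t-1] @ A)
    beta = np.ones((T, K))                    # beta_T(k) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Low/High example (states 0='Low', 1='High'; obs 0='Dry', 1='Rain')
A  = np.array([[0.3, 0.7], [0.2, 0.8]])
B  = np.array([[0.4, 0.6], [0.6, 0.4]])
pi = np.array([0.4, 0.6])
gamma = posteriors([0, 1], pi, A, B)
print(gamma.argmax(axis=1))  # most likely state at each time step
```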

SLIDE 26

HMM: Decoding Problem 2

  • Given: HMM parameters and an observation sequence
  • What is the most likely state sequence?

V_t(k) = probability of the most likely sequence of states ending at state St = k

SLIDE 27

HMM: Viterbi Algorithm

  • Compute probability recursively over t
  • Use dynamic programming again!
SLIDE 28

HMM: Viterbi Algorithm

  • Initialize: V_1(k) = π_k B(k, o_1)
  • Iterate: V_t(k) = B(k, o_t) max_i V_{t−1}(i) A(i, k), storing a back-pointer to the maximizing i
  • Terminate: max_k V_T(k)

Traceback: follow the back-pointers from argmax_k V_T(k) to recover the most likely state sequence
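The three steps plus traceback fit in one function. A compact numpy sketch (names are mine, not from the slides), run on the Low/High example:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable hidden-state path via dynamic programming."""
    T, K = len(obs), len(pi)
    delta = np.zeros((T, K))              # best path probability ending in k at t
    psi = np.zeros((T, K), dtype=int)     # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        cand = delta[t-1][:, None] * A    # cand[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]      # traceback from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1].max()

# Low/High example: most likely explanation of {'Dry','Rain'}
A  = np.array([[0.3, 0.7], [0.2, 0.8]])
B  = np.array([[0.4, 0.6], [0.6, 0.4]])
pi = np.array([0.4, 0.6])
path, prob = viterbi([0, 1], pi, A, B)
print(path, prob)  # [1, 1] i.e. High, High
```

Note the max replaces the sum of the forward algorithm; otherwise the recursions are identical.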

SLIDE 29

HMM: Computational Complexity

  • What is the running time of the forward algorithm, the backward algorithm, and Viterbi? O(K²T), compared to the O(K^T) terms of naive enumeration!
SLIDE 30

HMM: Problems

  • Evaluation: Given parameters and an observation sequence, find the probability (likelihood) of the observed sequence
  • Forward algorithm
  • Decoding: Given HMM parameters and an observation sequence, find the most probable sequence of hidden states
  • Viterbi algorithm
  • Learning: Given an HMM with unknown parameters and an observation sequence, find the parameters that maximize the likelihood of the data
  • Forward-Backward (Baum-Welch) algorithm
SLIDE 31

HMM: Learning Problem

  • Given only observations
  • Find parameters that maximize likelihood
  • Need to learn hidden state sequences as well
SLIDE 32

HMM: Baum-Welch (EM) Algorithm

  • Randomly initialize parameters
  • E-step: Fix parameters, find expected state assignment

Forward-backward algorithm

SLIDE 33

HMM: Baum-Welch (EM) Algorithm

  • Expected number of times we will be in state i
  • Expected number of transitions from state i
  • Expected number of transitions from state i to j
SLIDE 34

HMM: Baum-Welch (EM) Algorithm

  • M-step: Fix expected state assignments, update parameters
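The E- and M-steps above can be sketched in one function. This is a simplified single-sequence version (the vectorized xi computation and all names are my own arrangement, not the slides'):

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One EM iteration for a single observation sequence."""
    T, K = len(obs), len(pi)
    # E-step: forward and backward probabilities
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t-1] @ A)
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    lik = alpha[-1].sum()
    gamma = alpha * beta / lik                       # P(S_t = i | obs)
    # xi[t, i, j] = P(S_t = i, S_{t+1} = j | obs): expected transitions
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / lik
    # M-step: re-estimate parameters from expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for o in range(B.shape[1]):
        new_B[:, o] = gamma[np.array(obs) == o].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, lik

# One iteration on the Low/High example
A  = np.array([[0.3, 0.7], [0.2, 0.8]])
B  = np.array([[0.4, 0.6], [0.6, 0.4]])
pi = np.array([0.4, 0.6])
new_pi, new_A, new_B, lik = baum_welch_step([0, 1, 1, 0], pi, A, B)
```

Iterating this step monotonically increases the likelihood until convergence to a local optimum.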

SLIDE 35

HMM: Problems

  • Evaluation: Given parameters and an observation sequence, find the probability (likelihood) of the observed sequence
  • Forward algorithm
  • Decoding: Given HMM parameters and an observation sequence, find the most probable sequence of hidden states
  • Viterbi algorithm
  • Learning: Given an HMM with unknown parameters and an observation sequence, find the parameters that maximize the likelihood of the data
  • Forward-Backward (Baum-Welch) algorithm
SLIDE 36

HMM vs Linear Dynamical Systems

  • HMM
  • States are discrete
  • Observations are discrete or continuous
  • Linear dynamical systems
  • Observations and states are multivariate Gaussians
  • Can use Kalman Filters to solve
SLIDE 37

Linear State Space Models

  • States & observations are Gaussian
  • Kalman filter: (recursive) prediction and update
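A minimal sketch of one predict/update cycle, assuming the standard linear-Gaussian model x_t = F x_{t−1} + w with w ~ N(0, Q) and z_t = H x_t + v with v ~ N(0, R). The matrix names are the conventional ones, not from the slide.

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    """One Kalman filter predict/update cycle for a linear-Gaussian model."""
    # predict the next state and its covariance
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update with measurement z
    S = H @ P_pred @ H.T + R                    # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# 1-D example: random-walk state observed with unit noise
x_new, P_new = kalman_step(x=np.array([0.0]), P=np.array([[1.0]]),
                           z=np.array([2.0]), F=np.array([[1.0]]),
                           Q=np.array([[0.0]]), H=np.array([[1.0]]),
                           R=np.array([[1.0]]))
print(x_new, P_new)  # estimate moves halfway toward the measurement
```

This is the continuous-state analogue of the forward algorithm: the Gaussian posterior is propagated in closed form instead of summing over discrete states.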

SLIDE 38

More examples

  • Location prediction
  • Privacy preserving data monitoring
SLIDE 39

Next Location Prediction: Definitions

SLIDE 40

Source: A. Monreale, F. Pinelli, R. Trasarti, F. Giannotti. WhereNext: a Location Predictor on Trajectory Pattern Mining. KDD 2009

  • Personalization
  • Individual-based methods only utilize the history of one object to predict its future locations.
  • General-based methods additionally use the movement history of other objects (e.g. similar objects or similar trajectories) to predict the object’s future location.

Next Location Prediction: Classification of Methods

SLIDE 41

Source: A. Monreale, F. Pinelli, R. Trasarti, F. Giannotti. WhereNext: a Location Predictor on Trajectory Pattern Mining. KDD 2009

  • Temporal Representation
  • Location-series representations define trajectories as a set of sequenced locations ordered in time.
  • Fixed-interval time representations use a fixed time interval between two consecutive locations.
  • Variable-interval time representations allow variable transition times between sequenced locations.

Next Location Prediction: Classification of Methods

SLIDE 42
  • Spatial Representation
  • Grid-based methods divide space into fixed-size cells, which can be simple rectangular regions.
  • Frequent/dense-region methods find regions using clustering, e.g. density-based algorithms such as DBSCAN, or hierarchical clustering.
  • Semantic-based methods use semantic features of locations in addition to the geographic information, e.g. home, bank, school.

Next Location Prediction: Classification of Methods

SLIDE 43
  • Mobility Learning Method
  • Model-based (formulate the movement of moving objects using mathematical models)
  • Markov Chains
  • Recursive Motion Function (Y. Tao et al., ACM SIGMOD 2004)
  • Semi-Lazy Hidden Markov Model (J. Zhou et al., ACM SIGKDD 2013)
  • Deep learning models
  • Pattern-based (exploit pattern mining algorithms for prediction)
  • Trajectory Pattern Mining (A. Monreale et al., ACM SIGKDD 2009)
  • Hybrid
  • Recursive Motion Function + Sequential Pattern Mining (H. Jeung et al., ICDE 2008)

Next Location Prediction: Classification of Methods

SLIDE 44

Preliminary Results

[Figure: prediction error for different prediction lengths using (a) the Brinkhoff and (b) the Periodical Synthetic dataset]