Introduction to Hidden Markov Models
CMSC 473/673, UMBC
October 9th, 2017
673 Announcement: Graduate Paper
Due this Wednesday 10/11 at 11:59 AM (< 2 days) Use project id paper1 for the submit utility
Course Announcement: Assignment 2
Due next Wednesday, 10/18 (~9 days) Any questions?
Course Announcement: Midterm
Monday, 10/30 (3 weeks) Format: In-class (75 minutes) You may bring any notes you created yourself; they must be turned in with the exam (photocopies are OK) Some practice questions will be out next Wednesday (10/18)
Recap from last time…
Expectation Maximization (EM)
Two-step, iterative algorithm:
0. Assume some value for your parameters
1. E-step: count under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood, assuming these uncertain (estimated) counts
Counting Requires Marginalizing
E-step: count under uncertainty, assuming these parameters
p(w) = p(z1 & w) + p(z2 & w) + p(z3 & w) + p(z4 & w)
break into 4 disjoint pieces
EM Example 1: Three Coins / Class-based Unigrams
Imagine three coins. Flip the 1st coin (penny). If heads: flip the 2nd coin (dollar coin). If tails: flip the 3rd coin (dime).
Observed: a, b, e, etc.; "We run the code" vs. "The run failed"
Unobserved: vowel or consonant? part of speech?
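A minimal sketch of EM for a three-coin model like this one, assuming made-up data and starting values (all names and numbers below are illustrative, not from the slides): the penny picks a latent class, the chosen coin emits a letter, and only the letters are observed.

```python
from collections import defaultdict

# Hypothetical observed letters; which coin produced each one is unobserved.
data = list("aabbebea")
vocab = set(data)

# 0. Assume some value for your parameters
lam = 0.6                                     # p(heads) for the penny
emit = {"H": {"a": 0.5, "b": 0.3, "e": 0.2},  # dollar coin's letter distribution
        "T": {"a": 0.2, "b": 0.4, "e": 0.4}}  # dime's letter distribution

for _ in range(20):  # a few EM iterations
    # 1. E-step: count under uncertainty, assuming these parameters
    counts = {"H": defaultdict(float), "T": defaultdict(float)}
    total = {"H": 0.0, "T": 0.0}
    for w in data:
        pH = lam * emit["H"][w]
        pT = (1 - lam) * emit["T"][w]
        post_H = pH / (pH + pT)   # p(class = H | letter), by marginalizing over classes
        counts["H"][w] += post_H
        counts["T"][w] += 1 - post_H
        total["H"] += post_H
        total["T"] += 1 - post_H
    # 2. M-step: maximize log-likelihood using the uncertain (estimated) counts
    lam = total["H"] / len(data)
    for z in ("H", "T"):
        emit[z] = {w: counts[z][w] / total[z] for w in vocab}
```

The re-estimated distributions stay properly normalized because the fractional counts for each class sum to that class's total.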
EM Example 2: Machine Translation Alignment
Want: P(f | e), but we don't know how to train this directly…
Solution: use P(a, f | e), where a is an alignment
Remember: P(f | e) = Σa P(a, f | e), marginalizing across all possible alignments
Le chat est sur la chaise verte
The cat is on the green chair
IBM Model 1 (1993)
f: vector of French words
e: vector of English words
a: vector of alignment indices
t(fj | ei): translation probability of the word fj given the word ei
Visualization of an alignment:
Le chat est sur la chaise verte
The cat is on the green chair
a = (0, 1, 2, 3, 4, 6, 5)
Learning the Alignments through EM
Two-step, iterative algorithm:
0. Assume some value for the translation parameters and compute other parameter values
1. E-step: count alignments and translations under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood (update parameters), using these uncertain (estimated) counts
P(alignment | "the cat") for each candidate alignment of "le chat"
Follow up: IBM Model 1 Parameters
For IBM Model 1, we can compute all parameters given the translation parameters t(fj | ei). How many of these are there? |French vocabulary| × |English vocabulary|. From Rebecca: see Sec. 31 of the Knight tutorial for more about space considerations.
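The E-step/M-step loop for IBM Model 1 can be sketched on a toy parallel corpus (the corpus and all variable names below are invented for illustration; a NULL token lets French words align to nothing):

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (French sentence, English sentence) pairs.
corpus = [("le chat".split(), "NULL the cat".split()),
          ("le chien".split(), "NULL the dog".split())]

# 0. Initialize t(f | e) uniformly over the French vocabulary
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):
    # E-step: expected alignment/translation counts under the current t
    count = defaultdict(float)   # count[(f, e)]
    total = defaultdict(float)   # total[e]
    for fs, es in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)   # marginalize over all alignments of f
            for e in es:
                c = t[(f, e)] / norm            # posterior p(f aligns to e)
                count[(f, e)] += c
                total[e] += c
    # M-step: re-estimate t(f | e) from the expected (uncertain) counts
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]
```

After a few iterations "chat" should come to prefer "cat" over "the", because "le" co-occurs with "the" in both sentence pairs and claims that probability mass.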
Alignment: Output and Complexities
Component of machine translation systems
Produce a translation lexicon automatically
Cross-lingual projection/extraction of information
Supervision for training other models (for example, neural MT systems)
http://www.cis.upenn.edu/~ccb/figures/research-statement/pivoting.jpg
Any Questions on What We’ve Seen of EM So Far?
Hidden Markov Models
…
Agenda
HMM Motivation (Part of Speech) and Brief Definition What is Part of Speech? HMM Detailed Definition HMM Tasks
Hidden Markov Models
Class-based Model: use different distributions to explain groupings of observations
Sequence Model: bigram model of the classes, not the observations
Implicitly model all possible class sequences
Algorithms for finding the best sequence, and for the marginal likelihood
Hidden Markov Models: Part of Speech
p(British Left Waffles on Falkland Islands)
(i): Adjective Noun Verb Prep Noun Noun
(ii): Noun Verb Noun Prep Noun Noun
1. Explain this sentence as a sequence of (likely?) latent (unseen) tags (labels)
2. Produce a tag sequence for this sentence
Class-based model
Bigram model of the classes
Model all class sequences
Agenda
HMM Motivation (Part of Speech) and Brief Definition What is Part of Speech? HMM Detailed Definition HMM Tasks
Brief Aside: Parts of Speech
Classes of words that behave like one another in similar syntactic contexts
Pronunciation (stress) can differ: object (noun: OB-ject) vs. object (verb: ob-JECT)
It can help improve the inputs to other systems (text-to-speech, syntactic parsing)
XKCD, #1771: https://imgs.xkcd.com/comics/it_was_i.png
Parts of Speech
Adapted from Luke Zettlemoyer

Open class words:
Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: speak, give, run (intransitive / transitive / ditransitive); modals and auxiliaries: can, do, may ("I can eat.")
Adjectives: would-be, wettest, large, happy, red, fake (subsective vs. non-subsective: Kamp & Partee (1995))
Adverbs: recently, happily, then, there (location) ("Today, we eat there.")

Closed class words:
Pronouns: I, you, there ("I ate." "There is a cat.")
Determiners: a, the, every, what
Conjunctions: and, or, if, because
Prepositions: in, under, top
Numbers: one, 1,324
Particles: (set) up, so (far), not, (call) off

Language evolves! "I'm reading this because I want to procrastinate." vs. "I'm reading this because procrastination." (because as a preposition)
https://www.theatlantic.com/technology/archive/2013/11/english-has-a-new-preposition-because-internet/281601/
Agenda
HMM Motivation (Part of Speech) and Brief Definition What is Part of Speech? HMM Detailed Definition HMM Tasks
Hidden Markov Models: Part of Speech
p(British Left Waffles on Falkland Islands)
(i): Adjective Noun Verb Prep Noun Noun
(ii): Noun Verb Noun Prep Noun Noun
Class-based model: p(wj | zj)
Bigram model of the classes: p(zj | zj-1)
Model all class sequences z1, …, zS:
p(z1, w1, z2, w2, …, zS, wS)
Hidden Markov Model
Goal: maximize the (log-)likelihood
In practice: we don't actually observe these z values; we just see the words w
p(z1, w1, z2, w2, …, zS, wS) = p(z1|z0) p(w1|z1) ⋯ p(zS|zS-1) p(wS|zS) = ∏j p(wj|zj) p(zj|zj-1)
if we knew the probability parameters, then we could estimate z and evaluate the likelihood… but we don't! :(
if we did observe z, estimating the probability parameters would be easy… but we don't! :(
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
Transition and emission distributions do not change
p(z1, w1, z2, w2, …, zS, wS) = p(z1|z0) p(w1|z1) ⋯ p(zS|zS-1) p(wS|zS) = ∏j p(wj|zj) p(zj|zj-1)
transition probabilities/parameters: p(zj|zj-1)
emission probabilities/parameters: p(wj|zj)
Q: How many different probability values are there with K states and V vocab items?
A: V·K emission values and K² transition values
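The joint factorization and the V·K emission / K² transition parameter count can be checked with a small sketch (the 2-state tables below are hypothetical values, not from the slides):

```python
# Hypothetical 2-state HMM (states N, V) over a 3-word vocabulary.
trans = {("BOS", "N"): 0.7, ("BOS", "V"): 0.3,   # initial distribution p(z1 | z0)
         ("N", "N"): 0.4, ("N", "V"): 0.6,       # transition parameters p(zj | zj-1)
         ("V", "N"): 0.8, ("V", "V"): 0.2}
emit = {("N", "cat"): 0.5, ("N", "runs"): 0.1, ("N", "the"): 0.4,  # emissions p(wj | zj)
        ("V", "cat"): 0.1, ("V", "runs"): 0.8, ("V", "the"): 0.1}

def joint(zs, ws):
    """p(z1, w1, ..., zS, wS) = prod_j p(wj | zj) p(zj | zj-1)."""
    p, prev = 1.0, "BOS"
    for z, w in zip(zs, ws):
        p *= trans[(prev, z)] * emit[(z, w)]
        prev = z
    return p

# With K states and V vocab items: V*K emission values, K^2 transition values
K, V = 2, 3
assert len(emit) == V * K
assert sum(1 for (prev, _) in trans if prev != "BOS") == K * K
```

For the tag sequence (N, V) over ("cat", "runs") this multiplies out to 0.7 · 0.5 · 0.6 · 0.8.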
Hidden Markov Model Representation
p(z1, w1, z2, w2, …, zS, wS) = p(z1|z0) p(w1|z1) ⋯ p(zS|zS-1) p(wS|zS) = ∏j p(wj|zj) p(zj|zj-1)
emission probabilities/parameters: p(wj|zj)
transition probabilities/parameters: p(zj|zj-1)
Represent the probabilities and independence assumptions in a graph: Graphical Models (see 478/678)
z1 → z2 → z3 → z4 → … (hidden states), each zi emitting its word wi
Transition arcs: p(z1|z0) from the initial starting distribution ("BOS"), then p(z2|z1), p(z3|z2), p(z4|z3), …
Emission arcs: p(w1|z1), p(w2|z2), p(w3|z3), p(w4|z4), …
Each zi can take the value of one of K latent states
Transition and emission distributions do not change
Example: 2-state Hidden Markov Model as a Lattice
States z1 = N, z2 = N, z3 = N, z4 = N, … and z1 = V, z2 = V, z3 = V, z4 = V, … over words w1, w2, w3, w4, …
Emission arcs: p(w1|N), p(w2|N), p(w3|N), p(w4|N) and p(w1|V), p(w2|V), p(w3|V), p(w4|V)
Transition arcs: p(N|start) and p(V|start) into position 1; then p(N|N), p(V|V), p(V|N), p(N|V) between each pair of adjacent positions
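Brute-force enumeration over the lattice makes "model all class sequences" concrete: with K = 2 states and 3 words there are only 2³ = 8 paths, so we can sum them directly for the marginal likelihood and take the max for the best tag sequence. (Tables below are hypothetical; real systems use dynamic programming such as the forward and Viterbi algorithms rather than enumeration.)

```python
from itertools import product

# Hypothetical 2-state tables (illustrative values only)
trans = {("BOS", "N"): 0.7, ("BOS", "V"): 0.3,
         ("N", "N"): 0.4, ("N", "V"): 0.6,
         ("V", "N"): 0.8, ("V", "V"): 0.2}
emit = {("N", "cat"): 0.5, ("N", "runs"): 0.1, ("N", "the"): 0.4,
        ("V", "cat"): 0.1, ("V", "runs"): 0.8, ("V", "the"): 0.1}

def joint(zs, ws):
    """p(z, w) for one lattice path: prod_j p(wj | zj) p(zj | zj-1)."""
    p, prev = 1.0, "BOS"
    for z, w in zip(zs, ws):
        p *= trans[(prev, z)] * emit[(z, w)]
        prev = z
    return p

ws = ["the", "cat", "runs"]
paths = list(product("NV", repeat=len(ws)))              # all 2^3 = 8 lattice paths
marginal = sum(joint(zs, ws) for zs in paths)            # p(w) = sum_z p(z, w)
best = max(paths, key=lambda zs: joint(zs, ws))          # most likely tag sequence
```

With these toy numbers the best path tags "the cat" as N N and "runs" as V, and the marginal likelihood is strictly larger than any single path's probability.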
Comparison of Joint Probabilities
Unigram Language Model:
p(w1, w2, …, wS) = p(w1) p(w2) ⋯ p(wS) = ∏j p(wj)
Unigram Class-based Language Model ("K" coins):
p(z1, w1, z2, w2, …, zS, wS) = p(z1) p(w1|z1) ⋯ p(zS) p(wS|zS) = ∏j p(wj|zj) p(zj)
Hidden Markov Model:
p(z1, w1, z2, w2, …, zS, wS) = p(z1|z0) p(w1|z1) ⋯ p(zS|zS-1) p(wS|zS) = ∏j p(wj|zj) p(zj|zj-1)
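The three factorizations differ only in how the class term is conditioned; computing each for one toy tagged sentence makes that visible (all probabilities below are made up for illustration):

```python
import math

ws = ["the", "cat", "runs"]
zs = ["N", "N", "V"]   # one class sequence, for comparison

# Hypothetical toy distributions
p_w  = {"the": 0.5, "cat": 0.3, "runs": 0.2}                       # unigram LM p(w)
p_z  = {"N": 0.6, "V": 0.4}                                        # class prior p(z)
emit = {("N", "the"): 0.4, ("N", "cat"): 0.5, ("V", "runs"): 0.8}  # p(w | z)
trans = {("BOS", "N"): 0.7, ("N", "N"): 0.4, ("N", "V"): 0.6}      # p(z | z_prev)

# Unigram LM: prod_j p(wj)
unigram = math.prod(p_w[w] for w in ws)

# Unigram class-based LM: prod_j p(zj) p(wj | zj)
class_based = math.prod(p_z[z] * emit[(z, w)] for z, w in zip(zs, ws))

# HMM: prod_j p(zj | zj-1) p(wj | zj)
hmm, prev = 1.0, "BOS"
for z, w in zip(zs, ws):
    hmm *= trans[(prev, z)] * emit[(z, w)]
    prev = z
```

Only the class term changes across the three lines: absent, p(zj), or p(zj | zj-1); the emission term p(wj | zj) is shared by the two class-based models.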