Introduction to Hidden Markov Models (CMSC 473/673, UMBC, October 9th, 2017)



SLIDE 1

Introduction to Hidden Markov Models

CMSC 473/673 UMBC October 9th, 2017

SLIDE 2

673 Announcement: Graduate Paper

Due this Wednesday, 10/11, at 11:59 AM (< 2 days). Use project id paper1 for the submit utility.

SLIDE 3

Course Announcement: Assignment 2

Due next Wednesday, 10/18 (~9 days). Any questions?

SLIDE 4

Course Announcement: Midterm

Monday, 10/30 (3 weeks). Format: in-class (75 minutes). You may bring any notes you created yourself; they must be turned in with the exam (photocopies are OK). Some practice questions will be out next Wednesday (10/18).

SLIDE 5

Recap from last time…

SLIDE 6

Expectation Maximization (EM)

Two-step, iterative algorithm:

0. Assume some value for your parameters
1. E-step: count under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood, assuming these uncertain (estimated) counts

SLIDE 7

Counting Requires Marginalizing

E-step: count under uncertainty, assuming these parameters

SLIDE 8

Counting Requires Marginalizing

p(w) = p(z = 1 & w) + p(z = 2 & w) + p(z = 3 & w) + p(z = 4 & w)

E-step: count under uncertainty, assuming these parameters

break into 4 disjoint pieces

SLIDE 9

EM Example 1: Three Coins/Class-based Unigrams

Imagine three coins. Flip the 1st coin (penny). If heads: flip the 2nd coin (dollar coin). If tails: flip the 3rd coin (dime).

Observed: a, b, e, etc.; “We run the code” vs. “The run failed”
Unobserved: vowel or consonant? part of speech?
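The three-coins generative story, and the marginalization from the previous slide, can be written out directly. A minimal sketch; the value of λ, the letter distributions, and the specific vowel/consonant split are illustrative assumptions, not values from the lecture:

```python
import random

LAMBDA = 0.4                        # p(heads) on the penny: pick class 1 (illustrative)
P_VOWEL = {"a": 0.5, "e": 0.5}      # 2nd coin (dollar): emits vowels
P_CONS  = {"b": 0.5, "d": 0.5}      # 3rd coin (dime): emits consonants

def sample_letter(rng):
    """Flip the penny to choose the hidden class, then flip that
    class's coin to emit an observed letter."""
    dist = P_VOWEL if rng.random() < LAMBDA else P_CONS
    return rng.choices(list(dist), weights=dist.values())[0]

def p_letter(w):
    """Marginal p(w): sum over the two disjoint hidden outcomes,
    exactly the 'counting requires marginalizing' step."""
    return LAMBDA * P_VOWEL.get(w, 0.0) + (1 - LAMBDA) * P_CONS.get(w, 0.0)
```

Summing p_letter over the whole alphabet of the toy model gives 1, since the hidden coin has been marginalized out.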

SLIDE 10

EM Example 2: Machine Translation Alignment

Want: P(f|e). But we don’t know how to train this directly… Solution: use P(a, f|e), where a is an alignment. Remember:

Le chat est sur la chaise verte ↔ The cat is on the green chair

marginalizing across all possible alignments

SLIDE 11

IBM Model 1 (1993)

f: vector of French words
e: vector of English words
a: vector of alignment indices
t(f_j | e_i): translation probability of the word f_j given the word e_i

Le chat est sur la chaise verte ↔ The cat is on the green chair (alignment indices: 0 1 2 3 4 6 5)

SLIDE 12

Learning the Alignments through EM

Two-step, iterative algorithm:

0. Assume some value for the translation parameters and compute other parameter values
1. E-step: count alignments and translations under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood (update parameters), using these uncertain (estimated) counts

[Two diagrams: P(alignment, “le chat” | “the cat”) for each of the two possible alignments of “le chat” to “the cat”]
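The E/M loop above can be sketched on a toy corpus. A minimal Model 1 sketch; the second sentence pair (“le chien” / “the dog”), the uniform initialization, and the iteration count are illustrative assumptions, added so EM has evidence to break the symmetry:

```python
from collections import defaultdict

# Toy parallel corpus (the NULL word is omitted for brevity).
corpus = [(["le", "chat"], ["the", "cat"]),
          (["le", "chien"], ["the", "dog"])]

# Step 0: uniform translation probabilities t(f|e) over co-occurring pairs.
f_vocab = {f for fs, _ in corpus for f in fs}
t = {(f, e): 1.0 / len(f_vocab)
     for fs, es in corpus for f in fs for e in es}

for _ in range(20):                       # a few EM iterations
    count = defaultdict(float)            # expected counts c(f, e)
    total = defaultdict(float)            # expected counts c(e)
    # E-step: each French word spreads a fractional count over the
    # English words in its sentence, proportional to the current t(f|e).
    for fs, es in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                count[(f, e)] += t[(f, e)] / norm
                total[e] += t[(f, e)] / norm
    # M-step: renormalize the expected counts into new parameters.
    for (f, e) in t:
        t[(f, e)] = count[(f, e)] / total[e]
```

After a few iterations the fractional counts concentrate: t(le|the) pulls away from t(chat|the), even though every alignment was initially equally likely.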

SLIDE 13

Follow up: IBM Model 1 Parameters

For IBM Model 1, we can compute all parameters given the translation parameters. How many of these are there? |French vocabulary| × |English vocabulary|. From Rebecca: see Sec. 31 of the Knight tutorial for more about space considerations.

SLIDE 14

Alignment: Output and Complexities

Component of machine translation systems
Produce a translation lexicon automatically
Cross-lingual projection/extraction of information
Supervision for training other models (for example, neural MT systems)

http://www.cis.upenn.edu/~ccb/figures/research-statement/pivoting.jpg

SLIDE 15

Any Questions on What We’ve Seen of EM So Far?

SLIDE 16

Hidden Markov Models

SLIDE 17

Agenda

HMM Motivation (Part of Speech) and Brief Definition
What is Part of Speech?
HMM Detailed Definition
HMM Tasks

SLIDE 18

Hidden Markov Models

Class-based model: use different distributions to explain groupings of observations.
Sequence model: a bigram model of the classes, not the observations; implicitly model all possible class sequences; algorithms for finding the best sequence and for the marginal likelihood.

SLIDE 19

Hidden Markov Models: Part of Speech

p(British Left Waffles on Falkland Islands)

(i): Adjective Noun Verb Prep Noun Noun
(ii): Noun Verb Noun Prep Noun Noun

Class-based model; bigram model of the classes; model all class sequences.

SLIDE 23

Hidden Markov Models: Part of Speech

p(British Left Waffles on Falkland Islands)

1. Explain this sentence as a sequence of (likely?) latent (unseen) tags (labels)
2. Produce a tag sequence for this sentence

(i): Adjective Noun Verb Prep Noun Noun
(ii): Noun Verb Noun Prep Noun Noun

Class-based model; bigram model of the classes; model all class sequences.

SLIDE 24

Agenda

HMM Motivation (Part of Speech) and Brief Definition
What is Part of Speech?
HMM Detailed Definition
HMM Tasks

SLIDE 25

Brief Aside: Parts of Speech

Classes of words that behave like one another in similar syntactic contexts

SLIDE 26

Parts of Speech

Classes of words that behave like one another in similar syntactic contexts Pronunciation (stress) can differ: object (noun: OB-ject) vs. object (verb: ob-JECT) It can help improve the inputs to other systems (text-to-speech, syntactic parsing)

SLIDE 27

XKCD, #1771: https://imgs.xkcd.com/comics/it_was_i.png

SLIDE 28

Parts of Speech

Adapted from Luke Zettlemoyer

Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: speak, give, run
Adjectives: would-be, wettest, large, happy, red, fake

SLIDE 29

Parts of Speech

Adapted from Luke Zettlemoyer

Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: speak, give, run
Adjectives: would-be, wettest, large, happy, red, fake
Determiners: a, the, every, what
Conjunctions: and, or, if, because
Prepositions: in, under, top

SLIDE 30

Parts of Speech

Adapted from Luke Zettlemoyer

Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: speak, give, run (plus modals/auxiliaries: can, do, may)
Adjectives: would-be, wettest, large, happy, red, fake
Determiners: a, the, every, what
Conjunctions: and, or, if, because
Prepositions: in, under, top

“I can eat.”


SLIDE 32

Parts of Speech

Adapted from Luke Zettlemoyer

Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: speak, give, run (plus modals/auxiliaries: can, do, may)
Adjectives: would-be, wettest, large, happy, red, fake
Determiners: a, the, every, what
Conjunctions: and, or, if, because
Prepositions: in, under, top
Adverbs: recently, happily

SLIDE 33

Parts of Speech

Adapted from Luke Zettlemoyer

Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: speak, give, run (plus modals/auxiliaries: can, do, may)
Adjectives: would-be, wettest, large, happy, red, fake
Determiners: a, the, every, what
Conjunctions: and, or, if, because
Prepositions: in, under, top
Adverbs: recently, happily, then, there (location)

“Today, we eat there.”

SLIDE 34

Parts of Speech

Adapted from Luke Zettlemoyer

Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: speak, give, run (plus modals/auxiliaries: can, do, may)
Adjectives: would-be, wettest, large, happy, red, fake
Determiners: a, the, every, what
Conjunctions: and, or, if, because
Prepositions: in, under, top
Adverbs: recently, happily, then, there (location)
Pronouns: I, you, there

“I ate.” “There is a cat.”

SLIDE 35

Parts of Speech

Adapted from Luke Zettlemoyer

Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: speak, give, run (plus modals/auxiliaries: can, do, may)
Adjectives: would-be, wettest, large, happy, red, fake
Determiners: a, the, every, what
Conjunctions: and, or, if, because
Prepositions: in, under, top
Adverbs: recently, happily, then, there (location)
Pronouns: I, you, there
Numbers: one, 1,324

SLIDE 36

Parts of Speech

Adapted from Luke Zettlemoyer

Open class words:
Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: speak, give, run
Adjectives: would-be, wettest, large, happy, red, fake
Adverbs: recently, happily, then, there (location)
Closed class words:
Modals/auxiliaries: can, do, may
Determiners: a, the, every, what
Conjunctions: and, or, if, because
Prepositions: in, under, top
Pronouns: I, you, there
Numbers: one, 1,324

SLIDE 37

Parts of Speech

Adapted from Luke Zettlemoyer

Open class words:
Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: run, speak, give (subcategories: intransitive, transitive, ditransitive)
Adjectives: would-be, wettest, large, happy, red, fake (subcategories: subsective vs. non-subsective; Kamp & Partee 1995)
Adverbs: recently, happily, then, there (location)
Closed class words:
Modals, auxiliaries: can, do, may
Pronouns: I, you, there
Determiners: a, the, every, what
Prepositions: in, under, top
Conjunctions: and, or, if, because
Numbers: one, 1,324

SLIDE 38

Parts of Speech

Adapted from Luke Zettlemoyer

Open class words:
Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: run, speak, give (subcategories: intransitive, transitive, ditransitive)
Adjectives: would-be, wettest, large, happy, red, fake (subcategories: subsective vs. non-subsective; Kamp & Partee 1995)
Adverbs: recently, happily, then, there (location)
Closed class words:
Modals, auxiliaries: can, do, may
Pronouns: I, you, there
Determiners: a, the, every, what
Prepositions: in, under, top
Conjunctions: and, or, if, because
Particles: (set) up, so (far), not, (call) off
Numbers: one, 1,324

SLIDE 39

Parts of Speech

Adapted from Luke Zettlemoyer

Open class words:
Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: run, speak, give (subcategories: intransitive, transitive, ditransitive)
Adjectives: would-be, wettest, large, happy, red, fake (subcategories: subsective vs. non-subsective; Kamp & Partee 1995)
Adverbs: recently, happily, then, there (location)
Closed class words:
Modals, auxiliaries: can, do, may
Pronouns: I, you, there
Determiners: a, the, every, what
Prepositions: in, under, top
Conjunctions: and, or, if, because
Particles: (set) up, so (far), not, (call) off
Numbers: one, 1,324

Language evolves! “I’m reading this because I want to procrastinate.” → “I’m reading this because procrastination.”

https://www.theatlantic.com/technology/archive/2013/11/english-has-a-new-preposition-because-internet/281601/

(because: conjunction → preposition)

SLIDE 40

Agenda

HMM Motivation (Part of Speech) and Brief Definition
What is Part of Speech?
HMM Detailed Definition
HMM Tasks

SLIDE 41

Hidden Markov Models: Part of Speech

p(British Left Waffles on Falkland Islands)

(i): Adjective Noun Verb Prep Noun Noun
(ii): Noun Verb Noun Prep Noun Noun

Class-based model; bigram model of the classes; model all class sequences.

Emission: p(w_j | z_j)

SLIDE 42

Hidden Markov Models: Part of Speech

p(British Left Waffles on Falkland Islands)

(i): Adjective Noun Verb Prep Noun Noun
(ii): Noun Verb Noun Prep Noun Noun

Class-based model; bigram model of the classes; model all class sequences.

Emission: p(w_j | z_j); transition: p(z_j | z_{j-1})

SLIDE 43

Hidden Markov Models: Part of Speech

p(British Left Waffles on Falkland Islands)

(i): Adjective Noun Verb Prep Noun Noun
(ii): Noun Verb Noun Prep Noun Noun

Class-based model; bigram model of the classes; model all class sequences.

Emission: p(w_j | z_j); transition: p(z_j | z_{j-1})
Over all tag sequences z_1, …, z_n: the joint p(z_1, w_1, z_2, w_2, …, z_n, w_n)

SLIDE 44

Hidden Markov Model

Goal: maximize the (log-)likelihood. In practice we don’t actually observe these z values; we just see the words w.

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})
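Once the transition and emission tables are fixed, this factored joint is a single left-to-right product. A minimal sketch; the N/V state set and all probability values are made-up illustrative numbers, not from the lecture:

```python
# Toy HMM: two states (N, V), tiny vocabulary. "BOS" plays the role of z_0.
transition = {                     # p(z_j | z_{j-1})
    ("BOS", "N"): 0.7, ("BOS", "V"): 0.3,
    ("N", "N"): 0.4, ("N", "V"): 0.6,
    ("V", "N"): 0.8, ("V", "V"): 0.2,
}
emission = {                       # p(w_j | z_j)
    ("N", "cat"): 0.6, ("N", "runs"): 0.4,
    ("V", "cat"): 0.1, ("V", "runs"): 0.9,
}

def joint(tags, words):
    """p(z_1, w_1, ..., z_n, w_n) = prod_j p(w_j|z_j) p(z_j|z_{j-1})."""
    p, prev = 1.0, "BOS"
    for z, w in zip(tags, words):
        p *= transition[(prev, z)] * emission[(z, w)]
        prev = z
    return p
```

For example, joint(["N", "V"], ["cat", "runs"]) multiplies p(N|BOS) p(cat|N) p(V|N) p(runs|V).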

SLIDE 45

Hidden Markov Model

Goal: maximize the (log-)likelihood. In practice we don’t actually observe these z values; we just see the words w.

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

If we knew the probability parameters, we could estimate z and evaluate the likelihood… but we don’t! :(
If we did observe z, estimating the probability parameters would be easy… but we don’t! :(

SLIDE 46

Hidden Markov Model Terminology

Each z_i can take the value of one of K latent states.

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

SLIDE 47

Hidden Markov Model Terminology

Each z_i can take the value of one of K latent states.

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

transition probabilities/parameters: p(z_j | z_{j-1})

SLIDE 48

Hidden Markov Model Terminology

Each z_i can take the value of one of K latent states.

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

emission probabilities/parameters: p(w_j | z_j); transition probabilities/parameters: p(z_j | z_{j-1})

SLIDE 49

Hidden Markov Model Terminology

Each z_i can take the value of one of K latent states. Transition and emission distributions do not change.

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

emission probabilities/parameters: p(w_j | z_j); transition probabilities/parameters: p(z_j | z_{j-1})

SLIDE 50

Hidden Markov Model Terminology

Each z_i can take the value of one of K latent states. Transition and emission distributions do not change.
Q: How many different probability values are there with K states and V vocab items?

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

emission probabilities/parameters: p(w_j | z_j); transition probabilities/parameters: p(z_j | z_{j-1})

SLIDE 51

Hidden Markov Model Terminology

Each z_i can take the value of one of K latent states. Transition and emission distributions do not change.
Q: How many different probability values are there with K states and V vocab items?
A: V·K emission values and K² transition values

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

emission probabilities/parameters: p(w_j | z_j); transition probabilities/parameters: p(z_j | z_{j-1})

SLIDE 52

Hidden Markov Model Representation

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

emission probabilities/parameters: p(w_j | z_j); transition probabilities/parameters: p(z_j | z_{j-1})

[Diagram: chain z1 → z2 → z3 → z4, with each z_i emitting w_i]

Represent the probabilities and independence assumptions in a graph.

SLIDE 53

Hidden Markov Model Representation

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

emission probabilities/parameters: p(w_j | z_j); transition probabilities/parameters: p(z_j | z_{j-1})

[Diagram: chain z1 → z2 → z3 → z4, with each z_i emitting w_i]

Graphical Models (see 478/678)

SLIDE 54

Hidden Markov Model Representation

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

emission probabilities/parameters: p(w_j | z_j); transition probabilities/parameters: p(z_j | z_{j-1})

[Diagram: chain z1 → z2 → z3 → z4; emission arcs labeled p(w_1|z_1), p(w_2|z_2), p(w_3|z_3), p(w_4|z_4)]

SLIDE 55

Hidden Markov Model Representation

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

emission probabilities/parameters: p(w_j | z_j); transition probabilities/parameters: p(z_j | z_{j-1})

[Diagram: chain z1 → z2 → z3 → z4; emission arcs labeled p(w_1|z_1) … p(w_4|z_4); transition arcs labeled p(z_2|z_1), p(z_3|z_2), p(z_4|z_3)]

SLIDE 56

Hidden Markov Model Representation

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

emission probabilities/parameters: p(w_j | z_j); transition probabilities/parameters: p(z_j | z_{j-1})

[Diagram: chain z1 → z2 → z3 → z4; emission arcs labeled p(w_1|z_1) … p(w_4|z_4); transition arcs labeled p(z_2|z_1), p(z_3|z_2), p(z_4|z_3)]

p(z_1 | z_0): initial starting distribution (“BOS”)

SLIDE 57

Hidden Markov Model Representation

p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})

emission probabilities/parameters: p(w_j | z_j); transition probabilities/parameters: p(z_j | z_{j-1})

[Diagram: chain z1 → z2 → z3 → z4; emission arcs labeled p(w_1|z_1) … p(w_4|z_4); transition arcs labeled p(z_2|z_1), p(z_3|z_2), p(z_4|z_3)]

p(z_1 | z_0): initial starting distribution (“BOS”)

Each z_i can take the value of one of K latent states. Transition and emission distributions do not change.

SLIDE 58

Example: 2-state Hidden Markov Model as a Lattice

[Lattice: at each position i = 1 … 4, state z_i ∈ {N, V}; each column emits the observed word w_i]

SLIDE 59

Example: 2-state Hidden Markov Model as a Lattice

[Lattice: z_i ∈ {N, V} at positions 1–4, emitting w_1 … w_4]

Emission arcs: p(w_1|N) … p(w_4|N) along the N row; p(w_1|V) … p(w_4|V) along the V row

SLIDE 60

Example: 2-state Hidden Markov Model as a Lattice

[Lattice: z_i ∈ {N, V} at positions 1–4, emitting w_1 … w_4]

Emission arcs: p(w_1|N) … p(w_4|N); p(w_1|V) … p(w_4|V)
Transition arcs within each row: p(N|start), then p(N|N) between adjacent positions; p(V|start), then p(V|V)

SLIDE 61

Example: 2-state Hidden Markov Model as a Lattice

[Lattice: z_i ∈ {N, V} at positions 1–4, emitting w_1 … w_4]

Emission arcs: p(w_i|N) and p(w_i|V) for i = 1 … 4
Transition arcs: p(N|start), p(V|start) into position 1; p(N|N), p(V|V), p(V|N), p(N|V) between adjacent positions
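The lattice compactly encodes all 2⁴ tag sequences, and the marginal likelihood p(w_1 … w_4) can be computed column by column instead of by enumeration (the forward algorithm, one of the HMM tasks mentioned earlier). A sketch on the same kind of 2-state lattice; all probability values are illustrative, and the brute-force enumeration is included only to check the recursion:

```python
from itertools import product

transition = {("start", "N"): 0.7, ("start", "V"): 0.3,
              ("N", "N"): 0.4, ("N", "V"): 0.6,
              ("V", "N"): 0.8, ("V", "V"): 0.2}
emission = {("N", "cat"): 0.6, ("N", "runs"): 0.4,
            ("V", "cat"): 0.1, ("V", "runs"): 0.9}
STATES = ["N", "V"]

def forward(words):
    """Marginal likelihood p(words): sum over all tag sequences,
    computed left-to-right, one lattice column at a time."""
    alpha = {z: transition[("start", z)] * emission[(z, words[0])]
             for z in STATES}
    for w in words[1:]:
        alpha = {z: sum(alpha[zp] * transition[(zp, z)] for zp in STATES)
                    * emission[(z, w)]
                 for z in STATES}
    return sum(alpha.values())

def brute_force(words):
    """Same marginal, by explicitly enumerating every tag sequence."""
    total = 0.0
    for tags in product(STATES, repeat=len(words)):
        p, prev = 1.0, "start"
        for z, w in zip(tags, words):
            p *= transition[(prev, z)] * emission[(z, w)]
            prev = z
        total += p
    return total
```

The forward recursion does O(nK²) work instead of the O(Kⁿ) enumeration, which is exactly why the lattice view matters.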

SLIDE 62

Comparison of Joint Probabilities

Unigram language model:
p(w_1, w_2, …, w_n) = p(w_1) p(w_2) ⋯ p(w_n) = ∏_j p(w_j)

SLIDE 63

Comparison of Joint Probabilities

Unigram language model:
p(w_1, w_2, …, w_n) = p(w_1) p(w_2) ⋯ p(w_n) = ∏_j p(w_j)

Unigram class-based language model (“K” coins):
p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1) p(w_1 | z_1) ⋯ p(z_n) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j)

SLIDE 64

Comparison of Joint Probabilities

Unigram language model:
p(w_1, w_2, …, w_n) = p(w_1) p(w_2) ⋯ p(w_n) = ∏_j p(w_j)

Unigram class-based language model (“K” coins):
p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1) p(w_1 | z_1) ⋯ p(z_n) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j)

Hidden Markov Model:
p(z_1, w_1, z_2, w_2, …, z_n, w_n) = p(z_1 | z_0) p(w_1 | z_1) ⋯ p(z_n | z_{n-1}) p(w_n | z_n) = ∏_j p(w_j | z_j) p(z_j | z_{j-1})
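The three factorizations differ only in how each word's class is treated: absent, drawn independently per position, or chained to the previous class. A sketch evaluating all three joints on one tagged toy sentence; every distribution here is a made-up illustrative number, not from the lecture:

```python
import math

# Illustrative toy distributions.
p_w = {"cat": 0.3, "runs": 0.7}                        # unigram p(w)
p_z = {"N": 0.6, "V": 0.4}                             # class prior p(z)
p_w_given_z = {("N", "cat"): 0.6, ("N", "runs"): 0.4,
               ("V", "cat"): 0.1, ("V", "runs"): 0.9}  # emission p(w|z)
p_z_given_z = {("BOS", "N"): 0.7, ("BOS", "V"): 0.3,
               ("N", "N"): 0.4, ("N", "V"): 0.6,
               ("V", "N"): 0.8, ("V", "V"): 0.2}       # transition p(z|z')

words, tags = ["cat", "runs"], ["N", "V"]

# Unigram LM: prod_j p(w_j)
unigram = math.prod(p_w[w] for w in words)

# Unigram class-based LM: prod_j p(z_j) p(w_j | z_j)
class_based = math.prod(p_z[z] * p_w_given_z[(z, w)]
                        for z, w in zip(tags, words))

# HMM: prod_j p(z_j | z_{j-1}) p(w_j | z_j)
hmm, prev = 1.0, "BOS"
for z, w in zip(tags, words):
    hmm *= p_z_given_z[(prev, z)] * p_w_given_z[(z, w)]
    prev = z
```

Reading the three products side by side makes the progression explicit: the HMM is the class-based model with the class prior p(z_j) replaced by the bigram transition p(z_j | z_{j-1}).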