Lecture 16: Weighted Finite State Transducers (WFST)


SLIDE 1

Review Semirings WFSTs Composition Epsilon Summary

Lecture 16: Weighted Finite State Transducers (WFST)

Mark Hasegawa-Johnson. All content CC-SA 4.0 unless otherwise specified. ECE 417: Multimedia Signal Processing, Fall 2020.


SLIDE 3

Outline

1. Review: WFSA
2. Semirings
3. How to Handle HMMs: The Weighted Finite State Transducer
4. Composition
5. Doing Useful Stuff: The Epsilon Transition
6. Summary

SLIDE 4

Weighted Finite State Acceptors

[Figure: a six-state WFSA (states 1–6), with edge labels/weights The/0.3, A/0.2, A/0.3, This/0.2, dog/1, dog/0.3, cat/0.7, is/1, very/0.2, cute/0.4, hungry/0.4]

An FSA specifies a set of strings. A string is in the set if it corresponds to a valid path from start to end, and not otherwise.

A WFSA also specifies a probability mass function over the set.

SLIDE 5

Every Markov Model is a WFSA

[Figure: a three-state Markov model drawn as a WFSA, with an edge j/a_ij for every state pair: 1/a11, 1/a12, 1/a13, 2/a21, 2/a22, 2/a23, 3/a31, 3/a32, 3/a33]

A Markov Model (but not an HMM!) may be interpreted as a WFSA: just assign a label to each edge. The label might just be the state number, or it might be something more useful.

SLIDE 6

Best-Path Algorithm for a WFSA

Given: An input string, S = [s_1, …, s_T]. For example, the string “A dog is very very hungry” has T = 6 words. Edges, e, each have a predecessor state p[e] ∈ Q, a next state n[e] ∈ Q, a weight w[e] ∈ ℝ, and a label ℓ[e] ∈ Σ.

Initialize:
δ_0(i) = 1̄ if i is the initial state, 0̄ otherwise

Iterate:
δ_t(j) = best over {e : n[e] = j, ℓ[e] = s_t} of δ_{t−1}(p[e]) ⊗ w[e]
ψ_t(j) = argbest over {e : n[e] = j, ℓ[e] = s_t} of δ_{t−1}(p[e]) ⊗ w[e]

Backtrace:
e*_t = ψ_{t+1}(q*_{t+1}), q*_t = p[e*_t]
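The iteration above can be sketched in the tropical semiring, where weights are −log probabilities, so ⊗ is + and “best” is min. This is my own illustrative sketch; the edge representation (p, n, label, w) and the function name are assumptions, not from the lecture.

```python
# Best-path (Viterbi) over a WFSA in the tropical semiring:
# weights are -log probabilities, so "otimes" is + and "best" is min.
# Edge format (p, n, label, w) is an illustrative choice, not standard.

def best_path(edges, initial, finals, s):
    """Return (best total weight, best state sequence) for input string s,
    or None if no accepting path matches s."""
    INF = float("inf")
    states = {q for p, n, l, w in edges for q in (p, n)}
    delta = {q: (0.0 if q == initial else INF) for q in states}  # delta_0
    psi = []  # psi[t][j] = best edge entering state j after consuming s[t]
    for sym in s:
        new_delta = {q: INF for q in states}
        back = {}
        for e in edges:
            p, n, label, w = e
            if label == sym and delta[p] + w < new_delta[n]:
                new_delta[n] = delta[p] + w  # delta_{t-1}(p[e]) "otimes" w[e]
                back[n] = e
        psi.append(back)
        delta = new_delta
    q = min(finals, key=lambda f: delta[f])      # best final state
    if delta[q] == INF:
        return None
    total, path = delta[q], [q]
    for back in reversed(psi):                   # backtrace: q*_t = p[e*]
        q = back[q][0]
        path.append(q)
    return total, path[::-1]

# Toy WFSA: two competing "a" edges from state 1, then a "b" edge to state 3.
edges = [(1, 2, "a", 0.5), (1, 2, "a", 1.0), (2, 3, "b", 0.25)]
print(best_path(edges, initial=1, finals={3}, s=["a", "b"]))
# (0.75, [1, 2, 3])
```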

SLIDE 7

Determinization

A WFSA is said to be deterministic if, for any given (predecessor state p[e], label ℓ[e]) pair, there is at most one such edge. For example, this WFSA is not deterministic.

[Figure: the six-state WFSA from the earlier slide, with edges The/0.3, A/0.2, A/0.3, This/0.2, dog/1, dog/0.3, cat/0.7, is/1, very/0.2, cute/0.4, hungry/0.4]

SLIDE 8

How to Determinize a WFSA

The only general algorithm for determinizing a WFSA is the following exponential-time algorithm. For every state in A, for every set of edges e_1, …, e_K that all have the same label:

- Create a new edge, e, with weight w[e] = w[e_1] ⊕ · · · ⊕ w[e_K].
- Create a brand new successor state n[e].
- For every edge leaving any of the original successor states n[e_k], 1 ≤ k ≤ K, whose label is unique: copy it to n[e], ⊗ its weight by w[e_k]/w[e].
- For every set of edges leaving the n[e_k] that all have the same label: recurse!
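Checking whether a WFSA needs determinization in the first place is much easier than determinizing it: just look for two edges that share a (predecessor state, label) pair. A minimal sketch, using the same illustrative (p, n, label, w) edge format as above (state numbers invented for illustration):

```python
# A WFSA is deterministic iff no (predecessor state, label) pair
# appears on more than one edge.
from collections import Counter

def is_deterministic(edges):
    counts = Counter((p, label) for p, n, label, w in edges)
    return all(c <= 1 for c in counts.values())

# Two edges labeled "A" leave the same state, so this fragment is
# nondeterministic (state numbers assumed for illustration):
edges = [(1, 2, "The", 0.3), (1, 2, "A", 0.2), (1, 3, "A", 0.3), (1, 3, "This", 0.2)]
print(is_deterministic(edges))  # False
```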


SLIDE 10

Semirings

A semiring is a set of numbers over which it’s possible to define operators ⊗ and ⊕, and identity elements 1̄ and 0̄.

The Probability Semiring is the set of non-negative real numbers ℝ₊, with ⊗ = ·, ⊕ = +, 1̄ = 1, and 0̄ = 0.

The Log Semiring is the extended reals ℝ ∪ {∞}, with ⊗ = +, ⊕ = −logsumexp(−·, −·), 1̄ = 0, and 0̄ = ∞.

The Tropical Semiring is just the log semiring, but with ⊕ = min. In other words, instead of adding the probabilities of two paths, we choose the best path: a ⊕ b = min(a, b).

Mohri et al. (2001) formalize it like this: a semiring is K = (𝕂, ⊕, ⊗, 0̄, 1̄), where 𝕂 is a set of numbers.
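The three semirings above can be written down as a minimal sketch; the class and method names here are my own illustration, not a standard API. Weights in the log and tropical semirings are −log probabilities.

```python
import math

class ProbabilitySemiring:
    """Non-negative reals; otimes = product, oplus = sum, one = 1, zero = 0."""
    one, zero = 1.0, 0.0
    @staticmethod
    def otimes(a, b): return a * b
    @staticmethod
    def oplus(a, b): return a + b

class LogSemiring:
    """Extended reals (weights are -log probabilities); otimes = +,
    oplus = -logsumexp(-a, -b), one = 0, zero = infinity."""
    one, zero = 0.0, math.inf
    @staticmethod
    def otimes(a, b): return a + b
    @staticmethod
    def oplus(a, b):
        if math.isinf(a): return b
        if math.isinf(b): return a
        return min(a, b) - math.log1p(math.exp(-abs(a - b)))

class TropicalSemiring(LogSemiring):
    """The log semiring, but oplus keeps only the best (lowest-cost) path."""
    oplus = staticmethod(min)

# Combining two paths with probabilities 0.3 and 0.2:
p = ProbabilitySemiring.oplus(0.3, 0.2)                     # 0.5
w = LogSemiring.oplus(-math.log(0.3), -math.log(0.2))       # -log(0.5)
t = TropicalSemiring.oplus(-math.log(0.3), -math.log(0.2))  # -log(0.3)
```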

SLIDE 12

Weighted Finite State Transducers

[Figure: a seven-state WFST with edges The:Le/0.3, A:Un/0.2, A:Un/0.3, This:Ce/0.2, dog:chien/1, dog:chien/0.3, cat:chat/0.7, is:est/0.5, is:a/0.5, very:très/0.2, cute:mignon/0.8, very:très/0.2, hungry:faim/0.8]

A (Weighted) Finite State Transducer (WFST) is a (W)FSA with two labels on every edge: an input label, i ∈ Σ, and an output label, o ∈ Ω.

SLIDE 13

What it’s for

An FST specifies a mapping between two sets of strings.

The input set is I ⊂ Σ*, where Σ* is the set of all strings containing zero or more letters from the alphabet Σ. The output set is O ⊂ Ω*. For every i = [i_1, …, i_T] ∈ I, the FST specifies one or more possible translations o = [o_1, …, o_T] ∈ O.

A WFST also specifies a probability mass function over the translations. The example on the previous slide was normalized to compute a joint pmf p(i, o), but other WFSTs might be normalized to compute a conditional pmf p(o | i), or something else.

SLIDE 14

Normalizing for Conditional Probability

Here is a WFST whose weights are normalized to compute p(o | i):

[Figure: the seven-state WFST with edges The:Le/1, A:Un/1, A:Un/1, This:Ce/1, dog:chien/1, dog:chien/1, cat:félin/0.1, cat:chat/0.9, is:est/0.5, is:a/0.5, very:très/1, cute:mignon/1, very:très/1, hungry:faim/1]

SLIDE 15

Normalizing for Conditional Probability

Normalizing for conditional probability allows us to separately represent the two parts of a hidden Markov model:

1. The transition probabilities, a_ij, are the weights on a WFSA.
2. The observation probabilities, b_j(x_t), are the weights on a WFST.

SLIDE 16

WFSA: Symbols on the edges are called PDFIDs

It is no longer useful to say that “the labels on the edges are the state numbers.” Instead, let’s call them pdfids.

[Figure: the three-state Markov model WFSA, with edges j/a_ij: 1/a11, 1/a12, 1/a13, 2/a21, 2/a22, 2/a23, 3/a31, 3/a32, 3/a33]

SLIDE 17

Observation Probabilities as Conditional Edge Weights

Now we can create a new WFST whose output symbols are pdfids j, whose input symbols are observations x_t, and whose weights are the observation probabilities, b_j(x_t).

[Figure: a four-state, left-to-right WFST with an edge x_t : j / b_j(x_t) for each frame t = 1, …, 4 and each pdfid j = 1, 2, 3]

SLIDE 18

Hooray! We’ve almost re-created the HMM!

So far we have:
- You can create a WFSA whose weights are the transition probabilities.
- You can create a WFST whose weights are the observation probabilities.

Here are the problems:
1. How can we combine them?
2. Even if we could combine them, can this do anything that an HMM couldn’t already do?


SLIDE 20

Composition

The main reason to use WFSTs is an operator called “composition.” Suppose you have:

1. A WFST, R, that translates strings a ∈ A into strings b ∈ B with joint probability p(a, b).
2. Another WFST, S, that translates strings b ∈ B into strings c ∈ C with conditional probability p(c | b).

The operation T = R ◦ S gives you a WFST, T, that translates strings a ∈ A into strings c ∈ C with joint probability

p(a, c) = Σ_{b∈B} p(a, b) p(c | b)

SLIDE 21

The WFST Composition Algorithm

1. Initialize: The initial state of T is a pair, i_T = (i_R, i_S), encoding the initial states of both R and S.
2. Iterate: While there is any state q_T = (q_R, q_S) with edge pairs (e_R = a : b, e_S = b : c) that have not yet been copied to T:
   - Create a new edge e_T with next state n[e_T] = (n[e_R], n[e_S]) and labels i[e_T] : o[e_T] = i[e_R] : o[e_S] = a : c.
   - If an edge with the same n[e_T], i[e_T], and o[e_T] already exists, then update its weight: w[e_T] = w[e_T] ⊕ (w[e_R] ⊗ w[e_S]).
   - If not, create a new edge with w[e_T] = w[e_R] ⊗ w[e_S].
3. Terminate: A state q_T = (q_R, q_S) is a final state if both q_R and q_S are final states.
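The iteration above can be sketched in the probability semiring (⊗ = ·, ⊕ = +), ignoring ε labels. This is a hypothetical sketch: the edge format (p, n, in_label, out_label, w) and the function name are my own, not from the lecture or any standard library.

```python
# WFST composition sketch (probability semiring, no epsilon handling).
# Edges are (p, n, in_label, out_label, w); states of T are (qR, qS) pairs.

def compose(R, S, iR, iS):
    """Return (edges of T = R o S, initial state of T)."""
    acc = {}                                 # (p, n, a, c) -> weight, "oplus"-accumulated
    frontier, visited = [(iR, iS)], {(iR, iS)}
    while frontier:
        qR, qS = frontier.pop()
        for pR, nR, a, b, wR in R:
            if pR != qR:
                continue
            for pS, nS, b2, c, wS in S:
                if pS != qS or b2 != b:      # R's output label must match S's input label
                    continue
                key = ((qR, qS), (nR, nS), a, c)
                acc[key] = acc.get(key, 0.0) + wR * wS   # oplus = +, otimes = *
                if (nR, nS) not in visited:
                    visited.add((nR, nS))
                    frontier.append((nR, nS))
    return [(p, n, a, c, w) for (p, n, a, c), w in acc.items()], (iR, iS)

# Toy example (labels invented): R translates a -> b, S translates b -> c.
R = [(0, 1, "dog", "chien", 0.5)]
S = [(0, 1, "chien", "perro", 0.4)]
T, iT = compose(R, S, 0, 0)
print(T)  # [((0, 0), (1, 1), 'dog', 'perro', 0.2)]
```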

SLIDE 22

Composition Example: HMM

[Figure: the two machines to be composed. Top: the four-state observation WFST with edges x_t : j / b_j(x_t) for t = 1, …, 4 and j = 1, 2, 3. Bottom: the three-state transition WFSA with edges j/a_ij]

SLIDE 23

Composition Example: HMM

[Figure: the composed HMM WFST, with states (0,1), (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3), (4,3). Each edge from state (t−1, i) to state (t, j) carries the label x_t : j / a_ij b_j(x_t)]


SLIDE 25

Doing Useful Stuff: The Epsilon Transition

There’s only one more thing you need to do useful stuff: nothing. To be more precise: we can use the label ε (pronounced “epsilon”) to mean “nothing at all.”

SLIDE 26

Example: Epsilon Transitions in the Pronlex

A “pronlex” (pronunciation lexicon) is a WFST that maps from phoneme strings to words. A phoneme string is a sequence of many labels, while a word is just one label. The extra labels on the output side of the WFST all use ε, to mean that they don’t generate any extra output string.
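As a toy illustration of the idea (the phoneme symbols and path here are invented, not the slide’s exact machine), the output string of a pronlex path is just its non-ε output labels:

```python
EPS = "ε"  # the "nothing at all" label

# A toy linear path through a pronlex for the word "cat": three phoneme
# inputs, with the word emitted on one arc and ε on the others.
path = [("[k]", EPS), ("[æ]", EPS), ("[t]", "cat")]

def transduce(path):
    """Collect a path's output string, dropping the ε labels."""
    return [out for phoneme, out in path if out != EPS]

print(transduce(path))  # ['cat']
```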
SLIDE 27

Example Pronlex

[Figure: a pronlex WFST. Edges include [@]:A, [k]:ε, [d]:ε, [D]:ε, [æ]:ε, [O]:ε, [@]:The, [I]:ε, [t]:cat, [g]:dog, [s]:This, and several ε:ε transitions]

SLIDE 28

Example: Speech-to-Text Translation

For example, suppose you have some English speech. You’d like to convert it to French text. Suppose you have an English pronlex, L, that maps English phonemes to words. You also have a translator, G, that maps English words to French words. Then T = L ◦ G maps from English phonemes to French words.

SLIDE 29

Example: Speech-to-Text Translation

[Figure: a composed transducer mapping English phonemes to French words. Edges include [D]:ε, [@]:Un/0.5, [I]:ε, [@]:Le/0.2, [k]:ε, [d]:ε, [s]:Ce/0.2, [d]:ε/0.2, [æ]:ε, [O]:ε]

SLIDE 30

Example: Speech-to-Text Translation

Suppose you have:
- An observer, B, that maps from x_t to j, with weights b_j(x_t).
- An HMM, H, that maps from i and j to phonemes, with weights a_ij.
- A pronlex, L, that maps from phonemes to English words.
- A grammar, G, that maps from English words to French words.

Then the translation of audio frames into French words is given by B ◦ H ◦ L ◦ G.


SLIDE 32

Weighted Finite State Transducers

[Figure: the seven-state WFST with edges The:Le/0.3, A:Un/0.2, A:Un/0.3, This:Ce/0.2, dog:chien/1, dog:chien/0.3, cat:chat/0.7, is:est/0.5, is:a/0.5, very:très/0.2, cute:mignon/0.8, very:très/0.2, hungry:faim/0.8]

A (Weighted) Finite State Transducer (WFST) is a (W)FSA with two labels on every edge: an input label, i ∈ Σ, and an output label, o ∈ Ω.

SLIDE 33

The WFST Composition Algorithm

T = R ◦ S

1. Initialize: The initial state of T is a pair, i_T = (i_R, i_S), encoding the initial states of both R and S.
2. Iterate: Each edge e_T = (e_R, e_S):
   - Starts at p[e_T] = (p[e_R], p[e_S]).
   - Has the edge label i[e_R] : o[e_S].
   - Ends at n[e_T] = (n[e_R], n[e_S]).
   - Has the weight w[e_T] = w[e_R] ⊗ w[e_S], possibly summed (⊕) over nondeterministic (e_R, e_S) pairs.
3. Terminate: A state q_T = (q_R, q_S) is a final state if both q_R and q_S are final states.