Lecture 16: Weighted Finite State Transducers (WFST)

  1. Lecture 16: Weighted Finite State Transducers (WFST)
     Mark Hasegawa-Johnson. All content CC-SA 4.0 unless otherwise specified.
     ECE 417: Multimedia Signal Processing, Fall 2020
     Sections: Review, Semirings, WFSTs, Composition, Epsilon, Summary

  3. Outline
     1. Review: WFSA
     2. Semirings
     3. How to Handle HMMs: The Weighted Finite State Transducer
     4. Composition
     5. Doing Useful Stuff: The Epsilon Transition
     6. Summary

  4. Weighted Finite State Acceptors
     [Figure: a WFSA with states 0 to 6 and edges labeled word/weight: The/0.3, A/0.2, A/0.3, This/0.2, dog/1, dog/0.3, cat/0.7, is/1, very/0.2, cute/0.4, hungry/0.4.]
     An FSA specifies a set of strings. A string is in the set if it corresponds to a valid path from start to end, and not otherwise.
     A WFSA also specifies a probability mass function over the set.

  5. Every Markov Model is a WFSA
     [Figure: a three-state WFSA (states 1, 2, 3) with edges labeled label/weight: 1/$a_{11}$, 1/$a_{12}$, 1/$a_{13}$, 2/$a_{21}$, 2/$a_{22}$, 2/$a_{23}$, 3/$a_{31}$, 3/$a_{32}$, 3/$a_{33}$.]
     A Markov model (but not an HMM!) may be interpreted as a WFSA: just assign a label to each edge. The label might just be the state number, or it might be something more useful.
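As a rough illustration of the idea above, a transition matrix can be turned into a list of WFSA edges. This is only a sketch: the `(prev, label, next, weight)` tuple layout and the function name `markov_to_wfsa` are hypothetical, the matrix values are made up, and labeling each edge with its predecessor state number is an assumption read off the edge labels listed for the figure (for example 1/$a_{13}$).

```python
def markov_to_wfsa(A):
    """Turn a transition matrix into WFSA edges.

    A[i][j] is the probability of moving from state i+1 to state j+1.
    Each edge is a (prev, label, next, weight) tuple; the label is the
    predecessor state number (an assumption, see the lead-in).
    """
    edges = []
    for i, row in enumerate(A, start=1):
        for j, a_ij in enumerate(row, start=1):
            if a_ij > 0:  # zero-probability transitions get no edge
                edges.append((i, str(i), j, a_ij))
    return edges


# Illustrative 3-state transition matrix (each row sums to 1).
A = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]
edges = markov_to_wfsa(A)
```

Any other labeling (phone names, words, pdfids) would only change the second field of each tuple.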

  6. Best-Path Algorithm for a WFSA
     Given:
     - Input string, $S = [s_1, \ldots, s_T]$. For example, the string "A dog is very very hungry" has $T = 6$ words.
     - Edges, $e$, each have predecessor state $p[e] \in Q$, next state $n[e] \in Q$, weight $w[e] \in \mathbb{R}$ and label $\ell[e] \in \Sigma$.
     Initialize:
     $$\delta_0(i) = \begin{cases} \bar{1} & i = \text{initial state} \\ \bar{0} & \text{otherwise} \end{cases}$$
     Iterate:
     $$\delta_t(j) = \underset{e:\, n[e]=j,\ \ell[e]=s_t}{\operatorname{best}}\ \delta_{t-1}(p[e]) \otimes w[e]$$
     $$\psi_t(j) = \underset{e:\, n[e]=j,\ \ell[e]=s_t}{\operatorname{argbest}}\ \delta_{t-1}(p[e]) \otimes w[e]$$
     Backtrace:
     $$e^*_t = \psi_t(q^*_t), \qquad q^*_t = p[e^*_{t+1}]$$
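The recursion above can be sketched in a few lines of Python. This sketch assumes weights are negative log probabilities, so that ⊗ is `+` and "best" is `min`; the `Edge` tuple, the function name `best_path`, and the toy acceptor are all hypothetical and are not the acceptor drawn on the slides.

```python
import math
from typing import NamedTuple


class Edge(NamedTuple):
    prev: int      # p[e]
    next: int      # n[e]
    label: str     # l[e]
    weight: float  # w[e], a negative log probability


def best_path(edges, initial, finals, symbols):
    """Return (cost, edges) of the cheapest accepting path, or (inf, [])."""
    states = {e.prev for e in edges} | {e.next for e in edges} | {initial}
    # delta_0: "one" (0.0) at the initial state, "zero" (inf) elsewhere.
    delta = {q: (0.0 if q == initial else math.inf) for q in states}
    backptrs = []  # backptrs[t-1][j] is psi_t(j), the best edge into j
    for s in symbols:
        new_delta = {q: math.inf for q in states}
        psi = {}
        for e in edges:
            if e.label != s:
                continue
            cand = delta[e.prev] + e.weight  # delta_{t-1}(p[e]) (x) w[e]
            if cand < new_delta[e.next]:     # keep the best candidate
                new_delta[e.next] = cand
                psi[e.next] = e
        delta = new_delta
        backptrs.append(psi)
    q = min(finals, key=lambda f: delta.get(f, math.inf))
    cost = delta.get(q, math.inf)
    if math.isinf(cost):
        return math.inf, []
    path = []
    for psi in reversed(backptrs):  # backtrace: q*_{t-1} = p[e*_t]
        e = psi[q]
        path.append(e)
        q = e.prev
    path.reverse()
    return cost, path


def nl(p):
    """Negative log of a probability."""
    return -math.log(p)


# Hypothetical toy acceptor with two competing "A" edges.
toy = [Edge(0, 1, "A", nl(0.2)), Edge(0, 2, "A", nl(0.3)),
       Edge(1, 3, "dog", nl(1.0)), Edge(2, 3, "dog", nl(0.3)),
       Edge(3, 4, "is", nl(1.0)), Edge(4, 4, "very", nl(0.2)),
       Edge(4, 5, "hungry", nl(0.4))]
cost, path = best_path(toy, 0, {5}, "A dog is very very hungry".split())
```

In this toy acceptor the path through state 1 wins, because 0.2 · 1.0 is larger than 0.3 · 0.3 once the "dog" edge is taken into account.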

  7. Determinization
     A WFSA is said to be deterministic if, for any given (predecessor state $p[e]$, label $\ell[e]$), there is at most one such edge.
     For example, this WFSA is not deterministic.
     [Figure: the same WFSA as before, with states 0 to 6 and edges The/0.3, A/0.2, A/0.3, This/0.2, dog/1, dog/0.3, cat/0.7, is/1, very/0.2, cute/0.4, hungry/0.4.]
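The definition above translates directly into a check: count how many edges share each (predecessor state, label) pair. The sketch below assumes edges are stored as hypothetical `(prev, label, next, weight)` tuples; the small edge list is illustrative and its topology is an assumption, not a transcription of the figure.

```python
from collections import Counter


def nondeterministic_pairs(edges):
    """Return the (prev_state, label) pairs that appear on more than one edge."""
    counts = Counter((prev, label) for prev, label, _next, _weight in edges)
    return sorted(pair for pair, n in counts.items() if n > 1)


def is_deterministic(edges):
    """A WFSA is deterministic when no (prev_state, label) pair repeats."""
    return not nondeterministic_pairs(edges)


# Illustrative edges: two "A" edges leave state 0, so this is not deterministic.
example = [(0, "The", 1, 0.3), (0, "A", 1, 0.2), (0, "A", 2, 0.3),
           (1, "dog", 3, 1.0), (2, "dog", 3, 0.3)]
offenders = nondeterministic_pairs(example)
```

The two "dog" edges do not count as a conflict here because they leave different states.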

  8. How to Determinize a WFSA
     The only general algorithm for determinizing a WFSA is the following exponential-time algorithm.
     For every state in $A$, for every set of edges $e_1, \ldots, e_K$ that all have the same label:
     - Create a new edge, $e$, with weight $w[e] = w[e_1] \oplus \cdots \oplus w[e_K]$.
     - Create a brand new successor state $n[e]$.
     - For every edge leaving any of the original successor states $n[e_k]$, $1 \le k \le K$, whose label is unique: copy it to $n[e]$, and $\otimes$ its weight by $w[e_k] / w[e]$.
     - For every set of edges leaving $n[e_k]$ that all have the same label: recurse!

  10. Semirings
     A semiring is a set of numbers over which it is possible to define two operators, $\otimes$ and $\oplus$, and identity elements $\bar{1}$ and $\bar{0}$.
     - The Probability Semiring is the set of non-negative real numbers $\mathbb{R}_+$, with $\otimes = \cdot$, $\oplus = +$, $\bar{1} = 1$, and $\bar{0} = 0$.
     - The Log Semiring is the extended reals $\mathbb{R} \cup \{\infty\}$, with $\otimes = +$, $a \oplus b = -\operatorname{logsumexp}(-a, -b)$, $\bar{1} = 0$, and $\bar{0} = \infty$.
     - The Tropical Semiring is just the log semiring, but with $\oplus = \min$. In other words, instead of adding the probabilities of two paths, we choose the best path: $a \oplus b = \min(a, b)$.
     Mohri et al. (2001) formalize it like this: a semiring is $\mathcal{K} = (\mathbb{K}, \oplus, \otimes, \bar{0}, \bar{1})$, where $\mathbb{K}$ is a set of numbers.
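To make the three semirings concrete, here is a minimal Python sketch. The `Semiring` container and the helper `neg_log_add` are hypothetical names introduced only for this illustration; weights in the log and tropical semirings are taken to be negative log probabilities.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # the "oplus" operator
    times: Callable[[float, float], float]  # the "otimes" operator
    zero: float                             # identity element for plus
    one: float                              # identity element for times


def neg_log_add(a: float, b: float) -> float:
    """Compute -log(exp(-a) + exp(-b)) in a numerically stable way."""
    if a == math.inf:
        return b
    if b == math.inf:
        return a
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))


PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
LOG = Semiring(neg_log_add, lambda a, b: a + b, math.inf, 0.0)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)

# Two paths with probabilities 0.3 and 0.2, expressed as negative logs.
a, b = -math.log(0.3), -math.log(0.2)
summed = LOG.plus(a, b)      # negative log of the total probability, 0.5
best = TROPICAL.plus(a, b)   # negative log of the better path, 0.3
```

The only difference between `LOG` and `TROPICAL` is the `plus` operator, which is what switches an algorithm from summing over paths to picking the best one.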

  12. Weighted Finite State Transducers
     [Figure: a WFST with states 0 to 7 and edges labeled input:output/weight: The:Le/0.3, A:Un/0.2, A:Un/0.3, This:Ce/0.2, dog:chien/1, dog:chien/0.3, cat:chat/0.7, is:est/0.5, is:a/0.5, very:très/0.2 (twice), cute:mignon/0.8, hungry:faim/0.8.]
     A (Weighted) Finite State Transducer (WFST) is a (W)FSA with two labels on every edge:
     - An input label, $i \in \Sigma$, and
     - An output label, $o \in \Omega$.

  13. What it's for
     An FST specifies a mapping between two sets of strings.
     - The input set is $\mathcal{I} \subset \Sigma^*$, where $\Sigma^*$ is the set of all strings containing zero or more letters from the alphabet $\Sigma$.
     - The output set is $\mathcal{O} \subset \Omega^*$.
     - For every $\vec{i} = [i_1, \ldots, i_T] \in \mathcal{I}$, the FST specifies one or more possible translations $\vec{o} = [o_1, \ldots, o_T] \in \mathcal{O}$.
     A WFST also specifies a probability mass function over the translations. The example on the previous slide was normalized to compute a joint pmf $p(\vec{i}, \vec{o})$, but other WFSTs might be normalized to compute a conditional pmf $p(\vec{o} \mid \vec{i})$, or something else.

  14. Normalizing for Conditional Probability
     Here is a WFST whose weights are normalized to compute $p(\vec{o} \mid \vec{i})$:
     [Figure: a WFST with states 0 to 7 and edges labeled input:output/weight: The:Le/1, A:Un/1 (twice), This:Ce/1, dog:chien/1 (twice), cat:chat/0.9, cat:félin/0.1, is:est/0.5, is:a/0.5, very:très/1 (twice), cute:mignon/1, hungry:faim/1.]

  15. Normalizing for Conditional Probability
     Normalizing for conditional probability allows us to separately represent the two parts of a hidden Markov model.
     1. The transition probabilities, $a_{ij}$, are the weights on a WFSA.
     2. The observation probabilities, $b_j(\vec{x}_t)$, are the weights on a WFST.

  16. WFSA: Symbols on the edges are called PDFIDs
     It is no longer useful to say that "the labels on the edges are the state numbers." Instead, let's call them pdfids.
     [Figure: the three-state WFSA again, with edges labeled pdfid/weight: 1/$a_{11}$, 1/$a_{12}$, 1/$a_{13}$, 2/$a_{21}$, 2/$a_{22}$, 2/$a_{23}$, 3/$a_{31}$, 3/$a_{32}$, 3/$a_{33}$.]

  17. Observation Probabilities as Conditional Edge Weights
     Now we can create a new WFST whose output symbols are pdfids $j$, whose input symbols are observations $\vec{x}_t$, and whose weights are the observation probabilities, $b_j(\vec{x}_t)$.
     [Figure: a WFST with states 0 to 4 and edges labeled $\vec{x}_t : j \,/\, b_j(\vec{x}_t)$ for $t = 1, \ldots, 4$ and $j = 1, 2, 3$.]

  18. Hooray! We've almost re-created the HMM!
     So far we have:
     - You can create a WFSA whose weights are the transition probabilities.
     - You can create a WFST whose weights are the observation probabilities.
     Here are the problems:
     1. How can we combine them?
     2. Even if we could combine them, can this do anything that an HMM couldn't already do?

  20. Composition
     The main reason to use WFSTs is an operator called "composition." Suppose you have
     1. A WFST, $R$, that translates strings $a \in \mathcal{A}$ into strings $b \in \mathcal{B}$ with joint probability $p(a, b)$.
     2. Another WFST, $S$, that translates strings $b \in \mathcal{B}$ into strings $c \in \mathcal{C}$ with conditional probability $p(c \mid b)$.
     The operation $T = R \circ S$ gives you a WFST, $T$, that translates strings $a \in \mathcal{A}$ into strings $c \in \mathcal{C}$ with joint probability
     $$p(a, c) = \sum_{b \in \mathcal{B}} p(a, b)\, p(c \mid b)$$
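The sum over intermediate strings $b$ can be checked numerically on a tiny example. The sketch below works directly on probability tables rather than on transducers; the function name `compose_pmfs` and all numbers are made up for illustration.

```python
from collections import defaultdict


def compose_pmfs(p_ab, p_c_given_b):
    """Compute p(a, c) = sum over b of p(a, b) * p(c | b).

    p_ab maps (a, b) pairs to joint probabilities.
    p_c_given_b maps b to a dict of conditional probabilities over c.
    """
    p_ac = defaultdict(float)
    for (a, b), p in p_ab.items():
        for c, q in p_c_given_b.get(b, {}).items():
            p_ac[(a, c)] += p * q
    return dict(p_ac)


# Illustrative tables: the joint sums to 1, each conditional row sums to 1.
p_ab = {("a1", "b1"): 0.4, ("a1", "b2"): 0.1,
        ("a2", "b1"): 0.2, ("a2", "b2"): 0.3}
p_c_given_b = {"b1": {"c1": 0.9, "c2": 0.1},
               "b2": {"c1": 0.5, "c2": 0.5}}
p_ac = compose_pmfs(p_ab, p_c_given_b)
```

For instance, $p(a_1, c_1) = 0.4 \cdot 0.9 + 0.1 \cdot 0.5 = 0.41$, and the four entries of the result again sum to 1.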

  21. The WFST Composition Algorithm
     1. Initialize: The initial state of $T$ is a pair, $i_T = (i_R, i_S)$, encoding the initial states of both $R$ and $S$.
     2. Iterate: While there is any state $q_T = (q_R, q_S)$ with edges $(e_R = a{:}b,\ e_S = b{:}c)$ that have not yet been copied to $e_T$,
        1. Create a new edge $e_T$ with next state $n[e_T] = (n[e_R], n[e_S])$ and labels $i[e_T] : o[e_T] = i[e_R] : o[e_S] = a : c$.
        2. If an edge with the same $n[e_T]$, $i[e_T]$, and $o[e_T]$ already exists, then update its weight: $w[e_T] = w[e_T] \oplus (w[e_R] \otimes w[e_S])$.
        3. If not, create a new edge with $w[e_T] = w[e_R] \otimes w[e_S]$.
     3. Terminate: A state $q_T = (q_R, q_S)$ is a final state if both $q_R$ and $q_S$ are final states.
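A minimal Python sketch of these steps follows, under several assumptions: weights live in the probability semiring (⊗ is `*`, ⊕ is `+`), there are no epsilon labels, and only state pairs reachable from the initial pair are expanded. The `Arc` tuple and the function name `compose` are hypothetical, and the two toy transducers are made up.

```python
from collections import deque
from typing import NamedTuple


class Arc(NamedTuple):
    prev: object   # p[e]
    ilabel: str    # i[e]
    olabel: str    # o[e]
    weight: float  # w[e], a probability
    next: object   # n[e]


def compose(arcs_r, init_r, finals_r, arcs_s, init_s, finals_s):
    """Return (arcs, initial, finals) of T = R o S, with states (q_R, q_S)."""
    out_r, out_s = {}, {}
    for e in arcs_r:
        out_r.setdefault(e.prev, []).append(e)
    for e in arcs_s:
        out_s.setdefault(e.prev, []).append(e)

    init = (init_r, init_s)               # step 1: initial state is a pair
    weights = {}                          # (prev, ilabel, olabel, next) -> weight
    seen, queue = {init}, deque([init])
    while queue:                          # step 2: expand every reachable pair
        q = queue.popleft()
        q_r, q_s = q
        for e_r in out_r.get(q_r, []):
            for e_s in out_s.get(q_s, []):
                if e_r.olabel != e_s.ilabel:   # need e_R = a:b and e_S = b:c
                    continue
                nxt = (e_r.next, e_s.next)
                key = (q, e_r.ilabel, e_s.olabel, nxt)
                w = e_r.weight * e_s.weight    # w[e_R] (x) w[e_S]
                # Merge with an existing identical edge using (+), else create it.
                weights[key] = weights.get(key, 0.0) + w
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    arcs = [Arc(p, i, o, w, n) for (p, i, o, n), w in weights.items()]
    # Step 3: a pair is final when both of its components are final.
    finals = {q for q in seen if q[0] in finals_r and q[1] in finals_s}
    return arcs, init, finals


# Toy transducers: R maps "a" to "b1" or "b2"; S maps both to "c".
R = [Arc(0, "a", "b1", 0.4, 1), Arc(0, "a", "b2", 0.6, 1)]
S = [Arc(0, "b1", "c", 0.5, 1), Arc(0, "b2", "c", 0.5, 1)]
arcs, init, finals = compose(R, 0, {1}, S, 0, {1})
```

In the toy example the two matching edge pairs collapse into a single `a:c` edge whose weight is 0.4 · 0.5 + 0.6 · 0.5. The merge key here also includes the predecessor pair, which the slide leaves implicit.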
