Lecture 16: Weighted Finite State Transducers (WFST) Mark - PowerPoint PPT Presentation

Review Semirings WFSTs Composition Epsilon Summary Lecture 16: Weighted Finite State Transducers (WFST) Mark Hasegawa-Johnson All content CC-SA 4.0 unless otherwise specified. ECE 417: Multimedia Signal Processing, Fall 2020

Outline
1. Review: WFSA
2. Semirings
3. How to Handle HMMs: The Weighted Finite State Transducer
4. Composition
5. Doing Useful Stuff: The Epsilon Transition
6. Summary

Weighted Finite State Acceptors
[Figure: a WFSA with states 0 through 6 whose edges carry word/weight labels: The/0.3, A/0.2, A/0.3, This/0.2, dog/1, dog/0.3, cat/0.7, is/1, very/0.2, cute/0.4, hungry/0.4.]
An FSA specifies a set of strings. A string is in the set if it corresponds to a valid path from start to end, and not otherwise.
A WFSA also specifies a probability mass function over the set.

Every Markov Model is a WFSA
[Figure: a three-state Markov model with states 1, 2, 3 and edges labeled $1/a_{11}$, $1/a_{12}$, $1/a_{13}$, $2/a_{21}$, $2/a_{22}$, $2/a_{23}$, $3/a_{31}$, $3/a_{32}$, $3/a_{33}$.]
A Markov Model (but not an HMM!) may be interpreted as a WFSA: just assign a label to each edge. The label might just be the state number, or it might be something more useful.
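
The edge-labeling idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Edge` tuple and the function `markov_to_wfsa` are hypothetical names, the transition matrix is made up, $a_{ij}$ is taken to be the probability of moving from state $i$ to state $j$, and each edge is labeled with its predecessor state number, as the edge labels in the figure suggest.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    prev: int      # predecessor state p[e]
    next: int      # next state n[e]
    label: int     # label l[e]; assumed here to be the predecessor state number
    weight: float  # weight w[e]; here the transition probability a_ij


def markov_to_wfsa(A: List[List[float]]) -> List[Edge]:
    """Turn a transition matrix into a list of WFSA edges (states numbered from 1)."""
    edges = []
    for i, row in enumerate(A, start=1):
        for j, a_ij in enumerate(row, start=1):
            if a_ij > 0:  # a zero-probability transition is simply a missing edge
                edges.append(Edge(prev=i, next=j, label=i, weight=a_ij))
    return edges


# Made-up 3-state transition matrix; each row sums to 1.
A = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]
wfsa_edges = markov_to_wfsa(A)
```

Any other labeling (for example, a word or a phone symbol per edge) would only change the `label` field.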

Best-Path Algorithm for a WFSA
Given:
- Input string, $S = [s_1, \ldots, s_T]$. For example, the string "A dog is very very hungry" has $T = 6$ words.
- Edges, $e$, each have predecessor state $p[e] \in Q$, next state $n[e] \in Q$, weight $w[e] \in \mathbb{R}$ and label $\ell[e] \in \Sigma$.
Initialize:
$$\delta_0(i) = \begin{cases} \bar{1} & i = \text{initial state} \\ \bar{0} & \text{otherwise} \end{cases}$$
Iterate:
$$\delta_t(j) = \operatorname*{best}_{e:\, n[e]=j,\ \ell[e]=s_t} \delta_{t-1}(p[e]) \otimes w[e]$$
$$\psi_t(j) = \operatorname*{argbest}_{e:\, n[e]=j,\ \ell[e]=s_t} \delta_{t-1}(p[e]) \otimes w[e]$$
Backtrace:
$$e^*_t = \psi_t(q^*_t), \qquad q^*_t = p[e^*_{t+1}]$$
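
The recursion above can be sketched as follows. This is a minimal illustration, not the lecture's reference implementation: it assumes probability weights with "best" meaning max, $\otimes$ meaning multiplication, $\bar{1} = 1$ and $\bar{0} = 0$. The function name `best_path`, the tuple layout, and the toy edge list are hypothetical; the toy WFSA is only loosely modeled on the figure, since the figure's exact topology is not recoverable here.

```python
from typing import Dict, List, Optional, Set, Tuple

# An edge is (p[e], n[e], label, w[e]).
Edge = Tuple[int, int, str, float]


def best_path(edges: List[Edge], initial: int, finals: Set[int],
              s: List[str]) -> Tuple[float, Optional[List[Edge]]]:
    """Return (score, edge sequence) of the best path accepting s, or (0.0, None)."""
    delta: Dict[int, float] = {initial: 1.0}   # delta_0: one-bar at the initial state
    psi: List[Dict[int, Edge]] = []            # psi[t-1][j] = best edge entering j at time t
    for sym in s:
        new_delta: Dict[int, float] = {}
        back: Dict[int, Edge] = {}
        for e in edges:
            p, n, label, w = e
            if label != sym or p not in delta:
                continue
            score = delta[p] * w               # delta_{t-1}(p[e]) (x) w[e]
            if score > new_delta.get(n, 0.0):  # "best" = max
                new_delta[n] = score
                back[n] = e
        delta = new_delta
        psi.append(back)
    reachable_finals = [q for q in delta if q in finals]
    if not reachable_finals:
        return 0.0, None
    q = max(reachable_finals, key=lambda j: delta[j])
    score = delta[q]
    path: List[Edge] = []
    for back in reversed(psi):                 # backtrace: q_t = p[e_{t+1}]
        e = back[q]
        path.append(e)
        q = e[0]
    path.reverse()
    return score, path


# Hypothetical toy WFSA.
toy_edges: List[Edge] = [
    (0, 1, "A", 0.2), (0, 2, "A", 0.3),
    (1, 3, "dog", 1.0), (2, 3, "dog", 0.3), (2, 3, "cat", 0.7),
    (3, 4, "is", 1.0), (4, 4, "very", 0.2),
    (4, 5, "cute", 0.4), (4, 5, "hungry", 0.4),
]
score, path = best_path(toy_edges, 0, {5}, "A dog is very very hungry".split())
```

On this toy machine the two "A" edges compete, and the backtrace recovers which one lies on the best path. Swapping in negative log weights with min and + would give the tropical-semiring version of the same loop.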

Determinization
A WFSA is said to be deterministic if, for any given pair (predecessor state $p[e]$, label $\ell[e]$), there is at most one such edge. For example, this WFSA is not deterministic.
[Figure: the same WFSA as before, with states 0 through 6 and edges The/0.3, A/0.2, A/0.3, This/0.2, dog/1, dog/0.3, cat/0.7, is/1, very/0.2, cute/0.4, hungry/0.4.]

How to Determinize a WFSA
The only general algorithm for determinizing a WFSA is the following exponential-time algorithm:
For every state in $A$, for every set of edges $e_1, \ldots, e_K$ that all have the same label:
- Create a new edge, $e$, with weight $w[e] = w[e_1] \oplus \cdots \oplus w[e_K]$.
- Create a brand new successor state $n[e]$.
- For every edge leaving any of the original successor states $n[e_k]$, $1 \le k \le K$, whose label is unique: copy it to $n[e]$, and $\otimes$ its weight by $w[e_k] / w[e]$.
- For every set of edges leaving $n[e_k]$ that all have the same label: recurse!
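
The first step of the algorithm above, merging same-label edges at one state, can be sketched as follows. This covers a single merge only, not the full recursion, and it assumes the probability semiring (so $\oplus$ is + and the ratio $w[e_k]/w[e]$ is ordinary division). The function name `merge_same_label` and the edge list are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# An outgoing edge of one state: (label, next_state, weight).
OutEdge = Tuple[str, int, float]


def merge_same_label(out_edges: List[OutEdge]) -> Dict[str, Tuple[float, Dict[int, float]]]:
    """Group one state's outgoing edges by label.

    Returns {label: (w[e], {n[e_k]: w[e_k] / w[e]})}: the merged weight, and the
    residual factor that edges copied from each original successor must carry.
    """
    groups: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for label, nxt, w in out_edges:
        groups[label].append((nxt, w))
    merged = {}
    for label, members in groups.items():
        total = sum(w for _, w in members)          # w[e] = w[e_1] (+) ... (+) w[e_K]
        residual: Dict[int, float] = defaultdict(float)
        for nxt, w in members:
            residual[nxt] += w / total              # w[e_k] / w[e]
        merged[label] = (total, dict(residual))
    return merged


# Hypothetical edges leaving a start state; two of them share the label "A".
merged = merge_same_label([("The", 1, 0.3), ("A", 1, 0.2), ("A", 2, 0.3), ("This", 2, 0.2)])
```

Here the two "A" edges collapse into one edge whose weight is their sum, and the residuals record how that weight splits between the two original successors.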

Semirings
A semiring is a set of numbers, over which it's possible to define operators $\otimes$ and $\oplus$, and identity elements $\bar{1}$ and $\bar{0}$.
- The Probability Semiring is the set of non-negative real numbers $\mathbb{R}_+$, with $\otimes = \cdot$, $\oplus = +$, $\bar{1} = 1$, and $\bar{0} = 0$.
- The Log Semiring is the extended reals $\mathbb{R} \cup \{\infty\}$, with $\otimes = +$, $\oplus = -\operatorname{logsumexp}(-,-)$, that is, $a \oplus b = -\log\left(e^{-a} + e^{-b}\right)$, $\bar{1} = 0$, and $\bar{0} = \infty$.
- The Tropical Semiring is just the log semiring, but with $\oplus = \min$. In other words, instead of adding the probabilities of two paths, we choose the best path: $a \oplus b = \min(a, b)$.
Mohri et al. (2001) formalize it like this: a semiring is $\mathcal{K} = \left(\mathbb{K}, \oplus, \otimes, \bar{0}, \bar{1}\right)$, where $\mathbb{K}$ is a set of numbers.
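
As a rough illustration of the three semirings above, here is one way to bundle $\oplus$, $\otimes$, $\bar{0}$ and $\bar{1}$ in Python. The `Semiring` container and the constant names are hypothetical and are not taken from any particular toolkit.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # the (+) operator
    times: Callable[[float, float], float]  # the (x) operator
    zero: float                             # zero-bar, identity for (+)
    one: float                              # one-bar, identity for (x)


def log_plus(a: float, b: float) -> float:
    """a (+) b = -log(exp(-a) + exp(-b)), computed in a numerically stable way."""
    if math.isinf(a):
        return b
    if math.isinf(b):
        return a
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))


PROBABILITY = Semiring(plus=lambda a, b: a + b, times=lambda a, b: a * b, zero=0.0, one=1.0)
LOG = Semiring(plus=log_plus, times=lambda a, b: a + b, zero=math.inf, one=0.0)
TROPICAL = Semiring(plus=min, times=lambda a, b: a + b, zero=math.inf, one=0.0)

# Two path probabilities and their negative logs.
p, q = 0.2, 0.3
a, b = -math.log(p), -math.log(q)
summed = math.exp(-LOG.plus(a, b))   # the log semiring adds the two probabilities
best = math.exp(-TROPICAL.plus(a, b))  # the tropical semiring keeps the larger one
```

With this layout, an algorithm written against `plus` and `times` can switch between summing over paths and picking the best path just by changing the semiring it is given.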

Weighted Finite State Transducers
[Figure: a WFST with states 0 through 7 whose edges carry input:output/weight labels: The:Le/0.3, A:Un/0.2, A:Un/0.3, This:Ce/0.2, dog:chien/1, dog:chien/0.3, cat:chat/0.7, is:est/0.5, is:a/0.5, very:très/0.2 (twice), cute:mignon/0.8, hungry:faim/0.8.]
A (Weighted) Finite State Transducer (WFST) is a (W)FSA with two labels on every edge:
- An input label, $i \in \Sigma$, and
- An output label, $o \in \Omega$.

What it's for
An FST specifies a mapping between two sets of strings.
- The input set is $\mathcal{I} \subset \Sigma^*$, where $\Sigma^*$ is the set of all strings containing zero or more letters from the alphabet $\Sigma$.
- The output set is $\mathcal{O} \subset \Omega^*$.
- For every $\vec{i} = [i_1, \ldots, i_T] \in \mathcal{I}$, the FST specifies one or more possible translations $\vec{o} = [o_1, \ldots, o_T] \in \mathcal{O}$.
A WFST also specifies a probability mass function over the translations. The example on the previous slide was normalized to compute a joint pmf $p(\vec{i}, \vec{o})$, but other WFSTs might be normalized to compute a conditional pmf $p(\vec{o} \mid \vec{i})$, or something else.

Normalizing for Conditional Probability
Here is a WFST whose weights are normalized to compute $p(\vec{o} \mid \vec{i})$:
[Figure: a WFST with states 0 through 7 whose edges carry input:output/weight labels: The:Le/1, A:Un/1 (twice), This:Ce/1, dog:chien/1 (twice), cat:félin/0.1, cat:chat/0.9, is:est/0.5, is:a/0.5, very:très/1 (twice), cute:mignon/1, hungry:faim/1.]

Normalizing for Conditional Probability
Normalizing for conditional probability allows us to separately represent the two parts of a hidden Markov model.
1. The transition probabilities, $a_{ij}$, are the weights on a WFSA.
2. The observation probabilities, $b_j(\vec{x}_t)$, are the weights on a WFST.

WFSA: Symbols on the edges are called PDFIDs
It is no longer useful to say that "the labels on the edges are the state numbers." Instead, let's call them pdfids.
[Figure: the three-state WFSA with states 1, 2, 3 and edges labeled pdfid/weight: $1/a_{11}$, $1/a_{12}$, $1/a_{13}$, $2/a_{21}$, $2/a_{22}$, $2/a_{23}$, $3/a_{31}$, $3/a_{32}$, $3/a_{33}$.]

Observation Probabilities as Conditional Edge Weights
Now we can create a new WFST whose output symbols are pdfids $j$, whose input symbols are observations, $\vec{x}_t$, and whose weights are the observation probabilities, $b_j(\vec{x}_t)$.
[Figure: a WFST with states 0 through 4; the edges for frame $t$ are labeled $\vec{x}_t : j / b_j(\vec{x}_t)$, for $j = 1, 2, 3$ and $t = 1, \ldots, 4$.]
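
A small sketch of how such an observation WFST might be built from a table of likelihoods follows. Everything named here is an assumption for illustration: the `Arc` tuple, the function `observation_wfst`, the convention that frame $t$ connects state $t-1$ to state $t$, the use of the frame index $t$ as a stand-in for the input symbol $\vec{x}_t$, and the made-up likelihood values.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    prev: int      # predecessor state (assumed to be t - 1)
    next: int      # next state (assumed to be t)
    ilabel: int    # input symbol: frame index t, standing in for the observation x_t
    olabel: int    # output symbol: pdfid j
    weight: float  # observation probability b_j(x_t)


def observation_wfst(B: List[List[float]]) -> List[Arc]:
    """B[t-1][j-1] holds b_j(x_t). Returns one arc per (frame, pdfid) pair."""
    arcs = []
    for t, frame in enumerate(B, start=1):
        for j, b_jt in enumerate(frame, start=1):
            arcs.append(Arc(prev=t - 1, next=t, ilabel=t, olabel=j, weight=b_jt))
    return arcs


# Made-up likelihoods for T = 4 frames and 3 pdfids.
B = [[0.7, 0.2, 0.1],
     [0.6, 0.3, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.2, 0.7]]
obs_arcs = observation_wfst(B)
```

The result is a chain of states with three parallel arcs per frame, one per pdfid, which is the shape the figure describes.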

Hooray! We've almost re-created the HMM!
So far we have:
- You can create a WFSA whose weights are the transition probabilities.
- You can create a WFST whose weights are the observation probabilities.
Here are the problems:
1. How can we combine them?
2. Even if we could combine them, can this do anything that an HMM couldn't already do?

Composition
The main reason to use WFSTs is an operator called "composition." Suppose you have
1. A WFST, $R$, that translates strings $a \in \mathcal{A}$ into strings $b \in \mathcal{B}$ with joint probability $p(a, b)$.
2. Another WFST, $S$, that translates strings $b \in \mathcal{B}$ into strings $c \in \mathcal{C}$ with conditional probability $p(c \mid b)$.
The operation $T = R \circ S$ gives you a WFST, $T$, that translates strings $a \in \mathcal{A}$ into strings $c \in \mathcal{C}$ with joint probability
$$p(a, c) = \sum_{b \in \mathcal{B}} p(a, b)\, p(c \mid b)$$

The WFST Composition Algorithm
1. Initialize: The initial state of $T$ is a pair, $i_T = (i_R, i_S)$, encoding the initial states of both $R$ and $S$.
2. Iterate: While there is any state $q_T = (q_R, q_S)$ with edges $(e_R = a{:}b,\ e_S = b{:}c)$ that have not yet been copied to $e_T$,
   1. Create a new edge $e_T$ with next state $n[e_T] = (n[e_R], n[e_S])$ and labels $i[e_T] : o[e_T] = i[e_R] : o[e_S] = a : c$.
   2. If an edge with the same $n[e_T]$, $i[e_T]$, and $o[e_T]$ already exists, then update its weight: $w[e_T] = w[e_T] \oplus (w[e_R] \otimes w[e_S])$.
   3. If not, create a new edge with $w[e_T] = w[e_R] \otimes w[e_S]$.
3. Terminate: A state $q_T = (q_R, q_S)$ is a final state if both $q_R$ and $q_S$ are final states.
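
The three steps above can be sketched in Python as follows. This is a minimal illustration for epsilon-free machines only. It assumes a hypothetical dictionary layout for a WFST (`start`, `finals`, and `arcs` as `(prev, ilabel, olabel, weight, next)` tuples), defaults to the probability semiring, and uses made-up toy transducers; it is not any toolkit's API.

```python
from collections import defaultdict, deque


def compose(R, S, plus=lambda a, b: a + b, times=lambda a, b: a * b):
    """Compose two epsilon-free WFSTs; states of the result are (q_R, q_S) pairs."""
    out_R, out_S = defaultdict(list), defaultdict(list)
    for arc in R["arcs"]:
        out_R[arc[0]].append(arc)
    for arc in S["arcs"]:
        out_S[arc[0]].append(arc)

    start = (R["start"], S["start"])           # step 1: pair of initial states
    weights = {}                               # (prev, ilabel, olabel, next) -> weight
    finals, seen, queue = set(), {start}, deque([start])
    while queue:                               # step 2: visit each reachable pair once
        q = queue.popleft()
        q_R, q_S = q
        if q_R in R["finals"] and q_S in S["finals"]:
            finals.add(q)                      # step 3: final iff both parts are final
        for (_, a, b, w_R, n_R) in out_R[q_R]:
            for (_, b2, c, w_S, n_S) in out_S[q_S]:
                if b != b2:                    # R's output must match S's input
                    continue
                nxt = (n_R, n_S)
                key = (q, a, c, nxt)
                w = times(w_R, w_S)
                # Existing edge with the same endpoints and labels: (+) the weights.
                weights[key] = plus(weights[key], w) if key in weights else w
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    arcs = [(p, a, c, w, n) for (p, a, c, n), w in weights.items()]
    return {"start": start, "finals": finals, "arcs": arcs}


# Toy example: R holds joint weights p(a, b), S holds conditional weights p(c | b).
R = {"start": 0, "finals": {1},
     "arcs": [(0, "a", "b1", 0.6, 1), (0, "a", "b2", 0.4, 1)]}
S = {"start": 0, "finals": {1},
     "arcs": [(0, "b1", "c", 0.5, 1), (0, "b2", "c", 0.25, 1),
              (0, "b1", "d", 0.5, 1), (0, "b2", "d", 0.75, 1)]}
T = compose(R, S)
T_weights = {(a, c): w for (_, a, c, w, _) in T["arcs"]}
```

In the toy example the intermediate symbols b1 and b2 disappear from the result, and the weight on the a:c edge is the sum over both of them, which is the marginalization that composition is meant to perform. Passing `plus=min` and `times=lambda a, b: a + b` with negative log weights would give the tropical version instead.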
