Lecture 17: Practical WFSTs

Mark Hasegawa-Johnson
All content CC-SA 4.0 unless otherwise specified.
ECE 417: Multimedia Signal Processing, Fall 2020



Outline

1. Review: WFSA
2. Common FSTs in Automatic Speech Recognition
3. Training a Grammar: Laplace Smoothing
4. Composition
5. Topological Sorting
6. Best Path
7. Re-Estimating WFST Transition Weights
8. Summary


Weighted Finite State Acceptors

[Figure: a six-state WFSA whose edges are labeled The/0.3, A/0.2, A/0.3, This/0.2, dog/1, dog/0.3, cat/0.7, is/1, very/0.2, cute/0.4, and hungry/0.4]

An FSA specifies a set of strings. A string is in the set if it corresponds to a valid path from start to end, and not otherwise.

A WFSA also specifies a probability mass function over the set.


Semirings

A semiring is a set of numbers over which it is possible to define operators ⊗ and ⊕, with identity elements 1̄ and 0̄.

The Probability Semiring is the set of non-negative real numbers ℝ₊, with ⊗ = ×, ⊕ = +, 1̄ = 1, and 0̄ = 0.

The Log Semiring is the extended reals ℝ ∪ {∞}, with ⊗ = +, a ⊕ b = −log(exp(−a) + exp(−b)), 1̄ = 0, and 0̄ = ∞.

The Tropical Semiring is just the log semiring, but with ⊕ = min. In other words, instead of adding the probabilities of two paths, we keep only the best path: a ⊕ b = min(a, b).

Mohri et al. (2001) formalize it like this: a semiring is K = (𝕂, ⊕, ⊗, 0̄, 1̄), where 𝕂 is a set of numbers.
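As a concrete illustration (not from the lecture; the class and constant names are our own), the three semirings can be sketched in Python:

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    otimes: Callable[[float, float], float]  # extend a path with one more edge
    oplus: Callable[[float, float], float]   # combine two alternative paths
    one: float   # identity for otimes: weight of the empty path
    zero: float  # identity for oplus: weight of an impossible path

# Probability semiring: weights are probabilities.
PROBABILITY = Semiring(lambda a, b: a * b, lambda a, b: a + b, 1.0, 0.0)

# Log semiring: weights are negative log probabilities.
LOG = Semiring(lambda a, b: a + b,
               lambda a, b: -math.log(math.exp(-a) + math.exp(-b)),
               0.0, math.inf)

# Tropical semiring: like the log semiring, but oplus keeps only the best path.
TROPICAL = Semiring(lambda a, b: a + b, min, 0.0, math.inf)
```

In all three, a path's weight is the ⊗ of its edge weights, and a set of alternative paths combines with ⊕.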

Best-Path Algorithm for a WFSA

Input string: S = [s1, . . . , sK]. For example, the string "A dog is very very hungry" has K = 6 words.

Transitions t each have a predecessor state p[t] ∈ Q, a next state n[t] ∈ Q, a weight w[t] ∈ ℝ, and a label ℓ[t] ∈ Σ.

Initialize with path cost either 1̄ or 0̄:

δ0(i) = 1̄ if i is the initial state, 0̄ otherwise

Iterate by choosing the best incoming transition:

δk(j) = best_{t: n[t]=j, ℓ[t]=sk} δk−1(p[t]) ⊗ w[t]
ψk(j) = argbest_{t: n[t]=j, ℓ[t]=sk} δk−1(p[t]) ⊗ w[t]

Backtrace by reading the best transition from the backpointer:

t*k = ψk+1(q*k+1),  q*k = p[t*k]
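A minimal sketch of this recurrence in Python (our own data layout; probability semiring, so best = max; the transitions and state numbering below are our reading of the six-state example WFSA):

```python
# Each transition is (p, n, w, label): predecessor, next state, weight, label.
TRANS = [
    (1, 2, 0.3, "The"), (1, 2, 0.2, "A"), (1, 3, 0.3, "A"), (1, 3, 0.2, "This"),
    (2, 4, 1.0, "dog"), (3, 4, 0.3, "dog"), (3, 4, 0.7, "cat"),
    (4, 5, 1.0, "is"), (5, 5, 0.2, "very"), (5, 6, 0.4, "cute"), (5, 6, 0.4, "hungry"),
]

def best_path(trans, initial, final, string):
    """Viterbi over a WFSA in the probability semiring (otimes = *, best = max)."""
    states = {q for t in trans for q in (t[0], t[1])}
    delta = {q: (1.0 if q == initial else 0.0) for q in states}  # delta_0
    backpointers = []  # psi_k, one dict per input symbol
    for sym in string:
        new_delta = {q: 0.0 for q in states}
        psi = {}
        for (p, n, w, label) in trans:
            if label == sym and delta[p] * w > new_delta[n]:
                new_delta[n] = delta[p] * w
                psi[n] = (p, n, w, label)
        delta = new_delta
        backpointers.append(psi)
    # Backtrace from the final state.
    path, q = [], final
    for psi in reversed(backpointers):
        t = psi[q]
        path.append(t)
        q = t[0]  # q*_k = p[t*]
    return delta[final], path[::-1]

prob, path = best_path(TRANS, 1, 6, ["A", "dog", "is", "very", "hungry"])
```

Here `prob` is the probability of the best accepting path, and `path` lists its transitions in order.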


Determinization

A WFSA is said to be deterministic if, for any given pair (predecessor state p[e], label ℓ[e]), there is at most one such edge. For example, this WFSA is not deterministic, because its start state has two outgoing edges labeled "A".

[Figure: the six-state WFSA again, with edges The/0.3, A/0.2, A/0.3, This/0.2, dog/1, dog/0.3, cat/0.7, is/1, very/0.2, cute/0.4, hungry/0.4]


Weighted Finite State Transducers

[Figure: a seven-state WFST with edges The:Le/0.3, A:Un/0.2, A:Un/0.3, This:Ce/0.2, dog:chien/1, dog:chien/0.3, cat:chat/0.7, is:est/0.5, is:a/0.5, very:très/0.2, cute:mignon/0.8, very:très/0.2, hungry:faim/0.8]

A (Weighted) Finite State Transducer (WFST) is a (W)FSA with two labels on every transition: an input label i[t] ∈ Σ, and an output label o[t] ∈ Ω.


The WFST Composition Algorithm

C = A ∘ B

States: QC = QA × QB, i.e., qC = (qA, qB).
Initial state: iC = (iA, iB).
Final states: FC = FA × FB.
Input alphabet: ΣC = ΣA. Output alphabet: ΩC = ΩB.
Transitions:

1. Every pair qA ∈ QA, tB ∈ EB with i[tB] = ε creates a transition tC from (qA, p[tB]) to (qA, n[tB]).
2. Every pair tA ∈ EA, qB ∈ QB with o[tA] = ε creates a transition tC from (p[tA], qB) to (n[tA], qB).
3. Every pair tA ∈ EA, tB ∈ EB with o[tA] = i[tB] creates a transition tC from (p[tA], p[tB]) to (n[tA], n[tB]).


The Standard FSTs in Automatic Speech Recognition

1. The observation, O
2. The hidden Markov model, H
3. The context, C
4. The lexicon, L
5. The grammar, G

MP5 will use L and G, so those are the ones you need to pay attention to. At the input we’ll use a transcription T which is basically T = O ◦ H ◦ C, so you won’t need to remember the details of those transducers, just their output.


The observation, O

WFST-based speech recognition begins by turning the speech spectrogram into a WFST. The input alphabet is Σ = the set of acoustic feature vectors. The output alphabet is Ω = {1, . . . , N}, the PDFIDs.

[Figure: a four-frame trellis; at each frame t there are N parallel edges labeled 1/b1(xt), 2/b2(xt), . . . , N/bN(xt), where bn(xt) is the acoustic likelihood of frame xt under PDF n]


The hidden Markov model, H

Input alphabet is Σ = {1, . . . , N}, the set of PDFIDs. Output alphabet, Ω, is a set of context-dependent phone labels, e.g., triphones: o[t] =/#-a+b/ means the sound an /a/ makes when preceded by silence, and followed by /b/.

[Figure: a transducer of left-to-right three-state HMMs, one per triphone; PDFIDs 1–3 map to /#-a+#/, 4–6 to /#-a+a/, and N−2 through N to /#-a+b/, with self-loop edges like 1:ε and exit edges like ε:/#-a+#/]


The Context Transducer, C

Input alphabet, Σ, is context-dependent phone labels, e.g., i[t] = /#-a+#/. Output alphabet, Ω, is context-independent phone labels, e.g., /a/.

[Figure: a transducer mapping triphones to monophones, with edges /a-a+a/:[a], /a-a+#/:[a], /#-#+#/:[#], /#-#+a/:[#], /a-#+a/:[#], /a-#+#/:[a], /#-a+a/:[#], /#-a+#/:[#]]


The Lexicon, L

Input alphabet, Σ, is phone labels, e.g., /@/. Output alphabet, Ω, is words.

[Figure: a transducer spelling out each word as a phone string, with phone-consuming edges [@]:ε, [k]:ε, [d]:ε, [D]:ε, [æ]:ε, [O]:ε, [@]:ε, [I]:ε, [t]:ε, [g]:ε, [s]:ε and word-emitting edges ε:A, ε:cat, ε:dog, ε:The, ε:This]


The Grammar, G

Input alphabet, Σ, is words, and output alphabet, Ω, is also words. Edge weights show p(w).

[Figure: a unigram grammar with self-loop edges a/p(a), about/p(about), above/p(above), of/p(of)]

The Standard WFSTs

H, C, L and G all start in state 0, and end in state 0. That way they can make as many complete loops as necessary. O starts at the beginning of the speech file, and ends at the end, with NO LOOPS. The most important edge weights are in O and G, the acoustic model and language model respectively. The other transducers (H, C, and L) are used to scale up from 10ms (scale of xt) to 400ms (scale of w)


You already know how to train the acoustic model. How can you train the language model?


N-Gram Language Model

An N-gram language model is a model in which the probability of word wN depends on the N − 1 words that went before it: p(wN|context) ≡ p(wN|w1, w2, . . . , wN−1)


Maximum Likelihood N-Grams

Suppose you have some training texts, for example:

Example Training Text: "when telling of nicholas the second the temptation is to start at the dramatic end the july nineteen eighteen massacre of him his entire family his household help and personal physician by which the triumphant communist movement introduced its rule"


Maximum Likelihood N-Grams

The maximum-likelihood estimates of p(w3|w1, w2) are defined to be the estimates that maximize the likelihood of the training data,

L = ∏_{wi ∈ training text} p(wi|wi−2, wi−1),

subject to the constraints that

Σ_{wi} p(wi|wi−2, wi−1) = 1,  p(wi|wi−2, wi−1) ≥ 0


Maximum Likelihood N-Grams

The maximum-likelihood estimate turns out to be

p(wi|wi−2, wi−1) = (# times wi followed wi−2, wi−1) / (# times wi−2, wi−1 appeared in sequence)


Maximum Likelihood N-Grams: Example

In the following text, the bigram probabilities are

p(wi|wi−1 = the) = 0.2 if wi ∈ {second, temptation, dramatic, july, triumphant}, and 0 otherwise.

Example Training Text: "when telling of nicholas the second the temptation is to start at the dramatic end the july nineteen eighteen massacre of him his entire family his household help and personal physician by which the triumphant communist movement introduced its rule"


The Problem with Maximum Likelihood

The problem with maximum likelihood is those zeros. For example, suppose you used this model:

p(wi|wi−1 = the) = 0.2 if wi ∈ {second, temptation, dramatic, july, triumphant}, and 0 otherwise,

but what the person actually said was: "where is the cafeteria?"


Laplace Smoothing

Laplace proposed the following solution: pretend that every word in the vocabulary has occurred at least once in every possible context. This results in the following formula:

p(wi|wi−2, wi−1) = (1 + # times wi followed wi−2, wi−1) / (V + # times wi−2, wi−1 appeared in sequence),

where V is the number of distinct words in the vocabulary.
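For the bigram case (N = 2), the smoothed estimate can be sketched in a few lines of Python; the function name and data layout are our own, and the training text is the example from the earlier slide:

```python
from collections import Counter

def laplace_bigram(tokens):
    """Laplace-smoothed bigram model: p(w2|w1) = (1 + c(w1,w2)) / (V + c(w1)).

    Returns a function p(w2, w1); the vocabulary is the set of distinct tokens.
    """
    vocab = set(tokens)
    pair_count = Counter(zip(tokens, tokens[1:]))   # c(w1, w2)
    ctx_count = Counter(tokens[:-1])                # c(w1), counting only tokens with a successor
    V = len(vocab)
    def p(w2, w1):
        return (1 + pair_count[(w1, w2)]) / (V + ctx_count[w1])
    return p

text = ("when telling of nicholas the second the temptation is to start at "
        "the dramatic end the july nineteen eighteen massacre of him his "
        "entire family his household help and personal physician by which "
        "the triumphant communist movement introduced its rule").split()
p = laplace_bigram(text)
```

Unlike the maximum-likelihood model, p("cafeteria", "the") is now small but nonzero.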


The WFST Composition Algorithm

C = A ∘ B

States: QC = QA × QB, i.e., qC = (qA, qB).
Initial state: iC = (iA, iB).
Final states: FC = FA × FB.
Input alphabet: ΣC = ΣA. Output alphabet: ΩC = ΩB.
Transitions:

1. Every pair qA ∈ QA, tB ∈ EB with i[tB] = ε creates a transition tC from (qA, p[tB]) to (qA, n[tB]).
2. Every pair tA ∈ EA, qB ∈ QB with o[tA] = ε creates a transition tC from (p[tA], qB) to (n[tA], qB).
3. Every pair tA ∈ EA, tB ∈ EB with o[tA] = i[tB] creates a transition tC from (p[tA], p[tB]) to (n[tA], n[tB]).
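The three transition rules can be sketched in Python (our own minimal layout: transitions as (p, n, ilabel, olabel, weight) tuples, ε encoded as None, and weights combined with ⊗ = + as in the log/tropical semirings). The observation/lexicon pair in the usage example is our reading of the two-phoneme example:

```python
EPS = None  # the epsilon label

def compose(trans_a, trans_b):
    """Build the transitions of C = A o B from the three pairing rules.

    Each transition is (p, n, ilabel, olabel, weight); states of C are pairs.
    """
    states_a = {q for t in trans_a for q in (t[0], t[1])}
    states_b = {q for t in trans_b for q in (t[0], t[1])}
    trans_c = []
    # Rule 1: B consumes epsilon while A stays put.
    for qa in states_a:
        for (p, n, i, o, w) in trans_b:
            if i is EPS:
                trans_c.append(((qa, p), (qa, n), EPS, o, w))
    # Rule 2: A emits epsilon while B stays put.
    for (p, n, i, o, w) in trans_a:
        if o is EPS:
            for qb in states_b:
                trans_c.append(((p, qb), (n, qb), i, EPS, w))
    # Rule 3: A's output label matches B's input label.
    for (pa, na, ia, oa, wa) in trans_a:
        for (pb, nb, ib, ob, wb) in trans_b:
            if oa is not EPS and oa == ib:
                trans_c.append(((pa, pb), (na, nb), ia, ob, wa + wb))
    return trans_c

obs = [(0, 1, "@", "@", 0.0), (1, 2, "v", "v", 0.0)]
lex = [("a", "b", "@", EPS, 0.0), ("b", "c", "v", EPS, 0.0),
       ("c", "a", EPS, "a", 0.0), ("c", "a", EPS, "of", 0.0)]
comp = compose(obs, lex)
```

Note that rule 1 pairs every ε-input transition of B with every state of A, which is where the many unconnected transitions discussed on the following slides come from.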


Composition Example

For example, suppose we try to compose this two-phoneme observation with this two-word lexicon:

[Figure: an observation with transitions @:@ and v:v, and a three-state lexicon (states a, b, c) with transitions [@]:ε, [v]:ε, ε:a, and ε:of]


Composition Example

We wind up with the following transducer:

[Figure: nine composed states a0, a1, a2, b0, b1, b2, c0, c1, c2, with edges @:ε, v:ε, three copies of ε:a, and three copies of ε:of]


WFST Composition: Comments

The ε transitions add a lot of transitions that are not connected to anything! This is necessary in order to make sure we get the ε transitions that we actually need. The only way to keep the connected transitions, and eliminate the unconnected ones, is to use a search algorithm to find all the paths through the graph. I recommend: do composition first, then implement the search algorithm as part of topological sorting.


Topological Sorting

A graph is topologically sorted if every transition's end state has a number at least as high as its start state: n[t] ≥ p[t] ∀t.
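This condition is easy to check mechanically; a one-line helper (our own, assuming numbered states and transitions as (p, n) pairs):

```python
def is_toposorted(transitions):
    """True iff every transition's end state number is >= its start state number."""
    return all(n >= p for (p, n) in transitions)
```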


Topological Sorting: Example

This graph is not topologically sorted:

[Figure: the composed transducer with states numbered 1 through 8 and edges @:ε, v:ε, ε:a (×3), ε:of (×3), some of which run from a higher-numbered state to a lower-numbered one]


Topological Sorting: Example

This graph is topologically sorted:

[Figure: the same transducer with its states renumbered (2, 5, 8, 1, 4, 7, 3, 6 in the previous layout's order) so that every edge goes from a lower-numbered state to a higher-numbered one]


Why topologically sort? The following algorithms are all much more efficient if a graph is first topologically sorted: best-path, the forward algorithm, and the backward algorithm.

Why not topologically sort? A graph with cycles cannot be topologically sorted. If your code doesn't use an explored set, you'll wind up in an infinite loop. If your code uses an explored set, then after finishing your topological sort, the graph will still not be topologically sorted (because no topological sort exists).


Topological Sort Algorithm = Breadth-First Search Algorithm = Dijkstra’s Algorithm

Input: WFST A. Output: WFST B, a copy of A with topologically sorted states, and with unconnected paths removed.

Required data structures:
1. A queue called the frontier.
2. A set called the explored set (optional, but useful).
3. A dict A2B: QA → QB.

Initialization:
1. Put iA into the frontier.
2. Create state iB = A2B[iA].


Topological Sort Algorithm = Breadth-First Search (BFS) Algorithm = Dijkstra’s Algorithm

While the frontier is not empty:
1. Shift the next state, pA, off the frontier, and put it in the explored set.
2. For each transition tA starting in pA:
   1. Find its end state nA.
   2. Look up pB = A2B[pA] and nB = A2B[nA]. If nB does not exist, create it.
   3. Create a transition tB from pB to nB.
   4. If nA is not in frontier or explored, put it in frontier.
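A sketch of this loop in Python (our own layout: a WFST given as an initial state plus (p, n, ilabel, olabel, weight) transition tuples; the function name is ours):

```python
from collections import deque

def bfs_toposort(initial, transitions):
    """Renumber states in BFS order from the initial state and drop
    transitions unreachable from it."""
    by_state = {}                      # group transitions by start state
    for t in transitions:
        by_state.setdefault(t[0], []).append(t)
    frontier = deque([initial])
    a2b = {initial: 0}                 # A2B: old state -> new state number
    explored = set()
    out = []
    while frontier:
        pa = frontier.popleft()
        explored.add(pa)
        for (p, n, i, o, w) in by_state.get(pa, []):
            if n not in a2b:
                a2b[n] = len(a2b)      # create nB on first encounter
            out.append((a2b[p], a2b[n], i, o, w))
            if n not in explored and n not in frontier:
                frontier.append(n)
    return out

comp = [((0, "a"), (1, "b"), "@", None, 0.0),
        ((1, "b"), (2, "c"), "v", None, 0.0),
        ((2, "c"), (2, "a"), None, "a", 0.0),
        ((2, "c"), (2, "a"), None, "of", 0.0),
        # unreachable from the start state, so it should be dropped:
        ((1, "c"), (2, "b"), "v", None, 0.0)]
sorted_trans = bfs_toposort((0, "a"), comp)
```

The example input mirrors the composed observation-lexicon transducer: four connected transitions survive with renumbered states, and the unreachable one is dropped.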


Topological Sorting: Example

The BFS algorithm topologically sorts, and also eliminates unconnected transitions, so we end up with:

[Figure: a four-state topologically sorted graph with edges @:ε, v:ε, ε:a, and ε:of]


Best-Path Algorithm for a WFST

Best-path for a WFST is just like for a WFSA, except we no longer have to worry about the input string! We assume that you've already composed O ∘ H ∘ C ∘ L ∘ G and topologically sorted, so that all remaining paths in the graph match the input string. So best-path becomes very simple.

Initialize with path cost either 1̄ or 0̄:

δ(i) = 1̄ if i is the initial state, 0̄ otherwise

Iterate over states, j ∈ Q:

δ(j) = best_{t: n[t]=j} δ(p[t]) ⊗ w[t]
ψ(j) = argbest_{t: n[t]=j} δ(p[t]) ⊗ w[t]

Backtrace by reading the best transition from the backpointer:

t*(j) = ψ(j),  q*(j) = p[t*(j)]


Best-Path Algorithm for a Topologically Sorted WFST

The best-path algorithm is very efficient for a topologically sorted WFST:

1. Sort the transitions in ascending order of their start state.
2. Then step through the transitions in order, checking, for each transition, whether or not δ(p[t]) ⊗ w[t] is better than δ(n[t]). If it is, update δ(n[t]).
3. Topological sort guarantees that all transitions for which j = p[t] are sorted after the transitions for which j = n[t].
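A sketch in Python, in the tropical semiring (weights are surprisals, so ⊗ = + and best = min); the data layout is our own, and transitions (p, n, w) are assumed already sorted by start state. The edge weights in the usage example are chosen to be consistent with the worked example on the following slides, though the graph structure there is our reconstruction:

```python
import math

def best_path_toposorted(transitions, initial, final):
    """One pass over topologically sorted transitions; returns (cost, state path)."""
    states = {q for (p, n, w) in transitions for q in (p, n)}
    delta = {q: (0.0 if q == initial else math.inf) for q in states}
    psi = {}  # backpointer: best incoming transition per state
    for (p, n, w) in transitions:          # ascending order of start state
        if delta[p] + w < delta[n]:
            delta[n] = delta[p] + w
            psi[n] = (p, n, w)
    path, q = [final], final               # backtrace
    while q != initial:
        q = psi[q][0]
        path.append(q)
    return delta[final], path[::-1]

trans = [(0, 1, 1.2), (0, 2, 3.4), (1, 2, 1.8), (1, 3, 0.6), (2, 4, 4.1), (3, 4, 0.7)]
cost, path = best_path_toposorted(trans, 0, 4)
```

Because the transitions are sorted, δ(p[t]) is final by the time transition t is examined, so a single pass suffices.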


Best-Path Example

Suppose this graph now has these surprisal weights:

[Figure: the example graph, with edge weights 1.2, 3.4, 0.6, 1.8, 4.1, and 0.7]


Best-Path Example

Start with δ(0) = 0:

[Figure: the same weighted graph, with δ(0) = 0 marked at the initial state]


Best-Path Example

Update all the states that can be reached from q = 0:

[Figure: the same graph, with δ values 1.2 and 3.4 marked on the states reachable from state 0]


Best-Path Example

Then, states that can be reached from q = 1:

[Figure: the same graph, with δ values 1.2, 3.0, and 1.8 marked]


Best-Path Example

Then, states that can be reached from q = 2:

[Figure: the same graph, with δ values 1.2, 3.0, 1.8, and 7.1 marked]


Best-Path Example

Then, states that can be reached from q = 3:

[Figure: the same graph, with δ values 1.2, 3.0, 1.8, and 2.5 marked]


What do you re-estimate?

Suppose we want to re-estimate the weight of transition t as the conditional probability of t given its preceding state, p[t] = j:

w[t] = p(t|p[t])

A reasonable way to re-estimate this would be

w[t] = E[# times edge t was taken] / E[# times state p[t] = j was reached]


What do you re-estimate?

We don't really want to re-estimate edges in the whole stack, OHCLG = O ∘ H ∘ C ∘ L ∘ G, because O is just one observation file. What we really want is to estimate the edges of a particular transducer, e.g., the lexicon:

w[tL] = E[# times L's edge tL was taken] / E[# times L's state p[tL] = j was reached]
      = Σ_{tOHCLG ⊂ tL} p(tOHCLG) / Σ_{tL: p[tL]=j} Σ_{tOHCLG ⊂ tL} p(tOHCLG)

1. Find the probability of every transition in the full stack, p(tOHCLG).
2. Add over all of the full-stack transitions, tOHCLG, that correspond to lexicon transition tL (notation: tOHCLG ⊂ tL).
3. Divide by the marginal.


Next question: how do we find p(tOHCLG)?


Probability of transition t = Sum of probs of paths including t

[Figure: a transition t from state j to state k]

Use π = [0, 1, . . . , j, k, . . .] to mean a path through the whole transducer. It has partial paths π[: j] = [0, 1, . . . , j] and π[k :] = [k, . . .]. Then

p(t) = Σ_{π includes t} p(π)


WFST Forward-Backward Algorithm

[Figure: a transition t from state j to state k]

p(t) = Σ_{π includes t} p(π) = α(p[t]) w[t] β(n[t]), where

α(j) = Σ_{π[:j]} p(π[: j]) is the probability of reaching state j.

w[t] = p(t|p[t]) is the probability of taking transition t, given that we reached state p[t].

β(k) = Σ_{π[k:]} p(π[k :] | k) is the probability of making it to the end of the WFST, given that we made it to state k.


FST Forward Algorithm

[Figure: transitions t′0 through t′4 entering state j from predecessor states p[t′0], . . . , p[t′4], followed by transition t from j to k]

First, we need to find α(j):

α(j) = Σ_{π[:j]} p(π[: j]) = Σ_{t′: n[t′]=j} α(p[t′]) w[t′]


FST Backward Algorithm

[Figure: transition t from j to k, followed by transitions t′5 through t′9 leaving state k toward next states n[t′5], . . . , n[t′9]]

Then, we need to find β(k):

β(k) = Σ_{π[k:]} p(π[k :]) = Σ_{t′: p[t′]=k} w[t′] β(n[t′])


Re-estimation: putting it all back together

Then we just re-estimate the probability of every transition tL by adding up all the transitions t in OHCLG. If it helps you to remember the idea, we can define a ξ probability, like in HMMs:

ξ(tL) = Σ_{t ⊂ tL} α(p[t]) w[t] β(n[t])

w[tL] = ξ(tL) / Σ_{t′: p[t′]=p[tL]} ξ(t′)
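The two recursions and the transition probability p(t) = α(p[t]) w[t] β(n[t]) can be sketched in Python (our own layout: an acyclic WFST as topologically sorted (p, n, w) transitions in the probability semiring; the graph in the usage example is a made-up two-path illustration, not from the lecture):

```python
def forward_backward(transitions, initial, final):
    """Return p(t) = alpha(p[t]) * w[t] * beta(n[t]) for each transition of an
    acyclic, topologically sorted WFST, in the probability semiring."""
    states = {q for (p, n, w) in transitions for q in (p, n)}
    # Forward: alpha(j) = sum over t' with n[t']=j of alpha(p[t']) w[t'].
    alpha = {q: (1.0 if q == initial else 0.0) for q in states}
    for (p, n, w) in transitions:            # ascending start state
        alpha[n] += alpha[p] * w
    # Backward: beta(k) = sum over t' with p[t']=k of w[t'] beta(n[t']).
    beta = {q: (1.0 if q == final else 0.0) for q in states}
    for (p, n, w) in reversed(transitions):  # descending start state
        beta[p] += w * beta[n]
    return [alpha[p] * w * beta[n] for (p, n, w) in transitions]

trans = [(0, 1, 0.6), (0, 2, 0.4), (1, 3, 0.5), (1, 3, 0.5), (2, 3, 1.0)]
probs = forward_backward(trans, 0, 3)
```

Dividing each p(t) by the total for its start state then gives the re-estimated weights, as in the ξ formula above.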

The Standard FSTs in Automatic Speech Recognition

1. The observation, O, maps acoustic vectors to PDFIDs.
2. The hidden Markov model, H, maps PDFIDs to triphones.
3. The context transducer, C, maps triphones to phones.
4. The lexicon, L, maps phones to words.
5. The grammar, G, computes the probability of a word sequence.

MP5 will use L and G, so those are the ones you need to pay attention to.


Laplace Smoothing: Unigram Language Model

Laplace proposed the following solution: pretend that every word in the vocabulary has occurred at least once. This results in the following formula:

p(w) = (1 + # times w occurred) / (V + # word tokens in training data),

where V is the number of distinct words in the vocabulary.


The WFST Composition Algorithm

C = A ∘ B

States: QC = QA × QB, i.e., qC = (qA, qB).
Initial state: iC = (iA, iB).
Final states: FC = FA × FB.
Input alphabet: ΣC = ΣA. Output alphabet: ΩC = ΩB.
Transitions:

1. Every pair qA ∈ QA, tB ∈ EB with i[tB] = ε creates a transition tC from (qA, p[tB]) to (qA, n[tB]).
2. Every pair tA ∈ EA, qB ∈ QB with o[tA] = ε creates a transition tC from (p[tA], qB) to (n[tA], qB).
3. Every pair tA ∈ EA, tB ∈ EB with o[tA] = i[tB] creates a transition tC from (p[tA], p[tB]) to (n[tA], n[tB]).


Topological Sort Algorithm = Breadth-First Search (BFS) Algorithm = Dijkstra’s Algorithm

While the frontier is not empty:
1. Shift the next state, pA, off the frontier, and put it in the explored set.
2. For each transition tA starting in pA:
   1. Find its end state nA.
   2. Look up pB = A2B[pA] and nB = A2B[nA]. If nB does not exist, create it.
   3. Create a transition tB from pB to nB.
   4. If nA is not in frontier or explored, put it in frontier.


Best-Path Algorithm for a WFST

Best-path for a WFST is just like for a WFSA, except we no longer have to worry about the input string! We assume that you've already composed O ∘ H ∘ C ∘ L ∘ G and topologically sorted, so that all remaining paths in the graph match the input string. So best-path becomes very simple.

Initialize with path cost either 1̄ or 0̄:

δ(i) = 1̄ if i is the initial state, 0̄ otherwise

Iterate over states, j ∈ Q:

δ(j) = best_{t: n[t]=j} δ(p[t]) ⊗ w[t]
ψ(j) = argbest_{t: n[t]=j} δ(p[t]) ⊗ w[t]

Backtrace by reading the best transition from the backpointer:

t*(j) = ψ(j),  q*(j) = p[t*(j)]


Re-estimation

α(j) = Σ_{t′: n[t′]=j} α(p[t′]) w[t′]

β(k) = Σ_{t′: p[t′]=k} w[t′] β(n[t′])

ξ(tL) = Σ_{t ⊂ tL} α(p[t]) w[t] β(n[t])

w[tL] = ξ(tL) / Σ_{t′: p[t′]=p[tL]} ξ(t′)