Pre-midsem Revision Lecture 11
CS 753, Instructor: Preethi Jyothi
Tied-state Triphone Models

State Tying
- Observation probabilities are shared across triphone states which generate acoustically similar data
[Figure: triphone HMMs for p/a/k, b/a/g and b/a/k, shown without sharing and with state tying]
Tied state HMMs
Four main steps in building a tied-state HMM system:
- 1. Create and train 3-state monophone HMMs with single-Gaussian observation probability densities.
- 2. Clone these monophone distributions to initialise a set of untied triphone models. Train them using Baum-Welch estimation. The transition matrix remains common across all triphones of each phone.
- 3. For all triphones derived from the same monophone, cluster states whose parameters should be tied together (a minimal sketch of such a tying map appears after this slide).
- 4. The number of mixture components in each tied state is increased and the models are re-estimated using Baum-Welch.
Image from: Young et al., “Tree-based state tying for high accuracy acoustic modeling”, ACL-HLT, 1994
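The tying in step 3 can be pictured as a lookup from triphone-state IDs to shared output distributions. Below is a minimal, hypothetical sketch (the state names, cluster IDs and 39-dimensional features are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical tying map produced by step 3: acoustically similar triphone
# states point to the same cluster of shared Gaussian parameters.
tie_map = {
    "b-aa+g.s2": "aa_s2_clusterA",
    "p-aa+k.s2": "aa_s2_clusterA",   # tied with b-aa+g.s2
    "b-aa+k.s2": "aa_s2_clusterB",
}

# One single-Gaussian output density (mean, diagonal variance) per tied state.
tied_params = {
    "aa_s2_clusterA": (np.zeros(39), np.ones(39)),
    "aa_s2_clusterB": (np.full(39, 0.5), np.ones(39)),
}

def log_obs_prob(triphone_state, x):
    """log N(x; mu, diag(var)) of the shared Gaussian backing this triphone state."""
    mu, var = tied_params[tie_map[triphone_state]]
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var))

# Tied states return identical scores because they share parameters:
x = np.random.randn(39)
assert log_obs_prob("b-aa+g.s2", x) == log_obs_prob("p-aa+k.s2", x)
```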
Tied state HMMs: Step 2
Image from: Young et al., “Tree-based state tying for high accuracy acoustic modeling”, ACL-HLT, 1994
Clone these monophone distributions to initialise a set of untied triphone models
Tied state HMMs: Step 3
Image from: Young et al., “Tree-based state tying for high accuracy acoustic modeling”, ACL-HLT, 1994
Use decision trees to determine which states should be tied together
Example: Phonetic Decision Tree (DT)
- DT for the center state (ow2) of [ow]: uses all training data tagged with *-ow2+*
- One tree is constructed for each state of each monophone to cluster all the corresponding triphone states
- Head node: aa2/ox2/f2, aa2/ox2/s2, aa2/ox2/d2, h2/ox2/p2, aa2/ox2/n2, aa2/ox2/g2, …
Training data for DT nodes
- Align each training instance x = (x1, …, xT), where xt ∈ ℝd, with a set of triphone HMMs
- Use the Viterbi algorithm to find the best HMM triphone state sequence corresponding to each x
- Tag each xt with the ID of the current phone state along with its left-context and right-context. For example, in the alignment sil-b+aa, b-aa+g, aa-g+sil, a frame xt tagged with ID b2-aa2+g2 is aligned with the second state of the 3-state HMM corresponding to the triphone b-aa+g
- Training data corresponding to state j in phone p: gather all xt's that are tagged with ID *-pj+* (see the sketch after this list)
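A minimal sketch of the "gather all xt's tagged *-pj+*" step, assuming each frame already carries its Viterbi-alignment tag (the tag format and toy data are illustrative):

```python
def root_node_data(tagged_frames, phone, state):
    """Collect frames whose tag has `phone``state` as the centre unit,
    i.e. all frames tagged *-p{j}+* regardless of left/right context."""
    target = f"{phone}{state}"
    selected = []
    for tag, frame in tagged_frames:               # tag looks like "b2-aa2+g2"
        centre = tag.split("-")[1].split("+")[0]   # -> "aa2"
        if centre == target:
            selected.append(frame)
    return selected

# Frames from the alignment sil-b+aa, b-aa+g, aa-g+sil (feature vectors shortened):
frames = [("sil2-b2+aa2", [0.1, 0.2]), ("b2-aa2+g2", [0.3, 0.1]), ("aa2-g2+sil2", [0.0, 0.4])]
print(root_node_data(frames, "aa", 2))             # -> [[0.3, 0.1]]
```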
Example: Phonetic Decision Tree (DT)
- DT for the center state of [ow] (HMM states ow1, ow2, ow3): uses all training data tagged as *-ow2+*
- One tree is constructed for each state of each monophone to cluster all the corresponding triphone states
- Head node: aa2/ox2/f2, aa2/ox2/s2, aa2/ox2/d2, h2/ox2/p2, aa2/ox2/n2, aa2/ox2/g2, …
[Figure: decision tree splitting the head node with binary yes/no context questions ("Is left ctxt a vowel?", "Is right ctxt a fricative?", "Is right ctxt nasal?", "Is right ctxt a glide?"), yielding tied-state leaves such as Leaf A {aa2/ox2/f2, aa2/ox2/s2, …}, Leaf B {aa2/ox2/d2, aa2/ox2/g2, …}, Leaf C {h2/ox2/l2, b2/ox2/r2, …}, Leaf D {h2/ox2/p2, b2/ox2/k2, …} and Leaf E {aa2/ox2/n2, aa2/ox2/m2, …}]
How do we build these phone DTs?
- 1. What questions are used? Linguistically-inspired binary questions: "Does the left or right phone come from a broad class of phones such as vowels, stops, etc.?" or "Is the left or right phone [k] or [m]?"
- 2. What is the training data for each phone state pj (the root node of its DT)? All speech frames that align with the jth state of every triphone HMM that has p as the middle phone
- 3. What criterion is used at each node to find the best question to split the data on? Find the question which partitions the states in the parent node so as to give the maximum increase in log likelihood
Likelihood of a cluster of states
- If a cluster of HMM states S = {s1, s2, …, sM} consists of M states and a total of K acoustic observation vectors {x1, x2, …, xK} are associated with S, then the log likelihood associated with S is:

  L(S) = Σ_{i=1..K} Σ_{s∈S} log Pr(xi; μS, ΣS) γs(xi)

  where γs(xi) is the posterior probability of observation xi being generated by state s
- For a question q that splits S into Syes and Sno, compute the following quantity:

  Δq = L(Sq_yes) + L(Sq_no) − L(S)

- Go through all questions, find Δq for each question q, and choose the question for which Δq is the largest (a small sketch follows below)
- Terminate when the best Δq falls below a threshold or the data associated with a split falls below a threshold
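The split criterion above can be sketched in a few lines. The sketch below makes simplifying assumptions (hard state occupancies, i.e. γs(xi) ∈ {0, 1}, and a single diagonal-covariance Gaussian refit to each cluster); function and variable names are illustrative:

```python
import numpy as np

def cluster_log_likelihood(frames):
    """L(S): log-likelihood of the frames in cluster S under one Gaussian fit to S."""
    X = np.asarray(frames, dtype=float)
    mu, var = X.mean(axis=0), X.var(axis=0) + 1e-6       # diagonal covariance
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)))

def delta_q(frames, yes_mask):
    """Delta_q = L(S_yes) + L(S_no) - L(S) for one candidate question."""
    X = np.asarray(frames, dtype=float)
    yes_mask = np.asarray(yes_mask, dtype=bool)
    yes, no = X[yes_mask], X[~yes_mask]
    if len(yes) == 0 or len(no) == 0:                    # degenerate split
        return float("-inf")
    return (cluster_log_likelihood(yes) + cluster_log_likelihood(no)
            - cluster_log_likelihood(X))

def best_question(frames, questions):
    """questions: dict mapping question name -> boolean mask (True = 'yes' branch)."""
    return max(questions, key=lambda name: delta_q(frames, questions[name]))
```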
WFSTs for ASR

WFST-based ASR System
Acoustic indices → H (acoustic models) → triphones → C (context transducer) → monophones → L (pronunciation model) → words → G (language model) → word sequence
H
- One 3-state HMM for each triphone: a-a+b, a-b+b, …, y-x+z
- Take the FST union of all the triphone HMM transducers, followed by closure; the resulting FST is H (a conceptual sketch appears below)
[Figure: each triphone HMM as a transducer with arcs such as f0:a-a+b, f1:ε, f2:ε, …, f6:ε; union + closure of all of them gives H]
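A conceptual sketch of building H (not OpenFst/Kaldi API; the arc and label conventions are simplified illustrations): each triphone contributes a 3-state left-to-right chain whose first arc outputs the triphone label, the chains are unioned from a common start state, and closure arcs let H accept any sequence of triphones.

```python
def triphone_chain(triphone, first_state):
    """Arcs (src, dst, in_label, out_label, weight) for a 3-state left-to-right HMM."""
    arcs = []
    for k in range(3):
        out = triphone if k == 0 else "<eps>"                    # emit triphone label once
        arcs.append((first_state + k, first_state + k + 1, f"{triphone}.s{k+1}", out, 0.0))
        arcs.append((first_state + k + 1, first_state + k + 1,   # self-loop on the state
                     f"{triphone}.s{k+1}", "<eps>", 0.0))
    return arcs, first_state + 3                                  # last state of the chain

def build_H(triphones):
    start, arcs, next_state = 0, [], 1
    for t in triphones:
        chain, last = triphone_chain(t, next_state)
        arcs.append((start, next_state, "<eps>", "<eps>", 0.0))   # union branch
        arcs += chain
        arcs.append((last, start, "<eps>", "<eps>", 0.0))         # closure: loop back
        next_state = last + 1
    return arcs, start                                            # start state is also final

arcs, start = build_H(["a-a+b", "a-b+b", "y-x+z"])
```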
WFST-based ASR System: C
- C is the context transducer: it maps triphone labels to monophone labels
[Figure: fragment of C with states such as ab, bc, ca, cx and arcs with labels like a-b+c:a, b-c+x:b and b-c+a:b]
WFST-based ASR System: L
- L is the pronunciation model (lexicon): it maps monophone sequences to words
[Figure: lexicon transducer fragment for the words "data" (d:data/1, ey:ε/0.5, ae:ε/0.5, t:ε/0.3, dx:ε/0.7, ax:ε/1) and "dew" (d:dew/1, uw:ε/1)]
Figure reproduced from "Weighted Finite State Transducers in Speech Recognition", Mohri et al., 2002
WFST-based ASR System: G
- G is the language model, represented as a weighted acceptor over word sequences
[Figure: word acceptor over sentences like "the {birds/0.404, animals/1.789, boy/1.789} {are/0.693, were/0.693, is} walking"]
Decoding
- Carefully construct a decoding graph D from H, C, L and G using optimization algorithms:

  D = min(det(H ⚬ det(C ⚬ det(L ⚬ G))))

- Given a test utterance O, how do we decode it? Assuming ample compute, first construct the following machine X from O: for each frame Oi there is one arc per acoustic index fi, where each fi maps to a distinct triphone HMM state; if fi maps to state j, the arc weight is −log(bj(Oi))
[Figure: X shown as a per-frame table of acoustic indices f0, f1, …, f500, …, f1000 with their −log observation likelihoods]
"Weighted Finite State Transducers in Speech Recognition", Mohri et al., Computer Speech & Language, 2002
The decoded word sequence is

  W* = argmin_{W = out[π], π ∈ X ⚬ D} weight(π)

where π is a path in the composed FST X ⚬ D and out[π] is the output label sequence of π.
In practice, X is typically never constructed explicitly; D is traversed dynamically using approximate search algorithms (discussed later in the semester).
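A conceptual sketch of the utterance-specific machine X: it is a linear chain with one arc per (frame, acoustic index) pair, weighted by −log bj(Oi). The score matrix here is an assumed input, not from the lecture:

```python
import numpy as np

def build_X(neg_log_obs):
    """neg_log_obs: (T x num_hmm_states) matrix with entries -log b_j(O_i).
    Returns arcs (src, dst, label, weight) of a linear acceptor with T+1 states."""
    T, num_states = neg_log_obs.shape
    arcs = []
    for i in range(T):                        # frame O_i moves state i -> i+1
        for j in range(num_states):
            arcs.append((i, i + 1, f"f{j}", float(neg_log_obs[i, j])))
    return arcs, 0, T                         # arcs, initial state, final state

scores = np.random.rand(4, 10)                # toy example: 4 frames, 10 triphone HMM states
arcs, initial, final = build_X(scores)
# Decoding would then be the shortest path through X composed with D.
```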
Ngram LM Smoothing
Good-Turing Discounting
- Good-Turing discounting states that for any token that occurs r times, we should use an adjusted count r* = θ(r) = (r + 1)Nr+1/Nr, where Nr is the number of types that occur exactly r times
- Good-Turing count for unseen events: θ(0) = N1/N0
- For large r, many instances of Nr+1 = 0 once counts start getting small. A solution: smooth Nr using a best-fit power law
- Good-Turing discounting is always used in conjunction with backoff or interpolation (a small sketch of the adjusted counts follows below)
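A small sketch of the adjusted-count computation r* = (r + 1)Nr+1/Nr, using raw counts-of-counts (without the power-law smoothing of Nr mentioned above); the data and names are illustrative:

```python
from collections import Counter

def good_turing_adjusted_counts(counts):
    """counts: dict item -> raw count r. Returns dict item -> adjusted count r*."""
    N = Counter(counts.values())               # N_r = number of items seen exactly r times
    adjusted = {}
    for item, r in counts.items():
        if N[r + 1] > 0:
            adjusted[item] = (r + 1) * N[r + 1] / N[r]
        else:
            adjusted[item] = float(r)           # N_{r+1} = 0: fall back to the raw count
    return adjusted

counts = {"the": 3, "on": 2, "cat": 1, "sat": 1, "mat": 1}
print(good_turing_adjusted_counts(counts))      # e.g. items with r = 1 get 2 * N_2 / N_1 = 2/3
# Unseen-event mass: theta(0) = N_1 / N_0, with N_0 the number of unseen types.
```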
Katz Backoff Smoothing
- For a Katz bigram model, let us define Ψ(wi−1) = {w : π(wi−1, w) > 0}
- A bigram model with Katz smoothing can be written in terms of a unigram model as follows, where π(·) are raw counts and π*(·) are the discounted counts (a small sketch follows below):

  PKatz(wi|wi−1) = π*(wi−1, wi) / π(wi−1)    if wi ∈ Ψ(wi−1)
                 = α(wi−1) PKatz(wi)         if wi ∉ Ψ(wi−1)

  where α(wi−1) = (1 − Σ_{w ∈ Ψ(wi−1)} π*(wi−1, w) / π(wi−1)) / (Σ_{wi ∉ Ψ(wi−1)} PKatz(wi))
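A sketch of the two-case Katz rule above, assuming raw bigram counts π, discounted counts π*, and a smoothed unigram model PKatz(w) are given (all function and argument names are illustrative):

```python
def katz_bigram_prob(w_prev, w, bigram_count, disc_bigram_count, unigram_count,
                     p_unigram, vocab):
    """P_Katz(w | w_prev), with backoff weight alpha(w_prev) renormalizing the
    leftover discounted mass over words unseen after w_prev."""
    seen = {v for v in vocab if bigram_count.get((w_prev, v), 0) > 0}   # Psi(w_prev)
    if w in seen:
        return disc_bigram_count[(w_prev, w)] / unigram_count[w_prev]
    kept_mass = sum(disc_bigram_count[(w_prev, v)] / unigram_count[w_prev] for v in seen)
    unseen_unigram_mass = sum(p_unigram[v] for v in vocab if v not in seen)
    alpha = (1.0 - kept_mass) / unseen_unigram_mass
    return alpha * p_unigram[w]
```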
Absolute Discounting Interpolation
- Absolute discounting is motivated by Good-Turing estimation
- Just subtract a constant d from the non-zero counts to get the discounted count
- Also involves linear interpolation with lower-order models:

  Prabs(wi|wi−1) = max{π(wi−1, wi) − d, 0} / π(wi−1) + λ(wi−1)Pr(wi)

- However, interpolation with unigram probabilities has its limitations
- Cue Kneser-Ney smoothing, which replaces unigram probabilities (how often does the word occur?) with continuation probabilities (how often is the word a continuation?)
Kneser-Ney discounting
- Kneser-Ney replaces the unigram probability in absolute discounting with a continuation probability (cf. absolute discounting):

  PrKN(wi|wi−1) = max{π(wi−1, wi) − d, 0} / π(wi−1) + λKN(wi−1) Prcont(wi)
  Prabs(wi|wi−1) = max{π(wi−1, wi) − d, 0} / π(wi−1) + λ(wi−1) Pr(wi)

- Consider an example: "Today I cooked some yellow curry". Suppose π(yellow, curry) = 0, so Prabs(w | yellow) = λ(yellow)Pr(w). Now say Pr(Francisco) >> Pr(curry), as San Francisco is very common in our corpus. But Francisco is not as common a "continuation" (it follows only San) as curry is (red curry, chicken curry, potato curry, …). Moral: we should use the probability of being a continuation!
- The continuation probability and interpolation weight are defined as follows (a small sketch follows below):

  Prcont(wi) = |Φ(wi)| / |B|, where Φ(wi) = {wi−1 : π(wi−1, wi) > 0} and B = {(wi−1, wi) : π(wi−1, wi) > 0}

  λKN(wi−1) = (d / π(wi−1)) · |Ψ(wi−1)|, where Ψ(wi−1) = {wi : π(wi−1, wi) > 0}, so that λKN(wi−1) · Prcont(wi) = d · |Ψ(wi−1)| · |Φ(wi)| / (π(wi−1) · |B|)
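A sketch of the interpolated Kneser-Ney bigram probability built directly from the definitions above (bigram counts as a plain dict; the discount d and data layout are illustrative):

```python
def kneser_ney_bigram_prob(w_prev, w, bigram_counts, d=0.75):
    """Pr_KN(w | w_prev) = max(c(w_prev, w) - d, 0)/c(w_prev) + lambda_KN(w_prev) * Pr_cont(w)."""
    unigram_counts = {}
    for (u, _), c in bigram_counts.items():
        unigram_counts[u] = unigram_counts.get(u, 0) + c

    B = [pair for pair, c in bigram_counts.items() if c > 0]       # all seen bigram types
    Phi_w = {u for (u, v) in B if v == w}                          # left contexts of w
    Psi_prev = {v for (u, v) in B if u == w_prev}                  # continuations of w_prev

    p_cont = len(Phi_w) / len(B)                                   # |Phi(w)| / |B|
    lam = d * len(Psi_prev) / unigram_counts[w_prev]               # (d / pi(w_prev)) |Psi(w_prev)|
    discounted = max(bigram_counts.get((w_prev, w), 0) - d, 0) / unigram_counts[w_prev]
    return discounted + lam * p_cont
```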
Midsem Exam
- September 17th, 2019 (Tuesday)
- Time: 8.30 am to 10.30 am
- Venue: CC 101, 103 and 105
- Closed book exam. Will allow 1 A4 (two-sided) sheet of notes.
- Can bring calculators to the exam hall.
Midsem Syllabus
- HMMs (Forward/Viterbi/Baum-Welch (EM) algorithms)
- Tied-state HMM models
- WFST algorithms
- WFSTs in ASR
- Feedforward NN-based acoustic models (Hybrid/Tandem/TDNNs)
- Language modeling (Ngram models + Smoothing techniques)
- There could be (no more than) one question on basic probability
- Topics covered in class that won’t appear in the exam:
- Basics of speech production
- Role of epsilon filters in composition
- RNN-based models
Question 1: Phone recogniser
Suppose you are building a simple ASR system which recognizes only four words bowl, bore, pour, poll involving five phones p, b, ow, l, r (with obvious pronunciations for the words). We are given a phone recognizer which converts a spoken word into a sequence of phones, which is known to have the following behaviour:
  Phone   p     ow    r     b     l
  p       0.8   0     0     0.2   0
  ow      0     1     0     0     0
  r       0     0     0.6   0     0.4
  b       0.2   0     0     0.8   0
  l       0     0     0.4   0     0.6
The probability of recognizing a spoken phone x as a phone y is given in the row labeled by x and the column labeled by y. Let us assume a simple language model for our task: Pr(bowl) = 0.1, Pr(bore) = 0.4, Pr(pour) = 0.3 and Pr(poll) = 0.2. Determine the most likely word (and the corresponding probability) given that the output from the phone recognizer is “p ow l”.
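One way to check an answer to this question is to enumerate the joint scores Pr(word) · ∏t Pr(recognized phone t | true phone t). The sketch below hard-codes the confusion table as reconstructed above (phones not listed in a row are taken to have probability 0):

```python
confusion = {                       # spoken phone -> {recognized phone: probability}
    "p": {"p": 0.8, "b": 0.2},
    "b": {"p": 0.2, "b": 0.8},
    "ow": {"ow": 1.0},
    "r": {"r": 0.6, "l": 0.4},
    "l": {"r": 0.4, "l": 0.6},
}
lexicon = {"bowl": ["b", "ow", "l"], "bore": ["b", "ow", "r"],
           "pour": ["p", "ow", "r"], "poll": ["p", "ow", "l"]}
prior = {"bowl": 0.1, "bore": 0.4, "pour": 0.3, "poll": 0.2}

def joint_score(word, recognized):
    score = prior[word]
    for true_ph, rec_ph in zip(lexicon[word], recognized):
        score *= confusion[true_ph].get(rec_ph, 0.0)
    return score

recognized = ["p", "ow", "l"]
scores = {w: joint_score(w, recognized) for w in lexicon}
best = max(scores, key=scores.get)   # most likely word given "p ow l"
```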
Question 2: WFSTs for ASR
Recall the WFST-based framework for ASR that was described in class. Given a test utterance x, let Dx be a WFST over the tropical semiring (with weights specialized to the given utterance) such that decoding the utterance corresponds to finding the shortest path in Dx. Suppose we modify Dx by adding γ (> 0) to each arc in Dx that emits a word. Let's call the resulting WFST D′x.
A) Describe informally what effect increasing γ would have on the word sequence obtained by decoding D′x.
B) Recall that decoding Dx was used as an approximation for argmax_W Pr(x|W) Pr(W). What would be the analogous expression for decoding from D′x?
Question 3: FSTs in ASR
Words in a language can be composed of sub-word units called morphemes. For simplicity, in this problem, we consider there to be three sets of morphemes, Vpre, Vstem and Vsuf, corresponding to prefixes, stems and suffixes. Further, we will assume that every word consists of a single stem, and zero or more prefixes and suffixes. That is, a word is of the form w = p1···pk σ s1···sl where k, l ≥ 0, and pi ∈ Vpre, si ∈ Vsuf and σ ∈ Vstem. For example, a word like fair consists of a single morpheme (a stem), whereas the word unfairness is composed of three morphemes, un + fair + ness, which are a prefix, a stem and a suffix, respectively.
A) Suppose we want to build an ASR system for a language using morphemes instead of words as the basic units of language. Which WFST(s) in the H ◦ C ◦ L ◦ G framework should be modified in order to utilize morphemes?
B) Draw an FSA over morphemes (Vpre ∪ Vstem ∪ Vsuf) that accepts only words with at most four morphemes. Your FSA should not have more than 15 states. You may draw a single arc labeled with a set to indicate a collection of arcs, each labeled with an element in the set.
Question 4: Probabilities in HMMs
Consider the HMM shown in the figure. (The transition probabilities are shown in the finite-state machine and the observation probabilities corresponding to each state are shown on the left.) This model generates hidden state sequences and observation sequences of length 4. If S1, S2, S3, S4 represent the hidden states and O1, O2, O3, O4 represent the observations, then Si ∈ {q1, …, q6} and Oi ∈ {a, b, c}. Pr(S1 = q1) = 1, i.e. the state sequence starts in q1.

[Figure: transition diagram over states q1, …, q6. Observation probabilities:
        a     b     c
  q1   0.5   0.3   0.2
  q2   0.3   0.4   0.3
  q3   0.2   0.1   0.7
  q4   0.4   0.5   0.1
  q5   0.3   0.3   0.4
  q6   0.9   0.1]

State whether the following three statements are true or false and justify your responses. If the statement is false, then state how the left expression is related to the right expression, using one of the =, < or > operators. (We use the following shorthand in the statements below: Pr(O = abbc) denotes Pr(O1 = a, O2 = b, O3 = b, O4 = c).)
A) Pr(O = bbca, S1 = q1, S4 = q6) = Pr(O = bbca | S1 = q1, S4 = q6)
B) Pr(O = acac, S2 = q2, S3 = q5) > Pr(O = acac, S2 = q4, S3 = q3)
C) Pr(O = cbcb | S2 = q2, S3 = q5) = Pr(O = baac, S2 = q4, S3 = q5)
Question 5: HMM training
Suppose we are given N observation sequences Xi = (x_i^1, …, x_i^{T_i}), i = 1 to N, where each x_i^t ∈ ℝd is an acoustic vector. To estimate the parameters of an HMM with Gaussian output probabilities from this data, the Baum-Welch EM algorithm uses empirical estimates ξ_{i,t}(s, s′) for the probability of being in state s at time t and s′ at time t + 1 given the observation sequence Xi, and γ_{i,t}(s) for the probability of occupying state s at time t given Xi. In a variant of EM known as Viterbi training, for each i, one computes the single most likely state sequence S_i^1, …, S_i^{T_i} for Xi by Viterbi decoding, and defines ξ_{i,t}(s, s′) and γ_{i,t}(s) assuming that Xi was produced deterministically by this path. Give the expressions for ξ_{i,t}(s, s′) and γ_{i,t}(s) in this case.
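As background for this question, the sketch below illustrates the general idea of Viterbi training for one utterance: once the single best path is fixed, state occupancy and transition statistics become deterministic indicators along that path (state names are illustrative):

```python
def hard_statistics(viterbi_path):
    """Given a decoded path S_1, ..., S_T, return the deterministic ('hard')
    occupancy weights gamma[t] and transition weights xi[t] along that path."""
    T = len(viterbi_path)
    gamma = [{viterbi_path[t]: 1.0} for t in range(T)]                     # weight 1 on S_t
    xi = [{(viterbi_path[t], viterbi_path[t + 1]): 1.0} for t in range(T - 1)]
    return gamma, xi

gamma, xi = hard_statistics(["q1", "q2", "q2", "q5"])
```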