

1. Lecture 8: LVCSR Decoding

   Bhuvana Ramabhadran, Michael Picheny, Stanley F. Chen
   IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
   {bhuvana,picheny,stanchen}@us.ibm.com
   EECS 6870: Speech Recognition, 27 October 2009

2. Administrivia

   Main feedback from last lecture: Mud: k-means clustering.
   Lab 2 handed back today. Answers: /user1/faculty/stanchen/e6870/lab2_ans/ .
   Lab 3 due Thursday, 11:59pm.
   Next week: Election Day. Lab 4 out by then?

3. The Big Picture

   Weeks 1–4: Small vocabulary ASR.
   Weeks 5–8: Large vocabulary ASR.
      Week 5: Language modeling.
      Week 6: Pronunciation modeling ⇔ acoustic modeling for large vocabularies.
      Week 7: Training for large vocabularies.
      Week 8: Decoding for large vocabularies.
   Weeks 9–13: Advanced topics.

4. Outline

   Part I: Introduction to LVCSR decoding, i.e., search.
   Part II: Finite-state transducers.
   Part III: Making decoding efficient.
   Part IV: Other decoding paradigms.

5. Part I: Introduction to LVCSR Decoding

6. Decoding for LVCSR

   class(x) = argmax_ω P(ω | x)
            = argmax_ω P(ω) P(x | ω) / P(x)
            = argmax_ω P(ω) P(x | ω)

   Now that we know how to build models for LVCSR . . .
      n-gram models via counting and smoothing.
      CD acoustic models via complex recipes.
   How can we use them for decoding?

7. Decoding: Small Vocabulary

   Take the graph/WFSA representing the language model . . .
      i.e., all allowable word sequences.
   Expand it to the underlying HMM.
   Run the Viterbi algorithm!
   (Figure: small word-loop WFSA over the words UH and LIKE.)

8. Issue: Are N-Gram Models WFSA's?

   Yup.
   One state for each (n − 1)-gram history ω.
      All paths ending in state ω are labeled with a word sequence ending in ω.
   State ω has an outgoing arc for each word w, with arc probability P(w | ω).
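The claim that an n-gram model is a WFSA with one state per (n − 1)-gram history can be made concrete with a short sketch. This is a hypothetical illustration, not code from the course labs: the two-word vocabulary and the bigram probabilities are invented, and a real back-off language model would also carry back-off arcs.

from itertools import product

# Toy bigram probabilities P(w | h), keyed by (history tuple, word); numbers invented.
vocab = ["UH", "LIKE"]
P = {(("UH",), "UH"): 0.3, (("UH",), "LIKE"): 0.7,
     (("LIKE",), "UH"): 0.6, (("LIKE",), "LIKE"): 0.4}

def ngram_wfsa(vocab, n, cond_prob):
    """Build the WFSA described on the slide: states are (n-1)-gram histories;
    each state has one outgoing arc per word, weighted by P(word | history).
    Arcs are (history, word, probability, next_history) tuples."""
    states = list(product(vocab, repeat=n - 1))
    arcs = []
    for h in states:
        for w in vocab:
            next_h = (h + (w,))[1:]          # shift the history window by one word
            arcs.append((h, w, cond_prob[(h, w)], next_h))
    return states, arcs

states, arcs = ngram_wfsa(vocab, n=2, cond_prob=P)
print(len(states), len(arcs))   # |V|^(n-1) states and |V|^n arcs: here 2 and 4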

9. Bigram, Trigram LM's Over Two-Word Vocab

   (Figure: bigram WFSA with states h=w1, h=w2 and arcs w/P(w | h); trigram
   WFSA with states h=w1,w1, h=w1,w2, h=w2,w1, h=w2,w2 and arcs w/P(w | h).)

10. Pop Quiz

   How many states in the FSA representing an n-gram model . . .
      With vocabulary size |V|?
   How many arcs?

11. Issue: Graph Expansion

   Word models.
      Replace each word with its HMM.
   CI phone models.
      Replace each word with its phone sequence(s).
      Replace each phone with its HMM.
      (A sketch of this context-independent expansion follows below.)
   (Figure: bigram word graph over UH and LIKE, with arcs LIKE/P(LIKE|UH),
   UH/P(UH|UH), UH/P(UH|LIKE), LIKE/P(LIKE|LIKE) and states h=UH, h=LIKE.)

12. Context-Dependent Graph Expansion

   How can we do context-dependent expansion?
      Handling branch points is tricky.
   Other tricky cases.
      Words consisting of a single phone.
      Quinphone models.
   (Figure: phone graph containing the phones DH AH and D AO G.)
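To make the context-independent case from the "Issue: Graph Expansion" slide concrete, here is a minimal sketch of replacing each word arc in a word graph with its phone sequence. The two-word lexicon, the word graph, and the state names are invented for illustration; expanding phones into HMM states and handling the context-dependent case would come on top of this.

# Toy pronunciation lexicon and word graph (both invented for this sketch).
lexicon = {"THE": ["DH", "AH"], "DOG": ["D", "AO", "G"]}
word_graph = [("s0", "THE", "s1"), ("s1", "DOG", "s2")]   # (source, word, destination) arcs

def expand_to_phones(word_graph, lexicon):
    """Replace every word arc with a chain of phone arcs, creating fresh
    interior states; the last phone arc reattaches to the word arc's
    destination so the rest of the graph is untouched."""
    phone_arcs, fresh = [], 0
    for src, word, dst in word_graph:
        phones = lexicon[word]
        prev = src
        for i, phone in enumerate(phones):
            if i == len(phones) - 1:
                nxt = dst
            else:
                nxt = "x%d" % fresh
                fresh += 1
            phone_arcs.append((prev, phone, nxt))
            prev = nxt
    return phone_arcs

print(expand_to_phones(word_graph, lexicon))
# [('s0', 'DH', 'x0'), ('x0', 'AH', 's1'),
#  ('s1', 'D', 'x1'), ('x1', 'AO', 'x2'), ('x2', 'G', 's2')]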

13. Triphone Graph Expansion Example

   (Figure: the phone graph for DH AH / D AO G expanded into triphone arcs
   such as G_DH_AH, DH_AH_D, DH_AH_DH, AH_DH_AH, AH_D_AO, D_AO_G, AO_G_D,
   AO_G_DH, G_D_AO.)

14. Aside: Word-Internal Acoustic Models

   Simplify the acoustic model to simplify graph expansion.
   Word-internal models.
      Don't let decision trees ask questions across word boundaries.
      Pad contexts with the unknown phone.
   Hurts performance (e.g., coarticulation across words).
   As with word models, just replace each word with its HMM.

15. Issue: How Big The Graph?

   Trigram model (e.g., vocabulary size |V| = 2).
   (Figure: the trigram WFSA over w1, w2 from slide 9.)
   |V|^3 word arcs in the FSA representation.
   Say words are ~4 phones = 12 states on average.
   If |V| = 50000, 50000^3 × 12 ≈ 10^15 states in the graph.
   PC's have ~10^9 bytes of memory.

16. Issue: How Slow Decoding?

   In each frame, loop through every state in the graph.
   If 100 frames/sec and 10^15 states . . .
      How many cells to compute per second? (The arithmetic is spelled out below.)
   PC's can do ~10^10 floating-point ops per second.
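As a sanity check on the numbers from these two slides, here is the back-of-the-envelope arithmetic. It is plain arithmetic, nothing more; the per-word state count and frame rate are the slides' own assumptions.

V = 50_000                # vocabulary size
states_per_word = 12      # ~4 phones per word, 3 HMM states per phone
frames_per_sec = 100

graph_states = V ** 3 * states_per_word          # one word instance per trigram arc
print("%.1e states in the naively expanded trigram graph" % graph_states)   # ~1.5e+15

cells_per_sec = graph_states * frames_per_sec    # Viterbi cells per second of audio
print("%.1e dynamic-programming cells per second" % cells_per_sec)          # ~1.5e+17
# Compare with ~1e10 floating-point ops/sec and ~1e9 bytes of memory on a 2009-era PC.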

17. Part II: Finite-State Transducers

18. Recap: Small vs. Large Vocabulary Decoding

   In theory, can use the same exact techniques.
   In practice, three big problems:
      (Context-dependent) graph expansion is complicated.
      The decoding graph would be way too big.
      Decoding would be way too slow.

19. A View of Graph Expansion

   Step 1: Take word graph as input.
      Convert into phone graph.
   Step 2: Take phone graph as input.
      Convert into context-dependent phone graph.
   Step 3: Take context-dependent phone graph.
      Convert into HMM.

20. A Framework for Rewriting Graphs

   A general way of representing graph transformations?
      Finite-state transducers (FST's).
   A general operation for applying transformations to graphs?
      Composition.

21. Where Are We?

   1. What Is an FST?
   2. Composition
   3. FST's, Composition, and ASR
   4. Weights

22. Review: What is a Finite-State Acceptor?

   It has states.
      Exactly one initial state; one or more final states.
   It has arcs.
      Each arc has a label, which may be empty (ε).
   Ignore probabilities for now.
   (A small code sketch of such an acceptor follows after this group of slides.)
   (Figure: a four-state acceptor with arcs labeled a, b, c, a, and <epsilon>.)

23. What Does an FSA Mean?

   The (possibly infinite) list of strings it accepts.
      We need this in order to define composition.
   Things that don't affect meaning.
      How labels are distributed along a path.
      Invalid paths.
   Are these equivalent?
   (Figure: two small acceptors over the labels a and b, one using an
   <epsilon> arc.)

24. What is a Finite-State Transducer?

   It's like a finite-state acceptor, except . . .
   Each arc has two labels instead of one.
      An input label (possibly empty).
      An output label (possibly empty).
   (Figure: a transducer with arcs labeled c:c, b:a, a:<epsilon>, <epsilon>:b,
   and a:a.)
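The acceptor definition above maps directly onto a small data structure. The following sketch uses an invented toy acceptor (not the exact one in the slide's figure) and checks acceptance by searching over paths, treating <epsilon> arcs as consuming no input.

from collections import deque

EPS = None   # stand-in for the empty (epsilon) label

# A toy acceptor: one initial state, a set of final states, labeled arcs.
fsa = {
    "start": 1,
    "finals": {3},
    "arcs": [(1, "a", 2), (2, "b", 3), (2, EPS, 4), (4, "a", 3), (3, "c", 3)],
}

def accepts(fsa, symbols):
    """True if some path from the start state to a final state is labeled
    with `symbols`; epsilon arcs consume nothing."""
    frontier = deque([(fsa["start"], 0)])
    seen = set()
    while frontier:
        state, pos = frontier.popleft()
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if pos == len(symbols) and state in fsa["finals"]:
            return True
        for src, label, dst in fsa["arcs"]:
            if src != state:
                continue
            if label is EPS:
                frontier.append((dst, pos))
            elif pos < len(symbols) and label == symbols[pos]:
                frontier.append((dst, pos + 1))
    return False

print(accepts(fsa, ["a", "b"]), accepts(fsa, ["a", "a"]), accepts(fsa, ["b"]))
# True True False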

25. What Does an FST Mean?

   A (possibly infinite) list of pairs of strings . . .
      An input string and an output string.
   The gist of composition:
      If string i_1 · · · i_N occurs in the input graph . . .
      And (i_1 · · · i_N, o_1 · · · o_M) occurs in the transducer, . . .
      Then string o_1 · · · o_M occurs in the output graph.

26. Terminology

   Finite-state acceptor (FSA): one label on each arc.
   Finite-state transducer (FST): input and output label on each arc.
   Finite-state machine (FSM): FSA or FST.
      Also, finite-state automaton.

27. Where Are We?

   1. What Is an FST?
   2. Composition
   3. FST's, Composition, and ASR
   4. Weights

28. The Composition Operation

   A simple and efficient algorithm for computing . . .
      The result of applying a transducer to an acceptor.
      Composing FSA A with FST T to get FSA A ◦ T.
   If string i_1 · · · i_N ∈ A and . . .
      Input/output string pair (i_1 · · · i_N, o_1 · · · o_M) ∈ T, . . .
   Then string o_1 · · · o_M ∈ A ◦ T.
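Here is a minimal sketch of this composition operation for the unweighted case. It is a simplified illustration, not the course's or any toolkit's implementation: the acceptor A is assumed to be epsilon-free, the transducer T may have epsilon input labels, and the epsilon filter needed in the fully general case is omitted. The machines at the bottom reproduce the "a b d" example from the two slides that follow.

from collections import deque

EPS = None   # empty label

# An FSM here is {"start": state, "finals": set of states,
#                 "arcs": [(src, input_label, output_label, dst), ...]};
# an acceptor simply repeats its label in both positions.

def compose(A, T):
    """Compose epsilon-free acceptor A with transducer T: states of A ∘ T are
    pairs (state of A, state of T); a matching input label advances both
    machines, an epsilon input label on T advances T alone. The result is an
    acceptor over T's output labels."""
    a_arcs, t_arcs = {}, {}
    for src, i, o, dst in A["arcs"]:
        a_arcs.setdefault(src, []).append((i, dst))
    for src, i, o, dst in T["arcs"]:
        t_arcs.setdefault(src, []).append((i, o, dst))
    start = (A["start"], T["start"])
    arcs, finals = [], set()
    queue, seen = deque([start]), {start}
    while queue:
        qa, qt = queue.popleft()
        if qa in A["finals"] and qt in T["finals"]:
            finals.add((qa, qt))
        for ti, to, tdst in t_arcs.get(qt, []):
            if ti is EPS:
                targets = [(qa, tdst)]                       # advance T only
            else:
                targets = [(adst, tdst) for ai, adst in a_arcs.get(qa, []) if ai == ti]
            for nxt in targets:
                arcs.append(((qa, qt), to, to, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}

def strings(F):
    """Enumerate the label sequences of an acyclic FSA (epsilons dropped)."""
    results = []
    def walk(state, prefix):
        if state in F["finals"]:
            results.append(" ".join(prefix))
        for src, i, o, dst in F["arcs"]:
            if src == state:
                walk(dst, prefix + ([o] if o is not EPS else []))
    walk(F["start"], [])
    return results

# A accepts only "a b d"; T rewrites a->A, b->B, c->C, d->D with one looping state.
A = {"start": 1, "finals": {4},
     "arcs": [(1, "a", "a", 2), (2, "b", "b", 3), (3, "d", "d", 4)]}
T = {"start": 1, "finals": {1},
     "arcs": [(1, "a", "A", 1), (1, "b", "B", 1), (1, "c", "C", 1), (1, "d", "D", 1)]}
print(strings(compose(A, T)))   # ['A B D']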

29. Rewriting a Single String a Single Way

   (Figure: acceptor A over states 1–4 accepting "a b d"; transducer T, a
   four-state chain with arcs a:A, b:B, d:D; the result A ◦ T accepts "A B D".)

30. Rewriting a Single String a Single Way

   (Figure: the same A; transducer T with a single state and self-loops a:A,
   b:B, c:C, d:D; the result A ◦ T again accepts "A B D".)

31. Transforming a Single String

   Let's say you have a string, e.g.,
      THE DOG
   Let's say we want to apply a one-to-one transformation.
      e.g., map words to their (single) baseforms:
      DH AH D AO G
   This is easy, e.g., use sed or perl or . . .

32. The Magic of FST's and Composition

   Let's say you have a (possibly infinite) list of strings . . .
      Expressed as an FSA, as this is compact.
   How to transform all strings in the FSA in one go?
      How to do one-to-many or one-to-zero transformations?
   Can we have the (possibly infinite) list of output strings . . .
      Expressed as an FSA, as this is compact?
      Fast?
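As a usage example, the compose() and strings() sketch from the earlier block can carry out exactly this word-to-baseform rewrite, with the one-to-many output handled by epsilon input labels on the transducer. The two-word lexicon is invented, and the same call works unchanged if the single-sentence acceptor is replaced by a whole word graph, which is the point of the slide.

# Reuses compose(), strings(), and the EPS sentinel from the earlier sketch.
sentence = {"start": 0, "finals": {2},
            "arcs": [(0, "THE", "THE", 1), (1, "DOG", "DOG", 2)]}   # accepts "THE DOG"

# Word-to-baseform transducer: the first phone is emitted while reading the
# word, the remaining phones are emitted on epsilon-input arcs.
lex = {"start": 0, "finals": {0},
       "arcs": [(0, "THE", "DH", 1), (1, EPS, "AH", 0),
                (0, "DOG", "D", 2),  (2, EPS, "AO", 3), (3, EPS, "G", 0)]}

print(strings(compose(sentence, lex)))   # ['DH AH D AO G']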
