SLIDE 1
Learning a Language Model from Continuous Speech

Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara

School of Informatics, Kyoto University, Japan

SLIDE 2

  • 1. Outline
SLIDE 3

Training of a Speech Recognition System

[Diagram: a text corpus (e.g. "this is the song that never ends, it just goes on and on my friends, and if you started singing it not knowing what it was, you'll just keep singing it forever just because…") and speech with its transcription are used to train the Language Model and the Acoustic Model, which together drive the Decoder.]

SLIDE 6

Why Learn a Language Model from Speech?

  • A straightforward way to handle spoken language
  • Fillers, colloquial expressions, and pronunciation variants are included in the model
  • A way to learn models for resource-poor languages
  • LMs can be learned even for languages with no digitized text
  • Use with language-independent acoustic models? [Schultz & Waibel 01]
  • Semi-supervised learning
  • Learn a model from newspaper text, update it with spoken expressions or new vocabulary from speech

SLIDE 7

Our Research

  • Goal: learn a LM using no text
  • Two problems:
  • Word boundaries are not clear → use unsupervised word segmentation
  • Acoustic ambiguity → use a phoneme lattice to absorb acoustic model errors
  • Method: apply a Bayesian word segmentation method [Mochihashi+ 09] to phoneme lattices
  • Implementation using weighted finite-state transducers (WFSTs)
  • Result: an LM learned from continuous speech was able to significantly reduce the ASR phoneme error rate on test data

SLIDE 8

Previous Research

  • Learning words from speech
  • Using audio/visual data and techniques such as MMI or MDL, learn grounded words [Roy+ 02, Taguchi+ 09]
  • Find similar audio segments using dynamic time warping and acoustic similarity scores [Park+ 08]
  • Learning language models from speech
  • Use standard LM learning techniques on 1-best AM results [de Marcken 95, Gorin+ 99]
  • Multigram model from acoustic lattices [Driesen+ 08]
  • No research has learned n-gram LMs under acoustic uncertainty
  • Most work handles small vocabularies (infant-directed speech, digit recognition)

SLIDE 9

  • 2. Unsupervised word segmentation
SLIDE 10

LM-based Supervised Word Segmentation

  • Training: use a corpus W annotated with word boundaries to train a model G
  • Decoding: for a character sequence x, treat all word sequences w as possible candidates
  • The probability of a candidate is proportional to its LM probability

Example: x = "iam", scored with language model G:
  P(w="iam"; G), P(w="i am"; G), P(w="ia m"; G), P(w="i a m"; G)
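
As a concrete (toy) instance of this decoding view, the candidates for x = "iam" can be enumerated and scored under an assumed unigram LM; the vocabulary and probabilities below are invented for illustration, not from the paper:

```python
# Toy unigram LM over a tiny vocabulary (illustrative probabilities).
P_LM = {"i": 0.4, "am": 0.3, "a": 0.15, "m": 0.1, "ia": 0.04, "iam": 0.01}

def segmentations(x):
    """Enumerate every way to split x into in-vocabulary words."""
    if not x:
        yield []
        return
    for i in range(1, len(x) + 1):
        head = x[:i]
        if head in P_LM:
            for tail in segmentations(x[i:]):
                yield [head] + tail

def score(words):
    """Unigram probability of one candidate segmentation."""
    p = 1.0
    for w in words:
        p *= P_LM[w]
    return p

for cand in sorted(segmentations("iam"), key=score, reverse=True):
    print(" ".join(cand), score(cand))
# Prints all four candidates, most probable ("i am") first.
```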

SLIDE 11

LM-Based Unsupervised Word Segmentation

  • Estimate the unobserved word sequence W of an unsegmented corpus X, and train a language model G over W
  • We desire a model that is highly expressive, but simple
  • Likelihood P(W|G) prefers expressive (complex) models
  • Add a prior P(G) that prefers simple models
  • Find a model with high joint probability P(G,W) = P(G)P(W|G)

                  P(G)   P(W|G)   P(G)P(W|G)
  Simple model    high   low      low
  Complex model   low    high     low
  Ideal model     mid    mid      mid

SLIDE 12

Hierarchical Pitman-Yor Language Model (HPYLM) [Teh 06]

  • An n-gram language model based on non-parametric Bayesian statistics
  • Has a number of attractive traits
  • Language model smoothing is realized through the prior P(G)
  • Parameters can be learned using Gibbs sampling

[Diagram: a suffix tree of contexts, each drawn from a Pitman-Yor process with its parent context as base measure: Hε ~ PY(Hbase, d1, Θ1); Ha, Hb ~ PY(Hε, d2, Θ2); Hba, Hca ~ PY(Ha, d3, Θ3); Hab, Hdb ~ PY(Hb, d3, Θ3); …]
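
For reference, a sketch of this hierarchy in the notation of [Teh 06] (the formula below is from that paper, not from these slides): each context u draws its distribution from a Pitman-Yor process whose base measure is the next-shorter context π(u), which yields a recursively smoothed predictive probability:

```latex
% Each context u draws its distribution from a PY process whose base
% measure is the one-word-shorter context \pi(u):
%   H_u \sim \mathrm{PY}(d_{|u|}, \theta_{|u|}, H_{\pi(u)})
% Predictive probability (Chinese restaurant representation):
P(w \mid u) = \frac{c_{uw} - d_{|u|}\, t_{uw}}{\theta_{|u|} + c_{u\cdot}}
            + \frac{\theta_{|u|} + d_{|u|}\, t_{u\cdot}}{\theta_{|u|} + c_{u\cdot}}\; P(w \mid \pi(u))
% c_{uw}: customer count for word w in context u; t_{uw}: table count;
% the recursive second term realizes the smoothing mentioned above.
```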

SLIDE 13

Unsupervised Word Segmentation using HPYLMs [Mochihashi+ 09]

  • The model G is separated into a word-based language model LM and a character-based spelling model SM
  • Words and spellings are connected in a probabilistic framework (unknown words can be modeled)
  • It is possible to sample word boundaries using a technique called forward-filtering/backward-sampling
  • Can be used with any (non-cyclic) finite-state automaton
  • Very similar to the forward-backward algorithm for HMMs

Example: "i am in chiba now", where "chiba" is an unknown word:
  PLM(i|<s>) PLM(am|i) PLM(in|am) PLM(<unk>|in) PLM(now|<unk>) PLM(</s>|now)
with the spelling of "chiba" generated by the spelling model:
  PSM(c|<s>) PSM(h|c) PSM(i|h) PSM(b|i) PSM(a|b) PSM(</s>|a)

SLIDE 20

Forward Filtering

  • Forward filtering is identical to the forward step in the forward-backward algorithm

[Lattice: states s0…s5 with edges p(s1|s0), p(s2|s0), p(s3|s1), p(s3|s2), p(s4|s1), p(s4|s2), p(s5|s3), p(s5|s4)]

Forward filtering: add up forward probabilities in topological order
  f(s0) = 1
  f(s1) = p(s1|s0)*f(s0)
  f(s2) = p(s2|s0)*f(s0)
  f(s3) = p(s3|s1)*f(s1) + p(s3|s2)*f(s2)
  f(s4) = p(s4|s1)*f(s1) + p(s4|s2)*f(s2)
  f(s5) = p(s5|s3)*f(s3) + p(s5|s4)*f(s4)
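
A minimal sketch of this recursion, assuming the lattice is acyclic and the edge list is sorted so each source state is finished before it is used (state names follow the figure; the probability values are invented):

```python
# Lattice edges as (source, target, p(target | source)); values illustrative.
edges = [
    ("s0", "s1", 0.6), ("s0", "s2", 0.4),
    ("s1", "s3", 0.5), ("s1", "s4", 0.5),
    ("s2", "s3", 0.7), ("s2", "s4", 0.3),
    ("s3", "s5", 1.0), ("s4", "s5", 1.0),
]

def forward_filter(edges, start="s0"):
    """Forward filtering: f(s) = sum over edges s'->s of p(s|s') * f(s')."""
    f = {start: 1.0}
    for src, tgt, p in edges:  # sources in topological order, so f[src] is final
        f[tgt] = f.get(tgt, 0.0) + p * f[src]
    return f

f = forward_filter(edges)
print(f["s5"])  # total probability of all paths from s0 to s5 (= 1.0 here)
```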

SLIDE 21

Backward Sampling

  • Backward sampling samples a path, starting at the final state, using the edge and forward probabilities

[Lattice: states s0…s5 with the same edges as above]

Backward sampling: sample incoming edges starting from the final state
  e(s5→x):  p(x=s3) ∝ p(s5|s3)*f(s3)
            p(x=s4) ∝ p(s5|s4)*f(s4)

SLIDE 23

Backward Sampling (continued)

Having sampled the edge into s3, repeat the same step one state back:
  e(s3→x):  p(x=s1) ∝ p(s3|s1)*f(s1)
            p(x=s2) ∝ p(s3|s2)*f(s2)
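
Continuing the same sketch, backward sampling draws a predecessor of each state with probability proportional to p(s|s')*f(s'), exactly the e(s→x) weights above:

```python
import random

def backward_sample(edges, f, start="s0", final="s5"):
    """From the final state, repeatedly pick an incoming edge s' -> s
    with probability proportional to p(s|s') * f(s')."""
    path, state = [final], final
    while state != start:
        preds = [(src, p) for src, tgt, p in edges if tgt == state]
        weights = [p * f[src] for src, p in preds]
        state = random.choices([src for src, _ in preds], weights=weights)[0]
        path.append(state)
    return path[::-1]

print(backward_sample(edges, f))  # e.g. ['s0', 's2', 's3', 's5']
```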

SLIDE 27

  • 3. WFST Implementation and Learning from Speech

SLIDE 28

Generating Word Segmentation Candidates with WFSTs

  • We propose a simple way to generate word segmentation candidates using WFSTs
  • The WFSTs are quite similar to those used in ASR

[Diagram: composition chain Input X ∘ Dictionary L ∘ (LM + SM), with example transitions i/i, a/a, m/m in X and a/ε, m/ε, ε/am_w, m/m_c in L]

SLIDE 30

A Language Model WFST for Word Segmentation

  • Express both the language model LM and the spelling model SM as a single WFST

[Diagram: LM states ε, w1, w2 with edges w1:p(w1), w2:p(w2), w2:p(w2|w1) and fallback edges ε:p(FB), ε:p(FB|w1), ε:p(FB|w2) into the SM; SM states <s>, c1, c2 with edges c1:p(c1), c2:p(c2), c1:p(c1|<s>), fallback edges ε:p(FB|<s>), ε:p(FB|c1), ε:p(FB|c2), and word-end edges ε:p(</s>|c1), ε:p(</s>|c2) returning to the LM]

The key is the weighted edges connecting the two models

SLIDE 31

Word Segmentation Candidates as a WFST

  • Vocabulary "i, a, am", unigram model

[Diagram: the composed WFST for input "iam": word-level edges i/i_w:PL(i), a/a_w:PL(a), and a/ε then m/am_w:PL(am) for the two-character word "am"; fallback edges weighted PL(FB) into character-level edges i/i_c:PS(i), a/a_c:PS(a), m/m_c:PS(m) with word-end edges ε/</s>:PS(</s>); sentence-final edge ε/</s>:PL(</s>)]

SLIDE 35

Adaptation to Speech

  • When using WFSTs, adaptation to speech is simple
  • Replace the input X with an HMM-based acoustic model
  • Forward filtering = creation of a recognition lattice
  • However, full expansion using HMMs is impossible
  • Instead, we use a trimmed phoneme lattice with acoustic model scores

[Diagram: a text input X is the single path "i a m"; a speech input X is a phoneme lattice with weighted alternatives i/PAM(i), e/PAM(e), y/PAM(y), a/PAM(a), a/PAM(a), m/PAM(m)]
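
To make the contrast concrete, here is a hypothetical encoding of the two kinds of input for the forward-filtering sketch given earlier; the edge tuples and acoustic scores are invented:

```python
# Text input: a single certain path, one symbol per edge, weight 1.0.
text_X = [("s0", "s1", "i", 1.0), ("s1", "s2", "a", 1.0), ("s2", "s3", "m", 1.0)]

# Speech input: a trimmed phoneme lattice; parallel edges carry competing
# phonemes weighted by acoustic model scores P_AM (values made up; a real
# lattice would also distinguish arcs by time alignment).
speech_X = [
    ("s0", "s1", "i", 0.7), ("s0", "s1", "e", 0.2), ("s0", "s1", "y", 0.1),
    ("s1", "s2", "a", 0.9), ("s1", "s2", "a", 0.1),
    ("s2", "s3", "m", 1.0),
]
```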

SLIDE 36

Learning from Text, Learning from Speech

              Text                            Speech
Input         Character string                Phoneme lattice
Technique     WFST composition, sampling      WFST composition, sampling
Probability   P(W|G)P(G)                      P(X|W)P(W|G)P(G)
              (LM likelihood, prior)          (AM and LM likelihoods, prior)
Samples       Segmentation, LM                Phoneme string for each utterance, segmentation, LM

SLIDE 37

  • 4. Evaluation
SLIDE 38

Experimental Setting

  • Target: speech from meetings of the Japanese Diet
  • Fluent, large-vocabulary speech
  • Actual vocabulary size is 2858 words
  • Data preparation: triphone acoustic model
  • PER: one-best 34.2%, oracle 8.1%
  • Used syllable lattices, not phoneme lattices (due to requirements of the decoder)
  • 8-117 minutes of training data, 27 minutes of test data
  • Evaluation standard:
  • Phoneme error rate over the test data, using the language model learned from the training speech

SLIDE 39

PER Results

  • An LM learned from continuous speech reduced the PER by 8.92% (absolute, compared to the acoustic model alone)
  • 3-gram is better than 1-gram: contextual information was learned

[Chart: phoneme error rate (25.0%-28.0%) vs. size of training data (7.9-116.7 minutes) for the proposed 1-gram, 2-gram, and 3-gram models; AM only: 34.2%]

SLIDE 40

Other Training Methods

[Chart: phoneme error rate (24.0%-29.0%) vs. size of training data (0-116.7 minutes) for four methods: 1-best & syllable 3-gram; 1-best & unsupervised segmentation; lattice & unsupervised segmentation (proposed method); manual transcription and segmentation]

SLIDE 41

  • 5. Conclusion
SLIDE 42

Conclusion

  • We demonstrated that it is possible to learn a language model from continuous speech
  • Released open source: http://www.phontron.com/latticelm
  • A number of potential applications:
  • Learning language models and dictionaries for resource-poor languages
  • Elegant handling of spoken language
  • Semi-supervised learning
SLIDE 43

Thank You

SLIDE 44

Extra Slides

SLIDE 45

Vocabulary/Model Complexity

                            1-gram   2-gram   3-gram   Gold Standard 3-gram
Vocabulary                  4480     1351     708      2858
Average Word Length (Syl.)  2.03     1.37     1.18     1.73
Language Model States       4480     16150    38759    34073
Spelling Model States       9624     3869     2426     8378

SLIDE 46

Words learned

Particles:
  word      English           # (rank)
  no        possessive        1052 (1)
  ni        positional        830 (2)
  to        and               685 (5)

Subwords:
  word      English            # (rank)
  ka        particle, subword  713 (3)
  to:       subword            204 (27)
  sai       subword            94 (65)

Colloquial Expressions:
  word      English           # (rank)
  yu:       say (colloq.)     324 (19)
  e:        filler            202 (28)
  desune    discourse marker  94 (65)

Content/Function Words:
  word      English           # (rank)
  koto      thing             189 (32)
  omo       think (stem)      56 (109)
  hanashi   speak             23 (242)

rimasukeredomo, mo:shiage, yu:fu:ni jo:kyo:, kangae, chi:ki, toki, shiteki

SLIDE 47

Experimental Setup (2)

  • Training:
  • 8-117 minutes of continuous speech as training data
  • 0.5-20 second utterances
  • Flat priors on hyperparameters (little influence)
  • 20 samples of burn-in, 50 LM samples
  • Testing:
  • 27 minutes of speech, separate from the training data
  • Lattice rescoring (not speech recognition)
  • Viterbi phoneme strings for each LM sample, combined using ROVER

SLIDE 48

Interesting Pronunciation Variants

  • nippon (Japan) → nippo:n
  • Learned with a long vowel not in the transcription
  • Extra emphasis is put on the name of the country, particularly when using nippon instead of nihon
  • shiteorimasu (is doing) → shitorimasu
  • There are many places where the speakers skip vowels
  • N → nothing
  • Many word-final Ns are not recognized by the AM
  • Perhaps taking these into account would improve AM training?

SLIDE 49

Entropy Evaluation

[Chart: per-syllable entropy (4-7) vs. minutes of training data (7.9-116.7) for: 1-best & syllable 3-gram; 1-best & unsupervised segmentation; lattice & unsupervised segmentation (proposed method); manual transcription and segmentation]

  • The gain over 1-best is much lower here. Why?
  • Different pronunciations than the transcription, e.g. shiteorimasu → shitorimasu
  • Large effect on entropy, small effect on PER
SLIDE 50

Future Work: Grounding

  • The model learns a segmented phoneme string
  • For transcription, use actual text
  • Grounding with a grapheme string without pronunciations (subtitles?)
  • In semi-supervised learning, phonetic pronunciations of unknown words are often sufficient
  • For dialog, use semantic grounding
  • Use a robot with cameras, match images to words
SLIDE 51

Future Work: Integration with HMM

  • We currently work on lattices; direct integration with the HMMs should give better results (for both training and testing)

[Chart: phoneme error rate (24.0%-28.0%) vs. minutes of training data (0-116.7) for Proposed 3-gram, Oracle 3-gram, and Transcript 3-gram]

SLIDE 52

Future Work: Implementation

  • Speed
  • Expanding the FST lattice and forward filtering take a fair amount of time (0.5-1 times real time)
  • Several ways to improve:
  • Perform beam-search trimming during forward filtering
  • Parallel sampling
  • Open source
  • Will be made open source pending code clean-up
  • Goal: mid-September
SLIDE 53

Formal Modeling

  • For text word segmentation, P(X|W) = 1, but for speech this is not the case
  • Our new objective is the joint probability of the model and the acoustic features:

    P(X,W,G) = P(X|W) P(W|G) P(G)
               (acoustic model, language model, prior)

  • Use an acoustic model scaling factor λ:

    P(X,W,G) = P(X|W)^λ P(W|G) P(G)

  • Set to 0.2 (values between 0.1 and 0.2 produced similar results)
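
In log space the scaling factor is just a multiplier on the acoustic term; a minimal sketch (the function and argument names are ours, not the paper's):

```python
def scaled_joint_log_prob(log_p_am, log_p_lm, log_p_prior, lam=0.2):
    """log P(X,W,G) = lambda * log P(X|W) + log P(W|G) + log P(G)."""
    return lam * log_p_am + log_p_lm + log_p_prior
```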

SLIDE 54

Weighted Finite State Transducers (WFSTs)

  • Finite-state automata with input/output/weight
  • Define weighted relations over strings
  • If the weights are probabilities, the relations are probabilistic
  • Transducers are combined through composition

[Diagram: A has edges a/s:2 and b/t:1; B has edges s/x:0.5 and t/y:3; the composition A∘B has edges a/x:2.5 and b/y:4]
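
A sketch of composition on the single-state example above, with weights combined by addition (as in the tropical/log semiring), matching a/x:2.5 and b/y:4; the dict encoding is a deliberate simplification of real multi-state WFSTs:

```python
# Each single-state transducer maps an input symbol to (output symbol, weight).
A = {"a": ("s", 2.0), "b": ("t", 1.0)}
B = {"s": ("x", 0.5), "t": ("y", 3.0)}

def compose(A, B):
    """A∘B: match A's output symbols to B's input symbols; weights add."""
    return {a_in: (B[mid][0], w_a + B[mid][1])
            for a_in, (mid, w_a) in A.items() if mid in B}

print(compose(A, B))  # {'a': ('x', 2.5), 'b': ('y', 4.0)}
```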

SLIDE 55

Connecting Edges in Detail

  • To the SM from the base state
  • Equal to the probability of generating a symbol from the base distribution

[Diagram: SM fragment with states ε, <s>, c1, c2 and edges ε:p(FB), ε:p(</s>|c1), ε:p(</s>|c2)]

  • In the HPYLM, n-grams with an unknown word as wi-1 are equal to the base probabilities*:

    P(wi | wi-2, UNK) = P(wi | UNK) = P(wi)

  • So it is OK to make edges from the SM only to the base state

* technically not true if the same word appears twice in a single sentence

SLIDE 56

Difference from Mochihashi's Method

                        Mochihashi                      Neubig
Spelling Model          ∞-gram + Poisson distribution   Character 3-gram
                        (explicit length limit)         (no length limit)
Implementation          Algorithmic (faster?)           WFST-based (simpler?, lattices possible)
Worst-Case Complexity   O(ML^n)                         O(M^(n+1))
Expected Complexity     O(ML^n)                         O(kM + E)

M = sentence length, L = max word length, n = n-gram length,
E = number of existing word n-grams, k = spelling model n