Language Modeling for Speech Recognition in Agglutinative Languages - PowerPoint PPT Presentation

✬ ✩ Language Modeling for Speech Recognition in Agglutinative Languages Ebru Arısoy Murat Sara¸ clar September 13, 2007 ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory

✬ ✩ Outline • Agglutinative languages – Main characteristics – Challenges in terms of Automatic Speech Recognition (ASR) • Sub-word language language modeling units • Our approaches – Lattice Rescoring/Extension – Lexical form units • Experiments and Results • Conclusion • Ongoing Research at OGI • Demonstration videos ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 2

✬ ✩ Agglutinative Languages • Main characteristic: Many new words can be derived from a single stem by addition of suffixes to it one after another. • Examples: Turkish, Finnish, Estonian, Hungarian... Concatenative morphology ( in Turkish ): ∗ nominal inflection: ev+im+de+ki+ler+den (one of those that were in my house) ∗ verbal inflection: yap+tır+ma+yabil+iyor+du+k (It was possible that we did not make someone do it) • Other characteristics: Free word order, Vowel harmony ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 3

✬ ✩ Agglutinative Languages – Challenges for LVCSR (Vocabulary Explosion) 2 1.8 h s i n n 1.6 i F Estonian Unique words [million words] 1.4 1.2 1 0.8 h s i k r u T 0.6 0.4 h g l i s E n 0.2 0 0 4 8 12 16 20 24 28 32 36 40 44 Corpus size [million words] • Moderate vocabulary (50K) results in OOV words. • Huge vocabulary ( > 200K) suffers from non-robust language model estimates. (Thanks to Mathias Creutz for the Figure) ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 4

✬ ✩ Agglutinative Languages – Challenges for LVCSR (Free Word Order) • The order of constitutes can be changed without affecting the grammaticality of the sentence. Examples ( in Turkish ): – The most common order is the SOV type (Erguvanlı, 1979) . – The word which will be emphasized is placed just before the verb (Oflazer and Boz¸ sahin, 1994) . Ben ¸ cocu˘ ga kitabi verdim (I gave the book to the children) C ¸ ocuga kitabi ben verdim (It was me who gave the child the book) Ben kitabi ¸ cocuga verdim (It was the child to whom I gave the book) Challenges: – Free word order causes “sparse data”. – Sparse data results in “non-robust” N-gram estimates. ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 5

✬ ✩ Agglutinative Languages – Challenges for LVCSR (Vowel Harmony) • The first vowel of the morpheme must be compatible with the last vowel of the stem. Examples( in Turkish ): – Stem ending with back/front vowel takes a suffix starting with back/front vowel. ✓ a˘ ✓ ¸ ga¸ c+lar (trees) ci¸ cek+ler (flowers) – There are some exceptions: ✘ ampul+ler (lamps) Challenges: – No problem with words !!! – If sub-words are used as language modeling units: ∗ Words will be generated from sub-word sequences. ∗ Sub-word sequences may result in ugrammatical items ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 6

✬ ✩ Words vs. Sub-words • Using words as language modeling units: ✘ Vocabulary growth − > Higher OOV rates. ✘ Data sparseness − > non-robust language model estimates. • Using sub-words as language modeling units: (Sub-words must be “meaningful units” for ASR !!!) ✓ Handle OOV problem. ✓ Handle data sparseness. ✘ Results in ungrammatical, over generated items. ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 7

✬ ✩ Our Research • Our Aim: – To handle “data sparseness” ∗ Root-based models ∗ Class-based models – To handle “OOV words” ∗ Vocabulary extension for words ∗ Sub-words recognition units – To handle “over generation” by sub-word approaches ∗ Vocabulary extension for sub-words ∗ Lexical sub-word models ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 8

✬ ✩ Modifications to Word-based Model (Arisoy and Saraclar, 2006) 5 x 10 7 6 Number of distinct units Words 5 4 3 2 Roots 1 0 0 0.5 1 1.5 2 2.5 Number of sentences 6 x 10 • Root-based Language Models Main idea: Roots can capture regularities better than words P ( w 3 | w 2 , w 1 ) ≈ P ( r ( w 3 ) | r ( w 2 ) , r ( w 1 )) • Class-based Language Models Main idea: To handle data sparseness by grouping words P ( w 3 | w 2 , w 1 ) = P ( w 3 | r ( w 3 )) ∗ P ( r ( w 3 ) | r ( w 2 ) , r ( w 1 )) ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 9

✬ ✩ Modifications to Word-based Model (Arisoy and Saraclar, 2006) • Vocabulary Extension (Geutner et al., 1998) Main idea: To extend the utterance lattice with similar words, then perform second pass recognition with a larger vocabulary language model – Similarity criterion: “having the same root” – A single language model is generated using all the types (683K) in the training corpus. ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 10

✬ ✩ Modifications to Word-based Model • Vocabulary Extension sat:satIStan sat:satISlar sat:satIS sen:senin sen:sen fatma:fatmaya fatma:fatmanIn fatma:fatma fatura:faturaya fatura:faturanIn fatura:faturasIz fatura:fatura 2 satIS:sat sen:sen fatura:fatura 3 0 1 satIS:sat 0 fatma:fatma fatura satIS faturasIz 2 satISlar sen satIStan senin faturanIn satISlar 3 faturaya 0 1 satIStan fatma fatmanIn fatmaya satIS ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 11

✬ ✩ Sub-Word Approaches (Background) • Morpheme model: – Require linguistic knowledge (Morphological analyzer) Morphemes: kes il di ˘ gi # an dan # itibaren • Stem-ending model: – Require linguistic knowledge (Morphological analyzer, stemmer) Stem-endings: kes ildi˘ gi # an dan # itibaren ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 12

✬ ✩ Sub-Word Approaches (Background) • Statistical morph model (Creutz and Lagus, 2005): – Main idea: To find an optimal encoding of the data with concise lexicon and the concise representation of corpus. ∗ Unsupervised ∗ Data-driven ∗ Minimum Description Length (MDL) Morphemes: kes il di ˘ gi # an dan # itibaren Morphs: kesil di˘ gi # a ndan # itibar en ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 13

✬ ✩ Sub-Word Approaches • Statistical Morph model is used as the sub-word approach. – Dynamic vocabulary extension is applied to handle ungrammatical items. • Lexical stem ending models are proposed as a novel approach. – Lexical to surface form mapping ensures correct surface form alternations. ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 14

✬ ✩ Modifications to Morph-based Model (Arisoy and Saraclar, 2006) • Vocabulary Extension Motivation: – 159 morph sequences out of 6759 do not occur in the fallback (683K) lexicon. Only 19 are correct Turkish words. – Common Errors: Wrong word boundary, incorrect morphotactics, meaningless sequences – Simply removing non-lexical arcs from the lattice increases WER by 1.8%. Main idea: To remove non-vocabulary items with a mapping from morph sequences to grammatically correct similar words, then perform second pass recognition. – Similarity criterion is “having the same first morph” ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 15

✬ ✩ Modifications to Morph-based Model (Arisoy and Saraclar, 2006) • Vocabulary Extension 4 tik sa fatura sa fatura sI <WB> 0 1 2 5 sek 0 1 2 3 sek sen sektik sekik fatura satI faturasIz satIS 0 1 2 faturanIn satISlar faturaya satIstan seki sekiz ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 16

✬ ✩ Lexical Stem-ending Model (Arisoy et al., 2007) Motivation: • Same stems and morphemes in lexical form may have different phonetic realizations Surface form: ev-ler (houses) kitap-lar (books) Lexical from: ev-lAr kitap-lAr Advantages: • Lexical forms capture the suffixation process better. • In lexical to surface mapping; – compatibility of vowels is enforced. – correct morphophonemic is enforced regardless of morphotactics. ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 17

✬ ✩ Comparison of Language Modeling Units Lexicon Size Word OOV Rate (%) Words 50K 9.3 Morphs 34.7K 0 Stem-endings Surf: 50K (40.4K roots) 2.5 Lex: 50K (45.0K roots) 2.2 ✫ ✪ B¨ US˙ IM – Bo˘ gazi¸ ci University Signal and Image Processing Laboratory 18

Language Modeling for Speech Recognition in Agglutinative Languages - PowerPoint PPT Presentation

Language Modeling for Speech Recognition in Agglutinative Languages Ebru Arsoy Murat Sara clar September 13, 2007 B US IM Bo gazi ci University Signal and Image Processing Laboratory Outline

8-Speech Recognition Speech Recognition Concepts Speech Recognition Approaches

Speech Processing Speech Processing Using Speech with Computers Overview Overview Speech vs

HMMS and Speech HMMS and Speech HMMS and Speech Recognition Recognition Recognition Presented

EECS E6870 converting speech to text Speech Recognition automatic speech recognition

Speech Processing 11-492/18-492 Speech Processing 11-492/18-492 Speech Recognition Acoustic

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 25: Speech

Combining Speech and Speaker Recognition - A Joint Modeling Approach Hang Su Supervised by:

Speech recognition Brief history Technology Computer Literacy 1 Lecture 22 How does

Speech Processing 15-492/18-492 Speech Recognition Intro Acoustic modelling HMMs Speech

Speech and Language CS 188: Artificial Intelligence Spring 2011 Speech technologies

6-Text To Speech (TTS) Speech Synthesis Speech Synthesis Concept Speech Naturalness Phone

Speech Processing 15-492/18-492 Speech Recognition Acoustic modeling Pronunciation dictionary

EE E6820: Speech & Audio Processing & Recognition Lecture 5: Speech modeling and

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 15: Language

Speech Processing 15-492/18-492 Speech Recognition Template matching Speech Recognition by

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 23: Speech

What have we learned about the global carbon cycle from GOSAT and OCO-2 ? David Baker, Andy

Pragmatically determined word order in Cherokee and its exceptions Brian Hsu and Benjamin Frey

Lava I Mary Sheeran, Thomas Hallgren Chalmers University of Technology Where are we? Take a

Yoga Alliance - Tue 7/28 10am (USYOGA2807B) Closed Captioning/ Transcript Disclaimer Closed

Directors of Graduate Studies Meeting Minutes Wednesday, October 11, 2017 3:30 p.m.-5:00 p.m.,

AA Based on work with Maarten Bu ffi ng and Markus Diehl MPI@LHC - San Cristbal de las

Status of the CBM experiment Claudia Hhne, GSI Darmstadt Features of the the phase phase

You Only Live Multiple Times Black box re-use of Crash-Stop Algorithms In Realistic Crash-Recovery

Sambuz

Useful Links

Newsletter

Mail Us

Language Modeling for Speech Recognition in Agglutinative Languages - PowerPoint PPT Presentation

Language Modeling for Speech Recognition in Agglutinative Languages Ebru Arsoy Murat Sara clar September 13, 2007 B US IM Bo gazi ci University Signal and Image Processing Laboratory Outline

8-Speech Recognition Speech Recognition Concepts Speech Recognition Approaches

Speech Processing Speech Processing Using Speech with Computers Overview Overview Speech vs

HMMS and Speech HMMS and Speech HMMS and Speech Recognition Recognition Recognition Presented

EECS E6870 converting speech to text Speech Recognition automatic speech recognition

Speech Processing 11-492/18-492 Speech Processing 11-492/18-492 Speech Recognition Acoustic

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 25: Speech

Combining Speech and Speaker Recognition - A Joint Modeling Approach Hang Su Supervised by:

Speech recognition Brief history Technology Computer Literacy 1 Lecture 22 How does

Speech Processing 15-492/18-492 Speech Recognition Intro Acoustic modelling HMMs Speech

Speech and Language CS 188: Artificial Intelligence Spring 2011 Speech technologies

6-Text To Speech (TTS) Speech Synthesis Speech Synthesis Concept Speech Naturalness Phone

Speech Processing 15-492/18-492 Speech Recognition Acoustic modeling Pronunciation dictionary

EE E6820: Speech &amp; Audio Processing &amp; Recognition Lecture 5: Speech modeling and

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 15: Language

Speech Processing 15-492/18-492 Speech Recognition Template matching Speech Recognition by

Automatic Speech Recognition (CS753) Automatic Speech Recognition (CS753) Lecture 23: Speech

What have we learned about the global carbon cycle from GOSAT and OCO-2 ? David Baker, Andy

Pragmatically determined word order in Cherokee and its exceptions Brian Hsu and Benjamin Frey

Lava I Mary Sheeran, Thomas Hallgren Chalmers University of Technology Where are we? Take a

Yoga Alliance - Tue 7/28 10am (USYOGA2807B) Closed Captioning/ Transcript Disclaimer Closed

Directors of Graduate Studies Meeting Minutes Wednesday, October 11, 2017 3:30 p.m.-5:00 p.m.,

AA Based on work with Maarten Bu ffi ng and Markus Diehl MPI@LHC - San Cristbal de las

Status of the CBM experiment Claudia Hhne, GSI Darmstadt Features of the the phase phase

You Only Live Multiple Times Black box re-use of Crash-Stop Algorithms In Realistic Crash-Recovery

Sambuz

Useful Links

Newsletter

Mail Us

EE E6820: Speech & Audio Processing & Recognition Lecture 5: Speech modeling and