
Improved subword modeling for WFST-based speech recognition
Peter Smit, Sami Virpioja, Mikko Kurimo
Aalto University, Department of Signal Processing and Acoustics
August 23, 2017

Research questions
How to do sound WFST modeling for subwords?
How to reconstruct words from subwords?
What is a good subword vocabulary?
  Size of vocabulary?
  Segmentation method?

Outline: Subword modeling, WFST implementation, Experiments, Recap, Future work

How big is your vocabulary?

                           # Word forms
WSJ small LM               5,000
WSJ big LM                 20,000
Native English speaker     20,000-35,000
CMU dict                   134,000
Finnish adult              >1,000,000
Finnish Text Collection    >4,000,000

Is a big vocabulary a problem?
Current systems do support vocabularies >4M
But:
  Out-of-vocabulary problems
  Data sparsity: valid words might only appear once
  Dimensionality problems (e.g. RNNLM input/output layers)

Subword modeling
Split words into smaller units
Reduces vocabulary size
Split either knowledge-driven (e.g. grammatical morphs) or data-driven (e.g. Morfessor)

Subword marking and reconstruction

Style (abbreviation)       Example                       Vocab size
boundary tag (<w>)         <w> two <w> slipp er s <w>    V + 1
left-marked (+m)           two slipp +er +s              2V
right-marked (m+)          two slipp+ er+ s              2V
left+right-marked (+m+)    two slipp+ +er+ +s            4V

V is the number of distinct subwords. The boundary tag adds a single token, a one-sided marker gives each subword a marked and an unmarked variant, and a two-sided marker gives each subword four variants.
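The four marking styles can be illustrated with a small sketch. This is not code from the presentation; the function names `mark` and `reconstruct` are hypothetical, and the sketch assumes that no subword itself starts or ends with a literal "+".

```python
from typing import List

STYLES = ("<w>", "+m", "m+", "+m+")


def mark(words: List[List[str]], style: str) -> List[str]:
    """Turn segmented words (each a list of subwords) into one marked token sequence."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    tokens: List[str] = []
    for subwords in words:
        if style == "<w>":
            # A boundary tag precedes every word; subwords stay unmarked.
            tokens.append("<w>")
            tokens.extend(subwords)
            continue
        last = len(subwords) - 1
        for i, sw in enumerate(subwords):
            # Left marker on non-initial subwords, right marker on non-final ones.
            left = "+" if style in ("+m", "+m+") and i > 0 else ""
            right = "+" if style in ("m+", "+m+") and i < last else ""
            tokens.append(left + sw + right)
    if style == "<w>":
        tokens.append("<w>")  # closing boundary tag
    return tokens


def reconstruct(tokens: List[str], style: str) -> List[str]:
    """Join a marked token sequence back into words."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    words: List[str] = []
    if style == "<w>":
        current = ""
        for tok in tokens:
            if tok == "<w>":
                if current:
                    words.append(current)
                current = ""
            else:
                current += tok
        if current:
            words.append(current)
        return words
    prev_open = False  # did the previous token carry a right marker?
    for tok in tokens:
        # In m+ only the previous token tells us whether to attach;
        # in +m and +m+ the token's own left marker does.
        attach = prev_open if style == "m+" else tok.startswith("+")
        prev_open = tok.endswith("+")
        piece = tok.strip("+")
        if attach and words:
            words[-1] += piece
        else:
            words.append(piece)
    return words
```

For example, `mark([["two"], ["slipp", "er", "s"]], "m+")` gives the tokens `two slipp+ er+ s`, and `reconstruct` on that sequence gives back `two` and `slippers`.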

Subword problems
Restricting output of decoder to be valid (don't start or end a sentence halfway through a word)
  two slip+ +per+ +s
  +two slip+ per+ +s
Word-position dependent phonemes
Longer contexts are needed in language modeling
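One way to read the validity constraint for the left+right-marked (+m+) style is as a simple consistency check on adjacent markers. The sketch below is an illustration under that assumption, not the decoder restriction used in the presentation (which is built into the lexicon FST); `is_valid_lr_marked` is a hypothetical name.

```python
from typing import List


def is_valid_lr_marked(tokens: List[str]) -> bool:
    """Check a +m+ style sequence for consistent word boundaries.

    A token may start with "+" only if the previous token ended with "+",
    and the sequence must neither start nor end in the middle of a word.
    """
    expect_continuation = False  # True while we are inside an unfinished word
    for tok in tokens:
        if tok.startswith("+") != expect_continuation:
            return False
        expect_continuation = tok.endswith("+")
    return not expect_continuation
```

Under this check, the first example sequence above passes and the second fails, because it opens with a left-marked token and `per+` does not continue `slip+`.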

Original Lexicon FST (kaldi)
[Figure: lexicon FST with states 0-5 and start state 0. Each arc is labeled input:output. The input symbol is a phone with a position suffix (b = word-begin, i = word-internal, e = word-end, s = single-phone word) and the output symbol is the word or ε. Word paths: Wb:one AHi:ε Ne:ε; Tb:two UWe:ε; AHs:a. Remaining arcs: ε:ε, SIL:ε, #a:ε.]

Original Lexicon FST (kaldi)
[Figure: the same lexicon drawn as two FSTs. Outer FST: states 0-2, a $words arc, and arcs ε:ε, SIL:ε, #a:ε. $words sub-FST: states 0-4 with paths Wb:one AHi:ε Ne:ε; Tb:two UWe:ε; AHs:a.]

Subword Lexicon FST
[Figure: outer FST with states 0-4 and start state 0. Arcs: $words, $prefix, $infix, $suffix, #b:ε, #a:ε, SIL:<w>, #c:<w>.]

Replace FSTs
<w>: <w> two <w> slipp er s <w>

$words                     $prefix
two    Tb UWe              two    Tb UWi
slipp  Sb Li IHi Pe        slipp  Sb Li IHi Pi
er     ERs                 er     ERb
s      Zs                  s      Zb

$suffix                    $infix
two    Ti UWe              two    Ti UWi
slipp  Si Li IHi Pe        slipp  Si Li IHi Pi
er     ERe                 er     ERi
s      Ze                  s      Zi
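The four sub-lexicons for the boundary-tag style differ only in which position suffix each phone receives. The following sketch derives them from a plain subword lexicon; it is an illustration, not the authors' tooling. The names `position_phones`, `SUBLEXICONS` and `build_sublexicons` are hypothetical, and the b/i/e/s suffixes follow the notation in the tables above.

```python
from typing import Dict, List, Tuple


def position_phones(phones: List[str], begins_word: bool, ends_word: bool) -> List[str]:
    """Attach position suffixes: b (begin), i (internal), e (end), s (singleton)."""
    n = len(phones)
    out: List[str] = []
    for i, phone in enumerate(phones):
        at_start = begins_word and i == 0
        at_end = ends_word and i == n - 1
        if at_start and at_end:
            suffix = "s"
        elif at_start:
            suffix = "b"
        elif at_end:
            suffix = "e"
        else:
            suffix = "i"
        out.append(phone + suffix)
    return out


# (begins_word, ends_word) for each sub-lexicon
SUBLEXICONS: Dict[str, Tuple[bool, bool]] = {
    "$words": (True, True),     # subword is a complete word
    "$prefix": (True, False),   # subword starts a word
    "$infix": (False, False),   # subword is in the middle of a word
    "$suffix": (False, True),   # subword ends a word
}


def build_sublexicons(lexicon: Dict[str, List[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Create one position-dependent pronunciation per subword and sub-lexicon."""
    return {
        name: {sw: position_phones(ph, begins, ends) for sw, ph in lexicon.items()}
        for name, (begins, ends) in SUBLEXICONS.items()
    }
```

With the lexicon `{"two": ["T", "UW"], "slipp": ["S", "L", "IH", "P"], "er": ["ER"], "s": ["Z"]}`, the result reproduces the entries in the tables above, for instance `er` as `ERs` in `$words` and `ERb` in `$prefix`.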

Replace FSTs
m+: two slipp+ er+ s

$words                     $prefix
two    Tb UWe              slipp+  Sb Li IHi Pi
s      Zs                  er+     ERs

$suffix                    $infix
two    Ti UWe              slipp+  Si Li IHi Pi
s      Ze                  er+     ERi

Experiment Setup
AM: Finnish, Kaldi, TDNN, 150 hours, 425 speakers, clean read data (SPEECON)
LM: Variable-order n-gram, Finnish Text Collection, 150M tokens, 4M word forms
Test1: READ, SPEECON, clean, read, 20 speakers, 1 hour
Test2: NEWS, Broadcast news, 5-10 speakers, 5 hours
More experiments in the paper
