SLIDE 1

Improved subword modeling for WFST-based speech recognition

Peter Smit, Sami Virpioja, Mikko Kurimo

Aalto University, Department of Signal Processing and Acoustics

August 23, 2017

SLIDE 2

Subword modeling WFST implementation Experiments Recap Future work

Research questions

How to do sound WFST modeling for subwords?
How to reconstruct words from subwords?
What is a good subword vocabulary?
  Size of vocabulary?
  Segmentation method?

Improved subword modeling for WFST-based speech recognition 2/21 Peter Smit August 23, 2017 Aalto University


SLIDE 4

How big is your vocabulary?

Source                     # Word forms
WSJ small LM               5,000
WSJ big LM                 20,000
Native English speaker     20,000 – 35,000
CMU dict                   134,000
Finnish adult              >1,000,000
Finnish Text Collection    >4,000,000

SLIDE 5

Is a big vocabulary a problem?

Current systems do support vocabularies >4M
But:
  Out-of-vocabulary problems
  Data sparsity: valid words might only appear once
  Dimensionality problems (e.g. RNNLM input/output layers)
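The out-of-vocabulary problem can be made concrete with a small sketch. It assumes a toy corpus and a hypothetical helper `oov_rate`; neither comes from the talk, and the Finnish-like tokens are illustrative only.

```python
from collections import Counter


def oov_rate(train_tokens, test_tokens, vocab_size):
    """Fraction of test tokens missing from the vocab_size most
    frequent word forms of the training data."""
    counts = Counter(train_tokens)
    vocab = {word for word, _ in counts.most_common(vocab_size)}
    if not test_tokens:
        return 0.0
    missing = sum(1 for token in test_tokens if token not in vocab)
    return missing / len(test_tokens)


# Toy data (assumption): inflected forms of "talo" (house) and "auto" (car).
train = "talo talossa talo talon auto autossa talo auto".split()
test = "talo talossa autoissa".split()

small = oov_rate(train, test, vocab_size=2)  # only "talo" and "auto" are kept
large = oov_rate(train, test, vocab_size=5)  # all training forms are kept
```

Even when every training form is kept, the unseen inflection `autoissa` stays out of vocabulary, which is the situation subword units are meant to address.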

SLIDE 6

Subword modeling

Split words into smaller units
Reduces vocabulary size
Split either knowledge-driven (e.g. grammatical morphs) or data-driven (e.g. Morfessor)

SLIDE 7

Subword marking and reconstruction

Style (abbreviation)        Example                       Vocab size
boundary tag (<w>)          <w> two <w> slipp er s <w>    V + 1
left-marked (+m)            two slipp +er +s              2V
right-marked (m+)           two slipp+ er+ s              2V
left+right-marked (+m+)     two slipp+ +er+ +s            4V

(V: size of the unmarked subword vocabulary; 2V and 4V are upper bounds, since each subword can occur with or without each marker.)
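The four marking styles and the matching word reconstruction can be sketched as follows. The function names `mark` and `reconstruct` are hypothetical, and the sketch assumes that subwords never contain a literal "+" and that the segmentation of each word is already given.

```python
def mark(words, style):
    """words: list of words, each given as a list of subwords.
    Returns the flat token sequence in the requested marking style."""
    if style == "<w>":
        tokens = ["<w>"]
        for morphs in words:
            tokens.extend(morphs)
            tokens.append("<w>")
        return tokens
    tokens = []
    for morphs in words:
        last = len(morphs) - 1
        for i, morph in enumerate(morphs):
            # A left "+" marks a non-initial subword, a right "+" a non-final one.
            left = "+" if style in ("+m", "+m+") and i > 0 else ""
            right = "+" if style in ("m+", "+m+") and i < last else ""
            tokens.append(left + morph + right)
    return tokens


def reconstruct(tokens, style):
    """Join a marked token sequence back into words."""
    if style == "<w>":
        words, current = [], ""
        for token in tokens:
            if token == "<w>":
                if current:
                    words.append(current)
                current = ""
            else:
                current += token
        if current:
            words.append(current)
        return words
    words = []
    previous_open = False  # did the previous token end with "+"?
    for token in tokens:
        if style == "m+":
            attach = previous_open
        else:  # "+m" and "+m+": a leading "+" signals a continuation
            attach = token.startswith("+")
        previous_open = token.endswith("+")
        core = token.strip("+")
        if attach and words:
            words[-1] += core
        else:
            words.append(core)
    return words


segmented = [["two"], ["slipp", "er", "s"]]
for style in ("<w>", "+m", "m+", "+m+"):
    print(style, " ".join(mark(segmented, style)))
```

Each style round-trips: `reconstruct(mark(segmented, s), s)` gives back the words `two` and `slippers` for all four styles.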

SLIDE 11

Subword problems

Restricting output of decoder to be valid (don’t start or end a sentence halfway through a word)
  Valid: two slip+ +per+ +s
  Invalid: +two slip+ per+ +s
Word-position dependent phonemes
Longer contexts are needed in language modeling
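The validity constraint for the +m+ style can be written as a small check. This is only a sketch of the rule on the token level, with a hypothetical function name `is_valid`; in the talk the constraint is enforced inside the lexicon FST, not by post-filtering.

```python
def is_valid(tokens):
    """Check a left+right-marked (+m+) token sequence.

    A token must start with "+" exactly when the previous token ended
    with "+", and the sequence must not end in the middle of a word."""
    word_open = False  # are we currently inside an unfinished word?
    for token in tokens:
        if token.startswith("+") != word_open:
            return False
        word_open = token.endswith("+")
    return not word_open


print(is_valid("two slip+ +per+ +s".split()))
print(is_valid("+two slip+ per+ +s".split()))
```

The first sequence passes; the second fails at its first token, because a sentence cannot begin with a continuation subword.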

SLIDE 12

Original Lexicon FST (Kaldi)

[Figure: lexicon FST with states start and 1–5. Arcs (input:output): ε:ε, SIL:ε, #a:ε, AHs:a, Wb:one, AHi:ε, Ne:ε, Tb:two, UWe:ε. Callouts mark the input symbol, the output symbol, and the phone-position suffix on each phone.]
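The arcs in this figure can be generated with a short sketch. It is a simplification under stated assumptions: optional silence and the disambiguation symbol `#a` are left out, `<eps>` stands for ε, the lowercase suffixes b/i/e/s follow the slide notation (Kaldi itself writes `_B`, `_I`, `_E`, `_S`), and the helper names are hypothetical. Output lines use the OpenFst text format `src dst input output`, with a final-state line at the end.

```python
def position_phones(phones):
    """Tag phones with their position in the word: b(egin), i(nternal),
    e(nd), or s(ingleton) for a one-phone word."""
    if len(phones) == 1:
        return [phones[0] + "s"]
    middle = [p + "i" for p in phones[1:-1]]
    return [phones[0] + "b"] + middle + [phones[-1] + "e"]


def lexicon_arcs(lexicon):
    """Return OpenFst text-format lines for a looped lexicon FST.
    State 0 is both the loop state and the only final state."""
    loop_state = 0
    next_state = 1
    lines = []
    for word, phones in lexicon.items():
        tagged = position_phones(phones)
        src = loop_state
        for i, phone in enumerate(tagged):
            if i == len(tagged) - 1:
                dst = loop_state  # the last phone returns to the loop state
            else:
                dst = next_state
                next_state += 1
            output = word if i == 0 else "<eps>"  # emit the word on the first arc
            lines.append(f"{src} {dst} {phone} {output}")
            src = dst
    lines.append(str(loop_state))  # final-state line
    return lines


lexicon = {"a": ["AH"], "one": ["W", "AH", "N"], "two": ["T", "UW"]}
for line in lexicon_arcs(lexicon):
    print(line)
```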

SLIDE 21

Original Lexicon FST (Kaldi)

[Figure: the same lexicon split into a top-level FST and a replaceable sub-FST. Top-level FST: states start, 1, 2; arcs ε:ε, SIL:ε, #a:ε, $words. Sub-FST $words: states start and 1–4; arcs Wb:one, Tb:two, AHs:a, AHi:ε, Ne:ε, UWe:ε.]

SLIDE 22

Subword Lexicon FST

[Figure: subword lexicon FST with states start and 1–4. Arcs: #c:<w>, #b:ε, SIL:<w>, #a:ε, and the replaceable sub-FSTs $words, $prefix, $infix, $suffix.]

SLIDE 23

Replace FSTs for <w>: <w> two <w> slipp er s <w>

$words
two    Tb UWe
slipp  Sb Li IHi Pe
er     ERs
s      Zs

$prefix
two    Tb UWi
slipp  Sb Li IHi Pi
er     ERb
s      Zb

$suffix
two    Ti UWe
slipp  Si Li IHi Pe
er     ERe
s      Ze

$infix
two    Ti UWi
slipp  Si Li IHi Pi
er     ERi
s      Zi
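The four lexicons above differ only in the position suffix of the first and last phone, which depends on whether the subword starts and/or ends a word. A minimal sketch of that rule, with hypothetical names `tag_phones` and `replace_lexicons` and the slide's b/i/e/s suffix notation:

```python
def tag_phones(phones, starts_word, ends_word):
    """Attach position suffixes to the phones of one subword."""
    tags = ["i"] * len(phones)
    if starts_word:
        tags[0] = "b"
    if ends_word:
        tags[-1] = "e"
    if starts_word and ends_word and len(phones) == 1:
        tags[0] = "s"  # a one-phone unit that is a whole word
    return [phone + tag for phone, tag in zip(phones, tags)]


# (starts_word, ends_word) for each replaceable sub-FST.
CONTEXTS = {
    "$words": (True, True),
    "$prefix": (True, False),
    "$suffix": (False, True),
    "$infix": (False, False),
}


def replace_lexicons(lexicon):
    """Build one position-tagged lexicon per context."""
    return {
        name: {unit: tag_phones(phones, starts, ends)
               for unit, phones in lexicon.items()}
        for name, (starts, ends) in CONTEXTS.items()
    }


subwords = {"two": ["T", "UW"], "slipp": ["S", "L", "IH", "P"],
            "er": ["ER"], "s": ["Z"]}
for name, entries in replace_lexicons(subwords).items():
    print(name)
    for unit, phones in entries.items():
        print(" ", unit, " ".join(phones))
```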

SLIDE 24

Replace FSTs for m+: two slipp+ er+ s

$words
two     Tb UWe
s       Zs

$prefix
slipp+  Sb Li IHi Pi
er+     ERb

$suffix
two     Ti UWe
s       Ze

$infix
slipp+  Si Li IHi Pi
er+     ERi

SLIDE 25

Experiment Setup

AM: Finnish, Kaldi, TDNN, 150 hours, 425 speakers, clean read data (SPEECON)
LM: Variable-order n-gram, Finnish Text Collection, 150M tokens, 4M word forms
Test1: READ, SPEECON, clean, read, 20 speakers, 1 hour
Test2: NEWS, broadcast news, 5-10 speakers, 5 hours
More experiments in the paper

SLIDE 26

Results – Different marking styles

Word Error Rate (%), devset, Morfessor segmentation, optimized vocabulary size

System          NEWS    READ
Word            23.73   8.60
Naive +m        24.11   9.70
Naive +m+       25.45   9.10
Proposed <w>    22.89   6.62
Proposed +m+    22.96   6.55
Proposed +m     23.47   7.12
Proposed m+     23.79   7.24
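For reference, word error rate is the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below is a generic implementation, not the scoring tool used for these results; it assumes subword output has already been joined back into words before scoring.

```python
def wer(reference, hypothesis):
    """Word error rate in percent, from a standard Levenshtein
    alignment over whitespace-separated words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    previous = list(range(len(hyp) + 1))  # distances for the empty reference
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                          # deletion
                current[j - 1] + 1,                       # insertion
                previous[j - 1] + (ref_word != hyp_word), # substitution or match
            ))
        previous = current
    return 100.0 * previous[-1] / len(ref)


print(wer("two slippers left", "two slipper left"))
```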

SLIDE 27

Results – Different segmentation methods

Word Error Rate (%), devset, NEWS

Subword vocab size    5k      10k     15k
Morfessor             23.02   22.82   22.79
Greedy Unigram        23.06   22.93   23.02
Byte Pair Encoding    23.18   23.17   23.17

Only minor differences between segmentation methods
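As an illustration of one data-driven method from this table, here is a minimal byte pair encoding learner. It is a textbook-style sketch, not the setup used in the experiments: it has no end-of-word symbol, breaks ties between equally frequent pairs by string order, and the word counts are toy values.

```python
from collections import Counter


def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges merge operations from a word-frequency dict.
    Returns the merges and the final segmentation of each word type."""
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += count
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are resolved by comparing the pair itself.
        best = max(pairs, key=lambda pair: (pairs[pair], pair))
        merges.append(best)
        merged_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            key = tuple(out)
            merged_vocab[key] = merged_vocab.get(key, 0) + count
        vocab = merged_vocab
    return merges, vocab


merges, segmented = learn_bpe({"talo": 5, "talossa": 3, "auto": 2}, num_merges=3)
print(merges)
print(segmented)
```

With these toy counts, three merges are enough to turn the frequent stem `talo` into a single unit while the rarer ending of `talossa` stays split into characters.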

SLIDE 28

Comparison with previous results on Finnish and Estonian datasets

Eval set    Word     Subword   Previous best
et-bn-ak    17.48    18.28
et-bn-er    8.36     7.70      8.2 (Alumäe, 2014)
fi-news     25.49    24.98     28.9 (Varjokallio, 2017)
fi-phone    14.07    12.79     21.88 (Varjokallio, 2013)
fi-read     11.11    9.44      13.3 (Alumäe, 2014)

SLIDE 29

Recap

Subword modeling is beneficial (or required) for languages with large vocabularies.
In a WFST-based decoder, the lexicon FST can be modified so that only valid subword sequences are allowed and position-dependent phones are preserved.
Experimental results show up to 23% improvement over word modeling and 28% improvement over naive subword modeling.
The optimal marking style for subwords depends on the language or dataset.
Only small differences in performance between segmentation methods.
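The quoted percentages are relative WER reductions. The slides do not say which rows were compared; one pairing from the READ column of the marking-style results that reproduces the rounded figures is sketched below (the pairing is an assumption, the WER values are from the results slide).

```python
def relative_reduction(baseline_wer, system_wer):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# READ devset: word baseline vs. proposed <w> (assumed pairing).
vs_word = relative_reduction(8.60, 6.62)
# READ devset: naive +m+ vs. proposed +m+ (assumed pairing).
vs_naive = relative_reduction(9.10, 6.55)

print(f"{vs_word:.1f}% over word modeling, {vs_naive:.1f}% over naive subwords")
```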

SLIDE 30

Introduction Methods Experiments Results Conclusions

Automatic Construction of the Finnish Parliament Speech Corpus

André Mansikkaniemi, Peter Smit and Mikko Kurimo

Aalto University, School of Electrical Engineering Department of Signal Processing and Acoustics

August 24, 2017

SLIDE 31

Aalto system for the 2017 Arabic multi-genre broadcast challenge

Peter Smit, Siva Reddy Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo

Aalto University, Department of Signal Processing and Acoustics

December 20, 2017 - ASRU - pending review

SLIDE 32

Character-based units for Unlimited Vocabulary Continuous Speech Recognition

Peter Smit, Siva Reddy Gangireddy, Seppo Enarvi, Sami Virpioja, Mikko Kurimo

Aalto University, Department of Signal Processing and Acoustics

December 20, 2017 - ASRU - pending review

SLIDE 33

Code

Code released under open source license

github.com/aalto-speech/subword-kaldi