

  1. SLSP-2016: Statistical Language and Speech Processing, October 11-12
     A New Perspective on Combining GMM and DNN Frameworks for Speaker Adaptation
     Natalia Tomashenko 1,2,3 (natalia.tomashenko@univ-lemans.fr)
     Yuri Khokhlov 3 (khokhlov@speechpro.com)
     Yannick Estève 1 (yannick.esteve@univ-lemans.fr)
     1 University of Le Mans, France
     2 ITMO University, Saint-Petersburg, Russia
     3 STC-innovations Ltd, Saint-Petersburg, Russia

  2. Outline
     1. Introduction
        • Speaker adaptation
        • GMM vs DNN acoustic models
        • GMM adaptation
        • DNN adaptation: related work
        • Combining GMM and DNN in speech recognition
     2. Proposed approach for speaker adaptation: GMM-derived features
     3. System fusion
     4. Experiments
     5. Conclusions
     6. Future work

  4. Adaptation: Motivation
     Why do we need adaptation? Differences between training and testing conditions may
     significantly degrade recognition accuracy in speech recognition systems. Adaptation is an
     efficient way to reduce the mismatch between the models and the data from a particular
     speaker or channel.
     Sources of speech variability:
     • Speaker: gender, age, emotional state, speaking rate, accent, style, …
     • Environment: channel, background noises, reverberation

  5. Speaker adaptation
     The adaptation of pre-existing models towards the optimal recognition of a new target
     speaker, using limited adaptation data from the target speaker:
     • General speaker-independent (SI) acoustic models, trained on a large corpus of acoustic
       data from different speakers
     • Speaker-adapted acoustic models, obtained from the SI models using data of the new
       speaker

  6. Acoustic Models: GMM vs DNN
     GMM (Gaussian Mixture Models):
     • GMM-HMMs have a long history: they have been used in speech recognition since the 1980s.
     • Speaker adaptation is a well-studied field of research.
     DNN (Deep Neural Networks):
     • Big advances in speech recognition over the past 3-5 years.
     • DNNs show higher performance than GMMs; neural networks are the state of the art in
       acoustic modelling.
     • Speaker adaptation is still a very challenging task.

  7. GMM adaptation
     Model-based: adapt the parameters of the acoustic models to better match the observed data.
     • Maximum a posteriori (MAP) adaptation of GMM parameters: in MAP adaptation each Gaussian
       is updated individually.
     • Maximum likelihood linear regression (MLLR) of Gaussian parameters: in MLLR adaptation
       all Gaussians of the same regression class share the same transform.
     Feature-space: transform the features.
     • Feature-space maximum likelihood linear regression (fMLLR).
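The update formulas themselves are not reproduced in the transcript; in standard textbook form (generic notation, not taken from the slide: Gaussian mean μ_m, occupation probability γ_m(t) of Gaussian m at frame t, feature vector x_t, MAP prior weight τ, affine transform A, b), the three schemes can be sketched as:

```latex
% MAP: each Gaussian mean is interpolated between its prior value and the adaptation data
\hat{\mu}_m = \frac{\tau\,\mu_m + \sum_t \gamma_m(t)\,x_t}{\tau + \sum_t \gamma_m(t)}

% MLLR: all Gaussians of a regression class share one affine transform of the means
\hat{\mu}_m = A\,\mu_m + b

% fMLLR: the same kind of affine transform is applied to the feature vectors instead
\hat{x}_t = A\,x_t + b
```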

  8. DNN adaptation: Related work
     • Linear transformation: LIN [1], LHN [1], LON [3], oDLR [4], fDLR [2], fMLLR [2], …
     • Regularization techniques: L2-prior [5], KL-divergence [6], Conservative Training [7], …
     • Model-space adaptation: LHUC [8], (fMAP) linear regression [9]
     • Multi-task learning (MTL) [12]
     • Auxiliary features: speaker codes [10], i-vectors [11]
     • Adaptation based on GMM: TVWR [13], GMM-derived features [14]
     [1] Gemello et al., 2006; [2] Seide et al., 2011; [3] Li et al., 2010; [4] Yao et al., 2012;
     [5] Liao, 2013; [6] Yu et al., 2013; [7] Albesano, Gemello et al., 2006;
     [8] Swietojanski et al., 2014; [9] Huang et al., 2014; [10] Xue et al., 2014;
     [11] Senior et al., 2014; [12] Price et al., 2014; [13] Liu et al., 2014;
     [14] Tomashenko & Khokhlov, 2014

  9. Combining GMM and DNN in speech recognition
     • Tandem features [17] (Hermansky et al., 2000)
     • Bottleneck features [18] (Grézl et al., 2007)
     • GMM log-likelihoods as features for an MLP [19] (Pinto & Hermansky, 2008)
     • Log-likelihood combination
     • ROVER*, lattice-based combination, CNC**, …
     * ROVER – Recognizer Output Voting Error Reduction
     ** CNC – Confusion Network Combination

  11. Proposed approach: Motivation
     • It has been shown that speaker adaptation is more effective for GMM acoustic models than
       for DNN acoustic models.
     • Many adaptation algorithms that work well for GMM systems cannot easily be applied to
       DNNs.
     • Neural networks and GMMs may be complementary and benefit from being combined.
     • Goal: take advantage of the existing adaptation methods developed for GMMs and apply
       them to DNNs.

  12. Proposed approach: GMM-derived (GMMD) features for DNN
     • Extract features using a GMM model and feed these GMM-derived features to the DNN.
     • Train the DNN model on the GMM-derived features.
     • Adapt the GMM-derived features using GMM adaptation algorithms.

  13. Bottleneck-based GMM-derived features for DNNs
     For a given acoustic BN feature vector, a new GMM-derived feature vector is obtained by
     calculating the log-likelihoods of that vector across all the states of the auxiliary GMM
     (speaker-independent or adapted).
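A minimal sketch of this feature extraction, assuming diagonal-covariance state GMMs stored as (weights, means, variances) triples; the function names and toy dimensions below are illustrative choices, not taken from the slides. Each BN frame is scored against every state of the auxiliary GMM, and the vector of per-state log-likelihoods becomes the GMMD feature vector fed to the DNN:

```python
import numpy as np

def gmm_state_loglik(x, weights, means, variances):
    """Log-likelihood of one frame x under a single HMM state modelled by a diagonal GMM."""
    # log N(x; mu_c, diag(var_c)) for every mixture component c, computed in a vectorized way
    log_comp = -0.5 * (np.log(2 * np.pi * variances) + (x - means) ** 2 / variances).sum(axis=1)
    return np.logaddexp.reduce(np.log(weights) + log_comp)

def gmmd_features(bn_frames, state_gmms):
    """Map BN frames (T x D) to GMMD frames (T x num_states) of per-state log-likelihoods."""
    feats = np.empty((len(bn_frames), len(state_gmms)))
    for t, x in enumerate(bn_frames):
        for k, (w, mu, var) in enumerate(state_gmms):
            feats[t, k] = gmm_state_loglik(x, w, mu, var)
    return feats

# Toy usage: 3 HMM states, each a 2-component GMM over 4-dimensional BN features.
rng = np.random.default_rng(0)
state_gmms = [(np.array([0.6, 0.4]), rng.normal(size=(2, 4)), np.full((2, 4), 1.0))
              for _ in range(3)]
bn = rng.normal(size=(10, 4))                 # 10 BN feature frames
print(gmmd_features(bn, state_gmms).shape)    # (10, 3): one log-likelihood per state
```

For speaker adaptation, the same extraction would be run with the MAP- or fMLLR-adapted auxiliary GMM in place of the speaker-independent one.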

  15. System Fusion: feature level (used at both the training and decoding stages)
     Input features 1 and input features 2 are concatenated, the concatenated features are fed
     to a single DNN, and its output posteriors are passed to the decoder to produce the result.
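As an illustration of this feature-level fusion (array names and dimensions below are toy values, not taken from the slides), the two frame-synchronous feature streams are simply concatenated along the feature dimension before DNN training and decoding:

```python
import numpy as np

T = 100                                 # number of frames
feats_1 = np.random.randn(T, 40)        # input features 1 (e.g. bottleneck features)
feats_2 = np.random.randn(T, 120)       # input features 2 (e.g. GMMD features)

# Feature concatenation: the fused (T, 160) matrix is what the single DNN
# sees both at training time and at decoding time.
fused = np.hstack([feats_1, feats_2])
print(fused.shape)                      # (100, 160)
```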

  16. System Fusion: posterior combination
     Input features 1 are fed to DNN 1 and input features 2 to DNN 2; the two sets of output
     posteriors are combined, and the combined posteriors are passed to the decoder to produce
     the result.
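A minimal sketch of the posterior combination, assuming frame-synchronous state posteriors from the two DNNs and a single interpolation weight α for the baseline model (the value 0.45 below mirrors the weight reported later for system #6); linear interpolation is assumed here, since the slide does not spell out the exact combination rule:

```python
import numpy as np

def combine_posteriors(post_baseline, post_gmmd, alpha):
    """Frame-wise combination of two DNNs' state posteriors.

    post_baseline, post_gmmd: arrays of shape (T, num_states), each row summing to 1.
    alpha: weight of the baseline model in the fusion.
    """
    combined = alpha * post_baseline + (1.0 - alpha) * post_gmmd
    # Renormalize per frame to guard against numerical drift before decoding.
    return combined / combined.sum(axis=1, keepdims=True)

# Toy usage: random posteriors over 5 states for 4 frames.
rng = np.random.default_rng(1)
p1 = rng.dirichlet(np.ones(5), size=4)
p2 = rng.dirichlet(np.ones(5), size=4)
print(combine_posteriors(p1, p2, alpha=0.45))
```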

  17. System Fusion: lattice combination
     Input features 1 are fed to DNN 1 and input features 2 to DNN 2; each set of output
     posteriors is decoded separately into lattices 1 and 2, which are then merged by Confusion
     Network Combination to produce the result.

  19. Experiments: Data
     TED-LIUM corpus:* 1495 TED talks, 207 hours (141 hours of male and 66 hours of female
     speech data), 1242 speakers, 16 kHz.

     Data set     | Duration, hours | Number of speakers | Mean duration per speaker, minutes
     Training     | 172             | 1029               | 10
     Development  | 3.5             | 14                 | 15
     Test 1       | 3.5             | 14                 | 15
     Test 2       | 4.9             | 14                 | 21

     LM:** 150K-word vocabulary and a publicly available trigram LM.
     * A. Rousseau, P. Deléglise, and Y. Estève, "Enhancing the TED-LIUM corpus with selected
       data for language modeling and more TED talks", 2014.
     ** cantab-TEDLIUMpruned.lm31

  20. Experiments: Baseline systems
     We follow the Kaldi TED-LIUM recipe for training the baseline models (DNN training: RBM
     pretraining, CE, sMBR):
     • Speaker-independent model → Train DNN Model #1
     • Speaker-adaptive training with fMLLR → Train DNN Model #2

  21. Experiments: Training models with GMMD features
     Two types of integration of GMMD features into the baseline recipe:
     1. Adapted features AF1 (with a monophone auxiliary GMM) → Train DNN Models #3, #4
     2. Adapted features AF2 (with a triphone auxiliary GMM) → Train DNN Model #5

  22. Results: Adaptation performance for DNNs (WER, %)

     #  Adaptation  Features              τ    Dev    Test 1  Test 2
     1  No          BN (baseline)         -    12.14  10.77   13.75
     2  fMLLR       BN (baseline)         -    10.64   9.52   12.78
     3  MAP         GMMD AF1              2    10.27   9.59   12.94
     4  MAP         GMMD AF1 + align. #2  5    10.26   9.40   12.52
     5  MAP+fMLLR   GMMD AF2 + align. #2  5    10.42   9.74   13.29

     τ – parameter in MAP adaptation. Highlighted cells: better than the speaker-adapted
     baseline (#2).

  23. Results: Adaptation and Fusion (WER, %)

     #  Adaptation / Fusion         Features              α     Dev     Test 1  Test 2
     1  No                          BN                    -     12.14*  10.77*  13.75*
     2  fMLLR                       BN                    -     10.57    9.46   12.67
     4  MAP                         GMMD AF1 + align. #2  -     10.23    9.31   10.46
     5  MAP+fMLLR                   GMMD AF2 + align. #2  -     10.37    9.69   13.23
     6  Posterior fusion: #2 + #4   -                     0.45   9.91    9.06   12.04
     7  Posterior fusion: #2 + #5   -                     0.55   9.91    9.10   12.23
     8  Lattice fusion: #2 + #4     -                     0.44  10.06    9.09   12.12
     9  Lattice fusion: #2 + #5     -                     0.50  10.01    9.17   12.25

     Relative WER reduction of the fusion systems in comparison with the adapted baseline #2
     (Dev / Test 1 / Test 2, %): #6: 6.2 / 4.3 / 5.0; #7: 6.2 / 3.8 / 3.5; #8: 4.8 / 4.0 / 4.4;
     #9: 5.3 / 3.1 / 3.3. The best improvement is obtained with system #6.

     α is the weight of the baseline model in the fusion.
     * WER in #1 was calculated from lattices; in the other lines, from the consensus hypothesis.

     • The two types of fusion, at the posterior level and at the lattice level, provide
       additional, comparable improvement.
     • In most cases posterior-level fusion provides slightly better results than lattice-level
       fusion.
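For reference, the relative reductions listed above follow directly from the WER columns; for example, for system #6 on the Dev set:

```latex
\frac{10.57 - 9.91}{10.57} \approx 0.062 = 6.2\%
```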
