CU-HTK April 2002 Switchboard System
Phil Woodland, Gunnar Evermann, Mark Gales, Thomas Hain, Andrew Liu, Gareth Moore, Dan Povey & Lan Wang
Cambridge University Engineering Department
Rich Transcription Workshop 2002, May 7th 2002

Woodland, Evermann, Gales, Hain, Liu, Moore, Povey & Wang: CU-HTK April 2002 Switchboard system

Overview
• Review of CU-HTK 2001 system
• Minimum Phone Error (MPE) training
• HLDA (heteroscedastic linear discriminant analysis)
• Speaker Adaptive Training
• Single-pronunciation dictionaries
• 2002 system & results
• Fast contrast systems
• Conclusions

Cambridge University Engineering Department, Rich Transcription Workshop 2002

Review of CU-HTK 2001 System: Basic Features
• Front-end
  – Reduced bandwidth 125–3800 Hz
  – 12 MF-PLP cepstral parameters + C0, plus 1st and 2nd derivatives
  – Side-based cepstral mean and variance normalisation
  – Vocal tract length normalisation (VTLN) in training and test
• Decision-tree state-clustered, context-dependent triphone & quinphone models: MMIE and MLE versions
• Generate lattices with MLLR-adapted models
• Rescore using iterative lattice MLLR + full-variance (FV) transform adaptation
• Posterior probability decoding via confusion networks
• System combination
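The derivative and normalisation steps of the front-end can be illustrated with a minimal pure-Python sketch. It makes several assumptions. It uses HTK-style regression deltas over a ±2 frame window, applied twice to obtain the 2nd derivatives. It normalises all 39 dimensions over one conversation side. The window size, the order of operations and the random stand-in features are illustrative only and are not taken from the CU-HTK configuration. MF-PLP extraction, band-limiting and VTLN are not shown.

```python
import random
from statistics import fmean, pstdev

def regression_deltas(frames, window=2):
    """HTK-style regression deltas over +/-window frames; edge frames are replicated."""
    n, dim = len(frames), len(frames[0])
    denom = 2 * sum(th * th for th in range(1, window + 1))
    out = []
    for t in range(n):
        d = [0.0] * dim
        for th in range(1, window + 1):
            nxt = frames[min(t + th, n - 1)]
            prv = frames[max(t - th, 0)]
            for i in range(dim):
                d[i] += th * (nxt[i] - prv[i])
        out.append([x / denom for x in d])
    return out

def add_derivatives(statics, window=2):
    """Append 1st and 2nd derivatives: 13 statics -> 39-dimensional vectors."""
    d1 = regression_deltas(statics, window)
    d2 = regression_deltas(d1, window)
    return [s + a + b for s, a, b in zip(statics, d1, d2)]

def side_cmn_cvn(frames):
    """Normalise every dimension to zero mean and unit variance over one conversation side."""
    dim = len(frames[0])
    cols = [[f[i] for f in frames] for i in range(dim)]
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # guard against constant dimensions
    return [[(f[i] - means[i]) / stds[i] for i in range(dim)] for f in frames]

# Hypothetical input: 100 frames of 12 cepstra + C0 for one side (random stand-ins for MF-PLP).
random.seed(0)
statics = [[random.gauss(0.0, 3.0) for _ in range(13)] for _ in range(100)]
features = side_cmn_cvn(add_derivatives(statics))
print(len(features), len(features[0]))
```

Because normalisation is computed per side rather than per utterance, all speech from one speaker on one channel shares the same mean and variance estimates.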

2001 System Structure
[Figure: flow diagram of the 2001 multi-pass system. Recoverable labels: P1 (GI, MLE triphones, 27k, tgint98; VTLN, CMN, CVN); resegmentation and gender detection; P2 (GI, MMIE triphones, 54k, fgint00); P3 (MLLR, 1 speech transform; GI, MMIE triphones, 54k, fgintcat00; 4-gram lattices); P4a/P4b triphones and P5a/P5b quinphones (LATMLLR with 1 and 2–4 transforms, MLLR, FV; GI MMIE and GD MLE ST models; PPROB; confusion networks (CN); 1-best); CNC; final result cu-htk1.]

Acoustic Training/Test Data
Training sets
  h5train00      248 hours Switchboard (Swbd1), 17 hours CallHome English (CHE)
  h5train00sub   60 hours Swbd1, 8 hours CHE
  h5train02      h5train00 + LDC cell1 corpus (without dev01/eval01 sides): extra 17 hours of data
Development test sets
  dev01          40 sides Swbd2 (eval98), 40 sides Swbd1 (eval00), 38 sides Swbd2 cellular (dev01-cell)
  dev01sub       half of dev01, selected to give a WER similar to the full set
  eval98         40 sides Swbd2 (eval98-swbd2), 40 sides CHE (eval98-che)

2001 System Results on dev01 set
%WER on dev01 for all stages of the 2001 system

Stage                      Swbd1   Swbd2   Cellular   Total
P1   VTLN/gender det       31.7    46.9    48.1       42.1
P2   initial trans.        23.5    38.6    39.2       33.7
P3   lat gen               21.1    36.0    36.7       31.2
P4a  MMIE tri              20.0    33.5    34.0       29.1
P4b  MLE tri               21.3    35.0    35.4       30.5
P5a  MMIE quin             19.8    33.2    33.4       28.7
P5b  MLE quin              20.2    34.0    34.2       29.4
CNC  P5a+P4a+P5b           18.3    31.9    32.1       27.3

• Final confidence scores have a normalised cross entropy (NCE) of 0.254
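NCE measures how much information the confidence scores add over always predicting the average probability that a hypothesised word is correct. A value of 1 is perfect, 0 is no better than that constant, and negative values are worse. Below is a minimal sketch of the usual NIST-style definition. The example scores are hypothetical, and the clipping constant `eps` is an assumption added to avoid log(0).

```python
import math

def nce(confidences, correct, eps=1e-10):
    """Normalised cross entropy of per-word confidence scores.
    confidences: scores in [0, 1]; correct: matching list of booleans."""
    n = len(correct)
    n_correct = sum(1 for ok in correct if ok)
    if not 0 < n_correct < n:
        raise ValueError("NCE is undefined when all words are correct or all are wrong")
    p_c = n_correct / n  # average probability that a hypothesised word is correct
    h_max = -(n_correct * math.log2(p_c) + (n - n_correct) * math.log2(1 - p_c))
    h_conf = 0.0
    for p, ok in zip(confidences, correct):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        h_conf -= math.log2(p) if ok else math.log2(1 - p)
    return (h_max - h_conf) / h_max

print(round(nce([0.9, 0.9, 0.9, 0.1], [True, True, True, False]), 3))  # 0.813
```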

Minimum Phone Error & Other Discriminative Criteria
• MMIE maximises the posterior probability of the correct sentence
  Problem: sensitive to outliers
• MCE maximises a smoothed approximation to the sentence accuracy
  Problem: cannot easily be implemented with lattices; scales poorly to long sentences
• The criterion evaluated in testing is word error rate, so it makes sense to maximise something similar to word accuracy
• MPE uses a smoothed approximation to phone error but can use the lattice-based implementation developed for MMIE
• Note that MPE approximates phone error in a word recognition context, i.e. it uses word-level recognition, but scoring is on a phone error basis
• A smoothed word accuracy can also be maximised directly (i.e. a smoothed word error rate minimised) → Minimum Word Error (MWE). MWE performed slightly worse than MPE, so the main focus here is on MPE
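The following hedged sketch makes the outlier point concrete. It computes the per-utterance MMIE term, log P(reference | O), over a short N-best list instead of a lattice. All scores are hypothetical, and so is κ = 1/12. An utterance whose reference scores badly, for example because of a transcription error, contributes a very large negative term. By contrast, a posterior-weighted accuracy of the MPE kind is bounded by the accuracies of the competing sentences.

```python
import math

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mmie_log_posterior(hyps, ref_index, kappa):
    """log P(reference | O) over a list of competing hypotheses.
    hyps: (acoustic log-likelihood, LM log-probability) pairs; kappa scales the acoustics."""
    scores = [kappa * ac + lm for ac, lm in hyps]
    return scores[ref_index] - log_sum_exp(scores)

# Hypothetical N-best scores; hypothesis 0 is the reference transcription.
typical = [(-1000.0, -20.0), (-1004.0, -21.0), (-1010.0, -19.0)]
outlier = [(-1400.0, -20.0), (-1004.0, -21.0), (-1010.0, -19.0)]  # e.g. a bad transcript
kappa = 1 / 12
for name, hyps in [("typical", typical), ("outlier", outlier)]:
    lp = mmie_log_posterior(hyps, 0, kappa)
    print(name, round(lp, 2), round(math.exp(lp), 4))
```

The posterior itself stays in [0, 1], but its logarithm is unbounded below, so a single outlier can dominate the summed MMIE objective and its gradient.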

MPE Objective Function
• Maximise the following function:

  $$F_{MPE}(\lambda) = \sum_{r=1}^{R} \frac{\sum_{s} p_\lambda(O_r \mid s)^{\kappa} P(s)\,\mathrm{RawAccuracy}(s)}{\sum_{s} p_\lambda(O_r \mid s)^{\kappa} P(s)}$$

  where λ are the HMM parameters, O_r the speech data for file r, κ a probability scale and P(s) the LM probability of sentence s
• RawAccuracy(s) measures the number of phones correctly transcribed in sentence s (derived from word recognition), i.e. #correct phones in s − #inserted phones in s
• For each file r, F_MPE(λ) contributes a weighted average of RawAccuracy(s) over all s
• Acoustic log-likelihoods are scaled by κ
• The criterion is to be maximised, not minimised (for compatibility with MMIE)
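The objective can be evaluated directly when each file's hypothesis space is approximated by an N-best list, as in this minimal sketch. A lattice-based implementation avoids enumerating sentences. The tuple layout and the example scores are assumptions for illustration.

```python
import math

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mpe_objective(utterances, kappa):
    """F_MPE summed over files r, each hypothesis space approximated by an N-best list.
    Each hypothesis is (acoustic log-likelihood, LM log-probability, RawAccuracy)."""
    total = 0.0
    for hyps in utterances:
        scores = [kappa * ac + lm for ac, lm, _ in hyps]  # log of p(O_r|s)^kappa P(s)
        norm = log_sum_exp(scores)
        # posterior-weighted average of RawAccuracy(s) for this file
        total += sum(math.exp(sc - norm) * acc for sc, (_, _, acc) in zip(scores, hyps))
    return total

# Hypothetical file with two equally likely hypotheses of RawAccuracy 3 and 1.
print(round(mpe_objective([[(-1000.0, -20.0, 3), (-1000.0, -20.0, 1)]], kappa=0.1), 6))  # 2.0
```

A smaller κ flattens the posterior over hypotheses, so more competing sentences contribute to the average. A very large κ makes F approach the accuracy of the single best-scoring hypothesis.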

Lattice Implementation of MMIE: Review
• Generate lattices marked with time information at the HMM level
  – Numerator (num) from the correct transcription
  – Denominator (den) for confusable hypotheses from recognition
• Use Extended Baum-Welch (Gopalakrishnan et al., Normandin) updates, e.g. for means:

  $$\hat{\mu}_{jm} = \frac{\theta^{num}_{jm}(O) - \theta^{den}_{jm}(O) + D\mu_{jm}}{\gamma^{num}_{jm} - \gamma^{den}_{jm} + D}$$

  – γ_jm are the Gaussian occupancies (summed over time) from forward-backward
  – θ_jm(O) is the sum of the data, weighted by occupancy
• For rapid convergence use a Gaussian-specific D constant
• For better generalisation broaden the posterior probability distribution
  – Acoustic scaling
  – Weakened language model (unigram)
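The EBW mean update can be written out for a single Gaussian, as in the sketch below. `gaussian_specific_d`, with its factor `e` and floor `d_floor`, is a hypothetical illustration of tying D to the denominator occupancy; the values are not taken from the CU-HTK system. The variance update and any check that keeps variances positive are omitted.

```python
def accumulate(frames, occupancies):
    """gamma = sum_t gamma(t) and theta(O) = sum_t gamma(t) * o(t) for one Gaussian."""
    gamma = sum(occupancies)
    dim = len(frames[0])
    theta = [sum(g * f[i] for g, f in zip(occupancies, frames)) for i in range(dim)]
    return gamma, theta

def gaussian_specific_d(gamma_den, e=2.0, d_floor=1.0):
    """Hypothetical heuristic: D proportional to the denominator occupancy, with a floor."""
    return max(e * gamma_den, d_floor)

def ebw_mean(theta_num, theta_den, gamma_num, gamma_den, mu, d):
    """Extended Baum-Welch mean update for Gaussian m of state j."""
    denom = gamma_num - gamma_den + d
    if denom <= 0:
        raise ValueError("D is too small: gamma_num - gamma_den + D must be positive")
    return [(tn - td + d * m) / denom for tn, td, m in zip(theta_num, theta_den, mu)]

# Hypothetical 1-dimensional statistics.
g_num, t_num = accumulate([[1.0], [3.0]], [1.0, 1.0])
g_den, t_den = accumulate([[5.0]], [0.5])
d = gaussian_specific_d(g_den)
print(ebw_mean(t_num, t_den, g_num, g_den, mu=[2.0], d=d))
```

With no denominator statistics and D = 0, the update reduces to the maximum-likelihood mean θ^num/γ^num. A larger D keeps the new mean closer to the current one.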

Lattice Implementation of MPE
• Problem: RawAccuracy(s), defined at the sentence level as (#correct − #inserted), requires alignment with the correct transcription
• Express RawAccuracy(s) as a sum of PhoneAcc(q) over all phones q in the sentence hypothesis s:

  $$\mathrm{PhoneAcc}(q) = \begin{cases} 1 & \text{if correct phone} \\ 0 & \text{if substitution} \\ -1 & \text{if insertion} \end{cases}$$

• Calculating PhoneAcc(q) still requires alignment to the reference transcription
• Use an approximation to PhoneAcc(q) based on time-alignment information
  – compute the proportion e by which each hypothesis phone overlaps the reference
  – gives a lower bound on the true value of RawAccuracy(s)
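For reference, the exact RawAccuracy(s) can be computed from a minimum-edit-distance alignment of the hypothesis and reference phone strings. Summing PhoneAcc(q) over the aligned hypothesis phones gives #correct − #inserted. The minimal sketch below assumes that ties between equal-cost alignments are broken in favour of more correct phones.

```python
def raw_accuracy(reference, hypothesis):
    """Exact RawAccuracy = #correct - #inserted from a minimum-edit-distance alignment."""
    R, H = len(reference), len(hypothesis)
    # Each cell holds (errors, -correct, insertions); tuples compare lexicographically,
    # so min() picks the fewest errors, then the most correct phones, then fewest insertions.
    dp = [[None] * (H + 1) for _ in range(R + 1)]
    dp[0][0] = (0, 0, 0)
    for i in range(R + 1):
        for j in range(H + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:
                e, nc, ins = dp[i - 1][j - 1]
                if reference[i - 1] == hypothesis[j - 1]:
                    cands.append((e, nc - 1, ins))      # correct: PhoneAcc = 1
                else:
                    cands.append((e + 1, nc, ins))      # substitution: PhoneAcc = 0
            if i > 0:
                e, nc, ins = dp[i - 1][j]
                cands.append((e + 1, nc, ins))          # deletion: no hypothesis phone
            if j > 0:
                e, nc, ins = dp[i][j - 1]
                cands.append((e + 1, nc, ins + 1))      # insertion: PhoneAcc = -1
            dp[i][j] = min(cands)
    _, neg_correct, insertions = dp[R][H]
    return -neg_correct - insertions

print(raw_accuracy(list("abc"), list("abbd")))  # 1
```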

Approximating PhoneAcc using Time Information

  $$\mathrm{PhoneAcc}(q) = \begin{cases} -1 + 2e & \text{if same phone} \\ -1 + e & \text{if different phone} \end{cases}$$

  Where a hypothesis phone overlaps more than one reference phone, the maximum of these values is taken.

Example:
  Reference:                          a b c
  Hypothesis:                         a b b d
  Proportion e:                       1.0   0.8   0.2   0.15   0.85
  −1 + (correct: 2e, incorrect: e):   1.0   0.6   −0.6  −0.85  −0.15
  Max of above:                       1.0   0.6   −0.6  −0.15

  Approximated sentence raw accuracy from above = 0.85
  Exact value of raw accuracy: 2 correct − 1 insertion = 1
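The time-based approximation can be sketched as follows. The sketch takes e as the fraction of a reference phone's duration that the hypothesis phone covers. It takes the maximum over all overlapped reference phones and scores a phone with no overlap as an insertion (−1). The frame timings are hypothetical, chosen so that the proportions reproduce the worked example.

```python
def approx_phone_acc(hyp, reference):
    """Time-based approximation to PhoneAcc(q) for one hypothesis phone.
    hyp: (label, start_frame, end_frame); reference: list of such tuples."""
    label, start, end = hyp
    best = -1.0  # assumption: a phone overlapping no reference phone scores as an insertion
    for ref_label, r_start, r_end in reference:
        overlap = min(end, r_end) - max(start, r_start)
        if overlap <= 0:
            continue
        e = overlap / (r_end - r_start)  # proportion of the reference phone that is overlapped
        best = max(best, -1 + 2 * e if ref_label == label else -1 + e)
    return best

def approx_raw_accuracy(hypothesis, reference):
    return sum(approx_phone_acc(q, reference) for q in hypothesis)

# Hypothetical timings (in frames) matching the proportions in the example.
reference = [("a", 0, 100), ("b", 100, 200), ("c", 200, 300)]
hypothesis = [("a", 0, 100), ("b", 100, 180), ("b", 180, 215), ("d", 215, 300)]
print([round(approx_phone_acc(q, reference), 2) for q in hypothesis])  # [1.0, 0.6, -0.6, -0.15]
print(round(approx_raw_accuracy(hypothesis, reference), 2))  # 0.85
```

The approximation needs only the phone boundaries of each lattice arc and of the reference, so it can be evaluated arc by arc without aligning whole sentences.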

PhoneAcc Approximation For Lattices
• Calculate PhoneAcc(q) for each phone arc q, then find ∂F_MPE(λ)/∂log p(q) by forward-backward
[Figure: example hypothesis lattice with the correct transcription, annotated with PhoneAcc(q) for each arc and the resulting dF/d(phone log prob); arcs with positive values lie on better-than-average paths, arcs with negative values on worse-than-average paths.]
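A sketch of the forward-backward computation on a phone lattice follows. It relies on the identity ∂F_MPE/∂log p(q) = κ γ_q (c(q) − c_avg), which can be derived from the definition of F_MPE(λ). Here γ_q is the arc posterior, c(q) is the average RawAccuracy of the paths through q, and c_avg is the average over the whole lattice. As a result, arcs on better-than-average paths receive positive values. The lattice, the node-numbering convention and all scores are hypothetical.

```python
import math
from collections import defaultdict

def _combine(terms):
    """(log weight, mean accuracy) pairs -> (log total weight, weighted mean accuracy)."""
    m = max(w for w, _ in terms)
    ws = [math.exp(w - m) for w, _ in terms]
    total = sum(ws)
    return m + math.log(total), sum(wi * acc for wi, (_, acc) in zip(ws, terms)) / total

def mpe_arc_derivatives(arcs, start, end, kappa):
    """Forward-backward over a phone lattice for the MPE criterion.
    arcs: (from_node, to_node, acoustic_loglik, lm_logprob, phone_acc) tuples.
    Assumes integer node ids in topological order (from_node < to_node) and that
    every node lies on some start->end path.
    Returns (dF/dlog p(q) for each arc, average RawAccuracy F of the lattice)."""
    into, out = defaultdict(list), defaultdict(list)
    for a in arcs:
        into[a[1]].append(a)
        out[a[0]].append(a)
    nodes = sorted({a[0] for a in arcs} | {a[1] for a in arcs})

    alpha, alpha_acc = {start: 0.0}, {start: 0.0}
    for v in nodes:  # forward: weight and mean accuracy of partial paths start->v
        if v != start:
            alpha[v], alpha_acc[v] = _combine(
                [(alpha[u] + kappa * ac + lm, alpha_acc[u] + acc) for u, _, ac, lm, acc in into[v]])

    beta, beta_acc = {end: 0.0}, {end: 0.0}
    for u in reversed(nodes):  # backward: the same quantities for partial paths u->end
        if u != end:
            beta[u], beta_acc[u] = _combine(
                [(kappa * ac + lm + beta[v], acc + beta_acc[v]) for _, v, ac, lm, acc in out[u]])

    log_total, c_avg = alpha[end], alpha_acc[end]
    derivs = []
    for u, v, ac, lm, acc in arcs:
        gamma = math.exp(alpha[u] + kappa * ac + lm + beta[v] - log_total)  # arc posterior
        c_q = alpha_acc[u] + acc + beta_acc[v]  # mean accuracy of paths through this arc
        derivs.append(kappa * gamma * (c_q - c_avg))
    return derivs, c_avg

# Hypothetical lattice: (from, to, acoustic log-lik, LM log-prob, PhoneAcc).
arcs = [
    (0, 1, -10.0, -1.0, 1.0),
    (0, 1, -11.0, -1.2, -0.15),
    (1, 2, -20.0, -0.5, 0.6),
    (1, 2, -19.0, -0.7, -0.6),
    (2, 3, -5.0, -0.3, 1.0),
    (1, 3, -27.0, -1.0, -0.2),
]
derivs, F = mpe_arc_derivatives(arcs, start=0, end=3, kappa=0.5)
for (u, v, _, _, acc), d in zip(arcs, derivs):
    print(f"arc {u}->{v} PhoneAcc={acc:+.2f} dF/dlog p(q)={d:+.4f}")
```

A finite-difference check is a useful sanity test. Perturbing each arc's acoustic log-likelihood and recomputing F should give values that agree with the returned derivatives.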

Applying Extended Baum-Welch to MPE
• Use the EBW update formulae as for MMIE, but with modified MPE statistics
• For MMIE, the occupation probability for an arc q equals $\frac{1}{\kappa}\frac{\partial F_{MMIE}(\lambda)}{\partial \log p(q)}$ for the numerator (×−1 for the denominator). The denominator occupancy-weighted statistics are subtracted from the numerator in the update formulae
• Statistics for the MPE update use $\frac{1}{\kappa}\frac{\partial F_{MPE}(\lambda)}{\partial \log p(q)}$, the derivative of the criterion w.r.t. the phone arc log likelihood, which can be calculated efficiently
• Either the MPE numerator or the denominator statistics are updated, depending on the sign of $\frac{\partial F_{MPE}(\lambda)}{\partial \log p(q)}$; this derivative, scaled by 1/κ, is the "MPE arc occupancy"
• After accumulating statistics, apply the EBW equations
• EBW can be viewed as a gradient-based optimisation technique and can be shown to be a valid update for MPE
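The sign-based split of the MPE statistics can be sketched for one Gaussian as follows. For simplicity, the sketch assumes every frame of an arc is fully assigned to the Gaussian. In practice, within-arc Gaussian occupancies from forward-backward would weight each frame. The MPE occupancies, the frames and D are hypothetical.

```python
def accumulate_mpe_stats(arcs, dim):
    """Split MPE statistics for one Gaussian by the sign of each arc's MPE occupancy.
    arcs: (mpe_occupancy, frames) pairs, mpe_occupancy = (1/kappa) dF_MPE/dlog p(q).
    Simplification: every frame of an arc is assumed fully assigned to this Gaussian."""
    stats = {"num": [0.0, [0.0] * dim], "den": [0.0, [0.0] * dim]}
    for occ, frames in arcs:
        side = stats["num"] if occ > 0 else stats["den"]
        weight = abs(occ)  # denominator statistics are accumulated with positive weight
        side[0] += weight * len(frames)
        for f in frames:
            for i in range(dim):
                side[1][i] += weight * f[i]
    return stats

def ebw_mean(theta_num, theta_den, gamma_num, gamma_den, mu, d):
    """Same EBW mean update as for MMIE, applied to the MPE statistics."""
    denom = gamma_num - gamma_den + d
    if denom <= 0:
        raise ValueError("gamma_num - gamma_den + D must be positive")
    return [(tn - td + d * m) / denom for tn, td, m in zip(theta_num, theta_den, mu)]

# Hypothetical arcs: one on a better-than-average path, one on a worse-than-average path.
stats = accumulate_mpe_stats([(0.5, [[2.0], [2.0]]), (-0.25, [[4.0]])], dim=1)
(g_num, t_num), (g_den, t_den) = stats["num"], stats["den"]
print(ebw_mean(t_num, t_den, g_num, g_den, mu=[0.0], d=1.25))  # [0.5]
```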
