

  1. Tandem modeling investigations
     Dan Ellis
     International Computer Science Institute, Berkeley CA
     <dpwe@icsi.berkeley.edu>

     Outline
     1  What makes Tandem successful?
     2  Can we make Tandem better?
     3  Does Tandem work with LVCSR tricks?

     Tandem investigations - Dan Ellis 2001-01-25 - 1

  2. What makes Tandem work? (with Manuel Reyes)

     [Block diagram: PLP and MSG feature streams -> neural net classifiers -> combination -> pre-nonlinearity outputs -> PCA/KLT orthogonalization -> Gauss-mix HTK models -> decoder -> words. Relative WER improvements annotated on the diagram:
      - Combo over plp: +20%
      - Combo over msg: +20%
      - Combo over mfcc: +25%
      - Pre-nonlinearity + orthogonalization over posteriors: +12%
      - Combo-into-HTK over combo-into-noway: +15%; over direct: +8%
      - NN over HTK: +15%
      - Tandem over HTK: +35%
      - Tandem over hybrid: +25%
      - Tandem combo over HTK mfcc baseline: +53%]

     • Model diversity?
       - try a phone-based GMM model
       - try training the NN model to HTK state labels
     • Discriminative network training?
       - (try posteriors from GMM & Bayes)
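The Tandem feature path sketched in the diagram above can be illustrated in a few lines. This is a minimal sketch, not the original system: the classifier is stood in for by random pre-nonlinearity outputs, and `pca_basis` / `tandem_features` are hypothetical helper names.

```python
import numpy as np

# Minimal sketch of the Tandem feature path (an illustration, not the
# original system): the phone classifier's pre-nonlinearity (pre-softmax)
# outputs are decorrelated with a PCA/KLT basis, and the projected frames
# then serve as input features for a conventional GMM-HTK back end.

def pca_basis(X, n_components):
    """Fit an orthogonalising (PCA/KLT) basis on the rows (frames) of X."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]           # re-sort descending
    return mean, evecs[:, order[:n_components]]

def tandem_features(pre_nonlinearity, mean, basis):
    """Project pre-softmax net outputs onto the PCA/KLT basis."""
    return (pre_nonlinearity - mean) @ basis

# Toy stand-in for the net: 500 frames of 24 pre-softmax phone scores
rng = np.random.default_rng(0)
logits = rng.normal(size=(500, 24))
mean, basis = pca_basis(logits, n_components=24)
feats = tandem_features(logits, mean, basis)  # decorrelated feature frames
```

Projecting onto the eigenvectors of the frame covariance makes the output dimensions uncorrelated, which suits the diagonal-covariance Gaussian mixtures used in the HTK back end.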

  3. Phone vs. word models

     [Block diagram: PLP -> neural net classifier (trained on phoneme targets) -> KLT orthogonalization -> Gauss-mix models (trained to subword states) -> HTK decoder -> words]

     • Try a phone-based HTK model (instead of whole-word models)
     • Try training the NN model to subword-state labels
       - 181 net outputs; reduced to 40 in the KLT
     • Results (Aurora2k, WER as a ratio of the HTK baseline):

       System                 test A: matched   test B: var noise   test C: var chan
       Tandem PLP baseline        63.5%              70.3%              59.5%
       Phone-based HTK sys        63.6%              72.5%              61.5%
       Subword-based NN sys       63.1%              62.8%              55.1%

     • Diversity doesn't help - subword units may be good for the NN

  4. Enhancements to Tandem-Aurora

     • More tandem-feature-domain processing:
       [Block diagram: neural net classifier -> norm / deltas? -> KLT orthogonalization -> norm / deltas? -> Gauss-mix models]
     • Results (WER as a ratio of the HTK baseline):

       System                       test A: matched   test B: var noise   test C: var chan
       PLP: Tandem baseline             63.5%              70.3%              59.5%
       PLP: norm - KLT                  72.6%              71.2%              63.6%
       PLP: KLT - norm                  57.8%              58.8%              51.3%
       PLP: KLT - delta                 59.0%              60.2%              52.9%
       PLP: KLT - delta - norm          58.1%              59.9%              48.9%
       PLP: delta - KLT - norm          54.7%              53.6%              46.9%

     - delta-KLT-norm reaches 80% of the Tandem baseline's WER
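The winning delta-KLT-norm ordering can be sketched as a small pipeline: append deltas, orthogonalise, then normalise. The delta window, dimensions, and helper names here are illustrative assumptions; the slide does not give the exact recipe.

```python
import numpy as np

# Sketch of the best-scoring ordering above: append deltas, orthogonalise
# with KLT, then apply per-utterance mean/variance normalisation.
# Window size and dimensions are illustrative assumptions.

def deltas(X, window=2):
    """HTK-style regression deltas over +/- `window` frames."""
    T = len(X)
    padded = np.pad(X, ((window, window), (0, 0)), mode='edge')
    num = np.zeros_like(X)
    den = 2 * sum(t * t for t in range(1, window + 1))
    for t in range(1, window + 1):
        num += t * (padded[window + t:window + t + T]
                    - padded[window - t:window - t + T])
    return num / den

def klt(X, n_components):
    """Project centred frames onto the top PCA/KLT directions."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    return Xc @ evecs[:, np.argsort(evals)[::-1][:n_components]]

def mvn(X):
    """Per-utterance mean/variance normalisation."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# delta -> KLT -> norm on toy 13-dim 'PLP' frames
rng = np.random.default_rng(0)
plp = rng.normal(size=(300, 13))
feats = mvn(klt(np.hstack([plp, deltas(plp)]), n_components=26))
```

Note the ordering matters, as the table shows: normalising before the KLT (norm-KLT) was worse than the baseline, while delta-KLT-norm was the best variant tried.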

  5. Best-effort Tandem system

     • Deltas & norms help PLP: try them on the combo (PLP+MSG) system:

       System                   test A: matched   test B: var noise   test C: var chan
       PLP+MSG: baseline            51.1%              52.0%              45.6%
       PLP+MSG: dlt-KLT-nrm         50.9%              50.5%              43.6%
       PLP+MSG: KLT-nrm             48.3%              49.5%              39.4%

     - deltas hurt for MSG: features too sluggish?

     • Deltas help clean, norms help noisy:
       [Plots: WER (%) vs. SNR (-5 to 20 dB) and on clean speech, comparing baseline, KLT-delta (K-D), and KLT-norm (K-N)]

  6. Tandem for LVCSR: the SPINE task (with Rita Singh/CMU & Sunil Sivadas/OGI)

     • Noisy spontaneous speech, ~5000 word vocab
     • Recognition:
       [Block diagram: PLP and MSG feature calculation -> neural net classifiers 1 & 2 -> combined pre-nonlinearity outputs -> PCA decorrelation -> Tandem features -> SPHINX recognizer: GMM-HMM classifier/decoder over subword likelihoods, with MLLR adaptation -> words]
     - same tandem features
     - NN training bootstrapped from Broadcast News, then iterated
     - the GMM-HMM has context dependence and MLLR
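The diagram shows the two classifier streams merging before PCA. The deck does not spell out the combination rule; frame-level averaging of the two nets' log posteriors (a geometric mean of posteriors, renormalised) is one common choice in such systems, and is what this hypothetical sketch assumes.

```python
import numpy as np

# Hypothetical sketch of combining the PLP-net and MSG-net streams.
# The slide does not give the exact rule; averaging per-frame log
# posteriors (geometric mean, renormalised) is assumed here.

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def combine_streams(logits_plp, logits_msg):
    """Average the per-frame log posteriors of two nets, renormalise."""
    logp = np.log(softmax(logits_plp)) + np.log(softmax(logits_msg))
    return softmax(0.5 * logp)

# Toy example: two streams of 100 frames x 40 subword classes
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 100, 40))
post = combine_streams(a, b)  # combined posteriors, rows sum to 1
```

A useful sanity property of this rule: combining a stream with itself returns its own posteriors unchanged.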

  7. SPINE-Tandem results

     • Evaluation WER results:

       Features (dimensions)     CI system   CD system   CD + MLLR
       MFCC + d + dd (39)          69.5%       35.1%       33.5%
       Tandem features (56)        47.6%       35.7%       32.8%

     - much better for CI systems
     - differences evaporate with CD, MLLR

     • Not quite fair:
       - CD senones optimized for MFCC - worth 2-3% absolute?
     • Not unexpected:
       - NN confounds CD variants
       - Tandem 'space' very nonlinear - bad for MLLR
     • Any hope?
       - more training data / train CD classes / ...

