
Tandem investigations - Dan Ellis 2001-01-25 - 1

Tandem modeling investigations

Dan Ellis International Computer Science Institute, Berkeley CA <dpwe@icsi.berkeley.edu>

Outline
1 What makes Tandem successful?
2 Can we make Tandem better?
3 Does Tandem work with LVCSR tricks?


What makes Tandem work?

(with Manuel Reyes)

  • Model diversity?
    - try a phone-based GMM model
    - try training the NN model to HTK state labels
  • Discriminative network training?
    - (try posteriors from GMM & Bayes)


[Figure: Tandem system diagram — input sound → PLP and MSG feature streams → one neural net classifier per stream → streams combined (+) → PCA orthogonalization → Gauss mix models → HTK decoder → words ("s ah t")]

Relative WER improvements:
  • Tandem combo over HTK mfcc baseline: +53%
  • Combo-into-HTK over Combo-into-noway: +15%
  • Combo over msg: +20%
  • NN over HTK: +15%
  • Combo over mfcc: +25%
  • Tandem over hybrid: +25%
  • Tandem over HTK: +35%
  • Combo over plp: +20%
  • KLT over direct: +8%
  • Pre-nonlinearity over posteriors: +12%
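The stream combination in the diagram — one net per feature stream, merged at the "+" and then orthogonalized — can be sketched as follows. This is a minimal illustration, not the original code: it assumes the two nets' pre-nonlinearity (pre-softmax) outputs are available as arrays, it merges them by simple summation (the exact merge rule is an assumption), and all function names are hypothetical.

```python
import numpy as np

def fit_pca(train_feats):
    """Fit the PCA/KLT rotation used to orthogonalize combined net outputs."""
    centered = train_feats - train_feats.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    return eigvecs[:, np.argsort(eigvals)[::-1]]  # columns in descending variance

def combine_streams(plp_pre, msg_pre, pca_basis):
    """Merge the PLP-net and MSG-net streams and decorrelate.

    plp_pre, msg_pre: (frames, phones) pre-nonlinearity outputs --
    using these instead of softmax posteriors was worth ~12% relative.
    """
    combined = plp_pre + msg_pre      # the '+' box in the diagram (assumed: sum)
    return combined @ pca_basis       # diagonal-covariance GMMs fit these better
```

The PCA step matters because the GMM-HMM back end typically uses diagonal covariances, so feeding it decorrelated dimensions is what makes the net outputs usable as "features".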


Phone vs. word models

  • Try a phone-based HTK model (instead of whole-word models)
  • Try training NN model to subword-state labels
    - 181 net outputs; reduce to 40 in KLT
  • Results (Aurora2k, HTK-baseline WER ratio):
    - Diversity doesn’t help
    - subword units may be good for NN

System               | test A: matched | test B: var noise | test C: var chan
Tandem PLP baseline  | 63.5%           | 70.3%             | 59.5%
Phone-based HTK sys  | 63.6%           | 72.5%             | 61.5%
Subword-based NN sys | 63.1%           | 62.8%             | 55.1%

[Figure: Tandem block diagram — input sound → PLP → neural net classifier (trained on phoneme targets vs. trained to subword states) → KLT orthogonalization → Gauss mix models → HTK decoder → words]
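A quick way to sanity-check the 181-to-40 reduction is to measure how much variance the 40 leading KLT components retain. A hedged sketch, not from the slides — the helper name and diagnostic are illustrative:

```python
import numpy as np

def klt_truncation(train_outputs, n_keep=40):
    """Fit a KLT on NN training outputs and report variance kept by truncation.

    train_outputs: (frames, 181) subword-state net outputs; keeping the
    n_keep=40 leading components gives the GMM-HMM back end the same
    dimensionality as the phoneme-target system.
    Returns the (181, n_keep) projection and the retained-variance fraction.
    """
    centered = train_outputs - train_outputs.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]                 # descending variance
    kept = eigvals[order][:n_keep].sum() / eigvals.sum()
    return eigvecs[:, order[:n_keep]], kept
```

Because 181 correlated state outputs tend to be far from full rank, a 40-dimensional truncation can keep most of the variance — which is consistent with the subword-state system matching or beating the 40-output phoneme system above.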


Enhancements to Tandem-Aurora

  • More tandem-feature-domain processing
  • Results (HTK baseline WER ratio):
    - best order, delta-KLT-norm, reaches 80% of the Tandem-baseline WER

System                  | test A: matched | test B: var noise | test C: var chan
PLP: Tandem baseline    | 63.5%           | 70.3%             | 59.5%
PLP: norm - KLT         | 72.6%           | 71.2%             | 63.6%
PLP: KLT - norm         | 57.8%           | 58.8%             | 51.3%
PLP: KLT - delta        | 59.0%           | 60.2%             | 52.9%
PLP: KLT - delta - norm | 58.1%           | 59.9%             | 48.9%
PLP: delta - KLT - norm | 54.7%           | 53.6%             | 46.9%


[Figure: processing chain — neural net classifier → (norm / deltas?) → KLT orthogonalization → (norm / deltas?) → Gauss mix models]
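The winning order — deltas first, then KLT, then per-utterance normalization — can be sketched as below. A minimal illustration assuming a KLT rotation fit elsewhere on training data; the delta window, epsilon, and function names are illustrative choices, not from the original system:

```python
import numpy as np

def deltas(feats, w=2):
    """Standard regression deltas over a +/- w frame window."""
    T, _ = feats.shape
    pad = np.pad(feats, ((w, w), (0, 0)), mode='edge')   # repeat edge frames
    num = sum(k * (pad[w + k:w + k + T] - pad[w - k:w - k + T])
              for k in range(1, w + 1))
    return num / (2 * sum(k * k for k in range(1, w + 1)))

def delta_klt_norm(feats, klt_basis):
    """delta -> KLT -> per-utterance norm: the best order found on Aurora."""
    x = np.hstack([feats, deltas(feats)])   # append deltas first
    x = x @ klt_basis                       # then decorrelate (basis: 2D x n)
    # per-utterance mean/variance normalization last: helps in noise
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
```

Note the ordering in the table: normalizing before the KLT (norm - KLT) is the one variant that hurts, so the normalization here is deliberately applied to the rotated features.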

Best effort Tandem system

  • Deltas & norms help PLP: try on combo (PLP+MSG) system
  • Deltas hurt for MSG: features too sluggish?
  • Deltas help clean, norms help noisy:

System               | test A: matched | test B: var noise | test C: var chan
PLP+MSG: baseline    | 51.1%           | 52.0%             | 45.6%
PLP+MSG: dlt-KLT-nrm | 50.9%           | 50.5%             | 43.6%
PLP+MSG: KLT-nrm     | 48.3%           | 49.5%             | 39.4%

[Figure: WER (%) vs. SNR (dB) for baseline, KLT-delta (K-D), and KLT-norm (K-N) systems, from clean through low-SNR conditions]


Tandem for LVCSR: the SPINE task

(with Rita Singh/CMU & Sunil Sivadas/OGI)

  • Noisy spontaneous speech, ~5000 word vocab
  • Recognition:
    - same tandem features
    - NN training from Broadcast News boot + iterate
    - GMM-HMM has context-dependence, MLLR


[Figure: SPINE system — input sound → PLP and MSG feature calculation → neural net classifiers 1 & 2 → pre-nonlinearity outputs combined (+) → PCA decorrelation (Tandem feature calculation) → GMM classifier → HMM decoder with subword likelihoods and MLLR adaptation (SPHINX recognizer) → words]


SPINE-Tandem results

  • Evaluation WER results:
    - much better for CI systems
    - differences evaporate with CD, MLLR
  • Not quite fair:
    - CD senones optimized for MFCC
    - worth 2-3% absolute?
  • Not unexpected:
    - NN confounds CD variants
    - Tandem ‘space’ very nonlinear - bad for MLLR
  • Any hope?
    - more training data / train CD classes / ...

Features (dimensions) | CI system | CD system | CD + MLLR
MFCC + d + dd (39)    | 69.5%     | 35.1%     | 33.5%
Tandem features (56)  | 47.6%     | 35.7%     | 32.8%