Robustness is dead! Long live robustness!
Michael L. Seltzer
Microsoft Research REVERB 2014 | May 10, 2014 Collaborators: Dong Yu, Yan Huang, Frank Seide, Jinyu Li, Jui-Ting Huang
Golden age of speech recognition: more investment, more …
DNN-HMM hybrid acoustic model:
- Several hidden layers
- Trained to maximize the conditional likelihood at the frame or sequence level
- Pre-training helps
- Convert posteriors to scaled likelihoods and decode as usual
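The posterior-to-likelihood conversion in the last step can be sketched as follows, a minimal numpy sketch; the function and variable names are illustrative, and in practice the senone priors come from training-set state alignments:

```python
import numpy as np

def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    # log p(x|s) + const = log p(s|x) - log p(s): dividing senone posteriors
    # by senone priors gives likelihoods scaled by the constant p(x), which
    # the HMM decoder can use as usual.
    posteriors = np.maximum(posteriors, floor)
    priors = np.maximum(priors, floor)
    return np.log(posteriors) - np.log(priors)

# Toy example: 3 frames, 4 senones (numbers are illustrative).
post = np.array([[0.70, 0.10, 0.10, 0.10],
                 [0.25, 0.25, 0.25, 0.25],
                 [0.10, 0.60, 0.20, 0.10]])
priors = np.array([0.4, 0.3, 0.2, 0.1])
loglik = scaled_log_likelihoods(post, priors)
print(loglik.shape)  # (3, 4)
```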
[Chart: speech recognition tasks arranged by amount of data vs. vocabulary size: TIMIT, WSJ, Broadcast News, Aurora 2, Meetings (ICSI, AMI), Switchboard, Voice Search, Aurora 4]
[Huang 2014]
[Huang 2014]
# Layers × # Neurons   SWBD WER (%) [300 hrs]   Aurora 4 WER (%) [10 hrs]
1 × 2k                 24.2                     –
3 × 2k                 18.4                     14.2
5 × 2k                 17.2                     13.8
7 × 2k                 17.1                     13.7
9 × 2k                 17.0                     13.9
# Layers × # Neurons   SWBD WER (%) [300 hrs]   Aurora 4 WER (%) [10 hrs]
1 × 2k                 24.2                     –
3 × 2k                 18.4                     14.2
5 × 2k                 17.2                     13.8
7 × 2k                 17.1                     13.7
9 × 2k                 17.0                     13.9
1 × 16k                22.1                     –
- Deep networks can form arbitrarily complex nonlinearities and internal representations
- Features at higher layers are more invariant and discriminative than at lower layers
- The top layer is a log-linear classifier; the layers below perform nonlinear feature extraction
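The last point can be made concrete with a toy sketch: the final softmax layer of a sigmoid network is a log-linear classifier operating on features produced by the lower layers. The dimensions and random weights below are illustrative only, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy dimensions: 40-dim input, three 64-unit hidden layers, 10 classes.
dims = [40, 64, 64, 64, 10]
Ws = [rng.normal(scale=0.1, size=(dims[i + 1], dims[i]))
      for i in range(len(dims) - 1)]

def features(x):
    # Nonlinear feature extraction: every layer below the output.
    h = x
    for W in Ws[:-1]:
        h = sigmoid(W @ h)
    return h

def classify(x):
    # Log-linear (softmax) classifier on top of the extracted features.
    return softmax(Ws[-1] @ features(x))

x = rng.normal(size=40)
p = classify(x)
print(p.shape, float(p.sum()))  # a valid distribution over 10 classes
```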
[Diagram: a perturbation ε_m added to the activations h_m at layer m propagates to an error at layer m+1]
∂g/∂h ≈ (g(h + ε) − g(h)) / ε
h_{m+1} = σ(W_m h_m) = g(h_m)

ε_{m+1} = σ(W_m (h_m + ε_m)) − σ(W_m h_m)
        ≈ σ′(W_m h_m) W_m ε_m
        = diag(h_{m+1} ∘ (1 − h_{m+1})) W_m ε_m

‖ε_{m+1}‖ < ‖diag(h_{m+1} ∘ (1 − h_{m+1})) W_m‖ ‖ε_m‖
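The linearization above can be checked numerically. A small sketch with one random sigmoid layer (weights and sizes are toy values, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sigmoid layer h_{m+1} = sigmoid(W h): propagate a small input
# perturbation eps and compare the true output error with the first-order
# approximation diag(h_{m+1} * (1 - h_{m+1})) W eps.
W = rng.normal(scale=0.5, size=(50, 50))
h = rng.uniform(size=50)
eps = 1e-5 * rng.normal(size=50)

h_next = sigmoid(W @ h)
eps_true = sigmoid(W @ (h + eps)) - h_next
eps_lin = (h_next * (1 - h_next)) * (W @ eps)  # diag(h'(1-h')) W eps

rel_err = np.linalg.norm(eps_true - eps_lin) / np.linalg.norm(eps_true)
print("linearization relative error:", rel_err)

# Norm bound from the slide: ||eps_{m+1}|| < ||diag(h'(1-h')) W|| ||eps||
J = (h_next * (1 - h_next))[:, None] * W
bound = np.linalg.norm(J, 2) * np.linalg.norm(eps)
print("bound holds:", np.linalg.norm(eps_true) <= bound)
```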
‖ε_{m+1}‖ < ‖diag(h_{m+1} ∘ (1 − h_{m+1})) W_m‖ ‖ε_m‖

[Plot: fraction of hidden units per layer (1–6) with h > 0.99 vs. h < 0.01; y-axis 0–80%]
‖ε_{m+1}‖ < ‖diag(h_{m+1} ∘ (1 − h_{m+1})) W_m‖ ‖ε_m‖

Most hidden units saturate, so the terms h_{m+1} ∘ (1 − h_{m+1}) are very small.

[Plot: fraction of hidden units per layer (1–6) with h > 0.99 vs. h < 0.01; y-axis 0–80%]
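The saturation measurement behind the plot can be sketched on a toy random network (a trained model would be used in practice; `saturation_fraction` is an assumed helper name):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def saturation_fraction(h):
    # Fraction of hidden units that are saturated (h > 0.99 or h < 0.01).
    return float(np.mean((h > 0.99) | (h < 0.01)))

# Push a batch of random "frames" through a toy 6-layer sigmoid network and
# record the saturation fraction at each layer. Per the slide, in a trained
# network this fraction grows with depth, shrinking h*(1-h) and damping
# propagated errors.
h = rng.normal(size=(100, 64))
fracs = []
for layer in range(6):
    W = rng.normal(size=(64, h.shape[1]))
    h = sigmoid(h @ W.T)
    fracs.append(saturation_fraction(h))
    print(layer + 1, round(fracs[-1], 3))
```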
Errors shrink from one layer to the next: each layer improves invariance.

[Plot: ‖diag(h_{m+1} ∘ (1 − h_{m+1})) W_m‖ on the SWB dev set, average and maximum, for layers 1–6; y-axis 0.2–1.4]
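The per-layer measurement can be sketched as follows: for each frame, compute the matrix 2-norm of diag(h_{m+1} ∘ (1 − h_{m+1})) W_m, then take the average and maximum over frames. Random weights stand in for the trained SWB model here, so the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Per-layer error "gain": a gain below 1 means perturbations shrink as
# they pass through that layer.
h = sigmoid(rng.normal(size=(20, 32)))  # 20 toy frames of activations
stats = []
for m in range(1, 7):
    W = rng.normal(scale=0.3, size=(32, 32))
    h_next = sigmoid(h @ W.T)
    # 2-norm of diag(h'(1-h')) W for each frame in the batch.
    gains = [np.linalg.norm((row * (1 - row))[:, None] * W, 2)
             for row in h_next]
    stats.append((float(np.mean(gains)), float(np.max(gains))))
    print(m, round(stats[-1][0], 3), round(stats[-1][1], 3))
    h = h_next
```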
[van der Maaten 2008]
[t-SNE visualization of hidden activations; labeled cluster: Silence]
Preprocessing Technique    Task          DNN Relative Improvement
VTLN (speaker)             SWBD          <1% [Seide 2011]
C-MMSE (noise)             Aurora 4/VS   <0% [Seltzer 2013]
IBM/IRM Masking (noise)    Aurora 4      <0% [Sim 2014]
“The more training data used, the greater the chance that a new sample can be trivially related to samples in the training data, thereby lessening the need for any complex reasoning that may be beneficial in the cases of sparse training data.” [Brill 2002]
Preprocessing
Knowledge + informative auxiliary side information
Modify decoding to penalize frequent switching between the two hypotheses (hyp 1, hyp 2)
[Weng 2014]
Noise-aware training: inspired by noise-adaptive training of GMM acoustic models, augment the network's input with an estimate of the noise so it can learn to compensate for this variability [Seltzer 2013]
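The input augmentation can be sketched as follows, assuming (as one common choice, not necessarily the exact recipe in [Seltzer 2013]) that the noise estimate is the mean of the first frames of the utterance:

```python
import numpy as np

def augment_with_noise_estimate(feats, n_noise_frames=10):
    # Estimate the noise as the mean of the first frames (assumed to be
    # speech-free) and append it to every frame's feature vector, so the
    # network sees both the observation and the noise context.
    noise_est = feats[:n_noise_frames].mean(axis=0)    # (dim,)
    tiled = np.tile(noise_est, (feats.shape[0], 1))    # (T, dim)
    return np.concatenate([feats, tiled], axis=1)      # (T, 2*dim)

rng = np.random.default_rng(4)
utt = rng.normal(size=(200, 40))  # 200 frames of 40-dim features (toy data)
aug = augment_with_noise_estimate(utt)
print(aug.shape)  # (200, 80)
```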
Robustness and deep networks