SLIDE 9 Transducer, Model
[Figure: unrolled model with components $\text{Encoder}_t$, $\text{SlowRNN}_n$, $\text{FastRNN}_u$, $\text{Output}_u$]
- Unrolled generalized model over alignment frames u.
- Input $x_1^{T'}$ to the encoder, output $\alpha_u$ per alignment frame $u$. Bold outputs are non-blank labels.
- Dependencies between the components are optional.
- $\text{Encoder}_1^T(x_1^{T'})$ – BLSTM – potentially downsampled
- $\text{SlowRNN}_n$ – LSTM or FFNN – per (non-blank) label $n$ – like a language model (LM)
- $\text{FastRNN}_u$ – LSTM+FFNN or FFNN – per alignment frame $u$ (per time $t$ if time-synchronous)
- $\text{Output}_u$ – blank or new (non-blank) label
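To make the indexing concrete, here is a minimal Python sketch of how an alignment $\alpha_1^U$ relates to the label sequence and to the indices $t$ and $n$ seen at each alignment frame $u$. It assumes string labels and a hypothetical blank symbol `"<b>"`. The non-time-synchronous case (RNN-T style, $U = T + N$, where only a blank advances $t$) is included for contrast; that topology is an assumption here, not something this slide specifies.

```python
from typing import List, Sequence, Tuple

BLANK = "<b>"  # hypothetical blank symbol


def collapse(alignment: Sequence[str]) -> List[str]:
    """Label sequence a_1^N: drop all blanks from the alignment alpha_1^U."""
    return [a for a in alignment if a != BLANK]


def frame_indices(alignment: Sequence[str], time_sync: bool) -> List[Tuple[int, int]]:
    """Return (t, n) for each alignment frame u.

    n: number of non-blank labels emitted before frame u (the SlowRNN position).
    t: equal to u in the time-synchronous case (U = T); otherwise
       (RNN-T style, U = T + N) the number of blanks before u,
       because only a blank moves on to the next encoder frame.
    """
    out: List[Tuple[int, int]] = []
    t, n = 0, 0
    for u, a in enumerate(alignment):
        out.append((u if time_sync else t, n))
        if a == BLANK:
            t += 1
        else:
            n += 1
    return out
```

For the alignment `["<b>", "h", "<b>", "i", "<b>"]`, `collapse` yields `["h", "i"]`. In the time-synchronous case every frame consumes one time step, whereas in the RNN-T-style case the label frames stay at the current $t$.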
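The split between per-frame and per-label components can be sketched as a greedy, time-synchronous unrolling ($U = T$). The callables `slow_step`, `fast_step` and `output` are hypothetical scalar stand-ins for SlowRNN, FastRNN and the output layer. A real model would use the LSTM/FFNN layers listed above and take the argmax over the output distribution. The sketch only shows the control flow: FastRNN runs at every frame $u$, while SlowRNN is updated only when a non-blank label is emitted.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # hypothetical index of the blank label


def greedy_unroll(
    enc: Sequence[float],
    slow_step: Callable[[float, int], float],
    fast_step: Callable[[float, float, float], float],
    output: Callable[[float], int],
) -> Tuple[List[int], List[int]]:
    """Greedy time-synchronous unrolling (U = T, so frame u is time t).

    Returns the alignment alpha_1^U and the collapsed label sequence.
    """
    slow_state, fast_state = 0.0, 0.0  # hypothetical initial states
    alignment: List[int] = []
    labels: List[int] = []
    for enc_t in enc:  # one alignment frame u per encoder frame t
        # FastRNN: updated at every alignment frame
        fast_state = fast_step(fast_state, enc_t, slow_state)
        # Output: blank or a new non-blank label (greedy choice)
        a_u = output(fast_state)
        alignment.append(a_u)
        if a_u != BLANK:
            labels.append(a_u)
            # SlowRNN: updated only per non-blank label n, like an LM
            slow_state = slow_step(slow_state, a_u)
    return alignment, labels
```

Because the dependencies are optional, variants can simply ignore an argument. For example, a `fast_step` that does not use its first argument drops the FastRNN recurrence, and one that ignores `slow_state` removes the label context.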
5 of 14 Zeyer & Merboldt & Schlüter & Ney: A new training pipeline for an improved neural transducer. Lehrstuhl Informatik 6: Human Language Technology and Pattern Recognition, RWTH Aachen University