SLIDE 10 AB
Feature Extraction
◮ Pre-processing:
◮ Pre-emphasis filter 1 − 0.97z⁻¹ and Hamming-windowed frames
◮ 125 frames per second, 256 samples per frame
◮ MFCC:
◮ FFT based log-magnitude spectrum
◮ Filterbank of 23 logarithmically spaced triangular filters
◮ DCT of filterbank output to get cepstral coefficients
◮ SWLP-MFCC: FFT spectrum replaced with SWLP estimate
◮ Post-processing:
◮ 39-dimensional features: frame energy, 12 cepstral coefficients, first and second derivatives
◮ Cepstral mean subtraction
◮ Normalization, MLLT
Heikki Kallasjoki Noise Robust LVCSR Feature Extraction Based on the SWLP
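
The baseline MFCC chain listed above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the system's actual front end: the 16 kHz sampling rate (which makes 125 frames per second equal to a 128-sample hop), the mel scale as the "logarithmic" filter spacing, taking the logarithm after the filterbank, computing the energy on the windowed frame, and the simple central-difference derivatives are all choices made here for the example. The SWLP spectrum estimate, the normalization step and MLLT are not shown.

```python
import numpy as np

FS = 16000         # ASSUMPTION: sampling rate is not stated on the slide
FRAME_LEN = 256    # samples per frame (from the slide)
HOP = FS // 125    # 125 frames per second -> 128 samples at the assumed 16 kHz
N_FILT = 23        # triangular filters (from the slide)
N_CEPS = 12        # cepstral coefficients (from the slide)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filt, n_fft, fs):
    """Triangular filters with mel-spaced centres (assumed spacing)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filt + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(n_filt):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):          # rising edge
            fb[i, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge
            fb[i, k] = (hi - k) / (hi - mid)
    return fb

def dct_matrix(n_out, n_in):
    """Unnormalized DCT-II basis, one row per output coefficient."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def mfcc_features(signal):
    """Return an (n_frames, 39) feature matrix for a 1-D float signal."""
    # Pre-emphasis 1 - 0.97 z^-1: y[n] = x[n] - 0.97 x[n-1]
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    idx = np.arange(FRAME_LEN)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(FRAME_LEN)
    # FFT magnitude spectrum -> filterbank -> log -> DCT
    spec = np.abs(np.fft.rfft(frames, axis=1))
    fb = mel_filterbank(N_FILT, FRAME_LEN, FS)
    log_fb = np.log(np.maximum(spec @ fb.T, 1e-10))
    ceps = log_fb @ dct_matrix(N_CEPS + 1, N_FILT).T[:, 1:]   # c1..c12
    # Log frame energy plus 12 cepstra give 13 static features
    energy = np.log(np.maximum(np.sum(frames ** 2, axis=1), 1e-10))
    static = np.column_stack([energy, ceps])
    # First and second derivatives (central differences as a stand-in)
    d1 = np.gradient(static, axis=0)
    d2 = np.gradient(d1, axis=0)
    feats = np.hstack([static, d1, d2])                       # 39 dimensions
    # Cepstral mean subtraction over the utterance
    return feats - feats.mean(axis=0)
```

Calling `mfcc_features` on one second of audio at the assumed rate yields a matrix with 39 columns and one row per frame. For SWLP-MFCC, only the line computing `spec` would change, with the FFT magnitude replaced by the SWLP spectrum estimate.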