
SLIDE 1


Introduction Feature Extraction Experiments Discussion

Noise Robust LVCSR Feature Extraction Based on the Stabilized Weighted Linear Prediction

HUT-TUT Fall DSP Seminar 2008 Heikki Kallasjoki

Adaptive Informatics Research Centre Helsinki University of Technology

21.11.2008

Heikki Kallasjoki Noise Robust LVCSR Feature Extraction Based on the SWLP

SLIDE 2


Outline

◮ Introduction
  ◮ Introduction
◮ Feature Extraction
  ◮ MFCC
  ◮ SWLP
◮ Experiments
  ◮ Speech Recognition System
  ◮ Data
  ◮ Results
◮ Discussion
  ◮ Conclusions and Future Work
  ◮ Questions?

SLIDE 3

Introduction

◮ The “standard” mel-frequency cepstral coefficient (MFCC) based feature extraction is relatively sensitive to noise
◮ Noise robustness can be improved by replacing the “raw” FFT spectrum with a suitable spectral envelope estimate which (optimally) models only the interesting things
◮ SWLP is one method of generating such an envelope estimate, based on temporally weighted linear prediction

SLIDE 4

Mel-frequency Cepstral Coefficients (MFCCs)

◮ The de facto standard for speech recognition features
◮ Easily computed:
  ◮ Window the input signal into overlapping frames
  ◮ Estimate the log of the amplitude spectrum
  ◮ Warp to mel scale using logarithmically spaced triangular filters
  ◮ Take a DCT of the result to get cepstral coefficients
◮ Replacing the direct FFT-based amplitude spectrum with a spectral envelope estimate leads to MFCC variants that are more noise-robust
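The steps listed on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the system used in the talk: the function names, the mel formula, the default of 13 coefficients, and the `spectrum_fn` hook are all assumptions made for the example. The sketch also takes the logarithm after the mel filterbank, as is common in practice, which differs slightly from the ordering of the bullets.

```python
import numpy as np

def hz_to_mel(f):
    # A widely used mel-scale formula (an assumption; the slides do not give one).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_filters, freqs.size))
    for k in range(n_filters):
        lo, mid, hi = edges_hz[k], edges_hz[k + 1], edges_hz[k + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct2(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return x @ basis.T

def mfcc(frames, fs, n_filters=23, n_ceps=13, spectrum_fn=None):
    """frames: (n_frames, frame_len) array of unwindowed frames.

    spectrum_fn is a hypothetical hook: pass a function returning an
    amplitude envelope per frame to replace the plain FFT spectrum.
    """
    n_fft = frames.shape[1]
    if spectrum_fn is None:
        spectrum = np.abs(np.fft.rfft(frames * np.hamming(n_fft), axis=1))
    else:
        spectrum = spectrum_fn(frames)
    fb = mel_filterbank(n_filters, n_fft, fs)
    log_mel = np.log(spectrum @ fb.T + 1e-10)  # small offset avoids log(0)
    return dct2(log_mel, n_ceps)
```

Calling `mfcc(frames, fs=16000)` on an array of 256-sample frames returns one row of cepstral coefficients per frame. The `spectrum_fn` argument marks the single place where an envelope estimate such as LP or SWLP would be substituted.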

SLIDE 5

Conventional Linear Prediction (LP)

◮ Linear prediction gives an all-pole model for predicting a signal
◮ Conventional LP:

$$\hat{x}_n = -\sum_{i=1}^{p} a_i x_{n-i}$$

◮ The $a_i$ coefficients are found by minimizing a cost function:

$$E(a) = \sum_{n=1}^{N+p} \varepsilon_n^2(a), \quad \text{where } \varepsilon_n(a) = x_n - \hat{x}_n$$
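As a rough illustration of this minimization, the sketch below solves the normal equations of the autocorrelation method for one frame and evaluates the resulting all-pole envelope. The function names are made up for the example, and the model gain is omitted, so the envelope is correct only up to a constant offset in dB.

```python
import numpy as np

def lp_coefficients(x, p):
    """Minimize sum_n (x_n + sum_i a_i x_{n-i})^2 over a zero-extended frame."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Autocorrelation at lags 0..p.
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(p + 1)])
    # Normal equations R a = -r[1:], where R is Toeplitz in r[0..p-1].
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, -r[1:])

def lp_envelope_db(a, n_fft=256):
    """Envelope 1/|A(e^{jw})| in dB, with A(z) = 1 + sum_i a_i z^{-i} (gain omitted)."""
    A = np.fft.rfft(np.concatenate(([1.0], a)), n_fft)
    return -20.0 * np.log10(np.abs(A))
```

With the sign convention of the slide, a signal generated as x_n = 1.3 x_{n-1} - 0.6 x_{n-2} + noise should give coefficients close to a = (-1.3, 0.6).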

SLIDE 6

Stabilized Weighted Linear Prediction (SWLP)

◮ In Weighted Linear Prediction (WLP), a temporal weight term is added to the LP cost function
◮ The weight function makes it possible to give a higher importance to particular, hopefully less noisy regions of the signal
◮ SWLP is a formulation of WLP which guarantees that the generated all-pole model is stable
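One way to write the weighted cost described above, reusing the symbols of the conventional LP slide, is sketched below. The symbol $w_n$ for the temporal weight is an assumption, since the slide does not name it. Setting $w_n = 1$ for all $n$ recovers the conventional LP cost.

```latex
E_w(a) = \sum_{n=1}^{N+p} w_n \, \varepsilon_n^2(a),
\qquad
\varepsilon_n(a) = x_n + \sum_{i=1}^{p} a_i x_{n-i}

% Setting the derivative with respect to each a_k to zero gives the normal equations
\sum_{i=1}^{p} a_i \sum_{n} w_n \, x_{n-i} \, x_{n-k}
  = -\sum_{n} w_n \, x_n \, x_{n-k},
\qquad k = 1, \dots, p
```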

SLIDE 7

SWLP Weight Function Selection

◮ A simple choice for the SWLP weight function is the short-time energy (STE) function
◮ The weight function given by STE causes the SWLP model to emphasize strong speech regions, where the SNR is generally more favorable
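The sketch below shows how an STE weight could be computed and used in a weighted least-squares fit. Two assumptions are worth stating plainly. First, the STE is taken here as the energy of the M samples preceding each time instant, a definition the slide does not spell out. Second, the solver is plain WLP, not the stabilized SWLP formulation, so it does not guarantee a stable model. All function names and the `floor` parameter are hypothetical.

```python
import numpy as np

def ste_weights(x, p, M, floor=1e-8):
    """Assumed STE definition: w_n = sum of x^2 over the M samples before n.

    Returns N + p weights, one per sample of the zero-extended frame.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    xz = np.concatenate((np.zeros(M), x, np.zeros(p)))
    csum = np.concatenate(([0.0], np.cumsum(xz ** 2)))
    n = np.arange(N + p)
    # Sample n sits at xz[n + M]; its preceding window is xz[n : n + M].
    return csum[n + M] - csum[n] + floor

def wlp_coefficients(x, p, w):
    """Plain WLP (not stabilized): minimize sum_n w_n (x_n + sum_i a_i x_{n-i})^2."""
    x = np.asarray(x, dtype=float)
    N = x.size
    xz = np.concatenate((np.zeros(p), x, np.zeros(p)))
    target = xz[p:]  # x_n for the N + p extended samples
    # Column i-1 holds the delayed signal x_{n-i}.
    past = np.column_stack([xz[p - i : p - i + N + p] for i in range(1, p + 1)])
    sw = np.sqrt(w)
    a, *_ = np.linalg.lstsq(past * sw[:, None], -target * sw, rcond=None)
    return a
```

With all weights set to one, `wlp_coefficients` reduces to conventional LP. With `ste_weights(x, p, M)`, samples that follow high-energy stretches of the frame contribute more to the fit.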

SLIDE 8

Effect of the STE Window Width

◮ As the STE window width is adjusted, the spectral behavior approaches that of conventional LP

[Figure: amplitude (dB) against frequency (Hz, up to 8000 Hz), comparing the FFT spectrum, SWLP with M=8, SWLP with M=128, and LP]

SLIDE 9

Speech Recognition System

◮ Acoustic model:
  ◮ Cross-word triphones
  ◮ State-clustered hidden Markov models
  ◮ Gaussian mixture models in speech feature space
  ◮ Gamma distribution for duration modeling
◮ Language model: n-grams of “statistical morphs”

SLIDE 10

Feature Extraction

◮ Pre-processing:
  ◮ Pre-emphasis filter $1 - 0.97z^{-1}$ and Hamming-windowed frames
  ◮ 125 frames per second, 256 samples per frame
◮ MFCC:
  ◮ FFT-based log-magnitude spectrum
  ◮ Filterbank of 23 logarithmically spaced triangular filters
  ◮ DCT of filterbank output to get cepstral coefficients
◮ SWLP-MFCC: FFT spectrum replaced with SWLP estimate
◮ Post-processing:
  ◮ 39-dimensional features: frame energy, 12 cepstral coefficients, first and second derivatives
  ◮ Cepstral mean subtraction
  ◮ Normalization, MLLT
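The pre- and post-processing steps on this slide might look roughly like the sketch below. The 16 kHz sampling rate is an assumption (the slide does not state it); under that assumption, 125 frames per second corresponds to a 128-sample frame shift. The simple central difference used for the derivatives is a stand-in, and the normalization and MLLT steps are not covered.

```python
import numpy as np

FS = 16000        # assumed sampling rate, not stated on the slide
FRAME_LEN = 256   # samples per frame (from the slide)
HOP = FS // 125   # 125 frames per second gives a 128-sample shift at 16 kHz

def preemphasize(x, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1], i.e. the filter 1 - 0.97 z^-1."""
    x = np.asarray(x, dtype=float)
    return np.concatenate((x[:1], x[1:] - coeff * x[:-1]))

def frame_signal(x, frame_len=FRAME_LEN, hop=HOP):
    """Overlapping Hamming-windowed frames; needs len(x) >= frame_len."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def deltas(feats):
    """Central difference over time, a simple stand-in for regression deltas."""
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

def postprocess(static):
    """static: (n_frames, 13) = frame energy + 12 cepstra; returns (n_frames, 39)."""
    static = static - static.mean(axis=0, keepdims=True)  # cepstral mean subtraction
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.hstack((static, d1, d2))
```

A one-second signal at the assumed rate yields 124 full frames, and stacking the 13 static features with their first and second differences gives the 39 dimensions mentioned on the slide.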

SLIDE 11


Data

◮ SPEECON Finnish language corpus
◮ Training sets:
  ◮ Clean set: 21 hours of clean speech from 293 speakers
  ◮ Multicondition set: similar length, even split of clean and noisy speech
◮ Test sets:
  ◮ Car environment: 60 minutes
  ◮ Public places environment: 90 minutes
◮ Both test sets include every recording from three separate microphones positioned at different distances

SLIDE 12

Results

Results.

SLIDE 13

Car environment, clean training set

Letter error rate (%) by recording channel:

Features     Channel 0   Channel 1   Channel 2
MFCC               4.0        29.6        67.5
LP-MFCC            3.9        27.1        54.2
SWLP-MFCC          3.9        27.0        53.8

SLIDE 14

Car environment, noisy training set

Letter error rate (%) by recording channel:

Features     Channel 0   Channel 1   Channel 2
MFCC               3.8         6.8        18.1
LP-MFCC            4.0         7.3        17.9
SWLP-MFCC          4.0         8.0        18.4

SLIDE 15

Public place environment, clean training set

Letter error rate (%) by recording channel:

Features     Channel 0   Channel 1   Channel 2
MFCC               3.6        24.3        40.2
LP-MFCC            3.4        21.2        34.9
SWLP-MFCC          3.3        21.7        34.4

SLIDE 16

Public place environment, noisy training set

Letter error rate (%) by recording channel:

Features     Channel 0   Channel 1   Channel 2
MFCC               3.4         6.3        11.9
LP-MFCC            3.6         7.1        12.3
SWLP-MFCC          3.7         7.1        12.5
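To compare the four conditions, the relative change in letter error rate against the MFCC baseline can be computed as in the sketch below. The numbers are transcribed from the four result slides, reading the three values listed for each feature type as the three recording channels in order. That reading is an assumption about the original bar charts, and the dictionary layout and function name are made up for the example.

```python
# Letter error rates (%) for the three recording channels, per condition.
LER = {
    ("car", "clean"): {
        "MFCC": [4.0, 29.6, 67.5],
        "LP-MFCC": [3.9, 27.1, 54.2],
        "SWLP-MFCC": [3.9, 27.0, 53.8],
    },
    ("car", "noisy"): {
        "MFCC": [3.8, 6.8, 18.1],
        "LP-MFCC": [4.0, 7.3, 17.9],
        "SWLP-MFCC": [4.0, 8.0, 18.4],
    },
    ("public place", "clean"): {
        "MFCC": [3.6, 24.3, 40.2],
        "LP-MFCC": [3.4, 21.2, 34.9],
        "SWLP-MFCC": [3.3, 21.7, 34.4],
    },
    ("public place", "noisy"): {
        "MFCC": [3.4, 6.3, 11.9],
        "LP-MFCC": [3.6, 7.1, 12.3],
        "SWLP-MFCC": [3.7, 7.1, 12.5],
    },
}

def relative_reduction(baseline, candidate):
    """Relative error reduction in percent; negative means the candidate is worse."""
    return [100.0 * (b - c) / b for b, c in zip(baseline, candidate)]

for (env, train), rates in LER.items():
    red = relative_reduction(rates["MFCC"], rates["SWLP-MFCC"])
    print(f"{env}, {train} training:", [round(r, 1) for r in red])
```

Under this reading, SWLP-MFCC lowers the error rate on the last channel by roughly a fifth with clean training in the car environment, while with noisy training it is slightly worse than the MFCC baseline on every channel.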

SLIDE 17

Conclusions and Future (and Current) Work

◮ Spectral envelope estimation helps when the noise is something unexpected
◮ How to use the SWLP weighting better?
◮ Adaptive control of the SWLP STE window width
◮ Improvements possible if we can select the “correct” M
◮ Using log-probabilities given by the decoder shows some promise
◮ Alternative idea: on-line noise estimation
◮ Maybe even replacing STE weighting with something else entirely
◮ SWLP against other methods, esp. MVDR based

SLIDE 18

Questions?
