Filter Banks
SPEECH RECOGNITION 40833
Spectral Analysis Models: (a) pattern recognition and (b) acoustic-phonetic approaches to speech recognition
Spectral Analysis Models: LPC analysis model
The Bank-of-Filters Front End
Typical waveforms and spectra for analysis of a pure sinusoid in the filter-bank model
Typical waveforms and spectra of a voiced speech signal in the bank-of-filters analysis model
Ideal (a) and realistic (b) set of filter responses of a Q-channel filter bank covering the frequency range Fs/N to (Q+1/2)Fs/N
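$$f_i = \frac{F_s}{N}\, i, \qquad 1 \le i \le Q, \qquad Q \le N/2, \qquad b_i \ge \frac{F_s}{N}$$
where $f_i$ is the center frequency and $b_i$ the bandwidth of the $i$-th filter, and $F_s$ is the sampling rate.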
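A minimal sketch of the uniform layout, assuming center frequencies spaced at multiples of F_s/N and each filter given the smallest bandwidth F_s/N so that adjacent ideal bands just touch. The function name and the values F_s = 10 kHz and N = 32 are hypothetical and chosen only for illustration.

```python
def uniform_filter_bank(fs, n, q):
    """Return [(center_hz, bandwidth_hz), ...] for a uniform Q-channel bank.

    Assumes f_i = i * fs / n and the minimum bandwidth b_i = fs / n.
    """
    if not 1 <= q <= n / 2:
        raise ValueError("Q must satisfy 1 <= Q <= N/2")
    spacing = fs / n  # F_s / N: spacing between adjacent center frequencies
    return [(i * spacing, spacing) for i in range(1, q + 1)]


# Hypothetical values: 10 kHz sampling, N = 32, Q = 15 channels.
bank = uniform_filter_bank(fs=10000.0, n=32, q=15)
for center, bandwidth in bank[:3]:
    print(f"f = {center:7.1f} Hz, b = {bandwidth:.1f} Hz")
```

Choosing a larger bandwidth than F_s/N would make neighboring filters overlap, which is one of the design choices the realistic responses in the figure reflect.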
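$$b_1 = C, \qquad b_i = \alpha\, b_{i-1}, \quad 2 \le i \le Q, \qquad f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2}$$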
Filter 1: f_1 = 300 Hz, b_1 = 200 Hz
Filter 2: f_2 = 600 Hz, b_2 = 400 Hz
Filter 3: f_3 = 1200 Hz, b_3 = 800 Hz
Filter 4: f_4 = 2400 Hz, b_4 = 1600 Hz
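The four filters listed above can be reproduced with a short sketch, assuming a geometric bandwidth rule (each bandwidth is a constant factor times the previous one) and center frequencies placed so that adjacent bands touch. The function and parameter names (`f1`, `c`, `alpha`) are hypothetical labels for the first center frequency, the first bandwidth, and the growth factor.

```python
def nonuniform_filter_bank(f1, c, alpha, q):
    """Return [(center_hz, bandwidth_hz), ...] with b_i = alpha * b_(i-1).

    Assumes f_i = f_1 + sum(b_1 .. b_(i-1)) + (b_i - b_1) / 2,
    which makes the ideal bands contiguous.
    """
    bandwidths = [c * alpha ** i for i in range(q)]
    bank = []
    for i, b in enumerate(bandwidths):
        center = f1 + sum(bandwidths[:i]) + (b - bandwidths[0]) / 2
        bank.append((center, b))
    return bank


# Octave spacing: first filter at 300 Hz with 200 Hz bandwidth, doubling each time.
octave_bank = nonuniform_filter_bank(f1=300.0, c=200.0, alpha=2.0, q=4)
# Band edges are center -/+ half the bandwidth.
edges = [(f - b / 2, f + b / 2) for f, b in octave_bank]
```

With these values the band edges run from 200 Hz to 3200 Hz without gaps, which matches the telephone bandwidth range quoted for the octave-band example.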
Ideal specification of a 4-channel octave band-filter bank (a), a 12-channel third-octave band filter bank (b), and a 7-channel critical band scale filter bank (c) covering the telephone bandwidth range (200-3200 Hz)
The variation of bandwidth with frequency for the perceptually based critical band scale
Assume each bandpass filter impulse response to be represented by a lowpass window, w(n), modulated by a complex exponential:
$$h_i(n) = w(n)\, e^{j\omega_i n}$$
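As an illustration of the modulated-window idea, the sketch below builds one bandpass filter by multiplying a lowpass window by a complex exponential and checks where its frequency response peaks. The Hamming window, its 101-point length, the 10 kHz sampling rate, and the 1 kHz center frequency are assumptions made for the example, not values taken from this slide.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def bandpass_impulse_response(window, omega_i):
    """h_i(n) = w(n) * exp(j * omega_i * n)."""
    return [w * cmath.exp(1j * omega_i * n) for n, w in enumerate(window)]


def magnitude_response(h, omega):
    """|H(e^{j omega})| evaluated directly from the impulse response."""
    return abs(sum(hn * cmath.exp(-1j * omega * n) for n, hn in enumerate(h)))


FS = 10000.0                                # assumed sampling rate in Hz
omega_i = 2 * math.pi * 1000.0 / FS         # assumed 1 kHz center frequency
w = hamming(101)
h = bandpass_impulse_response(w, omega_i)

peak = magnitude_response(h, omega_i)                       # passband center
off_band = magnitude_response(h, 2 * math.pi * 3000.0 / FS)  # 2 kHz away
```

Modulation shifts the window's lowpass response so that it is centered on omega_i: the gain at the center frequency equals the sum of the window samples, and the response far from the center is governed by the window's sidelobes.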
The signals s(m) and w(n-m) used in evaluation of the short-time Fourier transform
Short-time Fourier transform using a long (500 points or 50 msec) Hamming window on a section of voiced speech
[Figure axes: sample value; log magnitude (dB) versus frequency]
Short-time Fourier transform using a short (50 points or 5 msec) Hamming window on a section of voiced speech
[Figure axes: sample value; log magnitude (dB) versus frequency]
Short-time Fourier transform using a long (500 points or 50 msec) Hamming window on a section of unvoiced speech
[Figure axes: sample value; log magnitude (dB) versus frequency]
Short-time Fourier transform using a short (50 points or 5 msec) Hamming window on a section of unvoiced speech
[Figure axes: sample value; log magnitude (dB) versus frequency]
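The effect of window length on frequency resolution can be sketched with a synthetic tone in place of speech. The 10 kHz sampling rate follows from "500 points or 50 msec"; the 1 kHz test tone, the 100 Hz offset, and the helper names are assumptions for illustration only.

```python
import cmath
import math

FS = 10000.0  # Hz, implied by 500 points spanning 50 msec


def hamming(length):
    """Symmetric Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def stft_db(signal, length, freq_hz):
    """Log magnitude (dB) of the windowed first `length` samples at one frequency."""
    omega = 2 * math.pi * freq_hz / FS
    window = hamming(length)
    value = sum(signal[m] * window[m] * cmath.exp(-1j * omega * m)
                for m in range(length))
    return 20 * math.log10(abs(value) + 1e-12)


# Synthetic stand-in for a speech section: a single 1 kHz tone.
tone = [math.cos(2 * math.pi * 1000.0 * n / FS) for n in range(500)]


def rolloff_db(length, offset_hz=100.0):
    """Level 100 Hz away from the tone, relative to the level at the tone."""
    return stft_db(tone, length, 1000.0 + offset_hz) - stft_db(tone, length, 1000.0)


long_rolloff = rolloff_db(500)   # 50 msec window
short_rolloff = rolloff_db(50)   # 5 msec window
```

With the long window the level 100 Hz from the tone drops by tens of dB, so components spaced like pitch harmonics appear as separate peaks. With the short window the same offset is still inside the main lobe and loses only a couple of dB, which is why short-window spectra show a smooth envelope rather than individual harmonics.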
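[Block diagram. Recoverable labels: signals $s(n)$ and $\tilde{s}(n)$, the lowpass window $w(n)$, a complex exponential at frequency $\omega_i$, and the short-time Fourier transform output $S_n(e^{j\omega})$.]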
FFT implementation of a uniform filter bank
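One way to see why an FFT can implement a uniform filter bank is sketched below, assuming real-valued input, a window no longer than the FFT size, and channel frequencies at multiples of 2*pi/N. It is an illustrative construction, not necessarily the exact structure drawn on the slide; all function names and test values are hypothetical.

```python
import cmath
import math


def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def windowed_frame(s, w, n):
    """y(r) = s(n - r) * w(r): the signal seen backward through the window."""
    return [s[n - r] * w[r] if 0 <= n - r < len(s) else 0.0
            for r in range(len(w))]


def channel_direct(s, w, i, n_fft, n):
    """Output of channel i at time n by convolving with w(r) * exp(j*omega_i*r)."""
    omega_i = 2 * math.pi * i / n_fft
    return sum(y * cmath.exp(1j * omega_i * r)
               for r, y in enumerate(windowed_frame(s, w, n)))


def channels_fft(s, w, n_fft, n):
    """All channel outputs at time n from a single FFT (real s and w)."""
    frame = windowed_frame(s, w, n)
    frame += [0.0] * (n_fft - len(frame))
    # sum_r y(r) e^{+j 2 pi i r / N} is the conjugate of the DFT of a real y.
    return [value.conjugate() for value in fft(frame)]


# Hypothetical test signal and parameters.
s = [math.sin(0.3 * m) + 0.5 * math.cos(1.1 * m) for m in range(64)]
w = [0.54 - 0.46 * math.cos(2 * math.pi * r / 15) for r in range(16)]
N_FFT, n = 32, 40
fast = channels_fft(s, w, N_FFT, n)
slow = [channel_direct(s, w, i, N_FFT, n) for i in range(N_FFT)]
max_error = max(abs(a - b) for a, b in zip(fast, slow))
```

The direct route costs one length-L sum per channel per output sample, while the FFT route produces every channel from one transform of the windowed frame, which is the usual motivation for this implementation.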
[Block diagram: the input $s(n)$ feeds $Q$ parallel bandpass filters $h_1(n), h_2(n), \ldots, h_Q(n)$, producing the outputs $X_1(n), X_2(n), \ldots, X_Q(n)$.]
Two arbitrary nonuniform filter-bank filter specifications consisting
Window sequence, w(n), (part a), the individual filter response (part b), and the composite response (part c) of a Q = 15 channel, uniform filter bank, designed using a 101-point Kaiser window smoothed lowpass window (after Dautrich et al.).
[Figure axes: value versus time in samples; magnitude (dB) versus frequency (kHz)]
Window sequence, w(n), (part a), the individual filter response (part b), and the composite response (part c) of a Q = 15 channel, uniform filter bank, designed using a 101-point Kaiser window directly as the lowpass window (after Dautrich et al.).
[Figure axes: value versus time in samples; magnitude (dB) versus frequency (kHz)]