
SLIDE 1

Speech Signal Representations

Part 2: Speech Signal Processing

Hsin-min Wang

References:

1. X. Huang et al., Spoken Language Processing, Chapters 5-6
2. J. R. Deller et al., Discrete-Time Processing of Speech Signals, Chapters 4-6
3. J. W. Picone, "Signal modeling techniques in speech recognition," Proceedings of the IEEE, September 1993, pp. 1215-1247

SLIDE 2

Speech Recognition - Acoustic Processing

Speech waveform framing: signal processing converts the waveform into a feature vector sequence $O = o_1 o_2 o_3 o_4 \ldots o_t$.

Hidden Markov Model: the feature vector sequence is matched against a state sequence $S = s_1 s_2 s_3 s_4 \ldots s_t$ of a hidden Markov model (the figure shows a three-state model with transitions $a_{11}, a_{12}, a_{22}, a_{23}, a_{33}$ and output distributions $b_1(o), b_2(o), b_3(o)$):

$$a_{ij} = P\!\left( s_t = j \mid s_{t-1} = i \right), \qquad b_i(o_t) = \sum_{k=1}^{M} c_{ik}\, N\!\left( o_t;\ \mu_{ik}, \Sigma_{ik} \right)$$

Decoding finds the best state sequence and, ultimately, the best word sequence:

$$S^* = \arg\max_{S} P\!\left( O \mid S \right), \qquad W^* = \arg\max_{W} P\!\left( O \mid W \right)$$

Signal Processing

SLIDE 3

Source-Filter Model

Source-Filter model: decomposition of speech signals

− A source passed through a linear time-varying filter
− Source (excitation): the air flow at the vocal cords (聲帶)
− Filter: the resonances (共鳴) of the vocal tract (聲道), which change over time
− Once the filter has been estimated, the source can be obtained by passing the speech signal through the inverse filter

(figure: excitation e[n] → filter h[n] → speech x[n])

SLIDE 4

Source-Filter Model (cont.)

Phoneme classification mostly depends on the characteristics of the filter

− Speech recognizers estimate the filter characteristics and ignore the source
  • Speech production model: linear predictive coding and cepstral analysis
  • Speech perception model: mel-frequency cepstrum
− Speech synthesis techniques use a source-filter model because it allows flexibility in altering the pitch and the filter
− Speech coders use a source-filter model because it allows a low bit rate

SLIDE 5

Characteristics of the Source-Filter Model

The characteristics of the vocal tract define the uttered phoneme

− Such characteristics are evidenced in the frequency domain by the location of the formants, i.e., the peaks given by the resonances of the vocal tract
SLIDE 6

Main Considerations in Feature Extraction

Perceptually Meaningful

− Parameters represent salient aspects of the speech signal
− Parameters are analogous to those used by the human auditory system (perceptually meaningful)

Robust Parameters

− Parameters are robust to variations in environments such as channels, speakers, and transducers

Time-Dynamic Parameters

− Parameters can capture spectral dynamics, i.e., changes of the spectrum over time (temporal correlation)

SLIDE 7

Typical Procedures for Feature Extraction

Spectral Shaping: A/D Conversion → Pre-emphasis → Framing and Windowing
Spectral Analysis: Fourier Transform and Filter Bank, or Linear Prediction (LP)
Parametric Transform: Cepstral Processing

(figure: speech signal → spectral shaping → conditioned signal → spectral analysis → measurements → parametric transform → parameters)

SLIDE 8

Spectral Shaping

A/D Conversion

− Conversion of the signal from a sound pressure wave to a digital signal
− Sampling

Digital Filtering (Pre-emphasis)

− Emphasizing important frequency components in the signal

Framing and Windowing

− Short-time processing

SLIDE 9

A/D Conversion

Undesired side effects of A/D conversion

− Line frequency noise (50/60-Hz hum)
− Loss of low- and high-frequency information
− Nonlinear input-output distortion
− Example: the frequency response of a typical telephone-grade A/D converter
  • The sharp attenuation of the low-frequency and high-frequency response causes problems for subsequent parametric spectral analysis algorithms

The most popular sampling frequencies

− Telecommunication: 8 kHz
− Non-telecommunication: 10~16 kHz

SLIDE 10

Sampling Frequency vs. Recognition Accuracy

SLIDE 11

Pre-emphasis

SLIDE 12

Pre-emphasis

The pre-emphasis filter

− An FIR high-pass filter
− A first-order finite impulse response filter is widely used:
$$H_{pre}(z) = 1 - a_{pre}\, z^{-1}$$
  • $a_{pre}$: values close to 1.0 that can be efficiently implemented in fixed-point hardware, such as $1 - 1/16 = 0.9375$, are most common
  • Boosts the signal spectrum by approximately 20 dB per decade

In the time domain, with $H(z) = 1 - a z^{-1}$, $0 < a \le 1$:
$$x'[n] = x[n] - a\, x[n-1]$$

− More generally, an $N_{pre}$-th order FIR pre-emphasis filter can be used:
$$H_{pre}(z) = \sum_{k=0}^{N_{pre}} a_{pre}[k]\, z^{-k}$$

(figure: frequency response of the first-order filter, rising 20 dB per decade)
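The first-order pre-emphasis filter above can be sketched in a few lines of NumPy; the coefficient 0.97 is a typical choice for illustration, not one mandated by the slides:

```python
import numpy as np

def pre_emphasis(x, a=0.97):
    """First-order FIR pre-emphasis: x'[n] = x[n] - a * x[n-1]."""
    x = np.asarray(x, dtype=float)
    # keep the first sample unchanged (no x[-1] available)
    return np.append(x[0], x[1:] - a * x[:-1])
```

Applied to a constant (purely low-frequency) signal, the output after the first sample is strongly attenuated, illustrating the high-pass behavior.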

SLIDE 13

Why Pre-emphasis?

Reason 1: Eliminate the glottal formants

− The glottal signal component can be modeled by a simple two-real-pole filter whose poles are near $z = 1$:
$$G(z) = \frac{1}{\left( 1 - b_1 z^{-1} \right)\left( 1 - b_2 z^{-1} \right)}$$
− The lip radiation characteristic $\left( 1 - c z^{-1} \right)$, with its zero near $z = 1$, tends to cancel the spectral effect of one of the glottal poles
− By introducing a second zero near $z = 1$ (pre-emphasis), we can effectively eliminate the larynx and lips spectral contributions ==> analysis can be asserted to be seeking the parameters corresponding to the vocal tract only

(figure: glottal signal $u_G[n]$ → glottal filter $G(z)$ → vocal tract $H(z)$ → lip radiation $1 - c z^{-1}$ → $x[n]$)

SLIDE 14

Why Pre-emphasis? (cont.)

Reason 2: Prevent numerical instability

− If the speech signal is dominated by low frequencies, it is highly predictable, and a large LP model will result in an ill-conditioned autocorrelation matrix

Reason 3:

− Voiced sections of the speech signal naturally have a negative spectral slope (attenuation) of approximately 20 dB per decade due to physiological characteristics of the speech production system
− High-frequency formants have small amplitude with respect to low-frequency formants. A pre-emphasis of high frequencies is therefore required to obtain similar amplitudes for all formants

SLIDE 15

Why Pre-emphasis? (cont.)

Reason 4:

− Hearing is more sensitive above the 1 kHz region of the spectrum
− The pre-emphasis filter amplifies this most perceptually important area of the spectrum

SLIDE 16

Framing and Windowing

SLIDE 17

Short-Time Fourier Analysis

Spectral Analysis: Spectrogram Representation

− A spectrogram of a time signal is a two-dimensional representation that displays time on its horizontal axis and frequency on its vertical axis
− A gray scale is typically used to indicate the energy at each point (t, f)
  • "white": low energy; "black": high energy

SLIDE 18

Framing and Windowing

Short-time analysis by framing: decompose the speech signal into a series of overlapping frames

− Traditional methods for spectral evaluation are reliable only for a stationary signal (i.e., a signal whose statistical characteristics are invariant with respect to time)
  • The frame has to be short enough for the behavior (periodicity or noise-like appearance) of the signal to be approximately constant, i.e., the signal characteristics are uniform in that region and the signal can be assumed stationary

Terminology

− Frame Duration (N): the length of time over which a set of parameters is valid, typically on the order of 20~30 ms
− Frame Period (L): the length of time between successive parameter calculations (target rate)
− Frame Rate: the number of frames computed per second

SLIDE 19

Framing and Windowing (cont.)

(figure: frame m and frame m+1, each of duration N, offset by the frame period L)

Given a speech signal x[n], we define the short-time signal $x_m[n]$ of frame m as the product of x[n] and a window function $w_m[n]$:
$$x_m[n] = x[n]\, w_m[n]$$
− $w_m[n] = w[m-n]$, where $w[n] = 0$ for $|n| > N/2$
  • In practice, the window length N is on the order of 20 to 30 ms
− The short-time Fourier representation for frame m is defined as
$$X_m\!\left( e^{j\omega} \right) = \sum_{n=-\infty}^{\infty} x_m[n]\, e^{-j\omega n} = \sum_{n=-\infty}^{\infty} w[m-n]\, x[n]\, e^{-j\omega n}$$
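The framing and short-time analysis described above can be sketched as follows; the 25 ms frame length and 10 ms shift at 16 kHz are illustrative choices consistent with the slides' typical values:

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice x into overlapping frames of length frame_len, hop samples apart."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[m * hop : m * hop + frame_len] for m in range(n_frames)])

def stft_frames(frames):
    """Short-time spectra: apply a Hamming window to each frame, then the DFT."""
    w = np.hamming(frames.shape[1])
    return np.fft.rfft(frames * w, axis=1)

# e.g. 25 ms frames with a 10 ms shift at a 16 kHz sampling rate
x = np.random.randn(16000)
frames = frame_signal(x, frame_len=400, hop=160)
spectra = stft_frames(frames)
```

Each row of `spectra` is the short-time spectrum $X_m(e^{j\omega})$ of one frame, sampled at the DFT frequencies.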

SLIDE 20

Framing and Windowing (cont.)

Rectangular window

− $w[n] = 1$ for $0 \le n \le N-1$
  • Just extracts the frame of the signal without further processing
  • Its frequency response has high side lobes

Main lobe: spreads the narrow-band power of the signal over a wider frequency range, and thus reduces the local frequency resolution
Side lobe: swaps energy from different and distant frequencies of $x_m[n]$, which is called spectral leakage

(figure: the main lobe of the Hamming window is twice as wide as that of the rectangular window; for N = 16, $2\pi/16$)

SLIDE 21

Framing and Windowing (cont.)

For a periodic signal (e.g., voiced speech with pitch period P), modeled as an impulse train
$$x[n] = \sum_{k=-\infty}^{\infty} \delta[n - kP]$$
the spectrum consists of impulses at the harmonics:
$$X\!\left( e^{j\omega} \right) = \frac{2\pi}{P} \sum_{k=0}^{P-1} \delta\!\left( \omega - 2\pi k / P \right)$$

Hamming window of length N

− Main lobe width = $4\pi/N$; to resolve harmonics spaced $2\pi/P$ apart, we need $4\pi/N \le 2\pi/P$, i.e., $N \ge 2P$

SLIDE 22

Framing and Windowing (cont.)

(figure: magnitude responses of the rectangular and Hamming windows, with main lobe widths $2\pi/N$ and $4\pi/N$ and side-lobe attenuation levels of 17 dB, 31 dB, and 44 dB marked)

The Hamming window offers less spectral leakage than the rectangular window
The rectangular window provides better time resolution than the Hamming window
Rectangular windows are rarely used for speech analysis despite their better time resolution

SLIDE 23

Framing and Windowing (cont.)

We want to select a window that satisfies two goals:

− the main lobe is as narrow as possible
− the side lobes are as low as possible in magnitude

However, this is a trade-off!

In practice, window lengths are on the order of 20 to 30 ms

− This choice is a compromise between the stationarity assumption and frequency resolution

SLIDE 24

Framing and Windowing (cont.)

The Hamming window is most widely used:
$$w[n] = \begin{cases} 0.54 - 0.46 \cos\!\left( \dfrac{2\pi n}{N-1} \right), & n = 0, 1, \ldots, N-1 \\ 0, & \text{otherwise} \end{cases}$$
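The formula above translates directly to code; this sketch matches NumPy's built-in `np.hamming`:

```python
import numpy as np

def hamming(N):
    """Hamming window: w[n] = 0.54 - 0.46 cos(2*pi*n/(N-1)), n = 0..N-1."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (N - 1))
```

Note the endpoints: $w[0] = w[N-1] = 0.54 - 0.46 = 0.08$, so the Hamming window does not taper fully to zero.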

SLIDE 25

Framing and Windowing (cont.)

Male Voiced Speech

SLIDE 26

Framing and Windowing (cont.)

Female Voiced Speech

SLIDE 27

Framing and Windowing (cont.)

Unvoiced Speech

No regularity is observed

SLIDE 28

Linear Predictive Coding

SLIDE 29

Linear Predictive Coding

The theory of linear predictive coding (LPC), as applied to speech, has been well understood for many years

− LPC provides a good model of the speech signal
  • This is especially true for the quasi-steady-state regions of speech, in which the all-pole model of LPC provides a good approximation to the vocal tract spectral envelope
  • During the unvoiced and transient regions of speech, the LPC model is less effective than for the voiced regions, but it still provides an acceptably useful model for speech recognition purposes
− The way in which LPC is applied to the analysis of speech signals leads to a reasonable source-vocal tract separation
  • A good representation of the vocal tract characteristics becomes possible
− LPC is an analytically tractable model
  • The method of LPC is mathematically precise and is simple and straightforward to implement in either software or hardware
− The LPC model works well in speech recognition
  • LPC front-end processing has been used in a large number of recognizers
SLIDE 30

The LPC Model

The basic idea behind the LPC model is that a given speech sample at time n, x[n], can be approximated as a linear combination of the past p speech samples:
$$x[n] \approx a_1 x[n-1] + a_2 x[n-2] + \cdots + a_p x[n-p]$$
where the coefficients $a_1, a_2, \ldots, a_p$ are assumed constant over the speech analysis frame. By including an excitation term, $G u[n]$, x[n] can be expressed exactly as
$$x[n] = \sum_{k=1}^{p} a_k\, x[n-k] + G\, u[n]$$
Expressed in the z-domain:
$$X(z) = \sum_{k=1}^{p} a_k z^{-k}\, X(z) + G\, U(z)$$
The transfer function is
$$H(z) = \frac{X(z)}{G\, U(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{1}{A(z)}$$
− An all-pole filter with a sufficient number of poles is a good approximation to model the vocal tract (filter) for speech signals

SLIDE 31

The LPC Model (cont.)

(figure: excitation $u[n]$ driving the all-pole filter $H(z)$ to produce the speech signal $x[n]$)

SLIDE 32

The LPC Model (cont.)

Based on the LPC model, the exact relation between x[n] and u[n] is
$$x[n] = \sum_{k=1}^{p} a_k\, x[n-k] + G\, u[n]$$
We approximate x[n] as the linear combination of past speech samples:
$$\tilde{x}[n] = \sum_{k=1}^{p} a_k\, x[n-k]$$
The prediction error, e[n], is defined as
$$e[n] = x[n] - \tilde{x}[n] = x[n] - \sum_{k=1}^{p} a_k\, x[n-k]$$
When x[n] is actually generated by an LPC model, the prediction error e[n] equals $G\, u[n]$.
The basic problem of linear prediction analysis is to determine the set of predictor coefficients, $\{a_k\}$, directly from the speech signal

− Since the spectral characteristics of speech vary over time, the basic approach is to find a set of predictor coefficients that minimize the mean-squared prediction error over a short segment of the speech waveform

SLIDE 33

LPC – the Orthogonality Principle

To estimate the LPC coefficients from a set of speech samples, we use the short-term analysis technique. After framing/windowing, the short-term prediction error for a specific frame m is
$$E_m = \sum_n e_m^2[n] = \sum_n \left( x_m[n] - \tilde{x}_m[n] \right)^2 = \sum_n \left( x_m[n] - \sum_{j=1}^{p} a_j\, x_m[n-j] \right)^2$$
We estimate the LPC coefficients as those that minimize the total prediction error. Taking the derivative with respect to each $a_i$ and setting it to zero:
$$\frac{\partial E_m}{\partial a_i} = -2 \sum_n \left( x_m[n] - \sum_{j=1}^{p} a_j\, x_m[n-j] \right) x_m[n-i] = 0, \quad 1 \le i \le p$$
$$\Rightarrow\ \sum_n e_m[n]\, x_m[n-i] = 0, \quad 1 \le i \le p$$

• Orthogonality principle: the error vector is orthogonal to the past vectors
SLIDE 34

LPC – the Yule-Walker Equations

Starting from the orthogonality conditions,
$$\sum_n \left( x_m[n] - \sum_{j=1}^{p} a_j\, x_m[n-j] \right) x_m[n-i] = 0, \quad 1 \le i \le p$$
$$\Rightarrow\ \sum_{j=1}^{p} a_j \sum_n x_m[n-i]\, x_m[n-j] = \sum_n x_m[n-i]\, x_m[n], \quad 1 \le i \le p$$
Define the correlation coefficients:
$$\phi_m[i,j] = \sum_n x_m[n-i]\, x_m[n-j]$$
The Yule-Walker equations:
$$\sum_{j=1}^{p} a_j\, \phi_m[i,j] = \phi_m[i,0], \quad 1 \le i \le p$$
Solution of this set of p linear equations yields the p LPC coefficients that minimize the prediction error.

SLIDE 35

LPC – the Minimum Mean-Squared Error

Expanding the total prediction error:
$$E_m = \sum_n e_m^2[n] = \sum_n x_m^2[n] - 2 \sum_{j=1}^{p} a_j \sum_n x_m[n]\, x_m[n-j] + \sum_{j=1}^{p} \sum_{k=1}^{p} a_j a_k \sum_n x_m[n-j]\, x_m[n-k]$$
According to the previous slide (the Yule-Walker equations), the last two terms are related by
$$\sum_{j=1}^{p} \sum_{k=1}^{p} a_j a_k \sum_n x_m[n-j]\, x_m[n-k] = \sum_{j=1}^{p} a_j \sum_n x_m[n]\, x_m[n-j]$$
so the minimum mean-squared error is
$$E_m = \sum_n x_m^2[n] - \sum_{j=1}^{p} a_j \sum_n x_m[n]\, x_m[n-j] = \phi_m[0,0] - \sum_{j=1}^{p} a_j\, \phi_m[0,j]$$

SLIDE 36

LPC – Solution of the LPC Equations

Writing the prediction for every sample of the frame,
$$x_m[n] = \sum_{k=1}^{p} a_k\, x_m[n-k] + e_m[n], \quad 1 \le n \le N-1$$
or, in matrix form, $\mathbf{x} = \mathbf{X}\mathbf{a} + \mathbf{e}$, with
$$\mathbf{e} = \left( e_m[1], \ldots, e_m[N-1] \right)^T, \quad \mathbf{x} = \left( x_m[1], \ldots, x_m[N-1] \right)^T, \quad \mathbf{a} = \left( a_1, \ldots, a_p \right)^T$$
$$\mathbf{X} = \left( \mathbf{x}^1\ \mathbf{x}^2\ \cdots\ \mathbf{x}^p \right), \qquad \mathbf{x}^i = \left( x_m[1-i], \ldots, x_m[N-1-i] \right)^T$$
Since $E_m = \sum_n e_m^2[n] = \mathbf{e}^T \mathbf{e}$ is minimal when $\mathbf{X}^T \mathbf{e} = \mathbf{0}$ (orthogonality),
$$\mathbf{X}^T \left( \mathbf{x} - \mathbf{X}\mathbf{a} \right) = \mathbf{0}\ \Rightarrow\ \mathbf{X}^T \mathbf{X}\, \mathbf{a} = \mathbf{X}^T \mathbf{x}\ \Rightarrow\ \mathbf{a} = \left( \mathbf{X}^T \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{x}$$
The solution can be achieved with any standard matrix inversion package. Because of the special form of the matrix here, some efficient solutions are possible, e.g., the autocorrelation method, the covariance method, and the lattice method.

SLIDE 37

LPC – the Covariance Method

One way to solve for the LPC coefficients is to fix the interval over which the mean-squared error is computed to the range $0 \le n \le N-1$ and to use the unweighted speech directly; i.e.,
$$E_m = \sum_{n=0}^{N-1} e_m^2[n] = \sum_{n=0}^{N-1} \left( x_m[n] - \tilde{x}_m[n] \right)^2$$
where the frame is obtained by shifting the signal by $-mL$ samples, with no window applied:
$$x_m[n] = x[n + mL]$$

(figure: x[n] over [mL, mL+N-1] is shifted by -mL to give $x_m[n]$ over [0, N-1])

SLIDE 38

LPC – the Covariance Method (cont.)

The correlation coefficients are symmetric:
$$\phi_m[i,j] = \sum_{n=0}^{N-1} x_m[n-i]\, x_m[n-j] = \sum_{n=-i}^{N-1-i} x_m[n]\, x_m[n+i-j] = \sum_{n=-j}^{N-1-j} x_m[n]\, x_m[n+j-i] = \phi_m[j,i]$$

(figure: the shifted segments $x_m[n-i]$ and $x_m[n-j]$ overlap over the analysis interval [0, N-1], so samples of $x_m[n]$ back to $n = -p$ are needed)

SLIDE 39

LPC – the Covariance Method (cont.)

The Yule-Walker equations
$$\sum_{j=1}^{p} a_j\, \phi_m[i,j] = \phi_m[i,0], \quad i = 1, 2, \ldots, p, \qquad \phi_m[i,j] = \phi_m[j,i]$$
can be written in matrix form:
$$\begin{pmatrix} \phi_m[1,1] & \phi_m[1,2] & \phi_m[1,3] & \cdots & \phi_m[1,p] \\ \phi_m[2,1] & \phi_m[2,2] & \phi_m[2,3] & \cdots & \phi_m[2,p] \\ \phi_m[3,1] & \phi_m[3,2] & \phi_m[3,3] & \cdots & \phi_m[3,p] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi_m[p,1] & \phi_m[p,2] & \phi_m[p,3] & \cdots & \phi_m[p,p] \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{pmatrix} = \begin{pmatrix} \phi_m[1,0] \\ \phi_m[2,0] \\ \phi_m[3,0] \\ \vdots \\ \phi_m[p,0] \end{pmatrix} \ \Rightarrow\ \mathbf{\Phi}\mathbf{a} = \mathbf{\psi}$$

$\mathbf{\Phi}$: symmetric and positive definite
$$\mathbf{x}^T \mathbf{\Phi}\, \mathbf{x} > 0 \ \text{ for all nonzero vectors } \mathbf{x} \in R^p$$

SLIDE 40

LPC – the Covariance Method (cont.)

$$\mathbf{\Phi}\mathbf{a} = \mathbf{\psi}$$
The matrix $\mathbf{\Phi}$ is expressed as $\mathbf{\Phi} = \mathbf{V}\mathbf{D}\mathbf{V}^T$, where $\mathbf{V}$ is a lower triangular matrix (whose main diagonal elements are 1's) and $\mathbf{D}$ is a diagonal matrix. For example, for $p = 3$:
$$\begin{pmatrix} \Phi_{11} & \Phi_{21} & \Phi_{31} \\ \Phi_{21} & \Phi_{22} & \Phi_{32} \\ \Phi_{31} & \Phi_{32} & \Phi_{33} \end{pmatrix} = \begin{pmatrix} 1 & & \\ V_{21} & 1 & \\ V_{31} & V_{32} & 1 \end{pmatrix} \begin{pmatrix} d_1 & & \\ & d_2 & \\ & & d_3 \end{pmatrix} \begin{pmatrix} 1 & V_{21} & V_{31} \\ & 1 & V_{32} \\ & & 1 \end{pmatrix} = \begin{pmatrix} d_1 & d_1 V_{21} & d_1 V_{31} \\ d_1 V_{21} & d_1 V_{21}^2 + d_2 & d_1 V_{21} V_{31} + d_2 V_{32} \\ d_1 V_{31} & d_1 V_{21} V_{31} + d_2 V_{32} & d_1 V_{31}^2 + d_2 V_{32}^2 + d_3 \end{pmatrix}$$

SLIDE 41

LPC – the Covariance Method (cont.)

$$\mathbf{\Phi}\mathbf{a} = \mathbf{\psi}, \qquad \mathbf{\Phi} = \mathbf{V}\mathbf{D}\mathbf{V}^T$$
So each element of $\mathbf{\Phi}$ can be expressed as
$$\phi[i,j] = \sum_{k=1}^{j} V_{ik}\, d_k\, V_{jk}, \quad 1 \le j \le i$$
or alternatively, for the off-diagonal elements ($1 \le j < i$),
$$V_{ij}\, d_j = \phi[i,j] - \sum_{k=1}^{j-1} V_{ik}\, d_k\, V_{jk} \quad \text{(Eq. 2)}$$
and for the diagonal elements,
$$d_i = \phi[i,i] - \sum_{k=1}^{i-1} V_{ik}^2\, d_k, \quad i \ge 2 \quad \text{(Eq. 3)}$$
The Cholesky decomposition starts with
$$d_1 = \phi[1,1] \quad \text{(Eq. 1)}$$
and then alternates between Eq. 2 and Eq. 3 to solve for $\mathbf{V}$ and $\mathbf{D}$.

SLIDE 42

LPC – the Covariance Method (cont.)

$$\mathbf{\Phi}\mathbf{a} = \mathbf{\psi} \quad \text{where} \quad \mathbf{\Phi} = \mathbf{V}\mathbf{D}\mathbf{V}^T$$
Once $\mathbf{V}$ and $\mathbf{D}$ have been determined, define $\mathbf{Y} = \mathbf{D}\mathbf{V}^T \mathbf{a}$, so that
$$\mathbf{V}\mathbf{D}\mathbf{V}^T \mathbf{a} = \mathbf{\psi}\ \Rightarrow\ \mathbf{V}\mathbf{Y} = \mathbf{\psi} \quad \text{and} \quad \mathbf{V}^T \mathbf{a} = \mathbf{D}^{-1}\mathbf{Y}$$
Given the matrix $\mathbf{V}$, the system $\mathbf{V}\mathbf{Y} = \mathbf{\psi}$ can be solved recursively (forward substitution) as
$$Y_i = \psi_i - \sum_{j=1}^{i-1} V_{ij}\, Y_j, \quad 2 \le i \le p, \qquad \text{with initial condition } Y_1 = \psi_1$$
Having determined $\mathbf{Y}$, $\mathbf{a}$ can be solved recursively (backward substitution, where the index proceeds backwards) as
$$a_i = Y_i / d_i - \sum_{j=i+1}^{p} V_{ji}\, a_j, \quad 1 \le i < p, \qquad \text{with initial condition } a_p = Y_p / d_p$$
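The covariance method of the preceding slides can be sketched as follows. For brevity this sketch builds the $\phi_m[i,j]$ matrix directly and solves the normal equations with a generic linear solver rather than the Cholesky ($\mathbf{V}\mathbf{D}\mathbf{V}^T$) recursion; the two give the same coefficients:

```python
import numpy as np

def lpc_covariance(x, p):
    """Covariance-method LPC: solve Phi a = psi with phi[i,j] = sum x[n-i] x[n-j].

    The error is minimized over samples where a full p-sample history exists,
    so x[:p] serves as history and the analysis interval is x[p:].
    """
    s = np.asarray(x, dtype=float)
    N = len(s) - p                       # analysis interval length
    phi = np.empty((p + 1, p + 1))
    for i in range(p + 1):
        for j in range(p + 1):
            # sum over the analysis interval of x[n-i] * x[n-j]
            phi[i, j] = np.dot(s[p - i : p - i + N], s[p - j : p - j + N])
    Phi = phi[1:, 1:]                    # symmetric, positive definite
    psi = phi[1:, 0]
    return np.linalg.solve(Phi, psi)
```

Because no window is applied, a signal that exactly obeys a p-th order recursion is recovered exactly (zero prediction error).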

SLIDE 43

LPC – the Autocorrelation Method

Assume $x_m[n]$ is identically zero outside the interval $0 \le n \le N-1$:
$$x_m[n] = \begin{cases} x[n + mL]\, w[n], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
L: frame period, the length of time between successive frames

(figure: x[n] over [mL, mL+N-1] is shifted by -mL to give $\tilde{x}_m[n] = x[n+mL]$ over [0, N-1], then windowed: $x_m[n] = \tilde{x}_m[n]\, w[n]$)

SLIDE 44

LPC – the Autocorrelation Method (cont.)

The mean-squared error is
$$E_m = \sum_{n=0}^{N+p-1} e_m^2[n] = \sum_{n=0}^{N+p-1} \left( x_m[n] - \sum_{j=1}^{p} a_j\, x_m[n-j] \right)^2$$
(since $x_m[n]$ is zero outside $[0, N-1]$, the prediction error can be nonzero only for $0 \le n \le N+p-1$)

The normal equations keep the same form,
$$\sum_{j=1}^{p} a_j\, \phi_m[i,j] = \phi_m[i,0], \quad 1 \le i \le p$$
but the correlation coefficients simplify (for $i \ge j$) to
$$\phi_m[i,j] = \sum_{n=0}^{N+p-1} x_m[n-i]\, x_m[n-j] = \sum_{n=0}^{N-1-(i-j)} x_m[n]\, x_m[n+i-j]$$

(figure: the shifted copies $x_m[n-i]$ and $x_m[n-j]$ overlap within [0, N-1+p], so the sums depend only on the lag i-j)

SLIDE 45

LPC – the Autocorrelation Method (cont.)

Define the autocorrelation function of $x_m[n]$ as
$$R_m[k] = \sum_{n=0}^{N-1-k} x_m[n]\, x_m[n+k]$$
− Then, since
$$\phi_m[i,j] = \sum_{n=0}^{N-1-(i-j)} x_m[n]\, x_m[n+i-j]$$
we have $\phi_m[i,j] = R_m[i-j]$, and $R_m[k] = R_m[-k]$ (why? the autocorrelation of a real signal is an even function of the lag)

The normal equations become
$$\sum_{j=1}^{p} a_j\, \phi_m[i,j] = \phi_m[i,0]\ \Rightarrow\ \sum_{j=1}^{p} a_j\, R_m[|i-j|] = R_m[i], \quad i = 1, 2, \ldots, p$$
or, in matrix form:
$$\begin{pmatrix} R_m[0] & R_m[1] & R_m[2] & \cdots & R_m[p-1] \\ R_m[1] & R_m[0] & R_m[1] & \cdots & R_m[p-2] \\ R_m[2] & R_m[1] & R_m[0] & \cdots & R_m[p-3] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ R_m[p-1] & R_m[p-2] & R_m[p-3] & \cdots & R_m[0] \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{pmatrix} = \begin{pmatrix} R_m[1] \\ R_m[2] \\ R_m[3] \\ \vdots \\ R_m[p] \end{pmatrix}$$

A Toeplitz matrix: symmetric, and all the elements on each diagonal are identical

SLIDE 46

LPC – the Autocorrelation Method (cont.)

Levinson-Durbin Recursion

− 1. Initialization: $E^{(0)} = R_m[0]$
− 2. Iteration. For $i = 1, \ldots, p$ do the following recursion:
$$k_i = \frac{R_m[i] - \sum_{j=1}^{i-1} a_j^{(i-1)} R_m[i-j]}{E^{(i-1)}}$$
$$a_i^{(i)} = k_i$$
$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$
$$E^{(i)} = \left( 1 - k_i^2 \right) E^{(i-1)}$$
− 3. Final solution: $a_j = a_j^{(p)}$ for $1 \le j \le p$

$k_i$: the reflection coefficients, $-1 \le k_i \le 1$

Presentation topic: the derivation of the Levinson-Durbin recursion and the lattice formulation.
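The recursion above can be sketched directly in NumPy; `a[j-1]` holds the predictor coefficient $a_j$:

```python
import numpy as np

def levinson_durbin(R, p):
    """Levinson-Durbin: solve the Toeplitz normal equations from R[0..p].

    Returns (a, E): the predictor coefficients a_1..a_p and the final
    prediction error E^(p).
    """
    a = np.zeros(p)
    E = R[0]
    for i in range(1, p + 1):
        # reflection coefficient k_i = (R[i] - sum_j a_j R[i-j]) / E^(i-1)
        k = (R[i] - np.dot(a[: i - 1], R[i - 1 : 0 : -1])) / E
        a_prev = a[: i - 1].copy()
        a[i - 1] = k
        a[: i - 1] = a_prev - k * a_prev[::-1]  # a_j^(i) = a_j - k * a_{i-j}
        E *= 1.0 - k * k                        # E^(i) = (1 - k_i^2) E^(i-1)
    return a, E
```

The recursion costs $O(p^2)$ operations, versus $O(p^3)$ for a generic solver, which is why it is preferred for the Toeplitz system of the autocorrelation method.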

SLIDE 47

LPC – the Autocorrelation Method (cont.)

Worked example for $p = 2$ (writing $R_m$ as $R$):
$$E^{(0)} = R[0]$$
$$k_1 = \frac{R[1]}{R[0]}, \qquad a_1^{(1)} = k_1 = \frac{R[1]}{R[0]}, \qquad E^{(1)} = \left( 1 - k_1^2 \right) R[0] = \frac{R[0]^2 - R[1]^2}{R[0]}$$
$$k_2 = \frac{R[2] - a_1^{(1)} R[1]}{E^{(1)}} = \frac{R[2]\, R[0] - R[1]^2}{R[0]^2 - R[1]^2}$$
$$a_2^{(2)} = k_2 = \frac{R[2]\, R[0] - R[1]^2}{R[0]^2 - R[1]^2}, \qquad a_1^{(2)} = a_1^{(1)} - k_2\, a_1^{(1)} = \frac{R[1]\, R[0] - R[1]\, R[2]}{R[0]^2 - R[1]^2}$$
$$E^{(2)} = \left( 1 - k_2^2 \right) E^{(1)}, \qquad a_1 = a_1^{(2)}, \quad a_2 = a_2^{(2)}$$

SLIDE 48

Spectral Analysis via LPC

The LPC spectrum matches the peaks more closely than the valleys

− Because, by Parseval's theorem, the total prediction error can be written in the frequency domain as
$$E_m = \sum_{n=0}^{N+p-1} e_m^2[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| E_m\!\left( e^{j\omega} \right) \right|^2 d\omega = \frac{G^2}{2\pi} \int_{-\pi}^{\pi} \frac{\left| X_m\!\left( e^{j\omega} \right) \right|^2}{\left| H'\!\left( e^{j\omega} \right) \right|^2}\, d\omega, \qquad H'\!\left( e^{j\omega} \right) = G \cdot H\!\left( e^{j\omega} \right)$$
and the regions where $\left| X_m(e^{j\omega}) \right| > \left| H'(e^{j\omega}) \right|$ contribute more to the error than those where $\left| H'(e^{j\omega}) \right| > \left| X_m(e^{j\omega}) \right|$

The higher p, the more details of the spectrum are preserved.

SLIDE 49

LPC Spectra (cont.)

If p is large enough, we can approximate the signal spectrum with arbitrarily small error.

SLIDE 50

The Prediction Error

The prediction error signal
$$e[n] = x[n] - \sum_{k=1}^{p} a_k\, x[n-k]$$
is also called the excitation or residual signal.

For unvoiced speech we expect the residual to be approximately white noise

− In practice, this approximation is quite good

For voiced speech we expect the residual to approximate an impulse train

− In practice, this is not the case

SLIDE 51

The Prediction Error (cont.)

How do we choose p?

− Larger values of p lead to lower prediction errors
− Unvoiced speech has higher error than voiced speech because the LPC model is more accurate for voiced speech
− If we use a large value of p, we are fitting the individual harmonics; thus the LPC filter is modeling the source, and the separation between source and filter is not going to be so good

Rule of thumb: a pole per kHz, plus 2-4 poles to model the radiation and glottal effects

SLIDE 52

Cepstral Processing

SLIDE 53

Cepstral Processing

The real cepstrum of a digital signal x[n] is defined as
$$c[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln \left| X\!\left( e^{j\omega} \right) \right| e^{j\omega n}\, d\omega$$
The complex cepstrum of x[n] is defined as
$$\hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{X}\!\left( e^{j\omega} \right) e^{j\omega n}\, d\omega, \quad \text{where} \quad \hat{X}\!\left( e^{j\omega} \right) = \ln X\!\left( e^{j\omega} \right) = \ln \left| X\!\left( e^{j\omega} \right) \right| + j \angle X\!\left( e^{j\omega} \right)$$
Both the real and the complex cepstrum are homomorphic transformations. If the signal x[n] is real, both c[n] and $\hat{x}[n]$ are real signals.
The term cepstrum is coined by reversing the first syllable of the word spectrum, given that it is obtained by taking the inverse Fourier transform of the log spectrum.
The term quefrency is defined to represent the independent variable n in c[n]. The quefrency has the dimension of time.
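The real cepstrum above can be sketched with the DFT standing in for the continuous-frequency integral:

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: c[n] = inverse DFT of ln|X[k]| (DFT approximation)."""
    log_mag = np.log(np.abs(np.fft.fft(x)))
    return np.fft.ifft(log_mag).real
```

For a unit impulse, the spectrum magnitude is 1 at every frequency, the log spectrum is identically zero, and so is the cepstrum.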

SLIDE 54

LPC-Cepstrum

Given the LPC filter in pole-product form,
$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{G}{\prod_{k=1}^{p} \left( 1 - b_k z^{-1} \right)}$$
take the logarithm to obtain the complex cepstrum $\hat{h}[k]$:
$$\hat{H}(z) = \ln G - \sum_{k=1}^{p} \ln \left( 1 - b_k z^{-1} \right) = \sum_{k=-\infty}^{\infty} \hat{h}[k]\, z^{-k}$$
Using the series expansion
$$\ln \left( 1 - x \right) = -\sum_{n=1}^{\infty} \frac{x^n}{n}$$
we get
$$\hat{H}(z) = \ln G + \sum_{k=1}^{p} \sum_{n=1}^{\infty} \frac{b_k^n}{n}\, z^{-n} = \sum_{n=-\infty}^{\infty} \hat{h}[n]\, z^{-n}$$
Equating terms in $z^{-n}$:
$$\hat{h}[n] = \begin{cases} 0, & n < 0 \\ \ln G, & n = 0 \\ \dfrac{1}{n} \displaystyle\sum_{k=1}^{p} b_k^n, & n > 0 \end{cases}$$
While there are a finite number of LPC coefficients, the number of cepstrum coefficients is infinite. Typically, a finite number of cepstrum coefficients is sufficient to approximate the filter.

SLIDE 55

LPC-Cepstrum (cont.)

Given the LPC filter
$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$
take the logarithm:
$$\hat{H}(z) = \ln G - \ln \left( 1 - \sum_{l=1}^{p} a_l z^{-l} \right) = \sum_{k=-\infty}^{\infty} \hat{h}[k]\, z^{-k}$$
Take the derivative of both sides with respect to z:
$$-\frac{\sum_{n=1}^{p} n\, a_n z^{-n-1}}{1 - \sum_{l=1}^{p} a_l z^{-l}} = -\sum_{k=-\infty}^{\infty} k\, \hat{h}[k]\, z^{-k-1}$$
Multiply both sides by $-z \left( 1 - \sum_{l=1}^{p} a_l z^{-l} \right)$:
$$\sum_{n=1}^{p} n\, a_n z^{-n} = \sum_{k=-\infty}^{\infty} k\, \hat{h}[k]\, z^{-k} - \sum_{k=-\infty}^{\infty} \sum_{l=1}^{p} k\, \hat{h}[k]\, a_l\, z^{-(k+l)}$$
Replace $l = n - k$ (so that $1 \le n - k \le p$) and equate terms in $z^{-n}$:
$$n\, a_n = n\, \hat{h}[n] - \sum_{k=1}^{n-1} k\, \hat{h}[k]\, a_{n-k}, \quad 0 < n \le p$$
$$n\, \hat{h}[n] = \sum_{k=n-p}^{n-1} k\, \hat{h}[k]\, a_{n-k}, \quad n > p$$

SLIDE 56

LPC-Cepstrum (cont.)

Combining the two relations from the previous slide with $\hat{h}[0] = \ln G$ gives the LPC-cepstrum recursion:
$$\hat{h}[n] = \begin{cases} 0, & n < 0 \\ \ln G, & n = 0 \\ a_n + \displaystyle\sum_{k=1}^{n-1} \left( \frac{k}{n} \right) \hat{h}[k]\, a_{n-k}, & 0 < n \le p \\ \displaystyle\sum_{k=n-p}^{n-1} \left( \frac{k}{n} \right) \hat{h}[k]\, a_{n-k}, & n > p \end{cases}$$
The LPC-cepstrum coefficients can thus be computed recursively from the LPC coefficients.
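The recursion above translates directly to code; here `a` holds $a_1 \ldots a_p$ as a list or array:

```python
import numpy as np

def lpc_to_cepstrum(a, G, n_ceps):
    """LPC-cepstrum recursion: h_hat[0..n_ceps-1] from gain G and LPC coeffs a."""
    p = len(a)
    c = np.zeros(n_ceps)
    c[0] = np.log(G)                               # h_hat[0] = ln G
    for n in range(1, n_ceps):
        # sum_k (k/n) h_hat[k] a_{n-k}; a_{n-k} is stored at a[n-k-1]
        acc = sum((k / n) * c[k] * a[n - k - 1]
                  for k in range(max(1, n - p), n))
        c[n] = (a[n - 1] if n <= p else 0.0) + acc
    return c
```

A quick sanity check: for a single pole $H(z) = 1/(1 - b z^{-1})$, the closed form from the previous slide gives $\hat{h}[n] = b^n / n$ for $n > 0$, and the recursion reproduces it.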

SLIDE 57

Homomorphic Transform and Cepstral Processing

A homomorphic transform $D(\cdot)$ is a transform that converts a convolution into a sum:
$$x[n] = e[n] * h[n] \quad \Rightarrow \quad \hat{x}[n] = D\!\left( x[n] \right) = \hat{e}[n] + \hat{h}[n]$$
$$x[n] = e[n] * h[n]\ \Rightarrow\ X(\omega) = E(\omega) H(\omega)\ \Rightarrow\ \left| X(\omega) \right| = \left| E(\omega) \right| \left| H(\omega) \right|\ \Rightarrow\ \log \left| X(\omega) \right| = \log \left| E(\omega) \right| + \log \left| H(\omega) \right|$$
The cepstrum is regarded as a homomorphic function that allows us to separate the source from the filter

− We can find a value N such that the cepstrum of the filter $\hat{h}[n] \approx 0$ for $n \ge N$, and the cepstrum of the excitation $\hat{e}[n] \approx 0$ for $n < N$
− Use the lifter $l[n] = 1$ for $n < N$, $l[n] = 0$ for $n \ge N$ to recover $\hat{h}[n]$; use $l[n] = 1$ for $n \ge N$, $l[n] = 0$ for $n < N$ to recover $\hat{e}[n]$
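A sketch of the low-quefrency liftering just described, used to estimate the filter's log-spectral envelope; the cutoff `n_lifter` (the N of the slide) is a free choice, and the small constant guards against log of zero:

```python
import numpy as np

def cepstral_envelope(x, n_lifter):
    """Keep the low-quefrency cepstral coefficients (|n| < n_lifter) and
    return the resulting smoothed log-magnitude spectrum, an estimate of log|H|."""
    c = np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + 1e-12)).real
    lifter = np.zeros(len(c))
    lifter[:n_lifter] = 1.0
    if n_lifter > 1:
        lifter[-(n_lifter - 1):] = 1.0   # symmetric negative quefrencies
    return np.fft.fft(c * lifter).real   # smoothed log|X|
```

The liftered spectrum follows the broad formant structure while discarding the fine harmonic ripple contributed by the excitation.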

SLIDE 58

Source-Filter Separation via Cepstrum

If we use a large value of p, the separation between source and filter is not going to be so good. Why?

SLIDE 59

Mel-Frequency Cepstrum Coefficients (MFCC)

SLIDE 60

Mel-Frequency Cepstrum Coefficients (MFCC)

MFCC is the feature most widely used in speech recognition; it generally obtains better accuracy with minor computational complexity.

Pipeline (spectral shaping → spectral analysis → parametric transform):
speech signal $s[n]$ → Pre-emphasis → Window → $\tilde{x}_l[n]$ → DFT → $X_l[k]$ → Mel filterbanks → Log($\Sigma|\cdot|^2$) → $S_l[m]$ → IDFT or Cosine Transform → $c_l[n]$ (MFCC), plus the frame energy $e_l$

The final feature vector appends the first and second time derivatives of the cepstrum and energy:
$$c_l = \left\{ \left\{ c_l[n], e_l \right\},\ \left\{ \Delta c_l[n], \Delta e_l \right\},\ \left\{ \Delta^2 c_l[n], \Delta^2 e_l \right\} \right\}$$

SLIDE 61

DFT and Mel-filterbank Processing

For each frame of signal (N points, e.g., N=512):

− The Discrete Fourier Transform (DFT) is first performed on the windowed short-time signal $\tilde{x}[n]$, $n = 0, 1, \ldots, N-1$, to obtain its spectrum:
$$X_a[k] = \sum_{n=0}^{N-1} \tilde{x}[n]\, e^{-j 2\pi k n / N}, \quad 0 \le k < N$$
− A filterbank with M filters designed according to the Mel scale is then applied, and each filter output is the sum of its filtered spectral components: $S[0], S[1], \ldots, S[M-1]$ (M output values, for example M=24)

(figure: time-domain signal → DFT → spectrum → triangular Mel filters, each summed to one filterbank output)

SLIDE 62

DFT and Mel-filterbank Processing (cont.)

The Mel filterbank consists of M triangular filters $H_m[k]$, $m = 1, \ldots, M$, defined on the DFT bins $k = 0, 1, \ldots, N-1$ by the boundary points $f[0], f[1], \ldots, f[M+1]$:
$$H_m[k] = \begin{cases} \dfrac{k - f[m-1]}{f[m] - f[m-1]}, & f[m-1] \le k \le f[m] \\[2mm] \dfrac{f[m+1] - k}{f[m+1] - f[m]}, & f[m] \le k \le f[m+1] \\[1mm] 0, & \text{otherwise} \end{cases}$$
The boundary points are uniformly spaced on the Mel scale:
$$f[m] = \left( \frac{N}{F_s} \right) B^{-1}\!\left( B(f_l) + m\, \frac{B(f_h) - B(f_l)}{M+1} \right)$$
where $F_s$ is the sampling frequency and $f_l$, $f_h$ are the lowest and highest frequencies of the filterbank. The Mel scale and its inverse are
$$B(f) = 1125 \ln\!\left( 1 + f/700 \right), \qquad B^{-1}(b) = 700 \left( \exp\!\left( b/1125 \right) - 1 \right)$$

(figure: the triangular filters $H_1[k], \ldots, H_M[k]$ are uniformly spaced in Mel frequency and therefore non-uniformly spaced in linear frequency)

SLIDE 63

DFT and Mel-filterbank Processing (cont.)

We then compute the log-energy at the output of each filter as
$$S[m] = \ln \left[ \sum_{k=0}^{N-1} \left| X_a[k] \right|^2 H_m[k] \right], \quad 0 < m \le M$$
The mel-frequency cepstrum is then the discrete cosine transform of the M filter outputs:
$$c[n] = \sum_{m=1}^{M} S[m] \cos\!\left( \pi n \left( m - 1/2 \right) / M \right), \quad 0 \le n < M$$
M varies for different implementations from 24 to 40. For speech recognition, typically only the first 13 cepstrum coefficients are used.
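The filterbank and cepstrum computations of the last three slides can be sketched end-to-end as follows. The constants (M = 24, N = 512, 13 coefficients) follow the examples in the text; rounding the boundary points down to integer bins is one common convention, not the only one:

```python
import numpy as np

def mel(f):                       # B(f) = 1125 ln(1 + f/700)
    return 1125.0 * np.log(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_inv(b):                   # B^-1(b) = 700 (exp(b/1125) - 1)
    return 700.0 * (np.exp(np.asarray(b, dtype=float) / 1125.0) - 1.0)

def mel_filterbank(M, N, Fs, f_lo=0.0, f_hi=None):
    """M triangular filters over the N-point DFT bins, uniform on the Mel scale."""
    f_hi = Fs / 2.0 if f_hi is None else f_hi
    pts = np.linspace(mel(f_lo), mel(f_hi), M + 2)      # M+2 boundary points
    f = np.floor((N / Fs) * mel_inv(pts)).astype(int)   # f[0..M+1], in bins
    H = np.zeros((M, N // 2 + 1))
    for m in range(1, M + 1):
        rise = np.arange(f[m - 1], f[m] + 1)
        fall = np.arange(f[m], f[m + 1] + 1)
        H[m - 1, rise] = (rise - f[m - 1]) / max(f[m] - f[m - 1], 1)
        H[m - 1, fall] = (f[m + 1] - fall) / max(f[m + 1] - f[m], 1)
    return H

def mfcc(frame, H, n_ceps=13):
    """Log Mel-filterbank energies of one frame, followed by the DCT."""
    N = 2 * (H.shape[1] - 1)                            # FFT size
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), N)) ** 2
    S = np.log(np.maximum(H @ power, 1e-10))            # S[m], m = 1..M
    m = np.arange(1, H.shape[0] + 1)
    return np.array([np.sum(S * np.cos(np.pi * n * (m - 0.5) / H.shape[0]))
                     for n in range(n_ceps)])
```

Usage: build the filterbank once per configuration, then call `mfcc` on each pre-emphasized frame.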

SLIDE 64

Motivations for Mel-filterbank Analysis

The human ear resolves frequencies non-linearly across the audio spectrum, and empirical evidence suggests that designing a front-end to operate in a similar non-linear manner improves recognition performance.
The position of maximum displacement along the basilar membrane for stimuli such as pure tones is proportional to the logarithm of the frequency of the tone.
Frequencies of a complex sound within a certain bandwidth of some nominal frequency cannot be individually identified

− When one of the components of this sound falls outside this bandwidth, it can be individually distinguished
− This bandwidth is referred to as the critical bandwidth
− A critical bandwidth is nominally 10% to 20% of the center frequency of the sound

slide-65
SLIDE 65

65

Motivations for Mel-filterbank Analysis (cont.)

For speech recognition purposes:

− Filters are non-uniformly spaced along the frequency axis
− The part of the spectrum below 1 kHz is covered by more filters
  • This part contains more information about the vocal tract, such as the first formant
− Non-linear frequency analysis also gives a good frequency/time resolution trade-off
  • Narrow band-pass filters at low frequencies enable harmonics to be detected
  • Wide bandwidths at higher frequencies allow for higher temporal resolution of bursts

slide-66
SLIDE 66

66

Why Log Energy Computation?

Using the magnitude (power) only discards phase information

− Phase information is of little use in speech recognition
  • Replacing the phase of the original speech signal with continuous random phase is not perceived by the human ear

Using the logarithmic operation compresses the component amplitudes at every frequency

− It matches a characteristic of the human hearing system
− The compression makes feature extraction less sensitive to dynamic variations
− It helps separate the excitation (source) produced by the vocal cords from the filter that represents the vocal tract (homomorphic systems)

slide-67
SLIDE 67

67

Why Inverse DFT?

Discrete Cosine Transform (DCT)

− Since the log-power spectrum is real and symmetric, the inverse DFT reduces to a Discrete Cosine Transform (DCT). The DCT has the property of producing highly uncorrelated features
− Cepstral coefficients are more compact since they are sorted in order of decreasing variance
  • They can be truncated to retain the highest-energy coefficients, which represents an implicit liftering operation with a rectangular window
− The DCT successfully separates the vocal tract and the excitation
  • The envelope of the vocal tract changes slowly, and thus lies at low quefrencies (lower-order cepstrum), while the periodic excitation appears at high quefrencies (higher-order cepstrum)

$$c_l[n] = \sqrt{\frac{2}{M}}\, \sum_{m=1}^{M} S_l[m] \cos\!\left( \frac{\pi n}{M}\left(m - \frac{1}{2}\right) \right), \quad n = 1, \ldots, L, \quad L < M$$

slide-68
SLIDE 68

68

Parametric Transform - Derivatives

The performance of a speech recognition system can be greatly enhanced by adding time derivatives to the basic static parameters

(Figure: the static MFCC stream c_l[n] and the derived ΔMFCC and Δ²MFCC streams, plotted over quefrency and frame indices l−1, l, l+1, l+2.)

$$\Delta c_l[n] = \frac{\sum_{p=1}^{P} p\,\big(c_{l+p}[n] - c_{l-p}[n]\big)}{2 \sum_{p=1}^{P} p^2}$$

$$\Delta^2 c_l[n] = \frac{\sum_{p=1}^{P} p\,\big(\Delta c_{l+p}[n] - \Delta c_{l-p}[n]\big)}{2 \sum_{p=1}^{P} p^2}$$
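The delta regression can be sketched as follows. This is a direct implementation of the formula above; replicating edge frames is a common convention that the slide does not specify, so it is an assumption here.

```python
import numpy as np

def delta(feats, P=2):
    """Delta of a (T, D) feature stream:
    Δc_l = sum_{p=1}^{P} p (c_{l+p} - c_{l-p}) / (2 sum_{p=1}^{P} p^2)."""
    T, D = feats.shape
    denom = 2.0 * sum(p * p for p in range(1, P + 1))
    out = np.zeros_like(feats, dtype=float)
    for l in range(T):
        for p in range(1, P + 1):
            # edge frames are replicated (clamped indices)
            out[l] += p * (feats[min(l + p, T - 1)] - feats[max(l - p, 0)])
        out[l] /= denom
    return out

# The second derivative is the same operator applied to the delta stream:
# delta2 = delta(delta(c))
```

On a linear ramp the interior deltas come out as the slope, which is the intended least-squares interpretation of the formula.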

slide-69
SLIDE 69

69

MFCC vs. LPC Cepstral Coefficients

MFCC outperforms LPC cepstral coefficients

− The perceptually motivated mel-scale representation indeed helps recognition

Higher-order MFCCs do not further reduce the error rate compared with the 13th-order MFCC. Other perceptually motivated features, such as first- and second-order delta features, can significantly reduce recognition errors.

slide-70
SLIDE 70

70

Vector Quantization (VQ)

slide-71
SLIDE 71

71

Vector Quantization (VQ)

The results of either the filter-bank analysis or the LPC analysis are a series of vectors characterizing the time-varying spectral properties of the speech signal. In vector quantization, an arbitrary spectral vector is represented by the index of the codebook vector that best matches it. Advantages:

− Reduced storage for spectral analysis information − Reduced computation for determining similarity of spectral analysis vectors − Discrete representation of speech sounds

Disadvantages:

− An inherent spectral distortion in representing the actual analysis vector − The storage required for codebook vectors is often nontrivial

slide-72
SLIDE 72

72

VQ – Training and Classification

slide-73
SLIDE 73

73

VQ – The K-means Algorithm

The way in which a set of L training vectors can be clustered into a set of M codebook vectors is the following (the K-means algorithm):

  • 1. Initialization: Arbitrarily choose M vectors (initially out of the training set of L vectors) as the initial set of code words in the codebook
  • 2. Nearest-Neighbor Search: For each training vector, find the code word in the current codebook that is closest (in terms of spectral distance), and assign that vector to the corresponding cell (associated with the closest code word)
    − We often use the Euclidean distance as the spectral distance measure
  • 3. Centroid Update: Update the code word in each cell using the centroid of the training vectors assigned to that cell
  • 4. Iteration: Repeat steps 2 and 3 until the average distance falls below a preset threshold
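Steps 1-4 can be sketched compactly in numpy. This is an illustrative implementation using Euclidean distance; a fixed iteration count stands in for the distance-threshold stopping rule, and the random initialization is one way to "arbitrarily choose" M training vectors.

```python
import numpy as np

def kmeans_vq(train, M, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: arbitrarily choose M training vectors as initial code words
    code = train[rng.choice(len(train), M, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 2: assign each training vector to its nearest code word
        d = ((train[:, None, :] - code[None, :, :]) ** 2).sum(-1)
        cell = d.argmin(1)
        # Step 3: move each code word to the centroid of its cell
        for m in range(M):
            members = train[cell == m]
            if len(members):
                code[m] = members.mean(0)
    return code, cell

# Two tight clusters: the codebook should recover their centroids.
train = np.vstack([np.zeros((20, 2)), 10.0 * np.ones((20, 2))])
code, cell = kmeans_vq(train, M=2)
```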

slide-74
SLIDE 74

74

VQ – The Binary Split Algorithm

  • 1. Design a 1-vector codebook; this is the centroid of the entire set of training vectors (no iteration is required in this step)
  • 2. Double the size of the codebook by splitting each code word yn in the current codebook according to the rule

$$y_n^{+} = y_n(1 + \varepsilon), \qquad y_n^{-} = y_n(1 - \varepsilon)$$

where ε is a small positive value, say 0.001

  • 3. Use the K-means iterative algorithm to get the best set of centroids for the split codebook
  • 4. Iterate steps 2 and 3 until a codebook of size M is obtained
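The binary-split (LBG) procedure can be sketched as follows. This is an illustrative version assuming M is a power of two; a fixed number of inline k-means passes stands in for running k-means to convergence at each level.

```python
import numpy as np

def binary_split_codebook(train, M, eps=0.001, n_iter=20):
    code = train.mean(0, keepdims=True)        # step 1: 1-vector codebook
    while len(code) < M:
        # step 2: split every code word y into y(1+eps) and y(1-eps)
        code = np.vstack([code * (1 + eps), code * (1 - eps)])
        for _ in range(n_iter):                # step 3: k-means refinement
            d = ((train[:, None, :] - code[None, :, :]) ** 2).sum(-1)
            cell = d.argmin(1)
            for m in range(len(code)):
                members = train[cell == m]
                if len(members):
                    code[m] = members.mean(0)
    return code                                # step 4: stop at size M

train = np.vstack([np.ones((20, 2)), 9.0 * np.ones((20, 2))])
code = binary_split_codebook(train, M=2)
```

The tiny ±ε perturbation is what breaks the tie between the two copies of each code word, so that k-means can pull them toward different clusters.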

slide-75
SLIDE 75

75

Pitch Determination

slide-76
SLIDE 76

76

The Role of Pitch

Pitch determination is very important for many speech processing algorithms

− Speech synthesis methods require pitch tracking on the desired speech segment if prosody modification is to be done − Chinese speech recognition systems use pitch tracking for tone recognition

LPC and cepstrum represent the filter while pitch represents the source Pitch determination algorithms also use short-time analysis techniques

− For every frame x_m, we get a score f(T | x_m) that is a function of the candidate pitch period T
− The pitch determination algorithm selects the optimal pitch period according to

$$T_m = \arg\max_{T} f(T \mid \mathbf{x}_m)$$

slide-77
SLIDE 77

77

Pitch Determination

– the Autocorrelation Method

A commonly used method to estimate pitch is based on detecting the highest value of the autocorrelation function in the region of interest

Given a sinusoidal random process x[n] = cos(ω₀n + φ), with φ uniformly distributed over (−π, π], the autocorrelation is

$$R[m] = E\{x[n]\,x[n+m]\} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \cos(\omega_0 n + \phi)\,\cos\big(\omega_0(n+m) + \phi\big)\,d\phi = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{1}{2}\Big[\cos(\omega_0 m) + \cos\big(\omega_0(2n+m) + 2\phi\big)\Big]\,d\phi = \frac{1}{2}\cos(\omega_0 m)$$

which has maxima for m = lT₀, the pitch period and its harmonics. Any WSS (wide-sense stationary) periodic process x[n] with period T₀ also has an autocorrelation R[m] that exhibits its maxima at m = lT₀.

slide-78
SLIDE 78

78

Pitch Determination

– the Autocorrelation Method (cont.)

In practice, we need to obtain an estimate R̂[m] from knowledge of only N samples. Using a window w[n] of length N on x[n], the empirical autocorrelation function is given by

$$\hat{R}[m] = \frac{1}{N}\sum_{n=0}^{N-1-m} w[n]\,x[n]\;w[n+m]\,x[n+m]$$

whose expected value is

$$E\big\{\hat{R}[m]\big\} = \frac{1}{N}\sum_{n=0}^{N-1-m} w[n]\,w[n+m]\,E\{x[n]\,x[n+m]\} = \frac{1}{N}\, R_x[m]\,\big(w[m] * w[-m]\big)$$

For the case of a sinusoidal random process with a rectangular window of length N:

$$E\big\{\hat{R}[m]\big\} = \frac{1}{2}\cos(\omega_0 m)\left(1 - \frac{|m|}{N}\right), \quad |m| < N$$

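The autocorrelation pitch tracker can be sketched as follows: compute R̂[m] for one frame and pick the lag with the largest value inside a plausible pitch range. The 50-400 Hz search band is an assumption for illustration, not something the slide prescribes.

```python
import numpy as np

def pitch_autocorr(x, fs, fmin=50.0, fmax=400.0):
    N = len(x)
    # R^[m] = (1/N) sum_n x[n] x[n+m], for non-negative lags m
    r = np.correlate(x, x, mode='full')[N - 1:] / N
    lo, hi = int(fs / fmax), int(fs / fmin)
    T = lo + int(np.argmax(r[lo:hi + 1]))      # best candidate pitch period
    return fs / T                              # pitch in Hz

# A 200 Hz sinusoid sampled at 8 kHz: the peak should fall near lag 40.
fs = 8000
n = np.arange(640)
f0 = pitch_autocorr(np.cos(2 * np.pi * 200.0 * n / fs), fs)
```

Restricting the search to [fs/fmax, fs/fmin] is what keeps the estimator from locking onto the lag-0 maximum or onto multiples of the true period.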
slide-79
SLIDE 79

79

Pitch Determination

– the Autocorrelation Method (cont.)


slide-80
SLIDE 80

80

Output of LTI Systems

An impulse train passed through a first-order all-pole filter:

$$x[n] = \sum_{k=-\infty}^{\infty} \delta[n - kP] \;\xrightarrow{\;\text{DTFT}\;}\; X(\omega) = \frac{2\pi}{P}\sum_{k=-\infty}^{\infty} \delta\!\left(\omega - \frac{2\pi k}{P}\right)$$

$$h[n] = a^n u[n], \;\; |a| < 1 \;\xrightarrow{\;\text{DTFT}\;}\; H(\omega) = \frac{1}{1 - a e^{-j\omega}}$$

Since convolution in time corresponds to multiplication in frequency, x[n] * h[n] ↔ H(e^{jω}) X(e^{jω}), the output spectrum is

$$Y(\omega) = H(\omega)\,X(\omega) = \frac{1}{1 - a e^{-j\omega}}\cdot\frac{2\pi}{P}\sum_{k=-\infty}^{\infty} \delta\!\left(\omega - \frac{2\pi k}{P}\right) = \frac{2\pi}{P}\sum_{k=-\infty}^{\infty} \frac{1}{1 - a e^{-j2\pi k/P}}\;\delta\!\left(\omega - \frac{2\pi k}{P}\right)$$

slide-81
SLIDE 81

81

The Sampling Theorem

$$\Omega_s = \frac{2\pi}{T} = 2\pi F_s \quad \text{(sampling frequency)}$$

$$P(j\Omega) = \frac{2\pi}{T}\sum_{k=-\infty}^{\infty} \delta(\Omega - k\Omega_s)$$

$$X_p(j\Omega) = \frac{1}{2\pi}\, X(j\Omega) * P(j\Omega) = \frac{1}{T}\sum_{k=-\infty}^{\infty} X\big(j(\Omega - k\Omega_s)\big)$$

If the signal is band-limited to Ω_N and Ω_N < Ω_s/2 (= π/T), the shifted copies of X(jΩ) in X_p(jΩ) do not overlap and the original signal can be recovered by low-pass filtering; if Ω_N > Ω_s/2 (= π/T), the copies overlap and aliasing distortion occurs. F_s/2 is called the Nyquist frequency.

(Figure: X(jΩ) band-limited to ±Ω_N; the impulse train P(jΩ); X_p(jΩ) without and with aliasing.)

slide-82
SLIDE 82

82

The Rectangular Window

$$h[n] \equiv u[n] - u[n-N]$$

$$H(z) = \sum_{n=0}^{N-1} z^{-n} = \frac{1 - z^{-N}}{1 - z^{-1}}$$

$$H(e^{j\omega}) = \frac{1 - e^{-j\omega N}}{1 - e^{-j\omega}} = \frac{e^{-j\omega N/2}\big(e^{j\omega N/2} - e^{-j\omega N/2}\big)}{e^{-j\omega/2}\big(e^{j\omega/2} - e^{-j\omega/2}\big)} = \frac{\sin(\omega N/2)}{\sin(\omega/2)}\, e^{-j\omega(N-1)/2} = A(\omega)\, e^{-j\omega(N-1)/2}$$

$$A(\omega_k) = 0 \;\text{ for }\; \omega_k = 2\pi k/N, \quad k \notin \{0, \pm N, \pm 2N, \ldots\}$$

(Figure: the length-N rectangular window h[n] and its magnitude response plotted against ω/2π; the first zero falls at ω = 2π/N.)
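The closed form can be verified numerically: summing the DTFT of a length-N rectangular window on a dense frequency grid and comparing it with sin(ωN/2)/sin(ω/2) times the linear-phase term. The grid size K below is just an illustrative choice.

```python
import numpy as np

N = 16
h = np.ones(N)                                   # h[n] = u[n] - u[n-N]
K = 1024
w = 2 * np.pi * np.arange(1, K) / K              # skip w = 0 (limit there is N)

# Direct DTFT sum: H(e^{jw}) = sum_{n=0}^{N-1} h[n] e^{-jwn}
H = np.array([np.sum(h * np.exp(-1j * wi * np.arange(N))) for wi in w])

# Closed form: e^{-jw(N-1)/2} sin(wN/2) / sin(w/2)
closed = np.exp(-1j * w * (N - 1) / 2) * np.sin(w * N / 2) / np.sin(w / 2)
```

The first zero of the magnitude sits at ω = 2π/N, consistent with the main-lobe width discussed for window choice.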

slide-83
SLIDE 83

83

Fourier Transforms of the Impulse Train

The impulse train

$$p_N[n] = \sum_{k=-\infty}^{\infty} \delta[n - kN]$$

is periodic with period N, so it has a discrete Fourier series:

$$P_N[k] = \sum_{n=0}^{N-1} p_N[n]\, e^{-j2\pi nk/N} = 1 \;\;\Rightarrow\;\; p_N[n] = \frac{1}{N}\sum_{k=0}^{N-1} P_N[k]\, e^{j2\pi nk/N} = \frac{1}{N}\sum_{k=0}^{N-1} e^{j2\pi nk/N}$$

which is an alternate expression for an impulse train. Since

$$e^{j\omega_0 n} \;\leftrightarrow\; 2\pi\,\delta(\omega - \omega_0), \qquad \frac{1}{2\pi}\int 2\pi\,\delta(\omega - \omega_0)\, e^{j\omega n}\, d\omega = e^{j\omega_0 n}$$

it follows that

$$P_N(e^{j\omega}) = \frac{2\pi}{N}\sum_{k=0}^{N-1} \delta\!\left(\omega - \frac{2\pi k}{N}\right)$$

The Fourier transform of an impulse train signal is also an impulse train.

slide-84
SLIDE 84

84

Fourier Transforms of Periodic Signals –

general periodic signals

Given a periodic signal x[n] with period N, let x_N[n] denote one period:

$$x_N[n] \equiv \begin{cases} x[n] & 0 \le n < N \\ 0 & \text{otherwise} \end{cases}$$

Then

$$x[n] = \sum_{k=-\infty}^{\infty} x_N[n - kN] = x_N[n] * \sum_{k=-\infty}^{\infty} \delta[n - kN] = x_N[n] * p_N[n]$$

and therefore

$$X(e^{j\omega}) = X_N(e^{j\omega})\cdot\frac{2\pi}{N}\sum_{k=0}^{N-1} \delta\!\left(\omega - \frac{2\pi k}{N}\right) = \frac{2\pi}{N}\sum_{k=0}^{N-1} X_N\big(e^{j2\pi k/N}\big)\; \delta\!\left(\omega - \frac{2\pi k}{N}\right)$$