Linear Prediction - PowerPoint PPT Presentation



SLIDE 1

Linear Prediction

SLIDE 2

Outline

• Windowing
• LPC
• Introduction to Vocoders
• Excitation modeling
• Pitch Detection

SLIDE 3

Short-Time Processing

• Speech signal is inherently non-stationary.
• For continuant phonemes there are stationary periods of at least 20-25 ms.
• The short-time speech frames are assumed stationary.
• The frame length should be chosen to include just one phoneme or allophone.
• Frame lengths are usually chosen to be between 10-50 ms.
• We consider rectangular and Hamming windows here.
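As an illustration of short-time framing, here is a minimal NumPy sketch; the sampling rate, frame length, and hop size are assumed values for the example, not taken from the slides:

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping short-time frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

# Example: 1 s of a placeholder 8 kHz signal, 25 ms frames, 10 ms hop.
fs = 8000
x = np.arange(fs, dtype=float)
frames = frame_signal(x, frame_len=int(0.025 * fs), hop=int(0.010 * fs))
# frames.shape == (98, 200): 98 frames of 200 samples each
```

Each frame is then treated as stationary and multiplied by a window before analysis.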

SLIDE 4

Rectangular Window

SLIDE 5

Hamming Window

SLIDE 6

Comparison of Windows


SLIDE 7

Comparison of Windows (cont’d)
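The tradeoff these comparison slides illustrate can be reproduced numerically. The sketch below (frame length and FFT size are assumed values) contrasts the two windows' magnitude spectra:

```python
import numpy as np

N = 200                       # e.g., a 25 ms frame at 8 kHz
rect = np.ones(N)             # rectangular window
ham = np.hamming(N)           # Hamming: 0.54 - 0.46*cos(2*pi*n/(N-1))

def spectrum_db(w, nfft=4096):
    """Normalized magnitude spectrum of a window, in dB."""
    W = np.abs(np.fft.rfft(w, nfft))
    return 20 * np.log10(W / W.max() + 1e-12)

rect_db = spectrum_db(rect)
ham_db = spectrum_db(ham)
# The rectangular window has the narrower mainlobe but high sidelobes
# (first sidelobe near -13 dB); the Hamming window widens the mainlobe
# in exchange for sidelobes around -43 dB, i.e. far less spectral leakage.
```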

SLIDE 8

Linear Prediction Coding (LPC)

• Based on an all-pole model for the speech production system:

$$H(z) = \frac{1}{A(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$

• In the time domain, we get:

$$s[n] = \sum_{k=1}^{p} a_k \, s[n-k] + G\,u[n]$$

• In other words, we can predict s[n] as a function of the p previous signal samples (and the excitation).
• The set of {a_k} is one way of representing the time-varying filter. There are many other ways to represent this filter (e.g., pole values, lattice filter values, LSP, ...).
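A minimal sketch of this predictive relation, with made-up second-order coefficients (not from the slides): synthesizing s[n] through the all-pole recursion and then predicting it from its past samples leaves exactly the scaled excitation.

```python
import numpy as np

a = np.array([1.2, -0.8])   # hypothetical a_k: s[n] = 1.2 s[n-1] - 0.8 s[n-2] + G u[n]
G = 0.5
rng = np.random.default_rng(0)
u = rng.standard_normal(500)             # excitation

# Synthesize s[n] with the all-pole recursion.
s = np.zeros_like(u)
for n in range(len(u)):
    s[n] = G * u[n]
    for k, ak in enumerate(a, start=1):
        if n - k >= 0:
            s[n] += ak * s[n - k]

# Predict s[n] from its p past samples; the residual is the scaled excitation.
pred = np.zeros_like(s)
for n in range(len(s)):
    for k, ak in enumerate(a, start=1):
        if n - k >= 0:
            pred[n] += ak * s[n - k]
residual = s - pred                      # equals G * u[n] sample by sample
```

The chosen poles lie inside the unit circle (radius about 0.89), so the recursion is stable.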

slide-9
SLIDE 9

LPC Parameter Estimation

• There are many methods to estimate the LPC parameters:
  - Autocorrelation method: results in a set of p linear equations for the optimal coefficients a.
  - Covariance method
• Procedures (such as Levinson-Durbin, Burg, Le Roux) obtain efficient estimates of these parameters.
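A sketch of the Levinson-Durbin recursion named above (the test autocorrelation values are made up); it solves the Toeplitz normal equations of the autocorrelation method in O(p^2) operations:

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve sum_i a(i) r(|i-j|) = r(j), j = 1..p, for the predictor
    coefficients, given autocorrelations r[0..p]. Returns (a, error)."""
    a = np.zeros(p + 1)
    E = r[0]
    for m in range(1, p + 1):
        # Reflection coefficient for order m.
        k = (r[m] - np.dot(a[1:m], r[m - 1:0:-1])) / E
        a_next = a.copy()
        a_next[m] = k
        a_next[1:m] = a[1:m] - k * a[m - 1:0:-1]
        a, E = a_next, E * (1 - k * k)
    return a[1:], E

# Check against a direct Toeplitz solve with made-up autocorrelations.
r = np.array([2.0, 1.0, 0.5, 0.25])
R = np.array([[r[abs(i - j)] for j in range(3)] for i in range(3)])
a_ld, E = levinson_durbin(r, 3)
```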
SLIDE 10

LPC Parameters in Coding (vocoders)

Figure: two source-filter coding models. Top: a DT impulse generator (voiced, pitch period P, gain Θ0) or a white-noise generator (unvoiced, gain Θ0), selected by a V/UV switch, drives G(z) (glottal filter), H(z) (vocal tract filter), and R(z) (lip radiation filter) to produce the speech signal s(n). Bottom: the simplified model in which a single all-pole filter replaces the cascade.

SLIDE 11

Linear Prediction (Introduction):

• The object of linear prediction is to estimate the output sequence from a linear combination of input samples, past output samples, or both:

$$\hat{y}(n) = \sum_{j=0}^{q} b(j)\,x(n-j) + \sum_{i=1}^{p} a(i)\,y(n-i)$$

• The factors a(i) and b(j) are called predictor coefficients.

SLIDE 12

Linear Prediction (Introduction):

• Many systems of interest to us are describable by a linear, constant-coefficient difference equation:

$$\sum_{i=0}^{p} a(i)\,y(n-i) = \sum_{j=0}^{q} b(j)\,x(n-j)$$

• If Y(z)/X(z) = H(z), where H(z) is a ratio of polynomials N(z)/D(z), then

$$N(z) = \sum_{j=0}^{q} b(j)\,z^{-j} \quad \text{and} \quad D(z) = \sum_{i=0}^{p} a(i)\,z^{-i}$$

• Thus the predictor coefficients give us immediate access to the poles and zeros of H(z).
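That access to poles and zeros can be sketched directly; the coefficients below are made up for illustration:

```python
import numpy as np

# Hypothetical coefficients for sum_i a(i) y(n-i) = sum_j b(j) x(n-j):
a = np.array([1.0, -1.2, 0.8])   # D(z) = 1 - 1.2 z^-1 + 0.8 z^-2
b = np.array([0.5, 0.2])         # N(z) = 0.5 + 0.2 z^-1

# After clearing the negative powers of z, the roots of the coefficient
# polynomials are the poles and zeros of H(z) = N(z)/D(z).
poles = np.roots(a)              # complex-conjugate pair at radius sqrt(0.8)
zeros = np.roots(b)              # single zero at z = -0.4
```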

SLIDE 13

Linear Prediction (Types of System Model):

• There are two important variants:
  - All-pole model (in statistics, an autoregressive (AR) model): the numerator N(z) is a constant.
  - All-zero model (in statistics, a moving-average (MA) model): the denominator D(z) is equal to unity.
• The mixed pole-zero model is called the autoregressive moving-average (ARMA) model.

SLIDE 14

Linear Prediction (Derivation of LP equations):

• Given a zero-mean signal y(n), in the AR model:

$$\hat{y}(n) = \sum_{i=1}^{p} a(i)\,y(n-i)$$

• The error is:

$$e(n) = y(n) - \hat{y}(n) = y(n) - \sum_{i=1}^{p} a(i)\,y(n-i)$$

• To derive the predictor we use the orthogonality principle, which states that the desired coefficients are those which make the error orthogonal to the samples y(n-1), y(n-2), ..., y(n-p).

SLIDE 15

Linear Prediction (Derivation of LP equations):

• Thus we require that

$$\langle y(n-j)\,e(n)\rangle = 0 \qquad \text{for } j = 1, 2, \ldots, p$$

• Or,

$$\left\langle y(n-j)\left( y(n) - \sum_{i=1}^{p} a(i)\,y(n-i) \right) \right\rangle = 0, \qquad j = 1, \ldots, p$$

• Interchanging the operation of averaging and summing, and representing ⟨ ⟩ by summing over n, we have

$$\sum_{i=1}^{p} a(i) \sum_{n} y(n-i)\,y(n-j) = \sum_{n} y(n)\,y(n-j), \qquad j = 1, \ldots, p$$

• The required predictors are found by solving these equations.

SLIDE 16

Linear Prediction (Derivation of LP equations):

• The orthogonality principle also states that the resulting minimum error is given by

$$E = \langle e^2(n)\rangle = \langle e(n)\,y(n)\rangle$$

• Or,

$$E = r(0) - \sum_{i=1}^{p} a(i)\,r(i)$$

• We can minimize the error over all time:

$$\sum_{i=1}^{p} a(i)\,r(i-j) = r(j), \qquad j = 1, 2, \ldots, p$$

where

$$r(i) = \sum_{n} y(n)\,y(n-i)$$
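These normal equations can be set up and solved directly. A sketch with a made-up white-noise test signal (full-sum autocorrelation form):

```python
import numpy as np

def lp_coefficients(y, p):
    """Autocorrelation-method LP: solve sum_i a(i) r(i-j) = r(j), j = 1..p."""
    n = len(y)
    r = np.array([np.dot(y[:n - i], y[i:]) for i in range(p + 1)])       # r(i)
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz
    a = np.linalg.solve(R, r[1:])
    E = r[0] - a @ r[1:]          # minimum prediction error
    return a, E

rng = np.random.default_rng(1)
y = rng.standard_normal(1000)     # placeholder zero-mean signal
a, E = lp_coefficients(y, p=4)
# For white noise the coefficients come out near zero and E is close to r(0).
```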

SLIDE 17

Linear Prediction (Applications):

• Autocorrelation matching: we have a signal y(n) with known autocorrelation r_yy(n). We model this with the AR system shown below:

$$H(z) = \frac{\sigma}{A(z)} = \frac{\sigma}{1 - \sum_{i=1}^{p} a_i z^{-i}}$$

Figure: white excitation e(n), scaled by σ, drives the all-pole filter 1/A(z) to produce y(n).

SLIDE 18

Linear Prediction (Order of Linear Prediction):

• The choice of predictor order depends on the analysis bandwidth. The rule of thumb is:
  - For a normal vocal tract, there is an average of about one formant per kilohertz of bandwidth.
  - One formant requires two complex-conjugate poles.
  - Hence for every formant we require two predictor coefficients, or two coefficients per kilohertz of bandwidth:

$$p = 2 \cdot \frac{BW}{1000} + c$$

where BW is the bandwidth in Hz and c is a small number of additional coefficients.
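The rule of thumb reduces to a one-line computation; the value of c below is an assumed example value, not specified by the slides:

```python
def lpc_order(bw_hz, extra=2):
    """Rule-of-thumb LP order: two coefficients per kHz of bandwidth,
    plus a few extra (here assumed to be 2) for the constant c."""
    return 2 * bw_hz // 1000 + extra

# Telephone-band speech sampled at 8 kHz has about 4 kHz of bandwidth:
p = lpc_order(4000)   # p = 2*4 + 2 = 10
```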

SLIDE 19

Linear Prediction (AR Modeling of Speech Signal):

• True model:

Figure: a DT impulse generator (voiced branch, with pitch and gain) and an uncorrelated noise generator (unvoiced branch, with gain) are selected by a V/U switch; the excitation u(n) (voiced volume velocity) drives G(z) (glottal filter), H(z) (vocal tract filter), and R(z) (lip radiation filter) to produce the speech signal s(n).

SLIDE 20

Linear Prediction (AR Modeling of Speech Signal):

• Using LP analysis:

Figure: the same V/U structure, with a DT impulse generator (pitch) or white-noise generator and a gain estimate, drives a single all-pole (AR) filter H(z) to produce the speech signal s(n).

SLIDE 21

Introduction to Vocoders

• Besides the estimation of the vocal tract parameters, a vocoder needs excitation estimation.
• In early vocoders, this was achieved by the estimation of V/UV, pitch, and gain.
• More modern vocoders involve more sophisticated estimation of the excitation, such as in CELP, where vector quantization is used.

Figure: the original speech signal s(n) enters the vocoder analysis; the V/UV decision, pitch, and filter parameters pass through the channel (or storage) to the vocoder synthesizer, which produces the synthesized speech signal ŝ(n).

SLIDE 22

Pitch Detection

• Since the speech signal in voiced frames is quasi-periodic (and not fully periodic), pitch detection is not always easy.
• It is especially difficult in phonemes that exhibit less periodic behavior.
• Some pitch detection methods:
  - AMDF (Average Magnitude Difference Function)
  - Autocorrelation with center clipping
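The autocorrelation-with-center-clipping method can be sketched as follows; the clipping fraction, pitch search range, and test signal are all assumed values for illustration:

```python
import numpy as np

def pitch_autocorr(frame, fs, fmin=50, fmax=400, clip_frac=0.3):
    """Pitch estimate via autocorrelation with center clipping."""
    # Center clipping suppresses formant structure: samples below the
    # threshold are zeroed, the rest are shifted toward zero.
    c = clip_frac * np.max(np.abs(frame))
    x = np.where(np.abs(frame) > c, frame - np.sign(frame) * c, 0.0)
    # Autocorrelation over candidate pitch lags only.
    lags = np.arange(int(fs / fmax), int(fs / fmin) + 1)
    r = np.array([np.dot(x[:-lag], x[lag:]) for lag in lags])
    return fs / lags[np.argmax(r)]   # pitch estimate in Hz

# Crude 125 Hz "voiced" test frame at 8 kHz (40 ms).
fs = 8000
t = np.arange(int(0.040 * fs)) / fs
frame = np.sign(np.sin(2 * np.pi * 125 * t))
f0 = pitch_autocorr(frame, fs)
```

Restricting the lag search to the plausible pitch range avoids picking formant-related peaks at short lags.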