

SLIDE 1

MBE Vocoder

SLIDE 2

Outline

• Introduction to vocoders
• MBE vocoder
  – MBE Parameters
  – Parameter estimation
  – Analysis and synthesis algorithm
• AMBE
• IMBE

SLIDE 3

Vocoders - analyzer

1. Speech is first analyzed by segmenting it with a window, e.g. a Hamming window (see the sketch after this list)
2. Excitation and system parameters are calculated for each segment
   1. Excitation parameters: voiced/unvoiced decision, pitch period
   2. System parameters: spectral envelope / system impulse response
3. These parameters are then transmitted
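As an illustration of the segmentation step, here is a minimal Python sketch of windowed framing; the frame length, hop size and sampling rate are assumed example values, not taken from the slides.

```python
import numpy as np

def frame_signal(speech, frame_len=256, hop=128):
    """Split speech into overlapping segments and apply a Hamming window."""
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(speech) - frame_len + 1, hop):
        segment = speech[start:start + frame_len]
        frames.append(segment * window)      # windowed segment s_w(n) = w(n) s(n)
    return np.array(frames)

# Example: one second of 8 kHz "speech", 32 ms frames with 50% overlap
fs = 8000
speech = np.random.randn(fs)                 # stand-in for a real speech signal
frames = frame_signal(speech)
print(frames.shape)                          # (number of frames, 256)
```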

SLIDE 4

Vocoders - Synthesizer

[Block diagram: the excitation signal (pulse train for voiced segments, white noise for unvoiced segments) is passed through the system described by the system parameters to produce the synthesized voice.]
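A minimal sketch of this classic synthesizer structure, assuming an all-pole (LPC-style) filter as the system; the filter choice and the numeric values in the example are assumptions, not specified on the slide.

```python
import numpy as np
from scipy.signal import lfilter

def excitation(voiced, pitch_period, frame_len):
    """Classic vocoder excitation: impulse train for voiced frames,
    white noise for unvoiced frames (pitch_period and frame_len in samples)."""
    if voiced:
        e = np.zeros(frame_len)
        e[::pitch_period] = 1.0            # one impulse every pitch period
        return e
    return np.random.randn(frame_len)      # white noise

def synthesize_frame(voiced, pitch_period, a, frame_len):
    """Run the excitation through the system; an all-pole filter with denominator
    coefficients a = [1, a1, ..., ap] stands in for the system parameters here."""
    return lfilter([1.0], a, excitation(voiced, pitch_period, frame_len))

# e.g. a voiced 20 ms frame at 8 kHz with a 100 Hz pitch (assumed example values)
frame = synthesize_frame(True, 80, [1.0, -0.9], 160)
```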

SLIDE 5

Vocoders

• Vocoders of this kind usually have poor quality
  – Fundamental limitations of the speech model
  – Inaccurate parameter estimation
  – Inability of a pure pulse train / white noise excitation to produce all types of voice
    • Speech synthesized entirely with a periodic source exhibits a “buzzy” quality, and speech synthesized entirely with a noise source exhibits a “hoarse” quality
• A potential solution to the buzziness of vocoders is to use mixed excitation models
• In these vocoders, periodic and noise-like excitations are mixed with a calculated ratio, and this ratio is sent along with the parameters

SLIDE 6

Multi Band Excitation Speech Model

• Because a speech signal is only approximately stationary over short intervals, a window w(n) is usually applied to the signal:

  s_w(n) = w(n) s(n)

• The Fourier transform of a windowed segment can be modeled as the product of a spectral envelope H(ω) and an excitation spectrum |E(ω)|:

  Ŝ_w(ω) = H(ω) |E(ω)|

• In most models H(ω) is a smoothed version of the original speech spectrum S_w(ω)
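A short sketch of how the windowed spectrum and a crude smoothed envelope could be computed; the moving-average smoothing is only an assumption for illustration, since the actual MBE envelope comes from the least-squares fit described on the parameter-estimation slides.

```python
import numpy as np

def windowed_spectrum(segment):
    """S_w(w): Fourier transform of a Hamming-windowed speech segment."""
    return np.fft.rfft(segment * np.hamming(len(segment)))

def smoothed_envelope(spectrum, kernel_len=9):
    """Crude spectral envelope H(w): moving-average smoothing of |S_w(w)|.
    (Illustrative only; MBE estimates the envelope by a least-squares fit.)"""
    kernel = np.ones(kernel_len) / kernel_len
    return np.convolve(np.abs(spectrum), kernel, mode="same")
```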

SLIDE 7

MBE model (Cont’d)

• The spectral envelope must be represented accurately enough to prevent degradations in the spectral envelope from dominating
  – Quality improvements are achieved by the addition of a frequency-dependent voiced/unvoiced mixture function
• In previous simple models, the excitation spectrum is totally specified by the fundamental frequency ω0 and a single voiced/unvoiced decision for the entire spectrum
• In the MBE model, the excitation spectrum is specified by the fundamental frequency ω0 and a frequency-dependent voiced/unvoiced mixture function

SLIDE 8

Multi Banding

• In general, a continuously varying frequency-dependent voiced/unvoiced mixture function would require a large number of parameters to represent it accurately. The addition of a large number of parameters would severely decrease the utility of this model in applications such as bit-rate reduction
• To further reduce the number of these binary parameters, the spectrum is divided into multiple frequency bands and a binary voiced/unvoiced parameter is allocated to each band (see the sketch below)
• The MBE model differs from previous models in that the spectrum is divided into a large number of frequency bands (typically 20 or more), whereas previous models used at most three frequency bands
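A minimal sketch of dividing the spectrum into bands tied to the harmonics of the fundamental; the grouping of three harmonics per band and the placement of the band edges are assumptions for illustration, not values given on the slide.

```python
import numpy as np

def harmonic_bands(f0, fs=8000, harmonics_per_band=3):
    """Frequency bands (in Hz) covering the harmonics of f0; each band later
    receives one binary voiced/unvoiced parameter."""
    n_harmonics = int((fs / 2) // f0)
    bands = []
    for start in range(1, n_harmonics + 1, harmonics_per_band):
        lo = (start - 0.5) * f0                              # edges midway between harmonics
        hi = min((start + harmonics_per_band - 0.5) * f0, fs / 2)
        bands.append((lo, hi))
    return bands
```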

SLIDE 9

Multi Banding

[Figure: construction of the synthetic spectrum - original spectrum, spectral envelope, periodic spectrum, noise spectrum, V/UV information, the resulting excitation spectrum, and the synthetic spectrum.]

SLIDE 10

MBE Parameters

The parameters used in the MBE model are:

1. the spectral envelope
2. the fundamental frequency
3. the V/UV information for each harmonic
4. the phase of each harmonic declared voiced

The phases of harmonics in frequency bands declared unvoiced are not included, since they are not required by the synthesis algorithm (a sketch of this parameter set follows).
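For concreteness, the per-frame parameter set could be represented as below; the field names are illustrative and not taken from any particular MBE implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MBEFrameParams:
    """Per-frame MBE parameter set as listed on this slide."""
    fundamental_freq: float                                      # fundamental frequency w0
    envelope_mags: List[float] = field(default_factory=list)    # |A_m| per harmonic
    voiced: List[bool] = field(default_factory=list)            # V/UV per harmonic/band
    voiced_phases: List[float] = field(default_factory=list)    # phases of voiced harmonics only
```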

SLIDE 11

Parameter Estimation

• In many approaches (e.g. LPC-based algorithms), the algorithms for estimation of the excitation parameters and estimation of the spectral envelope parameters operate independently
• These parameters are usually estimated based on heuristic criteria, without explicit consideration of how close the synthesized speech will be to the original speech
  – This can result in a synthetic spectrum quite different from the original spectrum
• In MBE, the excitation and spectral envelope parameters are estimated simultaneously so that the synthesized spectrum is closest in the least-squares sense to the spectrum of the original speech (“analysis by synthesis”)

SLIDE 12

Parameter Estimation (Cont’d)

The estimation process is divided into two major steps:

1. In the first step, the pitch period and spectral envelope parameters are estimated to minimize the error between the original spectrum and the synthetic spectrum.
2. Then, the V/UV decisions are made based on the closeness of fit between the original and the synthetic spectrum at each harmonic of the estimated fundamental.

SLIDE 13

Parameter Estimation (cont’d)

The parameters are estimated by minimizing the following error criterion:

  ε = (1/2π) ∫ |S_w(ω) - Ŝ_w(ω)|² dω

where

  Ŝ_w(ω) = H(ω) |E(ω)|

Over the m-th frequency interval [a_m, b_m], with the envelope approximated by a constant A_m, the error is

  ε_m = (1/2π) ∫_[a_m, b_m] |S_w(ω) - A_m E_w(ω)|² dω

The error in an interval is minimized at:

  A_m = ∫_[a_m, b_m] S_w(ω) E_w*(ω) dω  /  ∫_[a_m, b_m] |E_w(ω)|² dω
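A discrete-frequency sketch of the closed-form minimizer for A_m over each interval; the variable names, and the use of FFT-bin indices as interval edges, are assumptions for illustration.

```python
import numpy as np

def envelope_samples(S_w, E_w, band_edges):
    """Least-squares spectral-envelope samples A_m per frequency interval.

    S_w, E_w   : complex spectra of the windowed speech and of the excitation
    band_edges : list of (a_m, b_m) FFT-bin indices, one interval per harmonic
    Implements A_m = <S_w, E_w> / <E_w, E_w> over each interval."""
    A = []
    for a, b in band_edges:
        num = np.sum(S_w[a:b] * np.conj(E_w[a:b]))   # corresponds to the S_w E_w* integral
        den = np.sum(np.abs(E_w[a:b]) ** 2)          # corresponds to the |E_w|^2 integral
        A.append(num / den if den > 0 else 0.0)
    return np.array(A)
```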

SLIDE 14

Pitch Estimation and Spectral Envelope

• An efficient method for obtaining a good approximation for the periodic transform P(ω) in an interval is to precompute samples of the Fourier transform of the window w(n) and center it around the harmonic frequency associated with that interval
• For unvoiced frequency intervals, the envelope parameters are estimated by substituting idealized white noise (unity across the band) for |E(ω)| in the previous formulas, which reduces to averaging the original spectrum in each frequency interval
• For unvoiced regions, only the magnitude of A_m is estimated, since the phase of A_m is not required for speech synthesis

SLIDE 15

More about pitch estimation

• Experimentally, the error ε tends to vary slowly with the pitch period P
• The initial estimate is obtained by evaluating the error for integer pitch periods (a sketch follows below)
• Since integer multiples of the correct pitch period have spectra with harmonics at the correct frequencies, the error ε will be comparable for the correct pitch period and its integer multiples
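A rough sketch of evaluating a spectral-matching error over a grid of integer pitch periods and taking the minimum; the error measure used here (energy away from the harmonic bins) and the period search range are illustrative assumptions, not the exact MBE criterion.

```python
import numpy as np

def pitch_error(segment, period, fs=8000):
    """Spectral-matching error for one candidate integer pitch period (sketch).
    Keeps spectral energy only at harmonics of fs/period and measures what is
    left over; this is a crude stand-in for the MBE error criterion."""
    spectrum = np.abs(np.fft.rfft(segment * np.hamming(len(segment))))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    f0 = fs / period
    synth = np.zeros_like(spectrum)
    for m in range(1, int((fs / 2) // f0) + 1):
        k = np.argmin(np.abs(freqs - m * f0))     # nearest FFT bin to harmonic m
        synth[k] = spectrum[k]                    # keep energy at harmonic bins
    return np.sum((spectrum - synth) ** 2)

def initial_pitch(segment, p_min=20, p_max=120):
    """Evaluate the error for integer pitch periods and pick the minimum."""
    errors = {p: pitch_error(segment, p) for p in range(p_min, p_max + 1)}
    return min(errors, key=errors.get)
```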

SLIDE 16

More about pitch estimation (Cont’d)

[Figure: a speech segment, its original spectrum, the error as a function of pitch period, and the original vs. synthetic spectra for P = 42.48 and P = 42.]

SLIDE 17

V/UV Decision

The voiced/unvoiced decision for each harmonic is made by comparing the normalized error over each harmonic of the estimated fundamental to a threshold. The normalized error over the m-th harmonic interval is the error ε_m divided by the energy of the original spectrum in that interval:

  D_m = ε_m / ( (1/2π) ∫_[a_m, b_m] |S_w(ω)|² dω )

When the normalized error over the m-th harmonic is below the threshold, that harmonic band is marked voiced; otherwise it is marked unvoiced.
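A per-band sketch of this decision rule; the threshold value and the use of FFT-bin indices as band edges are assumptions for illustration.

```python
import numpy as np

def vuv_decisions(S_w, synth, band_edges, threshold=0.2):
    """Binary V/UV decision per harmonic band from the normalized spectral error.
    S_w, synth : original and synthetic complex spectra on the same FFT grid."""
    decisions = []
    for a, b in band_edges:
        err = np.sum(np.abs(S_w[a:b] - synth[a:b]) ** 2)
        energy = np.sum(np.abs(S_w[a:b]) ** 2)
        normalized = err / energy if energy > 0 else np.inf
        decisions.append(normalized < threshold)      # True = voiced
    return decisions
```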

SLIDE 18

Analysis Algorithm Flowchart

1. Start
2. Window the speech segment
3. Compute the error vs. pitch period (autocorrelation approach)
4. Select the initial pitch period (dynamic-programming pitch tracker)
5. Refine the initial pitch period (frequency-domain approach)
6. Make the V/UV decision for each frequency band
7. Select the V/UV spectral envelope parameters for each frequency band
8. Stop

SLIDE 19

Speech Synthesis

• The voiced signal can be synthesized as the sum of sinusoidal oscillators with frequencies at the harmonics of the fundamental and amplitudes set by the spectral envelope parameters (the time-domain method)
• The unvoiced signal can be synthesized as the sum of bandpass-filtered white noise
• The frequency-domain method was selected for synthesizing the unvoiced portion of the synthetic speech

SLIDE 20

Synthesis algorithm block diagram

[Block diagram: the envelope samples and V/UV decisions are separated into voiced and unvoiced envelope samples. Voiced envelope samples drive a bank of harmonic oscillators (with linear interpolation of the parameters) to produce voiced speech. Unvoiced envelope samples replace the envelope of the STFT of a white noise sequence, and weighted overlap-add produces unvoiced speech.]

SLIDE 21

MBE Synthesis algorithm

• First, the spectral envelope samples are separated into voiced or unvoiced spectral envelope samples depending on whether they lie in frequency bands declared voiced or unvoiced
• Voiced envelope samples include both magnitude and phase, whereas unvoiced envelope samples include only the magnitude
• Voiced speech is synthesized from the voiced envelope samples by summing the outputs of a bank of sinusoidal oscillators running at the harmonics of the fundamental frequency:

  ŝ_v(t) = Σ_m A_m(t) cos(θ_m(t))

SLIDE 22

MBE Synthesis algorithm (Voiced)

• The phase function θ_m(t) is determined by an initial phase θ_m(0) and a frequency track ω_m(t) as follows:

  θ_m(t) = θ_m(0) + ∫_0^t ω_m(τ) dτ

• The frequency track ω_m(t) is linearly interpolated between the m-th harmonic of the current frame and that of the next frame over the synthesis interval S:

  ω_m(t) = m ω_0 (S - t)/S + m ω_0' (t/S)

  where ω_0 and ω_0' are the fundamental frequencies of the current and next frame
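A sketch of voiced synthesis with linearly interpolated amplitudes and frequencies; carrying initial phases across frames is simplified to zero here, and the parameter names are illustrative assumptions.

```python
import numpy as np

def synthesize_voiced(A_prev, A_next, w0_prev, w0_next, frame_len):
    """Bank of harmonic oscillators whose amplitudes and frequencies are linearly
    interpolated between the current and next frame (w0 in radians per sample).
    The real algorithm also carries the initial phase of each harmonic over from
    the previous frame; phases start at zero in this sketch."""
    t = np.arange(frame_len)
    alpha = t / frame_len                            # interpolation factor in [0, 1)
    out = np.zeros(frame_len)
    for m in range(1, min(len(A_prev), len(A_next)) + 1):
        amp = (1 - alpha) * A_prev[m - 1] + alpha * A_next[m - 1]
        omega = (1 - alpha) * m * w0_prev + alpha * m * w0_next
        phase = np.cumsum(omega)                     # discrete version of the phase integral
        out += amp * np.cos(phase)
    return out
```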

SLIDE 23

MBE Synthesis algorithm (Unvoiced)

• Unvoiced speech is synthesized from the unvoiced envelope samples by first synthesizing a white noise sequence
• For each frame, the white noise sequence is windowed and an FFT is applied to produce samples of the Fourier transform
• In each unvoiced frequency band, the noise transform samples are normalized to have unity magnitude. The unvoiced spectral envelope is constructed by linearly interpolating between the envelope samples |A_m(t)|
• The normalized noise transform is multiplied by the spectral envelope to produce the synthetic transform. The synthetic transforms are then used to synthesize unvoiced speech using the weighted overlap-add method (a sketch follows)
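A sketch of the frequency-domain unvoiced synthesis for a single frame; the envelope is assumed to already be interpolated onto the FFT grid, band edges are FFT-bin indices, and combining successive frames by weighted overlap-add is only noted in a comment.

```python
import numpy as np

def synthesize_unvoiced_frame(envelope, unvoiced_bands, frame_len):
    """Window white noise, FFT, normalize to unity magnitude in unvoiced bands,
    apply the interpolated envelope, and inverse-transform one frame.
    envelope must have length frame_len // 2 + 1 (the rFFT grid)."""
    window = np.hamming(frame_len)
    spectrum = np.fft.rfft(np.random.randn(frame_len) * window)
    mags = np.abs(spectrum)
    mags[mags == 0] = 1.0                             # avoid division by zero
    shaped = np.zeros_like(spectrum)
    for a, b in unvoiced_bands:                       # only unvoiced bands contribute
        shaped[a:b] = (spectrum[a:b] / mags[a:b]) * envelope[a:b]
    frame = np.fft.irfft(shaped, n=frame_len) * window
    return frame   # successive frames are combined by weighted overlap-add
```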

SLIDE 24

MBE Synthesis (Cont’d)

• The final synthesized speech is generated by summing the voiced and unvoiced synthesized speech signals

[Diagram: voiced speech + unvoiced speech = synthesized speech]

SLIDE 25

Bit Allocation

Parameter                Bits
Fundamental frequency    9
Harmonic magnitudes      94-139
Harmonic phases          0-45
Voiced/unvoiced bits     12
Total                    160

SLIDE 26

Advanced MBE (AMBE)

MBE coding rate: 2400 bps
AMBE coding rate: 1200/2400 bps

Four new features:

1. Enhanced V/UV decision
2. Initial pitch detection
3. Refined pitch determination
4. Dual-rate coding

SLIDE 27

Enhanced V/UV decision

• The whole speech frequency band is divided into 4 subbands for the 2.4 kbps vocoder and 2 subbands for the 1.2 kbps vocoder
• That is, only 4 bits and 2 bits are used to encode the V/UV decisions for the 2.4 kbps and 1.2 kbps vocoders respectively

SLIDE 28

Initial pitch detection

• MBE takes 2 steps to detect the refined initial pitch period:
  – A spectrum-matching technique to find the initial pitch period
  – A DTW-based (Dynamic Time Warping) technique to smooth the estimation
• The computational complexity of this approach is very high
• In AMBE, a modified three-level center-clipped autocorrelation method is used to detect the initial pitch period, together with a simple smoothing method to correct pitch errors (see the sketch below)
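A minimal sketch of three-level center clipping followed by autocorrelation-based pitch detection; the clipping ratio and the pitch search range are assumed example values.

```python
import numpy as np

def center_clip_3level(x, clip_ratio=0.6):
    """Three-level center clipping: +1 above +c, -1 below -c, 0 in between,
    where c is a fraction of the frame's peak amplitude."""
    c = clip_ratio * np.max(np.abs(x))
    return np.where(x > c, 1.0, np.where(x < -c, -1.0, 0.0))

def initial_pitch_autocorr(frame, p_min=20, p_max=120):
    """Initial pitch period from the autocorrelation of the clipped signal."""
    y = center_clip_3level(frame)
    r = np.correlate(y, y, mode="full")[len(y) - 1:]   # non-negative lags only
    lag = p_min + int(np.argmax(r[p_min:p_max + 1]))   # strongest peak in the range
    return lag
```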

SLIDE 29

Refined pitch determination

• To find the best pitch, the basic method is to assume a pitch period and compute the error between the original speech spectrum and the shaped voiced speech spectrum
• The candidate pitch for which the spectrum error is minimum is chosen as the final pitch
• To reduce the computational complexity, AMBE uses a 256-point FFT to get the speech spectrum, and a 5-point window spectrum is used to form the voiced harmonic spectrum
• To get the refined pitch, AMBE performs the spectrum-matching process seven times. Each time, AMBE first sets a candidate pitch, then shapes a harmonic spectrum over the whole frequency band according to the candidate pitch and the window spectrum; an error is calculated by subtracting the shaped spectrum from the speech spectrum. After the seven matching passes, the refined pitch can easily be determined (a sketch follows)
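A sketch of the spectrum-matching refinement with seven candidate periods around the initial estimate; the candidate spacing and the crude harmonic shaping (sampling the spectrum at harmonic bins rather than using a 5-point window spectrum) are simplifying assumptions.

```python
import numpy as np

def refine_pitch(frame, p0, fs=8000, n_fft=256, candidates=7):
    """Try a few candidate periods around the initial estimate p0, shape a
    harmonic spectrum for each, and keep the one with the smallest error."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    best_p, best_err = p0, np.inf
    for p in np.linspace(p0 - 1.5, p0 + 1.5, candidates):   # seven candidate periods
        f0 = fs / p
        shaped = np.zeros_like(spectrum)
        for m in range(1, int((fs / 2) // f0) + 1):
            k = np.argmin(np.abs(freqs - m * f0))
            shaped[k] = spectrum[k]                          # crude harmonic shaping
        err = np.sum((spectrum - shaped) ** 2)
        if err < best_err:
            best_p, best_err = p, err
    return best_p
```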

SLIDE 30

Dual rate coding

Parameter                2400 bps   1200 bps
Pitch quantization       8          6
V/UV decision            4          2
Amplitude quantization   41         19
Total                    53         27

SLIDE 31

Improved MBE (IMBE)

• A 2400 bps coder based on MBE
• Substantially better than the U.S. government standard LPC-10e
• The parameters of the MBE speech model:
  – the fundamental frequency
  – voiced/unvoiced information
  – the spectral envelope

SLIDE 32

IMBE algorithm

• Estimate the excitation and system parameters which minimize the distance between the original and synthetic speech spectra (analysis by synthesis)
• Once these parameters are estimated, voiced/unvoiced decisions are made by comparing the spectral error over a series of harmonics to a prescribed threshold

SLIDE 33

IMBE block diagram

[Figure: IMBE algorithm block diagram]

SLIDE 34

IMBE Coding

• IMBE is offered at 2.4, 4.8 and 8.0 kbps
• The analysis and synthesis routines are the same except for the bit allocation
• The fundamental frequency needs an accuracy of about 1 Hz and requires about 9 bits per frame
• The V/UV decisions are encoded with one bit per decision
• The remaining bits are allocated to error control and the spectral envelope information