MELP Vocoder

Outline Introduction  MELP Vocoder Features  Algorithm Description  Parameters & Comparison  Page 1 of 23

Introduction
- Traditional pitch-excited LPC vocoders drive the synthesis filter with either a periodic pulse train or white noise
- This gives intelligible speech at very low bit rates
- But the result sometimes sounds mechanical or buzzy and is prone to tonal noise

Introduction
- These problems arise from the inability of a simple pulse train to reproduce all kinds of voiced speech
- The MELP vocoder uses a mixed-excitation model that represents a richer ensemble of speech characteristics
- It produces more natural-sounding speech

MELP vocoder Robust in background  Mixed noise environments excitation Aperiodic pulses Based on traditional LPC  model, also includes Pulse dispersion additional features Adaptive spectral enhancement Page 4 of 23

MELP Vocoder: Encoder
[Block diagram; recoverable labels: LPC, LSF, MSVQ]

MELP Vocoder: Computing the Fourier transform magnitudes
- FFT

MELP Vocoder: Computing the voicing strengths and setting the aperiodic flag
- L = 40, 41, ..., 160

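MELP Vocoder: Peakiness (spread of the peak points)
$$p = \frac{\sqrt{\frac{1}{160}\sum_{n=-80}^{79} e^2[n]}}{\frac{1}{160}\sum_{n=-80}^{79} \left|e[n]\right|}$$
[Figure: four example signals, labeled P = 12.64, P = 6.77, P = 1.16 and P = 1.1]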

Mixed Excitation
- Mixed excitation is implemented using a multi-band mixing model
- This model can simulate frequency-dependent voicing strength
- The excitation is a mixture of periodic or aperiodic pulses and white noise
- The primary effect of this unit is to reduce the buzz in broadband acoustic noise
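The idea of frequency-dependent voicing can be illustrated with a minimal sketch. It assumes each band carries a voicing strength between 0 and 1, that the pulse component of a band is weighted by that strength, and that the noise component is weighted by one minus the strength. The function name `band_mixing_gains`, the linear weighting and the five example strengths are illustrative assumptions, not values from the slides.

```python
def band_mixing_gains(voicing_strengths):
    """Return one (pulse_gain, noise_gain) pair per frequency band.

    Assumption: a linear split, pulse_gain = v and noise_gain = 1 - v,
    with v clamped to [0, 1]. The slides do not specify the exact weighting.
    """
    gains = []
    for v in voicing_strengths:
        v = min(1.0, max(0.0, v))  # keep the strength in a valid range
        gains.append((v, 1.0 - v))
    return gains


# Hypothetical example: strongly voiced low bands, noisy high bands.
example_strengths = [1.0, 0.8, 0.5, 0.2, 0.0]
for band, (pulse_gain, noise_gain) in enumerate(band_mixing_gains(example_strengths)):
    print(f"band {band}: pulse gain {pulse_gain:.2f}, noise gain {noise_gain:.2f}")
```

In a full synthesizer the two gains would scale bandpass-filtered pulse and noise signals before they are summed; only the gain computation is shown here.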

Aperiodic pulses
- When the input signal is voiced, the MELP vocoder can synthesize speech using either aperiodic or periodic pulses
- Aperiodic pulses are used during transition regions between voiced and unvoiced segments of the speech signal
- They produce erratic glottal pulses without tonal noise
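A rough sketch of how periodic and aperiodic pulse trains differ: in the aperiodic case each pitch period is perturbed by a random amount before the next pulse is placed. The 180-sample frame (22.5 ms at an assumed 8 kHz sampling rate), the `jitter` fraction of 0.25 and the function name are assumptions for illustration; the slides give none of these values.

```python
import random


def pulse_positions(pitch_period, frame_length=180, aperiodic=False,
                    jitter=0.25, rng=None):
    """Return the sample indices at which excitation pulses start.

    jitter is the maximum fractional deviation of each period; it is a
    hypothetical default, as is the 180-sample frame length.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    positions = []
    t = 0.0
    while t < frame_length:
        positions.append(int(t))
        period = float(pitch_period)
        if aperiodic:
            # Perturb this period by up to +/- jitter of its nominal length.
            period *= 1.0 + rng.uniform(-jitter, jitter)
        t += period
    return positions


print(pulse_positions(60))                  # evenly spaced pulses
print(pulse_positions(60, aperiodic=True))  # irregular spacing
```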

Pulse Dispersion
- Pulse dispersion is implemented using a fixed pulse dispersion filter based on a flattened triangle pulse
- The pulse dispersion filter improves the match between bandpass-filtered synthetic and natural speech waveforms in frequency bands that do not contain a formant resonance
- It spreads the excitation energy within a pitch period
- It reduces the harsh quality of the synthetic speech

Adaptive spectral enhancement filter
- Based on the poles of the vocal tract filter
- Used to enhance the formant structure in the synthetic speech
- This filter improves the match between synthetic and natural bandpass waveforms
- The result is more natural speech output

MELP Algorithm Description (Encoder)
1. Filter out any low-frequency noise
2. The filtered speech is filtered again in order to perform the initial pitch search for the pitch estimation
3. Perform the bandpass voicing analysis
   - This step decides whether to use a periodic or aperiodic pulse train or the white-noise model

MELP Algorithm Description (Encoder) cont'd
- In this stage a voicing-degree parameter is estimated in each band, based on the normalized correlation function of the speech signal and, in the non-DC bands, of the smoothed rectified signal
- Let s_k(n) denote the speech signal in band k, and u_k(n) the DC-removed smoothed rectified signal of s_k(n)
- The correlation function:
$$R_x(p) = \frac{\sum_{n=0}^{N-1} x(n)\,x(n+p)}{\left[\sum_{n=0}^{N-1} x^2(n)\,\sum_{n=0}^{N-1} x^2(n+p)\right]^{1/2}}$$
- P: the pitch of the current frame
- N: the frame length
- The voicing strength for band k is defined as max(R_{s_k}(P), R_{u_k}(P))
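The normalized correlation and the per-band voicing strength can be sketched as follows. The sketch assumes the lag is applied as x(n + p) (the sign is not legible in the transcript) and that the caller has already produced the band signal s_k and its DC-removed smoothed rectified version u_k; the function names are illustrative.

```python
import math


def normalized_correlation(x, p, n_len):
    """Normalized correlation of x at lag p over n = 0 .. n_len - 1.

    x must contain at least n_len + p samples. Returns 0.0 when either
    segment has no energy, to avoid dividing by zero.
    """
    num = sum(x[n] * x[n + p] for n in range(n_len))
    energy_0 = sum(x[n] ** 2 for n in range(n_len))
    energy_p = sum(x[n + p] ** 2 for n in range(n_len))
    den = math.sqrt(energy_0 * energy_p)
    return num / den if den > 0.0 else 0.0


def band_voicing_strength(s_k, u_k, pitch, n_len):
    """Voicing strength of band k: the larger of the two correlations."""
    return max(normalized_correlation(s_k, pitch, n_len),
               normalized_correlation(u_k, pitch, n_len))


# Hypothetical check: a sinusoid with a 50-sample period correlates
# almost perfectly with itself one period later.
signal = [math.sin(2.0 * math.pi * n / 50.0) for n in range(300)]
print(round(normalized_correlation(signal, 50, 200), 6))
```

A value close to 1 indicates strong periodicity at the candidate pitch, while a value near 0 suggests the band is better modeled as noise.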

MELP Algorithm Description (Encoder) cont'd
- The jittery state is determined by the peakiness of the full-wave rectified LP residual e(n):
$$\text{Peakiness} = \frac{\left[\frac{1}{N}\sum_{n=0}^{N-1} e^2(n)\right]^{1/2}}{\frac{1}{N}\sum_{n=0}^{N-1} \left|e(n)\right|}$$
- If the peakiness is greater than some threshold, the speech frame is flagged as jittered (the aperiodic flag is set)
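Peakiness is the ratio of the RMS value of the residual to its mean absolute value, which a few lines of code make concrete. This is a minimal sketch; the `threshold` argument of `is_jittered` is deliberately left to the caller because the slides do not state a value.

```python
import math


def peakiness(e):
    """Ratio of the RMS value to the mean absolute value of residual e."""
    n = len(e)
    rms = math.sqrt(sum(v * v for v in e) / n)
    mean_abs = sum(abs(v) for v in e) / n
    return rms / mean_abs if mean_abs > 0.0 else 0.0


def is_jittered(e, threshold):
    """Flag the frame as jittered when peakiness exceeds the threshold.

    threshold is a caller-supplied value (hypothetical; not given in the slides).
    """
    return peakiness(e) > threshold


# A single pulse in 160 samples is very peaky; a constant signal is not.
single_pulse = [0.0] * 159 + [1.0]
constant = [1.0] * 160
print(round(peakiness(single_pulse), 2), round(peakiness(constant), 2))
```

For a single unit pulse in N samples the ratio equals the square root of N, so peakiness grows as the residual energy concentrates in fewer samples.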

MELP Algorithm Description (Encoder) cont'd
4. Apply an LPC analysis
5. Calculate the final pitch estimate
6. Calculate the gain estimate
7. Quantize the LPC coefficients, pitch, gain and bandpass voicing
8. Determine and quantize the Fourier magnitudes
   - The information in these coefficients improves the accuracy of the speech production model at the perceptually important lower frequencies

MELP Encoder
[Block diagram. Blocks: Input signal; Pre-filter; Pitch search; Bandpass voicing decision; Gain calculator; LPC analysis filter; Final pitch and voicing decision; Quantize gain, pitch, voicing, jitter; LSF quantization; Fourier magnitude calculation; Apply forward error correction; Transmitted bitstream]

MELP Algorithm (Decoder)
1. Decode the pitch
2. Apply gain attenuation
3. Linearly interpolate all of the synthesis parameters pitch-synchronously
4. Generate the mixed excitation

MELP Algorithm (Decoder) cont'd
5. Apply an adaptive spectral enhancement filter
6. Perform LPC synthesis and apply the gain factor
7. Apply dispersion filtering
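The linear, pitch-synchronous interpolation in decoder step 3 can be sketched as a weighted average whose weight depends on where a pitch period starts within the frame. The 180-sample frame length (22.5 ms at an assumed 8 kHz sampling rate) and the function name are assumptions for illustration.

```python
def interpolate_parameter(prev_value, curr_value, pulse_start, frame_length=180):
    """Linearly interpolate one synthesis parameter for one pitch period.

    pulse_start is the sample index, within the frame, where the pitch
    period begins. frame_length = 180 is an assumed value.
    """
    alpha = pulse_start / frame_length
    alpha = min(1.0, max(0.0, alpha))  # stay between the two frame values
    return (1.0 - alpha) * prev_value + alpha * curr_value


# Hypothetical example: the pitch moves from 40 to 60 samples across a frame.
for start in (0, 45, 90, 135):
    print(start, interpolate_parameter(40.0, 60.0, start))
```

Evaluating the weight once per pitch period, rather than once per sample, keeps each synthesized period internally consistent.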

MELP Decoder
[Block diagram. Blocks: Received bitstream; Decode parameters; Noise generator; Noise shaping filter; Pulse generator; Pulse position jitter; Pulse shaping filter; Summation (+); Adaptive spectral enhancement; LPC synthesis filter; Gain; Pulse dispersion filter; Synthesized speech]

Parameter Quantization
Parameters                    Voiced   Unvoiced
LSF parameters                25       25
Fourier magnitudes            8        -
Gain (2 per frame)            8        8
Pitch, overall voicing        7        7
Bandpass voicing              4        -
Aperiodic flag                1        -
Error protection              -        13
Sync bit                      1        1
Total bits / 22.5 ms frame    54       54
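The bit budget can be checked with a short calculation: both columns should sum to 54 bits, and 54 bits every 22.5 ms corresponds to 2400 bits per second. The dictionary below copies the table values, with 0 standing in for the "-" entries; the variable names are illustrative.

```python
# Bits per frame as (voiced, unvoiced); 0 replaces "-" in the table.
BIT_ALLOCATION = {
    "LSF parameters": (25, 25),
    "Fourier magnitudes": (8, 0),
    "Gain (2 per frame)": (8, 8),
    "Pitch, overall voicing": (7, 7),
    "Bandpass voicing": (4, 0),
    "Aperiodic flag": (1, 0),
    "Error protection": (0, 13),
    "Sync bit": (1, 1),
}
FRAME_MS = 22.5  # frame duration from the table

voiced_total = sum(v for v, _ in BIT_ALLOCATION.values())
unvoiced_total = sum(u for _, u in BIT_ALLOCATION.values())
bit_rate = voiced_total * 1000 / FRAME_MS  # bits per second

print(voiced_total, unvoiced_total, bit_rate)
```

The 13 bits that an unvoiced frame does not need for Fourier magnitudes, bandpass voicing and the aperiodic flag are the same 13 bits listed as error protection.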

Bit transmission order

Comparison of the 2400 bps MELP with other Standard Coders
- Diagnostic Acceptability Measure
- Two conditions:
  - Quiet
  - Office
- Coders compared:
  - Continuously Variable Slope Delta Modulation (CVSD), 16,000 bps
  - Code Excited Linear Prediction (CELP), 4800 bps, FS1016
  - Mixed Excitation Linear Prediction (MELP), 2400 bps
  - Linear Predictive Coding (LPC), 2400 bps, FIPS Publication 137

Comparison of the 2400 bps MELP with other Standard Coders (cont'd)
- Mean Opinion Score in six conditions:
  - Quiet: anechoic sound chamber, dynamic microphone
  - Quiet - H250: anechoic sound chamber, H250 microphone
  - 1% random bit errors: anechoic sound chamber, dynamic microphone
  - 0.5% random block errors: anechoic sound chamber, dynamic microphone, 50% errors within a 35 ms block
  - Office: modern office environment, dynamic microphone
  - Mobile command environment: field shelter, EV M87 microphone

Comparison of the 2400 bps MELP with other Standard Coders (cont'd)
- Complexity with three measurements:
  - RAM
  - ROM
  - MIPS

Voice samples
- LPC 10

Voice samples
- Original Sound
- MELP 1800
- MELP 2000
- MELP 2200
