 
DESIGN OF A CELP CODER AND A STUDY OF ITS PERFORMANCE USING VARIOUS QUANTIZATION METHODS
EECS 651: PROJECT PRESENTATION
UNIVERSITY OF MICHIGAN, ANN ARBOR
APRIL 18, 2005
By Awais M. Kamboh, Krispian C. Lawrence, Aditya M. Thomas, Philip I. Tsai
PROJECT GOALS
• To design and implement a CELP coder in MATLAB
• To use different quantization methods to quantize the LP parameters of the coder
• To evaluate the performance of the coder in terms of MSE and ‘perceptual MSE’ using the various methods of quantization
Presentation Outline
• Introduction to Speech coding
• CELP
• CELP coder
• Quantization Methods
• Results and Comparisons
• Conclusions and recommendations
• Q&A
Introduction to Speech Coding
• Concerned with obtaining a compact digital representation of voice signals for more efficient transmission or smaller storage size.
• Objective is to represent the speech signal with the minimum number of bits yet maintain the perceptual quality.
Speech Production
• Speech
  – Air pushed from the lungs past the vocal cords and along the vocal tract
  – The basic vibrations: vocal cords
  – The sound is altered by the disposition of the vocal tract (tongue and mouth)
• Model the vocal tract as a filter
  – The shape changes relatively slowly
• The vibrations at the vocal cords
  – The excitation signal
Speech sounds
• Voiced sound
  – The vocal cords vibrate open and close
  – Quasi-periodic pulses of air
  – The rate of the opening and closing: the pitch
• Unvoiced sounds
  – Forcing air at high velocities through a constriction
  – Noise-like turbulence
  – Show little long-term periodicity
  – Short-term correlations still present
• Plosive sounds
  – A complete closure in the vocal tract
  – Air pressure is built up and released suddenly
Code-Excited Linear Predictor (CELP)
• Variants of CELP (LD-CELP, ACELP, etc.)
• The variants differ mainly in how the excitation signal is generated, in the filters, and in the bit rate.
• Performance
  – Bit rates of 4 kbps or lower give synthetic-quality (mechanical) speech.
  – Most modern CELP variants operate at relatively higher bit rates and produce good quality speech.
  – Performance cannot be judged by MSE alone.
Linear Predictive Coding
• Lungs generate an excitation signal which is modeled as white noise.
• Vocal cords either remain open or vibrate with some frequency, called ‘Pitch’.
• The resulting speech is either unvoiced or voiced respectively.
• Vocal tract acts as an IIR filter.
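The "vocal tract as an IIR filter" idea can be sketched as an all-pole recursion. This is a minimal illustration, not the project's MATLAB code. It assumes the sign convention $A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}$, which the slides do not state, and the function name `all_pole_filter` is hypothetical.

```python
def all_pole_filter(excitation, a):
    """Filter `excitation` through 1/A(z).

    Assumed convention (not given in the slides):
    A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p,
    so y[n] = x[n] - sum_k a[k-1] * y[n-k]. The filter starts from rest.
    """
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y -= ak * out[n - k]
        out.append(y)
    return out


# One-pole example: A(z) = 1 - 0.5 z^-1 driven by a unit impulse.
impulse_response = all_pole_filter([1.0, 0.0, 0.0, 0.0], [-0.5])
print(impulse_response)  # [1.0, 0.5, 0.25, 0.125]
```

In an LPC model the excitation would be white noise for unvoiced speech or a quasi-periodic pulse train for voiced speech; the impulse is used here only to make the recursion easy to check by hand.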
CELP Parameters (In this Implementation)
• Excitation Signal: A number of signals are stored in a codebook. We choose the signal that best suits a particular chunk of data (frame).
• LP Coefficients: The coefficients of the vocal tract filter.
• Gain: Represents the loudness/energy of speech.
• Pitch Filter Coefficient: We determine pitch by modeling it as a long delay correlation filter which produces quasi-periodic signals when excited.
• Pitch: Pitch of the sound, in the range 50 Hz to 500 Hz. In this case it is referred to as Pitch Delay, measured in number of samples.
Rate of CELP
Frame Size: 160 samples (20 ms)
Subframe Size: 40 samples (5 ms)
LP coefficients are transmitted once per frame. All others are transmitted once per subframe.
Code Book: 512 entries; 9 bits
Gain: generally between -2 and +2; 8 bits
Pitch: 50 Hz to 500 Hz => 16 to 160 samples (at 8 kHz sampling); 8 bits
Pitch Filter Coeff: 0 to 1.4; 6 bits
LP Coefficients: different for different rates.
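The rate implied by these numbers can be worked out as a short calculation. The sketch below uses only the frame sizes and bit allocations listed on this slide. The LP bit budget is left as a parameter because the slide says it varies with the rate, and the value 80 in the usage line is purely hypothetical.

```python
SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 160
SUBFRAME_SAMPLES = 40

# Bits sent once per subframe, as listed on the slide.
BITS_PER_SUBFRAME = {
    "codebook_index": 9,     # 512 entries
    "gain": 8,
    "pitch_delay": 8,
    "pitch_filter_coeff": 6,
}

SUBFRAMES_PER_FRAME = FRAME_SAMPLES // SUBFRAME_SAMPLES   # 4
FRAMES_PER_SECOND = SAMPLE_RATE_HZ // FRAME_SAMPLES       # 50


def celp_bit_rate(lp_bits_per_frame):
    """Total bit rate in bits/s for a given LP bit budget per frame."""
    per_frame = SUBFRAMES_PER_FRAME * sum(BITS_PER_SUBFRAME.values())
    return (per_frame + lp_bits_per_frame) * FRAMES_PER_SECOND


# Pitch range in samples at 8 kHz: 500 Hz -> 16 samples, 50 Hz -> 160 samples.
min_delay = SAMPLE_RATE_HZ // 500
max_delay = SAMPLE_RATE_HZ // 50

print(celp_bit_rate(0))    # 6200 bits/s before any LP coefficient bits
print(celp_bit_rate(80))   # 10200 bits/s with a hypothetical 80 LP bits per frame
```

Everything except the LP coefficients therefore costs 4 x 31 = 124 bits per frame, or 6200 bits/s; the choice of LP quantizer determines the rest.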
CELP Encoder
[Block diagram: Speech -> LP Analyzer -> LP Coefficients ‘a’. Code Book -> Excitation Sequence $e_k$ -> Gain (X) -> Pitch Filter $\frac{1}{1 - b z^{-P}}$ -> Reconstruction Filter $\frac{1}{A(z)}$ -> subtracted from Speech -> Perceptual Filter $\frac{A(z)}{A(z/c)}$ -> Energy E -> Select Min.]
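The encoder loop in this diagram (synthesize each candidate excitation, weight the error, keep the minimum-energy choice) can be sketched as an exhaustive search. This is a simplified illustration under stated assumptions: the synthesis chain and the perceptual filter are passed in as callables, the gain and pitch parameters are taken as already fixed inside `synthesize`, and the names `search_codebook`, `synthesize` and `weight` are hypothetical.

```python
def search_codebook(target, codebook, synthesize, weight):
    """Return (index, energy) of the codeword with minimum weighted error energy.

    target     : speech samples of the current block
    codebook   : list of candidate excitation sequences
    synthesize : callable mapping an excitation to reconstructed speech
    weight     : callable applying the perceptual filter to an error signal
    """
    best_index, best_energy = None, float("inf")
    for k, codeword in enumerate(codebook):
        reconstructed = synthesize(codeword)
        error = [t - r for t, r in zip(target, reconstructed)]
        energy = sum(e * e for e in weight(error))
        if energy < best_energy:
            best_index, best_energy = k, energy
    return best_index, best_energy


# Toy usage with identity filters standing in for the real synthesis chain.
def identity(x):
    return list(x)

toy_codebook = [[0.0, 0.0, 0.0], [1.0, 0.0, -1.0], [1.0, 1.0, 1.0]]
index, energy = search_codebook([1.0, 0.0, -1.0], toy_codebook, identity, identity)
print(index, energy)  # 1 0.0
```

A full coder would also search over the gain, pitch delay and pitch filter coefficient; only the codebook index is searched here to keep the sketch short.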
CELP Encoder (Contd.)
[Block diagram: Gain ‘G’, Pitch Filter Coefficient ‘b’, Pitch Delay ‘P’ and Excitation Sequence ‘k’ -> Scalar Quantizer; Linear Predictor Coefficients ‘a’ -> SQ / VQ / DPCM; quantizer outputs -> Binary Encoded Data.]
CELP Decoder
[Block diagram: Binary Decoding recovers Gain ‘G’, Pitch Filter Coefficient ‘b’, Pitch Delay ‘P’, Excitation Sequence ‘k’ and Linear Predictor Coefficients ‘a’. Excitation $e_k$ -> Gain (X) -> Pitch Filter $\frac{1}{1 - b z^{-P}}$ -> Reconstruction Filter $\frac{1}{A(z)}$ -> Reconstructed Speech.]
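The decoder chain (codeword, gain, pitch filter $1/(1 - b z^{-P})$, then $1/A(z)$) can be sketched for a single block as follows. This is a zero-state illustration, not the project's implementation: it assumes $A(z) = 1 + \sum_k a_k z^{-k}$, and it ignores the filter memory that a real decoder carries across subframes. The memory matters in practice because the pitch delay (16 to 160 samples) is often longer than a 40-sample subframe. The function name `celp_synthesize` is hypothetical.

```python
def celp_synthesize(codeword, gain, b, delay, a):
    """Zero-state CELP synthesis of one block.

    codeword : excitation sequence e_k taken from the codebook
    gain     : scalar gain G
    b, delay : pitch filter coefficient and pitch delay P (in samples)
    a        : LP coefficients a_1..a_p, assuming A(z) = 1 + sum a_k z^-k
    """
    # Pitch filter 1/(1 - b z^-P): u[n] = G*e[n] + b*u[n-P]
    u = []
    for n, e in enumerate(codeword):
        past = u[n - delay] if n - delay >= 0 else 0.0
        u.append(gain * e + b * past)

    # Reconstruction filter 1/A(z): s[n] = u[n] - sum_k a_k * s[n-k]
    s = []
    for n, x in enumerate(u):
        feedback = sum(ak * s[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0)
        s.append(x - feedback)
    return s


# With no LP filter, the pitch filter alone repeats the scaled pulse every P samples.
print(celp_synthesize([1, 0, 0, 0, 0], 2.0, 0.5, 2, []))  # [2.0, 0.0, 1.0, 0.0, 0.5]
```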
Perceptual Filtering
$H(z) = \frac{A(z)}{A(z/c)}$, c = 0.8
[Plot of filter responses versus Frequency (Hz): red = $\frac{1}{A(z)}$, green = $\frac{1}{A(z/c)}$, blue = $\frac{A(z)}{A(z/c)}$.]
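The weighting filter $H(z) = A(z)/A(z/c)$ is simple to build once $A(z)$ is known, because replacing $z$ by $z/c$ multiplies the k-th coefficient by $c^k$. The sketch below assumes $A(z)$ is given as the full coefficient list `[1, a_1, ..., a_p]` with $A(z) = \sum_k a_k z^{-k}$; the function names are hypothetical.

```python
def bandwidth_expand(a_poly, c):
    """Coefficients of A(z/c) given those of A(z) = sum_k a_poly[k] z^-k."""
    return [coef * c ** k for k, coef in enumerate(a_poly)]


def perceptual_weight(signal, a_poly, c=0.8):
    """Filter `signal` through H(z) = A(z) / A(z/c), starting from rest."""
    num = a_poly                       # numerator A(z)
    den = bandwidth_expand(a_poly, c)  # denominator A(z/c), den[0] == a_poly[0] == 1
    out = []
    for n in range(len(signal)):
        y = sum(num[k] * signal[n - k] for k in range(len(num)) if n - k >= 0)
        y -= sum(den[k] * out[n - k] for k in range(1, len(den)) if n - k >= 0)
        out.append(y)
    return out


expanded = bandwidth_expand([1.0, -0.5, 0.25], 0.8)  # approximately [1.0, -0.4, 0.16]
unchanged = perceptual_weight([1.0, 0.0, 0.0], [1.0, -0.5], c=1.0)
```

With c = 1 the numerator and denominator cancel and the filter passes the signal through unchanged, which is a convenient sanity check; values of c below 1 broaden the formant peaks of $1/A(z/c)$.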
Perceptual Filtering (Contd.)
[Plot: $\frac{A(z)}{A(z/c)}$ for different values of ‘c’ in the perceptual filter.]
Performance of CELP (Unquantized)
mse = 0.0041
[Plot: unquantized CELP output compared with the original speech.]
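For reference, a mean squared error between an original and a reconstructed signal can be computed as below. This is the standard sample-by-sample definition; the slides do not say how the signals were scaled or aligned before the reported value was measured, so the sketch should not be read as the exact procedure behind that number.

```python
def mse(original, reconstructed):
    """Mean squared error between two equal-length sample sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    if not original:
        raise ValueError("signals must not be empty")
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


example_error = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # (0 + 0 + 4) / 3
```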
Performance of CELP (Quantized)
mse = 0.0120
LP Coefficients: Unquantized
Other Parameters: Quantized
Quantization Methods Used
• Scalar Quantization
• DPCM
• Vector Quantization
• TSVQ
Scalar Quantization
• Quantize one sample at a time
• The simplest quantization scheme
• Design quantizers with sizes M = 2, 4, 8, 16, 32, 64, 128, 256 (1 to 8 bits per sample)
Scalar Quantizer Design
• Lloyd algorithm
• Initial guess: a uniform codebook
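The Lloyd iteration with a uniform initial codebook can be sketched as follows. This is a minimal illustration under assumptions the slides do not specify: the uniform codebook spans the range of the training data, empty cells keep their previous level, and iteration stops when the codebook no longer changes or after a fixed number of passes. The function names are hypothetical.

```python
def lloyd_scalar_quantizer(training, m, max_iterations=100):
    """Design an m-level scalar quantizer with the Lloyd algorithm."""
    lo, hi = min(training), max(training)
    step = (hi - lo) / m
    # Initial guess: uniform codebook (cell midpoints over the data range).
    codebook = [lo + (i + 0.5) * step for i in range(m)]

    for _ in range(max_iterations):
        # Nearest-neighbour step: assign each sample to its closest level.
        cells = [[] for _ in range(m)]
        for x in training:
            nearest = min(range(m), key=lambda j: (x - codebook[j]) ** 2)
            cells[nearest].append(x)
        # Centroid step: move each level to the mean of its cell.
        updated = [sum(cell) / len(cell) if cell else codebook[i]
                   for i, cell in enumerate(cells)]
        if updated == codebook:
            break
        codebook = updated
    return codebook


def quantize(x, codebook):
    """Index of the codebook level nearest to x."""
    return min(range(len(codebook)), key=lambda j: (x - codebook[j]) ** 2)


levels = lloyd_scalar_quantizer([0.0, 0.1, 0.9, 1.0], 2)  # approximately [0.05, 0.95]
```

For the toy data the uniform guess [0.25, 0.75] already separates the two clusters, so a single centroid update reaches the fixed point.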
Scalar Quantizer Design
• Training data: 15000 samples of LP coefficients generated from different speech sources
  – 15000/256 ≈ 58 points/cell for M = 256
  – 15000/2 = 7500 points/cell for M = 2
Performance of the SQ