
Speech Information Processing, Akinori Ito (PowerPoint presentation)



  1. Sound Media Engineering Part II: Speech Information Processing
    Akinori Ito, Graduate School of Engineering, Tohoku Univ.
    aito@fw.ipsj.or.jp

  2. Overview of the lecture
    ● #1: Production and coding of speech (1)
      – Speech production, features of speech sound
      – Basic codecs: PCM, DPCM, ADPCM
    ● #2: Coding of speech (2)
      – Linear prediction of speech: Linear Prediction Coefficients, PARCOR coefficients and LSP
      – CELP coding
      – Audio coding
    ● #3: Speech enhancement
      – Spectral subtraction
      – Microphone array

  3. Production of speech
    ● Organs that produce speech
      – vocal cords
      – larynx
      – pharynx
      – tongue
      – gums
      – teeth
      – lips
      – nasal cavity
    [Figure: speech organs, with the passage from the larynx to the lips labeled "vocal tract"]

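  4. Acoustic tube model
    ● Human speech production is similar to a wind instrument
    [Figure: nasal cavity, larynx, lips, vocal tract and vocal cords drawn as an acoustic tube, annotated with "linguistic content", "pitch of voice" and "personality"]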

  5. Linguistic and speaker features
    [Figure: nasal cavity, larynx, lips, vocal tract, vocal cords]
    ● A speaker can control the shape of this part (highlighted in the figure)

  6. Linguistic and speaker features
    [Figure: nasal cavity, larynx, lips, vocal tract, vocal cords]
    ● A speaker cannot control the shape of this part (highlighted in the figure) or the total length of the vocal tract
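Why the total length of the vocal tract matters can be sketched with the textbook idealization of the vocal tract as a uniform tube, closed at the vocal cords and open at the lips, whose resonances (formants) are $F_n = (2n-1)\,c/(4L)$. This simplification is not from the slides; the speed of sound c = 340 m/s and the length L = 17 cm are assumed typical values. Because every resonance scales with 1/L, a length the speaker cannot change shifts all formants together.

```python
def tube_formants(length_m: float, n_formants: int = 3, c: float = 340.0) -> list[float]:
    """Resonances of a uniform tube closed at one end (glottis) and open at the other (lips).

    Quarter-wavelength resonator (an idealization, not a real vocal tract):
    F_n = (2n - 1) * c / (4 * L), with c the assumed speed of sound in m/s.
    """
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]


# Assumed lengths: a longer tract lowers every formant by the same factor.
longer = tube_formants(0.17)
shorter = tube_formants(0.14)
print([round(f) for f in longer])
print([round(f) for f in shorter])
```

With L = 17 cm this gives roughly 500, 1500 and 2500 Hz; real vowels deviate from these values because the speaker reshapes the tube.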

  7. Speech waveform
    ● Speech waveforms are quite complex
    [Figure: waveforms of the vowels /a/, /i/, /u/, /e/, /o/]

  8. Speech waveform
    ● It is complex, but almost periodic
      – Fundamental period: $T$ [s]
      – Fundamental frequency: $F_0 = 1/T$ [Hz]
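Because the waveform is almost periodic, the fundamental period T can be estimated as the lag at which the signal best matches a shifted copy of itself. The sketch below uses autocorrelation, one common pitch-estimation method (not necessarily the one used in this lecture); the search range of 60 to 400 Hz and the synthetic test frame are assumptions.

```python
import numpy as np


def estimate_f0(x: np.ndarray, fs: float, f0_min: float = 60.0, f0_max: float = 400.0) -> float:
    """Estimate F0 = 1/T of an almost-periodic frame by autocorrelation.

    T (in samples) is the lag that maximizes the autocorrelation inside the
    assumed plausible pitch range [f0_min, f0_max].
    """
    x = x - np.mean(x)
    # Autocorrelation for non-negative lags 0 .. len(x) - 1
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(x) - 1)
    period = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))  # T in samples
    return fs / period                                          # F0 = 1/T in Hz


# Synthetic "voiced" frame: harmonics of 125 Hz with decaying amplitude
fs = 8000
t = np.arange(2048) / fs
x = sum(np.sin(2 * np.pi * h * 125 * t) / h for h in range(1, 11))
print(estimate_f0(x, fs))
```

At 8 kHz a 125 Hz voice has a period of 64 samples, so the estimate is 8000 / 64 Hz; real speech needs framing and voicing decisions on top of this.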

  9. Various "a"
    ● Two /a/'s with different fundamental frequencies
      – Same phone = same vocal tract shape
      – Completely different waveforms
      – What do these waveforms have in common?

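  10. Spectrum of speech
    ● Spectra of the two /a/'s
      – The overall spectral shapes are similar → determined by the shape of the vocal tract
      – The fine "jaggies" of the spectra (harmonic peaks spaced $F_0$ apart) differ → determined by the fundamental frequency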

  11. Spectrum and formant frequencies
    ● $F_0$: fundamental frequency
    ● $F_1, F_2, \ldots$: formant frequencies
    [Figure: speech spectrum with the fundamental frequency F0 and the formant frequencies F1, F2, F3, F4 marked]

  12. Speech coding
    ● Sound (analog) → convert to digital data
      – to process it with a computer
      – to transmit it over a digital line
    ● How do we digitize sound?
      – Goals
        ● Good quality when converted back to analog sound
        ● Low bit rate
      – Methodology
        ● Exploit various features of speech

  13. Basics of speech coding
    ● Sampling
      – Observe the temporally continuous signal at discrete times
      – Number of observations per second: sampling frequency $f_s$
      – The original signal can be restored from the sampled data if it contains only frequency components below $f_s/2$ (sampling theorem)
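What goes wrong when the condition of the sampling theorem is violated can be shown numerically: a tone above $f_s/2$ produces exactly the same samples as a lower-frequency tone (aliasing). A minimal sketch, assuming telephone-rate sampling at 8 kHz and pure cosine test tones; the helper names are illustrative.

```python
import numpy as np


def apparent_frequency(f: float, fs: float) -> float:
    """Frequency in 0..fs/2 at which a sinusoid of frequency f appears after sampling at fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f


def dominant_frequency(f: float, fs: float, n: int = 800) -> float:
    """Sample cos(2*pi*f*t) at fs and return the frequency of the largest FFT bin."""
    t = np.arange(n) / fs
    x = np.cos(2 * np.pi * f * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return float(freqs[np.argmax(spectrum)])


fs = 8000.0
for f in (1000.0, 3000.0, 5000.0):
    print(f, apparent_frequency(f, fs), dominant_frequency(f, fs))
# The 5 kHz tone violates f < fs/2 and shows up at 3 kHz,
# indistinguishable from a genuine 3 kHz tone.
```

This is why an analog low-pass filter is placed before the A/D converter: components above $f_s/2$ cannot be removed after sampling.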

  14. Basics of speech coding
    ● Quantization
      – Round the amplitude of the signal to discrete levels
        ● The amplitude can then be represented by integers
      – Interval between the discrete levels: quantization step
      – Difference between the original signal and the quantized signal: quantization error
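The definitions above fit in a few lines: divide by the quantization step, round to an integer level, and the quantization error never exceeds half a step. A minimal sketch of uniform (linear) quantization, assuming a step of 0.1 and random test amplitudes.

```python
import numpy as np


def quantize(x: np.ndarray, step: float) -> np.ndarray:
    """Round each sample to the nearest multiple of the quantization step.

    Returns integer level indices; level * step is the quantized value.
    """
    return np.round(x / step).astype(int)


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=10_000)     # stand-in for continuous amplitudes
step = 0.1                                   # assumed quantization step
levels = quantize(x, step)                   # amplitudes represented by integers
error = x - levels * step                    # quantization error
print(levels[:5], np.max(np.abs(error)))     # |error| stays within step / 2
```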

  15. Sampling and quantization: how are they determined?
    ● The sampling frequency is determined by the highest frequency in the sound
      – Telephone: 8 kHz (sound up to 4 kHz)
      – High-quality speech: 16 kHz (sound up to 8 kHz)
      – CD: 44.1 kHz (sound up to 22.05 kHz)
    ● The quantization is determined by the dynamic range of the sound
      – Coding speech is, in essence, quantizing speech

  16. PCM coding
    ● PCM (Pulse Code Modulation)
      – Represent the quantized values as binary numbers
    ● What must be determined in PCM
      – How many bits to use for one sample
      – How to set the quantization levels
        ● Equal steps: linear quantization
        ● Unequal steps: nonlinear quantization
    ● Examples of PCM coding
      – CD: 16-bit linear quantization
      – VoIP (G.711): 8-bit nonlinear quantization

  17. PCM linear quantization
    ● There is nothing difficult about it
    [Figure: a waveform quantized to integer levels, with the integer value of each sample shown]
    ● CD: quantized in 16 bits (-32768 to +32767)
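CD-style 16-bit linear PCM can be sketched as scaling, rounding, clipping to -32768..+32767 and writing each value as a binary 16-bit word. The sketch assumes input samples are floats nominally in [-1.0, 1.0) and uses little-endian byte order (as in WAV files); the helper names are hypothetical.

```python
import numpy as np


def to_pcm16(x: np.ndarray) -> bytes:
    """Linear 16-bit PCM: scale [-1.0, 1.0) to -32768..32767, round, clip, pack little-endian."""
    q = np.clip(np.round(x * 32768.0), -32768, 32767).astype("<i2")
    return q.tobytes()


def from_pcm16(data: bytes) -> np.ndarray:
    """Decode little-endian 16-bit PCM back to floats in [-1.0, 1.0)."""
    return np.frombuffer(data, dtype="<i2").astype(np.float64) / 32768.0


x = np.array([0.0, 0.5, -0.5, 0.999, -1.0, 1.2])   # 1.2 is out of range and gets clipped
data = to_pcm16(x)
print(len(data))                                     # 2 bytes (16 bits) per sample
print(np.frombuffer(data, dtype="<i2"))
```

The bit rate follows directly: at 44.1 kHz, 16 bits and two stereo channels, CD audio needs 44100 × 16 × 2 = 1,411,200 bit/s.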

  18. Nonlinear quantization
    ● Most samples are close to zero
      → The total error can be reduced by quantizing values around zero finely
    [Figure: quantization levels]

  19. Example of nonlinear quantization: G.711
    ● Speech coding for the 64 kbit/s digital telephone line
      – 8 kHz sampling, 8-bit nonlinear quantization
      – μ-law (Japan, US), A-law (Europe)
      – μ-law: 14-bit linear quantization → 8-bit nonlinear quantization
    $$Y = 128\,\operatorname{sign}(X)\,\frac{\log\left(1 + 255\,|X|/8192\right)}{\log 256}$$
    [Figure: 8-bit μ-law output Y versus 14-bit linear input X]
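The μ-law curve on this slide can be coded directly, together with its inverse $|X| = 8192\,(256^{|Y|/128} - 1)/255$. This is a sketch of the continuous formula only: the actual G.711 codec approximates the curve with piecewise-linear segments and a specific bit layout, which is not reproduced here. The test inputs are assumptions.

```python
import numpy as np


def mu_law_compress(x: np.ndarray) -> np.ndarray:
    """Continuous mu-law curve from the slide: 14-bit linear X -> 8-bit code Y.

    Y = 128 * sign(X) * log(1 + 255|X|/8192) / log(256), rounded and clipped to -128..127.
    """
    y = 128.0 * np.sign(x) * np.log1p(255.0 * np.abs(x) / 8192.0) / np.log(256.0)
    return np.clip(np.round(y), -128, 127).astype(int)


def mu_law_expand(y: np.ndarray) -> np.ndarray:
    """Inverse of the curve: |X| = 8192 * (256**(|Y|/128) - 1) / 255."""
    return np.sign(y) * 8192.0 * (256.0 ** (np.abs(y) / 128.0) - 1.0) / 255.0


x = np.array([10.0, 100.0, 1000.0, 8000.0])   # 14-bit linear input values
y = mu_law_compress(x)
err = np.abs(mu_law_expand(y) - x)
print(y, err)   # fine steps near zero, coarse steps for large values
```

The absolute error grows with |X|, which is exactly the trade-off of slide 18: small, frequent values are coded precisely and rare large values coarsely.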

  20. Differential PCM (DPCM)
    ● In ordinary speech signals, the values of two consecutive samples do not differ very much
      → Reduce the bit rate by transmitting the differences between samples
    [Figure: DPCM block diagram with a quantizer Q, a subtractor and a one-sample delay $z^{-1}$]
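A minimal DPCM sketch follows. It assumes the common closed-loop design, in which the encoder subtracts the previous *reconstructed* sample (the value the decoder will also have) so that quantization errors do not accumulate. The step size, test tone and unlimited code range are assumptions; a real coder would also limit each code to a fixed number of bits.

```python
import numpy as np


def dpcm_encode(x: np.ndarray, step: float) -> np.ndarray:
    """Quantize the difference between each sample and the previous reconstructed sample."""
    codes = np.empty(len(x), dtype=int)
    prev = 0.0                                          # decoder state mirrored in the encoder
    for k, sample in enumerate(x):
        codes[k] = int(round((sample - prev) / step))   # quantized difference
        prev += codes[k] * step                         # same update the decoder will do
    return codes


def dpcm_decode(codes: np.ndarray, step: float) -> np.ndarray:
    """Accumulate de-quantized differences: x_r(k) = x_r(k-1) + step * code(k)."""
    return np.cumsum(codes * step)


fs = 8000
t = np.arange(400) / fs
x = 1000.0 * np.sin(2 * np.pi * 200 * t)            # slowly varying test tone
step = 8.0
codes = dpcm_encode(x, step)
y = dpcm_decode(codes, step)
print(np.max(np.abs(codes)), np.max(np.abs(x)) / step)   # differences need far fewer levels
print(np.max(np.abs(x - y)))                              # error stays within step / 2
```

The difference codes span a much smaller range than the samples themselves, so fewer bits per sample suffice at the same step size.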

  21. Differential PCM (DPCM)
    [Figure: original waveform (top) and differential waveform (bottom)]

  22. Adaptive Differential PCM (ADPCM)
    ● To improve the efficiency of DPCM
      – Use a more sophisticated prediction than the simple difference
      – Adaptively change the quantization step
        ● When the difference between two samples is large, the difference to the next sample is likely to be large too
        ● When the difference between two samples is small, the difference to the next sample is likely to be small too

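  23. Block diagram of ADPCM
    [Figure: ADPCM encoder. The prediction signal $x_e(k)$ is subtracted from the PCM input $x(k)$ to give the differential signal $d(k)$; the adaptive quantizer turns $d(k)$ into the ADPCM output $I(k)$; the adaptive de-quantizer turns $I(k)$ back into the quantized differential signal $d_q(k)$; adding $x_e(k)$ gives the reconstructed signal $x_r(k)$, from which (together with $d_q(k)$) the adaptive predictor computes the next prediction signal.]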

  24. Calculation algorithm of ADPCM
    1. Compute the prediction signal $x_e(k)$
    2. Compute the difference $d(k) = x(k) - x_e(k)$
    3. Quantize (ADPCM output): $I(k) = Q(d(k))$
    4. De-quantize: $d_q(k) = Q^{-1}(I(k))$
    5. Reconstruct the signal: $x_r(k) = x_e(k) + d_q(k)$
    6. Compute the next prediction: $x_e(k+1) = \mathrm{pred}(x_r(k), d_q(k), \ldots)$
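The six steps can be traced in a small encoder/decoder pair. This is a sketch, not G.726: it assumes the simplest predictor $x_e(k) = x_r(k-1)$, 4-bit codes (-8..7) and a hypothetical step-adaptation rule (halve after |I(k)| ≤ 1, double after |I(k)| ≥ 6). The decoder repeats steps 1, 4, 5 and 6 using only $I(k)$, so it reproduces the encoder's $x_r(k)$ exactly.

```python
import numpy as np


def adapt(step: float, code: int) -> float:
    """Hypothetical rule: small codes shrink the step, large codes grow it (clamped to 1..4096)."""
    if abs(code) <= 1:
        step /= 2
    elif abs(code) >= 6:
        step *= 2
    return min(max(step, 1.0), 4096.0)


def adpcm_encode(x, step0=16.0):
    codes, recon = [], []
    step, x_r_prev = step0, 0.0
    for sample in x:
        x_e = x_r_prev                               # 1. prediction: x_e(k) = x_r(k-1)
        d = sample - x_e                             # 2. difference d(k) = x(k) - x_e(k)
        code = int(np.clip(round(d / step), -8, 7))  # 3. quantize: I(k) = Q(d(k))
        d_q = code * step                            # 4. de-quantize: d_q(k) = Q^-1(I(k))
        x_r = x_e + d_q                              # 5. reconstruct: x_r(k) = x_e(k) + d_q(k)
        x_r_prev = x_r                               # 6. next prediction uses x_r(k)
        step = adapt(step, code)                     #    and the quantizer step adapts
        codes.append(code)
        recon.append(x_r)
    return np.array(codes), np.array(recon)


def adpcm_decode(codes, step0=16.0):
    out = []
    step, x_r_prev = step0, 0.0
    for code in codes:
        x_r = x_r_prev + code * step                 # steps 1, 4, 5: only I(k) is needed
        out.append(x_r)
        x_r_prev = x_r                               # step 6
        step = adapt(step, code)                     # same adaptation as the encoder
    return np.array(out)


fs = 8000
t = np.arange(800) / fs
x = 1000.0 * np.sin(2 * np.pi * 200 * t)             # assumed test tone
codes, recon = adpcm_encode(x)
decoded = adpcm_decode(codes)
print(codes[:12], np.sqrt(np.mean((x - decoded) ** 2)))
```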

  25. Prediction of speech signal
    ● ADPCM quantizes the difference between the input signal and the predicted signal
    ● How to predict the signal
      – DPCM: $x_e(k) = x_r(k-1)$
      – A slightly better way: $x_e(k) = 2\,x_r(k-1) - x_r(k-2)$
      – G.726: $x_e(k) = \sum_{i=1}^{2} a_i\,x_r(k-i) + \sum_{i=1}^{6} b_i\,d_q(k-i)$
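Why $2x_r(k-1) - x_r(k-2)$ is "a little better" can be checked by comparing prediction-error power: it extrapolates the local slope instead of assuming the signal stays constant. The sketch ignores quantization (it predicts from exact past samples) and uses an assumed test signal with 200 Hz and 500 Hz components sampled at 8 kHz.

```python
import numpy as np


def residual_power(x: np.ndarray, predictor: str) -> float:
    """Mean squared prediction error, predicting from exact past samples."""
    if predictor == "dpcm":          # x_e(k) = x(k-1)
        e = x[2:] - x[1:-1]
    elif predictor == "linear":      # x_e(k) = 2 x(k-1) - x(k-2): extrapolate the slope
        e = x[2:] - (2 * x[1:-1] - x[:-2])
    else:
        raise ValueError(predictor)
    return float(np.mean(e ** 2))


fs = 8000
t = np.arange(800) / fs
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 500 * t)
for name in ("dpcm", "linear"):
    print(name, residual_power(x, name), "signal power", float(np.mean(x ** 2)))
```

For a sinusoid at $\omega$ rad/sample the error power is scaled by $4\sin^2(\omega/2)$ for the first predictor and by its square for the second, so the second-order predictor only wins for components below $f_s/6$. This is one reason adaptive predictors such as G.726's are used instead of a fixed rule.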

  26. Determine the quantization step adaptively (example)
    ● Measure the difference from the previous sample on the current scale
    [Figure: a 16-level quantization scale, from -8 to 7, with levels colored blue or red]
    ● If the difference falls on a "blue" level, halve the size of the next scale
    ● If the difference falls on a "red" level, double the size of the next scale
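The slide's figure does not survive in text form, so which levels are "blue" and which are "red" is an assumption here: following the reasoning of slide 22, small codes (|code| ≤ 1) are treated as blue and large codes (|code| ≥ 6) as red, with the step clamped to an assumed range. A sketch of the rule and a short trace:

```python
def next_step(step: float, code: int, min_step: float = 1.0, max_step: float = 4096.0) -> float:
    """Hypothetical 4-bit rule (codes -8..7): halve after a small code, double after a large one."""
    if abs(code) <= 1:          # assumed "blue": the difference was small
        step /= 2
    elif abs(code) >= 6:        # assumed "red": the difference was large
        step *= 2
    return min(max(step, min_step), max_step)


step = 16.0
trace = []
for code in [7, 7, 7, 3, 1, 0, 0, -2, -8]:
    step = next_step(step, code)
    trace.append(step)
print(trace)  # [32.0, 64.0, 128.0, 128.0, 64.0, 32.0, 16.0, 16.0, 32.0]
```

Because the rule depends only on the transmitted code, the decoder can follow the same step sizes without any side information.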

  27. For high-efficiency speech coding
    ● PCM, DPCM and ADPCM encode general sound signals
      – DPCM and ADPCM only partly exploit the properties of the input signal
    ● Human speech is only a small part of all sound signals
      → Coding efficiency can be improved by exploiting the properties of human speech
    ● What are the properties of human speech?

  28. High-level speech coding
    [Figure: levels of speech representation (digital speech data, speech features, phones/words/sentences, semantics) and the coders associated with them: PCM coder (public phone) with AD/DA conversion, CELP coder (mobile phone), vocoder, speech synthesis, text-to-speech, and a "summarizing telephone?" (under research)]

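  29. Speech production model
    [Figure: vocal cords, larynx, vocal tract, nasal cavity, lips and radiation, labeled with $S(\omega)$, $T(\omega)$, $R(\omega)$ and the output $X(\omega)$]
    ● $X(\omega) = S(\omega)\,T(\omega)\,R(\omega)$: vocal-cord source $S$, vocal-tract transfer function $T$, radiation at the lips $R$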

  30. Speech production model
    [Figure: $S(\omega)$]

  31. Speech production model
    [Figure: $S(\omega)$, $T(\omega)$ and $R(\omega)$]

  32. Modeling speech using parameters
    ● Modeling speech using linear prediction (LPC)
      – Spectral shape: parameters of the linear prediction filter
      – Vocal cord vibration: residue
      – Estimate the coefficients $a_i$ that minimize the residue $e(k)$ in
        $$x(k) = -\sum_{i=1}^{p} a_i\,x(k-i) + e(k)$$
      – In the spectral domain:
        $$X(\omega) = \frac{E(\omega)}{1 + \sum_{n=1}^{p} a_n e^{-j\omega n}} = E(\omega)\,H(\omega)$$
        which corresponds to $S(\omega)\,T(\omega)\,R(\omega)$

  33. Analysis and transmission of speech by LPC
    ● Information to be transmitted
      – LP coefficients $a_i$ and residue $e(k)$
    ● How to transmit them?
      – Estimate $a_i$ for a fixed number of samples (a block)
      – Calculate $e(k)$ using the estimated $a_i$
      – Transmit $a_i$ and $e(k)$ as the parameters of the block
    ● How to restore the signal?
      – Use the LPC formula $x(k) = -\sum_{i=1}^{p} a_i\,x(k-i) + e(k)$
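Analysis and restoration are inverse filters: the encoder computes $e(k) = x(k) + \sum_i a_i x(k-i)$, and the decoder runs the LPC formula sample by sample. A minimal sketch for one block, assuming zero samples before the block (a real coder carries the filter state across blocks and quantizes $a_i$ and $e(k)$). The 2nd-order coefficients and the pulse-train excitation are hypothetical stand-ins for a vocal tract and vocal-cord vibration.

```python
import numpy as np


def lpc_residual(x: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Analysis: e(k) = x(k) + sum_i a_i x(k-i); samples before the block are taken as zero."""
    p = len(a)
    e = np.array(x, dtype=float)
    for i in range(1, p + 1):
        e[i:] += a[i - 1] * x[:-i]
    return e


def lpc_synthesize(e: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Restoration: x(k) = -sum_i a_i x(k-i) + e(k), computed sample by sample."""
    p = len(a)
    x = np.zeros(len(e))
    for k in range(len(e)):
        acc = e[k]
        for i in range(1, min(p, k) + 1):
            acc -= a[i - 1] * x[k - i]
        x[k] = acc
    return x


a = np.array([-1.8, 0.9])        # hypothetical stable 2nd-order coefficients (one resonance)
e = np.zeros(400)
e[::80] = 1.0                    # pulse train: F0 = 100 Hz at an assumed 8 kHz sampling rate
x = lpc_synthesize(e, a)         # speech-like signal
e_hat = lpc_residual(x, a)       # encoder side: recovers the pulse train
x_hat = lpc_synthesize(e_hat, a) # decoder side: restores the block
print(np.allclose(e_hat, e), np.allclose(x_hat, x))
```

The residue of a voiced block is nearly a pulse train, which is far easier to code compactly than the waveform itself; this is the starting point for coders such as CELP.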

  34. Estimation of LP coefficients
    ● How to estimate the LPC from $x(1), \ldots, x(k)$
      – Solve a system of simultaneous equations (Yule-Walker equations) → the LPC are obtained as the least-error solution
      – Faster algorithm: Levinson-Durbin algorithm
    ● LPC equation
    $$
    -\begin{pmatrix}
    x(k-1) & x(k-2) & \cdots & x(k-p) \\
    x(k-2) & x(k-3) & \cdots & x(k-p-1) \\
    \vdots & \vdots & \ddots & \vdots \\
    x(p) & x(p-1) & \cdots & x(1)
    \end{pmatrix}
    \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{pmatrix}
    =
    \begin{pmatrix} x(k) \\ x(k-1) \\ \vdots \\ x(p+1) \end{pmatrix}
    -
    \begin{pmatrix} e(k) \\ e(k-1) \\ \vdots \\ e(p+1) \end{pmatrix}
    $$
    i.e. $-FA = V - E$; minimizing $\|E\|^2$ gives $A = -(F^\top F)^{-1} F^\top V$
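The Levinson-Durbin algorithm named above solves the Yule-Walker (autocorrelation) form of the equations, $\sum_{j=1}^{p} a_j\,r(|i-j|) = -r(i)$ for $i = 1, \ldots, p$, in $O(p^2)$ operations instead of a general $O(p^3)$ solve, with the same sign convention as $x(k) = -\sum_i a_i x(k-i) + e(k)$. A sketch, checked against a direct solve on an assumed AR(2) test signal; the function names are illustrative.

```python
import numpy as np


def autocorrelation(x: np.ndarray, p: int) -> np.ndarray:
    """r(0..p) of a block (biased estimate, as in the autocorrelation method)."""
    n = len(x)
    return np.array([np.dot(x[: n - m], x[m:]) for m in range(p + 1)])


def levinson_durbin(r: np.ndarray, p: int):
    """Solve sum_j a_j r(|i-j|) = -r(i), i = 1..p, recursively in the order i.

    Returns (a_1..a_p, final prediction-error power, reflection coefficients).
    """
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    refl = np.zeros(p)
    for i in range(1, p + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])   # r(i) + sum_j a_j r(i-j)
        k = -acc / err                                # i-th reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]           # a_j <- a_j + k a_(i-j)
        a[i] = k
        err *= 1.0 - k * k                            # error power shrinks each order
        refl[i - 1] = k
    return a[1:], err, refl


rng = np.random.default_rng(1)
e = rng.standard_normal(4000)
x = np.zeros_like(e)
for k in range(len(e)):          # test signal: x(k) = 1.8 x(k-1) - 0.9 x(k-2) + e(k)
    x[k] = e[k] + (1.8 * x[k - 1] if k >= 1 else 0.0) - (0.9 * x[k - 2] if k >= 2 else 0.0)

p = 2
r = autocorrelation(x, p)
a, err, refl = levinson_durbin(r, p)
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
a_direct = np.linalg.solve(R, -r[1:p + 1])   # same Yule-Walker system, solved directly
print(a, a_direct)                            # both close to the true values [-1.8, 0.9]
```

The by-product reflection coefficients are related to the PARCOR coefficients listed in the lecture overview, and $|k_i| < 1$ at every order indicates a stable synthesis filter.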
