
Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding

IEEE Speech Coding Workshop, Sept 17–20, 2000, Lake Lawn Resort, Delavan, WI. Jean-Marc Valin, Roch Lefebvre, University of Sherbrooke. Speech Coding Workshop 2000.


  1. Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding
     Jean-Marc Valin, Roch Lefebvre, University of Sherbrooke
     IEEE Speech Coding Workshop, Sept 17–20, 2000, Lake Lawn Resort, Delavan, WI

  2. Outline
     • Problem statement
     • Proposed solution
     • System performance
     • Discussion

  3. Problem Statement
     • Telephone band: 300-3400 Hz
     • AM band: 50-7000 Hz
     • How can telephone-band speech (G.729) be made to sound like AM-band speech with 500 bits/s?
     • We need to recover information from both the low- and high-frequency bands

  4. Proposed Solution
     • 1) Do our best to recover the wideband information from narrowband speech
     • 2) Use coding for the information that cannot be recovered
       – Recovered information:
         • Low-frequency band
         • High-frequency excitation
       – Coded information:
         • High-frequency spectral envelope
     [Figure: amplitude (dB) versus frequency (Hz), 0-8000 Hz]

  5. System Overview
     [Block diagram: the 8 kHz narrowband input passes through the inverse IRM filter and ↑2 upsampling to give the 300-3400 Hz band; low-frequency regeneration supplies the 50-300 Hz band; high-frequency regeneration, driven by the side information, supplies the 3400-8000 Hz band; together the bands form the 16 kHz wideband output.]
     • Inverse IRM filter is optional
       – produces a flat response from 200-3500 Hz

  6. Low-Frequency Regeneration (1/2)
     • Assumptions:
       – Only pitch harmonics need to be recovered
         • In general, no more than two pitch harmonics below 200 Hz
       – Absolute phase is not perceptually relevant
     • Frequency of harmonics determined from pitch analysis
     • Amplitudes determined from a feed-forward multi-layer perceptron (output in log domain)

  7. Low-Frequency Regeneration (2/2)
     [Block diagram. Blocks: ↑2, LP filter, pitch analysis, MFCC calculation, multi-layer perceptron, low-frequency harmonic synthesis (1st harmonic, 2nd harmonic), scale and sum. Signals: narrowband speech (input), pitch delay (1), pitch gain (1), cepstral coefficients, scale factors, low frequencies (output). A further label "(16)" sits at the perceptron input.]
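The harmonic synthesis of slides 6 and 7 can be sketched as a sum of sinusoids at multiples of the pitch frequency. This is a minimal sketch, not the authors' implementation: it assumes a 16 kHz output rate, a constant pitch over the frame, and log-domain amplitudes that have already been produced by the perceptron. The function name, the base-10 log convention and the 300 Hz cut-off parameter are illustrative assumptions.

```python
import math

def synthesize_low_band(pitch_hz, log_amplitudes, num_samples,
                        sample_rate=16000, cutoff_hz=300.0, phase=0.0):
    """Sum sinusoids at multiples of pitch_hz that fall below cutoff_hz.

    log_amplitudes[k] is the assumed log10 amplitude of harmonic k + 1
    (a hypothetical convention standing in for the perceptron output).
    """
    out = [0.0] * num_samples
    for k, log_amp in enumerate(log_amplitudes, start=1):
        freq = k * pitch_hz
        if freq >= cutoff_hz:
            break  # this harmonic already lies in the telephone band
        amp = 10.0 ** log_amp
        step = 2.0 * math.pi * freq / sample_rate
        for n in range(num_samples):
            # Absolute phase is taken as perceptually irrelevant,
            # so an arbitrary start phase is used.
            out[n] += amp * math.sin(phase + step * n)
    return out

# Two harmonics (120 Hz and 240 Hz) for a 120 Hz pitch.
frame = synthesize_low_band(pitch_hz=120.0, log_amplitudes=[0.0, -0.3],
                            num_samples=256)
```

With a 200 Hz pitch only the first harmonic is generated, since the second (400 Hz) is already inside the telephone band.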

  8. High-Frequency Extension
     • Excitation-filter model (16 ms frames)
     • Problem is separated into two parts
       – Excitation extension (no side information)
       – Spectral envelope coding (side information)
     [Block diagram: narrowband speech → A(z) → excitation extension → 1/B(z) → high pass → high-frequency band. LPC analysis of the narrowband speech feeds the spectral envelope extension, which also takes the side information (high-frequency spectral envelope).]
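The excitation-filter model on this slide relies on an analysis filter A(z) that whitens the speech and an all-pole synthesis filter 1/B(z) that re-imposes a spectral envelope. The sketch below shows one common way to obtain such a filter, the autocorrelation method with the Levinson-Durbin recursion. The slides do not say which LPC method or order was used, so the function names, the order of 2 and the toy input are all assumptions for illustration.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err) with a[0] = 1, so that A(z) = sum of a[k] * z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # remaining prediction error
    return a, err

def whiten(frame, a):
    """FIR analysis filter A(z): speech in, excitation (residual) out."""
    return [sum(a[k] * frame[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(frame))]

def synthesize(excitation, b):
    """All-pole synthesis filter 1/B(z), with b[0] = 1."""
    out = []
    for n, e in enumerate(excitation):
        out.append(e - sum(b[k] * out[n - k]
                           for k in range(1, len(b)) if n - k >= 0))
    return out

# Toy input: impulse response of a one-pole filter with its pole at 0.9.
frame = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorrelation(frame, 2), 2)
residual = whiten(frame, a)
# Here B(z) = A(z), so synthesis simply undoes the whitening. In the system
# on the slide, B(z) would instead carry the extended wideband envelope.
rebuilt = synthesize(residual, a)
```

For this toy input the recursion recovers a first coefficient close to -0.9 and a second close to zero, and filtering the residual through 1/A(z) reproduces the frame.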

  9. Excitation Extension
     [Block diagram. Input: narrowband excitation; blocks: ↑2, absolute value, whitening filter, high pass; output: wideband excitation.]
     [Figure: three plots with a 0-20 horizontal axis and three with a 0-8 horizontal axis; axis labels not recoverable.]
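To see why an absolute-value stage can extend an excitation, the sketch below upsamples a 1 kHz tone from 8 kHz to 16 kHz and rectifies it, then measures the 6 kHz component before and after. It covers only the ↑2 and absolute-value blocks; the whitening filter and high pass are omitted. The order of the blocks, the linear-interpolation upsampler and all function names are assumptions made for illustration, not details taken from the slides.

```python
import math

def upsample2(x):
    """Double the sampling rate by linear interpolation.

    Returns 2 * len(x) - 1 samples. A real system would use a proper
    interpolation filter; this is a simple stand-in.
    """
    out = []
    for n in range(len(x) - 1):
        out.append(x[n])
        out.append(0.5 * (x[n] + x[n + 1]))
    out.append(x[-1])
    return out

def rectify(x):
    """Absolute-value non-linearity."""
    return [abs(v) for v in x]

def tone_amplitude(x, freq_hz, sample_rate):
    """Amplitude of one frequency component, from a single-bin DFT."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(v * math.cos(w * n) for n, v in enumerate(x))
    im = sum(v * math.sin(w * n) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

# 1 kHz tone sampled at 8 kHz, standing in for a narrowband excitation.
narrow = [math.sin(2.0 * math.pi * 1000 * n / 8000) for n in range(81)]
wide = upsample2(narrow)[:160]          # 10 ms at 16 kHz

before = tone_amplitude(wide, 6000, 16000)
after = tone_amplitude(rectify(wide), 6000, 16000)
```

Before rectification the 6 kHz component is essentially zero; afterwards the even harmonics of the tone (2, 4 and 6 kHz) appear, which is the kind of new high-band content that the later stages would shape.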

  10. Spectral Envelope Coding
      • Spectral envelope calculated from the wideband LPC coefficients
      • Quantization of the 3000-8000 Hz range (40 points)
        – Log domain
        – 8-bit vector quantization (500 bits/s side information, using 16 ms frames)
      • Concatenation with the envelope obtained from LPC analysis on narrowband speech
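The side-information rate follows directly from the figures on this slide: one 8-bit index per 16 ms frame gives 8 / 0.016 = 500 bits/s. The sketch below checks that arithmetic and shows a nearest-neighbour vector quantizer on a log-domain envelope. The codebook is a toy with 4 entries of 3 points each, whereas the slide implies 256 entries of 40 points; the function names and the squared-error distance are assumptions.

```python
def side_info_rate(bits_per_frame, frame_ms):
    """Side-information rate in bits per second."""
    return bits_per_frame * 1000.0 / frame_ms

def vq_encode(vector, codebook):
    """Index of the codeword nearest to vector (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index

def vq_decode(index, codebook):
    """Look the codeword back up at the decoder."""
    return codebook[index]

rate = side_info_rate(bits_per_frame=8, frame_ms=16)   # 500.0 bits/s

# Toy log-domain (dB) codebook; values are made up for illustration.
codebook = [
    [0.0, 0.0, 0.0],
    [10.0, 5.0, 0.0],
    [20.0, 10.0, 5.0],
    [30.0, 20.0, 10.0],
]
envelope_db = [19.0, 11.0, 4.0]
index = vq_encode(envelope_db, codebook)
decoded = vq_decode(index, codebook)
```

Only `index` needs to be transmitted; the decoder holds the same codebook and recovers `decoded` from it.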

  11. Objective Results
      • Low-frequency band
        – 3 dB RMS error on harmonic amplitude
      • High-frequency band
        – 3.6 dB RMS error on envelope
        – No objective measure for excitation extension (perceptually very close to original)
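An RMS error in dB, as quoted on this slide, can be computed by comparing reference and estimated values in the log domain. The slides do not state exactly how the errors were averaged, so the sketch below is one plausible reading: it assumes amplitudes are converted with 20·log10 and that the error is the root mean square of the per-point dB differences. Function names and sample values are illustrative.

```python
import math

def amplitude_to_db(amplitude):
    """Convert a linear amplitude to decibels."""
    return 20.0 * math.log10(amplitude)

def rms_error_db(reference_db, estimate_db):
    """Root-mean-square difference between two dB-valued sequences."""
    squared = [(r - e) ** 2 for r, e in zip(reference_db, estimate_db)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up values: the estimate is 3 dB high on one point and 3 dB low on the other.
reference = [amplitude_to_db(1.0), amplitude_to_db(10.0)]   # 0 dB, 20 dB
estimate = [3.0, 17.0]
error = rms_error_db(reference, estimate)
```

In this example `error` comes out at 3 dB, since both points deviate by exactly 3 dB.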

  12. Subjective Results
      [Audio samples, one female and one male speaker for each condition:]
      • Original wideband
      • Recovered from original IRM-filtered speech
      • Recovered from G.729 coded speech

  13. Discussion
      • Highlights
        – Expand IRM-filtered telephone-band speech to AM band
        – Very low side information rate (500 bits/s)
      • Areas of improvement
        – Use high-band spectral estimation before coding
        – Use residual low-frequency information (below 300 Hz)
        – Noise robustness
        – Post-filtering
