Speech & Audio Coding TSBK01 Image Coding and Data Compression - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Speech & Audio Coding

TSBK01 Image Coding and Data Compression Lecture 11, 2003 Jörgen Ahlberg

slide-2
SLIDE 2

Outline

  • Part I – Speech

– Speech
– History of speech synthesis & coding
– Speech coding methods

  • Part II – Audio

– Psychoacoustic models
– MPEG-4 Audio

slide-3
SLIDE 3

Speech Production

  • The human vocal apparatus consists of:

– lungs
– trachea (wind pipe)
– larynx

  • contains 2 folds of skin called vocal cords, which blow apart and flap together as air is forced through

– oral tract
– nasal tract

slide-4
SLIDE 4
  • The Speech Signal
slide-5
SLIDE 5

The Speech Signal

slide-6
SLIDE 6
  • The Speech Signal
slide-7
SLIDE 7
  • The Speech Signal
slide-8
SLIDE 8
  • History of Speech Coding
slide-9
SLIDE 9
  • History of Speech Coding
slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12
  • Source-filter Model of Speech Production
slide-13
SLIDE 13

Speech Coding Strategies

  • 1. PCM
  • Invented 1926, deployed 1962.
  • The speech signal is sampled at 8 kHz.
  • Uniform quantization requires >10 bits/sample.
  • Non-uniform quantization (G.711, 1972): the signal is companded before quantization.
  • Quantizing the companded signal y to 8 bits/sample -> 8 kHz × 8 bits = 64 kbit/s.
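
The 64 kbit/s figure and the effect of non-uniform quantization can be checked with a small sketch. It assumes the continuous µ-law curve with µ = 255 as a stand-in for the companding step; the actual G.711 tables use a piecewise-linear approximation and a specific bit layout, neither of which is reproduced here. All function names are illustrative.

```python
import math

MU = 255.0  # assumption: continuous mu-law curve, not the G.711 segment tables

def compress(x, mu=MU):
    """Map x in [-1, 1] to the companded value y in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def expand(y, mu=MU):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize_8bit(y):
    """Uniform 8-bit quantizer on [-1, 1]; returns an integer code 0..255."""
    return min(255, int((y + 1.0) / 2.0 * 256))

def dequantize_8bit(code):
    """Reconstruct the midpoint of the quantization cell."""
    return (code + 0.5) / 256 * 2.0 - 1.0

def codec(x):
    """Compress, quantize to 8 bits, then undo both steps."""
    return expand(dequantize_8bit(quantize_8bit(compress(x))))

SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
bitrate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # bit/s
```

For a small amplitude such as 0.01, `codec(0.01)` lands closer to the input than a plain uniform 8-bit quantizer does, which is the reason for companding speech before quantization.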
slide-14
SLIDE 14

Speech Coding Strategies

  • 2. Adaptive DPCM
  • Example: G.726 (1990)
  • Adaptive predictor based on six previous differences.
  • Gain-adaptive quantizer with 15 levels -> 32 kbit/s.
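
The idea of coding quantized differences with a gain-adaptive step can be sketched as below. This is a toy coder, not G.726: it assumes a fixed first-order predictor (the previous reconstructed sample) instead of an adaptive one, 4-bit codes, and made-up adaptation factors (1.5 and 0.9). At 8 kHz, 4 bits per sample gives the 32 kbit/s mentioned above.

```python
import math

INITIAL_STEP = 0.02  # hypothetical starting step size

def _update(pred, step, code, half_levels):
    """State update shared by encoder and decoder so both stay in lockstep."""
    pred += code * step  # reconstructed sample
    # Gain adaptation: widen the step after large codes, narrow it after small ones.
    step *= 1.5 if abs(code) >= half_levels // 2 else 0.9
    return pred, min(max(step, 1e-4), 0.5)

def adpcm_encode(samples, bits=4):
    half_levels = 1 << (bits - 1)
    pred, step, codes = 0.0, INITIAL_STEP, []
    for x in samples:
        # Quantize the difference between the sample and the prediction.
        code = int(round((x - pred) / step))
        code = max(-half_levels, min(half_levels - 1, code))
        codes.append(code)
        pred, step = _update(pred, step, code, half_levels)
    return codes

def adpcm_decode(codes, bits=4):
    half_levels = 1 << (bits - 1)
    pred, step, out = 0.0, INITIAL_STEP, []
    for code in codes:
        pred, step = _update(pred, step, code, half_levels)
        out.append(pred)
    return out

# Usage: a 200 Hz tone sampled at 8 kHz.
signal = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
codes = adpcm_encode(signal)
decoded = adpcm_decode(codes)
```

Because the decoder repeats the encoder's state update from the codes alone, no step size has to be transmitted.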

slide-15
SLIDE 15

Speech Coding Strategies

  • 3. Model-based Speech Coding
  • Advanced speech coders are based on models of how speech is produced:

Excitation source -> Vocal tract

slide-16
SLIDE 16

An Excitation Source

[Diagram: noise generator and pulse generator (controlled by pitch)]
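
The two generators can be sketched as follows, assuming unit-amplitude pulses for voiced sounds and Gaussian white noise for unvoiced sounds. The function names, the default pitch period and the seed are illustrative choices, not taken from the slides.

```python
import random

def pulse_train(num_samples, pitch_period):
    """Voiced excitation: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]

def white_noise(num_samples, seed=0):
    """Unvoiced excitation: zero-mean Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

def excitation(num_samples, voiced, pitch_period=80, seed=0):
    """Select the generator according to the voiced/unvoiced decision."""
    if voiced:
        return pulse_train(num_samples, pitch_period)
    return white_noise(num_samples, seed)
```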

slide-17
SLIDE 17

Vocal Tract Filter 1: A Fixed Filter Bank

[Diagram: a bank of band-pass (BP) filters with gains g1, g2, ..., gn]

slide-18
SLIDE 18

Vocal Tract Filter 2: A Controllable Filter

slide-19
SLIDE 19

Linear Predictive Coding (LPC)

  • The controllable filter is modelled as

$y_n = \sum_i a_i y_{n-i} + G\varepsilon_n$, where $\varepsilon_n$ is the input signal and $y_n$ is the output.

  • We need to estimate the vocal tract parameters ($a_i$ and $G$) and the excitation parameters (pitch, v/uv).

  • Typically the source signal is divided into short segments and the parameters are estimated for each segment.

  • Example: The speech signal is sampled at 8 kHz and divided into segments of 180 samples (22.5 ms/segment).
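
A minimal sketch of the filter recursion and of the segmentation, assuming zero initial filter state and plain Python lists. The names `lpc_synthesize` and `split_into_segments` are illustrative.

```python
def lpc_synthesize(coeffs, gain, excitation):
    """Run y[n] = sum_i coeffs[i-1] * y[n-i] + gain * excitation[n]."""
    past = [0.0] * len(coeffs)  # past[0] holds y[n-1], past[1] holds y[n-2], ...
    out = []
    for e in excitation:
        y = sum(a * p for a, p in zip(coeffs, past)) + gain * e
        out.append(y)
        past = ([y] + past)[:len(coeffs)]
    return out

def split_into_segments(signal, segment_length=180):
    """Cut the signal into consecutive segments; the last one may be shorter."""
    return [signal[i:i + segment_length]
            for i in range(0, len(signal), segment_length)]

SAMPLE_RATE_HZ = 8000
segment_duration_ms = 1000 * 180 / SAMPLE_RATE_HZ  # 22.5
```

For example, `lpc_synthesize([0.5], 1.0, [1, 0, 0, 0])` produces the decaying response of a single-coefficient filter to one input pulse.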

slide-20
SLIDE 20

Typical Scheme of an LPC Coder

[Diagram: noise generator and pulse generator (pitch), v/uv switch, gain, vocal tract filter (filter coeffs)]

slide-21
SLIDE 21

Estimating the Parameters

  • v/uv estimation

– Based on energy and frequency spectrum.

  • Pitch-period estimation

– Look for periodicity, either via the a.c.f. or some other measure of a candidate lag p that gives a minimum value when p equals the pitch period.
– Typical pitch periods: 20 - 160 samples.
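
One measure with a minimum at the pitch period is the average magnitude difference function (AMDF). Using it here is an assumption, since the slide's own formula is not preserved in this text. The sketch searches the typical range of 20 to 160 samples; the function names are illustrative.

```python
def amdf(segment, lag):
    """Average magnitude difference between the segment and a copy delayed by lag."""
    n = len(segment) - lag
    return sum(abs(segment[i + lag] - segment[i]) for i in range(n)) / n

def estimate_pitch_period(segment, min_lag=20, max_lag=160):
    """Return the lag p in [min_lag, max_lag] with the smallest AMDF value."""
    max_lag = min(max_lag, len(segment) - 1)
    return min(range(min_lag, max_lag + 1), key=lambda p: amdf(segment, p))

# Usage: a sawtooth that repeats every 50 samples.
sawtooth = [(i % 50) / 50 for i in range(180)]
period = estimate_pitch_period(sawtooth)
```

Multiples of the true period also give small values, so a practical estimator needs some rule for preferring the shortest plausible lag; here `min` simply returns the first minimum it meets.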

slide-22
SLIDE 22

Estimating the Parameters

  • Vocal tract filter estimation

– Find the filter coefficients that minimize the squared prediction error $\varepsilon^2 = \left( y_n - \sum_i a_i y_{n-i} \right)^2$.
– Compare to the computation of optimal predictors (Lecture 7).

slide-23
SLIDE 23

Estimating the Parameters

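  • Assuming a stationary signal:

$\mathbf{R}\mathbf{a} = \mathbf{p}$, where $\mathbf{a}$ is the vector of filter coefficients $a_i$, and $\mathbf{R}$ and $\mathbf{p}$ contain acf values.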

  • This is called the autocorrelation method.
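
Because the matrix of acf values is Toeplitz, the system is commonly solved with the Levinson-Durbin recursion. The sketch below assumes that approach (the slides do not say which solver is used) and a simple, unwindowed acf estimate; the function names are illustrative.

```python
def autocorrelation(segment, max_lag):
    """Unnormalized acf estimates r[0..max_lag] of one segment."""
    n = len(segment)
    return [sum(segment[i] * segment[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve R a = p with R[i][j] = r[|i-j|] and p[i] = r[i+1].

    Returns the predictor coefficients a[0..order-1] (a[i-1] multiplies y[n-i])
    and the remaining prediction-error energy.
    """
    a = [0.0] * order
    error = r[0]  # assumed non-zero, i.e. the segment is not silent
    for m in range(order):
        acc = r[m + 1] - sum(a[j] * r[m - j] for j in range(m))
        k = acc / error  # reflection coefficient of this step
        new_a = a[:]
        new_a[m] = k
        for j in range(m):
            new_a[j] = a[j] - k * a[m - 1 - j]
        a = new_a
        error *= 1.0 - k * k
    return a, error
```

For acf values `[1.0, 0.5, 0.25]`, which match a first-order process with coefficient 0.5, a second-order fit returns a zero second coefficient.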
slide-24
SLIDE 24

Estimating the Parameters

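  • Alternatively, in case of a non-stationary signal:

$\sum_i a_i c_{ik} = c_{0k}$, where $c_{ik} = \sum_n y_{n-i} y_{n-k}$ is computed directly over the samples of the current segment.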

  • This is called the autocovariance method.
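
A sketch of this variant, assuming the usual least-squares formulation: the products are summed only over samples whose full history lies inside the segment, and the resulting small system is solved by Gaussian elimination. The matrix is no longer Toeplitz, so a general solver is used; the function names are illustrative.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def covariance_lpc(segment, order):
    """Predictor coefficients from covariance values of one segment."""
    n = len(segment)

    def c(i, k):
        # Sum of y[t-i] * y[t-k] over samples with a complete history.
        return sum(segment[t - i] * segment[t - k] for t in range(order, n))

    matrix = [[c(i, k) for i in range(1, order + 1)] for k in range(1, order + 1)]
    rhs = [c(0, k) for k in range(1, order + 1)]
    return solve(matrix, rhs)
```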
slide-25
SLIDE 25

Example

  • Coding of parameters using LPC10 (1984):

v/uv               1 bit
Pitch              6 bits
Voiced filter      46 bits
Unvoiced filter    46 bits
Synchronization    1 bit
Sum:               54 bits -> 2.4 kbit/s
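
The bit budget can be checked with a few lines of arithmetic. The sketch assumes the 180-sample segments at 8 kHz from the earlier LPC example, and it assumes that each segment carries one filter description (voiced or unvoiced), which is the only reading under which the rows add up to 54 bits.

```python
# Bits per segment; a segment carries either the voiced or the unvoiced filter.
FRAME_BITS = {"v/uv": 1, "pitch": 6, "filter": 46, "synchronization": 1}

SAMPLE_RATE_HZ = 8000
SAMPLES_PER_SEGMENT = 180  # assumption: segment length from the earlier example

bits_per_segment = sum(FRAME_BITS.values())
bitrate = bits_per_segment * SAMPLE_RATE_HZ / SAMPLES_PER_SEGMENT  # bit/s
```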

slide-26
SLIDE 26

The Vocal Tract Filter

  • Different representations:

– LPC parameters
– PARCOR (Partial Correlation Coefficients)
– LSF (Line Spectrum Frequencies)

slide-27
SLIDE 27
  • Code Excited Linear Prediction Coding (CELP)
slide-28
SLIDE 28

Examples

  • G.728

– V(z) is chosen as a large FIR filter (M ≈ 50).
– The gain and FIR parameters are estimated recursively from previously received samples.
– The code book contains 127 sequences.

  • GSM

– The code book contains regular pulse trains with variable frequency and amplitudes.

  • MELP

– Mixed excitation linear prediction
– The code book is combined with a noise generator.
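
What these coders share is a search through a code book of candidate excitations. The sketch below shows the basic analysis-by-synthesis idea under simplifying assumptions: an all-pole synthesis filter with zero initial state, a plain squared-error criterion, and a per-entry optimal gain. It does not reproduce any of the standards above, which add perceptual weighting and other refinements; all names are illustrative.

```python
def synthesize(coeffs, excitation):
    """All-pole filter: y[n] = sum_i coeffs[i-1] * y[n-i] + excitation[n]."""
    past = [0.0] * len(coeffs)
    out = []
    for e in excitation:
        y = sum(a * p for a, p in zip(coeffs, past)) + e
        out.append(y)
        past = ([y] + past)[:len(coeffs)]
    return out

def search_codebook(target, codebook, coeffs):
    """Return (index, gain, error) of the entry that best reproduces target."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(coeffs, code)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero entry cannot match anything
        # Least-squares gain for this entry, then the remaining error.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

# Usage with a tiny hypothetical code book.
codebook = [[1, 0, 0, 0], [0, 1, 0, -1], [1, 1, 1, 1]]
coeffs = [0.5]
target = [2.0 * s for s in synthesize(coeffs, codebook[1])]
index, gain, error = search_codebook(target, codebook, coeffs)
```

Only the index and the gain need to be transmitted, since the decoder holds the same code book and filter.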

slide-29
SLIDE 29

Other Variations

  • SELP – Self Excited Linear Prediction
  • MPLP – Multi-Pulse Excited Linear Prediction
  • MBE – Multi-Band Excitation Coding
slide-30
SLIDE 30

Quality Levels

Bitrate          Bandwidth        Quality level
<4 kbit/s                         Synthetic quality
4 – 16 kbit/s                     Communication quality
16 – 64 kbit/s   300 – 3400 Hz    Network (toll) quality
>64 kbit/s       10 kHz           Broadcast quality

slide-31
SLIDE 31
  • Subjective Assessment
slide-32
SLIDE 32
  • Subjective Assessment