Multimedia Communications Spring 2006-07 Voice Traffic


Jan 13, 2023


SLIDE 1

CS 584 / CMPE 584

Multimedia Communications

Spring 2006-07

Voice Traffic Characteristics

Shahab Baqai LUMS

SLIDE 2

Voice Communication Characteristics

Speech produces a signal that varies slowly in time

4 kHz bandwidth

SLIDE 3

Voice Coding

Voice processing comprises two steps

Speech analysis

– Converts an analogue voice signal to digital form

Speech synthesis

– Converts digital voice data into its analogue form

Two methods are used for voice processing

Waveform coding

– Pulse Code Modulation (PCM)
– Code-excited Linear Prediction Coding (CELP)

Vocoding

SLIDE 4

PCM

Signal is sampled at regular intervals

– Sampling rate = 8 kHz (Nyquist Rate)

Samples are quantized and transmitted

– 8 bits/sample ⇒ 64 kbps
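The 64 kbps figure follows directly from the sampling rate and the sample size. A minimal sketch of that arithmetic, taking the 4 kHz voice bandwidth from the earlier slide (variable names are illustrative):

```python
# PCM bit rate from the figures quoted on the slides.
bandwidth_hz = 4_000                  # voice bandwidth
sampling_rate_hz = 2 * bandwidth_hz   # Nyquist rate: twice the bandwidth
bits_per_sample = 8

bit_rate_bps = sampling_rate_hz * bits_per_sample

print(f"sampling rate: {sampling_rate_hz} Hz")
print(f"bit rate: {bit_rate_bps / 1000:.0f} kbps")
```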

SLIDE 5

Sampling and Quantization

SLIDE 6

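Voice Quality Measure

Quantization is a source of degradation (noise)

May be measured by

$$
\mathrm{SNR} = \frac{\sigma_x^2}{\sigma_q^2}
= \frac{\displaystyle\sum_{k=1}^{N} \int_{X_{k-1}}^{X_k} x^2 \, p(x) \, dx}
       {\displaystyle\sum_{k=1}^{N} \int_{X_{k-1}}^{X_k} (x - y_k)^2 \, p(x) \, dx}
$$

Where

– $p(x)$ is the probability density function of the signal
– $X_k$ is the decision level
– $y_k$ is the quantizer output level for the interval between $X_{k-1}$ and $X_k$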
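The SNR measure on this slide (signal power over quantization noise power, each summed over the decision intervals) can be evaluated numerically. The sketch below is only an illustration under stated assumptions: the input is uniformly distributed on [-1, 1], the quantizer has 16 equal intervals with midpoint output levels, and the integrals are approximated with the midpoint rule. The function name `quantizer_snr` is hypothetical.

```python
import math

def quantizer_snr(pdf, decision_levels, output_levels, steps=200):
    """Approximate SNR = sigma_x^2 / sigma_q^2 by midpoint-rule integration
    over each interval [X_{k-1}, X_k] with output level y_k."""
    signal_power = 0.0
    noise_power = 0.0
    for k in range(1, len(decision_levels)):
        lo, hi = decision_levels[k - 1], decision_levels[k]
        y_k = output_levels[k - 1]
        dx = (hi - lo) / steps
        for i in range(steps):
            x = lo + (i + 0.5) * dx
            weight = pdf(x) * dx
            signal_power += x * x * weight
            noise_power += (x - y_k) ** 2 * weight
    return signal_power / noise_power

# Assumed example: 16-level uniform quantizer, input uniform on [-1, 1].
n_levels = 16
delta = 2.0 / n_levels
decision_levels = [-1.0 + k * delta for k in range(n_levels + 1)]
output_levels = [-1.0 + (k + 0.5) * delta for k in range(n_levels)]

def uniform_pdf(x):
    return 0.5  # density of a uniform distribution on [-1, 1]

snr = quantizer_snr(uniform_pdf, decision_levels, output_levels)
snr_db = 10 * math.log10(snr)
print(f"SNR = {snr:.1f} ({snr_db:.1f} dB)")
```

For this particular input the ratio should come out close to the square of the number of levels, which gives a quick sanity check on the integration.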

SLIDE 7

Uniform Quantizer

Interval between consecutive decision levels is constant

– $X_k - X_{k-1} = \Delta$ (constant)

Problem

– SNR is not constant
– Depends on amplitude
– The soft speaker is penalized more than a loud speaker
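The amplitude dependence can be seen by quantizing a loud and a soft tone with the same fixed step size. This is a rough sketch, assuming an 8-bit mid-rise quantizer over [-1, 1) and sine-wave test signals; the amplitudes, the 437 Hz test frequency and the function names are illustrative choices, not values from the slides.

```python
import math

def uniform_quantize(x, bits=8):
    """Mid-rise uniform quantizer over [-1, 1) with a constant step size."""
    n_levels = 2 ** bits
    delta = 2.0 / n_levels
    index = math.floor(x / delta)
    # Clamp to the available levels so out-of-range samples saturate.
    index = max(-n_levels // 2, min(n_levels // 2 - 1, index))
    return (index + 0.5) * delta

def sine_snr_db(amplitude, bits=8, n=8000, freq_hz=437.0, fs_hz=8000.0):
    """Measured SNR in dB for a sine of the given amplitude."""
    signal = [amplitude * math.sin(2 * math.pi * freq_hz * i / fs_hz)
              for i in range(n)]
    signal_power = sum(s * s for s in signal) / n
    noise_power = sum((s - uniform_quantize(s, bits)) ** 2 for s in signal) / n
    return 10 * math.log10(signal_power / noise_power)

loud_db = sine_snr_db(0.9)    # loud speaker (assumed amplitude)
soft_db = sine_snr_db(0.01)   # soft speaker (assumed amplitude)
print(f"loud: {loud_db:.1f} dB, soft: {soft_db:.1f} dB")
```

The noise power stays roughly the same in both cases because the step size is fixed, so the soft signal ends up with a much lower SNR.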

SLIDE 8

Non Uniform Quantizer

μ-Law (North America)

A-Law (Europe)
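A non-uniform quantizer is commonly realized as a compressor followed by a uniform quantizer. The sketch below shows the continuous μ-law compression curve with μ = 255 and its inverse; deployed codecs use a piecewise-linear approximation of this curve, so treat this as an illustration of the idea rather than a bit-exact codec. The function names are hypothetical.

```python
import math

MU = 255.0  # value used with mu-law telephony

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], stretching small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_law_compress(x)
    print(f"x = {x:<5} compressed = {y:.3f} restored = {mu_law_expand(y):.3f}")
```

Small inputs occupy a much larger share of the compressed range, so a uniform quantizer applied after compression gives soft speakers finer effective steps.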

SLIDE 9

Adaptive Differential PCM

Takes advantage of the slow rate of change in the voice signal:

– Quantizes and transmits the difference between consecutive samples
– May use linear prediction of the signal
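The difference-coding idea can be sketched with plain DPCM. This is a simplification: the step size is fixed (real ADPCM adapts it), the predictor is just the previous reconstructed sample, and the function names and test signal are assumptions made for illustration.

```python
import math

def dpcm_encode(samples, step):
    """Quantize the difference between each sample and the previous
    reconstructed value (the simplest possible predictor)."""
    codes = []
    predicted = 0.0
    for s in samples:
        code = round((s - predicted) / step)
        codes.append(code)
        predicted += code * step  # track what the decoder will reconstruct
    return codes

def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the quantized differences."""
    out = []
    value = 0.0
    for c in codes:
        value += c * step
        out.append(value)
    return out

# Assumed test signal: a slowly varying 100 Hz tone sampled at 8 kHz.
fs_hz = 8000
samples = [math.sin(2 * math.pi * 100 * n / fs_hz) for n in range(400)]
step = 1.0 / 128  # same step size as 8-bit uniform PCM over [-1, 1)

codes = dpcm_encode(samples, step)
decoded = dpcm_decode(codes, step)
max_error = max(abs(a - b) for a, b in zip(samples, decoded))
print(f"largest code magnitude: {max(abs(c) for c in codes)}")
print(f"max reconstruction error: {max_error:.5f}")
```

Because consecutive samples are close together, the codes stay small and need fewer bits than the samples themselves, while the reconstruction error remains bounded by half a step.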

SLIDE 10

CELP (Code-excited Linear Prediction) Coding

Coder

– Voice is analyzed in frames of 10~30 ms represented by:
  – Synthesis filter: updated by linear prediction
  – Excitation: optimally selected so as to minimize a “perceptually” weighted measure of distortion; makes use of a codebook
– A data frame is produced & transmitted

Decoder

– Excitation Signal → LP filter → Reproduced Waveform
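The coder and decoder roles on this slide can be sketched as a codebook search by analysis-by-synthesis. Several things here are assumptions for illustration only: the three-entry codebook, the single fixed LP coefficient, and the use of plain squared error in place of a perceptually weighted measure. The function names are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole LP synthesis filter: y[n] = e[n] + sum_i lpc[i] * y[n-1-i]."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for i, a in enumerate(lpc):
            if n - 1 - i >= 0:
                y += a * out[n - 1 - i]
        out.append(y)
    return out

def search_codebook(target, codebook, lpc):
    """Try every codebook entry, pick the one whose synthesized output
    (with its best gain) is closest to the target frame."""
    best = None
    for index, vector in enumerate(codebook):
        synth = synthesize(vector, lpc)
        energy = sum(v * v for v in synth)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, synth)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

# Toy setup (assumed values, far smaller than any practical coder).
codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, -1.0],
    [1.0, 1.0, 1.0, 1.0],
]
lpc = [0.9]
target = [2.0 * v for v in synthesize(codebook[1], lpc)]

# Coder: only the index and gain (plus filter parameters) are transmitted.
index, gain, error = search_codebook(target, codebook, lpc)

# Decoder: excitation signal -> LP filter -> reproduced waveform.
decoded = [gain * v for v in synthesize(codebook[index], lpc)]
print(index, round(gain, 3))
```

The point of the structure is that the transmitted frame carries a codebook index and a few parameters instead of waveform samples.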

SLIDE 11

VoCoding

For very low bit rates (≅ 2 kbps)

Based on modeling the speech production mechanisms rather than the waveform

– Speech is processed in frames of 10~25 ms
– Distinction between voiced & unvoiced frames
  – Voiced speech: vocal cords vibrating (e.g. vowels)
  – Unvoiced speech: vocal cords held firm w/o vibration (e.g. consonants)

Speech is represented by

– Coefficients that define vocal tract resonance characteristics
– Excitation energy
– Pitch value
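The slides do not say how a frame is classified as voiced or unvoiced. One common heuristic, shown here purely as an assumption, is the zero-crossing rate: voiced frames are dominated by a low pitch frequency and cross zero rarely, while noise-like unvoiced frames cross zero often. The threshold, the 20 ms frame length and the synthetic test frames are illustrative choices.

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_voiced(frame, zcr_threshold=0.25):
    """Heuristic decision: low zero-crossing rate suggests a voiced frame."""
    return zero_crossing_rate(frame) < zcr_threshold

fs_hz = 8000
frame_len = fs_hz * 20 // 1000  # 20 ms frame, within the 10~25 ms range

# Synthetic stand-ins: a 150 Hz tone for a vowel, white noise for a consonant.
vowel_like = [math.sin(2 * math.pi * 150 * n / fs_hz) for n in range(frame_len)]
rng = random.Random(0)
noise_like = [rng.uniform(-1.0, 1.0) for _ in range(frame_len)]

print("vowel-like frame voiced:", is_voiced(vowel_like))
print("noise-like frame voiced:", is_voiced(noise_like))
```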

SLIDE 12

VoCoding (cont)

Low quality

– Unnatural, buzzy

Works only for human speech

– Not optimized for other audio signals

Little current interest

– No international standard yet

SLIDE 13

Motivating Voice Compression

– MOS: Mean Opinion Score (subjective measure of voice quality)
– CELP: Code-excited Linear Prediction
– LD: Low Delay
– CS-ACELP: Conjugate Structure Algebraic CELP
– MP-MLQ: Multi-Pulse Excitation with a Maximum Likelihood Quantizer

SLIDE 14

Speech Activity

Speech alternates between two states

– Silence
– Talk spurt

SLIDE 15

Speech Activity (cont)

– One speaker talking: 64 ~ 73 %
– Both speakers talking: 3 ~ 7 %
– Both speakers silent: 33 ~ 20 %
– Silence: Avg Time ≈ 1.8 sec
– Talkspurt: Avg Time ≈ 1.2 sec
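The two average durations give a rough activity factor for a single speaker, if speech is treated as alternating between the two states. The sketch assumes the 1.2 s figure belongs to talkspurts and the 1.8 s figure to silences; the variable names are illustrative.

```python
# Two-state (talkspurt / silence) view of one speaker.
mean_talkspurt_s = 1.2   # assumed to be the talkspurt average
mean_silence_s = 1.8     # assumed to be the silence average

# Long-run fraction of time spent in the talkspurt state.
activity_factor = mean_talkspurt_s / (mean_talkspurt_s + mean_silence_s)

print(f"activity factor: {activity_factor:.0%}")
print(f"silent fraction: {1 - activity_factor:.0%}")
```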

SLIDE 16

Silence Suppression

Voice Activity Detector (VAD)

– When silence is detected, background noise is transmitted
– When speech is detected, full fixed bit rate stream is transmitted

About 60% reduction in data rate

– Resulting traffic is no longer constant bit rate
– Statistical Multiplexing gain may be significant
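The multiplexing gain can be illustrated with a simple binomial model. The assumptions are loud ones: each source is active 40% of the time (matching the roughly 60% reduction quoted here), sources are independent, and traffic sent during silence is ignored. The source count, channel counts and function name are hypothetical.

```python
import math

peak_rate_kbps = 64.0
activity = 0.4  # assumed fraction of time a source is in a talkspurt

# Average rate per source once silences are suppressed.
mean_rate_kbps = peak_rate_kbps * activity

def prob_more_than_active(n_sources, channels, p=activity):
    """Probability that more than `channels` of `n_sources` independent
    on-off sources are talking at the same instant (binomial tail)."""
    return sum(
        math.comb(n_sources, k) * p ** k * (1 - p) ** (n_sources - k)
        for k in range(channels + 1, n_sources + 1)
    )

n_sources = 50
print(f"mean rate per source: {mean_rate_kbps:.1f} kbps")
for channels in (20, 25, 30):
    overflow = prob_more_than_active(n_sources, channels)
    print(f"{channels} channels for {n_sources} sources: "
          f"P(overflow) = {overflow:.4f}")
```

Provisioning fewer 64 kbps channels than there are sources trades a small overflow probability for a large saving in capacity, which is the gain the slide refers to.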