Speech & Audio Coding TSBK01 Image Coding and Data Compression - PowerPoint PPT Presentation

Speech & Audio Coding TSBK01 Image Coding and Data Compression Lecture 11, 2003 Jörgen Ahlberg

Outline • Part I - Speech – Speech – History of speech synthesis & coding – Speech coding methods • Part II – Audio – Psychoacoustic models – MPEG-4 Audio

Speech Production • The human vocal apparatus consists of: – lungs – trachea (windpipe) – larynx • contains two folds of skin called vocal cords, which blow apart and flap together as air is forced through – oral tract – nasal tract

The Speech Signal

History of Speech Coding

Source-filter Model of Speech Production

Speech Coding Strategies 1. PCM • Invented 1926, deployed 1962. • The speech signal is sampled at 8 kHz. • Uniform quantization requires >10 bits/sample. • Non-uniform quantization (G.711, 1972) • Quantizing y to 8 bits -> 64 kbit/s.
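The non-uniform quantization step can be illustrated with a small sketch. It assumes the continuous mu-law companding curve with mu = 255; real G.711 encoders use a piecewise-linear approximation of such a curve, so this is not the standard's bit-exact procedure. The function names are hypothetical.

```python
import numpy as np

MU = 255.0  # assumed companding constant; not stated on the slide

def mu_law_compress(x):
    """Map samples x in [-1, 1] to the companded value y in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def quantize_uniform(y, bits=8):
    """Uniform quantizer on [-1, 1] with 2**bits levels (cell midpoints)."""
    levels = 2 ** bits
    step = 2.0 / levels
    idx = np.clip(np.floor((np.asarray(y, dtype=float) + 1.0) / step), 0, levels - 1)
    return (idx + 0.5) * step - 1.0

def bit_rate(sample_rate_hz=8000, bits=8):
    """Bits per second for fixed-rate coding of each sample."""
    return sample_rate_hz * bits

# A quiet signal: companding before the 8-bit quantizer keeps its error small.
quiet = 0.01 * np.sin(np.linspace(0.0, 2.0 * np.pi, 200))
plain = quantize_uniform(quiet)
companded = mu_law_expand(quantize_uniform(mu_law_compress(quiet)))
print(bit_rate(), np.max(np.abs(plain - quiet)), np.max(np.abs(companded - quiet)))
```

Quantizing the companded value y rather than the raw sample spends more of the 8-bit range on small amplitudes, which is why 8 bits can stand in for the more than 10 bits that uniform quantization needs.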

Speech Coding Strategies 2. Adaptive DPCM • Example: G.726 (1974) • Adaptive predictor based on six previous differences. • Gain-adaptive quantizer with 15 levels → 32 kbit/s.
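The idea behind adaptive DPCM can be sketched with a toy coder. This is not G.726: the predictor here is simply the previous reconstructed sample, and the step-size rule (grow by 1.5, shrink by 0.9) is an assumption chosen for illustration. Only the 15-level quantizer and the 8 kHz sampling rate come from the slides.

```python
import math
import numpy as np

def bits_per_sample(levels):
    """Bits needed to index one of `levels` quantizer outputs."""
    return math.ceil(math.log2(levels))

def adpcm_toy(x, levels=15, step0=0.02):
    """Toy ADPCM: code the difference to a prediction with an adaptive step."""
    half = levels // 2          # 15 levels -> indices -7..7
    step, pred = step0, 0.0
    codes, recon = [], []
    for s in np.asarray(x, dtype=float):
        d = float(s) - pred                       # prediction difference
        q = max(-half, min(half, int(round(d / step))))
        r = pred + q * step                       # what the decoder rebuilds
        codes.append(q)
        recon.append(r)
        pred = r                                  # predict from decoded output
        # Gain adaptation (assumed rule): widen on large codes, else narrow.
        step *= 1.5 if abs(q) > half // 2 else 0.9
        step = min(max(step, 1e-4), 1.0)
    return np.array(codes), np.array(recon)

t = np.arange(400) / 8000.0
x = 0.5 * np.sin(2.0 * np.pi * 200.0 * t)
codes, recon = adpcm_toy(x)
print(bits_per_sample(15) * 8000, np.mean(np.abs(recon - x)))
```

Because the decoder sees the same codes, it can repeat the same prediction and step adaptation without any side information; 15 levels fit in 4 bits, and 4 bits at 8 kHz give the 32 kbit/s quoted above.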

Speech Coding Strategies 3. Model-based Speech Coding • Advanced speech coders are based on models of how speech is produced: Excitation source → Vocal tract

An Excitation Source • Noise generator • Pulse generator (controlled by the pitch)

Vocal Tract Filter 1: A Fixed Filter Bank • Band-pass (BP) filters with gains $g_1, g_2, \ldots, g_n$

Vocal Tract Filter 2: A Controllable Filter

Linear Predictive Coding (LPC) • The controllable filter is modelled as $y_n = \sum_i a_i y_{n-i} + G \varepsilon_n$, where $\varepsilon_n$ is the input signal and $y_n$ is the output. • We need to estimate the vocal tract parameters ($a_i$ and $G$) and the excitation parameters (pitch, v/uv). • Typically the source signal is divided into short segments and the parameters are estimated for each segment. • Example: The speech signal is sampled at 8 kHz and divided into segments of 180 samples (22.5 ms/segment).
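The filter recursion can be written out directly. The sketch below is a minimal illustration that assumes the sum runs over i = 1, ..., order and that samples before the start of the segment are zero; the function names are hypothetical.

```python
import numpy as np

def lpc_synthesize(a, gain, excitation):
    """Run y[n] = sum_i a[i] * y[n-i] + gain * excitation[n], i = 1..len(a)."""
    a = np.asarray(a, dtype=float)
    e = np.asarray(excitation, dtype=float)
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = gain * e[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:                 # samples before the segment are zero
                acc += a[i - 1] * y[n - i]
        y[n] = acc
    return y

def pulse_train(length, pitch_period):
    """Voiced excitation: one unit pulse every `pitch_period` samples."""
    e = np.zeros(length)
    e[::pitch_period] = 1.0
    return e

# One 180-sample segment at 8 kHz lasts 180 / 8000 = 0.0225 s.
segment = lpc_synthesize([1.3, -0.6], 1.0, pulse_train(180, 60))
print(len(segment), 180 / 8000)
```

For an unvoiced segment the pulse train would be replaced by noise, with the same filter recursion.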

Typical Scheme of an LPC Coder • Blocks: noise generator, pulse generator, vocal tract filter • Parameters: pitch, v/uv, gain, filter coeffs

Estimating the Parameters • v/uv estimation – Based on energy and frequency spectrum. • Pitch-period estimation – Look for periodicity, either via the autocorrelation function (acf) or via some other measure that attains a minimum when the lag p equals the pitch period. – Typical pitch periods: 20 - 160 samples.
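Both kinds of pitch search can be sketched over the lag range 20 to 160 samples. The minimum-seeking measure used here is the average magnitude difference function (AMDF); that choice is an assumption, since the slide's own formula did not survive. The function names are hypothetical.

```python
import numpy as np

def pitch_by_acf(segment, min_lag=20, max_lag=160):
    """Return the lag with the largest autocorrelation value."""
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()
    acf = [np.dot(x[:-p], x[p:]) for p in range(min_lag, max_lag + 1)]
    return min_lag + int(np.argmax(acf))

def pitch_by_amdf(segment, min_lag=20, max_lag=160):
    """Return the lag with the smallest average magnitude difference."""
    x = np.asarray(segment, dtype=float)
    amdf = [np.mean(np.abs(x[p:] - x[:-p])) for p in range(min_lag, max_lag + 1)]
    return min_lag + int(np.argmin(amdf))

# Synthetic voiced excitation: one pulse every 50 samples.
signal = np.zeros(400)
signal[::50] = 1.0
print(pitch_by_acf(signal), pitch_by_amdf(signal))
```

Both searches return the first best lag, which favours the true period over its multiples on a clean periodic signal; real speech usually needs extra safeguards against pitch doubling and halving.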

Estimating the Parameters • Vocal tract filter estimation – Find the filter coefficients $a_i$ that minimize the squared prediction error $\left( y_n - \sum_i a_i y_{n-i} \right)^2 = (G \varepsilon_n)^2$. – Compare to the computation of optimal predictors (Lecture 7).

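Estimating the Parameters • Assuming a stationary signal, the optimal coefficients satisfy $\mathbf{R}\mathbf{a} = \mathbf{p}$, where $\mathbf{R}$ and $\mathbf{p}$ contain acf values and $\mathbf{a}$ collects the coefficients $a_i$. • This is called the autocorrelation method.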
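A minimal sketch of the autocorrelation method, assuming the usual normal equations R a = p with R built from acf values at lags |i - j| and p from lags 1 to the filter order. It solves the system with a general linear solver rather than a dedicated fast recursion, and the function name is hypothetical.

```python
import numpy as np

def lpc_autocorrelation(segment, order):
    """Estimate predictor coefficients a_1..a_order from acf values."""
    x = np.asarray(segment, dtype=float)
    n = len(x)
    # acf estimates r[0..order], treating samples outside the segment as zero
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    p = r[1:order + 1]
    a = np.linalg.solve(R, p)
    residual_energy = r[0] - np.dot(a, p)   # energy left after prediction
    return a, residual_energy

# Check on a synthetic signal generated by a known second-order model.
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
y = np.zeros(len(e))
for k in range(len(e)):
    y[k] = e[k]
    if k >= 1:
        y[k] += 1.3 * y[k - 1]
    if k >= 2:
        y[k] -= 0.6 * y[k - 2]
a_hat, energy = lpc_autocorrelation(y, 2)
print(a_hat, energy)
```

The residual energy is one natural starting point for the gain G, although how G is scaled depends on how the excitation is normalized.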

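Estimating the Parameters • Alternatively, in the case of a non-stationary signal, the acf values are replaced by sums of products $y_{n-i} y_{n-j}$ computed directly over the samples of the segment. • This is called the autocovariance method.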
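A minimal sketch of the autocovariance method, assuming it amounts to a least-squares fit of the predictor over the samples of one segment, without treating the signal as zero outside it. The function name and the matrix names C and c are hypothetical.

```python
import numpy as np

def lpc_covariance(segment, order):
    """Least-squares predictor over the segment (covariance-style estimate)."""
    x = np.asarray(segment, dtype=float)
    n = len(x)
    # Column i-1 holds y[n-i] for every n = order .. len(x)-1
    past = np.column_stack([x[order - i:n - i] for i in range(1, order + 1)])
    target = x[order:]
    C = past.T @ past        # sums of products y[n-i] * y[n-j]
    c = past.T @ target      # sums of products y[n-i] * y[n]
    return np.linalg.solve(C, c)

# Noise-free check: a short signal that obeys y[n] = 1.3 y[n-1] - 0.6 y[n-2].
y = np.zeros(30)
y[0], y[1] = 1.0, 1.3
for k in range(2, 30):
    y[k] = 1.3 * y[k - 1] - 0.6 * y[k - 2]
print(lpc_covariance(y, 2))
```

Unlike the autocorrelation estimate, C is generally not a Toeplitz matrix, so each entry has to be computed from the segment itself.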

Example • Coding of parameters using LPC10 (1984):
– v/uv: 1 bit
– Pitch: 6 bits
– Voiced filter: 46 bits
– Unvoiced filter: 46 bits
– Synchronization: 1 bit
– Sum: 54 bits → 2.4 kbit/s
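The sum can be checked with a few lines of arithmetic. The sketch assumes that each segment carries only one of the two 46-bit filter entries, depending on the v/uv decision (the listed sum of 54 bits is consistent with that reading), and that segments are 180 samples at 8 kHz as in the earlier LPC example. The names are hypothetical.

```python
BITS = {
    "v/uv": 1,
    "pitch": 6,
    "voiced filter": 46,
    "unvoiced filter": 46,
    "synchronization": 1,
}

def frame_bits(voiced):
    """Bits per segment, counting only the filter entry that applies."""
    filter_key = "voiced filter" if voiced else "unvoiced filter"
    return BITS["v/uv"] + BITS["pitch"] + BITS[filter_key] + BITS["synchronization"]

def bit_rate(bits_per_frame, sample_rate_hz=8000, samples_per_frame=180):
    """Bits per second when one frame is sent per segment."""
    return bits_per_frame * sample_rate_hz / samples_per_frame

print(frame_bits(True), frame_bits(False), bit_rate(frame_bits(True)))
```

With 54 bits every 22.5 ms, the rate is 54 × 8000 / 180 = 2400 bit/s, matching the 2.4 kbit/s in the table.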
