Section 3: Digitising Speech, Music & Video

University of Manchester, School of Computer Science
CS3282: Digital Communications
Section 3: Digitising Speech, Music & Video
29Dec'06 Comp30282 Sectn 3

3.1 Digitising speech
• Traditional telephone channels restrict speech to 300-3400 Hz.
• This is considered not to incur serious loss of intelligibility.
• It does have a significant effect on the naturalness of the sound.
• Once band-limited, speech may be sampled at 8 kHz.
• The ITU-T G711 standard for speech in POTS allocates 64000 b/s: an 8 kHz sampling rate with 8 bits/sample.
• Exercise 3.1: Why are components below 300 Hz removed?

3.1.1. International standards for speech coding
• The CCITT was the committee of the ITU (a United Nations agency) responsible for telephony standards until 1993.
• Since 1993, the work of the CCITT has been carried on by the ITU-T.
• Within the ITU-T, a study group is responsible for speech digitisation & coding standards.
• Among other organisations defining standards for telecoms & telephony are:
  - “TCH-HS”: part of ETSI (GSM).
  - “TIA”: USA equivalent of ETSI.
  - “RCR”: Japanese equivalent of ETSI.
  - “Inmarsat” & various committees within NATO.
• Standards also exist for digitising “wide-band” speech (50 Hz to 7 kHz), e.g. ITU-T G722.

3.1.2. Uniform quantisation
• Quantisation: each sample, x[n], of x(t) is approximated by the closest available quantisation level.
• Uniform quantisation: constant voltage difference ∆ between levels.
• With 8 bits & input range ±V, there are 256 levels with ∆ = 2V/256 = V/128.
• If x(t) lies between ±V & samples are rounded, uniform quantisation produces x’[n] = x[n] + e[n] where −∆/2 ≤ e[n] ≤ ∆/2.
• If x(t) goes outside ±V, overflow will occur & the magnitude of the error may be much greater than ∆/2.
• Overflow is best avoided.
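
The rounding rule above can be sketched in a few lines of Python. This is only an illustration: the function name `uniform_quantise` is made up for this example, and it assumes a "mid-rise" arrangement (levels at odd multiples of ∆/2) so that the 256 levels sit symmetrically inside ±V and the error stays within ±∆/2 for every in-range input.

```python
import math


def uniform_quantise(x, v=1.0, bits=8):
    """Return (quantised value, step size) for an input range of -v..+v.

    Assumption for this sketch: mid-rise levels at (k + 0.5) * delta,
    with k running from -2**(bits-1) to 2**(bits-1) - 1.
    """
    levels = 2 ** bits
    delta = 2.0 * v / levels            # V/128 when bits = 8
    k = math.floor(x / delta)           # index of the cell containing x
    # Clamp to the available codes; inputs outside +/-v overflow here.
    k = max(-levels // 2, min(levels // 2 - 1, k))
    return (k + 0.5) * delta, delta


if __name__ == "__main__":
    for x in (0.3, -0.999, 2.0):        # 2.0 is outside the range: overflow
        xq, delta = uniform_quantise(x)
        print(f"x={x:+.3f}  error/delta={(xq - x) / delta:+.3f}")
```

For in-range inputs the printed error never exceeds half a step in magnitude; the out-of-range input shows how large the error becomes on overflow.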

Noise due to uniform quantisation error
• Samples e[n] are “random” within ±∆/2.
• When the quantised signal is converted back to analogue, this adds a random error or “noise” signal to x(t).
• The noise is heard as sound added to x(t).
• Samples e[n] have uniform probability density between ±∆/2, i.e. p(e) = 1/∆. It follows that the mean square value of e[n] is:
$$\overline{e^2} = \int_{-\Delta/2}^{\Delta/2} e^2\,p(e)\,de = \frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} e^2\,de = \frac{1}{\Delta}\left[\frac{e^3}{3}\right]_{-\Delta/2}^{\Delta/2} = \frac{\Delta^2}{12}$$
• This is the power of the analogue quantisation noise in the range 0 Hz to f_s/2.
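
The ∆²/12 result can be checked numerically. The sketch below is not part of the original notes: it assumes inputs drawn uniformly from ±V and the same mid-rise rounding rule used for illustration elsewhere, and the name `quantisation_noise_power` is hypothetical.

```python
import math
import random


def quantisation_noise_power(bits=8, v=1.0, n=200_000, seed=0):
    """Return (measured mean-square error, theoretical delta**2 / 12)."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    delta = 2.0 * v / 2 ** bits
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-v, v)
        xq = (math.floor(x / delta) + 0.5) * delta   # nearest mid-rise level
        total += (xq - x) ** 2
    return total / n, delta ** 2 / 12.0


if __name__ == "__main__":
    measured, theory = quantisation_noise_power()
    print(f"measured={measured:.3e}  theory={theory:.3e}")
```

With a few hundred thousand samples the measured value should agree with ∆²/12 to within a percent or so.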

3.1.3. Signal-to-quantisation noise ratio (SQNR)
• Measures how seriously the signal is degraded by quantisation noise:
$$\mathrm{SQNR} = 10\log_{10}\left(\frac{\text{signal power}}{\text{quantisation noise power}}\right)\ \text{in decibels (dB)}$$
• With uniform quantisation, the quantisation-noise power in the range 0 to f_s/2 is ∆²/12 & is independent of signal power.
• ∴ SQNR will depend on signal power.
• Amplify the signal as much as possible without overflow.
• Then, for sinusoidal waveforms with an m-bit uniform quantiser: SQNR ≈ 6m + 1.8 dB.
• Approximately true for speech also.
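
The 6m + 1.8 dB rule can be recovered in a few lines. The working below assumes a sine wave of peak amplitude V that exactly fills the quantiser range, and ∆ = 2V/2^m as in section 3.1.2:

```latex
\begin{align*}
\text{signal power} &= \frac{V^2}{2}, \qquad
\text{noise power} = \frac{\Delta^2}{12} = \frac{(2V/2^m)^2}{12} = \frac{V^2}{3 \cdot 2^{2m}} \\
\mathrm{SQNR} &= 10\log_{10}\!\left(\frac{V^2/2}{V^2/(3 \cdot 2^{2m})}\right)
              = 10\log_{10}\!\left(1.5 \cdot 2^{2m}\right) \\
              &= 20\,m\log_{10}(2) + 10\log_{10}(1.5) \approx 6.02\,m + 1.76\ \text{dB}
\end{align*}
```

Rounding the two constants gives the 6m + 1.8 dB figure quoted above.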

Variation of input levels
• For telephone users with loud voices & quiet voices, quantisation noise will have the same power.
• SQNR is better for loud voices than for quiet voices.
• If SQNR is made acceptable for quiet voices, it may be better than necessary for loud voices.
• Useful to know over what dynamic range of input powers the SQNR will remain acceptable to users.
[Figure: quantisation levels 000 to 111 on a volts axis for three cases: ∆ too big for a quiet voice, ∆ OK, ∆ too small for a loud voice.]

3.1.4. Dynamic Range (Dy)
$$D_y = 10\log_{10}\left(\frac{\text{max possible signal power (no overflow)}}{\text{min power which gives acceptable SQNR}}\right)\ \text{dB}$$
Exercise: If SQNR must be at least 30 dB to be acceptable, what is Dy assuming sine-waves & an 8-bit uniform quantiser?
Solution: Dy = max possible SQNR − min acceptable SQNR (dB) = (6m + 1.8) − 30 = 49.8 − 30 = 19.8 dB. Too small for telephony.
Exercise: Repeat this calculation for 12-bit uniform quantisation.
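
Because the quantisation-noise power is fixed, a ratio of signal powers in dB equals the corresponding difference in SQNR, which is why the solution is a simple subtraction. A minimal sketch of that arithmetic, using the 6m + 1.8 dB approximation from section 3.1.3 (the function names are invented for this example):

```python
def max_sqnr_db(bits):
    """Approximate SQNR of a full-scale sine wave with a uniform quantiser."""
    return 6.0 * bits + 1.8


def dynamic_range_db(bits, min_acceptable_sqnr_db=30.0):
    """Dynamic range: how far the signal power may fall below full scale
    before the SQNR drops under the acceptable minimum."""
    return max_sqnr_db(bits) - min_acceptable_sqnr_db


if __name__ == "__main__":
    print(f"{dynamic_range_db(8):.1f} dB")   # prints 19.8 dB
```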

3.1.5. Instantaneous companding
• Eight bits per sample is not sufficient for good speech encoding with uniform quantisation.
• The problem lies with setting a suitable quantisation step-size ∆.
• If ∆ is too large, small signal levels will have SQNR too low.
• If ∆ is too small, large signal levels are distorted due to overflow.
• One solution is to use instantaneous companding.
• Step-size is adjusted according to the amplitude of the sample.
• For larger amplitudes, larger step-sizes are used, as illustrated next.
• ‘Instantaneous’ because the step-size changes from sample to sample.

Non-uniform quantisation used for companding
[Fig. 3.1: x(t) plotted against t, with non-uniformly spaced quantisation levels labelled 001 to 111.]

Analogue implementation of companding
• Pass x(t) through a compressor to produce y(t).
• y(t) is quantised uniformly & transmitted or stored as {y’[n]}.
• At the receiver, {y’[n]} is converted by a DAC & passed through an expander.
• The expander reverses the effect of the compressor.
• Analogue implementation is uncommon but shows the concept well.
[Block diagram: x(t) → Compressor → y(t) → ADC with uniform quantiser → {y’[n]} → transmit or store → Uniform DAC → y’(t) → Expander → x’(t)]

Digital implementation of companding
• x(t) is sampled & digitised with a high word-length (say 12 or 16 bits).
• Each x[n] is passed through a compressor to produce y[n].
• Each y[n] is truncated to the required word-length (say 8 bits).
• {y’[n]} is transmitted or stored.
• At the receiver, each y’[n] is passed through an expander, which reverses the effect of the compressor to give x’[n].
• {x’[n]} may be converted by a DAC with 12 or 16 bit word-length.
[Block diagram: x(t) → 16-bit ADC → {x[n]} → Compressor → {y[n]} → Uniform quantiser → {y’[n]} → transmit or store → Expander → {x’[n]} → 16-bit DAC → x’(t)]

Uniformly quantising a digital signal
• If a signal is digitised with uniform quantisation at 16 bits/sample, it can be further quantised (uniformly) to 12 bits by shifting right by four bit positions and discarding the bits shifted out.
• Equivalent to dividing by 16 & rounding down to an integer.
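
As a small illustration of the shift, the sketch below reduces a signed 16-bit sample to 12 bits. The function name is hypothetical; it relies on Python's `>>` being an arithmetic shift on signed integers, so negative samples are rounded down (towards minus infinity), exactly as floor division by 16 would do.

```python
def requantise_16_to_12(sample16):
    """Reduce a signed 16-bit sample to a signed 12-bit sample."""
    if not -32768 <= sample16 <= 32767:
        raise ValueError("sample is outside the signed 16-bit range")
    return sample16 >> 4                # drop the four least significant bits


if __name__ == "__main__":
    for s in (32767, 1000, -1, -32768):
        # The shift and floor division by 16 always agree.
        print(s, requantise_16_to_12(s), s // 16)
```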

‘A-Law’ instantaneous companding
A common compressor is linear for |x(t)| close to zero & logarithmic for larger values. A suitable formula is:
$$y(t) = \begin{cases} \dfrac{A\,x(t)}{K V} & : |x(t)| \le \dfrac{V}{A} \\[2ex] \operatorname{sign}(x(t))\left(1 + \dfrac{1}{K}\log_e\!\left(\dfrac{|x(t)|}{V}\right)\right) & : V \ge |x(t)| > \dfrac{V}{A} \end{cases}$$
where K = 1 + log_e(A).
A is a constant which determines the cross-over between linear & log.
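
A direct transcription of this compressor formula into Python is sketched below. The function name and the default values (V = 1, A = 87.6) are choices made for this example. It implements the continuous formula only, not the segmented 8-bit encoding used in practical codecs.

```python
import math


def a_law_compress(x, v=1.0, a=87.6):
    """Map x in -v..+v onto y in -1..+1 using the A-law formula."""
    k = 1.0 + math.log(a)               # K = 1 + log_e(A)
    ax = abs(x)
    if ax > v:
        raise ValueError("input exceeds +/-V (overflow)")
    if ax <= v / a:
        return a * x / (k * v)          # linear region near zero
    # Logarithmic region; copysign restores the sign of x.
    return math.copysign(1.0 + math.log(ax / v) / k, x)


if __name__ == "__main__":
    for x in (0.0, 0.001, 1.0 / 87.6, 0.1, 1.0, -0.5):
        print(f"x={x:+.5f}  y={a_law_compress(x):+.5f}")
```

The two branches meet at |x(t)| = V/A, where both give |y(t)| = 1/K, so the mapping is continuous.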

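Mapping from x(t) to y(t) by A-law companding
[Figure: y(t) plotted against x(t) for A ≈ 3. The curve is linear for −V/A ≤ x(t) ≤ V/A, reaching y(t) = ±1/K at x(t) = ±V/A, and logarithmic beyond, reaching y(t) = ±1 at x(t) = ±V.]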

A-law mapping again
[Figure: y(t) plotted against x(t) for A ≈ 13, with the same axis markings at x(t) = ±V/A, ±V and y(t) = ±1/K, ±1. (Too difficult to draw if A is any larger.)]

G711 standard ‘A-law’ companding with A = 87.6
• A-law companding as used in the UK, with A = 87.6 & K = 5.47.
• The general formula becomes:
$$y(t) = \begin{cases} \dfrac{16\,x(t)}{V} & : |x(t)| \le \dfrac{V}{87.6} \\[2ex] \operatorname{sign}(x(t))\left(1 + 0.183\,\log_e\!\left(\dfrac{|x(t)|}{V}\right)\right) & : V \ge |x(t)| > \dfrac{V}{87.6} \end{cases}$$
• ≈ 1% of the domain of x(t) is linearly mapped onto ≈ 20% of the range of y(t).
• The remaining ≈ 99% of the domain of x(t) is logarithmically mapped onto ≈ 80% of the range of y(t).

Effect of compressor on sine-wave
[Figure: a sine-wave x(t) of amplitude ±V plotted against t, alongside the compressed output y(t) of amplitude ±1.]

Effect of compressor on triangular wave
[Figure: a triangular wave x(t) of amplitude ±V plotted against t, alongside the compressed output y(t) of amplitude ±1.]

A-law expander formula
$$\hat{x}(t) = \begin{cases} \dfrac{V K\,\hat{y}(t)}{A} & : |\hat{y}(t)| \le \dfrac{1}{K} \\[2ex] \operatorname{sign}(\hat{y}(t))\,V e^{K(|\hat{y}(t)| - 1)} & : \dfrac{1}{K} < |\hat{y}(t)| \le 1 \end{cases}$$
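
To see the compressor and expander working together, the sketch below passes a sample through compress, an 8-bit uniform quantiser and expand. Everything here is illustrative: the function names, the choice V = 1 and the mid-rise quantiser are assumptions of this example, and the code follows the continuous formulas rather than the bit-exact G711 encoding.

```python
import math

A = 87.6
K = 1.0 + math.log(A)


def compress(x, v=1.0):
    """A-law compressor: x in -v..+v to y in -1..+1."""
    ax = abs(x)
    if ax <= v / A:
        return A * x / (K * v)
    return math.copysign(1.0 + math.log(ax / v) / K, x)


def expand(y, v=1.0):
    """A-law expander: the inverse of compress()."""
    ay = abs(y)
    if ay <= 1.0 / K:
        return v * K * y / A
    return math.copysign(v * math.exp(K * (ay - 1.0)), y)


def quantise_unit(y, bits=8):
    """Uniformly quantise y in -1..+1 (mid-rise levels, an assumption)."""
    levels = 2 ** bits
    delta = 2.0 / levels
    k = max(-levels // 2, min(levels // 2 - 1, math.floor(y / delta)))
    return (k + 0.5) * delta


def companded_round_trip(x, v=1.0, bits=8):
    """Compress, quantise to `bits` bits, then expand."""
    return expand(quantise_unit(compress(x, v), bits), v)


if __name__ == "__main__":
    for x in (0.005, 0.05, 0.5):
        err = companded_round_trip(x) - x
        print(f"x={x:.3f}  error={err:+.6f}  relative={err / x:+.4f}")
```

Without the quantiser, `expand(compress(x))` returns x to within floating-point rounding. With it, the absolute error grows with the size of the sample, which is the intended behaviour of companding.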

Graph of A-law expander formula
[Figure: x(t) plotted against y(t) for A ≈ 3, with markings at y(t) = ±1/K, ±1 and x(t) = ±V/A, ±V.]

Effect of expander on ‘small’ samples
• Without quantisation, passing y(t) through the expander would produce the original signal x(t) exactly.
• The expander multiplies ‘small’ samples by VK/A, i.e. reduces them by a factor of 16 relative to V (when A = 87.6).
• Small changes affecting these samples are also divided by 16.
• This reduces changes due to quantisation by a factor of 16.
• It increases the SQNR for ‘small’ samples by 24 dB, since the quantisation error changes by:
20 log10(1/16) = −20 log10(2^4) = −80 log10(2) ≈ −80 × 0.3 = −24 dB

Effect of expander on ‘large’ samples
• If y(t) increasing by ∆y causes x(t) to increase by ∆x, then ∆x/∆y ≈ dx(t)/dy(t).
• Therefore ∆x ≈ (dx(t)/dy(t)) ∆y.
• dx(t)/dy(t) is the ‘quantisation step amplification factor’.

Quantisation step amplification factor
Differentiating (see notes) gives:
$$\frac{dx(t)}{dy(t)} = \begin{cases} \dfrac{V K}{A} & : |x(t)| \le \dfrac{V}{A} \\[2ex] K\,|x(t)| & : \dfrac{V}{A} < |x(t)| \le V \end{cases}$$
in general and, when A = 87.6,
$$\frac{dx(t)}{dy(t)} = \begin{cases} \dfrac{V}{16} & : |x(t)| \le \dfrac{V}{A} \\[2ex] 5.47\,|x(t)| & : \dfrac{V}{A} < |x(t)| \le V \end{cases}$$
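
The derivative can be sanity-checked numerically by comparing the closed-form factor with a finite-difference slope of the expander. This is only a sketch: the names are invented for the example, V = 1 is assumed, and the central-difference step `h` is an arbitrary small value.

```python
import math

A = 87.6
K = 1.0 + math.log(A)


def expand(y, v=1.0):
    """A-law expander, as given in the expander formula."""
    ay = abs(y)
    if ay <= 1.0 / K:
        return v * K * y / A
    return math.copysign(v * math.exp(K * (ay - 1.0)), y)


def step_amplification(x, v=1.0):
    """Closed-form dx/dy: constant for small |x|, proportional to |x| otherwise."""
    ax = abs(x)
    return v * K / A if ax <= v / A else K * ax


def numeric_slope(y, v=1.0, h=1e-6):
    """Central-difference estimate of dx/dy at the point y."""
    return (expand(y + h, v) - expand(y - h, v)) / (2.0 * h)


if __name__ == "__main__":
    for y in (0.1, 0.5, 0.9):
        x = expand(y)
        print(f"y={y}  formula={step_amplification(x):.6f}  "
              f"numeric={numeric_slope(y):.6f}")
```

The two columns should agree closely in both regions, away from the cross-over point at |y(t)| = 1/K where the slope is not smooth.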

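Quantisation step amplification factor
[Figure: dx(t)/dy(t) plotted against x(t) from −V to +V, with markings at KV/A and KV on the vertical axis and V/A on the horizontal axis.]
• For |x(t)| < V/A the amplification is constant at KV/A (= V/16 when A = 87.6).
• For |x(t)| > V/A it increases in proportion to |x(t)|, reaching KV at |x(t)| = V.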
