CTP431- Music and Audio Computing Digital Audio



slide-1
SLIDE 1

CTP431- Music and Audio Computing Digital Audio

Graduate School of Culture Technology KAIST Juhan Nam


slide-2
SLIDE 2

Digital Representations

Sound: … 0 1 1 0 1 1 0 …
Image: … 1 0 0 1 1 0 1 …
Text: … 0 0 1 1 0 1 1 …

slide-3
SLIDE 3

Digital Representations

§ Sampling and Quantization

– Sound (samples)
– Image (pixels)

§ Trade-off

– Resolution (quality) and data size

slide-4
SLIDE 4

Digital Audio Chain

[Figure: digital audio chain; the digital signal is shown as a bit stream … 0 0 1 0 1 0 …]

slide-5
SLIDE 5

Sampling

§ Convert a continuous-time signal to a discrete-time signal by periodically picking up the instantaneous values

– Represented as a sequence of numbers; pulse code modulation (PCM)
– Sampling period (Ts): the amount of time between samples
– Sampling rate (fs = 1/Ts)

Signal notation: x(t) → x(nTs)
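The mapping x(t) → x(nTs) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `sample_sine` and the example values (a 1 kHz tone sampled at 8 kHz) are assumptions chosen for readability.

```python
import math

def sample_sine(freq_hz, fs_hz, num_samples, amplitude=1.0):
    """Return x[n] = amplitude * sin(2*pi*freq_hz*n*Ts) for n = 0..num_samples-1."""
    ts = 1.0 / fs_hz  # sampling period Ts = 1 / fs
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n * ts)
            for n in range(num_samples)]

# Illustrative values (assumed): a 1 kHz tone sampled at fs = 8 kHz,
# so one period of the tone spans exactly 8 samples.
samples = sample_sine(freq_hz=1000.0, fs_hz=8000.0, num_samples=8)
print([round(s, 3) for s in samples])
```

The resulting list of numbers is the PCM representation of the tone before any quantization.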

slide-6
SLIDE 6

Sampling Theorem

§ What is an appropriate sampling rate?

– Too high: increases the data rate
– Too low: becomes hard to reconstruct the original signal

§ Sampling Theorem

– In order for a band-limited signal to be reconstructed fully, the sampling rate must be greater than twice the maximum frequency in the signal: fs > 2·fm
– Half the sampling rate is called the Nyquist frequency (fs/2)

slide-7
SLIDE 7

Sampling in Frequency Domain

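§ Sampling in time creates images (copies) of the original spectrum around every integer multiple of fs
§ Why? For a sinusoid at frequency fm and any integer k, the shifted frequencies fm ± k·fs produce exactly the same samples:

$$x(t) = \sin(2\pi f_m t)$$

$$x(n) = \sin(2\pi f_m n T_s) = \sin(2\pi f_m n / f_s) = \sin(2\pi f_m n / f_s \pm 2\pi k n) = \sin\big(2\pi n (f_m \pm k f_s) / f_s\big)$$

[Figure: spectrum with the original components at −fm and fm inside the audible range and images at −fs+fm, fs−fm, and fs+fm around −fs and fs; the Nyquist frequency is marked]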
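The identity sin(2πn·fm/fs) = sin(2πn(fm ± k·fs)/fs) can be checked numerically. The sketch below is an illustration under assumed values (fs = 8 kHz, fm = 1 kHz, 32 samples); the helper name `sampled_sine` is hypothetical.

```python
import math

def sampled_sine(freq_hz, fs_hz, num_samples):
    """Samples of sin(2*pi*freq_hz*t) taken at t = n / fs_hz."""
    return [math.sin(2.0 * math.pi * freq_hz * n / fs_hz) for n in range(num_samples)]

fs = 8000.0  # assumed sampling rate
fm = 1000.0  # assumed signal frequency
base = sampled_sine(fm, fs, 32)

# Compare the samples of fm with the samples of fm + k*fs for a few integers k.
max_diff = {}
for k in (1, 2, -1):
    image = sampled_sine(fm + k * fs, fs, 32)
    max_diff[k] = max(abs(a - b) for a, b in zip(base, image))

print(max_diff)  # differences are at floating-point rounding level
```

Once sampled, the frequencies fm, fm + fs, fm + 2fs, and fm − fs are indistinguishable, which is why the spectrum repeats at every multiple of fs.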

slide-8
SLIDE 8

Aliasing

§ If the sampling rate is less than twice the maximum frequency, the high-frequency content is folded over to the lower frequency range


slide-9
SLIDE 9

Aliasing in Frequency Domain

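§ Sampling in time creates images (copies) of the original spectrum around every integer multiple of fs
§ When fm is above fs/2, the frequency that we hear is fs − fm (the image that falls into the audible range)
§ In order to avoid aliasing: fm < fs − fm

[Figure: spectrum with components at −fm and fm and images at −fs+fm, fs−fm, and fs+fm around −fs and fs; the audible range is marked]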
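The folding rule can be written as a small helper that maps any input frequency to the frequency heard after sampling. This is a sketch: the function name `alias_frequency` and the example frequencies are assumptions, not taken from the slides.

```python
def alias_frequency(freq_hz, fs_hz):
    """Fold freq_hz into the range [0, fs_hz / 2]."""
    f = freq_hz % fs_hz  # images repeat every fs
    # Above the Nyquist frequency, the audible component is the image at fs - f.
    return fs_hz - f if f > fs_hz / 2.0 else f

fs = 44100.0
for fm in (1000.0, 25000.0, 30000.0):
    print(fm, "->", alias_frequency(fm, fs))
```

A 1 kHz tone satisfies fm < fs − fm and is unchanged, while tones at 25 kHz and 30 kHz violate the condition and fold down to fs − fm.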

slide-10
SLIDE 10

Aliasing in Frequency Domain

§ For general signals, high-frequency content is folded over to lower frequency range

[Figure: spectrum of a general signal with axis labels −fs, −fm, fm, fs−fm, fs, and fs+fm; the audible range is marked]

slide-11
SLIDE 11

Avoid Aliasing

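§ Increase the sampling rate: fs > 2·fm
§ Use lowpass filters before sampling

[Figure: two spectra illustrating the remedies; one with axis labels −fs, −fs/2, fs/2, and fs, the other with labels −fs, −fm, fm, fs−fm, fs, and fs+fm and the lowpass filter response marked]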
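The effect of lowpass filtering before sampling can be sketched by reducing the sampling rate of an already-digital signal. Everything here is an illustrative assumption rather than the slides' method: a windowed-sinc FIR filter with a Hann window, 101 taps, a 5 kHz cutoff, and a 20 kHz test tone reduced from 48 kHz to 12 kHz.

```python
import math

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Windowed-sinc lowpass (Hann window), normalized to unity gain at DC."""
    mid = (num_taps - 1) / 2.0
    fc = cutoff_hz / fs_hz  # cutoff as a fraction of the sampling rate
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]

def fir_filter(x, taps):
    """Convolve x with taps, keeping only outputs where the taps fully overlap x."""
    m = len(taps)
    return [sum(taps[k] * x[i + m - 1 - k] for k in range(m))
            for i in range(len(x) - m + 1)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 48000.0
factor = 4  # keep every 4th sample: new rate 12 kHz, new Nyquist frequency 6 kHz
tone = [math.sin(2.0 * math.pi * 20000.0 * n / fs) for n in range(2000)]

naive = tone[::factor]                                           # no filter: tone aliases
filtered = fir_filter(tone, lowpass_fir(5000.0, fs))[::factor]   # filter first, then resample
print(rms(naive), rms(filtered))
```

Without the filter, the 20 kHz tone survives at full level as a 4 kHz alias; with the filter it is strongly attenuated before it can fold over.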

slide-12
SLIDE 12

Example of Aliasing

[Figure: spectrogram of a frequency sweep of the trivial sawtooth wave; axes: Time (s), Frequency (Hz)]
[Figure: trivial sawtooth wave spectrum; axes: Frequency (kHz), Magnitude (dB)]
[Figure: bandlimited sawtooth wave spectrum; axes: Frequency (kHz), Magnitude (dB)]
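The difference between the two sawtooth waves can be reproduced numerically. The sketch below assumes that "trivial" means a naive wrapping ramp and that the bandlimited version is built by additive synthesis from the Fourier series of that ramp; the fundamental (5 kHz), sampling rate (44.1 kHz), and all function names are illustrative choices.

```python
import math

def trivial_sawtooth(f0_hz, fs_hz, num_samples):
    """Naive sawtooth: a ramp from -1 to 1 that wraps every period (not bandlimited)."""
    return [2.0 * ((f0_hz * n / fs_hz) % 1.0) - 1.0 for n in range(num_samples)]

def bandlimited_sawtooth(f0_hz, fs_hz, num_samples):
    """Additive synthesis of the same ramp with no harmonics above fs/2."""
    num_harmonics = int((fs_hz / 2.0) // f0_hz)
    out = []
    for n in range(num_samples):
        phase = 2.0 * math.pi * f0_hz * n / fs_hz
        # Fourier series of the ramp: -(2/pi) * sum_k sin(k * phase) / k
        out.append(-(2.0 / math.pi) * sum(math.sin(k * phase) / k
                                          for k in range(1, num_harmonics + 1)))
    return out

def amplitude_at(x, freq_hz, fs_hz):
    """Amplitude of the sinusoidal component of x at freq_hz (single-bin DFT)."""
    re = sum(v * math.cos(2.0 * math.pi * freq_hz * n / fs_hz) for n, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq_hz * n / fs_hz) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

fs, f0, n_samples = 44100.0, 5000.0, 4410
alias_hz = fs - 5 * f0  # the 5th harmonic (25 kHz) folds to 19.1 kHz
trivial_alias = amplitude_at(trivial_sawtooth(f0, fs, n_samples), alias_hz, fs)
bandlimited_alias = amplitude_at(bandlimited_sawtooth(f0, fs, n_samples), alias_hz, fs)
print(trivial_alias, bandlimited_alias)
```

The trivial sawtooth shows a clear component at 19.1 kHz, which is not a harmonic of 5 kHz, while the bandlimited version has essentially nothing there.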

slide-13
SLIDE 13

Example of Aliasing

Aliasing in Video: https://www.youtube.com/watch?v=jHS9JGkEOmA

slide-14
SLIDE 14

Sampling Rates

§ Determined by the bandwidth of signals or hearing limits

– Consumer audio products: 44.1 kHz (CD)
– Professional audio gear: 48/96/192 kHz
– Speech communication: 8/16 kHz

slide-15
SLIDE 15

Quantization

§ Discretizing the amplitude of real-valued signals

– Round the amplitude to the nearest discrete step
– The discrete steps are determined by the number of bits

  • B bits: $-2^{B-1}$ ~ $2^{B-1}-1$
  • Audio CD: 16 bits ($-2^{15}$ ~ $2^{15}-1$)
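Rounding to the nearest step of a B-bit signed code can be sketched as follows. The scaling convention (input normalized to [−1, 1) and multiplied by 2^(B−1)) and the function name `quantize` are assumptions for illustration; real converters and file formats may scale differently.

```python
def quantize(x, num_bits):
    """Map x in [-1.0, 1.0) to the nearest integer step of a signed num_bits code."""
    max_code = 2 ** (num_bits - 1) - 1   # e.g. 32767 for 16 bits
    min_code = -(2 ** (num_bits - 1))    # e.g. -32768 for 16 bits
    code = round(x * 2 ** (num_bits - 1))
    # Clamp values that would fall outside the representable range.
    return max(min_code, min(max_code, code))

for value in (-1.0, -0.25, 0.0, 0.5, 1.0):
    print(value, "->", quantize(value, 16))
```

Note that Python's built-in `round` resolves exact ties to the nearest even integer, which is one of several possible tie-breaking rules.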

slide-16
SLIDE 16

Quantization Error

§ Quantization causes noise

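– Average power of quantization noise: obtained from the probability density function (PDF) of the error, which is uniform with $p(e) = 1$ for $-1/2 \le e \le 1/2$ (in units of one quantization step)

$$\int_{-1/2}^{1/2} e^2\, p(e)\, de = \frac{1}{12}$$

§ Signal to Noise Ratio (SNR)

– Based on average power, using the RMS of a full-scale sine wave, $2^{B-1}/\sqrt{2}$, and the root mean square (RMS) of the noise, $\sqrt{1/12}$:

$$20\log_{10}\frac{S_{rms}}{N_{rms}} = 20\log_{10}\frac{2^{B-1}/\sqrt{2}}{\sqrt{1/12}} = 6.02B + 1.76\ \text{dB}$$

(With 16 bits, SNR = 98.08 dB)

– Based on the max levels:

$$20\log_{10}\frac{S_{max}}{N_{max}} = 20\log_{10}\frac{2^{B-1}}{1/2} = 6.02B\ \text{dB}$$

(With 16 bits, SNR = 96.32 dB)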
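Both SNR formulas can be evaluated directly and compared with a measurement on a quantized sine wave. This is a sketch under assumptions: the function names are hypothetical, and the test tone (997 Hz at 44.1 kHz, one second long) is chosen only so that the quantization error is spread evenly. The unrounded formula gives about 98.09 dB for 16 bits, which differs from the slide's 98.08 dB only because the slide uses the rounded coefficients 6.02 and 1.76.

```python
import math

def snr_rms_db(num_bits):
    """SNR from the RMS of a full-scale sine and the RMS of uniform quantization noise."""
    signal_rms = 2 ** (num_bits - 1) / math.sqrt(2.0)
    noise_rms = math.sqrt(1.0 / 12.0)
    return 20.0 * math.log10(signal_rms / noise_rms)

def snr_max_db(num_bits):
    """SNR from the maximum signal level and the maximum error (half a step)."""
    return 20.0 * math.log10(2 ** (num_bits - 1) / 0.5)

def measured_snr_db(num_bits, freq_hz=997.0, fs_hz=44100.0, num_samples=44100):
    """Quantize a near-full-scale sine by rounding and measure signal/noise power."""
    amplitude = 2 ** (num_bits - 1) - 1
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = amplitude * math.sin(2.0 * math.pi * freq_hz * n / fs_hz)
        e = round(x) - x  # quantization error, within half a step of zero
        signal_power += x * x
        noise_power += e * e
    return 10.0 * math.log10(signal_power / noise_power)

print(snr_rms_db(16), snr_max_db(16), measured_snr_db(16))
```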

slide-17
SLIDE 17

Dynamic Range

§ Dynamic range

– The ratio between the loudest and softest levels
– For B bits, again using the RMS of a sine wave for both the loudest (full-scale) and softest (one-step amplitude) levels:

$$20\log_{10}\frac{S_{rms,max}}{S_{rms,min}} = 20\log_{10}\frac{2^{B-1}/\sqrt{2}}{1/\sqrt{2}} \approx 6.02B - 6\ \text{dB}$$

(With 16 bits, DR = 90.31 dB)

§ Human ear's dynamic range

– Depending on frequency band

[Figure: Equal Loudness Curve]
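The dynamic range formula is easy to tabulate for common bit depths. The function name `dynamic_range_db` and the choice of 8, 16, and 24 bits are illustrative assumptions.

```python
import math

def dynamic_range_db(num_bits):
    """Ratio of the loudest to the softest sine-wave RMS level, in dB."""
    loudest_rms = 2 ** (num_bits - 1) / math.sqrt(2.0)  # full-scale sine
    softest_rms = 1.0 / math.sqrt(2.0)                  # sine with one-step amplitude
    return 20.0 * math.log10(loudest_rms / softest_rms)

for bits in (8, 16, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 2), "dB")
```

Each additional bit doubles the largest representable amplitude and therefore adds about 6.02 dB.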

slide-18
SLIDE 18

Clipping and Headroom

§ Clipping

– Non-linear distortion that occurs when a signal is above the max level

§ Headroom

– Margin between the peak level and the max level

§ In digital audio, 0 dB is regarded as the maximum level

[Figure: level diagram for B = 16 bits; max level at 0 dB with clipping above it and headroom below it, min level at −90.31 dB, noise floor (by quantization) at −98.08 dB]
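Clipping and headroom can be illustrated with levels measured relative to the maximum (0 dB). This sketch assumes samples normalized so that the max level is 1.0 and models clipping as simple hard limiting; the function names are hypothetical, and the input is assumed to be non-silent (a peak of zero has no finite level in dB).

```python
import math

def hard_clip(x, max_level=1.0):
    """Limit a sample to the range [-max_level, max_level]."""
    return max(-max_level, min(max_level, x))

def peak_db(samples, max_level=1.0):
    """Peak level relative to the max level; 0 dB means the peak sits at the maximum."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak / max_level)

def headroom_db(samples, max_level=1.0):
    """Margin between the peak level and the max level."""
    return -peak_db(samples, max_level)

# Assumed example: a sine wave with a peak of 0.5, then the same wave boosted by 3x.
quiet = [0.5 * math.sin(2.0 * math.pi * n / 8.0) for n in range(8)]
boosted = [hard_clip(3.0 * s) for s in quiet]

print(headroom_db(quiet))    # about 6 dB of headroom
print(headroom_db(boosted))  # no headroom left: peaks are flattened at the max level
```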