GCT535 - Sound Technology for Multimedia Digital Audio



slide-1
SLIDE 1

GCT535- Sound Technology for Multimedia Digital Audio

Graduate School of Culture Technology KAIST Juhan Nam


slide-2
SLIDE 2

Digital Representations

Sound: … 0 1 1 0 1 1 0 …
Image: … 1 0 0 1 1 0 1 …
Text: … 0 0 1 1 0 1 1 …

slide-3
SLIDE 3

Digital Representations

§ Sampling and Quantization

– Sound (samples)
– Image (pixels)

§ Trade-off

– Resolution (quality) and data size

slide-4
SLIDE 4

Digital Representations

§ Encoding and Decoding (compression and decompression)

– Lossless: redundancy removal (e.g. zip); reduces data (i.e. bits) with no loss of information
– Lossy: drop bits (quantize data more coarsely) in such a way that the loss is perceptually not noticeable
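A minimal sketch of the lossless case, using Python's standard `zlib` module as a stand-in for "zip". The input bytes are made-up example data, chosen to be highly redundant so that the size reduction is visible.

```python
import zlib

# Assumed example data: a short bit pattern repeated many times (very redundant).
original = b"0110110" * 1000

compressed = zlib.compress(original)    # encoding: redundancy removal
restored = zlib.decompress(compressed)  # decoding: exact reconstruction

print(len(original), "bytes ->", len(compressed), "bytes")
print("bit-exact round trip:", restored == original)
```

A lossy coder would not pass the equality check; it trades exact reconstruction for a smaller size.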

slide-5
SLIDE 5

Outline

§ Digital Representation of Sound
§ Sampling
§ Quantization
§ Compression

slide-6
SLIDE 6

Digital Audio Chain


…0 0 1 0 1 0 …

slide-7
SLIDE 7

Transducers

§ Microphones

– Air vibration to electrical signal
– Dynamic / condenser microphones
– The signal is very weak: use of pre-amp

§ Speakers

– Electrical signal to air vibration
– Generate some distortion (by diaphragm)
– Crossover networks: woofer / tweeter

slide-8
SLIDE 8

Sampling

§ Convert continuous-time signal to discrete-time signal by periodically picking up the instantaneous values

– Represented as a sequence of numbers; pulse code modulation (PCM)
– Sampling period ($T_s$): the amount of time between samples
– Sampling rate ($f_s = 1/T_s$)

Signal notation: $x(t) \rightarrow x(nT_s)$
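The notation x(t) → x(nTs) can be sketched in a few lines of Python. The function name `sample_sine` and the example values (a 1 kHz tone sampled at 8 kHz) are assumptions made for illustration; they are not from the slides.

```python
import math

def sample_sine(freq_hz, fs_hz, num_samples):
    """Return x(n*Ts) for x(t) = sin(2*pi*freq_hz*t), with Ts = 1/fs_hz."""
    ts = 1.0 / fs_hz  # sampling period Ts
    return [math.sin(2 * math.pi * freq_hz * n * ts) for n in range(num_samples)]

# Assumed example: 1 kHz tone, fs = 8 kHz, so one period spans 8 samples.
samples = sample_sine(1000.0, 8000.0, 8)
print([round(v, 3) for v in samples])
```

The resulting list of numbers is the PCM representation of the tone before quantization.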

slide-9
SLIDE 9

Sampling Theorem

§ What is an appropriate sampling rate?

– Too high: increases the data rate
– Too low: the original signal becomes hard to reconstruct

§ Sampling Theorem

– In order for a band-limited signal to be reconstructed fully, the sampling rate must be greater than twice the maximum frequency in the signal: $f_s > 2 \cdot f_m$
– Half the sampling rate is called the Nyquist frequency ($f_s / 2$)

slide-10
SLIDE 10

Sampling in Frequency Domain

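§ Sampling in time creates images (copies) of the original spectrum at every integer multiple of $f_s$
§ Why?

$$x(t) = \sin(2\pi f_m t)$$

$$x(n) = \sin(2\pi f_m n T_s) = \sin(2\pi f_m n / f_s) = \sin(2\pi f_m n / f_s \pm 2\pi k n) = \sin\big(2\pi n (f_m \pm k f_s) / f_s\big)$$

[Figure: spectrum before and after sampling. The components at -fm and fm reappear around -fs and fs (at fs - fm and fs + fm); the Nyquist frequency and the audible range are marked.]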
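The identity sin(2πn·fm/fs) = sin(2πn(fm ± k·fs)/fs) can be checked numerically. The sketch below uses assumed example values (fm = 1 kHz, fs = 8 kHz) and a hypothetical helper `sampled_sine`; it compares the samples of the original tone with those of tones shifted by integer multiples of fs.

```python
import math

def sampled_sine(freq_hz, fs_hz, num_samples):
    """Samples of sin(2*pi*freq_hz*t) taken at t = n/fs_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(num_samples)]

fs = 8000.0  # assumed sampling rate
fm = 1000.0  # assumed tone frequency
base = sampled_sine(fm, fs, 16)

# Tones at fm + k*fs produce the same sample values as the tone at fm.
images_match = all(
    max(abs(a - b) for a, b in zip(base, sampled_sine(fm + k * fs, fs, 16))) < 1e-9
    for k in (-2, -1, 1, 2)
)
print("samples identical for fm + k*fs:", images_match)
```

Because the sample sequences are indistinguishable, the sampled signal contains all of these frequencies at once, which is why the spectrum repeats every fs.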

slide-11
SLIDE 11

Aliasing

§ If the sampling rate is less than twice the maximum frequency, the high-frequency content is folded over to the lower frequency range

[Figure: plot illustrating aliasing]
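The fold-over can be expressed as a small function. This is a sketch for a single real sinusoid; the name `alias_frequency` and the example frequencies are assumptions for illustration.

```python
def alias_frequency(freq_hz, fs_hz):
    """Frequency in [0, fs/2] at which a real tone of freq_hz appears after sampling."""
    folded = freq_hz % fs_hz            # images repeat every fs
    return min(folded, fs_hz - folded)  # fold around the Nyquist frequency fs/2

fs = 8000.0
for f in (1000.0, 3000.0, 5000.0, 7000.0, 9000.0):
    print(f, "Hz is heard as", alias_frequency(f, fs), "Hz")
```

With fs = 8 kHz, a 5 kHz tone lies above the 4 kHz Nyquist frequency and comes out at fs - 5 kHz = 3 kHz, while tones below 4 kHz are unchanged.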

slide-12
SLIDE 12

Aliasing in Frequency Domain

§ Sampling in time creates images (copies) of the original spectrum at every integer multiple of $f_s$
§ The frequency that we hear is $f_s - f_m$

[Figure: spectrum with components at -fm and fm and their images around -fs and fs (at fs - fm and fs + fm); the image at fs - fm falls inside the audible range.]

In order to avoid aliasing: $f_m < f_s - f_m$

slide-13
SLIDE 13

Aliasing in Frequency Domain

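§ For general signals, high-frequency content is folded over to the lower frequency range

[Figure: spectrum of a general signal with bandwidth fm and its images around -fs and fs; the overlapping part falls inside the audible range.]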

slide-14
SLIDE 14

Avoid Aliasing

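§ Increase sampling rate: $f_s > 2 \cdot f_m$
§ Use lowpass filters before sampling

[Figure: two spectra. One shows a higher sampling rate, with -fs, -fs/2, fs/2 and fs marked, so that the images no longer overlap. The other shows a lowpass filter applied to the band around fm before sampling.]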
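An analog anti-aliasing filter cannot be shown in code, so the sketch below uses the digital analogue as an assumption: lowpass filtering before lowering the sampling rate from 48 kHz to 12 kHz. The windowed-sinc design, the 4 kHz cutoff, the tap count, and the test tones are illustrative choices, not taken from the slides.

```python
import math

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Windowed-sinc (Hamming) lowpass FIR coefficients with unit gain at DC."""
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(x, taps):
    """Direct convolution with zero initial state; output has the input's length."""
    return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(x))]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 48000.0
low = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(2000)]
high = [math.sin(2 * math.pi * 15000.0 * n / fs) for n in range(2000)]  # above the new Nyquist (6 kHz)

taps = lowpass_fir(4000.0, fs)
low_out = fir_filter(low, taps)
high_out = fir_filter(high, taps)

# Keep every 4th sample: new sampling rate 12 kHz, with the 15 kHz tone already removed.
decimated = [a + b for a, b in zip(low_out, high_out)][::4]

# Skip the first 200 samples so the filter's start-up transient is excluded.
print("1 kHz RMS after filter:", round(rms(low_out[200:]), 3))
print("15 kHz RMS after filter:", round(rms(high_out[200:]), 5))
```

Without the filter, the 15 kHz tone would fold to 3 kHz at the 12 kHz rate and become audible as a new, unrelated tone.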

slide-15
SLIDE 15

Examples of Aliasing

[Figure: frequency sweep of the trivial sawtooth wave (spectrogram; time in s vs. frequency in Hz)]

[Figure: bandlimited sawtooth wave spectrum (frequency in kHz vs. magnitude in dB)]

[Figure: trivial sawtooth wave spectrum (frequency in kHz vs. magnitude in dB)]

slide-16
SLIDE 16

Examples of Aliasing

§ Aliasing in Video

– https://www.youtube.com/watch?v=QOqtdl2sJk0
– https://www.youtube.com/watch?v=jHS9JGkEOmA

(Note that the video frame rate corresponds to the sampling rate.)

slide-17
SLIDE 17

Sampling Rates

§ Determined by the bandwidth of signals or hearing limits

– Consumer audio products: 44.1 kHz (CD)
– Professional audio gear: 48/96/192 kHz
– Speech communication: 8/16 kHz

slide-18
SLIDE 18

Reconstruction in Frequency Domain

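§ In the frequency-domain view, the sampled signal can be reconstructed by applying a low-pass filter

[Figure: spectrum of the sampled signal with images around -fs and fs; a low-pass filter with cutoff fs/2 keeps only the original band between -fm and fm.]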

slide-19
SLIDE 19

Reconstruction in Time Domain

§ The reconstruction corresponds to convolution with a sinc function in the time domain

– The ideal low-pass corresponds to the sinc function
– In practice, DACs are composed of sample-and-hold and low-pass filtering circuitry

$$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$$

[Figure: time-domain and frequency-domain views before sampling, after sampling, and after reconstruction; the reconstruction is a sum of sinc functions.]
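Convolution with a sinc function can be sketched as a weighted sum of shifted sincs, one per sample. The helper names and the example values (a 1 kHz tone, fs = 8 kHz, 400 samples) are assumptions. Because the sum is truncated to a finite number of samples, the result is only approximate.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs_hz, t):
    """Evaluate the sinc-interpolated signal at time t (seconds)."""
    return sum(x_n * sinc(t * fs_hz - n) for n, x_n in enumerate(samples))

fs = 8000.0
f0 = 1000.0
samples = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]

t_mid = 200.5 / fs  # halfway between samples 200 and 201
estimate = reconstruct(samples, fs, t_mid)
exact = math.sin(2 * math.pi * f0 * t_mid)
print("reconstructed:", round(estimate, 4), "exact:", round(exact, 4))
```

The value between two samples is recovered to within the truncation error, even though no sample was taken at that instant.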

slide-20
SLIDE 20

Quantization

§ Discretizing the amplitude of real-valued signals

– Round the amplitude to the nearest discrete step
– The discrete steps are determined by the number of bits

  • Audio CD: 16 bits ($-2^{15}$ ~ $2^{15}-1$) ← B bits ($-2^{B-1}$ ~ $2^{B-1}-1$)
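A minimal sketch of rounding to B-bit signed levels, assuming the input amplitude is normalized to the range [-1.0, 1.0). The function name `quantize` and the scaling convention are illustrative assumptions.

```python
def quantize(x, bits):
    """Map x in [-1.0, 1.0) to the nearest B-bit signed integer level."""
    max_level = 2 ** (bits - 1) - 1
    min_level = -(2 ** (bits - 1))
    # Note: Python's round() rounds exact halves to the nearest even integer.
    level = round(x * 2 ** (bits - 1))
    return max(min_level, min(max_level, level))  # stay inside -2^(B-1) ~ 2^(B-1)-1

print(quantize(0.5, 16))   # half of full scale
print(quantize(-1.0, 16))  # most negative level
print(quantize(1.0, 16))   # +1.0 does not fit, so it is limited to the top level
print(quantize(0.3, 8))
```

The asymmetry of the range (one more negative level than positive) is why an input of exactly +1.0 has to be limited to the top level.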

slide-21
SLIDE 21

Quick Review: Number Representations on Computer

§ Fixed-point number

– Unsigned: 0 ~ $2^B - 1$

  • 8 bits: 0 (0b00000000) ~ 255 (0b11111111)

– Signed: $-2^{B-1}$ ~ $2^{B-1} - 1$

  • 8 bits: -128 (0b10000000) ~ 127 (0b01111111)
  • Audio signals are usually represented with signed numbers

– 8 or 16 bits are popular choices
– WAV file format

§ Floating-point number

– Composed of sign, exponent and mantissa
– The represented number is $(-1)^s \times m \times 2^e$ (base 2) or $(-1)^s \times m \times 10^e$ (base 10)
– Examples

  • 1.653 → $1653 \times 10^{-3}$ (s = 0, e = -3, m = 1653)
  • -1329.6 → $(-1) \times 13296 \times 10^{-1}$ (s = 1, e = -1, m = 13296)

– Floating point can represent a much wider range of numbers
– 32 or 64 bits are popular choices
– Internal processing in DAW

[Figure: bit layouts. A fixed-point number uses all B bits for the value; a floating-point number is divided into sign (s), exponent (e) and mantissa (m) fields.]
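Both representations can be inspected with Python's standard library. The sketch assumes two's complement for signed integers and the IEEE 754 single-precision layout for floats (1 sign bit, 8 exponent bits, 23 mantissa bits); the slides do not name a specific format, and the helper names are hypothetical.

```python
import struct

def twos_complement_bits(value, bits=8):
    """Bit pattern of a signed integer in B-bit two's complement."""
    return format(value & (2 ** bits - 1), "0{}b".format(bits))

def float32_fields(value):
    """Split an IEEE 754 single-precision float into (sign, exponent, mantissa) fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF  # stored with a bias of 127
    mantissa = raw & 0x7FFFFF      # 23 fraction bits; the leading 1 is implicit
    return sign, exponent, mantissa

print(twos_complement_bits(-128), twos_complement_bits(127))
print(float32_fields(1.0))
print(float32_fields(-2.0))
```

For -2.0 the sign field is 1 and the stored exponent is 128, i.e. an actual exponent of 1 after removing the bias.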

slide-22
SLIDE 22

Quantization Error

§ Quantization causes noise

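– Average power of quantization noise: obtained from the probability density function (PDF) of the error, which is uniform over one quantization step ($p(e) = 1$ for $-1/2 \le e \le 1/2$)

$$\int_{-1/2}^{1/2} e^2 \, p(e) \, de = \frac{1}{12}$$

– Root mean square (RMS) of noise: $\sqrt{1/12}$

§ Signal to Noise Ratio (SNR)

– Based on RMS, where $2^{B-1}/\sqrt{2}$ is the RMS of a full-scale sine wave

$$20\log_{10}\frac{S_{rms}}{N_{rms}} = 20\log_{10}\frac{2^{B-1}/\sqrt{2}}{\sqrt{1/12}} = 6.02B + 1.76 \text{ dB}$$

(With 16 bits, SNR = 98.08 dB)

– Based on the max levels

$$20\log_{10}\frac{S_{max}}{N_{max}} = 20\log_{10}\frac{2^{B-1}}{1/2} = 6.02B \text{ dB}$$

(With 16 bits, SNR = 96.32 dB)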
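The RMS-based formula can be compared with a direct measurement. The sketch below quantizes a full-scale sine by rounding and measures the ratio of signal power to error power. The tone frequency (chosen so it does not repeat in step with the sample grid), the sample count, and the function names are assumptions made for illustration.

```python
import math

def theoretical_snr_db(bits):
    """RMS-based SNR of a full-scale sine: (2^(B-1)/sqrt(2)) / sqrt(1/12), in dB."""
    return 20 * math.log10((2 ** (bits - 1) / math.sqrt(2)) / math.sqrt(1 / 12))

def measured_snr_db(bits, num_samples=20000):
    """Quantize a full-scale sine by rounding and measure signal-to-error power in dB."""
    full_scale = 2 ** (bits - 1)
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        # sqrt(2)/16 cycles per sample: never lines up periodically with the sample grid
        x = full_scale * math.sin(2 * math.pi * n * math.sqrt(2) / 16)
        error = round(x) - x
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

for b in (8, 16):
    print(b, "bits:", round(theoretical_snr_db(b), 2), "dB (formula),",
          round(measured_snr_db(b), 2), "dB (measured)")
```

The measured values should land close to 6.02B + 1.76 dB, since the rounding error of a busy signal is approximately uniform over one step.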

slide-23
SLIDE 23

Dynamic Range

§ Dynamic range

– The ratio between the loudest and softest levels

$$20\log_{10}\frac{S_{rms,max}}{S_{rms,min}} = 20\log_{10}\frac{2^{B-1}/\sqrt{2}}{1/\sqrt{2}} = 6.02(B-1) \approx 6.02B - 6 \text{ dB}$$

(With 16 bits, DR = 90.31 dB)

Again, the RMS of a sine wave is used for both the loudest and softest levels

§ Human ear’s dynamic range

– Depending on frequency band

[Figure: equal loudness curve]

slide-24
SLIDE 24

Clipping and Headroom

§ Clipping

– Non-linear distortion that occurs when a signal is above the max level

§ Headroom

– Margin between the peak level and the max level

[Figure: level diagram for B = 16 bits. Max level: 0 dB, with clipping above it and headroom below it. Min level: -90.31 dB. Noise floor (by quantization): -98.08 dB.]

In digital audio, 0 dB is regarded as the maximum level
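Clipping and headroom can be illustrated with two small helpers. The sketch assumes amplitudes normalized so that the maximum level is 1.0 (0 dB); the names `hard_clip` and `headroom_db` are hypothetical.

```python
import math

def hard_clip(x, max_level=1.0):
    """Limit a sample to [-max_level, max_level]; anything beyond is flattened."""
    return max(-max_level, min(max_level, x))

def headroom_db(peak, max_level=1.0):
    """Margin in dB between the signal's peak level and the maximum level."""
    return 20 * math.log10(max_level / peak)

print([hard_clip(v) for v in (0.2, 1.3, -2.0)])
print(round(headroom_db(0.5), 2), "dB of headroom for a peak at half of full scale")
```

A peak at half of full scale leaves about 6 dB of headroom, so the signal could be doubled in amplitude before clipping starts.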

slide-25
SLIDE 25

Dithering

§ Note that the SNR for the quantization noise depends on signal levels

– As the signal level goes down, SNR decreases
– Low-level signals can have colored noise

§ Dithering

– Adding a small amount of white noise to the signal before quantization (or high-to-low bit-depth conversion)
– This adds white noise but coloration is prevented
– The amount is on the order of 3 dB

$$\tilde{x}(t) = x(t) + n_{dithering}(t)$$

[Figure: spectrum $X(\omega)$ with no dithering and spectrum $\tilde{X}(\omega)$ with dithering]

See the added white noise. This is less annoying than the colored noise by quantization
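The effect on low-level signals can be sketched with a sine whose amplitude is below half a quantization step. Samples are expressed in units of one step. The dither used here is triangular (the sum of two uniform noises), which is an assumption; the slides only call for a small white noise. The helper names and the fixed random seed are also illustrative.

```python
import math
import random

def requantize(x, dither=False, rng=None):
    """Round samples (in units of one quantization step) to integers."""
    out = []
    for v in x:
        if dither:
            # Assumed dither: sum of two uniform noises, each one step wide.
            v = v + (rng.random() - 0.5) + (rng.random() - 0.5)
        out.append(round(v))
    return out

def projection(x, y):
    """How much of x survives in y: about 1 if preserved on average, 0 if lost."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

rng = random.Random(0)
signal = [0.4 * math.sin(2 * math.pi * n / 50) for n in range(20000)]

plain = requantize(signal)
dithered = requantize(signal, dither=True, rng=rng)

print("without dither, any nonzero output:", any(plain))
print("signal kept without dither:", projection(signal, plain))
print("signal kept with dither:", round(projection(signal, dithered), 3))
```

Without dither the tone is rounded away entirely; with dither the output is noisy, but the tone is still present on average.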

slide-26
SLIDE 26

Compression

§ Lossy compression

– Perceptual audio coding: leverage human perception of tones
– E.g. MP3 (.mp3), AAC (.mp4, m4a, ..), AC3 (Dolby DVD, …)

§ Lossless compression

– Redundancy reduction: Huffman coding, arithmetic coding
– E.g. FLAC

slide-27
SLIDE 27

Perceptual Coding

§ Leverage the auditory masking phenomenon

– Decrease the dynamic range in cochlea
– The masked threshold depends on the tone frequency and critical bands
– Allocate bits according to the signal-to-masking ratio
– Basis of MPEG Audio

[Figure: a masking tone with its masked threshold and the absolute threshold; axes: log freq vs. intensity / dB. Borrowed from D. Ellis’ E4896 slides]

slide-28
SLIDE 28

Huffman Coding

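§ Assigning bits according to the statistics of each source symbol

Symbol (probability): 0 (0.4), 1 (0.35), 2 (0.2), 3 (0.05)

[Figure: Huffman tree. Symbols 3 (0.05) and 2 (0.2) merge into a node of probability 0.25, which merges with symbol 1 (0.35) into 0.6, which merges with symbol 0 (0.4) into 1. The resulting code lengths are 1, 2, 3 and 3 bits.]

Average code length: 1 × 0.4 + 2 × 0.35 + 3 × 0.2 + 3 × 0.05 = 1.85 bits → Save 0.15 bits (compared with 2 bits per symbol for a fixed-length code)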
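The code lengths in this example can be reproduced with a short Huffman construction based on Python's `heapq`. The sketch computes only the code lengths, not the bit patterns, since the 0/1 labels on the branches are a free choice. The function name is hypothetical.

```python
import heapq

def huffman_code_lengths(probabilities):
    """Return {symbol: code length in bits} for a dict {symbol: probability}."""
    # Heap entries: (probability, tie-breaker, symbols under this node)
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probabilities}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)  # two least probable nodes
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

probs = {0: 0.4, 1: 0.35, 2: 0.2, 3: 0.05}
lengths = huffman_code_lengths(probs)
average_bits = sum(probs[s] * lengths[s] for s in probs)

print(lengths)
print("average bits per symbol:", round(average_bits, 2))
```

The most probable symbol gets the shortest code, which is what brings the average below the 2 bits needed by a fixed-length code for four symbols.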