
GCT535 - Sound Technology for Multimedia: Digital Audio



  1. GCT535 - Sound Technology for Multimedia: Digital Audio. Graduate School of Culture Technology, KAIST, Juhan Nam

  2. Digital Representations [figure: sound, image, and text, each stored as a bit sequence, e.g. … 0 1 1 0 1 1 0 …]

  3. Digital Representations § Sampling and Quantization – Sound (samples) – Image (pixels) § Trade-off – Resolution (quality) and data size
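
To make the resolution versus data-size trade-off concrete: the uncompressed (PCM) data rate is sampling rate × bits per sample × number of channels. This is a minimal sketch. The two-channel CD case is an assumed example that uses the 44.1 kHz and 16-bit figures from later slides.

```python
def pcm_data_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Assumed example: CD-quality stereo (44.1 kHz, 16 bits, 2 channels)
rate = pcm_data_rate(44_100, 16, 2)
print(rate, "bit/s")                          # 1411200 bit/s
print(rate * 60 / 8 / 1e6, "MB per minute")   # about 10.6
```

Doubling either the sampling rate or the bit depth doubles the data size, which is the trade-off this slide names.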

  4. Digital Representations § Encoding and decoding (compression and decompression) – Lossless: redundancy removal (e.g. zip); reduces the data (i.e. bits) with no loss of information – Lossy: drop bits (quantize the data more coarsely) such that the loss is perceptually unnoticeable
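
A small illustration of lossless compression using Python's standard `zlib` module (DEFLATE, the same family of algorithm used in zip files). The bit-pattern input is an arbitrary, highly redundant example. The point is that the decompressed bytes are identical to the input.

```python
import zlib

# Highly redundant data (a repeated pattern) compresses well
data = bytes([0, 1, 1, 0, 1, 1, 0]) * 1000
packed = zlib.compress(data)
restored = zlib.decompress(packed)

print(len(data), len(packed))  # the compressed size is far smaller
assert restored == data        # lossless: every bit is recovered
```

Data with little redundancy (for example, noise) would barely shrink. This is one reason general-purpose compressors do poorly on raw audio.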

  5. Outline § Digital Representation of Sound § Sampling § Quantization § Compression

  6. Digital Audio Chain [figure: the path between sound and a digital bit stream such as …0 0 1 0 1 0 …]

  7. Transducers § Microphones – Convert air vibration into an electrical signal – Dynamic and condenser microphones – The signal is very weak, so a pre-amp is used § Speakers – Convert an electrical signal into air vibration – Introduce some distortion (from the diaphragm) – Crossover networks split the signal between the woofer and the tweeter

  8. Sampling § Converts a continuous-time signal into a discrete-time signal by periodically picking up its instantaneous values – The result is represented as a sequence of numbers: pulse code modulation (PCM) – Sampling period ($T_s$): the time between samples – Sampling rate: $f_s = 1/T_s$ – Signal notation: $x(t) \rightarrow x(nT_s)$
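
The notation $x(t) \rightarrow x(nT_s)$ can be sketched directly: evaluate a continuous sine at $t = nT_s$. The helper name `sample_sine` and the 1 kHz / 8 kHz values are illustrative assumptions.

```python
import math

def sample_sine(freq_hz, fs_hz, num_samples, amplitude=1.0):
    """Sample x(t) = A sin(2*pi*f*t) at t = n*Ts, with Ts = 1/fs."""
    ts = 1.0 / fs_hz  # sampling period
    return [amplitude * math.sin(2 * math.pi * freq_hz * n * ts)
            for n in range(num_samples)]

x = sample_sine(1000, 8000, 8)  # one cycle of a 1 kHz tone at 8 kHz
print([round(v, 3) for v in x])
```

Eight samples cover exactly one period here because $f_s / f = 8$.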

  9. Sampling Theorem § What is an appropriate sampling rate? – Too high: increases the data rate – Too low: the original signal becomes hard to reconstruct § Sampling Theorem – For a band-limited signal to be fully reconstructed, the sampling rate must be greater than twice the maximum frequency $f_m$ in the signal: $f_s > 2 f_m$ – Half the sampling rate, $f_s/2$, is called the Nyquist frequency

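  10. Sampling in Frequency Domain § Sampling in time creates images (copies) of the original spectrum at every multiple of $f_s$ [figure: the original band from $-f_m$ to $f_m$ and its images centered at $\pm f_s$ (e.g. from $f_s - f_m$ to $f_s + f_m$), with the audible range and the Nyquist frequency marked] § Why? For $x(t) = \sin(2\pi f_0 t)$: $x[n] = \sin(2\pi f_0 n T_s) = \sin(2\pi f_0 n / f_s) = \sin(2\pi f_0 n / f_s \pm 2\pi k n) = \sin(2\pi n (f_0 \pm k f_s) / f_s)$ for any integer $k$, so tones at $f_0$ and at $f_0 \pm k f_s$ produce identical samples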
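
A numerical check of the identity $\sin(2\pi f_0 n/f_s) = \sin(2\pi n (f_0 \pm k f_s)/f_s)$. The 8 kHz rate and 1 kHz tone are assumed demo values. The differences are only floating-point rounding.

```python
import math

fs = 8000.0   # sampling rate (Hz), assumed for the demo
f0 = 1000.0   # original tone (Hz)

def samples(freq, n_count=16):
    return [math.sin(2 * math.pi * freq * n / fs) for n in range(n_count)]

def max_image_error(k, n_count=16):
    """Largest sample difference between f0 and its image f0 + k*fs."""
    base = samples(f0, n_count)
    image = samples(f0 + k * fs, n_count)
    return max(abs(a - b) for a, b in zip(base, image))

for k in (1, 2, -1):
    print(k, max_image_error(k))  # tiny: the samples are indistinguishable
```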

  11. Aliasing § If the sampling rate is less than twice the maximum frequency, the high-frequency content is folded over into the lower frequency range [figure: example waveform plot]

  12. Aliasing in Frequency Domain § Sampling in time creates images of the original spectrum at every multiple of $f_s$ [figure: the original band from $-f_m$ to $f_m$ overlapping its image from $f_s - f_m$ to $f_s + f_m$ inside the audible range] § The frequency that we hear is $f_s - f_0$ – To avoid aliasing: $f_m < f_s - f_m$, i.e. $f_s > 2 f_m$
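
The folding rule can be written as a small function: reduce the tone frequency modulo $f_s$ (images repeat every $f_s$), then reflect anything above $f_s/2$ to $f_s - f$. The function name `alias_frequency` is a hypothetical helper. For a tone between $f_s/2$ and $f_s$ it returns the slide's $f_s - f_0$.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency (0..fs/2) heard after sampling a tone at f_hz with rate fs_hz."""
    f = f_hz % fs_hz                            # images repeat every fs
    return fs_hz - f if f > fs_hz / 2 else f    # fold around the Nyquist frequency

print(alias_frequency(5000, 8000))   # 3000 = fs - f
print(alias_frequency(3000, 8000))   # 3000: below Nyquist, unchanged
print(alias_frequency(15000, 8000))  # 1000
```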

  13. Aliasing in Frequency Domain § For general signals, high-frequency content is folded over into the lower frequency range [figure: a broadband spectrum up to $f_m$ and its image from $f_s - f_m$ to $f_s + f_m$ overlapping within the audible range]

  14. Avoid Aliasing § Increase the sampling rate: $f_s > 2 f_m$ § Use a lowpass filter before sampling [figure: a lowpass filter passing $-f_s/2$ to $f_s/2$ removes the content that would otherwise fold over]
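
A rough sketch of why the lowpass filter must come first. It is applied here to sample-rate reduction: keep every second sample of a 48 kHz signal, so the new $f_s$ is 24 kHz. The Hamming-windowed sinc FIR, its 101 taps, the 10 kHz cutoff, and the 15 kHz test tone are all assumptions for the demo, not a design from the slides.

```python
import math

def lowpass_taps(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc FIR lowpass (an assumed, textbook design)."""
    fc = cutoff_hz / fs_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]  # unity gain at DC

def fir(x, taps):
    """Direct-form convolution y[i] = sum_j h[j] * x[i - j]."""
    return [sum(h * x[i - j] for j, h in enumerate(taps) if i - j >= 0)
            for i in range(len(x))]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 48000
tone = [math.sin(2 * math.pi * 15000 * n / fs) for n in range(4800)]  # 15 kHz

# Keep every 2nd sample -> new fs = 24 kHz, new Nyquist = 12 kHz
naive = tone[::2]                             # 15 kHz folds down to 24 - 15 = 9 kHz
taps = lowpass_taps(10000, fs)                # remove content above ~10 kHz first
filtered = fir(tone, taps)[len(taps):][::2]   # drop the filter's start-up transient

print(round(rms(naive), 3), rms(filtered))    # full-level alias vs. strongly attenuated
```

Without the filter, the 15 kHz tone reappears at 9 kHz at full level. With it, the tone is removed before it can fold over.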

  15. Examples of Aliasing [figure: magnitude spectra (magnitude in dB vs. frequency, 0 to 20 kHz) of a bandlimited sawtooth wave and a trivial sawtooth wave, and a spectrogram (frequency up to 2 × 10^4 Hz vs. time in seconds) of a frequency sweep of the trivial sawtooth wave]
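
The two sawtooth waves in the figure can be sketched as follows. The "trivial" one is a naive ramp, whose harmonics extend past $f_s/2$ and alias. The bandlimited one sums only the harmonics below the Nyquist frequency. Additive synthesis is one assumed way to build it; the slide does not say which method was used. The 1 kHz / 44.1 kHz values are demo choices.

```python
import math

def trivial_sawtooth(f0, fs, num_samples):
    """Naive ramp from -1 to 1: its harmonics extend past fs/2 and fold back."""
    return [2.0 * ((f0 * n / fs) % 1.0) - 1.0 for n in range(num_samples)]

def num_harmonics_below_nyquist(f0, fs):
    """Largest k such that k * f0 < fs / 2."""
    return math.ceil((fs / 2) / f0) - 1

def bandlimited_sawtooth(f0, fs, num_samples):
    """Additive synthesis of the same ramp, keeping only harmonics below fs/2.

    Uses the Fourier series 2*frac(f0*t) - 1 = -(2/pi) * sum_k sin(2*pi*k*f0*t) / k.
    """
    k_max = num_harmonics_below_nyquist(f0, fs)
    out = []
    for n in range(num_samples):
        t = n / fs
        s = sum(math.sin(2 * math.pi * k * f0 * t) / k for k in range(1, k_max + 1))
        out.append(-2.0 / math.pi * s)
    return out

fs = 44100
print(num_harmonics_below_nyquist(1000, fs))  # 22 harmonics fit below 22.05 kHz
naive = trivial_sawtooth(1000, fs, 441)
clean = bandlimited_sawtooth(1000, fs, 441)
```

The bandlimited version shows small ripples (Gibbs overshoot) near each jump. That is the price of having no energy above $f_s/2$.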

  16. Examples of Aliasing § Aliasing in video – https://www.youtube.com/watch?v=QOqtdl2sJk0 – https://www.youtube.com/watch?v=jHS9JGkEOmA (Note that the video frame rate corresponds to the sampling rate.)

  17. Sampling Rates § Determined by the bandwidth of the signals or by the limits of hearing – Consumer audio products: 44.1 kHz (CD) – Professional audio gear: 48/96/192 kHz – Speech communication: 8/16 kHz

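  18. Reconstruction in Frequency Domain § The original signal can be reconstructed from the sampled signal by applying a lowpass filter with cutoff $f_s/2$, which removes the images around multiples of $f_s$ [figure: frequency-domain view of the lowpass filter keeping the band from $-f_m$ to $f_m$]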

  19. Reconstruction in Time Domain § In the time domain, the reconstruction corresponds to convolution with a sinc function – The ideal lowpass filter corresponds to the sinc function, $\mathrm{sinc}(x) = \sin(\pi x) / (\pi x)$ – In practice, DACs are composed of sample-and-hold and lowpass filtering circuitry [figure: frequency- and time-domain views before sampling, after sampling, and after reconstruction as a sum of sinc functions]
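
Ideal (sinc) reconstruction can be sketched as $x(t) = \sum_n x[n]\,\mathrm{sinc}((t - nT_s)/T_s)$. This sketch uses a finite block of samples, so it is only an approximation of the infinite sum. It is accurate away from the block edges. The 1 kHz tone, 8 kHz rate, and 800-sample length are assumptions.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Ideal-lowpass reconstruction: x(t) = sum_n x[n] * sinc((t - n*Ts) / Ts)."""
    ts = 1.0 / fs
    return sum(xn * sinc((t - n * ts) / ts) for n, xn in enumerate(samples))

fs = 8000.0
f0 = 1000.0  # well below the Nyquist frequency (4 kHz)
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(800)]

t_mid = 400.5 / fs  # halfway between two samples, far from the block edges
print(reconstruct(x, fs, t_mid), math.sin(2 * math.pi * f0 * t_mid))
```

At the sample instants every sinc except one is zero, so the output passes exactly through the samples. Between them, the sincs add up to the original band-limited waveform.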

  20. Quantization § Discretizing the amplitude of real-valued signals – Round the amplitude to the nearest discrete step – The number of steps is determined by the number of bits: $B$ bits give integer levels from $-2^{B-1}$ to $2^{B-1} - 1$ • Audio CD: 16 bits ($-2^{15}$ to $2^{15} - 1$)
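
A minimal quantizer, assuming samples are normalized floats in [-1.0, 1.0]. That is a common convention, but it is an assumption here. The sketch rounds to the nearest step and clamps to the signed $B$-bit range.

```python
def quantize(x, bits):
    """Round a sample x in [-1.0, 1.0] to a B-bit signed integer step."""
    top = 2 ** (bits - 1)               # 2^(B-1)
    q = round(x * top)                  # nearest discrete step
    return max(-top, min(top - 1, q))   # clamp to [-2^(B-1), 2^(B-1) - 1]

print(quantize(0.5, 16))    # 16384
print(quantize(1.0, 16))    # 32767 (clamped: +2^15 is not representable)
print(quantize(-1.0, 16))   # -32768
print(quantize(0.3, 3))     # 1
```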

  21. Quick Review: Number Representations on Computer § Fixed-point numbers ($B$ bits) – Unsigned: $0$ to $2^B - 1$ • 8 bits: 0 (0b00000000) to 255 (0b11111111) – Signed: $-2^{B-1}$ to $2^{B-1} - 1$ • 8 bits: -128 (0b10000000) to 127 (0b01111111) • Audio signals are usually represented with signed numbers – 8 or 16 bits are popular choices – WAV file format § Floating-point numbers – Composed of sign, exponent, and mantissa – The represented number is $(-1)^s \times m \times 2^e$ (base 2) or $(-1)^s \times m \times 10^e$ (base 10) – Examples (base 10) • 1.653 → $1653 \times 10^{-3}$ ($s = 0$, $e = -3$, $m = 1653$) • -1329.6 → $(-1)^1 \times 13296 \times 10^{-1}$ ($s = 1$, $e = -1$, $m = 13296$) – Floating point can represent a much wider range of numbers – 32 or 64 bits are popular choices – Used for internal processing in DAWs
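
The signed 8-bit patterns above are two's complement. Masking a Python integer to $B$ bits reproduces them. For floating point, the standard `math.frexp` splits a number into a base-2 mantissa and exponent. Note that it carries the sign in `m` and normalizes to $0.5 \le |m| < 1$. That normalization differs from the slide's integer mantissa and from the IEEE 754 bit layout. The helper `twos_complement_bits` is hypothetical.

```python
import math

def twos_complement_bits(value, bits=8):
    """Bit pattern of a signed integer in two's complement."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")

print(twos_complement_bits(127))   # 01111111
print(twos_complement_bits(-128))  # 10000000
print(twos_complement_bits(-1))    # 11111111

# Base-2 floating point: x == m * 2**e with 0.5 <= |m| < 1
m, e = math.frexp(-1329.6)
print(m, e)                        # about -0.6492, 11
```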

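  22. Quantization Error § Quantization causes noise – The quantization error $e$ is modeled as uniformly distributed over $[-1/2, 1/2]$ (in units of one step), so its probability density function (PDF) is $p(e) = 1$ on that interval – Average power of the quantization noise: $\int_{-1/2}^{1/2} e^2 p(e)\,de = 1/12$, so the root mean square (RMS) of the noise is $N_{rms} = 1/\sqrt{12}$ § Signal-to-Noise Ratio (SNR) – Based on RMS, using the RMS of a full-scale sine wave, $S_{rms} = 2^{B-1}/\sqrt{2}$: $20\log_{10}\frac{S_{rms}}{N_{rms}} = 20\log_{10}\frac{2^{B-1}/\sqrt{2}}{1/\sqrt{12}} = 6.02B + 1.76$ dB (with 16 bits, SNR = 98.08 dB) – Based on the max levels, $S_{max} = 2^{B-1}$ and $N_{max} = 1/2$: $20\log_{10}\frac{S_{max}}{N_{max}} = 20\log_{10}\frac{2^{B-1}}{1/2} = 6.02B$ dB (with 16 bits, SNR = 96.32 dB)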
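
An empirical check of the $6.02B + 1.76$ dB figure. The sketch quantizes a near-full-scale sine by rounding, with amplitude in units of one step, and measures the SNR directly. The 997 Hz tone, 48 kHz rate, and one-second length are arbitrary assumptions. The match relies on the error being close to uniform, which holds well for large amplitudes and less well at low bit depths.

```python
import math

def measured_snr_db(bits, freq_hz=997.0, fs_hz=48000.0, num_samples=48000):
    amp = 2 ** (bits - 1) - 1   # just below full scale, so +2^(B-1) is never needed
    sig_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = amp * math.sin(2 * math.pi * freq_hz * n / fs_hz)
        e = round(x) - x        # quantization error, within +/- 1/2 step
        sig_power += x * x
        noise_power += e * e
    return 10 * math.log10(sig_power / noise_power)

for b in (8, 16):
    print(b, round(measured_snr_db(b), 2), round(6.02 * b + 1.76, 2))
```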

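  23. Dynamic Range § Dynamic range – The ratio between the loudest and softest levels – Again using the RMS of a sine wave for both: the loudest is full scale ($S_{rms,max} = 2^{B-1}/\sqrt{2}$) and the softest has an amplitude of one step ($S_{rms,min} = 1/\sqrt{2}$): $20\log_{10}\frac{S_{rms,max}}{S_{rms,min}} = 20\log_{10}\frac{2^{B-1}/\sqrt{2}}{1/\sqrt{2}} = 6.02(B - 1) \approx 6.02B - 6$ dB (with 16 bits, DR = 90.31 dB) § Human ear's dynamic range – Depends on the frequency band [figure: equal-loudness curves]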
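
The dynamic-range formula in code. The loudest level is a full-scale sine and the softest is a sine with a one-step amplitude, so the $\sqrt{2}$ factors cancel and the result is $20\log_{10} 2^{B-1}$.

```python
import math

def dynamic_range_db(bits):
    loudest = 2 ** (bits - 1) / math.sqrt(2)   # RMS of a full-scale sine
    softest = 1 / math.sqrt(2)                  # RMS of a sine with 1-step amplitude
    return 20 * math.log10(loudest / softest)

for b in (8, 16, 24):
    print(b, round(dynamic_range_db(b), 2))    # 16 bits -> 90.31
```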

  24. Clipping and Headroom § Clipping – Nonlinear distortion that occurs when a signal exceeds the maximum level § Headroom – The margin between the peak level and the maximum level – In digital audio, 0 dB is regarded as the maximum level [figure: level diagram for B = 16 bits: max level at 0 dB with clipping above it, headroom below it, min level at -90.31 dB, and the quantization noise floor at -98.08 dB]
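
A sketch of peak level, headroom, and clipping. It assumes floating-point samples where full scale (the 0 dB maximum level) is 1.0. That is a common convention, not something the slide specifies. The helper names are hypothetical.

```python
import math

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (1.0 = 0 dB, the max level)."""
    peak = max(abs(v) for v in samples)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)

def headroom_db(samples):
    return -peak_dbfs(samples)  # margin between the peak level and 0 dB

def hard_clip(samples, limit=1.0):
    """Past the max level, values are flattened: a nonlinear distortion."""
    return [max(-limit, min(limit, v)) for v in samples]

x = [0.5 * math.sin(2 * math.pi * n / 32) for n in range(32)]
print(round(peak_dbfs(x), 2), round(headroom_db(x), 2))  # about -6.02 and 6.02
louder = [4 * v for v in x]        # peak 2.0: exceeds full scale
print(max(hard_clip(louder)))      # 1.0
```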

  25. Dithering § Note that the SNR with respect to quantization noise depends on the signal level – As the signal level goes down, the SNR decreases – Low-level signals can have colored (signal-correlated) quantization noise § Dithering – Add a small amount of white noise to the signal before quantization (i.e. before sampling, or before converting from a higher to a lower bit depth): $\tilde{x}(t) = x(t) + n_{\text{dither}}(t)$ – This adds white noise but prevents the coloration – The added noise is on the order of 3 dB [figure: spectra $X(\omega)$ without and with dithering; the added white noise is less annoying than the colored noise caused by quantization]
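
A sketch of dithered requantization using uniform (rectangular) dither of ±1/2 step. This is one assumed choice; triangular dither is also common. A tone whose peak is only 0.3 step rounds to all zeros without dither. With dither, the expected output of the rounding equals the input, so the tone survives underneath white noise. Uniform dither has power 1/12 step², like the quantization error itself, so on average the total noise power roughly doubles (about 3 dB), consistent with the slide.

```python
import math
import random

def requantize(signal, dither=False, seed=0):
    """Round to integer steps, optionally adding uniform +/- 1/2-step white dither first."""
    rng = random.Random(seed)
    out = []
    for v in signal:
        if dither:
            v += rng.uniform(-0.5, 0.5)
        out.append(round(v))
    return out

# A very quiet tone: peak amplitude 0.3 step, below the rounding threshold
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]
plain = requantize(tone)
dithered = requantize(tone, dither=True)

print(set(plain))  # {0}: without dither the tone is lost entirely
corr = sum(q * x for q, x in zip(dithered, tone))
print(corr > 0)    # True: with dither the tone survives under white noise
```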

  26. Compression § Lossy compression – Perceptual audio coding: leverages the limits of human auditory perception – E.g. MP3 (.mp3), AAC (.mp4, .m4a, …), AC3 (Dolby, DVD, …) § Lossless compression – Redundancy reduction: Huffman coding, arithmetic coding – E.g. FLAC

  27. Perceptual Coding § Leverages the auditory masking phenomenon – Masking decreases the dynamic range in the cochlea – The masked threshold depends on the frequency of the masking tone and on the critical bands – Bits are allocated according to the signal-to-mask ratio (SMR) [figure: intensity (dB) vs. log frequency, showing a masking tone, the absolute threshold, and the masked threshold; the basis of MPEG Audio. Borrowed from D. Ellis' E4896 slides]
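
A much-simplified illustration of SMR-driven bit allocation, not the actual MPEG psychoacoustic model. Since each bit lowers quantization noise by about 6.02 dB (from the SNR slide), a band needs roughly $\lceil \mathrm{SMR}/6.02 \rceil$ bits to push its noise below the masked threshold. Bands already below the threshold get none. The SMR values are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # each extra bit lowers quantization noise by about 6.02 dB

def bits_for_band(smr_db):
    """Bits needed so the quantization noise falls below the masked threshold."""
    if smr_db <= 0:
        return 0  # signal already below the masked threshold: spend no bits
    return math.ceil(smr_db / DB_PER_BIT)

# Hypothetical signal-to-mask ratios (dB) for a few bands
smr_per_band = [25.0, 12.0, -3.0, 6.0]
print([bits_for_band(s) for s in smr_per_band])  # [5, 2, 0, 1]
```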

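  28. Huffman Coding § Assign codewords according to the probability of each source symbol: more probable symbols get shorter codes – Example: symbols 0, 1, 2, 3 with probabilities 0.4, 0.35, 0.2, 0.05 get the codes 0, 10, 110, 111 [figure: Huffman tree merging 0.05 + 0.2 = 0.25, then 0.25 + 0.35 = 0.6, then 0.6 + 0.4] – Average code length: $1 \times 0.4 + 2 \times 0.35 + 3 \times 0.2 + 3 \times 0.05 = 1.85$ bits → saves 0.15 bits per symbol compared with a fixed 2-bit code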
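
A compact Huffman coder built on the standard `heapq` module: repeatedly merge the two least probable nodes, prefixing their codes with 1 and 0. This is one possible implementation. Its 0/1 labels may differ from the slide's tree, but the code lengths (1, 2, 3, 3) and the 1.85-bit average are the same.

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a Huffman code: repeatedly merge the two least probable nodes."""
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)  # least probable
        p2, _, codes2 = heapq.heappop(heap)  # second least probable
        merged = {s: "1" + c for s, c in codes1.items()}
        merged.update({s: "0" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {0: 0.4, 1: 0.35, 2: 0.2, 3: 0.05}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(c) for s, c in code.items())
print(code)               # code lengths 1, 2, 3, 3
print(round(avg_len, 2))  # 1.85
```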
