CTP431- Music and Audio Computing Fundamentals of Sound and Digital - - PowerPoint PPT Presentation




SLIDE 1

CTP431- Music and Audio Computing Fundamentals of Sound and Digital Audio

Graduate School of Culture Technology KAIST Juhan Nam

SLIDE 2

Outline

§ What is Sound?
§ Sound Properties
   – Loudness
   – Pitch
   – Timbre
§ Digital Representation of Sound
   – Sampling
   – Quantization

SLIDE 3

What Is Sound?

§ Vibration of air that you can hear
   – Compression and rarefaction of air pressure
§ From physical to psychological
   – Production (physical): vibration on materials (e.g. string, pipe, membrane)
   – Propagation (physical): traveling via the air
   – Perception (psychological): sensation of the air vibration through ears

SLIDE 4

Physical Sound

§ Governed by “Newton’s law” and “wave” properties
§ Sound production and propagation in musical instruments

1. Drive force on a sound object
2. Vibration by restoration force
3. Propagation
4. Reflection
5. Superposition
6. Standing wave (modes): generates a tone
7. Radiation from the object
8. Propagation through air

Demos:
   – http://www.acs.psu.edu/drussell/demos.html
   – https://www.youtube.com/watch?v=_X72on6CSL0

SLIDE 5

Psychological Sound

§ Governed by ears (physiological sense) and brain (cognitive sense)
   – Human auditory system
§ Ears
   – A series of highly sensitive transducers
   – Transform sound into subband signals
   – Transduction chain: air → mechanical → fluid → electric (Cook, 1999)
§ Brain
   – Segregate and organize the auditory stimulus
   – Recognize loudness, pitch and timbre

Auditory Transduction Video: http://www.youtube.com/watch?v=PeTriGTENoc

SLIDE 6

Sound Properties

Physical → Psychological

   – Amplitude → Loudness
   – Frequency → Pitch
   – Waveshape, Time Envelope (ADSR), Spectral Envelope (Modes), … → Timbre

SLIDE 7

Loudness

§ Perceptual correlate of sound intensity
§ Sound Pressure Level (SPL)
   – Objective measure of sound intensity
   – Log scale: $\mathrm{SPL} = 20\log_{10}(P / P_0)$ dB, where $P_0 = 20\,\mu\mathrm{Pa}$ is the threshold of human hearing
   – Loudness increases with SPL, but not in exact proportion
§ Equal-Loudness Curve (also called Fletcher-Munson Curve)
   – Most sensitive to 2-5 kHz tones
   – Threshold of hearing
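The SPL formula above can be tried numerically. The following is a minimal sketch, assuming the pressure is given as an RMS value in pascals; the function name `spl_db` is made up for illustration.

```python
import math

P0 = 20e-6  # reference pressure in pascals (20 µPa, threshold of human hearing)

def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB for a pressure given in pascals (assumed RMS)."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / P0)

print(spl_db(20e-6))  # the reference pressure itself maps to 0 dB
print(spl_db(1.0))    # 1 Pa is roughly 94 dB SPL
```

Because the scale is logarithmic, doubling the pressure adds about 6 dB regardless of the starting level.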

SLIDE 8

Pitch

§ Perceptual correlate of fundamental frequency (F0)
§ Pitch Scale
   – Human ears are sensitive to frequency changes in a log scale
     • Ex) Piano note scale
§ Frequency Range of Hearing
   – 20 Hz to 20 kHz

[Figure: Chromatic Scale of Piano notes (Linear Frequency), frequency in Hz vs. time in seconds; Chromatic Scale of Piano notes (Log Frequency), MIDI note number vs. time in seconds]
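The log-frequency view of the piano scale can be illustrated with the usual MIDI note mapping. This sketch assumes equal temperament with A4 = MIDI note 69 = 440 Hz, a common convention that the slides do not state; the function names are illustrative.

```python
import math

A4_MIDI = 69    # MIDI note number of A4 (assumed convention)
A4_HZ = 440.0   # assumed tuning reference in Hz

def midi_to_hz(note: float) -> float:
    """Frequency of a MIDI note: each semitone multiplies frequency by 2**(1/12)."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)

def hz_to_midi(freq_hz: float) -> float:
    """Inverse mapping: MIDI note number is linear in log2 of frequency."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)

# One octave up (12 semitones) doubles the frequency.
print(midi_to_hz(69), midi_to_hz(81))
```

Equal steps in MIDI note number correspond to equal frequency ratios, which is why the chromatic scale looks like a straight line on a log-frequency axis and like an exponential curve on a linear one.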

SLIDE 9

Timbre

§ Related to identifying a particular sound object
   – Musical instruments, human voices, …
§ Determined by multiple physical attributes
   – Time envelope (ADSR)
   – Spectral envelope
   – Changes of spectral envelope and fundamental frequency
   – Harmonicity: ratio between tonal and noise-like characteristics
   – The onset of a sound differing notably from the sustained vibration

[Figure: ADSR time envelope; changes of spectral envelope]
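As an illustration of the time envelope, here is a rough sketch of an ADSR (attack, decay, sustain, release) curve. The piecewise-linear shape, the parameter names, and the example values are all assumptions chosen for illustration; real instruments and synthesizers often use curved segments.

```python
def adsr(t, attack, decay, sustain_level, note_off, release):
    """Piecewise-linear ADSR envelope value (0..1) at time t in seconds.

    Assumes note_off >= attack + decay, i.e. the note is held long enough
    to reach the sustain stage before it is released.
    """
    if t < 0:
        return 0.0
    if t < attack:                      # attack: rise from 0 to 1
        return t / attack
    if t < attack + decay:              # decay: fall from 1 to the sustain level
        return 1.0 - (1.0 - sustain_level) * (t - attack) / decay
    if t < note_off:                    # sustain: hold until the note is released
        return sustain_level
    if t < note_off + release:          # release: fall from the sustain level to 0
        return sustain_level * (1.0 - (t - note_off) / release)
    return 0.0

# Hypothetical settings: 10 ms attack, 100 ms decay, sustain at 0.7,
# note released at 0.5 s, 200 ms release.
for t in (0.005, 0.3, 0.6, 1.0):
    print(t, adsr(t, 0.01, 0.1, 0.7, 0.5, 0.2))
```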

SLIDE 10

Timbre

§ Determined by multiple parameters

– Perspective of sound synthesis


Source: http://www.matrixsynth.com/2011/05/kid-with-buchla.html

SLIDE 11

Digital Audio Chain

[Figure: digital audio chain; the digitized signal is a bit sequence (… 0 0 1 0 1 0 …)]

SLIDE 12

Microphones / Speakers

§ Microphones
   – Air vibration to electrical signal
   – Dynamic / condenser microphones
   – The signal is very weak: use of pre-amp
§ Speakers
   – Electrical signal to air vibration
   – Generate some distortion (by diaphragm)
   – Crossover networks: woofer / tweeter

SLIDE 13

Sampling

  • Convert continuous-time signal to discrete-time signal by periodically picking up the instantaneous values
     – Represented as a sequence of numbers; pulse code modulation (PCM)
     – Sampling period ($T_s$): the amount of time between samples
     – Sampling rate: $f_s = 1 / T_s$
     – Signal notation: $x(t) \rightarrow x(nT_s)$
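The notation $x(t) \rightarrow x(nT_s)$ can be made concrete with a short sketch that samples a sine wave. The function name and the example values (1 kHz tone, 8 kHz sampling rate) are assumptions for illustration.

```python
import math

def sample_sine(freq_hz, fs, num_samples, amplitude=1.0):
    """Return x(n*Ts) for n = 0..num_samples-1, where x(t) = A*sin(2*pi*f*t)."""
    ts = 1.0 / fs  # sampling period Ts in seconds
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n * ts)
            for n in range(num_samples)]

# A 1 kHz sine sampled at 8 kHz gives 8 samples per period.
samples = sample_sine(1000.0, 8000.0, 8)
print([round(s, 3) for s in samples])
```

The resulting list of numbers is all that is kept of the signal; the continuous waveform between the sampling instants is discarded.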

SLIDE 14

Sampling Theorem

§ What is an appropriate sampling rate?
   – Too high: increases the data rate
   – Too low: becomes hard to reconstruct the original signal
§ Sampling Theorem
   – In order for a band-limited signal to be reconstructed fully, the sampling rate must be greater than twice the maximum frequency in the signal: $f_s > 2 f_m$
   – Half the sampling rate is called the Nyquist frequency ($f_s / 2$)

SLIDE 15

Sampling in Frequency Domain

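§ Sampling in time creates images of the original spectrum at every multiple of $f_s$
§ Why? With $t = nT_s = n / f_s$, a sinusoid at $f_2 = f_1 \pm m f_s$ (integer $m$) yields the same samples as one at $f_1$:

$$x_1(t) = A\sin(\omega_1 t) = A\sin(2\pi f_1 n / f_s)$$

$$x_2(t) = A\sin(\omega_2 t) = A\sin(2\pi f_2 n / f_s) = A\sin(2\pi (f_1 \pm m f_s) n / f_s) = A\sin(2\pi f_1 n / f_s \pm 2\pi m n) = A\sin(2\pi f_1 n / f_s) = x_1(t)$$

[Figure: spectrum of the original signal from $-f_m$ to $f_m$, with images around $\pm f_s$ spanning $f_s - f_m$ to $f_s + f_m$]

To avoid overlap: $f_m < f_s - f_m$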
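The identity between a sinusoid at $f_1$ and one at $f_1 \pm m f_s$ after sampling can be checked numerically. This is a small sketch with made-up helper names and example values (1 kHz tone, 8 kHz sampling rate).

```python
import math

def sampled_sine(freq_hz, fs, n, amplitude=1.0):
    """Sample number n of A*sin(2*pi*f*t) taken at t = n / fs."""
    return amplitude * math.sin(2.0 * math.pi * freq_hz * n / fs)

def max_image_difference(f1, fs, m, num_samples=64):
    """Largest sample-wise difference between sinusoids at f1 and f1 + m*fs."""
    f2 = f1 + m * fs
    return max(abs(sampled_sine(f1, fs, n) - sampled_sine(f2, fs, n))
               for n in range(num_samples))

# 1 kHz and 9 kHz (= 1 kHz + fs) are indistinguishable at fs = 8 kHz,
# up to floating-point rounding.
print(max_image_difference(1000.0, 8000.0, 1))
```

Once sampled, the two tones cannot be told apart, which is why the spectrum of the sampled signal repeats at every multiple of the sampling rate.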

SLIDE 16

Aliasing

§ If the sampling rate is less than twice the maximum frequency, the high-frequency content is folded over to the lower frequency range
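The folding can be sketched as a small helper that returns the frequency at which a real sinusoid appears after sampling. The function name is an assumption, and the example uses the CD sampling rate of 44.1 kHz.

```python
def alias_frequency(freq_hz, fs):
    """Apparent frequency (between 0 and fs/2) of a real sinusoid sampled at fs."""
    folded = freq_hz % fs            # position within one period of the replicated spectrum
    return min(folded, fs - folded)  # reflect the upper half back below fs/2

print(alias_frequency(1000.0, 44100.0))   # below fs/2: unchanged (1000.0)
print(alias_frequency(30000.0, 44100.0))  # above fs/2: folds to 14100.0
```

A 30 kHz component sampled at 44.1 kHz therefore shows up at 14.1 kHz, inside the audible band, unless it is filtered out before sampling.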

SLIDE 17

Aliasing in Frequency Domain

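§ The high-frequency content is folded over to lower frequency range from the replicated images
§ A low-pass filter is applied before sampling to avoid the aliasing noise

[Figure: spectrum from $-f_m$ to $f_m$ overlapping its images around $\pm f_s$ (from $f_s - f_m$ to $f_s + f_m$); frequency axis marked at $-f_s$, $-f_s/2$, $f_s/2$ and $f_s$]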
SLIDE 18

Example of Aliasing

[Figure: Frequency sweep of the trivial sawtooth wave (time in s vs. frequency in Hz)]
[Figure: Trivial sawtooth wave spectrum (frequency in kHz vs. magnitude in dB)]
[Figure: Bandlimited sawtooth wave spectrum (frequency in kHz vs. magnitude in dB)]

SLIDE 19

Example of Aliasing

§ Aliasing in Video

   – https://www.youtube.com/watch?v=QOqtdl2sJk0
   – https://www.youtube.com/watch?v=jHS9JGkEOmA

(Note that the video frame rate corresponds to the sampling rate)

SLIDE 20

Sampling Rates

§ Determined by the bandwidth of signals or hearing limits
   – Consumer audio products: 44.1 kHz (CD)
   – Professional audio gear: 48/96/192 kHz
   – Speech communication: 8/16 kHz

SLIDE 21

Quantization

§ Discretizing the amplitude of real-valued signals
   – Round the amplitude to the nearest discrete step
   – The discrete steps are determined by the number of bits
     • Audio CD: 16 bits ($-2^{15}$ to $2^{15} - 1$); in general, B bits ($-2^{B-1}$ to $2^{B-1} - 1$)
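The rounding step can be sketched as follows. This assumes the input is normalized to roughly the range -1 to 1 and that values outside the representable range are clipped; both are illustrative choices rather than something the slides specify.

```python
def quantize(x, bits):
    """Map a real value x (nominally in [-1, 1)) to an integer level using B bits."""
    full_scale = 2 ** (bits - 1)
    lowest, highest = -full_scale, full_scale - 1  # -2^(B-1) .. 2^(B-1)-1
    level = round(x * full_scale)                  # nearest step (Python rounds ties to even)
    return max(lowest, min(highest, level))        # clip to the representable range

print(quantize(0.5, 16))   # 16384
print(quantize(1.0, 16))   # clipped to 32767
print(quantize(-1.0, 16))  # -32768
```

The difference between `x * full_scale` and the returned level is the quantization error, which stays within half a step as long as no clipping occurs.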

SLIDE 22

Quantization Error

§ Quantization causes noise
   – Average power of quantization noise: obtained from the probability density function (PDF) of the error, $P(e) = 1$ for $-1/2 \le e \le 1/2$

$$\int_{-1/2}^{1/2} e^2 P(e)\,de = \frac{1}{12}$$

   – Root mean square (RMS) of noise: $N_{rms} = \sqrt{1/12}$

§ Signal to Noise Ratio (SNR)
   – Based on average power, where $S_{rms} = 2^{B-1} / \sqrt{2}$ is the RMS of a full-scale sine wave:

$$20\log_{10}\frac{S_{rms}}{N_{rms}} = 20\log_{10}\frac{2^{B-1} / \sqrt{2}}{\sqrt{1/12}} = 6.02B + 1.76 \text{ dB}$$

     (With 16 bits, SNR = 98.08 dB)

   – Based on the max levels:

$$20\log_{10}\frac{S_{max}}{N_{max}} = 20\log_{10}\frac{2^{B-1}}{1/2} = 6.02B \text{ dB}$$

     (With 16 bits, SNR = 96.32 dB)
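Both SNR figures can be reproduced with a few lines. This sketch follows the definitions above (step size 1, full-scale sine of amplitude $2^{B-1}$); the function names are illustrative. Since $20\log_{10} 2 \approx 6.0206$, the unrounded results differ slightly from the values obtained with the rounded coefficient 6.02.

```python
import math

def snr_rms_db(bits):
    """SNR from RMS levels: full-scale sine RMS over uniform quantization-noise RMS."""
    signal_rms = 2 ** (bits - 1) / math.sqrt(2)  # RMS of a full-scale sine wave
    noise_rms = math.sqrt(1.0 / 12.0)            # RMS of error uniform on [-1/2, 1/2]
    return 20.0 * math.log10(signal_rms / noise_rms)

def snr_max_db(bits):
    """SNR from peak levels: maximum signal amplitude over maximum error (1/2)."""
    return 20.0 * math.log10(2 ** (bits - 1) / 0.5)

print(round(snr_rms_db(16), 2))  # close to 6.02 * 16 + 1.76
print(round(snr_max_db(16), 2))  # close to 6.02 * 16
```

Each additional bit doubles the number of levels and so adds about 6 dB to either figure.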