
GCT535: Sound Technology for Multimedia, Fourier Representations of Audio

Juhan Nam, Graduate School of Culture Technology, KAIST


  1. GCT535: Sound Technology for Multimedia, Fourier Representations of Audio. Juhan Nam, Graduate School of Culture Technology, KAIST

  2. Waveforms
     § The basic audio representation that computers can handle
       – $x(n) = [a_1, a_2, a_3, \ldots]$
     § Good for observing energy changes over time, but less intuitive for observing pitch or timbre characteristics
     § Are there better representations than this?

  3. Sound Generation
     § Mass-spring model
       – Simple harmonic motion: the restoration force equals the inertial force, $F = -kx = m \frac{d^2 x}{dt^2}$
       – This generates a sinusoidal oscillation: $x = A \sin(\omega t) = A \sin(2\pi f t)$
       – Angular frequency $\omega = \sqrt{k/m}$, frequency $f = \omega / 2\pi$, period $T = 1/f$
       – Practical models have dampers that make the oscillation decay over time
     [Figure: mass-spring diagram and a sinusoidal oscillation with period $T = 1/f$]
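As a rough numerical illustration of the mass-spring relations above, the sketch below computes $\omega$, $f$, and $T$ and samples the undamped oscillation. The mass, stiffness, amplitude, and sampling times are hypothetical values, chosen only so that the period comes out to 1 ms.

```python
import math

def harmonic_motion(k, m, amplitude, times):
    """Return (omega, f, T, samples) for an undamped mass-spring system."""
    omega = math.sqrt(k / m)      # angular frequency in rad/s
    f = omega / (2 * math.pi)     # frequency in Hz
    T = 1 / f                     # period in seconds
    samples = [amplitude * math.sin(omega * t) for t in times]
    return omega, f, T, samples

# Hypothetical values: mass and stiffness chosen so that f = 1000 Hz.
m = 0.001                              # mass in kg
k = m * (2 * math.pi * 1000) ** 2      # stiffness in N/m
times = [n / 8000 for n in range(8)]   # one period sampled at 8 kHz
omega, f, T, samples = harmonic_motion(k, m, 1.0, times)
```

A stiffer spring (larger `k`) raises the frequency, and a heavier mass lowers it.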

  4. Sound Generation
     § Any oscillatory object can be modeled as a complex network of masses and springs
       – This generates a complex tone
     § Generation steps (e.g. guitar)
       – Excitation: wideband energy
       – Propagation on the string
       – Reflection at the ends
       – Superposition with reflected waves
       – Standing waves: constructive superposition
       – Radiation from the object
       – Propagation through air
     § Demos (string oscillation): http://www.acs.psu.edu/drussell/demos.html

  5. Frequency-Domain Representation
     § Can we represent $x(n)$ with a finite set of sinusoids?
       – $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} A(k)\, r_k(n)$
       – $r_k(n) = \cos\left(\frac{2\pi k n}{N} + \phi(k)\right)$: discrete-time sinusoid with length $N$
       – Find $A(k)$, $\phi(k)$

  6. Euler’s identity
     § Euler’s identity: $e^{j\theta} = \cos\theta + j \sin\theta$
       – Can be proved with Taylor series
       – If $\theta = \pi$, $e^{j\pi} + 1 = 0$ (“the most beautiful equation in math”)
     § Properties
       – $\sin\theta = \frac{e^{j\theta} - e^{-j\theta}}{2j}$
       – $\cos\theta = \frac{e^{j\theta} + e^{-j\theta}}{2}$

  7. Complex Sinusoids
     § Cosine and sine can be represented in a single term
       – $s_k(n) = e^{j 2\pi k n / N} = \cos\left(\frac{2\pi k n}{N}\right) + j \sin\left(\frac{2\pi k n}{N}\right)$
       – Frequencies: $\frac{2\pi k}{N}$ radians or $\frac{k}{N} F_s$ Hz ($F_s$: the sampling rate), for $k = 0, 1, 2, \ldots, N-1$
       – Example: $N = 8$
     Figures are from https://ccrma.stanford.edu/~jos/dft/
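The complex sinusoids for the $N = 8$ example can be generated directly from the definition. This is a minimal sketch using only the Python standard library; the sampling rate `F_s = 8000.0` is a hypothetical value used to show the conversion from bin index to Hz.

```python
import cmath
import math

def complex_sinusoid(k, N):
    """Return s_k(n) = exp(j*2*pi*k*n/N) for n = 0..N-1."""
    return [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]

N = 8
s1 = complex_sinusoid(1, N)

# By Euler's identity, the real part is the cosine and the imaginary part is the sine.
cos_part = [z.real for z in s1]
sin_part = [z.imag for z in s1]

# Frequency of each s_k in Hz, assuming a hypothetical sampling rate F_s.
F_s = 8000.0
freqs_hz = [k * F_s / N for k in range(N)]
```

Every sample of `s1` lies on the unit circle, and `s1` completes exactly one rotation over the 8 samples.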

  8. Complex Sinusoids ($N = 8$)
     [Figure: the complex sinusoids for $N = 8$]
     Figures are from https://ccrma.stanford.edu/~jos/dft/

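  9. Frequency-Domain Representation Using Complex Sinusoids
     § $x(n)$ is expressed in a simpler form:
       $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} A(k) \cos\left(\frac{2\pi k n}{N} + \phi(k)\right)$
       $= \frac{1}{N} \sum_{k=0}^{N-1} A(k) \left( e^{j(2\pi k n / N + \phi(k))} + e^{-j(2\pi k n / N + \phi(k))} \right) / 2$
       $= \frac{1}{N} \sum_{k=0}^{N-1} \left( A(k) e^{j\phi(k)} e^{j 2\pi k n / N} + A(k) e^{-j\phi(k)} e^{-j 2\pi k n / N} \right) / 2$
       $= \frac{1}{N} \sum_{k=0}^{N-1} \left( X(k) e^{j 2\pi k n / N} + X^*(k) e^{-j 2\pi k n / N} \right) / 2$
       $= \mathrm{Real}\left\{ \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N} \right\}$
       $= \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}$
       where $X(k) = A(k) e^{j\phi(k)} = A(k) \left( \cos\phi(k) + j \sin\phi(k) \right)$
       – Now, how can we find $X(k)$?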

  10. Orthogonality of Sinusoids
     § Inner product between two complex sinusoids
       – $\sum_{n=0}^{N-1} s_p(n) \cdot s_q^*(n) = \sum_{n=0}^{N-1} e^{j 2\pi p n / N} \cdot e^{-j 2\pi q n / N} = N$ if $p = q$, $0$ otherwise
     § Inner products between real sinusoids
       – $\sum_{n=0}^{N-1} \sin(2\pi p n / N) \sin(2\pi q n / N) = N/2$ if $p = q$, $-N/2$ if $p = N - q$, $0$ otherwise
       – $\sum_{n=0}^{N-1} \cos(2\pi p n / N) \sin(2\pi q n / N) = 0$
       – $\sum_{n=0}^{N-1} \cos(2\pi p n / N) \cos(2\pi q n / N) = N/2$ if $p = q$ or $p = N - q$, $0$ otherwise
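The orthogonality of the complex sinusoids can be checked numerically. The sketch below, which assumes $N = 8$ for illustration, builds the full table of inner products; it should come out as $N$ on the diagonal and (up to rounding error) zero elsewhere.

```python
import cmath
import math

def inner_product(p, q, N):
    """Sum over n of s_p(n) * conj(s_q(n))."""
    total = 0j
    for n in range(N):
        s_p = cmath.exp(2j * math.pi * p * n / N)
        s_q = cmath.exp(2j * math.pi * q * n / N)
        total += s_p * s_q.conjugate()
    return total

N = 8
# gram[p][q] holds the inner product between s_p and s_q.
gram = [[inner_product(p, q, N) for q in range(N)] for p in range(N)]
```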

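  11. Orthogonal Projection on Complex Sinusoids
     § Take the inner product of the signal with each complex sinusoid
       $\sum_{n=0}^{N-1} x(n) \cdot s_k^*(n) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N}$
       $= \sum_{n=0}^{N-1} \left( \frac{1}{N} \sum_{p=0}^{N-1} X(p) e^{j 2\pi p n / N} \right) e^{-j 2\pi k n / N}$
       $= \frac{1}{N} \sum_{p=0}^{N-1} X(p) \left( \sum_{n=0}^{N-1} e^{j 2\pi p n / N} e^{-j 2\pi k n / N} \right)$
       $= \frac{1}{N} X(k) N = X(k) = A(k) e^{j\phi(k)}$
       – By orthogonality, only the $p = k$ term of the inner sum survives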

  12. To Wrap Up
     § Discrete Fourier Transform
       – $X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N} = X_R(k) + j X_I(k) = A(k) e^{j\phi(k)}$
       – Magnitude spectrum: $|X(k)| = A(k) = \sqrt{X_R^2(k) + X_I^2(k)}$
       – Phase spectrum: $\angle X(k) = \phi(k) = \tan^{-1}\left( X_I(k) / X_R(k) \right)$
     § Inverse Discrete Fourier Transform
       – $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n / N}$
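The DFT and inverse DFT above translate almost line by line into code. The following is a minimal, unoptimized sketch using only the standard library; the test signal (a cosine on bin 1 with phase $\pi/4$) is an assumption made for illustration. `cmath.phase` uses the two-argument arctangent, so it resolves the quadrant that a plain $\tan^{-1}$ of the ratio cannot.

```python
import cmath
import math

def dft(x):
    """X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N), computed directly."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x(n) = (1/N) * sum_k X(k) * exp(j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Illustrative signal: unit-amplitude cosine on bin 1 with phase pi/4.
N = 8
x = [math.cos(2 * math.pi * n / N + math.pi / 4) for n in range(N)]

X = dft(x)
magnitude = [abs(c) for c in X]        # magnitude spectrum
phase = [cmath.phase(c) for c in X]    # phase spectrum (only meaningful where magnitude is non-zero)
x_rec = idft(X)                        # should reproduce x up to rounding error
```

For this signal the energy splits between bins 1 and $N - 1$, each with magnitude $N/2$, and the phase at bin 1 recovers the $\pi/4$ offset.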

  13. Why do we choose this set of frequencies for the sinusoids?
     § Underlying assumption in the DFT
       – The $N$ samples are periodic
       – In the view of “Fourier Series”, a periodic signal with period $N$ can be represented as sinusoids with periods $N, N/2, N/3, \ldots$ (frequencies $1/N, 2/N, 3/N, \ldots$)
     [Figure: two waveform plots, amplitude vs. time in seconds]

  14. Properties of DFT
     § Periodicity
       – $X(k) = X(k + N) = X(k + 2N) = \ldots$
       – $X(k) = X(k - N) = X(k - 2N) = \ldots$
     § Symmetry (for real-valued $x(n)$)
       – Magnitude response: $|X(k)| = |X(N - k)| = |X(-k)|$
       – Phase response: $\angle X(k) = -\angle X(-k) = -\angle X(N - k)$
       – We often display only half of the magnitude and phase responses
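Both properties can be verified numerically. The sketch below evaluates the DFT sum at arbitrary integer $k$ (including indices outside $0, \ldots, N-1$) for a short real-valued signal; the sample values are arbitrary and chosen only for illustration. The symmetry is checked in its equivalent complex form, $X(N - k) = X^*(k)$, which implies equal magnitudes and negated phases.

```python
import cmath
import math

def dft_bin(x, k):
    """X(k) for any integer k, including k < 0 or k >= N."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

x = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 0.0, -0.3]   # arbitrary real samples
N = len(x)

# Periodicity: X(k) = X(k + N) = X(k - N)
periodicity_error = max(
    abs(dft_bin(x, k) - dft_bin(x, k + N)) + abs(dft_bin(x, k) - dft_bin(x, k - N))
    for k in range(N)
)

# Symmetry for real input: X(N - k) = conj(X(k))
symmetry_error = max(
    abs(dft_bin(x, N - k) - dft_bin(x, k).conjugate()) for k in range(N)
)
```

Both error values should be at the level of floating-point rounding.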

  15. Properties of DFT
     [Figure: waveform, magnitude ($N = 32$), and phase ($N = 32$) plots. Left column: bins $0$ to $N - 1$, showing $|X(k)| = |X(N - k)|$ and $\angle X(k) = -\angle X(N - k)$. Right column: bins $-N/2$ to $N/2$, showing $|X(k)| = |X(-k)|$ and $\angle X(k) = -\angle X(-k)$]

  16. Frequency Scales
     § $X(k)$, $k = 0, 1, \ldots, N$ corresponds to frequency values that are evenly distributed between $0$ and $f_s$ in Hz
     [Figure: frequency axis aligning bin indices $-N, -N/2, 0, N/2, N$ with frequencies $-f_s, -f_s/2, 0, f_s/2, f_s$]
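The mapping from bin index to frequency is a single scaling by $f_s / N$. The helper names below are hypothetical, and $N = 8$ with $f_s = 8000$ Hz are illustrative values. The second helper relabels the upper half of the bins as negative frequencies, which is valid because of the periodicity $X(k) = X(k - N)$.

```python
def bin_frequencies(N, fs):
    """Frequency in Hz of each DFT bin k = 0..N-1 (bin spacing is fs/N)."""
    return [k * fs / N for k in range(N)]

def signed_bin_frequencies(N, fs):
    """Same bins, with bins at or above N/2 mapped to negative frequencies."""
    return [(k if k < N / 2 else k - N) * fs / N for k in range(N)]

N = 8
fs = 8000.0
freqs = bin_frequencies(N, fs)
signed_freqs = signed_bin_frequencies(N, fs)
```

Placing bin $N/2$ on the negative side is a convention chosen here; it represents both $+f_s/2$ and $-f_s/2$.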

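  17. Cracks in Sinusoids
     § What happens if a frequency component in $x(n)$ is not exactly on one of the sinusoids?
       – For example, if $x(n)$ is a sinusoid with an arbitrary frequency $\omega$: $x(n) = e^{j\omega n}$
       $X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi k n / N} = \sum_{n=0}^{N-1} e^{j\omega n} e^{-j 2\pi k n / N} = \sum_{n=0}^{N-1} e^{j(\omega - 2\pi k / N) n}$
       $= \frac{1 - e^{j(\omega - 2\pi k / N) N}}{1 - e^{j(\omega - 2\pi k / N)}} = e^{j(\omega - 2\pi k / N)(N - 1)/2} \, \frac{\sin\left( (\omega - 2\pi k / N) N / 2 \right)}{\sin\left( (\omega - 2\pi k / N) / 2 \right)}$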
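The closed form above can be compared against the direct DFT sum. This sketch assumes $N = 32$ and two hypothetical test frequencies, one exactly on bin 4 and one halfway between bins 4 and 5. When the denominator vanishes (the on-bin case) the ratio is replaced by its limit, $N$.

```python
import cmath
import math

def dft_of_complex_sinusoid(omega, N):
    """Direct DFT sum of x(n) = exp(j*omega*n)."""
    return [sum(cmath.exp(1j * omega * n) * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

def closed_form(omega, k, N):
    """Geometric-series result: phase term times a ratio of sines."""
    a = omega - 2 * math.pi * k / N
    if abs(math.sin(a / 2)) < 1e-12:
        return complex(N)              # limit of the expression when omega is on bin k
    return cmath.exp(1j * a * (N - 1) / 2) * math.sin(a * N / 2) / math.sin(a / 2)

N = 32
omega_on = 2 * math.pi * 4 / N         # exactly on bin 4
omega_off = 2 * math.pi * 4.5 / N      # halfway between bins 4 and 5
X_on = dft_of_complex_sinusoid(omega_on, N)
X_off = dft_of_complex_sinusoid(omega_off, N)
```

In the on-bin case only bin 4 is non-zero. In the off-bin case every bin picks up some energy, with the largest values at the two bins nearest to the true frequency.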

  18. Cracks in Sinusoids
     [Figure: waveform (amplitude) and magnitude spectrum vs. $\omega$ for a sinusoid on the DFT sinusoids (left) and off the DFT sinusoids (right); the magnitude follows $\left| \sin\left( (\omega - 2\pi k / N) N / 2 \right) / \sin\left( (\omega - 2\pi k / N) / 2 \right) \right|$]

  19. Zero-padding
     § Adding zeros to a windowed frame in the time domain
       – Corresponds to “ideal interpolation” in the frequency domain
       – In practice, the FFT size increases by the size of the zero-padding
     [Figure: waveform (amplitude) and magnitude spectrum before zero-padding and after zero-padding (x4)]
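A small experiment makes the interpolation property concrete. In the sketch below (an assumed 8-sample frame with a x4 padding factor, rectangular window), every fourth bin of the zero-padded DFT coincides with a bin of the original DFT, and the bins in between are the interpolated values.

```python
import cmath
import math

def dft(x):
    """Direct DFT, used here for clarity rather than speed."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 8
# Illustrative frame: a sinusoid whose frequency falls between bins 1 and 2.
x = [math.sin(2 * math.pi * 1.5 * n / N) for n in range(N)]

pad_factor = 4
x_padded = x + [0.0] * (N * (pad_factor - 1))   # append zeros: length becomes 4N

X = dft(x)                  # N bins
X_padded = dft(x_padded)    # 4N bins sampling the same underlying spectrum more densely
```

Zero-padding does not add information about the signal; it only samples the spectrum of the same frame on a finer frequency grid.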

  20. Examples of DFT
     [Figure: waveform (amplitude vs. time in milliseconds) and spectrum (magnitude vs. frequency) for three sounds: sine, drum, and flute]

  21. Fast Fourier Transform (FFT)
     § Matrix multiplication view of DFT
     § In fact, we don’t compute this directly. There is a more efficient way, which is called the “Fast Fourier Transform (FFT)”
       – Complexity reduction by FFT: $O(N^2) \rightarrow O(N \log_2 N)$
       – Divide and conquer
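To illustrate the divide-and-conquer idea, here is a minimal sketch that compares the direct $O(N^2)$ sum with a recursive FFT. The slide does not say which FFT variant is meant; the radix-2 decimation-in-time form shown here is an assumption, and it only handles lengths that are powers of two.

```python
import cmath
import math

def dft_direct(x):
    """O(N^2): each of the N outputs is an N-term sum (a matrix-vector product)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """O(N log2 N): split into even and odd samples, recurse, then combine."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2 != 0:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])    # DFT of even-indexed samples (length N/2)
    odd = fft(x[1::2])     # DFT of odd-indexed samples (length N/2)
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * math.pi * k / N) * odd[k]   # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

# Illustrative input: a 16-sample mixture of two sinusoids.
x = [math.sin(2 * math.pi * 3 * n / 16) + 0.5 * math.cos(2 * math.pi * 5 * n / 16)
     for n in range(16)]
X_direct = dft_direct(x)
X_fast = fft(x)
```

Both functions return the same spectrum; the recursive version reuses the two half-length DFTs for both halves of the output, which is where the savings come from.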

  22. Short-Time Fourier Transform (STFT)
     § DFT assumes that the signal is stationary
       – It is not a good idea to apply DFT to a long and dynamically changing signal like music
       – Instead, we segment the signal and apply DFT separately to each segment
     § Short-Time Fourier Transform
       – $X(k, l) = \sum_{n=0}^{N-1} w(n)\, x(n + l \cdot h)\, e^{-j 2\pi k n / N}$
       – $w(n)$: window, $h$: hop size, $N$: FFT size
     § This produces 2-D time-frequency representations
       – Get the “spectrogram” from the magnitude
       – Parameters: window size, window type, FFT size, hop size
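The STFT formula can be sketched as a loop over frames. The code below is a minimal illustration with several assumptions not fixed by the slide: a Hann window, a window size equal to the FFT size, no padding at the signal edges, and a hypothetical 1000 Hz test tone sampled at 8000 Hz. It uses a direct DFT per frame for readability.

```python
import cmath
import math

def hann(N):
    """Periodic Hann window of length N (one common window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def stft(x, N, hop):
    """Return frames[l][k] = X(k, l) = sum_n w(n) * x(n + l*hop) * exp(-j*2*pi*k*n/N)."""
    w = hann(N)
    num_frames = 1 + (len(x) - N) // hop      # only frames that fit entirely in x
    frames = []
    for l in range(num_frames):
        seg = [w[n] * x[n + l * hop] for n in range(N)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N)
                           for n in range(N))
                       for k in range(N)])
    return frames

fs = 8000      # hypothetical sampling rate in Hz
N = 64         # FFT size (also the window size in this sketch)
hop = 32       # hop size: 50% overlap
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]

X = stft(x, N, hop)
# Spectrogram: magnitude of the non-negative-frequency bins of each frame.
spectrogram = [[abs(c) for c in frame[:N // 2 + 1]] for frame in X]
```

With these values the tone falls on bin $1000 \cdot N / f_s = 8$, so each row of `spectrogram` peaks at index 8.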

  23. Windowing
     § Types of window functions
       – Trade-off between the width of the main lobe and the level of the side lobes
     [Figure: window spectrum annotated with main-lobe width and side-lobe level]

  24. Short-Time Fourier Transform (STFT)
     [Figure: overlapping windowed frames with 50% overlap. Source: the JOS SASP book]

  25. Example: Waveform
     [Figure: waveforms of a flute A4 note and a piano C4 note]

  26. Example: Spectrogram - 2D color map
     [Figure: spectrograms of a piano C4 note and a flute A4 note]

  27. Example: Spectrogram - 3D waterfall
     [Figure: waterfall plots of a piano C4 note and a flute A4 note]

  28. Example: Pop Music

  29. Example: Deep Note

  30. Time-Frequency Resolutions in STFT
     § Trade-off between time resolution and frequency resolution, controlled by the window size
       – Short window: low frequency resolution, high time resolution
       – Long window: high frequency resolution, low time resolution
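The trade-off can be put into numbers with a back-of-the-envelope sketch. It assumes the window size equals the FFT size and uses the bin spacing $f_s / N$ and the window duration $N / f_s$ as rough proxies for frequency and time resolution (the actual frequency resolution also depends on the main-lobe width of the window). The sampling rate and window sizes are hypothetical.

```python
def stft_resolution(window_size, fs):
    """Return (bin spacing in Hz, window duration in seconds)."""
    return fs / window_size, window_size / fs

fs = 44100                                   # hypothetical sampling rate in Hz
short_win = stft_resolution(256, fs)         # short window
long_win = stft_resolution(4096, fs)         # long window
```

The product of the two numbers is always 1, so improving one by some factor worsens the other by the same factor.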
