
CTP431 - Music and Audio Computing: Spectral Analysis



  1. CTP431 - Music and Audio Computing: Spectral Analysis. Graduate School of Culture Technology, KAIST. Juhan Nam

  2. Outline § Time-domain representation of sound – Waveform § Time-frequency domain representation of sound – Discrete Fourier Transform (DFT) – Short-Time Fourier Transform (STFT)

  3. Waveform § Time-domain representation of sound – Shows the amplitude over time § Amplitude envelope – Short-term loudness, e.g. sound level meter – Computed by various methods • max-peak picking • root-mean-square (RMS) – ADSR • The amplitude envelope of a musical sound is often described with attack, decay, sustain and release. – Also used for dynamic range compression, e.g. compressor, expander
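The two envelope methods named above (max-peak picking and RMS) can be sketched frame by frame. This is a minimal illustration in standard-library Python, not the course's implementation; the function name `amplitude_envelopes`, the frame and hop sizes, and the fading 440 Hz test tone are all assumptions made for the example.

```python
import math

def amplitude_envelopes(samples, frame_size, hop_size):
    """Return (peak, rms) envelopes with one value per analysis frame."""
    peak, rms = [], []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frame = samples[start:start + frame_size]
        # Max-peak picking: largest absolute sample value in the frame
        peak.append(max(abs(v) for v in frame))
        # RMS: square root of the mean of the squared samples
        rms.append(math.sqrt(sum(v * v for v in frame) / frame_size))
    return peak, rms

# Hypothetical test signal: a 440 Hz sine with a linear fade-out at 8 kHz
fs = 8000
n_samples = 4000
signal = [(1 - n / n_samples) * math.sin(2 * math.pi * 440 * n / fs)
          for n in range(n_samples)]
peak_env, rms_env = amplitude_envelopes(signal, frame_size=400, hop_size=200)
```

Both envelopes fall as the tone fades, and the RMS value of a frame never exceeds its peak value.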

  4. Example: Waveform and Amplitude Envelopes [Figure: waveforms and amplitude envelopes of a flute A4 note and a piano C4 note]

  5. Spectrogram § Time/frequency-domain representation of sound – Shows the amplitude envelope of individual frequency components over time – A better representation for observing pitch and timbre characteristics – Often called a “sonogram” § Visualization – 2D color map or 3D waterfall

  6. Example: Spectrogram - 2D color map [Figure: spectrograms of a piano C4 note and a flute A4 note]

  7. Example: Spectrogram - 3D waterfall [Figure: waterfall plots of a piano C4 note and a flute A4 note]

  8. Frequency-Domain Representation § Can we represent $x(n)$ with a finite set of sinusoids? – $x(n) = \frac{1}{N}\sum_{k=0}^{N-1} A(k)\, r_k(n)$ • $r_k(n) = \cos\left(\frac{2\pi k n}{N} + \phi(k)\right)$: discrete-time sinusoid with length N – Find $A(k)$, $\phi(k)$

  9. Euler’s Identity § $e^{j\theta} = \cos\theta + j\sin\theta$ – Can be proved with Taylor series – If $\theta = \pi$, then $e^{j\pi} + 1 = 0$ (“the most beautiful equation in math”) § Properties – $\sin\theta = \frac{e^{j\theta} - e^{-j\theta}}{2j}$ – $\cos\theta = \frac{e^{j\theta} + e^{-j\theta}}{2}$

  10. Complex Sinusoids § Cosine and sine can be represented in a single term – $s_k(n) = e^{j2\pi kn/N} = \cos\left(\frac{2\pi k n}{N}\right) + j\sin\left(\frac{2\pi k n}{N}\right)$ – Frequencies: $\frac{2\pi k}{N}$ radians per sample, or $\frac{k}{N}F_s$ Hz ($F_s$: the sampling rate; $k = 0, 1, 2, \ldots, N-1$) – Example: N = 8 [Figures are from https://ccrma.stanford.edu/~jos/dft/]
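As a small sketch of the N = 8 example, the complex sinusoids $s_k(n)$ can be generated directly from their definition. The helper name `complex_sinusoid` and the 8 kHz sampling rate are assumptions for illustration only.

```python
import cmath
import math

def complex_sinusoid(k, N):
    """s_k(n) = exp(j*2*pi*k*n/N) for n = 0..N-1."""
    return [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]

N = 8
s1 = complex_sinusoid(1, N)

# The real part is the cosine and the imaginary part is the sine
cos_part = [v.real for v in s1]
sin_part = [v.imag for v in s1]

# Frequency of each sinusoid in Hz for a hypothetical sampling rate
fs = 8000
freqs_hz = [k * fs / N for k in range(N)]
```

Every sample of $s_k(n)$ lies on the unit circle; increasing k only changes how fast the point rotates.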

  11. Complex Sinusoids (N = 8) [Figure: the complex sinusoids for N = 8. Figures are from https://ccrma.stanford.edu/~jos/dft/]

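  12. Frequency-Domain Representation Using Complex Sinusoids § $x(n)$ is expressed in a simpler form:
      $$\begin{aligned}
      x(n) &= \frac{1}{N}\sum_{k=0}^{N-1} A(k)\cos\left(\frac{2\pi k n}{N} + \phi(k)\right) \\
           &= \frac{1}{N}\sum_{k=0}^{N-1} A(k)\,\frac{e^{j(2\pi kn/N + \phi(k))} + e^{-j(2\pi kn/N + \phi(k))}}{2} \\
           &= \frac{1}{N}\sum_{k=0}^{N-1} \frac{A(k)e^{j\phi(k)}e^{j2\pi kn/N} + A(k)e^{-j\phi(k)}e^{-j2\pi kn/N}}{2} \\
           &= \frac{1}{N}\sum_{k=0}^{N-1} \frac{X(k)e^{j2\pi kn/N} + X^*(k)e^{-j2\pi kn/N}}{2} \\
           &= \mathrm{Real}\left\{\frac{1}{N}\sum_{k=0}^{N-1} X(k)e^{j2\pi kn/N}\right\} \\
           &= \frac{1}{N}\sum_{k=0}^{N-1} X(k)e^{j2\pi kn/N}
      \end{aligned}$$
      where $X(k) = A(k)e^{j\phi(k)} = A(k)\left(\cos\phi(k) + j\sin\phi(k)\right)$. The last step drops Real{·}, which is valid when the sum is already real, i.e. when $X(N-k) = X^*(k)$.
      – Now, how can we find $X(k)$?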

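  13. Orthogonality of Sinusoids § Inner product between two complex sinusoids
      $$s_p(n)\cdot s_q^*(n) = \sum_{n=0}^{N-1} e^{j2\pi pn/N}\, e^{-j2\pi qn/N} = \begin{cases} N & \text{if } p = q \\ 0 & \text{otherwise} \end{cases}$$
      § Inner products between real sinusoids (for $p$ and $q$ other than 0 and $N/2$)
      $$\sum_{n=0}^{N-1} \cos\left(\frac{2\pi pn}{N}\right)\cos\left(\frac{2\pi qn}{N}\right) = \begin{cases} N/2 & \text{if } p = q \text{ or } p = N - q \\ 0 & \text{otherwise} \end{cases}$$
      $$\sum_{n=0}^{N-1} \sin\left(\frac{2\pi pn}{N}\right)\sin\left(\frac{2\pi qn}{N}\right) = \begin{cases} N/2 & \text{if } p = q \\ -N/2 & \text{if } p = N - q \\ 0 & \text{otherwise} \end{cases}$$
      $$\sum_{n=0}^{N-1} \cos\left(\frac{2\pi pn}{N}\right)\sin\left(\frac{2\pi qn}{N}\right) = 0$$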
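The orthogonality of the complex sinusoids can be checked numerically. The sketch below computes all pairwise inner products for N = 8; the function name `inner_product` and the choice of N are assumptions for illustration.

```python
import cmath
import math

def inner_product(p, q, N):
    """Sum over n of s_p(n) * conj(s_q(n)) for complex sinusoids of length N."""
    return sum(cmath.exp(2j * math.pi * p * n / N) *
               cmath.exp(-2j * math.pi * q * n / N)
               for n in range(N))

N = 8
# gram[p][q] should be N when p == q and (numerically) zero otherwise
gram = [[inner_product(p, q, N) for q in range(N)] for p in range(N)]
```

Up to floating-point rounding, `gram` is N times the identity matrix, which is what makes the projection on the next slide isolate a single coefficient.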

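  14. Orthogonal Projection on Complex Sinusoids § Take the inner product of the signal with each complex sinusoid
      $$\begin{aligned}
      x(n)\cdot s_q^*(n) &= \sum_{n=0}^{N-1} x(n)\,e^{-j2\pi qn/N} = \sum_{n=0}^{N-1}\left(\frac{1}{N}\sum_{p=0}^{N-1} X(p)\,e^{j2\pi pn/N}\right)e^{-j2\pi qn/N} \\
      &= \frac{1}{N}\sum_{p=0}^{N-1} X(p)\left(\sum_{n=0}^{N-1} e^{j2\pi pn/N}\,e^{-j2\pi qn/N}\right) = \frac{1}{N}\,X(q)\,N = X(q) = A(q)e^{j\phi(q)}
      \end{aligned}$$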

  15. To Wrap Up § Discrete Fourier Transform
      $$X(k) = \sum_{n=0}^{N-1} x(n)\,e^{-j2\pi kn/N} = X_R(k) + jX_I(k) = A(k)e^{j\phi(k)}$$
      – Magnitude spectrum: $|X(k)| = A(k) = \sqrt{X_R^2(k) + X_I^2(k)}$
      – Phase spectrum: $\angle X(k) = \phi(k) = \tan^{-1}\left(\frac{X_I(k)}{X_R(k)}\right)$
      § Inverse Discrete Fourier Transform
      $$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,e^{j2\pi kn/N}$$
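The DFT and inverse DFT definitions above translate almost line by line into code. The following is a direct O(N²) sketch in standard-library Python; the function names `dft` and `idft` and the test signal are assumptions. Note that `cmath.phase` uses a four-quadrant arctangent, so it resolves the quadrant that a plain $\tan^{-1}(X_I/X_R)$ leaves ambiguous.

```python
import cmath
import math

def dft(x):
    """X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N), computed directly."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x(n) = (1/N) * sum_k X(k) * exp(j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Hypothetical test signal: a cosine at bin 2 with amplitude 0.5 and phase pi/4
N = 8
x = [0.5 * math.cos(2 * math.pi * 2 * n / N + math.pi / 4) for n in range(N)]

X = dft(x)
magnitude = [abs(v) for v in X]        # |X(k)|
phase = [cmath.phase(v) for v in X]    # angle of X(k), four-quadrant
x_rec = idft(X)                        # should reproduce x
```

For this signal the energy lands in bins 2 and N − 2 = 6, each with magnitude 0.5 · N / 2 = 2, and the phase at bin 2 equals the cosine's phase offset.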

  16. Why do we choose this set of sinusoid frequencies? § Underlying assumption in the DFT – The N samples are assumed to repeat periodically – From the viewpoint of Fourier series, a periodic signal with period N can be represented as sinusoids with periods N, N/2, N/3, … (frequencies 1/N, 2/N, 3/N, …) [Figure: two waveform plots, amplitude vs. time in seconds]

  17. Properties of DFT § Periodicity – $X(k) = X(k+N) = X(k+2N) = \ldots$ – $X(k) = X(k-N) = X(k-2N) = \ldots$ § Symmetry (for real-valued $x(n)$) – Magnitude response: $|X(k)| = |X(N-k)| = |X(-k)|$ – Phase response: $\angle X(k) = -\angle X(-k) = -\angle X(N-k)$ – We often display only half of the magnitude and phase responses
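Both properties can be verified numerically by evaluating the DFT sum at indices outside 0..N−1. This is a sketch under assumptions: the helper `dft_bin` and the sample values are made up for illustration, and the symmetry check relies on the signal being real-valued. Conjugate symmetry, $X(N-k) = X^*(k)$, implies both the magnitude and the phase relation at once.

```python
import cmath
import math

def dft_bin(x, k):
    """Evaluate the DFT sum at any integer k, including k < 0 and k >= N."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

# Arbitrary real-valued samples (hypothetical data)
x = [0.3, -1.2, 0.8, 0.0, 2.1, -0.4, 0.9, -0.7]
N = len(x)

# Periodicity: X(k) == X(k + N)
periodic = all(abs(dft_bin(x, k) - dft_bin(x, k + N)) < 1e-9 for k in range(N))

# Conjugate symmetry for real input: X(N - k) == conj(X(k))
conjugate_symmetric = all(
    abs(dft_bin(x, N - k) - dft_bin(x, k).conjugate()) < 1e-9 for k in range(N)
)
```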

  18. Properties of DFT [Figure: waveform, magnitude and phase plots for N = 32. Left column: bins 0 to N−1, showing $|X(k)| = |X(N-k)|$ and $\angle X(k) = -\angle X(N-k)$. Right column: bins −N/2 to N/2, showing $|X(k)| = |X(-k)|$ and $\angle X(k) = -\angle X(-k)$]

  19. Frequency Scales § $X(k)$, $k = 0, 1, \ldots, N$, corresponds to frequency values that are evenly distributed between 0 and $f_s$ in Hz [Figure: frequency axis aligning bin indices −N, −N/2, 0, N/2, N with −$f_s$, −$f_s/2$, 0, $f_s/2$, $f_s$]
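The mapping from bin index to frequency is simply $k \cdot f_s / N$. The sketch below shows both the 0 to $f_s$ view and the centered view, in which bins from N/2 upward are read as negative frequencies by periodicity. The function names and the 8 kHz sampling rate are assumptions for illustration.

```python
def bin_frequencies(N, fs):
    """Frequency in Hz of each bin k = 0..N-1 on the 0..fs scale."""
    return [k * fs / N for k in range(N)]

def centered_bin_frequencies(N, fs):
    """Same bins, with k >= N/2 mapped to negative frequencies (k - N)."""
    return [(k if k < N / 2 else k - N) * fs / N for k in range(N)]

fs = 8000  # hypothetical sampling rate in Hz
N = 8
freqs = bin_frequencies(N, fs)
centered = centered_bin_frequencies(N, fs)
```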

  20. Examples of DFT [Figure: waveform (amplitude vs. time in milliseconds) and spectrum (magnitude vs. frequency) for a sine, a drum, and a flute]

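  21. Fast Fourier Transform (FFT) § Matrix multiplication view of DFT
      $$\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \\ \vdots \\ X(N-2) \\ X(N-1) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N & W_N^{2} & \cdots & W_N^{N-1} \\ 1 & W_N^{2} & W_N^{4} & \cdots & W_N^{2(N-1)} \\ 1 & W_N^{3} & W_N^{6} & \cdots & W_N^{3(N-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & W_N^{N-2} & W_N^{2(N-2)} & \cdots & W_N^{(N-2)(N-1)} \\ 1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)(N-1)} \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ \vdots \\ x(N-2) \\ x(N-1) \end{bmatrix}$$
      where $W_N = e^{-j2\pi/N}$, consistent with the DFT definition.
      § In practice, we don’t compute this directly. There is a more efficient way, called the “Fast Fourier Transform (FFT)” – Complexity reduction by FFT: $O(N^2) \rightarrow O(N\log_2 N)$ – Divide and conquer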
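To make the divide-and-conquer idea concrete, here is a sketch of one common FFT variant, the recursive radix-2 decimation-in-time algorithm, next to the direct matrix-style DFT. The slides do not say which FFT algorithm is meant, so this choice is an assumption, as are the function names; the recursive version also assumes the input length is a power of two.

```python
import cmath
import math

def dft_direct(x):
    """O(N^2) DFT: one row of the W_N matrix times x per output bin."""
    N = len(x)
    W = cmath.exp(-2j * math.pi / N)
    return [sum(W ** (k * n) * x[n] for n in range(N)) for k in range(N)]

def fft_radix2(x):
    """O(N log N) recursive FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # DFT of odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out

# Hypothetical 16-sample test signal
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
X_direct = dft_direct(x)
X_fast = fft_radix2(x)
```

Each level of recursion splits the problem into two half-size DFTs and combines them with N/2 twiddle multiplications, which is where the $N\log_2 N$ cost comes from.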

  22. Short-Time Fourier Transform (STFT) § DFT assumes that the signal is stationary – It is not a good idea to apply the DFT to a long and dynamically changing signal like music – Instead, we segment the signal and apply the DFT to each segment separately § Short-Time Fourier Transform
      $$X(k, l) = \sum_{n=0}^{N-1} w(n)\, x(n + l\cdot h)\, e^{-j2\pi kn/N}$$
      where $w(n)$ is the window, $h$ is the hop size, and $N$ is the FFT size
      § This produces a 2-D time-frequency representation – Get the “spectrogram” from the magnitude – Parameters: window size, window type, FFT size, hop size
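The STFT formula above can be sketched as a loop over frames, each windowed and transformed. This is an illustrative standard-library version that uses a direct DFT per frame rather than an FFT; the function names, the periodic Hann window, the zero-padding when the FFT size exceeds the window length, and the two-tone test signal are all assumptions.

```python
import cmath
import math

def hann_window(M):
    """Periodic Hann window of length M."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]

def stft(x, window, hop_size, fft_size):
    """Return a list of frames; frame l holds X(k, l) for k = 0..fft_size-1."""
    M = len(window)
    frames = []
    l = 0
    while l * hop_size + M <= len(x):
        # Window the segment starting at l*h, then zero-pad to the FFT size
        seg = [window[n] * x[n + l * hop_size] for n in range(M)]
        seg += [0.0] * (fft_size - M)
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / fft_size)
                for n in range(fft_size))
            for k in range(fft_size)
        ])
        l += 1
    return frames

# Hypothetical signal: 1000 Hz for 256 samples, then 2000 Hz, at fs = 8000 Hz
fs = 8000
x = [math.sin(2 * math.pi * (1000 if n < 256 else 2000) * n / fs)
     for n in range(512)]

N = 64
frames = stft(x, hann_window(N), hop_size=32, fft_size=N)  # 50% overlap

# Spectrogram: magnitudes of the non-negative frequency bins of each frame
spectrogram = [[abs(v) for v in frame[:N // 2 + 1]] for frame in frames]
peak_bins = [row.index(max(row)) for row in spectrogram]
```

With a bin spacing of 8000 / 64 = 125 Hz, the peak moves from bin 8 (1000 Hz) in the early frames to bin 16 (2000 Hz) in the late frames, which a single DFT over the whole signal would not localize in time.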

  23. Windowing § Types of window functions – Trade-off between the width of the main lobe and the level of the side lobes [Figure: window spectra annotated with main-lobe width and side-lobe level]
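One side of this trade-off can be observed numerically: for a sinusoid that falls between two DFT bins, a Hann window leaks far less energy into distant bins than a rectangular window, at the cost of a wider main lobe. The sketch below compares only these two window types; the helper names, the 10.5-bin test tone, and the choice of bin 25 as the "far" bin are assumptions for illustration.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude of the direct DFT of x."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N)]

def far_leakage_db(window, far_bin):
    """Level at far_bin relative to the spectral peak, in dB."""
    N = len(window)
    # Hypothetical test tone at 10.5 bins, halfway between two DFT bins
    x = [window[n] * math.cos(2 * math.pi * 10.5 * n / N) for n in range(N)]
    mags = dft_magnitudes(x)
    return 20 * math.log10(mags[far_bin] / max(mags))

N = 64
rectangular = [1.0] * N
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

rect_db = far_leakage_db(rectangular, far_bin=25)
hann_db = far_leakage_db(hann, far_bin=25)
```

The Hann value comes out well below the rectangular one, reflecting its lower side-lobe level.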

  24. Short-Time Fourier Transform (STFT) [Figure: overlapping windowed frames with 50% overlap. Source: the JOS SASP book]

  25. Example: Pop Music

  26. Example: Deep Note

  27. Time-Frequency Resolutions in STFT § Trade-off between time resolution and frequency resolution, set by the window size – Short window: low frequency resolution, high time resolution – Long window: high frequency resolution, low time resolution
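A rough way to quantify this trade-off is to use the bin spacing $f_s / N$ as a proxy for frequency resolution and the window duration $N / f_s$ as a proxy for time resolution. These are simplifications (the effective frequency resolution also depends on the window's main-lobe width), and the function name, sampling rate and window sizes below are assumptions.

```python
def stft_resolution(window_size, fs):
    """Return (bin spacing in Hz, window duration in seconds)."""
    return fs / window_size, window_size / fs

fs = 44100  # hypothetical sampling rate in Hz
short_df, short_dt = stft_resolution(256, fs)    # short window
long_df, long_dt = stft_resolution(4096, fs)     # long window
```

The product of the two numbers is always 1, so making one of them smaller necessarily makes the other larger.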
