CTP431- Music and Audio Computing Spectral Analysis Graduate School - - PowerPoint PPT Presentation




slide-1
SLIDE 1

CTP431- Music and Audio Computing Spectral Analysis

Graduate School of Culture Technology KAIST Juhan Nam


slide-2
SLIDE 2

Waveform

§ Time-domain representation of sound

– Shows the amplitude over time

§ Amplitude envelope

– Short-term loudness: e.g. sound level meter
– Computed by various methods

  • max-peak picking
  • root-mean-square (RMS)
  • Hilbert transform

– ADSR

  • The amplitude envelope of a musical sound is often described with attack, decay, sustain and release.

– Also used for dynamic range compression: e.g. compressor, expander
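The first two envelope methods listed on this slide can be sketched in a few lines of plain Python. This is only an illustration, not the course's implementation: the function names `peak_envelope` and `rms_envelope`, the frame length, the sample rate, and the decaying test tone are all assumptions made for the example.

```python
import math

def frames(x, frame_len):
    """Split x into consecutive non-overlapping frames (an incomplete tail is dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]

def peak_envelope(x, frame_len):
    """Max-peak picking: largest absolute sample value in each frame."""
    return [max(abs(v) for v in f) for f in frames(x, frame_len)]

def rms_envelope(x, frame_len):
    """Root-mean-square level of each frame."""
    return [math.sqrt(sum(v * v for v in f) / len(f)) for f in frames(x, frame_len)]

# Hypothetical test signal: a 440 Hz sine whose amplitude decays linearly to zero.
sr = 8000                # assumed sample rate in Hz
n_samples = sr // 2      # 0.5 seconds
x = [(1.0 - n / n_samples) * math.sin(2 * math.pi * 440 * n / sr)
     for n in range(n_samples)]

peak = peak_envelope(x, 400)   # one value per 50 ms frame
rms = rms_envelope(x, 400)
```

Both envelopes follow the decay of the tone; the RMS value of a frame can never exceed its peak value, which is why the two curves differ in level but not in shape for a simple tone like this.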

slide-3
SLIDE 3

Example: Waveform and Amplitude Envelopes

[Figure: waveforms and amplitude envelopes of a piano C4 note and a flute A4 note]

slide-4
SLIDE 4

Spectrogram

§ Time/Frequency-domain representation of sound

– Shows the amplitude envelope of individual frequency components over time
– A better representation for observing pitch and timbre characteristics
– Often called a “sonogram”

§ Visualization

– 2D color map or waterfall

slide-5
SLIDE 5

Example: Spectrogram - 2D color map

[Figure: 2D color-map spectrograms of a piano C4 note and a flute A4 note]

slide-6
SLIDE 6

Example: Spectrogram - 3D waterfall

[Figure: 3D waterfall spectrograms of a piano C4 note and a flute A4 note]

slide-7
SLIDE 7

Phasor

§ A complex number representing a sinusoidal function with

– Amplitude, angular frequency, initial phase

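$$x(t) = e^{j(\omega t + \phi)} = \cos(\omega t + \phi) + j \sin(\omega t + \phi)$$ (Euler’s identity)

$$\omega = 2\pi f = 2\pi / T, \qquad T = 1 / f$$

[Figure: phasor at angle $\omega t + \phi$; plot with a time axis from 1 to 8 (× 10⁻³) and an amplitude axis from −1 to 1]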
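A small numerical check of the phasor definition, written as a sketch with Python's standard `cmath` module. The helper name `phasor`, its optional `amplitude` argument, and the chosen frequency, time, and phase values are assumptions for illustration only.

```python
import cmath
import math

def phasor(t, freq_hz, phase=0.0, amplitude=1.0):
    """Return amplitude * e^{j(omega*t + phase)} with omega = 2*pi*freq_hz."""
    omega = 2 * math.pi * freq_hz
    return amplitude * cmath.exp(1j * (omega * t + phase))

f = 250.0            # assumed frequency in Hz
T = 1.0 / f          # period in seconds
t = 0.0013           # an arbitrary time instant
phi = math.pi / 6    # an arbitrary initial phase

z = phasor(t, f, phi)
omega = 2 * math.pi * f

# Euler's identity: the real part is the cosine, the imaginary part is the sine.
euler_ok = (math.isclose(z.real, math.cos(omega * t + phi), abs_tol=1e-12)
            and math.isclose(z.imag, math.sin(omega * t + phi), abs_tol=1e-12))

# The phasor returns to the same point after one period T.
periodic_ok = cmath.isclose(phasor(t + T, f, phi), z, abs_tol=1e-9)
```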

slide-8
SLIDE 8

Fourier Series

§ Any signal x(t) with period T can be represented as a sum of phasors

– The periods of phasors are T, T/2, T/3, ..., T/n, …

§ Web Audio Examples

– http://codepen.io/anon/pen/jPGJMK

§ How can you get the coefficients?

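$$x(t) = \mathrm{real}\left\{ \frac{1}{T} \sum_{k=0}^{\infty} r_k \, e^{j\left(\frac{2\pi k t}{T} + \phi_k\right)} \right\} = \mathrm{real}\left\{ \frac{1}{T} \sum_{k=0}^{\infty} c_k^{*} \, e^{j \frac{2\pi k t}{T}} \right\}$$

$$c_k^{*} = r_k e^{j\phi_k}$$

  • $a_k = r_k \cos(\phi_k)$, $b_k = r_k \sin(\phi_k)$
  • $r_k = \sqrt{a_k^2 + b_k^2}$, $\phi_k = \arctan(b_k / a_k)$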
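The synthesis formula on this slide can be tried out with a truncated sum. The sketch below is an assumption-laden illustration: the function names, the period, and the coefficient values are made up for the example, and only the first four harmonics are used instead of an infinite series.

```python
import cmath
import math

def synthesize(t, T, r, phi):
    """Evaluate real{ (1/T) * sum_k r[k] * e^{j(2*pi*k*t/T + phi[k])} } for a finite list of harmonics."""
    total = sum(r_k * cmath.exp(1j * (2 * math.pi * k * t / T + phi_k))
                for k, (r_k, phi_k) in enumerate(zip(r, phi)))
    return (total / T).real

def polar_to_rect(r_k, phi_k):
    """a_k = r_k cos(phi_k), b_k = r_k sin(phi_k)."""
    return r_k * math.cos(phi_k), r_k * math.sin(phi_k)

def rect_to_polar(a_k, b_k):
    """r_k = sqrt(a_k^2 + b_k^2), phi_k = arctan(b_k / a_k); atan2 is used to keep the quadrant."""
    return math.hypot(a_k, b_k), math.atan2(b_k, a_k)

# Hypothetical coefficients for harmonics k = 0..3 of a signal with period T = 0.01 s.
T = 0.01
r = [0.0, 1.0, 0.5, 0.25]
phi = [0.0, -math.pi / 2, -math.pi / 2, -math.pi / 2]

x0 = synthesize(0.0023, T, r, phi)
x1 = synthesize(0.0023 + T, T, r, phi)   # the same point one period later
a1, b1 = polar_to_rect(r[1], phi[1])
r1, phi1 = rect_to_polar(a1, b1)         # round trip back to polar form
```

Because every phasor in the sum has a period of T/k, the synthesized signal repeats with period T, so `x0` and `x1` agree up to rounding error.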

slide-9
SLIDE 9

Orthogonality of Sinusoids

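§ The phasors are orthogonal to each other unless they have the same frequency

$$\int_{-T/2}^{T/2} e^{j 2\pi m t / T} \, e^{-j 2\pi n t / T} \, dt = \begin{cases} 0 & (m \neq n) \\ T & (m = n) \end{cases}$$

§ Using the orthogonality, the coefficients are obtained as

$$c_k^{*} = \int_{-T/2}^{T/2} x(t) \, e^{-j 2\pi k t / T} \, dt$$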
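The orthogonality relation and the coefficient integral can be checked numerically. The following sketch approximates both integrals with a Riemann sum; the function names, the period, the step count, and the single-phasor test signal (complex-valued, without the real-part operator) are assumptions chosen to keep the example short.

```python
import cmath
import math

def inner_product(m, n, T, steps=1000):
    """Riemann-sum approximation of the integral of e^{j2*pi*m*t/T} * e^{-j2*pi*n*t/T} over [-T/2, T/2]."""
    dt = T / steps
    total = 0j
    for i in range(steps):
        t = -T / 2 + i * dt
        total += cmath.exp(2j * math.pi * m * t / T) * cmath.exp(-2j * math.pi * n * t / T)
    return total * dt

def coefficient(x, k, T, steps=1000):
    """Riemann-sum approximation of the integral of x(t) * e^{-j2*pi*k*t/T} over [-T/2, T/2]."""
    dt = T / steps
    total = 0j
    for i in range(steps):
        t = -T / 2 + i * dt
        total += x(t) * cmath.exp(-2j * math.pi * k * t / T)
    return total * dt

T = 2.0                               # assumed period
same = inner_product(3, 3, T)         # m = n: close to T
different = inner_product(3, 5, T)    # m != n: close to 0

def x(t):
    """Hypothetical test signal: one phasor with k = 2 and weight 0.7, scaled by 1/T as in the series."""
    return (0.7 / T) * cmath.exp(2j * math.pi * 2 * t / T)

c2 = coefficient(x, 2, T)   # picks out the weight 0.7
c3 = coefficient(x, 3, T)   # no component at k = 3, so close to 0
```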

slide-10
SLIDE 10

Discrete Fourier Transform (DFT)

§ Discrete-time version of Fourier series

§ The number of discrete samples, N, corresponds to the period T

– We assume that the segment x(n) is repeated every N samples

$$x(n) = [x_0, x_1, x_2, \ldots, x_{N-1}]$$

§ Then, we can directly derive the DFT and the inverse DFT (IDFT) from the Fourier series

DFT: $$X(k) = \sum_{n=0}^{N-1} x(n) \, e^{-j 2\pi k n / N}$$

IDFT: $$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) \, e^{j 2\pi k n / N}$$
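The DFT and IDFT sums translate almost literally into code. This is a direct, unoptimized sketch in plain Python; the function names `dft` and `idft` and the example segment are assumptions for illustration.

```python
import cmath
import math

def dft(x):
    """X(k) = sum_{n=0}^{N-1} x(n) * e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x(n) = (1/N) * sum_{k=0}^{N-1} X(k) * e^{j 2 pi k n / N}."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]   # arbitrary example segment, N = 8
X = dft(x)
x_back = idft(X)   # should reproduce x up to rounding error
```

Applying `idft` to the output of `dft` returns the original segment, and `X[0]` is simply the sum of the samples.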

slide-11
SLIDE 11

Discrete Fourier Transform

§ Discrete Fourier Transform

$$X(k) = \sum_{n=0}^{N-1} x(n) \, e^{-j 2\pi k n / N} = X_R(k) + j X_I(k)$$

– Magnitude spectrum: $A(k) = \sqrt{X_R(k)^2 + X_I(k)^2}$
– Phase spectrum: $\Phi(k) = \arctan\left(X_I(k) / X_R(k)\right)$

§ We use the magnitude spectrum to display spectrograms
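As a sketch of how the magnitude and phase spectra are read off the complex DFT values, the example below uses Python's built-in `abs` and `cmath.phase`. The helper names and the test signal (a cosine placed exactly on bin 3 of a 16-point DFT) are assumptions. Note that `cmath.phase` uses a two-argument arctangent, which resolves the quadrant that a plain arctan of the ratio cannot.

```python
import cmath
import math

def dft(x):
    """X(k) = sum_{n=0}^{N-1} x(n) * e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def magnitude_spectrum(X):
    """A(k) = sqrt(X_R(k)^2 + X_I(k)^2)."""
    return [abs(v) for v in X]

def phase_spectrum(X):
    """Phi(k) = atan2(X_I(k), X_R(k))."""
    return [cmath.phase(v) for v in X]

N = 16
k0 = 3
x = [math.cos(2 * math.pi * k0 * n / N) for n in range(N)]   # cosine exactly on bin 3

X = dft(x)
A = magnitude_spectrum(X)
Phi = phase_spectrum(X)
```

For this real cosine the energy appears in two bins, k = 3 and its mirror k = 13, each with magnitude N/2; all other bins are zero up to rounding error.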

slide-12
SLIDE 12

DFT Sinusoids

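$$s_k^{*}(n) = e^{j 2\pi k n / N}, \qquad N = 8$$

[Figure: the DFT sinusoids for N = 8, with panels for k = 0, 1, 2, 3 and for the negative-frequency indices k = −1 (same as k = 7), k = −2 (k = 6), k = −3 (k = 5), k = −4 (k = 4)]

Source: the JOS DFT book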
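The pairing of negative and positive indices in the figure can be verified directly: for N samples, the sinusoid with index k − N has the same sample values as the one with index k. A minimal sketch, with hypothetical helper names:

```python
import cmath
import math

def dft_sinusoid(k, N):
    """Samples of e^{j 2 pi k n / N} for n = 0..N-1."""
    return [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]

def same_samples(u, v, tol=1e-9):
    """True if two sample sequences agree element by element within tol."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

N = 8
# Each negative index k - N should give the same samples as the positive index k.
pairs = [(-1, 7), (-2, 6), (-3, 5), (-4, 4)]
matches = [same_samples(dft_sinusoid(neg, N), dft_sinusoid(pos, N))
           for neg, pos in pairs]

# By contrast, k = 1 and k = 7 rotate in opposite directions and do not match.
opposite = same_samples(dft_sinusoid(1, N), dft_sinusoid(7, N))
```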

slide-13
SLIDE 13

Fast Fourier Transform

§ Matrix multiplication view of DFT

§ In fact, we don’t compute this directly. There is a more efficient way, called the “Fast Fourier Transform (FFT)”

– Complexity reduction by FFT: O(N²) → O(N log₂ N)
– Divide and conquer

Source: the JOS DFT book
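To make the two views concrete, the sketch below computes the DFT once as a matrix-vector product and once with a recursive radix-2 decimation-in-time FFT. The slide does not say which FFT variant is meant, so the radix-2 scheme is an assumption chosen as a common example of divide and conquer; the function names and the test signal are also illustrative.

```python
import cmath
import math

def dft_matrix(N):
    """N x N matrix W with W[k][n] = e^{-j 2 pi k n / N}."""
    return [[cmath.exp(-2j * math.pi * k * n / N) for n in range(N)]
            for k in range(N)]

def dft_by_matrix(x):
    """Direct DFT as a matrix-vector product: on the order of N^2 multiplications."""
    W = dft_matrix(len(x))
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])    # DFT of the even-indexed samples
    odd = fft(x[1::2])     # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out

x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
X_direct = dft_by_matrix(x)
X_fast = fft(x)   # same result, computed by splitting the problem in half at each level
```

Each level of the recursion does work proportional to N, and there are log₂ N levels, which is where the O(N log₂ N) cost comes from.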

slide-14
SLIDE 14

Examples of DFT

[Figure: DFT examples for a sine waveform, a drum sound, and a flute sound]

slide-15
SLIDE 15

Short-Time Fourier Transform (STFT)

§ DFT assumes that the signal is stationary

– It is not a good idea to apply DFT to a long and dynamically changing signal like music
– Instead, we segment the signal and apply DFT separately

§ Short-Time Fourier Transform

$$X(k, l) = \sum_{n=0}^{N-1} w(n) \, x(n + l \cdot h) \, e^{-j 2\pi k n / N}$$

– $w(n)$: window, $N$: FFT size, $h$: hop size

§ This produces 2-D time-frequency representations

– Get “spectrogram” from the magnitude
– Parameters: window size, window type, FFT size, hop size
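The STFT sum can be sketched as a loop over frames, each one windowed and passed through a direct DFT. Everything specific here is an assumption for illustration: the function names, the Hann window, the sample rate, and the parameter values (window size 64, FFT size 64, hop size 32). A practical implementation would use an FFT instead of the direct sum.

```python
import cmath
import math

def hann(M):
    """Hann window of length M (periodic form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]

def stft(x, window, fft_size, hop):
    """X(k, l) = sum_n w(n) x(n + l*h) e^{-j 2 pi k n / N}, returned as frames[l][k].

    Requires fft_size >= len(window); summing only over the window length is
    equivalent to zero-padding the windowed segment up to fft_size.
    """
    M = len(window)
    n_frames = 1 + (len(x) - M) // hop
    frames = []
    for l in range(n_frames):
        seg = [window[n] * x[n + l * hop] for n in range(M)]   # windowed segment
        spectrum = [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / fft_size)
                        for n in range(M))
                    for k in range(fft_size)]
        frames.append(spectrum)
    return frames

sr = 8000                                   # assumed sample rate in Hz
x = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(512)]   # 1000 Hz test tone
fft_size = 64                               # bin spacing: 8000 / 64 = 125 Hz
X = stft(x, hann(64), fft_size, hop=32)     # hop of half the window: 50% overlap

# Spectrogram: magnitude of each frame; the strongest bin should sit at 1000 / 125 = 8.
spectrogram = [[abs(v) for v in frame] for frame in X]
peak_bins = [max(range(fft_size // 2 + 1), key=lambda k: mag[k]) for mag in spectrogram]
```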

slide-16
SLIDE 16

Windowing

§ Types of window functions

– Trade-off between the width of main-lobe and the level of side-lobe

[Figure: window spectrum annotated with main-lobe width and side-lobe level]
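The trade-off can be measured on two concrete windows. The sketch below compares a rectangular and a Hann window by zero-padding each one, taking the magnitude of its DFT, and locating the first null (half the main-lobe width) and the highest side lobe. The choice of these two windows, the lengths, and the helper names are assumptions; the slide does not list specific window types.

```python
import cmath
import math

def rectangular(M):
    return [1.0] * M

def hann(M):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]

def window_spectrum_db(w, n_bins):
    """Magnitude of the zero-padded DFT of w for bins 0..n_bins/2, in dB relative to bin 0."""
    mags = [abs(sum(w[n] * cmath.exp(-2j * math.pi * k * n / n_bins)
                    for n in range(len(w))))
            for k in range(n_bins // 2 + 1)]
    # A small floor avoids log10(0) at exact nulls.
    return [20 * math.log10(max(m, 1e-12) / mags[0]) for m in mags]

def main_lobe_and_side_lobe(spec_db):
    """Return (bin index of the first null, highest level beyond that null in dB)."""
    k = 0
    while k + 1 < len(spec_db) and spec_db[k + 1] < spec_db[k]:
        k += 1
    return k, max(spec_db[k + 1:])

M, n_bins = 32, 512   # window length and zero-padded DFT length (assumed values)
rect_null, rect_sll = main_lobe_and_side_lobe(window_spectrum_db(rectangular(M), n_bins))
hann_null, hann_sll = main_lobe_and_side_lobe(window_spectrum_db(hann(M), n_bins))
```

In this comparison the Hann window shows the wider main lobe (its first null lies further from bin 0) but the lower side-lobe level, which is the trade-off named on the slide.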

slide-17
SLIDE 17

Short-Time Fourier Transform (STFT)

[Figure: STFT frames with 50% overlap]

Source: the JOS SASP book

slide-18
SLIDE 18

Example: Pop Music


slide-19
SLIDE 19

Example: Deep Note


slide-20
SLIDE 20

Time-Frequency Resolutions in STFT

§ Trade-off between time- and frequency-resolution by window size

– Long window: high frequency resolution, low time resolution
– Short window: low frequency resolution, high time resolution
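A short calculation makes the trade-off tangible. The sketch assumes the FFT size equals the window size and a sample rate of 44100 Hz; the function name and the two window sizes are illustrative choices, not values from the slides.

```python
def stft_resolution(window_size, sample_rate):
    """Return (bin spacing in Hz, window duration in seconds), assuming FFT size == window size."""
    freq_res_hz = sample_rate / window_size
    time_res_s = window_size / sample_rate
    return freq_res_hz, time_res_s

sr = 44100   # assumed sample rate in Hz
long_f, long_t = stft_resolution(4096, sr)    # long window: fine frequency grid, long time span
short_f, short_t = stft_resolution(256, sr)   # short window: coarse frequency grid, short time span
```

Under these assumptions the bin spacing multiplied by the window duration is always 1, so making one of them smaller necessarily makes the other larger.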