CTP431- Music and Audio Computing Spectral Analysis



slide-1
SLIDE 1

CTP431- Music and Audio Computing Spectral Analysis

Graduate School of Culture Technology KAIST Juhan Nam


slide-2
SLIDE 2

Outline

§ Time-domain representation of sound

– Waveform

§ Time-Frequency domain representation of sound

– Discrete Fourier Transform (DFT)

– Short-time Fourier Transform (STFT)

slide-3
SLIDE 3

Waveform

§ Time-domain representation of sound

– Show the amplitude over time

§ Amplitude envelope

– Short-term loudness: e.g. sound level meter

– Computed by various methods

  • max-peak picking
  • root-mean-square (RMS)

– ADSR

  • The amplitude envelope of a musical sound is often described with attack, decay, sustain and release.

– Also used for dynamic range compression: e.g. compressor, expander
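The two envelope methods listed above can be sketched as follows. This is a minimal illustration, not the course's own code: the frame size, hop size, and the decaying 440 Hz test tone are assumptions chosen for the example.

```python
import numpy as np

def amplitude_envelopes(x, frame_size=512, hop_size=256):
    """Return (peak, rms) envelopes with one value per frame."""
    n_frames = 1 + max(0, len(x) - frame_size) // hop_size
    peak = np.empty(n_frames)
    rms = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop_size : i * hop_size + frame_size]
        peak[i] = np.max(np.abs(frame))        # max-peak picking
        rms[i] = np.sqrt(np.mean(frame ** 2))  # root-mean-square
    return peak, rms

# Hypothetical test signal: a 440 Hz tone with an exponential decay.
fs = 8000
t = np.arange(fs) / fs
x = np.exp(-3.0 * t) * np.sin(2 * np.pi * 440 * t)
peak, rms = amplitude_envelopes(x)
```

The peak envelope is never below the RMS envelope, and both fall over time for this decaying tone.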

slide-4
SLIDE 4

Example: Waveform and Amplitude Envelopes

[Figure: waveforms and amplitude envelopes of a piano C4 note and a flute A4 note]

slide-5
SLIDE 5

Spectrogram

§ Time/Frequency-domain representation of sound

– Shows the amplitude envelope of individual frequency components over time

– Better representation to observe pitch and timbre characteristics

– Also called a “sonogram”

§ Visualization

– 2D color map or waterfall


slide-6
SLIDE 6

Example: Spectrogram - 2D color map

[Figure: 2D color-map spectrograms of a piano C4 note and a flute A4 note]

slide-7
SLIDE 7

Example: Spectrogram - 3D waterfall

[Figure: 3D waterfall spectrograms of a piano C4 note and a flute A4 note]

slide-8
SLIDE 8

Frequency-Domain Representation

§ Can we represent $x(n)$ with a finite set of sinusoids?

– $x(n) = \frac{1}{N}\sum_{k=0}^{N-1} A(k)\, r_k(n)$

  • $r_k(n) = \cos\left(\frac{2\pi k n}{N} + \phi(k)\right)$: discrete-time sinusoid with length N

– Find $A(k)$, $\phi(k)$

slide-9
SLIDE 9

Euler’s identity

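§ Euler’s identity: $e^{j\theta} = \cos\theta + j\sin\theta$

– Can be proved with the Taylor series

– If $\theta = \pi$, $e^{j\pi} + 1 = 0$ (“the most beautiful equation in math”)

§ Properties

– $\cos\theta = \frac{e^{j\theta} + e^{-j\theta}}{2}$

– $\sin\theta = \frac{e^{j\theta} - e^{-j\theta}}{2j}$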
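Euler’s identity and the two properties can be checked numerically. The sketch below only samples a handful of angles (an arbitrary choice), so it is a sanity check rather than a proof.

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 9)   # a few sample angles

# e^{j theta} = cos(theta) + j sin(theta)
euler_holds = np.allclose(np.exp(1j * theta),
                          np.cos(theta) + 1j * np.sin(theta))

# Cosine and sine rebuilt from complex exponentials.
cos_from_exp = (np.exp(1j * theta) + np.exp(-1j * theta)) / 2
sin_from_exp = (np.exp(1j * theta) - np.exp(-1j * theta)) / 2j

# e^{j pi} + 1 is zero up to floating-point rounding.
identity_at_pi = np.exp(1j * np.pi) + 1
```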

slide-10
SLIDE 10

Complex Sinusoids

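§ Cosine and sine can be represented in a single term

$$s_k(n) = e^{j 2\pi k n / N} = \cos\left(\frac{2\pi k n}{N}\right) + j\sin\left(\frac{2\pi k n}{N}\right)$$

– Frequencies: $\frac{2\pi k}{N}$ radians per sample, or $\frac{k}{N} F_s$ Hz ($F_s$: the sampling rate), for $k = 0, 1, 2, \ldots, N-1$

– Example: N = 8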

Figures are from https://ccrma.stanford.edu/~jos/dft/

slide-11
SLIDE 11

Complex Sinusoids

[Figure: the complex sinusoids for N=8]

Figures are from https://ccrma.stanford.edu/~jos/dft/

slide-12
SLIDE 12

Frequency-Domain Representation Using Complex Sinusoids

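§ $x(n)$ is expressed in a simpler form:

$$
\begin{aligned}
x(n) &= \frac{1}{N}\sum_{k=0}^{N-1} A(k)\cos\left(\frac{2\pi k n}{N} + \phi(k)\right) \\
&= \frac{1}{N}\sum_{k=0}^{N-1} A(k)\,\frac{e^{j\left(\frac{2\pi k n}{N} + \phi(k)\right)} + e^{-j\left(\frac{2\pi k n}{N} + \phi(k)\right)}}{2} \\
&= \frac{1}{N}\sum_{k=0}^{N-1} \frac{A(k)e^{j\phi(k)}e^{j 2\pi k n / N} + A(k)e^{-j\phi(k)}e^{-j 2\pi k n / N}}{2} \\
&= \frac{1}{N}\sum_{k=0}^{N-1} \frac{X(k)\,e^{j 2\pi k n / N} + X^*(k)\,e^{-j 2\pi k n / N}}{2} \\
&= \mathrm{Real}\left\{\frac{1}{N}\sum_{k=0}^{N-1} X(k)\,e^{j 2\pi k n / N}\right\} = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,e^{j 2\pi k n / N}
\end{aligned}
$$

where $X(k) = A(k)e^{j\phi(k)} = A(k)\left(\cos\phi(k) + j\sin\phi(k)\right)$ and $X^*(k)$ is its complex conjugate. The last equality holds when the imaginary parts of the sum cancel, which is the case when $X(N-k) = X^*(k)$.

– Now, how can we find $X(k)$?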
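The equivalence between the cosine form and the complex-exponential form can be illustrated numerically. This sketch borrows NumPy's FFT to supply $X(k)$ (the later slides derive how to find it); the random 16-sample signal is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)      # arbitrary real test signal

X = np.fft.fft(x)               # complex coefficients X(k)
A = np.abs(X)                   # A(k)
phi = np.angle(X)               # phi(k)

n = np.arange(N)
k = n.reshape(-1, 1)            # column vector, so k * n is a (k, n) grid

# Cosine form: (1/N) * sum_k A(k) cos(2 pi k n / N + phi(k))
x_cos = np.sum(A[:, None] * np.cos(2 * np.pi * k * n / N + phi[:, None]),
               axis=0) / N

# Complex form: (1/N) * sum_k X(k) e^{j 2 pi k n / N}
x_exp = np.sum(X[:, None] * np.exp(2j * np.pi * k * n / N), axis=0) / N
```

Both sums reproduce `x`, and the imaginary part of `x_exp` vanishes up to rounding because the signal is real.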

slide-13
SLIDE 13

Orthogonality of Sinusoids

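§ Inner product between two complex sinusoids

$$s_p(n) \cdot s_q^*(n) = \sum_{n=0}^{N-1} e^{j 2\pi p n / N} \cdot e^{-j 2\pi q n / N} = \begin{cases} N & \text{if } p = q \\ 0 & \text{otherwise} \end{cases}$$

The corresponding sums for real sinusoids (for $0 < p, q < N$ with $p, q \ne N/2$):

$$\sum_{n=0}^{N-1} \cos(2\pi p n / N)\cos(2\pi q n / N) = \begin{cases} N/2 & \text{if } p = q \text{ or } p = N - q \\ 0 & \text{otherwise} \end{cases}$$

$$\sum_{n=0}^{N-1} \sin(2\pi p n / N)\sin(2\pi q n / N) = \begin{cases} N/2 & \text{if } p = q \\ -N/2 & \text{if } p = N - q \\ 0 & \text{otherwise} \end{cases}$$

$$\sum_{n=0}^{N-1} \cos(2\pi p n / N)\sin(2\pi q n / N) = 0$$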
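The orthogonality relations can be verified for a small case. The sketch below uses N = 8 (matching the earlier example); the helper names `s`, `cos_sum` and `sin_sum` are made up for this illustration.

```python
import numpy as np

N = 8
n = np.arange(N)

def s(k):
    """Complex sinusoid s_k(n) = e^{j 2 pi k n / N}."""
    return np.exp(2j * np.pi * k * n / N)

# Gram matrix: gram[p, q] = sum_n s_p(n) * conj(s_q(n))
gram = np.array([[np.sum(s(p) * np.conj(s(q))) for q in range(N)]
                 for p in range(N)])

def cos_sum(p, q):
    return np.sum(np.cos(2 * np.pi * p * n / N) * np.cos(2 * np.pi * q * n / N))

def sin_sum(p, q):
    return np.sum(np.sin(2 * np.pi * p * n / N) * np.sin(2 * np.pi * q * n / N))
```

The Gram matrix comes out as N times the identity, while `cos_sum(1, 7)` and `sin_sum(1, 7)` show the extra `p = N − q` cases that appear for real sinusoids.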

slide-14
SLIDE 14

Orthogonal Projection on Complex Sinusoids

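§ Take the inner product of the signal with each complex sinusoid

$$
\begin{aligned}
x(n) \cdot s_q^*(n) &= \sum_{n=0}^{N-1} x(n)\,e^{-j 2\pi q n / N} \\
&= \sum_{n=0}^{N-1} \left(\frac{1}{N}\sum_{p=0}^{N-1} X(p)\,e^{j 2\pi p n / N}\right) e^{-j 2\pi q n / N} \\
&= \frac{1}{N}\sum_{p=0}^{N-1} X(p) \left(\sum_{n=0}^{N-1} e^{j 2\pi p n / N}\,e^{-j 2\pi q n / N}\right) \\
&= \frac{1}{N}\,X(q)\,N = X(q) = A(q)\,e^{j\phi(q)}
\end{aligned}
$$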

slide-15
SLIDE 15

To Wrap Up

§ Discrete Fourier Transform

$$X(k) = \sum_{n=0}^{N-1} x(n)\,e^{-j 2\pi k n / N} = X_R(k) + jX_I(k) = A(k)\,e^{j\phi(k)}$$

– Magnitude spectrum: $|X(k)| = A(k) = \sqrt{X_R^2(k) + X_I^2(k)}$

– Phase spectrum: $\angle X(k) = \phi(k) = \tan^{-1}\left(\frac{X_I(k)}{X_R(k)}\right)$

§ Inverse Discrete Fourier Transform

$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,e^{j 2\pi k n / N}$$
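The two transforms can be written directly from their definitions. This is a minimal sketch for illustration (an O(N²) matrix product, not how one would compute it in practice); the four-sample signal is an arbitrary example, and `np.arctan2` is used in place of a plain arctangent so that the phase lands in the correct quadrant.

```python
import numpy as np

def dft(x):
    """X(k) = sum_n x(n) e^{-j 2 pi k n / N}."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N) @ x

def idft(X):
    """x(n) = (1/N) sum_k X(k) e^{j 2 pi k n / N}."""
    N = len(X)
    k = np.arange(N)
    n = k.reshape(-1, 1)
    return (np.exp(2j * np.pi * n * k / N) @ X) / N

x = np.array([1.0, 2.0, 0.0, -1.0])
X = dft(x)
magnitude = np.abs(X)                  # sqrt(X_R^2 + X_I^2)
phase = np.arctan2(X.imag, X.real)     # quadrant-aware arctangent
```

For this input the coefficients are 2, 1 − 3j, 0 and 1 + 3j, which matches `np.fft.fft(x)`, and `idft(X)` returns the original samples.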

slide-16
SLIDE 16

Why do we choose this set of frequencies for the sinusoids?

§ Underlying assumption in DFT

– The N samples are treated as one period of a periodic signal

– In the view of “Fourier Series”, a periodic signal with period N can be represented as sinusoids with periods N, N/2, N/3, … (frequencies 1/N, 2/N, 3/N, …)

[Figure: two waveform plots, amplitude versus time]
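One consequence of the periodicity assumption can be seen by comparing a sinusoid that fits a whole number of cycles into the N samples with one that does not. The frame length, cycle counts, and the 1% threshold below are assumptions made for this sketch.

```python
import numpy as np

N = 64
n = np.arange(N)

x_fit = np.sin(2 * np.pi * 4 * n / N)     # exactly 4 cycles in N samples
x_off = np.sin(2 * np.pi * 4.5 * n / N)   # 4.5 cycles: not periodic in N

mag_fit = np.abs(np.fft.rfft(x_fit))
mag_off = np.abs(np.fft.rfft(x_off))

def significant_bins(mag, rel_threshold=0.01):
    """Count bins whose magnitude exceeds a fraction of the largest bin."""
    return int(np.sum(mag > rel_threshold * mag.max()))
```

The first signal lands on a single bin (k = 4), while the second spreads its energy across many bins because it does not match any of the frequencies k/N.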

slide-17
SLIDE 17

Properties of DFT

§ Periodicity

– $X(k) = X(k+N) = X(k+2N) = \ldots$

– $X(k) = X(k-N) = X(k-2N) = \ldots$

§ Symmetry (for real-valued $x(n)$)

– Magnitude response: $|X(k)| = |X(-k)| = |X(N-k)|$

– Phase response: $\angle X(k) = -\angle X(-k) = -\angle X(N-k)$

– We often display only half of the magnitude and phase responses
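Both properties can be checked on a random real signal. In this sketch `X_at` is a hypothetical helper that evaluates the DFT sum at any integer k, including values outside 0..N−1; the conjugate test covers the magnitude and phase symmetries at once.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32
x = rng.standard_normal(N)      # arbitrary real-valued signal
n = np.arange(N)
X = np.fft.fft(x)

def X_at(k):
    """Evaluate the DFT sum at any integer k (also outside 0..N-1)."""
    return np.sum(x * np.exp(-2j * np.pi * k * n / N))

k = np.arange(1, N)
mag_symmetric = np.allclose(np.abs(X[k]), np.abs(X[N - k]))
# X(N-k) = conj(X(k)): equal magnitude, opposite phase.
conj_symmetric = np.allclose(X[N - k], np.conj(X[k]))
```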

slide-18
SLIDE 18

Properties of DFT

[Figure: a waveform with its magnitude and phase spectra (N=32), shown once over non-negative indices and once over indices centered on zero]

$|X(k)| = |X(N-k)|$, $|X(k)| = |X(-k)|$

$\angle X(k) = -\angle X(N-k)$, $\angle X(k) = -\angle X(-k)$

slide-19
SLIDE 19

Frequency Scales

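§ $X(k)$, $k = 0, 1, \ldots, N$, corresponds to frequency values that are evenly distributed between 0 and $f_s$ in Hz: bin $k$ lies at $k f_s / N$

[Figure: bin-index axis (−N, −N/2, N/2, N) aligned with the frequency axis (−fs, −fs/2, fs/2, fs)]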
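The mapping from bin index to frequency can be computed directly or with NumPy's helpers. The sampling rate and FFT size below are arbitrary values picked for the sketch.

```python
import numpy as np

fs = 8000.0   # assumed sampling rate in Hz
N = 8         # assumed FFT size

k = np.arange(N)
freqs_hz = k * fs / N                          # bin k sits at k * fs / N

# Bins from N/2 upward reported as negative frequencies.
freqs_signed = np.fft.fftfreq(N, d=1 / fs)

# Non-negative half only (0 .. fs/2), as used with np.fft.rfft.
freqs_half = np.fft.rfftfreq(N, d=1 / fs)
```

With these values the bins are 1000 Hz apart, and `freqs_half` stops at the Nyquist frequency of 4000 Hz.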

slide-20
SLIDE 20

Examples of DFT

[Figure: waveform (amplitude versus time in milliseconds) and spectrum (magnitude versus frequency) for a sine, a drum, and a flute]

slide-21
SLIDE 21

Fast Fourier Transform (FFT)

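§ Matrix multiplication view of DFT

$$
\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \\ \vdots \\ X(N-1) \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & W_N & W_N^{2} & \cdots & W_N^{N-1} \\
1 & W_N^{2} & W_N^{4} & \cdots & W_N^{2(N-1)} \\
1 & W_N^{3} & W_N^{6} & \cdots & W_N^{3(N-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)(N-1)}
\end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ \vdots \\ x(N-1) \end{bmatrix}
$$

with $W_N = e^{-j 2\pi / N}$, so that the entry in row $k$ and column $n$ is $W_N^{kn}$.

§ In fact, we don’t compute this directly. There is a more efficient way, called the “Fast Fourier Transform (FFT)”

– Complexity reduction by FFT: $O(N^2) \rightarrow O(N \log_2 N)$

– Divide and conquer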
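The matrix view and the divide-and-conquer idea can be put side by side. The recursive function below is one textbook radix-2 decimation-in-time scheme, offered as a sketch under the assumption that the input length is a power of two; it is not necessarily the variant the course has in mind, and library FFTs are far more optimized.

```python
import numpy as np

def dft_matrix(N):
    """N x N matrix with entries W_N^(k n), where W_N = e^{-j 2 pi / N}."""
    W = np.exp(-2j * np.pi / N)
    k = np.arange(N).reshape(-1, 1)
    n = np.arange(N)
    return W ** (k * n)

def fft_radix2(x):
    """Recursive radix-2 FFT; the length must be a power of two."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])    # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])     # DFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

rng = np.random.default_rng(2)
x = rng.standard_normal(16)       # arbitrary test signal
X_matrix = dft_matrix(16) @ x     # O(N^2) matrix product
X_fast = fft_radix2(x)            # O(N log2 N) recursion
```

Each level of the recursion halves the problem and combines the halves with N/2 twiddle multiplications, which is where the N log2 N cost comes from.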

slide-22
SLIDE 22

Short-Time Fourier Transform (STFT)

§ DFT assumes that the signal is stationary

– It is not a good idea to apply DFT to a long and dynamically changing signal like music

– Instead, we segment the signal and apply DFT to each segment separately

§ Short-Time Fourier Transform

$$X(k, l) = \sum_{n=0}^{N-1} w(n)\,x(n + l \cdot h)\,e^{-j 2\pi k n / N}$$

– $w(n)$: window, $N$: FFT size, $h$: hop size, $l$: frame index

§ This produces 2-D time-frequency representations

– Get “spectrogram” from the magnitude

– Parameters: window size, window type, FFT size, hop size
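The STFT formula translates into a short loop over frames. This is a minimal sketch: the function name `stft`, the Hann window, the parameter values, and the 1 kHz test tone are all assumptions, and it expects the signal to be at least one window long and the FFT size to be no smaller than the window.

```python
import numpy as np

def stft(x, window, hop_size, fft_size):
    """Return X[k, l] for non-negative frequencies k and frame index l."""
    win_size = len(window)
    n_frames = 1 + (len(x) - win_size) // hop_size
    X = np.empty((fft_size // 2 + 1, n_frames), dtype=complex)
    for l in range(n_frames):
        start = l * hop_size
        frame = x[start : start + win_size] * window   # w(n) x(n + l h)
        X[:, l] = np.fft.rfft(frame, n=fft_size)       # zero-pads to fft_size
    return X

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)          # hypothetical 1 kHz test tone

X = stft(x, np.hanning(256), hop_size=128, fft_size=256)
spectrogram = np.abs(X)                   # magnitude gives the spectrogram
```

With a 256-point FFT at 8 kHz the bins are 31.25 Hz apart, so the 1 kHz tone peaks at bin 32 in every frame; the hop of 128 samples corresponds to 50% overlap.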

slide-23
SLIDE 23

Windowing

§ Types of window functions

– Trade-off between the width of main-lobe and the level of side-lobe

[Figure: spectrum of a window function, annotated with main-lobe width and side-lobe level]
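The trade-off can be measured for two common windows. The helper `window_lobes` below is a made-up name and a rough estimator: it zero-pads the window, takes the first local minimum of the magnitude as the edge of the main lobe, and reports the highest level beyond it. The rectangular and Hann windows and the length 64 are choices made for this sketch.

```python
import numpy as np

def window_lobes(window, pad_factor=64):
    """Return (main-lobe width in bins, peak side-lobe level in dB)."""
    N = len(window)
    spectrum = np.abs(np.fft.rfft(window, n=N * pad_factor))
    spectrum_db = 20 * np.log10(spectrum / spectrum[0] + 1e-12)
    # The main lobe ends where the magnitude first starts rising again.
    first_null = int(np.argmax(np.diff(spectrum_db) > 0))
    main_lobe_width = 2 * first_null / pad_factor
    side_lobe_level = spectrum_db[first_null:].max()
    return main_lobe_width, side_lobe_level

N = 64
rect_width, rect_side = window_lobes(np.ones(N))
hann_width, hann_side = window_lobes(np.hanning(N))
```

The rectangular window has the narrower main lobe (about 2 bins) but side lobes only about 13 dB down, while the Hann window roughly doubles the main-lobe width in exchange for much lower side lobes.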

slide-24
SLIDE 24

Short-Time Fourier Transform (STFT)

[Figure: successive analysis windows with 50% overlap]

Source: the JOS SASP book

slide-25
SLIDE 25

Example: Pop Music


slide-26
SLIDE 26

Example: Deep Note


slide-27
SLIDE 27

Time-Frequency Resolutions in STFT

§ Trade-off between time- and frequency-resolution by window size

– Long window: high frequency resolution, low time resolution

– Short window: low frequency resolution, high time resolution
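The trade-off can be made concrete with two tones that are 40 Hz apart. In this sketch the sampling rate, the tone frequencies, the Hann window, and the peak-counting rule (local maxima above half of the largest bin) are all assumptions chosen for illustration.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
# Two hypothetical tones, 40 Hz apart.
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1040 * t)

def resolutions(win_size):
    """Return (bin spacing in Hz, frame duration in seconds)."""
    return fs / win_size, win_size / fs

def count_peaks(win_size):
    """Count strong local maxima in the spectrum of the first frame."""
    frame = x[:win_size] * np.hanning(win_size)
    mag = np.abs(np.fft.rfft(frame))
    strong = mag > 0.5 * mag.max()
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:]) & strong[1:-1]
    return int(np.sum(is_peak))

peaks_short = count_peaks(64)     # 125 Hz per bin, 8 ms per frame
peaks_long = count_peaks(1024)    # about 7.8 Hz per bin, 128 ms per frame
```

The 64-sample window merges the two tones into one peak but localizes events to within 8 ms; the 1024-sample window separates them at the cost of averaging over 128 ms.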