GCT535- Sound Technology for Multimedia Fourier Representations of Audio



slide-1
SLIDE 1

GCT535- Sound Technology for Multimedia Fourier Representations of Audio

Graduate School of Culture Technology KAIST Juhan Nam


slide-2
SLIDE 2

Waveforms

§ The basic audio representation that computers can take

– x(n) = [a1, a2, a3, ...]

§ Great to observe energy change over time but less intuitive to observe pitch or timbre characteristics

§ Better representations than this?

slide-3
SLIDE 3

Sound Generation

§ Mass-Spring Model

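– Simple harmonic motion
– This generates a sinusoidal oscillation
– Practical models have dampers that make the oscillation decay over time

Restoration force equals inertial force (mass $m$, spring constant $k$, displacement $x$):

$$F = -kx = m \frac{d^2 x}{dt^2}$$

Solution:

$$x = A\sin(\omega t) = A\sin(2\pi f t)$$

– Angular frequency: $\omega = \sqrt{k/m}$
– Frequency: $f = \omega / 2\pi$
– Period: $T = 1/f$

[Figure: mass-spring system and the resulting sinusoidal displacement over time, with the period T = 1/f marked]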
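As a quick numeric check of the mass-spring relations, the sketch below computes the angular frequency, frequency, and period, and evaluates the sinusoid at a few instants. The function names and the values k = 100 N/m and m = 0.01 kg are illustrative assumptions, not taken from the slides.

```python
import math

def mass_spring(k, m):
    """Return (omega, f, T) of an undamped mass-spring system."""
    omega = math.sqrt(k / m)      # angular frequency in rad/s
    f = omega / (2 * math.pi)     # frequency in Hz
    T = 1 / f                     # period in seconds
    return omega, f, T

def displacement(A, omega, t):
    """Displacement x(t) = A sin(omega t)."""
    return A * math.sin(omega * t)

# Illustrative (assumed) values: k = 100 N/m, m = 0.01 kg
omega, f, T = mass_spring(k=100.0, m=0.01)
print("omega =", omega, "rad/s, f =", f, "Hz, T =", T, "s")
print("x(T/4) =", displacement(1.0, omega, T / 4))  # a quarter period in: the peak
```

A stiffer spring (larger k) or a lighter mass (smaller m) raises the frequency, which matches the intuition for tighter or thinner strings.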

slide-4
SLIDE 4

Sound Generation

§ Any oscillatory object can be modeled as a complex network of masses and springs

– This generates a complex tone

§ Generation steps (e.g. guitar)

– Excitation: wideband energy
– Propagation on the string
– Reflection on the ends
– Superposition with reflected waves
– Standing waves: constructive superposition
– Radiation from the object
– Propagation through air

[Figure: string oscillation]

Demos: http://www.acs.psu.edu/drussell/demos.html

slide-5
SLIDE 5

Frequency-Domain Representation

§ Can we represent $x(n)$ with a finite set of sinusoids?

– $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} A(k)\, r_k(n)$
  • $r_k(n) = \cos\left(\frac{2\pi k n}{N} + \phi(k)\right)$: discrete-time sinusoid with length N
– Find $A(k)$, $\phi(k)$

slide-6
SLIDE 6

Euler’s identity

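§ Euler’s identity

$$e^{j\theta} = \cos\theta + j\sin\theta$$

– Can be proved by Taylor’s series
– If $\theta = \pi$, $e^{j\pi} + 1 = 0$ (“the most beautiful equation in math”)

§ Properties

$$\cos\theta = \frac{e^{j\theta} + e^{-j\theta}}{2}, \qquad \sin\theta = \frac{e^{j\theta} - e^{-j\theta}}{2j}$$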
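Euler's identity and the two properties can be checked numerically. The following is a minimal sketch using Python's standard `cmath` module; the test angle 0.7 rad is an arbitrary assumed value.

```python
import cmath
import math

theta = 0.7  # arbitrary test angle in radians (assumed value)

# Euler's identity: e^{j theta} = cos(theta) + j sin(theta)
lhs = cmath.exp(1j * theta)
rhs = complex(math.cos(theta), math.sin(theta))

# Cosine and sine recovered from a pair of complex exponentials
cos_from_exp = (cmath.exp(1j * theta) + cmath.exp(-1j * theta)) / 2
sin_from_exp = (cmath.exp(1j * theta) - cmath.exp(-1j * theta)) / (2j)

# Special case theta = pi: e^{j pi} + 1 is zero up to rounding error
special = cmath.exp(1j * math.pi) + 1

print(abs(lhs - rhs), abs(special))
```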

slide-7
SLIDE 7

Complex Sinusoids

§ Cosine and sine can be represented in a single term

$$s_k(n) = e^{j 2\pi k n / N} = \cos\left(\frac{2\pi k n}{N}\right) + j\sin\left(\frac{2\pi k n}{N}\right)$$

– Frequencies: $\frac{2\pi k}{N}$ radian or $\frac{k}{N} F_s$ Hz ($F_s$: the sampling rate) ($k = 0, 1, 2, …, N-1$)
– Example: N = 8

Figures are from https://ccrma.stanford.edu/~jos/dft/
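To make the N = 8 example concrete, the sketch below generates the complex sinusoids sample by sample. The helper names `complex_sinusoid` and `bin_frequency_hz` are hypothetical, and the 8000 Hz sampling rate in the usage line is an assumed value.

```python
import cmath
import math

def complex_sinusoid(k, N):
    """Return s_k(n) = exp(j 2 pi k n / N) for n = 0..N-1."""
    return [cmath.exp(2j * math.pi * k * n / N) for n in range(N)]

def bin_frequency_hz(k, N, fs):
    """Frequency of s_k in Hz for sampling rate fs."""
    return k * fs / N

N = 8
s1 = complex_sinusoid(1, N)  # one cycle over the 8 samples
s2 = complex_sinusoid(2, N)  # two cycles over the 8 samples

for n, value in enumerate(s1):
    print(n, round(value.real, 3), round(value.imag, 3))
print(bin_frequency_hz(2, N, 8000.0), "Hz")
```

Every sample lies on the unit circle; increasing k only makes the point travel around the circle faster.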

slide-8
SLIDE 8

Complex Sinusoids

[Figure: the complex sinusoids for N = 8]

Figures are from https://ccrma.stanford.edu/~jos/dft/

slide-9
SLIDE 9

Frequency-Domain Representation Using Complex Sinusoids

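§ $x(n)$ is expressed in a simpler form:

$$
\begin{aligned}
x(n) &= \frac{1}{N} \sum_{k=0}^{N-1} A(k) \cos\left(\frac{2\pi k n}{N} + \phi(k)\right) \\
&= \frac{1}{N} \sum_{k=0}^{N-1} A(k) \left(e^{j(2\pi k n/N + \phi(k))} + e^{-j(2\pi k n/N + \phi(k))}\right)/2 \\
&= \frac{1}{N} \sum_{k=0}^{N-1} \left(A(k) e^{j\phi(k)} e^{j 2\pi k n/N} + A(k) e^{-j\phi(k)} e^{-j 2\pi k n/N}\right)/2 \\
&= \frac{1}{N} \sum_{k=0}^{N-1} \left(X(k) e^{j 2\pi k n/N} + X^*(k) e^{-j 2\pi k n/N}\right)/2 \\
&= \mathrm{Real}\left\{\frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n/N}\right\} = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi k n/N}
\end{aligned}
$$

where $X(k) = A(k) e^{j\phi(k)} = A(k)\left(\cos\phi(k) + j\sin\phi(k)\right)$ and $X^*(k)$ is its complex conjugate.

– Now, how can we find $X(k)$?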

slide-10
SLIDE 10

Orthogonality of Sinusoids

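§ Inner product between two complex sinusoids

$$s_p(n) \cdot s_q^*(n) = \sum_{n=0}^{N-1} e^{j 2\pi p n/N} \cdot e^{-j 2\pi q n/N} = \begin{cases} N & \text{if } p = q \\ 0 & \text{otherwise} \end{cases}$$

The real sinusoids behave similarly (for $0 < p, q < N$, excluding $p = q = N/2$):

$$\sum_{n=0}^{N-1} \cos(2\pi p n/N)\cos(2\pi q n/N) = \begin{cases} N/2 & \text{if } p = q \text{ or } p = N - q \\ 0 & \text{otherwise} \end{cases}$$

$$\sum_{n=0}^{N-1} \sin(2\pi p n/N)\sin(2\pi q n/N) = \begin{cases} N/2 & \text{if } p = q \\ -N/2 & \text{if } p = N - q \\ 0 & \text{otherwise} \end{cases}$$

$$\sum_{n=0}^{N-1} \cos(2\pi p n/N)\sin(2\pi q n/N) = 0$$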
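The orthogonality of the complex sinusoids can be verified by brute force. The sketch below builds the full table of inner products for N = 8 (an assumed size); `inner_product` is a hypothetical helper name.

```python
import cmath
import math

def inner_product(p, q, N):
    """Sum over n of s_p(n) * conj(s_q(n))."""
    total = 0j
    for n in range(N):
        s_p = cmath.exp(2j * math.pi * p * n / N)
        s_q = cmath.exp(2j * math.pi * q * n / N)
        total += s_p * s_q.conjugate()
    return total

N = 8
gram = [[inner_product(p, q, N) for q in range(N)] for p in range(N)]

# Expect N on the diagonal and (numerically) zero everywhere else
for row in gram:
    print([round(abs(v), 6) for v in row])
```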

slide-11
SLIDE 11

Orthogonal Projection on Complex Sinusoids

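§ Take the inner product of the signal with each complex sinusoid

$$
\begin{aligned}
x(n) \cdot s_q^*(n) &= \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi q n/N} \\
&= \sum_{n=0}^{N-1} \left(\frac{1}{N} \sum_{p=0}^{N-1} X(p)\, e^{j 2\pi p n/N}\right) e^{-j 2\pi q n/N} \\
&= \frac{1}{N} \sum_{p=0}^{N-1} X(p) \left(\sum_{n=0}^{N-1} e^{j 2\pi p n/N}\, e^{-j 2\pi q n/N}\right) \\
&= \frac{1}{N} X(q)\, N = X(q) = A(q)\, e^{j\phi(q)}
\end{aligned}
$$

By orthogonality, the inner sum is $N$ when $p = q$ and 0 otherwise, so only the $p = q$ term survives.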

slide-12
SLIDE 12

To Wrap Up

§ Discrete Fourier Transform

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n/N} = X_R(k) + jX_I(k) = A(k)\, e^{j\phi(k)}$$

– Magnitude spectrum: $|X(k)| = A(k) = \sqrt{X_R^2(k) + X_I^2(k)}$
– Phase spectrum: $\angle X(k) = \phi(k) = \tan^{-1}\left(\frac{X_I(k)}{X_R(k)}\right)$

§ Inverse Discrete Fourier Transform

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi k n/N}$$
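The DFT and inverse DFT definitions translate almost literally into code. The following is a direct, unoptimized sketch (O(N^2)); the function names and the one-cycle cosine test signal are assumptions chosen for illustration.

```python
import cmath
import math

def dft(x):
    """X(k) = sum_n x(n) e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x(n) = (1/N) sum_k X(k) e^{j 2 pi k n / N}."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def magnitude(X):
    """|X(k)| = sqrt(X_R^2 + X_I^2)."""
    return [abs(v) for v in X]

def phase(X):
    """Angle of X(k); atan2 resolves the quadrant that tan^{-1}(X_I / X_R) cannot."""
    return [math.atan2(v.imag, v.real) for v in X]

N = 8
x = [math.cos(2 * math.pi * 1 * n / N) for n in range(N)]  # one cycle of a cosine
X = dft(x)
x_rec = idft(X)

print([round(m, 6) for m in magnitude(X)])  # energy only at bins 1 and N-1
```

A real cosine shows up as two peaks of height N/2, one at bin k and one at bin N - k, because it is the sum of two complex sinusoids.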

slide-13
SLIDE 13

Why do we choose this set of frequencies for the sinusoids?

§ Underlying assumption in DFT

– The N samples are one period of a periodic signal
– In the view of “Fourier Series”, a periodic signal with period N can be represented as sinusoids with period N, N/2, N/3, … (1/N, 2/N, 3/N, ... in frequency)

[Figure: two waveform plots, amplitude vs. time]

slide-14
SLIDE 14

Properties of DFT

§ Periodicity

– $X(k) = X(k + N) = X(k + 2N) = …$
– $X(k) = X(k - N) = X(k - 2N) = …$

§ Symmetry (for real-valued $x(n)$)

– Magnitude response: $|X(k)| = |X(-k)| = |X(N - k)|$
– Phase response: $\angle X(k) = -\angle X(-k) = -\angle X(N - k)$
– We often display only half the amplitude and phase responses
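Periodicity and symmetry can be checked by evaluating the DFT sum at bins outside 0..N-1. The sketch below assumes a real-valued signal (the symmetry relies on that); the sample values and the helper name `dft_bin` are arbitrary illustrative choices.

```python
import cmath
import math

def dft_bin(x, k):
    """Evaluate the DFT sum at any integer k (also negative or >= N)."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

# A real-valued test signal (arbitrary values chosen for illustration)
x = [0.5, 1.0, -0.3, 0.8, -1.2, 0.1, 0.4, -0.7]
N = len(x)
k = 3

X_k = dft_bin(x, k)
X_k_plus_N = dft_bin(x, k + N)   # periodicity: same value as X(k)
X_minus_k = dft_bin(x, -k)       # negative-frequency bin
X_N_minus_k = dft_bin(x, N - k)  # mirrored bin, same value as X(-k)

# For real x(n), X(N - k) is the complex conjugate of X(k):
# equal magnitude, opposite phase.
print(abs(X_k), abs(X_N_minus_k))
print(cmath.phase(X_k), cmath.phase(X_N_minus_k))
```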

slide-15
SLIDE 15

Properties of DFT

[Figure: waveform, magnitude (N=32), and phase (N=32), plotted over k = 0, …, N−1 and again over k = −N/2, …, N/2]

– Over $0 … N-1$: $|X(k)| = |X(N - k)|$, $\angle X(k) = -\angle X(N - k)$
– Over $-N/2 … N/2$: $|X(k)| = |X(-k)|$, $\angle X(k) = -\angle X(-k)$

slide-16
SLIDE 16

Frequency Scales

§ $X(k)$, $k = 0, 1, …, N-1$, corresponds to frequency values that are evenly spaced between 0 and $f_s$ in Hz (bin $k$ is at $k f_s / N$)

[Figure: frequency axis labeled in bins (−N, −N/2, N/2, N) and in Hz (−fs, −fs/2, fs/2, fs)]
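The mapping from bin index to frequency is a one-line formula. The sketch below shows both the 0 to fs view and the equivalent view centered on zero; the function names, the 44100 Hz sampling rate, and the DFT size 1024 are assumed values.

```python
def bin_to_hz(k, N, fs):
    """Center frequency of DFT bin k, for k = 0..N-1."""
    return k * fs / N

def bin_to_signed_hz(k, N, fs):
    """Same bin mapped into [-fs/2, fs/2) using the periodicity of X(k)."""
    if k >= N / 2:
        k -= N
    return k * fs / N

fs = 44100.0  # assumed sampling rate in Hz
N = 1024      # assumed DFT size

print(bin_to_hz(1, N, fs))             # spacing between adjacent bins
print(bin_to_hz(N // 2, N, fs))        # fs / 2
print(bin_to_signed_hz(N - 1, N, fs))  # the last bin is the first negative frequency
```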

slide-17
SLIDE 17

Cracks in Sinusoids

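§ If the frequency component in $x(n)$ is not exactly on one of the sinusoids

– For example, if $x(n)$ is a sinusoid with an arbitrary frequency $\omega$: $x(n) = e^{j\omega n}$

$$
\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n/N} = \sum_{n=0}^{N-1} e^{j\omega n}\, e^{-j 2\pi k n/N} = \sum_{n=0}^{N-1} e^{j(\omega - 2\pi k/N) n} \\
&= \frac{1 - e^{j(\omega - 2\pi k/N) N}}{1 - e^{j(\omega - 2\pi k/N)}} = e^{j(\omega - 2\pi k/N)(N-1)/2}\, \frac{\sin\left((\omega - 2\pi k/N) N/2\right)}{\sin\left((\omega - 2\pi k/N)/2\right)}
\end{aligned}
$$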
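The closed form can be compared against the direct sum. The sketch below does this for a sinusoid exactly on a bin and for one halfway between two bins; N = 32 and the bin positions 4 and 4.5 are assumed values, and the function names are hypothetical.

```python
import cmath
import math

def dft_bin_direct(omega, k, N):
    """Direct sum of e^{j omega n} e^{-j 2 pi k n / N} over n = 0..N-1."""
    return sum(cmath.exp(1j * omega * n) * cmath.exp(-2j * math.pi * k * n / N)
               for n in range(N))

def dft_bin_closed_form(omega, k, N):
    """Closed form: phase term times sin(alpha N / 2) / sin(alpha / 2)."""
    alpha = omega - 2 * math.pi * k / N
    if abs(math.sin(alpha / 2)) < 1e-12:
        return complex(N)  # limit value when omega sits exactly on bin k
    phase_term = cmath.exp(1j * alpha * (N - 1) / 2)
    return phase_term * math.sin(alpha * N / 2) / math.sin(alpha / 2)

N = 32
omega_on = 2 * math.pi * 4 / N     # exactly on bin 4
omega_off = 2 * math.pi * 4.5 / N  # halfway between bins 4 and 5

mags_on = [abs(dft_bin_direct(omega_on, k, N)) for k in range(N)]
mags_off = [abs(dft_bin_direct(omega_off, k, N)) for k in range(N)]

print([round(m, 2) for m in mags_on[:10]])   # a single peak at bin 4
print([round(m, 2) for m in mags_off[:10]])  # energy leaks into every bin
```

On a bin, the sine-ratio envelope is sampled exactly at its zeros everywhere except the peak; off a bin, the samples land between the zeros, so every bin is nonzero.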

slide-18
SLIDE 18

Cracks in Sinusoids

[Figure: a sinusoid exactly on a DFT bin (“on the sinusoids”) and one between bins (“off the sinusoids”), each shown as a waveform (amplitude) and a magnitude spectrum sampled from the envelope $\frac{\sin((\omega - 2\pi k/N) N/2)}{\sin((\omega - 2\pi k/N)/2)}$ centered at $\omega$]

slide-19
SLIDE 19

Zero-padding

§ Adding zeros to a windowed frame in time domain

– Corresponds to “ideal interpolation” in frequency domain
– In practice, FFT size increases by the size of zero-padding

[Figure: amplitude and magnitude spectrum before zero-padding and after zero-padding (x4)]
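The interpolation view of zero-padding can be checked directly: after padding by a factor of 4, every fourth bin of the longer DFT equals a bin of the original DFT, and the bins in between are new samples of the same underlying spectrum. This sketch uses a plain O(N^2) DFT and an unwindowed (rectangular) frame for brevity; the frame length 16 and the helper names are assumptions.

```python
import cmath
import math

def dft(x):
    """Direct DFT: X(k) = sum_n x(n) e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def zero_pad(frame, factor):
    """Append zeros so the frame becomes `factor` times longer."""
    return list(frame) + [0.0] * (len(frame) * (factor - 1))

N = 16
# A sinusoid that falls between bins 2 and 3 of the unpadded DFT
frame = [math.sin(2 * math.pi * 2.5 * n / N) for n in range(N)]

X = dft(frame)
X_padded = dft(zero_pad(frame, 4))

print(len(X), len(X_padded))
print(round(abs(X[2]), 4), round(abs(X_padded[8]), 4))  # identical samples
```

Zero-padding adds no new information about the signal; it only samples the same spectrum on a finer frequency grid.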

slide-20
SLIDE 20

Examples of DFT

[Figure: waveform (amplitude vs. time in milliseconds) and spectrum (magnitude vs. frequency) for three sounds: sine, drum, and flute]

slide-21
SLIDE 21

Fast Fourier Transform (FFT)

§ Matrix multiplication view of DFT

§ In fact, we don’t compute this directly. There is a more efficient way, which is called the “Fast Fourier Transform (FFT)”

– Complexity reduction by FFT: $O(N^2) \rightarrow O(N \log_2 N)$
– Divide and conquer
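The divide-and-conquer idea can be sketched with a recursive radix-2 decimation-in-time FFT. This is one common scheme, assumed here for illustration since the slides do not name a specific algorithm; it requires the length to be a power of two. The result is compared against the direct O(N^2) sum.

```python
import cmath
import math

def dft_naive(x):
    """Direct O(N^2) DFT used as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle           # first half of the spectrum
        out[k + N // 2] = even[k] - twiddle  # second half reuses the same terms
    return out

# Arbitrary test signal of length 16
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
X_fast = fft_radix2(x)
X_slow = dft_naive(x)
print(max(abs(a - b) for a, b in zip(X_fast, X_slow)))  # rounding error only
```

Each level of recursion does O(N) work and there are log2(N) levels, which is where the O(N log2 N) cost comes from.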

slide-22
SLIDE 22

Short-Time Fourier Transform (STFT)

§ DFT assumes that the signal is stationary

– It is not a good idea to apply DFT to a long and dynamically changing signal like music
– Instead, we segment the signal and apply DFT separately to each segment

§ Short-Time Fourier Transform

$$X(k, l) = \sum_{n=0}^{N-1} w(n)\, x(n + l \cdot h)\, e^{-j 2\pi k n/N}$$

– $w(n)$: window
– $N$: FFT size
– $h$: hop size

§ This produces 2-D time-frequency representations

– Get “spectrogram” from the magnitude
– Parameters: window size, window type, FFT size, hop size
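The STFT formula can be sketched as a loop over frames. In this minimal version the FFT size equals the window length, a direct DFT stands in for the FFT, and only frames that fit entirely inside the signal are computed. The Hann window, the 800 Hz sampling rate, and the two-tone test signal are all assumed choices for illustration.

```python
import cmath
import math

def hann(N):
    """Periodic Hann window of length N."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def stft(x, w, h):
    """X(k, l) = sum_n w(n) x(n + l*h) e^{-j 2 pi k n / N}, with N = len(w)."""
    N = len(w)
    num_frames = 1 + (len(x) - N) // h  # frames that fit entirely in x
    frames = []
    for l in range(num_frames):
        seg = [w[n] * x[n + l * h] for n in range(N)]  # windowed segment
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N)
                           for n in range(N)) for k in range(N)])
    return frames  # indexed as frames[l][k]

def spectrogram(frames):
    """Magnitude of the STFT."""
    return [[abs(v) for v in frame] for frame in frames]

fs = 800       # assumed sampling rate in Hz
N, h = 32, 16  # window size and hop size (50% overlap)
# 100 Hz tone followed by a 200 Hz tone
x = ([math.sin(2 * math.pi * 100 * n / fs) for n in range(128)] +
     [math.sin(2 * math.pi * 200 * n / fs) for n in range(128)])

S = spectrogram(stft(x, hann(N), h))
first_peak = max(range(N // 2 + 1), key=lambda k: S[0][k])
last_peak = max(range(N // 2 + 1), key=lambda k: S[-1][k])
print(first_peak * fs / N, last_peak * fs / N)  # peak frequency in Hz per frame
```

A single DFT over the whole signal would show both tones but not when each occurs; the frame index l recovers that ordering.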

slide-23
SLIDE 23

Windowing

§ Types of window functions

– Trade-off between the width of main-lobe and the level of side-lobe

[Figure: window spectrum annotated with main-lobe width and side-lobe level]
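The trade-off can be illustrated by comparing two windows. The sketch below estimates the peak side-lobe level of a rectangular window and a Hann window from a zero-padded DFT; the choice of these two windows, the length 64, the padding factor 8, and all function names are assumptions. The main-lobe extent is passed in explicitly: the first spectral null is at bin 1 for the rectangular window and at bin 2 for the periodic Hann window.

```python
import cmath
import math

def rectangular(N):
    return [1.0] * N

def hann(N):
    """Periodic Hann window of length N."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def spectrum_db(w, pad):
    """Magnitude spectrum in dB (0 dB at the peak), zero-padded by `pad`."""
    M = len(w) * pad
    mags = [abs(sum(w[n] * cmath.exp(-2j * math.pi * k * n / M)
                    for n in range(len(w)))) for k in range(M // 2)]
    peak = max(mags)
    return [20 * math.log10(max(m / peak, 1e-12)) for m in mags]

def peak_side_lobe_db(w, pad, main_lobe_bins):
    """Highest level beyond the first null, which lies at `main_lobe_bins` bins."""
    db = spectrum_db(w, pad)
    return max(db[main_lobe_bins * pad:])

N, pad = 64, 8
rect_sl = peak_side_lobe_db(rectangular(N), pad, 1)  # narrow main lobe
hann_sl = peak_side_lobe_db(hann(N), pad, 2)         # main lobe twice as wide
print(round(rect_sl, 1), round(hann_sl, 1))
```

The Hann window pays for its lower side lobes with a main lobe twice as wide, so nearby frequency components are harder to separate but distant leakage is much weaker.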

slide-24
SLIDE 24

Short-Time Fourier Transform (STFT)

[Figure: successive analysis windows with 50% overlap]

Source: the JOS SASP book

slide-25
SLIDE 25

Example: Waveform

[Figure: waveforms of a piano C4 note and a flute A4 note]

slide-26
SLIDE 26

Example: Spectrogram - 2D color map

[Figure: 2D color-map spectrograms of a piano C4 note and a flute A4 note]

slide-27
SLIDE 27

Example: Spectrogram - 3D waterfall

[Figure: 3D waterfall spectrograms of a piano C4 note and a flute A4 note]

slide-28
SLIDE 28

Example: Pop Music


slide-29
SLIDE 29

Example: Deep Note


slide-30
SLIDE 30

Time-Frequency Resolutions in STFT

§ Trade-off between time- and frequency-resolution by window size

– Long window: high frequency resolution, low time resolution
– Short window: low frequency resolution, high time resolution
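The trade-off can be put into rough numbers. The sketch below uses the bin spacing fs/N as a simple proxy for frequency resolution and the window duration N/fs as a proxy for time resolution; the actual resolution also depends on the window type. The sampling rate and the two window sizes are assumed values.

```python
def frequency_resolution_hz(window_size, fs):
    """Spacing between DFT bins for a window of `window_size` samples."""
    return fs / window_size

def window_duration_s(window_size, fs):
    """Time span covered by one analysis window."""
    return window_size / fs

fs = 44100.0                     # assumed sampling rate in Hz
long_win, short_win = 4096, 256  # assumed window sizes in samples

for size in (long_win, short_win):
    print(size,
          round(frequency_resolution_hz(size, fs), 2), "Hz per bin,",
          round(window_duration_s(size, fs) * 1000, 2), "ms per window")
```

The product of the two quantities is always 1, so making one finer necessarily makes the other coarser.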

slide-31
SLIDE 31

If you want to know more about DFT, …

§ Mathematics of The Discrete Fourier Transform (DFT) with Audio Applications, 2nd Edition, Julius O. Smith III

– https://ccrma.stanford.edu/~jos/dft/

§ The Scientist and Engineer's Guide to Digital Signal Processing, Steven W. Smith

– http://www.dspguide.com/pdfbook.htm


slide-32
SLIDE 32

Demo: Fourier Series

§ http://codepen.io/anon/pen/jPGJMK
