Audio DSP basics
Paris Smaragdis, paris@illinois.edu


SLIDE 1

U N I V E R S I T Y O F I L L I N O I S @ U R B A N A - C H A M P A I G N

Paris Smaragdis

paris@illinois.edu paris.cs.illinois.edu

CS 498PS – Audio Computing Lab

Audio DSP basics

SLIDE 2

Overview

  • Basics of digital audio
  • Signal representations
    • Time, Frequency, Time/Frequency
  • Sampling, Quantization
  • The Fourier transform
    • DFT and FFT
  • The Spectrogram

SLIDE 3

Why digital audio?

  • Cheaper
    • Get a smartphone, do anything you want
    • No burning circuits!
  • Easier
    • You can easily rewrite code
    • But cannot easily rewire circuits
  • Smaller
    • Do everything on one chip

SLIDE 4

Sound as “numbers”

  • We treat sound as a series of amplitudes
    • More on the details later
  • This is the waveform representation
    • Encodes instantaneous pressure over time

SLIDE 5

PCM format

  • “Pulse Code Modulation”
  • Used by CDs, telephones, audio editors, synths, etc.

[Figure: one cycle of a sine wave sampled at 10 points, encoded as 8-bit PCM values: 0, 82, 126, 111, 44, -44, -111, -126, -82, 0]
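The sample values above can be reproduced with a quick NumPy sketch, assuming a round-to-nearest quantizer scaled by 128 (an assumption about how the slide's numbers were generated):

```python
import numpy as np

# One cycle of a sine, sampled at 10 points and quantized to signed
# 8-bit PCM. The 128 scale factor is an assumption that happens to
# reproduce the slide's numbers.
n = np.arange(10)
x = np.sin(2 * np.pi * n / 9)                  # amplitudes in [-1, 1]
pcm = np.round(128 * x).clip(-128, 127).astype(np.int8)
print(pcm.tolist())  # [0, 82, 126, 111, 44, -44, -111, -126, -82, 0]
```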

SLIDE 6

This is a discrete and digital format

  • We do not use continuous values
    • We have finite samples over time
    • We (usually) encode these samples as signed integers
  • Common formats
    • Speech: 16 kHz / 16-bit (or 8-bit)
    • Music: 44.1 kHz / 16-bit (or 96 kHz / 24-bit)
  • But how do we pick these numbers? What do they mean?

SLIDE 7

Dynamic range

  • The choice of bits defines the dynamic range
    • More bits == more dynamic range == more storage
  • What is dynamic range?
    • The ratio of the highest to the lowest representable pressure value
    • Usually measured in decibels (dB)
  • How much dynamic range do we need though?
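As a rule of thumb, each bit buys about 6 dB; a sketch of the bits-to-dynamic-range arithmetic used on the following slides:

```python
import math

# Dynamic range of b-bit linear PCM: the ratio of the largest to the
# smallest representable (nonzero) amplitude, in dB: 20*log10(2**b).
def dynamic_range_db(bits):
    return 20 * math.log10(2 ** bits)

for bits in (8, 12, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")  # 48.2, 72.2, 96.3, 144.5
```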

SLIDE 8

It all hinges on how we hear

  • Outer ear
    • Sound gets collected at the pinna
    • The ear canal amplifies (some) sound by ~10 dB
    • The ear drum vibrates according to incoming pressure
  • Middle ear
    • The ossicles transfer sound to the oval window
    • Amplify sound by ~14 dB
    • Also use muscles for damping
  • Inner ear
    • Translation to neural signal (more later)

SLIDE 9

Perception of sound

  • The just noticeable sound is:
    • 10⁻¹² W/m² (cannot hear softer than this)
  • And as loud as it gets is:
    • 1 W/m² (and then you go deaf!)
  • Thus our dynamic range is:
    • 10 log₁₀(1 / 10⁻¹²) = 120 dB
  • That’s a staggering trillion to one!
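The 120 dB figure is just the decibel form of that intensity ratio:

```python
import math

# Ratio between the loudest tolerable intensity (1 W/m^2) and the
# threshold of hearing (1e-12 W/m^2), expressed in decibels.
ratio = 1.0 / 1e-12
db = 10 * math.log10(ratio)
print(db)  # 120.0
```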

SLIDE 10

To get you oriented

  • Weakest detectable sound: ~0 dB
  • Soft breathing: ~10 dB
  • Quiet library: ~40 dB
  • Office environment: ~60 dB
  • Food blender: ~80 dB
  • Lawn mower: ~90 dB
  • Car horn at 1 m: ~110 dB
  • Military jet at 50 ft: ~130 dB
  • Shotgun blast: ~165 dB
  • Loudest possible sound: 194 dB
    • (after which it isn’t “sound” anymore, it is a “shock wave”)

Dangerous levels: > 90 dB. Pain begins at 125 dB. Pain ends at 180 dB (cause your ears just blew up).

SLIDE 11

Back to digital sound

  • How many dB of dynamic range to use?
    • Close to 120 dB ideally
  • Common ranges (headroom)
    • 16-bit / 96 dB (the industry standard)
    • 12-bit / 72 dB (the cheap standard)
    • 8-bit / 48 dB (the 80’s standard! hipsters?)
    • 24-bit / 144 dB (the “I’m charging you extra” standard)
    • Floating point (what we will use)

SLIDE 12

Why worry?

  • Need headroom to avoid clipping & quantization noise
    • These happen when the representation is maxed out or near zero
  • Very challenging with dynamic content (e.g. classical music)
    • An audio engineer’s nightmare! (and digital is worse)

[Figure: a waveform whose quiet regions disappear into hiss (quantization noise) and whose loud regions are clipped, labeled "Hiss", "Clipping", "Gone!"]
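A sketch of both failure modes, assuming float samples in [−1, 1] and a round-to-nearest b-bit quantizer:

```python
import numpy as np

# Too quiet: the signal falls below the smallest quantization step and
# rounds to silence. Too loud: it exceeds full scale and clips.
def quantize(x, bits):
    scale = 2 ** (bits - 1) - 1              # 127 for 8-bit
    return np.round(x * scale) / scale

t = np.linspace(0, 1, 1000, endpoint=False)
quiet = 0.001 * np.sin(2 * np.pi * 5 * t)    # below one 8-bit step
print(np.abs(quantize(quiet, 8)).max())      # 0.0: quantized to silence

loud = 2.0 * np.sin(2 * np.pi * 5 * t)       # exceeds full scale
clipped = np.clip(loud, -1.0, 1.0)           # flattened peaks: clipping
```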

SLIDE 13

Quantization noise examples


SLIDE 14

Clipping examples


SLIDE 15

Sampling in time

  • Also known as A/D conversion
  • How do we convert real-world sound to a discrete sequence?
  • The one parameter we care for: the sample rate
    • i.e. how often we sample the input sound
  • Tradeoffs
    • Sample fast and you waste memory and energy
    • Sample slow and you risk aliasing

SLIDE 16

What is aliasing?

  • Low sample rates can result in misinterpretations
  • Sample too low and you will miss some of the action
  • Rule of thumb: Sample at least at twice the highest frequency

[Figure: the same sinusoid sampled at three progressively lower rates; at the lowest rate it is mistaken for a lower-frequency sinusoid]
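The misinterpretation can be checked numerically; in this sketch a 700 Hz sine sampled at 1000 Hz produces exactly the samples of a 300 Hz sine (with its phase flipped):

```python
import numpy as np

# A 700 Hz tone sampled at fs = 1000 Hz violates the "twice the highest
# frequency" rule (Nyquist = 500 Hz) and aliases to 1000 - 700 = 300 Hz.
fs = 1000
n = np.arange(100)
high = np.sin(2 * np.pi * 700 * n / fs)
alias = np.sin(2 * np.pi * 300 * n / fs)
print(np.allclose(high, -alias))  # True: same samples, inverted phase
```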

SLIDE 17

How high should we go?

  • Highest perceived frequency by humans is 20 kHz
  • Which goes down as you age (or as you abuse your ears)
  • We need to represent up to 20 kHz ⟶ sample at > 40 kHz

[Audio demo: test tones at 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 and 21 kHz, with a spectrogram of the sweep. How high can you hear? (or how good are the class speakers?)]

SLIDE 18

What does aliasing sound like?

  • Frequencies higher than Nyquist fold over
  • Upwards movements go downwards and vice-versa
  • Most noticeable with high-frequency content
  • How does that sound?

[Figure: a chirp at 44,100 Hz, then the same chirp at 22,050 Hz and 11,025 Hz; content above each new Nyquist (11 kHz, 5.5 kHz) folds back down. Audio examples at 44.1, 22, 11, 5, 4 and 3 kHz.]

SLIDE 19

What are the usual settings?

  • “High-quality” music: 44.1 kHz
    • Why the extra 4.1 kHz?
  • “Super” high quality music: 96 kHz
    • Dogs might like it more
  • Speech coding
    • High(ish) quality & in research: 16 kHz
    • Telephony: 8 kHz

SLIDE 20

But why do we use the waveform?

  • Do you see a problem with it?

SLIDE 21

What are these signals?

SLIDE 22

Waveforms are unintuitive at long scales

  • Pressure information isn’t that perceptually relevant
  • We cannot interpret it as a percept
  • Too much data to parse visually
  • Is there a better way to represent sound?
  • How do we start looking for such a way?
  • What is it that is important when listening?

SLIDE 23

Back to hearing …

  • What happens in the inner ear?
    • After the oval window there’s the cochlea
    • Resonates at different lengths with input
    • Effectively parses sound by frequency
    • Transmits that vibration to neural code
  • What we care about is frequency content!

SLIDE 24

What is a frequency component?

  • You can approximate any waveform by adding sinusoids
    • They are the elementary building blocks of sounds
  • Sinusoids have three parameters: amplitude, frequency and phase
    • s(t) = a sin(2πf t + φ)
  • Each sinusoid is a “frequency”
    • Because that is the main distinguishing parameter

[Figure: approximating a square wave with a sum of sinusoids]
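The square-wave approximation can be sketched with its Fourier series, a sum of odd harmonics with 1/k amplitudes:

```python
import numpy as np

# Approximate a 1 Hz square wave by summing odd-harmonic sinusoids:
# square(t) ~ sum over odd k of (4 / (pi * k)) * sin(2*pi*k*t).
t = np.linspace(0, 1, 1000, endpoint=False)
square = np.sign(np.sin(2 * np.pi * t))
approx = np.zeros_like(t)
for k in range(1, 40, 2):                 # odd harmonics 1, 3, ..., 39
    approx += (4 / (np.pi * k)) * np.sin(2 * np.pi * k * t)

err = np.mean((approx - square) ** 2)     # shrinks as harmonics are added
print(err)
```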

SLIDE 25

Decomposing sounds to sines

  • For each sound, get the sine parameters that reconstruct it
    • We’ll be lazy and not bother with arbitrary frequencies
    • Just get the amplitudes and phases for all integer frequencies
  • For this we use the Fourier transform
    • Transforms time samples to the frequency domain, and back

X[f] = FT( x[t] )     (waveform, time domain → “spectrum”, frequency domain)
x[t] = FT⁻¹( X[f] )   (frequency domain → time domain)

SLIDE 26

And there are many flavors of it

  • Fourier transform (continuous time ⟷ continuous frequency)
    • x(t) = (1/2π) ∫_{ω=−∞}^{∞} X(ω) e^{jωt} dω ⟷ X(ω) = ∫_{t=−∞}^{∞} x(t) e^{−jωt} dt
  • Discrete-Time Fourier Transform (DTFT) (discrete time ⟷ continuous frequency)
    • x[n] = (1/2π) ∫_{ω=−π}^{π} X_d(ω) e^{jωn} dω ⟷ X_d(ω) = Σ_{n=−∞}^{∞} x[n] e^{−jωn}
  • Discrete Fourier Transform (DFT) (discrete time ⟷ discrete frequency): the one that we will use the most
    • x[n] = (1/N) Σ_{k=0}^{N−1} X[k] e^{j2πkn/N} ⟷ X[k] = Σ_{n=0}^{N−1} x[n] e^{−j2πkn/N}
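The DFT pair above is short enough to write directly and check against an FFT routine; a sketch in NumPy:

```python
import numpy as np

# X[k] = sum_n x[n] e^{-j 2 pi k n / N}, written as a matrix product,
# compared against NumPy's FFT (which computes the same transform).
def dft(x):
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N) @ x

x = np.random.randn(64)
print(np.allclose(dft(x), np.fft.fft(x)))  # True
```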

SLIDE 27

What really happens here?!?

  • Each Fourier basis contains a sine and a cosine
  • The summation estimates how much of each sinusoid we have in the input time series (inner product)
  • Thus we get the contribution from each frequency

X[k] = Σ_{n=0}^{N−1} x[n] e^{−j2πkn/N} = Σ_{n=0}^{N−1} x[n] ( cos(2πkn/N) − j sin(2πkn/N) )

SLIDE 28

Getting to the sought-after parameters

  • The magnitude spectrum is |X[k]|
    • Tells us how much of each frequency we have (amplitudes)
    • Adding the sine and cosine terms makes phase-shifted sinusoids
    • The contribution of each frequency is the amount of sine/cosine present
  • The phase spectrum is ∡X[k]
    • Gives us each frequency’s phase
    • Does so by looking at the relative amplitudes of the same-frequency sine/cosine pairs
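In NumPy terms, |X[k]| is `np.abs` and ∡X[k] is `np.angle`; a sketch with a sinusoid of known phase:

```python
import numpy as np

# A cosine at bin 5 with a 0.3 rad phase offset: the magnitude spectrum
# peaks at bin 5 and the phase spectrum recovers the offset.
N = 64
n = np.arange(N)
x = np.cos(2 * np.pi * 5 * n / N + 0.3)
X = np.fft.fft(x)

print(np.argmax(np.abs(X[:N // 2])))    # 5
print(round(float(np.angle(X[5])), 3))  # 0.3
```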

SLIDE 29

Some examples

[Figure: a single sine input with its complex and polar spectra, and a square-wave input with its complex and polar spectra]

SLIDE 30

Some extra info

  • Audio is real-valued
    • The DFT results in a conjugate symmetric transform
    • The upper half is redundant (real-valued routines will give you the lower half)
  • The first frequency bin is the DC
    • Offset of the input signal (“zero” frequency)
    • Doesn’t have a phase value (why?)
  • The highest frequency bin is the Nyquist
    • Also has no phase value (why?)
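These facts are easy to verify numerically (`np.fft.rfft` is one such real-input routine):

```python
import numpy as np

# For real input the DFT is conjugate symmetric, so the upper half is
# redundant, and the DC and Nyquist bins are purely real (no phase).
x = np.random.randn(128)
X = np.fft.fft(x)

print(np.allclose(X[1:], np.conj(X[1:][::-1])))  # True: X[N-k] = conj(X[k])
print(np.allclose(np.fft.rfft(x), X[:65]))       # True: rfft = lower half
print(abs(X[0].imag) < 1e-9)                     # True: DC is real
print(abs(X[64].imag) < 1e-9)                    # True: Nyquist is real
```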

SLIDE 31

One problem

  • Sinusoids extend infinitely on both sides
    • i.e. what we approximate is assumed to be periodic
    • So it should transition smoothly from left to right
  • We need to ensure smoothness here
    • Discontinuities will result in extra high frequencies

SLIDE 32

Windowing

  • To avoid periodic discontinuities we can “window”
    • Taper the ends to zero so that they join better
  • But too much tapering changes the signal!
    • Common side effect: “blurring” the spectrum

[Figure: an input and its periodic extension, where the discontinuity at the seam spreads energy across the Fourier transform; the windowed input extends periodically without a seam and yields a cleaner, slightly blurred spectrum]
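A sketch of the effect: a sinusoid that does not complete a whole number of cycles in the frame leaks energy everywhere, and a Hann window pulls that leakage down (at the cost of a slightly wider, “blurred” peak):

```python
import numpy as np

# 10.5 cycles in a 256-sample frame: the periodic extension has a seam.
N = 256
n = np.arange(N)
x = np.sin(2 * np.pi * 10.5 * n / N)

raw = np.abs(np.fft.rfft(x))                     # energy smeared everywhere
windowed = np.abs(np.fft.rfft(x * np.hanning(N)))

# Far from the 10.5-bin peak, the window suppresses leakage strongly:
print(raw[40:].max(), windowed[40:].max())
```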

SLIDE 33

Zero-padding

  • Fourier transforms map N points to N points
  • What if we want to get more outputs? Zero padding!
  • Zero-padding interpolates the frequency domain
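A sketch of the N-to-M mapping: padding a 64-sample frame to 256 points gives a spectrum that still passes through the original frequency samples, with interpolated points in between:

```python
import numpy as np

# Zero-padding before the DFT: same spectral shape, more frequency points.
n = np.arange(64)
x = np.hanning(64) * np.sin(2 * np.pi * 10 * n / 64)

X64 = np.fft.rfft(x)           # 33 frequency points
X256 = np.fft.rfft(x, n=256)   # zero-padded to 256 samples: 129 points

print(np.allclose(X64, X256[::4]))  # True: original points are preserved
```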

SLIDE 34

Some useful Fourier properties

  • Additivity: if x[n] ↔ X[k] and y[n] ↔ Y[k], then a·x[n] + b·y[n] ↔ a·X[k] + b·Y[k]
  • Shifting: if x[n] ↔ X[k], then x[⟨n − n₀⟩_N] ↔ X[k] e^{−j2πk n₀/N} (a circular shift)
  • Parseval’s theorem: Σ_{n=0}^{N−1} |x[n]|² = (1/N) Σ_{k=0}^{N−1} |X[k]|²
SLIDE 35

Frequency domain representation

  • Representing sounds by frequency content
  • Provides a better glimpse of the input
  • But provides no temporal information!

[Figure: three time series and their magnitude spectra]

SLIDE 36

On real sounds

  • We get a better sense of what’s in a signal
  • But not of the temporal progression

[Figure: power spectrum magnitude (dB) of the first 4 sec, the last 4 sec, and the overall signal]

SLIDE 37

Adding one more dimension

  • How about sampling spectra periodically?
    • Each sound segment will have its own spectrum
  • “Short-time frequency analysis”
    • Break the input into analysis windows and DFT them
    • Plot all successive spectra side by side
    • Keeps time info, but also presents frequency content

SLIDE 38

Time/frequency representation

  • Many names/varieties
    • Spectrogram, sonogram, periodogram, Short-Time Fourier Transform (STFT), …
  • A time-ordered series of frequency compositions
  • Can help show how things change in both time and frequency
  • Most useful representation so far!
    • Reveals information about the frequency content without sacrificing the time information

[Figure: three time series and their time/frequency representations]

SLIDE 39

The details

  • Get N samples, advance H samples, repeat
    • N is the transform size, H is the hop size
  • On each N-sample frame apply a window
    • Tapers the edges, which makes for a better estimate
  • On each windowed frame apply the DFT
    • Gets the frequency domain of this section only
    • The DFT can also be M points (M > N): the input is zero-padded to length M
  • Collate all spectra in a 2D representation

[Figure: input waveform → magnitude spectra of successive frames → spectrogram]
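The steps above fit in a few lines of NumPy; a minimal sketch assuming a Hann window, N = 256 and H = 128:

```python
import numpy as np

# Frame, window, DFT, collate: a minimal magnitude spectrogram.
def spectrogram(x, N=256, H=128):
    w = np.hanning(N)
    frames = [x[i:i + N] * w for i in range(0, len(x) - N + 1, H)]
    return np.abs(np.array([np.fft.rfft(f) for f in frames])).T  # freq x time

fs = 8000
t = np.arange(fs) / fs
S = spectrogram(np.sin(2 * np.pi * 1000 * t))  # one second of a 1 kHz tone
print(S.shape)                                 # (129, 61): bins x frames

# The tone sits in the bin nearest 1 kHz (1000 * N / fs = 32) in every frame:
print(np.all(np.argmax(S, axis=0) == 32))      # True
```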

SLIDE 40

Pretty picture version

[Figure: make frames → DFT the frames → show them better]

SLIDE 41

Spectrogram parameters

  • Size of the Discrete Fourier Transform
    • Determines how fine the frequency resolution is
    • To get more frequencies we can zero-pad
  • Hop size
    • Determines how fine the temporal resolution is
    • Should not be larger than the transform size! (why?)
  • Window
    • Tradeoff between artifacts and frequency resolution
    • Stronger window: more blurring; weaker window: more artifacts

SLIDE 42

Time/frequency tradeoff

  • The more frequencies, the fewer time points
  • And vice-versa


SLIDE 43

Spectral warping

  • Regular spectrograms are hard to read
  • Frequency warping can help (e.g. Mel scale, Bark scale, etc)

SLIDE 44

Back to a previous example

  • With the spectrogram we can now see what goes on


SLIDE 45

Remember these?

SLIDE 46

We can now “see” what goes on

[Figure: four spectrograms (frequency in Hz vs. time in sec) of the earlier example signals]

SLIDE 47

Very useful diagnostic tool!

  • Always look at the spectrogram!!
  • Best way to debug audio glitches!

SLIDE 48

Minimum!!

https://www.youtube.com/watch?v=faBFiEfPxUU

SLIDE 49

The inverse spectrogram

  • We can also go from spectrogram to waveform by inverting the spectrogram procedure
  • For each (complex) spectral frame:
    • Convert to a time segment using the inverse DFT
    • Optionally apply a window again
    • To “undo” the synthesis window, or to avoid processing artifacts
  • “Overlap and add” the segments that coincided
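A sketch of the round trip, assuming a Hann window and 50% overlap (H = N/2): inverse-DFT each frame, apply a synthesis window, overlap-add, and normalize by the accumulated squared window.

```python
import numpy as np

# Inverse spectrogram: inverse-DFT each frame, apply a synthesis window,
# overlap-add, and divide by the accumulated squared window.
def stft(x, N=256, H=128):
    w = np.hanning(N)
    return [np.fft.rfft(w * x[i:i + N])
            for i in range(0, len(x) - N + 1, H)]

def istft(frames, N=256, H=128):
    w = np.hanning(N)
    out = np.zeros(H * (len(frames) - 1) + N)
    norm = np.zeros_like(out)
    for m, F in enumerate(frames):
        out[m * H:m * H + N] += w * np.fft.irfft(F, N)  # window + overlap-add
        norm[m * H:m * H + N] += w ** 2
    return out / np.maximum(norm, 1e-12)                # undo the windows

x = np.random.randn(2048)
y = istft(stft(x))
# Interior samples reconstruct (the edges lack full overlap):
print(np.allclose(x[256:-256], y[256:-256]))  # True
```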

SLIDE 50

Overlap and add

[Figure: successive spectra → inverse DFT at their respective times → overlap-added output waveform]

SLIDE 51

Careful when inverse windowing!

  • If you use no windows, the output scales with the overlap factor
    • i.e. the number of times each sample gets overlap-added
  • If you use windowing you need to satisfy COLA:
  • Constant OverLap Add: Σ_{m=−∞}^{∞} w(n − mH) = 1 for all n, where H is the hop size
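The COLA condition is easy to check numerically; for instance, the periodic Hann window at 50% overlap (H = N/2) sums to exactly 1:

```python
import numpy as np

# Sum shifted copies of a periodic Hann window at hop H and check that
# the total is constant wherever the frames fully overlap.
def cola_sum(N=256, H=128, n_frames=20):
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann
    total = np.zeros(H * (n_frames - 1) + N)
    for m in range(n_frames):
        total[m * H:m * H + N] += w
    return total

s = cola_sum()
print(np.allclose(s[256:-256], 1.0))  # True: constant overlap-add holds
```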

SLIDE 52

What does that mean?

SLIDE 53

Examples of bad windowing


SLIDE 54

Some uses of inverse spectrograms

  • Useful for spectral editing! (demo)
  • We will use it later for:
    • Denoising
    • Time stretching/compression
    • Spectral manipulations
    • Fast convolutions
    • And many more …

SLIDE 55

Fun applications

  • Pictures to sound

SLIDE 56

A commercial example

SLIDE 57

The Fast Fourier Transform (FFT)

  • An efficient DFT algorithm
    • Huge speedup! (always use it!)
  • Most routines you will find and use will be FFT routines
    • These return the full complex spectrum
    • Some are specifically for real inputs
    • You might have to modify them for real inputs

SLIDE 58

Recap

  • Digitizing and discretizing audio
  • Basic things to remember to represent sound best
  • Frequency analysis and the DFT
  • Time-frequency analysis and the spectrogram
  • Also its inverse

SLIDE 59

Reference material

  • Overview of DSP:
  • http://www.dspguide.com/pdfcook.htm
  • Spectral analysis of audio:
  • http://www.dsprelated.com/dspbooks/sasp/

SLIDE 60

Thursday is lab day

  • First graded lab
  • Implementing a forward/inverse spectrogram
  • Examining sounds using your code
  • Labs administrivia
  • Released on Thursdays, submit solutions within two weeks
  • Send your notebooks via email to me (attached or linked)
  • Use your @illinois email so that I know who you are!!
  • Use subject: “CS498 Lab #” (where # is the lab’s number)
  • Late submissions get zero grade (worst two grades thrown out)
