Audio DSP basics Paris Smaragdis paris@illinois.edu - PowerPoint PPT Presentation

University of Illinois @ Urbana-Champaign
CS 498PS – Audio Computing Lab
Audio DSP basics
Paris Smaragdis
paris@illinois.edu
paris.cs.illinois.edu

Overview
• Basics of digital audio
• Signal representations
  • Time, Frequency, Time/Frequency
• Sampling, Quantization
• The Fourier transform
  • DFT and FFT
• The Spectrogram

Why digital audio?
• Cheaper
  • Get a smartphone, do anything you want
  • No burning circuits!
• Easier
  • You can easily rewrite code
  • But cannot easily rewire circuits
• Smaller
  • Do everything on one chip

Sound as “numbers”
• We treat sound as a series of amplitudes
  • More on the details later
• This is the waveform representation
  • Encodes instantaneous pressure over time

PCM format
• “Pulse Code Modulation”
• Used by CDs, telephones, audio editors, synths, etc.
[Figure: a sampled waveform, amplitude from -1 to 1 over samples 1 to 10, with the integer sample values 0, 82, 126, 111, 44, -44, -111, -126, -82, 0]
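As a rough illustration of how a waveform becomes a list of integers, the sketch below samples one period of a sine and rounds each value. The helper name `sine_to_pcm`, the scale of 128, and the choice to end the tenth sample exactly at the end of the period are assumptions made here; they are not stated on the slide, although they do reproduce the sample values shown in the figure.

```python
import math

def sine_to_pcm(num_samples, scale):
    """Sample one sine period and round each value to an integer (PCM-style).

    Hypothetical helper: the last sample lands exactly on the end of the period.
    """
    last = num_samples - 1
    return [round(scale * math.sin(2 * math.pi * n / last))
            for n in range(num_samples)]

samples = sine_to_pcm(10, 128)
print(samples)
```

Each integer is one PCM sample; storing the list in order, at a known sample rate, is all a raw PCM stream does.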

This is a discrete and digital format
• We do not use continuous values
  • We have finite samples over time
  • We (usually) encode these samples as signed integers
• Common formats
  • Speech: 16 kHz / 16-bit (or 8-bit)
  • Music: 44.1 kHz / 16-bit (or 96 kHz / 24-bit)
• But how do we pick these numbers?
  • What do they mean?

Dynamic range
• The choice of bits defines the dynamic range
  • More bits == more dynamic range == more storage
• What is dynamic range?
  • Ratio of the highest to the lowest representable pressure value
  • Usually measured in decibels (dB)
• How much dynamic range do we need though?

It all hinges on how we hear
• Outer ear
  • Sound gets collected at the pinna
  • The ear canal amplifies (some) sound by ~10 dB
  • The ear drum vibrates according to incoming pressure
• Middle ear
  • The ossicles transfer sound to the oval window
  • Amplify sound by ~14 dB
  • Also use muscles for damping
• Inner ear
  • Translation to neural signal (more later)

Perception of sound
• The just noticeable sound is:
  • $10^{-12}$ W/m$^2$ (cannot hear softer than this)
• And the loudest sound we can take is:
  • 1 W/m$^2$ (and then you go deaf!)
• Thus our dynamic range is:
  • $10 \log_{10}(1 / 10^{-12}) = 120$ dB
• That’s a staggering trillion to one!
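The 120 dB figure can be checked with a few lines. This is a minimal sketch; the function name `intensity_ratio_db` is made up for illustration, and the two intensities are the ones quoted on the slide.

```python
import math

def intensity_ratio_db(loud_w_per_m2, quiet_w_per_m2):
    """Ratio of two sound intensities (W/m^2) expressed in decibels."""
    return 10 * math.log10(loud_w_per_m2 / quiet_w_per_m2)

hearing_range_db = intensity_ratio_db(1.0, 1e-12)
print(hearing_range_db)  # approximately 120
```

The factor of 10 applies because these are intensities (power per area); a ratio of 10^12 is the "trillion to one" mentioned above.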

To get you oriented
• Weakest detectable sound ~0 dB
• Soft breathing ~10 dB
• Quiet library ~40 dB
• Office environment ~60 dB
• Food blender ~80 dB
• Lawn mower ~90 dB
(Dangerous levels > 90 dB)
• Car horn at 1m ~110 dB
(Pain begins at 125 dB)
• Military jet at 50ft ~130 dB
• Shotgun blast ~165 dB
(Pain ends at 180 dB)
• Loudest possible sound 194 dB (cause your ears just blew up)
  • (after which it isn’t “sound” anymore it is a “shock wave”)

Back to digital sound
• How many dB dynamic range to use?
  • Close to 120 dB ideally
• Common ranges (headroom)
  • 16-bit / 96 dB (the industry standard)
  • 12-bit / 72 dB (the cheap standard)
  • 8-bit / 48 dB (the 80’s standard! hipsters?)
  • 24-bit / 144 dB (the “I’m charging you extra” standard)
  • Floating point (what we will use)
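The dB figures listed for each bit depth follow from the ratio between the largest and smallest integer step, roughly 6 dB per bit. The sketch below assumes the usual amplitude convention of 20 log10 (sample values are pressures, not intensities); the function name `dynamic_range_db` is chosen here for illustration.

```python
import math

def dynamic_range_db(bits):
    """Approximate dynamic range of an integer format with the given bit depth.

    Uses 20*log10 because sample values are amplitudes (pressure), not power.
    """
    return 20 * math.log10(2 ** bits)

for bits in (8, 12, 16, 24):
    print(bits, "bits:", round(dynamic_range_db(bits)), "dB")
```

Only the 24-bit format clears the roughly 120 dB range of human hearing, which is one reason floating point is convenient for processing.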

Why worry?
• Need headroom to avoid clipping & quantization noise
  • These happen when the representation is maxed or zero
• Very challenging with dynamic content (e.g. classical music)
  • An audio engineer’s nightmare! (and digital is worse)
[Figure: waveform plot annotated with “Hiss”, “Gone!”, and “Clipping”]
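Both failure modes can be seen with a toy quantizer. This is a sketch under assumptions: the helpers `quantize` and `dequantize`, the input range of [-1, 1], and the round-then-clip scheme are illustrative choices, not a description of any particular converter.

```python
def quantize(x, bits):
    """Map x (nominally in [-1, 1]) to a signed integer code of the given bit depth."""
    levels = 2 ** (bits - 1)              # e.g. 128 for 8 bits
    code = round(x * levels)
    # Clip to the representable range [-levels, levels - 1]
    return max(-levels, min(levels - 1, code))

def dequantize(code, bits):
    """Map an integer code back to the nominal [-1, 1] range."""
    return code / 2 ** (bits - 1)

print(quantize(1.5, 8))     # too loud: stuck at the largest code (clipping)
print(quantize(0.001, 8))   # too quiet for 8 bits: rounds to zero ("Gone!")
print(quantize(0.001, 16))  # 16 bits still has a nonzero code for it
```

Signals slightly above the smallest step survive only as a few coarse levels, which is heard as hiss; signals beyond the largest code are flattened, which is heard as distortion.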

Quantization noise examples
[Audio examples]

Clipping examples
[Audio examples]

Sampling in time
• Also known as A/D conversion
  • How do we convert real-world sound to a discrete sequence?
• The one parameter we care for: the sample rate
  • i.e. how often do we represent the input sound
• Tradeoffs
  • Sample fast and you waste memory and energy
  • Sample slow and you risk aliasing

What is aliasing?
• Low sample rates can result in misinterpretations
  • Sample too low and you will miss some of the action
• Rule of thumb: Sample at least at twice the highest frequency
[Figure: three waveform plots, amplitude from −1 to 1 over samples 0 to 1000]
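A small numerical check shows the misinterpretation concretely. In this sketch the sample rate (10 Hz) and the two tone frequencies (7 Hz and 3 Hz) are arbitrary values picked for illustration, and `sample_sine` is a hypothetical helper.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Return num_samples samples of a unit sine at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

fs = 10.0                         # sample rate, so fs / 2 = 5 Hz
high = sample_sine(7.0, fs, 20)   # above fs / 2: violates the rule of thumb
low = sample_sine(3.0, fs, 20)    # below fs / 2: sampled correctly

# The 7 Hz samples equal the 3 Hz samples with the sign flipped,
# so after sampling the two tones cannot be told apart by frequency.
max_diff = max(abs(h + l) for h, l in zip(high, low))
print(max_diff)  # tiny, limited only by floating-point rounding
```

Once the samples are taken there is no way to recover which of the two tones was played, which is why the sample rate has to be chosen before conversion.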

How high should we go?
• Highest perceived frequency by humans is 20 kHz
  • Which goes down as you age (or as you abuse your ears)
[Figure with audio: “How high can you hear? (or how good are the class speakers?)”, test tones from 1 kHz to 21 kHz in 2 kHz steps; axes: Time (sec), Frequency (Hz)]
• We need to represent up to 20 kHz ⟶ sample at > 40 kHz

What does aliasing sound like?
• Frequencies higher than Nyquist fold over
  • Upwards movements go downwards and vice-versa
[Figure with audio: spectrograms of the same chirp at 44,100 Hz (0 Hz to 20 kHz), at 22,050 Hz (0 Hz to 11 kHz), and at 11,025 Hz (0 Hz to 5.5 kHz); axes: Time, Frequency]
• Most noticeable with high-frequency content
• How does that sound?
[Audio examples: at 44.1 kHz, 22 kHz, 11 kHz, 5 kHz, 4 kHz, and 3 kHz]
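The fold-over can be written as a short function. This is a sketch for real-valued sinusoids; the name `aliased_frequency` and the example chirp frequencies are assumptions chosen for illustration, while the 22,050 Hz sample rate is the one from the slide.

```python
def aliased_frequency(freq_hz, sample_rate_hz):
    """Frequency in [0, fs/2] at which a real sinusoid appears after sampling."""
    folded = freq_hz % sample_rate_hz              # wrap into [0, fs)
    return min(folded, sample_rate_hz - folded)    # reflect the upper half down

fs = 22050
for f in (5000, 10000, 12000, 16000, 20000):       # a rising chirp
    print(f, "Hz is heard as", aliased_frequency(f, fs), "Hz")
```

Below fs/2 the frequency is unchanged; above it, the rising input maps to a falling output, which is the "upwards movements go downwards" effect.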

What are the usual settings?
• “High-quality” music: 44.1 kHz
  • Why the extra 4.1 kHz?
• “Super” high quality music: 96 kHz
  • Dogs might like it more
• Speech coding
  • High(ish) quality & in research: 16 kHz
  • Telephony: 8 kHz

But why do we use the waveform?
• Do you see a problem with it?

What are these signals?
[Audio examples]

Waveforms are unintuitive at long scales
• Pressure information isn’t that perceptually relevant
  • We cannot interpret it as a percept
  • Too much data to parse visually
• Is there a better way to represent sound?
  • How do we start looking for such a way?
  • What is it that is important when listening?

Back to hearing …
• What happens in the inner ear?
  • After the oval window there’s the cochlea
  • Resonates at different lengths with input
  • Effectively parses sound by frequency
  • Transmits that vibration to neural code
• What we care about is frequency content!

What is a frequency component?
• You can approximate any waveform by adding sinusoids
  • They are the elementary building blocks of sounds
• Sinusoids have three parameters:
  • Amplitude, frequency and phase
  • $s(t) = a(t) \sin(f t + \phi)$
• Each sinusoid is a “frequency”
  • Because that is the main distinguishing parameter
[Figure: Approximating a square wave]
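The square-wave approximation can be sketched by summing odd harmonics with amplitudes 1/k, which is the standard Fourier series of a square wave. The function name `square_wave_approx` and the choice of a unit-height wave with period 2π are assumptions made for this example.

```python
import math

def square_wave_approx(t, num_terms):
    """Sum the first num_terms odd harmonics of a unit square wave with period 2*pi."""
    total = 0.0
    for i in range(num_terms):
        k = 2 * i + 1                    # odd harmonics only: 1, 3, 5, ...
        total += math.sin(k * t) / k     # amplitude 1/k, frequency k, phase 0
    return 4 / math.pi * total

for n in (1, 5, 50):
    print(n, "terms:", square_wave_approx(math.pi / 2, n))
```

With a single term the value at the middle of the positive half is 4/π; adding more sinusoids brings it closer to 1, the height of the square wave.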

Decomposing sounds to sines
• For each sound, get the sine parameters that reconstruct it
  • And we’ll be lazy and not bother with frequency
  • Just get all amplitudes and phases for all integer frequencies
• For this we use the Fourier transform
  • Transforms time samples to the frequency domain, and back
Waveform (time domain) to “Spectrum” (frequency domain): $X[f] = \mathrm{FT}(x[t])$
“Spectrum” (frequency domain) back to waveform (time domain): $x[t] = \mathrm{FT}^{-1}(X[f])$
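To make the forward and inverse transforms concrete, here is a minimal sketch of the direct discrete Fourier transform, written with the standard library only. It is the plain O(N²) definition rather than an FFT, and the function names `dft` and `idft` as well as the 16-sample test signal are choices made for this example.

```python
import cmath
import math

def dft(x):
    """Direct DFT: one complex value (amplitude and phase) per integer frequency."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(spectrum):
    """Inverse DFT: rebuild the time samples from the spectrum."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

# A sine that completes exactly 2 cycles over 16 samples
x = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
X = dft(x)
magnitudes = [abs(value) for value in X]     # amplitude at each integer frequency
rebuilt = [value.real for value in idft(X)]  # imaginary parts are rounding noise

print([round(m, 3) for m in magnitudes])
```

The energy shows up only at frequency 2 and at its mirror, frequency 14, and the inverse transform returns the original samples up to rounding error; `cmath.phase(X[f])` gives the phase of each component.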
