
SLIDE 1

E6820 SAPR - Dan Ellis L06 - Music A & S 2002-03-04 - 1

EE E6820: Speech & Audio Processing & Recognition

Lecture 6: Music analysis and synthesis

1. Music and nonspeech
2. Music synthesis techniques
3. Sinewave synthesis
4. Music analysis
5. Transcription

Dan Ellis <dpwe@ee.columbia.edu>
http://www.ee.columbia.edu/~dpwe/e6820/

SLIDE 2

Music & nonspeech

  • What is ‘nonspeech’?
  • according to research effort: a little music
  • in the world: most everything

attributes?


[Diagram: sounds arranged on two axes. Origin: natural to man-made; Information content: low to high. Examples: wind & water, animal sounds, speech, music, machines & engines, contact/collision sounds]

SLIDE 3

Sound attributes

  • Attributes suggest model parameters
  • What do we notice about ‘general’ sound?
  • psychophysics: pitch, loudness, ‘timbre’
  • bright/dull; sharp/soft; grating/soothing
  • sound is not ‘abstract’: the tendency is to describe it by source-events

  • Ecological perspective
  • what matters about sound is ‘what happened’

  • our percepts express this more-or-less directly
SLIDE 4

Aside: Sound textures

  • What do we hear in:
  • a city street
  • a symphony orchestra
  • How do we distinguish:
  • waterfall
  • rainfall
  • applause
  • static
  • Levels of ecological description...

[Spectrograms: Applause04 and Rain01, 0-5 kHz over 4 s]

SLIDE 5

Motivations for modeling

  • Describe/classify
  • cast a sound into a model because we want to use the resulting parameters

  • Store/transmit
  • the model implicitly exploits the limited structure of the signal

  • Resynthesize/modify
  • model separates out interesting parameters

[Diagram: sound mapped into a model parameter space]

SLIDE 6

Analysis and synthesis

  • Analysis is the converse of synthesis:
  • Can exist apart:
  • analysis for classification
  • synthesis of artificial sounds
  • Often used together:
  • encoding/decoding of compressed formats
  • resynthesis based on analyses
  • analysis-by-synthesis

[Diagram: Sound → Analysis → Model / representation → Synthesis → Sound]

SLIDE 7

Outline

1. Music and nonspeech
2. Music synthesis techniques
  • Framework
  • Historical development
3. Sinewave synthesis
4. Music analysis
5. Transcription

elements?

SLIDE 8

Music synthesis techniques

  • What is music?
  • could be anything

→ flexible synthesis needed!

  • Key elements of conventional music:
  • instruments → note-events (time, pitch, accent level) → melody, harmony, rhythm
  • patterns of repetition & variation
  • Synthesis framework:
  • instruments: a common framework for many notes
  • score: a sequence of (time, pitch, level) note events


SLIDE 9

The nature of musical instrument notes

  • Characterized by instrument (register), note, loudness (emphasis), articulation... distinguish how?

[Spectrograms: single notes on Piano, Violin, Clarinet, and Trumpet, 0-4 kHz]

SLIDE 10

Development of music synthesis

  • Goals of music synthesis:
  • generate realistic / pleasant new notes
  • control / explore timbre (quality)
  • Earliest computer systems in 1960s

(voice synthesis, algorithmic)

  • Pure synthesis approaches:
  • 1970s: analog synths
  • 1980s: FM (Stanford/Yamaha)
  • 1990s: physical modeling, hybrids

  • Analysis-synthesis methods:
  • sampling / wavetables
  • sinusoid modeling
  • harmonics + noise (+ transients)
  • others?
SLIDE 11

Analog synthesis

  • The minimum to make an ‘interesting’ sound
  • Elements:
  • harmonics-rich oscillators
  • time-varying filters
  • time-varying envelope
  • modulation: low frequency + envelope-based
  • Result:
  • time-varying spectrum, independent pitch

[Block diagram: pitch → oscillator → time-varying filter (cutoff freq, gain) → envelope → sound, with vibrato and trigger inputs]
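The oscillator-plus-envelope part of the chain above can be sketched in a few lines of numpy (a simplified stand-in: the time-varying filter stage is omitted, and all parameter values here are arbitrary choices, not from the slide):

```python
import numpy as np

# Analog-synth-style note: a harmonics-rich oscillator (naive sawtooth)
# shaped by a piecewise-linear ADSR amplitude envelope.
fs = 8000                                    # sample rate, Hz
n = np.arange(int(fs * 0.5))                 # 0.5 s note
f0 = 220.0
saw = 2.0 * ((f0 * n / fs) % 1.0) - 1.0      # sawtooth in [-1, 1)

def adsr(N, a=0.05, d=0.1, s=0.7, r=0.1):
    """Attack/decay/sustain/release envelope; a, d, r are fractions of N."""
    A, D, R = int(a * N), int(d * N), int(r * N)
    return np.concatenate([np.linspace(0.0, 1.0, A),   # attack
                           np.linspace(1.0, s, D),     # decay
                           np.full(N - A - D - R, s),  # sustain
                           np.linspace(s, 0.0, R)])    # release

env = adsr(len(n))
y = env * saw                                # the rendered note
```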

SLIDE 12

FM synthesis

  • Fast frequency modulation → sidebands:

    cos(ωc·t + β·sin(ωm·t)) = Σn=−∞..∞ Jn(β)·cos((ωc + n·ωm)·t)

  • a harmonic series if ωc = r·ωm
  • Jn(β) is a Bessel function:
    → complex harmonic spectra by varying β

[Figure: Bessel functions J0..J4 versus modulation index β; Jn(β) ≈ 0 for β < n − 2. Spectrogram of an FM tone with ωc = 2000 Hz, ωm = 200 Hz]

what use?
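The sideband series can be checked numerically: an FM tone with fc = 2000 Hz and fm = 200 Hz (the slide's example) should put all its energy on a 200 Hz harmonic grid at fc + n·fm. Sample rate and modulation index below are arbitrary choices:

```python
import numpy as np

# FM tone; with 1 s of signal the FFT bins fall exactly on 1 Hz multiples,
# so each sideband lands in a single bin with magnitude |Jn(beta)| / 2.
fs = 16000
fc, fm, beta = 2000.0, 200.0, 3.0
t = np.arange(fs) / fs                       # exactly 1 second
y = np.cos(2 * np.pi * fc * t + beta * np.sin(2 * np.pi * fm * t))

spec = np.abs(np.fft.rfft(y)) / len(y)       # normalized magnitude spectrum
freqs = np.fft.rfftfreq(len(y), 1 / fs)
peaks = freqs[spec > 0.01]                   # sidebands with visible energy
```

For β = 3 the significant sidebands run out to about n = 5 either side of the carrier, i.e. 1000-3000 Hz, matching the Jn(β) ≈ 0 for β < n − 2 rule of thumb.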

SLIDE 13

Sampling synthesis

  • Resynthesis from real notes

→ vary pitch, duration, level

  • Pitch: stretch (resample) waveform
  • Duration: loop a ‘sustain’ section
  • Level: cross-fade different examples
  • need to ‘line up’ source samples
[Figure: waveform details for sampling synthesis: resampling between 596 Hz and 894 Hz, aligned loop points, and a velocity-dependent mix of soft and loud examples]

good & bad?
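The pitch-by-resampling step can be sketched as reading the stored waveform faster than normal with linear interpolation; every frequency scales by the read-rate ratio, and the duration shrinks by the same factor (which is why a sustain loop is needed). The 200 Hz "sample" and the 1.5× ratio are made-up illustration values:

```python
import numpy as np

# Sampling-synthesis pitch shift via wavetable read-rate.
def resample_ratio(wave, ratio):
    """Read wave at `ratio` times normal speed with linear interpolation."""
    idx = np.arange(0, len(wave) - 1, ratio)
    return np.interp(idx, np.arange(len(wave)), wave)

fs = 8000
n = np.arange(800)                           # 0.1 s stored note
note = np.sin(2 * np.pi * 200 * n / fs)
up = resample_ratio(note, 1.5)               # plays back at 300 Hz

# Locate the spectral peak of the resampled note:
peak_hz = np.fft.rfftfreq(len(up), 1 / fs)[np.argmax(np.abs(np.fft.rfft(up)))]
```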

SLIDE 14

Outline

1. Music and nonspeech
2. Music synthesis techniques
3. Sinewave synthesis (detail)
  • Sinewave modeling
  • Sines + residual ...
4. Music analysis
5. Transcription

SLIDE 15

Sinewave synthesis

  • If patterns of harmonics are what matter,

why not generate them all explicitly:

  • particularly powerful model for pitched signals
  • Analysis (as with speech):
  • find peaks in STFT |S[ω,n]| & track
  • or track fundamental ω0 (harmonics / autocorrelation)
    & sample STFT at k·ω0 → set of Ak[n] to duplicate the tone:

  • Synthesis via bank of oscillators

    s[n] = Σk Ak[n]·cos(k·ω0[n]·n)

[Figure: waveform, spectrogram, and sampled harmonic magnitudes Ak[n] for an analyzed tone]
SLIDE 16

Steps to sinewave modeling - 1

  • The underlying STFT:

What value for N (FFT length & window size)? What value for H (hop size: n0 = r·H, r = 0, 1, 2...)?

  • STFT window length determines freq. resol’n:
  • Choose N long enough to resolve harmonics

→ 2-3x longest (lowest) fundamental period

  • e.g. 30-60 ms = 480-960 samples @ 16 kHz
  • choose H ≤ N/2
  • N too long → lost time resolution
  • limits sinusoid amplitude rate of change

    X[k, n0] = Σn=0..N−1 x[n + n0]·w[n]·exp(−j·2πkn/N)

    Xw(e^jω) = X(e^jω) * W(e^jω)

SLIDE 17

Steps to sinewave modeling - 2

  • Choose candidate sinusoids at each time

by picking peaks in each STFT frame:

  • Quadratic fit for peak, lin. interp. for phase:

+ linear interp. of unwrapped phase

[Figure: spectrogram with picked peaks; close-ups of log-magnitude and unwrapped phase around a peak. The fitted parabola y = ax(x − b) peaks at x = b/2 with height ab²/4]
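The quadratic fit can be sketched directly: fit a parabola through the log-magnitude at the peak bin and its two neighbours and read off the sub-bin peak position and height (a standard formulation; the function name is ours):

```python
import numpy as np

# Parabolic interpolation of a spectral peak from three magnitude samples.
def quad_peak(mag, k):
    """Return (fractional bin, interpolated height) for a peak at bin k."""
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    p = 0.5 * (a - c) / (a - 2 * b + c)      # offset from bin k, in (-0.5, 0.5)
    return k + p, b - 0.25 * (a - c) * p

# Check on an exact parabola peaking at x = 10.3 with height 5.0:
x = np.arange(20, dtype=float)
mag = 5.0 - (x - 10.3) ** 2
pos, height = quad_peak(mag, 10)
```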

SLIDE 18

Steps to sinewave modeling - 3

  • Which peaks to pick?

Want ‘true’ sinusoids, not noise fluctuations

  • ‘prominence’ threshold above the smoothed spectrum
  • Sinusoids exhibit stability...
  • of amplitude in time
  • of phase derivative in time

→compare with adjacent time frames to test?

[Figure: one spectral frame (level / dB vs. frequency) with the prominence threshold above the smoothed spectrum]

SLIDE 19

Steps to sinewave modeling - 4

  • ‘Grow’ tracks by appending newly-found peaks

to existing tracks:

  • ambiguous assignments possible
  • Unclaimed new peak
  • ‘birth’ of new track
  • backtrack to find earliest trace?
  • No continuation peak for existing track
  • ‘death’ of track
  • or: reduce peak threshold for hysteresis

[Diagram: existing tracks meeting new peaks in time-frequency, showing a track ‘death’ and a ‘birth’]
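One simple matching policy for growing tracks is greedy nearest-frequency assignment within a tolerance; unmatched new peaks are 'births', unmatched tracks 'deaths' (a sketch of one possible policy, with made-up frequencies and tolerance):

```python
# Greedy frequency matching for sinusoid track continuation.
def match_tracks(track_freqs, peak_freqs, tol=30.0):
    """Return (matches, deaths, births): index pairs and leftover indices."""
    matches, used = [], set()
    for i, tf in enumerate(track_freqs):
        best, best_d = None, tol
        for j, pf in enumerate(peak_freqs):
            if j not in used and abs(pf - tf) <= best_d:
                best, best_d = j, abs(pf - tf)
        if best is None:
            continue                         # no peak close enough
        used.add(best)
        matches.append((i, best))
    deaths = [i for i in range(len(track_freqs))
              if i not in {m[0] for m in matches}]
    births = [j for j in range(len(peak_freqs)) if j not in used]
    return matches, deaths, births

# Three tracks meet three new peaks: one continuation each for the first
# and third track, one death (880 Hz), one birth (2000 Hz).
m, d, b = match_tracks([440.0, 880.0, 1320.0], [445.0, 1335.0, 2000.0])
```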

SLIDE 20

Resynthesis of sinewave models

  • After analysis, each track defines contours in frequency and amplitude, fk[n], Ak[n] (+ phase?)
  • use these to drive a bank of sinewave oscillators & sum up
  • ‘Regularize’ to exactly harmonic fk[n] = k·f0[n]

[Figure: track contours fk[n], Ak[n] driving oscillators Ak[n]·cos(2πfk[n]·t); measured vs. regularized harmonic frequency tracks]

what to do?
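When driving an oscillator from analysis contours, accumulating phase from the frequency contour avoids discontinuities (evaluating cos(2πf[n]·n) directly jumps whenever f changes). A one-oscillator sketch; the gliding-frequency and decaying-amplitude contours are made-up examples:

```python
import numpy as np

# One oscillator driven by time-varying contours f[n] (Hz) and A[n].
fs = 8000
N = 4000
f = np.linspace(400.0, 500.0, N)             # frequency contour, Hz
A = np.linspace(1.0, 0.2, N)                 # amplitude contour
phase = 2 * np.pi * np.cumsum(f) / fs        # accumulated phase, radians
y = A * np.cos(phase)                        # click-free gliding sinusoid
```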

SLIDE 21

Modification in sinewave resynthesis

  • Change duration by warping timebase
  • may want to keep onset unwarped
  • Change pitch by scaling frequencies
  • either stretching or resampling envelope
  • Change timbre by interpolating params

[Figure: spectrogram of a modified resynthesis; spectral envelopes before and after frequency scaling]

SLIDE 22

Sinusoids + residual

  • Only ‘prominent peaks’ became tracks
  • remainder of spectral energy was noisy?

→ model residual energy with noise!

  • How to obtain ‘non-harmonic’ spectrum?
  • zero-out spectrum near extracted peaks?
  • or: resynthesize (exactly) & subtract waveforms

.. must preserve phase!

  • Can model residual signal with LPC

→flexible representation of noisy residual

    es[n] = s[n] − Σk Ak[n]·cos(2π·fk[n]·n)

[Figure: spectra of the original signal, the extracted sinusoids, the residual, and its LPC fit]
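The "must preserve phase" point is easy to demonstrate: if the analysis recovers amplitude, frequency, and phase, subtracting the resynthesized sinusoid leaves only the noise floor. The signal here is synthetic (one sinusoid plus weak white noise):

```python
import numpy as np

# Sinusoid + residual: subtract an exact resynthesis, measure what's left.
fs = 8000
n = np.arange(4000)
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(len(n))
s = 0.8 * np.cos(2 * np.pi * 300 * n / fs + 0.5) + noise

sine = 0.8 * np.cos(2 * np.pi * 300 * n / fs + 0.5)   # 'perfect' analysis
resid = s - sine                                      # e_s[n]

# Residual energy relative to the sinusoid, in dB:
resid_db = 10 * np.log10(np.mean(resid ** 2) / np.mean(sine ** 2))
```

A phase error of even a fraction of a cycle would instead leave a residual comparable to the sinusoid itself.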

SLIDE 23

Sinusoids + noise + transients

  • Sound represented as sinusoids and noise:

Parameters are {Ak[n], fk[n]}, hn[n]

  • Separate out abrupt transients in residual?
  • more specific → more flexible

    s[n] = Σk Ak[n]·cos(2π·fk[n]·n) + hn[n] * b[n]

[Spectrograms: original split into the sinusoid part {Ak[n], fk[n]} and the residual es[n] modeled by hn[n]]

    es[n] = Σk tk[n] + hn[n] * b[n]

SLIDE 24

Outline

1. Music and nonspeech
2. Music synthesis techniques
3. Sinewave synthesis
4. Music analysis
  • Instrument identification
  • Pitch tracking
5. Transcription

SLIDE 25

Music analysis

  • What might we want to get out of music?
  • Instrument identification
  • different levels of specificity
  • ‘registers’ within instruments
  • Score recovery
  • transcribe the note sequence
  • extract the ‘performance’
  • Ensemble performance
  • ‘gestalts’: chords, tone colors
  • Broader timescales
  • phrasing & musical structure
  • artist / genre clustering and classification


SLIDE 26

Instrument identification

  • Research looks for perceptual ‘timbre space’
  • Cues to instrument identification
  • onset (rise time), sustain (brightness)
  • Hierarchy of instrument families
  • strings / reeds / brass
  • optimize features at each level

[Diagram: hierarchical classifier: bright/dull → low/high flux → low/high attack]

procedure?

SLIDE 27

Pitch tracking

  • Fundamental frequency (→ pitch)

is a key attribute of musical sounds →pitch tracking as a key technology

  • Pitch tracking for speech
  • voice pitch & spectrum highly dynamic
  • speech is both voiced and unvoiced

ground truth?
  • Applications
  • voice coders (excitation description)
  • harmonic modeling
SLIDE 28

Pitch tracking for music

  • Pitch in music
  • pitch is more stable (although vibrato)
  • but: multiple pitches
  • Applications
  • harmonic modeling
  • music transcription (→ storage, resynthesis)
  • source separation
  • Approaches: “place” & “time”

[Spectrogram: polyphonic music, 0-4 kHz over 5 s]

??

SLIDE 29

Meddis & Hewitt pitch model

  • Autocorrelation (time) based pitch extraction
  • fundamental period → peak(s) in autocorrelation
  • Compute separately in each frequency band

& ‘summarize’ across (perceptual) channels

    x(t) ≈ x(t + T)  ⟹  rxx(T) = ⟨x(t)·x(t + T)⟩ ≈ max

[Figure: waveform x[n] and its autocorrelation rxx[l]; autocorrelogram over channels (CF 80-4000 Hz) vs. lag, with summary ACG]

[Block diagram: sound → bandpass filters → rectification & low-pass filter → per-channel periodicity detection → cross-channel sum → summary ACG]
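The core periodicity-detection step can be sketched in one (full-band) channel: the fundamental period shows up as the strongest non-zero-lag peak of rxx. The 200 Hz two-harmonic test signal and the lag search range are made-up values:

```python
import numpy as np

# Autocorrelation pitch extraction: find the lag of the strongest peak.
fs = 8000
f0 = 200.0                                   # true fundamental: 40 samples
n = np.arange(2048)
x = (np.cos(2 * np.pi * f0 * n / fs)
     + 0.5 * np.cos(2 * np.pi * 2 * f0 * n / fs))

r = np.correlate(x, x, mode='full')[len(x) - 1:]   # r_xx[l] for l >= 0
lag = 20 + np.argmax(r[20:200])              # search 40-400 Hz, skip lag 0
f0_est = fs / lag
```

Note the lag-0 region must be excluded, or the trivial maximum rxx(0) always wins; the upper-octave harmonic does not fool the estimate because its lag-20 peak is negative here.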

SLIDE 30

Tolonen & Karjalainen simplification

  • Multiple frequency channels can have different dominant pitches...
  • But equalizing (flattening) the spectrum works:
    → summary autocorrelation (SACF) as a function of time
  • ‘Enhancement’ = cancel subharmonics

[Block diagram: sound → pre-whitening → lowpass @ 1 kHz plus highpass @ 1 kHz (rectified & low-passed) → periodicity detection → SACF → enhancement → ESACF]

[Figure: periodogram of a male/female voice mix; summary autocorrelation at t = 0.775 s with peaks at 0.005 s (200 Hz) and 0.008 s (125 Hz)]

lag vs. freq?
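The 'enhancement' step can be sketched on a toy SACF: clip to positive values, then subtract copies of the SACF stretched along the lag axis by 2, 3, ..., which cancels the peaks at multiples of the true period. The spiky synthetic SACF below is illustrative, not derived from real audio:

```python
import numpy as np

# ESACF-style subharmonic cancellation on a synthetic SACF.
L = 200
sacf = np.zeros(L)
sacf[[40, 80, 120, 160]] = 1.0               # true period 40 + its multiples

esacf = np.clip(sacf, 0, None)
for m in (2, 3, 4):
    # Stretch lag axis by m: stretched[l] ~ sacf[l / m]
    stretched = np.interp(np.arange(L) / m, np.arange(L), sacf)
    esacf = np.clip(esacf - np.clip(stretched, 0, None), 0, None)
```

After enhancement only the lag-40 peak survives; the peaks at 80, 120, and 160 (which would read as subharmonic pitches) are cancelled.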

SLIDE 31

Post-processing of pitch tracks

  • Remove outliers with median filtering
  • Octave errors are common:
  • if x(t) ≈ x(t + T ) then x(t) ≈ x(t + 2T ) etc.

→ dynamic programming/HMM

  • Validity
  • “is there a pitch at this time?”
  • voiced/unvoiced decision for speech
  • Event detection
  • when does a pitch slide indicate a new note?

[Diagram: 5-point median filter applied along the time axis]
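Median filtering of a pitch track is a few lines: isolated octave errors (single-frame doublings or halvings) are replaced by the local median, while genuine level changes survive. A minimal sketch with a made-up track:

```python
import numpy as np

# 5-point median filter along time, edge-padded so length is preserved.
def median5(track):
    t = np.asarray(track, dtype=float)
    padded = np.pad(t, 2, mode='edge')
    return np.array([np.median(padded[i:i + 5]) for i in range(len(t))])

raw = [220, 220, 440, 220, 220, 220, 110, 220, 220]   # two octave outliers
smooth = median5(raw)
```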

SLIDE 32

Outline

1. Music and nonspeech
2. Music synthesis techniques
3. Sinewave synthesis
4. Music analysis
5. Transcription
  • Bottom-up and top-down
  • Transcription from sinewave models

SLIDE 33

Transcription

  • Basic idea: Recover the score
  • Is it possible? Why is it hard?
  • music students do it

... but they are highly trained; know the rules

  • Motivations
  • for study: what was played?
  • highly compressed representation (e.g. MIDI)
  • the ultimate restoration system...

[Spectrogram: polyphonic music, 0-4 kHz over 5 s]

SLIDE 34

Transcription framework

  • Recover discrete events to explain signal
  • analysis-by-synthesis?
  • Exhaustive search?
  • would be possible given exact note waveforms
  • .. or just a 2-dimensional ‘note’ template?

but superposition is not linear in |STFT| space

  • Inference depends on all detected notes
  • is this evidence ‘available’ or ‘used’?
  • full solution is exponentially complex

[Diagram: note events {tk, pk, ik} → synthesis → observations X[k,n]; alternatively, a note template applied by 2-D convolution]
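The 2-D 'note template' idea can be sketched as a cross-correlation over a toy magnitude spectrogram (everything here, sizes, template shape, the planted note, is illustrative; real spectrograms break this because note magnitudes do not add linearly):

```python
import numpy as np

# Toy spectrogram (freq bins x time frames) with one 3-harmonic note.
spec = np.zeros((40, 30))
for h in (1, 2, 3):
    spec[5 * h, 10:20] = 1.0                 # harmonics at bins 5, 10, 15

# Matching harmonic template, 5 frames wide.
template = np.zeros((16, 5))
for h in (1, 2, 3):
    template[5 * h, :] = 1.0

# Valid-mode 2-D cross-correlation, then pick the best alignment.
F, T = template.shape
score = np.array([[np.sum(spec[i:i + F, j:j + T] * template)
                   for j in range(spec.shape[1] - T + 1)]
                  for i in range(spec.shape[0] - F + 1)])
fi, ti = np.unravel_index(np.argmax(score), score.shape)
```

The peak score lands at frequency offset 0, time frame 10, i.e. the planted note's position and pitch.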

SLIDE 35

Bottom-up versus top-down

  • Bottom-up: observation directly gives description
  • e.g. peaks in 2-D convolution
  • but: few domains are that ‘linear’
  • Top-down: pursue & confirm hypotheses
  • e.g. analysis-by-resynthesis matching
  • but: need to limit search space
  • Generally, need to do both:
  • bottom-up guides & limits search
  • top-down resolves ambiguities in low-level

how to transcribe?

[Diagram: raw observations → data-driven analyses → hypothesis search, with abstract constraints supplying guidance and tests]

SLIDE 36

Transcription from sinewave models

  • Form sinusoid model
  • as with synthesis, but the signal is more complex
  • Break tracks
  • need to detect a new ‘onset’ at a single frequency
  • Group by onset & common harmonicity
  • find sets of tracks that start around the same time
  • + stable harmonic pattern
  • Pass on to constraint-based filtering...

[Figure: sinusoid tracks for a music excerpt, 0-3000 Hz over 4 s, with onset-grouping detail]

bu/td? mistakes?

SLIDE 37

Problems for transcription

  • Music is practically worst case!
  • note events are often synchronized

→ defeats common onset

  • notes have harmonic relations (2:3 etc.)

→ collision/interference between harmonics

  • variety of instruments, techniques, ...
  • Listeners are very sensitive to certain errors
  • .. and impervious to others
  • Apply further constraints
  • like our ‘music student’
  • maybe even the whole score (Scheirer)!
SLIDE 38

Summary

  • ‘Nonspeech audio’
  • i.e. sound in general
  • characteristics: ecological
  • Music synthesis
  • control of pitch, duration, loudness, articulation
  • evolution of techniques
  • sinusoids + noise + transients
  • Music analysis
  • different aspects: instruments, pitches, performance
  • transcription complications: representation, octaves, onsets, ...

  • rely on high-level structural constraints

and beyond?