
EE E6820: Speech & Audio Processing & Recognition, Lecture 6: Music analysis and synthesis



  1. EE E6820: Speech & Audio Processing & Recognition
     Lecture 6: Music analysis and synthesis
       1 Music and nonspeech
       2 Music synthesis techniques
       3 Sinewave synthesis
       4 Music analysis
       5 Transcription
     Dan Ellis <dpwe@ee.columbia.edu>
     http://www.ee.columbia.edu/~dpwe/e6820/
     E6820 SAPR - Dan Ellis L06 - Music A & S 2002-03-04 - 1

  2. Music & nonspeech (section 1)
     • What is ‘nonspeech’?
       - according to research effort: a little music
       - in the world: most everything
     • Figure: sound types placed by information content (low to high) against origin (natural to man-made); speech and music sit at the high-information end, with animal sounds, machines & engines, contact/collision, and wind & water below
     • Attributes?

  3. Sound attributes
     • Attributes suggest model parameters
     • What do we notice about ‘general’ sound?
       - psychophysics: pitch, loudness, ‘timbre’
       - bright/dull; sharp/soft; grating/soothing
       - sound is not ‘abstract’: we tend to describe it by its source events
     • Ecological perspective
       - what matters about sound is ‘what happened’
         → our percepts express this more or less directly

  4. Aside: Sound textures
     • What do we hear in:
       - a city street
       - a symphony orchestra
     • How do we distinguish:
       - waterfall
       - rainfall
       - applause
       - static
     • Figure: spectrograms of ‘Applause04’ and ‘Rain01’ (freq / Hz, 0-5000, against time / s, 0-4)
     • Levels of ecological description...

  5. Motivations for modeling
     • Describe/classify
       - cast the sound into a model because we want to use the resulting parameters
     • Store/transmit
       - the model implicitly exploits the limited structure of the signal
     • Resynthesize/modify
       - the model separates out the interesting parameters
     • Figure: a sound mapped by the model into a parameter space

  6. Analysis and synthesis
     • Analysis is the converse of synthesis:
       - Figure: analysis maps a sound to a model / representation; synthesis maps the representation back to sound
     • Can exist apart:
       - analysis for classification
       - synthesis of artificial sounds
     • Often used together:
       - encoding/decoding of compressed formats
       - resynthesis based on analyses
       - analysis-by-synthesis

  7. Outline
     1 Music and nonspeech
     2 Music synthesis techniques
       - Framework (elements?)
       - Historical development
     3 Sinewave synthesis
     4 Music analysis
     5 Transcription

  8. Music synthesis techniques (section 2)
     • What is music?
       - could be anything → flexible synthesis needed!
     • Key elements of conventional music
       - instruments → note events (time, pitch, accent level) → melody, harmony, rhythm
       - patterns of repetition & variation
     • Synthesis framework:
       - instruments: common framework for many notes
       - score: sequence of (time, pitch, level) note events
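The instrument/score split can be illustrated in a few lines. This is a minimal sketch, not the lecture's system: it assumes a 16 kHz sample rate, a hypothetical `Note` record holding the (time, pitch, level) fields with pitch as a MIDI note number, and a toy decaying-sine `instrument` function that plays the role of the common framework shared by every note.

```python
from dataclasses import dataclass
import numpy as np

SR = 16000  # sample rate in Hz (assumed)

@dataclass
class Note:
    time: float   # onset time in seconds
    pitch: int    # MIDI note number (assumed convention: 69 = A4 = 440 Hz)
    level: float  # linear amplitude, 0..1

def midi_to_hz(pitch: int) -> float:
    """Equal-tempered frequency in Hz for a MIDI note number."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)

def instrument(note: Note, dur: float = 0.5) -> np.ndarray:
    """Toy instrument: one shared recipe for every note (an exponentially decaying sine)."""
    t = np.arange(int(dur * SR)) / SR
    return note.level * np.exp(-4.0 * t) * np.sin(2 * np.pi * midi_to_hz(note.pitch) * t)

def render(score: list, dur: float = 0.5) -> np.ndarray:
    """Mix every (time, pitch, level) event of the score into one output buffer."""
    end = max(n.time for n in score) + dur
    out = np.zeros(int(np.ceil(end * SR)) + 1)
    for n in score:
        start = int(round(n.time * SR))
        sound = instrument(n, dur)
        out[start:start + len(sound)] += sound
    return out

# A three-note score: C4, E4, G4 at decreasing levels
score = [Note(0.0, 60, 0.8), Note(0.5, 64, 0.6), Note(1.0, 67, 0.6)]
y = render(score)
```

Swapping in a different `instrument` changes the timbre of every note without touching the score, which is the point of separating the two.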

  9. The nature of musical instrument notes
     • Characterized by instrument (register), note, loudness (emphasis), articulation...
     • Figure: spectrograms of piano, violin, clarinet and trumpet notes (frequency 0-4000, time 0-4)
     • Distinguish how?

  10. Development of music synthesis
      • Goals of music synthesis:
        - generate realistic / pleasant new notes
        - control / explore timbre (quality)
      • Earliest computer systems in the 1960s (voice synthesis, algorithmic)
      • Pure synthesis approaches:
        - 1970s: analog synths
        - 1980s: FM (Stanford/Yamaha)
        - 1990s: physical modeling, hybrids
      • Analysis-synthesis methods:
        - sampling / wavetables
        - sinusoid modeling
        - harmonics + noise (+ transients)
        - others?

  11. Analog synthesis
      • The minimum to make an ‘interesting’ sound
      • Figure: block diagram linking Trigger → Envelope, Pitch + Vibrato → Oscillator (freq), Oscillator → Filter (Cutoff) → Gain → Sound
      • Elements:
        - harmonics-rich oscillators
        - time-varying filters
        - time-varying envelope
        - modulation: low frequency + envelope-based
      • Result:
        - time-varying spectrum, independent pitch
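A minimal subtractive-synthesis sketch of the elements listed above, under stated assumptions: a naive sawtooth stands in for the harmonics-rich oscillator (it aliases at high pitches), a one-pole low-pass stands in for the analog filter (real synths typically use resonant filters), and a single envelope (20 ms linear attack, exponential decay) drives both the filter cutoff and the output gain. The function name `analog_note` and every parameter value are illustrative.

```python
import numpy as np

def analog_note(pitch_hz=220.0, dur=1.0, sr=16000,
                vibrato_hz=5.0, vibrato_depth=0.01,
                cutoff_lo=300.0, cutoff_hi=4000.0):
    n = int(dur * sr)
    t = np.arange(n) / sr
    # Envelope: 20 ms linear attack, then exponential decay (assumed shape)
    attack = int(0.02 * sr)
    env = np.exp(-3.0 * t)
    env[:attack] *= np.linspace(0.0, 1.0, attack)
    # Oscillator frequency = pitch modulated by a low-frequency vibrato
    freq = pitch_hz * (1.0 + vibrato_depth * np.sin(2 * np.pi * vibrato_hz * t))
    phase = np.cumsum(freq) / sr               # running phase, in cycles
    osc = 2.0 * (phase % 1.0) - 1.0            # naive sawtooth: rich in harmonics
    # Filter cutoff follows the envelope, so brightness decays with loudness
    cutoff = cutoff_lo + (cutoff_hi - cutoff_lo) * env
    alpha = 1.0 - np.exp(-2 * np.pi * cutoff / sr)   # one-pole low-pass coefficient
    y = np.zeros(n)
    state = 0.0
    for i in range(n):
        state += alpha[i] * (osc[i] - state)   # time-varying low-pass filter
        y[i] = state
    return env * y                             # time-varying gain
```

Because the cutoff tracks the envelope, the note gets duller as it decays while its pitch stays put, which is the "time-varying spectrum, independent pitch" result.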

  12. FM synthesis
      • Fast frequency modulation → sidebands:
        $\cos\big(\omega_c t + \beta \sin(\omega_m t)\big) = \sum_{n=-\infty}^{\infty} J_n(\beta)\,\cos\big((\omega_c + n\,\omega_m)\,t\big)$
        - a harmonic series if $\omega_c = r \cdot \omega_m$ (integer $r$)
      • $J_n(\beta)$ is a Bessel function (of the first kind):
        - $J_n(\beta) \approx 0$ for $\beta < n - 2$
        - Figure: $J_0$ to $J_4$ plotted against the modulation index $\beta$ (0 to 9)
      • → complex harmonic spectra by varying $\beta$
        - Figure: spectrogram with $\omega_c$ = 2000 Hz, $\omega_m$ = 200 Hz (freq / Hz 0-4000, time / s 0-0.8)
        - What use?
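The sideband identity can be checked numerically. This sketch assumes $\omega_c$ and $\omega_m$ are given in Hz, as in the slide's example, and picks $\beta = 3$ arbitrarily. It generates an FM tone, measures each sideband's amplitude with an FFT, and compares it with $J_n(\beta)$ computed from Bessel's integral $J_n(\beta) = \frac{1}{\pi}\int_0^\pi \cos(n\tau - \beta\sin\tau)\,d\tau$. The helpers `fm_tone`, `bessel_j` and `sideband_amplitudes` are hypothetical names; `scipy.special.jv` could replace `bessel_j` where SciPy is available.

```python
import numpy as np

SR = 16000  # sample rate in Hz (assumed)

def fm_tone(fc, fm, beta, dur=1.0, sr=SR):
    """cos(w_c t + beta sin(w_m t)), with carrier and modulator given in Hz."""
    t = np.arange(int(dur * sr)) / sr
    return np.cos(2 * np.pi * fc * t + beta * np.sin(2 * np.pi * fm * t))

def bessel_j(n, beta, steps=2000):
    """J_n(beta) from Bessel's integral, evaluated with the trapezoid rule."""
    tau = np.linspace(0.0, np.pi, steps + 1)
    f = np.cos(n * tau - beta * np.sin(tau))
    h = np.pi / steps
    return h * (f.sum() - 0.5 * (f[0] + f[-1])) / np.pi

def sideband_amplitudes(x, fc, fm, n_max, sr=SR):
    """Measured amplitude of the component at fc + n*fm, for n = -n_max..n_max."""
    spec = np.abs(np.fft.rfft(x)) / (len(x) / 2)   # a unit-amplitude cosine -> 1.0
    hz_per_bin = sr / len(x)
    return {n: spec[int(round((fc + n * fm) / hz_per_bin))]
            for n in range(-n_max, n_max + 1)}

fc, fm, beta = 2000.0, 200.0, 3.0   # carrier/modulator from the slide; beta assumed
amps = sideband_amplitudes(fm_tone(fc, fm, beta), fc, fm, n_max=6)
for n, a in amps.items():
    print(f"n={n:+d}  measured={a:.4f}  |J_n(beta)|={abs(bessel_j(n, beta)):.4f}")
```

Here $\omega_c = 10\,\omega_m$, so every component lands on a multiple of 200 Hz (the harmonic-series case); raising `beta` spreads energy into more sidebands.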

  13. Sampling synthesis
      • Resynthesis from real notes → vary pitch, duration, level
      • Pitch: stretch (resample) the waveform
        - Figure: the same waveform at 596 Hz and resampled to 894 Hz
      • Duration: loop a ‘sustain’ section
        - Figure: a note extended by looping, with close-ups of the loop region (around 0.174-0.176 s and 0.204-0.206 s)
      • Level: cross-fade different examples
        - Figure: soft and loud examples mixed according to velocity
        - need to ‘line up’ source samples
      • Good & bad?
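The three sampler operations can be sketched as follows, assuming a recorded note is available as a NumPy array. Pitch shifting here is plain linear-interpolation resampling (so, as on the slide, it also shortens the note), the sustain loop repeats a segment between hypothetical loop points with no cross-fade at the joins (real samplers smooth these to avoid clicks), and the level control is a linear cross-fade between soft and loud recordings assumed to be already lined up in time. All function names are illustrative.

```python
import numpy as np

def resample_pitch(x, ratio):
    """Read x at `ratio` input samples per output sample (linear interpolation):
    pitch rises by `ratio` and duration shrinks by the same factor."""
    positions = np.arange(0, len(x) - 1, ratio)
    return np.interp(positions, np.arange(len(x)), x)

def loop_sustain(x, loop_start, loop_end, target_len):
    """Repeat x[loop_start:loop_end] until the note is at least target_len samples."""
    head, loop, tail = x[:loop_start], x[loop_start:loop_end], x[loop_end:]
    n_loops = max(1, int(np.ceil((target_len - len(head) - len(tail)) / len(loop))))
    return np.concatenate([head, np.tile(loop, n_loops), tail])

def velocity_crossfade(soft, loud, velocity):
    """Linear mix of two time-aligned recordings; velocity 0 -> soft, 1 -> loud."""
    n = min(len(soft), len(loud))
    return (1.0 - velocity) * soft[:n] + velocity * loud[:n]

sr = 16000
t = np.arange(int(0.3 * sr)) / sr
note = np.sin(2 * np.pi * 596 * t) * np.exp(-5 * t)   # stand-in for a recorded note
higher = resample_pitch(note, 894 / 596)                # 596 Hz -> 894 Hz (ratio 1.5)
```

The slide's 596 Hz and 894 Hz examples differ by exactly a factor of 1.5, which is the resampling ratio used above.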

  14. Outline
      1 Music and nonspeech
      2 Music synthesis techniques
      3 Sinewave synthesis (detail)
        - Sinewave modeling
        - Sines + residual ...
      4 Music analysis
      5 Transcription

  15. Sinewave synthesis (section 3)
      • If patterns of harmonics are what matter, why not generate them all explicitly:
        $s[n] = \sum_k A_k[n] \cos\big(k \cdot \omega_0[n] \cdot n\big)$
        - a particularly powerful model for pitched signals
      • Analysis (as with speech):
        - find peaks in the STFT magnitude $|S[\omega, n]|$ and track them
        - or track the fundamental $\omega_0$ (from harmonics / autocorrelation) and sample the STFT at $k \cdot \omega_0$
        → a set of $A_k[n]$ that duplicates the tone
        - Figure: spectrogram of the tone (freq / Hz 0-8000, time / s 0-0.2) and the extracted harmonic magnitudes plotted against frequency and time
      • Synthesis via a bank of oscillators
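A sketch of oscillator-bank synthesis for the sum above, assuming the analysis has already produced per-sample harmonic amplitudes $A_k[n]$ (an array of shape K × N) and a fundamental track $\omega_0[n]$ in Hz. One deliberate deviation from the literal formula: because $\omega_0$ varies with $n$, the sketch integrates the instantaneous frequency (a cumulative sum) to get each harmonic's phase instead of forming $k \cdot \omega_0[n] \cdot n$ directly, which would cause phase jumps whenever $\omega_0$ changes. It also mutes harmonics above Nyquist. The name `harmonic_synth` is hypothetical.

```python
import numpy as np

def harmonic_synth(amps, f0, sr=16000):
    """Oscillator bank: s[n] = sum_k A_k[n] cos(phi_k[n]).

    amps: shape (K, N), amplitude of harmonic k+1 at sample n.
    f0:   shape (N,), fundamental frequency in Hz at sample n.
    phi_k[n] is the running sum of harmonic k's instantaneous frequency,
    so the phase stays continuous when f0 changes.
    """
    amps = np.asarray(amps, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    K = amps.shape[0]
    k = np.arange(1, K + 1)[:, None]                 # harmonic numbers 1..K
    base_phase = 2 * np.pi * np.cumsum(f0) / sr      # phase of the fundamental
    # Silence any harmonic that would lie above Nyquist (it would alias)
    amps = np.where(k * f0[None, :] < sr / 2, amps, 0.0)
    return np.sum(amps * np.cos(k * base_phase[None, :]), axis=0)

sr = 16000
n = np.arange(int(0.2 * sr))
f0 = np.linspace(200.0, 250.0, len(n))                              # gliding pitch
amps = np.array([np.exp(-10 * n / sr) / k for k in range(1, 11)])  # 10 decaying harmonics
tone = harmonic_synth(amps, f0, sr)
```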

  16. Steps to sinewave modeling - 1
      • The underlying STFT:
        $X[k, n_0] = \sum_{n=0}^{N-1} w[n] \cdot x[n + n_0] \cdot \exp\left(-j\,\frac{2\pi k n}{N}\right)$
        - What value for $N$ (FFT length & window size)?
        - What value for $H$ (hop size: $n_0 = r \cdot H$, $r = 0, 1, 2, \ldots$)?
      • STFT window length determines frequency resolution:
        $X_w(e^{j\omega}) = \frac{1}{2\pi}\, X(e^{j\omega}) \circledast W(e^{j\omega})$ (windowing in time = periodic convolution in frequency)
      • Choose $N$ long enough to resolve harmonics → 2-3× the longest fundamental period (i.e. of the lowest pitch)
        - e.g. 30-60 ms = 480-960 samples @ 16 kHz
        - choose $H \le N/2$
      • $N$ too long → lost time resolution
        - limits how fast sinusoid amplitudes can change and still be tracked
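The STFT formula maps directly onto a framed FFT. A minimal sketch, assuming a Hann window for $w[n]$ (the slide leaves the window unspecified), $N = 512$ (32 ms at 16 kHz, close to the 30-60 ms range above) and $H = N/2$; `stft` is a hypothetical helper, not a library function.

```python
import numpy as np

def stft(x, N=512, H=256):
    """X[k, r] = sum_{n=0}^{N-1} w[n] x[n + r*H] exp(-j 2 pi k n / N), w = Hann.

    Returns shape (N//2 + 1, n_frames): for real x only k = 0..N/2 is needed,
    the remaining bins being complex conjugates.
    """
    w = np.hanning(N)
    n_frames = 1 + (len(x) - N) // H
    frames = np.stack([w * x[r * H:r * H + N] for r in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

sr = 16000
t = np.arange(sr) / sr
X = stft(np.cos(2 * np.pi * 1000 * t))   # N = 512 (32 ms), H = N/2
# Bin spacing is sr / N = 31.25 Hz, so 1000 Hz falls exactly on bin 32
```

Doubling `N` halves the bin spacing (finer frequency resolution) but averages over twice as much time, which is the trade-off described on the slide.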

  17. Steps to sinewave modeling - 2
      • Choose candidate sinusoids at each time by picking peaks in each STFT frame:
        - Figure: spectrogram (freq / Hz 0-8000, time / s 0-0.18) and one frame's spectrum (level / dB against freq / Hz 0-7000)
      • Quadratic fit for the peak, linear interpolation for the phase:
        - model the peak (in shifted coordinates) as $y = a x (x - b)$; its vertex lies at $x = b/2$ with height $-ab^2/4$
        - plus linear interpolation of the unwrapped phase
        - Figure: level / dB and phase / rad against freq / Hz (400-800) around one peak
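A sketch of the quadratic peak refinement: fit a parabola through the dB magnitudes of a local-maximum bin and its two neighbours, take the vertex as the refined frequency and level, and linearly interpolate the unwrapped phase at that fractional bin. The closed-form vertex offset is the standard three-point parabola formula. As an assumption not stated on the slide, each frame is windowed and circularly shifted (`np.fft.ifftshift`) before the FFT so phase is referenced to the frame centre; with that zero-phase framing the phase is nearly flat across the peak and linear interpolation is well behaved. `zero_phase_frame_fft` and `refine_peak` are hypothetical names.

```python
import numpy as np

def zero_phase_frame_fft(frame, w):
    """Window a frame and rotate its centre sample to index 0 before the FFT."""
    return np.fft.rfft(np.fft.ifftshift(w * frame))

def refine_peak(X, k):
    """Quadratic fit to the dB magnitudes at bins k-1, k, k+1 of one frame X.

    Returns (fractional bin, peak level in dB, interpolated phase in radians).
    """
    a, b, c = 20 * np.log10(np.abs(X[k - 1:k + 2]) + 1e-12)
    p = 0.5 * (a - c) / (a - 2 * b + c)      # vertex offset from bin k, within +-0.5
    level = b - 0.25 * (a - c) * p           # parabola height at the vertex
    ph = np.unwrap(np.angle(X[k - 1:k + 2]))
    # Linear interpolation of the unwrapped phase toward the neighbouring bin
    if p >= 0:
        phase = ph[1] + p * (ph[2] - ph[1])
    else:
        phase = ph[1] + p * (ph[1] - ph[0])
    return k + p, level, phase

sr, N = 16000, 512
f_true, phi0 = 1010.0, 0.3                   # 1010 Hz = bin 32.32, between bins
x = np.cos(2 * np.pi * f_true * np.arange(N) / sr + phi0)
X = zero_phase_frame_fft(x, np.hanning(N))
k = int(np.argmax(np.abs(X)))
bin_est, level_db, phase_est = refine_peak(X, k)
print(bin_est * sr / N)                      # refined frequency estimate in Hz
```

With a Hann window the parabola is only an approximation to the main lobe, so a small bias (a few hundredths of a bin here) remains; it is still far finer than the 31.25 Hz bin spacing.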

  18. Steps to sinewave modeling - 3
      • Which peaks to pick? We want ‘true’ sinusoids, not noise fluctuations
        - ‘prominence’ threshold above a smoothed spectrum
        - Figure: one frame's spectrum (level / dB against freq / Hz 0-7000)
      • Sinusoids exhibit stability...
        - of amplitude in time
        - of phase derivative in time
        → compare with adjacent time frames to test?
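One way to implement the prominence test: smooth the dB spectrum with a moving average, then keep only local maxima that rise above the smoothed curve by a threshold. The 31-bin smoothing width, the 10 dB threshold and the name `pick_peaks` are assumptions for illustration; the cross-frame stability tests are not included.

```python
import numpy as np

def pick_peaks(mag_db, smooth_bins=31, prominence_db=10.0):
    """Local maxima of a dB spectrum that rise `prominence_db` above a smoothed copy.

    smooth_bins must be odd so the moving average stays centred on each bin.
    """
    half = smooth_bins // 2
    padded = np.pad(mag_db, half, mode="edge")   # avoid droop at the band edges
    smooth = np.convolve(padded, np.ones(smooth_bins) / smooth_bins, mode="valid")
    interior = mag_db[1:-1]
    is_max = (interior > mag_db[:-2]) & (interior >= mag_db[2:])
    candidates = np.flatnonzero(is_max) + 1
    return candidates[mag_db[candidates] - smooth[candidates] > prominence_db]

rng = np.random.default_rng(0)
spec = -60.0 + rng.normal(0.0, 1.0, 257)   # synthetic noise floor with ~1 dB ripple
spec[[50, 100, 150]] += 40.0                # three sinusoid-like peaks
print(pick_peaks(spec))
```

Noise ripples are local maxima too, but they barely exceed the smoothed floor, so only the three injected peaks pass the threshold.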

  19. Steps to sinewave modeling - 4
      • ‘Grow’ tracks by appending newly found peaks to existing tracks:
        - Figure: existing tracks in the frequency-time plane, new peaks, track birth and death
        - ambiguous assignments are possible
      • Unclaimed new peak
        - ‘birth’ of a new track
        - backtrack to find the earliest trace?
      • No continuation peak for an existing track
        - ‘death’ of the track
        - or: reduce the peak threshold (hysteresis)
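A minimal greedy version of track growing, assuming peaks arrive as a list of frequencies per frame: each existing track claims the nearest unclaimed new peak within a frequency tolerance, unclaimed peaks give birth to new tracks, and tracks with no continuation die. The 50 Hz tolerance and the greedy, first-come resolution of ambiguous assignments are assumptions; the slide leaves the matching rule open, and this sketch does no backtracking or threshold hysteresis. `grow_tracks` is a hypothetical name.

```python
def grow_tracks(frames, max_jump_hz=50.0):
    """frames: list of lists of peak frequencies (Hz), one list per time frame.

    Returns a list of tracks; each track is a list of (frame_index, freq).
    """
    finished, active = [], []
    for t, peaks in enumerate(frames):
        unclaimed = sorted(peaks)
        still_active = []
        for track in active:
            last_f = track[-1][1]
            best = min(unclaimed, key=lambda f: abs(f - last_f), default=None)
            if best is not None and abs(best - last_f) <= max_jump_hz:
                track.append((t, best))      # continuation
                unclaimed.remove(best)
                still_active.append(track)
            else:
                finished.append(track)       # death: no continuation peak
        for f in unclaimed:
            still_active.append([(t, f)])    # birth of a new track
        active = still_active
    return finished + active

tracks = grow_tracks([[100, 200], [105, 210], [110, 400], [115, 405]])
```

In this example the 200 Hz track dies at frame 2 (the only candidate, 400 Hz, is too far away) and a new track is born at 400 Hz, while the 100 Hz track continues throughout.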
