Lecture Music Processing
Audio Decomposition
Meinard Müller
International Audio Laboratories Erlangen
meinard.mueller@audiolabs-erlangen.de
Book: Meinard Müller, Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. Springer, 2015. 483 p., 249 illus., hardcover. ISBN: 978-3-319-21944-8. Accompanying website: www.music-processing.de
In the final Chapter 8 on audio decomposition, we present a challenging research direction that is closely related to source separation. Within this wide research area, we consider three subproblems: harmonic–percussive separation, main melody extraction, and score-informed audio decomposition. In these scenarios, we discuss several key techniques, including instantaneous frequency estimation, fundamental frequency (F0) estimation, spectrogram inversion, and nonnegative matrix factorization (NMF). Furthermore, we revisit a number of acoustic and musical properties of audio recordings that were introduced and discussed in previous chapters, which rounds off the book.
8.1 Harmonic-Percussive Separation
8.2 Melody Extraction
8.3 NMF-Based Audio Decomposition
8.4 Further Notes
Example: Chopin, Mazurka Op. 63 No. 3
[Figure: waveform (amplitude over time in seconds) and spectrogram (frequency in Hz over time in seconds)]
– Tempo
– Dynamics
– Note deviations
– Sustain pedal
Musical voices: main melody, accompaniment, additional melody line
Mixture → harmonic component (clearly harmonic sounds) + percussive component (clearly percussive sounds)
Mixture → harmonic component + residual component + percussive component
Example mixture: sounds of singing voice and accompaniment; plosives in singing voice
Demo: https://www.audiolabs-erlangen.de/resources/2014-ISMIR-ExtHPSep/
[Figure: harmonic component, percussive component, residual component]
Literature: [Driedger/Müller/Disch, ISMIR 2014]
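The harmonic/percussive/residual split can be sketched with median filtering, in the spirit of the cited ISMIR 2014 extension: filtering a magnitude spectrogram along time enhances horizontal (harmonic) structures, filtering along frequency enhances vertical (percussive) structures, and a separation factor beta ≥ 1 leaves bins that are neither clearly harmonic nor clearly percussive for the residual. This is a minimal NumPy sketch on a toy spectrogram; the function names, filter lengths, and beta=2 are assumptions, and a full system would apply the masks to the complex STFT and invert it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_axis(X, length, axis):
    """Median-filter a 2D array along one axis (odd `length`, edge padding)."""
    half = length // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (half, half)
    Xp = np.pad(X, pad, mode="edge")
    # windows has shape X.shape + (length,): one sliding window per bin
    windows = sliding_window_view(Xp, length, axis=axis)
    return np.median(windows, axis=-1)


def hpr_masks(Y, len_h=17, len_p=17, beta=2.0, eps=1e-10):
    """Binary harmonic, percussive, and residual masks for Y (frequency x time)."""
    Y_h = median_filter_axis(Y, len_h, axis=1)  # along time: keeps horizontal lines
    Y_p = median_filter_axis(Y, len_p, axis=0)  # along frequency: keeps vertical lines
    M_h = Y_h / (Y_p + eps) > beta
    M_p = Y_p / (Y_h + eps) >= beta
    M_r = ~(M_h | M_p)  # neither clearly harmonic nor clearly percussive
    return M_h, M_p, M_r


# Toy spectrogram: a stationary tone (row 20) and a click (column 40)
Y = np.zeros((64, 64))
Y[20, :] = 1.0
Y[:, 40] = 1.0
M_h, M_p, M_r = hpr_masks(Y)
harmonic, percussive, residual = M_h * Y, M_p * Y, M_r * Y
```

With beta ≥ 1 the three masks partition the time-frequency plane, so the three components add up to the input; the bin where tone and click overlap ends up in the residual.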
Original recording → estimate singing voice + estimate accompaniment
Cascaded decomposition: HPR separation into harmonic, residual, and percussive components, followed by further decomposition steps labeled MR, TR, and SL, guided by an F0 annotation
Singing voice parts: harmonic portion, fricatives, vibrato & formants
Accompaniment parts: harmonic portion, instrument onsets, diffuse instrument sounds
Estimate singing voice = sum (+) of the singing voice parts; estimate accompaniment = sum (+) of the accompaniment parts
[Figure: spectrograms (frequency over time) of the components]
Score-informed decomposition: exploit the musical score to support the separation process
[Figure: score-based representations (pitch over time) and spectrograms (frequency in Hz over time in seconds)]
Steps: render, estimate parameters, rebuild spectrogram information
NMF: magnitude spectrogram V (K × N) ≈ templates W (K × M) · activations H (M × N)
Templates: pitch + timbre (“How does it sound?”)
Activations: onset time + duration (“When does it sound?”)
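The factorization V ≈ W · H can be computed with the classic multiplicative update rules of Lee and Seung, which keep all entries nonnegative. The following NumPy sketch uses the Euclidean (Frobenius) cost; the cost function, iteration count, and function name are assumptions, and other variants (e.g., Kullback-Leibler divergence) are common in audio applications.

```python
import numpy as np


def nmf_mu(V, W_init, H_init, num_iter=200, eps=1e-10):
    """NMF V ≈ W @ H via multiplicative updates minimizing ||V - W H||_F^2.

    V: (K, N) nonnegative magnitude spectrogram (frequency x time)
    W_init: (K, M) initial templates, H_init: (M, N) initial activations
    """
    W = W_init.astype(float)  # astype returns a copy, inputs stay untouched
    H = H_init.astype(float)
    for _ in range(num_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update templates
    return W, H


rng = np.random.default_rng(0)
K, N, M = 40, 60, 3
V = rng.random((K, M)) @ rng.random((M, N))  # synthetic rank-3 "spectrogram"
W0, H0 = rng.random((K, M)), rng.random((M, N))  # random initialization
W, H = nmf_mu(V, W0, H0, num_iter=500)
err_init = np.linalg.norm(V - W0 @ H0)
err = np.linalg.norm(V - W @ H)
```

The updates never increase the cost, but with random initialization nothing ties a column of W to a particular pitch.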
Random initialization
[Figure: initialized templates (frequency over note number) and initialized activations (note number over time), together with the learnt templates and learnt activations]
Random initialization → No semantic meaning
Constrained initialization
[Figure: initialized templates (frequency over note number) with the template constraint for p=55, initialized activations (note number over time) with the activation constraints for p=55, the learnt templates and learnt activations, and the original (Org) and model spectrograms]
Constrained initialization → NMF as refinement
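A score-informed constrained initialization can be sketched as follows: each template is nonzero only near the harmonics of its MIDI pitch (e.g., p=55), and each activation row is nonzero only around the note's onset and offset as given by the score. Because multiplicative updates multiply entries, zeros set at initialization stay zero, so NMF only refines the remaining entries. The frequency grid, the tolerances, the hypothetical note list, and all function names below are assumptions for illustration.

```python
import numpy as np


def pitch_to_freq(p):
    """Center frequency in Hz of MIDI pitch p (A4 = pitch 69 = 440 Hz)."""
    return 440.0 * 2 ** ((p - 69) / 12)


def template_constraint(p, freqs, num_harmonics=10, tol=0.5):
    """Ones in bands of +/- tol semitones around the harmonics of pitch p."""
    w = np.zeros(len(freqs))
    f0 = pitch_to_freq(p)
    for h in range(1, num_harmonics + 1):
        lo, hi = h * f0 * 2 ** (-tol / 12), h * f0 * 2 ** (tol / 12)
        w[(freqs >= lo) & (freqs <= hi)] = 1.0
    return w


def activation_constraint(onset, offset, num_frames, tol=2):
    """Ones from frame (onset - tol) up to frame (offset + tol), zeros elsewhere."""
    h = np.zeros(num_frames)
    h[max(onset - tol, 0):min(offset + tol, num_frames)] = 1.0
    return h


# Hypothetical score: (MIDI pitch, onset frame, offset frame)
notes = [(55, 10, 40), (60, 30, 70)]
Fs, N_fft, num_frames = 22050, 1024, 100  # assumed STFT parameters
freqs = np.arange(N_fft // 2 + 1) * Fs / N_fft
W0 = np.stack([template_constraint(p, freqs) for p, _, _ in notes], axis=1)
H0 = np.stack([activation_constraint(on, off, num_frames) for _, on, off in notes])

# Multiplicative updates (Euclidean cost) refine the constrained initialization
V = np.random.default_rng(0).random((len(freqs), num_frames))  # stand-in spectrogram
W, H = W0.copy(), H0.copy()
for _ in range(50):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-10)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-10)
```

With coarse frequency resolution, very narrow bands around low harmonics may contain no bin at all, so the tolerance has to be chosen with the STFT parameters in mind.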
[Figure: two spectrogram excerpts (frequency in Hertz over time in seconds), one marked at 523 Hz and one at 554 Hz]
Application: Audio editing
Demo: https://www.audiolabs-erlangen.de/resources/MIR/2016-IEEE-TASLP-DrumSeparation Literature: [Dittmar/Müller, IEEE/ACM-TASLP 2016]
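The audio-editing figure marks frequencies of about 523 Hz and 554 Hz. Assuming equal temperament with A4 = 440 Hz, these correspond to MIDI pitches 72 (C5) and 73 (C#5), so the edit would amount to a pitch shift of one semitone, i.e., a frequency ratio of 2^(1/12) ≈ 1.0595. This reading of the figure is an assumption; the two hypothetical helpers below merely check the arithmetic.

```python
import math


def freq_to_pitch(f, ref=440.0):
    """Fractional MIDI pitch of frequency f in Hz (A4 = pitch 69 = ref Hz)."""
    return 69 + 12 * math.log2(f / ref)


def shift_frequency(f, semitones):
    """Frequency after an equal-tempered shift by the given number of semitones."""
    return f * 2 ** (semitones / 12)


print(round(freq_to_pitch(523.25), 2))       # MIDI pitch of C5
print(round(shift_frequency(523.25, 1), 2))  # C5 shifted up by one semitone
```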
Remix:
Source signal: Bees
Target signal: Beatles, Let It Be
Mosaic signal: Let It Bee
Demo: https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBee Literature: [Driedger/Müller, ISMIR 2015]
Nonnegative matrix factorization (NMF) vs. proposed audio mosaicing approach
– NMF: nonnegative matrix ≈ components · activations (components and activations learned)
– Proposed approach: target's spectrogram ≈ source's spectrogram · activations (source's spectrogram fixed, activations learned); mosaic's spectrogram = source's spectrogram · learned activations
[Figure: spectrogram target (frequency over time target), spectrogram source (frequency over time source), activation matrix (time source over time target), and spectrogram mosaic (frequency over time target)]
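The mosaicing idea can be sketched as NMF with a fixed template matrix: the columns of W are the frames of the source's magnitude spectrogram, and only the activation matrix H (time source × time target) is learned so that W · H approximates the target's spectrogram; the mosaic's spectrogram is then W · H. For brevity, this NumPy sketch uses Euclidean multiplicative updates and random stand-in spectrograms; the cost function, sizes, and function name are assumptions and need not match the cited approach.

```python
import numpy as np


def learn_activations(V_target, W_source, H_init, num_iter=100, eps=1e-10):
    """Update only H so that V_target ≈ W_source @ H; W_source stays fixed."""
    H = H_init.astype(float)
    for _ in range(num_iter):
        H *= (W_source.T @ V_target) / (W_source.T @ W_source @ H + eps)
    return H


rng = np.random.default_rng(1)
K, N_source, N_target = 32, 20, 30
W_source = rng.random((K, N_source))  # stand-in: source spectrogram (frequency x time source)
V_target = rng.random((K, N_target))  # stand-in: target spectrogram (frequency x time target)
H_init = rng.random((N_source, N_target))  # activation matrix (time source x time target)
H = learn_activations(V_target, W_source, H_init)
V_mosaic = W_source @ H  # mosaic spectrogram (frequency x time target)
```

Since the templates are never updated, the mosaic can only contain spectral frames of the source, mixed in proportions chosen to mimic the target.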
Core idea: support the development of sparse diagonal activation structures
[Figure: activation matrix (time source over time target) alongside the target, source, and mosaic spectrograms]
Iterative updates: preserve temporal context
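One simple way to support diagonal activation structures, and thus preserve the temporal context of the source, is to filter the activation matrix along its diagonals between iterative updates: runs of consecutive source frames activated at consecutive target frames reinforce each other, while isolated entries do not. The function below is a hypothetical helper illustrating such diagonal filtering; the filter length and where it is applied within the update loop are assumptions.

```python
import numpy as np


def diagonal_filter(H, length=3):
    """Sum H along the main-diagonal direction: out[m, n] = sum_l H[m+l, n+l], |l| <= length//2."""
    half = length // 2
    M, N = H.shape
    Hp = np.pad(H, half)  # zero padding, so Hp[i, j] = H[i - half, j - half]
    out = np.zeros(H.shape, dtype=float)
    for l in range(-half, half + 1):
        out += Hp[half + l:half + l + M, half + l:half + l + N]
    return out


# Toy activation matrix: a diagonal run plus one isolated activation
H = np.zeros((10, 10))
for i in range(2, 8):
    H[i, i] = 1.0
H[2, 7] = 1.0
H_filt = diagonal_filter(H, length=3)
```

Inside a run, the filtered value grows with the run length, whereas the isolated activation keeps its original value, which favors diagonal structures in subsequent updates.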
Source signal: Whales
Target signal: Chic, Good Times
Mosaic signal
Source signal: Race car
Target signal: Adele, Rolling in the Deep
Mosaic signal
https://www.sisec17.audiolabs-erlangen.de/
http://steinhardt.nyu.edu/marl/research/medleydb
https://librosa.github.io/librosa/