Audio Decomposition: Meinard Müller, International Audio Laboratories (PowerPoint Presentation)

SLIDE 1

Lecture Music Processing

Audio Decomposition

Meinard Müller
International Audio Laboratories Erlangen
meinard.mueller@audiolabs-erlangen.de

SLIDE 2

Book: Fundamentals of Music Processing

Meinard Müller, Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications

  • 483 p., 249 illus., hardcover
  • ISBN: 978-3-319-21944-8
  • Springer, 2015
  • Accompanying website: www.music-processing.de

SLIDE 5

Chapter 8: Audio Decomposition

In the final Chapter 8, on audio decomposition, we present a challenging research direction that is closely related to source separation. Within this wide research area, we consider three subproblems: harmonic–percussive separation, main melody extraction, and score-informed audio decomposition. In these scenarios, we discuss several key techniques, including instantaneous frequency estimation, fundamental frequency (F0) estimation, spectrogram inversion, and nonnegative matrix factorization (NMF). Furthermore, we revisit a number of acoustic and musical properties of audio recordings that were introduced in previous chapters; this rounds off the book.

  • 8.1 Harmonic–Percussive Separation
  • 8.2 Melody Extraction
  • 8.3 NMF-Based Audio Decomposition
  • 8.4 Further Notes

SLIDE 6

Why is Music Processing Challenging?

Example: Chopin, Mazurka Op. 63 No. 3

SLIDE 7

Why is Music Processing Challenging?

  • Waveform

Example: Chopin, Mazurka Op. 63 No. 3

[Figure: waveform; axes: time (seconds) vs. amplitude]

SLIDE 8

Why is Music Processing Challenging?

  • Waveform / Spectrogram

Example: Chopin, Mazurka Op. 63 No. 3

[Figure: spectrogram; axes: time (seconds) vs. frequency (Hz)]

SLIDE 9

Why is Music Processing Challenging?

  • Waveform / Spectrogram
  • Performance
      – Tempo
      – Dynamics
      – Note deviations
      – Sustain pedal

Example: Chopin, Mazurka Op. 63 No. 3

SLIDE 10

Why is Music Processing Challenging?

  • Waveform / Spectrogram
  • Performance
      – Tempo
      – Dynamics
      – Note deviations
      – Sustain pedal
  • Polyphony

Example: Chopin, Mazurka Op. 63 No. 3

[Figure labels: main melody, accompaniment, additional melody line]

SLIDE 11

Source Separation

  • Decomposition of an audio stream into different sound sources
  • Central task in digital signal processing
  • “Cocktail party effect”

SLIDE 12

Source Separation

  • Decomposition of an audio stream into different sound sources
  • Central task in digital signal processing
  • “Cocktail party effect”
  • Several input signals
  • Sources are assumed to be statistically independent
SLIDE 13

Source Separation (Music)

  • Main melody, accompaniment, drum track
  • Instrumental voices
  • Individual note events
  • Only mono or stereo
  • Sources are often highly dependent
SLIDE 14

Harmonic-Percussive Decomposition

Mixture:

SLIDE 15

Harmonic-Percussive Decomposition

Mixture:
  • Harmonic component: clearly harmonic sounds
  • Percussive component: clearly percussive sounds
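One common way to obtain such a split is median filtering, which the extended method cited below also builds on. Harmonic sounds appear as horizontal lines in a magnitude spectrogram and percussive sounds as vertical lines, so filtering along time enhances the former and filtering along frequency enhances the latter. The following is a minimal sketch under that assumption; the function names and filter lengths are hypothetical choices, not taken from the lecture.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(Y, length, axis):
    """Median-filter a 2D array along one axis (odd length, zero-padded borders)."""
    half = length // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (half, half)
    Y_pad = np.pad(Y, pad, mode="constant")
    windows = sliding_window_view(Y_pad, length, axis=axis)
    return np.median(windows, axis=-1)


def hp_binary_masks(Y, len_h=17, len_p=17):
    """Binary harmonic/percussive masks for a magnitude spectrogram Y (freq x time).

    Harmonic sounds form horizontal lines (filter along time);
    percussive sounds form vertical lines (filter along frequency).
    """
    Y_h = median_filter_1d(Y, len_h, axis=1)  # harmonic-enhanced spectrogram
    Y_p = median_filter_1d(Y, len_p, axis=0)  # percussion-enhanced spectrogram
    M_h = Y_h >= Y_p  # bin is assigned to the harmonic component
    M_p = ~M_h        # all other bins go to the percussive component
    return M_h, M_p
```

Applying the masks to the complex STFT of the mixture and inverting the result (spectrogram inversion) would give the two component signals.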

SLIDE 16

Harmonic-Percussive Decomposition

Mixture:
  • Harmonic component: clearly harmonic sounds
  • Percussive component: clearly percussive sounds
  • Residual component

SLIDE 17

Harmonic-Percussive Decomposition

Mixture:

  • Harmonic component: clearly harmonic sounds of singing voice and accompaniment
  • Percussive component: drum hits; fricatives & plosives in singing voice
  • Residual component: noise-like sounds; vibrato/glissando sounds

Demo: https://www.audiolabs-erlangen.de/resources/2014-ISMIR-ExtHPSep/

Literature: [Driedger/Müller/Disch, ISMIR 2014]
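The residual can be obtained by requiring a clear margin between the harmonic-enhanced and percussion-enhanced magnitudes. The sketch below assumes such a separation factor β: a bin counts as harmonic only if its harmonic-enhanced value is at least β times the percussive one, as percussive only if the reverse holds strictly, and everything else goes to the residual. The enhanced spectrograms `Y_h`, `Y_p` and the name `hpr_masks` are assumptions of this sketch.

```python
import numpy as np


def hpr_masks(Y_h, Y_p, beta=2.0):
    """Three-way masks from harmonic- and percussion-enhanced spectrograms.

    Y_h, Y_p: nonnegative arrays of equal shape (e.g. median-filtered along
    time and along frequency). beta >= 1 is the separation factor; with
    beta = 1 the residual is empty (plain harmonic/percussive split).
    """
    M_h = Y_h >= beta * Y_p      # clearly harmonic bins
    M_p = Y_p > beta * Y_h       # clearly percussive bins
    M_r = ~(M_h | M_p)           # neither: residual component
    return M_h, M_p, M_r
```

Larger values of β move more bins into the residual. librosa's `librosa.decompose.hpss` exposes a comparable idea through its `margin` parameter.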

SLIDE 18

Singing Voice Extraction

Original recording → singing voice + accompaniment

SLIDE 19

Singing Voice Extraction

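Original recording → HPR decomposition into harmonic, percussive, and residual components (the slide also shows the labels MR, TR, SL). Using an F0 annotation, the components are further divided into singing-voice and accompaniment parts:

  • Harmonic component: harmonic portion of the singing voice / harmonic portion of the accompaniment
  • Percussive component: fricatives of the singing voice / instrument onsets of the accompaniment
  • Residual component: vibrato & formants of the singing voice / diffuse instrument sounds of the accompaniment

Summing (+) the singing-voice parts yields the estimate of the singing voice; summing the accompaniment parts yields the estimate of the accompaniment.

[Figure axes: time vs. frequency]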
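As an illustration of how an F0 annotation could split the harmonic component, the sketch below builds a binary mask that marks bins near integer multiples of the annotated F0 in each frame. This is an assumption for illustration, not necessarily the procedure used on the slide; the function name, number of harmonics, and tolerance in Hz are hypothetical.

```python
import numpy as np


def f0_harmonic_mask(f0, freqs, num_harmonics=10, tol_hz=20.0):
    """Binary mask marking bins close to harmonics of an F0 trajectory.

    f0:    annotated F0 per frame in Hz (0 means unvoiced), shape (T,)
    freqs: center frequency of each spectrogram bin in Hz, shape (F,)
    Returns a boolean array of shape (F, T).
    """
    f0 = np.asarray(f0, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    mask = np.zeros((freqs.size, f0.size), dtype=bool)
    for h in range(1, num_harmonics + 1):
        # distance of every bin to the h-th harmonic in every frame
        dist = np.abs(freqs[:, None] - h * f0[None, :])
        mask |= dist <= tol_hz
    mask[:, f0 <= 0] = False  # unvoiced frames: no singing-voice harmonics
    return mask
```

Multiplying the harmonic component's spectrogram by `mask` and by `~mask` would give the voice and accompaniment portions, respectively.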

SLIDE 20

Score-Informed Source Separation

Exploit the musical score to support the separation process

[Figure: three panels with axes time vs. pitch]

SLIDE 21

Parametric Model Approach

Spectrogram → estimate parameters → render → rebuild spectrogram information

[Figure: two spectrograms; axes: time (seconds) vs. frequency (Hz)]
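To make the render step concrete, here is a toy parametric model. It is an assumption for illustration, not the model used in the lecture: a note is described by its F0, the amplitudes of its partials, and its onset and offset frames, and rendering turns these parameters into a magnitude spectrogram. Estimating the parameters would then mean adjusting them until the rendered spectrogram matches the recording. The function name and the Gaussian peak shape are hypothetical.

```python
import numpy as np


def render_note(freqs, num_frames, f0, partial_amps, onset, offset, width_hz=15.0):
    """Render a magnitude spectrogram (F x T) for one harmonic note.

    Partial h (h = 1, 2, ...) is a Gaussian bump centred at h * f0 with
    amplitude partial_amps[h-1]; the note is active in frames [onset, offset).
    """
    freqs = np.asarray(freqs, dtype=float)
    spectrum = np.zeros(freqs.size)
    for h, amp in enumerate(partial_amps, start=1):
        spectrum += amp * np.exp(-0.5 * ((freqs - h * f0) / width_hz) ** 2)
    activation = np.zeros(num_frames)
    activation[onset:offset] = 1.0
    # spectral shape times temporal envelope
    return np.outer(spectrum, activation)
```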

SLIDE 22

NMF (Nonnegative Matrix Factorization)

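$V \approx W \cdot H$ with $V \in \mathbb{R}_{\geq 0}^{N \times M}$, $W \in \mathbb{R}_{\geq 0}^{N \times K}$, and $H \in \mathbb{R}_{\geq 0}^{K \times M}$ (all entries ≥ 0)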

SLIDE 23

NMF (Nonnegative Matrix Factorization)

Magnitude spectrogram (N × M) ≈ templates (N × K) · activations (K × M)

  • Templates: pitch + timbre, “How does it sound?”
  • Activations: onset time + duration, “When does it sound?”
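A minimal sketch of computing such a factorization, assuming the classic multiplicative update rules for the Euclidean distance between V and W·H (the slides do not specify the cost function). The function name and parameters are hypothetical.

```python
import numpy as np


def nmf(V, K, num_iter=200, eps=1e-10, seed=0):
    """Factorize nonnegative V (N x M) into templates W (N x K) and activations H (K x M).

    Multiplicative updates for the Euclidean distance ||V - W H||,
    starting from a random nonnegative initialization.
    """
    rng = np.random.default_rng(seed)
    N, M = V.shape
    W = rng.random((N, K))
    H = rng.random((K, M))
    for _ in range(num_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because every update multiplies by a nonnegative factor, W and H stay nonnegative. With random initialization, however, nothing ties a given column of W to a particular pitch.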

SLIDE 24

NMF-Decomposition

[Figure: initialized templates (frequency vs. note number) and initialized activations (note number vs. time)]

Random initialization

SLIDE 25

NMF-Decomposition

[Figure: initialized and learnt templates (frequency vs. note number); initialized and learnt activations (note number vs. time)]

Random initialization → no semantic meaning

SLIDE 26

NMF-Decomposition

[Figure: initialized templates (frequency vs. note number) and initialized activations (note number vs. time)]

Constrained initialization

SLIDE 27

NMF-Decomposition

[Figure: initialized templates (frequency vs. note number) and initialized activations (note number vs. time), with the template constraint and the activation constraints for pitch p = 55 highlighted]

Constrained initialization
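A sketch of how such constraints could be built, assuming MIDI pitch numbers (so p = 55 is G3, about 196 Hz), a tolerance of half a semitone around each harmonic, and a score given as (pitch, start frame, end frame) tuples. All names and tolerances are hypothetical.

```python
import numpy as np


def pitch_to_hz(p):
    """MIDI pitch to frequency in Hz (pitch 69 = A4 = 440 Hz)."""
    return 440.0 * 2 ** ((p - 69) / 12)


def template_constraint(freqs, p, num_harmonics=8, tol_semitones=0.5):
    """Template for pitch p: 1 in bands around its harmonics, 0 elsewhere."""
    freqs = np.asarray(freqs, dtype=float)
    template = np.zeros(freqs.size)
    f0 = pitch_to_hz(p)
    for h in range(1, num_harmonics + 1):
        lo = h * f0 * 2 ** (-tol_semitones / 12)
        hi = h * f0 * 2 ** (tol_semitones / 12)
        template[(freqs >= lo) & (freqs <= hi)] = 1.0
    return template


def activation_constraint(num_frames, notes, p, tol_frames=2):
    """Activation row for pitch p: 1 around score notes of that pitch, 0 elsewhere.

    notes: list of (pitch, start_frame, end_frame) tuples taken from the score.
    """
    row = np.zeros(num_frames)
    for pitch, start, end in notes:
        if pitch == p:
            row[max(start - tol_frames, 0):min(end + tol_frames, num_frames)] = 1.0
    return row
```

Stacking one template per pitch as the columns of W and one activation row per pitch as the rows of H gives an initialization in which every entry outside the constraints is zero.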

SLIDE 28

NMF-Decomposition

[Figure: initialized and learnt templates (frequency vs. note number), initialized and learnt activations (note number vs. time), and original (“Org”) vs. model spectrogram]

Constrained initialization → NMF as refinement
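Why a constrained initialization turns NMF into a refinement step can be seen from the multiplicative update rules. They are written here for the Euclidean distance, which is an assumption since the slides do not state the cost function, with magnitude spectrogram V (N × M), templates W (N × K), and activations H (K × M), and assuming positive denominators. Each entry is multiplied by a nonnegative factor, so entries initialized to zero remain zero in every iteration:

```latex
\begin{aligned}
H_{km} &\leftarrow H_{km} \cdot \frac{(W^\top V)_{km}}{(W^\top W H)_{km}}, &
W_{nk} &\leftarrow W_{nk} \cdot \frac{(V H^\top)_{nk}}{(W H H^\top)_{nk}}, \\
H_{km} = 0 &\;\Rightarrow\; H_{km} \cdot \frac{(W^\top V)_{km}}{(W^\top W H)_{km}} = 0, &
W_{nk} = 0 &\;\Rightarrow\; W_{nk} \cdot \frac{(V H^\top)_{nk}}{(W H H^\top)_{nk}} = 0.
\end{aligned}
```

The learnt templates and activations therefore keep the structure imposed by the score and only adjust the nonzero entries.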

SLIDE 29

Score-Informed Audio Decomposition

[Figure: two spectrogram excerpts (frequency in Hz vs. time in seconds) with the marked frequencies 523 Hz and 554 Hz, respectively]

Application: Audio editing

SLIDE 30

Informed Drum-Sound Decomposition

Demo: https://www.audiolabs-erlangen.de/resources/MIR/2016-IEEE-TASLP-DrumSeparation
Literature: [Dittmar/Müller, IEEE/ACM-TASLP 2016]

Remix:

SLIDE 31

Audio Mosaicing

  • Source signal: Bees
  • Target signal: Beatles – Let It Be
  • Mosaic signal: Let It Bee

Demo: https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBee
Literature: [Driedger/Müller, ISMIR 2015]

SLIDE 32

NMF-Inspired Audio Mosaicing

  • Nonnegative matrix factorization (NMF): nonnegative matrix (fixed) ≈ components (learned) · activations (learned)
  • Proposed audio mosaicing approach: target’s spectrogram (fixed) ≈ source’s spectrogram (fixed) · activations (learned); mosaic’s spectrogram = source’s spectrogram · activations

[Figure axes: spectrograms are frequency × time (source or target); the activation matrix is time (source) × time (target)]
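A minimal sketch of the learning step, assuming Euclidean multiplicative updates (the slides do not specify the cost). Only the activations are updated, while the source spectrogram plays the role of a fixed template matrix whose columns are the source frames. The function name is hypothetical.

```python
import numpy as np


def mosaic_activations(V_target, W_source, num_iter=100, eps=1e-10, seed=0):
    """Learn activations H so that W_source @ H approximates V_target.

    V_target: target magnitude spectrogram (frequency x target frames)
    W_source: source magnitude spectrogram (frequency x source frames), kept fixed
    Returns H with shape (source frames x target frames).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W_source.shape[1], V_target.shape[1]))
    for _ in range(num_iter):
        # only H is updated; the "templates" are the fixed source frames
        H *= (W_source.T @ V_target) / (W_source.T @ W_source @ H + eps)
    return H
```

The mosaic's spectrogram is then `W_source @ H`, which has the target's time axis. Turning it back into audio is a separate spectrogram-inversion step that this sketch does not cover.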

SLIDE 33

NMF-Inspired Audio Mosaicing

Spectrogram source (frequency × time source) · activation matrix (time source × time target) = spectrogram mosaic (frequency × time target) ≈ spectrogram target (frequency × time target)

SLIDE 34

NMF-Inspired Audio Mosaicing

Spectrogram source (frequency × time source) · activation matrix (time source × time target) = spectrogram mosaic (frequency × time target) ≈ spectrogram target (frequency × time target)

Core idea: support the development of sparse diagonal activation structures in the activation matrix
  • Iterative updates
  • Preserve temporal context
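One way to encourage the sparse diagonal structures mentioned above is to filter the activation matrix along its diagonals between iterative updates. Entry (s, t) means that source frame s is used at target frame t, so playing the source in its original order appears as a diagonal run. This is a sketch of the idea only; the actual update rules of the mosaicing method may differ, and the name `diagonal_smooth` and the window length are hypothetical.

```python
import numpy as np


def diagonal_smooth(H, length=3):
    """Sum activations over a window along each diagonal of H (source x target).

    Diagonal runs (s, t), (s+1, t+1), ... reinforce each other, while
    isolated peaks gain nothing, which favours preserving temporal context.
    """
    S, T = H.shape
    half = length // 2
    out = np.zeros_like(H)
    for k in range(-half, half + 1):
        # out[s, t] += H[s + k, t + k] wherever both indices are valid
        s_lo, s_hi = max(0, -k), min(S, S - k)
        t_lo, t_hi = max(0, -k), min(T, T - k)
        out[s_lo:s_hi, t_lo:t_hi] += H[s_lo + k:s_hi + k, t_lo + k:t_hi + k]
    return out
```

In an iterative scheme, such a filter would be applied to H between multiplicative updates, so that activations belonging to continuous diagonal runs gain weight relative to isolated ones.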

SLIDE 37

Audio Mosaicing

  • Source signal: Whales
  • Target signal: Chic – Good Times
  • Mosaic signal

SLIDE 38

Audio Mosaicing

  • Source signal: Race car
  • Target signal: Adele – Rolling in the Deep
  • Mosaic signal

SLIDE 39

Links

  • SiSEC: Signal Separation Evaluation Campaign

https://www.sisec17.audiolabs-erlangen.de/

  • MedleyDB: A Dataset of Multitrack Audio

http://steinhardt.nyu.edu/marl/research/medleydb

  • LibROSA (Python)

https://librosa.github.io/librosa/