Nonnegative Tensor Factorization for Source Separation of Loops in Audio

SLIDE 1

Nonnegative Tensor Factorization for Source Separation of Loops in Audio

Jordan B. L. Smith

National Institute of Advanced Industrial Science and Technology (AIST), Japan

Masataka Goto

National Institute of Advanced Industrial Science and Technology (AIST), Japan

SLIDE 2

Introduction

SLIDE 3
Extracting loops from music

  • In some musical styles, songs are built from loops, e.g.:
  • 1. Collection of loops: A, B, C, D
  • 2. Loops arranged to make a song
  • 3. Song mixed down to audio

[Figure: timeline from 0:00 to 1:00 showing when each loop (A, B, C, D) is active, with layers Drum, Melody, Bass, FX; loops → composition process → audio]

Audio examples (and test data) all borrowed from [López-Serrano et al. 2016]

SLIDE 4
Extracting loops from music

  • In some musical styles, songs are built from loops, e.g.:
  • 1. Collection of loops: A, B, C, D
  • 2. Loops arranged to make a song
  • 3. Song mixed down to audio
  • Goal: decompose the audio signal to recover:
  • the layout of the song
  • the source-separated loops

[Figure: the same timeline of loops A to D (layers Drum, Melody, Bass, FX), read in reverse: loops ← decomposition procedure ← audio]

SLIDE 5
Extracting loops from music

  • Two previous approaches that inspired us:
  • Fingerprint-based loop detection [López-Serrano et al. 2016]
  • Inputs: original loops (A, B, C, D) + mixed audio
  • Output: map of loop activations
  • Iterative NMF [Seetharaman & Pardo 2016]
  • Inputs: mixed audio + assumption that loops are introduced additively
  • Output: separated tracks, one per loop (A, B, C, D)
SLIDE 6
Extracting loops from music

  • Our proposed system:
  • Input: mixed audio
  • Outputs: map of loop activations + separated tracks, one per loop (A, B, C, D)
  • We attempt to solve both problems in one step, without assuming an additive layout
  • We do so by extending nonnegative matrix factorization (NMF) to handle periodicity

SLIDE 7

Source separation using NMF*

NMF can handle many types of repetition:

  • Steady-state notes → NMF with harmonic templates
  • Note sequences repeated in time → NMFD with time-evolving templates [Smaragdis 2004]
  • Transposed notes → NMF2D with transposed harmonic templates [e.g., FitzGerald, Cranitch & Coyle 2008]
  • Periodicity (especially at downbeats) → ...no nonnegative approach!
  • NB: REPET [Rafii, Liutkus, & Pardo 2014] exploits periodicity, but it is a median-filtering approach

SLIDE 8

Method

SLIDE 9

Nonnegative tensor factorization

  • Step 1: estimate downbeats [madmom, Böck et al. 2016]
SLIDE 11

Nonnegative tensor factorization

  • Step 1: estimate downbeats
  • Step 2: stack the 2D spectrograms into a 3D volume (a “spectral cube”)
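Steps 1 and 2 can be sketched in NumPy as follows. This is a minimal illustration under assumptions the slides do not state: the spectrogram is a nonnegative magnitude array of shape (frequency bins, frames), downbeats have already been estimated (for example with madmom) as times in seconds, and each bar is linearly resampled to a fixed, hypothetical number of frames so that bars of unequal length line up. The function names are illustrative, not the authors' code.

```python
import numpy as np


def times_to_frames(times_s, sr, hop_length):
    """Convert times in seconds to the nearest STFT frame indices."""
    return np.round(np.asarray(times_s, dtype=float) * sr / hop_length).astype(int)


def build_spectral_cube(spec, downbeat_frames, frames_per_bar=32):
    """Stack per-bar slices of a spectrogram into a (freq, time-in-bar, bar) cube.

    spec: nonnegative magnitude spectrogram, shape (M, N) = (freq bins, frames).
    downbeat_frames: strictly increasing frame indices of the downbeats;
        bar q spans frames downbeat_frames[q] up to downbeat_frames[q + 1].
    frames_per_bar: hypothetical fixed length P; each bar is linearly
        resampled to P frames (an assumption, not taken from the slides).
    """
    M = spec.shape[0]
    Q = len(downbeat_frames) - 1  # number of complete bars
    cube = np.zeros((M, frames_per_bar, Q))
    for q in range(Q):
        bar = spec[:, downbeat_frames[q]:downbeat_frames[q + 1]]
        src = np.arange(bar.shape[1])                           # original frame positions
        dst = np.linspace(0, bar.shape[1] - 1, frames_per_bar)  # resampled positions
        for m in range(M):
            cube[m, :, q] = np.interp(dst, src, bar[m])
    return cube
```

In this layout, `cube[:, :, q]` is the spectrogram of bar q, and `cube[:, t, :]` shows how one position within the bar evolves over the whole piece, which is the view the following slides visualize.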

SLIDE 14

Detour: understanding the spectral cube

[Figure: spectral cube with axes: frequency, bar number (time in piece), time in bar]

SLIDE 18

Visualizing a 3D volume: CT scan

[Figure: CT scan slices taken bottom to top, back to front, and left to right]

SLIDE 19

Visualizing a 3D volume: CT scan

[Figure: spectral cube sliced low frequency to high (frequency), beginning to end of piece (bar number), and beginning to end of a bar (time in bar)]

SLIDE 20

Nonnegative tensor factorization

  • Step 1: estimate downbeats
  • Step 2: stack the 2D spectrograms into a 3D volume (a “spectral cube”)
  • Step 3: use nonnegative tensor factorization (NTF) to model the spectral cube

SLIDE 21

Nonnegative matrix factorization

  • NMF: X ≈ W ◦ H
  • W = note templates
  • H = activation functions

Dimensions: X is M × N, W is M × r, H is r × N

  • Needs post-processing to separate sources:
  • which templates in W belong to the same source?
  • different sources could use the same harmonic components!
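For reference, plain NMF (the slide's X ≈ W ◦ H, i.e. the matrix product of W and H) can be fit with the classic multiplicative updates of Lee and Seung. The sketch below assumes the squared Euclidean cost; the cost function, rank, and iteration count are illustrative choices, and the slides do not say which NMF variant the authors started from.

```python
import numpy as np


def nmf(X, r, n_iter=200, eps=1e-12, seed=0):
    """Approximate nonnegative X (M x N) by W (M x r) @ H (r x N).

    Lee & Seung multiplicative updates for ||X - W H||_F^2; W and H stay
    nonnegative because they are only ever multiplied by nonnegative ratios.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, r)) + eps  # columns: spectral (note) templates
    H = rng.random((r, N)) + eps  # rows: activation functions over time
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Nothing in the returned W or H records which templates belong to the same source, which is exactly the post-processing problem noted on this slide.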

SLIDE 22

Nonnegative tensor factorization

  • Tucker Decomposition: X ≈ C ◦ (W ◦ H ◦ D)
  • W = note templates
  • H = activation functions (time-in-bar)
  • D = loop activation functions (time-in-piece)
  • C = core tensor = recipe for each loop type
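Written out entry by entry, the Tucker model says that each bin of the spectral cube is a weighted sum over all (note, activation, loop) triples. The index conventions here (columns of W as note templates w_i, rows of H and D as activation functions h_j and d_k, cube size M × P × Q) are inferred from the notation C(i, j, k), D[k,:], and "M P Q" used on these slides, so treat them as an assumption:

```latex
$$
X(m, p, q) \;\approx\; \sum_{i=1}^{r_W} \sum_{j=1}^{r_H} \sum_{k=1}^{r_D}
  C(i, j, k)\, W(m, i)\, H(j, p)\, D(k, q)
$$
$$
X \in \mathbb{R}_{\ge 0}^{M \times P \times Q},\quad
W \in \mathbb{R}_{\ge 0}^{M \times r_W},\quad
H \in \mathbb{R}_{\ge 0}^{r_H \times P},\quad
D \in \mathbb{R}_{\ge 0}^{r_D \times Q},\quad
C \in \mathbb{R}_{\ge 0}^{r_W \times r_H \times r_D}
$$
$$
\text{equivalently, in mode-}n\text{ product notation:}\quad
X \approx C \times_1 W \times_2 H^{\top} \times_3 D^{\top}
$$
```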

[Figure: Tucker decomposition of the M × P × Q spectral cube]
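The slides do not say how the factors are fitted. One standard option, shown here only as a sketch, extends NMF's multiplicative updates to the Tucker model: the reconstruction is linear and nonnegative in each factor separately, so the same "numerator over denominator" update can be applied to W, H, D, and C in turn. The squared Euclidean cost, the ranks, and the iteration count are assumptions, and this is not necessarily the authors' algorithm.

```python
import numpy as np

# Index letters: m = frequency, p = time in bar, q = bar number,
# i = note template, j = time-in-bar activation, k = loop type.
MODEL = "ijk,mi,jp,kq->mpq"


def reconstruct(C, W, H, D):
    """Y(m,p,q) = sum_ijk C(i,j,k) W(m,i) H(j,p) D(k,q)."""
    return np.einsum(MODEL, C, W, H, D, optimize=True)


def _mu_step(A, spec, X, Y, others, eps):
    """One multiplicative update A <- A * J^T(X) / J^T(Y) for a linear factor."""
    num = np.einsum(spec, X, *others, optimize=True)
    den = np.einsum(spec, Y, *others, optimize=True)
    return A * num / (den + eps)


def ntf_tucker(X, ranks, n_iter=100, eps=1e-12, seed=0):
    """Nonnegative Tucker fit of X (M x P x Q) minimizing ||X - Y||^2."""
    rng = np.random.default_rng(seed)
    M, P, Q = X.shape
    rW, rH, rD = ranks
    W = rng.random((M, rW)) + eps        # note templates (columns)
    H = rng.random((rH, P)) + eps        # time-in-bar activations (rows)
    D = rng.random((rD, Q)) + eps        # loop activations over bars (rows)
    C = rng.random((rW, rH, rD)) + eps   # core tensor: loop recipes
    for _ in range(n_iter):
        W = _mu_step(W, "mpq,ijk,jp,kq->mi", X, reconstruct(C, W, H, D), (C, H, D), eps)
        H = _mu_step(H, "mpq,ijk,mi,kq->jp", X, reconstruct(C, W, H, D), (C, W, D), eps)
        D = _mu_step(D, "mpq,ijk,mi,jp->kq", X, reconstruct(C, W, H, D), (C, W, H), eps)
        C = _mu_step(C, "mpq,mi,jp,kq->ijk", X, reconstruct(C, W, H, D), (W, H, D), eps)
    return C, W, H, D
```

Recomputing the reconstruction before each factor's update keeps every step a proper block-wise multiplicative update; caching it would be faster but changes the update order.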

SLIDE 23

Interpreting the NTF model

  • W, H, and D are all musically intuitive:

[Figure: W, H, and D for an example song, with the loop activations laid out over the bars of the song]

Loop template activations directly estimate the layout of the song

SLIDE 24

Interpreting the NTF model

  • Core tensor C = recipe for each loop type

[Figure: loop recipes (C)]

  • Pixel C(i, j, k) tells us to play note w_i with activation function h_j whenever loop d_k appears.

Example recipe: (w4, h7) + (w11, h10) + (w24, h16)
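Concretely, the one-bar spectrogram of loop k is the sum of outer products C(i, j, k) w_i h_j over the recipe entries, which collapses to a single matrix product. The sketch below assumes w_i is column i of W and h_j is row j of H; the sizes are hypothetical, and the example recipe reuses the slide's index numbers as 0-based indices purely for illustration.

```python
import numpy as np


def loop_bar_template(C, W, H, k):
    """One-bar spectrogram (M x P) of loop k: sum_ij C[i, j, k] * outer(W[:, i], H[j, :])."""
    return W @ C[:, :, k] @ H


# Illustrative recipe mirroring (w4, h7) + (w11, h10) + (w24, h16).
rng = np.random.default_rng(0)
W = rng.random((64, 25))   # hypothetical: 64 frequency bins, 25 note templates
H = rng.random((20, 32))   # hypothetical: 20 activation functions, 32 frames per bar
C = np.zeros((25, 20, 4))  # hypothetical: 4 loop types
for i, j in [(4, 7), (11, 10), (24, 16)]:
    C[i, j, 0] = 1.0       # loop 0 plays note i with activation function j
bar_of_loop0 = loop_bar_template(C, W, H, 0)
```

A sparse C therefore reads directly as a short list of (note, rhythm) pairs per loop, which is what makes the core tensor interpretable.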

SLIDE 25

Interpreting the NTF model

  • Core tensor C = recipe for each loop type

[Figure: loop recipes (C)]

  • To recover the entire spectrogram: C ◦ (W ◦ H ◦ D)
  • To recover an individual loop source: C[:,:,k] ◦ (W ◦ H ◦ D[k,:])
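Both reconstructions are one `np.einsum` call each in NumPy. This is a sketch under assumed index conventions (W: M × r_W, H: r_H × P, D: r_D × Q, C: r_W × r_H × r_D); because the model is a sum over loops k, the per-loop cubes add up exactly to the full model cube.

```python
import numpy as np


def reconstruct_all(C, W, H, D):
    """Full model cube (M x P x Q): sum over i, j, k of C[i,j,k] * W[:,i] x H[j,:] x D[k,:]."""
    return np.einsum("ijk,mi,jp,kq->mpq", C, W, H, D)


def reconstruct_loop(C, W, H, D, k):
    """Model cube for loop k alone, using only C[:, :, k] and D[k, :]."""
    return np.einsum("ij,mi,jp,q->mpq", C[:, :, k], W, H, D[k])


rng = np.random.default_rng(0)
C, W, H, D = (rng.random(s) for s in [(5, 4, 3), (16, 5), (4, 8), (3, 10)])
full = reconstruct_all(C, W, H, D)
loops = [reconstruct_loop(C, W, H, D, k) for k in range(D.shape[0])]
```

Turning a loop's cube back into audio also requires unstacking the bars into a time-frequency image and inverting it (for example by masking the mixture STFT); the slides do not specify that step, so it is left out here.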
SLIDE 26

Evaluation

SLIDE 27

Evaluation

  • We used synthetic data [López-Serrano et al. 2016]
  • 7 sets of loops × 3 different layouts (arrangements)
  • Algorithm output 1: separated signals
  • Evaluate quality with SDR, SIR, SAR
  • Algorithm output 2: loop layout
  • Evaluate accuracy with correlation

[Figure: estimated map vs. ground-truth map; estimated source tracks vs. stem tracks]
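The slides do not say exactly how the layout correlation is computed. One plausible version, sketched below, flattens the estimated and ground-truth (loop × bar) activation maps and takes the Pearson correlation after matching estimated loops to true loops by trying every row permutation. The matching step and the example layout (loops entering one by one and then continuing to play) are assumptions for illustration.

```python
import itertools

import numpy as np


def layout_correlation(D_est, D_true):
    """Best Pearson correlation between two (loops x bars) maps over row orders.

    Brute force over permutations is fine for a handful of loops; for many
    loops, an assignment solver would be the usual replacement.
    """
    best_r, best_perm = -np.inf, None
    for perm in itertools.permutations(range(D_est.shape[0])):
        r = np.corrcoef(D_est[list(perm)].ravel(), D_true.ravel())[0, 1]
        if r > best_r:
            best_r, best_perm = r, perm
    return best_r, best_perm


# Hypothetical ground truth: loops A-D enter in bars 1, 2, 4, 5 and keep playing.
D_true = np.array([[1, 1, 1, 1, 1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1],
                   [0, 0, 0, 1, 1, 1, 1],
                   [0, 0, 0, 0, 1, 1, 1]], dtype=float)
D_est = 0.5 * D_true[[2, 0, 3, 1]]  # same layout, shuffled rows, different scale
r, perm = layout_correlation(D_est, D_true)
```

Correlation ignores the overall scale of the activations, which suits NTF output since D's scale can be traded against C's.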
SLIDE 28

Good separation example

  • When it works, it works

[Audio/figure: collection of loops for genre “Acid” (Drum, Melody, Bass, FX) and the extracted loops 1 to 4]

SLIDE 29

Flawed separation example

[Figure: original tracks for genre “Brezo” and source-separated tracks, with their loop layouts]

SLIDE 30

Flawed separation example

[Figure: original tracks for genre “Brezo” and source-separated tracks; the two layouts are related by operations labeled “swap rows” and “substitute C”]
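The ambiguity has a simple algebraic core: relabeling the loops (permuting the rows of D together with the third mode of C), or rescaling a loop's activations while inversely rescaling its recipe, leaves the reconstruction unchanged, so the cost function cannot prefer one version over the other. The sketch below checks this numerically for those two generic cases; the slide's "substitute C" operation may be a different kind of equivalence, and the helper names are hypothetical.

```python
import numpy as np


def reconstruct(C, W, H, D):
    return np.einsum("ijk,mi,jp,kq->mpq", C, W, H, D)


def permute_loops(C, D, perm):
    """Relabel loops: reorder D's rows and C's third mode the same way."""
    perm = np.asarray(perm)
    return C[:, :, perm], D[perm]


def rescale_loops(C, D, s):
    """Scale loop k's activations by s[k] > 0 and its recipe by 1 / s[k]."""
    s = np.asarray(s, dtype=float)
    return C / s[None, None, :], D * s[:, None]


rng = np.random.default_rng(0)
C, W, H, D = (rng.random(s) for s in [(5, 4, 4), (16, 5), (4, 8), (4, 7)])
C2, D2 = permute_loops(C, D, [2, 0, 3, 1])
C3, D3 = rescale_loops(C, D, [0.5, 2.0, 3.0, 1.0])
```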

SLIDE 31

[Figure: SDR, SIR, and SAR (dB) for the proposed method, [Seetharaman & Pardo 2016], and a performance ceiling; layout correlation (0.0 to 1.0)]

Our reconstruction quality (SDR) is average. :-|
We have more noisy artifacts (SAR). :-(
We have less crosstalk (SIR) than others! :-D
We get very clean layouts (correlation)! :-D

SLIDE 32

Conclusion

SLIDE 33

Conclusion

  • Proposed method of decomposing audio into loops that:
  • Models periodicity using the spectral cube
  • Models source signals and song composition jointly
  • Tucker decomposition is musically intuitive
  • Weaknesses include:
  • Very conservative reconstructions don’t model the whole signal
  • Like NMFD, we cannot distinguish between algebraically equivalent decompositions
  • Future work: searching for repetitions at multiple hierarchical time scales

SLIDE 34

Future work: hierarchical analysis

  • Different loops in the song have different lengths and periods
  • Spectral cubes with different periods highlight different consistent repetitions

[Figure: spectral cubes with periods of 2 beats, 1 downbeat, and 4 downbeats]

SLIDE 35

Future work: hierarchical analysis

  • Different loops in the song have different lengths and periods
  • Spectral cubes with different periods highlight different consistent repetitions

[Figure: spectral cubes with periods of 2 beats, 1 downbeat, 2 downbeats, and 4 downbeats]
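If a cube has already been built with a period of one bar, cubes with longer periods (2 or 4 downbeats) can be obtained by concatenating consecutive bars along the time-in-bar axis. A minimal sketch, assuming the (frequency, time in bar, bar) layout and dropping any incomplete group at the end; shorter periods such as 2 beats would instead require re-slicing the spectrogram at beat positions, which is not shown.

```python
import numpy as np


def regroup_cube(cube, n):
    """Turn a one-bar-period cube (M, P, Q) into an n-bar-period cube (M, n*P, Q // n).

    Bars n*g .. n*g + n - 1 are laid end to end to form slice g; trailing
    bars that do not fill a whole group are dropped.
    """
    M, P, Q = cube.shape
    G = Q // n
    grouped = cube[:, :, :G * n].reshape(M, P, G, n)  # [m, p, g, b] = bar g*n + b
    return grouped.transpose(0, 3, 1, 2).reshape(M, n * P, G)


cube = np.arange(3 * 4 * 9, dtype=float).reshape(3, 4, 9)  # toy cube: 9 bars of 4 frames
cube2 = regroup_cube(cube, 2)  # period of 2 downbeats: 4 slices, 1 bar dropped
cube4 = regroup_cube(cube, 4)  # period of 4 downbeats: 2 slices, 1 bar dropped
```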

SLIDE 36

Thank you!

  • PS. Jordan is now at:
