Nonnegative Tensor Factorization for Source Separation of Loops in Audio

  1. Nonnegative Tensor Factorization for Source Separation of Loops in Audio
     Jordan B. L. Smith, National Institute of Advanced Industrial Science and Technology (AIST), Japan
     Masataka Goto, National Institute of Advanced Industrial Science and Technology (AIST), Japan

  2. Introduction

  3. Extracting loops from music
     • In some musical styles, songs are built from loops. E.g., the composition process:
       1. Collection of audio loops → 2. Loops arranged to make a song → 3. Song mixed down to audio
     [Figure: loop layout from 0:00 to 1:00, one row per loop: A = Drum, B = Melody, C = Bass, D = FX]
     • Audio examples (and test data) all borrowed from [López-Serrano et al. 2016]

  4. Extracting loops from music
     • In some musical styles, songs are built from loops. E.g., the decomposition procedure:
       1. Collection of audio loops ← 2. Loops arranged to make a song ← 3. Song mixed down to audio
     [Figure: the loop layout from slide 3]
     • Goal: decompose the audio signal to recover:
       • the layout of the song
       • the source-separated loops

  5. Extracting loops from music
     • Two previous approaches that inspired us:
     • Fingerprint-based loop detection [López-Serrano et al. 2016]
       Inputs: original loops + mixed audio → Output: map of loop activations
     • Iterative NMF [Seetharaman & Pardo 2016]
       Input: mixed audio → Output: separated tracks, one per loop (A to D)
       Assumption that loops are introduced additively

  6. Extracting loops from music
     • Our proposed system:
       Input: mixed audio → Outputs: separated tracks, one per loop (A to D), + map of loop activations
     • We attempt to solve both problems in one step, without assuming an additive layout
     • We do so by extending nonnegative matrix factorization (NMF) to handle periodicity

  7. Source separation using NMF*
     • NMF can handle many types of repetition:
       • Steady-state notes: NMF with harmonic templates
       • Note sequences repeated in time: NMFD with time-evolving templates [Smaragdis 2004]
       • Transposed notes: NMF2D with transposed harmonic templates [e.g., FitzGerald, Cranitch & Coyle 2008]
       • Periodicity (especially at downbeats): ...no nonnegative approach!
     • NB: REPET, a median-filtering approach [Rafii, Liutkus, & Pardo 2014]
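
The REPET idea mentioned above can be sketched in a few lines: split a magnitude spectrogram into segments one repeating period long, take the element-wise median across segments as a model of the repeating background, and turn it into a soft mask. This is a simplified sketch that assumes the period (in frames) is already known; REPET itself estimates the period from a beat spectrum. The function name `repet_background_mask` is hypothetical.

```python
import numpy as np


def repet_background_mask(V, period):
    """Soft mask (values in [0, 1]) for the repeating part of spectrogram V.

    V: nonnegative magnitude spectrogram, shape (n_freq, n_frames).
    period: repeating period in frames (assumed known in this sketch).
    """
    V = np.asarray(V, dtype=float)
    n_freq, n_frames = V.shape
    if not 0 < period <= n_frames:
        raise ValueError("period must be between 1 and the number of frames")
    n_seg = int(np.ceil(n_frames / period))
    # Pad with NaN so a trailing partial segment does not bias the median.
    padded = np.full((n_freq, n_seg * period), np.nan)
    padded[:, :n_frames] = V
    segments = padded.reshape(n_freq, n_seg, period)
    # Element-wise median across segments models the repeating background.
    model = np.nanmedian(segments, axis=1)
    model = np.tile(model, (1, n_seg))[:, :n_frames]
    # The repeating part cannot exceed the observed magnitude.
    background = np.minimum(model, V)
    return np.divide(background, V, out=np.ones_like(V), where=V > 0)
```

Multiplying V by the mask gives the repeating background and V times (1 - mask) the non-repeating foreground. Note that this yields a single background layer, not one track per loop.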

  8. Method

  9. Nonnegative tensor factorization
     • Step 1: estimate downbeats [madmom, Böck et al. 2016]

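
Step 1 yields beat times labelled with their position in the bar. The sketch below assumes the two-column (time in seconds, beat position) array layout returned by madmom's `DBNDownBeatTrackingProcessor`, where position 1 marks a downbeat, and converts the downbeats to spectrogram frame indices. Running madmom on audio is left out so the sketch runs without an audio file; the sample rate and hop length are assumptions about the STFT used afterwards, and `downbeat_frames` is a hypothetical name.

```python
import numpy as np


def downbeat_frames(beats, sr, hop_length):
    """Map downbeats to STFT frame indices.

    beats: array of shape (n_beats, 2) with rows (time_in_seconds, beat_position),
           where beat_position == 1 marks a downbeat.
    sr, hop_length: sample rate and STFT hop size (in samples) of the spectrogram.
    """
    beats = np.asarray(beats, dtype=float)
    downbeat_times = beats[beats[:, 1] == 1, 0]
    return np.round(downbeat_times * sr / hop_length).astype(int)


beats = [[0.5, 1], [1.0, 2], [1.5, 3], [2.0, 4], [2.5, 1]]
print(downbeat_frames(beats, sr=22050, hop_length=512))  # [ 22 108]
```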

  11. Nonnegative tensor factorization
     • Step 1: estimate downbeats
     • Step 2: stack the 2D spectrograms of the individual bars into a 3D volume (a “spectral cube”)

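
Step 2 can be sketched as follows: cut the spectrogram at the downbeat frames and stack the bars along a third axis. Bars rarely contain exactly the same number of frames, so this sketch resamples every bar to a fixed `frames_per_bar` by linear interpolation; that resampling scheme, the default of 32 frames per bar, and the name `spectral_cube` are assumptions, not details given on the slides.

```python
import numpy as np


def spectral_cube(S, downbeats, frames_per_bar=32):
    """Stack a spectrogram into a cube with axes (frequency, time in bar, bar number).

    S: magnitude spectrogram, shape (n_freq, n_frames).
    downbeats: increasing frame indices of the downbeats; bar n spans
               [downbeats[n], downbeats[n + 1]).
    """
    S = np.asarray(S, dtype=float)
    n_freq, n_frames = S.shape
    n_bars = len(downbeats) - 1
    frame_idx = np.arange(n_frames)
    cube = np.empty((n_freq, frames_per_bar, n_bars))
    for n in range(n_bars):
        start, end = downbeats[n], downbeats[n + 1]
        # Evenly spaced (possibly fractional) frame positions inside bar n.
        pos = start + (end - start) * np.arange(frames_per_bar) / frames_per_bar
        for m in range(n_freq):
            cube[m, :, n] = np.interp(pos, frame_idx, S[m])
    return cube
```

For a perfectly looped signal every slice `cube[:, :, n]` is identical; a real mixture departs from this whenever loops enter or leave.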

  14. Detour: understanding the spectral cube
     [Figure: spectral cube with axes Frequency, Time in bar, and Bar number (time in piece)]


  18. Visualizing a 3D volume: CT scan
     [Figure: CT-scan slices viewed back to front, left to right, and bottom to top]

  19. Visualizing a 3D volume: CT scan
     [Figure: spectral-cube slices viewed from beginning to end of the piece (bar number), from beginning to end of a bar (time in bar), and from low frequency to high (frequency)]
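
In array terms, the three CT-scan directions are simply the three ways of indexing the cube. A minimal sketch, assuming the axis order (frequency, time in bar, bar number); `cube_slices` and its dictionary keys are illustrative names.

```python
import numpy as np


def cube_slices(cube, freq_bin, time_in_bar, bar):
    """Return the three 2D slices through a cube with axes (freq, time in bar, bar)."""
    return {
        # One bar: an ordinary spectrogram (frequency x time in bar).
        "bar": cube[:, :, bar],
        # One position in the bar across the whole piece (frequency x bar number).
        "time_in_bar": cube[:, time_in_bar, :],
        # One frequency bin (time in bar x bar number).
        "frequency": cube[freq_bin, :, :],
    }


cube = np.random.default_rng(0).random((4, 8, 6))
views = cube_slices(cube, freq_bin=1, time_in_bar=0, bar=2)
print({name: view.shape for name, view in views.items()})
# {'bar': (4, 8), 'time_in_bar': (4, 6), 'frequency': (8, 6)}
```

Content that repeats in every bar is constant along the bar-number axis of the last two views, which is the regularity the factorization is meant to exploit.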

  20. Nonnegative tensor factorization
     • Step 1: estimate downbeats
     • Step 2: stack the 2D spectrograms of the individual bars into a 3D volume (a “spectral cube”)
     • Step 3: use nonnegative tensor factorization (NTF) to model the spectral cube

  21. Nonnegative matrix factorization
     • NMF: X ≈ W ◦ H, with X of size M × N, W of size M × r, and H of size r × N
     • W = note templates
     • H = activation functions
     • Needs post-processing to separate sources:
       • which templates in W belong to the same source?
       • different sources could use the same harmonic components!
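
For reference, a plain NMF fit can be sketched with the multiplicative updates of Lee and Seung for the squared Euclidean error. The slides do not say which cost or solver the authors use, so treat the cost function, random initialization, and iteration count here as assumptions.

```python
import numpy as np


def nmf(X, r, n_iter=200, eps=1e-9, seed=0):
    """Approximate nonnegative X (M x N) by W (M x r) @ H (r x N).

    Multiplicative updates for the squared Euclidean error ||X - W H||^2;
    eps guards against division by zero.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, r)) + eps  # note templates (columns)
    H = rng.random((r, N)) + eps  # activation functions (rows)
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

As the slide points out, nothing here groups the r columns of W by source: deciding which templates belong to the same instrument needs a separate step.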

  22. Nonnegative tensor factorization
     • Tucker decomposition: X ≈ C ◦ (W ◦ H ◦ D)
     • W = note templates
     • H = activation functions (time-in-bar)
     • D = loop activation functions (time-in-piece)
     • C = core tensor = recipe for each loop type
     [Figure: diagram of the Tucker decomposition]
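
Written out element by element, the Tucker model reads X[m, p, n] ≈ Σ_{i,j,k} C[i, j, k] · W[m, i] · H[p, j] · D[n, k], which `np.einsum` expresses directly. The fitting loop below is one possible solver, multiplicative updates for the squared Euclidean error applied to each factor and then to the core; the slides do not specify the authors' solver, so the cost, initialization, and ranks are assumptions, and `tucker` / `ntf_tucker` are hypothetical names. In this sketch D is stored as bars × loops.

```python
import numpy as np


def tucker(C, W, H, D):
    """X_hat[m, p, n] = sum_{i,j,k} C[i,j,k] * W[m,i] * H[p,j] * D[n,k]."""
    return np.einsum("ijk,mi,pj,nk->mpn", C, W, H, D, optimize=True)


def ntf_tucker(X, ranks, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative Tucker decomposition of a spectral cube X (freq x time-in-bar x bar).

    ranks = (number of note templates, number of in-bar activations, number of loops).
    Returns C (core), W (freq x templates), H (time-in-bar x activations),
    D (bar x loops).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    (M, P, N), (rw, rh, rd) = X.shape, ranks
    W = rng.random((M, rw)) + eps
    H = rng.random((P, rh)) + eps
    D = rng.random((N, rd)) + eps
    C = rng.random((rw, rh, rd)) + eps

    def ratio(spec, B):
        # Multiplicative factor <X, B> / (<X_hat, B> + eps) for one contraction.
        Xh = tucker(C, W, H, D)
        return np.einsum(spec, X, B) / (np.einsum(spec, Xh, B) + eps)

    for _ in range(n_iter):
        # X_hat is linear in each factor with nonnegative coefficients B.
        W *= ratio("mpn,ipn->mi", np.einsum("ijk,pj,nk->ipn", C, H, D, optimize=True))
        H *= ratio("mpn,jmn->pj", np.einsum("ijk,mi,nk->jmn", C, W, D, optimize=True))
        D *= ratio("mpn,kmp->nk", np.einsum("ijk,mi,pj->kmp", C, W, H, optimize=True))
        Xh = tucker(C, W, H, D)
        num = np.einsum("mpn,mi,pj,nk->ijk", X, W, H, D, optimize=True)
        den = np.einsum("mpn,mi,pj,nk->ijk", Xh, W, H, D, optimize=True)
        C *= num / (den + eps)
    return C, W, H, D
```

Every update multiplies by a nonnegative ratio, so all factors stay nonnegative. As with any Tucker model, a factor can grow while the core shrinks, so the factors are only meaningful up to scale.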

  23. Interpreting the NTF model
     • W, H, and D are all musically intuitive:
     • Loop template activations (D) directly estimate the layout of the song
     [Figure: estimated layout of loops A to D]

  24. Interpreting the NTF model
     • Core tensor C = recipe for each loop type
     • Pixel C(i, j, k) tells us to play note w_i with activation function h_j whenever loop d_k appears
     [Figure: loop recipes (C); one recipe reads (w_4, h_7) + (w_11, h_10) + (w_24, h_16)]

  25. Interpreting the NTF model
     • Core tensor C = recipe for each loop type
     • To recover the entire spectrogram: C ◦ (W ◦ H ◦ D)
     • To recover an individual loop source: C[:,:,k] ◦ (W ◦ H ◦ D[k,:])
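
The per-loop reconstruction can be sketched by keeping only slice k of the core and loop k's bar activations. Here D is stored as bars × loops, so the slide's D[k,:] becomes `D[:, k]`. Summing the per-loop cubes gives back the full model, and turning a cube back into a time-ordered spectrogram is a reshape; converting that spectrogram into audio (for example by masking the mixture STFT and inverting it) is not shown. Function names are hypothetical.

```python
import numpy as np


def loop_cube(C, W, H, D, k):
    """Model of loop k alone: C[:, :, k] combined with W, H and loop k's bar activations."""
    return np.einsum("ij,mi,pj,n->mpn", C[:, :, k], W, H, D[:, k], optimize=True)


def cube_to_spectrogram(cube):
    """Concatenate the bars of a (freq x time-in-bar x bar) cube back along time."""
    n_freq, frames_per_bar, n_bars = cube.shape
    return cube.transpose(0, 2, 1).reshape(n_freq, n_bars * frames_per_bar)
```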

  26. Evaluation

  27. Evaluation
     • We used synthetic data [López-Serrano et al. 2016]: 7 sets of loops × 3 different layouts (arrangements)
     • Algorithm output 1: separated signals
       • Evaluate quality with SDR, SIR, SAR (estimated source tracks vs. stem tracks)
     • Algorithm output 2: loop layout
       • Evaluate accuracy with correlation (estimated map vs. ground-truth map)
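
A layout score of this kind can be sketched as the Pearson correlation between the flattened estimated and ground-truth activation maps. Because a factorization returns loops in arbitrary order, this sketch also searches over row permutations and keeps the best match; the slide does not state whether or how the authors matched rows, and brute-force search is only practical for a handful of loops. `layout_correlation` is a hypothetical name.

```python
import itertools

import numpy as np


def layout_correlation(est, ref):
    """Best Pearson correlation between estimated and reference loop maps.

    est, ref: activation maps of the same shape (n_loops, n_bars).
    Returns (correlation, perm) where est[list(perm)] best matches ref row by row.
    """
    est, ref = np.asarray(est, dtype=float), np.asarray(ref, dtype=float)
    best, best_perm = -np.inf, None
    for perm in itertools.permutations(range(est.shape[0])):
        r = np.corrcoef(est[list(perm)].ravel(), ref.ravel())[0, 1]
        if r > best:
            best, best_perm = r, perm
    return best, best_perm
```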

  28. Good separation example
     [Figure: collection of loops for genre “Acid” (Drum, Melody, Bass, FX) and extracted loops 1 to 4]
     • When it works, it works

  29. Flawed separation example
     [Figure: layout of the original tracks for genre “Brezo” (loops A to D) and layout of the source-separated tracks]

  30. Flawed separation example
     [Figure: the layout of the original tracks for genre “Brezo” is rewritten step by step (“swap rows”, “substitute C”) into the layout of the source-separated tracks]
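
The ambiguity behind this example, and behind the conclusion's remark about algebraically equivalent decompositions, can be reproduced with a toy mixture. Assume, hypothetically, that loop B plays in every bar where loop A plays. Then a model in which "A" absorbs B's spectrum, while B keeps only the bars where A is silent, explains the mixture exactly as well as the true model, with every entry still nonnegative. The numbers are made up for illustration.

```python
import numpy as np


def mix(loops, layout):
    """Mixture (features x bars) = sum over loops k of layout[k, n] * loops[k]."""
    return np.einsum("kf,kn->fn", loops, layout)


rng = np.random.default_rng(0)
loops = rng.random((2, 6))                   # loop A, loop B (flattened one-bar spectra)
layout = np.array([[1, 1, 0, 0, 1],          # A plays in bars 0, 1 and 4
                   [1, 1, 1, 1, 1]], float)  # B plays in every bar

# Alternative explanation: "A" = A + B, and B only where A is silent.
alt_loops = np.stack([loops[0] + loops[1], loops[1]])
alt_layout = np.stack([layout[0], layout[1] - layout[0]])

print(np.allclose(mix(loops, layout), mix(alt_loops, alt_layout)))  # True
print(alt_layout.min() >= 0)                                        # True
```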

  31. [Figure: bar charts of SDR, SIR, SAR, and layout correlation, comparing the proposed method with Seetharaman & Pardo 2016]
     • SDR: our reconstruction quality is average. :-|
     • SIR: we have less crosstalk than others! :-D
     • SAR: we have more noisy artifacts. :-(
     • Correlation: we get very clean layouts! :-D (1.0 = performance ceiling)

  32. Conclusion

  33. Conclusion
     • Proposed a method of decomposing audio into loops that:
       • Models periodicity using the spectral cube
       • Models source signals and song composition jointly
     • Tucker decomposition is musically intuitive
     • Weaknesses include:
       • Very conservative reconstructions that don’t model the whole signal
       • Like NMFD, we cannot distinguish between algebraically equivalent decompositions
     • Future work: searching for repetitions at multiple hierarchical time scales

  35. Future work: hierarchical analysis
     • Different loops in the song have different lengths and periods
     • Spectral cubes with different periods highlight different consistent repetitions
     [Figure: spectral cubes with periods of 2 beats, 1 downbeat, 2 downbeats, and 4 downbeats]
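
Building cubes at several periods only changes where the spectrogram is cut. The sketch below assumes a beat grid whose first beat is a downbeat and a 4/4 meter, so the periods listed above (2 beats, 1, 2 and 4 downbeats) correspond to cutting every 2, 4, 8 and 16 beats; each segment is resampled to a fixed length by linear interpolation. The meter, the resampling, and the function names are all assumptions.

```python
import numpy as np


def stack_segments(S, bounds, frames_per_segment=32):
    """Cut spectrogram S (freq x frames) at frame indices `bounds` and stack the pieces.

    Each segment [bounds[s], bounds[s + 1]) is linearly resampled to
    frames_per_segment columns; returns (freq x frames_per_segment x n_segments).
    """
    S = np.asarray(S, dtype=float)
    last = S.shape[1] - 1
    segments = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        pos = start + (end - start) * np.arange(frames_per_segment) / frames_per_segment
        lo = np.floor(pos).astype(int)
        hi = np.minimum(lo + 1, last)
        frac = pos - lo
        segments.append(S[:, lo] * (1 - frac) + S[:, hi] * frac)
    return np.stack(segments, axis=2)


def multi_period_cubes(S, beat_frames, periods_in_beats=(2, 4, 8, 16), frames_per_segment=32):
    """One spectral cube per period, cutting at every `period`-th beat (4/4 assumed)."""
    beat_frames = np.asarray(beat_frames)
    return {p: stack_segments(S, beat_frames[::p], frames_per_segment)
            for p in periods_in_beats if len(beat_frames[::p]) > 1}
```

Periods that do not fit at least one full segment into the beat grid are skipped.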

  36. Thank you!
     PS. Jordan is now at: [image not preserved]
