
Kernel Spectrogram Models for source separation
Antoine Liutkus (1), Zafar Rafii (2), Bryan Pardo (2), Derry Fitzgerald (3), Laurent Daudet (4)
(1) Inria, Université de Lorraine, LORIA, UMR 7503, France
(2) Northwestern University, Evanston, IL, USA
(3) NIMBUS Centre, Cork Institute of Technology, Ireland
(4) Institut Langevin, Paris Diderot Univ., France
HSCMA, Nancy, May 12th 2014
Outline: Audio separation, Spectrogram models, Results, Conclusion
Liutkus ⋆, Rafii, Pardo, Fitzgerald, Daudet. Kernel Spectrogram Models for source separation. 05/12/2014. 1/18

Separating audio sources
[Figure: MIXING, then SEPARATION]
In this presentation: mono mixtures ⇒ the general multichannel case is treated in the paper.

Notations
[Figure: the MIXTURE is the sum of the sources (+ = +); both the mixture and the sources are represented by their STFTs.]
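As a minimal sketch of this notation (assuming NumPy/SciPy, a hypothetical 16 kHz sampling rate, and random signals standing in for the sources), the mono mixture is the sum of the source waveforms. Because the STFT is linear, the mixture STFT is then the sum of the source STFTs:

```python
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)
fs = 16000                      # assumed sampling rate
n_sources, n_samples = 3, fs    # one second of three hypothetical sources
sources = rng.standard_normal((n_sources, n_samples))

# Mono mixing model: the mixture waveform is the sum of the source waveforms.
mixture = sources.sum(axis=0)

# STFT of the mixture and of each source, with the same window and hop.
_, _, X = stft(mixture, fs=fs, nperseg=1024)   # shape (freqs, frames)
_, _, S = stft(sources, fs=fs, nperseg=1024)   # shape (n_sources, freqs, frames)

# Linearity of the STFT: x(omega, t) = sum_j s_j(omega, t).
print(np.allclose(X, S.sum(axis=0)))  # prints True
```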

Time-frequency masking
Each source STFT $s_j(\omega,t)$ is estimated by filtering the mixture STFT $x(\omega,t)$:
$\hat{s}_j(\omega,t) = w_j(\omega,t)\,x(\omega,t)$
Underdetermined separation ⇒ $w_j$ varies with both $\omega$ and $t$.
Waveforms are obtained by inverse STFT.
There are many different ways to get a time-frequency (TF) mask $w_j(\omega,t)$.
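The filtering step above can be sketched as follows, assuming NumPy/SciPy and SciPy's default Hann window with 50% overlap; `apply_tf_mask` is a hypothetical helper, not part of the original work. It multiplies the mixture STFT by a mask of the same shape and returns to the time domain with an inverse STFT:

```python
import numpy as np
from scipy.signal import stft, istft

def apply_tf_mask(mixture, mask, fs=16000, nperseg=1024):
    """Filter a mono mixture with a TF mask w_j(omega, t) and return a waveform."""
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)
    if mask.shape != X.shape:
        raise ValueError(f"mask shape {mask.shape} != STFT shape {X.shape}")
    S_hat = mask * X                      # s_hat_j(omega, t) = w_j(omega, t) x(omega, t)
    _, s_hat = istft(S_hat, fs=fs, nperseg=nperseg)
    return s_hat[: len(mixture)]          # drop the padding added by the STFT
```

With an all-ones mask the mixture is recovered up to numerical precision, which is a quick sanity check that the STFT/inverse-STFT pair is consistent.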

Time-frequency masking: binary masks
$\hat{s}_j(\omega,t)$ is assumed equal either to $x(\omega,t)$ or to 0, i.e. $w_j(\omega,t) \in \{0,1\}$.
This is a classification task over the mixture STFT $x$, based on features such as:
  pitch detection + harmonics selection (CASA)
  panning position (DUET)
Y. Han and C. Raphael. Informed source separation of orchestra and soloist. In Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR), pages 315–320, 2010.
O. Yilmaz and S. Rickard. Blind separation of speech mixtures via time-frequency masking. IEEE Trans. on Signal Processing, 52(7):1830–1847, 2004.
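A minimal sketch of the classification view, assuming NumPy and a hypothetical array of per-source scores of shape (J, F, T) produced by some feature-based classifier (the scoring itself, e.g. pitch or panning features, is not shown): each TF bin is assigned to exactly one source, so every mask takes values in {0, 1}.

```python
import numpy as np

def binary_masks(scores):
    """Turn per-source scores of shape (J, F, T) into binary masks.

    Each TF bin goes to the single source with the highest score, so
    w_j(omega, t) is 1 for that source and 0 for all the others.
    """
    scores = np.asarray(scores, dtype=float)
    winner = np.argmax(scores, axis=0)                 # (F, T): index of winning source
    J = scores.shape[0]
    return (np.arange(J)[:, None, None] == winner[None]).astype(float)
```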

Getting the mask
Binary masking yields musical noise ⇒ soft masking $w_j(\omega,t) \in [0,1]$ is better!
Example: Wiener filtering for Gaussian processes.
Source energies $f_j(\omega,t) \geq 0$ add up to give the mixture energy $\sum_j f_j(\omega,t)$.
$w_j(\omega,t)$ is taken as the proportion of source $j$ in the mixture:
$w_j(\omega,t) = \frac{f_j(\omega,t)}{\sum_{j'} f_{j'}(\omega,t)} \in [0,1]$
L. Benaroya, F. Bimbot, and R. Gribonval. Audio source separation with a single sensor. IEEE Trans. on Audio, Speech, and Language Processing, 14(1):191–199, January 2006.
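The Wiener mask formula translates directly into array code. This sketch assumes NumPy and stores the energies $f_j$ as an array of shape (J, F, T); the small `eps` is an added assumption to avoid dividing by zero in bins where every source is silent:

```python
import numpy as np

def wiener_masks(f, eps=1e-12):
    """Soft masks w_j = f_j / sum_j' f_j' from nonnegative energies f of shape (J, F, T)."""
    f = np.asarray(f, dtype=float)
    if np.any(f < 0):
        raise ValueError("source energies must be nonnegative")
    # Each mask is the share of source j in the total energy of the bin.
    return f / (f.sum(axis=0, keepdims=True) + eps)
```

For two sources with energies 3 and 1 in a bin, the masks are 0.75 and 0.25: each source receives its proportion of the mixture rather than all or nothing.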

Time-frequency masking challenges

Iterative approaches: main ideas

The need for spectrogram models
Given $\hat{s}_j(\omega,t)$, how should $f_j(\omega,t)$ be estimated?
Example: spatial-only models. Assuming a local Gaussian model
$s_j(\omega,t) \sim \mathcal{N}_c\big(0,\ f_j(\omega,t)\,R_j(\omega)\big)$,
we take
$\hat{f}_j(\omega,t) = \arg\max_{f}\ p\big(\hat{s}_j(\omega,t) \mid f, R_j(\omega)\big)$,
with $R_j(\omega)$ related to spatial positions ⇒ this only works if the sources are well separated spatially.
We want to improve on this by using prior knowledge on $f_j$.
N.Q.K. Duong, E. Vincent, and R. Gribonval. Under-determined reverberant audio source separation using a full-rank spatial covariance model. IEEE Trans. on Audio, Speech, and Language Processing, 18(7):1830–1840, September 2010.
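Under this local Gaussian model the maximization has a closed form. For an $I$-channel observation, $\log p(\hat{s} \mid f, R) = -I \log f - \log\det(\pi R) - \hat{s}^H R^{-1} \hat{s} / f$, and setting the derivative in $f$ to zero gives $\hat{f} = \hat{s}^H R^{-1} \hat{s} / I$; in the mono case ($R = 1$) this is simply $|\hat{s}|^2$. The sketch below (assuming NumPy; `ml_source_energy` is a hypothetical name, and this is a derivation from the stated model rather than the cited paper's full EM procedure) computes that estimate for one TF bin:

```python
import numpy as np

def ml_source_energy(s_hat, R):
    """ML estimate of f_j(omega, t) for s ~ N_c(0, f R), from one I-channel vector s_hat.

    Maximizing -I log f - s^H R^{-1} s / f over f > 0 gives f = s^H R^{-1} s / I.
    """
    s_hat = np.asarray(s_hat, dtype=complex)
    I = s_hat.shape[0]
    quad = np.real(np.conj(s_hat) @ np.linalg.solve(R, s_hat))  # s^H R^{-1} s
    return quad / I
```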

Global spectrogram models: nonnegative matrix factorization
A. Ozerov, E. Vincent, and F. Bimbot. A general flexible framework for the handling of prior information in audio source separation. IEEE Trans. on Audio, Speech, and Language Processing, PP(99):1, 2011.
Y. Salaün, E. Vincent, N. Bertin, N. Souviraà-Labastie, X. Jaureguiberry, D. Tran, and F. Bimbot. The Flexible Audio Source Separation Toolbox (FASST) version 2.0. In ICASSP, 2014.
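To make "global model" concrete, here is a generic NMF sketch: one nonnegative factorization $W H$ describes the whole spectrogram at once. It assumes NumPy and uses the classic multiplicative updates for the Kullback-Leibler divergence, chosen only for brevity; it is not the model or cost function of the frameworks cited above, and `nmf_kl` is a hypothetical name.

```python
import numpy as np

def nmf_kl(V, n_components, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative spectrogram V (F x T) by W @ H (KL multiplicative updates)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + eps   # spectral templates
    H = rng.random((n_components, T)) + eps   # activations over time
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)
    return W, H
```

The multiplicative form keeps $W$ and $H$ nonnegative at every step, and the KL divergence does not increase from one iteration to the next.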

Kernel spectrogram models: principles
NMF is a single, global model for all of $f_j$.
Sometimes, our knowledge is only local ⇒ we assume $f_j(\omega,t)$ is approximately equal to its values at some neighbouring points $I_j(\omega,t)$.
Example: harmonic/percussive local models.
  Percussive sounds are locally constant through frequency.
  Harmonic sounds are locally constant through time.
[Figure: percussive and harmonic neighbourhoods]
D. Fitzgerald. Harmonic/percussive separation using median filtering. In Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 2010.
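A minimal sketch of these two local models, in the spirit of the median-filtering approach cited above. It assumes NumPy/SciPy, a power spectrogram stored as a (frequency, frame) array, and a hypothetical kernel length of 17 bins; `harmonic_percussive_estimates` is an illustrative name. A median along time keeps horizontal (harmonic) lines and removes short vertical events, while a median along frequency does the opposite:

```python
import numpy as np
from scipy.ndimage import median_filter

def harmonic_percussive_estimates(P, length=17):
    """Local spectrogram models for a power spectrogram P of shape (freqs, frames)."""
    harmonic = median_filter(P, size=(1, length))    # locally constant through time
    percussive = median_filter(P, size=(length, 1))  # locally constant through frequency
    return harmonic, percussive
```

On a toy spectrogram with one horizontal partial and one vertical onset, the harmonic estimate keeps only the partial and the percussive estimate keeps only the onset.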

Kernel spectrogram models: examples
$\forall (\omega',t') \in I_j(\omega,t),\quad f_j(\omega',t') \approx f_j(\omega,t)$
D. Fitzgerald. Harmonic/percussive separation using median filtering. In Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), Graz, Austria, September 2010.
Z. Rafii and B. Pardo. A simple music/voice separation method based on the extraction of the repeating musical structure. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 221–224, May 2011.
D. FitzGerald. Vocal separation using nearest neighbours and median filtering. In Proceedings of the 23rd IET Irish Signals and Systems Conference, pages 583–588, Maynooth, 2012.
Z. Rafii and B. Pardo. Music/voice separation using the similarity matrix. In Proceedings of the 13th International Conference on Music Information Retrieval (ISMIR), pages 583–588, 2012.
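One way to read this definition numerically, as a sketch assuming NumPy/SciPy and hypothetical helper names: encode each neighbourhood $I_j$ as a boolean footprint in (frequency, frame) layout, then measure how much a spectrogram changes when every bin is replaced by the median of its neighbours. A spectrogram that satisfies the kernel assumption is nearly unchanged.

```python
import numpy as np
from scipy.ndimage import median_filter

def horizontal_kernel(length):
    """Neighbours along time only (harmonic-like): f(w, t') ~ f(w, t)."""
    return np.ones((1, length), dtype=bool)

def vertical_kernel(length):
    """Neighbours along frequency only (percussive-like): f(w', t) ~ f(w, t)."""
    return np.ones((length, 1), dtype=bool)

def cross_kernel(n_freq, n_time):
    """Cross-shaped neighbourhood (use odd sizes so it is centred on the bin)."""
    k = np.zeros((n_freq, n_time), dtype=bool)
    k[n_freq // 2, :] = True
    k[:, n_time // 2] = True
    return k

def periodic_kernel(period, n_repeats):
    """Same frequency, frames spaced by `period` (a repeating pattern)."""
    if n_repeats % 2 == 0:
        raise ValueError("use an odd n_repeats so the kernel is centred on t")
    k = np.zeros((1, period * (n_repeats - 1) + 1), dtype=bool)
    k[0, ::period] = True
    return k

def kernel_fit_error(P, footprint):
    """Relative change of P when each bin is replaced by its median over the kernel."""
    fitted = median_filter(P, footprint=footprint)
    return np.linalg.norm(P - fitted) / np.linalg.norm(P)
```

For example, a spectrogram that is constant along time fits the horizontal and periodic kernels exactly, but not the vertical one.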

Kernel spectrogram models: objective
Combining all those local models together!
Example: voice/music separation.
  Musical background: 5 sources repeating at different scales (beat, downbeat, ...) + 1 source that is stable along time (strings, synths).
  Voice: a locally constant spectrogram (cross-like kernel).
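As an illustration of how such a combined model could be assembled, here is a sketch assuming NumPy, footprints laid out as (frequency, frame) boolean arrays, and entirely hypothetical periods and sizes (the slide does not give them). `voice_music_kernels` is an illustrative name. It builds one kernel per source: five repeating kernels, one stable kernel and one cross-like kernel for the voice.

```python
import numpy as np

def voice_music_kernels(beat_periods, stable_length=41, voice_size=(11, 11), n_repeats=5):
    """Proximity kernels for 'repeating + stable background, locally constant voice'.

    beat_periods: hypothetical repetition periods in frames, one per repeating source.
    """
    kernels = {}
    for p in beat_periods:
        k = np.zeros((1, p * (n_repeats - 1) + 1), dtype=bool)
        k[0, ::p] = True                     # same frequency, frames t + k * p
        kernels[f"repeating_{p}"] = k
    # Stable source: long neighbourhood along time at a fixed frequency.
    kernels["stable"] = np.ones((1, stable_length), dtype=bool)
    # Voice: cross-like kernel, locally constant in time and in frequency.
    nf, nt = voice_size
    cross = np.zeros((nf, nt), dtype=bool)
    cross[nf // 2, :] = True
    cross[:, nt // 2] = True
    kernels["voice"] = cross
    return kernels
```

Usage: `voice_music_kernels([8, 16, 32, 64, 128])` returns the seven footprints of the example, one per source.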

Kernel backfitting algorithm

Kernel backfitting algorithm: monochannel version
Input
  Mixture STFT $x(\omega,t)$
  Neighbourhoods $I_j(\omega,t)$, also called "proximity kernels"
Initialization: $\forall j,\ \hat{f}_j(\omega,t) \leftarrow |x(\omega,t)|^2$: simply take the mixture spectrogram
Iterate
  Separation with Wiener filtering: compute the estimates
  $\hat{s}_j(\omega,t) = \frac{\hat{f}_j(\omega,t)}{\sum_{j'} \hat{f}_{j'}(\omega,t)}\, x(\omega,t)$
  Spectrogram fitting: median filter $|\hat{s}_j|^2$ with kernel $I_j(\omega,t)$:
  $\hat{f}_j(\omega,t) \leftarrow \operatorname{median}_{(\omega',t') \in I_j(\omega,t)} |\hat{s}_j(\omega',t')|^2$
Output: source estimates $\hat{s}_j$
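The steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes NumPy/SciPy, represents each proximity kernel $I_j$ as a boolean footprint passed to `scipy.ndimage.median_filter`, uses a hypothetical fixed number of iterations and a small `eps` against division by zero, and ends with one last Wiener step so that the returned estimates match the final spectrograms.

```python
import numpy as np
from scipy.ndimage import median_filter

def kernel_backfitting(X, kernels, n_iter=5, eps=1e-12):
    """Monochannel kernel backfitting sketch.

    X: complex mixture STFT, shape (F, T).
    kernels: list of boolean footprints, one proximity kernel I_j per source.
    Returns complex source STFT estimates, shape (J, F, T).
    """
    J = len(kernels)
    # Initialization: every source spectrogram starts as the mixture spectrogram.
    f_hat = np.repeat(np.abs(X)[None] ** 2, J, axis=0)
    for _ in range(n_iter):
        # Separation: Wiener filtering of the mixture with the current spectrograms.
        masks = f_hat / (f_hat.sum(axis=0, keepdims=True) + eps)
        S_hat = masks * X[None]
        # Fitting: median filter each |s_hat_j|^2 over its own kernel I_j.
        for j, footprint in enumerate(kernels):
            f_hat[j] = median_filter(np.abs(S_hat[j]) ** 2, footprint=footprint)
    # Final Wiener step with the last spectrogram estimates (a choice of this sketch).
    masks = f_hat / (f_hat.sum(axis=0, keepdims=True) + eps)
    return masks * X[None]
```

Each kernel only states which neighbours should share the same energy, so adding a source to the model amounts to adding one footprint to the list. Because the masks sum to one, the estimates always add up to the mixture.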

BSS Eval results on "The Pet Sounds Sessions" by the Beach Boys
[Figure: ΔSDR performance for VOCALS and ΔSDR performance for BACKGROUND (vertical axis from −30 to 5), for KAM multirepet+harm, aREPET+DUET, KAM multirepet, REPET-SIM, aREPET, RPCA, and IMM.]
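To clarify what a ΔSDR value measures, here is a rough sketch assuming NumPy. Two assumptions are made: ΔSDR is taken as the SDR of the estimate minus the SDR obtained by using the mixture itself as the estimate (a common convention), and SDR is computed as a plain energy ratio. BSS Eval additionally allows the estimate to differ from the reference by a short time-invariant distortion filter, so these numbers are only a proxy for the ones in the figure. Both function names are hypothetical.

```python
import numpy as np

def simple_sdr(reference, estimate, eps=1e-12):
    """Plain SDR in dB: 10 log10(||s||^2 / ||s - s_hat||^2), with no distortion filter."""
    num = np.sum(reference ** 2)
    den = np.sum((reference - estimate) ** 2)
    return 10 * np.log10((num + eps) / (den + eps))

def delta_sdr(reference, estimate, mixture):
    """Improvement of the estimate over simply using the mixture as the estimate."""
    return simple_sdr(reference, estimate) - simple_sdr(reference, mixture)
```

For instance, if the interference has the same energy as the source, the mixture scores 0 dB; an estimate whose error energy is 100 times smaller than the source energy scores 20 dB, hence a ΔSDR of 20 dB.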

Demo
External demo.

Conclusion
  A general framework for combining different kernel models
  Handles multichannel mixtures
  State-of-the-art performance for music separation
  Easy to implement, fast algorithms ⇒ full demo at www.loria.fr/~aliutkus/kam/
To go further
  Formalization ⇒ optimization framework with robust cost functions ⇒ equivalence with the EM algorithm in some cases
  Combination with other techniques
  Learning source kernels automatically? ⇒ maximizing kernel size (robustness) ⇒ maximizing invariance to median filtering
