Adaptive Filtering for Music/Voice Separation Exploiting the - PowerPoint PPT Presentation

Adaptive Filtering for Music/Voice Separation Exploiting the Repeating Musical Structure
Adaptive REPET
Antoine Liutkus (1), Zafar Rafii (2), Roland Badeau (1), Bryan Pardo (2), Gaël Richard (1)
(1) Telecom ParisTech, CNRS LTCI, Paris, France
(2) Northwestern University, EECS Department, Evanston, USA
Liutkus, Rafii et al., Adaptive REPET, ICASSP 2012, Kyoto, Japan
Outline: Time-Frequency masking, Fixed patterns, Varying patterns, Demonstration

Notation
Source separation: notation
voice v, voice spectrogram V
background b, background spectrogram B
mix x, mix spectrogram X
[Figure: waveform (amplitude vs. time in s) and spectrogram (frequency in Hz vs. frame) of the voice, the background, and the mix.]

Notation: separation as an adaptive filter
Separating a source = filtering the mixture.
Time-varying filter w_t: different for each frame t.
Element-wise weighting of the STFT. Here: W ∈ [0, 1].
[Figure: time-frequency mask W, mix spectrogram X, and weighted mix spectrogram W .* X; frequency (Hz) vs. frame.]
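The element-wise weighting W .* X can be sketched in a few lines. This is a minimal illustration in plain Python, assuming the spectrogram is stored as a list of rows (one row per frequency bin, one column per frame); the function name `apply_mask` and the toy values are hypothetical, not taken from the slides.

```python
def apply_mask(W, X):
    """Element-wise product W .* X of a mask and a spectrogram.

    W and X are lists of rows (one row per frequency bin, one column per
    frame). Mask values are expected to lie in [0, 1].
    """
    if len(W) != len(X) or any(len(w) != len(x) for w, x in zip(W, X)):
        raise ValueError("mask and spectrogram must have the same shape")
    return [[w * x for w, x in zip(w_row, x_row)]
            for w_row, x_row in zip(W, X)]


# Toy example (illustrative values): 2 frequency bins, 3 frames.
X = [[4.0, 2.0, 1.0],
     [3.0, 0.5, 2.0]]
W = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 1.0]]
Y = apply_mask(W, X)
```

In a full system the weighted STFT would then be inverted back to a waveform; that step is omitted here.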

Time-frequency masks: interpretation
W(f, t) ∈ [0, 1]: proportion of the source of interest in the mix.
W(f, t) ≈ 1 ⇒ TF bin (f, t) mostly comes from the source of interest.
W(f, t) ≈ 0 ⇒ TF bin (f, t) mostly comes from other sources.
Comb filter: given a pitch contour f0(t), keep multiples of f0(t).
[Figure: time-varying comb filter; frequency (Hz) vs. frame.]
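As an illustration of the comb filter, the sketch below builds a binary mask that keeps the bins lying near a multiple of f0(t) in each frame. It assumes bin k is centred at k * bin_hz; the tolerance `tol_hz`, the convention that a pitch of 0 marks an unvoiced frame, and the function name are assumptions made for this example.

```python
def comb_mask(f0_per_frame, n_bins, bin_hz, tol_hz):
    """Binary time-varying comb mask.

    f0_per_frame: pitch in Hz for each frame (0 or None means unvoiced)
    n_bins: number of frequency bins; bin k is centred at k * bin_hz
    tol_hz: half-width kept around each harmonic (illustrative parameter)
    """
    mask = [[0.0] * len(f0_per_frame) for _ in range(n_bins)]
    for t, f0 in enumerate(f0_per_frame):
        if not f0:
            continue  # unvoiced frame: keep nothing
        for k in range(n_bins):
            freq = k * bin_hz
            harmonic = round(freq / f0)  # index of the nearest multiple of f0
            if harmonic >= 1 and abs(freq - harmonic * f0) <= tol_hz:
                mask[k][t] = 1.0
    return mask


# Bins at 0, 50, ..., 300 Hz; pitch contour of 100 Hz, unvoiced, 150 Hz.
mask = comb_mask([100.0, 0.0, 150.0], n_bins=7, bin_hz=50.0, tol_hz=1.0)
```

With these values, the first frame keeps the bins at 100, 200 and 300 Hz, and the third frame keeps those at 150 and 300 Hz.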

Time-frequency masks: beyond the harmonic model
Modeling the accompaniment
Most studies focus on harmonic voice models: the voice is assumed harmonic and the predominant pitch is estimated, then filtering is done e.g. through comb filters.
Problems: breathy voices? Consonants? Loud accompaniment?
We focus on a model for the background B!

Time-frequency masks: filtering given the model
From B to the mask: mask from B alone
Imagine X and B are available. What is W?
X(f, t) close to B(f, t) → W(f, t) ≈ 1
X(f, t) far from B(f, t) → W(f, t) ≈ 0
Binary mask: 0 or 1 based on a thresholding of B/X.
Soft mask: $W(f,t) = \exp\left(-\frac{\left(\log X(f,t) - \log B(f,t)\right)^2}{\lambda^2}\right)$
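Both masks can be computed bin by bin from X and B. The sketch below is a minimal plain-Python version, assuming magnitude spectrograms stored as lists of rows; the small constant `eps` (to avoid log(0) and division by zero) and the default `threshold` of 0.5 are assumptions of this example, not values given in the slides.

```python
import math


def soft_mask(X, B, lam, eps=1e-12):
    """W(f,t) = exp(-(log X(f,t) - log B(f,t))^2 / lam^2), per TF bin."""
    return [[math.exp(-((math.log(x + eps) - math.log(b + eps)) ** 2)
                      / lam ** 2)
             for x, b in zip(x_row, b_row)]
            for x_row, b_row in zip(X, B)]


def binary_mask(X, B, threshold=0.5, eps=1e-12):
    """1 where the ratio B/X reaches the (assumed) threshold, else 0."""
    return [[1.0 if b / (x + eps) >= threshold else 0.0
             for x, b in zip(x_row, b_row)]
            for x_row, b_row in zip(X, B)]


# One frequency bin, two frames: the mix equals the background in the
# first frame and is four times larger in the second.
X = [[1.0, 4.0]]
B = [[1.0, 1.0]]
W_soft = soft_mask(X, B, lam=1.0)
W_bin = binary_mask(X, B)
```

The soft mask stays close to 1 where X matches B and decays smoothly as they diverge, with λ controlling how quickly; the binary mask makes the same decision with a hard cut.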

REPET: repeating patterns in music
Modeling B: musical background is repetitive!
[Figure: background spectrogram B and its repeating pattern of period T; frequency (Hz) vs. frames.]
Given several repetitions, average to estimate B and filter it out!

REPET: REpeating Pattern Extraction Technique
Original REPET algorithm:
Estimate a fixed repeating period T.
Estimate the fixed repeating pattern through averaging.
Compute W as a binary mask.
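The averaging step can be sketched as follows, assuming the period T is already known and expressed in frames. This is only an illustration of "average over the repetitions" as stated on the slide: the function names are hypothetical, an arithmetic mean is assumed, and trailing frames that do not fill a complete period are simply ignored.

```python
def repeating_pattern(X, T):
    """Average the consecutive segments of length T in each frequency row."""
    pattern = []
    for row in X:
        n_seg = len(row) // T  # only complete repetitions are used here
        if n_seg == 0:
            raise ValueError("spectrogram is shorter than one period")
        pattern.append([sum(row[s * T + i] for s in range(n_seg)) / n_seg
                        for i in range(T)])
    return pattern


def tile_pattern(pattern, n_frames):
    """Repeat the pattern along time to get a background estimate."""
    return [[row[t % len(row)] for t in range(n_frames)] for row in pattern]


# One frequency bin, period of 2 frames, one louder frame at index 4.
X = [[1.0, 5.0, 1.0, 5.0, 4.0, 5.0]]
pattern = repeating_pattern(X, T=2)
B_est = tile_pattern(pattern, n_frames=5)
```

Note how the louder frame pulls the average up (2.0 instead of 1.0): a plain mean is sensitive to frames where another source is active.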

Advantages and limitations of REPET
Advantages: fast; efficient for constant rhythmic patterns (electro, short excerpts).
Limitations: the repeating pattern changes over time; binary masking leads to artifacts.
We extend REPET to varying repeating patterns.

Time-varying period: pseudo-periodic patterns
Patterns are not fixed: the period may vary and the pattern may vary.
Frequency bands of B are assumed pseudo-periodic, with the same period.
[Figure: background spectrogram B (frequency in Hz vs. frame) and log-value of three of its frequency bands (bands 20, 40 and 60) vs. frame.]

Time-varying period: beat-spectrum estimation
Estimating the period (1/2):
Perform a short-term analysis of each band.
Add them all together.
Beat spectrogram: rhythmic content of the signal.
[Figure: mix spectrogram X (frequency in Hz vs. frame); spectrograms of bands 20, 401 and 1001 and the resulting beat spectrogram (frequency in 1/frame vs. bag of frames).]
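One way to read "short-term analysis of each band, then add them all together" is sketched below: each frequency band is treated as a time series over frames, a magnitude DFT is taken over sliding windows ("bags of frames"), and the results are summed across bands. The naive DFT, the rectangular window, the mean removal, and the parameter names `win` and `hop` are assumptions of this example; the slides do not specify these details.

```python
import cmath


def band_spectrogram(band, win, hop):
    """Magnitude DFT over sliding windows of one frequency band.

    Returns windows[j][k]: window j, modulation frequency k / win (1/frame).
    """
    windows = []
    for start in range(0, len(band) - win + 1, hop):
        seg = band[start:start + win]
        mean = sum(seg) / win
        centred = [v - mean for v in seg]  # drop DC so the rhythm dominates
        windows.append([
            abs(sum(v * cmath.exp(-2j * cmath.pi * k * n / win)
                    for n, v in enumerate(centred)))
            for k in range(win // 2 + 1)
        ])
    return windows


def beat_spectrogram(X, win, hop):
    """Sum the per-band spectrograms over all frequency bands."""
    per_band = [band_spectrogram(band, win, hop) for band in X]
    n_windows = len(per_band[0])
    n_freqs = win // 2 + 1
    return [[sum(spec[j][k] for spec in per_band) for k in range(n_freqs)]
            for j in range(n_windows)]


# Two bands that both repeat every 4 frames (16 frames in total).
base = [2.0, 1.0, 0.0, 1.0] * 4
X = [base, [3.0 * v for v in base]]
beat = beat_spectrogram(X, win=8, hop=4)
peaks = [max(range(len(w)), key=lambda k: w[k]) for w in beat]
```

Here every window peaks at modulation bin 2, i.e. a frequency of 2/8 per frame, which corresponds to the period of 4 frames built into the toy signal.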

Time-varying period: pseudo-period estimation
Compute the beat spectrogram.
Estimate the time-varying repeating period.
Any frequency-based pitch detector will do!
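As a stand-in for "any frequency-based pitch detector", the sketch below uses the simplest possible one: pick the strongest modulation frequency in each window of the beat spectrogram and convert it to a period in frames. The function name, the `k_min` parameter (used to skip the DC bin), and the toy values are assumptions; a practical detector would also need to handle harmonics and enforce continuity over time.

```python
def estimate_periods(beat_spec, win, k_min=1):
    """Period (in frames) per analysis window, from the peak modulation bin.

    beat_spec[j][k]: energy in window j at modulation frequency k / win.
    """
    periods = []
    for frame in beat_spec:
        k_peak = max(range(k_min, len(frame)), key=lambda k: frame[k])
        periods.append(win / k_peak)  # frequency k / win  ->  period win / k
    return periods


# Toy beat spectrogram (illustrative values): 3 windows, 5 modulation bins,
# analysis window of 8 frames.
beat_spec = [[0.0, 0.1, 5.0, 0.2, 0.1],
             [0.0, 0.1, 4.0, 0.3, 0.1],
             [0.0, 3.0, 0.5, 0.2, 0.1]]
periods = estimate_periods(beat_spec, win=8)
```

In this toy case the period is 4 frames in the first two windows and 8 frames in the last one.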

Model and estimation: background model given T_0(t)
Background model: ∀t, the accompaniment is periodic for 2K periods around t:
$B(f,t) = B\left(f, t + kT_0(t)\right), \quad k = -K, \dots, K$
[Figure: spectrogram; frequency (Hz).]

Model and estimation: voice model
The voice V is assumed to be sparse.
[Figure: voice spectrogram V; frequency (Hz).]

Model and estimation: background estimation
Estimation of B given X and T_0(t)
Sparsity of V: most of the time, V ≈ 0 ⇒ X ≈ B. Sometimes V is active ⇒ outliers.
[Figure: mix spectrogram X; frequency (Hz).]
$\hat{B}(f,t) = \underset{k = -K, \dots, K}{\operatorname{median}} \left[ X\left(f, t + kT_0(t)\right) \right]$
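The median estimator can be sketched directly from the formula. This is a minimal plain-Python version, assuming T_0(t) is given in frames for every frame t; rounding the period to an integer and skipping indices that fall outside the spectrogram are choices made for this example, not details stated in the slides.

```python
from statistics import median


def estimate_background(X, T0, K):
    """B_hat(f,t) = median over k = -K..K of X(f, t + k * T0(t)).

    X: list of rows (one per frequency bin), T0: period per frame (frames).
    Indices outside the spectrogram are skipped (assumption of this sketch).
    """
    n_frames = len(X[0])
    B_hat = []
    for row in X:
        out = []
        for t in range(n_frames):
            period = int(round(T0[t]))
            idx = [t + k * period for k in range(-K, K + 1)]
            out.append(median(row[i] for i in idx if 0 <= i < n_frames))
        B_hat.append(out)
    return B_hat


# One frequency bin: background alternating 1, 3 (period 2 frames), with a
# "voice" outlier of +10 at frame 4.
X = [[1.0, 3.0, 1.0, 3.0, 11.0, 3.0, 1.0, 3.0]]
B_hat = estimate_background(X, T0=[2] * 8, K=2)
```

Because the voice is active in only one of the repetitions, the median ignores it and the alternating background is recovered, which is the outlier argument made on the slide.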

Model and estimation: Adaptive REPET
[Figure: block diagram of Adaptive REPET.]

Demonstration
Demonstration on different musical genres

Conclusion
Adaptive algorithms for complete recordings.
Fast (approx. reading time).
Extensions: from repetitivity to self-similarity.

