Underdetermined Source Separation Using Speaker Subspace Models



SLIDE 1

Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions

Underdetermined Source Separation Using Speaker Subspace Models

Thesis Defense Ron Weiss May 4, 2009

Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 1 / 34

SLIDE 2

1. Introduction
2. Speaker subspace model
3. Monaural speech separation
4. Binaural separation
5. Conclusions


SLIDE 4

Audio source separation

Many real world signals contain contributions from multiple sources

E.g. cocktail party

Want to infer the original sources from the mixture

Applications: robust speech recognition, hearing aids

SLIDE 5

Previous work

Instantaneous mixing system

[y1(t) … yC(t)]ᵀ = A [x1(t) … xI(t)]ᵀ, where A is the C × I matrix of mixing coefficients aci

Simplest case: more channels than sources (overdetermined)

Perfect separation possible

Use constraints on source signals to guide separation

Independence constraints (e.g. independent component analysis)
Spatial constraints (e.g. beamforming)
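The overdetermined case can be sketched numerically: with more channels than sources, the pseudo-inverse of the mixing matrix recovers the sources exactly in the noiseless case. This is a toy illustration; the dimensions and values are not from the talk.

```python
import numpy as np

# Toy instantaneous mixture: C channels observe I sources through a
# C x I mixing matrix A, i.e. y(t) = A @ x(t) at every sample t.
rng = np.random.default_rng(0)
I, C, T = 2, 3, 100               # 2 sources, 3 channels: overdetermined
x = rng.standard_normal((I, T))   # source signals
A = rng.standard_normal((C, I))   # mixing matrix
y = A @ x                         # observed mixture

# With more channels than sources (and A full column rank), the
# pseudo-inverse recovers the sources exactly in the noiseless case.
x_hat = np.linalg.pinv(A) @ y
print(np.allclose(x_hat, x))  # → True
```

With fewer channels than sources (the underdetermined setting of this talk), `pinv` only gives a minimum-norm projection, so the stronger constraints discussed on the following slides are needed.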

SLIDE 6

Underdetermined source separation

More sources than channels: need stronger constraints
CASA: use perceptual cues similar to the human auditory system

Segment the STFT into short glimpses of each source, by harmonicity, common onset, etc.
Sequential grouping heuristics
Create a time-frequency mask for each source

Inference based on prior source models

SLIDE 7

Time-frequency masking

[Figure: spectrograms of the mixture, clean source, time-frequency masks, and reconstructed source (8.2 dB SNR)]

Natural sounds tend to be sparse in time and frequency

10% of spectrogram cells contain 78% of energy

And redundant

Still intelligible when 22% of source energy is masked
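The sparsity argument above can be sketched with an ideal binary mask: keep only the cells where the target dominates the mixture, and because natural sources are sparse, most of the target energy survives. Toy magnitude "spectrograms" stand in for a real STFT; all values are illustrative.

```python
import numpy as np

# Ideal binary time-frequency masking on toy magnitude "spectrograms".
rng = np.random.default_rng(1)
F, T = 64, 50
# Sparse sources: most cells near zero, a few large (mimics speech sparsity)
S1 = rng.standard_normal((F, T)) ** 4
S2 = rng.standard_normal((F, T)) ** 4
mix = S1 + S2

# Ideal binary mask: keep cells where source 1 dominates the mixture
mask = (S1 > S2).astype(float)
est1 = mask * mix

# Because the sources are sparse, most of S1's energy survives the mask
snr = 10 * np.log10(np.sum(S1**2) / np.sum((S1 - est1)**2))
print(f"SNR of masked reconstruction: {snr:.1f} dB")
```

The heavier-tailed the source distributions (i.e. the sparser the spectrograms), the higher this SNR; with overlapping non-sparse sources the binary mask fails.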

SLIDE 8

Model-based separation

Use constraints from prior source models to guide separation

Leverage differences in spectral characteristics of different sources

Hidden Markov models, log spectral features
Factorial model inference, e.g. IBM Iroquois system [Kristjansson et al., 2006]:

Speaker-dependent models
Acoustic dynamics and grammar constraints
Superhuman performance under some conditions

SLIDE 9

Model-based separation – Limitations

Rely on speaker-dependent models to disambiguate sources What if the task isn’t so well defined?

No prior knowledge of speaker identities or grammar

Use speaker-independent (SI) model for all sources

Need strong temporal constraints or sources will permute:
“place white by t 4 now” mixed with “lay green with p 9 again”
Separated source: “place white by t p 9 again”

Solution: adapt speaker-independent model to compensate

SLIDE 10

1. Introduction
2. Speaker subspace model (Model adaptation, Eigenvoices)
3. Monaural speech separation
4. Binaural separation
5. Conclusions

SLIDE 11

Model selection vs. adaptation

Model selection (e.g. [Kristjansson et al., 2006]): given a set of speaker-dependent (SD) models:

1. Identify the sources in the mixture
2. Use the corresponding models for separation

How to generalize to speakers outside of the training set? Selection: choose the closest model. Adaptation: interpolate.

[Figure: speaker models, mean voice, speaker subspace bases, and quantization boundaries in model space]

SLIDE 12

Model adaptation

Adjust model parameters to better match observations

Caveats:

1. Want to adapt to a single utterance; not enough data for MLLR or MAP. Need an adaptation framework with few parameters.
2. Observations are a mixture of multiple sources. Use an iterative separation/adaptation algorithm.

[Figure: original distribution adapted toward the observations in feature space]

SLIDE 13

Eigenvoice adaptation [Kuhn et al., 2000]

Train a set of SD models

Pack params into speaker supervector Samples from space of speaker variation

Principal component analysis to find orthonormal bases for the speaker subspace

Model is linear combination of bases

[Figure: speaker models and other models in the speaker subspace]

Eigenvoice adaptation

µ = µ̄ + U w + B h

adapted model mean = mean voice + (eigenvoice bases × eigenvoice weights) + (channel bases × channel weights)
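The adaptation equation can be sketched directly; the dimensions, bases, and weights below are illustrative, not the thesis's actual models.

```python
import numpy as np

# Sketch of eigenvoice adaptation mu = mu_bar + U w + B h: an adapted
# model mean is the mean voice plus a low-dimensional combination of
# eigenvoice (speaker subspace) bases and channel bases.
rng = np.random.default_rng(2)
D, K, J = 10, 3, 2                # supervector dim, eigenvoices, channel bases
mu_bar = rng.standard_normal(D)   # mean voice (speaker-independent model)
U = np.linalg.qr(rng.standard_normal((D, K)))[0]  # orthonormal eigenvoice bases
B = rng.standard_normal((D, J))   # channel bases
w = np.array([0.5, -1.0, 0.2])    # speaker (eigenvoice) weights
h = np.array([0.1, 0.3])          # channel weights

mu = mu_bar + U @ w + B @ h       # adapted model mean

# With orthonormal U, speaker weights are recoverable by projection
# (ignoring the channel term for this check):
w_hat = U.T @ (mu_bar + U @ w - mu_bar)
print(np.allclose(w_hat, w))  # → True
```

In the real system w and h are estimated from a mixed observation, not read off a clean supervector; this just shows the linear structure of the subspace model.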

SLIDE 14

Eigenvoice bases

Mean voice = speaker-independent model
Eigenvoices shift formant frequencies, add pitch
Independent bases to capture channel variation

[Figure: mean voice and first three eigenvoice dimensions, shown as spectral patterns across phones]

SLIDE 15

1. Introduction
2. Speaker subspace model
3. Monaural speech separation (Mixed signal model, Adaptation algorithm, Experiments)
4. Binaural separation
5. Conclusions

SLIDE 16

Eigenvoice factorial HMM

Model the mixture with a combination of source HMMs
Need adaptation parameters wi to estimate source signals xi(t), and vice versa

SLIDE 17

Adaptation algorithm

SLIDE 18

Adaptation example

[Figure: mixture t32_swil2a_m18_sbar9n; separated source after adaptation iterations 1, 3, and 5; SD model separation]

SLIDE 19

2006 Speech separation challenge [Cooke and Lee, 2006]

Single-channel mixtures of utterances from 34 different speakers
Constrained grammar:

command(4) color(4) preposition(4) letter(25) digit(10) adverb(4)

Separation/recognition task

Determine letter and digit for source that said “white”
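The grammar can be illustrated with a toy sentence generator. The slide gives only the set sizes (4, 4, 4, 25, 10, 4); the specific word lists below follow the GRID corpus used in the challenge and should be treated as an assumption.

```python
import random

# Speech Separation Challenge grammar: every utterance is
#   command color preposition letter digit adverb
commands = ["bin", "lay", "place", "set"]
colors = ["blue", "green", "red", "white"]
preps = ["at", "by", "in", "with"]
letters = list("abcdefghijklmnopqrstuvxyz")   # 25 letters: no 'w'
digits = [str(d) for d in range(10)]
adverbs = ["again", "now", "please", "soon"]

n_sentences = 4 * 4 * 4 * 25 * 10 * 4
print(n_sentences)  # → 64000 distinct sentences

rng = random.Random(0)
sentence = " ".join(rng.choice(s) for s in
                    [commands, colors, preps, letters, digits, adverbs])
print(sentence)  # a random grammatical utterance, e.g. of the form "place white by t 4 now"
```

The task keyword "white" and the letter/digit slots are exactly the parts the separation/recognition systems are scored on.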

SLIDE 20

Performance – Adapted vs. source-dependent models

[Figure: performance of adapted vs. source-dependent models; condition labels include −3 dB]

SLIDE 21

Experiments – Switchboard

[Figure: spectrogram of a Switchboard mixture]

What about previously unseen speakers?
Switchboard: corpus of conversational telephone speech

200+ hours, 500+ speakers

Task significantly more difficult than Speech Separation Challenge

Spontaneous speech
Large vocabulary
Significant channel variation across calls

SLIDE 22

Switchboard – Results

Adaptation outperforms SD model selection

Model selection errors due to channel variation

SD performance drops off under mismatched conditions
SA performance improves as the number of training speakers increases

SLIDE 23

1. Introduction
2. Speaker subspace model
3. Monaural speech separation
4. Binaural separation (Mixed signal model, Parameter estimation and source separation, Experiments)
5. Conclusions

SLIDE 24

Binaural audition

y^ℓ(t) = Σ_i x_i(t − τ_i^ℓ) ∗ h_i^ℓ(t)
y^r(t) = Σ_i x_i(t − τ_i^r) ∗ h_i^r(t)

Given a stereo recording of multiple sound sources, utilize spatial cues to aid separation:

Interaural time difference (ITD)
Interaural level difference (ILD)
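The two cues can be sketched on a toy binaural pair: ITD as the lag of the cross-correlation peak, ILD as an energy ratio in dB. The delay and gain values are illustrative.

```python
import numpy as np

# Estimate ITD and ILD from a synthetic left/right pair.
rng = np.random.default_rng(3)
x = rng.standard_normal(1000)   # source signal (white noise stand-in)
true_itd = 5                    # samples: right ear delayed by 5
gain = 0.5                      # right ear attenuated (ILD ≈ -6 dB)
left = x
right = gain * np.roll(x, true_itd)

# ITD: lag of the cross-correlation peak
lags = np.arange(-20, 21)
xcorr = [np.dot(left, np.roll(right, -lag)) for lag in lags]
itd_hat = int(lags[int(np.argmax(xcorr))])

# ILD: ratio of energies in dB
ild_hat = 10 * np.log10(np.sum(right**2) / np.sum(left**2))

print(itd_hat)            # → 5
print(round(ild_hat, 1))  # → -6.0
```

Real binaural mixtures complicate both cues with head-related filtering, reverberation, and spatial aliasing at high frequencies, which is what the per-frequency probabilistic models on the next slides address.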

SLIDE 25

MESSL: Interaural model [Mandel and Ellis, 2007]

Model-based EM Source Separation and Localization
Probabilistic model of the interaural spectrogram

Independent of underlying source signals

Assume each time-frequency cell is dominated by a single source
EM algorithm to learn model parameters for each source
Derive probabilistic time-frequency masks for separation
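A minimal sketch of the masking step: model each source's interaural phase difference (IPD) with one Gaussian per source and take the per-cell posterior over sources as a soft mask. This is a single E-step with hand-set, illustrative parameters; the actual MESSL model is considerably richer (per-frequency ITD/ILD models learned by EM).

```python
import numpy as np

# Per-source Gaussians over IPD -> posterior soft masks.
rng = np.random.default_rng(4)
F, T = 8, 10
mu = np.array([0.0, 1.0])   # per-source IPD means (radians), illustrative
sigma = 0.3
# Synthetic observed IPDs: each cell dominated by one of the two sources
dominant = rng.integers(0, 2, size=(F, T))
ipd = mu[dominant] + sigma * rng.standard_normal((F, T))

# Per-source likelihoods -> posterior soft masks (uniform source priors)
lik = np.stack([np.exp(-0.5 * ((ipd - m) / sigma) ** 2) for m in mu])
mask = lik / lik.sum(axis=0)    # masks sum to 1 at every cell

print(np.allclose(mask.sum(axis=0), 1.0))  # → True
# Fraction of cells assigned to the source that actually dominates them
acc = float(np.mean(mask.argmax(axis=0) == dominant))
print(acc)
```

Applying `mask` to the mixture spectrogram gives the separated estimates; in the full EM loop the source means and variances are re-estimated from the masked cells.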

SLIDE 26

MESSL-SP: Source prior

Extend MESSL to include a prior source model
Pre-trained GMM for the speech signals in the mixture
Channel model to compensate for HRTF and reverberation
Can incorporate eigenvoice adaptation (MESSL-EV)

SLIDE 27

Parameter estimation and source separation

SLIDE 28

Experiments

[Figure: reconstructed sources: Ground truth 12.04 dB, DUET 3.84 dB, 2S-FD-BSS 5.41 dB, MESSL 5.66 dB, MESSL-SP 10.01 dB, MESSL-EV 10.37 dB]


Mixtures of 2 and 3 speech sources, anechoic and reverberant
Evaluated on TIMIT and SSC test data
Source models trained on SSC data (32 components)
Compare MESSL systems to:

DUET – Clustering using ILD/ITD histogram [Yilmaz and Rickard, 2004]
2S-FD-BSS – Frequency-domain ICA [Sawada et al., 2007]

SLIDE 29

Experiments – Performance as function of distractor angle

[Figure: SNR improvement vs. distractor separation angle for 2 and 3 sources, anechoic and reverberant]

SLIDE 30

Experiments – Matched vs. mismatched

[Figure: average SNR improvement on GRID and TIMIT for Ground truth, MESSL-EV, MESSL-SP, MESSL, 2S-FD-BSS, and DUET]

SSC – matched train/test speakers

MESSL-EV and MESSL-SP beat the MESSL baseline by ∼3 dB in reverb
MESSL-EV beats MESSL-SP by ∼1 dB on anechoic mixtures

TIMIT – mismatched train/test speakers

Small difference between MESSL-EV and MESSL-SP

SLIDE 31

1. Introduction
2. Speaker subspace model
3. Monaural speech separation
4. Binaural separation
5. Conclusions

SLIDE 32

Summary

Prior signal models for underdetermined source separation
Subspace model for source adaptation

Adapt Gaussian means and covariances using a single utterance
Natural extension to compensate for source-independent channel effects

Monaural separation

Speaker-dependent > speaker-adapted ≫ speaker-independent
Adaptation helps generalize better to held-out speakers
Improves as the number of training speakers increases

Binaural separation

Extend MESSL framework to use source models (joint work with M. Mandel)
Improved performance by incorporating a simple SI model
Smaller improvement with adaptation

SLIDE 33

Contributions

Model-based source separation making minimal assumptions, using subspace adaptation
Extend the model-based approach to binaural separation

Ellis, D. P. W. and Weiss, R. J. (2006). Model-based monaural source separation using a vector-quantized phase-vocoder representation. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages V-957–960.
Weiss, R. J. and Ellis, D. P. W. (2006). Estimating single-channel source separation masks: Relevance vector machine classifiers vs. pitch-based masking. In Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA), pages 31–36.
Weiss, R. J. and Ellis, D. P. W. (2007). Monaural speech separation using source-adapted models. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 114–117.
Weiss, R. J. and Ellis, D. P. W. (2008). Speech separation using speaker-adapted eigenvoice speech models. Computer Speech and Language, in press.
Weiss, R. J., Mandel, M. I., and Ellis, D. P. W. (2008). Source separation based on binaural cues and source model constraints. In Proc. Interspeech, pages 419–422.
Weiss, R. J. and Ellis, D. P. W. (2009). A variational EM algorithm for learning eigenvoice parameters in mixed signals. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

SLIDE 34

References

Cooke, M. and Lee, T.-W. (2006). The speech separation challenge.
Kristjansson, T., Hershey, J., Olsen, P., Rennie, S., and Gopinath, R. (2006). Super-human multi-talker speech recognition: The IBM 2006 speech separation challenge system. In Proc. Interspeech, pages 97–100.
Kuhn, R., Junqua, J., Nguyen, P., and Niedzielski, N. (2000). Rapid speaker adaptation in eigenvoice space. IEEE Transactions on Speech and Audio Processing, 8(6):695–707.
Mandel, M. I. and Ellis, D. P. W. (2007). EM localization and separation using interaural level and phase cues. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
Sawada, H., Araki, S., and Makino, S. (2007). A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
Yilmaz, O. and Rickard, S. (2004). Blind separation of speech mixtures via time-frequency masking. IEEE Transactions on Signal Processing, 52(7):1830–1847.

SLIDE 35

Extra slides

6. Extra slides

SLIDE 36

Factorial HMM separation

Each source signal is characterized by state sequence through its HMM Viterbi algorithm to find maximum likelihood path through combined factorial HMM Reconstruct source signals using Viterbi path Aggressively prune unlikely paths to speed up separation
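The decoding step can be sketched by brute force: run Viterbi over the product state space of two toy HMMs and split the joint path back into per-source state sequences. Model sizes and values are illustrative, and no pruning is applied here (the real system aggressively prunes unlikely paths).

```python
import numpy as np

# Factorial-HMM Viterbi over the product state space of two small HMMs.
np.random.seed(5)
S, T = 3, 6                        # states per source, frames
logA = np.log(np.full((S, S), 0.1) + 0.7 * np.eye(S))  # sticky transitions
logB = np.random.randn(2, S, T)    # per-source per-state frame log-likelihoods

# Product state (i, j): transition and observation scores add in log domain
idx = [(i, j) for i in range(S) for j in range(S)]
n = len(idx)
delta = np.array([logB[0, i, 0] + logB[1, j, 0] for i, j in idx])
back = np.zeros((T, n), dtype=int)
for t in range(1, T):
    # scores[p, q]: best score ending in product state q coming from p
    scores = delta[:, None] + np.array(
        [[logA[pi, i] + logA[pj, j] for i, j in idx] for pi, pj in idx])
    back[t] = scores.argmax(axis=0)
    delta = scores.max(axis=0) + np.array(
        [logB[0, i, t] + logB[1, j, t] for i, j in idx])

# Backtrace the joint path, then split into one state sequence per source
k = int(delta.argmax())
path = [k]
for t in range(T - 1, 0, -1):
    k = int(back[t, k])
    path.insert(0, k)
states1 = [idx[k][0] for k in path]
states2 = [idx[k][1] for k in path]
print(len(states1) == T and len(states2) == T)  # → True
```

The product space grows as S^2 (and S^I for I sources), which is why pruning is essential in practice.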

SLIDE 37

Adaptation algorithm initialization

[Figure: coarsely quantized eigenvoice weights (w1, w2) for male and female speakers]

Fast convergence needs good initialization
Want to differentiate source models to get the best initial separation
Treat each eigenvoice dimension independently:

Coarsely quantize weights
Find the most likely combination in the mixture

SLIDE 38

Adaptation performance

[Figure: average letter-digit accuracy vs. adaptation iteration for different-gender, same-gender, and same-talker mixtures]

Letter-digit accuracy averaged across all TMRs
Adaptation clearly improves separation
Same-talker case is hard: source permutations

SLIDE 39

Variational learning

Approximate EM algorithm to estimate adaptation parameters
Treat each source HMM independently
Introduce variational parameters to couple them

SLIDE 40

Performance – Learning algorithm comparison

Adapting Gaussian covariances as well as means significantly improves performance
Hierarchical algorithm outperforms variational EM
But the variational algorithm is significantly (∼4x) faster
At the same speed, variational EM performs better

SLIDE 41

Performance – Comparison to other participants

SLIDE 42

MESSL-EV: Putting it all together

Big mixture of Gaussians

Interaural model:
ITD: Gaussian for each source and time delay
ILD: single Gaussian for each source

Source model:
Separate channel responses for each source at each ear
Both channels share eigenvoice adaptation parameters

Explain each point in spectrogram by a particular source, time delay, and source model mixture component

SLIDE 43

MESSL-EV example

[Figure: per-cue masks: IPD 0.73 dB, ILD 8.54 dB, SP 7.93 dB, full mask 10.37 dB]

IPD informative in low frequencies, but not in high frequencies
ILD primarily adds information about high frequencies
Source model introduces correlations across frequency and emphasizes reliable time-frequency regions

Helps resolve ambiguities in interaural parameters from reverberation and spatial aliasing

SLIDE 44

Just for fun...