SLIDE 1

GMM-based classification from noisy features

Alexey Ozerov (1), Mathieu Lagrange (2) and Emmanuel Vincent (1)

1st September 2011

(1) INRIA, Centre de Rennes - Bretagne Atlantique; (2) STMS Lab IRCAM - CNRS - UPMC

International Workshop on Machine Listening in Multisource Environments (CHiME 2011), Florence, Italy

SLIDE 2

Outline

• Introduction
• GMM decoding from noisy data
• GMM learning from noisy data
• Experiments
• Conclusions and further work

SLIDE 3

Introduction

Classification from noisy data

Classification from noisy or multi-source audio gives poor performance because of high noise variability.

Pipeline: noisy signal → feature extraction → noisy features → classification → decision

SLIDE 4

State of the art

Signal level: noise suppression or source separation.

Pipeline: noisy signal → source separation → separated signal → feature extraction → noisy features → classification → decision

SLIDE 5

State of the art

Feature level: features robust to additive or convolutive noise, or to errors produced by source separation.

Pipeline: noisy signal → source separation → separated signal → robust feature extraction → noisy features → classification → decision

SLIDE 6

State of the art

Classifier level: classification that accounts for possible distortion of the features, given some information about this distortion.

Pipeline: noisy signal → source separation → separated signal → feature extraction → noisy features → classification → decision

Generative GMM-based classification: the information about feature distortion is the UNCERTAINTY [Cooke01, Barker05, Deng05, Kolossa10]

SLIDE 7

State-of-the-art limits and our contributions

Limit 1: it is assumed that the clean data underlying the noisy observations have been generated by the GMMs [Cooke01, Barker05, Deng05, Kolossa10].

Contribution 1: introduction and investigation of a new data-driven criterion for GMM learning and decoding, as an alternative to the model-driven criterion.

SLIDE 8

State-of-the-art limits and our contributions

Limit 2: uncertainty is taken into account only at the decoding stage, assuming that the GMMs were trained from some clean data [Cooke01, Barker05, Deng05, Kolossa10].

Contribution 2: derivation of two new Expectation-Maximization (EM) algorithms for learning GMMs from noisy data with Gaussian uncertainty, for both criteria considered.

SLIDE 9

Outline

• Introduction
• GMM decoding from noisy data
• GMM learning from noisy data
• Experiments
• Conclusions and further work

SLIDE 10

GMM decoding from noisy data

Uncertainties on the features:

• Binary (each feature is either observed or missing), unknown or known [Cooke01, Barker05]
• Gaussian ("asymptotically" more general), unknown or known [Deng05, Kolossa10]

SLIDE 11

Criteria

Criterion 1: model-driven criterion (likelihood integration) [state of the art]

The GMM likelihood is integrated over the uncertainty distribution of the missing or distorted feature [Deng05, Kolossa10].
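For Gaussian uncertainty, the likelihood-integration criterion has a convenient closed form: integrating a component density N(x; mu_k, Sigma_k) against the uncertainty Gaussian N(x; y, Sigma_u) yields N(y; mu_k, Sigma_k + Sigma_u), i.e., each component covariance is simply inflated by the uncertainty covariance. A minimal NumPy sketch (function names are illustrative, not the paper's code):

```python
import numpy as np

def gauss_pdf(y, m, C):
    """Multivariate normal density N(y; m, C)."""
    d = len(m)
    diff = y - m
    sol = np.linalg.solve(C, diff)
    logdet = np.linalg.slogdet(C)[1]
    return np.exp(-0.5 * (d * np.log(2 * np.pi) + logdet + diff @ sol))

def gmm_loglik_integration(y, Sigma_u, weights, means, covs):
    """Model-driven (likelihood-integration) score of one noisy feature
    vector y with Gaussian uncertainty covariance Sigma_u: each GMM
    component covariance is inflated by Sigma_u."""
    return float(np.log(sum(w * gauss_pdf(y, m, C + Sigma_u)
                            for w, m, C in zip(weights, means, covs))))
```

Classification then picks the class whose GMM maximizes the sum of these scores over frames; setting Sigma_u to zero recovers standard decoding without uncertainty.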

SLIDE 12

Criteria

Criterion 2: data-driven criterion (log-likelihood integration) [proposed]
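The proposed data-driven criterion instead integrates the log-likelihood: a noisy observation is scored by the expectation of log p_GMM(x) under the uncertainty Gaussian N(y, Sigma_u). The sketch below estimates that expectation by Monte Carlo purely for illustration; the function name is an assumption, and an actual system would use analytic approximations rather than sampling:

```python
import numpy as np

def gmm_log_density(x, weights, means, covs):
    """log p_GMM(x) for a batch x of shape (n, d)."""
    dens = np.zeros(len(x))
    for w, m, C in zip(weights, means, covs):
        d = len(m)
        diff = x - m
        sol = np.linalg.solve(C, diff.T).T
        logdet = np.linalg.slogdet(C)[1]
        dens += w * np.exp(-0.5 * (d * np.log(2 * np.pi) + logdet
                                   + np.sum(diff * sol, axis=1)))
    return np.log(dens)

def gmm_loglik_data_driven_mc(y, Sigma_u, weights, means, covs,
                              n_samples=2000, seed=0):
    """Data-driven (log-likelihood integration) score: the mean of
    log p_GMM(x) over samples x drawn from the uncertainty Gaussian
    N(y, Sigma_u). Monte Carlo is used here only for illustration."""
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(np.atleast_1d(y), Sigma_u, size=n_samples)
    return float(np.mean(gmm_log_density(xs, weights, means, covs)))
```

Unlike likelihood integration, this expectation penalizes samples that fall in low-density regions of the GMM, which is what makes the two criteria behave differently under large uncertainty.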

SLIDE 13

Outline

• Introduction
• GMM decoding from noisy data
• GMM learning from noisy data
• Experiments
• Conclusions and further work

SLIDE 14

GMM learning from noisy data

Binary uncertainty: an EM algorithm exists [Ghahramani&Jordan94].

Gaussian uncertainty: we derived two new EM algorithms, one for each of the two criteria considered.
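One way to realize a Gaussian-uncertainty EM under the likelihood-integration criterion is the standard uncertainty-aware update: the E-step computes responsibilities from noise-inflated covariances together with the posterior mean and covariance of the clean feature, and the M-step does moment matching with those posterior statistics. The sketch below follows that generic recipe; the paper's exact updates and approximations may differ:

```python
import numpy as np

def em_gmm_uncertain(Y, Sigmas, K, n_iter=50, seed=0):
    """EM for a GMM learned from noisy features Y (N, d) with per-frame
    Gaussian uncertainty covariances Sigmas (N, d, d). Generic sketch of
    uncertainty-aware EM, not the paper's exact derivation."""
    rng = np.random.default_rng(seed)
    N, d = Y.shape
    w = np.full(K, 1.0 / K)
    mu = Y[rng.choice(N, K, replace=False)].astype(float)
    Sig = np.array([np.cov(Y.T) + 1e-6 * np.eye(d)] * K).reshape(K, d, d)
    for _ in range(n_iter):
        # E-step: responsibilities use noise-inflated covariances, plus
        # the posterior moments of the clean feature underlying Y[n].
        logr = np.empty((N, K))
        xhat = np.empty((N, K, d))
        P = np.empty((N, K, d, d))
        for k in range(K):
            for n in range(N):
                C = Sig[k] + Sigmas[n]
                diff = Y[n] - mu[k]
                sol = np.linalg.solve(C, diff)
                logr[n, k] = (np.log(w[k])
                              - 0.5 * (np.linalg.slogdet(2 * np.pi * C)[1]
                                       + diff @ sol))
                G = Sig[k] @ np.linalg.inv(C)    # Wiener-like gain
                xhat[n, k] = mu[k] + G @ diff    # posterior mean of clean x
                P[n, k] = Sig[k] - G @ Sig[k]    # posterior covariance
        logr -= logr.max(axis=1, keepdims=True)
        g = np.exp(logr)
        g /= g.sum(axis=1, keepdims=True)
        # M-step: moment matching with the posterior statistics.
        Nk = g.sum(axis=0)
        w = Nk / N
        for k in range(K):
            mu[k] = (g[:, k, None] * xhat[:, k]).sum(0) / Nk[k]
            dev = xhat[:, k] - mu[k]
            Sig[k] = ((g[:, k, None, None]
                       * (P[:, k] + dev[:, :, None] * dev[:, None, :])).sum(0)
                      / Nk[k]) + 1e-9 * np.eye(d)
    return w, mu, Sig
```

With all uncertainty covariances set to zero, the posterior mean reduces to the observation itself and the updates collapse to ordinary EM for GMMs, which is the sanity check the binary/clean special cases suggest.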

SLIDE 15

GMM learning from noisy data

Some approximations were needed. The resulting algorithm "asymptotically" generalizes the binary-uncertainty EM [Ghahramani&Jordan94].

SLIDE 16

Outline

• Introduction
• GMM decoding from noisy data
• GMM learning from noisy data
• Experiments
• Conclusions and further work

SLIDE 17

Artificial uncertainty

Artificial uncertainty:

• gives us the possibility to control some characteristics of the uncertainty;
• allows us to leave the following situations for further work: realistic feature-corrupting noise, estimated uncertainty covariances.

1. … is drawn from a Gaussian
2. … is drawn from …
SLIDE 18

Characteristics of the uncertainty

• Feature-to-Noise Ratio (FNR), in dB
• Noise Variation Level (NVL), in dB
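The slide names the two control parameters without their formulas, so the generator below is only a plausible reading: FNR sets the average noise variance relative to the feature power, and NVL sets a dB-scale spread of the per-frame noise variances. Every definition here is an assumption, not the paper's:

```python
import numpy as np

def add_artificial_uncertainty(X, fnr_db, nvl_db, rng=None):
    """Corrupt clean features X (N, d) with zero-mean Gaussian noise whose
    per-entry variance is itself random. Hypothetical construction: FNR
    (dB) fixes the mean noise variance relative to the feature power,
    NVL (dB) fixes how widely the variances fluctuate around that mean."""
    rng = np.random.default_rng() if rng is None else rng
    N, d = X.shape
    feat_power = np.mean(X ** 2)
    base_var = feat_power / (10 ** (fnr_db / 10))   # mean noise variance
    # dB-domain fluctuation of the variance, spread controlled by NVL
    log_spread = rng.uniform(-nvl_db, nvl_db, size=(N, d))
    var = base_var * 10 ** (log_spread / 10)
    Y = X + rng.normal(0.0, np.sqrt(var))
    return Y, var   # noisy features and their (known) uncertainty variances
```

With NVL = 0 dB every frame gets the same noise variance; raising NVL makes the uncertainty heterogeneous across frames, which is the regime where uncertainty-aware decoding is expected to pay off.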

SLIDE 19

Evaluated setups

All possible combinations: 375 setups.

SLIDE 20

Artificial data

[Figure: 2-D scatter plots of the GMMs used for clean data generation (classes 1-3), the clean data, and the noisy data at (NVL = 0 dB, FNR = 10 dB) and (NVL = 8 dB, FNR = 10 dB).]

SLIDE 21

Real data

Speaker recognition task; the setting is quite similar to [Reynolds95]:

• TIMIT database, 10 male speakers
• 16-state GMMs
• feature space dimension = 20

Differences with [Reynolds95]:

• features: logarithms of Mel-Frequency Filter-Bank outputs (LMFFB) instead of MFCCs
• GMMs with full covariance matrices
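For context, LMFFB features are the mel filter-bank log-energies without the final DCT that would turn them into MFCCs, which keeps them in a domain where feature distortions are easier to model. A plausible single-frame sketch (sample rate, FFT size, and analysis details are assumptions; 20 filters matches the stated feature dimension):

```python
import numpy as np

def lmffb(frame, sr=16000, n_filt=20, n_fft=512):
    """Log Mel-Frequency Filter-Bank (LMFFB) vector for one windowed
    frame: triangular filters spaced uniformly on the mel scale, applied
    to the power spectrum, then a log. Illustrative parameters only."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Filter edge frequencies, uniform on the mel scale
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filt + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(n_filt):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    return np.log(fbank @ power + 1e-10)   # one n_filt-dim LMFFB vector
```

Skipping the MFCC-style DCT also leaves the coefficients correlated, which is one reason full covariance matrices (the other difference with [Reynolds95]) are a natural choice here.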

SLIDE 22

Artificial data results

[Figure: correct classification rate vs. FNR in test (NVL train = NVL test = 0 dB) and vs. NVL in test (FNR train = FNR test = -10 dB), comparing likelihood integration, log-likelihood integration, and no-uncertainty decoding under matched and mismatched training conditions.]

SLIDE 23

Artificial data

[Figure: artificial data scatter plots, repeated from Slide 21.]

SLIDE 24

Real data results

[Figure: correct classification rate vs. FNR in test (NVL train = NVL test = 0 dB) and vs. NVL in test (FNR train = FNR test = 0 dB) on the speaker recognition task, for the same three decoding methods.]

SLIDE 25

Outline

• Introduction
• GMM decoding from noisy data
• GMM learning from noisy data
• Experiments
• Conclusions and further work

SLIDE 26

Conclusions and further work

Conclusions

• We validate the model-driven uncertainty decoding approach as compared to a data-driven approach.
• We show that considering the uncertainty allows us to:
  - handle the heterogeneity of noise between the training and testing sets,
  - exploit the variability of noise for improved performance.

Further work

• Considering realistic feature-corrupting noise and the estimation of uncertainty covariances.
• Considering the log-likelihood integration within a GMM-based classification framework with discriminative training.

SLIDE 27

References

• [Cooke01] M. Cooke, "Robust automatic speech recognition with missing and unreliable acoustic data," Speech Communication, vol. 34, no. 3, pp. 267–285, Jun. 2001.
• [Barker05] J. Barker, M. Cooke, and D. Ellis, "Decoding speech in the presence of other sources," Speech Communication, vol. 45, no. 1, pp. 5–25, Jan. 2005.
• [Deng05] L. Deng, J. Droppo, and A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 3, pp. 412–421, May 2005.
• [Kolossa10] D. Kolossa, R. Fernandez Astudillo, E. Hoffmann, and R. Orglmeister, "Independent component analysis and time-frequency masking for speech recognition in multitalker conditions," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2010, pp. 1–14, 2010.
• [Ghahramani&Jordan94] Z. Ghahramani and M. Jordan, "Supervised learning from incomplete data via an EM approach," in Advances in Neural Information Processing Systems, 1994, pp. 120–127.
• [Reynolds95] D. Reynolds, "Large population speaker identification using clean and telephone speech," IEEE Signal Processing Letters, vol. 2, no. 3, pp. 46–48, Mar. 1995.