Iterative Bayesian and MMSE-based noise compensation techniques for speaker recognition in the i-vector space



slide-1
SLIDE 1

Iterative Bayesian and MMSE-based noise compensation techniques for speaker recognition in the i-vector space

Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre

LIA laboratory, University of Avignon

Odyssey, 2016

1/27

slide-2
SLIDE 2

Outline

1 Introduction

  • State of the art speaker recognition systems
  • Dealing with noise in speaker recognition systems

2 I-vector denoising using I-MAP and the Kabsch algorithm

  • Motivation
  • The I-MAP denoising procedure
  • The Kabsch algorithm

3 Experimental protocol and results

  • Experimental protocol
  • Results

2/27

slide-4
SLIDE 4

State of the art speaker recognition systems

Structure of a speaker recognition system

4/27

slide-6
SLIDE 6

Dealing with noise in speaker recognition systems

Many techniques can be used to deal with noise:

  • Speech enhancement techniques
  • Feature compensation (VTS, SPLICE, ...)
  • Model compensation (PMC, ...)
  • Noise-robust scoring (multi-style training)
  • DNN-based techniques (robust feature extraction, robust statistics computation, ...)

6/27

slide-8
SLIDE 8

Motivation

Motivation

  • Cleaning i-vectors estimated over noisy data (noisy i-vectors).
  • Using a clean front-end (same i-vector extraction procedure for all noises).
  • Using a clean backend (same scoring procedure for all noises).

8/27

slide-10
SLIDE 10

The I-MAP denoising procedure

I-MAP model

The I-MAP procedure is based on the relationship:

$$N = Y - X \tag{1}$$

where X and Y are two random variables representing clean and noisy i-vectors, respectively, and N represents the noise.

Hypothesis

Full-covariance Gaussian distributions are used for:

  • Clean i-vectors: $d_X \sim \mathcal{N}(X; \mu_X, \Sigma_X)$
  • Noise in the i-vector space: $d_N \sim \mathcal{N}(N; \mu_N, \Sigma_N)$
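As a rough illustration of how these Gaussian parameters could be estimated, the sketch below assumes that paired clean and noisy training i-vectors are available (one row per session), so that relationship (1) can be applied row by row. The function names and the array layout are assumptions made for this example and do not come from the slides.

```python
import numpy as np


def fit_gaussian(vectors):
    """Return the sample mean and full covariance of the rows of `vectors`."""
    mean = vectors.mean(axis=0)
    cov = np.cov(vectors, rowvar=False)  # rows are observations
    return mean, cov


def estimate_imap_parameters(clean, noisy):
    """Estimate (mu_X, Sigma_X, mu_N, Sigma_N) from paired (n, dim) arrays.

    Assumption: row i of `noisy` is the noisy version of row i of `clean`.
    """
    noise = noisy - clean  # N = Y - X, applied to every pair
    mu_x, sigma_x = fit_gaussian(clean)
    mu_n, sigma_n = fit_gaussian(noise)
    return mu_x, sigma_x, mu_n, sigma_n
```

In practice the covariance estimates need many more training pairs than i-vector dimensions to be well conditioned.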

10/27

slide-11
SLIDE 11

The I-MAP denoising procedure

Solution

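It is possible to write the cleaned-up version $\hat{X}_0$ of a noisy i-vector $Y_0$ using the MAP criterion as:

$$\hat{X}_0 = \left(\Sigma_N^{-1} + \Sigma_X^{-1}\right)^{-1} \left(\Sigma_N^{-1}(Y_0 - \mu_N) + \Sigma_X^{-1}\mu_X\right) \tag{2}$$

with:

  • Clean i-vectors: $d_X \sim \mathcal{N}(X; \mu_X, \Sigma_X)$
  • Noise in the i-vector space: $d_N \sim \mathcal{N}(N; \mu_N, \Sigma_N)$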
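The MAP estimate (2) maps directly onto a few lines of linear algebra. The following is a minimal NumPy sketch, assuming the four Gaussian parameters have already been estimated and that both covariance matrices are invertible. The function name `imap_denoise` is hypothetical.

```python
import numpy as np


def imap_denoise(y0, mu_x, sigma_x, mu_n, sigma_n):
    """Return the MAP estimate of the clean i-vector for one noisy i-vector y0."""
    prec_x = np.linalg.inv(sigma_x)  # Sigma_X^{-1}
    prec_n = np.linalg.inv(sigma_n)  # Sigma_N^{-1}
    rhs = prec_n @ (y0 - mu_n) + prec_x @ mu_x
    # Solve (Sigma_N^{-1} + Sigma_X^{-1}) x = rhs rather than inverting the sum.
    return np.linalg.solve(prec_n + prec_x, rhs)
```

For example, with zero means and identity covariances for both distributions, the estimate is simply half of the noisy i-vector, since the prior and the noise model are weighted equally.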

11/27

slide-12
SLIDE 12

The I-MAP denoising procedure

Implementation

12/27

slide-13
SLIDE 13

The I-MAP denoising procedure

How to improve I-MAP?

Problem: I-MAP cannot be applied iteratively to noisy test data, because the residual noise is not guaranteed to be Gaussian.

Solution: We propose to complement this technique with another MMSE-based approach that uses the Kabsch algorithm.

13/27

slide-15
SLIDE 15

The Kabsch algorithm

Goal

The Kabsch algorithm finds the best translation vector and rotation matrix between two paired sets of points $\{x_i\}_{i=1..n}$ and $\{y_i\}_{i=1..n}$.

Example

15/27

slide-16
SLIDE 16

The Kabsch algorithm

Formulation

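Given two sets of paired points $\{x_i\}_{i=1..n}$ and $\{y_i\}_{i=1..n}$ represented as matrices ($P_X$ and $P_Y$):

$$P_X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,M} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,M} \\ \vdots & \vdots & & \vdots \\ x_{N,1} & x_{N,2} & \cdots & x_{N,M} \end{pmatrix} \qquad P_Y = \begin{pmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,M} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,M} \\ \vdots & \vdots & & \vdots \\ y_{N,1} & y_{N,2} & \cdots & y_{N,M} \end{pmatrix}$$

The orthogonal Procrustes problem aims at finding the best orthogonal matrix R that maps $P_X$ to $P_Y$ according to:

$$R = \underset{R}{\operatorname{argmin}} \, \left\| R P_Y - P_X \right\|_F \tag{3}$$

where $R^T R = I_N$ and $\|\cdot\|_F$ denotes the Frobenius norm.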

16/27

slide-17
SLIDE 17

The Kabsch algorithm

Step 1: Translation of the two sets of points:

1 Computing the centroids of the clean and noisy sets of i-vectors:

  • $\bar{P}_X = \mathrm{centroid}(P_X)$
  • $\bar{P}_Y = \mathrm{centroid}(P_Y)$

2 Centering all points of $P_X$ and $P_Y$ around the origin of the coordinate system:

  • $\tilde{P}_{X_i} = P_{X_i} - \bar{P}_X$ for each row $P_{X_i}$ of $P_X$.
  • $\tilde{P}_{Y_i} = P_{Y_i} - \bar{P}_Y$ for each row $P_{Y_i}$ of $P_Y$.
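The translation step can be sketched in NumPy as follows, assuming each set is stored as an array with one i-vector per row. The helper name `center_points` is illustrative only.

```python
import numpy as np


def center_points(points):
    """Return (centered_points, centroid) for an (n_points, dim) array."""
    centroid = points.mean(axis=0)      # mean over rows, one value per dimension
    return points - centroid, centroid  # broadcasting subtracts it from every row
```

The centroid is returned alongside the centered points because both centroids are needed again when the transformation is applied to test data.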

17/27

slide-18
SLIDE 18

The Kabsch algorithm

Step 2: Estimation of the rotation matrix:

1 Estimation of a covariance matrix: $A = \tilde{P}_X^T \tilde{P}_Y$

2 SVD decomposition of A: $A = V S W^T$

3 Computing $d = \mathrm{sign}(\det(W V^T))$

4 Estimation of the rotation matrix R as:

$$R = W \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & 1 & 0 \\ 0 & \cdots & 0 & d \end{pmatrix} V^T \tag{4}$$
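The four sub-steps above can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' code: the function name `kabsch_rotation` and the one-point-per-row layout are assumptions. With this layout, and with R acting on column vectors, the returned matrix rotates each centered clean point toward its noisy counterpart. Because R is orthogonal, its transpose gives the opposite direction.

```python
import numpy as np


def kabsch_rotation(px_centered, py_centered):
    """Rotation matrix of Eq. (4) from two centered, paired (n, dim) arrays."""
    a = px_centered.T @ py_centered   # (dim, dim) cross-covariance matrix A
    v, _, w_t = np.linalg.svd(a)      # A = V S W^T; NumPy returns W^T directly
    w = w_t.T
    # d = +1 or -1; forcing det(R) = +1 yields a rotation, not a reflection.
    d = 1.0 if np.linalg.det(w @ v.T) >= 0 else -1.0
    correction = np.eye(a.shape[0])
    correction[-1, -1] = d
    return w @ correction @ v.T
```

A quick sanity check is to rotate a centered point cloud by a known rotation and confirm that the function recovers it.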

18/27

slide-19
SLIDE 19

The Kabsch algorithm

Step 3: Application of the rotation on test data:

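Given a set of noisy test i-vectors $\{t_i\}_{i=1..N}$:

1 Centering test i-vectors:

$\tilde{t}_i = t_i - \bar{P}_Y$ for all $i = 1..N$, where $\bar{P}_Y$ is the centroid of $P_Y$.

2 Rotating test i-vectors:

$\hat{t}_i = R\,\tilde{t}_i + \bar{P}_X$ for all $i = 1..N$, where $\bar{P}_X$ is the centroid of $P_X$.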
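A minimal sketch of these two operations, assuming one test i-vector per row and assuming that `rotation` is already oriented to take centered noisy vectors toward the clean set (for an orthogonal matrix estimated in the opposite direction, that is its transpose). The function and argument names are illustrative.

```python
import numpy as np


def apply_kabsch(test_vectors, rotation, noisy_centroid, clean_centroid):
    """Center noisy test i-vectors, rotate them, and shift to the clean centroid."""
    centered = test_vectors - noisy_centroid  # subtract the noisy-set centroid
    # Row-wise form of R @ t: for rows stored in a matrix T, compute T @ R^T.
    return centered @ rotation.T + clean_centroid
```

Storing vectors as rows lets the whole test set be transformed with a single matrix product instead of a loop over i.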

19/27

slide-21
SLIDE 21

Experimental protocol

Used data

  • Train: NIST SRE 2004, 2005, 2006, Switchboard.
  • Test: NIST SRE 2008 (det7 condition: all trials involve only English-language telephone speech in training and test).

SR system

  • Gender-dependent GMM-UBM with 512 components.
  • Low-rank T matrix (rank 400).
  • Two-covariance scoring.

21/27

slide-23
SLIDE 23

Recognition performance using the Kabsch algorithm

Recognition performance on male data in different test conditions using clean enrollment and noisy test data

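EER (%)

| Noise | SNR | Baseline | Kabsch | I-MAP | I-MAP + Kabsch (1 iteration) | I-MAP + Kabsch (2 iterations) |
|---|---|---|---|---|---|---|
| Air-cooling | 0 dB | 26.85 | 17.18 | 13.21 | 8.86 | 7.24 |
| Air-cooling | 5 dB | 15.21 | 10.34 | 7.25 | 4.71 | 3.89 |
| Air-cooling | 10 dB | 9.51 | 5.70 | 4.85 | 2.94 | 2.55 |
| Air-cooling | 15 dB | 5.41 | 3.40 | 2.85 | 1.82 | 1.63 |
| Car-driving | 0 dB | 25.54 | 15.83 | 12.05 | 7.91 | 6.37 |
| Car-driving | 5 dB | 14.54 | 9.30 | 6.65 | 3.63 | 3.04 |
| Car-driving | 10 dB | 8.32 | 5.15 | 3.78 | 1.99 | 1.82 |
| Car-driving | 15 dB | 4.82 | 3.22 | 2.36 | 1.79 | 1.65 |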

  • I-MAP: 40% to 60% relative EER improvement.
  • Kabsch: up to 45% relative EER improvement.
  • I-MAP + Kabsch: up to 85% relative EER improvement.

23/27

slide-24
SLIDE 24

Recognition performance using the Kabsch algorithm

Recognition performance on female data in different test conditions using clean enrollment and noisy test data

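EER (%)

| Noise | SNR | Baseline | Kabsch | I-MAP | I-MAP + Kabsch (1 iteration) | I-MAP + Kabsch (2 iterations) |
|---|---|---|---|---|---|---|
| Air-cooling | 0 dB | 27.19 | 16.95 | 13.53 | 10.80 | 9.49 |
| Air-cooling | 5 dB | 16.77 | 10.45 | 8.34 | 6.66 | 5.85 |
| Air-cooling | 10 dB | 9.01 | 5.61 | 4.48 | 3.58 | 3.14 |
| Air-cooling | 15 dB | 6.42 | 4.00 | 3.19 | 2.75 | 2.70 |
| Car-driving | 0 dB | 24.82 | 15.47 | 12.35 | 9.86 | 8.66 |
| Car-driving | 5 dB | 14.90 | 9.28 | 7.41 | 5.92 | 5.20 |
| Car-driving | 10 dB | 8.65 | 5.39 | 4.30 | 3.43 | 3.02 |
| Car-driving | 15 dB | 5.89 | 3.67 | 3.12 | 2.95 | 2.74 |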

  • I-MAP: 40% to 60% relative EER improvement.
  • Kabsch: up to 45% relative EER improvement.
  • I-MAP + Kabsch: up to 85% relative EER improvement.

24/27

slide-25
SLIDE 25

Recognition performance using the Kabsch algorithm

Performance comparison in a heterogeneous setup for male and female data

EER (%)

| System | Male | Female |
|---|---|---|
| Baseline | 29.65 | 31.02 |
| Kabsch | 18.78 | 19.95 |
| I-MAP | 16.27 | 17.46 |
| I-MAP + Kabsch (1 iter.) | 8.67 | 10.62 |
| I-MAP + Kabsch (2 iter.) | 7.39 | 9.28 |
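To relate these EER values to relative improvements, the short sketch below assumes the usual definition: the EER reduction divided by the baseline EER, expressed as a percentage. The function name is illustrative, and the numbers are the male heterogeneous-setup results listed above.

```python
def relative_eer_improvement(baseline_eer, system_eer):
    """Relative EER reduction with respect to the baseline, in percent."""
    return 100.0 * (baseline_eer - system_eer) / baseline_eer


# Male results in the heterogeneous setup (EER in %), baseline = 29.65.
male_baseline = 29.65
male_systems = {
    "Kabsch": 18.78,
    "I-MAP": 16.27,
    "I-MAP + Kabsch (1 iter.)": 8.67,
    "I-MAP + Kabsch (2 iter.)": 7.39,
}
for name, eer in male_systems.items():
    gain = relative_eer_improvement(male_baseline, eer)
    print(f"{name}: {gain:.1f}% relative EER improvement")
```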

25/27

slide-26
SLIDE 26

Summary

  • Using I-MAP yields a 40% to 60% relative EER improvement over the baseline system.
  • Using the Kabsch algorithm yields up to 45% relative EER improvement over the baseline system.
  • Combining the two algorithms iteratively achieves better results with the same training data, reaching up to 85% relative EER improvement.

26/27

slide-27
SLIDE 27

References I

Waad Ben Kheder et al. "Additive noise compensation in the I-vector space for speaker recognition." 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015.

Wolfgang Kabsch. "A solution for the best rotation to relate two sets of vectors." Acta Crystallographica 32:922.

27/27