

SLIDE 1

Constrained discriminative speaker verification specific to normalized i-vectors

P.M. Bousquet, J.F. Bonastre

LIA, University of Avignon

June 21, 2016

P.M. Bousquet, J.F. Bonastre (LIA) · Odyssey 2016 · June 21, 2016 · 1 / 26

SLIDE 2

Discriminative approach for i-vector: SoA


[Block diagram of the state-of-the-art pipeline: i-vector → normalization (centering and scaling by the within-class covariance matrix W, then length normalization) → Gaussian-PLDA modelling with parameters (µ, Φ, Λ) → discriminative classifier, logistic regression-based (SoA), trained either on the score coefficients or on the PLDA parameters (µ, Φ, Λ) → LLR score]

SLIDE 3

Discriminative approach for i-vector: proposed ...


[Same block diagram, with the proposed contributions added: an additional normalization procedure (intended to constrain the discriminative training) inserted after the standard normalization (centering/scaling by W, length normalization); then Gaussian-PLDA modelling (µ, Φ, Λ); then, as discriminative classifier, either a constrained version of the logistic regression-based SoA (limited number of coefficients to optimize) or a new approach, the orthonormal discriminative classifier; output is the LLR score]

SLIDE 4

Gaussian-PLDA

Model: a d-dimensional i-vector w can be decomposed as:

w = µ + Φ y_s + ε   (1)

  • Φ y_s and ε are assumed to be statistically independent, and ε follows a centered Gaussian distribution with full covariance matrix Λ.
  • The speaker factor y_s can be a full-rank d-vector (two-covariance model), or Φ y_s can be constrained to lie in the r-dimensional range of the d × r matrix Φ (eigenvoice subspace).

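As a concrete illustration of the generative model in Eq. (1) (not part of the slides; dimensions and parameter values are arbitrary assumptions), one could sample synthetic i-vectors of one speaker like this:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 10  # i-vector dimension and eigenvoice rank (illustrative values)

mu = rng.normal(size=d)                     # global mean µ
Phi = rng.normal(size=(d, r)) / np.sqrt(r)  # eigenvoice matrix Φ (d x r)
A = rng.normal(size=(d, d)) / np.sqrt(d)
Lambda = A @ A.T + 0.1 * np.eye(d)          # full residual covariance Λ

def sample_speaker_ivectors(n_sessions):
    """w = mu + Phi @ ys + eps: the speaker factor ys is shared by all sessions."""
    ys = rng.normal(size=r)
    eps = rng.multivariate_normal(np.zeros(d), Lambda, size=n_sessions)
    return mu + Phi @ ys + eps              # broadcasts over sessions

W = sample_speaker_ivectors(3)
print(W.shape)  # (3, 50)
```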

SLIDE 5

Gaussian-PLDA scoring

Closed-form solution of the LLR score: a second-degree polynomial function of the components of w_i and w_j:

s_{i,j} = log [ P(w_i, w_j | H_tar) / P(w_i, w_j | H_non) ]
        = w_i^t P w_j + (1/2) (w_i^t Q w_i + w_j^t Q w_j) − µ^t (P + Q)(w_i + w_j) + µ^t (P + Q) µ + (1/2) log |A_t| − log |A_n|   (2)

where

P = Λ^{-1} Φ (2 Φ^t Λ^{-1} Φ + I_r)^{-1} Φ^t Λ^{-1}
Q = P − Λ^{-1} Φ (Φ^t Λ^{-1} Φ + I_r)^{-1} Φ^t Λ^{-1}
A_t = (2 Φ^t Λ^{-1} Φ + I_r)^{-1}
A_n = (Φ^t Λ^{-1} Φ + I_r)^{-1}   (3)

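A direct NumPy transcription of Eqs. (2) and (3) might look as follows (a sketch: the score matrices are recomputed per call and no numerical safeguards are included):

```python
import numpy as np

def gplda_llr(wi, wj, mu, Phi, Lambda):
    """LLR score of a trial (wi, wj) under G-PLDA, following Eqs. (2)-(3)."""
    d, r = Phi.shape
    Li = np.linalg.inv(Lambda)             # Λ^-1
    G = Phi.T @ Li @ Phi                   # Φ^t Λ^-1 Φ
    At = np.linalg.inv(2 * G + np.eye(r))
    An = np.linalg.inv(G + np.eye(r))
    P = Li @ Phi @ At @ Phi.T @ Li
    Q = P - Li @ Phi @ An @ Phi.T @ Li
    return (wi @ P @ wj
            + 0.5 * (wi @ Q @ wi + wj @ Q @ wj)
            - mu @ (P + Q) @ (wi + wj)
            + mu @ (P + Q) @ mu
            + 0.5 * np.linalg.slogdet(At)[1]
            - np.linalg.slogdet(An)[1])

# The score is symmetric in the two i-vectors (P and Q are symmetric):
rng = np.random.default_rng(0)
d, r = 20, 5
Phi = rng.normal(size=(d, r)); Lambda = np.eye(d); mu = rng.normal(size=d)
wi, wj = rng.normal(size=d), rng.normal(size=d)
print(np.isclose(gplda_llr(wi, wj, mu, Phi, Lambda),
                 gplda_llr(wj, wi, mu, Phi, Lambda)))  # True
```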

SLIDE 6

Discriminative classifiers for speaker recognition

SoA: based on logistic regression

Given a dataset of target and non-target trials χ_tar, χ_non with cardinalities N_tar, N_non respectively, the probability of correctly classifying all training trials (whose log is the total cross-entropy) is:

TCE = ∏_{t ∈ χ_non} P(H_non | t)^{1/N_non} · ∏_{t ∈ χ_tar} P(H_tar | t)^{1/N_tar}   (4)

Goal: maximize the (log-)TCE by gradient descent with respect to some coefficients:
  • the PLDA LLR-score coefficients (i.e. of the score matrices P and Q): the LLR score can be written as a dot product ϕ_{i,j} · ω between an expanded vector of a trial ϕ_{i,j} and a vector ω initialized with the PLDA parameters [Burget et al., 2011];
  • the PLDA parameters (µ, Φ, Λ) [Borgström and McCree, 2013]

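In log form, and with posteriors obtained from the LLR scores by a sigmoid (an equal-prior simplification of the calibrated posteriors actually used in logistic regression DT), the objective of Eq. (4) can be sketched as:

```python
import numpy as np

def log_tce(scores_tar, scores_non):
    """log-TCE of Eq. (4): mean log-probability of correct classification,
    weighted 1/Ntar over targets and 1/Nnon over non-targets."""
    s_tar = np.asarray(scores_tar, dtype=float)
    s_non = np.asarray(scores_non, dtype=float)
    log_sigmoid = lambda s: -np.logaddexp(0.0, -s)  # numerically stable log sigmoid
    return log_sigmoid(s_tar).mean() + log_sigmoid(-s_non).mean()

# Well-separated scores give a log-TCE close to 0 (probabilities close to 1)
good = log_tce([8.0, 9.0], [-8.0, -9.0])
bad = log_tce([0.5, -0.2], [0.3, -0.1])
print(good > bad)  # True
```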

SLIDE 7

Discriminative classifiers for speaker recognition

Difficulties to overcome

Discriminative training (DT) can suffer from various limitations:
  • data insufficiency
  • over-fitting on development data
  • respect of metaparameter conditions: definiteness, positivity/negativity of the PLDA LLR-score covariance matrices ...

SLIDE 8

Discriminative classifiers for speaker recognition

Difficulties to overcome

Discriminative training (DT) can suffer from various limitations:
  • data insufficiency
  • over-fitting on development data
  • respect of metaparameter conditions: definiteness, positivity/negativity of the PLDA LLR-score covariance matrices ...

Constrained DT: training only a small number of parameters ⇒ order O(d), or even O(1), instead of O(d²). Some solutions [Rohdin et al., 2016, Borgström and McCree, 2013]:
  • a single coefficient optimized for each dimension of the i-vector or, even, for each of the four feature kinds that make up the score
  • only the mean vector µ and the eigenvalues of the PLDA matrices ΦΦ^t and Λ are trained by DT, or even their scaling factors only
  • metaparameter conditions: working with the singular value decomposition of P and Q / flooring of parameters

SLIDE 9

Discriminative classifiers for speaker recognition

Difficulties to overcome

Discriminative training (DT) can suffer from various limitations:
  • data insufficiency
  • over-fitting on development data
  • respect of metaparameter conditions: definiteness, positivity/negativity of the PLDA LLR-score covariance matrices ...

DT struggles to improve speaker detection when the i-vectors have first been normalized, whereas normalization has proven to achieve the best performance in speaker verification.

SLIDE 10

Normalization step

[Roadmap diagram, as on slide 3, with the normalization step highlighted: standard normalization (centering/scaling by W, length normalization) plus the proposed additional normalization procedure, followed by Gaussian-PLDA modelling (µ, Φ, Λ), the discriminative classifier and the LLR score]

SLIDE 11

Normalization step

Within-class covariance matrix W (centering and scaling) + length normalization
⇒ W is almost exactly isotropic, i.e. W ≈ σI, σ > 0

SLIDE 12

Normalization step

Within-class covariance matrix W (centering and scaling), length normalization.

Proposed: an additional normalization step (which does not modify distances between i-vectors): rotation by the eigenvector basis of the between-class covariance matrix B of the training dataset:

B = P∆P^t (eigendecomposition),   w ← P^t w

SLIDE 13

Normalization step

Within-class covariance matrix W (centering and scaling), length normalization. Proposed: an additional normalization step (which does not modify distances between i-vectors): rotation by the eigenvector basis of the between-class covariance matrix B of the training dataset.
⇒ B is diagonal,
⇒ W remains almost exactly isotropic (and therefore diagonal), since the B-eigenvector basis is orthogonal.
Assumptions: the PLDA matrices ΦΦ^t and Λ become almost diagonal, and Λ even isotropic (as a consequence, the score matrices P and Q are almost diagonal).

SLIDE 14

Normalization step

Within-class covariance matrix W (centering and scaling), length normalization. Proposed: an additional normalization step (which does not modify distances between i-vectors): rotation by the eigenvector basis of the between-class covariance matrix B of the training dataset.
Moreover, W^{-1}B ≈ B ⇒ the LDA solution can be identified as the subspace of the first r eigenvectors of B: the first r components of the training i-vectors are approximately their projection onto the LDA r-subspace.

SLIDE 15

Normalization step

The score can be rewritten as a sum of O(r) terms:

s_{i,j} = Σ_{k=1..r} [ p_k w_{i,k} w_{j,k} + (1/2) q_k (w_{i,k}² + w_{j,k}²) − (p_k + q_k) µ_k (w_{i,k} + w_{j,k}) ] + res_{i,j}   (5)

where
  • r is the rank of the PLDA eigenvoice subspace,
  • res_{i,j} sums all the diagonal terms beyond the r-th dimension, all the off-diagonal terms and the offsets.

Thus, we assume that the major proportion of the variability of the LLR score is contained in the first r terms of the sum above (the residual term is negligible).

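Under the diagonality assumptions, Eq. (5) needs only the diagonals p, q of the score matrices and the first r components of the i-vectors. A vectorized sketch (p, q and mu are assumed given, e.g. the diagonals of P, Q and the mean after B-rotation):

```python
import numpy as np

def simplified_plda_score(wi, wj, p, q, mu, r):
    """First r terms of Eq. (5); the residual res_ij is dropped."""
    wi, wj = wi[:r], wj[:r]
    p, q, mu = p[:r], q[:r], mu[:r]
    return np.sum(p * wi * wj
                  + 0.5 * q * (wi ** 2 + wj ** 2)
                  - (p + q) * mu * (wi + wj))

rng = np.random.default_rng(2)
d, r = 12, 5
wi, wj = rng.normal(size=d), rng.normal(size=d)
p, q, mu = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
s = simplified_plda_score(wi, wj, p, q, mu, r)
print(np.isclose(s, simplified_plda_score(wj, wi, p, q, mu, r)))  # True: trial score is symmetric
```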

SLIDE 16

Normalization step

Table: Analysis of PLDA parameters before and after the B-rotation additional normalization procedure.

                                               before           after
                                            male   female   male   female
Diagonality of:
  PLDA eigenvoice subspace ΦΦ^t             0.23   0.15     0.95   0.97
  PLDA score matrix P                       0.48   0.25     0.98   0.96
  PLDA score matrix Q                       0.41   0.23     0.96   0.97
Isotropy of PLDA nuisance variability Λ     0.98   0.96     0.99   0.97
Residual variance                           0.29   0.42     0.004  0.004

Measures used:
  • diagonality of the symmetric matrix ΦΦ^t (and similarly P, Q): Tr(diag(ΦΦ^t)²) / Tr((ΦΦ^t)²) ∈ [0, 1]
  • isotropy of Λ: d · m_Λ² / Tr(Λ²) ∈ [0, 1], where m_Λ denotes the mean value of the Λ-diagonal
  • variance of the residual term: var(res) / var(score) ∈ [0, 1]
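The three diagnostics of the table can be computed as follows (a sketch, following the definitions above):

```python
import numpy as np

def diagonality(M):
    """Tr(diag(M)^2) / Tr(M^2); equals 1 iff the symmetric matrix M is diagonal."""
    return np.sum(np.diag(M) ** 2) / np.sum(M * M)

def isotropy(L):
    """d * mean(diag(L))^2 / Tr(L^2); equals 1 iff L = sigma * I."""
    d = L.shape[0]
    return d * np.mean(np.diag(L)) ** 2 / np.trace(L @ L)

print(diagonality(np.diag([1.0, 2.0, 3.0])))  # 1.0
print(isotropy(2.5 * np.eye(4)))              # 1.0
```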

SLIDE 17

Normalization step

Comparison of the PLDA baseline and the proposed simplified score (without the residual term).

Table: Speaker recognition results on NIST-SRE 2010 telephone extended (det 5 ext), with BUT i-vectors 2011 (*).

Male set
Method                       EER    minDCF10   minDCF08   Cmin_llr
PLDA (full, baseline)        1.03   0.309      0.061      0.040
simplified PLDA (diagonal)   1.05   0.291      0.064      0.040

Female set
Method                       EER    minDCF10   minDCF08   Cmin_llr
PLDA (full, baseline)        1.79   0.331      0.102      0.063
simplified PLDA (diagonal)   1.77   0.326      0.099      0.061

(*) Thanks to Honza Černocký and Pavel Matějka.

SLIDE 18

Discriminative classifiers specific to normalized i-vectors

[Roadmap diagram, as on slide 3, with the discriminative classifiers highlighted: normalization, additional normalization procedure, Gaussian-PLDA modelling (µ, Φ, Λ), then either the constrained logistic regression-based classifier (SoA) or the new orthonormal discriminative classifier, producing the LLR score]

SLIDE 19

Discriminative classifiers specific to normalized i-vectors

With logistic regression

First approach (based on score coefficients). The PLDA score becomes the R^{3r+1} dot product s_{i,j} = ϕ_{i,j} · ω with

ϕ_{i,j} = [ w_i^(r) ∘ w_j^(r) ;
            w_i^(r) ∘ w_i^(r) + w_j^(r) ∘ w_j^(r) ;
            w_i^(r) + w_j^(r) ;
            res_{i,j} ]

where
  • the superscript (r) indicates the first r components of a vector,
  • the symbol ∘ denotes the elementwise product,
  • ϕ_{i,j} denotes the expanded vector of a trial,
  • ω is initialized with the PLDA parameters.

Following [Burget et al., 2011], logistic regression-based DT can be performed by optimizing ω.

SLIDE 20

Discriminative classifiers specific to normalized i-vectors

With logistic regression

Second approach (based on PLDA metaparameters). After the B-rotation, ΦΦ^t and Λ are close to diagonal ⇒ similar to the constrained version of [Borgström and McCree, 2013], which works with eigenvalues. DT is performed by training:

ω = ( δ ∈ R^d, σ ∈ R, µ ∈ R^d )^t ∈ R^{2d+1}   (6)

where δ is the diagonal of ΦΦ^t (approximately the eigenvalue spectrum of ΦΦ^t) and Λ ≈ σI (preserving the isotropy of the channel component). Following [Borgström and McCree, 2013], DT based on the PLDA parameters (µ, ΦΦ^t, Λ) can be performed (only the mean value and the PLDA matrix eigenvalues are optimized).

SLIDE 21

Discriminative classifiers specific to normalized i-vectors

[Roadmap diagram, as on slide 3, with the new orthonormal discriminative classifier highlighted]

SLIDE 22

Discriminative classifiers specific to normalized i-vectors

With a new approach: Orthonormal discriminative classifier

Defining the R^{r+1} expanded vector ϕ_{i,j} of a trial (w_i, w_j) by:

ϕ_{i,j} = [ p_1 w_{i,1} w_{j,1} + (1/2) q_1 (w_{i,1}² + w_{j,1}²) − (p_1 + q_1) µ_1 (w_{i,1} + w_{j,1}) ;
            ... ;
            p_r w_{i,r} w_{j,r} + (1/2) q_r (w_{i,r}² + w_{j,r}²) − (p_r + q_r) µ_r (w_{i,r} + w_{j,r}) ;
            res_{i,j} ]   (7)

the score can be written as the dot product:

s_{i,j} = ϕ_{i,j}^t · 1_{r+1}

where 1_{r+1} is the R^{r+1} vector of ones. Note: each component of ϕ_{i,j} has some discriminant power ...

Goal:
  • replace the "PLDA" vector 1_{r+1} by a basis of discriminant axes extracted using the Fisher criterion,
  • combine this basis to find the unique normal vector needed for speaker detection.

SLIDE 23

Discriminative classifiers specific to normalized i-vectors

With a new approach: Orthonormal discriminative classifier

Denote by (α_t, g_t, W_t) and (α_n, g_n, W_n) the order-0, 1 and 2 statistics (prior, mean and covariance) of the target and non-target trial expanded vector datasets. Then

g = α_t g_t + α_n g_n
W = α_t W_t + α_n W_n
B = α_t α_n (g_t − g_n)(g_t − g_n)^t   (8)

are the mean vector, within-class and between-class covariance matrices of the trial expanded vector dataset (case of a two-class classifier). Fisher's linear discriminant extracts a discriminant axis u by maximizing the Fisher criterion:

(u^t B u) / (u^t W u)   (9)

Solution: u = W^{-1}(g_t − g_n) / ‖W^{-1}(g_t − g_n)‖

SLIDE 24

Discriminative classifiers specific to normalized i-vectors

With a new approach: Orthonormal discriminative classifier

Drawback: a two-class classifier (target / non-target) ⇒ only one axis can be extracted.

Proposed: [Okada and Tomita, 1985] propose a method to extract more axes than classes, while still using the Fisher criterion. We refer to this method as the "Orthonormal Discriminative (OD) classifier". Given a training corpus T of target and non-target trial expanded vectors:

Algorithm: OD discriminant-axes extractor
for k = 1 to K:
  compute the target and non-target means g_t^(k), g_n^(k) of T
  compute the between- and within-class covariance matrices B^(k), W^(k) of T
  extract the vector maximizing the Fisher criterion: u^(k) = argmax_v (v^t B^(k) v) / (v^t W^(k) v)
  project T onto the orthogonal subspace of u^(k)

Once a vector has been extracted, the data are projected onto its orthogonal subspace and the Fisher-criterion-based extractor is reiterated.

SLIDE 25

Discriminative classifiers specific to normalized i-vectors

With a new approach: Orthonormal discriminative classifier

Algorithm: OD discriminant-axes extractor
for k = 1 to K:
  compute the target and non-target means g_t^(k), g_n^(k) of T
  compute the between- and within-class covariance matrices B^(k), W^(k) of T
  extract the vector maximizing the Fisher criterion: u^(k) = argmax_v (v^t B^(k) v) / (v^t W^(k) v)
  project T onto the orthogonal subspace of u^(k)

Note: Fisher's linear discriminant is a geometrical approach ⇒ no Gaussianity assumption is needed for the expanded vector distribution. (The expanded vector components follow independent non-central χ² distributions with 1 degree of freedom and distinct non-centrality parameters for target and non-target trials.)

SLIDE 26

Discriminative classifiers specific to normalized i-vectors

With a new approach: Orthonormal discriminative classifier

A set of K orthonormal discriminant axes is extracted... Fusion of the scores: find weights {ω_k}_{k=1..K} such that

s_{i,j} = Σ_{k=1..K} ω_k (ϕ_{i,j}^t u^(k)) = ϕ_{i,j}^t ( Σ_{k=1..K} ω_k u^(k) )   (10)

Proposed: ω_k = ‖ (W^(k))^{-1} (g_t^(k) − g_n^(k)) ‖

In this way, it can be shown that the variance of the weighted scores decreases with k, following the eigenvalues λ^(k):

var( ϕ_{i,j}^t ω_k u^(k) ) = C λ^(k) / (λ^(k) + 1)   (11)

OD becomes similar to a kind of "SVD" by decreasing order of variance ⇒ no need to tune the weights.

SLIDE 27

Discriminative classifiers specific to normalized i-vectors

With a new approach: Orthonormal discriminative classifier Fast training

Huge dataset of non-target trial expanded vectors (> 10^8) ⇒ computing the order-2 statistics is expensive.
Solution: parallelization. Given a training dataset T and a partitioning T = ∪_q T_q, the T-statistics can be expressed as a linear combination of the T_q-statistics.
The discriminant-axes extractor needs to project the data onto an orthogonal subspace at each iteration. Solution: only update the statistics (no need to project the data). See the algorithm in the paper.
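The parallelization relies on the fact that the sufficient statistics of a union are linear combinations of the per-partition statistics. A sketch for the mean and covariance (this is standard statistics pooling, not the exact algorithm of the paper):

```python
import numpy as np

def partition_stats(X):
    """Order-0/1/2 statistics of one chunk of data."""
    return len(X), X.sum(axis=0), X.T @ X

def merge_stats(stats):
    """Combine chunk statistics into the global mean and (biased) covariance."""
    n = sum(s[0] for s in stats)
    s1 = sum(s[1] for s in stats)
    s2 = sum(s[2] for s in stats)
    mean = s1 / n
    cov = s2 / n - np.outer(mean, mean)
    return mean, cov

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 5))
chunks = np.array_split(X, 4)                       # stand-ins for parallel workers
mean, cov = merge_stats([partition_stats(c) for c in chunks])
print(np.allclose(mean, X.mean(axis=0)),
      np.allclose(cov, np.cov(X, rowvar=False, bias=True)))  # True True
```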

SLIDE 28

Results

Table: Speaker recognition results on NIST-SRE 2010 telephone extended (det 5 ext), with BUT i-vectors 2011.

Male set
Method               EER%   minDCF10   minDCF08   Cmin_llr
PLDA                 1.03   0.309      0.061      0.040
LR (1) llr coeff.    1.24   0.342      0.076      0.047
LR (2) plda param.   1.06   0.294      0.062      0.040
OD                   0.95   0.282      0.060      0.038

Female set
Method               EER%   minDCF10   minDCF08   Cmin_llr
PLDA                 1.79   0.331      0.102      0.063
LR (1) llr coeff.    1.78   0.331      0.101      0.064
LR (2) plda param.   1.72   0.336      0.101      0.061
OD                   1.56   0.326      0.095      0.058

Note: in order to take into account possible distortions of the non-target expanded vector distribution in the false-alarm regions, the OD model is trained using only the non-target expanded vector subset yielding the 10% highest PLDA scores.

SLIDE 29

Results

Speakers in the Wild (SITW) 2016: a good way to assess the robustness of an approach. The SITW database was not collected under controlled conditions and thus contains real noise, reverberation, intra-speaker variability and compression artifacts; it also mixes male/female and short/long duration utterances...

Table: Speaker recognition results on the SITW (Speakers in the Wild) 2016 evaluation, core-core condition, with LIA i-vectors.

Method    EER%    actCdet   minCdet   actCllr
PLDA      12.64   0.850     0.844     0.428
OD (*)    11.93   0.838     0.836     0.394

(*) The actCdet and actCllr results for the OD system are not those of the official SITW scoreboard, because the uploaded OD scores were not correctly calibrated.

SLIDE 30

Perspectives

Future work: short-duration noisy utterances. Accurate estimation of the speaker variability is more difficult under these conditions, and Gaussian-PLDA modelling could benefit from this additional discriminative training. Also, it has been shown in [Rouvier et al., 2015] that the normalization and PLDA framework can be successfully applied in speaker diarization to low-rank total variability factors provided by a deep neural network. Testing the OD method on i-vector-like representations (not necessarily for speaker recognition) would be of interest.

SLIDE 31

Thank you ...

Borgström, B. J. and McCree, A. (2013). Discriminatively trained Bayesian speaker comparison of i-vectors. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 7659–7662.

Burget, L., Plchot, O., Cumani, S., Glembek, O., Matějka, P., and Brümmer, N. (2011). Discriminatively trained probabilistic linear discriminant analysis for speaker verification. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 4832–4835.

Okada, T. and Tomita, S. (1985). An optimal orthonormal system for discriminant analysis. Pattern Recognition, 18(2):139–144.

Rohdin, J., Biswas, S., and Shinoda, K. (2016). Robust discriminative training against data insufficiency in PLDA-based speaker verification. Computer Speech and Language, 35:32–57.

Rouvier, M., Bousquet, P., and Favre, B. (2015). Speaker diarization through speaker embeddings. In European Signal Processing Conference (EUSIPCO).