
Constrained discriminative speaker verification specific to normalized i-vectors. P.M. Bousquet, J.F. Bonastre, LIA, University of Avignon. Odyssey 2016, June 21, 2016.


  1. Constrained discriminative speaker verification specific to normalized i-vectors. P.M. Bousquet, J.F. Bonastre, LIA, University of Avignon. June 21, 2016.

  2. Discriminative approach for i-vectors: state of the art (SoA). Pipeline: (i) normalization: within-class covariance matrix W (centering and scaling), then length normalization; (ii) Gaussian-PLDA modelling, with parameters (µ, Φ, Λ), yielding the LLR score; (iii) discriminative classifier, logistic regression-based (SoA), operating either on the score coefficients or on the PLDA parameters (µ, Φ, Λ).

  3. Discriminative approach for i-vectors: proposed. Pipeline: (i) normalization: within-class covariance matrix W (centering and scaling), length normalization, plus an additional normalization procedure intended to constrain the discriminative training; (ii) Gaussian-PLDA modelling, with parameters (µ, Φ, Λ), yielding the LLR score; (iii) constrained discriminative classifier (limited number of coefficients to optimize): either logistic regression-based (SoA) or an orthonormal discriminative classifier, a new approach, operating on the score coefficients or on the PLDA parameters (µ, Φ, Λ).

  4. Gaussian-PLDA model. A d-dimensional i-vector w can be decomposed as follows:

     w = µ + Φ y_s + ε    (1)

     Φ y_s and ε are assumed to be statistically independent, and ε follows a centered Gaussian distribution with full covariance matrix Λ. The speaker factor y_s can be a full-rank d-vector (two-covariance model) or constrained to lie in the r-dimensional linear range of the d × r matrix Φ (eigenvoice subspace).
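As an illustration, the generative model of Eq. (1) can be sketched in NumPy. The dimensions and parameter values below are arbitrary placeholders, not trained quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 10                               # illustrative i-vector dim and eigenvoice rank
mu = rng.normal(size=d)                     # global mean µ
Phi = rng.normal(scale=0.1, size=(d, r))    # eigenvoice matrix Φ (d x r)
L = rng.normal(scale=0.05, size=(d, d))
Lam = L @ L.T + 0.1 * np.eye(d)             # full-covariance residual Λ (positive definite)

def sample_ivectors(n_sessions):
    """Draw n_sessions i-vectors of one speaker: w = mu + Phi y_s + eps."""
    y_s = rng.normal(size=r)                # speaker factor, shared across sessions
    eps = rng.multivariate_normal(np.zeros(d), Lam, size=n_sessions)
    return mu + y_s @ Phi.T + eps

ivecs = sample_ivectors(3)
print(ivecs.shape)  # (3, 50)
```

The speaker factor y_s is drawn once and shared by all sessions of the speaker, while ε varies per session, which is exactly the within/between split the model encodes.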

  5. Gaussian-PLDA scoring. Closed-form solution of the LLR score, a second-degree polynomial function of the components of w_i and w_j:

     s_{i,j} = log [ P(w_i, w_j | H_tar) / P(w_i, w_j | H_non) ]
             = w_i^t P w_j + (1/2) (w_i^t Q w_i + w_j^t Q w_j)
               - µ^t (P + Q) (w_i + w_j) + µ^t (P + Q) µ
               + (1/2) (log |A_t| - log |A_n|)    (2)

     where

     P   = Λ^{-1} Φ (2 Φ^t Λ^{-1} Φ + I_r)^{-1} Φ^t Λ^{-1}
     Q   = P - Λ^{-1} Φ (Φ^t Λ^{-1} Φ + I_r)^{-1} Φ^t Λ^{-1}
     A_t = (2 Φ^t Λ^{-1} Φ + I_r)^{-1}
     A_n = (Φ^t Λ^{-1} Φ + I_r)^{-1}    (3)
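The closed-form scoring of Eqs. (2)-(3) can be sketched as follows. The parameters here are random toy values, not trained PLDA parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 10, 4                                # small illustrative dimensions
mu = rng.normal(size=d)
Phi = rng.normal(size=(d, r))
L = rng.normal(size=(d, d))
Lam = L @ L.T + np.eye(d)                   # full-covariance Lambda, positive definite
Lam_inv = np.linalg.inv(Lam)

G = Phi.T @ Lam_inv @ Phi                   # Phi^t Lam^-1 Phi, reused below
P = Lam_inv @ Phi @ np.linalg.inv(2 * G + np.eye(r)) @ Phi.T @ Lam_inv
Q = P - Lam_inv @ Phi @ np.linalg.inv(G + np.eye(r)) @ Phi.T @ Lam_inv
# log|A_t| = -log|2G + I|, log|A_n| = -log|G + I|; slogdet is numerically stable
logdet_At = -np.linalg.slogdet(2 * G + np.eye(r))[1]
logdet_An = -np.linalg.slogdet(G + np.eye(r))[1]

def llr_score(wi, wj):
    """Closed-form Gaussian-PLDA LLR score of Eq. (2)."""
    return (wi @ P @ wj
            + 0.5 * (wi @ Q @ wi + wj @ Q @ wj)
            - mu @ (P + Q) @ (wi + wj)
            + mu @ (P + Q) @ mu
            + 0.5 * (logdet_At - logdet_An))

s = llr_score(rng.normal(size=d), rng.normal(size=d))
print(float(s))
```

Since P and Q are symmetric, the score is symmetric in its two arguments, as a trial score should be.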

  6. Discriminative classifiers for speaker recognition. SoA: based on logistic regression. Given the dataset of target and non-target trials χ_tar, χ_non, with cardinalities N_tar, N_non respectively, the probability of correctly classifying all training trials (total cross entropy) is equal to:

     TCE = [ ∏_{t ∈ χ_tar} P(H_tar | t) ]^{1/N_tar} × [ ∏_{t ∈ χ_non} P(H_non | t) ]^{1/N_non}    (4)

     Goal: maximize the (log-)TCE by gradient descent with respect to some coefficients:
     - the PLDA LLR-score coefficients (i.e. of the score matrices P and Q); the LLR score can be written as a dot product ϕ_{i,j} · ω between an expanded vector of a trial ϕ_{i,j} and a vector ω initialized with the PLDA parameters [Burget et al., 2011];
     - the PLDA parameters (µ, Φ, Λ) [Borgström and McCree, 2013].
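The (log-)TCE objective of Eq. (4) can be computed from trial scores as below, assuming (as in logistic regression) that the posterior P(H_tar | t) is the sigmoid of the LLR score plus an optional prior logit:

```python
import numpy as np

def log_tce(scores_tar, scores_non, prior_logit=0.0):
    """Log of the weighted total cross entropy of Eq. (4).

    Each class is weighted by 1/N of its trials, so the result is the mean
    log-posterior of the correct hypothesis over both trial sets.
    """
    s_tar = np.asarray(scores_tar, dtype=float) + prior_logit
    s_non = np.asarray(scores_non, dtype=float) + prior_logit
    # log sigmoid(s) = -log(1 + e^{-s}), computed stably with logaddexp
    log_p_tar = -np.logaddexp(0.0, -s_tar)   # log P(H_tar | t) for target trials
    log_p_non = -np.logaddexp(0.0, s_non)    # log P(H_non | t) for non-target trials
    return log_p_tar.mean() + log_p_non.mean()

# Well-separated scores give a log-TCE close to 0 (probabilities close to 1)
print(log_tce([8.0, 9.0], [-8.0, -7.0]))
```

Maximizing this quantity by gradient descent over the chosen coefficients is exactly the SoA discriminative training described above.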

  7. Discriminative classifiers for speaker recognition: difficulties to overcome. Discriminative training (DT) can suffer from various limitations: data insufficiency; over-fitting on the development data; respecting metaparameter conditions (definiteness, positivity/negativity of the PLDA LLR-score covariance matrices); ...

  8. Constrained DT: training only a small number of parameters ⇒ order O(d), or even O(1), instead of O(d²). Some solutions [Rohdin et al., 2016, Borgström and McCree, 2013]: a single coefficient optimized for each dimension of the i-vector, or even for each of the four feature kinds that make up the score; only the mean vector µ and the eigenvalues of the PLDA matrices ΦΦ^t and Λ trained by DT, or even their scaling factors only; metaparameter conditions handled by working with the singular value decomposition of P and Q, or by flooring the parameters.

  9. DT struggles to improve speaker detection when the i-vectors have first been normalized, whereas normalization has proven to achieve the best performance in speaker verification.

  10. Normalization step (its position in the overall pipeline: within-class covariance matrix W centering and scaling, length normalization, and the additional normalization procedure intended to constrain the discriminative training, ahead of Gaussian-PLDA modelling and the discriminative classifier).

  11. Normalization step. Within-class covariance matrix W (centering and scaling), then length normalization ⇒ W is almost exactly isotropic, i.e. W ≈ σI, σ > 0.
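The normalization chain above (W-whitening followed by length normalization) might be sketched as follows; the mean and W are assumed to have been estimated on a training set:

```python
import numpy as np

def normalize(ivecs, mean, W):
    """W-whitening (centering and scaling) followed by length normalization."""
    # Inverse matrix square root of W via eigendecomposition (W symmetric PD)
    vals, vecs = np.linalg.eigh(W)
    W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    x = (ivecs - mean) @ W_inv_sqrt
    # Length normalization: project each i-vector onto the unit sphere
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(2)
X = normalize(rng.normal(size=(5, 8)), np.zeros(8), np.eye(8))
print(np.linalg.norm(X, axis=1))  # all ones
```

After this chain the within-class covariance of the normalized vectors is, as the slide states, close to σI.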

  12. Normalization step. Proposed: an additional normalization step, which does not modify distances between i-vectors: rotation by the eigenvector basis of the between-class covariance matrix B of the training dataset: B = P∆P^t (SVD), w ← P^t w.
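A sketch of this rotation step, estimating B from labelled training i-vectors (the helper name and the per-class estimator are ours):

```python
import numpy as np

def b_rotation(ivecs, labels):
    """Rotate i-vectors into the eigenvector basis of the between-class
    covariance B. A pure rotation, so pairwise distances are unchanged."""
    classes = np.unique(labels)
    means = np.stack([ivecs[labels == c].mean(axis=0) for c in classes])
    grand = ivecs.mean(axis=0)
    B = (means - grand).T @ (means - grand) / len(classes)
    # B symmetric: B = P Delta P^t; reorder eigenvectors to descending eigenvalue
    vals, P = np.linalg.eigh(B)
    P = P[:, ::-1]
    return ivecs @ P, P

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 6))
y = np.repeat(np.arange(4), 5)          # 4 speakers, 5 sessions each
Xr, P = b_rotation(X, y)
# Rotation preserves norms (and hence pairwise distances)
print(np.allclose(np.linalg.norm(Xr, axis=1), np.linalg.norm(X, axis=1)))
```

Since P is orthogonal, applying w ← P^t w diagonalizes B without disturbing the isotropy of W.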

  13. Normalization step. After the B-rotation: ⇒ B is diagonal; ⇒ W remains almost exactly isotropic (and therefore diagonal), since the B-eigenvector basis is orthogonal. Assumptions: the PLDA matrices ΦΦ^t and Λ become almost diagonal, and even isotropic for Λ (as a consequence, the score matrices P and Q are almost diagonal).

  14. Normalization step. Moreover, since W ≈ σI, W^{-1}B is approximately proportional to B ⇒ the LDA solution can be identified as the subspace spanned by the first r eigenvectors of B. The first r components of the training i-vectors are thus approximately their projection onto the r-dimensional LDA subspace.

  15. Normalization step. The score can be rewritten as the following sum of O(r) terms:

     s_{i,j} = Σ_{k=1}^{r} [ p_k w_{i,k} w_{j,k} + (1/2) q_k (w_{i,k}² + w_{j,k}²)
                             - (p_k + q_k) µ_k (w_{i,k} + w_{j,k}) ] + res_{i,j}    (5)

     where r is the rank of the PLDA eigenvoice subspace, and res_{i,j} sums all the diagonal terms beyond the r-th dimension, all the off-diagonal terms, and the offsets. Thus, we assume that the major proportion of the variability of the LLR score is contained in the first r terms of the sum above (the residual term is negligible).
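Keeping only the first r diagonal terms of Eq. (5) gives the score sketch below; the p_k, q_k, µ_k here are random placeholders standing in for the diagonals of the PLDA score matrices and mean:

```python
import numpy as np

def diagonal_score(wi, wj, p, q, mu, r):
    """First-r-terms approximation of Eq. (5), dropping the residual res_{i,j}."""
    k = slice(0, r)
    return np.sum(p[k] * wi[k] * wj[k]
                  + 0.5 * q[k] * (wi[k] ** 2 + wj[k] ** 2)
                  - (p[k] + q[k]) * mu[k] * (wi[k] + wj[k]))

rng = np.random.default_rng(4)
d, r = 8, 3
p, q, mu = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
s = diagonal_score(rng.normal(size=d), rng.normal(size=d), p, q, mu, r)
print(float(s))
```

This is the O(r) parameterization that the constrained discriminative training then optimizes, instead of the full O(d²) score matrices.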

  16. Normalization step. Table: analysis of the PLDA parameters before and after the additional B-rotation normalization procedure.

                                                       before           after
                                                     male  female    male  female
     Diagonality of PLDA eigenvoice subspace ΦΦ^t    0.23   0.15     0.95   0.97
     Diagonality of PLDA score matrix P              0.48   0.25     0.98   0.96
     Diagonality of PLDA score matrix Q              0.41   0.23     0.96   0.97
     Isotropy of PLDA nuisance variability Λ         0.98   0.96     0.99   0.97
     Residual variance                               0.29   0.42     0.004  0.004

     Measures used:
     - diagonality of the symmetric matrix ΦΦ^t: Tr(diag(ΦΦ^t)²) / Tr((ΦΦ^t)²) ∈ [0, 1];
     - isotropy of Λ: d × m²_Λ / Tr(Λ²) ∈ [0, 1], where m_Λ denotes the mean value of the Λ-diagonal;
     - variance of the residual term: var(res) / var(score) ∈ [0, 1].
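The diagonality and isotropy measures defined above can be computed directly; both equal 1 exactly when the matrix is diagonal, respectively a multiple of the identity:

```python
import numpy as np

def diagonality(M):
    """Tr(diag(M)^2) / Tr(M^2); equals 1 iff the symmetric matrix M is diagonal.

    For symmetric M, Tr(M^2) is the sum of squares of all entries.
    """
    return np.sum(np.diag(M) ** 2) / np.sum(M ** 2)

def isotropy(M):
    """d * m^2 / Tr(M^2), with m the mean of the diagonal; 1 iff M = m I."""
    d = M.shape[0]
    m = np.diag(M).mean()
    return d * m ** 2 / np.sum(M ** 2)

I4 = np.eye(4)
print(diagonality(I4), isotropy(I4))  # 1.0 1.0
```

On these measures, the table above shows the B-rotation driving the score matrices from clearly non-diagonal (0.2-0.5) to nearly diagonal (above 0.95), which is what justifies the diagonal score approximation of Eq. (5).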
