On autoencoders in the i-vector space for speaker recognition (PowerPoint presentation)



SLIDE 1

On autoencoders in the i-vector space for speaker recognition

Timur Pekhovsky Sergey Novoselov Aleksey Sholokhov Oleg Kudashev

Speech Technology Center Ltd., Russia

This work was financially supported by the Ministry of Education and Science of the Russian Federation (14.578.21.0126 (RFMEFI57815X0126)

SLIDE 2

OUTLINE

 Motivation and goals
 Detailed study of the DAE system
   Datasets and experimental setup
   Front-End and i-vector extractor
   DAE system description & DAE training procedure
   Back-End and scoring. Replacing back-end
   Analysis of the DAE system performance
 An improved DAE system
   Dropout regularization
   Deep architectures
 DAE system in the domain mismatch scenario
   Dataset. Back-Ends
   Results
 Conclusions

SLIDE 3

Motivation and goals

The denoising autoencoder (DAE) based speaker verification system achieved a performance improvement over the commonly used baseline (i.e. PLDA on raw i-vectors) [1]. This motivated a detailed investigation:

  • to study the properties of the DAE in the i-vector space
  • to analyze different strategies for initialization and training of the back-end parameters
  • to investigate dropout regularization
  • to explore different deep DAE architectures
  • to investigate the DAE based system under domain mismatch conditions

[1] Sergey Novoselov, Timur Pekhovsky, Oleg Kudashev, Valentin Mendelev, and Alexey Prudnikov, “Non-linear PLDA for i-vector speaker verification,” in INTERSPEECH 2015, Dresden, Germany, September 6-10, 2015, pp. 214–218.

SLIDE 4

Detailed study of the DAE system

SLIDE 5

Datasets and experimental setup

Training data:

  • telephone channel recordings from the NIST SRE 1998-2008 corpora
  • 16618 sessions of 1763 male speakers (English language only)

Evaluation data:

  • the NIST 2010 SRE protocol (condition 5 extended, males, English language)

Operating points:

  • equal error rate (EER)
  • minimum detection cost function (minDCF 2010)
SLIDE 6

Front-End and i-vector extractor

  • 20 MFCCs (including C0) with their first- and second-order derivatives (Kaldi version)
  • DNN-based posterior extraction with 11-frame splicing at the DNN input
  • DNN with 2700 triphone states and 20 non-speech states (trained on the Switchboard corpus using Kaldi)
  • “SoftVAD” solution using DNN outputs:

$\tilde{G}_d = \dfrac{G_d - n\,O_d}{\tau}, \quad d \in J_{us}$, where $n = \dfrac{\sum_{d \in J_{us}} G_d}{\sum_{d \in J_{us}} O_d}$ and $\tau^2 = \dfrac{\sum_{d \in J_{us}} T_d}{\sum_{d \in J_{us}} O_d} - n^2$

$J_{us}$ — DNN output indexes corresponding to triphone states; $O_d$, $G_d$, $T_d$ are the 0th-, 1st- and 2nd-order statistics.

  • 400-dimensional i-vectors
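A minimal numpy sketch of the statistics normalization above. The random stand-in statistics and the scalar-per-output simplification are assumptions for illustration; only the normalization itself follows the formula on this slide.

```python
import numpy as np

def softvad_normalize(G, O, T, speech_idx):
    """Normalize 1st-order stats G over the speech (triphone) outputs using
    the weighted mean n and std tau derived from the 0th/2nd-order stats."""
    Gs, Os, Ts = G[speech_idx], O[speech_idx], T[speech_idx]
    n = Gs.sum() / Os.sum()                       # weighted mean
    tau = np.sqrt(Ts.sum() / Os.sum() - n ** 2)   # weighted std
    return (Gs - n * Os) / tau

rng = np.random.default_rng(0)
n_out = 2720                                  # 2700 triphone + 20 non-speech states
O = rng.uniform(0.1, 1.0, size=n_out)         # 0th-order stats (stand-ins)
G = rng.normal(size=n_out) * O                # 1st-order stats (stand-ins)
T = (rng.normal(size=n_out) ** 2 + 1.0) * O   # 2nd-order stats (stand-ins)
speech_idx = np.arange(2700)                  # indexes J_us of triphone states
G_norm = softvad_normalize(G, O, T, speech_idx)
```

By construction the normalized statistics sum to zero over the speech outputs, which is a quick sanity check on the implementation.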
SLIDE 7

DAE system description & DAE training procedure

Learning the denoising transform:

  • $i(s, h)$ is the i-vector representing the $h$-th session of the $s$-th speaker
  • $\bar{i}(s)$ is the mean i-vector for speaker $s$
  • RBM parameters are used to initialize the denoising neural network
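The denoising targets can be illustrated with a toy numpy sketch: every session i-vector is paired with its speaker's mean i-vector, and a single linear layer (standing in for the RBM-initialized network, a simplifying assumption) is trained to map one to the other.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_spk, n_sess = 400, 20, 5
labels = np.repeat(np.arange(n_spk), n_sess)
X = rng.normal(size=(n_spk * n_sess, dim))        # session i-vectors i(s, h)

# Denoising targets: the speaker-mean i-vector for every session of speaker s
spk_mean = np.stack([X[labels == s].mean(axis=0) for s in range(n_spk)])
Y = spk_mean[labels]

# One linear layer trained on MSE; in the real system RBM parameters would
# initialize the network (random init here, an assumption for brevity).
W = 0.01 * rng.normal(size=(dim, dim))
loss0 = float(((X @ W.T - Y) ** 2).mean())
for _ in range(200):
    err = X @ W.T - Y                             # prediction error
    W -= 0.01 * (err.T @ X) / len(X)              # gradient step on the MSE
loss = float(((X @ W.T - Y) ** 2).mean())
```

The loss drops substantially over the 200 gradient steps, which is all this sketch is meant to demonstrate; the actual system uses a nonlinear network with RBM pretraining.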
SLIDE 8

DAE system description & DAE training procedure

Block diagram of speaker recognition systems compared in our experiments

SLIDE 9

Back-End and scoring

Two-covariance model:

$\mathrm{Score} = \mathbf{i}_1^T Q\, \mathbf{i}_1 + \mathbf{i}_2^T Q\, \mathbf{i}_2 + 2\, \mathbf{i}_1^T P\, \mathbf{i}_2$   (1)

$\Sigma_B = \dfrac{1}{S} \sum_{s=1}^{S} \bar{\mathbf{i}}_s \bar{\mathbf{i}}_s^T$   (2)

$\Sigma_W = \dfrac{1}{\sum_s H_s} \sum_{s=1}^{S} \sum_{h=1}^{H_s} (\mathbf{i}_{s,h} - \bar{\mathbf{i}}_s)(\mathbf{i}_{s,h} - \bar{\mathbf{i}}_s)^T$   (3)

where the square matrices $Q$ and $P$ can be expressed in terms of (2) and (3).
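A numpy sketch of this back-end on synthetic stand-in i-vectors. The closed-form expressions for Q and P below follow the standard two-covariance scoring derivation and are our assumption; the slides do not spell them out.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, n_spk, n_sess = 4, 30, 10
labels = np.repeat(np.arange(n_spk), n_sess)
spk = rng.normal(scale=3.0, size=(n_spk, dim))          # true speaker offsets
X = spk[labels] + rng.normal(scale=0.5, size=(n_spk * n_sess, dim))
X = X - X.mean(axis=0)                                  # center the i-vectors

# Between- and within-speaker covariance estimates, as in (2) and (3)
means = np.stack([X[labels == s].mean(axis=0) for s in range(n_spk)])
Sigma_B = means.T @ means / n_spk
resid = X - means[labels]
Sigma_W = resid.T @ resid / len(X)

# Q and P for score (1): standard two-covariance closed form (an assumption)
tot = Sigma_B + Sigma_W
tot_inv = np.linalg.inv(tot)
inner = np.linalg.inv(tot - Sigma_B @ tot_inv @ Sigma_B)
Q = tot_inv - inner
P = tot_inv @ Sigma_B @ inner

def score(i1, i2):
    """Verification score (1): large for same-speaker trials."""
    return float(i1 @ Q @ i1 + i2 @ Q @ i2 + 2.0 * i1 @ P @ i2)

same = np.mean([score(X[s * n_sess], X[s * n_sess + 1]) for s in range(n_spk)])
diff = np.mean([score(X[s * n_sess], X[((s + 1) % n_spk) * n_sess])
                for s in range(n_spk)])
```

On this synthetic data, same-speaker trials score higher on average than impostor trials, and P is symmetric, so the score does not depend on the trial order.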

SLIDE 10

Back-End and scoring. Replacing back-end

SLIDE 11

Analysis of the DAE system performance

Table 1: NIST SRE 2010 test
System   | EER(%) | minDCF
Baseline | 1.67   | 0.347
RBM      | 1.55   | 0.332
DAE      | 1.43   | 0.284

Table 2: “Rus-Telecom”* test
System   | EER(%) | minDCF
Baseline | 1.63   | 0.64
RBM      | 1.65   | 0.63
DAE      | 1.43   | 0.55

* Rus-Telecom is a Russian-language corpus of telephone recordings. The training set consists of 6508 male speakers and 33678 speech cuts; the evaluation part consists of 235 male speakers and 4210 speech cuts. The evaluation protocol (single-session enrollments) contains 37184 target trials and 111660 impostor trials.

SLIDE 12

Analysis of the DAE system performance

Assessing the denoising transform with the class-separability criterion:

$J = \mathrm{Tr}(\Sigma_W^{-1} \Sigma_B)$

where $\Sigma_W$ and $\Sigma_B$ are the within-speaker and between-speaker covariance matrices.

Table 3: NIST SRE 2010 test. Cosine scoring
System   | EER(%) | minDCF | J
Baseline | 5.34   | 0.603  | 501.45
RBM      | 5.27   | 0.611  | 525.65
DAE      | 3.19   | 0.427  | 537.76
AE       | 5.42   | 0.583  | 494.13

Figure 1: Eigenvalues of the matrix G.

No normalization was applied to the outputs of RBM and DAE!
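The criterion is easy to compute directly. In the sketch below, the synthetic i-vectors are stand-ins, generated with a tighter and a looser within-speaker spread to show that J tracks separability.

```python
import numpy as np

def class_separability(X, labels):
    """J = Tr(Sigma_W^{-1} Sigma_B); larger J means better speaker separation."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sigma_B, Sigma_W = np.zeros((d, d)), np.zeros((d, d))
    for s in np.unique(labels):
        Xs = X[labels == s]
        m = (Xs.mean(axis=0) - mu)[:, None]
        Sigma_B += len(Xs) * (m @ m.T)        # between-speaker scatter
        C = Xs - Xs.mean(axis=0)
        Sigma_W += C.T @ C                    # within-speaker scatter
    Sigma_B /= len(X)
    Sigma_W /= len(X)
    return float(np.trace(np.linalg.inv(Sigma_W) @ Sigma_B))

rng = np.random.default_rng(3)
means = rng.normal(scale=2.0, size=(10, 6))
labels = np.repeat(np.arange(10), 20)
tight = means[labels] + rng.normal(scale=0.3, size=(200, 6))  # small within-spread
loose = means[labels] + rng.normal(scale=1.5, size=(200, 6))  # large within-spread
J_tight = class_separability(tight, labels)
J_loose = class_separability(loose, labels)
```

As expected, the tighter clusters produce a much larger J, mirroring how the criterion ranks the systems in Table 3.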

SLIDE 13

Analysis of the DAE system performance

Effect of normalization:

$J = \mathrm{Tr}(\Sigma_W^{-1} \Sigma_B)$

Whitening & LN were applied to the outputs of RBM and DAE!

Table 4: NIST SRE 2010 test. Cosine scoring
System   | EER(%) | minDCF | J
Baseline | 5.34   | 0.603  | 501.45
RBM      | 4.96   | 0.565  | 525.35
DAE      | 4.95   | 0.558  | 533.37

Figure 2: Eigenvalues of the matrix G.
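Whitening followed by length normalization (LN) can be sketched as follows. ZCA-style whitening from an eigendecomposition is one common choice and an assumption here; the slides do not specify how the whitening parameters B and ν are estimated.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 8)) * np.arange(1.0, 9.0)  # unequal variances
X[:, 0] += 0.8 * X[:, 1]                             # add correlation

# Whitening parameters (nu, B) estimated on a training set
nu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
B = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T     # ZCA whitening matrix

Xw = (X - nu) @ B.T                                  # whitened vectors
Xn = Xw / np.linalg.norm(Xw, axis=1, keepdims=True)  # length normalization
```

After the transform the training-set covariance is the identity and every vector lies on the unit sphere, which is exactly the input regime cosine scoring and Gaussian PLDA assume.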

SLIDE 14

Analysis of the DAE system performance

Effect of replacing the whitening parameters:

Whitening & LN were applied to the outputs of RBM and DAE; the whitening parameters of the DAE system are replaced by the RBM ones.

Table 5: NIST SRE 2010 test. Cosine scoring
System   | EER(%) | minDCF | J
Baseline | 5.34   | 0.603  | 501.45
RBM      | 4.96   | 0.565  | 525.35
DAE      | 2.83   | 0.393  | 537.32

Figure 3: Eigenvalues of the matrix G.

SLIDE 15

Analysis of the DAE system performance

Effect of replacing the whitening parameters:

Table 6: NIST SRE 2010 test. Cosine scoring
System   | Whitening: B, ν | EER(%) | minDCF | J
Baseline | raw | 5.34 | 0.603 | 501.45
RBM      | no  | 5.27 | 0.611 | 525.65
DAE      | no  | 3.19 | 0.427 | 537.76
RBM      | RBM | 4.96 | 0.565 | 525.35
DAE      | DAE | 4.95 | 0.558 | 533.37
DAE      | RBM | 2.83 | 0.393 | 537.32

SLIDE 16

Analysis of the DAE system performance

Effect of replacing the back-end parameters:

Table 7: Performance comparison for different configurations of the DAE system. NIST SRE 2010 test
System   | PLDA: {Q, P} | Whitening: B/ν | EER(%) | minDCF
Baseline | raw | raw/raw | 1.67 | 0.347
RBM      | RBM | RBM/RBM | 1.55 | 0.332
DAE      | DAE | DAE/DAE | 1.58 | 0.336
DAE      | DAE | DAE/RBM | 1.55 | 0.338
DAE      | RBM | DAE/DAE | 1.56 | 0.330
DAE      | DAE | RBM/DAE | 1.43 | 0.291
DAE      | DAE | RBM/RBM | 1.44 | 0.287
DAE      | RBM | RBM/RBM | 1.43 | 0.284

SLIDE 17

An improved DAE system

SLIDE 18

Dropout regularization

Dropout for RBM training:

Table 8: Effect of dropout for RBM training. RBM is used to initialize the DAE. NIST SRE 2010 test
System      | EER(%) | minDCF
DAE         | 1.43   | 0.284
DAE+dropout | 1.41   | 0.270

Applying dropout at the stage of discriminative fine-tuning was not helpful!
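As an illustration, inverted dropout (one common variant; the slides do not specify the exact scheme used) masks hidden units with keep probability p and rescales the survivors.

```python
import numpy as np

def dropout(h, p_keep, rng):
    """Zero each unit with probability 1 - p_keep; rescale survivors so the
    expected activation is unchanged (inverted dropout)."""
    mask = rng.random(h.shape) < p_keep
    return h * mask / p_keep

rng = np.random.default_rng(5)
hidden = np.ones((10000, 64))                  # stand-in hidden activations
dropped = dropout(hidden, p_keep=0.8, rng=rng)
keep_rate = float((dropped != 0).mean())       # fraction of surviving units
mean_act = float(dropped.mean())               # stays close to the original 1.0
```

The rescaling is what lets the same network be used without any mask at test time.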

SLIDE 19

An improved DAE system

Deep denoising autoencoders: stacking RBMs

Table 9: NIST SRE 2010 test. PLDA scoring
System   | EER(%) | minDCF
Baseline | 1.67   | 0.347
DAE      | 1.43   | 0.284
DAE5     | 1.43   | 0.297

SLIDE 20

An improved DAE system

Deep denoising autoencoders: stacking DAEs

Table 10: NIST SRE 2010 test. PLDA scoring
System   | EER(%) | minDCF
Baseline | 1.67   | 0.347
RBM1     | 1.55   | 0.332
DAE1     | 1.43   | 0.284
RBM2     | 1.58   | 0.329
DAE2     | 1.30   | 0.282

SLIDE 21

DAE system in the domain mismatch scenario

SLIDE 22

Domain Adaptation Challenge

DAC setup:

  • GMM-UBM based i-vector extractor (600-dimensional i-vectors)
  • in-domain SRE set (SRE 04, 05, 06, and 08)
  • out-of-domain Switchboard set

Evaluation data:

  • the NIST 2010 SRE protocol (condition 5 extended, males, English language)

Operating points:

  • equal error rate (EER)
  • minimum detection cost function (minDCF 2010)
SLIDE 23

Back-Ends

The results are presented for the following scoring types:

  • cosine scoring
  • two-covariance model (referred to as PLDA)
  • simplified PLDA with a 400-dimensional speaker subspace (referred to as SPLDA)

In our experiments we ignore the labels of the in-domain data; the in-domain SRE set is used only to estimate the whitening parameters of our systems.
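Of these back-ends, cosine scoring is the simplest; a minimal sketch:

```python
import numpy as np

def cosine_score(i1, i2):
    """Cosine similarity between two i-vectors; after whitening and length
    normalization this reduces to a dot product of unit vectors."""
    return float(i1 @ i2 / (np.linalg.norm(i1) * np.linalg.norm(i2)))

s_same = cosine_score(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0]))
s_orth = cosine_score(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
```

Colinear vectors score 1 and orthogonal vectors score 0, so a single threshold on the score separates target and impostor trials.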

SLIDE 24

Results

Table 11: Performance summary of speaker verification systems with PLDA and cosine back-ends
System   | Whitening/Training | Cos EER(%) | Cos minDCF | PLDA EER(%) | PLDA minDCF
Baseline | SRE/SRE | 5.45 | 0.621 | 2.18 | 0.360
RBM      | SRE/SRE | 5.47 | 0.634 | 2.16 | 0.348
DAE      | SRE/SRE | 3.67 | 0.467 | 1.67 | 0.307
Baseline | SWB/SWB | 9.13 | 0.788 | 6.45 | 0.660
RBM      | SWB/SWB | 8.97 | 0.778 | 6.28 | 0.667
DAE      | SWB/SWB | 8.97 | 0.764 | 6.01 | 0.644
Baseline | SRE/SWB | 5.45 | 0.621 | 4.23 | 0.554
RBM      | SRE/SWB | 5.35 | 0.631 | 2.97 | 0.447
DAE      | SRE/SWB | 4.62 | 0.560 | 2.63 | 0.401

SLIDE 25

Results

Table 12: Performance summary of speaker verification systems with SPLDA
System   | Whitening/Training | EER(%) | minDCF
Baseline | SRE/SRE | 2.23 | 0.312
RBM      | SRE/SRE | 2.07 | 0.317
DAE      | SRE/SRE | 1.61 | 0.292
Baseline | SRE/SWB | 4.21 | 0.531
RBM      | SRE/SWB | 2.66 | 0.410
DAE      | SRE/SWB | 2.36 | 0.400

SLIDE 26

CONCLUSIONS

 A study of denoising autoencoders in the i-vector space was presented
 We found that the observed performance gain of the DAE based system is due to employing back-end parameters (whitening & PLDA) derived from the RBM outputs
 The question of why the RBM transform provides better back-end parameters for a test set remains open
 Dropout helps when applied at the RBM training stage and does not help when applied at the fine-tuning stage
 A deep architecture in the form of stacked DAEs provides further improvements
 All our findings regarding speaker verification systems in matched conditions hold true in the mismatched-conditions case
 Using whitening parameters from the target domain along with a DAE trained on the out-of-domain set avoids the significant performance gap caused by domain mismatch

SLIDE 27

ABOUT COMPANY

Speech Technology Center (STC) is an international leader in speech technology and multimodal biometrics, with over 25 years of research, development and implementation experience in Russia and internationally. STC is a leading global provider of innovative systems in high-quality recording, audio and video processing and analysis, speech synthesis and recognition, and real-time, high-accuracy voice and facial biometrics solutions. STC innovations are used in both the public and commercial sectors, from small expert laboratories to large, distributed contact centers to nation-wide security systems. STC is ISO 9001:2008 certified.

CONTACTS

Russia: 4 Krasutskogo street, St. Petersburg, 196084. Tel.: +7 812 331 0665. Fax: +7 812 327 9297. Email: info@speechpro.com
USA: Suite 316, 369 Lexington Ave, New York, NY 10017. Tel.: +1 646 237 7895. Email: sales-usa@speechpro.com