Uncertainty Modeling without Subspace Methods for Text-Dependent Speaker Recognition


1. Uncertainty Modeling without Subspace Methods for Text-Dependent Speaker Recognition
Patrick Kenny, Themos Stafylakis, Md. Jahangir Alam and Marcel Kockmann
Odyssey Speaker and Language Recognition Workshop, Bilbao, Spain, June 2016
Outline: Speaker Recognition Task and Features; Two Backends; Experiments

2. Uncertainty Modeling in Text-Dependent Speaker Recognition
- Large numbers of mixture components are surprisingly effective in text-dependent speaker recognition, where utterances are typically 1 or 2 seconds in duration.
- The number of times a mixture component is observed is typically << 1, and it may be 0 (particularly at test time), so observations ought to be treated as noisy in the statistical sense.
- Some progress has been made on uncertainty modeling in text-independent speaker recognition with subspace methods (i-vectors, speaker factors), but these are of limited use in text-dependent speaker recognition.
- We tackle the problem of uncertainty modeling without resorting to subspace methods.

3. RSR2015 Part III (Random Digits)
- Background set (97 speakers) used for JFA and backend training.
- Results reported on the development set.
- Enrollment consists of 3 utterances of the 10 digits in random order.
- Each test utterance consists of a random string of 5 digits.
- Error rates are much higher than on Part I.
- Counterintuitively, it is hard to beat a naive GMM/UBM benchmark using HMMs.
- We focus on backend modeling with a standard 60-dimensional PLP front end.

4. JFA for Speaker Recognition with Digits
- Given a speaker and a collection of enrollment recordings, the recordings are modeled by supervectors of the form

    m + U x_r + D z    (1)

- Speakers are characterized by z-vectors (supervector sized); the x-vectors (low-dimensional) model channel effects.
- To perform speaker recognition, for each digit d in a test utterance compare the supervectors z_e and z_t, where z_e is extracted from the enrollment utterances and z_t is extracted from the test utterance.
- z-vectors may be digit-independent (global) or digit-dependent (local).
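As a concrete illustration, here is a minimal numpy sketch of the supervector model (1). All dimensions and the random matrix contents are assumptions chosen for illustration, not the paper's configuration.

```python
import numpy as np

# Minimal sketch of the JFA supervector model (1): s = m + U x_r + D z.
# Dimensions below are illustrative assumptions, not the paper's setup.
C, F = 128, 60                 # mixture components, feature dimension
S = C * F                      # supervector dimension
R = 100                        # rank of the channel subspace (assumed)

rng = np.random.default_rng(0)
m = rng.normal(size=S)                   # UBM mean supervector
U = rng.normal(size=(S, R)) * 0.1        # channel matrix (low rank)
d = rng.uniform(0.5, 1.5, size=S)        # diagonal of the speaker matrix D

x_r = rng.normal(size=R)                 # low-dimensional channel factors
z = rng.normal(size=S)                   # supervector-sized z-vector

s = m + U @ x_r + d * z                  # supervector for recording r
```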

5. Two Backends
- The Joint Density Backend uses point estimates of z_e and z_t.
- The Hidden Supervector Backend treats z_e and z_t as latent variables. Inference requires:
  - Baum-Welch statistics
  - a joint prior distribution P(w) under the same-speaker hypothesis, where w = (z_e, z_t)
  - calculating the posterior of w given the Baum-Welch statistics

6. Joint Density Backend
- The joint distribution for target trials, P_T(z_e, z_t), is modeled by a Gaussian for each mixture component.
- There is insufficient data to train full-covariance Gaussians, and diagonal Gaussians are obviously incorrect, so we impose "semi-diagonal" constraints (see paper).
- The Gaussians are estimated by arranging the background set into a collection of target trials.
- For non-target trials, assume statistical independence, i.e. P_N(z_e, z_t) = P_T(z_e) \times P_T(z_t).
- Likelihood ratio for speaker verification (a sketch of this score follows below):

    \prod \frac{P_T(z_e, z_t)}{P_N(z_e, z_t)}

  where the product ranges over the digits in the test utterance and the mixture components in the UBM.
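A hedged sketch of the Joint Density Backend score for one digit and one mixture component, assuming full-covariance Gaussians (the paper's semi-diagonal constraint is not reproduced here); all names are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def jdb_llr(z_e, z_t, mu, Sigma):
    """Log-likelihood ratio log P_T(z_e, z_t) - log P_N(z_e, z_t) for one
    digit and one mixture component. mu (2F,) and Sigma (2F, 2F) describe
    the target-trial Gaussian over the stacked pair w = (z_e, z_t)."""
    F = len(z_e)
    w = np.concatenate([z_e, z_t])
    log_target = multivariate_normal.logpdf(w, mu, Sigma)
    # Non-target hypothesis: the marginals of the same Gaussian, treated
    # as statistically independent (cross-covariance blocks ignored).
    log_nontarget = (multivariate_normal.logpdf(z_e, mu[:F], Sigma[:F, :F])
                     + multivariate_normal.logpdf(z_t, mu[F:], Sigma[F:, F:]))
    return log_target - log_nontarget

# The verification score sums jdb_llr over the digits in the test
# utterance and over the mixture components of the UBM.
```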

7. Hidden Supervector Backend
- For each mixture component, treat z_e, z_t as a pair of hidden mean vectors which are correlated in the case of a target trial.
- Use an "i-vector extractor" to do probability calculations (not to extract factors).
- The "i-vector" w is the pair (z_e, z_t), so its dimension is twice that of the acoustic feature vectors.
- The i-vector model has full rank, so we can take the total variability matrix to be the identity and shift the burden of modeling the correlation between z_e and z_t to the prior.
- The prior cannot be standard normal, so it needs to be estimated.

8. Posterior Calculations
For an i-vector extractor with a non-standard prior,

    \mathrm{Cov}(w, w) = \Big( P + \sum_c N_c T_c^\top T_c \Big)^{-1}

    \langle w \rangle = \mathrm{Cov}(w, w) \Big( P\mu + \sum_c T_c^\top F_c \Big)

where \mu is the prior expectation and P the precision. (In the standard case, \mu = 0 and P = I.)
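These formulas translate directly into code. The following sketch assumes per-component loading matrices T_c and Baum-Welch statistics (N_c, F_c) stored in Python lists; all shapes and names are illustrative assumptions.

```python
import numpy as np

def posterior(mu, P, T, N, F):
    """Posterior of w under a non-standard Gaussian prior N(mu, inv(P)).
    T: list of T_c, shape (f, d); N: list of scalar counts N_c;
    F: list of first-order statistics F_c, shape (f,)."""
    L = P + sum(n * t.T @ t for n, t in zip(N, T))   # posterior precision
    cov = np.linalg.inv(L)                           # Cov(w, w)
    mean = cov @ (P @ mu + sum(t.T @ f for t, f in zip(T, F)))  # <w>
    return mean, cov
```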

9. Minimum Divergence Estimation of the Prior
- We need to supply the mean \mu and precision matrix P that specify the prior distribution of "i-vectors" for same-speaker trials.
- Arrange the background set into a collection of target trials indexed by s = 1, ..., S and let w(s) be the "i-vector" for trial s. Then

    \mu = \frac{1}{S} \sum_s \langle w(s) \rangle

    P^{-1} = \frac{1}{S} \sum_s \langle w(s)\, w^\top(s) \rangle - \mu\mu^\top

- Minor modifications make \mu and P digit dependent or impose semi-diagonal constraints.
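A sketch of this minimum divergence estimate, assuming the per-trial posterior means and covariances are available (the posterior covariance enters the second-order moment; point estimates alone would drop that term).

```python
import numpy as np

def md_prior(posteriors):
    """posteriors: list of (mean, cov) pairs, one per background target
    trial s, i.e. <w(s)> and Cov(w(s), w(s))."""
    S = len(posteriors)
    mu = sum(m for m, _ in posteriors) / S
    second = sum(np.outer(m, m) + c for m, c in posteriors) / S  # <w w^T>
    P = np.linalg.inv(second - np.outer(mu, mu))   # prior precision
    return mu, P
```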

10. For the different-speaker hypothesis, treat z_e and z_t as statistically independent. In other words, suppress the cross-correlations in the covariance matrix P^{-1} that defines the prior under the same-speaker hypothesis, as in the sketch below.
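A minimal sketch, assuming the first and second halves of w correspond to z_e and z_t; names are illustrative.

```python
import numpy as np

def different_speaker_prior(P):
    """Zero the cross-covariance blocks of the same-speaker covariance
    inv(P) and return the resulting precision matrix."""
    cov = np.linalg.inv(P)
    F = cov.shape[0] // 2
    cov[:F, F:] = 0.0      # suppress Cov(z_e, z_t)
    cov[F:, :F] = 0.0      # and its transpose block
    return np.linalg.inv(cov)
```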

11. Likelihood Ratio
- Given data and a probability model with hidden variables, the evidence is the likelihood of the data calculated by integrating out the hidden variables.
- For an i-vector model the integral can be evaluated in closed form (it is a Gaussian integral) and expressed in terms of the Baum-Welch statistics (see paper).
- To evaluate the likelihood ratio for a speaker verification trial, evaluate the evidence twice:
  - using the prior for the same-speaker hypothesis
  - using the prior for the different-speaker hypothesis
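A sketch of the evidence computation under a Gaussian prior with mean mu and precision P, keeping only the terms that differ between the two hypotheses (the remaining data-dependent constant is identical under both priors and cancels in the ratio). Shapes and names follow the posterior sketch above and are assumptions of this writeup, not the paper's exact code.

```python
import numpy as np

def log_evidence(mu, P, T, N, F):
    """Log of the Gaussian integral over w, up to an additive constant
    that is the same under both hypotheses."""
    b = P @ mu + sum(t.T @ f for t, f in zip(T, F))
    L = P + sum(n * t.T @ t for n, t in zip(N, T))    # posterior precision
    w = np.linalg.solve(L, b)                         # posterior mean
    return 0.5 * (np.linalg.slogdet(P)[1] - np.linalg.slogdet(L)[1]
                  + b @ w - mu @ (P @ mu))

def verification_score(mu_s, P_s, mu_d, P_d, T, N, F):
    # Evidence under the same-speaker prior minus evidence under the
    # different-speaker prior: the log-likelihood ratio for the trial.
    return (log_evidence(mu_s, P_s, T, N, F)
            - log_evidence(mu_d, P_d, T, N, F))
```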

12. Preparing the Baum-Welch Statistics
- For each speaker, we have a collection of (enrollment or test) recordings indexed by r.
- For each mixture component c, zero- and first-order statistics denoted by N_c^r and F_c^r.
- Remove the channel effects from each recording and pool over recordings:

    N_c = \sum_r N_c^r

    F_c = \sum_r \big( F_c^r - N_c^r U_c \langle x_r \rangle \big)

  where \langle x_r \rangle is a point estimate of the hidden variable x_r in (1).
- This gives one set of "synthetic" statistics per speaker (regardless of the number of recordings).
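A sketch of this pooling step, assuming per-recording statistics and channel-factor point estimates are already available; the container layout is an illustrative choice.

```python
import numpy as np

def pool_statistics(recordings, U):
    """recordings: list of (N_r, F_r, x_r) with N_r shape (C,), F_r shape
    (C, f) and x_r the point estimate of the channel factors; U: list of
    per-component channel matrices U_c, shape (f, R)."""
    C, f = len(U), U[0].shape[0]
    N = np.zeros(C)
    F = np.zeros((C, f))
    for N_r, F_r, x_r in recordings:
        for c in range(C):
            N[c] += N_r[c]
            F[c] += F_r[c] - N_r[c] * (U[c] @ x_r)   # channel compensation
    return N, F   # one set of "synthetic" statistics for the speaker
```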

13. "Length Normalization" of the Synthetic Statistics
- In the JFA model (1), z_c is a hidden variable.
- The posterior covariance and expectation, C_c and \langle z_c \rangle, are given by

    C_c = \big( I + N_c D_c^* D_c \big)^{-1}

    \langle z_c \rangle = C_c D_c^* F_c

  so that

    \langle \|z_c\|^2 \rangle = \|\langle z_c \rangle\|^2 + \mathrm{trace}(C_c)    (2)

- For each speaker, we scale the synthetic first-order statistics so that \sum_c \langle \|z_c\|^2 \rangle is the same for all speakers.
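A sketch of one plausible reading of this normalization, assuming D is real (so D_c^* = D_c) and diagonal, with d[c] holding the diagonal of D_c; the target value and the exact scaling rule are assumptions of this sketch, not taken from the paper.

```python
import numpy as np

def length_normalize(N, F, d, target):
    """Scale a speaker's synthetic first-order statistics F_c so that
    sum_c <||z_c||^2> = sum_c (||<z_c>||^2 + trace(C_c)) equals target."""
    C = len(N)
    cov = [1.0 / (1.0 + N[c] * d[c] ** 2) for c in range(C)]  # diag of C_c
    z = [cov[c] * d[c] * F[c] for c in range(C)]              # <z_c>
    norm2 = sum(zc @ zc for zc in z)          # sum_c ||<z_c>||^2
    tr = sum(cc.sum() for cc in cov)          # sum_c trace(C_c)
    # Scaling F_c by k scales <z_c> by k, so choose k such that
    # k**2 * norm2 + tr = target.
    k = np.sqrt(max(target - tr, 0.0) / norm2)
    return [k * F[c] for c in range(C)]
```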

14.
- The dominant term in (2) is trace(C_c).
- An experiment in Appendix A demonstrates its usefulness.
- The posterior covariance matrix C_c depends critically on the relevance factor.

15. 128 Mixture Components, Global z-vectors

    #  System  norm.?  EER (M/F)   DCF (M/F)
    1  GMM     -       4.8%/8.0%   0.217/0.356
    2  JDB     -       4.8%/7.6%   0.219/0.353
    3  HSB     ×       4.5%/6.8%   0.201/0.338
    4  HSB     ✓       3.9%/6.1%   0.177/0.307

Table 1: Results on the development set obtained with 128 Gaussians. The systems are a GMM/UBM system, the Joint Density Backend (JDB) and the Hidden Supervector Backend (HSB), both with global z-vectors. Baum-Welch statistics normalization is indicated by "norm".

16. 512 Components, Global z-vectors

    #  System  r  EER (M/F)   DCF (M/F)
    1  GMM     2  4.7%/8.2%   0.195/0.336
    2  JDB     2  4.3%/6.1%   0.196/0.288
    5  HSB     1  3.3%/4.6%   0.148/0.234

Table 2: Results on the development set obtained with 512 Gaussians and global z-vectors (r denotes the relevance factor).
