I–vector transformation and scaling for PLDA based speaker recognition – PowerPoint PPT Presentation




SLIDE 1

I–vectors and PLDA I–vector Transformation Dataset mismatch compensation Conclusions

I–vector transformation and scaling for PLDA based speaker recognition

Sandro Cumani, Pietro Laface

Politecnico di Torino, Italy

Sandro Cumani, Pietro Laface I–vector transformation for PLDA based speaker recognition

SLIDE 2

Outline

  • I–vectors and PLDA
  • I–vector transformation
  • Dataset mismatch compensation
  • Results and conclusions


SLIDE 3

PLDA assumptions

  • I–vectors are sampled from a Gaussian distribution
  • Similar development and evaluation i–vector distributions

[Figure: histogram of squared i–vector norms (Dev and Eval) vs. the χ2 distribution]
[Figure: histograms of the two (whitened) i–vector components with highest skewness, φ1 and φ2, vs. N(0, 1)]

SLIDE 4

HT–PLDA / length norm

  • HT–PLDA tries to deal with non–Gaussianity
  • Length normalization (LN): mainly deals with dataset mismatch

[Figure: histograms of the two (whitened) i–vector components with highest skewness, φ1 and φ2, vs. N(0, 1), before LN]
[Figure: the same histograms after LN]

SLIDE 5

HT–PLDA / length norm

Length–normalized i–vectors are still far from Gaussian. Can we transform i–vectors so as to better fit the Gaussian PLDA assumptions and, at the same time, perform a similar dataset mismatch compensation?


SLIDE 6

I–vector Transformation

Assume that i–vectors are sampled from a R.V. Φ. Represent Φ as a function of a standard normal R.V. Y:

Φ = f⁻¹(Y)

The (log) p.d.f. of Φ is given by

log PΦ(φ) = log PY(f(φ)) + log |Jf(φ)| = −½ f(φ)ᵀ f(φ) + log |Jf(φ)| + c
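The change-of-variables formula above can be sketched as follows (a minimal NumPy sketch; the function f and its Jacobian log-determinant are assumed to be supplied by the transformation model):

```python
import numpy as np

def log_pdf_phi(phi, f, log_abs_det_jacobian):
    """Change-of-variables log-density:
    log P_Phi(phi) = -0.5 * f(phi)^T f(phi) + log |J_f(phi)| + const,
    where f maps Phi-distributed samples to (approximately) N(0, I)."""
    y = f(phi)
    const = -0.5 * y.shape[-1] * np.log(2.0 * np.pi)
    return -0.5 * np.dot(y, y) + log_abs_det_jacobian(phi) + const

# With f = identity the formula reduces to the standard normal log-density:
phi = np.zeros(2)
lp = log_pdf_phi(phi, lambda x: x, lambda x: 0.0)  # = -log(2 * pi)
```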


SLIDE 7

I–vector Transformation

How do we model the (unknown) function f?

Neural-network-style approach: composition of (invertible) layers

f(φ; θ1, · · · , θn) = f1(·; θ1) ◦ · · · ◦ fn(·; θn)

f is estimated so as to maximize the likelihood of the samples of Φ (in our case the i–vectors)

The function f allows transforming Φ–distributed samples into (almost) Gaussian–distributed samples
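A minimal sketch of the layer composition and its log-determinant bookkeeping (the `Scale` layer is a hypothetical stand-in for the affine and non-linear layers of the model, not the authors' implementation):

```python
import numpy as np

class Scale:
    """Hypothetical invertible layer: y = s * x, with log|det J| = D * log|s|."""
    def __init__(self, s):
        self.s = s
    def forward(self, x):
        return self.s * x, x.size * np.log(abs(self.s))

def compose_forward(layers, phi):
    """Apply the cascade of layers and accumulate the Jacobian
    log-determinants needed by the likelihood criterion."""
    y, log_det = phi, 0.0
    for layer in layers:
        y, ld = layer.forward(y)
        log_det += ld
    return y, log_det

y, ld = compose_forward([Scale(2.0), Scale(2.0)], np.ones(3))
# y = [4, 4, 4], ld = 6 * log(2)
```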


SLIDE 8

I–vector Transformation

[Figure, three panels:
(a) Histogram of φ1 and estimated p.d.f. of Φ
(b) Estimated transformation function f
(c) Histogram of f(φ1) and p.d.f. of Y ∼ N(0, 1)]

SLIDE 9

Transformation Model

Simple structure (cascade of two types of transformations):

  • Affine layer: fA(φ; A, b) = Aφ + b
  • SAS layer (acting as non–linear units): cascade of
    • inverse sinh layer: fS1(φi) = sinh⁻¹(φi)
    • diagonal affine layer: fS2(φi; δi, εi) = δiφi + εi
    • sinh layer: fS3(φi) = sinh(φi)

The SAS layer can be summarized as fSAS(φi; δi, εi) = sinh(δi sinh−1(φi) + εi)
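An element-wise NumPy sketch of the SAS layer, including the Jacobian log-determinant needed for the likelihood (a sketch under the definitions above, not the authors' code):

```python
import numpy as np

def sas_forward(phi, delta, eps):
    """SAS layer: f_SAS(phi_i; delta_i, eps_i) = sinh(delta_i * asinh(phi_i) + eps_i),
    applied element-wise. Returns the output and log |Jacobian|.
    Since d y_i / d phi_i = cosh(v_i) * delta_i / sqrt(1 + phi_i^2),
    the log-det is a sum of per-component terms."""
    u = np.arcsinh(phi)          # inverse sinh layer
    v = delta * u + eps          # diagonal affine layer
    y = np.sinh(v)               # sinh layer
    log_det = np.sum(np.log(np.cosh(v)) + np.log(np.abs(delta))
                     - 0.5 * np.log1p(phi ** 2))
    return y, log_det

# With delta = 1 and eps = 0 the SAS layer is the identity (log-det = 0):
phi = np.array([0.5, -1.0])
y, ld = sas_forward(phi, np.ones(2), np.zeros(2))
```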


SLIDE 10

Transformation Model

The parameters of the transformation function f are estimated using a Maximum Likelihood criterion. Gradients can be computed with an algorithm similar to back–propagation with an MSE loss, but the Jacobian log–determinant terms must also be taken into account.
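Under the change-of-variables density, the per-sample training objective is (up to a constant) the following negative log-likelihood; this sketches the criterion only, not the gradient computation:

```python
import numpy as np

def nll(y, log_det):
    """Negative log-likelihood of one transformed sample (constant dropped):
    0.5 * f(phi)^T f(phi) - log |J_f(phi)|.
    The first term is the MSE-like pull of f(phi) toward 0; the second term
    is the extra Jacobian contribution mentioned above."""
    return 0.5 * np.dot(y, y) - log_det

val = nll(np.array([1.0, 1.0]), 0.5)  # = 0.5
```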


SLIDE 11

SAS Transformation on SRE ’10 data (female)

[Figure: histogram of squared i–vector norms vs. the χ2 distribution (Dev and Eval), before and after the 1–layer–SAS transformation]

System           Cond 1       Cond 2       Cond 3       Cond 4       Cond 5
                 EER  DCF10   EER  DCF10   EER  DCF10   EER  DCF10   EER  DCF10
PLDA (w/o LN)    2.06 0.288   3.60 0.541   3.27 0.481   1.71 0.335   3.91 0.417
1–layer AS       2.15 0.221   3.36 0.462   2.96 0.414   1.61 0.290   3.19 0.391
PLDA (with LN)   1.81 0.255   2.83 0.476   1.95 0.367   1.21 0.295   2.19 0.347

SLIDE 12

Dataset mismatch compensation

Length–norm can be interpreted as the ML solution for the estimate of a scaling parameter of the i–vector distribution. An i–vector φi is sampled from the R.V. Φi:

Φi ∼ N(0, αi⁻² Σ)

Given Σ, the ML estimate for αi is

αi⁻¹ = √(φiᵀ Σ⁻¹ φi / D)

SLIDE 13

Dataset mismatch compensation

Φi can be represented as

Φi = αi⁻¹ Σ^(1/2) Y,   Y ∼ N(0, I)

The transformation function f is given by

f(φi; A, αi) = αi A φi,   A = Σ^(−1/2)

Applying whitening followed by length–norm is equivalent to applying the transformation f using the ML estimates of A and αi
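A NumPy sketch of this equivalence (Σ is assumed to be estimated elsewhere; the √D factor is what distinguishes this ML scaling from plain unit-length normalization):

```python
import numpy as np

def transform(phi, Sigma):
    """f(phi; A, alpha) = alpha * A @ phi with A = Sigma^(-1/2) and the ML
    estimate alpha = sqrt(D / (phi^T Sigma^{-1} phi)): whitening followed
    by a length-norm-style rescaling."""
    D = phi.shape[-1]
    w, V = np.linalg.eigh(Sigma)          # Sigma = V diag(w) V^T
    A = V @ np.diag(w ** -0.5) @ V.T      # symmetric inverse square root
    y = A @ phi                           # whitened i-vector
    alpha = np.sqrt(D / (y @ y))          # ML scaling (y^T y = phi^T Sigma^{-1} phi)
    return alpha * y

out = transform(np.array([3.0, 4.0]), np.eye(2))
# the output always has squared norm D (here D = 2)
```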


SLIDE 14

Dataset mismatch compensation

  • We introduce an α–layer (scaling layer), whose single parameter is i–vector dependent
  • The function f is a cascade of the α–layer and the original SAS layers
  • For efficiency reasons we perform alternate estimation of the SAS parameters and the αi's
  • At testing time, for each test i–vector we need to estimate the corresponding αi


SLIDE 15

SRE ’10 Results (female)

  • 400–dimensional i–vectors reduced to 150 dimensions through LDA
  • i–vectors are whitened (this allows initializing the transformation as the identity function)

Results of α–scaled SAS transformation on the female set of NIST SRE 2010 dataset

System                Cond 1       Cond 2       Cond 3       Cond 4       Cond 5
                      EER  DCF10   EER  DCF10   EER  DCF10   EER  DCF10   EER  DCF10
PLDA (w/o LN)         2.06 0.288   3.60 0.541   3.27 0.481   1.71 0.335   3.91 0.417
1–layer AS            2.15 0.221   3.36 0.462   2.96 0.414   1.61 0.290   3.19 0.391
PLDA (with LN)        1.81 0.255   2.83 0.476   1.95 0.367   1.21 0.295   2.19 0.347
1–layer α–AS iter. 1  1.80 0.204   2.83 0.424   2.15 0.373   1.20 0.280   2.03 0.333
1–layer α–AS iter. 3  1.38 0.192   2.58 0.406   2.30 0.361   1.20 0.237   2.16 0.322

SLIDE 16

Conclusions

  • We investigated an approach to estimate a density transform which allows modifying i–vectors to better fit the PLDA assumptions while compensating for dataset mismatch
  • The transformation is learned using an ML criterion, which allows expressing the i–vector p.d.f. as a transformation of a standard normal p.d.f.
  • The transformation is implemented using a framework similar to neural networks (with some constraints)


SLIDE 17

Conclusions

  • The proposed approach noticeably improves results, mainly in terms of DCF 2010, on both telephone and microphone conditions
  • Care has to be taken in designing the network structure to avoid overfitting: our experiments with multiple SAS layers were not satisfactory until constraints were imposed on the transformation parameters (work in progress)
