On the Use of PLDA i-Vector Scoring for Clustering Short Segments

On the Use of PLDA i-Vector Scoring for Clustering Short Segments
Itay Salmun (itaysa@afeka.ac.il), Irit Opher (irito@afeka.ac.il), Itshak Lapidot (itshakl@afeka.ac.il)

Outline
• Shortly about DNNs
• Motivation
• Problem Definition
• Basic Mean-Shift Algorithm
• Modified Mean-Shift Algorithm
• Speaker Clustering System
• Experiments and Results
• Summary
June 2016

Shortly about DNNs

Motivation Bli… Blo… Blu… Pli… Pla… Tra… Tra Ta Ta Gla… Ta Ta… Dla…

Motivation (cont.) Bli… Gla… Blu… Pla… Tra… Tra Ta Ta Ta Ta… Blo… Pli… Dla…

Problem Definition
• Given many short speech segments, cluster them into homogeneous groups such that:
– Each cluster is occupied mostly by a single speaker (cluster purity).
– Each speaker belongs mostly to a single cluster (speaker purity).

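Basic Mean-Shift Algorithm
• Objective: find the densest region. (Figure labels: region of interest, center of mass, mean shift vector.)
• The mean shift vector:
$$m_h(\varphi) = \frac{\sum_{i=1}^{n} \varphi_i \, g\left(\left\| \frac{\varphi - \varphi_i}{h} \right\|^2\right)}{\sum_{i=1}^{n} g\left(\left\| \frac{\varphi - \varphi_i}{h} \right\|^2\right)} - \varphi$$
• The uniform kernel with bandwidth $h$ for Euclidean pairwise distances:
$$g(\varphi, \varphi_i, h) = \begin{cases} 1, & \|\varphi - \varphi_i\|^2 \le h^2 \\ 0, & \|\varphi - \varphi_i\|^2 > h^2 \end{cases}$$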
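
The basic procedure on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the stopping tolerance and the toy points are not from the slides, and an empty window is treated as "no shift".

```python
def mean_shift_vector(phi, points, h):
    """Uniform-kernel mean shift vector m_h(phi) for Euclidean distances."""
    # Keep only the points inside the window: ||phi - phi_i||^2 <= h^2.
    inside = [p for p in points
              if sum((a - b) ** 2 for a, b in zip(phi, p)) <= h ** 2]
    if not inside:
        return [0.0] * len(phi)  # assumption: empty window means no shift
    # With a uniform kernel the weighted mean is the plain center of mass.
    center = [sum(coord) / len(inside) for coord in zip(*inside)]
    return [c - a for c, a in zip(center, phi)]


def seek_mode(phi, points, h, tol=1e-6, max_iter=100):
    """Follow the mean shift vector until it (almost) vanishes."""
    phi = list(phi)
    for _ in range(max_iter):
        shift = mean_shift_vector(phi, points, h)
        phi = [a + s for a, s in zip(phi, shift)]
        if sum(s * s for s in shift) ** 0.5 < tol:
            break
    return phi


# Toy data (hypothetical): three nearby points and one far outlier.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0)]
mode = seek_mode((0.3, 0.3), points, h=1.0)
```

Starting from (0.3, 0.3), the window never reaches the outlier, so the point settles on the center of mass of the three nearby points.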

Modified Mean-Shift Algorithm
• The mean shift vector:
$$m_h(\varphi_i) = \frac{\sum_{l=1}^{k_i} \varphi_{il} \, g(\varphi_{il}, \varphi_i, h_i)}{\sum_{l=1}^{k_i} g(\varphi_{il}, \varphi_i, h_i)} - \varphi_i$$
• The adaptive bandwidth parameter $h_i$ is calculated using the K-nearest-neighbor algorithm. If $\varphi_{iK}$ is the $K$-th nearest neighbor of $\varphi_i$, then the bandwidth is calculated as:
$$h_i = s(\varphi_i, \varphi_{iK})$$
where $s(\cdot, \cdot)$ is the two-covariance score.
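
The adaptive bandwidth can be illustrated with a short sketch. A trained two-covariance model is not available here, so `cosine_score` is a hypothetical stand-in for the score s; excluding the point itself from its own neighbor list is also an assumption.

```python
def cosine_score(a, b):
    """Hypothetical stand-in for the two-covariance (PLDA) score s(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


def adaptive_bandwidth(i, vectors, K, score=cosine_score):
    """h_i = s(phi_i, phi_iK), where phi_iK is the K-th nearest neighbor."""
    # A higher score means a closer neighbor, so sort by descending score.
    scores = sorted(
        (score(vectors[i], v) for j, v in enumerate(vectors) if j != i),
        reverse=True,
    )
    return scores[K - 1]


# Toy vectors (hypothetical), not real i-vectors.
vectors = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.3), (0.0, 1.0)]
h_0 = adaptive_bandwidth(0, vectors, K=2)
```

Because the bandwidth is a score rather than a distance, a point in a dense region gets a high threshold and a point in a sparse region gets a low one.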

Modified Mean-Shift Algorithm (cont.)
• We select the subset $S_h(\varphi_i)$ of data points whose PLDA pairwise score with $\varphi_i$ is larger than or equal to the adaptive bandwidth $h_i$:
$$S_h(\varphi_i) = \{\varphi_{il} : s(\varphi_i, \varphi_{il}) \ge h_i\}$$
• We use the weighted mean shift kernel:
$$g(\varphi_{il}, \varphi_i, h_i) = \begin{cases} s(\varphi_i, \varphi_{il}), & s(\varphi_i, \varphi_{il}) \ge h_i \\ 0, & s(\varphi_i, \varphi_{il}) < h_i \end{cases}$$
with the score
$$s(\varphi_1, \varphi_2) = \log \frac{p(\varphi_1, \varphi_2 \mid H_s)}{p(\varphi_1, \varphi_2 \mid H_d)}$$
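
A minimal sketch of one score-weighted shift follows. The dot product `dot_score` is a hypothetical placeholder for the PLDA log-likelihood ratio, and the sketch assumes the selected scores are positive so that they can act as weights.

```python
def dot_score(a, b):
    """Hypothetical stand-in for the PLDA log-likelihood-ratio score."""
    return sum(x * y for x, y in zip(a, b))


def score_weighted_shift(phi_i, vectors, h_i, score=dot_score):
    """Mean shift vector with kernel g = s if s >= h_i, else 0."""
    # S_h(phi_i): points whose score with phi_i reaches the bandwidth.
    scored = [(v, score(phi_i, v)) for v in vectors]
    selected = [(v, s) for v, s in scored if s >= h_i]
    total = sum(s for _, s in selected)
    if not selected or total == 0:
        return [0.0] * len(phi_i)  # assumption: nothing selected, no shift
    # Score-weighted mean of the selected points.
    mean = [sum(s * v[d] for v, s in selected) / total
            for d in range(len(phi_i))]
    return [m - a for m, a in zip(mean, phi_i)]


# Toy vectors (hypothetical); the third one scores below the bandwidth.
vectors = [(1.0, 0.0), (0.8, 0.6), (0.0, 1.0)]
shift = score_weighted_shift((1.0, 0.0), vectors, h_i=0.5)
```

Compared with the uniform kernel, points that score higher against the current point pull the mean more strongly.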

Speaker Clustering System
i-vectors → Mean Shift algorithm (PLDA score*) → Clustering results
* In previous work:
I. a fixed threshold h
II. a cosine distance instead of PLDA
III. random mean shift

Speaker Clustering System
Before clustering:
• Train the UBM and the TV matrix.
• Train the PCA matrix T and the whitening transformation matrix C.
• Calculate the low-rank i-vectors:
$$\phi = \frac{C T \varphi}{\| C T \varphi \|}$$
• Using the low-rank i-vectors, train the two-covariance model parameters.
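
The low-rank projection can be sketched as below, assuming the slide's formula is the PCA-projected, whitened i-vector scaled to unit length. The matrices here are toy values chosen for illustration, not trained ones, and the function name is hypothetical.

```python
def low_rank_ivector(phi, T, C):
    """Project with T, whiten with C, then scale to unit length."""
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    projected = matvec(C, matvec(T, phi))
    norm = sum(x * x for x in projected) ** 0.5
    return [x / norm for x in projected]


# Toy matrices: T keeps the first two of three dimensions, C rescales them.
T = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
C = [[0.5, 0.0],
     [0.0, 2.0]]
low_rank = low_rank_ivector([6.0, 2.0, 9.0], T, C)
```

In this toy case the projected vector is (3, 4), so the normalized result is (0.6, 0.8).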

Speaker Clustering System
Given a set of speech segments, cluster them according to the following steps:
1. For each speech segment, extract the i-vectors $\{\varphi_i\}$.
2. Calculate the low-rank i-vectors $\{\phi_i\}$.
3. Apply two-covariance score mean shift.
4. Merge all shifted points according to Euclidean distance with a fixed threshold.
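
Step 4 could be implemented in several ways; the slides do not say which. One possible reading, sketched here as an assumption, links any two shifted points closer than the threshold and takes the connected groups as clusters. The function name and toy values are hypothetical.

```python
def merge_shifted_points(shifted, threshold):
    """Group points linked by Euclidean distances below the threshold."""
    labels = list(range(len(shifted)))  # each point starts as its own cluster

    def find(i):
        # Follow parent links to the cluster representative.
        while labels[i] != i:
            labels[i] = labels[labels[i]]  # path halving
            i = labels[i]
        return i

    for i in range(len(shifted)):
        for j in range(i + 1, len(shifted)):
            dist = sum((a - b) ** 2
                       for a, b in zip(shifted[i], shifted[j])) ** 0.5
            if dist < threshold:
                labels[find(j)] = find(i)  # merge the two groups
    return [find(i) for i in range(len(shifted))]


# Toy shifted points (hypothetical): two tight groups far from each other.
shifted = [(0.0, 0.0), (0.01, 0.0), (1.0, 1.0), (1.0, 1.02)]
clusters = merge_shifted_points(shifted, threshold=0.1)
```

After mean shift has converged, points from the same mode lie almost on top of each other, so a small fixed threshold is enough to separate the groups.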

Experiments and Results
Experiments setup
Experiments on telephone conversations:
• NIST-2008 is cut into segments according to a given statistic.
• Average segment length: 2.5 s
• Average number of segments per speaker: 33
Clustering evaluation:
1. Average Speaker Purity (ASP).
2. Average Cluster Purity (ACP).
3. K: $K = \sqrt{ACP \cdot ASP}$.
4. Average Number of Detected Speakers (ANDS).
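
The slides name the purity measures without defining them. The sketch below assumes the definitions commonly used in speaker diarization work (size-weighted averages of per-cluster and per-speaker purities) and takes K as the geometric mean of ACP and ASP, which agrees with the K values in the result tables. The function name and toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def purity_scores(cluster_ids, speaker_ids):
    """Return (ACP, ASP, K) from per-segment cluster and speaker labels."""
    n = len(cluster_ids)
    joint = Counter(zip(cluster_ids, speaker_ids))  # n_ij
    per_cluster = Counter(cluster_ids)              # n_i.
    per_speaker = Counter(speaker_ids)              # n_.j
    # Weighted averages of per-cluster and per-speaker purities.
    acp = sum(c * c / per_cluster[i] for (i, _), c in joint.items()) / n
    asp = sum(c * c / per_speaker[j] for (_, j), c in joint.items()) / n
    return acp, asp, sqrt(acp * asp)


# Toy labels (hypothetical): six segments, two clusters, two speakers.
clusters = [0, 0, 0, 1, 1, 1]
speakers = ["a", "a", "b", "b", "b", "b"]
acp, asp, k = purity_scores(clusters, speakers)
```

ACP penalizes clusters that mix speakers, ASP penalizes speakers split across clusters, and K balances the two.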

Experiments and Results (cont.)
Bandwidth parameter h (for 30 speakers)
Cosine based random mean shift clustering: adaptive threshold using kNN vs. a fixed threshold

Experiments and Results (cont.)
Mean shift point selection configuration (for 30 speakers)
Cosine based mean shift clustering with adaptive threshold: full mean shift vs. random mean shift

Experiments and Results (cont.)
PLDA based mean shift (for 30 speakers)
Clustering with adaptive threshold: PLDA based mean shift vs. cosine based mean shift

Experiments and Results (cont.)
PLDA training (for 30 speakers)
PLDA based mean shift: PLDA model trained on short segments vs. PLDA model trained on long segments

Experiments and Results (cont.)
Summary of mean shift configurations (for 30 speakers)
Comparing the K value of the mean shift configurations

Experiments and Results (cont.)
Summary of mean shift configurations (for 30 speakers)
Comparing the average number of detected speakers (ANDS) of the mean shift configurations

Experiments and Results (cont.)
Influence of the Population Size (Baseline System)
Table 1: Results for different numbers of speakers for the cosine based mean shift (baseline system)

Number of Speakers   h      ACP    ASP    K      ANDS
3                    0.35   92.2   80.1   85.7   6.1
7                    0.40   89.5   71.6   79.9   21.1
15                   0.45   77.6   63.3   70.0   60.6
22                   0.50   85.0   57.6   69.9   136.6
30                   0.50   81.7   53.2   65.9   195.0
60                   0.55   84.6   44.3   61.2   614.1
188                  0.55   68.4   42.8   54.1   1742.1

Experiments and Results (cont.)
Influence of the Population Size (Proposed System)
Table 2: Results for different numbers of speakers for the PLDA based mean shift (proposed system)

Number of Speakers   k (kNN)   ACP    ASP    K      ANDS
3                    19        90.0   71.3   79.8   5.0
7                    17        84.8   67.5   75.5   11.2
15                   15        86.6   63.6   74.1   26.9
22                   15        86.6   65.3   75.1   36.4
30                   17        80.8   64.3   72.1   46.6
60                   17        73.8   61.1   67.2   90.0
188                  17        61.4   53.1   57.1   283.0

Experiments and Results (cont.)
Baseline vs. New System
Table 2: Results for different numbers of speakers for the PLDA based mean shift (proposed system). Baseline values from Table 1 are given in parentheses.

Number of Speakers   K             ANDS
3                    79.8 (85.7)   5.0 (6.1)
7                    75.5 (79.9)   11.2 (21.1)
15                   74.1 (70.0)   26.9 (60.6)
22                   75.1 (69.9)   36.4 (136.6)
30                   72.1 (65.9)   46.6 (195.0)
60                   67.2 (61.2)   90.0 (614.1)
188                  57.1 (54.1)   283.0 (1742.1)
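
To make the ANDS comparison easier to read, the detected counts can be divided by the true number of speakers. The ratio is not a measure used on the slides; it is just arithmetic on the tabulated values.

```python
# (speakers, proposed ANDS, baseline ANDS), copied from the slide.
ands = [
    (3, 5.0, 6.1), (7, 11.2, 21.1), (15, 26.9, 60.6), (22, 36.4, 136.6),
    (30, 46.6, 195.0), (60, 90.0, 614.1), (188, 283.0, 1742.1),
]

# Ratio of detected to true speakers; 1.0 would be a perfect count.
ratios = [(n, round(p / n, 2), round(b / n, 2)) for n, p, b in ands]
for n, proposed, baseline in ratios:
    print(f"{n:>4} speakers: proposed x{proposed:.2f}, baseline x{baseline:.2f}")
```

For 30 speakers, for example, the proposed system detects about 1.55 times the true number of speakers while the baseline detects 6.5 times as many.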

Summary
• While the proposed system is more time consuming, it outperforms the baseline system in the following aspects:
1. It yields better results when clustering large numbers of speakers.
2. It is more robust to changes in the number of speakers.
3. Almost no bandwidth adjustment is needed.
4. The average number of detected speakers is far more accurate.

Thanks
