On the Use of PLDA i-Vector Scoring for Clustering Short Segments
Itay Salmun Irit Opher Itshak Lapidot
itaysa@afeka.ac.il irito@afeka.ac.il itshakl@afeka.ac.il
Outline
June 2016
Shortly about DNNs
Motivation
Bli… Blu… Blo… Pla… Pli… Dla… Gla… Tra… Tra Ta Ta Ta Ta…
Problem Definition
Given a set of short speech segments, we are required to cluster them into homogeneous groups, such that:
– Each cluster is occupied mostly by one speaker only (cluster purity).
– Each speaker belongs mostly to one cluster.
Mean-Shift Algorithm
Basic
[Figure: mean-shift illustration showing the region of interest, the center of mass, and the mean-shift vector]
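$$m_h(\phi) = \frac{\sum_{i=1}^{n} \phi_i \, g(\phi, \phi_i, h)}{\sum_{i=1}^{n} g(\phi, \phi_i, h)} - \phi$$

$$g(\phi, \phi_i, h) = \begin{cases} 1, & \lVert \phi - \phi_i \rVert^2 \le h^2 \\ 0, & \lVert \phi - \phi_i \rVert^2 > h^2 \end{cases}$$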
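The basic mean-shift iteration with a flat kernel of bandwidth h can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, the stopping tolerance `tol`, and the iteration cap `max_iter` are assumptions introduced for the example.

```python
import numpy as np

def flat_kernel_mean_shift(points, start, h, max_iter=100, tol=1e-6):
    """Move `start` toward a local mode of `points` using a flat kernel of radius h.

    Hypothetical helper for illustration; `max_iter` and `tol` are assumptions.
    """
    points = np.asarray(points, dtype=float)
    phi = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Flat kernel: g = 1 inside the window ||phi - phi_i||^2 <= h^2, 0 outside.
        inside = np.sum((points - phi) ** 2, axis=1) <= h ** 2
        if not inside.any():
            break  # empty window: there is no center of mass to move toward
        # Mean-shift vector: center of mass of the window minus the current position.
        shift = points[inside].mean(axis=0) - phi
        phi = phi + shift
        if np.linalg.norm(shift) < tol:
            break
    return phi

# Two well-separated groups; starting near the first one converges to its center of mass.
data = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
mode = flat_kernel_mean_shift(data, start=[0.2, 0.2], h=1.0)
```

With a flat kernel every point inside the window has equal weight, so each step simply moves to the centroid of the current region of interest.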
Mean-Shift Algorithm (cont.)
Modified
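The bandwidth $h_i$ of each i-vector $\phi_i$ is adaptive and is calculated using K-nearest neighbors.
If $\phi_{iK}$ is the K-th nearest neighbor of $\phi_i$ under the pairwise score $s(\phi_i, \phi_{ik})$, then the bandwidth is calculated as:

$$h_i = s(\phi_i, \phi_{iK})$$

The mean-shift vector at $\phi_i$ is:

$$m_{h_i}(\phi_i) = \frac{\sum_{l=1}^{k} \phi_{il} \, g(\phi_i, \phi_{il}, h_i)}{\sum_{l=1}^{k} g(\phi_i, \phi_{il}, h_i)} - \phi_i$$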
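The adaptive bandwidth can be illustrated with a small sketch that reads each point's bandwidth off a precomputed pairwise score matrix. It assumes that a higher score means more similar and that a point is not counted as its own neighbor; neither detail is stated on the slide, and the function name is hypothetical.

```python
import numpy as np

def knn_bandwidths(scores, k):
    """Return h with h[i] = score between point i and its k-th nearest neighbor.

    scores: (n, n) pairwise similarity matrix, higher = more similar (assumption).
    Requires 1 <= k <= n - 1.
    """
    s = np.array(scores, dtype=float)      # copy, so the caller's matrix is untouched
    np.fill_diagonal(s, -np.inf)           # assumption: a point is not its own neighbor
    sorted_desc = -np.sort(-s, axis=1)     # each row ordered from most to least similar
    return sorted_desc[:, k - 1]

pairwise = [[1.0, 0.9, 0.2],
            [0.9, 1.0, 0.5],
            [0.2, 0.5, 1.0]]
h = knn_bandwidths(pairwise, k=1)
```

Because the bandwidth is a score rather than a distance, a dense neighborhood yields a high threshold and a sparse one yields a low threshold.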
Mean-Shift Algorithm (cont.)
Modified
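$S_{h_i}(\phi_i)$ is the set of i-vectors $\phi_{il}$ for which the PLDA pairwise score with $\phi_i$ is larger than or equal to the adaptive bandwidth $h_i$:

$$S_{h_i}(\phi_i) = \{\phi_{il} : s(\phi_i, \phi_{il}) \ge h_i\}$$

$$g(\phi_i, \phi_{il}, h_i) = \begin{cases} s(\phi_i, \phi_{il}), & s(\phi_i, \phi_{il}) \ge h_i \\ 0, & s(\phi_i, \phi_{il}) < h_i \end{cases}$$

The PLDA pairwise score is the log-likelihood ratio between the same-speaker hypothesis $H_s$ and the different-speaker hypothesis $H_d$:

$$s(\phi_1, \phi_2) = \log \frac{p(\phi_1, \phi_2 \mid H_s)}{p(\phi_1, \phi_2 \mid H_d)}$$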
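A rough sketch of the score and of the score-weighted kernel is given below. The log-likelihood ratio is written for a zero-mean Gaussian two-covariance model with between-speaker covariance `B` and within-speaker covariance `W`; that model choice, and all function names, are assumptions made for illustration and may differ from the PLDA variant actually used. The example also assumes that the scores inside the window are positive, so that the weighted mean is well behaved.

```python
import numpy as np

def gaussian_logpdf(x, cov):
    """Log density of a zero-mean Gaussian with covariance `cov`, evaluated at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)

def plda_llr(phi1, phi2, B, W):
    """s(phi1, phi2) = log p(phi1, phi2 | H_s) - log p(phi1, phi2 | H_d).

    Assumption (not from the slides): phi = y + eps, y ~ N(0, B) per speaker,
    eps ~ N(0, W) per segment.
    """
    total = B + W
    zeros = np.zeros_like(B)
    x = np.concatenate([phi1, phi2])
    cov_same = np.block([[total, B], [B, total]])          # H_s: shared speaker factor
    cov_diff = np.block([[total, zeros], [zeros, total]])  # H_d: independent speakers
    return gaussian_logpdf(x, cov_same) - gaussian_logpdf(x, cov_diff)

def score_kernel(scores, h_i):
    """g = s where s >= h_i, and 0 elsewhere."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores >= h_i, scores, 0.0)

def mean_shift_vector(phi_i, neighbors, scores, h_i):
    """Score-weighted mean of the neighbors inside the window, minus phi_i."""
    phi_i = np.asarray(phi_i, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)
    g = score_kernel(scores, h_i)
    weight_sum = g.sum()
    if np.isclose(weight_sum, 0.0):
        return np.zeros_like(phi_i)  # empty window: no shift
    return (g[:, None] * neighbors).sum(axis=0) / weight_sum - phi_i

B, W = 4.0 * np.eye(2), np.eye(2)
a = np.array([2.0, 0.0])
same_like = plda_llr(a, a, B, W)    # identical vectors: favors H_s
diff_like = plda_llr(a, -a, B, W)   # opposite vectors: favors H_d
```

Unlike the flat kernel, this kernel weights each neighbor inside the window by its score, so more similar i-vectors pull the mean harder.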
Speaker Clustering System
“i-vectors” → PLDA score → Mean Shift algorithm → Clustering results
* In previous work: I. fixed h threshold II. a cosine distance instead of PLDA
Speaker Clustering System
Before clustering:
transformation matrix C.
covariance model parameters.

$$\varphi = \frac{C^T \phi}{\lVert C^T \phi \rVert}$$
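The normalization step, in which each i-vector is projected with the transformation matrix C and scaled to unit length, can be sketched as below. How C is estimated is not recoverable from the slide, so the example takes it as given; the function name is hypothetical, and a non-zero projection is assumed.

```python
import numpy as np

def transform_and_normalize(i_vectors, C):
    """Map each row phi to C^T phi / ||C^T phi||.

    i_vectors: (n, d) array, one i-vector per row.
    C: (d, d') transformation matrix, supplied by the caller (assumption).
    """
    x = np.asarray(i_vectors, dtype=float)
    projected = x @ np.asarray(C, dtype=float)   # row form of C^T phi
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / norms                     # assumes no projected row is all zeros

vectors = np.array([[3.0, 4.0], [1.0, 0.0]])
normalized = transform_and_normalize(vectors, np.diag([2.0, 1.0]))
```

After this step every vector lies on the unit sphere, which is the usual precondition for Gaussian PLDA modeling of i-vectors.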
Speaker Clustering System
Given a set of speech segments, cluster them according to the following steps:
vectors:
Euclidean distance with fixed threshold
$\{\phi_i\}$, $\{\varphi_i\}$
Experiments and Results
Experiments Setup
Experiments on telephone conversations
Clustering evaluation
$$K = \sqrt{acp \cdot asp}$$
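The evaluation measure can be computed from a cluster-by-speaker count matrix, as in the sketch below. The slides give only the combined measure K; the ACP and ASP formulas used here are the commonly used purity definitions and are an assumption, as is the function name. The sketch also assumes that no cluster and no speaker is empty.

```python
import numpy as np

def purity_scores(counts):
    """Return (acp, asp, K) for a count matrix.

    counts[i, j] = number of segments of speaker j assigned to cluster i.
    ACP/ASP follow the commonly used purity definitions (assumption).
    """
    n = np.asarray(counts, dtype=float)
    total = n.sum()
    cluster_sizes = n.sum(axis=1)   # segments per cluster
    speaker_sizes = n.sum(axis=0)   # segments per speaker
    acp = np.sum(n ** 2 / cluster_sizes[:, None]) / total
    asp = np.sum(n ** 2 / speaker_sizes[None, :]) / total
    return acp, asp, np.sqrt(acp * asp)

# One cluster holding two speakers equally: speakers are not split, but the cluster is impure.
acp, asp, K = purity_scores([[2, 2]])
```

Taking the geometric mean penalizes both over-clustering (low ASP) and under-clustering (low ACP), so K is high only when both purities are high.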
Experiments and Results (cont.)
Bandwidth parameter h (for 30 speakers)
Cosine-based random mean shift clustering: adaptive threshold using kNN vs. a fixed threshold
Experiments and Results (cont.)
Mean Shift point-selection configuration (for 30 speakers)
Cosine-based mean shift clustering with adaptive threshold: full mean shift vs. random mean shift
Experiments and Results (cont.)
PLDA based Mean Shift (for 30 speakers)
Clustering with adaptive threshold: PLDA-based mean shift vs. cosine-based mean shift
Experiments and Results (cont.)
PLDA training (for 30 speakers)
PLDA-based mean shift: PLDA model trained on short segments vs. PLDA model trained on long segments
Experiments and Results (cont.)
Summary of Mean Shift configuration (for 30 speakers)
Comparing K value of mean shift configurations
Experiments and Results (cont.)
Summary of Mean Shift configuration (for 30 speakers)
Comparing the average number of detected speakers (ANDS) of mean shift configurations.
Experiments and Results (cont.)
Influence of the Population Size (Baseline System)
Table 1: Results for different numbers of speakers for the cosine-based mean shift (baseline system)

Number of Speakers   h      ACP    ASP    K      ANDS
3                    0.35   92.2   80.1   85.7   6.1
7                    0.40   89.5   71.6   79.9   21.1
15                   0.45   77.6   63.3   70.0   60.6
22                   0.50   85.0   57.6   69.9   136.6
30                   0.50   81.7   53.2   65.9   195.0
60                   0.55   84.6   44.3   61.2   614.1
188                  0.55   68.4   42.8   54.1   1742.1
Experiments and Results (cont.)
Influence of the Population Size (Proposed System)
Table 2: Results for different numbers of speakers for the PLDA-based mean shift (proposed system)

Number of Speakers   k (kNN)   ACP    ASP    K      ANDS
3                    19        90.0   71.3   79.8   5.0
7                    17        84.8   67.5   75.5   11.2
15                   15        86.6   63.6   74.1   26.9
22                   15        86.6   65.3   75.1   36.4
30                   17        80.8   64.3   72.1   46.6
60                   17        73.8   61.1   67.2   90.0
188                  17        61.4   53.1   57.1   283.0
Experiments and Results (cont.)
Table 2: Results for different number of speakers for the PLDA based mean shift (proposed system)
Baseline vs. new system (baseline values in parentheses)

Number of Speakers   K             ANDS
3                    79.8 (85.7)   5.0 (6.1)
7                    75.5 (79.9)   11.2 (21.1)
15                   74.1 (70.0)   26.9 (60.6)
22                   75.1 (69.9)   36.4 (136.6)
30                   72.1 (65.9)   46.6 (195.0)
60                   67.2 (61.2)   90.0 (614.1)
188                  57.1 (54.1)   283.0 (1742.1)
Summary
consuming, it outperforms the baseline system in the following aspects:
numbers of speakers
speakers
far more accurate