SLIDE 1

Motivation Clustering Algorithm Audio Database Characteristics Stopping Criterion Summary

Analysis of the Impact of the Audio Database Characteristics in the Accuracy of a Speaker Clustering System

Jesús Jorrín-Prieto (1), Carlos Vaquero (2), Leibny Paola García (1)

(1) Agnitio S.L., Madrid, Spain
(2) Cirrus Logic, Madrid, Spain

Odyssey 2016 The Speaker and Language Recognition Workshop

Jesús Jorrín-Prieto (AGNITIO S.L.) 1 / 22

SLIDE 2

Outline

1. Motivation
   - The accuracy of a clustering task
2. Clustering Algorithm
   - AHC
   - Performance Measures
   - Audio database
3. Audio Database Characteristics
   - Size of the Task
   - Number of Speakers
   - Balance of Speakers
4. Stopping Criterion
   - Threshold tuning
   - Influence of a Training/Testing Mismatch

SLIDE 4

The accuracy of a clustering task

SLIDE 6

AHC

Agglomerative Hierarchical Clustering (AHC)

Distance metric:

$$d(j, k) = -\mathrm{score}_{\mathrm{PLDA}}(j, k) \tag{1}$$

Linkage method: minimum distance (single linkage). When clusters j and k merge into cluster m, the distance from m to any other cluster l is

$$d(m, l) = \min\{d(j, l),\, d(k, l)\} \tag{2}$$

Stopping criterion: if the distance between the two closest clusters is lower than the threshold, they merge. Otherwise the clustering process stops, and the solution is the partition obtained after the last valid merge.
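The procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a precomputed symmetric matrix of pairwise PLDA scores, and the function name `ahc_single_linkage` and its arguments are hypothetical.

```python
import numpy as np

def ahc_single_linkage(scores, threshold):
    """Single-linkage AHC on a symmetric score matrix (hypothetical helper).

    Returns one integer cluster label per item.
    """
    # Eq. (1): distance is the negated score (negation creates a new array).
    dist = -np.asarray(scores, dtype=float)
    np.fill_diagonal(dist, np.inf)  # never merge a cluster with itself
    n = dist.shape[0]
    clusters = {i: [i] for i in range(n)}
    active = list(range(n))

    while len(active) > 1:
        sub = dist[np.ix_(active, active)]
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        if sub[a, b] >= threshold:
            break  # stopping criterion: closest pair is not below the threshold
        j, k = active[a], active[b]
        # Eq. (2): the merged cluster keeps the minimum distance to every other cluster.
        dist[j, :] = np.minimum(dist[j, :], dist[k, :])
        dist[:, j] = dist[j, :]
        dist[j, j] = np.inf
        clusters[j].extend(clusters.pop(k))
        active.remove(k)

    labels = np.empty(n, dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels

# Toy scores: items 0 and 1 match, items 2 and 3 match, cross pairs do not.
toy_scores = np.array([
    [0.0, 5.0, -3.0, -3.0],
    [5.0, 0.0, -3.0, -3.0],
    [-3.0, -3.0, 0.0, 4.0],
    [-3.0, -3.0, 4.0, 0.0],
])
toy_labels = ahc_single_linkage(toy_scores, threshold=0.0)
```

The quadratic search for the closest pair at every iteration keeps the sketch short; a task with thousands of audios would call for a more efficient linkage routine.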

SLIDE 7

Performance Measures

Impurity measures defined by van Leeuwen [1]:

- Speaker Impurity (SI): to what extent the audios of a speaker are spread among the clusters.
- Cluster Impurity (CI): to what extent a cluster contains audios from different speakers.
- Impurity tradeoff (IT) curves: graphical representation of the pair (CI, SI) at each iteration of the clustering process.
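To make the two measures concrete, the sketch below computes a simple majority-count version of each. This is an assumption made for illustration: the slide does not give the formulas, and the exact definitions in van Leeuwen's paper may differ. The function name `impurities` is hypothetical.

```python
from collections import Counter

def impurities(speakers, clusters):
    """Return (cluster_impurity, speaker_impurity) for parallel label lists.

    Assumed definitions (majority count, not taken from the slides):
      CI = 1 - (1/N) * sum over clusters of the largest single-speaker count
      SI = 1 - (1/N) * sum over speakers of the largest single-cluster count
    """
    n = len(speakers)
    pair_counts = Counter(zip(speakers, clusters))
    best_in_cluster = {}
    best_for_speaker = {}
    for (spk, clu), count in pair_counts.items():
        best_in_cluster[clu] = max(best_in_cluster.get(clu, 0), count)
        best_for_speaker[spk] = max(best_for_speaker.get(spk, 0), count)
    ci = 1.0 - sum(best_in_cluster.values()) / n
    si = 1.0 - sum(best_for_speaker.values()) / n
    return ci, si

spk = ["a", "a", "b", "b"]
perfect = impurities(spk, [0, 0, 1, 1])     # every cluster is one speaker
one_cluster = impurities(spk, [0, 0, 0, 0])  # over-merged: CI rises, SI is zero
singletons = impurities(spk, [0, 1, 2, 3])   # under-merged: SI rises, CI is zero
```

The two extreme partitions show the tradeoff that an IT curve traces: merging lowers SI at the cost of CI, and stopping early does the opposite.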

[1] David A. van Leeuwen. Speaker linking in large data sets. Odyssey 2010, pp. 202-208.

SLIDE 8

Audio database

Initial set-up

- 17868 audios from NIST SRE 04, 05 and 06.
- Telephone channel.
- 300 seconds.

SLIDE 10

Size of the Task

Experimental Set-Up

Clustering tasks:

- Tasks of variable size.
- For each task size, all the available audios are distributed over the subsets of that size.
- All the tasks have a similar audios-per-speaker distribution.
- Clustering tasks of the same size are evaluated together with one single IT curve.

Size of the task    #subsets
18000               1
9000                2
3000                6
1000                18
100                 180
10                  1800

SLIDE 11

Size of the Task

Results

Size of the task    EI (%)
10                  0.789
100                 1.61
1000                3.28
3000                4.8
9000                6.36
18000               7.52

SLIDE 12

Number of Speakers

Definitions

We define R as the number of speakers in the task divided by the number of audios:

$$R = \frac{\#\mathrm{spks}}{\#\mathrm{audios}} \tag{3}$$

R also sets the level of the clustering dendrogram at which the number of clusters equals the number of speakers. If n is the number of speakers in the task, the algorithm should stop when n clusters remain. Since there are as many possible merges as total audios (N), the optimal iteration at which to stop the clustering algorithm is:

$$\mathrm{it}_{\mathrm{opt}} = N - n = N - N \cdot R = N \cdot (1 - R) \tag{4}$$
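As a quick numeric check of Eq. (4), the sketch below evaluates the optimal stopping iteration for a task of 100 audios at several values of R. The function name `optimal_stop_iteration` is hypothetical, and rounding N · R to an integer speaker count is an assumption added to keep the result an integer.

```python
def optimal_stop_iteration(num_audios, speaker_ratio):
    """Eq. (4): it_opt = N - n, with n = N * R rounded to a whole speaker count."""
    num_speakers = round(num_audios * speaker_ratio)
    return num_audios - num_speakers

# Stopping iteration for N = 100 at a few illustrative ratios.
stops = {r: optimal_stop_iteration(100, r) for r in (0.05, 0.1, 0.2, 0.5, 0.8)}
for ratio, iteration in stops.items():
    print(f"R = {ratio}: stop after merge {iteration}")
```

The fewer speakers there are relative to audios (small R), the later the algorithm has to stop, so more merges must be correct before the threshold is reached.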

SLIDE 13

Number of Speakers

Experimental Set-Up

Experimental Set-Up:

- Several clustering tasks, depending on the number of speakers.
- Tasks of constant size: 100 audios.
- Within a clustering task, all the speakers have the same number of audios.

R       Speakers   Size   Audios per speaker
0.05    5          100    20
0.1     10         100    10
0.2     20         100    5
0.5     50         100    2
0.8     80         100    1 and 3

SLIDE 14

Number of Speakers

Results

SLIDE 15

Balance of Speakers

Experimental Set-Up

SLIDE 16

Balance of Speakers

Results

SLIDE 19

Threshold tuning

Maximum Distance with Raw scores (MD-R)

SLIDE 20

Threshold tuning

Maximum Distance with Unsupervised Score Calibration (MD-USC)

Score calibration process defined by Brümmer and García-Romero [2]

[2] Niko Brümmer and Daniel García-Romero. Generative modeling for unsupervised score calibration. ICASSP 2014.

SLIDE 21

Influence of a Training/Testing Mismatch

Experimental Set-Up

Experimental set-up:

- One single threshold training set for all the experiments.
- 4 groups of clustering tasks, depending on the audios-per-speaker distribution.

Group   Audios per speaker distribution     Size
A       Similar to threshold training set   100
B       5 speakers with 20 audios           100
C       20 speakers with 5 audios           100
D       50 speakers with 2 audios           100

Performance measure: difference between the number of clusters and the number of speakers, relative to the number of speakers:

$$\frac{|\#\mathrm{spks} - \#\mathrm{clusters}|}{\#\mathrm{spks}} \tag{5}$$
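The measure in Eq. (5) is simple to compute once a clustering has stopped. The sketch below is illustrative only: the function name is hypothetical and the cluster counts in the example are invented inputs, not results from the presentation.

```python
def relative_cluster_count_error(num_speakers, num_clusters):
    """Eq. (5): |#spks - #clusters| / #spks."""
    if num_speakers <= 0:
        raise ValueError("num_speakers must be positive")
    return abs(num_speakers - num_clusters) / num_speakers

# Made-up outcomes: stopping too early leaves too many clusters,
# stopping too late leaves too few. Both are penalized symmetrically.
too_many = relative_cluster_count_error(num_speakers=20, num_clusters=25)
too_few = relative_cluster_count_error(num_speakers=50, num_clusters=40)
exact = relative_cluster_count_error(num_speakers=5, num_clusters=5)
```

Because the error is normalized by the number of speakers, the same absolute miss weighs more heavily in a group with few speakers (such as group B) than in one with many (such as group D).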

SLIDE 22

Influence of a Training/Testing Mismatch

Results

- Maximum Distance with Raw scores (MD-R)
- Maximum Distance with Unsupervised Score Calibration (MD-USC)

SLIDE 23

Summary

- Speaker clustering is strongly affected by the characteristics of the audio database.
- The conclusions drawn are useful to anticipate the accuracy of a clustering scenario and also to define possible solutions based on clustering approaches.
