
Analysis of the Impact of the Audio Database Characteristics in the Accuracy of a Speaker Clustering System



  1. Analysis of the Impact of the Audio Database Characteristics in the Accuracy of a Speaker Clustering System
     Jesús Jorrín Prieto (1), Carlos Vaquero (2), Leibny Paola García (1)
     (1) Agnitio S.L., Madrid, Spain
     (2) Cirrus Logic, Madrid, Spain
     Odyssey 2016, The Speaker and Language Recognition Workshop
     Navigation bar: Motivation | Clustering Algorithm | Audio Database Characteristics | Stopping Criterion | Summary
     Footer: Jesús Jorrín-Prieto (AGNITIO S.L.), 1 / 22

  2. Outline
     1. Motivation
        - The accuracy of a clustering task
     2. Clustering Algorithm
        - AHC
        - Performance Measures
        - Audio database
     3. Audio Database Characteristics
        - Size of the Task
        - Number of Speakers
        - Balance of Speakers
     4. Stopping Criterion
        - Threshold tuning
        - Influence of a Training/Testing Mismatch


  4. The accuracy of a clustering task


  6. AHC
     Agglomerative Hierarchical Clustering (AHC)
     - Distance metric: $d(j, k) = -\mathrm{score}_{\mathrm{PLDA}}(j, k)$ (1)
     - Linkage method: minimum distance (single linkage): $d(m, l) = \min\{d(j, l), d(k, l)\}$ (2), where $m$ is the cluster formed by merging clusters $j$ and $k$.
     - Stopping criterion: two clusters are merged if the distance between them is lower than the threshold. Otherwise the clustering process stops, and the solution is the partition obtained after the last valid merge.
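The AHC slide above can be illustrated with a minimal Python sketch. It assumes the pairwise PLDA scores are already available as a symmetric matrix; the function name `single_linkage_ahc` and the toy scores are hypothetical and are not taken from the presentation.

```python
def single_linkage_ahc(scores, threshold):
    """Single-linkage AHC over a symmetric matrix of pairwise scores.

    Assumption: scores[j][k] is the (PLDA-like) similarity between items
    j and k. The diagonal is ignored.
    """
    n = len(scores)
    # Eq. (1): the distance is the negated score.
    dist = {(j, k): -scores[j][k] for j in range(n) for k in range(j + 1, n)}
    clusters = {i: [i] for i in range(n)}

    def pair(a, b):
        # Distances are stored once per unordered pair of cluster ids.
        return (a, b) if a < b else (b, a)

    while len(clusters) > 1:
        (j, k), d_min = min(dist.items(), key=lambda item: item[1])
        # Stopping criterion: merge only while the closest pair is
        # strictly below the threshold.
        if d_min >= threshold:
            break
        # Merge cluster k into cluster j.
        clusters[j].extend(clusters.pop(k))
        del dist[(j, k)]
        for other in clusters:
            if other == j:
                continue
            # Eq. (2): single linkage keeps the minimum of the two distances.
            d_k_other = dist.pop(pair(k, other))
            dist[pair(j, other)] = min(dist[pair(j, other)], d_k_other)

    return sorted(sorted(members) for members in clusters.values())


# Toy example: items 0 and 1 score high together, as do items 2 and 3.
toy_scores = [
    [0, 5, -3, -4],
    [5, 0, -2, -3],
    [-3, -2, 0, 6],
    [-4, -3, 6, 0],
]
partition = single_linkage_ahc(toy_scores, threshold=0.0)
```

With a threshold of 0 on the negated scores, the toy example stops once only cross-speaker pairs remain. A production system would more likely use an optimized routine instead of this quadratic-memory dictionary.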

  7. Performance Measures
     Impurity measures defined by van Leeuwen [1]:
     - Speaker Impurity (SI): to what extent the audios of each speaker are spread among several clusters.
     - Cluster Impurity (CI): to what extent a cluster contains audios from different speakers.
     - Impurity trade-off (IT) curves: graphical representation of the pair (CI, SI) at each iteration of the clustering process.
     [1] David A. van Leeuwen. Speaker linking in large data sets. Odyssey 2010, pp. 202-208.
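As a rough illustration of the two impurity measures, the sketch below uses a simple majority-based definition. This is an assumption made for the example and may differ from the exact formulation in van Leeuwen [1]; the function name `impurities` is hypothetical.

```python
from collections import Counter


def impurities(speakers, clusters):
    """Return (cluster_impurity, speaker_impurity) for one partition.

    Assumed majority-based definition (not necessarily van Leeuwen's):
    - cluster impurity: fraction of audios that do not belong to the
      majority speaker of their cluster;
    - speaker impurity: fraction of audios that are not in the cluster
      holding most audios of their speaker.
    """
    n = len(speakers)
    pair_counts = Counter(zip(clusters, speakers))
    best_in_cluster = {}
    best_in_speaker = {}
    for (cluster, speaker), count in pair_counts.items():
        best_in_cluster[cluster] = max(best_in_cluster.get(cluster, 0), count)
        best_in_speaker[speaker] = max(best_in_speaker.get(speaker, 0), count)
    cluster_impurity = 1 - sum(best_in_cluster.values()) / n
    speaker_impurity = 1 - sum(best_in_speaker.values()) / n
    return cluster_impurity, speaker_impurity


true_speakers = ["a", "a", "a", "b", "b"]
# Start of the clustering: every audio is its own cluster.
ci_start, si_start = impurities(true_speakers, [0, 1, 2, 3, 4])
# End of the clustering: all audios share one cluster.
ci_end, si_end = impurities(true_speakers, [0, 0, 0, 0, 0])
```

Under this assumed definition, the start of the clustering has zero cluster impurity and high speaker impurity, and the end has the opposite, which is the trade-off that an IT curve traces.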

  8. Audio database
     Initial set-up:
     - 17868 audios from NIST SRE 04, 05 and 06.
     - Telephone channel.
     - 300 seconds.


  10. Size of the Task: Experimental Set-Up
      Clustering tasks:
      - Tasks of variable size.
      - For each task size, all the available audios are spread over the subsets of that size.
      - All the tasks have a similar audios-per-speaker distribution.
      - Clustering tasks of the same size are evaluated together with a single IT curve.

      Size of the task   #subsets
      18000              1
      9000               2
      3000               6
      1000               18
      100                180
      10                 1800

  11. Size of the Task: Results

      Size of the task   EI (%)
      10                 0.789
      100                1.61
      1000               3.28
      3000               4.8
      9000               6.36
      18000              7.52

  12. Number of Speakers: Definitions
      We define $R$ as the number of speakers in the task divided by the number of audios:
      $R = \frac{\#\mathrm{spks}}{\#\mathrm{audios}}$ (3)
      $R$ also sets the level of the clustering dendrogram at which the number of clusters equals the number of speakers. Let $n$ be the number of speakers in the task and $N$ the total number of audios. The clustering starts from $N$ single-audio clusters and each merge removes one cluster, so $n$ clusters remain after $N - n$ merges. The optimal iteration at which to stop the clustering algorithm is therefore:
      $it_{opt} = N - n = N - N \cdot R = N \cdot (1 - R)$ (4)
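Equations (3) and (4) can be checked numerically with a short sketch. The function name `optimal_stop_iteration` is hypothetical, and the example values are illustrative task sizes rather than results from the presentation.

```python
import math


def optimal_stop_iteration(num_audios, num_speakers):
    """Return (R, it_opt) following Eq. (3) and Eq. (4).

    R is the speakers-to-audios ratio, and it_opt is the number of merges
    after which the number of clusters equals the number of speakers.
    """
    if num_audios <= 0 or not 0 < num_speakers <= num_audios:
        raise ValueError("expected 0 < num_speakers <= num_audios")
    ratio = num_speakers / num_audios       # Eq. (3)
    it_opt = num_audios - num_speakers      # Eq. (4), integer form N - n
    # Both forms of Eq. (4) agree up to floating-point rounding.
    assert math.isclose(it_opt, num_audios * (1 - ratio))
    return ratio, it_opt


# Illustrative task: 100 audios from 20 speakers.
ratio, it_opt = optimal_stop_iteration(num_audios=100, num_speakers=20)
```

For this illustrative task, R is 0.2 and the clustering should stop after 80 merges. A larger R therefore means an earlier optimal stopping point for a task of the same size.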

  13. Number of Speakers: Experimental Set-Up
      - Several clustering tasks depending on the number of speakers.
      - Tasks of constant size: 100 audios.
      - Within a clustering task, all the speakers have the same number of audios.

      R      Speakers   Size   Audios per speaker
      0.05   5          100    20
      0.1    10         100    10
      0.2    20         100    5
      0.5    50         100    2
      0.8    80         100    1 and 3

  14. Number of Speakers: Results

  15. Balance of Speakers: Experimental Set-Up

  16. Balance of Speakers: Results



  19. Threshold tuning: Maximum Distance with Raw scores (MD-R)

  20. Threshold tuning: Maximum Distance with Unsupervised Score Calibration (MD-USC)
      Score calibration process defined by Brümmer and García-Romero [2].
      [2] Niko Brümmer and Daniel García-Romero. Generative modeling for unsupervised score calibration. ICASSP 2014.

  21. Influence of a Training/Testing Mismatch: Experimental Set-Up
      - One single threshold training set for all the experiments.
      - 4 groups of clustering tasks depending on the audios-per-speaker distribution.

      Group   Audios per speaker distribution     Size
      A       Similar to threshold training set   100
      B       5 speakers with 20 audios           100
      C       20 speakers with 5 audios           100
      D       50 speakers with 2 audios           100

      Performance measure: difference between the number of clusters and the number of speakers, relative to the number of speakers:
      $\frac{|\#\mathrm{spks} - \#\mathrm{clusters}|}{\#\mathrm{spks}}$ (5)
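The performance measure in Eq. (5) is a one-line computation. The sketch below uses the hypothetical name `relative_cluster_count_error`, and the cluster counts in the example are made up for illustration, not results from the presentation.

```python
def relative_cluster_count_error(num_speakers, num_clusters):
    """Eq. (5): |#spks - #clusters| / #spks."""
    if num_speakers <= 0:
        raise ValueError("num_speakers must be positive")
    return abs(num_speakers - num_clusters) / num_speakers


# Illustrative values for a task shaped like group C (20 speakers):
# stopping too early leaves 25 clusters, stopping too late leaves 15.
error_early_stop = relative_cluster_count_error(num_speakers=20, num_clusters=25)
error_late_stop = relative_cluster_count_error(num_speakers=20, num_clusters=15)
```

Because of the absolute value, the measure penalizes stopping too early and stopping too late equally. It does not say whether the clusters themselves are pure, so it complements the impurity measures rather than replacing them.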

  22. Influence of a Training/Testing Mismatch: Results
      - Maximum Distance with Raw scores (MD-R)
      - Maximum Distance with Unsupervised Score Calibration (MD-USC)

  23. Summary
      - Speaker clustering is strongly affected by the characteristics of the audio database.
      - The conclusions drawn are useful to anticipate the accuracy of a clustering scenario and to define possible solutions based on clustering approaches.
