First Investigations on Self Trained Speaker Diarization

First Investigations on Self Trained Speaker Diarization
Gaël Le Lan (1, 2), Sylvain Meignier (2), Delphine Charlet (1), Anthony Larcher (2)
1 Orange Labs, France, first.lastname@orange.com
2 LIUM, Université du Maine, France, first.lastname@lium.univ-lemans.fr
June 22, 2016
Gaël Le Lan (Orange Labs/LIUM), Self Trained Speaker Diarization, June 22, 2016, 1 / 15

Context
Cross-recording speaker diarization of French TV archives
 - Speaker indexing of collections of multiple recordings
Two-pass approach
 - Speaker segmentation and clustering within each recording
 - Cross-recording speaker linking
State-of-the-art speaker recognition framework
 - i-vector/PLDA
 - Hierarchical agglomerative clustering
PLDA maximizes the inter-speaker variability while minimizing the intra-speaker variability.
Using the target data as training material, how well can we estimate this variability?
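The split between inter-speaker and intra-speaker variability can be made concrete with a heavily simplified sketch. It trains a scalar two-covariance model, a close relative of PLDA, with EM on speaker-labeled data. Several assumptions are not taken from the slides. Each i-vector is reduced to a single float. The mean is fixed to the global average. `train_two_covariance` is a hypothetical helper, not the system used in this work.

```python
import random
from statistics import fmean


def train_two_covariance(sessions_by_speaker, n_iter=50):
    """EM for a scalar two-covariance model x = mu + y + e, where
    y ~ N(0, between) is the speaker factor shared by all sessions of a
    speaker and e ~ N(0, within) is the session (intra-speaker) noise.

    sessions_by_speaker: list of lists of floats, one inner list per speaker.
    Returns (mu, between, within).
    """
    all_x = [x for sessions in sessions_by_speaker for x in sessions]
    mu = fmean(all_x)  # simplification: mean fixed to the global average
    between, within = 1.0, 1.0  # arbitrary starting point
    for _ in range(n_iter):
        b_acc = w_acc = 0.0
        for sessions in sessions_by_speaker:
            n = len(sessions)
            # E-step: Gaussian posterior of this speaker's factor y
            post_var = 1.0 / (1.0 / between + n / within)
            post_mean = post_var * sum(x - mu for x in sessions) / within
            b_acc += post_mean ** 2 + post_var
            w_acc += sum((x - mu - post_mean) ** 2 + post_var for x in sessions)
        # M-step: expected squared speaker factor and expected squared residual
        between = b_acc / len(sessions_by_speaker)
        within = w_acc / len(all_x)
    return mu, between, within


# Synthetic check: 200 speakers, 10 sessions each,
# true between-speaker variance 4.0 and within-speaker variance 1.0.
random.seed(0)
speakers = []
for _ in range(200):
    y = random.gauss(0.0, 2.0)
    speakers.append([5.0 + y + random.gauss(0.0, 1.0) for _ in range(10)])
mu, between, within = train_two_covariance(speakers)
print(f"mu={mu:.2f} between={between:.2f} within={within:.2f}")
```

Suppose every speaker had a single session. Then only the sum `between + within` would be identifiable. This is why estimating PLDA parameters requires several sessions per speaker.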

State-of-the-Art Two-Pass Diarization Framework (baseline)
[Diagram]
Target data (unlabeled) -> frontend -> speaker segmentation -> i-vector extraction (Universal Background Model, Total Variability Matrix) -> similarity scoring (PLDA parameters) -> speaker clustering, for each recording -> cross-recording similarity scoring -> speaker linking -> diarization output (speaker clusters)
Train data (labeled by speaker) -> frontend -> i-vector extraction; the Universal Background Model, Total Variability Matrix and PLDA parameters are estimated from this data (baseline - supervised)
An acoustic mismatch separates the train data from the target data.
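The two passes of the framework can be sketched under several stated assumptions. Cosine similarity stands in for PLDA scoring. Segment i-vectors are plain Python lists. Both passes use average-linkage agglomerative clustering with the same stopping threshold, and the threshold value is only illustrative. The helper names (`cosine`, `agglomerative`, `two_pass`) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors (stand-in for PLDA)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_vector(vectors):
    """Element-wise mean, used to represent a cluster by one vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def agglomerative(vectors, threshold):
    """Average-linkage HAC: repeatedly merge the most similar pair of clusters,
    stopping when no pair scores at least `threshold`.
    Returns a list of clusters, each a list of indices into `vectors`."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        scores = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        best_score, best_pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = linkage(clusters[a], clusters[b])
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


def two_pass(recordings, threshold):
    """recordings: dict episode -> list of segment vectors.
    Pass 1 clusters segments within each recording; pass 2 links the
    per-recording clusters across recordings.
    Returns {(episode, segment index): global speaker id}."""
    local = []  # (episode, segment indices, cluster mean vector)
    for episode, segments in recordings.items():
        for indices in agglomerative(segments, threshold):
            local.append((episode, indices, mean_vector([segments[i] for i in indices])))
    linked = agglomerative([entry[2] for entry in local], threshold)
    labels = {}
    for speaker_id, group in enumerate(linked):
        for k in group:
            episode, indices, _ = local[k]
            for i in indices:
                labels[(episode, i)] = speaker_id
    return labels


recordings = {
    "ep1": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    "ep2": [[0.1, 1.0], [1.0, 0.05]],
}
print(two_pass(recordings, threshold=0.8))
```

In this sketch, each per-recording cluster is represented by its mean vector during the linking pass. That is one common choice, assumed here rather than stated on the slide.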

“Self Trained” Framework
[Diagram: the baseline pipeline, in which the Universal Background Model, Total Variability Matrix and PLDA parameters are either trained on the labeled train data (baseline - supervised) or self trained on the unlabeled target data (self trained - unsupervised), which avoids the acoustic mismatch.]

Adapted Framework
[Diagram: the same pipeline, with three ways of obtaining the models: baseline (supervised), self trained (unsupervised) and adapted (unsupervised).]

“Self Trained” Diarization? (1/2)
Goal: avoid the acoustic mismatch by using the target data as training material.
Requirements to train an i-vector/PLDA system:
 - UBM/TV: clean speech segments (straightforward)
 - PLDA: several sessions per speaker, in various acoustic conditions
Are there speakers appearing in several different episodes?
Assuming we know how to cluster the target data effectively, can we train a system on the resulting clusters?
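The question above amounts to turning a clustering of the target data into pseudo speaker labels. Only the clusters that could serve as PLDA training speakers are kept. The sketch below assumes the clustering output is a dict keyed by (episode, segment id). It also assumes a cluster must span at least `min_episodes` episodes to count. `plda_training_set` is a hypothetical name.

```python
from collections import defaultdict


def plda_training_set(cluster_labels, min_episodes=2):
    """cluster_labels: dict mapping (episode, segment_id) -> cluster id, the
    output of a cross-recording clustering of the unlabeled target data.
    Returns {cluster id: sorted list of (episode, segment_id)}, keeping only
    clusters that span at least `min_episodes` distinct episodes."""
    episodes = defaultdict(set)
    members = defaultdict(list)
    for (episode, segment), cluster in cluster_labels.items():
        episodes[cluster].add(episode)
        members[cluster].append((episode, segment))
    return {
        cluster: sorted(segments)
        for cluster, segments in members.items()
        if len(episodes[cluster]) >= min_episodes
    }


labels = {
    ("ep1", 0): "spk_a", ("ep2", 3): "spk_a",  # recurring pseudo speaker
    ("ep1", 1): "spk_b", ("ep1", 2): "spk_b",  # seen in one episode only
}
print(plda_training_set(labels))  # -> {'spk_a': [('ep1', 0), ('ep2', 3)]}
```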

Which Data?
200 hours of French broadcast news (drawn from the REPERE, ETAPE and ESTER evaluation campaigns)
Two shows selected as target corpora: LCP Info and BFM Story
Train corpus: all other recordings

Corpus                                  LCP target   BFM target
Episodes                                45           42
Episode duration                        25 min       60 min
Evaluated (labeled) speech duration     10h08m       19h57m
One-time speakers                       127          345
Recurring speakers (2+ occurrences)     93           77
Recurring speakers (3+ occurrences)     48           35
Total speakers                          220          422
Speech proportion, one-time speakers    20.12%       44.84%
Speech proportion, recurring (2+)       79.88%       55.16%
Speech proportion, recurring (3+)       67.06%       45.94%
Average speaker time per episode        1m08s        1m58s
Table: Composition of target corpora.
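The speaker counts and speech proportions in the table can be derived from speaker-labeled turns, as sketched below. The input format (episode, speaker, seconds) and the helper name `corpus_stats` are assumptions. "Average speaker time per episode" is left out because the slide does not define it exactly.

```python
from collections import defaultdict


def corpus_stats(turns):
    """turns: iterable of (episode, speaker, seconds) for labeled speech.
    Counts one-time and recurring speakers and their share of speech time."""
    episodes = defaultdict(set)   # speaker -> episodes where they speak
    speech = defaultdict(float)   # speaker -> total speaking time (s)
    for episode, speaker, seconds in turns:
        episodes[speaker].add(episode)
        speech[speaker] += seconds
    total = sum(speech.values())

    def share(speakers):
        """Percentage of all labeled speech spoken by `speakers`."""
        return 100.0 * sum(speech[s] for s in speakers) / total

    one_time = [s for s, eps in episodes.items() if len(eps) == 1]
    recurring2 = [s for s, eps in episodes.items() if len(eps) >= 2]
    recurring3 = [s for s, eps in episodes.items() if len(eps) >= 3]
    return {
        "total_speakers": len(episodes),
        "one_time": len(one_time),
        "recurring_2plus": len(recurring2),
        "recurring_3plus": len(recurring3),
        "one_time_share_pct": share(one_time),
        "recurring_2plus_share_pct": share(recurring2),
        "recurring_3plus_share_pct": share(recurring3),
    }


# Toy data for illustration only, not taken from the LCP or BFM corpora.
turns = [
    ("ep1", "anchor", 300.0), ("ep2", "anchor", 280.0), ("ep3", "anchor", 310.0),
    ("ep1", "guest", 120.0),
    ("ep2", "reporter", 90.0), ("ep3", "reporter", 60.0),
]
print(corpus_stats(turns))
```

By construction, one-time speakers plus recurring (2+) speakers equal the total, as in the table (127 + 93 = 220 for LCP, 345 + 77 = 422 for BFM). Their speech proportions sum to 100%.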

Oracle Framework
[Diagram: the same pipeline, where the models come either from the labeled train data (baseline - supervised) or from the target data together with its reference speaker labels (oracle - supervised).]
Results:
                         LCP target   BFM target
Baseline (supervised)    17.72        13.22
Oracle (supervised)      10.87        X

Minimum Requirements for PLDA Parameter Estimation
Oracle experiment:
 - For the LCP target corpus, suitable PLDA parameters can be estimated with a minimum of 37 episodes (40 recurring speakers, appearing in 7.2 episodes on average).
 - For the BFM target corpus, the EM algorithm does not converge, even with all episodes (35 recurring speakers, appearing in 5.45 episodes on average).
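The oracle experiment implies sweeping the number of target episodes used for training while tracking how many recurring speakers remain. The sketch below covers that bookkeeping only. It assumes episodes are added in a fixed order, since the slide does not say how subsets were chosen. `recurring_stats` is a hypothetical helper. Judging whether the resulting PLDA parameters are suitable, or whether EM converges, is outside its scope.

```python
from collections import defaultdict


def recurring_stats(speaker_episodes, episode_order, k, min_occ=2):
    """speaker_episodes: iterable of (episode, speaker) pairs.
    Keeps only the first k episodes of `episode_order` and returns
    (number of speakers seen in >= min_occ episodes,
     average number of episodes per such speaker)."""
    kept = set(episode_order[:k])
    seen = defaultdict(set)
    for episode, speaker in speaker_episodes:
        if episode in kept:
            seen[speaker].add(episode)
    counts = [len(eps) for eps in seen.values() if len(eps) >= min_occ]
    if not counts:
        return 0, 0.0
    return len(counts), sum(counts) / len(counts)


pairs = [("ep1", "anchor"), ("ep2", "anchor"), ("ep3", "anchor"),
         ("ep1", "guest"), ("ep3", "guest"), ("ep2", "reporter")]
order = ["ep1", "ep2", "ep3"]
for k in range(1, len(order) + 1):
    print(k, recurring_stats(pairs, order, k))
# 1 (0, 0.0)
# 2 (1, 2.0)
# 3 (2, 2.5)
```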
