Introduction to Speaker Diarization - Dr. Gerald Friedland (PowerPoint PPT presentation)


SLIDE 1

Introduction to Speaker Diarization

Dr. Gerald Friedland
International Computer Science Institute, Berkeley, CA
friedland@icsi.berkeley.edu

Monday, May 21, 12

SLIDE 2

Speaker Diarization...

  • tries to answer the question: “who spoke when?”
  • using a single or multiple microphone inputs
  • without prior knowledge of anything (number of speakers, language, text, etc.)

SLIDE 3

Visualization

Estimate “who spoke when” with no prior knowledge of speakers, number of speakers, words, or language spoken.

(Figure: an audio track with its segmentation and clustering shown as aligned timelines.)

SLIDE 8

Speaker Diarization is NOT

  • Speaker ID (Speaker ID is supervised and needs prior training)
  • Speaker Verification (is supervised and returns a yes/no answer)
  • Beamforming (as this requires multiple mics, even though beamforming can be used to support diarization)

SLIDE 12

Why Diarization?

  • Important basic technology for various semantic audio analysis tasks
  • Meeting retrieval, video conferencing, speaker-adaptive ASR, video retrieval, etc.
  • Let’s take a look at some examples

SLIDE 16

Application: Meeting Browsing

SLIDE 17

Application: Semantic Navigation

  • G. Friedland, L. Gottlieb, A. Janin: “Joke-o-mat: Browsing Sitcoms Punchline by Punchline”, Proceedings of ACM Multimedia, Beijing, China, October 2009.

SLIDE 18

Application: Video Duplicate Detection

SLIDE 19

Other Applications

(Speaker) Diarization is often used as underlying support for...

  • Beamforming
  • Visual Localization
  • Video Analysis: Object Detection, Event Detection, Scene Detection
  • Behavior-level analysis tasks, such as dominance detection
  • Robotics applications (e.g. addressing people)
  • Support for adaptive speech recognition

SLIDE 26

Main Drive: NIST RT Eval

  • Speaker Diarization was evaluated as part of the NIST Rich Transcription Evaluation (since about 2002)
  • Idea: create “Rich Transcripts” of broadcast news, later meetings
  • Evaluated on real-world data

SLIDE 30

Typical Component Composition for RT

(Diagram: the audio signal feeds a set of components, each answering a question.)

  • Speaker Diarization: “who spoke when”
  • Speech Recognition: “what was said”
  • Speaker Attribution: “who said what”
  • Relevant Web Scraping: “what’s relevant to this”
  • Summarization, Question Answering, and other higher-level analysis: “what are the main points”
  • Indexing, Search, Retrieval

SLIDE 32

Speaker Diarization: General Overview

(Diagram: Audio Signal -> Feature Extraction (MFCC) -> Speech/Non-Speech Detector (speech only) -> Diarization Engine (Segmentation + Clustering) -> Metadata.)

SLIDE 33

Output Format of Diarization

  • RTTM files (as defined by NIST)
  • Example:

SPEAKER soupnazi 1 40.0 2.5 <NA> <NA> George <NA>
SPEAKER soupnazi 1 42.5 2.5 <NA> <NA> Jerry <NA>
SPEAKER soupnazi 1 45.0 2.5 <NA> <NA> female <NA>

  • Many tools are available for working with these files.
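RTTM is a plain whitespace-delimited format, so the example records above can be read with a few lines of Python. This is a sketch, not one of the official NIST tools; the field positions follow the RTTM layout (type, file, channel, onset, duration, ortho, subtype, name, confidence):

```python
def parse_rttm(lines):
    """Return (file_id, start, end, speaker) tuples for SPEAKER records."""
    segments = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip other record types and blank lines
        file_id = fields[1]
        onset, dur = float(fields[3]), float(fields[4])
        name = fields[7]
        segments.append((file_id, onset, onset + dur, name))
    return segments

rttm = [
    "SPEAKER soupnazi 1 40.0 2.5 <NA> <NA> George <NA>",
    "SPEAKER soupnazi 1 42.5 2.5 <NA> <NA> Jerry <NA>",
]
print(parse_rttm(rttm))
```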

SLIDE 41

Error Measurement

  • US NIST defines the error metric and evaluates speaker diarization on a regular basis
  • The error metric is called ‘Diarization Error Rate’ (DER)
  • All tools are available as open source

SLIDE 45

Error Measurement

DER = the total time a speaker has been assigned wrongly (speaker error), missed (missed speech), detected when there is none (false alarm), or counted as a single speaker when more than one is talking, relative to the length of the audio.
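That definition can be sketched at the frame level in a few lines of Python. This is not NIST’s md-eval scorer: it assumes single-speaker frames, marks non-speech as None, and finds the best hypothesis-to-reference speaker mapping by brute force:

```python
from itertools import permutations

def der(ref, hyp):
    """Frame-level DER sketch: (missed + false alarm + speaker error)
    divided by the total amount of reference speech."""
    assert len(ref) == len(hyp)
    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    total = sum(r is not None for r in ref)
    if total == 0:
        return 0.0
    # pad so every reference speaker can receive a mapping
    pads = hyp_spk + [None] * max(0, len(ref_spk) - len(hyp_spk))
    best_err = None
    for perm in permutations(pads, len(ref_spk)):
        mapping = dict(zip(perm, ref_spk))  # hypothesis label -> reference label
        err = 0
        for r, h in zip(ref, hyp):
            if r is None and h is None:
                continue                     # agreed non-speech
            if r is None or h is None:
                err += 1                     # false alarm or missed speech
            elif mapping.get(h) != r:
                err += 1                     # speaker error
        best_err = err if best_err is None else min(best_err, err)
    return best_err / total

ref = ["A", "A", "B", "B", None]
hyp = ["x", "x", "x", "y", None]
print(der(ref, hyp))  # 1 wrong frame out of 4 reference speech frames -> 0.25
```

Note that hypothesis labels are arbitrary cluster names; the mapping search is what makes diarization scoring different from classification accuracy.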

SLIDE 46

Segmentation & Clustering

  • Originally: segment first, cluster later

Chen, S. S. and Gopalakrishnan, P., “Clustering via the Bayesian information criterion with applications in speech recognition,” Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 1998, Vol. 2, Seattle, USA, pp. 645-648.

  • More efficient: top-down and bottom-up approaches

SLIDE 49

Segmentation: Secret Sauce

  • How do you distinguish speakers?
  • Combination of MFCC+GMM+BIC seems unbeatable!
  • Can be generalized to audio percepts

SLIDE 53

MFCC: Idea

(Diagram: Audio Signal -> Pre-emphasis -> Windowing -> FFT -> Mel-Scale Filterbank -> Log-Scale -> DCT -> MFCC, i.e. the power cepstrum of the signal.)

SLIDE 54

MFCC: Mel Scale

SLIDE 55

MFCC: Result

SLIDE 56

Gaussian Mixtures

SLIDE 57

Training of Mixture Models

Goal: find the mixture weights a_i (and the Gaussian means and variances) that maximize the likelihood of the data.

Expectation: for each frame x_t, compute each component’s responsibility gamma_ti = a_i N(x_t | mu_i, sigma_i^2) / sum_j a_j N(x_t | mu_j, sigma_j^2).

Maximization: re-estimate a_i = (1/T) sum_t gamma_ti, mu_i = sum_t gamma_ti x_t / sum_t gamma_ti, and sigma_i^2 analogously.
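A minimal one-dimensional version of this EM loop (scalar variances and quantile-based initialization are my simplifying assumptions):

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50):
    """Train a 1-D Gaussian mixture with EM (sketch)."""
    w = np.full(k, 1.0 / k)                        # mixture weights a_i
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread initial means over the data
    var = np.full(k, np.var(x))
    for _ in range(iters):
        # Expectation: responsibility of each component for each point
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        gamma = w * dens
        gamma /= gamma.sum(axis=1, keepdims=True) + 1e-300
        # Maximization: re-estimate weights, means, variances
        nk = gamma.sum(axis=0)
        w = nk / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return w, mu, var
```

In diarization each cluster holds one such GMM over MFCC frames, usually with several multivariate components rather than this 1-D toy.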

SLIDE 58

Bayesian Information Criterion

BIC = log L(X | Θ) - λ (K/2) log N

where X is the sequence of features for a segment,
Θ are the parameters of the statistical model for the segment,
K is the number of parameters for the model,
N is the number of frames in the segment,
λ is an optimization parameter.
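Under those definitions, a BIC score for a segment, plus the merge test used in segmentation, might look like the following sketch (λ = 1 and a single full-covariance Gaussian per segment are my assumptions; the deck’s engines use GMMs):

```python
import numpy as np

def bic(x, lam=1.0):
    """BIC of one full-covariance Gaussian fit to segment x (frames x dims)."""
    n, d = x.shape
    cov = np.cov(x, rowvar=False, bias=True) + 1e-6 * np.eye(d)
    # log-likelihood of the maximum-likelihood Gaussian
    loglik = -0.5 * n * (d * np.log(2 * np.pi) + np.log(np.linalg.det(cov)) + d)
    k = d + d * (d + 1) / 2          # mean + covariance parameters
    return loglik - lam * 0.5 * k * np.log(n)

def delta_bic(x, y, lam=1.0):
    """Merge score: BIC(joint) - BIC(x) - BIC(y).
    Positive favours merging (one model explains both segments)."""
    return bic(np.vstack([x, y]), lam) - bic(x, lam) - bic(y, lam)
```

Because the penalty term grows with the number of parameters, splitting into two models only wins when the likelihood gain outweighs the extra model complexity.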

SLIDE 59

Bayesian Information Criterion: Explanation

  • BIC penalizes the complexity of the model (as measured by the number of parameters in the model).
  • BIC measures the efficiency of the parameterized model in terms of predicting the data.
  • BIC is therefore used to choose the number of clusters according to the intrinsic complexity present in a particular dataset.

SLIDE 63

Bayesian Information Criterion: Properties

  • BIC is a minimum description length criterion.
  • BIC is independent of the prior.
  • It is closely related to other penalized likelihood criteria such as RIC and the Akaike information criterion.

SLIDE 67

Bottom-Up Algorithm

Start with too many clusters (initialized randomly). Purify clusters by comparing and merging similar clusters. Resegment and repeat until no more merging is needed.

(Diagram: Initialization -> (Re-)Training -> (Re-)Alignment -> Merge two Clusters? If yes, retrain and realign; if no, end.)
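The merge loop can be sketched as greedy agglomerative clustering over segments with a BIC-style merge test. This is a simplification: one diagonal Gaussian per cluster stands in for a GMM, and the (re-)alignment/resegmentation step is omitted:

```python
import numpy as np

def gauss_bic(x, lam=1.0):
    """BIC of one diagonal Gaussian fit to a cluster x (frames x dims)."""
    n, d = x.shape
    var = x.var(axis=0) + 1e-6
    loglik = -0.5 * n * (d * np.log(2 * np.pi) + np.log(var).sum() + d)
    return loglik - lam * 0.5 * (2 * d) * np.log(n)   # d means + d variances

def bottom_up(segments, lam=1.0):
    """Start with one cluster per segment; repeatedly merge the pair whose
    merged BIC beats the two separate BICs; stop when no merge helps."""
    clusters = [np.asarray(s) for s in segments]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = np.vstack([clusters[i], clusters[j]])
                gain = (gauss_bic(merged, lam)
                        - gauss_bic(clusters[i], lam)
                        - gauss_bic(clusters[j], lam))
                if gain > best:
                    best, pair = gain, (i, j)
        if pair is None:          # no merge improves BIC -> stop
            break
        i, j = pair
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        del clusters[j]
    return clusters
```

The stopping rule is what lets the algorithm estimate the number of speakers: it keeps merging only while a single model explains two clusters better than two separate models.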

SLIDE 79

ICSI’s Speaker Diarization

  • Speaker Diarization research @ ICSI since 2001
  • Various versions of diarization engines developed over the years
  • Status: research code, but stable for some applications that are error tolerant

SLIDE 83

ICSI’s Speaker Diarization Engine Variants

  • Basic (single mic, easy installation)
  • Fast (single mic, multiple CPU cores)
  • Super fast (single mic, multiple GPUs)
  • Accurate but slow (multi mic, additional preprocessing)
  • Audio/Visual (single and multi mic, for localization)
  • Online (single mic, “who is speaking now”)

SLIDE 90

Basic Speaker Diarization: Facts

  • Input: 16 kHz mono audio
  • Features: MFCC19, no delta or delta-delta
  • Speech/Non-Speech Detector external
  • Runtime: ~realtime (1 h of audio needs 1 h of processing on a single CPU, excluding speech/non-speech)

SLIDE 95

Multi-CPU Speaker Diarization: Facts

  • Same as Basic Speaker Diarization
  • Runtime: depends on the number of CPUs used. Example: with 8 cores, runtime = 14.3 x realtime, i.e. 14 minutes of audio need 1 minute of processing.
  • Runtime bottleneck usually: Speech/Non-Speech Detector

SLIDE 99

GPU Speaker Diarization: Facts

  • Same as Basic Speaker Diarization
  • Runtime: 250 x realtime, i.e. 1 h of audio is processed in 14.4 s!
  • Uses the current NVIDIA CUDA framework as backend
  • Frontend: Python!
  • Runtime bottleneck usually: Speech/Non-Speech Detector, Feature Extraction

SLIDE 105

Demo: 1 CPU vs 8 CPU vs GPU

SLIDE 106

Most Accurate Speaker Diarization: Overview

(Diagram: the audio passes through Wiener filtering, beamforming, and dynamic range compression; short-term features (MFCC), long-term features (prosodics), and delay features are extracted from the speech-only portions; EM clustering yields initial segments for the diarization engine’s segmentation and clustering, which outputs “who spoke when”.)

SLIDE 107

Audio/Visual Speaker Diarization: Overview

(Diagram: the audio signal goes through feature extraction (MFCC) and a speech/non-speech detector into the diarization engine (segmentation and clustering), which outputs “who spoke when”. In parallel, feature extraction on the video signal yields video activity events for the speech regions only; inverting the visual models gives “where the speaker was”.)

SLIDE 108

Video Feature Extraction

(Diagram: MPEG-4 video -> divide frames into n regions -> detect skin blocks -> average motion vectors per region -> n-dimensional activity vector. Window size: 400 ms.)
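A toy version of that region-activity feature: frame differencing stands in for the MPEG-4 motion vectors, skin-block detection is omitted, and the 3x3 grid is an assumption of this sketch:

```python
import numpy as np

def activity_vector(prev_frame, frame, grid=(3, 3)):
    """Split the frame into grid regions and take the mean absolute pixel
    difference per region, giving an n-dimensional activity vector."""
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    gh, gw = grid
    h, w = diff.shape
    vec = np.empty(gh * gw)
    for r in range(gh):
        for c in range(gw):
            block = diff[r * h // gh:(r + 1) * h // gh,
                         c * w // gw:(c + 1) * w // gw]
            vec[r * gw + c] = block.mean()  # average activity in this region
    return vec
```

Averaging these vectors over a 400 ms window, as the slide indicates, yields one visual observation per audio analysis window.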

SLIDE 109

Audio/Visual Speaker Diarization: Facts

  • One engine for audio and video
  • Scales with n cameras
  • Robust against visual changes such as different clothing, occlusions, etc. “A voiceprint does not care about somebody dimming the light.”

SLIDE 113

Audio/Visual Diarization: Example Video

SLIDE 114

In a perfect world...

  • There is no overlapped speech
  • The signal is clean
  • No environmental noise
  • Limited number of speakers (4 or so)
  • Speakers are well-distinguishable by their voice (e.g. male vs. female, young vs. old)
  • Speakers are non-emotional
  • Recording is at 16 kHz
  • Recording is 15-60 minutes in length

SLIDE 123

Current Results using Different Inputs

12 meeting recordings from the AMI corpus

System                      | Diarization Error Rate | Relative Improvement | Core Speed (x realtime)
Basic: 1 audio stream       | 32.09%                 | baseline             | 1.0
8 audio streams             | 27.55%                 | 14%                  | 2.2
1 audio stream + 1 camera   | 27.52%                 | 14%                  | 1.4
1 audio stream + 4 cameras  | 24.00%                 | 25%                  | 1.3

SLIDE 124

Most Accurate Results

12 meetings from the AMI corpus (“VACE Meetings”)

System                    | Diarization Error Rate | Relative Improvement | Core Speed (x realtime)
MFCC only (basic system)  | 32.09%                 | baseline             | 1.0
Full System               | 20.33%                 | 36%                  | 2.5
Full System + One Camera  | 18.98%                 | 41%                  | 2.9

SLIDE 125

Top Error Sources

  • Overlapped Speech
  • Short Speech Segments (<2s)
  • Environmental Noise
  • Low SNR
  • Bad Speech/Non-Speech Detector performance based on training data mismatch
  • Parameter mismatch, e.g. too few initial clusters

SLIDE 132

Optimal Performance is achieved when...

  • There is no overlapped speech
  • The signal is clean
  • No environmental noise
  • Limited number of speakers (4 or so)
  • Speakers are well-distinguishable by their voice (e.g. male vs. female, young vs. old)
  • Speakers are non-emotional
  • Recording is at 16 kHz or higher

SLIDE 140

Future Work!

SLIDE 141

Thank You!

Questions?

Some of the presented work was performed together with: Mary Knox, Katya Gonina, Adam Janin and others.