

SLIDE 1

multimodal meeting manager - m4 Speech and Hearing Research Group, University of Sheffield, UK

Crosstalk Analysis

Stuart N Wrigley Vincent Wan Guy J Brown Steve Renals

29 January 2003

SLIDE 2

Crosstalk Analysis

Goals

  • Detection of crosstalk.
  • Ideally, we would like to segment each channel into: channel speaker alone, channel speaker + crosstalk, and crosstalk alone.

  • Segmentation must be channel (i.e. speaker, meeting, environment) independent.

Data

  • ICSI: close-talking mics for each participant (mix of lapel and head-mounted), plus 4 tabletop mics. Large amounts of data, which have already been checked and transcribed.

  • IDIAP: lapel mics, plus 12 tabletop mics and a manikin. Still in the initial stages of collection and transcription.

SLIDE 3

Initial notes

Despite its attractiveness, channel energy may be an unreliable cue to speaker activity:

  • The ICSI data were recorded primarily with head-mounted microphones, so the microphone is fixed relative to the mouth (with one or two notable exceptions).

  • However, the M4 recordings are made with lapel microphones: head and body movement will change the channel gain throughout the meeting. For example, if a speaker turns their head to speak to a colleague, the signal energy in that channel may drop significantly.

SLIDE 4

Channel Activity Classifier

Our goal is to produce a system that will classify each frame of a meeting as one of:

  • Current channel speaker alone
  • Current channel speaker + crosstalk
  • Crosstalk alone
  • Silence / background noise

We have taken a similar approach to that of ICSI by using an ergodic HMM (EHMM). However, our classifier differs:

  • Four main states, as opposed to ICSI’s two (speech / nonspeech).
  • No intermediate state pairs (which ICSI uses to impose time constraints on transitions).

SLIDE 5

Ergodic HMM (EHMM)

  • Four states, each representing a particular label.
  • Equal prior probability of first state being any one of the four.
  • Each state modelled as a multivariate GMM.
  • Transitions allowed between every state pair.
  • No minimum residency time in each state.

States: S = speaker alone, C = crosstalk alone, SC = speaker + crosstalk, N = silence / noise. A minimal decoding sketch follows.
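Below is a minimal decoding sketch (our illustration in Python, not the project's code): uniform initial state probabilities, a fully connected transition matrix (the uniform values are placeholders; in practice these would be trained), and Viterbi decoding over per-frame GMM log-likelihoods supplied by the caller.

```python
# Sketch of the four-state ergodic HMM: uniform priors, transitions allowed
# between every state pair, per-state GMM emissions supplied by the caller.
import numpy as np

STATES = ["S", "C", "SC", "N"]   # speaker, crosstalk, speaker+crosstalk, noise
N_STATES = len(STATES)

log_pi = np.full(N_STATES, np.log(1.0 / N_STATES))  # equal state priors
# Placeholder fully connected transition matrix; real values would be trained.
log_A = np.log(np.full((N_STATES, N_STATES), 0.25))

def viterbi(frame_loglik: np.ndarray) -> np.ndarray:
    """Decode the most likely state sequence.

    frame_loglik: (T, N_STATES) per-frame GMM log-likelihoods.
    Returns an array of T state indices.
    """
    T = frame_loglik.shape[0]
    delta = np.zeros((T, N_STATES))          # best log-prob ending in state
    psi = np.zeros((T, N_STATES), dtype=int) # backpointers
    delta[0] = log_pi + frame_loglik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # indexed (from, to)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + frame_loglik[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```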

SLIDE 6

Features

As mentioned at the August meeting, we wished to look at many different features and determine which were the best. The list (which is still growing) is:

  • MFCCs (20 coeffs)
  • Energy
  • Zero crossing rate (ZCR)
  • Time-domain kurtosis (a measure of nongaussianity of the signal)
  • Frequency-domain kurtosis (a measure of nongaussianity of the spectrum)
  • Spectral autocorrelation peak-valley ratio (SAPVR)
  • Fundamentalness (a measure related to AM and FM at different frequencies)
  • max, min and mean crosscorrelation of all channel pairs
  • autocorrelation normalised max, min and mean crosscorrelation of all channel pairs

Total number of features: 13 (Total number of dimensions: 32)
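As an illustration of the last two items in the list, here is one plausible way to compute the cross-channel correlation features for a single frame. This is a sketch under assumptions: the slides do not specify the lag range or the exact normalisation, so both are our choices.

```python
# Sketch: peak cross-correlation of every channel pair for one 16 ms frame,
# summarised by max/min/mean; "normalised" variants divide each peak by the
# geometric mean of the two channels' zero-lag autocorrelations (energies).
import itertools
import numpy as np

def xcorr_features(frames: np.ndarray) -> dict:
    """frames: (n_channels, frame_len) array, one frame per channel."""
    peaks, norm_peaks = [], []
    for i, j in itertools.combinations(range(frames.shape[0]), 2):
        xc = np.correlate(frames[i], frames[j], mode="full")
        peak = np.abs(xc).max()
        peaks.append(peak)
        energy = np.sqrt(np.dot(frames[i], frames[i]) *
                         np.dot(frames[j], frames[j]))
        norm_peaks.append(peak / max(energy, 1e-12))
    peaks, norm_peaks = np.array(peaks), np.array(norm_peaks)
    return {
        "max_xc": peaks.max(), "min_xc": peaks.min(), "mean_xc": peaks.mean(),
        "max_nxc": norm_peaks.max(), "min_nxc": norm_peaks.min(),
        "mean_nxc": norm_peaks.mean(),
    }
```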

SLIDE 7

Spotlight on...

Kurtosis (the fourth central moment divided by the fourth power of the standard deviation; the statements below refer to the excess kurtosis, i.e. this ratio minus 3)

  • Kurtosis is based on the size of a distribution's tails - i.e. it is a measure of nongaussianity.
  • Excess kurtosis is zero for a gaussian random variable; nongaussian random variables have nonzero kurtosis.
  • The kurtosis of co-channel speech (crosstalk) is generally less than the kurtosis of the individual speech utterances#.

# See LeBlanc and de Leon, "Speech Separation by Kurtosis Maximization", Proc. IEEE ICASSP 1998, pp. 1029-1032.
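A minimal sketch of both kurtosis features for one analysis frame, following the definition above and reporting excess kurtosis so that a gaussian signal scores zero:

```python
# Sketch: time-domain kurtosis of the samples and frequency-domain kurtosis
# of the magnitude spectrum, both as excess kurtosis (gaussian -> 0).
import numpy as np

def excess_kurtosis(x: np.ndarray) -> float:
    x = x - x.mean()
    var = (x ** 2).mean()
    return (x ** 4).mean() / (var ** 2 + 1e-12) - 3.0

def frame_kurtosis(frame: np.ndarray) -> tuple:
    """Return (time-domain, frequency-domain) kurtosis of one frame."""
    time_k = excess_kurtosis(frame)
    spectrum = np.abs(np.fft.rfft(frame))
    freq_k = excess_kurtosis(spectrum)
    return time_k, freq_k
```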

SLIDE 8

Spotlight on...

Fundamentalness (see Kawahara et al., Speech Communication 27 (1999), p. 196, eqns (13)-(19))

  • A wide analysing wavelet makes the output corresponding to the fundamental component have smaller FM and AM than the other outputs.

  • Fundamentalness is defined to have its maximum value when the FM and AM modulation magnitudes are at their minimum - corresponding to the fundamental component.

  • Although this was developed to analyse a single harmonic series, the concept that a single fundamental produces high fundamentalness is useful:

  • If more than one fundamental is present, interference between the two components will cause AM and FM modulation, thus decreasing the fundamentalness measure.
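The following is only a loose illustrative proxy, not Kawahara's eqns (13)-(19): it takes one band-pass (wavelet-like) filter output, extracts instantaneous amplitude and frequency via the analytic signal, and scores fundamentalness as high when the AM and FM modulation magnitudes are small. The function name and the exact modulation measures are our assumptions.

```python
# Rough proxy: a single clean fundamental has small AM/FM modulation in its
# band, so -log(AM variance * FM variance) is large; competing fundamentals
# beat against each other, raising the modulation and lowering the score.
import numpy as np
from scipy.signal import hilbert

def fundamentalness_proxy(band: np.ndarray, fs: float) -> float:
    """band: output of one (wavelet-like) band-pass filter channel."""
    analytic = hilbert(band)
    amp = np.abs(analytic)                         # instantaneous amplitude
    phase = np.unwrap(np.angle(analytic))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency
    am = np.var(np.diff(np.log(amp + 1e-12)))      # AM modulation magnitude
    fm = np.var(np.diff(inst_freq) / max(abs(inst_freq.mean()), 1e-12))
    return -np.log(am * fm + 1e-12)
```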

SLIDE 9

Data

  • For each classifier, the multivariate GMMs were trained on 1M frames (16 ms) per class, taken randomly from four ICSI meetings (bro012, bmr006, bed010, bed008).

  • The classifier was evaluated using 1K frames (16 ms) per class, taken randomly from one ICSI meeting (bmr001).

Note that the crosscorrelation information is incorporated into the feature set, as opposed to being a post-processing stage as in the ICSI classifier.
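A minimal training sketch, assuming scikit-learn's GaussianMixture (an anachronism for 2003, used here purely for illustration; the mixture size is not stated on the slides, so n_components is a placeholder):

```python
# Sketch: train one multivariate GMM per class label on its feature frames.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(features_by_class: dict, n_components: int = 16) -> dict:
    """features_by_class: class label -> (n_frames, n_dims) feature matrix.
    n_components is a placeholder; the slides do not state the mixture size."""
    models = {}
    for label, feats in features_by_class.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="full")
        models[label] = gmm.fit(feats)
    return models
```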

SLIDE 10

Selection of best features

  • The parcel algorithm (see Scott, Niranjan and Prager) was used to assess the classification performance of the different feature combinations.

  • A receiver operating characteristic (ROC) curve shows classification performance.

  • For each feature combination, the GMMs are trained and then evaluated to create a ROC for each class.

  • Each point on a ROC curve represents the performance of a classifier with a different decision threshold between two classes (i.e. the class of interest vs all others).

  • Given a number of ROCs (one per feature combination), a maximum realisable ROC (MRROC) can be calculated by fitting a convex hull over the existing ROCs.

  • Therefore, each point on an MRROC represents the optimum feature combination for that class at a particular trade-off between true positives and false positives; a sketch of the hull construction follows.
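A minimal sketch of that hull construction: pool the (false positive, true positive) points of every candidate ROC, add the trivial endpoints, and keep the upper convex hull, remembering which feature combination produced each retained vertex.

```python
# Sketch: maximum realisable ROC (MRROC) as the upper convex hull over the
# pooled operating points of all candidate ROCs (monotone-chain construction).
import numpy as np

def mrroc(rocs: dict) -> list:
    """rocs: combination name -> (n_points, 2) array of (fp, tp) rates.
    Returns the hull vertices as (fp, tp, combination) tuples."""
    points = [(fp, tp, name) for name, arr in rocs.items() for fp, tp in arr]
    points += [(0.0, 0.0, "trivial"), (1.0, 1.0, "trivial")]  # chance ends
    points.sort()
    hull = []
    for p in points:
        # Pop the last vertex while it lies on or below the new edge.
        while len(hull) >= 2:
            (x1, y1, _), (x2, y2, _) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull
```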

SLIDE 11

ROCs

After initial inspection of the ‘raw’ ROCs, it was determined that only a subset of features should be investigated, reducing the number of combinations from 8191 (all non-empty subsets of the 13 features: 2^13 - 1) to 127 (2^7 - 1, for 7 features). For example, the performance of the MFCCs was sufficiently low that they were not considered in combination with others, e.g. MFCCs vs crosscorrelation in detecting speaker alone:

[Figure: single-feature ROC curves (correct detection probability vs. false alarm probability, in %) for speaker alone, crosstalk alone, speaker+crosstalk and silence. Left: MFCCs (speaker alone ~64 %). Right: max normalised XC (speaker alone ~78 %).]
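For concreteness, a sketch (our illustration) of how each one-vs-rest ROC is traced by sweeping a decision threshold over the log-likelihood ratio between the class of interest and the rest:

```python
# Sketch: trace a ROC by thresholding the log-likelihood ratio per frame.
import numpy as np

def roc_points(ll_target: np.ndarray, ll_rest: np.ndarray,
               is_target: np.ndarray, n_thresholds: int = 100) -> np.ndarray:
    """ll_target/ll_rest: per-frame log-likelihoods under the two models;
    is_target: boolean ground truth. Returns (n_thresholds, 2) (fp, tp)."""
    ratio = ll_target - ll_rest
    thresholds = np.linspace(ratio.min(), ratio.max(), n_thresholds)
    points = []
    for th in thresholds:
        decide = ratio >= th
        tp = (decide & is_target).sum() / max(is_target.sum(), 1)
        fp = (decide & ~is_target).sum() / max((~is_target).sum(), 1)
        points.append((fp, tp))
    return np.array(points)
```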

SLIDE 12

MRROCs

We computed the MRROCs of each combination of:

  • energy
  • kurtosis
  • fundamentalness
  • max XC
  • mean XC
  • max normalised XC
  • mean normalised XC

... and then the MRROC of those MRROCs! The final MRROC tells us which feature combination to use in the final classifier.

SLIDE 13

MRROC

[Figure: MRROC using features energy, kurtosis, fundamentalness, max XC, mean XC, max normalised XC and mean normalised XC; correct detection probability vs. false alarm probability (in %), with curves for speaker alone, crosstalk alone, speaker+crosstalk and silence.]

Speaker alone: ~83 %

SLIDE 14

MRROC discarding energy

[Figure: MRROC using features kurtosis, fundamentalness, max XC, mean XC, max normalised XC and mean normalised XC; same axes and classes as above.]

Speaker alone: ~81 %

SLIDE 15

MRROC discarding energy and crosscorrelation

(note ~10 % performance drop when not using crosscorrelation)

[Figure: MRROC using features kurtosis and fundamentalness only; same axes and classes as above.]

Speaker alone: ~71 %

SLIDE 16

Ergodic HMM performance (preliminary)

  • The results above show the GMM classification performance.
  • When channel 0 of ICSI meeting bmr001 was classified by an EHMM (i.e. GMMs + transition probabilities) trained using the best features, speaker alone classification increased to 90 % true positives (false positives: 11 %).

  • However, false positives are an issue, as is silence/noise detection:

[Figure: manual frame transcriptions of a portion of meeting bmr001 (top) vs. the EHMM classification of the same portion (bottom); time axis in 16 ms frames with 10 ms shift; classes: speaker alone, speaker + crosstalk, crosstalk alone, silence / noise.]
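A sketch (our illustration) of the frame-level scoring behind these figures: compare the decoded state sequence with the manual transcription and compute true and false positive rates for the speaker alone class.

```python
# Sketch: per-frame true/false positive rates for one class of interest.
import numpy as np

def speaker_alone_rates(decoded: np.ndarray, reference: np.ndarray,
                        speaker_alone: int = 0) -> tuple:
    """decoded/reference: per-frame state indices; speaker_alone is the
    index of the 'speaker alone' state (assumed 0 here).
    Returns (true positive rate, false positive rate)."""
    hyp = decoded == speaker_alone
    ref = reference == speaker_alone
    tp_rate = (hyp & ref).sum() / max(ref.sum(), 1)
    fp_rate = (hyp & ~ref).sum() / max((~ref).sum(), 1)
    return tp_rate, fp_rate
```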

SLIDE 17

Summary

Optimum features appear to be

  • energy (but may not want to use this)
  • kurtosis
  • fundamentalness
  • crosscorrelation

Performance rises when transition probabilities are incorporated (ergodic HMM as opposed to pure GMMs).

Next

These results are still preliminary and need more analysis. We also intend to look at crosscorrelation in more detail, with the aim of using this feature to determine the number of active speakers during crosstalk periods.