
Speaker Classification: Supervector Approach and Detection Task Christian Müller, DFKI

Speech as a Source for Non-Intrusive UM
Example utterance: “Now it’s time to get to gate 38.”
- Speaker classification treats speech as a sensor that supplies the user model with information about the user.
- The adaptive speech dialog system adapts its dialog behavior to the user model (e.g. detailed map with shops vs. arrows).
- A: inference from sensors (not intrusive).
- B: explicit statement (intrusive).
- The system provides recommendations (e.g. a different route to the gate).

Overview
Speech as a source of information for non-intrusive user modeling
Speech/signal processing topics and their take-away messages:
- GMM/SVM supervector approach for acoustic features: classification method for independent “bag of observations” features
- Detection task and pseudo-NIST evaluation procedure: valid application-independent evaluation
- Rank and polynomial rank normalization: feature space warping normalization
Conclusions

Speaker Classification Systems
Input: audio segment. The system classifies:
- Cognitive Load: Best Research Paper Award UM 2001
- Age and Gender: Voice Award 2007; Telekom live operation 2009
- Language: 14 languages + dialects (telephone quality); NIST evaluation 2007
- Identity: Project with BKA 2009; NIST* Evaluation 2008
- Acoustic Events: Project with VW 2008; Interspeech 2008

How can your features be modeled, assuming that they
- are multi-dimensional,
- represent repeating observations of the same kind,
- can be assumed to be independent (“bag” of observations)?
Proposing the GMM/SVM Supervector Approach on the example of frame-by-frame acoustic features.

Hierarchical Feature Model
Figure: feature levels ordered from high-level features (learned characteristics) down to low-level features (physical characteristics):
- semantics
- dialog (turn patterns of speakers A and B)
- idiolect (e.g. “<s> how shall I say this <c> <s> yeah I know...”)
- phonetics (e.g. /S/ /oU/ /m/ /i:/ ...)
- prosody
- spectrum

Modeling Acoustics and Prosodics
Figure: the same feature hierarchy (semantics, dialog, idiolect, phonetics, prosody, spectrum), annotated “no ASR”.

General Classification Scheme
Figure: processing chain with top-down knowledge:
- Preprocessing (e.g. channel compensation)
- Feature Extraction
- Classification (e.g. multilayer perceptron networks, support-vector machines)
- Fusion (not addressed in this talk)

Generative Approach: Gaussian Mixture Model (GMM)
- Training: “emergency vehicle” audio, then feature extraction, then the “emergency vehicle” model (a probability density).
- Test: an unknown frame of speech, then feature extraction, then scoring against the “emergency vehicle” model: average likelihood over all frames for class “emergency vehicle”.

Generative Approach: Gaussian Mixture Model (GMM)
- Test: an unknown frame of speech, then feature extraction, then scoring against both the “emergency vehicle” model and a background model.
- Score: average log-likelihood ratio over all frames for class “emergency vehicle”.
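The average log-likelihood ratio named on this slide can be sketched as follows. This is a minimal illustration, not the system from the talk: it assumes diagonal-covariance GMMs stored as lists of (weight, mean, variance) tuples, and all parameter values and function names are made up for the example.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm); gmm is a list of (weight, mean, var) components."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def avg_llr(frames, class_gmm, background_gmm):
    """Average per-frame log-likelihood ratio: class model vs. background model."""
    total = sum(
        gmm_log_likelihood(x, class_gmm) - gmm_log_likelihood(x, background_gmm)
        for x in frames
    )
    return total / len(frames)

# Toy 2-dimensional models with made-up parameters (assumption, not from the slides).
class_gmm = [(0.5, [1.0, 1.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])]
background_gmm = [(1.0, [0.0, 0.0], [4.0, 4.0])]
frames = [[1.2, 0.9], [1.8, 2.1], [1.5, 1.4]]
score = avg_llr(frames, class_gmm, background_gmm)
```

Because the score is an average over frames, it does not depend on how many frames the segment contains, which is the length independence the later slides rely on.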

A Mixture of Gaussians
- Means, variances, and mixture weights are optimized in training.
- Figure: black line = mixture of 3 Gaussians.
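The slide does not name the training algorithm. A common choice for optimizing means, variances, and mixture weights is expectation-maximization (EM), so the sketch below assumes EM on one-dimensional data. The data, the initial parameters, and the variance floor are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, weights, means, variances):
    """Density of the mixture: weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

def log_likelihood(data, weights, means, variances):
    return sum(math.log(mixture_pdf(x, weights, means, variances)) for x in data)

def em_step(data, weights, means, variances, var_floor=1e-3):
    """One EM iteration; returns updated (weights, means, variances)."""
    k = len(weights)
    # E-step: responsibility of each component for each data point.
    resp = []
    for x in data:
        joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate the parameters from the soft counts.
    counts = [sum(r[c] for r in resp) for c in range(k)]
    new_weights = [n / len(data) for n in counts]
    new_means = [sum(r[c] * x for r, x in zip(resp, data)) / counts[c] for c in range(k)]
    new_vars = [
        max(var_floor,
            sum(r[c] * (x - new_means[c]) ** 2 for r, x in zip(resp, data)) / counts[c])
        for c in range(k)
    ]
    return new_weights, new_means, new_vars

# Toy data with three clusters and a rough initial guess for 3 Gaussians.
data = [-2.1, -1.9, -2.0, -1.8, 0.1, -0.1, 0.2, 0.0, 3.0, 3.2, 2.9, 3.1]
params = ([1 / 3, 1 / 3, 1 / 3], [-1.0, 0.5, 2.0], [1.0, 1.0, 1.0])
ll_start = log_likelihood(data, *params)
for _ in range(20):
    params = em_step(data, *params)
ll_end = log_likelihood(data, *params)
```

Each iteration should not decrease the training log-likelihood, so comparing `ll_start` with `ll_end` is a quick sanity check.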

Discriminative Method: Support Vector Machine (SVM)
Training: features extracted from “em. vehic.” (1) and “not em. vehic.” (-1) examples are used to train the “em. vehic.” model.
- Features are transformed into a higher-dimensional space where the problem is linear.
- The discriminating hyperplane is learned by maximizing the margin between the two classes.
- Trade-off between training error and width of margin.
- The model is stored in the form of “support vectors” (data points on the margin).

Discriminative Method: Support Vector Machine (SVM)
Test: unknown segment, then feature extraction, then score (distance to hyperplane).
- Discriminative methods have been shown to be superior to generative methods for similar tasks.
- Feature vectors have to be of the same length (sensitive to variable segment lengths).
- Solutions:
  - feature statistics calculated over the entire utterance
  - fixed portion of the segment
  - sequential kernels
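Two ideas from this slide can be illustrated together: mapping a variable number of frames to a fixed-length vector through utterance-level statistics, and scoring that vector by its signed distance to a hyperplane. The sketch assumes a linear SVM with a known weight vector `w` and bias `b`; both values here are hypothetical, and many SVM implementations report the unnormalized value `w.x + b` instead of the geometric distance.

```python
import math

def utterance_stats(frames):
    """Fixed-length vector from any number of frames: per-dimension mean and std."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((f[j] - means[j]) ** 2 for f in frames) / n)
        for j in range(dims)
    ]
    return means + stds

def svm_score(x, w, b):
    """Signed distance of feature vector x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Two segments of different length yield vectors of the same length.
short_segment = [[1.0, 2.0], [3.0, 4.0]]
long_segment = [[1.0, 2.0], [3.0, 4.0], [2.0, 2.0], [0.0, 1.0], [4.0, 6.0]]
x_short = utterance_stats(short_segment)
x_long = utterance_stats(long_segment)

# Hypothetical trained parameters (assumption for illustration only).
w, b = [0.5, 0.5, -0.2, 0.1], -1.0
score_short = svm_score(x_short, w, b)
score_long = svm_score(x_long, w, b)
```

A positive score places the segment on the target side of the hyperplane, a negative score on the other side.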

GMM/SVM Supervector Approach
Pipeline: feature extraction, then the MAP-adapted Gaussian means are stacked into a supervector that serves as the SVM input.
- Combines the discriminative power of SVMs with the length independence of GMMs.
- Very successful with similar tasks such as speaker recognition.
- The GMM is trained using MAP adaptation.
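A minimal sketch of how a supervector could be built. The slides do not give the details, so this assumes mean-only MAP adaptation of a background GMM with a relevance factor (here 16, an illustrative value) and diagonal covariances. The background model and the frames are toy data.

```python
import math

def log_gauss_diag(x, mean, var):
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, ubm):
    """Posterior probability of each background component given frame x."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in ubm]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, ubm, relevance=16.0):
    """Mean-only MAP adaptation: shift each mean toward the data it explains."""
    k, d = len(ubm), len(ubm[0][1])
    counts = [0.0] * k
    sums = [[0.0] * d for _ in range(k)]
    for x in frames:
        post = posteriors(x, ubm)
        for c in range(k):
            counts[c] += post[c]
            for j in range(d):
                sums[c][j] += post[c] * x[j]
    # (sums + r * prior_mean) / (count + r) equals
    # alpha * data_mean + (1 - alpha) * prior_mean with alpha = count / (count + r).
    return [
        [(sums[c][j] + relevance * ubm[c][1][j]) / (counts[c] + relevance)
         for j in range(d)]
        for c in range(k)
    ]

def supervector(frames, ubm, relevance=16.0):
    """Concatenate the adapted means into one fixed-length vector."""
    return [v for mean in map_adapt_means(frames, ubm, relevance) for v in mean]

# Toy background model: 2 components in 2 dimensions (made-up values).
ubm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])]
short_utt = [[0.5, 0.2], [3.8, 4.1]]
long_utt = [[0.4, 0.1], [0.6, 0.3], [3.9, 4.2], [4.1, 3.8], [0.5, 0.0]]
sv_short = supervector(short_utt, ubm)
sv_long = supervector(long_utt, ubm)
```

Both utterances yield a vector of length components times dimensions, regardless of the number of frames, so the result can be passed to an SVM directly.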

Evaluation Results
Christian Müller, Joan-Isaac Biel, Edward Kim, and Daniel Rosario, “Speech-overlapped Acoustic Event Detection for Automotive Applications,” in Proceedings of Interspeech 2008, Brisbane, Australia, 2008.

- How can you evaluate your multi-class models independently from the given application?
- How can you establish an appropriate evaluation procedure in order to obtain valid results?
Proposing the detection task and the “pseudo NIST” evaluation procedure on the example of acoustic event detection and speaker age recognition.

Background
- With multi-class recognition problems, many test/analysis methods are very application specific (e.g. confusion matrices).
  - We want a method that allows results to be generalized across a large set of applications.
- With home-grown databases, parameter tuning on the evaluation set often compromises the validity of the results/inferences.
  - We want a fair “one shot” evaluation.

The Detection Task
Example: the system is asked “emergency vehicle?” and answers “yes, 1.324326”.
- Given a speech segment (s) and an acoustic event to be detected (target event, ET), the task is to decide whether ET is present in s (yes or no).
- The system's output shall also contain a score indicating its confidence, with more positive scores indicating greater confidence.

Terminology
- Segment class: e.g. segment event, segment age class; the ground truth (not known to the system).
- Target: the hypothesized class.
- Trial: a combination of segment and target.

Evaluation
Figure: one segment is presented to the system with each target in turn (emergency vehicle? music? talking? laughing? phone? no event?); the system returns a decision and a score per trial (yes 1.32432; no -0.3212; no 1.8463; no -2.5773; yes 0.00132; no 2.20122).
- The system performance is evaluated by presenting it with a set of trials.
- Each test segment is used for multiple trials.
- The absence of all targets is explicitly included.
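The trial construction described here can be sketched as follows, assuming each test segment is paired with every target, including the explicit “no event” target. The segment identifiers, the dictionary layout, and the threshold of 0 are illustrative assumptions. The ground-truth flag is kept for scoring only and is never shown to the system.

```python
from itertools import product

# Target list taken from the slide's example.
TARGETS = ["emergency vehicle", "music", "talking", "laughing", "phone", "no event"]

def make_trials(segments, targets=TARGETS):
    """Pair every (segment_id, true_class) with every target."""
    return [
        {"segment": seg_id, "target": tgt, "is_target": tgt == true_class}
        for (seg_id, true_class), tgt in product(segments, targets)
    ]

def decide(score, threshold=0.0):
    """System output for one trial: a yes/no decision plus the confidence score."""
    return ("yes" if score > threshold else "no", score)

# Hypothetical test set with two labeled segments.
segments = [("seg1", "emergency vehicle"), ("seg2", "no event")]
trials = make_trials(segments)
example_output = decide(1.32432)
```

Two segments and six targets give twelve trials, of which exactly two are target trials.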

Types of Errors
- MISS: segment “em. vehic.”, target “em. vehic.”?, system answers no.
- FALSE ALARM: segment “em. vehic.”, target “phone”?, system answers yes.
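Counting the two error types over a set of scored trials might look like this. The sketch assumes each trial is a (score, is_target) pair and that the system answers yes when the score exceeds a threshold; the scores are made up.

```python
def error_rates(trials, threshold=0.0):
    """Return (P_miss, P_fa) for trials given as (score, is_target) pairs."""
    target_scores = [s for s, is_target in trials if is_target]
    nontarget_scores = [s for s, is_target in trials if not is_target]
    # Miss: a target trial that the system rejects.
    misses = sum(1 for s in target_scores if s <= threshold)
    # False alarm: a non-target trial that the system accepts.
    false_alarms = sum(1 for s in nontarget_scores if s > threshold)
    return misses / len(target_scores), false_alarms / len(nontarget_scores)

# Made-up scores: 4 target trials and 4 non-target trials.
scored_trials = [
    (2.0, True), (1.5, True), (0.3, True), (-0.5, True),
    (1.0, False), (0.1, False), (-1.0, False), (-2.0, False),
]
p_miss, p_fa = error_rates(scored_trials, threshold=0.0)
```

At threshold 0, one of the four target trials is missed and two of the four non-target trials raise a false alarm.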

Decision-Error Tradeoff
Figure: curve of misses against false alarms, with the “equal error rate” point marked.
- Selecting an operating point (decision threshold) along the dotted line trades misses off against false alarms.
- The optimal operating point is application dependent.
- Low false alarm rates are desirable for most applications.
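The tradeoff can be made concrete by sweeping the decision threshold over the observed scores and reading off the point where the miss rate and the false alarm rate are closest. This is a simple approximation of the equal error rate without interpolation, on made-up scores; it is not the evaluation tooling used in the talk.

```python
def det_points(trials):
    """(threshold, P_miss, P_fa) for each candidate threshold."""
    targets = [s for s, is_target in trials if is_target]
    nontargets = [s for s, is_target in trials if not is_target]
    thresholds = sorted({s for s, _ in trials})
    thresholds = [thresholds[0] - 1.0] + thresholds  # first point accepts everything
    points = []
    for th in thresholds:
        p_miss = sum(s <= th for s in targets) / len(targets)
        p_fa = sum(s > th for s in nontargets) / len(nontargets)
        points.append((th, p_miss, p_fa))
    return points

def equal_error_rate(trials):
    """Approximate EER: mean of P_miss and P_fa where they are closest."""
    _, p_miss, p_fa = min(det_points(trials), key=lambda p: abs(p[1] - p[2]))
    return (p_miss + p_fa) / 2

# Made-up scores as (score, is_target) pairs.
scored_trials = [
    (2.0, True), (1.5, True), (0.3, True), (-0.5, True),
    (1.0, False), (0.1, False), (-1.0, False), (-2.0, False),
]
eer = equal_error_rate(scored_trials)
```

Raising the threshold moves along the curve toward fewer false alarms and more misses, which is the direction most applications prefer according to the slide.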

Decision Cost Function
$$C(E_T, E_N) = C_{Miss} \cdot P_{Target} \cdot P_{Miss}(E_T) + C_{FA} \cdot (1 - P_{Target}) \cdot P_{FA}(E_T, E_N)$$
where $E_T$ and $E_N$ are the target and non-target events, and $C_{Miss}$, $C_{FA}$ and $P_{Target}$ are application model parameters. The application parameters for EER are $C_{Miss} = C_{FA} = 1$ and $P_{Target} = 0.5$.
- Weighted sum of misses and false alarms using variable costs and priors.
- Application model parameters are selected according to the application.
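The cost function translates directly into code. The defaults below are the EER parameters stated on the slide; the second parameter set is a hypothetical application in which targets are rare and misses are expensive.

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """Weighted sum of miss and false alarm probabilities for one target/non-target pair."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

# With the EER parameters and equal error rates, the cost equals that error rate.
cost_eer = detection_cost(p_miss=0.25, p_fa=0.25)

# Hypothetical application: rare targets (1%), misses ten times as costly.
cost_rare = detection_cost(p_miss=0.25, p_fa=0.25, c_miss=10.0, c_fa=1.0, p_target=0.01)
```

Changing the costs and the prior shifts which operating point on the tradeoff curve gives the lowest cost, which is why the parameters are chosen per application.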

Example DET Plot
Figure: miss probability plotted against false alarm probability.
Christian Müller, Joan-Isaac Biel, Edward Kim, and Daniel Rosario, “Speech-overlapped Acoustic Event Detection for Automotive Applications,” in Proceedings of Interspeech 2008, Brisbane, Australia, 2008.
