

  1. Speech and Audio Technology for Enhanced Understanding of Cognitive Radio Users and Environments
  Scott M. Lewandowski, Joseph P. Campbell, William M. Campbell, Clifford J. Weinstein
  {scl, jpc, wcampbell, cjw}@ll.mit.edu
  MIT Lincoln Laboratory, Lexington, MA
  Software Defined Radio Forum Technical Conference, Phoenix, AZ, 15-18 November 2004
  This work was sponsored by the Defense Advanced Research Projects Agency under Air Force contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the US Government.

  2. Outline
  • Introduction & Motivation: Cognitive Radio
  • Speech Technologies:
    – Speaker Recognition
    – Language Identification
    – Text-to-Speech
    – Speech-to-Text
    – Machine Translation
    – Background Noise Suppression
    – Adaptive Speech Coding
    – Speaker Characterization
    – Noise Characterization
  • Conclusions

  3. Cognitive Radio and the Mobile Land Warrior
  Sense & understand the situation:
  • Friends, resources
  • Foes, threats
  Sense & understand the user’s state and needs:
  • Personalization, adaptation, authentication (PAA)
  • Health state, stress
  Provide robust radio comm.:
  • Team plan including rendezvous
  Provide plan & decision assistance:
  • Continuous planning of actions/alternatives
  Features & benefits:
  • Automated learning & reasoning about user & environment
  • User focus on mission
  • Enhanced mission effectiveness
  “If you know the enemy and know yourself, you need not fear the result of a hundred battles.” – Sun Tzu

  4. Today and Tomorrow: Example Scenarios
  Without Cognitive Radio:
  • User manually adjusts radios.
  With Cognitive Radio:
  • User Aware: Speech technologies provide state, identity, and interface to the user.
  • RF Aware: Links are established automatically by reasoning. The radio is aware of other networks.
  • Environment Aware: Situationally aware radio assists the user and understands rendezvous, location, and enemy & friendly forces.

  5. Cognitive Radio Technologies
  Machine Learning:
  • Pattern classification
  • Rule learning
  • Bayesian nets
  • Safe learning
  • Game theory
  SDR Technologies:
  • Dynamically constructible software
  • Self-aware
  • Standards
  Intelligent Agents:
  • Distributed AI
  • OWL/DAML
  • Reasoning
  • (Real-time) Planning
  Human Computer Interaction:
  • Speech technologies
  • Biometrics
  • User modeling
  • Visual processing

  6. Speaker Recognition: Phases of a Speaker Verification System
  There are two distinct phases to any speaker verification system:
  • Enrollment Phase: enrollment speech for each speaker (e.g., Bob, Sally) → feature extraction → model training → a model for each speaker.
  • Verification Phase: test speech with a claimed identity (e.g., Sally) → feature extraction → verification decision → Accepted!
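As a toy illustration of the two phases, the sketch below enrolls each speaker by averaging feature vectors (a simplistic stand-in for real GMM training) and verifies a claimed identity by thresholding the distance to the claimed speaker's model; the class, names, and threshold are illustrative assumptions, not the system described in the slides.

```python
import statistics

class SpeakerVerifier:
    """Toy two-phase verifier: enrollment trains a per-speaker model
    (here just a mean feature vector, standing in for a real GMM),
    and verification scores test features against the claimed model."""

    def __init__(self, threshold=1.0):
        self.models = {}
        self.threshold = threshold

    def enroll(self, speaker, feature_vectors):
        # Enrollment phase: feature extraction is assumed done upstream;
        # "model training" here is just averaging each feature dimension.
        dims = zip(*feature_vectors)
        self.models[speaker] = [statistics.mean(d) for d in dims]

    def verify(self, claimed_speaker, feature_vector):
        # Verification phase: accept if the test features lie close
        # to the claimed speaker's model.
        model = self.models[claimed_speaker]
        dist = sum((a - b) ** 2 for a, b in zip(feature_vector, model)) ** 0.5
        return dist < self.threshold

# Example: enroll Bob and Sally, then verify a claim of "Sally"
v = SpeakerVerifier(threshold=1.0)
v.enroll("Sally", [[0.0, 0.0], [0.2, 0.2]])
v.enroll("Bob", [[5.0, 5.0], [5.2, 5.2]])
print(v.verify("Sally", [0.1, 0.1]))  # features near Sally's model
```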

  7. Speaker Recognition and Cognitive Radio
  Cognitive Radio applications:
  • Personalization (e.g., recalling user preferences or accommodating a user’s unique workflow)
  • Adaptation (e.g., simplifying the user interface based on the current task, or modifying radio parameters according to environmental factors)
  • Authentication (e.g., detecting captured/stolen/lost devices, or providing “hands-free” biometric authentication)
  References:
  • Campbell, J. P., Campbell, W. M., Jones, D. A., Lewandowski, S. M., Reynolds, D. A., and Weinstein, C. J., “Biometrically Enhanced Software-Defined Radios,” in Proc. Software Defined Radio Technical Conference, Orlando, Florida, SDR Forum, 17-19 November 2003.
  • Reynolds, D. A., Quatieri, T. F., and Dunn, R. B., “Speaker Verification Using Adapted Gaussian Mixture Models,” Digital Signal Processing, 10(1-3), January/April/July 2000.
  • Campbell, W. M., Campbell, J. P., Reynolds, D. A., Jones, D. A., and Leek, T. R., “High-Level Speaker Verification with Support Vector Machines,” in Proc. International Conference on Acoustics, Speech, and Signal Processing, Montréal, Québec, Canada, IEEE, pp. I:73-76, 17-21 May 2004.

  8. Continuous Authentication via Behavior & Voice Recognition
  Trust evolves over time as behavioral and voice samples accumulate:
  • Trusted State: required for sensitive operations
  • Provisional Trust: continue interaction, gather behavioral & voice samples
  • Untrusted State: interrupt interaction
  T. J. Hazen, D. Jones, A. Park, L. Kukolich, and D. Reynolds, “Integration of Speaker Recognition into Conversational Spoken Dialogue Systems,” Eurospeech, 2003.

  9. Speaker Recognition Core Technologies
  • The basic decision statistic in the core detectors is the likelihood ratio: feature extraction produces a likelihood-ratio score Λ (target model vs. background model), followed by score normalization.
  • T-norm score normalization: T(u) = (Λ_tgt(u) − μ_coh) / σ_coh
  • Feature levels: words, phones, prosody, spectral
  • Classifiers and models: GMM, SVM, N-gram LM (e.g., P(w_i | w_{i−1})), with T-norm and H-norm score normalization
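The T-norm score normalization from this slide, T(u) = (Λ_tgt(u) − μ_coh) / σ_coh, can be sketched directly; the cohort scores below are made-up numbers for illustration.

```python
import statistics

def t_norm(target_score, cohort_scores):
    """T-norm: center and scale a target likelihood-ratio score by the
    mean and standard deviation of scores from a cohort of impostor
    (background) models: T(u) = (Lambda_tgt(u) - mu_coh) / sigma_coh."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)
    return (target_score - mu) / sigma

# Example: a target score of 2.0 against a small illustrative cohort
print(t_norm(2.0, [-0.5, 0.0, 0.5, 1.0]))
```

Normalizing against a cohort makes a single decision threshold meaningful across different target models, which is why T-norm (and the related H-norm) sits after the likelihood-ratio scorer in the slide's pipeline.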

  10. Speaker Recognition Performance: NIST 2004 Speaker Recognition Evaluation
  • Miss and false alarm rates for a large corpus
  • 8-conversation enrollment
  • 1-conversation test
  • Results show the use of high-level features, different classifier types, and fusion

  11. Language Recognition Applications: Front-end Routing for Human Operators
  • The language recognition system routes each incoming call to an operator fluent in the speaker’s language (e.g., a German-speaking caller’s message is routed to the German-speaking operator rather than the English- or Spanish-speaking operators).

  12. Language Recognition Applications: Front-end for Automatic Speech Recognition
  • The language recognition system produces a language hypothesis (e.g., “It’s German”) and selects the language-dependent acoustic & language models to load from the model library into the speech recognition system, which then produces a word transcription (e.g., “… gut. Wie geht’s …”).
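The front-end role in these two slides, routing a call or selecting models keyed on the language hypothesis, amounts to a table lookup; the queue names and model paths below are hypothetical.

```python
# Hypothetical routing/model tables keyed on the language hypothesis.
OPERATOR_QUEUES = {
    "english": "english_operator_queue",
    "german": "german_operator_queue",
    "spanish": "spanish_operator_queue",
}

ASR_MODELS = {
    "german": "models/german_acoustic_and_lm",
    "english": "models/english_acoustic_and_lm",
}

def route_call(language_hypothesis, fallback="english_operator_queue"):
    """Route a call to an operator fluent in the recognized language,
    falling back to a default queue for unsupported languages."""
    return OPERATOR_QUEUES.get(language_hypothesis, fallback)

def select_models(language_hypothesis):
    """Pick the language-dependent acoustic & language models to load."""
    return ASR_MODELS.get(language_hypothesis)

# Example: the "It's German" hypothesis drives both applications
print(route_call("german"), select_models("german"))
```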

  13. Language Recognition Evaluation Metric: Detection Error Tradeoff
  • Detection Error Tradeoff (DET) curve: probability of false reject (%) vs. probability of false accept (%); better performance lies toward the lower left, and the equal error rate (EER) is marked with 95% confidence limits.
  • For all language hypotheses:
    – Sort scores
    – Label scores based on truth
    – Compute false accept and false reject error rates at every score threshold
  • Example scores:
    Score / Truth
    0.252 / Target
    0.208 / Target
    0.203 / Non-target
    …
    -0.221 / Target
    -0.226 / Non-target
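The procedure on this slide (sort the scores, label by truth, compute false accept and false reject rates at every threshold) can be sketched as follows; this is a minimal illustration using the slide's example scores, not an evaluation scoring tool.

```python
def det_points(scores, labels):
    """(false_accept_rate, false_reject_rate) at every score threshold,
    sweeping the threshold down through the sorted scores and accepting
    a trial when its score >= threshold. labels: True = target trial."""
    n_tgt = sum(labels)
    n_non = len(labels) - n_tgt
    fa, fr = 0, n_tgt  # highest threshold: reject everything
    points = [(fa / n_non, fr / n_tgt)]
    for score, is_target in sorted(zip(scores, labels), reverse=True):
        if is_target:
            fr -= 1  # this target trial is now (correctly) accepted
        else:
            fa += 1  # this non-target trial is now falsely accepted
        points.append((fa / n_non, fr / n_tgt))
    return points

def equal_error_rate_point(scores, labels):
    """Approximate EER: the DET point where |FA - FR| is smallest."""
    return min(det_points(scores, labels), key=lambda p: abs(p[0] - p[1]))

# Using the example scores and truth labels from the slide
scores = [0.252, 0.208, 0.203, -0.221, -0.226]
labels = [True, True, False, True, False]
print(equal_error_rate_point(scores, labels))
```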

  14. NIST 2003 LRE Results
  • NIST 2003 Language Recognition Evaluation (LRE)
  • Six sites submitted results to the NIST 2003 LRE
  • Test segment duration: 30 s
  • Languages: Arabic, English, Farsi, French, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese
  • Results shown with 95% confidence limits at EER
  Singer, E., Torres-Carrasquillo, P. A., Gleason, T. P., Campbell, W. M., and Reynolds, D. A., “Acoustic, Phonetic, and Discriminative Approaches to Automatic Language Recognition,” in Proc. Eurospeech, pp. 1345-1348, 1-4 September 2003.

  15. Text-to-Speech (TTS) and Cognitive Radio
  • Enable eyes-free use of systems
  • Effectively use modalities according to the environment
  • Choose speaking style and voice according to the situation
  • Integration with speech-to-text (STT) and machine translation (MT)
  Audio samples: Elan_SaysoUS1.wav, ATT_NaturalVoices.wav

  16. Speech-to-Text (STT) Architecture
  • Model training: transcribed speech data feeds acoustic model training (e.g., lexicon entries SALAM 0.4, SALAM 0.6, KITAB 0.5) and language model training (e.g., Peace_is 0.2, Hello_Tom 0.1, The_book 0.3).
  • Decoding: speech in → feature extraction → decode (against the acoustic and language models) → words out, which then feed the translation process.
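The decode path on this slide (speech in, feature extraction, decode against acoustic and language models, words out) can be sketched with toy stand-ins; the frame-based features and per-frame word choice below are gross simplifications of a real recognizer, and the models are illustrative.

```python
# Toy sketch of the STT decode pipeline. A real system extracts spectral
# features (e.g., MFCCs) and searches over word sequences; here we chop
# the waveform into frames and score each frame independently.

def extract_features(samples, frame_size=4):
    """Chop the waveform into fixed-size frames (stand-in for real
    spectral feature extraction)."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def decode(frames, acoustic_model, language_model):
    """For each frame, pick the word maximizing the combined acoustic
    score times language-model probability."""
    words = []
    for frame in frames:
        scores = {w: acoustic_model(frame, w) * language_model[w]
                  for w in language_model}
        words.append(max(scores, key=scores.get))
    return words

# Illustrative models: a two-word lexicon and a toy acoustic scorer
# that associates high-energy frames with "salam".
language_model = {"salam": 0.6, "kitab": 0.4}

def acoustic_model(frame, word):
    energy = sum(abs(x) for x in frame)
    return energy if word == "salam" else 1.0 / (1.0 + energy)

frames = extract_features([1, 1, 1, 1, 0, 0, 0, 0])
print(decode(frames, acoustic_model, language_model))
```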
