Speaker State: Elizabeth Shriberg, Speech Technology and Research Lab

SLIDE 1

Speaker State

Elizabeth Shriberg Speech Technology and Research Lab SRI International, Menlo Park, CA

NSF Speech Science Workshop May 7-8, 2015

SLIDE 2

Overview

  • Umbrella term covering variations within an individual

    – Emotional
    – Cognitive (uncertainty, engagement)
    – Health (stress, fatigue, Parkinson’s…)
    – Mental health (depression, PTSD, MCI, mTBI)
    – Social, pragmatic (engagement, entrainment)

  • Synergy with some of the other talks here: Anton, Tom, Julia, Florian….
  • Standard approach

    – Annotate data → “gold standard”
    – Extract features from speech (words, acoustic, prosodic, discourse)
    – Machine learning to predict annotations
    – Range of metrics for evaluation

  • Funding: some government, some commercial, but limited
  • Interest from industry (e.g., call centers), but largely ASR-based, and data is often proprietary
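The standard approach above can be sketched in a few lines. This is a minimal, hypothetical illustration, not a method from the talk: the two features (a normalized pitch value and an energy value), the labels, and all numbers are made-up toy values; a nearest-centroid classifier stands in for "machine learning"; and unweighted average recall (UAR, the mean of per-class recalls) is used as one example from the range of evaluation metrics.

```python
from collections import defaultdict
from math import dist

# Toy, made-up data: each row is a hypothetical [pitch_z, energy_z] feature vector
# for one utterance, paired with an annotated ("gold standard") state label.
train_x = [[0.9, 0.8], [1.1, 0.7], [-0.8, -0.9], [-1.0, -0.6]]
train_y = ["stressed", "stressed", "neutral", "neutral"]
test_x = [[1.0, 0.9], [-0.9, -0.7], [0.2, 0.1]]
test_y = ["stressed", "neutral", "neutral"]


def train_centroids(features, labels):
    """Nearest-centroid training: average the feature vectors of each class."""
    sums, counts = {}, defaultdict(int)
    for vec, lab in zip(features, labels):
        prev = sums.get(lab, [0.0] * len(vec))
        sums[lab] = [s + v for s, v in zip(prev, vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose class centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))


def unweighted_average_recall(gold, predicted):
    """Mean of per-class recalls, so each class counts equally regardless of size."""
    hits, totals = defaultdict(int), defaultdict(int)
    for g, p in zip(gold, predicted):
        totals[g] += 1
        hits[g] += int(g == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


centroids = train_centroids(train_x, train_y)
preds = [predict(centroids, x) for x in test_x]
uar = unweighted_average_recall(test_y, preds)
print(preds, round(uar, 3))
```

On this toy split, plain accuracy is 2/3, while UAR is 0.75: UAR weights the single "stressed" test item as heavily as the two "neutral" items, which matters when state classes are imbalanced.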

SLIDE 3

Impact for Speech Technology

1. Detection of state from speech
    – For system adaptation / action / filtering
    – For monitoring / filtering
    – Massively applicable, including to passive speech, especially with the growth in mobile phone use and apps
    – Growing industry interest in emotion, but speech content analysis generally lags behind that of text and video
2. Improvement of speech recognition (by modeling context for better train/test data matching)

SLIDE 4

Challenges

  • Major effects of speaker, context, and semantics, but almost no understanding of these effects

  • Hundreds of papers/year, but we start over with each data set
  • Small data sets
  • Annotation issues – validity, reliability, unit of analysis
  • Common evaluations have been a great service to the community, but the focus has been on large feature sets + deep learning → we’re adding layers, not understanding

  • Feature sets biased toward those available from ASR
  • Metrics and evaluation
  • Sensitive data sets can’t be shared
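For the reliability part of the annotation issue, agreement between two annotators is often summarized with a chance-corrected statistic such as Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The sketch below assumes two annotators assigning categorical labels to the same units; the label sequences are invented for illustration, and kappa speaks only to reliability, not to validity or the choice of unit of analysis.

```python
from collections import Counter

# Invented example: two annotators' emotion labels for the same 8 utterances.
annotator_a = ["neg", "neg", "neu", "neu", "pos", "pos", "neu", "neg"]
annotator_b = ["neg", "neu", "neu", "neu", "pos", "neg", "neu", "neg"]


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # p_o: fraction of units on which the two annotators chose the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: chance agreement from each annotator's marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

Here the annotators agree on 6 of 8 units (p_o = 0.75), but chance agreement is 23/64, so kappa is 25/41, about 0.61: noticeably lower than raw agreement suggests.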

SLIDE 5

Future Directions

  • Core pursuits

    – Understand how to decouple the effects of speaker from those of state and context
    – Go smaller, not bigger. What’s the minimum feature set, and what can we learn from it?
    – Value generalization across data sets: learn which features and approaches transfer to new data
    – Explore robustness in real-world data: studies often assume better audio than we can get in real applications
    – Understand the role of lexical, visual, and physiological information: it is increasingly available, and we need to understand where speech offers added value

  • Needs

    – Invest in longitudinal data with real-world spontaneous speech
    – Add spontaneous speech collection to studies in the medical community
    – Community focus on annotations and meaningful metrics, with working-group support if there are no government evaluations
    – User studies that involve real-world end applications
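One simple, commonly used step toward separating speaker effects from state is to normalize each feature within speaker, so that a speaker with a naturally high pitch is not read as aroused. The sketch below is an illustration under assumptions, not a solution to the decoupling problem raised above: it uses a single hypothetical feature (mean F0 in Hz) with invented values, and it assumes several recordings per speaker are available to estimate that speaker's mean and standard deviation.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Invented example: (speaker_id, mean F0 in Hz) for three recordings per speaker.
recordings = [("A", 100.0), ("A", 110.0), ("A", 120.0),
              ("B", 200.0), ("B", 210.0), ("B", 220.0)]


def speaker_z_normalize(rows):
    """Z-score each value using the mean and std of that speaker's own values."""
    by_speaker = defaultdict(list)
    for speaker, value in rows:
        by_speaker[speaker].append(value)
    stats = {}
    for speaker, values in by_speaker.items():
        sd = pstdev(values)
        # Guard against division by zero for a speaker with constant values.
        stats[speaker] = (mean(values), sd if sd > 0 else 1.0)
    return [(value - stats[s][0]) / stats[s][1] for s, value in rows]


normalized = speaker_z_normalize(recordings)
print([round(z, 2) for z in normalized])
```

Before normalization, speaker B's lowest value (200 Hz) exceeds speaker A's highest (120 Hz); afterwards both speakers map to the same z-scores (about -1.22, 0, 1.22), so only within-speaker variation remains. A limitation of this choice: the per-speaker statistics need enough recordings from each speaker, and if a state (for example, depression) is present in all of a speaker's recordings, normalization can remove it along with the speaker effect.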
