SLIDE 1

COMPUTATIONAL PARALINGUISTICS

AND WHAT WE MIGHT GET FROM PHONETICS / SPEECH SCIENCE

Anton Batliner, May 7th, 2015, NSF, Arlington

SLIDE 2

The Topic

Paralinguistics: not what but how → the person(s) behind

The Interspeech Computational Paralinguistic Challenges

  • 2009: emotion (children's speech)
  • 2010: age & gender, affect (level of interest)
  • 2011: intoxication (+/- alcoholised), sleepiness
  • 2012: personality (Big Five), likability, pathology
  • 2013: social signals, conflict, emotion, autism
  • 2014: physical load, cognitive load
  • 2015: degree of nativeness, Parkinson's condition, eating condition

The Book

Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing

Björn Schuller & Anton Batliner, 344 pages, 2014, Wiley.

SLIDE 3

Cultural Clashes

Phonetics (Speech Science) vs. Speech Processing

  • data: small, laboratory, controlled vs. large, real-life
  • pre-processing: manual (labels, segmentation) vs. automatic
  • features: few (high resolution) vs. many, brute forcing, MFCC (low resolution, high generalisation)
  • processing and statistics: basic, (M)ANOVA, mixed models, inferential (significance → effect size) vs. ML / pattern recognition, (fusion of) classifiers / regression
  • driving force: description, explanation, models vs. performance, applications

brute force: we don't know what's happening, but we know (roughly) how good we can be.
phonetics / knowledge-based interpretation: we don't really know what's happening, because we only get what we are looking for.

both: what can we model, convey, teach?
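The phonetics column mentions significance and effect size: beyond asking whether a group difference is significant, one reports how large it is. A minimal sketch, assuming two independent groups and Cohen's d with a pooled standard deviation as the effect-size measure (one common choice; the slide does not name a specific one). The function name `cohens_d` and the loudness values are hypothetical illustrations, not data from the talk.

```python
import math
import statistics
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d: difference of the group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical mean loudness values (dB) for two groups of speakers; made-up numbers.
patients = [62.0, 60.5, 61.2, 59.8, 60.9]
controls = [66.1, 65.0, 67.3, 64.8, 66.0]
print(round(cohens_d(patients, controls), 2))
```

A negative d means the first group has the lower mean; unlike a p-value, d does not become more extreme just because the sample gets larger.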

SLIDE 4
What to do: CP Challenges → challenges

  • ML procedures, multi-modality, acoustic normalisation
  • cross-corpus / cross-language / cross-culture databases
  • speaker normalisation/adaptation
  • confusions: hits vs. severe wrong assignments
  • 'most important' features (from phonetics)
  • hybrid approach: same constellation, a few features based on tradition / phonetic evidence vs. brute force feature sets with/without feature reduction/selection
  • interests: performance, interpretation, usability in applications
  • loudness in Parkinson's condition – primary feature, to teach
  • speech tempo in non-nativeness – secondary feature, not to teach
  • speaker overlap in conflict – primary but: different cultures! – to teach
  • variability in depression or autism – cover feature, maybe to teach
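Speaker normalisation, in its simplest form, removes each speaker's baseline so that, for example, a generally high-pitched voice is not mistaken for an aroused one. A minimal sketch, assuming speaker IDs are known for every utterance (also at test time) and using per-speaker z-normalisation, which is only one of several options; adapting the models themselves to a speaker is not covered. `speaker_z_normalise` and the feature values are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def speaker_z_normalise(features: Sequence[Sequence[float]],
                        speakers: Sequence[str]) -> List[List[float]]:
    """Z-normalise every feature separately within each speaker's utterances."""
    if len(features) != len(speakers):
        raise ValueError("need exactly one speaker ID per feature vector")
    if not features:
        return []
    # Collect the row indices belonging to each speaker.
    rows_by_speaker: Dict[str, List[int]] = defaultdict(list)
    for row, speaker in enumerate(speakers):
        rows_by_speaker[speaker].append(row)

    n_features = len(features[0])
    normalised = [[0.0] * n_features for _ in features]
    for rows in rows_by_speaker.values():
        for j in range(n_features):
            column = [features[i][j] for i in rows]
            mean = statistics.fmean(column)
            sd = statistics.pstdev(column)  # population SD over this speaker's utterances
            for i in rows:
                # A feature that is constant for this speaker maps to 0 (avoids division by zero).
                normalised[i][j] = (features[i][j] - mean) / sd if sd > 0 else 0.0
    return normalised


# Hypothetical utterance-level features: [mean F0 in Hz, mean loudness in dB].
features = [[210.0, 60.0], [230.0, 64.0], [110.0, 70.0], [130.0, 74.0]]
speakers = ["f1", "f1", "m1", "m1"]
print(speaker_z_normalise(features, speakers))
# → [[-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]
```

After normalisation the female and the male speaker end up on the same scale, although their raw F0 values differ by about 100 Hz.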

SLIDE 5

Features: Hybrid approach

  • brute force: huge feature vector → processing x → performance; interpretation?
  • phonetics: hand-picked, few features → processing y → interpretation; performance?
  • hybrid: huge feature vector + phonetic knowledge → processing x, processing y = x → interpretation and performance → usability in applications
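The hybrid approach combines both routes: a huge, systematically generated feature vector plus a few hand-picked, phonetically motivated features, all passed to the same processing (y = x). A minimal sketch of the feature side only, under stated assumptions: the low-level descriptor (LLD) names, the five functionals and the "speech rate" feature are hypothetical; real brute-force sets (for example the ComParE set extracted with openSMILE) contain thousands of features. Any classifier or regressor could then be applied to the resulting vector.

```python
import statistics
from typing import Callable, Dict, Mapping, Sequence

# Hypothetical, tiny set of functionals; brute-force sets apply many more.
FUNCTIONALS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,
    "min": min,
    "max": max,
    "range": lambda xs: max(xs) - min(xs),
}


def brute_force_features(llds: Mapping[str, Sequence[float]]) -> Dict[str, float]:
    """Apply every functional to every LLD contour, without any knowledge-based selection."""
    return {
        f"{lld}_{name}": float(func(contour))
        for lld, contour in sorted(llds.items())
        for name, func in FUNCTIONALS.items()
    }


def hybrid_features(llds: Mapping[str, Sequence[float]],
                    hand_picked: Mapping[str, float]) -> Dict[str, float]:
    """Brute-force vector plus a few phonetically motivated features under a 'phon_' prefix."""
    features = brute_force_features(llds)
    for name, value in hand_picked.items():
        features[f"phon_{name}"] = float(value)
    return features


# Hypothetical frame-level contours for one utterance and one hand-picked feature.
llds = {"loudness": [0.2, 0.4, 0.6], "f0": [100.0, 120.0, 110.0]}
vector = hybrid_features(llds, {"speech_rate": 4.5})
print(len(vector), vector["f0_range"], vector["phon_speech_rate"])
# → 11 20.0 4.5
```

Keeping the phonetic features under their own prefix means they can be inspected separately later, which serves interpretation, while the full vector serves performance.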

SLIDE 6


A Bandanna Approach

Thank you for your attention