SLIDE 1

Dialogsysteme mit Gefühlen: Wie, warum und wohin? (Dialog Systems with Feelings: How, Why and Where To?)

  • Dr. Felix Burkhardt

Research director, audEERING GmbH

SLIDE 2

3.3.2020, ITG Workshop Magdeburg Dialogsysteme mit Gefühlen: Wie, Warum und Wohin?

Outlook

  • General motivation for emotional HMI
  • How to model emotion-related states
  • Recognition
  • Simulation
  • Applications
  • Ethical considerations
  • Market
SLIDE 3

Trends

  • Ubiquitous computing accessible via
    a) Smart mobile devices: phones, glasses, watches, t-shirts, implants, etc.
    b) Home automation: central intelligence controlling media, communication, environment
    c) Aging society gets supported by technological interfaces
    d) Big data, faster hardware and new algorithms: DNNs
  • Uses natural interfaces: voice, gestures, wearables, …
  • Gets much nearer to the user, unobtrusive
  • Will be emotional because it's easier: emotion expression is a channel of communication, e.g. urgency, irony, ...

SLIDE 4

Emotions and intelligence

  • Antonio Damasio demonstrated that emotions are central to the life-regulating processes of almost all living creatures.
  • E.g. brain injuries specific to emotional processing robbed people of their capacity to make decisions
  • Emotions help to react fast, be social, be motivated etc.
  • In opposition to Descartes, body and mind are not separated

"The question is not whether intelligent machines can have any emotions, but whether machines can be intelligent without any emotions," - Marvin Minsky

SLIDE 5

How to model emotions: categories

  • "…everyone except a psychologist knows what an emotion is" (Young 1973)
  • Charles Darwin: The Expression of the Emotions in Man and Animals
  • The big four:
      • Anger
      • Sadness
      • Joy
      • Fear
  • Needed to survive and "culturally universal"
  • Many more categorical models exist, e.g. Ekman's six or Plutchik's emotion wheel

Emotions as characters in Pixar's "Inside Out" (anger, fear, joy, disgust, sadness)

SLIDE 6

Dimensional models

  • Dimensional models consider an emotion as a point in an n-dimensional emotion space.
  • One of the best-known spaces is the PAD space:
      • Pleasure (valence)
      • Arousal (activation)
      • Dominance
  • Specific dimensions are better recognized by different modalities, e.g. activation in speech but valence in facial expression
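
The idea of an emotion as a point in PAD space can be sketched in a few lines. The prototype coordinates below are illustrative assumptions, not values from the talk or from any published mapping. The helper `nearest_category` is a hypothetical name; it maps a dimensional estimate to the closest categorical label.

```python
import math

# Illustrative PAD coordinates (pleasure, arousal, dominance) in [-1, 1].
# These numbers are assumptions made up for this sketch.
PAD_PROTOTYPES = {
    "anger":   (-0.6,  0.7,  0.5),
    "fear":    (-0.7,  0.7, -0.6),
    "joy":     ( 0.8,  0.5,  0.4),
    "sadness": (-0.7, -0.5, -0.4),
}


def nearest_category(point):
    """Return the prototype label with the smallest Euclidean distance to a PAD point."""
    return min(
        PAD_PROTOTYPES,
        key=lambda label: math.dist(point, PAD_PROTOTYPES[label]),
    )


# A recognizer that outputs dimensions can still report a category.
estimate = (-0.5, 0.8, 0.3)
print(nearest_category(estimate))
```

Note that anger and fear share low pleasure and high arousal in this toy layout and differ mainly in dominance, which is one reason a third dimension is often added to valence and arousal.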

SLIDE 7

Panksepp's seven primal emotions

  • Jaak Panksepp was a neuroscientist who suggested seven emotion categories in humans and animals that can be localized in the brain:
      • Search (anticipation, desire)
      • Rage (frustration, body surface irritation, restraint, indignation)
      • Fear (pain, threat, foreboding)
      • Panic/Loss (separation distress, social loss, grief, loneliness)
      • Play (rough-and-tumble carefree play, joy)
      • Lust (copulation, mating)
      • Care (maternal nurturance)
SLIDE 8

Appraisal theory

  • Appraisal theory means that emotions are extracted from our evaluations (appraisals or estimates) of events that cause specific reactions in different people.
  • E.g. Scherer's multi-level sequential check model
  • Three levels of processing are: innate (sensory-motor), learned (schema-based), and deliberate (conceptual)

Source: https://en.wikipedia.org/wiki/Appraisal_theory

SLIDE 9

How are emotions expressed: modalities

  • User introspection: e.g. emoticons, button press, etc.
  • Text: sentiment analysis
  • Audio: speech, extralinguistics
  • Video: facial expression, gestures, posture
  • Physiology: respiration rate, blood pressure, skin conductivity, neuronal activity, speech (held vowels)
  • Behaviour: e.g. frequent room changes, typing speed
  • Context: location, weather, time of day, other people's moods etc.

SLIDE 10

Training and evaluation data

  • Ideally from the application
  • From an application similar to the target
  • From Wizard of Oz scenario
  • From field recordings (e.g. VAM)
  • From induced emotions ("Lost luggage", "Aibo")
  • From actors

Felix Burkhardt, Astrid Paeschke, Miriam Rolfes, Walther F. Sendlmeier and Benjamin Weiss: A Database of German Emotional Speech, Proc. Interspeech 2005

SLIDE 11

Ground truth and gold standards

  • Five human labelers annotated the emotional content of textual data using four categories.
  • A machine algorithm did the same classification.
  • "majority" means the majority voting of the human labelers.
  • The chart shows the Cohen's kappa values for the so-called "inter-rater agreement", i.e. how much each rater agrees with each of the other raters.
  • EWE (evaluator weighted estimator) is one way to weight labelers according to their inter-rater agreement

            labeler A  labeler B  labeler C  labeler D  labeler E  majority  machine
labeler A   1.00       0.20       0.19       0.10       0.24       0.27      0.15
labeler B              1.00       0.79       0.46       0.15       0.81      0.15
labeler C                         1.00       0.47       0.19       0.83      0.14
labeler D                                    1.00       0.09       0.52      0.07
labeler E                                               1.00       0.29      0.10
majority                                                           1.00      0.17
machine                                                                      1.00
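
As a rough sketch of how such an agreement table could be produced, the following computes Cohen's kappa for a pair of raters and a per-item majority label. The function names and the toy labels are assumptions for illustration only. They are not the data behind the table above.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's own label distribution.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def majority_vote(*ratings):
    """Per-item majority label; ties go to the label encountered first."""
    return [Counter(item).most_common(1)[0][0] for item in zip(*ratings)]


# Toy annotations (made up for this sketch).
rater_a = ["anger", "joy", "joy", "sadness", "fear", "joy"]
rater_b = ["anger", "joy", "sadness", "sadness", "fear", "anger"]
rater_c = ["anger", "joy", "joy", "sadness", "joy", "joy"]

majority = majority_vote(rater_a, rater_b, rater_c)
print(round(cohen_kappa(rater_a, rater_b), 2))
print(round(cohen_kappa(rater_b, majority), 2))
```

Kappa corrects raw agreement for the agreement expected by chance, so two raters who both label almost everything with the same frequent category do not automatically score high.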

SLIDE 12

Recognition by statistical classification

  • Basic approach:
      • extract features,
      • select the best ones,
      • classify features,
      • fuse classifier outputs (unimodal/multimodal)
  • Classifiers:
      • Gaussian Mixture Models (GMM): model training data as Gaussian densities
      • Artificial Neural Networks (ANN), e.g. Multi-Layer Perceptron
      • Support Vector Machines (SVM): use "kernel functions" to learn non-linear decision boundaries
      • Classification and Regression Trees (CART)
      • Hidden Markov Models (HMM): used to model temporal structure

[Figure: early fusion vs. late fusion]
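
The difference between early and late fusion can be illustrated with a minimal sketch. It assumes early fusion means concatenating per-modality feature vectors before a single classifier. It assumes late fusion means a weighted average of per-classifier class posteriors. The function names, weights and posterior values are hypothetical.

```python
def early_fusion(audio_features, video_features):
    """Early fusion: concatenate per-modality feature vectors before classification."""
    return list(audio_features) + list(video_features)


def late_fusion(posteriors, weights=None):
    """Late fusion: weighted average of class posteriors from several classifiers."""
    if weights is None:
        weights = [1.0] * len(posteriors)
    total = sum(weights)
    labels = posteriors[0].keys()
    return {
        label: sum(w * p[label] for w, p in zip(weights, posteriors)) / total
        for label in labels
    }


# Hypothetical posteriors from an audio and a video classifier.
audio_posterior = {"anger": 0.7, "neutral": 0.3}
video_posterior = {"anger": 0.4, "neutral": 0.6}

# Trust the audio classifier twice as much as the video classifier.
fused = late_fusion([audio_posterior, video_posterior], weights=[2.0, 1.0])
decision = max(fused, key=fused.get)
print(decision, fused)
```

Early fusion lets one classifier exploit correlations between modalities, while late fusion keeps the modality-specific classifiers independent and still works if one modality is temporarily missing.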

SLIDE 13

Recognition by Deep Neural Networks

  • Tremendous success during the last decade
  • Three reasons: more data, faster hardware, new algorithms
  • Can work end-to-end, no feature engineering
  • Can be used for analysis and synthesis
  • Can learn from unlabeled data
  • BUT:
      • Needs lots of data (does it?)
      • Is lazy → explainability

SLIDE 14

Deep Learning II

  • Data Augmentation
  • Transfer Learning
  • Synthetic Training
      • GANs
      • Autoencoder
  • Unsupervised learning
  • Reinforcement learning

Source: medium.com; images: https://towardsdatascience.com

SLIDE 15

Speech Synthesis

  • With deep learning using
      • Embeddings
      • Style tokens

Source: https://www.researchgate.net/publication/335601425_Comic-Guided_Speech_Synthesis
Source: https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html
Source: https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526

SLIDE 16

Emofilt

  • Emofilt is a Java tool to transform the prosody of a given utterance in order to simulate emotional expression
  • It is based on Mbrola for speech generation and an arbitrary phonemization generator like MARY or Txt2Pho
  • Mbrola is a diphone synthesizer from the University of Mons with databases for 34 languages

SLIDE 17

The storyteller

  • Used Emofilt to "emotionalize" a fairy tale
  • Asked 12 schoolchildren, for both versions:
      • How they like the story
      • How they like the speaker
      • How many facts they remember
  • F. Burkhardt: "An Affective Spoken Story Teller", Interspeech, 2011

SLIDE 18

W3C recommendation for emotion annotation

  • As the Web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions.
  • The specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness.
  • The language is conceived as a "plug-in" language suitable for use in three different areas:
      • manual annotation of data
      • automatic recognition of emotion-related states from user behavior
      • generation of emotion-related system behavior

https://www.w3.org/TR/emotionml/
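
To make the annotation use case concrete, here is a minimal sketch that builds a small EmotionML document with Python's standard library. The namespace URI and the "big6" category vocabulary are quoted from memory of the W3C documents and should be checked against the specification. The helper `annotate` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed from the EmotionML 1.0 recommendation; verify against the spec.
EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"
BIG6_VOCABULARY = "http://www.w3.org/TR/emotion-voc/xml#big6"


def annotate(category, confidence):
    """Return an EmotionML string with one emotion annotated by category."""
    # Serialize the EmotionML namespace as the default namespace.
    ET.register_namespace("", EMOTIONML_NS)
    root = ET.Element(
        f"{{{EMOTIONML_NS}}}emotionml",
        {"version": "1.0", "category-set": BIG6_VOCABULARY},
    )
    emotion = ET.SubElement(root, f"{{{EMOTIONML_NS}}}emotion")
    ET.SubElement(
        emotion,
        f"{{{EMOTIONML_NS}}}category",
        {"name": category, "confidence": f"{confidence:.2f}"},
    )
    return ET.tostring(root, encoding="unicode")


# E.g. the output of a recognizer that is fairly sure it heard anger.
print(annotate("anger", 0.8))
```

The same structure serves all three areas named above: a human annotator, a recognizer, or a generation component can each read or write the `category` element.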

SLIDE 19

Five types of applications

a) Mediated emotion
b) Affect recognition
c) Affect simulation
d) Modeling emotional intelligence
e) Modeling human emotional behavior

  • A. Batliner, F. Burkhardt, M. van Ballegooy, E. Nöth: A Taxonomy of Applications that Utilize Emotional Awareness, Proc. IS-LTC 2006

SLIDE 20

Conciliation strategies

  • The key question in HMI dialogs is how to react to sensed emotion expression
  • Remember: emotion implies intelligence
  • Human agents are expensive
  • The current lack of world knowledge rules out many human strategies

Literature: F. Burkhardt, K.P. Engelbrecht, M. van Ballegooy, T. Polzehl and J. Stegmann: Emotion Detection in Dialog Systems - Usecases, Strategies and Challenges. ACII 2009

SLIDE 21

Applications irrespective of dialog strategies

  • Irony / sarcasm detection: still a big problem when analyzing user opinion.
  • Call center support: distribute aggressive callers, support training.
  • Automated dialog support: anger detection can be used for churn prevention or for automatic quality monitoring.
  • Emotional chat: facilitate emotional computer-mediated communication.
  • Emotion-aware surroundings: computer-controlled environment that adapts automatically to the user's mood.
  • Search-by-emotion, e.g. Entertain product.
  • Believable agent: the naturalness of an artificial "being" and its appearance of intelligence are strongly influenced by emotional expressions.
  • Artificial intelligence models: use emotions for motivation modeling
SLIDE 22

The anger monitor

  • Recorded and annotated anger data in office and at home
  • Machine chunks and classifies continuously and warns when anger is detected
  • Household with two teenagers
  • Cross database training not successful
  • F. Burkhardt: "You Seem Aggressive! - Monitoring Anger in a Practical Application", LREC, 2012
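
The chunk-classify-warn loop described above might look roughly like the following sketch. It is not the implementation from the cited paper. The scoring function is a hypothetical placeholder (mean absolute amplitude) standing in for a trained anger classifier, and the chunk size and threshold are arbitrary.

```python
def chunk(samples, size):
    """Yield consecutive, non-overlapping chunks; a short trailing chunk is dropped."""
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]


def placeholder_anger_score(chunk_samples):
    """Hypothetical stand-in for a trained classifier.

    Returns the mean absolute amplitude clipped to [0, 1]. A real system
    would return a classifier's anger probability here instead.
    """
    return min(1.0, sum(abs(s) for s in chunk_samples) / len(chunk_samples))


def monitor(samples, size, threshold, score=placeholder_anger_score):
    """Return the indices of chunks whose score reaches the threshold."""
    return [
        index
        for index, c in enumerate(chunk(samples, size))
        if score(c) >= threshold
    ]


# Toy signal: quiet, loud, quiet (values are made up for this sketch).
signal = [0.1, -0.1, 0.2, -0.1, 0.9, -0.8, 0.9, -0.9, 0.1, 0.0, -0.1, 0.1]
warnings = monitor(signal, size=4, threshold=0.5)
print(warnings)
```

Passing the scoring function as a parameter keeps the chunking and warning logic separate from the model, so a classifier trained on in-domain data can be swapped in without touching the loop.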

SLIDE 23

Ethical implications

  • Emotional expression can be
    a) voluntary (communicative) or
    b) involuntary
  • In a way, emotional expression of type b) reveals our "inner thoughts"
  • There are cases where people voluntarily decide to have their "subconscious" emotions detected (e.g. therapy)
  • It is not always in people's interest for these to be revealed, but what if an ethically correct procedure is facilitated by AI, e.g. fraud detection?

  • General: keep the user in control

Emerging standard: https://sagroups.ieee.org/7014/

“This standard defines "empathic technologies" as affect-sensitive technologies employed to algorithmically infer, model and simulate understanding of emotions, feelings, moods, perspective, attention and intention. Data insights and actions taken in response to these automated inferences typically, but not always, inform future interactions between a person or group and system (or between systems).”

SLIDE 24

Market overview Q3 2017

[Figure: market segments: sentiment from text, emotional systems, emotional biosignals, emotions from face, audio recognition]

SLIDE 25

Wrap up

  • Emotional processing comes with pervasive computing
  • It can be used with intuitive interfaces, more natural mediated communication and sophisticated AI models
  • It is ethically very sensitive
  • Emotional categories contrast with more complex models, but the nature of emotion is dictated by application
  • The market is growing, all the big players are already there
  • Deep Learning is a big driver