Multi-attention Recurrent Network for Human Communication Comprehension - PowerPoint PPT Presentation



SLIDE 1

Amir Zadeh, Paul Pu Liang, Soujanya Poria, Prateek Vij, Erik Cambria, Louis-Philippe Morency

Presenter: Paul Pu Liang

Multi-attention Recurrent Network for Human Communication Comprehension

SLIDE 2

Progress of Artificial Intelligence

• Multimedia content
• Intelligent personal assistants
• Robots and virtual agents

SLIDE 3

Multimodal Communicative Behaviors

Visual
• Gestures
  • Head gestures
  • Eye gestures
  • Arm gestures
• Body language
  • Body posture
  • Proxemics
• Eye contact
  • Head gaze
  • Eye gaze
• Facial expressions
  • FACS action units
  • Smile, frowning

Language
• Lexicon
  • Words
• Syntax
  • Part-of-speech
  • Dependencies
• Pragmatics
  • Discourse acts

Acoustic
• Prosody
  • Intonation
  • Voice quality
• Vocal expressions
  • Laughter, moans

Affective states
• Emotion: anger, disgust, fear, happiness, sadness, surprise
• Personality: confidence, persuasion, passion
• Sentiment: positive, negative

SLIDE 4

Challenge 1: Intra-modal Dynamics

[Figure: a speaker's behaviors over time, the utterance "This movie is great" with a smile and a head nod, alongside the resulting sentiment intensity curve]

SLIDE 5

Challenge 1: Intra-modal Dynamics

[Figure, continued from slide 4: speaker's behaviors ("This movie is great", smile, head nod) and sentiment intensity over time]

SLIDE 6

Challenge 2: Cross-modal Dynamics

a) Multiple co-occurring interactions

[Figure: "This movie is great" said with a smile and a loud voice; the speaker's cross-modal behaviors and sentiment intensity over time]

SLIDE 7

Challenge 2: Cross-modal Dynamics

a) Multiple co-occurring interactions
b) Different weighted combinations

[Figure: "This movie is fair" said with a smile and a loud voice; the speaker's cross-modal behaviors and sentiment intensity over time]

SLIDE 8

Challenge 2: Cross-modal Dynamics

a) Multiple co-occurring interactions
b) Different weighted combinations
c) Multiple prediction targets

[Figure: "This movie is great" said with raised eyebrows and a loud voice; the speaker's cross-modal behaviors and emotions (happy, surprised) over time]

SLIDE 9

Multi-attention Recurrent Network (MARN)

1. Modeling intra-modal dynamics: a set of Long-Short Term Memories (LSTMs)

SLIDE 10

Multi-attention Recurrent Network (MARN)

1. Modeling intra-modal dynamics: a set of Long-Short Term Memories (LSTMs)
2. Modeling cross-modal dynamics: a set of Long-Short Term Hybrid Memories (LSTHMs) + a Single-attention Block

SLIDE 11

Multi-attention Recurrent Network (MARN)

1. Modeling intra-modal dynamics: a set of Long-Short Term Memories (LSTMs)
2. Modeling cross-modal dynamics: a set of Long-Short Term Hybrid Memories (LSTHMs) + a Single-attention Block
3. Modeling multiple cross-modal dynamics: a set of Long-Short Term Hybrid Memories (LSTHMs) + a Multi-attention Block

SLIDE 12

Challenge 1: Intra-modal Dynamics

Three sets of LSTMs, one per modality, run across time:
• LSTM^l (language): "This", "movie", "is", "great"
• LSTM^v (visual): e.g. (smile)
• LSTM^a (acoustic): e.g. (loud voice)
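The per-modality LSTMs above can be sketched in NumPy. This is a minimal illustration only: the feature sizes, hidden size, 4-step sequence, and random stand-in inputs are my assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; stacked gate order: input, forget, output, cell candidate."""
    d = h.shape[0]
    pre = W @ x + U @ h + b
    i = sigmoid(pre[:d])            # input gate
    f = sigmoid(pre[d:2 * d])       # forget gate
    o = sigmoid(pre[2 * d:3 * d])   # output gate
    g = np.tanh(pre[3 * d:])        # candidate cell state
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

# Illustrative per-modality input sizes (assumed, not the paper's numbers).
dims = {"language": 300, "visual": 35, "acoustic": 74}
d_h = 32  # hidden size per modality (assumed)
T = 4     # e.g. "This movie is great" -> 4 aligned time steps

hidden = {}
for name, d_x in dims.items():
    W = rng.normal(0.0, 0.1, (4 * d_h, d_x))
    U = rng.normal(0.0, 0.1, (4 * d_h, d_h))
    b = np.zeros(4 * d_h)
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    for _ in range(T):
        x_t = rng.normal(size=d_x)  # stand-in for real modality features
        h, c = lstm_step(x_t, h, c, W, U, b)
    hidden[name] = h  # final h_T for this modality
```

Each modality keeps its own weights and state, so intra-modal dynamics are modeled independently before any fusion.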

SLIDE 13

Challenge 2: Cross-modal Dynamics

[Diagram: the per-modality LSTMs, LSTM^l, LSTM^v, LSTM^a, unrolled across time]

• How do we capture cross-modal dynamics continuously across time?

SLIDE 14

Challenge 2: Single-attention Block

At each time step, the per-modality hidden states h_t^l, h_t^v, h_t^a feed a single-attention block that captures cross-modal dynamics and outputs a cross-modal code z_t.

SLIDE 15

Challenge 2: Single-attention Block

An attention network A takes the concatenated hidden states [h_t^l; h_t^v; h_t^a] and outputs attention weights a_t.

SLIDE 16

Challenge 2: Single-attention Block

The attention weights a_t are applied element-wise (⊗) to the concatenated language, visual, and acoustic hidden states, giving the attended states h̃_t.

SLIDE 17

Challenge 2: Single-attention Block

The attended states h̃_t are split back into their language, visual, and acoustic parts, which dimension-reduction networks C_l, C_v, C_a map to s_t^l, s_t^v, s_t^a.

SLIDE 18

Challenge 2: Single-attention Block

A network G combines s_t^l, s_t^v, s_t^a into the cross-modal code z_t.
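A minimal NumPy sketch of the single-attention pipeline on these slides: attend, gate element-wise, split by modality, reduce, combine. The linear parameterizations of A, C_l/C_v/C_a, and G, and all dimensions, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d_h = 8  # per-modality hidden size (illustrative)
h_l, h_v, h_a = (rng.normal(size=d_h) for _ in range(3))

# Attention network A: softmax scores over the concatenated hidden states.
h_cat = np.concatenate([h_l, h_v, h_a])        # [h_t^l; h_t^v; h_t^a]
W_A = rng.normal(0.0, 0.1, (3 * d_h, 3 * d_h))
scores = W_A @ h_cat
a_t = np.exp(scores - scores.max())
a_t /= a_t.sum()                               # attention weights a_t
h_att = a_t * h_cat                            # element-wise gating (the ⊗ step)

# Split back by modality; dimension-reduction networks C_l, C_v, C_a.
d_s, d_z = 4, 6
C = {m: rng.normal(0.0, 0.1, (d_s, d_h)) for m in "lva"}
s_l = np.tanh(C["l"] @ h_att[:d_h])
s_v = np.tanh(C["v"] @ h_att[d_h:2 * d_h])
s_a = np.tanh(C["a"] @ h_att[2 * d_h:])

# Network G combines the reduced states into the cross-modal code z_t.
W_G = rng.normal(0.0, 0.1, (d_z, 3 * d_s))
z_t = np.tanh(W_G @ np.concatenate([s_l, s_v, s_a]))
```

Because the softmax is taken over all three modalities jointly, a single attention can emphasize, say, language and acoustic dimensions together, which is how one cross-modal interaction is captured.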

SLIDE 19

Challenge 2: Single-attention Block

In a plain LSTM update, each gate sees only its own modality, e.g. for language: W_l x_{t+1}^l + U_l h_t^l + b_l.

SLIDE 20

Challenge 2: Long-short Term Hybrid Memory

The LSTHM adds the cross-modal code z_t to every gate, e.g. for language: W_l x_{t+1}^l + U_l h_t^l + V_l z_t + b_l. Cross-modal dynamics discovered by the attention block thus feed back into each modality's memory.
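Under this formulation, the LSTHM differs from a plain LSTM only by the extra V z term in each gate's pre-activation. A hedged NumPy sketch of a single hybrid-memory step; all shapes and the random inputs are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d_x, d_h, d_z = 10, 8, 6        # illustrative sizes
x_t = rng.normal(size=d_x)      # this modality's input
h_prev = rng.normal(size=d_h)   # previous hidden state
c_prev = rng.normal(size=d_h)   # previous cell state
z_prev = rng.normal(size=d_z)   # cross-modal code from the attention block

W = rng.normal(0.0, 0.1, (4 * d_h, d_x))
U = rng.normal(0.0, 0.1, (4 * d_h, d_h))
V = rng.normal(0.0, 0.1, (4 * d_h, d_z))  # the hybrid-memory addition
b = np.zeros(4 * d_h)

# LSTHM gate pre-activation: W x + U h + V z + b (a plain LSTM omits V z).
pre = W @ x_t + U @ h_prev + V @ z_prev + b
sig = lambda v: 1.0 / (1.0 + np.exp(-v))
i, f, o = sig(pre[:d_h]), sig(pre[d_h:2 * d_h]), sig(pre[2 * d_h:3 * d_h])
g = np.tanh(pre[3 * d_h:])
c_t = f * c_prev + i * g
h_t = o * np.tanh(c_t)
```

Each modality runs its own LSTHM, but all of them receive the same z, so the cross-modal code is shared state across the three memories.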

SLIDE 21

Challenge 2: Multi-attention Block

Multiple attention networks each apply their own element-wise gating (⊗) to the hidden states, so several co-occurring cross-modal dynamics can be captured at once before the dimension-reduction networks C_l, C_v, C_a and the network G produce z_t.
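The multi-attention block can be sketched by repeating the attention step K times and concatenating, per modality, the K gated copies before reduction. K, the linear parameterizations, and all shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d_h, K = 8, 4                      # hidden size and number of attentions (illustrative)
h_cat = rng.normal(size=3 * d_h)   # [h_t^l; h_t^v; h_t^a]

# K attention networks, each producing its own softmax weights and gated copy.
gated = []
for k in range(K):
    W_k = rng.normal(0.0, 0.1, (3 * d_h, 3 * d_h))
    s = W_k @ h_cat
    a_k = np.exp(s - s.max())
    a_k /= a_k.sum()
    gated.append(a_k * h_cat)

# Per modality, concatenate that modality's slice from all K copies, then reduce.
d_s, d_z = 4, 6
z_parts = []
for m in range(3):  # 0: language, 1: visual, 2: acoustic
    mod_k = np.concatenate([g[m * d_h:(m + 1) * d_h] for g in gated])
    C_m = rng.normal(0.0, 0.1, (d_s, K * d_h))
    z_parts.append(np.tanh(C_m @ mod_k))

W_G = rng.normal(0.0, 0.1, (d_z, 3 * d_s))
z_t = np.tanh(W_G @ np.concatenate(z_parts))  # cross-modal code from K attentions
```

With K independent softmaxes, one attention can track a language-acoustic interaction while another tracks a language-visual one, which is the point of having multiple attentions.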

SLIDE 22

Multi-attention Recurrent Network (MARN)

[Diagram: the full model — per-modality LSTHMs, the multi-attention block (⊗ applied by multiple attention networks), dimension-reduction networks C_l, C_v, C_a, and the network G producing the cross-modal code z_t]

SLIDE 23

Experiments

Features
• Language: GloVe word embeddings
• Visual: Facet features
  • FACS action units
  • Emotions
• Acoustic: COVAREP features
  • MFCCs
  • Pitch tracking
  • Vocal expressions (laughter, moans)
• Alignment: word level (P2FA)

Prediction targets
• Emotion: anger, disgust, fear, happiness, sadness, surprise
• Personality: confidence, persuasion, passion
• Sentiment: positive, negative

SLIDE 24

Baseline Models

1. Non-temporal models: SVM-MD, RF
2. Early fusion: HMM, EF-LSTM, EF-HCRF, C-MKL, SAL-CNN
3. Late fusion: DF, TFN, BC-LSTM
4. Multi-view learning: MV-HMMs, MV-HCRFs, MV-LSTM

SLIDE 25

State-of-the-art Results

[Chart: CMU-MOSI sentiment analysis accuracy (45-80% axis). MARN outperforms the baselines THMM, RF, EF-HCRF, MV-HCRF, SVM-MD, C-MKL, DF, SAL-CNN, EF-LSTM, MV-LSTM, BC-LSTM, and TFN.]

SLIDE 26

State-of-the-art Results

[Charts: MARN vs. the state-of-the-art baselines on sentiment analysis (CMU-MOSI, ICT-MMMO, MOUD, YouTube), emotion recognition (IEMOCAP), and personality trait prediction (POM confidence, persuasion, passion, credibility).]

SLIDE 27

Multi-attention Block is Important

[Charts: removing the multi-attention block hurts performance on sentiment analysis (CMU-MOSI, ICT-MMMO, MOUD, YouTube), emotion recognition (IEMOCAP), and personality trait prediction (POM confidence, persuasion, passion, credibility).]

SLIDE 28

Multiple Attentions are Important

[Charts: accuracy vs. number of attentions for YouTube sentiment analysis (1-5 attentions) and CMU-MOSI sentiment analysis (1-6 attentions).]

SLIDE 29

Visualization

Attentions show diversity and are sensitive to different cross-modal dynamics. (Heatmap legend: active / inactive)

SLIDE 30

Visualization

Some attentions are always inactive:
  • They carry only intra-modal dynamics
  • No cross-modal dynamics

(Heatmap legend: active / inactive)

SLIDE 31

Visualization

Attentions change behavior across time; some changes are more drastic than others. (Heatmap legend: active / inactive)

SLIDE 32

Visualization

Different attentions focus on different modalities. (Heatmap legend: active / inactive)

SLIDE 33

Multi-attention Recurrent Network (MARN)

1. Modeling intra-modal dynamics: a set of Long-Short Term Memories (LSTMs)
2. Modeling cross-modal dynamics: a set of Long-Short Term Hybrid Memories (LSTHMs) + a Single-attention Block
3. Modeling multiple cross-modal dynamics: a set of Long-Short Term Hybrid Memories (LSTHMs) + a Multi-attention Block

SLIDE 34

The End!

Code: https://github.com/A2Zadeh/MARN Email: pliang@cs.cmu.edu

SLIDE 35

Workshop @ ACL 2018 First Workshop on Computational Modeling of Human Multimodal Language multicomp.cs.cmu.edu/acl2018multimodalchallenge/
