SLIDE 1

Presenter: Paul Pu Liang

Paul Pu Liang, Ziyin Liu, Amir Zadeh, Louis-Philippe Morency

Multimodal Language Analysis with Recurrent Multistage Fusion

Paul Pu Liang Multimodal Language Analysis with Recurrent Multistage Fusion

SLIDE 2

Progress of Artificial Intelligence

- Multimedia Content
- Intelligent Personal Assistants
- Robots and Virtual Agents

SLIDE 3

Multimodal Language Modalities

Language: Lexicon, Syntax, Pragmatics
Visual: Gestures, Body language, Eye contact, Facial expressions
Acoustic: Prosody, Vocal expressions

SLIDE 4

Multimodal Language Modalities

Language: Lexicon, Syntax, Pragmatics
Visual: Gestures, Body language, Eye contact, Facial expressions
Acoustic: Prosody, Vocal expressions

Sentiment: Positive, Negative
Emotion: Anger, Disgust, Fear, Happiness, Sadness, Surprise
Personality: Confidence, Persuasion, Passion

SLIDE 5

Challenge 1: Intra-modal Interactions

a) Temporal sequences

[Figure: the speaker's behaviors over time (“This movie is great”, smile, head nod) plotted alongside sentiment intensity over time, labeled "Intra-modal".]

SLIDE 6

Challenge 2: Cross-modal Interactions

a) Multiple co-occurring interactions
b) Different weighted combinations

[Figure: the speaker's behaviors over time (“This movie is great”, smile, loud voice) plotted alongside sentiment intensity, labeled "Cross-modal".]

SLIDE 7

Multistage Aggregation in Humans

(Parsini et al. 2015, Taylor et al. 2017)

[Figure labels: wide smile, loud voice.]

SLIDE 8

Multistage Aggregation in Humans

(Parsini et al. 2015, Taylor et al. 2017)

[Figure labels: wide smile, loud voice; positive reaction, positive words.]

SLIDE 9

Multistage Aggregation in Humans

(Parsini et al. 2015, Taylor et al. 2017)

[Figure labels: wide smile, loud voice; positive reaction, positive words; excitement, joyous.]

SLIDE 10

Computational Model for Multistage Fusion

[Figure labels: wide smile, loud voice; positive reaction, positive words; excitement, joyous; "Computational Model".]

SLIDE 11

Multimodal Descriptors

[Figure: Language, Visual and Acoustic streams over time (text shown: “He’s”, “type”, …); an "average" step yields the multimodal descriptors.]

SLIDE 12

Language Descriptors

[Figure build: same streams; descriptor added: neutral word (language).]

SLIDE 13

Visual Descriptors

[Figure build: same streams; descriptors added: shrug, frown (visual).]

SLIDE 14

Acoustic Descriptors

[Figure build: same streams; descriptors added: loud voice, speech elongation (acoustic).]
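The "average" step on these slides can be illustrated with a small sketch. It assumes, as is common for word-aligned multimodal data, that frame-level visual or acoustic features are averaged over each word's time span; the slides do not spell this out, and the function name, the timestamps and the two-dimensional feature vectors below are hypothetical.

```python
from statistics import fmean

def average_over_words(frames, word_spans):
    """Average frame-level feature vectors inside each word's time span.

    frames: list of (timestamp_seconds, feature_vector) pairs.
    word_spans: list of (word, start_seconds, end_seconds) triples.
    Returns one (word, averaged_vector) pair per word.
    """
    dim = len(frames[0][1])
    aligned = []
    for word, start, end in word_spans:
        # Collect every frame whose timestamp falls inside [start, end).
        inside = [vec for ts, vec in frames if start <= ts < end]
        if inside:
            avg = [fmean(v[d] for v in inside) for d in range(dim)]
        else:
            # No frames for this word: fall back to a zero vector (assumption).
            avg = [0.0] * dim
        aligned.append((word, avg))
    return aligned

# Hypothetical acoustic frames: (time, [loudness, pitch]).
frames = [(0.0, [0.25, 1.0]), (0.1, [0.75, 3.0]), (0.3, [1.0, 2.0])]
spans = [("He's", 0.0, 0.2), ("type", 0.25, 0.5)]
aligned = average_over_words(frames, spans)
```

Each word then carries one vector per modality, which is what lets the three streams share a single time axis.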

SLIDE 15

Multistage Fusion

[Figure: multimodal descriptors: neutral word, frown, loud voice, shrug, speech elongation, …]

SLIDE 16

Multistage Fusion

[Figure build: stage 1 applies a HIGHLIGHT step to the descriptors.]

SLIDE 17

Multistage Fusion

[Figure build: stage 1 adds a FUSE step; labels: negative, negative.]

SLIDE 18

Multistage Fusion

[Figure build: stage 2 starts from the same descriptors.]

SLIDE 19

Multistage Fusion

[Figure build: stage 2 label: emphasis.]

SLIDE 20

Multistage Fusion

[Figure build: stage 2 labels: emphasis, strongly negative.]

SLIDE 21

Multistage Fusion

[Figure build: stage 3 starts from the same descriptors.]

SLIDE 22

Multistage Fusion

[Figure build: stage 3 label: ambivalence.]

SLIDE 23

Multistage Fusion

[Figure: three stages of HIGHLIGHT and FUSE over the descriptors (neutral word, frown, loud voice, shrug, speech elongation, …). Labels in order of appearance:
stage 1: negative, negative
stage 2: emphasis, strongly negative
stage 3: ambivalence, disappointed]

SLIDE 24

Intra-modal Recurrent Networks

[Figure: one LSTHM per modality (language, visual, acoustic), unrolled over time $t$ and time $t+1$, with per-modality states $h_t^l$, $h_t^v$, $h_t^a$.]

SLIDE 25

Multistage Fusion Process

[Figure: inputs $h_t^l$, $h_t^v$, $h_t^a$ entering the Multistage Fusion Process.]

SLIDE 26

Multistage Fusion Process

[Figure build: stage 1 with a HIGHLIGHT step.]

SLIDE 28

Multistage Fusion Process

[Figure build: stage 1 with HIGHLIGHT followed by FUSE.]

SLIDE 29

Multistage Fusion Process

[Figure build: stage 2 adds a second HIGHLIGHT step.]

SLIDE 30

Multistage Fusion Process

[Figure build: the HIGHLIGHT steps are labeled "Highlight LSTM".]

SLIDE 31

Multistage Fusion Process

[Figure build: stage 2 adds a second FUSE step; the FUSE steps are labeled "Fuse LSTM".]

SLIDE 32

Multistage Fusion Process

[Figure build: stages 1, 2, …, $K$, each with HIGHLIGHT (Highlight LSTM) and FUSE (Fuse LSTM).]

SLIDE 33

Multistage Fusion Process

[Figure: inputs $h_t^l$, $h_t^v$, $h_t^a$; stages 1, 2, …, $K$, each with HIGHLIGHT (Highlight LSTM) and FUSE (Fuse LSTM); a final SUMMARIZE step produces the output $z_t$.]
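The HIGHLIGHT, FUSE and SUMMARIZE steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides name a Highlight LSTM and a Fuse LSTM, which are replaced here by single-layer stand-ins (a softmax gate and a tanh update) with random, untrained weights. The class name, dimensions and weight names are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class MultistageFusionSketch:
    """Stand-in for the multistage fusion process (untrained weights)."""

    def __init__(self, in_dim, fuse_dim, out_dim, num_stages, seed=0):
        rng = random.Random(seed)
        self.num_stages = num_stages
        self.fuse_dim = fuse_dim
        # HIGHLIGHT scores every input dimension from [h ; s_prev].
        self.W_highlight = rand_matrix(in_dim, in_dim + fuse_dim, rng)
        # FUSE updates the running fused state from [highlighted ; s_prev].
        self.W_fuse = rand_matrix(fuse_dim, in_dim + fuse_dim, rng)
        # SUMMARIZE maps the final fused state to the output z.
        self.W_summarize = rand_matrix(out_dim, fuse_dim, rng)

    def __call__(self, h_l, h_v, h_a):
        h = h_l + h_v + h_a          # concatenate the three modality states
        s = [0.0] * self.fuse_dim    # fused state before stage 1
        attentions = []
        for _ in range(self.num_stages):
            # HIGHLIGHT: weights over input dimensions, given earlier stages.
            a = softmax(matvec(self.W_highlight, h + s))
            highlighted = [ai * hi for ai, hi in zip(a, h)]
            # FUSE: integrate the highlighted signals with the earlier result.
            s = [math.tanh(v) for v in matvec(self.W_fuse, highlighted + s)]
            attentions.append(a)
        # SUMMARIZE: turn the last fused state into the output z.
        z = [math.tanh(v) for v in matvec(self.W_summarize, s)]
        return z, attentions

mfp = MultistageFusionSketch(in_dim=6, fuse_dim=4, out_dim=3, num_stages=3)
z, attentions = mfp([0.1, -0.2], [0.5, 0.3], [-0.4, 0.9])
```

The same weights are reused at every stage, so the stage loop behaves like a small recurrence over stages; the per-stage `attentions` are the kind of quantity a low-to-high heatmap over stages could display.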

SLIDE 34

Recurrent Multistage Fusion Network

[Figure: full model. One LSTHM per modality (language, visual, acoustic) runs over time $t$ and time $t+1$; their states $h_t^l$, $h_t^v$, $h_t^a$ enter the Multistage Fusion Process (stages 1, 2, …, $K$ of HIGHLIGHT and FUSE, then SUMMARIZE), which outputs $z_t$.]
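How the per-modality recurrent networks and the fusion step might interleave over time can be sketched as below. The sketch rests on stated assumptions: each LSTHM (Long-short Term Hybrid Memory) is replaced by a plain tanh cell, the fused output of the previous step is fed back into every cell, and the multistage fusion process is replaced by an element-wise mean. All function names and dimensions are hypothetical, and the weights are random and untrained.

```python
import math
import random

def make_cell(in_dim, hid_dim, z_dim, rng):
    """Random weights for one modality's cell (untrained, illustration only)."""
    cols = in_dim + hid_dim + z_dim
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(hid_dim)]

def cell_step(W, x, h_prev, z_prev):
    """Plain tanh cell standing in for an LSTHM: it sees x_t, h_{t-1}, z_{t-1}."""
    inp = x + h_prev + z_prev
    return [math.tanh(sum(w * v for w, v in zip(row, inp))) for row in W]

def fuse_placeholder(h_l, h_v, h_a):
    """Stand-in for the multistage fusion process: element-wise mean."""
    return [(a + b + c) / 3.0 for a, b, c in zip(h_l, h_v, h_a)]

def run_sequence(seq, hid_dim=4, seed=0):
    """seq: list of time steps, each a dict with 'l', 'v', 'a' feature lists."""
    rng = random.Random(seed)
    cells = {m: make_cell(len(seq[0][m]), hid_dim, hid_dim, rng) for m in "lva"}
    h = {m: [0.0] * hid_dim for m in "lva"}
    z = [0.0] * hid_dim
    outputs = []
    for step in seq:
        # Intra-modal update: each modality has its own recurrent cell.
        h = {m: cell_step(cells[m], step[m], h[m], z) for m in "lva"}
        # Cross-modal update: fuse the three hidden states into z_t.
        z = fuse_placeholder(h["l"], h["v"], h["a"])
        outputs.append(z)
    return outputs

# Two hypothetical time steps with different feature sizes per modality.
seq = [
    {"l": [1.0, 0.0], "v": [0.3], "a": [0.2, 0.1, 0.5]},
    {"l": [0.0, 1.0], "v": [0.8], "a": [0.4, 0.0, 0.1]},
]
outputs = run_sequence(seq)
```

Keeping one cell per modality handles intra-modal dynamics, while the shared `z` is the only path through which the modalities influence one another.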

SLIDE 36

Baseline Models

1. Non-temporal Models
   - SVM (Cortes and Vapnik, 1995), DF (Nojavanasghari et al., 2016)
2. Early Fusion
   - EF-LSTM (Hochreiter and Schmidhuber, 1997), EF-RHN (Zilly et al., 2016)
3. Late Fusion
   - LMF (Liu et al., 2018), TFN (Zadeh et al., 2017), BC-LSTM (Poria et al., 2017)
4. Multi-view Learning
   - MV-LSTM (Rajagopalan et al., 2016)
5. Memory-based Models
   - MARN, MFN (Zadeh et al., 2018)
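The difference between the early-fusion and late-fusion categories can be shown with a toy sketch. It uses the usual meaning of the two terms (fuse features before one joint model versus fuse per-modality decisions) and ignores the temporal side entirely; the listed baselines differ in detail, and the linear scorers, weights and feature values here are hypothetical.

```python
def linear_score(weights, features):
    """Dot product, standing in for any trained model."""
    return sum(w * f for w, f in zip(weights, features))

def early_fusion(x_l, x_v, x_a, joint_weights):
    """Concatenate modality features first, then apply one joint model."""
    return linear_score(joint_weights, x_l + x_v + x_a)

def late_fusion(x_l, x_v, x_a, w_l, w_v, w_a):
    """Score each modality separately, then average the three decisions."""
    scores = [
        linear_score(w_l, x_l),
        linear_score(w_v, x_v),
        linear_score(w_a, x_a),
    ]
    return sum(scores) / len(scores)

# Hypothetical features for one time step.
x_l, x_v, x_a = [1.0, 2.0], [0.5], [4.0]
early = early_fusion(x_l, x_v, x_a, joint_weights=[1.0, 1.0, 2.0, 0.5])
late = late_fusion(x_l, x_v, x_a, w_l=[1.0, 1.0], w_v=[2.0], w_a=[0.5])
```

A joint model over concatenated features can in principle learn cross-modal terms, whereas averaging separate decisions keeps each modality's model independent.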

SLIDE 37

State-of-the-art Results

[Bar chart: CMU-MOSI Sentiment (Binary Accuracy). Baseline models (SVM, DF, EF-RHN, EF-LSTM, TFN, BC-LSTM, MV-LSTM, MARN, MFN) compared with RMFN; RMFN is annotated at 78.4%.]

SLIDE 38

State-of-the-art Results

[Bar charts comparing the best baseline model with RMFN on four tasks:
- CMU-MOSI Sentiment (Correlation)
- POM Personality Traits (Multiclass Accuracy)
- IEMOCAP Happy Emotion (Binary Accuracy)
- IEMOCAP Sad Emotion (Binary Accuracy)
Best-baseline labels shown: MV-LSTM, MFN, MARN, MFN.]

SLIDE 39

Results

[Bar chart: IEMOCAP Neutral Emotion (Binary Accuracy), best baseline model (MFN) compared with RMFN.]
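The two sentiment metrics named on the result charts, binary accuracy and correlation, can be computed as in the sketch below. It assumes real-valued sentiment intensity labels with the positive/negative split at 0 and uses Pearson correlation; the threshold convention and the example values are assumptions, not figures from the slides.

```python
import math

def binary_accuracy(y_true, y_pred, threshold=0.0):
    """Fraction of items where label and prediction fall on the same side."""
    hits = sum((t >= threshold) == (p >= threshold) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def pearson_correlation(y_true, y_pred):
    """Pearson correlation between labels and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical sentiment intensities and model predictions.
y_true = [-2.0, -1.0, 1.0, 3.0]
y_pred = [-1.0, 0.5, 0.5, 2.0]
acc = binary_accuracy(y_true, y_pred)
corr = pearson_correlation(y_true, y_pred)
```

Binary accuracy only checks the polarity of each prediction, while correlation also rewards getting the relative intensity right.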

SLIDE 40

Multiple Stages are Important

[Two plots against the number of stages (1 to 5): CMU-MOSI Sentiment Analysis (Binary Accuracy) and CMU-MOSI Sentiment Analysis (Multiclass Accuracy).]

SLIDE 41

Ablation Studies

SLIDE 42

Interpretable Fusion

[Figure: Language, Visual and Acoustic rows for the utterance “I thought it was fun”, annotated with (elongation) and (emphasis).]

SLIDE 43

Interpretable Fusion

[Figure: heatmap (low to high) with rows $h_t^l$ (Language), $h_t^v$ (Visual), $h_t^a$ (Acoustic) across stages at t = 1, for the utterance “I thought it was fun” annotated with (elongation) and (emphasis).]

SLIDE 44

Interpretable Fusion

[Figure build: same heatmap at t = 1.]

SLIDE 45

Interpretable Fusion

[Figure build: heatmaps across stages at t = 1 and t = 5.]

SLIDE 46

Across Stages

[Figure: same heatmaps at t = 1 and t = 5.]

SLIDE 47

Across Time

[Figure: same heatmaps at t = 1 and t = 5.]

SLIDE 48

Multimodal Priors

[Figure: same heatmaps at t = 1 and t = 5.]

SLIDE 49

Synchronized Interactions

[Figure: same heatmaps at t = 1 and t = 5.]

SLIDE 52

Asynchronous Trimodal Interactions

[Figure: heatmap (low to high) with rows $h_t^l$, $h_t^v$, $h_t^a$ across stages at t = 1 and t = 6, for the utterance “He delivers a lot of intensity”; only the Language row is labeled.]

SLIDE 53

Asynchronous Trimodal Interactions

[Figure build: Language, Visual and Acoustic rows labeled for “He delivers a lot of intensity”; annotations: (emphasis), (smile), (smile).]

SLIDE 55

Bimodal Interactions

[Figure: heatmap (low to high) with rows $h_t^l$ (Language), $h_t^v$ (Visual), $h_t^a$ (Acoustic) across stages at t = 1 and t = 7, for the utterance “It doesn’t give any insight or help” annotated with (soft), (emphasis) and (disappointed).]

SLIDE 57

Recurrent Multistage Fusion Network

[Figure: full model. One LSTHM per modality (language, visual, acoustic) runs over time $t$ and time $t+1$; their states $h_t^l$, $h_t^v$, $h_t^a$ enter the Multistage Fusion Process (stages 1, 2, …, $K$ of HIGHLIGHT and FUSE, then SUMMARIZE), which outputs $z_t$.]

SLIDE 58

The End!

Website: www.cs.cmu.edu/~pliang
Email: pliang@cs.cmu.edu
Twitter: @pliang279

[Figure: Multistage Fusion Process diagram: inputs $h_t^l$, $h_t^v$, $h_t^a$; stages 1, 2, …, $K$ of HIGHLIGHT and FUSE; SUMMARIZE output $z_t$.]