Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph


SLIDE 1

Presenter: Paul Pu Liang

Amir Zadeh, Paul Pu Liang, Jonathan Vanbriessen, Soujanya Poria, Edmund Tong, Erik Cambria, Minghai Chen, Louis-Philippe Morency

Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph

Paul Pu Liang Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph

SLIDE 2

Progress of Artificial Intelligence

• Multimedia Content
• Intelligent Personal Assistants
• Robots and Virtual Agents

SLIDE 3

Continuous Theories of (Multimodal) Language

Throughout evolution, language and nonverbal behaviors developed together.

Cries and Imitations → Modern Language

SLIDE 4

Multimodal Language Modalities

Language
  • Lexicon
  • Syntax
  • Pragmatics

Visual
  • Gestures
  • Body language
  • Eye contact
  • Facial expressions

Acoustic
  • Prosody
  • Vocal expressions

SLIDE 5

Multimodal Language Modalities

Language
  • Lexicon
  • Syntax
  • Pragmatics

Visual
  • Gestures
  • Body language
  • Eye contact
  • Facial expressions

Acoustic
  • Prosody
  • Vocal expressions

Emotion
  • Anger
  • Disgust
  • Fear
  • Happiness
  • Sadness
  • Surprise

Personality
  • Confidence
  • Persuasion
  • Passion

Sentiment
  • Positive
  • Negative

SLIDE 6

Multimodal Language Modalities

Language, Visual, Acoustic

Emotion, Personality, Sentiment

Datasets, Models

SLIDE 8

Multimodal Language Modalities

Language, Visual, Acoustic

Emotion, Personality, Sentiment

Datasets
  ✓ Large-scale
  ✓ Diverse

Models

SLIDE 9

Multimodal Language Modalities

Language, Visual, Acoustic

Emotion, Personality, Sentiment

Datasets
  ✓ Large-scale
  ✓ Diverse

Models
  • Word-level alignment
  • Attention models
  • Memory-based models

SLIDE 10

Multimodal Language Modalities

Language, Visual, Acoustic

Emotion, Personality, Sentiment

Datasets
  ✓ Large-scale
  ✓ Diverse

Models
  • Word-level alignment
  • Attention models
  • Memory-based models
  ✓ Good Performance
  ✓ Interpretable

SLIDE 11

Datasets for Multimodal Language

• Require large and diverse amounts of data:
  • Diversity in samples

SLIDE 12

Datasets for Multimodal Language

• Require large and diverse amounts of data:
  • Diversity in samples
  • Diversity in topics

SLIDE 13

Datasets for Multimodal Language

• Require large and diverse amounts of data:
  • Diversity in samples
  • Diversity in topics
  • Diversity in speakers

SLIDE 14

Datasets for Multimodal Language

• Require large and diverse amounts of data:
  • Diversity in samples
  • Diversity in topics
  • Diversity in speakers
  • Diversity in annotations

SLIDE 15

New Dataset: CMU-MOSEI

• 23,000 video segments
• 3 modalities

SLIDE 16

CMU-MOSEI Dataset

• 1,000 speakers
• 250 topics

SLIDE 17

Annotation Distributions

SLIDE 18

Annotation Distributions

SLIDE 19

Feature Extraction

Language
  • GloVe word embeddings

Visual
  • Facet features
  • MultiComp OpenFace
  • Face embeddings

Acoustic
  • COVAREP features
    • MFCCs
    • Pitch tracking

Alignment
  • Word level
  • P2FA

Emotion
  • Anger
  • Disgust
  • Fear
  • Happiness
  • Sadness
  • Surprise

Sentiment
  • Positive
  • Negative
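
Word-level alignment can be illustrated with a small sketch. It assumes that a forced aligner such as P2FA has already produced a start and end time for every word, and that visual or acoustic features arrive as timestamped frames. Averaging the frames inside each word is one simple choice made here for illustration; the function name `align_to_words` and the toy data are hypothetical and are not taken from the slides.

```python
from statistics import fmean

def align_to_words(word_intervals, frame_times, frame_features):
    """Average frame-level features inside each word's [start, end) interval.

    word_intervals: list of (word, start_sec, end_sec), e.g. from a forced aligner.
    frame_times: timestamp in seconds of each feature frame.
    frame_features: one feature vector (list of floats) per frame.
    Returns a list of (word, averaged_vector); words with no frames get None.
    """
    aligned = []
    for word, start, end in word_intervals:
        frames = [f for t, f in zip(frame_times, frame_features) if start <= t < end]
        if not frames:
            aligned.append((word, None))
            continue
        # Column-wise mean over all frames that fall inside the word.
        aligned.append((word, [fmean(col) for col in zip(*frames)]))
    return aligned

# Toy data (hypothetical): two words, five feature frames of dimension 2.
words = [("great", 0.0, 0.4), ("movie", 0.4, 1.0)]
times = [0.1, 0.3, 0.5, 0.7, 0.9]
feats = [[1.0, 0.0], [3.0, 2.0], [0.0, 6.0], [3.0, 6.0], [6.0, 6.0]]
aligned = align_to_words(words, times, feats)
```

After this step every modality has one vector per word, so the three sequences share the same length and can be fed to a model step by step.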

SLIDE 20

CMU-MOSEI Dataset

Multimodal Language, Audio-visual

SLIDE 21

Models for Multimodal Language

Multimodal Fusion

multimodal

SLIDE 22

Models for Multimodal Language

Multimodal Fusion

multimodal

Interpretation
• Importance of each modality
• Interactions between modalities

SLIDE 23

Dynamic Fusion Graph (DFG)

unimodal, multimodal

Interpretation
• Importance of each modality
• Interactions between modalities

SLIDE 24

Dynamic Fusion Graph (DFG)

unimodal, bimodal, multimodal

Interpretation
• Importance of each modality
• Interactions between modalities

SLIDE 25

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

Interpretation
• Importance of each modality
• Interactions between modalities

SLIDE 26

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

fusion weights

Interpretation
• Importance of each modality
• Interactions between modalities

SLIDE 27

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

fusion weights

Interpretation
• Importance of each modality
• Interactions between modalities

SLIDE 28

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

t = 1

SLIDE 29

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

t = 1, t = 2

SLIDE 30

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

t = 1, t = 2, t = 3

SLIDE 31

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

t = 1, t = 2, t = 3, t = 4

SLIDE 32

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

fusion weights

Interpretation
• Importance of each modality
• Interactions between modalities

SLIDE 33

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

construction weights, fusion weights

Interpretation
• Importance of each modality
• Interactions between modalities
• Construction of bimodal and trimodal representations

SLIDE 34

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

construction weights, fusion weights

Interpretation
• Importance of each modality
• Interactions between modalities
• Construction of bimodal and trimodal representations
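
The roles of construction weights and fusion weights can be sketched in a few lines. This is a toy illustration under stated assumptions, not the model from the talk: the learned sub-networks are replaced by a tanh of an element-wise sum, every edge weight is a sigmoid of a caller-supplied score, and the names `dynamic_fusion` and `toy_gate` are hypothetical.

```python
import math
from itertools import combinations

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combine(vectors):
    """Stand-in for a learned sub-network: tanh of the element-wise sum."""
    return [math.tanh(sum(vals)) for vals in zip(*vectors)]

def scale(weight, vec):
    return [weight * x for x in vec]

def dynamic_fusion(unimodal, gate_logit):
    """unimodal: dict such as {"l": [...], "v": [...], "a": [...]}, equal lengths.
    gate_logit(src, dst, unimodal) -> float; its sigmoid is the edge weight.
    Returns (fused_vector, weights) with weights[(src, dst)] in (0, 1).
    """
    nodes = dict(unimodal)
    weights = {}
    names = sorted(unimodal)
    # Construction weights: gate every edge into a bimodal or trimodal node.
    for order in range(2, len(names) + 1):
        for group in combinations(names, order):
            dst = "+".join(group)
            # Parents are existing nodes covering a strict subset of dst's modalities.
            parents = [n for n in nodes if set(n.split("+")) < set(group)]
            gated = []
            for src in parents:
                w = sigmoid(gate_logit(src, dst, unimodal))
                weights[(src, dst)] = w
                gated.append(scale(w, nodes[src]))
            nodes[dst] = combine(gated)
    # Fusion weights: gate each node's contribution to the multimodal output.
    gated = []
    for src, vec in nodes.items():
        w = sigmoid(gate_logit(src, "out", unimodal))
        weights[(src, "out")] = w
        gated.append(scale(w, vec))
    return combine(gated), weights

def toy_gate(src, dst, unimodal):
    # Hypothetical rule: an edge opens wider when its source modalities carry more energy.
    energy = sum(abs(x) for m in src.split("+") for x in unimodal[m])
    return energy - 1.0

# Toy input where the visual modality carries no signal.
inputs = {"l": [0.5, -0.2], "v": [0.0, 0.0], "a": [0.3, 0.1]}
fused, weights = dynamic_fusion(inputs, toy_gate)
```

Because the weights are recomputed from the current input, they can differ at every time step, and reading them off gives the three kinds of interpretation listed on the slide. In this toy input the visual edge into the output receives a smaller weight than the language edge.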

SLIDE 35

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

construction weights, fusion weights

Interpretation
• Importance of each modality
• Interactions between modalities
• Construction of bimodal and trimodal representations

SLIDE 36

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

construction weights, fusion weights

Interpretation
• Importance of each modality
• Interactions between modalities
• Construction of bimodal and trimodal representations

SLIDE 37

Dynamic Fusion Graph (DFG)

unimodal, bimodal, trimodal, multimodal

t = 1, t = 2, t = 3, t = 4

SLIDE 38

Graph-Memory Fusion Network (Graph-MFN)

unimodal, bimodal, trimodal, multimodal

SLIDE 39

Graph-Memory Fusion Network (Graph-MFN)

unimodal, bimodal, trimodal, multimodal

Gated Memory

SLIDE 40

Graph-Memory Fusion Network (Graph-MFN)

unimodal, bimodal, trimodal, multimodal

Multi-view Gated Memory
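
A gated memory of this kind can be sketched as a retain gate and an update gate applied element-wise. The update rule below, u_t = retain * u_{t-1} + update * tanh(proposal), is an assumption made for illustration; in a trained model the gate scores and the proposal would come from networks over the fused representation, whereas here they are passed in as plain numbers. The name `gated_memory_step` is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_memory_step(memory, proposal, retain_logits, update_logits):
    """One element-wise update: u_t = retain * u_{t-1} + update * tanh(proposal)."""
    new_memory = []
    for u_prev, u_hat, r, z in zip(memory, proposal, retain_logits, update_logits):
        retain = sigmoid(r)  # how much of the old memory to keep
        update = sigmoid(z)  # how much of the new proposal to write
        new_memory.append(retain * u_prev + update * math.tanh(u_hat))
    return new_memory

# Two time steps with fixed, hypothetical gate scores.
memory = [0.0, 0.0]
for proposal in ([1.0, -1.0], [0.5, 0.5]):
    memory = gated_memory_step(memory, proposal, [2.0, 2.0], [0.0, 0.0])

# With the retain gate open and the update gate closed, the memory is left unchanged.
frozen = gated_memory_step([0.3], [5.0], [50.0], [-50.0])
```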

SLIDE 41

Baseline Models

1. Non-temporal Models
  • SVM (Cortes and Vapnik, 1995), DF (Nojavanasghari et al., 2016)
2. Early Fusion
  • EF-LSTM (Hochreiter and Schmidhuber, 1997), EF-RHN (Zilly et al., 2016)
3. Late Fusion
  • TFN (Zadeh et al., 2017), BC-LSTM (Poria et al., 2017)
4. Multi-view Learning
  • MV-LSTM (Rajagopalan et al., 2016)
5. Memory-based Models
  • MFN (Zadeh et al., 2018)
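
The difference between the early and late fusion families can be shown with a minimal sketch. It assumes early fusion means concatenating the per-step feature vectors before a single model sees them, and late fusion means combining the outputs of separate per-modality models. The weighted average used for late fusion is a deliberate simplification and is not how TFN or BC-LSTM combine modalities; both function names are hypothetical.

```python
def early_fusion(language, visual, acoustic):
    """Early fusion: concatenate per-step features into one input vector."""
    return language + visual + acoustic

def late_fusion(predictions, weights=None):
    """Late fusion: combine per-modality predictions by a weighted average."""
    if weights is None:
        weights = [1.0] * len(predictions)
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

# Toy feature vectors and per-modality sentiment scores (hypothetical values).
fused_input = early_fusion([0.1, 0.2], [0.3], [0.4, 0.5])
fused_score = late_fusion([1.0, -0.5, 0.5])
```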

SLIDE 42

State-of-the-art Results

CMU-MOSEI Sentiment (Binary Accuracy)

[Bar chart, axis from 73 to 77: baseline models (SVM-MD, DF, EF-RHN, EF-LSTM, TFN, BC-LSTM, MV-LSTM, MARN, MFN) compared with Graph-MFN, which is highlighted at 76.9%.]

SLIDE 43

State-of-the-art Results

[Four bar charts comparing Graph-MFN with the best baseline model in each panel (MV-LSTM, TFN, or MFN):]
• Sentiment (Correlation)
• Sentiment (Multiclass Accuracy)
• Anger Emotion (Binary Accuracy)
• Disgust Emotion (Binary Accuracy)
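
The metrics named in these charts can be computed as in the sketch below. It assumes sentiment is a real-valued score in [-3, 3], that binary accuracy splits scores at zero, and that multiclass accuracy rounds scores to the nearest integer class; the exact evaluation protocol is not given on the slides, so treat these conventions and the function names as assumptions.

```python
import math

def binary_accuracy(y_true, y_pred):
    """Share of examples whose predicted polarity (>= 0 vs < 0) matches the label."""
    hits = sum((t >= 0) == (p >= 0) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def multiclass_accuracy(y_true, y_pred, low=-3, high=3):
    """Accuracy after rounding each score to the nearest integer class in [low, high]."""
    def to_class(x):
        return max(low, min(high, math.floor(x + 0.5)))
    hits = sum(to_class(t) == to_class(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def pearson(y_true, y_pred):
    """Pearson correlation between labels and predictions."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Toy labels and predictions (hypothetical values).
labels = [2.0, -1.0, 0.5, -2.5]
preds = [1.2, -0.4, -0.3, -2.0]
acc2 = binary_accuracy(labels, preds)
acc7 = multiclass_accuracy(labels, preds)
corr = pearson(labels, preds)
```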

SLIDE 44

Interpretable Fusion

SLIDE 45

Interpretable Fusion

SLIDE 46

Interpretable Fusion

[Figure: fusion weights for trimodal, bimodal, and unimodal nodes]

SLIDE 47

Interpretable Fusion

[Figure: construction weights (trimodal, bimodal) and fusion weights (trimodal, bimodal, unimodal)]

SLIDE 48

Multimodal Fusion has a Dynamic Nature

[Figure: construction weights (trimodal, bimodal) and fusion weights (trimodal, bimodal, unimodal)]

SLIDE 49

Priors in Human Multimodal Language

[Figure: construction weights and fusion weights; labels shown: trimodal, trimodal, bimodal, unimodal]

SLIDE 50

Priors in Human Multimodal Language

[Figure: construction weights (trimodal, bimodal) and fusion weights (trimodal, bimodal, unimodal)]

SLIDE 51

Dynamic Selection of Modalities

[Figure: construction weights (trimodal, bimodal) and fusion weights (trimodal, bimodal, unimodal). Annotation: all modalities are informative]

SLIDE 52

Dynamic Selection of Modalities

[Figure: construction weights (trimodal, bimodal) and fusion weights (trimodal, bimodal, unimodal). Annotation: visual modality uninformative]

SLIDE 53

Computational Modeling of Multimodal Language

1. CMU-MOSEI Dataset
  ✓ Large-scale
  ✓ Diverse

SLIDE 54

Computational Modeling of Multimodal Language

1. CMU-MOSEI Dataset
  ✓ Large-scale
  ✓ Diverse
2. Dynamic Fusion Graph
  ✓ Good Performance
  ✓ Interpretable

SLIDE 55

The End!

Data: https://github.com/A2Zadeh/CMU-MultimodalSDK
Website: www.cs.cmu.edu/~pliang
Email: pliang@cs.cmu.edu
Twitter: @pliang279

SLIDE 56

Workshop: First Grand Challenge and Workshop on Human Multimodal Language
20 July, 9am – 3pm, Room 217