Multi-attention Recurrent Network for Human Communication Comprehension
Amir Zadeh, Paul Pu Liang, Soujanya Poria, Prateek Vij, Erik Cambria, Louis-Philippe Morency
Presenter: Paul Pu Liang
Progress of Artificial Intelligence
Ø Multimedia content
Ø Intelligent personal assistants
Ø Robots and virtual agents
Multimodal Communicative Behaviors
Language
Ø Lexicon
Ø Syntax
Ø Pragmatics
Visual
Ø Gestures
Ø Body language
Ø Eye contact
Ø Facial expressions
Acoustic
Ø Prosody
Ø Vocal expressions (laughter, moans)
What they convey
Ø Sentiment: positive, negative
Ø Emotion: anger, disgust, fear, happiness, sadness, surprise
Ø Personality: confidence, persuasion, passion
Challenge 1: Intra-modal Dynamics
Example: the speaker says "This movie is great" while smiling and nodding. Within each modality, the speaker's behaviors and the sentiment intensity evolve over time.
Challenge 2: Cross-modal Dynamics
a) Multiple co-occurring interactions: "This movie is great" spoken with a smile and a loud voice; the behaviors across modalities jointly shape the sentiment intensity.
b) Different weighted combinations: "This movie is fair" spoken with a smile and a loud voice; the same behaviors combine with different weights.
c) Multiple prediction targets: "This movie is great" spoken with raised eyebrows and a loud voice can signal several emotions at once (happy, surprised).
Multi-attention Recurrent Network (MARN)
1. Modeling intra-modal dynamics: a set of Long-short Term Memories (LSTMs)
2. Modeling cross-modal dynamics: a set of Long-short Term Hybrid Memories (LSTHMs) + a Single-attention Block
3. Modeling multiple cross-modal dynamics: a set of Long-short Term Hybrid Memories + a Multi-attention Block
Challenge 1: Intra-modal Dynamics
Each modality gets its own LSTM: LSTM^l reads the spoken words ("This movie is great"), LSTM^v reads the visual behaviors (smile), and LSTM^a reads the acoustic behaviors (loud voice).
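As a minimal sketch of this idea (not the authors' implementation), each modality's sequence can be encoded by its own independent recurrent memory. The sketch below uses a plain tanh RNN cell as a stand-in for an LSTM to keep it short; all dimensions and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_encode(x_seq, W, U, b):
    """Run a simple tanh RNN (stand-in for an LSTM) over one modality's sequence."""
    h = np.zeros(U.shape[0])
    for x in x_seq:
        # Intra-modal update: only this modality's inputs and memory are seen.
        h = np.tanh(W @ x + U @ h + b)
    return h

# Illustrative feature sizes per modality and a shared hidden size.
dims = {"language": 300, "visual": 35, "acoustic": 74}
d_h = 32
T = 20  # number of (word-aligned) time steps

hidden = {}
for m, d in dims.items():
    W = rng.normal(0, 0.1, (d_h, d))
    U = rng.normal(0, 0.1, (d_h, d_h))
    b = np.zeros(d_h)
    x_seq = rng.normal(0, 1, (T, d))   # stand-in for real modality features
    hidden[m] = rnn_encode(x_seq, W, U, b)  # each modality has its own memory

print({m: h.shape for m, h in hidden.items()})
```

Each of the three recurrences runs in isolation; fusing them is deferred to the cross-modal machinery on the following slides.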
Challenge 2: Cross-modal Dynamics
Ø How do we capture cross-modal dynamics continuously across time?
Challenge 2: Single-attention Block
Ø At each time step t, the per-modality hidden states h_t^l (language), h_t^v (vision), and h_t^a (acoustic) are concatenated into h_t.
Ø An attention network A assigns softmax attention scores a_t over the dimensions of h_t; this captures cross-modal dynamics.
Ø The attended state h̃_t = a_t ⊗ h_t is split back into its language, vision, and acoustic segments, which pass through dimension-reduction networks 𝒟_l, 𝒟_v, 𝒟_a to give s_t^l, s_t^v, s_t^a; their concatenation forms the cross-modal code z_t.
Ø For comparison, a standard LSTM update depends only on its own modality: W^m x_{t+1}^m + U^m h_t^m + b^m.
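The single-attention block described above can be sketched as follows. This is a hedged toy version, not the released MARN code: the attention network is assumed to be a single linear layer followed by a softmax, the dimension-reduction networks are assumed to be one tanh layer each, and all sizes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hidden states from the three modality memories at time t (illustrative sizes).
d_l, d_v, d_a = 32, 16, 16
h_l, h_v, h_a = rng.normal(size=d_l), rng.normal(size=d_v), rng.normal(size=d_a)
h = np.concatenate([h_l, h_v, h_a])   # concatenated hidden state h_t

# Attention network A: linear layer -> softmax scores over every dimension of h_t.
W_A = rng.normal(0, 0.1, (h.size, h.size))
a = softmax(W_A @ h)                  # attention scores a_t (sum to 1)
h_tilde = a * h                       # attended state h~_t = a_t (x) h_t

# Split h~_t back into modality segments, then dimension-reduce (D_l, D_v, D_a).
splits = np.split(h_tilde, [d_l, d_l + d_v])
d_red = 8
s = [np.tanh(rng.normal(0, 0.1, (d_red, part.size)) @ part) for part in splits]

z = np.concatenate(s)                 # cross-modal code z_t
print(z.shape)
```

The key design point is that a_t is computed over the *joint* state h_t, so a high score on a visual dimension can depend on what the language and acoustic memories currently hold.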
Challenge 2: Long-short Term Hybrid Memory
Each modality's LSTM is replaced by a Long-short Term Hybrid Memory (LSTHM), which feeds the cross-modal code z_t back into the recurrent update:
Ø LSTM update: W^m x_{t+1}^m + U^m h_t^m + b^m
Ø LSTHM update: W^m x_{t+1}^m + U^m h_t^m + V^m z_t + b^m
The extra V^m z_t term lets every modality's memory condition on the cross-modal dynamics discovered by the attention block.
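A minimal single-cell sketch of the LSTHM update, assuming the z_t term enters every gate the same way the input and hidden terms do. Dimensions, initializations, and the random stand-in for z_t are illustrative assumptions; in the actual model z_t is produced by the attention block at each step.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

d_x, d_h, d_z = 10, 8, 6

# One (W: input, U: hidden, V: cross-modal z) triple per gate, plus a bias.
p = {g: (rng.normal(0, 0.1, (d_h, d_x)),
         rng.normal(0, 0.1, (d_h, d_h)),
         rng.normal(0, 0.1, (d_h, d_z)),
         np.zeros(d_h)) for g in ("i", "f", "o", "c")}

def lsthm_step(x, h, c, z):
    """One Long-short Term Hybrid Memory step: an LSTM update whose gate
    pre-activations also see the cross-modal code z_t (the extra V @ z term)."""
    pre = {g: W @ x + U @ h + V @ z + b for g, (W, U, V, b) in p.items()}
    i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
    c_new = f * c + i * np.tanh(pre["c"])
    h_new = o * np.tanh(c_new)
    return h_new, c_new

h, c = np.zeros(d_h), np.zeros(d_h)
for _ in range(5):
    x = rng.normal(size=d_x)
    z = rng.normal(size=d_z)   # stand-in; MARN's attention block supplies this
    h, c = lsthm_step(x, h, c, z)
print(h.shape)
```

Setting V to zero recovers an ordinary LSTM cell, which makes the hybrid memory a strict generalization of the intra-modal memories from Challenge 1.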
Challenge 2: Multi-attention Block
Instead of a single attention network, the Multi-attention Block computes several sets of attention scores over h_t in parallel. Each ⊗-weighted copy of h_t is split by modality, and the results pass through the dimension-reduction networks 𝒟_l, 𝒟_v, 𝒟_a to form z_t. Multiple attentions let MARN capture the multiple cross-modal dynamics that co-occur at a single time step.
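One plausible sketch of the multi-attention block, extending the single-attention sketch with K attention networks. Here each modality's K attended segments are concatenated before its dimension-reduction network, which is one reading of the diagram; sizes, K, and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d_l, d_v, d_a = 32, 16, 16
d = d_l + d_v + d_a
h = rng.normal(size=d)            # concatenated hidden state h_t

K = 4                             # number of attentions (a hyperparameter)
W_A = rng.normal(0, 0.1, (K, d, d))
# K independent sets of softmax scores, each producing a weighted copy of h_t.
attended = [softmax(W_A[k] @ h) * h for k in range(K)]

# Per modality: gather that modality's K attended segments, then dimension-reduce.
d_red = 8
z_parts = []
for lo, hi in [(0, d_l), (d_l, d_l + d_v), (d_l + d_v, d)]:
    stacked = np.concatenate([att[lo:hi] for att in attended])  # K * segment dims
    D = rng.normal(0, 0.1, (d_red, stacked.size))               # reduction net D_m
    z_parts.append(np.tanh(D @ stacked))

z = np.concatenate(z_parts)       # cross-modal code z_t
print(z.shape)
```

Because each of the K score vectors is normalized separately, different attentions are free to specialize on different cross-modal interactions, which is what the visualization slides later examine.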
Multi-attention Recurrent Network (MARN)
The full model combines the per-modality LSTHMs (language, vision, acoustic) with the Multi-attention Block: at every time step the block reads the LSTHM hidden states and writes the cross-modal code z_t back into each LSTHM.
Experiments
Features
Ø Language: GloVe word embeddings
Ø Visual: Facet features
Ø Acoustic: COVAREP features (vocal expressions, laughter, moans)
Ø Alignment: word level, using P2FA
Prediction tasks
Ø Sentiment: positive, negative
Ø Emotion: anger, disgust, fear, happiness, sadness, surprise
Ø Personality: confidence, persuasion, passion
Baseline Models
State-of-the-art Results
Chart: CMU-MOSI sentiment analysis accuracy, comparing MARN against the baseline models (THMM, RF, EF-HCRF, MV-HCRF, SVM-MD, C-MKL, DF, SAL-CNN, EF-LSTM, MV-LSTM, BC-LSTM, TFN).
State-of-the-art Results
Charts: sentiment analysis (CMU-MOSI, ICT-MMMO, MOUD, YouTube), emotion recognition (IEMOCAP), and personality trait prediction (POM confidence, persuasion, passion, credibility), comparing MARN against the state-of-the-art baselines.
Multi-attention Block is Important
Charts: MARN versus an ablated model with no Multi-attention Block on sentiment analysis (CMU-MOSI, ICT-MMMO, MOUD, YouTube), emotion recognition (IEMOCAP), and personality trait prediction (POM confidence, persuasion, passion, credibility).
Multiple Attentions are Important
Charts: sentiment analysis accuracy on YouTube and CMU-MOSI as the number of attentions varies.
Visualization
Ø Attentions show diversity and are sensitive to different cross-modal dynamics.
Ø Some attentions are always inactive.
Ø Attentions change behavior across time; some changes are more drastic than others.
Ø Different attentions focus on different modalities.
Multi-attention Recurrent Network (MARN)
1. Modeling intra-modal dynamics: a set of Long-short Term Memories (LSTMs)
2. Modeling cross-modal dynamics: a set of Long-short Term Hybrid Memories (LSTHMs) + a Single-attention Block
3. Modeling multiple cross-modal dynamics: a set of Long-short Term Hybrid Memories + a Multi-attention Block
Code: https://github.com/A2Zadeh/MARN Email: pliang@cs.cmu.edu
Workshop @ ACL 2018: First Workshop on Computational Modeling of Human Multimodal Language
multicomp.cs.cmu.edu/acl2018multimodalchallenge/