

  1. Multi-attention Recurrent Network for Human Communication Comprehension
     Amir Zadeh, Paul Pu Liang, Soujanya Poria, Prateek Vij, Erik Cambria, Louis-Philippe Morency
     Presenter: Paul Pu Liang

  2. Progress of Artificial Intelligence: intelligent robots, multimedia content, personal assistants, and virtual agents.

  3. Multimodal Communicative Behaviors
     Language: lexicon (words), syntax (part-of-speech, dependencies), pragmatics (discourse acts)
     Visual: gestures (head, eye, and arm gestures), body language (body posture, proxemics), eye contact (head gaze, eye gaze), facial expressions (FACS action units, smile, frowning)
     Acoustic: prosody (intonation, voice quality), vocal expressions (laughter, moans)
     Prediction targets: sentiment (positive, negative); emotion (anger, disgust, fear, happiness, sadness, surprise); personality (confidence, persuasion, passion)

  4. Challenge 1: Intra-modal Dynamics
     A speaker's behaviors within a single modality unfold over time: the spoken words "This movie is great", a head nod, and a smile each carry sentiment intensity on their own.

  6. Challenge 2: Cross-modal Dynamics
     a) Multiple co-occurring interactions: "This movie is great" spoken with a smile and a loud voice; the modalities interact to signal high sentiment intensity.

  7. Challenge 2: Cross-modal Dynamics
     b) Different weighted combinations: "This movie is fair" spoken with a smile and a loud voice; the same cues must be weighted differently to predict sentiment intensity.

  8. Challenge 2: Cross-modal Dynamics
     c) Multiple prediction targets: "This movie is great" with raised eyebrows and a loud voice signals both happy and surprised emotions.

  9. Multi-attention Recurrent Network (MARN)
     1) Modeling intra-modal dynamics: a set of Long-short Term Memories (LSTMs)

  10. Multi-attention Recurrent Network (MARN)
     1) Modeling intra-modal dynamics: a set of Long-short Term Memories (LSTMs)
     2) Modeling cross-modal dynamics: a set of Long-short Term Hybrid Memories (LSTHMs) + a single-attention block

  11. Multi-attention Recurrent Network (MARN)
     1) Modeling intra-modal dynamics: a set of Long-short Term Memories (LSTMs)
     2) Modeling cross-modal dynamics: a set of Long-short Term Hybrid Memories (LSTHMs) + a single-attention block
     3) Modeling multiple cross-modal dynamics: a set of Long-short Term Hybrid Memories + a multi-attention block

  12. Challenge 1: Intra-modal Dynamics
     One LSTM runs per modality across time: LSTM_l over the language sequence ("This movie is great"), LSTM_v over the visual sequence (smile), and LSTM_a over the acoustic sequence (loud voice).
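The intra-modal stage above can be sketched in a few lines of numpy: one independent LSTM per modality, stepped across time. This is a toy sketch under assumed dimensions, with an illustrative `lstm_step` helper and random stand-in features, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; the four gates are slices of a single fused projection."""
    z = W @ x + U @ h + b
    d = h.shape[0]
    i, f, o = sigmoid(z[:d]), sigmoid(z[d:2*d]), sigmoid(z[2*d:3*d])
    g = np.tanh(z[3*d:])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

rng = np.random.default_rng(0)
d_in, d_h, T = 8, 4, 5                     # toy feature size, hidden size, sequence length
# One independent LSTM per modality: language, vision, acoustic.
params = {m: (rng.normal(size=(4*d_h, d_in)), rng.normal(size=(4*d_h, d_h)),
              np.zeros(4*d_h)) for m in ("language", "vision", "acoustic")}
states = {m: (np.zeros(d_h), np.zeros(d_h)) for m in params}
for t in range(T):
    for m, (W, U, b) in params.items():
        x_t = rng.normal(size=d_in)        # stand-in for the modality's features at time t
        h, c = states[m]
        states[m] = lstm_step(x_t, h, c, W, U, b)

print({m: states[m][0].shape for m in states})
```

At this stage the three memories never exchange information, which is exactly the limitation the cross-modal machinery on the next slides addresses.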

  13. Challenge 2: Cross-modal Dynamics
     How do we capture cross-modal dynamics continuously across time, on top of the per-modality LSTM_l, LSTM_v, and LSTM_a?

  14. Challenge 2: Single-attention Block
     At each time step t, the block reads the concatenated hidden states [h_t^l; h_t^v; h_t^a] and captures cross-modal dynamics.

  15. Challenge 2: Single-attention Block
     An attention network A maps the concatenated hidden states [h_t^l; h_t^v; h_t^a] to attention scores a_t.

  16. Challenge 2: Single-attention Block
     The scores gate the hidden states element-wise (a_t ⊗ [h_t^l; h_t^v; h_t^a]), highlighting the relevant language, vision, and acoustic dimensions.

  17. Challenge 2: Single-attention Block
     The attended portion of each modality is passed through a modality-specific dense network D_m, giving s_t^l, s_t^v, s_t^a.

  18. Challenge 2: Single-attention Block
     The dense-network outputs s_t^l, s_t^v, s_t^a are merged into the cross-modal code z_t.
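The single-attention block on slides 14-18 can be sketched as follows. A minimal numpy version, assuming a one-layer linear attention network and toy dimensions (`W_att`, `d_s`, and the tanh dense networks are illustrative choices, not the paper's exact architecture):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 4                                             # hidden size per modality (toy value)
h_l, h_v, h_a = (rng.normal(size=d) for _ in range(3))
h_cat = np.concatenate([h_l, h_v, h_a])           # [h_t^l; h_t^v; h_t^a]

# Attention network A: here a single linear layer + softmax over all 3d dimensions.
W_att = rng.normal(size=(3 * d, 3 * d))
a = softmax(W_att @ h_cat)                        # attention scores a_t, sum to 1
h_tilde = a * h_cat                               # element-wise gating a_t ⊗ h_cat

# Per-modality dense networks D_m reduce each attended chunk to s_t^m;
# their outputs are concatenated into the cross-modal code z_t.
d_s = 2
D = {m: rng.normal(size=(d_s, d)) for m in ("l", "v", "a")}
s = [np.tanh(D[m] @ chunk) for m, chunk in zip(("l", "v", "a"), np.split(h_tilde, 3))]
z = np.concatenate(s)
print(z.shape)
```

The softmax ties the three modalities together: a dimension of one modality can only receive high weight at the expense of dimensions in the others, which is how a single attention encodes one cross-modal interaction.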

  19. Challenge 2: Single-attention Block
     LSTM update: each modality's memory is updated from its own input x_t and previous hidden state h_{t-1} (terms W x_t + U h_{t-1} + b in the gates).

  20. Challenge 2: Long-short Term Hybrid Memory
     LSTHM update: the standard LSTM gates gain an extra term W_m z_{t-1}, so each modality's memory is also conditioned on the cross-modal code z from the previous time step.
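The LSTHM update is a one-term change to the LSTM cell: every gate also sees the previous cross-modal code. A sketch under assumed toy dimensions (the matrix names `W`, `U`, `V` and the fused-gate layout are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lsthm_step(x, h, c, z_prev, W, U, V, b):
    """LSTHM: a standard LSTM update whose gates additionally condition on the
    cross-modal code from the previous time step (the extra V @ z_prev term)."""
    pre = W @ x + U @ h + V @ z_prev + b
    d = h.shape[0]
    i, f, o = (sigmoid(pre[k*d:(k+1)*d]) for k in range(3))
    g = np.tanh(pre[3*d:])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

rng = np.random.default_rng(1)
d_in, d_h, d_z = 8, 4, 6
W = rng.normal(size=(4*d_h, d_in))
U = rng.normal(size=(4*d_h, d_h))
V = rng.normal(size=(4*d_h, d_z))   # the hybrid part: maps z_{t-1} into every gate
b = np.zeros(4*d_h)

h, c = np.zeros(d_h), np.zeros(d_h)
z_prev = rng.normal(size=d_z)       # cross-modal code from the attention block at t-1
h, c = lsthm_step(rng.normal(size=d_in), h, c, z_prev, W, U, V, b)
print(h.shape, c.shape)
```

Setting `V` to zero recovers a plain LSTM, which is why the slides present the LSTHM as the LSTM update "plus" the cross-modal term.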

  21. Challenge 2: Multi-attention Block
     The attention network outputs K attention score vectors instead of one; each gates the hidden states separately, and the dense networks D_m combine all K attended copies into the cross-modal code z_t.
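Extending the earlier single-attention sketch to K attentions is mechanical: emit K softmaxed score rows, gate K copies of the concatenated hidden states, and let each modality's dense network read its chunk from all K copies. Toy numpy sketch with assumed dimensions and illustrative parameter names:

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
d, K = 4, 3                                   # per-modality hidden size, number of attentions
h_cat = rng.normal(size=3 * d)                # [h_t^l; h_t^v; h_t^a]

# One attention network emitting K score vectors at once, each softmaxed over 3d dims.
W_att = rng.normal(size=(K, 3 * d, 3 * d))
scores = softmax_rows(np.einsum('kij,j->ki', W_att, h_cat))   # (K, 3d), rows sum to 1
attended = scores * h_cat                     # K differently-gated copies of h_cat

# Each modality's dense network D_m sees its chunk from all K attentions.
d_s = 2
D = {m: rng.normal(size=(d_s, K * d)) for m in ("l", "v", "a")}
chunks = np.split(attended, 3, axis=1)        # per-modality (K, d) blocks
z = np.concatenate([np.tanh(D[m] @ chunks[i].ravel())
                    for i, m in enumerate(("l", "v", "a"))])
print(z.shape)                                # cross-modal code z_t
```

Each of the K rows can specialize to a different cross-modal interaction, which is the point of the visualization slides later in the deck.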

  22. Multi-attention Recurrent Network (MARN)
     The full model: one LSTHM per modality (LSTHM_l, LSTHM_v, LSTHM_a), exchanging information through the multi-attention block at every time step.

  23. Experiments
     Language: GloVe word embeddings
     Visual: Facet features (FACS action units, emotions)
     Acoustic: COVAREP features (MFCCs, pitch tracking, vocal expressions such as laughter and moans)
     Alignment: word level (P2FA)
     Prediction targets: sentiment (positive, negative); emotion (anger, disgust, fear, happiness, sadness, surprise); personality (confidence, persuasion, passion)

  24. Baseline Models
     1. Non-temporal models: SVM-MD, RF
     2. Early fusion: HMM, EF-LSTM, EF-HCRF, C-MKL, SAL-CNN
     3. Late fusion: DF, TFN, BC-LSTM
     4. Multi-view learning: MV-HMMs, MV-HCRFs, MV-LSTM

  25. State-of-the-art Results
     [Bar chart: CMU-MOSI sentiment analysis accuracy. MARN outperforms all baseline models: THMM, RF, EF-HCRF, MV-HCRF, SVM-MD, C-MKL, DF, SAL-CNN, EF-LSTM, MV-LSTM, BC-LSTM, TFN.]

  26. State-of-the-art Results
     [Bar charts: MARN vs. the state-of-the-art baseline on sentiment analysis (CMU-MOSI, ICT-MMMO, MOUD, YouTube), emotion recognition (IEMOCAP), and personality trait prediction (POM confidence, persuasion, passion, credibility).]

  27. Multi-attention Block is Important
     [Bar charts: removing the multi-attention block hurts performance on sentiment analysis (CMU-MOSI, ICT-MMMO, MOUD, YouTube), emotion recognition (IEMOCAP), and personality trait prediction (POM confidence, persuasion, passion, credibility).]

  28. Multiple Attentions are Important
     [Line charts: accuracy vs. number of attentions K, for YouTube sentiment analysis (K = 1-5) and CMU-MOSI sentiment analysis (K = 1-6).]

  29. Visualization
     Attentions show diversity and are sensitive to different cross-modal dynamics. [Heatmap: attention activations, inactive to active.]

  30. Visualization
     Some attentions stay always inactive: they carry only intra-modal dynamics and no cross-modal dynamics. [Heatmap: inactive to active.]

  31. Visualization
     Attentions change behavior across time; some changes are more drastic than others. [Heatmap across time: inactive to active.]

  32. Visualization
     Different attentions focus on different modalities. [Heatmaps: inactive to active per attention.]

  33. Multi-attention Recurrent Network (MARN)
     1) Modeling intra-modal dynamics: a set of Long-short Term Memories (LSTMs)
     2) Modeling cross-modal dynamics: a set of Long-short Term Hybrid Memories (LSTHMs) + a single-attention block
     3) Modeling multiple cross-modal dynamics: a set of Long-short Term Hybrid Memories + a multi-attention block

  34. The End!
     Code: https://github.com/A2Zadeh/MARN
     Email: pliang@cs.cmu.edu
     Workshop @ ACL 2018: First Workshop on Computational Modeling of Human Multimodal Language
     multicomp.cs.cmu.edu/acl2018multimodalchallenge/
