Ranking Emotional Attributes With Deep Neural Networks - PowerPoint PPT Presentation




SLIDE 1

msp.utdallas.edu

Ranking Emotional Attributes With Deep Neural Networks

Srinivas Parthasarathy, Reza Lotfian and Carlos Busso
Multimodal Signal Processing (MSP) Lab
Erik Jonsson School of Engineering and Computer Science
The University of Texas at Dallas
March 8, 2017

SLIDE 2

Motivation


  • Emotion recognition systems can be trained to:
    • Classify discrete categories such as Happy, Neutral, Angry, etc.
    • Classify or predict values of emotional attributes such as:
      • Arousal (passive vs. active)
      • Valence (negative vs. positive)

[Figure: arousal-valence space, with axes from very passive to very active (arousal) and very negative to very positive (valence), locating Angry, Happy, Neutral and Sad]

SLIDE 3

Motivation


  • Humans are better at relative comparisons than at absolute values
  • Rank emotional attributes rather than performing absolute classification/regression
  • Appealing for emotional retrieval tasks:
    • Rank-order aggressive behavior
    • Retrieve target behaviors with given emotions


SLIDE 4

Related Work

  • Commonly formulated as comparisons between pairs of samples ("Which is angrier?")
  • Rankers for categorical emotions (e.g., angry rankers) [Cao et al. 2012, 2014]
    • Pairs formed between the preferred emotion and other emotions
  • Preference learning methods were used to learn from continuous ratings [Martinez et al. 2014]
  • Alternative framework to study trends where raters agreed [Parthasarathy et al. 2016]

SLIDE 5

Contributions

  • We rank-order emotional attributes
  • None of the previous studies have focused on using neural-network learning techniques for preference learning
  • We utilize a neural network framework for preference learning: RankNet
  • To our knowledge, this is the first study that uses neural networks for ranking emotional attributes


SLIDE 6

RankNet


  • Given: samples j, k, with features x_j, x_k
  • Goal: find g that learns the probability, P_jk, that j ≻ k
  • A neural network learns the function g, which maps a feature vector x to a score g(x)
  • Probabilistic framework:
    • P_jk ≡ 1 / (1 + e^(-σ(g(x_j) - g(x_k))))

[Diagram: x → g(x); the difference g(x_j) - g(x_k) passes through a sigmoid to give P_jk]

SLIDE 7

RankNet

  • Ideal (target) probabilities P̄_jk are set according to the preference within each pair of samples:
    • P̄_jk = 1 if j ≻ k
    • P̄_jk = 0 if k ≻ j
  • Cross entropy is then used as the cost function to measure the deviation of the model:
    • C = -P̄_jk log P_jk - (1 - P̄_jk) log(1 - P_jk)
  • Simplifies to:
    • C = log(1 + e^(-σ(g(x_j) - g(x_k)))) when P̄_jk = 1
    • C = log(1 + e^(-σ(g(x_k) - g(x_j)))) when P̄_jk = 0


SLIDE 8

RankNet Framework

  • The RankNet network can be modeled with a Siamese architecture
  • Features of the pairs of samples are fed at the input
  • Two identical neural networks that share all parameters are trained

[Diagram: x_j and x_k each pass through a shared feedforward DNN; the scores combine into P_jk and the cost C]
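The weight sharing can be sketched in plain Python with a toy one-hidden-layer scorer; the layer sizes, random initialization and inputs are illustrative only:

```python
import math
import random

random.seed(0)

def make_scorer(n_in, n_hidden):
    """Build one scoring function g(.) with a single set of weights.

    Applying the same g to both branches is the Siamese weight sharing:
    there is only one network, evaluated twice."""
    w1 = [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [random.gauss(0, 0.1) for _ in range(n_hidden)]
    def g(x):
        # Hidden layer with sigmoid activation, then linear output score
        h = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, x))))
             for row in w1]
        return sum(w * a for w, a in zip(w2, h))
    return g

g = make_scorer(4, 8)  # one network, shared by both branches
x_j, x_k = [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]
p_jk = 1.0 / (1.0 + math.exp(-(g(x_j) - g(x_k))))  # P_jk from shared scores
```

Because both branches use the same g, swapping the inputs gives the complementary probability: P_kj = 1 - P_jk.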

SLIDE 9

Baselines

  • RankSVM framework for recognizing emotional attributes [Lotfian & Busso 2016]
  • Given j ≻ k, the goal is to:

    min_{w,ξ} (1/2)‖w‖² + C Σ_{j,k} ξ_jk
    s.t. ⟨w, x_j - x_k⟩ ≥ 1 - ξ_jk and ξ_jk ≥ 0

  • Reduced to binary classification on the difference x_j - x_k
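The reduction to binary classification amounts to labeling each feature difference ±1 and training any binary classifier on the result. A minimal sketch of the pair construction (an illustrative helper, not the [Lotfian & Busso 2016] code):

```python
def pairs_to_binary(features, prefs):
    """Reduce ranking to binary classification on feature differences.

    features: dict sample_id -> feature vector (list of floats)
    prefs: list of (j, k) pairs meaning j is preferred over k
    Returns (X, y): x_j - x_k is labeled +1, x_k - x_j is labeled -1.
    """
    X, y = [], []
    for j, k in prefs:
        diff = [a - b for a, b in zip(features[j], features[k])]
        X.append(diff)
        y.append(+1)                 # x_j - x_k  -> class +1
        X.append([-d for d in diff])
        y.append(-1)                 # x_k - x_j  -> class -1
    return X, y

X, y = pairs_to_binary({"a": [2.0, 0.0], "b": [1.0, 1.0]}, [("a", "b")])
```

A linear classifier trained on (X, y) then yields the weight vector w of the ranking function.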
SLIDE 10

Differences

  • RankSVM
    • Input is restricted to the difference between features, x_j - x_k
    • Large-margin classifier
    • Redundant data can be removed
    • Performance does not increase with more data [Lotfian & Busso 2016]
    • Kernel methods for non-linear classification
  • RankNet
    • Features x are fed individually, with no restrictions
    • Learns a non-linear mapping g(x)
    • Optimized for pairs of samples
    • Highly data- and parameter-dependent

[Diagram: RankSVM takes x_j - x_k into an SVM; RankNet feeds x_j and x_k into a DNN, with P_jk ≡ 1 / (1 + e^(-σ(g(x_j) - g(x_k))))]

SLIDE 11

Baselines

  • DNNRegression: regression using DNNs
  • No relative comparisons
  • Use the scores g(x) to rank-order sentences

[Diagram: x → g(x)]

SLIDE 12

Databases

  • Train: USC-IEMOCAP
    • 12 hours of conversational recordings from 10 actors in dyadic sessions
    • Sessions consist of emotional scripts as well as improvised interactions
    • All speaking turns annotated for emotional attributes by two raters on a scale of 1-5
    • Arousal, valence and dominance
  • Test: MSP-IMPROV
    • Improvisations between actors (12 actors)
    • Contains 8,438 speaking turns
    • Annotated with novel crowdsourcing methods on a scale of 1-5 by at least 5 raters
    • Arousal, valence and dominance

SLIDE 13

Experimental Settings

  • Acoustic features
    • Geneva Minimalistic Acoustic Parameter Set [Eyben et al. 2016]
    • Minimalistic features selected based on their performance in previous studies
    • Extended set: 88 features
    • Reproducibility (no feature selection)
    • Theoretical significance
  • All DNN architectures include:
    • 2-hidden-layer feedforward architecture, 256 nodes each
    • Sigmoid activation function
    • Stochastic gradient descent with a learning rate of 10^-4 for 100 epochs


SLIDE 14

Experimental Settings

  • Relative labels: consider samples separated by a margin t
    • score_1 - score_2 > t
  • Tradeoff between t and data size: as t increases, the reliability of the pairs increases but the amount of data decreases
  • RankSVM: t = 1.0 for arousal and dominance, t = 0.9 for valence [Lotfian & Busso 2016]
  • For RankNet we study the performance for t ∈ {0, 1, 2, 3}
  • Regression has no relative scores
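The margin-based pair selection can be sketched as follows; the helper and toy ratings are illustrative, not the paper's data pipeline:

```python
def make_pairs(scores, t):
    """Build preference pairs (j, k), meaning j is preferred over k,
    keeping only pairs whose annotation scores differ by more than t."""
    ids = list(scores)
    return [(a, b) for a in ids for b in ids if scores[a] - scores[b] > t]

scores = {"u1": 4.5, "u2": 2.0, "u3": 3.0}  # attribute ratings on a 1-5 scale
few = make_pairs(scores, 2)    # larger t: fewer but more reliable pairs
more = make_pairs(scores, 0)   # t = 0 keeps every ordered pair
```

This makes the tradeoff on the slide concrete: raising t discards ambiguous pairs (small rating differences) at the cost of training data.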

SLIDE 15

Evaluation

  • Precision at k (P@k)
    • Measures the precision when retrieving the top and bottom k% of the ordered samples
  • Ground truth is split into high and low classes about the median
  • Evaluate success in retrieving samples on the correct side of the split

[Figure: speech samples ordered by arousal]
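The metric can be sketched as below, under our reading of the slide (top k% scored against the high class, bottom k% against the low class); names and the toy ranking are illustrative:

```python
def precision_at_k(ranked_ids, high_ids, k_percent):
    """P@k: take the top k% and bottom k% of the ranked list and score
    the fraction that fall on the correct side of the median split.

    high_ids holds the ground-truth 'high' class.
    """
    n = len(ranked_ids)
    k = max(1, n * k_percent // 100)
    top, bottom = ranked_ids[:k], ranked_ids[-k:]
    correct = sum(1 for s in top if s in high_ids)
    correct += sum(1 for s in bottom if s not in high_ids)
    return correct / (2 * k)

ranked = list(range(10))  # ids ordered from most to least active
high = set(range(5))      # ground truth: first five are the 'high' class
p_at_20 = precision_at_k(ranked, high, 20)
```

On this toy example the ranking agrees perfectly with the split, so P@20 is 1.0; a fully inverted ranking would score 0.0.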
SLIDE 16

Effect of Margin on RankNet


  • Attributes annotated on a scale of 1-5
  • P@10, P@20, P@30
  • We see improvement for t = 1, 2 but a decrease for t = 3
  • Use t = 2 for RankNet
SLIDE 17

Comparisons


                     RankSVM   RankNet   DNNRegression
Arousal     P@10      85.77     88.02      87.54
            P@20      80.81     83.93*     83.72*
            P@30      77.15     79.32*     79.02*
Valence     P@10      63.46     71.29*     69.28*
            P@20      59.79     64.77*     63.76*
            P@30      57.26     61.66*     61.13*
Dominance   P@10      76.79     86.15*     84.67*
            P@20      73.97     79.94*     79.61*
            P@30      70.95     75.65*     75.33*

* Denotes Statistical Significance over RankSVM (population proportion)

SLIDE 18

Results

  • Kendall's tau coefficient τ
    • Correlation between the two ordered lists, in [-1, 1]

                RankSVM   RankNet   DNNRegression
  Arousal        0.36      0.41*      0.41*
  Valence        0.08      0.14*      0.13*
  Dominance      0.28      0.35*      0.34*

  • RankNet and DNNRegression outperform RankSVM in all cases for P@k and Kendall's τ
  • Kendall's τ values are better than those reported in previous studies
    • τ ≈ 0.02 for arousal and 0.05 for valence [Martinez et al. 2014]
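For reference, Kendall's τ between two orderings of the same items can be computed directly from concordant and discordant pairs (a plain-Python sketch of the standard definition, without tie handling):

```python
def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same items:
    (concordant - discordant) / number of pairs, in [-1, 1]."""
    pos_a = {x: i for i, x in enumerate(order_a)}
    pos_b = {x: i for i, x in enumerate(order_b)}
    items = list(order_a)
    conc = disc = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            # Same relative order in both lists -> concordant pair
            s = ((pos_a[items[i]] - pos_a[items[j]])
                 * (pos_b[items[i]] - pos_b[items[j]]))
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    n = len(items)
    return (conc - disc) / (n * (n - 1) / 2)
```

Identical orderings give τ = 1 and fully reversed orderings give τ = -1, which is why the reported values near 0.4 indicate substantial agreement.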
SLIDE 19

Conclusions

  • Benefits of using deep neural network architectures for ranking emotional attributes
  • Cross-corpora evaluations show that RankNet outperforms RankSVM for P@k and Kendall's τ
  • Future work:
    • Use other architectures (RNN-LSTMs) for preference learning to outperform DNNRegression
    • Ranking for emotional classes
    • Role of training data size in performance: will we see better performance as data size increases?


SLIDE 20

Thanks for your attention! Questions?