Joint Emotion Analysis via Multi-task Gaussian Processes
Daniel Beck, Trevor Cohn, Lucia Specia
October 28, 2014

Outline
1 Introduction
2 Multi-task Gaussian Process Regression
3 Experiments and Discussion
4 Conclusions and Future Work
1 Introduction
Emotion Analysis
Goal: automatically detect emotions in a text [Strapparava and Mihalcea, 2008].

Headline                                        Fear   Joy   Sadness
Storms kill, knock out power, cancel flights      82     —        60
Panda cub makes her debut                          —    59         —
Why Multi-task?
Learn a model that shows sound and interpretable correlations between emotions.
Datasets are scarce and small → multi-task models are able to learn from all emotions jointly;
The annotation scheme is subjective and fine-grained → prone to bias and noise;
Disclaimer: this work is not about features (at the moment...).
Multi-task learning and Anti-correlations
Most multi-task models used in NLP assume some degree of (positive) correlation between tasks:
Domain Adaptation: assumes the existence of “general”, domain-independent knowledge in the data.
Annotation Noise Modelling: assumes that annotations are noisy deviations from a “ground truth”.
For Emotion Analysis, we need a multi-task model that is able to take possible anti-correlations into account, avoiding negative transfer: in the headlines above, Fear and Sadness are high precisely when Joy is low.
2 Multi-task Gaussian Process Regression
Gaussian Processes
Let (X, y) be the training data and f(x) the latent function that models that data. We place a GP prior on f, defined by a mean function μ(x) and a kernel (covariance) function k(x, x′):

    f(x) ∼ GP(μ(x), k(x, x′))

Combining the prior p(f) with the likelihood p(y | X, f) via Bayes' rule gives the posterior over the latent function, normalised by the marginal likelihood p(y | X):

    p(f | X, y) = p(y | X, f) p(f) / p(y | X)

Predictions for a test input x* come from the predictive distribution, which integrates the test likelihood against the posterior:

    p(y* | x*, X, y) = ∫ p(y* | x*, f, X, y) p(f | X, y) df
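To make the idea of a prior over functions concrete, here is a minimal NumPy sketch (illustrative only, not part of the original slides) that draws sample functions from a zero-mean GP prior with an RBF kernel; the grid, amplitude, and lengthscale values are arbitrary choices:

```python
import numpy as np

def rbf_kernel(X1, X2, alpha=1.0, lengthscale=1.0):
    """Isotropic RBF kernel: alpha^2 * exp(-0.5 * ||x - x'||^2 / lengthscale^2)."""
    sqdist = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return alpha**2 * np.exp(-0.5 * sqdist / lengthscale**2)

# Evaluate a zero-mean GP prior on a 1-D grid and draw three sample functions.
X_grid = np.linspace(-5.0, 5.0, 100)[:, None]
K = rbf_kernel(X_grid, X_grid) + 1e-8 * np.eye(100)   # jitter for numerical stability
samples = np.random.multivariate_normal(np.zeros(100), K, size=3)
```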
GP Regression
Likelihood: in a regression setting, we usually assume a Gaussian likelihood, which allows us to obtain a closed-form solution for the test posterior.
Kernel: many options are available. In this work we use the Radial Basis Function (RBF) kernel¹:

    k(x, x′) = α_f² × exp( −(1/2) Σ_{i=1}^{F} (x_i − x′_i)² / l_i )

¹ Also known as the Squared Exponential, Gaussian, or Exponentiated Quadratic kernel.
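Under the Gaussian likelihood, the predictive distribution above has a closed form. The following is a minimal NumPy sketch of exact GP regression with the ARD RBF kernel; it is an illustrative re-implementation with assumed hyperparameter values, not the authors' actual code:

```python
import numpy as np

def rbf_ard(X1, X2, alpha_f=1.0, lengthscales=None):
    """ARD RBF kernel: alpha_f^2 * exp(-0.5 * sum_i (x_i - x'_i)^2 / l_i)."""
    if lengthscales is None:
        lengthscales = np.ones(X1.shape[1])
    diff = X1[:, None, :] - X2[None, :, :]                  # shape (n1, n2, F)
    return alpha_f**2 * np.exp(-0.5 * np.sum(diff**2 / lengthscales, axis=-1))

def gp_predict(X, y, X_star, noise_var=0.1):
    """Closed-form predictive mean/variance for GP regression with Gaussian noise."""
    K = rbf_ard(X, X) + noise_var * np.eye(len(X))
    K_s = rbf_ard(X, X_star)
    K_ss = rbf_ard(X_star, X_star)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))     # K^{-1} y
    mean = K_s.T @ alpha                                    # predictive mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0) + noise_var  # predictive variance
    return mean, var
```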
The Intrinsic Coregionalisation Model
Coregionalisation models extend GPs to vector-valued outputs [Álvarez et al., 2012]. Here we use the Intrinsic Coregionalisation Model (ICM):

    k((x, d), (x′, d′)) = k_data(x, x′) × B_{d,d′}

where k_data is a kernel on the data points (an RBF, for instance) and B is the coregionalisation matrix, which encodes the task covariances. B can be parameterised and learned by optimising the model's marginal likelihood.
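As a sketch of how the ICM kernel matrix can be assembled (assuming each input row is paired with an integer task index d and that B is given), something like the following NumPy snippet applies:

```python
import numpy as np

def icm_kernel(X1, d1, X2, d2, B, data_kernel):
    """ICM kernel: k((x, d), (x', d')) = k_data(x, x') * B[d, d'].

    d1, d2 are integer task (emotion) indices, one per row of X1 / X2;
    B is the D x D coregionalisation matrix; data_kernel computes k_data.
    """
    return data_kernel(X1, X2) * B[np.ix_(d1, d2)]

# Example layout: with D emotions annotated on the same headlines, each headline
# contributes D rows, e.g.
#   X = np.repeat(features, D, axis=0)
#   d = np.tile(np.arange(D), len(features))
```

With this layout, the resulting Gram matrix contains blocks of k_data(X, X), each scaled by the corresponding entry of B, which is how inter-task transfer (and anti-correlation) enters the model.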
PPCA model
[Bonilla et al., 2008] decompose B using PPCA:

    B = U Λ Uᵀ + diag(α)

To ensure numerical stability, we employ the incomplete Cholesky decomposition over U Λ Uᵀ:

    B = L̃ L̃ᵀ + diag(α)

[Figure: B assembled as L̃ × L̃ᵀ + diag(α), where L̃ is a D × rank matrix for D = 6 emotions; a rank-1 L̃ gives 12 hyperparameters, rank 2 gives 18, and rank 3 gives 24.]
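A minimal NumPy sketch of this construction, with random placeholder values for L̃ and α purely to show the shapes and the hyperparameter counts from the slides:

```python
import numpy as np

D, rank = 6, 2                            # six emotions, rank-2 low-rank term
rng = np.random.default_rng(0)

L_tilde = rng.normal(size=(D, rank))      # D * rank hyperparameters
alpha = rng.uniform(0.1, 1.0, size=D)     # D diagonal hyperparameters

B = L_tilde @ L_tilde.T + np.diag(alpha)  # positive semi-definite by construction

# Total coregionalisation hyperparameters: D * rank + D
# rank 1 -> 12, rank 2 -> 18, rank 3 -> 24 (for D = 6), as on the slides.
```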
3 Experiments and Discussion
Experimental Setup
Dataset: SemEval-2007 “Affective Text” [Strapparava and Mihalcea, 2007];
1000 news headlines, each annotated with six scores in [0, 100], one per emotion;
Bag-of-words representation as features;
Pearson's correlation coefficient as the evaluation metric.
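For illustration, a small sketch of the feature extraction and evaluation metric using scikit-learn and SciPy; the first headline and its Fear score of 82 come from the example slide, while the other scores and the model predictions are placeholders, not real data:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_extraction.text import CountVectorizer

# Example headlines and hypothetical Fear scores (only the 82 is from the slides).
headlines = ["Storms kill, knock out power, cancel flights",
             "Panda cub makes her debut",
             "New medicine cures rare disease"]          # third headline is invented
fear_gold = np.array([82.0, 2.0, 10.0])

# Bag-of-words features.
X = CountVectorizer().fit_transform(headlines).toarray()

# Evaluation: Pearson's correlation between gold scores and model predictions.
fear_pred = np.array([70.0, 5.0, 20.0])                  # placeholder predictions
r, _ = pearsonr(fear_gold, fear_pred)
```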
Learned Task Covariances
[Figure: learned coregionalisation matrix B, trained on 100 sentences.]
Prediction Results
[Figure: prediction results per emotion; train/test split: 100/900.]
Training Set Size Influence
[Figure: performance as the training set grows; split: 100+/100.]
4 Conclusions and Future Work
Conclusions and Future Work
Conclusions
The proposed model is able to learn sensible correlations and anti-correlations;
For small datasets, it also outperforms single-task baselines.
Future Work
Modelling the label distribution (different priors, different likelihoods);
Multiple multi-task levels (for example, MTurk data [Snow et al., 2008]);
Other multi-task GP models [Álvarez et al., 2012, Hensman et al., 2013].
Error Analysis
[Figure: error analysis (extra slide).]
References

Álvarez, M. A., Rosasco, L., and Lawrence, N. D. (2012). Kernels for Vector-Valued Functions: a Review. Foundations and Trends in Machine Learning, pages 1–37.

Bonilla, E. V., Chai, K. M. A., and Williams, C. K. I. (2008). Multi-task Gaussian Process Prediction. In Advances in Neural Information Processing Systems.

Cohn, T. and Specia, L. (2013). Modelling Annotator Bias with Multi-task Gaussian Processes: An Application to Machine Translation Quality Estimation. In Proceedings of ACL.

Hensman, J., Lawrence, N. D., and Rattray, M. (2013). Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters. BMC Bioinformatics, 14:252.

Snow, R., O'Connor, B., Jurafsky, D., and Ng, A. Y. (2008). Cheap and Fast - But is it Good?: Evaluating Non-Expert Annotations for Natural Language Tasks. In Proceedings of EMNLP.

Strapparava, C. and Mihalcea, R. (2007). SemEval-2007 Task 14: Affective Text. In Proceedings of SemEval.

Strapparava, C. and Mihalcea, R. (2008). Learning to identify emotions in text. In Proceedings of the 2008 ACM Symposium on Applied Computing.