Natural Language Processing with Deep Learning
Sentiment Analysis with Machine Learning

Navid Rekab-Saz
navid.rekabsaz@jku.at
Institute of Computational Perception
Agenda
- Introduction to Machine Learning
- Sentiment Analysis
- Feature Extraction
- Breaking the curse of dimensionality!
Notation
§ $a$ → a value or a scalar
§ $\boldsymbol{a}$ → an array or a vector
- the $i$-th element of $\boldsymbol{a}$ is the scalar $a_i$
§ $\boldsymbol{A}$ → a set of arrays or a matrix
- the $i$-th vector of $\boldsymbol{A}$ is $\boldsymbol{a}_i$
- the $j$-th element of the $i$-th vector of $\boldsymbol{A}$ is the scalar $a_{i,j}$
Linear Algebra – Recap
§ Transpose
- $\boldsymbol{a}$ is in $1 \times d$ dimensions → $\boldsymbol{a}^T$ is in $d \times 1$ dimensions
- $\boldsymbol{B}$ is in $e \times d$ dimensions → $\boldsymbol{B}^T$ is in $d \times e$ dimensions
§ Inverse of the square matrix $\boldsymbol{Z}$ is $\boldsymbol{Z}^{-1}$
§ Dot product
- $\boldsymbol{a} \cdot \boldsymbol{b}^T = c$, dimensions: $1 \times d$ · $d \times 1$ = a scalar
- $\boldsymbol{a} \cdot \boldsymbol{C} = \boldsymbol{d}$, dimensions: $1 \times d$ · $d \times e$ = $1 \times e$
- $\boldsymbol{B} \cdot \boldsymbol{C} = \boldsymbol{D}$, dimensions: $l \times m$ · $m \times n$ = $l \times n$
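To make the shape rules concrete, here is a minimal NumPy sketch (the library choice is an assumption; the slides name no library):

```python
import numpy as np

a = np.random.rand(1, 4)   # a: 1×d row vector (d = 4)
B = np.random.rand(3, 4)   # B: e×d matrix (e = 3)
C = np.random.rand(4, 5)   # C: d×e matrix (e = 5)

print(a.T.shape)        # transpose: (4, 1), i.e. d×1
print(B.T.shape)        # transpose: (4, 3), i.e. d×e
print((a @ a.T).shape)  # dot product 1×d · d×1 -> (1, 1), a scalar
print((a @ C).shape)    # 1×d · d×e -> (1, 5)
print((B @ C).shape)    # l×m · m×n -> (3, 5)

Z = np.random.rand(4, 4)
print(np.linalg.inv(Z).shape)  # inverse of a square matrix: (4, 4)
```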
Statistical Learning
§ Given $n$ observed data points:

$\boldsymbol{X} = [\boldsymbol{x}_1, \boldsymbol{x}_2, \dots, \boldsymbol{x}_n]$

accompanied with output (label) values:

$\boldsymbol{y} = [y_1, y_2, \dots, y_n]$

and each data point is defined as a vector with $d$ dimensions (features):

$\boldsymbol{x}_i = [x_{i,1}, x_{i,2}, \dots, x_{i,d}]$
Statistical Learning
§ Statistical learning assumes that there exists a TRUE function ($f_{\text{TRUE}}$) that has generated these data:

$\boldsymbol{y} = f_{\text{TRUE}}(\boldsymbol{X}) + \epsilon$

§ $f_{\text{TRUE}}$
- The true but unknown function that produces the data
- A fixed function
§ $\epsilon > 0$
- Called the irreducible error
- Rooted in the constraints of gathering data, and of measuring and quantifying features
Example $f_{\text{TRUE}}$

[Figure: the income example. The blue surface is $f_{\text{TRUE}}$; the red points are data points $\boldsymbol{X}$ with two features (Seniority, Years of Education); the output $\boldsymbol{y}$ is Income; $\epsilon$ is the differences between the data points and the surface.]
Machine Learning Model
§ A machine learning (ML) model tries to estimate $f_{\text{TRUE}}$ by defining a function $f$:

$\hat{\boldsymbol{y}} = f(\boldsymbol{X})$

such that $\hat{\boldsymbol{y}}$ (predicted outputs) is close to $\boldsymbol{y}$ (real outputs).
§ The difference between the values of $\hat{\boldsymbol{y}}$ and $\boldsymbol{y}$ is the reducible error
- Can be reduced by better models, i.e. better estimations of $f_{\text{TRUE}}$
Generalization
§ The aim of machine learning is to create a model using observed experiences (training data) that generalizes to the problem domain, namely performs well on unobserved instances (test data)
Learning the model – Splitting the dataset
§ Data points are split into:
- Training set: for training the model
- Validation set: for tuning the model's hyper-parameters
- Test set: for evaluating the model's performance
§ Common train – validation – test splitting sizes
- 60%, 20%, 20%
- 70%, 15%, 15%
- 80%, 10%, 10%

[Figure: the observed data points are first split into a training set and a test set; the training portion is then split again into the final training set and a validation set.]
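A minimal sketch of such a split, assuming scikit-learn and an illustrative 80/10/10 ratio (neither is fixed by the slides):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)   # 1000 data points, 5 features
y = np.random.rand(1000)      # labels

# First split off the test set (10%), then carve the validation
# set (10% of the total) out of the remaining 90%.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=1 / 9, random_state=42)  # 1/9 of 90% = 10%

print(len(X_train), len(X_val), len(X_test))  # 800 100 100
```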
Learning the model
§ Example dataset: Student Alcohol Consumption
http://archive.ics.uci.edu/ml/datasets/STUDENT+ALCOHOL+CONSUMPTION#
- Pstatus: parent's cohabitation status ('T' - living together, 'A' - apart)
- romantic: in a romantic relationship
- Walc: weekend alcohol consumption (from 1 - very low to 5 - very high)
§ sex, age, Pstatus, and romantic are the features / variables ($\boldsymbol{X}$); Walc is the label / output variable ($\boldsymbol{y}$)

sex  age  Pstatus  romantic  Walc
F    18   A        no        1
F    17   T        no        1
F    15   T        no        3
F    15   T        yes       1
F    16   T        no        2
M    16   T        no        2
M    16   T        no        1
F    17   A        no        1
M    15   A        no        1
M    15   T        no        1
F    15   T        no        2
F    15   T        no        1
M    15   T        no        3
M    15   T        no        2
M    15   A        yes       1
F    16   T        no        2
F    16   T        no        2
F    16   T        no        1
M    17   T        no        4
Learning the model
§ The first 13 rows of the dataset form the train set; the remaining 6 rows form the test set
§ At prediction time, the test-set labels are hidden from the model (Walc = ?); the true values $\boldsymbol{y}$ = (2, 1, 2, 2, 1, 4) are held out
§ Train: an ML model is trained on the train set
§ Predict: the trained model predicts the test-set labels, here $\hat{\boldsymbol{y}}$ = (1, 1, 2, 2, 3, 4)
§ Evaluation: comparing the predictions $\hat{\boldsymbol{y}}$ with the held-out labels $\boldsymbol{y}$ gives the generalization error
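A compact end-to-end sketch of this train/predict/evaluate loop on the slide's toy data; scikit-learn, the ordinal encoding, and the decision-tree model are illustrative assumptions (the slides do not fix a particular model):

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# (sex, age, Pstatus, romantic) -> Walc, as in the slides
rows = [("F",18,"A","no",1), ("F",17,"T","no",1), ("F",15,"T","no",3),
        ("F",15,"T","yes",1), ("F",16,"T","no",2), ("M",16,"T","no",2),
        ("M",16,"T","no",1), ("F",17,"A","no",1), ("M",15,"A","no",1),
        ("M",15,"T","no",1), ("F",15,"T","no",2), ("F",15,"T","no",1),
        ("M",15,"T","no",3),  # rows above: train set
        ("M",15,"T","no",2), ("M",15,"A","yes",1), ("F",16,"T","no",2),
        ("F",16,"T","no",2), ("F",16,"T","no",1), ("M",17,"T","no",4)]

X = [[r[0], r[1], r[2], r[3]] for r in rows]
y = [r[4] for r in rows]

# Encode the categorical features as numbers
X = OrdinalEncoder().fit_transform(X)

X_train, y_train = X[:13], y[:13]   # train set
X_test, y_test = X[13:], y[13:]     # test set (labels hidden at prediction time)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)
print("predictions:", y_pred)
print("accuracy:", accuracy_score(y_test, y_pred))  # generalization quality
```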
Tuning hyper-parameters – Model selection
§ Decide on several sets of the model's hyper-parameters to explore
§ Train a separate model per set, using the training set
§ Among the trained models, select the best-performing one based on the evaluation results on the validation set
§ Take the selected model and evaluate it on the test set → final model performance
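A sketch of this selection loop; the synthetic data, the ridge-regression candidates, and the hyper-parameter grid are all assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)
X_train, X_val, X_test = X[:800], X[800:900], X[900:]
y_train, y_val, y_test = y[:800], y[800:900], y[900:]

# Train one candidate model per hyper-parameter setting on the train set
candidates = {a: Ridge(alpha=a).fit(X_train, y_train)
              for a in [0.01, 0.1, 1.0, 10.0]}

# Select the best-performing model on the validation set
best_alpha, best_model = min(
    candidates.items(),
    key=lambda kv: mean_squared_error(y_val, kv[1].predict(X_val)))

# Evaluate the selected model once on the test set -> final performance
print(best_alpha, mean_squared_error(y_test, best_model.predict(X_test)))
```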
ML models
§ Parametric models
- The model is defined as a function (or a family of functions) consisting of a set of parameters
- Functions such as linear regression, logistic regression, naïve Bayes, and neural networks
- The problem of finding the ML model is reduced to finding the optimum values for the parameters
§ Non-parametric models
- There is no assumption about the form of the function
- The model is directly learned from data
- ML models such as SVM, k-NN, smoothing splines, Gaussian processes

Term of the day! Inductive bias: all assumptions we consider in defining and creating an ML model; our prior knowledge about what $f_{\text{TRUE}}$ should be.
A sample ML model: Linear Regression
§ $f$ is defined as a linear regression function:

$\hat{y} = f(\boldsymbol{x}; \boldsymbol{w}) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d$

where $\boldsymbol{w} = [w_0, w_1, \dots, w_d]$ is the set of model parameters
§ In the "income" example:

$\widehat{\text{income}} = f(\boldsymbol{x}; \boldsymbol{w}) = w_0 + w_1 \times \text{education} + w_2 \times \text{seniority}$
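A minimal sketch of fitting such a function on synthetic income-style data (scikit-learn and the made-up coefficients are assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Two features, as in the income example: education, seniority
X = rng.uniform(0, 20, size=(200, 2))
income = 20 + 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 2, 200)

model = LinearRegression().fit(X, income)
print(model.intercept_, model.coef_)  # w0 and [w1, w2]; close to 20, [3.0, 1.5]
print(model.predict([[12, 5]]))       # predicted income for one data point
```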
A trained Linear Regression model

[Figure: the fitted linear regression plane over the two features of the income example.]
Loss Function
§ Optimization of parameters is done by first defining a loss function
§ A loss function measures the discrepancies between the predicted outputs $\hat{\boldsymbol{y}}$ and the real ones $\boldsymbol{y}$
§ E.g. Mean Square Error (MSE) – a common regression loss function:

$\mathcal{L}(y_i, \hat{y}_i; \boldsymbol{w}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

§ Loss functions for classification: next lectures
Good to know! What is Mean Absolute Error and how is it different from MSE?
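A direct NumPy transcription of MSE, plus the "good to know" MAE for comparison, applied to the toy predictions from the earlier slides:

```python
import numpy as np

def mse(y, y_hat):
    """Mean Square Error between real and predicted outputs."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    """Mean Absolute Error: penalizes errors linearly, not quadratically."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return np.mean(np.abs(y - y_hat))

print(mse([2, 1, 2, 2, 1, 4], [1, 1, 2, 2, 3, 4]))  # 0.8333...
print(mae([2, 1, 2, 2, 1, 4], [1, 1, 2, 2, 3, 4]))  # 0.5
```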
Optimization
§ Next, the training data is used to find an optimum set of parameters $\boldsymbol{w}^*$ by optimizing the loss function:

$\boldsymbol{w}^* = \operatorname*{argmin}_{\boldsymbol{w}} \mathcal{L}(y_i, \hat{y}_i; \boldsymbol{w})$

with MSE:

$\boldsymbol{w}^* = \operatorname*{argmin}_{\boldsymbol{w}} \frac{1}{n} \sum_{i=1}^{n} \left(y_i - f(\boldsymbol{x}_i; \boldsymbol{w})\right)^2$

§ How to optimize:
- Stochastically, e.g. using Stochastic Gradient Descent (SGD) → next lecture
- Analytically, e.g. in linear regression → Deep Learning book, section 5.1.4
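A sketch of the analytic route for linear regression, using the closed form $\boldsymbol{w}^* = (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{y}$ derived in section 5.1.4 of the Deep Learning book (the synthetic data is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 20 + 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 0.1, 200)

# Prepend a column of ones so the bias w0 becomes part of w
Xb = np.hstack([np.ones((200, 1)), X])

# Normal equations: the w that minimizes MSE, computed analytically
w = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y
print(w)  # approximately [20.0, 3.0, 1.5]
```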
ML models… cont.
§ Model capacity
- High capacity: more flexible, more parameters, higher variance, lower bias, prone to overfitting
- Low capacity: less flexible, fewer parameters, lower variance, higher bias, prone to underfitting

Terms of the day!
(Statistical) bias indicates the amount of assumptions taken to define a model. Higher bias means more assumptions and less flexibility, as in linear regression.
Variance: the extent to which the estimated parameters of a model vary when the data points change (are resampled).
Overfitting: when the model fits the training data too exactly, namely when it also captures the noise in the data.
Learning Curve

[Figure: error vs. model capacity for train and test sets. Models: black – $f_{\text{TRUE}}$; orange – linear regression; blue and green – two smoothing spline models. The train-set error keeps decreasing with capacity, while the test-set error is high in the underfit (low-capacity) region and in the overfit (high-capacity) region, with the "sweet spot!" in between.]
Regularization
§ A regularization method introduces additional information (assumptions) to avoid overfitting by decreasing variance
§ E.g. adding the squared L2 norm of the parameters to the loss function:

$\mathcal{L}(y_i, \hat{y}_i; \boldsymbol{w}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lVert \boldsymbol{w} \rVert_2^2$

where

$\lVert \boldsymbol{w} \rVert_2^2 = \sum_i w_i^2$
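A direct transcription of this L2-regularized loss (in practice the penalty term is usually scaled by a strength coefficient; here it is added unscaled, as on the slide):

```python
import numpy as np

def l2_regularized_mse(y, y_hat, w):
    """MSE plus the squared L2 norm of the parameters."""
    y, y_hat, w = np.asarray(y), np.asarray(y_hat), np.asarray(w)
    return np.mean((y - y_hat) ** 2) + np.sum(w ** 2)

w = np.array([0.5, -1.2, 2.0])
print(l2_regularized_mse([1.0, 2.0], [1.1, 1.8], w))  # 0.025 + 5.69 = 5.715
```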
Common Evaluation Metrics
§ Classification
- Accuracy = (# of correct predictions) / (# of samples)
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F-measure = (2 · precision · recall) / (precision + recall)
§ Regression
- MSE
- R-squared
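A quick check of these formulas with scikit-learn (the example labels are made up for illustration):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # TP=3, FP=1, FN=1, TN=3

print(accuracy_score(y_true, y_pred))   # 6/8 = 0.75
print(precision_score(y_true, y_pred))  # TP/(TP+FP) = 3/4 = 0.75
print(recall_score(y_true, y_pred))     # TP/(TP+FN) = 3/4 = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean = 0.75
```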
k-fold Cross Validation
§ A rigorous evaluation method
- avoids bias in the train/test splitting
§ How to (k = 5 or 10)
- Split the data into k equal-size folds
- Repeat k times:
  - Use one left-out fold for testing
  - Use the rest of the k-1 folds for training
- The final performance is the average of the evaluation results of the k models
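A minimal sketch with scikit-learn's cross_val_score (the library, the model, and the synthetic data are assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 100)

# 5 folds: each fold serves once as the test set, the rest for training;
# the final performance is the average over the 5 runs.
scores = cross_val_score(LinearRegression(), X, y,
                         cv=5, scoring="neg_mean_squared_error")
print(-scores.mean())
```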
Agenda
- Introduction to Machine Learning
- Sentiment Analysis
- Feature Extraction
- Breaking the curse of dimensionality!
A tough Example!
"This past Saturday, I bought a Nokia phone and my girlfriend bought a Motorola phone with Bluetooth. We called each other when we got home. The voice on my phone was clear, better than my previous Samsung phone. The battery life was however short. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality. So my purchase was a real disappointment. I returned the phone yesterday." [1]
Text-Level Sentiment Analysis
§ Text- or document-level sentiment analysis assumes that the whole text expresses one sentiment about one opinion target
- Not like the previous example!
§ We approach sentiment prediction with ML
Problem definition
§ A dataset consists of $n$ text documents and their sentiments (outputs)
§ Possible sentiment values:
- [-1, 0, 1] → [negative, neutral, positive] (classification problem)
- Real-valued numbers, e.g. a stock price (regression problem)
Sentiment Analysis with ML

       features   sentiment
$d_1$  ...        $y_{d_1}$
$d_2$  ...        $y_{d_2}$
...    ...        ...
$d_n$  ...        $y_{d_n}$
$d$    ...        ?

Create ML model → Predict
Agenda
- Introduction to Machine Learning
- Sentiment Analysis
- Feature Extraction
- Breaking the curse of dimensionality!
Dictionary
§ To extract features, first a dictionary with $m$ words (terms) is defined:

$[t_1, t_2, \dots, t_m]$
Document-Term Matrix
§ The features are based on the terms in the dictionary
- Bag of Words (BoW) representations of documents
§ $x_{t,d}$ is the feature value – the weight of term $t$ in document $d$:

       $t_1$          $t_2$          ...  $t_m$          sentiment
$d_1$  $x_{t_1,d_1}$  $x_{t_2,d_1}$  ...  $x_{t_m,d_1}$  $y_{d_1}$
$d_2$  $x_{t_1,d_2}$  $x_{t_2,d_2}$  ...  $x_{t_m,d_2}$  $y_{d_2}$
...    ...            ...            ...  ...            ...
$d_n$  $x_{t_1,d_n}$  $x_{t_2,d_n}$  ...  $x_{t_m,d_n}$  $y_{d_n}$
Term Weightings
§ A term weighting method measures the importance of a term in a document
§ One common method is to count the number of occurrences of a term in a document ⟹ term count:

$x_{t,d} = \mathrm{tc}_{t,d} = $ # of occurrences of $t$ in $d$

§ Using a logarithm to dampen the raw counts is shown to be more effective ⟹ term frequency:

$x_{t,d} = \mathrm{tf}_{t,d} = \log(1 + \mathrm{tc}_{t,d})$
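A minimal pure-Python sketch of both weightings on a single toy document (the example sentence is made up):

```python
import math
from collections import Counter

doc = "the phone was clear better than my previous phone".split()

tc = Counter(doc)                                 # term counts
tf = {t: math.log(1 + c) for t, c in tc.items()}  # dampened term frequency

print(tc["phone"], round(tf["phone"], 3))  # 2, 1.099
```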
On informativeness of less frequent terms
§ Terms that do not appear often usually carry more information than highly frequent ones
- e.g., JKU in a large news corpus
§ Inverse document frequency (idf) is a well-known method to measure how often a word appears in a collection:

$\mathrm{idf}_t = \log\left(\frac{N}{\mathrm{df}_t + 1}\right)$

- $\mathrm{df}_t$ is the document frequency of $t$, namely the number of documents that contain term $t$; $N$ is the number of documents in the collection
- A higher $\mathrm{idf}_t$ means that the term appears less often in the collection, and is therefore more informative (important)
- e.g., JKU has a high idf, while the has a very low idf
Term weightings
§ The tf-idf term weighting is the product of $\mathrm{tf}_{t,d}$ and $\mathrm{idf}_t$:

$x_{t,d} = \text{tf-idf}_{t,d} = \log(1 + \mathrm{tc}_{t,d}) \times \log\left(\frac{N}{\mathrm{df}_t + 1}\right)$

§ A well-known term weighting method!
- increases with the number of occurrences within a document (tf)
- increases with the rarity of the term in the collection (idf)
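A from-scratch sketch over a toy corpus, following the slide's formulas with the +1 smoothing in the idf denominator (the corpus is made up):

```python
import math
from collections import Counter

docs = [
    "the battery life was short".split(),
    "the voice on my phone was clear".split(),
    "my girlfriend was happy with her phone".split(),
]
N = len(docs)

# Document frequency: in how many documents does each term occur?
df = Counter(t for d in docs for t in set(d))

def tf_idf(term, doc):
    tc = doc.count(term)                # term count
    tf = math.log(1 + tc)               # dampened term frequency
    idf = math.log(N / (df[term] + 1))  # rarity in the collection
    return tf * idf

print(tf_idf("battery", docs[0]))  # rare term -> relatively high weight
print(tf_idf("was", docs[0]))      # appears in all docs -> weight <= 0
```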
Wrap-up!
§ Use any of the term weightings to create the document-term matrix (as on the previous slides)
§ The rest is standard machine learning!
- Model training
- Hyper-parameter tuning and model selection
- Evaluation
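A sketch of the whole wrap-up in scikit-learn; the tiny labeled corpus and the logistic-regression classifier are assumptions for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_docs = ["the voice was clear and great",
              "battery life was short, a real disappointment",
              "quite happy with this phone",
              "I returned the phone, bad sound quality"]
train_labels = [1, -1, 1, -1]   # 1 = positive, -1 = negative

# tf-idf document-term matrix + a standard classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_docs, train_labels)

print(model.predict(["the sound was clear and the battery great"]))
```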
Agenda
- Introduction to Machine Learning
- Sentiment Analysis
- Feature Extraction
- Breaking the curse of dimensionality!
Supervised Sentiment Analysis
§ The feature vectors of the document-term matrix are
- sparse (a lot of zeros)
- in a very high dimension: the number of terms $m$ is typically ~20K-500K, while the number of documents $n$ is often only ~10K-100K
Curse of dimensionality
§ The curse of dimensionality happens when the amount of data does not suffice to support the high dimensionality of the feature space
§ It causes
- Data sparsity
- Issues in measuring "closeness"
Curse of dimensionality
§ Why low-dimensional vectors?
- Easier to store and load (efficiency)
- More efficient when used as features in ML models
- Better generalization, since the noise in the data is reduced
- Able to capture higher-order relations:
  - Synonyms like car and automobile can be merged into the same dimension
  - Polysemes like bank (financial institution) and bank (river bank) can be separated into different dimensions
Feature (Dimensionality) reduction
§ Feature selection
- keep some important features and get rid of the rest!
§ Dimensionality reduction
- project the data from a high- to a low-dimensional space: $N \times M \Rightarrow N \times d$
Feature selection
§ During pre-processing (see the sketch below)
- Remove stop words or very common words
  - tf-idf does this in a "soft" way (why?)
- Remove very rare words
  - Usually done when creating the dictionary
- Stemming & lemmatization
§ Feature definition
- Use only the words of a domain-specific lexicon as features
§ Post-processing
- Keep important features using some informativeness measure
- Subset selection
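One way to realize the pre-processing items is through scikit-learn's vectorizer options; a minimal sketch with illustrative parameter values (none of them prescribed by the slides):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(
    stop_words="english",  # remove stop words
    max_df=0.9,            # remove very common words (in >90% of documents)
    min_df=2,              # remove very rare words (in fewer than 2 documents)
)
docs = ["the battery was short", "the voice was clear",
        "battery and voice were fine"]
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())  # the resulting dictionary
print(X.shape)                             # documents × kept features
```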
Dimensionality reduction with LSA
§ Latent Semantic Analysis (LSA)
- A common method in Information Retrieval to capture semantics
- Based on Singular Value Decomposition (SVD)
Semantics matters!
Singular Value Decomposition
§ An $N \times M$ matrix $\boldsymbol{X}$ can be factorized into three matrices:

$\boldsymbol{X} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^T$

§ $\boldsymbol{U}$ (left singular vectors) is an $N \times M$ unitary matrix
§ $\boldsymbol{\Sigma}$ is an $M \times M$ diagonal matrix, whose diagonal entries
- are the singular values,
- show the importance of the corresponding $M$ dimensions in $\boldsymbol{X}$,
- are all positive and sorted from large to small values
§ $\boldsymbol{V}^T$ (right singular vectors) is an $M \times M$ unitary matrix

* The definition of SVD is simplified here. Refer to https://en.wikipedia.org/wiki/Singular_value_decomposition for the exact definition
Singular Value Decomposition

[Figure: the original matrix $\boldsymbol{X}$ ($N \times M$) = left singular vectors $\boldsymbol{U}$ ($N \times M$) · singular values $\boldsymbol{\Sigma}$ ($M \times M$) · right singular vectors $\boldsymbol{V}^T$ ($M \times M$).]
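Checking the factorization with NumPy (np.linalg.svd returns the diagonal of $\boldsymbol{\Sigma}$ as a vector; the matrix sizes are arbitrary):

```python
import numpy as np

X = np.random.rand(6, 4)  # N=6, M=4

# Thin SVD: U is N×M, S holds the M singular values, Vt is M×M
U, S, Vt = np.linalg.svd(X, full_matrices=False)

print(U.shape, S.shape, Vt.shape)           # (6, 4) (4,) (4, 4)
print(S)                                    # sorted large to small, all >= 0
print(np.allclose(X, U @ np.diag(S) @ Vt))  # True: X = U Σ V^T
```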
Latent Semantic Indexing (at training time)
§ Step 1: apply SVD to the (sparse) term-document matrix $\boldsymbol{X}$ of the training data ($N$ terms, $M$ documents):

$N \times M$ term-document matrix $\boldsymbol{X}$ = term vectors $\boldsymbol{U}$ · singular values $\boldsymbol{\Sigma}$ · document vectors $\boldsymbol{V}^T$

→ Not the document-term matrix! Although it is also possible to start with the document-term matrix, we follow the typical definition!
§ Step 2: keep only the top $k$ singular values in $\boldsymbol{\Sigma}$ and set the rest to zero, called $\tilde{\boldsymbol{\Sigma}}$
§ Truncate the $\boldsymbol{U}$ and $\boldsymbol{V}^T$ matrices according to the changes in $\boldsymbol{\Sigma}$, called $\tilde{\boldsymbol{U}}$ and $\tilde{\boldsymbol{V}}^T$ respectively
§ If we multiply the truncated matrices, we obtain the rank-$k$ least-squares approximation of the original matrix:

$\tilde{\boldsymbol{X}} = \tilde{\boldsymbol{U}} \tilde{\boldsymbol{\Sigma}} \tilde{\boldsymbol{V}}^T$
Latent Semantic Indexing (at training time)

[Figure: the smoothed term-document matrix $\tilde{\boldsymbol{X}}$ ($N \times M$) = truncated term vectors $\tilde{\boldsymbol{U}}$ ($N \times k$) · truncated singular values $\tilde{\boldsymbol{\Sigma}}$ ($k \times k$) · truncated document vectors $\tilde{\boldsymbol{V}}^T$ ($k \times M$).]

§ The $\tilde{\boldsymbol{V}}$ matrix contains the dense low-dimensional document vectors
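A NumPy sketch of the truncation; the matrix sizes and k = 10 are arbitrary illustrative choices:

```python
import numpy as np

X = np.random.rand(500, 100)  # term-document matrix: N=500 terms, M=100 docs
k = 10                        # number of latent dimensions to keep

U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the top-k singular values and the matching singular vectors
U_k, S_k, Vt_k = U[:, :k], S[:k], Vt[:k, :]

X_k = U_k @ np.diag(S_k) @ Vt_k  # rank-k least-squares approximation of X
doc_vectors = Vt_k.T             # M×k: dense low-dimensional document vectors
print(X_k.shape, doc_vectors.shape)  # (500, 100) (100, 10)
```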