SLIDE 1

Statistical Natural Language Processing

Classification

Çağrı Çöltekin

University of Tübingen
Seminar für Sprachwissenschaft

Summer Semester 2019

SLIDE 2


When/why do we do classification

  • Is a given email spam or not?
  • What is the gender of the author of a document?
  • Is a product review positive or negative?
  • Who is the author of a document?
  • What is the subject of an article?

As opposed to regression, the outcome is a ‘category’.

SLIDE 4


The task


  • Given a set of training data with (categorical) labels
  • Train a model to predict future data points from the same distribution

[Figure: training points labeled + and − in the x1–x2 plane, with a new point marked ‘?’ to be classified]

SLIDE 6


Outline

  • Perceptron
  • Logistic regression
  • Naive Bayes
  • Multi-class strategies for binary classifiers
  • Evaluation metrics for classification
  • Brief notes on what we skipped

SLIDE 7


The perceptron


[Figure: a single unit with inputs x1, x2, …, xn (plus x0 = 1), weights w0, w1, …, wn, and output y]

y = f(∑i wixi), where f(x) = +1 if ∑i wixi > 0, and −1 otherwise

Similar to the intercept in linear models, an additional input x0, which is always set to one, is often used (called bias in the ANN literature)

SLIDE 9


The perceptron: in plain words


  • Sum all inputs xi, weighted by the corresponding weights wi
  • Classify the input using a threshold function:
    positive if the sum is larger than 0, negative otherwise

SLIDE 10


Learning with the perceptron

  • We do not update the parameters if the classification is correct
  • For misclassified examples, we try to minimize

    E(w) = −∑i w·xiyi

    where i ranges over all misclassified examples
  • The perceptron algorithm updates the weights such that

    w ← w − η∇E(w)

    which, for a single misclassified example, amounts to w ← w + ηxiyi. η is the learning rate.

SLIDE 11


The perceptron algorithm

  • The perceptron algorithm can be run
    online: update weights for a single misclassified example at a time
    batch: update weights for all misclassified examples at once
  • The perceptron algorithm converges to the global minimum if the classes are linearly separable
  • If the classes are not linearly separable, the perceptron algorithm will not stop
  • We do not know whether the classes are linearly separable or not before the algorithm converges
  • In practice, one can set a stopping condition (as in the sketch below), such as
    – maximum number of iterations/updates
    – number of misclassified examples
    – number of iterations without improvement

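A minimal NumPy sketch of the online perceptron described above, with a maximum-epoch stopping condition; the function and variable names are our own, not from the slides:

import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Online perceptron; X is (n_samples, n_features), labels y are +1/-1."""
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend x0 = 1 (bias input)
    w = np.random.randn(X.shape[1]) * 0.01     # random initialization
    for epoch in range(max_epochs):            # stopping condition: max epochs
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:             # misclassified (or on the boundary)
                w += eta * yi * xi             # w <- w + eta * yi * xi
                errors += 1
        if errors == 0:                        # converged: classes were separable
            break
    return w

def predict(w, X):
    X = np.hstack([np.ones((len(X), 1)), X])
    return np.where(X @ w > 0, 1, -1)          # threshold at zero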

SLIDE 12

[Figure: six snapshots, w(0) through w(5), showing the weight vector w and its decision boundary over the + and − training points after successive updates]

  • 1. Randomly initialize w; the decision boundary is orthogonal to w
  • 2. Pick a misclassified example xi
  • 3. Set w ← w + yixi; go to step 2 until convergence

SLIDE 18


Perceptron: a bit of history

  • The perceptron was developed in the late 1950s and early 1960s (Rosenblatt 1958)
  • It caused excitement in many fields, including computer science, artificial intelligence, and cognitive science
  • The excitement (and funding) died away in the early 1970s (after the criticism by Minsky and Papert 1969)
  • The main issue was that the perceptron algorithm cannot handle problems that are not linearly separable

SLIDE 19


Logistic regression

  • Logistic regression is a classification method
  • In logistic regression, we fit a model that predicts P(y | x)
  • Logistic regression is an extension of linear regression
    – it is a member of the family of models called generalized linear models
  • It is typically formulated for binary classification, but it has a natural extension to multiple classes
  • Multi-class logistic regression is often called a maximum-entropy model (or max-ent) in the NLP literature

SLIDE 20

Data for logistic regression

an example with a single predictor

[Figure: binary outcomes y (0 or 1) plotted against a single predictor x]

  • Why not just use linear regression?
  • What is P(y | x = 2)?
  • Is RMS error appropriate?

SLIDE 22


Fixing the outcome: transforming the output variable

  • The prediction we are interested in is ŷ = P(y = 1 | x)
  • We transform it with the logit function:

    logit(ŷ) = log(ŷ / (1 − ŷ)) = w0 + w1x

  • ŷ / (1 − ŷ) (the odds) is bounded between 0 and ∞
  • log(ŷ / (1 − ŷ)) (the log odds) is bounded between −∞ and ∞
  • We can estimate logit(ŷ) with regression, and transform back with the inverse of logit():

    ŷ = e^(w0 + w1x) / (1 + e^(w0 + w1x)) = 1 / (1 + e^(−w0 − w1x))

    which is called the logistic (sigmoid) function (a small round-trip sketch follows)

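A tiny sketch of the transform pair, assuming NumPy; it just checks that logistic() undoes logit():

import numpy as np

def logit(p):                      # log odds: maps (0, 1) to (-inf, +inf)
    return np.log(p / (1 - p))

def logistic(z):                   # the inverse of logit (sigmoid)
    return 1 / (1 + np.exp(-z))

p = 0.8
assert np.isclose(logistic(logit(p)), p)   # the round trip recovers p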

SLIDE 23


Logistic function

[Figure: the logistic curve, rising from 0 to 1 as x goes from −6 to 6]

logistic(x) = 1 / (1 + e^(−x))

SLIDE 24


How to fit a logistic regression model

with maximum-likelihood estimation

P(y = 1 | x) = p = 1 / (1 + e^(−wx))        P(y = 0 | x) = 1 − p = e^(−wx) / (1 + e^(−wx))

The likelihood of the training set is

L(w) = ∏i p^yi (1 − p)^(1−yi)

In practice, we maximize the log likelihood, or minimize the negative log likelihood:

−log L(w) = −∑i [ yi log p + (1 − yi) log(1 − p) ]

SLIDE 25


How to fit a logistic regression model (2)

  • Bad news: there is no analytic solution
  • Good news: the (negative) log likelihood is a convex function
  • We can use iterative methods, such as gradient descent, to find parameters that maximize the (log) likelihood
  • Using gradient descent, we repeat

    w ← w − η∇J(w)

    until convergence; η is the learning rate (a sketch follows below)

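A minimal gradient-descent sketch for the negative log likelihood above, assuming NumPy; it uses the standard gradient X^T(p − y), and the names are ours:

import numpy as np

def fit_logistic_regression(X, y, eta=0.1, n_iter=1000):
    """Gradient descent on -log L(w); labels y are 0/1."""
    X = np.hstack([np.ones((len(X), 1)), X])    # intercept term w0
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-(X @ w)))          # P(y = 1 | x) for every row
        grad = X.T @ (p - y) / len(y)           # gradient of the (mean) NLL
        w -= eta * grad                         # w <- w - eta * grad
    return w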

SLIDE 26


Example: logistic regression

back to the example with a single predictor

[Figure: the data with the fitted logistic curve]

p = 1 / (1 + e^(0.33 + 2.40x))

The fitted decision boundary: 2.40x + 0.33 = 0

SLIDE 27


Another example

two predictors

[Figure: two classes in the x1–x2 plane with the fitted decision boundary]

p = 1 / (1 + e^(−0.1 − 2.53x1 + 2.58x2))

The fitted decision boundary: 0.1 + 2.53x1 − 2.58x2 = 0

SLIDE 28


Multi-class logistic regression

  • Generalizing logistic regression to more than two classes is straightforward
  • We estimate

    P(Ck | x) = e^(wk·x) / ∑j e^(wj·x)

    where Ck is the kth class and j iterates over all classes
  • This function is called the softmax function, and it is used frequently in neural network models as well (see the sketch below)
  • This model is also known as a log-linear model, maximum entropy model, or Boltzmann machine

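A small sketch of the softmax function, assuming NumPy; subtracting the maximum score is a common trick to avoid overflow, not something the slides mention:

import numpy as np

def softmax(scores):
    """P(C_k | x) from the per-class scores w_k . x."""
    z = scores - scores.max()       # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical w_k . x for three classes
print(softmax(scores))              # probabilities; they sum to 1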

SLIDE 29

Naive Bayes classifier

  • The Naive Bayes classifier is a well-known, simple classifier
  • It has been found to be effective on a number of tasks, primarily document classification
  • It was popularized by practical spam detection applications
  • The naive part comes from a strong independence assumption
  • The Bayes part comes from the use of Bayes’ formula for inverting conditional probabilities
  • However, learning is (typically) ‘not really’ Bayesian

SLIDE 31


Naive Bayes: estimation

  • Given a set of features x, we want to know the class y of the object we want to classify
  • At prediction time, we pick the class ŷ:

    ŷ = argmax_y P(y | x)

  • Instead of directly estimating the conditional probability, we invert it using Bayes’ formula:

    ŷ = argmax_y P(x | y)P(y) / P(x) = argmax_y P(x | y)P(y)

  • Now the task becomes estimating P(x | y) and P(y)

SLIDE 32


Naive Bayes: estimation (cont.)

  • The class distribution, P(y), is estimated using MLE on the training set
  • With many features, x = (x1, x2, …, xn), P(x | y) is difficult to estimate
  • The Naive Bayes estimator makes a conditional independence assumption: given the class, we assume that the features are independent of each other:

    P(x | y) = P(x1, x2, …, xn | y) = ∏i P(xi | y)

SLIDE 33


Naive Bayes: estimation (cont.)

  • The probability distributions P(xi | y) and P(y) are typically estimated using MLE (count and divide; see the sketch below)
  • A smoothing technique may be used for unknown features (e.g., words)
  • Note that P(xi | y) can be
    binomial: e.g., whether a word occurs in the document or not
    categorical: e.g., estimated using the relative frequencies of words
    continuous: if the data is distributed according to a known distribution

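A count-and-divide sketch for the categorical case, with optional add-alpha smoothing; a plain-Python sketch under our own naming, not the slides’ notation:

from collections import Counter, defaultdict

def train_naive_bayes(docs, labels, alpha=1.0):
    """docs: lists of word tokens; returns P(y) and smoothed P(w | y)."""
    n = len(labels)
    prior = {y: c / n for y, c in Counter(labels).items()}   # P(y) by MLE
    counts = defaultdict(Counter)                            # word counts per class
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    vocab = {w for doc in docs for w in doc}
    cond = {}                                                # P(w | y), add-alpha smoothed
    for y, cnt in counts.items():
        total = sum(cnt.values())
        cond[y] = {w: (cnt[w] + alpha) / (total + alpha * len(vocab))
                   for w in vocab}
    return prior, cond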

SLIDE 34

Naive Bayes

a simple example: spam detection

Training data (features present → label):

  good book                  NS
  now book free              S
  medication lose weight     S
  technology advanced book   NS
  now advanced technology    S

  • A test instance: {book, technology}
  • Another one: {good, medication}
    (both are worked through in the sketch below)

P(S) = 3/5, P(NS) = 2/5

  w            P(w | S)   P(w | NS)
  medication   1/5
  free         1/5
  technology   1/5        1/5
  advanced     1/5        1/5
  book         1/5        2/5
  now          1/5
  lose         1/5
  weight       1/5
  good                    1/5

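A sketch that scores the two test instances using the table’s values (reading the blank cells as 0); the zero counts in the second instance are exactly why smoothing is needed:

P_class = {'S': 3/5, 'NS': 2/5}
P_word = {
    'S':  {'medication': 1/5, 'free': 1/5, 'technology': 1/5, 'advanced': 1/5,
           'book': 1/5, 'now': 1/5, 'lose': 1/5, 'weight': 1/5, 'good': 0},
    'NS': {'medication': 0, 'free': 0, 'technology': 1/5, 'advanced': 1/5,
           'book': 2/5, 'now': 0, 'lose': 0, 'weight': 0, 'good': 1/5},
}

def score(doc, y):                              # P(y) * prod_i P(x_i | y)
    s = P_class[y]
    for w in doc:
        s *= P_word[y][w]
    return s

print(score({'book', 'technology'}, 'S'))       # 3/5 * 1/5 * 1/5 = 0.024
print(score({'book', 'technology'}, 'NS'))      # 2/5 * 2/5 * 1/5 = 0.032 -> NS wins
print(score({'good', 'medication'}, 'S'))       # 0.0: a zero-count word
print(score({'good', 'medication'}, 'NS'))      # 0.0: zeroes both scores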

SLIDE 38


Classifying classification methods

another short digression

  • Some classification algorithms are non-probabilistic and discriminative: they return a label for a given input. Examples: perceptron, SVMs, decision trees
  • Some classification algorithms are discriminative and probabilistic: they estimate the conditional probability distribution P(c | x) directly. Examples: logistic regression, (most) neural networks
  • Some classification algorithms are generative: they estimate the joint distribution P(c, x). Examples: naive Bayes, hidden Markov models, (some) neural models

SLIDE 39


More than two classes

  • Some algorithms can naturally be extended to handle multiple class labels
  • Any binary classifier can be turned into a k-way classifier by one of two strategies (a sketch of the first follows below)

    OvR (one-vs-rest, or one-vs-all)
      • train k classifiers: each learns to discriminate one of the classes from the others
      • at prediction time, the classifier with the highest confidence wins
      • needs a confidence score from the base classifiers

    OvO (one-vs-one)
      • train k(k−1)/2 classifiers: each learns to discriminate a pair of classes
      • the decision is made by (weighted) majority vote
      • works without confidence scores, but needs more classifiers

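A one-vs-rest sketch; it assumes the base classifier exposes scikit-learn-style fit() and decision_function() methods, which is our assumption rather than anything the slides specify:

import numpy as np

class OneVsRest:
    """Turn a binary classifier with a confidence score into a k-way one."""
    def __init__(self, make_clf):
        self.make_clf = make_clf                 # factory for fresh binary classifiers

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.clfs = []
        for c in self.classes:                   # one classifier per class: c vs. rest
            clf = self.make_clf()
            clf.fit(X, np.where(y == c, 1, -1))
            self.clfs.append(clf)
        return self

    def predict(self, X):
        scores = np.stack([clf.decision_function(X) for clf in self.clfs])
        return self.classes[scores.argmax(axis=0)]   # highest confidence wins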

SLIDE 40

One vs. Rest

[Figure: three classes (+, −, ×) in the x1–x2 plane, with the three one-vs-rest decision boundaries drawn in turn]

  • For 3 classes, we fit 3 classifiers, each separating one class from the rest
  • Some regions of the feature space will be ambiguous
  • We can assign labels based on a probability or weight value, if the classifier returns one
  • One-vs.-one with majority voting is another option

SLIDE 46

Maximum-margin methods (e.g., SVMs)

[Figure: + and − points separated by a maximum-margin boundary]

  • With the perceptron, we stopped whenever we found a linear discriminator
  • Maximum-margin classifiers seek a discriminator that maximizes the margin
  • SVMs have other interesting properties, and they have been one of the best ‘out-of-the-box’ classifiers for many problems

SLIDE 48

A quick survey of some solutions

Decision trees

[Figure: + and − points split by thresholds a1 on x1 and a2 on x2, with the corresponding tree: test x2 < a2, then x1 < a1, leading to + and − leaves]

  • Note that the decision boundary is non-linear

SLIDE 50


A quick survey of some solutions

Instance/memory based methods

[Figure: labeled + and − points and a query point ‘?’ in the x1–x2 plane]

  • No training: just memorize the instances
  • During test time, decide based on the k nearest neighbors
  • Like decision trees, kNN is non-linear
  • It can also be used for regression

SLIDE 51


A quick survey of some solutions

Artificial neural networks

[Figure: the same + and − data classified by a small neural network with inputs x1, x2 and output y]

SLIDE 52


Measuring success in classification

Accuracy

  • In classification, we do not care (much) about the average of the error function
  • We are interested in how many of our predictions are correct
  • Accuracy measures this directly:

    accuracy = number of correct predictions / total number of predictions

SLIDE 53

Accuracy may go wrong

  • Think about a ‘dummy’ search engine that always returns an empty document set (no results found)
  • If we have
    – 1 000 000 documents
    – 1000 relevant documents (including the terms in the query)
    the accuracy is: 999 000 / 1 000 000 = 99.90 %
  • In general, if our class distribution is skewed, or imbalanced, accuracy will be a bad indicator of success

SLIDE 55


Measuring success in classification

Precision, recall, F-score

precision = TP / (TP + FP)

recall = TP / (TP + FN)

F1-score = 2 × precision × recall / (precision + recall)

                     true value
                  positive   negative
predicted  pos.   TP         FP
           neg.   FN         TN

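A small sketch computing these three metrics from the counts; the guards against zero denominators are our addition:

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from the counts for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0    # guard: no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0       # guard: no positive instances
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(precision_recall_f1(7, 9, 3))   # classifier 1 from the later example slide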

SLIDE 56


Example: back to the search engine

  • We had a ‘dummy’ search engine that returned ‘false’ for all queries
  • For a query over
    – 1 000 000 documents
    – 1000 relevant documents

    accuracy = 999 000 / 1 000 000 = 99.90 %
    precision = 0 %
    recall = 0 / 1000 = 0 %

  • Precision and recall are asymmetric: the choice of the ‘positive’ class is important.

SLIDE 57

Classifier evaluation: another example

Consider the following two classifiers (rows: predicted, columns: true value):

  Classifier 1              Classifier 2
        pos.   neg.               pos.   neg.
  pos.  7      9            pos.  1      3
  neg.  3      1            neg.  9      7

  Accuracy: both 8/20 = 0.4
  Precision: 7/16 = 0.44 and 1/4 = 0.25
  Recall: 7/10 = 0.7 and 1/10 = 0.1
  F-score: 0.54 and 0.14

SLIDE 59


Multi-class evaluation

  • For multi-class problems, it is common to report average precision/recall/F-score
  • For C classes, averaging can be done in two ways (a sketch follows below):

    precision_M = (1/C) ∑i TPi / (TPi + FPi)        recall_M = (1/C) ∑i TPi / (TPi + FNi)

    precision_µ = ∑i TPi / ∑i (TPi + FPi)           recall_µ = ∑i TPi / ∑i (TPi + FNi)

    (M = macro, µ = micro; the sums run over i = 1, …, C)

  • The averaging can also be useful for binary classification, if there is no natural positive class

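A sketch of the two averaging schemes from per-class counts, assuming every denominator is nonzero:

def macro_micro(per_class):
    """per_class: a list of (TP, FP, FN) triples, one per class."""
    C = len(per_class)
    # Macro: compute precision/recall per class, then average the values
    prec_M = sum(tp / (tp + fp) for tp, fp, fn in per_class) / C
    rec_M = sum(tp / (tp + fn) for tp, fp, fn in per_class) / C
    # Micro: pool the counts over classes, then compute a single value
    tp = sum(t for t, f, n in per_class)
    fp = sum(f for t, f, n in per_class)
    fn = sum(n for t, f, n in per_class)
    return (prec_M, rec_M), (tp / (tp + fp), tp / (tp + fn))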

SLIDE 60


Confusion matrix

  • A confusion matrix is often useful for multi-class classification tasks (rows: predicted, columns: true class):

                 true class
               negative   neutral   positive
    negative   10         3         4
    neutral    2          12        8
    positive   7          …         7

  • Are the classes balanced?
  • What is the accuracy?
  • What is the per-class, and averaged, precision/recall? (see the sketch below)

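A sketch that answers the precision/recall question for any complete confusion matrix laid out as above (rows = predicted, columns = true), assuming NumPy:

import numpy as np

def per_class_metrics(cm):
    """cm[i, j]: count of items predicted as class i whose true class is j."""
    cm = cm.astype(float)
    tp = np.diag(cm)
    fp = cm.sum(axis=1) - tp      # predicted as class i, but truly another class
    fn = cm.sum(axis=0) - tp      # truly class i, but predicted as another class
    accuracy = tp.sum() / cm.sum()
    return accuracy, tp / (tp + fp), tp / (tp + fn)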

SLIDE 61


Precision–recall trade-off

  • Increasing precision (e.g., by changing a hyperparameter) results in decreasing recall
  • Precision–recall graphs are useful for picking the correct models
  • Area under the curve (AUC) is another indication of the success of a classifier

[Figure: a precision–recall curve, with recall on the x-axis and precision on the y-axis]

SLIDE 62


Performance metrics: a summary

  • Accuracy does not reflect the classifier performance when the class distribution is skewed
  • Precision and recall are binary and asymmetric
  • For multi-class problems, calculating accuracy is straightforward, but other measures need averaging
  • These are just the most common measures: there are more
  • You should understand what these metrics measure, and use/report the metric that is useful for the purpose

SLIDE 63


Summary

  • We discussed three basic classification techniques: perceptron, logistic regression, naive Bayes
  • We left out many others: SVMs, decision trees, …
  • We also did not discuss a few other interesting cases, including multi-label classification
  • We will discuss some (non-linear) classification methods next

Next: ML evaluation, quick summary so far
Fri: Introduction to neural networks

SLIDE 64

Additional reading, references, credits

  • Hastie, Tibshirani, and Friedman (2009) covers logistic regression in section 4.4 and the perceptron in section 4.5
  • Jurafsky and Martin (2009) explains it in section 6.6, and it is moved to its own chapter (7) in the draft third edition

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edition. Springer Series in Statistics. Springer-Verlag New York. ISBN: 9780387848587. URL: http://web.stanford.edu/~hastie/ElemStatLearn/.

Jurafsky, Daniel and James H. Martin (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Second edition. Pearson Prentice Hall. ISBN: 978-0-13-504196-3.

Minsky, Marvin and Seymour Papert (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.

Rosenblatt, Frank (1958). “The perceptron: a probabilistic model for information storage and organization in the brain.” In: Psychological Review 65.6, pp. 386–408.
