Statistical Natural Language Processing: Classification

Çağrı Çöltekin
University of Tübingen, Seminar für Sprachwissenschaft
Summer Semester 2017
When/why do we do classification

- Is a given email spam or not?
- What is the gender of the author of a document?
- Is a product review positive or negative?
- Who is the author of a document?
- What is the subject of an article?

As opposed to regression, the outcome is a ‘category’.
The task

[Figure: training instances labeled + and − in a two-dimensional feature space (x1, x2); a new, unlabeled instance to classify is marked ‘?’]
A quick survey of some solutions
(Linear) discriminant functions

[Figure: the labeled instances separated by a straight line; a new instance is marked ‘?’]

- Find a discriminant function f that separates the training instances best (for some definition of ‘best’)
- Use the discriminant to predict the label of unknown instances (a sketch follows below):

  $\hat{y} = \begin{cases} + & f(x) > 0 \\ - & f(x) < 0 \end{cases}$
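To make the decision rule concrete, here is a minimal Python sketch; the weight vector and intercept are made-up values for illustration, not learned from data.

```python
# A minimal sketch of prediction with a linear discriminant.
import numpy as np

w = np.array([1.0, -0.5])  # hypothetical weights for (x1, x2)
b = 0.2                    # hypothetical intercept

def f(x):
    """A linear discriminant function: f(x) = w . x + b."""
    return np.dot(w, x) + b

def predict(x):
    """Label '+' if f(x) > 0, '-' otherwise."""
    return '+' if f(x) > 0 else '-'

print(predict(np.array([2.0, 1.0])))  # '+' (f = 2.0 - 0.5 + 0.2 = 1.7 > 0)
```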
A quick survey of some solutions
Decision trees

[Figure: the same (x1, x2) data split by axis-parallel thresholds a1 and a2; the corresponding tree first tests x2 < a2, then x1 < a1, with + and − labels at the leaves]

- Note that the decision boundary is non-linear (a sketch follows below)
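A minimal sketch of the pictured two-split tree as code; the thresholds a1, a2 and the branch/leaf labels are hypothetical, since the figure does not fully determine them.

```python
# A hypothetical version of the two-split decision tree from the figure;
# thresholds and leaf labels are made up for illustration.
a1, a2 = 2.5, 1.5

def tree_predict(x1, x2):
    """Two axis-parallel splits; the resulting boundary is piecewise,
    hence non-linear."""
    if x2 < a2:
        return '-'
    if x1 < a1:
        return '+'
    return '-'

print(tree_predict(1.0, 2.0))  # '+'
```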
A quick survey of some solutions
Instance/memory based methods

[Figure: the labeled instances with a new instance marked ‘?’]

- No training: just memorize the instances
- During test time, decide based on the k nearest neighbors (a sketch follows below)
- Like decision trees, kNN is non-linear
- It can also be used for regression
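A minimal k-nearest-neighbor sketch with made-up training points; Euclidean distance and k = 3 are arbitrary choices here.

```python
# A minimal kNN sketch: majority vote among the k nearest training points.
import numpy as np
from collections import Counter

X_train = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 6.0]])
y_train = ['+', '+', '-', '-']

def knn_predict(x, k=3):
    """Label x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(np.array([2.0, 2.0])))  # '+': two of three neighbors are '+'
```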
A quick survey of some solutions
Probability-based solutions

[Figure: the labeled instances with estimated class-conditional densities; a new instance is marked ‘?’]

- Estimate the distributions p(x | y = +) and p(x | y = −) from the training data
- Assign a new item to the class c with the highest p(x | y = c) (a sketch follows below)
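One concrete (but by no means the only) way to estimate p(x | y = c) is to fit one Gaussian per class; the diagonal-covariance assumption and the data points below are mine, not the slide's.

```python
# A minimal generative-classifier sketch: one diagonal Gaussian per class,
# classify by the higher class-conditional density p(x | y = c).
import numpy as np

X = {'+': np.array([[1.0, 2.0], [2.0, 3.0], [1.5, 2.5]]),
     '-': np.array([[6.0, 5.0], [7.0, 6.0], [6.5, 5.5]])}

# Per-class mean and variance (a small floor keeps variances positive).
params = {c: (Xc.mean(axis=0), Xc.var(axis=0) + 1e-6) for c, Xc in X.items()}

def log_density(x, mean, var):
    """Log density of a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify(x):
    """Pick the class c maximizing p(x | y = c)."""
    return max(params, key=lambda c: log_density(x, *params[c]))

print(classify(np.array([2.0, 2.0])))  # '+'
```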
A quick survey of some solutions
Artificial neural networks

[Figure: the labeled instances next to a small network diagram with inputs x1, x2 and output y]
The perceptron

[Figure: a single unit with inputs x0 = 1, x1, …, xn, weights w0, w1, …, wn, and output y]

$y = f\left(\sum_{i=0}^{n} w_i x_i\right) \quad \text{where} \quad f(a) = \begin{cases} +1 & \text{if } a > 0 \\ -1 & \text{otherwise} \end{cases}$

Similar to the intercept in linear models, an additional input x0 which is always set to one is often used (called bias in the ANN literature).
The perceptron: in plain words

- Sum all inputs xi, weighted by the corresponding weights wi
- Classify the input using a threshold function: positive if the sum is larger than 0, negative otherwise (a runnable sketch of this rule follows below)
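A minimal sketch of the perceptron decision rule; the weights (including the bias weight w0 for the constant input x0 = 1) are made-up values for illustration.

```python
# A minimal sketch of the perceptron decision rule.
import numpy as np

w = np.array([0.1, 0.8, -0.4])  # hypothetical w0 (bias), w1, w2

def perceptron_predict(x, w):
    """Return +1 if the weighted sum (with bias input 1) exceeds 0, else -1."""
    x = np.concatenate(([1.0], x))  # prepend the constant bias input x0 = 1
    return 1 if np.dot(w, x) > 0 else -1

print(perceptron_predict(np.array([1.0, 1.0]), w))  # +1: 0.1 + 0.8 - 0.4 > 0
```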
Learning with perceptron

- We do not update the parameters if the classification is correct
- For misclassified examples, we try to minimize
  $E(w) = -\sum_{i} w \cdot x_i \, y_i$
  where i ranges over all misclassified examples
- The perceptron algorithm updates the weights such that
  $w \leftarrow w - \eta \nabla E(w)$, that is, $w \leftarrow w + \eta \, x_i y_i$
  for a misclassified example ($\eta$ is the learning rate)
The perceptron algorithm

- The perceptron algorithm can be run
  – online: updating weights for a single misclassified example at a time
  – batch: updating weights for all misclassified examples at once
- The perceptron algorithm converges to the global minimum if the classes are linearly separable
- If the classes are not linearly separable, the algorithm will not stop
- We do not know whether the classes are linearly separable or not before the algorithm converges
Perceptron algorithm (online): demonstration

[Figure: a sequence of plots showing the weight vector w and the decision boundary (orthogonal to w) after each update]

1. Randomly initialize w (the decision boundary is orthogonal to w)
2. Pick a misclassified example xi
3. Set w ← w + yixi; go to step 2 until convergence

Note that with every update the set of misclassified examples changes. A runnable sketch of this loop follows below.
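A minimal sketch of the online perceptron algorithm on a tiny linearly separable toy set; the data points and learning rate are made up.

```python
# A minimal online perceptron loop on made-up, linearly separable data.
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
X = np.hstack([np.ones((len(X), 1)), X])  # prepend the bias input x0 = 1

rng = np.random.default_rng(0)
w = rng.normal(size=X.shape[1])  # step 1: random initialization
eta = 1.0                        # learning rate

converged = False
while not converged:
    converged = True
    for xi, yi in zip(X, y):
        if yi * np.dot(w, xi) <= 0:  # step 2: a misclassified example
            w += eta * yi * xi       # step 3: update the weights
            converged = False        # the misclassified set has changed

print(w)  # a separating weight vector (guaranteed only if data are separable)
```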
Perceptron: a bit of history

- The perceptron was developed in the late 1950s and early 1960s (Rosenblatt 1958)
- It caused excitement in many fields, including computer science, artificial intelligence, and cognitive science
- The excitement (and funding) died away in the early 1970s (after the criticism by Minsky and Papert 1969)
- The main issue was the fact that the perceptron algorithm cannot handle problems that are not linearly separable
Logistic regression

- Logistic regression is a classification method
- In logistic regression, we fit a model that predicts P(y | x)
- Logistic regression is an extension of linear regression – it is a member of the family of models called generalized linear models
- It is typically formulated for binary classification, but it has a natural extension to multiple classes
- Multi-class logistic regression is often called a maximum-entropy model (or max-ent) in the NLP literature
Why not linear regression?

[Figure: binary (0/1) outcomes plotted against a single predictor, with a fitted least-squares line extending below 0 and above 1]

- What is P(y | x = 2)?
- Is RMS error appropriate?
Fixing the outcome: transforming the output variable

Instead of predicting the probability p, we predict logit(p):

$\hat{y} = \mathrm{logit}(p) = \log \frac{p}{1-p} = w_0 + w_1 x$

- $\frac{p}{1-p}$ (the odds) is bounded between 0 and ∞
- $\log \frac{p}{1-p}$ (the log odds) is bounded between −∞ and ∞
- We can estimate logit(p) with regression, and convert it to a probability using the inverse of the logit:

$\hat{p} = \frac{e^{w_0 + w_1 x}}{1 + e^{w_0 + w_1 x}} = \frac{1}{1 + e^{-w_0 - w_1 x}}$

which is called the logistic function (or sometimes the sigmoid function, with some ambiguity).
Logistic function

$\mathrm{logistic}(x) = \frac{1}{1 + e^{-x}}$

[Figure: the S-shaped logistic curve, rising from 0 towards 1, with value 0.5 at x = 0]
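A minimal numerical sketch confirming that the logistic function inverts the logit:

```python
# Logit and its inverse, the logistic function.
import numpy as np

def logit(p):
    """Log odds: log(p / (1 - p)), maps (0, 1) to (-inf, inf)."""
    return np.log(p / (1 - p))

def logistic(x):
    """Inverse of the logit: 1 / (1 + exp(-x)), maps back to (0, 1)."""
    return 1 / (1 + np.exp(-x))

p = 0.8
print(logit(p))            # ~1.386 (log odds of 0.8)
print(logistic(logit(p)))  # 0.8 again: logistic inverts logit
```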
How to fit a logistic regression model

Reminder:

$P(y = 1 \mid x) = p = \frac{1}{1 + e^{-wx}} \qquad P(y = 0 \mid x) = 1 - p = \frac{e^{-wx}}{1 + e^{-wx}}$

The likelihood of the training set is

$L(w) = \prod_i P(y_i \mid x_i) = \prod_i p_i^{y_i} (1 - p_i)^{1 - y_i}$

In practice, maximizing the log likelihood is more practical:

$\log L(w) = \sum_i y_i \log p_i + (1 - y_i) \log (1 - p_i)$

$\nabla \log L(w) = \sum_i \left( y_i - \frac{1}{1 + e^{-w x_i}} \right) x_i$
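The log likelihood and its gradient translate directly into code; a minimal numpy sketch, assuming X includes a constant column for the intercept:

```python
# Log likelihood and gradient for logistic regression.
import numpy as np

def log_likelihood(w, X, y):
    """sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ]."""
    p = 1 / (1 + np.exp(-X @ w))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(w, X, y):
    """sum_i (y_i - p_i) x_i, the gradient of the log likelihood."""
    p = 1 / (1 + np.exp(-X @ w))
    return X.T @ (y - p)
```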
How to fit a logistic regression model (2)

- Bad news: there is no analytic solution
- Good news: the (negative) log likelihood is a convex function
- We can use iterative methods such as gradient descent to find the parameters that maximize the (log) likelihood
- Using gradient descent, we repeat
  $w \leftarrow w - \alpha \nabla J(w)$
  until convergence, where J(w) is the objective to minimize (here, the negative log likelihood) and α is called the learning rate; a sketch follows below
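A minimal fitting loop using the gradient above; ascending the log likelihood is the same as descending its negative J(w). The toy data, learning rate, and iteration count are made up.

```python
# Gradient ascent on the log likelihood (descent on J(w) = -log L(w)).
import numpy as np

X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, -0.5],
              [1.0,  0.5], [1.0,  1.0], [1.0,  2.0]])  # bias column + x
y = np.array([0, 0, 1, 0, 1, 1])

w = np.zeros(X.shape[1])
alpha = 0.1  # learning rate

for _ in range(1000):
    p = 1 / (1 + np.exp(-X @ w))
    w += alpha * X.T @ (y - p)  # step along the gradient of log L(w)

print(w)  # fitted intercept and slope
```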
Example: logistic regression with a single predictor

[Figure: binary outcomes against x with the fitted logistic curve]

$p = \frac{1}{1 + e^{0.33 + 2.41x}}$
Another example: two predictors

[Figure: two classes in the (x1, x2) plane separated by the line 0.1 − 2.53 x1 + 2.58 x2 = 0]

$p = \frac{1}{1 + e^{-(0.1 - 2.53 x_1 + 2.58 x_2)}}$
Logistic regression as a generalized linear model
A short digression into statistics

Logistic regression is a special case of generalized linear models (GLMs). GLMs are expressed as

$g(y) = Xw + \epsilon$

- The function g() is called the link function
- ϵ is distributed according to a distribution from the exponential family
- For logistic regression, g() is the logit function and ϵ is distributed binomially
More than two classes

- Some algorithms can naturally be extended to multiple labels
- Others tend to work well only in binary classification
- Any binary classifier can be turned into a k-way classifier by either of the following (see the sketch after this list):
  – training k one-vs.-rest (OvR, also called one-vs.-all, OvA) classifiers; decisions are made based on the class with the highest confidence score; this approach is feasible for classifiers that assign a weight or probability to the individual classes
  – training k(k−1)/2 one-vs.-one (OvO) classifiers; decisions are made based on majority voting
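A minimal one-vs.-rest sketch built on scikit-learn's LogisticRegression; any binary classifier with a confidence score would do, and the data are made up.

```python
# One-vs.-rest: one binary classifier per class, pick the most confident.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1, 2], [2, 1], [5, 5], [6, 4], [1, 6], [2, 7]])
y = np.array([0, 0, 1, 1, 2, 2])

# Train one binary classifier per class: class k vs. the rest.
clfs = {k: LogisticRegression().fit(X, (y == k).astype(int))
        for k in np.unique(y)}

def ovr_predict(x):
    """Pick the class whose classifier is most confident that x belongs to it."""
    scores = {k: clf.predict_proba([x])[0][1] for k, clf in clfs.items()}
    return max(scores, key=scores.get)

print(ovr_predict([5.5, 4.5]))  # 1
```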
One vs. Rest

[Figure: three classes (+, −, ×) in the (x1, x2) plane, with the three one-vs.-rest decision boundaries and the ambiguous regions between them]

- For 3 classes, we fit 3 classifiers, each separating one class from the rest
- Some regions of the feature space will be ambiguous
- We can assign labels based on the probability or weight value, if the classifier returns one
- One-vs.-one with majority voting is another option
Multi-class logistic regression

- Generalizing logistic regression to more than two classes is straightforward
- We estimate
  $P(C_k \mid x) = \frac{e^{w_k x}}{\sum_j e^{w_j x}}$
  where Ck is the kth class and j iterates over all classes
- This function is also known as the softmax function, used frequently in neural network models as well (a sketch follows below)
- This model is also known as a log-linear model, a maximum entropy model, or a Boltzmann machine
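A minimal softmax sketch: per-class scores $w_k \cdot x$ turned into probabilities; the weight vectors are made-up values for illustration.

```python
# Softmax over per-class linear scores.
import numpy as np

W = np.array([[0.5, -1.0],   # hypothetical w_1
              [0.0,  0.3],   # hypothetical w_2
              [-0.5, 0.7]])  # hypothetical w_3

def softmax(z):
    """exp(z_k) / sum_j exp(z_j), shifted by max(z) for numerical stability."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

x = np.array([1.0, 2.0])
print(softmax(W @ x))  # P(C_k | x) for k = 1..3; the values sum to 1
```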
Summary

- We discussed two basic classification techniques: perceptron and logistic regression
- We left out many others: naive Bayes, SVMs, decision trees, …
- We will discuss some (non-linear) classification methods later

Next:
- Fri: n-grams (continued)
- Mon: tokenization, normalization, segmentation
- Wed: more machine learning
Additional reading, references, credits

- Hastie, Tibshirani, and Friedman (2009) cover logistic regression in section 4.4 and the perceptron in section 4.5
- Jurafsky and Martin (2009) cover the topic in section 6.6; it is moved to its own chapter (7) in the draft third edition

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edition. Springer Series in Statistics. Springer-Verlag New York. ISBN: 9780387848587. URL: http://web.stanford.edu/~hastie/ElemStatLearn/.

Jurafsky, Daniel and James H. Martin (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Second edition. Pearson Prentice Hall. ISBN: 978-0-13-504196-3.

Minsky, Marvin and Seymour Papert (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.

Rosenblatt, Frank (1958). “The perceptron: a probabilistic model for information storage and organization in the brain.” In: Psychological Review 65.6, pp. 386–408.