 
Statistical Natural Language Processing
Classification
Çağrı Çöltekin
University of Tübingen, Seminar für Sprachwissenschaft
Summer Semester 2017
Perceptron | Logistic Regression | More than two classes

When/why do we do classification

As opposed to regression, the outcome is a ‘category’.
• Is a given email spam or not?
• What is the gender of the author of a document?
• Is a product review positive or negative?
• Who is the author of a document?
• What is the subject of an article?

Ç. Çöltekin, SfS / University of Tübingen, Summer Semester 2017
The task

[Figure: training instances labeled + and − in the (x1, x2) plane, with one unlabeled instance marked ?]
A quick survey of some solutions
(Linear) discriminant functions

• Find a discriminant function (f) that separates the training instances best (for a definition of ‘best’)
• Use the discriminant to predict the label of unknown instances

$$\hat{y} = \begin{cases} + & f(x) > 0 \\ - & f(x) < 0 \end{cases}$$

[Figure: + and − training instances in the (x1, x2) plane, with one unlabeled instance marked ?]
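The prediction rule for a linear discriminant can be sketched as follows. This is a minimal illustration, not the lecture's own code: the linear form f(x) = w · x + b, the names `discriminant` and `predict`, the parameter values, and the choice to send the undefined case f(x) = 0 to '-' are all assumptions made for the example.

```python
import numpy as np

def discriminant(x, w, b):
    """Linear discriminant f(x) = w . x + b (assumed form)."""
    return float(np.dot(w, x) + b)

def predict(x, w, b):
    """Return '+' if f(x) > 0, otherwise '-' (f(x) == 0 falls to '-' by assumption)."""
    return "+" if discriminant(x, w, b) > 0 else "-"

# Hypothetical parameters: the decision boundary is the line x1 + x2 = 1.
w = np.array([1.0, 1.0])
b = -1.0

label_a = predict(np.array([2.0, 2.0]), w, b)  # f(x) = 3
label_b = predict(np.array([0.0, 0.0]), w, b)  # f(x) = -1
```

Only the sign of f(x) matters for the predicted label; how w and b are chosen is what distinguishes the methods that follow.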
A quick survey of some solutions
Decision trees

• Note that the decision boundary is non-linear

[Figure: a decision tree with the tests x2 < a2 and x1 < a1 (yes/no branches, + and − leaves), next to the (x1, x2) plane with the thresholds a1 and a2 marked and one unlabeled instance marked ?]
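A two-test tree of this kind can be written as nested conditionals. The sketch below is only illustrative: the order of the tests, the leaf labels, and the threshold values are hypothetical, since the figure's exact layout cannot be recovered from the text.

```python
def tree_predict(x1, x2, a1, a2):
    """Classify a point with two axis-aligned threshold tests.

    Hypothetical tree: leaf labels and test order are assumptions.
    """
    if x2 < a2:
        return "-"   # everything below the horizontal threshold a2
    if x1 < a1:
        return "+"   # upper-left region: x2 >= a2 and x1 < a1
    return "-"       # upper-right region

# Hypothetical thresholds a1 = a2 = 1.
upper_left = tree_predict(0.0, 2.0, a1=1.0, a2=1.0)
upper_right = tree_predict(2.0, 2.0, a1=1.0, a2=1.0)
lower = tree_predict(0.0, 0.0, a1=1.0, a2=1.0)
```

Under these assumptions the '+' region is a rectangular corner of the plane, which no single straight line can carve out; this is the sense in which the decision boundary is non-linear.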
A quick survey of some solutions
Instance/memory based methods

• No training: just memorize the instances
• During test time, decide based on the k nearest neighbors
• Like decision trees, kNN is non-linear
• It can also be used for regression

[Figure: + and − training instances in the (x1, x2) plane, with one unlabeled instance marked ?]
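A k-nearest-neighbors classifier can be sketched in a few lines. The slide does not fix a distance measure or a voting rule, so Euclidean distance and a simple majority vote are assumptions here, as are the function name `knn_predict` and the toy data.

```python
from collections import Counter

import numpy as np

def knn_predict(x, X_train, y_train, k=3):
    """Label x by majority vote among its k nearest training instances."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance (assumed)
    nearest = np.argsort(dists)[:k]               # indices of the k closest instances
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# "Training" is just storing the instances (hypothetical toy data).
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_train = ["-", "-", "-", "+", "+", "+"]

near_plus = knn_predict(np.array([3.5, 3.5]), X_train, y_train, k=3)
near_minus = knn_predict(np.array([0.2, 0.2]), X_train, y_train, k=3)
```

All the work happens at test time: each prediction scans the stored instances, which is the price of having no training phase.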
A quick survey of some solutions
Probability-based solutions

• Estimate distributions of p(x | y = +) and p(x | y = −) from the training data
• Assign the new items to the class c with the highest p(x | y = c)

[Figure: + and − training instances in the (x1, x2) plane, with one unlabeled instance marked ?]
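The two steps above can be sketched as follows. The slide does not say which family of distributions to estimate, so this example assumes one Gaussian per class with independent features; the function names, the small variance floor, and the toy data are likewise assumptions.

```python
import numpy as np

def fit_gaussians(X, y):
    """Estimate mean and per-feature variance of p(x | y = c) for each class c."""
    labels = np.array(y)
    params = {}
    for c in sorted(set(y)):
        Xc = X[labels == c]
        # A small constant keeps the variance strictly positive (assumption).
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-6)
    return params

def log_likelihood(x, mean, var):
    """log p(x | y = c) under an independent-feature Gaussian."""
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

def classify(x, params):
    """Pick the class c with the highest p(x | y = c)."""
    return max(params, key=lambda c: log_likelihood(x, *params[c]))

# Hypothetical toy data: two well-separated clusters.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0], [5.0, 5.0]])
y = ["-", "-", "-", "-", "+", "+", "+", "+"]

params = fit_gaussians(X, y)
label_hi = classify(np.array([4.5, 4.5]), params)
label_lo = classify(np.array([0.5, 0.5]), params)
```

Comparing log-likelihoods instead of raw densities gives the same winner and avoids numerical underflow.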
A quick survey of some solutions
Artificial neural networks

[Figure: + and − instances in the (x1, x2) plane, and a network diagram with inputs x1, x2 and output y]
The perceptron

$$y = f\left(\sum_{i}^{n} w_i x_i\right)$$

where

$$f(x) = \begin{cases} +1 & \text{if } \sum_{i}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$$

Similar to the intercept in linear models, an additional input $x_0$ which is always set to one is often used (called bias in ANN literature).

[Figure: a perceptron unit with inputs x0 = 1, x1, x2, …, xn, weights w0, w1, w2, …, wn, and output y]
The perceptron: in plain words

• Sum all inputs $x_i$, each weighted with the corresponding weight $w_i$
• Classify the input using a threshold function: positive if the sum is larger than 0, negative otherwise

[Figure: a perceptron unit with inputs x0 = 1, x1, x2, …, xn, weights w0, w1, w2, …, wn, and output y]
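The weighted sum followed by a threshold can be sketched as below. The function name `perceptron_output`, the convention of storing the bias weight w_0 first, and the example weights are assumptions made for illustration.

```python
import numpy as np

def perceptron_output(x, w):
    """Return +1 if the weighted sum is larger than 0, otherwise -1.

    x holds the inputs x_1..x_n; w holds the weights w_0..w_n, where w_0
    pairs with the bias input x_0 = 1.
    """
    x_with_bias = np.concatenate(([1.0], x))   # prepend x_0 = 1
    total = float(np.dot(w, x_with_bias))      # sum_i w_i x_i
    return 1 if total > 0 else -1

# Hypothetical weights: w_0 = -1, w_1 = 1, w_2 = 1.
w = np.array([-1.0, 1.0, 1.0])
out_pos = perceptron_output(np.array([2.0, 2.0]), w)  # sum = 3
out_neg = perceptron_output(np.array([0.0, 0.0]), w)  # sum = -1
```

With the bias input in place, the threshold can stay fixed at 0 while w_0 shifts the decision boundary away from the origin.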
Learning with perceptron

• We do not update the parameters if classification is correct
• For misclassified examples, we try to minimize
$$E(w) = -\sum_i w x_i y_i$$
where $i$ ranges over all misclassified examples
• Perceptron algorithm updates the weights such that
$$w \leftarrow w - \eta \nabla E(w)$$
$$w \leftarrow w + \eta x_i y_i$$
for a misclassified example ($\eta$ is the learning rate)
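The step from the gradient update to the per-example update can be spelled out. The derivation below assumes the set of misclassified examples is held fixed while differentiating, and writes $w'$ (a symbol introduced here) for the weights after one update.

```latex
\begin{align*}
E(w) &= -\sum_i w x_i y_i \\
\nabla E(w) &= -\sum_i x_i y_i \\
w - \eta \nabla E(w) &= w + \eta \sum_i x_i y_i
\end{align*}
```

With a single misclassified example the sum has one term, which gives $w \leftarrow w + \eta x_i y_i$. For that example the update changes its term in the sum to $w' x_i y_i = w x_i y_i + \eta \lVert x_i \rVert^2$, so its score moves toward the correct side of the boundary, although one update need not be enough to classify it correctly.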
The perceptron algorithm

• The perceptron algorithm can be
  online: update weights for a single misclassified example
  batch: update weights for all misclassified examples at once
• The perceptron algorithm converges to the global minimum if the classes are linearly separable
• If the classes are not linearly separable, the perceptron algorithm will not stop
• We do not know whether the classes are linearly separable or not before the algorithm converges
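The batch variant, and the practical consequence of the non-stopping behavior, can be sketched as follows. Because the algorithm may never stop on non-separable data, this sketch adds an epoch cap; that cap, the zero initialization, the function name `batch_perceptron`, and the toy data are assumptions, not part of the lecture.

```python
import numpy as np

def batch_perceptron(X, y, eta=1.0, max_epochs=100):
    """Batch perceptron; X includes the bias column x_0 = 1, y is in {-1, +1}.

    Returns (w, converged). max_epochs is a safeguard added for this sketch.
    """
    w = np.zeros(X.shape[1])                     # zero start (assumption)
    for _ in range(max_epochs):
        pred = np.where(X @ w > 0, 1, -1)        # threshold function
        wrong = pred != y                        # misclassified under current w
        if not wrong.any():
            return w, True                       # no misclassified examples left
        # Update with all misclassified examples at once.
        w = w + eta * (y[wrong][:, None] * X[wrong]).sum(axis=0)
    return w, False

# Hypothetical linearly separable data (first column is the bias input).
X_sep = np.array([[1.0, 2.0, 2.0], [1.0, 3.0, 3.0],
                  [1.0, 0.0, 0.0], [1.0, -1.0, 0.0]])
y_sep = np.array([1, 1, -1, -1])
w_sep, ok_sep = batch_perceptron(X_sep, y_sep)

# XOR is not linearly separable, so the loop only ends because of the cap.
X_xor = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
y_xor = np.array([-1, 1, 1, -1])
w_xor, ok_xor = batch_perceptron(X_xor, y_xor)
```

Hitting the cap does not prove that the classes are inseparable; it only means convergence was not reached within the allowed number of epochs.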
Perceptron algorithm (online)

1. Randomly initialize $w$ (the decision boundary is orthogonal to $w$)
2. Pick a misclassified example $x_i$ (add $y_i x_i$ to $w$)
3. Set $w \leftarrow w + y_i x_i$, go to step 2 until convergence

Note that with every update the set of misclassified examples changes.

[Figure: demonstration of the updates, showing w and the decision boundary]
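The three steps of the online algorithm can be sketched as follows. The cap on the number of updates, the normal distribution used for the random start, the fixed seed, the function name `online_perceptron`, and the toy data are assumptions added for the example.

```python
import numpy as np

def online_perceptron(X, y, max_updates=1000, seed=0):
    """Online perceptron; X includes the bias column x_0 = 1, y is in {-1, +1}.

    Returns (w, converged). max_updates is a safeguard added for this sketch.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])              # step 1: random initialization
    for _ in range(max_updates):
        pred = np.where(X @ w > 0, 1, -1)
        wrong = np.flatnonzero(pred != y)        # recomputed after every update
        if wrong.size == 0:
            return w, True                       # convergence: nothing misclassified
        i = rng.choice(wrong)                    # step 2: pick a misclassified example
        w = w + y[i] * X[i]                      # step 3: w <- w + y_i x_i
    return w, False

# Hypothetical linearly separable data (first column is the bias input).
X = np.array([[1.0, 2.0, 2.0], [1.0, 3.0, 3.0],
              [1.0, 0.0, 0.0], [1.0, -1.0, 0.0]])
y = np.array([1, 1, -1, -1])
w, converged = online_perceptron(X, y)
```

The misclassified set is recomputed inside the loop because each update can fix some examples and break others.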
Perceptron: a bit of history

• The perceptron was developed in the late 1950s and early 1960s (Rosenblatt 1958)
• It caused excitement in many fields including computer science, artificial intelligence, cognitive science
• The excitement (and funding) died away in the early 1970s (after the criticism by Minsky and Papert 1969)
• The main issue was the fact that the perceptron algorithm cannot handle problems that are not linearly separable