SLIDE 1

Machine Learning for Computational Linguistics

Classification

Çağrı Çöltekin

University of Tübingen, Seminar für Sprachwissenschaft

May 3, 2016

SLIDE 2


Practical issues

▶ Homework 1: try to program it without help from specialized libraries (like NLTK)
▶ Time to think about projects: a short proposal is due towards the end of May.


SLIDE 3


The problem

[Figure: scatter plot of labeled training instances (+ and −) in the x1–x2 feature space, with an unlabeled instance marked '?']

▶ The response (outcome) is a label. In the example: positive (+) or negative (−)
▶ Given the features (x1 and x2), we want to predict the label of the unknown instance (?)
▶ Note: regression is not a good idea here


SLIDE 4


The problem (with a single predictor)

[Figure: the same problem with a single predictor; instances plotted along x1, with labels coded as y = 1 (+) and y = 0 (−)]


SLIDE 5


A quick survey of some solutions

Decision trees

[Figure: the scatter plot partitioned at thresholds a1 (on x1) and a2 (on x2), alongside the corresponding decision tree: test x1 < a1, then x2 < a2, with leaves labeled + and −]
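The slide does not tie this to any particular library, but as a runnable illustration, here is a toy decision tree with R's rpart package (the data points and settings are invented for this sketch):

    library(rpart)
    d <- data.frame(x1 = c(1, 2, 1, 4, 5, 4),
                    x2 = c(1, 1, 2, 4, 4, 5),
                    label = factor(c("+", "+", "+", "-", "-", "-")))
    # method = "class" requests a classification tree; minsplit = 2 lets
    # rpart split even on this tiny toy set
    tree <- rpart(label ~ x1 + x2, data = d, method = "class",
                  control = rpart.control(minsplit = 2))
    predict(tree, data.frame(x1 = 2, x2 = 2), type = "class")  # expected: "+"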


SLIDE 6


A quick survey of some solutions

Instance/memory based methods

[Figure: the same scatter plot; the '?' instance is labeled according to its nearest labeled neighbors]

▶ No training: just memorize the instances
▶ During test time, decide based on the k nearest neighbors
▶ Like decision trees, kNN is non-parametric
▶ It can also be used for regression
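For concreteness, a minimal kNN sketch using R's class package (the toy points are again invented):

    library(class)  # provides knn()
    train <- matrix(c(1, 1,  2, 1,  1, 2,   # three '+' instances
                      4, 4,  5, 4,  4, 5),  # three '-' instances
                    ncol = 2, byrow = TRUE)
    labels <- factor(c("+", "+", "+", "-", "-", "-"))
    # classify a new point by majority vote among its k = 3 nearest neighbors
    knn(train, test = matrix(c(2, 2), ncol = 2), cl = labels, k = 3)  # "+"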


SLIDE 7


A quick survey of some solutions

(Linear) discriminant functions

[Figure: scatter plot with a linear discriminant separating the + instances from the − instances]

▶ Find a discriminant function f that separates the training instances best (for some definition of 'best')
▶ Use the discriminant to predict the label of unknown instances:
\[ \hat{y} = \begin{cases} + & f(x) > 0 \\ - & f(x) < 0 \end{cases} \]



SLIDE 9


A quick survey of some solutions

Probability-based solutions

[Figure: scatter plot with estimated class-conditional densities over the + and − instances; a new '?' instance is assigned to the class with the higher density]

▶ Estimate the distributions p(x|y = +) and p(x|y = −) from the training data
▶ Assign the new items to the class c with the highest p(x|y = c)



SLIDE 11


A quick survey of some solutions

Artificial neural networks

[Figure: the scatter plot alongside a small neural network with inputs x1 and x2 and output y]


SLIDE 12


Logistic regression

▶ Logistic regression is a classification method
▶ In logistic regression, we fit a model that predicts P(y|x)
▶ Alternatively, logistic regression is an extension of linear regression. It is a member of the family of models called generalized linear models


SLIDE 13


A simple example

We would like to guess whether a child would develop dyslexia or not, based on a test applied to pre-verbal children. Here is a simplified version of the problem:

▶ We test children when they are less than 2 years of age.
▶ We want to predict the diagnosis from the test score
▶ The data looks like:

    Test score   Dyslexia
            82          0
            22          1
            62          1
           ...        ...

* The research question is from a real study by Ben Maasen and his colleagues. Data is fake as usual.

SLIDE 14


Example: fitting ordinary least squares regression

[Figure: OLS regression line fit to the binary outcome; x-axis: test score (20–100), y-axis: P(dyslexia|score)]

Problems:

▶ The probability values are not bounded between 0 and 1
▶ Residuals will be large for correct predictions
▶ Residuals are not distributed normally


SLIDE 15


Example: transforming the output variable

Instead of predicting the probability p, we predict logit(p):
\[ \hat{y} = \operatorname{logit}(p) = \log\frac{p}{1-p} = w_0 + w_1x \]

▶ p/(1−p) (the odds) is bounded between 0 and ∞
▶ log p/(1−p) (the log odds) is bounded between −∞ and ∞
▶ we can estimate logit(p) with regression, and convert it to a probability using the inverse of the logit,
\[ \hat{p} = \frac{e^{w_0+w_1x}}{1+e^{w_0+w_1x}} = \frac{1}{1+e^{-w_0-w_1x}} \]
which is called the logistic function (or sometimes the sigmoid function, with some ambiguity).
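As a quick numeric check of this algebra in R (the function names are mine, not the lecture's):

    logit    <- function(p) log(p / (1 - p))   # probability -> log odds
    logistic <- function(x) 1 / (1 + exp(-x))  # log odds -> probability (inverse logit)
    p <- c(0.1, 0.5, 0.9)
    all.equal(logistic(logit(p)), p)           # TRUE: the two functions are inverses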


SLIDE 16


Logit function

[Figure: the logit curve; x-axis: p from 0.0 to 1.0, y-axis: logit(p) from −4 to 4]
\[ \operatorname{logit}(p) = \log\frac{p}{1-p} \]


SLIDE 17


Logistic function

[Figure: the logistic curve; x-axis: x from −4 to 4, y-axis: logistic(x) from 0.0 to 1.0]
\[ \operatorname{logistic}(x) = \frac{1}{1+e^{-x}} \]


SLIDE 18


Logistic regression as a generalized linear model

Logistic regression is a special case of generalized linear models (GLMs). GLMs are expressed as
\[ g(y) = Xw + \epsilon \]

▶ The function g() is called the link function
▶ ϵ is distributed according to a distribution from the exponential family
▶ For logistic regression, g() is the logit function and ϵ is distributed binomially
▶ For linear regression, g() is the identity function and ϵ is distributed normally


SLIDE 19


Interpreting the dyslexia example

glm(formula = diag ~ score, family = binomial, data = dys)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  6.90079    2.31737   2.978  0.00290 **
score       -0.14491    0.04493  -3.225  0.00126 **

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 54.548  on 39  degrees of freedom
Residual deviance: 30.337  on 38  degrees of freedom
AIC: 34.337

Number of Fisher Scoring iterations: 5
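A sketch of how this model would be fit and used in R (assuming a data frame dys with columns score and diag, as in the call above):

    m <- glm(diag ~ score, family = binomial, data = dys)
    summary(m)   # produces the output shown above
    # predicted probability of dyslexia for a (hypothetical) score of 50:
    predict(m, newdata = data.frame(score = 50), type = "response")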


SLIDE 20


Interpreting the dyslexia example

[Figure: fitted logistic curve over the data; x-axis: test score (20–100), y-axis: P(dyslexia|score)]
\[ \operatorname{logit}(p) = 6.9 - 0.14x \qquad p = \frac{1}{1+e^{-6.9+0.14x}} \]


SLIDE 21


How to fit a logistic regression model

Reminder:
\[ P(y=1|x) = p = \frac{1}{1+e^{-wx}} \qquad P(y=0|x) = 1-p = \frac{e^{-wx}}{1+e^{-wx}} \]
The likelihood of the training set is
\[ L(w) = \prod_i P(y_i|x_i) = \prod_i p^{y_i}(1-p)^{1-y_i} \]
In practice, maximizing the log likelihood is more practical:
\[ \hat{w} = \arg\max_w \log L(w) = \arg\max_w \sum_i \left( y_i \log p + (1-y_i)\log(1-p) \right) \]
To maximize, we find the gradient:
\[ \nabla \log L(w) = \sum_i \left( y_i - \frac{1}{1+e^{-wx_i}} \right) x_i \]
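A direct R transcription of these formulas (the names and data layout are my own: X is an n×d matrix including an intercept column, y a 0/1 vector):

    logistic <- function(z) 1 / (1 + exp(-z))

    log_lik <- function(w, X, y) {
      p <- logistic(X %*% w)
      sum(y * log(p) + (1 - y) * log(1 - p))
    }

    grad_log_lik <- function(w, X, y) {
      p <- logistic(X %*% w)
      t(X) %*% (y - p)   # = sum_i (y_i - p_i) x_i
    }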


SLIDE 22


How to fit a logistic regression model (2)

▶ Bad news is that there is no analytic solution to the set of equations ∇ log L(w) = 0
▶ Good news is that the negative log likelihood is a convex function: there is a single global optimum
▶ We can use iterative methods such as gradient descent to find parameters that maximize the (log) likelihood
▶ In practice, it is more common to minimize the negative log likelihood J(w) = −log L(w). J(w) is called the loss function, cost function, or objective function
▶ Using gradient descent, we repeat
\[ w \leftarrow w - \alpha \nabla J(w) \]
until convergence. α is called the learning rate.
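Continuing the sketch above, a bare-bones gradient descent loop (the learning rate and stopping rule are illustrative choices, not the lecture's):

    fit_logreg <- function(X, y, alpha = 0.1, tol = 1e-6, max_iter = 10000) {
      w <- rep(0, ncol(X))            # start at the zero vector
      for (i in seq_len(max_iter)) {
        g <- grad_log_lik(w, X, y)    # gradient of log L(w)
        w_new <- w + alpha * g        # ascending log L(w) = descending J(w) = -log L(w)
        if (sum(abs(w_new - w)) < tol) break
        w <- w_new
      }
      w
    }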


SLIDE 23


An example with two predictors

Call:
glm(formula = label ~ x1 + x2, family = binomial, data = d)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.09692    4.74728   0.020   0.9837
x1          -2.53416    1.69222  -1.498   0.1343
x2           2.57632    1.36655   1.885   0.0594 .

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 19.408  on 13  degrees of freedom
Residual deviance:  7.987  on 11  degrees of freedom
AIC: 13.987

Number of Fisher Scoring iterations: 6


SLIDE 24


An example with two predictors (2)

[Figure: the two classes in the x1–x2 plane (both axes 1–5) with the fitted decision boundary]
\[ 0.1 - 2.53x_1 + 2.58x_2 = 0 \qquad p = \frac{1}{1+e^{-(0.1-2.53x_1+2.58x_2)}} \]
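The boundary is just the set of points where the fitted probability is 0.5; a small sketch of reading it off the coefficients (assumes the data frame d from the previous slide):

    w <- coef(glm(label ~ x1 + x2, family = binomial, data = d))
    # p = 0.5 exactly where w[1] + w[2]*x1 + w[3]*x2 = 0, so:
    boundary_x2 <- function(x1) -(w[1] + w[2] * x1) / w[3]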


SLIDE 25


More than two classes

▶ Some algorithms can naturally be extended to multiple labels
▶ Others tend to work well in binary classification
▶ Any binary classifier can be turned into a k-way classifier by either of the following (a code sketch follows the list):
  ▶ training k one-vs.-rest (OvR) or one-vs.-all (OvA) classifiers. Decisions are made based on the class with the highest confidence score. This approach is feasible for classifiers that assign a weight or probability to the individual classes
  ▶ training k(k−1)/2 one-vs.-one (OvO) classifiers. Decisions are made based on majority voting
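A schematic one-vs.-rest sketch with logistic regression as the base classifier (it assumes a data frame d with a factor column label and feature columns x1 and x2; all names are illustrative):

    ovr_fit <- function(d, classes) {
      lapply(classes, function(k) {
        d$is_k <- as.integer(d$label == k)   # this class vs. the rest
        glm(is_k ~ x1 + x2, family = binomial, data = d)
      })
    }

    ovr_predict <- function(models, classes, newdata) {
      # one probability per class; the highest-scoring class wins
      scores <- sapply(models, predict, newdata = newdata, type = "response")
      classes[max.col(matrix(scores, nrow = nrow(newdata)))]
    }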


SLIDE 31


One vs. Rest

[Figure: three classes (+, −, ×) in the x1–x2 plane with three one-vs.-rest decision boundaries]

▶ For 3 classes we fit 3 classifiers, each separating one class from the rest
▶ Some regions of the feature space will be ambiguous
▶ We can assign labels based on a probability or weight value, if the classifier returns one
▶ One-vs.-one and majority voting is another option


SLIDE 32

Multi-class logistic regression

▶ Generalizing logistic regression for more than two classes is straightforward
▶ We estimate
\[ P(C_k|x) = \frac{e^{w_kx}}{\sum_j e^{w_jx}} \]
where C_k is the kth class and j iterates over all classes.
▶ This model is also known as a log-linear model, maximum entropy model, or Boltzmann machine
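The normalization above is the softmax function; a tiny self-contained R illustration (the weight values are invented):

    softmax <- function(z) exp(z) / sum(exp(z))   # scores -> probabilities summing to 1

    # illustrative weights for 3 classes over (intercept, x1, x2)
    W <- matrix(c( 0.5, -1.0,  2.0,
                  -0.3,  0.8, -0.5,
                   0.1,  0.2,  0.3), nrow = 3, byrow = TRUE)
    x <- c(1, 0.7, -1.2)       # intercept term plus two feature values
    p <- softmax(W %*% x)      # P(C_k | x) for k = 1, 2, 3
    sum(p)                     # 1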
