Supervised Classification with the Perceptron
CMSC 470 Marine Carpuat
Slides credit: Hal Daume III & Piyush Rai
Last time
Word senses distinguish different meanings of the same word
Sense inventories
Annotation issues and annotator
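[Figure: supervised classification pipeline. Training: training data labeled label1, label2, label3, label4 goes through feature functions into a supervised machine learning algorithm, which yields a classifier. Testing: an unlabeled document goes through feature functions and the classifier chooses among label1? label2? label3? label4?]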
                correct    not correct
selected        tp         fp
not selected    fn         tn
Precision: % of selected items that are correct
Recall: % of correct items that are selected
Q: When are Precision/Recall more informative than accuracy?
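One way to explore this question is to compute all three measures from the four cells of the contingency table. The sketch below uses made-up counts (an assumption for illustration, not data from the lecture) in which very few items are correct, so a classifier that selects almost nothing still gets high accuracy.

```python
def precision(tp, fp):
    """Fraction of selected items that are correct."""
    selected = tp + fp
    return tp / selected if selected else 0.0


def recall(tp, fn):
    """Fraction of correct items that are selected."""
    correct = tp + fn
    return tp / correct if correct else 0.0


def accuracy(tp, fp, fn, tn):
    """Fraction of all items that received the right decision."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical counts: 1000 items, only 10 of them correct;
# the classifier selects 2 items and 1 of those is correct.
tp, fp, fn, tn = 1, 1, 9, 989

print("accuracy :", accuracy(tp, fp, fn, tn))
print("precision:", precision(tp, fp))
print("recall   :", recall(tp, fn))
```

With these counts accuracy is dominated by the large tn cell, while recall exposes that most correct items were missed.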
F measure (weighted harmonic mean of precision P and recall R):
F = 1 / (α(1/P) + (1 − α)(1/R)) = (β² + 1)PR / (β²P + R)
With β² = 1/α − 1
Balanced F1 (β = 1, i.e. α = 1/2):
F = 2PR/(P+R)
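The F measure can be computed directly from P and R. This is a minimal sketch assuming the usual β parameterization of the weighted harmonic mean, where β = 1 gives F = 2PR/(P+R); the function name `f_measure` and the choice to return 0 when both P and R are 0 are illustrative assumptions.

```python
def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r.

    beta > 1 weights recall more heavily, beta < 1 weights precision more.
    """
    if p == 0 and r == 0:
        return 0.0  # convention chosen here to avoid dividing by zero
    b2 = beta ** 2
    return (b2 + 1) * p * r / (b2 * p + r)


p, r = 0.5, 0.1
print("F1  :", f_measure(p, r))
print("F2  :", f_measure(p, r, beta=2.0))   # leans toward recall
print("F0.5:", f_measure(p, r, beta=0.5))   # leans toward precision
```

Because it is a harmonic mean, F stays close to the smaller of P and R, so a classifier cannot score well by maximizing only one of them.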
A simple Supervised Classifier
Task definition
Input: an example x, typically a vector of binary or real values
A fixed set of classes Y = {y1, y2, …, yJ}, e.g. word senses from WordNet
Classifier definition: a function f: x → f(x) = y
Many different types of functions/classifiers can be defined
regression, neural networks.
possible
that co-occur in a window of +/- k words around “bass”
values, or tf.idf, or PPMI, etc.
entire sentence
use POS tags
f(x) = sign(w.x + b)
[Figure: weight vectors w_old and w_new]
x is often called the feature vector
features of the input that are expected to correlate with predictions
w and b are the parameters of the classifier
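The decision rule f(x) = sign(w.x + b) can be sketched in a few lines. The function name `predict`, the plain-list representation of vectors, and the choice to map an activation of exactly 0 to -1 are assumptions made for this illustration.

```python
def predict(w, b, x):
    """Return +1 or -1 according to sign(w.x + b).

    w: list of weights, b: bias, x: feature vector of the same length as w.
    """
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else -1  # activation == 0 is treated as -1 here


# Hypothetical parameters and inputs
w, b = [1.0, -2.0], 0.5
print(predict(w, b, [3.0, 1.0]))
print(predict(w, b, [1.0, 1.0]))
```

Each weight says how strongly, and in which direction, its feature pushes the prediction; the bias b shifts the decision boundary away from the origin.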
MaxIter is a hyperparameter
All of the above affect the performance of the final classifier!
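To make the role of MaxIter concrete, here is a minimal sketch of the standard perceptron training loop, assuming labels in {+1, -1} and assuming MaxIter bounds the number of passes over the training data. The name `perceptron_train`, the zero initialization, and the early stop on a mistake-free pass are illustrative choices rather than details taken from the slides.

```python
def perceptron_train(data, max_iter):
    """Train a perceptron.

    data: list of (x, y) pairs, x a list of floats, y in {+1, -1}.
    max_iter: maximum number of passes over the data (hyperparameter).
    Returns the learned (w, b).
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(max_iter):
        mistakes = 0
        for x, y in data:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]  # move w toward y * x
                b += y
                mistakes += 1
        if mistakes == 0:  # assumption: stop early once a full pass is clean
            break
    return w, b


# Hypothetical, linearly separable toy data
toy = [([2.0, 2.0], 1), ([3.0, 1.0], 1), ([-1.0, -1.0], -1), ([-2.0, 0.0], -1)]
w, b = perceptron_train(toy, max_iter=10)
print(w, b)
```

Parameters are updated only when an example is misclassified, so the order of the examples and the value of MaxIter both influence which weights the loop ends with.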
weight vectors
can be rewritten as
separates positive from negative examples
ŷ = sign(w.x + b)
separable
examples incorrectly
error_train(θ)
error_true(θ)
error_train(θ) < error_true(θ)
error_test(θ)
parameters θ′, such that
didn’t
data; the resulting classifier doesn’t generalize
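A small sketch can illustrate a classifier that fits its training data and still fails to generalize. Everything here is hypothetical: the toy data, the `memorizer` classifier that looks up training examples and guesses +1 otherwise, and the `simple_rule` baseline. The error rate on a held-out test set stands in for the true error, which cannot be measured directly.

```python
def error_rate(classify, data):
    """Fraction of (x, y) pairs for which classify(x) != y."""
    wrong = sum(1 for x, y in data if classify(x) != y)
    return wrong / len(data)


# Hypothetical toy data: the label is +1 when the first feature is positive.
train = [((1.0, 0.0), 1), ((2.0, 1.0), 1), ((-1.0, 0.5), -1), ((-2.0, 1.0), -1)]
test = [((1.5, 0.2), 1), ((3.0, -1.0), 1), ((-0.5, 0.0), -1), ((-3.0, 2.0), -1)]

# Memorizes every training example exactly; guesses +1 for anything unseen.
memory = dict(train)


def memorizer(x):
    return memory.get(x, 1)


def simple_rule(x):
    return 1 if x[0] > 0 else -1


for name, clf in [("memorizer", memorizer), ("simple_rule", simple_rule)]:
    print(name, "train error:", error_rate(clf, train),
          "test error:", error_rate(clf, test))
```

The memorizer makes no training errors but gets half of the unseen test examples wrong, while the simple rule has the same error on both sets.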