CS 6956: Deep Learning for NLP
Multiclass Classification
So far: Binary Classification
- We have seen linear models for binary classification
- We can write down a loss for binary classification
- Common losses: hinge loss and log loss
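As a quick refresher, the two binary losses named above can be written down directly. This is a minimal sketch (the function names and the ±1 label convention are my choices, not from the slides), where `s` is the model's raw score w·x for an input:

```python
import math

def hinge_loss(s, y):
    """Binary hinge loss: max(0, 1 - y*s), with y in {+1, -1}."""
    return max(0.0, 1.0 - y * s)

def log_loss(s, y):
    """Binary log (logistic) loss: log(1 + exp(-y*s)), with y in {+1, -1}."""
    return math.log(1.0 + math.exp(-y * s))
```

The hinge loss is zero once the score has the right sign with margin at least 1; the log loss decreases smoothly toward zero but never reaches it.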
[Figure: motivating examples of multiclass classification — many handwritten variants that all map to the letter A; images to be labeled as car, tire, duck, or laptop]
Prediction: choose the label with the highest score,

argmax_i score(x, i)
We haven’t committed to the actual functional form of the score function. For now, we will assume that it is some parameterized function; our eventual goal will be to learn its parameters.
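One concrete choice of parameterization, to make this tangible: a linear score with one weight vector per label, so the parameters are the rows of a weight matrix `W`. This is only an illustrative sketch (the names `score`, `predict`, and `W` are my own), not the only functional form the course will use:

```python
def score(x, i, W):
    """Linear score for label i: dot product of the i-th weight vector with x."""
    return sum(w_j * x_j for w_j, x_j in zip(W[i], x))

def predict(x, W, num_labels):
    """Prediction rule: return the label with the highest score."""
    return max(range(num_labels), key=lambda i: score(x, i, W))
```

Learning then amounts to choosing `W` to make the true label's score the largest.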
The multiclass hinge loss:

max_i ( score(x, i) − score(x, y) + Δ(y, i) )

- score(x, i): the score for label i
- score(x, y): the score for the true label y
- Δ(y, i): the “loss” term, defined as Δ(y, i) = 0 if y = i, and 1 if y ≠ i

The loss is defined by the label whose score, when augmented by Δ, exceeds the score of the true label by the greatest amount.
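The definition above translates almost line for line into code. A minimal sketch (the function name is mine; `scores` is assumed to be the list of score(x, i) values for all labels, and `y` the index of the true label):

```python
def multiclass_hinge_loss(scores, y):
    """max over i of (score_i - score_y + delta(y, i)),
    where delta(y, i) = 0 if i == y, else 1."""
    return max(s - scores[y] + (0.0 if i == y else 1.0)
               for i, s in enumerate(scores))
```

Note that the term for i = y is exactly 0, so the loss is never negative: it is 0 precisely when the true label beats every other label by a margin of at least 1.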
Questions?