Natural Language Processing
Classification III
Dan Klein – UC Berkeley
Classification

Linear Models: Perceptron
The perceptron algorithm iteratively processes the training set, reacting to training errors. Can be thought of as…
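The mistake-driven update loop described here can be sketched as follows (binary case; the toy data, dimensions, and epoch count are illustrative, not from the slides):

```python
# Minimal binary perceptron: on each training error, move the weights
# toward the misclassified example (data and epochs are illustrative).

def perceptron(data, epochs=10):
    """data: list of (feature_vector, label) pairs, labels in {-1, +1}."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:             # training error: react to it
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

data = [([1.0, 1.0], +1), ([2.0, 1.5], +1),
        ([-1.0, -0.5], -1), ([-2.0, -1.0], -1)]
w = perceptron(data)
```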
- (potentially) more complex classifiers
- …several neighbors
- …functions, but not objective-driven learning
The dual perceptron:
- represent the weight vector as a sum of mistake vectors
- track update counts (the dual representation): a mistake count for each example i
- never need to build the weight vectors explicitly
- classification compares the input to all training candidates the algorithm made mistakes on during training
- as a result, kernelized methods tend to be extraordinarily slow
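A minimal sketch of the dual representation (toy data; only mistake counts are stored, and the weight vector is never built):

```python
# Dual perceptron sketch: keep a mistake count alpha[i] per training
# example, so w is implicitly sum_i alpha[i] * y_i * x_i and scoring
# needs only dot products with examples we made mistakes on.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def dual_perceptron(data, epochs=10):
    alpha = [0] * len(data)
    for _ in range(epochs):
        for j, (xj, yj) in enumerate(data):
            score = sum(a * yi * dot(xi, xj)
                        for a, (xi, yi) in zip(alpha, data) if a)
            if yj * score <= 0:
                alpha[j] += 1          # record the mistake
    return alpha

data = [([1.0, 1.0], +1), ([2.0, 1.5], +1), ([-1.0, -0.5], -1)]
alpha = dual_perceptron(data)
```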
Kernels:
- classification requires only a dot-product calculation
- a kernel substitutes a similarity function in place of the dot product
* Fine print: if your kernel doesn't satisfy certain technical requirements, lots of proofs break (e.g. convergence, mistake bounds). In practice, illegal kernels sometimes work (but not always).
Kernels implicitly map inputs into (possibly very high-dimensional) feature spaces, take the dot product there, and hand the result back.
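A concrete check of that claim, using the quadratic kernel as an example (the vectors are illustrative): the kernel value equals an ordinary dot product in an explicit degree-2 feature space that we never need to construct in practice.

```python
# Kernel sketch: K(x, z) = (x . z)^2 equals a dot product in the implicit
# feature space of all ordered degree-2 products x_i * x_j.

from itertools import product

def quad_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2

def quad_features(v):
    # Explicit (and much larger) feature map: all ordered pairs v_i * v_j.
    return [vi * vj for vi, vj in product(v, v)]

x, z = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]
implicit = quad_kernel(x, z)
explicit = sum(a * b for a, b in zip(quad_features(x), quad_features(z)))
```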
[Collins and Duffy 01]
Margin constraints: one per training example; we want a w that does not violate any of them.
Lagrangian duality: the optimum is a saddle point of the Lagrangian (this is a general property):
- the primal problem in w has the same value as the dual problem in α
- for fixed α, the Lagrangian has an unconstrained min in w that we can solve analytically.
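For reference, the standard hard-margin primal/dual pair this refers to (the soft-margin version adds an upper bound C on each α_i):

```latex
% Primal problem in w:
\min_{w} \; \tfrac{1}{2}\lVert w \rVert^2
\quad \text{s.t.} \quad y_i \,(w \cdot x_i) \ge 1 \;\; \forall i
% For fixed alpha, the Lagrangian
%   L(w,\alpha) = \tfrac{1}{2}\lVert w\rVert^2
%                 - \sum_i \alpha_i \left[ y_i (w \cdot x_i) - 1 \right]
% has an unconstrained min in w: setting \nabla_w L = 0 gives
%   w = \sum_i \alpha_i y_i x_i .
% Substituting back yields the dual problem in alpha:
\max_{\alpha \ge 0} \; \sum_i \alpha_i
  - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \,(x_i \cdot x_j)
```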
In the primal: gradient-based optimizers (e.g. CG, L-BFGS)
In the dual: take a (sub)gradient step, then clip to zero…
- the update is very fast and simple
- sometimes need slightly more complex methods (SMO, exponentiated gradient)
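A sketch of the simple (sub)gradient route in the primal, on the hinge loss (a Pegasos-style step-size schedule; the regularization strength, data, and epoch count are illustrative choices, not from the slides):

```python
# Primal (sub)gradient sketch for the regularized hinge loss.

def hinge_sgd(data, lam=0.01, epochs=50):
    dim = len(data[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # subgradient of (lam/2)*||w||^2 + max(0, 1 - margin)
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1:                 # hinge active: push toward x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

data = [([1.0, 1.0], +1), ([2.0, 1.5], +1),
        ([-1.0, -0.5], -1), ([-2.0, -1.0], -1)]
w = hinge_sgd(data)
```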
Support vectors: the examples whose margin constraint is active.
Sequential structure (input x → output y)
[Slides: Taskar and Klein 05]
Recursive structure (input x → output y)
x: What is the anticipated cost of collecting fees under the new proposal?
y: En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?
Tokenized: What is the anticipated cost of collecting fees under the new proposal ? / En vertu de les nouvelle propositions , quel est le coût prévu de perception de le droits ?
Combinatorial structure
Assumption: the score is a sum of local "part" scores.
Parts = nodes, edges, productions.
Prediction is an argmax over the space of feasible outputs.
Example features (production counts): #(NP → DT NN) … #(PP → IN NP) … #(NN → 'sea')
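A sketch of scoring with such production-count features: the tree encoding, feature names, and weights below are illustrative stand-ins.

```python
# Linear model over production counts: score(tree) = w . counts(tree).

from collections import Counter

def production_counts(tree):
    """tree: nested (label, children...) tuples; leaves are (POS, word)."""
    counts = Counter()
    def walk(node):
        label, *kids = node
        if len(kids) == 1 and isinstance(kids[0], str):
            counts[(label, kids[0])] += 1              # e.g. (NN, 'sea')
        else:
            counts[(label,) + tuple(k[0] for k in kids)] += 1
            for k in kids:
                walk(k)
    walk(tree)
    return counts

def score(tree, w):
    return sum(w.get(f, 0.0) * c for f, c in production_counts(tree).items())

tree = ('NP', ('DT', 'a'), ('NN', 'sea'))
w = {('NP', 'DT', 'NN'): 1.5, ('NN', 'sea'): 0.5}
total = score(tree, w)   # 1.5 + 0.5; the (DT, 'a') feature has weight 0
```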
Reranking:
x = “The screen was a sea of red.”
Baseline Parser → N-Best List (e.g. n = 100) → Non-Structured Classification → Output
[e.g. Charniak and Johnson 05]
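The reranking pipeline can be sketched as below: a baseline parser supplies the n-best list, and a plain (non-structured) linear model picks one candidate. The feature names and weights are illustrative stand-ins, not Charniak and Johnson's actual features.

```python
# Rerank an n-best list with a linear model over per-candidate features.

def rerank(nbest, featurize, w):
    """Return the candidate with the highest score w . f(candidate)."""
    def score(cand):
        return sum(w.get(name, 0.0) * value
                   for name, value in featurize(cand).items())
    return max(nbest, key=score)

# Candidates represented directly by (hypothetical) feature dictionaries.
candidates = [
    {'log_prob': -10.2, 'right_branching': 3},
    {'log_prob': -10.5, 'right_branching': 5},
]
w = {'log_prob': 1.0, 'right_branching': 0.2}
best = rerank(candidates, lambda c: c, w)
```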
- you can compute the argmax, at least approximately, and you want to learn w
- some methods require more than an argmax (e.g. k-best lists), but the most commonly used options do not
- the argmax comes from a combinatorial algorithm (dynamic programming, matchings, ILPs, A*…)
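The structured perceptron fits this setting exactly: the only structured operation it needs is the argmax. In the sketch below the argmax is a trivial per-position enumeration standing in for a real combinatorial algorithm (Viterbi, CKY, matching, …); the data, tag set, and features are toys.

```python
# Structured perceptron: mistake-driven updates on feature-count vectors.

from collections import Counter

def structured_perceptron(data, argmax, phi, epochs=10):
    """data: (x, y) pairs; phi(x, y) -> Counter of feature counts."""
    w = Counter()
    for _ in range(epochs):
        for x, y in data:
            y_hat = argmax(x, w)
            if y_hat != y:                   # mistake-driven update
                w.update(phi(x, y))          # add gold features
                w.subtract(phi(x, y_hat))    # subtract predicted features
    return w

TAGS = ['D', 'N']

def phi(x, y):
    return Counter((word, tag) for word, tag in zip(x, y))

def argmax(x, w):
    # Unigram features only, so the argmax decomposes per position.
    return tuple(max(TAGS, key=lambda t: w[(word, t)]) for word in x)

data = [(('the', 'sea'), ('D', 'N')), (('a', 'screen'), ('D', 'N'))]
w = structured_perceptron(data, argmax, phi)
```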
[Figure: OCR example — the true output "brace" sits among all five-letter candidate strings ("aaaaa", "aaaab", …, "zzzzz"); how many outputs? a lot!]
[Figure: parsing example — the sentence "It was red" with many candidate trees (S A B C D, S A B D F, S E F G H, …); again, a lot of outputs.]
[Figure: alignment example — "What is the" / "Quel est le" over positions 1 2 3, with many candidate word alignments; again, a lot of outputs.]
The set of active constraints (support vectors) is typically very small (solve with SMO or another QP solver).
More stable; no averaging needed.
alphas essentially act as a dual “distribution”
Compute the marginals P(DT-NN | sentence) for each position and sum.
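A brute-force sketch of that computation, under an illustrative unnormalized model over whole tag sequences (a real implementation would get the same marginals from forward-backward rather than enumeration):

```python
# Sum, over positions, the posterior probability of a given tag pair.

from itertools import product

def expected_transition_count(words, tags, score, pair):
    seqs = list(product(tags, repeat=len(words)))
    weights = [score(words, y) for y in seqs]
    Z = sum(weights)                       # partition function
    total = 0.0
    for i in range(len(words) - 1):
        num = sum(wgt for y, wgt in zip(seqs, weights)
                  if (y[i], y[i + 1]) == pair)
        total += num / Z                   # P(pair at position i | sentence)
    return total

# Uniform toy model: every tag sequence equally likely.
count = expected_transition_count(('the', 'sea'), ['DT', 'NN'],
                                  lambda ws, y: 1.0, ('DT', 'NN'))
```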