Natural Language Processing (CSE 490U): Phrase Structure
Noah Smith
© 2017 University of Washington nasmith@cs.washington.edu
February 8–17, 2017
Finite-State Automata
A finite-state automaton (plural automata) consists of:
◮ Initial state s0 ∈ S
◮ Final states F ⊆ S
◮ Special case: deterministic FSA defines δ : S × Σ → S
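The deterministic special case can be sketched in a few lines. This is a minimal illustration, assuming a toy automaton over Σ = {a, b} that accepts strings ending in "ab"; the state names and the `accepts` helper are hypothetical and not taken from the slides.

```python
# Toy DFA (an assumption for illustration): accepts strings over {a, b} ending in "ab".
STATES = {"q0", "q1", "q2"}
ALPHABET = {"a", "b"}
INITIAL = "q0"          # initial state s0
FINAL = {"q2"}          # final states F, a subset of STATES

# Deterministic transition function delta : S x Sigma -> S
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}


def accepts(string):
    """Return True if the DFA is in a final state after reading `string`."""
    state = INITIAL
    for symbol in string:
        if symbol not in ALPHABET:
            return False  # symbols outside the alphabet are rejected
        state = DELTA[(state, symbol)]
    return state in FINAL
```

Because δ is a function, each input symbol leads to exactly one next state, so recognition takes time linear in the length of the string.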
◮ an empty string (usually denoted ε) or a symbol from Σ
◮ a concatenation of regular expressions (e.g., abc)
◮ an alternation of regular expressions (e.g., ab|cd)
◮ a Kleene star of a regular expression (e.g., (abc)∗)
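The three operations above can be tried out with Python's standard `re` module. This is only a sketch: `re` supports many features beyond formal regular expressions, and the `matches` helper is a hypothetical name introduced here.

```python
import re


def matches(pattern, string):
    """True if the whole string belongs to the language of the pattern."""
    return re.fullmatch(pattern, string) is not None


examples = {
    "concatenation": matches(r"abc", "abc"),
    "alternation": matches(r"ab|cd", "cd"),
    "kleene_star": matches(r"(abc)*", "abcabc"),
    "star_accepts_empty": matches(r"(abc)*", ""),
    "star_rejects_partial": matches(r"(abc)*", "abca"),
}
```

`re.fullmatch` is used rather than `re.search` so that the test is membership of the whole string in the language, not the presence of a matching substring.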
◮ On September 17th, I’d like to fly from Atlanta to Denver
◮ I’d like to fly on September 17th from Atlanta to Denver
◮ I’d like to fly from Atlanta to Denver on September 17th
◮ *On September I’d like to fly 17th from Atlanta to Denver
◮ I’d like to fly from Atlanta to Denver on September 17th and
◮ A start symbol S ∈ N
◮ The lefthand side N is a nonterminal from N
◮ The righthand side α is a sequence of zero or more terminals
◮ Special case: Chomsky normal form constrains α to be
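A grammar of this kind can be written down directly as data. The sketch below uses a hypothetical toy grammar and assumes the standard definition of Chomsky normal form (each righthand side is either exactly two nonterminals or exactly one terminal); the names `RULES`, `is_cnf_rule`, and `is_cnf` are illustrative.

```python
# Hypothetical toy grammar, for illustration only.
NONTERMINALS = {"S", "NP", "VP", "Det", "Noun", "Verb"}
TERMINALS = {"this", "flight", "a", "meal", "includes"}
START = "S"

# Each rule is (lefthand side, righthand side as a tuple of symbols).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Verb", "NP")),
    ("Det", ("this",)),
    ("Det", ("a",)),
    ("Noun", ("flight",)),
    ("Noun", ("meal",)),
    ("Verb", ("includes",)),
]


def is_cnf_rule(lhs, rhs):
    """Standard CNF shape: N -> N1 N2 (two nonterminals) or N -> x (one terminal)."""
    if lhs not in NONTERMINALS:
        return False
    if len(rhs) == 2:
        return all(symbol in NONTERMINALS for symbol in rhs)
    if len(rhs) == 1:
        return rhs[0] in TERMINALS
    return False


def is_cnf(rules):
    """True if every rule in the grammar has a CNF-shaped righthand side."""
    return all(is_cnf_rule(lhs, rhs) for lhs, rhs in rules)
```

The toy grammar above is already in this form; adding a ternary rule such as VP → Verb NP NP would take it out of CNF.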
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP (MD will)
    (VP (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as)
        (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29)))))
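Treebank trees like this one are usually stored as bracketed strings. The following is a minimal sketch of reading such a string into nested tuples, shown on a shortened fragment of the sentence; `parse_bracketed` and `leaves` are hypothetical helper names, and the reader assumes well-formed input.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into (label, children) tuples.

    Leaves (words) are plain strings. Assumes the input is well formed.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def leaves(tree):
    """Return the words of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]


fragment = ("(S (NP-SBJ (NNP Pierre) (NNP Vinken)) "
            "(VP (MD will) (VP (VB join) (NP (DT the) (NN board)))))")
tree = parse_bracketed(fragment)
```

Here `leaves(tree)` recovers the word sequence, and `tree[0]` is the root label.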
(S (Aux does)
  (NP (Det this) (Noun flight))
  (VP (Verb include)
    (NP (Det a) (Noun meal))))
◮ Often greedy, with a statistical classifier deciding what action
◮ Today: scores are defined using the rules.
◮ Pop the highest-priority update from the agenda (item I with value v).
◮ If I = goal, then return v.
◮ If v > chart(I):
  ◮ chart(I) ← v
  ◮ Find all combinations of I with other items in the chart,
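The loop above can be sketched for a weighted grammar in Chomsky normal form. This is an illustration under stated assumptions, not the course's implementation: items are (label, start, end) spans, values are log-probabilities (so combining items never increases a value, which is what makes returning at the first goal pop safe), priority equals value, and new combinations are pushed back onto the agenda. The function name and the grammar encoding are hypothetical.

```python
import heapq
import math


def agenda_parse(words, lexical, binary, goal_label="S"):
    """Best-first agenda parsing for a weighted CNF grammar.

    lexical: {(A, word): log-score} for rules A -> word
    binary:  {(A, B, C): log-score} for rules A -> B C
    Returns the best log-score of the goal item, or None if there is no parse.
    """
    goal = (goal_label, 0, len(words))
    chart = {}
    agenda = []  # heapq is a min-heap, so priorities are negated values

    # Seed the agenda with one update per applicable lexical rule.
    for i, word in enumerate(words):
        for (label, w), score in lexical.items():
            if w == word:
                heapq.heappush(agenda, (-score, (label, i, i + 1)))

    while agenda:
        neg_v, item = heapq.heappop(agenda)  # highest-priority update
        v = -neg_v
        if item == goal:
            return v
        if v > chart.get(item, -math.inf):
            chart[item] = v
            label, i, j = item
            # Combine the popped item with adjacent items already in the chart.
            for (other, k, l), u in list(chart.items()):
                for (parent, left, right), r in binary.items():
                    if l == i and (left, right) == (other, label):
                        heapq.heappush(agenda, (-(r + u + v), (parent, k, j)))
                    if k == j and (left, right) == (label, other):
                        heapq.heappush(agenda, (-(r + v + u), (parent, i, l)))
    return None


# Hypothetical toy grammar with made-up probabilities.
LEXICAL = {
    ("Det", "this"): math.log(0.5), ("Det", "a"): math.log(0.5),
    ("Noun", "flight"): math.log(0.5), ("Noun", "meal"): math.log(0.5),
    ("Verb", "includes"): 0.0,
}
BINARY = {("S", "NP", "VP"): 0.0, ("NP", "Det", "Noun"): 0.0,
          ("VP", "Verb", "NP"): 0.0}

best = agenda_parse("this flight includes a meal".split(), LEXICAL, BINARY)
```

The inner loops scan the whole chart and rule table for clarity; a practical parser would index chart items by span boundary and rules by child label.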
(S^ROOT
  (NP^S (DT^NP The) (NN^NP luxury) (NN^NP auto) (NN^NP maker))
  (NP^S (JJ^NP last) (NN^NP year))
  (VP^S (VBD^VP sold)
    (NP^VP (CD^NP 1,214) (NN^NP cars))
    (PP^VP (IN^PP in)
      (NP^PP (DT^NP the) (NNP^NP U.S.)))))
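In the tree above, every label is paired with the label of its parent. A minimal sketch of that transformation follows, writing the pair as LABEL^PARENT (a notational choice made here) and using a shortened, hypothetical example tree.

```python
def parent_annotate(tree, parent="ROOT"):
    """Return a copy of the tree with each label rewritten as LABEL^PARENT.

    Trees are (label, children) tuples; words are plain strings and are
    left unchanged.
    """
    if isinstance(tree, str):
        return tree
    label, children = tree
    return (f"{label}^{parent}",
            [parent_annotate(child, label) for child in children])


# Shortened example tree (an assumption for illustration).
plain = ("S", [
    ("NP", [("DT", ["The"]), ("NN", ["maker"])]),
    ("VP", [("VBD", ["sold"]), ("NP", [("NN", ["cars"])])]),
])
annotated = parent_annotate(plain)
```

After annotation, the subject NP becomes NP^S while the object NP becomes NP^VP, so a grammar read off such trees can give the two positions different rule scores.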
(S[sold]
  (NP[maker] (DT[The] The) (NN[luxury] luxury) (NN[auto] auto) (NN[maker] maker))
  (NP[year] (JJ[last] last) (NN[year] year))
  (VP[sold] (VBD[sold] sold)
    (NP[cars] (CD[1,214] 1,214) (NN[cars] cars))
    (PP[in] (IN[in] in)
      (NP[U.S.] (DT[the] the) (NNP[U.S.] U.S.)))))
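In the tree above, every label carries the head word of its constituent. The sketch below shows one way such annotations could be computed, assuming a tiny hypothetical head table (`HEAD_CHILD`) that names, for each parent label, the child label supplying the head. Real head-finding rules are considerably more detailed, and the LABEL[head] notation is a choice made here.

```python
# Hypothetical head table: parent label -> label of the child that supplies the head.
HEAD_CHILD = {"S": "VP", "NP": "NN", "VP": "VBD", "PP": "IN"}


def lexicalize(tree):
    """Return (annotated_tree, head_word), rewriting each label as LABEL[head].

    Trees are (label, children) tuples. A preterminal has a single word child;
    every other node has only subtree children.
    """
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]
        return (f"{label}[{word}]", [word]), word

    results = [lexicalize(child) for child in children]
    head = results[-1][1]  # fallback: head of the rightmost child
    # Prefer the rightmost child whose label matches the head table.
    for child, (_, word) in zip(reversed(children), reversed(results)):
        if child[0] == HEAD_CHILD.get(label):
            head = word
            break
    return (f"{label}[{head}]", [subtree for subtree, _ in results]), head


# Shortened example tree (an assumption for illustration).
plain = ("S", [
    ("NP", [("DT", ["The"]), ("NN", ["auto"]), ("NN", ["maker"])]),
    ("VP", [("VBD", ["sold"]),
            ("NP", [("CD", ["1,214"]), ("NN", ["cars"])])]),
])
lexicalized, root_head = lexicalize(plain)
```

Heads propagate bottom up: the verb heads the VP, and the VP's head becomes the head of S.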
◮ K-best parsing: Huang and Chiang (2005)
◮ These exploit dynamic programming algorithms for training
◮ Socher et al. (2013) define compositional vector grammars
◮ Dyer et al. (2016): recurrent neural network grammars,
◮ Pick i_t uniformly at random from {1, ..., n}.
◮ t̂ ... t ∈ T_{x_{i_t}} ...
◮ w ← w − α ...