CS11-747 Neural Networks for NLP
Document Level Models
Zhengzhong Liu (Hector)
Site https://phontron.com/class/nn4nlp2017/
NN and some NLP tasks: Language Models, Parsing, Classification, Entity Tagging

Their counterparts in documents:

                  Sentence level                          Document level
Entity            Entity Tagging                          Coreference
Parsing           Semantic Parsing; Syntactic Parsing     Discourse Parsing
Language Model    Word Prediction                         Sentence/Discourse Element Prediction
Classification    Sentence Classification                 Document Classification
Document-level tasks:
- Discourse parsing recovers the structure of documents.
- Coreference resolution identifies mentions (note the difference from named entity recognition) and links the mentions that refer to the same underlying world entity.
Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. A renowned speech therapist was summoned to help the King overcome his speech impediment...
Example from Ng, 2016
Mentions are typically noun phrases; candidate spans can be nested, e.g.:
- A renowned speech therapist was summoned to help the King overcome his speech impediment...
- A renowned speech therapist
- A renowned speech
Coreference systems are evaluated with several metrics (details are left to other lectures).
Coreference models mainly differ in the way each instance is constructed:
Mention Pair Models
Mentions: Queen Elizabeth, her husband, King George VI
Which mention to link to?
Model: a binary decision (coreferent or not) between every 2 mentions.
Problems:
- the independent pairwise decisions do not enforce transitivity.
- too many negative training instances.
- only mention-pair level features, no entity-level features.
Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. A renowned speech therapist was summoned to help the King overcome his speech impediment...
✔: Queen Elizabeth <-> her
❌: Queen Elizabeth <-> husband
❌: Queen Elizabeth <-> King George VI
❌: Queen Elizabeth <-> a viable monarch
.....
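To make the instance construction concrete, here is a minimal sketch (not the classifier from any of the cited papers): every pair of mentions becomes one binary instance. The feature names, the mention dictionary fields, and the gold_cluster_id input are hypothetical placeholders.

```python
from itertools import combinations

def mention_pair_instances(mentions, gold_cluster_id):
    """Turn every mention pair into one binary training instance (illustrative features)."""
    instances = []
    for i, j in combinations(range(len(mentions)), 2):
        m1, m2 = mentions[i], mentions[j]
        features = {
            "head_match": m1["head"] == m2["head"],              # hypothetical mention fields
            "both_pronouns": m1["is_pronoun"] and m2["is_pronoun"],
            "mention_distance": j - i,
        }
        label = gold_cluster_id[i] == gold_cluster_id[j]          # coreferent or not
        instances.append((features, label))
    return instances

# The construction itself shows the problems listed above:
# - most pairs are negative, so the training data is heavily imbalanced;
# - each decision is local, so predicted links need not be transitive
#   (A~B and B~C can both be predicted while A~C is not);
# - only pair-level features are available, never whole-entity features.
```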
Entity-Mention Models: make each decision between a mention and a previous* cluster.
Daumé III & Marcu (2005); Culotta et al. (2007)
Example cluster-level features:
- Are the mentions in the cluster compatible?
- Does the cluster contain pronouns only?
- Do all the mentions have the same gender?
Problems:
- hard to compare candidate antecedents against each other.
- cluster-level features are hard to design.
* This process often follows the natural discourse order, so we can refer to partially built clusters.
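A minimal sketch of what such cluster-level features might look like in code; the mention fields (gender, number, is_pronoun) and the exact feature definitions are assumptions, not the features used in the cited papers.

```python
def cluster_level_features(cluster, mention):
    """Hypothetical features between a partially built cluster and a candidate mention."""
    genders = {m["gender"] for m in cluster} | {mention["gender"]}
    numbers = {m["number"] for m in cluster} | {mention["number"]}
    return {
        # are all mentions (plus the candidate) compatible in gender/number?
        "gender_compatible": len(genders - {"unknown"}) <= 1,
        "number_compatible": len(numbers - {"unknown"}) <= 1,
        # does the cluster built so far consist of pronouns only?
        "cluster_pronouns_only": all(m["is_pronoun"] for m in cluster),
    }
```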
Clark and Manning (2015): decisions are made between two clusters (cluster pairs), incrementally building up an entity representation.
Problems:
- cluster-level features are hard to design (a recurring problem).
- learning has to follow the cluster creation process.
Learning Algorithm:
- incremental learning over the clustering process (normally agglomerative).
- the intermediate merge decisions are not given in the gold standard!!
- the gold clusters are used to guide the clusters; Clark and Manning (2015) trained it with DAgger.
- (Create a NULL mention.)
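As a rough sketch of the inference side only (not the DAgger training procedure of Clark and Manning, 2015), an agglomerative system can start from singleton clusters and greedily merge the best-scoring cluster pair; score_merge here is a hypothetical learned cluster-pair scorer.

```python
def greedy_agglomerative_coref(mentions, score_merge, threshold=0.0):
    """Greedy inference sketch: repeatedly merge the highest-scoring cluster pair
    until no pair scores above the threshold (score_merge is a learned scorer)."""
    clusters = [[m] for m in mentions]                    # start from singletons
    while len(clusters) > 1:
        best_pair, best_score = None, threshold
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = score_merge(clusters[i], clusters[j])
                if s > best_score:
                    best_pair, best_score = (i, j), s
        if best_pair is None:                             # nothing worth merging
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]           # merge cluster j into cluster i
        del clusters[j]
    return clusters
```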
Mention Ranking Models: a log-linear probabilistic model that decides a ranking of the antecedents (Durrett and Klein, 2013).
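A minimal sketch of the ranking idea: for each mention, put a softmax over all previous mentions plus a NULL option meaning "start a new entity". The featurize function and the weight vector are placeholders, not the actual feature set of Durrett and Klein (2013).

```python
import numpy as np

def antecedent_distribution(weights, mention, candidates, featurize):
    """Log-linear distribution over antecedents; index 0 is the NULL / new-entity
    option (featurize is a hypothetical extractor returning a feature vector)."""
    options = [None] + list(candidates)                    # None = start a new entity
    scores = np.array([weights @ featurize(mention, a) for a in options])
    exp = np.exp(scores - scores.max())                    # numerically stable softmax
    return exp / exp.sum()
```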
Cluster ranking: rank previous clusters for a given mention. Similarly, a NULL cluster is added to the antecedents. Rahman & Ng use a complex set of features (39 feature templates). (Rahman & Ng, 2009)
Latent Tree Models (Bjorkelund and Kuhn, 2014)
- Trained as a structured perceptron.
- Decide which antecedent each mention is linked to (similar to a ranking).
- The update compares the prediction against the best antecedents licensed by the gold clusters.
- Latent tree models share some similarities with the mention ranking models; each subtree under the root represents a cluster.
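A minimal sketch of a latent-antecedent perceptron update under the assumptions above: the model's best antecedent is compared against the best antecedent that is licensed by the gold clusters. The featurize function and the sparse weight dictionary are hypothetical, and this is a per-mention simplification rather than the full tree update.

```python
def latent_perceptron_update(weights, mention, candidates, gold_antecedents, featurize, lr=1.0):
    """One perceptron step for a single mention (sketch, not the full latent tree update)."""
    def score(antecedent):
        return sum(weights.get(f, 0.0) * v for f, v in featurize(mention, antecedent).items())

    predicted = max(candidates, key=score)           # model's best antecedent
    licensed = max(gold_antecedents, key=score)      # best antecedent consistent with gold clusters
    if predicted not in gold_antecedents:            # wrong link -> standard perceptron update
        for f, v in featurize(mention, licensed).items():
            weights[f] = weights.get(f, 0.0) + lr * v
        for f, v in featurize(mention, predicted).items():
            weights[f] = weights.get(f, 0.0) - lr * v
    return weights
```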
All of these models need carefully engineered features to work in their own settings. For example, Bjorkelund and Kuhn use a decision tree for feature engineering and selection. (Any thoughts?)
(Kummerfeld and Klein, 2013)
Durrett and Klein (2013): a mention-ranking system (the log-linear model in the previous slides).
- Adds cost-sensitive training to log-linear models.
- The loss penalizes three error types (False Anaphoric, False New, Wrong Link).
Features used by Durrett and Klein (2013):
Final Feature Set
Many of the phenomena can be captured by surface features.
Clark & Manning (2015)
Clark & Manning (2016)
- Neural mention pair model and cluster pair model (differing in features, objective and training).
- Deep reinforcement learning: each coreference decision is an action of the agent.
- The reward is a coreference metric (B-Cubed).
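A minimal REINFORCE-style sketch of the "decisions as actions, metric as reward" idea; the policy network, the document object, and the b_cubed_reward function are hypothetical placeholders rather than the exact training procedure of Clark & Manning (2016).

```python
import torch

def reinforce_step(policy, document, b_cubed_reward, optimizer):
    """Sample one antecedent action per mention, reward the whole trajectory
    with a coreference metric, and take a policy-gradient step."""
    log_probs, actions = [], []
    for mention in document.mentions:
        dist = torch.distributions.Categorical(logits=policy(mention))  # scores over antecedents + NULL
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        actions.append(action.item())
    reward = b_cubed_reward(document, actions)        # scalar score of the induced clustering
    loss = -reward * torch.stack(log_probs).sum()     # REINFORCE surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```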
Wiseman et al. (2016)
- Global, entity-level information that mention-pair models cannot capture.
- Use an RNN to pool the mentions of a (partial) cluster into a single representation.
- Avoids hand-designing cluster-level feature templates.
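A small sketch of the pooling idea under these assumptions: run an LSTM over the embeddings of the mentions already placed in a cluster and use the final hidden state as the cluster representation; the dimensions and module layout are illustrative, not the architecture of Wiseman et al. (2016).

```python
import torch
import torch.nn as nn

class ClusterEncoder(nn.Module):
    """Encode a (partial) cluster of mentions into one vector with an RNN."""
    def __init__(self, mention_dim=200, hidden_dim=200):
        super().__init__()
        self.rnn = nn.LSTM(mention_dim, hidden_dim, batch_first=True)

    def forward(self, mention_embeddings):
        # mention_embeddings: (num_mentions_in_cluster, mention_dim)
        _, (h, _) = self.rnn(mention_embeddings.unsqueeze(0))   # add a batch dimension
        return h.squeeze(0).squeeze(0)                          # (hidden_dim,) cluster representation
```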
Lee et al. (2017)
- Can we do everything in the neural network embedding way?
- End-to-end? All the way to mention detection?
- Mention detection is learned jointly inside the model, which is not previously handled.
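A tiny sketch of the end-to-end intuition: every text span gets a mention score and every span pair gets a link score, so mention detection and linking are trained together. The module sizes, names, and the scoring decomposition here are simplifications, not the exact model of Lee et al. (2017).

```python
import torch
import torch.nn as nn

class SpanCorefScorer(nn.Module):
    """Score spans as mentions and span pairs as coreference links (sketch)."""
    def __init__(self, span_dim=400, hidden=150):
        super().__init__()
        self.mention_score = nn.Sequential(nn.Linear(span_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.pair_score = nn.Sequential(nn.Linear(2 * span_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def antecedent_scores(self, span, candidates):
        # span: (span_dim,), candidates: (num_candidates, span_dim)
        pair_input = torch.cat([candidates, span.expand_as(candidates)], dim=-1)
        total = self.mention_score(span) + self.mention_score(candidates) + self.pair_score(pair_input)
        return total.squeeze(-1)                      # one score per candidate antecedent
```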
Kummerfeld and Klein, 2013
Discourse Parsing: Rhetorical Structure Theory (RST) is a widely used formalism, which forms a tree of relations.
Example RST structures from Marcu (2000)
Li et al. (2014)
- A recursive neural network for discourse parsing, with composition-based scoring.
- Capture the representation of an EDU, then combine EDU representations into larger representations.
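A minimal sketch of the recursive composition step, assuming illustrative dimensions and a generic relation inventory size: two child representations (EDUs or subtrees) are composed into a parent vector, and the same pair is used to score the rhetorical relation. This is not the exact architecture of Li et al. (2014).

```python
import torch
import torch.nn as nn

class DiscourseComposer(nn.Module):
    """Compose two child spans into a parent representation and score relations."""
    def __init__(self, dim=100, num_relations=18):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)
        self.relation = nn.Linear(2 * dim, num_relations)

    def forward(self, left, right):
        pair = torch.cat([left, right], dim=-1)
        parent = torch.tanh(self.compose(pair))   # representation of the merged span
        relation_scores = self.relation(pair)     # which rhetorical relation holds between the children?
        return parent, relation_scores
```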
Li et al. (2016)
Ji and Smith (2017): document representations that make use of the discourse structure.
(Qin et al. 2017)
Implicit relations: the discourse connectives are not marked, but we would like to detect the relations; text with connectives should be treated the same as text without!
Predicting the next entity/sentence given previous sentences
Referent Prediction Corpus from (Modi et al. 2017)
ROC Stories corpus (Mostafazadeh et al. 2017)
Example event chain: Kevin is robbed by Robert; Z rescued Y; Z arrested X
- Use entity/event features to learn the next sentence.
- Encode a series of sentences to predict the next sentence.
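A small sketch of the prediction setup under these assumptions: encode the preceding sentences with an RNN and score a candidate next sentence (as in a story-cloze style evaluation). The encoder sizes and the bilinear scoring choice are illustrative, not taken from the cited papers.

```python
import torch
import torch.nn as nn

class NextSentenceScorer(nn.Module):
    """Score how well a candidate sentence continues the preceding context."""
    def __init__(self, sent_dim=300, hidden=300):
        super().__init__()
        self.context_rnn = nn.GRU(sent_dim, hidden, batch_first=True)
        self.score = nn.Bilinear(hidden, sent_dim, 1)

    def forward(self, context_sentences, candidate):
        # context_sentences: (num_sentences, sent_dim), candidate: (sent_dim,)
        _, h = self.context_rnn(context_sentences.unsqueeze(0))   # final hidden state of the context
        return self.score(h.squeeze(0).squeeze(0), candidate)     # scalar compatibility score
```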
Language models predict the next word, which is very useful in speech recognition, translation, etc.
Here we predict the next entity/event, which is potentially useful for other tasks (let's elaborate!).
Peng et al. (2015): use predicate schema statistics as features for hard coreference problems.
The Winograd Schema Challenge
Examples (which entity does the pronoun refer to?):
- ... we [rescued/punished] them.
- ... [arrested/rescued] by police.