CS11-747 Neural Networks for NLP
Document Level Models
Graham Neubig
Site https://phontron.com/class/nn4nlp2019/
(w/ thanks for many slides from Zhengzhong Liu)
2
Some NLP Tasks we've Handled: Parsing, Language Models, Classification, Entity Tagging
3
coherence of language on the multi-sentence level (cf. single-sentence language modeling)
documents (cf. sentence classification)
correspond to each other? (cf. syntactic parsing)
Prediction using documents
Prediction of document structure
4
entire document
vacuum! We want to take advantage of this fact.
6
[Figure: an RNN language model over the sentence "I hate this movie"; at each step it predicts the next token: I, hate, this, movie, </s>.]
(Mikolov et al. 2011)
7
[Figure: the same RNN language model run over two consecutive sentences, "I hate this movie ." and "It's the worst ."; the first sentence ends with a predicted "." rather than </s>, and prediction continues into the second sentence: It's, the, worst, ., </s>.]
(Mikolov & Zweig 2012)
and global context tends to miss out on global context (as local context is more predictive)
incorporate document-level context explicitly
9
sentence tokens, attend to all
et al. 2019)
12
Image credit: Stanford NLP
(note the difference from named entity recognition).
to the same underlying world entity.
Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. A renowned speech therapist was summoned to help the King overcome his speech impediment...
13
Example from Ng, 2016
Noun Phrases.
A renowned speech therapist was summoned to help the King overcome his speech impediment...
[Figure: the same sentence repeated with candidate noun-phrase spans shown, including "A renowned speech therapist" and "A renowned speech".]
14
metrics.
lectures).
15
number of mentions. (Number of partitions)
difference is the way each instance is constructed:
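On the "number of partitions" mentioned above: the number of ways to split n mentions into clusters is the Bell number B_n, which grows very quickly. The sketch below is only an illustration of that growth (the function name `bell_numbers` is ours, not from the slides); it uses the standard Bell-triangle recurrence.

```python
def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max], where B_n counts the partitions of n items."""
    bells = [1]   # B_0 = 1: the empty set has exactly one partition
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # A new row starts with the last entry of the previous row;
        # each later entry adds the entry above-left from the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        bells.append(new_row[0])
        row = new_row
    return bells


counts = bell_numbers(10)
print(counts[:6])   # partitions of 0..5 mentions
print(counts[10])   # partitions of 10 mentions
```

Even ten mentions already admit over a hundred thousand possible clusterings, which is why coreference models score local decisions rather than enumerating partitions.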
16
Mentions: Hillary Clinton, Clinton, she, Bill Clinton
Which mention to link to?
Model:
between every 2 mentions.
transitivity.
instances.
level features.
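A minimal sketch of how pairwise decisions turn into clusters under transitivity, using the four mentions from this slide. The pairwise links are hand-written stand-ins for a classifier's output (an assumption; no real model is involved), and `cluster_by_transitivity` is a hypothetical helper name.

```python
def cluster_by_transitivity(mentions, linked_pairs):
    """Merge mentions into clusters given pairwise 'coreferent' decisions."""
    parent = list(range(len(mentions)))   # union-find forest over mention indices

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in linked_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)   # keep the earliest mention as root

    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


mentions = ["Hillary Clinton", "Clinton", "she", "Bill Clinton"]

# Hand-written pairwise decisions (assumed classifier output).
good = cluster_by_transitivity(mentions, [(0, 1), (1, 2)])
# One extra wrong link, "Clinton" <-> "Bill Clinton", merges everything.
bad = cluster_by_transitivity(mentions, [(0, 1), (1, 2), (1, 3)])

print(good)
print(bad)
```

The second call shows why independent pairwise decisions are fragile: a single spurious link propagates through transitivity and collapses two entities into one.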
17
Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. A renowned speech therapist was summoned to help the King overcome his speech impediment...
✔: Queen Elizabeth <-> her
❌: Queen Elizabeth <-> husband
❌: Queen Elizabeth <-> King George VI
❌: Queen Elizabeth <-> a viable monarch
...
between a mention and a previous* cluster.
Daume & Marcu (2005); Culotta et al. (2007)
18
Example Cluster Level Features:
compatible?
pronouns only?
same gender?
Problems:
antecedents.
to design.
* This process often follows the natural discourse order, so we can refer to partially built clusters.
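A toy sketch of the mention-to-cluster idea: mentions are processed in discourse order and each one is compared against the partially built clusters using a cluster-level feature. Everything here is an assumption for illustration: the gender labels are hand-assigned, and the single "no gender conflict" rule stands in for a learned classifier over many cluster-level features.

```python
def compatible(cluster, mention):
    """Toy cluster-level feature: no gender conflict with any cluster member."""
    return all(
        m["gender"] == mention["gender"]
        or "unknown" in (m["gender"], mention["gender"])
        for m in cluster
    )


def incremental_clusters(mentions):
    """Greedily attach each mention to a previous cluster, or start a new one."""
    clusters = []
    for mention in mentions:                  # discourse order
        for cluster in reversed(clusters):    # most recently created cluster first
            if compatible(cluster, mention):
                cluster.append(mention)
                break
        else:
            clusters.append([mention])        # no compatible cluster: new entity
    return clusters


# Hand-labelled mentions from the example passage (labels are assumptions).
mentions = [
    {"text": "Queen Elizabeth", "gender": "female"},
    {"text": "her", "gender": "female"},
    {"text": "King George VI", "gender": "male"},
    {"text": "the King", "gender": "male"},
    {"text": "his", "gender": "male"},
]

result = [[m["text"] for m in c] for c in incremental_clusters(mentions)]
print(result)
```

A real system would replace `compatible` with a scored decision over several cluster-level features; the point of the sketch is only that the decision is made against a whole cluster rather than a single antecedent mention.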
Clark and Manning (2015)
between two clusters.
entity representation.
19
Problems:
to design. (recurring problem)
creation process
Learning Algorithm
learning (normally agglomerative)
standard!!
guide the clusters.
Manning (2015) trained it with DAgger.
20
A probabilistic Model
decide a ranking of the antecedents
(Durrett and Klein, 2013)
Rank previous clusters for a given mention.
Similarly, a NULL cluster is added to the antecedents.
Rahman & Ng use a complex set of features (39 feature templates).
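A minimal sketch of ranking antecedents with a NULL option, written as a softmax over candidate scores. The scores are made-up numbers standing in for a trained scorer, and fixing the NULL score at 0 is our assumption for the example, not something stated on the slides.

```python
import math


def antecedent_distribution(scores):
    """Softmax over candidate antecedents plus a NULL option."""
    candidates = dict(scores)
    candidates["NULL"] = 0.0   # assumption: NULL ("start a new entity") scores 0
    top = max(candidates.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {name: math.exp(s - top) for name, s in candidates.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def pick_antecedent(scores):
    """Return the highest-probability candidate, possibly NULL."""
    probs = antecedent_distribution(scores)
    return max(probs, key=probs.get)


# Made-up scores for the mention "she" (illustrative only).
she_scores = {"Hillary Clinton": 2.0, "Clinton": 1.5, "Bill Clinton": -1.0}
print(pick_antecedent(she_scores))

# When every candidate scores below NULL, the mention starts a new entity.
print(pick_antecedent({"Hillary Clinton": -2.0, "Clinton": -3.0}))
```

Because all candidates compete in one normalized distribution, the model is trained to prefer the right antecedent over the others rather than to make independent yes/no decisions per pair.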
21
(Rahman & Ng, 2009)
them can be captured by surface features.
learning or margin-based methods.
clustering.
22
Clark & Manning (2015)
23
[Figure: Mention Pair Model and Cluster Pair Model; labels: Feature, Objective, Training]
Clark & Manning (2016)
an action of the agent.
coreference (B-Cubed).
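For reference, B-Cubed scores a clustering mention by mention: each mention's precision is the share of its predicted cluster that is truly coreferent with it, its recall is the share of its gold cluster that was recovered, and both are averaged over mentions. The sketch below assumes the predicted and gold clusterings cover exactly the same mentions; the function name `b_cubed` is ours.

```python
def b_cubed(predicted, gold):
    """Return (precision, recall, f1) for two clusterings of the same mentions."""
    pred_of = {m: set(cluster) for cluster in predicted for m in cluster}
    gold_of = {m: set(cluster) for cluster in gold for m in cluster}
    mentions = list(gold_of)

    # Per-mention overlap between its predicted cluster and its gold cluster.
    precision = sum(
        len(pred_of[m] & gold_of[m]) / len(pred_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(pred_of[m] & gold_of[m]) / len(gold_of[m]) for m in mentions
    ) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [["a", "b", "c"], ["d"]]
predicted = [["a", "b"], ["c", "d"]]
p, r, f = b_cubed(predicted, gold)
print(round(p, 3), round(r, 3), round(f, 3))
```

A metric like this is only available once the whole clustering is built, which is one motivation for treating each linking decision as an action and optimizing the final score directly.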
24
Wiseman et al. (2016)
capture.
most?).
mentions into a single representation.
templates.
25
Lee et al. (2017)
neural network embedding way?
end? All the way to mention detection?
which is not previously handled.
30
formalism, which forms a tree of relations.
Example RST structures from Marcu (2000)
31
(Ji and Eisenstein 2014)
and queue element
33
Li et al. (2014)
composition scoring.
capture the representation of an EDU, then combine EDU representations into larger representations
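A toy sketch of the bottom-up idea: each EDU starts with a vector, and pairs of representations are merged recursively following the discourse tree. The `compose` function here is a fixed stand-in (an assumption) for a learned composition network, and the nested-tuple tree format is hypothetical.

```python
import math


def compose(left, right):
    """Stand-in for a learned composition function (assumption, fixed weights)."""
    return [math.tanh(0.5 * l + 0.5 * r) for l, r in zip(left, right)]


def represent(node):
    """Leaf = EDU vector (list of floats); internal node = (left, right) tuple."""
    if isinstance(node, tuple):
        left, right = node
        return compose(represent(left), represent(right))
    return node


# Three EDUs with made-up 2-dimensional vectors, combined as ((e1, e2), e3).
e1, e2, e3 = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
tree = ((e1, e2), e3)
doc_vector = represent(tree)
print(doc_vector)
```

The vector at the root can then be used to score candidate tree structures or relation labels, with the composition weights learned rather than fixed as they are here.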
34
Li et al. (2016)
(Qin et al. 2017)
marked, but would like to detect them if they are
the same as text without!
36
Interestingly, discourse parsing accuracy is very important!