SLIDE 1 Coreference & Coherence
Ling571 Deep Processing Techniques for NLP March 9, 2015
SLIDE 2 Roadmap
Coreference algorithms:
Machine learning
Deterministic sieves
Discourse structure
Cohesion
Topic segmentation
Coherence
Discourse parsing
SLIDE 3 NP Coreference Examples
Link all NPs that refer to the same entity
Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment...
Example from Cardie & Ng (2004)
SLIDE 4 Typical Feature Set
25 features per instance: 2 NPs, features, class
lexical (3)
string matching for pronouns, proper names, common nouns
grammatical (18)
pronoun_1, pronoun_2, demonstrative_2, indefinite_2, …
number, gender, animacy
appositive, predicate nominative
binding constraints, simple contra-indexing constraints, …
span, maximalnp, …
semantic (2)
same WordNet class
alias
positional (1)
distance between the NPs in terms of # of sentences
knowledge-based (1)
naïve pronoun resolution algorithm
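A minimal Python sketch of assembling a few such features for one NP pair; the Mention fields and the simplified tests are illustrative stand-ins, not the exact feature definitions from the paper:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Mention:
        text: str
        number: str        # "sg" / "pl"
        gender: str        # "m" / "f" / "n"
        is_pronoun: bool
        wn_class: str      # coarse WordNet class, e.g. "person"
        sent_id: int

    def pair_features(np1: Mention, np2: Mention) -> dict:
        return {
            "string_match": int(np1.text.lower() == np2.text.lower()),  # lexical
            "number_agree": int(np1.number == np2.number),               # grammatical
            "gender_agree": int(np1.gender == np2.gender),
            "pronoun_2": int(np2.is_pronoun),
            "same_wn_class": int(np1.wn_class == np2.wn_class),          # semantic
            "sent_distance": np2.sent_id - np1.sent_id,                  # positional
        }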
SLIDE 5 Clustering by Classification
Mention-pair style system:
For each pair of NPs, classify +/- coreferent
Any classifier
SLIDE 6 Clustering by Classification
Mention-pair style system:
For each pair of NPs, classify +/- coreferent
Any classifier
Linked pairs form coreferential chains
Process candidate pairs from end to start
All mentions of an entity appear in a single chain
SLIDE 7 Clustering by Classification
Mention-pair style system:
For each pair of NPs, classify +/- coreferent
Any classifier
Linked pairs form coreferential chains
Process candidate pairs from end to start
All mentions of an entity appear in a single chain
F-measure: MUC-6: 62-66%; MUC-7: 60-61%
Soon et al. (2001), Cardie and Ng (2002)
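A sketch of the linking step, assuming a trained pairwise classifier classify(np_i, np_j) -> bool; the union-find bookkeeping is illustrative, not the exact published procedure:

    def resolve(mentions, classify):
        parent = list(range(len(mentions)))          # union-find over mentions

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]        # path compression
                i = parent[i]
            return i

        for j in range(len(mentions) - 1, 0, -1):    # process from end to start
            for i in range(j - 1, -1, -1):           # closest candidate first
                if classify(mentions[i], mentions[j]):
                    parent[find(j)] = find(i)        # merge the two chains
                    break                            # link one antecedent only
        # mentions sharing a root form one coreference chain
        return [find(i) for i in range(len(mentions))]

Any classifier (decision tree, MaxEnt, …) can fill the classify slot.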
SLIDE 8 Multi-pass Sieve Approach
Raghunathan et al., 2010
Key Issues:
Limitations of mention-pair classifier approach
SLIDE 9 Multi-pass Sieve Approach
Raghunathan et al., 2010
Key Issues:
Limitations of mention-pair classifier approach
Local decisions over large number of features
Not really transitive
SLIDE 10 Multi-pass Sieve Approach
Raghunathan et al., 2010
Key Issues:
Limitations of mention-pair classifier approach
Local decisions over large number of features
Not really transitive
Can’t exploit global constraints
SLIDE 11 Multi-pass Sieve Approach
Raghunathan et al., 2010
Key Issues:
Limitations of mention-pair classifier approach
Local decisions over large number of features
Not really transitive
Can’t exploit global constraints
Low-precision features may overwhelm less frequent, high-precision ones
SLIDE 12 Multi-pass Sieve Strategy
Basic approach:
Apply tiers of deterministic coreference modules
Ordered highest to lowest precision
Aggregate information across mentions in cluster
Share attributes based on prior tiers
Simple, extensible architecture
Outperforms many other (un-)supervised approaches
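A skeleton of that control flow, under the assumption that each tier is a function returning a chosen antecedent or None (skip); attribute sharing within clusters is noted but stubbed out:

    def merge(clusters, m1, m2):
        c1 = next(c for c in clusters if m1 in c)
        c2 = next(c for c in clusters if m2 in c)
        if c1 is not c2:
            c1 |= c2                                 # pool the mentions (and, in a
            clusters.remove(c2)                      # real system, their attributes)

    def sieve_resolve(mentions, tiers):
        clusters = [{m} for m in mentions]           # start with singletons
        for tier in tiers:                           # highest precision first
            for m in mentions:
                antecedent = tier(m, clusters)       # a tier may skip (None)
                if antecedent is not None:
                    merge(clusters, m, antecedent)
        return clusters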
SLIDE 13
Pre-Processing and Mentions
Pre-processing:
Gold mention boundaries given, parsed, NE tagged
SLIDE 14 Pre-Processing and Mentions
Pre-processing:
Gold mention boundaries given, parsed, NE tagged
For each mention, each module can skip or pick the best candidate antecedent
Antecedents ordered:
Same sentence:
SLIDE 15 Pre-Processing and Mentions
Pre-processing:
Gold mention boundaries given, parsed, NE tagged
For each mention, each module can skip or pick the best candidate antecedent
Antecedents ordered:
Same sentence: by Hobbs algorithm
Prev. sentence:
For nominals: right-to-left, breadth-first: proximity/recency
For pronouns: left-to-right: salience hierarchy
SLIDE 16 Pre-Processing and Mentions
Pre-processing:
Gold mention boundaries given, parsed, NE tagged
For each mention, each module can skip or pick the best candidate antecedent
Antecedents ordered:
Same sentence: by Hobbs algorithm
Prev. sentence:
For nominals: right-to-left, breadth-first: proximity/recency
For pronouns: left-to-right: salience hierarchy
W/in cluster: aggregate attributes, order mentions
Prune indefinite mentions: can’t have antecedents
SLIDE 17
Multi-pass Sieve Modules
Pass 1: Exact match (N): P: 96%
SLIDE 18
Multi-pass Sieve Modules
Pass 1: Exact match (N): P: 96%
Pass 2: Precise constructs
SLIDE 19 Multi-pass Sieve Modules
Pass 1: Exact match (N): P: 96%
Pass 2: Precise constructs
Predicate nominative, (role) appositive, relative pronoun, acronym, demonym
Pass 3: Strict head matching
Matches cluster head noun AND all non-stop cluster words AND modifiers AND not i-within-i (embedded NP)
SLIDE 20 Multi-pass Sieve Modules
Pass 1: Exact match (N): P: 96%
Pass 2: Precise constructs
Predicate nominative, (role) appositive, relative pronoun, acronym, demonym
Pass 3: Strict head matching
Matches cluster head noun AND all non-stop cluster words AND modifiers AND not i-within-i (embedded NP)
Passes 4 & 5: Variants of 3, dropping one of the above constraints
SLIDE 21 Multi-pass Sieve Modules
Pass 6: Relaxed head match
Head matches any word in cluster AND all non-stop cluster words AND not i-within-i (embedded NP)
SLIDE 22 Multi-pass Sieve Modules
Pass 6: Relaxed head match
Head matches any word in cluster AND all non-stop cluster words AND not i-within-i (embedded NP)
Pass 7: Pronouns
Enforce constraints on gender, number, person, animacy, and NER labels
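Two of the passes as illustrative tier functions (attribute names are assumptions; candidates stands for the ordered antecedent list described above):

    def exact_match_pass(mention, candidates):       # Pass 1: P ~ 96%
        if mention.is_pronoun:
            return None                              # this pass skips pronouns
        for cand in candidates:
            if not cand.is_pronoun and cand.text.lower() == mention.text.lower():
                return cand
        return None

    def pronoun_pass(mention, candidates):           # Pass 7
        if not mention.is_pronoun:
            return None
        for cand in candidates:
            if (cand.gender == mention.gender and cand.number == mention.number
                    and cand.animacy == mention.animacy):
                return cand
        return None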
SLIDE 23
Multi-pass Effectiveness
SLIDE 24
Sieve Effectiveness
ACE Newswire
SLIDE 25
Questions
Good accuracies on (clean) text. What about…
SLIDE 26 Questions
Good accuracies on (clean) text. What about…
Conversational speech?
Ill-formed, disfluent
SLIDE 27 Questions
Good accuracies on (clean) text. What about…
Conversational speech?
Ill-formed, disfluent
Dialogue?
Multiple speakers introduce referents
SLIDE 28 Questions
Good accuracies on (clean) text. What about…
Conversational speech?
Ill-formed, disfluent
Dialogue?
Multiple speakers introduce referents
Multimodal communication?
How else can entities be evoked? Are all equally salient?
SLIDE 29
More Questions
Good accuracies on (clean) (English) text. What about…
Other languages?
SLIDE 30 More Questions
Good accuracies on (clean) (English) text. What about…
Other languages?
Salience hierarchies the same?
Other factors?
SLIDE 31 More Questions
Good accuracies on (clean) (English) text. What about…
Other languages?
Salience hierarchies the same?
Other factors?
Syntactic constraints?
E.g., reflexives in Chinese, Korean, …
SLIDE 32 More Questions
Good accuracies on (clean) (English) text. What about…
Other languages?
Salience hierarchies the same?
Other factors?
Syntactic constraints?
E.g., reflexives in Chinese, Korean, …
Zero anaphora?
How do you resolve a pronoun if you can’t find it?
SLIDE 33 Reference Resolution Algorithms
Many other alternative strategies:
Linguistically informed, saliency hierarchy
Centering Theory
Machine learning approaches:
Supervised: MaxEnt
Unsupervised: clustering
Heuristic, high precision:
CogNIAC
SLIDE 34
Conclusions
Co-reference establishes coherence
Reference resolution depends on coherence
Variety of approaches:
Syntactic constraints, recency, frequency, role
Similar effectiveness, different requirements
Co-reference can enable summarization within and across documents (and languages!)
SLIDE 35
Discourse Structure
SLIDE 36 Why Model Discourse Structure? (Theoretical)
Discourse: not just constituent utterances
Create joint meaning
Context guides interpretation of constituents
How?
What are the units?
How do they combine to establish meaning?
How can we derive structure from surface forms?
What makes discourse coherent vs. not?
How does discourse structure influence reference resolution?
SLIDE 37
Why Model Discourse Structure?(Applied)
Design better summarization, understanding
Improve speech synthesis
Influenced by structure
Develop approaches for discourse generation
Design dialogue agents for task interaction
Guide reference resolution
SLIDE 38 Discourse Topic Segmentation
Separate news broadcast into component stories
Necessary for information retrieval
On "World News Tonight" this Thursday, another bad day on stock markets, all over the world global economic anxiety. Another massacre in Kosovo, the U.S. and its allies prepare to do something about it. Very slowly. And the millennium bug, Lubbock Texas prepares for catastrophe, Banglaore in India sees
SLIDE 39 Discourse Topic Segmentation
Separate news broadcast into component stories
On "World News Tonight" this Thursday, another bad day on stock markets, all over the world global economic anxiety. || Another massacre in Kosovo, the U.S. and its allies prepare to do something about it. Very slowly. || And the millennium bug, Lubbock Texas prepares for catastrophe, Bangalore in India sees only profit.||
SLIDE 40
Discourse Segmentation
Basic form of discourse structure
Divide document into linear sequence of subtopics
Many genres have conventional structures:
SLIDE 41
Discourse Segmentation
Basic form of discourse structure
Divide document into linear sequence of subtopics
Many genres have conventional structures:
Academic: Intro, Hypothesis, Methods, Results, Concl.
SLIDE 42
Discourse Segmentation
Basic form of discourse structure
Divide document into linear sequence of subtopics
Many genres have conventional structures:
Academic: Intro, Hypothesis, Methods, Results, Concl.
Newspapers: Headline, Byline, Lede, Elaboration
SLIDE 43
Discourse Segmentation
Basic form of discourse structure
Divide document into linear sequence of subtopics
Many genres have conventional structures:
Academic: Intro, Hypothesis, Methods, Results, Concl.
Newspapers: Headline, Byline, Lede, Elaboration
Patient Reports: Subjective, Objective, Assessment, Plan
SLIDE 44
Discourse Segmentation
Basic form of discourse structure
Divide document into linear sequence of subtopics
Many genres have conventional structures:
Academic: Intro, Hypothesis, Methods, Results, Concl.
Newspapers: Headline, Byline, Lede, Elaboration
Patient Reports: Subjective, Objective, Assessment, Plan
Can guide: summarization, retrieval
SLIDE 45 Cohesion
Use of linguistic devices to link text units
Lexical cohesion:
Link with relations between words
Synonymy, hypernymy
Peel, core and slice the pears and the apples. Add the fruit to the skillet.
SLIDE 46 Cohesion
Use of linguistic devices to link text units
Lexical cohesion:
Link with relations between words
Synonymy, hypernymy
Peel, core and slice the pears and the apples. Add the fruit to the skillet.
Non-lexical cohesion:
E.g. anaphora
Peel, core and slice the pears and the apples. Add them to the skillet.
SLIDE 47 Cohesion
Use of linguistic devices to link text units
Lexical cohesion:
Link with relations between words
Synonymy, hypernymy
Peel, core and slice the pears and the apples. Add the fruit to the skillet.
Non-lexical cohesion:
E.g. anaphora
Peel, core and slice the pears and the apples. Add them to the skillet.
Cohesion chain establishes link through sequence of words
Segment boundary = dip in cohesion
SLIDE 48
TextTiling (Hearst ‘97)
Lexical cohesion-based segmentation
Boundaries at dips in cohesion score
Steps: tokenization, lexical cohesion scoring, boundary identification
SLIDE 49
TextTiling (Hearst ‘97)
Lexical cohesion-based segmentation
Boundaries at dips in cohesion score
Steps: tokenization, lexical cohesion scoring, boundary identification
Tokenization
Units?
SLIDE 50 TextTiling (Hearst ‘97)
Lexical cohesion-based segmentation
Boundaries at dips in cohesion score
Steps: tokenization, lexical cohesion scoring, boundary identification
Tokenization
Units?
White-space-delimited words
Stopped
Stemmed
20 words = 1 pseudo-sentence
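A sketch of that tokenization step; the toy stoplist stands in for a real one, and stemming is omitted:

    import re

    STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}   # toy list

    def pseudo_sentences(text, w=20):
        """Lowercase, keep alphabetic tokens, drop stopwords, and group
        the remaining words into pseudo-sentences of w = 20 words."""
        words = [t for t in re.findall(r"[a-z]+", text.lower())
                 if t not in STOPWORDS]              # stemming omitted here
        return [words[i:i + w] for i in range(0, len(words), w)]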
SLIDE 51
Lexical Cohesion Score
Similarity between spans of text
b = ‘block’ of 10 pseudo-sentences before the gap
a = ‘block’ of 10 pseudo-sentences after the gap
How do we compute similarity?
SLIDE 52 Lexical Cohesion Score
Similarity between spans of text
b = ‘block’ of 10 pseudo-sentences before the gap
a = ‘block’ of 10 pseudo-sentences after the gap
How do we compute similarity?
Vectors and cosine similarity (again!)
sim_cosine(b, a) = (b · a) / (|b| |a|) = ( Σ_{i=1..N} b_i a_i ) / ( √(Σ_{i=1..N} b_i²) · √(Σ_{i=1..N} a_i²) )
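In code, each gap’s score is the cosine between the word counts of the two blocks; a sketch:

    import math
    from collections import Counter

    def cosine(b, a):
        num = sum(b[w] * a[w] for w in b.keys() & a.keys())
        den = (math.sqrt(sum(v * v for v in b.values()))
               * math.sqrt(sum(v * v for v in a.values())))
        return num / den if den else 0.0

    def gap_scores(pseudo_sents, k=10):
        """Cosine similarity across each gap, using blocks of k = 10
        pseudo-sentences on either side."""
        scores = []
        for gap in range(1, len(pseudo_sents)):
            before = Counter(w for s in pseudo_sents[max(0, gap - k):gap] for w in s)
            after = Counter(w for s in pseudo_sents[gap:gap + k] for w in s)
            scores.append(cosine(before, after))
        return scores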
SLIDE 53
Segmentation
Depth score:
Difference between position and adjacent peaks
E.g., depth(a2) = (y_a1 − y_a2) + (y_a3 − y_a2)
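A sketch of the depth computation over the gap-score sequence; boundaries go at valleys whose depth clears a cutoff derived from the scores’ mean and standard deviation:

    def depth_score(scores, i):
        """Climb from the valley at gap i to the nearest peak on each
        side and sum the two drops, as in (y_a1 - y_a2) + (y_a3 - y_a2)."""
        left = i
        while left > 0 and scores[left - 1] >= scores[left]:
            left -= 1
        right = i
        while right < len(scores) - 1 and scores[right + 1] >= scores[right]:
            right += 1
        return (scores[left] - scores[i]) + (scores[right] - scores[i])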
SLIDE 54 Evaluation
How about precision/recall/F-measure?
Problem: No credit for near-misses
Alternative model: WindowDiff
WindowDiff(ref, hyp) = (1 / (N − k)) Σ_{i=1..N−k} 1[ |b(ref_i, ref_{i+k}) − b(hyp_i, hyp_{i+k})| ≠ 0 ]
where b(x_i, x_{i+k}) counts the boundaries between positions i and i+k
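A direct implementation; segmentations are given as 0/1 boundary indicators per position, and k is typically set to half the mean reference segment length:

    def window_diff(ref, hyp, k):
        """Slide a width-k window over both segmentations and count
        positions where the number of boundaries inside differs; a
        near-miss therefore costs only a few windows, not all of them."""
        n = len(ref)
        errors = sum(1 for i in range(n - k)
                     if sum(ref[i:i + k]) != sum(hyp[i:i + k]))
        return errors / (n - k)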
SLIDE 55
Discussion
Overall: Auto much better than random
Issues
SLIDE 56
Discussion
Overall: Auto much better than random
Issues:
Summary material
Often not similar to adjacent paras
Similarity measures
SLIDE 57
Discussion
Overall: Auto much better than random
Issues:
Summary material
Often not similar to adjacent paras
Similarity measures
Is raw tf the best we can do? Other cues??
Other experiments with TextTiling perform less well. Why?
SLIDE 58
Text Coherence
Cohesion (repetition, etc.) does not imply coherence
Coherence relations:
Possible meaning relations between utterances in discourse
SLIDE 59 Text Coherence
Cohesion (repetition, etc.) does not imply coherence
Coherence relations:
Possible meaning relations between utterances in discourse
Examples:
Result: infer that the state in S0 causes the state in S1
The Tin Woodman was caught in the rain. His joints rusted.
SLIDE 60 Text Coherence
Cohesion (repetition, etc.) does not imply coherence
Coherence relations:
Possible meaning relations between utterances in discourse
Examples:
Result: infer that the state in S0 causes the state in S1
The Tin Woodman was caught in the rain. His joints rusted.
Explanation: infer that the state in S1 causes the state in S0
John hid Bill’s car keys. He was drunk.
SLIDE 61 Text Coherence
Cohesion (repetition, etc.) does not imply coherence
Coherence relations:
Possible meaning relations between utterances in discourse
Examples:
Result: infer that the state in S0 causes the state in S1
The Tin Woodman was caught in the rain. His joints rusted.
Explanation: infer that the state in S1 causes the state in S0
John hid Bill’s car keys. He was drunk.
Elaboration: Infer same prop. from S0 and S1.
Dorothy was from Kansas. She lived in the great Kansas prairie.
Pair of locally coherent clauses: discourse segment
SLIDE 62 Coherence Analysis
S1: John went to the bank to deposit his paycheck. S2: He then took a train to Bill’s car dealership. S3: He needed to buy a car. S4: The company he works for now isn’t near any public transportation. S5: He also wanted to talk to Bill about their softball league.
SLIDE 63 Coherence Analysis
S1: John went to the bank to deposit his paycheck. S2: He then took a train to Bill’s car dealership. S3: He needed to buy a car. S4: The company he works for now isn’t near any public transportation. S5: He also wanted to talk to Bill about their softball league.
SLIDE 64 Coherence Analysis
S1: John went to the bank to deposit his paycheck. S2: He then took a train to Bill’s car dealership. S3: He needed to buy a car. S4: The company he works for now isn’t near any public transportation. S5: He also wanted to talk to Bill about their softball league.
SLIDE 65 Rhetorical Structure Theory
Mann & Thompson (1987)
Goal: Identify hierarchical structure of text
Covers a wide range of text types
Language contrasts
Relational propositions (intentions)
Derives from functional relations b/t clauses
SLIDE 66 Components of RST
Relations:
Hold b/t two text spans, nucleus and satellite
Nucleus: core element; satellite: peripheral
Constraints on each and between them
Effect: why the author wrote this
SLIDE 67 Components of RST
Relations:
Hold b/t two text spans, nucleus and satellite
Nucleus: core element; satellite: peripheral
Constraints on each and between them
Effect: why the author wrote this
Schemas:
Grammar of legal relations between text spans
Define possible RST text structures
Most common: N + S, others involve two or more nuclei
SLIDE 68 Components of RST
Relations:
Hold b/t two text spans, nucleus and satellite
Nucleus: core element; satellite: peripheral
Constraints on each and between them
Effect: why the author wrote this
Schemas:
Grammar of legal relations between text spans
Define possible RST text structures
Most common: N + S, others involve two or more nuclei
Structures:
Using clause units: complete, connected, unique, adjacent
SLIDE 69 RST Relations
Core of RST
RST analysis requires building a tree of relations:
Circumstance, Solutionhood, Elaboration, Background, Enablement, Motivation, Evidence, Justify, Vol. Cause, Non-Vol. Cause, Vol. Result, Non-Vol. Result, Purpose, Antithesis, Concession, Condition, Otherwise, Interpretation, Evaluation, Restatement, Summary, Sequence, Contrast
SLIDE 70 RST Relations
Evidence
Effect: Evidence (satellite) increases R’s belief in the nucleus
The program really works. (N) I entered all my info and it matched my results. (S)
[Diagram: spans 1 (N) and 2 (S) linked by an Evidence relation]
SLIDE 71
SLIDE 72
RST Parsing
Learn and apply classifiers for
Segmentation and parsing of discourse
SLIDE 73
RST Parsing
Learn and apply classifiers for
Segmentation and parsing of discourse
Assign coherence relations between spans
SLIDE 74 RST Parsing
Learn and apply classifiers for
Segmentation and parsing of discourse
Assign coherence relations between spans
Create a representation over the whole text => parse
Discourse structure:
RST trees
Fine-grained, hierarchical structure
Clause-based units
SLIDE 75
Identifying Segments & Relations
Key source of information:
SLIDE 76 Identifying Segments & Relations
Key source of information:
Cue phrases
Aka discourse markers, cue words, clue words
Although, but, for example, however, yet, with, and, …
John hid Bill’s keys because he was drunk.
Issues:
SLIDE 77 Identifying Segments & Relations
Key source of information:
Cue phrases
Aka discourse markers, cue words, clue words
Although, but, for example, however, yet, with, and, …
John hid Bill’s keys because he was drunk.
Issues:
Ambiguity: discourse vs sentential use
With its distant orbit, Mars exhibits frigid weather. We can see Mars with a telescope.
Ambiguity: one cue can signal multiple discourse relations
Because: CAUSE/EVIDENCE; but: CONTRAST/CONCESSION
SLIDE 78
Cue Phrases
Last issue:
Insufficient:
SLIDE 79 Cue Phrases
Last issue:
Insufficient:
Not all relations marked by cue phrases
Only 15–25% of relations marked by cues
SLIDE 80 Learning Discourse Parsing
Train classifiers for:
Segmentation
Coherence relation assignment
Discourse structure assignment
Shift-reduce parser transitions
Use range of features:
Cue phrases
Lexical/punctuation in context
Syntactic parses
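A sketch of the shift-reduce skeleton; next_action stands in for the trained classifier, and the action encoding is an assumption:

    def parse(edus, next_action):
        """SHIFT pushes the next elementary discourse unit; a reduce
        action pops the top two subtrees and joins them under a
        coherence relation with a nuclearity assignment."""
        stack, queue = [], list(edus)
        while queue or len(stack) > 1:
            action = next_action(stack, queue)        # trained classifier
            if action == "SHIFT":
                stack.append(queue.pop(0))
            else:                                     # e.g. ("REDUCE", "Evidence", "NS")
                _, relation, nuclearity = action
                right, left = stack.pop(), stack.pop()
                stack.append((relation, nuclearity, left, right))
        return stack[0]                               # the discourse tree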
SLIDE 81 Evaluation
Segmentation:
Good: 96%
Better than frequency or punctuation baseline
Discourse structure:
Okay: 61% on span and relation structure
Relation identification: poor
SLIDE 82
Issues
Goal: Single tree-shaped analysis of all text
Difficult to achieve
SLIDE 83 Issues
Goal: Single tree-shaped analysis of all text
Difficult to achieve
Significant ambiguity
Significant disagreement among labelers
SLIDE 84 Issues
Goal: Single tree-shaped analysis of all text
Difficult to achieve
Significant ambiguity
Significant disagreement among labelers
Relation recognition is difficult
Some clear “signals”, e.g., although
Not mandatory: only ~25% of relations are cued
SLIDE 85 Summary
Computational discourse:
Cohesion and coherence in extended spans
Key tasks:
Reference resolution
Constraints and preferences
Heuristic, learning, and sieve models
Discourse structure modeling
Linear topic segmentation, hierarchical relation induction
Exploiting shallow and deep language processing
SLIDE 86 Problem 1
[Diagram: candidate NPs NP1–NP9, with the farthest antecedent marked]
Coreference is a rare relation
skewed class distributions (2% positive instances)
remove some negative instances
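One standard pruning scheme (in the style of Soon et al.): pair each anaphor only with its closest gold antecedent as a positive, and only with the mentions in between as negatives; a sketch, with antecedent_of standing in for the gold annotation:

    def training_pairs(mentions, antecedent_of):
        pos, neg = [], []
        for j, m in enumerate(mentions):
            a = antecedent_of(m)                     # closest gold antecedent
            if a is None:
                continue                             # non-anaphoric: no pairs
            i = mentions.index(a)
            pos.append((a, m))                       # one easy positive
            neg.extend((mentions[k], m) for k in range(i + 1, j))
        return pos, neg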
SLIDE 87 Problem 2
Coreference is a discourse-level problem
different solutions for different types of NPs
proper names: string matching and aliasing
inclusion of “hard” positive training instances
positive example selection: select easy positive training instances (cf. Harabagiu et al. 2001)
Select most confident antecedent as positive instance
Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, the renowned speech therapist, was summoned to help the King overcome his speech impediment...
SLIDE 88 Problem 3
Coreference is an equivalence relation
loss of transitivity
need to tighten the connection between classification and clustering
prune learned rules w.r.t. the clustering-level coreference scoring function
[Queen Elizabeth] set about transforming [her] [husband], ...
coref? coref? not coref?
SLIDE 89
Results Snapshot
SLIDE 90
Classification & Clustering
Classifiers:
C4.5 (decision trees)
RIPPER: automatic rule learner
SLIDE 91
Classification & Clustering
Classifiers:
C4.5 (Decision Trees), RIPPER
Cluster: best-first, single-link clustering
Each NP in own class
Test preceding NPs
Select highest-confidence coreferent, merge classes
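A sketch of best-first clustering, with score standing in for the classifier’s confidence and an assumed decision threshold:

    def best_first_cluster(nps, score, threshold=0.5):
        cluster_of = {np_: {np_} for np_ in nps}     # each NP in its own class
        for j, np_j in enumerate(nps):
            best = max(nps[:j], key=lambda np_i: score(np_i, np_j), default=None)
            if best is not None and score(best, np_j) >= threshold:
                merged = cluster_of[best] | cluster_of[np_j]
                for m in merged:                     # merge the two classes
                    cluster_of[m] = merged
        return cluster_of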
SLIDE 92
Baseline Feature Set
SLIDE 93 Extended Feature Set
Explore 41 additional features
More complex NP matching (7)
Detailed NP type (4): definite, embedded, pronoun, …
Syntactic role (3)
Syntactic constraints (8): binding, agreement, etc.
Heuristics (9): embedding, quoting, etc.
Semantics (4): WordNet distance, inheritance, etc.
Distance (1): in paragraphs
Pronoun resolution (2)
Based on simple or rule-based resolver
SLIDE 94
Feature Selection
Too many added features
Hand select ones with good coverage/precision
SLIDE 95 Feature Selection
Too many added features
Hand select ones with good coverage/precision
Compare to automatically selected by learner
Useful features are:
Agreement, Animacy, Binding, Maximal NP
Reminiscent of Lappin & Leass
SLIDE 96 Feature Selection
Too many added features
Hand select ones with good coverage/precision
Compare to automatically selected by learner
Useful features are:
Agreement, Animacy, Binding, Maximal NP
Reminiscent of Lappin & Leass
Still the best results on the MUC-7 dataset: F-measure 0.634