SLIDE 1

Coreference & Coherence

Ling571 Deep Processing Techniques for NLP March 9, 2015

SLIDE 2

Roadmap

— Coreference algorithms:

— Machine learning

— Deterministic sieves

— Discourse structure

— Cohesion

— Topic segmentation

— Coherence

— Discourse parsing

SLIDE 3

NP Coreference Examples

— Link all NPs that refer to the same entity

Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment...

Example from Cardie & Ng (2004)

SLIDE 4

Typical Feature Set

— 25 features per instance: 2 NPs, features, class

— lexical (3)

— string matching for pronouns, proper names, common nouns

— grammatical (18)

— pronoun_1, pronoun_2, demonstrative_2, indefinite_2, …

— number, gender, animacy

— appositive, predicate nominative

— binding constraints, simple contra-indexing constraints, …

— span, maximalnp, …

— semantic (2)

— same WordNet class

— alias

— positional (1)

— distance between the NPs in terms of # of sentences

— knowledge-based (1)

— naïve pronoun resolution algorithm
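To make the feature set above concrete, here is a minimal sketch (not the original system's code) of how a few of the lexical, grammatical, and positional features might be computed for one NP pair; the `Mention` record and its field names are illustrative assumptions.

```python
# Sketch of a few mention-pair features in the spirit of the
# 25-feature set above (Mention fields are illustrative).
from dataclasses import dataclass

@dataclass
class Mention:
    text: str          # surface string of the NP
    sent_idx: int      # index of the containing sentence
    is_pronoun: bool
    number: str        # "sg" / "pl" / "unknown"
    gender: str        # "m" / "f" / "n" / "unknown"

def pair_features(np1: Mention, np2: Mention) -> dict:
    """Features for one candidate (antecedent np1, anaphor np2) pair."""
    return {
        # lexical: string match after case folding
        "string_match": np1.text.lower() == np2.text.lower(),
        # grammatical: pronoun indicators and agreement
        "pronoun_1": np1.is_pronoun,
        "pronoun_2": np2.is_pronoun,
        "number_agree": "unknown" in (np1.number, np2.number)
                        or np1.number == np2.number,
        "gender_agree": "unknown" in (np1.gender, np2.gender)
                        or np1.gender == np2.gender,
        # positional: distance between the NPs in sentences
        "sent_dist": np2.sent_idx - np1.sent_idx,
    }
```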

SLIDE 5

Clustering by Classification

— Mention-pair style system:

— For each pair of NPs, classify +/- coreferent

— Any classifier

SLIDE 6

Clustering by Classification

— Mention-pair style system:

— For each pair of NPs, classify +/- coreferent

— Any classifier

— Linked pairs form coreferential chains

— Process candidate pairs from end to start

— All mentions of an entity appear in a single chain

SLIDE 7

Clustering by Classification

— Mention-pair style system:

— For each pair of NPs, classify +/- coreferent

— Any classifier

— Linked pairs form coreferential chains

— Process candidate pairs from end to start

— All mentions of an entity appear in a single chain

— F-measure: MUC-6: 62-66%; MUC-7: 60-61%

— Soon et al. (2001); Ng and Cardie (2002)
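A minimal sketch of the clustering-by-classification step, assuming some trained pairwise classifier (`classify_coreferent` is a stand-in, not an API from the cited papers): candidate pairs are processed from the end of the document backwards, and positive links are merged with union-find so that every mention of an entity lands in a single chain.

```python
# Sketch: turning pairwise +/- decisions into coreference chains.
# `classify_coreferent(m_i, m_j) -> bool` stands in for any classifier.

def build_chains(mentions, classify_coreferent):
    parent = list(range(len(mentions)))      # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path compression
            i = parent[i]
        return i

    # Work end to start: for each anaphor, scan preceding mentions.
    for j in range(len(mentions) - 1, 0, -1):
        for i in range(j - 1, -1, -1):
            if classify_coreferent(mentions[i], mentions[j]):
                parent[find(j)] = find(i)    # link into one chain
                break                        # first positive antecedent wins

    chains = {}
    for i in range(len(mentions)):
        chains.setdefault(find(i), []).append(mentions[i])
    return list(chains.values())
```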

SLIDE 8

Multi-pass Sieve Approach

— Raghunathan et al., 2010

— Key Issues:

— Limitations of mention-pair classifier approach

SLIDE 9

Multi-pass Sieve Approach

— Raghunathan et al., 2010

— Key Issues:

— Limitations of mention-pair classifier approach

— Local decisions over large number of features

— Not really transitive

SLIDE 10

Multi-pass Sieve Approach

— Raghunathan et al., 2010

— Key Issues:

— Limitations of mention-pair classifier approach

— Local decisions over large number of features

— Not really transitive

— Can’t exploit global constraints

SLIDE 11

Multi-pass Sieve Approach

— Raghunathan et al., 2010

— Key Issues:

— Limitations of mention-pair classifier approach

— Local decisions over large number of features

— Not really transitive

— Can’t exploit global constraints

— Low-precision features may overwhelm less frequent, high-precision ones

SLIDE 12

Multi-pass Sieve Strategy

— Basic approach:

— Apply tiers of deterministic coreference modules

— Ordered highest to lowest precision

— Aggregate information across mentions in cluster

— Share attributes based on prior tiers

— Simple, extensible architecture

— Outperforms many other (un-)supervised approaches
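A sketch of the control flow this strategy implies (an illustration, not the authors' implementation, reusing the illustrative `Mention` record from the earlier sketch): each tier is a deterministic function over the current clustering, applied in fixed precision order, so decisions and aggregated attributes from earlier tiers are visible to later ones.

```python
# Skeleton of multi-pass sieve control flow: deterministic passes,
# ordered from highest to lowest precision, each operating on the
# clusters left by earlier passes.

def exact_match_pass(clusters):
    """Pass 1 analogue: merge clusters whose mention strings match exactly."""
    by_text, merged = {}, []
    for cluster in clusters:
        key = cluster[0].text.lower()        # representative mention
        if key in by_text:
            by_text[key].extend(cluster)     # high-precision merge
        else:
            by_text[key] = list(cluster)
            merged.append(by_text[key])
    return merged

def run_sieve(mentions, passes):
    clusters = [[m] for m in mentions]       # one cluster per mention
    for sieve_pass in passes:                # highest precision first
        clusters = sieve_pass(clusters)
    return clusters

# Later tiers (precise constructs, head matching, pronouns) plug in as
# further functions in the `passes` list.
```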

SLIDE 13

Pre-Processing and Mentions

— Pre-processing:

— Gold mention boundaries given, parsed, NE tagged

SLIDE 14

Pre-Processing and Mentions

— Pre-processing:

— Gold mention boundaries given, parsed, NE tagged

— For each mention, each module can skip or pick best candidate antecedent

— Antecedents ordered:

— Same sentence:

SLIDE 15

Pre-Processing and Mentions

— Pre-processing:

— Gold mention boundaries given, parsed, NE tagged

— For each mention, each module can skip or pick best candidate antecedent

— Antecedents ordered:

— Same sentence: by Hobbs algorithm

— Prev. sentence:

— For Nominal: by right-to-left, breadth first: proximity/recency

— For Pronoun: left-to-right: salience hierarchy

SLIDE 16

Pre-Processing and Mentions

— Pre-processing:

— Gold mention boundaries given, parsed, NE tagged

— For each mention, each module can skip or pick best candidate antecedent

— Antecedents ordered:

— Same sentence: by Hobbs algorithm

— Prev. sentence:

— For Nominal: by right-to-left, breadth first: proximity/recency

— For Pronoun: left-to-right: salience hierarchy

— W/in cluster: aggregate attributes, order mentions

— Prune indefinite mentions: can’t have antecedents

SLIDE 17

Multi-pass Sieve Modules

— Pass 1: Exact match (N): P: 96%

SLIDE 18

Multi-pass Sieve Modules

— Pass 1: Exact match (N): P: 96% — Pass 2: Precise constructs

SLIDE 19

Multi-pass Sieve Modules

— Pass 1: Exact match (N): P: 96% — Pass 2: Precise constructs

— Predicate nominative, (role) appositive, relative pronoun, acronym, demonym

— Pass 3: Strict head matching

— Matches cluster head noun AND all non-stop cluster words AND modifiers AND not i-within-i (embedded NP)

SLIDE 20

Multi-pass Sieve Modules

— Pass 1: Exact match (N): P: 96% — Pass 2: Precise constructs

— Predicate nominative, (role) appositive, relative pronoun, acronym, demonym

— Pass 3: Strict head matching

— Matches cluster head noun AND all non-stop cluster words AND modifiers AND not i-within-i (embedded NP)

— Pass 4 & 5: Variants of 3: drop one of above

SLIDE 21

Multi-pass Sieve Modules

— Pass 6: Relaxed head match

— Head matches any word in cluster AND all non-stop cluster words AND not i-within-i (embedded NP)

SLIDE 22

Multi-pass Sieve Modules

— Pass 6: Relaxed head match

— Head matches any word in cluster AND all non-stop cluster words AND not i-within-i (embedded NP)

— Pass 7: Pronouns

— Enforce constraints on gender, number, person, animacy, and NER labels
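The pronoun pass is essentially a hard agreement filter. A minimal sketch, with illustrative attribute names and the usual convention that an unknown value is compatible with anything:

```python
# Sketch of a Pass 7 style agreement filter: a pronoun may only be
# linked to a cluster whose aggregated attributes are compatible.
# Attribute names are illustrative.

AGREEMENT_ATTRS = ("gender", "number", "person", "animacy", "ner_label")

def compatible(pronoun_attrs: dict, cluster_attrs: dict) -> bool:
    for attr in AGREEMENT_ATTRS:
        p = pronoun_attrs.get(attr, "unknown")
        c = cluster_attrs.get(attr, "unknown")
        if "unknown" not in (p, c) and p != c:
            return False              # hard constraint violated
    return True
```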

SLIDE 23

Multi-pass Effectiveness

SLIDE 24

Sieve Effectiveness

— ACE Newswire

SLIDE 25

Questions

— Good accuracies on (clean) text. What about…

SLIDE 26

Questions

— Good accuracies on (clean) text. What about…

— Conversational speech?

— Ill-formed, disfluent

SLIDE 27

Questions

— Good accuracies on (clean) text. What about…

— Conversational speech?

— Ill-formed, disfluent

— Dialogue?

— Multiple speakers introduce referents

SLIDE 28

Questions

— Good accuracies on (clean) text. What about…

— Conversational speech?

— Ill-formed, disfluent

— Dialogue?

— Multiple speakers introduce referents

— Multimodal communication?

— How else can entities be evoked?

— Are all equally salient?

SLIDE 29

More Questions

— Good accuracies on (clean) (English) text. What about…

— Other languages?

SLIDE 30

More Questions

— Good accuracies on (clean) (English) text. What about…

— Other languages?

— Salience hierarchies the same?

— Other factors

SLIDE 31

More Questions

— Good accuracies on (clean) (English) text. What about…

— Other languages?

— Salience hierarchies the same?

— Other factors

— Syntactic constraints?

— E.g., reflexives in Chinese, Korean, …

SLIDE 32

More Questions

— Good accuracies on (clean) (English) text. What about…

— Other languages?

— Salience hierarchies the same?

— Other factors

— Syntactic constraints?

— E.g., reflexives in Chinese, Korean, …

— Zero anaphora?

— How do you resolve a pronoun if you can’t find it?

SLIDE 33

Reference Resolution Algorithms

— Many other alternative strategies:

— Linguistically informed, saliency hierarchy

— Centering Theory

— Machine learning approaches:

— Supervised: Maxent

— Unsupervised: Clustering

— Heuristic, high precision:

— CogNIAC

SLIDE 34

Conclusions

— Co-reference establishes coherence

— Reference resolution depends on coherence

— Variety of approaches:

— Syntactic constraints, Recency, Frequency, Role

— Similar effectiveness, different requirements

— Co-reference can enable summarization within and across documents (and languages!)

SLIDE 35

Discourse Structure

SLIDE 36

Why Model Discourse Structure? (Theoretical)

— Discourse: not just constituent utterances

— Create joint meaning

— Context guides interpretation of constituents

— How?

— What are the units?

— How do they combine to establish meaning?

— How can we derive structure from surface forms?

— What makes discourse coherent vs. not?

— How do they influence reference resolution?

SLIDE 37

Why Model Discourse Structure? (Applied)

— Design better summarization, understanding

— Improve speech synthesis

— Influenced by structure

— Develop approach for generation of discourse

— Design dialogue agents for task interaction

— Guide reference resolution

SLIDE 38

Discourse Topic Segmentation

— Separate news broadcast into component stories

— Necessary for information retrieval

On "World News Tonight" this Thursday, another bad day on stock markets, all over the world global economic anxiety. Another massacre in Kosovo, the U.S. and its allies prepare to do something about it. Very slowly. And the millennium bug, Lubbock Texas prepares for catastrophe, Banglaore in India sees

  • nly profit.
SLIDE 39

Discourse Topic Segmentation

— Separate news broadcast into component stories

On "World News Tonight" this Thursday, another bad day on stock markets, all over the world global economic anxiety. || Another massacre in Kosovo, the U.S. and its allies prepare to do something about it. Very slowly. || And the millennium bug, Lubbock Texas prepares for catastrophe, Bangalore in India sees only profit.||

SLIDE 40

Discourse Segmentation

— Basic form of discourse structure

— Divide document into linear sequence of subtopics

— Many genres have conventional structures:

SLIDE 41

Discourse Segmentation

— Basic form of discourse structure

— Divide document into linear sequence of subtopics

— Many genres have conventional structures:

— Academic: Intro, Hypothesis, Methods, Results, Concl.

SLIDE 42

Discourse Segmentation

— Basic form of discourse structure

— Divide document into linear sequence of subtopics

— Many genres have conventional structures:

— Academic: Intro, Hypothesis, Methods, Results, Concl.

— Newspapers: Headline, Byline, Lede, Elaboration

SLIDE 43

Discourse Segmentation

— Basic form of discourse structure

— Divide document into linear sequence of subtopics

— Many genres have conventional structures:

— Academic: Intro, Hypothesis, Methods, Results, Concl.

— Newspapers: Headline, Byline, Lede, Elaboration

— Patient Reports: Subjective, Objective, Assessment, Plan

SLIDE 44

Discourse Segmentation

— Basic form of discourse structure

— Divide document into linear sequence of subtopics

— Many genres have conventional structures:

— Academic: Intro, Hypothesis, Methods, Results, Concl.

— Newspapers: Headline, Byline, Lede, Elaboration

— Patient Reports: Subjective, Objective, Assessment, Plan

— Can guide: summarization, retrieval

SLIDE 45

Cohesion

— Use of linguistic devices to link text units

— Lexical cohesion:

— Link with relations between words

— Synonymy, Hypernymy

— Peel, core and slice the pears and the apples. Add the fruit to the skillet.

SLIDE 46

Cohesion

— Use of linguistic devices to link text units

— Lexical cohesion:

— Link with relations between words

— Synonymy, Hypernymy

— Peel, core and slice the pears and the apples. Add the fruit to the skillet.

— Non-lexical cohesion:

— E.g. anaphora

— Peel, core and slice the pears and the apples. Add them to the skillet.

SLIDE 47

Cohesion

— Use of linguistic devices to link text units

— Lexical cohesion:

— Link with relations between words

— Synonymy, Hypernymy

— Peel, core and slice the pears and the apples. Add the fruit to the skillet.

— Non-lexical cohesion:

— E.g. anaphora

— Peel, core and slice the pears and the apples. Add them to the skillet.

— Cohesion chain establishes link through sequence of words

— Segment boundary = dip in cohesion
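For the pears/apples example, a lexical cohesion link of the synonymy/hypernymy kind can be detected with WordNet. A small sketch using NLTK's WordNet interface (requires the wordnet data via nltk.download("wordnet"); the depth cutoff is an arbitrary illustrative choice):

```python
# Sketch: detecting a hypernymy-based lexical cohesion link.
from nltk.corpus import wordnet as wn

def share_hypernym(word1: str, word2: str) -> bool:
    """True if some noun senses of the two words meet below a common
    hypernym more specific than the near-root levels of the hierarchy
    (e.g. pear/apple under edible fruit)."""
    for s1 in wn.synsets(word1, pos=wn.NOUN):
        for s2 in wn.synsets(word2, pos=wn.NOUN):
            for common in s1.lowest_common_hypernyms(s2):
                if common.max_depth() > 2:   # ignore near-root ancestors
                    return True
    return False

print(share_hypernym("pear", "apple"))       # True: linked via 'fruit'
```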

SLIDE 48

TextTiling (Hearst ‘97)

— Lexical cohesion-based segmentation

— Boundaries at dips in cohesion score

— Tokenization, Lexical cohesion score, Boundary ID

SLIDE 49

TextTiling (Hearst ‘97)

— Lexical cohesion-based segmentation

— Boundaries at dips in cohesion score

— Tokenization, Lexical cohesion score, Boundary ID

— Tokenization

— Units?

SLIDE 50

TextTiling (Hearst ‘97)

— Lexical cohesion-based segmentation

— Boundaries at dips in cohesion score

— Tokenization, Lexical cohesion score, Boundary ID

— Tokenization

— Units?

— White-space delimited words

— Stopwords removed

— Stemmed

— 20 words = 1 pseudo-sentence
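A sketch of this tokenization stage; the stopword list and the stemmer are toy stand-ins for whatever Hearst's implementation used:

```python
# Sketch: lowercase word stream, stopwords removed, crude stemming,
# then fixed-length 20-word pseudo-sentences.
import re

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it"}

def pseudo_sentences(text: str, w: int = 20):
    words = [t for t in re.findall(r"[a-z]+", text.lower())
             if t not in STOPWORDS]
    words = [t.rstrip("s") for t in words]          # toy stemmer
    # Non-overlapping windows of w tokens = pseudo-sentences.
    return [words[i:i + w] for i in range(0, len(words), w)]
```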

SLIDE 51

Lexical Cohesion Score

— Similarity between spans of text

— b = ‘Block’ of 10 pseudo-sentences before gap

— a = ‘Block’ of 10 pseudo-sentences after gap

— How do we compute similarity?

SLIDE 52

Lexical Cohesion Score

— Similarity between spans of text

— b = ‘Block’ of 10 pseudo-sentences before gap

— a = ‘Block’ of 10 pseudo-sentences after gap

— How do we compute similarity?

— Vectors and cosine similarity (again!)

$$\mathrm{sim}_{\mathrm{cosine}}(\vec{b},\vec{a}) = \frac{\vec{b}\cdot\vec{a}}{\lVert\vec{b}\rVert\,\lVert\vec{a}\rVert} = \frac{\sum_{i=1}^{N} b_i \times a_i}{\sqrt{\sum_{i=1}^{N} b_i^2}\;\sqrt{\sum_{i=1}^{N} a_i^2}}$$
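A sketch of that computation over the pseudo-sentences from the previous step: build raw term-frequency vectors for the 10-pseudo-sentence blocks on each side of a gap and take their cosine.

```python
# Sketch of the gap score: cosine similarity between the blocks of
# pseudo-sentences before and after each gap, using raw term frequency.
import math
from collections import Counter

def cosine(block_b, block_a):
    b = Counter(w for ps in block_b for w in ps)    # tf vector before gap
    a = Counter(w for ps in block_a for w in ps)    # tf vector after gap
    dot = sum(b[w] * a[w] for w in b.keys() & a.keys())
    norm = math.sqrt(sum(v * v for v in b.values())) * \
           math.sqrt(sum(v * v for v in a.values()))
    return dot / norm if norm else 0.0

def gap_scores(pss, k: int = 10):
    # One score per gap between pseudo-sentences i-1 and i.
    return [cosine(pss[max(0, i - k):i], pss[i:i + k])
            for i in range(1, len(pss))]
```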

SLIDE 53

Segmentation

— Depth score:

— Difference between position and adjacent peaks

— E.g., $(y_{a_1} - y_{a_2}) + (y_{a_3} - y_{a_2})$, where $y_{a_2}$ is the score at the gap and $y_{a_1}, y_{a_3}$ are the adjacent peaks
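A sketch of the depth computation: for each gap score, climb to the nearest peak on each side while the score keeps rising, and sum the two drops.

```python
# Sketch: depth score per gap; the deepest valleys become boundaries.

def depth_scores(scores):
    depths = []
    for i, y in enumerate(scores):
        left = right = y
        for j in range(i - 1, -1, -1):       # climb left while rising
            if scores[j] < left:
                break
            left = scores[j]
        for j in range(i + 1, len(scores)):  # climb right while rising
            if scores[j] < right:
                break
            right = scores[j]
        depths.append((left - y) + (right - y))
    return depths
```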

SLIDE 54

Evaluation

— How about precision/recall/F-measure?

— Problem: No credit for near-misses

— Alternative model: WindowDiff

$$\mathrm{WindowDiff}(\mathrm{ref},\mathrm{hyp}) = \frac{1}{N-k}\sum_{i=1}^{N-k}\mathbf{1}\big(b(\mathrm{ref}_i,\mathrm{ref}_{i+k}) - b(\mathrm{hyp}_i,\mathrm{hyp}_{i+k}) \neq 0\big)$$
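A sketch of WindowDiff over 0/1 boundary indicator sequences, where b(i, i+k) is realized as the number of boundaries inside the window:

```python
# Sketch of WindowDiff: slide a window of size k over the reference and
# hypothesis boundary indicators and count windows where the number of
# boundaries disagrees.

def window_diff(ref, hyp, k):
    n = len(ref)
    errors = sum(
        1 for i in range(n - k)
        if sum(ref[i:i + k]) != sum(hyp[i:i + k])
    )
    return errors / (n - k)
```

A perfect hypothesis scores 0, and a slightly shifted boundary only disturbs a few windows, which is exactly the partial credit for near-misses that plain precision/recall lacks.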

SLIDE 55

Discussion

— Overall: Auto much better than random

— Issues

SLIDE 56

Discussion

— Overall: Auto much better than random

— Issues: Summary material

— Often not similar to adjacent paras

— Similarity measures

SLIDE 57

Discussion

— Overall: Auto much better than random

— Issues: Summary material

— Often not similar to adjacent paras

— Similarity measures

— Is raw tf the best we can do?

— Other cues?

— Other experiments with TextTiling perform less well. Why?

SLIDE 58

Text Coherence

— Cohesion (repetition, etc.) does not imply coherence

— Coherence relations:

— Possible meaning relations between utterances in discourse

SLIDE 59

Text Coherence

— Cohesion (repetition, etc.) does not imply coherence

— Coherence relations:

— Possible meaning relations between utterances in discourse

— Examples:

— Result: Infer that the state in S0 causes the state in S1

— The Tin Woodman was caught in the rain. His joints rusted.

SLIDE 60

Text Coherence

— Cohesion (repetition, etc.) does not imply coherence

— Coherence relations:

— Possible meaning relations between utterances in discourse

— Examples:

— Result: Infer that the state in S0 causes the state in S1

— The Tin Woodman was caught in the rain. His joints rusted.

— Explanation: Infer that the state in S1 causes the state in S0

— John hid Bill’s car keys. He was drunk.

SLIDE 61

Text Coherence

— Cohesion (repetition, etc.) does not imply coherence

— Coherence relations:

— Possible meaning relations between utterances in discourse

— Examples:

— Result: Infer that the state in S0 causes the state in S1

— The Tin Woodman was caught in the rain. His joints rusted.

— Explanation: Infer that the state in S1 causes the state in S0

— John hid Bill’s car keys. He was drunk.

— Elaboration: Infer the same proposition from S0 and S1.

— Dorothy was from Kansas. She lived in the great Kansas prairie.

— Pair of locally coherent clauses: discourse segment

SLIDE 62

Coherence Analysis

S1: John went to the bank to deposit his paycheck.
S2: He then took a train to Bill’s car dealership.
S3: He needed to buy a car.
S4: The company he works for now isn’t near any public transportation.
S5: He also wanted to talk to Bill about their softball league.


SLIDE 65

Rhetorical Structure Theory

— Mann & Thompson (1987)

— Goal: Identify hierarchical structure of text

— Cover wide range of TEXT types

— Language contrasts

— Relational propositions (intentions)

— Derives from functional relations between clauses

SLIDE 66

Components of RST

— Relations:

— Hold between two text spans, nucleus and satellite

— Nucleus: core element; satellite: peripheral

— Constraints on each, and between them

— Effect: why the author wrote this

SLIDE 67

Components of RST

— Relations:

— Hold between two text spans, nucleus and satellite

— Nucleus: core element; satellite: peripheral

— Constraints on each, and between them

— Effect: why the author wrote this

— Schemas:

— Grammar of legal relations between text spans

— Define possible RST text structures

— Most common: N + S, others involve two or more nuclei

SLIDE 68

Components of RST

— Relations:

— Hold between two text spans, nucleus and satellite

— Nucleus: core element; satellite: peripheral

— Constraints on each, and between them

— Effect: why the author wrote this

— Schemas:

— Grammar of legal relations between text spans

— Define possible RST text structures

— Most common: N + S, others involve two or more nuclei

— Structures:

— Using clause units: complete, connected, unique, adjacent

SLIDE 69

RST Relations

— Core of RST

— RST analysis requires building a tree of relations

— Circumstance, Solutionhood, Elaboration, Background, Enablement, Motivation, Evidence, Justify, Vol. Cause, Non-Vol. Cause, Vol. Result, Non-Vol. Result, Purpose, Antithesis, Concession, Condition, Otherwise, Interpretation, Evaluation, Restatement, Summary, Sequence, Contrast

SLIDE 70

RST Relations

— Evidence

— Effect: Evidence (satellite) increases R’s (the reader’s) belief in the nucleus

— The program really works. (N)

— I entered all my info and it matched my results. (S)

[Diagram: spans 1 (N) and 2 (S) linked by the Evidence relation]

SLIDE 72

RST Parsing

— Learn and apply classifiers for

— Segmentation and parsing of discourse

SLIDE 73

RST Parsing

— Learn and apply classifiers for

— Segmentation and parsing of discourse

— Assign coherence relations between spans

SLIDE 74

RST Parsing

— Learn and apply classifiers for

— Segmentation and parsing of discourse

— Assign coherence relations between spans

— Create a representation over the whole text => parse

— Discourse structure

— RST trees

— Fine-grained, hierarchical structure

— Clause-based units

SLIDE 75

Identifying Segments & Relations

— Key source of information:

SLIDE 76

Identifying Segments & Relations

— Key source of information:

— Cue phrases

— Aka discourse markers, cue words, clue words

— Although, but, for example, however, yet, with, and …

— John hid Bill’s keys because he was drunk.

— Issues:

SLIDE 77

Identifying Segments & Relations

— Key source of information:

— Cue phrases

— Aka discourse markers, cue words, clue words

— Although, but, for example, however, yet, with, and …

— John hid Bill’s keys because he was drunk.

— Issues:

— Ambiguity: discourse vs sentential use

— With its distant orbit, Mars exhibits frigid weather.

— We can see Mars with a telescope.

— Ambiguity: one cue can signal multiple discourse relations

— Because: CAUSE/EVIDENCE; But: CONTRAST/CONCESSION

SLIDE 78

Cue Phrases

— Last issue:

— Insufficient:

SLIDE 79

Cue Phrases

— Last issue:

— Insufficient:

— Not all relations marked by cue phrases

— Only 15-25% of relations marked by cues

SLIDE 80

Learning Discourse Parsing

— Train classifiers for:

— Segmentation

— Coherence relation assignment

— Discourse structure assignment

— Shift-reduce parser transitions

— Use range of features:

— Cue phrases

— Lexical/punctuation in context

— Syntactic parses
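A sketch of the shift-reduce transition system such a parser learns to drive (`predict_action` stands in for the trained classifier over the features above; EDU = elementary discourse unit):

```python
# Sketch: shift-reduce discourse parsing. The classifier picks SHIFT
# (push the next EDU) or REDUCE-<relation> (merge the top two subtrees
# under a coherence relation).

def parse_discourse(edus, predict_action):
    stack, queue = [], list(edus)
    while queue or len(stack) > 1:
        action = predict_action(stack, queue)   # uses cues, lexical, syntax
        if action == "SHIFT" and queue:
            stack.append(("EDU", queue.pop(0)))
        else:                                   # e.g. "REDUCE-Elaboration"
            relation = action.split("-", 1)[1]
            right, left = stack.pop(), stack.pop()
            stack.append((relation, left, right))
    return stack[0]                             # discourse tree over the text
```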

SLIDE 81

Evaluation

— Segmentation:

— Good: 96%

— Better than frequency or punctuation baseline

— Discourse structure:

— Okay: 61% span, relation structure

— Relation identification: poor

SLIDE 82

Issues

— Goal: Single tree-shaped analysis of all text

— Difficult to achieve

SLIDE 83

Issues

— Goal: Single tree-shaped analysis of all text

— Difficult to achieve

— Significant ambiguity

— Significant disagreement among labelers

SLIDE 84

Issues

— Goal: Single tree-shaped analysis of all text

— Difficult to achieve

— Significant ambiguity

— Significant disagreement among labelers

— Relation recognition is difficult

— Some clear “signals”, e.g., although

— Not mandatory: only ~25% of relations are cued

SLIDE 85

Summary

— Computational discourse:

— Cohesion and Coherence in extended spans

— Key tasks:

— Reference resolution

— Constraints and preferences

— Heuristic, learning, and sieve models

— Discourse structure modeling

— Linear topic segmentation, hierarchical relation induction

— Exploiting shallow and deep language processing

SLIDE 86

Problem 1

[Diagram: candidate antecedents NP1 … NP9 for an anaphor, with the farthest antecedent marked]

— Coreference is a rare relation

— skewed class distributions (2% positive instances)

— remove some negative instances
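One common realization of this filtering (in the style of Soon et al., offered here as an illustration rather than the exact recipe behind the slide) keeps, per anaphor, only the closest gold antecedent as the positive instance and only the intervening mentions as negatives:

```python
# Sketch: reducing the 2%-positive skew by pruning negative instances.
# `gold_chain_of(m)` is a stand-in that returns m's gold entity id.

def training_instances(mentions, gold_chain_of):
    instances = []
    for j, anaphor in enumerate(mentions):
        # Closest preceding mention in the same gold chain, if any.
        closest = next((i for i in range(j - 1, -1, -1)
                        if gold_chain_of(mentions[i]) == gold_chain_of(anaphor)),
                       None)
        if closest is None:
            continue
        instances.append((mentions[closest], anaphor, True))
        # Negatives: only the intervening mentions, not every earlier NP.
        for i in range(closest + 1, j):
            instances.append((mentions[i], anaphor, False))
    return instances
```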

SLIDE 87

Problem 2

— Coreference is a discourse-level problem

— different solutions for different types of NPs

— proper names: string matching and aliasing

— inclusion of “hard” positive training instances

— positive example selection: selects easy positive training instances (cf. Harabagiu et al. (2001))

— Select most confident antecedent as positive instance

Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, the renowned speech therapist, was summoned to help the King overcome his speech impediment...

SLIDE 88

Problem 3

— Coreference is an equivalence relation

— loss of transitivity

— need to tighten the connection between classification and clustering

— prune learned rules w.r.t. the clustering-level coreference scoring function

[Queen Elizabeth] set about transforming [her] [husband], ...

(coref? coref? not coref?)

SLIDE 89

Results Snapshot

SLIDE 90

Classification & Clustering

— Classifiers:

— C4.5 (Decision Trees)

— RIPPER (automatic rule learner)

SLIDE 91

Classification & Clustering

— Classifiers:

— C4.5 (Decision Trees), RIPPER

— Cluster: Best-first, single link clustering

— Each NP in own class

— Test preceding NPs

— Select highest-confidence coreferent, merge classes
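A sketch of this best-first, single-link procedure; `coref_confidence` stands in for the confidence the C4.5/RIPPER classifier attaches to a +coreferent decision, and the threshold is an illustrative choice:

```python
# Sketch: best-first single-link clustering over classifier confidences.

def best_first_cluster(mentions, coref_confidence, threshold=0.5):
    cluster_of = {i: i for i in range(len(mentions))}   # class labels

    for j in range(1, len(mentions)):
        # Score every preceding NP as a candidate antecedent.
        scored = [(coref_confidence(mentions[i], mentions[j]), i)
                  for i in range(j)]
        best_score, best_i = max(scored)
        if best_score >= threshold:
            # Single-link merge: relabel j's class to the antecedent's.
            old, new = cluster_of[j], cluster_of[best_i]
            for k, c in cluster_of.items():
                if c == old:
                    cluster_of[k] = new
    return cluster_of
```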

SLIDE 92

Baseline Feature Set

SLIDE 93

Extended Feature Set

— Explore 41 additional features

— More complex NP matching (7)

— Detailed NP type (4): definite, embedded, pronoun, …

— Syntactic role (3)

— Syntactic constraints (8): binding, agreement, etc.

— Heuristics (9): embedding, quoting, etc.

— Semantics (4): WordNet distance, inheritance, etc.

— Distance (1): in paragraphs

— Pronoun resolution (2)

— Based on simple or rule-based resolver

SLIDE 94

Feature Selection

— Too many added features

— Hand select ones with good coverage/precision

SLIDE 95

Feature Selection

— Too many added features

— Hand select ones with good coverage/precision

— Compare to automatically selected by learner

— Useful features are:

— Agreement

— Animacy

— Binding

— Maximal NP

— Reminiscent of Lappin & Leass

SLIDE 96

Feature Selection

— Too many added features

— Hand select ones with good coverage/precision

— Compare to automatically selected by learner

— Useful features are:

— Agreement

— Animacy

— Binding

— Maximal NP

— Reminiscent of Lappin & Leass

— Still best results on MUC-7 dataset: 0.634