

SLIDE 1

Entity Coreference Resolution

CMSC 473/673 UMBC December 6th, 2017

SLIDE 2

Course Announcement 1: Assignment 4

Due Monday, December 11th (~5 days)
Remaining late days can be used until 12/20, 11:59 AM
Any questions?

SLIDE 3

Course Announcement 2: Project

Due Wednesday 12/20, 11:59 AM
Late days cannot be used
Any questions?

SLIDE 4

Course Announcement 3: Final Exam

No mandatory final exam
December 20th, 1pm-3pm: optional second midterm/final, averaged into the first midterm score
No practice questions
Register by Monday 12/11: https://goo.gl/forms/aXflKkP0BIRxhOS83

SLIDE 5

Course Announcement 4: Evaluations

Please fill them out! (We do pay attention to them.)
Links from StudentCourseEvaluations@umbc.edu

SLIDE 6

Recap from last time…

SLIDE 7

Entailment: Underlying a Number of Applications

Information extraction: X acquire Y
Information retrieval: Overture was bought for …
Summarization: identify redundant information
MT evaluation

Example (question answering): the question “Who bought Overture?” gives the expected answer form “X bought Overture”; a candidate text such as “Overture’s acquisition by Yahoo” or “Yahoo bought Overture” supports the answer only if the text entails the hypothesized answer.

Adapted from Dagan, Roth and Zanzotto (2007; tutorial)

SLIDE 8

Applied Textual Entailment

A directional relation between two text fragments

t (text) entails h (hypothesis), written t ⇒ h, if humans reading t will infer that h is most likely true

Adapted from Dagan, Roth and Zanzotto (2007; tutorial)

t probabilistically entails h if P(h is true | t) > P(h is true), i.e., t increases the likelihood of h being true (positive PMI: t provides information on h’s truth). The value is the entailment confidence.
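Rewriting that condition as pointwise mutual information makes the “positive PMI” remark explicit (a standard identity, not taken from the tutorial slides):

```latex
% t probabilistically entails h iff the PMI between t and the truth of h is positive:
\mathrm{PMI}(t, h) \;=\; \log \frac{P(h \text{ is true} \mid t)}{P(h \text{ is true})} \;>\; 0
% The magnitude of the PMI can serve as the entailment confidence.
```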

SLIDE 9

PASCAL RTE Examples

  • Text: Reagan attended a ceremony in Washington to commemorate the landings in Normandy. Hypothesis: Washington is located in Normandy. Task: IE. Entailment: False.
  • Text: Google files for its long awaited IPO. Hypothesis: Google goes public. Task: IR. Entailment: True.
  • Text: …: a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others. Hypothesis: Cardinal Juan Jesus Posadas Ocampo died in 1993. Task: QA. Entailment: True.
  • Text: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%. Hypothesis: The SPD is defeated by the opposition parties. Task: IE. Entailment: True.

Adapted from Dagan, Roth and Zanzotto (2007; tutorial)

SLIDE 10

Basic Representations

[Diagram: representation levels between raw text and a full meaning representation (local lexical, syntactic parse, semantic representation, logical forms); textual entailment inference can be defined over any of these levels.]

Adapted from Dagan, Roth and Zanzotto (2007; tutorial)

SLIDE 11

Common and Successful Approaches (Features)

Measure similarity/match between t and h:
  • Lexical overlap (unigram, N-gram, subsequence)
  • Lexical substitution (WordNet, statistical)
  • Syntactic matching/transformations
  • Lexical-syntactic variations (“paraphrases”)
  • Semantic role labeling and matching
  • Global similarity parameters (e.g., negation, modality)

Other strategies: cross-pair similarity; detecting mismatch (for non-entailment); interpretation into a logic representation plus logic inference.

Lexical baselines are hard to beat! Why? Lack of knowledge (syntactic transformation rules, paraphrases, lexical relations, etc.) and lack of training data. (A minimal lexical-overlap sketch follows this slide’s attribution.)

Adapted from Dagan, Roth and Zanzotto (2007; tutorial)
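To make the “lexical baselines are hard to beat” point concrete, here is a minimal word-overlap entailment baseline. It is an illustration only (not from the tutorial); the tokenization and the 0.7 threshold are arbitrary assumptions.

```python
# Minimal word-overlap baseline for recognizing textual entailment (RTE).
# Illustrative sketch; the threshold and tokenization are arbitrary choices.

def tokens(text):
    """Lowercased tokens with surrounding punctuation stripped."""
    return [w.strip('.,!?";:()') for w in text.lower().split()]

def overlap_score(text, hypothesis):
    """Fraction of hypothesis tokens that also appear in the text."""
    t_set = set(tokens(text))
    h_toks = tokens(hypothesis)
    if not h_toks:
        return 0.0
    return sum(1 for w in h_toks if w in t_set) / len(h_toks)

def predict_entailment(text, hypothesis, threshold=0.7):
    """Predict entailment when enough hypothesis words are covered by the text."""
    return overlap_score(text, hypothesis) >= threshold

if __name__ == "__main__":
    t = "Google files for its long awaited IPO."
    h = "Google goes public."
    print(overlap_score(t, h), predict_entailment(t, h))
```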

SLIDE 12

Knowledge Acquisition

Direct Algorithms

Concepts from text via clustering (Lin and Pantel, 2001) Inference rules – aka DIRT (Lin and Pantel, 2001) …

Indirect Algorithms

Hearst’s ISA patterns (Hearst, 1992) Question Answering patterns (Ravichandran and Hovy, 2002) …

Iterative Algorithms

Entailment rules from Web (Szpektor et al., 2004) Espresso (Pantel and Pennacchiotti, 2006) …

Adapted from Dagan, Roth and Zanzotto (2007; tutorial)

SLIDE 13

Choice of Plausible Alternatives (COPA; Roemmele et al., 2011)

Goal: test causal implication, not (likely) entailment
1000 questions: premise, prompt, and 2 plausible alternatives
Forced choice, 50% random baseline
Forward and backward causality
Cohen’s Kappa = 0.95 (only 30 disagreements)
http://ict.usc.edu/~gordon/copa.html

Adapted from Roemmele et al. (2011)

Forward causal reasoning:

The chef hit the egg on the side of the bowl. What happened as a RESULT?
  • A. The egg cracked.
  • B. The egg rotted.

Backward causal reasoning:

The man broke his toe. What was the CAUSE of this?

  • A. He got a hole in his sock.
  • B. He dropped a hammer on his foot.
SLIDE 14

COPA Test Results

Method                        Test Accuracy
PMI (window of 5)             58.8
PMI (window of 25)            58.6
PMI (window of 50)            55.6
Goodwin et al.: bigram PMI    61.8
Goodwin et al.: SVM           63.4

How well do purely associative statistical NLP techniques perform? Statements that are causally related often occur close together in text, connected by causal expressions (“because”, “as a result”, “so”).

Approach: choose the alternative with a stronger correlation to the premise (PMI à la Church and Hanks, 1989; Goodwin et al. add bigram PMI and an SVM). A rough PMI-scoring sketch follows this slide’s attribution.

Adapted from Roemmele et al. (2011)
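Here is a rough sketch of the PMI-style scoring described above (my own illustration, not the authors’ code). It assumes unigram and windowed co-occurrence counts have already been collected from some corpus; each alternative is scored by its average word-pair PMI with the premise.

```python
import math
from itertools import product

# Illustrative PMI scorer for COPA-style alternatives (not the authors' code).
# Assumed precomputed corpus statistics:
#   unigram[w]      -> count of word w
#   cooccur[(a, b)] -> count of a and b within some window
#   total           -> total number of tokens (or windows) counted

def pmi(a, b, unigram, cooccur, total):
    """Pointwise mutual information of two words; 0 if they never co-occur."""
    joint = cooccur.get((a, b), 0) + cooccur.get((b, a), 0)
    if joint == 0 or unigram.get(a, 0) == 0 or unigram.get(b, 0) == 0:
        return 0.0
    return math.log((joint / total) / ((unigram[a] / total) * (unigram[b] / total)))

def score_alternative(premise_words, alt_words, unigram, cooccur, total):
    """Average PMI over all premise-word / alternative-word pairs."""
    pairs = list(product(premise_words, alt_words))
    if not pairs:
        return 0.0
    return sum(pmi(a, b, unigram, cooccur, total) for a, b in pairs) / len(pairs)

def choose(premise, alt1, alt2, unigram, cooccur, total):
    """Pick the alternative more strongly associated with the premise."""
    s1 = score_alternative(premise.split(), alt1.split(), unigram, cooccur, total)
    s2 = score_alternative(premise.split(), alt2.split(), unigram, cooccur, total)
    return alt1 if s1 >= s2 else alt2

# Toy counts, just to show the interface:
toy_unigram = {"hit": 4, "egg": 3, "cracked": 2, "rotted": 1}
toy_cooccur = {("egg", "cracked"): 2, ("hit", "cracked"): 1}
print(choose("chef hit egg", "egg cracked", "egg rotted",
             toy_unigram, toy_cooccur, total=100))   # -> egg cracked
```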

SLIDE 15

SNLI (Bowman et al., 2015)

Bowman et al. (2015), SNLI test performance:
Lexicalized                    78.2
Unigrams only                  71.6
Unlexicalized                  50.4
Neural: sum of word vectors    75.3
Neural: LSTM                   77.6

Features of the lexicalized classifier:
  • BLEU score between hypothesis and premise
  • # words in hypothesis - # words in premise
  • word overlap
  • unigrams and bigrams in the hypothesis
  • Cross-unigrams: for every pair of words across the premise and hypothesis which share a POS tag, an indicator feature over the two words
  • Cross-bigrams: for every pair of bigrams across the premise and hypothesis which share a POS tag on the second word, an indicator feature over the two bigrams
(A small cross-unigram sketch follows below.)
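To illustrate the cross-unigram features above, here is a small sketch (my own; it takes already-tagged (word, POS) pairs rather than calling a real tagger) that emits one indicator feature for every premise/hypothesis word pair sharing a POS tag.

```python
# Sketch of cross-unigram indicator features in the spirit of Bowman et al. (2015).
# Inputs are (word, POS) pairs; any POS tagger could supply them.

def cross_unigram_features(premise_tagged, hypothesis_tagged):
    """Indicator feature names for premise/hypothesis word pairs sharing a POS tag."""
    features = set()
    for p_word, p_pos in premise_tagged:
        for h_word, h_pos in hypothesis_tagged:
            if p_pos == h_pos:
                features.add(f"xuni:{p_pos}:{p_word}_{h_word}")
    return features

premise = [("Google", "NNP"), ("files", "VBZ"), ("for", "IN"),
           ("its", "PRP$"), ("IPO", "NNP")]
hypothesis = [("Google", "NNP"), ("goes", "VBZ"), ("public", "JJ")]
print(sorted(cross_unigram_features(premise, hypothesis)))
# -> ['xuni:NNP:Google_Google', 'xuni:NNP:IPO_Google', 'xuni:VBZ:files_goes']
```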

SLIDE 16

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

Entity Coreference Resolution

SLIDE 17

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

Entity Coreference Resolution

SLIDE 18

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

is “he” the same person as “Chandler”?

Entity Coreference Resolution

SLIDE 19

Coref Applications

Question answering Information extraction Machine translation Text summarization Information retrieval

SLIDE 20

I spread the cloth on the table to protect it. I spread the cloth on the table to display it.

Sentences courtesy Jason Eisner

Winograd Schemas

SLIDE 21

Rule-Based Attempts

Popular in the 1970s and 1980s
Charniak (1972), children’s story comprehension: “In order to do pronoun resolution, one had to be able to do everything else.”
Focus on sophisticated knowledge & inference mechanisms
Syntax-based approaches (Hobbs, 1976)
Discourse-based approaches / centering algorithms: Kantor (1977), Grosz (1977), Webber (1978), Sidner (1979)

Ng (2006; IJCAI Tutorial)

SLIDE 22

Basic System

Input Text

SLIDE 23

Basic System

Input Text → Preprocessing

SLIDE 24

Basic System

Input Text → Preprocessing → Mention Detection

SLIDE 25

Basic System

Input Text → Preprocessing → Mention Detection → Coref Model

SLIDE 26

Basic System

Input Text → Preprocessing → Mention Detection → Coref Model → Output

SLIDE 27

Basic System

Input Text → Preprocessing → Mention Detection → Coref Model → Output

SLIDE 28

Preprocessing

  • POS tagging
  • Stemming
  • Predicate-argument representation (verb predicates and nominalization)
  • Entity annotation: stand-alone NERs with a variable number of classes
  • Dates, times, and numeric value normalization
  • Identification of semantic relations (complex nominals, genitives, adjectival phrases, and adjectival clauses)
  • Event identification
  • Shallow parsing (chunking)

SLIDE 29

Preprocessing

  • POS tagging
  • Stemming
  • Predicate-argument representation (verb predicates and nominalization)
  • Entity annotation: stand-alone NERs with a variable number of classes
  • Dates, times, and numeric value normalization
  • Identification of semantic relations (complex nominals, genitives, adjectival phrases, and adjectival clauses)
  • Event identification
  • Shallow parsing (chunking)

SLIDE 30

Basic System

Input Text → Preprocessing → Mention Detection → Coref Model → Output

SLIDE 31

What are Named Entities?

Named entity recognition (NER): identify proper names in texts and classify them into a set of predefined categories of interest:
  • Person names
  • Organizations (companies, government organisations, committees, etc.)
  • Locations (cities, countries, rivers, etc.)
  • Date and time expressions

Cunningham and Bontcheva (2003, RANLP Tutorial)

SLIDE 32

What are Named Entities?

Named entity recognition (NER): identify proper names in texts and classify them into a set of predefined categories of interest:
  • Person names
  • Organizations (companies, government organisations, committees, etc.)
  • Locations (cities, countries, rivers, etc.)
  • Date and time expressions
  • Measures (percent, money, weight, etc.), email addresses, Web addresses, street addresses, etc.
  • Domain-specific: names of drugs, medical conditions, names of ships, bibliographic references, etc.

Cunningham and Bontcheva (2003, RANLP Tutorial)

SLIDE 33

Basic Problems in NE

Variation of NEs: John Smith, Mr Smith, John
Ambiguity of NE types:
  • John Smith (company vs. person)
  • May (person vs. month)
  • Washington (person vs. location)
  • 1945 (date vs. time)
Ambiguity with common words, e.g. “may”

Cunningham and Bontcheva (2003, RANLP Tutorial)

SLIDE 34

More complex problems in NE

Issues of style, structure, domain, genre, etc.; punctuation, spelling, spacing, formatting, e.g. segmenting the address:
Dept. of Computing and Maths
Manchester Metropolitan University
Manchester
United Kingdom

Cunningham and Bontcheva (2003, RANLP Tutorial)

SLIDE 35

Two kinds of NE approaches

Knowledge Engineering
  • rule based, developed by experienced language engineers
  • makes use of human intuition
  • requires only a small amount of training data
  • development can be very time consuming
  • some changes may be hard to accommodate

Learning Systems
  • require some (large?) amount of annotated training data
  • some changes may require re-annotation of the entire training corpus
  • annotators can be cheap

Cunningham and Bontcheva (2003, RANLP Tutorial)

SLIDE 36

Baseline: list lookup approach

System that recognises only entities stored in its lists (gazetteers).
Advantages: simple, fast, language independent, easy to retarget (just create lists).
Disadvantages: impossible to enumerate all names; collection and maintenance of lists; cannot deal with name variants; cannot resolve ambiguity.
(A minimal lookup sketch follows this slide’s attribution.)

Cunningham and Bontcheva (2003, RANLP Tutorial)
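A minimal sketch of the list-lookup baseline (illustrative only; the tiny gazetteer below is made up): scan the text and, at each position, take the longest gazetteer match and label it with the list’s type.

```python
# Minimal gazetteer (list-lookup) NER baseline. Illustrative sketch only;
# real gazetteers would come from the sources on the next slide.

GAZETTEER = {
    ("john", "smith"): "PER",
    ("manchester",): "LOC",
    ("united", "kingdom"): "LOC",
    ("microsoft",): "ORG",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def lookup_ner(tokens):
    """Greedy longest-match lookup; returns (start, end_exclusive, type) spans."""
    spans, i = [], 0
    lowered = [t.lower() for t in tokens]
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in GAZETTEER:
                spans.append((i, i + n, GAZETTEER[key]))
                i += n
                break
        else:
            i += 1
    return spans

print(lookup_ner("John Smith flew to Manchester in the United Kingdom".split()))
# -> [(0, 2, 'PER'), (4, 5, 'LOC'), (7, 9, 'LOC')]
```

As the slide notes, this fails on exactly the hard cases: unseen names, variants, and ambiguous strings.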

SLIDE 37

Creating Gazetteer Lists

Online phone directories and yellow pages for person and organisation names; SSA database: https://www.ssa.gov/oact/babynames/

Location lists:
  • US GEOnet Names Server (GNS) data: 3.9 million locations with 5.37 million names (e.g., [Manov03])
  • UN site: http://unstats.un.org/unsd/citydata
  • Global Discovery database from Europa Technologies Ltd, UK (e.g., [Ignat03])

Automatic collection from annotated training data

Cunningham and Bontcheva (2003, RANLP Tutorial)

SLIDE 38

Shallow Parsing Approach (internal structure)

Internal evidence: names often have internal structure. These components can be either stored or guessed, e.g. for locations:
  • Cap. Word + {City, Forest, Center, River}, e.g. Sherwood Forest
  • Cap. Word + {Street, Boulevard, Avenue, Crescent, Road}, e.g. Portobello Street
(A regex sketch of these patterns follows this slide’s attribution.)

Cunningham and Bontcheva (2003, RANLP Tutorial)
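A small sketch of the internal-structure idea (my own illustration): a regular expression that guesses location names of the form “capitalized word + trigger word”, with the trigger lists taken from the slide above.

```python
import re

# Guess location names from internal structure: "Cap. Word + trigger word".
# Illustrative sketch; the trigger lists come from the slide above.

LOCATION_TRIGGERS = ["City", "Forest", "Center", "River",
                     "Street", "Boulevard", "Avenue", "Crescent", "Road"]

LOCATION_PATTERN = re.compile(
    r"\b([A-Z][a-z]+)\s+(" + "|".join(LOCATION_TRIGGERS) + r")\b")

def guess_locations(text):
    """Return candidate location strings matched by the internal-structure rule."""
    return [" ".join(match.groups()) for match in LOCATION_PATTERN.finditer(text)]

print(guess_locations("They met near Sherwood Forest and walked down Portobello Street."))
# -> ['Sherwood Forest', 'Portobello Street']
```

The next slide shows why such rules still need care: sentence-initial capitalization, semantic ambiguity, and structural ambiguity.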

SLIDE 39

Problems with the shallow parsing approach

Ambiguously capitalized words (first word in sentence)

[All American Bank] vs. All [State Police]

Semantic ambiguity

"John F. Kennedy" = airport (location) "Philip Morris" = organisation

Structural ambiguity

[Cable and Wireless] vs. [Microsoft] and [Dell]; [Center for Computational Linguistics] vs. message from [City Hospital] for [John Smith]

Cunningham and Bontcheva (2003, RANLP Tutorial)

SLIDE 40

Gazetteer lists for rule-based NE

Needed to store the indicator strings for the internal structure and context rules:
  • Internal location indicators: river, mountain, street, road
  • Internal organisation indicators: Ltd., Inc.

Cunningham and Bontcheva (2003, RANLP Tutorial)

SLIDE 41

Named Entity Grammars

Phases run sequentially and constitute a cascade of FSTs over the pre-processing results
Hand-coded rules applied to annotations to identify NEs
Annotations from format analysis, tokeniser, sentence splitter, POS tagger, and gazetteer modules
Use of contextual information
Finds person names, locations, organisations, dates, addresses

Cunningham and Bontcheva (2003, RANLP Tutorial)

SLIDE 42

NER and Shallow Parsing: Machine Learning

Sequence models (HMM, CRF) are often effective
BIO encoding

SLIDE 43

NER and Shallow Parsing: Machine Learning

Sequence models (HMM, CRF) are often effective
BIO encoding
Pat and Chandler Smith agreed on a plan.

SLIDE 44

NER and Shallow Parsing: Machine Learning

Sequence models (HMM, CRF) are often effective
BIO encoding
Pat and Chandler Smith agreed on a plan.
NER tags: B-PER B-PER I-PER O O O O O

SLIDE 45

NER and Shallow Parsing: Machine Learning

Sequence models (HMM, CRF) are often effective
BIO encoding
Pat and Chandler Smith agreed on a plan.
NER tags:   B-PER B-PER I-PER O O O O O
Chunk tags: B-NP B-NP I-NP O B-VP O B-NP I-NP

SLIDE 46

NER and Shallow Parsing: Machine Learning

Sequence models (HMM, CRF) are often effective
BIO encoding
Pat and Chandler Smith agreed on a plan.
NER tags:                                 B-PER B-PER I-PER O O O O O
Chunk tags:                               B-NP B-NP I-NP O B-VP O B-NP I-NP
Chunk tags (coordinated NP as one chunk): B-NP I-NP I-NP I-NP B-VP O B-NP I-NP
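As a companion to the BIO encoding above, here is a small sketch (my own) of the decoding step that turns a BIO tag sequence back into labeled spans, which is what you would run on a sequence model’s output.

```python
# Decode a BIO tag sequence into (label, start, end_exclusive) spans.
# Illustrative sketch; tags follow the B-X / I-X / O convention shown above.

def bio_to_spans(tags):
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        # Close the open span on O, on a new B-, or on an I- with a different label.
        if tag == "O" or tag.startswith("B-") or (
                tag.startswith("I-") and label != tag[2:]):
            if label is not None:
                spans.append((label, start, i))
                start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label is None:
            # Tolerate an I- tag with no preceding B- by opening a new span.
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans

# One plausible tagging of the slide's sentence (9 tokens including the period).
tags = ["B-PER", "O", "B-PER", "I-PER", "O", "O", "O", "O", "O"]
print(bio_to_spans(tags))   # -> [('PER', 0, 1), ('PER', 2, 4)]
```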

SLIDE 47

Basic System

Input Text → Preprocessing → Mention Detection → Coref Model → Output

SLIDE 48

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.


Model Attempt 1: Binary Classification

Mention-Pair Model

SLIDE 49

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

Model Attempt 1: Binary Classification

training: observed positive instances, negative instances

SLIDE 50

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

Model Attempt 1: Binary Classification

training: observed positive instances, negative instances
naïve approach (take all non-positive pairs): highly imbalanced!
Soon et al. (2001): heuristic for more balanced selection (a pair-generation sketch follows below)
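Below is a rough sketch of mention-pair training-instance generation in the spirit of the Soon et al. (2001) heuristic as I understand it (a paraphrase, not their exact procedure): each anaphoric mention contributes one positive pair with its closest coreferent antecedent, and negative pairs only with the mentions in between.

```python
# Mention-pair training instances, Soon et al. (2001)-style selection (sketch).
# `mentions` is a document-ordered list of mention strings; `entity_of` maps
# each mention to its gold entity id. The second "Pat" is written "Pat_2"
# only so the toy dict has distinct keys.

def make_training_pairs(mentions, entity_of):
    pairs = []  # (antecedent, anaphor, label)
    for j, anaphor in enumerate(mentions):
        closest = None
        for i in range(j - 1, -1, -1):    # closest preceding coreferent mention
            if entity_of[mentions[i]] == entity_of[anaphor]:
                closest = i
                break
        if closest is None:
            continue                      # non-anaphoric: contributes no pairs
        pairs.append((mentions[closest], anaphor, 1))
        for i in range(closest + 1, j):   # only intervening mentions are negatives
            pairs.append((mentions[i], anaphor, 0))
    return pairs

mentions = ["Pat", "Chandler", "He", "Pat_2"]
entity_of = {"Pat": 1, "Chandler": 2, "He": 2, "Pat_2": 1}
print(make_training_pairs(mentions, entity_of))
# -> [('Chandler', 'He', 1), ('Pat', 'Pat_2', 1),
#     ('Chandler', 'Pat_2', 0), ('He', 'Pat_2', 0)]
```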

SLIDE 51

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

Model Attempt 1: Binary Classification

possible problem: not transitive

SLIDE 52

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

Model Attempt 1: Binary Classification

solution: go left-to-right; for a mention m, select the closest preceding mention classified as coreferent with it; otherwise, no antecedent is found for m
possible problem: not transitive
(a closest-first decoding sketch follows below)
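A minimal sketch of that closest-first decoding (illustrative; `classifier` stands in for whatever pairwise mention-pair model was trained): each mention scans leftward, links to the first preceding mention the classifier accepts, and clusters are read off with union-find.

```python
# Closest-first decoding for a mention-pair coreference model (sketch).
# `classifier(antecedent, mention) -> bool` is any trained pairwise model;
# here a toy lookup table plays that role.

def closest_first_decode(mentions, classifier):
    parent = list(range(len(mentions)))      # union-find forest over indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for j in range(len(mentions)):
        for i in range(j - 1, -1, -1):       # scan preceding mentions, closest first
            if classifier(mentions[i], mentions[j]):
                parent[find(j)] = find(i)    # merge the two clusters
                break                        # keep only the closest antecedent
    clusters = {}
    for idx in range(len(mentions)):
        clusters.setdefault(find(idx), []).append(mentions[idx])
    return list(clusters.values())

mentions = ["Pat", "Chandler", "He", "Pat"]
toy = lambda a, m: (a, m) in {("Chandler", "He"), ("Pat", "Pat")}
print(closest_first_decode(mentions, toy))
# -> [['Pat', 'Pat'], ['Chandler', 'He']]
```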

SLIDE 53

Anaphora

does a mention have an antecedent?

SLIDE 54

Anaphora

does a mention have an antecedent?
Chris told Pat he aced the test.

SLIDE 55

Model 2: Entity-Mention Model

cluster 1 cluster 2 cluster 3 cluster 4

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

SLIDE 56

Model 2: Entity-Mention Model

entity 1 entity 2 entity 3 entity 4

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

SLIDE 57

Model 2: Entity-Mention Model

entity 1 entity 2 entity 3 entity 4

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

advantage: featurize based on all (or some or none) of the clustered mentions

SLIDE 58

Model 2: Entity-Mention Model

entity 1 entity 2 entity 3 entity 4

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

advantage: featurize based on all (or some or none) of the clustered mentions
disadvantage: clustering doesn’t address anaphora
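To make “featurize against the whole cluster” concrete, here is a small sketch (my own; the specific features are illustrative, not from any particular paper) of cluster-level features for deciding whether a mention belongs to a partially built entity.

```python
# Illustrative entity-mention (cluster-level) features: compare a candidate
# mention against every mention already in the cluster, then aggregate.

def head_match(m1, m2):
    """Toy head-word match: compare the last token of each mention."""
    return m1.split()[-1].lower() == m2.split()[-1].lower()

def cluster_features(cluster, mention):
    """Aggregate pairwise evidence over all mentions already in the cluster."""
    matches = [head_match(m, mention) for m in cluster]
    return {
        "any_head_match": any(matches),
        "all_head_match": all(matches),
        "frac_head_match": sum(matches) / len(matches),
        "cluster_size": len(cluster),
    }

print(cluster_features(["Pat Smith", "Smith"], "Chandler Smith"))
# -> {'any_head_match': True, 'all_head_match': True,
#     'frac_head_match': 1.0, 'cluster_size': 2}
```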

SLIDE 59

Model 3: Cluster-Ranking Model (Rahman and Ng, 2009)

entity 1 entity 2 entity 3 entity 4

Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.

learn to rank the clusters and items in them

SLIDE 60

Stanford Coref (Lee et al., 2011)

SLIDE 61

Stanford Coref (Lee et al., 2011)

John is a musician. He played a new song. A girl was listening to the song. “It is my favorite,” John said to her.

SLIDE 62

Stanford Coref (Lee et al., 2011)

[John] is [a musician]. [He] played [a new song]. [A girl] was listening to [the song]. “[It] is [[my] favorite],” [John] said to [her].

SLIDE 63

Stanford Coref (Lee et al., 2011)

[John] is [a musician]. [He] played [a new song]. [A girl] was listening to [the song]. “[It] is [[my] favorite],” [John] said to [her].

SLIDE 64

Stanford Coref (Lee et al., 2011)

[John] is [a musician]. [He] played [a new song]. [A girl] was listening to [the song]. “[It] is [[my] favorite],” [John] said to [her].

SLIDE 65

Stanford Coref (Lee et al., 2011)

[John] is [a musician]. [He] played [a new song]. [A girl] was listening to [the song]. “[It] is [[my] favorite],” [John] said to [her].

SLIDE 66

Stanford Coref (Lee et al., 2011)

[John] is [a musician]. [He] played [a new song]. [A girl] was listening to [the song]. “[It] is [[my] favorite],” [John] said to [her].

SLIDE 67

Stanford Coref (Lee et al., 2011)

[John] is [a musician]. [He] played [a new song]. [A girl] was listening to [the song]. “[It] is [[my] favorite],” [John] said to [her].

SLIDE 68

Stanford Coref (Lee et al., 2011)

[John] is [a musician]. [He] played [a new song]. [A girl] was listening to [the song]. “[It] is [[my] favorite],” [John] said to [her].

SLIDE 69

Stanford Coref (Lee et al., 2011)

[John] is [a musician]. [He] played [a new song]. [A girl] was listening to [the song]. “[It] is [[my] favorite],” [John] said to [her].

SLIDE 70

Stanford Coref (Lee et al., 2011)

[John] is [a musician]. [He] played [a new song]. [A girl] was listening to [the song]. “[It] is [[my] favorite],” [John] said to [her].

SLIDE 71

Other Sieve-Based Approaches

Ratinov & Roth (EMNLP 2012)
  • Each sieve is a machine-learned classifier
  • Later sieves can override earlier sieves’ decisions
  • Can recover from errors as additional evidence is available

Ng (2006; IJCAI Tutorial)

SLIDE 72

Other Sieve-Based Approaches

Ratinov & Roth (EMNLP 2012)
  • Each sieve is a machine-learned classifier
  • Later sieves can override earlier sieves’ decisions
  • Can recover from errors as additional evidence is available

  • Nested (e.g., {city of {Jerusalem}})
  • Same sentence, both named entities (NEs)
  • Adjacent (mentions closest to each other in the dependency tree)
  • Same sentence, NE & nominal (e.g., Barack Obama, president)
  • Different sentence, two NEs
  • Same sentence, no pronouns
  • Different sentence, closest mentions (no intervening mentions)
  • Same sentence, all pairs
  • All pairs
(A generic sieve-cascade sketch follows the attribution below.)

Ng (2006; IJCAI Tutorial)
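Here is a generic multi-pass sieve skeleton (a minimal sketch of the idea behind Lee et al. (2011) and Ratinov & Roth (2012), not either system’s actual code): each sieve proposes links over the current clustering, and sieves run from highest to lowest precision.

```python
# Generic multi-pass sieve skeleton for coreference (sketch).
# Each sieve maps (mentions, current cluster ids) to proposed (i, j) links;
# sieves are applied in order from most to least precise.

def run_sieves(mentions, sieves):
    cluster_id = list(range(len(mentions)))           # start with singletons

    def merge(i, j):
        src, dst = cluster_id[j], cluster_id[i]
        for k, c in enumerate(cluster_id):
            if c == src:
                cluster_id[k] = dst

    for sieve in sieves:                              # one pass per sieve
        for i, j in sieve(mentions, list(cluster_id)):
            if cluster_id[i] != cluster_id[j]:
                merge(i, j)

    clusters = {}
    for idx, cid in enumerate(cluster_id):
        clusters.setdefault(cid, []).append(mentions[idx])
    return list(clusters.values())

# Toy sieves: exact string match (high precision) before head-word match.
exact_match = lambda ms, _: [(i, j) for i in range(len(ms))
                             for j in range(i + 1, len(ms)) if ms[i] == ms[j]]
head_match = lambda ms, _: [(i, j) for i in range(len(ms))
                            for j in range(i + 1, len(ms))
                            if ms[i].split()[-1] == ms[j].split()[-1]]

print(run_sieves(["John Smith", "Smith", "a musician", "John Smith"],
                 [exact_match, head_match]))
# -> [['John Smith', 'Smith', 'John Smith'], ['a musician']]
```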

SLIDE 73

Possible Classifiers

Perceptron → Structured Perceptron

Learn more in 678

SLIDE 74

Possible Classifiers

Perceptron → Structured Perceptron
Structured Maxent Classifiers
Neural Networks

Learn more in 678