Additional Semantic Tasks: Entity Coreference and Question Answering
CMSC 473/673, UMBC
Outline
- Entity Coreference: basic definition; three possible solutions: mention-pair, mention-clustering, coarse-to-fine
- Question Answering: basic definition; relation to NLP techniques and other fields
- Course Recap
Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.
Is “he” the same person as “Chandler”?
Entity Coreference Resolution
Basic System
Pipeline: Input Text → Preprocessing → Mention Detection → Coref Model → Output
What are Named Entities?
Named entity recognition (NER): identify proper names in text and classify them into a set of predefined categories of interest:
- Person names
- Organizations (companies, government organisations, committees, etc.)
- Locations (cities, countries, rivers, etc.)
- Date and time expressions
- Measures (percent, money, weight, etc.), email addresses, web addresses, street addresses, etc.
- Domain-specific: names of drugs, medical conditions, names of ships, bibliographic references, etc.
Cunningham and Bontcheva (2003, RANLP Tutorial)
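As a minimal illustration of NER in practice (not part of the original slides): the spaCy library ships pretrained named-entity recognizers. The model name "en_core_web_sm" below is an assumption and must be downloaded separately; this is only a sketch of how such a tagger is called.

```python
# Sketch: off-the-shelf NER with spaCy (assumes spaCy is installed and the
# small English model "en_core_web_sm" has been downloaded).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Pat met Chandler Smith at Procter & Gamble in Cincinnati on 14 March 1879.")

for ent in doc.ents:
    # ent.label_ is one of the model's predefined categories (PERSON, ORG, GPE, DATE, ...)
    print(ent.text, ent.label_)
```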
Two kinds of NE approaches
Knowledge Engineering
- rule-based, developed by experienced language engineers
- makes use of human intuition
- requires only a small amount of training data
- development could be very time consuming
- some changes may be hard to accommodate
Learning Systems
- requires some (large?) amount of annotated training data
- some changes may require re-annotation of the entire training corpus
- annotators can be cheap
Cunningham and Bontcheva (2003, RANLP Tutorial)
Baseline: list lookup approach
A system that recognises only entities stored in its lists (gazetteers); a minimal lookup sketch follows below.
- Advantages: simple, fast, language independent, easy to retarget (just create lists)
- Disadvantages: impossible to enumerate all names; collection and maintenance of lists; cannot deal with name variants; cannot resolve ambiguity
Cunningham and Bontcheva (2003, RANLP Tutorial)
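A minimal sketch of the list-lookup baseline, assuming a toy gazetteer and greedy longest-match-first matching; the lists, labels, and function name are made up for illustration.

```python
# Toy gazetteer (list-lookup) NER baseline: a span of tokens is tagged as an
# entity only if it appears verbatim in one of the lists.
GAZETTEERS = {
    "LOC": {("Sherwood", "Forest"), ("Portobello", "Street"), ("Toronto",)},
    "PER": {("Pat",), ("Chandler", "Smith")},
}

def list_lookup(tokens, max_len=3):
    """Return (start, end, label) spans; end is exclusive. Longest match wins."""
    spans, i = [], 0
    while i < len(tokens):
        match = None
        for n in range(min(max_len, len(tokens) - i), 0, -1):   # try longest span first
            candidate = tuple(tokens[i:i + n])
            for label, entries in GAZETTEERS.items():
                if candidate in entries:
                    match = (n, label)
                    break
            if match:
                break
        if match:
            n, label = match
            spans.append((i, i + n, label))
            i += n
        else:
            i += 1
    return spans

print(list_lookup("Pat visited Sherwood Forest yesterday .".split()))
# [(0, 1, 'PER'), (2, 4, 'LOC')]
```

The weaknesses listed above show up immediately: name variants like lowercase "sherwood forest" are missed, and an ambiguous "Toronto" is always tagged LOC.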
Shallow Parsing Approach (internal structure)
Internal evidence: names often have internal structure. These components can be either stored or guessed, e.g. for locations (see the regex sketch below):
- Cap. Word + {City, Forest, Center, River}, e.g. Sherwood Forest
- Cap. Word + {Street, Boulevard, Avenue, Crescent, Road}, e.g. Portobello Street
Cunningham and Bontcheva (2003, RANLP Tutorial)
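A sketch of the internal-evidence idea written as a hand-crafted rule, using a regular expression over the keyword lists above; this is illustrative only, not a full shallow parser.

```python
# Rule: a capitalized word followed by a location keyword is tagged as a location.
import re

LOC_KEYWORDS = r"(?:City|Forest|Center|River|Street|Boulevard|Avenue|Crescent|Road)"
LOC_PATTERN = re.compile(r"\b([A-Z][a-z]+ " + LOC_KEYWORDS + r")\b")

text = "They walked down Portobello Street and then camped in Sherwood Forest."
print(LOC_PATTERN.findall(text))   # ['Portobello Street', 'Sherwood Forest']
```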
NER and Shallow Parsing: Machine Learning
Sequence models (HMMs, CRFs) are often effective; the labels are typically written in a BIO encoding (B = begins a span, I = inside a span, O = outside any span). A small decoding sketch follows below.

Pat and Chandler Smith agreed on a plan.

NER annotation:     Pat/B-PER  and/O     Chandler/B-PER  Smith/I-PER  agreed/O     on/O  a/O     plan/O     ./O
Chunking option 1:  Pat/B-NP   and/I-NP  Chandler/I-NP   Smith/I-NP   agreed/B-VP  on/O  a/B-NP  plan/I-NP  ./O
Chunking option 2:  Pat/B-NP   and/O     Chandler/B-NP   Smith/I-NP   agreed/B-VP  on/O  a/B-NP  plan/I-NP  ./O

One NER annotation and two possible “chunking” (shallow syntactic parsing) annotations of the same sentence.
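A small helper (a sketch, not from the slides; the function name is made up) showing how BIO-encoded output from a sequence model is decoded back into labeled spans.

```python
def bio_to_spans(tokens, tags):
    """Turn a BIO-tagged token sequence into (label, phrase) pairs."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):                 # sentinel flushes the last span
        inside = tag.startswith("I-") and start is not None and tag[2:] == label
        if not inside and start is not None:               # the current span ends here
            spans.append((label, " ".join(tokens[start:i])))
            start, label = None, None
        if tag.startswith("B-"):                           # a new span begins
            start, label = i, tag[2:]
    return spans

tokens = "Pat and Chandler Smith agreed on a plan .".split()
tags   = ["B-PER", "O", "B-PER", "I-PER", "O", "O", "O", "O", "O"]
print(bio_to_spans(tokens, tags))   # [('PER', 'Pat'), ('PER', 'Chandler Smith')]
```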
Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.
Model Attempt 1: Binary Classification
Mention-Pair Model
Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.
Model Attempt 1: Binary Classification
Training requires observed positive instances and negative instances:
- naïve approach (take all non-positive pairs): highly imbalanced!
- Soon et al. (2001): a heuristic for a more balanced selection; go left to right and, for a mention m, select the closest preceding coreferent mention as the positive instance (mentions in between become negative instances); otherwise, no antecedent is found for m (see the sketch below)
- possible problem: the pairwise decisions are not transitive
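A sketch of the Soon et al. (2001)-style instance selection described above; the toy mentions, gold cluster ids, and function name are invented for illustration.

```python
mentions   = ["Pat", "Chandler", "a plan", "He", "Pat"]
cluster_id = [0, 1, 2, 1, 0]        # gold entity id for each mention, in document order

def soon_style_instances(mentions, cluster_id):
    """For each mention m, the closest preceding coreferent mention forms a
    positive pair; mentions between that antecedent and m form negative pairs.
    If no preceding coreferent mention exists, m contributes no pairs."""
    positives, negatives = [], []
    for j in range(len(mentions)):
        antecedent = None
        for i in range(j - 1, -1, -1):          # scan right to left from m
            if cluster_id[i] == cluster_id[j]:
                antecedent = i
                break
        if antecedent is None:
            continue                             # no antecedent found for m
        positives.append((mentions[antecedent], mentions[j]))
        for k in range(antecedent + 1, j):
            negatives.append((mentions[k], mentions[j]))
    return positives, negatives

pos, neg = soon_style_instances(mentions, cluster_id)
print(pos)   # [('Chandler', 'He'), ('Pat', 'Pat')]
print(neg)   # [('a plan', 'He'), ('Chandler', 'Pat'), ('a plan', 'Pat'), ('He', 'Pat')]
```

Note how much more balanced these pairs are than taking every non-coreferent pair, which is the imbalance problem mentioned above.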
Model 2: Entity-Mention Model
entity 1 entity 2 entity 3 entity 4
Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.
Advantage: featurize based on all (or some or none) of the clustered mentions.
Disadvantage: clustering doesn’t address anaphora, i.e., does a mention have an antecedent? E.g., “Chris told Pat he aced the test.”
Model 3: Cluster-Ranking Model (Rahman and Ng, 2009)
entity 1 entity 2 entity 3 entity 4
Pat and Chandler agreed on a plan. He said Pat would try the same tactic again.
learn to rank the clusters and items in them
Stanford Coref (Lee et al., 2011)
Outline
- Entity Coreference: basic definition; three possible solutions: mention-pair, mention-clustering, coarse-to-fine
- Question Answering: basic definition; relation to NLP techniques and other fields
- Course Recap
IBM Watson
https://www.youtube.com/watch?v=C5Xnxjq63Zg https://youtu.be/WFR3lOm_xhE?t=34s
What Happened with Watson?
David Ferrucci, the manager of the Watson project at IBM Research, explained during a viewing of the show on Monday morning that several things probably confused Watson. First, the category names on Jeopardy! are tricky. The answers often do not exactly fit the category. Watson, in his training phase, learned that categories only weakly suggest the kind of answer that is expected, and, therefore, the machine downgrades their significance. The way the language was parsed provided an advantage for the humans and a disadvantage for Watson, as well. “What US city” wasn’t in the question. If it had been, Watson would have given US cities much more weight as it searched for the answer. Adding to the confusion for Watson, there are cities named Toronto in the United States and the Toronto in Canada has an American League baseball team. It probably picked up those facts from the written material it has digested. Also, the machine didn’t find much evidence to connect either city’s airport to World War II. (Chicago was a very close second on Watson’s list of possible answers.) So this is just one of those situations that’s a snap for a reasonably knowledgeable human but a true brain teaser for the machine.
https://www.huffingtonpost.com/2011/02/15/watson-final-jeopardy_n_823795.html
“How many children does the Queen have?” (Dec. 2017)
There are still errors
(but some questions are harder than others)
Question Answering Motivation
Question answering relates to: information extraction, machine translation, text summarization, information retrieval.
Two Types of QA
- Closed domain: often tied to a structured database
- Open domain: often tied to unstructured data
Basic System
Input Question → Question Analysis (Question Classification, Query Construction) → Document Retrieval (over a Corpus and/or KB) → Sentence Retrieval → Sentence NLP → Answer Extraction → Answer Validation → Answer

Neves --- https://hpi.de/fileadmin/user_upload/fachgebiete/plattner/teaching/NaturalLanguageProcessing/NLP09_QuestionAnswering.pdf

To learn more: NLP, Information Retrieval (IR), Information Extraction (IE)
Aspects of NLP
- POS tagging
- Stemming
- Shallow parsing (chunking)
- Predicate-argument representation (verb predicates and nominalizations)
- Entity annotation (stand-alone NERs with a variable number of classes)
- Date, time, and numeric value normalization
- Identification of semantic relations (complex nominals, genitives, adjectival phrases, and adjectival clauses)
- Event identification
- Semantic parsing
Question Classification
Why classify the question? The same “Albert Einstein was born …” pattern matches answers of different types: “Albert Einstein was born on 14 March 1879.” “Albert Einstein was born in Germany.” “Albert Einstein was born into a Jewish family.” Knowing the expected answer type (a date, a location, …) helps pick the right one.
Neves --- https://hpi.de/fileadmin/user_upload/fachgebiete/plattner/teaching/NaturalLanguageProcessing/NLP09_QuestionAnswering.pdf
Question Classification Taxonomy
LOC:other  Where do hyenas live?
NUM:date   When was Ozzy Osbourne born?
LOC:other  Where do the adventures of “The Swiss Family Robinson” take place?
LOC:other  Where is Procter & Gamble based in the U.S.?
HUM:ind    What barroom judge called himself The Law West of the Pecos?
HUM:gr     What Polynesian people inhabit New Zealand?
http://cogcomp.org/Data/QA/QC/train_1000.label
SLP3: Figure 28.4
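A hedged sketch of training a question classifier over labels like those above, using sklearn; the handful of inline questions stands in for the full train_1000.label data linked above, so the prediction is only a demo.

```python
# Bag-of-words (plus bigrams) logistic regression question classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "Where do hyenas live ?",
    "When was Ozzy Osbourne born ?",
    "Where is Procter & Gamble based in the U.S. ?",
    "What Polynesian people inhabit New Zealand ?",
    "When did World War II end ?",
    "Where do the adventures of The Swiss Family Robinson take place ?",
]
labels = ["LOC:other", "NUM:date", "LOC:other", "HUM:gr", "NUM:date", "LOC:other"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(questions, labels)

print(clf.predict(["When was Albert Einstein born ?"]))  # likely ['NUM:date'] on this tiny set
```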
Basic System
Question Analysis Question Classification Query Construction Answer
Input Question
Neves --- https://hpi.de/fileadmin/user_upload/fachgebiete/plattner/teaching/NaturalLanguageProcessing/NLP09_QuestionAnswering.pdf
Document Retrieval KB Sentence Retrieval Sentence NLP Answer Extraction Answer Validation
Corpus
To Learn More: NLP Information Retrieval (IR) Information Extraction (IE)
Document & Sentence Retrieval

Retrieval techniques: Vector Space Model, Probabilistic Model, Language Model. Software: Lucene, sklearn, nltk.

tf-idf(d, w): term frequency times inverse document frequency
- tf: frequency of word w in document d
- idf: inverse frequency of documents containing w

tf-idf(d, w) = (count(w ∈ d) / # tokens in d) * log(# documents / # documents containing w)

(A small computation sketch follows below.)
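A minimal sketch of the tf-idf formula above plus cosine similarity for ranking documents against a query; the three toy documents and the function names are invented for illustration.

```python
import math

docs = [
    "the queen has four children".split(),
    "albert einstein was born in germany".split(),
    "the airport was named for a world war ii hero".split(),
]

def tf_idf(word, doc, docs):
    tf = doc.count(word) / len(doc)            # count(w in d) / # tokens in d
    df = sum(1 for d in docs if word in d)     # # documents containing w
    return tf * math.log(len(docs) / df) if df else 0.0

def vectorize(tokens, docs):
    return {w: tf_idf(w, tokens, docs) for w in set(tokens)}

def cosine(u, v):
    dot = sum(u.get(w, 0.0) * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

query = vectorize("where was albert einstein born".split(), docs)
for d in docs:                                  # rank documents by similarity to the query
    print(round(cosine(query, vectorize(d, docs)), 3), " ".join(d))
```

The second document should score highest, since it shares the rare (high-idf) terms "albert", "einstein", and "born" with the query.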
Current NLP QA Tasks
TREC (Text Retrieval Conference)
http://trec.nist.gov/ Started in 1992
Freebase Question Answering
e.g., https://nlp.stanford.edu/software/sempre/ Yao et al. (2014)
WikiQA
https://www.microsoft.com/en-us/research/publication/wikiqa-a-challenge-dataset-for-open-domain-question-answering/
Levels of language: orthography, morphology, lexemes, syntax, semantics, pragmatics, discourse.
Other modalities: VISION (e.g., color), AUDIO (e.g., prosody, intonation).
Visual Question Answering
http://www.visualqa.org/
Outline
- Entity Coreference: basic definition; three possible solutions: mention-pair, mention-clustering, coarse-to-fine
- Question Answering: basic definition; relation to NLP techniques and other fields
- Course Recap
Course Goals
- Be introduced to some of the core problems and solutions of NLP (big picture)
- Learn different ways that success and progress can be measured in NLP
- Relate to statistics, machine learning, and linguistics
- Implement NLP programs
- Read and analyze research papers
- Practice your (written) communication skills
Course Recap
Basics of Probability
- Requirements to be a distribution (“proportional to”, ∝)
- Definitions of conditional probability, joint probability, and independence
- Bayes rule, (probability) chain rule

Basics of Language Modeling
- Goal: model (be able to predict) and give a score to language (whole sequences of characters or words)
- Simple count-based model
- Smoothing (and why we need it): Laplace (add-λ), interpolation, backoff
- Evaluation: perplexity

Tasks and Classification (use Bayes rule!)
- Posterior decoding vs. noisy channel model
- Evaluations: accuracy, precision, recall, and Fβ (F1) scores
- Naïve Bayes (given the label, generate/explain each feature independently) and connection to language modeling

Maximum Entropy Models
- Meanings of feature functions and weights
- Use for language modeling or conditional classification (“posterior in one go”)
- How to learn the weights: gradient descent

Distributed Representations & Neural Language Models
- What embeddings are and what their motivation is
- A common way to evaluate: cosine similarity
Word Modeling
Latent Models
- What is meant by “latent”
- Expectation Maximization
- Basic example: Unigram Mixture Model (3 coins)

Machine Translation Alignment
- Family of methods for learning word-to-word translations
- IBM Model 1
- Can be used beyond MT (e.g., semantics, paraphrasing)

Hidden Markov Model
- Basic definition: generative bigram model of latent tags
- 3 tasks: likelihood, most-likely sequence, parameter estimation
- 3 basic algorithms: Forward (Backward), Viterbi, Baum-Welch

Semi-Supervised Learning
- Labeled data (small amount) + unlabeled data (large amount)
- Apply EM to get fractional counts to re-estimate parameters
Latent Sequences
Syntactic Parsing
- Basic linguistic intuitions
- Capturing of some ambiguities and light semantics

Constituency Parsing
- Basic definition: generative tree
- Defining a constituency grammar
- CKY: a general algorithm

Semi-Supervised Learning
- Labeled data (small amount) + unlabeled data (large amount)
- Apply EM to get fractional counts to re-estimate parameters

Dependency Parsing
- Word-to-word relations
- Shift-reduce parsing
- Greedy vs. beam search

Semantics
- Post-processing syntactic trees into semantic graphs
- Roles, frames, and labeling
- Lexical & knowledge resources
- Entity coreference: graphs
- Question answering
Latent Structures
Natural Language Processing pytorch
Conditional vs. Sequence
CRF Tutorial, Fig 1.2, Sutton & McCallum (2012)
Gradient Ascent
Pick Your Toolkit
PyTorch, Deeplearning4j, TensorFlow, DyNet, Caffe, Keras, MxNet, Gluon, CNTK, …
Comparisons: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software https://deeplearning4j.org/compare-dl4j-tensorflow-pytorch https://github.com/zer0n/deepframeworks (older---2015)
http://www.qwantz.com/index.php?comic=170