Natural Language Processing Info 159/259   Lecture 24: Information Extraction (Nov. 15, 2018) David Bamman, UC Berkeley

investigating(SEC, Tesla)

fire(Trump, Sessions)

parent(Mr. Bennet, Jane) https://en.wikipedia.org/wiki/Pride_and_Prejudice

Information extraction • Named entity recognition • Entity linking • Relation extraction

Named entity recognition [tim cook] PER is the ceo of [apple] ORG • Identifying spans of text that correspond to typed entities

Named entity recognition ACE NER categories (+weapon)

Named entity recognition
• GENIA corpus of MEDLINE abstracts (biomedical)
• Entity types: protein, DNA, RNA, cell line, cell type
We have shown that [interleukin-1] PROTEIN ([IL-1] PROTEIN) and [IL-2] PROTEIN control [IL-2 receptor alpha (IL-2R alpha) gene] DNA transcription in [CD4- CD8- murine T lymphocyte precursors] CELL LINE
http://www.aclweb.org/anthology/W04-1213

BIO notation
tim/B-PERS cook/I-PERS is/O the/O ceo/O of/O apple/B-ORG
• Beginning of entity
• Inside entity
• Outside entity
[tim cook] PER is the ceo of [apple] ORG
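The BIO scheme can be decoded back into typed spans with a single pass over the tags. The following is a minimal sketch, not part of the original slides; the function name `bio_to_spans` and the handling of a stray I- tag are assumptions made for illustration.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end, type) spans.

    Indices are 0-based and inclusive.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        # Close the open span unless this tag continues it (I- with same type).
        if start is not None and not (tag.startswith("I-") and tag[2:] == etype):
            spans.append((start, i - 1, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Assumption: an I- tag with no open span starts a new span.
            start, etype = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags) - 1, etype))
    return spans


# "tim cook is the ceo of apple"
tags = ["B-PERS", "I-PERS", "O", "O", "O", "O", "B-ORG"]
spans = bio_to_spans(tags)
```

Because every entity starts with a B- tag, two adjacent entities of the same type remain separate spans rather than merging into one.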

Named entity recognition
After he saw Harry/B-PERS Tom/B-PERS went to the store

Fine-grained NER Giuliano and Gliozzo (2008)

Fine-grained NER

Entity recognition
Person | … named after [the daughter of a Mattel co-founder] …
Organization | [The Russian navy] said the submarine was equipped with 24 missiles
Location | Fresh snow across [the upper Midwest] on Monday, closing schools
GPE | The [Russian] navy said the submarine was equipped with 24 missiles
Facility | Fresh snow across the upper Midwest on Monday, closing [schools]
Vehicle | The Russian navy said [the submarine] was equipped with 24 missiles
Weapon | The Russian navy said the submarine was equipped with [24 missiles]
ACE entity categories https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-entities-guidelines-v6.6.pdf

Named entity recognition
• Most named entity recognition datasets have flat structure (i.e., non-hierarchical labels).
✔ [The University of California] ORG
✖ [The University of [California] GPE ] ORG
• Mostly fine for named entities, but more problematic for general entities: [[John] PER ’s mother] PER said …

Nested NER
named after the daughter of a Mattel co-founder
[Mattel]: B-ORG
[a Mattel co-founder]: B-PER I-PER I-PER
[the daughter of a Mattel co-founder]: B-PER I-PER I-PER I-PER I-PER I-PER

Sequence labeling
$x = \{x_1, \ldots, x_n\}$
$y = \{y_1, \ldots, y_n\}$
• For a set of inputs x with n sequential time steps, one corresponding label $y_i$ for each $x_i$
• Model correlations in the labels y.

Sequence labeling • Feature-based models (MEMM, CRF)

Gazetteers
• List of place names; more generally, list of names of some typed category
• GeoNames (GEO), US SSN (PER), Getty Thesaurus of Geographic Placenames, Getty Thesaurus of Art and Architecture
Example entries: Bun Cranncha, Dromore West, Dromore, Youghal Harbour, Youghal Bay, Youghal, Eochaill, Yellow River, Yellow Furze, Woodville, Wood View, Woodtown House, Woodstown, Woodstock House, Woodsgift House, Woodrooff House, Woodpark, Woodmount, Wood Lodge, Woodlawn Station, Woodlawn, Woodlands Station, Woodhouse, Wood Hill, Woodfort, Woodford River, Woodford, Woodfield House, Woodenbridge Junction Station, Woodenbridge, Woodbrook House, Woodbrook, Woodbine Hill, Wingfield House, Windy Harbour, Windy Gap
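One common way to use a gazetteer in a feature-based tagger is as a binary "token is inside a known name" feature. The sketch below assumes a longest-match-first lookup over a small in-memory set; the function name `gazetteer_flags`, the `max_len` cutoff, and the example sentence are illustrative choices, not something the slides specify.

```python
def gazetteer_flags(tokens, gazetteer, max_len=4):
    """Return one boolean per token: True if the token is covered by a
    gazetteer entry, preferring the longest entry at each position."""
    flags = [False] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in gazetteer:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                flags[j] = True
            i += matched
        else:
            i += 1
    return flags


# A few entries taken from the list on the slide.
gazetteer = {"Youghal", "Youghal Bay", "Windy Gap"}
tokens = "We sailed into Youghal Bay".split()
flags = gazetteer_flags(tokens, gazetteer)
```

The resulting flags would be one feature among many; a gazetteer hit is evidence for a place name, not a decision.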

Bidirectional RNN (figure: forward and backward RNN hidden states computed over the sentence “Jack drove down to LA”)

Jack/B-PER drove/O down/O to/O LA/B-GPE (figure: bidirectional RNN over “Jack drove down to LA” with one output tag per token)

BiLSTM for each word; concatenate final state of forward LSTM, backward LSTM, and word embedding as representation for a word. Lample et al. (2016), “Neural Architectures for Named Entity Recognition” (figure: character BiLSTM over “o b a m a”, combined with the word embedding for “Obama”, predicting B-PER)

Character CNN for each word; concatenate character CNN output and word embedding as representation for a word. Chiu and Nichols (2016), “Named Entity Recognition with Bidirectional LSTM-CNNs” (figure: character embeddings for “o b a m a” passed through convolution and max pooling, combined with the word embedding for “Obama”, predicting B-PER)

Huang et al. (2015), “Bidirectional LSTM-CRF Models for Sequence Tagging”

Ma and Hovy (2016), “End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF”

Evaluation • We evaluate NER with precision/recall/F1 over typed chunks.

Evaluation
position: 1 2 3 4 5 6 7
token: tim cook is the CEO of Apple
gold: B-PER I-PER O O O O B-ORG
system: B-PER O O O B-PER O B-ORG
Chunks as <start, end, type>
gold: <1,2,PER> <7,7,ORG>
system: <1,1,PER> <5,5,PER> <7,7,ORG>
Precision 1/3
Recall 1/2
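The chunk-level scores above can be reproduced by treating gold and system output as sets of <start, end, type> triples and counting exact matches. This is a minimal sketch; the function name `chunk_prf` is an assumption, and the F1 value is computed here rather than taken from the slide.

```python
def chunk_prf(gold, system):
    """Precision, recall and F1 over typed chunks; a system chunk counts
    as correct only if start, end and type all match a gold chunk."""
    gold, system = set(gold), set(system)
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Chunks from the slide, using its 1-based token positions.
gold = [(1, 2, "PER"), (7, 7, "ORG")]
system = [(1, 1, "PER"), (5, 5, "PER"), (7, 7, "ORG")]
precision, recall, f1 = chunk_prf(gold, system)
print(precision, recall, f1)
```

Note that the system's <1,1,PER> earns no partial credit for overlapping the gold <1,2,PER>: only <7,7,ORG> is counted as correct.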

Entity linking
Michael/B-PER Jordan/I-PER can dunk from the free throw line

Entity linking • Task: Given a database of candidate referents, identify the correct referent for a mention in context.

Learning to rank
• Entity linking is often cast as a learning to rank problem: given a mention x, some set of candidate entities $\mathcal{Z}(x)$ for that mention, and context c, select the highest scoring entity from that set.
$\hat{y} = \arg\max_{y \in \mathcal{Z}(x)} \Psi(y, x, c)$
• $\Psi(y, x, c)$: some scoring function over the mention x, candidate y, and context c
Eisenstein 2018

Learning to rank
• We learn the parameters of the scoring function by minimizing the ranking loss
$\ell(\hat{y}, y, x, c) = \max(0, \Psi(\hat{y}, x, c) - \Psi(y, x, c) + 1)$
Eisenstein 2018

Learning to rank
$\ell(\hat{y}, y, x, c) = \max(0, \Psi(\hat{y}, x, c) - \Psi(y, x, c) + 1)$
• We suffer some loss if the predicted entity has a higher score than the true entity.
• You can’t have a negative loss (if the true entity scores way higher than the predicted entity).
• The true entity needs to score at least some constant margin better than the prediction; beyond that the higher score doesn’t matter.
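The three properties of the ranking loss are easy to check numerically. The sketch below takes the two scores as plain numbers; the function name `ranking_loss` and the example scores are made up for illustration.

```python
def ranking_loss(score_predicted, score_true, margin=1.0):
    """Margin-based ranking loss: max(0, Psi(y_hat) - Psi(y) + margin)."""
    return max(0.0, score_predicted - score_true + margin)


# Predicted entity outscores the true entity: positive loss.
loss_wrong = ranking_loss(score_predicted=2.0, score_true=1.0)
# True entity wins, but by less than the margin: still some loss.
loss_close = ranking_loss(score_predicted=1.0, score_true=1.5)
# True entity wins by far more than the margin: loss is clipped at zero.
loss_safe = ranking_loss(score_predicted=1.0, score_true=5.0)
```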

Learning to rank
Some scoring function $\Psi(y, x, c)$ over the mention x, candidate y, and context c
feature = f(x, y, c):
• string similarity between x and y
• popularity of y
• NER type(x) = type(y)
• cosine similarity between c and Wikipedia page for y
$\Psi(y, x, c) = f(x, y, c)^\top \beta$
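With a linear scoring function, linking a mention reduces to a dot product per candidate followed by an argmax. The sketch below is illustrative only: the candidate names, feature values, and weights are invented, and the feature order follows the list above.

```python
def dot(f, beta):
    """Inner product f^T beta for two equal-length lists."""
    return sum(fi * bi for fi, bi in zip(f, beta))


def link(candidate_features, beta):
    """Return the candidate whose feature vector scores highest under beta."""
    return max(candidate_features, key=lambda y: dot(candidate_features[y], beta))


# Hypothetical features f(x, y, c) for the mention "Michael Jordan" in a
# basketball context: [string similarity, popularity, type match, context cosine].
candidate_features = {
    "Michael Jordan (basketball player)": [1.0, 0.9, 1.0, 0.8],
    "Michael I. Jordan (computer scientist)": [0.9, 0.3, 1.0, 0.1],
}
beta = [1.0, 0.5, 1.0, 2.0]  # made-up weights; in practice these are learned
best = link(candidate_features, beta)
```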

Neural learning to rank
$\Psi(y, x, c) = v_y^\top \Theta^{(x,y)} x + v_y^\top \Theta^{(y,c)} c$
• $v_y$: embedding for candidate; $x$: embedding for mention; $c$: embedding for context
• $\Theta^{(x,y)}$: parameters measuring the compatibility of the candidate and mention
• $\Theta^{(y,c)}$: parameters measuring the compatibility of the candidate and context
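The neural score is a sum of two bilinear forms, one pairing the candidate embedding with the mention and one pairing it with the context. A small pure-Python sketch, assuming two-dimensional embeddings and hand-picked matrices purely for illustration (real embeddings and parameters would be learned):

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def neural_score(v_y, x, c, theta_xy, theta_yc):
    """Psi(y, x, c) = v_y^T Theta_xy x + v_y^T Theta_yc c."""
    return dot(v_y, matvec(theta_xy, x)) + dot(v_y, matvec(theta_yc, c))


v_y = [1.0, 0.0]                      # candidate embedding (made up)
x = [0.5, 2.0]                        # mention embedding (made up)
c = [1.0, 1.0]                        # context embedding (made up)
theta_xy = [[1.0, 0.0], [0.0, 1.0]]   # candidate-mention compatibility
theta_yc = [[0.0, 1.0], [1.0, 0.0]]   # candidate-context compatibility
score = neural_score(v_y, x, c, theta_xy, theta_yc)
```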

Learning to rank
• We learn the parameters of the scoring function by minimizing the ranking loss; take the derivative of the loss and backprop using SGD.
$\ell(\hat{y}, y, x, c) = \max(0, \Psi(\hat{y}, x, c) - \Psi(y, x, c) + 1)$
Eisenstein 2018
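For intuition, here is what one SGD step looks like in the simplest case, where the scoring function is linear in a weight vector, so the (sub)gradient of the ranking loss can be written by hand instead of obtained through backprop. The function name `sgd_step`, the learning rate, and the toy feature vectors are assumptions for illustration.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sgd_step(beta, f_true, f_pred, lr=0.1, margin=1.0):
    """One SGD step on max(0, f_pred.beta - f_true.beta + margin).

    When the loss is positive its gradient with respect to beta is
    (f_pred - f_true); when the loss is zero the weights are unchanged.
    """
    loss = max(0.0, dot(f_pred, beta) - dot(f_true, beta) + margin)
    if loss > 0:
        beta = [b - lr * (fp - ft) for b, fp, ft in zip(beta, f_pred, f_true)]
    return beta, loss


beta = [0.0, 0.0]
f_true, f_pred = [1.0, 0.0], [0.0, 1.0]   # toy feature vectors
beta, first_loss = sgd_step(beta, f_true, f_pred)
_, second_loss = sgd_step(beta, f_true, f_pred)
```

Each step moves the weights toward the features of the true entity and away from those of the wrongly predicted one, so the loss on this example shrinks from one step to the next.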

Relation extraction
subject | predicate | object
The Big Sleep | directed_by | Howard Hawks
The Big Sleep | stars | Humphrey Bogart
The Big Sleep | stars | Lauren Bacall
The Big Sleep | screenplay_by | William Faulkner
The Big Sleep | screenplay_by | Leigh Brackett
The Big Sleep | screenplay_by | Jules Furthman

Relation extraction ACE relations, SLP3

Relation extraction Unified Medical Language System (UMLS), SLP3

Wikipedia Infoboxes

Regular expressions
• Regular expressions are a precise way of extracting high-precision relations
• “NP 1 is a film directed by NP 2 ” → directed_by(NP 1 , NP 2 )
• “NP 1 was the director of NP 2 ” → directed_by(NP 2 , NP 1 )
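The two patterns can be written with Python's `re` module. The sketch below approximates an NP as a run of capitalized words, which is a strong simplifying assumption (a real system would use a chunker or parser); the names `NP`, `PATTERNS`, and `extract_directed_by` are illustrative.

```python
import re

# Assumption: an NP is a sequence of capitalized words, e.g. "The Big Sleep".
NP = r"((?:[A-Z][\w'-]*)(?:\s+[A-Z][\w'-]*)*)"

# Each pattern comes with a function that orders its groups as (film, director).
PATTERNS = [
    (re.compile(NP + r" is a film directed by " + NP),
     lambda m: (m.group(1), m.group(2))),
    (re.compile(NP + r" was the director of " + NP),
     lambda m: (m.group(2), m.group(1))),
]


def extract_directed_by(text):
    """Return directed_by(film, director) triples found by either pattern."""
    triples = []
    for pattern, order in PATTERNS:
        for match in pattern.finditer(text):
            film, director = order(match)
            triples.append(("directed_by", film, director))
    return triples


text = ("The Big Sleep is a film directed by Howard Hawks. "
        "Howard Hawks was the director of The Big Sleep.")
triples = extract_directed_by(text)
```

Both sentences yield the same triple, which shows why the argument order has to be defined per pattern.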

Hearst patterns
pattern | sentence
NP {, NP}* {,} (and|or) other NP_H | temples, treasuries, and other important civic buildings
NP_H such as {NP ,}* {(or|and)} NP | red algae such as Gelidium
such NP_H as {NP ,}* {(or|and)} NP | such authors as Herrick, Goldsmith, and Shakespeare
NP_H {,} including {NP ,}* {(or|and)} NP | common-law countries, including Canada and England
NP_H {,} especially {NP}* {(or|and)} NP | European countries, especially France, England, and Spain
Hearst 1992; SLP3
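As a rough illustration, the "NP_H such as ..." pattern can be approximated with a regular expression that reads the hypernym from the words just before "such as" and splits the list that follows. This is a crude sketch: taking at most two words as the hypernym NP is an assumption standing in for real NP chunking, and the function name `hearst_such_as` is hypothetical.

```python
import re


def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs for 'NP_H such as NP, NP, and NP'."""
    # Assumption: the hypernym NP is the one or two words before "such as".
    match = re.search(r"(\w+(?: \w+)?) such as (.+?)(?:[.;]|$)", sentence)
    if not match:
        return []
    hypernym = match.group(1)
    # Split the hyponym list on commas and on "and" / "or".
    items = re.split(r",\s*(?:and |or )?|\s+(?:and|or)\s+", match.group(2))
    return [(item.strip(), hypernym) for item in items if item.strip()]


pairs = hearst_such_as("red algae such as Gelidium")
```

Each returned pair reads as "hyponym is a kind of hypernym", for example Gelidium is a kind of red algae.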
