CSCI 699: Machine Learning for Knowledge Extraction and Reasoning - - PowerPoint PPT Presentation



SLIDE 1

CSCI 699: Machine Learning for Knowledge Extraction and Reasoning

Instructor: Xiang Ren www-bcf.usc.edu/~xiangren/ml4know19spring USC Computer Science

SLIDE 2

About the Instructor

  • Asst. Professor of Computer Science, affiliated faculty at ISI; PhD @ UIUC
  • Research interest: the intersection of machine learning & NLP (making sense of massive corpora)
  • Problems: information extraction (sequence labeling, structured prediction), knowledge base construction, knowledge reasoning, graph representation learning
  • Methods: weak & indirect supervision for modeling sequence- & graph-structured data

SLIDE 3

Self-introduction!

  • Name
  • Background
  • Your research!
  • What brings you to the class? :)

SLIDE 4

Course Format

  • Lectures on basic concepts and prior & current methods
  • Presentations on recent research papers
  • Projects on novel information extraction methods/applications
  • Assignments on model implementation

SLIDE 5

Schedule

  • First 5 weeks – 3 hrs of lectures only
  • Next 8 weeks – 1.5 hrs of lectures + 3 paper presentations (30 min each, including Q&A)
    • Including 3 guest lectures
  • Last week – project presentations (20 min each, including Q&A)

SLIDE 6

10-min Breaks

  • 3:00–3:10pm
  • 4:10–4:20pm

SLIDE 7

Homework Assignments (2x)

  • Practice implementation skills for information extraction techniques
    • Named entity recognition
    • Relation extraction
  • Python is preferred (C++ also accepted)
  • Report & GitHub repo with code/README
  • 10% of grade each

SLIDE 8

Paper Presentation

  • Each student picks one paper from the reading list (we will follow up with a Google spreadsheet)
  • You can also suggest papers you want to present!
  • Prepare a 30-min whiteboard or slides presentation, including 3–5 min of Q&A

SLIDE 9

Project

Come up with a novel model/application of information extraction (in your own domain), conduct experiments to validate your idea.

  • Project proposal (~500 words, PDF), due Week 3
  • Check-point 1 (15%): survey paper (2-page, double-column), due Week 4
  • Check-point 2 (15%): mid-term report (3-page, double-column), due Week 10
  • Project presentation: 20-min presentation including Q&A, Week 15
  • Check-point 3 (35%): final report (8-page, double-column, including GitHub repo), due Dec 12

SLIDE 10

Evaluation

  • Homework assignments (20%): 2x 10%
  • Project (80%):
    • Project survey (15%)
    • Project mid-term report (15%)
    • Project presentation (15%)
    • Project final report (35%)

SLIDE 11

Logistics

  • Instructor: Xiang Ren
    • Email: xiangren@usc.edu
    • Office: SAL 308
    • Office hour: by appointment
  • TA: TBD
  • Course website
    • www-bcf.usc.edu/~xiangren/ml4know19spring
  • Homework submission (PDF and hyperlinks):
    • Blackboard (or by email if there’s an issue uploading)

SLIDE 12

Today’s Lecture

  • Overview of Knowledge Extraction & Reasoning: tasks, methods, and applications
  • Overview of my research on Effort-Light Knowledge Extraction

SLIDE 13

Knowledge Extraction and Reasoning: An Overview

CSCI 699: Machine Learning for Knowledge Extraction & Reasoning

Instructor: Xiang Ren USC Computer Science

SLIDE 14

The Era of Big Data

SLIDE 15

Growth of Unstructured Text Data

Unstructured data, primarily text, accounts for more than 80% of the data collected by organizations.

Unstructured Data and the 80 Percent Rule, Seth Grimes, Clarabridge Bridgepoints, 2008 Q3.

SLIDE 16

Knowledge in “Big Text Data”

  • News, social media posts, web pages, … → get an overview of recent news events
  • Financial reports, medical records, legal acts, … → obtain insights from data for decision support
  • Customer reviews, tech support memos, field service notes, … → summarize user feedback for quality control

SLIDE 17

Text is accessible, but knowledge in text is not machine-readable …

SLIDE 18

Structures Bring Analytic Power

Structures (databases, networks) → database management, exploration & analysis → insights & knowledge

SLIDE 19

Turning Unstructured Text Data into Structures

Unstructured text data (accounting for ~80% of all data in organizations) → ? → structures → knowledge & insights

(Chakraborty, 2016)

SLIDE 20

Information Extraction

  • Can computational systems extract structured, factual information from unstructured or semi-structured data, and represent it in a machine-readable form?

SLIDE 21

Entity Structures

Can computational systems identify real-world entities of different categories from given corpora?

text corpus: “Criticism of government response to the hurricane …”

Location: New Orleans, Louisiana, Washington DC, …
Person: Ray Nagin (Mayor), President Bush, …
Organization: United States, Red Cross, US government, …

SLIDE 22

Relation Structures

Can computational systems capture different relations between the entities from given corpora?

text corpus: “American Airlines, a unit of AMR Corp., immediately matched the move, spokesman Tim Wagner said. United Airlines, a unit of UAL Corp., said the increase took effect Thursday night.”

Entity 1 | Relation | Entity 2
American Airlines | is_subsidiary_of | AMR
Tim Wagner | is_employee_of | American Airlines
United Airlines | is_subsidiary_of | UAL
… | … | …
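The mapping from sentence to relation triples can be sketched with a hand-written surface pattern. This toy matcher is an illustrative assumption, not a system from the course; it recovers the is_subsidiary_of rows from the sentence above:

```python
import re

TEXT = ("American Airlines, a unit of AMR Corp., immediately matched the move, "
        "spokesman Tim Wagner said. United Airlines, a unit of UAL Corp., "
        "said the increase took effect Thursday night.")

# One hypothetical surface pattern: "<E1>, a unit of <E2>" -> is_subsidiary_of.
# Real extractors learn such patterns or use parsers rather than hand-writing them.
SUBSIDIARY = re.compile(
    r"([A-Z]\w+(?: [A-Z]\w+)*), a unit of ([A-Z]\w+(?: [A-Z][\w.]+)*)")

def extract(text):
    """Return (entity1, relation, entity2) triples found by the pattern."""
    return [(m.group(1), "is_subsidiary_of", m.group(2))
            for m in SUBSIDIARY.finditer(text)]
```

A single regex only covers one phrasing; the point is that relation structures are (entity, relation, entity) triples grounded in text spans.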

SLIDE 23

Event Structures

Can computational systems identify real-world events of different types from given corpora?

text corpus: “Criticism of government response to the hurricane …”

Terrorism template: TYPE = ROBBERY; DATE = 07 JAN 90; LOCATION = CHILE: MOLINA (CITY)

SLIDE 24

What is “Information Extraction”?

As a task: filling slots in a database from sub-segments of text.

October 14, 2002, 4:00 a.m. PT. For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…

Slots to fill: NAME, TITLE, ORGANIZATION

SLIDE 25

What is “Information Extraction”?

As a task: filling slots in a database from sub-segments of text. Running IE on the same passage fills the table:

NAME | TITLE | ORGANIZATION
Bill Gates | CEO | Microsoft
Bill Veghte | VP | Microsoft
Richard Stallman | founder | Free Soft..

SLIDE 26

What is “Information Extraction”?

Information Extraction = segmentation + classification + clustering + association

As a family of techniques, applied to the same passage, segmentation (aka “named entity extraction”) yields: Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation.

SLIDE 29

What is “Information Extraction”?

Information Extraction = segmentation + classification + association + clustering

Applied to the same passage, association then groups the extracted segments into records:

NAME | TITLE | ORGANIZATION
Bill Gates | CEO | Microsoft
Bill Veghte | VP | Microsoft
Richard Stallman | founder | Free Soft..

SLIDE 30

StructNet: Structured Network of Factual Knowledge

Example: a network built from a news article, with nodes such as agency, person, location, organization, and URL.

  • Nodes: entities of different entity types
  • Edges: relationships of different relation types
SLIDE 31

A Product Use Case: Finding “Interesting Collections of Hotels” (Technology Transfer to TripAdvisor)

Grouping hotels based on structured facts extracted from the review text.

Features for the “Catch a Show” collection: 1. broadway shows 2. beacon theater 3. broadway dance center 4. broadway plays 5. david letterman show 6. radio city music hall 7. theatre shows

Features for the “Near The High Line” collection: 1. high line park 2. chelsea market 3. highline walkway 4. elevated park 5. meatpacking district 6. west side 7. old railway

http://engineering.tripadvisor.com/using-nlp-to-find-interesting-collections-of-hotels/

SLIDE 32

Better Question Answering with Reasoning Capability

SLIDE 33

A Life Science Use Case: Identifying “Distinctively Related Entities”


Collaborate with UCLA Heart BD2K Center & Mayo Clinic

What proteins are distinctively associated with Cardiomyopathy?

http://www.igb.illinois.edu/news/harnessing-power-big-data-revolution-genomic-data-analysis

SLIDE 34

Citation prediction for scientific papers

Input: paper titles, abstracts & bibliographic data. Node types: paper, technique, dataset, application, author, venue. Systems: FacetGist [CIKM’16]; ClusCite: Citation Recommendation by Information Network-Based Clustering [KDD’14]. Given a new manuscript, the system suggests papers to cite with confidence scores (e.g., 0.8, 0.65, 0.52).

SLIDE 35

Corpus-specific StructNet Construction

Example: a network built from a news article, with nodes such as agency, person, location, organization, and URL.

How can we automate the construction of StructNets from given text corpora?

SLIDE 36

IE History

Pre-Web

  • Mostly news articles
  • De Jong’s FRUMP [1982]
  • Hand-built system to fill Schank-style “scripts” from news wire
  • Message Understanding Conference (MUC) DARPA [’87-’95], TIPSTER [’92-’96]
  • Early work dominated by hand-built models
  • E.g. SRI’s FASTUS, hand-built FSMs.
  • But by the 1990’s, some machine learning: Lehnert, Cardie, Grishman, and then HMMs: Elkan [Leek ’97], BBN [Bikel et al. ’98]

Web

  • AAAI ’94 Spring Symposium on “Software Agents”
  • Much discussion of ML applied to Web. Maes, Mitchell, Etzioni.
  • Tom Mitchell’s WebKB, ‘96
  • Build KB’s from the Web.
  • Wrapper Induction
  • Initially hand-built, then ML: [Soderland ’96], [Kushmerick ’97], …
  • Citeseer; Cora; FlipDog; contEd courses, corpInfo, …
SLIDE 37

IE History

Biology

  • Gene/protein entity extraction
  • Protein/protein interaction facts
  • Automated curation/integration of databases
  • At CMU: SLIF (Murphy et al., subcellular information from images + text in journal articles)

Email

  • EPCA, PAL, RADAR, CALO: intelligent office assistant that “understands” some part of email
  • At CMU: web site update requests, office-space requests, calendar scheduling requests, social network analysis of email

SLIDE 38

IE is different in different domains!

Example: on the Web there is less grammar, but more formatting & linking. The directory structure, link structure, formatting & layout of the Web is its own new grammar.

Newswire: Apple to Open Its First Retail Store in New York City. MACWORLD EXPO, NEW YORK--July 17, 2002--Apple's first retail store in New York City will open in Manhattan's SoHo district on Thursday, July 18 at 8:00 a.m. EDT. The SoHo store will be Apple's largest retail store to date and is a stunning example of Apple's commitment to offering customers the world's best computer shopping experience. "Fourteen months after opening our first retail store, our 31 stores are attracting over 100,000 visitors each week," said Steve Jobs, Apple's CEO. "We hope our SoHo store will surprise and delight both Mac and PC users who want to see everything the Mac can do to enhance their digital lifestyles."

Web: www.apple.com/retail, www.apple.com/retail/soho, www.apple.com/retail/soho/theatre.html

SLIDE 39

Landscape of IE Tasks (1/4): Degree of Formatting

A spectrum: text paragraphs without formatting; grammatical sentences and some formatting & links; non-grammatical snippets, rich formatting & links; tables.

Plain-text example: “Astro Teller is the CEO and co-founder of BodyMedia. Astro holds a Ph.D. in Artificial Intelligence from Carnegie Mellon University, where he was inducted as a national Hertz fellow. His M.S. in symbolic and heuristic computation and B.S. in computer science are from Stanford University. His work in science, literature and business has appeared in international media from the New York Times to CNN to NPR.”

SLIDE 40

Landscape of IE Tasks (2/4): Intended Breadth of Coverage

A spectrum: web site specific (formatting; e.g., Amazon.com book pages) → genre specific (layout; e.g., resumes) → wide, non-specific (language; e.g., university names)

SLIDE 41

Landscape of IE Tasks (3/4): Complexity of word patterns

  • Closed set (U.S. states): “He was born in Alabama…”, “The big Wyoming sky…”
  • Regular set (U.S. phone numbers): “Phone: (413) 545-1323”, “The CALD main office can be reached at 412-268-1299”
  • Complex pattern (U.S. postal addresses): “University of Arkansas, P.O. Box 140, Hope, AR 71802”, “Headquarters: 1128 Main Street, 4th Floor, Cincinnati, Ohio 45210”
  • Ambiguous patterns, needing context and many sources of evidence (person names): “…was among the six houses sold by Hope Feldman that year.”, “Pawel Opalinski, Software Engineer at WhizBang Labs.”

SLIDE 42

IE: The Broader View

Pipeline: spider a document collection → filter by relevance → IE (segment, classify, associate, cluster) → load database → query, search, data mine. Supporting steps: create ontology, label training data, train extraction models.

SLIDE 43

Knowledge Graphs are Not Complete

(Figure: a knowledge graph around “Band of Brothers”, with nodes such as HBO, Mini-Series, Graham Yost, Michael Kamen, United States, Neal McDonough, English, Tom Hanks, Actor, Caesars Entertain…, and relations such as tvProgramCreator, tvProgramGenre, writtenBy, music, countryOfOrigin, nationality-1, awardWorkWinner, castActor, profession, personLanguages, countrySpokenIn-1, serviceLocation-1, serviceLanguage.)

SLIDE 44

Benefits of Knowledge Graph

  • Support various applications
  • Structured Search
  • Question Answering
  • Dialogue Systems
  • Relation Extraction
  • Summarization
  • Knowledge Graphs can be constructed via information extraction from text, but there will be a lot of missing links.
  • Goal: complete the knowledge graph.

SLIDE 45

Reasoning on Knowledge Graph

Query node: Band of Brothers. Query relation: tvProgramLanguage. Query: tvProgramLanguage(Band of Brothers, ?)

(Figure: the same “Band of Brothers” knowledge graph as above, with the queried tvProgramLanguage edge missing.)

SLIDE 46

KB Reasoning Tasks

  • Predicting the missing link: given e1 and e2, predict the relation r.
  • Predicting the missing entity: given e1 and relation r, predict the missing entity e2.
  • Fact prediction: given a triple, predict whether it is true or false.

SLIDE 47

Knowledge Base Reasoning

  • Question: can we infer missing links based on background KB?
  • Path-based methods
    • Path-Ranking Algorithm (PRA), Lao et al., 2011
    • RNN + PRA, Neelakantan et al., 2015
    • Chains of Reasoning, Das et al., 2017
  • Embedding-based methods
    • RESCAL, Nickel et al., 2011
    • TransE, Bordes et al., 2013
    • TransR/CTransR, Lin et al., 2015
  • Integrating path- and embedding-based methods
    • DeepPath, Xiong et al., 2017
    • MINERVA, Das et al., 2018
    • DIVA, Chen et al., 2018
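As one concrete instance, the TransE method listed above scores a triple (h, r, t) by how well h + r ≈ t holds in embedding space. The sketch below only illustrates the scoring function, not training; the embeddings are toy assumptions, with the relation vector constructed so the true fact scores (near) zero:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style score: negative L1 distance between h + r and t."""
    return -np.linalg.norm(h + r - t, ord=1)

rng = np.random.default_rng(0)
dim = 16
# Toy entity embeddings (random, purely illustrative).
ent = {name: rng.normal(size=dim)
       for name in ["BandOfBrothers", "English", "French"]}
# Toy relation vector chosen so tvProgramLanguage(BandOfBrothers, English) holds.
rel = {"tvProgramLanguage": ent["English"] - ent["BandOfBrothers"]}

true_score = transe_score(ent["BandOfBrothers"],
                          rel["tvProgramLanguage"], ent["English"])
false_score = transe_score(ent["BandOfBrothers"],
                           rel["tvProgramLanguage"], ent["French"])
```

Ranking candidate tails by this score is exactly the "predict the missing entity" task from the previous slide.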

SLIDE 48

Traditional Rule-Based Systems

Domain experts handcraft extraction rules over a text corpus, e.g. the rule “… cities such as NPList …”: applied to “The tour includes major cities such as [New York], [Los Angeles], and [Dallas]”, each NPList[i] is extracted as a City entity.
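A minimal sketch of such a hand-crafted rule, assuming a simple regex matcher (the rule template and sentence come from the slide; the function and variable names are made up):

```python
import re

SENTENCE = ("The tour includes major cities such as New York, "
            "Los Angeles, and Dallas.")

# Hand-crafted rule: "cities such as NP, NP, and NP" -> each NP is a City.
RULE = re.compile(
    r"cities such as ([A-Z][\w ]+(?:, [A-Z][\w ]+)*,? and [A-Z][\w ]+)")

def extract_cities(sentence):
    """Return the noun phrases captured by the 'cities such as' rule."""
    m = RULE.search(sentence)
    if not m:
        return []
    # Split the captured NP list on ", " and ", and " separators.
    return [np.strip() for np in re.split(r",? and |, ", m.group(1))]
```

Each extracted phrase would then be typed as City, exactly as in the slide's example.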

SLIDE 49

Supervised Machine Learning-Based Systems (state of the art)

Example: “[San Francisco], in northern California, is a hilly city on the tip of a peninsula.”

Domain experts provide manually-annotated training data and engineered features to train a machine-learning model.

SLIDE 50

Effort-Light Knowledge Extraction

CSCI 699: Introduction to Information Extraction

Instructor: Xiang Ren USC Computer Science

SLIDE 51

Text data are often highly variable… (grammar, vocabularies, gazetteers)

  • Domain: CS papers ↔ biomedical papers
  • Genre: news articles ↔ tweets
  • Language: English ↔ Arabic

SLIDE 52

However, text data are often highly variable…

Domain experts must redo manual data annotation & complex feature generation for each new corpus (English news, Arabic web forum posts, life science literature, …):

  • Low efficiency
  • Subjective
  • Costly
  • Limited scale
SLIDE 53

Prior Art in NLP: Extracting Structures with Repeated Human Effort

Pipeline: text corpus → human labeling → labeled data → extraction rules / machine-learning models → structured facts.

Example review: “This hotel is my favorite Hilton property in NYC! It is located right on 42nd street near Times Square, it is close to all subways, Broadway shows, and next to many great …” → structured facts: Times Square hotel, Broadway shows, NYC, Hilton property.

Another example: “The June 2013 Egyptian protests were mass protest events that occurred in Egypt on 30 June 2013, …”

Systems built this way: Stanford CoreNLP, CMU NELL, UW KnowItAll, USC AMR, IBM Alchemy APIs, Google Knowledge Graph, Microsoft Satori, …

SLIDE 54

Our Research: Effort-Light StructMine

Pipeline: text corpus (news articles, PubMed papers, …) + knowledge bases (KB) → corpus-specific models → entity & relation structures.

  • Enables quick development of applications over various corpora
  • Extracts complex structures without introducing human errors

SLIDE 55

External Knowledge Bases as “Distant Supervision”

Text corpus ↔ knowledge bases (KBs): overlapping factual information (entity names, entity types, relationships, etc.). 1% of 10M sentences → 100K labeled sentences!
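As a sketch, distant supervision can be as simple as projecting KB facts onto the corpus by string match (the KB dictionary and the matching strategy below are toy assumptions; real pipelines also link aliases and resolve ambiguous names):

```python
# Toy KB of (entity name -> entity type) facts, purely illustrative.
KB = {"San Francisco": "LOCATION", "Barack Obama": "PERSON"}

def distant_label(sentence, kb):
    """Return (mention, type, char_offset) labels found by KB string match."""
    labels = []
    for name, etype in kb.items():
        start = sentence.find(name)
        if start != -1:
            labels.append((name, etype, start))
    return labels

sent = "Barack Obama arrived in San Francisco this afternoon."
labels = distant_label(sent, KB)
```

Scaled to a large corpus, this is how a small overlap with the KB yields a large partially-labeled training set with no manual annotation.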


SLIDE 57

Co-occurrence patterns between text units bring semantic power

Example: in the training corpus, “… a speech was delivered by United States President Barack Obama.” and “President Vladimir Putin delivers a speech during …” place United States, speech, president, Barack Obama, and politician near each other in a low-dimensional semantic space, supporting prediction on unseen entities.

SLIDE 58

A Cold-Start Factual Structure Mining (StructMine) Framework

Pipeline: text corpus → data-driven text segmentation → candidate factual structures & text units/features → distant supervision → partially-labeled training corpus → learn semantic spaces → extract factual structures from the remaining unlabeled corpus.

SLIDE 59

Effort–Light StructMine: Where Are We?

Methods by human labeling effort and feature engineering effort: hand-crafted methods, supervised learning methods, weakly-supervised learning methods, distantly-supervised learning methods.

Representative systems: UCB Hearst Pattern, 1992; NYU Proteus, 1997; Stanford CoreNLP, 2005–present; UT Austin Dependency Kernel, 2005; IBM Watson Language APIs; UW KnowItAll / Open IE, 2005–present; Max-Planck YAGO, 2008–present; CMU NELL, 2009–present; U Washington FIGER / MultiR, 2012; Stanford Snorkel / MIML-RE, 2012–present.

Effort-Light StructMine (KDD’15, ’16, ’17; WWW’15, ’17, ’18; EMNLP’16, ’17; …)

SLIDE 60

The Roadmap for Corpus-Specific StructNet Construction

Text corpus → Entity Recognition and Typing (KDD’15) → Fine-Grained Entity Typing (EMNLP’16) → Joint Entity and Relation Extraction (WWW’17) → corpus-specific StructNet.

SLIDE 61

Outline

  • Introduction
  • Challenges & Approach
  • Entity Recognition and Typing (KDD’15)
  • Fine-grained Entity Typing
  • Joint Entity and Relation Extraction
  • Future Work
  • Summary

SLIDE 62


What is Entity Recognition and Typing

  • Identify token spans of entity mentions in text, and classify them into types of interest

“[Barack Obama] arrived this afternoon in [Washington, D.C.]. [President Obama]’s wife [Michelle] accompanied him.” “[TNF alpha] is produced chiefly by activated [macrophages].”

SLIDE 63


What is Entity Recognition and Typing

  • Identify token spans of entity mentions in text, and classify them into types of interest

“[TNF alpha]PROTEIN is produced chiefly by activated [macrophages]CELL.” “[Barack Obama]PERSON arrived this afternoon in [Washington, D.C.]LOCATION. [President Obama]PERSON’s wife [Michelle]PERSON accompanied him.”
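In sequence-labeling formulations of this task, typed spans like those above become per-token BIO tags. A minimal sketch (the BIO scheme is standard; this helper and its whitespace tokenization are illustrative assumptions):

```python
def bio_encode(tokens, spans):
    """spans: list of (start, end_exclusive, TYPE) over token indices."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = "B-" + etype          # first token of the mention
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # continuation tokens
    return tags

tokens = ["Barack", "Obama", "arrived", "in", "Washington", ",", "D.C."]
tags = bio_encode(tokens, [(0, 2, "PERSON"), (4, 7, "LOCATION")])
# -> ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-LOCATION', 'I-LOCATION', 'I-LOCATION']
```

A sequence model (HMM, CRF, neural tagger) is then trained to predict these tags token by token.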

SLIDE 64

Traditional Named Entity Recognition (NER) Systems

  • Reliance on large amounts of manually-annotated data
  • Slow model training: often slower than O(#words × #features × #classes)

(Figures: a manual annotation interface; a NER system pipeline)

Finkel et al., Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, ACL 2005

SLIDE 65

Weak-Supervision Systems: Pattern-Based Bootstrapping

  • Requires manual seed selection & mid-point checking

Bootstrapping loop: annotate corpus using seed entities → generate candidate patterns → score candidate patterns → select top patterns → apply patterns to find new entities.

Example seeds for Food: Pizza, French Fries, Hot Dog, Pancake, … Learned patterns for Food: “the best <X> I’ve tried”, “in their <X>”, “<X> tastes amazing”, …

e.g., (Etzioni et al., 2005), (Talukdar et al., 2010), (Gupta et al., 2014), (Mitchell et al., 2015), … Systems: CMU NELL, UW KnowItAll, Stanford DeepDive, Max-Planck PROSPERA, …
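One round of the loop above can be sketched as follows (the corpus, seeds, and whole-sentence pattern templates are toy assumptions; real systems score and filter candidate patterns before applying them):

```python
import re

corpus = [
    "the best pizza I've tried in town",
    "the best ramen I've tried in town",
    "their pancake tastes amazing",
    "their sushi tastes amazing",
]
seeds = {"pizza", "pancake"}

def bootstrap_round(corpus, seeds):
    # 1) Turn sentences containing a seed into candidate patterns.
    patterns = set()
    for sent in corpus:
        for seed in seeds:
            if seed in sent:
                patterns.add(sent.replace(seed, "<X>"))
    # 2) Apply the patterns back to the corpus to harvest new entities.
    new_entities = set()
    for pat in patterns:
        regex = re.escape(pat).replace(re.escape("<X>"), r"(\w+)")
        for sent in corpus:
            m = re.fullmatch(regex, sent)
            if m:
                new_entities.add(m.group(1))
    return new_entities - seeds
```

Each round's harvested entities become seeds for the next round, which is why unchecked bootstrapping drifts and needs the mid-point checking mentioned above.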

SLIDE 66

Leveraging Distant Supervision

1. Detect entity names from text. 2. Match name strings to KB entities. 3. Propagate types to the un-matchable names.

ID | Sentence
S1 | “Phoenix is my all-time favorite dive bar in New York City.”
S2 | “The best BBQ I’ve tasted in Phoenix.”
S3 | “Phoenix has become one of my favorite bars in NY.”

KB matching yields labels such as Location for “New York City” and Food for “BBQ”; types for the un-matchable names (“Phoenix”, “NY”) are propagated through shared relation phrases (“is my all-time favorite dive bar in”, “tasted in”, “has become one of my favorite bars in”).

(Lin et al., 2012), (Ling et al., 2012), (Nakashole et al., 2013)

SLIDE 67

Current Distant Supervision: Limitation I

  • 1. Context-agnostic type prediction: types are predicted for each mention regardless of context
  • 2. Sparsity of contextual bridges

ID | Sentence
S1 | “Phoenix is my all-time favorite dive bar in New York City.”
S2 | “The best BBQ I’ve tasted in Phoenix.”
S3 | “Phoenix has become one of my favorite bars in NY.”

SLIDE 68

Current Distant Supervision: Limitation II

  • 1. Context-agnostic type prediction
  • 2. Sparsity of contextual bridges: some relational phrases are infrequent in the corpus → ineffective type propagation

ID | Sentence
S1 | “Phoenix is my all-time favorite dive bar in New York City.”
S3 | “Phoenix has become one of my favorite bars in NY.”

SLIDE 69

My Solution: ClusType (KDD’15)

ID | Segmented Sentence
S1 | “Phoenix is my all-time favorite dive bar in New York City.”
S2 | “The best BBQ I’ve tasted in Phoenix.”
S3 | “Phoenix has become one of my favorite bars in NY.”

Putting two sub-tasks together: 1. type-label propagation; 2. relation phrase clustering.

(Figure: a heterogeneous graph linking entity mentions (S1: New York City; S1/S2/S3: Phoenix; S2: BBQ; S3: NY) to candidate entity names and relation phrases such as “is my all-time favorite dive bar in”, “tasted in”, “has become one of my favorite bars in”; edges represent object interactions, similar relation phrases, and correlated mentions.)

https://github.com/shanzhenren/ClusType

SLIDE 70

Type Propagation in ClusType

Smoothness assumption: if two objects are similar according to the graph (edge weight / object similarity W_ij between feature vectors f_i and f_j), then their type labels should also be similar.

(Belkin & Niyogi, NIPS’01), (Ren et al., KDD’15)
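A toy sketch of graph-based label propagation under this smoothness assumption (the tiny graph, seed matrix, and generic iterative update are illustrative assumptions, not ClusType's exact objective):

```python
import numpy as np

def propagate(W, Y, alpha=0.8, iters=50):
    """W: (n, n) symmetric edge weights; Y: (n, k) seed label indicators."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / d[:, None]                       # row-normalized adjacency
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y  # smooth over graph + clamp seeds
    return F

# Tiny graph: node 0 is a seeded Location mention; nodes 1 and 2 are
# un-matchable mentions connected to it through a shared relation phrase.
W = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)
Y = np.array([[1, 0], [0, 0], [0, 0]], float)  # label columns: Location, Food
F = propagate(W, Y)
```

After convergence, the unlabeled nodes inherit the Location label through their edges, which is exactly how types reach the "un-matchable" names.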

SLIDE 71

Relation Phrase Clustering in ClusType

The two subtasks mutually enhance each other. Two relation phrases should be grouped together if they have: 1. similar strings; 2. similar context; 3. similar types for their entity arguments (“multi-view” clustering).

Example: “is my all-time favorite dive bar in” and “has become one of my favorite bars in” are similar relation phrases, so the Location label propagates between their argument mentions.

(Ren et al., KDD’15)

SLIDE 72

ClusType: Comparing with State-of-the-Art Systems (F1 Score)

Methods | NYT | Yelp | Tweet
Pattern (Stanford, CoNLL’14; bootstrapping) | 0.301 | 0.199 | 0.223
SemTagger (Utah, ACL’10; bootstrapping) | 0.407 | 0.296 | 0.236
NNPLB (UW, EMNLP’12; label propagation) | 0.637 | 0.511 | 0.246
APOLLO (THU, CIKM’12; label propagation) | 0.795 | 0.283 | 0.188
FIGER (UW, AAAI’12; classifier with linguistic features) | 0.881 | 0.198 | 0.308
ClusType (KDD’15) | 0.939 | 0.808 | 0.451

Precision (P) = #correctly-typed mentions / #system-recognized mentions; Recall (R) = #correctly-typed mentions / #ground-truth mentions; F1 = 2 × P × R / (P + R).

Datasets: NYT: 118k news articles (1k manually labeled for evaluation); Yelp: 230k business reviews (2.5k manually labeled); Tweet: 302k tweets (3k manually labeled).

  • vs. bootstrapping: context-aware prediction on “un-matchable” mentions
  • vs. label propagation: groups similar relation phrases
  • vs. FIGER: no reliance on complex feature engineering

https://github.com/shanzhenren/ClusType
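The span-level metric reported in the table can be sketched as follows (representing mentions as (start, end, type) tuples is an illustrative assumption):

```python
def prf1(predicted, gold):
    """predicted, gold: sets of (start, end, type) mention spans."""
    correct = len(predicted & gold)               # exact span + type matches
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

pred = {(0, 2, "PER"), (4, 6, "LOC"), (8, 9, "ORG")}
gold = {(0, 2, "PER"), (4, 6, "LOC"), (10, 11, "ORG")}
p, r, f = prf1(pred, gold)
# -> P = 2/3, R = 2/3, F1 = 2/3
```

Because both span boundaries and the type must match exactly, this metric penalizes segmentation errors and typing errors alike.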

SLIDE 73

Outline

  • Introduction
  • Challenges & Approach
  • Entity Recognition and Typing
  • Fine-grained Entity Typing (EMNLP’16)
  • Joint Entity and Relation Extraction
  • Summary and Future Work

SLIDE 74


From Coarse-Grained Typing to Fine-Grained Entity Typing

ID | Sentence
S1 | “Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice.”

From a few common types (Person, Location, Organization) to a type hierarchy with 100+ types from a knowledge base: root → product, person, location, organization, …; person → politician, artist, businessman, …; artist → author, actor, singer, …

(Ling et al., 2012), (Nakashole et al., 2013), (Yogatama et al., 2015)

slide-75
SLIDE 75

Problem Statement

75

Text corpus → (NER + distant supervision) → labeled corpus → typing model → predictions for unlinkable mentions

  • How to learn an effective model to predict a single type-path (in the hierarchy root → {product, person, location, organization, …}; person → {politician, artist, businessman, …}; artist → {author, actor, singer, …}) for each unlinkable entity mention, using the automatically-labeled training corpus?

slide-76
SLIDE 76

Current Distant Supervision: Context-Agnostic Labeling

76

Entity types from knowledge base — type hierarchy: root → {person, location, organization, …}; person → {politician, artist, businessman, …}; artist → {author, actor, singer, …}

Entity: Donald Trump. S1: Donald Trump. Entity types (from KB): person, artist, actor, author, businessman, politician

ID | Sentence
S1 | Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice

  • Inaccurate labels in training data
  • Prior work: all labels are “perfect”
slide-77
SLIDE 77

Type Inference in PLE

Type hierarchy (in knowledge base): root → {product, person, location, organization, …}; person → {politician, artist, businessman, …}; artist → {author, actor, singer, …}

ID | Sentence
Si | President Trump gave an all-hands address to troops at the U.S. Central Command headquarters …

Test mention: Si_Trump. The vectors of its text features (President, gave, speech) are summed, and type inference performs a top-down nearest-neighbor search in the given type hierarchy, within a low-dimensional vector space where related objects (e.g., president, politician, person, senator; gave, address; actor, star, play) lie close together.

(Ren et al., KDD’16)
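The top-down nearest-neighbor search can be sketched as follows. This is a minimal sketch with assumed names and toy 2-d embeddings: it greedily descends the hierarchy to a leaf by cosine similarity, whereas PLE’s actual inference may stop early (e.g., via a similarity threshold):

```python
import numpy as np

def top_down_infer(mention_vec, children, type_vecs, root="root"):
    """Greedy top-down nearest-neighbor search over a type hierarchy:
    at each level, descend to the child type whose embedding is most
    similar (cosine) to the mention embedding."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    path, cur = [], root
    while children.get(cur):          # stop at a leaf type
        best = max(children[cur], key=lambda t: cos(mention_vec, type_vecs[t]))
        path.append(best)
        cur = best
    return path

# Toy hierarchy and 2-d type embeddings (illustrative values only)
children = {"root": ["person", "location"], "person": ["politician", "artist"]}
type_vecs = {"person": np.array([1.0, 0.0]), "location": np.array([0.0, 1.0]),
             "politician": np.array([1.0, 0.5]), "artist": np.array([1.0, -0.5])}
mention = np.array([1.0, 0.6])        # summed feature vectors of the test mention
print(top_down_infer(mention, children, type_vecs))  # ['person', 'politician']
```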

slide-78
SLIDE 78

My Solution: Partial Label Embedding (KDD’16)

78

“De-noised” labeled data

ID | Sentence
S1 | Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice

Pipeline: extract text features → “label noise reduction” with PLE → train classifiers on de-noised data → prediction on new data

S1: Donald Trump. Entity types (from KB): person, artist, actor, author, businessman, politician
Text features: TOKEN_Donald, CONTEXT:television, CONTEXT:season, TOKEN_trump, SHAPE:AA

→ More effective classifiers

(Ren et al., KDD’16) https://github.com/shanzhenren/PLE

slide-79
SLIDE 79

PLE: Modeling Clean and Noisy Mentions Separately

79

  • For a clean mention, its “positive types” should be ranked higher than all its “negative types”.
  • For a noisy mention, its “best candidate type” should be ranked higher than all its “non-candidate types”.

Noisy entity mention — ID S1: “Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice”. Types in KB: person, artist, actor, author, businessman, politician.

Candidate types ranked: (+) actor 0.88, (+) artist 0.74, (+) person 0.55, (+) author 0.41, (+) politician 0.33, (+) businessman 0.31 → “best” candidate type: actor, which should be ranked above the non-candidate types (−) singer, (−) coach, (−) doctor, (−) location, (−) organization.

Clean mention — Si: Ted Cruz. Types in KB: person, politician.

(Ren et al., KDD’16)
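The ranking constraint for a noisy mention can be sketched as a margin loss. This is a simplified illustration with assumed names and a toy margin, not PLE’s exact objective (which also handles clean mentions by ranking every positive type above every negative one):

```python
def partial_label_loss(scores, candidate_types, margin=1.0):
    """Margin loss for a noisy mention: its best-scoring *candidate* type
    should outrank every *non-candidate* type by at least `margin`."""
    best_candidate = max(scores[t] for t in candidate_types)
    loss = 0.0
    for t, s in scores.items():
        if t not in candidate_types:
            # hinge penalty whenever a non-candidate comes within the margin
            loss += max(0.0, margin - (best_candidate - s))
    return loss

# Scores for the Donald Trump mention (candidates from the KB vs. others)
scores = {"actor": 0.88, "artist": 0.74, "person": 0.55,
          "singer": 0.40, "location": -0.20}
print(partial_label_loss(scores, {"actor", "artist", "person"}))  # ≈ 0.52
```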

slide-80
SLIDE 80

Type hierarchy (from knowledge base): root → {product, person, location, organization, …}; person → {politician, artist, businessman, …}; artist → {author, actor, singer, …}

ID | Sentence
Si | President Trump gave an all-hands address to troops at the U.S. Central Command headquarters …

Test mention: Si_Trump. The vectors of its text features (President, gave, speech) are summed, and type inference in PLE performs a top-down nearest-neighbor search in the given type hierarchy, within a low-dimensional vector space where related objects (e.g., president, politician, person, senator; gave, address; actor, star, play) lie close together.

Type Inference in PLE

(Ren et al., KDD’16)

slide-81
SLIDE 81

PLE: Performance of Fine-Grained Entity Typing

81

  • Raw: candidate types from distant supervision
  • WSABIE (Google, ACL’15): joint feature and type embedding
  • PTE, Predictive Text Embedding (MSR, WWW’15): joint mention, feature, and type embedding
  • Both WSABIE and PTE suffer from “noisy” training labels
  • PLE (KDD’16): partial-label loss for context-aware labeling

Accuracy on different type levels:

Method | Level-1 | Level-2 | Level-3
Raw | 0.70 | 0.45 | 0.05
WSABIE | 0.79 | 0.49 | 0.14
PTE | 0.78 | 0.51 | 0.19
PLE | 0.81 | 0.62 | 0.48

Accuracy = (# mentions with all types correctly predicted) / (# mentions in the test set)

On the OntoNotes public dataset (Weischedel et al., 2011; Gillick et al., 2014): 13,109 news articles, 77 annotated documents, 89 entity types
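The strict accuracy above (a mention counts only if all of its types are predicted correctly) can be sketched as follows; the data format and function name are assumptions:

```python
def strict_accuracy(predictions, gold):
    """Strict accuracy: a mention counts only if its predicted type set
    exactly matches the gold type set."""
    correct = sum(1 for m in gold
                  if set(predictions.get(m, [])) == set(gold[m]))
    return correct / len(gold)

# Hypothetical mentions: m1 fully correct, m2 mistyped
gold = {"m1": ["person", "politician"], "m2": ["location"]}
pred = {"m1": ["person", "politician"], "m2": ["organization"]}
print(strict_accuracy(pred, gold))  # 0.5
```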

https://github.com/shanzhenren/PLE

slide-82
SLIDE 82

Outline

  • Introduction
  • Challenges & Approach
  • Entity Recognition and Typing
  • Fine-grained Entity Typing
  • Joint Entity and Relation Extraction (WWW’17)
  • Summary and Future Work

82

slide-83
SLIDE 83

Problem Statement

83

Input corpus:

“American Airlines, a unit of AMR Corp., immediately matched the move, spokesman Tim Wagner said. United Airlines, a unit of UAL Corp., said the increase took effect Thursday night and applies to most routes ...”

Extracted entity-relation mentions:

Entity 1 | Relation | Entity 2
American Airlines | is_subsidiary_of | AMR
Tim Wagner | is_employee_of | American Airlines
United Airlines | is_subsidiary_of | UAL
… | … | …

slide-84
SLIDE 84

Previous Work

  • Supervised relation extraction (RE) systems
  • Hard to port to different kinds of corpora
  • Pattern-based bootstrapping RE systems
  • Focus on “explicit” relation mentions → limited recall
  • Semantic drift
  • Distantly-supervised RE systems
  • Error propagation (cont.)

84 Mintz et al. Distant supervision for relation extraction without labeled data. ACL, 2009. Etzioni et al. Web-scale information extraction in knowitall:(preliminary results). WWW, 2004. Surdeanu et al. Multi-instance multi-label learning for relation extraction. EMNLP, 2012.

slide-85
SLIDE 85

Prior Work: An “Incremental” System Pipeline

85

Pipeline: entity mention detection → context-aware entity typing → relation mention detection → context-aware relation typing

Example sentence: “The Women’s March was a worldwide protest on January 21, 2017.”

  • Entity boundary errors: detecting (women, protest) ✗ instead of (protest, January 21, 2017)
  • Relation mention errors: extracting “is a” ✗
  • Entity type errors: typing “protest” as person ✗
  • Relation type errors: “is a” ✗

Error propagation cascading down the pipeline

(Mintz et al., 2009), (Riedel et al., 2010), (Hoffmann et al., 2011), (Surdeanu et al., 2012), (Nagesh et al., 2014), …

slide-86
SLIDE 86
  • Context-aware type modeling
  • Model entity-relation interactions

My Solution: CoType (WWW’17)

86

(Ren et al. WWW’17)

1. Data-driven detection of entity and relation mentions
   • Data-driven text segmentation
   • Syntactic pattern learning from KBs
2. Joint typing of entity and relation mentions

(Replacing the pipeline: entity mention detection → context-aware entity typing → relation mention detection → context-aware relation typing)

https://github.com/shanzhenren/CoType

slide-87
SLIDE 87

87

My Solution: Data-Driven Entity Mention Detection

Quality of merging two sub-phrases is measured by corpus-level concordance and syntactic quality (the significance of a merging between the two sub-phrases).

Pattern | Example
(J*)N* | support vector machine
V P | tasted in, damage on
V W*(P) | train a classifier with

Good concordance example: “The best BBQ I’ve tasted in Phoenix! I had the pulled pork sandwich with coleslaw and baked beans for lunch. … This place serves up the best cheese steak sandwich west of the Mississippi.”

slide-88
SLIDE 88

CoType: Co-Embedding for Typing Entities and Relations

88

(Ren et al. WWW’17)

[Figure: a heterogeneous graph linking relation mentions (e.g., (“Barack Obama”, “US”, S1), (“Obama”, “Dream of My Father”, S2), (“Barack Obama”, “United States”, S3)), entity mentions (e.g., S1_“Barack Obama”, S2_“Obama”), text features (e.g., EM1_Obama, HEAD_Obama, TOKEN_States, BETWEEN_book, BETWEEN_president, CONTEXT_book, CONTEXT_president), entity types (person, politician, artist, author, LOC, ORG), and relation types (president_of, author_of, born_in, travel_to, None). The object interactions in this heterogeneous graph are embedded into separate low-dimensional vector spaces for entity mentions and relation mentions, jointly modeling entity-relation interactions.]

slide-89
SLIDE 89

Modeling Entity-Relation Interactions

Object “Translating” Assumption

For a relation mention z between entity arguments m1 and m2:

vec(m1) ≈ vec(m2) + vec(z)

89

Example in the low-dimensional vector space: m1 = “USA” (country), m2 = “Washington D.C.” (city), and z = capital_city_of form a positive relation triple; substituting “France” / “Paris” yields a negative relation triple.

(Bordes, NIPS’13), (Ren et al., WWW’17)

Error on a relation triple (z, m1, m2):
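A minimal numeric illustration of the translating assumption. The L2 error form and the toy 2-d vectors below are assumptions consistent with the approximation vec(m1) ≈ vec(m2) + vec(z) stated above, not the paper’s exact objective:

```python
import numpy as np

def translating_error(z, m1, m2):
    """Error of a relation triple under the translating assumption
    vec(m1) ~ vec(m2) + vec(z); smaller means a more plausible triple."""
    return float(np.linalg.norm(m1 - (m2 + z)))

# Toy 2-d embeddings (illustrative values only)
usa = np.array([2.0, 1.0])
dc = np.array([1.0, 0.5])
capital_of = np.array([1.0, 0.5])          # usa ≈ dc + capital_of
paris = np.array([0.0, 1.0])               # mismatched tail entity

print(translating_error(capital_of, usa, dc))     # 0.0  (positive triple)
print(translating_error(capital_of, usa, paris))  # larger (negative triple)
```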

slide-90
SLIDE 90

Reducing Error Propagation: A Joint Optimization Framework

90

(Ren et al., WWW’17)

Modeling entity-relation interactions Modeling types of relation mentions Modeling types of entity mentions

slide-91
SLIDE 91

CoType: Comparing with State-of-the-Arts RE Systems

  • Given a candidate relation mention, predict its relation type if it expresses a relation of interest; otherwise, output “None”

91

[Figure: precision-recall curves (precision 0.1–1.0 vs. recall 0.05–0.5) comparing DeepWalk, DS+Logistic, LINE, MultiR, CoType-RM, and CoType.]

  • DS+Logistic (Stanford, ACL’09): logistic classifier on distant supervision (DS)
  • MultiR (UW, ACL’11): handles inappropriate labels in DS
  • DeepWalk (StonyBrook, KDD’14): homogeneous graph embedding
  • LINE (MSR, WWW’15): joint feature & type embedding

  • CoType-RM (WWW’17): only models relation mentions
  • CoType (WWW’17): models entity-relation interactions

NYT public dataset (Riedel et al. 2010): 1.18M sentences in the corpus, 395 manually annotated sentences for evaluation, 24 relation types

https://github.com/shanzhenren/CoType

slide-92
SLIDE 92

An Application to Life Sciences

92 (Pyysalo et al., BMC Bioinformatics’07) Performance evaluation on BioInfer: relation classification accuracy = 61.7% (11%↑ over the best-performing baseline)

LifeNet: A Knowledge Exploration and Analytics System for Life Sciences

(Ren et al., ACL’17 demo)

LifeNet by Effort-Light StructMine (with links to PubMed papers), machine-created: 4 million+ PubMed papers, 1,000+ entity types, 400+ relation types, built in <1 hour on a single machine, 10,000x more facts

BioInfer network by human labeling (Pyysalo et al., 2007), human-created: 1,100 sentences, 94 protein-protein interactions, 2,500 man-hours, 2,662 facts

slide-93
SLIDE 93

Outline

  • Introduction
  • Challenges & Approach
  • Entity Recognition and Typing
  • Fine-grained Entity Typing
  • Joint Entity and Relation Extraction
  • Summary and Future Work

93

slide-94
SLIDE 94

Towards Learning Text Structures with Limited Supervision

94

[Diagram: unlabeled data + labeled data → noisy training data → model training → unreliable predictions. Imposing priors at the input stage: prior embeddings are injected at the shallow input layer, ahead of the middle and output layers.]

slide-95
SLIDE 95

Heterogeneous Supervision for Relation Extraction

  • How to unify multiple sources of supervision (KB-supervision, hand-crafted rules, crowd-sourced labels, etc.) on the same task?

95

(Liu et al., EMNLP’17)

Labeling functions Λ = {λ1, …, λ4}, mixing distant supervision and hand-crafted rules:

  • return born_in for <e1, e2, s> if BornIn(e1, e2) in KB (distant supervision)
  • return died_in for <e1, e2, s> if DiedIn(e1, e2) in KB (distant supervision)
  • return born_in for <e1, e2, s> if match(‘* born in *’, s) (hand-crafted rule)
  • return died_in for <e1, e2, s> if match(‘* killed in *’, s) (hand-crafted rule)
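The labeling functions on this slide can be sketched in Python; the toy KB contents, regex patterns, and function names below are illustrative assumptions. Note how a single instance can receive redundant (or conflicting) labels from several functions, which is exactly the problem the next slide addresses:

```python
import re

# Toy knowledge base of (relation, e1, e2) facts (illustrative contents)
KB = {("BornIn", "Hussein", "Amman"), ("DiedIn", "Gofraid", "Dal Riata")}

def lf_kb_born_in(e1, e2, s):        # distant supervision
    return "born_in" if ("BornIn", e1, e2) in KB else None

def lf_kb_died_in(e1, e2, s):        # distant supervision
    return "died_in" if ("DiedIn", e1, e2) in KB else None

def lf_pattern_born_in(e1, e2, s):   # hand-crafted rule: '* born in *'
    return "born_in" if re.search(rf"{re.escape(e1)}.*born in.*{re.escape(e2)}", s) else None

def lf_pattern_killed_in(e1, e2, s): # hand-crafted rule: '* killed in *'
    return "died_in" if re.search(rf"{re.escape(e1)}.*killed in.*{re.escape(e2)}", s) else None

labeling_functions = [lf_kb_born_in, lf_kb_died_in,
                      lf_pattern_born_in, lf_pattern_killed_in]

def annotate(e1, e2, s):
    """Collect every label the functions propose for one instance."""
    labels = []
    for lf in labeling_functions:
        lab = lf(e1, e2, s)
        if lab:
            labels.append(lab)
    return labels

s = "Gofraid died in 989, said to be killed in Dal Riata."
print(annotate("Gofraid", "Dal Riata", s))  # ['died_in', 'died_in'] -- redundant labels
```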

slide-96
SLIDE 96

Example corpus D:

c1: Robert Newton “Bob” Ford was an American outlaw best known for killing his gang leader Jesse James (e1) in Missouri (e2).
c2: Hussein (e1) was born in Amman (e2) on 14 November 1935.
c3: Gofraid (e1) died in 989, said to be killed in Dal Riata (e2).

The labeling functions Λ = {λ1, …, λ4} (distant supervision from the KB plus hand-crafted pattern rules) each fire on a subset of {c1, c2, c3}, producing overlapping, and potentially conflicting, annotations.

Uncover “Expertise” of Labeling Functions

  • Multiple “labeling functions” annotate the same instance → how to resolve conflicts & redundancy?
  • The “expertise” of each labeling function → the subset of instances that the labeling function is confident on

96

(Liu et al., EMNLP’17)

slide-97
SLIDE 97

Towards Learning Text Structures with Limited Supervision

97

[Diagram, extending the one on slide 94. At the input stage: unlabeled data + labeled data → noisy training data → model training → unreliable predictions, with prior embeddings imposed at the shallow input layer. At the output stage: during training on labeled data, priors act as regularizers in the output layer & loss functions, shaping the model’s predictions.]

slide-98
SLIDE 98

Indirect Supervision for Relation Extraction – using QA Pairs

  • Questions & positive/negative answers
  • Positive → similar relation; negative → distinct relations

98

(Wu et al., WSDM’18)

QA Data Format / Example:

Type(“Jack”, “Germany”, A1) = Type(“Jack”, “Germany”, A2) Type(“Jack”, “Germany”, A1) ≠ Type(“Jack”, “France”, A3) Type(“Jack”, “Germany”, A2) ≠ Type(“Jack”, “France”, A3)
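A sketch of how one question’s answers could be turned into pairwise type constraints like those above; the function name and the (mention, is_correct) data format are assumptions:

```python
from itertools import combinations

def qa_constraints(answers):
    """Turn one question's positive/negative answer mentions into pairwise
    constraints: two correct answers should share a relation type ('similar');
    a correct and an incorrect answer should not ('dissimilar')."""
    pos = [m for m, ok in answers if ok]
    neg = [m for m, ok in answers if not ok]
    similar = list(combinations(pos, 2))
    dissimilar = [(p, n) for p in pos for n in neg]
    return similar, dissimilar

# A1, A2 answer the question correctly; A3 does not (as in the example above)
sim, dis = qa_constraints([("A1", True), ("A2", True), ("A3", False)])
print(sim)  # [('A1', 'A2')]
print(dis)  # [('A1', 'A3'), ('A2', 'A3')]
```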

slide-99
SLIDE 99

Indirect Supervision for Relation Extraction – using QA Pairs

  • Questions → positive / negative answers
  • Positive pairs → similar relation; negative pairs → distinct relations

99

(Wu et al., WSDM’18)

slide-100
SLIDE 100

Towards Learning Text Structures with Limited Supervision

100

[Diagram: at the model stage, priors are encoded as network structures between the input layer, middle layers, and output layer, yielding (interpretable) predictions from labeled data.]

slide-101
SLIDE 101

Neural-symbolic Learning for NLP

101

Matched textual patterns + related knowledge graph structures → composing graph networks (generating GN-blocks) → softmax classifier