CSCI 699: Machine Learning for Knowledge Extraction and Reasoning
Instructor: Xiang Ren
www-bcf.usc.edu/~xiangren/ml4know19spring
USC Computer Science
About the Instructor
- Asst. Professor of Computer Science, affiliated
faculty at ISI, PhD@UIUC
- Research Interest: intersection of machine
learning & NLP (making sense of massive corpora)
- Problems: information extraction (sequence
labeling, structured prediction), knowledge base construction, knowledge reasoning, graph representation learning
- Methods: weak & indirect supervision for modeling
sequence & graph-structured data
Self-introduction!
- Name
- Background
- Your research!
- What brings you to the class? :)
Course Format
- Lectures on basic concepts, prior & current
methods
- Presentation on recent research papers
- Projects on novel information extraction
methods/applications
- Assignments on model implementation
Schedule
- First 5 weeks – 3 hrs lectures only
- Next 8 weeks – 1.5 hrs of lectures + 3 paper presentations (30 min each, including Q&A)
- Including 3 guest lectures
- Last week – project presentation (20 min each
including Q&A)
10-min Breaks
- 3:00-3:10pm
- 4:10-4:20pm
Homework Assignments (2x)
- On practicing implementation skills on
information extraction techniques
- Named entity recognition
- Relation extraction
- …
- Python is preferred (C++ also accepted)
- Report & GitHub repo with code/README
- 10% of grade each
Paper Presentation
- Each student picks one paper from the reading list (we will follow up with a Google spreadsheet)
- You can also suggest papers you want to
present!
- Prepare a 30 min whiteboard or slides
presentation, including 3-5 min of Q&A
Project
Come up with a novel model/application of information extraction (in your own domain), and conduct experiments to validate your idea.
- Project proposal (~500 words, PDF), due Week 3
- Check-point 1 (15%): survey paper (2-page, double-
column), due Week 4
- Check-point 2 (15%): mid-term report (3-page, double-
column), due Week 10
- Project presentation (15%): 20 min presentation including Q&A, Week 15
- Check-point 3 (35%): final report (8-page, double-column,
including GitHub repo), due Dec 12.
Evaluation
- Homework assignments (20%): 2x 10%
- Project (80%):
- Project survey (15%)
- Project mid-term report (15%)
- Project presentation (15%)
- Project final report (35%)
Logistics
- Instructor: Xiang Ren
- Email: xiangren@usc.edu
- Office: SAL 308
- Office hour: by appointment
- TA: TBD
- Course website
- www-bcf.usc.edu/~xiangren/ml4know19spring
- Homework submission (PDF and hyperlinks):
- Blackboard (or by emails if there’s issue uploading)
Today’s Lecture
- Overview of Knowledge Extraction &
Reasoning: tasks, methods, and applications
- Overview of my research on Effort-Light
Knowledge Extraction
Knowledge Extraction and Reasoning: An Overview
CSCI 699: Machine Learning for Knowledge Extraction & Reasoning
Instructor: Xiang Ren USC Computer Science
The Era of Big Data
Growth of Unstructured Text Data
Unstructured data, primarily text, account for more than 80% of the data collected by organizations.
Unstructured Data and the 80 Percent Rule, Seth Grimes, Clarabridge Bridgepoints, 2008 Q3.
Knowledge in “Big Text Data”
Sources: news, social media posts, web pages, …; financial reports, medical records, legal acts, …; customer reviews, tech support memos, field service notes, …
Use cases: get an overview of recent news events; obtain insights from data for decision support; summarize user feedback for quality control; …
Text is accessible, but knowledge in text is not machine-readable …
Structures Bring Analytic Power
(figure: structured data such as databases and networks support database management, exploration & analysis, yielding insights & knowledge)
Turning Unstructured Text Data into Structures
Unstructured Text Data
(account for ~80% of all data in organizations)
Knowledge & Insights
Structures
(Chakraborty, 2016)
Information Extraction
- Can computational systems extract structured, factual information from unstructured or semi-structured data, and represent it in a machine-readable form?
Entity Structures
Can computational systems identify real-world entities of different categories from given corpora?

Example text corpus: "Criticism of government response to the hurricane …"
- Location: New Orleans, Louisiana, Washington DC, …
- Person: Ray Nagin (Mayor), President Bush, …
- Organization: United States Red Cross, US government, …
Relation Structures
Can computational systems capture different relations between the entities from given corpora?

Example text corpus: "American Airlines, a unit of AMR Corp., immediately matched the move, spokesman Tim Wagner said. United Airlines, a unit of UAL Corp., said the increase took effect Thursday night."

Entity 1           Relation          Entity 2
American Airlines  is_subsidiary_of  AMR
Tim Wagner         is_employee_of    American Airlines
United Airlines    is_subsidiary_of  UAL
Event Structures
Can computational systems identify real-world events of different types from given corpora?

Example text corpus: "Criticism of government response to the hurricane …"
Terrorism template: TYPE: ROBBERY; DATE: 07 JAN 90; LOCATION: CHILE: MOLINA (CITY)
What is “Information Extraction”

As a task: filling slots in a database from sub-segments of text.

October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…

NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Software Foundation
As a family of techniques: Information Extraction = segmentation + classification + association + clustering

Applied to the same passage: segmentation finds the candidate spans (Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Bill Veghte, Microsoft VP, Richard Stallman, founder, Free Software Foundation); classification assigns each span a type (segmentation + classification is aka “named entity extraction”); association links spans into NAME / TITLE / ORGANIZATION records; clustering merges coreferent mentions (e.g., “Microsoft Corporation”, “Microsoft”).
StructNet: Structured Network of Factual Knowledge
(figure: an example StructNet built from news articles, with node types such as news article, agency, person, location, organization, URL)
- Nodes: entities of different entity types
- Edges: relationships of different relation types
A Product Use Case: Finding “Interesting Collections of Hotels”
http://engineering.tripadvisor.com/using-nlp-to-find-interesting-collections-of-hotels/

Technology Transfer to TripAdvisor: grouping hotels based on structured facts extracted from the review text.
Features for the “Catch a Show” collection: broadway shows, beacon theater, broadway dance center, broadway plays, david letterman show, radio city music hall, theatre shows
Features for the “Near The High Line” collection: high line park, chelsea market, highline walkway, elevated park, meatpacking district, west side, old railway
Better Question Answering with Reasoning Capability
A Life Science Use Case: Identifying “Distinctively Related Entities”
33
Collaborate with UCLA Heart BD2K Center & Mayo Clinic
What proteins are distinctively associated with Cardiomyopathy?
http://www.igb.illinois.edu/news/harnessing-power-big-data-revolution-genomic-data-analysis
Citation prediction for scientific papers
FacetGist [CIKM’16]: mines paper titles, abstracts & bibliographic data into a network of papers, techniques, datasets, applications, authors, and venues.
ClusCite: Citation Recommendation by Information Network-Based Clustering [KDD’14]: given a new manuscript, suggests papers to cite with confidence scores.
Corpus-specific StructNet Construction
(figure: StructNet with node types such as news article, agency, person, location, organization, URL)
How to automate the construction of StructNets from given text corpora?
IE History
Pre-Web
- Mostly news articles
- DeJong’s FRUMP [1982]
- Hand-built system to fill Schank-style “scripts” from news wire
- Message Understanding Conference (MUC) DARPA [’87-’95], TIPSTER [’92-’96]
- Early work dominated by hand-built models
- E.g. SRI’s FASTUS, hand-built FSMs.
- But by 1990’s, some machine learning: Lehnert, Cardie, Grishman and then
HMMs: Elkan [Leek ’97], BBN [Bikel et al ’98]
Web
- AAAI ’94 Spring Symposium on “Software Agents”
- Much discussion of ML applied to Web. Maes, Mitchell, Etzioni.
- Tom Mitchell’s WebKB, ‘96
- Build KB’s from the Web.
- Wrapper Induction
- Initially hand-built, then ML: [Soderland ’96], [Kushmerick ’97], …
- Citeseer; Cora; FlipDog; contEd courses, corpInfo, …
IE History
Biology
- Gene/protein entity extraction
- Protein/protein interaction facts
- Automated curation/integration of databases
- At CMU: SLIF (Murphy et al, subcellular information from
images + text in journal articles)
- EPCA, PAL, RADAR, CALO: intelligent office
assistant that “understands” some part of email
- At CMU: web site update requests, office-space requests;
calendar scheduling requests; social network analysis of email.
www.apple.com/retail
IE is different in different domains!
Example: on web there is less grammar, but more formatting & linking
The directory structure, link structure, formatting & layout of the Web is its own new grammar.

Apple to Open Its First Retail Store in New York City
MACWORLD EXPO, NEW YORK--July 17, 2002--Apple's first retail store in New York City will open in Manhattan's SoHo district on Thursday, July 18 at 8:00 a.m. EDT. The SoHo store will be Apple's largest retail store to date and is a stunning example of Apple's commitment to offering customers the world's best computer shopping experience. "Fourteen months after opening our first retail store, our 31 stores are attracting over 100,000 visitors each week," said Steve Jobs, Apple's CEO. "We hope our SoHo store will surprise and delight both Mac and PC users who want to see everything the Mac can do to enhance their digital lifestyles."
www.apple.com/retail/soho www.apple.com/retail/soho/theatre.html
(examples above: newswire article vs. web pages)
Landscape of IE Tasks (1/4):
Degree of Formatting
A spectrum of formatting: text paragraphs without formatting → grammatical sentences with some formatting & links → non-grammatical snippets with rich formatting & links → tables.

Example (plain text paragraph): “Astro Teller is the CEO and co-founder of BodyMedia. Astro holds a Ph.D. in Artificial Intelligence from Carnegie Mellon University, where he was inducted as a national Hertz fellow. His M.S. in symbolic and heuristic computation and B.S. in computer science are from Stanford University. His work in science, literature and business has appeared in international media from the New York Times to CNN to NPR.”
Landscape of IE Tasks (2/4):
Intended Breadth of Coverage
Web site specific (formatting; e.g., Amazon.com book pages) → genre specific (layout; e.g., resumes) → wide, non-specific (language; e.g., university names)
Landscape of IE Tasks (3/4):
Complexity
Complexity of the target strings (e.g., word patterns):
- Closed set (e.g., U.S. states): “He was born in Alabama…”, “The big Wyoming sky…”
- Regular set (e.g., U.S. phone numbers): “Phone: (413) 545-1323”, “The CALD main office can be reached at 412-268-1299”
- Complex pattern (e.g., U.S. postal addresses): “University of Arkansas, P.O. Box 140, Hope, AR 71802”, “Headquarters: 1128 Main Street, 4th Floor, Cincinnati, Ohio 45210”
- Ambiguous patterns, needing context and many sources of evidence (e.g., person names): “…was among the six houses sold by Hope Feldman that year.”, “Pawel Opalinski, Software Engineer at WhizBang Labs.”
IE: The Broader View
The IE pipeline in context: spider → document collection → filter by relevance → IE (segment, classify, associate, cluster) → load DB → database → query/search, data mine. Supporting steps: create ontology, label training data, train extraction models.
Knowledge Graphs are Not Complete
(figure: an incomplete knowledge graph around “Band of Brothers”: tvProgramCreator: HBO; tvProgramGenre: Mini-Series; writtenBy: Graham Yost; music: Michael Kamen; countryOfOrigin: United States; castActor & awardWorkWinner: Tom Hanks; nationality-1: Neal McDonough; personLanguages: English; countrySpokenIn-1, serviceLanguage & serviceLocation-1: Caesars Entertainment; profession: Actor; …)
Benefits of Knowledge Graph
- Support various applications
- Structured Search
- Question Answering
- Dialogue Systems
- Relation Extraction
- Summarization
- Knowledge Graphs can be constructed via
information extraction from text, but…
- There will be a lot of missing links.
- Goal: complete the knowledge graph.
Reasoning on Knowledge Graph
Query node: Band of Brothers; query relation: tvProgramLanguage, i.e., tvProgramLanguage(Band of Brothers, ?)
KB Reasoning Tasks
- Predicting the missing link.
- Given e1 and e2, predict the relation r.
- Predicting the missing entity.
- Given e1 and relation r, predict the missing entity e2.
- Fact Prediction.
- Given a triple, predict whether it is true or false.
Knowledge Base Reasoning
- Question: can we infer missing links based on background KB?
- Path-based methods
- Path-Ranking Algorithm (PRA), Lao et al. 2011
- RNN + PRA, Neelakantan et al, 2015
- Chains of Reasoning, Das et al, 2017
- Embedding-based methods
- RESCAL, Nickel et al., 2011
- TransE, Bordes et al, 2013
- TransR/CTransR, Lin et al, 2015
- Integrating Path and Embedding-Based Methods
- DeepPath, Xiong et al, 2017
- MINERVA, Das et al, 2018
- DIVA, Chen et al., 2018
Traditional Rule-Based Systems
Example (handcrafted pattern “cities such as NPList”):
“The tour includes major cities such as [New York], [Los Angeles], and [Dallas]” → City: New York, Los Angeles, Dallas

Pipeline: domain experts handcraft extraction rules; applying the rules to the text corpus yields typed entities.
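A handcrafted rule like the “cities such as NPList” pattern can be sketched as a regular expression. This is a toy assumption-laden sketch (capitalization as a stand-in for NP chunking), not a production rule-based system:

```python
import re

def extract_cities(text):
    # Handcrafted rule for "cities such as <NPList>": capture a
    # comma/and-separated list of capitalized noun phrases.
    m = re.search(
        r"cities such as ([A-Z][\w ]*(?:, [A-Z][\w ]*)*(?:,? and [A-Z][\w ]*)?)",
        text,
    )
    if not m:
        return []
    # Split the captured list on commas and "and".
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", m.group(1))
    return [p.strip() for p in parts if p.strip()]
```

On the slide's sentence this returns New York, Los Angeles, and Dallas as City entities; the brittleness of such rules (they miss lowercase or rephrased lists) is what motivates the learning-based systems on the next slide.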
Supervised Machine Learning-Based Systems (state-of-the-art)
[San Francisco], in northern California, is a hilly city on the tip of a peninsula.
Pipeline: domain experts provide manual annotation (training data) and feature engineering (features) for the machine-learning model.
Effort-Light Knowledge Extraction
CSCI 699: Introduction to Information Extraction
Instructor: Xiang Ren USC Computer Science
Text data often are highly variable…
- Domain: CS papers ↔ biomedical papers
- Genre: news articles ↔ tweets
- Language: English ↔ Arabic
(Grammar, vocabularies, gazetteers)
However, text data often are highly variable…
Domain experts must repeat manual data annotation & complex feature generation for each corpus (English news, Arabic web forum posts, life science literature, …):
- Low efficiency
- Subjective
- Costly
- Limited scale
Prior Art in NLP: Extracting Structures with Repeated Human Effort
Example corpora: hotel reviews (“This hotel is my favorite Hilton property in NYC! It is located right on 42nd street near Times Square, it is close to all subways, Broadway shows, and next to many great …”; “We had a room facing Times Square and a room facing the Empire State Building, the location is close to everything and we love …”), news (“The June 2013 Egyptian protests were mass protest events that occurred in Egypt on 30 June 2013, …”).

Human labeling → labeled data → extraction rules & machine-learning models → structured facts (e.g., Times Square hotel, Broadway shows, NYC, Hilton property).

Systems: Stanford CoreNLP, CMU NELL, UW KnowItAll, USC AMR, IBM Alchemy APIs, Google Knowledge Graph, Microsoft Satori, …
Our Research: Effort-Light StructMine
Text corpus → corpus-specific models → entity & relation structures
- Enables quick development of applications over various corpora
- Extracts complex structures without introducing human errors
(e.g., news articles, PubMed papers; supervision from knowledge bases (KBs))
External Knowledge Bases as “Distant Supervision”
Text corpus + knowledge bases (KBs): overlapping factual information (entity names, entity types, relationships, etc.). Matching just 1% of 10M sentences yields 100K labeled sentences!
Co-occurrence patterns between text units bring semantic power
(figure: mentions “United States”, “Barack Obama” and context words “speech”, “president”, “politician” embedded together in a low-dimensional semantic space)
Training corpus: “… a speech was delivered by United States President Barack Obama.” “President Vladimir Putin delivers a speech during …” → prediction on unseen entities
A Cold-Start Factual Structure Mining (StructMine) Framework
Text corpus → data-driven text segmentation → candidate factual structures & text units/features → distant supervision → partially-labeled training corpus → learn semantic spaces → extract factual structures from the remaining unlabeled corpus
Effort–Light StructMine: Where Are We?
(figure: methods plotted by human labeling effort vs. feature engineering effort: hand-crafted methods, supervised learning methods, weakly-supervised learning methods, distantly-supervised learning methods)
CMU NELL (2009-present), UW KnowItAll / Open IE (2005-present), Max-Planck YAGO (2008-present), Stanford CoreNLP (2005-present), UT Austin Dependency Kernel (2005), IBM Watson Language APIs, UCB Hearst Pattern (1992), NYU Proteus (1997), Stanford Snorkel & MIML-RE (2012-present), U Washington FIGER & MultiR (2012)
Effort-Light StructMine
(KDD’15, 16, 17, WWW’15, 17, 18, EMNLP’16, 17…)
The Roadmap for Corpus-Specific StructNet Construction
Text corpus → Entity Recognition and Typing (KDD’15) → Fine-grained Entity Typing (EMNLP’16) → Joint Entity and Relation Extraction (WWW’17) → Corpus-specific StructNet
Outline
- Introduction
- Challenges & Approach
- Entity Recognition and Typing (KDD’15)
- Fine-grained Entity Typing
- Joint Entity and Relation Extraction
- Future Work
- Summary
What is Entity Recognition and Typing

- Identify token spans of entity mentions in text, and classify them into types of interest

[Barack Obama] arrived this afternoon in [Washington, D.C.]. [President Obama]’s wife [Michelle] accompanied him. → PERSON, LOCATION
[TNF alpha] is produced chiefly by activated [macrophages]. → PROTEIN, CELL
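The task can be sketched, in its simplest possible form, as dictionary lookup over a toy type inventory. The gazetteer below is a hypothetical stand-in; real recognizers use sequence labeling models (e.g., CRFs or neural taggers) rather than exact string matching:

```python
# Toy entity recognition & typing via dictionary lookup
# (hypothetical gazetteer; real systems use sequence models).
TYPE_OF = {
    "Barack Obama": "PERSON",
    "Washington, D.C.": "LOCATION",
    "TNF alpha": "PROTEIN",
    "macrophages": "CELL",
}

def recognize(text):
    # Return (mention, type) pairs in order of appearance in the text.
    found = [(m, t, text.find(m)) for m, t in TYPE_OF.items() if m in text]
    return [(m, t) for m, t, pos in sorted(found, key=lambda x: x[2])]
```

The gap between this sketch and a real system (unseen mentions, ambiguous strings like "Phoenix", overlapping spans) is exactly what the following slides address.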
Traditional Named Entity Recognition (NER) Systems
- Reliance on large amounts of manually-annotated data
- Slow model training: often slower than O(#words × #features × #classes)
A manual annotation interface A NER system pipeline
Finkel et al., Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, ACL 2005
Weak-Supervision Systems: Pattern-Based Bootstrapping
- Requires manual seed selection & mid-point checking
Pipeline (iterated): annotate corpus using seed entities → generate candidate patterns → score candidate patterns → select top patterns → apply patterns to find new entities

Seeds for Food: Pizza, French Fries, Hot Dog, Pancake, …
Learned patterns for Food: “the best <X> I’ve tried in”, “their <X> tastes amazing”, …
e.g., (Etzioni et al., 2005), (Talukdar et al., 2010), (Gupta et al., 2014), (Mitchell et al., 2015), …
Systems: CMU NELL, UW KnowItAll, Stanford DeepDive, Max-Planck PROSPERA, …
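One round of the bootstrapping loop above can be sketched as follows. The seeds, patterns, and corpus are all hypothetical toys; systems like NELL score patterns statistically over huge corpora and iterate many rounds with human mid-point checking:

```python
import re

# Toy corpus and seed entities for the Food type (hypothetical).
corpus = [
    "the best Pizza I've tried in town",
    "the best Tacos I've tried in years",
    "their Pancake tastes amazing",
    "their Ramen tastes amazing",
    "the weather was amazing",
]
seeds = {"Pizza", "Pancake"}
patterns = [r"the best (\w+) I've tried in",
            r"their (\w+) tastes amazing"]

def bootstrap_round(seeds, corpus, patterns):
    found = set(seeds)
    for pat in patterns:
        hits = {m.group(1) for s in corpus for m in re.finditer(pat, s)}
        # "Score" a pattern by whether it rediscovers known seeds;
        # keep it only if it does, then harvest the new entities.
        if hits & seeds:
            found |= hits
    return found
```

After one round the toy extractor has grown the Food set with Tacos and Ramen; in a real system the newly found entities become seeds for the next iteration, which is also where semantic drift creeps in.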
Leveraging Distant Supervision

1. Detect entity names from text
2. Match name strings to KB entities
3. Propagate types to the un-matchable names

ID  Sentence
S1  Phoenix is my all-time favorite dive bar in New York City.
S2  The best BBQ I’ve tasted in Phoenix.
S3  Phoenix has become one of my favorite bars in NY.

(figure: “New York City” matches a KB entity of type Location; types propagate through relation phrases such as “is my all-time favorite dive bar in”, “tasted in”, and “has become one of my favorite bars in” to the un-matchable mentions “Phoenix”, “BBQ”, and “NY”)

(Lin et al., 2012), (Ling et al., 2012), (Nakashole et al., 2013)
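Steps 1-2 (detect names and match them against a KB) can be sketched as plain string matching. The two-entry KB below is a toy assumption; production pipelines use entity linking rather than exact substring search:

```python
# Distant supervision by string-matching KB entries against sentences.
# Toy KB: note that "Phoenix" and "NY" stay un-matchable, which is
# exactly what step 3 (type propagation) must handle.
KB = {"New York City": "Location", "BBQ": "Food"}

sentences = {
    "S1": "Phoenix is my all-time favorite dive bar in New York City.",
    "S2": "The best BBQ I've tasted in Phoenix.",
    "S3": "Phoenix has become one of my favorite bars in NY.",
}

def distant_label(sentences, kb):
    # For each sentence, emit (mention, type) labels for every KB entry
    # whose name appears verbatim in the sentence.
    return {sid: [(name, t) for name, t in kb.items() if name in sent]
            for sid, sent in sentences.items()}
```

S3 comes back with no labels at all, illustrating why distant supervision alone leaves much of the corpus unlabeled.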
Current Distant Supervision: Limitation I
- 1. Context-agnostic type prediction
- Predict types for each mention regardless of context
- 2. Sparsity of contextual bridges
Current Distant Supervision: Limitation II
- 1. Context-agnostic type prediction
- 2. Sparsity of contextual bridges
- Some relational phrases are infrequent in the corpus → ineffective type propagation
ID  Sentence
S1  Phoenix is my all-time favorite dive bar in New York City.
S3  Phoenix has become one of my favorite bars in NY.
My Solution: ClusType (KDD’15)

ID  Segmented Sentences
S1  [Phoenix] is my all-time favorite dive bar in [New York City].
S2  The best [BBQ] I’ve tasted in [Phoenix].
S3  [Phoenix] has become one of my favorite bars in [NY].

(figure: a heterogeneous graph linking entity mentions (S1: Phoenix, S3: Phoenix, S1: New York City, S2: BBQ, S3: NY) to candidate entity names and to relation phrases such as “is my all-time favorite dive bar in”, “tasted in”, “has become one of my favorite bars in”; edges represent object interactions, connecting correlated mentions and similar relation phrases)

Putting two sub-tasks together:
1. Type label propagation
2. Relation phrase clustering

https://github.com/shanzhenren/ClusType
Type Propagation in ClusType

Smoothness assumption: if two objects are similar according to the graph (edge weight W_ij / object similarity), then their type label vectors f_i and f_j should also be similar.

(Belkin & Niyogi, NIPS’01), (Ren et al., KDD’15)
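The smoothness assumption leads naturally to iterative label propagation over the graph. A minimal sketch on a hypothetical 3-node chain graph (this is the generic propagate-and-clamp scheme, not ClusType's actual joint optimization):

```python
import numpy as np

# Toy graph: node 1 sits between labeled node 0 and unlabeled node 2.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
labels = np.array([[1., 0.],    # node 0: type A (known seed)
                   [0., 0.],    # node 1: unlabeled
                   [0., 0.]])   # node 2: unlabeled
is_labeled = np.array([True, False, False])

F = labels.copy()
for _ in range(50):
    # Average neighbors' type scores (smoothness), then clamp the seeds.
    F = (W @ F) / W.sum(axis=1, keepdims=True)
    F[is_labeled] = labels[is_labeled]
```

After a few dozen iterations both unlabeled nodes' type-A scores approach 1: the seed label has flowed across the graph, which is how a KB-matched mention like "New York City" can type a far-away "NY".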
Relation Phrase Clustering in ClusType

(figure: clustering the similar relation phrases “is my all-time favorite dive bar in” and “has become one of my favorite bars in” lets the Location label propagate from “New York City” to “NY”)

The two sub-tasks mutually enhance each other. Two relation phrases should be grouped together if they have:
1. Similar strings
2. Similar contexts
3. Similar types for entity arguments
→ “multi-view” clustering

(Ren et al., KDD’15)
ClusType: Comparing with State-of-the-Art Systems (F1 Score)
Methods                        NYT    Yelp   Tweet
Pattern (Stanford, CoNLL’14)   0.301  0.199  0.223
SemTagger (Utah, ACL’10)       0.407  0.296  0.236
NNPLB (UW, EMNLP’12)           0.637  0.511  0.246
APOLLO (THU, CIKM’12)          0.795  0.283  0.188
FIGER (UW, AAAI’12)            0.881  0.198  0.308
ClusType (KDD’15)              0.939  0.808  0.451

(Pattern and SemTagger are bootstrapping methods; NNPLB and APOLLO use label propagation; FIGER is a classifier with linguistic features.)

Precision (P) = #correctly-typed mentions / #typed mentions; Recall (R) = #correctly-typed mentions / #ground-truth mentions; F1 = 2PR / (P + R)

Datasets: NYT: 118k news articles (1k manually labeled for evaluation); Yelp: 230k business reviews (2.5k labeled); Tweet: 302k tweets (3k labeled).
- vs. bootstrapping: context-aware prediction on “un-matchable” mentions
- vs. label propagation: groups similar relation phrases
- vs. FIGER: no reliance on complex feature engineering
https://github.com/shanzhenren/ClusType
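The F1 scores in the table combine precision and recall over typed mentions; a small sketch of the computation, with hypothetical mention counts:

```python
def precision_recall_f1(n_correct, n_predicted, n_gold):
    # Precision: correctly-typed mentions / typed mentions predicted.
    # Recall:    correctly-typed mentions / ground-truth mentions.
    p = n_correct / n_predicted
    r = n_correct / n_gold
    f1 = 2 * p * r / (p + r)
    return p, r, f1

# Hypothetical counts: 80 correct out of 100 predicted, 160 in gold.
p, r, f1 = precision_recall_f1(80, 100, 160)
```

With these made-up counts, precision is 0.8 and recall 0.5, and F1 (their harmonic mean) lands in between at about 0.615, closer to the lower of the two by design.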
Outline
- Introduction
- Challenges & Approach
- Entity Recognition and Typing
- Fine-grained Entity Typing (EMNLP’16)
- Joint Entity and Relation Extraction
- Summary and Future Work
From Coarse-Grained Typing to Fine-Grained Entity Typing
ID  Sentence
S1  Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice.

From a few common types (Person, Location, Organization) to a type hierarchy with 100+ types (from a knowledge base):
root: product, person, location, organization, …
person: politician, artist, businessman, …
artist: author, actor, singer, …

(Ling et al., 2012), (Nakashole et al., 2013), (Yogatama et al., 2015)
Problem Statement
How can we learn an effective model that predicts a single type-path in the hierarchy for each unlinkable entity mention, using the automatically-labeled training corpus?

Pipeline: text corpus → NER + distant supervision → labeled corpus → typing model → predictions for unlinkable mentions over the type hierarchy (root: product, person, location, organization, …; person: politician, artist, businessman, …; artist: author, actor, singer, …)
Current Distant Supervision: Context-Agnostic Labeling
(Entity types from knowledge base)

ID  Sentence
S1  Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice

Entity: Donald Trump; entity types assigned from the KB: person, artist, actor, author, businessman, politician

- Inaccurate labels in training data
- Prior work: treats all labels as “perfect”
My Solution: Partial Label Embedding (KDD’16)

ID  Sentence
S1  Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice

S1: Donald Trump; entity types: person, artist, actor, author, businessman, politician
Text features: TOKEN_Donald, CONTEXT:television, CONTEXT:season, TOKEN_trump, SHAPE:AA

Pipeline: extract text features → “label noise reduction” with PLE → “de-noised” labeled data → train classifiers on de-noised data → more effective classifiers → prediction on new data

(Ren et al., KDD’16) https://github.com/shanzhenren/PLE
PLE: Modeling Clean and Noisy Mentions Separately
79
- For a clean mention, its “positive types” should be ranked higher than all its “negative types”
- For a noisy mention, its “best candidate type” should be ranked higher than all its “non-candidate types”
Noisy entity mention (S1): Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice; Types in KB: person, artist, actor, author, businessman, politician
Types ranked: (+) actor 0.88, (+) artist 0.74, (+) person 0.55, (+) author 0.41, (+) politician 0.33, (+) businessman 0.31
“Best” candidate type: (+) actor, ranked above non-candidates (-) singer, (-) coach, (-) doctor, (-) location, (-) organization
(Ren et al., KDD’16)
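A minimal sketch of these two ranking constraints as a margin loss (the scores, margin, and candidate sets below are illustrative; the actual KDD’16 objective is defined over joint mention/feature/type embeddings):

```python
def partial_label_loss(scores, candidates, clean, margin=1.0):
    """Margin-based ranking sketch of the PLE constraints.
    scores: type -> model score for one mention
    candidates: candidate ("positive") types from distant supervision
    clean: if True, every candidate must outrank every non-candidate;
           if False (noisy mention), only the best-scoring candidate must."""
    negatives = [t for t in scores if t not in candidates]
    if clean:
        positives = list(candidates)
    else:
        positives = [max(candidates, key=lambda t: scores[t])]
    loss = 0.0
    for p in positives:
        for n in negatives:
            loss += max(0.0, margin - (scores[p] - scores[n]))
    return loss

scores = {"actor": 0.9, "artist": 0.4, "singer": 0.3, "location": -0.5}
cands = {"actor", "artist"}
noisy_loss = partial_label_loss(scores, cands, clean=False)  # only "actor" constrained
clean_loss = partial_label_loss(scores, cands, clean=True)   # both candidates constrained
```

The noisy case deliberately ignores low-scoring candidate types, so a wrong distant-supervision label (e.g. “artist” in a sentence about a TV show) does not pull the model toward it.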
Si: Ted Cruz; Types in KB: person, politician
[Figure: the same type-inference illustration: the test mention is embedded with its text-feature vectors, then a top-down nearest-neighbor search in the given type hierarchy selects the type path]
Type Inference in PLE
Type hierarchy (from knowledge base) (Ren et al., KDD’16)
PLE: Performance of Fine-Grained Entity Typing
81
- Raw: candidate types from distant supervision
- WSABIE (Google, ACL’15): joint feature and type embedding
- PTE: Predictive Text Embedding (MSR, WWW’15): joint mention, feature and type embedding
- Both WSABIE and PTE suffer from “noisy” training labels
- PLE (KDD’16): partial-label loss for context-aware labeling
Accuracy on different type levels:
         Level-1  Level-2  Level-3
Raw      0.70     0.45     0.05
WSABIE   0.79     0.49     0.14
PTE      0.78     0.51     0.19
PLE      0.81     0.62     0.48
Accuracy = (# mentions with all types correctly predicted) / (# mentions in the test set)
OntoNotes public dataset (Weischedel et al. 2011; Gillick et al. 2014): 13,109 news articles, 77 annotated documents, 89 entity types
https://github.com/shanzhenren/PLE
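This strict metric, where a mention counts only if its full predicted type set matches the gold set, is straightforward to compute; the gold/predicted sets below are made-up examples:

```python
def strict_accuracy(gold, pred):
    """Strict typing accuracy: a mention is correct only if its full
    predicted type set equals the gold type set."""
    assert len(gold) == len(pred)
    correct = sum(1 for g, p in zip(gold, pred) if set(g) == set(p))
    return correct / len(gold)

gold = [{"person", "politician"}, {"location"}, {"person", "artist"}]
pred = [{"person", "politician"}, {"organization"}, {"person", "artist"}]
acc = strict_accuracy(gold, pred)  # 2 of 3 mentions are exactly right
```

Strict accuracy is harsher than per-type precision/recall: a single missing or spurious fine-grained type makes the whole mention count as wrong, which is why Level-3 numbers drop so sharply.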
Outline
- Introduction
- Challenges & Approach
- Entity Recognition and Typing
- Fine-grained Entity Typing
- Joint Entity and Relation Extraction (WWW’17)
- Summary and Future Work
82
Problem Statement
83
Input corpus:
“American Airlines, a unit of AMR Corp., immediately matched the move, spokesman Tim Wagner said. United Airlines, a unit of UAL Corp., said the increase took effect Thursday night and applies to most routes ...”
Extracted entity-relation mentions:
Entity 1            Relation            Entity 2
American Airlines   is_subsidiary_of    AMR
Tim Wagner          is_employee_of      American Airlines
United Airlines     is_subsidiary_of    UAL
…                   …                   …
Previous Work
- Supervised relation extraction (RE) systems
- Hard to port to different kinds of corpora
- Pattern-based bootstrapping RE systems
- Focus on “explicit” relation mentions → limited recall
- Semantic drift
- Distantly-supervised RE systems
- Error propagation (cont.)
84
Mintz et al. Distant supervision for relation extraction without labeled data. ACL, 2009.
Etzioni et al. Web-scale information extraction in KnowItAll (preliminary results). WWW, 2004.
Surdeanu et al. Multi-instance multi-label learning for relation extraction. EMNLP, 2012.
Prior Work: An “Incremental” System Pipeline
85
Pipeline: entity mention detection → context-aware entity typing → relation mention detection → context-aware relation typing
Example: “The Women’s March was a worldwide protest on January 21, 2017.”
- Entity boundary errors: (women, protest) ✗ instead of (protest, January 21, 2017)
- Relation mention errors: “is a” ✗
- Entity type errors: person ✗
- Relation type errors: “is a” ✗
Error propagation cascading down the pipeline
(Mintz et al., 2009), (Riedel et al., 2010), (Hoffmann et al., 2011), (Surdeanu et al., 2012), (Nagesh et al., 2014), …
- Context-aware type modeling
- Model entity-relation interactions
My Solution: CoType (WWW’17)
86
(Ren et al. WWW’17)
- 1. Data-driven detection of entity and relation mentions
- Data-driven text segmentation
- Syntactic pattern learning from KBs
- 2. Joint typing of entity and relation mentions
Pipeline: entity mention detection → context-aware entity typing → relation mention detection → context-aware relation typing
https://github.com/shanzhenren/CoType
87
My Solution: Data-Driven Entity Mention Detection
Quality of merging: corpus-level concordance + syntactic quality
- Significance of a merging between two sub-phrases
Pattern examples: (J*)N* “support vector machine”; VP “tasted in”, “damage on”; VW*(P) “train a classifier with”
Good concordance: “The best BBQ I’ve tasted in Phoenix! I had the pulled pork sandwich with coleslaw and baked beans for lunch. … This place serves up the best cheese steak sandwich west of the Mississippi.”
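One common way to score the significance of merging two adjacent sub-phrases is to compare the observed frequency of the merged phrase against what chance co-occurrence would predict; the formula and the counts below are an illustrative sketch, not necessarily the exact measure used in the talk:

```python
import math

def merge_significance(freq_left, freq_right, freq_merged, total):
    """Significance of merging two adjacent sub-phrases: compare the
    observed count of the merged phrase to the count expected if the
    two sub-phrases co-occurred by chance."""
    expected = freq_left * freq_right / total
    return (freq_merged - expected) / math.sqrt(freq_merged)

# "support vector" + "machine" merge far more often than chance predicts.
high = merge_significance(500, 800, 450, 1_000_000)
# Two frequent but unrelated sub-phrases co-occur at roughly chance level.
low = merge_significance(500, 800, 2, 1_000_000)
```

A segmenter can greedily merge the adjacent pair with the highest significance and repeat, so “support vector machine” survives as one mention while incidental word pairs do not.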
CoType: Co-Embedding for Typing Entities and Relations
88
(Ren et al. WWW’17)
[Figure: a heterogeneous graph connects relation mentions (e.g., (“Barack Obama”, “US”, S1), (“Obama”, “Dream of My Father”, S2), (“Barack Obama”, “United States”, S3)), entity mentions (S1_”Barack Obama”, S1_”US”, S2_”Obama”, S2_”Dream of My Father”, S3_”Barack Obama”, S3_”United States”), text features (EM1_Obama, HEAD_Obama, TOKEN_States, BETWEEN_book, BETWEEN_president, CONTEXT_book, CONTEXT_president), entity types (person, politician, artist, LOC, ORG, None), and relation types (president_of, author_of, born_in, travel_to, None); these object interactions are embedded into separate low-dimensional vector spaces for entity mentions and relation mentions, with entity-relation interactions modeled across the two spaces]
Modeling Entity-Relation Interactions
Object “Translating” Assumption
For a relation mention z between entity arguments m1 and m2:
vec(m1) ≈ vec(m2) + vec(z)
89
[Figure: in the low-dimensional vector space, m1 = “USA” (country), m2 = “Washington D.C.” (city), z = capital_city_of form a positive relation triple, while a corrupted triple built from “France”/“Paris” serves as a negative relation triple]
(Bordes et al., NIPS’13), (Ren et al., WWW’17)
Error on a relation triple (z, m1, m2): how far the triple deviates from the translating assumption
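Under this assumption, the error of a triple can be taken as the distance ||vec(m1) − vec(m2) − vec(z)||, so positive triples should score lower than corrupted ones; the 2-D vectors below are made up for illustration:

```python
import numpy as np

def triple_error(m1, m2, z):
    """Error of a relation triple under vec(m1) ≈ vec(m2) + vec(z);
    smaller means a better fit to the translating assumption."""
    return float(np.linalg.norm(m1 - (m2 + z)))

# Made-up 2-D vectors: capital_city_of should carry "Washington D.C."
# onto "USA", but not "Paris" onto "USA".
usa = np.array([1.0, 1.0])
dc = np.array([0.2, 0.9])
paris = np.array([-0.8, 0.1])
capital_city_of = np.array([0.8, 0.1])

positive = triple_error(usa, dc, capital_city_of)      # fits well (low error)
negative = triple_error(usa, paris, capital_city_of)   # corrupted triple (high error)
```

Training with a margin between positive and corrupted triples (as in TransE, Bordes et al., NIPS’13) pushes relation types to act as consistent translations between entity-mention embeddings.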
Reducing Error Propagation: A Joint Optimization Framework
90
(Ren et al., WWW’17)
- Modeling entity-relation interactions
- Modeling types of relation mentions
- Modeling types of entity mentions
CoType: Comparing with State-of-the-Arts RE Systems
- Given candidate relation mentions, predict each mention’s relation type if it expresses a relation of interest; otherwise, output “None”
91
[Figure: precision-recall curves on the relation extraction task for DeepWalk, DS+Logistic, LINE, MultiR, CoType-RM, and CoType]
- DS+Logistic (Stanford, ACL’09): logistic classifier on DS
- MultiR (UW, ACL’11): handles inappropriate labels in DS
- DeepWalk (StonyBrook, KDD’14): homogeneous graph embedding
- LINE (MSR, WWW’15): joint feature & type embedding
- CoType-RM (WWW’17): only models relation mentions
- CoType (WWW’17): models entity-relation interactions
NYT public dataset (Riedel et al. 2010): 1.18M sentences in the corpus, 395 manually annotated sentences for evaluation, 24 relation types
https://github.com/shanzhenren/CoType
An Application to Life Sciences
92
Performance evaluation on BioInfer (Pyysalo et al., BMC Bioinformatics’07): relation classification accuracy = 61.7% (11%↑ over the best-performing baseline)
LifeNet: A Knowledge Exploration and Analytics System for Life Sciences
(Ren et al., ACL’17 demo)
Link to PubMed papers
LifeNet by Effort-Light StructMine: machine-created from 4 million+ PubMed papers; 1,000+ entity types; 400+ relation types; <1 hour on a single machine; 10,000x more facts
BioInfer network by human labeling (Pyysalo et al., 2007): human-created from 1,100 sentences; 94 protein-protein interactions; 2,500 man-hours; 2,662 facts
Outline
- Introduction
- Challenges & Approach
- Entity Recognition and Typing
- Fine-grained Entity Typing
- Joint Entity and Relation Extraction
- Summary and Future Work
93
Towards Learning Text Structures with Limited Supervision
94
[Figure: unlabeled data and labeled data yield noisy training data for model training, which gives unreliable predictions; priors are imposed as prior embeddings at the shallow input layer of the model (input layer → middle layers → output layer)]
Imposing priors at the input stage
Heterogeneous Supervision for Relation Extraction
- How to unify multiple sources of supervision (KB supervision, hand-crafted rules, crowd-sourced labels, etc.) on the same task?
95
(Liu et al., EMNLP’17)
Labeling functions Λ (distant supervision + hand-crafted rules):
- return died_in for <e1, e2, s> if DiedIn(e1, e2) in KB
- return born_in for <e1, e2, s> if match(‘ * born in * ’, s)
- return died_in for <e1, e2, s> if match(‘ * killed in * ’, s)
- return born_in for <e1, e2, s> if BornIn(e1, e2) in KB
Corpus D:
- c1: Robert Newton "Bob" Ford was an American outlaw best known for killing his gang leader Jesse James (e1) in Missouri (e2)
- c2: Hussein (e1) was born in Amman (e2) on 14 November 1935.
- c3: Gofraid (e1) died in 989, said to be killed in Dal Riata (e2).
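Labeling functions like these translate naturally into Python; the toy KB and the majority-vote conflict resolution below are a simplified stand-in (the EMNLP’17 model instead learns each function’s context-dependent reliability):

```python
import re
from collections import Counter

# Toy KB predicates standing in for the DiedIn/BornIn lookups.
KB_DIED_IN = {("Gofraid", "Dal Riata")}
KB_BORN_IN = {("Hussein", "Amman")}

def lf_kb_died(e1, e2, s):
    return "died_in" if (e1, e2) in KB_DIED_IN else None

def lf_kb_born(e1, e2, s):
    return "born_in" if (e1, e2) in KB_BORN_IN else None

def lf_pat_born(e1, e2, s):
    return "born_in" if re.search(r"\bborn in\b", s) else None

def lf_pat_killed(e1, e2, s):
    return "died_in" if re.search(r"\bkilled in\b", s) else None

LFS = [lf_kb_died, lf_kb_born, lf_pat_born, lf_pat_killed]

def annotate(e1, e2, s):
    """Apply every labeling function and resolve conflicts by majority
    vote; returns None when no function fires."""
    votes = []
    for lf in LFS:
        label = lf(e1, e2, s)
        if label is not None:
            votes.append(label)
    return Counter(votes).most_common(1)[0][0] if votes else None

print(annotate("Hussein", "Amman", "Hussein was born in Amman on 14 November 1935."))  # → born_in
```

Majority voting treats every function as equally reliable, which is exactly the weakness the next slide addresses: functions have different “expertise” on different subsets of instances.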
Uncover “Expertise” of Labeling Functions
- Multiple “labeling functions” annotate the same instance → how to resolve conflicts & redundancy?
- “Expertise” of each labeling function → the subset of instances that the labeling function is confident on
96
(Liu et al., EMNLP’17)
Towards Learning Text Structures with Limited Supervision
97
[Figure: as before, priors can be imposed as embeddings at the shallow input layers; additionally, priors act as regularizers in the output layers & loss functions, shaping model predictions at the output stage]
Indirect Supervision for Relation Extraction – using QA Pairs
- Questions & positive/negative answers
- Positive pairs → similar relation; negative pairs → distinct relations
98
(Wu et al., WSDM’18)
QA data format / example:
Type(“Jack”, “Germany”, A1) = Type(“Jack”, “Germany”, A2)
Type(“Jack”, “Germany”, A1) ≠ Type(“Jack”, “France”, A3)
Type(“Jack”, “Germany”, A2) ≠ Type(“Jack”, “France”, A3)
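These equality/inequality constraints can be expressed as a contrastive objective over relation-mention embeddings; the loss below is a hypothetical sketch, not the WSDM’18 formulation, and the vectors are invented:

```python
import numpy as np

def qa_pair_loss(pos_vecs, neg_vecs, margin=1.0):
    """Pull relation mentions sharing a positive answer together;
    push positive vs. negative answers at least `margin` apart."""
    loss = 0.0
    for i in range(len(pos_vecs)):           # positive-positive: same relation
        for j in range(i + 1, len(pos_vecs)):
            loss += float(np.linalg.norm(pos_vecs[i] - pos_vecs[j]) ** 2)
    for p in pos_vecs:                       # positive-negative: distinct relations
        for n in neg_vecs:
            gap = margin - float(np.linalg.norm(p - n))
            loss += max(0.0, gap) ** 2
    return loss

# A1, A2 answer the question positively; A3 is a negative answer.
a1, a2, a3 = np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([2.0, 0.0])
well_separated = qa_pair_loss([a1, a2], [a3])
collapsed = qa_pair_loss([a1, a2], [np.array([0.2, 0.0])])
```

Minimizing this term alongside the main extraction loss lets QA pairs supervise the relation space indirectly, without labeling relation types on the sentences themselves.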
Indirect Supervision for Relation Extraction – using QA Pairs
- Questions → positive / negative answers
- Positive pairs → similar relation; negative pairs → distinct relations
99
(Wu et al., WSDM’18)
Towards Learning Text Structures with Limited Supervision
100
[Figure: labeled data flows through input layer → middle layers → output layer to (interpretable) predictions, with priors encoded as network structures at the model stage]
Neural-symbolic Learning for NLP
101
Matched textual patterns + related knowledge graph structures → composing graph networks → generating GN-blocks → SoftMax classifier