CSCI 699: Machine Learning for Knowledge Extraction and Reasoning - - PowerPoint PPT Presentation



SLIDE 1

CSCI 699: Machine Learning for Knowledge Extraction and Reasoning

Instructor: Xiang Ren www-bcf.usc.edu/~xiangren/ml4know19spring USC Computer Science

SLIDE 2

About the Instructor

  • Asst. Professor of Computer Science, affiliated faculty at ISI; PhD @ UIUC
  • Research interest: the intersection of machine learning & NLP (making sense of massive corpora)
  • Problems: information extraction (sequence labeling, structured prediction), knowledge base construction, knowledge reasoning, graph representation learning
  • Methods: weak & indirect supervision for modeling sequence- & graph-structured data

SLIDE 3

Self-introduction!

  • Name
  • Background
  • Your research!
  • What brings you to the class? :)

SLIDE 4

Course Format

  • Lectures on basic concepts and prior & current methods
  • Presentations on recent research papers
  • Projects on novel information extraction methods/applications
  • Assignments on model implementation

SLIDE 5

Schedule

  • First 5 weeks – 3 hrs of lectures only
  • Next 8 weeks – 1.5 hrs of lectures + 3 paper presentations (30 min each, including Q&A)
    • Including 3 guest lectures
  • Last week – project presentations (20 min each, including Q&A)

SLIDE 6

10-min Breaks

  • 3:00–3:10pm
  • 4:10–4:20pm

SLIDE 7

Homework Assignments (2x)

  • Practice implementation skills for information extraction techniques
    • Named entity recognition
    • Relation extraction
  • Python is preferred (C++ also accepted)
  • Report & GitHub repo with code/README
  • 10% of grade each

SLIDE 8

Paper Presentation

  • Each student picks one paper from the reading list (we will follow up with a Google spreadsheet)
  • You can also suggest papers you want to present!
  • Prepare a 30-min whiteboard or slides presentation, including 3–5 min of Q&A

SLIDE 9

Project

Come up with a novel model/application of information extraction (in your own domain), conduct experiments to validate your idea.

  • Project proposal (~500 words, PDF), due Week 3
  • Check-point 1 (15%): survey paper (2-page, double-column), due Week 4
  • Check-point 2 (15%): mid-term report (3-page, double-column), due Week 10
  • Project presentation: 20-min presentation including Q&A, Week 15
  • Check-point 3 (35%): final report (8-page, double-column, including GitHub repo), due Dec 12

SLIDE 10

Evaluation

  • Homework assignments (20%): 2x 10%
  • Project (80%):
    • Project survey (15%)
    • Project mid-term report (15%)
    • Project presentation (15%)
    • Project final report (35%)

SLIDE 11

Logistics

  • Instructor: Xiang Ren
    • Email: xiangren@usc.edu
    • Office: SAL 308
    • Office hour: by appointment
  • TA: TBD
  • Course website
    • www-bcf.usc.edu/~xiangren/ml4know19spring
  • Homework submission (PDF and hyperlinks):
    • Blackboard (or by email if there’s an issue uploading)

SLIDE 12

Today’s Lecture

  • Overview of Knowledge Extraction & Reasoning: tasks, methods, and applications
  • Overview of my research on Effort-Light Knowledge Extraction

SLIDE 13

Knowledge Extraction and Reasoning: An Overview

CSCI 699: Machine Learning for Knowledge Extraction & Reasoning

Instructor: Xiang Ren USC Computer Science

SLIDE 14

The Era of Big Data

SLIDE 15

Growth of Unstructured Text Data

Unstructured data, primarily text, accounts for more than 80% of the data collected by organizations.

Unstructured Data and the 80 Percent Rule, Seth Grimes, Clarabridge Bridgepoints, 2008 Q3.

SLIDE 16

Knowledge in “Big Text Data”

  • News, social media posts, web pages, … → get an overview of recent news events
  • Financial reports, medical records, legal acts, … → obtain insights from data for decision support
  • Customer reviews, tech support memos, field service notes, … → summarize user feedback for quality control

SLIDE 17

Text is accessible, but knowledge in text is not machine-readable …

SLIDE 18

Structures Bring Analytic Power

Structures (databases, networks) → database management, exploration & analysis → insights & knowledge

SLIDE 19

Turning Unstructured Text Data into Structures

Unstructured text data (accounting for ~80% of all data in organizations) → ? → structures → knowledge & insights

(Chakraborty, 2016)

SLIDE 20

Information Extraction

  • Can computational systems extract structured, factual information from unstructured or semi-structured data, and represent it in a machine-readable form?

SLIDE 21

Entity Structures

Can computational systems identify real-world entities of different categories from given corpora?

text corpus: “Criticism of government response to the hurricane …”

Location: New Orleans, Louisiana, Washington DC, …
Person: Ray Nagin (Mayor), President Bush, …
Organization: United States, Red Cross, US government, …

SLIDE 22

Relation Structures

Can computational systems capture different relations between the entities from given corpora?

text corpus: “American Airlines, a unit of AMR Corp., immediately matched the move, spokesman Tim Wagner said. United Airlines, a unit of UAL Corp., said the increase took effect Thursday night.”

Entity 1 | Relation | Entity 2
American Airlines | is_subsidiary_of | AMR
Tim Wagner | is_employee_of | American Airlines
United Airlines | is_subsidiary_of | UAL
… | … | …
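The mapping from sentence to relation triples can be sketched with a hand-written surface pattern. This toy matcher is an illustrative assumption, not a system from the course; it recovers the is_subsidiary_of rows from the sentence above:

```python
import re

TEXT = ("American Airlines, a unit of AMR Corp., immediately matched the move, "
        "spokesman Tim Wagner said. United Airlines, a unit of UAL Corp., "
        "said the increase took effect Thursday night.")

# One hypothetical surface pattern: "<E1>, a unit of <E2>" -> is_subsidiary_of.
# Real extractors learn such patterns or use parsers rather than hand-writing them.
SUBSIDIARY = re.compile(
    r"([A-Z]\w+(?: [A-Z]\w+)*), a unit of ([A-Z]\w+(?: [A-Z][\w.]+)*)")

def extract(text):
    """Return (entity1, relation, entity2) triples found by the pattern."""
    return [(m.group(1), "is_subsidiary_of", m.group(2))
            for m in SUBSIDIARY.finditer(text)]
```

A single regex only covers one phrasing; the point is that relation structures are (entity, relation, entity) triples grounded in text spans.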

SLIDE 23

Event Structures

Can computational systems identify real-world events of different types from given corpora?

text corpus: “Criticism of government response to the hurricane …”

Terrorism template: TYPE = ROBBERY; DATE = 07 JAN 90; LOCATION = CHILE: MOLINA (CITY)

SLIDE 24

What is “Information Extraction”?

As a task: filling slots in a database from sub-segments of text.

October 14, 2002, 4:00 a.m. PT. For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…

Slots to fill: NAME, TITLE, ORGANIZATION

SLIDE 25

What is “Information Extraction”?

As a task: filling slots in a database from sub-segments of text. Running IE on the same passage fills the table:

NAME | TITLE | ORGANIZATION
Bill Gates | CEO | Microsoft
Bill Veghte | VP | Microsoft
Richard Stallman | founder | Free Soft..

SLIDE 26

What is “Information Extraction”?

Information Extraction = segmentation + classification + clustering + association

As a family of techniques, applied to the same passage, segmentation (aka “named entity extraction”) yields: Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation.

SLIDE 29

What is “Information Extraction”?

Information Extraction = segmentation + classification + association + clustering

Applied to the same passage, association then groups the extracted segments into records:

NAME | TITLE | ORGANIZATION
Bill Gates | CEO | Microsoft
Bill Veghte | VP | Microsoft
Richard Stallman | founder | Free Soft..

SLIDE 30

StructNet: Structured Network of Factual Knowledge

Example: a network built from a news article, with nodes such as agency, person, location, organization, and URL.

  • Nodes: entities of different entity types
  • Edges: relationships of different relation types
SLIDE 31

A Product Use Case: Finding “Interesting Collections of Hotels” (Technology Transfer to TripAdvisor)

Grouping hotels based on structured facts extracted from the review text.

Features for the “Catch a Show” collection: 1. broadway shows 2. beacon theater 3. broadway dance center 4. broadway plays 5. david letterman show 6. radio city music hall 7. theatre shows

Features for the “Near The High Line” collection: 1. high line park 2. chelsea market 3. highline walkway 4. elevated park 5. meatpacking district 6. west side 7. old railway

http://engineering.tripadvisor.com/using-nlp-to-find-interesting-collections-of-hotels/

SLIDE 32

Better Question Answering with Reasoning Capability

SLIDE 33

A Life Science Use Case: Identifying “Distinctively Related Entities”


Collaborate with UCLA Heart BD2K Center & Mayo Clinic

What proteins are distinctively associated with Cardiomyopathy?

http://www.igb.illinois.edu/news/harnessing-power-big-data-revolution-genomic-data-analysis

SLIDE 34

Citation prediction for scientific papers

Input: paper titles, abstracts & bibliographic data. Node types: paper, technique, dataset, application, author, venue. Systems: FacetGist [CIKM’16]; ClusCite: Citation Recommendation by Information Network-Based Clustering [KDD’14]. Given a new manuscript, the system suggests papers to cite with confidence scores (e.g., 0.8, 0.65, 0.52).

SLIDE 35

Corpus-specific StructNet Construction

Example: a network built from a news article, with nodes such as agency, person, location, organization, and URL.

How can we automate the construction of StructNets from given text corpora?

SLIDE 36

IE History

Pre-Web

  • Mostly news articles
  • De Jong’s FRUMP [1982]
  • Hand-built system to fill Schank-style “scripts” from news wire
  • Message Understanding Conference (MUC) DARPA [’87-’95], TIPSTER [’92-’96]
  • Early work dominated by hand-built models
  • E.g. SRI’s FASTUS, hand-built FSMs.
  • But by the 1990’s, some machine learning: Lehnert, Cardie, Grishman, and then HMMs: Elkan [Leek ’97], BBN [Bikel et al. ’98]

Web

  • AAAI ’94 Spring Symposium on “Software Agents”
  • Much discussion of ML applied to Web. Maes, Mitchell, Etzioni.
  • Tom Mitchell’s WebKB, ‘96
  • Build KB’s from the Web.
  • Wrapper Induction
  • Initially hand-built, then ML: [Soderland ’96], [Kushmerick ’97], …
  • Citeseer; Cora; FlipDog; contEd courses, corpInfo, …
SLIDE 37

IE History

Biology

  • Gene/protein entity extraction
  • Protein/protein interaction facts
  • Automated curation/integration of databases
  • At CMU: SLIF (Murphy et al., subcellular information from images + text in journal articles)

Email

  • EPCA, PAL, RADAR, CALO: intelligent office assistant that “understands” some part of email
  • At CMU: web site update requests, office-space requests, calendar scheduling requests, social network analysis of email

SLIDE 38

IE is different in different domains!

Example: on the Web there is less grammar, but more formatting & linking. The directory structure, link structure, formatting & layout of the Web is its own new grammar.

Newswire: Apple to Open Its First Retail Store in New York City. MACWORLD EXPO, NEW YORK--July 17, 2002--Apple's first retail store in New York City will open in Manhattan's SoHo district on Thursday, July 18 at 8:00 a.m. EDT. The SoHo store will be Apple's largest retail store to date and is a stunning example of Apple's commitment to offering customers the world's best computer shopping experience. "Fourteen months after opening our first retail store, our 31 stores are attracting over 100,000 visitors each week," said Steve Jobs, Apple's CEO. "We hope our SoHo store will surprise and delight both Mac and PC users who want to see everything the Mac can do to enhance their digital lifestyles."

Web: www.apple.com/retail, www.apple.com/retail/soho, www.apple.com/retail/soho/theatre.html

SLIDE 39

Landscape of IE Tasks (1/4): Degree of Formatting

A spectrum: text paragraphs without formatting; grammatical sentences and some formatting & links; non-grammatical snippets, rich formatting & links; tables.

Plain-text example: “Astro Teller is the CEO and co-founder of BodyMedia. Astro holds a Ph.D. in Artificial Intelligence from Carnegie Mellon University, where he was inducted as a national Hertz fellow. His M.S. in symbolic and heuristic computation and B.S. in computer science are from Stanford University. His work in science, literature and business has appeared in international media from the New York Times to CNN to NPR.”

SLIDE 40

Landscape of IE Tasks (2/4): Intended Breadth of Coverage

A spectrum: web site specific (formatting; e.g., Amazon.com book pages) → genre specific (layout; e.g., resumes) → wide, non-specific (language; e.g., university names)

SLIDE 41

Landscape of IE Tasks (3/4): Complexity of word patterns

  • Closed set (U.S. states): “He was born in Alabama…”, “The big Wyoming sky…”
  • Regular set (U.S. phone numbers): “Phone: (413) 545-1323”, “The CALD main office can be reached at 412-268-1299”
  • Complex pattern (U.S. postal addresses): “University of Arkansas, P.O. Box 140, Hope, AR 71802”, “Headquarters: 1128 Main Street, 4th Floor, Cincinnati, Ohio 45210”
  • Ambiguous patterns, needing context and many sources of evidence (person names): “…was among the six houses sold by Hope Feldman that year.”, “Pawel Opalinski, Software Engineer at WhizBang Labs.”

SLIDE 42

IE: The Broader View

Pipeline: spider a document collection → filter by relevance → IE (segment, classify, associate, cluster) → load database → query, search, data mine. Supporting steps: create ontology, label training data, train extraction models.

SLIDE 43

Knowledge Graphs are Not Complete

(Figure: a knowledge graph around “Band of Brothers”, with nodes such as HBO, Mini-Series, Graham Yost, Michael Kamen, United States, Neal McDonough, English, Tom Hanks, Actor, Caesars Entertain…, and relations such as tvProgramCreator, tvProgramGenre, writtenBy, music, countryOfOrigin, nationality-1, awardWorkWinner, castActor, profession, personLanguages, countrySpokenIn-1, serviceLocation-1, serviceLanguage.)

SLIDE 44

Benefits of Knowledge Graph

  • Support various applications
  • Structured Search
  • Question Answering
  • Dialogue Systems
  • Relation Extraction
  • Summarization
  • Knowledge Graphs can be constructed via information extraction from text, but there will be a lot of missing links.
  • Goal: complete the knowledge graph.

SLIDE 45

Reasoning on Knowledge Graph

Query node: Band of Brothers. Query relation: tvProgramLanguage. Query: tvProgramLanguage(Band of Brothers, ?)

(Figure: the same “Band of Brothers” knowledge graph as above, with the queried tvProgramLanguage edge missing.)

SLIDE 46

KB Reasoning Tasks

  • Predicting the missing link: given e1 and e2, predict the relation r.
  • Predicting the missing entity: given e1 and relation r, predict the missing entity e2.
  • Fact prediction: given a triple, predict whether it is true or false.

SLIDE 47

Knowledge Base Reasoning

  • Question: can we infer missing links based on background KB?
  • Path-based methods
    • Path-Ranking Algorithm (PRA), Lao et al., 2011
    • RNN + PRA, Neelakantan et al., 2015
    • Chains of Reasoning, Das et al., 2017
  • Embedding-based methods
    • RESCAL, Nickel et al., 2011
    • TransE, Bordes et al., 2013
    • TransR/CTransR, Lin et al., 2015
  • Integrating path- and embedding-based methods
    • DeepPath, Xiong et al., 2017
    • MINERVA, Das et al., 2018
    • DIVA, Chen et al., 2018
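As one concrete instance, the TransE method listed above scores a triple (h, r, t) by how well h + r ≈ t holds in embedding space. The sketch below only illustrates the scoring function, not training; the embeddings are toy assumptions, with the relation vector constructed so the true fact scores (near) zero:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style score: negative L1 distance between h + r and t."""
    return -np.linalg.norm(h + r - t, ord=1)

rng = np.random.default_rng(0)
dim = 16
# Toy entity embeddings (random, purely illustrative).
ent = {name: rng.normal(size=dim)
       for name in ["BandOfBrothers", "English", "French"]}
# Toy relation vector chosen so tvProgramLanguage(BandOfBrothers, English) holds.
rel = {"tvProgramLanguage": ent["English"] - ent["BandOfBrothers"]}

true_score = transe_score(ent["BandOfBrothers"],
                          rel["tvProgramLanguage"], ent["English"])
false_score = transe_score(ent["BandOfBrothers"],
                           rel["tvProgramLanguage"], ent["French"])
```

Ranking candidate tails by this score is exactly the "predict the missing entity" task from the previous slide.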

SLIDE 48

Traditional Rule-Based Systems

Domain experts handcraft extraction rules over a text corpus, e.g. the rule “… cities such as NPList …”: applied to “The tour includes major cities such as [New York], [Los Angeles], and [Dallas]”, each NPList[i] is extracted as a City entity.
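A minimal sketch of such a hand-crafted rule, assuming a simple regex matcher (the rule template and sentence come from the slide; the function and variable names are made up):

```python
import re

SENTENCE = ("The tour includes major cities such as New York, "
            "Los Angeles, and Dallas.")

# Hand-crafted rule: "cities such as NP, NP, and NP" -> each NP is a City.
RULE = re.compile(
    r"cities such as ([A-Z][\w ]+(?:, [A-Z][\w ]+)*,? and [A-Z][\w ]+)")

def extract_cities(sentence):
    """Return the noun phrases captured by the 'cities such as' rule."""
    m = RULE.search(sentence)
    if not m:
        return []
    # Split the captured NP list on ", " and ", and " separators.
    return [np.strip() for np in re.split(r",? and |, ", m.group(1))]
```

Each extracted phrase would then be typed as City, exactly as in the slide's example.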

SLIDE 49

Supervised Machine Learning-Based Systems (state of the art)

Example: “[San Francisco], in northern California, is a hilly city on the tip of a peninsula.”

Domain experts provide manually-annotated training data and engineered features to train a machine-learning model.

SLIDE 50

Effort-Light Knowledge Extraction

CSCI 699: Introduction to Information Extraction

Instructor: Xiang Ren USC Computer Science

SLIDE 51

Text data are often highly variable… (grammar, vocabularies, gazetteers)

  • Domain: CS papers ↔ biomedical papers
  • Genre: news articles ↔ tweets
  • Language: English ↔ Arabic

SLIDE 52

However, text data are often highly variable…

Domain experts must redo manual data annotation & complex feature generation for each new corpus (English news, Arabic web forum posts, life science literature, …):

  • Low efficiency
  • Subjective
  • Costly
  • Limited scale
SLIDE 53

Prior Art in NLP: Extracting Structures with Repeated Human Effort

Pipeline: text corpus → human labeling → labeled data → extraction rules / machine-learning models → structured facts.

Example review: “This hotel is my favorite Hilton property in NYC! It is located right on 42nd street near Times Square, it is close to all subways, Broadway shows, and next to many great …” → structured facts: Times Square hotel, Broadway shows, NYC, Hilton property.

Another example: “The June 2013 Egyptian protests were mass protest events that occurred in Egypt on 30 June 2013, …”

Systems built this way: Stanford CoreNLP, CMU NELL, UW KnowItAll, USC AMR, IBM Alchemy APIs, Google Knowledge Graph, Microsoft Satori, …

SLIDE 54

Our Research: Effort-Light StructMine

Pipeline: text corpus (news articles, PubMed papers, …) + knowledge bases (KB) → corpus-specific models → entity & relation structures.

  • Enables quick development of applications over various corpora
  • Extracts complex structures without introducing human errors

SLIDE 55

External Knowledge Bases as “Distant Supervision”

Text corpus ↔ knowledge bases (KBs): overlapping factual information (entity names, entity types, relationships, etc.). 1% of 10M sentences → 100K labeled sentences!
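As a sketch, distant supervision can be as simple as projecting KB facts onto the corpus by string match (the KB dictionary and the matching strategy below are toy assumptions; real pipelines also link aliases and resolve ambiguous names):

```python
# Toy KB of (entity name -> entity type) facts, purely illustrative.
KB = {"San Francisco": "LOCATION", "Barack Obama": "PERSON"}

def distant_label(sentence, kb):
    """Return (mention, type, char_offset) labels found by KB string match."""
    labels = []
    for name, etype in kb.items():
        start = sentence.find(name)
        if start != -1:
            labels.append((name, etype, start))
    return labels

sent = "Barack Obama arrived in San Francisco this afternoon."
labels = distant_label(sent, KB)
```

Scaled to a large corpus, this is how a small overlap with the KB yields a large partially-labeled training set with no manual annotation.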


SLIDE 57

Co-occurrence patterns between text units bring semantic power

Example: in the training corpus, “… a speech was delivered by United States President Barack Obama.” and “President Vladimir Putin delivers a speech during …” place United States, speech, president, Barack Obama, and politician near each other in a low-dimensional semantic space, supporting prediction on unseen entities.

SLIDE 58

A Cold-Start Factual Structure Mining (StructMine) Framework

Pipeline: text corpus → data-driven text segmentation → candidate factual structures & text units/features → distant supervision → partially-labeled training corpus → learn semantic spaces → extract factual structures from the remaining unlabeled corpus.

SLIDE 59

Effort–Light StructMine: Where Are We?

Methods by human labeling effort and feature engineering effort: hand-crafted methods, supervised learning methods, weakly-supervised learning methods, distantly-supervised learning methods.

Representative systems: UCB Hearst Pattern, 1992; NYU Proteus, 1997; Stanford CoreNLP, 2005–present; UT Austin Dependency Kernel, 2005; IBM Watson Language APIs; UW KnowItAll / Open IE, 2005–present; Max-Planck YAGO, 2008–present; CMU NELL, 2009–present; U Washington FIGER / MultiR, 2012; Stanford Snorkel / MIML-RE, 2012–present.

Effort-Light StructMine (KDD’15, ’16, ’17; WWW’15, ’17, ’18; EMNLP’16, ’17; …)

SLIDE 60

The Roadmap for Corpus-Specific StructNet Construction

Text corpus → Entity Recognition and Typing (KDD’15) → Fine-Grained Entity Typing (EMNLP’16) → Joint Entity and Relation Extraction (WWW’17) → corpus-specific StructNet.

SLIDE 61

Outline

  • Introduction
  • Challenges & Approach
  • Entity Recognition and Typing (KDD’15)
  • Fine-grained Entity Typing
  • Joint Entity and Relation Extraction
  • Future Work
  • Summary

SLIDE 62


What is Entity Recognition and Typing

  • Identify token spans of entity mentions in text, and classify them into types of interest

“[Barack Obama] arrived this afternoon in [Washington, D.C.]. [President Obama]’s wife [Michelle] accompanied him.” “[TNF alpha] is produced chiefly by activated [macrophages].”

SLIDE 63


What is Entity Recognition and Typing

  • Identify token spans of entity mentions in text, and classify them into types of interest

“[TNF alpha]PROTEIN is produced chiefly by activated [macrophages]CELL.” “[Barack Obama]PERSON arrived this afternoon in [Washington, D.C.]LOCATION. [President Obama]PERSON’s wife [Michelle]PERSON accompanied him.”
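In sequence-labeling formulations of this task, typed spans like those above become per-token BIO tags. A minimal sketch (the BIO scheme is standard; this helper and its whitespace tokenization are illustrative assumptions):

```python
def bio_encode(tokens, spans):
    """spans: list of (start, end_exclusive, TYPE) over token indices."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = "B-" + etype          # first token of the mention
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # continuation tokens
    return tags

tokens = ["Barack", "Obama", "arrived", "in", "Washington", ",", "D.C."]
tags = bio_encode(tokens, [(0, 2, "PERSON"), (4, 7, "LOCATION")])
# -> ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-LOCATION', 'I-LOCATION', 'I-LOCATION']
```

A sequence model (HMM, CRF, neural tagger) is then trained to predict these tags token by token.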

SLIDE 64

Traditional Named Entity Recognition (NER) Systems

  • Reliance on large amounts of manually-annotated data
  • Slow model training: often slower than O(#words × #features × #classes)

(Figures: a manual annotation interface; a NER system pipeline)

Finkel et al., Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling, ACL 2005

SLIDE 65

Weak-Supervision Systems: Pattern-Based Bootstrapping

  • Requires manual seed selection & mid-point checking

Bootstrapping loop: annotate corpus using seed entities → generate candidate patterns → score candidate patterns → select top patterns → apply patterns to find new entities.

Example seeds for Food: Pizza, French Fries, Hot Dog, Pancake, … Learned patterns for Food: “the best <X> I’ve tried”, “in their <X>”, “<X> tastes amazing”, …

e.g., (Etzioni et al., 2005), (Talukdar et al., 2010), (Gupta et al., 2014), (Mitchell et al., 2015), … Systems: CMU NELL, UW KnowItAll, Stanford DeepDive, Max-Planck PROSPERA, …
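One round of the loop above can be sketched as follows (the corpus, seeds, and whole-sentence pattern templates are toy assumptions; real systems score and filter candidate patterns before applying them):

```python
import re

corpus = [
    "the best pizza I've tried in town",
    "the best ramen I've tried in town",
    "their pancake tastes amazing",
    "their sushi tastes amazing",
]
seeds = {"pizza", "pancake"}

def bootstrap_round(corpus, seeds):
    # 1) Turn sentences containing a seed into candidate patterns.
    patterns = set()
    for sent in corpus:
        for seed in seeds:
            if seed in sent:
                patterns.add(sent.replace(seed, "<X>"))
    # 2) Apply the patterns back to the corpus to harvest new entities.
    new_entities = set()
    for pat in patterns:
        regex = re.escape(pat).replace(re.escape("<X>"), r"(\w+)")
        for sent in corpus:
            m = re.fullmatch(regex, sent)
            if m:
                new_entities.add(m.group(1))
    return new_entities - seeds
```

Each round's harvested entities become seeds for the next round, which is why unchecked bootstrapping drifts and needs the mid-point checking mentioned above.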

SLIDE 66

Leveraging Distant Supervision

1. Detect entity names from text. 2. Match name strings to KB entities. 3. Propagate types to the un-matchable names.

ID | Sentence
S1 | “Phoenix is my all-time favorite dive bar in New York City.”
S2 | “The best BBQ I’ve tasted in Phoenix.”
S3 | “Phoenix has become one of my favorite bars in NY.”

KB matching yields labels such as Location for “New York City” and Food for “BBQ”; types for the un-matchable names (“Phoenix”, “NY”) are propagated through shared relation phrases (“is my all-time favorite dive bar in”, “tasted in”, “has become one of my favorite bars in”).

(Lin et al., 2012), (Ling et al., 2012), (Nakashole et al., 2013)

SLIDE 67

Current Distant Supervision: Limitation I

  • 1. Context-agnostic type prediction: types are predicted for each mention regardless of context
  • 2. Sparsity of contextual bridges

ID | Sentence
S1 | “Phoenix is my all-time favorite dive bar in New York City.”
S2 | “The best BBQ I’ve tasted in Phoenix.”
S3 | “Phoenix has become one of my favorite bars in NY.”

SLIDE 68

Current Distant Supervision: Limitation II

  • 1. Context-agnostic type prediction
  • 2. Sparsity of contextual bridges: some relational phrases are infrequent in the corpus → ineffective type propagation

ID | Sentence
S1 | “Phoenix is my all-time favorite dive bar in New York City.”
S3 | “Phoenix has become one of my favorite bars in NY.”

SLIDE 69

My Solution: ClusType (KDD’15)

ID | Segmented Sentence
S1 | “Phoenix is my all-time favorite dive bar in New York City.”
S2 | “The best BBQ I’ve tasted in Phoenix.”
S3 | “Phoenix has become one of my favorite bars in NY.”

Putting two sub-tasks together: 1. type-label propagation; 2. relation phrase clustering.

(Figure: a heterogeneous graph linking entity mentions (S1: New York City; S1/S2/S3: Phoenix; S2: BBQ; S3: NY) to candidate entity names and relation phrases such as “is my all-time favorite dive bar in”, “tasted in”, “has become one of my favorite bars in”; edges represent object interactions, similar relation phrases, and correlated mentions.)

https://github.com/shanzhenren/ClusType

SLIDE 70

Type Propagation in ClusType

Smoothness assumption: if two objects are similar according to the graph (edge weight / object similarity W_ij between feature vectors f_i and f_j), then their type labels should also be similar.

(Belkin & Niyogi, NIPS’01), (Ren et al., KDD’15)
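A toy sketch of graph-based label propagation under this smoothness assumption (the tiny graph, seed matrix, and generic iterative update are illustrative assumptions, not ClusType's exact objective):

```python
import numpy as np

def propagate(W, Y, alpha=0.8, iters=50):
    """W: (n, n) symmetric edge weights; Y: (n, k) seed label indicators."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / d[:, None]                       # row-normalized adjacency
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y  # smooth over graph + clamp seeds
    return F

# Tiny graph: node 0 is a seeded Location mention; nodes 1 and 2 are
# un-matchable mentions connected to it through a shared relation phrase.
W = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)
Y = np.array([[1, 0], [0, 0], [0, 0]], float)  # label columns: Location, Food
F = propagate(W, Y)
```

After convergence, the unlabeled nodes inherit the Location label through their edges, which is exactly how types reach the "un-matchable" names.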

SLIDE 71

Relation Phrase Clustering in ClusType

The two subtasks mutually enhance each other. Two relation phrases should be grouped together if they have: 1. similar strings; 2. similar context; 3. similar types for their entity arguments (“multi-view” clustering).

Example: “is my all-time favorite dive bar in” and “has become one of my favorite bars in” are similar relation phrases, so the Location label propagates between their argument mentions.

(Ren et al., KDD’15)

SLIDE 72

ClusType: Comparing with State-of-the-Art Systems (F1 Score)

Methods | NYT | Yelp | Tweet
Pattern (Stanford, CoNLL’14; bootstrapping) | 0.301 | 0.199 | 0.223
SemTagger (Utah, ACL’10; bootstrapping) | 0.407 | 0.296 | 0.236
NNPLB (UW, EMNLP’12; label propagation) | 0.637 | 0.511 | 0.246
APOLLO (THU, CIKM’12; label propagation) | 0.795 | 0.283 | 0.188
FIGER (UW, AAAI’12; classifier with linguistic features) | 0.881 | 0.198 | 0.308
ClusType (KDD’15) | 0.939 | 0.808 | 0.451

Precision (P) = #correctly-typed mentions / #system-recognized mentions; Recall (R) = #correctly-typed mentions / #ground-truth mentions; F1 = 2 × P × R / (P + R).

Datasets: NYT: 118k news articles (1k manually labeled for evaluation); Yelp: 230k business reviews (2.5k manually labeled); Tweet: 302k tweets (3k manually labeled).

  • vs. bootstrapping: context-aware prediction on “un-matchable” mentions
  • vs. label propagation: groups similar relation phrases
  • vs. FIGER: no reliance on complex feature engineering

https://github.com/shanzhenren/ClusType
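The span-level metric reported in the table can be sketched as follows (representing mentions as (start, end, type) tuples is an illustrative assumption):

```python
def prf1(predicted, gold):
    """predicted, gold: sets of (start, end, type) mention spans."""
    correct = len(predicted & gold)               # exact span + type matches
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

pred = {(0, 2, "PER"), (4, 6, "LOC"), (8, 9, "ORG")}
gold = {(0, 2, "PER"), (4, 6, "LOC"), (10, 11, "ORG")}
p, r, f = prf1(pred, gold)
# -> P = 2/3, R = 2/3, F1 = 2/3
```

Because both span boundaries and the type must match exactly, this metric penalizes segmentation errors and typing errors alike.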

SLIDE 73

Outline

  • Introduction
  • Challenges & Approach
  • Entity Recognition and Typing
  • Fine-grained Entity Typing (EMNLP’16)
  • Joint Entity and Relation Extraction
  • Summary and Future Work

SLIDE 74


From Coarse-Grained Typing to Fine-Grained Entity Typing

ID | Sentence
S1 | “Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice.”

From a few common types (Person, Location, Organization) to a type hierarchy with 100+ types from a knowledge base: root → product, person, location, organization, …; person → politician, artist, businessman, …; artist → author, actor, singer, …

(Ling et al., 2012), (Nakashole et al., 2013), (Yogatama et al., 2015)

slide-75
SLIDE 75

Problem Statement

75

Text corpus → (NER + distant supervision) → labeled corpus → typing model → predictions for unlinkable mentions

  • How to learn an effective model to predict a single type-path (in the hierarchy root → {product, person, location, organization, …}; person → {politician, artist, businessman, …}; artist → {author, actor, singer, …}) for each unlinkable entity mention, using the automatically-labeled training corpus?

slide-76
SLIDE 76

Current Distant Supervision: Context-Agnostic Labeling

76

Entity types from knowledge base — type hierarchy: root → {person, location, organization, …}; person → {politician, artist, businessman, …}; artist → {author, actor, singer, …}

Entity: Donald Trump. S1: Donald Trump. Entity types (from KB): person, artist, actor, author, businessman, politician

ID | Sentence
S1 | Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice

  • Inaccurate labels in training data
  • Prior work: all labels are “perfect”
slide-77
SLIDE 77

Type Inference in PLE

Type hierarchy (in knowledge base): root → {product, person, location, organization, …}; person → {politician, artist, businessman, …}; artist → {author, actor, singer, …}

ID | Sentence
Si | President Trump gave an all-hands address to troops at the U.S. Central Command headquarters …

Test mention: Si_Trump. The vectors of its text features (President, gave, speech) are summed, and type inference performs a top-down nearest-neighbor search in the given type hierarchy, within a low-dimensional vector space where related objects (e.g., president, politician, person, senator; gave, address; actor, star, play) lie close together.

(Ren et al., KDD’16)
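The top-down nearest-neighbor search can be sketched as follows. This is a minimal sketch with assumed names and toy 2-d embeddings: it greedily descends the hierarchy to a leaf by cosine similarity, whereas PLE’s actual inference may stop early (e.g., via a similarity threshold):

```python
import numpy as np

def top_down_infer(mention_vec, children, type_vecs, root="root"):
    """Greedy top-down nearest-neighbor search over a type hierarchy:
    at each level, descend to the child type whose embedding is most
    similar (cosine) to the mention embedding."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    path, cur = [], root
    while children.get(cur):          # stop at a leaf type
        best = max(children[cur], key=lambda t: cos(mention_vec, type_vecs[t]))
        path.append(best)
        cur = best
    return path

# Toy hierarchy and 2-d type embeddings (illustrative values only)
children = {"root": ["person", "location"], "person": ["politician", "artist"]}
type_vecs = {"person": np.array([1.0, 0.0]), "location": np.array([0.0, 1.0]),
             "politician": np.array([1.0, 0.5]), "artist": np.array([1.0, -0.5])}
mention = np.array([1.0, 0.6])        # summed feature vectors of the test mention
print(top_down_infer(mention, children, type_vecs))  # ['person', 'politician']
```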

slide-78
SLIDE 78

My Solution: Partial Label Embedding (KDD’16)

78

“De-noised” labeled data

ID | Sentence
S1 | Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice

Pipeline: extract text features → “label noise reduction” with PLE → train classifiers on de-noised data → prediction on new data

S1: Donald Trump. Entity types (from KB): person, artist, actor, author, businessman, politician
Text features: TOKEN_Donald, CONTEXT:television, CONTEXT:season, TOKEN_trump, SHAPE:AA

→ More effective classifiers

(Ren et al., KDD’16) https://github.com/shanzhenren/PLE

slide-79
SLIDE 79

PLE: Modeling Clean and Noisy Mentions Separately

79

  • For a clean mention, its “positive types” should be ranked higher than all its “negative types”.
  • For a noisy mention, its “best candidate type” should be ranked higher than all its “non-candidate types”.

Noisy entity mention — ID S1: “Donald Trump spent 14 television seasons presiding over a game show, NBC’s The Apprentice”. Types in KB: person, artist, actor, author, businessman, politician.

Candidate types ranked: (+) actor 0.88, (+) artist 0.74, (+) person 0.55, (+) author 0.41, (+) politician 0.33, (+) businessman 0.31 → “best” candidate type: actor, which should be ranked above the non-candidate types (−) singer, (−) coach, (−) doctor, (−) location, (−) organization.

Clean mention — Si: Ted Cruz. Types in KB: person, politician.

(Ren et al., KDD’16)
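The ranking constraint for a noisy mention can be sketched as a margin loss. This is a simplified illustration with assumed names and a toy margin, not PLE’s exact objective (which also handles clean mentions by ranking every positive type above every negative one):

```python
def partial_label_loss(scores, candidate_types, margin=1.0):
    """Margin loss for a noisy mention: its best-scoring *candidate* type
    should outrank every *non-candidate* type by at least `margin`."""
    best_candidate = max(scores[t] for t in candidate_types)
    loss = 0.0
    for t, s in scores.items():
        if t not in candidate_types:
            # hinge penalty whenever a non-candidate comes within the margin
            loss += max(0.0, margin - (best_candidate - s))
    return loss

# Scores for the Donald Trump mention (candidates from the KB vs. others)
scores = {"actor": 0.88, "artist": 0.74, "person": 0.55,
          "singer": 0.40, "location": -0.20}
print(partial_label_loss(scores, {"actor", "artist", "person"}))  # ≈ 0.52
```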

slide-80
SLIDE 80

Type hierarchy (from knowledge base): root → {product, person, location, organization, …}; person → {politician, artist, businessman, …}; artist → {author, actor, singer, …}

ID | Sentence
Si | President Trump gave an all-hands address to troops at the U.S. Central Command headquarters …

Test mention: Si_Trump. The vectors of its text features (President, gave, speech) are summed, and type inference in PLE performs a top-down nearest-neighbor search in the given type hierarchy, within a low-dimensional vector space where related objects (e.g., president, politician, person, senator; gave, address; actor, star, play) lie close together.

Type Inference in PLE

(Ren et al., KDD’16)

slide-81
SLIDE 81

PLE: Performance of Fine-Grained Entity Typing

81

  • Raw: candidate types from distant supervision
  • WSABIE (Google, ACL’15): joint feature and type embedding
  • PTE, Predictive Text Embedding (MSR, WWW’15): joint mention, feature, and type embedding
  • Both WSABIE and PTE suffer from “noisy” training labels
  • PLE (KDD’16): partial-label loss for context-aware labeling

Accuracy on different type levels:

Method | Level-1 | Level-2 | Level-3
Raw | 0.70 | 0.45 | 0.05
WSABIE | 0.79 | 0.49 | 0.14
PTE | 0.78 | 0.51 | 0.19
PLE | 0.81 | 0.62 | 0.48

Accuracy = (# mentions with all types correctly predicted) / (# mentions in the test set)

On the OntoNotes public dataset (Weischedel et al., 2011; Gillick et al., 2014): 13,109 news articles, 77 annotated documents, 89 entity types
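The strict accuracy above (a mention counts only if all of its types are predicted correctly) can be sketched as follows; the data format and function name are assumptions:

```python
def strict_accuracy(predictions, gold):
    """Strict accuracy: a mention counts only if its predicted type set
    exactly matches the gold type set."""
    correct = sum(1 for m in gold
                  if set(predictions.get(m, [])) == set(gold[m]))
    return correct / len(gold)

# Hypothetical mentions: m1 fully correct, m2 mistyped
gold = {"m1": ["person", "politician"], "m2": ["location"]}
pred = {"m1": ["person", "politician"], "m2": ["organization"]}
print(strict_accuracy(pred, gold))  # 0.5
```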

https://github.com/shanzhenren/PLE

slide-82
SLIDE 82

Outline

  • Introduction
  • Challenges & Approach
  • Entity Recognition and Typing
  • Fine-grained Entity Typing
  • Joint Entity and Relation Extraction (WWW’17)
  • Summary and Future Work

82

slide-83
SLIDE 83

Problem Statement

83

Input corpus:

“American Airlines, a unit of AMR Corp., immediately matched the move, spokesman Tim Wagner said. United Airlines, a unit of UAL Corp., said the increase took effect Thursday night and applies to most routes ...”

Extracted entity-relation mentions:

Entity 1 | Relation | Entity 2
American Airlines | is_subsidiary_of | AMR
Tim Wagner | is_employee_of | American Airlines
United Airlines | is_subsidiary_of | UAL
… | … | …

slide-84
SLIDE 84

Previous Work

  • Supervised relation extraction (RE) systems
  • Hard to port to different kinds of corpora
  • Pattern-based bootstrapping RE systems
  • Focus on “explicit” relation mentions → limited recall
  • Semantic drift
  • Distantly-supervised RE systems
  • Error propagation (cont.)

84 Mintz et al. Distant supervision for relation extraction without labeled data. ACL, 2009. Etzioni et al. Web-scale information extraction in knowitall:(preliminary results). WWW, 2004. Surdeanu et al. Multi-instance multi-label learning for relation extraction. EMNLP, 2012.

slide-85
SLIDE 85

Prior Work: An “Incremental” System Pipeline

85

Pipeline: entity mention detection → context-aware entity typing → relation mention detection → context-aware relation typing

Example sentence: “The Women’s March was a worldwide protest on January 21, 2017.”

  • Entity boundary errors: detecting (women, protest) ✗ instead of (protest, January 21, 2017)
  • Relation mention errors: extracting “is a” ✗
  • Entity type errors: typing “protest” as person ✗
  • Relation type errors: “is a” ✗

Error propagation cascading down the pipeline

(Mintz et al., 2009), (Riedel et al., 2010), (Hoffmann et al., 2011), (Surdeanu et al., 2012), (Nagesh et al., 2014), …

slide-86
SLIDE 86
  • Context-aware type modeling
  • Model entity-relation interactions

My Solution: CoType (WWW’17)

86

(Ren et al. WWW’17)

1. Data-driven detection of entity and relation mentions
   • Data-driven text segmentation
   • Syntactic pattern learning from KBs
2. Joint typing of entity and relation mentions

(Replacing the pipeline: entity mention detection → context-aware entity typing → relation mention detection → context-aware relation typing)

https://github.com/shanzhenren/CoType

slide-87
SLIDE 87

87

My Solution: Data-Driven Entity Mention Detection

Quality of merging two sub-phrases is measured by corpus-level concordance and syntactic quality (the significance of a merging between the two sub-phrases).

Pattern | Example
(J*)N* | support vector machine
V P | tasted in, damage on
V W*(P) | train a classifier with

Good concordance example: “The best BBQ I’ve tasted in Phoenix! I had the pulled pork sandwich with coleslaw and baked beans for lunch. … This place serves up the best cheese steak sandwich west of the Mississippi.”

slide-88
SLIDE 88

CoType: Co-Embedding for Typing Entities and Relations

88

(Ren et al. WWW’17)

[Figure: a heterogeneous graph linking relation mentions (e.g., (“Barack Obama”, “US”, S1), (“Obama”, “Dream of My Father”, S2), (“Barack Obama”, “United States”, S3)), entity mentions (e.g., S1_“Barack Obama”, S2_“Obama”), text features (e.g., EM1_Obama, HEAD_Obama, TOKEN_States, BETWEEN_book, BETWEEN_president, CONTEXT_book, CONTEXT_president), entity types (person, politician, artist, author, LOC, ORG), and relation types (president_of, author_of, born_in, travel_to, None). The object interactions in this heterogeneous graph are embedded into separate low-dimensional vector spaces for entity mentions and relation mentions, jointly modeling entity-relation interactions.]

slide-89
SLIDE 89

Modeling Entity-Relation Interactions

Object “Translating” Assumption

For a relation mention z between entity arguments m1 and m2:

vec(m1) ≈ vec(m2) + vec(z)

89

Example in the low-dimensional vector space: m1 = “USA” (country), m2 = “Washington D.C.” (city), and z = capital_city_of form a positive relation triple; substituting “France” / “Paris” yields a negative relation triple.

(Bordes, NIPS’13), (Ren et al., WWW’17)

Error on a relation triple (z, m1, m2):
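A minimal numeric illustration of the translating assumption. The L2 error form and the toy 2-d vectors below are assumptions consistent with the approximation vec(m1) ≈ vec(m2) + vec(z) stated above, not the paper’s exact objective:

```python
import numpy as np

def translating_error(z, m1, m2):
    """Error of a relation triple under the translating assumption
    vec(m1) ~ vec(m2) + vec(z); smaller means a more plausible triple."""
    return float(np.linalg.norm(m1 - (m2 + z)))

# Toy 2-d embeddings (illustrative values only)
usa = np.array([2.0, 1.0])
dc = np.array([1.0, 0.5])
capital_of = np.array([1.0, 0.5])          # usa ≈ dc + capital_of
paris = np.array([0.0, 1.0])               # mismatched tail entity

print(translating_error(capital_of, usa, dc))     # 0.0  (positive triple)
print(translating_error(capital_of, usa, paris))  # larger (negative triple)
```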

slide-90
SLIDE 90

Reducing Error Propagation: A Joint Optimization Framework

90

(Ren et al., WWW’17)

Modeling entity-relation interactions Modeling types of relation mentions Modeling types of entity mentions

slide-91
SLIDE 91

CoType: Comparing with State-of-the-Arts RE Systems

  • Given a candidate relation mention, predict its relation type if it expresses a relation of interest; otherwise, output “None”

91

[Figure: precision-recall curves (precision 0.1–1.0 vs. recall 0.05–0.5) comparing DeepWalk, DS+Logistic, LINE, MultiR, CoType-RM, and CoType.]

  • DS+Logistic (Stanford, ACL’09): logistic classifier on distant supervision (DS)
  • MultiR (UW, ACL’11): handles inappropriate labels in DS
  • DeepWalk (StonyBrook, KDD’14): homogeneous graph embedding
  • LINE (MSR, WWW’15): joint feature & type embedding

  • CoType-RM (WWW’17): only models relation mentions
  • CoType (WWW’17): models entity-relation interactions

NYT public dataset (Riedel et al. 2010): 1.18M sentences in the corpus, 395 manually annotated sentences for evaluation, 24 relation types

https://github.com/shanzhenren/CoType

slide-92
SLIDE 92

An Application to Life Sciences

92 (Pyysalo et al., BMC Bioinformatics’07) Performance evaluation on BioInfer: relation classification accuracy = 61.7% (11%↑ over the best-performing baseline)

LifeNet: A Knowledge Exploration and Analytics System for Life Sciences

(Ren et al., ACL’17 demo)

LifeNet by Effort-Light StructMine (with links to PubMed papers), machine-created: 4 million+ PubMed papers, 1,000+ entity types, 400+ relation types, built in <1 hour on a single machine, 10,000x more facts

BioInfer network by human labeling (Pyysalo et al., 2007), human-created: 1,100 sentences, 94 protein-protein interactions, 2,500 man-hours, 2,662 facts

slide-93
SLIDE 93

Outline

  • Introduction
  • Challenges & Approach
  • Entity Recognition and Typing
  • Fine-grained Entity Typing
  • Joint Entity and Relation Extraction
  • Summary and Future Work

93

slide-94
SLIDE 94

Towards Learning Text Structures with Limited Supervision

94

[Diagram: unlabeled data + labeled data → noisy training data → model training → unreliable predictions. Imposing priors at the input stage: prior embeddings are injected at the shallow input layer, ahead of the middle and output layers.]

slide-95
SLIDE 95

Heterogeneous Supervision for Relation Extraction

  • How to unify multiple sources of supervision (KB-supervision, hand-crafted rules, crowd-sourced labels, etc.) on the same task?

95

(Liu et al., EMNLP’17)

Labeling functions Λ = {λ1, …, λ4}, mixing distant supervision and hand-crafted rules:

  • return born_in for <e1, e2, s> if BornIn(e1, e2) in KB (distant supervision)
  • return died_in for <e1, e2, s> if DiedIn(e1, e2) in KB (distant supervision)
  • return born_in for <e1, e2, s> if match(‘* born in *’, s) (hand-crafted rule)
  • return died_in for <e1, e2, s> if match(‘* killed in *’, s) (hand-crafted rule)
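The labeling functions on this slide can be sketched in Python; the toy KB contents, regex patterns, and function names below are illustrative assumptions. Note how a single instance can receive redundant (or conflicting) labels from several functions, which is exactly the problem the next slide addresses:

```python
import re

# Toy knowledge base of (relation, e1, e2) facts (illustrative contents)
KB = {("BornIn", "Hussein", "Amman"), ("DiedIn", "Gofraid", "Dal Riata")}

def lf_kb_born_in(e1, e2, s):        # distant supervision
    return "born_in" if ("BornIn", e1, e2) in KB else None

def lf_kb_died_in(e1, e2, s):        # distant supervision
    return "died_in" if ("DiedIn", e1, e2) in KB else None

def lf_pattern_born_in(e1, e2, s):   # hand-crafted rule: '* born in *'
    return "born_in" if re.search(rf"{re.escape(e1)}.*born in.*{re.escape(e2)}", s) else None

def lf_pattern_killed_in(e1, e2, s): # hand-crafted rule: '* killed in *'
    return "died_in" if re.search(rf"{re.escape(e1)}.*killed in.*{re.escape(e2)}", s) else None

labeling_functions = [lf_kb_born_in, lf_kb_died_in,
                      lf_pattern_born_in, lf_pattern_killed_in]

def annotate(e1, e2, s):
    """Collect every label the functions propose for one instance."""
    labels = []
    for lf in labeling_functions:
        lab = lf(e1, e2, s)
        if lab:
            labels.append(lab)
    return labels

s = "Gofraid died in 989, said to be killed in Dal Riata."
print(annotate("Gofraid", "Dal Riata", s))  # ['died_in', 'died_in'] -- redundant labels
```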

slide-96
SLIDE 96

Example corpus D:

c1: Robert Newton “Bob” Ford was an American outlaw best known for killing his gang leader Jesse James (e1) in Missouri (e2).
c2: Hussein (e1) was born in Amman (e2) on 14 November 1935.
c3: Gofraid (e1) died in 989, said to be killed in Dal Riata (e2).

The labeling functions Λ = {λ1, …, λ4} (distant supervision from the KB plus hand-crafted pattern rules) each fire on a subset of {c1, c2, c3}, producing overlapping, and potentially conflicting, annotations.

Uncover “Expertise” of Labeling Functions

  • Multiple “labeling functions” annotate the same instance → how to resolve conflicts & redundancy?
  • The “expertise” of each labeling function → the subset of instances that the labeling function is confident on

96

(Liu et al., EMNLP’17)

slide-97
SLIDE 97

Towards Learning Text Structures with Limited Supervision

97

[Diagram, extending the one on slide 94. At the input stage: unlabeled data + labeled data → noisy training data → model training → unreliable predictions, with prior embeddings imposed at the shallow input layer. At the output stage: during training on labeled data, priors act as regularizers in the output layer & loss functions, shaping the model’s predictions.]

slide-98
SLIDE 98

Indirect Supervision for Relation Extraction – using QA Pairs

  • Questions & positive/negative answers
  • Positive → similar relation; negative → distinct relations

98

(Wu et al., WSDM’18)

QA Data Format / Example:

Type(“Jack”, “Germany”, A1) = Type(“Jack”, “Germany”, A2) Type(“Jack”, “Germany”, A1) ≠ Type(“Jack”, “France”, A3) Type(“Jack”, “Germany”, A2) ≠ Type(“Jack”, “France”, A3)
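A sketch of how one question’s answers could be turned into pairwise type constraints like those above; the function name and the (mention, is_correct) data format are assumptions:

```python
from itertools import combinations

def qa_constraints(answers):
    """Turn one question's positive/negative answer mentions into pairwise
    constraints: two correct answers should share a relation type ('similar');
    a correct and an incorrect answer should not ('dissimilar')."""
    pos = [m for m, ok in answers if ok]
    neg = [m for m, ok in answers if not ok]
    similar = list(combinations(pos, 2))
    dissimilar = [(p, n) for p in pos for n in neg]
    return similar, dissimilar

# A1, A2 answer the question correctly; A3 does not (as in the example above)
sim, dis = qa_constraints([("A1", True), ("A2", True), ("A3", False)])
print(sim)  # [('A1', 'A2')]
print(dis)  # [('A1', 'A3'), ('A2', 'A3')]
```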

slide-99
SLIDE 99

Indirect Supervision for Relation Extraction – using QA Pairs

  • Questions → positive / negative answers
  • Positive pairs → similar relation; negative pairs → distinct relations

99

(Wu et al., WSDM’18)

slide-100
SLIDE 100

Towards Learning Text Structures with Limited Supervision

100

[Diagram: at the model stage, priors are encoded as network structures between the input layer, middle layers, and output layer, yielding (interpretable) predictions from labeled data.]

slide-101
SLIDE 101

Neural-symbolic Learning for NLP

101

Matched textual patterns + related knowledge graph structures → composing graph networks (generating GN-blocks) → softmax classifier