Bootstrapping Semantic Lexicons - PowerPoint Presentation

Motivation
• A semantic lexicon contains semantic category assignments for words. For example: blogger → HUMAN, sedan → VEHICLE, AK-47 → WEAPON.
• General purpose resources, such as WordNet, are often insufficient for specific domains.
• Automatic methods can be used to enhance existing resources or create domain-specific lexicons.

Bootstrapping Semantic Lexicons
[Figure: seed words (e.g., dog, cat, lion, lizard, snake) and unannotated texts yield co-occurrence statistics; the N best words (e.g., terrier, poodle, tiger, frog, iguana) are added as prospective category words. Domain-specific example, ANIMAL: gshep, doxy, lab, labx, m/n, mix, patient.]

Lexico-Syntactic Patterns
• Lexico-syntactic contexts often reveal the semantic class of a word.
• AutoSlog [Riloff 1993] is a pattern generator that was originally developed for event extraction tasks.
• Each pattern co-occurs with an NP in one of 3 syntactic positions: subject, direct object, PP object.

Example Location Patterns
<subject> was inhabited        the locality was inhabited…
patrolling <direct object>     …patrolling Zacamil neighborhood
lives in <PP object>           …lives in Argentina

Pattern Templates
Template                          Example
<subject> passive-vp              <target> was bombed
<subject> active-vp               <perpetrator> bombed
<subject> active-vp dobj          <perpetrator> threw dynamite
<subject> active-vp infinitive    <perpetrator> tried to kill
<subject> passive-vp infinitive   <perpetrator> was hired to kill
<subject> auxiliary dobj          <victim> was fatality
active-vp <dobj>                  bombed <target>
infinitive <dobj>                 to kill <victim>
active-vp infinitive <dobj>       tried to kill <victim>
passive-vp infinitive <dobj>      was hired to kill <victim>
subject auxiliary <dobj>          fatality was <victim>
passive-vp prep <np>              was killed by <perpetrator>
active-vp prep <np>               exploded in <target>
infinitive prep <np>              to kill with <weapon>
noun prep <np>                    assassination of <victim>
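Pattern and word scoring both operate on pattern-noun co-occurrences. A minimal sketch of that bookkeeping, assuming the extractions are already available as (pattern, head noun) pairs from a pattern generator such as AutoSlog; the helper name `build_indexes` and the sample pairs are hypothetical illustrations, not taken from the slides:

```python
from collections import defaultdict

def build_indexes(extractions):
    """Index hypothetical (pattern, head_noun) extraction pairs both ways.

    Returns (pattern_to_nouns, noun_to_patterns); values are sets, so a
    repeated extraction of the same pair is counted only once.
    """
    pattern_to_nouns = defaultdict(set)
    noun_to_patterns = defaultdict(set)
    for pattern, noun in extractions:
        pattern_to_nouns[pattern].add(noun)
        noun_to_patterns[noun].add(pattern)
    return dict(pattern_to_nouns), dict(noun_to_patterns)

# Illustrative extractions covering the three syntactic positions.
extractions = [
    ("<subject> was inhabited", "locality"),  # subject
    ("patrolling <dobj>", "neighborhood"),    # direct object
    ("lives in <np>", "argentina"),           # PP object
    ("lives in <np>", "colombia"),
    ("lives in <np>", "argentina"),           # duplicate pair, stored once
]
p2n, n2p = build_indexes(extractions)
print(sorted(p2n["lives in <np>"]))  # ['argentina', 'colombia']
```

Using sets means the downstream counts are over distinct nouns and distinct patterns, which matches the "distinct category members" wording used later in the deck.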

BASILISK = Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge

Key Ideas behind Basilisk
• Collective evidence over extraction patterns.
• Learning multiple categories simultaneously.

Basilisk Bootstrapping Algorithm
[Figure: Basilisk bootstrapping loop. Pattern contexts are scored; the best patterns go into the Pattern Pool; words co-occurring with those patterns go into the Candidate Word Pool; the 5 best words are added to the Lexicon, and the cycle repeats.]

Scoring Patterns
Every extraction pattern is scored, and the best patterns are put into a Pattern Pool. The scoring function is:
$$\mathrm{RlogF}(\text{pattern}_i) = \frac{F_i}{N_i} \cdot \log_2(F_i)$$
where:
F_i is the number of category members extracted by pattern i
N_i is the total number of nouns extracted by pattern i

The Pattern Pool
• Initially, we used a Pattern Pool of size 20, but the pool became stagnant over time.
Solution: begin with a Pattern Pool of size 20, but increase the pool size by 1 after each iteration to infuse the pool with new candidates.
• All head nouns that co-occur with patterns in the Pattern Pool are put into the Candidate Word Pool.
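The pattern-scoring step and the growing Pattern Pool can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes N_i counts distinct extracted nouns, gives a pattern with no known category members a score of 0 (since log2(0) is undefined), and the function names and toy data are hypothetical.

```python
import math

def rlogf(extracted, lexicon):
    """RlogF = (F_i / N_i) * log2(F_i) for a single pattern.

    extracted: set of distinct nouns the pattern extracts (N_i = its size).
    lexicon:   set of known category members (F_i = size of the overlap).
    """
    f = len(extracted & lexicon)
    if f == 0:
        return 0.0  # assumption: no known members means no evidence
    return (f / len(extracted)) * math.log2(f)

def pattern_pool(pattern_to_nouns, lexicon, iteration, base_size=20):
    """Return the top-scoring patterns; the pool grows by 1 per iteration."""
    size = base_size + iteration
    ranked = sorted(pattern_to_nouns,
                    key=lambda p: rlogf(pattern_to_nouns[p], lexicon),
                    reverse=True)
    return ranked[:size]

def candidate_pool(pool, pattern_to_nouns, lexicon):
    """Head nouns extracted by pooled patterns that are not yet in the lexicon."""
    return set().union(*(pattern_to_nouns[p] for p in pool)) - lexicon

# Toy data (hypothetical); base_size=2 keeps the example small.
pattern_to_nouns = {
    "<subject> was bombed": {"embassy", "office", "car"},
    "bombed <dobj>": {"embassy", "office", "church", "bus"},
    "lives in <np>": {"argentina"},
}
lexicon = {"embassy", "office", "church"}
pool = pattern_pool(pattern_to_nouns, lexicon, iteration=0, base_size=2)
print(pool)  # ['bombed <dobj>', '<subject> was bombed']
print(sorted(candidate_pool(pool, pattern_to_nouns, lexicon)))  # ['bus', 'car']
```

Note that a pattern extracting exactly one known member also scores 0 (log2(1) = 0), so RlogF favors patterns that are both accurate (F_i / N_i) and productive (log2 F_i).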

Scoring Words based on Collective Evidence
score(word_i) = the average number of category members that co-occur with the pattern contexts containing the candidate word:
$$\mathrm{score}(\text{word}_i) = \frac{\sum_{j=1}^{N_i} F_j}{N_i}$$
INTUITION: a word receives a high score if it occurs in contexts that also consistently co-occur with known semantic class members.

Selecting Words for the Lexicon
1. Given a word, collect all of its pattern contexts.
2. Compute the average number of distinct class members per pattern. (In practice, the average is taken over logarithms.)
$$\mathrm{AvgLog}(\text{word}_i) = \frac{\sum_{j=1}^{N_i} \log_2(F_j + 1)}{N_i}$$
where:
F_j is the number of distinct category members that co-occur with pattern j
N_i is the total number of patterns that co-occur with word i

Experimental Design
• Used the MUC-4 corpus: 1700 texts related to terrorism.
• Experiments on 6 semantic categories: building, event, human, location, time, weapon.
• 10 seed words for each category.
• 1000 words automatically generated for each category.
• Basilisk was compared with our previous algorithm (meta-bootstrapping).

Baseline Results
Head nouns (8460 words) by category:
building    188   (2.2%)
event       501   (5.9%)
human       1856  (21.9%)
location    1018  (12.0%)
time        112   (1.3%)
weapon      147   (1.7%)
(other)     4638  (54.8%)
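Both word scores from the Scoring Words and Selecting Words slides can be computed from a pattern-noun index. A minimal sketch, assuming hypothetical dictionaries `pattern_to_nouns` (pattern → set of extracted nouns) and `noun_to_patterns` (noun → set of patterns); keeping the top k = 5 candidates follows the "5 best words" step in the algorithm diagram, and all names and toy data are illustrative.

```python
import math

def avg_members(word, noun_to_patterns, pattern_to_nouns, lexicon):
    """score(word_i): mean F_j over the N_i patterns that extract the word."""
    patterns = noun_to_patterns.get(word, set())
    if not patterns:
        return 0.0
    return sum(len(pattern_to_nouns[p] & lexicon) for p in patterns) / len(patterns)

def avg_log(word, noun_to_patterns, pattern_to_nouns, lexicon):
    """AvgLog(word_i): mean of log2(F_j + 1) over the word's N_i patterns."""
    patterns = noun_to_patterns.get(word, set())
    if not patterns:
        return 0.0
    return sum(math.log2(len(pattern_to_nouns[p] & lexicon) + 1)
               for p in patterns) / len(patterns)

def best_words(candidates, noun_to_patterns, pattern_to_nouns, lexicon, k=5):
    """Rank candidate words by AvgLog and keep the top k."""
    return sorted(candidates,
                  key=lambda w: avg_log(w, noun_to_patterns, pattern_to_nouns, lexicon),
                  reverse=True)[:k]

# Toy data (hypothetical).
pattern_to_nouns = {
    "p1": {"embassy", "office", "car"},
    "p2": {"embassy", "office", "church", "bus"},
    "p3": {"car", "argentina"},
}
noun_to_patterns = {}
for p, nouns in pattern_to_nouns.items():
    for n in nouns:
        noun_to_patterns.setdefault(n, set()).add(p)
lexicon = {"embassy", "office", "church"}

print(avg_members("car", noun_to_patterns, pattern_to_nouns, lexicon))  # 1.0
print(avg_log("bus", noun_to_patterns, pattern_to_nouns, lexicon))      # 2.0
print(best_words(["car", "bus", "argentina"], noun_to_patterns,
                 pattern_to_nouns, lexicon, k=2))                        # ['bus', 'car']
```

The +1 inside the logarithm means a pattern with no known category members contributes log2(1) = 0 rather than an undefined term, while the logarithm damps the influence of a single very productive pattern.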

Seed Words
We used the 10 most frequent words for each category.
Building: embassy, office, headquarters, church, offices, house, home, residence, hospital, airport
Event: attack, actions, war, meeting, elections, murder, attacks, action, struggle, agreement
Human: people, guerrillas, members, troops, Cristiani, rebels, president, terrorists, soldiers, leaders
Location: country, El Salvador, Salvador, United States, area, Colombia, city, countries, department, Nicaragua
Time: time, years, days, November, hours, night, morning, week, year, day
Weapon: weapons, bomb, bombs, explosives, arms, missiles, dynamite, rifles, materiel, bullets

[Figure: single-category results, Basilisk (BA-1) vs. meta-bootstrapping (MB-1). Panels: Human, Location, Event, Building, Weapon, Time; y-axis: Correct Lexicon Entries; x-axis: Total Lexicon Entries (0 to 1200).]

Semantic Learning Case Study
• Seed Words: 10 common disease names
• Of the top 200 words hypothesized to be diseases, 89 were already in the UMLS Metathesaurus (32,000 names of diseases and organisms), but 111 were not! These included: adenomatosis, h5n1, flu, tularaemia, h7n3, kawasaki, tularamia, ev71, mad-cow-disease, diarrhoea, yf, smut, diphtheriae, jyf, pertussis, enterovirus-71, nvcjd, pleuro-pneumonia, fibropapillomas, pepmv, polioencephalomyelitis, gastroeneteritis, wsmv, poliovirus

Learning Multiple Categories Simultaneously
• We hypothesized that confusion errors can be reduced by learning multiple semantic categories simultaneously.
• "One Sense per Domain" assumption.
• Knowledge about competing categories can constrain and steer the bootstrapping process.
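The result plots report correct lexicon entries as a function of total lexicon entries. One way to compute such a curve, assuming a hypothetical list of learned words in the order they were added and a hypothetical gold set of correct category members (the function name and data are illustrative):

```python
from itertools import accumulate

def learning_curve(learned, gold, checkpoints):
    """Return (total, correct) pairs: how many of the first `total`
    learned words are in the gold set, at each requested checkpoint."""
    # Running count of correct entries after each added word.
    hits = list(accumulate(1 if w in gold else 0 for w in learned))
    # Checkpoints outside 1..len(learned) are skipped.
    return [(n, hits[n - 1]) for n in checkpoints if 0 < n <= len(hits)]

learned = ["attack", "murder", "people", "war", "city"]
gold_event = {"attack", "murder", "war"}
print(learning_curve(learned, gold_event, [1, 3, 5]))  # [(1, 1), (3, 2), (5, 3)]
```

A perfect learner would trace the diagonal (correct = total); the gap below it shows how many wrong entries crept in as bootstrapping proceeded.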

[Figure: bootstrapping a single category vs. bootstrapping multiple categories.]

Simple Conflict Resolution
• A word cannot be assigned to category X if it has already been assigned to category Y.
• If a word is hypothesized for both category X and category Y at the same time, choose the category that receives the highest score.

[Figure: Basilisk learning multiple categories (BA-M) vs. one category at a time (BA-1). Panels: Human, Location, Building, Event, Weapon, Time; y-axis: Correct Lexicon Entries; x-axis: Total Lexicon Entries (0 to 1200).]
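The two conflict-resolution rules can be sketched for one iteration as follows. This assumes hypothetical inputs: `proposals` maps each category to the (word, score) pairs it hypothesizes this iteration, and `assigned` maps words already in some lexicon to their category. Breaking exact ties in favor of the category seen first is an assumption of this sketch, not something the slides specify.

```python
def resolve_conflicts(proposals, assigned):
    """Apply the two simple conflict-resolution rules for one iteration.

    proposals: dict category -> list of (word, score) hypothesized now.
    assigned:  dict word -> category, for words already in a lexicon.
    Returns dict category -> list of words accepted this iteration.
    """
    best = {}  # word -> (score, category) of the strongest claim so far
    for category, items in proposals.items():
        for word, score in items:
            if word in assigned:
                continue  # rule 1: already assigned to some category
            if word not in best or score > best[word][0]:
                best[word] = (score, category)  # rule 2: highest score wins
    accepted = {category: [] for category in proposals}
    for word, (_, category) in best.items():
        accepted[category].append(word)
    return accepted

proposals = {
    "human": [("guerrillas", 2.0), ("embassy", 0.5)],
    "building": [("embassy", 1.5), ("office", 1.2)],
}
assigned = {"office": "building"}
print(resolve_conflicts(proposals, assigned))
# {'human': ['guerrillas'], 'building': ['embassy']}
```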

[Figure: meta-bootstrapping learning multiple categories (MB-M) vs. one category at a time (MB-1). Panels: Human, Location, Event, Building, Weapon, Time; y-axis: Correct Lexicon Entries; x-axis: Total Lexicon Entries (0 to 1500).]

A Smarter Scoring Function
A more proactive approach: incorporate knowledge about other categories directly into the scoring function.
New scoring function:
$$\mathrm{diff}(w_i, c_a) = \mathrm{AvgLog}(w_i, c_a) - \max_{b \neq a} \mathrm{AvgLog}(w_i, c_b)$$

[Figure: comparison of BA-M+, BA-M, and MB-M. Panels: Human, Location, Event, Building, Weapon, Time; y-axis: Correct Lexicon Entries; x-axis: Total Lexicon Entries (0 to 1200).]

Subjective Noun Bootstrapping [Riloff, Wiebe, and Wilson, 2003]
[Figure: Lexicon nouns (e.g., hope, grief, joy, concern, worries) → Best Patterns (expressed <np>, voiced <np>, show of <np>) → Best Nouns (happiness, relief, condolences, goodwill) → back into the Lexicon.]
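The diff scoring function from the Smarter Scoring Function slide can be sketched on top of per-category AvgLog values. A minimal illustration, assuming a hypothetical dict that maps each category to AvgLog(w_i, c) for a single word and at least one competing category; the function name and numbers are illustrative.

```python
def diff_score(avglog_by_category, target):
    """diff(w, c_a) = AvgLog(w, c_a) - max over b != a of AvgLog(w, c_b)."""
    competitors = [s for c, s in avglog_by_category.items() if c != target]
    if not competitors:
        raise ValueError("diff needs at least one competing category")
    return avglog_by_category[target] - max(competitors)

# Hypothetical AvgLog values for one word under three categories.
scores = {"human": 2.0, "location": 0.5, "weapon": 1.2}
print(round(diff_score(scores, "human"), 6))     # 0.8
print(round(diff_score(scores, "location"), 6))  # -1.5
```

A positive diff means the word's evidence for c_a beats its strongest competitor; a negative value means some other category claims the word more strongly, so ranking by diff pushes ambiguous words down every category's list.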
