A Probabilistic Model of Redundancy in Information Extraction

Information Extraction and the Future of Web Search

A Probabilistic Model of Redundancy in Information Extraction
Doug Downey, Oren Etzioni, Stephen Soderland
University of Washington, Department of Computer Science and Engineering
http://www.cs.washington.edu/research/knowitall

Motivation for Web IE
• What universities have active biotech research, and in what departments?
• What percentage of the reviews of the Thinkpad T-40 are positive?
The answer is not on any single Web page!

Review: Unsupervised Web IE
Goal: Extract information on any subject automatically.

Review: Extraction Patterns
Generic extraction patterns (Hearst ’92):
• “…Cities such as Boston, Los Angeles, and Seattle…”
  (“C such as NP1, NP2, and NP3”) => IS-A(each(head(NP)), C), …
• “Detailed information for several countries such as maps, …”
  ProperNoun(head(NP))
• “I listen to pretty much all music but prefer country such as Garth Brooks”

Binary Extraction Patterns
R(I1, I2) ← I1, R of I2
Instantiated pattern: Ceo(Person, Company) ← <person>, CEO of <company>
• “…Jeff Bezos, CEO of Amazon…”
• “…Matt Damon, star of The Bourne Supremacy…”
• “Erik Jonsson, CEO of Texas Instruments, mayor of Dallas from 1964-1971, and…”
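The two pattern types above can be sketched with regular expressions. This is a minimal illustration, not KnowItAll's implementation: a noun phrase is approximated by a run of capitalized words (standing in for the ProperNoun(head(NP)) constraint), and the names `NP`, `SEP`, `hearst_such_as`, and `ceo_of` are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical simplifications: a noun phrase is a run of capitalized words,
# which also plays the role of the ProperNoun(head(NP)) constraint.
NP = r"[A-Z][\w.]*(?: [A-Z][\w.]*)*"
SEP = r"(?:,? and |, )"  # list separators: ", ", " and ", ", and "


def hearst_such_as(sentence: str, class_pattern: str) -> list[str]:
    """Candidate instances x of IS-A(x, C) from "C such as NP1, NP2, and NP3"."""
    # (?i:...) makes only the class name case-insensitive; NPs stay capitalized.
    m = re.search(rf"(?i:{class_pattern}) such as ({NP}(?:{SEP}{NP})*)", sentence)
    return re.split(SEP, m.group(1)) if m else []


def ceo_of(sentence: str) -> tuple[str, str] | None:
    """Ceo(Person, Company) from the instantiated pattern "<person>, CEO of <company>"."""
    m = re.search(rf"({NP}), CEO of ({NP})", sentence)
    return (m.group(1), m.group(2)) if m else None


print(hearst_such_as("…Cities such as Boston, Los Angeles, and Seattle…", "cities"))
# ['Boston', 'Los Angeles', 'Seattle']
print(hearst_such_as("Detailed information for several countries such as maps, …", r"countr(?:y|ies)"))
# []
print(hearst_such_as("I listen to pretty much all music but prefer country such as Garth Brooks",
                     r"countr(?:y|ies)"))
# ['Garth Brooks']
print(ceo_of("Erik Jonsson, CEO of Texas Instruments, mayor of Dallas from 1964-1971, and…"))
# ('Erik Jonsson', 'Texas Instruments')
```

On the slide's examples, the capitalization test rejects “maps” but still accepts “Garth Brooks”, the kind of mistake generic patterns make.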

Review: Unsupervised Web IE
Goal: Extract information on any subject automatically.
→ Generic extraction patterns
Generic patterns can make mistakes.
→ Redundancy

Redundancy in Information Extraction
In large corpora, the same fact is often asserted multiple times:
• “…and the rolling hills surrounding Sun Belt cities such as Atlanta”
• “Atlanta is a city with a large number of museums, theatres…”
• “…has offices in several major metropolitan cities including Atlanta”
Given a term x and a set of sentences about a class C, what is the probability that x ∈ C?

Redundancy – Two Intuitions
1) Repetition
2) Multiple extraction mechanisms

Hits    Phrase
980     “Atlanta and other cities”
286     “Canada and other cities”
5860    “cities such as Atlanta”
7       “cities such as Canada”

Goal: A formal model of these intuitions.

Outline
1. Modeling redundancy – the problem
2. URNS model
3. Parameter estimation for URNS
4. Experimental results
5. Summary

1. Modeling Redundancy – The Problem
Consider a single extraction pattern: “C such as x”
If an extraction x appears k times in a set of n sentences containing this pattern, what is the probability that x ∈ C?

Modeling with k
Country(x) extractions, n = 10:
“…countries such as Saudi Arabia…”
“…countries such as the United States…”
“…countries such as Saudi Arabia…”
“…countries such as Japan…”
“…countries such as Africa…”
“…countries such as Japan…”
“…countries such as the United Kingdom…”
“…countries such as Iraq…”
“…countries such as Afghanistan…”
“…countries such as Australia…”

Noisy-Or Model:
$P_{\text{noisy-or}}(x \in C \mid x \text{ appears } k \text{ times}) = 1 - (1 - p)^{k}$
p is the probability that a single sentence is true; here p = 0.9.

Country(x), n = 10        k      P_noisy-or
Saudi Arabia              2      0.99
Japan                     2      0.99
United States             1      0.9
Africa                    1      0.9
United Kingdom            1      0.9
Iraq                      1      0.9
Afghanistan               1      0.9
Australia                 1      0.9

Important: noisy-or ignores
– the sample size (n)
– the distribution of C

Needed in Model: Sample Size
Country(x), n ~50,000     k      P_noisy-or
Japan                     1723   0.9999…
Norway                    295    0.9999…
Israil                    1      0.9
OilWatch Africa           1      0.9
Religion Paraguay         1      0.9
Chicken Mole              1      0.9
Republics of Kenya        1      0.9
Atlantic Ocean            1      0.9
New Zeland                1      0.9
As sample size increases, noisy-or becomes inaccurate.
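The noisy-or column for n = 10 can be reproduced by tallying the ten sentences. A minimal sketch, assuming p = 0.9 as on the slide; `noisy_or` is a hypothetical helper name.

```python
from collections import Counter


def noisy_or(k: int, p: float = 0.9) -> float:
    """P_noisy-or(x in C | x appears k times) = 1 - (1 - p)**k."""
    return 1.0 - (1.0 - p) ** k


# The n = 10 Country(x) extractions, one entry per sentence.
extractions = [
    "Saudi Arabia", "United States", "Saudi Arabia", "Japan", "Africa",
    "Japan", "United Kingdom", "Iraq", "Afghanistan", "Australia",
]
counts = Counter(extractions)  # k for each extracted label
for label, k in counts.most_common():
    print(f"{label:<15} k={k}  P={noisy_or(k):.2f}")
```

Because the score depends only on k, a label seen once gets 0.9 whether n = 10 or n ~50,000, which is the sample-size problem noted above.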
Needed in Model: Distribution of C
$P_{\text{freq}}(x \in C \mid x \text{ appears } k \text{ times}) = 1 - (1 - p)^{1000k/n}$

Country(x), n ~50,000     k      P_freq
Japan                     1723   0.9999…
Norway                    295    0.9999…
Israil                    1      0.05
OilWatch Africa           1      0.05
Religion Paraguay         1      0.05
Chicken Mole              1      0.05
Republics of Kenya        1      0.05
Atlantic Ocean            1      0.05
New Zeland                1      0.05

City(x), n ~50,000        k      P_freq
Toronto                   274    0.9999…
Belgrade                  81     0.98
Lacombe                   1      0.05
Kent County               1      0.05
Nikki                     1      0.05
Ragaz                     1      0.05
Villegas                  1      0.05
Cres                      1      0.05
Northeastwards            1      0.05

Probability that x ∈ C depends on the distribution of C.
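The P_freq column can be checked numerically. A minimal sketch, assuming n = 50,000 exactly (the slides give only n ~50,000) and p = 0.9; `p_freq` is a hypothetical name, and reading the exponent as "k rescaled to a 1,000-sentence sample" is an interpretation, not something the slides state.

```python
def p_freq(k: int, n: int, p: float = 0.9) -> float:
    """P_freq(x in C | x appears k times) = 1 - (1 - p)**(1000 * k / n).

    Interpretation (assumed): 1000 * k / n is k rescaled to a 1,000-sentence sample.
    """
    return 1.0 - (1.0 - p) ** (1000.0 * k / n)


N = 50_000  # assumption: the slides say only "n ~50,000"
for label, k in [("Japan", 1723), ("Toronto", 274), ("Belgrade", 81), ("Lacombe", 1)]:
    print(f"{label:<10} k={k:<5} P_freq={p_freq(k, N):.2f}")
```

Rounded to two decimals this gives about 0.98 for Belgrade and 0.05 for any singleton, in line with the tables. Every singleton gets the same score regardless of the class, which is why the distribution of C has to enter the model.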

2. The URNS Model – Single Urn
Urn for City(x), one label per ball: Tokyo, Sydney, U.K., Cairo, U.K., Tokyo, Utah, Atlanta, Yakima, Atlanta
Each extraction, e.g. “…cities such as Tokyo…”, is a draw of a labeled ball from the urn.

Single Urn – Formal Definition
C – set of unique target labels
E – set of unique error labels
num(b) – number of balls labeled by b ∈ C ∪ E
num(B) – distribution giving the number of balls for each label b ∈ B

Single Urn Example
For the City(x) urn above:
num(“Atlanta”) = 2
num(C) = {2, 2, 1, 1, 1}
num(E) = {2, 1}
num(C) and num(E) are estimated from data.
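The example urn can be written down directly. A minimal sketch in which the split of labels into C and E is read off the slide's num(C) and num(E); `num_dist` and `draw` are hypothetical helpers, and `random.Random.choices` samples with replacement, matching the urn model.

```python
from __future__ import annotations

import random
from collections import Counter

# The example urn for City(x): one string per ball.
urn = ["Tokyo", "Sydney", "U.K.", "Cairo", "U.K.",
       "Tokyo", "Utah", "Atlanta", "Yakima", "Atlanta"]
C = {"Tokyo", "Sydney", "Cairo", "Atlanta", "Yakima"}  # target labels
E = {"U.K.", "Utah"}                                     # error labels

num = Counter(urn)  # num(b): number of balls labeled b


def num_dist(labels: set[str]) -> list[int]:
    """num(B): ball counts for the labels in B, largest first."""
    return sorted((num[b] for b in labels), reverse=True)


def draw(n: int, seed: int = 0) -> list[str]:
    """Hypothetical helper: simulate n extractions as n draws with replacement."""
    return random.Random(seed).choices(urn, k=n)


print(num["Atlanta"])  # 2
print(num_dist(C))     # [2, 2, 1, 1, 1]
print(num_dist(E))     # [2, 1]
```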

Single Urn: Computing Probabilities
If an extraction x appears k times in a set of n sentences containing a pattern, what is the probability that x ∈ C?
In urn terms: given that an extraction x appears k times in n draws from the urn (with replacement), what is the probability that x ∈ C?

Uniform Special Case
Consider the case where $\mathrm{num}(c_i) = R_C$ and $\mathrm{num}(e_j) = R_E$ for all $c_i \in C$, $e_j \in E$, and let $s = |C| R_C + |E| R_E$ be the total number of balls. Then:
$$\mathrm{odds}(x \in C \mid k, n) = \frac{|C|}{|E|}\left(\frac{R_C}{R_E}\right)^{k}\left(\frac{1 - R_C/s}{1 - R_E/s}\right)^{n-k}$$
Then, using a Poisson approximation:
$$\mathrm{odds}(x \in C \mid k, n) \approx \frac{|C|}{|E|}\left(\frac{R_C}{R_E}\right)^{k} e^{\,n (R_E - R_C)/s}$$
For $R_C > R_E$, odds increase exponentially with k, but decrease exponentially with n.

The URNS Model – Multiple Urns
Correlation across extraction mechanisms is higher for elements of C than for elements of E.

3. Parameter Estimation for URNS
Simplifying assumptions:
– Assume that num(C) and num(E) are Zipf distributed:
  • frequency of the i-th most repeated label in C $\propto i^{-z_C}$
– Then num(C) and num(E) are characterized by five parameters: $z_C$, $z_E$, $|C|$, $|E|$, $p$.
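For the single-urn question, treating the n extractions as n draws with replacement from an urn of s balls, a label with r balls appears exactly k times with probability proportional to (r/s)^k (1 − r/s)^(n−k); the binomial coefficient is shared by every label and cancels. Summing over num(C) and dividing by the sum over num(C ∪ E) gives P(x ∈ C | k, n). The sketch below computes this in log space to avoid underflow for large k, and compares it with the uniform-case Poisson approximation. Function names are hypothetical, and the sketch assumes the urn holds more than one label.

```python
from __future__ import annotations

import math


def _log_weight(r: int, s: int, k: int, n: int) -> float:
    """log((r/s)**k * (1 - r/s)**(n - k)), binomial coefficient omitted."""
    q = r / s
    return k * math.log(q) + (n - k) * math.log1p(-q)


def p_in_C(k: int, n: int, num_C: list[int], num_E: list[int]) -> float:
    """P(x in C | x appears k times in n draws) for a single urn."""
    s = sum(num_C) + sum(num_E)  # total number of balls
    log_c = [_log_weight(r, s, k, n) for r in num_C]
    log_e = [_log_weight(r, s, k, n) for r in num_E]
    top = max(log_c + log_e)  # shift by the max before exp() to avoid underflow
    w_c = sum(math.exp(v - top) for v in log_c)
    w_e = sum(math.exp(v - top) for v in log_e)
    return w_c / (w_c + w_e)


def uniform_odds_poisson(k: int, n: int, size_C: int, size_E: int,
                         R_C: float, R_E: float) -> float:
    """Uniform case, Poisson approximation: (|C|/|E|)(R_C/R_E)^k e^{n(R_E - R_C)/s}."""
    s = size_C * R_C + size_E * R_E
    return (size_C / size_E) * (R_C / R_E) ** k * math.exp(n * (R_E - R_C) / s)


# The example urn: num(C) = {2, 2, 1, 1, 1}, num(E) = {2, 1}.
print(round(p_in_C(k=2, n=10, num_C=[2, 2, 1, 1, 1], num_E=[2, 1]), 3))

# Hypothetical uniform urn: exact odds vs. the Poisson approximation.
p = p_in_C(k=3, n=2000, num_C=[10] * 1000, num_E=[1] * 10000)
print(p / (1 - p), uniform_odds_poisson(3, 2000, 1000, 10000, 10, 1))
```

With these made-up sizes the exact and approximate odds agree to within about 0.1%, since n(R_C/s) is small relative to the correction terms the Poisson step drops.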

Parameter Estimation
Supervised learning
– Differential Evolution (maximizing conditional likelihood)
Unsupervised learning
– Growing interest in IE without hand-tagged training data (e.g. DIPRE; Snowball; KnowItAll; Riloff and Jones 1999; Lin, Yangarber, and Grishman 2003)
– How to estimate num(C) and num(E)?

Unsupervised Parameter Estimation
EM, with additional assumptions:
• |E| = 1,000,000
• z_E = 1
• p is given (p = 0.9 for KnowItAll patterns)

EM Process
EM for unsupervised IE:
– E-step: Assign probabilities to extracted facts using URNS.
– M-step:
  1. Estimate z_C by linear regression on a log-log scale.
  2. Set |C| equal to the expected number of true labels extracted, plus unseen true labels (using Good-Turing estimation).

4. Experimental Results
Previous approach: PMI (in KnowItAll, inspired by Turney, 2001)
$\mathrm{PMI}(\text{“<City> hotels”}, \text{“Tacoma”}) = \mathrm{Hits}(\text{“Tacoma hotels”}) \,/\, \mathrm{Hits}(\text{“Tacoma”})$
– Expensive: several hit-count queries per extraction
– Using URNS improves efficiency by ~8x
– ‘Bootstrapped’ training data not representative
– Probabilities are polarized (Naïve Bayes)

[Figure: Unsupervised Likelihood Performance. Deviation from ideal log likelihood (0 to 5) for urns, noisy-or, and pmi on City, Film, Country, and MayorOf.]
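The first M-step of the EM process can be sketched as an ordinary least-squares fit of log frequency against log rank; under the Zipf assumption the slope is −z_C. This assumes the fit runs on positive (expected) counts of the labels currently believed to be in C; `estimate_zipf_exponent` is a hypothetical name, and the Good-Turing step for |C| is not shown.

```python
from __future__ import annotations

import math


def estimate_zipf_exponent(counts: list[float]) -> float:
    """Least-squares fit of log(count) = a - z * log(rank); returns z.

    Counts must be positive and there must be at least two; rank 1 is the most frequent label.
    """
    ys = [math.log(c) for c in sorted(counts, reverse=True)]
    xs = [math.log(rank) for rank in range(1, len(ys) + 1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the fitted slope is -z


# Hypothetical check: counts taken exactly from a Zipf curve with z = 1.5.
synthetic = [1000 * rank ** -1.5 for rank in range(1, 201)]
print(round(estimate_zipf_exponent(synthetic), 6))  # 1.5
```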
