Collective Entity Resolution in Relational Data

Collective Entity Resolution in Relational Data (contd)
CompSci 590.03
Instructor: Ashwin Machanavajjhala
Slides adapted from [Singla et al ICDM06], [Rastogi et al VLDB '11]
Lecture 22 : 590.02 Spring 13

This class
• Collective Entity Resolution using Markov Logic Networks
• Scaling Collective Entity Resolution

Markov Logic [Richardson & Domingos, 06]
• A logical KB is a set of hard constraints on the set of possible worlds
• Let us make them soft constraints
• When a world violates a formula, it becomes less probable but not impossible
• Give each formula a weight
  – Higher weight ⇒ Stronger constraint

$P(\text{world}) \propto \exp\left(\sum \text{weights of formulas it satisfies}\right)$
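As a rough illustration of the proportionality above, the sketch below normalizes over a small, hand-picked set of worlds. The formula names (`f1`, `f2`), their weights, and the three worlds are hypothetical and not taken from the lecture.

```python
import math

def world_probabilities(weights, satisfied):
    """weights: formula name -> weight.
    satisfied: world name -> set of formula names that world satisfies.
    Returns P(world), proportional to exp(sum of satisfied weights)."""
    scores = {
        world: math.exp(sum(weights[f] for f in formulas))
        for world, formulas in satisfied.items()
    }
    z = sum(scores.values())  # normalization over the listed worlds only
    return {world: s / z for world, s in scores.items()}

# Hypothetical weights and worlds, for illustration only.
weights = {"f1": 1.5, "f2": 0.5}
satisfied = {"A": {"f1", "f2"}, "B": {"f2"}, "C": set()}
probs = world_probabilities(weights, satisfied)
```

World B violates `f1` and world C violates both formulas, so each is less probable than A, but neither has probability zero.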

Markov Logic
• A Markov Logic Network (MLN) is a set of pairs (F, w) where
  – F is a formula in first-order logic
  – w is a real number

$P(X = x) = \frac{1}{Z} \exp\left(\sum_{i \in F} w_i \, n_i(x)\right)$

where Z is the normalization constant, n_i(x) is the number of true groundings of the i-th clause in x, and the sum iterates over all first-order MLN formulas.
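To make n_i(x) and Z concrete, here is a minimal sketch for an MLN with a single formula, Equals(x, y) ⇒ Equals(y, x), over a hypothetical two-constant domain. It enumerates all 16 worlds by brute force, which is feasible only at this toy scale; the weight value is an assumption.

```python
import itertools
import math

CONSTANTS = ["a", "b"]  # hypothetical domain
ATOMS = list(itertools.product(CONSTANTS, repeat=2))  # ground atoms Equals(x, y)

def n_symmetry(world):
    """Number of true groundings of Equals(x, y) => Equals(y, x).
    world maps each (x, y) pair to True or False."""
    return sum(1 for (x, y) in ATOMS if (not world[(x, y)]) or world[(y, x)])

def mln_distribution(w):
    """Return all worlds and P(X = x) = exp(w * n(x)) / Z for each of them."""
    worlds = [
        dict(zip(ATOMS, values))
        for values in itertools.product([False, True], repeat=len(ATOMS))
    ]
    scores = [math.exp(w * n_symmetry(world)) for world in worlds]
    z = sum(scores)  # normalization constant Z
    return worlds, [s / z for s in scores]

worlds, probs = mln_distribution(2.0)
```

A world in which Equals(a, b) holds but Equals(b, a) does not has one false grounding, so it is less probable than a symmetric world by a factor of exp(w).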

Inference
• Given weights, the probability of a world can be computed using the following techniques
• MCMC
• Gibbs Sampling
• WalkSAT
  – Find an assignment of truth values to variables that maximizes the total weight of the satisfied formulae (or clauses)
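The WalkSAT bullet can be sketched as a weighted local search in the style of MaxWalkSAT. This is a simplified illustration, not the lecture's implementation: it assumes positive clause weights, and the clause encoding, parameter names, and example clauses are all hypothetical.

```python
import random

def satisfied_weight(clauses, assign):
    """Total weight of clauses that have at least one true literal."""
    return sum(w for w, lits in clauses
               if any(assign[v] == sign for v, sign in lits))

def max_walksat(clauses, variables, max_flips=1000, p_noise=0.5, seed=0):
    """clauses: list of (weight, [(variable, required_truth_value), ...]).
    Returns the best assignment seen and its satisfied weight."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in variables}
    best, best_w = dict(assign), satisfied_weight(clauses, assign)

    def weight_if_flipped(v):
        assign[v] = not assign[v]
        w = satisfied_weight(clauses, assign)
        assign[v] = not assign[v]  # undo the trial flip
        return w

    for _ in range(max_flips):
        unsat = [lits for _, lits in clauses
                 if not any(assign[v] == sign for v, sign in lits)]
        if not unsat:
            break  # every clause is satisfied
        lits = rng.choice(unsat)
        if rng.random() < p_noise:
            var = rng.choice(lits)[0]  # random-walk move
        else:
            var = max((v for v, _ in lits), key=weight_if_flipped)  # greedy move
        assign[var] = not assign[var]
        w = satisfied_weight(clauses, assign)
        if w > best_w:
            best, best_w = dict(assign), w
    return best, best_w

# Hypothetical clauses: a, (not a or b), not b. They cannot all hold at once.
clauses = [(2.0, [("a", True)]),
           (1.0, [("a", False), ("b", True)]),
           (0.5, [("b", False)])]
best, best_w = max_walksat(clauses, ["a", "b"])
```

Because the search is randomized, it returns the best assignment it happened to visit, which is not guaranteed to be optimal on larger problems.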

Problem Formulation
• Given
  – A database of records representing entities in the real world e.g. citations
  – A set of fields e.g. author, title, venue
  – Each record represented as a set of typed predicates e.g. HasAuthor(citation, author), HasVenue(citation, venue)
• Goal
  – To determine which of the records/fields refer to the same underlying entity

Example: Bibliography Database

Citation  Title                         Author      Venue
C1        Entity Resolution             J. Cox      ICDM 06
C2        Entity Resolution and Logic   Cox J.      Sixth ICDM
C3        Learning Boolean Formulas     Jacob C.    ICDM 06
C4        Learning of Boolean Formulas  Jacob Coxe  Sixth ICDM
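The table can be turned into the typed-predicate representation from the problem formulation. The sketch below is one possible encoding; `HasTitle` is assumed by analogy with the `HasAuthor` and `HasVenue` predicates named in the slides, and the tuple layout is an illustrative choice.

```python
# Rows of the example bibliography table: (citation, title, author, venue).
CITATIONS = [
    ("C1", "Entity Resolution", "J. Cox", "ICDM 06"),
    ("C2", "Entity Resolution and Logic", "Cox J.", "Sixth ICDM"),
    ("C3", "Learning Boolean Formulas", "Jacob C.", "ICDM 06"),
    ("C4", "Learning of Boolean Formulas", "Jacob Coxe", "Sixth ICDM"),
]

def evidence_atoms(rows):
    """Return ground evidence atoms as (predicate, citation, value) tuples."""
    atoms = set()
    for cid, title, author, venue in rows:
        atoms.add(("HasTitle", cid, title))    # assumed predicate name
        atoms.add(("HasAuthor", cid, author))
        atoms.add(("HasVenue", cid, venue))
    return atoms

atoms = evidence_atoms(CITATIONS)
```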

Problem Formulation
• Entities in the real world represented by one or more strings appearing in the DB e.g. "J. Cox", "Cox J."
• String constant for each record e.g. "C1", "C2"
• Goal: for each pair of string constants <x1, x2> of the same type, is x1 = x2?
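Enumerating the query pairs is straightforward: only constants of the same type are compared. A minimal sketch, using constants from the bibliography example and hypothetical type names:

```python
from itertools import combinations

# Constants grouped by type; the type names are illustrative.
CONSTANTS_BY_TYPE = {
    "citation": ["C1", "C2", "C3", "C4"],
    "author": ["J. Cox", "Cox J.", "Jacob C.", "Jacob Coxe"],
    "venue": ["ICDM 06", "Sixth ICDM", "ICDM 06", "Sixth ICDM"],
}

def candidate_pairs(constants_by_type):
    """For each type, list every unordered pair of distinct constants."""
    return {
        t: list(combinations(sorted(set(constants)), 2))
        for t, constants in constants_by_type.items()
    }

pairs = candidate_pairs(CONSTANTS_BY_TYPE)
```

Each pair <x1, x2> corresponds to one ground query atom asking whether x1 = x2. The number of pairs grows quadratically with the number of constants of a type.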

Handling Equality
• Introduce Equals(x, y) for x = y
• Introduce the axioms of equality
  – Reflexivity: x = x
  – Symmetry: x = y ⇒ y = x
  – Transitivity: x = y ∧ y = z ⇒ z = x
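One way to see what the three axioms imply: any set of asserted equalities must be closed into equivalence classes. The sketch below computes that closure with a union-find structure. This is a standard technique used here for illustration, not something the slides prescribe.

```python
def equivalence_classes(constants, equal_pairs):
    """Close equal_pairs under reflexivity, symmetry, and transitivity.
    Returns the resulting classes as a sorted list of sorted lists."""
    parent = {c: c for c in constants}

    def find(c):
        # Follow parent links to the class representative, halving paths.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for x, y in equal_pairs:
        parent[find(x)] = find(y)  # merge the two classes

    classes = {}
    for c in constants:
        classes.setdefault(find(c), set()).add(c)
    return sorted(sorted(members) for members in classes.values())

classes = equivalence_classes(["C1", "C2", "C3", "C4"],
                              [("C1", "C2"), ("C2", "C3")])
```

Asserting C1 = C2 and C2 = C3 places C1 and C3 in the same class even though that pair was never stated directly.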

Predicate Equivalence

R(x1, y1) ∧ x1 = x2 ∧ y1 = y2 ⇒ R(x2, y2)

• If x1 = x2 and y1 = y2, then whenever x1 and y1 are related, x2 and y2 are also related.
  – Hard constraints, like the equality axioms.
  – Infinite weight

Reverse Predicate Equivalence
• Same relation with the same entity gives evidence about two entities being the same

R(x1, y1) ∧ R(x2, y2) ∧ x1 = x2 ⇒ y1 = y2

• Not true logically, but gives useful information
  – Soft constraint with weights
  – Weight determines strength of the constraint
• Example

HasAuthor(C1, J. Cox) ∧ HasAuthor(C2, Cox J.) ∧ C1 = C2 ⇒ (J. Cox = Cox J.)
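The rule can be grounded against the evidence to see which equalities it lends support to. In the sketch below the function name and data layout are hypothetical, and only explicitly asserted equalities between distinct constants are considered.

```python
from itertools import combinations

def supported_equalities(relation, equal_x):
    """relation: set of (x, y) pairs for which R(x, y) holds.
    equal_x: set of frozenset({x1, x2}) currently taken to be equal.
    Returns the {y1, y2} pairs for which some grounding of
    R(x1, y1) and R(x2, y2) and x1 = x2 has a true body."""
    support = set()
    for (x1, y1), (x2, y2) in combinations(sorted(relation), 2):
        if y1 != y2 and frozenset((x1, x2)) in equal_x:
            support.add(frozenset((y1, y2)))
    return support

has_author = {("C1", "J. Cox"), ("C2", "Cox J."), ("C3", "Jacob C.")}
equal_citations = {frozenset(("C1", "C2"))}
support = supported_equalities(has_author, equal_citations)
```

Because the constraint is soft, a supported pair is only made more probable by an amount governed by the rule's weight. It is not forced to be equal.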

Model for Entity Resolution
• Model is in the form of an MLN
  – Each formula has a weight (which can be specified by humans or learnt from training data)
• Evidence predicates are relations which hold according to the DB
• Goal: Query predicate is Equality
  – Compute likelihood of the equality predicate being true
  – Equality predicates are related to evidence via predicate and reverse predicate equivalence.

Enriching the model
• Predicate and reverse predicate equivalence only fire when
  – Either x1, x2 are constants and are identical
  – Or Equals(x1, x2) is satisfied.
  – Need to be able to encode similarity functions
• Can add other constraints.

Encoding Similarity Functions
• Each field is a string composed of tokens
• Introduce HasWord(field, word)
• Use reverse predicate equivalence

HasWord(f1, w1) ∧ HasWord(f2, w2) ∧ w1 = w2 ⇒ f1 = f2

• Example

HasWord(J. Cox, Cox) ∧ HasWord(Cox J., Cox) ∧ (Cox = Cox) ⇒ (J. Cox = Cox J.)
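A minimal sketch of how HasWord atoms might be generated and how shared words show up as firing groundings. Whitespace tokenization is an assumption; the slides do not say how fields are split into tokens.

```python
def has_word(field):
    """Ground HasWord(field, word) atoms, assuming whitespace tokenization."""
    return {(field, word) for word in field.split()}

def shared_words(f1, f2):
    """Words w with HasWord(f1, w) and HasWord(f2, w). Each one yields a
    grounding of the rule whose body is true with w1 = w2 = w."""
    return {w for _, w in has_word(f1)} & {w for _, w in has_word(f2)}

common = shared_words("J. Cox", "Cox J.")
```

Under this tokenization "J. Cox" and "Cox J." share both of their tokens, while "J. Cox" and "Jacob Coxe" share none, which is why word-level matching alone cannot handle spelling variants.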

Encoding Similarity

HasWord(f1, w1) ∧ HasWord(f2, w2) ∧ w1 = w2 ⇒ f1 = f2

• If these rules have the same weight w for all words,
  $\Pr[f_1 = f_2 \mid n \text{ words in common}] = e^{wn} / (e^{wn} + 1)$
• Different weight for each word
  – Similar to a learnable similarity measure of [Bilenko & Mooney 2003]
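The closed form follows from comparing two worlds that differ only in whether f1 = f2 holds. With n words in common, the world where f1 = f2 is true satisfies n more groundings than the world where it is false, so the unnormalized scores are in the ratio e^{wn} : 1, assuming no other formulas mention f1 = f2. A small sketch of the resulting function, with a hypothetical function name:

```python
import math

def match_probability(w, n):
    """Pr[f1 = f2 | n words in common] = e^{wn} / (e^{wn} + 1),
    written in the equivalent logistic form 1 / (1 + e^{-wn})."""
    return 1.0 / (1.0 + math.exp(-w * n))

# With no words in common the two worlds are equally likely.
p0 = match_probability(1.0, 0)
p2 = match_probability(1.0, 2)
```

For positive w the probability rises towards 1 as more words are shared, so the rule acts as a word-overlap similarity passed through a logistic function.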

Two-level Similarity
• Individual words as units: Can't deal with spelling mistakes
• Break each word into ngrams: Introduce HasNgram(word, ngram)
• Use reverse predicate equivalence for word comparisons
• Gives a two-level similarity measure.
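A short sketch of the n-gram level, assuming character trigrams without padding; the slides do not fix the value of n or the padding scheme.

```python
def ngrams(word, n=3):
    """Set of character n-grams of word; empty if the word is shorter than n."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def shared_ngrams(w1, w2, n=3):
    """N-grams g with HasNgram(w1, g) and HasNgram(w2, g)."""
    return ngrams(w1, n) & ngrams(w2, n)

overlap = shared_ngrams("Cox", "Coxe")
```

"Cox" and "Coxe" are different words, so the word-level rule gives no evidence that the fields containing them match. They do share the trigram "Cox", which lets the n-gram level suggest that the two words are the same, and that in turn feeds the word-level comparison.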

Fellegi-Sunter Model
• Uses Naïve Bayes for match decisions with field comparisons used as predictors
• Simplest Version: Field similarities measured by presence/absence of words in common

HasWord(f1, w1) ∧ HasWord(f2, w2) ∧ HasField(r1, f1) ∧ HasField(r2, f2) ∧ w1 = w2 ⇒ r1 = r2

• Example

HasWord(J. Cox, Cox) ∧ HasWord(Cox J., Cox) ∧ HasAuthor(C1, J. Cox) ∧ HasAuthor(C2, Cox J.) ∧ (Cox = Cox) ⇒ (C1 = C2)
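For comparison with the MLN encoding, the classical Fellegi-Sunter score can be sketched directly as a Naïve Bayes log-likelihood ratio. Here m[field] is the probability that a field agrees given that the records match, and u[field] is the same probability given a non-match. The numeric values below are hypothetical, not estimated from any data.

```python
import math

def fs_match_score(agreements, m, u):
    """Sum of per-field log-likelihood ratios for a record pair.
    agreements: field -> True if the two records agree on that field."""
    score = 0.0
    for field, agrees in agreements.items():
        if agrees:
            score += math.log(m[field] / u[field])
        else:
            score += math.log((1.0 - m[field]) / (1.0 - u[field]))
    return score

# Hypothetical m- and u-probabilities, for illustration only.
m = {"author": 0.9, "venue": 0.8}
u = {"author": 0.1, "venue": 0.3}
score_agree = fs_match_score({"author": True, "venue": True}, m, u)
score_disagree = fs_match_score({"author": False, "venue": False}, m, u)
```

A pair is declared a match when its score exceeds a chosen threshold. The fields contribute independently, which is the Naïve Bayes assumption named on the slide.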
