Generic Entity Resolution with Negative Rules



SLIDE 1

Generic Entity Resolution with Negative Rules

Steven Whang, Hector Garcia-Molina (Stanford University)
Omar Benjelloun (Google Inc.)

SLIDE 2

Entity Resolution

  • M(r1, r2) = T, merge <r1, r2> = r12
  • M(r3, r12) = T, merge <r3, r12> = r123

      Name             SSN          Gender
r1    Pat              999-04-1234
r2    Patricia                      F
r3    Pat              999-04-1234  M
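The two merge steps on this slide can be sketched in a few lines. The slides do not define M or the merge function, so the rules below are assumptions chosen only to reproduce the example: two records match when they share an SSN or when one name is a prefix of the other, and merging takes the union of each attribute's values. The helper `rec` is hypothetical.

```python
from typing import Dict, FrozenSet, Optional

Record = Dict[str, FrozenSet[str]]  # attribute -> set of observed values


def rec(name: Optional[str] = None, ssn: Optional[str] = None,
        gender: Optional[str] = None) -> Record:
    """Build a record; a missing attribute becomes an empty value set."""
    vals = {"name": name, "ssn": ssn, "gender": gender}
    return {k: frozenset([v]) if v else frozenset() for k, v in vals.items()}


def match(r: Record, s: Record) -> bool:
    """Assumed match rule: same SSN, or one name is a prefix of the other."""
    if r["ssn"] & s["ssn"]:
        return True
    return any(a.startswith(b) or b.startswith(a)
               for a in r["name"] for b in s["name"])


def merge(r: Record, s: Record) -> Record:
    """Assumed merge rule: union the value sets attribute by attribute."""
    return {k: r[k] | s[k] for k in r}


r1 = rec("Pat", "999-04-1234")
r2 = rec("Patricia", gender="F")
r3 = rec("Pat", "999-04-1234", "M")

r12 = merge(r1, r2)     # M(r1, r2) = T under the assumed rule
r123 = merge(r3, r12)   # M(r3, r12) = T because the SSNs agree

print(sorted(r123["name"]), sorted(r123["gender"]))
```

Under these assumptions r123 ends up with both names and both gender values, which is the situation the following slides address.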

SLIDE 3

Entity Resolution

[Diagram: r1 and r2 merge into r12; r12 and r3 merge into r123]

      Name             SSN          Gender
r1    Pat              999-04-1234
r2    Patricia                      F
r3    Pat              999-04-1234  M
r12   {Pat, Patricia}  999-04-1234  F
r123  {Pat, Patricia}  999-04-1234  {F, M}

SLIDE 4

Entity Resolution

[Diagram: r1 and r2 merge into r12; r12 and r3 merge into r123; label: Negative Rules]

      Name             SSN          Gender
r1    Pat              999-04-1234
r2    Patricia                      F
r3    Pat              999-04-1234  M
r12   {Pat, Patricia}  999-04-1234  F
r123  {Pat, Patricia}  999-04-1234  {F, M}

SLIDE 5

Entity Resolution

[Diagram: r1 and r2 merge into r12; r3 is not merged; label: Negative Rules]

      Name             SSN          Gender
r1    Pat              999-04-1234
r2    Patricia                      F
r3    Pat              999-04-1234  M
r12   {Pat, Patricia}  999-04-1234  F
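The slides do not spell out the negative rule that removes r123. One plausible reading, given that r123 carries the gender values {F, M}, is a rule that rejects any record with more than one gender. The sketch below encodes that assumed rule; the function name `violates_gender_rule` is hypothetical.

```python
def violates_gender_rule(record):
    """Hypothetical unary negative rule: at most one gender per person."""
    return len(record["gender"]) > 1


# Records from the table on slide 3; empty sets stand for blank cells.
records = {
    "r1":   {"name": {"Pat"}, "ssn": {"999-04-1234"}, "gender": set()},
    "r2":   {"name": {"Patricia"}, "ssn": set(), "gender": {"F"}},
    "r3":   {"name": {"Pat"}, "ssn": {"999-04-1234"}, "gender": {"M"}},
    "r12":  {"name": {"Pat", "Patricia"}, "ssn": {"999-04-1234"},
             "gender": {"F"}},
    "r123": {"name": {"Pat", "Patricia"}, "ssn": {"999-04-1234"},
             "gender": {"F", "M"}},
}

rejected = sorted(k for k, r in records.items() if violates_gender_rule(r))
kept = sorted(k for k, r in records.items() if not violates_gender_rule(r))

print(rejected)  # ['r123']
print(kept)      # ['r1', 'r12', 'r2', 'r3']
```

Only r123 is rejected, which leaves the records shown in the table above.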

SLIDE 6

Entity Resolution

Solutions:

      Name             SSN          Gender
r1    Pat              999-04-1234
r2    Patricia                      F
r3    Pat              999-04-1234  M

Undesirable: {r13, r2} or {r12} {r1, r2}

SLIDE 7

Negative Rules

[Diagram: I (input records) → ER → R (resolved records); ER is configured with match, merge func. and negative rules]

SLIDE 8

Negative Rules

[Diagram, shown twice: I (input records) → ER → R (resolved records); ER is configured with match, merge func. and negative rules]

SLIDE 9

Why not simply extend match func.?

[Diagram: records r1, r12, r123, r2, r3 with labels M, M|F, F, M, M]

SLIDE 10

Algorithm

      Name             SSN          Gender
r1    Pat              999-04-1234
r2    Patricia                      F
r3    Pat              999-04-1234  M

[Diagram: records r12, r123, r1, r2, r3, r23, r13; label: Solution]

SLIDE 13

Algorithm

      Name             SSN          Gender
r1    Pat              999-04-1234
r2    Patricia                      F
r3    Pat              999-04-1234  M

[Diagram: records r12, r1, r2, r3, r23, r13; label: Solution]

SLIDE 15

Algorithm

      Name             SSN          Gender
r1    Pat              999-04-1234
r2    Patricia                      F
r3    Pat              999-04-1234  M

[Diagram: records r12, r1, r2, r3, r13; label: Solution]

SLIDE 18

Algorithm

      Name             SSN          Gender
r1    Pat              999-04-1234
r2    Patricia                      F
r3    Pat              999-04-1234  M

[Diagram: records r1, r2, r13; label: Solution]

SLIDE 19

Algorithm

      Name             SSN          Gender
r1    Pat              999-04-1234
r2    Patricia                      F
r3    Pat              999-04-1234  M

[Diagram: records r2, r13; label: Solution]
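The Algorithm slides start from seven records (r1, r2, r3, r12, r13, r23, r123) and end with r2 and r13. The sketch below is one way to reproduce that outcome, not the authors' algorithm as published. It assumes the match rule (shared SSN or name prefix), the merge rule (union of values), and the negative rule (at most one gender) used for illustration only, and it assumes each base record may contribute to a single output record. The solver heuristic `fewest_names` is hypothetical.

```python
from itertools import combinations

# Base records from the slide; each attribute maps to a set of values.
BASE = {
    "r1": {"name": {"Pat"}, "ssn": {"999-04-1234"}, "gender": set()},
    "r2": {"name": {"Patricia"}, "ssn": set(), "gender": {"F"}},
    "r3": {"name": {"Pat"}, "ssn": {"999-04-1234"}, "gender": {"M"}},
}


def match(a, b):
    """Assumed match rule: shared SSN, or one name is a prefix of the other."""
    if a["ssn"] & b["ssn"]:
        return True
    return any(x.startswith(y) or y.startswith(x)
               for x in a["name"] for y in b["name"])


def merge(a, b):
    """Assumed merge rule: union the value sets attribute by attribute."""
    return {k: a[k] | b[k] for k in a}


def violates(a):
    """Assumed negative rule: more than one gender value."""
    return len(a["gender"]) > 1


def closure(base):
    """Every record derivable by merging matching records, keyed by sources."""
    recs = {frozenset([k]): v for k, v in base.items()}
    changed = True
    while changed:
        changed = False
        for (s1, a), (s2, b) in combinations(list(recs.items()), 2):
            s = s1 | s2
            if s not in recs and match(a, b):
                recs[s] = merge(a, b)
                changed = True
    return recs


def solve(base, choose):
    """Repeatedly pick a record and drop everything that overlaps it."""
    recs = {s: a for s, a in closure(base).items() if not violates(a)}
    solution = []
    while recs:
        # Candidates: records not contained in a larger surviving record.
        maximal = [s for s in recs if not any(s < t for t in recs)]
        pick = choose(maximal, recs)
        solution.append(pick)
        # Assumption: a base record may contribute to one output record only.
        recs = {s: a for s, a in recs.items() if not (s & pick)}
    return solution


def fewest_names(candidates, recs):
    """Hypothetical solver heuristic: prefer records with fewer name variants."""
    return min(candidates, key=lambda s: (len(recs[s]["name"]), sorted(s)))


def label(sources):
    """Turn {'r1', 'r3'} into 'r13' to match the slide notation."""
    return "r" + "".join(sorted(k[1:] for k in sources))


all_labels = sorted(label(s) for s in closure(BASE))
result = sorted(label(s) for s in solve(BASE, fewest_names))
print(all_labels)  # ['r1', 'r12', 'r123', 'r13', 'r2', 'r23', 'r3']
print(result)      # ['r13', 'r2']
```

The choice between r12 and r13 is left to the `choose` argument because the slides show a solver making that decision. A different heuristic would yield {r12, r3} instead.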

SLIDE 20

Resolving Inconsistencies

[Diagram: options for an inconsistent pair r1, r2]

  • Discard: r1 r2
  • Forced Merge: r12
  • Override: r1 r2

SLIDE 21

Precision and Recall

[Plot. Labels: Match and Merge Func., Discard, Forced Merge, Solver, Best Point]

SLIDE 22

Runtime

[Plot. Series: General Alg., Enhanced Alg.]

SLIDE 23

Negative Rules Summary

  • Negative Rules can improve the precision and recall of Entity Resolution
  • Entity Resolution with Negative Rules is very expensive and should be used within buckets after blocking
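To illustrate the second point, here is a minimal sketch of running an ER procedure inside buckets. The slides do not say how blocking is done; the first-letter blocking key, the sample names, and all function names below are assumptions for illustration. `resolve` stands in for any ER procedure that takes a list of records and returns a list of resolved records.

```python
from collections import defaultdict


def block(records, key):
    """Group records into buckets by a blocking key."""
    buckets = defaultdict(list)
    for r in records:
        buckets[key(r)].append(r)
    return dict(buckets)


def pair_count(n):
    """Number of record pairs a pairwise comparison would examine."""
    return n * (n - 1) // 2


def resolve_in_buckets(records, key, resolve):
    """Run the supplied ER procedure independently inside each bucket."""
    out = []
    for bucket in block(records, key).values():
        out.extend(resolve(bucket))
    return out


people = [{"name": "Pat"}, {"name": "Patricia"},
          {"name": "Omar"}, {"name": "Hector"}]


def first_letter(r):
    """Assumed blocking key: lowercase first letter of the name."""
    return r["name"][0].lower()


buckets = block(people, first_letter)
pairs_without = pair_count(len(people))
pairs_with = sum(pair_count(len(b)) for b in buckets.values())
print(pairs_without, pairs_with)  # 6 1
```

The trade-off is that records placed in different buckets are never compared, so the blocking key has to keep likely matches together.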

SLIDE 24

Evolving Rules

[Diagram: I (input records) → ER with old match, merge func. → R (resolved records)]

SLIDE 25

Evolving Rules

[Diagram: I (input records) → ER with old match, merge func. → R (resolved records); ER with new match, merge func. → S (resolved records)]

SLIDE 26

Evolving Rules

[Diagram: I (input records) → ER with old match, merge func. → R (resolved records); ER with new match, merge func. → S (resolved records); Merge Undo, T (resolved records), ER]

SLIDE 27

ER in the InfoLab

  • Generic ER
  • Confidences
  • Distributed ER
  • Negative Rules
  • Evolving Rules
  • Blocking