Automatic Merging of Pedigree Information


Annual Workshop on Family History Technology, April 3, 2003. Sue Dintelman and Tim Maness, Pleiades Software Development, Inc.


SLIDE 1

Pleiades Software Development, Inc.

Automatic Merging of Pedigree Information

Annual Workshop on Family History Technology
April 3, 2003
Sue Dintelman and Tim Maness

SLIDE 2

Source of Duplicates

  • Common Ancestry Trees
      – Most large pedigrees have branches that intermarry
  • Combining Data Sources
      – Working with other family members to build a common genealogy
      – Utilizing on-line or other sources to expand your genealogy

SLIDE 3

Current Solutions

  • Not automated
  • Utilize limited clustering options
  • Utilize limited family information (parents’ names)

SLIDE 4

Goals for Merge Utility

  • Automatic
  • Fast
  • Accurate
  • Eliminate duplicates in a single family database
  • Combine multiple family databases

SLIDE 5

Record Linking Background

  • Decide if two records are for the same individual
  • Use sum of weights for a comparison of each common field in the records
  • Use a cutoff score to choose “true” links
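The scoring idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, the agreement and disagreement weights, and the cutoff value are all hypothetical and are not taken from the presentation.

```python
# Hypothetical weights per field: (weight if values agree, weight if they differ).
WEIGHTS = {
    "surname": (4.0, -3.0),
    "given_name": (3.0, -2.0),
    "birth_year": (5.0, -4.0),
    "birth_place": (2.0, -1.0),
}
CUTOFF = 8.0  # hypothetical threshold for accepting a link


def link_score(a: dict, b: dict) -> float:
    """Sum the weights over the fields that are present in both records."""
    total = 0.0
    for field, (agree, disagree) in WEIGHTS.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # field is not common to both records, so it adds no evidence
        total += agree if va == vb else disagree
    return total


def is_link(a: dict, b: dict) -> bool:
    """Accept the pair as a "true" link when the score reaches the cutoff."""
    return link_score(a, b) >= CUTOFF


rec_a = {"surname": "Kimball", "given_name": "Lanette", "birth_year": 1905}
rec_b = {"surname": "Kimball", "given_name": "Lanette", "birth_year": 1905,
         "birth_place": "Nauvoo"}
rec_c = {"surname": "Kimball", "given_name": "Mary", "birth_year": 1905}
```

Here `rec_a` and `rec_b` agree on three common fields and pass the cutoff, while `rec_a` and `rec_c` share a surname and birth year but fall short because the given names differ.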

SLIDE 6

Sample Scores

SLIDE 7

Problems Linking Individuals in Family Data

  • Few fields that can actually be compared (name, birth date and place, death date and place)
  • Many names will be similar or identical because of naming conventions
  • Many places will be the same because these are families

SLIDE 8

Advantages Linking Individuals in Family Data

  • Family members provide additional field values for comparison
  • Additional family information helps prevent incorrect matches
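One way to picture how relatives add comparison fields is a small family score that sits alongside the individual score. The sketch below is an assumption-laden illustration: the roles compared, the weights, and the use of a Jaccard overlap on children's names are hypothetical choices, not the method described in the talk.

```python
def children_overlap(a_children, b_children) -> float:
    """Jaccard overlap of children's given names (0.0 when neither side lists any)."""
    a, b = set(a_children), set(b_children)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def family_score(a: dict, b: dict) -> float:
    """Score agreement between the relatives recorded for two individuals."""
    total = 0.0
    for role in ("father", "mother", "spouse"):
        ra, rb = a.get(role), b.get(role)
        if ra is None or rb is None:
            continue  # relative unknown on one side, so no evidence either way
        total += 2.0 if ra == rb else -2.0  # hypothetical weights
    # Hypothetical weight of 3.0 for a complete match of the children lists.
    total += 3.0 * children_overlap(a.get("children", ()), b.get("children", ()))
    return total


# Three records with the same name; only the family information separates them.
john_1 = {"spouse": "Maria Babcock", "children": ["Ann", "John", "Alex"]}
john_2 = {"spouse": "Maria Babcock", "children": ["Ann", "John", "Alex", "Samantha"]}
john_3 = {"spouse": "Sarah Hill", "children": ["Peter"]}
```

With these made-up records, `john_1` and `john_2` gain support from a shared spouse and overlapping children, while `john_1` and `john_3` are pushed apart even though the individuals' own fields could look identical.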

SLIDE 9

Other Record Linking Considerations

  • Misspellings of names and places
  • Incorrect dates
  • Initial inconsistencies
      – Any family database with 20+ generations has some type of serious inconsistency
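Misspellings mean that an exact string comparison is too strict. A minimal sketch of a tolerant comparison, using the standard library's `difflib.SequenceMatcher`, is shown here; the 0.85 threshold is an assumed value chosen for illustration, and the talk does not say which similarity measure was used.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity in the range 0..1, ignoring case and whitespace."""
    norm_a = "".join(a.lower().split())
    norm_b = "".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def names_agree(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as agreeing when their similarity reaches the threshold."""
    return name_similarity(a, b) >= threshold
```

Under this sketch "Anderson" and "Andersen" agree, while "Jones" and "Smith" do not. A real tool would also need to handle place names and abbreviations, which this example leaves out.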

SLIDE 10

The Process

  • Data preparation
  • Find initial duplicates
  • Use a recursive process to find other duplicates

SLIDE 11

Data Source Preparation

  • Find loops (an individual is his own ancestor)
  • Find inconsistent information (a person is born before his parents)
  • Identify connected components
  • Pre-process names, places and dates
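The first three preparation checks can be sketched as plain graph routines. The data layout is an assumption made for this example: `parents` maps each person id to a list of parent ids, and `birth_year` maps ids to years. Spouse links are ignored when building components, which a real tool would probably include.

```python
from collections import deque


def find_loops(parents: dict) -> set:
    """Return the ids of individuals who appear among their own ancestors."""
    looped = set()
    for person in parents:
        stack, seen = list(parents.get(person, ())), set()
        while stack:
            ancestor = stack.pop()
            if ancestor == person:
                looped.add(person)
                break
            if ancestor not in seen:
                seen.add(ancestor)
                stack.extend(parents.get(ancestor, ()))
    return looped


def born_before_parent(parents: dict, birth_year: dict) -> list:
    """Return (child, parent) pairs where the child is not born after the parent."""
    problems = []
    for child, parent_ids in parents.items():
        for parent in parent_ids:
            cy, py = birth_year.get(child), birth_year.get(parent)
            if cy is not None and py is not None and cy <= py:
                problems.append((child, parent))
    return problems


def connected_components(parents: dict) -> list:
    """Group individuals joined by any chain of parent-child links."""
    neighbors = {}
    for child, parent_ids in parents.items():
        neighbors.setdefault(child, set())
        for parent in parent_ids:
            neighbors[child].add(parent)
            neighbors.setdefault(parent, set()).add(child)
    components, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbors[node] - seen:
                seen.add(other)
                queue.append(other)
        components.append(component)
    return components
```

Each function reports problems rather than repairing them, so the results could feed a review list before any merging starts.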

SLIDE 12

Generate Duplicate List

  • Cluster using last name variation
      – Transducer
  • Compute score
      – Individual component
      – Family component
  • Choose the links with the highest scores
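The clustering and selection steps might look like the sketch below. Note the substitution: the slide names a transducer for last name variation, but its details are not given, so this example swaps in a crude hypothetical `surname_key` function instead. The rule that each record joins at most one link is also an assumption made to keep the example small.

```python
from collections import defaultdict
from itertools import combinations


def surname_key(surname: str) -> str:
    """Crude key: first letter, then consonants, skipping a repeat of the last kept letter."""
    letters = [ch for ch in surname.lower() if ch.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    for ch in letters[1:]:
        if ch in "aeiouy" or ch == key[-1]:
            continue
        key.append(ch)
    return "".join(key)


def candidate_pairs(records: dict) -> list:
    """Pair up only the records whose surnames fall in the same cluster."""
    clusters = defaultdict(list)
    for rid, rec in records.items():
        clusters[surname_key(rec["surname"])].append(rid)
    pairs = []
    for ids in clusters.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


def best_links(scored: dict, cutoff: float) -> list:
    """Greedily keep the highest-scoring pairs, using each record at most once."""
    chosen, used = [], set()
    for (a, b), score in sorted(scored.items(), key=lambda item: -item[1]):
        if score >= cutoff and a not in used and b not in used:
            chosen.append((a, b))
            used.update((a, b))
    return chosen
```

Clustering first keeps the number of scored pairs small, since only records inside the same surname cluster are ever compared. The `scored` argument is expected to map each candidate pair to its combined individual and family score.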

SLIDE 13

Merge Duplicates

For each pair of duplicates:
  • Combine data
  • Recursively consider the relatives of the duplicates
  • Add any new duplicates to the list
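The loop on this slide can be sketched with a work queue standing in for literal recursion. Everything named here is hypothetical: `relatives` maps a record id to a dict of role to related id, and `is_duplicate` is a placeholder for whatever scoring decides that two relatives match. "Combine data" is reduced to recording which id survives.

```python
from collections import deque


def merge_all(initial_pairs, relatives: dict, is_duplicate) -> dict:
    """Merge duplicate pairs and follow their relatives to find further duplicates.

    Returns a dict mapping each merged-away id to the id it was merged into.
    """
    merged_into = {}

    def resolve(rid):
        # Follow earlier merges to the id that currently represents this record.
        while rid in merged_into:
            rid = merged_into[rid]
        return rid

    queue = deque(initial_pairs)
    while queue:
        a, b = (resolve(rid) for rid in queue.popleft())
        if a == b:
            continue  # already merged through another path
        merged_into[b] = a  # combine data: here we only record the survivor
        rel_a, rel_b = relatives.get(a, {}), relatives.get(b, {})
        for role in rel_a.keys() & rel_b.keys():
            ra, rb = resolve(rel_a[role]), resolve(rel_b[role])
            if ra != rb and is_duplicate(ra, rb):
                queue.append((ra, rb))  # add the new duplicate to the list
    return merged_into
```

Starting from a single confirmed pair, the queue walks outward through fathers, mothers, and any other shared roles, so one good initial match can surface duplicates several generations away.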

SLIDE 14

New Duplicate

  • Misspelling:
      – Jones, Jerrolyn, Mary
      – Jonesanderson, Jerrolyn, Mary
  • Duplicate sib:
      – Kimball, Lanette 3/4/1905
      – Kimball, Lannette 0/0/1905
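The duplicate sibling example suggests that dates with unknown parts still need to be compared. The sketch below assumes that a 0 in the month or day position means "unknown", which is an interpretation of the slide's 0/0/1905 notation and not something the talk states.

```python
def parse_date(text: str) -> tuple:
    """Parse 'M/D/YYYY' into (month, day, year), turning 0 into None (unknown)."""
    month, day, year = (int(part) for part in text.split("/"))
    return tuple(value or None for value in (month, day, year))


def dates_compatible(a: str, b: str) -> bool:
    """True when no known component of one date contradicts the other."""
    return all(
        x is None or y is None or x == y
        for x, y in zip(parse_date(a), parse_date(b))
    )
```

With this rule, 3/4/1905 and 0/0/1905 are compatible because only the year is known on both sides, while 3/4/1905 and 0/0/1906 are not.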

SLIDE 15

The Merge Reports

  • List of people who merged
  • List of new people
  • List of parent problems

SLIDE 16

Example Parent Problem

Jonathan Anderson, born 07/07/1848 Nauvoo, Hancock, OH
Spouse: Maria Babcock, born 08/09/1852 Nauvoo, Hancock, OH (five children Ann, John, Alex, Samantha, Elizabeth)
Mother: Emily Adams, born 02/19/1823 Pomphret, Chautauqua, NY
Father: Jonathan P. Anderson, born 10/28/1824 Wartrace Creek, Bedford, TN

Jonathan Anderson, born 07/07/1848 Nauvoo, Hancock, OH
Spouse: Maria Babcock, born 08/09/1852 Nauvoo, Hancock, OH (five children Ann, John, Alex, Samantha, Elizabeth)
Mother: Theresa Johnson, born 04/17/1825 New York City, NY
Father: Jonathan K. Anderson, born 08/15/1820 Weakly, TN
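A parent problem like this one could be detected at merge time by comparing the parents recorded on each side. This is a hypothetical sketch: the record layout and the `parent_problems` helper are illustrative, and the talk does not describe how the report is actually produced.

```python
def parent_problems(a: dict, b: dict) -> list:
    """Return (role, parent_in_a, parent_in_b) for each parent the records disagree on."""
    problems = []
    for role in ("mother", "father"):
        pa, pb = a.get(role), b.get(role)
        if pa and pb and pa != pb:
            problems.append((role, pa, pb))
    return problems


# The two records from the slide, reduced to the fields used here.
first = {"name": "Jonathan Anderson", "spouse": "Maria Babcock",
         "mother": "Emily Adams", "father": "Jonathan P. Anderson"}
second = {"name": "Jonathan Anderson", "spouse": "Maria Babcock",
          "mother": "Theresa Johnson", "father": "Jonathan K. Anderson"}

report = parent_problems(first, second)
```

The individual, spouse, and children agree, so the pair would likely merge, but both parents conflict and would be listed for manual review.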

SLIDE 17

GenMerge

  • Automates finding and eliminating duplicates in a single data source or when combining data sources
  • Fast
  • Accurate
  • Allows review of inconsistencies