Graph-Based Remerging of Genealogical Databases
D. Randall Wilson

SLIDE 1

Graph-Based Remerging

Graph-Based Remerging of Genealogical Databases

D. Randall Wilson

e-mail: WilsonR@fonix.com or randy@axon.cs.byu.edu

fonix Corporation, Draper, Utah, USA

Workshop on Technology for Family History and Genealogical Research
Brigham Young University, March 29, 2001

SLIDE 2

“Remerging” Problem

  • Original database
  • Share a copy
  • Both make independent updates...

Now what?

SLIDE 3

Common Approaches

  • Give up
      • One person does everything, and everyone else is uninvolved; or
      • Everyone duplicates work for themselves.
  • Visual inspection and hand-typing
  • Unix “diff” command and hand-typing
  • Match/Merge function
      • Import second database into first
      • Decide which pairs of similar people should be merged back together

Time wasters :(

SLIDE 4

Better Solutions

  • Locking
      • One person has master database
      • Others can “check out” portions
      [but overly restrictive]
  • Unique ID Numbers
      • Program assigns unique ID numbers
      • ID numbers allow automatic match/merging of identical people.
      [but ID numbers may not survive translations to/from other software]
  • Graph-Based Merging Algorithm

SLIDE 5

Graph-Based Merging

  • No need to check out (lock) portions of the database.
  • No need for ID numbers.
  • No need to examine people who have not changed.
  • Retroactive: works on databases that have already diverged.

SLIDE 6

Merging Algorithm

  • I. Sort both databases by:
      • Surname, given name
      • Birth date, birth place
      • Death date, death place
      • ID numbers, if available
  • II. Find “matching” person
      • Search the sorted lists in parallel; O(N+M) time once sorted (N and M are the sizes of the two databases).
      • Find people with the same personal information
      • Then search the relationship graph
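Steps I and II can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the `Person` record, its field names, and the helpers `match_key`, `sort_key`, and `find_matches` are all hypothetical. Both lists are sorted by the full key with the ID number last, while the parallel scan compares only personal information, so a person whose ID number was lost in translation can still be matched.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Person:
    # Hypothetical, simplified record; real databases hold many more fields.
    surname: str
    given: str
    birth_date: str = ""
    birth_place: str = ""
    death_date: str = ""
    death_place: str = ""
    id_number: Optional[str] = None

def match_key(p: Person) -> tuple:
    # Personal information used to decide whether two records "match".
    return (p.surname, p.given, p.birth_date, p.birth_place,
            p.death_date, p.death_place)

def sort_key(p: Person) -> tuple:
    # Step I ordering: personal information first, then ID number if available.
    return match_key(p) + (p.id_number or "",)

def find_matches(a: list[Person], b: list[Person]) -> list[tuple[int, int]]:
    """Step II: return (index in a, index in b) pairs with identical personal info.

    Sorting costs O(N log N + M log M); the parallel scan itself is O(N + M).
    Because match_key is a prefix of sort_key, both sorted lists are also
    ordered by match_key, which is what the scan relies on.
    """
    order_a = sorted(range(len(a)), key=lambda i: sort_key(a[i]))
    order_b = sorted(range(len(b)), key=lambda j: sort_key(b[j]))
    matches = []
    i = j = 0
    while i < len(order_a) and j < len(order_b):
        ka = match_key(a[order_a[i]])
        kb = match_key(b[order_b[j]])
        if ka == kb:
            matches.append((order_a[i], order_b[j]))
            i += 1
            j += 1
        elif ka < kb:
            i += 1  # this person has no counterpart in b: advance a
        else:
            j += 1  # this person has no counterpart in a: advance b
    return matches
```

Each returned pair is a candidate starting point for the relationship-graph search.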
SLIDE 7

Merging Algorithm (cont’d)

Search relationship graph

[Figure: two relationship graphs, one from each database, each showing an individual linked to a father, mother, spouse, and three children.]
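The relationship-graph search can be sketched as a breadth-first walk outward from one matched pair. In this hypothetical representation, each database is a mapping from person ID to personal information plus a mapping from person ID to relationship roles (`"father"`, `"mother"`, `"spouse"`, `"child"`) and related IDs; the function name `search_graph` is illustrative and not taken from the talk. A relative is accepted only when it holds the same role on both sides and carries identical personal information.

```python
from collections import deque

def search_graph(people_a, rel_a, people_b, rel_b, seed):
    """Grow matched pairs outward from one seed pair (a_id, b_id).

    people_x: person id -> personal-information key (hypothetical format)
    rel_x:    person id -> {role: [related person ids]}
    Returns a dict mapping matched ids in database A to ids in database B.
    """
    matched = {seed[0]: seed[1]}
    used_b = {seed[1]}
    queue = deque([seed])
    while queue:
        pa, pb = queue.popleft()
        for role, nbrs_a in rel_a.get(pa, {}).items():
            nbrs_b = rel_b.get(pb, {}).get(role, [])
            for na in nbrs_a:
                if na in matched:
                    continue
                # Pair na with the first unused same-role relative in B
                # that carries identical personal information.
                for nb in nbrs_b:
                    if nb not in used_b and people_a[na] == people_b[nb]:
                        matched[na] = nb
                        used_b.add(nb)
                        queue.append((na, nb))
                        break
    return matched
```

People who do not match exactly (for example, a mother whose name was corrected in one copy, or a child added only to the other copy) are left for the later steps.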

SLIDE 8

Merging Algorithm (cont’d)

Labeling subgraphs

[Figure: the same two relationship graphs, with matched people labeled by subgraph number (1 or 2) and “Continue” marking where the search continues outward.]
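Subgraph labeling can be sketched as a flood fill over matched people. The assumptions here are hypothetical: `matched` maps database-A IDs to database-B IDs (for example, the combined output of several graph searches), relationships are stored in both directions, and two matched people share a label only when they are linked by the same role in both databases. The name `label_subgraphs` is illustrative.

```python
def label_subgraphs(matched, rel_a, rel_b):
    """Assign a subgraph number (1, 2, ...) to every matched person in A.

    matched: a_id -> b_id
    rel_x:   person id -> {role: [related person ids]}, stored both ways
    """
    labels = {}
    next_label = 1
    for start in matched:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            pa = stack.pop()
            pb = matched[pa]
            for role, nbrs_a in rel_a.get(pa, {}).items():
                nbrs_b = set(rel_b.get(pb, {}).get(role, []))
                for na in nbrs_a:
                    # Same label only if the link exists, with the same role,
                    # between the counterparts in database B as well.
                    if na in matched and na not in labels and matched[na] in nbrs_b:
                        labels[na] = next_label
                        stack.append(na)
        next_label += 1
    return labels
```

A relationship that changed in one copy (such as a newly recorded father) splits the matched people into separate subgraphs.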

SLIDE 9

Merging Algorithm (cont’d)

  • III. Choose largest subgraph
  • IV. Incorporate new information
      • Additional individuals
      • Additional information
      • Conflicting information
      • [Missing information]
  • V. Connect subgraphs.

Continue until all incoming information has been included or rejected.
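Steps III and IV can be partly sketched with two small helpers. `largest_subgraph` picks the label shared by the most people (ties go to the lower label, an arbitrary choice), and `classify_fields` compares two records believed to describe the same person, such as the mother in the same role next to a matched individual, and sorts their differences into the slide's additional, conflicting, and missing categories. Both names, the dict-of-fields record format, and the use of an empty string for "no value" are assumptions; finding additional individuals and connecting subgraphs (step V) are not shown.

```python
from collections import Counter

def largest_subgraph(labels: dict) -> int:
    """Step III: return the subgraph label shared by the most people."""
    counts = Counter(labels.values())
    # Highest count wins; on a tie, prefer the lower label number.
    return max(counts, key=lambda lab: (counts[lab], -lab))

def classify_fields(mine: dict, incoming: dict) -> dict:
    """Step IV (partial): sort field-level differences for one person.

    An empty string or an absent key means the field has no value.
    """
    result = {"additional": [], "conflicting": [], "missing": []}
    for field in sorted(set(mine) | set(incoming)):
        old, new = mine.get(field, ""), incoming.get(field, "")
        if old == new:
            continue
        if not old:
            result["additional"].append(field)   # only the incoming copy has it
        elif not new:
            result["missing"].append(field)      # the incoming copy lacks it
        else:
            result["conflicting"].append(field)  # both have it, values differ
    return result
```

Keeping missing fields in their own list lets the user decide whether an absence in the incoming copy should remove anything; the slide does not say how missing information is handled.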

SLIDE 10

Uses for Graph-Based Merging

  • Collaboration with family members
      • Independent updates/work/research
      • Collect information on immediate family
  • Family history organization
      • Archivist assigns work to helpers
      • Research director, archivist, and helpers all add to the database concurrently.
  • Database on multiple computers
      • Desktop/laptop; home machine; etc.
  • Include previously excluded info
  • Find differences between databases
SLIDE 11

Advantages

(of using graph-based merging for remerging genealogy databases)

  • Much easier than manual approaches
  • Much faster than global match/merge
  • No need for checking out (locking)
  • No need for ID#s
  • Not restricted to a single platform or software package
  • Retroactive solution
  • User controls changes to their data

SLIDE 12

Further Work

  • Actual implementation
  • Identifying “similar” people (to distinguish additional individuals from additional or conflicting information)
  • Note-merging
      • Reordered notes
      • Minor changes vs. new notes
  • Multimedia
  • Global differences/style
      • “Lee Co., VA” vs. “,Lee,VA”
      • Surname capitalization
  • Remembering decisions
      • Avoid repeating the same decisions next time.
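The global style differences and remembered decisions listed above could be approached by normalizing values before comparing them and by keeping a lookup table of earlier choices. This is a hypothetical sketch of future work, not something the talk specifies: `normalize_place`, `normalize_surname`, and `DecisionLog` are illustrative names, and the rules (skip empty comma-separated fields, drop a trailing “Co.” or “County”, ignore case) cover only the two examples on the slide.

```python
from __future__ import annotations
import re

def normalize_place(place: str) -> tuple[str, ...]:
    """Reduce a place string to comparable parts (hypothetical rules)."""
    parts = []
    for part in place.split(","):
        part = part.strip()
        if not part:
            continue  # ",Lee,VA" leaves the town field empty
        # Treat "Lee Co." and "Lee County" as "Lee".
        part = re.sub(r"\s+(co\.?|county)$", "", part, flags=re.IGNORECASE)
        parts.append(part.casefold())
    return tuple(parts)

def normalize_surname(surname: str) -> str:
    """Ignore capitalization differences such as "SMITH" vs. "Smith"."""
    return surname.strip().casefold()

class DecisionLog:
    """Remember how the user resolved a given (old, new) conflict."""

    def __init__(self) -> None:
        self._choices: dict[tuple[str, str], str] = {}

    def record(self, old: str, new: str, choice: str) -> None:
        self._choices[(old, new)] = choice

    def lookup(self, old: str, new: str) -> str | None:
        # None means "not decided before; ask the user".
        return self._choices.get((old, new))
```

With these helpers, “Lee Co., VA” and “,Lee,VA” both reduce to `("lee", "va")`, so the difference would no longer be reported as a conflict.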