Novel Data Linkage Techniques
Dongwon Lee
The Pennsylvania State University http://pike.psu.edu/ dongwon@psu.edu
KOCSEA 2008
Problem Landscape
Data Linkage: Given two data collections D1 and D2, identify/link all crosswise similar data objects, as a set S, with few false positives
Abundant research in many disciplines
DB: record linkage, merge/purge, approx. join
DL: citation matching, de-duplication
AI: identity matching
NLP: word sense disambiguation
IR: name disambiguation
LIS: name authority control
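
The Data Linkage definition above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not a method from these slides: the records are invented toy strings, similarity is token-set Jaccard, and the function names (tokens, jaccard, link) and the 0.5 threshold are hypothetical choices.

```python
import re


def tokens(record):
    """Lower-case a record and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", record.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity between two record strings (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty records are treated as identical
    return len(ta & tb) / len(ta | tb)


def link(d1, d2, threshold=0.5):
    """Return S: all crosswise pairs (x in D1, y in D2) judged similar.

    Naive all-pairs comparison, O(|D1| * |D2|); the threshold is an
    assumed knob trading missed links against false positives.
    """
    return [(x, y) for x in d1 for y in d2 if jaccard(x, y) >= threshold]


# Toy collections (invented for illustration).
d1 = ["John Smith, Computer Science Dept", "Alice Jones, Physics"]
d2 = ["Smith, John (Computer Science)", "Bob Brown, Chemistry"]

s = link(d1, d2)
print(s)
```

Raising the threshold reduces false positives at the cost of missing true links; the all-pairs loop is only practical for small collections.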