
Data Linking: Capturing and Utilising Implicit Schema-Level Relations



  1. Data Linking: Capturing and Utilising Implicit Schema-Level Relations • Andriy Nikolov, Victoria Uren, Enrico Motta

  2. Data linking: current state • Automatic instance matching algorithms – SILK, ODDLinker, KnoFuss, … • Pairwise matching of datasets – Requires significant configuration effort • Transitive closure of links – Use of “reference” datasets

  3. Problems • Transitive closures often incomplete – Reference “hub” dataset is incomplete – Missing intermediate links – Direct comparison of relevant datasets is desirable • Schema heterogeneity – Which instances to compare? – Which properties are relevant?

  4. Background • KnoFuss architecture [Figure: KnoFuss architecture diagram. Recoverable labels: Knowledge fusion; Ontology integration; Knowledge base integration; Source KB; Target KB; Ontology matching; Instance transformation; Coreference resolution; Inconsistency processing]

  5. Overview • Inferring schema mappings from pre-existing instance mappings • Utilizing schema mappings to produce new instance mappings • Background knowledge: – Data-level (intermediate repositories) – Schema-level (datasets with more fine-grained schemas)

  6. Algorithm • Step 1: – Obtaining transitive closure of existing mappings [Figure: example identity cluster across DBPedia, LinkedMDB and MusicBrainz: dbpedia:Ennio_Morricone = movie:music_contributor/2490 = music:artist/a16…9fdf]
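The transitive closure in Step 1 can be sketched with a union-find structure that merges every pair of linked URIs into one identity cluster. This is a minimal illustration, not the KnoFuss implementation: the function name `identity_clusters` is hypothetical, and which pairs of resources are directly linked in the example is an assumption (the slide only shows that the three resources are equivalent).

```python
from collections import defaultdict


def identity_clusters(links):
    """Group URIs connected by equivalence links into identity clusters.

    `links` is an iterable of (uri_a, uri_b) pairs. Hypothetical helper,
    not part of any published API.
    """
    parent = {}

    def find(x):
        # Register unseen URIs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return list(clusters.values())


# Assumed direct links: both datasets point at the MusicBrainz resource.
# The MusicBrainz identifier is elided in the slide and is kept elided here.
example_links = [
    ("dbpedia:Ennio_Morricone", "music:artist/a16…9fdf"),
    ("movie:music_contributor/2490", "music:artist/a16…9fdf"),
]
example_clusters = identity_clusters(example_links)
```

With these two links the DBPedia and LinkedMDB resources end up in the same cluster even though no direct link between them was given.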

  7. Algorithm • Step 2: Inferring class and property mappings – ClassOverlap and PropertyOverlap mappings – Confidence(classes A, B) = $\frac{|c(A) \cap c(B)|}{\min(|c(A)|, |c(B)|)}$ (overlap coefficient) – Confidence(properties r1, r2) = $|c(X)| / |c(Y)|$ • X – identity clusters with equivalent values of r1 and r2 • Y – all identity clusters which have values for both r1 and r2 [Figure: movie:music_contributor/2490 (LinkedMDB) is_a movie:music_contributor; dbpedia:Ennio_Morricone (DBPedia) is_a dbpedia:Artist; identity cluster: movie:music_contributor/2490 = dbpedia:Ennio_Morricone = music:artist/a16…9fdf (MusicBrainz)]
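The two confidence measures in Step 2 can be illustrated as follows. This is a sketch under stated assumptions: c(A) is taken to be the set of identity clusters that contain at least one instance of class A, clusters are represented by arbitrary ids, and "equivalent values" is reduced to plain equality. The function names and the property names in the example are hypothetical.

```python
def class_overlap_confidence(clusters_a, clusters_b):
    """Overlap coefficient: |c(A) & c(B)| / min(|c(A)|, |c(B)|).

    Each argument is a collection of identity-cluster ids (assumed to be the
    clusters containing an instance of the respective class).
    """
    a, b = set(clusters_a), set(clusters_b)
    if not a or not b:
        return 0.0  # no evidence if either class has no clustered instances
    return len(a & b) / min(len(a), len(b))


def property_overlap_confidence(clusters, r1, r2):
    """Share of clusters with values for both r1 and r2 whose values agree.

    `clusters` is a list of dicts mapping a property name to the set of
    values seen for it inside one identity cluster. Equality stands in for
    whatever value-equivalence test a real system would use.
    """
    both = [c for c in clusters if c.get(r1) and c.get(r2)]  # set Y
    if not both:
        return 0.0
    agreeing = [c for c in both if c[r1] & c[r2]]  # set X
    return len(agreeing) / len(both)


# Class example: two shared clusters, the smaller class has three clusters.
class_conf = class_overlap_confidence({1, 2, 3, 4}, {3, 4, 5})

# Property example with hypothetical property names.
prop_clusters = [
    {"movie:name": {"Ennio Morricone"}, "rdfs:label": {"Ennio Morricone"}},
    {"movie:name": {"A"}, "rdfs:label": {"B"}},
    {"movie:name": {"C"}},  # no value for rdfs:label, so not counted in Y
]
prop_conf = property_overlap_confidence(prop_clusters, "movie:name", "rdfs:label")
```

Dividing by the smaller class means a narrow class that is fully contained in a broad one still gets confidence 1, which fits the case of a fine-grained schema mapped onto a coarse one.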

  8. Algorithm • Step 3: Inferring data patterns • Functionality restrictions • IF two equivalent movies do not have overlapping actors AND have different release dates THEN break the equivalence link • Note: – Only usable if not taken into account at the initial instance matching stage
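The movie rule from Step 3 can be written as a small filter over existing links. This is an illustrative sketch only: the record layout (`actors` as a set, `release_date` as a string) and both function names are hypothetical, and treating missing data as "do not break the link" is an assumption the slide does not state.

```python
def violates_movie_pattern(movie_a, movie_b):
    """True if two linked movies share no actors AND differ in release date."""
    actors_a, actors_b = movie_a.get("actors"), movie_b.get("actors")
    date_a, date_b = movie_a.get("release_date"), movie_b.get("release_date")
    # Assumption: without actors or dates on both sides the rule cannot fire.
    if not actors_a or not actors_b or date_a is None or date_b is None:
        return False
    return actors_a.isdisjoint(actors_b) and date_a != date_b


def split_links(links, descriptions):
    """Separate links into (kept, potentially_spurious) using the rule above."""
    kept, suspicious = [], []
    for uri_a, uri_b in links:
        if violates_movie_pattern(descriptions[uri_a], descriptions[uri_b]):
            suspicious.append((uri_a, uri_b))
        else:
            kept.append((uri_a, uri_b))
    return kept, suspicious


# Made-up records for illustration; not taken from DBPedia or LinkedMDB.
descriptions = {
    "ex:film1": {"actors": {"ex:actorA", "ex:actorB"}, "release_date": "1966"},
    "ex:film2": {"actors": {"ex:actorB"}, "release_date": "1966"},
    "ex:film3": {"actors": {"ex:actorC"}, "release_date": "1999"},
}
kept, suspicious = split_links(
    [("ex:film1", "ex:film2"), ("ex:film1", "ex:film3")], descriptions
)
```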

  9. Algorithm • Step 4: utilizing mappings and patterns – Run instance-level matching for individuals of strongly overlapping classes – Use patterns to filter out existing mappings • DBLP: SELECT ?uri WHERE { ?uri rdf:type movie:music_contributor . } • DBPedia: SELECT ?uri WHERE { ?uri rdf:type dbpedia:Artist . }
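One way to read the first part of Step 4 is: take the class mappings whose confidence is high enough, and compare only instances of those class pairs. The sketch below shows that selection step. The threshold value, the function name `candidate_pairs`, the class `ex:OtherClass` and the instance URIs other than the two from the slides are all assumptions; the slides say only "strongly overlapping" and give no cut-off.

```python
from itertools import product


def candidate_pairs(class_mappings, instances_by_class, threshold=0.8):
    """Yield instance pairs to compare, restricted to strongly overlapping classes.

    `class_mappings` maps (class_a, class_b) to a confidence in [0, 1];
    `instances_by_class` maps a class URI to an iterable of instance URIs,
    e.g. the results of one SELECT ?uri query per class.
    The default threshold is a made-up value for illustration.
    """
    for (class_a, class_b), confidence in class_mappings.items():
        if confidence >= threshold:
            yield from product(
                instances_by_class.get(class_a, ()),
                instances_by_class.get(class_b, ()),
            )


# Hypothetical confidences; only the first mapping passes the threshold.
class_mappings = {
    ("movie:music_contributor", "dbpedia:Artist"): 0.9,
    ("ex:OtherClass", "dbpedia:Artist"): 0.1,
}
instances_by_class = {
    "movie:music_contributor": ["movie:music_contributor/2490"],
    "ex:OtherClass": ["ex:thing1"],
    "dbpedia:Artist": ["dbpedia:Ennio_Morricone", "dbpedia:Some_Other_Artist"],
}
pairs = list(candidate_pairs(class_mappings, instances_by_class))
```

The resulting pairs would then be handed to an instance matcher, and the data patterns from Step 3 applied to the links that already exist.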

  10. Results • Class mappings: – Improvement in recall • Previously omitted mappings were discovered after direct comparison of instances • Data patterns – Improved precision • Filtered out spurious mappings • Identified 140 mappings between movies as “potentially spurious” • 132 identified correctly [Figure: three bar charts, DBPedia/DBLP, DBPedia/LinkedMDB and DBPedia/BookMashup, each showing Precision, Recall and F1-measure (scale 0 to 1) for Existing, KnoFuss (only) and Combined mappings]

  11. Limitations & future work • Large-scale tests – Billion Triple Challenge 2009, other repositories • Initial mappings – What to do if a repository is not connected to any other one? – Utilizing low-cost instance-matching techniques

  12. Questions? Thanks for your attention
