Enabling Dataset Trustworthiness by Exposing the Provenance of Mapping Quality Assessment and Refinement

May 17, 2023

Tom De Nies, Anastasia Dimou, Ruben Verborgh, Erik Mannens, and Rik Van de Walle (ELIS Multimedia Lab)

SLIDE 1

ELIS – Multimedia Lab

Tom De Nies, Anastasia Dimou, Ruben Verborgh, Erik Mannens, and Rik Van de Walle

Enabling Dataset Trustworthiness by Exposing the Provenance of Mapping Quality Assessment and Refinement

SLIDE 2

Context: assessing trust of RDF datasets
Mapping semi-structured data to RDF
Mapping quality assessment and refinement workflow
Capturing provenance of the workflow
Deriving Trust

Contents

SLIDE 3

How do we decide whether or not to trust an RDF dataset?
One important aspect is: where did it come from?
In a lot of cases, the RDF data was mapped from semi-structured data using a mapping language.
In our lab, we developed such a language: the RDF Mapping Language (RML), http://rml.io

Context

SLIDE 4

W3C R2RML exists to map relational databases to RDF
To map all other data formats, there’s RML
The cool thing: RML definitions are RDF themselves → they can be queried using SPARQL
The problem: not all mappings are perfect right away

Mapping semi-structured data to RDF
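
Because a mapping definition is itself a set of RDF triples, inspecting it comes down to pattern matching over those triples. The sketch below is a standard-library-only illustration of that idea: it stores a few triples as Python tuples and matches one triple pattern, the building block of a SPARQL query. The `ex:` resources, the sample mapping, and the `match` and `predicates_used` helpers are all assumptions made for illustration; in practice the mapping would be loaded into an RDF store and queried with real SPARQL.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# A tiny hand-written fragment of a mapping definition, expressed as triples.
# The ex: resources are hypothetical; the rr: terms are R2RML vocabulary.
MAPPING: List[Triple] = [
    ("ex:PersonMap", "rdf:type", "rr:TriplesMap"),
    ("ex:PersonMap", "rr:predicateObjectMap", "ex:NameMap"),
    ("ex:NameMap", "rr:predicate", "foaf:name"),
    ("ex:PersonMap", "rr:predicateObjectMap", "ex:AgeMap"),
    ("ex:AgeMap", "rr:predicate", "foaf:age"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples that agree with every non-None position of the pattern."""
    for triple in triples:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def predicates_used(triples: List[Triple]) -> List[str]:
    """Roughly: SELECT ?pred WHERE { ?map rr:predicate ?pred }."""
    return sorted(obj for _, _, obj in match(triples, p="rr:predicate"))


if __name__ == "__main__":
    print(predicates_used(MAPPING))
```

The same question asked of the generated dataset would have to scan every output triple; asked of the mapping, it only touches the handful of rules that produce them.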

SLIDE 5

Evaluate data quality during the mapping stage
Based on RDFUnit, with tests applied to mapping documents instead of data
Turns out to be much more efficient for mapping documents than for data (seconds vs. hours)
Generates violations (warnings and errors), based on which the mapping definitions are refined

Mapping quality assessment and refinement workflow
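
To make the assess-then-refine loop concrete, here is a minimal sketch in which each mapping rule is checked once against what the schema expects, yielding warnings and errors. This is not RDFUnit's API: the `Violation` record, the `EXPECTED_RANGE` table, the `assess` function, and the choice of which problems count as errors or warnings are all hypothetical. Checking one rule rather than every triple that rule would generate is consistent with the efficiency gap mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Violation:
    rule: str     # predicate of the mapping rule that failed a test
    level: str    # "error" or "warning"
    message: str


# Hypothetical schema knowledge: the datatype each predicate expects.
EXPECTED_RANGE: Dict[str, str] = {
    "foaf:age": "xsd:integer",
    "foaf:name": "xsd:string",
}


def assess(rules: Dict[str, Optional[str]]) -> List[Violation]:
    """Test mapping rules (predicate -> declared datatype), not generated data."""
    found: List[Violation] = []
    for predicate, declared in rules.items():
        expected = EXPECTED_RANGE.get(predicate)
        if expected is None:
            found.append(Violation(predicate, "warning", "predicate not in schema"))
        elif declared is None:
            found.append(Violation(predicate, "warning",
                                   f"no datatype declared, expected {expected}"))
        elif declared != expected:
            found.append(Violation(predicate, "error",
                                   f"{declared} declared, expected {expected}"))
    return found


# The mapping before and after a refinement guided by the violations.
old_rules = {"foaf:age": "xsd:string", "foaf:name": None}
new_rules = {"foaf:age": "xsd:integer", "foaf:name": "xsd:string"}

if __name__ == "__main__":
    for violation in assess(old_rules):
        print(violation.level, violation.rule, violation.message)
    print(len(assess(new_rules)), "violations after refinement")
```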

SLIDE 6

Goal: evaluate the difference (delta) in trust between the new and old dataset

[Diagram: oldRML is refined into newRML; oldRML generates oldRDF and newRML generates newRDF. What is the difference in trust between oldRDF and newRDF?]

Capturing provenance of the workflow
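
One way to picture the captured provenance is as a small graph linking each dataset to the mapping run that produced it and to the mapping document that run used. The sketch below is an assumption-laden illustration: the slides do not say which vocabulary is used, so the relation names are borrowed from W3C PROV (`prov:wasGeneratedBy`, `prov:used`, `prov:wasDerivedFrom`) as a plausible choice, and every `ex:` resource and the `mapping_behind` helper are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical provenance of one assess-and-refine round.
PROVENANCE: List[Triple] = [
    ("ex:oldRDF", "prov:wasGeneratedBy", "ex:mappingRun1"),
    ("ex:mappingRun1", "prov:used", "ex:oldRML"),
    ("ex:assessment1", "prov:used", "ex:oldRML"),
    ("ex:violations1", "prov:wasGeneratedBy", "ex:assessment1"),
    ("ex:refinement1", "prov:used", "ex:violations1"),
    ("ex:newRML", "prov:wasGeneratedBy", "ex:refinement1"),
    ("ex:newRML", "prov:wasDerivedFrom", "ex:oldRML"),
    ("ex:mappingRun2", "prov:used", "ex:newRML"),
    ("ex:newRDF", "prov:wasGeneratedBy", "ex:mappingRun2"),
]


def mapping_behind(dataset: str, triples: List[Triple]) -> Optional[str]:
    """Follow dataset -> generating activity -> first resource that activity used."""
    for subject, predicate, activity in triples:
        if subject == dataset and predicate == "prov:wasGeneratedBy":
            for subject2, predicate2, used in triples:
                if subject2 == activity and predicate2 == "prov:used":
                    return used
    return None


if __name__ == "__main__":
    print(mapping_behind("ex:newRDF", PROVENANCE))
    print(mapping_behind("ex:oldRDF", PROVENANCE))
```

With links like these recorded, a consumer of the new dataset can walk back to the mapping that generated it, and from there to the violations that motivated the refinement.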

SLIDE 7


Query the provenance for violations and see:

How many there are
How bad they are (e.g., errors can be worse than warnings)


The cool thing: it’s all RDF, so it can be done with standard reasoning tools (N3, SPARQL, …)

Deriving Trust
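
As a worked example of turning violation counts and severities into a trust delta, the sketch below weights each violation by its level and compares the old and new mappings. The slides do not give a formula, so the weights, the `penalty` and `trust_delta` functions, and the sample violation lists are all assumptions; in the workflow described here the same comparison would be expressed as a query or rule over the provenance (N3, SPARQL, …) rather than in Python.

```python
from typing import Dict, Iterable

# Assumed severity weights: an error counts for more than a warning.
WEIGHTS: Dict[str, float] = {"error": 1.0, "warning": 0.25}


def penalty(levels: Iterable[str]) -> float:
    """Sum the weights of all violations found for one mapping."""
    return sum(WEIGHTS[level] for level in levels)


def trust_delta(old_levels: Iterable[str], new_levels: Iterable[str]) -> float:
    """Positive result: the refined mapping carries a lower violation penalty."""
    return penalty(old_levels) - penalty(new_levels)


# Hypothetical violation levels queried from the provenance of each mapping.
old_violations = ["error", "error", "warning"]
new_violations = ["warning"]

if __name__ == "__main__":
    print(trust_delta(old_violations, new_violations))  # 2.0
```

A positive delta suggests the refinement improved the mapping; how that number should translate into a trust decision is left to the consumer of the dataset.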