daQ, an Ontology for Dataset Quality Information

SLIDE 1

daQ, an Ontology for Dataset Quality Information

Jeremy Debattista, Christoph Lange, Sören Auer 
 Presenter: Claus Stadler

SLIDE 2

Motivation

What are the quality aspects of a dataset for a particular domain?

  • Quality of data is subjective
  • Different domains require different quality attributes
  • Data quality is commonly defined as fitness for use

SLIDE 3

Motivation (ii)

How can we find a good quality dataset?

http://www.datahub.io

SLIDE 4

Dataset Quality Ontology

daQ (pronounced \ˈdək\)

The daQ is a light-weight, extensible vocabulary for attaching the results of quality benchmarking of a linked open dataset to that dataset.

SLIDE 5

Use Cases

Publishers are interested in publishing good quality data. But how can they convince the consumer?

  • Is the published data fit to use for its domain?
  • How can publishers calculate the quality of a dataset and make this metadata part of it?

SLIDE 6

Use Cases (ii)

Consumers are interested in finding datasets that are fit for use in their domain.

  • How can consumers discover certain aspects of a potential dataset?
  • How can consumers retrieve datasets?

SLIDE 7

6th Star?

[Figure: star ladder with the labels OL, RE, OF, URI, LD, DAQ]

As a Consumer, you can do all that ★★★★★ enables you to do, and additionally:

✔ discover good quality datasets

As a Publisher, …

✔ make your data conform to domain quality metrics
✔ make your data more discoverable on certain quality aspects

http://www.5stardata.info

SLIDE 8

daQ Ontology

http://purl.org/eis/vocab/daq

[Diagram A: nodes rdfg:Graph, QualityGraph and rdfs:Resource; edge computedOn]

A daq:QualityGraph is a Named Graph:

✔ Separate aggregated metadata
✔ Digitally signed graphs using swp:assertedBy (Semantic Web Publishing, Chris Bizer)

A daq:QualityGraph can in theory be computed on any resource, but typically on a dataset.
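
As a rough illustration of the named-graph idea on this slide, the Python sketch below stores quality metadata as quads (subject, predicate, object, graph name), so that it stays separate from the dataset's own triples. Only daq:QualityGraph and computedOn come from the slides. The `#` at the end of the namespace, the example.org IRIs and the two helper functions are assumptions made for the example.

```python
# Minimal sketch: a quality graph as a named graph, stored as plain quads.
# Assumption: the daQ namespace ends in '#'; the slides only give the base URL.
DAQ = "http://purl.org/eis/vocab/daq#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def quality_graph_quads(graph_iri, resource_iri):
    """Return the quads that declare a quality graph and link it to a resource.

    Each quad is (subject, predicate, object, graph name). Using the quality
    graph's own IRI as the graph name keeps the metadata apart from the data.
    """
    return [
        (graph_iri, RDF_TYPE, DAQ + "QualityGraph", graph_iri),
        (graph_iri, DAQ + "computedOn", resource_iri, graph_iri),
    ]


def to_nquads(quads):
    """Serialise IRI-only quads as N-Quads lines (hypothetical helper)."""
    return "\n".join("<%s> <%s> <%s> <%s> ." % q for q in quads)


# Hypothetical IRIs, for illustration only.
quads = quality_graph_quads(
    "http://example.org/quality/graph1",
    "http://example.org/dataset/d1",
)
print(to_nquads(quads))
```

Because every statement carries the graph name, a consumer can fetch or drop the whole quality graph without touching the dataset itself.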

SLIDE 9

daQ Ontology (ii)

[Diagram B: classes Category, Dimension, Metric; properties hasDimension, hasMetric, value, dateComputed, requires; other nodes rdfs:Resource and xsd:dateTime]

The daQ ontology is a generic framework, where classes and properties are defined in an abstract manner.

SLIDE 10

Category

A category represents the highest level of quality assessment.

[Diagram B, repeated]

SLIDE 11

Dimension

A dimension groups one or more metrics.

[Diagram B, repeated]

SLIDE 12

Metric

A metric is the smallest unit for measuring a quality dimension.

[Diagram B, repeated]
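
The Category, Dimension and Metric hierarchy from the last four slides can be sketched in Python as nested records that are flattened into triples. The property names hasDimension, hasMetric, value and dateComputed are taken from the diagram. The `#` namespace form, the numeric value type, the example.org IRIs and the concrete category, dimension and metric names are assumptions for illustration, not part of daQ itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

DAQ = "http://purl.org/eis/vocab/daq#"  # assumed namespace form


@dataclass
class Metric:
    iri: str
    value: float              # assumption: metric values are numeric
    date_computed: datetime


@dataclass
class Dimension:
    iri: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Category:
    iri: str
    dimensions: List[Dimension] = field(default_factory=list)


def to_triples(category: Category) -> List[Tuple[str, str, str]]:
    """Flatten Category -> Dimension -> Metric into (s, p, o) triples."""
    triples = []
    for dim in category.dimensions:
        triples.append((category.iri, DAQ + "hasDimension", dim.iri))
        for metric in dim.metrics:
            triples.append((dim.iri, DAQ + "hasMetric", metric.iri))
            triples.append((metric.iri, DAQ + "value", str(metric.value)))
            triples.append(
                (metric.iri, DAQ + "dateComputed", metric.date_computed.isoformat())
            )
    return triples


# Hypothetical instances; the names, value and date are made up.
metric = Metric(
    "http://example.org/q/dereferenceability",
    0.87,
    datetime(2014, 1, 1, tzinfo=timezone.utc),
)
dimension = Dimension("http://example.org/q/availability", [metric])
category = Category("http://example.org/q/accessibility", [dimension])

for triple in to_triples(category):
    print(triple)
```

A concrete quality vocabulary would replace the made-up instances with its own categories, dimensions and metrics, while the three abstract levels stay the same.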

SLIDE 13

Using the daQ

SLIDE 14

Concluding Remarks

The daQ is a light-weight, extensible vocabulary for attaching the results of quality benchmarking of a linked open dataset to that dataset.

Next Steps:

☐ Extend the daQ framework with more concepts
☐ Represent more concrete quality metrics
☐ Dataset retrieval based on quality metrics: extend a portal such as CKAN
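
To make the last item more tangible, here is a small Python sketch of what retrieval by quality metric could look like once metric values are attached to datasets. It is not how CKAN or any existing portal works. The function name, the dataset IRIs, the metric names and all values are hypothetical, and the sketch assumes that metric values are numeric and that higher is better.

```python
def find_datasets(quality, metric, minimum):
    """Return dataset IRIs whose `metric` value is at least `minimum`.

    `quality` maps a dataset IRI to a dict of metric name -> numeric value.
    Results are ordered best first; ties are broken by IRI for stable output.
    Datasets that lack the metric are skipped.
    """
    hits = []
    for dataset, values in quality.items():
        if metric in values and values[metric] >= minimum:
            hits.append((values[metric], dataset))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [dataset for _, dataset in hits]


# Hypothetical quality metadata, for illustration only.
quality = {
    "http://example.org/dataset/a": {"availability": 0.95, "conciseness": 0.40},
    "http://example.org/dataset/b": {"availability": 0.60},
    "http://example.org/dataset/c": {"availability": 0.99},
    "http://example.org/dataset/d": {"conciseness": 0.80},
}

print(find_datasets(quality, "availability", 0.9))
```

In a real portal the dictionary would be replaced by a query over the published quality graphs.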

SLIDE 15

Discussion

How can we sign the (dataset, quality graph) pair to make sure that:
 a) the quality graph has not been tampered with
 b) the dataset is unchanged from the state on which the quality graph was computed?
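
One possible direction, offered only as a sketch and not as the authors' answer, is to hash both graphs and sign the pair of digests, so that a change to either side invalidates the signature. The Python code below makes strong simplifying assumptions: triples are plain string tuples without blank nodes, sorting the serialised lines stands in for proper RDF canonicalisation, and an HMAC with a shared key stands in for a real digital signature. All function names and sample data are hypothetical.

```python
import hashlib
import hmac


def digest(triples):
    """Hash a collection of triples independently of their order.

    Assumption: no blank nodes, so sorting the lines is enough to get a
    stable form. Real RDF data needs a proper canonicalisation step.
    """
    h = hashlib.sha256()
    for line in sorted(" ".join(t) for t in triples):
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def sign_pair(dataset_triples, quality_triples, key):
    """Sign the pair of digests so that both graphs are bound together."""
    message = (digest(dataset_triples) + ":" + digest(quality_triples)).encode("ascii")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_pair(dataset_triples, quality_triples, key, signature):
    """Check that neither graph changed since the signature was made."""
    expected = sign_pair(dataset_triples, quality_triples, key)
    return hmac.compare_digest(expected, signature)


# Hypothetical data and key, for illustration only.
dataset = [("ex:s1", "ex:p", "ex:o1"), ("ex:s2", "ex:p", "ex:o2")]
quality = [("ex:graph1", "daq:computedOn", "ex:dataset1")]
key = b"demo-key"

signature = sign_pair(dataset, quality, key)
print(verify_pair(dataset, quality, key, signature))
print(verify_pair(dataset + [("ex:s3", "ex:p", "ex:o3")], quality, key, signature))
```

Requirement (a) corresponds to the digest of the quality graph and requirement (b) to the digest of the dataset. Signing them together is what ties a quality graph to one specific dataset state.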

Jeremy Debattista
jeremy.debattista@iais-extern.fraunhofer.de

Christoph Lange
math.semantic.web@gmail.com