Thoughts on Validating RDF Healthcare Data, David Booth, Ph.D.



SLIDE 1

Thoughts on Validating RDF Healthcare Data

David Booth, Ph.D. KnowMED, Inc. 2013 W3C RDF Validation Workshop

Latest version of these slides: http://dbooth.org/2013/validation/dbooth-slides.pdf

SLIDE 2

Why RDF?

Schema promiscuous

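[Figure: the same data described by three models (Red Model, Blue Model, Green Model). Fields shown: HomePhone, Town, ZipPlus4, FullName, Country, Address, FirstName, LastName, Email, City, ZipCode. Relationships between the models: subClassOf, sameAs, hasLast, hasFirst.]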

Multiple models peacefully coexist

SLIDE 3

Why RDF?

Schema promiscuous

  • What the Blue app sees:

[Figure: the same diagram, with only the Blue Model view picked out: Country, Address, FirstName, LastName, Email, City, ZipCode.]

SLIDE 4

Why RDF?

Schema promiscuous

  • What the Red app sees:

[Figure: the same diagram, with only the Red Model view picked out: HomePhone, Town, ZipPlus4, FullName, Country.]

SLIDE 5

Why RDF?

Schema promiscuous

  • What the Green app sees:

[Figure: the same diagram, with only the Green Model view picked out: HomePhone, Town, ZipPlus4, Country, FirstName, LastName, Email.]

Need multiple validation perspectives on the same data!

SLIDE 6

Data producers and consumers

[Figure: data producers (A, B, C) and data consumers (Red, Blue, Green).]

SLIDE 7

Two perspectives of validation

  • Producers: Model integrity

– Is the data well formed? (Sanity check)
– Does it contain what I promised?

  • Consumers: Suitability for use

– Does the data meet my needs?
– Different consumers have different needs!

Need multiple validation perspectives on the same data!
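
The two perspectives can be sketched in a few lines of plain Python. This is only an illustration, not anything from the slides: triples are modelled as tuples rather than queried with SPARQL, and the record, the promised properties, and the consumer needs (`TRIPLES`, `PRODUCER_PROMISES`, `CONSUMER_NEEDS`) are all hypothetical.

```python
# Hypothetical data: (subject, predicate, object) triples for one patient record.
TRIPLES = {
    ("patient/1", "FirstName", "Ann"),
    ("patient/1", "LastName", "Lee"),
    ("patient/1", "Email", "ann@example.org"),
    ("patient/1", "ZipCode", "02139"),
}

def missing_properties(triples, subject, required):
    """Return the required predicates that the subject lacks, sorted."""
    present = {p for s, p, _ in triples if s == subject}
    return sorted(set(required) - present)

# Producer perspective: does the data contain what was promised?
PRODUCER_PROMISES = ["FirstName", "LastName", "Email"]

# Consumer perspectives: each consumer needs something different (assumed lists).
CONSUMER_NEEDS = {
    "blue": ["FirstName", "LastName", "ZipCode"],
    "red": ["FullName", "HomePhone"],
}

def validate_all(triples, subject):
    """Run the producer check and every consumer check over the same data."""
    report = {"producer": missing_properties(triples, subject, PRODUCER_PROMISES)}
    for name, needs in CONSUMER_NEEDS.items():
        report[name] = missing_properties(triples, subject, needs)
    return report

report = validate_all(TRIPLES, "patient/1")
print(report)
```

The same record passes the producer check and the hypothetical Blue consumer's check, yet fails the Red consumer's, which is the sense in which one dataset needs several validation perspectives.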

SLIDE 8

Features I'd like to see . . .

SLIDE 9

1. SPARQL-based framework

  • Fewer languages == easier maintenance
  • Nice to either:

– Build on SPARQL, or
– Use from SPARQL

  • BUT if a new language were very concise and powerful, I'd jump on it.

SLIDE 10

2. Validation pipelines

  • Simpler to write a series of SPARQL UPDATE operations than one big query
  • Want standard ways to define validation pipelines
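
A rough sketch of the pipeline idea, in plain Python rather than SPARQL UPDATE: each step takes a set of triples and returns a new set, much as a sequence of UPDATE operations would rewrite a graph. The step names, the `validationError` predicate, and the sample data are all made up for illustration.

```python
from functools import reduce

def trim_literals(triples):
    """Step 1: strip stray whitespace from every object value."""
    return {(s, p, o.strip()) for s, p, o in triples}

def derive_full_name(triples):
    """Step 2: add a FullName triple where first and last names both exist."""
    first = {s: o for s, p, o in triples if p == "FirstName"}
    last = {s: o for s, p, o in triples if p == "LastName"}
    derived = {
        (s, "FullName", f"{first[s]} {last[s]}")
        for s in first.keys() & last.keys()
    }
    return triples | derived

def flag_missing_email(triples):
    """Step 3: add an error triple for each subject that has no Email."""
    subjects = {s for s, _, _ in triples}
    with_email = {s for s, p, _ in triples if p == "Email"}
    return triples | {
        (s, "validationError", "missing Email") for s in subjects - with_email
    }

# The pipeline is just an ordered list of small steps.
PIPELINE = [trim_literals, derive_full_name, flag_missing_email]

def run_pipeline(triples, steps=PIPELINE):
    """Apply each step in order, feeding its output to the next step."""
    return reduce(lambda data, step: step(data), steps, triples)

data = {
    ("patient/1", "FirstName", " Ann"),
    ("patient/1", "LastName", "Lee "),
    ("patient/2", "FirstName", "Bob"),
    ("patient/2", "Email", "bob@example.org"),
}
result = run_pipeline(data)
errors = sorted(t for t in result if t[1] == "validationError")
print(errors)
```

Keeping each step small means a step can be reordered, removed, or tested on its own, which is harder with one large query.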

SLIDE 11

3. Better URI pattern matching and munging

  • Often need to generate URIs from natural keys
  • Want easier mechanisms for:

– Checking URI patterns
– Detecting misspellings
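
To make these three needs concrete, here is a small Python sketch using only the standard library. The base namespace, the URI pattern, and the list of known property names are assumptions for illustration; the slides do not prescribe any of them.

```python
import difflib
import re
from urllib.parse import quote

BASE = "http://example.org/patient/"  # hypothetical namespace

def mint_uri(natural_key):
    """Build a URI from a natural key, percent-encoding unsafe characters."""
    return BASE + quote(natural_key.strip().lower(), safe="")

# Assumed convention: lowercase local names, with percent-escapes allowed.
URI_PATTERN = re.compile(
    r"http://example\.org/patient/(?:[a-z0-9._~-]|%[0-9A-F]{2})+"
)

def matches_pattern(uri):
    """Check that a URI follows the assumed naming convention."""
    return URI_PATTERN.fullmatch(uri) is not None

KNOWN_PROPERTIES = ["FirstName", "LastName", "ZipCode", "Email"]

def suggest_property(name, known=KNOWN_PROPERTIES):
    """Return the closest known property name, or None if nothing is close."""
    close = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
    return close[0] if close else None

uri = mint_uri("MRN 12345")
print(uri, matches_pattern(uri))
print(suggest_property("FristName"))
```

A misspelled property such as `FristName` is valid RDF, so nothing flags it unless a check like `suggest_property` compares it against the vocabulary that was intended.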

SLIDE 12

4. Validation like automated regression testing

  • Prefer lots of small, independent tests over one big one

– E.g., one file per test
– Contrast with the big-ontology approach

  • Goals:

– Easy to add a new test
– Can test anything
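
One possible shape for this, sketched in Python: each test is a small function that registers itself, and a runner executes them independently. In a one-file-per-test layout each function would live in its own file; here they share one block for brevity. The registry, decorator, and test names are all hypothetical.

```python
TESTS = {}

def validation_test(func):
    """Register a small, independent test under its function name."""
    TESTS[func.__name__] = func
    return func

@validation_test
def every_patient_has_last_name(triples):
    subjects = {s for s, _, _ in triples}
    named = {s for s, p, _ in triples if p == "LastName"}
    return subjects <= named

@validation_test
def zip_codes_are_five_digits(triples):
    return all(
        len(o) == 5 and o.isdigit() for _, p, o in triples if p == "ZipCode"
    )

def run_tests(triples):
    """Run every registered test; one failing or crashing test never blocks the rest."""
    results = {}
    for name, test in TESTS.items():
        try:
            results[name] = bool(test(triples))
        except Exception:
            results[name] = False
    return results

data = {
    ("patient/1", "LastName", "Lee"),
    ("patient/1", "ZipCode", "0213"),
}
results = run_tests(data)
print(results)
```

Adding a test is then a matter of writing one more decorated function, and because a test is arbitrary code it can check anything the data exposes.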

SLIDE 13

5. Operational versus declarative

  • Declarative is convenient for very simple tests, e.g., pattern matching
  • Operational is easier for more complex tests, e.g.:

– "Do A, then B, then C, then result should be X"

  • Note: SPARQL UPDATE operations can be used this way
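
The contrast can be illustrated with a short Python sketch. The declarative half is a table of patterns; the operational half performs steps A, B, and C and then compares the result with X. The rules, the duplicate identifier `patient/1b`, and the expected name are invented for the example.

```python
import re

# Declarative: each rule is data, a predicate name paired with a pattern.
RULES = {
    "ZipCode": r"\d{5}",
    "Email": r"[^@\s]+@[^@\s]+",
}

def check_rules(triples, rules=RULES):
    """Return the triples whose object fails its predicate's pattern."""
    return sorted(
        (s, p, o) for s, p, o in triples
        if p in rules and not re.fullmatch(rules[p], o)
    )

# Operational: do A, then B, then C, then compare the result with X.
def merged_record_has_one_name(triples):
    # A: map the duplicate identifier onto the canonical one.
    merged = {("patient/1" if s == "patient/1b" else s, p, o) for s, p, o in triples}
    # B: normalise the casing of last names.
    merged = {(s, p, o.title() if p == "LastName" else o) for s, p, o in merged}
    # C: collect the last names recorded for the canonical patient.
    names = {o for s, p, o in merged if s == "patient/1" and p == "LastName"}
    # Result should be X: exactly one last name, "Lee".
    return names == {"Lee"}

data = {
    ("patient/1", "LastName", "LEE"),
    ("patient/1b", "LastName", "lee"),
    ("patient/1", "ZipCode", "0213"),
}
failures = check_rules(data)
merged_ok = merged_record_has_one_name(data)
print(failures, merged_ok)
```

The pattern table is easy to read and extend, but the merge check depends on the order of its steps, which is awkward to state as a single pattern.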

SLIDE 14

Summary

  • SPARQL-based
  • Or something else that is powerful and concise