Thoughts on Validating RDF Healthcare Data
David Booth, Ph.D.
KnowMED, Inc.
2013 W3C RDF Validation Workshop
Latest version of these slides: http://dbooth.org/2013/validation/dbooth-slides.pdf
Why RDF?
Schema promiscuous
[Diagram: Red, Blue, and Green models over a shared set of properties (HomePhone, Town, ZipPlus4, FullName, Country, Address, FirstName, LastName, Email, City, ZipCode), linked by subClassOf, sameAs, hasLast, and hasFirst]
Multiple models peacefully coexist
Why RDF?
Schema promiscuous
- What the Blue app sees:
[Diagram: the same graph with the Blue model's view highlighted: Address, FirstName, LastName, Email, City, ZipCode, Country]
Why RDF?
Schema promiscuous
- What the Red app sees:
[Diagram: the same graph with the Red model's view highlighted: HomePhone, Town, ZipPlus4, FullName, Country]
Why RDF?
Schema promiscuous
- What the Green app sees:
[Diagram: the same graph with the Green model's view highlighted: HomePhone, Town, ZipPlus4, Country, FirstName, LastName, Email]
Need multiple validation perspectives on the same data!
Data producers and consumers
[Diagram: A, B, and C alongside the Red, Blue, and Green apps, labeled as producers and consumers]
Two perspectives of validation
- Producers: Model integrity
  – Is the data well formed? (Sanity check)
  – Does it contain what I promised?
- Consumers: Suitability for use
  – Does the data meet my needs?
  – Different consumers have different needs!
Need multiple validation perspectives on the same data!
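The two perspectives can be sketched in a few lines of Python. This is only an illustration, not the approach the slides prescribe: triples are modeled as plain tuples, and the subject `ex:patient1`, the producer's promise, and each consumer's list of needs are hypothetical choices made for the example.

```python
# Triples are modeled as plain (subject, predicate, object) tuples.
# The subject, the values, and the property lists are hypothetical.
DATA = {
    ("ex:patient1", "FirstName", "Ada"),
    ("ex:patient1", "LastName", "Lovelace"),
    ("ex:patient1", "Email", "ada@example.org"),
    ("ex:patient1", "City", "London"),
    ("ex:patient1", "Country", "UK"),
}

# Producer perspective: the properties the producer promised to supply.
PRODUCER_PROMISE = {"FirstName", "LastName", "Email", "City", "Country"}

# Consumer perspective: each consumer declares what it needs.
CONSUMER_NEEDS = {
    "Blue": {"FirstName", "LastName", "Email", "City", "Country"},
    "Red": {"FullName", "HomePhone", "Town", "Country"},
}


def missing_properties(triples, subject, required):
    """Return the required properties that `subject` lacks in `triples`."""
    present = {p for s, p, _ in triples if s == subject}
    return sorted(required - present)


def validate_all(triples, subject):
    """Run the producer check and every consumer check on the same data."""
    report = {"producer": missing_properties(triples, subject, PRODUCER_PROMISE)}
    for name, needs in CONSUMER_NEEDS.items():
        report[name] = missing_properties(triples, subject, needs)
    return report


report = validate_all(DATA, "ex:patient1")
```

Here the same data passes the producer's check and the Blue consumer's check but fails the Red consumer's check, which is the situation the slide describes: one dataset, several verdicts.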
Features I'd like to see . . .
1. SPARQL-based framework
- Fewer languages == easier maintenance
- Nice to either:
  – Build on SPARQL, or
  – Use from SPARQL
- BUT if a new language were very concise and powerful, I'd jump on it.
2. Validation pipelines
- Simpler to write a series of SPARQL UPDATE operations than one big query
- Want standard ways to define validation pipelines
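The pipeline idea can be illustrated without a triple store. The sketch below is an analogy in plain Python, not SPARQL: each step is a small function that takes a set of triples and returns a new set, much as a SPARQL UPDATE operation rewrites a graph. The step names, the `validationError` property, and the sample data are all hypothetical.

```python
import re

# Hypothetical sample data as (subject, predicate, object) tuples.
DATA = {
    ("ex:p1", "ZipCode", " 02139 "),
    ("ex:p1", "Email", "a@example.org"),
    ("ex:p2", "ZipCode", "2139"),
}


def normalize_zip(triples):
    """Step 1: strip surrounding whitespace from ZipCode values."""
    return {(s, p, o.strip() if p == "ZipCode" else o) for s, p, o in triples}


def flag_bad_zip(triples):
    """Step 2: add an error triple for each ZipCode that is not five digits."""
    errors = {
        (s, "validationError", "bad ZipCode")
        for s, p, o in triples
        if p == "ZipCode" and not re.fullmatch(r"[0-9]{5}", o)
    }
    return triples | errors


def flag_missing_email(triples):
    """Step 3: add an error triple for each subject without an Email."""
    subjects = {s for s, _, _ in triples}
    with_email = {s for s, p, _ in triples if p == "Email"}
    errors = {(s, "validationError", "missing Email") for s in subjects - with_email}
    return triples | errors


# The pipeline is just an ordered list of steps.
PIPELINE = [normalize_zip, flag_bad_zip, flag_missing_email]


def run_pipeline(triples, steps=PIPELINE):
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        triples = step(triples)
    return triples


def collect_errors(triples):
    """Return (subject, message) pairs for every recorded validation error."""
    return sorted((s, o) for s, p, o in triples if p == "validationError")


found = collect_errors(run_pipeline(DATA))
```

Because each step is small, a new check can be added by appending one function to `PIPELINE` rather than by extending a single large query.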
3. Better URI pattern matching and munging
- Often need to generate URIs from natural keys
- Want easier mechanisms for:
  – Checking URI patterns
  – Detecting misspellings
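As a rough illustration of these three needs, the Python sketch below mints a URI from natural-key parts, checks a URI against an expected pattern, and suggests a likely intended spelling for an unknown property name. The namespace `http://example.org/patient/`, the two-part key layout, and the list of known properties are assumptions made for the example; only standard-library calls are used.

```python
import re
from difflib import get_close_matches
from urllib.parse import quote

BASE = "http://example.org/patient/"  # hypothetical namespace


def mint_uri(*natural_key_parts):
    """Build a URI from natural-key parts, e.g. (source system, local id)."""
    slug = "/".join(
        quote(str(part).strip().lower(), safe="") for part in natural_key_parts
    )
    return BASE + slug


# Expected shape (an assumption): BASE followed by exactly two path segments
# made of lowercase unreserved characters or percent-escapes.
SEGMENT = r"(?:[a-z0-9._~-]|%[0-9A-F]{2})+"
URI_PATTERN = re.compile(re.escape(BASE) + SEGMENT + "/" + SEGMENT)


def matches_pattern(uri):
    """Return True when the whole URI has the expected shape."""
    return URI_PATTERN.fullmatch(uri) is not None


# Hypothetical vocabulary of property names, taken from the earlier slides.
KNOWN_PROPERTIES = ["FirstName", "LastName", "Email", "City", "ZipCode", "Country"]


def suggest_property(name):
    """Return the closest known spelling for an unknown property name."""
    if name in KNOWN_PROPERTIES:
        return []
    return get_close_matches(name, KNOWN_PROPERTIES, n=1, cutoff=0.8)


uri = mint_uri("Clinic A", "MRN 0042")
```

With these definitions, `suggest_property("FristName")` proposes `FirstName`, and a URI with only one path segment fails `matches_pattern`.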
4. Validation like automated regression testing
- Lots of small, independent tests over one big one
  – E.g., one file per test
  – Contrast big ontology approach
- Goals:
  – Easy to add a new test
  – Can test anything
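One way to picture the regression-testing style is a registry of small test functions, each checking one thing and returning the offending subjects. The sketch below is a hypothetical Python illustration: the decorator, the test names, and the sample triples are invented for the example, and in a one-file-per-test layout each function would live in its own file.

```python
# Registry of validation tests, keyed by test name.
TESTS = {}


def validation_test(func):
    """Register a small, independent validation test under its own name."""
    TESTS[func.__name__] = func
    return func


@validation_test
def every_subject_has_last_name(triples):
    """Offenders are subjects with no LastName triple."""
    subjects = {s for s, _, _ in triples}
    have = {s for s, p, _ in triples if p == "LastName"}
    return sorted(subjects - have)


@validation_test
def email_contains_at_sign(triples):
    """Offenders are subjects whose Email value has no '@'."""
    return sorted(s for s, p, o in triples if p == "Email" and "@" not in o)


def run_tests(triples):
    """Run every registered test; a test passes when it returns no offenders."""
    return {name: test(triples) for name, test in TESTS.items()}


# Hypothetical sample data as (subject, predicate, object) tuples.
DATA = {
    ("ex:p1", "LastName", "Lovelace"),
    ("ex:p1", "Email", "ada@example.org"),
    ("ex:p2", "Email", "bob@example.org"),
}

results = run_tests(DATA)
failed = sorted(name for name, offenders in results.items() if offenders)
```

Adding a test means writing one more decorated function, and because a test is ordinary code it can check anything that can be computed from the data.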
5. Operational versus declarative
- Declarative is convenient for very simple tests, e.g., pattern matching
- Operational is easier for more complex tests, e.g.:
  – "Do A, then B, then C, then result should be X"
- Note: SPARQL UPDATE operations can be used this way
Summary
- SPARQL-based
- Or something else that is powerful and