Data Management with Ontologies Bijan Parsia bparsia@cs.man.ac.uk - - PowerPoint PPT Presentation

SLIDE 1

Data Management with Ontologies

Bijan Parsia bparsia@cs.man.ac.uk

Friday, 7 December 2012
SLIDE 2

The Future is Coming...Soon

Reducing Paperwork and Administrative Costs. Health care remains one of the few industries that relies on paper records. The new law will institute a series of changes to standardize billing and requires health plans to begin adopting and implementing rules for the secure, confidential, electronic exchange of health information. Using electronic health records will reduce paperwork and administrative burdens, cut costs, reduce medical errors and most importantly, improve the quality of care. First regulation effective October 1, 2012.

—http://www.whitehouse.gov/healthreform/timeline

http://en.wikipedia.org/wiki/File:ColumbiaStahrArtwork.jpg http://uncyclopedia.wikia.com/wiki/File:Uncle_Sam_I_Want_You_1.jpg

SLIDE 3

50% of doctors use EHR

  • This is up from 25% in 2005

http://www.washingtonpost.com/blogs/ezra-klein/wp/2012/07/19/about-half-of-doctors-use-electronic-records/

SLIDE 4

50% of doctors use EHR

  • This is up from 25% in 2005

...not as impressive as it seems!

http://thehealthcareblog.com/blog/2011/12/02/2011-ehr-adoption-rates/

SLIDE 5

What’s Stopping Us? (1)

“I wish the doctor had spent as much time with me as she did with her PC”

Many years ago, an excited friend who worked for one of the electronic health record (EHR) vendors at that time (it was really more of a billing and patient tracking and management system than an EHR) was desperate to show me some of their latest applications. In particular, a new module they had developed to capture clinical data. My friend pulled out his laptop, fired up the application, selected a patient and proceeded to enter blood pressure (BP). Some 20-plus clicks later, he had entered a BP of 120/80. While he was excited, I was dumbfounded. When it comes to patient care, doctors didn’t have time for 20 clicks to record BP years ago and they definitely don’t have that luxury in today’s demanding medical environment.

http://ehrintelligence.com/2012/07/12/clinical-documentation-in-the-ehr/

SLIDE 6

Clinical Data Capture

Choose terms from a coding scheme

enter search:

cystitis

  • Acute cystitis
  • Subacute cystitis, NOS
  • Follicular cystitis
  • Cystitis, NOS
  • Idiopathic cystitis
  • Chemical cystitis
  • Postoperative cystitis
  • Drug induced cystitis
  • Iatrogenic cystitis
  • Radiation cystitis
  • Chronic cholecystitis
  • Acute cholecystitis
  • Bacterial Cholecystitis
  • Cholecystitis, NOS
  • Bacterial cystitis

next page

etc

Too Big: ...picking lists too long
Too Small: ...not enough clinical detail

http://www.cs.man.ac.uk/~rector/presentations/snomed-rector-history-and-future-of-terminology.ppt

SLIDE 7

Shape and value sensitivity

SLIDE 8

What (else) is Stopping Us?

http://thehealthcareblog.com/blog/2012/01/27/medical-records-supporting-san-francisco%E2%80%99s-universal-care-add-millions-to-official/

A separate and much more complex piece of technology, electronic health records, is proving difficult and expensive. Knitting together incompatible computer systems across the 35 medical sites so they can easily share detailed patient medical records could cost the city millions beyond what is included in the official price tag. An incomplete survey of technology costs borne by the clinics themselves this year reveals spending of at least $15 million in addition to what was budgeted for the whole program... But that sum is likely millions higher, since eight clinics could not or would not say how much they spent or were planning to spend integrating their patient records.

The current patchwork of at least 11 different computer systems across the network do not easily talk with one another. ... This incompatibility of record keeping sometimes causes delays, repeated tests, unnecessary procedures and gaps in care as patients move from doctor to doctor. Ideally, say technology planners, there ought to be just one system citywide. But that is unlikely to happen soon.

SLIDE 9

Database Modelling and Integration

SLIDE 10

Simple ER Representation

http://www.inf.unibz.it/~franconi/icom/tutorial-1.html

SLIDE 11

INFERENCE!

http://www.inf.unibz.it/~franconi/icom/tutorial-1.html

SLIDE 12

Modify the Schema

http://www.inf.unibz.it/~franconi/icom/tutorial-1.html

SLIDE 13

Unwanted Consequences

http://www.inf.unibz.it/~franconi/icom/tutorial-1.html

SLIDE 14

Strengthen a Constraint

http://www.inf.unibz.it/~franconi/icom/tutorial-1.html

SLIDE 15

The Problem Spreads

http://www.inf.unibz.it/~franconi/icom/tutorial-1.html

SLIDE 16

Data Integration

  • The prior example was single schema

– So Design Support
– Developing, extending, evolving, exploring a schema
– (We could treat it as two, of course.)

  • Often must combine schemata

– Either different schemas for the “same” data
– Disparate schemas for “overlapping” data
– Different schemas for separate but related data

  • Development Time only?!

SLIDE 17

Two Schemata; “Same” data

http://www.inf.unibz.it/~franconi/icom/tutorial-2.html

SLIDE 18

http://www.inf.unibz.it/~franconi/icom/tutorial-2.html

SLIDE 19

INFERENCE!!!

http://www.inf.unibz.it/~franconi/icom/tutorial-2.html

SLIDE 20

Where’s the error!!!

http://www.inf.unibz.it/~franconi/icom/tutorial-2.html

SLIDE 21

What should we change?

http://www.inf.unibz.it/~franconi/icom/tutorial-2.html

SLIDE 22

Insert a Fix and Verify

SLIDE 23

Ontology Use at Run Time

  • Ontology at run time?

– More, ontology for the end user!??!

  • By end user, I mean, “someone writing queries”

– Ontology Based Data Access (OBDA)

  • Familiar

– Controlled vocabulary based
– Query by example

  • New

– “Better” queries
– Integrated views of data

SLIDE 24

“Better” queries

  • Better how?

– Consider a simple schema
– What does the logical schema look like?
– Lots of variants

  • Sane queries

– SELECT hasAge FROM employee WHERE hasSalary >= 50000;
– SELECT hasAge FROM student WHERE hasSalary >= 50000;
– What about Persons?

  • Union query?
  • Rather write

– SELECT hasAge FROM Person WHERE hasSalary >= 50000;
– no matter what kind of persons there are

[Diagram: class Person with subclasses Student and Employee; attributes hasAge and hasSalary]

create table employee (id number(4), hasAge number(3), hasSalary number(6));
create table student (id number(4), hasAge number(3), hasSalary number(5));
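The "rather write" query over Person can be answered by expanding it into a union over the tables that store the different kinds of person. The following is a minimal sketch of that idea, not the engine shown in the talk: it assumes SQLite rather than the Oracle-style DDL above, and the `SUBCLASSES` map, the `expand` helper, and the sample rows are all made up for illustration.

```python
import sqlite3

# Assumed ontology fragment: Student and Employee are the stored kinds of Person.
SUBCLASSES = {"Person": ["employee", "student"]}

def expand(select_cols: str, cls: str, where: str) -> str:
    """Rewrite a query over a class into a UNION ALL over its stored subclasses."""
    tables = SUBCLASSES.get(cls, [cls])
    parts = [f"SELECT {select_cols} FROM {t} WHERE {where}" for t in tables]
    return " UNION ALL ".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    CREATE TABLE student  (id INTEGER, hasAge INTEGER, hasSalary INTEGER);
    INSERT INTO employee VALUES (1, 40, 60000), (2, 30, 20000);
    INSERT INTO student  VALUES (3, 22, 55000);
""")

# The user writes one query against Person; the expansion targets both tables.
sql = expand("hasAge", "Person", "hasSalary >= 50000")
ages = sorted(row[0] for row in conn.execute(sql))
print(sql)
print(ages)
```

If a new kind of person is added later, only the subclass map changes; the user's query stays the same.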

SLIDE 25

Integrating with Legacy

[Diagram components: Clinical Model; OBDA Model (Mappings); OBDA Aware Query Engine; Legacy Database; two sets of Query, Report, and Data Entry Form applications]

https://babbage.inf.unibz.it/trac/obdapublic/raw-attachment/wiki/ObdalibQuestIntro/virtual.png

SLIDE 26

Data Integration

[Diagram components: Clinical Model; OBDA Aware Query Engine with Query, Report, and Data Entry Form applications; two Legacy Databases, each with its own OBDA Model (Mappings) and its own Query, Report, and Data Entry Form applications]

SLIDE 27

What do we need?

  • Richer query language!

– Need at least conjunctive queries

  • I.e., patterns with explicit variables
  • Tree queries don’t cut it!
  • SQLesque

– Ontology sensitive!

  • The queries should respect the semantics
  • Data access!

– ETL...populate an ABox from a Database
– Distributed

  • Leave my database ALOOOOOONNNNNEE!!!!!

– Need mappings

  • Good computation

– Some fragments of OWL tuned for this

  • Cf OWL QL and OWL EL
  • Polynomial; OWL QL has pure query expansion implementations

SLIDE 28

OWL and Data (Properties)

SLIDE 29

OWL Has Two “Worlds”

  • The world of logic

– Classes, individuals, (object) properties
– Java analogue:

  • Classes, instances, and object valued instance variables
  • The world of “data”

– Datatypes, data values, data properties (well, these span worlds)
– Java analogue:

  • Primitive types, primitive data values, primitively-valued instance vars

SLIDE 30

The World of Logic

  • “Abstract”

– Individuals are members of classes
– We know nothing about them except what the ontology says

  • Individual: Bijan Types: Person.
  • Individual: Sean Types: Person.
  • Class: Instructor SubClassOf: Person
  • What do we know about Bijan, Sean, Instructor, and Person?

– Individuals (etc.) are characterized entirely by the user axioms

  • Ok, mostly!

– Tautologies hold: Bijan Types: owl:Thing.

  • What’s left unsaid may or may not hold

– Open world assumption (and no unique name assumption)
– Think of the various models

  • Remember: The Domain is Arbitrary

SLIDE 31

The World of Data

  • “Concrete”

– Just as with primitive types, we have predefined names:

  • For individuals:

– 1, 2, 0, 1.0, “I’m a string!”, "51"^^xsd:integer

  • For sets of individuals (aka types)

– integer, xsd:string, xsd:nonNegativeInteger, xsd:decimal, etc.

– These names have a fixed interpretation!

  • That is, I(“1”^^xsd:integer) is always the integer 1.
  • xsd:integer is always the set of integers
  • The atomic names (singular and plural) have built-in meaning

– On the abstract side, this is only true for owl:Thing, owl:Nothing, owl:topObjectProperty, owl:bottomObjectProperty, owl:topDataProperty, owl:bottomDataProperty, and the logical connectives
– (The actual meanings for the tops will vary with the domain)

SLIDE 32

Fixed meaning!

  • There is a lot we know about integers

– DataProperty: height Characteristics: Functional
– Individual: Bijan Facts: height “6”^^xsd:integer

  • We know that my height cannot be equal to 2, 4, or 8

– Bijan Facts: height 6, height 2
» Inconsistent!

  • We know that my height cannot be an xsd:string
  • Compare with:

– ObjectProperty: height Characteristics: Functional
– Individual: Bijan Facts: height Six, height Two
– What follows?

  • We can replicate the inequality on the abstract side

– Just add Individual: Six DifferentFrom: Two
– For all integers… (DifferentIndividuals helps only a little)
– Many more entailments to formalize...
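The contrast between the two functional properties can be sketched in a few lines. This is an illustration under simplifying assumptions, not a reasoner: Python integers stand in for xsd:integer literals, and both helper functions are hypothetical names invented here.

```python
def check_functional_data(values):
    """A functional data property is consistent iff all asserted
    literals denote the same value (literals have fixed meaning)."""
    return len(set(values)) <= 1

def merge_functional_object(fillers, different_from=frozenset()):
    """For a functional object property, distinct names are inferred to be
    equal unless a DifferentFrom axiom rules that out."""
    fillers = sorted(set(fillers))
    for a in fillers:
        for b in fillers:
            if a < b and frozenset((a, b)) in different_from:
                return "inconsistent"
    if len(fillers) > 1:
        return "entails " + " = ".join(fillers)
    return "consistent"

# Data side: 6 and 2 are known to differ, so the ontology is inconsistent.
print(check_functional_data([6, 2]))

# Abstract side: nothing says Six and Two differ, so they are merged...
print(merge_functional_object(["Six", "Two"]))

# ...until we add "Six DifferentFrom: Two" by hand.
print(merge_functional_object(["Six", "Two"], {frozenset(("Six", "Two"))}))
```

On the data side the inequality comes for free; on the abstract side it has to be stated pair by pair.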

SLIDE 33

What Can We Define?

  • We have an “expression language” for data

– We can derive new types from our primitives

  • integer[>= 0 , <= 150]
  • This is a restriction on integer (a DataRestriction)

– The subset of integers between 0 and 150, inclusive

  • “<=” and “>=” have built in meaning

– That the values respect, e.g., 1 <= 2 but not 2 <= 1
– Formalize THAT on the logic side!

– We can name these expressions

  • In a limited way
  • Datatype: personAge EquivalentTo: integer[>= 0 , <= 150]

– We can express boolean combinations of expressions

  • not personAge
  • integer[<=0] or integer[>=150]
  • (integer[<=0] or integer[>=150]) and not personAge
  • (And enumerations, {1, 2, 3})
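One way to get a feel for this expression language is to model each data range as a membership test. The sketch below is only an analogy in Python; the function names (`integer`, `not_`, `or_`, `and_`, `one_of`) are invented for this example, and the bounds for personAge assume the intended reading "between 0 and 150, inclusive".

```python
from typing import Callable

DataRange = Callable[[object], bool]

def integer(lo=None, hi=None) -> DataRange:
    """integer[>= lo, <= hi]; a missing bound means unbounded on that side."""
    def member(v):
        if not isinstance(v, int) or isinstance(v, bool):
            return False
        return (lo is None or v >= lo) and (hi is None or v <= hi)
    return member

# Boolean combinations and enumerations of data ranges.
def not_(r): return lambda v: not r(v)
def or_(a, b): return lambda v: a(v) or b(v)
def and_(a, b): return lambda v: a(v) and b(v)
def one_of(*vals): return lambda v: v in vals

# Naming an expression, as in "Datatype: personAge EquivalentTo: ...".
person_age = integer(lo=0, hi=150)

print(person_age(42))            # inside the range
print(person_age(151))           # outside the range
print(not_(person_age)("abc"))   # the complement covers the whole data domain
print(one_of(1, 2, 3)(2))        # enumeration {1, 2, 3}
```

Note that the complement is taken relative to everything in the data domain, so a string is a member of "not personAge".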

SLIDE 34

Between Two Worlds

  • DataProperties

– Disjoint from ObjectProperties
– ObjectProperties are interpreted into the cross product of the (abstract) domain (i.e., Δ⨯Δ)
– DataProperties are interpreted into the cross product of the abstract and data domains (i.e., Δ⨯Δd)
– Δ and Δd are disjoint
– Δd is a (large) superset of the union of the value spaces

  • DataProperty Axioms

– Most of the usual: Sub/Equivalent/Disjoint, etc.

  • Restrictions on DataProperties

– In general no “chaining”:

  • No transitive, inverse, reflexive, etc.
  • Anything that would potentially “merge” the domains

SLIDE 35

We Can Define and Describe

  • Datatypes

– in terms of facets and boolean operators

  • Another embedding of propositional logic!
  • DataProperties

– in terms of other DataProperties and their characteristics

  • Classes

– in terms of restrictions (existential, universal, and counting) on DataProperties

  • Individuals

– in terms of their relations to data values

  • But not

– Datatypes in terms of classes
– DataProperties are “one way”

SLIDE 36

Between Two Worlds

  • height(Bijan, 6) where the second argument is from a different type or sort

[Diagram: Bijan linked by height to the data value 6]

SLIDE 37

Between Two Worlds

  • Class: To3 EquivalentTo: P min 3 Thing
  • Individual: Bijan Facts: P x, P y, P z

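[Two diagrams: in one, Bijan is related to three separate nodes X, Y, Z; in the other, some nodes coincide (labels: Bijan = X = Y, Z)]

Both are models of the KB!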

SLIDE 38

Between Two Worlds

  • Class: To3 EquivalentTo: P min 3 integer
  • Individual: Bijan Facts: P 1, P 2, P 3

[Diagram: Bijan related by P to each of 1, 2, and 3]

Part of every model
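The difference between the last two slides comes down to how many distinct fillers are guaranteed. A deliberately simplified sketch, assuming no DifferentFrom or other axioms are present; `entails_min` is a hypothetical helper, not part of any OWL API:

```python
def entails_min(n, fillers, are_literals):
    """Is 'P min n' entailed for a subject with the given asserted P-fillers?

    Literals with different values are necessarily distinct. Named individuals
    may denote the same object (no unique name assumption), so without further
    axioms only one distinct filler is guaranteed.
    """
    if not fillers:
        return n == 0
    guaranteed = len(set(fillers)) if are_literals else 1
    return guaranteed >= n

# Bijan Facts: P x, P y, P z  -> x, y, z might all be the same individual.
print(entails_min(3, ["x", "y", "z"], are_literals=False))

# Bijan Facts: P 1, P 2, P 3  -> three distinct integers in every model.
print(entails_min(3, [1, 2, 3], are_literals=True))
```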

SLIDE 39

Two World Philosophy

  • OWL is for developing theories about the world

– Very blank slate
– We’re cautious about what we conclude

  • Pedantry is critical!
  • We have excellent theories about integers

– And strings! As well as how to compute with them

  • There’s no point in trying to formalize integers

– Integers should be a standard part of our language
– We don’t want 43 half baked integer ontologies
– Such theories are hard to control and understand

  • The slate isn’t blank, it’s muddled
  • With built-ins, the slate isn’t blank, but it is clean
  • Very hard to recognize a half baked integer theory

– As a theory of integers

SLIDE 40

The Standard Datatype Map: Types

“Maths” Numbers: owl:real, owl:rational, xsd:decimal, xsd:integer, xsd:nonNegativeInteger, xsd:nonPositiveInteger, xsd:positiveInteger, xsd:negativeInteger, xsd:long, xsd:int, xsd:short, xsd:byte, xsd:unsignedLong, xsd:unsignedInt, xsd:unsignedShort, xsd:unsignedByte

“Computer” Numbers: xsd:double, xsd:float

String: rdf:PlainLiteral, xsd:string, xsd:normalizedString, xsd:token, xsd:language, xsd:Name, xsd:NCName, xsd:NMTOKEN

Misc: xsd:anyURI, rdf:XMLLiteral, xsd:boolean

Date and Time: xsd:dateTime, xsd:dateTimeStamp

Binary: xsd:hexBinary, xsd:base64Binary

SLIDE 41

The Standard Datatype Map: Facets

“Maths” Numbers: xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive, xsd:maxExclusive

“Computer” Numbers: xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive, xsd:maxExclusive

String: xsd:length, xsd:minLength, xsd:maxLength, xsd:pattern (for rdf:PlainLiteral, also rdf:langRange)

Misc: None

Date and Time: xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive, xsd:maxExclusive

Binary: xsd:length, xsd:minLength, xsd:maxLength

SLIDE 42

The Standard Datatype Map: Meaning

“Maths” Numbers: Everything derived from owl:real. Note that there are many elements of owl:real which have no lexical form.

String: Everything derived from rdf:PlainLiteral except xsd:anyURI (which is disjoint from the rest)

Misc: Mutually disjoint

Date and Time: Mutually disjoint

Binary: Mutually disjoint

“Computer” Numbers: Mutually disjoint

Each main category is disjoint from the rest

SLIDE 43

Restrictions on Data Definitions

  • There are four big restrictions
  • 1. No redefining built-in datatypes
  • 2. No facets on user defined datatypes
  • integer[<=0] is ok; personAge[>=10, <=20] is not
  • 3. No (direct or indirect) cycles in definitions
  • PersonAge EquivalentTo: PersonAge or string
  • 4. Only one definition axiom per user defined datatype
  • The following (alone) is ok

– personAge EquivalentTo: integer[>= 0 , <= 150]

  • Adding the following is syntactically wrong

– personAge EquivalentTo: xsd:nonNegativeInteger[<= 150]
– Even though the definitions are equivalent!
– (Consider what happens when they are not)
– Hidden cyclish thing!

SLIDE 44

Two Results of the Restrictions

  • The datatype map doesn’t change an ontology

– I.e., only used datatypes matter
– Thus, the built-in datatype map is “shrinkable”

  • I can implement only a subset

– Thus, the built-in datatype map is “extensible”

  • I can implement additional datatypes
  • Datatype definitions are “unfoldable”

– We can treat them as “macros”

  • Replace all defined datatypes (recursively) with the body of their definitions
  • These are related!
  • Datatype maps are semantically robust

– As long as we cover the used datatypes, nothing else in the map matters to the meaning of an ontology
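"Unfoldable" can be read quite literally: because definitions are acyclic and each user-defined datatype has a single definition, names can be replaced by their bodies until only built-ins remain. A minimal sketch, assuming a made-up tuple encoding of datatype expressions and a tiny `BUILTINS` set; none of this is a real OWL API.

```python
BUILTINS = {"integer", "string"}

def unfold(expr, defs, seen=()):
    """Replace user-defined datatype names by their definitions, recursively.

    Expressions are built-in or defined names (str), facet restrictions
    ("facet", base, op, bound), or ("and"/"or"/"not", sub-expressions...).
    """
    if isinstance(expr, str):
        if expr in BUILTINS:
            return expr
        if expr in seen:
            raise ValueError(f"cyclic definition of {expr}")
        if expr not in defs:
            raise KeyError(f"undefined datatype {expr}")
        return unfold(defs[expr], defs, seen + (expr,))
    op, *args = expr
    if op == "facet":   # facets apply to built-in datatypes only, nothing to unfold
        return expr
    return (op, *[unfold(a, defs, seen) for a in args])

defs = {
    "personAge": ("and", ("facet", "integer", ">=", 0),
                         ("facet", "integer", "<=", 150)),
    "notAge": ("not", "personAge"),
}
print(unfold("notAge", defs))

# The cyclic definition from the restrictions slide is rejected.
cyclic = {"PersonAge": ("or", "PersonAge", "string")}
try:
    unfold("PersonAge", cyclic)
except ValueError as err:
    print(err)
```

After unfolding, a reasoner only ever sees built-in datatypes with facets, which is why the restrictions and the robustness result go together.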

SLIDE 45

More Restrictions

  • Restrictions on Base/Primitive Datatypes

– They must be “admissible”

  • They must support three properties

– A top predicate
– Closed under negation
– Satisfiability of conjunctions is decidable
» integer[>=0] and integer[<=-10] is unsatisfiable

– We consider only facets (i.e., unary predicates)

  • Restriction on DataProperty restrictions

– They are path or “chain” free

  • “people whose height is less than 10”
  • not “people whose spouse’s height is less than 10”
  • Any datatype that meets these criteria

– When combined with OWL results in a decidable formalism
– Can be implemented using a “datatype oracle”

[Diagrams: Bijan with height 9 (path free); Bijan with hasSpouse Zoe, whose height is 9 (a chain)]
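For integers with the four min/max facets, a datatype oracle can be as simple as intersecting intervals. The sketch below covers only that case and assumes integer bounds; `satisfiable` is a name chosen for this example.

```python
import math

def satisfiable(constraints):
    """Decide whether a conjunction of integer facets has a solution.

    Each constraint is (facet, bound) with facet one of
    "minInclusive", "maxInclusive", "minExclusive", "maxExclusive".
    """
    lo, hi = -math.inf, math.inf
    for facet, bound in constraints:
        if facet == "minInclusive":
            lo = max(lo, bound)
        elif facet == "minExclusive":
            lo = max(lo, bound + 1)     # over the integers, > b means >= b + 1
        elif facet == "maxInclusive":
            hi = min(hi, bound)
        elif facet == "maxExclusive":
            hi = min(hi, bound - 1)     # over the integers, < b means <= b - 1
        else:
            raise ValueError(f"unknown facet {facet}")
    return lo <= hi

# integer[>=0] and integer[<=-10]
print(satisfiable([("minInclusive", 0), ("maxInclusive", -10)]))

# integer[>0] and integer[<2] has exactly one solution, 1
print(satisfiable([("minExclusive", 0), ("maxExclusive", 2)]))
```

The reasoner only needs to ask the oracle yes/no questions like these, which is what keeps the implementation modular.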

SLIDE 46

Robustness of Datatypes

  • Semantically robust

– Ontologies don’t change when you add or remove non-used datatypes

  • Computationally robust

– Very robust for decidability
– Complexity is a bit trickier

  • Implementably robust

– Highly modular implementation

  • Expressively limited

– Can’t even say “a square’s height equals its width”!
– Can’t talk about the whole data domain

  • We trade off expressivity for robustness

SLIDE 47

Two World Philosophy Benefits

  • From a user perspective:
  • + Integers "Just Work"
  • So do strings, floats, decimals, etc.
  • Powerful constructors
  • + Normal syntax
  • + Clean separation (data and objects; user theory and builtin theory)

  • May be a -
  • - Limits on user extensibility
  • And transparency, explorability
  • - Expressivity restrictions (no addition!)
  • From a theory perspective:
  • +Analyzable
  • From an implementation perspective:
  • +Modular implementation
  • -Must extend implementation to accommodate new types, facets

SLIDE 48

A Study of Quantities

How much wood would a wood chuck chuck?

SLIDE 49

All About Quantities

Quantities are ubiquitous

  • Length, time, charge, amount of money, acceleration, velocity, interest rate...
  • Measures of some sort
  • Generally represented with a magnitude (a real number) and a unit of measurement
  • Implicitly, quantities have a dimension
  • Dimensions are disjoint
  • Quantities may be derived from other quantities
  • (Except for "base" quantities)
  • Quantities can be complex
  • Consider a first order (plus maths) ontology
  • http://www-ksl.stanford.edu/knowledge-sharing/papers/engmath.htm

SLIDE 50

One Definition

(defrelation UNIT-Of-MEASURE
  ;; units are scalar quantities
  (=> (unit-of-measure ?u)
      (scalar-quantity ?u))
  ;; units are positive
  (=> (unit-of-measure ?u)
      (forall ?u2
        (=> (and (unit-of-measure ?u2)
                 (= (quantity.dimension ?u) (quantity.dimension ?u2)))
            (positive (magnitude ?u ?u2)))))
  ;; units can be combined using *
  (abelian-group unit-of-measure * identity-unit)
  ;; units can be combined using expt
  (=> (and (unit-of-measure ?u) (real-number ?r))
      (unit-of-measure (expt ?u ?r)))
  ;; * is commutative for units and other Qs
  (=> (and (unit-of-measure ?u) (constant-quantity ?q))
      (= (* ?u ?q) (* ?q ?u))))

SLIDE 51

Formalizing in OWL

  • Much we can't hope to capture

– Without extensions

  • How might we capture with vanilla OWL?

– First thought: data properties
– Yangtze Types: River

  • Facts: length-in-miles "3937.5"^^xsd:decimal
  • Advantages: Simple, flexible, extensible
  • What happens with different units?

– Yangtze Facts: length-in-kilometers "6300"^^xsd:decimal

  • Are these facts consistent?
  • How do we convert?
  • (Linear (in)equations would help!)

SLIDE 52

Capturing Conversion

  • Equations capture the unit conversion

– SubClassOf(River
    DataAllValuesFrom(length-in-miles length-in-kilometers
      DataComparison(Arguments(mi km) eq(km (* 1.609 mi)))))

  • Dimensional analysis

– Are the properties disjoint?
– Do we need a superproperty "length"?
– How about derived quantities?

  • Big problem

– Horrific proliferation of strange properties

  • -in-miles, -in-feet, -in-centimeters, ...
  • length-, height-, depth-, circumference-

– Lots of extra stuff in the ontology
– Error prone; computationally expensive

SLIDE 53

A Different Axiomatization?

  • A diagnosis

– The problem was overloading properties
– OWL does better with classes structuring objects
– So, let's try structured objects

  • Basic idea:

– A quantity is an instance of the class Quantity
– Every such individual has a magnitude and an associated unit

  • E.g., it has a property hasMagnitude whose range is owl:real (or whatever)
  • Extreme version

– UnitDim
– (There are dozens of unit ontologies.)

SLIDE 54

Comparing Approaches

  • Units in Properties (height-in-meters, height-in-feet)

– Pros: Works with path free n-ary
– Cons: Insane proliferation; mixing of concerns

  • Structured objects

– Pros: Extensible, "manageable", can be used to explore quantities
– Cons: Hard to get right; blows up ontology; pathful N-ary; verbose and weird; OWL can't model most of the semantics; too much expressivity required for too little gain

SLIDE 55

A Solution

  • Datatypes!

– Exactly what Two World Philosophy called for

  • Separates domain and quantity theories
  • User friendly:

– Yangtze Types: River

  • Facts: length "6300 km"^^owl:quantity, length "3937.5 miles"^^owl:quantity
  • Simple, direct, no clutter

– And an efficient engine works behind the scenes!
– Doesn’t touch anything when it doesn’t have to
– No n-ary used up!

SLIDE 56

More Examples

  • Assuming height is functional:
  • Consistent

– sheeva Facts: height "2 meters"^^owl:quantity, height "200 cm"^^owl:quantity

  • Inconsistent

– sheeva Facts: height "2 m"^^owl:quantity, height "2 cm"^^owl:quantity

  • Entails that Sheeva is tall

– EquivalentClasses(Tall SomeValuesFrom(height DatatypeRestriction(xsd:integer minExclusive "6 feet"^^owl:quantity)))

SLIDE 57

What About Implementation?

  • For datatype reasoning (i.e., no equations) we can implement by translation

– That is, in a preprocessing step

  • Normalize and simplify the units

– I.e., ft, cm etc. go to m
– This eliminates the need for equations in the ontology

  • Translate the normalized quantities into a structured object

– JohnDoe Facts: height "6.0 ft"^^owl:quantity to
– JohnDoe Facts: mint:height new:d1.
– new:d1 Types: mint:DimLength
– Facts: quantity:magnitude "1.8288"^^owl:real

  • Import the relevant ontology (e.g., where every dimension class is disjoint from all the others).

  • Feed the new ontology to the reasoner
  • Hide all this from the user
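The normalization step can be sketched as a small preprocessing function. This is an illustration only: the `TO_METRES` table, the accepted unit spellings, and both function names are assumptions made here, and exact fractions are used so that equal quantities compare equal.

```python
from fractions import Fraction

# Assumed conversion table to the base unit metre (exact factors).
TO_METRES = {
    "m": Fraction(1),
    "meters": Fraction(1),
    "cm": Fraction(1, 100),
    "ft": Fraction(3048, 10000),
}

def normalize(lexical: str) -> Fraction:
    """Parse a lexical form such as '6.0 ft' and return its magnitude in metres."""
    magnitude, unit = lexical.split()
    return Fraction(magnitude) * TO_METRES[unit]

def functional_consistent(*lexicals: str) -> bool:
    """A functional property is consistent iff all of its values
    normalize to the same quantity."""
    return len({normalize(x) for x in lexicals}) == 1

print(float(normalize("6.0 ft")))                   # magnitude in metres
print(functional_consistent("2 meters", "200 cm"))  # same quantity
print(functional_consistent("2 m", "2 cm"))         # different quantities
```

Once every length is expressed in metres, the consistency questions from the earlier examples reduce to plain equality of magnitudes, with no equations left in the ontology.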

SLIDE 58

(Dis)Advantages Of The Implementation

  • Advantages

– Much better than doing it by hand
– Reasoner independent
– Easy to inspect and experiment with

  • Disadvantages

– Only works in a restricted case
– Need to make sure details of the translation don't "leak" out

SLIDE 59

(Dis)Advantages Of A Datatype

  • Advantages

– Very very user friendly
– Can scale to sophisticated equations
– Proper simplification and analysis
– Solve the problem once
– Implementation easy

  • Disadvantages

– Doesn't help reason about quantities

  • Supports ontologies with quantities, not about quantities

– Needs a fair bit of tool support
– Extensibility story needed

SLIDE 60

A Policy Use Case

If (we have time) then (do this bit)

SLIDE 61

Applications: Beyond terminological

  • WS-Policy

– Represent conditions for service use
– Sets of policy alternatives

  • Conjunctions of basic propositions
  • The policy is a disjunction of the conjunctions
  • XML representation

– Represents the standard syntax

  • http://www.w3.org/2007/02/ws-policy.xsd

– Semantics given in prose

  • http://www.w3.org/TR/ws-policy/

http://www.mindswap.org/2005/services-policies/

SLIDE 62

Example Policy

<wsp:Policy
    xmlns:sp="http://docs.oasis-open.org/ws-sx/ws-securitypolicy/200702"
    xmlns:wsp="http://www.w3.org/ns/ws-policy">
  <wsp:ExactlyOne>
    <wsp:All>
      <sp:SignedParts>
        <sp:Body/>
      </sp:SignedParts>
    </wsp:All>
    <wsp:All>
      <sp:EncryptedParts>
        <sp:Body/>
      </sp:EncryptedParts>
    </wsp:All>
  </wsp:ExactlyOne>
</wsp:Policy>

http://www.w3.org/TR/ws-policy/#Normal_Form_Policy_Expression

SLIDE 63

What Services?

  • XML Services

– Policy document is valid

  • Policy Services

– A message conforms to the policy

  • What else?

SLIDE 64

Encoding in OWL

  • (OWL is “semantical”, why not?)
  • Two ways:

– Encode the XML directly

  • I.e., an OWL based description of the semantics
  • OWL qua XML schema languages

– Map the policy constructs

  • To OWL constructs with similar meaning

SLIDE 65

Syntactic Mapping

<ClassAssertion>
  <Class IRI="ExactlyOne"/>
  <NamedIndividual IRI="MyPolicy"/>
</ClassAssertion>
<ClassAssertion>
  <Class IRI="All"/>
  <NamedIndividual IRI="All1"/>
</ClassAssertion>
<ObjectPropertyAssertion>
  <ObjectProperty IRI="hasAlt"/>
  <NamedIndividual IRI="MyPolicy"/>
  <NamedIndividual IRI="All1"/>
</ObjectPropertyAssertion>
<ObjectPropertyAssertion>
  <ObjectProperty IRI="hasProposition"/>
  <NamedIndividual IRI="All1"/>
  <NamedIndividual IRI="SignedPart"/>
</ObjectPropertyAssertion>

SLIDE 66

Worth it?

  • We could add constraints

– “Every Policy hasAlt only ExactlyOnes”
– But we wouldn’t get validation

  • Like Schematron, what isn’t forbidden is permitted
  • Consider,

– Axiom: Parents must have at least one child
– Axiom: Sally is a Parent
– In XML, the latter isn’t valid
– In OWL, it is consistent
» We just don’t know who the child is
» Open World Assumption (OWA)

SLIDE 67

Semantic Mapping

  • Note:

– All = conjunction = IntersectionOf
– ExactlyOne = Disjunction = UnionOf*
– Propositions = Named Classes
– Policies can be seen as a class

  • With conforming things members of that class
  • Translate Policies into Classes

* Cheating a bit: it’s exclusive or

Demo
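The semantic mapping can be tried out on the example policy from the earlier slide. The sketch below is an illustration, not the demo from the talk: it uses Python's standard `xml.etree.ElementTree`, treats ExactlyOne as a plain (inclusive) "or" in line with the footnote, ignores the nested content of assertions such as sp:Body, and emits an informal Manchester-like string. The helper names are made up.

```python
import xml.etree.ElementTree as ET

WSP = "{http://www.w3.org/ns/ws-policy}"

POLICY = """<wsp:Policy
    xmlns:sp="http://docs.oasis-open.org/ws-sx/ws-securitypolicy/200702"
    xmlns:wsp="http://www.w3.org/ns/ws-policy">
  <wsp:ExactlyOne>
    <wsp:All><sp:SignedParts><sp:Body/></sp:SignedParts></wsp:All>
    <wsp:All><sp:EncryptedParts><sp:Body/></sp:EncryptedParts></wsp:All>
  </wsp:ExactlyOne>
</wsp:Policy>"""

def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def to_class_expression(elem) -> str:
    """Map wsp:All to 'and', wsp:ExactlyOne to 'or', assertions to class names."""
    if elem.tag == WSP + "Policy":
        return to_class_expression(elem[0])
    if elem.tag in (WSP + "All", WSP + "ExactlyOne"):
        joiner = " and " if elem.tag == WSP + "All" else " or "
        parts = [to_class_expression(child) for child in elem]
        return "(" + joiner.join(parts) + ")"
    return local(elem.tag)   # the assertion name is used as a named class

expression = to_class_expression(ET.fromstring(POLICY))
print(expression)
```

The resulting expression could then be used as the definition of a policy class, with conforming messages as its members.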

SLIDE 68

Services!

  • Policy conformance

– Class membership

  • Policy containment

– Subsumption
– Conforming to A => conforming to B

  • Equivalence
  • Incoherence

– No one can conform to my policy
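For policies built only from positive propositions, these services can be approximated without a reasoner. The sketch below assumes a policy is a list of alternatives, each a frozenset of proposition names, reads ExactlyOne as an inclusive "or", and models background knowledge only as pairs of disjoint propositions. All function names are hypothetical; a real deployment would hand the classes to an OWL reasoner.

```python
def conforms(message, policy):
    """Class membership: the message satisfies every proposition of some alternative."""
    return any(alt <= message for alt in policy)

def satisfiable_alts(policy, disjoint=()):
    """Drop alternatives that require two propositions declared disjoint."""
    return [alt for alt in policy
            if not any(pair <= alt for pair in disjoint)]

def contained_in(a, b, disjoint=()):
    """Subsumption: everything conforming to a also conforms to b."""
    return all(any(alt_b <= alt_a for alt_b in b)
               for alt_a in satisfiable_alts(a, disjoint))

def incoherent(policy, disjoint=()):
    """No message can conform to the policy."""
    return not satisfiable_alts(policy, disjoint)

signed = [frozenset({"SignedParts"})]
either = [frozenset({"SignedParts"}), frozenset({"EncryptedParts"})]
both = [frozenset({"SignedParts", "EncryptedParts"})]

print(conforms(frozenset({"SignedParts"}), either))   # conformance
print(contained_in(both, signed))                     # containment holds
print(contained_in(either, signed))                   # containment fails
print(incoherent(both, disjoint=[frozenset({"SignedParts", "EncryptedParts"})]))
```

Equivalence is then containment in both directions.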

SLIDE 69

A New Style of Development

  • Two new aspects

– Possibility of rich background knowledge

  • Ontologies!

– Iterative, ontologicalish development style

  • Policies as classes!
  • They interact

– Consider an integration scenario

  • One organization requires “Retry-On-Failure” messaging
  • The other requires “Retry-Until-Succeed”
  • The integrator can mark these as forms of ReliableMessaging

– Subclasses! Not disjoint per se, but not equivalent

  • The general policy of “use some form of ReliableMessaging” holds

– But we can also see that they are not using the same policy

– General policies can be specialized

  • The general policy might be a law or regulation
  • Verify that changes comply with the law!

SLIDE 70

Additional Flexibility

  • As well as prescription, description

– Alternative hierarchies of policies provide interesting analytics

  • Which can interact with non-policy background knowledge
  • “Which dept. with a new manager in the past year has a conforming policy?”

– Factor interests

  • Degrees of privacy vs. degrees of reliability
  • Who doesn’t have access?
  • Policies can remain simple

– while the background knowledge expands

SLIDE 71

Benefits

  • We have a policy checker!

– And so much more
– But...slow?

  • The so much more

– Very, very useful at development time

  • Worth the computation time!

– Up to a point

– Deployment time has other constraints

  • Restrict our logic to development time

– Compile to simpler representations at deployment

  • Sometimes may want logic at deployment

– E.g., if the system needs to be extensible.

SLIDE 72

Linked Data

SLIDE 73

What is Linked Data?

  • Linked Data is characterised by a set of principles
  • 1. Use URIs as names for things
  • 2. Use http URIs so that those names can be dereferenced.
  • 3. When a URI is looked up, provide useful information
  • 4. Include statements that link to other URIs so that more information can be discovered.

– http://www.w3.org/DesignIssues/LinkedData.html

  • Linked Data is a social movement

– Particularly the Linked Open Data and Open Gov Data folks

  • Linked Data has favored technologies

SLIDE 74

Why Linked Data?

  • Sharing

– Obviously good, right?

  • Er...privacy?
  • “Integration”

– RDF allows for syntactic integration

  • Every union of graphs is a well formed graph

– XML waves its arm wildly

– URI merging

  • Entities and properties with the same URI are “the same”

– Human hints

  • HTTP can provide human readable information!

– Look up the meaning

– Computer hints

  • HTTP can provide machine readable information

– I.e., ontologies
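The "syntactic integration" point can be seen with nothing more than sets of triples. A minimal sketch, assuming made-up example.org URIs and ignoring blank nodes (which would need renaming in a real merge); `describe` is a hypothetical helper.

```python
# Triples are (subject, predicate, object); the example.org URIs are made up.
EX = "http://example.org/"

graph_a = {
    (EX + "yangtze", EX + "type", EX + "River"),
    (EX + "yangtze", EX + "lengthKm", "6300"),
}
graph_b = {
    (EX + "yangtze", EX + "type", EX + "River"),       # same statement as in graph_a
    (EX + "yangtze", EX + "flowsThrough", EX + "china"),
}

# The union of two graphs is again a well formed graph.
merged = graph_a | graph_b

def describe(graph, subject):
    """Collect everything the graph says about one URI."""
    return {(p, o) for s, p, o in graph if s == subject}

print(len(merged))
print(sorted(describe(merged, EX + "yangtze")))
```

Because both sources use the same URI for the river, their statements line up on merge without any schema work; whether they mean the same thing is a separate, semantic question.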

SLIDE 75

Is OWL ready for Linked Data?

  • URIs!

– Built deep into the language

  • Trades off expressivity vs. computability

– OWL 2 DL itself is decidable
– The profiles (OWL 2 EL, QL, RL)

  • Have polynomial reasoning services
  • Have “tuned” expressivity

– EL “looks like (biomedical) ontologies”
– QL “looks like UML/ER diagrams”
» And has nifty DB implementations
– RL “looks like databases”
» plus rules

  • Rich infrastructure

SLIDE 76

Is it used?

http://events.linkeddata.org/ldow2012/papers/ldow2012-paper-16.pdf

SLIDE 77

Lots of Issues

  • Missing expressivity

– Calculations, validation, probabilities, space, time...
– How much do we really need?

  • Computational issues

– “Web Scale”
– Issues just with RDF (qua ABox)!

  • FishBase MySQL DB = 195 tables (3GB)
  • FishDelish RDF ETL Dump = 1.38bn triples (250GB)
  • Performance....

SLIDE 78

Samantha Bail FishMark: A Linked Data Application Benchmark

Results: Queries per second (with cache)

Factor = comparison against MySQL performance

Query name              Virtuoso  Virtuoso factor  Quest  Quest factor  MySQL
CSpeciesInformation           15            1.10%    866        65.40%   1324
CAquariumTrade                84            6.40%    910        69.80%   1303
CUsedForAquaculture           14            1.10%    850        66.70%   1274
CommonName                   128           10.20%    850        67.50%   1258
PicturePage                  149           12.10%    893        72.10%   1238
Species                      197           16.30%    951        78.40%   1212
FamilyNominalSpecies         173           15.40%    849        75.40%   1126
Genus                        157           14.50%    818        75.20%   1087
FamilyAllfish                155           14.70%    796        75.90%   1049
CEndemic                      19            1.90%    733        70.70%   1037
CPotentialAquaculture         45            4.50%    714        70.40%   1014
CollaboratorPage              26            2.60%    657        65.40%   1006
FamilyListOfPictures         105           10.60%    728        73.30%    993
CGameFish                     11            1.10%    639        65.20%    979
FamilyInformation             17            1.90%    593        65.60%    903
SpeciesPage                    1            0.10%    578        71.20%    811
CCommercial                   14            1.80%    541        70.20%    771
CIntroduced                   14            2.10%    442        67.90%    651
CAllFish                      28            5.10%    413        74.70%    553
CPelagic                       9            1.90%    349        74.10%    471
CReefAssociated                5            1.40%    273        70.40%    388
CFreshwater                    7            2.30%    217        69.90%    310

Thanks to

OBDA/OWL QL looks good even though it’s not doing anything!

SLIDE 79

Is Linked Data Ready?

  • Lots still to do!
  • I’ve several projects in this space

– FishMark — An Application Benchmark for Linked Open Data

  • Published (workshop) paper; conference paper in submission!

– A Linked Open Data, Crowd Curated Wolfram Alpha Clone

  • And with ontologies

– A Clinical Ontology Based on Ferri's Clinical Advisor

  • More for the asking!

SLIDE 80

That’s All

SLIDE 81

Well, not quite

  • Exam

– Stopford PC Cluster 1 15 Jan 09:45

  • Online exam! Blackboard

– 2 hrs (plenty o’ time!)
– 32 questions

  • 24 MCQ/TF for 29 points
  • 8 Short Essays for 36 points

– Bit shorter (44q last year)
– (More essays; fewer MCQs)

  • Exam skewed a bit low last year

– But coursework a bit up

  • Revision session after break

– Come with questions
– Say the 14th?
– BB or Email or Skype as well
