When and Why to use a Classifier?


When and Why to use a Classifier? Alan Rector, with acknowledgement to Jeremy Rogers, Pieter Zanstra, & the GALEN Consortium


slide-1
SLIDE 1

When and Why to use a Classifier?

Alan Rector

with acknowledgement to Jeremy Rogers, Pieter Zanstra, & the GALEN Consortium
Nick Drummond, Matthew Horridge, Hai Wang in CO-ODE/HyOntUSE
Information Management Group, Dept of Computer Science, U Manchester
Holger Knublauch, Ray Fergerson, … and the Protégé-Owl Team

rector@cs.man.ac.uk
co-ode-admin@cs.man.ac.uk

www.co-ode.org
protege.stanford.org
www.opengalen.org

OpenGALEN

slide-2
SLIDE 2

Reasons to classify (1)

  • Managing Compositional ontologies / Terminologies

– “Conceptual Lego”

  • Managing combinatorial explosions - the exploding bicycle

– Empowering users

  • “Just in time” ontologies

– Give the users the Lego set with limited connectors

– Organising polyhierarchies / Modularizing ontologies

  • “Normalising ontologies”
  • Multiaxial indexing of resources

– Providing multiple views

  • Reorganising the ontology by new abstractions

– Constraining ontologies & schemas

  • Enforcing constraints
  • Imposing policies

– For clinical statements with SNOMED entries in request mode, the context is request

slide-3
SLIDE 3

Reasons to classify (2)

  • ‘Matching’ instances against classes

– Resource/service discovery
– Self-describing storage

  • ‘Archetypes’ & templates
  • Providing a skeleton for default reasoning & Prototypes

(but not to do the reasoning itself)

– Molluscs typically have shells

  • Cephalopods are kinds of Molluscs but typically do not have shells

– Nautiloids are kinds of Cephalopods but typically do have shells

» Nautilus ancestors are kinds of Nautiloids but do (did) not have shells

– Biology is full of exceptions
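The mollusc example can be sketched in a few lines of Python. This is only an illustration of a hierarchy serving as a skeleton for defaults, not the author's method: the is-a links are hand-written here (a classifier would normally supply them), the default reasoning is a simple "nearest ancestor wins" lookup done outside any classifier, and the classes `Snail` and `Squid` are hypothetical additions used to show plain inheritance.

```python
# Hypothetical is-a skeleton; in practice a classifier would supply it.
PARENT = {
    "Snail": "Mollusc",            # hypothetical example class
    "Cephalopod": "Mollusc",
    "Squid": "Cephalopod",         # hypothetical example class
    "Nautiloid": "Cephalopod",
    "NautilusAncestor": "Nautiloid",
}

# Defaults attached to classes; a more specific class overrides its ancestors.
HAS_SHELL_DEFAULT = {
    "Mollusc": True,
    "Cephalopod": False,
    "Nautiloid": True,
    "NautilusAncestor": False,
}


def typically_has_shell(cls):
    """Walk up the skeleton and return the nearest default, or None if unknown."""
    while cls is not None:
        if cls in HAS_SHELL_DEFAULT:
            return HAS_SHELL_DEFAULT[cls]
        cls = PARENT.get(cls)
    return None


if __name__ == "__main__":
    for name in ("Snail", "Squid", "Nautiloid", "NautilusAncestor"):
        print(name, typically_has_shell(name))
```

The sketch assumes a single parent per class. In a polyhierarchy two ancestors may carry conflicting defaults, and some conflict-resolution policy would be needed.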

slide-4
SLIDE 4

Classification is about Classes

  • Classification works for

– Organising & constraining classes / schemas
– Identifying the classes to which an instance definitely belongs

  • Or those to which it cannot belong
  • Classification is open world

– Negation as unsatisfiability

  • ‘not’ == ‘impossible’ (“unsatisfiable”)

– Databases, logic programming, PAL, queries etc are closed world

  • Negation as failure

– ‘not’ == cannot be found
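The two readings of 'not' can be contrasted with a toy sketch. Everything in it is an assumption made for illustration: the sample names, the tiny fact table, and the single disjointness axiom between `Protein` and `Steroid`. A real classifier derives impossibility from the full set of axioms, not from a lookup like this.

```python
# Asserted facts: individual -> classes it is known to belong to (illustrative).
KNOWN = {
    "insulin_sample": {"Protein"},
    "unlabelled_sample": set(),
}

# Assumed disjointness axiom: nothing is both a Protein and a Steroid.
DISJOINT = {frozenset({"Protein", "Steroid"})}


def closed_world_not(individual, cls):
    """Negation as failure: 'not' means the fact cannot be found."""
    return cls not in KNOWN.get(individual, set())


def open_world_not(individual, cls):
    """Negation as unsatisfiability: 'not' means membership is impossible."""
    return any(
        frozenset({cls, known}) in DISJOINT
        for known in KNOWN.get(individual, set())
    )


if __name__ == "__main__":
    # Nothing is asserted about the unlabelled sample.
    print(closed_world_not("unlabelled_sample", "Steroid"))  # absent, so "not"
    print(open_world_not("unlabelled_sample", "Steroid"))    # still possible
    print(open_world_not("insulin_sample", "Steroid"))       # ruled out
```

For the unlabelled sample the closed-world answer is "not a Steroid" simply because no such fact is stored, while the open-world answer stays undecided until something makes membership impossible.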

slide-5
SLIDE 5

Reasons not to Classify

  • To query large numbers of instances

– Open world (“A-Box”) reasoning does not work over large numbers of instances

  • If the question is closed world
  • E.g. “Drugs licensed for treatment of asthma”
  • If the query requires non-DL reasoning
  • E.g. numerical, optimisation, probabilistic, …

– Would like to have a more powerful hybrid reasoner

  • For Metadata and Higher Order Information

– Classifiers are strictly first order

  • A few things can be ‘kluged’
  • If there are complex defaults and exceptions

– “Prototypical Knowledge”

  • E.g. “Molluscs typically have shells”

– NB Simple exceptions can be handled, but this requires care

slide-6
SLIDE 6

Use instead

  • To query large numbers of instances OR if the query is closed world

– Queries / constraints over databases
– Instance stores / triple stores / …
– Rules

  • DL-programming
  • JESS, Algernon, Prolog, …

– Belief revision / non-monotonic reasoning

  • If query requires Non DL Reasoning

– Hybrid reasoners or ?SWRL?

  • No good examples at the moment
  • For defaults and Exceptions & Prototypical Knowledge

– Traditional frame systems

  • More expressive default structure than Protégé

– Exceptions for classes as well as instances

» Over-riding rather than narrowing

slide-7
SLIDE 7

Classification to build Ontologies: Conceptual Lego

Figure labels: hand, extremity, body, acute, chronic, abnormal, normal, ischaemic, deletion, bacterial, polymorphism, cell, protein, gene, infection, inflammation, Lung, expression

slide-8
SLIDE 8

Logic-based Ontologies: Conceptual Lego

“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…”

“Hand which is anatomically normal”

slide-9
SLIDE 9

Linking taxonomies: Conceptual Lego Normalisation

Genes, Species, Protein, Function, Disease

Protein coded by (CFTRgene & in humans)
Membrane transport mediated by (Protein coded by (CFTRgene in humans))
Disease caused by (abnormality in (Membrane transport mediated by (Protein coded by (CFTR gene & in humans))))
CFTRGene in humans

slide-10
SLIDE 10

Conceptual Lego and Normalisation: Practical Example

slide-11
SLIDE 11

Take a Few Simple Concepts & Properties

slide-12
SLIDE 12

Combine them in Descriptions which can be simple…

Sickle cell disease is a disease caused by some sickling haemoglobin

slide-13
SLIDE 13

…or which can be as complex as you like

Cystic fibrosis is caused by some non-normal ion transport that is the function of a protein coded for by a CFTR gene

slide-14
SLIDE 14

Add some definitions

“Diseases linked to CFTR Genes”

slide-15
SLIDE 15

We have built a simple tree, easy to maintain

slide-16
SLIDE 16

Let the classifier organise it

slide-17
SLIDE 17

If you want more abstractions, just add new definitions (re-use existing data)

“Diseases linked to abnormal proteins”

slide-18
SLIDE 18

And let the classifier work again

slide-19
SLIDE 19

And again – For a view based on species

“Diseases linked to genes described in the mouse”

slide-20
SLIDE 20

And let classifier check consistency

(My first try wasn’t)

slide-21
SLIDE 21

Normalising (untangling) Ontologies

Structure Function Part-whole Structure Function Part-whole

slide-22
SLIDE 22

Untangling and Enrichment

Using a classifier to make life easier

Substance

  • Protein
  • - ProteinHormone
  • - - Insulin
  • Steroid
  • - SteroidHormone
  • - - Cortisol
  • Hormone
  • - ProteinHormone
  • - - Insulin
  • - SteroidHormone
  • - - Cortisol
  • Catalyst
  • - Enzyme
  • - - ATPase

  • PhysiologicRole
  • - HormoneRole
  • - CatalystRole
  • Substance
  • - Protein
  • - - Insulin
  • - - ATPase
  • - Steroid
  • - - Cortisol

  • ATPase: playsRole someValuesFrom CatalystRole
  • Cortisol: playsRole someValuesFrom HormoneRole
  • Insulin: playsRole someValuesFrom HormoneRole
  • Enzyme: Protein & playsRole someValuesFrom CatalystRole
  • Catalyst: Substance & playsRole someValuesFrom CatalystRole
  • SteroidHormone: Steroid & playsRole someValuesFrom HormoneRole
  • ProteinHormone: Protein & playsRole someValuesFrom HormoneRole
  • Hormone: Substance & playsRole someValuesFrom HormoneRole

Substance

  • Protein
  • - ProteinHormone
  • - - Insulin
  • - Enzyme
  • - - ATPase
  • Steroid
  • - SteroidHormone^
  • - - Cortisol
  • Hormone
  • - ProteinHormone^
  • - - Insulin^
  • - SteroidHormone^
  • - - Cortisol^
  • Catalyst
  • - Enzyme^
  • - - ATPase^
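How a classifier gets from the untangled tree plus the definitions to the inferred polyhierarchy can be sketched with a toy structural subsumption check. This is not a description logic reasoner and not the algorithm used by any particular tool. It handles only conjunctions of named parents and `playsRole someValuesFrom` restrictions, ignores role hierarchies and disjointness, and the `ONTOLOGY` table layout and function names are assumptions made for the example. In particular, treating Insulin, ATPase and Cortisol as primitive and the five role-based classes as fully defined is an assumption of this sketch.

```python
# name -> (asserted parents, playsRole fillers, is the description a full definition?)
ONTOLOGY = {
    "Substance":      ((), (), False),
    "Protein":        (("Substance",), (), False),
    "Steroid":        (("Substance",), (), False),
    "Insulin":        (("Protein",), ("HormoneRole",), False),
    "ATPase":         (("Protein",), ("CatalystRole",), False),
    "Cortisol":       (("Steroid",), ("HormoneRole",), False),
    "Hormone":        (("Substance",), ("HormoneRole",), True),
    "ProteinHormone": (("Protein",), ("HormoneRole",), True),
    "SteroidHormone": (("Steroid",), ("HormoneRole",), True),
    "Catalyst":       (("Substance",), ("CatalystRole",), True),
    "Enzyme":         (("Protein",), ("CatalystRole",), True),
}


def expand(name):
    """Flatten a description into (primitive ancestors, role fillers)."""
    parents, fillers, defined = ONTOLOGY[name]
    primitives = set() if defined else {name}
    roles = set(fillers)
    for parent in parents:
        parent_primitives, parent_roles = expand(parent)
        primitives |= parent_primitives
        roles |= parent_roles
    return primitives, roles


def subsumes(general, specific):
    """True if every member of `specific` is necessarily a member of `general`."""
    specific_primitives, specific_roles = expand(specific)
    if not ONTOLOGY[general][2]:
        # A primitive class only subsumes what is explicitly built from it.
        return general in specific_primitives
    general_primitives, general_roles = expand(general)
    return (general_primitives <= specific_primitives
            and general_roles <= specific_roles)


def inferred_parents(name):
    """Most specific subsumers of `name`, i.e. its parents in the inferred tree."""
    ancestors = {c for c in ONTOLOGY if c != name and subsumes(c, name)}
    return sorted(
        a for a in ancestors
        if not any(b != a and subsumes(a, b) for b in ancestors)
    )


if __name__ == "__main__":
    for concept in ("Insulin", "ProteinHormone", "Enzyme", "ATPase", "Cortisol"):
        print(concept, "->", inferred_parents(concept))
```

Under these assumptions the sketch reproduces the multiple parents marked with ^ in the classified tree: ProteinHormone appears under both Protein and Hormone, and Enzyme under both Protein and Catalyst, without either link being asserted by hand.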
slide-23
SLIDE 23

Normalisation & Quality Assurance

  • Humans recognise errors of commission easily

– Miss errors of omission

  • Classifiers convert errors of omission to errors of commission

– Inadequate definitions create “orphans”

  • BodyPart

Parts of Heart Ventricle … CardiacSeptum

  • Classifiers flag errors of commission

– Over definition leads to inconsistency (unsatisfiability)

  • “Pneumonia located in the brain”
slide-24
SLIDE 24

Enforcing constraints & policies

  • A class with both necessary & sufficient and additional necessary conditions acts as a rule

  • The Unit testing Framework supports checking that rules are enforced

slide-25
SLIDE 25

A Probe class to check a constraint

All Tests Passed
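The probe-class idea can be imitated outside an ontology editor with a small sketch. It is not the Protégé unit testing framework. The rule format, the function name `is_satisfiable`, and the `mode` / `context` properties with their values are hypothetical, loosely modelled on the "request mode" policy mentioned earlier in the deck. The sketch also assumes each property is single-valued and that different value names denote different things.

```python
# A rule: anything meeting the sufficient conditions must also meet the
# additional necessary conditions. Property names and values are hypothetical.
REQUEST_RULE = {
    "sufficient": {"mode": "request"},
    "necessary": {"context": "request"},
}


def is_satisfiable(description, rules):
    """Return False if the description triggers a rule and then contradicts it."""
    for rule in rules:
        triggered = all(
            description.get(prop) == value
            for prop, value in rule["sufficient"].items()
        )
        if not triggered:
            continue
        for prop, value in rule["necessary"].items():
            if prop in description and description[prop] != value:
                return False
    return True


# A probe deliberately violates the policy; the check passes if the probe
# comes out unsatisfiable.
PROBE = {"mode": "request", "context": "history"}

if __name__ == "__main__":
    enforced = not is_satisfiable(PROBE, [REQUEST_RULE])
    print("constraint enforced:", enforced)
```

The design mirrors the slide: the test asserts that something which ought to be impossible really is reported as impossible, so a missing or weakened rule shows up as a failed test.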

slide-26
SLIDE 26

Skeleton for Defaults & Exceptions

use of beta blocker in asthma

beta blocker asthma serious contraindication mild contraindication cardioselective

cardioselective beta blocker use of cardioselective beta blocker in asthma

Experience: Normalised ontologies lead to clean default inheritance

slide-27
SLIDE 27

When to Classify

  • … but isn’t having a classifier an intolerable overhead for the applications?

– It depends on the life cycle you choose

  • Life cycles

– Pre-coordination
– Just in time coordination
– Post Coordination

slide-28
SLIDE 28

Pre-coordination

If the terms can be enumerated in advance

Asserted form “Sources” Classifier “Compiler” Classified form “binary” Application Authors Users

(“high level language” “intermediate representations”)

Overwhelmingly the dominant pattern today.

“binary” can be OWL Lite, RDF(S), XML Schema, OBO, …

slide-29
SLIDE 29

Commit Results to a Pre-Coordinated Ontology

Assert (“Commit”) changes inferred by classifier

slide-30
SLIDE 30

Post coordination

When there are combinatorially many potential terms

  • “Lazy classification” on demand

Big on the outside: small kernel on the inside

Avoid the exploding bicycle

kernel model

Externally available resource

API

slide-31
SLIDE 31

Post Coordination

Terminology Services

Asserted form “Sources” Classifier “Compiler” Application Authors Users

(& intermediate representations)

Asserted Ontology Store

But requires a classifier available at application time

slide-32
SLIDE 32

Terminology Services

A Compromise: Just in Time Coordination

Asserted form “Sources” Classifier “Compiler” Application Authors Users

(& intermediate representations)

Asserted Ontology Store

Pre-coordinated Cache

Only the occasional new notion requires classification - need not be real time
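The just-in-time life cycle amounts to a cache in front of the classifier. The sketch below is an assumption-laden illustration: `TerminologyService`, `toy_classify` and the expression strings are invented for the example, and the stand-in classifier simply returns the head term of a "X which …" expression where a real service would call a reasoner.

```python
def toy_classify(expression):
    """Stand-in for a real classifier call; returns the inferred parents."""
    head = expression.split(" which ")[0]
    return (head,)


class TerminologyService:
    """Serve pre-coordinated results; classify only the occasional new notion."""

    def __init__(self, classify, precoordinated):
        self._classify = classify             # potentially slow classifier call
        self._cache = dict(precoordinated)    # pre-coordinated cache
        self.classifier_calls = 0

    def lookup(self, expression):
        if expression not in self._cache:
            # New notion: classify once, then commit the result to the cache.
            self.classifier_calls += 1
            self._cache[expression] = self._classify(expression)
        return self._cache[expression]


if __name__ == "__main__":
    service = TerminologyService(toy_classify, {"Hypertension": ("Disease",)})
    print(service.lookup("Hypertension"))
    print(service.lookup("Hypertension which is idiopathic"))
    print(service.lookup("Hypertension which is idiopathic"))
    print("classifier calls:", service.classifier_calls)
```

Here the classifier runs synchronously on a cache miss. Since classification need not be real time, a variant could queue unknown expressions and classify them later in a batch.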

slide-33
SLIDE 33

Summary

  • Why Classify

– Managing Compositional ontologies / Terminologies

  • “Conceptual Lego”
  • Empowering users - just in time classification
  • Providing views & deferring decisions on abstractions
  • Quality assurance

– Constraining concepts & schemas
– Providing a skeleton for default reasoning & Prototypes

  • When to classify

– Pre-coordination
– Just-in-time coordination
– Post-coordination

slide-34
SLIDE 34

Summary: When to Classify?

Applications do not need a classifier to benefit from classification

  • Pre-coordination

– If concepts/terms can be predicted
– When a classifier is not available at run time
– When we must fit with legacy applications

  • Post-coordination

– When a few concepts are needed from a large potential set
– When a classifier is available

  • and time cost is acceptable

– When applications can be built or adapted to take advantage

slide-35
SLIDE 35
slide-36
SLIDE 36

Idiopathic Hypertension in our co’s Phase 2 study

Skeleton for Fractal tailoring forms for clinical trials

Hypertension Idiopathic Hypertension In our company’s studies In Phase 2 studies Hypertension Idiopathic Hypertension In our company’s studies In Phase 2 studies

slide-37
SLIDE 37

More on Views

  • A problem in the digital anatomist

– To an anatomist, the Pericardium and the Heart are separate organs
– To a clinician they are part of the same organ

  • A disease of the pericardium counts as a kind of heart disease

slide-38
SLIDE 38

Represent context and views by variant properties

Organ Heart Pericardium OrganPart CardiacValve

Disease of (Heart or part-of-heart) Disease of Pericardium

is_part_of is_structurally_part_of is_clinically_part_of

slide-39
SLIDE 39

Protégé-OWL alternative views

Disorder of “Clinical heart”: “Disorder of heart or any part of the heart” (including clinical and functional parts)

Disorder of “FMA heart”: “Disorder of heart or any structural part of the heart”
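The effect of variant properties on the two views can be sketched as follows. The assertions are assumptions made for illustration, not taken from the slides: that `is_structurally_part_of` and `is_clinically_part_of` are sub-properties of `is_part_of`, that CardiacValve is a structural part of Heart, and that Pericardium is only a clinical part. Transitivity of parthood is left out to keep the sketch short.

```python
# Assumed property hierarchy: both variants specialise is_part_of.
SUBPROPERTY_OF = {
    "is_structurally_part_of": "is_part_of",
    "is_clinically_part_of": "is_part_of",
}

# Assumed assertions for illustration.
ASSERTED = [
    ("CardiacValve", "is_structurally_part_of", "Heart"),
    ("Pericardium", "is_clinically_part_of", "Heart"),
]


def holds(subject, prop, obj):
    """True if the triple is asserted directly or via a sub-property."""
    for s, p, o in ASSERTED:
        if (s, o) != (subject, obj):
            continue
        while p is not None:
            if p == prop:
                return True
            p = SUBPROPERTY_OF.get(p)
    return False


def is_heart_disease(location, part_property):
    """'Disease of (Heart or part-of-heart)' under the chosen parthood view."""
    return location == "Heart" or holds(location, part_property, "Heart")


if __name__ == "__main__":
    for site in ("Heart", "CardiacValve", "Pericardium"):
        print(site,
              "clinical view:", is_heart_disease(site, "is_part_of"),
              "structural view:", is_heart_disease(site, "is_structurally_part_of"))
```

Choosing which property a definition quantifies over selects the view: the general `is_part_of` gives the clinician's heart disease, which includes disease of the pericardium, while `is_structurally_part_of` gives the anatomist's narrower reading.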