Douglas Teodoro, Emilie Pasche, Julien Gobeill, Patrick Ruch, - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Douglas Teodoro, Emilie Pasche, Julien Gobeill, Patrick Ruch, Christian Lovis, Rémy Choquet, Christel Daniel

slide-2
SLIDE 2

The project

slide-3
SLIDE 3

The problem

  • Data technical and semantic heterogeneity

– Different languages: French, German, Greek, Swedish, etc.
– Different types: RDBMS, free text, XML files

Drug | Quantity / Frequency
CICLOSPORINE (SANDIMMUN, NEORAL) | ;;;;;;;;;1;;;;;;;;;;;;;;;
MYCOPHENOLATE T 60 MINUTES | ;;;;;;;;1;;;;;;;;;;;;;;;;
PROTOCOLE:TEST STIMULATION SYNACTHENE 1H 3E TUBE /3 | ;;;;;;;;1;;;;;;;;;;;;;;;;
co-trimoxazole | lundi - mercredi - vendredi (3x/sem)
ciprofloxacine | 1x/sem (dimanche)
vancomycine | 1x18h
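The records above mix at least two encodings of the same kind of information: a semicolon-separated slot string and free-text French frequency expressions. A minimal Python sketch of how each might be read is given here. The function names are hypothetical, the meaning of a slot position is not stated in the slides and is left uninterpreted, and only the "Nx/sem" (times per week) pattern is handled.

```python
import re
from typing import Dict, Optional


def doses_per_week(text: str) -> Optional[int]:
    """Return the weekly dose count for strings such as '3x/sem', else None."""
    match = re.search(r"(\d+)\s*x\s*/\s*sem", text.lower())
    return int(match.group(1)) if match else None


def filled_slots(text: str) -> Dict[int, str]:
    """Map the position of every non-empty ';'-separated field to its value.

    What a position means (hour of day, dose slot, ...) is an assumption
    left to the caller; the slides do not say.
    """
    fields = text.split(";")
    return {index: value for index, value in enumerate(fields) if value.strip()}


if __name__ == "__main__":
    print(doses_per_week("lundi - mercredi - vendredi (3x/sem)"))
    print(doses_per_week("1x18h"))  # not a weekly pattern, so None
    print(filled_slots(";;1;;"))
```

Anything the two helpers cannot read (for example "1x18h") is returned as None or an empty mapping rather than guessed, which keeps unparsed records visible for manual review.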

slide-4
SLIDE 4

The problem

  • Data privacy

– Very high concern
– Patient identity and other confidential items cannot be revealed by any means to unauthorized people

  • Political barriers

– External connection to ODBS
– Security risks

slide-5
SLIDE 5

Our solution

  • Clinical Data Repository (CDR): a distributed storage system that provides transparent access to heterogeneous data sources. It offers SQL and SPARQL query interfaces, returns result sets as SQL tuples or RDF, and assures patient privacy.

  • Based on two visions:

1. Pragmatic: uses database federation, a known technology, to provide faster data integration to the other project components
2. Innovative: uses semantic web technology, a new approach that will be explored throughout the project

slide-6
SLIDE 6

CDR::Architecture

  • Database federation - based

[Figure: database federation architecture. Sites: HUG, LiU, INSERM, AVERBIS; query entry point]

slide-7
SLIDE 7

CDR::Architecture

  • Semantic web - based

[Figure: semantic web architecture. Sites: HUG, LiU, AVERBIS, INSERM; query entry point]

slide-8
SLIDE 8

CDR::Information Model

  • Information Model: HL7-RIM based

– Other candidates: OpenEHR, EAV/CR, customized model

  • The data stored in the CDR covers the following aspects:
  • Patient information
  • Pathogen-related information
  • Object-related information
  • Information on locations
  • Operational data
slide-9
SLIDE 9

CDR::Information Model

[Figure: information model entities: patient data, health care setting, cultures, pathogens, antibiograms, prescriptions, diseases, adverse events]

slide-10
SLIDE 10

CDR::Business Model

  • Agents

– Responsible for the CDR
– CIS interoperability
– Data management within the CDR
– Communication with other DebugIT components

  • Orders:
  • DataExtraction
  • DataNormalisation
  • DataMigration
  • DataDepersonalisation
  • OntologyUpdate

[Figure: agent order loop. Wait for order → receive order → try to execute order → on success, mark success; on failure, mark failed → wait for order]
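The order loop described on this slide (receive an order, try to execute it, mark it as succeeded or failed, then wait for the next one) can be sketched in Python as follows. This is an illustration under assumptions, not the DebugIT agent code: `Order`, `Status` and `run_agent` are hypothetical names, and an order is modelled as a plain callable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class Status(Enum):
    WAITING = "waiting"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class Order:
    kind: str                    # e.g. "DataExtraction", "DataNormalisation"
    action: Callable[[], None]   # hypothetical: the work the order stands for
    status: Status = Status.WAITING


def run_agent(orders: Iterable[Order]) -> List[Order]:
    """Process orders one by one and record the outcome of each."""
    processed: List[Order] = []
    for order in orders:              # receive order
        try:
            order.action()            # try to execute order
        except Exception:
            order.status = Status.FAILED    # mark failed
        else:
            order.status = Status.SUCCESS   # mark success
        processed.append(order)       # then go back to waiting
    return processed


if __name__ == "__main__":
    def broken() -> None:
        raise RuntimeError("source unreachable")

    queue = [Order("DataExtraction", lambda: None), Order("DataMigration", broken)]
    for done in run_agent(queue):
        print(done.kind, done.status.value)
```

A failed order does not stop the loop; the agent records the failure and moves on, which matches the "mark failed, wait for order" branch of the diagram.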

slide-11
SLIDE 11

CDR::Business Model

  • Federated engine

– Based on MySQL Federated Engine
– Federate the distributed data sources
– Receive, create plan and execute SQL requests

  • SPARQL Engine

– Based on D2R
– Transform the ER model into a semantic linked data model
– Receive, create plan and execute SPARQL requests
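To make "transform the ER model into a semantic linked data model" concrete, the Python sketch below turns one relational row into subject-predicate-object triples. It only illustrates the idea and does not use or reproduce D2R's mapping language or API. The base URI, the URI patterns and the function name `row_to_triples` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

BASE = "http://example.org/cdr/"  # hypothetical namespace, not from the slides


def row_to_triples(table: str, key: str, row: Dict[str, object]) -> List[Triple]:
    """Map one relational row to triples: one resource per row, one property per column."""
    subject = f"<{BASE}{table}/{row[key]}>"
    triples: List[Triple] = [(subject, "rdf:type", f"<{BASE}vocab/{table}>")]
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is already in the subject URI; NULLs yield no triple
        predicate = f"<{BASE}vocab/{table}_{column}>"
        triples.append((subject, predicate, f'"{value}"'))
    return triples


if __name__ == "__main__":
    row = {"bacterium_id": 7, "name": "Escherichia coli"}
    for triple in row_to_triples("bacteria", "bacterium_id", row):
        print(" ".join(triple), ".")
```

Once rows are exposed this way, a SPARQL request can match on the generated predicates instead of on table and column names.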

slide-12
SLIDE 12

CDR::Data Privacy

  • Security

– Sensitive data encrypted
– Mapping table: original term → encrypted term
– Original term kept only within the intranet
– Encrypted term exposed on the internet

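+-------------+-------------------+----------------------------------+
| Artefact ID | Original Artefact | Encrypted Artefact               |
+-------------+-------------------+----------------------------------+
| 1           | 10                | 001b98ab4335f1d3da23946bce9e4279 |
| 2           | 59                | 0109cfbecd89a3aaeeb92fde6420f29b |
| 3           | 39                | 010c1482764323fd479510ef6a8f5f48 |
+-------------+-------------------+----------------------------------+

+----------------------------------+-------------+-------------+
| Patient ID                       | Patient Age | Patient Sex |
+----------------------------------+-------------+-------------+
| 001b98ab4335f1d3da23946bce9e4279 | 58          | F           |
| 0109cfbecd89a3aaeeb92fde6420f29b | 38          | F           |
| 010c1482764323fd479510ef6a8f5f48 | 19          | M           |
+----------------------------------+-------------+-------------+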
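The mapping-table idea can be sketched in Python as below. The slides do not say which scheme produces the encrypted terms, so this example swaps in a keyed hash (HMAC-SHA256, truncated to 32 hex characters to resemble the values shown). The class name, the secret and the exported field names are assumptions.

```python
import hashlib
import hmac
from typing import Dict


class PseudonymTable:
    """Keeps the original-to-encrypted mapping; meant to stay inside the intranet."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._mapping: Dict[str, str] = {}  # original term -> encrypted term

    def encrypt(self, original: str) -> str:
        """Return a stable pseudonym for an identifier, creating it on first use."""
        if original not in self._mapping:
            digest = hmac.new(self._secret, original.encode("utf-8"), hashlib.sha256)
            self._mapping[original] = digest.hexdigest()[:32]
        return self._mapping[original]

    def export_row(self, patient_id: str, age: int, sex: str) -> Dict[str, object]:
        """Build the row that may leave the intranet: pseudonym only, no original ID."""
        return {"patient_id": self.encrypt(patient_id), "patient_age": age, "patient_sex": sex}


if __name__ == "__main__":
    table = PseudonymTable(secret=b"replace-with-a-real-secret")
    print(table.export_row("10", 58, "F"))
```

Because the same identifier always maps to the same pseudonym, records about one patient can still be joined on the exposed side without revealing the original ID.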

slide-13
SLIDE 13

Our results

  • The DebugIT CDR already has its first pilot
  • SQL endpoints ready at HUG and LiU

– Data integration via database federation
– Based on MySQL Federated Engine
– SQL requests and SQL tuple result sets

  • SPARQL endpoints set up at 3 demonstration centers: HUG, INSERM and LiU

– Data integration via ‘linked data’
– Based on D2R
– Transform the ER model into a semantically linked data model
– SPARQL requests and RDF result sets

slide-14
SLIDE 14

    select cr.data_source data_source,
           cr.antibiotic_tested_result sensibility,
           count(cr.antibiotic_tested_result) value,
           date_format(c.result_date, '%Y') result_date
    from culture_results cr
    join culture c on cr.culture_id = c.culture_id
    join bacteria b on b.bacterium_id = cr.identified_bacteria_name
    join drug d on d.drug_id = cr.antibiotic_tested
    where b.name = 'Escherichia coli'
      and d.name = 'sulfamethoxazole and trimethoprim'
    group by 1,2,4

Database federation

+-------------+---------------+-------+-------------+
| data_source | sensibility   | value | result_date |
+-------------+---------------+-------+-------------+
| hug         | indeterminate |     2 | 2006        |
| hug         | resistant     |    72 | 2004        |
| hug         | resistant     |    71 | 2005        |
| hug         | resistant     |   112 | 2006        |
| hug         | resistant     |    94 | 2007        |
| hug         | resistant     |     8 | 2008        |
| hug         | susceptible   |   302 | 2004        |
| hug         | susceptible   |   318 | 2005        |
| hug         | susceptible   |   288 | 2006        |
| hug         | susceptible   |   269 | 2007        |
| hug         | susceptible   |     4 | 2008        |
| liu         | indeterminate |     1 | 2007        |
| liu         | resistant     |    10 | 2005        |
| liu         | resistant     |    21 | 2006        |
| liu         | resistant     |    30 | 2007        |
| liu         | resistant     |    46 | 2008        |
| liu         | susceptible   |   108 | 2005        |
| liu         | susceptible   |    90 | 2006        |
| liu         | susceptible   |   132 | 2007        |
| liu         | susceptible   |   100 | 2008        |
+-------------+---------------+-------+-------------+

[Figure: the CDR federating the HUG and LiU data sources]

  • Query time: 1 min 14.68 sec
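The counts returned by the query on this slide can be turned into a resistance share per site and year. The Python sketch below does that arithmetic on the rows as listed. The function name is hypothetical, and counting "indeterminate" results in the denominator is an assumption of this example, not something the slides specify.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (data_source, sensibility, value, result_date), copied from the result set above.
ROWS: List[Tuple[str, str, int, str]] = [
    ("hug", "indeterminate", 2, "2006"),
    ("hug", "resistant", 72, "2004"),
    ("hug", "resistant", 71, "2005"),
    ("hug", "resistant", 112, "2006"),
    ("hug", "resistant", 94, "2007"),
    ("hug", "resistant", 8, "2008"),
    ("hug", "susceptible", 302, "2004"),
    ("hug", "susceptible", 318, "2005"),
    ("hug", "susceptible", 288, "2006"),
    ("hug", "susceptible", 269, "2007"),
    ("hug", "susceptible", 4, "2008"),
    ("liu", "indeterminate", 1, "2007"),
    ("liu", "resistant", 10, "2005"),
    ("liu", "resistant", 21, "2006"),
    ("liu", "resistant", 30, "2007"),
    ("liu", "resistant", 46, "2008"),
    ("liu", "susceptible", 108, "2005"),
    ("liu", "susceptible", 90, "2006"),
    ("liu", "susceptible", 132, "2007"),
    ("liu", "susceptible", 100, "2008"),
]


def resistance_share(rows: List[Tuple[str, str, int, str]]) -> Dict[Tuple[str, str], float]:
    """Return resistant / all tested isolates for each (site, year)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    resistant: Dict[Tuple[str, str], int] = defaultdict(int)
    for source, result, count, year in rows:
        totals[(source, year)] += count
        if result == "resistant":
            resistant[(source, year)] += count
    return {key: resistant[key] / totals[key] for key in totals}


if __name__ == "__main__":
    for (site, year), share in sorted(resistance_share(ROWS).items()):
        print(f"{site} {year}: {share:.1%}")
```

For example, HUG in 2004 has 72 resistant isolates out of 374 tested (72 + 302). The 2008 figures at HUG rest on only 12 isolates, so shares for that year should be read with caution.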
slide-15
SLIDE 15

Database federation

[Figure: the CDR federating the HUG and LiU data sources]

Demonstration

  • CDR: query distributed between LiU and HUG
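As a rough analogy for a query distributed between two sites, the Python sketch below attaches two separate in-memory SQLite databases to one connection and exposes them through a single view that tags each row with its source. This is not how the MySQL Federated Engine is configured; SQLite's `ATTACH DATABASE` is swapped in only because it runs without a server. The view name and the handful of rows are invented for the illustration.

```python
import sqlite3


def build_federation() -> sqlite3.Connection:
    """Create one connection that sees two independent per-site databases."""
    conn = sqlite3.connect(":memory:")
    for site in ("hug", "liu"):
        # Each ATTACH of ':memory:' creates a separate in-memory database.
        conn.execute(f"ATTACH DATABASE ':memory:' AS {site}")
        conn.execute(f"CREATE TABLE {site}.culture_results (antibiotic_tested_result TEXT)")
    # Toy rows, invented for this sketch.
    conn.executemany(
        "INSERT INTO hug.culture_results VALUES (?)",
        [("resistant",), ("susceptible",), ("susceptible",)],
    )
    conn.executemany("INSERT INTO liu.culture_results VALUES (?)", [("resistant",)])
    # A TEMP view may reference tables in any attached database.
    conn.execute(
        """
        CREATE TEMP VIEW federated_culture_results AS
        SELECT 'hug' AS data_source, antibiotic_tested_result FROM hug.culture_results
        UNION ALL
        SELECT 'liu' AS data_source, antibiotic_tested_result FROM liu.culture_results
        """
    )
    return conn


def count_by_source(conn: sqlite3.Connection) -> list:
    """Run one query over both sites, grouped the same way as the CDR query."""
    return conn.execute(
        "SELECT data_source, antibiotic_tested_result, COUNT(*) "
        "FROM federated_culture_results GROUP BY 1, 2 ORDER BY 1, 2"
    ).fetchall()


if __name__ == "__main__":
    print(count_by_source(build_federation()))
```

The caller writes one query against one name and gets rows from both sites, each labelled with its `data_source`, which is the behaviour the demonstration slide describes.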

slide-16
SLIDE 16

SPARQL endpoints

slide-17
SLIDE 17

SPARQL data query service

slide-18
SLIDE 18

Next steps

  • Improve overall database performance
  • Scale to more sites
  • Tighter integration with the DebugIT Ontology
  • Finalise semantic web integration
  • Security access based on roles
slide-19
SLIDE 19

Data Normalisation

  • CDR content automatically normalised
  • Terminologies used: SNOMED, NEWT, WHO-ATC, etc.
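Normalisation against a terminology can be pictured as a lookup from local drug names to a preferred term and code. The Python sketch below uses a small hand-written excerpt of WHO-ATC codes for three drugs that appear earlier in the slides. The synonym table and the function name are assumptions, and a real deployment would load the official index rather than hard-code entries.

```python
from typing import Dict, Optional, Tuple

# Hand-written excerpt; verify against the official ATC index before any real use.
ATC_CODES: Dict[str, str] = {
    "sulfamethoxazole and trimethoprim": "J01EE01",
    "ciprofloxacin": "J01MA02",
    "vancomycin": "J01XA01",
}

# Hypothetical local spellings (here the French names from the sample records).
SYNONYMS: Dict[str, str] = {
    "co-trimoxazole": "sulfamethoxazole and trimethoprim",
    "ciprofloxacine": "ciprofloxacin",
    "vancomycine": "vancomycin",
}


def normalise_drug(local_name: str) -> Optional[Tuple[str, str]]:
    """Return (preferred term, ATC code) for a local drug name, or None if unknown."""
    key = " ".join(local_name.lower().split())  # lower-case, collapse whitespace
    preferred = SYNONYMS.get(key, key)
    code = ATC_CODES.get(preferred)
    return (preferred, code) if code is not None else None


if __name__ == "__main__":
    for name in ("co-trimoxazole", "Ciprofloxacine", "unknown drug"):
        print(name, "->", normalise_drug(name))
```

Mapping "co-trimoxazole" to "sulfamethoxazole and trimethoprim" is what lets a single query filter on one drug name across sites that record it differently.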
slide-20
SLIDE 20

E. coli resistance pattern over time (monthly)

[Figure: the CDR federating the HUG and LiU data sources via database federation]