Douglas Teodoro, Emilie Pasche, Julien Gobeill, Patrick Ruch, Christian Lovis, Rémy Choquet, Christel Daniel
The project
The problem
- Data technical and semantic heterogeneity
– Different languages: French, German, Greek, Swedish, etc.
– Different types: RDBMS, free text, XML files
Drug | Quantity / Frequency
CICLOSPORINE (SANDIMMUN, NEORAL) | ;;;;;;;;;1;;;;;;;;;;;;;;;
MYCOPHENOLATE T 60 MINUTES | ;;;;;;;;1;;;;;;;;;;;;;;;;
PROTOCOLE:TEST STIMULATION SYNACTHENE 1H 3E TUBE /3 | ;;;;;;;;1;;;;;;;;;;;;;;;;
co-trimoxazole | lundi - mercredi - vendredi (3x/sem)
ciprofloxacine | 1x/sem (dimanche)
vancomycine | 1x18h
The problem
- Data privacy
– Very high concern
– Patient identity and other confidential items must not be revealed to unauthorized people by any means
- Political barriers
– External connection to ODBS
– Security risks
Our solution
- Clinical Data Repository (CDR): a distributed storage system that provides transparent access to heterogeneous data sources, with SQL and SPARQL query interfaces, result sets as SQL tuples or RDF, and assured patient privacy.
- Based on two visions:
1. Pragmatic: uses database federation, a well-established technology, to deliver data integration quickly to the other project components
2. Innovative: uses semantic web technology, a new approach that will be explored throughout the project
CDR::Architecture
- Database federation-based
[Diagram: query entry; sites HUG, LiU, INSERM, AVERBIS]
CDR::Architecture
- Semantic web-based
[Diagram: query entry; sites HUG, LiU, AVERBIS, INSERM]
CDR::Information Model
- Information Model: HL7-RIM based
– Other candidates: OpenEHR, EAV/CR, customized model
- The data stored in the CDR covers the following aspects:
- Patient information
- Pathogen-related information
- Object-related information
- Information on locations
- Operational data
CDR::Information Model
[Diagram: information model entities: Cultures, Patient data, Health care setting, Prescriptions, Pathogens, Antibiograms, Diseases, Adverse events]
CDR::Business Model
- Agents
– Responsible for the CDR
– CIS interoperability
– Data management within the CDR
– Communication with other DebugIT components
- Orders:
- DataExtraction
- DataNormalisation
- DataMigration
- DataDepersonalisation
- OntologyUpdate
[Diagram: agent order loop: Wait order, Receive order, Try to execute order; on success, Mark success; on fail, Mark failed]
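The order-handling loop in the diagram above can be sketched in a few lines. This is a minimal illustration, not the DebugIT agent code: the `Order` class, the `OrderStatus` values, `run_agent` and the handler functions are all hypothetical names, and only the order kinds come from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class OrderStatus(Enum):
    WAITING = "waiting"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class Order:
    kind: str  # e.g. "DataExtraction", "OntologyUpdate"
    status: OrderStatus = OrderStatus.WAITING


def run_agent(orders: List[Order], handlers: Dict[str, Callable[[], None]]) -> None:
    """Take each received order, try to execute it, and record the outcome."""
    for order in orders:                         # receive order
        try:
            handlers[order.kind]()               # try to execute order
        except Exception:
            order.status = OrderStatus.FAILED    # mark failed
        else:
            order.status = OrderStatus.SUCCESS   # mark success


def extract() -> None:
    """Hypothetical handler that succeeds."""


def normalise() -> None:
    """Hypothetical handler that simulates a failure."""
    raise RuntimeError("terminology service unavailable")


queue = [Order("DataExtraction"), Order("DataNormalisation"), Order("DataMigration")]
run_agent(queue, {"DataExtraction": extract, "DataNormalisation": normalise})
```

In this sketch an order with no registered handler (here `DataMigration`) is also marked as failed, so one bad order never stops the agent from processing the next one.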
CDR::Business Model
- Federated engine
– Based on MySQL Federated Engine
– Federate the distributed data sources
– Receive, create plan and execute SQL requests
- SPARQL Engine
– Based on D2R
– Transform the ER model into a semantic linked data model
– Receive, create plan and execute SPARQL requests
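To make the step from an ER model to linked data concrete, the sketch below turns one relational row into RDF triples in N-Triples syntax. It does not use D2R or its mapping language; the `http://example.org/cdr/` namespace, the `row_to_triples` function, the naming scheme for classes and properties, and the sample row values are assumptions chosen for illustration.

```python
from typing import Dict, List

BASE = "http://example.org/cdr/"  # hypothetical namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def row_to_triples(table: str, key: str, row: Dict[str, object]) -> List[str]:
    """Map a relational row to N-Triples: the row becomes a resource,
    the table becomes its class, and each column becomes a property."""
    subject = f"<{BASE}{table}/{row[key]}>"
    triples = [f"{subject} {RDF_TYPE} <{BASE}vocab/{table}> ."]
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is already in the subject URI; NULLs yield no triple
        predicate = f"<{BASE}vocab/{table}_{column}>"
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} {predicate} "{escaped}" .')
    return triples


# Sample row with made-up values, using table and column names from the query slide.
triples = row_to_triples("bacteria", "bacterium_id",
                         {"bacterium_id": 7, "name": "Escherichia coli"})
for line in triples:
    print(line)
```

Once rows are exposed this way, a SPARQL request can match on the class and property URIs instead of on table and column names.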
CDR::Data Privacy
- Security
– Sensitive data encrypted
– Mapping table: original term → encrypted term
– Original term kept only within the intranet
– Encrypted term exposed on the internet
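+-------------+-------------------+----------------------------------+
| Artefact ID | Original Artefact | Encrypted Artefact               |
+-------------+-------------------+----------------------------------+
| 1           | 10                | 001b98ab4335f1d3da23946bce9e4279 |
| 2           | 59                | 0109cfbecd89a3aaeeb92fde6420f29b |
| 3           | 39                | 010c1482764323fd479510ef6a8f5f48 |
+-------------+-------------------+----------------------------------+

+----------------------------------+-------------+-------------+
| Patient ID                       | Patient Age | Patient Sex |
+----------------------------------+-------------+-------------+
| 001b98ab4335f1d3da23946bce9e4279 | 58          | F           |
| 0109cfbecd89a3aaeeb92fde6420f29b | 38          | F           |
| 010c1482764323fd479510ef6a8f5f48 | 19          | M           |
+----------------------------------+-------------+-------------+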
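The mapping-table idea above can be illustrated as follows. The slides do not say which encryption scheme the CDR uses, so this sketch swaps in a keyed hash (HMAC-SHA-256, truncated to 32 hex characters to resemble the identifiers shown); the `ArtefactMapper` class, its method names and the key are hypothetical.

```python
import hashlib
import hmac
from typing import Dict


class ArtefactMapper:
    """Keeps the original <-> pseudonym mapping; meant to live only on the intranet."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        self._to_token: Dict[str, str] = {}
        self._to_original: Dict[str, str] = {}

    def pseudonymise(self, original: str) -> str:
        """Return a stable 32-hex-character token for an original identifier."""
        if original not in self._to_token:
            digest = hmac.new(self._key, original.encode("utf-8"), hashlib.sha256)
            token = digest.hexdigest()[:32]  # truncation is an assumption of this sketch
            self._to_token[original] = token
            self._to_original[token] = original
        return self._to_token[original]

    def resolve(self, token: str) -> str:
        """Look up the original identifier; only possible where the table is held."""
        return self._to_original[token]


mapper = ArtefactMapper(b"intranet-only-secret")  # hypothetical key
token = mapper.pseudonymise("10")

# Only the token, never the original identifier, goes into the exposed record.
exposed_row = {"patient_id": token, "patient_age": 58, "patient_sex": "F"}
```

Because the same input always yields the same token, records for one patient still join correctly on the exposed side, while resolving a token back to the original requires access to the intranet-side table.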
Our results
- The DebugIT CDR already has its first pilot
- SQL endpoints ready at HUG and LiU
– Data integration via database federation
– Based on MySQL Federated Engine
– SQL requests and SQL tuple result sets
- SPARQL endpoints set up at 3 demonstration centers: HUG, INSERM and LiU
– Data integration via ‘linked data’
– Based on D2R
– Transform the ER model into a semantically linked data model
– SPARQL requests and RDF result sets
    select cr.data_source data_source,
           cr.antibiotic_tested_result sensibility,
           count(cr.antibiotic_tested_result) value,
           date_format(c.result_date, '%Y') result_date
    from culture_results cr
    join culture c on cr.culture_id = c.culture_id
    join bacteria b on b.bacterium_id = cr.identified_bacteria_name
    join drug d on d.drug_id = cr.antibiotic_tested
    where
        b.name = 'Escherichia coli'
        and d.name = 'sulfamethoxazole and trimethoprim'
    group by 1,2,4
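As a rough illustration of what federation does for a query like the one above, the sketch below runs the same kind of aggregation at two sites and merges the rows, tagging each with its data source. It uses in-memory SQLite databases and a client-side merge rather than the MySQL Federated Engine, a single simplified table instead of the four joined ones, and made-up sample rows; `make_site` and `federated_counts` are hypothetical names.

```python
import sqlite3
from typing import Dict, List, Tuple


def make_site(rows: List[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory stand-in for one site's culture results."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table culture_results (antibiotic_tested_result text, result_date text)"
    )
    conn.executemany("insert into culture_results values (?, ?)", rows)
    return conn


# Simplified per-site aggregation: count results per sensibility and year.
SITE_QUERY = """
    select antibiotic_tested_result,
           count(*),
           strftime('%Y', result_date)
    from culture_results
    group by antibiotic_tested_result, strftime('%Y', result_date)
"""


def federated_counts(sites: Dict[str, sqlite3.Connection]) -> List[Tuple[str, str, int, str]]:
    """Run the query at every site and merge rows as (data_source, sensibility, value, year)."""
    merged = []
    for name, conn in sites.items():
        for sensibility, value, year in conn.execute(SITE_QUERY):
            merged.append((name, sensibility, value, year))
    return sorted(merged)


# Made-up sample rows, not data from the pilot.
sites = {
    "hug": make_site([("resistant", "2006-03-01"),
                      ("susceptible", "2006-05-12"),
                      ("susceptible", "2006-07-30")]),
    "liu": make_site([("resistant", "2007-01-15")]),
}
rows = federated_counts(sites)
```

With a federated storage engine the merge happens inside the database server, so the caller issues one SQL statement and never loops over sites explicitly.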
Database federation
+-------------+---------------+-------+-------------+
| data_source | sensibility   | value | result_date |
+-------------+---------------+-------+-------------+
| hug         | indeterminate |     2 | 2006        |
| hug         | resistant     |    72 | 2004        |
| hug         | resistant     |    71 | 2005        |
| hug         | resistant     |   112 | 2006        |
| hug         | resistant     |    94 | 2007        |
| hug         | resistant     |     8 | 2008        |
| hug         | susceptible   |   302 | 2004        |
| hug         | susceptible   |   318 | 2005        |
| hug         | susceptible   |   288 | 2006        |
| hug         | susceptible   |   269 | 2007        |
| hug         | susceptible   |     4 | 2008        |
| liu         | indeterminate |     1 | 2007        |
| liu         | resistant     |    10 | 2005        |
| liu         | resistant     |    21 | 2006        |
| liu         | resistant     |    30 | 2007        |
| liu         | resistant     |    46 | 2008        |
| liu         | susceptible   |   108 | 2005        |
| liu         | susceptible   |    90 | 2006        |
| liu         | susceptible   |   132 | 2007        |
| liu         | susceptible   |   100 | 2008        |
+-------------+---------------+-------+-------------+
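One way to read the result set above is as a resistance proportion per site and year. The sketch below copies the rows from the table and divides resistant counts by all tested isolates; counting indeterminate results in the denominator is an assumption of this sketch, and `resistance_rates` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (data_source, sensibility, value, result_date), copied from the result set.
ROWS: List[Tuple[str, str, int, str]] = [
    ("hug", "indeterminate", 2, "2006"),
    ("hug", "resistant", 72, "2004"), ("hug", "resistant", 71, "2005"),
    ("hug", "resistant", 112, "2006"), ("hug", "resistant", 94, "2007"),
    ("hug", "resistant", 8, "2008"),
    ("hug", "susceptible", 302, "2004"), ("hug", "susceptible", 318, "2005"),
    ("hug", "susceptible", 288, "2006"), ("hug", "susceptible", 269, "2007"),
    ("hug", "susceptible", 4, "2008"),
    ("liu", "indeterminate", 1, "2007"),
    ("liu", "resistant", 10, "2005"), ("liu", "resistant", 21, "2006"),
    ("liu", "resistant", 30, "2007"), ("liu", "resistant", 46, "2008"),
    ("liu", "susceptible", 108, "2005"), ("liu", "susceptible", 90, "2006"),
    ("liu", "susceptible", 132, "2007"), ("liu", "susceptible", 100, "2008"),
]


def resistance_rates(rows: List[Tuple[str, str, int, str]]) -> Dict[Tuple[str, str], float]:
    """Return resistant / total tested for each (data_source, year)."""
    resistant: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for source, sensibility, value, year in rows:
        total[(source, year)] += value
        if sensibility == "resistant":
            resistant[(source, year)] += value
    return {key: resistant[key] / total[key] for key in total}


rates = resistance_rates(ROWS)
for (source, year), rate in sorted(rates.items()):
    print(f"{source} {year}: {rate:.1%}")
```

Years with very few isolates, such as the HUG rows for 2008, give proportions based on a small sample and should be read with that in mind.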
[Diagram: CDR over HUG and LiU]
- Query execution time: 1 min 14.68 sec
Database federation
[Diagram: CDR over HUG and LiU]
Demonstration
- f CDR: query distributed between LiU and HUG
SPARQL endpoints
SPARQL data query service
Next steps
- Improve overall database performance
- Scale to more sites
- Tighter integration with the DebugIT Ontology
- Finalise semantic web integration
- Security access based on roles
Data Normalisation
- CDR content automatically normalised
- Terminologies used: SNOMED, NEWT, WHO-ATC, etc.
E. coli resistance pattern over time (monthly)
[Diagram: CDR over HUG and LiU]