
Formalizing mappings to optimize automated schema alignment: application to rare diseases

MIE2014, Istanbul. Formalizing mappings to optimize automated schema alignment: application to rare diseases. Meriem Maaroufi, Rémy Choquet, Paul Landais, Marie-Christine Jaulent. Paris, France. meriem.maaroufi@bndmr.fr. Monday, September 1, 2014.


  1. MIE2014, Istanbul. Formalizing mappings to optimize automated schema alignment: application to rare diseases. Meriem Maaroufi, Rémy Choquet, Paul Landais, Marie-Christine Jaulent. Paris, France.

  2. French Rare Disease Organization
     ● French rare disease registry
     ● French ministry of health: decision making and patients identification
     ● > 1000 clinical sites
     ● 131 rare diseases excellence centers
     ● Researchers
     ● Towards heterogeneous and not centralized data

  3. Data integration process

  4. Hypothesis: Characterizing mappings in a complete (C1, C2, S) formalization will improve the usability of alignment results.

  5. 1st duality: Data element – Value element
     ● Data element = the container. Characteristics: label, definition; data type; value domain, restrictions… Notation: Ei (i = 1..n; n = card(schema))
     ● Value element = the content. An integer, a string, a Boolean value or an entry of a list. Depends on … Notation: eik (k = 1..p; p = card(Ei))

  6. 2nd duality: Source element – Target element
     • A mapping is rarely a bijection.
     • This is often due to generalization/specialization: a multidisciplinary consultation is a consultation, but the converse does not hold.
     • A mapping has a direction: from source schema to target schema.
     • Example: ESi = Cytogenetic confirmation, eSik = True; ETj = Confirmation mode, eTjl = Genetic.

  7. Conditional structure: rules
     • The If… then… formalism is supported by most programming languages.
     • It is well suited to bi-level mappings (data element level and value element level).
     • It can define exact mappings and data transformations.
     (Flowchart labels: Condition, True, False, Instructions.)

  8. Result: mapping formalization
     Mapping = {ESi - ETj ; eSik - eTjl ; Rule(S->T)}
     A rule defines the relation between the source and target data elements and value elements involved.
     • ESi - ETj: Glycemia - Hypoglycemic state; eSik - eTjl: integer - true; Rule: If glycemia < $threshold then hypoglycemic state = true
     • ESi - ETj: Act type - Participant profession; eSik - eTjl: nurse intervention - nurse; Rule: If Act type = nurse intervention then Participant profession = nurse
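The triple on slide 8 can be read as a small data structure: a source element, a target element, and a rule linking their values. The following is a minimal sketch of that reading, not the authors' implementation. The class name `Mapping`, the field names, the record layout, and the threshold value are all assumptions made for illustration; the slide only gives `$threshold` as a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Record = Dict[str, Any]


@dataclass(frozen=True)
class Mapping:
    """Hypothetical container for {ESi - ETj ; eSik - eTjl ; Rule(S->T)}."""
    source_element: str               # ESi
    target_element: str               # ETj
    condition: Callable[[Any], bool]  # test on the source value eSik
    target_value: Any                 # eTjl, assigned when the condition holds

    def apply(self, source: Record, target: Record) -> Record:
        """Return a copy of target with ETj set if the rule fires."""
        result = dict(target)
        if self.source_element in source and self.condition(source[self.source_element]):
            result[self.target_element] = self.target_value
        return result


THRESHOLD = 70  # hypothetical value; the slide leaves $threshold unspecified

glycemia_rule = Mapping(
    "glycemia", "hypoglycemic_state", lambda v: v < THRESHOLD, True
)
nurse_rule = Mapping(
    "act_type", "participant_profession",
    lambda v: v == "nurse intervention", "nurse",
)

source_record = {"glycemia": 55, "act_type": "nurse intervention"}
target_record: Record = {}
for mapping in (glycemia_rule, nurse_rule):
    target_record = mapping.apply(source_record, target_record)

print(target_record)
```

Keeping the condition as a callable lets the same structure hold both an exact mapping (equality on a list entry) and a data transformation (a numeric comparison), which matches the two examples on the slide.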

  9. Application to BNDMR context

  10. Rules generation methodology
      A specific process:
      - is a workflow
      - that involves some chosen alignment approaches
      - operating in a given order
      - on selected data elements
      - to detect specific mappings: defined rules.
      Workflow: Elements selection → Mappings detection → Rules generation
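The three workflow steps on slide 10 can be sketched as a small pipeline. This is an illustration under stated assumptions, not the process used in the talk: the `Process` class, the way approaches are chained (each one filters the candidate pairs left by the previous one), the `same_label` matcher, and the element names `Deceased` and `BirthDate` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Element = Tuple[str, str]  # (element name, data type)


@dataclass
class Process:
    """Hypothetical workflow: selection, ordered detection, rule generation."""
    select: Callable[[Element], bool]             # elements selection
    approaches: List[Callable[[str, str], bool]]  # alignment approaches, in order
    make_rule: Callable[[str, str], str]          # rules generation

    def run(self, source: List[Element], target: List[Element]) -> List[str]:
        # Step 1: keep only the data elements this process is meant for.
        src = [name for name, dtype in source if self.select((name, dtype))]
        tgt = [name for name, dtype in target if self.select((name, dtype))]
        # Step 2: each approach narrows the candidate pairs, in the given order.
        candidates = [(s, t) for s in src for t in tgt]
        for approach in self.approaches:
            candidates = [(s, t) for s, t in candidates if approach(s, t)]
        # Step 3: turn every surviving pair into a rule.
        return [self.make_rule(s, t) for s, t in candidates]


def same_label(source_name: str, target_name: str) -> bool:
    """Toy stand-in for an alignment approach: case-insensitive label equality."""
    return source_name.lower() == target_name.lower()


source_schema = [("Deceased", "bool"), ("BirthDate", "date")]   # hypothetical
target_schema = [("deceased", "bool"), ("Propositus", "bool")]

bool_process = Process(
    select=lambda element: element[1] == "bool",
    approaches=[same_label],
    make_rule=lambda s, t: f"If {s}=true then {t}=true",
)
rules = bool_process.run(source_schema, target_schema)
print(rules)
```

A separate `Process` instance per data-type pairing keeps each workflow narrow, in line with the slide's point that a process targets specific mappings.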

  11. Tools & experimentation
      ● Schemas: CEMARA extract (source), 73 ESi; BNDMR (target), 68 ETj
      ● Reference test: linguistic approach. Example scores: In-charge doctor – Care provider: 0.82; Activity context – Encounter type: 0.88
      ● Experimentation: 43 boolean ESi and 56 eSik (from lists) in the source; 16 boolean ETj and 106 eTjl (from lists) in the target

  12. Process example
      Source: CEMARA (43 boolean DE, 56 lists VE). Target: BNDMR (16 boolean DE, 106 lists VE).
      Rule templates, with the number of mappings found in the reference test and in the experimentation:
      ● bool-bool: If ESi=true then ETj=true. Reference test: 3; experimentation: 3
      ● list-list: If ESi=eSik then ETj=eTjl. Reference test: 6 (DE-DE); experimentation: 35
      ● bool-list: If ESi=true then ETj=eTjl. Reference test: 1 (DE-DE); experimentation: 22
      ● list-bool: If ESi=eSik then ETj=true. Reference test: 0; experimentation: 1
      Examples:
      ● If PropLink=propositus [source] then Propositus=true [target]
      ● If ConfCyto=true [source] then ConfirmationMode=cytogenetic [target]
      Workflow: Elements selection → Mappings detection (linguistic approach) → Rules generation
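The four rule templates on slide 12 differ only in whether each side is a boolean data element or a list entry. A minimal sketch of generating the rule text from a detected mapping follows; the function names `rule_type` and `generate_rule` and the convention that a missing value element means "boolean" are assumptions made for illustration, not part of the talk.

```python
from typing import Optional


def rule_type(source_ve: Optional[str], target_ve: Optional[str]) -> str:
    """Classify a mapping as bool-bool, list-list, bool-list or list-bool.

    Assumption: a side with no value element is a boolean data element;
    a side with a value element is an entry of a list.
    """
    src = "list" if source_ve is not None else "bool"
    tgt = "list" if target_ve is not None else "bool"
    return f"{src}-{tgt}"


def generate_rule(
    source_de: str,
    target_de: str,
    source_ve: Optional[str] = None,
    target_ve: Optional[str] = None,
) -> str:
    """Fill the 'If ESi=... then ETj=...' template for the given elements."""
    left = source_ve if source_ve is not None else "true"
    right = target_ve if target_ve is not None else "true"
    return f"If {source_de}={left} then {target_de}={right}"


# The two examples given on the slide.
list_bool = generate_rule("PropLink", "Propositus", source_ve="propositus")
bool_list = generate_rule("ConfCyto", "ConfirmationMode", target_ve="cytogenetic")
print(rule_type("propositus", None), "->", list_bool)
print(rule_type(None, "cytogenetic"), "->", bool_list)
```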

  13. Conclusion
      • The proposed formalization "mapping = {ESi-ETj ; eSik-eTjl ; Rule}" is well suited to characterizing both simple and complex mappings.
      • Mappings characterized by the proposed formalization can be used directly in data integration processes (e.g. ETL).
      • Depending on the input data types, the processes for mapping detection will differ.

  14. Perspectives
      • More specific processes to cover more data types
      • Automating rules generation

  15. Thank you for your attention! Special thanks to the BNDMR team and the INSERM UMR-1142 (LIMICS) team. Contact: meriem.maaroufi@bndmr.fr
