
Formalizing mappings to optimize automated schema alignment: application to rare diseases
Meriem Maaroufi, Rémy Choquet, Paul Landais, Marie-Christine Jaulent
Paris, France
MIE2014, Istanbul, Monday, September 1, 2014
meriem.maaroufi@bndmr.fr


slide-1
SLIDE 1

Meriem Maaroufi, Rémy Choquet, Paul Landais, Marie-Christine Jaulent
Paris, France
MIE2014, Istanbul

Formalizing mappings to optimize automated schema alignment: application to rare diseases

slide-2
SLIDE 2

French Rare Disease Organization


French ministry of health

131 rare disease centers of excellence, > 1000 clinical sites

Decision making and patient identification

Towards heterogeneous, non-centralized data

Researchers

French rare disease registry

slide-3
SLIDE 3

Data integration process


slide-4
SLIDE 4

Hypothesis


Characterizing mappings with a complete formalization will improve the usability of alignment results (C1, C2, S)

slide-5
SLIDE 5

1st duality: Data element – Value element

  • Data element = the container
  • Characteristics:
      • Label, definition
      • Data type
      • Value domain, restrictions…
  • Notation: Ei (i=1..n; n=card(schema))
  • Value element = the content
  • An integer, a string, a Boolean value or an entry of a list
  • Depends on
  • Notation: eik (k=1..p; p=card(Ei))
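
The container/content duality can be sketched as two small Python classes. This is only an illustration under stated assumptions: the class names, the `accepts` check, and the choice to attach each value element to one data element are not from the slides, and the example entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union

Value = Union[int, str, bool]


@dataclass
class DataElement:
    """The container (Ei): label, definition, data type, value domain."""
    label: str
    data_type: str                       # e.g. "integer", "string", "boolean", "list"
    definition: str = ""
    value_domain: List[Value] = field(default_factory=list)  # empty list = no restriction

    def accepts(self, value: Value) -> bool:
        """True if the value is allowed by the value domain."""
        return not self.value_domain or value in self.value_domain


@dataclass
class ValueElement:
    """The content (eik): one value attached to a data element (assumed link)."""
    data_element: DataElement
    value: Value

    def __post_init__(self) -> None:
        # Reject values that fall outside the container's value domain.
        if not self.data_element.accepts(self.value):
            raise ValueError(
                f"{self.value!r} is outside the value domain of {self.data_element.label!r}"
            )


# Hypothetical list-typed data element with two entries.
confirmation_mode = DataElement(
    label="Confirmation mode",
    data_type="list",
    value_domain=["genetic", "cytogenetic"],
)
entry = ValueElement(confirmation_mode, "cytogenetic")
```

Here card(Ei) corresponds to `len(confirmation_mode.value_domain)` for a list-typed element.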


slide-6
SLIDE 6

2nd duality: Source element – Target element

  • A mapping is rarely a bijection.
  • This is often due to generalization/specialization.
  • A mapping has a direction: from source schema to target schema.

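[Figure: two examples. Generalization: "Multidisciplinary consultation" is a "Consultation", but "Consultation" is not a "Multidisciplinary consultation". Source and target notation (ESi, eSik, ETj, eTjl): source "Cytogenetic confirmation" with value "True", target "Confirmation mode" with value "Genetic".]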

slide-7
SLIDE 7

Conditional structure: rules


  • The If… then… formalism is supported by most programming languages
  • Well suited to bi-level mappings
  • Can define exact mappings and data transformations

[Figure: conditional structure with a Condition, True and False branches, and Instructions]

slide-8
SLIDE 8

Mapping formalization


Mapping = {ESi-ETj ; eSik-eTjl ; Rule(S->T)}

Example 1
  ESi – ETj: Glycemia – Hypoglycemic state
  eSik – eTjl: integer – true
  Rule: If glycemia < $threshold then hypoglycemic state = true

Example 2
  ESi – ETj: Act type – Participant profession
  eSik – eTjl: nurse intervention – nurse
  Rule: If Act type = nurse intervention then Participant profession = nurse

A rule defines the relation between the involved source and target data elements and value elements
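
The triple {ESi-ETj ; eSik-eTjl ; Rule} and its two examples can be sketched in Python as follows. The `Mapping` class, its field names, the `apply` method and the threshold value are all assumptions made for illustration. The slides only refer to `$threshold`, so the number used here is a placeholder with no clinical meaning.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Record = Dict[str, Any]


@dataclass
class Mapping:
    source_element: str                   # ESi
    target_element: str                   # ETj
    source_value: str                     # eSik (a value, or a type such as "integer")
    target_value: Any                     # eTjl
    condition: Callable[[Record], bool]   # the "if" part of Rule(S->T)

    def apply(self, source: Record, target: Record) -> Record:
        """Return a copy of target with ETj set when the rule condition holds."""
        result = dict(target)
        if self.source_element in source and self.condition(source):
            result[self.target_element] = self.target_value
        return result


THRESHOLD = 70  # placeholder for $threshold; not a value given in the slides

glycemia_rule = Mapping(
    "Glycemia", "Hypoglycemic state", "integer", True,
    condition=lambda r: r["Glycemia"] < THRESHOLD,
)
act_type_rule = Mapping(
    "Act type", "Participant profession", "nurse intervention", "nurse",
    condition=lambda r: r["Act type"] == "nurse intervention",
)

# Apply both rules to one hypothetical source record, as an ETL step might.
source_record = {"Glycemia": 55, "Act type": "nurse intervention"}
target_record: Record = {}
for rule in (glycemia_rule, act_type_rule):
    target_record = rule.apply(source_record, target_record)
```

Keeping the condition as a callable lets the same structure hold both an exact mapping (equality on a list entry) and a data transformation (a comparison on a numeric value).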

Result

slide-9
SLIDE 9

Application to BNDMR context

slide-10
SLIDE 10

Rules generation methodology

[Figure: three steps: Elements selection, Rules generation, Mappings detection]

A specific process:

  • is a workflow
  • that involves some chosen alignment approaches
  • operating in a given order
  • on selected data elements
  • to detect specific mappings: defined rules.
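
A process of this kind can be sketched as a small pipeline in Python: select elements, pair them, then run the chosen approaches in order. The function `run_process`, the tuple representation of elements, and the toy approach `shares_a_word` are hypothetical and do not reproduce the alignment approaches used by the authors.

```python
from typing import Callable, List, Tuple

Element = Tuple[str, str]                       # (label, data type)
Pair = Tuple[Element, Element]                  # (source element, target element)
Approach = Callable[[List[Pair]], List[Pair]]   # keeps or discards candidate pairs


def run_process(
    sources: List[Element],
    targets: List[Element],
    select_source: Callable[[Element], bool],
    select_target: Callable[[Element], bool],
    approaches: List[Approach],
) -> List[Pair]:
    # Elements selection: keep only the elements this process is meant for.
    chosen_sources = [e for e in sources if select_source(e)]
    chosen_targets = [e for e in targets if select_target(e)]
    candidates = [(s, t) for s in chosen_sources for t in chosen_targets]
    # Alignment approaches are applied in the given order.
    for approach in approaches:
        candidates = approach(candidates)
    return candidates


def shares_a_word(pairs: List[Pair]) -> List[Pair]:
    """Toy approach: keep pairs whose labels have at least one word in common."""
    kept = []
    for source, target in pairs:
        if set(source[0].lower().split()) & set(target[0].lower().split()):
            kept.append((source, target))
    return kept


# A hypothetical bool-list process: boolean sources against list targets.
detected = run_process(
    sources=[("Cytogenetic confirmation", "boolean"), ("Act type", "list")],
    targets=[("Confirmation mode", "list"), ("Propositus", "boolean")],
    select_source=lambda e: e[1] == "boolean",
    select_target=lambda e: e[1] == "list",
    approaches=[shares_a_word],
)
```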

slide-11
SLIDE 11

Tools & experimentation


Source: CEMARA extract schema: 73 ESi; 43 boolean ESi; 56 eSik (from lists)
Target: BNDMR schema: 68 ETj; 16 boolean ETj; 106 eTjl (from lists)

Linguistic approach, example scores:

  • Incharge doctor – Care provider: 0.82
  • Activity context – Encounter type: 0.88

Evaluation: reference test, experimentation
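
The slides do not say which linguistic measure produced the scores listed for the linguistic approach. As a stand-in, the sketch below scores label pairs with `difflib.SequenceMatcher` from the Python standard library, which is a purely character-based measure and will not reproduce those scores. The function names and the default threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def label_similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(
    source_label: str,
    target_labels: List[str],
    threshold: float = 0.5,   # hypothetical cut-off
) -> Optional[Tuple[str, float]]:
    """Return the best-scoring target label, or None if nothing reaches the threshold."""
    scored = [(t, label_similarity(source_label, t)) for t in target_labels]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


match = best_match("Encounter types", ["Care provider", "Encounter type"])
```

A measure that captures synonymy (for example, one backed by a terminology) would be needed to relate labels such as "Incharge doctor" and "Care provider", which share few characters.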

slide-12
SLIDE 12

Process example


Linguistic approach

Source (CEMARA): 43 boolean DE, 56 list VE
Target (BNDMR): 16 boolean DE, 106 list VE

Rule templates:

  • If ESi = true then ETj = eTjl
  • If ESi = true then ETj = true
  • If ESi = eSik then ETj = true
  • If ESi = eSik then ETj = eTjl

Mapping type   Reference test   Experimentation
bool-bool      3                3
list-list      6 (DE-DE)        35
bool-list      1 (DE-DE)        22
list-bool      1

[Figure: three steps: Elements selection, Rules generation, Mappings detection]

Example rules:

  • If PropLink = propositus [source] then Propositus = true [target]
  • If ConfCyto = true [source] then ConfirmationMode = cytogenetic [target]
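
The four rule templates on this slide differ only in whether the source and target elements are boolean or list-typed. A minimal Python sketch of generating the rule text from that type pair follows. The function `generate_rule` and its parameters are hypothetical, and the slides do not describe how rules are actually produced.

```python
from typing import Optional


def generate_rule(
    source_type: str,
    target_type: str,
    source_element: str,                 # ESi
    target_element: str,                 # ETj
    source_value: Optional[str] = None,  # eSik, needed for a list source
    target_value: Optional[str] = None,  # eTjl, needed for a list target
) -> str:
    """Build the rule text for one of the bool/list type combinations."""
    if source_type == "boolean":
        condition = f"{source_element} = true"
    elif source_type == "list" and source_value is not None:
        condition = f"{source_element} = {source_value}"
    else:
        raise ValueError("unsupported source type, or missing source value")

    if target_type == "boolean":
        instruction = f"{target_element} = true"
    elif target_type == "list" and target_value is not None:
        instruction = f"{target_element} = {target_value}"
    else:
        raise ValueError("unsupported target type, or missing target value")

    return f"If {condition} then {instruction}"


list_bool = generate_rule(
    "list", "boolean", "PropLink", "Propositus", source_value="propositus"
)
bool_list = generate_rule(
    "boolean", "list", "ConfCyto", "ConfirmationMode", target_value="cytogenetic"
)
```

With these inputs, `list_bool` and `bool_list` have the same form as the two example rules on the slide.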

slide-13
SLIDE 13

Conclusion

  • The proposed formalization “mapping = {ESi-ETj ; eSik-eTjl ; Rule}” is well suited to characterizing both simple and complex mappings.
  • Mappings characterized by the proposed formalization can be used directly in data integration processes (e.g. ETL).
  • The processes used to detect mappings differ depending on the input data types.


slide-14
SLIDE 14

Perspectives


  • More specific processes to cover more data types
  • Automating rules generation

slide-15
SLIDE 15

Thank you for your attention!

Special thanks to:

  • BNDMR team
  • INSERM UMR-1142 team LIMICS

meriem.maaroufi@bndmr.fr