From Nancy, France to Pisa, Italy: Ontology-guided Data Preparation
Adrien Coulet, Malika Smaïl-Tabbone, Pascale Benlian, Amedeo Napoli and Marie-Dominique Devignes
Laboratoire Lorrain de Recherche en Informatique et ses Applications (CNRS, INRIA, University of Nancy), Nancy, France
Ontology-guided Data Preparation for Discovering Genotype-Phenotype Relationships
3/5
- A. Coulet, Ontology-guided Data Preparation
The Problem: Limits to KDD in life sciences
Results of KDD in biology are complex
[Figure: Knowledge Discovery in Databases (KDD) process, in three panels: COMPLEX DATA, COMPLEX PROCESS, COMPLEX RESULTS]
Biological results: e.g. a large-scale clinical study
KDD process: Databases -> Integration -> Integrated data -> Selection -> Selected data -> Formatting -> Formatted data -> Data mining -> Pattern -> Interpretation
Proposal: Use ontologies to guide the KDD process
1) Build bridges between data and knowledge
Mapping between assertions of the knowledge base (KB) and attributes of the database (DB)
Example:
2) Use knowledge to reduce the size of the data set
Using subsumption, object properties, class definitions, etc., in order to simplify the interpretation step of the KDD process
[Figure: a large-scale clinical study table linked to details of the SNP-Ontology and the SNP-KB]
Large-scale clinical study: rows patient_001, patient_002, patient_003, patient_004, …; columns rs_001, rs_002, rs_003, rs_004, rs_005, rs_006, rs_007, …, [LDL]b, xanthoma, …
SNP-Ontology (detail): variant, coding_variant, non_coding_variant
SNP-KB (detail): rs_003, rs_004, rs_005
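Points 1) and 2) above can be illustrated with a small sketch. It is not the authors' implementation: the class assignments of the rs_ identifiers, the 0/1 encoding, the patient values, and the function names `select_columns` and `aggregate_columns` are all assumptions made for illustration. It maps table columns to KB assertions, then uses subsumption either to keep only the columns under a chosen class or to merge columns into one column per class.

```python
# Toy sketch: every name, class assignment, and value here is hypothetical.

# Ontology fragment: class -> direct superclass (subsumption).
SUBCLASS_OF = {
    "coding_variant": "variant",
    "non_coding_variant": "variant",
}

# KB assertions: individual -> asserted class (assignments are made up).
# The individual names double as DB column names, which is the "bridge"
# between attributes of the DB and assertions of the KB.
INSTANCE_OF = {
    "rs_001": "non_coding_variant",
    "rs_002": "non_coding_variant",
    "rs_003": "coding_variant",
    "rs_004": "coding_variant",
    "rs_005": "coding_variant",
}


def is_subsumed_by(cls, target):
    """Return True if cls equals target or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def select_columns(row, target_class):
    """Keep variant columns whose KB class is subsumed by target_class.

    Columns with no KB assertion (e.g. phenotype columns) are kept as-is.
    """
    selected = {}
    for column, value in row.items():
        cls = INSTANCE_OF.get(column)
        if cls is None or is_subsumed_by(cls, target_class):
            selected[column] = value
    return selected


def aggregate_columns(row):
    """Merge variant columns into one column per asserted class.

    The class column is 1 if at least one variant of that class is 1.
    """
    aggregated = {}
    for column, value in row.items():
        cls = INSTANCE_OF.get(column)
        if cls is None:
            aggregated[column] = value
        else:
            aggregated[cls] = max(aggregated.get(cls, 0), value)
    return aggregated


# One made-up patient row: 1 = variant observed / phenotype present.
PATIENT_001 = {
    "rs_001": 0, "rs_002": 1, "rs_003": 1,
    "rs_004": 0, "rs_005": 0, "xanthoma": 1,
}

SELECTED = select_columns(PATIENT_001, "coding_variant")
AGGREGATED = aggregate_columns(PATIENT_001)
```

Both reductions shrink the number of attributes handed to the data mining step. Which of them (if either) corresponds to the reduction used in the presented work is not stated on the slide.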
For more details, see you at the poster: Poster n°7
Contact: adrien.coulet@loria.fr