Querying a bioinformatic data sources registry with concept lattices


Querying a bioinformatic data sources registry with concept lattices. Nizar Messai, Marie-Dominique Devignes, Amedeo Napoli and Malika Smail-Tabbone. Nizar.Messai@loria.fr. LORIA – UMR 7503 – BP 239, 54506 Vandoeuvre-lès-Nancy. ICCS


slide-1
SLIDE 1

Querying a bioinformatic data sources registry with concept lattices

Nizar Messai, Marie-Dominique Devignes, Amedeo Napoli and Malika Smail-Tabbone

Nizar.Messai@loria.fr

LORIA – UMR 7503 – BP 239, 54506 Vandoeuvre-lès-Nancy

ICCS 2005, Kassel – July 18-22, 2005

Querying a bioinformatic data sources registry with concept lattices – p.1/21

slide-2
SLIDE 2

Outline

  • 1. Motivation
  • 2. BioRegistry: data source metadata repository
  • 3. FCA for classifying and querying data sources
  • 4. Ontology-based query refinement
  • 5. Conclusion and future work


slide-3
SLIDE 3

Outline

  • 1. Motivation

      1.1 Bioinformatic data sources on the web
      1.2 Existing solutions
      1.3 Challenge

  • 2. BioRegistry: data source metadata repository
  • 3. FCA for classifying and querying data sources
  • 4. Ontology-based query refinement
  • 5. Conclusion and future work


slide-4
SLIDE 4

1.1 Bioinformatic data sources on the web

Bioinformatic data sources available on the Web

  • 719 in 2005 (171 more than in 2004)
  • Diversity of contents (e.g. particular/any organism(s))
  • Different data types (e.g. nucleic/proteic sequences)
  • Different data qualities (e.g. update, revision, annotation)
  • New data sources keep appearing



slide-6
SLIDE 6

1.2 Existing solutions

Thematic portals

  • Access to a collection of selected data sources
  • Correspond to given points of view
  • Limited search capabilities

Structured catalogs

  • Bioinformatic data source catalog: DBcat
  • Small set of "free text" metadata
  • No longer maintained (since 2001)


slide-7
SLIDE 7

1.3 Challenge

Improve data source identification through:

  • gathering metadata in a structured repository
  • taking into account existing domain ontologies
  • organising data sources for browsing and querying


slide-8
SLIDE 8

Outline

  • 1. Motivation
  • 2. BioRegistry: data source metadata repository

      2.1 BioRegistry model
      2.2 A subpart of the BioRegistry

  • 3. FCA for classifying and querying data sources
  • 4. Ontology-based query refinement
  • 5. Conclusion and future work



slide-12
SLIDE 12

2.1 BioRegistry model

BioRegistry

  • Associate metadata with the data sources (from ontologies)

Idea

  • Extract properties of the data sources from these metadata
  • A formal context: data sources × properties



slide-14
SLIDE 14

2.2 A subpart of the BioRegistry

Data source properties extracted from the BioRegistry

Data Source     Sequence                    Organism            Manual Revision
Swissprot (S1)  Proteic (PS)                Any Organism (AO)   Yes
RefSeq (S2)     Nucleic (NS), Proteic (PS)  Any Organism (AO)   Yes
TIGR-HGI (S3)   Nucleic (NS)                Human (Hu)          No
GPCRDB (S4)     Proteic (PS)                Any Organism (AO)   Yes
HUGE (S5)       Nucleic (NS), Proteic (PS)  Human (Hu)          No
ENSEMBL (S6)    Nucleic (NS)                Animal (An)         No
MGDB (S7)       Proteic (PS)                Mouse (Mo)          No
VGB (S8)        Nucleic (NS)                Vertebrate (Ve)     No


slide-15
SLIDE 15

2.2 A subpart of the BioRegistry

Ontologies used to assign values to the properties (from NCBI)


slide-16
SLIDE 16

2.2 A subpart of the BioRegistry

Corresponding formal context

Sources × Metadata

      NS  PS  AO  An  Ve  Hu  Mo  MR
S1         1   1                   1
S2     1   1   1                   1
S3     1                   1
S4         1   1                   1
S5     1   1               1
S6     1           1
S7         1                   1
S8     1               1
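The formal context can be written down directly as a mapping from data sources to property sets. The sketch below is only an illustration: the names `CONTEXT`, `common_properties` and `sources_having` are assumptions, not part of the BioRegistry. The two functions are the usual FCA derivation operators.

```python
# Formal context from the table above: data source -> set of properties.
CONTEXT = {
    "S1": {"PS", "AO", "MR"},
    "S2": {"NS", "PS", "AO", "MR"},
    "S3": {"NS", "Hu"},
    "S4": {"PS", "AO", "MR"},
    "S5": {"NS", "PS", "Hu"},
    "S6": {"NS", "An"},
    "S7": {"PS", "Mo"},
    "S8": {"NS", "Ve"},
}


def common_properties(sources, context=CONTEXT):
    """Properties shared by every source in `sources` (all properties if empty)."""
    shared = set().union(*context.values())
    for source in sources:
        shared &= context[source]
    return shared


def sources_having(properties, context=CONTEXT):
    """Sources that carry every property in `properties`."""
    wanted = set(properties)
    return {source for source, props in context.items() if wanted <= props}


print(sorted(sources_having({"NS", "Hu"})))     # ['S3', 'S5']
print(sorted(common_properties({"S1", "S4"})))  # ['AO', 'MR', 'PS']
```

A pair (extent, intent) is a formal concept when each side is the image of the other under these two functions, e.g. ({S3, S5}, {NS, Hu}).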


slide-17
SLIDE 17

Outline

  • 1. Motivation
  • 2. BioRegistry: data source metadata repository
  • 3. FCA for classifying and querying data sources

      3.1 Methodology
      3.2 Data source classification
      3.3 Query
      3.4 Data source retrieval algorithm
      3.5 Problem

  • 4. Ontology-based query refinement
  • 5. Conclusion and future work


slide-18
SLIDE 18

3.1 Methodology



slide-21
SLIDE 21

3.2 Data source classification

Incremental construction of the concept lattice [Godin et al. 1995]

  • Add new data sources (Registry updating)
  • Insert queries (Registry querying)
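To make the classification concrete, the concepts of the small BioRegistry context can be enumerated by brute force. This is a minimal sketch for tiny contexts, not the incremental algorithm of Godin et al.; `CONTEXT` and `all_concepts` are illustrative names. It relies on the fact that every concept intent is an intersection of object intents (or the full property set).

```python
# Formal context from section 2.2: data source -> set of properties.
CONTEXT = {
    "S1": {"PS", "AO", "MR"},
    "S2": {"NS", "PS", "AO", "MR"},
    "S3": {"NS", "Hu"},
    "S4": {"PS", "AO", "MR"},
    "S5": {"NS", "PS", "Hu"},
    "S6": {"NS", "An"},
    "S7": {"PS", "Mo"},
    "S8": {"NS", "Ve"},
}


def all_concepts(context):
    """Return every formal concept as an (extent, intent) pair of frozensets."""
    all_props = frozenset().union(*context.values())
    # Close the set of intents under intersection with each object's properties.
    intents = {all_props}
    for props in context.values():
        intents |= {intent & props for intent in intents}
    return {
        (frozenset(s for s, props in context.items() if intent <= props), intent)
        for intent in intents
    }


# List concepts from the most general (empty intent) to the most specific.
for extent, intent in sorted(all_concepts(CONTEXT),
                             key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

An incremental algorithm avoids recomputing everything when a source or a query is added; the brute-force version is only meant to show what the lattice nodes are.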


slide-22
SLIDE 22

3.3 Query

A query is a set of properties. Example:

"Data sources, that are manually revised, containing nucleic sequences of Human organism"

  • nucleic sequences (NS)
  • human organism (Hu)
  • manually revised (MR)

Transform the query into a concept with extent {Query} and intent {nucleic sequences (NS), Human (Hu), Manual Revision (MR)}:

Q = (extent, intent) = ({Query}, {NS, Hu, MR})
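One way to picture the query concept is to treat the query as an extra object of the context. The sketch below assumes that reading; `insert_query` and `query_concept` are hypothetical helper names.

```python
# Formal context from section 2.2: data source -> set of properties.
CONTEXT = {
    "S1": {"PS", "AO", "MR"},
    "S2": {"NS", "PS", "AO", "MR"},
    "S3": {"NS", "Hu"},
    "S4": {"PS", "AO", "MR"},
    "S5": {"NS", "PS", "Hu"},
    "S6": {"NS", "An"},
    "S7": {"PS", "Mo"},
    "S8": {"NS", "Ve"},
}


def insert_query(context, query_properties, name="Query"):
    """Return a copy of the context with the query added as one more object."""
    extended = {obj: set(props) for obj, props in context.items()}
    extended[name] = set(query_properties)
    return extended


def query_concept(extended, name="Query"):
    """Concept generated by the query object: (extent, intent)."""
    intent = extended[name]
    extent = {obj for obj, props in extended.items() if intent <= props}
    return extent, intent


extent, intent = query_concept(insert_query(CONTEXT, {"NS", "Hu", "MR"}))
print(sorted(extent), sorted(intent))  # ['Query'] ['Hu', 'MR', 'NS']
```

The extent contains only the query here because no source has all three properties at once.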



slide-24
SLIDE 24

3.4 Data source retrieval algorithm

  • Insert the query concept into the concept lattice [Carpineto 2000]
  • Search relevant data sources: a data source is relevant to a query if it shares at least one of its properties


slide-25
SLIDE 25

3.4 Data source retrieval algorithm

Step 0: Locate the new query concept in the resulting lattice

Begin the result construction: Result = Ø


slide-26
SLIDE 26

3.4 Data source retrieval algorithm

Step 1: Get the query concept subsumers and continue the result construction

Result =
  1) S3, S5 (Hu, NS); S2 (NS, MR)


slide-27
SLIDE 27

3.4 Data source retrieval algorithm

Step 2: Get the subsumers of the concepts reached in step 1

Result =
  1) S3, S5 (Hu, NS); S2 (NS, MR)
  2) S1, S4 (MR); S6 (NS)


slide-28
SLIDE 28

3.4 Data source retrieval algorithm

Step 3: A concept with an empty intension is reached

  • end of the algorithm: return Result
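Steps 0 to 3 can be sketched as a level-by-level walk from the query concept towards the top of the lattice. This is an illustrative reading, not the insertion algorithm of [Carpineto 2000]: instead of building the whole lattice it only computes the intents lying above the query concept. `CONTEXT` and `retrieve` are assumed names.

```python
# Formal context from section 2.2: data source -> set of properties.
CONTEXT = {
    "S1": {"PS", "AO", "MR"},
    "S2": {"NS", "PS", "AO", "MR"},
    "S3": {"NS", "Hu"},
    "S4": {"PS", "AO", "MR"},
    "S5": {"NS", "PS", "Hu"},
    "S6": {"NS", "An"},
    "S7": {"PS", "Mo"},
    "S8": {"NS", "Ve"},
}


def retrieve(context, query):
    """Rank sources by the number of lattice steps between them and the query."""
    query = frozenset(query)
    # Intents of the concepts above the query concept: closed subsets of the query.
    intents = {query}
    for props in context.values():
        intents |= {intent & props for intent in intents}

    def extent(intent):
        return {s for s, props in context.items() if intent <= props}

    def upper_covers(intent):
        smaller = [i for i in intents if i < intent]
        return {i for i in smaller if not any(i < j for j in smaller)}

    ranked, seen = [], extent(query)          # step 0: exact matches, if any
    if seen:
        ranked.append((0, sorted(seen), sorted(query)))
    frontier, visited, rank = {query}, {query}, 0
    while frontier:
        rank += 1
        frontier = {c for i in frontier for c in upper_covers(i)} - visited
        visited |= frontier
        for intent in sorted(frontier, key=sorted):
            if not intent:                    # empty intension: nothing shared
                continue
            new = extent(intent) - seen
            if new:
                ranked.append((rank, sorted(new), sorted(intent)))
                seen |= new
    return ranked


for rank, sources, shared in retrieve(CONTEXT, {"NS", "Hu", "MR"}):
    print(rank, sources, shared)
# 1 ['S3', 'S5'] ['Hu', 'NS']
# 1 ['S2'] ['MR', 'NS']
# 2 ['S1', 'S4'] ['MR']
# 2 ['S6', 'S8'] ['NS']
```

On the context as tabulated in section 2.2, S8 also carries NS, so the sketch returns it alongside S6 at rank 2. S7 shares no property with the query and is never returned.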



slide-30
SLIDE 30

3.5 Problem

When query properties are not in the context

Examples:

1 - Q = ({Query}, {Chicken (Ch)})
    Result = Ø, although data sources dealing with vertebrates can be interesting

2 - Q = ({Query}, {Eucaryote (Eu)})
    Result = Ø, although data sources dealing with animals can be interesting

Idea: Ontology-based query refinement


slide-31
SLIDE 31

Outline

  • 1. Motivation
  • 2. BioRegistry: data source metadata repository
  • 3. FCA for classifying and querying data sources
  • 4. Ontology-based query refinement

      4.1 Generalisation refinement
      4.2 Specialisation refinement

  • 5. Conclusion and future work



slide-33
SLIDE 33

4.1 Generalisation refinement

Generalisation refinement

  • Add to the query the ancestors of the considered property in the ontology
  • Only those that are in the formal context


slide-34
SLIDE 34

4.1 Generalisation refinement

Refined query:

Q = ({Query}, {Ve, An, AO})

New result:

Result =
  1) S6 (An)
  1) S8 (Ve)
  1) S1, S2, S4 (AO)
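The refinement of the Chicken query can be sketched as follows. The `PARENT` hierarchy is a hypothetical, simplified stand-in for the organism ontology, chosen only so that it is consistent with the refined queries shown in this section. The sketch also assumes that only properties missing from the context are replaced by their ancestors.

```python
# Hypothetical, simplified is-a hierarchy (child -> parent); an assumption.
PARENT = {"Ch": "Ve", "Hu": "Ve", "Mo": "Ve", "Ve": "An", "An": "Eu", "Eu": "AO"}

# Properties that occur in the formal context of section 2.2.
CONTEXT_PROPERTIES = {"NS", "PS", "AO", "An", "Ve", "Hu", "Mo", "MR"}


def ancestors(term):
    """All ancestors of `term`, from nearest to most general."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def generalise(query, known=CONTEXT_PROPERTIES):
    """Replace properties absent from the context by their ancestors that are present."""
    refined = set()
    for prop in query:
        if prop in known:
            refined.add(prop)
        else:
            refined.update(a for a in ancestors(prop) if a in known)
    return refined


print(sorted(generalise({"Ch"})))  # ['AO', 'An', 'Ve']
```

Eucaryote (Eu) is an ancestor of Chicken in this hierarchy but is dropped from the refined query because it does not occur in the formal context.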



slide-36
SLIDE 36

4.2 Specialisation refinement

Specialisation refinement

  • Add to the query the descendants of the considered property in the ontology
  • Only those that are in the formal context


slide-37
SLIDE 37

4.2 Specialisation refinement

Refined query:

Q = ({Query}, {An, Ve, Hu, Mo})

New result:

Result =
  1) S6 (An)
  1) S8 (Ve)
  1) S5 (Hu)
  1) S7 (Mo)
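The refinement of the Eucaryote query works in the opposite direction. As before, `PARENT` is a hypothetical, simplified hierarchy assumed for illustration, and only properties missing from the context are replaced.

```python
# Hypothetical, simplified is-a hierarchy (child -> parent); an assumption.
PARENT = {"Ch": "Ve", "Hu": "Ve", "Mo": "Ve", "Ve": "An", "An": "Eu", "Eu": "AO"}

# Properties that occur in the formal context of section 2.2.
CONTEXT_PROPERTIES = {"NS", "PS", "AO", "An", "Ve", "Hu", "Mo", "MR"}


def descendants(term):
    """All terms below `term` in the hierarchy."""
    children = {child for child, parent in PARENT.items() if parent == term}
    found = set(children)
    for child in children:
        found |= descendants(child)
    return found


def specialise(query, known=CONTEXT_PROPERTIES):
    """Replace properties absent from the context by their descendants that are present."""
    refined = set()
    for prop in query:
        if prop in known:
            refined.add(prop)
        else:
            refined.update(descendants(prop) & known)
    return refined


print(sorted(specialise({"Eu"})))  # ['An', 'Hu', 'Mo', 'Ve']
```

Chicken (Ch) is a descendant of Eucaryote in this hierarchy but is left out because no source in the context carries it.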


slide-38
SLIDE 38

Outline

  • 1. Motivation
  • 2. BioRegistry: data source metadata repository
  • 3. FCA for classifying and querying data sources
  • 4. Ontology-based query refinement
  • 5. Conclusion and future work



slide-40
SLIDE 40

5 Conclusion and future work

Conclusion

  • Classification of data sources according to their metadata
  • Identifying relevant data sources for a given query
  • Ontology-based query refinement

Future work

  • Refine the definition of relevance (take into account some preferences)
  • Define an order for data source composition (case of complex queries)


slide-41
SLIDE 41

Thank you for your attention
