Data modeling: the key to biological data integration, François Rechenmann



SLIDE 1

Data modeling: the key to biological data integration

NETTAB 2012

François Rechenmann

SLIDE 2

 Genostar 2012

Biological data: not so big, but highly heterogeneous and evolving

Big data

  • Satellite images, particle physics,…
  • Banks, insurance, telecom companies,…

Heterogeneous biological data

  • Genomic, transcriptomic, proteomic, metabolic data
  • Spectra, structures…

Evolving biological data

  • New technologies
  • New research questions

SLIDE 3


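Data modeling via UML

[UML diagram; recoverable labels:]

  • Class: Protein, with class slots MW, Length, Sequence
  • Inheritance (“is-A”): Regulator is-A Protein
  • Association: Regulates, with roles regulator and regulated-prot
  • Association slots: Km
  • N-ary associations: Compound takes part in the role effector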
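The diagram elements on this slide can be sketched in code. The following is only an illustration, assuming Python dataclasses as the modeling language; the class and role names come from the slide, while the `name` fields, the derived `length` property, and all sample values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Protein:
    """Class with the slots named on the slide: MW, Length, Sequence."""
    name: str        # hypothetical identifier, not shown on the slide
    sequence: str
    mw: float        # molecular weight (placeholder values below)

    @property
    def length(self) -> int:
        # Length is derived from the sequence here; the slide lists it as a slot.
        return len(self.sequence)


@dataclass
class Regulator(Protein):
    """Inheritance: a Regulator is-a Protein and inherits its slots."""


@dataclass
class Compound:
    name: str        # hypothetical identifier


@dataclass
class Regulates:
    """N-ary association: two protein roles, an optional effector, and a Km slot."""
    regulator: Regulator
    regulated_prot: Protein
    effector: Optional[Compound] = None
    km: Optional[float] = None


# Placeholder instances, not real biological data.
reg = Regulator(name="reg-A", sequence="MKTAY", mw=1.0)
target = Protein(name="prot-B", sequence="MSLQ", mw=1.0)
link = Regulates(regulator=reg, regulated_prot=target, effector=Compound("cmpd-X"))
```

Making the association a class of its own is one way to give it slots and more than two roles, which a plain attribute on `Protein` could not express.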

SLIDE 4


Data modeling via UML

SLIDE 5
Advantages

  • Intuitive (and graphical) UML-like representation of biological entities and of their relationships
  • Formal modeling (vs. natural language): no ambiguity over the definition of entities and relationships
  • An integrated data space as a large network where nodes are entities and edges are relationships
  • Efficient support for data consistency checking
  • Navigation and query facilities over the whole data space
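The network view, consistency checking, and navigation listed on this slide can be illustrated with a small sketch. It assumes a plain in-memory graph in Python; the `DataSpace` class, the `SCHEMA` set, and the entity identifiers are hypothetical and do not describe the actual software. For brevity the check compares exact type names and ignores inheritance.

```python
from collections import defaultdict

# Hypothetical model: the allowed (source type, relationship, target type) triples.
SCHEMA = {
    ("Feature", "is-located-on", "Sequence"),
    ("Regulator", "regulates", "Protein"),
}


class DataSpace:
    """Entities are nodes, relationships are typed edges."""

    def __init__(self):
        self.types = {}                  # entity id -> type name
        self.edges = defaultdict(list)   # entity id -> [(relationship, target id)]

    def add_entity(self, entity_id, type_name):
        self.types[entity_id] = type_name

    def relate(self, source, relationship, target):
        # Consistency check: reject edges the model does not declare.
        triple = (self.types[source], relationship, self.types[target])
        if triple not in SCHEMA:
            raise ValueError(f"inconsistent edge: {triple}")
        self.edges[source].append((relationship, target))

    def neighbors(self, source, relationship):
        # Navigation: follow one kind of edge from a node.
        return [t for r, t in self.edges[source] if r == relationship]


space = DataSpace()
space.add_entity("feat-1", "Feature")
space.add_entity("seq-1", "Sequence")
space.relate("feat-1", "is-located-on", "seq-1")
located_on = space.neighbors("feat-1", "is-located-on")
```

Calling `space.relate("seq-1", "is-located-on", "feat-1")` raises `ValueError`, because the model declares no such relationship from a Sequence to a Feature.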

SLIDE 6

Data modeling in software

  • Entities described as classes: types and subtypes
      • Distinction between “sequence” and “replicon”
  • Relationships
      • “Feature” is-located-on “sequence”
  • Methods described as classes
      • Typed input and output
  • Typed input and output of methods
      • Type checking: testing method adequacy for input data
      • Type assignment to output data
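Describing methods as classes with typed input and output can be sketched as follows. This is a minimal illustration, not the actual software design: `Method`, `ReverseComplement`, and the `DnaSequence` and `ProteinSequence` subtypes are hypothetical names chosen for the example.

```python
class Sequence:
    def __init__(self, residues):
        self.residues = residues


class DnaSequence(Sequence):
    """Hypothetical subtype of Sequence."""


class ProteinSequence(Sequence):
    """Hypothetical subtype of Sequence."""


class Method:
    """A method described as a class that declares its input and output types."""
    input_type = object
    output_type = object

    def accepts(self, data):
        # Type checking: is this method adequate for the given data?
        return isinstance(data, self.input_type)

    def run(self, data):
        if not self.accepts(data):
            raise TypeError(
                f"{type(self).__name__} expects {self.input_type.__name__}, "
                f"got {type(data).__name__}"
            )
        # Type assignment: the raw result is wrapped in the declared output type.
        return self.output_type(self.compute(data))

    def compute(self, data):
        raise NotImplementedError


class ReverseComplement(Method):
    input_type = DnaSequence
    output_type = DnaSequence

    def compute(self, data):
        complement = str.maketrans("ACGT", "TGCA")
        return data.residues.translate(complement)[::-1]


method = ReverseComplement()
result = method.run(DnaSequence("ATGC"))
```

Because each method declares its types, a tool can use `accepts` to offer only the methods that fit the selected data, and the output arrives already typed for the next step.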
SLIDE 7

Data modeling in database

  • MicroB: a relational database
  • Interconnected genomic, proteomic and metabolic reference data on more than 1500 microbial organisms
  • Schema overlaps with the software schema
  • More than 300 relations/tables
  • Easy data import from, and export back to, the software
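How a class-level relationship might map onto relational tables can be sketched with Python's standard `sqlite3` module. The two tables below are hypothetical and are not taken from the MicroB schema, which the slides do not show; they only illustrate the "Feature is-located-on sequence" relationship as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical tables: a relationship between classes becomes a foreign key.
conn.executescript("""
CREATE TABLE sequence (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE feature (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    sequence_id INTEGER NOT NULL REFERENCES sequence(id),
    start_pos   INTEGER NOT NULL,
    end_pos     INTEGER NOT NULL
);
""")

# Placeholder rows, not real data.
conn.execute("INSERT INTO sequence (id, name) VALUES (1, 'seq-1')")
conn.execute(
    "INSERT INTO feature (name, sequence_id, start_pos, end_pos) VALUES (?, ?, ?, ?)",
    ("feat-1", 1, 10, 250),
)

rows = conn.execute("""
    SELECT feature.name, sequence.name
    FROM feature JOIN sequence ON sequence.id = feature.sequence_id
""").fetchall()
```

With foreign keys enabled, inserting a feature that points at a missing sequence raises `sqlite3.IntegrityError`, so the database rejects the same inconsistencies that the class model rules out.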

SLIDE 8

An integrated bioinformatics platform

MicroB database

  • Connected genomic, proteomic & metabolic data on 1500+ reference microorganisms
  • Integration of new annotated genomes

Metabolic Pathway Builder

  • Perform comparative genomics & metabolic analyses
  • From annotation to analysis of relevant metabolic reactions & pathways

SLIDE 9

An integrated bioinformatics platform

  • Dedicated visualizers and editors
  • Exploration and query mechanism

SLIDE 10


Contacts

Francois.Rechenmann@genostar.com

www.genostar.com