Atlas: a data warehouse for integrative bioinformatics



slide-1
SLIDE 1

Atlas

a data warehouse for integrative bioinformatics

Presentation: Andrew Carbonetto
Discussion: Sukesh Chopra
Sohrab P Shah, Yong Huang, Tao Xu, Macaire MS Yuen, John Ling, BF Francis Ouellette

slide-2
SLIDE 2

Atlas

  • Motivation by Example
  • Goal of Atlas
  • Atlas Architecture
  • Atlas Ontologies
  • Pros and Cons of Atlas
slide-3
SLIDE 3

Atlas

  • Motivation by Example
  • Goal of Atlas
  • Atlas Architecture
  • Atlas Ontologies
  • Pros and Cons of Atlas
slide-4
SLIDE 4

Motivation by Example

  • Definitions:

DNA (deoxyribonucleic acid): Watson, Crick and Wilkins shared the 1962 Nobel Prize for their work on the structure of DNA. In 1957, Crick outlined how DNA is “read” to produce proteins.

Genes: The “read” regions of DNA are commonly referred to as genes.

slide-5
SLIDE 5

Case Scenario

  • What if a biologist approaches you with a DNA region in humans and wants you to describe it as well as possible? What do you do?

slide-6
SLIDE 6

Case Scenario

  • GenBank: Find all known genes
  • Taxonomy: Find all related species
  • GenBank: Find all known genes in similar species
  • RefSeq: Find predicted genes
  • Gene Ontology: Find all functionally related genes.

  • ...
slide-7
SLIDE 7

Case Scenario

  • What happens if we have 100 regions?
  • Or how about 1000 regions?
slide-8
SLIDE 8

Atlas

  • Motivation by Example
  • Goal of Atlas
  • Atlas Architecture
  • Atlas Ontologies
  • Pros and Cons of Atlas
slide-9
SLIDE 9

Goal of Atlas

  • Integration of biological databases, with the ability to perform complex queries

  • Efficient storage and handling of data
slide-10
SLIDE 10

Atlas

  • Motivation by Example
  • Goal of Atlas
  • Atlas Architecture
  • Atlas Ontologies
  • Pros and Cons of Atlas
slide-11
SLIDE 11
slide-12
SLIDE 12

Architecture: Data Sources

  • 15 data sources.
  • Loaders periodically (on a user-specified schedule) update the data in the local data warehouse.
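A minimal sketch of what one loader run might look like, assuming a local SQLite file stands in for the warehouse. The `sequence` table, its columns, the `run_loader` function and the `fetch` callable are all hypothetical and are not taken from Atlas itself.

```python
import sqlite3
from typing import Callable, Iterable, Tuple

Record = Tuple[str, str]  # (accession, residues): a hypothetical record shape


def run_loader(conn: sqlite3.Connection, source: str,
               fetch: Callable[[], Iterable[Record]]) -> int:
    """Replace the local copy of one data source with freshly fetched records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequence ("
        "source TEXT, accession TEXT, residues TEXT, "
        "PRIMARY KEY (source, accession))"
    )
    rows = [(source, accession, residues) for accession, residues in fetch()]
    with conn:  # one transaction: the old copy is swapped for the new one
        conn.execute("DELETE FROM sequence WHERE source = ?", (source,))
        conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
# Two runs simulate two scheduled updates of the same (made-up) source.
loaded_first = run_loader(conn, "GenBank", lambda: [("X1", "ACGT")])
loaded_second = run_loader(conn, "GenBank",
                           lambda: [("X1", "ACGT"), ("X2", "GGCC")])
row_count = conn.execute("SELECT COUNT(*) FROM sequence").fetchone()[0]
```

Scheduling is left out on purpose: the user-specified interval could be handled by any external scheduler that calls `run_loader` once per source.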

slide-13
SLIDE 13

Architecture: Databases

  • 4 Types of Databases:
  • Ontology
  • Sequence
  • Molecular Interactions

  • Gene Related
slide-14
SLIDE 14

Atlas Ontologies

  • Ontologies describe the relationships (mappings) between two databases.
  • They also describe the legal operations available when querying more than one database.

slide-15
SLIDE 15

Atlas Ontologies

  • Possible Relationships:
  • “is_a”
  • “part_of”
  • “inverse_of”
  • “is_synonym_of”
  • “refers_to_PubMed”
  • “feature-includes-qualifier”
  • “gene-contains-promoter”
slide-16
SLIDE 16

Atlas Ontologies

  • 2 Types of Ontologies used by Atlas:
  • External: ontologies downloaded from external sources.
  • Internal: ontologies described by Atlas itself.

slide-17
SLIDE 17

Atlas Ontologies

  • Ontologies are described using:
  • Ontology: holds descriptions in plain English
  • Ontology_Type: holds the source and a description of the source type
  • Ontology_Ontology: holds binary relationships

  • (fig 2: SQL schema)
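The three tables can be sketched roughly as follows. Only the table names come from the slide: every column name, the use of SQLite, and the example terms are assumptions made for illustration, so this should not be read as the actual Atlas schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Ontology_Type (
    type_id     INTEGER PRIMARY KEY,
    source      TEXT NOT NULL,   -- where the terms come from
    description TEXT             -- description of the source type
);
CREATE TABLE Ontology (
    ontology_id INTEGER PRIMARY KEY,
    type_id     INTEGER NOT NULL REFERENCES Ontology_Type(type_id),
    name        TEXT NOT NULL,
    description TEXT             -- plain-English description of the term
);
CREATE TABLE Ontology_Ontology (
    subject_id  INTEGER NOT NULL REFERENCES Ontology(ontology_id),
    predicate   TEXT NOT NULL,   -- e.g. 'is_a' or 'part_of'
    object_id   INTEGER NOT NULL REFERENCES Ontology(ontology_id)
);
""")

# Hypothetical rows: one internal ontology with two terms and one relationship.
conn.execute("INSERT INTO Ontology_Type VALUES (1, 'Atlas', 'internal ontology')")
conn.executemany(
    "INSERT INTO Ontology VALUES (?, 1, ?, ?)",
    [(1, "gene", "A region of DNA that is read"),
     (2, "promoter", "A region associated with a gene")],
)
conn.execute("INSERT INTO Ontology_Ontology VALUES (2, 'part_of', 1)")

# Resolve each binary relationship back into readable term names.
triples = conn.execute("""
    SELECT s.name, r.predicate, o.name
    FROM Ontology_Ontology AS r
    JOIN Ontology AS s ON s.ontology_id = r.subject_id
    JOIN Ontology AS o ON o.ontology_id = r.object_id
""").fetchall()
```

Here `triples` holds one row per stored relationship, which is one way the binary relationships in Ontology_Ontology could be presented to a user.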
slide-18
SLIDE 18

Discussion

  • 1) Atlas uses a combined approach to data integration and data warehousing. The data can be queried together, but it is maintained separately. What are the advantages and disadvantages of this? What other applications could benefit from this blending of data integration and data warehousing?
  • 2) What other domains could use ontologies? Suggest specific applications. How would it help? How would relationships in that ontology be defined (such as “is-a” or “part-of”)?

slide-19
SLIDE 19

Architecture: Retrieval

  • 3 querying levels:
  • directly from SQL
  • through API languages
  • or from the application toolbox (pre-defined queries) on the command line
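The three levels can be pictured as layers over the same data. The sketch below is an illustration under assumptions: the `gene` table, the `genes_for_organism` function and the `genes_by_organism` tool are hypothetical names, and the rows are made-up demo data rather than real Atlas content.

```python
import argparse
import sqlite3
from typing import List, Sequence


def make_demo_warehouse() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (accession TEXT, organism TEXT)")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [("G1", "Homo sapiens"), ("G2", "Mus musculus"), ("G3", "Homo sapiens")],
    )
    return conn


# Level 2: an API function that hides the SQL from the caller.
def genes_for_organism(conn: sqlite3.Connection, organism: str) -> List[str]:
    rows = conn.execute(
        "SELECT accession FROM gene WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return [accession for (accession,) in rows]


# Level 3: a pre-defined toolbox query driven by command-line arguments.
def toolbox_main(argv: Sequence[str], conn: sqlite3.Connection) -> List[str]:
    parser = argparse.ArgumentParser(prog="genes_by_organism")
    parser.add_argument("--organism", required=True)
    args = parser.parse_args(argv)
    return genes_for_organism(conn, args.organism)


conn = make_demo_warehouse()

# Level 1: the user writes SQL directly against the warehouse.
sql_rows = conn.execute(
    "SELECT accession FROM gene WHERE organism = 'Homo sapiens' ORDER BY accession"
).fetchall()

api_rows = genes_for_organism(conn, "Homo sapiens")
cli_rows = toolbox_main(["--organism", "Homo sapiens"], conn)
```

Each level trades flexibility for ease of use: raw SQL can express any query, while the toolbox call needs no knowledge of the schema at all.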

slide-20
SLIDE 20

Atlas

  • Motivation by Example
  • Goal of Atlas
  • Atlas Architecture
  • Atlas Ontologies
  • Pros and Cons of Atlas
slide-21
SLIDE 21

Positives

  • Shallow learning curve (for toolbox usage)
  • Small and portable to promote efficiency
  • Solid ontology (relationships are well defined)

slide-22
SLIDE 22

Negatives

  • Steep learning curve (for extended usage)
  • Small, not easily extended to other data sources
  • No automatic expansion to alternative data sources

slide-23
SLIDE 23

Discussion

  • The authors note a challenge: conflicts occur when there are different representations of the same semantic entity. They propose the following solution: “store the information from all sources as is, and also annotate that information with the source from which it came, so as to not have any information loss”.
  • a) Is this solution only relevant/feasible to bioinformatics? Why or why not?
  • b) Would it work for the other applications you thought of earlier? Why or why not?
  • c) Are these the challenges that you would expect, or would you expect other challenges?
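As a starting point for the discussion, the quoted solution (store everything as is, annotated with its source) can be sketched in a few lines. The `AnnotatedStore` class, the entity and attribute names, and the example values are all hypothetical; this illustrates the idea and is not how Atlas implements it.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Key = Tuple[str, str]    # (entity, attribute)
Entry = Tuple[str, str]  # (value, source)


class AnnotatedStore:
    """Keeps every reported value as is, tagged with the source it came from."""

    def __init__(self) -> None:
        self._entries: DefaultDict[Key, List[Entry]] = defaultdict(list)

    def add(self, entity: str, attribute: str, value: str, source: str) -> None:
        # Nothing is overwritten or merged: each source's value is appended.
        self._entries[(entity, attribute)].append((value, source))

    def get(self, entity: str, attribute: str) -> List[Entry]:
        return list(self._entries.get((entity, attribute), []))

    def conflicts(self) -> Dict[Key, Set[str]]:
        """Return the keys for which sources report more than one distinct value."""
        result: Dict[Key, Set[str]] = {}
        for key, entries in self._entries.items():
            values = {value for value, _source in entries}
            if len(values) > 1:
                result[key] = values
        return result


store = AnnotatedStore()
# Made-up values: two sources disagree on the name but agree on the organism.
store.add("gene_A", "name", "alpha kinase", "GenBank")
store.add("gene_A", "name", "kinase alpha-1", "RefSeq")
store.add("gene_A", "organism", "Homo sapiens", "GenBank")
store.add("gene_A", "organism", "Homo sapiens", "RefSeq")

name_entries = store.get("gene_A", "name")
conflicting = store.conflicts()
```

No information is lost, but deciding which value to trust is pushed to whoever queries the store, which is one trade-off worth raising under question c).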