SLIDE 1

Bioinformatics Databases

Introduction to Bioinformatics
Dortmund, 16.-20.07.2007
Lectures: Sven Rahmann
Exercises: Udo Feldkamp, Michael Wurst

SLIDE 2

Overview

  • Databases at NCBI (via Entrez)
  • DNA – GenBank, EMBL, DDBJ
    – Data format issues
  • UCSC Genome Browser
  • Protein – SwissProt, PIR, PDB
  • Sequence Retrieval System at EBI

SLIDE 3

Fundamentals

  • Accession number := unique identifier for each entry (“record”) in a database
    – Example: PubMed ID [PMID]
    – If you know the accession number, you obtain the record without searching
    – Different databases can be linked via accession numbers
    – Data integration: hide the details (accession numbers) behind a convenient interface
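
To make these points concrete, here is a toy sketch in Python, not the API of any real database. Records live in a dictionary keyed by accession number, so retrieval by a known accession is a single lookup, whereas a keyword query has to scan every record. Cross-references are stored as (database, accession) pairs and followed by a helper that hides them from the caller. The `Record` class, the functions `fetch`, `search` and `linked`, and the PubMed entry `HYPOTHETICAL-1` are all made up for illustration; only the K03160 data comes from the GenBank example in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    accession: str
    definition: str
    # Links to entries in other databases, as (database name, accession) pairs.
    xrefs: list[tuple[str, str]] = field(default_factory=list)


# Hypothetical mini-databases, each a dict keyed by accession number.
# "HYPOTHETICAL-1" is a placeholder, not a real PubMed ID.
nucleotide_db = {
    "K03160": Record(
        "K03160",
        "A.auricula-judae (mushroom) 5S ribosomal RNA.",
        xrefs=[("pubmed", "HYPOTHETICAL-1")],
    ),
}
pubmed_db = {
    "HYPOTHETICAL-1": Record(
        "HYPOTHETICAL-1",
        "The nucleotide sequences of the 5S rRNAs of four mushrooms and their use "
        "in studying the phylogenetic position of basidiomycetes among the eukaryotes",
    ),
}
databases = {"nucleotide": nucleotide_db, "pubmed": pubmed_db}


def fetch(db: str, accession: str) -> Record:
    """Known accession: one dictionary lookup, no search (KeyError if absent)."""
    return databases[db][accession]


def search(db: str, keyword: str) -> list[Record]:
    """Unknown accession: every record in the database must be scanned."""
    kw = keyword.lower()
    return [r for r in databases[db].values() if kw in r.definition.lower()]


def linked(record: Record) -> list[Record]:
    """Follow stored accession numbers into other databases, hiding them from the caller."""
    return [fetch(db, acc) for db, acc in record.xrefs]
```

For example, `linked(fetch("nucleotide", "K03160"))` returns the (hypothetical) literature record. In this toy, `linked` plays the role of the "convenient interface": the user never handles the accession numbers that connect the two databases.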

SLIDE 4

Databases at NCBI (2007) http://www.ncbi.nlm.nih.gov/

SLIDE 5

Different Databases

  • DNA
    – nucleotide sequence
    – gene
    – transcript / gene expression
    – genome
  • Protein
    – sequence and annotation
    – structure
  • ...

SLIDE 6

Different Databases

  • Repositories of primary sequence data
    – Everything related to a topic goes in here
    – GenBank (NCBI Nucleotide): all nucleotide sequences
  • Machine-curated annotation data
    – automatically generated from primary data
    – quality depends on the primary data and the method
  • Manually curated annotation data
    – reviewed by experts (SwissProt – Amos Bairoch)
    – high quality, slow to grow

SLIDE 7

Integration

  • “Meta search engines”
    – Entrez at NCBI (U.S.)
    – SRS at EBI (Europe)
  • Value comes from linking databases
  • Accession numbers provide unique identifiers
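
Entrez can also be queried from a program through NCBI's E-utilities. The sketch below only builds request URLs for three of them: ESearch to search, EFetch to retrieve records by identifier, and ELink to follow links between databases. It assumes the current HTTPS base URL and uses `nuccore`, the E-utilities name of the Nucleotide database. The helper names are hypothetical, and the supported parameters should be checked against NCBI's E-utilities documentation.

```python
from urllib.parse import urlencode

# Assumption: the current HTTPS base URL of NCBI's E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """ESearch: run an Entrez query; the reply lists matching record IDs."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def efetch_url(db: str, ids: list[str], rettype: str = "fasta") -> str:
    """EFetch: retrieve records directly by identifier, no search involved.

    Some databases may expect numeric UIDs instead of accession numbers.
    """
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def elink_url(dbfrom: str, db: str, ids: list[str]) -> str:
    """ELink: find records in `db` that are linked to records in `dbfrom`."""
    return EUTILS + "elink.fcgi?" + urlencode({"dbfrom": dbfrom, "db": db, "id": ",".join(ids)})
```

Sending such a URL, for instance with `urllib.request.urlopen(efetch_url("nuccore", ["K03160"]))`, returns the record as FASTA text. The search/fetch/link split mirrors the slide: searching is only needed when the identifier is unknown, and the linking step is where the value comes from. NCBI asks clients to keep their request rate low.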

SLIDE 8

Security

  • Assume that everything you send over the internet can be intercepted.
  • Don't send confidential data, patent data, etc.
  • As of 2007, none of the public databases supports encryption

SLIDE 9

Searching Entrez

SLIDE 10

Nucleotide Results

SLIDE 11

Core Nucleotide DB

SLIDE 12

DNA / Nucleotide DBs

  • International Nucleotide Sequence Database Collaboration (INSDC)
    – GenBank, EMBL, and DDBJ hold the same content
    – GenBank = NCBI Nucleotide

SLIDE 13

File Formats: GenBank

LOCUS       AAURRA        118 bp ss-rRNA            RNA       16-JUN-1986
DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA.
ACCESSION   K03160
VERSION     K03160.1  GI:173593
KEYWORDS    5S ribosomal RNA; ribosomal RNA.
SOURCE      A.auricula-judae (mushroom) ribosomal RNA.
  ORGANISM  Auricularia auricula-judae
            Eukaryota; Fungi; Eumycota; Basidiomycotina; Phragmobasidiomycetes;
            Heterobasidiomycetidae; Auriculariales; Auriculariaceae.
REFERENCE   1  (bases 1 to 118)
  AUTHORS   Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R.
  TITLE     The nucleotide sequences of the 5S rRNAs of four mushrooms and
            their use in studying the phylogenetic position of basidiomycetes
            among the eukaryotes
  JOURNAL   Nucleic Acids Res. 11, 2871-2880 (1983)
FEATURES             Location/Qualifiers
     rRNA            1..118
                     /note="5S ribosomal RNA"
BASE COUNT       27 a     34 c     34 g     23 t
ORIGIN      5' end of mature rRNA.
        1 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga
       61 gtaccgccca gttagtacca cggtggggga ccacgcggga atcctgggtg ctgtggtt
//
LOCUS       ABCRRAA       118 bp ss-rRNA            RNA       15-SEP-1990
...

SLIDE 14

File Formats: FASTA

>gi|173593|gb|K03160.1|AAURRA Auricula auricula-judae 5S ribosomal RNA
ATCCACGGCCATAGGACTCTGAAAGCACTGCATCCCGTCCGATCTGCAA
AGTTAACCAGAGTACCGCCCAGTTAGTACCACGGTGGGGGACCACGCG
GGAATCCTGGGTGCTGTGGTT
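
FASTA keeps only a header line, which starts with `>`, and the sequence. Converting GenBank to FASTA therefore drops the annotation. The sketch below reads FASTA text and splits the pipe-delimited NCBI header shown above into its fields. It also counts bases and writes FASTA wrapped at a chosen width. The field names and function names are illustrative. The header parser assumes exactly this older `gi|...|gb|...|...` layout, and the default width of 70 is a common choice, not a requirement of the format.

```python
from collections import Counter


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; the header is the '>' line without '>'."""
    records, header, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequence lines may be wrapped at any width
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_ncbi_header(header: str) -> dict:
    """Split 'gi|GI|gb|ACCESSION.VERSION|LOCUS description' (layout assumed)."""
    ids, _, description = header.partition(" ")
    _, gi, db, accession, locus = ids.split("|")[:5]  # ValueError if fewer fields
    return {"gi": gi, "db": db, "accession": accession, "locus": locus,
            "description": description}


def base_counts(seq: str) -> Counter:
    """Count each base, case-insensitively."""
    return Counter(seq.upper())


def write_fasta(header: str, seq: str, width: int = 70) -> str:
    """Write one record, wrapping the sequence every `width` characters."""
    lines = [">" + header] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

For the K03160 record, `base_counts(seq)` gives 27 A, 34 C, 34 G and 23 T, in agreement with the GenBank `BASE COUNT` line. The annotation (organism, reference, features) has no place in the FASTA output.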

SLIDE 15

Sequence Retrieval System (SRS)

  • URL: http://srs.ebi.ac.uk/

SLIDE 16

Selecting Libraries (DBs) to Search

SLIDE 17

Standard Query Form

SLIDE 18

UCSC Genome Browser

  • Portal to ENCODE (Encyclopedia of DNA Elements): functional annotation of the human genome

SLIDE 19

Protein: UniProt / SwissProt

  • URL: http://expasy.org/sprot/
    – SwissProt: manually curated
    – TrEMBL: annotated automatically
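
UniProtKB FASTA headers indicate which section an entry belongs to. They start with `sp|` for reviewed Swiss-Prot entries and `tr|` for unreviewed TrEMBL entries, followed by the accession and the entry name. The small sketch below uses that prefix to separate manually curated from automatically annotated entries. The function names are illustrative, and the headers in the usage note are hypothetical placeholders.

```python
SECTIONS = {
    "sp": "Swiss-Prot (manually curated)",
    "tr": "TrEMBL (automatically annotated)",
}


def parse_uniprot_header(header: str) -> dict:
    """Split '>db|Accession|EntryName description' into its parts."""
    db, accession, rest = header.lstrip(">").split("|", 2)
    entry_name, _, description = rest.partition(" ")
    return {"db": db, "accession": accession, "entry_name": entry_name,
            "description": description,
            "section": SECTIONS.get(db, "unknown")}


def reviewed_only(headers: list[str]) -> list[str]:
    """Keep only headers of reviewed (Swiss-Prot) entries."""
    return [h for h in headers if parse_uniprot_header(h)["db"] == "sp"]
```

For example, `parse_uniprot_header(">sp|P00000|EXAMPLE_HUMAN Example protein")["section"]` returns `"Swiss-Prot (manually curated)"`. `P00000` and `EXAMPLE_HUMAN` are made-up values, not real entries.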

SLIDE 20

Protein Structure: (WW)PDB

  • http://www.wwpdb.org/