Genome Annotation The steps in genome sequencing Generate genome - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Genome Annotation

slide-2
SLIDE 2

The steps in genome sequencing

  • Generate genome sequence

– Assembly
– ORF calling
– tRNA identification
– rRNA identification
– Functional annotation

slide-3
SLIDE 3

Annotating Genomes

  • Identifying which protein performs which function

slide-4
SLIDE 4

www.sigmaaldrich.com

slide-5
SLIDE 5

Why annotate a genome?

  • Catalog what's there
  • Identify what's missing – but should be there!

– Things you don't know

  • In vitro growth

– Mycoplasma pneumoniae

  • Comparative genomics
  • Hypothesis generation
slide-6
SLIDE 6

The goals of annotation

  • Exchange information with others
  • Compare annotations between organisms
slide-7
SLIDE 7

How to annotate a genome?

  • Sequence
  • Assemble
  • Identify open reading frames

– Putative proteins

slide-8
SLIDE 8

Putative protein

  • Open Reading Frame (ORF)

– A stretch of DNA, read as codons, with no stop codon

  • Coding Sequence (CDS)

– An ORF that could encode a protein

  • Protein encoding gene (PEG)

– An ORF that could encode a protein

  • Hypothetical protein = putative protein

– Something that has not been experimentally shown

  • Polypeptide

– Short stretch of ~50 amino acids. Often a domain
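
The ORF definition on this slide can be sketched in a few lines. This is a minimal sketch, assuming the forward strand only, the standard stop codons (TAA, TAG, TGA), and no start-codon requirement; `find_orfs` and `min_codons` are illustrative names, not part of any tool named in these slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def find_orfs(dna, min_codons=3):
    """Return (frame, start, end) for stop-free stretches on the forward strand.

    start is inclusive, end is exclusive; the stop codon itself is not included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = frame
        # Walk this reading frame one codon at a time.
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos))
                start = pos + 3
        # Keep the stretch that runs to the end of the sequence, if long enough.
        end = frame + ((len(dna) - frame) // 3) * 3
        if (end - start) // 3 >= min_codons:
            orfs.append((frame, start, end))
    return orfs

if __name__ == "__main__":
    print(find_orfs("ATGAAACCCTAAGGG"))
```

For the toy sequence above this prints `[(0, 0, 9), (1, 4, 13), (2, 2, 14)]`. Real ORF callers also scan the reverse strand and usually require a start codon and a much larger minimum length.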

slide-9
SLIDE 9

PEGS

  • E. coli

– 4,391 genes
– 4,288 genes that make proteins (pegs)

slide-10
SLIDE 10

ORF Calling

slide-11
SLIDE 11

Genome Annotation

slide-12
SLIDE 12

The steps in genome sequencing

  • Generate genome sequence

– Assembly
– ORF calling
– tRNA identification
– rRNA identification
– Functional annotation

slide-13
SLIDE 13

Traditional genome annotation

slide-14
SLIDE 14

Traditional genome annotation BLAST Similarities

slide-27
SLIDE 27

Protein Families

slide-31
SLIDE 31

Gene Ontology

  • Ontology

– A “hierarchy” of functions
– Does not need to be linear

  • Directed Acyclic Graph
  • Controlled Vocabulary

– Decides which words or phrases to use

slide-32
SLIDE 32

GO

  • Gene ontology

– A eukaryotic focus

  • Drosophila
  • Mus
  • Saccharomyces
  • Homo
slide-33
SLIDE 33

GO

  • Cellular component

– The parts of a cell

  • Molecular function

– e.g. ligand binding

  • Biological processes

– What things do

slide-34
SLIDE 34

GO Terms

  • [GO ID, function]
  • e.g:

– GO:0004743
– Ontology: molecular function
– Name: pyruvate kinase activity

slide-35
SLIDE 35

GO Terms

  • [GO ID, function]
  • e.g:

– GO:0004743
– Ontology: molecular function
– Name: pyruvate kinase activity

  • Mainly assigned by BLAST/HMMER/... etc
slide-36
SLIDE 36

Directed Acyclic Graph

Molecular function
– Catalytic activity
– Transferase activity
– Transferase activity, transferring phosphorus-containing groups
– Kinase activity / Phosphotransferase activity, alcohol group as acceptor
– Pyruvate kinase activity
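
The slide's example can be written as a small directed acyclic graph in which a term may have more than one parent. This is a minimal sketch: the term names come from the slide, but the parent links in `PARENTS` are an assumption read off the order of the diagram, and `ancestors` is an illustrative helper, not a Gene Ontology API.

```python
# Child -> parents. Edges are assumed from the slide's diagram.
P_GROUPS = "transferase activity, transferring phosphorus-containing groups"
ALCOHOL = "phosphotransferase activity, alcohol group as acceptor"

PARENTS = {
    "pyruvate kinase activity": ["kinase activity", ALCOHOL],
    "kinase activity": [P_GROUPS],
    ALCOHOL: [P_GROUPS],
    P_GROUPS: ["transferase activity"],
    "transferase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}

def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:  # a shared parent is visited only once
                seen.add(parent)
                stack.append(parent)
    return seen

if __name__ == "__main__":
    for name in sorted(ancestors("pyruvate kinase activity")):
        print(name)
```

Because the two parents of "pyruvate kinase activity" share a parent, the structure is a graph rather than a tree, which is why a `seen` set is needed.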

slide-37
SLIDE 37

Problems

  • Annotation by committee
  • Eukaryotic focus

– Some efforts to counter that

  • Owen White
  • Ariane Toussaint
  • Not very deep
  • Strict controlled vocabulary
slide-38
SLIDE 38

Alternatives

slide-39
SLIDE 39

lacZ lacI lacY lacA

Jacob & Monod, 1961 Basic biology

slide-40
SLIDE 40

lacZ lacI lacY lacA

Basic biology

slide-41
SLIDE 41

< 80 % < 80 % < 80%

Different types of clustering

slide-42
SLIDE 42

< 80 % < 80 % < 80%

Different types of clustering

slide-43
SLIDE 43

Purine metabolism

slide-44
SLIDE 44

< 80 % < 80 % < 80%

Different types of clustering

slide-45
SLIDE 45

Heme / chlorophyll metabolism is conserved

They are both porphyrins

slide-46
SLIDE 46

[Figure: fraction of genes in clusters (0 to 1) and number of genomes (40 to 120) for each group: Aquificae, Bacteroidetes, Chlamydiae, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Spirochaetes, Thermotogae. Series: clusters of genes with maximum 80% identity; genes in subsystems in clusters; total number of genomes in group.]

Occurrence of clustering in different genomes

slide-47
SLIDE 47
  • Subsystem is a generalization of “pathway”

– Collection of functional roles jointly involved in a biological process or complex

  • Functional Role is the abstract biological function of a gene product

– Atomic, or user-defined; examples:

  • 6-phosphofructokinase (EC 2.7.1.11)
  • LSU ribosomal protein L31p
  • Streptococcal virulence factors
  • Should not contain “putative”, “thermostable”, etc.
  • Populated subsystem is the complete spreadsheet of functions and roles

The Subsystems Approach to Annotation

slide-48
SLIDE 48

No.  Abbr.  Functional role
1    HutH   Histidine ammonia-lyase (EC 4.3.1.3)
2    HutU   Urocanate hydratase (EC 4.2.1.49)
3    HutI   Imidazolonepropionase (EC 3.5.2.7)
4    GluF   Glutamate formiminotransferase (EC 2.1.2.5)
5    HutG   Formiminoglutamase (EC 3.5.3.8)
6    NfoD   N-formylglutamate deformylase (EC 3.5.1.68)
7    ForI   Formiminoglutamic iminohydrolase (EC 3.5.3.13)

Subsystem: Histidine Degradation

  • Conversion of histidine to glutamate
  • Functional roles defined in table
  • Inclusion in subsystem is only by functional role
  • Controlled vocabulary …

Histidine Degradation

slide-49
SLIDE 49
  • Column headers taken from table of functional roles
  • Rows are selected genomes or organisms
  • Cells are populated with specific, annotated genes
  • Functional variants defined by the annotated roles
  • Variant code -1 indicates subsystem is not functional
  • Clustering shown by color

Organism, Variant, then annotated genes under the role columns HutH HutU HutI GluF HutG NfoD ForI (empty cells omitted)

Bacteroides thetaiotaomicron    1    Q8A4B3 Q8A4A9 Q8A4B1 Q8A4B0
Desulfotalea psychrophila    1    gi51246205 gi51246204 gi51246203 gi51246202
Halobacterium sp.    2    Q9HQD5 Q9HQD8 Q9HQD6 Q9HQD7
Deinococcus radiodurans    2    Q9RZ06 Q9RZ02 Q9RZ05 Q9RZ04
Bacillus subtilis    2    P10944 P25503 P42084 P42068
Caulobacter crescentus    3    P58082 Q9A9MI P58079 Q9A9M0 Q9A9L9
Pseudomonas putida    3    Q88CZ7 Q88CZ6 Q88CZ9 Q88D00 Q88CZ3
Xanthomonas campestris    3    Q8PAA7 P58988 Q8PAA6 Q8PAA8 Q8PAA5
Listeria monocytogenes    -1

Subsystem Spreadsheet
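
One way to picture a populated subsystem is as a mapping from organism to variant code and role-to-gene assignments. This is a minimal sketch: the role names come from the Histidine Degradation table, while the organism names and gene identifiers in `SPREADSHEET` are hypothetical placeholders, and treating the union of roles seen within a variant as the "expected" set is an assumption made only for illustration.

```python
# Functional roles from the Histidine Degradation table (abbreviation -> role).
ROLES = {
    "HutH": "Histidine ammonia-lyase (EC 4.3.1.3)",
    "HutU": "Urocanate hydratase (EC 4.2.1.49)",
    "HutI": "Imidazolonepropionase (EC 3.5.2.7)",
    "GluF": "Glutamate formiminotransferase (EC 2.1.2.5)",
    "HutG": "Formiminoglutamase (EC 3.5.3.8)",
    "NfoD": "N-formylglutamate deformylase (EC 3.5.1.68)",
    "ForI": "Formiminoglutamic iminohydrolase (EC 3.5.3.13)",
}

# Hypothetical rows: organisms and gene IDs are placeholders, not real data.
SPREADSHEET = {
    "organism_A": {"variant": "1",
                   "genes": {"HutH": "a1", "HutU": "a2", "HutI": "a3", "HutG": "a4"}},
    "organism_B": {"variant": "1",
                   "genes": {"HutH": "b1", "HutU": "b2", "HutI": "b3"}},
    "organism_C": {"variant": "-1", "genes": {}},  # -1: subsystem not functional
}

def missing_roles(sheet, variant):
    """For one variant, report roles that other members have but an organism lacks."""
    rows = {org: set(row["genes"]) for org, row in sheet.items()
            if row["variant"] == variant}
    expected = set().union(*rows.values())
    return {org: sorted(expected - present)
            for org, present in rows.items() if expected - present}

if __name__ == "__main__":
    print(missing_roles(SPREADSHEET, "1"))
```

With the placeholder rows above, `organism_B` is reported as lacking `HutG`; a gap like this is a candidate for a missing annotation.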


slide-50
SLIDE 50

“The Populated Subsystem”

slide-51
SLIDE 51

Microbial sialic acid metabolism has now been firmly established as a virulence determinant in a range of infectious diseases

Nan-operon within Sialic Acid Metabolism

slide-52
SLIDE 52

The nan-operon

slide-53
SLIDE 53

Genomes, in column order: Escherichia coli O157:H7 EDL933; Salmonella enterica subsp. enterica serovar Typhi Ty2; Shigella sonnei 53G; Yersinia pseudotuberculosis IP

No. in cluster, Abbr., Functional role in subsystem, then one annotation per genome

1 NanK N-acetylmannosamine kinase (EC 2.7.1.60)

– ABH-0028250: putative NAGC-like transcriptional regulator
– ABS-0084973: possible kinase
– ADD-0003671: putative NAGC-like transcriptional regulator
– ACZ-0002834: putative sugar kinase

2 NanE N-acetylmannosamine-6-phosphate 2-epimerase (EC 5.1.3.9)

– ABH-0028251: putative enzyme
– ABS-0083505: conserved hypothetical protein
– ADD-0003672: putative enzyme
– ACZ-0002836: conserved hypothetical protein

3 NanA N-acetylneuraminate lyase (EC 4.1.3.3)

– ABH-0028253: N-acetylneuraminate lyase
– ABS-0084976: N-acetylneuraminate lyase
– ADD-0003674: N-acetylneuraminate lyase
– ACZ-0002837: probable N-acetylneuraminate lyase

4 YhcH Putative sugar isomerase involved in processing of exogenous sialic acid*

– ABH-0028249: orf, hypothetical protein
– ABS-0084972: conserved hypothetical protein
– ADD-0003670: conserved hypothetical protein
– ACZ-0002833: conserved hypothetical protein

5 NanT Sialic acid transporter (permease) NanT

– ABH-0028252: sialic acid transporter
– ABS-0084975: putative sialic acid transporter
– ADD-0003673: sialic acid transporter
– ACZ-0002831: MFS family sialic acid transporter

10 NanR Transcriptional regulator NanR**

– ABH-0028254: putative FADA-type transcriptional regulator
– ABS-0084977: putative GntR-family transcriptional regulator
– ADD-0003675: putative FADA-type transcriptional regulator
– Fourth genome: NOT PRESENT (likely replaced by a clustered member of RpiR family)

* Proposed by: Teplyakov, A., Obmolova, G., Toedt, J., Galperin, M. Y., Gilliland, G. L. (2005). Crystal Structure of the Bacterial YhcH Protein Indicates a Role in Sialic Acid Catabolism. J. Bacteriol. 187: 5520-5527

** Kalivoda, K. A., Steenbergen, S. M., Vimr, E. R., and Plumbridge, J. (2003). Regulation of Sialic Acid Catabolism by the DNA Binding Protein NanR in Escherichia coli. J. Bacteriol. 185(16): 4806-4815

Color coding for annotations:

  • green: consistent
  • yellow: general class
  • gray: inconsistent or not informative

Annotations in conserved cluster (nan-operon)

slide-54
SLIDE 54

Methionine Biosynthesis

You need to get to here From here

slide-55
SLIDE 55

Columns: Organism, Variant Code, HSDH, HK, HSST, HSAT, AHSH/SHSH, CTGS, CTBL, MetH, MetE, BhmT, MTHFR
Column groups: Homoserine activation, Sulfhydrylation, Transsulfuration, Methylation
(cells listed left to right; empty cells omitted)

Nostoc sp. PCC 7120: 4427 657 619 1093
Synechocystis sp. PCC 6803: 2356 1112 2469 1144
Thermosynechococcus elongatus BP-1: 277 1764 1027 1090 1770
Trichodesmium erythraeum IMS101: 415, 4266 6167 106, 1229 2279 4433
Gloeobacter violaceus PCC 7421: 4295 1127 2500 477 789
Anabaena variabilis ATCC 29413: 33 2331 5519 3872 3873 4254, 6365 6434
Nostoc punctiforme: 33 2895 6648 5301 5302 4055 1885
Prochlorococcus marinus MED4: 66 1204 1764 1714 1715 2 1 1421 295
Prochlorococcus marinus str. MIT 9313: 66 1141 426 875 874 225 226 728 2005
Prochlorococcus marinus subsp. marinus str. CCMP1375: 66 1148 1064 799 798 404 405 957 176
Prochlorococcus marinus subsp. pastoris str. CCMP1986: 66 1047 592 640 639 405 406 874 153
Synechococcus sp. WH 8102: 66 706 1476 845 846 669 670 1233 2258
Synechococcus elongatus PCC 7942: 1397 769 2172 1030 2173 702 639

slide-58
SLIDE 58

Missing genes (cells marked “?” in the methionine biosynthesis spreadsheet)

slide-59
SLIDE 59

Hypothesis generation that leads to the wet lab...

slide-60
SLIDE 60
  • Wet lab
  • Chromosomal context
  • Metabolic context
  • Phylogenetic context
  • Microarray data
  • Proteomics data

Subsystems developed based on

slide-61
SLIDE 61

How can we compare annotations

  • There are several groups doing annotations of microbial genomes
  • How do we compare them?
slide-62
SLIDE 62

Caveat emptor!

slide-63
SLIDE 63
  • Number of subsystems defined
  • Number of functional roles defined
  • Number of genes connected to functional roles

Natural Metrics

slide-64
SLIDE 64

Annotations for some genomes

slide-65
SLIDE 65

Number of solid connections of gene to functional role, where “solid” is

  • 1. supported by experimental data
  • 2. connected to functional role and in chromosomal cluster with genes implementing functional roles from the same subsystem
  • 3. only gene in genome connected to a functional role in an active variant of a subsystem

Reactions, GO terms, Articles, Other databases cross references (number and diversity)

Applied Metrics
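
The three "solid" criteria can be expressed as a simple predicate over gene-to-role connections. This is a minimal sketch, assuming that meeting any one criterion is enough; the `Connection` fields and the `count_solid` helper are hypothetical names chosen for illustration, not the SEED data model.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    gene: str
    role: str
    experimental: bool = False          # criterion 1
    in_subsystem_cluster: bool = False  # criterion 2
    only_gene_for_role: bool = False    # criterion 3, first half
    variant_active: bool = False        # criterion 3, second half

def is_solid(conn):
    """True if the connection meets at least one of the three criteria."""
    return (
        conn.experimental
        or conn.in_subsystem_cluster
        or (conn.only_gene_for_role and conn.variant_active)
    )

def count_solid(connections):
    """Number of solid gene-to-role connections."""
    return sum(1 for conn in connections if is_solid(conn))

if __name__ == "__main__":
    # Placeholder genes and roles, for illustration only.
    example = [
        Connection("g1", "role_1", experimental=True),
        Connection("g2", "role_2", only_gene_for_role=True),
    ]
    print(count_solid(example))
```

In the example, the second connection is not counted because its subsystem variant is not marked active.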

slide-66
SLIDE 66

Applied Metrics

slide-67
SLIDE 67

Talmudic question*

If I find the identical protein sequence in two different organisms, is it doing the same function in both organisms?

* Per Elio Schaechter, Small Things Considered: a talmudic question is unanswerable.

slide-68
SLIDE 68

FIG function:

Phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase (EC 5.3.1.16)

Other functions in RefSeq:

  • phosphoribosylformimino-5-aminoimidazole carboxamide
  • phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
  • phosphoribosylformimino-5-aminoimidazole carboxamide ribotide...
  • 1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)methylideneamino] imidazole-4-carboxamide isomerase
  • N-(5-phospho-L-ribosyl-formimino)-5-amino-1-(5- phosphoribosyl)-4-imidazolecarboxamide isomerase
  • N-(5'-phospho-L-ribosyl-formimino)-5-amino-1-(5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
  • N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'- phosphoribosyl)-4-imidazolecarboxamide isomerase
  • N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4-imidazolecarboxamide isomerase
  • N-(5'-phospho-L-ribosyl-formimino)-5-amino-1- (5'-phosphoribosyl)-4- imidazolecarboxamide isomerase
  • Phosphoribosyl isomerase A [1-[5-phosphoribosyl]-5-[[5-phosphoribosylamino]methylideneamino] imidazole-4-carboxamide isomerase]

hisA

slide-69
SLIDE 69
  • Define a set of protein families such that each family contains genes performing the same function
  • Attach functional roles to protein families
  • Measure the consistency of the annotations made to genes within each family
  • 1. “consistency” is the odds that two proteins from the same family have the same function
  • 2. Evaluate both families and functions.

Measuring Consistency
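
The consistency measure can be sketched as the fraction of protein pairs within a family that carry the same annotation. This is a minimal sketch, assuming "odds" is read as a probability and that two annotations count as the same only when the strings match exactly; `family_consistency` is an illustrative name, and the example annotations are shortened placeholders.

```python
from itertools import combinations

def family_consistency(annotations):
    """Fraction of protein pairs in one family with identical annotations.

    Returns None when the family has fewer than two proteins.
    """
    pairs = list(combinations(annotations, 2))
    if not pairs:
        return None
    agreeing = sum(1 for first, second in pairs if first == second)
    return agreeing / len(pairs)

if __name__ == "__main__":
    family = [
        "ribotide isomerase",
        "ribotide isomerase",
        "phosphoribosyl isomerase A",
    ]
    print(family_consistency(family))
    print(family_consistency(["hypothetical protein"] * 4))
```

Exact string matching is deliberately strict: names that differ only in spacing or punctuation count as disagreements unless they are normalized first. A family annotated entirely as "hypothetical protein" scores 1.0, so consistency alone says nothing about accuracy.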

slide-70
SLIDE 70

Consistency among databases (2008)

slide-71
SLIDE 71

Number of RefSeq proteins in families

slide-72
SLIDE 72
  • If everything were called “hypothetical protein” the database would be 100% consistent
  • Need to measure accuracy (specificity) as well as consistency
  • Sample 100 proteins at random from “curated” set (i.e. that are believed to be correct)
  • Manually inspect annotations to score correctness

How to measure accuracy

slide-73
SLIDE 73

Problems

  • Subsystems are biased!
  • Subsystems are inaccurate!
  • Merging annotations between different groups is political/psychological, not technical!

slide-74
SLIDE 74

Problems

  • E. coli

– 4,391 genes
– 4,288 genes that make proteins (pegs)
– 676 genes that make enzymes

15% of genes encode enzymes!

slide-75
SLIDE 75

The SEED Family

www.nmpdr.org www.theseed.org

slide-76
SLIDE 76

Three level “hierarchy”

  • Amino Acids and Derivatives

– Alanine, serine, and glycine

  • Serine Biosynthesis
  • Amino Acids and Derivatives

– Lysine, threonine, methionine, and cysteine

  • Methionine Biosynthesis

Make your own subsystems!

About 2,500 Subsystems

slide-77
SLIDE 77

Classification    # SS

Experimental Subsystems    498
Clustering-based subsystems    352
Carbohydrates    160
Cofactors, Vitamins, Prosthetic Groups, Pigments    123
Amino Acids and Derivatives    96
Protein Metabolism    95
Virulence, Disease, Defense    70
Miscellaneous    70
RNA Metabolism    65
Membrane Transport    65
Respiration    62
Cell Wall and Capsule    62
Fatty Acids, Lipids, and    60
Regulation and Cell signaling    51
Virulence    49
Stress Response    43
DNA Metabolism    41
Aromatic Compounds    38
Phages    36
Secondary Metabolism    34
Iron acquisition and metabolism    31
Nucleosides and Nucleotides    24
Sulfur Metabolism    20
Dormancy and Sporulation    17
Plant-prokaryote    12
Nitrogen Metabolism    12
Motility and Chemotaxis    11
Plant cell walls and outer surfaces    10
Phages    10
Cell Division and Cell Cycle    10
Photosynthesis    9
Metabolite damage    8
Phosphorus Metabolism    7
Potassium metabolism    4
Transcriptional regulation    2
Plasmids    2
Central metabolism    2
Autotrophy    2
Arabinose Transport    1

slide-78
SLIDE 78
  • http://rast.nmpdr.org
  • Rapid Annotation using Subsystem Technology
  • Started in 2008
  • Designed for annotating bacterial and archaeal genomes
  • As of Monday, May 11th: 248,822 annotation jobs

  • 19,918 registered users
slide-79
SLIDE 79
  • Find the phylogenetic neighborhood of your genome
  • Look for proteins that related organisms have

– Core proteins
– Subset of all subsystems

  • Use those calls as a training set for CRITICA/GLIMMER

– Intrinsic training set!

The annotation process (complete genomes)

slide-80
SLIDE 80

This one’s for Gary

slide-81
SLIDE 81
  • Subsystem, GO, and KEGG connections

– KEGG EC numbers
– KEGG reaction numbers
– SEED reaction numbers (Chris Henry)

  • Metabolic flux models

– Automatically generate FBA matrices (Aaron Best/Matt DeJongh; Hope College)

Automatic metabolic reconstruction
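
Flux balance analysis (FBA) works on a stoichiometric matrix with one row per metabolite and one column per reaction. The slides do not say how the SEED pipeline builds its matrices, so this is only a minimal sketch of the general idea; the reaction names, metabolites, and the `stoichiometric_matrix` helper are hypothetical.

```python
def stoichiometric_matrix(reactions):
    """Build a metabolite-by-reaction matrix from reaction stoichiometries.

    reactions maps a reaction name to {metabolite: coefficient}, with negative
    coefficients for consumed metabolites and positive for produced ones.
    """
    reaction_names = list(reactions)
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    matrix = [
        [reactions[name].get(metabolite, 0) for name in reaction_names]
        for metabolite in metabolites
    ]
    return metabolites, reaction_names, matrix

if __name__ == "__main__":
    # Toy network for illustration only: A -> B -> C.
    toy = {
        "R1": {"A": -1, "B": 1},
        "R2": {"B": -1, "C": 1},
    }
    metabolites, names, matrix = stoichiometric_matrix(toy)
    for metabolite, row in zip(metabolites, matrix):
        print(metabolite, row)
```

Given such a matrix, FBA looks for a flux vector that keeps every internal metabolite balanced while optimizing an objective such as growth; that optimization step needs a linear programming solver and is not shown here.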

slide-82
SLIDE 82
slide-83
SLIDE 83

The Populated Subsystem

slide-84
SLIDE 84

Automatically compare metabolic reconstructions

slide-85
SLIDE 85
  • Rapidly correct missing annotations
  • Add more members to subsystems
  • Improves future genome annotations! (especially with new subsystems)

Find and suggest candidate functions

slide-86
SLIDE 86
  • 10 genomes submitted on Thursday at 6 pm
  • First annotation complete before 8 am Friday
  • Remaining annotations completed Friday before noon

  • (there were others in the pipeline too!)
  • Presentation ASM 2009 Tuesday, 8pm

The Live ASM Test

Philadelphia, 2009

slide-87
SLIDE 87

Genome    Percent of Proteins in Subsystems

Haloferax denitrificans    20%
Haloferax mediterranei    19%
Haloferax sulfurifontis    19%
Haloferax volcanii DS2    19%
Haloarcula sp 33800    19%
Haloarcula sp 33799    18%

Subsystems coverage of sequenced Archaea

slide-88
SLIDE 88

Phage talk Work by Sajia Akhter

Haloferax sulfurifontis prophage

Prophages

slide-89
SLIDE 89

  • Metagenomics RAST had 300 public metagenomes
  • Compared using tblastx

Comparing complete genomes to metagenomes

slide-90
SLIDE 90

Human Poop

slide-91
SLIDE 91

Thanks Nick Celms, Beltran Rodriguez-Mueller, Mya Breitbart, & Forest Rohwer

High Salinity Salterns

San Diego, July 2004

slide-92
SLIDE 92

Low salinity salterns High salinity salterns July 2004 Nov 2005

slide-93
SLIDE 93

RAST usage grows...

slide-94
SLIDE 94

RAST coverage....

slide-95
SLIDE 95

RASTtk

  • RAST2.0
  • Customizable choice of pipelines to run
  • Same behind the scenes infrastructure
slide-96
SLIDE 96

RASTtk

slide-97
SLIDE 97

Vibrio genomes

slide-98
SLIDE 98

Sequencing at Sea