Problems with metagenome annotation
How much has been sequenced?
[Figure: number of known sequences by year, with milestones marked: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, environmental sequencing]
If the database doubles every 15 months, how often do you need to rerun your sample?
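The question above comes down to simple arithmetic. The sketch below assumes steady exponential growth with the 15-month doubling time from the slide; the function names are illustrative only, and the model is an assumption rather than a measurement of any real database.

```python
DOUBLING_MONTHS = 15  # doubling time quoted on the slide


def growth_factor(months_elapsed, doubling_months=DOUBLING_MONTHS):
    """How many times larger the database is after months_elapsed."""
    return 2 ** (months_elapsed / doubling_months)


def fraction_new(months_elapsed, doubling_months=DOUBLING_MONTHS):
    """Share of the current database that did not exist when the sample was run."""
    return 1 - 1 / growth_factor(months_elapsed, doubling_months)


if __name__ == "__main__":
    for months in (3, 6, 15, 30):
        print(f"{months:>2} months: database is {growth_factor(months):.2f}x larger, "
              f"{fraction_new(months):.0%} of it was never searched")
```

Under this assumption, an annotation that is 15 months old was computed against only half of the sequences available today.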
Long Queues
MG-RAST speed does not depend on metagenome (MG) size!
Days to weeks Minutes to seconds
The SEED database
- Started with a few subsystems
Over 2,000 subsystems
- Unmanageable!
- Needed a solution so the annotators could find their subsystems.
- Created hierarchy
Three-level “hierarchy”
- Amino Acids and Derivatives
  – Alanine, serine, and glycine
    - Serine Biosynthesis
- Amino Acids and Derivatives
  – Lysine, threonine, methionine, and cysteine
    - Methionine Biosynthesis
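One way to picture the three levels is as a nested mapping from top-level classification to second-level group to subsystem. This is only a sketch built from the two examples on the slide; the names `HIERARCHY` and `find_subsystem` are hypothetical and are not how the SEED stores its data.

```python
# Level 1 -> level 2 -> list of subsystems (only the slide's two examples).
HIERARCHY = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def find_subsystem(name, hierarchy=HIERARCHY):
    """Return (level 1, level 2, subsystem) for a subsystem name, or None."""
    for level1, groups in hierarchy.items():
        for level2, subsystems in groups.items():
            if name in subsystems:
                return (level1, level2, name)
    return None


if __name__ == "__main__":
    print(find_subsystem("Methionine Biosynthesis"))
```

An annotator looking for a subsystem can then browse two short lists instead of scanning more than 2,000 names.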
Over 2,000 Subsystems
Classification                                    # SS
Experimental Subsystems                           498
Clustering-based subsystems                       352
Carbohydrates                                     160
Cofactors, Vitamins, Prosthetic Groups, Pigments  123
Amino Acids and Derivatives                       96
Protein Metabolism                                95
Virulence, Disease, Defense                       70
Miscellaneous                                     70
RNA Metabolism                                    65
Membrane Transport                                65
Respiration                                       62
Cell Wall and Capsule                             62
Fatty Acids, Lipids, and                          60
Regulation and Cell signaling                     51
Virulence                                         49
Stress Response                                   43
DNA Metabolism                                    41
Aromatic Compounds                                38
Phages                                            36
Secondary Metabolism                              34
Iron acquisition and metabolism                   31
Nucleosides and Nucleotides                       24
Sulfur Metabolism                                 20
Dormancy and Sporulation                          17
Plant-prokaryote                                  12
Nitrogen Metabolism                               12
Motility and Chemotaxis                           11
Plant cell walls and outer surfaces               10
Phages                                            10
Cell Division and Cell Cycle                      10
Photosynthesis                                    9
Metabolite damage                                 8
Phosphorus Metabolism                             7
Potassium metabolism                              4
Transcriptional regulation                        2
Plasmids                                          2
Central metabolism                                2
Autotrophy                                        2
Arabinose Transport                               1
FQ8D8DZ01AWR9I has one hit:
xxx07431423 (fig|448385.11.peg.379) DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6)
RNA polymerase bacterial
FQ8D8DZ02G8RSI has two hits:
xxx02998721 3e-04 “hypothetical protein”
xxx05921978 4e-03 “Fibrinogen-binding protein”
Fibrinogen-binding protein is in subsystem “Streptococcus pyogenes virulome”
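The two-hit read shows why the choice of hit matters. The sketch below uses the values from the slide; the lookup table `FUNCTION_TO_SUBSYSTEMS` and the function `subsystems_for_hits` are hypothetical stand-ins, not MG-RAST's actual procedure.

```python
# Hypothetical lookup: function name -> subsystems it belongs to.
FUNCTION_TO_SUBSYSTEMS = {
    "Fibrinogen-binding protein": ["Streptococcus pyogenes virulome"],
}

# (hit id, e-value, function) for read FQ8D8DZ02G8RSI, as shown on the slide.
HITS = [
    ("xxx02998721", 3e-04, "hypothetical protein"),
    ("xxx05921978", 4e-03, "Fibrinogen-binding protein"),
]


def subsystems_for_hits(hits, lookup=FUNCTION_TO_SUBSYSTEMS):
    """Return {function: subsystems} for every hit whose function is in a subsystem."""
    result = {}
    for _hit_id, _evalue, function in sorted(hits, key=lambda h: h[1]):
        subsystems = lookup.get(function, [])
        if subsystems:
            result[function] = subsystems
    return result


if __name__ == "__main__":
    best = min(HITS, key=lambda h: h[1])
    print("Best hit by e-value:", best[2])
    print("Hits with a subsystem:", subsystems_for_hits(HITS))
```

Here the hit with the lower e-value is a hypothetical protein with no subsystem, so a best-hit-only rule would leave the read unclassified, while considering all hits places it in a virulence subsystem.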
- Ammonia assimilation
- Ammonium metabolism H. pylori
- Glutamine, Glutamate, Aspartate and Asparagine Biosynthesis
- Iron-sulfur experimental
FQ8D8DZ02GF820 has 250 hits:
207 hits: Glutamate synthase [NADPH] large chain (EC 1.4.1.13)
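A rough sketch of what these numbers imply, assuming (the slide does not state it explicitly) that the four subsystems listed above are the ones containing Glutamate synthase [NADPH] large chain. All names below are illustrative.

```python
TOTAL_HITS = 250     # hits for read FQ8D8DZ02GF820
FUNCTION_HITS = 207  # hits to Glutamate synthase [NADPH] large chain

# Assumption: these four subsystems all contain the function.
SUBSYSTEMS = [
    "Ammonia assimilation",
    "Ammonium metabolism H. pylori",
    "Glutamine, Glutamate, Aspartate and Asparagine Biosynthesis",
    "Iron-sulfur experimental",
]


def count_per_subsystem(function_hits, subsystems):
    """Credit every hit to every subsystem that contains the function."""
    return {name: function_hits for name in subsystems}


if __name__ == "__main__":
    counts = count_per_subsystem(FUNCTION_HITS, SUBSYSTEMS)
    print(f"Share of hits to one function: {FUNCTION_HITS / TOTAL_HITS:.1%}")
    print(f"Credited subsystem hits: {sum(counts.values())} from {TOTAL_HITS} real hits")
```

Crediting each hit to every subsystem produces more subsystem assignments than there are hits, so subsystem totals are not comparable across runs unless the same counting rule is used each time.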
Does it matter?
- Compare things that are the same!
- Know which version of the database you used
- Recompute if you are not sure!
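These three rules can be enforced by recording the database version alongside every result. The following is a minimal sketch; the `AnnotationRun` record, its fields, and the version labels are hypothetical and do not correspond to any real MG-RAST or SEED metadata format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationRun:
    sample: str
    database: str
    db_version: str  # hypothetical label, e.g. a release date or number


def comparable(a, b):
    """Two runs are comparable only if they used the same database version."""
    return (a.database, a.db_version) == (b.database, b.db_version)


def needs_recompute(run, current_version):
    """Recompute when the recorded version is missing or out of date."""
    return not run.db_version or run.db_version != current_version


if __name__ == "__main__":
    old = AnnotationRun("sample_A", "SEED", "v1")
    new = AnnotationRun("sample_B", "SEED", "v2")
    print("Comparable:", comparable(old, new))
    print("Rerun sample_A:", needs_recompute(old, "v2"))
```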