Problems with metagenome annotation. How much has been sequenced?



slide-1
SLIDE 1

Problems with metagenome annotation

slide-2
SLIDE 2

How much has been sequenced?

[Figure: number of known sequences by year, with markers for the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing.]

If the database doubles every 15 months, how often do you need to rerun your sample?
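
The question can be made concrete with a little arithmetic. The sketch below assumes pure exponential growth with the 15-month doubling time quoted on the slide; the function names are illustrative and not part of any tool.

```python
import math

DOUBLING_MONTHS = 15  # doubling time stated on the slide


def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """How many times larger the database is after `months` months."""
    return 2 ** (months / doubling_months)


def fraction_unseen(months, doubling_months=DOUBLING_MONTHS):
    """Fraction of today's database that did not exist `months` months ago."""
    return 1 - 1 / growth_factor(months, doubling_months)


def months_until_growth(factor, doubling_months=DOUBLING_MONTHS):
    """Months until the database reaches `factor` times its current size."""
    return doubling_months * math.log2(factor)


for m in (6, 15, 30):
    print(f"{m:>2} months: x{growth_factor(m):.2f}, "
          f"{fraction_unseen(m):.0%} of sequences are new")
```

Under this assumption, an annotation that is 15 months old was computed against only half of the sequences in the database today.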
slide-3
SLIDE 3

Long Queues

slide-4
SLIDE 4

MG-RAST speed does not depend on metagenome (MG) size!

Days to weeks Minutes to seconds

slide-5
SLIDE 5
slide-6
SLIDE 6

The SEED database

  • Started with a few subsystems
slide-7
SLIDE 7

Over 2,000 subsystems

  • Unmanageable!
  • Needed a solution so the annotators could find their subsystems.
  • Created hierarchy
slide-8
SLIDE 8

Three-level “hierarchy”

  • Amino Acids and Derivatives
    – Alanine, serine, and glycine
      • Serine Biosynthesis
  • Amino Acids and Derivatives
    – Lysine, threonine, methionine, and cysteine
      • Methionine Biosynthesis

Over 2,000 Subsystems
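
One way to picture the three levels is as nested lookups. The sketch below uses only the two paths shown on the slide; the nested-dictionary layout and the function names are assumptions made for illustration, not the SEED's actual storage format.

```python
# The two three-level paths shown on the slide:
# (level 1, level 2, subsystem)
PATHS = [
    ("Amino Acids and Derivatives",
     "Alanine, serine, and glycine",
     "Serine Biosynthesis"),
    ("Amino Acids and Derivatives",
     "Lysine, threonine, methionine, and cysteine",
     "Methionine Biosynthesis"),
]


def build_hierarchy(paths):
    """Nest (level1, level2, subsystem) tuples into dict -> dict -> list."""
    tree = {}
    for level1, level2, subsystem in paths:
        tree.setdefault(level1, {}).setdefault(level2, []).append(subsystem)
    return tree


def find_subsystem(tree, name):
    """Return the (level1, level2) path for a subsystem, or None if absent."""
    for level1, children in tree.items():
        for level2, subsystems in children.items():
            if name in subsystems:
                return level1, level2
    return None


tree = build_hierarchy(PATHS)
print(find_subsystem(tree, "Methionine Biosynthesis"))
```

With the hierarchy in place, an annotator can reach a subsystem by walking two levels instead of scanning a flat list of more than 2,000 names.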

slide-9
SLIDE 9

Classification | # SS | Classification | # SS | Classification | # SS
Experimental Subsystems | 498 | Regulation and Cell signaling | 51 | Motility and Chemotaxis | 11
Clustering-based subsystems | 352 | Virulence | 49 | Plant cell walls and outer surfaces | 10
Carbohydrates | 160 | Stress Response | 43 | Phages | 10
Cofactors, Vitamins, Prosthetic Groups, Pigments | 123 | DNA Metabolism | 41 | Cell Division and Cell Cycle | 10
Amino Acids and Derivatives | 96 | Aromatic Compounds | 38 | Photosynthesis | 9
Protein Metabolism | 95 | Phages | 36 | Metabolite damage | 8
Virulence, Disease, Defense | 70 | Secondary Metabolism | 34 | Phosphorus Metabolism | 7
Miscellaneous | 70 | Iron acquisition and metabolism | 31 | Potassium metabolism | 4
RNA Metabolism | 65 | Nucleosides and Nucleotides | 24 | Transcriptional regulation | 2
Membrane Transport | 65 | Sulfur Metabolism | 20 | Plasmids | 2
Respiration | 62 | Dormancy and Sporulation | 17 | Central metabolism | 2
Cell Wall and Capsule | 62 | Plant-prokaryote | 12 | Autotrophy | 2
Fatty Acids, Lipids, and | 60 | Nitrogen Metabolism | 12 | Arabinose Transport | 1

slide-10
SLIDE 10

FQ8D8DZ01AWR9I
One hit: xxx07431423 (fig|448385.11.peg.379)
DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6)
RNA polymerase bacterial

slide-11
SLIDE 11

FQ8D8DZ02G8RSI has two hits:
xxx02998721 3e-04 “hypothetical protein”
xxx05921978 4e-03 “Fibrinogen-binding protein”
Fibrinogen-binding protein is in subsystem “Streptococcus pyogenes virulome”
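
This read shows why the choice of hit matters: the hit with the lower E-value has no subsystem, while the weaker hit does. The sketch below contrasts two selection policies on the two hits from the slide. Both policies and all names in the code are hypothetical illustrations, not a description of how MG-RAST assigns reads.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "id evalue function")

# The two hits for read FQ8D8DZ02G8RSI, as listed on the slide.
HITS = [
    Hit("xxx02998721", 3e-04, "hypothetical protein"),
    Hit("xxx05921978", 4e-03, "Fibrinogen-binding protein"),
]

# Function -> subsystems lookup, limited to what the slide states.
SUBSYSTEMS = {
    "Fibrinogen-binding protein": ["Streptococcus pyogenes virulome"],
}


def best_hit(hits):
    """Policy A (hypothetical): take the hit with the lowest E-value."""
    return min(hits, key=lambda h: h.evalue)


def best_hit_in_subsystem(hits, subsystems):
    """Policy B (hypothetical): lowest E-value among hits with a subsystem."""
    candidates = [h for h in hits if h.function in subsystems]
    return min(candidates, key=lambda h: h.evalue) if candidates else None


a = best_hit(HITS)
b = best_hit_in_subsystem(HITS, SUBSYSTEMS)
print(a.function, SUBSYSTEMS.get(a.function, []))
print(b.function, SUBSYSTEMS.get(b.function, []))
```

Policy A leaves the read without a subsystem; policy B places it in a virulence subsystem on the strength of a weaker hit. Two analyses that differ only in this choice will report different profiles for the same sample.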

slide-12
SLIDE 12
  • Ammonia assimilation
  • Ammonium metabolism H. pylori
  • Glutamine, Glutamate, Aspartate and Asparagine Biosynthesis
  • Iron-sulfur experimental

FQ8D8DZ02GF820
207 hits: Glutamate synthase [NADPH] large chain (EC 1.4.1.13)
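
Assuming the four items listed on this slide are the subsystems that contain this function, a single read can land in several subsystems at once. The sketch below shows two ways such a read might be counted. Both counting policies and all names are hypothetical and are given only to show that the choice changes the totals.

```python
from collections import Counter

# Function and subsystem names as they appear on the slide.
FUNCTION = "Glutamate synthase [NADPH] large chain (EC 1.4.1.13)"
SUBSYSTEMS = [
    "Ammonia assimilation",
    "Ammonium metabolism H. pylori",
    "Glutamine, Glutamate, Aspartate and Asparagine Biosynthesis",
    "Iron-sulfur experimental",
]


def count_full(subsystems):
    """Hypothetical policy: the read counts once in every subsystem."""
    return Counter({s: 1.0 for s in subsystems})


def count_split(subsystems):
    """Hypothetical policy: the read's single count is divided evenly."""
    share = 1.0 / len(subsystems)
    return Counter({s: share for s in subsystems})


full = count_full(SUBSYSTEMS)
split = count_split(SUBSYSTEMS)
print(sum(full.values()), sum(split.values()))
```

Full counting makes one read contribute four counts in total, while split counting keeps the total at one. Profiles built with different policies are not directly comparable.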

slide-13
SLIDE 13

FQ8D8DZ02GF820 has 250 hits:

slide-14
SLIDE 14

Does it matter?

  • Compare things that are the same!
  • Know which version of the database you used
  • Recompute if you are not sure!
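
These three rules can be enforced mechanically by storing the database version with every result. The sketch below is one minimal way to do that; the `Annotation` record, the version labels, and the counts are all made up for illustration and do not come from any real run.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    sample: str
    database: str
    db_version: str  # hypothetical label, e.g. a release name or date
    counts: dict     # subsystem -> number of reads


def comparable(a, b):
    """Two runs are comparable only if database and version both match."""
    return (a.database, a.db_version) == (b.database, b.db_version)


def compare(a, b):
    """Per-subsystem difference (b minus a); raises on a version mismatch."""
    if not comparable(a, b):
        raise ValueError(
            f"{a.sample} used {a.database} {a.db_version}, "
            f"{b.sample} used {b.database} {b.db_version}: recompute first"
        )
    keys = sorted(set(a.counts) | set(b.counts))
    return {k: b.counts.get(k, 0) - a.counts.get(k, 0) for k in keys}


# Illustrative values only.
old = Annotation("sample_A", "SEED", "v1", {"Respiration": 10})
new = Annotation("sample_B", "SEED", "v2", {"Respiration": 14})
same = Annotation("sample_C", "SEED", "v1", {"Respiration": 12})

print(compare(old, same))
```

Calling `compare(old, new)` raises an error instead of returning a difference, which is the "recompute if you are not sure" rule expressed as a guard.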