Metagenomics
What is metagenomics
- Cloning genes from the environment, screening for function
- 16S sequencing
- Random community genomics
- Eukaryotic metagenomics
Screening from the environment
- Random fragments of DNA
- Clone into a vector
– Low copy vectors – BACs – YACs
BACs
Science Creative Quarterly
Screening from the environment
- Random fragments of DNA
- Clone into a vector
– Low copy vectors – BACs – YACs
- Screen for a phenotype
- e.g. Diversa patents > 1,000 amylase genes
Why did Diversa sequence whale-falls?
Screening from the environment
- Expression host?
- Pathway or single gene?
- Get what you select
- But remember …
A selection is worth a thousand screens
16S sequencing
- Catalogs the bacteria that are present
- PCR amplify the 16S gene with standard primers
- Sequence the amplicons
- Compare to known databases
Prokaryotic ribosome:
- Large subunit: 50S (5S and 23S rRNA)
- Small subunit: 30S (16S rRNA)
Ribosomes
Ribosomes are made of proteins and RNA
Blue: protein Orange: rRNA
30S Thermus aquaticus subunit
E. coli 16S rRNA secondary structure
- Highly conserved
- Base pairs = stems
- No pairing = loops
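The stem/loop distinction can be made concrete with a small sketch. It assumes the structure is written in dot-bracket notation (matching parentheses for paired bases, dots for unpaired bases), which the slides do not use; the function name `stems_and_loops` and the toy hairpin are illustrative only and are not a real 16S structure.

```python
def stems_and_loops(structure: str):
    """Split a dot-bracket string into paired positions (stems)
    and unpaired positions (loops). Positions are 0-based."""
    open_positions = []   # stack of unmatched "(" positions
    pairs = []            # (i, j) tuples of paired bases
    unpaired = []         # positions with no partner
    for position, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(position)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((open_positions.pop(), position))
        elif symbol == ".":
            unpaired.append(position)
        else:
            raise ValueError(f"unexpected symbol {symbol!r}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs), unpaired


# Toy hairpin (hypothetical): a 4 bp stem closing a 4 nt loop.
toy_hairpin = "((((....))))"
pairs, unpaired = stems_and_loops(toy_hairpin)
print("stem base pairs:", pairs)
print("loop positions:", unpaired)
```

In this toy case the four outer positions on each side form the stem and the four central positions form the loop.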
Variable regions in the 16S rRNA. Vn – 9 regions, forward/reverse primers
V1 V2 V3 V4 V5 V6 V7 V8 V9
16S Primers
- 27F – 1492R full length: 1,465 base pairs
- 967F – 1046R V6 region: 79 base pairs
- 1380F – 1510R V9 region: 130 base pairs
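The base-pair figures follow from the primer names, which encode a position in the E. coli 16S gene plus a direction (F or R). A minimal sketch of that arithmetic is below; the helper names `primer_position` and `amplicon_span` are made up for illustration, and the span is a plain difference of positions, so a real amplicon length will also depend on primer lengths and on the organism.

```python
import re


def primer_position(name: str) -> int:
    """Extract the numeric position from a primer name such as '27F'."""
    match = re.fullmatch(r"(\d+)([FR])", name)
    if match is None:
        raise ValueError(f"unrecognised primer name: {name!r}")
    return int(match.group(1))


def amplicon_span(forward: str, reverse: str) -> int:
    """Approximate span between a forward and a reverse primer."""
    return primer_position(reverse) - primer_position(forward)


primer_pairs = {
    "full length": ("27F", "1492R"),
    "V6 region": ("967F", "1046R"),
    "V9 region": ("1380F", "1510R"),
}
spans = {
    region: amplicon_span(forward, reverse)
    for region, (forward, reverse) in primer_pairs.items()
}
for region, span in spans.items():
    print(f"{region}: {span} base pairs")
```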
Variable regions = Variable results!
V1-V3, V3-V5, V6-V9
16S databases
- Greengenes
– http://greengenes.lbl.gov/ – Gary Andersen, Lawrence Berkeley National Laboratory
- SILVA – ARB
– http://www.arb-silva.de/ – Frank Oliver Glöckner, MPI, Bremen, Germany
- VAMPS
– http://vamps.mbl.edu/ – Mitch Sogin, Woods Hole, USA
- Ribosomal Database Project (RDP)
– http://rdp.cme.msu.edu/ – James Cole, Michigan State University, USA
16S sequencing
- Cheap
- Easy
- Portable
- PCR bias
- Variable regions give variable answers
- Only tells you which organisms are present & abundance
- Does not explain much of the variance of the data
What does 16S sequencing actually tell you?
What is metagenomics
- Cloning genes from the environment, screening for function
- 16S sequencing
- Random community genomics
- Eukaryotic metagenomics
16S sequencing is not good for functions
How much of the data?
Findley et al, Nature 2013 doi: 10.1038/nature12171
Topography of [fungi and] bacteria on the skin
Study:
- 5,000 taxa
- 14 skin sites
- 10 people
- 3 skin types
- 5,000 variables
They don't explain the meaning of j-q
The remainder of the variance (85.1%) is explained by a few taxa each
Each dimension only adds marginal information
How much of the data?
Nine biomes paper
Dinsdale et al., Nature 2008 doi:10.1038/nature06810
Variance:
- 1,040,665 reads total (from 45 samples)
- 30 subsystems
- 9 biomes
- 30 variables
Fewer of the variables explain more of the data
The variables are distinctive for each environment
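The contrast between many weak variables and a few strong ones can be sketched as a cumulative "fraction of variance explained" calculation. The per-dimension variances below are invented toy numbers, not values from either paper, and `components_needed` is a hypothetical helper name.

```python
from itertools import accumulate


def explained_fractions(component_variances):
    """Fraction of the total variance carried by each dimension."""
    total = sum(component_variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    return [variance / total for variance in component_variances]


def components_needed(component_variances, target):
    """Smallest number of dimensions (largest first) whose combined
    fraction of variance reaches the target."""
    fractions = sorted(explained_fractions(component_variances), reverse=True)
    for count, cumulative in enumerate(accumulate(fractions), start=1):
        if cumulative >= target - 1e-12:  # tolerance for float rounding
            return count
    return len(fractions)


# Toy data (assumed): 100 dimensions that each carry the same small share,
# versus 13 dimensions where three carry most of the variance.
many_weak = [1.0] * 100
few_strong = [50.0, 30.0, 10.0] + [1.0] * 10

weak_count = components_needed(many_weak, 0.5)
strong_count = components_needed(few_strong, 0.5)
print("many weak variables, dimensions for 50%:", weak_count)
print("few strong variables, dimensions for 50%:", strong_count)
```

When each dimension adds only marginal information, many dimensions are needed to reach the same share of variance that a single strong variable covers in the second toy data set.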
Shotgun sequencing (HiSeq)
Movies courtesy Will Trimble, Argonne National Labs http://www.mcs.anl.gov/~trimble/flowcell/
16S sequencing (MiSeq)
Shotgun + 16S (HiSeq)
There is no 16S for viruses
Rohwer and Edwards, 2002. The phage proteomic tree. doi: 10.1128/JB.184.16.4529-4535.2002
- 200 liters water or 5-500 g fresh fecal matter
- Concentrate and purify viruses or bacteria
- Epifluorescent Microscopy
- Extract nucleic acids (DNA/RNA)
- LASL
- Sequence
Random community genomics
- Extract DNA
– Soil extraction kit – Water extraction kit
- Create library
– LASLs – fosmids
- Sequence fragments
How do you sequence the environment?
- Hydroshear
- Blunt-ending
- Addition of Linkers
- Amplification of Fragments
This method produces high coverage libraries of over 1 million clones from as little as 1 ng DNA
Soil Extraction Kit
David Mead -
Breitbart (2002) PNAS
Linker-Amplified Shotgun Libraries (LASLs)
- http://phage.sdsu.edu/~rob/cgi-bin/remoteblast.cgi
- Submit BLAST to local and remote databases
– Local (as fast as possible) – NCBI (one search every 3 seconds)
- Many concurrent searches
– One search versus 1,000 searches
- Parse data into tables for Excel
– Access to taxonomy etc
Early Attempts at a Metagenomics Platform
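The "one search every 3 seconds" constraint for the remote database amounts to spacing out submissions. Below is a minimal sketch of such a throttle; it is not the code behind the remoteblast service, and `Throttle`, `run_searches`, and the `submit` callable are all hypothetical names. The clock and sleep functions are injectable so the spacing logic can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous submission, if any

    def wait(self):
        now = self._clock()
        if self._last_call is not None:
            remaining = self._min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def run_searches(queries, submit, throttle):
    """Submit each query in turn, pausing as the throttle requires.

    `submit` is a placeholder for whatever function sends one search
    and returns its result.
    """
    results = []
    for query in queries:
        throttle.wait()
        results.append(submit(query))
    return results
```

A remote run would use something like `Throttle(3.0)`, while local searches could skip the throttle entirely, which matches the local "as fast as possible" versus remote "one search every 3 seconds" split on the slide.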
- More bacteria than somatic cells by at least an order of magnitude
- More phages than bacteria by an order of magnitude
- Sample the bacteria in the intestine by sampling their phage
Human-associated viruses
Known 40% Unknown 60%
Breitbart (2003) J. Bacteriol.
Phages 94% Eukaryotic Viruses 6%
Most Viral DNA Sequences in Adult Human Feces are Unknown Phages
Abundance of viruses in twins
Reyes et al, Nature 2010
Microbial samples in guts don't change very much
Phage samples in guts change a lot
Microbial Phage
Known 92%, Unknown 8%
Pepper Mild Mottle Virus 65%, Other Plant Viruses 9%, Other 26%
Zhang (2006) PLoS Biology
Most Human RNA Viruses are Known
- ssRNA virus; ≈6 kb genome
- Related to Tobacco Mosaic Virus
- Infects members of Capsicum family
- Widely distributed – spread through seeds
- Fruits are small, malformed, mottled
- Rod-shaped virions
TOBACCO MOSAIC VIRUS http://www.rothamsted.bbsrc.ac.uk/ppi/links/pplinks/virusems/
Viral particles in fecal sample
Pepper Mild Mottle Virus (PMMV)
Gel lanes: S1 S2 S3 S4 S5 S6 S7 S8 S9 PMMV
Fecal samples: extract total RNA, then RT-PCR for PMMV
- San Diego: 78% of people are positive
- Singapore: 67% of people are positive
- 10-50 fold increase in feces compared to food
- 10^6-10^9 PMMV copies per gram dry weight of feces
PMMV is common in Human Feces
Indian curry Pork noodle red chili Chicken rice Chinese food Hong Kong chili sauce Hong Kong green chili Vegetarian chili
Chili powder Chili sauces NOT FOUND IN FRESH PEPPERS
Which Foods Contain PMMV?
Rosario et al. AEM (2009)
PMMV is Present at High Concentrations in Raw Sewage and Treated Wastewater
[Figure: phylogenetic tree (scale bar 0.1) of contigs and reads from Lib1, Lib2 and Lib3 together with GenBank reference accessions and the CoatProtein sequence, divided into groups I, II, III, IV and V]
Library 1 Library 2 Library 3 Same person 6 months apart
- Diverse populations
- Differences between individuals and over time
Different PMMV families
Infected leaf Control Fecal sample Total RNA PMMV RT-PCR Viral concentrate Plant leaf inoculation
- Spread of infection to Hungarian wax pepper evident within 1 week
- Infected leaf was positive by RT-PCR for PMMV
- Animals may serve as vectors for plant viruses
Human-fecal borne PMMV can infect plants
Thesunmachine.net http://www.sweatnspice.com
Koch’s Postulates
Random community genomics
Eukaryotic metagenomics
- ITS sequences
– Internal transcribed spacer regions
– http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4113289/
- Individual genes
– Cox1
- Exome sequencing
– Pull out ESTs and sequence
- What is there?
- How many are there?
- What are they doing?
- Experimental manipulations
- Diagnostics
Why Metagenomics?
Sequencing costs decreasing
http://genome.gov/sequencingcosts
[Figure: number of known sequences by year. Labels: first bacterial genome; 100 bacterial genomes; 1,000 bacterial genomes; environmental sequencing]
How much has been sequenced?
[Figure: sequencing milestones by year. Labels: everybody in San Diego; everybody in USA; all cultured bacteria; 100 people; one genome from every species; most major microbial environments]
How much will be sequenced?
Most pipelines work the same way!
Metagenomics Processing
- Binning reads
- Contamination removal
- Contig clustering
- Functional assignments
- Gene prediction
- Merge paired-end reads
- Preprocessing
- Taxonomic assignments
Metagenomics
- Quality control
– Prinseq – Deconseq
- Annotation
– FOCUS – Real time metagenomics – mg-rast – Super FOCUS
- Statistics
– STAMP
- Population genomes
– crAss – metabat – ContigClustering
Metagenomics Processing
- Contig clustering: AbundanceBin, CompostBin, concoct, crAss, tetra
- Gene prediction: FragGeneScan, GlimmerMG, MetaGeneAnnotator, MetaGeneMark, MetaGun, Orphelia, Prodigal
- Preprocessing: FASTQC, FastX Toolkit, fitGCP, NGS QC Toolkit, Non-pareil, Prinseq, QC-Chain, Streaming Trim
- Taxonomic assignment: CARMA, myTaxa, FOCUS, PhylopythiaS, KRAKEN, phymmbl, LMAT, RAIphy, MEGAN, TACOA, Metaplan, Taxy
- Functional assignment: CLAMS, Sequedex, DiScRIBinATE, SORT-ITEMS, genometa, SPANNER, GSMer, SPHINX, PPLACER, TaxSOM, RTMg, Treephyler