Metagenomics What is metagenomics Cloning genes from the - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Metagenomics

slide-2
SLIDE 2

What is metagenomics

  • Cloning genes from the environment, screening for function

  • 16S sequencing
  • Random community genomics
  • Eukaryotic metagenomics
slide-3
SLIDE 3

Screening from the environment

  • Random fragments of DNA
  • Clone into a vector

    – Low copy vectors
    – BACs
    – YACs

slide-4
SLIDE 4

BACs

Science Creative Quarterly

slide-5
SLIDE 5

Screening from the environment

  • Random fragments of DNA
  • Clone into a vector

    – Low copy vectors
    – BACs
    – YACs

  • Screen for a phenotype
  • e.g. Diversa patents > 1,000 amylase genes

Why did Diversa sequence whale-falls?

slide-6
SLIDE 6

Screening from the environment

  • Expression host?
  • Pathway or single gene?
  • Get what you select
  • But remember …

A selection is worth a thousand screens

slide-7
SLIDE 7

16S sequencing

  • Catalogs the bacteria that are present
  • PCR amplify the 16S gene with standard primers
  • Sequence the amplicons (the PCR products)
  • Compare to known databases
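
The final step, comparing a sequenced amplicon to known sequences, can be illustrated with a toy k-mer comparison. This is only a sketch: the reference sequences and taxon names below are made up, and real workflows rely on alignment or trained classifiers against curated 16S databases rather than this simple Jaccard score.

```python
def kmers(seq, k=4):
    """Return the set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(read, references, k=4):
    """Return (name, score) of the reference whose k-mers best match the read."""
    read_kmers = kmers(read, k)
    scored = {
        name: jaccard(read_kmers, kmers(seq, k))
        for name, seq in references.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical reference "database"; these are not real 16S sequences.
REFERENCES = {
    "taxon_A": "ACGTACGTGGCCTTAA",
    "taxon_B": "TTTTGGGGCCCCAAAA",
}

name, score = best_match("ACGTACGTGGCC", REFERENCES)
print(name, round(score, 3))
```

The k-mer size of 4 is chosen only to keep the example small; short k-mers are far too unspecific for real reads.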
slide-8
SLIDE 8

Ribosomes

Ribosomes are made of proteins and RNA

Prokaryotic ribosome:
  • Large subunit: 50S (5S and 23S rRNA)
  • Small subunit: 30S (16S rRNA)

slide-9
SLIDE 9

Blue: protein Orange: rRNA

30S Thermus aquaticus subunit

slide-10
SLIDE 10

E. coli 16S rRNA secondary structure

  • Highly conserved
  • Base pairs = stems
  • No pairing = loops
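
One common way to write down stems and loops is dot-bracket notation, where matching parentheses mark paired bases and dots mark unpaired ones. The notation is not used on the slide and is brought in here only for illustration; the structure string below is a made-up hairpin, not the E. coli 16S structure.

```python
def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)          # opening half of a base pair
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()          # most recent unmatched opening base
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def count_paired_unpaired(structure):
    """Return (bases in stems, bases in loops) for a dot-bracket string."""
    paired = len(pair_table(structure))
    return paired, len(structure) - paired


# Hypothetical hairpin: a 4 bp stem closing a 4-base loop.
hairpin = "((((....))))"
print(pair_table(hairpin)[0], count_paired_unpaired(hairpin))
```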
slide-11
SLIDE 11

Variable regions in the 16S rRNA

E. coli 16S rRNA secondary structure

  • Vn: 9 variable regions
  • Forward/reverse primers

V1 V2 V3 V4 V5 V6 V7 V8 V9

slide-12
SLIDE 12

16S Primers

  • 27F – 1492R: full length, 1,465 base pairs
  • 967F – 1046R: V6 region, 79 base pairs
  • 1380F – 1510R: V9 region, 130 base pairs
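
The base-pair figures on this slide match a simple subtraction of the primer positions. The sketch below assumes that the number in each primer name is its position on the 16S gene and that the quoted length is just reverse minus forward; it ignores primer length and inclusive counting.

```python
# Primer pairs from the slide: (forward position, reverse position).
PRIMER_PAIRS = {
    "full length": (27, 1492),
    "V6 region": (967, 1046),
    "V9 region": (1380, 1510),
}


def amplicon_span(forward_pos, reverse_pos):
    """Distance between the forward and reverse primer positions."""
    if reverse_pos <= forward_pos:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse_pos - forward_pos


SPANS = {name: amplicon_span(f, r) for name, (f, r) in PRIMER_PAIRS.items()}

for name, span in SPANS.items():
    print(f"{name}: {span} base pairs")
```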

slide-13
SLIDE 13

Variable regions = Variable results!

Panels: V1-V3, V3-V5, V6-V9

slide-14
SLIDE 14

16S databases

  • Greengenes
    – http://greengenes.lbl.gov/
    – Gary Andersen, Lawrence Berkeley National Laboratory
  • SILVA – ARB
    – http://www.arb-silva.de/
    – Frank Oliver Glöckner, MPI, Bremen, Germany
  • VAMPS
    – http://vamps.mbl.edu/
    – Mitch Sogin, Woods Hole, USA
  • Ribosomal Database Project (RDP)
    – http://rdp.cme.msu.edu/
    – James Cole, Michigan State University, USA

slide-15
SLIDE 15

16S sequencing

  • Cheap
  • Easy
  • Portable
  • PCR bias
  • Variable regions give variable answers
  • Only tells you which organisms are present & abundance
  • Does not explain much of the variance of the data

What does 16S sequencing actually tell you?

slide-16
SLIDE 16

What does 16S sequencing tell you?

slide-17
SLIDE 17

What does 16S sequencing tell you?

slide-18
SLIDE 18

What is metagenomics

  • Cloning genes from the environment, screening for function

  • 16S sequencing
  • Random community genomics
  • Eukaryotic metagenomics
slide-19
SLIDE 19

16S sequencing is not good for functions

slide-20
SLIDE 20

How much of the data?

Findley et al, Nature 2013 doi: 10.1038/nature12171

Topography of [fungi and] bacteria on the skin

Study =
  • 5,000 taxa
  • 14 skin sites
  • 10 people
  • 3 skin types
  • 5,000 variables

They don't explain the meaning of j-q

The remainder of the variance (85.1%) is explained by a few taxa each

Each dimension only adds marginal information
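
If 85.1% of the variance is left over, the axes shown account for the other 14.9%. The sketch below shows how a running total of variance explained is usually tallied as axes are added. The per-axis fractions are hypothetical and are not taken from Findley et al.

```python
from itertools import accumulate


def cumulative_explained(fractions):
    """Running total of the variance explained as axes are added."""
    return list(accumulate(fractions))


def axes_needed(fractions, target):
    """Number of leading axes needed for the running total to reach target."""
    for count, total in enumerate(cumulative_explained(fractions), start=1):
        if total >= target:
            return count
    return None  # target never reached with the axes supplied


# From the slide: 85.1% of the variance remains, so the rest is 14.9%.
plotted_percent = round(100.0 - 85.1, 1)

# Hypothetical per-axis fractions, NOT values from the paper.
example = [0.08, 0.04, 0.03, 0.02, 0.01]
remainder = 1.0 - sum(example)

print(plotted_percent, axes_needed(example, 0.10), round(remainder, 2))
```

When each axis adds only a small fraction, many axes are needed before the running total becomes large, which is the point the slide makes about marginal information.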

slide-21
SLIDE 21

How much of the data?

Nine biomes paper
Dinsdale et al., Nature 2008 doi:10.1038/nature06810

Variance:
  • 1,040,665 reads total (from 45 samples)
  • 30 subsystems
  • 9 biomes
  • 30 variables

Fewer of the variables explain more of the data

The variables are distinctive for each environment

slide-22
SLIDE 22

Shotgun sequencing (HiSeq)

Movies courtesy Will Trimble, Argonne National Labs http://www.mcs.anl.gov/~trimble/flowcell/

slide-23
SLIDE 23

16S sequencing (MiSeq)

Movies courtesy Will Trimble, Argonne National Labs http://www.mcs.anl.gov/~trimble/flowcell/

slide-24
SLIDE 24

Shotgun + 16S (HiSeq)

Movies courtesy Will Trimble, Argonne National Labs http://www.mcs.anl.gov/~trimble/flowcell/

slide-25
SLIDE 25

There is no 16S for viruses

Rohwer and Edwards, 2002. The phage proteomic tree. doi: 10.1128/JB.184.16.4529-4535.2002

slide-26
SLIDE 26

Random community genomics

  • 200 liters water, 5-500 g fresh fecal matter
  • Concentrate and purify viruses or bacteria
  • Epifluorescent Microscopy
  • Extract nucleic acids
  • DNA/RNA
  • LASL
  • Sequence

slide-27
SLIDE 27

How do you sequence the environment?

  • Extract DNA
    – Soil extraction kit
    – Water extraction kit
  • Create library
    – LASLs
    – fosmids
  • Sequence fragments

slide-28
SLIDE 28

Linker-Amplified Shotgun Libraries (LASLs)

  • Hydroshear
  • Blunt-ending
  • Addition of Linkers
  • Amplification of Fragments

This method produces high coverage libraries of over 1 million clones from as little as 1 ng DNA

Soil Extraction Kit

David Mead -

Breitbart (2002) PNAS

slide-29
SLIDE 29

Early Attempts at a Metagenomics Platform

  • http://phage.sdsu.edu/~rob/cgi-bin/remoteblast.cgi
  • Submit BLAST to local and remote databases
    – Local (as fast as possible)
    – NCBI (one search every 3 seconds)
  • Many concurrent searches
    – One search versus 1,000 searches
  • Parse data into tables for Excel
    – Access to taxonomy etc
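
The "one search every 3 seconds" rule for remote searches amounts to a minimum-interval rate limiter. The sketch below is not the code behind remoteblast.cgi: `MinIntervalLimiter`, `submit_all`, and the `send` callback are hypothetical names, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time


class MinIntervalLimiter:
    """Allow at most one request per `interval` seconds."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request sent yet

    def wait(self):
        """Sleep until a request is allowed, reserve the slot, return the delay."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay


def submit_all(queries, send, limiter):
    """Send each query through `send`, pausing as the limiter requires."""
    results = []
    for query in queries:
        limiter.wait()
        results.append(send(query))
    return results


if __name__ == "__main__":
    # Stand-in for a remote search: echoes the query instead of contacting a server.
    limiter = MinIntervalLimiter(3.0)
    print(submit_all(["seq1", "seq2"], lambda q: f"searched {q}", limiter))
```

A local database would simply skip the limiter, which is the "as fast as possible" case on the slide.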

slide-30
SLIDE 30

Human-associated viruses

  • More bacteria than somatic cells by at least an order of magnitude
  • More phages than bacteria by an order of magnitude
  • Sample the bacteria in the intestine by sampling their phage

slide-31
SLIDE 31

Most Viral DNA Sequences in Adult Human Feces are Unknown Phages

  • Known 40%, Unknown 60%
  • Phages 94%, Eukaryotic Viruses 6%

Breitbart (2003) J. Bacteriol.

slide-32
SLIDE 32

Abundance of viruses in twins

Reyes et al, Nature 2010

slide-33
SLIDE 33

Microbial samples in guts don't change very much

Reyes et al, Nature 2010

Abundance of viruses in twins

slide-34
SLIDE 34

Phage samples in guts change a lot

Reyes et al, Nature 2010

Abundance of viruses in twins

slide-35
SLIDE 35

Microbial Phage

Reyes et al, Nature 2010

Abundance of viruses in twins

slide-36
SLIDE 36

Most Human RNA Viruses are Known

  • Known 92%, Unknown 8%
  • Pepper Mild Mottle Virus 65%, Other Plant Viruses 9%, Other 26%

Zhang (2006) PLoS Biology

slide-37
SLIDE 37

Pepper Mild Mottle Virus (PMMV)

  • ssRNA virus; ≈6 kb genome
  • Related to Tobacco Mosaic Virus
  • Infects members of Capsicum family
  • Widely distributed – spread through seeds
  • Fruits are small, malformed, mottled
  • Rod-shaped virions

TOBACCO MOSAIC VIRUS http://www.rothamsted.bbsrc.ac.uk/ppi/links/pplinks/virusems/

Viral particles in fecal sample

slide-38
SLIDE 38

PMMV is common in Human Feces

S1 S2 S3 S4 S5 S6 S7 S8 S9 PMMV

  • Fecal samples
  • Extract total RNA
  • RT-PCR for PMMV

  • San Diego: 78% of people are positive
  • Singapore: 67% of people are positive
  • 10-50 fold increase in feces compared to food
  • 10^6-10^9 PMMV copies per gram dry weight of feces

slide-39
SLIDE 39

Which Foods Contain PMMV?

  • Indian curry
  • Pork noodle red chili
  • Chicken rice
  • Chinese food
  • Hong Kong chili sauce
  • Hong Kong green chili
  • Vegetarian chili
  • Chili powder
  • Chili sauces

NOT FOUND IN FRESH PEPPERS

slide-40
SLIDE 40

Rosario et al. AEM (2009)

PMMV is Present at High Concentrations in Raw Sewage and Treated Wastewater

slide-41
SLIDE 41

[Figure: tree of PMMV sequences (reads and contigs from Lib1, Lib2, and Lib3 alongside GenBank reference accessions), grouped into clades I to V; scale bar 0.1]

Library 1 Library 2 Library 3 Same person 6 months apart

  • Diverse populations
  • Differences between individuals and over time

Different PMMV families

slide-42
SLIDE 42

Human-fecal borne PMMV can infect plants

Figure labels: Infected leaf, Control, Fecal sample, Total RNA, PMMV RT-PCR, Viral concentrate, Plant leaf inoculation

  • Spread of infection to Hungarian wax pepper evident within 1 week
  • Infected leaf was positive by RT-PCR for PMMV
  • Animals may serve as vectors for plant viruses

slide-43
SLIDE 43

Thesunmachine.net http://www.sweatnspice.com

Koch’s Postulates

slide-44
SLIDE 44

Random community genomics

slide-45
SLIDE 45

Eukaryotic metagenomics

  • ITS sequences

– Internal transcribed spacer regions

– http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4113289/

  • Individual genes

– Cox1

  • Exome sequencing

– Pull out ESTs and sequence

slide-46
SLIDE 46

Why Metagenomics?

  • What is there?
  • How many are there?
  • What are they doing?
  • Experimental manipulations
  • Diagnostics

slide-47
SLIDE 47

Sequencing costs decreasing

http://genome.gov/sequencingcosts

slide-48
SLIDE 48

How much has been sequenced?

[Figure: Number of known sequences by Year, annotated with: First bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, Environmental sequencing]

slide-49
SLIDE 49

How much will be sequenced?

[Figure: projection by Year, annotated with: Everybody in San Diego, Everybody in USA, All cultured Bacteria, 100 people, One genome from every species, Most major microbial environments]

slide-50
SLIDE 50

Most pipelines work the same way!

slide-51
SLIDE 51

Metagenomics Processing

  • Binning reads
  • Contamination removal
  • Contig Clustering
  • Functional Assignments
  • Gene Prediction
  • Merge paired-end reads
  • Preprocessing
  • Taxonomic assignments
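
The idea that pipelines share the same shape can be sketched as a list of stages applied in turn. Everything here is an assumption made for illustration: the slide lists the steps without an order, only two of them are mocked up, and the toy filters stand in for real tools.

```python
def run_pipeline(reads, stages):
    """Pass the reads through each stage in order and log how many survive."""
    log = []
    for name, stage in stages:
        reads = stage(reads)
        log.append((name, len(reads)))
    return reads, log


def preprocess(reads, min_length=5):
    """Toy preprocessing: drop reads that are too short or contain an N."""
    return [r for r in reads if len(r) >= min_length and "N" not in r]


def remove_contamination(reads, contaminant="GATTACA"):
    """Toy contamination removal: drop reads containing a known contaminant."""
    return [r for r in reads if contaminant not in r]


# Hypothetical stage order; later steps would be appended the same way.
STAGES = [
    ("preprocessing", preprocess),
    ("contamination removal", remove_contamination),
]

reads = ["ACGTACGT", "ACG", "ACGTNACGT", "TTGATTACATT", "CCCCGGGG"]
kept, log = run_pipeline(reads, STAGES)
print(kept, log)
```

Keeping each step as a plain function with the same input and output type is what lets one tool be swapped for another without touching the rest of the chain.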

slide-52
SLIDE 52

Metagenomics

  • Quality control
    – Prinseq
    – Deconseq
  • Annotation
    – FOCUS
    – Real time metagenomics
    – mg-rast
    – Super FOCUS
  • Statistics
    – STAMP
  • Population genomes
    – crAss
    – metabat
    – ContigClustering

slide-53
SLIDE 53

Metagenomics Processing

  • Contig clustering: AbundanceBin, CompostBin, concoct, crAss, tetra
  • Gene Prediction: FragGeneScan, GlimmerMG, MetaGeneAnnotator, MetaGeneMark, MetaGun, Orphelia, Prodigal
  • Preprocessing: FASTQC, FastX Toolkit, fitGCP, NGS QC Toolkit, Non-pareil, Prinseq, QC-Chain, Streaming Trim
  • Taxonomic assignment: CARMA, myTaxa, FOCUS, PhylopythiaS, KRAKEN, phymmbl, LMAT, RAIphy, MEGAN, TACOA, Metaplan, Taxy, CLAMS, Sequedex, DiScRIBinATE, SORT-ITEMS, genometa, SPANNER, GSMer, SPHINX, PPLACER, TaxSOM, RTMg, Treephyler
  • Functional assignment