Population Genomics Image: Lisa Brown for National Public Radio Rob - - PowerPoint PPT Presentation




SLIDE 1

Population Genomics

Rob Edwards San Diego State University

Image: Lisa Brown for National Public Radio

SLIDE 2

  • GOM: 41 samples, 13 sites, 5 years
  • SAR: 1 sample, 1 site, 1 year
  • BBC: 85 samples, 38 sites, 8 years
  • ARC: 56 samples, 16 sites, 1 year
  • LI: 4 sites, 1 year

Phages in the World's Oceans

SLIDE 3

Most Marine Phage Sequences are Novel

SLIDE 4
  • 6% of SAR sequences are ssDNA phage (Chlamydia-like Microviridae)
  • 40% of viral particles in SAR are ssDNA phage
  • Several full-genome sequences were recovered via de novo assembly of these fragments
  • Confirmed by PCR and sequencing

Marine Single-Stranded DNA Viruses

SLIDE 5

12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome

[Figure: Chlamydia φ4 genome with tracks for individual sequence reads, coverage, concatenated hits, and Chl4 ORF calls]

SAR metagenome and Chlamydia φ4
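The coverage track on this slide can be read as a per-base count of how many read hits overlap each genome position. Below is a minimal sketch. It assumes each TBLASTX hit has already been converted to a 0-based, half-open (start, end) interval in genome coordinates. `coverage_depth` is a hypothetical helper, not the code used to draw the figure.

```python
from itertools import accumulate


def coverage_depth(hits, genome_length):
    """Per-position depth from hit intervals (hypothetical helper).

    hits: iterable of (start, end) pairs, 0-based and half-open.
    Intervals are clipped to the genome. A difference array keeps the cost
    O(len(hits) + genome_length) instead of touching every base of every hit.
    """
    diff = [0] * (genome_length + 1)
    for start, end in hits:
        start, end = max(0, start), min(genome_length, end)
        if start < end:
            diff[start] += 1  # a hit starts covering here
            diff[end] -= 1    # and stops covering here
    # The running sum of the difference array is the depth at each position
    return list(accumulate(diff[:genome_length]))
```

For example, `coverage_depth([(0, 3), (2, 5)], 6)` returns `[1, 1, 2, 1, 1, 0]`. The linear cost matters when thousands of reads are stacked on a ~4.5 kb genome.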

SLIDE 6

The phage proteomic tree

SLIDE 7
  • HECTOR and PARIS

– Degenerate primers that amplify T7 DNA polymerase
– Tested samples from around the world
– Tested by different investigators in different laboratories

Signature sequences

Mya Breitbart
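Degenerate primers such as HECTOR and PARIS use IUPAC ambiguity codes so that a single primer can anneal to many related sequences. The sketch below shows how a degenerate primer could be matched against a sequence. The primer and target strings are made-up examples, not the actual HECTOR or PARIS sequences. The sketch scans only the given strand and uses exact matching, so no mismatches are tolerated.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct concrete sequences a degenerate primer represents."""
    total = 1
    for code in primer.upper():
        total *= len(IUPAC[code])
    return total


def find_sites(primer, sequence):
    """0-based start positions where the primer matches the sequence exactly,
    letting each ambiguity code match any of the bases it stands for."""
    primer, sequence = primer.upper(), sequence.upper()
    allowed = [IUPAC[code] for code in primer]
    hits = []
    for i in range(len(sequence) - len(primer) + 1):
        window = sequence[i:i + len(primer)]
        if all(base in ok for base, ok in zip(window, allowed)):
            hits.append(i)
    return hits
```

For example, `degeneracy("ACNRT")` is 8. `find_sites("GAYTA", "TTGACTAGGATTACC")` returns `[2, 8]` because `Y` matches both C and T. A real primer screen would also search the reverse complement and allow a few mismatches.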

SLIDE 8

Breitbart et al., FEMS Microbiology Letters

~10^26 copies of each sequence on the planet = 60 metric tons of this DNA sequence

T7 phages are globally distributed
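The copy number and the mass on this slide can be cross-checked with a back-of-the-envelope calculation. The sketch assumes an average of about 650 g/mol per base pair of double-stranded DNA. This is a common approximation, not a figure from the slide. The slide does not give the length of the signature sequence, so the code solves for the length that would make 10^26 copies weigh 60 metric tons.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
MASS_PER_BP = 650.0       # g/mol per base pair of dsDNA (assumed approximation)


def total_mass_grams(copies, length_bp):
    """Mass of `copies` double-stranded molecules of `length_bp` base pairs."""
    return copies * length_bp * MASS_PER_BP / AVOGADRO


def implied_length_bp(copies, mass_grams):
    """Sequence length that would make `copies` molecules weigh `mass_grams`."""
    return mass_grams * AVOGADRO / (copies * MASS_PER_BP)


METRIC_TON = 1e6  # grams
length = implied_length_bp(1e26, 60 * METRIC_TON)
print(f"implied length: ~{length:.0f} bp")  # prints: implied length: ~556 bp
```

Under this assumption, the two numbers on the slide are consistent with a sequence of a few hundred base pairs.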

SLIDE 9

Phage Proteomic Tree v. 5 (Edwards, Rohwer)

[Figure: phage proteomic tree with the ssDNA, λ-like, T7-like, and T4-like groups labeled]

Some phages are everywhere

SLIDE 10

Phage P4 – 11 kb, 10 ORFs

Blue: individual sequence reads in a metagenome; Green: coverage across the genome

Compare viruses to all metagenomes

SLIDE 11

[Figure: number of metagenome hits along the P4 phage genome]

Parts of viruses are everywhere
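One way to put numbers on "parts of viruses are everywhere" is to count, for each ORF of a genome such as P4, how many distinct metagenomes contain at least one read that hits it. Below is a minimal sketch. The ORF coordinates, metagenome identifiers and hit positions are hypothetical. A hit is assigned to an ORF if its 0-based position falls inside the ORF's half-open interval.

```python
from collections import defaultdict


def metagenomes_per_orf(orfs, hits):
    """Count distinct metagenomes with at least one hit inside each ORF.

    orfs: dict of ORF name -> (start, end), 0-based half-open coordinates.
    hits: iterable of (metagenome_id, position) pairs on the same genome.
    """
    seen = defaultdict(set)
    for metagenome, position in hits:
        for name, (start, end) in orfs.items():
            if start <= position < end:
                seen[name].add(metagenome)
    # ORFs with no hits at all are reported as 0
    return {name: len(seen[name]) for name in orfs}
```

With `orfs = {"orf1": (0, 100), "orf2": (100, 250)}` and hits `[("mgA", 10), ("mgB", 50), ("mgA", 60), ("mgC", 120)]`, the result is `{"orf1": 2, "orf2": 1}`. Each metagenome counts once per ORF, however many reads it contributes.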

SLIDE 12

[Figure: unknown genes vs. known genes, viral vs. microbial]

Viruses have lots of unknown genes

SLIDE 13

Bas Dutilh

SLIDE 14

cross Assembly

[Figure: metagenome 1, metagenome 2, metagenome 3, metagenome 4]

SLIDE 15

cross Assembly

[Figure: assembly across metagenome 1, metagenome 2, metagenome 3, metagenome 4]

SLIDE 16

cross Assembly

http://edwards.sdsu.edu/crass/

Contigs directly represent the overlap between samples
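Every cross-contig records which samples contributed reads to it. The contigs can therefore be summarised as a contig-by-sample table and the samples compared pairwise. The sketch below uses a simple measure chosen here for illustration: the fraction of contigs containing reads from either sample that contain reads from both (a Jaccard index over contigs). This illustrates the idea only; it is not the distance formula implemented in crAss.

```python
from itertools import combinations


def sample_overlap(contig_reads):
    """Pairwise Jaccard overlap between samples based on shared cross-contigs.

    contig_reads: dict of contig -> dict of sample -> number of reads from
    that sample assembled into the contig.
    """
    contigs_by_sample = {}
    for contig, counts in contig_reads.items():
        for sample, n in counts.items():
            if n > 0:
                contigs_by_sample.setdefault(sample, set()).add(contig)
    overlap = {}
    for a, b in combinations(sorted(contigs_by_sample), 2):
        shared = contigs_by_sample[a] & contigs_by_sample[b]
        either = contigs_by_sample[a] | contigs_by_sample[b]
        overlap[(a, b)] = len(shared) / len(either)
    return overlap
```

For three hypothetical samples with contigs `{"c1": {"m1": 5, "m2": 3}, "c2": {"m1": 2}, "c3": {"m2": 4, "m3": 1}}`, samples m2 and m3 share one of their two contigs (overlap 0.5), while m1 and m3 share none.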

SLIDE 17

Reyes et al. Nature 2010

HMP viruses

SLIDE 18

[Figure panels: Microbes, Phages]

Phages are more variable than microbes

Reyes et al. Nature 2010

Functions present in samples

SLIDE 19

[Figure: number of contigs (log scale, 1 to 10,000) vs. number of samples contributing reads to each contig (1 to 12); 6,988 de novo cross-contigs]

Reyes et al. Nature 2010

De novo assembly HMP data
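This histogram counts how many cross-contigs received reads from exactly 1, 2, 3, … samples. Below is a minimal sketch of that tally. It assumes, hypothetically, that each contig is represented by the set of sample names whose reads it contains.

```python
from collections import Counter


def samples_per_contig_histogram(contig_samples):
    """Map number of contributing samples -> number of contigs.

    contig_samples: dict of contig -> set of samples that contributed reads.
    """
    counts = Counter(len(samples) for samples in contig_samples.values())
    return dict(sorted(counts.items()))
```

For `{"c1": {"a"}, "c2": {"a", "b"}, "c3": {"b"}, "c4": {"a", "b", "c"}}`, the result is `{1: 2, 2: 1, 3: 1}`. Contigs in the right-hand tail of the plot are the ones assembled from reads of many samples.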

SLIDE 20

Big data – microbiome style

[Figure: average depth (y-axis) across samples (x-axis): F1M, F1T1, F1T2, F2M, F2T1, F2T2, F3M, F3T1, F3T2, F4M, F4T1, F4T2]

SLIDE 21

Complete crAssphage genome

SLIDE 22
SLIDE 23
SLIDE 23

Complete crAssphage genome

SLIDE 24

How big is the chimerization problem?

Assembly algorithms include “chimera protection”

  • Break contigs at ambiguities

[Figure: contig1, contig2, contig3, contig4, contig5]

Investigate the effect of chimerization:

  • Use different assembly parameters and assess results
  • High stringency → few chimeras
  • Low stringency → many chimeras
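"Break contigs at ambiguities" can be illustrated with a toy de Bruijn graph. Contigs are extended only through unambiguous (one-in, one-out) nodes and end wherever paths merge or split. Below is a minimal sketch of this unitig construction. It makes several simplifying assumptions: exact reads, one strand only, no error correction, and isolated cycles ignored. Real assemblers, and their "chimera protection", are considerably more involved.

```python
from collections import defaultdict


def unitigs(reads, k):
    """Return maximal non-branching paths (unitigs) of a toy de Bruijn graph.

    Nodes are (k-1)-mers and edges are the distinct k-mers seen in the reads.
    Simplifications: no reverse complements, no error handling, and cycles
    made only of one-in/one-out nodes are not reported.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    for kmer in sorted(kmers):
        u, v = kmer[:-1], kmer[1:]
        out_edges[u].append(v)
        in_degree[v] += 1
    nodes = set(out_edges) | set(in_degree)

    def one_in_one_out(node):
        return in_degree[node] == 1 and len(out_edges[node]) == 1

    contigs = []
    for start in sorted(nodes):
        if one_in_one_out(start):
            continue  # interior node: it is covered by a walk from a branch point
        for nxt in out_edges[start]:
            contig = start + nxt[-1]
            # Extend only while the path is unambiguous; stop at any branch
            while one_in_one_out(nxt):
                nxt = out_edges[nxt][0]
                contig += nxt[-1]
            contigs.append(contig)
    return contigs
```

Take reads `ATGCACC` and `TTGCAGG` with k = 3. The shared `TGCA` stretch becomes its own contig and the four flanks become separate contigs, so no contig joins the flanks of different reads. An assembler that extended straight through the branch points could instead produce a chimera such as `TTGCACC`.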

SLIDE 25

What are chimeras?

Chimerization is more frequent between closely related strains

  • Similar sequences

Aziz et al. NAR 2010

Venus the chimeric cat

https://www.facebook.com/VenusTheAmazingChimeraCat https://twitter.com/Venustwofacecat

What are intra-phyla chimeras???

SLIDE 26

What are chimeras?

Chimerization is more frequent between closely related strains

  • Similar sequences
  • What are intra-phyla chimeras???

Aziz et al. NAR 2010

Evolutionarily conserved entities!

abundant and conserved enough to assemble

SLIDE 27

What is the host?

1) Sequence homology between phage and bacterial genes
2) Similarity in CRISPR spacers
3) Oligonucleotide usage profile
4) Co-occurrence across metagenomic samples
  • Reads mapped from 152 fecal total community metagenomes
  • Reads mapped to phages and bacteria
  • Normalize; Spearman rank correlations; cluster
  • crAssphage clusters with Bacteroidetes
  • Just like two known Bacteroides phages, B40-8 and B124-14
5) Plaques
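The co-occurrence steps (normalize, Spearman rank correlation, cluster) can be sketched for a single phage in three steps:

1. Normalize read counts by each sample's total.
2. Rank-correlate the phage's profile with each candidate host's profile.
3. Report the best-correlated host.

The profiles and names (`hostA`, `hostB`) below are invented for illustration. Ties get average ranks, and the clustering step is left out.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant profile; treated as no correlation here
    return cov / math.sqrt(var_x * var_y)


def best_host(phage_counts, host_counts, sample_totals):
    """Return (host, rho) for the host whose normalized profile best tracks the phage."""
    def normalize(counts):
        return [c / total for c, total in zip(counts, sample_totals)]

    phage = normalize(phage_counts)
    scores = {host: spearman(phage, normalize(c)) for host, c in host_counts.items()}
    host = max(scores, key=scores.get)
    return host, scores[host]
```

With five hypothetical samples, a host whose normalized abundance rises and falls in the same rank order as the phage scores rho = 1.0, and it is picked over a host that tends to move the opposite way.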

SLIDE 28
SLIDE 29
  • Requires correct host strain
  • Requires that the phage makes visible plaques
  • Often requires correct concentrations of Mg++, Ca++, etc.

What is the host?

No PCR hits in at least 100 plaques isolated from 10 pooled viral preparations on Bacteroides fragilis and B. thetaiotaomicron lawns.

SLIDE 30

[Figure: % vs. genome position, 0 to 97,065 nt]

crAssphage found in intestines

Looked at 2,906 metagenomes; found in only 940 metagenomes

SLIDE 31

crAssphage is abundant!

Abundance-ubiquity plot
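An abundance-ubiquity plot places each genome by two coordinates: how many samples it occurs in (ubiquity) and how abundant it is where it occurs. The sketch below computes both coordinates for one genome from a read-count table. Using the mean relative abundance over the samples where the genome is present is an assumption made here for illustration. It is not necessarily the exact definition behind this slide.

```python
def abundance_ubiquity(read_counts, sample_totals):
    """Return (ubiquity, mean relative abundance where present) for one genome.

    read_counts: reads mapped to the genome in each metagenome.
    sample_totals: total reads in each metagenome, in the same order.
    """
    present = [(c, t) for c, t in zip(read_counts, sample_totals) if c > 0]
    ubiquity = len(present) / len(read_counts)
    if not present:
        return ubiquity, 0.0
    # Assumed definition: average fraction of reads, over samples where present
    mean_abundance = sum(c / t for c, t in present) / len(present)
    return ubiquity, mean_abundance
```

Computing this pair for every genome and plotting abundance against ubiquity gives the kind of plot shown on this slide. A genome in the top-right corner is both widespread and abundant.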

SLIDE 32
  • Present in 32.3% of sequenced environmental samples (940 / 2,906)

– Includes virus metagenomes and total community metagenomes

  • >6x more abundant than all (1,192) other known phages combined

– Corrected for genome size

  • Present in 73.4% of sequenced human fecal samples (342 / 466)

– 99.9% of all crAssphage reads were found in feces (significant)

  • 1.68% of the reads in all human fecal metagenomes
  • Estimate: ~6 crAssphage genomes per Bacteroides genome in your gut
  • >90% of the reads in some of the virus metagenomes from the US twin study
  • 24% of the reads in an unrelated virus metagenome from Korea
  • 22% of the reads in total community metagenomes from USA (HMP data)
  • Found on every continent (where we have data)

crAssphage by the numbers
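Two of the percentages above follow directly from the counts given on the slide. The "corrected for genome size" comparison amounts to comparing reads per kilobase rather than raw reads. Below is a small sketch of both. Summing the per-genome reads per kb of the other phages is one possible reading of "all other known phages combined", and `fold_over_others` is a hypothetical helper.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


def reads_per_kb(reads, genome_length_bp):
    """Read count normalized by genome length in kilobases."""
    return reads / (genome_length_bp / 1000)


def fold_over_others(target, others):
    """Ratio of the target's reads per kb to the summed reads per kb of the others.

    target: (reads, genome_length_bp); others: list of (reads, genome_length_bp).
    """
    combined = sum(reads_per_kb(reads, length) for reads, length in others)
    return reads_per_kb(*target) / combined


# Percentages recomputed from the counts on the slide
print(f"{percent(940, 2906):.1f}% of environmental samples")  # 32.3%
print(f"{percent(342, 466):.1f}% of human fecal samples")     # 73.4%
```

For example, with made-up counts, a 100 kb genome with 600 reads versus two 50 kb genomes with 100 and 50 reads gives `fold_over_others((600, 100_000), [(100, 50_000), (50, 50_000)]) == 2.0`. Without the length correction, the raw read ratio would be 4.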

SLIDE 33

Virome reads mapping to viral database

Viral database vs. crAssphage

Reyes et al. Nature 2010

Virome reads mapping to crAssphage

Unknown sequences

SLIDE 34
  • Phage or contamination?

– Highly abundant in viral metagenomes size- and density-filtered for VLPs
– ORFs show similarity to bacteriophage and bacterial proteins (no conserved bacterial or archaeal metabolic genes)
– Phage-like modularity among functions
– Coding structure of the ORFs is typical of a phage genome
– Putative prokaryotic promoter patterns
– Genome detected in many metagenomes around the world

  • Amplification skews?

Potential caveats

SLIDE 35

Summary

  • crAssphage is everywhere
  • everyone has it (rounding up)
  • we don't know what it does
SLIDE 36

metagenomics

  • metagenomics 1.0: profiling
  • metagenomics 2.0: population genomics
SLIDE 37

Tools for population genomics

  • AbundanceBin
  • CompostBin
  • CONCOCT
  • crAss
  • GroopM
  • MetaBAT
  • mmgenome
SLIDE 38

Discussion points

  • How many genomes would you expect in a population?
  • More coverage versus more samples?
  • Cutoffs for inclusion (e.g. GC, closeness, etc.)
SLIDE 39

Discussion points

How do you know the contigs are from the same organism (genotype)?

– http://edwards.sdsu.edu/GenomePeek
– BLAST hits
– GC content or k-mer composition
– Single copy genes
– Abundance profiles across metagenomes
– Paired ends/mate pairs
– PCR
– PFGE and size comparisons
– SIP and metabolically active fraction
– Single cell genomics
– Culturing / genome sequencing
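Two of the checks listed, GC content and k-mer composition, are easy to compute per contig. Below is a minimal sketch that compares contigs by GC fraction and by the Euclidean distance between their tetranucleotide frequency vectors. The choice of k = 4, the distance metric, and any cutoff you would apply to it are assumptions for illustration. This simple version also ignores reverse complements.

```python
import math
from itertools import product


def gc_fraction(seq):
    """Fraction of G+C among unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def kmer_profile(seq, k=4):
    """Relative frequencies of all 4**k k-mers; windows with other characters are skipped."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in kmers]


def composition_distance(seq_a, seq_b, k=4):
    """Euclidean distance between k-mer frequency vectors (smaller = more similar)."""
    a, b = kmer_profile(seq_a, k), kmer_profile(seq_b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Contigs from the same genotype would be expected to have similar GC and a small composition distance. Composition alone is weak evidence for short contigs, which is why it is usually combined with the other checks listed, such as abundance profiles across metagenomes.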