Data analysis of 16S rRNA gene amplicons
Gerrit Botha, Microbiome workshop, University of Cape Town, Cape Town, South Africa, October 2017
This material is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)
Why 16S
Image credit: Darryl Leja, National Human Genome Research Institute (CC BY 2.0)
| Instrument | Amplification | Run time | Bases / read | bp/run | cost/Gb |
|---|---|---|---|---|---|
| Applied Biosystems 3730 (capillary) | PCR, cloning | 2 hrs. | 650 | 62,400 | $2,307,692.31 |
| 454 GS Jr. Titanium | emPCR | 10 hrs. | 400 | 50,000,000 | $19,540.00 |
| 454 FLX+ | emPCR | 20 hrs. | 650 | 650,000,000 | $9,538.46 |
| Illumina GA IIx - v5 PE | bridgePCR | 14 days | 288 | 184,320,000,000 | $97.54 |
| Illumina MiSeq v3 | bridgePCR | 55 hrs. | 600 | 13,200,000,000 | $109.24 |
| Illumina HiSeq 2500 - high output v3 | bridgePCR | 11 days | 200 | 300,000,000,000 | $45.27 |
| Illumina HiSeq X (2 flow cells) | bridgePCR | 3 days | 300 | 1,800,000,000,000 | $7.08 |
| Ion Torrent – PGM 318 chip | emPCR | 7.3 hrs. | 400 | 1,900,000,000 | $460.00 |
| Ion Torrent - Proton III (forecast) | emPCR | 6 hrs. | 175 | 87,500,000,000 | $11.43 |
| Life Technologies SOLiD – 5500xl | emPCR | 8 days | 110 | 155,100,000,000 | $67.72 |
| Pacific Biosciences RS II | None - SMS | 2 hrs. | 3000 | 90,000,000 | $1,111.11 |
| Oxford Nanopore MinION (forecast) | None - SMS | ≤6 hrs. | 9000 | 900,000,000 | $1,000.00 |
| Oxford Nanopore GridION 8000 (forecast) | None - SMS | varies | 10000 | 100,000,000,000 | $10.00 |

Note: for paired-end runs, bases per read is the paired-end sum.
File formats: FASTQ, SAM, VCF, BED, GFF3, BIOM
Sobs = number of species observed in the sample; F1 = number of singletons (species that appear exactly once in the sample); F2 = number of doubletons (species that appear exactly twice in the sample). The central concept is that if rare species (singletons) are still being discovered while a community is sampled, there are probably more rare species yet to be found. Once all species have been found at least twice (doubletons), it is less likely that new species remain to be discovered.
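The symbols Sobs, F1 and F2 are those of the Chao1 richness estimator, which in its classic form is Sobs + F1^2 / (2 * F2). The following is a minimal sketch, assuming that is the estimator the slide illustrates. The function name `chao1` and the `bias_corrected` flag are illustrative and not taken from the slides. The bias-corrected form, Sobs + F1 * (F1 - 1) / (2 * (F2 + 1)), is also used as a fallback when there are no doubletons, since the classic form would divide by zero.

```python
def chao1(counts, bias_corrected=False):
    """Estimate species richness from per-species abundance counts.

    counts: iterable of non-negative integers, one per species (or OTU).
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                        # Sobs: species seen at least once
    f1 = sum(1 for c in observed if c == 1)      # F1: singletons
    f2 = sum(1 for c in observed if c == 2)      # F2: doubletons

    # The classic form is undefined when F2 == 0, so fall back to the
    # bias-corrected form in that case (an assumption of this sketch).
    if bias_corrected or f2 == 0:
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return s_obs + f1 ** 2 / (2 * f2)


if __name__ == "__main__":
    sample = [1, 1, 1, 2, 2, 5, 10]   # 7 observed species, 3 singletons, 2 doubletons
    print(chao1(sample))
    print(chao1(sample, bias_corrected=True))
```

With many singletons relative to doubletons the estimate rises well above Sobs; with no singletons it equals Sobs, which matches the reasoning above.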
N = total number of individuals of all species; n = number of individuals of each species
Berger-Parker)
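The symbols N and n are the ones commonly used for Simpson's index, D = sum(n * (n - 1)) / (N * (N - 1)), and the Berger-Parker index is the proportion of the most abundant species, n_max / N. The sketch below assumes these are the indices the slide refers to; the slide text itself does not spell out the formulas, and the function names are illustrative.

```python
def simpson_index(counts):
    """Simpson's D: probability that two individuals drawn without
    replacement belong to the same species (finite-sample form)."""
    counts = [n for n in counts if n > 0]
    total = sum(counts)                       # N: individuals of all species
    if total < 2:
        raise ValueError("need at least two individuals")
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))


def berger_parker(counts):
    """Berger-Parker dominance: share of the most abundant species."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one individual")
    return max(counts) / total


if __name__ == "__main__":
    sample = [10, 5, 5]                       # n for each of three species
    print(simpson_index(sample))              # higher D means lower diversity
    print(1 - simpson_index(sample))          # often reported as 1 - D
    print(berger_parker(sample))
```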
depth enough?
[Figure: 50 individuals, 2 species; 250 individuals, 4 species; 500 individuals, 8 species]
2401-2405; DOI: 10.1098/rspb.2002.2116. Published 7 December 2002.
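Whether sampling depth is sufficient is commonly explored with a rarefaction curve: the expected number of species seen when only a subsample of the individuals is drawn. The following is a minimal sketch, assuming the standard analytic (hypergeometric) rarefaction formula, E[S_m] = sum over species of 1 - C(N - n_i, m) / C(N, m). The function name and the example counts are illustrative and not taken from the slides.

```python
from math import comb


def expected_richness(counts, depth):
    """Expected number of species in a random subsample of `depth`
    individuals drawn without replacement from the sample."""
    counts = [n for n in counts if n > 0]
    total = sum(counts)                       # N: individuals in the full sample
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the sample size")
    all_draws = comb(total, depth)
    # For each species: 1 - P(no individual of that species is drawn)
    return sum(1 - comb(total - n, depth) / all_draws for n in counts)


if __name__ == "__main__":
    sample = [50, 30, 10, 5, 3, 1, 1]         # hypothetical abundance counts
    for depth in (10, 25, 50, 100):
        print(depth, round(expected_richness(sample, depth), 2))
```

If the curve is still climbing at the full sample size, more depth would probably reveal more species; if it has flattened, the depth is likely adequate.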
l_i is the branch length between node i and its parent, and A_i and B_i are indicators equal to 1 if descendants of node i are present in community A or B, respectively, and 0 if they are absent.
A = red, B = blue; branches in common are purple, branches unique to A are red, and branches unique to B are blue. This is a presence/absence metric.
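The symbols above match the unweighted UniFrac distance, sum(l_i * |A_i - B_i|) / sum(l_i * max(A_i, B_i)): the fraction of branch length that is unique to one community. The following is a minimal sketch, assuming that is the metric shown. The tree is represented as a flat list of branches, a simplification chosen for illustration; the function name and data layout are not from the slides.

```python
def unweighted_unifrac(branches):
    """branches: iterable of (length, in_a, in_b) tuples, where in_a and
    in_b are 0/1 indicators for whether any descendant of the branch is
    present in community A or B."""
    unique = 0.0   # branch length found in exactly one community (red or blue)
    total = 0.0    # branch length found in at least one community
    for length, in_a, in_b in branches:
        unique += length * abs(in_a - in_b)
        total += length * max(in_a, in_b)
    if total == 0:
        raise ValueError("no branch is present in either community")
    return unique / total


if __name__ == "__main__":
    tree = [
        (1.0, 1, 0),   # unique to A (red)
        (2.0, 1, 1),   # shared (purple)
        (1.0, 0, 1),   # unique to B (blue)
        (3.0, 0, 0),   # in neither community, so ignored
    ]
    print(unweighted_unifrac(tree))
```

A value of 0 means the two communities cover the same branches, and 1 means they share no branch length at all.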