SLIDE 1 Next Generation Sequencing: an overview of technologies and applications
Matthew Tinning Australian Genome Research Facility
July 2012
SLIDE 2
History of Sequencing
Where have we been?
- 1869 – Discovery of DNA
- 1909 – Chemical characterisation
- 1953 – Structure of DNA solved
- 1977 – Sanger sequencing invented
– First genome sequenced – ΦX174 (5 kb)
- 1986 – First automated sequencing machine
- 1990 – Human Genome Project started
- 1992 – First “sequencing factory” at TIGR
SLIDE 3
History of Sequencing
Where have we been?
- 1995 – First bacterial genome – H. influenzae (1.8 Mb)
- 1998 – First animal genome – C. elegans (97 Mb)
- 2003 – Completion of Human Genome Project (3 Gb) – 13 years, $2.7 bn
- 2005 – First “next-generation” sequencing instrument
- 2008 – 2,356 genome sequences in NCBI database
SLIDE 4 Origin of DNA Sequencing
– First genome (ΦX174)
– Sequencing by synthesis (Sanger)
– Sequencing by degradation (Maxam–Gilbert)
SLIDE 5 Sanger sequencing
- Uses DNA polymerase
- All four deoxynucleotides (dNTPs), plus one dideoxynucleotide (ddNTP) per reaction
- The ddNTP lacks the 3'-OH needed for extension, so incorporation causes random termination at specific bases
- Separate by gel electrophoresis
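The chain-termination logic can be sketched as a small simulation. It assumes (for illustration only) that the synthesized strand is written 5'->3' as the complement of a template written 3'->5', that every position terminates in some molecules, and that size separation is perfect; `termination_fragments` and `read_gel` are hypothetical helper names.

```python
# Minimal sketch of Sanger chain termination (simulation, not lab software).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def termination_fragments(synthesized: str, ddntp: str) -> list[int]:
    """Lengths of the chains that stop wherever `ddntp` was incorporated."""
    return [i + 1 for i, base in enumerate(synthesized) if base == ddntp]


def read_gel(fragments_by_ddntp: dict[str, list[int]]) -> str:
    """Sort every fragment by length (as electrophoresis does) and read the
    terminating base of each band, shortest first, giving the 5'->3' sequence."""
    bands = sorted(
        (length, base)
        for base, lengths in fragments_by_ddntp.items()
        for length in lengths
    )
    return "".join(base for _, base in bands)


template = "TACGGATC"  # hypothetical template strand, written 3'->5'
synthesized = "".join(COMPLEMENT[b] for b in template)  # new strand, 5'->3'

# One reaction per ddNTP, as in the original four-reaction method.
reactions = {dd: termination_fragments(synthesized, dd) for dd in "ACGT"}
print(reactions)            # {'A': [1, 7], 'C': [4, 5], 'G': [3, 8], 'T': [2, 6]}
print(read_gel(reactions))  # ATGCCTAG
```

Reading the bands in order of size recovers the synthesized strand, which is why four reactions (or, later, four dyes in one lane) are enough to determine the sequence.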
SLIDE 6
Chain extension and termination
SLIDE 7 Chain extension and termination
dideoxynucleotide
SLIDE 8
Fragment identification
SLIDE 9
Gel electrophoresis
SLIDE 10 1986: 4 Reactions to 1 Lane
(Figure panels: Sequencing Reaction Products; Progression of Sequencing Reaction)
SLIDE 11
ABI377
SLIDE 12
ABI3730xl
SLIDE 13
Electropherograms
SLIDE 14 Sanger Sequencing
- Read length: ~900 bases
- Throughput: < 2.1 million bases per day (rapid mode, 500 bp reads)
– < 0.1% of the human genome
– > 1000 days of sequencing for 1-fold coverage
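The figures above follow from simple division. This sketch assumes the 2.1 million bases are produced per day and takes the haploid human genome as ~3.1 Gbp; both constants are approximations, not values stated on the slide.

```python
# Back-of-envelope check of the Sanger throughput figures.
HUMAN_GENOME_BP = 3.1e9     # approximate haploid human genome size (assumption)
SANGER_BP_PER_DAY = 2.1e6   # slide figure, assumed to be per day

fraction_per_day = SANGER_BP_PER_DAY / HUMAN_GENOME_BP
days_for_1x = HUMAN_GENOME_BP / SANGER_BP_PER_DAY

print(f"{fraction_per_day:.3%} of the genome per day")  # 0.068% of the genome per day
print(f"{days_for_1x:.0f} days for 1-fold coverage")    # 1476 days for 1-fold coverage
```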
SLIDE 15
Human Genome Project
SLIDE 16 Human Genome Project
- Launched in 1990, expected to take 15 years
– Competing Celera project launched in 1998
- Genome estimated to be 92% complete
– 1st draft released in 2000
– “Complete” genome released in 2003
– Sequence of last chromosome published in 2006
– Celera ~$300 million
SLIDE 17
Shotgun Sequencing Approach
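Shotgun sequencing relies on many random, overlapping reads, so a natural question is how many reads are needed. A common way to estimate this is the Lander-Waterman (Poisson) model: mean coverage C = N·L/G, and the expected fraction of bases covered at least once is 1 - e^(-C). The read counts and 500 bp read length below are hypothetical inputs chosen for illustration.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean fold coverage C = N * L / G."""
    return num_reads * read_length / genome_size


def expected_fraction_covered(coverage: float) -> float:
    """Lander-Waterman / Poisson estimate of P(a base is covered) = 1 - e^(-C)."""
    return 1.0 - math.exp(-coverage)


G = 3_100_000_000  # approximate human genome size (assumption)
for n_reads in (6_200_000, 31_000_000, 49_600_000):
    c = mean_coverage(n_reads, 500, G)
    print(f"{n_reads:>11,} reads -> {c:.1f}x, "
          f"{expected_fraction_covered(c):.4f} of bases covered")
```

Even at 1x, roughly a third of bases are expected to be missed, which is why shotgun projects sequence to several-fold redundancy.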
SLIDE 18
Next Gen Sequencing Technologies
SLIDE 19 Next Gen Sequencing Technologies
- Four platforms, four technologies
- All massively parallel sequencing
– Sequencing by synthesis
– Sequencing by ligation
- Read lengths vary from ~36bp to >400bp
- Read numbers vary from ~1 million to ~1 billion per run
SLIDE 20 Next Gen Sequencing Technologies
- Life Technologies SOLiD
- Roche GS-FLX
- Illumina HiSeq
- Life Technologies Ion Torrent
SLIDE 21
Next Gen Sequencing Library Preparation
SLIDE 22
Roche GS FLX
SLIDE 23 Workflow
Sample Fragmentation → Library Preparation → emPCR Setup → emPCR Amplification → Pyrosequencing → Data Analysis
SLIDE 24
Pyrosequencing
SLIDE 25
emPCR
Emulsion PCR is a method of clonal amplification that allows millions of independent PCRs to be performed at once, each in its own microreactor.
SLIDE 26 emPCR
The Water-in-Oil-Emulsion
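Clonal amplification only works if most occupied droplets start from a single template. A common way to reason about this is to assume templates are distributed among droplets at random, so the number per droplet follows a Poisson distribution with mean λ. This is a sketch under that assumption; the λ values are illustrative, not platform settings.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for K ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def droplet_occupancy(templates_per_droplet: float) -> dict[str, float]:
    """Fractions of droplets holding 0, exactly 1, or more than 1 template."""
    p0 = poisson_pmf(0, templates_per_droplet)
    p1 = poisson_pmf(1, templates_per_droplet)
    return {"empty": p0, "clonal": p1, "mixed": 1.0 - p0 - p1}


for lam in (0.1, 0.3, 1.0):
    occ = droplet_occupancy(lam)
    occupied = occ["clonal"] + occ["mixed"]
    print(f"lambda={lam}: {occ['empty']:.1%} empty, "
          f"{occ['clonal'] / occupied:.1%} of occupied droplets clonal")
```

The trade-off is visible directly: a low template-to-droplet ratio keeps occupied droplets clonal, at the cost of leaving most droplets empty.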
SLIDE 27
Massively Parallel Sequencing
SLIDE 28 Data Analysis
- Base flows: T, A, C, G
- Raw image files → image processing → base calling → quality filtering
- Output: SFF file
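Because nucleotides are flowed one at a time in the order T, A, C, G, each flow's signal reflects how many bases of that type were incorporated (the homopolymer length). An idealized sketch of flowgram generation and the simplest possible base calling (rounding each signal) is shown below. The noise values are made up, and real 454 base calling is more involved than rounding.

```python
FLOW_ORDER = "TACG"  # flow order shown on the slide


def to_flowgram(seq: str, n_flows: int, flow_order: str = FLOW_ORDER) -> list[int]:
    """Idealized flowgram: number of bases incorporated in each flow."""
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == base:  # extend through a homopolymer
            run += 1
            pos += 1
        signals.append(run)
    return signals


def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Naive base calling: round each signal to a homopolymer length."""
    return "".join(
        flow_order[i % len(flow_order)] * max(0, round(s))
        for i, s in enumerate(signals)
    )


seq = "TTACGGGA"
flows = to_flowgram(seq, 8)
print(flows)  # [2, 1, 1, 3, 0, 1, 0, 0]

# Hypothetical measurement noise added to each flow.
noise = [0.1, -0.2, 0.15, 0.05, -0.3, 0.2, -0.1, 0.1]
noisy = [s + d for s, d in zip(flows, noise)]
print(call_bases(noisy))  # TTACGGGA
```

The sketch also shows why long homopolymers are the weak point of flow-based chemistry: a run of 3 vs. 4 identical bases differs only in signal intensity.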
SLIDE 29 454 Platform Updates
- GS20: 100 bp reads, ~20 Mbp/run
- GS FLX: 250 bp reads, ~100 Mbp/run (7.5 hrs)
- GS FLX Titanium: 400 bp reads, ~400 Mbp/run (10 hrs)
- GS FLX Titanium Plus: 700 bp reads, ~700 Mbp/run (18 hrs)
- GS Junior: 400 bp reads, ~35 Mbp/run (10 hrs)
SLIDE 30
Illumina HiSeq
SLIDE 31 Illumina Sequencing Technology
Robust Reversible Terminator Chemistry Foundation
- Sample preparation (DNA input 0.1-1.0 µg)
- Cluster growth
- Sequencing
- Image acquisition
- Base calling
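The reversible terminator cycle can be sketched as a loop that adds exactly one base per cycle. This assumes an error-free cluster with the template written 3'->5', and `sbs_read` is a hypothetical helper name; it ignores phasing and imaging noise.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_read(template_3to5: str, n_cycles: int) -> str:
    """Cycle-by-cycle sketch of reversible-terminator sequencing by synthesis."""
    calls = []
    for cycle in range(min(n_cycles, len(template_3to5))):
        # 1-2. All four labelled, 3'-blocked nucleotides are offered; the block
        #      means exactly one base is added to the growing strand this cycle.
        incorporated = COMPLEMENT[template_3to5[cycle]]
        # 3. Image the cluster: the dye colour identifies the added base.
        calls.append(incorporated)
        # 4. Dye and blocking group are cleaved (implicit here), so the next
        #    cycle can add one more base.
    return "".join(calls)


print(sbs_read("ACGAAATGC", 9))  # TGCTTTACG: the AAA run is read one base per cycle
print(sbs_read("ACGAAATGC", 5))  # TGCTT: read length equals the number of cycles
```

Because termination limits each cycle to a single incorporation, homopolymers are read base by base rather than inferred from signal intensity.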
SLIDE 32 Platform Updates
- Solexa 1G
- Illumina GA
- Illumina GAII: 75 bp paired reads, ~10 Gbp/run (8 days)
- Illumina GAIIx: 75 bp paired reads, ~40 Gbp/run (8 days)
- Illumina HiSeq 2000: 100 bp paired reads, ~200 Gbp/run (10 days)
- Illumina HiSeq, v3 SBS: 100 bp paired reads, ~600 Gbp/run (12 days)
- MiSeq: 150 bp paired reads, ~1.5 Gbp/run (27 hrs)
Maximum yield/day: 50 Gbp, ~16x the human genome
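The per-day figures are run yield divided by run time. This sketch assumes a ~3.1 Gbp human genome and uses the HiSeq v3 SBS run (600 Gbp in 12 days) and the SOLiD 5500xl run (300 Gbp in 14 days) from the platform update slides.

```python
HUMAN_GENOME_GBP = 3.1  # approximate haploid human genome size (assumption)


def yield_per_day(gbp_per_run: float, run_days: float) -> float:
    """Average output per day of instrument time."""
    return gbp_per_run / run_days


def hours_for_1x(gbp_per_day: float, genome_gbp: float = HUMAN_GENOME_GBP) -> float:
    """Hours of sequencing needed to produce one genome's worth of bases."""
    return 24 * genome_gbp / gbp_per_day


for name, gbp, days in [("HiSeq v3 SBS", 600, 12), ("SOLiD 5500xl", 300, 14)]:
    per_day = yield_per_day(gbp, days)
    print(f"{name}: {per_day:.1f} Gbp/day, "
          f"{per_day / HUMAN_GENOME_GBP:.1f}x genome/day, "
          f"{hours_for_1x(per_day):.1f} h per 1x")
```

This prints 50.0 Gbp/day (16.1x, 1.5 h per 1x) for the HiSeq and 21.4 Gbp/day (6.9x, 3.5 h per 1x) for the 5500xl, consistent with the ~16x, ~7x and 3.5 hour figures quoted in these slides.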
SLIDE 33
Applied Biosystems SOLiD
SLIDE 34
Sequencing by Ligation
SLIDE 35
Base Interrogations
SLIDE 36 2 Base encoding
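2 base encoding can be sketched with the commonly described SOLiD colour code. Each base gets a 2-bit value (A=0, C=1, G=2, T=3), and each colour is the XOR of two adjacent bases, so a read is stored as its first base followed by colours. The read sequences below are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_colour_space(seq: str) -> str:
    """First base, then one colour (0-3) per adjacent pair of bases."""
    colours = (str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + "".join(colours)


def from_colour_space(cs: str) -> str:
    """Decode by walking the colours from the known first base."""
    bases = [cs[0]]
    for colour in cs[1:]:
        bases.append(BASES[CODE[bases[-1]] ^ int(colour)])
    return "".join(bases)


def colour_differences(a: str, b: str) -> list[int]:
    """Positions at which two colour-space strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


reference = "TACGGAT"
snp_read = "TACTGAT"  # one true base change (G -> T at position 3)
print(to_colour_space(reference))  # T313023
print(colour_differences(to_colour_space(reference),
                         to_colour_space(snp_read)))  # [3, 4]
print(from_colour_space("T312023"))  # TACTTCG: one colour error corrupts the rest
```

The example shows the property that makes the encoding useful: a real SNP changes two adjacent colours, whereas a single colour-calling error does not, and it shows why colour-space reads are best compared to the reference before being converted back to bases.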
SLIDE 37 emPCR and Enrichment
3’ Modification allows covalent bonding to the slide surface
SLIDE 38 Platform Updates
- SOLiD 3: 50 bp paired reads, ~50 Gbp/run (12 days)
- SOLiD 4: 50 bp paired reads, ~100 Gbp/run (12 days)
- 5500xl: 75 bp paired reads, ~300 Gbp/run (14 days)
Maximum yield/day: 21,000,000,000 bp, ~7x the human genome; 3.5 hours of sequencing for 1-fold coverage
SLIDE 39
Applied Biosystems: Ion Torrent PGM
SLIDE 40 Ion Torrent
- Ion semiconductor sequencing
- Detection of hydrogen ions released during DNA polymerization
- Sequencing occurs in microwells with ion sensors
- No modified nucleotides
- No optics
SLIDE 41 Ion Torrent
– Nucleotides flow sequentially over the ion semiconductor chip
– One sensor per well per sequencing reaction
– Direct detection of natural DNA extension
– Millions of sequencing reactions per chip
– Fast cycle time, real-time detection
(Figure: dNTP incorporation releases H+; ∆pH → ∆Q → ∆V at the sensor. Labels: sensing layer, sensor plate, silicon substrate, drain, source, bulk, to column receiver)
SLIDE 42 Ion Torrent: System Updates
- 314 Chip: 100 bp reads, ~10 Mbp/run (1.5 hrs)
- 316 Chip: 100 bp reads, ~100 Mbp/run (2 hrs); 200 bp reads, ~200 Mbp/run (3 hrs)
- 318 Chip: 200 bp reads, ~1 Gbp/run (4.5 hrs)
SLIDE 43 Summary of NGS Platforms
- Clonal amplification of sequencing template
– emPCR (454, SOLiD and Ion Torrent)
– Bridge amplification (Illumina)
- Sequencing by synthesis
– 454
– Illumina
– Ion Torrent
- Sequencing by ligation
– SOLiD – 2 base encoding
- Dramatic reduction in cost of sequencing
– GS FLX provides > 100x decrease in costs compared to Sanger sequencing
– HiSeq and SOLiD provide > 100x decrease in costs over GS FLX
SLIDE 44 Rapid Innovation Driving Cost Down
(Figure: Evolution of NGS system output, throughput (GB) 2007–2010, labelled values 3 GB, 6 GB, 20 GB and 300 GB; Cost per Human Genome)
SLIDE 45 NGS Library Preparation
SLIDE 46
Library Preparation
SLIDE 47 Sample preparation
- DNA: fragmentation (mechanical or chemical)
- mRNA: fragmentation, cDNA synthesis
- Ligation of sequencing adaptors
- Fragment size selection
- Library amplification / sequencing
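The fragmentation, adaptor ligation and size selection steps can be sketched as simple string operations. Random break points stand in for shearing, and the adaptor strings are obvious placeholders, not real kit sequences. The 220-420 bp window and the random seed are arbitrary choices for illustration.

```python
import random

# Hypothetical placeholder adaptors (NOT real kit sequences).
ADAPTOR_5 = "XXXXXXXXXX"
ADAPTOR_3 = "YYYYYYYYYY"


def fragment(seq: str, n_breaks: int, rng: random.Random) -> list[str]:
    """Random fragmentation: break the molecule at n_breaks distinct positions."""
    cuts = sorted(rng.sample(range(1, len(seq)), n_breaks))
    bounds = [0, *cuts, len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


def ligate_adaptors(insert: str) -> str:
    """Attach the (hypothetical) adaptors to both ends of an insert."""
    return ADAPTOR_5 + insert + ADAPTOR_3


def size_select(library: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep only molecules inside the size window, like cutting a gel band."""
    return [mol for mol in library if min_len <= len(mol) <= max_len]


rng = random.Random(7)
genome = "".join(rng.choice("ACGT") for _ in range(20_000))
inserts = fragment(genome, 80, rng)
library = [ligate_adaptors(ins) for ins in inserts]
selected = size_select(library, 220, 420)
print(f"{len(inserts)} fragments, {len(selected)} kept after size selection")
```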
SLIDE 48 Applications
- DNA
– Whole genome
– Hybridization capture
– Amplicon
– ChIP-seq
- RNA
– mRNA
– Small RNA
SLIDE 49 Whole Genome Sequencing
- Assembly
- Reference mapping
– SNVs, rearrangements
- Comparative genomics
- E. coli assembly from MiSeq data (source: Illumina application notes)
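Reference mapping followed by SNV detection can be illustrated with a toy seed-and-extend mapper: index every k-mer of the reference, seed each read with its first k-mer, verify by counting mismatches, then pile up the mapped reads. Production aligners use very different, compressed indexes; this hash-based version, its thresholds, and the reference and reads are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(ref: str, k: int) -> dict[str, list[int]]:
    """Hash every k-mer of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def map_read(read: str, ref: str, index, k: int, max_mismatches: int = 2):
    """Seed with the read's first k-mer, then verify by counting mismatches.
    Returns (position, mismatches) of the best hit, or None."""
    best = None
    for pos in index.get(read[:k], []):
        if pos + len(read) > len(ref):
            continue
        mismatches = sum(a != b for a, b in zip(read, ref[pos:pos + len(read)]))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


def call_snvs(reads, ref, k=5, min_depth=3, min_fraction=0.8):
    """Pile up mapped reads; report positions where most reads disagree with ref."""
    index = build_index(ref, k)
    pileup = defaultdict(Counter)
    for read in reads:
        hit = map_read(read, ref, index, k)
        if hit is None:
            continue
        pos, _ = hit
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    snvs = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        base, count = pileup[pos].most_common(1)[0]
        if base != ref[pos] and depth >= min_depth and count / depth >= min_fraction:
            snvs.append((pos, ref[pos], base))
    return snvs


reference = "GATTACACGTTCAGGCTAAGCTTGCAATCG"
sample = reference[:15] + "A" + reference[16:]  # simulated C->A SNV at position 15
reads = [sample[s:s + 10] for s in range(6, 11)]
print(call_snvs(reads, reference))  # [(15, 'C', 'A')]
```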
SLIDE 50 Targeted Resequencing: sequence capture
targets via capture with
– Exome Capture
– Custom Capture
sequences
SLIDE 51 Targeted Resequencing: Amplicons
- Preparation of amplicons tagged with sequencing adapters
– Well suited for 454 and bench top sequencers
– Deep sequencing for detection of somatic mutations
– 16S sequencing for microbial diversity
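Deep sequencing helps detect somatic mutations because a variant present in only a small fraction of cells needs enough reads to be seen reliably. A simple way to quantify this is to model the number of variant-supporting reads as Binomial(depth, VAF). This sketch ignores sequencing errors and amplification bias, and the 5% VAF and 5-read threshold are hypothetical choices.

```python
from math import comb


def prob_detect(min_alt_reads: int, depth: int, vaf: float) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, vaf)."""
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Hypothetical somatic variant in 5% of reads; require >= 5 supporting reads.
for depth in (20, 100, 1000):
    print(f"depth {depth:>4}: P(detect) = {prob_detect(5, depth, 0.05):.3f}")
```

At low depth such a variant is almost never called, while amplicon-level depth makes detection close to certain under this model.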
SLIDE 52 RNA applications
– Gene expression analysis
– Transcriptome analysis
– Small RNA discovery
– Gene regulation
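For gene expression analysis, read counts are usually normalized for both gene length and sequencing depth before genes or samples are compared. One widely used measure is RPKM (reads per kilobase of transcript per million mapped reads), sketched here with hypothetical counts.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


total = 20_000_000  # mapped reads in the library (hypothetical)
print(rpkm(500, 2_000, total))  # 12.5
print(rpkm(500, 1_000, total))  # 25.0: same count, shorter gene -> higher expression
```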
SLIDE 53