Ultra high throughput DNA sequencing technologies
Keith Harshman
DNA Array Facility, Center for Integrative Genomics, University of Lausanne
Outline:
1. What UHTS is replacing: Sanger sequencing/CE
2. Current UHTS “next generation” technologies:
   a. Illumina Genome Analyzer II (aka “Solexa”)
   b. Applied Biosystems’ SOLiD
   c. 454
3. Some next next generation technologies
4. Some next next next generation technologies
Human Genome Re-sequencing using the Sanger Method
- 5.3x coverage
- $2,000,000-$4,000,000
- ~15,000,000 plasmid preps
- 27,000,000 AB 3730 reads
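As a rough cross-check of these figures, average coverage can be estimated as reads x read length / genome size. The sketch below assumes a genome size of 3.0 x 10^9 bp (an assumption, not stated on the slide) and solves for the average usable read length implied by 27,000,000 reads at 5.3x coverage. The function names are illustrative only.

```python
# Back-of-the-envelope coverage arithmetic (illustrative sketch only).
GENOME_SIZE_BP = 3.0e9   # ASSUMED human genome size; not given on the slide
READS = 27_000_000       # AB 3730 reads, from the slide
COVERAGE = 5.3           # fold coverage, from the slide


def coverage(reads, read_length_bp, genome_size_bp):
    """Average fold coverage = total sequenced bases / genome size."""
    return reads * read_length_bp / genome_size_bp


def implied_read_length(cov, reads, genome_size_bp):
    """Average read length needed to reach `cov` with `reads` reads."""
    return cov * genome_size_bp / reads


implied_len = implied_read_length(COVERAGE, READS, GENOME_SIZE_BP)
print(f"Implied average usable read length: {implied_len:.0f} bp")
```

Under that genome-size assumption the implied length is somewhat shorter than a nominal 900 bp read, which would be consistent with trimming and failed reads, though the slide does not say so.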
Enter UHTS (following a brief performance by MPSS)
3730: ~1 x 10^6 bases/day (12 x 96-sample runs/day; 900 bp reads)
Genome Analyzer II: ~2 x 10^9 bases/run = ~670 x 10^6 bases/day (35 bp reads)
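The daily-output figures above follow from simple multiplication. The sketch below reproduces them; the 3-day run length for the Genome Analyzer II is inferred from the ~670 x 10^6 bases/day figure rather than stated on this slide, and the variable names are illustrative.

```python
# Daily output implied by the slide's figures (illustrative arithmetic).
RUNS_PER_DAY_3730 = 12
SAMPLES_PER_RUN_3730 = 96
READ_LENGTH_3730_BP = 900

GA2_BASES_PER_RUN = 2e9
GA2_RUN_DAYS = 3  # inferred: 2 x 10^9 bases/run over ~3 days gives ~670 x 10^6/day

bases_per_day_3730 = RUNS_PER_DAY_3730 * SAMPLES_PER_RUN_3730 * READ_LENGTH_3730_BP
bases_per_day_ga2 = GA2_BASES_PER_RUN / GA2_RUN_DAYS
fold_increase = bases_per_day_ga2 / bases_per_day_3730

print(f"3730:  {bases_per_day_3730:,} bases/day")
print(f"GA II: {bases_per_day_ga2:,.0f} bases/day")
print(f"Ratio: ~{fold_increase:.0f}x")
```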
Ultra high throughput/output:
25x 35bp reads ≠ 1x 900bp read
Illumina Genome Analyzer II
Sequencing Process
1. Library prep (~6 hrs)
- Fragment DNA
- Repair ends / Add A overhang
- Ligate adapters
- Select ligated DNA
2. Automated Cluster Generation (~5 hrs), 1-8 samples
- Hybridize to flow cell
- Extend hybridized oligos
- Perform bridge amplification
3. Sequencing (~48 to 72 hrs), 1-8 samples
- Perform sequencing
- Generate base calls
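Adding up the three stage times gives a rough end-to-end turnaround. This sketch assumes the stages run back to back with no idle time between them, which the slide does not state.

```python
# Rough end-to-end turnaround from the stage times on the slide.
LIBRARY_PREP_H = 6
CLUSTER_GENERATION_H = 5
SEQUENCING_H = (48, 72)  # (shortest, longest) run time in hours

# ASSUMPTION: stages are run back to back with no waiting in between.
total_h = tuple(LIBRARY_PREP_H + CLUSTER_GENERATION_H + s for s in SEQUENCING_H)
total_days = tuple(h / 24 for h in total_h)

print(f"Total: {total_h[0]}-{total_h[1]} hrs "
      f"(~{total_days[0]:.1f}-{total_days[1]:.1f} days)")
```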
Genomic DNA Library Prep
DNA fragments
- Blunting by fill-in and exonuclease
- Phosphorylation
- Addition of A-overhang
- Ligation to adapters
Cluster generation: Cluster Station
- Aspirates DNA samples into flow cell
- Automates the formation of amplified clonal clusters from the DNA single molecules
DNA libraries Flow cell (clamped into place)
Flow cell
- Clonal clusters are generated in a contained environment (no clean rooms needed)
- Sequencing is also performed in the flow cell on the generated clusters
Key to the simplified workflow
8 channels
Surface of flow cell coated with a lawn of oligo pairs
Cluster generation: Hybridize fragment & extend
>50 M single molecules hybridize to the lawn of primers. Bound molecules are then extended by polymerases.
Adapter sequence 3’ extension
Cluster generation: Denature double-stranded DNA
Double-stranded molecule is denatured. Original template is washed away. Newly synthesized strand is covalently attached to the flow cell surface.
discard
Newly synthesized strand Original template
Cluster generation: Covalently bound spatially separated single molecules
Single molecules bound to flow cell in a random pattern
Cluster generation: Bridge amplification
Single strand flips over to hybridize to adjacent primers to form a bridge. Hybridized primer is extended by polymerases.
Cluster generation: Bridge amplification
A double-stranded bridge is formed.
Cluster generation: Bridge amplification
Double-stranded bridge is denatured. Result: two copies of covalently bound single-stranded templates.
Cluster generation: Bridge amplification
Single-strands flip over to hybridize to adjacent primers to form bridges. Hybridized primer is extended by polymerase.
Cluster generation: Bridge amplification
Bridge amplification cycle repeated till multiple bridges are formed
Cluster generation
dsDNA bridges denatured. Reverse strands cleaved and washed away…..
Cluster generation
… leaving a cluster with forward strands only.
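The bridge amplification steps above can be sketched as a toy doubling model: in each cycle every bound strand bridges to a neighbouring primer and is copied into a strand of the opposite orientation, and at the end the reverse strands are cleaved. This is an idealized illustration only. It assumes 100% efficiency per cycle, and the cycle count used is hypothetical, not taken from the slides.

```python
from typing import List


def bridge_amplify(cycles: int) -> List[str]:
    """Idealized bridge amplification: every strand is copied once per cycle.

    'F' = forward strand, 'R' = reverse strand. Real efficiency is below
    100%, so real clusters are smaller than this model suggests.
    """
    strands = ["F"]  # one covalently bound single molecule seeds the cluster
    for _ in range(cycles):
        # Each bound strand bridges to an adjacent primer and templates a
        # new strand of the opposite orientation; denaturing leaves both bound.
        copies = ["R" if s == "F" else "F" for s in strands]
        strands = strands + copies
    return strands


def cleave_reverse(strands: List[str]) -> List[str]:
    """Reverse strands are cleaved and washed away; forward strands remain."""
    return [s for s in strands if s == "F"]


cluster = bridge_amplify(cycles=10)  # hypothetical cycle count
forward_only = cleave_reverse(cluster)
print(len(cluster), "strands before cleavage;", len(forward_only), "forward strands after")
```

In this model the strand count doubles each cycle, and half of the strands survive the cleavage step.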
Cluster generation
Free 3’ ends are blocked to prevent unwanted DNA priming.
Sequencing primer is hybridized to adapter sequence.
Sequencing
Hybridize sequencing primer
Add 4 Fl-NTPs + polymerase
Incorporated Fl-NTP is imaged
Terminator and fluorescent dye are cleaved from the Fl-NTP
Repeat x 36-50 cycles
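The sequencing cycle can be illustrated with a toy simulation: in each cycle exactly one labelled, terminated nucleotide is incorporated opposite the template, it is imaged, and the terminator and dye are removed so the next cycle can proceed. The function name and the convention that the template is given in the order it is read by the polymerase are assumptions made for this sketch.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Toy model of cycle-by-cycle sequencing with reversible terminators.

    `template` is given in the order the polymerase reads it (an assumption
    of this sketch), so base i of the read pairs with base i of the template.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        # 1. All four Fl-NTPs are added; only the complementary one is
        #    incorporated, and its terminator blocks a second incorporation.
        incorporated = COMPLEMENT[template[cycle]]
        # 2. The incorporated Fl-NTP is imaged and recorded.
        read.append(incorporated)
        # 3. Terminator and dye are cleaved (no state to update in this model).
    return "".join(read)


print(sequence_by_synthesis("ACGTAC", cycles=4))
```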
Genome Analyzer II Sequencing
Flow cell imaging
[Figure: Genome Analyzer II imaging set up. Labels: laser, prism, oil, flow cell, fluidics ports, objective lens, camera.]
Tile
50 tiles/column X 2 columns/channel X 8 channels/flow cell
50 MILLION CLUSTERS PER FLOW CELL
20 MICRONS 100 MICRONS
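The tile layout above implies a tile count per flow cell and, combined with the 50 million cluster figure, an average cluster count per tile. The sketch below only multiplies out the numbers on the slide; it assumes clusters are spread evenly over tiles.

```python
# Tile and cluster arithmetic from the slide's figures.
TILES_PER_COLUMN = 50
COLUMNS_PER_CHANNEL = 2
CHANNELS_PER_FLOW_CELL = 8
CLUSTERS_PER_FLOW_CELL = 50_000_000

tiles_per_flow_cell = TILES_PER_COLUMN * COLUMNS_PER_CHANNEL * CHANNELS_PER_FLOW_CELL
# ASSUMPTION: clusters are evenly distributed over all tiles.
clusters_per_tile = CLUSTERS_PER_FLOW_CELL / tiles_per_flow_cell

print(f"{tiles_per_flow_cell} tiles per flow cell")
print(f"~{clusters_per_tile:,.0f} clusters per tile on average")
```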
Genome Analyzer II Sequencing
Base Calling
[Figure: sequential images for cycles 1 to 9, with example reads from two clusters: TTTTTTTGT… and TGCTACGAT…]
The identity of each base of a cluster is read off from sequential images
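Reading a base from each sequential image can be sketched as picking the brightest of four channels per cycle. This is a deliberately simplified illustration, not the instrument's actual base caller: the intensity values are made up, and corrections a real pipeline would need (for example, for overlap between dye signals) are left out.

```python
from typing import Dict, List


def call_bases(cycle_intensities: List[Dict[str, float]]) -> str:
    """Call one base per cycle as the channel with the highest intensity."""
    return "".join(max(cycle, key=cycle.get) for cycle in cycle_intensities)


# Hypothetical intensities for one cluster over three cycles.
example_cluster = [
    {"A": 0.1, "C": 0.0, "G": 0.2, "T": 0.9},  # cycle 1
    {"A": 0.1, "C": 0.1, "G": 0.8, "T": 0.2},  # cycle 2
    {"A": 0.0, "C": 0.9, "G": 0.1, "T": 0.1},  # cycle 3
]

read = call_bases(example_cluster)
print(read)
```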
What comes out today:
– 36bp standard read length; enabled for 50-75bp
– >50 million reads per 8-channel (lane) flowcell; >6.25 million reads per channel
– >1.5GB per standard run; >3GB per paired-end run
– 2 day standard and 4 day paired-end run
– Raw read accuracy of >99.5% (36bp)
– Consensus accuracy of >99.999% (20x depth of coverage)
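To see why deep coverage lifts consensus accuracy so far above raw read accuracy, the sketch below computes the chance that at least half of the reads covering a position are wrong. It is an idealized binomial model under strong assumptions that are not from the slide: errors are independent between reads, and in the worst case all wrong reads agree on the same wrong base.

```python
from math import comb


def majority_error(p_err: float, depth: int) -> float:
    """P(at least half of `depth` reads are wrong at one position).

    ASSUMPTIONS: independent errors; ties are counted as consensus errors;
    all erroneous reads agree on the same wrong base (worst case).
    """
    k_min = (depth + 1) // 2
    return sum(
        comb(depth, k) * p_err**k * (1 - p_err) ** (depth - k)
        for k in range(k_min, depth + 1)
    )


raw_error = 1 - 0.995  # raw read accuracy of 99.5%
consensus_error = majority_error(raw_error, depth=20)
print(f"Idealized consensus error at 20x: {consensus_error:.2e}")
```

Under these assumptions the error is far below the 0.001% implied by >99.999% accuracy. The quoted figure is more conservative, as one would expect when errors are not fully independent and coverage is uneven.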
What comes out at the end of 2008 (Ha!) :
– 36bp → 75bp standard read length
– 50 million → >130 million reads per flowcell; >6.25 million → >16 million reads per channel
– >1.5GB → 10GB per standard run; >3GB → 20GB per paired-end run
– 2 → 3.5 day standard and 4 → 7 day paired-end run
– Raw read accuracy of >99.5% (36bp)
– Consensus accuracy of >99.999% (20x depth of coverage)
Plus improvements in data quality
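The run-yield figures can be checked as reads x read length. The sketch below does this for the current and the projected specification, reading "GB" as gigabases (an interpretation, not stated on the slides). The dictionary keys are illustrative names.

```python
# Yield per standard run = reads per flow cell x read length.
CHANNELS = 8

specs = {
    "today":       {"reads": 50_000_000,  "read_len_bp": 36},
    "end_of_2008": {"reads": 130_000_000, "read_len_bp": 75},
}

# ASSUMPTION: "GB" on the slides means gigabases (10^9 bases).
yield_gb = {
    name: s["reads"] * s["read_len_bp"] / 1e9 for name, s in specs.items()
}
reads_per_channel = {name: s["reads"] / CHANNELS for name, s in specs.items()}

for name in specs:
    print(f"{name}: {yield_gb[name]:.2f} Gb/run, "
          f"{reads_per_channel[name] / 1e6:.2f} M reads/channel")
```

Both results line up with the quoted values: about 1.8 Gb against ">1.5GB" now, and about 9.75 Gb against "10GB" projected.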
What goes in:
DNA fragments + adapters → sequencing library
DNA fragment sources and applications:
- Genomic DNA: genome and directed re-sequencing (SNP/mutation; genome structure re-arrangements; breakpoints; CNVs; methylation pattern); de novo genome sequencing
- ChIP products: transcription factor binding sites; protein complex positioning; methylation patterns
- cDNA: mRNA transcript structure and differential expression; small RNA discovery & differential expression
- ???: ????
454 and SOLiD sequencing template preparation
Fan et al., Nature Reviews Genetics 2006
Library preparation by Emulsion PCR
DNA to be sequenced → single-stranded PCR template → emulsion PCR (single DNA molecules + capture beads + PCR mix) → clonal sequencing template → sequencing chambers
SOLiD: 90bp template fragment size; 1 µm beads; 10-20,000 template copies/bead
454: 300-500bp template fragment size; 30 µm beads; “millions” of template copies/bead
454/Roche
Sequencing-by-Synthesis – pyrosequencing (454)
Sequencing Technologies
ABI 3730xl: ~1 x 10^6 bases per day (at 15 runs/day)
- 800 bases per read and 1250 reads per day
- Cost to sequence a human genome (2007): $4,000,000
454/Roche: ~100 x 10^6 bases per day (at 1 run/day)
- 250 bases per read and 400,000 reads per run
- Cost to sequence a human genome (2007): $1,000,000
Illumina GA II/SOLiD: ~1.5-3.0 x 10^9 bases per run (1 run/3 days)
- 35 bases per read and 40-100 x 10^6 reads per run
- Cost to sequence a human genome (2008): $100,000 (GA2), $60,000 (SOLiD)
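The throughput figures in this comparison can be recomputed from read length, read count and run duration. The sketch below does so using only the slide's numbers; the data structure and field names are illustrative.

```python
# Bases per day = read length x reads per run / days per run.
platforms = {
    "ABI 3730xl": {"read_len": 800, "reads": (1250, 1250), "days": 1},
    "454/Roche": {"read_len": 250, "reads": (400_000, 400_000), "days": 1},
    "Illumina GA II/SOLiD": {"read_len": 35, "reads": (40e6, 100e6), "days": 3},
}


def bases_per_day(p):
    """Return (low, high) bases per day for one platform entry."""
    low, high = p["reads"]
    return (p["read_len"] * low / p["days"], p["read_len"] * high / p["days"])


throughput = {name: bases_per_day(p) for name, p in platforms.items()}

for name, (low, high) in throughput.items():
    if low == high:
        print(f"{name}: {low:.2e} bases/day")
    else:
        print(f"{name}: {low:.2e} to {high:.2e} bases/day")
```

For the short-read platforms, 35 bases x 40-100 x 10^6 reads gives roughly 1.4-3.5 x 10^9 bases per run, close to the quoted ~1.5-3.0 x 10^9.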
The Next Next Generation Technologies
- Complete Genomics (http://www.completegenomics.com): Sequencing of DNA Nano-balls (DNBs) using combinatorial Probe-Anchor Ligation (cPAL)
- Pacific Biosciences (http://www.pacificbiosciences.com): Single Molecule Real Time DNA sequencing based on zero mode waveguides
Complete Genomics – Library Generation
Library construction Template Amplification
Complete Genomics – Sequencing
“Complete Genomics says that by next spring it will be conducting complete genome scans for $5,000.” (BioITWorld.com)