Ultra high throughput DNA sequencing technologies Keith Harshman - PowerPoint PPT Presentation

Ultra high throughput DNA sequencing technologies Keith Harshman DNA Array Facility Center for Integrative Genomics University of Lausanne

Outline:
1. What UHTS is replacing: Sanger sequencing/CE
2. Current UHTS “next generation” technologies:
   a. Illumina Genome Analyzer II (aka “Solexa”)
   b. Applied Biosystems’ SOLiD
   c. 454
3. Some next next generation technologies
4. Some next next next generation technologies

Human Genome Re-sequencing using the Sanger Method: 5.3x coverage; $2,000,000-$4,000,000; 27,000,000 AB 3730 reads; ~15,000,000 plasmid preps

Enter UHTS (following a brief performance by MPSS)

Ultra high throughput/output:
3730: ~1 x 10^6 bases/day (12 x 96-sample runs/day; 900bp reads)
Genome Analyzer II: ~2 x 10^9 bases/run = ~670 x 10^6 bases/day (35bp reads)
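The per-day figures on this slide follow from simple arithmetic, sketched below. The 3-day run length for the Genome Analyzer II is an assumption borrowed from the "1 run/3 days" figure quoted later in the talk, and the function and variable names are illustrative only.

```python
def bases_per_day(runs_per_day, reads_per_run, read_length_bp):
    """Bases produced per day by one instrument."""
    return runs_per_day * reads_per_run * read_length_bp

# AB 3730: 12 runs/day, 96 samples (reads) per run, 900 bp per read
sanger_per_day = bases_per_day(12, 96, 900)

# Genome Analyzer II: ~2 x 10^9 bases per run
ga2_bases_per_run = 2e9
ga2_days_per_run = 3  # assumption: about 3 days per run
ga2_per_day = ga2_bases_per_run / ga2_days_per_run

print(f"3730:  {sanger_per_day:,} bases/day")
print(f"GA II: {ga2_per_day:,.0f} bases/day")
print(f"GA II / 3730 ratio: ~{ga2_per_day / sanger_per_day:.0f}x")
```

Under these assumptions the two instruments differ by roughly two to three orders of magnitude in raw bases per day, which is the point of the slide.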

25 x 35bp reads ≠ 1 x 900bp read

Illumina Genome Analyzer II

Sequencing Process
1. Library prep (~6 hrs): Fragment DNA; Repair ends / Add A overhang; Ligate adapters; Select ligated DNA
2. Automated Cluster Generation (~5 hrs), 1-8 samples: Hybridize to flow cell; Extend hybridized oligos; Perform bridge amplification
3. Sequencing (~48 to 72 hrs), 1-8 samples: Perform sequencing; Generate base calls
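As a rough sketch of the overall turnaround, the step durations on this slide can be summed. This assumes the three steps run back to back with no idle time between them, which the slide does not state.

```python
# Step durations in hours as (minimum, maximum), taken from the slide
steps = {
    "Library prep": (6, 6),
    "Cluster generation": (5, 5),
    "Sequencing": (48, 72),
}

# Assumption: steps are strictly sequential, with no waiting time in between
total_min = sum(low for low, high in steps.values())
total_max = sum(high for low, high in steps.values())

print(f"Total: ~{total_min} to {total_max} hrs "
      f"({total_min / 24:.1f} to {total_max / 24:.1f} days)")
```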

Genomic DNA Library Prep: DNA fragments; Blunting by fill-in and exonuclease; Phosphorylation; Addition of A-overhang; Ligation to adapters

Cluster generation: Cluster Station • Aspirates DNA samples into flow cell • Automates the formation of amplified clonal clusters from the DNA single molecules [Figure labels: Flow cell (clamped into place); DNA libraries]

Flow cell: 8 channels; surface of flow cell coated with a lawn of oligo pairs. Key to the simplified workflow • Clonal clusters are generated in a contained environment (need no clean rooms) • Sequencing also performed in the flow cell on the generated clusters

Cluster generation: Hybridize fragment & extend. > 50 M single molecules hybridize to the lawn of primers. Bound molecules are then extended by polymerases. [Figure labels: Adapter sequence; 3’ extension]

Cluster generation: Denature double-stranded DNA. Double-stranded molecule is denatured. Original template is washed away. Newly synthesized strand is covalently attached to the flow cell surface. [Figure labels: Newly synthesized strand; Original template; discard]

Cluster generation: Covalently bound spatially separated single molecules Single molecules bound to flow cell in a random pattern

Cluster generation: Bridge amplification Single-strand flips over to hybridize to adjacent primers to form a bridge. Hybridized primer is extended by polymerases.

Cluster generation: Bridge amplification Double-stranded bridge is formed.

Cluster generation: Bridge amplification Double-stranded bridge is denatured. Result: Two copies of covalently bound single-stranded templates.

Cluster generation: Bridge amplification Single-strands flip over to hybridize to adjacent primers to form bridges. Hybridized primer is extended by polymerase.

Cluster generation: Bridge amplification Bridge amplification cycle repeated till multiple bridges are formed
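The repeated bridge, extend and denature cycle described on the preceding slides can be sketched as an idealized doubling. This is a toy model only: it assumes every bound strand forms a bridge and is copied in every cycle, and the cycle counts used are hypothetical since the slides do not give one.

```python
def strands_after_cycles(n_cycles, start=1):
    """Idealized number of covalently bound strands in one cluster.

    Assumes perfect efficiency: each strand bridges to an adjacent
    primer, is copied, and is denatured into two bound strands per cycle.
    """
    strands = start
    for _ in range(n_cycles):
        strands *= 2  # one bridge amplification cycle doubles the strands
    return strands

# Hypothetical cycle counts, for illustration only
for n in (1, 2, 10):
    print(f"{n} cycle(s): {strands_after_cycles(n)} strands")
```

One cycle gives the "two copies" shown on the slides; real clusters grow more slowly than this model because not every strand is copied in every cycle.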

Cluster generation: dsDNA bridges denatured. Reverse strands cleaved and washed away…

Cluster generation … leaving a cluster with forward strands only.

Cluster generation Free 3’ ends are blocked to prevent unwanted DNA priming.

Sequencing: Sequencing primer is hybridized to adapter sequence. [Figure label: Sequencing primer]

Genome Analyzer II Sequencing: Hybridize sequencing primer → Add 4 Fl-NTP’s + Polymerase → Incorporated Fl-NTP is imaged → Terminator and fluorescent dye are cleaved from the Fl-NTP → repeat X 36 - 50

Flow cell imaging Fluidics port Flow cell Prism laser Fluidics port

Genome Analyzer II imaging set up: 50 tiles/column X 2 columns/channel X 8 channels/flow cell [Figure labels: camera; Obj. lens; tile; flowcell; oil; prism; laser]

Genome Analyzer II Sequencing 50 MILLION CLUSTERS PER FLOW CELL 20 MICRONS 100 MICRONS
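The imaging layout and cluster count can be combined into a quick estimate of how many clusters each tile image contains. This is a sketch that assumes clusters are spread evenly over all tiles, which the slides do not claim.

```python
# Figures taken from the imaging set-up and sequencing slides
TILES_PER_COLUMN = 50
COLUMNS_PER_CHANNEL = 2
CHANNELS_PER_FLOW_CELL = 8
CLUSTERS_PER_FLOW_CELL = 50_000_000

tiles_per_flow_cell = (
    TILES_PER_COLUMN * COLUMNS_PER_CHANNEL * CHANNELS_PER_FLOW_CELL
)

# Assumption: clusters are distributed uniformly across tiles
clusters_per_tile = CLUSTERS_PER_FLOW_CELL / tiles_per_flow_cell

print(f"Tiles per flow cell: {tiles_per_flow_cell}")
print(f"Clusters per tile:   {clusters_per_tile:,.0f}")
```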

Base Calling: The identity of each base of a cluster is read off from sequential images. [Figure: sequential cycle images of clusters, with example reads T G C T A C G A T … and T T T T T T T G T …]
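A minimal sketch of the idea, assuming each imaging cycle yields one intensity value per base channel for a cluster and the brightest channel is called. This is a simplification for illustration and not the instrument's actual base-calling algorithm; the intensity values are made up.

```python
BASES = "ACGT"

def call_bases(intensities):
    """Call one base per cycle for a single cluster.

    intensities: list of per-cycle tuples (A, C, G, T) of signal values.
    Simplified rule: the channel with the highest signal wins.
    """
    read = []
    for cycle in intensities:
        brightest = max(range(len(BASES)), key=lambda i: cycle[i])
        read.append(BASES[brightest])
    return "".join(read)

# Hypothetical intensities for one cluster over four cycles
cluster = [
    (0.1, 0.0, 0.2, 0.9),  # T channel brightest
    (0.1, 0.1, 0.8, 0.0),  # G channel brightest
    (0.0, 0.9, 0.1, 0.1),  # C channel brightest
    (0.2, 0.1, 0.0, 0.7),  # T channel brightest
]
print(call_bases(cluster))  # TGCT
```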

What comes out today:
– 36bp standard read length; enabled for 50-75bp
– >50 million reads per 8-channel (lane) flowcell; >6.25 million reads per channel
– >1.5 Gb (gigabases) per standard run; >3 Gb per paired-end run
– 2 day standard and 4 day paired-end run
– Raw read accuracy of >99.5% (36bp)
– Consensus accuracy of >99.999% (20x depth of coverage)
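The output figures above are mutually consistent, as the sketch below shows. The human genome size used for the coverage estimate is an assumption (roughly 3.1 x 10^9 bp), and the estimate ignores unusable or unmappable reads.

```python
reads_per_flow_cell = 50_000_000
channels = 8
read_length_bp = 36

reads_per_channel = reads_per_flow_cell / channels
bases_per_run = reads_per_flow_cell * read_length_bp

# Assumption: approximate haploid human genome size, not from the slides
HUMAN_GENOME_BP = 3.1e9
runs_for_20x = 20 * HUMAN_GENOME_BP / bases_per_run

print(f"Reads per channel: {reads_per_channel:,.0f}")
print(f"Bases per standard run: {bases_per_run / 1e9:.1f} Gb")
print(f"Standard runs for 20x human coverage: ~{runs_for_20x:.0f}")
```

At 36bp and 50 million reads, a standard run yields about 1.8 gigabases, in line with the ">1.5" figure quoted for a standard run.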

What comes out at the end of 2008 (Ha!):
– 36bp → 75bp standard read length
– 50 million → >130 million reads per flowcell; >6.25 million → >16 million reads per channel
– >1.5 Gb → 10 Gb (gigabases) per standard run; >3 Gb → 20 Gb per paired-end run
– 2 → 3.5 day standard and 4 → 7 day paired-end run
– Raw read accuracy of >99.5% (36bp)
– Consensus accuracy of >99.999% (20x depth of coverage)
Plus improvements in data quality

What goes in: DNA Fragments + Adapters → Sequencing Library
DNA fragment sources: Applications
• Genomic DNA
  - Genome and directed re-sequencing: SNP/mutation; genome structure re-arrangements; breakpoints; CNVs; methylation pattern
  - Genome sequencing: de novo genome sequencing
• ChIP products: transcription factor binding sites; protein complex positioning; methylation patterns
• cDNA: mRNA transcript structure and differential expression; small RNA discovery & differential expression
• ???: ????

454 and SOLiD sequencing template preparation

Library preparation by Emulsion PCR [Figure labels: DNA to be sequenced; Single DNA molecules + capture beads; Single-stranded PCR template + PCR mix; Emulsion PCR; Clonal sequencing template; Sequencing Chambers. Source: Fan et al., Nature Reviews Genetics 2006]
SOLiD: 90bp template fragment size; 1 µm beads; 10,000-20,000 template copies/bead
454: 300-500bp template fragment size; 30 µm beads; “millions” of template copies/bead

454/Roche

Sequencing-by-Synthesis – pyrosequencing (454)

Sequencing Technologies
ABI 3730xl: ~1 x 10^6 bases per day (at 15 runs/day); 800 bases per read and 1250 reads per day; cost to sequence a human genome (2007): $4,000,000
454/Roche: ~100 x 10^6 bases per day (at 1 run/day); 250 bases per read and 400,000 reads per run; cost to sequence a human genome (2007): $1,000,000
Illumina GA II/SOLiD: ~1.5–3.0 x 10^9 bases per run (1 run/3 days); 35 bases per read and 40-100 x 10^6 reads per run; cost to sequence a human genome (2008): $100,000 (GA2), $60,000 (SOLiD)
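The throughput figures in this comparison can be cross-checked by multiplying read length by read count, as sketched below. The dictionary layout and names are illustrative; the numbers are the ones on the slide.

```python
# name: (bases per read, reads per period, period described on the slide)
platforms = {
    "ABI 3730xl": (800, 1250, "day"),
    "454/Roche": (250, 400_000, "run (1 run/day)"),
    "Illumina GA II/SOLiD (low)": (35, 40_000_000, "run (1 run/3 days)"),
    "Illumina GA II/SOLiD (high)": (35, 100_000_000, "run (1 run/3 days)"),
}

# Bases per period = bases per read x reads per period
yields = {
    name: read_len * reads
    for name, (read_len, reads, _period) in platforms.items()
}

for name, (_read_len, _reads, period) in platforms.items():
    print(f"{name}: {yields[name]:.2e} bases per {period}")
```

The products come out at 1 x 10^6, 100 x 10^6 and about 1.4 to 3.5 x 10^9 bases, roughly in line with the rounded per-day and per-run figures quoted above.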

The Next Next Generation Technologies • Complete Genomics (http://www.completegenomics.com): Sequencing of DNA Nano-balls (DNBs) using combinatorial Probe-Anchor Ligation (cPAL) • Pacific Biosciences (http://www.pacificbiosciences.com): Single Molecule Real Time DNA sequencing based on zero mode waveguides

Complete Genomics – Library Generation Library construction Template Amplification

Complete Genomics – Sequencing Sequencing surface Sequencing chemistry “Complete Genomics says that by next spring it will be conducting complete genome scans for $5,000.” -BioITWorld.com 6 January 2009

Pacific Biosciences – Sequencing vessel and method

Pacific Biosciences – Sequencing

Nanopore Sequencing: the Next Next Next Generation Sequencing Technology (?)
