Lectures 20, 21: Single-cell Sequencing and Assembly - PowerPoint PPT Presentation

Lectures 20, 21: Single-cell Sequencing and Assembly Spring 2017 April 20, 25, 2017 1

SINGLE-CELL SEQUENCING AND ASSEMBLY 2

Single-cell Sequencing
• Motivation:
  ◦ The vast majority of environmental bacteria are unculturable outside of their natural habitat.
  ◦ Cell culture may distort genomic information, e.g. cancerous cells.
• Metagenomics:
  ◦ Superimposed sequencing of mixed cells of different species in one pool.
• Single-cell genomics: sequencing of one DNA molecule from one cell.
3

Single Cell Genome Sequencing
• Start with a single copy of the genome.
• Amplify (copy only) the genome using the multiple displacement amplification (MDA) technique. F.B. Dean, et al., PNAS (2002) 99(8): 5261-6
• Fragment the amplified DNA and sequence the fragments from both ends.
4

Multiple Displacement Amplification Video https://www.youtube.com/watch?v=CaFq9cnfTZI 5

Sequencing Coverage
Normal multicell vs. single cell
Green regions are blackout regions.
Number of reads: ~28 million, read length: 100 bp, genome size: 4.6 Mbp, coverage: ~600x
H. Chitsaz, et al., Nature Biotech 29(10): 915-921, Oct 2011
6
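The ~600x figure follows from the numbers on the slide: average coverage is the total number of sequenced bases divided by the genome size. A minimal sketch of that arithmetic (the function name `mean_coverage` is only illustrative):

```python
def mean_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth of coverage = total sequenced bases / genome size."""
    return num_reads * read_length_bp / genome_size_bp


# Numbers from the slide: ~28 million reads of 100 bp on a 4.6 Mbp genome.
coverage = mean_coverage(28_000_000, 100, 4_600_000)
print(f"{coverage:.0f}x")  # prints "609x", i.e. roughly 600x
```

This is only the average: in the single-cell case the same ~600x is spread very unevenly, which is why some regions end up with no reads at all.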

Distribution of Coverage A cutoff threshold will eliminate about 25% of valid data in the single cell case, whereas it eliminates noise in the normal multicell case. H. Chitsaz, et al., Nature Biotech 29(10): 915 – 921, Oct 2011 7

Rescuing Low Coverage Contigs A quick example We remove the lowest coverage contig, in blue. 8

Rescuing Low Coverage Contigs After error removal Merged Contig. Coverage = 9 9

Velvet vs. Velvet-SC

Velvet assembly algorithm
1: Build a roadmap rdmap from R by indexing all k-mers.
2: Build a de Bruijn pregraph pg from rdmap.
3: Clip tips of pg.
4: Build a graph from pg by threading R.
5: Condense graph by merging 1-in 1-out vertices.
6: Clip tips of graph.
7: Correct graph by the Tour Bus algorithm.
8: Remove vertices with average coverage < cutoff
9: Clip tips of graph.
10: Correct graph by the Tour Bus algorithm.
11: Resolve repeats using read pairing.
12: Condense graph by merging 1-in 1-out vertices.
13: Return vertices of graph as contigs.
Daniel Zerbino and Ewan Birney, Genome Research 18: 821-829, 2008

Our assembly algorithm
(a) EULER-SR error correction
(b) Velvet-SC assembly algorithm
1-7: Same as Velvet assembly algorithm.
8: for i = 2 to cutoff do
9: Remove vertices with average coverage < i
10: Clip tips of graph.
11: Correct graph by the Tour Bus algorithm.
12: Resolve repeats using read pairing.
13: Condense graph by merging 1-in 1-out vertices.
14: end for
15: Return vertices of graph as contigs.
Hamidreza Chitsaz, et al., Nature Biotech 29(10): 915-921, Oct 2011
10
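The difference between a single fixed cutoff and the gradually increasing threshold can be illustrated with a toy graph. The sketch below is a simplified model, not the Velvet or Velvet-SC code: it keeps only the "remove low-coverage vertices" and "condense" steps (tip clipping, Tour Bus, and read pairing are left out), treats vertex length as a k-mer count so that lengths add up on merging, and all names (`Vertex`, `condense`, `fixed_cutoff`, `increasing_threshold`) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    length: int      # number of k-mers in the vertex (assumption of this sketch)
    coverage: float  # average k-mer coverage


def remove_low_coverage(vertices, edges, threshold):
    """Drop vertices with average coverage < threshold, and their edges."""
    kept = {n: v for n, v in vertices.items() if v.coverage >= threshold}
    return kept, {(u, v) for (u, v) in edges if u in kept and v in kept}


def condense(vertices, edges):
    """Merge u -> v whenever u has one outgoing edge and v one incoming edge."""
    vertices, edges = dict(vertices), set(edges)
    merged = True
    while merged:
        merged = False
        for (u, v) in sorted(edges):
            if u == v:
                continue
            out_u = sum(1 for e in edges if e[0] == u)
            in_v = sum(1 for e in edges if e[1] == v)
            if out_u == 1 and in_v == 1:
                a, b = vertices[u], vertices[v]
                total = a.length + b.length
                # Length-weighted average coverage of the merged vertex.
                vertices[u] = Vertex(
                    total, (a.length * a.coverage + b.length * b.coverage) / total
                )
                del vertices[v]
                # Outgoing edges of v now leave the merged vertex u.
                edges = {(u if x == v else x, y)
                         for (x, y) in edges if (x, y) != (u, v)}
                merged = True
                break
    return vertices, edges


def fixed_cutoff(vertices, edges, cutoff):
    """One-shot removal with a fixed threshold, then condense."""
    return condense(*remove_low_coverage(vertices, edges, cutoff))


def increasing_threshold(vertices, edges, cutoff):
    """Raise the threshold from 2 to cutoff, recondensing after each removal."""
    for i in range(2, cutoff + 1):
        vertices, edges = remove_low_coverage(vertices, edges, i)
        vertices, edges = condense(vertices, edges)
    return vertices, edges


# Toy graph: B is a valid but poorly covered contig between A and C;
# E1 and E2 are erroneous branches with coverage 1.
toy_vertices = {
    "A": Vertex(100, 20.0), "B": Vertex(50, 3.0), "C": Vertex(100, 20.0),
    "E1": Vertex(10, 1.0), "E2": Vertex(10, 1.0),
}
toy_edges = {("A", "B"), ("A", "E1"), ("B", "C"), ("E2", "C")}

fixed_v, _ = fixed_cutoff(toy_vertices, toy_edges, cutoff=5)
sc_v, _ = increasing_threshold(toy_vertices, toy_edges, cutoff=5)
print(sorted(v.length for v in fixed_v.values()))  # [100, 100]: B is lost
print([v.length for v in sc_v.values()])           # [250]: A, B, C merged
```

With the fixed cutoff, B is deleted together with the errors. With the increasing threshold, the errors go first (at i = 2), B is merged with its well-covered neighbours, and the merged vertex has an average coverage high enough to survive the later iterations.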

E. coli Assembly Results
Known genes: 4324
Assembler    # contigs   NG50 (bp)   Complete genes
EULER-SR     1344        26662       3178
Edena        1592        3919        2425
SOAPdenovo   1240        18468       3021
Velvet       428         22648       3055
E+V-SC       501         32051       3753
NG50 = the contig length such that contigs of that length or longer represent half of the total genome length.
H. Chitsaz, et al., Nature Biotech 29(10): 915-921, Oct 2011
11

New Genome
Deltaproteobacteria single cell assembly results
Assembler   # of contigs   N50 (bp)
Velvet      1856           11531
E+V-SC      823            30293
N50 = the contig length such that contigs of that length or longer represent half of the total assembly length.
12
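N50 and NG50 differ only in the length they are measured against: the total assembly length for N50, the genome length for NG50. A minimal sketch of both, using made-up contig lengths (the helper name `nx50` is illustrative):

```python
def nx50(contig_lengths, reference_length):
    """Length of the contig at which the running total of contigs, taken
    from longest to shortest, first reaches half of reference_length."""
    half = reference_length / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # the contigs do not cover half of the reference length


def n50(contig_lengths):
    """N50: measured against the total assembly length."""
    return nx50(contig_lengths, sum(contig_lengths))


def ng50(contig_lengths, genome_size):
    """NG50: measured against the (known or estimated) genome length."""
    return nx50(contig_lengths, genome_size)


lengths = [80, 70, 50, 40, 30, 20]  # hypothetical contig lengths, total 290
print(n50(lengths))        # 70  (80 + 70 = 150 >= 145)
print(ng50(lengths, 400))  # 50  (80 + 70 + 50 = 200 >= 200)
```

NG50 is the fairer measure when a reference genome size is known, as for E. coli, because an assembler cannot raise it by simply leaving out part of the genome. For a new genome with no reference, only N50 is available.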

October 2011 13

Single-cell Assemblers
• E+Velvet-SC
  ◦ H. Chitsaz, et al., Nature Biotech 29(10): 915-921, Oct 2011.
• SPAdes
  ◦ Anton Bankevich, et al., Journal of Computational Biology 19(5): 455-477, 2012.
• IDBA-UD
  ◦ Y. Peng, et al., Bioinformatics 28(11): 1420-8, 2012.
14

Coverage Bias Not Sequence Specific 15

Combining Multiple MDAs
• Combining DNA from multiple identical single cells, before or after amplification, reduces non-uniformity.
• In practice, combining MDA from 6-12 identical cells gives very high quality assemblies.
• It is hard to pick identical cells before sequencing: a chicken-and-egg problem.
16

Synergistic Co-assembly
• Our solution: co-assembly
  ◦ N. Movahedi, et al., IEEE BiBM 2012.
  ◦ M. Embree, et al., The ISME J. 2013.
  ◦ N. Movahedi, et al., BMC Genomics, under review.
• Another application of co-assembly: variation detection
  ◦ Iqbal, Z., et al. De novo assembly and genotyping of variants using colored de Bruijn graphs, Nat Genetics, 44, 226-232, 2012.
17

SYNERGISTIC CO-ASSEMBLY 18

HyDA Single Cell Co-Assembler
• Isolate a number of single cells that are likely to be of the same species. Don't worry: if they are not, our algorithm will tell you in the end.
• Amplify and sequence each of them individually.
• Assign a unique color to each read dataset.
• Build a colored de Bruijn graph from the colored datasets. Iqbal, et al., Nature Genetics 44: 226-232, 2012. J. Simpson, Genome Informatics 2011.
• Iteratively remove errors, condense, and finally output contigs.
19

Small Toy Example
Shred reads into k-mers (k = 3)
Green read: GGACTAAA
  k-mers: GGA GAC ACT CTA TAA AAA (1x each)
Red read: GACCAAAT
  k-mers: GAC ACC CCA CAA AAA AAT (1x each)
P. Pevzner, J Biomol Struct Dyn (1989) 7:63-73
R. Idury, M. Waterman, J Comput Biol (1995) 2:291-306
20

Small Toy Example
Merge vertices labeled by identical k-mers
Green read: GGA GAC ACT CTA TAA AAA (1x each)
Red read: GAC ACC CCA CAA AAA AAT (1x each)
Resulting graph:
  GGA (green 1x) -> GAC (green 1x, red 1x) -> ACT (green 1x) -> CTA (green 1x) -> TAA (green 1x) -> AAA (green 1x, red 1x) -> AAT (red 1x)
  GAC -> ACC (red 1x) -> CCA (red 1x) -> CAA (red 1x) -> AAA
21
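The toy example can be reproduced in a few lines. The sketch below is not HyDA's implementation: it shreds each read into k-mers, merges identical k-mers into one vertex, and records a separate coverage count per color. Reverse complements are ignored for simplicity, and the function names are illustrative.

```python
from collections import defaultdict


def kmers(read, k):
    """All overlapping substrings of length k, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_colored_graph(reads_by_color, k):
    """Return (coverage, edges): coverage[kmer][color] is a count, and
    edges link k-mers that are consecutive in at least one read."""
    coverage = defaultdict(lambda: defaultdict(int))
    edges = set()
    for color, reads in reads_by_color.items():
        for read in reads:
            ks = kmers(read, k)
            for km in ks:
                coverage[km][color] += 1
            edges.update(zip(ks, ks[1:]))
    return coverage, edges


# The two reads from the slide.
reads = {"green": ["GGACTAAA"], "red": ["GACCAAAT"]}
coverage, edges = build_colored_graph(reads, k=3)

print(len(coverage))           # 10 distinct k-mers (GAC and AAA are shared)
print(dict(coverage["GAC"]))   # {'green': 1, 'red': 1}
print(sorted(e for e in edges if e[0] == "GAC"))  # [('GAC', 'ACC'), ('GAC', 'ACT')]
```

The shared vertices GAC and AAA carry one count per color, and GAC is where the green and red paths diverge before rejoining at AAA.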

Co-assembly
• Condensation is done solely based on graph structure, ignoring colorings.
• Maximum colored coverage is used to determine erroneous sequences.
22
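One way to read the second point: a vertex is judged by its best color, so a sequence that is poorly covered in one cell is kept as long as some other cell covers it well. A minimal sketch under that assumption (the function names, the threshold, and the coverage values are hypothetical):

```python
def max_colored_coverage(coverage_by_color):
    """Highest per-color average coverage of a vertex."""
    return max(coverage_by_color.values(), default=0)


def keep_by_max_colored_coverage(vertices, threshold):
    """Keep vertices whose best color reaches the threshold."""
    return {
        name: cov
        for name, cov in vertices.items()
        if max_colored_coverage(cov) >= threshold
    }


# Hypothetical per-color coverages for three condensed vertices.
vertices = {
    "v1": {"cell1": 40.0, "cell2": 0.5},  # well covered in cell1 only
    "v2": {"cell1": 1.0, "cell2": 1.5},   # low in every cell: likely an error
    "v3": {"cell1": 0.0, "cell2": 12.0},  # blackout in cell1, rescued by cell2
}
kept = keep_by_max_colored_coverage(vertices, threshold=5)
print(sorted(kept))  # ['v1', 'v3']
```

Under this reading, v3 would be discarded if cell1 were assembled alone, but survives in the co-assembly because cell2 covers it.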

Relationships between Co-assembled Sequences Exclusive portion: Exclusivity ratio: Assembly size: 23

Alkane-Degrading Bacterial Community
• An alkane-degrading community enriched from sediment from a hydrocarbon-contaminated ditch in Bremen, Germany.
• Consists of 3 species: Anaerolinea (2 cells), Smithella (6 cells), and Syntrophus (2 cells), which have sophisticated metabolic interactions. They cannot be cultured.
• Finished reference genomes for a member of Anaerolinea and a member of Syntrophus are available.
In collaboration with Karsten Zengler and Mallory Embree at UCSD.
M. Embree, et al., The ISME J., 2013
N. Movahedi, et al., BMC Genomics, under review
24

Co-assembly Results
QUAST results, comparison with the state-of-the-art
                       HyDA                   SPAdes
Species       Cell     Total (bp)   N50       Total (bp)   N50
Syntrophus    K05      1,265,548    3,782     869,586      3,128
Syntrophus    C04      465,091      1,928     390,923      4,234
Smithella     MEL13    1,590,259    6,977     1,415,399    10,475
Smithella     MEK03    1,945,701    5,952     1,960,722    11,372
Smithella     MEB10    1,569,709    5,887     1,514,813    8,861
Smithella     K19      840,236      7,295     653,866      3,834
Smithella     K04      720,188      5,239     618,500      9,332
Smithella     F16      1,323,536    6,088     982,263      5,366
Anaerolinea   F02      1,352,341    8,201     1,698,195    5,944
Anaerolinea   A17      260,386      850       169,413      1,187
25

Co-assembly Results
RAST functional elements
                       HyDA                          SPAdes
Species       Cell     Coding sequence   subsystem   Coding sequence   subsystem
Anaerolinea   A17      212               8           146               9
Anaerolinea   F02      1,283             122         1,653             153
Smithella     F16      1,197             117         899               91
Smithella     K04      659               89          559               75
Smithella     K19      757               82          581               54
Smithella     MEB10    1,491             151         1,504             156
Smithella     MEK03    1,856             180         1,955             200
Smithella     MEL13    1,535             165         1,435             154
Syntrophus    C04      416               48          375               49
Syntrophus    K05      1,216             121         873               68
26

Co-assembly Results
Exclusivity ratios (%)
Columns: Anaerolinea (A17, F02), Smithella (F16, K04, K19, MEB10, MEK03, MEL13), Syntrophus (C04, K05)
               A17   F02   F16   K04   K19   MEB10   MEK03   MEL13   C04   K05
Ana.   A17     0     24    87    95    96    80      82      86      22    19
Ana.   F02     77    0     96    98    99    95      95      96      74    73
Smi.   F16     96    96    0     73    73    37      22      38      96    55
Smi.   K04     97    97    49    0     67    42      25      45      97    57
Smi.   K19     98    98    54    68    0     35      32      32      98    58
Smi.   MEB10   96    96    48    74    69    0       24      39      95    56
Smi.   MEK03   97    97    49    73    74    38      0       37      96    61
Smi.   MEL13   97    97    50    76    68    39      22      0       97    59
Syn.   C04     44    39    89    96    97    85      86      90      0     64
Syn.   K05     77    75    54    76    75    45      41      49      73    0
27

HyDA Outline
1. Index all distinct k-mers, storing their multiplicities and connections, in a hash table. Each hash node is a self-balancing tree.
2. Construct the condensed de Bruijn graph.
3. Iteratively remove low coverage vertices and recondense.
4. Under development and future work: resolve repeats using long reads and mate pairs.
28
