Single-cell RNA-sequencing Ximena Ibarra-Soria CRUK Cambridge - PowerPoint PPT Presentation

Single-cell RNA-sequencing
Ximena Ibarra-Soria, CRUK Cambridge Institute
RNA-Sequence Analysis Course, EMBL-EBI, 12 April 2019
Based on materials by Aaron Lun

Why use single-cell RNA-seq
The cell is the basic unit of life. Single-cell RNA-seq allows us to characterise heterogeneity in the gene expression profiles of a population of cells. At the cell level we can:
◮ Define cell identities (e.g. cell types or subtypes).
◮ Observe cell states and behaviour (e.g. cell cycle, metabolism, stress).
◮ Study dynamic processes (e.g. differentiation, activation).
◮ Study noise in transcriptional regulation.
Sources: doi.org/10.1038/nri2707, doi.org/10.15252/msb.20145549

Why use single-cell RNA-seq
RNA-seq allows quantification of the whole transcriptome, whereas other single-cell methods measure a limited set of molecules:
◮ FISH: small number of transcripts.
◮ seqFISH+: 10,000 genes.
◮ FACS: small number of proteins.
◮ Mass cytometry: ~40 proteins.

A typical single-cell experiment
[Figure: workflow from tissue to dissociated single cells, physical separation, lysis, reverse transcription with an oligo-dT primer, cell barcoding, pooled sequencing and computational analysis.]
◮ Dissociation can be easy or hard (e.g. blood vs muscle).
◮ Many separation methods exist (plate-based, droplets).
◮ Different protocols exist for RT and cDNA generation (full-length, 5'/3'-biased).

Throughput of scRNA-seq protocols
scRNA-seq protocols have increased hugely in throughput, driven by:
◮ Cell separation using FACS or microfluidic devices.
◮ Automation of RT and cDNA generation.
[Figure: (a) technologies that scaled up scRNA-seq: manual handling, multiplexing, integrated fluidic circuits, liquid-handling robotics, nanodroplets, picowells and in situ barcoding (Tang et al. 2009; Islam et al. 2011; Brennecke et al. 2013; Jaitin et al. 2014; Klein et al. 2015; Macosko et al. 2015; Bose et al. 2015; Cao et al. 2017; Rosenberg et al. 2017). (b) Number of single cells in each study against study publication date (2009 to 2017), on a log scale from 1 to 1,000,000 cells; protocols shown include SMART-seq, SMART-seq2, STRT-seq, CEL-seq, Fluidigm C1, MARS-seq, CytoSeq, Drop-seq, inDrop, Seq-Well, DroNC-seq, 10x Genomics, sci-RNA-seq and SPLiT-seq.]
Svensson et al., Nature Protocols, 2018 (doi.org/10.1038/nprot.2017.149)

scRNA-seq protocols
Plate-based:
◮ Hundreds to a few thousand cells.
◮ High number of genes detected.
◮ High capture efficiency.
◮ Full-length transcripts.
◮ UMIs optional.
◮ Compatible with spike-ins.
Droplet-based:
◮ Tens to hundreds of thousands of cells.
◮ Lower number of genes detected.
◮ Variable capture efficiency.
◮ 3'/5'-biased.
◮ UMIs used.
◮ No spike-ins.

Library prep
◮ Most protocols use polyA selection.
◮ cDNA is amplified by PCR.
◮ PCR introduces strong biases, because some molecules are amplified more efficiently than others.
◮ These biases are alleviated by the use of Unique Molecular Identifiers (UMIs): each transcript is tagged with a random UMI during reverse transcription, so all PCR copies of one molecule share the same UMI and can be collapsed to a single count.
[Figure: after RT, each transcript carries a UMI and a constant sequence (e.g. 3'-ACATCGATCGC...TTTT-GGAT-AACGT-5'); after amplification there are many copies per molecule; fragmentation happens at a different site per amplicon; after sequencing, read 1 contains the UMI and read 2 the cDNA fragment, so copies of the same molecule share a UMI.]
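To illustrate why UMIs alleviate PCR bias, here is a minimal Python sketch with made-up numbers (the genes, UMIs and amplification factors are hypothetical, not data from the slides). Two genes each have two captured molecules, but one gene's molecules amplify five times more efficiently. Counting reads exaggerates the difference; counting distinct UMIs recovers the number of molecules.

```python
from collections import defaultdict

# Hypothetical captured molecules after RT: (gene, UMI) pairs.
molecules = [("GeneA", "GGAT"), ("GeneA", "GCTT"), ("GeneB", "AGTA"), ("GeneB", "TTCG")]

# Hypothetical PCR copies per molecule: GeneA amplifies 5x more efficiently than GeneB.
copies_per_molecule = {"GeneA": 10, "GeneB": 2}

# Simulate sequencing every PCR copy: each read keeps the gene and UMI of its source molecule.
reads = [(gene, umi) for gene, umi in molecules for _ in range(copies_per_molecule[gene])]

def count_reads(reads):
    """Naive quantification: one count per read, so PCR duplicates are counted repeatedly."""
    counts = defaultdict(int)
    for gene, _umi in reads:
        counts[gene] += 1
    return dict(counts)

def count_umis(reads):
    """UMI-based quantification: reads sharing a gene and UMI collapse to one molecule."""
    umis = defaultdict(set)
    for gene, umi in reads:
        umis[gene].add(umi)
    return {gene: len(umi_set) for gene, umi_set in umis.items()}

print(count_reads(reads))  # {'GeneA': 20, 'GeneB': 4}
print(count_umis(reads))   # {'GeneA': 2, 'GeneB': 2}
```

Real pipelines also have to handle sequencing errors within UMIs, which this sketch ignores.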

Cell barcoding
Cell barcodes allow multiplexing, so that many libraries can be sequenced in the same lane. Different strategies:
1. Cell barcode in the PCR primer.
◮ Incorporated during library prep.
◮ Plate-based methods only (different barcode per well).
2. Cell barcode in the oligo-dT primer.
◮ Each bead carries oligos whose cell barcode is constant within the bead and whose UMI varies within the bead, e.g. CGACTA-NNNN-TTTTTTTT-3' on one bead and GTCAAA-NNNN-TTTTTTTT-3' (a different cell barcode) on another.
◮ One bead is loaded per droplet, together with at most one cell (hopefully).
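For the bead-based strategy, pooled reads are assigned back to cells by reading the cell barcode and UMI off read 1. This Python sketch assumes the toy oligo layout drawn above (6-nt cell barcode, then 4-nt UMI, then poly-T); the lengths, the function names and the whitelist are illustrative assumptions, since real chemistries define their own read structure.

```python
from collections import defaultdict

# Assumed toy read-1 layout: [6-nt cell barcode][4-nt UMI][poly-T ...].
BARCODE_LEN = 6
UMI_LEN = 4

def parse_read1(seq, barcode_len=BARCODE_LEN, umi_len=UMI_LEN):
    """Split a barcode read into (cell barcode, UMI)."""
    if len(seq) < barcode_len + umi_len:
        raise ValueError(f"read too short: {seq!r}")
    return seq[:barcode_len], seq[barcode_len:barcode_len + umi_len]

def demultiplex(read_pairs, whitelist):
    """Group cDNA reads (read 2) by cell barcode, keeping each read's UMI.

    Reads whose barcode is not in the whitelist of known bead barcodes are dropped.
    """
    per_cell = defaultdict(list)
    for read1, read2 in read_pairs:
        barcode, umi = parse_read1(read1)
        if barcode in whitelist:
            per_cell[barcode].append((umi, read2))
    return dict(per_cell)

# Hypothetical pooled read pairs from two beads, plus one read with an unknown barcode.
pairs = [
    ("CGACTAGGATTTTT", "GTCAGCATGC"),
    ("CGACTAGCTTTTTT", "GAGCCTTTTA"),
    ("GTCAAAAGTATTTT", "GGCCCCTCCT"),
    ("NNNNNNAGTATTTT", "GAAAATACTC"),
]
cells = demultiplex(pairs, whitelist={"CGACTA", "GTCAAA"})
print({bc: len(reads) for bc, reads in cells.items()})  # {'CGACTA': 2, 'GTCAAA': 1}
```

Exact matching against the whitelist keeps the sketch simple; real pipelines typically also tolerate sequencing errors in barcodes.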

scRNA-seq data
In its rawest form, scRNA-seq data consists of FASTQ files from Illumina sequencing.
1. Align reads to a reference genome.
◮ Many good and fast aligners exist (e.g. subread, STAR).
2. Count the number of reads mapped to each gene (e.g. HTSeq, featureCounts).
This produces a count matrix with one count per gene per cell.
◮ If UMIs are used, reads with the same UMI (for the same gene and cell) are collapsed to a single count.
◮ Data generated with the 10x Genomics platform can be processed with CellRanger.
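The counting step can be sketched in Python, assuming reads have already been aligned and assigned to a gene, cell barcode and UMI (the alignment step is not shown). The function name and the records are hypothetical. With UMIs, a record is counted only once per (cell, gene, UMI) combination.

```python
from collections import defaultdict

def build_count_matrix(assigned_reads, use_umis=True):
    """Build a genes x cells count matrix from (cell_barcode, gene, umi) records.

    With use_umis=True, reads sharing the same cell, gene and UMI count once.
    Only genes and cells that appear in the records get a row or column.
    """
    counts = defaultdict(int)
    seen = set()
    for cell, gene, umi in assigned_reads:
        key = (cell, gene, umi)
        if use_umis and key in seen:
            continue  # PCR duplicate of a molecule that was already counted
        seen.add(key)
        counts[(gene, cell)] += 1
    genes = sorted({gene for gene, _ in counts})
    cells = sorted({cell for _, cell in counts})
    matrix = [[counts.get((g, c), 0) for c in cells] for g in genes]
    return genes, cells, matrix

# Hypothetical records for two cells.
records = [
    ("CGACTA", "Actb", "GGAT"),
    ("CGACTA", "Actb", "GGAT"),   # duplicate read of the same molecule
    ("CGACTA", "Actb", "GCTT"),
    ("CGACTA", "Gapdh", "AGTA"),
    ("GTCAAA", "Gapdh", "AGTA"),  # same UMI in a different cell: a different molecule
]
genes, cells, matrix = build_count_matrix(records)
print(genes, cells, matrix)  # ['Actb', 'Gapdh'] ['CGACTA', 'GTCAAA'] [[2, 0], [1, 1]]
```

A real count matrix would include every annotated gene, so most rows would be all zeros for any given cell.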

scRNA-seq data
A typical scRNA-seq count matrix covers ~10,000-40,000 genes and ~100-1,000 cells.
◮ Lots of zeros, arising both from dropouts (transcripts present in the cell but not detected) and from genuine lack of expression.
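A zero in the matrix cannot, on its own, distinguish a dropout from true absence of expression, but the overall sparsity is easy to measure. This Python sketch uses a hypothetical 4-gene by 3-cell matrix (genes as rows, cells as columns, an assumed orientation) to compute the fraction of zero entries and the number of genes detected per cell.

```python
def fraction_zeros(matrix):
    """Fraction of entries in a genes x cells count matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total

def genes_detected_per_cell(matrix):
    """Number of genes with a non-zero count in each cell (column)."""
    n_cells = len(matrix[0])
    return [sum(1 for row in matrix if row[j] > 0) for j in range(n_cells)]

# Hypothetical counts: 4 genes (rows) x 3 cells (columns).
toy = [
    [0, 3, 0],
    [1, 0, 0],
    [0, 0, 0],
    [5, 2, 0],
]
print(round(fraction_zeros(toy), 3))  # 0.667
print(genes_detected_per_cell(toy))   # [2, 2, 0]
```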

scRNA-seq data
[Figure: normalized read counts of technical replicate 1 (x-axis) against technical replicate 2 (y-axis), log scale, for (a) 5,000 pg, (b) 500 pg, (c) 50 pg and (d) 10 pg of input RNA.]
Brennecke et al., Nat Methods, 2013

scRNA-seq data analysis
Aim: to extract real biology from data with technical noise.
1. Quality control.
2. Normalisation of cell-specific biases.
3. Batch correction.
4. Modelling technical noise.
5. Dimensionality reduction and visualisation.
6. Clustering.
... followed by higher-level analyses and interpretation.

Quality control
Removal of low-quality cells arising from:
◮ Insufficient sequencing.
◮ Failed reverse transcription.
◮ Cells damaged during dissociation.

Quality control
We use several metrics to identify low-quality cells; the value in parentheses indicates a problem:
◮ Total number of reads per cell (low).
◮ Total number of genes detected (low).
◮ Percentage of reads mapped to mitochondrial genes (high).
◮ Percentage of reads mapped to spike-in transcripts (high).
[Schematic: coverage of non-mitochondrial, mitochondrial and spike-in transcripts in an intact cell, a damaged cell and an extremely damaged cell.]
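The four metrics above can be computed per cell from its counts. This is a Python sketch, not the implementation of any particular QC package. It assumes mitochondrial genes carry an "mt-" prefix and spike-ins an "ERCC-" prefix (naming conventions that depend on the annotation). It also assumes total counts include spike-ins while "genes detected" counts only endogenous genes. The cell itself is hypothetical.

```python
def qc_metrics(counts, mito_genes, spike_ins):
    """Compute per-cell QC metrics from a {feature: count} dict for a single cell.

    mito_genes and spike_ins are sets of feature names. genes_detected counts
    endogenous features (spike-ins excluded) with a non-zero count.
    """
    total = sum(counts.values())
    detected = sum(1 for gene, c in counts.items() if c > 0 and gene not in spike_ins)
    mito = sum(c for gene, c in counts.items() if gene in mito_genes)
    spike = sum(c for gene, c in counts.items() if gene in spike_ins)
    return {
        "total_counts": total,
        "genes_detected": detected,
        # Guard against empty cells to avoid division by zero.
        "pct_mito": 100 * mito / total if total else 0.0,
        "pct_spike_in": 100 * spike / total if total else 0.0,
    }

# Hypothetical counts for one mouse cell; the "mt-" and "ERCC-" prefixes are assumptions.
cell = {"Actb": 40, "Gapdh": 30, "Sox2": 0, "mt-Co1": 15, "mt-Nd1": 5, "ERCC-00002": 10}
mito = {g for g in cell if g.startswith("mt-")}
spikes = {g for g in cell if g.startswith("ERCC-")}
print(qc_metrics(cell, mito, spikes))
# {'total_counts': 100, 'genes_detected': 4, 'pct_mito': 20.0, 'pct_spike_in': 10.0}
```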

Quality control
[Figure: QC metrics log10_total_counts and total_features_by_counts per cell, for samples 2383, 2384, 2677 and 2739.]
Data from Messmer et al., Cell Reports (2019).

Quality control
[Figure: QC metrics pct_counts_Mt and pct_counts_ERCC per cell, for samples 2383, 2384, 2677 and 2739.]
Data from Messmer et al., Cell Reports (2019).

Quality control
How to define low quality?
1. Define fixed thresholds, e.g. at least 100,000 counts per cell.
◮ Simple, easy to interpret.
◮ Hard to generalise across data sets.
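Fixed thresholds amount to a simple filter over per-cell metrics. In this Python sketch the 100,000-count minimum follows the example above. The minimum number of genes detected (1,000) and the maximum mitochondrial percentage (10%) are illustrative assumptions, as are the cell names and metric values. The need to tune such numbers for each data set is exactly why fixed thresholds generalise poorly.

```python
def passes_fixed_qc(metrics, min_total_counts=100_000, min_genes_detected=1_000, max_pct_mito=10.0):
    """Return True if a cell passes every fixed QC threshold.

    min_total_counts follows the example threshold on the slide; the other two
    defaults are illustrative assumptions that must be tuned for each data set.
    """
    return (
        metrics["total_counts"] >= min_total_counts
        and metrics["genes_detected"] >= min_genes_detected
        and metrics["pct_mito"] <= max_pct_mito
    )

# Hypothetical per-cell metrics.
cells = {
    "cell_1": {"total_counts": 250_000, "genes_detected": 4_000, "pct_mito": 4.0},
    "cell_2": {"total_counts": 60_000, "genes_detected": 2_500, "pct_mito": 5.0},    # too few counts
    "cell_3": {"total_counts": 180_000, "genes_detected": 3_000, "pct_mito": 35.0},  # high mito: likely damaged
}
kept = [name for name, m in cells.items() if passes_fixed_qc(m)]
print(kept)  # ['cell_1']
```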
