
Quality Control of scRNAseq data - Åsa Björklund



  1. Quality Control of scRNAseq data Åsa Björklund asa.bjorklund@scilifelab.se

  2. Outline • Background on transcriptional bursting & drop-outs • Experimental setup – what could go wrong? • Spike-in RNAs • Quality control metrics • PCA for quality control

  3. Transcriptional bursting • Burst frequency and size are correlated with mRNA abundance • Many TFs have low mean expression (and low burst frequency) and will only be detected in a fraction of the cells (Suter et al. Science 2011)

  4. Bursting, drop-outs and amplification bias [Figure: stochastic gene expression, showing how the relative proportions of Genes 1-4 differ between bulk RNAseq and the single-cell steps. Step labels: Dissociate, Reverse transcription, Amplification, Bulk RNAseq. Annotations: may have bias due to cell type/state; bias due to length, structure, GC-content; "random" selection of 10-40% of mRNAs - drop-outs.]

  5. Transcript drop-out vs bursting • Drop-out: a transcript is present in the cell but is not converted to cDNA and therefore not detected. • Transcriptional bursting: a transcript is expressed in most cells of the cell type, but not in every cell. • Lowly expressed transcripts have a lower chance of detection and most likely a low burst frequency, so it is hard to distinguish drop-out from bursting.
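Why the two effects are hard to tell apart can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the function name `simulate_detection`, the burst probability, the fixed number of molecules in "on" cells and the capture probability are all hypothetical values chosen for illustration, not figures from the slides.

```python
import random

def simulate_detection(n_cells, burst_on_prob, molecules_when_on, capture_prob, seed=0):
    """Return (fraction of cells with no transcript, fraction with none detected)."""
    rng = random.Random(seed)
    true_zero = 0
    observed_zero = 0
    for _ in range(n_cells):
        # Bursting: the gene is only "on" in a subset of cells.
        molecules = molecules_when_on if rng.random() < burst_on_prob else 0
        # Drop-out: each molecule is captured independently with capture_prob.
        detected = sum(1 for _ in range(molecules) if rng.random() < capture_prob)
        true_zero += molecules == 0
        observed_zero += detected == 0
    return true_zero / n_cells, observed_zero / n_cells

true_zero, observed_zero = simulate_detection(
    n_cells=2000, burst_on_prob=0.5, molecules_when_on=3, capture_prob=0.2
)
print(f"cells with no transcript (bursting): {true_zero:.2f}")
print(f"cells with no detected transcript:   {observed_zero:.2f}")
```

The observed zeros mix both causes: from the count matrix alone, a zero from a cell where the gene was off looks identical to a zero from a cell where the transcript was present but not captured.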

  6. Problems compared to bulk RNA-seq • Amplification bias • Drop-out rates • Transcriptional bursting • Background noise • Bias due to cell cycle, cell size and other factors • Often clear batch effects (Kharchenko et al. Nature Methods 2014)

  7. Experimental setup • Cell dissociation • Single cell capture • Single cell lysis • Reverse transcription • Preamplification • Library preparation and sequencing

  8. Experimental setup It is critical to have healthy whole cells with no RNA leakage. Tissues can be dissociated with mechanical methods, detergents or enzymatic digestion. Keep the time from dissociation to cell capture short to reduce the effect on transcriptional state. PROBLEMS: • Incomplete dissociation can leave multiple cells sticking together. • Too harsh lysis may damage the cells -> RNA degradation and RNA leakage. • Different lysis conditions may or may not give nuclear lysis. • Quality of the starting tissue. (Kolodziejczyk et al. 2015)

  9. Experimental setup Tissues that are hard to dissociate: • Laser capture microscopy (LCM) • Nuclei sorting PROBLEMS: • All these methods may give rise to empty wells/droplets, and also duplicates or multiples of cells. • A long sorting time may damage the cells. (Kolodziejczyk et al. 2015)

  10. Experimental setup Efficiency of reverse transcription is the key to high sensitivity. The drop-out rate is around 60-90% depending on the method used. Two libraries made with the same method from the same cell type may have very different drop-out rates. (Kolodziejczyk et al. 2015)

  11. Experimental setup Any amplification step will introduce a bias in the data. Methods that use UMIs will control for this to a large extent, but a transcript that is amplified more still has a higher chance of being detected. Full-length methods like SmartSeq2 have no UMIs, so we cannot control for amplification bias. (Kolodziejczyk et al. 2015)
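How UMIs limit amplification bias can be sketched as follows: reads that share the same cell, gene and UMI are assumed to be PCR copies of one original molecule and are counted once. The function name `count_umis`, the read tuples and the UMI sequences are hypothetical, and the sketch ignores sequencing errors in UMIs, which real pipelines usually correct for.

```python
from collections import defaultdict

def count_umis(reads):
    """reads: iterable of (cell, gene, umi) tuples.

    Returns (read counts, UMI counts), each keyed by (cell, gene).
    """
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        read_counts[(cell, gene)] += 1
        # A set keeps each UMI once, however many times it was amplified.
        umis[(cell, gene)].add(umi)
    umi_counts = {key: len(value) for key, value in umis.items()}
    return dict(read_counts), umi_counts

# Toy data: one GeneA molecule was amplified three times, another once.
reads = [
    ("cell1", "GeneA", "AACG"),
    ("cell1", "GeneA", "AACG"),
    ("cell1", "GeneA", "AACG"),
    ("cell1", "GeneA", "TTGC"),
    ("cell1", "GeneB", "GGAT"),
]
read_counts, umi_counts = count_umis(reads)
print(read_counts[("cell1", "GeneA")], umi_counts[("cell1", "GeneA")])
```

In this toy input GeneA has four reads but only two distinct UMIs, so the over-amplified molecule no longer inflates the count. A molecule that was never captured or amplified enough to be sequenced is still missing, which is the residual detection bias noted in the slide.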

  12. Experimental setup Multiplexing of samples will not always be perfect, so the number of reads per cell may vary quite a lot. Base calls in the sequencing may be affected by a number of factors: • Low complexity of the library – may be an issue when there are many primer dimers • Base call quality scores may be affected if there are contaminations in the flow cell (Kolodziejczyk et al. 2015)

  13. Spike-in RNAs • Addition of external controls • ERCC spike-in most widely used, consists of 48 or 96 mRNAs at 17 different concentrations. • Important to add equal amounts to each cell, preferably in the lysis buffer. (Vallejos et al. PLOS Comp Biol 2015)

  14. Spike-in RNAs Spike-ins can be used to model: • Technical noise • Drop-out rates • Starting amount of RNA in the cell • Data normalization (Vallejos et al. PLOS Comp Biol 2015)

  15. Spike-in RNAs (Tung et al. Scientific Reports 2017)

  16. Spike-in RNAs Finding biologically variable genes. Squared coefficient of variation: $\mathrm{CV}^2 = (\text{standard deviation} / \text{mean})^2$ (Brennecke et al. Nature Methods 2013)
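As a worked example of the squared coefficient of variation, the sketch below computes it for one gene across cells. The function name `cv_squared` and the expression values are made up, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption; the slide does not say which one is meant.

```python
from statistics import mean, pstdev

def cv_squared(values):
    """Squared coefficient of variation: (standard deviation / mean) ** 2."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV^2 is undefined when the mean is zero")
    return (pstdev(values) / m) ** 2

# Hypothetical expression of one gene in eight cells (mean 5, standard deviation 2).
expression = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(cv_squared(expression), 4))  # 0.16
```

Computing the same quantity for the spike-ins gives a reference for purely technical variation at a given mean expression; genes whose CV² lies well above that reference are candidates for biological variability.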

  17. QC-metrics – Mapping statistics (% uniquely mapping) – Fraction of exon mapping reads – 3’ bias – for full length methods like SS2 – mRNA-mapping reads – Number of detected genes – Spike-in detection – Mitochondrial read fraction – rRNA read fraction – Pairwise correlation to other cells
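Two of these metrics, the number of detected genes and the mitochondrial read fraction, can be computed directly from one cell's gene counts. This is a minimal sketch: the function name `cell_qc`, the toy counts and the `MT-` prefix used to recognise mitochondrial genes (a human gene-symbol convention) are assumptions that need adapting to the annotation in use.

```python
def cell_qc(counts, mito_prefix="MT-"):
    """counts: dict mapping gene name -> read count for a single cell."""
    total = sum(counts.values())
    # A gene counts as detected if it has at least one read.
    detected = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    return {
        "total_counts": total,
        "detected_genes": detected,
        "mito_fraction": mito / total if total else 0.0,
    }

# Hypothetical counts for one cell.
cell = {"ACTB": 50, "GAPDH": 30, "MT-CO1": 20, "SOX2": 0}
qc = cell_qc(cell)
print(qc)
```

For this toy cell the result is 100 total counts, 3 detected genes and a mitochondrial fraction of 0.2.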

  18. QC-metrics – Number of reads – Mapping statistics (% uniquely mapping) – Fraction of exon mapping reads – mRNA-mapping reads (vs other types of genes like rRNA, sRNA, non-coding, pseudogenes etc.) A low number of reads may not give enough information for that cell. Bad mapping may be an indication of a failed library prep. A low content of mRNAs will lead to more primer dimers, more spurious mapping and fewer mapping reads.

  19. QC-metrics – 3’ bias (degraded RNA) – for full length methods like SS2 [Figure: read number vs percentile of gene body (5' -> 3'), two panels: Not degraded, Degraded.] Look at the proportion of reads that map to the 10-20% most 3’ end of the transcript.
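The 3’ bias check can be sketched as the share of reads falling in the last part of the gene body. The function name `three_prime_fraction` and both coverage profiles are hypothetical; the input is assumed to be a list of read counts per gene-body percentile, ordered 5' to 3'.

```python
def three_prime_fraction(coverage, tail=0.2):
    """Fraction of reads in the most 3' `tail` share of the gene body.

    coverage: read counts per gene-body bin, ordered 5' -> 3'.
    """
    total = sum(coverage)
    if total == 0:
        return 0.0
    n_tail_bins = int(round(len(coverage) * tail))
    return sum(coverage[len(coverage) - n_tail_bins:]) / total

# Hypothetical profiles over 100 percentile bins.
even_profile = [10] * 100                # uniform coverage
skewed_profile = [1] * 80 + [16] * 20    # reads piled up at the 3' end
print(three_prime_fraction(even_profile))    # 0.2
print(three_prime_fraction(skewed_profile))  # 0.8
```

With uniform coverage the last 20% of the gene body holds about 20% of the reads, so values far above the chosen `tail` suggest degraded RNA.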

  20. QC-metrics – Spike-in detection – Spike-in ratio If the number of detected spike-in molecules is low, the library prep has clearly failed. The proportion of cell to spike-in reads is an indication of the starting amount of RNA from the cell. A low amount of cell RNA can indicate breakage or just a smaller cell.
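Both spike-in metrics can be derived from one cell's counts, as in this minimal sketch. The function name `spike_in_summary` and the toy counts are hypothetical, and identifying spike-ins by an `ERCC-` name prefix is an assumption about how the reference was annotated.

```python
def spike_in_summary(counts, spike_prefix="ERCC-"):
    """counts: dict mapping feature name -> read count for a single cell."""
    spike_reads = sum(c for name, c in counts.items() if name.startswith(spike_prefix))
    cell_reads = sum(counts.values()) - spike_reads
    # Spike-in species with at least one read count as detected.
    detected = sum(
        1 for name, c in counts.items() if name.startswith(spike_prefix) and c > 0
    )
    total = spike_reads + cell_reads
    return {
        "spike_reads": spike_reads,
        "cell_reads": cell_reads,
        "spikes_detected": detected,
        "spike_fraction": spike_reads / total if total else 0.0,
    }

# Hypothetical counts for one cell.
cell = {"ACTB": 60, "GAPDH": 20, "ERCC-00002": 15, "ERCC-00003": 5, "ERCC-00004": 0}
summary = spike_in_summary(cell)
print(summary)
```

Because equal amounts of spike-in are added to every cell, a cell with an unusually high `spike_fraction` started with little endogenous RNA, which may mean a broken cell or simply a small one.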

  21. QC-metrics – Number of detected genes The number of detected genes clearly correlates with the size of the cells, so be careful if you are working with cells of very varying sizes. A high number of detected genes may be an indication of duplicate/multiple cells. [Figure: distribution of the number of detected genes, with regions labelled OK, Multiple cells and Failed libraries.]
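Flagging cells on this metric could look like the sketch below. The function name `classify_by_detected_genes` and the cut-offs of 2000 and 9000 genes are purely illustrative assumptions; in practice the thresholds are read off the distribution for each dataset, and ideally per cell type when cell sizes differ.

```python
def classify_by_detected_genes(detected_genes, low, high):
    """Label a cell from its number of detected genes and two cut-offs."""
    if detected_genes < low:
        return "possible failed library"
    if detected_genes > high:
        return "possible multiple cells"
    return "ok"

# Hypothetical cells and thresholds.
cells = {"cellA": 450, "cellB": 5200, "cellC": 11800}
labels = {
    name: classify_by_detected_genes(n, low=2000, high=9000)
    for name, n in cells.items()
}
print(labels)
```

The labels are deliberately phrased as "possible": a large cell can legitimately exceed the upper cut-off, so flagged cells are best checked against the other QC metrics before being removed.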
