Understanding drop-outs in single-cell UMI

Understanding drop-outs in single-cell UMI: two papers with different approaches
• Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics (K Choi, Y Chen, DA Skelly, GA Churchill)
• Demystifying "drop-outs" in single-cell UMI data (TH Kim, X Zhou, M Chen)
CSE 590C, Fall 2020, October 19th, 2020
Ayse Dincer & Walter L. Ruzzo

Single-cell RNA sequencing (scRNA-seq)
From genotype to phenotype: a challenge in biology and medicine. Transcriptomes can be informative.
• Bulk RNA-seq: bulk population sequencing can provide only the average expression signal for an ensemble of cells
• However, the diverse cell types in our body each express a unique transcriptome
[Figure: bulk RNA-seq expression matrix, samples × genes]
Hwang, B., Lee, J.H. & Bang, D. Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp Mol Med 50, 96 (2018).

Single-cell RNA sequencing (scRNA-seq)
We need a more precise understanding of the transcriptome in individual cells.
[Figure: bulk RNA-seq (samples × genes) vs. single-cell RNA-seq (cells × genes, grouped by cell type)]
Source: Hwang et al. (2018)

Single-cell RNA sequencing (scRNA-seq)
• Pioneered by James Eberwine et al. and Iscove et al.
• First analysis in 2009 by Tang et al.: characterization of cells from early developmental stages
• Many studies followed, aiming to:
  • Identify rare cell populations
  • Characterize outlier cells to understand drug resistance and relapse in cancer treatment
  • Detect diverse immune cell populations
  • Understand cell lineage relationships in early development
Source: Hwang et al. (2018)

scRNA-seq technology
• First step: single-cell isolation. Many techniques exist to isolate cells.
• Second step: generation of scRNA-seq libraries (figure: an example of droplet-based library generation)
Source: Hwang et al. (2018)

scRNA-seq technology: What is a UMI?
"Unique molecular identifiers (UMI) are molecular tags that are used to detect and quantify unique mRNA transcripts"
[Figure: Drop-Seq workflow; paired-end reads]
Sources: Illumina; Data Science Sequencing Lecture 16
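To make the role of UMIs concrete, here is a minimal sketch of how reads become UMI counts. It assumes each aligned read has already been reduced to a (cell barcode, UMI, gene) triple; the barcodes, UMIs and gene assignments below are hypothetical. Reads sharing all three fields are treated as PCR copies of one captured molecule and counted once. Real pipelines typically also correct sequencing errors in barcodes and UMIs, which this exact-match sketch does not attempt.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse (cell_barcode, umi, gene) read triples into UMI counts.

    Reads with identical cell barcode, UMI and gene are assumed to be
    PCR duplicates of the same mRNA molecule, so each is counted once.
    """
    molecules = defaultdict(set)  # (cell, gene) -> set of distinct UMIs
    for cell, umi, gene in reads:
        molecules[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}

# Hypothetical reads for two cells
reads = [
    ("AAAC", "TTGCA", "Actb"),
    ("AAAC", "TTGCA", "Actb"),  # PCR duplicate of the read above
    ("AAAC", "GGATC", "Actb"),
    ("AAAC", "CCTAG", "Xist"),
    ("TTTG", "TTGCA", "Actb"),  # same UMI in a different cell: a distinct molecule
]
counts = count_umis(reads)
print(counts)  # {('AAAC', 'Actb'): 2, ('AAAC', 'Xist'): 1, ('TTTG', 'Actb'): 1}
```

Five reads collapse into four molecules, which is why UMI counts are far less inflated by amplification than raw read counts.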

scRNA-seq: Computational pipeline
[Figure: computational pipeline for scRNA-seq data]
Source: Hwang et al. (2018)

scRNA-seq: Computational pipeline (continued)
[Figure: computational pipeline for scRNA-seq data, continued]
Source: Hwang et al. (2018)

scRNA-seq applications
[Figure: applications of scRNA-seq]
Source: Hwang et al. (2018)

scRNA-seq applications (continued)
[Figure: applications of scRNA-seq, continued]
Source: Hwang et al. (2018)

Single-cell RNA sequencing (scRNA-seq)
• Single-cell RNA sequencing is a very promising technology
• It can yield new biological insights
• Yet it also presents many technical and computational challenges
• The problem we will focus on today is drop-outs, or zero inflation

What is a dropout in single-cell data?
A gene is observed at a moderate or high expression level in one cell but is not detected in another cell.
Kharchenko, P., Silberstein, L. & Scadden, D. Bayesian approach to single-cell differential expression analysis. Nat Methods 11, 740–742 (2014).

There are many different approaches
Why do dropouts occur? We are not sure!

Why do dropouts occur in single-cell data? There are different views.
Why do we observe dropouts?
• Technical artifacts
• Statistical sampling
• Cell type differences
• Biological factors
What should we do about them?
• Impute before learning
• Preprocess / cluster / reduce dimensions
• Incorporate technical covariates
• Incorporate biological covariates
• Model zero inflation
• Ignore zero inflation

Today we are going to examine two papers
There are two main views:
• Drop-outs are technical artefacts
• Drop-outs are related to biological signals
Approaches:
• To solve drop-outs → take cell type heterogeneity and biological covariates into account
• To detect cell type heterogeneity → use drop-out rates

Paper 1: Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics

Short summary of Paper 1
• They apply a Bayesian model selection approach to demonstrate zero inflation in multiple biologically realistic scRNA-seq datasets
• They show that the primary causes of zero inflation are not technical but biological
• They recommend the negative binomial count distribution, not a zero-inflated one, as a suitable reference model for scRNA-seq analysis

Outline for Paper 1
• Problem: potential reasons for zero inflation (dropout)
• Method: a Bayesian model selection approach (scRATE) to identify genes with zero inflation
• Results #1: scRATE can identify genes with zero inflation
• Results #2: zero inflation of genes is highly associated with cell types

Problem: Why are there so many zeros?
1. Sequencing depth
2. Per-gene average rate of expression
Sequencing depth explains 95% of the variation in the number of zeros per cell.
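A minimal sketch of why depth and per-gene rate together drive the zero count, under a hypothetical assumption that is not taken from the paper: gene g's UMI count in a cell of total depth n is Poisson with mean n · p_g, where p_g is the gene's share of the cell's transcripts, so the expected number of zero genes is the sum over genes of exp(-n · p_g). The expression profile below is invented for illustration.

```python
import math

def expected_zero_genes(depth, gene_fractions):
    """Expected number of genes with a zero count in one cell.

    Assumption (hypothetical): each gene's count is Poisson with mean
    depth * p_g, where p_g is that gene's share of the cell's transcripts.
    """
    return sum(math.exp(-depth * p) for p in gene_fractions)

# Hypothetical profile: 5 abundant genes and 500 rare ones, normalized to sum to 1
fractions = [0.1] * 5 + [0.001] * 500
total = sum(fractions)
fractions = [p / total for p in fractions]

for depth in (500, 2000, 10000):
    # prints roughly 303.3, 67.7 and 0.0 expected zero genes
    print(depth, round(expected_zero_genes(depth, fractions), 1))
```

Even with no technical dropout at all, a shallowly sequenced cell shows hundreds of zeros, and they fall on the genes with low average rates.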

Background: Statistical models
1. Poisson (P)
2. Negative binomial (NB)
3. Zero-inflated Poisson (ZIP)
4. Zero-inflated negative binomial (ZINB)
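The four models differ mainly in how much probability they put on zero. The sketch below assumes one common parameterization (NB with mean mu and size r, so variance mu + mu²/r; zero-inflated models add a structural-zero probability pi). It shows that a plain NB already yields far more zeros than a Poisson with the same mean, so many zeros by themselves do not imply zero inflation.

```python
import math

def p0_poisson(mu):
    """P(Y = 0) for a Poisson with mean mu."""
    return math.exp(-mu)

def p0_nb(mu, r):
    """P(Y = 0) for a negative binomial with mean mu and size r (var = mu + mu**2 / r)."""
    return (r / (r + mu)) ** r

def p0_zip(mu, pi):
    """Zero-inflated Poisson: structural zero with probability pi, else Poisson(mu)."""
    return pi + (1 - pi) * p0_poisson(mu)

def p0_zinb(mu, r, pi):
    """Zero-inflated NB: structural zero with probability pi, else NB(mu, r)."""
    return pi + (1 - pi) * p0_nb(mu, r)

mu = 2.0
print(round(p0_poisson(mu), 3))              # 0.135
print(round(p0_nb(mu, r=0.5), 3))            # 0.447, overdispersion alone adds zeros
print(round(p0_zip(mu, pi=0.3), 3))          # 0.395
print(round(p0_zinb(mu, r=0.5, pi=0.3), 3))  # 0.613
```

As r grows, the NB converges to the Poisson, which is why P is nested in NB (and ZIP in ZINB) and a model-selection criterion is needed to decide whether the extra parameters are justified.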

Method: Bayesian model selection to identify genes exhibiting zero inflation
What is Bayesian model selection?
• The goal is to select the model that maximizes the marginal likelihood (evidence) of the observed data
• The probability of the data D given a model M is computed by integrating over the unknown parameter values θ in that model:
$$p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta$$
Source: http://alumni.media.mit.edu/~tpminka/statlearn/demo/
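A small worked example of the integral above, assuming (for illustration only) a Poisson likelihood for one gene's counts and a Gamma(a, b) prior on its rate. This conjugate pair is chosen because the evidence then has a closed form, so the numerical integration over θ can be checked against it. The ratio of two models' evidences is the Bayes factor used to choose between them.

```python
import math

def log_lik_poisson(y, lam):
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in y)

def log_prior_gamma(lam, a, b):
    # Gamma(shape a, rate b) density on the Poisson rate
    return a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(lam) - b * lam

def log_evidence_closed_form(y, a, b):
    # log p(y | M) = log[ b^a / Gamma(a) * Gamma(a + S) / (b + n)^(a + S) / prod(k!) ]
    n, s = len(y), sum(y)
    return (a * math.log(b) - math.lgamma(a) + math.lgamma(a + s)
            - (a + s) * math.log(b + n)
            - sum(math.lgamma(k + 1) for k in y))

def log_evidence_numeric(y, a, b, upper=30.0, steps=20000):
    """Integrate likelihood * prior over the rate on a grid (Riemann sum)."""
    h = upper / steps
    grid = [h * (i + 1) for i in range(steps)]  # start above 0 to avoid log(0)
    logs = [log_lik_poisson(y, lam) + log_prior_gamma(lam, a, b) for lam in grid]
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs) * h)

y = [0, 1, 3, 0, 2]  # hypothetical UMI counts for one gene
closed = log_evidence_closed_form(y, a=1.0, b=1.0)
numeric = log_evidence_numeric(y, a=1.0, b=1.0)
print(closed, numeric)  # the two estimates should agree closely
```

For models without a closed form (such as the zero-inflated ones), the integral has to be approximated, which is one reason predictive criteria such as ELPD are often used in practice.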

Method: Bayesian model selection to identify genes exhibiting zero inflation
• scRATE is based on generalized linear models (GLMs)
• It uses a Bayesian model selection criterion, the expected log predictive density (ELPD), estimated by leave-one-out cross-validation (LOOCV): each cell's count is predicted from all the other cells
• The ELPD score is calculated for each of the four statistical models (P, ZIP, NB, ZINB)
• scRATE examines all the data, including non-zero counts
• LOOCV also provides a standard error (SE) that quantifies the uncertainty in the estimated ELPD scores
• The criterion penalizes both underfitting and overfitting: a more complex model is selected only when its ELPD is substantially better
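A simplified sketch of the selection rule described above, not scRATE's implementation. To stay self-contained it uses plug-in point estimates (the sample mean for P, method-of-moments for NB) instead of full Bayesian posterior predictive densities, and it compares only P against NB. The ELPD difference is the sum of pointwise differences, its SE is sqrt(n · var(differences)), and the threshold `n_se=2.0` is a hypothetical choice for "substantially better".

```python
import math

def poisson_logpmf(k, mu):
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def nb_logpmf(k, mu, r):
    # Negative binomial with mean mu and size r (variance mu + mu**2 / r)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log1p(-mu / (r + mu)) + k * math.log(mu / (r + mu)))

def fit_nb(y):
    # Method-of-moments estimates; if the data are not overdispersed,
    # use a very large size so the NB is effectively Poisson.
    mu = sum(y) / len(y)
    var = sum((v - mu) ** 2 for v in y) / len(y)
    r = mu ** 2 / (var - mu) if var > mu else 1e8
    return mu, r

def loo_pointwise(y):
    """Plug-in leave-one-out log predictive density of each cell's count under P and NB."""
    if sum(1 for v in y if v > 0) < 2:
        raise ValueError("need at least two non-zero counts so every held-out fit has mean > 0")
    lp_pois, lp_nb = [], []
    for i, k in enumerate(y):
        rest = y[:i] + y[i + 1:]  # all other cells
        lp_pois.append(poisson_logpmf(k, sum(rest) / len(rest)))
        mu, r = fit_nb(rest)
        lp_nb.append(nb_logpmf(k, mu, r))
    return lp_pois, lp_nb

def compare(lp_simple, lp_complex, n_se=2.0):
    """Return (ELPD difference, its SE, whether the complex model wins by > n_se SEs)."""
    diffs = [c - s for s, c in zip(lp_simple, lp_complex)]
    n = len(diffs)
    elpd_diff = sum(diffs)
    mean = elpd_diff / n
    se = math.sqrt(n * sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return elpd_diff, se, elpd_diff > n_se * se

y = [0] * 20 + [10, 12, 15, 9, 11]  # hypothetical counts: mostly zeros plus a few high values
lp_p, lp_nb = loo_pointwise(y)
elpd_diff, se, pick_nb = compare(lp_p, lp_nb)
print(round(elpd_diff, 1), round(se, 1), "NB" if pick_nb else "P")
```

Because the SE scales with the spread of the pointwise differences, a more complex model that helps only a handful of cells is not selected unless the gain is large relative to that noise.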

Results #1: Model selection can identify genes exhibiting zero inflation
[Figure: (a) false positive rates; (b) true positive rates]

Results #2: Most zero-inflated genes are due to variable expression rates across cell types
• scRATE was first applied directly, then with cell type as an explanatory variable
• After accounting for cell type, the number of zero-inflated (ZI) genes drops
• The genes that are no longer ZI are expressed at different rates in different cell types
• Examples: Col1a2 → fibroblasts, Ptpn18 → immune cells
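A minimal illustration of why cell-type differences look like zero inflation. The rates and cell-type proportions are hypothetical, loosely in the spirit of a marker gene such as Col1a2. The gene is Poisson within each cell type, but pooling the types and fitting a single Poisson to the pooled mean badly underpredicts the zeros.

```python
import math

def zero_fraction_mixture(rates, weights):
    """Zero fraction when cell type j (proportion w_j) expresses the gene at Poisson rate mu_j."""
    return sum(w * math.exp(-mu) for mu, w in zip(rates, weights))

def zero_fraction_single_poisson(rates, weights):
    """Zero fraction predicted by one Poisson fitted to the pooled mean."""
    pooled_mean = sum(w * mu for mu, w in zip(rates, weights))
    return math.exp(-pooled_mean)

# Hypothetical marker gene: nearly off in half the cells, on in the other half
rates, weights = [0.1, 5.0], [0.5, 0.5]
print(round(zero_fraction_mixture(rates, weights), 3))         # 0.456 observed zeros
print(round(zero_fraction_single_poisson(rates, weights), 3))  # 0.078 predicted by one Poisson
```

Within each cell type the zeros match the Poisson exactly, so adding cell type as a covariate removes the apparent zero inflation, which mirrors the result on this slide.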

Results #2: Most zero-inflated genes are due to variable expression rates across cell types
• The majority of genes originally classified as ZI are no longer ZI after accounting for cell type
• A few genes remain or become ZI: the female-specific gene Xist and the Y-chromosome gene Ddx3y
• After accounting for sex as an explanatory variable, these genes are no longer ZI

Paper 1: Their conclusions
• A high frequency of zeros does not necessarily imply technical dropout
• Instead, zero inflation is largely explained by biological factors, such as cell type and sex
• They recommend against replacing zeros in the data with imputed non-zero values, which could mask biological signals
• They recommend a generalized linear model with negative binomial error, with cell type and other biological factors as explanatory variables

Paper 1
• Do you think the simulation tests make sense?
• What other simulation experiments could be carried out?
• Do you think simulated data can reflect true patterns?
• Would you prefer to see more real-data experiments and biological covariate examples?
• What are the advantages and disadvantages of this model?
• Does it make sense that cell type is a determinant of zero inflation?

Paper 2: Demystifying "drop-outs" in single-cell UMI data

Short summary of Paper 2
• Proposed a novel framework, HIPPO (Heterogeneity-Inspired Pre-Processing tOol), that leverages zero proportions to explain cellular heterogeneity and integrates feature selection with iterative clustering
• Showed that clustering should be the foremost step of the workflow
• Showed that accounting for cell-type heterogeneity can resolve drop-outs, while imputing or normalizing heterogeneous data can introduce unwanted noise
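A minimal sketch of the idea behind using zero proportions for feature selection. The exact statistic below is an assumption and may differ from HIPPO's procedure. If a gene's UMI counts come from one homogeneous Poisson population with mean λ, its expected zero proportion is e^(-λ). A gene whose observed zero proportion sits far above that is a sign of hidden heterogeneity. Here the gap is scored with a normal-approximation z statistic; the iterative clustering step is not shown.

```python
import math

def zero_proportion_z(counts):
    """z-score of a gene's observed zero proportion vs. the Poisson expectation exp(-mean)."""
    n = len(counts)
    lam = sum(counts) / n
    if lam == 0:
        return 0.0  # all-zero gene: no evidence either way
    expected = math.exp(-lam)
    observed = sum(1 for c in counts if c == 0) / n
    return (observed - expected) / math.sqrt(expected * (1 - expected) / n)

# Hypothetical genes measured in 100 cells
gene_a = [0] * 50 + [5] * 50                                  # on in half the cells, off in the rest
gene_b = [0] * 37 + [1] * 37 + [2] * 18 + [3] * 6 + [4] * 2   # roughly Poisson(1)
print(round(zero_proportion_z(gene_a), 1))  # 15.2, far more zeros than expected
print(round(zero_proportion_z(gene_b), 2))  # -0.03, consistent with a single population
```

Genes like `gene_a` would be selected as features for clustering, and the test would then be repeated within each resulting cluster.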
