scRNA-seq
Differential expression analysis methods

Olga Dethlefsen
NBIS, National Bioinformatics Infrastructure Sweden
October 2017

Outline
- Introduction: what is so special about DE with scRNA-seq
- Common methods: what is out there
- Performance: how to choose the best method
- Summary
- DE tutorial

Introduction

Figure: Simplified scRNA-seq workflow [adapted from http://hemberg-lab.github.io/]

Differential expression is an old problem... so why is DE with scRNA-seq different from DE with bulk RNA-seq?

scRNA-seq data are affected by higher noise (technical and biological factors):
- low amounts of available mRNA result in amplification biases and "dropout events" (technical)
- 3' bias, partial coverage and uneven depth (technical)
- stochastic nature of transcription (biological)
- multimodality in gene expression; presence of multiple possible cell states within a cell population (biological)

Common methods

- non-parametric tests, e.g. Kruskal-Wallis (generic)
- edgeR, limma (bulk RNA-seq)
- MAST, SCDE, Monocle (scRNA-seq)
- D3E, Pagoda (scRNA-seq)
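As a concrete instance of the generic non-parametric option, the sketch below computes the Kruskal-Wallis H statistic for one gene measured in two groups of cells. It is a minimal illustration, not the code of any package listed above; the function names and the toy counts are hypothetical, and the p-value uses the usual chi-square approximation, which is rough for very small groups.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and chi-square p-value for two groups."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for group in (a, b):
        rank_sum = sum(ranks[start:start + len(group)])
        h += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    # Two groups -> 1 degree of freedom; chi-square(1) survival is erfc(sqrt(h/2)).
    p = math.erfc(math.sqrt(max(h, 0.0) / 2))
    return h, p

# Hypothetical read counts of one gene in two cell groups (many zeros, as in scRNA-seq).
cluster_1 = [0, 0, 1, 0, 2, 0]
cluster_2 = [5, 3, 0, 8, 4, 6]
print(kruskal_wallis_two_groups(cluster_1, cluster_2))
```

Because the test only uses ranks, it makes no assumption about the count distribution, but the many tied zeros typical of scRNA-seq reduce its power.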

Table: Information of gene differential expression analysis methods used [Miao and Zhang, 2017, Quantitative Biology 2016, 4]

MAST
- uses a generalized linear hurdle model designed to account for stochastic dropouts and a bimodal expression distribution in which expression is either strongly non-zero or non-detectable
- the rate of expression $Z$ and the level of expression $Y$ are modeled for each gene $g$, with $Z_{ig}$ indicating whether gene $g$ is expressed in cell $i$ (i.e., $Z_{ig} = 0$ if $y_{ig} = 0$ and $Z_{ig} = 1$ if $y_{ig} > 0$)
- a logistic regression model for the discrete variable $Z$ and a Gaussian linear model for the continuous variable $(Y \mid Z = 1)$:
  $\operatorname{logit}(\Pr(Z_{ig} = 1)) = X_i \beta_g^D$
  $\Pr(Y_{ig} = y \mid Z_{ig} = 1) = N(X_i \beta_g^C, \sigma_g^2)$
  where $X_i$ is a design matrix
- model parameters are fitted using an empirical Bayesian framework
- allows for a joint estimate of nuisance and treatment effects
- DE is determined using the likelihood ratio test
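The two-part structure can be sketched for a single gene and a simple two-group design. This is a simplified illustration and not the MAST implementation: it omits the empirical Bayes regularisation and any nuisance covariates, the function names are hypothetical, and the input is assumed to be log-normalised expression with exact zeros for undetected cells. With one binary covariate both parts have closed-form maximum likelihood fits, so no optimiser is needed.

```python
import math

def bernoulli_loglik(k, n):
    """Log-likelihood of k detections in n cells at the MLE k / n (0*log 0 = 0)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if n - k > 0:
        ll += (n - k) * math.log((n - k) / n)
    return ll

def rss(values):
    """Residual sum of squares around the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def hurdle_lrt(group_a, group_b):
    """Return (LR discrete, LR continuous, p-value) for one gene, two groups."""
    pos_a = [y for y in group_a if y > 0]
    pos_b = [y for y in group_b if y > 0]

    # Discrete part: logit(Pr(Z = 1)) depends on group (alt) or not (null).
    ll_alt = (bernoulli_loglik(len(pos_a), len(group_a))
              + bernoulli_loglik(len(pos_b), len(group_b)))
    ll_null = bernoulli_loglik(len(pos_a) + len(pos_b),
                               len(group_a) + len(group_b))
    lr_discrete = max(2 * (ll_alt - ll_null), 0.0)

    # Continuous part: Gaussian model on expressing cells only (Y | Z = 1).
    usable = (len(pos_a) >= 2 and len(pos_b) >= 2
              and rss(pos_a) + rss(pos_b) > 0)
    if usable:
        n_pos = len(pos_a) + len(pos_b)
        lr_continuous = n_pos * math.log(rss(pos_a + pos_b)
                                         / (rss(pos_a) + rss(pos_b)))
        lr_continuous = max(lr_continuous, 0.0)
        # Sum of both parts ~ chi-square(2), whose survival function is exp(-x/2).
        p = math.exp(-(lr_discrete + lr_continuous) / 2)
    else:
        lr_continuous = 0.0
        # Only the discrete part is testable: chi-square(1) survival function.
        p = math.erfc(math.sqrt(lr_discrete / 2))
    return lr_discrete, lr_continuous, p

# Hypothetical log-expression values; zeros are cells where the gene was not detected.
control = [0, 0, 0, 0, 0, 1.0, 1.2]
treated = [3.0, 3.5, 4.0, 0, 3.8, 4.2, 3.6]
print(hurdle_lrt(control, treated))
```

The gene can be called differentially expressed because its detection rate changes, because its level among expressing cells changes, or both, which is the point of the hurdle formulation.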

SCDE
- models the read counts for each gene using a mixture of a negative binomial (NB) and a Poisson distribution
- the NB distribution models the transcripts that are amplified and detected
- the Poisson distribution models the unobserved or background-level signal of transcripts that are not amplified (e.g. dropout events)
- a subset of robust genes is used to fit, via the EM algorithm, the parameters of the mixture of models
- for DE, the posterior probability that the gene shows a fold expression difference between two conditions is computed using a Bayesian approach
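To make the mixture idea concrete, the sketch below evaluates, for one observed count, the posterior probability that it came from the dropout (Poisson) component rather than the amplified (NB) component. It is not the SCDE code: the mixture parameters are fixed by hand instead of being fitted by EM, and the function names, the prior and the background rate of 0.1 are illustrative assumptions.

```python
import math

def poisson_pmf(k, rate):
    """Poisson probability of observing count k."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))

def negbin_pmf(k, mean, size):
    """Negative binomial probability, parameterised by mean and size."""
    p = size / (size + mean)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)

def dropout_posterior(count, dropout_prior, nb_mean, nb_size, background=0.1):
    """Pr(dropout component | observed count) by Bayes' rule."""
    dropped = dropout_prior * poisson_pmf(count, background)
    amplified = (1 - dropout_prior) * negbin_pmf(count, nb_mean, nb_size)
    return dropped / (dropped + amplified)

# A gene expected at about 50 reads when amplified: a zero is almost surely
# a dropout, while a count of 40 is almost surely a real measurement.
for observed in (0, 1, 5, 40):
    print(observed, dropout_posterior(observed, 0.5, nb_mean=50, nb_size=2))
```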

Monocle
- originally designed for ordering cells by progress through differentiation stages (pseudo-time)
- the mean expression level of each gene is modeled with a generalized additive model (GAM), which relates one or more predictor variables to a response variable as
  $g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$
  where $Y$ is a specific gene expression level, $x_i$ are predictor variables, $g$ is a link function (typically the log function), and $f_i$ are non-parametric functions (e.g. cubic splines)
- the observable expression level $Y$ is then modeled using the GAM
  $Y = s(\varphi_t(b_x, s_i)) + \epsilon$
  where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and $s$ is a cubic smoothing function with three degrees of freedom; the error term $\epsilon$ is normally distributed with a mean of zero
- the DE test is performed using an approximate $\chi^2$ likelihood ratio test
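The likelihood ratio idea can be sketched without a GAM library by swapping the cubic smoothing function for an ordinary cubic polynomial in pseudo-time, which also spends three degrees of freedom beyond the intercept. This substitution is an assumption made for illustration and is not how Monocle fits its models; all function names and the toy data are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def poly_rss(t, y, degree):
    """Residual sum of squares of a least-squares polynomial fit of y on t."""
    k = degree + 1
    basis = [[ti ** d for d in range(k)] for ti in t]
    xtx = [[sum(r[i] * r[j] for r in basis) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(basis, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in basis]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def chi2_sf_3df(x):
    """Survival function of the chi-square distribution with 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

def pseudotime_lrt(t, y):
    """LRT of a cubic trend over pseudo-time against a constant mean."""
    stat = len(y) * math.log(poly_rss(t, y, 0) / poly_rss(t, y, 3))
    stat = max(stat, 0.0)
    return stat, chi2_sf_3df(stat)

# Hypothetical data: 20 cells ordered along pseudo-time in [0, 1].
pseudotime = [i / 19 for i in range(20)]
noise = [0.3 * (-1) ** i for i in range(20)]
rising_gene = [2 + 3 * t + e for t, e in zip(pseudotime, noise)]
flat_gene = [2 + e for e in noise]
print(pseudotime_lrt(pseudotime, rising_gene))
print(pseudotime_lrt(pseudotime, flat_gene))
```

The Gaussian likelihood ratio reduces to n log(RSS_null / RSS_alt), so a gene whose expression changes along pseudo-time yields a large statistic and a small p-value.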

Let's stop for a minute...

Differential expression
Differential expression analysis means taking the normalized read count data and performing statistical analysis to discover quantitative changes in expression levels between experimental groups,
e.g. to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be expected just due to natural random variation.
Or simply: checking for differences in distributions.

The key
$\text{outcome}_i = (\text{model}_i) + \text{error}_i$
- we collect data on a sample from a much larger population
- statistics lets us make inferences about the population from which the sample was derived
- we try to predict the outcome given a model fitted to the data

The key
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
Figure: histogram of height [cm] (x-axis) against frequency (y-axis).
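The two-sample t statistic with pooled standard deviation $s_p$ can be computed in a few lines. The sketch assumes the standard pooled estimate $s_p^2 = ((n_1 - 1) s_1^2 + (n_2 - 1) s_2^2) / (n_1 + n_2 - 2)$, which the slide does not spell out; the function name and the height values are made up for illustration.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled standard deviation, and its df."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled standard deviation: variances weighted by their degrees of freedom.
    s_p = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical heights in cm for two groups.
group_1 = [168.0, 171.5, 174.0, 169.5, 172.0]
group_2 = [175.0, 178.5, 173.0, 177.0, 179.5]
print(pooled_t(group_1, group_2))
```

The statistic is then compared with a t distribution on $n_1 + n_2 - 2$ degrees of freedom, which is exactly the "model plus error" logic: the model is a difference in group means and the error is Gaussian noise with a common variance.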

The key
Simple recipe
- model, e.g. gene expression with random error
- fit model to the data and/or data to the model, estimate model parameters
- use model for prediction and/or inference

The key: MAST (again)
- uses a generalized linear hurdle model designed to account for stochastic dropouts and a bimodal expression distribution in which expression is either strongly non-zero or non-detectable
- the rate of expression $Z$ and the level of expression $Y$ are modeled for each gene $g$, with $Z_{ig}$ indicating whether gene $g$ is expressed in cell $i$ (i.e., $Z_{ig} = 0$ if $y_{ig} = 0$ and $Z_{ig} = 1$ if $y_{ig} > 0$)
- a logistic regression model for the discrete variable $Z$ and a Gaussian linear model for the continuous variable $(Y \mid Z = 1)$:
  $\operatorname{logit}(\Pr(Z_{ig} = 1)) = X_i \beta_g^D$
  $\Pr(Y_{ig} = y \mid Z_{ig} = 1) = N(X_i \beta_g^C, \sigma_g^2)$
  where $X_i$ is a design matrix
- model parameters are fitted using an empirical Bayesian framework
- allows for a joint estimate of nuisance and treatment effects
- DE is determined using the likelihood ratio test

The key: SCDE (again)
- models the read counts for each gene using a mixture of a negative binomial (NB) and a Poisson distribution
- the NB distribution models the transcripts that are amplified and detected
- the Poisson distribution models the unobserved or background-level signal of transcripts that are not amplified (e.g. dropout events)
- a subset of robust genes is used to fit, via the EM algorithm, the parameters of the mixture of models
- for DE, the posterior probability that the gene shows a fold expression difference between two conditions is computed using a Bayesian approach

The key: Monocle (again)
- originally designed for ordering cells by progress through differentiation stages (pseudo-time)
- the mean expression level of each gene is modeled with a generalized additive model (GAM), which relates one or more predictor variables to a response variable as
  $g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$
  where $Y$ is a specific gene expression level, $x_i$ are predictor variables, $g$ is a link function (typically the log function), and $f_i$ are non-parametric functions (e.g. cubic splines)
- the observable expression level $Y$ is then modeled using the GAM
  $Y = s(\varphi_t(b_x, s_i)) + \epsilon$
  where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and $s$ is a cubic smoothing function with three degrees of freedom; the error term $\epsilon$ is normally distributed with a mean of zero
- the DE test is performed using an approximate $\chi^2$ likelihood ratio test

The key: implication
Simple recipe
- model, e.g. gene expression with random error
- fit model to the data and/or data to the model, estimate model parameters
- use model for prediction and/or inference
Implication
- the better the model fits the data, the better the statistics

Figure: three histograms of read counts (x-axis: Read Counts, y-axis: Frequency), one each for the Negative Binomial, Zero-inflated NB and Poisson-Beta distributions.
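The three count distributions named in the figure can be simulated with the standard library to see how their shapes differ. This is a sketch under assumed parameters (mean 5, size 2, 30% extra zeros, a Beta(0.5, 0.5) rate scaled to 100); none of these values come from the slides, and the helper names are hypothetical.

```python
import math
import random

def sample_poisson(rng, rate):
    """Knuth's multiplication method; adequate for the modest rates used here."""
    if rate <= 0:
        return 0
    limit = math.exp(-rate)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def sample_nb(rng, mean, size):
    """Negative binomial as a Poisson whose rate is gamma distributed."""
    return sample_poisson(rng, rng.gammavariate(size, mean / size))

def sample_zinb(rng, mean, size, zero_prob):
    """Zero-inflated NB: an extra point mass at zero on top of the NB."""
    return 0 if rng.random() < zero_prob else sample_nb(rng, mean, size)

def sample_poisson_beta(rng, alpha, beta, scale):
    """Poisson-Beta: a Poisson whose rate is a scaled Beta variable."""
    return sample_poisson(rng, scale * rng.betavariate(alpha, beta))

def zero_fraction(counts):
    return sum(1 for c in counts if c == 0) / len(counts)

rng = random.Random(1)
n_cells = 2000
nb = [sample_nb(rng, 5, 2) for _ in range(n_cells)]
zinb = [sample_zinb(rng, 5, 2, 0.3) for _ in range(n_cells)]
pois_beta = [sample_poisson_beta(rng, 0.5, 0.5, 100) for _ in range(n_cells)]

for name, counts in (("NB", nb), ("ZINB", zinb), ("Poisson-Beta", pois_beta)):
    mean = sum(counts) / len(counts)
    print(f"{name}: mean={mean:.2f}, zero fraction={zero_fraction(counts):.2f}")
```

Comparing the zero fractions and means of the three samples shows why a model that fits one gene well can fit another poorly, which is the implication stated on the previous slide.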

Performance

No gold standard
There is no gold standard, no single best solution... so what do we do?
- we gather as much evidence as possible

Get to know your data and wisely choose DE methods
Example data: 46,078 genes x 96 cells
22,229 genes with no expression at all
Figure: two histograms (y-axis: Frequency) with x-axes labelled "Read Counts" and "0 counts".
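A first look at the data along these lines takes only a few lines of code. The sketch below counts zeros per gene and lists genes with no expression at all in a toy genes-by-cells matrix; the gene names and counts are invented, and only the last line uses the numbers quoted on the slide.

```python
# Toy count matrix: gene name -> read counts across six cells (hypothetical values).
counts = {
    "geneA": [0, 0, 0, 0, 0, 0],
    "geneB": [12, 0, 7, 0, 0, 3],
    "geneC": [0, 0, 0, 1, 0, 0],
    "geneD": [150, 98, 0, 210, 175, 120],
}

def zeros_per_gene(matrix):
    """Number of cells with a zero count, for every gene."""
    return {gene: sum(1 for c in row if c == 0) for gene, row in matrix.items()}

def unexpressed_genes(matrix):
    """Genes with zero counts in every cell."""
    return [gene for gene, row in matrix.items() if all(c == 0 for c in row)]

zero_counts = zeros_per_gene(counts)
silent = unexpressed_genes(counts)
print("zeros per gene:", zero_counts)
print("genes with no expression:", silent)

# Share of genes with no expression at all in the example data on the slide.
share_unexpressed = 22229 / 46078
print(f"{share_unexpressed:.1%} of genes have no expression")
```

Genes that are zero everywhere carry no information for DE and are usually filtered out before testing, while the per-gene zero counts indicate how much a method has to cope with dropouts.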
