Statistical analysis for scRNAseq data
Cathy Maugis-Rabusseau cathy.maugis@insa-toulouse.fr
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 1 / 52
Statistical analysis for scRNAseq data Cathy Maugis-Rabusseau - - PowerPoint PPT Presentation
Statistical analysis for scRNAseq data Cathy Maugis-Rabusseau cathy.maugis@insa-toulouse.fr C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 1 / 52 Plan Introduction 1 Feature selection / extraction 2 Dimension
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 1 / 52
1
2
3
4
5
6
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 2 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 3 / 52
Rostom et al, FEBS 2017 C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 4 / 52
Rostom et al, FEBS 2017 C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 5 / 52
[Juliá et al., 2015] Sincell (Bioconductor/R package)
https://bioconductor.org/packages/release/bioc/html/sincell.html C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 6 / 52
[Juliá et al., 2015] Sincell (Bioconductor/R package) [Poirion et al., 2016]
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 6 / 52
[Juliá et al., 2015] Sincell (Bioconductor/R package) [Poirion et al., 2016] [Wolf et al., 2018] SCANPY
https://github.com/theislab/Scanpy C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 6 / 52
[Juliá et al., 2015] Sincell (Bioconductor/R package) [Poirion et al., 2016] [Wolf et al., 2018] SCANPY [Guo et al., 2015] SINCERA:
Fig 1. Schematic Workflow. The analytic pipeline consists of three main components: pre-processing, cell type identification, and cell type specific gene signature and driving force identification.
https://github.com/xu-lab/SINCERA https://research.cchmc.org/pbge/sincera.html C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 6 / 52
[Juliá et al., 2015] Sincell (Bioconductor/R package) [Poirion et al., 2016] [Wolf et al., 2018] SCANPY [Guo et al., 2015] SINCERA: [Lun et al., 2016] Workflow Package : simpleSingleCell
https://bioconductor.org/packages/release/workflows/html/simpleSingleCell.html C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 6 / 52
[Juliá et al., 2015] Sincell (Bioconductor/R package) [Poirion et al., 2016] [Wolf et al., 2018] SCANPY [Guo et al., 2015] SINCERA: [Lun et al., 2016] Workflow Package : simpleSingleCell [Satija et al., 2015] SEURAT:
https://satijalab.org/seurat/
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 6 / 52
1
2
3
4
5
6
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 7 / 52
genes expressed in < τ% of cells genes with a mean average of expression < τ
Based on the Michaelis-Menten function Pdropout = 1 − S KM + S where S = mean expression Pdropout = dropout rate MLE to obtain the global KM across all genes
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 8 / 52
Fits a quadratic model (gamma generalized linear model) to the relationship between mean expression and the coefficient of variation squared (CV2) χ2 test is used to find genes signif. above the curve Implemented in M3Drop package
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 9 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 9 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 9 / 52
Calculate the gene-gene correlation matrix ρ = (ρij)i,j=1,...,G Evaluate the correlation magnitude for each gene : ˜ ρi = max
j
|ρij| Take the top few thousand genes having the highest correlation magnitude
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 10 / 52
1
2
3
4
5
6
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 11 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 12 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 13 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 13 / 52
Deals with the large number of zero-values in scRNASeq data Relationship between the dropout rate p0 and the mean level of non-zero expression (log read count) µ: p0 = exp(−λµ2) ZIFA adopts a latent variable model and uses an EM algorithm for the parameter estimation Python software : https://github.com/epierson9/ZIFA
https://bioconductor.org/packages/release/bioc/html/zinbwave.html
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 14 / 52
1
Preliminary, log(xij + 1)
2
Identification of dropout candidates.
(CIDR finds a sample-dependent threshold that separates the zero peak from the rest of the expression distribution for each cell)
3
Estimation of the relationship between dropout rate and gene expression levels
(non-linear least-squares regression to fit a decreasing logistic function to the data)
4
Calculation of dissimilarity between the imputed gene expression profiles for every pairs of single cells
5
PCoA using the CIDR dissimilarity matrix
6
Clustering (CAH) using the first few principal coordinates
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 15 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 16 / 52
https://constantamateur.github.io/2018-01-02-tSNE/
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 17 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 17 / 52
i )
i )
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 18 / 52
Statistical analysis for scRNAseq data 19 / 52
Small pij modeled by large qij: small penalty t-SNE: mainly preserves local similarity structure of the data
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 20 / 52
(see https://lvdmaaten.github.io/tsne/)
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 21 / 52
Computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that the subcomponents are non-Gaussian signals and that they are statistically independent from each other.
Computes a family of embeddings of a data set into Euclidean space whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data.
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 22 / 52
1
2
3
4
5
6
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 23 / 52
Hierarchical clustering (CAH), Kmeans clustering, Graph-based clustering Model-based clustering ...
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 24 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 25 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 25 / 52
Euclidian distance 1-correlation, 1-correlation2, .... Dissimilarity between the imputed gene expression profiles (CIDR), ....
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 26 / 52
1
2
3
4
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 27 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 28 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 29 / 52
Data are organized into K subpopulations Each subpopulation is distributed from fk(.|αk) ⇒ Thus the population is distributed from a mixture of these subdistributions f(.|θK) =
K
πkfk(.|αk) with (π1, . . . , πK) ∈]0, 1[K,
K
πk = 1 Parameter vector: θK = (π1, . . . , πK, α1, . . . , αK)
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 30 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 31 / 52
K
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 32 / 52
K)).
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 33 / 52
Software Description Ref SC3 Kmeans on different dimensions of PCA (on log-counts) + CAH on consensus matrix [Kiselev et al., 2017] https://bioconductor.org/packages/release/bioc/html/SC3.html pcaReduce Combines PCA on log-counts, k-means and "iterative" hierarchical clustering [Žurauskien˙ e and Yau, 2016] https://github.com/JustinaZ/pcaReduce SINCERA CAH using Pearson corr. and average linkage on z-score (prelim. data transf.) [Guo et al., 2015] https://research.cchmc.org/pbge/sincera.html SNN-Cliq Shared Nearest Neighbours graph + clique method [Xu and Su, 2015] http://bioinfo.uncc.edu/SNNCliq/ Seurat PCA + Nearest Neighbor graph clustering [Butler et al., 2018] https://github.com/satijalab/seurat CIDR PCA (diss. CIDR on counts, dropouts imputation) + CAH [Lin et al., 2017] https://github.com/VCCRI/CIDR SAFE Ensemble clustering using SC3, CIDR, Seurat et t-SNE Kmeans [Yang et al., 2017] https://github.com/yycunc/SAFEclustering BISCUIT Dropout imputation and clustering. mixtures of multivariate log-normal and Bayesian inference [Prabhakaran et al., 2016] https://github.com/sandhya212/BISCUIT_SingleCell_IMM_ICML_2016 SIMLR Kernel-based similarity learning (S) + t-SNE on S + Kmeans [Wang et al., 2017] https://bioconductor.org/packages/release/bioc/html/SIMLR.html RaceID Kmedoids clustering on similarity matrix of Pearson’s correlation coeff. [Grün et al., 2015] https://cran.r-project.org/web/packages/RaceID/ backSpin Biclustering based on sorting points into neighborhoods (SPIN) [Zeisel et al., 2015] https://github.com/linnarsson-lab/BackSPIN
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 34 / 52
Figure 1. The SC3 framework for consensus clustering. (a) Overview of clustering with SC3 framework (see Methods). A total of 6D clusterings are obtained, where D is the total number of dimensions d
1
, …, d
D considered. These clusterings are then combined through a consensus
step to increase accuracy and robustness. Here, the consensus step is exemplified using the Treutlein data: the binary matrices (Methods) corresponding to each clustering are averaged, and the resulting matrix is segmented using hierarchical clustering up to the kth hierarchical level (k = 5 in this example). (b) Published datasets used to set SC3 parameters. N is the number of cells in a dataset; k is the number of clusters originally identified by the authors
9,14,17,18,20–23,63–66; Units: RPKM is Reads Per Kilobase of transcript per Million mapped reads,
RPM is Reads Per Million mapped reads, FPKM is Fragments Per Kilobase of transcript per Million mapped reads, TPM is Transcripts Per Million mapped reads. (c) Testing the distances, nonlinear transformations and d range. Median of ARI over 100 realizations of the SC3 clustering for six gold standard datasets (Biase, Yan, Goolam, Kolodziejczyk, Deng and Pollen,
[Kiselev et al., 2017] https://bioconductor.org/packages/release/bioc/html/SC3.html C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 35 / 52
Algorithm 1: The pcaReduce algorithm. Here Xn×d is a gene expression matrix with n cells (given in rows) and d genes (in columns); q is the number of dimensions – effectively this refers to the number of levels in the hierarchy; Y is a score matrix, which is the output of PCA algorithm; µij and ij definition are given in Eq. (1); (i) and (ii) denote two different merging settings: (i) merg- ing is based on largest probability P(i, j) value; (ii) merging is based on sampling according to P(i, j) distribution. Input X distribution. Input: Xn×d and q ; Output: a collection of q clusterings;
1 Y ←
− PCA(Xn×d, dim=q);
2 (µ, ) ←
− kmeans(Y, K = q + 1);
3 Q ←
− q − 1;
4 for r = 1, . . . , Q do 5
for all possible pairs (i, j) in {1, . . . , K} × {1, . . . , K} do
6
P(i, j) ← − p
7
end for
8
(i) either choose pair (i, j) with largest probability P(i, j) and merge clusters i and j;
9
(ii) or sample a pair (i, j) with probability P(i, j) and merge clusters i and j;
10
q ← − q − 1;
11
Y ← Yn×q (i.e. remove last dimension);
12
update (µ, );
13
K ← K − 1;
14 end for
[Žurauskien˙ e and Yau, 2016] https://github.com/JustinaZ/pcaReduce C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 36 / 52
With Technical Variation
26
Without Technical Variation
Latent counts which we want to recover Observation [Prabhakaran et al., 2016] https://github.com/sandhya212/BISCUIT_SingleCell_IMM_ICML_2016 C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 37 / 52
Cluster-specific parameters Cell-specific parameters
Priors set based on
distribution
Hyper-parameters Hyper-priors Global Distributions of Observed Data Observed gene expression per cell j
Prabhakaran*, Azizi* ICML 2016
27
BISCUIT (Bayesian Inference for Single-cell ClUstering and ImpuTing)
Data-driven priors to adapt to different datasets
[Prabhakaran et al., 2016] https://github.com/sandhya212/BISCUIT_SingleCell_IMM_ICML_2016 C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 37 / 52
Figure 1. Overview of SAFE-clustering. Log-transformed expression matrix of scRNA-seq data are first clustered using four state-of-the-art methods, SC3, CIDR, Seurat and t-SNE + k-means; and then individual solutions are combined using one of the three hypergraph-based partitioning algorithms: hypergraph partitioning algorithm (HGPA), meta-cluster algorithm (MCLA) and cluster-based similarity partitioning algorithm (CSPA) to produce consensus clustering.
0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 9.124 0.000 0 9.464 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 9.136 0.000 0.000 0 10.463 0 0.000 9.643 0.000 0.000 0.000 0 0.000 0 0.000 0.000 9.136 0.000 0.000 0 0.000 0 9.315 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 9.136 0.000 9.626 0 9.464 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0.000 0.000 0.000 0 0.000 0 0.000 0.000 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0Normalization
SC3 CIDR Seurat t-SNE + k-means k-way partitioning using shmetis
Jaccard similarity matrix calculation Similarity matrix calculation k-way partitioning using gpmetis k-way partitioning using gpmetis Association index calculation
Final cluster ensemble
ANMI calculation
SAFE-clustering
HGPA MCLA CSPA
Individual clusterings
Expression matrix of scRNA-seq data Hypergraph
n single cells Genes n single cells Genes h hyperedges n single cells
HGPA clustering MCLA clustering CSPA clustering
H1 H3 H2 H4
[Yang et al., 2017] https://github.com/yycunc/SAFEclustering C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 38 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 39 / 52
1
2
3
4
5
6
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 40 / 52
Dimensionality reduction step: PCA, t-SNE, .... or graph-based techniques .... or clustering methods Trajectory modelling step
[Cannoodt et al., 2016]: comparison of 10 methods [Saelens et al., 2018]: comparison of 29 (among 59 existing) methods https://github.com/dynverse/dynverse
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 41 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 42 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 43 / 52
Evaluation of trajectory inference methods
An overview of the main results from this study is shown in Figure 2. This includes an overview of the results obtained from the method characterisation (Figure 2a), the benchmarking evaluation (Figure 2b), and the quality control evaluation (Figure 2c).
Type
Start End S t a t e s Genes S c
e Correlation Edge flip R F M S E R e a l Synthetic Linear Bifurcation C
v e r g e n c e M u l t i f u r c a t i
Rooted tree D A G Cycle G r a p h Average time % Errored Error reason S c
e Developer friendly U s e r f r i e n d l y F u t u r e
r
A v a i l a b i l i t y Behaviour Code assurance Code quality D
u m e n t a t i
Paper Method characterisation Priors Benchmark Per metric Per source Per trajectory type Execution Quality control Practicality Categories
a b c
Slingshot TSCAN Monocle DDRTree SCORPIUS cellTree Maptpx cellTree Gibbs Embeddr SLICE reCAT Wishbone MFA Waterfall cellTree VEM Monocle ICA SCUBA Topslam Phenopath SCOUP DPT Ouijaflow Wanderlust GPfates StemID Sincell Mpath SLICER SCIMITAR Ouija Pseudogp insufficient data insufficient data insufficient data free free free fixed free free fixed free fixed param param fixed free param free fixed fixed param fixed fixed fixed param free free free free fixed fixed fixed id id id id id id id # # # # # id id id id id # id id id id id id id id id id 2s 14s 49s 29s 17m 2h 32s 2m 38m 45s 30m 9s 2m 1m 9m 10m 12m 1h 51s 3m 44s 12m 39m 15s 3s 7m 2h 5h 5h 2% 0% 5% 0% 0% 5% 0% 6% 9% 19% 0% 0% 1% 13% 1% 8% 0% 25% 0% 6% 17% 25% 17% 6% 30% 0% 70% 88% 90% Topology constraints free parameter fixed inferred by algorithm determined by parameter fixed by algorithm Prior id # id #
required cell/gene ids required amount Benchmark score low high Error reason Memory limit exceeded Time limit exceeded Dataset-specific error Stochastic error QC score low high Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05 Generated on 2018-03-05
Figure 2 Overview of the results on the evaluation of 29 TI methods. a) The methods were characterised according to the most complex trajectory type they can infer, whether or not the inferred topology is constrained by the algorithm or a parameter, and which prior information a method requires or can optionally use. b) The methods are ordered according to the overall score achieved in the benchmark evaluation. Also shown are the aggregated scores per metric, source and trajectory type, as well as the average execution time across all datasets and the percentage of executions in which no output was produced. c) Overall performance in the quality control evaluation is highly variable, even amongst the highest ranked methods according to the benchmark evaluation. Also listed are the quality control scores aggregated according to practicality and the different categories. Having ordered all methods by their overall benchmarking score, we found that Slingshot predicted the most accurate trajec- tories, followed by TSCAN and Monocle DDRTree. When we looked at the benchmark scores per trajectory type, Slingshot was the only method that performed well across most trajectory types. However, we found that several methods were specialised in predicting specific trajectory types; for example SCORPIUS for linear trajectories, reCAT for cycles, and Monocle DDRTree for trees. We observed a high correlation (0.7-0.9) between results originating from real datasets versus those originating from synthetic datasets (Supplementary Figure 6). This confirms both the relevance of the synthetic data and the accuracy of the gold stan- 5
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 44 / 52
1
2
3
4
5
6
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 45 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 46 / 52
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 47 / 52
https://hemberg-lab.github.io/scRNA.seq.course/index.html
C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 48 / 52
Andrews, T. S. and Hemberg, M. (2018). Identifying cell populations with scrnaseq. Molecular aspects of medicine, 59:114–122. Brennecke, P ., Anders, S., Kim, J. K., Kołodziejczyk, A. A., Zhang, X., Proserpio, V., Baying, B., Benes, V., Teichmann,
Accounting for technical noise in single-cell rna-seq experiments. Nature methods, 10(11):1093. Butler, A., Hoffman, P ., Smibert, P ., Papalexi, E., and Satija, R. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature biotechnology, 36(5):411. Cannoodt, R., Saelens, W., and Saeys, Y. (2016). Computational methods for trajectory inference from single-cell transcriptomics. European journal of immunology, 46(11):2496–2506. Duò, A., Robinson, M. D., and Soneson, C. (2018). A systematic performance evaluation of clustering methods for single-cell rna-seq data. F1000Research, 7. Freytag, S., Tian, L., Lönnstedt, I., Ng, M., and Bahlo, M. (2018). Comparison of clustering tools in r for medium-sized 10x genomics single-cell rna-sequencing data. F1000Research, 7. Grün, D., Lyubimova, A., Kester, L., Wiebrands, K., Basak, O., Sasaki, N., Clevers, H., and van Oudenaarden, A. (2015). Single-cell messenger rna sequencing reveals rare intestinal cell types. Nature, 525(7568):251. Guo, M., Wang, H., Potter, S. S., Whitsett, J. A., and Xu, Y. (2015). Sincera: a pipeline for single-cell rna-seq profiling analysis. PLoS computational biology, 11(11):e1004575. C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 49 / 52
Juliá, M., Telenti, A., and Rausell, A. (2015). Sincell: an r/bioconductor package for statistical assessment of cell-state hierarchies from single-cell rna-seq. Bioinformatics, 31(20):3380–3382. Kim, J. K., Kolodziejczyk, A. A., Ilicic, T., Teichmann, S. A., and Marioni, J. C. (2015). Characterizing noise structure in single-cell rna-seq distinguishes genuine from technical stochastic allelic expression. Nature communications, 6:8687. Kiselev, V. Y., Kirschner, K., Schaub, M. T., Andrews, T., Yiu, A., Chandra, T., Natarajan, K. N., Reik, W., Barahona, M., Green, A. R., and Hemberg, M. (2017). Sc3: consensus clustering of single-cell rna-seq data. Nature Methods, 14:483 EP –. Lin, P ., Troup, M., and Ho, J. W. (2017). Cidr: Ultrafast and accurate clustering through imputation for single-cell rna-seq data. Genome biology, 18(1):59. Lun, A. T., McCarthy, D. J., and Marioni, J. C. (2016). A step-by-step workflow for low-level analysis of single-cell rna-seq data with bioconductor. F1000Research, 5. Pierson, E. and Yau, C. (2015). Zifa: Dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome biology, 16(1):241. Poirion, O. B., Zhu, X., Ching, T., and Garmire, L. (2016). Single-cell transcriptomics bioinformatics and computational challenges. Frontiers in genetics, 7:163. C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 50 / 52
Prabhakaran, S., Azizi, E., Carr, A., and Pe’er, D. (2016). Dirichlet process mixture model for correcting technical variation in single-cell gene expression data. In International Conference on Machine Learning, pages 1070–1079. Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S., and Vert, J.-P . (2017). Zinb-wave: A general and flexible method for signal extraction from single-cell rna-seq data. bioRxiv. Saelens, W., Cannoodt, R., Todorov, H., and Saeys, Y. (2018). A comparison of single-cell trajectory inference methods: towards more accurate and robust tools. bioRxiv, page 276907. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F., and Regev, A. (2015). Spatial reconstruction of single-cell gene expression data. Nature biotechnology, 33(5):495. Soneson, C. and Robinson, M. (2018). Bias, robustness and scability in single-cell differential expression analysis. Nature Methods, 15:255–261. Vallejos, C., Marioni, J., and Richardson, S. (2015). Basics: Bayesian analysis of single-cell sequencing data. PLOS COMPUTATIONAL BIOLOGY, 11. Wang, B., Zhu, J., Pierson, E., Ramazzotti, D., and Batzoglou, S. (2017). Visualization and analysis of single-cell rna-seq data by kernel-based similarity learning. Nature methods, 14(4):414. Wolf, F. A., Angerer, P ., and Theis, F. J. (2018). Scanpy: large-scale single-cell gene expression data analysis. Genome biology, 19(1):15. C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 51 / 52
Xu, C. and Su, Z. (2015). Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics, 31(12):1974–1980. Yang, Y., Huh, R., Culpepper, H. W., Lin, Y., Love, M. I., and Li, Y. (2017). SAFE-clustering: Single-cell aggregated (from ensemble) clustering for single-cell RNA-seq data. bioRxiv. Zeisel, A., Muñoz-Manchado, A. B., Codeluppi, S., Lönnerberg, P ., La Manno, G., Juréus, A., Marques, S., Munguba, H., He, L., Betsholtz, C., et al. (2015). Cell types in the mouse cortex and hippocampus revealed by single-cell rna-seq. Science, 347(6226):1138–1142. Žurauskien˙ e, J. and Yau, C. (2016). pcareduce: hierarchical clustering of single cell transcriptional profiles. BMC Bioinformatics, 17. C.Maugis-Rabusseau (IMT/INSA) Statistical analysis for scRNAseq data 52 / 52