Understanding Nothing: Zeros in scRNASeq
Tallulah Andrews, 27 Sept 2016

Single-cell vs bulk RNASeq
Workflow stages: Cell Isolation, RNA, cDNA, Amplification, Library Preparation, Sequencing, Expression Matrix, Computational Analysis
Example expression matrix:
ATTCG   0  10   0  20
TCACT  13   2   0   8
TCGGA  11  30   0   0
Starting material: bulk RNASeq 100 ng; single-cell RNASeq ~10 pg
Enables:
- Unbiased cell-type identification/tissue composition
- Elucidation of cell-fate decisions & development
- Detection of heterogeneity of cellular responses
- Investigation of stochastic gene expression

Zeros Dominate scRNASeq
Dataset      Type                No. Cells*   No. Genes**   Prop Zero
Buettner     mouse ESCs                 279        17,231       51.2%
Shalek       mouse bone marrow          324        12,474       66.4%
Deng         mouse embryo               255        17,406       50.2%
Usoskin      mouse neuron               530        15,585       72.5%
Kirschner    mouse ESCs               2,448        23,729       62.5%
Linnarsson   mouse brain              2,542        17,867       76.9%
Pollen       human neural               301        19,624       60.3%
Zhong        mouse embryo                49        20,558       38.0%
*Cells with > 2,000 detected genes
**Genes seen in >3 cells

Source of Zeros
Workflow stages: Cell Isolation, RNA, cDNA, Amplification, Library Preparation, Sequencing, Expression Matrix, Computational Analysis
Annotations on the workflow:
- Under ~1 million reads/cell (Svensson et al., 2016)
- ~66% efficiency; >95% efficiency (Reiter et al., 2011; Bengtsson et al., 2008)

RT failure propagates downstream
(Figure panels: $n_0 = 1$; $n_0 = 5$)

Reverse Transcription = Michaelis-Menten
(Diagram: reverse transcriptase, dNTP, mRNA, DNA)
To model probability: $V_{\max} = 1$
(Plot y-axis: detection probability)

MM vs Other Models
Michaelis-Menten Modelling of Dropouts (M3Drop)
- $P_{\mathrm{dropout}} = 1 - \frac{[s]}{K + [s]}$
- For Deng: $K = 9.5$
Zero Inflated Factor Analysis (ZIFA)
- Dimensionality Reduction for scRNASeq
- $P_{\mathrm{dropout}} = e^{-\lambda [s]^2}$
- For Deng: $\lambda = 0.0075$
Single Cell Differential Expression (SCDE)
- $P_{\mathrm{dropout}} = \frac{1}{1 + e^{-(a + b \log [s])}}$
- For Deng: $a = 1.5$, $b = -0.75$
(Plot x-axis: log10(expression))
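The three dropout curves can be compared numerically. The following is a minimal sketch, not code from M3Drop, ZIFA or SCDE: the function names are hypothetical, the default parameters are the Deng fits quoted on the slide, the units of `s` are whatever the slide's [s] denotes, and the natural logarithm in the SCDE curve is an assumption because the slide does not state the base.

```python
import math

# Deng parameter values as quoted on the slide.
K_DENG = 9.5
LAMBDA_DENG = 0.0075
A_DENG, B_DENG = 1.5, -0.75


def mm_dropout(s, k=K_DENG):
    """Michaelis-Menten (M3Drop) curve: P = 1 - s / (K + s)."""
    return 1.0 - s / (k + s)


def zifa_dropout(s, lam=LAMBDA_DENG):
    """ZIFA curve: P = exp(-lambda * s^2)."""
    return math.exp(-lam * s * s)


def scde_dropout(s, a=A_DENG, b=B_DENG):
    """SCDE logistic curve: P = 1 / (1 + exp(-(a + b * log(s)))).

    Requires s > 0. Natural log is an assumption (base not given on the slide).
    """
    return 1.0 / (1.0 + math.exp(-(a + b * math.log(s))))


if __name__ == "__main__":
    # Print the three predicted dropout probabilities at a few expression levels.
    for s in (1.0, 9.5, 100.0):
        print(s, mm_dropout(s), zifa_dropout(s), scde_dropout(s))
```

In the Michaelis-Menten form, K is the expression level at which the predicted dropout probability is exactly one half, which makes the single parameter easy to interpret.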

Michaelis-Menten fits diverse datasets.
(Figure panels: Buettner - CPM; Linnarsson - UMI CPM; Shalek - FPKM)
(Figure: fitting error for M3Drop, SCDE and ZIFA)

Differentially Expressed Genes are Outliers
(Figure: dropout rate vs log expression for two populations, P1 and P2; the average across the mixture is (P1 + P2)/2)
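Why a mixture average lands off the curve can be checked with a few lines of arithmetic. This is an illustrative sketch only: the two expression levels are hypothetical, the populations are assumed to be of equal size, and K = 9.5 is borrowed from the Deng fit quoted earlier in the talk. Because the Michaelis-Menten dropout curve K/(K + s) is convex in s, the averaged dropout rate sits above the curve evaluated at the averaged expression.

```python
def mm_dropout(s, k):
    """Michaelis-Menten dropout probability, K / (K + s) = 1 - s / (K + s)."""
    return k / (k + s)


def mixture_point(s1, s2, k):
    """Observed (mean expression, mean dropout) for a gene expressed at s1 in
    population P1 and s2 in population P2, assuming equal population sizes."""
    mean_expr = (s1 + s2) / 2.0
    mean_drop = (mm_dropout(s1, k) + mm_dropout(s2, k)) / 2.0
    return mean_expr, mean_drop


K = 9.5                # Deng value from the talk, used only for illustration
s1, s2 = 1.0, 100.0    # hypothetical expression levels in P1 and P2

mean_expr, mean_drop = mixture_point(s1, s2, K)
on_curve = mm_dropout(mean_expr, K)

# Positive gap: the DE gene has more dropouts than the curve predicts
# for its mean expression, so it appears as an outlier.
gap = mean_drop - on_curve
print(mean_expr, mean_drop, on_curve, gap)
```

A gene with the same expression in both populations has a gap of zero, so only genes that differ between populations are pushed off the curve.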

Outlier/DE gene detection
Michaelis-Menten: $P_{\mathrm{dropout}} = 1 - \frac{S}{K + S}$
Rearrange to solve for K: $K = \frac{P}{1 - P} \cdot S$
1. Calculate $K_j$ for each gene
2. Propagate errors in estimates for S (mean expression) and P (observed dropout rate) to get error for $K_j$
3. Estimate error of global $K_M$
4. Test whether $K_j$ is significantly larger than $K_M$ fit across all genes using a Z-test combining errors of (2) & (3)
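The four steps can be sketched in plain Python. This is not the M3Drop implementation: the slide does not give the error formulas, so the sketch assumes a binomial variance for P, the squared standard error of the mean for S, first-order (delta-method) propagation that ignores any covariance between P and S, and a global $K_M$ with its standard error supplied by the caller. The function names `gene_k` and `z_test` are hypothetical.

```python
import math
from statistics import mean, variance


def gene_k(counts):
    """Return (K_j, standard error) for one gene's expression values across
    cells, or None when K_j is undefined (P = 0, P = 1, or fewer than 2 cells)."""
    n = len(counts)
    if n < 2:
        return None
    s = mean(counts)                               # S: mean expression
    p = sum(1 for c in counts if c == 0) / n       # P: observed dropout rate
    if p == 0.0 or p == 1.0:
        return None
    k = p / (1.0 - p) * s                          # K = P / (1 - P) * S

    # Assumed error model (not taken from the slide):
    var_p = p * (1.0 - p) / n                      # binomial variance of P
    var_s = variance(counts) / n                   # squared SEM of S
    dk_dp = s / (1.0 - p) ** 2                     # partial derivative dK/dP
    dk_ds = p / (1.0 - p)                          # partial derivative dK/dS
    se = math.sqrt(dk_dp ** 2 * var_p + dk_ds ** 2 * var_s)
    return k, se


def z_test(k_j, se_j, k_m, se_m):
    """One-sided Z-test of K_j > K_M, combining both standard errors."""
    z = (k_j - k_m) / math.sqrt(se_j ** 2 + se_m ** 2)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return z, p_value


if __name__ == "__main__":
    k_j, se_j = gene_k([0, 0, 0, 0, 10, 10, 10, 10, 10, 10])
    # K_M = 1.0 with standard error 0.1 are made-up values for illustration.
    print(k_j, se_j, z_test(k_j, se_j, k_m=1.0, se_m=0.1))
```

In practice the p-values from such a test would need multiple-testing correction across genes before calling outliers.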

Highly Variable Genes
In general: f(variance) = g(mean)
1. Fit a relationship between variance and mean expression
   a. May use all genes or only spike-ins in fitting
2. Identify points above this relationship
Brennecke et al. (2013): $CV^2 = \frac{a_1}{\mu} + \alpha_0$
1. Significant outliers detected using $\chi^2$-test
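The fit-then-flag idea can be illustrated with a small sketch. Ordinary least squares on 1/μ is used here purely as a simplification and should not be read as the fitting procedure of Brennecke et al. (2013); the significance test is also omitted, so the last function only reports which genes lie above the fitted trend. All function names are hypothetical.

```python
from statistics import mean, variance


def cv2(counts):
    """Squared coefficient of variation of one gene: variance / mean^2."""
    m = mean(counts)
    return variance(counts) / (m * m)


def fit_cv2_trend(means, cv2s):
    """Least-squares fit of CV^2 = a1 / mu + alpha0, linear in x = 1 / mu.

    Returns (a1, alpha0). A simplification chosen for this sketch.
    """
    xs = [1.0 / m for m in means]
    x_bar, y_bar = mean(xs), mean(cv2s)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cv2s))
    a1 = sxy / sxx
    alpha0 = y_bar - a1 * x_bar
    return a1, alpha0


def above_trend(means, cv2s, a1, alpha0):
    """Indices of genes whose observed CV^2 exceeds the fitted trend."""
    return [i for i, (m, c) in enumerate(zip(means, cv2s))
            if c > a1 / m + alpha0]
```

Fitting on spike-ins only, as the slide mentions, would simply mean passing the spike-in means and CV² values to `fit_cv2_trend` and then evaluating all genes with `above_trend`.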

DE Simulations - Dropouts vs Variance.
(Simulation parameters: $\mu = 100$, $n = 100$)

Applying M3Drop to Early Mouse Development Deng

Identification of TE and ICM

What are outliers to the left?
(Figure: dropout rate vs log expression, Buettner - CPM; labelled groups: DE genes, highly variable genes)
Candidate explanations:
- Mismapping reads
- Under-measured expression

Processed Pseudogenes = True Negatives
(Diagram: genome -> processed mRNA -> cDNA -> randomly inserted into genome)
- Identical sequence to original transcript
- Lacks introns
- Lacks promoters & regulatory sequences
- Assumed to not be transcribed
- >3,000 identified in the mouse genome
- only 150 have confirmed expression

Processed Pseudogenes - Mismapping Reads
(Diagram: truth vs observed read assignment between a gene and its processed pseudogene, ~4%)
- 1% sequencing error rate x 100bp reads: 4% of reads have 3+ sequencing errors
- Processed pseudogenes: left shifted by 1.4 (p ~ 0)

Under-Measured Expression
Paralogs (duplication node: Mus musculus)
- Left shifted by 0.66 ($p < 10^{-40}$)
- multimapping reads = fewer unique reads
Short Genes (CDS < 300 n.t.)
- Left shifted by 0.21 ($p < 10^{-45}$)
- fewer unique fragments = under counting

Tophat2 maps more reads to processed pseudogenes
(Figure panels: Kallisto; Tophat2; STAR)

Unique Molecular Identifiers (UMIs)
Workflow stages: Cell Isolation, RNA, cDNA, Amplification, Library Preparation, Sequencing, UMI count Matrix
Enables:
- Correction for PCR duplicates (amplification noise)

None of the proposed models fit corrected UMIs

Cell-specific detection rates obscure true relationship
- Downsample to 2122 UMIs/cell
- (Figure: saturation of detected genes)
- $p(0) = e^{-\lambda}$, where $\lambda = \text{mean gene expression} \times a$
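Downsampling every cell to a common UMI total can be sketched as sampling molecules without replacement. This is a generic illustration, not the procedure used in the talk: the function name is hypothetical, and returning cells that already have fewer UMIs than the target unchanged is an assumption (the slide does not say how such cells were handled).

```python
import random


def downsample_cell(counts, target, rng=None):
    """Randomly keep `target` UMIs from one cell's per-gene counts.

    Cells with at most `target` UMIs are returned unchanged (assumption).
    """
    rng = rng or random.Random()
    if sum(counts) <= target:
        return list(counts)
    # One entry per molecule, labelled with its gene index.
    molecules = [gene for gene, c in enumerate(counts) for _ in range(c)]
    kept = rng.sample(molecules, target)   # sampling without replacement
    out = [0] * len(counts)
    for gene in kept:
        out[gene] += 1
    return out


if __name__ == "__main__":
    # Toy cell with 100 UMIs over 4 genes, downsampled to 20.
    print(downsample_cell([5, 0, 10, 85], 20, random.Random(0)))
```

After downsampling, every retained cell has the same total, so differences in detection rate between cells no longer distort the dropout-versus-expression relationship.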

The PoissonUMIs Model
$M_{ij} \sim \mathrm{Poisson}(\lambda)$
$\lambda = m_i \cdot m_j \cdot \mathrm{total} \cdot \alpha$
- $M_{ij}$ = molecules of gene j in cell i
- $m_i$ = proportion of molecules in cell i
- $m_j$ = proportion of molecules for gene j
- total = total detected molecules
- $\alpha$ = scaling factor, to account for different counting methods
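The model's λ values and the dropout rate they imply can be computed directly from a count matrix. The sketch below is not the PoissonUMIs package: the function names are hypothetical, α is passed in rather than fitted, and the toy matrix reuses the numbers from the workflow slide with rows treated as cells and columns as genes purely for illustration.

```python
import math


def poisson_umi_lambdas(matrix, alpha=1.0):
    """lambda_ij = m_i * m_j * total * alpha for a cells x genes count matrix."""
    total = sum(sum(row) for row in matrix)
    n_genes = len(matrix[0])
    # m_i: proportion of all molecules found in cell i
    m_cell = [sum(row) / total for row in matrix]
    # m_j: proportion of all molecules belonging to gene j
    m_gene = [sum(row[j] for row in matrix) / total for j in range(n_genes)]
    return [[mi * mj * total * alpha for mj in m_gene] for mi in m_cell]


def expected_dropout_rate(lambdas):
    """Per-gene expected fraction of zeros: mean over cells of exp(-lambda_ij)."""
    n_cells = len(lambdas)
    n_genes = len(lambdas[0])
    return [sum(math.exp(-row[j]) for row in lambdas) / n_cells
            for j in range(n_genes)]


# Toy matrix (rows = cells, columns = genes; an assumption for illustration).
counts = [
    [0, 10, 0, 20],
    [13, 2, 0, 8],
    [11, 30, 0, 0],
]
lam = poisson_umi_lambdas(counts, alpha=1.0)
print(expected_dropout_rate(lam))
```

With α = 1 the λ values sum to the total number of detected molecules; a smaller α lowers every λ and therefore raises the expected number of zeros.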

Poisson model accounting for differences in read depth
(Figure panels: α fixed at 1)

Fitted alpha reflects quantification method
- Corrected UMIs: α = 0.90
- Unique UMIs: α = 0.64
- Reads: α = 0.016

Fitting the model to other UMI datasets
- Linnarsson: α = 0.65 (removed singleton UMIs)
- Kirschner: α = 0.90 (corrected for 2 mismatches)

Summary
- Amplification noise
- Mismapping / Miscounting
- Differential Expression

Acknowledgements
- Wellcome Trust Sanger Institute: Martin Hemberg, Vladimir Kiselev
- EMBL Rome: Christophe Lancrin, Isabelle Bergiers
Availability
- M3Drop: https://github.com/tallulandrews/M3Drop
- PoissonUMIs: https://github.com/tallulandrews/PoissonUMIs
