Scalable differential transcript usage analysis for single-cell applications
JEROEN GILIS
EuroBioc2019 presentation
Promotor: Prof. Lieven Clement
Supervisor: Dr. Koen Van den Berge
Differential Transcript Usage (DTU)
[Figure: DNA → transcription → pre-mRNA → alternative splicing → isoforms M1 and M2 → translation, shown for normal metabolism and tumorigenesis. Gene-level analysis compares expression level (cpm); transcript-level analysis compares relative usage (%) of M1 and M2.]
Method development
DGE (gene-level)
$$Y_{gi} \sim \mathrm{NB}(\mu_{gi}, \phi_g), \qquad \log(\mu_{gi}) = \eta_{gi}, \qquad \eta_{gi} = \beta_0 + \beta^{C}_{gc} + \log(S_i)$$
- Our workflow unlocks edgeR for DTU analysis
DTE
$$Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t), \qquad \log(\mu_{ti}) = \eta_{ti}, \qquad \eta_{ti} = \beta_0 + \beta^{C}_{tc} + \log(S_i)$$
DTU
$$Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t), \qquad \log(\mu_{ti}) = \eta_{ti}, \qquad \eta_{ti} = \beta_0 + \beta^{C}_{tc} + \log(T_{ti})$$
- Our workflow edgeR-total takes the gene-level counts (total counts, $T_{ti}$) as offsets in the GLM framework
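One way to read the role of the offset is to rearrange the DTU model above. The short derivation below is a sketch that uses only the symbols already on the slide; $c$ and $c'$ stand for two conditions being compared.

```latex
\begin{aligned}
\log(\mu_{ti}) &= \beta_0 + \beta^{C}_{tc} + \log(T_{ti}) \\
\Rightarrow\quad \log\!\left(\frac{\mu_{ti}}{T_{ti}}\right) &= \beta_0 + \beta^{C}_{tc} \\
\Rightarrow\quad \frac{\mu_{ti}}{T_{ti}} &= \exp\!\left(\beta_0 + \beta^{C}_{tc}\right)
\end{aligned}
```

Under this reading, the linear predictor describes the expected count of transcript $t$ relative to its gene total $T_{ti}$, so the difference $\beta^{C}_{tc} - \beta^{C}_{tc'}$ is a log-ratio of relative usage between conditions rather than of absolute expression.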
- DEXSeq
- Our second workflow, edgeR-other, takes the 'other' counts (the counts of all other transcripts of the same gene) as offsets
Counts
        Sample 1   …   Sample m
Tx 1    112        …   15
Tx t    …          …   …
Tx n    62         …   348

'other' counts
        Sample 1   …   Sample m
Tx 1    25         …   3
Tx t    …          …   …
Tx n    88         …   212
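The two offset matrices can be derived from the transcript-level count matrix alone. The snippet below is a minimal sketch in plain Python, not the workflow's actual implementation: the function name `total_and_other_counts`, the `tx2gene` mapping and the toy numbers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def total_and_other_counts(counts, tx2gene):
    """Return per-transcript gene totals and 'other' counts.

    counts:  dict mapping transcript id -> list of counts, one per sample
    tx2gene: dict mapping transcript id -> gene id (hypothetical annotation)
    """
    n_samples = len(next(iter(counts.values())))

    # Sum the counts of all transcripts belonging to the same gene, per sample.
    gene_totals = defaultdict(lambda: [0] * n_samples)
    for tx, row in counts.items():
        gene = tx2gene[tx]
        gene_totals[gene] = [g + y for g, y in zip(gene_totals[gene], row)]

    # Total counts: the gene total, repeated for every transcript of the gene.
    total = {tx: list(gene_totals[tx2gene[tx]]) for tx in counts}
    # 'Other' counts: gene total minus the transcript's own count.
    other = {
        tx: [t - y for t, y in zip(total[tx], row)]
        for tx, row in counts.items()
    }
    return total, other


# Toy data (made up for illustration, not taken from the slides).
counts = {"tx1": [10, 0, 5], "tx2": [30, 20, 5], "tx3": [7, 7, 7]}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

total, other = total_and_other_counts(counts, tx2gene)
print(total["tx1"])  # [40, 20, 10]
print(other["tx1"])  # [30, 20, 5]
print(other["tx3"])  # [0, 0, 0]  (single-transcript gene)
```

Note that offsets enter the model on the log scale, so zero totals or zero 'other' counts (as for the single-transcript gene here) need some form of handling, such as filtering. The slides do not say which approach is used.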
Performance evaluation on real bulk data
GTEx dataset, Nature Genetics 45, 580-585 (2013)
[Figure: methods compared: DEXSeq, DRIMSeq, edgeR_total, edgeR_other, limma_diffsplice; panels: 5v5, 10v10, 75v75]
Scalability benchmark on real single-cell data
- Our workflow performs a DTU analysis between two groups of 512 cells in ~20 minutes
- DEXSeq scales quadratically with the number of cells
- Dataset: 288 mouse embryonic stem cells in different cell cycle stages (G1, S and G2M)
- Runtime: < 2 minutes
- Significant enrichment in cell cycle processes
- Several DTU genes are:
✦ Biologically relevant
✦ Not picked up in a gene-level analysis
✦ Clearly differentially used when visualised
Single-cell transcriptomics case study
Dataset from Buettner et al., Nature Biotechnology 33, 155-160 (2015)
[Figure: Ccdc86. Proportions of Tx1, Tx2 and Tx3 by phase (G1, S); comparisons marked ***.]
Each dot represents an individual cell; dot size is weighted by the total expression of the gene in that cell.
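The quantities behind such a plot can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the plotting code used for the slides: the function name `usage_and_weights` is hypothetical, and scaling the weights by the largest gene total is an assumption, since the slides do not say how the weights are computed.

```python
def usage_and_weights(tx_counts, gene_totals):
    """Per-cell usage proportion of one transcript and a dot-size weight.

    tx_counts:   counts of the transcript, one value per cell
    gene_totals: total counts of its gene, one value per cell
    """
    max_total = max(gene_totals) if gene_totals else 0
    props, weights = [], []
    for y, t in zip(tx_counts, gene_totals):
        # The proportion is undefined when the gene is not expressed in a cell.
        props.append(y / t if t > 0 else None)
        # Assumed weighting: gene total relative to the largest total observed.
        weights.append(t / max_total if max_total > 0 else 0.0)
    return props, weights


# Three hypothetical cells.
props, weights = usage_and_weights([5, 0, 30], [10, 0, 40])
print(props)    # [0.5, None, 0.75]
print(weights)  # [0.25, 0.0, 1.0]
```

Weighting by total expression makes proportions from cells with few reads for the gene, which are noisier, visually less prominent.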
- Dataset: 288 mouse embryonic stem cells in different cell cycle stages (G1, S and G2M)
- Runtime: < 2 minutes for offset-based methods
- Significant enrichment in cell cycle processes
- Some DTU genes display clear DTU in visualisation and are biologically relevant
- The edgeR_other method yields a large number of (false) positive results; possibly sensitive to outliers (?)
- Discrepancy between edgeR-total and limma diffsplice: to be assessed formally in a single-cell benchmark
Single-cell transcriptomics case study
Buettner dataset, Nature Biotechnology 33, 155-160 (2015)
[Figure: methods compared: limma diffsplice, edgeR-total, edgeR-other]
[Figure: Eef1d. Proportions of Tx8 by phase (G2M, G1); comparison marked ***.]
Take-home messages
We are developing a workflow for studying DTU that:
- 1. Has a performance similar to that of DEXSeq
- 2. Correctly controls the false discovery rate
- 3. Scales towards large transcriptomics datasets
Background - DTU
- Input: matrix of transcript-level counts (e.g. Salmon or kallisto)
Background - DEXSeq
- Statistical model:
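$$Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t), \qquad \log(\mu_{ti}) = \eta_{ti}, \qquad \eta_{ti} = \beta^{S}_{ti} + \beta^{T}_{t} + \beta^{TC}_{tc_i}$$
[Figure labels: complementary counts; transcript-level counts]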
Parametric bulk simulation study
Dataset from Love et al., F1000Research, 7:952 (2018)
[Figure: methods compared: DEXSeq, DRIMSeq, edgeR_total, edgeR_other, limma_diffsplice; panels: 3v3, 6v6, 10v10]
GTEx dataset, stringent filtering
[Figure: methods compared: DEXSeq, DRIMSeq, edgeR_total, edgeR_other, limma_diffsplice]
Love dataset, stringent filtering
[Figure: methods compared: DEXSeq, DRIMSeq, edgeR_total, edgeR_other, limma_diffsplice]
Other parametric bulk simulations and additional methods
[Figure: methods compared: DEXSeq, DRIMSeq, edgeR_total, edgeR_other, limma_diffsplice, NBSplice, edgeR_diffsplice; panels: Love 6v6, Van den Berge 5v5 (1), Van den Berge 5v5 (2)]
Results - Scalability
- Methods that require sample-level intercepts scale quadratically with the number of cells
- edgeR is one order of magnitude faster than DESeq2
- All methods scale linearly with the number of transcripts
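A rough way to see why sample-level intercepts are costly is to count the entries of the per-transcript design matrix. The sketch below rests on assumptions not stated in the slides: two rows per cell (transcript-level and complementary counts) and one intercept column per cell for the DEXSeq-style model, against one row per cell and one column per condition for an offset-based model. Design-matrix size is used only as a proxy; actual runtime depends on the fitting algorithm. Both function names are hypothetical.

```python
def design_entries_sample_intercepts(n_cells, n_conditions=2):
    """Entries in a design with one intercept per cell (assumed DEXSeq-style)."""
    rows = 2 * n_cells                        # transcript + complementary counts
    cols = n_cells + 1 + (n_conditions - 1)   # cell intercepts, transcript, interaction
    return rows * cols


def design_entries_offset(n_cells, n_conditions=2):
    """Entries in a design with one column per condition (assumed offset-based)."""
    return n_cells * n_conditions


for m in (64, 128, 256, 512):
    print(m, design_entries_sample_intercepts(m), design_entries_offset(m))
```

Under these assumptions, doubling the number of cells roughly quadruples the first quantity and doubles the second, which is in line with the quadratic versus linear behaviour reported above.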