Scalable differential transcript usage analysis for single-cell applications



SLIDE 1

Scalable differential transcript usage analysis for single-cell applications

Jeroen Gilis
EuroBioc2019 presentation
Promotor: Prof. Lieven Clement
Supervisor: Dr. Koen Van den Berge

SLIDE 2

Differential Transcript Usage (DTU)

[Figure: DNA, transcription, pre-mRNA, alternative splicing into isoforms M1 and M2, translation. Conditions: normal metabolism, tumorigenesis. Panel titles: gene-level analysis, transcript-level analysis. Axis labels: expression level (cpm), relative usage (%) of M1 and M2.]

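SLIDE 3

Method development

DGE

$Y_{gi} \sim \mathrm{NB}(\mu_{gi}, \phi_g)$
$\log(\mu_{gi}) = \eta_{gi}$
$\eta_{gi} = \beta_0 + \beta_{gc} + \log(S_i)$

Term label on the slide: C (condition effect, $\beta_{gc}$)

  • Our workflow unlocks edgeR for DTU analysis
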
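SLIDE 4

Method development

DTE

$Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t)$
$\log(\mu_{ti}) = \eta_{ti}$
$\eta_{ti} = \beta_0 + \beta_{tc} + \log(S_i)$

Term label on the slide: C (condition effect, $\beta_{tc}$)

  • Our workflow unlocks edgeR for DTU analysis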

SLIDE 5

Method development

DTU

$Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t)$
$\log(\mu_{ti}) = \eta_{ti}$
$\eta_{ti} = \beta_0 + \beta_{tc} + \log(T_{ti})$

Term label on the slide: C (condition effect, $\beta_{tc}$)

  • Our workflow unlocks edgeR for DTU analysis
  • Our workflow takes the gene-level counts (total counts, $T_{ti}$) as offsets in the GLM framework: edgeR-total
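
The role of the offset can be read off by moving it to the left-hand side of the DTU model. This is a sketch of that one step using only the symbols on the slide, not a derivation taken from the presentation:

$$
\log(\mu_{ti}) = \beta_0 + \beta_{tc} + \log(T_{ti})
\quad\Longleftrightarrow\quad
\log\!\left(\frac{\mu_{ti}}{T_{ti}}\right) = \beta_0 + \beta_{tc}
$$

Under this reading, $\beta_{tc}$ is a condition effect on the expected transcript count relative to the gene-level total, that is, on usage rather than on absolute expression. With $\log(S_i)$ as offset, as in the DTE model, the same term would instead describe a change relative to $S_i$.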

SLIDE 6

Method development

DTU

$Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t)$
$\log(\mu_{ti}) = \eta_{ti}$
$\eta_{ti} = \beta_0 + \beta_{tc} + \log(T_{ti})$

Term label on the slide: C (condition effect, $\beta_{tc}$)

  • Our workflow unlocks edgeR for DTU analysis
  • DEXSeq
  • Our workflow takes the gene-level counts (total counts, $T_{ti}$) as offsets in the GLM framework: edgeR-total
  • Our second workflow takes the 'other' counts as offsets: edgeR-other

Counts

          Sample 1   …   Sample m
  Tx 1    112        …   15
  Tx t    …          …   …
  Tx n    62         …   348

'Other' counts

          Sample 1   …   Sample m
  Tx 1    25         …   3
  Tx t    …          …   …
  Tx n    88         …   212
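
The two offsets named on this slide can be computed from a transcript-level count matrix as sketched below. This is a minimal illustration, not the presenter's implementation: the function name `offsets_for_gene`, the transcript names, and the counts for `tx2` and `tx3` are hypothetical, chosen so that the `tx1` row reproduces the first row of the slide's tables (112 and 15 giving 'other' counts 25 and 3). It assumes that 'other' counts are the gene-level total minus the transcript's own count.

```python
from math import log


def offsets_for_gene(counts):
    """Compute total and 'other' counts for the transcripts of one gene.

    counts: dict mapping transcript id -> list of per-sample counts.
    Returns (total, other), where total[j] is the gene-level count in
    sample j and other[tx][j] is the summed count of all other
    transcripts of the same gene in sample j.
    """
    n_samples = len(next(iter(counts.values())))
    total = [sum(c[j] for c in counts.values()) for j in range(n_samples)]
    other = {
        tx: [total[j] - c[j] for j in range(n_samples)]
        for tx, c in counts.items()
    }
    return total, other


# Hypothetical gene with three transcripts measured in two samples.
gene_counts = {"tx1": [112, 15], "tx2": [20, 1], "tx3": [5, 2]}
total, other = offsets_for_gene(gene_counts)

# Offsets enter the linear predictor on the log scale.
# Zero totals would need separate handling; none occur in this toy input.
log_total_offset = [log(t) for t in total]

print(total)         # [137, 18]
print(other["tx1"])  # [25, 3]
```

In an edgeR-total style analysis the log of `total` would be supplied as the offset for every transcript of the gene, while an edgeR-other style analysis would use the log of each transcript's own `other` row.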

SLIDE 7

Performance evaluation on real bulk data

GTEx dataset, Nature Genetics 45, 580-585 (2013)

[Figure. Panels: 5v5, 10v10, 75v75. Methods: DEXSeq, DRIMSeq, edgeR_total, edgeR_other, limma_diffsplice.]

SLIDE 8

Scalability benchmark on real single-cell data

  • Our workflow performs a DTU analysis between two groups of 512 cells in ~20 minutes
  • DEXSeq scales quadratically with the number of cells

SLIDE 9

Single-cell transcriptomics case study

Dataset from Buettner et al., Nature Biotechnology 33, 155-160 (2015)

  • Dataset: 288 mouse embryonic stem cells in different cell cycle stages (G1, S and G2M)
  • Runtime: < 2 minutes
  • Significant enrichment in cell cycle processes
  • Several DTU genes are:

    ✦ Biologically relevant
    ✦ Not picked up in a gene-level analysis
    ✦ Clearly differentially used when visualised

[Figure: Ccdc86. Transcript proportions (Tx1, Tx2, Tx3) per cell by phase (G1, S); significance markers: ***, ***.]

The size of each dot (one dot per individual cell) is weighted according to the total expression of the gene in that cell.
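
The quantities behind such a plot can be sketched as follows: per-cell transcript proportions within a gene, plus the gene's total expression in each cell as the dot weight. The function name `usage_per_cell` and all counts are hypothetical, and leaving the proportion undefined for cells where the gene is not expressed is an assumption of this sketch, not something stated on the slide.

```python
def usage_per_cell(counts):
    """Per-cell transcript proportions and gene totals for one gene.

    counts: dict mapping transcript id -> list of per-cell counts.
    Returns (proportions, gene_total). proportions[tx][j] is None when
    the gene has zero total count in cell j, since usage is undefined
    there; gene_total[j] can serve as the dot weight for cell j.
    """
    n_cells = len(next(iter(counts.values())))
    gene_total = [sum(c[j] for c in counts.values()) for j in range(n_cells)]
    proportions = {
        tx: [
            c[j] / gene_total[j] if gene_total[j] > 0 else None
            for j in range(n_cells)
        ]
        for tx, c in counts.items()
    }
    return proportions, gene_total


# Hypothetical counts for three transcripts of one gene in three cells.
counts = {"Tx1": [30, 0, 5], "Tx2": [10, 0, 15], "Tx3": [0, 0, 20]}
proportions, dot_weight = usage_per_cell(counts)

print(proportions["Tx1"])  # [0.75, None, 0.125]
print(dot_weight)          # [40, 0, 40]
```

Weighting dots by `dot_weight` makes cells with few reads for the gene, whose proportions are noisy, visually less prominent.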

SLIDE 10

Single-cell transcriptomics case study

Buettner dataset, Nature Biotechnology 33, 155-160 (2015)

  • Dataset: 288 mouse embryonic stem cells in different cell cycle stages (G1, S and G2M)
  • Runtime: < 2 minutes for offset-based methods
  • Significant enrichment in cell cycle processes
  • Some DTU genes display clear DTU in visualisation and are biologically relevant
  • The edgeR_other method returns a large number of (false) positive results; sensitive to outliers (?)
  • Discrepancy between edgeR-total and limma diffsplice; to be assessed formally in a single-cell benchmark

[Figure: Eef1d. Proportions of Tx8 per cell by phase (G2M, G1); significance marker: ***. Methods shown: limma diffsplice, edgeR-total, edgeR-other.]

SLIDE 11

Take-home messages

We are developing a workflow for studying DTU that:

  1. Has a performance similar to that of DEXSeq
  2. Correctly controls the false discovery rate
  3. Scales towards large transcriptomics datasets


SLIDE 13

Background - DTU

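SLIDE 14

Background - DEXSeq

  • Input: matrix of transcript-level counts (e.g. from Salmon or kallisto)
  • Statistical model:

$Y_{ti} \sim \mathrm{NB}(\mu_{ti}, \phi_t)$
$\log(\mu_{ti}) = \eta_{ti}$
$\eta_{ti} = \beta_{ti} + \beta_t + \beta_{tci}$

Term labels on the slide: S ($\beta_{ti}$), T ($\beta_t$), TC ($\beta_{tci}$)

Count types shown on the slide: transcript-level counts, complementary counts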

SLIDE 15

Parametric bulk simulation study

Dataset from Love et al., F1000Research, 7:952 (2018)

[Figure. Panels: 3v3, 6v6, 10v10. Methods: DEXSeq, DRIMSeq, edgeR_total, edgeR_other, limma_diffsplice.]

SLIDE 16

GTEx dataset, stringent filtering

[Figure. Methods: DEXSeq, DRIMSeq, edgeR_total, edgeR_other, limma_diffsplice.]

SLIDE 17

Love dataset, stringent filtering

[Figure. Methods: DEXSeq, DRIMSeq, edgeR_total, edgeR_other, limma_diffsplice.]

SLIDE 18

Other parametric bulk simulations and additional methods

[Figure. Panels: Love 6v6, Van den Berge 5v5 (1), Van den Berge 5v5 (2). Methods: DEXSeq, DRIMSeq, edgeR_total, edgeR_other, limma_diffsplice, NBSplice, edgeR_diffsplice.]

SLIDE 19

Results - Scalability

  • Methods that require sample-level intercepts scale quadratically with the number of cells
  • edgeR is one order of magnitude faster than DESeq2
  • All methods scale linearly with the number of transcripts
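
A back-of-the-envelope count of design-matrix entries gives one possible intuition for the first bullet. The sketch below is an assumption-laden illustration, not a benchmark: both function names are hypothetical, the sample-intercept layout assumes two rows per cell (transcript count and complementary count) and one column per cell, and the count says nothing about actual runtime, which depends on each implementation.

```python
def entries_with_sample_intercepts(n_cells, n_conditions=2):
    """Design-matrix entries for one gene when each cell has its own intercept.

    Assumed layout: two rows per cell (transcript count and complementary
    count), one column per cell, one transcript column, and
    n_conditions - 1 transcript-by-condition columns.
    """
    rows = 2 * n_cells
    cols = n_cells + 1 + (n_conditions - 1)
    return rows * cols


def entries_with_offset(n_cells, n_conditions=2):
    """Design-matrix entries when usage is modelled through an offset.

    Assumed layout: one row per cell, an intercept column, and
    n_conditions - 1 condition columns.
    """
    return n_cells * n_conditions


# Doubling the number of cells roughly quadruples the first count
# and exactly doubles the second.
for n_cells in (128, 256, 512, 1024):
    print(
        n_cells,
        entries_with_sample_intercepts(n_cells),
        entries_with_offset(n_cells),
    )
```

Under these assumptions the per-cell intercept columns are what make the matrix grow with the square of the number of cells, whereas the offset-based layout keeps the number of columns fixed.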
