Scalable differential transcript usage analysis for single-cell applications

Jeroen Gilis, EuroBioc2019 presentation. Promotor: Prof. Lieven Clement. Supervisor: Dr. Koen Van den Berge.


  1. Scalable differential transcript usage analysis for single-cell applications
     Jeroen Gilis, EuroBioc2019 presentation
     Promotor: Prof. Lieven Clement
     Supervisor: Dr. Koen Van den Berge

  2. Differential Transcript Usage (DTU)
     [Figure: DNA is transcribed (gene level) into pre-mRNA; alternative splicing yields isoforms M1 and M2, which are translated. Outcome labels: Normal metabolism, Tumorigenesis.]
     [Figure panels, each comparing M1 and M2: Gene-level analysis, expression level (cpm); Transcript-level analysis, relative usage (%).]
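
The contrast on this slide between expression level and relative usage can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `usage_proportions` and the counts are hypothetical, chosen so that the gene-level total stays constant while the usage of isoforms M1 and M2 flips between two samples.

```python
from typing import Dict, List


def usage_proportions(tx_counts: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Per-sample usage proportion of each transcript within one gene."""
    # Gene-level total per sample: column sums over all transcripts.
    totals = [sum(col) for col in zip(*tx_counts.values())]
    return {
        tx: [c / t if t > 0 else float("nan") for c, t in zip(counts, totals)]
        for tx, counts in tx_counts.items()
    }


# Hypothetical counts for one gene with two isoforms in two samples.
gene = {"M1": [80.0, 20.0], "M2": [20.0, 80.0]}

gene_totals = [sum(col) for col in zip(*gene.values())]  # identical in both samples
props = usage_proportions(gene)                          # usage flips between samples
```

In this toy case a gene-level analysis sees no change, while the transcript-level proportions differ strongly, which is the situation DTU analysis is meant to detect.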

  3. Method development
     • Our workflow unlocks edgeR for DTU analysis
     DGE model:
     $Y_{gi} \sim NB(\mu_{gi}, \phi_g)$
     $\log(\mu_{gi}) = \eta_{gi}$
     $\eta_{gi} = \beta_0 + \beta^{C}_{gc} + \log(S_i)$

  4. Method development
     • Our workflow unlocks edgeR for DTU analysis
     DTE model:
     $Y_{ti} \sim NB(\mu_{ti}, \phi_t)$
     $\log(\mu_{ti}) = \eta_{ti}$
     $\eta_{ti} = \beta_0 + \beta^{C}_{tc} + \log(S_i)$

  5. Method development
     • Our workflow unlocks edgeR for DTU analysis
     DTU model:
     $Y_{ti} \sim NB(\mu_{ti}, \phi_t)$
     $\log(\mu_{ti}) = \eta_{ti}$
     $\eta_{ti} = \beta_0 + \beta^{C}_{tc} + \log(T_{ti})$
     • Our workflow takes the gene-level counts (total counts, $T_{ti}$) as offsets in the GLM framework (edgeR-total)
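
To make the role of the offset concrete, the sketch below evaluates the mean of the model on this slide for one transcript in one sample. It is only an illustration under assumed values: `expected_count` is a hypothetical helper, and the coefficients and gene total are invented. Because the gene-level total enters as `log(T)`, the remaining terms describe the expected fraction of the gene's counts that the transcript receives.

```python
import math


def expected_count(beta0: float, beta_tc: float, offset: float) -> float:
    """Mean of the NB model: exp(beta0 + beta_tc + log(offset))."""
    return math.exp(beta0 + beta_tc + math.log(offset))


# Hypothetical coefficients and gene-level total count (not from the slides).
beta0 = math.log(0.25)   # baseline usage of 25 %
beta_tc = math.log(2.0)  # condition effect: usage doubles
gene_total = 400.0       # T_ti, the offset in the DTU model

mu = expected_count(beta0, beta_tc, gene_total)
expected_usage = mu / gene_total  # equals exp(beta0 + beta_tc)
```

Swapping `gene_total` for a library size would give the DTE parameterisation from the previous slide; only the offset changes.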

  6. Method development
     • Our workflow unlocks edgeR for DTU analysis
     DTU model:
     $Y_{ti} \sim NB(\mu_{ti}, \phi_t)$
     $\log(\mu_{ti}) = \eta_{ti}$
     $\eta_{ti} = \beta_0 + \beta^{C}_{tc} + \log(T_{ti})$
     • Our workflow takes the gene-level counts (total counts, $T_{ti}$) as offsets in the GLM framework (edgeR-total)
     • DEXSeq

     Counts            Sample 1   …   Sample m
     Tx 1              112        …   15
     Tx t              …          …   …
     Tx n              62         …   348

     'Other' counts    Sample 1   …   Sample m
     Tx 1              25         …   3
     Tx t              …          …   …
     Tx n              88         …   212

     • Our second workflow takes the 'other' counts as offsets (edgeR-other)
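
The 'other' counts on this slide can be derived from a transcript count matrix as follows. This is a sketch under the usual reading of the term, namely that a transcript's 'other' count in a sample is the summed count of all remaining transcripts of the same gene; the function name `other_counts` and the example numbers are hypothetical and do not reproduce the table above.

```python
from typing import Dict, List


def other_counts(tx_counts: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """For each transcript, sum the counts of all other transcripts of the gene."""
    # Gene-level total per sample: column sums over all transcripts.
    totals = [sum(col) for col in zip(*tx_counts.values())]
    return {
        tx: [total - c for c, total in zip(counts, totals)]
        for tx, counts in tx_counts.items()
    }


# Hypothetical gene with three transcripts measured in two samples.
gene = {"txA": [10, 0], "txB": [30, 5], "txC": [60, 15]}
others = other_counts(gene)
```

A transcript that holds all of a gene's counts in some sample would get an 'other' count of zero there, so a log offset built from these values would need special handling; the slides do not say how this case is treated.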

  7. Performance evaluation on real bulk data
     GTEx dataset, Nature Genetics 45, 580-585 (2013)
     [Figure: performance panels for the 5v5, 75v75 and 10v10 comparisons. Methods: DEXSeq, edgeR_total, edgeR_other, limma_diffsplice, DRIMSeq.]

  8. Scalability benchmark on real single-cell data
     • Our workflow performs a DTU analysis between two groups of 512 cells in ~20 minutes
     • DEXSeq scales quadratically

  9. Single-cell transcriptomics case study
     Dataset from Buettner et al., Nature Biotechnology 33, 155-160 (2015)
     • Dataset: 288 mouse embryonic stem cells in different cell cycle stages (G1, S and G2M)
     • Runtime: < 2 minutes
     • Significant enrichment in cell cycle processes
     • Several DTU genes are:
       ✦ Biologically relevant
       ✦ Not picked up in a gene-level analysis
       ✦ Clearly differentially used when visualised
     [Figure: Ccdc86, usage proportions of Tx1, Tx2 and Tx3 by phase (G1, S); significance marks: ***, ***. Each dot represents an individual cell, and dot size is weighted by the total expression of the gene in that cell.]

  10. Single-cell transcriptomics case study
      Buettner dataset, Nature Biotechnology 33, 155-160 (2015)
      • Dataset: 288 mouse embryonic stem cells in different cell cycle stages (G1, S and G2M)
      • Runtime: < 2 minutes for offset-based methods
      • Significant enrichment in cell cycle processes
      • Some DTU genes display clear DTU in visualisation and are biologically relevant
      • The edgeR_other method yields a large number of (false) positive results; it may be sensitive to outliers (?)
      • Discrepancy between edgeR-total and limma diffsplice, to be assessed formally in a single-cell benchmark
      [Figure: Eef1d, usage proportions of Tx8 by phase (G1, G2M); method labels: limma diffsplice, edgeR-total, edgeR-other; significance mark: ***.]

  11. Take-home messages
      We are developing a workflow for studying DTU that:
      1. Has a performance similar to that of DEXSeq
      2. Correctly controls the false discovery rate
      3. Scales towards large transcriptomics datasets

  12. Scalable differential transcript usage analysis for single-cell applications
      Jeroen Gilis, EuroBioc2019 presentation
      Promotor: Prof. Lieven Clement
      Supervisor: Dr. Koen Van den Berge

  13. Background - DTU

  14. Background - DEXSeq
      • Input: matrix of transcript-level counts (e.g. Salmon or kallisto)
      [Figure labels: Transcript-level counts, Complementary counts.]
      • Statistical model:
      $Y_{ti} \sim NB(\mu_{ti}, \phi_t)$
      $\log(\mu_{ti}) = \eta_{ti}$
      $\eta_{ti} = \beta^{S}_{ti} + \beta^{T}_{t} + \beta^{TC}_{tci}$

  15. Parametric bulk simulation study
      Dataset from Love et al., F1000Research, 7:952 (2018)
      [Figure: performance panels for the 3v3, 10v10 and 6v6 comparisons. Methods: DEXSeq, edgeR_total, edgeR_other, limma_diffsplice, DRIMSeq.]

  16. GTEx dataset, stringent filtering
      [Figure: performance comparison. Methods: DEXSeq, edgeR_total, edgeR_other, limma_diffsplice, DRIMSeq.]

  17. Love dataset, stringent filtering
      [Figure: performance comparison. Methods: DEXSeq, edgeR_total, edgeR_other, limma_diffsplice, DRIMSeq.]

  18. Other parametric bulk simulations and additional methods
      [Figure: performance panels for Love 6v6, Van den Berge 5v5 (1) and Van den Berge 5v5 (2). Methods: DEXSeq, edgeR_total, edgeR_other, limma_diffsplice, DRIMSeq, NBSplice, edgeR_diffsplice.]

  19. Results - Scalability
      • Methods that require sample-level intercepts scale quadratically with the number of cells
      • edgeR is one order of magnitude faster than DESeq2
      • All methods scale linearly with the number of transcripts
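
One way to see why sample-level intercepts can lead to quadratic scaling is to count the entries of a per-gene design matrix. The accounting below is a simplified assumption, not taken from the slides: two conditions, a DEXSeq-style design with two rows per cell (the transcript count and its complementary count) and one intercept column per cell, versus an offset-based design with one row per cell and two columns. The function `design_entries` is hypothetical, and actual runtimes depend on the implementation.

```python
def design_entries(n_cells: int, sample_intercepts: bool) -> int:
    """Number of entries in a simplified per-gene design matrix."""
    if sample_intercepts:
        rows = 2 * n_cells   # transcript row and complementary row per cell
        cols = n_cells + 2   # one intercept per cell, transcript, transcript:condition
    else:
        rows = n_cells       # one row per cell
        cols = 2             # intercept and condition effect
    return rows * cols


for n in (128, 256, 512):
    # With sample-level intercepts both dimensions grow with n.
    print(n, design_entries(n, True), design_entries(n, False))
```

Under these assumptions, doubling the number of cells roughly quadruples the first count and only doubles the second.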
