9/18/2014 1
Differential expression analysis of SR and SEN cells before and after IL2 treatment
Lu Wang and Ying Sha
9/18/2014 2
Update since the 9/15/2014 slides:
1. Standardized the p-value cutoff (P < 0.05) for all DE analysis tools (except GFOLD, where a p-value does not apply).
2. Updated all related plots.
3. Added a differential expression comparison between treated SR and treated SEN cells.
4. Added GFOLD results for dropping two small SEN libraries while keeping the SR replicates.
5. Added validation on genes with known behavior.
9/18/2014 3
Previously, we evaluated five different normalization methods (with and without ERCC normalization) and decided, together with Victoria, on TMM normalization (without ERCC).
In this presentation, we are dealing with two issues:
1. How best to treat the technical replicates (in particular the low abundance samples)
2. Which differential expression method is best to use (conditional on the method used to treat the technical replicates)
- Three ways of combining datasets
  - Technical Replicates (Tech): Treat sequencing runs for the same cell and condition as technical replicates
  - Drop Small Libraries (Drop): Treat sequencing runs for the same cell and condition as technical replicates, but drop the smaller libraries/replicates
  - Combine All Replicates (Combine): Combine sequencing runs/technical replicates for the same cell and condition
- Five differential expression calling tools
  - DESeq: Frequently used DE analysis tool when replicates are present; does not work well without replicates
  - edgeR: Frequently used DE analysis tool when replicates are present; does not work without replicates
  - GFOLD: Works well without replicates
  - NOISeq: Looks at dRPKM and fold change at the same time; works with or without replicates
  - DEGseq: Works with or without replicates
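The three dataset treatments can be illustrated with a minimal Python sketch. The sample keys, the counts, and the `min_library_size` threshold are hypothetical; the slides do not say how a "small" library was defined, so the threshold is an assumption.

```python
from collections import defaultdict

# Hypothetical raw counts: (cell, condition, run) -> {gene: count}
runs = {
    ("SR", "untreated", 1): {"geneA": 120, "geneB": 30},
    ("SR", "untreated", 2): {"geneA": 100, "geneB": 25},
    ("SEN", "untreated", 1): {"geneA": 90, "geneB": 60},
    ("SEN", "untreated", 2): {"geneA": 4, "geneB": 3},  # small library
}


def library_size(counts):
    """Total number of reads assigned to genes in one run."""
    return sum(counts.values())


def tech(runs):
    """Tech: keep every sequencing run as its own technical replicate."""
    return dict(runs)


def drop(runs, min_library_size):
    """Drop: keep runs as replicates but discard the small libraries."""
    return {
        key: counts
        for key, counts in runs.items()
        if library_size(counts) >= min_library_size
    }


def combine(runs):
    """Combine: sum counts over all runs of the same cell and condition."""
    merged = defaultdict(lambda: defaultdict(int))
    for (cell, condition, _run), counts in runs.items():
        for gene, count in counts.items():
            merged[(cell, condition)][gene] += count
    return {key: dict(counts) for key, counts in merged.items()}
```

Note that Combine leaves exactly one sample per cell and condition, which is why tools that depend on replicates are affected by it.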
9/18/2014 4
9/18/2014 5
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq. Rows: Tech, Drop, Combine.
Marked cells: X
X = method won't work
Figure: Distribution of Raw Read Counts (axis: log2 read counts). Panels: Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine).
9/18/2014 6
Drop and Combine give more similar distributions across samples before normalization.
Figure: Distribution of Raw Read Counts (axis: density). Panels: Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine).
9/18/2014 7
Drop and Combine give more similar distributions across samples before normalization.
Figure: Distribution of TMM Normalized Counts. Panels: Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine).
9/18/2014 8
After normalization, all three methods (Tech, Drop and Combine) give similar distributions across samples.
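For readers unfamiliar with TMM (trimmed mean of M-values), the following is a simplified Python sketch of the scaling factor for one sample against a reference sample. It is not the edgeR implementation used for these slides: the trimming is done by simple index cuts, the default trim fractions (30% on M, 5% on A) are assumptions taken from common practice, and the final rescaling of factors across all samples is omitted.

```python
import math


def tmm_factor(sample, reference, trim_m=0.3, trim_a=0.05):
    """Simplified TMM scaling factor of `sample` relative to `reference`.

    Both arguments are lists of raw counts in the same gene order.
    """
    n_s, n_r = sum(sample), sum(reference)
    rows = []
    for y_s, y_r in zip(sample, reference):
        if y_s == 0 or y_r == 0:
            continue  # log ratios are undefined for zero counts
        # Approximate variance of M for this gene (delta method).
        var = (n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)
        if var <= 0:
            continue
        m = math.log2((y_s / n_s) / (y_r / n_r))        # log fold change
        a = 0.5 * math.log2((y_s / n_s) * (y_r / n_r))  # mean log abundance
        rows.append((m, a, 1.0 / var))
    if not rows:
        return 1.0
    n = len(rows)
    cut_m, cut_a = int(n * trim_m), int(n * trim_a)
    by_m = sorted(range(n), key=lambda i: rows[i][0])
    by_a = sorted(range(n), key=lambda i: rows[i][1])
    # Keep genes that survive trimming on both M and A.
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    if not keep:
        return 1.0
    total_weight = sum(rows[i][2] for i in keep)
    mean_m = sum(rows[i][0] * rows[i][2] for i in keep) / total_weight
    return 2.0 ** mean_m


# Hypothetical counts: one highly expressed gene inflates the library size.
reference = [100] * 10
sample = [100] * 9 + [10000]
factor = tmm_factor(sample, reference)
```

In this toy case the factor is well below 1, so the effective library size (library size times factor) of the sample is pulled back toward the reference, which is the composition-bias correction TMM is designed for.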
Figure: Distribution of TMM Normalized Counts (axis: density). Panels: Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine).
9/18/2014 9
After normalization, all three methods (Tech, Drop and Combine) give similar distributions across samples.
For each combination of dataset and tool, there are four differential expression comparisons:
- IL-2 treated SR cells vs. untreated SR cells
- IL-2 treated SEN cells vs. untreated SEN cells
- Untreated SEN cells vs. untreated SR cells
- IL-2 treated SEN cells vs. IL-2 treated SR cells
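As a bookkeeping aid, the full set of candidate analyses can be enumerated in a few lines of Python. The dictionary keys and the (test, reference) ordering are illustrative assumptions; not every combination was actually run, since some tools do not work with some dataset treatments.

```python
from itertools import product

datasets = ["Tech", "Drop", "Combine"]
tools = ["DESeq", "edgeR", "GFOLD", "NOISeq", "DEGseq"]
# (test group, reference group) for each of the four comparisons
comparisons = [
    ("SR treated", "SR untreated"),
    ("SEN treated", "SEN untreated"),
    ("SEN untreated", "SR untreated"),
    ("SEN treated", "SR treated"),
]

analyses = [
    {"dataset": dataset, "tool": tool, "test": test, "reference": reference}
    for dataset, tool, (test, reference) in product(datasets, tools, comparisons)
]

# 3 datasets x 5 tools x 4 comparisons
print(len(analyses))
```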
9/18/2014 10
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq. Rows: Tech, Drop, Combine.
9/18/2014 11
Comparison #1
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq. Rows: Tech, Drop, Combine.
Marked cells: X
X = method won't work
edgeR vs DESeq
Figure panels: SR treated vs SR untreated; SEN untreated vs SR untreated
9/18/2014 12
Figure panel: SEN treated vs SEN untreated (no data from edgeR)
Sets shown in each panel: edgeR up, edgeR down, DESeq down, DESeq up
- DEG lists have high overlap.
- Overlap exists between DEG lists of opposite direction.
- DESeq identified 46.1% of total transcripts as DEGs.
- Each condition contains a low-library-size replicate.
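The overlap check between two tools, including genes called in opposite directions, can be sketched as follows. The function name, the returned keys, and the gene IDs are hypothetical and only illustrate the idea.

```python
def compare_deg_lists(up_a, down_a, up_b, down_b):
    """Summarise agreement between the up/down DEG lists of two tools."""
    up_a, down_a, up_b, down_b = map(set, (up_a, down_a, up_b, down_b))
    # Genes called in the same direction by both tools.
    concordant = (up_a & up_b) | (down_a & down_b)
    # Genes called up by one tool and down by the other.
    discordant = (up_a & down_b) | (down_a & up_b)
    union = up_a | down_a | up_b | down_b
    return {
        "concordant": sorted(concordant),
        "discordant": sorted(discordant),
        "jaccard": len(concordant) / len(union) if union else 0.0,
    }


# Hypothetical gene IDs, for illustration only.
result = compare_deg_lists(
    up_a=["g1", "g2", "g3"], down_a=["g4"],
    up_b=["g1", "g2", "g4"], down_b=["g5"],
)
```

A non-empty `discordant` list corresponds to the overlap between opposite DEG lists noted above, which is a warning sign for the reliability of the calls.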
9/18/2014 13
Comparison #1 - Conclusion
Treating technical replicates separately does not yield reliable results. Therefore, either the low abundance samples need to be dropped or the technical replicates need to be combined.
9/18/2014 14
Comparison #1 - Conclusion
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGseq. Rows: Tech, Drop, Combine.
Marked cells: X X X
X = method won't work
9/18/2014 15
Comparison #2
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq.
Tech: X X
Drop, Combine: X
9/18/2014 16
Comparison #2
Figure panels: SR treated vs SR untreated; SEN treated vs SEN untreated
Sets shown in each panel: DESeq up, edgeR up, DESeq down, edgeR down
- Overall, DESeq and edgeR do not share a high proportion of DE genes, except for SEN untreated vs. SR untreated, where the difference between the two conditions is relatively large.
- This could result from partially missing replicates in the SEN samples.
9/18/2014 17
Comparison #2
Figure panels: SEN untreated vs SR untreated; SEN treated vs SR treated
Sets shown in each panel: DESeq up, edgeR up, DESeq down, edgeR down
- Overall, DESeq and edgeR do not share a high proportion of DE genes, except for SEN untreated vs. SR untreated, where the difference between the two conditions is relatively large.
- This could result from partially missing replicates in the SEN samples.
9/18/2014 18
Comparison #2 - Conclusion
Dropping low abundance samples results in low overlap between the genes identified as differentially expressed (IL-2 up or IL-2 down) by different analysis methods. Therefore, the technical replicates (including the low abundance samples) need to be combined.
9/18/2014 19
Comparison #2 - Conclusion
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGseq. Rows: Tech, Drop, Combine.
Marked cells: X X X X X
X = method won't work
9/18/2014 20
Comparison #3
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq.
Tech: X X
Drop: X X
Combine: X
X = method won't work
9/18/2014 21
Comparison #3
Sets shown in each panel: DESeq up, NOISeq up, NOISeq down, DESeq down
Figure panels: SR treated vs SR untreated; SEN treated vs SEN untreated
- Overall, NOISeq failed to identify a reasonable number of differentially expressed genes from the combined data with an FDR-adjusted p-value < 0.05.
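To make the "FDR-adjusted p-value < 0.05" criterion concrete, here is a small Python sketch of the Benjamini-Hochberg adjustment. This assumes Benjamini-Hochberg is the FDR procedure in question; the slides do not say which procedure each tool applies, and the p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(pvals)
significant = [p < 0.05 for p in adjusted]
```

With these example values, four genes pass a raw P < 0.05 cutoff but only two pass after adjustment, which shows how much the choice between raw and adjusted p-values can change the size of a DEG list.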
9/18/2014 22
Comparison #3
Figure panels: SEN untreated vs SR untreated; SEN treated vs SR treated
Sets shown in each panel: DESeq up, NOISeq up, NOISeq down, DESeq down
- Overall, NOISeq failed to identify a reasonable number of differentially expressed genes from the combined data with an FDR-adjusted p-value < 0.05.
9/18/2014 23
Comparison #3 - Conclusion
Overall, NOISeq failed to identify any differentially expressed genes from the combined data with the same confidence as DESeq. DESeq also yields a very low number of differentially expressed genes. This is likely because combining samples results in a lack of replicates (to which DESeq is sensitive).
9/18/2014 24
Comparison #3 - Conclusion
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGseq. Rows: Tech, Drop, Combine.
Marked cells: X X X X ? X X
X = method won't work; ? = still not sure which is the best
9/18/2014 25
Comparison #4
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq.
Tech: X X
Drop: X X
Combine: ? X X
X = method won't work; ? = still not sure which is the best
9/18/2014 26
Comparison #4
- Overall, the DE genes detected by DESeq with the Combine method were subsets of those detected with the Drop method.
- This is because the Drop method kept the SR technical replicates, while the Combine method does not have any replicates.
- This shows that DESeq tends to give a more conservative set of DE genes when replicates are missing.
Figure panels: SR treated vs SR untreated; SEN treated vs SEN untreated
Sets shown in each panel: Drop up, Combine up, Combine down, Drop down
9/18/2014 27
Comparison #4
- Overall, the DE genes detected by DESeq with the Combine method were subsets of those detected with the Drop method.
- This is because the Drop method kept the SR technical replicates, while the Combine method does not have any replicates.
- This shows that DESeq tends to give a more conservative set of DE genes when replicates are missing.
Figure panels: SEN untreated vs SR untreated; SEN treated vs SR treated
Sets shown in each panel: Drop up, Combine up, Combine down, Drop down
9/18/2014 28
Comparison #4 - Conclusion
DESeq does indeed identify many more differentially expressed genes using the Drop method (whereby low abundance technical replicates are removed) than the Combine method (whereby technical replicates are combined into a single sample). This confirms the dependence of DESeq on replicates and indicates that it should not be used here.
9/18/2014 29
Comparison #4 - Conclusion
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq.
Tech: X X
Drop: X X
Combine: ? X X
X = method won't work; ? = still not sure which is the best
9/18/2014 30
Comparison #5
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq.
Tech: X X
Drop: X X
Combine: ? X X
X = method won't work; ? = still not sure which is the best
9/18/2014 31
Comparison #5
Figure panels: SR treated vs SR untreated; SEN treated vs SEN untreated
Sets shown in each panel: GFOLD up, DEGseq up, GFOLD down, DEGseq down
- GFOLD and DEGseq perform similarly in terms of the number of DE genes identified.
- Most of the DE genes identified are shared between the two tools.
9/18/2014 32
Comparison #5
Figure panels: SEN untreated vs SR untreated; SEN treated vs SR treated
Sets shown in each panel: GFOLD up, DEGseq up, GFOLD down, DEGseq down
- GFOLD and DEGseq perform similarly in terms of the number of DE genes identified.
- Most of the DE genes identified are shared between the two tools.
Statistics of DEGs
Cutoff: GFOLD 0.01

Comparison                      up    down   No DE
SEN_untreated vs SEN_treated    246   112    13329
SR_untreated vs SR_treated      147   46     13494
SR_untreated vs SEN_untreated   528   498    12661
SR_treated vs SEN_treated       586   487    12614
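A minimal Python sketch of how genes might be binned into up, down and No DE from their GFOLD values follows. It assumes a symmetric cutoff (value above 0.01 is up, below -0.01 is down), which the slides do not state explicitly; the gene IDs and GFOLD values are hypothetical. The last lines only check that the reported counts in each row add up to the 13687 genes included in the analysis.

```python
from collections import Counter


def classify_gfold(gfold_values, cutoff=0.01):
    """Label each gene 'up', 'down', or 'No DE' from its GFOLD value."""
    labels = {}
    for gene, value in gfold_values.items():
        if value > cutoff:
            labels[gene] = "up"
        elif value < -cutoff:
            labels[gene] = "down"
        else:
            labels[gene] = "No DE"
    return labels


# Hypothetical GFOLD values, for illustration only.
labels = classify_gfold({"g1": 1.8, "g2": -0.6, "g3": 0.0, "g4": 0.005})
counts = Counter(labels.values())

# Reported (up, down, No DE) counts from the table.
reported = {
    "SEN_untreated vs SEN_treated": (246, 112, 13329),
    "SR_untreated vs SR_treated": (147, 46, 13494),
    "SR_untreated vs SEN_untreated": (528, 498, 12661),
    "SR_treated vs SEN_treated": (586, 487, 12614),
}
row_totals = {name: sum(row) for name, row in reported.items()}
```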
9/18/2014 33
- The GFOLD cutoff was determined empirically (see plot on slide #30).
- Transcripts were filtered to those with counts per million > 1 in at least one condition.
- 13687 genes are included in the analysis. Each gene is represented by the one transcript that has the highest expression level across all samples.
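The filtering described above can be sketched in Python as follows. The function names and the toy counts are hypothetical, and "highest expression level across all samples" is interpreted here as the highest CPM summed over conditions, which is an assumption.

```python
def cpm(counts):
    """Counts per million for one sample ({transcript: count})."""
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: c * 1e6 / total for t, c in counts.items()}


def passes_filter(transcript, cpm_by_condition, threshold=1.0):
    """True if CPM exceeds the threshold in at least one condition."""
    return any(
        sample.get(transcript, 0.0) > threshold
        for sample in cpm_by_condition.values()
    )


def representative_transcripts(gene_of, cpm_by_condition):
    """Pick, per gene, the transcript with the highest CPM summed over conditions."""
    best = {}
    for transcript, gene in gene_of.items():
        total = sum(s.get(transcript, 0.0) for s in cpm_by_condition.values())
        if gene not in best or total > best[gene][1]:
            best[gene] = (transcript, total)
    return {gene: transcript for gene, (transcript, _total) in best.items()}


# Hypothetical counts; each library sums to one million reads.
raw = {
    "untreated": {"tx1": 5, "tx2": 0, "tx3": 999995},
    "treated": {"tx1": 8, "tx2": 1, "tx3": 999991},
}
gene_of = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
cpm_by_condition = {name: cpm(counts) for name, counts in raw.items()}
kept = [t for t in gene_of if passes_filter(t, cpm_by_condition)]
chosen = representative_transcripts(gene_of, cpm_by_condition)
```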
9/18/2014 34
Figure: GFOLD cutoff (axis: GFOLD value)
9/18/2014 35
Comparison #5 - Conclusion
GFOLD and DEGseq show similar results (i.e. relatively high overlap of differentially expressed genes). To decide which of these methods to use, we will validate by comparing RNA-seq results with qPCR results, as described starting on slide #45.
9/18/2014 36
Comparison #5 - Conclusion
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq.
Tech: X X
Drop: X X
Combine: ? X ? X ?
X = method won't work; ? = still not sure which is the best
9/18/2014 37
Comparison #6
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq.
Tech: X X
Drop: X X
Combine: ? X ? X ?
X = method won't work; ? = still not sure which is the best
9/18/2014 38
Comparison #6
- GFOLD identifies fewer DEGs using the Drop method.
- GFOLD requires biological replicates rather than technical replicates.
- GFOLD became even more conservative when given technical replicates as a substitute for biological replicates.
Figure panels: SR treated vs SR untreated; SEN treated vs SEN untreated
Sets shown in each panel: Drop up, Combine up, Combine down, Drop down
9/18/2014 39
Comparison #6
- GFOLD identifies fewer DEGs using the Drop method.
- GFOLD requires biological replicates rather than technical replicates.
- GFOLD became even more conservative when given technical replicates as a substitute for biological replicates.
Figure panels: SEN untreated vs SR untreated; SEN treated vs SR treated
Sets shown in each panel: Drop up, Combine up, Combine down, Drop down
9/18/2014 40
Comparison #6 - Conclusion
GFOLD identifies more DE genes when the technical replicates are combined, which is the opposite of the result for DESeq. This may indicate that GFOLD performs robustly in the absence of replicates. However, the good performance could also depend heavily on the GFOLD cutoff, which is different from the p < 0.05 cutoff used by all other tools.
9/18/2014 41
Preliminary Results
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGseq. Rows: Tech, Drop, Combine.
Marked cells: X X X X X ? X ? X ?
X = method won't work; ? = still not sure which is the best
9/18/2014 42
Preliminary Results
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGseq. Rows: Tech, Drop, Combine.
Combine: # DE genes: Low | X | # DE genes: High | X | # DE genes: High
Other marked cells: X X X X X X
X = method won't work
- Based only on the number of differentially expressed genes detected by each method, DEGseq and GFOLD yield the highest numbers, while DESeq yields the lowest number of DE genes.
9/18/2014 43
Preliminary Results and Next Validation Steps
The best approach at this time seems to be to combine technical replicates (both low and high abundance samples) into single biological replicates, as Victoria previously suggested. However, preliminary results also suggest that the methods used to identify differentially expressed genes are highly dependent on the presence of replicates (i.e. they lose power when samples are combined and replicates are removed). In order to evaluate this, we are going to:
1) validate the differential expression of sets of genes for which we have a strong expectation of their behavior, and
2) compare the sets of differentially expressed genes identified with different methods against the qPCR results Victoria already has for the comparison between IL2 treatments.
9/18/2014 44
Validation Based on Genes with Known Behaviors
- House-keeping genes: GAPDH, ACTB, RPL13A
- New house-keeping genes: C1orf43, CHMP2A, EMC7
- Known DE genes between SEN and SR: SMARCA5, HOXA1, H2AFY
- Known IL-2 related genes: IL2RA, IL2RB, IL2RG, STAT5A, STAT5B
Method status grid. Columns: DESeq, edgeR, GFold, NOISeq, DEGSeq. Rows: Tech, Drop, Combine.
9/18/2014 45
Validation Based on Comparison with qPCR results
We do not have RNA-seq results for these genes due to their low expression level.
9/18/2014 46
Validation Based on Genes with Known Behaviors (Combine)
Columns: DESeq, GFold, NOISeq, DEGSeq
SR treated vs untreated: GAPDH GAPDH, RPL13A ACTB
SEN treated vs untreated: GAPDH GAPDH, ACTB
SEN treated vs SR treated: GAPDH, RPL13A ACTB
SEN untreated vs SR untreated: GAPDH GAPDH, RPL13A, ACTB
- None of the differential expression analysis methods detected, with high confidence (P < 0.05 or GFOLD cutoff 0.01), differential expression among the genes with known behaviors.
- No genes from the “new house-keeping” gene set were identified as differentially expressed.
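One way to screen a tool's DE calls for house-keeping genes (potential false positives) is sketched below in Python. The house-keeping set combines the two gene sets listed in the validation slide; the tool names and DE calls are hypothetical and are not the results from these slides.

```python
HOUSEKEEPING = {"GAPDH", "ACTB", "RPL13A", "C1orf43", "CHMP2A", "EMC7"}


def housekeeping_hits(de_genes_by_tool, housekeeping=HOUSEKEEPING):
    """List the house-keeping genes that each tool calls DE."""
    return {
        tool: sorted(set(genes) & set(housekeeping))
        for tool, genes in de_genes_by_tool.items()
    }


# Hypothetical DE calls for one comparison; not the results from the slides.
calls = {
    "toolA": ["GAPDH", "IL2RA", "geneX"],
    "toolB": ["GAPDH", "ACTB", "RPL13A", "geneY"],
}
hits = housekeeping_hits(calls)
n_hits = {tool: len(genes) for tool, genes in hits.items()}
```

Under the assumption that house-keeping genes should not change between conditions, a tool with more hits is more likely to be producing false positives.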
9/18/2014 47
Figure: TMM Normalized Expression (Combine). Legend: Differentially Expressed, Not DE.
TMM normalized expression values were used in the differential expression analysis.
9/18/2014 48
Gene labels in figure: GAPDH, RPL13A
Figure: Raw/Not Normalized Read Counts (Combine). Legend: Not DE, Differentially Expressed.
Raw expression values are shown for reference only; they were NOT used in the differential expression analysis.
9/18/2014 49
Columns: DESeq, GFold, NOISeq, DEGSeq
SR treated vs untreated: GAPDH GAPDH, RPL13A ACTB
SEN treated vs untreated: GAPDH GAPDH, ACTB
SEN treated vs SR treated: GAPDH GAPDH, RPL13A ACTB
SEN untreated vs SR untreated: GAPDH GAPDH, RPL13A, ACTB
Method status grid. Columns: DESeq, GFold, NOISeq, DEGseq.
Tech: X X X X
Drop: X X X X
Combine: # DE genes: Low | # DE genes: High | X | # DE genes: High
Additional mark: X
- Therefore, the genes with known behavior fail to identify the best method for differential expression.
- However, if we put the two tables side by side, we can also see that more house-keeping genes were identified as DE genes (potentially false positives) by DEGseq.
9/18/2014 50
Conclusion
- GFOLD and DEGseq identify a comparable number of DE genes.
- None of the genes with known differential expression behavior was identified by either method.
- However, DEGseq identifies more house-keeping genes as differentially expressed genes with adjusted p-value < 0.05.
- Therefore, GFOLD gives the best results for our current dataset and will be used for downstream analysis.