before and after IL2 treatment Lu Wang and Ying Sha 9/18/2014 1 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

9/18/2014 1

Differential expression analysis of SR and SEN cells before and after IL2 treatment

Lu Wang and Ying Sha

slide-2
SLIDE 2

9/18/2014 2

Update since the 9/15/2014 slides:

  • 1. Standardized the p-value cutoff (P < 0.05) for all DE analysis tools (except for GFOLD, where a p-value does not apply).
  • 2. All related plots have been updated.
  • 3. Added a differential expression comparison between SR treated and SEN treated.
  • 4. Added GFOLD results for dropping the two small SEN libraries while keeping the SR replicates.
  • 5. Validation on genes with known behavior.
slide-3
SLIDE 3

9/18/2014 3

Previously, we evaluated five different normalization methods (with and without ERCC normalization) and decided, together with Victoria, on TMM normalization (without ERCC).

In this presentation, we are dealing with two issues:

  • 1. How best to treat the technical replicates (in particular the low abundance samples).
  • 2. Which differential expression method is best to use (conditioned upon the method used to treat the technical replicates).

slide-4
SLIDE 4
  • Three ways of combining datasets
      • Technical Replicates (Tech): Treat sequencing runs for the same cell and condition as technical replicates.
      • Drop Small Libraries (Drop): Treat sequencing runs for the same cell and condition as technical replicates, but drop the smaller libraries/replicates.
      • Combine All Replicates (Combine): Combine sequencing runs/technical replicates for the same cell and condition.
  • Five differential expression calling tools
      • DESeq: Frequently used DE analysis tool when replicates are present; does not work well without replicates.
      • edgeR: Frequently used DE analysis tool when replicates are present; does not work without replicates.
      • GFOLD: Works well without replicates.
      • NOISeq: Looks at dRPKM and fold change at the same time; works with or without replicates.
      • DEGseq: Works with or without replicates.
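
The three ways of combining datasets listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for these slides: the run identifiers, gene names, counts, and the `min_fraction` rule for deciding which library counts as "small" are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sequencing runs: (cell type, condition, run id) -> per-gene read counts.
runs = {
    ("SR", "untreated", "run1"): {"geneA": 120, "geneB": 30},
    ("SR", "untreated", "run2"): {"geneA": 10, "geneB": 2},
    ("SEN", "untreated", "run1"): {"geneA": 200, "geneB": 80},
    ("SEN", "untreated", "run2"): {"geneA": 15, "geneB": 5},
}


def library_size(counts):
    """Total number of reads in one run."""
    return sum(counts.values())


def group_runs(runs):
    """Group runs by (cell type, condition), ignoring the run id."""
    grouped = defaultdict(list)
    for (cell, condition, _run), counts in runs.items():
        grouped[(cell, condition)].append(counts)
    return grouped


def tech(runs):
    """Tech: keep every run as its own technical replicate."""
    return {key: list(reps) for key, reps in group_runs(runs).items()}


def drop(runs, min_fraction=0.5):
    """Drop: discard runs much smaller than the largest run of the same group.

    min_fraction is an assumed threshold chosen only for this example.
    """
    kept = {}
    for key, reps in group_runs(runs).items():
        largest = max(library_size(r) for r in reps)
        kept[key] = [r for r in reps if library_size(r) >= min_fraction * largest]
    return kept


def combine(runs):
    """Combine: sum counts over all runs of the same cell and condition."""
    combined = {}
    for key, reps in group_runs(runs).items():
        total = defaultdict(int)
        for r in reps:
            for gene, n in r.items():
                total[gene] += n
        combined[key] = [dict(total)]
    return combined
```

Note that `combine` always leaves a single sample per condition, which is why tools that rely on replicates lose information under that treatment.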

9/18/2014 4

slide-5
SLIDE 5

9/18/2014 5

Method status grid (rows: Tech, Drop, Combine; columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq). One cell is marked X.

X = method won't work

slide-6
SLIDE 6

Distribution of Raw Read Counts

[Figure: log2 read counts per sample. Panels: Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine)]

9/18/2014 6

Drop and Combine give more similar distributions across samples before normalization.

slide-7
SLIDE 7

Distribution of Raw Read Counts

[Figure: density of raw read counts per sample. Panels: Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine)]

9/18/2014 7

Drop and Combine give more similar distributions across samples before normalization.

slide-8
SLIDE 8

Distribution of TMM Normalized Counts

[Figure: TMM normalized counts per sample. Panels: Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine)]

9/18/2014 8

After normalization, all three methods (Tech, Drop and Combine) give similar distributions across samples.

slide-9
SLIDE 9

Distribution of TMM Normalized Counts

[Figure: density of TMM normalized counts per sample. Panels: Technical Replicates (Tech), Drop Small Libraries (Drop), Combine All Replicates (Combine)]

9/18/2014 9

After normalization, all three methods (Tech, Drop and Combine) give similar distributions across samples.

slide-10
SLIDE 10

For each combination of dataset and tools, there are four differential expression comparisons

  • IL-2 treated SR cells and untreated SR cells
  • IL-2 treated SEN cells and untreated SEN cells
  • Untreated SEN cells and untreated SR cells
  • IL-2 treated SEN cells and treated SR cells
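
As a rough sketch, the full set of analyses implied by the four comparisons above (one per dataset treatment and tool) can be enumerated as follows. The dictionary keys `dataset`, `tool`, `group_a` and `group_b` are hypothetical names chosen for illustration only.

```python
from itertools import product

datasets = ["Tech", "Drop", "Combine"]
tools = ["DESeq", "edgeR", "GFOLD", "NOISeq", "DEGseq"]

# The four comparisons listed on this slide, as (group_a, group_b) pairs.
comparisons = [
    ("SR treated", "SR untreated"),
    ("SEN treated", "SEN untreated"),
    ("SEN untreated", "SR untreated"),
    ("SEN treated", "SR treated"),
]

# One entry per dataset treatment, tool and comparison.
plan = [
    {"dataset": d, "tool": t, "group_a": a, "group_b": b}
    for d, t, (a, b) in product(datasets, tools, comparisons)
]

print(len(plan))  # 3 treatments x 5 tools x 4 comparisons
```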

9/18/2014 10

Method status grid (rows: Tech, Drop, Combine; columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq).

slide-11
SLIDE 11

9/18/2014 11

Comparison #1

Method status grid (rows: Tech, Drop, Combine; columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq). One cell is marked X.

X = method won't work

slide-12
SLIDE 12

edgeR vs DESeq

9/18/2014 12

[Figure: overlap of edgeR up, edgeR down, DESeq down and DESeq up gene sets. Panels: SR treated vs SR untreated; SEN untreated vs SR untreated. SEN treated vs SEN untreated: no data from edgeR]

  • DEG lists have high overlap.
  • Overlap also exists between opposite DEG lists.
  • DESeq identified 46.1% of total transcripts as DEGs.
  • Each condition contains a low-library-size replicate.
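
The overlap described above, including overlap between opposite lists (up in one tool, down in the other), can be counted with plain set operations. The sketch below uses made-up gene IDs and is only an illustration of the bookkeeping, not the code behind these slides.

```python
def overlap_report(tool_a, tool_b):
    """Count shared genes for every pair of directions.

    Each argument is a dict of the form {"up": set_of_genes, "down": set_of_genes}.
    Keys of the result are (direction in tool_a, direction in tool_b).
    """
    report = {}
    for dir_a in ("up", "down"):
        for dir_b in ("up", "down"):
            report[(dir_a, dir_b)] = len(tool_a[dir_a] & tool_b[dir_b])
    return report


def jaccard(a, b):
    """Shared genes divided by all genes called by either tool."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical DE calls for one comparison.
edger = {"up": {"g1", "g2", "g3"}, "down": {"g4", "g5"}}
deseq = {"up": {"g1", "g2", "g5"}, "down": {"g4", "g3"}}

report = overlap_report(edger, deseq)
# Entries ("up", "down") and ("down", "up") are the opposite-direction overlaps.
```

A non-zero count for `("up", "down")` or `("down", "up")` means the two tools disagree on the direction of change for some genes, which is a warning sign for the reliability of the calls.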
slide-13
SLIDE 13

9/18/2014 13

Comparison #1 - Conclusion

Treating technical replicates separately does not yield reliable results.

Therefore, either the low abundance samples need to be dropped or the technical replicates need to be combined.

slide-14
SLIDE 14

9/18/2014 14

Comparison #1 - Conclusion

Method status grid (rows: Tech, Drop, Combine; columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq). Three cells are marked X.

X = method won't work

slide-15
SLIDE 15

9/18/2014 15

Comparison #2

Method status grid (columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq), marks per row:

  • Tech: X X
  • Drop: (none)
  • Combine: X

slide-16
SLIDE 16

9/18/2014 16

Comparison #2

[Figure: overlap of DESeq up, edgeR up, DESeq down and edgeR down gene sets. Panels: SR treated vs SR untreated; SEN treated vs SEN untreated]

  • Overall, DESeq and edgeR do not share a high proportion of DE genes, except for SEN untreated vs. SR untreated, where the difference between the two conditions is relatively large.
  • This could result from the partially missing replicates in the SEN samples.
slide-17
SLIDE 17

9/18/2014 17

Comparison #2

[Figure: overlap of DESeq up, edgeR up, DESeq down and edgeR down gene sets. Panels: SEN untreated vs SR untreated; SEN treated vs SR treated]

  • Overall, DESeq and edgeR do not share a high proportion of DE genes, except for SEN untreated vs. SR untreated, where the difference between the two conditions is relatively large.
  • This could result from the partially missing replicates in the SEN samples.

slide-18
SLIDE 18

9/18/2014 18

Comparison #2 - Conclusion

Dropping low abundance samples results in low overlap of genes identified as differentially expressed (IL-2 up or IL-2 down) using different analysis methods.

Therefore, the technical replicates (including the low abundance samples) need to be combined.

slide-19
SLIDE 19

9/18/2014 19

Comparison #2 - Conclusion

Method status grid (rows: Tech, Drop, Combine; columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq). Five cells are marked X.

X = method won't work

slide-20
SLIDE 20

9/18/2014 20

Comparison #3

Method status grid (columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq), marks per row:

  • Tech: X X
  • Drop: X X
  • Combine: X

X = method won't work

slide-21
SLIDE 21

9/18/2014 21

Comparison #3

[Figure: overlap of DESeq up, NOISeq up, NOISeq down and DESeq down gene sets. Panels: SR treated vs SR untreated; SEN treated vs SEN untreated]

  • Overall, NOISeq failed to identify a reasonable number of differentially expressed genes from the combined data with FDR adjusted p-value < 0.05.

slide-22
SLIDE 22

9/18/2014 22

Comparison #3

[Figure: overlap of DESeq up, NOISeq up, NOISeq down and DESeq down gene sets. Panels: SEN untreated vs SR untreated; SEN treated vs SR treated]

  • Overall, NOISeq failed to identify a reasonable number of differentially expressed genes from the combined data with FDR adjusted p-value < 0.05.
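
For reference, a common way to obtain FDR adjusted p-values is the Benjamini-Hochberg procedure. The slides do not state which adjustment each tool applied, so the sketch below is an assumption for illustration, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for five genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(pvals)

# Apply the same 0.05 cutoff to the adjusted values.
significant = [p < 0.05 for p in adj]
```

In this example four raw p-values are below 0.05, but only two genes pass after adjustment, which shows how an FDR cutoff can shrink a DE list considerably.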

slide-23
SLIDE 23

9/18/2014 23

Comparison #3 - Conclusion

Overall, NOISeq failed to identify any differentially expressed genes from the combined data with the same confidence as DESeq.

DESeq also yields a very low number of differentially expressed genes. This is likely due to combining samples, which results in a lack of replicates (to which DESeq is sensitive).

slide-24
SLIDE 24

9/18/2014 24

Comparison #3 - Conclusion

Method status grid (rows: Tech, Drop, Combine; columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq). Marks, in reading order: X X X X ? X X

X = method won't work; ? = still not sure which is the best

slide-25
SLIDE 25

9/18/2014 25

Comparison #4

Method status grid (columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq), marks per row:

  • Tech: X X
  • Drop: X X
  • Combine: ? X X

X = method won't work; ? = still not sure which is the best

slide-26
SLIDE 26

9/18/2014 26

Comparison #4

  • Overall, the DE genes detected by DESeq with the Combine method were subsets of those detected with the Drop method.
  • This is because the Drop method kept the SR technical replicates, while the Combine method does not have any replicates.
  • This shows that DESeq tends to give a more conservative set of DE genes when replicates are missing.

[Figure: overlap of Drop up, Combine up, Combine down and Drop down gene sets. Panels: SR treated vs SR untreated; SEN treated vs SEN untreated]

slide-27
SLIDE 27

9/18/2014 27

Comparison #4

  • Overall, the DE genes detected by DESeq with the Combine method were subsets of those detected with the Drop method.
  • This is because the Drop method kept the SR technical replicates, while the Combine method does not have any replicates.
  • This shows that DESeq tends to give a more conservative set of DE genes when replicates are missing.

[Figure: overlap of Drop up, Combine up, Combine down and Drop down gene sets. Panels: SEN untreated vs SR untreated; SEN treated vs SR treated]

slide-28
SLIDE 28

9/18/2014 28

Comparison #4 - Conclusion

DESeq does indeed identify many more differentially expressed genes using the Drop method (whereby low abundance technical replicates are removed) than the Combine method (whereby technical replicates are combined into a single sample).

This confirms the dependence of DESeq on replicates and indicates that it should not be used here.

slide-29
SLIDE 29

9/18/2014 29

Comparison #4 - Conclusion

Method status grid (columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq), marks per row:

  • Tech: X X
  • Drop: X X
  • Combine: ? X X

X = method won't work; ? = still not sure which is the best

slide-30
SLIDE 30

9/18/2014 30

Comparison #5

Method status grid (columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq), marks per row:

  • Tech: X X
  • Drop: X X
  • Combine: ? X X

X = method won't work; ? = still not sure which is the best

slide-31
SLIDE 31

9/18/2014 31

Comparison #5

[Figure: overlap of GFOLD up, DEGseq up, GFOLD down and DEGseq down gene sets. Panels: SR treated vs SR untreated; SEN treated vs SEN untreated]

  • GFOLD and DEGseq perform similarly in terms of the number of DE genes identified.
  • Most of the DE genes identified are shared between the two tools.

slide-32
SLIDE 32

9/18/2014 32

Comparison #5

[Figure: overlap of GFOLD up, DEGseq up, GFOLD down and DEGseq down gene sets. Panels: SEN untreated vs SR untreated; SEN treated vs SR treated]

  • GFOLD and DEGseq perform similarly in terms of the number of DE genes identified.
  • Most of the DE genes identified are shared between the two tools.
slide-33
SLIDE 33

Statistics of DEGs

Cutoff: GFOLD 0.01

Comparison                       up     down   No DE
SEN_untreated vs SEN_treated     246    112    13329
SR_untreated vs SR_treated       147    46     13494
SR_untreated vs SEN_untreated    528    498    12661
SR_treated vs SEN_treated        586    487    12614
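
As a quick consistency check on the counts above, each row should add up to the 13687 genes included in the analysis. The short sketch below copies the numbers from this slide; the variable names are only illustrative.

```python
TOTAL_GENES = 13687  # number of genes included in the analysis, from this slide

# (up, down, no DE) per comparison, copied from the table on this slide.
deg_counts = {
    "SEN_untreated vs SEN_treated": (246, 112, 13329),
    "SR_untreated vs SR_treated": (147, 46, 13494),
    "SR_untreated vs SEN_untreated": (528, 498, 12661),
    "SR_treated vs SEN_treated": (586, 487, 12614),
}

# Every gene falls into exactly one of the three classes.
for name, (up, down, no_de) in deg_counts.items():
    assert up + down + no_de == TOTAL_GENES, name

# Fraction of genes called DE (up or down) in each comparison.
de_fraction = {
    name: (up + down) / TOTAL_GENES
    for name, (up, down, _no_de) in deg_counts.items()
}
```

All four rows pass the check, and the SR vs SEN comparisons call several times more DE genes than the treated vs untreated comparisons.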

9/18/2014 33

  • The GFOLD cutoff is determined empirically (see plot on slide #30).
  • Transcripts are filtered by requiring counts per million > 1 in at least one condition.
  • 13687 genes are included in the analysis. Each gene is represented by one transcript: the one with the highest expression level across all samples.
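
The filtering and transcript selection described above might look roughly like the following. This is a sketch under assumptions: each sample is treated as one condition (as in the Combine data), "highest expression level across all samples" is read as the highest summed count, and the sample names, transcript IDs and the `gene_of` mapping are hypothetical.

```python
def cpm(counts_by_sample):
    """Convert {sample: {transcript: count}} to counts per million."""
    out = {}
    for sample, counts in counts_by_sample.items():
        lib = sum(counts.values())
        out[sample] = {t: c * 1e6 / lib for t, c in counts.items()}
    return out


def passes_filter(transcript, cpm_by_sample, threshold=1.0):
    """Keep a transcript if its CPM exceeds the threshold in at least one sample."""
    return any(s.get(transcript, 0.0) > threshold for s in cpm_by_sample.values())


def representative_transcripts(counts_by_sample, gene_of):
    """Pick, per gene, the transcript with the highest total count across samples."""
    totals = {}
    for counts in counts_by_sample.values():
        for t, c in counts.items():
            totals[t] = totals.get(t, 0) + c
    best = {}
    for t, total in totals.items():
        gene = gene_of[t]
        if gene not in best or total > totals[best[gene]]:
            best[gene] = t
    return best


# Hypothetical counts; each library has exactly one million reads.
counts = {
    "SR_untreated": {"tx1": 999_990, "tx2": 9, "tx3": 1},
    "SR_treated": {"tx1": 999_998, "tx2": 2, "tx3": 0},
}
gene_of = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

cpm_values = cpm(counts)
kept = [t for t in gene_of if passes_filter(t, cpm_values)]
reps = representative_transcripts(counts, gene_of)
```

Here `tx3` reaches exactly 1 CPM in one sample and 0 in the other, so it does not pass a strict "> 1" filter.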

slide-34
SLIDE 34

9/18/2014 34

[Figure: GFOLD cutoff plot. Labels: GFOLD cutoff, GFOLD value]

slide-35
SLIDE 35

9/18/2014 35

Comparison #5 - Conclusion

GFOLD and DEGseq show similar results (i.e. relatively high overlap of differentially expressed genes).

To decide which of these methods to use, we will validate by comparing RNA-seq results with qPCR results, as described starting on slide #45.

slide-36
SLIDE 36

9/18/2014 36

Comparison #5 - Conclusion

Method status grid (columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq), marks per row:

  • Tech: X X
  • Drop: X X
  • Combine: ? X ? X ?

X = method won't work; ? = still not sure which is the best

slide-37
SLIDE 37

9/18/2014 37

Comparison #6

Method status grid (columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq), marks per row:

  • Tech: X X
  • Drop: X X
  • Combine: ? X ? X ?

X = method won't work; ? = still not sure which is the best

slide-38
SLIDE 38

9/18/2014 38

Comparison #6

  • GFOLD identifies fewer DEGs using the Drop method.
  • GFOLD requires biological replicates rather than technical replicates.
  • GFOLD becomes even more conservative when given technical replicates as a substitute for biological replicates.

[Figure: overlap of Drop up, Combine up, Combine down and Drop down gene sets. Panels: SR treated vs SR untreated; SEN treated vs SEN untreated]

slide-39
SLIDE 39

9/18/2014 39

Comparison #6

  • GFOLD identifies fewer DEGs using the Drop method.
  • GFOLD requires biological replicates rather than technical replicates.
  • GFOLD becomes even more conservative when given technical replicates as a substitute for biological replicates.

[Figure: overlap of Drop up, Combine up, Combine down and Drop down gene sets. Panels: SEN untreated vs SR untreated; SEN treated vs SR treated]

slide-40
SLIDE 40

9/18/2014 40

Comparison #6 - Conclusion

GFOLD identifies more DE genes when the technical replicates are combined, which is the opposite of the DESeq result. This may indicate that GFOLD performs robustly in the absence of replicates. However, the good performance could also depend heavily on the GFOLD cutoff, which is different from the p < 0.05 cutoff used by all other tools.

slide-41
SLIDE 41

9/18/2014 41

Preliminary Results

Method status grid (rows: Tech, Drop, Combine; columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq). Marks, in reading order: X X X X X ? X ? X ?

X = method won't work; ? = still not sure which is the best

slide-42
SLIDE 42

9/18/2014 42

Preliminary Results

Method status grid (rows: Tech, Drop, Combine; columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq). Entries, in reading order: # DE genes: Low, X, # DE genes: High, X, # DE genes: High; the other marked cells are X.

X = method won't work

  • Just based on the number of differentially expressed genes detected by each method, DEGseq and GFOLD yield the highest number while DESeq yields the lowest number of DE genes.

slide-43
SLIDE 43

9/18/2014 43

Preliminary Results and Next Validation Steps

The best approach at this time seems to be to combine technical replicates (both low and high abundance samples) into single biological replicates … as Victoria previously suggested.

However, preliminary results also suggest that the methods used to identify differentially expressed genes are highly dependent on the presence of replicates (i.e. they lose power when samples are combined and replicates are removed).

In order to evaluate this, we are going to:

  • 1) validate the differential expression of sets of genes for which we have a strong expectation of their behavior, and
  • 2) compare sets of differentially expressed genes identified with different methods with the qPCR results Victoria already has for comparison between IL2 treatments.

slide-44
SLIDE 44

9/18/2014 44

Validation Based on Genes with Known Behaviors

  • House-keeping genes: GAPDH, ACTB, RPL13A
  • New house-keeping genes: C1orf43, CHMP2A, EMC7
  • Known DE genes between SEN and SR: SMARCA5, HOXA1, H2AFY
  • Known IL-2 related genes: IL2RA, IL2RB, IL2RG, STAT5A, STAT5B

Method status grid (rows: Tech, Drop, Combine; columns: DESeq, edgeR, GFOLD, NOISeq, DEGseq).

slide-45
SLIDE 45

9/18/2014 45

Validation Based on Comparison with qPCR results

We do not have RNA-seq results for these genes because of their low expression level.

slide-46
SLIDE 46

9/18/2014 46

Validation Based on Genes with Known Behaviors (Combine)

Genes with known behavior called DE, per comparison (table columns: DESeq, GFOLD, NOISeq, DEGseq; non-empty cells listed in column order, separated by ";"):

  • SR treated vs untreated: GAPDH; GAPDH, RPL13A, ACTB
  • SEN treated vs untreated: GAPDH; GAPDH, ACTB
  • SEN treated vs SR treated: GAPDH, RPL13A, ACTB
  • SEN untreated vs SR untreated: GAPDH; GAPDH, RPL13A, ACTB

  • None of the differential expression analysis methods detected differential expression among the genes with known behaviors with high confidence (P < 0.05 or GFOLD < 0.01).
  • No genes from the “new house-keeping” gene set were identified as differentially expressed.
slide-47
SLIDE 47

9/18/2014 47

TMM Normalized Expression (Combine)

[Figure: TMM normalized expression; genes labeled Differentially Expressed or Not DE]

TMM normalized expression values were used in differential expression analysis.

slide-48
SLIDE 48

9/18/2014 48

Raw/Not Normalized Read Counts (Combine)

[Figure: raw read counts for GAPDH and RPL13A; genes labeled Not DE or Differentially Expressed]

Raw expression values are shown for reference only; they were NOT used in differential expression analysis.

slide-49
SLIDE 49

9/18/2014 49

Genes with known behavior called DE, per comparison (table columns: DESeq, GFOLD, NOISeq, DEGseq; non-empty cells listed in column order, separated by ";"):

  • SR treated vs untreated: GAPDH; GAPDH, RPL13A, ACTB
  • SEN treated vs untreated: GAPDH; GAPDH, ACTB
  • SEN treated vs SR treated: GAPDH; GAPDH, RPL13A, ACTB
  • SEN untreated vs SR untreated: GAPDH; GAPDH, RPL13A, ACTB

Method status grid (columns: DESeq, GFOLD, NOISeq, DEGseq), entries per row:

  • Tech: X X X X
  • Drop: X X X X
  • Combine: # DE genes: Low; # DE genes: High; X; # DE genes: High

  • Therefore, genes with known behavior fail to identify the best method for differential expression.
  • However, if we put the two tables side-by-side, we can also see that more house-keeping genes were identified as DE genes (potentially false positives) by DEGseq.
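
One way to make the house-keeping check above concrete is to intersect each tool's DE calls with the house-keeping gene sets from slide 44. The gene sets are taken from that slide; the tool names and DE calls below are hypothetical placeholders, not results from this analysis.

```python
# House-keeping and "new house-keeping" genes, as listed on slide 44.
HOUSEKEEPING = {"GAPDH", "ACTB", "RPL13A", "C1orf43", "CHMP2A", "EMC7"}


def housekeeping_hits(de_calls):
    """Map each tool to the sorted house-keeping genes it called DE.

    de_calls has the form {tool_name: set_of_genes_called_DE}.
    House-keeping genes are expected to be stable, so hits are
    potential false positives.
    """
    return {tool: sorted(genes & HOUSEKEEPING) for tool, genes in de_calls.items()}


# Hypothetical DE calls for a single comparison.
de_calls = {
    "toolA": {"GAPDH", "geneX"},
    "toolB": {"GAPDH", "RPL13A", "ACTB", "geneY"},
}

hits = housekeeping_hits(de_calls)
n_hits = {tool: len(genes) for tool, genes in hits.items()}
```

A tool with more hits is calling more presumably stable genes as DE, which is the reasoning used on this slide to compare otherwise similar methods.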

slide-50
SLIDE 50

9/18/2014 50

Conclusion

  • GFOLD and DEGseq identify a comparable number of DE genes.
  • None of the genes with known differential expression behavior were identified by either method.
  • However, DEGseq identifies more house-keeping genes as differentially expressed genes with adjusted p-value < 0.05.
  • Therefore, GFOLD gives the best results for our current dataset and will be used for downstream analysis.