RNAseq: Normalization and differential expression I (Jens Gietzelt)

SLIDE 1

RNAseq: Normalization and differential expression I

Jens Gietzelt
22.05.2012

Robinson, Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology. 2010
Hardcastle, Kelly. baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics. 2010
SLIDE 2

Introduction Pairwise calibration (EdgeR) Differential expression

Outline of the presentation

1. Introduction
2. Pairwise calibration (EdgeR)
3. Differential expression

SLIDE 3

Introduction

normalization:
  • comparison of expression levels between genes within a sample (same scale)
  • however, technical effects introduce a bias in the comparison between samples
  • ⇒ normalization is crucial before performing differential expression
  • calibration method EdgeR takes advantage of within-sample comparability

differential expression:
  • appropriate distribution for count data
  • incorporate calibration parameters

SLIDE 4

Framework

  • $Y_{g,k}$ ... observed count for gene $g$ in library $k$
  • $N_k = \sum_{g=1}^{G} Y_{g,k}$ ... total number of reads for library $k$
  • $\eta_{g,k}$ ... number of transcripts of gene $g$ in library $k$
  • $L_g$ ... length of gene $g$
  • $S_k = \sum_{g=1}^{G} \eta_{g,k} L_g$ ... total RNA output of sample $k$

$$E(Y_{g,k}) = \frac{\eta_{g,k} L_g}{S_k} N_k$$

  • counts are a linear function of the number of transcripts
  • library size calibration ($Y_{g,k}/N_k$) is appropriate for the comparison of replicates
  • comparison of biologically different samples may be biased by varying RNA composition
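The composition bias can be made concrete with a small numeric sketch of the framework above. All values here are hypothetical and chosen only for illustration: three genes have identical transcript numbers in both samples, and a fourth gene is expressed only in sample B. The function name `expected_counts` is an assumption of this sketch, not part of any package.

```python
import numpy as np

def expected_counts(eta, lengths, n_reads):
    """E(Y_gk) = eta_gk * L_g / S_k * N_k, with S_k = sum_g eta_gk * L_g."""
    s_k = np.sum(eta * lengths)          # total RNA output of the sample
    return eta * lengths / s_k * n_reads

# Hypothetical toy data: equal gene lengths, equal library sizes.
lengths = np.full(4, 1000.0)
eta_a = np.array([10.0, 10.0, 10.0, 0.0])
eta_b = np.array([10.0, 10.0, 10.0, 30.0])   # gene 4 expressed only in B
n_reads = 1_000_000

y_a = expected_counts(eta_a, lengths, n_reads)
y_b = expected_counts(eta_b, lengths, n_reads)

# Library-size scaled expression of the three unchanged genes, B relative to A.
ratio = (y_b[:3] / n_reads) / (y_a[:3] / n_reads)
print(ratio)
```

Although the first three genes have the same number of transcripts in both samples, their library-size scaled counts differ by the factor S_A/S_B, because the extra gene in B takes up part of the sequencing output.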

SLIDE 5

kidney vs. liver dataset

SLIDE 6

Trimmed mean of log-foldchange

  • RNA production $S_k$ of one sample cannot be determined directly
  • estimation of relative differences of RNA production $f_k = S_k/S_r$ of a pair of samples $(k, r)$
  • assumption: most genes are not differentially expressed ⇒ compute robust mean over log-foldchanges:
  • double filtering over both mean and difference of log-values
  • calculate a weighted mean over the log-foldchanges
  • ⇒ rescale factors $f_k = \mathrm{TMM}(k,r)$, where $r$ is the reference sample

$$\log_2 \mathrm{TMM}(k,r) = \frac{\sum_{g \in G^*} w_{g,(k,r)} \left( \log_2 (Y_{g,k}/N_k) - \log_2 (Y_{g,r}/N_r) \right)}{\sum_{g \in G^*} w_{g,(k,r)}}$$

$$w_{g,(k,r)} = \left( \frac{1}{Y_{g,k}} - \frac{1}{N_k} + \frac{1}{Y_{g,r}} - \frac{1}{N_r} \right)^{-1}$$
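A minimal sketch of the trimmed weighted mean defined above, written in plain NumPy rather than taken from the EdgeR package. The trimming fractions `trim_m` and `trim_a` are assumptions of this sketch (the slide does not state them), as are the function name and the toy counts.

```python
import numpy as np

def tmm_factor(y_k, y_r, trim_m=0.3, trim_a=0.05):
    """Trimmed weighted mean of log-foldchanges of sample k against reference r."""
    y_k = np.asarray(y_k, dtype=float)
    y_r = np.asarray(y_r, dtype=float)
    n_k, n_r = y_k.sum(), y_r.sum()      # library sizes over all genes

    # Log-values are undefined for zero counts, so those genes are dropped.
    keep = (y_k > 0) & (y_r > 0)
    y_k, y_r = y_k[keep], y_r[keep]

    m = np.log2(y_k / n_k) - np.log2(y_r / n_r)          # log-foldchange
    a = 0.5 * (np.log2(y_k / n_k) + np.log2(y_r / n_r))  # mean of log-values
    w = 1.0 / (1.0 / y_k - 1.0 / n_k + 1.0 / y_r - 1.0 / n_r)

    def inside(values, trim):
        lo, hi = np.quantile(values, [trim, 1.0 - trim])
        return (values >= lo) & (values <= hi)

    # Double filtering: a gene must survive trimming on both m and a.
    sel = inside(m, trim_m) & inside(a, trim_a)
    return 2.0 ** (np.sum(w[sel] * m[sel]) / np.sum(w[sel]))

# Hypothetical toy data: 100 shared genes, plus 5 genes seen only in sample k
# that take up half of its reads.
shared = np.arange(1.0, 101.0) * 100
y_r = np.concatenate([shared, np.zeros(5)])
y_k = np.concatenate([shared / 2, np.full(5, shared.sum() / 10)])
factor = tmm_factor(y_k, y_r)
print(factor)
```

In this toy case both libraries have the same size, yet every shared gene has half the read share in sample k, so the factor comes out at about 0.5.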

SLIDE 7

kidney vs. liver dataset

SLIDE 8

Simulation: pair of samples

simulated data sampled from a Poisson distribution

SLIDE 9

Simulation: replicates

Cloonan: log-transformation and quantile normalization

SLIDE 10

Differential expression

methods in use:
  • DegSeq (normal distr.)
  • EdgeR (negative binomial)
  • DEseq (negative binomial, multiple groups)
  • baySeq (negative binomial, multiple groups)
  • Myrna (permutation based)

SLIDE 11

EdgeR

  • technical replicates: Poisson distr.
  • biologically different samples: negative binomial distr.
  • $Y \sim NB(p, r)$
  • $Y$ ... number of successes in a sequence of Bernoulli trials with success probability $p$ before $r$ failures occur
  • alternative parametrization:
  • $q_{g,e}$ ... proportion of sequenced RNA of gene $g$ for experimental group $e$

$$Y_{g,k,e} \sim NB(q_{g,e} N_k f_k, \phi_g)$$

$$E(Y_{g,k,e}) = \mu_{g,k,e} = q_{g,e} N_k f_k, \qquad Var(Y_{g,k,e}) = \mu_{g,k,e} + \mu_{g,k,e}^2 \phi_g$$

  • test if $q_{g,1}$ is significantly different from $q_{g,2}$
  • dispersions $\phi_g$ are moderated towards a common dispersion
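The mean and dispersion parametrization can be checked numerically. The sketch below maps (μ, φ) onto the (n, p) arguments of NumPy's `negative_binomial` sampler, which counts failures before n successes, and compares the moments with μ and μ + μ²φ. The helper names and the values of q, N_k, f_k and φ are hypothetical.

```python
import numpy as np

def nb_size_prob(mu, phi):
    """Map mean mu and dispersion phi to NumPy's (n, p) parametrization."""
    size = 1.0 / phi
    prob = size / (size + mu)
    return size, prob

def nb_moments(mu, phi):
    """Mean and variance implied by the (n, p) parametrization."""
    size, prob = nb_size_prob(mu, phi)
    mean = size * (1.0 - prob) / prob
    var = size * (1.0 - prob) / prob ** 2
    return mean, var

# Hypothetical values for one gene in one library.
q, n_k, f_k, phi = 1e-4, 1_000_000, 1.0, 0.1
mu = q * n_k * f_k                       # expected count
mean, var = nb_moments(mu, phi)

rng = np.random.default_rng(0)
size, prob = nb_size_prob(mu, phi)
draws = rng.negative_binomial(size, prob, size=200_000)
print(mean, var, draws.mean(), draws.var())
```

With φ close to 0 the variance approaches the mean, which corresponds to the Poisson case for technical replicates.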

SLIDE 12

baySeq I

  • empirical Bayes approach to detect differential expression
  • $D_g = \{Y_{g,k}, N_k, f_k\}_{k=1,\dots,K}$
  • $M$ ... user specified model
  • $\theta_M$ ... vector of parameters of model $M$

$$P(M \mid D_g) = \frac{P(D_g \mid M)\, P(M)}{P(D_g)}$$

  • calculate marginal likelihood:

$$P(D_g \mid M) = \int P(D_g \mid \theta_M, M)\, P(\theta_M \mid M)\, d\theta_M$$
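Once the marginal likelihoods are available, the posterior model probabilities follow from Bayes' rule. The sketch below assumes that the candidate models form an exhaustive set, so that P(D_g) is the sum of P(D_g|M) P(M) over these models; it works on log marginal likelihoods for numerical stability. The function name and the input values are hypothetical.

```python
import numpy as np

def model_posteriors(log_marginals, priors):
    """P(M | D_g) for each candidate model, from log P(D_g | M) and P(M)."""
    log_post = np.asarray(log_marginals, dtype=float) + np.log(priors)
    log_post -= np.max(log_post)         # avoid underflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()             # division by P(D_g)

# Hypothetical example: two models (no differential expression vs. differential
# expression) with equal prior weight.
log_marginals = np.log([0.2, 0.1])
priors = np.array([0.5, 0.5])
posteriors = model_posteriors(log_marginals, priors)
print(posteriors)
```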

SLIDE 13

baySeq II

$$P(D_g \mid M) = \int P(D_g \mid \theta_M, M)\, P(\theta_M \mid M)\, d\theta_M$$

  • e.g. Poisson-Gamma conjugacy, however no such conjugacy with negative binomial data
  • ⇒ define an empirical distribution on $\theta_M$ and estimate the marginal likelihood numerically
  • prior $P(M)$ is estimated by iteration: $P(M) = p_g$, $p_g^{*} = P(M \mid D_g)$

baySeq:
  • applicable to complex experimental designs
  • computationally intensive
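The idea of estimating the marginal likelihood numerically can be illustrated on the Poisson-Gamma case, where the closed form is known and serves as a check. This is a toy sketch, not baySeq's procedure: the samples of θ are drawn from a Gamma distribution as a stand-in for an empirical distribution, and the marginal likelihood is approximated by averaging the likelihood over these samples. All names and values are hypothetical.

```python
import math
import numpy as np

def poisson_logpmf(y, lam):
    """log P(y | lam) for a Poisson count y."""
    return y * np.log(lam) - lam - math.lgamma(y + 1)

def marginal_closed_form(y, shape, rate):
    """Poisson likelihood integrated over a Gamma(shape, rate) prior."""
    log_p = (math.lgamma(y + shape) - math.lgamma(shape) - math.lgamma(y + 1)
             + shape * math.log(rate / (rate + 1.0))
             - y * math.log(rate + 1.0))
    return math.exp(log_p)

def marginal_numeric(y, theta_samples):
    """Average of the likelihood over samples from the distribution on theta."""
    return float(np.mean(np.exp(poisson_logpmf(y, theta_samples))))

rng = np.random.default_rng(1)
theta = rng.gamma(shape=2.0, scale=1.0, size=200_000)   # scale = 1 / rate

exact = marginal_closed_form(3, 2.0, 1.0)
approx = marginal_numeric(3, theta)
print(exact, approx)
```

For negative binomial data no closed form is available, so only the numerical route remains, which is one reason for the computational cost noted above.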
