Expression Quantification (I), Mario Fasold, LIFE, IZBI


SLIDE 1

Medizinische Fakultät

Leipziger Forschungszentrum für Zivilisationserkrankungen

This publication is supported by LIFE – Leipzig Research Center for Civilization Diseases, Universität Leipzig. This project was funded by means of the European Social Fund and the Free State of Saxony.

Expression Quantification (I)

Mario Fasold, LIFE, IZBI

SLIDE 2

Sequencing Technology

One Illumina HiSeq 2000 run produces, two times over (paired-end):

  • ca. 1.2 billion reads
  • ca. 120 GB of FASTQ data

SLIDE 3

RNA-seq protocol

SLIDE 4

Task

  • Obtain some “value” (estimate) representing the true abundance of transcripts in vivo

SLIDE 5

Overview

  • 1. Basic Expression Measures
  • 2. Statistical Poisson Model
      • 1. Single Isoform
      • 2. Multiple Isoform
      • 3. Statistical Inference
      • 4. Results
  • 3. Paired-End Sequencing
  • 4. Negative-Binomial

SLIDE 6

Expression Quantification

  • What properties should the expression values have?
  • Values should be comparable
      • Between transcripts
      • Between samples (-> differential expression)

SLIDE 7

Problems in Expression Quantification

Differences in transcript length

SLIDE 8

Problems in Expression Quantification

Fluctuations in sequencing depth

SLIDE 9

Basic Measure: RPKM

RPKM/FPKM = Reads/Fragments per kilobase of exon model per million mapped reads

$$\mathrm{FPKM} = \frac{10^{9}\, x}{w\, l}$$

  • x = the number of reads mapped onto the gene's exons
  • w = the total number of reads in the experiment
  • l = the total length of the exons in base pairs
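The RPKM definition can be tried out with a small sketch. The function name `rpkm` and all read counts and lengths below are hypothetical, chosen only to show how the measure compensates for transcript length and sequencing depth.

```python
def rpkm(x: float, w: float, l: float) -> float:
    """Reads per kilobase of exon model per million mapped reads.

    x: reads mapped onto the gene's exons
    w: total number of reads in the experiment
    l: total exon length in base pairs
    """
    if w <= 0 or l <= 0:
        raise ValueError("w and l must be positive")
    return 1e9 * x / (w * l)


# Hypothetical numbers: gene A is twice as long as gene B and collects
# twice as many reads, so both get the same expression value.
gene_a = rpkm(x=2_000, w=40_000_000, l=2_000)
gene_b = rpkm(x=1_000, w=40_000_000, l=1_000)

# Doubling the sequencing depth doubles the raw count but not the RPKM.
gene_a_deeper = rpkm(x=4_000, w=80_000_000, l=2_000)
```

Raw counts alone would rank gene A above gene B and the deeper run above the shallower one; after normalisation all three values coincide.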

SLIDE 10

SLIDE 11

However, there are isoforms...

SLIDE 12

Typical number of isoforms

[Figure: plot of the number of isoforms; only the label “7%” survives]

SLIDE 13

SLIDE 14

Notation

[Figure: gene $g$ with $n_g = 2$ isoforms, $f_{g,1}$ and $f_{g,2}$]

SLIDE 15

[Figure labels: $l_f$, $k_f$]

SLIDE 16

Set of all isoforms in sample

Total length of isoforms

[Figure: pool of transcript copies labelled f1 and f2]

SLIDE 17

Expression value

Probability of a read coming from isoform f

[Figure: pool of transcript copies labelled f1 and f2]
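The formulas on the notation slides did not survive extraction. A reconstruction that is consistent with the binomial parameter $\theta_f l$ used on the Model (1) slide is sketched here, assuming that $l_f$ is the length of isoform $f$, that $k_f$ is its number of transcript copies in the sample, and that $F$ is the set of all isoforms. The symbols $L_{\mathrm{total}}$ and $p_f$ are introduced for illustration only.

```latex
\begin{align*}
L_{\mathrm{total}} &= \sum_{f \in F} k_f \, l_f
  && \text{(total length of all isoform copies)} \\
p_f &= \frac{k_f \, l_f}{L_{\mathrm{total}}}
  && \text{(probability that a read comes from isoform } f\text{)} \\
\theta_f &= \frac{k_f}{L_{\mathrm{total}}}
  && \text{(expression value of isoform } f\text{)}
\end{align*}
```

Under these assumptions $p_f = \theta_f \, l_f$, so a region of length $l$ inside isoform $f$ is hit with probability $\theta_f \, l$.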

SLIDE 18

Model Assumptions

  • Reads sampled independently (-> unique reads)
  • Read sampling probability uniform over all sequences (uniform coverage)
  • Each transcript processed independently and then sequenced

SLIDE 19

Model (1)

  • Let w be the total number of reads
  • Given an isoform f and a region of length l in f
  • Let X be a random variable representing the number of reads falling in that region
  • Then $X \sim \mathrm{Bin}(w, \theta_f l)$
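As a rough numerical illustration of this model: with many reads and a tiny per-read probability, the binomial distribution is almost indistinguishable from a Poisson distribution with rate $w \theta_f l$, which is what makes the Poisson model later in the deck convenient. The values of `w`, `theta_f` and `l` below are hypothetical.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability, computed in log space to stay stable for large n."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + k * math.log(p) + (n - k) * math.log1p(-p))


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


w = 1_000_000      # total number of reads (hypothetical)
theta_f = 2e-9     # expression value of isoform f (hypothetical)
l = 1_500          # region length in base pairs (hypothetical)

p = theta_f * l    # probability that a single read falls in the region
lam = w * p        # expected number of reads in the region

# Largest pointwise gap between the two distributions for small counts.
max_diff = max(abs(binomial_pmf(k, w, p) - poisson_pmf(k, lam)) for k in range(21))
```

With these numbers the expected count is about 3 reads and the two probability functions differ only far beyond the precision that matters for read counts.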

SLIDE 20

Coin toss gives binomial distribution

SLIDE 21

SLIDE 22

Poisson Model

  • Each gene g handled separately
  • Exons only shared as a whole: isoforms either share an exon, or they don’t
  • Let $X \sim \mathrm{Poisson}(\lambda)$ be a random variable representing the number of reads falling in some region of interest in g

SLIDE 23

Poisson Model

  • Each gene g handled separately
  • Exons only shared as a whole: isoforms either share an exon, or they don’t
  • Let $X \sim \mathrm{Poisson}(\lambda)$ be a random variable representing the number of reads falling in some region of interest in g
  • Then for some exon j: $\lambda =$ [formula not recoverable from the extracted slide]
  • $c_{ij}$ is 1 if isoform i contains exon j, 0 otherwise
  • $a_i$ is the “sampling rate”
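The rate formula on this slide is missing from the transcript. One form that is consistent with the binomial model (expected count $w \theta_f l$) and with the indicator $c_{ij}$ is $\lambda_j = l_j \, w \sum_i c_{ij} \theta_i$, where $l_j$ is the length of exon $j$. This is an assumption, not a quotation from the slide, and the function name `exon_rates` and every number below are hypothetical.

```python
def exon_rates(theta, c, exon_lengths, w):
    """Poisson rate per exon, ASSUMING lambda_j = l_j * w * sum_i c[i][j] * theta[i].

    theta:        expression value per isoform
    c:            c[i][j] is 1 if isoform i contains exon j, else 0
    exon_lengths: length l_j of each exon in base pairs
    w:            total number of reads
    """
    n_isoforms = len(theta)
    return [
        exon_lengths[j] * w * sum(c[i][j] * theta[i] for i in range(n_isoforms))
        for j in range(len(exon_lengths))
    ]


# Hypothetical gene: isoform 1 uses exons 1-3, isoform 2 skips exon 2.
c = [
    [1, 1, 1],
    [1, 0, 1],
]
exon_lengths = [100, 200, 100]
w = 1_000_000
theta = [0.5e-6, 1.5e-6]

rates = exon_rates(theta, c, exon_lengths, w)
```

Under this assumed form, the skipped exon receives reads only from isoform 1, while the shared exons receive reads from both isoforms.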

SLIDE 24

Estimation of expression

SLIDE 25

Single isoform case
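The derivation for this slide is not preserved in the transcript. A sketch under the model assumptions: the gene has one isoform with expression value $\theta$, and the read count $x_j$ of exon $j$ (length $l_j$) is Poisson with rate $l_j w \theta$, the Poisson counterpart of the binomial mean in Model (1). The indexing by exons $j$ is an assumption made for illustration.

```latex
\begin{align*}
\log L(\theta) &= \sum_j \bigl( x_j \log(l_j w \theta) - l_j w \theta - \log x_j! \bigr) \\
\frac{\partial \log L}{\partial \theta} &= \frac{\sum_j x_j}{\theta} - w \sum_j l_j = 0 \\
\hat{\theta} &= \frac{\sum_j x_j}{w \sum_j l_j}
\end{align*}
```

Multiplying $\hat{\theta}$ by $10^9$ gives reads per kilobase of exon model per million mapped reads, so in the single-isoform case this estimate coincides with RPKM.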

SLIDE 26

Multiple isoform case

  • No closed-form solution is available for the maximum-likelihood (ML) estimate if n > 1
  • The log-likelihood function is concave, so any local maximum is also a global one
  • Use numerical methods to find the maximum
  • Here: coordinate-wise hill climbing
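A minimal sketch of coordinate-wise hill climbing on a Poisson log-likelihood, not the implementation used in the work presented here. It assumes the rate of each region is linear in the expression values, $\lambda_s = \sum_i \theta_i A_{is}$, with a hypothetical sampling-rate matrix `A` and toy counts `x` in rescaled units.

```python
import math


def log_likelihood(theta, A, x):
    """Poisson log-likelihood, dropping the constant log(x!) terms.

    A[i][s] is the assumed sampling rate of isoform i in region s, so the
    rate of region s is lambda_s = sum_i theta[i] * A[i][s].
    """
    total = 0.0
    for s, count in enumerate(x):
        lam = sum(t * row[s] for t, row in zip(theta, A))
        if lam <= 0.0:
            if count > 0:
                return -math.inf  # reads observed where the model allows none
            continue
        total += count * math.log(lam) - lam
    return total


def coordinate_hill_climb(A, x, start=1.0, step=1.0, tol=1e-9):
    """Maximise the log-likelihood one coordinate at a time, keeping theta >= 0."""
    theta = [start] * len(A)
    best = log_likelihood(theta, A, x)
    while step > tol:
        improved = False
        for i in range(len(theta)):
            for delta in (step, -step):
                candidate = list(theta)
                candidate[i] = max(0.0, candidate[i] + delta)
                value = log_likelihood(candidate, A, x)
                if value > best:
                    theta, best, improved = candidate, value, True
                    break
        if not improved:
            step /= 2.0  # refine the search once no single move helps
    return theta


# Toy gene: isoform 1 covers regions 1-3, isoform 2 skips region 2.
A = [[100.0, 200.0, 100.0], [100.0, 0.0, 100.0]]
x = [200, 100, 200]  # counts generated by theta = (0.5, 1.5)
theta_hat = coordinate_hill_climb(A, x)
```

Because the log-likelihood is concave, the point where no single-coordinate move improves the value is the global maximum, which is why such a simple search is sufficient here.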

SLIDE 27

Fisher information matrix

  • We now have a point estimate for the expression index according to some Poisson model
  • How reliable is the estimate?
  • We need some confidence interval to test for differential expression (DE)
  • The distribution of the estimate can be approximated with a normal distribution whose mean is the true parameter and whose covariance is the inverse Fisher information matrix
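For a Poisson model whose rates are linear in the parameters, $\lambda_s = \sum_i \theta_i A_{is}$, the Fisher information has entries $I_{ik} = \sum_s A_{is} A_{ks} / \lambda_s$. The sketch below evaluates it for a hypothetical two-isoform gene and turns the inverse into approximate 95% confidence intervals. The matrix `A`, the estimate `theta_hat` and the helper names are assumptions made for illustration.

```python
import math


def fisher_information(theta, A):
    """Fisher information for Poisson counts with rates sum_i theta[i] * A[i][s]."""
    n, m = len(theta), len(A[0])
    lam = [sum(theta[i] * A[i][s] for i in range(n)) for s in range(m)]
    return [
        [sum(A[i][s] * A[k][s] / lam[s] for s in range(m)) for k in range(n)]
        for i in range(n)
    ]


def invert_2x2(matrix):
    """Inverse of a 2x2 matrix; enough for this two-isoform toy example."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[100.0, 200.0, 100.0], [100.0, 0.0, 100.0]]  # hypothetical sampling rates
theta_hat = [0.5, 1.5]                            # hypothetical ML estimate

info = fisher_information(theta_hat, A)
cov = invert_2x2(info)

# Normal approximation: estimate +/- 1.96 standard errors.
intervals = [
    (t - 1.96 * math.sqrt(cov[i][i]), t + 1.96 * math.sqrt(cov[i][i]))
    for i, t in enumerate(theta_hat)
]
```

In this toy case the second isoform gets the wider interval, because no region is covered by it alone.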

SLIDE 28

Results

  • Use the RNA-seq dataset from Mortazavi et al. (2008): three mouse tissue samples with 2 replicates
  • 60-80 M reads each
  • Mapped to mm9 with the SeqMap tool (from the same authors)
  • mRNA annotations from the mm9 RefSeq mRNA database

SLIDE 29

Differentially expressed isoforms

SLIDE 30

From isoforms to gene expression: which reads to count?

SLIDE 31

From isoforms to gene expression

  • The exon-intersection method is similar to microarrays, but it can reduce statistical power because not all reads are considered
  • The exon-union method underestimates expression for alternatively spliced genes
  • This paper used “sum of all reads” = exon union
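The difference between the two counting schemes can be seen on a toy gene. This sketch assumes gene expression is estimated as reads per base per sequenced read, the same unit as the isoform expression value. The exon layout and counts are hypothetical and describe a sample in which only the exon-skipping isoform is expressed, at a value of 1e-6.

```python
def read_density(reads, lengths, w, use_exon):
    """Reads per base per sequenced read over the selected exons."""
    total_reads = sum(x for x, keep in zip(reads, use_exon) if keep)
    total_length = sum(l for l, keep in zip(lengths, use_exon) if keep)
    return total_reads / (w * total_length)


# c[i][j] is 1 if isoform i contains exon j (hypothetical annotation).
c = [
    [1, 1, 1],   # isoform 1: exons 1, 2, 3
    [1, 0, 1],   # isoform 2: skips exon 2
]
lengths = [100, 200, 100]
w = 1_000_000
reads = [100, 0, 100]   # expected counts if only isoform 2 is expressed

union = [any(column) for column in zip(*c)]          # exons in any isoform
intersection = [all(column) for column in zip(*c)]   # exons in every isoform

union_estimate = read_density(reads, lengths, w, union)
intersection_estimate = read_density(reads, lengths, w, intersection)
```

The union estimate is diluted by the unexpressed exon 2 and comes out at half the true value, while the intersection estimate recovers it but would discard any reads that fall on exon 2.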

SLIDE 32

SLIDE 33

Statistical inference

  • Problem: the Fisher information matrix is degenerate, especially for isoforms with low expression
  • Need to regularize the covariance matrix
  • Use an importance sampling method to simulate from the posterior distribution

SLIDE 34

SLIDE 35

Summary: isoform expression

  • Isoform expression inference in RNA-seq is possible using a Poisson model
  • “Inferences agree with detailed inspection”
  • Exon junction reads narrow the confidence interval

SLIDE 36

Paired-end sequencing

[Figure labels: P5 grafting, forward read primer, DNA insert, index read primer, index, P7 grafting, reverse read primer]

SLIDE 37

Paired-end sequencing

[Figure labels: forward read, index read, reverse read]

SLIDE 38

Informative Reads

SLIDE 39

Incorporation in the Poisson Model

  • Model insert size using sampling rate $a_i$

SLIDE 40

Overdispersion and Negative Binomial

  • The Poisson distribution (one parameter) cannot account for (high) biological variability across samples
  • Most RNA-seq studies do not have enough replicates to estimate variability using a permutation-derived approach
  • Model the variance using e.g. a negative binomial distribution
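To make the overdispersion point concrete, the sketch below uses one common parameterisation of the negative binomial (mean $\mu$ and dispersion $\alpha$, with variance $\mu + \alpha \mu^2$) and checks its moments numerically. The parameterisation and the values of `mean` and `dispersion` are assumptions chosen for illustration, not taken from the slides.

```python
import math


def neg_binomial_pmf(k: int, mean: float, dispersion: float) -> float:
    """Negative binomial probability with the given mean and dispersion."""
    r = 1.0 / dispersion  # the "size" parameter
    log_p = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(r / (r + mean))
        + k * math.log(mean / (r + mean))
    )
    return math.exp(log_p)


def moments(pmf, upper: int):
    """Mean and variance of a count distribution, summed up to `upper`."""
    mean = sum(k * pmf(k) for k in range(upper))
    variance = sum((k - mean) ** 2 * pmf(k) for k in range(upper))
    return mean, variance


mean, dispersion = 100.0, 0.1
nb_mean, nb_var = moments(lambda k: neg_binomial_pmf(k, mean, dispersion), upper=3000)

# A Poisson distribution with the same mean has variance equal to the mean.
poisson_var = mean
```

With these numbers the negative binomial variance is about eleven times the Poisson variance at the same mean; the second parameter is what absorbs variability between biological replicates.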

SLIDE 41

SLIDE 42

Summary and Challenges

  • Straightforward statistical model for isoform expression available
  • However, reads are not truly random and uniform
      • High peaks and 3’ bias in read distributions
      • Positive correlations in read distributions between tissues
  • Isoform-level annotations not complete
  • Most studies comprise few biological replicates