Ranking candidate genes from perturbation experiments - PowerPoint PPT Presentation

SLIDE 1

Ranking candidate genes from perturbation experiments

Niko Beerenwinkel

SLIDE 2

Gene ranking

  • Goal: Identify (or prioritize) genes that affect the readout, i.e., genes that are involved in the biological process of interest

  • Issues
    • Noise (readout, siRNA specificity)
    • Design (siRNA library, replicates, validation screens)
    • Limited resources
  • Procedure
    • Normalization: quantiles, z-score, error models
    • Rank by normalized readout or p-value
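A minimal sketch of the normalize-then-rank procedure, assuming readouts are stored as one list per replicate (same gene order in every list) and that a lower readout means a stronger hit. The function names are hypothetical, ties are broken by gene index, and error models are not covered.

```python
from statistics import mean, stdev


def quantile_normalize(columns):
    """Quantile-normalize replicate columns (each a list of readouts for the same genes)."""
    n = len(columns[0])
    # Reference distribution: mean of the k-th smallest value across replicates.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for col in columns:
        # Gene indices ordered by readout; ties keep their original index order.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for k, i in enumerate(order):
            out[i] = reference[k]  # the gene with the k-th smallest value gets reference[k]
        normalized.append(out)
    return normalized


def z_scores(col):
    """Standard z-score of one replicate column."""
    m, s = mean(col), stdev(col)
    return [(x - m) / s for x in col]


def rank_genes(genes, columns):
    """Genes sorted by mean normalized readout, lowest first (assumed direction)."""
    avg = [mean(col[i] for col in columns) for i in range(len(genes))]
    return [genes[i] for i in sorted(range(len(genes)), key=lambda i: avg[i])]
```

After quantile normalization every replicate has the same set of values, so differences between replicates reflect only the order of genes, not their scale.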

SLIDE 3

RSA: Redundant siRNA activity analysis

  • Rank all siRNAs (wells) by readout
  • Assign a p-value to each gene based on the rank distribution of all siRNAs targeting it (hypergeometric model)

König et al., 2007
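A simplified sketch of the hypergeometric scoring step: for a gene whose n siRNAs sit at ranks r_1 <= ... <= r_n among N wells, compute for each i the probability that at least i of n randomly placed siRNAs fall within the top r_i wells, and keep the minimum. This is an illustration under stated assumptions; the published RSA procedure contains further details (for example activity thresholds) that are not reproduced here, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, total, n_top, n_draw):
    """P(X >= k) for X ~ Hypergeometric(population=total, successes=n_top, draws=n_draw)."""
    upper = min(n_top, n_draw)
    hits = sum(comb(n_top, x) * comb(total - n_top, n_draw - x) for x in range(k, upper + 1))
    return hits / comb(total, n_draw)


def rsa_score(ranks, total):
    """Simplified RSA-style p-value for one gene.

    ranks: 1-based ranks (1 = strongest readout) of the gene's siRNAs among all `total` wells.
    """
    r = sorted(ranks)
    n = len(r)
    # For the i-th best siRNA: how surprising is it that >= i of the n siRNAs rank within the top r_i?
    return min(hypergeom_sf(i, total, r[i - 1], n) for i in range(1, n + 1))
```

Genes are then ranked by this score, smallest first. A gene whose four siRNAs occupy ranks 1 to 4 out of 100 wells gets 1 / C(100, 4), while a gene whose siRNAs sit at the bottom of the list gets 1.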

SLIDE 4

Comparing gene rankings

  • Intersection metric
  • Spearman’s footrule
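Both comparisons can be sketched as follows. The intersection metric is written in one common form (the average, over depths d = 1..k, of the symmetric difference of the two top-d sets divided by 2d), and representing rankings as Python dicts and lists is an assumption.

```python
def footrule(rank_a, rank_b):
    """Spearman's footrule: sum of absolute position differences.

    rank_a, rank_b: dict gene -> position (1 = top), over the same genes.
    """
    return sum(abs(rank_a[g] - rank_b[g]) for g in rank_a)


def intersection_metric(list_a, list_b, k):
    """Average normalized symmetric difference of the top-d prefixes, d = 1..k.

    list_a, list_b: genes ordered from top to bottom.
    Returns 0 for identical top-k lists and 1 for disjoint ones.
    """
    total = 0.0
    for d in range(1, k + 1):
        top_a, top_b = set(list_a[:d]), set(list_b[:d])
        total += len(top_a ^ top_b) / (2 * d)
    return total / k
```

The footrule weighs every gene equally, whereas the intersection metric only looks at the top k and weighs disagreements near the top more heavily, which suits comparing hit lists.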
SLIDE 5

Stable variables

  • Let Λ be the set of all (reasonable) cut-offs for a given ranking (i.e., λ ∈ Λ is a regularization parameter).
  • The set of selected genes $\hat{S}^{\lambda} = \hat{S}^{\lambda}(I)$ is a function of the samples $I$.
  • For a given threshold π, the set of stable variables is

$$\hat{S}^{\text{stable}} = \left\{ k : \max_{\lambda \in \Lambda} P\left(k \in \hat{S}^{\lambda}\right) \ge \pi \right\}$$

  • P can be estimated by sub- or re-sampling.
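The subsampling estimate of P can be sketched as follows. This assumes each gene has several replicate readouts, that genes are ranked by mean readout with lower values first, and that each subsample uses half of the replicates; the data layout and all function names are hypothetical.

```python
import random
from statistics import mean


def top_k(scores, k):
    """Genes with the k lowest scores (assumed: lower readout = stronger hit)."""
    return set(sorted(scores, key=scores.get)[:k])


def selection_probabilities(data, cutoffs, n_subsamples=100, seed=0):
    """Estimate max over lambda of P(gene in top-lambda) by subsampling replicates.

    data: dict gene -> list of replicate readouts (hypothetical layout).
    cutoffs: the set Lambda of top-k cut-offs.
    """
    rng = random.Random(seed)
    n_rep = len(next(iter(data.values())))
    half = max(1, n_rep // 2)
    counts = {lam: dict.fromkeys(data, 0) for lam in cutoffs}
    for _ in range(n_subsamples):
        # Draw half of the replicates without replacement and re-rank the genes.
        idx = rng.sample(range(n_rep), half)
        scores = {g: mean(vals[i] for i in idx) for g, vals in data.items()}
        for lam in cutoffs:
            for g in top_k(scores, lam):
                counts[lam][g] += 1
    # Selection frequency per cut-off, then the maximum over Lambda.
    return {g: max(counts[lam][g] / n_subsamples for lam in cutoffs) for g in data}


def stable_set(data, cutoffs, pi, **kwargs):
    """Genes whose maximal selection probability reaches the threshold pi."""
    probs = selection_probabilities(data, cutoffs, **kwargs)
    return {g for g, p in probs.items() if p >= pi}
```

For the Salmonella screen, which has no replicates, the resampling unit would have to be something else (for example individual siRNAs); that choice is not covered by this sketch.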

SLIDE 6

Stability selection (Meinshausen & Bühlmann, 2010)

  • Under certain assumptions (and for π > 1/2), the expected number of falsely selected variables V is bounded by

$$E(V) \le \frac{1}{2\pi - 1} \, \frac{q_{\Lambda}^{2}}{p},$$

where p is the total number of genes and

$$q_{\Lambda} = E\left[\,\left| \bigcup_{\lambda \in \Lambda} \hat{S}^{\lambda}(I) \right|\,\right]$$

is the expected number of genes selected at some cut-off λ ∈ Λ.

  • In practice, we can set π and Λ to control false positives.
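The bound can be evaluated directly, or solved for the threshold π that guarantees a target E(V). A small sketch; the q values used below are hypothetical, and p = 12,000 is taken from the size of the Notch screen.

```python
def expected_false_positives_bound(q, p, pi):
    """Upper bound on E(V): q^2 / ((2*pi - 1) * p), valid for 0.5 < pi <= 1."""
    if not 0.5 < pi <= 1:
        raise ValueError("pi must lie in (0.5, 1]")
    return q * q / ((2 * pi - 1) * p)


def threshold_for_target(q, p, ev_max):
    """Smallest pi for which the bound is at most ev_max (a value above 1 means infeasible)."""
    return 0.5 * (1 + q * q / (p * ev_max))
```

With p = 12,000 and q = 200, π = 0.9 gives a bound of 200² / (0.8 · 12,000) ≈ 4.17 false positives. Requiring E(V) ≤ 1 with q = 200 would need π ≈ 2.17, which is impossible, so Λ must be shrunk; with q = 100 the same target needs π ≈ 0.92.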

SLIDE 7

Data sets

  • Hardt lab
    • Salmonella screen in human cells
    • 19,000 genes
    • Read-out: infection rate
    • ~4 different siRNAs per gene, no replicates
  • Merdes lab (Saj et al., Dev Cell, 2010)
    • Notch screen in Drosophila
    • 12,000 genes
    • Read-out: Notch activity
    • 4 replicates
    • Secondary and in vivo validation screens

SLIDE 8

Salmonella screen: ranking

  • Quantile normalization
  • Rankings (Kendall’s tau distance):
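Kendall's tau distance counts the gene pairs that two rankings order differently. A direct O(n²) sketch, assuming rankings are dicts of gene -> position over the same genes:

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b, normalize=False):
    """Number of discordant gene pairs between two rankings.

    rank_a, rank_b: dict gene -> position (1 = top), over the same genes.
    With normalize=True, divide by the number of pairs so the result lies in [0, 1].
    """
    genes = list(rank_a)
    discordant = sum(
        1
        for g, h in combinations(genes, 2)
        if (rank_a[g] - rank_a[h]) * (rank_b[g] - rank_b[h]) < 0
    )
    if normalize:
        n = len(genes)
        return discordant / (n * (n - 1) / 2)
    return discordant
```

For 19,000 genes this loops over about 1.8 × 10⁸ pairs, which is slow in pure Python. Without ties, the normalized distance d relates to the Kendall correlation by τ = 1 − 2d, so an O(n log n) correlation routine such as `scipy.stats.kendalltau` can be used instead.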

SLIDE 9

Salmonella screen: stability

SLIDE 10

Notch screen: raw data

SLIDE 11

Notch screen: quantile normalization

SLIDE 12

Notch screen: normalization

  • Raw data
  • Quantiles
  • Z-scores

[Figure: correlation under each normalization]

SLIDE 13

Notch screen: ranking

  • Quantile-normalized
  • Ranking distance (Kendall’s tau)

SLIDE 14

Notch screen: reproducibility

  • Leave-one-out: ranking based on three replicates, validated with the fourth replicate
  • Cut-off 300 for validation
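One reading of this leave-one-out scheme, sketched under explicit assumptions: for each held-out replicate, genes are scored by their mean readout over the remaining replicates, the 300 genes with the lowest readout in the held-out replicate are treated as positives, and agreement is summarized by the ROC AUC (computed via the Mann-Whitney statistic). The data layout, the ranking direction, and the function names are hypothetical.

```python
from statistics import mean


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; higher score = predicted hit, ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_out_auc(data, cutoff=300):
    """One AUC per held-out replicate.

    data: dict gene -> list of replicate readouts (hypothetical layout; lower = stronger hit).
    """
    genes = list(data)
    n_rep = len(data[genes[0]])
    aucs = []
    for r in range(n_rep):
        # "Truth" for this fold: the `cutoff` strongest genes in the held-out replicate.
        held_out = sorted(genes, key=lambda g: data[g][r])
        positives = set(held_out[:cutoff])
        # Negate the mean of the other replicates so that a low readout gives a high score.
        scores = [-mean(v for i, v in enumerate(data[g]) if i != r) for g in genes]
        labels = [g in positives for g in genes]
        aucs.append(auc(scores, labels))
    return aucs
```

Running this once per normalization (raw, quantile, z-score) and comparing the per-fold AUCs gives a simple reproducibility summary.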

SLIDE 15

Notch screen: average leave-one-out ROC curves for different normalizations
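Per-fold ROC curves can be averaged in several ways. The sketch below uses vertical averaging (mean TPR at fixed FPR values), which is an assumption rather than necessarily what was done for this slide. Scores are assumed to be "higher = more likely a hit", both classes must be present, and tied scores are broken by input order.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the score threshold one gene at a time."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def tpr_at(points, fpr):
    """Highest TPR reached at or below the given FPR (step interpolation)."""
    return max(t for f, t in points if f <= fpr)


def average_roc(curves, grid):
    """Vertical average: mean TPR of all curves at each FPR value in `grid`."""
    return [sum(tpr_at(c, x) for c in curves) / len(curves) for x in grid]
```

Evaluating every fold on a shared FPR grid (for example 0, 0.01, ..., 1) makes curves from different folds and normalizations directly comparable on one plot.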

SLIDE 16

Notch screen: ROC analysis of validation screen

  • Secondary screen of 900 genes
  • Focus on down-regulation

[Figure panels: all 12,000 genes; top 254 genes]

SLIDE 17

Notch screen: stability, in vivo validation

  • Median ranking of top 2000 genes

[Figure: stability over Λ; panel label: Median 20]

SLIDE 18

Conclusions

  • Both quantile and z-score normalization improve correlation and reproducibility.
  • Selecting stable genes complements the selection of high-scoring genes.
  • Stable sets quantify the reproducibility of being among the top k in the ranking.
  • Upper bound on the expected number of false positives
  • RSA produced fairly unstable sets.

SLIDE 19

Acknowledgements

  • Computational Biology Group, www.cbg.ethz.ch
    • Juliane Siebourg
    • Edgar Delgado-Eckert
  • Collaborators
    • Gunter Merdes (D-BSSE, ETH Zurich)
    • Wolf-Dietrich Hardt (D-BIOL, ETH Zurich)
    • InfectX consortium
  • Funding
    • InfectX, SystemsX.ch
