
Ranking candidate genes from perturbation experiments

Ranking candidate genes from perturbation experiments. Niko Beerenwinkel. Gene ranking. Goal: Identify (or prioritize) genes that affect the readout, i.e., are involved in the biological process of interest.


  1. Ranking candidate genes from perturbation experiments
     Niko Beerenwinkel

  2. Gene ranking
     • Goal: Identify (or prioritize) genes that affect the readout, i.e., are involved in the biological process of interest
     • Issues
       • Noise (readout, siRNA specificity)
       • Design (siRNA library, replicates, validation screens)
       • Limited resources
     • Procedure
       • Normalization: quantiles, z-score, error models
       • Rank by normalized readout or p-value
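
The two normalizations named in the procedure can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the function names are hypothetical, the z-score uses the plain mean and sample standard deviation (a robust median/MAD variant is a common alternative), and the quantile normalization assumes equally long replicates with ties broken by position.

```python
from statistics import mean, stdev


def z_scores(values):
    """Center by the mean and scale by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def quantile_normalize(replicates):
    """Give every replicate the same distribution (assumed sketch).

    replicates: list of equally long lists of readouts, one list per replicate.
    The shared reference distribution is the mean of the sorted replicates;
    each value is replaced by the reference value at its within-replicate rank.
    """
    n = len(replicates[0])
    sorted_reps = [sorted(rep) for rep in replicates]
    reference = [mean(rep[i] for rep in sorted_reps) for i in range(n)]
    normalized = []
    for rep in replicates:
        order = sorted(range(n), key=lambda i: rep[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

After either step, genes can be ranked by sorting on the normalized readout.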

  3. RSA: Redundant siRNA activity analysis
     • Rank all siRNAs (wells) by readout
     • Assign p-value to each gene based on the rank distribution of all siRNAs targeting it (hypergeometric model)
     König et al, 2007
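
A simplified sketch of the hypergeometric idea behind RSA, under stated assumptions: ranks are 1-based with rank 1 the strongest well, and the gene score is the smallest tail probability over its siRNAs. The function names are hypothetical and the published method may differ in detail; because of the minimum, the result is best read as a score rather than a calibrated p-value.

```python
from math import comb


def hypergeom_tail(k, total_wells, n_gene, r):
    """P(X >= k): at least k of the gene's n_gene siRNAs fall in the top r wells."""
    denom = comb(total_wells, r)
    numer = sum(
        comb(n_gene, j) * comb(total_wells - n_gene, r - j)
        for j in range(k, min(n_gene, r) + 1)
    )
    return numer / denom


def rsa_score(gene_ranks, total_wells):
    """Smallest hypergeometric tail probability over the gene's siRNA ranks.

    gene_ranks: 1-based ranks of all siRNAs targeting one gene.
    For the k-th best siRNA at rank r, ask how likely it is to see
    at least k of the gene's siRNAs among the top r wells by chance.
    """
    ranks = sorted(gene_ranks)
    n_gene = len(ranks)
    return min(
        hypergeom_tail(k, total_wells, n_gene, r)
        for k, r in enumerate(ranks, start=1)
    )
```

For example, with 10 wells a gene whose two siRNAs occupy ranks 1 and 2 scores 1/45, while a gene at ranks 9 and 10 scores 1.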

  4. Comparing gene rankings
     • Intersection metric
     • Spearman’s footrule
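
The slide names the two measures without defining them, so the following is a sketch under assumed definitions: Spearman's footrule as the sum of absolute position differences between two rankings of the same genes, and the intersection metric as the normalized symmetric difference of the top-d sets averaged over depths d = 1..k (one common definition for top-k lists). Function names are hypothetical.

```python
def spearman_footrule(rank_a, rank_b):
    """Sum of absolute position differences; both lists rank the same genes."""
    pos_b = {gene: i for i, gene in enumerate(rank_b)}
    return sum(abs(i - pos_b[gene]) for i, gene in enumerate(rank_a))


def intersection_distance(rank_a, rank_b, k):
    """Average over depths d = 1..k of |top_d(a) symmetric-difference top_d(b)| / (2d).

    0 means the two top-k lists agree at every depth; 1 means they never overlap.
    """
    total = 0.0
    for d in range(1, k + 1):
        top_a, top_b = set(rank_a[:d]), set(rank_b[:d])
        total += len(top_a ^ top_b) / (2 * d)
    return total / k
```

The footrule uses the whole ranking, whereas the intersection measure weights agreement near the top more heavily, which matters when only the highest-ranked genes are followed up.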

  5. Stable variables
     • Let Λ be the set of all (reasonable) cut-offs for a given ranking (i.e., λ ∈ Λ is a regularization parameter).
     • The set of selected genes $\hat{S}^\lambda = \hat{S}^\lambda(I)$ is a function of the samples $I$.
     • For a given threshold π, the set of stable variables is
       $$\hat{S}^{\text{stable}} = \left\{ k : \max_{\lambda \in \Lambda} P\left(k \in \hat{S}^\lambda\right) \ge \pi \right\}$$
     • P can be estimated by sub- or re-sampling.
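
The definition above can be sketched with subsampling as follows. Everything beyond the definition is an assumption made for illustration: the function name, the input layout (a dict from gene to per-sample readouts), scoring by the mean readout with higher meaning a stronger hit, and drawing random halves of the samples.

```python
import random
from statistics import mean


def stable_set(readouts, cutoffs, pi, n_subsamples=100, seed=0):
    """Genes whose selection frequency reaches pi for at least one cut-off.

    readouts: dict gene -> list of per-sample readouts (same length for all genes).
    cutoffs:  the set Lambda, here the numbers of top-ranked genes to select.
    pi:       selection-frequency threshold.
    """
    rng = random.Random(seed)
    n_samples = len(next(iter(readouts.values())))
    half = max(1, n_samples // 2)
    counts = {lam: {gene: 0 for gene in readouts} for lam in cutoffs}
    for _ in range(n_subsamples):
        subset = rng.sample(range(n_samples), half)  # random half of the samples
        score = {g: mean(v[i] for i in subset) for g, v in readouts.items()}
        ranking = sorted(readouts, key=lambda g: score[g], reverse=True)
        for lam in cutoffs:
            for gene in ranking[:lam]:  # genes selected at cut-off lam
                counts[lam][gene] += 1
    return {
        gene
        for gene in readouts
        if max(counts[lam][gene] / n_subsamples for lam in cutoffs) >= pi
    }
```

A gene that is near the top only for particular subsamples has a low selection frequency at every cut-off and is therefore excluded, even if its score on the full data is high.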

  6. Stability selection (Meinshausen & Bühlmann, 2010)
     • Under certain assumptions, the expectation of the number of falsely selected variables V is bounded by
       $$E(V) \le \frac{1}{2\pi - 1} \, \frac{q_\Lambda^2}{p},$$
       where p is the total number of genes, and
       $$q_\Lambda = E\left( \left| \bigcup_{\lambda \in \Lambda} \hat{S}^\lambda(I) \right| \right)$$
       is the expected number of genes selected for at least one cut-off λ ∈ Λ.
     • In practice, we can set π and Λ to control false positives.
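
As a worked illustration of the bound, the sketch below evaluates it for given q_Λ, p, and π, and inverts it to find the largest q_Λ compatible with a target number of false positives. The function names and the example value q_Λ = 300 are hypothetical; p = 12,000 matches the size of the Notch screen. The bound is only meaningful for π > 1/2.

```python
from math import sqrt


def false_positive_bound(q, p, pi):
    """Upper bound q^2 / ((2*pi - 1) * p) on the expected number of false selections."""
    if not 0.5 < pi <= 1.0:
        raise ValueError("the bound requires 0.5 < pi <= 1")
    return q ** 2 / ((2 * pi - 1) * p)


def max_selected_for_target(target, p, pi):
    """Largest q for which the bound stays at or below `target`."""
    if not 0.5 < pi <= 1.0:
        raise ValueError("the bound requires 0.5 < pi <= 1")
    return sqrt(target * (2 * pi - 1) * p)


# Hypothetical example: 12,000 genes, about 300 genes selected on average, pi = 0.9
bound = false_positive_bound(q=300, p=12000, pi=0.9)
```

With these numbers the bound is 300² / (0.8 × 12,000) ≈ 9.4 expected false positives; restricting Λ so that fewer genes are selected tightens it quadratically.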

  7. Data sets
     • Hardt lab
       • Salmonella screen in human cells
       • 19,000 genes
       • Read-out: infection rate
       • ~4 different siRNAs per gene, no replicates
     • Merdes lab (Saj et al., Dev Cell, 2010)
       • Notch screen in Drosophila
       • 12,000 genes
       • Read-out: Notch activity
       • 4 replicates
       • Secondary and in vivo validation screens

  8. Salmonella screen: ranking
     • Quantile normalization
     • Rankings (Kendall’s tau distance): [figure not preserved]
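
For reference, Kendall's tau distance between two rankings counts the gene pairs that the rankings order differently. The sketch below is a plain quadratic-time version for two rankings of the same genes without ties; the function name and the optional normalization to [0, 1] are assumptions for illustration.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b, normalize=True):
    """Number (or fraction) of gene pairs ordered differently by the two rankings."""
    pos_a = {gene: i for i, gene in enumerate(rank_a)}
    pos_b = {gene: i for i, gene in enumerate(rank_b)}
    discordant = sum(
        1
        for g, h in combinations(rank_a, 2)
        if (pos_a[g] - pos_a[h]) * (pos_b[g] - pos_b[h]) < 0
    )
    if not normalize:
        return discordant
    n = len(rank_a)
    return discordant / (n * (n - 1) / 2)
```

For screens with thousands of genes, a merge-sort based count would be preferable to this pairwise loop.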

  9. Salmonella screen: stability [figure not preserved; axis label: Λ]

  10. Notch screen: raw data

  11. Notch screen: quantile normalization

  12. Notch screen: normalization
     • Raw data
     • Quantiles
     • Z-scores
     [figure not preserved; axis label: correlation]

  13. Notch screen: ranking
     • Quantile-normalized
     • Ranking distance (Kendall’s tau)

  14. Notch screen: reproducibility
     • Leave-one-out: Ranking based on three replicates validated with fourth replicate
     • Cut-off 300 for validation
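
A minimal sketch of the leave-one-out scheme, assuming each replicate is a dict from gene to normalized readout, that higher readouts indicate stronger hits, and that the held-out replicate's top `cutoff` genes serve as the validated set. The function name, the data layout, and ranking by the mean of the remaining replicates are illustrative choices, not details from the talk.

```python
from statistics import mean


def leave_one_out(replicates, cutoff):
    """For each held-out replicate, return (ranking from the others, validated set).

    replicates: list of dicts gene -> normalized readout, one dict per replicate.
    cutoff:     number of top genes in the held-out replicate counted as validated.
    """
    genes = list(replicates[0])
    results = []
    for held_out, validation in enumerate(replicates):
        training = [rep for i, rep in enumerate(replicates) if i != held_out]
        mean_score = {g: mean(rep[g] for rep in training) for g in genes}
        ranking = sorted(genes, key=lambda g: mean_score[g], reverse=True)
        top_validation = sorted(genes, key=lambda g: validation[g], reverse=True)
        results.append((ranking, set(top_validation[:cutoff])))
    return results
```

With four replicates this yields four (ranking, validated set) pairs, each of which can be scored, for example by the overlap between the ranking's top genes and the validated set.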

  15. Notch screen: average leave-one-out ROC curves for different normalizations
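
An ROC curve for a gene ranking can be computed by walking down the ranking and tracking true and false positive rates against a set of validated genes. The sketch below assumes the validated set is a non-empty proper subset of the ranked genes; the function names are hypothetical, and averaging the resulting areas over leave-one-out folds is one simple way to summarize a normalization.

```python
def roc_points(ranking, positives):
    """(false positive rate, true positive rate) after each position in the ranking."""
    n_pos = len(positives)
    n_neg = len(ranking) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for gene in ranking:
        if gene in positives:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

An area of 1 means all validated genes are ranked ahead of all others, and 0.5 corresponds to a random ordering.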

  16. Notch screen: ROC analysis of validation screen
     • Secondary screen of 900 genes
     • Focus on down-regulation
     [figure not preserved; panels: All 12,000 genes, Top 254 genes]

  17. Notch screen: stability, in vivo validation
     • Median ranking of top 2000 genes
     [figure not preserved; labels: Λ, Median]

  18. Conclusions
     • Both quantile and z-score normalization improve correlation and reproducibility.
     • Selecting stable genes complements selection of high-scoring genes.
     • Stable sets quantify the reproducibility of being among the top k in a ranking.
     • Upper bound on the expected number of false positives
     • RSA produced fairly unstable sets.

  19. Acknowledgements
     • Computational Biology Group, www.cbg.ethz.ch
       • Juliane Siebourg
       • Edgar Delgado-Eckert
     • Collaborators
       • Gunter Merdes (D-BSSE, ETH Zurich)
       • Wolf-Dietrich Hardt (D-BIOL, ETH Zurich)
       • InfectX consortium
     • Funding
       • InfectX, SystemsX.ch
