FDR and Online FDR, by Adel Javanmard and Andrea Montanari (USC and Stanford)


  1. FDR and Online FDR Adel Javanmard and Andrea Montanari USC and Stanford December 11, 2015 Andrea Montanari (Stanford) FDR and Online FDR December 11, 2015 1 / 34

  2. Outline: (1) Large-scale Hypothesis Testing; (2) Controlling FDR; (3) Controlling Online FDR; (4) Conclusion

  3. Large-scale Hypothesis Testing

  4. Assume
     - I am the CTO of a big web company
     - $\approx 1000$ data scientists
     - $\approx 1000$ 'brilliant ideas' per day, for example:
       - Users are more likely to click on the first search result
       - Users are more likely to click on top-right ads
       - Users are more engaged with page layout A
     - How to avoid wasting company resources? Compute a 'significance level' from data!

  10. Example
      Idea: Users click more on the first search result than on the second.
      Null $H_0$: Users are equally likely to click on the first and the second result.
      Data:
      - $n$ events
      - $n_1$ clicks on the first result
      - $n_2 = n - n_1$ clicks on the second result
      Idea: $z \equiv \dfrac{n_1 - n_2}{\sqrt{n}}$, and $H_0 \Rightarrow z \approx N(0, 1)$.
      - If $z \gg 1$, then declare it significant.

  14. Formally
      $z \equiv \dfrac{n_1 - n_2}{\sqrt{n}} \approx N(0, 1)$
      p-value (with $G \sim N(0, 1)$):
      $p \equiv P(G \ge z) = \int_z^{\infty} \dfrac{e^{-x^2/2}}{\sqrt{2\pi}} \, dx$
      - Null: $p \sim \mathrm{Uniform}([0, 1])$ (Definition)
      - Small $p$: significant
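
The z-statistic and one-sided p-value from this slide can be sketched in Python as follows. This is a minimal illustration, not code from the talk: the function names and the click counts are assumptions, and the Gaussian tail is evaluated with the standard identity $P(G \ge z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$.

```python
import math


def z_statistic(n1: int, n2: int) -> float:
    """z = (n1 - n2) / sqrt(n), where n = n1 + n2 is the number of click events."""
    n = n1 + n2
    if n <= 0:
        raise ValueError("need at least one event")
    return (n1 - n2) / math.sqrt(n)


def one_sided_p_value(z: float) -> float:
    """P(G >= z) for G ~ N(0, 1), using the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical counts, chosen only for illustration.
z = z_statistic(n1=550, n2=450)
p = one_sided_p_value(z)
print(f"z = {z:.3f}, p = {p:.2e}")
```

A large positive `z` gives a small `p`, which is the "small p: significant" rule on the slide.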

  16. Company policy: Bring your idea up only if $p \le \alpha$ [$\alpha = 0.05$, Fisher's rule of thumb]

  18. Problem
      - $M \approx 1000$ hypotheses per day
      - $M\alpha \approx 1000 \cdot 0.05 = 50$ pass the test
      - Still too much waste
      New company policy (Bonferroni): Bring up your idea only if $p \le \alpha_M = \alpha/M$
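
The arithmetic on this slide can be checked with a short Python sketch. It assumes, for illustration only, that all $M$ ideas are useless, so that every p-value is uniform on $[0, 1]$; the helper names and the random seed are hypothetical.

```python
import random


def expected_passes(num_hypotheses: int, alpha: float) -> float:
    """Expected number of null p-values at or below alpha: M * alpha."""
    return num_hypotheses * alpha


def bonferroni_threshold(num_hypotheses: int, alpha: float) -> float:
    """Bonferroni per-test threshold alpha_M = alpha / M."""
    return alpha / num_hypotheses


def count_passes(p_values, threshold: float) -> int:
    """Number of p-values that satisfy p <= threshold."""
    return sum(1 for p in p_values if p <= threshold)


M, alpha = 1000, 0.05
rng = random.Random(0)  # fixed seed so the simulation is reproducible
null_p_values = [rng.random() for _ in range(M)]  # all nulls true (assumption)

print("expected passes at alpha:", expected_passes(M, alpha))
print("simulated passes at alpha:", count_passes(null_p_values, alpha))
print("Bonferroni threshold:", bonferroni_threshold(M, alpha))
print("simulated passes at alpha/M:",
      count_passes(null_p_values, bonferroni_threshold(M, alpha)))
```

The simulated count at level $\alpha$ should fluctuate around $M\alpha = 50$, while very few null p-values fall below $\alpha/M$.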

  21. Problem with Bonferroni
      Bring up your idea only if $p \le \alpha_M = \alpha/M$
      - More data scientists $\Rightarrow$ Less sensitive
      - $\alpha$ false positives per day $\Rightarrow$ Does not scale with $M$
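
One way to see the loss of sensitivity is to compute how large the z-statistic must be before its one-sided p-value drops below $\alpha/M$. The sketch below is an illustration under that reading of the slide; `required_z` is a hypothetical helper, and it relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def required_z(num_hypotheses: int, alpha: float = 0.05) -> float:
    """Smallest z whose one-sided p-value P(G >= z) is at most alpha / M."""
    return NormalDist().inv_cdf(1.0 - alpha / num_hypotheses)


# The bar rises as the number of hypotheses M grows.
for m in (1, 10, 100, 1000, 10000):
    print(f"M = {m:>5}: need z >= {required_z(m):.2f}")
```

The required z grows with $M$, so an effect that would pass with a handful of hypotheses can be missed when thousands are tested.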

  23. What do we want to achieve?

  24. FDR (Benjamini, Hochberg, 1995)
      - $M$ hypotheses
      - $D \equiv$ Total number of discoveries (positives)
      - $FD \equiv$ Number of false discoveries
      $\mathrm{FDR} = E\left\{ \dfrac{FD}{\max(D, 1)} \right\}$
      Interpretation: $\mathrm{FDR} \le 0.1 \Rightarrow$ On average, at most 10% of the discoveries are false.
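
The quantity inside the expectation, $FD / \max(D, 1)$, can be computed for a single experiment as in the sketch below; the FDR is the average of this ratio over repeated experiments. The function name and the toy labels are assumptions made for illustration.

```python
def false_discovery_proportion(is_null, rejected) -> float:
    """FD / max(D, 1) for one experiment.

    is_null[i]  -- True if hypothesis i is a true null (ground truth)
    rejected[i] -- True if the test declared hypothesis i a discovery
    """
    discoveries = sum(1 for r in rejected if r)
    false_discoveries = sum(1 for null, r in zip(is_null, rejected) if null and r)
    # max(D, 1) avoids dividing by zero when nothing is rejected.
    return false_discoveries / max(discoveries, 1)


# Toy example: 3 discoveries, 1 of which is a true null.
is_null = [True, True, False, False, False]
rejected = [True, False, True, True, False]
print(false_discovery_proportion(is_null, rejected))
```

With no discoveries at all the ratio is defined to be 0, which is the role of the $\max(D, 1)$ in the denominator.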

  26. Controlling FDR

  27. Setting
      Null hypotheses: $H_{0,1}, H_{0,2}, \dots, H_{0,M}$
      p-values: $p_1, p_2, \dots, p_M$
      Ground truth: $\theta_1, \theta_2, \dots, \theta_M$ [$H_{0,i}: \theta_i = 0$]
      Test output (with $p = (p_i)_{1 \le i \le M}$): $T_1(p), T_2(p), \dots, T_M(p) \in \{0, 1\}$
      $\theta_i = 0 \Rightarrow p_i \sim \mathrm{Uniform}([0, 1])$

  29. Benjamini-Hochberg procedure
      - Order the p-values: $p_{(1)} \le p_{(2)} \le \dots \le p_{(M)}$
      - Set threshold: $I = \max\left\{ i \in [M] : p_{(i)} \le \dfrac{i\alpha}{M} \right\}$
      - Reject at level $p_{(I)}$: $T_\ell(p) = 1$ if $p_\ell \le p_{(I)}$, and $T_\ell(p) = 0$ otherwise.
      Theorem (Benjamini, Hochberg, 1995): If the p-values are independent, and BH is used, then $\mathrm{FDR} \le \alpha$.
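
The three steps of the procedure can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name `benjamini_hochberg` and the example p-values are assumptions, and the case where no index satisfies the threshold condition (not spelled out on the slide) is handled by rejecting nothing.

```python
def benjamini_hochberg(p_values, alpha: float = 0.05):
    """Return a 0/1 decision T_l for each p-value, following the BH steps."""
    m = len(p_values)
    sorted_p = sorted(p_values)  # p_(1) <= p_(2) <= ... <= p_(M)

    # I = max{ i in [M] : p_(i) <= i * alpha / M }; keep the matching p_(I).
    cutoff = None
    for i, p in enumerate(sorted_p, start=1):
        if p <= i * alpha / m:
            cutoff = p

    if cutoff is None:  # no index qualifies: make no discoveries
        return [0] * m

    # Reject every hypothesis whose p-value is at most p_(I).
    return [1 if p <= cutoff else 0 for p in p_values]


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values, alpha=0.05))
```

In this example only the two smallest p-values meet their thresholds ($0.001 \le 0.005$ and $0.008 \le 0.01$), so $I = 2$ and exactly those two hypotheses are rejected.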
