
Post hoc bounds on false positives using reference families
Pierre Neuvial, CNRS and Institut de Mathématiques de Toulouse (France)
Joint work with Gilles Blanchard, Guillermo Durand, Etienne Roquain, Marie Perrot-Dockès
https://arxiv.org/abs/1910.11575
Funded by ANR SansSouci

Case study: differential expression in genomics
Example: Leukemia data set (Chiaretti et al., Clinical Cancer Research, 11(20):7209–7219, 2005)
Data: gene expression measurements (mRNA)
  $m = 12625$ genes
  $n = 79$ cancer patients in two subgroups: BCR/ABL (37 patients) and NEG (42 patients)
Question: find genes whose average expression differs between the two groups

[Figure: Leukemia data set, volcano plot]

Notation
  $H = \{1, \dots, m\}$: $m$ null hypotheses to be tested
  $H_0 \subset H$: true null hypotheses; $H_1 = H \setminus H_0$, $m_0 = |H_0|$, $\pi_0 = m_0 / m$
  $(p_i)_{1 \le i \le m}$: $p$-values
  $R \subset H$: a set of rejected hypotheses
  $|R \cap H_0|$: number of "false positives" within $R$
Goal: post hoc inference
Find a $(1 - \alpha)$-level post hoc upper bound $V_\alpha$ on $|S \cap H_0|$, i.e. such that
$$P\big(\forall S \subset \{1, \dots, m\},\ |S \cap H_0| \le V_\alpha(S)\big) \ge 1 - \alpha$$
Some related works
  Genovese & Wasserman, Ann. Stat., 2006
  Goeman & Solari, Stat. Sci., 2011
  Katsevich and Ramdas, arXiv:1803.06790
  Meijer, Krebs, and Goeman, SAGMB, 2015

Starting point: post hoc bound via Simes' inequality
Under PRDS, Simes' inequality implies
$$P\big(\forall k,\ |R_k \cap H_0| \le k - 1\big) \ge 1 - \alpha, \quad \text{where } R_k = \{i : p_i \le \alpha k / m\}$$
Corollary: $(1 - \alpha)$ post hoc bound on $|S \cap H_0|$
$$\bar{V}_\alpha(S) = \min_{1 \le k \le |S|} \Big\{ \sum_{i \in S} 1\{p_i > \alpha k / m\} + k - 1 \Big\}$$
Recovers the bound of Goeman and Solari, Stat. Science, 2011.
Proof: for every $k$,
$$|S \cap H_0| = |S \cap R_k^c \cap H_0| + |S \cap R_k \cap H_0| \le |S \cap R_k^c| + |R_k \cap H_0|$$
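The Simes-based bound on this slide can be evaluated directly from the p-values. The following is a minimal sketch in plain Python, not the sansSouci implementation; the function name `simes_posthoc_bound` and its arguments are hypothetical names chosen for illustration.

```python
def simes_posthoc_bound(p_values, selection, alpha=0.05):
    """Simes-based post hoc upper bound on the number of false positives in S.

    p_values  : all m p-values (m sets the Simes thresholds alpha * k / m)
    selection : indices of the selected hypotheses S
    """
    m = len(p_values)
    p_selected = [p_values[i] for i in selection]
    s = len(p_selected)
    if s == 0:
        return 0
    best = s
    for k in range(1, s + 1):
        threshold = alpha * k / m
        # |S ∩ R_k^c|: selected hypotheses whose p-value exceeds the k-th threshold
        n_above = sum(1 for p in p_selected if p > threshold)
        best = min(best, n_above + k - 1)
    return best
```

Because the guarantee holds simultaneously over all subsets, the same function can be called on several selections from the same data without any further correction.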

[Figure: Leukemia data set, volcano plot (Simes-based bound)]

Post hoc control via reference families

Joint Error Rate control implies post hoc bound
Definition: JER controlling family $\mathfrak{R} = (R_k, \zeta_k)_k$ such that
$$P\big(\forall k,\ |R_k \cap H_0| \le \zeta_k\big) \ge 1 - \alpha$$
Simes: $R_k = \{i : p_i \le \alpha k / m\}$, $\zeta_k = k - 1$
Property: interpolation yields valid $(1 - \alpha)$ post hoc bounds
$$V^*_\alpha(S) = \max\{|S \cap A| : A \text{ s.t. } \forall k,\ |R_k \cap A| \le \zeta_k\}$$
$$\bar{V}_\alpha(S) = \min_{1 \le k \le |S|} \big\{ |S \cap R_k^c| + \zeta_k \big\}$$
Simes:
$$V^*_\alpha(S) = \bar{V}_\alpha(S) = \min_{1 \le k \le |S|} \Big\{ \sum_{i \in S} 1\{p_i > \alpha k / m\} + k - 1 \Big\}$$
Main question: how to obtain JER control?
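For a generic reference family, the interpolation bound $\bar{V}_\alpha$ only needs the sets $R_k$ and the thresholds $\zeta_k$. The sketch below is a hedged illustration: the name `interpolation_bound` and the list-of-pairs representation are assumptions, and it takes the minimum over every supplied pair and caps the result at $|S|$, which is a simplification of the index range shown on the slide.

```python
def interpolation_bound(selection, family):
    """Interpolation bound: min over k of |S \\ R_k| + zeta_k, capped at |S|.

    selection : iterable of selected hypothesis indices (S)
    family    : list of (R_k, zeta_k) pairs, R_k being an iterable of indices
    """
    S = set(selection)
    best = len(S)  # trivial bound: at most |S| false positives in S
    for R_k, zeta_k in family:
        outside = len(S - set(R_k))  # |S ∩ R_k^c|
        best = min(best, outside + zeta_k)
    return best
```

Each term of the minimum is valid on the event where JER control holds, so the minimum is valid on that same event.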

Contributions: post hoc bounds in two dual cases
$p$-value level sets
  Fixed $\zeta_k$ $(= k - 1)$, $R_k = R_k(X)$
  JER control = joint control of the $k$-FWER
Structured hypotheses
  Fixed $R_k$, given by prior knowledge
  Find $\zeta_k = \zeta_k(X)$
  JER control = joint estimation of $|R_k \cap H_0|$

Case 1: Fixed $\zeta_k$, random $R_k$
Blanchard, N., Roquain: Post Hoc Confidence Bounds on False Positives Using Reference Families, Annals of Statistics, to appear.
R package sansSouci

Setup: $\zeta_k = k - 1$, $R_k = \{i : p_i \le t_k(\lambda)\}$
Properties
  The $R_k$ are nested $\Rightarrow V^*_\alpha(S) = \bar{V}_\alpha(S)$
  For the reference family $(R_k, \zeta_k)$: JER control holds for any $\lambda$ such that
  $$P\big(\exists k,\ p_{(k:H_0)} \le t_k(\lambda)\big) \le \alpha$$
Examples
  $t_k(\lambda) = \lambda k / m$ for $\lambda = \alpha$ under PRDS
  $t_k(\lambda) = \lambda$-quantile of $\mathrm{Beta}(k + 1, m - k + 1)$ for $\lambda = \alpha$ under independence
Adaptivity to dependence?

Adaptivity to dependence
Goal: estimate the largest $\lambda$ such that
$$P\big(\exists k,\ p_{(k:H_0)} \le t_k(\lambda)\big) \le \alpha$$
Tool: randomization, e.g. class label permutation in multiple two-sample tests
Example: $t_k(\lambda) = \lambda$-quantile of $\mathrm{Beta}(k + 1, m - k + 1)$
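One way to picture the randomization step is with the linear template $t_k(\lambda) = \lambda k / m$. The sketch below is an illustration under stated assumptions, not the sansSouci implementation: it assumes the p-values for each label permutation have already been computed (one row per permutation), it uses all $m$ hypotheses in place of the unknown $H_0$, and the name `calibrate_lambda_linear` is hypothetical. For each permutation it records the smallest $\lambda$ at which some ordered p-value falls below its threshold, then takes a low empirical quantile of these values.

```python
import math

def calibrate_lambda_linear(null_p_values, alpha=0.05):
    """Permutation-based choice of lambda for the template t_k(lambda) = lambda * k / m.

    null_p_values : B rows, each holding the m p-values from one permutation
    Returns the floor(alpha * B)-th smallest pivotal statistic.
    """
    pivots = []
    for row in null_p_values:
        m = len(row)
        ordered = sorted(row)
        # p_(k) <= lambda * k / m for some k  <=>  lambda >= min_k p_(k) * m / k
        pivots.append(min(p * m / k for k, p in enumerate(ordered, start=1)))
    pivots.sort()
    n_allowed = math.floor(alpha * len(pivots))
    if n_allowed < 1:
        raise ValueError("need at least 1 / alpha permutations")
    # with distinct pivots, exactly n_allowed permutations fall at or below this value
    return pivots[n_allowed - 1]
```

The returned value can then replace $\alpha$ in the thresholds $\lambda k / m$ when evaluating the post hoc bound on the observed p-values.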

[Figure: Leukemia data, confidence bounds on $|S \cap H_1|$]

[Figure: Leukemia data, confidence bounds on $\mathrm{FDP} = |S \cap H_0| / (|S| \vee 1)$]
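Both plotted quantities follow from a single upper bound $V$ on $|S \cap H_0|$: the false discovery proportion is at most $V / (|S| \vee 1)$ and the number of true positives $|S \cap H_1|$ is at least $|S| - V$. A small sketch of that conversion, with the hypothetical helper name `fdp_and_tp_bounds`:

```python
def fdp_and_tp_bounds(v_bound, selection_size):
    """Turn an upper bound on false positives in S into FDP and true-positive bounds."""
    v = min(v_bound, selection_size)          # never more false positives than |S|
    fdp_upper = v / max(selection_size, 1)    # FDP = |S ∩ H0| / (|S| ∨ 1)
    tp_lower = selection_size - v             # |S ∩ H1| = |S| - |S ∩ H0|
    return fdp_upper, tp_lower
```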

[Figure: Leukemia data set, volcano plot (Simes-based bound)]

[Figure: Leukemia data set, volcano plot (after $\lambda$-calibration)]

Case 2: Fixed $R_k$, random $\zeta_k$
Durand, Blanchard, N., Roquain: Post hoc false positive control for structured hypotheses, Scandinavian Journal of Statistics (2020). arxiv:1807.01470
R package sansSouci

Setup: Fixed $R_k$, random $\zeta_k$
Forest assumption: the $(R_k)_{k = 1 \dots K}$ are either nested or disjoint
Questions:
  1. How to choose $\zeta_k(X)$ yielding JER control?
  2. How to estimate the associated post hoc bound $V^*_\alpha$?
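The forest assumption is easy to check for a given family of regions: every pair must be nested or disjoint. A minimal sketch, where `is_forest` is a hypothetical helper name:

```python
from itertools import combinations

def is_forest(regions):
    """Return True if every pair of regions is either nested or disjoint."""
    sets = [set(r) for r in regions]
    for a, b in combinations(sets, 2):
        if not (a <= b or b <= a or a.isdisjoint(b)):
            return False
    return True
```

For example, two adjacent genomic windows and their union satisfy the assumption, while two overlapping windows do not.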

1. JER control
Device: DKWM inequality
  Dvoretzky, Kiefer, and Wolfowitz (1956) Ann. Math. Stat.
  Massart (1990) Ann. Prob.
Proposition: under independence, JER control is obtained for
$$\zeta_k(X) = |R_k| \wedge \min_{t \in [0, 1)} \left\lfloor \frac{C}{2(1 - t)} + \left( \frac{C^2}{4(1 - t)^2} + \frac{\sum_{i \in R_k} 1\{p_i(X) > t\}}{1 - t} \right)^{1/2} \right\rfloor^2,$$
where
$$C = \sqrt{\frac{1}{2} \log\left(\frac{K}{\alpha}\right)}$$
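The DKWM-based $\zeta_k$ can be evaluated without a grid over $t$: between two consecutive p-values the count of p-values above $t$ is constant while $1 / (1 - t)$ increases, so it is enough to try $t = 0$ and $t$ equal to each p-value below 1. The sketch below follows the formula of the proposition as it reads on this slide; the function name `dkwm_zeta` and its signature are assumptions made for illustration.

```python
import math

def dkwm_zeta(p_region, K, alpha=0.05):
    """DKWM-based bound zeta_k for one region R_k.

    p_region : p-values of the hypotheses in R_k
    K        : total number of regions in the reference family
    """
    C = math.sqrt(0.5 * math.log(K / alpha))
    best = len(p_region)  # the "|R_k| ∧ ..." part of the formula
    # candidate values of t: 0 and every p-value strictly below 1
    for t in [0.0] + [p for p in p_region if p < 1.0]:
        n_above = sum(1 for p in p_region if p > t)
        u = 1.0 - t
        root = C / (2 * u) + math.sqrt(C ** 2 / (4 * u ** 2) + n_above / u)
        best = min(best, math.floor(root) ** 2)
    return best
```

A region full of small p-values yields a small $\zeta_k$, whereas a region of large p-values falls back to the trivial value $|R_k|$.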

2. Algorithm to compute $V^*_\alpha$
Proposition: the bound $V^*_\alpha$ is obtained recursively by examining partitions at each possible depth in the forest.
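The recursive idea can be sketched as follows: inside a region, the number of false positives in $S$ is at most $\zeta_k \wedge |S \cap R_k|$, and also at most the sum of the bounds of its child regions plus the selected hypotheses that no child covers. Taking the better of the two at every node gives a valid bound. This is an illustrative sketch and not the authors' algorithm verbatim; the nested-dictionary representation (keys `region`, `zeta`, `children`) and the name `forest_bound` are assumptions.

```python
def forest_bound(selection, roots):
    """Post hoc bound on false positives in S for a forest-structured family.

    roots : list of nodes; each node is a dict with keys
            "region" (iterable of indices), "zeta" (int), "children" (list of nodes)
    """
    S = set(selection)

    def node_bound(node):
        in_node = S & set(node["region"])
        own = min(node["zeta"], len(in_node))
        children = node.get("children", [])
        if not children:
            return own
        covered = set()
        total = 0
        for child in children:
            covered |= set(child["region"])
            total += node_bound(child)
        # selected hypotheses of this region not covered by any child count fully
        total += len(in_node - covered)
        return min(own, total)

    covered = set()
    total = 0
    for root in roots:
        covered |= set(root["region"])
        total += node_bound(root)
    # selected hypotheses outside every region count fully
    return total + len(S - covered)
```

The recursion visits each node once, so the cost is driven by the set intersections at each node.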

[Figure: numerical experiments, Simes vs tree-based methods]

[Figure: Leukemia data set, regional association plot]
The selection can be done interactively: https://pneuvial.shinyapps.io/posthoc-bounds_ordered-hypotheses/

Conclusions
Versatile approach to post hoc inference: JER control $\Rightarrow$ post hoc bounds
JER control can be obtained from classical probabilistic inequalities
  Fixed $\zeta_k$, random $R_k$: Simes' inequality under PRDS
  Fixed $R_k$, random $\zeta_k$: DKWM inequality under independence
Adaptation to dependence: sharper JER control can be obtained by randomization
Extensions
  Applications to genomic data analysis, e.g. differential analysis along the genome
  Fixed $R_k$, random $\zeta_k$: extension to specific dependence settings
  See poster of Marie Perrot-Dockès: "Improving structured post hoc inference via a Hidden Markov Model"
