Sparse multiple testing: can one estimate the null distribution?
Etienne Roquain (1)
Joint work with A. Carpentier (2), S. Delattre (3), N. Verzelen (4)
(1) LPSM, Sorbonne Université, France
(2) Otto-von-Guericke-Universität Magdeburg, Germany
(3) LPSM, Université de Paris, France
(4) INRAE, Montpellier, France
MMMS2 Luminy, 02/06/2020
arXiv:1912.03109, "On using empirical null distributions in Benjamini-Hochberg procedure", to appear in AoS. "Estimating minimum effect with outlier selection"
ANR "Sanssouci", ANR "BASICS", GDR ISIS "TASTY"
Roquain, Etienne. Sparse multiple testing. 1 / 29

Outline
1. Introduction
2. Upper bound
3. Lower bound
4. Additional results
5. One-sided alternatives

Motivation 1: null distribution unknown
[Figure: M67 photograph, processed with the photutils package; panels: Original, Gaussian fitting, Gumbel fitting]
- Naive null distribution fitting
- Impact on the risk?
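The "naive null fitting" issue is easy to illustrate outside the talk's setting: fitting a Gaussian null by plain mean and standard deviation over all observations lets even a sparse signal inflate the estimated null scale, while a robust median/MAD fit is barely affected. A minimal sketch (the contamination level, shift size, and seed are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 100                 # n observations, k sparse signal points
y = rng.normal(0.0, 1.0, size=n)   # background: N(0, 1)
y[:k] += 6.0                       # a small fraction of points carry signal

# Naive null fit: sample mean/std over everything, contaminated by the signal.
naive_mean, naive_sd = y.mean(), y.std()

# Robust fit: median and MAD-based scale, largely insensitive to 1% outliers.
robust_mean = np.median(y)
robust_sd = 1.4826 * np.median(np.abs(y - robust_mean))

print(f"naive sd:  {naive_sd:.3f}")   # inflated above the true scale 1
print(f"robust sd: {robust_sd:.3f}")  # stays near 1
```

An overestimated null scale makes the tests conservative (lost power); an underestimated one inflates the false discovery proportion, which is exactly the "impact on the risk" question posed here.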

Motivation 2: null distribution wrong
[Figure 4 in Efron (2008)]
- Empirical null [Efron (2004, 2007, 2008, 2009)]
- Impact on the risk?

Existing work (selection)

Estimation of the null:
- Series of work [Efron (2004, 2007, 2008, 2009)]
- Minimax rate via Fourier analysis: [Jin and Cai (2007)]; [Cai and Jin (2010)]
- Two-group mixture model: [Efron et al. (2001)]; [Sun and Cai (2009)]; [Cai and Sun (2009)]; [Padilla and Bickel (2012)]; [Nguyen and Matias (2014)]; [Heller and Yekutieli (2014)]; [Zablocki et al. (2017)]; [Amar et al. (2017)]; [Cai et al. (2019)]; [Rebafka et al. (2019)]
- Estimation in factor models: [Efron (2007a)]; [Leek and Storey (2008)]; [Friguet et al. (2009)]; [Fan et al. (2012)]; [Fan and Han (2017)]

Impact on the risk:
- FDR control in the symmetric, centered, one-sided case: [Barber and Candès (2015)]; [Arias-Castro and Chen (2017)]

Lower bounds in multiple testing:
- [Arias-Castro and Chen (2017)]; [Rabinovich et al. (2017)]; [Castillo and R. (2020)]

Setting

Observations Y = (Y_i)_{1≤i≤n}, independent, Y_i ∼ P_i; parameter P = (P_i)_{1≤i≤n} ∈ 𝒫.

Gaussian null assumption: most of the P_i's equal N(θ, σ²), for some unknown θ, σ.

Example:
P = ( P_1, N(θ, σ²), P_3, N(θ, σ²), N(θ, σ²), N(θ, σ²), P_7, N(θ, σ²) )

- Ensures θ = θ(P) and σ = σ(P) are uniquely defined
- Test H_{0,i}: "P_i = N(θ(P), σ²(P))" against H_{1,i}: "P_i ≠ N(θ(P), σ²(P))"
  ("item i comes from the background" vs. "item i comes from the signal")
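The setting is straightforward to mirror in simulation. The sketch below instantiates the slide's example with invented values (θ = 0, σ = 1, and a shifted-mean alternative are all hypothetical choices; in the talk θ and σ are unknown):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
theta, sigma = 0.0, 1.0  # fixed here for simulation; unknown in the talk's setting

# Mirror the slide's example: coordinates 1, 3, 7 (1-indexed) are non-null,
# every other P_i equals N(theta, sigma^2).
signal = {0, 2, 6}  # 0-indexed positions of P_1, P_3, P_7
y = rng.normal(theta, sigma, size=n)
for i in signal:
    y[i] = rng.normal(5.0, sigma)  # hypothetical alternative with a shifted mean

H0 = sorted(set(range(n)) - signal)  # true null set: the "background" items
n0, n1 = len(H0), len(signal)
print(H0, n0, n1)
```

Since the nulls make up the bulk of the sample, (θ(P), σ(P)) is identified as the unique Gaussian shared by "most" coordinates, which is what makes the testing problem well posed.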

Criteria

- True null set: H_0(P) = {i : P satisfies H_{0,i}}, n_0(P) = |H_0(P)|
- False null set: H_1(P) = H_0(P)^c, n_1(P) = |H_1(P)|
- For a procedure R(Y) ⊂ {1, ..., n}:

  FDP(P, R(Y)) = |R(Y) ∩ H_0(P)| / (|R(Y)| ∨ 1)   'false discovery proportion'
  FDR(P, R) = E_P[ FDP(P, R(Y)) ]                  'false discovery rate'
  TDP(P, R(Y)) = |R(Y) ∩ H_1(P)| / (n_1(P) ∨ 1)   'true discovery proportion'
  TDR(P, R) = E_P[ TDP(P, R(Y)) ]                  'true discovery rate'

- Sparse multiple testing (enough background): n_1(P) ≤ k_n with k_n 'small'
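These criteria are simple to compute once a procedure R(Y) is fixed. A minimal sketch with the Benjamini-Hochberg step-up procedure under a known N(0,1) null (sample sizes, level, and signal strength are invented; the talk's question is precisely what happens when the null must be estimated instead of known):

```python
import math
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1):
    """Index set rejected by the BH step-up procedure at level alpha."""
    n = len(pvals)
    order = np.argsort(pvals)
    thresh = alpha * np.arange(1, n + 1) / n
    below = np.nonzero(pvals[order] <= thresh)[0]
    if below.size == 0:
        return set()
    k = below.max() + 1  # number of rejections (step-up)
    return set(order[:k].tolist())

rng = np.random.default_rng(2)
n, k_n = 5000, 50
H1 = set(range(k_n))               # false nulls (signal), so n_1(P) = k_n
y = rng.normal(0.0, 1.0, size=n)   # background: N(0, 1)
y[:k_n] += 5.0                     # hypothetical signal shift

# Two-sided p-values under the (here known) N(0,1) null.
p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in y])

R = benjamini_hochberg(p, alpha=0.1)
H0 = set(range(n)) - H1
FDP = len(R & H0) / max(len(R), 1)
TDP = len(R & H1) / max(k_n, 1)
print(FDP, TDP)
```

Averaging FDP and TDP over repeated draws estimates FDR and TDR; here n_1(P) = 50 out of n = 5000, i.e. the sparse regime n_1(P) ≤ k_n with k_n small relative to n.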
