Minimax testing of a composite null hypothesis defined via a quadratic functional
Arnak S. Dalalyan, ENSAE / CREST / GENES
Joint work with L. Comminges
Asymptotic Statistics and Related Topics, Tokyo, Japan

Motivation 1: Testing the relevance of a group of variables
• We observe a sampled signal $f:\mathbb{R}^d\to\mathbb{R}$, $t=(t_1,\dots,t_d)^\top\mapsto f(t)$, in a noisy environment.
• The dimension $d$ is large.
• Based on a training sample, some variable selection procedure suggests the irrelevance of the subset of variables $t_{J^c} := \{t_j : j\in J^c\}$.
• Based on a testing sample, we would like to check the irrelevance of $J^c$. This amounts to testing the hypothesis $\mathbf{E}[\operatorname{Var}(f(t)\mid t_J)] = 0$.
© Dalalyan, A.S., Sept. 2, 2013

Motivation 2: Testing the validity of a partial linear model
• We observe a sampled signal obeying the partial linear model $f(t) = g(t_J) + \beta^\top t_{J^c}$ in a noisy environment.
• $g$, $J$ and $\beta$ are unknown.
• The dimension $d$ is large, but the cardinality of $J$ is small.
• For a given set $J_0$, we would like to test the hypothesis $J = J_0$. This amounts to testing the hypothesis $\operatorname{Var}[\nabla_{J_0^c} f(t)] = 0$.

Motivation 3: Testing the equality of two norms
• Two noisy (sub)images $g_1$ and $g_2$ are observed.
• The goal is to check whether they coincide up to a rotation and an illumination change: $g_1(z) = g_2(Rz) + a$ for all $z\in D\subset\mathbb{R}^2$, for some orthogonal matrix $R$ and some $a\in\mathbb{R}$.
• This requires testing the hypothesis
$$H_0:\ \exists\,(R,a)\ \text{s.t.}\ g_1(z) = g_2(Rz) + a,\ \forall z\in D, \tag{1}$$
which is usually very time-consuming (it involves a nonlinear and nonconvex minimization step). A simpler strategy is to start by testing
$$H_0':\ \operatorname{Var}[g_1(Z)] = \operatorname{Var}[g_2(Z)],$$
and to reject $H_0$ if $H_0'$ is rejected. This is legitimate because $H_0$ implies $H_0'$ as soon as the distribution of $Z$ is invariant under rotations (e.g. $Z$ uniform on a disc $D$ centered at the origin): then $RZ$ has the same law as $Z$, and the constant $a$ does not affect the variance.
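A quick Monte Carlo sanity check of the reduction from $H_0$ to $H_0'$, assuming that $Z$ is uniform on the unit disc $D$ centered at the origin. The image `g2`, the rotation angle and the offset `a` are hypothetical choices made only for illustration; the sketch merely shows that $g_1(z) = g_2(Rz) + a$ leaves the variance unchanged, not how the test of $H_0'$ is carried out on noisy data.

```python
import numpy as np

rng = np.random.default_rng(0)


def g2(z):
    """Hypothetical smooth 'image' defined on the plane; z has shape (n, 2)."""
    return np.sin(3.0 * z[:, 0]) + z[:, 1] ** 2


def rotation(angle):
    """2x2 rotation matrix (orthogonal, determinant 1)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])


def uniform_disc(n, rng):
    """n points uniform on the unit disc: radius sqrt(U) gives uniform area density."""
    r = np.sqrt(rng.uniform(size=n))
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack([r * np.cos(angle), r * np.sin(angle)])


R = rotation(0.7)   # hypothetical rotation
a = 2.5             # hypothetical illumination offset
Z = uniform_disc(200_000, rng)

g2_vals = g2(Z)
g1_vals = g2(Z @ R.T) + a   # row-wise R z, so g1(z) = g2(Rz) + a: H_0 holds

# RZ has the same law as Z, so both variances agree up to Monte Carlo error.
print(g1_vals.var(), g2_vals.var())
```

Because the offset only shifts the mean and the rotation preserves the law of $Z$, any discrepancy between the two printed variances is pure sampling error.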

Unifying framework: Testing the nullspace of a quadratic functional in regression

Relation to previous work

| | Non-Gaussian | Sampled | Multivariate | Beyond $Q = I$ | Beyond $Q \succeq 0$ |
|---|---|---|---|---|---|
| Ingster & Stepanova 2011 | ✗ | ✗ | ✗ | ✗ | ✓ |
| Ingster & Sapatinas 2009 | ✗ | ✓ | ✓ | ✗ | ✗ |
| Ingster, Sapatinas & Suslina 2012 | ✗ | ✗ | ✗ | ✓ | ✗ |
| Laurent, Loubes & Marteau 2011 | ✗ | ✗ | ✗ | ✗ | ✓ |
| Comminges & D. 2012 | ✓ | ✓ | ✓ | ✓ | ✓ |

Remark: the approach adopted in the first three references is purely asymptotic, whereas Laurent et al. (2011) obtained nonasymptotic rates of separation.

Overview of our results: testing procedure
• We observe $\{(x_i,t_i)\}_{i=1,\dots,n}\subset\mathbb{R}\times[0,1]^d$ such that
$$x_i = f(t_i) + \xi_i,\qquad f(t) = \sum_{\ell\in L}\theta_\ell[f]\,\varphi_\ell(t),$$
where $\{\varphi_\ell\}_{\ell\in L}$ is an orthonormal basis of $L^2([0,1]^d)$, the $\xi_i$ are iid with $\mathbf{E}[\xi_1] = 0$, and $t_i \overset{iid}{\sim} \mathcal{U}([0,1]^d)$.
• We wish to test the hypothesis
$$H_0:\ Q[f] = \sum_{\ell\in L} q_\ell\,\theta_\ell[f]^2 = 0 \quad\text{against}\quad H_1:\ |Q[f]| > \rho^2.$$
• Each $\theta_\ell[f]^2$ is unbiasedly estimated by
$$\widehat{\theta^2_\ell} = \frac{1}{n(n-1)}\sum_{i\neq i'} x_i\, x_{i'}\,\varphi_\ell(t_i)\,\varphi_\ell(t_{i'}).$$
• Given a sequence of weights $w = \{w_\ell\}$, we estimate $Q[f]$ by $\widehat{Q}^w_n = \sum_{\ell\in L} w_\ell\, q_\ell\, \widehat{\theta^2_\ell}$.
• Test: we fix a threshold $u > 0$ and reject $H_0$ if $|\widehat{Q}^w_n| > u$.
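A minimal sketch of the statistic $\widehat{Q}^w_n$ and of the resulting test, assuming $d = 1$ and the cosine basis $\varphi_0 = 1$, $\varphi_\ell(t) = \sqrt{2}\cos(\pi\ell t)$, an orthonormal basis of $L^2([0,1])$ chosen here for convenience (the slides do not fix a basis). The function names, the Gaussian noise, the functional $Q[f] = \theta_1^2 - \theta_2^2$ and the threshold `u` are illustrative assumptions. The double sum over $i \ne i'$ is computed in $O(n)$ through $\sum_{i\ne i'} a_i a_{i'} = (\sum_i a_i)^2 - \sum_i a_i^2$.

```python
import numpy as np


def phi(ell, t):
    """Cosine basis of L^2([0, 1]) (assumed choice): phi_0 = 1, phi_l = sqrt(2) cos(pi l t)."""
    t = np.asarray(t, dtype=float)
    if ell == 0:
        return np.ones_like(t)
    return np.sqrt(2.0) * np.cos(np.pi * ell * t)


def theta2_hat(x, t, ell):
    """U-statistic estimate of theta_l[f]^2: sum over i != i' of a_i a_i' / (n(n-1))."""
    n = len(x)
    a = np.asarray(x, dtype=float) * phi(ell, t)
    # sum_{i != i'} a_i a_i' = (sum_i a_i)^2 - sum_i a_i^2, computed in O(n)
    return (a.sum() ** 2 - np.sum(a ** 2)) / (n * (n - 1))


def q_hat(x, t, q, w):
    """Weighted estimate sum_l w_l q_l theta2_hat_l; q and w are dicts keyed by l."""
    return sum(w[ell] * q[ell] * theta2_hat(x, t, ell) for ell in q)


def reject_h0(x, t, q, w, u):
    """Reject H_0 when |Q_hat| exceeds the (hypothetical) threshold u."""
    return abs(q_hat(x, t, q, w)) > u


# Illustration with hypothetical choices: Q[f] = theta_1^2 - theta_2^2 (indefinite).
rng = np.random.default_rng(0)
n = 2000
t = rng.uniform(size=n)
noise = rng.standard_normal(n)
q = {1: 1.0, 2: -1.0}
w = {1: 1.0, 2: 1.0}

x_null = noise                 # f = 0, so Q[f] = 0
x_alt = phi(1, t) + noise      # f = phi_1: theta_1 = 1, theta_2 = 0, so Q[f] = 1
print(q_hat(x_null, t, q, w), q_hat(x_alt, t, q, w))
```

With $n = 2000$ the first value should be close to $0$ and the second close to $Q[f] = 1$; how large `u` must be to control both errors is exactly what the minimax analysis quantifies.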

Overview of our results: basics on the minimax rates of separation
For any estimator $\widehat{Q}_n$, we can write $\widehat{Q}_n = Q[f] + \epsilon_n[f]$.
• Under $H_0$: $|\widehat{Q}_n| \le \sup_{f\in\mathcal{F}_0}|\epsilon_n[f]|$.
• Under $H_1$: $|\widehat{Q}_n| \ge \rho^2 - \sup_{f\in\mathcal{F}_1(\rho)}|\epsilon_n[f]|$.
• The testing statistic $\widehat{Q}_n$ leads to a consistent test if
$$\sup_{f\in\mathcal{F}_0}|\epsilon_n[f]| < \rho^2 - \sup_{f\in\mathcal{F}_1(\rho)}|\epsilon_n[f]| \quad \text{(with prob. } 1-\gamma\text{)},$$
since then any threshold $u$ strictly between the two sides separates $H_0$ from $H_1$.
• Let $\rho_n(\widehat{Q})$ be the smallest possible $\rho > 0$ satisfying
$$\sup_{f\in\mathcal{F}_0}|\epsilon_n[f]| + \sup_{f\in\mathcal{F}_1(\rho)}|\epsilon_n[f]| < \rho^2 \quad \text{(with prob. } 1-\gamma\text{)}.$$
• Minimax rate of separation: $\rho^*_n \asymp \inf_{\widehat{Q}_n}\rho_n(\widehat{Q})$.
This is where the difference with the minimax rate of estimation comes from: replacing $\sup_{f\in\mathcal{F}_1(\rho)}$ with $\sup_{\rho>0}\sup_{f\in\mathcal{F}_1(\rho)}$ leads to the minimax rate of estimation, which is sub-optimal for testing!

Overview of our results: minimax rates of separation
• Let us call the ratio $|q_\ell|/c_\ell$ the importance of the axis $\varphi_\ell$.
• Let $\mathcal{N}(T)$ be the set of indices with importance $\ge T > 0$.
• Let $M(T) = \sum_{\ell\in\mathcal{N}(T)} q_\ell^2$.
• In the general case, the minimax rate of separation is given by
$$(\rho^*_{n,\gamma})^2 = \inf_{T>0}\Big\{\frac{4\,(B_1 M(T) + B_2\, n)^{1/2}}{n\,\gamma^{1/2}} + 2\sqrt{2}\,T\Big\} \asymp \inf_{T>0}\Big\{\frac{M(T)^{1/2}}{n} + T\Big\} + n^{-1/2}.$$
• Interestingly, in the case of positive $Q \succeq 0$,
$$(\rho^*_{n,\gamma})^2 \asymp \inf_{T>0}\Big\{\frac{M(T)^{1/2}}{n} + T\Big\}.$$
• In both cases, the test defined using the statistic $\widehat{Q}^w_n$ with the weights $w_\ell = \mathbb{1}(|q_\ell|/c_\ell \ge T)$ achieves the optimal rate.
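To see the two orders of magnitude at work, the sketch below evaluates $\inf_{T>0}\{M(T)^{1/2}/n + T\}$ on a grid of $T$ for a hypothetical one-dimensional example with $q_\ell = 1$ and $c_\ell = \ell^{2\sigma}$, $\ell \ge 1$, so that $\mathcal{N}(T) = \{\ell \le T^{-1/(2\sigma)}\}$ and $M(T) = \lfloor T^{-1/(2\sigma)}\rfloor$, then fits the exponent of $n$ by least squares on a log-log scale. The example, the grid, the function names and the choice to drop all constants ($B_1$, $B_2$, $\gamma$) are assumptions; only the rates are meaningful.

```python
import numpy as np


def m_of_t(T, sigma):
    """M(T) for the hypothetical example q_l = 1, c_l = l**(2 sigma), l = 1, 2, ...

    Importance 1/c_l >= T  <=>  l <= T**(-1/(2 sigma)); since q_l**2 = 1, M(T) = #N(T).
    """
    return np.floor(T ** (-1.0 / (2.0 * sigma)))


def separation_rates(n, sigma):
    """Orders (constants dropped) of the squared separation rate, inf over a grid of T."""
    T = np.logspace(-12, 0, 20001)
    positive = np.min(np.sqrt(m_of_t(T, sigma)) / n + T)   # case Q >= 0
    general = positive + n ** -0.5                         # general (indefinite) case
    return positive, general


def fitted_exponents(sigma, ns):
    """Least-squares slopes of log(rate) against log(n), for both cases."""
    rates = np.array([separation_rates(n, sigma) for n in ns])
    log_n = np.log(ns)
    slope_pos = np.polyfit(log_n, np.log(rates[:, 0]), 1)[0]
    slope_gen = np.polyfit(log_n, np.log(rates[:, 1]), 1)[0]
    return slope_pos, slope_gen


ns = np.array([1e4, 1e5, 1e6, 1e7, 1e8])
print(fitted_exponents(1.0, ns))   # compare with -4 sigma / (4 sigma + 1) and -1/2
```

For $\sigma = 1$ the first slope should be close to $-4\sigma/(4\sigma+1) = -0.8$, while the second is driven by the $n^{-1/2}$ term and should be close to $-1/2$, since $4\sigma/(4\sigma+1) > 1/2$ here.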


Relation to norm estimation: phase transition / "elbow" effect
Let us assume the simple case $q_\ell^2 = 1$ and $c_\ell = \sum_{j=1}^d |\ell_j|^{2\sigma_j}$, $\ell\in\mathbb{Z}^d$. One can check that $M(T)\asymp T^{-d/(2\bar\sigma)}$, where $\bar\sigma^{-1} = \frac1d\sum_{j=1}^d\sigma_j^{-1}$.
In hypothesis testing:
• If $Q$ is positive, the minimax rate of separation is $(\rho^*_n)^2 \asymp n^{-4\bar\sigma/(4\bar\sigma+d)}$.
• If $Q$ is neither positive nor negative, the minimax rate of separation is $(\rho^*_n)^2 \asymp n^{-\left(\frac{4\bar\sigma}{4\bar\sigma+d}\wedge\frac12\right)}$.
In functional estimation:
• If $Q[f] = \|f\|_2$, the minimax rate of estimation is $r^*_n \asymp n^{-2\bar\sigma/(4\bar\sigma+d)}$ (Lepski et al. '99).
• If $Q[f] = \|f\|_2^2$, the minimax rate of estimation is $r^*_n \asymp n^{-\left(\frac{4\bar\sigma}{4\bar\sigma+d}\wedge\frac12\right)}$ (Donoho and Nussbaum '90).
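The claim $M(T) \asymp T^{-d/(2\bar\sigma)}$ can be checked numerically: with $q_\ell^2 = 1$, $M(T)$ is the number of lattice points in $\mathcal{N}(T) = \{\ell\in\mathbb{Z}^d : c_\ell \le 1/T\}$. The sketch below counts them for a hypothetical anisotropic case $d = 2$, $\sigma = (1, 2)$, where $\bar\sigma = 4/3$ and $d/(2\bar\sigma) = 3/4$, and also encodes the two separation exponents above, which coincide exactly at the elbow $\bar\sigma = d/4$. Function names and parameter values are illustrative assumptions.

```python
import numpy as np


def lattice_count(R, sigma1, sigma2):
    """#{l in Z^2 : |l1|**(2 sigma1) + |l2|**(2 sigma2) <= R}, i.e. #N(T) with R = 1/T.

    Floating-point rounding may miss points lying exactly on the boundary,
    which does not affect the growth rate being checked.
    """
    l2_max = int(np.floor(R ** (1.0 / (2.0 * sigma2))))
    l2 = np.arange(-l2_max, l2_max + 1)
    slack = np.maximum(R - np.abs(l2) ** (2.0 * sigma2), 0.0)
    l1_max = np.floor(slack ** (1.0 / (2.0 * sigma1)))   # admissible |l1| for each l2
    return int(np.sum(2 * l1_max + 1))


def separation_exponent(sigma_bar, d, positive):
    """Exponent e in (rho*_n)^2 ~ n^(-e): positive Q vs. indefinite Q."""
    e = 4.0 * sigma_bar / (4.0 * sigma_bar + d)
    return e if positive else min(e, 0.5)


sigma, d = (1.0, 2.0), 2
sigma_bar = d / sum(1.0 / s for s in sigma)     # harmonic-mean smoothness: 4/3
Rs = np.logspace(4, 8, 9)
counts = [lattice_count(R, *sigma) for R in Rs]
slope = np.polyfit(np.log(Rs), np.log(counts), 1)[0]
print(slope, d / (2.0 * sigma_bar))             # fitted vs. predicted exponent 3/4
```

Below the elbow ($\bar\sigma < d/4$) the two exponents are equal; above it, the indefinite case is capped at $1/2$.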

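Main result I: positive functionals
Theorem 1. Assume that $\mathbf{E}[\xi_1^4] < \infty$ and that, for every $T > 0$, the set $\mathcal{N}(T) = \{\ell : q_\ell \ge T c_\ell\}$ is finite. For $\gamma\in(0,1)$, let $T_{n,\gamma}$ be the value of $T$ such that
$$\Big(\frac{n(n-1)}{2}\sum_{\ell}(q_\ell - T c_\ell)_+^2\Big)^{1/2} = \sum_{\ell} c_\ell\,(q_\ell - T c_\ell)_+\,\big(2 z_{1-\gamma/2} + o(1)\big),$$
where $z_{1-\gamma/2}$ is the $(1-\gamma/2)$-quantile of the standard Gaussian distribution. Let us define
$$\rho^*_{n,\gamma} = \Bigg(\frac{\sum_{\ell\in L} q_\ell\,(q_\ell - T_{n,\gamma} c_\ell)_+}{\sum_{\ell\in L} c_\ell\,(q_\ell - T_{n,\gamma} c_\ell)_+}\Bigg)^{1/2}.$$
Under some additional conditions, the test $\phi^*_n$ based on the weights
$$w^*_{\ell,n} = \Big(1 - \frac{T_{n,\gamma}\,c_\ell}{q_\ell}\Big)_+$$
satisfies $\gamma_n\big(\mathcal{F}_0, \mathcal{F}_1(\rho^*_{n,\gamma}), \phi^*_n\big) \le \gamma + o(1)$ as $n\to\infty$.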
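A minimal numerical sketch of Theorem 1, assuming (hypothetically) $q_\ell = 1$ and $c_\ell = \ell^2$ for $\ell = 1,\dots,10^5$, and dropping the $o(1)$ term: $T_{n,\gamma}$ is found by bisection on $\log T$, then $\rho^*_{n,\gamma}$ and the weights $w^*_{\ell,n}$ follow from their definitions. The truncation of $L$, the bisection bracket and the function names are assumptions tuned to this example; `statistics.NormalDist` supplies $z_{1-\gamma/2}$.

```python
import numpy as np
from statistics import NormalDist


def sharp_threshold(n, gamma, q, c, bracket=(1e-9, 0.5), iters=100):
    """Solve the Theorem 1 equation for T (o(1) term dropped) by bisection on log T.

    (n(n-1)/2 * sum (q - T c)_+^2)^(1/2) = 2 z_{1-gamma/2} * sum c (q - T c)_+
    """
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)

    def gap(T):
        pos = np.maximum(q - T * c, 0.0)                     # (q_l - T c_l)_+
        lhs = np.sqrt(n * (n - 1) / 2.0 * np.sum(pos ** 2))
        rhs = 2.0 * z * np.sum(c * pos)
        return lhs - rhs

    lo, hi = np.log(bracket[0]), np.log(bracket[1])
    if not gap(np.exp(lo)) < 0.0 < gap(np.exp(hi)):
        raise ValueError("bracket does not enclose a sign change")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(np.exp(mid)) < 0.0:
            lo = mid
        else:
            hi = mid
    return float(np.exp(0.5 * (lo + hi)))


def rho_star_and_weights(T, q, c):
    """rho*_{n,gamma} and weights w*_l = (1 - T c_l / q_l)_+ (requires q_l > 0)."""
    pos = np.maximum(q - T * c, 0.0)
    rho = np.sqrt(np.sum(q * pos) / np.sum(c * pos))
    weights = np.maximum(1.0 - T * c / q, 0.0)
    return rho, weights


# Hypothetical example: q_l = 1, c_l = l^2, truncated to l <= 10^5.
ell = np.arange(1, 100_001, dtype=float)
q, c = np.ones_like(ell), ell ** 2
ns = np.array([1e4, 1e5, 1e6, 1e7])
rhos = []
for n in ns:
    T = sharp_threshold(n, 0.05, q, c)
    rho, w = rho_star_and_weights(T, q, c)
    rhos.append(rho)
slope = np.polyfit(np.log(ns), np.log(np.square(rhos)), 1)[0]
print(slope)   # compare with -4/5
```

In this example $(\rho^*_{n,\gamma})^2$ should decay roughly like $n^{-4/5}$, in line with the positive-functional rate $n^{-4\bar\sigma/(4\bar\sigma+d)}$ for $\bar\sigma = d = 1$.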

Testing partial derivatives
• Let $\alpha\in\mathbb{R}^d_+$ and $\sigma\in\mathbb{R}^d_+$ be two given vectors, and let
$$Q[f] = \Big\|\frac{\partial^{\alpha_1+\dots+\alpha_d} f}{\partial t_1^{\alpha_1}\cdots\partial t_d^{\alpha_d}}\Big\|_2^2,\qquad C[f] = \sum_{j=1}^d\Big\|\frac{\partial^{\sigma_j} f}{\partial t_j^{\sigma_j}}\Big\|_2^2.$$
• Let us define $\delta$ and $\bar\sigma$ by $\delta = \sum_{j=1}^d \alpha_j/\sigma_j$ and $\bar\sigma^{-1} = \frac1d\sum_{j=1}^d \sigma_j^{-1}$.
• If $\delta < 1$ and $\bar\sigma > d/4$, then the exact minimax rate $\rho^*_{n,\gamma}$ is given by $\rho^*_{n,\gamma} = C^*_\gamma\,\rho^*_n\,(1 + o(1))$,
• where the minimax rate $\rho^*_n$ is
$$\rho^*_n = n^{-\frac{2\bar\sigma(1-\delta)}{4\bar\sigma+d}},$$
and the exact separation constant $C^*_\gamma$ is an explicit expression in $z_{1-\gamma/2}$, $\bar\sigma$, $\delta$, $d$, $\kappa = \sum_{j=1}^d\kappa_j$ and
$$C(d,\sigma,\alpha) = \frac{\pi^{-d}\prod_{i=1}^d\Gamma(\kappa_i)}{(1-\delta)\,\Gamma(\kappa+2)\prod_{i=1}^d\sigma_i},$$
where the numbers $\kappa_j$ depend on $\alpha_j$, $\sigma_j$, $\bar\sigma$ and $\delta$.
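The exponent of the minimax rate $\rho^*_n$ and the two conditions $\delta < 1$, $\bar\sigma > d/4$ are simple functions of $\alpha$ and $\sigma$; the sketch below computes them exactly with rational arithmetic from Python's `fractions` module. The function name is hypothetical, and the sharp constant $C^*_\gamma$ is deliberately not computed.

```python
from fractions import Fraction


def derivative_test_rate(alpha, sigma):
    """Return (delta, sigma_bar, e) with rho*_n = n^(-e) for the derivative-testing slide.

    delta = sum_j alpha_j / sigma_j,  1/sigma_bar = (1/d) sum_j 1/sigma_j,
    e = 2 sigma_bar (1 - delta) / (4 sigma_bar + d); requires delta < 1, sigma_bar > d/4.
    """
    d = len(sigma)
    alpha = [Fraction(a) for a in alpha]
    sigma = [Fraction(s) for s in sigma]
    delta = sum(a / s for a, s in zip(alpha, sigma))
    sigma_bar = d / sum(1 / s for s in sigma)
    if not (delta < 1 and sigma_bar > Fraction(d, 4)):
        raise ValueError("the conditions delta < 1 and sigma_bar > d/4 fail")
    e = 2 * sigma_bar * (1 - delta) / (4 * sigma_bar + d)
    return delta, sigma_bar, e


# alpha = 0 recovers the positive-functional exponent 2 sigma_bar / (4 sigma_bar + d)
print(derivative_test_rate((0, 0), (1, 1)))
print(derivative_test_rate((1, 0), (2, 2)))
```

Setting $\alpha = 0$ gives $\delta = 0$ and recovers the rate $\rho^*_n = n^{-2\bar\sigma/(4\bar\sigma+d)}$ of the positive case, while larger derivative orders shrink the exponent by the factor $1-\delta$.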

Conclusion
• We established minimax rates of separation in the model of regression with random design for null hypotheses corresponding to the nullspace of a general quadratic functional.
• In the case of positive functionals, we also proved the sharp minimax optimality of the proposed procedure.
• When comparing two norms, the minimax rate of separation is
$$\rho^*_n = n^{-\left(\frac{2\bar\sigma}{4\bar\sigma+d}\wedge\frac14\right)}.$$
This rate shows that the watershed between the two regimes corresponds to the condition $\bar\sigma = d/4$. In other terms, we are in the regular regime when $\bar\sigma > d/4$. It is interesting to note, even if we are unable to establish a direct connection, that this is also the regime under which the Sobolev embedding $W^{\sigma}_2 \subset L_4([0,1]^d)$ holds true.
• Open questions: adaptation to the unknown smoothness, unknown noise level, the case of (sparse) Besov bodies, ...
