 
HAL Id: cea-02617133
https://hal-cea.archives-ouvertes.fr/cea-02617133
Submitted on 25 May 2020

Aggregated tests of independence based on HSIC measures
Anouar Meynaoui, Mélisande Albert, Beatrice Laurent, Amandine Marrel

To cite this version: Anouar Meynaoui, Mélisande Albert, Beatrice Laurent, Amandine Marrel. Aggregated tests of independence based on HSIC measures. EMS 2019 - European Meeting of Statisticians, Bernoulli Society, Jul 2019, Palermo, Italy. cea-02617133

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Aggregated tests of independence based on HSIC measures (part 2)

Anouar Meynaoui, Mélisande Albert, Béatrice Laurent, Amandine Marrel

INSA de Toulouse, Institut de Mathématiques de Toulouse, France
CEA, DEN, DER, France

European Meeting of Statisticians, 2019
Outline

Introduction
The aggregated testing procedure
Simulation results
Conclusion and Prospect
Introduction

We study the independence of two real random vectors $X = (X^{(1)}, \ldots, X^{(p)})$ and $Y = (Y^{(1)}, \ldots, Y^{(q)})$, with marginal densities $f_1$ and $f_2$ respectively and joint density $f$. We observe an i.i.d. sample $Z_n = (X_i, Y_i)_{1 \le i \le n}$ of $(X, Y)$.

We rely on HSIC-based independence tests with Gaussian kernels $k_\lambda$ and $l_\mu$, associated with $X$ and $Y$ respectively.

In the previous talk, we first proposed, for each pair of values $(\lambda, \mu)$, a theoretical HSIC test of independence of level $\alpha \in (0, 1)$, followed by a non-asymptotic permutation-based test of the same level $\alpha$.

The power of the permutation test is shown to be approximately the same as the theoretical power if enough permutations are used.
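As a rough illustration of the single permutation-based test recalled above, the following sketch computes an HSIC statistic with Gaussian kernels and a permutation p-value. Several choices here are assumptions, not taken from the slides: the kernel is left unnormalized, the statistic is the biased (V-statistic) estimate, which may differ from the estimator used in the previous talk, and all function names are hypothetical.

```python
import math
import random

def gaussian_gram(points, bandwidths):
    """Gram matrix of a product Gaussian kernel, one bandwidth per coordinate.
    Assumption: unnormalized kernel exp(-sum_i (a_i - b_i)^2 / (2 h_i^2))."""
    def k(a, b):
        return math.exp(-sum((ai - bi) ** 2 / (2.0 * h ** 2)
                             for ai, bi, h in zip(a, b, bandwidths)))
    return [[k(a, b) for b in points] for a in points]

def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - 11^T / n."""
    n = len(gram)
    row = [sum(r) / n for r in gram]
    total = sum(row) / n
    return [[gram[i][j] - row[i] - row[j] + total for j in range(n)]
            for i in range(n)]

def hsic(kc, l, perm=None):
    """Biased HSIC estimate (1/n^2) * sum_ij Kc[i][j] * L[perm[i]][perm[j]]."""
    n = len(kc)
    idx = perm if perm is not None else range(n)
    return sum(kc[i][j] * l[idx[i]][idx[j]]
               for i in range(n) for j in range(n)) / n ** 2

def permutation_pvalue(xs, ys, lam, mu, n_perm=200, seed=0):
    """Observed statistic and permutation p-value for bandwidths (lam, mu)."""
    rng = random.Random(seed)
    kc = center(gaussian_gram(xs, lam))
    l = gaussian_gram(ys, mu)
    observed = hsic(kc, l)
    n = len(xs)
    exceed = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)  # permuting the Y sample mimics independence
        if hsic(kc, l, perm) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)
```

Each observation is passed as a tuple of coordinates, so `xs` has entries of length $p$ and `lam` has length $p$ (similarly for `ys` and `mu`). Rejecting when the p-value is at most $\alpha$ gives a test of level $\alpha$ for one fixed pair $(\lambda, \mu)$.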
Introduction

When $f - f_1 \otimes f_2$ belongs to a Sobolev ball with regularity $\delta \in (0, 2]$, sharp upper bounds of the uniform separation rate with respect to the values of $\lambda$ and $\mu$ are provided.

The HSIC test with the optimal upper bound is shown to be minimax over Sobolev balls. This optimal test is not adaptive, since it depends on the regularity $\delta$.

In this talk, we provide an adaptive procedure for testing independence which does not depend on the regularity $\delta$.

This procedure is based on the aggregation of a collection of HSIC tests with a collection of different bandwidths $\lambda$ and $\mu$.

Numerical studies to assess the performance of the procedure and to compare methodological choices are then provided.
The aggregated testing procedure

A single HSIC-based test raises the question of how to choose the kernel bandwidths $\lambda$ and $\mu$.

Heuristic choices are adopted in practice, with no theoretical justification.

We propose here an aggregated testing procedure combining a collection of single tests based on different bandwidths.

We consider a finite or countable collection $\Lambda \times U$ of bandwidths in $(0, +\infty)^p \times (0, +\infty)^q$ and a collection of positive weights $\{\omega_{\lambda,\mu} \,/\, (\lambda, \mu) \in \Lambda \times U\}$ such that $\sum_{(\lambda,\mu) \in \Lambda \times U} e^{-\omega_{\lambda,\mu}} \le 1$.
The aggregated testing procedure

For a given $\alpha \in (0, 1)$, we define the aggregated test $\Delta_\alpha$, which rejects $(H_0)$ if there is at least one $(\lambda, \mu) \in \Lambda \times U$ such that
\[
\widehat{\mathrm{HSIC}}_{\lambda,\mu} > q^{\lambda,\mu}_{1 - u_\alpha e^{-\omega_{\lambda,\mu}}},
\]
where $u_\alpha$ is the least conservative value such that the test is of level $\alpha$, defined by
\[
u_\alpha = \sup\left\{ u > 0 \,;\; P_{f_1 \otimes f_2}\left( \sup_{(\lambda,\mu) \in \Lambda \times U} \left( \widehat{\mathrm{HSIC}}_{\lambda,\mu} - q^{\lambda,\mu}_{1 - u e^{-\omega_{\lambda,\mu}}} \right) > 0 \right) \le \alpha \right\}.
\]

The test function $\Delta_\alpha$ associated with this aggregated test takes values in $\{0, 1\}$ and is defined by
\[
\Delta_\alpha = 1 \iff \sup_{(\lambda,\mu) \in \Lambda \times U} \left( \widehat{\mathrm{HSIC}}_{\lambda,\mu} - q^{\lambda,\mu}_{1 - u_\alpha e^{-\omega_{\lambda,\mu}}} \right) > 0.
\]
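The slide defines $u_\alpha$ through a probability under the null hypothesis but does not say how it is computed. One possible numerical approximation, sketched here purely as an assumption, replaces that probability with an empirical frequency over simulated null statistics (for instance obtained by permutation) and searches for $u_\alpha$ by bisection. The function names, the use of two separate sets of null draws and the bisection itself are all hypothetical choices.

```python
import math

def empirical_quantile(sorted_values, prob):
    """Order statistic of rank ceil(prob * B), clipped to the range 1..B."""
    b = len(sorted_values)
    rank = min(b, max(1, math.ceil(prob * b)))
    return sorted_values[rank - 1]

def rejects(stats, sorted_null, weights, u):
    """True if some statistic exceeds its (1 - u * exp(-w)) null quantile."""
    return any(
        s > empirical_quantile(col, 1.0 - u * math.exp(-w))
        for s, col, w in zip(stats, sorted_null, weights)
    )

def calibrate_u(quantile_draws, level_draws, weights, alpha, n_steps=30):
    """Bisection for the largest u whose estimated level stays at most alpha.

    quantile_draws, level_draws: lists of rows, one row per null draw and
    one column per bandwidth pair (assumed to be simulated under H0).
    """
    sorted_null = [sorted(col) for col in zip(*quantile_draws)]

    def level(u):
        hits = sum(rejects(row, sorted_null, weights, u) for row in level_draws)
        return hits / len(level_draws)

    # Upper end keeps every u * exp(-w) at most 1.
    lo, hi = 0.0, min(math.exp(w) for w in weights)
    for _ in range(n_steps):
        mid = (lo + hi) / 2.0
        if level(mid) <= alpha:
            lo = mid  # still of (estimated) level alpha: try a larger u
        else:
            hi = mid
    return lo, sorted_null
```

With the returned value `u`, the aggregated decision on the observed statistics would be `rejects(observed_stats, sorted_null, weights, u)`. The bisection relies on the estimated level being non-decreasing in `u`, which holds because the quantiles decrease as `u` grows.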
Oracle type conditions for the second kind error

The aggregated testing procedure $\Delta_\alpha$ is of level $\alpha$.

The second kind error of the aggregated testing procedure $\Delta_\alpha$ satisfies the inequality
\[
P_f(\Delta_\alpha = 0) \le \inf_{(\lambda,\mu) \in \Lambda \times U} \left\{ P_f\left( \Delta^{\lambda,\mu}_{\alpha e^{-\omega_{\lambda,\mu}}} = 0 \right) \right\},
\]
where $\Delta^{\lambda,\mu}_{\alpha e^{-\omega_{\lambda,\mu}}}$ is the single test of level $\alpha e^{-\omega_{\lambda,\mu}}$ associated with the bandwidths $(\lambda, \mu)$.

The aggregated testing procedure has a second kind error at most equal to $\beta$ if there exists at least one $(\lambda, \mu) \in \Lambda \times U$ such that the test $\Delta^{\lambda,\mu}_{\alpha e^{-\omega_{\lambda,\mu}}}$ has a probability of second kind error at most equal to $\beta$.
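One way to see why this inequality holds is sketched below. It reads $q^{\lambda,\mu}_{1-a}$ as the $(1-a)$-quantile of $\widehat{\mathrm{HSIC}}_{\lambda,\mu}$ under $(H_0)$, as the notation suggests, and uses a standard union bound; the slides do not state that this is the argument used, so it should be taken as an assumption.

```latex
% Step 1: u = \alpha is admissible in the definition of u_\alpha (union bound under H_0).
\[
P_{f_1 \otimes f_2}\Big( \sup_{(\lambda,\mu)}
  \big( \widehat{\mathrm{HSIC}}_{\lambda,\mu}
        - q^{\lambda,\mu}_{1 - \alpha e^{-\omega_{\lambda,\mu}}} \big) > 0 \Big)
\le \sum_{(\lambda,\mu) \in \Lambda \times U} \alpha e^{-\omega_{\lambda,\mu}}
\le \alpha
\quad \Longrightarrow \quad u_\alpha \ge \alpha .
\]
% Step 2: quantiles are non-increasing in the level, so the aggregated thresholds are smaller.
\[
q^{\lambda,\mu}_{1 - u_\alpha e^{-\omega_{\lambda,\mu}}}
\le q^{\lambda,\mu}_{1 - \alpha e^{-\omega_{\lambda,\mu}}}
\quad \text{for every } (\lambda,\mu) \in \Lambda \times U .
\]
% Step 3: if the aggregated test accepts, every single test of level \alpha e^{-\omega} accepts.
\[
\{ \Delta_\alpha = 0 \} \subset
\big\{ \Delta^{\lambda,\mu}_{\alpha e^{-\omega_{\lambda,\mu}}} = 0 \big\}
\ \text{for every } (\lambda,\mu)
\quad \Longrightarrow \quad
P_f(\Delta_\alpha = 0) \le \inf_{(\lambda,\mu)}
P_f\big( \Delta^{\lambda,\mu}_{\alpha e^{-\omega_{\lambda,\mu}}} = 0 \big).
\]
```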
Oracle type conditions for the second kind error

Theorem

Let $\alpha, \beta \in (0, 1)$, $\{(k_\lambda, l_\mu) \,/\, (\lambda, \mu) \in \Lambda \times U\}$ a collection of Gaussian kernels and $\{\omega_{\lambda,\mu} \,/\, (\lambda, \mu) \in \Lambda \times U\}$ a collection of positive weights such that $\sum_{(\lambda,\mu) \in \Lambda \times U} e^{-\omega_{\lambda,\mu}} \le 1$. We assume that $f$, $f_1$ and $f_2$ are bounded. We also assume that all bandwidths $(\lambda, \mu)$ in $\Lambda \times U$ satisfy the following conditions:
\[
\max(\lambda_1 \ldots \lambda_p,\ \mu_1 \ldots \mu_q) < 1
\quad \text{and} \quad
n \sqrt{\lambda_1 \ldots \lambda_p\, \mu_1 \ldots \mu_q} > \log\left(\frac{1}{\alpha}\right) > 1.
\]
Then, the uniform separation rate $\rho\left(\Delta_\alpha, S^\delta_{p+q}(R), \beta\right)$, where $\delta \in (0, 2]$ and $R > 0$, can be upper bounded as follows:
\[
\left[ \rho\left(\Delta_\alpha, S^\delta_{p+q}(R), \beta\right) \right]^2
\le C(M_f, p, q, \beta, \delta)
\inf_{(\lambda,\mu) \in \Lambda \times U}
\left\{
\frac{1}{n \sqrt{\lambda_1 \ldots \lambda_p\, \mu_1 \ldots \mu_q}}
\left( \log\left(\frac{1}{\alpha}\right) + \omega_{\lambda,\mu} \right)
+ \sum_{i=1}^{p} \lambda_i^{2\delta}
+ \sum_{j=1}^{q} \mu_j^{2\delta}
\right\},
\]
where $M_f = \max\left( \|f\|_\infty, \|f_1\|_\infty, \|f_2\|_\infty \right)$ and $C(M_f, p, q, \beta, \delta)$ is a positive constant depending only on its arguments.

This theorem gives an oracle-type bound on the uniform separation rate. Indeed, without knowing the regularity of $f - f_1 \otimes f_2$, we prove that the uniform separation rate of $\Delta_\alpha$ is of the same order as the smallest uniform separation rate over $(\lambda, \mu) \in \Lambda \times U$, up to the weights $\omega_{\lambda,\mu}$.
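To make the trade-off inside the infimum concrete, the sketch below evaluates the bracketed quantity of the bound for each bandwidth pair of a finite collection and picks the minimizer. It assumes the bracket has the form $\frac{1}{n\sqrt{\lambda_1 \ldots \lambda_p \mu_1 \ldots \mu_q}}(\log(1/\alpha) + \omega_{\lambda,\mu}) + \sum_i \lambda_i^{2\delta} + \sum_j \mu_j^{2\delta}$ and leaves out the unknown constant $C$; the function names are hypothetical.

```python
import math

def oracle_term(n, lam, mu, omega, alpha, delta):
    """Bracketed quantity of the oracle bound for one pair (constant C omitted)."""
    volume = math.sqrt(math.prod(lam) * math.prod(mu))
    # Term driven by the sample size, the level and the weight.
    variance_like = (math.log(1.0 / alpha) + omega) / (n * volume)
    # Term driven by the smoothness delta: shrinks with the bandwidths.
    bias_like = sum(h ** (2 * delta) for h in lam) + sum(h ** (2 * delta) for h in mu)
    return variance_like + bias_like

def best_pair(n, collection, alpha, delta):
    """Return the (lam, mu, omega) triple minimizing the oracle term."""
    return min(collection,
               key=lambda t: oracle_term(n, t[0], t[1], t[2], alpha, delta))
```

Small bandwidths reduce the second group of terms but inflate the first, so the minimizer is typically an intermediate pair. The minimizing pair depends on $\delta$, which is unknown in practice; the aggregated test never computes it, and the theorem states that its rate is nonetheless comparable to that of the best pair.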
Adaptive procedure of testing independence

We consider the bandwidth collections $\Lambda$ and $U$ defined by
\[
\Lambda = \left\{ (2^{-m_{1,1}}, \ldots, 2^{-m_{1,p}}) \,;\; (m_{1,1}, \ldots, m_{1,p}) \in (\mathbb{N}^*)^p \right\}, \quad (1)
\]
\[
U = \left\{ (2^{-m_{2,1}}, \ldots, 2^{-m_{2,q}}) \,;\; (m_{2,1}, \ldots, m_{2,q}) \in (\mathbb{N}^*)^q \right\}. \quad (2)
\]

We associate to every $\lambda = (2^{-m_{1,1}}, \ldots, 2^{-m_{1,p}})$ in $\Lambda$ and $\mu = (2^{-m_{2,1}}, \ldots, 2^{-m_{2,q}})$ in $U$ the positive weights
\[
\omega_{\lambda,\mu} = 2 \sum_{i=1}^{p} \log\left( m_{1,i} \times \frac{\pi}{\sqrt{6}} \right) + 2 \sum_{j=1}^{q} \log\left( m_{2,j} \times \frac{\pi}{\sqrt{6}} \right), \quad (3)
\]
so that $\sum_{(\lambda,\mu) \in \Lambda \times U} e^{-\omega_{\lambda,\mu}} = 1$.
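The normalization follows from $e^{-\omega_{\lambda,\mu}} = \prod_i \frac{6}{\pi^2 m_{1,i}^2} \prod_j \frac{6}{\pi^2 m_{2,j}^2}$ together with $\sum_{m \ge 1} m^{-2} = \pi^2/6$. The sketch below builds a truncated version of the dyadic collection and checks numerically that the weights nearly sum to 1. The truncation at `max_exponent` and the function names are assumptions made for illustration; the slide considers the full countable collection.

```python
import math
from itertools import product

def omega(m_lambda, m_mu):
    """Weight (3) for the exponent tuples m_lambda (length p) and m_mu (length q)."""
    c = math.pi / math.sqrt(6.0)
    return 2.0 * sum(math.log(m * c) for m in tuple(m_lambda) + tuple(m_mu))

def dyadic_collection(p, q, max_exponent):
    """Yield (lambda, mu, omega) for exponents 1..max_exponent in every coordinate."""
    exps = range(1, max_exponent + 1)
    for m in product(exps, repeat=p + q):
        lam = tuple(2.0 ** -e for e in m[:p])
        mu = tuple(2.0 ** -e for e in m[p:])
        yield lam, mu, omega(m[:p], m[p:])

def truncated_weight_sum(p, q, max_exponent):
    """Sum of exp(-omega) over the truncated collection; tends to 1 from below."""
    return sum(math.exp(-w) for _, _, w in dyadic_collection(p, q, max_exponent))
```

Any truncation keeps the sum below 1, so the constraint $\sum e^{-\omega_{\lambda,\mu}} \le 1$ required by the aggregated procedure still holds for a finite sub-collection.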
Adaptive procedure of testing independence

Corollary

Assume that $\log \log(n) > 1$, let $\alpha, \beta \in (0, 1)$ and let $\Delta_\alpha$ be the aggregated testing procedure with the particular choice of $\Lambda$, $U$ and the weights $(\omega_{\lambda,\mu})_{(\lambda,\mu) \in \Lambda \times U}$ defined in (1), (2) and (3). Then, the uniform separation rate $\rho\left(\Delta_\alpha, S^\delta_{p+q}(R), \beta\right)$ of the aggregated test $\Delta_\alpha$ over Sobolev balls, where $\delta \in (0, 2]$, can be upper bounded as follows:
\[
\rho\left(\Delta_\alpha, S^\delta_{p+q}(R), \beta\right) \le C(M_f, p, q, \alpha, \beta, \delta) \left( \frac{\log \log(n)}{n} \right)^{\frac{2\delta}{4\delta + (p+q)}},
\]
where $M_f = \max\left( \|f\|_\infty, \|f_1\|_\infty, \|f_2\|_\infty \right)$.

The rate of the aggregation procedure over the classes of Sobolev balls is of the same order as the smallest rate of the single tests, up to a $\log \log(n)$ factor.

Combined with the lower bound over Sobolev balls, this shows that the aggregated test is adaptive over these regularity classes.
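A heuristic sketch of where the $\log \log(n)$ factor comes from is given below. It assumes the oracle bound of the theorem has the bracket $\frac{1}{n\sqrt{\lambda_1 \ldots \lambda_p \mu_1 \ldots \mu_q}}(\log(1/\alpha) + \omega_{\lambda,\mu}) + \sum_i \lambda_i^{2\delta} + \sum_j \mu_j^{2\delta}$, takes all bandwidths equal to $2^{-m}$, and ignores constants and integer rounding of $m$; it is not the proof given by the authors.

```latex
% Equal dyadic bandwidths \lambda_i = \mu_j = 2^{-m}; weights from (3).
\[
\omega_{\lambda,\mu} = 2(p+q)\log\Big(\frac{m\pi}{\sqrt{6}}\Big),
\qquad
\sqrt{\lambda_1 \ldots \lambda_p\, \mu_1 \ldots \mu_q} = 2^{-m(p+q)/2}.
\]
% Bracket of the oracle bound for this choice.
\[
\frac{2^{m(p+q)/2}}{n}\Big(\log\frac{1}{\alpha}
  + 2(p+q)\log\frac{m\pi}{\sqrt{6}}\Big) + (p+q)\,2^{-2\delta m}.
\]
% Choose m^* with 2^{-m^*} of order (\log\log n / n)^{2/(4\delta + p + q)}.
% Then m^* is of order \log n, so \omega is of order \log\log n, and both terms balance:
\[
\frac{2^{m^*(p+q)/2}}{n}\,\log\log n
\asymp \Big(\frac{\log\log n}{n}\Big)^{1 - \frac{p+q}{4\delta + p + q}}
= \Big(\frac{\log\log n}{n}\Big)^{\frac{4\delta}{4\delta + p + q}}
\asymp 2^{-2\delta m^*}.
\]
% Taking the square root of the bound on \rho^2 gives the rate of the corollary.
\[
\rho\big(\Delta_\alpha, S^\delta_{p+q}(R), \beta\big)
\lesssim \Big(\frac{\log\log n}{n}\Big)^{\frac{2\delta}{4\delta + (p+q)}}.
\]
```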