 
The Finite-Set Independence Criterion (FSIC)
Zoltán Szabó, Arthur Gretton, Wittawat Jitkrittum
Gatsby Unit, University College London
wittawat@gatsby.ucl.ac.uk
3rd UCL Workshop on the Theory of Big Data, 28 June 2017
What Is Independence Testing?
Let $(X, Y) \in \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$ be random vectors following $P_{xy}$.
Given a joint sample $\{(x_i, y_i)\}_{i=1}^{n} \sim P_{xy}$ (unknown), test
$$H_0: P_{xy} = P_x P_y \quad \text{vs.} \quad H_1: P_{xy} \neq P_x P_y.$$
Compute a test statistic $\hat{\lambda}_n$. Reject $H_0$ if $\hat{\lambda}_n > T_\alpha$ (threshold).
$T_\alpha$ = $(1 - \alpha)$-quantile of the null distribution.
[Figure: null distribution $P_{H_0}(\hat{\lambda}_n)$ of the test statistic $\hat{\lambda}_n$, with the threshold $T_\alpha$ marked.]
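The decision rule above (reject $H_0$ when $\hat{\lambda}_n > T_\alpha$) can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say how the null distribution is obtained, so the sketch simulates it by permuting the $y$ sample, and `squared_cov` is a hypothetical placeholder statistic, not the FSIC statistic.

```python
import numpy as np

def permutation_test(x, y, statistic, alpha=0.05, n_perm=500, seed=0):
    """Reject H0 if the statistic exceeds the (1 - alpha)-quantile of a
    simulated null distribution (permutation is an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    stat = statistic(x, y)
    # Shuffling y breaks any dependence on x, mimicking draws under H0.
    null_stats = np.array(
        [statistic(x, rng.permutation(y)) for _ in range(n_perm)]
    )
    threshold = np.quantile(null_stats, 1.0 - alpha)  # T_alpha
    return stat, threshold, bool(stat > threshold)

def squared_cov(x, y):
    # Hypothetical placeholder statistic: squared sample covariance.
    return float(np.mean((x - x.mean()) * (y - y.mean())) ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y_dep = x + 0.1 * rng.normal(size=300)  # strongly dependent on x
stat, threshold, reject = permutation_test(x, y_dep, squared_cov)
print(stat, threshold, reject)
```

Any statistic with the same `(x, y) -> float` signature can be passed in place of `squared_cov`.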
Motivations
Modern state-of-the-art test is HSIC [Gretton et al., 2005].
✓ Nonparametric, i.e., no assumption on $P_{xy}$. Kernel-based.
✗ Slow. Runtime: $O(n^2)$, where $n$ = sample size.
✗ No systematic way to choose kernels.
Propose the Finite-Set Independence Criterion (FSIC).
1. Nonparametric.
2. Linear-time. Runtime complexity: $O(n)$. Fast.
3. Tunable, i.e., well-defined criterion for parameter tuning.
Proposal: The Finite-Set Independence Criterion (FSIC)
1. Pick 2 positive definite kernels: $k$ for $X$, and $l$ for $Y$.
   Gaussian kernel: $k(x, v) = \exp\left(-\frac{\|x - v\|^2}{2\sigma_x^2}\right)$.
2. Pick some feature $(v, w) \in \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$.
3. Transform $(x, y) \mapsto (k(x, v), l(y, w))$, a map $\mathbb{R}^{d_x} \times \mathbb{R}^{d_y} \to \mathbb{R} \times \mathbb{R}$, then measure covariance:
$$\mathrm{FSIC}^2(X, Y) = \mathrm{cov}^2_{(x, y) \sim P_{xy}}[k(x, v), l(y, w)].$$
[Figure, shown over successive frames: one panel ("Data") plots the sample $(x, y)$ together with the feature location $(v, w)$; the other plots the transformed points $(k(x, v), l(y, w))$ and reports their correlation. Reported correlations as $(v, w)$ changes: 0.97, -0.47, 0.33 in the frames where the $x$-axis spans about -2.5 to 2.5; 0.023, 0.025, 0.087 in the frames where the $x$-axis spans about -10 to 10.]
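The three steps above can be sketched for a single feature $(v, w)$. This is a minimal illustration, not the authors' implementation: the slide defines a population quantity, and the plug-in (biased) sample covariance used here is an assumption, as are the function names, the bandwidths, and the toy data.

```python
import numpy as np

def gauss_kernel(a, c, sigma):
    """exp(-||a_i - c||^2 / (2 sigma^2)) for each row a_i of a; c is one location."""
    sq_dist = np.sum((a - c) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def fsic2_single(x, y, v, w, sigma_x=1.0, sigma_y=1.0):
    """Plug-in estimate of cov^2[k(x, v), l(y, w)] for one feature (v, w)."""
    kx = gauss_kernel(x, v, sigma_x)  # step 3: x -> k(x, v)
    ly = gauss_kernel(y, w, sigma_y)  # step 3: y -> l(y, w)
    cov = np.mean((kx - kx.mean()) * (ly - ly.mean()))
    return float(cov ** 2)

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 1))
y_dep = x + 0.1 * rng.normal(size=(n, 1))  # depends on x
y_ind = rng.normal(size=(n, 1))            # independent of x
v = np.zeros(1)  # feature location for x (arbitrary choice)
w = np.zeros(1)  # feature location for y (arbitrary choice)

dep = fsic2_single(x, y_dep, v, w)
ind = fsic2_single(x, y_ind, v, w)
print(dep, ind)
```

With a single feature the value depends heavily on where $(v, w)$ sits, which is what the frames of the figure illustrate.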
General Form of FSIC
$$\mathrm{FSIC}^2(X, Y) = \frac{1}{J} \sum_{j=1}^{J} \mathrm{cov}^2_{(x, y) \sim P_{xy}}[k(x, v_j), l(y, w_j)],$$
for $J$ features $\{(v_j, w_j)\}_{j=1}^{J} \subset \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$.
Proposition 1. Assume
1. Kernels $k$ and $l$ satisfy some conditions (e.g. Gaussian kernels).
2. Features $\{(v_j, w_j)\}_{j=1}^{J}$ are drawn from a distribution with a density.
Then, for any $J \ge 1$, $\mathrm{FSIC}(X, Y) = 0$ if and only if $X$ and $Y$ are independent.
Under $H_0: P_{xy} = P_x P_y$, $n\widehat{\mathrm{FSIC}^2} \sim$ weighted sum of $J$ dependent $\chi^2$ variables.
Difficult to get the $(1 - \alpha)$-quantile for the threshold.
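The general form averages the squared covariance over $J$ features, which can be sketched as follows. The plug-in covariance estimate, the standard-normal feature distribution (one choice that has a density), the unit bandwidths, and the toy data are all assumptions of this sketch rather than details given on the slide.

```python
import numpy as np

def fsic2(x, y, V, W, sigma_x=1.0, sigma_y=1.0):
    """Plug-in estimate of (1/J) * sum_j cov^2[k(x, v_j), l(y, w_j)].

    x: (n, d_x), y: (n, d_y), V: (J, d_x), W: (J, d_y). Cost is O(n * J).
    """
    # K[i, j] = k(x_i, v_j) and L[i, j] = l(y_i, w_j), both of shape (n, J).
    K = np.exp(-((x[:, None, :] - V[None, :, :]) ** 2).sum(-1) / (2 * sigma_x ** 2))
    L = np.exp(-((y[:, None, :] - W[None, :, :]) ** 2).sum(-1) / (2 * sigma_y ** 2))
    # One sample covariance per feature, then the average of their squares.
    covs = np.mean((K - K.mean(axis=0)) * (L - L.mean(axis=0)), axis=0)
    return float(np.mean(covs ** 2))

rng = np.random.default_rng(0)
n, J = 1000, 5
x = rng.normal(size=(n, 1))
y = x ** 2 + 0.1 * rng.normal(size=(n, 1))  # dependent on x, but uncorrelated with it
V = rng.normal(size=(J, 1))  # features drawn from a distribution with a density
W = rng.normal(size=(J, 1))

fsic2_dep = fsic2(x, y, V, W)
fsic2_indep = fsic2(x, rng.permutation(y), V, W)  # shuffling y breaks the dependence
print(fsic2_dep, fsic2_indep)
```

For fixed $J$ the cost grows linearly in $n$, in line with the linear-time claim made earlier in the talk.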