The Finite-Set Independence Criterion (FSIC)
Wittawat Jitkrittum, Zoltán Szabó, Arthur Gretton
Gatsby Unit, University College London
wittawat@gatsby.ucl.ac.uk
3rd UCL Workshop on the Theory of Big Data, 28 June 2017
1/10
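Given a sample $Z_n := \{(x_i, y_i)\}_{i=1}^n \sim P_{xy}$ (unknown), test
$H_0: P_{xy} = P_x P_y$ (X and Y are independent) against
$H_1: P_{xy} \neq P_x P_y$ (X and Y are dependent).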
2/10
1 Nonparametric.
2 Linear-time: runtime complexity $O(n)$. Fast.
3 Tunable, i.e., a well-defined criterion for parameter tuning.
3/10
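1 Pick 2 positive definite kernels: k for X, and l for Y, e.g., the Gaussian kernel $k(x, v) = \exp\left(-\frac{\|x - v\|^2}{2\sigma_x^2}\right)$.
2 Pick some feature $(v, w) \in \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$.
Witness: $u(v, w) := \mathrm{cov}_{(x,y)\sim P_{xy}}[k(x, v), l(y, w)]$.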
4/10
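To make the witness concrete, here is a minimal NumPy sketch that estimates $u(v, w)$ by the plug-in sample covariance of the transformed scalars $k(x_i, v)$ and $l(y_i, w)$. The Gaussian widths, the location $(v, w)$ and the toy data are illustrative assumptions; this plug-in estimate equals $(n-1)/n$ times the unbiased estimator.

```python
import numpy as np


def gauss_kernel(a, b, sigma):
    """Gaussian kernel exp(-||a_i - b||^2 / (2 sigma^2)) for each row a_i of a."""
    sq_dist = np.sum((a - b) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def witness(x, y, v, w, sigma_x=1.0, sigma_y=1.0):
    """Plug-in estimate of u(v, w) = cov[k(x, v), l(y, w)]."""
    kx = gauss_kernel(x, v, sigma_x)  # x -> k(x, v), from R^dx to R
    ly = gauss_kernel(y, w, sigma_y)  # y -> l(y, w), from R^dy to R
    # Sample covariance (1/n normalization) of the two transformed scalars.
    return np.mean(kx * ly) - np.mean(kx) * np.mean(ly)


rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 1))
y_dep = x + 0.1 * rng.normal(size=(n, 1))  # strongly dependent on x
y_ind = rng.normal(size=(n, 1))            # independent of x
v, w = np.array([0.5]), np.array([0.5])
print(witness(x, y_dep, v, w), witness(x, y_ind, v, w))
```

Under dependence the witness is clearly positive at this location; under independence it fluctuates around zero at the $O(1/\sqrt{n})$ scale.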
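$\mathrm{FSIC}^2(X, Y) := \frac{1}{J}\sum_{j=1}^J u^2(v_j, w_j) = \frac{1}{J}\sum_{j=1}^J \mathrm{cov}^2_{(x,y)\sim P_{xy}}[k(x, v_j), l(y, w_j)]$, for features $\{(v_j, w_j)\}_{j=1}^J$ with each $(v_j, w_j) \in \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$.
Assume:
1 Kernels k and l satisfy some conditions (e.g., Gaussian kernels).
2 Features $\{(v_i, w_i)\}_{i=1}^J$ are drawn from a distribution with a density.
Then, almost surely, FSIC(X, Y) = 0 if and only if X and Y are independent.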
5/10
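A sketch of the empirical FSIC² with J random features drawn from a standard Gaussian (a distribution with a density, as condition 2 requires). It uses the same plug-in covariance as a simplification; the widths, J and the toy data are assumptions. The dependent example $y = x^2 + \text{noise}$ is uncorrelated with x, yet dependent on it.

```python
import numpy as np


def gauss_feature_matrix(a, centers, sigma):
    """Matrix with entries exp(-||a_i - c_j||^2 / (2 sigma^2)), shape (n, J)."""
    sq_dist = ((a[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def fsic2_plugin(x, y, V, W, sigma_x=1.0, sigma_y=1.0):
    """Plug-in estimate of FSIC^2 = (1/J) * sum_j u(v_j, w_j)^2."""
    K = gauss_feature_matrix(x, V, sigma_x)  # column j holds k(x_i, v_j)
    L = gauss_feature_matrix(y, W, sigma_y)  # column j holds l(y_i, w_j)
    u = (K * L).mean(axis=0) - K.mean(axis=0) * L.mean(axis=0)  # J witness values
    return np.mean(u ** 2)


rng = np.random.default_rng(1)
n, J = 5000, 10
# Features drawn from a standard Gaussian, i.e. a distribution with a density.
V = rng.normal(size=(J, 1))
W = rng.normal(size=(J, 1))
x = rng.normal(size=(n, 1))
y_dep = x ** 2 + 0.1 * rng.normal(size=(n, 1))  # uncorrelated with x, but dependent
y_ind = rng.normal(size=(n, 1))
print(fsic2_plugin(x, y_dep, V, W), fsic2_plugin(x, y_ind, V, W))
```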
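Normalized FSIC (NFSIC): $\hat\lambda_n := n\,\hat{\mathbf{u}}^\top(\hat\Sigma + \gamma_n I)^{-1}\hat{\mathbf{u}}$, where $\hat{\mathbf{u}} \in \mathbb{R}^J$ stacks the estimates $\hat u(v_j, w_j)$, $\hat\Sigma \in \mathbb{R}^{J \times J}$ estimates the asymptotic covariance of $\sqrt{n}\,\hat{\mathbf{u}}$, and $\gamma_n \ge 0$ is a regularization parameter.
1 Under $H_0$, $\hat\lambda_n \xrightarrow{d} \chi^2(J)$ as $n \to \infty$.
2 Under $H_1$, $P(\text{reject } H_0) \to 1$ as $n \to \infty$.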
6/10
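The normalized statistic can be sketched in NumPy/SciPy with matrix estimators of $\hat{\mathbf{u}}$ and $\hat\Sigma$ built from $J \times n$ Gaussian kernel matrices with entries $k(x_i, v_j)$ and $l(y_i, w_j)$; the p-value uses the asymptotic $\chi^2(J)$ null. The value `gamma=1e-4`, the unit widths and the toy data are assumptions, not settings from the talk.

```python
import numpy as np
from scipy.stats import chi2


def gauss_feature_matrix(a, centers, sigma):
    """Gaussian kernel values k(a_i, c_j), returned as a (J, n) matrix."""
    sq_dist = ((a[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)).T


def nfsic(x, y, V, W, sigma_x=1.0, sigma_y=1.0, gamma=1e-4):
    """NFSIC statistic n u^T (Sigma + gamma I)^{-1} u and its asymptotic chi2(J) p-value."""
    n, J = x.shape[0], V.shape[0]
    K = gauss_feature_matrix(x, V, sigma_x)  # K[j, i] = k(x_i, v_j)
    L = gauss_feature_matrix(y, W, sigma_y)  # L[j, i] = l(y_i, w_j)
    k_sum, l_sum = K.sum(axis=1), L.sum(axis=1)  # K 1_n and L 1_n
    # Unbiased witness estimates at the J features.
    u = (K * L).sum(axis=1) / (n - 1) - k_sum * l_sum / (n * (n - 1))
    # Biased counterpart, used to centre Gamma.
    u_b = (K * L).sum(axis=1) / n - k_sum * l_sum / n ** 2
    Gamma = (K - k_sum[:, None] / n) * (L - l_sum[:, None] / n) - u_b[:, None]
    Sigma = Gamma @ Gamma.T / n
    stat = n * u @ np.linalg.solve(Sigma + gamma * np.eye(J), u)
    return stat, chi2.sf(stat, df=J)


rng = np.random.default_rng(2)
n, J = 2000, 5
V, W = rng.normal(size=(J, 1)), rng.normal(size=(J, 1))
x = rng.normal(size=(n, 1))
stat_dep, p_dep = nfsic(x, x ** 2 + 0.1 * rng.normal(size=(n, 1)), V, W)
stat_ind, p_ind = nfsic(x, rng.normal(size=(n, 1)), V, W)
print(stat_dep, p_dep, stat_ind, p_ind)
```

Because $\hat\Sigma + \gamma I$ is positive definite, the statistic is always non-negative, and a large value gives a small $\chi^2(J)$ tail probability.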
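Split the sample into independent training and test sets.
1 Choose $\{(v_i, w_i)\}_{i=1}^J$ and Gaussian widths by maximizing $\hat\lambda_n$ on the training set.
2 Reject $H_0$ if $\hat\lambda_n > T_\alpha$ on the test set, where $T_\alpha$ is the $(1-\alpha)$-quantile of $\chi^2(J)$.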
7/10
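The two-step procedure can be sketched as follows. This is a simplified stand-in, not the optimization used in this work: the J features are drawn at random from Gaussians fitted to the training half and kept fixed, and only the two Gaussian widths are chosen, by grid search on the training half. The width grid, `gamma`, and the helper names `nfsic_stat` and `fsic_test` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def gauss_feature_matrix(a, centers, sigma):
    """(J, n) matrix of Gaussian kernel values k(a_i, c_j)."""
    sq_dist = ((a[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)).T


def nfsic_stat(x, y, V, W, sigma_x, sigma_y, gamma=1e-4):
    """NFSIC statistic n u^T (Sigma + gamma I)^{-1} u."""
    n = x.shape[0]
    K = gauss_feature_matrix(x, V, sigma_x)
    L = gauss_feature_matrix(y, W, sigma_y)
    k_sum, l_sum = K.sum(axis=1), L.sum(axis=1)
    u = (K * L).sum(axis=1) / (n - 1) - k_sum * l_sum / (n * (n - 1))
    Gamma = (K - k_sum[:, None] / n) * (L - l_sum[:, None] / n)
    Gamma -= Gamma.mean(axis=1, keepdims=True)  # row means equal the biased u_b
    Sigma = Gamma @ Gamma.T / n
    return n * u @ np.linalg.solve(Sigma + gamma * np.eye(len(u)), u)


def fsic_test(x, y, J=5, alpha=0.05, widths=(0.25, 0.5, 1.0, 2.0, 4.0), seed=0):
    """Hypothetical simplified test: split, tune widths on train, test on held-out half."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(x.shape[0])
    tr, te = perm[: len(perm) // 2], perm[len(perm) // 2:]
    # Random features from Gaussians fitted to the training half (kept fixed here).
    V = x[tr].mean(axis=0) + x[tr].std(axis=0) * rng.normal(size=(J, x.shape[1]))
    W = y[tr].mean(axis=0) + y[tr].std(axis=0) * rng.normal(size=(J, y.shape[1]))
    # Step 1: choose the widths that maximize the statistic on the training half.
    grid = [(sx, sy) for sx in widths for sy in widths]
    sx, sy = max(grid, key=lambda s: nfsic_stat(x[tr], y[tr], V, W, *s))
    # Step 2: reject H0 if the held-out statistic exceeds the (1 - alpha)-quantile of chi2(J).
    stat = nfsic_stat(x[te], y[te], V, W, sx, sy)
    threshold = chi2.ppf(1.0 - alpha, df=J)
    return stat > threshold, stat, threshold


rng = np.random.default_rng(3)
x = rng.normal(size=(4000, 1))
y = x ** 2 + 0.1 * rng.normal(size=(4000, 1))
reject, stat, threshold = fsic_test(x, y)
print(reject, stat, threshold)
```

Tuning only on the training half keeps the held-out statistic independent of the chosen parameters, which is what allows the $\chi^2(J)$ threshold to be used on the test half.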
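Compared methods: NFSIC-opt (optimized features and widths), NFSIC-med (random features, median-heuristic widths), QHSIC (quadratic-time HSIC), NyHSIC (Nyström-approximated HSIC), FHSIC (HSIC with random Fourier features), RDC (randomized dependence coefficient).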
8/10
9/10
1 Nonparametric.
2 Linear-time.
3 Adaptive (parameters automatically tuned).
10/10
11/10
12/10
[Figure: MMD(P, Q) between two distributions, shown in the RKHS and in the space of distributions.]
13/10
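1 With some conditions, the test power satisfies $P_{H_1}(\hat\lambda_n \ge T_\alpha) \ge L(\lambda_n)$, where
$$L(\lambda_n) = 1 - 62e^{-\xi_1\gamma_n^2(\lambda_n - T_\alpha)^2/n} - 2e^{-\lfloor 0.5n\rfloor(\lambda_n - T_\alpha)^2/[\xi_2 n^2]} - 2e^{-[(\lambda_n - T_\alpha)\gamma_n(n-1)/3 - \xi_3 n - c_3\gamma_n^2 n(n-1)]^2/[\xi_4 n^2(n-1)]},$$
$\lambda_n := n\,\mathbf{u}^\top\Sigma^{-1}\mathbf{u}$ is the population NFSIC, and $c_3, \xi_1, \dots, \xi_4$ are positive constants.
2 For large n, $L(\lambda_n)$ is increasing in $\lambda_n$, so maximizing $\hat\lambda_n$ on the training set serves as a proxy for maximizing test power.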
14/10
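Features $\{(v_i, w_i)\}_{i=1}^J \sim \eta$. With $K := [k(v_i, x_j)] \in \mathbb{R}^{J \times n}$, $L := [l(w_i, y_j)] \in \mathbb{R}^{J \times n}$ and $\circ$ the elementwise product:
1 $\hat{\mathbf{u}} = \frac{(K \circ L)\mathbf{1}_n}{n-1} - \frac{(K\mathbf{1}_n) \circ (L\mathbf{1}_n)}{n(n-1)}$.
2 $\hat\Sigma = \frac{\Gamma\Gamma^\top}{n}$, where $\Gamma := (K - n^{-1}K\mathbf{1}_n\mathbf{1}_n^\top) \circ (L - n^{-1}L\mathbf{1}_n\mathbf{1}_n^\top) - \hat{\mathbf{u}}^b\mathbf{1}_n^\top$ and $\hat{\mathbf{u}}^b := n^{-1}(K \circ L)\mathbf{1}_n - n^{-2}(K\mathbf{1}_n) \circ (L\mathbf{1}_n)$.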
15/10
1 Transform $x \mapsto k(x, v)$ and $y \mapsto l(y, w)$ (from $\mathbb{R}^{d_x}$ and $\mathbb{R}^{d_y}$, respectively, to $\mathbb{R}$).
2 Then, take the covariance.
16/10
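$\widehat{\mathrm{FSIC}^2} = \frac{1}{J}\sum_{i=1}^J \hat u^2(v_i, w_i)$, where
$$\hat u(v, w) = \frac{1}{n}\sum_{i=1}^n k(x_i, v)\,l(y_i, w) - \frac{1}{n(n-1)}\sum_{i=1}^n\sum_{j \neq i} k(x_i, v)\,l(y_j, w).$$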
17/10
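The double sum can be computed in $O(n)$ time:
$$\sum_{i=1}^n\sum_{j \neq i} k(x_i, v)\,l(y_j, w) = \Big(\sum_{i=1}^n k(x_i, v)\Big)\Big(\sum_{j=1}^n l(y_j, w)\Big) - \sum_{i=1}^n k(x_i, v)\,l(y_i, w).$$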
18/10
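A quick numerical check that the double sum over $j \neq i$ in $\hat u(v, w)$ does not require quadratic time: the sketch compares an explicit $O(n^2)$ loop with the $O(n)$ rearrangement. The unit kernel widths, the location $(v, w) = (0.5, 1.0)$ and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
y = x ** 2 + 0.1 * rng.normal(size=n)
v, w = 0.5, 1.0
k = np.exp(-(x - v) ** 2 / 2)  # k(x_i, v), Gaussian kernel with sigma_x = 1 (assumed)
l = np.exp(-(y - w) ** 2 / 2)  # l(y_i, w), Gaussian kernel with sigma_y = 1 (assumed)


def u_hat_quadratic(k, l):
    """O(n^2): explicit double sum over j != i."""
    n = len(k)
    cross = sum(k[i] * l[j] for i in range(n) for j in range(n) if j != i)
    return np.sum(k * l) / n - cross / (n * (n - 1))


def u_hat_linear(k, l):
    """O(n): replace the double sum by (sum_i k_i)(sum_j l_j) - sum_i k_i l_i."""
    n = len(k)
    cross = k.sum() * l.sum() - np.sum(k * l)
    return np.sum(k * l) / n - cross / (n * (n - 1))


print(u_hat_quadratic(k, l), u_hat_linear(k, l))
```

Both functions return the same value up to floating-point rounding; only the linear version scales to large n.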
$\widehat{\mathrm{FSIC}^2} = \frac{1}{J}\sum_{i=1}^J \hat u^2(v_i, w_i)$
[Figure: color map, scale from 0.000 to 0.024.]
19/10
[Figure: witness $\hat u(v, w)$ and $\widehat{\mathrm{FSIC}^2} = \frac{1}{J}\sum_{i=1}^J \hat u^2(v_i, w_i)$.]
20/10
[Figure: type-I error and runtime (s) versus sample size n ($10^3$ to $10^5$); methods: NFSIC-opt, NFSIC-med, QHSIC, NyHSIC, FHSIC, RDC.]
21/10
[Figure: test power versus ω (1 to 6) for data with density $\propto 1 + \sin(\omega x)\sin(\omega y)$, with a plot of this density over x, y from −3 to 3 (color scale 0.00 to 2.00); methods: NFSIC-opt, NFSIC-med, QHSIC, NyHSIC, FHSIC, RDC.]
22/10
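Data for this problem can be drawn by rejection sampling, sketched below. The support $[-\pi, \pi]^2$ is an assumption (the plotted axes run from about −3 to 3). Since the unnormalized density $1 + \sin(\omega x)\sin(\omega y)$ lies in $[0, 2]$, accepting a uniform proposal with probability (density)/2 gives exact samples.

```python
import numpy as np


def sample_sin_problem(n, omega, seed=0):
    """Draw n pairs from p(x, y) ∝ 1 + sin(omega x) sin(omega y) on [-pi, pi]^2 (assumed support)."""
    rng = np.random.default_rng(seed)
    accepted = []
    count = 0
    while count < n:
        xy = rng.uniform(-np.pi, np.pi, size=(2 * n, 2))  # uniform proposals
        density = 1.0 + np.sin(omega * xy[:, 0]) * np.sin(omega * xy[:, 1])  # in [0, 2]
        keep = rng.uniform(0.0, 2.0, size=2 * n) < density  # accept w.p. density / 2
        accepted.append(xy[keep])
        count += int(keep.sum())
    xy = np.concatenate(accepted)[:n]
    return xy[:, :1], xy[:, 1:]


x, y = sample_sin_problem(20000, omega=2.0)
print(np.mean(np.sin(2.0 * x) * np.sin(2.0 * y)))
```

Because $\int_{-\pi}^{\pi}\sin(\omega y)\,dy = 0$, both marginals are uniform for every ω, so the dependence lives only in the joint structure, which becomes finer-grained as ω grows.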
$y = |Z|\prod_{i=1}^{d_x}\mathrm{sign}(x_i)$, where $x \sim \mathcal{N}(0, I_{d_x})$ and $Z \sim \mathcal{N}(0, 1)$ (noise).
23/10
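A sketch of a generator for this problem, following the formula above; the sample size, $d_x = 3$ and the seed are arbitrary choices.

```python
import numpy as np


def sample_gsign(n, dx, seed=0):
    """y = |Z| * prod_i sign(x_i), with x ~ N(0, I_dx) and Z ~ N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, dx))
    z = rng.normal(size=(n, 1))
    y = np.abs(z) * np.prod(np.sign(x), axis=1, keepdims=True)
    return x, y


x, y = sample_gsign(20000, dx=3)
print(np.corrcoef(x[:, 0], y[:, 0])[0, 1])
```

For $d_x \ge 2$, each single coordinate $x_i$ is independent of y (the signs of the remaining coordinates make y symmetric given $x_i$), so the dependence is visible only through all coordinates jointly.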
[Figure: test power versus J (100 to 600) for ω = 2.00, with a plot over x, y from −3 to 3.]
24/10
[Figure: type-I error and test power versus sample size n (500 to 2000); methods: NFSIC-opt, NFSIC-med, QHSIC, NyHSIC, FHSIC, RDC.]
25/10
26/10
27/10