

slide-1
SLIDE 1

The Finite-Set Independence Criterion (FSIC)

Wittawat Jitkrittum, Zoltán Szabó, Arthur Gretton
Gatsby Unit, University College London
wittawat@gatsby.ucl.ac.uk
3rd UCL Workshop on the Theory of Big Data, 28 June 2017

slide-2
SLIDE 2

What Is Independence Testing?

Let $(X, Y) \in \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$ be random vectors following $P_{xy}$. Given a joint sample $\{(x_i, y_i)\}_{i=1}^{n} \sim P_{xy}$ (unknown), test

  • $H_0: P_{xy} = P_x P_y$
  • vs. $H_1: P_{xy} \neq P_x P_y$.

Compute a test statistic $\hat{\lambda}_n$. Reject $H_0$ if $\hat{\lambda}_n > T_\alpha$ (threshold), where $T_\alpha$ is the $(1-\alpha)$-quantile of the null distribution.

[Figure: null distribution $P_{H_0}(\hat{\lambda}_n)$ with rejection threshold $T_\alpha$.]

slide-5
SLIDE 5

Motivations

The modern state-of-the-art test is HSIC [Gretton et al., 2005].
✓ Nonparametric, i.e., no assumption on $P_{xy}$. Kernel-based.
✗ Slow. Runtime: $O(n^2)$, where $n$ is the sample size.
✗ No systematic way to choose the kernels.

We propose the Finite-Set Independence Criterion (FSIC):
1. Nonparametric.
2. Linear-time. Runtime complexity: $O(n)$. Fast.
3. Tunable, i.e., a well-defined criterion for parameter tuning.

slide-7
SLIDE 7

Proposal: The Finite-Set Independence Criterion (FSIC)

1. Pick two positive definite kernels: $k$ for $X$, and $l$ for $Y$.
   ✎ Gaussian kernel: $k(x, v) = \exp\!\left(-\frac{\|x - v\|^2}{2\sigma_x^2}\right)$.
2. Pick some feature $(v, w) \in \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$.
3. Transform $(x, y) \mapsto (k(x, v), l(y, w))$, mapping $\mathbb{R}^{d_x} \times \mathbb{R}^{d_y} \to \mathbb{R} \times \mathbb{R}$, then measure the covariance:

$$\mathrm{FSIC}^2(X, Y) = \mathrm{cov}^2_{(x,y)\sim P_{xy}}[k(x, v), l(y, w)].$$

[Figures across the animation steps: scatter plots of the data with a location $(v, w)$ and the transformed features $k(x, v)$, $l(y, w)$; the sample correlations of the transformed pairs are 0.97, −0.47, and 0.33 at different locations for a first dataset, and 0.023, 0.025, and 0.087 for a second dataset.]
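To make step 3 concrete, here is a minimal NumPy sketch (my own illustration, not from the slides) of the single-location construction: Gaussian kernels, one test location $(v, w)$, and the empirical covariance of the transformed sample. The toy data and all variable names are assumptions for illustration only.

```python
import numpy as np

def gauss_kernel(A, c, sigma):
    """Gaussian kernel values k(a_i, c) between the rows of A and one location c."""
    return np.exp(-np.sum((A - c) ** 2, axis=1) / (2.0 * sigma ** 2))

def fsic2_single_location(X, Y, v, w, sigma_x, sigma_y):
    """Empirical FSIC^2 at one location (v, w): squared covariance of k(x, v) and l(y, w)."""
    kx = gauss_kernel(X, v, sigma_x)   # shape (n,)
    ly = gauss_kernel(Y, w, sigma_y)   # shape (n,)
    cov = np.mean(kx * ly) - np.mean(kx) * np.mean(ly)
    return cov ** 2

# Hypothetical toy data: Y depends on X, so some locations give a clearly nonzero covariance.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 1))
Y = np.sin(2.0 * X) + 0.1 * rng.normal(size=(n, 1))
print(fsic2_single_location(X, Y, v=np.array([0.5]), w=np.array([0.8]), sigma_x=1.0, sigma_y=1.0))
```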

slide-16
SLIDE 16

General Form of FSIC

$$\mathrm{FSIC}^2(X, Y) = \frac{1}{J} \sum_{j=1}^{J} \mathrm{cov}^2_{(x,y)\sim P_{xy}}[k(x, v_j), l(y, w_j)],$$

for $J$ features $\{(v_j, w_j)\}_{j=1}^{J} \subset \mathbb{R}^{d_x} \times \mathbb{R}^{d_y}$.

Proposition 1.
Assume
1. Kernels $k$ and $l$ satisfy some conditions (e.g., Gaussian kernels).
2. Features $\{(v_i, w_i)\}_{i=1}^{J}$ are drawn from a distribution with a density.

Then, for any $J \geq 1$, $\mathrm{FSIC}(X, Y) = 0$ if and only if $X$ and $Y$ are independent.

Under $H_0: P_{xy} = P_x P_y$, $n\,\widehat{\mathrm{FSIC}^2} \sim$ a weighted sum of $J$ dependent $\chi^2$ variables, so the $(1-\alpha)$-quantile needed for the threshold is difficult to obtain.
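Building on the single-location sketch above (again, my own illustration rather than the authors' code), the general form simply averages the squared empirical covariances over $J$ randomly drawn locations:

```python
import numpy as np

def gauss_kernel(A, c, sigma):
    """Gaussian kernel values between the rows of A and one location c."""
    return np.exp(-np.sum((A - c) ** 2, axis=1) / (2.0 * sigma ** 2))

def fsic2(X, Y, V, W, sigma_x, sigma_y):
    """FSIC^2 estimate: average over J locations of the squared empirical covariance."""
    vals = []
    for v, w in zip(V, W):
        kx, ly = gauss_kernel(X, v, sigma_x), gauss_kernel(Y, w, sigma_y)
        vals.append((np.mean(kx * ly) - np.mean(kx) * np.mean(ly)) ** 2)
    return float(np.mean(vals))

# Hypothetical toy data; locations drawn from a distribution with a density (here Gaussian).
rng = np.random.default_rng(0)
n, J = 1000, 10
X = rng.normal(size=(n, 2))
Y = np.tanh(X[:, :1]) + 0.2 * rng.normal(size=(n, 1))
V = rng.normal(size=(J, 2))
W = rng.normal(size=(J, 1))
print(fsic2(X, Y, V, W, sigma_x=1.0, sigma_y=1.0))
```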

slide-19
SLIDE 19

Normalized FSIC (NFSIC)

Let $\hat{\mathbf{u}} := \big( \widehat{\mathrm{cov}}[k(x, v_1), l(y, w_1)], \ldots, \widehat{\mathrm{cov}}[k(x, v_J), l(y, w_J)] \big)^\top \in \mathbb{R}^J$. Then $\widehat{\mathrm{FSIC}^2} = \frac{1}{J}\, \hat{\mathbf{u}}^\top \hat{\mathbf{u}}$.

$$\widehat{\mathrm{NFSIC}^2}(X, Y) = \hat{\lambda}_n := n\, \hat{\mathbf{u}}^\top \big( \hat{\boldsymbol{\Sigma}} + \gamma_n I \big)^{-1} \hat{\mathbf{u}},$$

with a regularization parameter $\gamma_n \geq 0$, where $\hat{\Sigma}_{ij}$ is the covariance of $\hat{u}_i$ and $\hat{u}_j$.

Theorem 1 (the NFSIC test is consistent).
Assume $\gamma_n \to 0$ and the same conditions on $k$ and $l$ as before.
1. Under $H_0$, $\hat{\lambda}_n \xrightarrow{d} \chi^2(J)$ as $n \to \infty$. Easy to get the threshold $T_\alpha$.
2. Under $H_1$, $P(\text{reject } H_0) \to 1$ as $n \to \infty$.

Complexity: $O(J^3 + J^2 n + (d_x + d_y) J n)$. Only a small $J$ is needed.
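Because the null distribution is simply $\chi^2(J)$, the rejection threshold is a one-line quantile lookup. A minimal sketch (my own, with made-up numbers; $\hat{\lambda}_n$ would come from the NFSIC estimator):

```python
from scipy.stats import chi2

alpha, J = 0.01, 10
lambda_hat = 35.2                         # hypothetical value of the NFSIC statistic

T_alpha = chi2.ppf(1.0 - alpha, df=J)     # (1 - alpha)-quantile of chi^2(J)
print(T_alpha, lambda_hat > T_alpha)      # reject H0 if the statistic exceeds the threshold
```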

slide-24
SLIDE 24

Tuning Features and Kernels

Split the data into training (tr) and test (te) sets. Procedure:
1. Choose $\{(v_i, w_i)\}_{i=1}^{J}$ and the Gaussian widths by maximizing $\hat{\lambda}_n^{(tr)}$ (i.e., the statistic computed on the training set), via gradient ascent.
2. Reject $H_0$ if $\hat{\lambda}_n^{(te)} > (1-\alpha)$-quantile of $\chi^2(J)$.
Splitting avoids overfitting.

Theorem 2.
This procedure increases a lower bound on $P(\text{reject } H_0 \mid H_1 \text{ true})$ (the test power). Asymptotically, the false rejection rate is $\alpha$.

slide-27
SLIDE 27

Simulation Settings

Gaussian kernels $k(x, x') = \exp\!\left(-\frac{\|x - x'\|_2^2}{2\sigma_x^2}\right)$ for both $X$ and $Y$.

Methods compared:
1. NFSIC-opt: NFSIC with optimization. $O(n)$.
2. QHSIC [Gretton et al., 2005]: state-of-the-art HSIC. $O(n^2)$.
3. NFSIC-med: NFSIC with random features.
4. NyHSIC: linear-time HSIC with Nyström approximation.
5. FHSIC: linear-time HSIC with random Fourier features.
6. RDC [Lopez-Paz et al., 2013]: canonical correlation analysis with a cosine basis.

$J = 10$ in NFSIC.

slide-28
SLIDE 28

YouTube Video ($X$) vs. Caption ($Y$)

$X \in \mathbb{R}^{2000}$: Fisher vector encoding of motion boundary histogram descriptors [Wang and Schmid, 2013]. $Y \in \mathbb{R}^{1878}$: bag of words (term frequency). $\alpha = 0.01$.

[Figure: test power vs. sample size $n$ for QHSIC and the proposed NFSIC; a second panel shows the Type-I error when the $(X, Y)$ pairs are exchanged so that $H_0$ holds.]

For large $n$, NFSIC is comparable to HSIC.

slide-31
SLIDE 31

Conclusions

We proposed the Finite-Set Independence Criterion (FSIC). The independence test based on FSIC is
1. nonparametric,
2. linear-time,
3. adaptive (parameters are automatically tuned).

An Adaptive Test of Independence with Analytic Kernel Embeddings
Wittawat Jitkrittum, Zoltán Szabó, Arthur Gretton
https://arxiv.org/abs/1610.04782 (to appear in ICML 2017)
Python code: https://github.com/wittawatj/fsic-test

slide-32
SLIDE 32

Questions?

Thank you


slide-33
SLIDE 33

Reference

Coauthors: Zoltán Szabó (École Polytechnique) and Arthur Gretton (Gatsby Unit, UCL).

An Adaptive Test of Independence with Analytic Kernel Embeddings
Wittawat Jitkrittum, Zoltán Szabó, Arthur Gretton
https://arxiv.org/abs/1610.04782 (to appear in ICML 2017)
Python code: https://github.com/wittawatj/fsic-test

slide-34
SLIDE 34

Requirements on the Kernels

Definition 1 (Analytic kernels).
$k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is said to be analytic if, for all $x \in \mathcal{X}$, $v \mapsto k(x, v)$ is a real analytic function on $\mathcal{X}$. Analytic: the Taylor series about $x_0$ converges for all $x_0 \in \mathcal{X}$. This implies that $k$ is infinitely differentiable.

Definition 2 (Characteristic kernels).
Let $\mu_P(v) := \mathbb{E}_{z \sim P}[k(z, v)]$. $k$ is said to be characteristic if $\mu_P$ is unique for distinct $P$; equivalently, $P \mapsto \mu_P$ is injective.

[Figure: the mean embeddings $\mu_P$ and $\mu_Q$ of two distributions $P \neq Q$ in the RKHS; their RKHS distance is $\mathrm{MMD}(P, Q)$.]
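To make Definition 2 concrete, the following NumPy sketch (my own illustration, not from the slides) evaluates the empirical mean embedding $\hat{\mu}_P(v) = \frac{1}{n}\sum_i k(z_i, v)$ of two samples at a grid of locations; with a Gaussian kernel, samples from different distributions produce visibly different embeddings.

```python
import numpy as np

def gauss_k(z, v, sigma=1.0):
    """Gaussian kernel values k(z_i, v) for a 1-D sample z and one location v."""
    return np.exp(-(z - v) ** 2 / (2.0 * sigma ** 2))

def mean_embedding(z, locations, sigma=1.0):
    """Empirical mean embedding evaluated at each location: (1/n) * sum_i k(z_i, v)."""
    return np.array([gauss_k(z, v, sigma).mean() for v in locations])

rng = np.random.default_rng(0)
P_sample = rng.normal(loc=0.0, scale=1.0, size=2000)    # P = N(0, 1)
Q_sample = rng.laplace(loc=0.0, scale=1.0, size=2000)   # Q = Laplace(0, 1), same mean
V = np.linspace(-3.0, 3.0, 7)
print(mean_embedding(P_sample, V))
print(mean_embedding(Q_sample, V))   # differs from the line above, reflecting mu_P != mu_Q
```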

slide-35
SLIDE 35

Optimization Objective = Power Lower Bound

Recall $\hat{\lambda}_n := n\, \hat{\mathbf{u}}^\top \big( \hat{\boldsymbol{\Sigma}} + \gamma_n I \big)^{-1} \hat{\mathbf{u}}$. Let $\mathrm{NFSIC}^2(X, Y) := \lambda_n := n\, \mathbf{u}^\top \boldsymbol{\Sigma}^{-1} \mathbf{u}$.

Theorem 3 (A lower bound on the test power).
1. Under some conditions, the test power satisfies $P_{H_1}\big(\hat{\lambda}_n \geq T_\alpha\big) \geq L(\lambda_n)$, where

$$L(\lambda_n) = 1 - 62 e^{-\xi_1 \gamma_n^2 (\lambda_n - T_\alpha)^2 / n} - 2 e^{-\lfloor 0.5 n \rfloor (\lambda_n - T_\alpha)^2 / [\xi_2 n^2]} - 2 e^{-\left[ (\lambda_n - T_\alpha)\gamma_n (n-1)/3 \,-\, \xi_3 n \,-\, c_3 \gamma_n^2 n (n-1) \right]^2 / [\xi_4 n^2 (n-1)]},$$

and $\xi_1, \ldots, \xi_4, c_3 > 0$ are constants.
2. For large $n$, $L(\lambda_n)$ is increasing in $\lambda_n$.

Set the test locations and Gaussian widths $= \arg\max L(\lambda_n) = \arg\max \lambda_n$.

slide-39
SLIDE 39

An Estimator of ❭ NFSIC2

$$\hat{\lambda}_n := n\, \hat{\mathbf{u}}^\top \big( \hat{\boldsymbol{\Sigma}} + \gamma_n I \big)^{-1} \hat{\mathbf{u}}, \qquad J \text{ test locations } \{(v_i, w_i)\}_{i=1}^{J} \text{ drawn from a distribution with a density.}$$

Let $\mathbf{K} = [k(v_i, x_j)] \in \mathbb{R}^{J \times n}$ and $\mathbf{L} = [l(w_i, y_j)] \in \mathbb{R}^{J \times n}$. (No $n \times n$ Gram matrix is needed.) Estimators:

1. $\hat{\mathbf{u}} = \dfrac{(\mathbf{K} \circ \mathbf{L}) \mathbf{1}_n}{n-1} - \dfrac{(\mathbf{K}\mathbf{1}_n) \circ (\mathbf{L}\mathbf{1}_n)}{n(n-1)}$.

2. $\hat{\boldsymbol{\Sigma}} = \dfrac{\boldsymbol{\Gamma}\boldsymbol{\Gamma}^\top}{n}$, where $\boldsymbol{\Gamma} := \big(\mathbf{K} - n^{-1}\mathbf{K}\mathbf{1}_n\mathbf{1}_n^\top\big) \circ \big(\mathbf{L} - n^{-1}\mathbf{L}\mathbf{1}_n\mathbf{1}_n^\top\big) - \hat{\mathbf{u}}\mathbf{1}_n^\top$.

$\hat{\lambda}_n$ can be computed in $O(J^3 + J^2 n + (d_x + d_y) J n)$ time. Main point: linear in $n$, cubic in $J$ (which is small).
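Below is a minimal NumPy rendering of these estimators (my own sketch for illustration, not the authors' reference implementation; see the fsic-test repository for that). The kernel choice, bandwidths, regularization value, and toy data are all assumptions.

```python
import numpy as np

def gauss_gram(V, X, sigma):
    """J x n matrix of Gaussian kernel values k(v_i, x_j)."""
    sq = np.sum(V**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * V @ X.T
    return np.exp(-sq / (2.0 * sigma**2))

def nfsic_statistic(X, Y, V, W, sigx, sigy, gamma=1e-4):
    """lambda_hat = n u_hat^T (Sigma_hat + gamma I)^{-1} u_hat, linear in n."""
    n, J = X.shape[0], V.shape[0]
    K = gauss_gram(V, X, sigx)                     # J x n
    L = gauss_gram(W, Y, sigy)                     # J x n
    one = np.ones(n)
    # u_hat = (K o L) 1 / (n-1)  -  (K 1) o (L 1) / (n(n-1))
    u = (K * L) @ one / (n - 1.0) - (K @ one) * (L @ one) / (n * (n - 1.0))
    # Gamma = (K - K 1 1^T / n) o (L - L 1 1^T / n) - u 1^T ;  Sigma_hat = Gamma Gamma^T / n
    Kc = K - (K @ one)[:, None] / n
    Lc = L - (L @ one)[:, None] / n
    Gam = Kc * Lc - u[:, None]
    Sigma = Gam @ Gam.T / n
    return float(n * u @ np.linalg.solve(Sigma + gamma * np.eye(J), u))

# Hypothetical usage: dependent toy data, random test locations.
rng = np.random.default_rng(0)
n, dx, dy, J = 1000, 3, 1, 5
X = rng.normal(size=(n, dx))
Y = np.sin(X[:, :1]) + 0.2 * rng.normal(size=(n, dy))
V, W = rng.normal(size=(J, dx)), rng.normal(size=(J, dy))
print(nfsic_statistic(X, Y, V, W, sigx=1.0, sigy=1.0))
```

In the tuning procedure described earlier, one would maximize this statistic on the training split over the locations and bandwidths, then compare its value on the held-out split against the $\chi^2(J)$ quantile.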

slide-42
SLIDE 42

Alternative View of the Witness u✭v❀ w✮

The witness $u(v, w)$ can be rewritten as

$$u(v, w) := \mu_{xy}(v, w) - \mu_x(v)\,\mu_y(w) = \mathbb{E}_{xy}[k(x, v)\, l(y, w)] - \mathbb{E}_x[k(x, v)]\,\mathbb{E}_y[l(y, w)] = \mathrm{cov}_{xy}[k(x, v), l(y, w)].$$

1. Transform $x \mapsto k(x, v)$ (from $\mathbb{R}^{d_x}$ to $\mathbb{R}$) and $y \mapsto l(y, w)$ (from $\mathbb{R}^{d_y}$ to $\mathbb{R}$).
2. Then take the covariance.

The kernel transformations turn the linear covariance into a dependence measure.

slide-44
SLIDE 44

Alternative Form of ❫ u✭v❀ w✮

Recall $\widehat{\mathrm{FSIC}^2} = \frac{1}{J}\sum_{i=1}^{J} \hat{u}(v_i, w_i)^2$.

Let $\widehat{\mu_x \mu_y}(v, w)$ be an unbiased estimator of $\mu_x(v)\mu_y(w)$:

$$\widehat{\mu_x \mu_y}(v, w) := \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} k(x_i, v)\, l(y_j, w).$$

An unbiased estimator of $u(v, w)$ is

$$\hat{u}(v, w) = \hat{\mu}_{xy}(v, w) - \widehat{\mu_x \mu_y}(v, w) = \frac{2}{n(n-1)} \sum_{i < j} h_{(v,w)}\big((x_i, y_i), (x_j, y_j)\big),$$

where $h_{(v,w)}\big((x, y), (x', y')\big) := \frac{1}{2}\big(k(x, v) - k(x', v)\big)\big(l(y, w) - l(y', w)\big)$.

Given $(v, w)$, $\hat{u}(v, w)$ is a one-sample second-order U-statistic.
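As a quick numerical sanity check (my own illustration), the sketch below computes $\hat{u}(v, w)$ both from the U-statistic with kernel $h_{(v,w)}$ and from the plug-in form $\hat{\mu}_{xy} - \widehat{\mu_x \mu_y}$; the two agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = np.cos(x) + 0.3 * rng.normal(size=n)
v, w, sig = 0.5, 0.8, 1.0

k = np.exp(-(x - v) ** 2 / (2 * sig ** 2))   # k(x_i, v)
l = np.exp(-(y - w) ** 2 / (2 * sig ** 2))   # l(y_i, w)

# U-statistic form: (2 / (n(n-1))) * sum_{i<j} 0.5 * (k_i - k_j) * (l_i - l_j)
u_stat = 0.0
for i in range(n):
    for j in range(i + 1, n):
        u_stat += 0.5 * (k[i] - k[j]) * (l[i] - l[j])
u_stat *= 2.0 / (n * (n - 1))

# Plug-in form: mu_xy_hat minus the unbiased estimate of mu_x * mu_y
mu_xy = np.mean(k * l)
mu_x_mu_y = (np.sum(k) * np.sum(l) - np.sum(k * l)) / (n * (n - 1))
u_plugin = mu_xy - mu_x_mu_y

print(u_stat, u_plugin, np.isclose(u_stat, u_plugin))
```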

slide-46
SLIDE 46

Independence Test with HSIC [Gretton et al., 2005]

Hilbert-Schmidt Independence Criterion.
$\mathrm{HSIC}(X, Y) = \mathrm{MMD}(P_{xy}, P_x P_y) = \|u\|_{\mathrm{RKHS}}$ (needs two kernels: $k$ for $X$ and $l$ for $Y$).
Empirical witness: $\hat{u}(v, w) = \hat{\mu}_{xy}(v, w) - \hat{\mu}_x(v)\,\hat{\mu}_y(w)$, where $\hat{\mu}_{xy}(v, w) = \frac{1}{n}\sum_{i=1}^{n} k(x_i, v)\, l(y_i, w)$.
$\mathrm{HSIC}(X, Y) = 0$ if and only if $X$ and $Y$ are independent.
Test statistic $= \|\hat{u}\|_{\mathrm{RKHS}}$ (the "flatness" of $\hat{u}$). Complexity: $O(n^2)$.
Key question: can we measure this flatness in another way that costs only $O(n)$?
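For contrast with the linear-time statistic, one commonly used quadratic-time estimator of HSIC is the biased form $(n-1)^{-2}\,\mathrm{tr}(\mathbf{K} H \mathbf{L} H)$ with $n \times n$ Gram matrices $\mathbf{K}$, $\mathbf{L}$ and centering matrix $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$. The sketch below (my own, with hypothetical toy data) makes the $O(n^2)$ memory and time cost explicit.

```python
import numpy as np

def gram(A, sigma):
    """n x n Gaussian Gram matrix for the rows of A -- note the quadratic memory cost."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(A**2, 1)[None, :] - 2.0 * A @ A.T
    return np.exp(-sq / (2.0 * sigma**2))

def hsic_biased(X, Y, sigx=1.0, sigy=1.0):
    """Biased HSIC estimate (n-1)^{-2} tr(K H L H), via double-centering of K."""
    n = X.shape[0]
    K, L = gram(X, sigx), gram(Y, sigy)
    Kc = K - K.mean(0) - K.mean(1)[:, None] + K.mean()   # H K H
    return float((Kc * L).sum()) / (n - 1) ** 2

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 2))
Y = X[:, :1] ** 2 + 0.1 * rng.normal(size=(n, 1))   # dependent toy data
print(hsic_biased(X, Y))
```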

slide-53
SLIDE 53

Proposal: The Finite Set Independence Criterion (FSIC)

Idea: evaluate $\hat{u}^2(v, w)$ at only finitely many test locations. With a set of $J$ random locations $\{(v_1, w_1), \ldots, (v_J, w_J)\}$:

$$\widehat{\mathrm{FSIC}^2}(X, Y) = \frac{1}{J} \sum_{i=1}^{J} \hat{u}^2(v_i, w_i).$$

Complexity: $O((d_x + d_y) J n)$. Linear time.

Can $\mathrm{FSIC}^2(X, Y) = 0$ even if $X$ and $Y$ are dependent?
  • No. The population $\mathrm{FSIC}(X, Y) = 0$ iff $X \perp Y$, almost surely.

slide-58
SLIDE 58

HSIC vs. FSIC

Recall the witness $\hat{u}(v, w) = \hat{\mu}_{xy}(v, w) - \hat{\mu}_x(v)\,\hat{\mu}_y(w)$.

HSIC [Gretton et al., 2005] $= \|\hat{u}\|_{\mathrm{RKHS}}$.
[Figure: witness surface over $(v, w)$.]
Good when the difference between $p_{xy}$ and $p_x p_y$ is spatially diffuse; $\hat{u}$ is almost flat.

FSIC [proposed] $= \frac{1}{J}\sum_{i=1}^{J} \hat{u}^2(v_i, w_i)$.
[Figure: witness surface over $(v, w)$.]
Good when the difference between $p_{xy}$ and $p_x p_y$ is local; $\hat{u}$ is mostly zero but has many peaks (feature interaction).

slide-59
SLIDE 59

Toy Problem 1: Independent Gaussians

$X \sim \mathcal{N}(0, I_{d_x})$ and $Y \sim \mathcal{N}(0, I_{d_y})$ are independent, so $H_0$ holds. Set $\alpha := 0.05$, $d_x = d_y = 250$.

[Figure: Type-I error and runtime (s) vs. sample size $n$ for NFSIC-opt, NFSIC-med, QHSIC, NyHSIC, FHSIC, and RDC.]

Correct Type-I errors (false positive rates) for all methods.

slide-62
SLIDE 62

Toy Problem 2: Sinusoid

$p_{xy}(x, y) \propto 1 + \sin(\omega x)\sin(\omega y)$, where $x, y \in (-\pi, \pi)$. The changes between $p_{xy}$ and $p_x p_y$ are local. Set $n = 4000$.

[Figure: test power vs. $\omega$ in $1 + \sin(\omega x)\sin(\omega y)$ for NFSIC-opt, NFSIC-med, QHSIC, NyHSIC, FHSIC, and RDC, with density plots of $p_{xy}$ for $\omega = 1, \ldots, 4$.]

Main point: NFSIC handles local changes in the joint space well.
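This density is easy to simulate by rejection sampling, since the unnormalized density $1 + \sin(\omega x)\sin(\omega y)$ is bounded by 2 on $(-\pi, \pi)^2$. A minimal sketch (my own illustration, not the authors' data-generation code):

```python
import numpy as np

def sample_sinusoid(n, omega, rng):
    """Rejection sampling from p(x, y) proportional to 1 + sin(omega x) sin(omega y)."""
    xs, ys = [], []
    while len(xs) < n:
        x = rng.uniform(-np.pi, np.pi)
        y = rng.uniform(-np.pi, np.pi)
        # Accept with probability (unnormalized density) / 2, since it is at most 2.
        if rng.uniform(0.0, 2.0) < 1.0 + np.sin(omega * x) * np.sin(omega * y):
            xs.append(x)
            ys.append(y)
    return np.array(xs), np.array(ys)

rng = np.random.default_rng(0)
x, y = sample_sinusoid(4000, omega=2.0, rng=rng)
print(np.corrcoef(x, y)[0, 1])   # weak linear correlation; the dependence is local
```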

slide-68
SLIDE 68

Toy Problem 3: Gaussian Sign

$y = |Z| \prod_{i=1}^{d_x} \mathrm{sign}(x_i)$, where $x \sim \mathcal{N}(0, I_{d_x})$ and $Z \sim \mathcal{N}(0, 1)$ (noise).

Full interaction among $x_1, \ldots, x_{d_x}$: all of $x_1, \ldots, x_{d_x}$ must be considered jointly to detect the dependency.

[Figure: test power vs. sample size $n$ for NFSIC-opt, NFSIC-med, QHSIC, NyHSIC, FHSIC, and RDC.]

Main point: NFSIC can handle feature interaction.
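A minimal sketch of this data-generating process (my own illustration), together with a quick check that no single coordinate of $x$ is linearly correlated with $y$:

```python
import numpy as np

def gaussian_sign(n, dx, rng):
    """y = |Z| * prod_i sign(x_i), with x ~ N(0, I_dx) and Z ~ N(0, 1)."""
    x = rng.normal(size=(n, dx))
    z = rng.normal(size=n)
    y = np.abs(z) * np.prod(np.sign(x), axis=1)
    return x, y

rng = np.random.default_rng(0)
x, y = gaussian_sign(2000, dx=4, rng=rng)
# Each single coordinate of x is uncorrelated with y; the dependence only appears jointly.
print(np.corrcoef(x[:, 0], y)[0, 1])
```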

slide-70
SLIDE 70

Test Power vs. J

Test power does not always increase with $J$ (the number of test locations). $n = 800$.

[Figure: sinusoid problem with $\omega = 2$; test power vs. $J$.]

Accurate estimation of $\hat{\boldsymbol{\Sigma}} \in \mathbb{R}^{J \times J}$ in $\hat{\lambda}_n = n\, \hat{\mathbf{u}}^\top \big( \hat{\boldsymbol{\Sigma}} + \gamma_n I \big)^{-1} \hat{\mathbf{u}}$ becomes more difficult as $J$ grows, and a large $J$ defeats the purpose of a linear-time test.

slide-71
SLIDE 71

Real Problem: Million Song Data

Song ($X$) vs. year of release ($Y$): Western commercial tracks from 1922 to 2011 [Bertin-Mahieux et al., 2011]. $X \in \mathbb{R}^{90}$ contains audio features; $Y \in \mathbb{R}$ is the year of release.

[Figure: Type-I error (with the $(X, Y)$ pairs broken to simulate $H_0$) and test power vs. sample size $n$ for NFSIC-opt, NFSIC-med, QHSIC, NyHSIC, FHSIC, and RDC.]

NFSIC-opt has the highest power among the linear-time tests.

slide-73
SLIDE 73

References I

Bertin-Mahieux, T., Ellis, D. P., Whitman, B., and Lamere, P. (2011). The Million Song Dataset. In International Conference on Music Information Retrieval (ISMIR).

Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. (2005). Measuring Statistical Dependence with Hilbert-Schmidt Norms. In Algorithmic Learning Theory (ALT), pages 63–77.

Lopez-Paz, D., Hennig, P., and Schölkopf, B. (2013). The Randomized Dependence Coefficient. In Advances in Neural Information Processing Systems (NIPS), pages 1–9.

slide-74
SLIDE 74

References II

Wang, H. and Schmid, C. (2013). Action Recognition with Improved Trajectories. In IEEE International Conference on Computer Vision (ICCV), pages 3551–3558.