SLIDE 1

Relative Goodness-of-Fit Tests for Models with Latent Variables

Arthur Gretton

Gatsby Computational Neuroscience Unit, University College London

June 15, 2019

SLIDE 2

Model Criticism

SLIDE 4

Model Criticism

Data = robbery events in Chicago in 2016.

SLIDE 5

Model Criticism

Is this a good model?

SLIDE 6

Model Criticism

Goal: test whether a (complicated) model fits the data.

SLIDE 7

Model Criticism

"All models are wrong."
  • G. Box (1976)

SLIDE 8

Relative model comparison

Have: two candidate models P and Q, and samples {x_i}_{i=1}^n from a reference distribution R.
Goal: which of P and Q is better?

[Figure: samples from GAN, Goodfellow et al. (2014), and samples from LSGAN, Mao et al. (2017). Which model is better?]

SLIDE 9

Most interesting models have latent structure

[Figure: graphical model representation of hierarchical LDA with a nested CRP prior, Blei et al. (2003)]

SLIDE 10

Outline

Relative goodness-of-fit tests for models with latent variables:
  • The kernel Stein discrepancy
      • Comparing two models via samples: MMD and the witness function.
      • Comparing a sample and a model: Stein modification of the witness class.
  • Constructing a relative hypothesis test using the KSD
  • Relative hypothesis tests with latent variables (new, unpublished)

SLIDE 11

Kernel Stein Discrepancy

Model P, data {x_i}_{i=1}^n ~ Q.

"All models are wrong" (P ≠ Q).

SLIDE 12

Integral probability metrics

Integral probability metric: find a "well-behaved" function f(x) to maximize

    E_Q f(Y) − E_P f(X)

[Plot: a smooth witness function f(x)]

SLIDE 14

All of kernel methods

Functions are linear combinations of features:  ‖f‖²_F := Σ_{i=1}^∞ f_i²

SLIDE 15

All of kernel methods

"The kernel trick":

    f(x) = Σ_{ℓ=1}^∞ f_ℓ φ_ℓ(x) = Σ_{i=1}^m α_i k(x_i, x),    with f_ℓ := Σ_{i=1}^m α_i φ_ℓ(x_i)

A function of infinitely many features, expressed using m coefficients.

[Plot: f(x) as a weighted sum of kernel functions centred at the points x_i]
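A minimal Python sketch of such a kernel expansion, assuming a Gaussian kernel with illustrative anchor points x_i and coefficients α_i (none of these choices are from the talk):

```python
import numpy as np

def gauss_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2))."""
    return np.exp(-(x - y) ** 2 / (2 * bandwidth ** 2))

# m anchor points and coefficients define f(x) = sum_i alpha_i k(x_i, x)
x_pts = np.array([1.0, 3.0, 5.0])    # illustrative anchor points x_i
alpha = np.array([0.5, -0.2, 0.8])   # illustrative coefficients alpha_i

def f(x):
    """Evaluate the kernel expansion at x."""
    return sum(a * gauss_kernel(xi, x) for a, xi in zip(alpha, x_pts))

print(f(2.0))  # a function of infinitely many features, held as m = 3 coefficients
```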

SLIDE 17

MMD as an integral probability metric

Maximum mean discrepancy: smooth witness function for P vs Q,

    MMD(P, Q; F) := sup_{‖f‖_F ≤ 1} [ E_P f(X) − E_Q f(Y) ]

For a characteristic RKHS F, MMD(P, Q; F) = 0 iff P = Q.

Other choices for the witness function class:
  • Bounded continuous [Dudley, 2002]
  • Bounded variation 1 (Kolmogorov metric) [Müller, 1997]
  • 1-Lipschitz (Wasserstein distances) [Dudley, 2002]

[Plot: p(x), q(x), and the witness function f*(x)]
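A minimal Python sketch of the corresponding unbiased MMD² estimator, assuming a Gaussian kernel and illustrative 1-D Gaussian samples:

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimator of MMD^2 between samples X ~ P and Y ~ Q (1-D arrays)."""
    def k(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    # drop diagonal terms to make the within-sample averages unbiased (U-statistics)
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, 500)   # P = N(0, 1)
Y = rng.normal(0.5, 1.0, 500)   # Q = N(0.5, 1): estimate should be clearly > 0
print(mmd2_unbiased(X, Y))
```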

SLIDE 20

Statistical model criticism: toy example

    MMD(P, Q) = sup_{‖f‖_F ≤ 1} [ E_q f − E_p f ]

[Plot: p(x), q(x), and the witness function f*(x)]

Can we compute MMD with samples from Q and a model P?
Problem: we usually can't compute E_p f in closed form.

SLIDE 21

Stein idea

To get rid of E_p f in sup_{‖f‖_F ≤ 1} [ E_q f − E_p f ], we define the (1-D) Stein operator

    [A_p f](x) = (1/p(x)) (d/dx) ( f(x) p(x) )

Then E_p [A_p f] = 0, subject to appropriate boundary conditions.

Gorham and Mackey (NeurIPS 2015); Oates, Girolami, Chopin (JRSS B 2016)
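The Stein identity is easy to check by Monte Carlo. A sketch assuming p = N(0, 1), for which [A_p f](x) = −x f(x) + f′(x), and an arbitrary smooth test function (both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)          # samples from p = N(0, 1)

# For p = N(0, 1): [A_p f](x) = f(x) * d/dx log p(x) + f'(x) = -x * f(x) + f'(x)
f  = lambda x: np.tanh(x)               # smooth, bounded test function
fp = lambda x: 1.0 - np.tanh(x) ** 2    # its derivative

stein = -x * f(x) + fp(x)
print(stein.mean())                     # close to 0, as the Stein identity predicts
```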

SLIDE 22

Kernel Stein Discrepancy

Stein operator:  A_p f = (1/p(x)) (d/dx) ( f(x) p(x) )

Kernel Stein Discrepancy (KSD):

    KSD_p(Q) = sup_{‖g‖_F ≤ 1} [ E_q A_p g − E_p A_p g ] = sup_{‖g‖_F ≤ 1} E_q A_p g,

since the second term E_p A_p g vanishes.

[Plots: p(x), q(x), and the Stein witness g*(x)]

SLIDE 26

Simple expression using kernels

Rewrite the Stein operator:

    [A_p f](x) = (1/p(x)) (d/dx) ( f(x) p(x) ) = f(x) (d/dx) log p(x) + (d/dx) f(x)

Can we define "Stein features"?

    [A_p f](x) = ( (d/dx) log p(x) ) f(x) + (d/dx) f(x) =: ⟨ f, ξ(x) ⟩,

with Stein features ξ(x) satisfying E_{x~p} ξ(x) = 0.

SLIDE 28

The kernel trick for derivatives

Reproducing property for the derivative: for differentiable k(x, x′),

    (d/dx) f(x) = ⟨ f, (d/dx) φ(x) ⟩        (a)

Using the kernel derivative trick in (a),

    [A_p f](x) = ( (d/dx) log p(x) ) f(x) + (d/dx) f(x)
               = ⟨ f, ( (d/dx) log p(x) ) φ(x) + (d/dx) φ(x) ⟩
               =: ⟨ f, ξ(x) ⟩_F ,

which identifies the Stein features ξ(x).

SLIDE 30

Kernel Stein discrepancy: derivation

Closed-form expression for KSD: given independent x, x′ ~ Q,

    KSD_p(Q) = sup_{‖g‖_F ≤ 1} E_{x~q} [A_p g](x)
             = sup_{‖g‖_F ≤ 1} E_{x~q} ⟨ g, ξ_x ⟩_F
             = sup_{‖g‖_F ≤ 1} ⟨ g, E_{x~q} ξ_x ⟩_F        (a)
             = ‖ E_{x~q} ξ_x ‖_F

Caution: step (a) requires a condition for the Riesz theorem to hold,

    E_{x~q} [ ( (d/dx) log p(x) )² ] < ∞.

Chwialkowski, Strathmann, G. (ICML 2016); Liu, Lee, Jordan (ICML 2016)

SLIDE 33

The witness function: Chicago Crime

Model p = 10-component Gaussian mixture.
The witness function g shows the mismatch.

SLIDE 35

Does the Riesz condition matter?

Consider the standard normal, p(x) = (1/√(2π)) exp(−x²/2). Then (d/dx) log p(x) = −x.

If q is a Cauchy distribution, then the integral

    E_{x~q} [ ( (d/dx) log p(x) )² ] = ∫_{−∞}^{∞} x² q(x) dx

is undefined.
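A quick numerical illustration, assuming q = Cauchy (sample size arbitrary): the running average of x² never settles, so the Riesz condition fails:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(1_000_000)       # samples from q = Cauchy

# running average of (d/dx log p(x))^2 = x^2 under q: does not converge
running_mean = np.cumsum(x ** 2) / np.arange(1, len(x) + 1)
print(running_mean[[999, 99_999, 999_999]])   # keeps jumping as n grows
```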

SLIDE 37

Kernel Stein discrepancy: population expression

Test statistic:

    KSD²_p(Q) = ‖ E_{x~q} ξ_x ‖²_F = E_{x,x′~Q} h_p(x, x′)

where

    h_p(x, x′) = s_p(x)ᵀ s_p(x′) k(x, x′) + s_p(x)ᵀ k₂(x, x′) + s_p(x′)ᵀ k₁(x, x′) + tr[ k₁₂(x, x′) ]

with

    s_p(x) := ∇p(x)/p(x) ∈ ℝ^D
    k₁(a, b) := ∇_x k(x, x′)|_{x=a, x′=b} ∈ ℝ^D
    k₂(a, b) := ∇_{x′} k(x, x′)|_{x=a, x′=b} ∈ ℝ^D
    k₁₂(a, b) := ∇_x ∇_{x′} k(x, x′)|_{x=a, x′=b} ∈ ℝ^{D×D}

We do not need to normalize p, or to sample from it.

If the kernel is C₀-universal and Q satisfies E_{x~Q} ‖ ∇ log( p(x)/q(x) ) ‖² < ∞, then KSD²_p(Q) = 0 iff P = Q.
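A 1-D Python sketch of h_p and the resulting KSD² estimate, assuming a Gaussian model p = N(0, 1) (score s_p(x) = −x) and an RBF kernel; the data and bandwidth are illustrative:

```python
import numpy as np

def ksd2_u_stat(x, score, bandwidth=1.0):
    """U-statistic estimate of KSD^2_p(Q) in 1-D with an RBF kernel.

    x: samples from Q; score(x) returns d/dx log p(x)."""
    s = score(x)
    d = x[:, None] - x[None, :]                  # pairwise differences x - x'
    K = np.exp(-d ** 2 / (2 * bandwidth ** 2))
    K1 = -d / bandwidth ** 2 * K                 # d/dx  k(x, x')
    K2 = -K1                                     # d/dx' k(x, x')
    K12 = (1.0 / bandwidth ** 2 - d ** 2 / bandwidth ** 4) * K
    H = s[:, None] * s[None, :] * K + s[:, None] * K2 + s[None, :] * K1 + K12
    n = len(x)
    return (H.sum() - np.trace(H)) / (n * (n - 1))   # drop i = j terms

rng = np.random.default_rng(0)
score_p = lambda x: -x                           # p = N(0, 1); normalization never used
print(ksd2_u_stat(rng.normal(size=500), score_p))        # Q = P: near 0
print(ksd2_u_stat(rng.normal(1.0, 1.0, 500), score_p))   # Q != P: clearly positive
```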

SLIDE 41

KSD for discrete-valued variables

Discrete domains: X = {1, …, L}^D with L ∈ ℕ.

The population KSD (discrete case):

    KSD²_p(Q) = E_{x,x′~Q} h_p(x, x′)

where

    h_p(x, x′) = s_p(x)ᵀ s_p(x′) k(x, x′) − s_p(x)ᵀ k₂(x, x′) − s_p(x′)ᵀ k₁(x, x′) + tr[ k₁₂(x, x′) ]

with k₁(x, x′) = Δ¹_x k(x, x′), where Δ¹_x is a difference operator on x, and s_p(x) = Δp(x)/p(x).

A discrete kernel: k(x, x′) = exp(−d_H(x, x′)), where d_H(x, x′) = (1/D) Σ_{d=1}^D 𝕀(x_d ≠ x′_d).

KSD²_p(Q) = 0 iff P = Q, provided that the Gram matrix over all the configurations in X is strictly positive definite, and P > 0 and Q > 0.

Ranganath et al. (NeurIPS 2016), Yang et al. (ICML 2018)
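A Python sketch of the exponentiated-Hamming kernel and the Gram-matrix condition above, for binary strings (D = 3, L = 2; illustrative, not the talk's code):

```python
import numpy as np
from itertools import product

def hamming_kernel(x, y):
    """k(x, x') = exp(-d_H(x, x')), with d_H the normalized Hamming distance."""
    x, y = np.asarray(x), np.asarray(y)
    return np.exp(-np.mean(x != y))

# Gram matrix over all configurations of X = {0, 1}^D, D = 3
configs = list(product([0, 1], repeat=3))
G = np.array([[hamming_kernel(a, b) for b in configs] for a in configs])

# the slide's condition: strict positive definiteness of the Gram matrix
print(np.linalg.eigvalsh(G).min() > 0)   # True for this kernel
```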

SLIDE 44

Empirical statistic, asymptotic normality for P ≠ Q

The empirical statistic (a U-statistic):

    \widehat{KSD}²_p(Q) := 1/(n(n−1)) Σ_{i≠j} h_p(x_i, x_j)

Asymptotic distribution when P ≠ Q:

    √n ( \widehat{KSD}²_p(Q) − KSD²_p(Q) ) →_d N(0, σ²_{h_p}),    σ²_{h_p} = 4 Var[ E_{x′}[ h_p(x, x′) ] ]

[Plot: the statistic's sampling distribution, centred at KSD²_p(Q)]

SLIDE 46

Relative goodness-of-fit testing

Two generative models P and Q, data {x_i}_{i=1}^n ~ R.

Neither model gives a perfect fit (P ≠ R and Q ≠ R).

SLIDE 47

Joint asymptotic normality

Joint asymptotic normality when P ≠ R and Q ≠ R:

    √n [ \widehat{KSD}²_p(R) − KSD²_p(R),  \widehat{KSD}²_q(R) − KSD²_q(R) ]ᵀ
        →_d N( 0, [ σ²_{h_p}  σ_{h_p h_q} ;  σ_{h_p h_q}  σ²_{h_q} ] )

The difference in statistics is therefore asymptotically normal:

    √n ( ( \widehat{KSD}²_p(R) − \widehat{KSD}²_q(R) ) − ( KSD²_p(R) − KSD²_q(R) ) )
        →_d N( 0, σ²_{h_p} + σ²_{h_q} − 2σ_{h_p h_q} )

⇒ a statistical test with null hypothesis KSD²_p(R) − KSD²_q(R) ≤ 0 is straightforward.

[Plot: joint sampling distribution of the two statistics]
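A Python sketch of the resulting one-sided test, assuming h_p and h_q have already been evaluated into matrices Hp, Hq on the shared sample from R; the variance plug-in below is one standard choice for U-statistics, not necessarily the talk's:

```python
import numpy as np
from scipy.stats import norm

def rel_ksd_test(Hp, Hq, alpha=0.05):
    """Relative test from U-statistic kernel matrices Hp[i, j] = h_p(x_i, x_j),
    Hq[i, j] = h_q(x_i, x_j), on shared data x_1, ..., x_n ~ R.

    Null: KSD^2_p(R) - KSD^2_q(R) <= 0 (P fits at least as well as Q)."""
    n = Hp.shape[0]
    off = ~np.eye(n, dtype=bool)
    stat = (Hp[off].sum() - Hq[off].sum()) / (n * (n - 1))
    # plug-in variance: 4 * Var of the conditional means E_x'[h_p - h_q]
    g = (Hp.sum(axis=1) - np.diag(Hp) - Hq.sum(axis=1) + np.diag(Hq)) / (n - 1)
    sigma2 = 4.0 * g.var()
    threshold = norm.ppf(1 - alpha) * np.sqrt(sigma2 / n)
    # reject => evidence that P has the larger discrepancy, i.e. Q fits better
    return stat, stat > threshold
```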

SLIDE 49

Latent variable models

Can we compare latent variable models with KSD?

    p(x) = ∫ p(x|z) p(z) dz,    q(y) = ∫ q(y|w) p(w) dw

[Figure: graphical models for the two latent variable models, with latents Z, W and observables X, Y]

Recall the multi-dimensional Stein operator:

    [A_p f](x) = ⟨ ∇p(x)/p(x), f(x) ⟩ + ⟨ ∇, f(x) ⟩

The first term, ∇p(x)/p(x), requires the marginal p(x), often intractable… but sampling from the model can be straightforward!

SLIDE 51

Monte Carlo approximation

Approximate the integral using {z_j}_{j=1}^m ~ p(z):

    p(x) = ∫ p(x|z) p(z) dz ≈ p_m(x) = (1/m) Σ_{j=1}^m p(x|z_j)

Estimate the KSDs with the approximate densities:

    \widehat{KSD}²_p(R) − \widehat{KSD}²_q(R) ≈ \widehat{KSD}²_{p_m}(R) − \widehat{KSD}²_{q_m}(R)

Recall

    √n ( ( \widehat{KSD}²_p(R) − \widehat{KSD}²_q(R) ) − ( KSD²_p(R) − KSD²_q(R) ) ) →_d N( 0, σ²_{h_p} + σ²_{h_q} − 2σ_{h_p h_q} )

→ if m is large, can we simply substitute p_m and q_m?

SLIDE 53

Simple proof of concept

Check \widehat{KSD}²_p(R) ≈ \widehat{KSD}²_{p_m}(R) with a toy model:

Model: Beta-Binomial BetaBinom(a, b):

    p(x|z) = C(N, x) z^x (1 − z)^{N−x},    p(z) = Beta(a, b)

  • Latent z ∈ (0, 1): success probability for the binomial likelihood
  • Marginal p(x): tractable (given by the beta function)

Generate √n \widehat{KSD}²_p(R) and √n \widehat{KSD}²_{p_m}(R) → what do their distributions look like?
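A Python sketch contrasting the exact Beta-Binomial marginal with its Monte Carlo counterpart p_m, for illustrative parameter values:

```python
import numpy as np
from scipy.special import betaln, comb
from scipy.stats import binom

N, a, b = 10, 2.0, 3.0

def p_exact(x):
    """Exact Beta-Binomial marginal, via the beta function:
    p(x) = C(N, x) * B(x + a, N - x + b) / B(a, b)."""
    return comb(N, x) * np.exp(betaln(x + a, N - x + b) - betaln(a, b))

def p_mc(x, m=1000, seed=0):
    """Monte Carlo marginal p_m(x) = (1/m) sum_j Binom(x | N, z_j), z_j ~ Beta(a, b)."""
    z = np.random.default_rng(seed).beta(a, b, size=m)
    return binom.pmf(x, N, z).mean()

x = 4
print(p_exact(x), p_mc(x))   # close for large m
```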

SLIDE 54

Effect of sampling the latents (Beta-Binomial)

[Plot: distributions of the √n-scaled U-statistics, √n \widehat{KSD}²_p vs √n \widehat{KSD}²_{p_m}]

SLIDE 57

Why this happens

  • The approximation p_m gives a random draw KSD²_{p_m}(R), normally distributed around KSD²_p(R) (approximation error).
  • Given p_m, the estimate \widehat{KSD}²_{p_m}(R) is normally distributed around KSD²_{p_m}(R).
  • The distribution of \widehat{KSD}²_{p_m}(R) is therefore averaged over the random draws of KSD²_{p_m}(R).
  • Hence \widehat{KSD}²_{p_m}(R) has a higher variance than \widehat{KSD}²_p(R).

SLIDE 63

Correction for this effect

Beta-Binomial models with p vs q = p_m: numerical vs closed-form marginalisation.

  • The naive relative KSD test has incorrect Type-I error: since q = p_m ≠ p, its rejection rate → 1 as n → ∞.
  • With a correction for the increased variance of \widehat{KSD}²_{p_m}(R), the null is accepted w.p. 1 − α.

Setup: P = BetaBinom(5+a, 1+b), Q = p_m, R = BetaBinom(a, b), k(x, x′) = exp(−𝕀(x ≠ x′)), α = 0.05.

[Plot: rejection rate vs sample size n, for KSD without the corrected threshold (m = 100), KSD (m = 1000), LKSD (KSD for latent models, m = 100), and LKSD (m = 1000)]

SLIDE 65

Asymptotics for approximate KSD

We have asymptotic normality for KSD²_{p_m}(R):

    √m ( KSD²_{p_m}(R) − KSD²_p(R) ) →_d N(0, γ²_p)

The fine print:
  • inf_x p(x) > 0 and sup_x | dp(x)/dx | < ∞
  • (Uniform CLT) The likelihoods { p(x|·) : x ∈ X } and derivatives { (d/dx) p(x|·) : x ∈ X } form a p(z)-Donsker class.

SLIDE 66

Asymptotic distribution for the relative KSD test

Asymptotic distribution of the approximate KSD estimate, as (n, m) → ∞ with n/m → r ∈ [0, ∞):

    √n ( ( \widehat{KSD}²_{p_m}(R) − \widehat{KSD}²_{q_m}(R) ) − ( KSD²_p(R) − KSD²_q(R) ) ) →_d N(0, c²)

where c = σ_pq √( 1 + r (γ_pq/σ_pq)² ), with

    γ²_pq = lim_{m→∞} m · Var[ E_{x,x′} h_{p_m}(x, x′) − E_{x,x′} h_{q_m}(x, x′) ]
    σ²_pq = lim_{n→∞} n · Var[ \widehat{KSD}²_p(R) − \widehat{KSD}²_q(R) ]

Fine print:
  • h_p(x, x′) − h_q(x, x′) has a finite third moment
  • an additional technical condition (next slide)
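A Python sketch of the corrected rejection threshold implied by the limit above; how σ²_pq and γ²_pq are estimated is left open here (e.g. γ²_pq via repeated redraws of the latents — an assumption, not a detail given on the slide):

```python
import numpy as np
from scipy.stats import norm

def corrected_threshold(sigma2_pq, gamma2_pq, n, m, alpha=0.05):
    """One-sided threshold for the relative test with MC-marginalised models.

    Uses c^2 = sigma2_pq * (1 + (n/m) * gamma2_pq / sigma2_pq), from the slide above."""
    r = n / m
    c2 = sigma2_pq * (1.0 + r * gamma2_pq / sigma2_pq)
    return norm.ppf(1 - alpha) * np.sqrt(c2 / n)

# usage: reject the null when the statistic exceeds the corrected threshold, e.g.
# stat > corrected_threshold(sigma2_hat, gamma2_hat, n=300, m=100)
```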

SLIDE 67

Main theorem

Theorem (Asymptotic distribution of a random-kernel U-statistic)

Let
  • U_{n,m}: a U-statistic defined by a random U-statistic kernel H_m
  • U_n: a U-statistic defined by a fixed U-statistic kernel h

Assume that
  • σ²_{H_m} → σ²_h in probability
  • ν₃(H_m) → ν₃(h) < ∞ in probability, where ν₃(H_m) = E_{x,x′} | H_m(x, x′) − E_{x,x′} H_m(x, x′) |³
  • Y_m := √m ( E_n[U_{n,m} | H_m] − E_n[U_n] ) →_d Y

Then, with n/m → r ∈ [0, ∞),

    lim_{n,m→∞} Pr[ √n ( U_{n,m} − E_n U_n ) < t ] = E_Y [ Φ( (t − √r Y) / σ_h ) ],

with Φ the standard normal CDF.

SLIDE 70

Experiment: sensitivity to model difference

Data R = Sigmoid Belief Network SBN(W):

    R(x|z) = sigmoid(Wz),    R(z) = N(0, I),    W ∈ ℝ^{30×10}

Models:

    P = SBN(W + ε[1, 0, …, 0]),    Q = SBN(W + [1, 0, …, 0])

Only the first column of the weight matrix W is perturbed, by ε.

Two scenarios (Hamming kernel, sample size n = 300, α = 0.05):
  • Null: ε ≤ 1
  • Alternative: ε > 1 (the higher the rejection rate, the better)

[Plot: rejection rate vs perturbation ε, for MMD, LKSD (KSD for latent models, m = 100), and LKSD (m = 1000)]

Findings:
  • KSD has higher power for ε > 1.
  • The sample-wise difference between the models is subtle, so MMD fails.
  • The model's information is better utilised by the KSD-based test.

SLIDE 73

Questions?