Relative Goodness-of-Fit Tests for Models with Latent Variables

  1. Relative Goodness-of-Fit Tests for Models with Latent Variables. Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London. June 15, 2019.

  2. Model Criticism

  4. Model Criticism. Data = robbery events in Chicago in 2016.

  5. Model Criticism. Is this a good model?

  6. Model Criticism. Goal: test whether a (complicated) model fits the data.

  7. Model Criticism. "All models are wrong." G. Box (1976)

  8. Relative model comparison. Have: two candidate models P and Q, and samples {x_i}_{i=1}^n from a reference distribution R. Goal: which of P and Q is better? [Figure: samples from LSGAN (Mao et al., 2017) and samples from GAN (Goodfellow et al., 2014).] Which model is better?

  9. Most interesting models have latent structure. [Figure: graphical model representation of hierarchical LDA with a nested CRP prior, Blei et al. (2003).]

  10. Outline. Relative goodness-of-fit tests for models with latent variables: the kernel Stein discrepancy (comparing two models via samples: MMD and the witness function; comparing a sample and a model: Stein modification of the witness class); constructing a relative hypothesis test using the KSD; relative hypothesis tests with latent variables (new, unpublished).

  11. Kernel Stein Discrepancy. Model P, data {x_i}_{i=1}^n ~ Q. "All models are wrong" (P ≠ Q).

  12. Integral probability metrics. Integral probability metric: find a "well behaved" function f(x) to maximize E_Q f(Y) − E_P f(X). [Figure: a smooth witness function f(x).]

  14. All of kernel methods. Functions are linear combinations of features: ‖f‖²_F := Σ_{i=1}^∞ f_i².

  15. All of kernel methods. "The kernel trick": f(x) = Σ_{ℓ=1}^∞ f_ℓ φ_ℓ(x) = Σ_{i=1}^m α_i k(x_i, x). [Figure: an RKHS function f(x).]

  16. All of kernel methods. "The kernel trick": f(x) = Σ_{ℓ=1}^∞ f_ℓ φ_ℓ(x) = Σ_{i=1}^m α_i k(x_i, x), with f_ℓ := Σ_{i=1}^m α_i φ_ℓ(x_i). A function of infinitely many features expressed using m coefficients. [Figure: an RKHS function f(x).]
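
To make the kernel trick concrete, here is a minimal numpy sketch (the Gaussian kernel, the centres, and the coefficients α are illustrative assumptions, not from the slides) of a function f(x) = Σ_i α_i k(x_i, x) and its squared RKHS norm, which has the closed form αᵀKα:

```python
import numpy as np

def gauss_kernel(a, b, sigma=1.0):
    """Gaussian RKHS kernel k(a, b) = exp(-(a - b)^2 / (2 sigma^2))."""
    return np.exp(-(a - b) ** 2 / (2.0 * sigma ** 2))

# An RKHS function built from m = 3 centres: f(x) = sum_i alpha_i k(x_i, x).
centres = np.array([-1.0, 0.0, 2.0])
alphas = np.array([0.5, -1.0, 0.8])

def f(x):
    return float(np.sum(alphas * gauss_kernel(centres, x)))

# Squared RKHS norm: ||f||^2_F = alpha^T K alpha, with K the kernel matrix.
K = gauss_kernel(centres[:, None], centres[None, :])
norm_sq = float(alphas @ K @ alphas)
```

Although f lives in an infinite-dimensional feature space, evaluating it and computing its norm only needs the m = 3 coefficients, as the slide states.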

  17. MMD as an integral probability metric. Maximum mean discrepancy: smooth witness function for P vs Q. MMD(P, Q; F) := sup_{‖f‖_F ≤ 1} [E_P f(X) − E_Q f(Y)].

  18. MMD as an integral probability metric. MMD(P, Q; F) := sup_{‖f‖_F ≤ 1} [E_P f(X) − E_Q f(Y)]. [Figure: densities p(x), q(x) and the witness f*(x).]

  19. MMD as an integral probability metric. MMD(P, Q; F) := sup_{‖f‖_F ≤ 1} [E_P f(X) − E_Q f(Y)]. For a characteristic RKHS F, MMD(P, Q; F) = 0 iff P = Q. Other choices for the witness function class: bounded continuous [Dudley, 2002]; bounded variation 1 (Kolmogorov metric) [Müller, 1997]; 1-Lipschitz (Wasserstein distances) [Dudley, 2002].
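
When samples from both distributions are available, the MMD above has a simple plug-in estimate. A sketch (the Gaussian kernel, bandwidth, sample sizes, and the biased V-statistic form are illustrative assumptions):

```python
import numpy as np

def gauss_gram(X, Y, sigma=1.0):
    """Gram matrix of the Gaussian kernel between two 1-D samples."""
    return np.exp(-(X[:, None] - Y[None, :]) ** 2 / (2.0 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased estimate of MMD^2: mean(Kxx) - 2 mean(Kxy) + mean(Kyy)."""
    return (gauss_gram(X, X, sigma).mean()
            - 2.0 * gauss_gram(X, Y, sigma).mean()
            + gauss_gram(Y, Y, sigma).mean())

rng = np.random.default_rng(0)
# Two samples from the same normal vs. a unit mean shift.
same = mmd2_biased(rng.normal(size=500), rng.normal(size=500))
diff = mmd2_biased(rng.normal(size=500), rng.normal(loc=1.0, size=500))
```

Here `same` is close to zero while `diff` is clearly larger; the biased estimate is always nonnegative because it is a squared RKHS norm of a mean-embedding difference.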

  20. Statistical model criticism: toy example. MMD(P, Q) = sup_{‖f‖_F ≤ 1} [E_q f − E_p f]. [Figure: p(x), q(x) and the witness f*(x).] Can we compute the MMD with samples from Q and a model P? Problem: we usually can't compute E_p f in closed form.

  21. Stein idea. To get rid of E_p f in sup_{‖f‖_F ≤ 1} [E_q f − E_p f], define the (1-D) Stein operator [A_p f](x) = (1/p(x)) (d/dx)(f(x) p(x)). Then E_p A_p f = 0, subject to appropriate boundary conditions. Gorham and Mackey (NeurIPS 2015); Oates, Girolami, Chopin (JRSS B 2016).
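
The zero-mean property E_p[A_p f] = 0 is easy to verify numerically. A quadrature sketch for a standard-normal p, whose score is (d/dx) log p(x) = −x, with the arbitrary test function f(x) = sin(x):

```python
import numpy as np

# Integration grid wide enough that the Gaussian tails are negligible.
xs = np.linspace(-10.0, 10.0, 200_001)
dx = xs[1] - xs[0]
p = np.exp(-xs ** 2 / 2.0) / np.sqrt(2.0 * np.pi)  # standard normal density

f = np.sin(xs)    # test function (decays against the Gaussian tails)
df = np.cos(xs)   # its derivative

# Stein operator applied to f: [A_p f](x) = f(x) * (d/dx log p(x)) + f'(x).
Apf = f * (-xs) + df

# E_p[A_p f] should vanish, up to quadrature error.
expectation = float(np.sum(Apf * p) * dx)
```

Equivalently this is Stein's lemma for the normal: E[X sin X] = E[cos X] when X is standard normal.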

  22. Kernel Stein Discrepancy. Stein operator: [A_p f](x) = (1/p(x)) (d/dx)(f(x) p(x)). Kernel Stein Discrepancy (KSD): KSD_p(Q) = sup_{‖g‖_F ≤ 1} [E_q A_p g − E_p A_p g].

  23. Kernel Stein Discrepancy. Stein operator: [A_p f](x) = (1/p(x)) (d/dx)(f(x) p(x)). KSD_p(Q) = sup_{‖g‖_F ≤ 1} [E_q A_p g − E_p A_p g] = sup_{‖g‖_F ≤ 1} E_q A_p g, since the E_p term vanishes.

  24. Kernel Stein Discrepancy. KSD_p(Q) = sup_{‖g‖_F ≤ 1} E_q A_p g. [Figure: p(x), q(x) and the Stein witness g*(x).]

  25. Kernel Stein Discrepancy. KSD_p(Q) = sup_{‖g‖_F ≤ 1} E_q A_p g. [Figure: p(x), q(x) and the Stein witness g*(x).]

  26. Simple expression using kernels. Rewrite the Stein operator as [A_p f](x) = (1/p(x)) (d/dx)(f(x) p(x)) = f(x) (d/dx) log p(x) + (d/dx) f(x). Can we define "Stein features"? [A_p f](x) = ((d/dx) log p(x)) f(x) + (d/dx) f(x) =: ⟨f, ξ(x)⟩_F, where ξ(x) are the Stein features and E_{x~p} ξ(x) = 0.

  28. The kernel trick for derivatives. Reproducing property for the derivative: for differentiable k(x, x′), ⟨f, (d/dx) φ(x)⟩_F = (d/dx) f(x).

  29. The kernel trick for derivatives. Reproducing property for the derivative: for differentiable k(x, x′), ⟨f, (d/dx) φ(x)⟩_F = (d/dx) f(x). Using the kernel derivative trick: [A_p f](x) = ((d/dx) log p(x)) f(x) + (d/dx) f(x) = ⟨f, ((d/dx) log p(x)) φ(x) + (d/dx) φ(x)⟩_F =: ⟨f, ξ(x)⟩_F.
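
The derivative reproducing property can be sanity-checked with finite differences: for f(x) = Σ_i α_i k(x_i, x) it gives f′(x) = Σ_i α_i ∂k(x_i, x)/∂x in closed form. A sketch with an illustrative Gaussian kernel and arbitrary coefficients:

```python
import numpy as np

SIGMA = 1.0
centres = np.array([-1.0, 0.5, 2.0])
alphas = np.array([0.7, -0.4, 1.1])

def k(a, b):
    return np.exp(-(a - b) ** 2 / (2.0 * SIGMA ** 2))

def f(x):
    """RKHS function f(x) = sum_i alpha_i k(x_i, x)."""
    return float(np.sum(alphas * k(centres, x)))

def f_prime(x):
    """Derivative via the reproducing property:
    f'(x) = <f, d/dx phi(x)> = sum_i alpha_i (x_i - x) / sigma^2 * k(x_i, x)."""
    return float(np.sum(alphas * (centres - x) / SIGMA ** 2 * k(centres, x)))

# Central finite difference agrees with the closed form.
x0, h = 0.3, 1e-6
fd = (f(x0 + h) - f(x0 - h)) / (2.0 * h)
```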

  30. Kernel Stein discrepancy: derivation. Closed-form expression for the KSD: given independent x, x′ ~ Q, KSD_p(Q) = sup_{‖g‖_F ≤ 1} E_{x~q} [A_p g](x) = sup_{‖g‖_F ≤ 1} E_{x~q} ⟨g, ξ_x⟩_F = sup_{‖g‖_F ≤ 1} ⟨g, E_{x~q} ξ_x⟩_F = ‖E_{x~q} ξ_x‖_F (a). Chwialkowski, Strathmann, Gretton (ICML 2016); Liu, Lee, Jordan (ICML 2016).

  32. Kernel Stein discrepancy: derivation. KSD_p(Q) = sup_{‖g‖_F ≤ 1} E_{x~q} [A_p g](x) = sup_{‖g‖_F ≤ 1} ⟨g, E_{x~q} ξ_x⟩_F = ‖E_{x~q} ξ_x‖_F (a). Caution: step (a) requires a condition for the Riesz representation theorem to hold: E_{x~q} [((d/dx) log p(x))²] < ∞. Chwialkowski, Strathmann, Gretton (ICML 2016); Liu, Lee, Jordan (ICML 2016).
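
Because ξ_x is built from kernel derivatives, KSD²_p(Q) = ‖E_{x~q} ξ_x‖²_F = E_{x,x′~q} h_p(x, x′) for a closed-form "Stein kernel" h_p, so it can be estimated from samples of Q alone using only the score of p. A 1-D sketch (the Gaussian kernel, bandwidth, the V-statistic form, and the standard-normal model are illustrative assumptions):

```python
import numpy as np

def ksd2_v(x, score, sigma=1.0):
    """V-statistic estimate of KSD^2 with a Gaussian kernel k.

    Uses the Stein kernel
      h_p(a, b) = s(a) s(b) k + s(a) dk/db + s(b) dk/da + d^2 k/(da db),
    where s = d/dx log p is the score of the model."""
    d = x[:, None] - x[None, :]
    k = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    dk_da = -d / sigma ** 2 * k                      # derivative in first arg
    dk_db = d / sigma ** 2 * k                       # derivative in second arg
    d2k = (1.0 / sigma ** 2 - d ** 2 / sigma ** 4) * k
    s = score(x)
    h = (s[:, None] * s[None, :] * k
         + s[:, None] * dk_db + s[None, :] * dk_da + d2k)
    return float(h.mean())

score = lambda x: -x  # score of the standard normal model p
rng = np.random.default_rng(1)
fit = ksd2_v(rng.normal(size=500), score)              # samples match p
misfit = ksd2_v(rng.normal(loc=1.5, size=500), score)  # shifted q
```

Note the estimate never needs samples from p, only its score, which is why an unnormalized model is enough.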

  33. The witness function: Chicago Crime. Model: p = 10-component Gaussian mixture.

  34. The witness function: Chicago Crime. The witness function g shows the mismatch.

  35. Does the Riesz condition matter? Consider the standard normal, p(x) = (1/√(2π)) exp(−x²/2). Then (d/dx) log p(x) = −x. If q is a Cauchy distribution, then the integral E_{x~q} [((d/dx) log p(x))²] = ∫_{−∞}^{∞} x² q(x) dx diverges.
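
A quick numerical illustration of the divergence (the truncation points and grid resolution are arbitrary choices): the second moment of a standard Cauchy, truncated to [−T, T], keeps growing roughly linearly in T instead of converging; analytically it equals (2/π)(T − arctan T).

```python
import numpy as np

def truncated_second_moment(T, n=2_000_001):
    """Integrate x^2 q(x) over [-T, T] for the standard Cauchy density q."""
    xs = np.linspace(-T, T, n)
    q = 1.0 / (np.pi * (1.0 + xs ** 2))
    dx = xs[1] - xs[0]
    return float(np.sum(xs ** 2 * q) * dx)

vals = [truncated_second_moment(T) for T in (10.0, 100.0, 1000.0)]
```

So E_{x~q}[((d/dx) log p(x))²] = E_q[x²] is infinite, and the Riesz condition of the previous slide fails.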
