
The Maximum Mean Discrepancy for Training Generative Adversarial Networks
Arthur Gretton, Gatsby Computational Neuroscience Unit, University College London. Cardiff, 2018.
A motivation: comparing two samples. Given: samples from unknown distributions P and Q.


  1. MMD as an integral probability metric. Maximum mean discrepancy: smooth function for P vs Q,
$$\mathrm{MMD}(P, Q; \mathcal{F}) := \sup_{\|f\|_{\mathcal{F}} \le 1} \left[ \mathbf{E}_P f(X) - \mathbf{E}_Q f(Y) \right] \qquad (\mathcal{F} = \text{unit ball in an RKHS})$$
For a characteristic RKHS $\mathcal{F}$, $\mathrm{MMD}(P, Q; \mathcal{F}) = 0$ iff $P = Q$. Other choices for the witness function class: bounded continuous [Dudley, 2002]; bounded variation 1 (Kolmogorov metric) [Müller, 1997]; bounded Lipschitz (Wasserstein distances) [Dudley, 2002].

  2. Integral prob. metric vs feature difference. The MMD:
$$\mathrm{MMD}(P, Q; \mathcal{F}) = \sup_{f \in \mathcal{F}} \left[ \mathbf{E}_P f(X) - \mathbf{E}_Q f(Y) \right]$$
[Figure: witness function $f$ for Gauss and Laplace densities; the plot shows the two probability densities and $f$.]

  3. Integral prob. metric vs feature difference. The MMD: use $\mathbf{E}_P f(X) = \langle \mu_P, f \rangle_{\mathcal{F}}$, so
$$\mathrm{MMD}(P, Q; \mathcal{F}) = \sup_{f \in \mathcal{F}} \left[ \mathbf{E}_P f(X) - \mathbf{E}_Q f(Y) \right] = \sup_{f \in \mathcal{F}} \langle f, \mu_P - \mu_Q \rangle_{\mathcal{F}}$$

  4.–6. Integral prob. metric vs feature difference (build-up slides repeating the expression above; the accompanying figures show candidate functions $f$ in the unit ball and the maximising witness $f^*$).

  7. Integral prob. metric vs feature difference. The MMD:
$$\mathrm{MMD}(P, Q; \mathcal{F}) = \sup_{f \in \mathcal{F}} \left[ \mathbf{E}_P f(X) - \mathbf{E}_Q f(Y) \right] = \sup_{f \in \mathcal{F}} \langle f, \mu_P - \mu_Q \rangle_{\mathcal{F}} = \| \mu_P - \mu_Q \|_{\mathcal{F}}$$
The function view and the feature view are equivalent.
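A tiny NumPy sketch of the feature view, for intuition: with an explicit finite-dimensional feature map (the map `phi` below is purely illustrative, not from the slides), the empirical MMD is simply the Euclidean distance between mean feature vectors.

```python
# Minimal sketch of the feature view of MMD, assuming a hand-picked
# finite-dimensional feature map phi (illustrative only).
import numpy as np

def phi(x):
    # hypothetical 3-dimensional feature map on scalar samples
    return np.stack([x, x ** 2, np.sin(x)], axis=1)

def mmd_feature_view(x, y):
    mu_p = phi(x).mean(axis=0)   # empirical mean embedding of P
    mu_q = phi(y).mean(axis=0)   # empirical mean embedding of Q
    return np.linalg.norm(mu_p - mu_q)   # MMD = || mu_P - mu_Q ||
```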

  8. Construction of MMD witness. Construction of the empirical witness function (proof: next slide!). Observe $X = \{x_1, \ldots, x_n\} \sim P$ and $Y = \{y_1, \ldots, y_n\} \sim Q$.

  9.–11. Construction of MMD witness (build-up slides): the figure evaluates the empirical witness function, $\mathrm{witness}(v)$, at a point $v$.

  12. Derivation of empirical witness function. Recall the witness function expression $f^* \propto \mu_P - \mu_Q$.

  13.–15. Derivation of empirical witness function (build-up slides leading to the full derivation on the next slide).

  16. Derivation of empirical witness function. Recall the witness function expression $f^* \propto \mu_P - \mu_Q$. The empirical feature mean for $P$ is
$$\hat{\mu}_P := \frac{1}{n} \sum_{i=1}^{n} \varphi(x_i)$$
The empirical witness function at $v$:
$$f^*(v) = \langle f^*, \varphi(v) \rangle_{\mathcal{F}} \propto \langle \hat{\mu}_P - \hat{\mu}_Q, \varphi(v) \rangle_{\mathcal{F}} = \frac{1}{n} \sum_{i=1}^{n} k(x_i, v) - \frac{1}{n} \sum_{i=1}^{n} k(y_i, v)$$
We don't need explicit feature coefficients $f^* := \begin{bmatrix} f_1^* & f_2^* & \ldots \end{bmatrix}$.
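As a concrete illustration, here is a small NumPy sketch of the empirical witness function above; the Gaussian kernel and its bandwidth are assumptions for the example, not fixed by the slides.

```python
# Sketch: empirical witness, witness(v) ∝ mean_i k(x_i, v) - mean_i k(y_i, v),
# with an assumed Gaussian kernel.
import numpy as np

def gaussian_kernel_vec(a, v, sigma=1.0):
    # a: (n, d) sample array, v: (d,) evaluation point
    return np.exp(-np.sum((a - v) ** 2, axis=1) / (2.0 * sigma ** 2))

def empirical_witness(x, y, v, sigma=1.0):
    return (gaussian_kernel_vec(x, v, sigma).mean()
            - gaussian_kernel_vec(y, v, sigma).mean())
```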

  17. Interlude: divergence measures.

  18.–22. Divergences (figure slides). Sriperumbudur, Fukumizu, G, Schoelkopf, Lanckriet (2012).

  23. Two-Sample Testing with MMD.

  24. A statistical test using MMD. The empirical MMD:
$$\widehat{\mathrm{MMD}}^2 = \frac{1}{n(n-1)} \sum_{i \neq j} k(x_i, x_j) + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j) - \frac{2}{n^2} \sum_{i,j} k(x_i, y_j)$$
How does this help decide whether $P = Q$?

  25. A statistical test using MMD (continued). Perspective from statistical hypothesis testing: under the null hypothesis $H_0$ ($P = Q$) we should see $\widehat{\mathrm{MMD}}^2$ "close to zero"; under the alternative hypothesis $H_1$ ($P \neq Q$) we should see $\widehat{\mathrm{MMD}}^2$ "far from zero".

  26. A statistical test using MMD (continued). We want a threshold $c_\alpha$ for $\widehat{\mathrm{MMD}}^2$ that gives false positive rate $\alpha$.
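A minimal NumPy sketch of the empirical statistic above (the unbiased estimator with the $i \neq j$ sums), assuming equal sample sizes and a Gaussian kernel; both choices are illustrative.

```python
# Sketch: unbiased MMD^2 estimator with an assumed Gaussian kernel.
import numpy as np

def kernel_matrix(a, b, sigma=1.0):
    d2 = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2_unbiased(x, y, sigma=1.0):
    n = x.shape[0]
    k_xx, k_yy = kernel_matrix(x, x, sigma), kernel_matrix(y, y, sigma)
    k_xy = kernel_matrix(x, y, sigma)
    # i != j sums: drop the diagonal of the within-sample kernel matrices
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()
```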

  27. Behaviour of $\widehat{\mathrm{MMD}}^2$ when $P \neq Q$. Draw $n = 200$ i.i.d. samples from $P$ and $Q$: Laplace distributions with different y-variance; $\sqrt{n} \times \widehat{\mathrm{MMD}}^2 = 1.2$. [Figure: scatter plot of the two samples.]

  28. Behaviour of $\widehat{\mathrm{MMD}}^2$ when $P \neq Q$. Same draw of $n = 200$ i.i.d. samples from $P$ and $Q$ (Laplace with different y-variance); $\sqrt{n} \times \widehat{\mathrm{MMD}}^2 = 1.2$. [Figure: scatter plot of the samples, with the observed value of the statistic.]

  29. Behaviour of $\widehat{\mathrm{MMD}}^2$ when $P \neq Q$. Draw $n = 200$ new samples from $P$ and $Q$ (Laplace with different y-variance); $\sqrt{n} \times \widehat{\mathrm{MMD}}^2 = 1.5$. [Figure: scatter plot of the samples, with the observed value of the statistic.]

  30.–32. Behaviour of $\widehat{\mathrm{MMD}}^2$ when $P \neq Q$. Repeat this 150 times, 300 times, 3000 times... [Figures: histograms of the statistic over the repetitions.]

  33. Asymptotics of $\widehat{\mathrm{MMD}}^2$ when $P \neq Q$. When $P \neq Q$, the statistic is asymptotically normal:
$$\frac{\widehat{\mathrm{MMD}}^2 - \mathrm{MMD}^2(P, Q)}{\sqrt{V_n(P, Q)}} \xrightarrow{D} \mathcal{N}(0, 1),$$
where the variance $V_n(P, Q) = O\!\left(n^{-1}\right)$. [Figure: two Laplace distributions with different variances; empirical PDF of the statistic with a Gaussian fit.]

  34. Behaviour of $\widehat{\mathrm{MMD}}^2$ when $P = Q$. What happens when $P$ and $Q$ are the same?

  35.–39. Behaviour of $\widehat{\mathrm{MMD}}^2$ when $P = Q$: case of $P = Q = \mathcal{N}(0, 1)$ (repeated build-up slides). [Figure: histogram of the statistic over repetitions.]

  40. Asymptotics of $\widehat{\mathrm{MMD}}^2$ when $P = Q$. Where $P = Q$, the statistic has the asymptotic distribution
$$n\,\widehat{\mathrm{MMD}}^2 \sim \sum_{l=1}^{\infty} \lambda_l \left[ z_l^2 - 2 \right]$$
where
$$\lambda_i \psi_i(x') = \int \underbrace{\tilde{k}(x, x')}_{\text{centred}}\, \psi_i(x)\, dP(x)$$
and $z_l \sim \mathcal{N}(0, 2)$ i.i.d. [Figure: the null distribution.]

  41. A statistical test. A summary of the asymptotics: [Figure: null and alternative distributions of the statistic.]

  42. A statistical test. Test construction (G., Borgwardt, Rasch, Schoelkopf, and Smola, JMLR 2012). [Figure.]

  43. How do we get test threshold $c_\alpha$? Original empirical MMD for dogs and fish:
$$\widehat{\mathrm{MMD}}^2 = \frac{1}{n(n-1)} \sum_{i \neq j} k(x_i, x_j) + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j) - \frac{2}{n^2} \sum_{i,j} k(x_i, y_j)$$
[Figure: kernel matrix with blocks $k(x_i, x_j)$, $k(x_i, y_j)$, and $k(y_i, y_j)$.]

  44. How do we get test threshold $c_\alpha$? Permuted dog and fish samples (merdogs). [Figure: permuted samples.]

  45. How do we get test threshold $c_\alpha$? Permuted dog and fish samples (merdogs):
$$\widehat{\mathrm{MMD}}^2 = \frac{1}{n(n-1)} \sum_{i \neq j} k(\tilde{x}_i, \tilde{x}_j) + \frac{1}{n(n-1)} \sum_{i \neq j} k(\tilde{y}_i, \tilde{y}_j) - \frac{2}{n^2} \sum_{i,j} k(\tilde{x}_i, \tilde{y}_j)$$
Permutation simulates $P = Q$. [Figure: kernel matrix blocks $k(\tilde{x}_i, \tilde{x}_j)$, $k(\tilde{x}_i, \tilde{y}_j)$, and $k(\tilde{y}_i, \tilde{y}_j)$ on the permuted samples.]
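A sketch of the permutation procedure for the threshold $c_\alpha$, reusing `mmd2_unbiased` from the sketch above: pool the two samples, re-split them at random (the "merdogs"), recompute the statistic many times, and take the $(1-\alpha)$ quantile. The number of permutations here is an arbitrary illustrative choice.

```python
# Sketch: permutation estimate of the test threshold c_alpha.
import numpy as np

def permutation_threshold(x, y, alpha=0.05, n_perm=500, sigma=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    pooled = np.concatenate([x, y], axis=0)
    n = x.shape[0]
    stats = []
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])   # simulate P = Q by shuffling labels
        stats.append(mmd2_unbiased(pooled[idx[:n]], pooled[idx[n:]], sigma))
    return np.quantile(stats, 1.0 - alpha)

# Reject H0 (P = Q) if mmd2_unbiased(x, y) > permutation_threshold(x, y).
```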

  46. How to choose the best kernel: optimising the kernel parameters.

  47. Graphical illustration. Maximising test power is the same as minimising false negatives. [Figure: null and alternative distributions with the test threshold.]

  48. Optimizing kernel for test power. The power of our test ($\Pr_1$ denotes probability under $P \neq Q$):
$$\Pr_1\!\left( n\,\widehat{\mathrm{MMD}}^2 > \hat{c}_\alpha \right)$$

  49. Optimizing kernel for test power. The power of our test:
$$\Pr_1\!\left( n\,\widehat{\mathrm{MMD}}^2 > \hat{c}_\alpha \right) \to \Phi\!\left( \frac{\mathrm{MMD}^2(P, Q)}{\sqrt{V_n(P, Q)}} - \frac{c_\alpha}{n \sqrt{V_n(P, Q)}} \right)$$
where $\Phi$ is the CDF of the standard normal distribution and $\hat{c}_\alpha$ is an estimate of the test threshold $c_\alpha$.

  50. Optimizing kernel for test power. In the limit above, the first term is $O(n^{1/2})$ and the second is $O(n^{-1/2})$: the variance under $H_1$ decreases as $\sqrt{V_n(P, Q)} \sim O(n^{-1/2})$, so for large $n$ the second term is negligible!

  51. Optimizing kernel for test power. To maximize test power, maximize
$$\frac{\mathrm{MMD}^2(P, Q)}{\sqrt{V_n(P, Q)}}$$
(Sutherland, Tung, Strathmann, De, Ramdas, Smola, G., ICLR 2017). Code: github.com/dougalsutherland/opt-mmd
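For intuition only, here is a rough sketch of the power criterion $\mathrm{MMD}^2 / \sqrt{V_n}$ in which the variance is approximated very crudely from block-wise estimates of the statistic; Sutherland et al. (ICLR 2017) derive a proper closed-form variance estimator (see the opt-mmd repository linked above), so treat this as an assumption-laden stand-in rather than the method from the paper.

```python
# Sketch: crude estimate of the power criterion MMD^2 / sqrt(V_n), using the
# variance of block-wise statistics as a stand-in for V_n (not the estimator
# from the paper). Reuses mmd2_unbiased from the earlier sketch.
import numpy as np

def power_criterion_blocks(x, y, sigma=1.0, n_blocks=10):
    n = x.shape[0]
    m = n // n_blocks
    block_stats = [mmd2_unbiased(x[i * m:(i + 1) * m], y[i * m:(i + 1) * m], sigma)
                   for i in range(n_blocks)]
    full_stat = mmd2_unbiased(x, y, sigma)
    v_n = np.var(block_stats) / n_blocks   # crude proxy for the full-sample variance
    return full_stat / np.sqrt(v_n + 1e-12)
```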

  52. Troubleshooting for generative adversarial networks. [Figures: samples from a GAN; MNIST samples.]

  53. Troubleshooting for generative adversarial networks. [Figures: samples from a GAN; MNIST samples; the learned ARD map.] Power for optimized ARD kernel: 1.00 at $\alpha = 0.01$. Power for optimized RBF kernel: 0.57 at $\alpha = 0.01$.

  54. Troubleshooting generative adversarial networks.

  55. Training GANs with MMD.

  56.–59. What is a Generative Adversarial Network (GAN)? (Figure build-up slides.)

  60. Why is classification not enough?

  61. MMD for GAN critic. Can you use MMD as a critic to train GANs? From ICML 2015 and from UAI 2015: [Figures.]

  62. MMD for GAN critic. Can you use MMD as a critic to train GANs? Need better image features.

  63. How to improve the critic witness. Add convolutional features! The critic (teacher) also needs to be trained. How to regularise? MMD GAN, Li et al. [NIPS 2017]; Coulomb GAN, Unterthiner et al. [ICLR 2018].

  64. WGAN-GP. Wasserstein GAN, Arjovsky et al. [ICML 2017]; WGAN-GP, Gulrajani et al. [NIPS 2017].

  65. WGAN-GP. Wasserstein GAN, Arjovsky et al. [ICML 2017]; WGAN-GP, Gulrajani et al. [NIPS 2017]. Given a generator $G_\theta$ with parameters $\theta$ to be trained; samples $Y \sim G_\theta(Z)$ where $Z \sim R$. Given critic features $h_\psi$ with parameters $\psi$ to be trained; $f_\psi$ is a linear function of $h_\psi$.

  66. WGAN-GP. With the setup of the previous slide, the WGAN-GP objective with gradient penalty is
$$\max_\psi\; \mathbf{E}_{X \sim P}\, f_\psi(X) - \mathbf{E}_{Z \sim R}\, f_\psi(G_\theta(Z)) + \lambda\, \mathbf{E}_{\tilde{X}} \left( \left\| \nabla_{\tilde{X}} f_\psi(\tilde{X}) \right\| - 1 \right)^2$$
where $\tilde{X} = \gamma x_i + (1 - \gamma) G_\theta(z_j)$, with $x_i \in \{x_\ell\}_{\ell=1}^m$, $z_j \in \{z_\ell\}_{\ell=1}^n$, and $\gamma \sim U([0, 1])$.
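A minimal PyTorch sketch of the gradient-penalty term above, assuming a critic `f_psi` that maps a batch of images to scalars and 4-D image tensors; the names and the penalty weight are illustrative, not from the slides.

```python
# Sketch: WGAN-GP style gradient penalty on random interpolates.
import torch

def gradient_penalty(f_psi, x_real, x_gen, lam=10.0):
    n = x_real.size(0)
    gamma = torch.rand(n, 1, 1, 1, device=x_real.device)    # gamma ~ U([0, 1])
    x_tilde = (gamma * x_real + (1.0 - gamma) * x_gen).detach().requires_grad_(True)
    grads = torch.autograd.grad(outputs=f_psi(x_tilde).sum(),
                                inputs=x_tilde, create_graph=True)[0]
    grad_norm = grads.view(n, -1).norm(2, dim=1)
    return lam * ((grad_norm - 1.0) ** 2).mean()             # lam * E(||grad|| - 1)^2
```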

  67. The (W)MMD. Train the MMD critic features with the witness function gradient penalty; Binkowski, Sutherland, Arbel, G. [ICLR 2018], Bellemare et al. [2017] for the energy distance:
$$\max_\psi\; \mathrm{MMD}^2\!\left( h_\psi(X),\, h_\psi(G_\theta(Z)) \right) + \lambda\, \mathbf{E}_{\tilde{X}} \left( \left\| \nabla_{\tilde{X}} f_\psi(\tilde{X}) \right\| - 1 \right)^2$$
where $\tilde{X} = \gamma x_i + (1 - \gamma) G_\theta(z_j)$, with $x_i \in \{x_\ell\}_{\ell=1}^m$, $z_j \in \{z_\ell\}_{\ell=1}^n$, and $\gamma \sim U([0, 1])$. Remark by Bottou et al. (2017): the gradient penalty modifies the function class, so the critic is not an MMD in the RKHS $\mathcal{F}$.
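To make the objective concrete, here is a hedged PyTorch sketch of the MMD$^2$ term computed on critic features $h_\psi$, using a Gaussian kernel and a biased estimate for brevity; the kernel, bandwidth, and the `witness_gradient_penalty` placeholder are illustrative assumptions, not the exact choices of Binkowski et al.

```python
# Sketch: MMD^2 between critic features with an assumed Gaussian kernel
# (biased estimate, kept short on purpose).
import torch

def gaussian_gram(a, b, sigma=1.0):
    return torch.exp(-torch.cdist(a, b) ** 2 / (2.0 * sigma ** 2))

def mmd2_features(feat_x, feat_y, sigma=1.0):
    return (gaussian_gram(feat_x, feat_x, sigma).mean()
            + gaussian_gram(feat_y, feat_y, sigma).mean()
            - 2.0 * gaussian_gram(feat_x, feat_y, sigma).mean())

# Critic step (sketch), following the slide's objective:
#   objective = mmd2_features(h_psi(x_real), h_psi(x_gen)) + witness_gradient_penalty(...)
#   maximise over psi (i.e. minimise the negative objective with an optimiser).
```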

  68. MMD for GAN critic: revisited. From ICLR 2018: [Figure.]

  69. MMD for GAN critic: revisited. Samples are better! [Figure.]

  70. MMD for GAN critic: revisited. Samples are better! Can we do better still?
