Cycle-Consistent Adversarial Learning as Approximate Bayesian Inference

  1. Cycle-Consistent Adversarial Learning as Approximate Bayesian Inference
     Louis C. Tiao¹, Edwin V. Bonilla², Fabio Ramos¹
     July 22, 2018
     ¹University of Sydney, ²University of New South Wales

  2. Motivation: Unpaired Image-to-Image Translation
     [Figure 1: From Zhu et al. (2017). Example translations: Monet ↔ photo, zebras ↔ horses, summer ↔ winter; a photograph rendered in the styles of Monet, Van Gogh, Cezanne, and Ukiyo-e. Paired training data {(x_i, y_i)} versus unpaired collections from domains X and Y.]

  3. Cycle-Consistent Adversarial Learning (CycleGAN)
     • Introduced by Kim et al. (2017); Zhu et al. (2017).
     • Forward and reverse mappings µ_θ : z ↦ x and m_φ : x ↦ z.
     • Discriminators D_α and D_β.
     Distribution matching (GAN objectives): yield realistic outputs in the other domain.
       ℓ_gan^reverse(α; φ) = E_{p*(z)}[log D_α(z)] + E_{q*(x)}[log(1 − D_α(m_φ(x)))],
       ℓ_gan^forward(β; θ) = E_{q*(x)}[log D_β(x)] + E_{p*(z)}[log(1 − D_β(µ_θ(z)))].
     Cycle-consistency losses: encourage tighter correspondences, since we must be able to reconstruct the output from the input and vice versa; may alleviate mode collapse.
       ℓ_const^reverse(θ, φ) = E_{q*(x)}[‖x − µ_θ(m_φ(x))‖_ρ^ρ],
       ℓ_const^forward(θ, φ) = E_{p*(z)}[‖z − m_φ(µ_θ(z))‖_ρ^ρ].
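As a concrete reading of the four losses above, here is a minimal NumPy sketch. The mappings m_φ, µ_θ and discriminators D_α, D_β are passed in as plain callables, the discriminators are assumed to output probabilities in (0, 1), and batch averages stand in for the expectations; the names and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gan_objectives(x, z, m_phi, mu_theta, D_alpha, D_beta, eps=1e-8):
    """Monte Carlo estimates of the distribution-matching (GAN) objectives.
    x: batch drawn from q*(x); z: batch drawn from p*(z).
    m_phi, mu_theta: mappings x -> z and z -> x; D_alpha, D_beta: discriminators in (0, 1)."""
    ell_gan_reverse = (np.mean(np.log(D_alpha(z) + eps))
                       + np.mean(np.log(1.0 - D_alpha(m_phi(x)) + eps)))
    ell_gan_forward = (np.mean(np.log(D_beta(x) + eps))
                       + np.mean(np.log(1.0 - D_beta(mu_theta(z)) + eps)))
    return ell_gan_reverse, ell_gan_forward

def cycle_losses(x, z, m_phi, mu_theta, rho=1):
    """Cycle-consistency losses: reconstruct x via z and z via x."""
    ell_const_reverse = np.mean(np.sum(np.abs(x - mu_theta(m_phi(x))) ** rho, axis=-1))
    ell_const_forward = np.mean(np.sum(np.abs(z - m_phi(mu_theta(z))) ** rho, axis=-1))
    return ell_const_reverse, ell_const_forward

# Toy check: identity mappings give zero cycle loss; a constant 0.5 discriminator
# gives log(0.5) + log(0.5) for each GAN objective.
x, z = np.random.rand(4, 8), np.random.rand(4, 8)
ident, half = (lambda a: a), (lambda a: np.full(len(a), 0.5))
print(cycle_losses(x, z, ident, ident))          # (0.0, 0.0)
print(gan_objectives(x, z, ident, ident, half, half))
```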

  4. Contributions
     We cast the problem of learning inter-domain correspondences without paired data as approximate Bayesian inference in a latent variable model (LVM).
     1. We introduce implicit latent variable models (ILVMs), where the prior over latent variables is specified flexibly as an implicit distribution.
     2. We develop a new variational inference (VI) algorithm based on minimizing the symmetric Kullback-Leibler (KL) divergence between a variational and the exact joint distribution.
     3. We demonstrate that CycleGAN (Kim et al., 2017; Zhu et al., 2017) can be instantiated as a special case of our framework.

  5. Implicit Latent Variable Models
     Joint distribution: p_θ(x, z) = p_θ(x | z) p*(z)   [likelihood × prior].
     • Prior p*(z) over latent variables specified as an implicit distribution:
       given only by a finite collection Z* = {z*_m}_{m=1}^M of its samples, z*_m ∼ p*(z).
       Offers the utmost degree of flexibility in the treatment of prior information.
     • Likelihood p_θ(x_n | z_n) is prescribed (as usual).
     [Graphical model: observed x_n with latent z_n, n = 1, …, N, and likelihood parameters θ.]
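To make the definition concrete, the following NumPy sketch represents the implicit prior purely by a stored sample collection Z* and pairs it with a prescribed Gaussian likelihood. The dimensions, the linear mean function, and the noise scale tau are made-up placeholders, not the model from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Implicit prior p*(z): known only through a finite collection Z* of its samples
# (in the talk these would be, e.g., images from one domain).
Z_star = rng.normal(size=(500, 2))

def sample_prior(n):
    """Draw z ~ p*(z) the only way we can: resample the stored collection."""
    return Z_star[rng.integers(len(Z_star), size=n)]

def likelihood_mean(z, theta):
    """Mean mu_theta(z) of the prescribed Gaussian likelihood; a linear stand-in."""
    return z @ theta

def sample_joint(n, theta, tau=0.1):
    """Ancestral sampling from p_theta(x, z) = p_theta(x | z) p*(z)."""
    z = sample_prior(n)
    x = likelihood_mean(z, theta) + tau * rng.normal(size=(n, theta.shape[1]))
    return x, z

theta = rng.normal(size=(2, 3))
x, z = sample_joint(4, theta)
print(x.shape, z.shape)   # (4, 3) (4, 2)
```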

  6. Implicit Latent Variable Models: Example (Unpaired Image-to-Image Translation)
     • Prior distribution p*(z) specified by images Z* = {z*_m}_{m=1}^M from one domain.
     • Empirical data distribution q*(x) specified by images X* = {x_n}_{n=1}^N from another domain.
     [Figure: (a) samples from p*(z); (b) a sample from q*(x).]

  7. Inference in Implicit Latent Variable Models
     Having specified the generative model, our aims are to
     • optimize θ by maximizing the marginal likelihood p_θ(x), and
     • infer hidden representations z by computing the posterior p_θ(z | x).
     Both require the intractable p_θ(x), so we must resort to approximate inference.
     Classical variational inference
     • approximates the exact posterior p_θ(z | x) with a variational posterior q_φ(z | x), min_φ KL[q_φ(z | x) ∥ p_θ(z | x)];
     • reduces the inference problem to an optimization problem.

  8. Symmetric Joint-Matching Variational Inference

  9. Joint-Matching Variational Inference
     • Consider instead directly approximating the exact joint with the variational joint
       q_φ(x, z) = q_φ(z | x) q*(x).
     • The variational posterior q_φ(z | x) is also prescribed.
     [Graphical model: observed x_n with latent z_n, n = 1, …, N, and parameters φ and θ.]
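By analogy with the generative joint, sampling from the variational joint only ever touches the data through a stored collection X*. The sketch below assumes a Gaussian variational posterior with a linear mean function; both are placeholders rather than the encoder from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical data distribution q*(x): again available only as a finite collection X*.
X_star = rng.normal(size=(500, 3))

def posterior_mean(x, phi):
    """Mean m_phi(x) of the prescribed variational posterior; a linear stand-in."""
    return x @ phi

def sample_variational_joint(n, phi, t=0.1):
    """Ancestral sampling from q_phi(x, z) = q_phi(z | x) q*(x)."""
    x = X_star[rng.integers(len(X_star), size=n)]                        # x ~ q*(x)
    z = posterior_mean(x, phi) + t * rng.normal(size=(n, phi.shape[1]))  # z ~ q_phi(z | x)
    return x, z

phi = rng.normal(size=(3, 2))
x, z = sample_variational_joint(4, phi)
print(x.shape, z.shape)   # (4, 3) (4, 2)
```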

  10. Symmetric Joint-Matching Variational Inference
      Minimize the symmetric KL divergence between the joints,
        KL_symm[p_θ(x, z) ∥ q_φ(x, z)],  where KL_symm[p ∥ q] = KL[p ∥ q] (forward KL) + KL[q ∥ p] (reverse KL).
      Why?
      1. Because we can:
         • KL_symm[p_θ(x, z) ∥ q_φ(x, z)] is tractable;
         • KL_symm[p_θ(z | x) ∥ q_φ(z | x)] is intractable.
      2. Helps avoid under/over-dispersed approximations (see paper for details).

  11. Reverse KL Variational Objective
      • Minimizing the reverse KL divergence between the joints is equivalent to maximizing the usual evidence lower bound (ELBO):
          KL[q_φ(x, z) ∥ p_θ(x, z)] = E_{q_φ(x,z)}[log q_φ(x, z) − log p_θ(x, z)]
                                    = E_{q_φ(x,z)}[log q_φ(z | x) − log p_θ(x, z)]  [= L_nelbo(θ, φ)]  −  H[q*(x)]  [constant].
      • Recall the (negative) ELBO,
          L_nelbo(θ, φ) = E_{q*(x) q_φ(z|x)}[−log p_θ(x | z)]  [= L_nell(θ, φ)]  +  E_{q*(x)} KL[q_φ(z | x) ∥ p*(z)]  [intractable].
      • The KL term is intractable, as the prior p*(z) is unavailable: we can only sample!
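The tractable piece L_nell can be estimated by plain Monte Carlo. The sketch below assumes the Gaussian likelihood and posterior that appear later in the talk (p_θ(x | z) = N(x | µ_θ(z), τ²I), q_φ(z | x) = N(z | m_φ(x), t²I)), with the networks replaced by arbitrary callables and a single z sample per data point.

```python
import numpy as np

def nell_estimate(x_batch, m_phi, mu_theta, tau=1.0, t=1.0, seed=2):
    """Single-sample Monte Carlo estimate of
       L_nell(theta, phi) = E_{q*(x) q_phi(z|x)}[-log p_theta(x | z)]
    under Gaussian q_phi(z | x) = N(m_phi(x), t^2 I) and p_theta(x | z) = N(mu_theta(z), tau^2 I)."""
    rng = np.random.default_rng(seed)
    d = x_batch.shape[1]
    mean_z = m_phi(x_batch)
    z = mean_z + t * rng.normal(size=mean_z.shape)              # z ~ q_phi(z | x)
    sq_err = np.sum((x_batch - mu_theta(z)) ** 2, axis=1)       # ||x - mu_theta(z)||^2
    neg_log_lik = 0.5 * sq_err / tau ** 2 + 0.5 * d * np.log(2.0 * np.pi * tau ** 2)
    return float(np.mean(neg_log_lik))
```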

  12. Forward KL Variational Objective
      • Minimizing the forward KL divergence between the joints:
          KL[p_θ(x, z) ∥ q_φ(x, z)] = E_{p_θ(x,z)}[log p_θ(x, z) − log q_φ(x, z)]
                                    = E_{p_θ(x,z)}[log p_θ(x | z) − log q_φ(x, z)]  [= L_naplbo(θ, φ)]  −  H[p*(z)]  [constant].
      • New variational objective, the (negative) aggregate posterior lower bound (APLBO),
          L_naplbo(θ, φ) = E_{p*(z) p_θ(x|z)}[−log q_φ(z | x)]  [= L_nelp(θ, φ)]  +  E_{p*(z)} KL[p_θ(x | z) ∥ q*(x)]  [intractable].
      • The KL term is intractable, as the empirical data distribution q*(x) is unavailable: we can only sample!
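L_nelp mirrors L_nell with the roles of the two streams swapped: latent samples from the implicit prior are pushed through the likelihood and scored under the variational posterior. Same hedges as before: Gaussian forms, placeholder callables, one x sample per latent code.

```python
import numpy as np

def nelp_estimate(z_batch, m_phi, mu_theta, tau=1.0, t=1.0, seed=3):
    """Single-sample Monte Carlo estimate of
       L_nelp(theta, phi) = E_{p*(z) p_theta(x|z)}[-log q_phi(z | x)],
    where z_batch holds samples from the implicit prior p*(z)."""
    rng = np.random.default_rng(seed)
    k = z_batch.shape[1]
    mean_x = mu_theta(z_batch)
    x = mean_x + tau * rng.normal(size=mean_x.shape)            # x ~ p_theta(x | z)
    sq_err = np.sum((z_batch - m_phi(x)) ** 2, axis=1)          # ||z - m_phi(x)||^2
    neg_log_post = 0.5 * sq_err / t ** 2 + 0.5 * k * np.log(2.0 * np.pi * t ** 2)
    return float(np.mean(neg_log_post))
```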

  13. Density Ratio Estimation and f-divergence Approximation
      • Estimate the divergence using a lower bound that requires only samples!
      • Turns divergence estimation into an optimization problem.
      General f-divergence lower bound (Nguyen et al., 2010): for a convex lower-semicontinuous function f : R_+ → R,
        E_{q*(x)} D_f[p*(z) ∥ q_φ(z | x)]  [intractable]  ≥  max_α L_latent(α; φ)  [tractable],
      where
        L_latent(α; φ) = E_{q*(x) q_φ(z|x)}[f′(r_α(z; x))] − E_{q*(x) p*(z)}[f⋆(f′(r_α(z; x)))].
      • r_α is a neural net with parameters α, with equality at r*_α(z; x) = q_φ(z | x) / p*(z).
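The bound is straightforward to estimate from the two sample streams once f′ and the convex conjugate f⋆ are supplied. The sketch below takes the ratio model r_α as an arbitrary positive callable (the talk uses a neural net) and drops the conditioning on x for brevity; the KL case f(u) = u log u, used on the next slide, is included as the concrete instance.

```python
import numpy as np

def f_divergence_lower_bound(z_from_q, z_from_p, r_alpha, f_prime, f_conj):
    """Monte Carlo estimate of the lower bound on the slide:
       L_latent(alpha; phi) = E_q[f'(r_alpha(z))] - E_p[f*(f'(r_alpha(z)))],
    with z_from_q ~ q_phi(z | x) and z_from_p ~ p*(z)."""
    t_q = f_prime(r_alpha(z_from_q))
    t_p = f_conj(f_prime(r_alpha(z_from_p)))
    return float(np.mean(t_q) - np.mean(t_p))

# KL instance: f(u) = u log u, hence f'(u) = log u + 1 and f*(t) = exp(t - 1),
# so the bound collapses to E_q[log r] - E_p[r - 1] (next slide).
f_prime_kl = lambda u: np.log(u) + 1.0
f_conj_kl = lambda t: np.exp(t - 1.0)
```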

  14. Example: KL Divergence Lower Bound
      For f(u) = u log u, we instantiate the KL lower bound
        E_{q*(x)} KL[q_φ(z | x) ∥ p*(z)]  [intractable]  ≥  max_α L_latent(α; φ)  [tractable],
      where
        L_latent(α; φ) = E_{q*(x) q_φ(z|x)}[log r_α(z; x)] − E_{q*(x) p*(z)}[r_α(z; x) − 1].
      Yields an estimate of the ELBO where all terms are tractable:
        L_nelbo(θ, φ) = L_nell(θ, φ) + E_{q*(x)} KL[q_φ(z | x) ∥ p*(z)]
                      ≥ max_α { L_nell(θ, φ) + L_latent(α; φ) }.
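A quick sanity check of this KL instance in one dimension, with the optimal ratio r* plugged in for the critic so the bound is tight; the Gaussian choices for q_φ(z | x) and p*(z) are made up purely to have a closed-form KL to compare against.

```python
import numpy as np

def kl_lower_bound(z_from_q, z_from_p, r_alpha, eps=1e-12):
    """L_latent(alpha; phi) for f(u) = u log u:  E_q[log r_alpha] - E_p[r_alpha - 1].
    (In general r_alpha also conditions on x; dropped here for brevity.)"""
    return float(np.mean(np.log(r_alpha(z_from_q) + eps)) - np.mean(r_alpha(z_from_p) - 1.0))

rng = np.random.default_rng(4)
mu_q, s_q = 1.0, 0.5                                     # q = N(1, 0.5^2), p = N(0, 1)
z_q = rng.normal(mu_q, s_q, size=200_000)                # z ~ q
z_p = rng.normal(0.0, 1.0, size=200_000)                 # z ~ p
log_ratio = lambda z: (-0.5 * ((z - mu_q) / s_q) ** 2 - np.log(s_q)) + 0.5 * z ** 2
r_star = lambda z: np.exp(log_ratio(z))                  # optimal critic q(z)/p(z)
kl_exact = np.log(1.0 / s_q) + (s_q ** 2 + mu_q ** 2 - 1.0) / 2.0
print(kl_lower_bound(z_q, z_p, r_star), kl_exact)        # both approx. 0.82, up to MC noise
```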

  15. CycleGAN as a Special Case

  16. Cycle-consistency as Conditional Probability Maximization
      Cycle-consistency corresponds to maximizing conditional probabilities.
      For Gaussian likelihood and variational posterior,
        p_θ(x | z) = N(x | µ_θ(z), τ²I),   q_φ(z | x) = N(z | m_φ(x), t²I):
      • Can instantiate ℓ_const^reverse(θ, φ) from L_nell(θ, φ) as the posterior q_φ(z | x) degenerates (t → 0).
      • Can instantiate ℓ_const^forward(θ, φ) from L_nelp(θ, φ) as the likelihood p_θ(x | z) degenerates (τ → 0).
      • ELL: forces q_φ(z | x) to place mass on hidden representations that recover the data.
      • ELP: forces p_θ(x | z) to generate observations that recover the prior samples.
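The degenerate-posterior claim is easy to verify numerically: collapsing q_φ(z | x) onto m_φ(x) turns the NELL term into the reverse cycle-consistency loss with ρ = 2, up to the scale 1/(2τ²) and an additive constant. The linear encoder/decoder below are arbitrary stand-ins, not the networks from the talk.

```python
import numpy as np

rng = np.random.default_rng(5)
A, B = rng.normal(size=(3, 2)), rng.normal(size=(2, 3))
m_phi = lambda x: x @ A            # placeholder encoder x -> z
mu_theta = lambda z: z @ B         # placeholder decoder z -> x

x = rng.normal(size=(256, 3))      # stand-in batch from q*(x)
tau, d = 0.7, x.shape[1]

# Posterior collapsed onto m_phi(x) (t -> 0): the NELL estimate is deterministic,
# -log N(x | mu_theta(m_phi(x)), tau^2 I) averaged over the batch.
recon = mu_theta(m_phi(x))
nell_degenerate = np.mean(0.5 * np.sum((x - recon) ** 2, axis=1) / tau ** 2
                          + 0.5 * d * np.log(2.0 * np.pi * tau ** 2))

# Reverse cycle-consistency loss with rho = 2.
ell_const_reverse = np.mean(np.sum((x - recon) ** 2, axis=1))

# Identical once the cycle loss is rescaled and shifted by the Gaussian constant.
print(nell_degenerate,
      ell_const_reverse / (2.0 * tau ** 2) + 0.5 * d * np.log(2.0 * np.pi * tau ** 2))
```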

  17. Distribution Matching as Regularization
      For an appropriate setting of f, and simplifying the mappings and discriminators:
      • Can instantiate ℓ_gan^reverse(α; φ) from L_latent(α; φ).
      • Can instantiate ℓ_gan^forward(β; θ) from L_observed(β; θ).
      Approximately minimizes the intractable divergences:
      • D_f[p*(z) ∥ q_φ(z | x)], which forces q_φ(z | x) to match the prior p*(z);
      • D_f[q*(x) ∥ p_θ(x | z)], which forces p_θ(x | z) to match the data q*(x).
      Summary:
        L_nelbo(θ, φ)  ≥ max_α { L_nell(θ, φ) + L_latent(α; φ) }    [≈ ℓ_const^reverse(θ, φ) + ℓ_gan^reverse(α; φ)],
        L_naplbo(θ, φ) ≥ max_β { L_nelp(θ, φ) + L_observed(β; θ) }  [≈ ℓ_const^forward(θ, φ) + ℓ_gan^forward(β; θ)].

  18. Conclusion
      • Formulated implicit latent variable models, which introduce an implicit prior over latent variables.
        Offers the utmost degree of flexibility in incorporating prior knowledge.
      • Developed a new paradigm for variational inference that
        • directly approximates the exact joint distribution, and
        • minimizes the symmetric KL divergence.
      • Provided a theoretical treatment of the links between CycleGAN methods and Variational Bayes.
      Poster Session: to find out more, come visit us at our poster! Poster #14, Session 4 (17:10-18:00, Saturday, 14 July).

  19. Questions?
