AMMI – Introduction to Deep Learning / 10.1. Generative Adversarial Networks (presentation transcript)

1. AMMI – Introduction to Deep Learning
10.1. Generative Adversarial Networks
François Fleuret
https://fleuret.org/ammi-2018/
Thu Sep 6 16:09:56 CAT 2018
ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

2–6. A different approach to learning high-dimensional generative models is the Generative Adversarial Network (GAN), proposed by Goodfellow et al. (2014).

The idea behind GANs is to train two networks jointly:
• a discriminator D to classify samples as "real" or "fake",
• a generator G to map a [simple] fixed distribution to samples that fool D.

[Slide diagram: a real sample goes into D, which should output "real"; a latent variable Z goes into G, whose output goes into D, which should output "fake".]

The approach is adversarial since the two networks have antagonistic objectives.

7–8. A bit more formally, let $\mathcal{X}$ be the signal space and D the latent space dimension.

• The generator $G : \mathbb{R}^D \to \mathcal{X}$ is trained so that [ideally], if it gets a random normally distributed Z as input, it produces a sample following the data distribution as output.
• The discriminator $D : \mathcal{X} \to [0, 1]$ is trained so that, if it gets a sample as input, it predicts whether it is genuine.
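
As a concrete illustration (not from the slides), a minimal PyTorch sketch of the two networks for a toy problem could look as follows; the MLP architectures, layer widths, and the values of `latent_dim` and `signal_dim` are assumptions made for the example, not the course's reference implementation.

```python
# Minimal sketch of the two GAN networks for a toy problem (assumed MLP
# architectures and dimensions, not the course's reference implementation).
import torch
from torch import nn

latent_dim = 8    # latent space dimension D (assumption for the example)
signal_dim = 2    # dimension of the signal space (assumption for the example)

# G : R^D -> signal space, maps a normal latent sample to a generated sample.
G = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(),
    nn.Linear(64, signal_dim),
)

# D : signal space -> [0, 1], probability that the input sample is genuine.
D = nn.Sequential(
    nn.Linear(signal_dim, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)   # Z ~ N(0, I), a batch of 16 latent samples
fake = G(z)                       # a batch of samples following mu_G
print(D(fake).shape)              # -> torch.Size([16, 1])
```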

9–10. If G is fixed, to train D given a set of "real points" $x_n \sim \mu$, $n = 1, \dots, N$, we can generate $z_n \sim \mathcal{N}(0, I)$, $n = 1, \dots, N$, build a two-class data set

$$\mathcal{D} = \big\{ (x_1, 1), \dots, (x_N, 1), (G(z_1), 0), \dots, (G(z_N), 0) \big\},$$

where the $(x_n, 1)$ are real samples from $\mu$ and the $(G(z_n), 0)$ are fake samples from $\mu_G$, and minimize the binary cross-entropy

$$\mathscr{L}(D) = -\frac{1}{2N} \left( \sum_{n=1}^{N} \log D(x_n) + \sum_{n=1}^{N} \log\big(1 - D(G(z_n))\big) \right)
= -\frac{1}{2} \left( \hat{\mathbb{E}}_{X \sim \mu}\big[\log D(X)\big] + \hat{\mathbb{E}}_{X \sim \mu_G}\big[\log\big(1 - D(X)\big)\big] \right),$$

where $\mu$ is the true distribution of the data, and $\mu_G$ is the distribution of $G(Z)$ with $Z \sim \mathcal{N}(0, I)$.
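
A minimal sketch of one such discriminator update, assuming the hypothetical `G`, `D`, and `latent_dim` from the previous snippet and a tensor `real` holding the N real points; using `F.binary_cross_entropy` here is just one way to write the loss above.

```python
# One discriminator step on the two-class data set {(x_n, 1), (G(z_n), 0)}
# (sketch; assumes G, D, latent_dim as defined in the earlier snippet).
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, real):
    N = real.size(0)
    z = torch.randn(N, latent_dim)   # z_n ~ N(0, I)
    fake = G(z).detach()             # G is held fixed while training D
    # Binary cross-entropy over real (label 1) and fake (label 0) samples,
    # i.e. -1/(2N) * [ sum log D(x_n) + sum log(1 - D(G(z_n))) ].
    loss_real = F.binary_cross_entropy(D(real), torch.ones(N, 1))
    loss_fake = F.binary_cross_entropy(D(fake), torch.zeros(N, 1))
    return 0.5 * (loss_real + loss_fake)
```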

11–12. The situation is slightly more complicated since we also want to optimize G to maximize D's loss.

Goodfellow et al. (2014) provide an analysis of the resulting equilibrium of that strategy.

13–15. Let's define

$$V(D, G) = \mathbb{E}_{X \sim \mu}\big[\log D(X)\big] + \mathbb{E}_{X \sim \mu_G}\big[\log\big(1 - D(X)\big)\big],$$

which is high if D is doing a good job (low cross-entropy), and low if G fools D.

Our ultimate goal is a $G^*$ that fools any D, so

$$G^* = \operatorname*{argmin}_G \, \max_D \, V(D, G).$$

If we define the optimal discriminator for a given generator

$$D_G^* = \operatorname*{argmax}_D \, V(D, G),$$

our objective becomes

$$G^* = \operatorname*{argmin}_G \, V(D_G^*, G).$$
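
One way to read this definition (not stated on the slide, but immediate from slides 9–10): up to replacing the empirical averages $\hat{\mathbb{E}}$ by expectations,

$$V(D, G) = -2\, \mathscr{L}(D),$$

so maximizing V over D is exactly minimizing the binary cross-entropy defined earlier, while minimizing over G makes D's classification problem as hard as possible.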

16–17. We have

$$V(D, G) = \mathbb{E}_{X \sim \mu}\big[\log D(X)\big] + \mathbb{E}_{X \sim \mu_G}\big[\log\big(1 - D(X)\big)\big]
= \int_x \mu(x) \log D(x) + \mu_G(x) \log\big(1 - D(x)\big) \, dx.$$

Since

$$\operatorname*{argmax}_d \; \mu(x) \log d + \mu_G(x) \log(1 - d) = \frac{\mu(x)}{\mu(x) + \mu_G(x)},$$

and $D_G^* = \operatorname*{argmax}_D V(D, G)$, if there is no regularization on D, we get

$$\forall x, \quad D_G^*(x) = \frac{\mu(x)}{\mu(x) + \mu_G(x)}.$$
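
The pointwise maximization above is a short calculus step, spelled out here for completeness: with $a = \mu(x)$ and $b = \mu_G(x)$ both positive, the first-order condition in d gives the stated value,

$$\frac{d}{dd}\Big( a \log d + b \log(1 - d) \Big) = \frac{a}{d} - \frac{b}{1 - d} = 0
\quad\Longleftrightarrow\quad d = \frac{a}{a + b},$$

and the second derivative $-a/d^2 - b/(1 - d)^2 < 0$ confirms this critical point is a maximum.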

18–19. So, since

$$\forall x, \quad D_G^*(x) = \frac{\mu(x)}{\mu(x) + \mu_G(x)},$$

we get

$$\begin{aligned}
V(D_G^*, G) &= \mathbb{E}_{X \sim \mu}\big[\log D_G^*(X)\big] + \mathbb{E}_{X \sim \mu_G}\big[\log\big(1 - D_G^*(X)\big)\big] \\
&= \mathbb{E}_{X \sim \mu}\left[\log \frac{\mu(X)}{\mu(X) + \mu_G(X)}\right] + \mathbb{E}_{X \sim \mu_G}\left[\log \frac{\mu_G(X)}{\mu(X) + \mu_G(X)}\right] \\
&= \mathbb{D}_{\mathrm{KL}}\left(\mu \,\middle\|\, \frac{\mu + \mu_G}{2}\right) + \mathbb{D}_{\mathrm{KL}}\left(\mu_G \,\middle\|\, \frac{\mu + \mu_G}{2}\right) - \log 4 \\
&= 2\, \mathbb{D}_{\mathrm{JS}}(\mu, \mu_G) - \log 4,
\end{aligned}$$

where $\mathbb{D}_{\mathrm{JS}}$ is the Jensen–Shannon divergence, a standard dissimilarity measure between distributions.
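
As a quick sanity check of this expression (a standard observation from Goodfellow et al. (2014), not on the slide): when the generator is perfect, $\mu_G = \mu$, the optimal discriminator is constant at $1/2$ and the Jensen–Shannon divergence vanishes,

$$D_G^*(x) = \frac{\mu(x)}{\mu(x) + \mu(x)} = \frac{1}{2}, \qquad V(D_G^*, G) = 2\, \mathbb{D}_{\mathrm{JS}}(\mu, \mu) - \log 4 = -\log 4,$$

which is the global minimum of $V(D_G^*, G)$ over G, since $\mathbb{D}_{\mathrm{JS}} \ge 0$.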

20–21. To recap: if there is no capacity limitation for D, and if we define

$$V(D, G) = \mathbb{E}_{X \sim \mu}\big[\log D(X)\big] + \mathbb{E}_{X \sim \mu_G}\big[\log\big(1 - D(X)\big)\big],$$

computing

$$G^* = \operatorname*{argmin}_G \, \max_D \, V(D, G)$$

amounts to computing

$$G^* = \operatorname*{argmin}_G \, \mathbb{D}_{\mathrm{JS}}(\mu, \mu_G),$$

where $\mathbb{D}_{\mathrm{JS}}$ is a reasonable dissimilarity measure between distributions.

Although this derivation provides a nice formal framework, in practice D is not "fully" optimized to [come close to] $D_G^*$ when optimizing G. In our minimal example, we alternate gradient steps to improve G and D.
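
A minimal sketch of this alternating scheme, reusing the hypothetical `G`, `D`, `latent_dim`, and `discriminator_loss` from the earlier snippets; the Adam optimizers, the learning rate, the assumed iterable `real_batches` of real sample batches, and writing the generator objective directly as $\log(1 - D(G(z)))$ (as on the slides, rather than the non-saturating variant often used in practice) are all assumptions of the sketch.

```python
# Alternating gradient steps on D and G (sketch; assumes G, D, latent_dim,
# discriminator_loss from above, and an iterable `real_batches` of real batches).
import torch

optim_D = torch.optim.Adam(D.parameters(), lr=1e-3)
optim_G = torch.optim.Adam(G.parameters(), lr=1e-3)

for real in real_batches:
    # --- Discriminator step: minimize the binary cross-entropy L(D). ---
    optim_D.zero_grad()
    loss_D = discriminator_loss(D, G, real)
    loss_D.backward()
    optim_D.step()

    # --- Generator step: maximize D's loss, i.e. minimize E[log(1 - D(G(Z)))]. ---
    optim_G.zero_grad()
    z = torch.randn(real.size(0), latent_dim)
    loss_G = torch.log(1 - D(G(z))).mean()
    loss_G.backward()
    optim_G.step()
```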
