AMMI – Introduction to Deep Learning / 10.1. Generative Adversarial Networks (presentation transcript)

1. AMMI – Introduction to Deep Learning
10.1. Generative Adversarial Networks
François Fleuret
https://fleuret.org/ammi-2018/
Thu Sep 6 16:09:56 CAT 2018
ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

2–6. A different approach to learning high-dimensional generative models is the Generative Adversarial Network (GAN), proposed by Goodfellow et al. (2014).

The idea behind GANs is to train two networks jointly:
• a discriminator D to classify samples as "real" or "fake",
• a generator G to map a [simple] fixed distribution to samples that fool D.

[Slide diagram: a real sample goes into D, which should output "real"; a latent variable Z goes into G, whose output goes into D, which should output "fake".]

The approach is adversarial since the two networks have antagonistic objectives.

7–8. A bit more formally, let $\mathcal{X}$ be the signal space and D the latent space dimension.

• The generator $G : \mathbb{R}^D \to \mathcal{X}$ is trained so that [ideally], if it gets a random normally distributed Z as input, it produces a sample following the data distribution as output.
• The discriminator $D : \mathcal{X} \to [0, 1]$ is trained so that, if it gets a sample as input, it predicts whether it is genuine.
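
As a concrete illustration (not from the slides), a minimal PyTorch sketch of the two networks for a toy problem could look as follows; the MLP architectures, layer widths, and the values of `latent_dim` and `signal_dim` are assumptions made for the example, not the course's reference implementation.

```python
# Minimal sketch of the two GAN networks for a toy problem (assumed MLP
# architectures and dimensions, not the course's reference implementation).
import torch
from torch import nn

latent_dim = 8    # latent space dimension D (assumption for the example)
signal_dim = 2    # dimension of the signal space (assumption for the example)

# G : R^D -> signal space, maps a normal latent sample to a generated sample.
G = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.ReLU(),
    nn.Linear(64, signal_dim),
)

# D : signal space -> [0, 1], probability that the input sample is genuine.
D = nn.Sequential(
    nn.Linear(signal_dim, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

z = torch.randn(16, latent_dim)   # Z ~ N(0, I), a batch of 16 latent samples
fake = G(z)                       # a batch of samples following mu_G
print(D(fake).shape)              # -> torch.Size([16, 1])
```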

9–10. If G is fixed, to train D given a set of "real points" $x_n \sim \mu$, $n = 1, \dots, N$, we can generate $z_n \sim \mathcal{N}(0, I)$, $n = 1, \dots, N$, build a two-class data set

$$\mathcal{D} = \big\{ (x_1, 1), \dots, (x_N, 1), (G(z_1), 0), \dots, (G(z_N), 0) \big\},$$

where the $(x_n, 1)$ are real samples from $\mu$ and the $(G(z_n), 0)$ are fake samples from $\mu_G$, and minimize the binary cross-entropy

$$\mathscr{L}(D) = -\frac{1}{2N} \left( \sum_{n=1}^{N} \log D(x_n) + \sum_{n=1}^{N} \log\big(1 - D(G(z_n))\big) \right)
= -\frac{1}{2} \left( \hat{\mathbb{E}}_{X \sim \mu}\big[\log D(X)\big] + \hat{\mathbb{E}}_{X \sim \mu_G}\big[\log\big(1 - D(X)\big)\big] \right),$$

where $\mu$ is the true distribution of the data, and $\mu_G$ is the distribution of $G(Z)$ with $Z \sim \mathcal{N}(0, I)$.
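
A minimal sketch of one such discriminator update, assuming the hypothetical `G`, `D`, and `latent_dim` from the previous snippet and a tensor `real` holding the N real points; using `F.binary_cross_entropy` here is just one way to write the loss above.

```python
# One discriminator step on the two-class data set {(x_n, 1), (G(z_n), 0)}
# (sketch; assumes G, D, latent_dim as defined in the earlier snippet).
import torch
import torch.nn.functional as F

def discriminator_loss(D, G, real):
    N = real.size(0)
    z = torch.randn(N, latent_dim)   # z_n ~ N(0, I)
    fake = G(z).detach()             # G is held fixed while training D
    # Binary cross-entropy over real (label 1) and fake (label 0) samples,
    # i.e. -1/(2N) * [ sum log D(x_n) + sum log(1 - D(G(z_n))) ].
    loss_real = F.binary_cross_entropy(D(real), torch.ones(N, 1))
    loss_fake = F.binary_cross_entropy(D(fake), torch.zeros(N, 1))
    return 0.5 * (loss_real + loss_fake)
```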

11–12. The situation is slightly more complicated since we also want to optimize G to maximize D's loss.

Goodfellow et al. (2014) provide an analysis of the resulting equilibrium of that strategy.

13–15. Let's define

$$V(D, G) = \mathbb{E}_{X \sim \mu}\big[\log D(X)\big] + \mathbb{E}_{X \sim \mu_G}\big[\log\big(1 - D(X)\big)\big],$$

which is high if D is doing a good job (low cross-entropy), and low if G fools D.

Our ultimate goal is a $G^*$ that fools any D, so

$$G^* = \operatorname*{argmin}_G \, \max_D \, V(D, G).$$

If we define the optimal discriminator for a given generator

$$D_G^* = \operatorname*{argmax}_D \, V(D, G),$$

our objective becomes

$$G^* = \operatorname*{argmin}_G \, V(D_G^*, G).$$
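
One way to read this definition (not stated on the slide, but immediate from slides 9–10): up to replacing the empirical averages $\hat{\mathbb{E}}$ by expectations,

$$V(D, G) = -2\, \mathscr{L}(D),$$

so maximizing V over D is exactly minimizing the binary cross-entropy defined earlier, while minimizing over G makes D's classification problem as hard as possible.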

16–17. We have

$$V(D, G) = \mathbb{E}_{X \sim \mu}\big[\log D(X)\big] + \mathbb{E}_{X \sim \mu_G}\big[\log\big(1 - D(X)\big)\big]
= \int_x \mu(x) \log D(x) + \mu_G(x) \log\big(1 - D(x)\big) \, dx.$$

Since

$$\operatorname*{argmax}_d \; \mu(x) \log d + \mu_G(x) \log(1 - d) = \frac{\mu(x)}{\mu(x) + \mu_G(x)},$$

and $D_G^* = \operatorname*{argmax}_D V(D, G)$, if there is no regularization on D, we get

$$\forall x, \quad D_G^*(x) = \frac{\mu(x)}{\mu(x) + \mu_G(x)}.$$
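
The pointwise maximization above is a short calculus step, spelled out here for completeness: with $a = \mu(x)$ and $b = \mu_G(x)$ both positive, the first-order condition in d gives the stated value,

$$\frac{d}{dd}\Big( a \log d + b \log(1 - d) \Big) = \frac{a}{d} - \frac{b}{1 - d} = 0
\quad\Longleftrightarrow\quad d = \frac{a}{a + b},$$

and the second derivative $-a/d^2 - b/(1 - d)^2 < 0$ confirms this critical point is a maximum.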

18–19. So, since

$$\forall x, \quad D_G^*(x) = \frac{\mu(x)}{\mu(x) + \mu_G(x)},$$

we get

$$\begin{aligned}
V(D_G^*, G) &= \mathbb{E}_{X \sim \mu}\big[\log D_G^*(X)\big] + \mathbb{E}_{X \sim \mu_G}\big[\log\big(1 - D_G^*(X)\big)\big] \\
&= \mathbb{E}_{X \sim \mu}\left[\log \frac{\mu(X)}{\mu(X) + \mu_G(X)}\right] + \mathbb{E}_{X \sim \mu_G}\left[\log \frac{\mu_G(X)}{\mu(X) + \mu_G(X)}\right] \\
&= \mathbb{D}_{\mathrm{KL}}\left(\mu \,\middle\|\, \frac{\mu + \mu_G}{2}\right) + \mathbb{D}_{\mathrm{KL}}\left(\mu_G \,\middle\|\, \frac{\mu + \mu_G}{2}\right) - \log 4 \\
&= 2\, \mathbb{D}_{\mathrm{JS}}(\mu, \mu_G) - \log 4,
\end{aligned}$$

where $\mathbb{D}_{\mathrm{JS}}$ is the Jensen–Shannon divergence, a standard dissimilarity measure between distributions.
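
As a quick sanity check of this expression (a standard observation from Goodfellow et al. (2014), not on the slide): when the generator is perfect, $\mu_G = \mu$, the optimal discriminator is constant at $1/2$ and the Jensen–Shannon divergence vanishes,

$$D_G^*(x) = \frac{\mu(x)}{\mu(x) + \mu(x)} = \frac{1}{2}, \qquad V(D_G^*, G) = 2\, \mathbb{D}_{\mathrm{JS}}(\mu, \mu) - \log 4 = -\log 4,$$

which is the global minimum of $V(D_G^*, G)$ over G, since $\mathbb{D}_{\mathrm{JS}} \ge 0$.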

20–21. To recap: if there is no capacity limitation for D, and if we define

$$V(D, G) = \mathbb{E}_{X \sim \mu}\big[\log D(X)\big] + \mathbb{E}_{X \sim \mu_G}\big[\log\big(1 - D(X)\big)\big],$$

computing

$$G^* = \operatorname*{argmin}_G \, \max_D \, V(D, G)$$

amounts to computing

$$G^* = \operatorname*{argmin}_G \, \mathbb{D}_{\mathrm{JS}}(\mu, \mu_G),$$

where $\mathbb{D}_{\mathrm{JS}}$ is a reasonable dissimilarity measure between distributions.

Although this derivation provides a nice formal framework, in practice D is not "fully" optimized to [come close to] $D_G^*$ when optimizing G. In our minimal example, we alternate gradient steps to improve G and D.
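
A minimal sketch of this alternating scheme, reusing the hypothetical `G`, `D`, `latent_dim`, and `discriminator_loss` from the earlier snippets; the Adam optimizers, the learning rate, the assumed iterable `real_batches` of real sample batches, and writing the generator objective directly as $\log(1 - D(G(z)))$ (as on the slides, rather than the non-saturating variant often used in practice) are all assumptions of the sketch.

```python
# Alternating gradient steps on D and G (sketch; assumes G, D, latent_dim,
# discriminator_loss from above, and an iterable `real_batches` of real batches).
import torch

optim_D = torch.optim.Adam(D.parameters(), lr=1e-3)
optim_G = torch.optim.Adam(G.parameters(), lr=1e-3)

for real in real_batches:
    # --- Discriminator step: minimize the binary cross-entropy L(D). ---
    optim_D.zero_grad()
    loss_D = discriminator_loss(D, G, real)
    loss_D.backward()
    optim_D.step()

    # --- Generator step: maximize D's loss, i.e. minimize E[log(1 - D(G(Z)))]. ---
    optim_G.zero_grad()
    z = torch.randn(real.size(0), latent_dim)
    loss_G = torch.log(1 - D(G(z))).mean()
    loss_G.backward()
    optim_G.step()
```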
