Deep learning
11.3. Conditional GAN and image translation
François Fleuret
https://fleuret.org/ee559/
Nov 2, 2020
Fran¸ cois Fleuret Deep learning / 11.3. Conditional GAN and image translation 1 / 29
All the models we have seen so far model a density in high dimension and provide means to sample according to it, which is useful for synthesis only.

However, most practical applications require the ability to sample from a conditional distribution, e.g.:

- next-frame prediction,
- in-painting,
- segmentation,
- style transfer.
The Conditional GAN proposed by Mirza and Osindero (2014) consists of parameterizing both G and D by a conditioning quantity Y:

V(D, G) = E_{(X,Y)∼μ}[ log D(X, Y) ] + E_{Z∼𝒩(0,I), Y∼μ_Y}[ log(1 − D(G(Z, Y), Y)) ].
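As a sketch, this value can be estimated on a mini-batch along the following lines (a minimal PyTorch version; the callables `D` and `G` and the tensor shapes are hypothetical):

```python
import torch

def cgan_value(D, G, x, y, z, y_prior):
    """Mini-batch estimate of V(D, G) for the conditional GAN:
    E[log D(X, Y)] + E[log(1 - D(G(Z, Y), Y))]."""
    real_term = torch.log(D(x, y)).mean()
    fake_term = torch.log(1 - D(G(z, y_prior), y_prior)).mean()
    return real_term + fake_term
```

During training, D is updated to increase this value while G is updated to decrease the second term.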
Fran¸ cois Fleuret Deep learning / 11.3. Conditional GAN and image translation 2 / 29
To generate MNIST characters, with Z ∼ 𝒰([0, 1]^100), and conditioned on the class y, encoded as a one-hot vector of dimension 10, the model is:

- Generator G: z (100d) goes through a fc 200d layer and y (10d) through a fc 1000d layer; their concatenation goes through a fc 1200d layer and a fc 784d layer to produce the image x.
- Discriminator D: x (784d) goes through a maxout 240d layer and y (10d) through a maxout 50d layer; their concatenation goes through a maxout 240d layer and a fc 1d layer to produce the decision δ.
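A minimal PyTorch sketch of these two networks, with the layer sizes above; the ReLU/sigmoid activations and the 5-piece maxout are assumptions, not read off the slide:

```python
import torch
from torch import nn

class MaxoutLinear(nn.Module):
    """Linear layer followed by a maxout over k pieces."""
    def __init__(self, dim_in, dim_out, k=5):
        super().__init__()
        self.fc = nn.Linear(dim_in, dim_out * k)
        self.dim_out, self.k = dim_out, k
    def forward(self, x):
        return self.fc(x).view(-1, self.dim_out, self.k).max(dim=2).values

class CGANGenerator(nn.Module):
    """z (100d) and one-hot y (10d) -> image x (784d)."""
    def __init__(self):
        super().__init__()
        self.fz = nn.Linear(100, 200)     # z branch
        self.fy = nn.Linear(10, 1000)     # class branch
        self.fh = nn.Linear(1200, 1200)   # joint hidden layer
        self.fx = nn.Linear(1200, 784)    # output image
    def forward(self, z, y):
        h = torch.cat((torch.relu(self.fz(z)), torch.relu(self.fy(y))), dim=1)
        return torch.sigmoid(self.fx(torch.relu(self.fh(h))))

class CGANDiscriminator(nn.Module):
    """x (784d) and one-hot y (10d) -> probability that x is real."""
    def __init__(self):
        super().__init__()
        self.mx = MaxoutLinear(784, 240)
        self.my = MaxoutLinear(10, 50)
        self.mh = MaxoutLinear(290, 240)  # 240 + 50 concatenated inputs
        self.out = nn.Linear(240, 1)
    def forward(self, x, y):
        h = torch.cat((self.mx(x), self.my(y)), dim=1)
        return torch.sigmoid(self.out(self.mh(h)))
```

Note how the conditioning y is simply an additional input, concatenated to the hidden representation in both networks.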
Figure 2: Generated MNIST digits, each row conditioned on one label. (Mirza and Osindero, 2014)
Another option to condition the generator consists of making the parameters of its batch-norm layers class-conditional (Dumoulin et al., 2016; Brock et al., 2018).
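A common way to implement this is to replace the affine parameters of a standard batch-norm layer by per-class embeddings, along these lines (a sketch of the idea, not the exact code of either paper):

```python
import torch
from torch import nn

class ConditionalBatchNorm2d(nn.Module):
    """Batch-norm whose gain and bias are looked up from the class label."""
    def __init__(self, num_features, num_classes):
        super().__init__()
        # Normalization without learned affine parameters.
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        # One (gamma, beta) pair per class.
        self.gamma = nn.Embedding(num_classes, num_features)
        self.beta = nn.Embedding(num_classes, num_features)
        nn.init.ones_(self.gamma.weight)   # start as the identity transform
        nn.init.zeros_(self.beta.weight)
    def forward(self, x, y):
        h = self.bn(x)
        g = self.gamma(y)[:, :, None, None]  # broadcast over spatial dims
        b = self.beta(y)[:, :, None, None]
        return g * h + b
```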
Image-to-image translation
The main issue in generating realistic signals is that the value X to predict may remain non-deterministic given the conditioning quantity Y.

For a loss function such as the MSE, the best fit is E(X | Y = y), which can be quite different from the MAP, or from any reasonable sample from μ_{X|Y=y}. In practice, for images there often remains some location indeterminacy, which results in a blurry prediction.

Sampling according to μ_{X|Y=y} is the proper way to address the problem.
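The blurriness of the MSE-optimal prediction can be seen on a toy bimodal case: if, given y, X takes the values −1 and +1 with equal probability, the best MSE fit is the conditional mean 0, which is far from either mode (a small NumPy check):

```python
import numpy as np

# Given Y = y, suppose X takes the two values -1 and +1 with equal probability.
modes = np.array([-1.0, 1.0])

def mse(c):
    """Expected squared error of the constant prediction c."""
    return np.mean((modes - c) ** 2)

# The conditional mean 0.0 beats either mode in MSE, even though X never
# takes a value anywhere near 0: the "blurry average" wins the loss.
print(mse(0.0), mse(1.0))  # 1.0 2.0
```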
Isola et al. (2016) use a GAN-like setup to address this issue for the “translation” of images with pixel-to-pixel correspondence:

- edges to realistic photos,
- semantic segmentation,
- grayscale to color, etc.
Figure 2: Training a conditional GAN to predict aerial photos from maps. G tries to synthesize fake images that fool D; D tries to identify the fakes. The discriminator, D, learns to classify between real and synthesized pairs. The generator learns to fool the discriminator. Unlike an unconditional GAN, both the generator and discriminator observe an input image. (Isola et al., 2016)
They define

V(D, G) = E_{(X,Y)∼μ}[ log D(Y, X) ] + E_{Z∼μ_Z, X∼μ_X}[ log(1 − D(G(Z, X), X)) ],

ℒ_{L1}(G) = E_{(X,Y)∼μ, Z∼𝒩(0,I)}[ ‖Y − G(Z, X)‖₁ ],

and

G* = argmin_G max_D V(D, G) + λ ℒ_{L1}(G).

The term ℒ_{L1} pushes toward a proper pixel-wise prediction, and V makes the generator prefer realistic images to better pixel-wise fits.

Note that, contrary to Mirza and Osindero’s convention, here X is the conditioning quantity and Y the signal to generate.
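The generator’s side of this objective can be sketched as follows (hypothetical names; the adversarial term is written in the usual non-saturating form rather than log(1 − D), and λ is a weighting hyper-parameter):

```python
import torch
import torch.nn.functional as F

def pix2pix_generator_loss(D, x, y, fake, lam=100.0):
    """Adversarial term (make D score the pair (fake, x) as real)
    plus lam times the L1 distance to the target y."""
    pred = D(fake, x)
    adv = F.binary_cross_entropy(pred, torch.ones_like(pred))
    return adv + lam * F.l1_loss(fake, y)
```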
For G, they start with Radford et al. (2015)’s DCGAN architecture and add skip connections from layer i to layer D − i that concatenate channels.

Figure 3: Two choices for the architecture of the generator: encoder-decoder vs. U-Net. The “U-Net” is an encoder-decoder with skip connections between mirrored layers in the encoder and decoder stacks. (Isola et al., 2016)

Randomness Z is provided through dropout, and not as an additional input.
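The skip connections can be sketched as a tiny U-Net in PyTorch, where the decoder concatenates the mirrored encoder feature maps along the channel dimension, and dropout provides the randomness (a toy two-level version, not Isola et al.’s full architecture):

```python
import torch
from torch import nn

class TinyUNet(nn.Module):
    """Two-level encoder-decoder whose decoder concatenates the
    mirrored encoder feature maps along the channel dimension."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 16, 3, padding=1)
        self.enc2 = nn.Conv2d(16, 32, 3, stride=2, padding=1)          # downsample
        self.dec2 = nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1) # upsample
        self.dec1 = nn.Conv2d(16 + 16, 3, 3, padding=1)  # 16 skip + 16 upsampled
        self.drop = nn.Dropout(0.5)  # randomness Z enters through dropout
    def forward(self, x):
        h1 = torch.relu(self.enc1(x))
        h2 = torch.relu(self.enc2(h1))
        u = torch.relu(self.dec2(self.drop(h2)))
        return self.dec1(torch.cat((u, h1), dim=1))  # skip connection
```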
The discriminator D is a regular convnet that scores overlapping patches of size N × N and averages the scores into the final one. This controls the network’s complexity, while still allowing it to detect any local inconsistency of the generated image (e.g. blurriness).
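Such a “patch” discriminator can be sketched as a small convnet that emits one score per receptive field and averages them (a toy version; the actual model stacks more strided convolutions):

```python
import torch
from torch import nn

class TinyPatchGAN(nn.Module):
    """Convnet emitting a map of per-patch scores; the final
    image score is the average of the patch scores."""
    def __init__(self, in_channels=6):  # conditioning + candidate, concatenated
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, padding=1),  # one logit per patch
        )
    def forward(self, x, candidate):
        patch_scores = self.net(torch.cat((x, candidate), dim=1))
        return patch_scores.mean(dim=(2, 3))  # average over the patch grid
```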
Figure 4: Different losses induce different quality of results. Each column shows results trained under a different loss (input, ground truth, L1, cGAN, L1 + cGAN). See https://phillipi.github.io/pix2pix/ for additional examples. (Isola et al., 2016)
Figure 6: Patch size variations (L1, 1×1, 16×16, 70×70, 256×256). Uncertainty in the output manifests itself differently for different loss functions. Uncertain regions become blurry and desaturated under L1. The 1×1 PixelGAN encourages greater color diversity but has no effect on spatial statistics. The 16×16 PatchGAN creates locally sharp results, but also leads to tiling artifacts beyond the scale it can observe. The 70×70 PatchGAN forces outputs that are sharp, even if incorrect, in both the spatial and spectral (colorfulness) dimensions. The full 256×256 ImageGAN produces results that are visually similar to the 70×70 PatchGAN, but of somewhat lower quality according to the FCN-score metric. See https://phillipi.github.io/pix2pix/ for additional examples. (Isola et al., 2016)
Figure 8: Example results on Google Maps at 512×512 resolution, map to aerial photo and aerial photo to map (the model was trained on images at 256×256 resolution, and run convolutionally on the larger images at test time). Contrast adjusted for clarity. (Isola et al., 2016)
Figure 11: Example results of our method on Cityscapes labels→photo, compared to ground truth. (Isola et al., 2016)
Figure 12: Example results of our method on facades labels→photo, compared to ground truth. (Isola et al., 2016)
Figure 13: Example results of our method on day→night, compared to ground truth. (Isola et al., 2016)
Figure 14: Example results of our method on automatically detected edges→handbags, compared to ground truth. (Isola et al., 2016)
Figure 16: Example results of the edges→photo models applied to human-drawn sketches. Note that the models were trained on automatically detected edges, but generalize to human drawings. (Isola et al., 2016)
The main drawback of this technique is that it requires pairs of samples with pixel-to-pixel correspondence. In many cases, one only has examples from two densities, and wants to translate a sample from the first (“images of apples”) into a sample likely under the second (“images of oranges”).
We consider X, a random variable on 𝒳, a sample from the first data set, and Y, a random variable on 𝒴, a sample from the second data set. Zhu et al. (2017) propose to train simultaneously two mappings

G : 𝒳 → 𝒴 and F : 𝒴 → 𝒳

such that G(X) ∼ μ_Y and F(G(X)) ≃ X, where the matching in density is enforced with a discriminator D_Y, and the reconstruction with the L1 loss. They also do this both ways, symmetrically.
Figure 3: (a) Our model contains two mapping functions G : X → Y and F : Y → X, and associated adversarial discriminators D_Y and D_X. D_Y encourages G to translate X into outputs indistinguishable from domain Y, and vice versa for D_X and F. To further regularize the mappings, we introduce two cycle-consistency losses that capture the intuition that if we translate from one domain to the other and back again we should arrive where we started: (b) forward cycle-consistency loss: x → G(x) → F(G(x)) ≈ x, and (c) backward cycle-consistency loss: y → F(y) → G(F(y)) ≈ y. (Zhu et al., 2017)
The loss, optimized alternately, is

V*(G, F, D_X, D_Y) = V(G, D_Y, X, Y) + V(F, D_X, Y, X) + λ ( E[ ‖F(G(X)) − X‖₁ ] + E[ ‖G(F(Y)) − Y‖₁ ] ),

where V is a quadratic loss, instead of the usual log (Mao et al., 2016):

V(G, D_Y, X, Y) = E[ (D_Y(Y) − 1)² ] + E[ D_Y(G(X))² ].

The generator is from Johnson et al. (2016), an updated version of Radford et al. (2015)’s DCGAN, with many specific tricks, e.g. using a history of generated images (Shrivastava et al., 2016).
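These two terms can be sketched directly in PyTorch (hypothetical callables `G`, `F`, `d_x`, `d_y`; λ is a weighting hyper-parameter):

```python
import torch

def v_lsgan(d, real, fake):
    """Least-squares value (Mao et al., 2016):
    E[(d(real) - 1)^2] + E[d(fake)^2]."""
    return ((d(real) - 1) ** 2).mean() + (d(fake) ** 2).mean()

def cyclegan_objective(G, F, d_x, d_y, x, y, lam=10.0):
    """Sum of the two adversarial terms and the two L1 cycle terms."""
    cycle = (F(G(x)) - x).abs().mean() + (G(F(y)) - y).abs().mean()
    return v_lsgan(d_y, y, G(x)) + v_lsgan(d_x, x, F(y)) + lam * cycle
```

In practice the discriminators are updated to decrease their own quadratic terms while G and F are updated to fool them and to keep the cycle terms small.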
Figure 1: Given any two unordered image collections X and Y, our algorithm learns to automatically “translate” an image from one into the other and vice versa: (left) Monet paintings and landscape photos from Flickr; (center) zebras and horses from ImageNet; (right) summer and winter Yosemite photos from Flickr. Example application (bottom): using a collection of paintings of famous artists, our method learns to render natural photographs into the respective styles. (Zhu et al., 2017)
Example results: horse → zebra, zebra → horse, summer Yosemite → winter Yosemite, winter Yosemite → summer Yosemite. (Zhu et al., 2017)
Example results: apple → orange, orange → apple. (Zhu et al., 2017)
While GANs are often used for their theoretical ability to model a distribution, generating consistent samples is enough for image-to-image translation. In particular, this application does not suffer much from mode collapse, as long as the generated images “look nice”. The key aspect of the GAN here is the “perceptual loss” implemented by the discriminator, more than the theoretical convergence to the true distribution.
The end
References
- A. Brock, J. Donahue, and K. Simonyan. Large scale GAN training for high fidelity natural image synthesis. CoRR, abs/1809.11096, 2018.
- V. Dumoulin, J. Shlens, and M. Kudlur. A learned representation for artistic style. CoRR, abs/1610.07629, 2016.
- P. Isola, J. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. CoRR, abs/1611.07004, 2016.
- J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision (ECCV), 2016.
- X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, and S. Smolley. Least squares generative adversarial networks. CoRR, abs/1611.04076, 2016.
- M. Mirza and S. Osindero. Conditional generative adversarial nets. CoRR, abs/1411.1784, 2014.
- A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015.
- A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. CoRR, abs/1612.07828, 2016.
- J. Zhu, T. Park, P. Isola, and A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. CoRR, abs/1703.10593, 2017.