SLIDE 1

Conditional Adversarial Networks

(or “mapping from A to B”)

CS448V — Computational Video Manipulation, May 22, 2019

SLIDE 2

SLIDE 3

Why?

  • Cool! Trendy! (see the Google Scholar citation counts for Pix2Pix and CycleGAN)
  • Hundreds of applications and follow-up works

SLIDE 4

Why?

  • Cool! Trendy! (see the Google Scholar citation counts for Pix2Pix and CycleGAN)
  • Hundreds of applications and follow-up works

SLIDE 5

Enhancing Transitions

SLIDE 6

Single-Photo Facial Animation

SLIDE 7

Text-based Editing

SLIDE 8

Few-Shot Reenactment

SLIDE 9

Digital Humans

SLIDE 10

Overview

  • Convolutional Neural Networks
  • Generative Modeling
  • Pix2Pix (“mapping from A to B”)
SLIDE 11

Convolutional Neural Network

  • 2D Convolution Layers (Conv2D)
  • Subsampling Layers (MaxPool, …)
  • Non-linearity Layers (ReLU, …)
  • Normalization Layers (BatchNorm, …)
  • Upsampling Layers (TransposedConv, …)

Components?

SLIDE 12

Convolutional Neural Network

  • 2D Convolution Layers (Conv2D)
  • Subsampling Layers (MaxPool, …)
  • Non-linearity Layers (ReLU, …)
  • Normalization Layers (BatchNorm, …)
  • Upsampling Layers (TransposedConv, …)

Components?
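To make the layer types above concrete, here is a minimal PyTorch sketch (my own illustration, not from the slides) that wires one of each together; the channel counts are arbitrary.

```python
import torch
import torch.nn as nn

# One example of each layer type listed above; channel sizes are arbitrary.
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),          # 2D convolution
    nn.BatchNorm2d(16),                                   # normalization
    nn.ReLU(),                                            # non-linearity
    nn.MaxPool2d(2),                                      # subsampling: halves H and W
    nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2),   # upsampling: doubles H and W
)

x = torch.randn(1, 3, 32, 32)   # a batch with one 32x32 RGB image
print(net(x).shape)             # torch.Size([1, 8, 32, 32])
```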

SLIDE 13

Convolution

A 32×32×3 input image (width 32, height 32, depth 3).

SLIDE 14

Convolution

A 5×5×3 filter and the 32×32×3 input image.

Convolve the filter with the image, i.e., “slide it over the image spatially, computing dot products.”

SLIDE 15

Convolution

Result: one number, the dot product between the filter and a small 5×5×3 chunk of the image, i.e., a 5×5×3 = 75-dimensional dot product plus a bias: $w^T x + b$.
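As a sketch of that single number (my own NumPy illustration, not from the slides): the 5×5×3 filter and a 5×5×3 image patch are flattened into 75-dimensional vectors and dotted, then a bias is added.

```python
import numpy as np

image = np.random.rand(32, 32, 3)   # 32x32x3 input image
w = np.random.rand(5, 5, 3)         # 5x5x3 filter
b = 0.1                             # bias

patch = image[0:5, 0:5, :]                    # one 5x5x3 chunk of the image
value = np.dot(w.ravel(), patch.ravel()) + b  # 75-dimensional dot product + bias
print(value)                                  # a single number of the activation map
```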

SLIDE 16

Convolution

Convolve (slide) the 5×5×3 filter over all spatial locations of the 32×32×3 image, producing a 28×28×1 activation map.
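A quick shape check of the 28×28 result (32 − 5 + 1 = 28 valid positions per axis), assuming stride 1 and no padding; this is my own snippet, not from the slides.

```python
import torch
import torch.nn.functional as F

image = torch.randn(1, 3, 32, 32)   # one 32x32x3 image (PyTorch is channels-first)
filt = torch.randn(1, 3, 5, 5)      # one 5x5x3 filter

activation = F.conv2d(image, filt)  # stride 1, no padding
print(activation.shape)             # torch.Size([1, 1, 28, 28])
```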

SLIDE 17

Convolution

Convolve (slide) the 5×5×3 filter over all spatial locations of the 32×32×3 image, producing a 28×28×1 activation map.

SLIDE 18

Convolution

Convolve (slide) over all spatial locations: the 32×32×3 image yields a 28×28×1 activation map.

Invariant to? Translation? Rotation? Scaling?

SLIDE 19

Convolution

Convolve (slide) over all spatial locations: the 32×32×3 image yields a 28×28×1 activation map.

Invariant to? Convolution is equivariant to translation, but it is not invariant to rotation or scaling.

SLIDE 20

Convolution

Convolve (slide) over all spatial locations of the 32×32×3 image to produce the activation map.

SLIDE 21

Convolution Layer

A convolution layer applies a bank of filters to the 32×32×3 image, producing an activation tensor (one activation map per filter).

SLIDE 22

Convolutional Neural Network

32×32×3 image → Convolution (e.g., 6 filters of size 5×5×3) + ReLU → ?

SLIDE 23

Convolutional Neural Network

32×32×3 image → Convolution (e.g., 6 filters of size 5×5×3) + ReLU → 28×28×6 tensor

SLIDE 24

Convolutional Neural Network

32×32×3 image → Convolution (6 filters of size 5×5×3) + ReLU → 28×28×6 tensor → Convolution (e.g., 10 filters of size 5×5×6) + ReLU → ?

SLIDE 25

Convolutional Neural Network

32×32×3 image → Convolution (6 filters of size 5×5×3) + ReLU → 28×28×6 tensor → Convolution (10 filters of size 5×5×6) + ReLU → 24×24×10 tensor → Convolution + ReLU → …
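A small sketch of the stack on this slide (shapes only, my own code): 6 filters of size 5×5×3 followed by 10 filters of size 5×5×6.

```python
import torch
import torch.nn as nn

layers = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(),    # 6 filters of size 5x5x3
    nn.Conv2d(6, 10, kernel_size=5), nn.ReLU(),   # 10 filters of size 5x5x6
)

x = torch.randn(1, 3, 32, 32)
print(layers[:2](x).shape)   # torch.Size([1, 6, 28, 28])
print(layers(x).shape)       # torch.Size([1, 10, 24, 24])
```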

SLIDE 26

Convolutional Neural Networks

[LeNet-5, LeCun et al., 1998]

SLIDE 27

Feature Hierarchy

Learn the features from data instead of hand engineering them! (If enough data is available)

SLIDE 28

U-Net

Skip connections

“Propagate low-level features directly, helps with details”
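A minimal encoder-decoder sketch with one skip connection (an illustration of the idea, not the actual Pix2Pix U-Net): the high-resolution encoder features are concatenated with the upsampled decoder features, so low-level detail can bypass the bottleneck.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Conv2d(16 + 16, 3, 3, padding=1)   # sees upsampled + skipped features

    def forward(self, x):
        skip = self.enc(x)               # high-resolution, low-level features
        h = self.down(skip)              # downsampled "bottleneck"
        h = self.up(h)                   # back to the input resolution
        h = torch.cat([h, skip], dim=1)  # the skip connection
        return self.dec(h)

print(TinyUNet()(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 3, 32, 32])
```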

SLIDE 29

Overview

  • Convolutional Neural Networks
  • Generative Modeling
  • Pix2Pix
SLIDE 30

Overview

  • Convolutional Neural Networks
  • Generative Modeling
  • Pix2Pix
SLIDE 31

Generative Modeling

We want to learn a density $q(X)$ from training data $\{\mathbf{y}_j\}_{j=1}^{N}$, such that we can “sample from it”: new samples $\mathbf{x} \sim q(X)$ are “more of the same!”
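As a toy illustration of “learn a density, then sample from it” (mine, not from the slides), with a 1-D Gaussian standing in for $q(X)$:

```python
import numpy as np

# training data {y_j}: samples from some unknown distribution
data = np.random.normal(loc=2.0, scale=0.5, size=1000)

# "learn" q(X): here just a Gaussian fitted via the sample mean and std
mu, sigma = data.mean(), data.std()

# draw new samples x ~ q(X): "more of the same"
print(np.random.normal(mu, sigma, size=5))
```

Deep generative models play the same role for far more complex distributions, such as images of faces.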

SLIDE 32

Generative 2D Face Modeling

The world needs more celebrities … or not … ?

Training data: celebrity face images $\{\mathbf{y}_j\}_{j=1}^{N}$; new samples $\mathbf{x} \sim q(X)$ are novel faces.

SLIDE 33

3.5 Years of Progress on Faces

SLIDE 34

https://thispersondoesnotexist.com

2018

SLIDE 35

StyleGAN - Interpolation

SLIDE 36

Overview

  • Convolutional Neural Networks
  • Generative Modeling
  • Pix2Pix (“mapping from A to B”)
SLIDE 37

Overview

  • Convolutional Neural Networks
  • Generative Modeling
  • Pix2Pix (“mapping from A to B”)
SLIDE 38

SLIDE 39

Image-to-Image Translation

SLIDE 40

Image-to-Image Translation

SLIDE 41

Image-to-Image Translation

SLIDE 42

Image-to-Image Translation

SLIDE 43

Image-to-Image Translation

[Zhang et al., ECCV 2016]

$\arg\min_{G} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, L(G(\mathbf{y}), \mathbf{z}) \,\big]$

$L$: loss function; $G$: neural network.

SLIDE 44

Image-to-Image Translation

[Zhang et al., ECCV 2016]

$\arg\min_{G} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, L(G(\mathbf{y}), \mathbf{z}) \,\big]$

$L$: loss function; $G$: neural network. Trained on paired data: input $\mathbf{y}$ with ground-truth output $\mathbf{z}$.
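A hedged sketch of this paired, supervised setup: a generator $G$ maps input $\mathbf{y}$ to an output and is trained by minimizing a hand-designed loss $L(G(\mathbf{y}), \mathbf{z})$ against the ground truth $\mathbf{z}$, here plain L1. The generator and the data batch below are stand-ins, not the actual Pix2Pix components.

```python
import torch
import torch.nn as nn

G = nn.Conv2d(3, 3, 3, padding=1)   # stand-in generator; the real G is a U-Net
opt = torch.optim.Adam(G.parameters(), lr=2e-4)
criterion = nn.L1Loss()             # the hand-designed loss L

# one dummy paired batch: input y and ground-truth output z
pairs = [(torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32))]

for y, z in pairs:
    opt.zero_grad()
    loss = criterion(G(y), z)       # L(G(y), z)
    loss.backward()
    opt.step()
```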

SLIDE 45

Image-to-Image Translation

[Zhang et al., ECCV 2016]

$\arg\min_{G} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, L(G(\mathbf{y}), \mathbf{z}) \,\big]$

$L$: loss function; $G$: neural network.

SLIDE 46

Image-to-Image Translation

[Zhang et al., ECCV 2016]

$\arg\min_{G} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, L(G(\mathbf{y}), \mathbf{z}) \,\big]$

The network $G$: “What should I do?”

SLIDE 47

Image-to-Image Translation

[Zhang et al., ECCV 2016]

$\arg\min_{G} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, L(G(\mathbf{y}), \mathbf{z}) \,\big]$

The network $G$: “What should I do?” The loss $L$: “How should I do it?”

SLIDE 48

Be careful what you wish for!

$L(\hat{\mathbf{z}}, \mathbf{z}) = \| \hat{\mathbf{z}} - \mathbf{z} \|_2^2$

SLIDE 49

Degradation to the mean!

$L(\hat{\mathbf{z}}, \mathbf{z}) = \| \hat{\mathbf{z}} - \mathbf{z} \|_2^2$
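Why the mean? For a fixed input $\mathbf{y}$, the prediction that minimizes the expected squared error is the conditional mean, so when several outputs are plausible (e.g., many valid colorizations) an L2-trained generator produces their blurry average:

$$\hat{\mathbf{z}}^{*}(\mathbf{y}) = \arg\min_{\hat{\mathbf{z}}}\; \mathbb{E}_{\mathbf{z}\mid\mathbf{y}}\big[\,\|\hat{\mathbf{z}} - \mathbf{z}\|_2^2\,\big] = \mathbb{E}\big[\,\mathbf{z}\mid\mathbf{y}\,\big]$$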

SLIDE 50

Automate Design of the Loss?

SLIDE 51

Automate Design of the Loss?

SLIDE 52

Automate Design of the Loss?

Deep learning got rid of handcrafted features. Can we also get rid of handcrafting the loss function?

SLIDE 53

Automate Design of the Loss?

Universal loss function?

Deep learning got rid of handcrafted features. Can we also get rid of handcrafting the loss function?

SLIDE 54

Automate Design of the Loss?

Universal loss function?

Deep learning got rid of handcrafted features. Can we also get rid of handcrafting the loss function?

SLIDE 55

Discriminator as a Loss Function

Discriminator (Classifier)

Real or Fake?

SLIDE 56

Conditional GAN

SLIDE 57

Conditional GAN

Generator (G) Input 𝐲 Output 𝐳

SLIDE 58

Conditional GAN

Discriminator (D)

Real or Fake?

Generator (G)

G tries to synthesize fake images that fool D; D tries to tell real from fake.

Input 𝐲 Output 𝐳

SLIDE 59

Conditional GAN (Discriminator)

$\arg\max_{D} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, \log D(G(\mathbf{y})) + \log(1 - D(\mathbf{z})) \,\big]$

D tries to identify the fakes: for a generated output $G(\mathbf{y})$ it should output “fake” (label 1, e.g., 0.9).

SLIDE 60

Conditional GAN (Discriminator)

$\arg\max_{D} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, \log D(G(\mathbf{y})) + \log(1 - D(\mathbf{z})) \,\big]$

D tries to identify the fakes (generated output $G(\mathbf{y})$, label 1, e.g., 0.9) and the real images (ground truth $\mathbf{z}$, label 0, e.g., 0.1).

SLIDE 61

Conditional GAN (Generator)

$\arg\min_{G} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, \log D(G(\mathbf{y})) + \log(1 - D(\mathbf{z})) \,\big]$

G tries to synthesize fake images that fool D, i.e., it wants its output $G(\mathbf{y})$ to be classified as “real” (label 0).

SLIDE 62

Conditional GAN (Generator)

$\arg\min_{G} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, \log D(G(\mathbf{y})) + \log(1 - D(\mathbf{z})) \,\big]$

G tries to synthesize fake images that fool D, i.e., it wants its output $G(\mathbf{y})$ to be classified as “real” (label 0).

SLIDE 63

Conditional GAN

$\arg\min_{G} \max_{D} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, \log D(G(\mathbf{y})) + \log(1 - D(\mathbf{z})) \,\big]$

G tries to synthesize fake images that fool the best D.

SLIDE 64

Conditional GAN

From G’s perspective, D is a loss function: rather than being hand-designed, it is learned jointly with G!

SLIDE 65

Conditional Discriminator

$\arg\min_{G} \max_{D} \; \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, \log D(\mathbf{y}, G(\mathbf{y})) + \log(1 - D(\mathbf{y}, \mathbf{z})) \,\big]$

The discriminator is conditioned on the input $\mathbf{y}$ as well: it judges (input, output) pairs.

SLIDE 66

Patch Discriminator

“Rather than penalizing if the output image looks fake, penalize if each overlapping patch in the output looks fake”
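A hedged sketch of the idea (not the exact 70×70 PatchGAN architecture from the paper): a fully convolutional discriminator has no final flatten or dense layer, so it outputs a grid of real/fake scores, one per overlapping receptive field.

```python
import torch
import torch.nn as nn

# Fully convolutional discriminator: the output stays spatial,
# giving one real/fake score per overlapping patch of the input.
patch_D = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, stride=1, padding=1),   # 1 score per patch
)

scores = patch_D(torch.randn(1, 3, 256, 256))
print(scores.shape)   # torch.Size([1, 1, 63, 63]): a grid of patch scores
```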

SLIDE 67

1x1 Pixel Discriminator

SLIDE 68

Image Discriminator

SLIDE 69

70x70 Patch Discriminator

SLIDE 70

Conditional Discriminator

$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, \log D(\mathbf{y}, G(\mathbf{y})) + \log(1 - D(\mathbf{y}, \mathbf{z})) \,\big]$

SLIDE 71

Reconstruction Loss

$\mathcal{L}_{L1}(G) = \mathbb{E}_{\mathbf{y},\mathbf{z}}\big[\, \| G(\mathbf{y}) - \mathbf{z} \|_1 \,\big]$

$G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G), \qquad \lambda = 100$

“Stable training + fast convergence”
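Putting the pieces together, a hedged sketch of one training iteration under this objective. The networks are stand-ins (the real $G$ is a U-Net and the real $D$ a patch discriminator), and I follow the slides' labeling where $D$ is pushed toward 1 on fakes and 0 on real images; most public implementations use the opposite labels, which is equivalent.

```python
import torch
import torch.nn as nn

G = nn.Conv2d(3, 3, 3, padding=1)                            # stand-in generator
D = nn.Conv2d(6, 1, 4, stride=2, padding=1)                  # stand-in conditional D on (y, output)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce, l1, lam = nn.BCEWithLogitsLoss(), nn.L1Loss(), 100.0    # lambda = 100

y, z = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)  # one dummy paired batch

# D step: push D(y, G(y)) toward "fake" (1) and D(y, z) toward "real" (0)
fake = G(y).detach()
d_fake = D(torch.cat([y, fake], dim=1))
d_real = D(torch.cat([y, z], dim=1))
loss_D = bce(d_fake, torch.ones_like(d_fake)) + bce(d_real, torch.zeros_like(d_real))
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# G step: fool D (make the fake look "real", i.e., 0) and stay close to z in L1
fake = G(y)
d_fake = D(torch.cat([y, fake], dim=1))
loss_G = bce(d_fake, torch.zeros_like(d_fake)) + lam * l1(fake, z)
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```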

SLIDE 72

Ablation Study

?

SLIDE 73

Ablation Study

SLIDE 74

Results on the Test Split

SLIDE 75

Results for Hand Drawings

?

SLIDE 76

Demo: Pix2Pix

SLIDE 77

Limitations

  • 1. Paired data is required
  • 2. Temporally unstable if applied per-frame to a video sequence
  • 3. Does not generalize to 3D transformations
SLIDE 78

CycleGAN

SLIDE 79

Cycle Consistency

SLIDE 80

CycleGAN

SLIDE 81

Recycle-GAN

SLIDE 82

Limitations

  • 1. Paired data is required
  • 2. Temporally unstable if applied per-frame to a video sequence
  • 3. Does not generalize to 3D transformations
SLIDE 83

Limitations

  • 1. Paired data is required
  • 2. Temporally unstable if applied per-frame to a video sequence
  • 3. Does not generalize to 3D transformations
SLIDE 84

Vid2Vid

SLIDE 85

Vid2Vid

SLIDE 86

Limitations

  • 1. Paired data is required
  • 2. Temporally unstable if applied per-frame to a video sequence
  • 3. Does not generalize to 3D transformations
SLIDE 87

Limitations

  • 1. Paired data is required
  • 2. Temporally unstable if applied per-frame to a video sequence
  • 3. Does not generalize to 3D transformations
SLIDE 88

DeepVoxels

SLIDE 89

Summary

  • Convolutional Neural Networks
  • Generative Modeling
  • Pix2Pix (“mapping from A to B”)
SLIDE 90

References

  • CVPR 2018 Tutorial on GANs: https://sites.google.com/view/cvpr2018tutorialongans
  • CS231n (Winter 2016, Lecture 7): http://cs231n.stanford.edu/slides/2016/winter1516_lecture7.pdf