StyleGAN
- Prof. Leal-Taixé and Prof. Niessner
[Karras et al. 19] StyleGAN
Figure: the traditional generator vs. the style-based generator.
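The difference in a nutshell: the traditional generator feeds the latent code z directly into the first layer, while the style-based generator first maps z to an intermediate code w with an MLP, then injects w into every resolution block via adaptive instance normalization (AdaIN). A minimal PyTorch sketch of these two ingredients (layer sizes and module names are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """z -> w: an MLP mapping the latent code into the intermediate style space W."""
    def __init__(self, z_dim=512, w_dim=512, n_layers=8):
        super().__init__()
        layers, dim = [], z_dim
        for _ in range(n_layers):
            layers += [nn.Linear(dim, w_dim), nn.LeakyReLU(0.2)]
            dim = w_dim
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

class AdaIN(nn.Module):
    """Adaptive instance norm: normalize features, then scale/shift per channel from w."""
    def __init__(self, w_dim, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)
        self.style = nn.Linear(w_dim, channels * 2)  # predicts per-channel (scale, bias)

    def forward(self, x, w):
        scale, bias = self.style(w).chunk(2, dim=1)
        x = self.norm(x)
        return x * (1 + scale[:, :, None, None]) + bias[:, :, None, None]
```

The `1 + scale` keeps the modulation near the identity at initialization, mirroring how StyleGAN initializes its style scales around one.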
FID (Fréchet inception distance) on 50k generated images
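FID fits a Gaussian to Inception features of real and of generated images and measures the distance between the two Gaussians; lower is better. A minimal NumPy/SciPy sketch given precomputed feature matrices (the Inception-v3 feature extraction itself is omitted):

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_gen):
    """Fréchet inception distance between two feature matrices of shape (N, D)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # FID = ||mu_r - mu_g||^2 + Tr(cov_r + cov_g - 2 (cov_r cov_g)^(1/2))
    covmean = linalg.sqrtm(cov_r @ cov_g).real  # drop tiny imaginary numerical noise
    return np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2 * covmean)
```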
https://youtu.be/kSLJriaOumA
Interesting analysis of the design choices in the follow-up work, StyleGAN2:
– https://arxiv.org/pdf/1912.04958.pdf
– https://github.com/NVlabs/stylegan2
– https://youtu.be/c-NJtV9Jvp0
– i.e., outputs are samples (the distribution is contained in the model), governed by a prior imposed by the model structure
– i.e., outputs are probabilities (e.g., a softmax)
Explicitly model probability distributions:
– Model an image as a sequence problem
– Predict one pixel at a time
– The next pixel is determined by all previously predicted pixels
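Formally, the joint distribution over an $n \times n$ image is factorized with the chain rule, generating pixels in raster-scan order (this is the PixelRNN factorization):

$$p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \dots, x_{i-1})$$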
[Van den Oord et al. 2016]
For RGB: $y_j \in \{0, \dots, 255\}$ → 256-way softmax
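In other words, each (sub-)pixel becomes a 256-way classification problem trained with ordinary cross-entropy. A toy PyTorch sketch of such an output head (all shapes illustrative):

```python
import torch
import torch.nn as nn

B, H, W = 8, 32, 32
features = torch.randn(B, 128, H, W)           # backbone features
head = nn.Conv2d(128, 3 * 256, kernel_size=1)  # 256 logits per RGB channel

logits = head(features).view(B, 3, 256, H, W)
target = torch.randint(0, 256, (B, 3, H, W))   # ground-truth 8-bit pixel values

# cross-entropy over the 256 classes, per pixel and per channel
loss = nn.functional.cross_entropy(logits.permute(0, 2, 1, 3, 4), target)
```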
– Can compute the pixels within a row in parallel
Problem with this architecture: $q_{j,k}$ depends on $q_{j,k-1}$ and $q_{j-1,k}$
– Already-generated pixel values can be used as context
– The current pixel itself must be masked during the 1st conv, so the model never sees the values it predicts
64x64 images, trained on ImageNet
PixelRNN:
– Unbounded dependency range within the receptive field
– Can be very computationally costly
PixelCNN:
– Standard convs capture only a bounded receptive field
– All pixel features can be computed at once (during training)
– Mask the convolution kernels along the spatial dimensions to keep the model from seeing future context
Figure: Mask A (source: http://sergeiturukin.com/2017/02/22/pixelcnn.html)
– Goal: reduce the performance gap between PixelCNN and PixelRNN
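One common way to implement this masking (e.g., in reimplementations like the blog post above) is to zero out kernel weights before every forward pass; mask A, used only in the first layer, also hides the center pixel so a pixel never conditions on its own value. A minimal single-channel sketch (the full RGB variant additionally masks across the R→G→B channel ordering):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv whose kernel only sees pixels above and to the left.
    mask_type 'A' (1st layer) also hides the center pixel; 'B' keeps it."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.zeros(1, 1, kH, kW)
        mask[:, :, :kH // 2, :] = 1           # all rows above the center
        mask[:, :, kH // 2, :kW // 2] = 1     # pixels left of center in the center row
        if mask_type == 'B':
            mask[:, :, kH // 2, kW // 2] = 1  # the center pixel itself
        self.register_buffer('mask', mask)

    def forward(self, x):
        self.weight.data *= self.mask         # enforce the mask on every forward pass
        return super().forward(x)

conv_a = MaskedConv2d('A', 1, 64, kernel_size=7, padding=3)
```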
$z = \tanh(X_{l,g} \ast y) \odot \tau(X_{l,f} \ast y)$
(the subscript $l$ indexes the layer; $\tau$ is the sigmoid, $\odot$ the element-wise product, $\ast$ convolution)
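In code, the gated unit is just two parallel masked convolutions, one squashed with tanh and one with a sigmoid gate, multiplied element-wise. A sketch reusing a masked conv like the one above (names illustrative):

```python
import torch

def gated_activation(x, conv_t, conv_s):
    """z = tanh(conv_t(x)) * sigmoid(conv_s(x)): the gated PixelCNN unit."""
    return torch.tanh(conv_t(x)) * torch.sigmoid(conv_s(x))
```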
http://sergeiturukin.com/2017/02/24/gated-pixelcnn.html
Figure (5×5 image, 3×3 masked conv): the receptive field covers rows above and the left part of the current row, but some valid context above remains unseen — the blind spot.
$z = \tanh(X_{l,g} \ast y + W_{l,g}\, h) \odot \tau(X_{l,f} \ast y + W_{l,f}\, h)$
where $h$ is the latent vector to be conditioned on
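Conditioning just adds a learned projection of h inside both paths before the nonlinearities; for class-conditional generation h can be a one-hot label or an embedding. A sketch extending the gated unit above (projection layers illustrative):

```python
import torch

def conditional_gated_activation(x, h, conv_t, conv_s, proj_t, proj_s):
    """z = tanh(conv_t(x) + proj_t(h)) * sigmoid(conv_s(x) + proj_s(h))."""
    # proj_*(h) yields (B, C) vectors, broadcast over the spatial dimensions
    t = conv_t(x) + proj_t(h)[:, :, None, None]
    s = conv_s(x) + proj_s(h)[:, :, None, None]
    return torch.tanh(t) * torch.sigmoid(s)
```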
Autoregressive models:
– Explicitly model probability densities
– More stable training
– Can be applied to both discrete and continuous data
GANs:
– Have been empirically demonstrated to produce higher-quality images
– Faster to train
[Razavi et al. 19] Generating Diverse High-Fidelity Images with VQ-VAE-2 (Vector Quantized Variational AutoEncoder)
https://arxiv.org/pdf/1906.00446.pdf
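The core VQ-VAE idea: the encoder output is snapped to the nearest entry of a learned codebook, the decoder reconstructs from those discrete codes, and VQ-VAE-2 stacks this at two resolutions with a PixelCNN-style autoregressive prior over the codes. A minimal sketch of the quantization step with a straight-through gradient (codebook size illustrative):

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z_e):                 # z_e: (B, D, H, W) encoder output
        B, D, H, W = z_e.shape
        flat = z_e.permute(0, 2, 3, 1).reshape(-1, D)
        # distance to every codebook entry, pick the nearest code per position
        dists = torch.cdist(flat, self.codebook.weight)
        idx = dists.argmin(dim=1)
        z_q = self.codebook(idx).view(B, H, W, D).permute(0, 3, 1, 2)
        # straight-through estimator: copy gradients from z_q back to z_e
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx.view(B, H, W)       # quantized features and discrete codes
```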
Two options:
– A single random variable z seeds the entire video (all frames)
– A random variable z for each frame of the video
General issues:
– Temporal coherency
– Drift over time (many models collapse to the mean image)
[Clark et al. 2019] Adversarial Video Generation on Complex Datasets
– Resolutions of 256×256, 128×128, and 64×64
– Lengths of up to 48 frames
– Each frame is high quality, but temporally inconsistent
Wang et al. 18: Vid2Vid
Sequential generator: conditions on the past L source frames and the past L generated frames (set L = 2).
Full learning objective: $\min_G \big( \max_{D_I} \mathcal{L}_I + \max_{D_V} \mathcal{L}_V \big) + \lambda_W \mathcal{L}_W$ — an image-level GAN loss, a video-level GAN loss, and a flow-estimation loss.
– A separate discriminator for the temporal parts
– Considers the recent history of previous frames
– All of it is trained jointly
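Putting the pieces together, frame generation is autoregressive over time: each output frame is produced from a sliding window of source frames and previously generated frames. A schematic sketch (function names and tensor layout are illustrative, not the authors' code):

```python
import torch

def generate_video(G, source_frames, L=2):
    """Autoregressively translate a source video, frame by frame.
    G sees the current + last L source frames and the last L generated frames."""
    # boot-strap with blank frames (illustrative; the real model treats the
    # first frames specially)
    generated = [torch.zeros_like(source_frames[0]) for _ in range(L)]
    for t in range(L, len(source_frames)):
        src_window = torch.cat(source_frames[t - L:t + 1], dim=1)  # incl. current
        gen_window = torch.cat(generated[-L:], dim=1)
        generated.append(G(src_window, gen_window))
    return generated[L:]
```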
Siggraph’18 [Kim et al 18]: Deep Portraits
Similar to “Image-to-Image Translation” (Pix2Pix) [Isola et al.]
A neural network converts synthetic data into realistic video
Interactive video editing with SGD
[Chan et al. ’18] Everybody Dance Now
– Different input, which requires accurate tracking
– No explicit 3D notion
– Tracking quality translates into resulting image quality
– Tracking human skeletons is less developed than tracking faces
– Fun fact: about 4 papers with a similar idea appeared around the same time…
– Neural Rendering
– 3D Deep Learning