

  1. StyleGAN (Prof. Leal-Taixé and Prof. Niessner)

  2. StyleGAN • Style-based generator vs. traditional generator [Karras et al. 19] StyleGAN

  3. StyleGAN • Style-based generator vs. traditional generator (cont.) [Karras et al. 19] StyleGAN
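The key difference on these two slides: the style-based generator first maps z through an MLP to an intermediate latent w, which then modulates each resolution block via adaptive instance normalization (AdaIN). A minimal PyTorch sketch of that operation (class and variable names are illustrative, not from the paper's code):

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: normalize each feature map,
    then scale/shift it with a style-dependent affine transform."""
    def __init__(self, w_dim, num_channels):
        super().__init__()
        # Learned affine map from the intermediate latent w to (scale, bias)
        self.affine = nn.Linear(w_dim, 2 * num_channels)

    def forward(self, x, w):
        # x: (N, C, H, W) feature maps; w: (N, w_dim) style vector
        style = self.affine(w)                  # (N, 2C)
        scale, bias = style.chunk(2, dim=1)     # (N, C) each
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-8
        x_norm = (x - mu) / sigma               # instance-normalize per channel
        return scale[:, :, None, None] * x_norm + bias[:, :, None, None]

# The mapping network z -> w is just an MLP (8 layers in the paper):
mapping = nn.Sequential(*[layer for _ in range(8)
                          for layer in (nn.Linear(512, 512), nn.LeakyReLU(0.2))])
```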

  4. StyleGAN • FID (Fréchet inception distance) on 50k generated images → architecture is similar to Progressive Growing GAN [Karras et al. 19] StyleGAN
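For reference, FID compares the Gaussian statistics of Inception activations for real vs. generated images. A small NumPy/SciPy sketch, assuming act_real and act_fake are precomputed Inception pool features (e.g. 50k x 2048 arrays):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(act_real, act_fake):
    """Frechet inception distance between two sets of Inception activations,
    each of shape (num_images, feature_dim)."""
    mu_r, mu_f = act_real.mean(axis=0), act_fake.mean(axis=0)
    cov_r = np.cov(act_real, rowvar=False)
    cov_f = np.cov(act_fake, rowvar=False)
    # ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^(1/2))
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):   # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    return np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2 * covmean)
```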

  5. StyleGAN https://youtu.be/kSLJriaOumA [Karras et al. 19] StyleGAN

  6. StyleGAN https://youtu.be/kSLJriaOumA [Karras et al. 19] StyleGAN

  7. StyleGAN2 • Interesting analysis of design choices! – Paper: https://arxiv.org/pdf/1912.04958.pdf – Code: https://github.com/NVlabs/stylegan2 – Video: https://youtu.be/c-NJtV9Jvp0

  8. Autoregressive Models

  9. Autoregressive Models vs GANs • GANs learn an implicit data distribution – i.e., the outputs are samples (the distribution lives in the model) • Autoregressive models learn an explicit distribution governed by a prior imposed by the model structure – i.e., the outputs are probabilities (e.g., a softmax)

  10. PixelRNN • Goal: model the distribution of natural images • Interpret the pixels of an image as a product of conditional distributions (see the factorization below) – Modeling an image becomes a sequence problem – Predict one pixel at a time – The next pixel is determined by all previously predicted pixels → use a Recurrent Neural Network [Van den Oord et al 2016]
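Written out, the factorization from the paper treats an n x n image x as a sequence of n^2 pixels:

```latex
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \dots, x_{i-1})
```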

  11. PixelRNN • For RGB images [Van den Oord et al 2016]
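For RGB, each of the three channels is additionally conditioned on the channels already generated for that pixel:

```latex
p(x_i \mid \mathbf{x}_{<i}) =
  p(x_{i,R} \mid \mathbf{x}_{<i})\,
  p(x_{i,G} \mid \mathbf{x}_{<i}, x_{i,R})\,
  p(x_{i,B} \mid \mathbf{x}_{<i}, x_{i,R}, x_{i,G})
```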

  12. PixelRNN • $y_j \in \{0, \dots, 255\}$ → 256-way softmax [Van den Oord et al 2016]

  13. PixelRNN • Row LSTM model architecture • Image processed row by row • The hidden state of a pixel depends on the 3 pixels above it – pixels within a row can be computed in parallel • Incomplete context for each pixel [Van den Oord et al 2016]

  14. PixelRNN • Diagonal BiLSTM model architecture • Solves the incomplete-context problem • The hidden state of pixel $q_{j,k}$ depends on $q_{j,k-1}$ and $q_{j-1,k}$ • Image processed along diagonals [Van den Oord et al 2016]

  15. PixelRNN • Masked convolutions • Only previously predicted values can be used as context • Mask A: restricts context during the 1st convolution • Mask B: used in subsequent convolutions • Masking by zeroing out kernel values (sketch below) [Van den Oord et al 2016]
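A minimal PyTorch sketch of masks A and B (single-channel view; the per-pixel R→G→B channel ordering from the paper is omitted for brevity):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv2d whose kernel is zeroed below/right of the center pixel.
    Mask 'A' also excludes the center (the pixel being predicted);
    mask 'B' allows it."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        kh, kw = self.kernel_size
        mask = torch.ones_like(self.weight)                      # (out_c, in_c, kh, kw)
        mask[:, :, kh // 2, kw // 2 + (mask_type == "B"):] = 0   # center row, right of center
        mask[:, :, kh // 2 + 1:, :] = 0                          # all rows below
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask   # zero out future context before convolving
        return super().forward(x)

# First layer: mask A (exclude the current pixel); later layers: mask B
first = MaskedConv2d("A", 3, 64, kernel_size=7, padding=3)
later = MaskedConv2d("B", 64, 64, kernel_size=3, padding=1)
```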

  16. PixelRNN • Generated 64x64 images, trained on ImageNet [Van den Oord et al 2016]

  17. PixelCNN • Row and Diagonal LSTM layers have a potentially unbounded dependency range within the receptive field – can be very computationally costly → PixelCNN: – standard convolutions capture a bounded receptive field – all pixel features can be computed at once (during training) [Van den Oord et al 2016]
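Training is parallel, but generation is still sequential: each pixel is drawn from its 256-way softmax and fed back in. A sketch, assuming a grayscale model mapping an image to per-pixel logits of shape (N, 256, H, W):

```python
import torch

@torch.no_grad()
def sample(model, shape=(1, 1, 28, 28)):
    """Generate images pixel by pixel, feeding each sample back in."""
    img = torch.zeros(shape)
    _, _, h, w = shape
    for i in range(h):
        for j in range(w):
            logits = model(img)                         # (N, 256, H, W)
            probs = logits[:, :, i, j].softmax(dim=1)   # (N, 256)
            pixel = torch.multinomial(probs, 1)         # (N, 1), values in [0, 255]
            img[:, 0, i, j] = pixel[:, 0].float() / 255.0
    return img
```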

  18. PixelCNN • Model preserves spatial dimensions • Masked convolutions avoid seeing future context (Mask A) http://sergeiturukin.com/2017/02/22/pixelcnn.h [Van den Oord et al 2016]

  19. Gated PixelCNN • Gated blocks • Imitate the multiplicative complexity of PixelRNNs to reduce the performance gap between PixelCNN and PixelRNN • Replace ReLU with a gated block of sigmoid and tanh: $z = \tanh(W_{k,f} \ast y) \odot \sigma(W_{k,g} \ast y)$ ($\ast$: convolution, $\odot$: element-wise product, $k$: layer index, $\sigma$: sigmoid; sketch below) [Van den Oord et al 2016]
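A sketch of the gated activation in PyTorch. Masking is omitted here to keep it minimal; in the real model both convolutions are masked, and implementations commonly use a single convolution producing 2C channels that is then split in two:

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Gated activation unit: z = tanh(conv_f(y)) * sigmoid(conv_g(y))."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv_f = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.conv_g = nn.Conv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, y):
        # Multiplicative interaction replaces the plain ReLU
        return torch.tanh(self.conv_f(y)) * torch.sigmoid(self.conv_g(y))
```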

  20. PixelCNN Blind Spot • (Figure: receptive field of stacked masked 3x3 convolutions on a 5x5 image vs. the unseen context) http://sergeiturukin.com/2017/02/24/gated-pixelcnn.html [Van den Oord et al 2016]

  21. PixelCNN: Eliminating the Blind Spot • Split the convolution into two stacks (sketch below) • Horizontal stack conditions on the current row • Vertical stack conditions on the pixels above [Van den Oord et al 2016]
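Conceptually, the two stacks correspond to two differently masked kernels; combining their features gives every pixel the full context above and to its left. A sketch of just the masks (padding/shift details vary between implementations):

```python
import torch

k = 5
# Vertical stack: each output pixel sees only rows strictly above it.
v_mask = torch.ones(k, k)
v_mask[k // 2:, :] = 0
# Horizontal stack: only pixels strictly to the left in the current row.
h_mask = torch.zeros(1, k)
h_mask[0, :k // 2] = 1
# Summing features from the two stacks gives every pixel the full
# triangular context above and to the left: the blind spot is gone.
```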

  22. Conditional PixelCNN • Conditional image generation • E.g., condition on a semantic class or a text description, encoded as a latent vector $h$: $z = \tanh(W_{k,f} \ast y + V_{k,f}\,h) \odot \sigma(W_{k,g} \ast y + V_{k,g}\,h)$ (sketch below) [Van den Oord et al 2016]
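Extending the gated block sketch above with the conditioning term: h (e.g. a one-hot class label) is projected and added as a spatially constant bias inside both nonlinearities. Names are illustrative:

```python
import torch
import torch.nn as nn

class ConditionalGatedBlock(nn.Module):
    """Gated block conditioned on a latent vector h."""
    def __init__(self, channels, h_dim, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv_f = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.conv_g = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.proj_f = nn.Linear(h_dim, channels)   # V_{k,f} h
        self.proj_g = nn.Linear(h_dim, channels)   # V_{k,g} h

    def forward(self, y, h):
        bias_f = self.proj_f(h)[:, :, None, None]  # broadcast over H, W
        bias_g = self.proj_g(h)[:, :, None, None]
        return (torch.tanh(self.conv_f(y) + bias_f)
                * torch.sigmoid(self.conv_g(y) + bias_g))
```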

  23. Conditional PixelCNN [Van den Oord et al 2016]

  24. Autoregressive Models vs GANs • Advantages of autoregressive models: – explicitly model probability densities – more stable training – applicable to both discrete and continuous data • Advantages of GANs: – empirically demonstrated to produce higher-quality images – faster to train

  25. Autoregressive Models • State of the art is pretty impressive → Vector Quantized Variational AutoEncoder: "Generating Diverse High-Fidelity Images with VQ-VAE-2" [Razavi et al. 19] https://arxiv.org/pdf/1906.00446.pdf

  26. Generative Models on Videos

  27. GANs on Videos • Two options: – a single random variable z seeds the entire video (all frames): very high-dimensional output; how to handle variable length?; future frames are deterministic given the past – a random variable z for each frame of the video: need to condition the future on the past; how to combine past frames and random vectors during training? • General issues: – temporal coherency – drift over time (many models collapse to the mean image)

  28. GANs on Videos: DVD-GAN [Clark et al. 2019] Adversarial Video Generation on Complex Datasets

  29. GANs on Videos: DVD-GAN [Clark et al. 2019] Adversarial Video Generation on Complex Datasets

  30. GANs on Videos: DVD-GAN • Trained on the Kinetics-600 dataset – 256x256, 128x128, and 64x64 – lengths of up to 48 frames → this is state of the art! → generating videos from scratch is still incredibly challenging [Clark et al. 2019] Adversarial Video Generation on Complex Datasets

  31. Conditional GANs on Videos • Challenge: each individual frame is high quality, but the frames are temporally inconsistent

  32. Video-to-Video Synthesis • Sequential generator (sketch below): the next frame is generated from the past L generated frames and the past L source frames (set L = 2) • Conditional image discriminator $D_I$ (is it a real image?) • Conditional video discriminator $D_V$ (temporal consistency via optical flow) • Full learning objective: $\min_F \big( \max_{D_I} \mathcal{L}_I(F, D_I) + \max_{D_V} \mathcal{L}_V(F, D_V) \big) + \lambda_W \mathcal{L}_W(F)$, where $\mathcal{L}_W$ is the flow estimation loss [Wang et al. 18: Vid2Vid]
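A sketch of the sequential-generator interface only; the actual vid2vid network also predicts optical flow and occlusion masks, and `backbone` here is a placeholder for the image-translation network:

```python
import torch
import torch.nn as nn

class SequentialGenerator(nn.Module):
    """Next frame from the past L source frames, the current source
    frame, and the past L generated frames (L = 2 on the slide)."""
    def __init__(self, backbone, L=2):
        super().__init__()
        self.backbone = backbone
        self.L = L

    def forward(self, src_frames, gen_frames):
        # src_frames: source frames up to time t (e.g. semantic label maps)
        # gen_frames: previously generated frames, each (N, C, H, W)
        context = src_frames[-(self.L + 1):] + gen_frames[-self.L:]
        x = torch.cat(context, dim=1)   # stack along the channel dimension
        return self.backbone(x)
```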

  33. Video-to-Video Synthesis [Wang et al. 18: Vid2Vid]

  34. Video-to-Video Synthesis [Wang et al. 18: Vid2Vid]

  35. Video-to-Video Synthesis • Key ideas: – a separate discriminator for the temporal part (here based on optical flow) – consider the recent history of previous frames – train all of it jointly [Wang et al. 18: Vid2Vid]

  36. Deep Video Portraits [Kim et al. 18, SIGGRAPH'18]: Deep Video Portraits

  37. Deep Video Portraits • Similar to "Image-to-Image Translation" (Pix2Pix) [Isola et al.] [Kim et al. 18, SIGGRAPH'18]

  38. Deep Video Portraits [Kim et al. 18, SIGGRAPH'18]

  39. Deep Video Portraits • A neural network converts synthetic renderings into realistic video [Kim et al. 18, SIGGRAPH'18]

  40. Deep Video Portraits [Kim et al. 18, SIGGRAPH'18]

  41. Deep Video Portraits [Kim et al. 18, SIGGRAPH'18]

  42. Deep Video Portraits [Kim et al. 18, SIGGRAPH'18]

  43. Deep Video Portraits • Interactive video editing [Kim et al. 18, SIGGRAPH'18]

  44. Deep Video Portraits: Insights • Synthetic data for tracking is a great anchor/stabilizer • Overfitting on small datasets works pretty well • Need to stay within the training set w.r.t. motions • No real learning; essentially, optimizing the problem with SGD → should be pretty interesting for future directions [Kim et al. 18, SIGGRAPH'18]

  45. Everybody Dance Now [Chan et al. '18] Everybody Dance Now

  46. Everybody Dance Now [Chan et al. '18] Everybody Dance Now

  47. Everybody Dance Now [Chan et al. '18] Everybody Dance Now

  48. Everybody Dance Now – cGANs work with different kinds of input – requires consistent input, i.e., accurate tracking – the network has no explicit notion of 3D [Chan et al. '18] Everybody Dance Now
