

SLIDE 1

StyleGAN

Prof. Leal-Taixé and Prof. Niessner

SLIDE 2

StyleGAN

[Karras et al. 19] StyleGAN

Figure: traditional generator vs. style-based generator.
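
To make the figure concrete, here is a minimal sketch of the style-based generator's core mechanism, assuming hypothetical names and sizes (MappingNetwork, AdaIN, 512-dim latents); it is illustrative, not the official NVIDIA implementation:

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Maps latent z to intermediate latent w (the paper uses 8 FC layers)."""
    def __init__(self, z_dim=512, w_dim=512, num_layers=8):
        super().__init__()
        layers = []
        for _ in range(num_layers):
            layers += [nn.Linear(z_dim, w_dim), nn.LeakyReLU(0.2)]
            z_dim = w_dim
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

class AdaIN(nn.Module):
    """Adaptive instance norm: per-channel scale/bias predicted from w."""
    def __init__(self, channels, w_dim=512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)
        self.affine = nn.Linear(w_dim, 2 * channels)  # w -> (scale, bias)

    def forward(self, x, w):
        scale, bias = self.affine(w).chunk(2, dim=1)  # [B, C] each
        return (1 + scale[:, :, None, None]) * self.norm(x) + bias[:, :, None, None]

# Usage: w modulates every conv block instead of z seeding the input layer.
w = MappingNetwork()(torch.randn(4, 512))
x = AdaIN(64)(torch.randn(4, 64, 32, 32), w)
```

In contrast to the traditional generator, z never enters the synthesis network directly; each resolution block is modulated by w, which is what gives the coarse-to-fine "style" control.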

SLIDE 3

StyleGAN

[Karras et al. 19] StyleGAN

Figure: traditional generator vs. style-based generator.

SLIDE 4

StyleGAN

[Karras et al. 19] StyleGAN

FID (Fréchet Inception Distance, defined below) computed on 50k generated images

  • > Architecture is similar to Progressive Growing GAN
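
For reference, FID fits a Gaussian (mean μ, covariance Σ) to Inception features of real (r) and generated (g) images and compares the two distributions; lower is better:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```
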
SLIDE 5

StyleGAN

[Karras et al. 19] StyleGAN

https://youtu.be/kSLJriaOumA

SLIDE 6

StyleGAN

[Karras et al. 19] StyleGAN

https://youtu.be/kSLJriaOumA

SLIDE 7

StyleGAN2

An interesting analysis of the design choices!

– https://arxiv.org/pdf/1912.04958.pdf
– https://github.com/NVlabs/stylegan2
– https://youtu.be/c-NJtV9Jvp0


SLIDE 8

Autoregressive Models

SLIDE 9

Autoregressive Models vs. GANs

  • GANs learn an implicit data distribution

– i.e., outputs are samples (the distribution lives inside the model)

  • Autoregressive models learn an explicit distribution governed by a prior imposed by the model structure

– i.e., outputs are probabilities (e.g., a softmax; see the sketch below)
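
A toy illustration of the contrast, with hypothetical stand-in modules (not from the lecture code): the GAN only ever emits samples, while the autoregressive head emits a normalized distribution that can be evaluated or sampled:

```python
import torch
import torch.nn as nn

# Toy stand-ins, purely for illustration.
gan_generator = nn.Linear(512, 784)   # z -> flattened "image" sample
ar_head = nn.Linear(128, 256)         # context features -> 256 logits

# Implicit (GAN): the model outputs a sample; p(x) is never materialized.
z = torch.randn(1, 512)
sample = gan_generator(z)

# Explicit (autoregressive): the model outputs probabilities for the next
# pixel, so p(x_i | x_<i) can be evaluated, maximized, or sampled from.
probs = torch.softmax(ar_head(torch.randn(1, 128)), dim=-1)  # sums to 1
next_pixel = torch.multinomial(probs, num_samples=1)
```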


SLIDE 10

PixelRNN

  • Goal: model the distribution of natural images
  • Interpret the pixels of an image as a product of conditional distributions (see the factorization below)

– Modeling an image becomes a sequence problem
– Predict one pixel at a time
– The next pixel is determined by all previously predicted pixels

  • Use a Recurrent Neural Network
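
The factorization referred to above is the standard chain rule over the n² pixels of an n×n image, taken in raster-scan order (as in the PixelRNN paper):

```latex
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})
```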

[Van den Oord et al 2016]

SLIDE 11

PixelRNN

[Van den Oord et al 2016]

For RGB: each pixel's three color channels are predicted in sequence, each conditioned on the previously generated channels.

SLIDE 12

PixelRNN

y_j ∈ {0, …, 255} → 256-way softmax over the discrete pixel values
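
A minimal sketch of such an output head (assumed shapes, not the paper's exact code): predict 256 logits per channel per pixel and train with cross-entropy against the discrete ground-truth value:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, C, H, W = 4, 3, 32, 32             # batch, RGB channels, spatial dims
features = torch.randn(B, 64, H, W)   # stand-in for the network's features

head = nn.Conv2d(64, C * 256, kernel_size=1)    # 256 logits per channel
logits = head(features).view(B, C, 256, H, W)

target = torch.randint(0, 256, (B, C, H, W))    # ground-truth 8-bit pixels
# cross_entropy expects the class dim second, so fold channels into batch.
loss = F.cross_entropy(logits.view(B * C, 256, H, W),
                       target.view(B * C, H, W))
```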

[Van den Oord et al 2016]

SLIDE 13

PixelRNN

  • Row LSTM model architecture
  • Image processed row by row
  • Hidden state of a pixel depends on the 3 pixels above it

– Pixels within a row can be computed in parallel

  • Incomplete context for each pixel


[Van den Oord et al 2016]

SLIDE 14

PixelRNN

  • Diagonal BiLSTM model architecture
  • Solves the incomplete-context problem
  • Hidden state of pixel q_{j,k} depends on q_{j,k−1} and q_{j−1,k}
  • Image processed along its diagonals

[Van den Oord et al 2016]

SLIDE 15

PixelRNN

  • Masked convolutions: only previously predicted values can be used as context
  • Mask A: restricts the context during the first convolution (also excludes the current pixel)
  • Mask B: used in subsequent convolutions (may include the current position, which by then holds features rather than the raw pixel)
  • Masking is implemented by zeroing out kernel weights (see the sketch below)
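
A common way to implement this (a sketch in the PixelCNN style, not the authors' code; per-channel RGB masking is omitted for brevity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Conv whose kernel is zeroed to the right of / below the center.
    Mask 'A' also zeroes the center weight; mask 'B' keeps it."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        kH, kW = self.kernel_size
        mask = torch.ones(kH, kW)
        mask[kH // 2, kW // 2 + (mask_type == "B"):] = 0  # center row, right part
        mask[kH // 2 + 1:, :] = 0                         # all rows below
        self.register_buffer("mask", mask[None, None])    # [1, 1, kH, kW]

    def forward(self, x):
        # Zero out "future" weights on every forward pass.
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

# First layer sees raw pixels -> mask A; later layers -> mask B.
layer1 = MaskedConv2d("A", 3, 64, kernel_size=7, padding=3)
layer2 = MaskedConv2d("B", 64, 64, kernel_size=3, padding=1)
out = layer2(torch.relu(layer1(torch.randn(1, 3, 32, 32))))
```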


[Van den Oord et al 2016]

SLIDE 16

PixelRNN

  • Generated 64×64 images, trained on ImageNet


[Van den Oord et al 2016]

SLIDE 17

PixelCNN

  • Row and Diagonal LSTM layers have a potentially unbounded dependency range within the receptive field

– Can be very computationally costly

  • PixelCNN:

– Standard convolutions capture a bounded receptive field
– All pixel features can be computed at once (during training)


[Van den Oord et al 2016]

SLIDE 18

PixelCNN

  • The model preserves spatial dimensions
  • Masked convolutions avoid seeing future context


[Van den Oord et al 2016]

http://sergeiturukin.com/2017/02/22/pixelcnn.html

Figure: the Mask A pattern.

SLIDE 19

Gated PixelCNN

  • Gated blocks
  • Imitate the multiplicative interactions of PixelRNNs to reduce the performance gap between PixelCNN and PixelRNN
  • Replace ReLU with a gated block of sigmoid and tanh

[Van den Oord et al 2016]

z = tanh(X_{l,g} ∗ y) ⊙ σ(X_{l,h} ∗ y)

(l: layer index; σ: sigmoid; ⊙: element-wise product; ∗: convolution)
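
A minimal sketch of this gated activation (illustrative only; the masking and the two-stack structure from the following slides are omitted here):

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # One conv produces both halves: tanh "features" and sigmoid "gate".
        self.conv = nn.Conv2d(channels, 2 * channels, kernel_size=3, padding=1)

    def forward(self, y):
        f, g = self.conv(y).chunk(2, dim=1)
        return torch.tanh(f) * torch.sigmoid(g)  # z = tanh(.) ⊙ σ(.)

z = GatedBlock(64)(torch.randn(1, 64, 32, 32))
```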

SLIDE 20

PixelCNN Blind Spot

[Van den Oord et al 2016]

http://sergeiturukin.com/2017/02/24/gated-pixelcnn.html

Figure: 5×5 image with a 3×3 masked convolution; receptive field vs. unseen context (the blind spot).

SLIDE 21

PixelCNN: Eliminating the Blind Spot

  • Split the convolution into two stacks (see the sketch below)
  • Horizontal stack conditions on the current row
  • Vertical stack conditions on the pixels above
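
A rough sketch of the two-stack idea, simplified (the real Gated PixelCNN also gates each stack and feeds the vertical stack into the horizontal one via 1×1 convolutions); the padding and cropping are what keep each stack causal:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStack(nn.Module):
    def __init__(self, ch, k=3):
        super().__init__()
        self.k = k
        self.v = nn.Conv2d(ch, ch, (k // 2, k))   # rows above, full width
        self.h = nn.Conv2d(ch, ch, (1, k // 2))   # pixels to the left

    def forward(self, x):
        k, (H, W) = self.k, x.shape[-2:]
        # Vertical stack: pad the top so row i only sees rows strictly above.
        v = self.v(F.pad(x, (k // 2, k // 2, k // 2, 0)))[:, :, :H, :]
        # Horizontal stack: pad the left so pixel j only sees pixels left of it.
        h = self.h(F.pad(x, (k // 2, 0, 0, 0)))[:, :, :, :W]
        return v + h

out = TwoStack(64)(torch.randn(1, 64, 32, 32))   # -> [1, 64, 32, 32]
```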


[Van den Oord et al 2016]

SLIDE 22

Conditional PixelCNN

  • Conditional image generation
  • E.g., condition on a semantic class or a text description

[Van den Oord et al 2016]

z = tanh(X_{l,g} ∗ y + W_{l,g} h) ⊙ σ(X_{l,h} ∗ y + W_{l,h} h)

(h: the latent vector to be conditioned on)
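
Extending the earlier gated block, a sketch of the conditional variant (illustrative shapes and names; masking again omitted): the conditioning vector h contributes a spatially constant bias to both the tanh and the sigmoid paths:

```python
import torch
import torch.nn as nn

class CondGatedBlock(nn.Module):
    def __init__(self, ch, h_dim):
        super().__init__()
        self.conv = nn.Conv2d(ch, 2 * ch, kernel_size=3, padding=1)
        self.proj = nn.Linear(h_dim, 2 * ch)  # plays the role of W_{l,g}, W_{l,h}

    def forward(self, y, h):
        c = self.conv(y) + self.proj(h)[:, :, None, None]  # broadcast over HxW
        f, g = c.chunk(2, dim=1)
        return torch.tanh(f) * torch.sigmoid(g)

# h could be, e.g., a class embedding.
z = CondGatedBlock(64, 10)(torch.randn(1, 64, 32, 32), torch.randn(1, 10))
```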

SLIDE 23

Conditional PixelCNN

[Van den Oord et al 2016]

SLIDE 24

Autoregressive Models vs. GANs

  • Advantages of autoregressive models:

– Explicitly model probability densities
– More stable training
– Can be applied to both discrete and continuous data

  • Advantages of GANs:

– Empirically demonstrated to produce higher-quality images
– Faster to train


SLIDE 25

Autoregressive Models

  • The state of the art is pretty impressive

[Razavi et al. 19] Generating Diverse High-Fidelity Images with VQ-VAE-2 (Vector Quantized Variational AutoEncoder), https://arxiv.org/pdf/1906.00446.pdf

SLIDE 26

Generative Models on Videos

SLIDE 27

GANs on Videos

Two options:

– A single random variable z seeds the entire video (all frames)

  • Very high-dimensional output
  • How to handle variable-length videos?
  • Future frames are deterministic given the past

– A random variable z for each frame of the video (see the sketch below)

  • Need to condition the future on the past
  • How to combine past frames and random vectors during training?

General issues:

– Temporal coherency
– Drift over time (many models collapse to the mean image)
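
A purely illustrative sketch of the per-frame-z option (a hypothetical toy model, not any specific paper's architecture): a recurrent state summarizes the past, and a fresh z per frame injects stochasticity, which naturally handles variable length:

```python
import torch
import torch.nn as nn

class FrameGenerator(nn.Module):
    def __init__(self, z_dim=64, state_dim=256, frame_dim=3 * 64 * 64):
        super().__init__()
        self.z_dim = z_dim
        self.rnn = nn.GRUCell(z_dim + frame_dim, state_dim)
        self.decode = nn.Linear(state_dim, frame_dim)

    def forward(self, first_frame, num_frames):
        B = first_frame.size(0)
        state = torch.zeros(B, self.rnn.hidden_size)
        frame, frames = first_frame.flatten(1), []
        for _ in range(num_frames):
            z = torch.randn(B, self.z_dim)          # fresh noise per frame
            state = self.rnn(torch.cat([z, frame], 1), state)
            frame = torch.tanh(self.decode(state))  # predict the next frame
            frames.append(frame)
        return torch.stack(frames, dim=1)           # [B, T, frame_dim]

video = FrameGenerator()(torch.randn(2, 3, 64, 64), num_frames=8)
```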


SLIDE 28

GANs on Videos: DVD-GAN

[Clark et al. 2019] Adversarial Video Generation on Complex Datasets

SLIDE 29

GANs on Videos: DVD-GAN

[Clark et al. 2019] Adversarial Video Generation on Complex Datasets

SLIDE 30

GANs on Videos: DVD-GAN

  • Trained on the Kinetics-600 dataset

– Resolutions of 256×256, 128×128, and 64×64
– Lengths of up to 48 frames

  • > This is the state of the art!
  • > Generating videos from scratch is still incredibly challenging

[Clark et al. 2019] Adversarial Video Generation on Complex Datasets

SLIDE 31

Conditional GANs on Videos

  • Challenge:

– Each frame is high quality, but temporally inconsistent


SLIDE 32

Video-to-Video Synthesis

  • Sequential generator
  • Conditional image discriminator D_I (is each frame a real image?)
  • Conditional video discriminator D_V (temporal consistency via optical flow)

Wang et al. 18: Vid2Vid

The generator conditions on the past L source frames and the past L generated frames (here L = 2). The full learning objective is sketched below.
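
Reconstructed from the Vid2Vid paper (the slide's exact notation may differ): the generator F plays a minimax game against both discriminators, plus a weighted flow-estimation loss L_W:

```latex
\min_F \Big( \max_{D_I} \mathcal{L}_I(F, D_I)
           + \max_{D_V} \mathcal{L}_V(F, D_V) \Big)
           + \lambda_W \, \mathcal{L}_W(F)
```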

SLIDE 33

Video-to-Video Synthesis

Wang et al. 18: Vid2Vid

SLIDE 34

Video-to-Video Synthesis

Wang et al. 18: Vid2Vid

SLIDE 35

Video-to-Video Synthesis

  • Key ideas:

– A separate discriminator for the temporal parts

  • In this case based on optical flow

– Consider the recent history of previous frames
– Train everything jointly


Wang et al. 18: Vid2Vid

SLIDE 36

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

SLIDE 37

Deep Video Portraits

Similar to “Image-to-Image Translation” (Pix2Pix) [Isola et al.]

Siggraph’18 [Kim et al 18]: Deep Portraits

SLIDE 38

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

SLIDE 39

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

A neural network converts the synthetic renderings into realistic video.

SLIDE 40

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

SLIDE 41

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

SLIDE 42

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

SLIDE 43

Deep Video Portraits

Siggraph’18 [Kim et al 18]: Deep Portraits

Interactive Video Editing

SLIDE 44

Deep Video Portraits: Insights

  • Synthetic data for tracking is a great anchor / stabilizer
  • Overfitting on small datasets works pretty well
  • Need to stay within the training set w.r.t. motions
  • No real learning; essentially, optimizing the problem with SGD
  • > Should be pretty interesting for future directions

Siggraph’18 [Kim et al 18]: Deep Portraits

SLIDE 45

Everybody Dance Now

[Chan et al. ’18] Everybody Dance Now

SLIDE 46

Everybody Dance Now

[Chan et al. ’18] Everybody Dance Now

SLIDE 47

Everybody Dance Now

[Chan et al. ’18] Everybody Dance Now

SLIDE 48

Everybody Dance Now

  • cGANs work with different kinds of input
  • Require consistent input, i.e., accurate tracking
  • The network has no explicit notion of 3D

[Chan et al. ’18] Everybody Dance Now

SLIDE 49

Everybody Dance Now: Insights

  • Conditioning via tracking seems promising!

– Tracking quality translates directly into resulting image quality
– Tracking human skeletons is less mature than tracking faces

  • Temporally it is not stable… (e.g., OpenPose)

– Fun fact: roughly four papers with a similar idea appeared around the same time…

[Chan et al. ’18] Everybody Dance Now

SLIDE 50

Next Lectures

  • Next lectures:

– Neural Rendering
– 3D Deep Learning

  • Keep working on the projects!

SLIDE 51

See you next week!
