More Generative Models
Prof. Leal-Taixé and Prof. Niessner

Conditional GANs on Videos
– Challenge: each frame is high quality, but temporally inconsistent
Wang et al. 18: Vid2Vid
– Sequential generator: each frame is conditioned on the past L source frames and the past L generated frames (here L = 2)
– Full learning objective: combines per-frame and temporal adversarial terms
– Separate discriminator for the temporal parts
– Considers the recent history of previously generated frames
– All of it is trained jointly
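The sequential generation scheme above can be sketched in a few lines; the generator below is a hypothetical stand-in (a simple blend), not the vid2vid network, and serves only to show the conditioning on both frame histories:

```python
import numpy as np

L = 2  # history length, matching the slide's L = 2

def generator(src_window, gen_window):
    """Hypothetical stand-in generator: blends the newest source frame with
    the newest previously generated frame to mimic temporal conditioning."""
    return 0.8 * src_window[-1] + 0.2 * gen_window[-1]

def synthesize(source_frames):
    """Generate frame t from the past L source frames and past L outputs."""
    h, w = source_frames[0].shape
    generated = [np.zeros((h, w)) for _ in range(L)]  # bootstrap history
    for t in range(len(source_frames)):
        src_window = source_frames[max(0, t - L + 1): t + 1]
        gen_window = generated[-L:]
        generated.append(generator(src_window, gen_window))
    return generated[L:]  # drop the bootstrap frames

video = [np.full((4, 4), float(t)) for t in range(5)]
out = synthesize(video)
```

The key point is the feedback loop: each output frame re-enters the conditioning window, which is what the temporal discriminator then judges.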
Siggraph’18 [Kim et al. 18]: Deep Portraits
– Similar to “Image-to-Image Translation” (Pix2Pix) [Isola et al.]
– A neural network converts synthetic renderings into realistic video
– Interactive video editing (optimization with SGD)
[Chan et al. ’18] Everybody Dance Now
– Tracking quality translates directly to the resulting image quality
– Tracking human skeletons is less developed than face tracking
– Fun fact: roughly four papers with a similar idea appeared around the same time…
[Sitzmann et al. ’18] Deep Voxels
– Why learn 3D operations with 2D convs?! We know how 3D transformations work
– Incorporate these into the architecture
– Example application: novel view point synthesis
– Issue: we don’t know the depth for the target view!
– Occlusion network: resolves visibility along the depth dimension for the target view
– No need to take specific care of temporal coherency!
– I.e., a cGAN for a new pose in a given scene
Generative models recap:
– Implicit density (e.g., GANs): outputs are samples (the distribution is in the model), governed by a prior imposed by the model structure
– Explicit density: outputs are probabilities (e.g., softmax)

Autoregressive models explicitly model distributions:
– Modeling an image as a sequence problem
– Predict one pixel at a time
– The next pixel is determined by all previously predicted pixels
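The pixel-by-pixel factorization p(x) = ∏ p(xᵢ | x₁…xᵢ₋₁) can be sketched as a sampling loop; the toy conditional below is an illustrative stand-in for a learned model's output:

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_p1(prev_pixels):
    """Toy conditional p(x_i = 1 | x_<i) that favors repeating the last
    pixel -- a stand-in for the learned network."""
    if not prev_pixels:
        return 0.5
    return 0.8 if prev_pixels[-1] == 1 else 0.2

def sample_image(n_pixels):
    pixels = []
    for _ in range(n_pixels):        # strictly sequential sampling
        p1 = conditional_p1(pixels)  # conditioned on ALL previous pixels
        pixels.append(int(rng.random() < p1))
    return pixels

img = sample_image(16)
```

Note that sampling is inherently sequential (one pixel per step), which is exactly the cost the later PixelRNN-vs-PixelCNN comparison is about.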
PixelRNN [Van den Oord et al. 2016]
– Each pixel value is discrete: 𝑦𝑗 ∈ {0, …, 255} → 256-way softmax
– For RGB, the three channels are predicted one after another
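In code, predicting one sub-pixel value amounts to sampling from a 256-way softmax; the random logits below stand in for a network's output:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

logits = rng.standard_normal(256)      # one logit per intensity 0..255
probs = softmax(logits)                # a proper distribution over values
value = rng.choice(256, p=probs)       # sampled y_j in {0, ..., 255}
```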
– Pixels within a row can be computed in parallel
PixelCNN [Van den Oord et al. 2016]
– Convolutional architecture instead of an RNN → avoids the sequential training problem
– Masked convolutions: only previously generated pixel values can be used as context
– The current pixel itself is masked out during the 1st conv
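The two standard mask types can be sketched like this (single-channel case; the RGB channel-ordering refinement is omitted):

```python
import numpy as np

def make_mask(kernel_size, mask_type):
    """Mask 'A' (1st conv layer) hides the center pixel; mask 'B'
    (later layers) keeps it."""
    k = kernel_size
    mask = np.zeros((k, k))
    mask[: k // 2, :] = 1         # all rows above the center row
    mask[k // 2, : k // 2] = 1    # pixels left of center in the center row
    if mask_type == "B":
        mask[k // 2, k // 2] = 1  # later layers may use the current pixel
    return mask

mask_a = make_mask(3, "A")
# mask_a == [[1, 1, 1],
#            [1, 0, 0],
#            [0, 0, 0]]
```

Multiplying the kernel weights by this mask before each convolution is what enforces the raster-scan ordering.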
– Samples: 64×64 images, trained on ImageNet
PixelRNN vs. PixelCNN:
– PixelRNN: unbounded dependency range within the receptive field, but can be very computationally costly
– PixelCNN: standard convs capture a bounded receptive field, but all pixel features can be computed at once (during training)
– Masking along the spatial dimensions prevents seeing future context
Gated PixelCNN [Van den Oord et al. 2016]
(Figure: mask A, http://sergeiturukin.com/2017/02/22/pixelcnn.htm)
– Goal: reduce the performance gap between PixelCNN and PixelRNN
Gated activation unit:

z = tanh(X_{k,f} ∗ y) ⊙ τ(X_{k,g} ∗ y)

(k-th layer; ∗: convolution; ⊙: element-wise product; τ: sigmoid)
(Figure, http://sergeiturukin.com/2017/02/24/gated-pixelcnn: 5×5 image, 3×3 conv — the masked receptive field still leaves unseen context, a blind spot; the fix is a vertical stack for the rows above plus a horizontal stack for the current row.)
Conditional PixelCNN [Van den Oord et al. 2016]
z = tanh(X_{k,f} ∗ y + W_{k,f}ᵀ h) ⊙ τ(X_{k,g} ∗ y + W_{k,g}ᵀ h)

where h is a latent vector to be conditioned on (e.g., a class label)
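A numerical sketch of this conditional gated activation; to stay minimal, matrix products replace the convolutions and all weights are random stand-ins (names X_f, X_g, W_f, W_g mirror the slide's notation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gated_activation(y, h, X_f, X_g, W_f, W_g):
    """z = tanh(X_f y + W_f h) * sigmoid(X_g y + W_g h), element-wise."""
    return np.tanh(X_f @ y + W_f @ h) * sigmoid(X_g @ y + W_g @ h)

d, c = 4, 3                       # feature dim, conditioning dim
y = rng.standard_normal(d)        # input features
h = rng.standard_normal(c)        # latent vector to condition on
X_f, X_g = rng.standard_normal((d, d)), rng.standard_normal((d, d))
W_f, W_g = rng.standard_normal((d, c)), rng.standard_normal((d, c))

z = gated_activation(y, h, X_f, X_g, W_f, W_g)
# every entry of z lies in (-1, 1): tanh is bounded and the sigmoid gate
# only shrinks it further
```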
PixelRNN/PixelCNN vs. GANs:
– PixelRNN/CNN: explicitly model probability densities; more stable training; can be applied to both discrete and continuous data
– GANs: have been empirically demonstrated to produce higher quality images; faster to train
Other domains:
– Images (AlexNet, VGG, ResNet → classification, localization, etc.)
– Audio / speech
– Also point clouds
– Videos
– 3D data, e.g., dynamic 3D data (haven’t seen much work there)
– Simulations
Recap: 1D convolution with an averaging filter

Signal: g = [4, 3, 2, −5, 3, 5, 2, 5, 5, 6]
Kernel: f = [1/3, 1/3, 1/3]

(g ∗ f), sliding the kernel across the signal:
(4 + 3 + 2) · 1/3 = 3
(3 + 2 − 5) · 1/3 = 0
(2 − 5 + 3) · 1/3 = 0
(−5 + 3 + 5) · 1/3 = 1
(3 + 5 + 2) · 1/3 = 10/3
(5 + 2 + 5) · 1/3 = 4
(2 + 5 + 5) · 1/3 = 4
(5 + 5 + 6) · 1/3 = 16/3

Output: [3, 0, 0, 1, 10/3, 4, 4, 16/3]
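The same computation, checked with NumPy (`mode="valid"` means no padding; `np.convolve` flips the kernel, which is harmless here since it is symmetric):

```python
import numpy as np

g = np.array([4, 3, 2, -5, 3, 5, 2, 5, 5, 6], dtype=float)
f = np.array([1 / 3, 1 / 3, 1 / 3])

out = np.convolve(g, f, mode="valid")
# out == [3, 0, 0, 1, 10/3, 4, 4, 16/3], matching the worked example
```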
WaveNet [van den Oord 16]: https://deepmind.com/blog/wavenet-generative-model-raw-audio/
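WaveNet builds on stacks of dilated causal convolutions (see the blog post above), so the receptive field grows exponentially with depth. A quick sanity check of that growth, assuming the standard kernel size 2 and doubling dilations:

```python
def receptive_field(kernel_size, dilations):
    """Each layer adds (kernel_size - 1) * dilation timesteps of context."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

dilations = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512
rf = receptive_field(2, dilations)       # 1024 timesteps from 10 layers
```

Ten layers already cover 1024 audio samples; a plain (undilated) stack of the same depth would cover only 11.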
3D Deep Learning
– Task: predict the class from a 3D model (e.g., obtained with a Kinect scan)
– [Maturana et al. 15] & [Qi et al. 16]: 3D vs. multi-view representations
– [Dai et al. 17] ScanNet: 1500 densely annotated 3D scans; 2.5 million RGB-D frames
Volumetric Data Structures
– Occupancy grids
– Ternary grids
– Distance fields
– Signed distance fields
(Figure: (binary) voxel grid; shape completion comparison across representations)
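A small sketch contrasting two of these representations on the same shape (a sphere; the resolution, radius, and implicit-sphere choice are illustrative assumptions):

```python
import numpy as np

res, radius = 32, 10.0
center = (res - 1) / 2.0

# distance of every voxel center to the sphere center
idx = np.indices((res, res, res)).astype(float)
dist_to_center = np.sqrt(((idx - center) ** 2).sum(axis=0))

# signed distance field: negative inside the surface, positive outside
sdf = dist_to_center - radius

# binary occupancy grid derived from the SDF: 1 inside, 0 outside
occupancy = (sdf <= 0).astype(np.uint8)
```

The SDF carries more information (distance and sign everywhere, not just in/out), which is what completion methods like the ones below exploit.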
[Dai et al. 17] CNNComplete
Works with 32 x 32 x 32 voxels…
[Dai et al. 17] ScanNet
– Run on 32 × 32 × 32 blocks → takes forever…
[Ji et al. 17] SurfaceNet
– Train on crops of scenes, test on entire scenes
[Dai et al. 18] ScanComplete
– Input: partial scan → output: completed scan
– Encode free space
– Encode distance fields
– Need a lot of memory
– Need a lot of processing time
– But can be used sliding-window or fully-convolutionally
Octrees
– Surface occupancy gets (relatively) smaller with higher resolutions → dense grids waste memory on empty space
– If the structure is known in advance:
  – OctNet: Learning Deep 3D Representations at High Resolutions (CVPR 2017)
  – O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis (SIGGRAPH 2017)
– State of the art is somewhere here…
– If the structure must be inferred:
  – Octree Generating Networks: Efficient Convolutional Architectures for High-resolution Outputs
  – OctNetFusion: Learning Depth Fusion from Data (that one is not end-to-end)
  – Pretty interesting: they have an end-to-end method, i.e., split voxels that are partially occupied
– Octrees are great for reducing memory and runtime
– But this comes at a performance hit
– Easier for discriminative tasks when the structure is known
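The sparsity argument behind octrees can be checked numerically: surface cells grow like O(n²) while the grid grows like O(n³), so the occupied fraction falls roughly like 1/n (a sphere serves as the illustrative shape here):

```python
import numpy as np

def surface_fraction(res):
    """Fraction of voxels within one cell of a sphere's surface."""
    center, radius = (res - 1) / 2.0, res / 3.0
    idx = np.indices((res, res, res)).astype(float)
    d = np.sqrt(((idx - center) ** 2).sum(axis=0)) - radius
    near_surface = np.abs(d) < 1.0  # voxels within 1 cell of the surface
    return near_surface.mean()

fractions = [surface_fraction(res) for res in (16, 32, 64)]
# the fraction shrinks as resolution grows -- exactly the memory an
# octree avoids spending on empty or fully-interior space
```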
Multi-View Approaches
– View pooling for classification (only RGB; no spatial correspondence): Multi-view Convolutional Neural Networks for 3D Shape Recognition
– 3D Shape Segmentation with Projective Convolutional Networks: interesting in that it does 3D shape segmentation (only on CAD models), using multi-view input with spatial correlation on top of the mesh surface
– Volumetric and Multi-View CNNs for Object Classification on 3D Data
[Dai & Niessner 18] 3DMV
– Nice way to combine color and geometry
– Great performance (best so far for segmentation)
– End-to-end training helps less than we hoped for
– Could be faster…
Next up:
– Domain adaptation and transfer learning
– Possibly graphs, if time permits