VIDEO-TO-VIDEO SYNTHESIS
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro
GENERATIVE ADVERSARIAL NETWORKS
Unconditional GANs: a generator turns random noise into samples, while a discriminator classifies samples as real (true) or generated (false).
Image credit: Celebrity dataset; Jensen Huang, Founder and CEO of NVIDIA; Ian Goodfellow, father of GANs.
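As a quick illustration of this adversarial setup, here is a minimal sketch of one GAN training step in PyTorch; the tiny `generator` and `discriminator` networks and their sizes are placeholders, not the networks used in this talk.

```python
import torch
import torch.nn as nn

# Placeholder networks; real GANs use much larger convolutional architectures.
generator = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_images):
    batch = real_images.size(0)
    noise = torch.randn(batch, 100)

    # Discriminator: push real images toward "true" and generated images toward "false".
    fake_images = generator(noise).detach()
    d_loss = bce(discriminator(real_images), torch.ones(batch, 1)) + \
             bce(discriminator(fake_images), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output "true" for its fakes.
    g_loss = bce(discriminator(generator(noise)), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Usage sketch: train_step(torch.rand(8, 784))
```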
After training for a while on NVIDIA DGX-1 machines, the fun sampling time begins: sample from the generator.
Image credit: NVIDIA StyleGAN
CONDITIONAL GANS
Allow the user more control over the sampling process.
Modeling (training): learn to map the given info (e.g. image, text) to the generated result.
Sampling (testing): the given info (e.g. image, text) controls the output style.
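A minimal sketch of the conditioning idea, assuming PyTorch: the given info (here already encoded as a vector) is concatenated with the noise before generation, so the same noise produces different outputs for different conditions. Shapes and layers are illustrative only, not those of any network in this talk.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy conditional generator: the output depends on both noise and condition."""
    def __init__(self, noise_dim=100, cond_dim=10, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh())

    def forward(self, noise, condition):
        # The condition (e.g. an encoded sketch, mask, or text embedding)
        # is concatenated with the noise so it steers the generated output.
        return self.net(torch.cat([noise, condition], dim=1))

g = ConditionalGenerator()
samples = g(torch.randn(4, 100), torch.eye(10)[:4])  # 4 samples, 4 different conditions
```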
SKETCH-CONDITIONAL GANS
Image credit: NVIDIA pix2pixHD
IMAGE-CONDITIONAL GANS
Image credit: NVIDIA MUNIT
MASK-CONDITIONAL GANS
Semantic Image Synthesis
LIVE DEMO
The demo runs on an RTX-ready laptop (https://www.nvidia.com/en-us/geforce/gaming-laptops/20-series/). It is running live at GTC, and will be online for everyone to try out on the NVIDIA AI Playground website (https://www.nvidia.com/en-us/research/ai-playground/).
Interface
PROBLEM WITH PREVIOUS METHODS
Figure: input vs. result.
PROBLEM WITH PREVIOUS METHODS
Batch Norm (Ioffe et al. 2015):
z = (y − ν) / τ ⋅ δ + γ
(y − ν) / τ is the normalization (ν, τ: mean and standard deviation); ⋅ δ + γ is the learned affine transform (de-normalization).
Normalization removes the label information: a uniform label map (e.g. y = 1 everywhere) becomes all zeros after normalization, just like any other uniform map — same output!
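A tiny sanity check of this failure mode (not code from the paper): two different uniform "label maps" become identical all-zero activations after a parameter-free batch norm, so the label value is lost.

```python
import torch
import torch.nn as nn

# Two different uniform "label maps": one all 1s, one all 5s (1 channel, 4x4).
sky   = torch.full((1, 1, 4, 4), 1.0)
grass = torch.full((1, 1, 4, 4), 5.0)

# Parameter-free batch norm (no learned affine transform).
norm = nn.BatchNorm2d(num_features=1, affine=False)
norm.train()  # normalize with the statistics of the current input

out_sky, out_grass = norm(sky), norm(grass)
print(torch.allclose(out_sky, out_grass))  # True: both are all zeros -> label info gone
```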
PROBLEM WITH PREVIOUS METHODS
- Do not feed the label map directly to the network
- Instead, use the label map to generate the normalization layers
SPADE (SPatially Adaptive DEnormalization)
The label map is fed through convolutions to produce spatially varying modulation tensors δ and γ. The network input y is normalized with a parameter-free batch norm and then modulated element-wise:
z = (y − ν) / τ ⋅ δ + γ
The network input y and output z stay label-free; the label information enters only through δ and γ.
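A minimal PyTorch sketch of such a spatially adaptive normalization layer, following the description above; channel counts and kernel sizes are illustrative and not those of the official SPADE implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Spatially adaptive denormalization: the label map produces per-pixel scale/shift."""
    def __init__(self, feat_channels, label_channels, hidden=128):
        super().__init__()
        # Parameter-free normalization of the incoming features.
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        # Shared conv on the (resized) label map, then two heads for delta and gamma.
        self.shared = nn.Sequential(nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.to_delta = nn.Conv2d(hidden, feat_channels, 3, padding=1)  # spatial scale
        self.to_gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)  # spatial shift

    def forward(self, y, label_map):
        normalized = self.norm(y)
        seg = F.interpolate(label_map, size=y.shape[2:], mode='nearest')
        h = self.shared(seg)
        return normalized * self.to_delta(h) + self.to_gamma(h)

# Usage sketch: SPADE(64, 35)(torch.randn(1, 64, 32, 32), one_hot_label_map)
# where one_hot_label_map is a float one-hot tensor of shape (1, 35, H, W).
```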
SPADE RESIDUAL BLOCKS
Each SPADE ResBlk applies SPADE → ReLU → 3x3 Conv → SPADE → ReLU → 3x3 Conv, with a residual (skip) connection.
SPADE GENERATOR
A random vector is passed through a stack of SPADE ResBlks (with upsampling in between) to produce the output image; the label map conditions every SPADE layer.
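Continuing the sketch above, a hedged illustration of how such residual blocks could be stacked into a generator. The number of blocks, channel widths, upsampling schedule, and the 1x1-conv skip path are placeholders, not the paper's configuration; it reuses the `SPADE` class from the earlier sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADEResBlk(nn.Module):
    """Residual block in which every normalization is a SPADE layer."""
    def __init__(self, in_ch, out_ch, label_ch):
        super().__init__()
        self.spade1 = SPADE(in_ch, label_ch)            # SPADE from the sketch above
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.spade2 = SPADE(out_ch, label_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x, seg):
        h = self.conv1(F.relu(self.spade1(x, seg)))
        h = self.conv2(F.relu(self.spade2(h, seg)))
        return h + self.skip(x)

class SPADEGenerator(nn.Module):
    """Noise -> small feature map -> SPADE ResBlks with upsampling -> RGB image."""
    def __init__(self, label_ch, z_dim=256, base_ch=512):
        super().__init__()
        self.fc = nn.Linear(z_dim, base_ch * 4 * 4)
        chans = [base_ch, 256, 128, 64]
        self.blocks = nn.ModuleList(
            [SPADEResBlk(chans[i], chans[i + 1], label_ch) for i in range(3)])
        self.to_rgb = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, z, seg):
        x = self.fc(z).view(z.size(0), -1, 4, 4)
        for blk in self.blocks:
            x = blk(F.interpolate(x, scale_factor=2), seg)  # upsample, then SPADE ResBlk
        return torch.tanh(self.to_rgb(x))
```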
PROBLEM WITH PREVIOUS METHODS
Figure: input, result without SPADE, result with SPADE.
IMAGE RESULTS
Multimodal Results on Flickr
VIDEO-TO-VIDEO SYNTHESIS
IMAGE-TO-IMAGE SYNTHESIS
Figure: per-frame synthesis from a semantic label map (car, road, tree, sidewalk, building).
MOTIVATION
- AI-based rendering: traditional graphics renders from geometry, texture, and lighting; machine learning graphics renders from data.
MOTIVATION
- AI-based rendering
- High-level semantic manipulation: edit the original image in a high-level representation (obtained via segmentation, keypoint detection, etc.) and synthesize a new image/video from the edited representation. The image-to-representation direction is largely explored; the synthesis direction back to images/videos is little explored (this work).
PREVIOUS WORK
- Video style transfer: COVST [2017], ArtST [2016]
- Unconditional synthesis: MoCoGAN [2018], TGAN [2017], VGAN [2016]
- Video prediction: MCNet [2017], PredNet [2017]
- Image translation: pix2pixHD [2018], CRN [2017], pix2pix [2017]
PREVIOUS WORK: FRAME-BY-FRAME RESULT
OUR METHOD
- Sequential generator
- Multi-scale temporal discriminator
- Spatio-temporal progressive training procedure
OUR METHOD
Sequential Generator
Figure: the generator synthesizes frames sequentially from the input semantic maps and previously generated frames; W denotes warping the previous frame.
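A hedged sketch of what such a sequential generation loop could look like: each output frame combines a warped copy of the previous output with newly hallucinated content via a soft mask. The modules `flow_net`, `frame_net`, `mask_net`, and the `warp` helper are hypothetical placeholders, not the paper's actual components.

```python
import torch

def generate_video(semantic_maps, flow_net, frame_net, mask_net, warp):
    """Sequentially generate frames from a list of semantic label maps.

    semantic_maps: list of (1, C, H, W) tensors.
    flow_net(seg_t, prev): predicts optical flow from the previous output frame.
    frame_net(seg_t, prev): hallucinates a brand-new frame.
    mask_net(seg_t, prev): soft mask in [0, 1] deciding where to reuse warped content.
    warp(img, flow): warps an image by a flow field (e.g. via grid_sample).
    """
    outputs = []
    prev = torch.zeros_like(semantic_maps[0][:, :3])  # no previous frame yet
    for seg_t in semantic_maps:
        flow = flow_net(seg_t, prev)
        warped = warp(prev, flow)                 # reuse what the last frame already shows
        hallucinated = frame_net(seg_t, prev)     # synthesize newly revealed regions
        mask = mask_net(seg_t, prev)
        frame = mask * warped + (1.0 - mask) * hallucinated
        outputs.append(frame)
        prev = frame
    return outputs
```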
OUR METHOD
Multi-scale Discriminators
Figure: an image discriminator and a video discriminator, each applied at three scales (D1, D2, D3), train the sequential generator.
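As a rough illustration of the multi-scale idea (not the paper's exact discriminators), the same discriminator architecture can be applied to the input at several downsampled resolutions; a video discriminator would receive several consecutive frames stacked along the channel dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDiscriminator(nn.Module):
    """Run one PatchGAN-style discriminator per scale on progressively downsampled inputs."""
    def __init__(self, in_channels, num_scales=3):
        super().__init__()
        def single_d():
            return nn.Sequential(
                nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(128, 1, 4, padding=1))  # patch-wise real/fake scores
        self.discriminators = nn.ModuleList(single_d() for _ in range(num_scales))

    def forward(self, x):
        scores = []
        for d in self.discriminators:
            scores.append(d(x))                       # D1 on full res, D2 on 1/2, D3 on 1/4
            x = F.avg_pool2d(x, kernel_size=3, stride=2, padding=1)
        return scores

# Image discriminator: in_channels = 3 (one frame).
# Video discriminator (sketch): stack K consecutive frames -> in_channels = 3 * K.
```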
OUR METHOD
Spatio-temporally Progressive Training
The generator is grown spatially (adding residual blocks to reach higher resolutions) and temporally (generating longer sequences), with training alternating between spatial (S) and temporal (T) phases.
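A hedged sketch of what an alternating spatio-temporal schedule could look like; `train_spatial_step` and `train_temporal_step` are hypothetical helpers standing in for one optimization step at a higher resolution or with longer sequences, and the numbers below are illustrative, not the paper's.

```python
def progressive_training(train_spatial_step, train_temporal_step,
                         resolutions=(256, 512, 1024), seq_lengths=(4, 8, 16),
                         iters_per_phase=1000):
    """Alternate between growing the model spatially and temporally."""
    for res, seq_len in zip(resolutions, seq_lengths):
        # Spatial phase (S): train at the new, higher resolution on short clips.
        for _ in range(iters_per_phase):
            train_spatial_step(resolution=res)
        # Temporal phase (T): keep the resolution, train on longer sequences.
        for _ in range(iters_per_phase):
            train_temporal_step(resolution=res, sequence_length=seq_len)
```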
RESULTS
- Semantic → Street view scenes
- Edges → Human faces
- Poses → Human bodies
STREET VIEW: CITYSCAPES
Figure: comparison of the semantic map input, pix2pixHD, COVST (video style transfer), and ours.
STREET VIEW: BOSTON
STREET VIEW: NYC
FACE SWAPPING (FACE → EDGE → FACE)
Figure: input, edges, output.
FACE SWAPPING (SLIMMER FACE)
Figure: input (slimmed), edges (slimmed), output.
MULTI-MODAL EDGE → FACE
Style 1 Style 2 Style 3
MOTION TRANSFER (BODY → POSE → BODY)
Figure: input, poses, output.
MOTION TRANSFER
EXTENSION: FRAME PREDICTION
- Goal: predict future frames given past frames
- Our method: decompose prediction into two steps (see the sketch below)
- 1. predict the semantic map for the next frame
- 2. synthesize the frame based on the semantic map
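A hedged sketch of that two-step decomposition; `predict_next_semantic_map` and `synthesize_frame` are hypothetical stand-ins for the two learned modules, not the paper's actual networks.

```python
def predict_future_frames(past_frames, past_semantic_maps, num_future,
                          predict_next_semantic_map, synthesize_frame):
    """Two-step future prediction: first the semantic map, then the frame."""
    frames, maps = list(past_frames), list(past_semantic_maps)
    for _ in range(num_future):
        # Step 1: predict the semantic layout of the next frame.
        next_map = predict_next_semantic_map(maps, frames)
        # Step 2: synthesize the next frame conditioned on that semantic map.
        next_frame = synthesize_frame(next_map, frames)
        maps.append(next_map)
        frames.append(next_frame)
    return frames[len(past_frames):]
```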
EXTENSION: FRAME PREDICTION
Figure: comparison of ground truth, PredNet, MCNet, and ours.
INTERACTIVE GRAPHICS
PATH TO INTERACTIVE GRAPHICS
- Real-time inference
- Combining with existing graphics pipeline
- Domain gap between real input and synthetic input
PATH TO INTERACTIVE GRAPHICS
- Real-time inference
- FP16 + TensorRT → ~5× speedup
- 36 ms (27.8 fps) for 1080p inference
- Overall: 15-25 fps
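For context, a minimal PyTorch-only sketch of half-precision inference (the actual demo additionally uses TensorRT, whose export steps are omitted here); `generator` is a placeholder for any trained image-synthesis network.

```python
import torch

@torch.no_grad()
def fp16_inference(generator, semantic_map, device="cuda"):
    """Run the generator in half precision to cut inference time and memory."""
    generator = generator.to(device).half().eval()
    semantic_map = semantic_map.to(device).half()
    output = generator(semantic_map)
    return output.float()  # back to fp32 for display / saving

# Usage sketch: fp16_inference(trained_generator, one_hot_label_map_tensor)
```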
PATH TO INTERACTIVE GRAPHICS
- Real-time inference
- Combining with existing graphics pipeline
- CARLA: open-source simulator for autonomous driving research
- Make the game engine render semantic maps
- Pass the maps to the network and display the inference result (see the sketch below)
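A hedged sketch of the engine-to-network loop described above: the simulator renders a semantic segmentation view each frame, the label image is converted to a one-hot tensor, and the generator's output is displayed. `get_semantic_map_from_engine`, `display`, and `NUM_CLASSES` are hypothetical placeholders, not real CARLA API calls.

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 20  # hypothetical number of semantic classes rendered by the engine

@torch.no_grad()
def interactive_loop(generator, get_semantic_map_from_engine, display, device="cuda"):
    """Each frame: engine renders a label map -> network synthesizes -> display."""
    generator = generator.to(device).eval()
    while True:
        # (H, W) integer label image rendered by the game engine / simulator.
        label_image = get_semantic_map_from_engine()
        labels = torch.as_tensor(label_image, dtype=torch.long, device=device)
        # Convert to a one-hot (1, C, H, W) tensor, a common network input format.
        one_hot = F.one_hot(labels, NUM_CLASSES).permute(2, 0, 1).unsqueeze(0).float()
        frame = generator(one_hot)
        display(frame.squeeze(0).cpu())
```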
PATH TO INTERACTIVE GRAPHICS
- Real-time inference
- Combining with existing graphics pipeline
- Domain gap between real input and synthetic input
- The network is trained on real data but tested on synthetic data
- Things that differ: object shapes/edges, density of objects, camera viewpoints, etc.
- On-going work
ORIGINAL CARLA IMAGE
RENDERED SEMANTIC MAPS
RECORDED DEMO RESULTS
CONCLUSION
- What can we achieve?
- Synthesize high-res realistic images
- Produce temporally-smooth videos
- Reinvent interactive graphics
- What can it be used for?
- AI-based rendering
- High-level semantic manipulation
THANK YOU
https://github.com/NVIDIA/vid2vid