Deep Learning for Visual Manipulation and Synthesis
Jun-Yan Zhu, UC Berkeley, 2017/01/11 @ VALSE


SLIDE 1

Deep Learning for Visual Manipulation and Synthesis

Jun-Yan Zhu 朱俊彦

UC Berkeley 2017/01/11 @ VALSE

SLIDE 2

What is visual manipulation?

input photo + User Input → Image Editing Program → result

Desired output:

  • stay close to the input.
  • satisfy user’s constraint.

[Schaefer et al. 2006]

SLIDE 3

What is Visual Synthesis?

user input → Image Generation Program → result

Desired output:

  • satisfy user’s constraint.

Sketch2Photo [Tao et al. 2009]

SLIDE 4

So far so good

SLIDE 5

Things can get really bad: the lack of “safety wheels”

SLIDE 6

Adding the “safety wheels”

Input Photo + User Input → Image Editing Program → Output Result

Natural Image Manifold

A desired output:

  • stay close to the input.
  • satisfy user’s constraint.
  • lie on the natural image manifold.
SLIDE 7

Prior work: Heuristic-based

Color [Reinhard et al. 2004]; Color and Texture [Johnson et al. 2011]; Gradient [Perez et al. 2003]; “Bleeding” artifacts [Tao et al. 2010]

SLIDE 8

Prior work: Discriminative Learning

Image Compositing (20 images) [Xue et al. 2012]; Image Deblurring (40 images) [Liu et al. 2013]; Natural Human Motion (34 subjects) [Ren et al. 2005]

SLIDE 9

Our Goal:

  • Learn the manifold of natural images without direct human annotations.
  • Improve visual manipulation and synthesis by constraining the result to lie on that learned manifold.

SLIDE 10

Why Deep Learning Methods?

  • Impressive results on visual recognition.

– Classification, detection, segmentation, 3D vision, videos, etc.

  • No feature engineering.
  • Recent development of generative models (e.g., Generative Adversarial Networks).

SLIDE 11

Deep Learning trends: performance

SLIDE 12

Deep Learning trends: research

AlexNet [Krizhevsky et al.] ImageNet [Jia et al.]

SLIDE 13

Realism CNN Image Editing Model

Predict Realism Improve Editing

Discriminative Model M: {x | P(real | x) = 1} [ICCV 15’]
Generative Model M: {x | x = G(z)} [SIGGRAPH 14’] [ECCV 16’]

Project Edit Transfer

Editing UI

SLIDE 14

Realism CNN Image Editing Model

Predict Realism Improve Editing

Discriminative Model M: {x | P(real | x) = 1} [ICCV 15’]

Foreground Object F + Background B → Image Composite I

SLIDE 15

Learning Visual Realism

CNN training: classify natural photos vs. composite images, using 25K natural photos vs. 25K composite images.
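The training setup on this slide, a CNN classifying 25K natural photos against 25K composites, can be miniaturized as a binary “realism” classifier. Below is a sketch using logistic regression on made-up feature vectors (the features, sizes, and separability are all hypothetical assumptions; the talk trains a deep CNN on pixels):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

# Hypothetical stand-ins for image features: natural photos and composites
# drawn from two shifted Gaussians so the toy problem is learnable.
X_nat = rng.normal(+1.0, 1.0, size=(200, 8))
X_comp = rng.normal(-1.0, 1.0, size=(200, 8))
X = np.vstack([X_nat, X_comp])
y = np.concatenate([np.ones(200), np.zeros(200)])  # 1 = natural, 0 = composite

# Logistic regression ("is this photo real?") trained by gradient descent.
w, b = np.zeros(8), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
```

The supervision trick is the point: composites are “free” negative examples, so no human realism labels are needed.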

SLIDE 16

How do we get composite images?

Target Object → Object Mask → Object Masks with Similar Shapes → Composite Images

Object Mask: (1) Human Annotation, (2) Object Proposal

[Lalonde and Efros 2007]

SLIDE 17

Ranking of Training Composites

Most realistic composites Least realistic composites

SLIDE 18

Evaluation

Dataset [Lalonde and Efros 2007]; task: binary classification; 360 realistic photos (natural images + realistic composites) vs. 360 unrealistic photos.

Area under ROC Curve:

Methods without object mask:
  • Lalonde and Efros (no mask): 0.61
  • AlexNet + SVM: 0.73
  • RealismCNN: 0.84
  • RealismCNN + SVM: 0.88
  • Human: 0.91

Methods using object mask:
  • Reinhard et al.: 0.66
  • Lalonde and Efros (with mask): 0.81
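The metric here, area under the ROC curve, equals the probability that a randomly chosen realistic photo outscores a randomly chosen unrealistic one. A small sketch (the scores below are invented for illustration, not the paper’s data):

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly, counting ties as half.
    O(n*m) pairwise version, written for clarity rather than speed."""
    s_pos = np.asarray(scores_pos, dtype=float)[:, None]
    s_neg = np.asarray(scores_neg, dtype=float)[None, :]
    return (s_pos > s_neg).mean() + 0.5 * (s_pos == s_neg).mean()

# Hypothetical realism scores for realistic vs. unrealistic photos.
realistic = [0.9, 0.8, 0.75, 0.6, 0.55]
unrealistic = [0.7, 0.5, 0.4, 0.3, 0.2]
print(round(auc(realistic, unrealistic), 2))  # -> 0.92 (23 of 25 pairs correct)
```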

SLIDE 19

Visual Realism Ranking (figure): composites from three scenes (Snowy Mountain, Ocean, Highway) ranked from least to most realistic. Red: unrealistic composite; Green: realistic composite; Blue: natural image.

SLIDE 20

Our Pipeline

Realism CNN (Predict Realism) → Image Editing Model (Improve Composites)

SLIDE 21

Improving Visual Realism

Editing model: a color adjustment g applied to the foreground object F, minimized with a Quasi-Newton method (L-BFGS):

E(g, F) = E_CNN + E_reg

Original composite (realism score: 0.0) → improved composite (realism score: 0.8)
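A sketch of this optimization with SciPy’s L-BFGS: a per-channel color gain g is adjusted to lower a toy “realism” energy plus a regularizer. The quadratic stand-in for E_CNN, the example colors, and the 0.1 weight are assumptions made here; the talk’s realism term is the RealismCNN itself.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical mean RGB of the pasted foreground and of the background.
fg = np.array([0.2, 0.4, 0.9])
bg = np.array([0.5, 0.5, 0.5])

def energy(g):
    """E(g, F) = E_cnn + E_reg: a toy 'realism' term that prefers adjusted
    foreground colors matching the background, plus a regularizer keeping
    the per-channel gain g close to the identity adjustment (all ones)."""
    e_cnn = np.sum((g * fg - bg) ** 2)
    e_reg = 0.1 * np.sum((g - 1.0) ** 2)
    return e_cnn + e_reg

res = minimize(energy, x0=np.ones(3), method="L-BFGS-B")
g_opt = res.x  # color gains with lower energy than the raw cut-and-paste
```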

SLIDE 22

Selecting Suitable Objects

Best-fitting object selected by RealismCNN Object with most similar shape

SLIDE 23

Optimizing Color Compatibility

Cut-n-paste Lalonde et al. Xue et al. Ours Object mask

SLIDE 24

Sanity Check: Real Photos

Lalonde et al. Xue et al. Ours Object mask Cut-n-paste

SLIDE 25

Visualizing and Localizing Errors (gradient map ∂E/∂I_p)

The energy E drops over L-BFGS iterations: 50.73 → 9.38 → 5.05 → 3.44 → 3.00.

SLIDE 26

Discriminative Model {x | P(real | x) = 1}

  • Pros:
    – The CNN is easy to train.
    – Graphics programs often produce better images than generative models.
    – General framework for many tasks (e.g., deblurring, retargeting, etc.).
  • Cons:
    – Task-specific: a pre-trained model cannot be applied to other tasks.
    – Graphics programs are often non-parametric and non-differentiable.
    – Graphics programs often require a user in the loop, so automatically generating results for CNN training is challenging.

  • Code: github.com/junyanz/RealismCNN
  • Data: people.eecs.berkeley.edu/~junyanz/projects/realism/
SLIDE 27

Realism CNN Image Editing Model

Predict Realism Improve Editing

Discriminative Model M: {x | P(real | x) = 1} [ICCV 15’]
Generative Model M: {x | x = G(z)} [SIGGRAPH 14’] [ECCV 16’]

Project Edit Transfer

Editing UI

SLIDE 28

Learning Natural Image Manifold

  • Deep generative models:
    – Generative Adversarial Network (GAN) [Goodfellow et al. 2014] [Radford et al. 2015] [Denton et al. 2015]
    – Variational Auto-Encoder (VAE) [Kingma and Welling 2013]
    – DRAW (Recurrent Neural Network) [Gregor et al. 2015]
    – PixelRNN and PixelCNN [Oord et al. 2016]
    – …

SLIDE 29

Image Classification via Neural Network

Input image I → “Cat”

Slides credit: Andrew Owens

SLIDE 30

Can We Generate Images with Neural Networks?

Random distribution (Gaussian noise) → Image
SLIDE 31

Generative Model

[Goodfellow et al. 2014]

Generative Adversarial Networks (GAN)

Synthesized image

SLIDE 32

Generative Model

[Goodfellow et al. 2014]

Generative Adversarial Networks (GAN)

Discriminative Model

“real”

SLIDE 33

Generative Model

[Goodfellow et al. 2014]

Generative Adversarial Networks (GAN)

Discriminative Model

“fake”
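The two-player game on these slides can be run end to end on a one-dimensional toy problem. This sketch hand-derives the gradients for an affine generator and a logistic discriminator; all sizes, learning rates, and distributions are choices made here for illustration, not the paper’s:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

# Tiny 1-D GAN: real data ~ N(3, 1); the generator is an affine map of
# Gaussian noise, G(z) = a + b*z; the discriminator is logistic
# regression, D(x) = sigmoid(w*x + c). Gradients are written by hand.
a, b = 0.0, 1.0   # generator parameters
w, c = 0.1, 0.0   # discriminator parameters
lr = 0.05
for step in range(2000):
    x_real = rng.normal(3.0, 1.0, 64)
    z = rng.normal(0.0, 1.0, 64)
    x_fake = a + b * z

    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator: gradient ascent on log D(fake) (non-saturating loss).
    d_fake = sigmoid(w * x_fake + c)
    a += lr * np.mean((1 - d_fake) * w)
    b += lr * np.mean((1 - d_fake) * w * z)

# The generator's mean, a, typically drifts toward the real mean (3.0).
```

The same alternation drives the image-scale GANs in the talk; only the two players are deep networks instead of two-parameter maps.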

SLIDE 34

Cat Generation (w.r.t. training iterations)

SLIDE 35

GAN as Manifold Approximation

Sample training images from “Amazon Shirts” Random image samples from Generator G(z)

[Radford et al. 2015]

SLIDE 36

Traverse on the GAN Manifold

[Radford et al. 2015]

G(z0), G(z1)

Linear interpolation in z space: G(z0 + t ⋅ (z1 − z0))

Limitations:

  • not photo-realistic enough, low resolution
  • produces images randomly, no user control
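The traversal shown here is just a line segment in latent space, decoded pointwise by the generator. A minimal sketch of the latent path (the generator G itself is assumed and omitted):

```python
import numpy as np

def interpolate_z(z0, z1, steps=5):
    """Linear interpolation in latent space: z_t = z0 + t * (z1 - z0) for
    t in [0, 1]. Decoding each z_t with the generator, G(z_t), yields the
    smooth image transition shown on the slide."""
    return [z0 + t * (z1 - z0) for t in np.linspace(0.0, 1.0, steps)]

z0 = np.array([0.0, 0.0])
z1 = np.array([1.0, -2.0])
path = interpolate_z(z0, z1, steps=5)  # path[2] is the midpoint [0.5, -1.0]
```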
SLIDE 37

Overview

  • Original photo
  • Projection on manifold (Project)
  • Editing UI (Edit)
  • Transition between the original and edited projection (Transfer)
  • Different degrees of image manipulation

SLIDE 38

Overview

  • Original photo
  • Projection on manifold (Project)
  • Editing UI (Edit)
  • Transition between the original and edited projection (Transfer)
  • Different degrees of image manipulation

SLIDE 39

Projecting an Image onto the Manifold

Input: real image x_R. Output: latent vector z.

Optimization: minimize a reconstruction loss L between the generative model’s output G(z) and x_R. (Reconstruction losses: 0.196, 0.238, 0.332.)

SLIDE 40

Projecting an Image onto the Manifold

Input: real image x_R. Output: latent vector z.

Inverting network z = P(x): an auto-encoder with a fixed decoder G. (Reconstruction losses: 0.218, 0.242, 0.336, vs. 0.196, 0.238, 0.332 for optimization.)

SLIDE 41

Projecting an Image onto the Manifold

Input: real image x_R. Output: latent vector z.

Hybrid method: use the inverting network z = P(x) as initialization for the optimization problem. (Reconstruction losses: optimization 0.196, 0.238, 0.332; inverting network 0.218, 0.242, 0.336; hybrid 0.153, 0.167, 0.268.)
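The three projection strategies can be miniaturized with a linear map standing in for the generator, a toy chosen here so that the least-squares inverse plays the role of the inverting network (the talk uses a deep G, a trained network P, and L-BFGS; everything below is a hypothetical stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear 'generator' G(z) = W @ z mapping 2-D latents to 6-D 'images'.
W = rng.normal(size=(6, 2))
G = lambda z: W @ z
x_real = rng.normal(size=6)  # the image to project onto the manifold

def project(z, n_steps=500, lr=0.01):
    """Minimize the reconstruction loss ||G(z) - x_real||^2 by gradient
    descent: the same idea as the slide's optimization, in miniature."""
    for _ in range(n_steps):
        z = z - lr * 2.0 * W.T @ (G(z) - x_real)
    return z

loss = lambda z: np.sum((G(z) - x_real) ** 2)

z_opt = project(np.zeros(2))            # (1) optimization from scratch
z_net = np.linalg.pinv(W) @ x_real      # (2) 'inverting network' stand-in
z_hybrid = project(z_net, n_steps=50)   # (3) hybrid: network init + refine
```

In the deep, non-convex setting the network initialization matters much more than in this linear toy, which is why the hybrid column on the slide has the lowest losses.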

SLIDE 42

Overview

  • Original photo
  • Projection on manifold (Project)
  • Editing UI (Edit)
  • Transition between the original and edited projection (Transfer)
  • Different degrees of image manipulation

SLIDE 43

Manipulating the Latent Vector

Objective: starting from the projection z0, find z whose output G(z) matches the user guidance v_g, penalizing a constraint-violation loss L_g against the user-guidance image.
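A hedged reconstruction of this objective, following the iGAN (ECCV 16’) formulation; the constraint functions f_g, guidance values v_g, and smoothness term S with weight λ_s are notation assumed here:

```latex
z^{*} \;=\; \arg\min_{z}\; \sum_{g} \big\| f_{g}\big(G(z)\big) - v_{g} \big\|^{2}
\;+\; \lambda_{s}\, S\big(G(z),\, G(z_{0})\big)
```

Each brush stroke contributes one constraint term, and S keeps the edited result close to the starting point G(z0).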

SLIDE 44

Overview

  • Original photo
  • Projection on manifold (Project)
  • Editing UI (Edit)
  • Transition between the original and edited projection (Transfer)
  • Different degrees of image manipulation

SLIDE 45

Edit Transfer

Input, G(z0), G(z1); linear interpolation in z space.

Motion (u, v) + Color (A_{3×4}): estimate per-pixel geometric and color variation.

SLIDE 46

Edit Transfer

Input, G(z0), G(z1); linear interpolation in z space.

Motion (u, v) + Color (A_{3×4}): estimate per-pixel geometric and color variation.

SLIDE 47

Edit Transfer

Result

Input, G(z0), G(z1); linear interpolation in z space.

Motion (u, v) + Color (A_{3×4}): estimate per-pixel geometric and color variation.

SLIDE 48

Image Manipulation Demo

SLIDE 49

Image Manipulation Demo

SLIDE 50

Designing Products

SLIDE 51
SLIDE 52

Interactive Image Generation

SLIDE 53

The Simplest Generative Model: Averaging

AverageExplorer: {x | x = Σ_n w_n ⋅ I_n^warp}

  • Generative model: weighted average of warped images.
  • Limitation: cannot synthesize novel content.

[Zhu et al. 2014]
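The model on this slide is literally a weighted average. A sketch (the images and weights are made up, and warping/alignment is assumed to have happened already):

```python
import numpy as np

def average_explorer(images, weights):
    """AverageExplorer-style 'generative model': a normalized weighted
    average of already-warped, aligned images, x = sum_n w_n * I_n."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.stack(images), axes=1)

# Three tiny 2x2 constant 'images' with intensities 0.0, 0.5, 1.0.
imgs = [np.full((2, 2), v) for v in (0.0, 0.5, 1.0)]
avg = average_explorer(imgs, weights=[1, 2, 1])  # every pixel: 2/4 = 0.5
```

Because the output is a convex combination of existing pixels, the model can only interpolate; that is exactly the “cannot synthesize novel content” limitation that motivates GANs.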

SLIDE 54

Generative Image Transformation

SLIDE 55

iGAN (a.k.a. interactive GAN)

  • Get the code: github.com/junyanz/iGAN
  • Intelligent drawing tools via GAN.
  • Debugging tools for understanding and visualizing deep generative networks.
  • Work in progress: supporting more models (GAN, VAE, Theano/TensorFlow).

SLIDE 56

Generative Model {x | x = G(z), z ∈ Z}

  • Pros:
    – Task-independent: offline generative-model training is independent of the graphics application.
    – Optimizing z is easier than optimizing x.
    – Generative models keep improving.
  • Cons:
    – Low quality and low resolution, so post-processing is required (still engineering work).
    – Limitations of current generative models: cannot produce good texture.

SLIDE 57

Related work on GAN

  • Goodfellow’s NIPS 2016 Tutorial: [arXiv], [slides]
  • Early work: [Tu 07’], [Gutmann and Hyvarinen 10’], etc.
  • New models: InfoGAN, SSGAN, VAE-GAN, LAPGAN, BiGAN, CoGAN, PPGAN, etc.
  • Training techniques: DCGAN, Improved-GAN, EBGAN, Unrolling.
  • Image: inpainting, inverting features, style transfer, text-to-image, super-resolution, etc.
  • Video: frame prediction, tiny videos, etc.
SLIDE 58

Image-to-Image Translation

Image-to-Image Translation with Conditional Adversarial Nets. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. arXiv 2016. Code: github.com/phillipi/pix2pix

SLIDE 59

Image-to-Image Problems

SLIDE 60

Conditional Adversarial Networks (cGAN)

  • Loss: L1 + GAN
  • G: U-Net [Ronneberger et al. 15’]
  • D: PatchGAN (70 × 70)

SLIDE 61

Conditional GAN

SLIDE 62

Network and Loss Function

  • Loss function: L1 + GAN
  • Generator G: U-Net [Ronneberger et al. 15’]
  • Discriminator D: PatchGAN 70 × 70 (a fully convolutional network)
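The combined objective can be sketched numerically: a toy version of the generator’s loss, with a PatchGAN-style grid of discriminator scores (λ = 100 follows the pix2pix paper’s setting; the arrays, grid size, and exact loss form are illustrative assumptions here):

```python
import numpy as np

def pix2pix_generator_loss(d_patch_fake, fake, target, lam=100.0):
    """Generator objective sketch: a non-saturating GAN term averaged over
    the PatchGAN's grid of per-patch scores, plus a lambda-weighted L1
    distance to the target image."""
    eps = 1e-8
    gan = -np.mean(np.log(d_patch_fake + eps))  # fool every local patch
    l1 = np.mean(np.abs(fake - target))
    return gan + lam * l1

rng = np.random.default_rng(0)
fake = rng.uniform(size=(256, 256, 3))
target = fake.copy()                 # perfect reconstruction -> L1 term is 0
d_scores = np.full((30, 30), 0.5)    # a 70x70 PatchGAN on a 256x256 input
                                     # yields a 30x30 grid of scores
loss = pix2pix_generator_loss(d_scores, fake, target)
print(round(loss, 3))  # -> 0.693, i.e. -log(0.5) from the GAN term alone
```

The L1 term pins down low-frequency structure, while the patch-level GAN term only has to judge local texture, which is why a small 70 × 70 receptive field suffices.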

SLIDE 63

Different Losses

SLIDE 64

Architectures for Generator G

SLIDE 65

Patch Size of PatchGAN

SLIDE 66

Applications

SLIDE 67

Label→Facade

SLIDE 68

Label→Street View

SLIDE 69

Map Generation

SLIDE 70

Day→Night

SLIDE 71

Edge→Handbag

HED [Xie and Tu. 15’]

SLIDE 72

Edge→Shoe

HED [Xie and Tu. 15’]

SLIDE 73

User Sketch→Photo

SLIDE 74

Automatic Colorization

SLIDE 75

Failure Cases

  • Sparse input image.
  • Unusual input image.
SLIDE 76

Summary: Image-to-Image Problems

SLIDE 77

Cat Paper Collection

  • GitHub: github.com/junyanz/CatPapers
  • 90% of data is visual; most visual data is about cats.
  • 60+ vision, learning and graphics papers.
SLIDE 78
SLIDE 79

Thank You!

Eli, Philipp, Alyosha, Yong Jae, Tinghui, Philipp