Deep Learning for Visual Manipulation and Synthesis
Jun-Yan Zhu, UC Berkeley
2017/01/11 @ VALSE
What is visual manipulation?
Input photo + user input → Image Editing Program → result
Desired output:
- stay close to the input.
- satisfy user’s constraint.
[Schaefer et al. 2006]
What is Visual Synthesis?
User input → Image Generation Program → result
Desired output:
- satisfy user’s constraint.
Sketch2Photo [Tao et al. 2009]
So far so good
Things can get really bad: the lack of “safety wheels”
Adding the “safety wheels”
Input photo + user input → Image Editing Program → output result
Natural Image Manifold
A desired output:
- stay close to the input.
- satisfy user’s constraint.
- lie on the natural image manifold.
Prior work: Heuristic-based
- Color [Reinhard et al. 2004]
- Color and texture [Johnson et al. 2011]
- Gradient [Perez et al. 2003]
- “Bleeding” artifacts [Tao et al. 2010]
Prior work: Discriminative Learning
- Image compositing (20 images) [Xue et al. 2012]
- Image deblurring (40 images) [Liu et al. 2013]
- Natural human motion (34 subjects) [Ren et al. 2005]
Our Goal:
- Learn the manifold of natural images
without direct human annotations.
- Improve visual manipulation and synthesis by
constraining the result to lie on that learned manifold.
Why Deep Learning Methods?
- Impressive results on visual recognition.
– Classification, detection, segmentation, 3D vision, videos, etc.
- No feature engineering.
- Recent development of generative models.
(e.g. Generative Adversarial Networks)
Deep Learning trends: performance
Deep Learning trends: research
AlexNet [Krizhevsky et al.], ImageNet [Jia et al.]
Realism CNN: predict realism → Image Editing Model: improve editing
Discriminative Model M = {x | P(real | x) = 1} [ICCV 15’]
Generative Model M = {x | x = G(z)} [SIGGRAPH 14’] [ECCV 16’]
Project → Edit → Transfer
Editing UI
Realism CNN: predict realism → Image Editing Model: improve editing
Discriminative Model M = {x | P(real | x) = 1} [ICCV 15’]
Foreground object F, background B, image composite I
CNN training: classify natural photos vs. composite images
(25K natural photos vs. 25K composite images)
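The training setup above is a plain binary classification problem. A minimal numpy sketch, using a logistic classifier on made-up feature vectors as a stand-in for the CNN (the data, feature dimension, and learning rate here are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for features of natural vs. composite images.
natural = rng.normal(loc=0.5, scale=1.0, size=(200, 16))
composite = rng.normal(loc=-0.5, scale=1.0, size=(200, 16))
X = np.vstack([natural, composite])
y = np.hstack([np.ones(200), np.zeros(200)])  # 1 = natural, 0 = composite

# Logistic regression trained by plain gradient descent.
w, b, lr = np.zeros(16), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # P(natural | x)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
accuracy = np.mean(pred == y)
```

The real system replaces the toy features with a deep CNN, but the objective (separate natural from composite) is the same.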
Learning Visual Realism
Object Mask: (1) Human Annotation (2) Object Proposal
[Lalonde and Efros 2007]
How do we get composite images?
Target object + object mask → object masks with similar shapes → composite images
Ranking of Training Composites
Most realistic composites Least realistic composites
Evaluation
Dataset
- [Lalonde and Efros 2007]
- Task: binary classification
- 360 realistic photos
(natural images + realistic composites)
- 360 unrealistic photos
Area under ROC curve:

Methods without object mask
- Lalonde and Efros (no mask): 0.61
- AlexNet + SVM: 0.73
- RealismCNN: 0.84
- RealismCNN + SVM: 0.88
- Human: 0.91

Methods using object mask
- Reinhard et al.: 0.66
- Lalonde and Efros (with mask): 0.81
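The numbers above are areas under the ROC curve. AUC equals the probability that a randomly chosen positive example outscores a randomly chosen negative one, which gives a short numpy implementation (the example scores below are made up):

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """AUC via the Mann-Whitney statistic: fraction of (pos, neg)
    pairs in which the positive example gets the higher score."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)

# Realism scores for realistic vs. unrealistic photos (illustrative).
score = auc([0.9, 0.8, 0.7, 0.4], [0.5, 0.3, 0.2, 0.1])  # 0.9375
```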
Visual Realism Ranking
Scenes (snowy mountain, ocean, highway), sorted from least to most realistic.
Red: unrealistic composite; green: realistic composite; blue: natural image.
Our Pipeline
Realism CNN Image Editing Model
Predict Realism Improve Composites
Realism CNN editing model: color adjustment g of the foreground object F
E(g, F) = E_CNN + E_reg
Improving Visual Realism
Original Composite (Realism score: 0.0)
Quasi-Newton (L-BFGS)
Improved Composite (Realism score: 0.8)
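A hedged sketch of that optimization step: a low-dimensional color adjustment g (per-channel gain and offset) is refined with L-BFGS to reduce E(g, F) = E_CNN + E_reg. A smooth quadratic surrogate stands in for the CNN realism term here, and the preferred-color target is invented purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
fg = rng.uniform(0.0, 1.0, size=(8, 8, 3))     # toy foreground patch
target_mean = np.array([0.2, 0.5, 0.7])        # colors the surrogate prefers

def energy(params):
    gain, offset = params[:3], params[3:]
    adjusted = fg * gain + offset
    # Surrogate E_CNN: penalize mean color far from the preferred target.
    e_cnn = np.sum((adjusted.mean(axis=(0, 1)) - target_mean) ** 2)
    # E_reg: keep the adjustment close to the identity transform.
    e_reg = np.sum((gain - 1.0) ** 2) + np.sum(offset ** 2)
    return e_cnn + 0.01 * e_reg

g0 = np.concatenate([np.ones(3), np.zeros(3)])  # identity color adjustment
res = minimize(energy, g0, method="L-BFGS-B")   # quasi-Newton refinement
```

In the real system the surrogate is replaced by the RealismCNN score and its gradient, so the energy landscape is non-convex.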
Selecting Suitable Objects
Best-fitting object selected by RealismCNN vs. object with the most similar shape
Optimizing Color Compatibility
Object mask; cut-and-paste; Lalonde et al.; Xue et al.; ours
Sanity Check: Real Photos
Object mask; cut-and-paste; Lalonde et al.; Xue et al.; ours
Visualizing and Localizing Errors (∂E/∂I_p)
The gradient map of the realism energy E with respect to each pixel I_p localizes unrealistic regions; the energy drops over L-BFGS iterations (50.73 → 9.38 → 5.05 → 3.44 → 3.00).
Discriminative Model M = {x | P(real | x) = 1}
- Pros:
– CNN is easy to train.
– Graphics programs often produce better images than generative models.
– General framework for many tasks (e.g., deblurring, retargeting).
- Cons:
– Task-specific: a pre-trained model cannot be applied to other tasks.
– Graphics programs are often non-parametric and non-differentiable.
– Graphics programs often require a user in the loop, so automatically generating results for CNN training is challenging.
- Code: github.com/junyanz/RealismCNN
- Data: people.eecs.berkeley.edu/~junyanz/projects/realism/
Realism CNN: predict realism → Image Editing Model: improve editing
Discriminative Model M = {x | P(real | x) = 1} [ICCV 15’]
Generative Model M = {x | x = G(z)} [SIGGRAPH 14’] [ECCV 16’]
Project → Edit → Transfer
Editing UI
Learning Natural Image Manifold
- Deep generative models:
– Generative Adversarial Network (GAN) [Goodfellow et al. 2014] [Radford et al. 2015] [Denton et al. 2015]
– Variational Auto-Encoder (VAE) [Kingma and Welling 2013]
– DRAW (recurrent neural network) [Gregor et al. 2015]
– PixelRNN and PixelCNN [Oord et al. 2016]
– …
Image Classification via Neural Network
Input image I → network → “Cat”
(Slides credit: Andrew Owens)
Can We Generate Images with Neural Networks?
Random distribution (Gaussian noise) → Generative Model → image
Generative Adversarial Networks (GAN) [Goodfellow et al. 2014]
Generative Model: random noise z → synthesized image
Discriminative Model: labels natural photos “real” and synthesized images “fake”
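A minimal numpy sketch of the two objectives in this game; tiny linear maps stand in for the generator and discriminator purely to make the value function concrete (the shapes, seed, and data distribution are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x, w):                    # discriminator: P(real | x)
    return 1.0 / (1.0 + np.exp(-w * x))

def G(z, theta):                # generator: noise -> sample
    return theta * z

w, theta = 1.0, 0.5
z = rng.normal(size=64)                    # Gaussian noise input
x_real = rng.normal(loc=2.0, size=64)      # samples from the "real" data
x_fake = G(z, theta)

# D maximizes log D(real) + log(1 - D(fake)); written as a loss to minimize:
d_loss = -np.mean(np.log(D(x_real, w)) + np.log(1.0 - D(x_fake, w)))
# G tries to fool D (non-saturating form): maximize log D(fake).
g_loss = -np.mean(np.log(D(x_fake, w)))
```

Training alternates gradient steps on `d_loss` (updating w) and `g_loss` (updating theta); real GANs use deep networks for both players.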
Cat Generation (w.r.t. training iterations)
GAN as Manifold Approximation
Training images sampled from “Amazon Shirts” vs. random samples from the generator G(z)
[Radford et al. 2015]
Traverse on the GAN Manifold
[Radford et al. 2015]
G(z0) … G(z1)
Linear interpolation in z space: G(z0 + t ⋅ (z1 − z0))
Limitations:
- not photo-realistic enough; low resolution
- produces images randomly; no user control
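The traversal above interpolates linearly in latent space rather than pixel space; each intermediate code is then rendered by G(z). A sketch of that code path (the 100-D latent size is an assumption, matching DCGAN-style models):

```python
import numpy as np

def latent_path(z0, z1, steps=5):
    """Linear interpolation z_t = z0 + t * (z1 - z0) for t in [0, 1]."""
    return [z0 + t * (z1 - z0) for t in np.linspace(0.0, 1.0, steps)]

z0, z1 = np.zeros(100), np.ones(100)   # two latent codes
path = latent_path(z0, z1)
# each path[i] would be decoded by the generator G to give one frame
```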
Overview
- Original photo
- Project: projection onto the manifold
- Edit: modify the projection
- Transfer: transition between the original and edited projection, with different degrees of image manipulation
Editing UI
Projecting an Image onto the Manifold: Optimization
Input: real image x^R. Output: latent vector z.
Minimize the reconstruction loss L between the generated image G(z) and x^R.
Projecting an Image onto the Manifold: Inverting Network
z = P(x): an auto-encoder with a fixed decoder G.
Input: real image x^R. Output: latent vector z.
Projecting an Image onto the Manifold: Hybrid Method
Use the inverting network z = P(x) as initialization for the optimization problem.
Input: real image x^R. Output: latent vector z.
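A hedged sketch of this hybrid projection: initialize z from a feed-forward guess (standing in for the inverting network P(x)), then refine it by gradient descent on the reconstruction loss ||G(z) − x||². A linear map replaces the generator so the gradient is analytic; the sizes, seed, and step size are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 10))            # toy "generator": G(z) = A @ z

def G(z):
    return A @ z

z_true = rng.normal(size=10)
x = G(z_true)                            # the photo to be projected

z = z_true + 0.5 * rng.normal(size=10)   # noisy init, stand-in for P(x)
lr = 0.003
for _ in range(300):
    z -= lr * 2.0 * A.T @ (G(z) - x)     # gradient of ||G(z) - x||^2

loss = np.sum((G(z) - x) ** 2)           # near zero after refinement
```

With a real (non-linear) generator the gradient comes from backpropagation through G, and the network initialization keeps the optimizer out of poor local minima.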
Overview
- Original photo
- Project: projection onto the manifold
- Edit: modify the projection
- Transfer: transition between the original and edited projection, with different degrees of image manipulation
Editing UI
Manipulating the Latent Vector
Objective: starting from the projection z0, find z whose generated image G(z) matches the user guidance v, minimizing the constraint-violation loss L.
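A toy sketch of this objective: find z near the projection z0 that drives a feature of the generated image toward the user guidance v, i.e. minimize ||f(G(z)) − v||² + λ||z − z0||². Here G is a stand-in linear map and f is the mean intensity; both are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 10))       # toy generator: G(z) = A @ z
m = np.ones(64) / 64.0              # f(x) = mean(x), so f(G(z)) = m @ A @ z

z0 = rng.normal(size=10)            # projection of the original photo
v = 1.0                             # user guidance: desired mean intensity
lam = 0.01                          # keep the edit close to z0

f0 = m @ A @ z0                     # feature of the unedited projection
z, lr = z0.copy(), 0.05
for _ in range(500):
    f = m @ A @ z
    grad = 2.0 * (f - v) * (A.T @ m) + 2.0 * lam * (z - z0)
    z -= lr * grad

f_final = m @ A @ z                 # closer to the guidance v than f0
```

In iGAN, f encodes brush strokes, sketches, and warps, and the gradient w.r.t. z flows through the trained generator.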
Overview
- Original photo
- Project: projection onto the manifold
- Edit: modify the projection
- Transfer: transition between the original and edited projection, with different degrees of image manipulation
Editing UI
Edit Transfer
Input → G(z0) → G(z1): linear interpolation in z space
Motion (u, v) + color (A_3×4): estimate per-pixel geometric and color variation
Edit Transfer
Result
Image Manipulation Demo
Designing Products
Interactive Image Generation
The Simplest Generative Model: Averaging
AverageExplorer: {x | x = Σ_n w_n ⋅ I_n^warp}
- Generative model: weighted average of warped images.
- Limitations: cannot synthesize novel content.
[Zhu et al. 2014]
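The AverageExplorer model above, {x | x = Σ_n w_n ⋅ I_n^warp}, is just a weighted average of (warped) exemplar images. A numpy sketch that skips the warping step:

```python
import numpy as np

def weighted_average(images, weights):
    """x = sum_n w_n * I_n, with the weights normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Contract the weight vector against the stack of images.
    return np.tensordot(w, np.asarray(images, dtype=float), axes=1)

imgs = [np.zeros((2, 2)), np.ones((2, 2))]   # two tiny "aligned" images
avg = weighted_average(imgs, [1.0, 3.0])     # every pixel becomes 0.75
```

Averaging can only blend existing content, which is exactly the limitation the slide notes: it cannot synthesize anything novel.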
Generative Image Transformation
iGAN (a.k.a. interactive GAN)
- Get the code: github.com/junyanz/iGAN
- Intelligent drawing tools via GAN.
- Debugging tools for understanding and
visualizing deep generative networks.
- Work in progress: supporting more models
(GAN, VAE, Theano/TensorFlow).
Generative Model M = {x | x = G(z), z ∈ Z}
- Pros:
– Task-independent: offline generative model training is independent of the graphics application.
– Optimizing z is easier than optimizing x.
– Generative models keep getting better.
- Cons:
– Low quality and low resolution ⇒ post-processing (still engineering work).
– Limitations of current generative models: cannot produce good texture.
Related work on GAN
- Goodfellow’s NIPS 2016 Tutorial: [arxiv], [slides]
- Early work: [Tu 07’], [Gutmann and Hyvarinen 10’], etc.
- New models: InfoGAN, SSGAN, VAE-GAN, LAPGAN, BiGAN,
CoGAN, PPGAN, etc.
- Training techniques: DCGAN, Improved-GAN, EBGAN, Unrolling.
- Image: inpainting, inverting features, style transfer, text-to-image, super-resolution, etc.
- Video: Frame Prediction, Tiny Videos, etc.
Image-to-Image Translation
Image-to-Image Translation with Conditional Adversarial Nets. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. arXiv 2016.
Code: github.com/phillipi/pix2pix
Image-to-Image Problems
Conditional Adversarial Networks (cGAN)
- Loss: L1+GAN
- G: U-Net
- D: PatchGAN (70 × 70)
U-Net [Ronneberger et al. 15’]
Conditional GAN
Network and Loss Function
- Loss function: L1 + GAN
- Generator G: U-Net
- Discriminator D: PatchGAN 70 × 70 (FCN)
U-Net [Ronneberger et al. 15’]
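A hedged numpy sketch of the generator loss named above: a GAN term over the PatchGAN's per-patch decisions plus λ times an L1 reconstruction term (λ = 100 follows the paper's setting). The patch-map size and pixel values below are toy assumptions:

```python
import numpy as np

def generator_loss(d_fake_patches, fake, target, lam=100.0):
    # GAN term: make D call every patch of the fake "real"
    # (non-saturating log D form; epsilon guards against log 0).
    gan = -np.mean(np.log(d_fake_patches + 1e-8))
    # L1 term: stay close to the paired ground-truth image.
    l1 = np.mean(np.abs(fake - target))
    return gan + lam * l1

d_scores = np.full((30, 30), 0.5)        # toy PatchGAN output map
fake = np.zeros((256, 256, 3))
target = np.full((256, 256, 3), 0.1)
loss = generator_loss(d_scores, fake, target)   # -log(0.5) + 100 * 0.1
```

The L1 term handles low frequencies (overall color and structure), while the patch-level GAN term pushes high-frequency detail toward realism.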
Different Losses
Architectures for Generator G
Patch Size of PatchGAN
Applications
Label → Facade
Label → Street View
Map Generation
Day → Night
Edge → Handbag (edges by HED [Xie and Tu 15’])
Edge → Shoe (edges by HED [Xie and Tu 15’])
User Sketch → Photo
Automatic Colorization
Failure Cases
- Sparse input image.
- Unusual input image.
Summary: Image-to-Image Problems
Cat Paper Collection
- GitHub: github.com/junyanz/CatPapers
- 90% of data is visual; most visual data is about cats.
- 60+ vision, learning and graphics papers.
Thank You!
Eli Philipp Alyosha Yong Jae Tinghui Philipp