Deep Learning for Visual Manipulation and Synthesis
Jun-Yan Zhu, UC Berkeley, 2017/01/11 @ VALSE


SLIDE 1

Deep Learning for Visual Manipulation and Synthesis

Jun-Yan Zhu 朱俊彦

UC Berkeley 2017/01/11 @ VALSE

SLIDE 2

What is visual manipulation?

input photo + User Input → Image Editing Program → result

Desired output:

  • stay close to the input.
  • satisfy user’s constraint.

[Schaefer et al. 2006]

SLIDE 3

What is Visual Synthesis?

user input → Image Generation Program → result

Desired output:

  • satisfy user’s constraint.

Sketch2Photo [Tao et al. 2009]

SLIDE 4

So far so good

SLIDE 5

Things can get really bad: the lack of “safety wheels”

SLIDE 6

Adding the “safety wheels”

Input Photo + User Input → Image Editing Program → Output Result

Natural Image Manifold

A desired output:

  • stay close to the input.
  • satisfy user’s constraint.
  • lie on the natural image manifold.
SLIDE 7

Prior work: Heuristic-based

Color [Reinhard et al. 2004]; Color and Texture [Johnson et al. 2011]; Gradient [Perez et al. 2003]; “Bleeding” artifacts [Tao et al. 2010]

SLIDE 8

Prior work: Discriminative Learning

Image Compositing (20 images) [Xue et al. 2012]; Image Deblurring (40 images) [Liu et al. 2013]; Natural Human Motion (34 subjects) [Ren et al. 2005]

SLIDE 9

Our Goal:

  • Learn the manifold of natural images without direct human annotations.
  • Improve visual manipulation and synthesis by constraining the result to lie on that learned manifold.

SLIDE 10

Why Deep Learning Methods?

  • Impressive results on visual recognition.

– Classification, detection, segmentation, 3D vision, videos, etc.

  • No feature engineering.
  • Recent development of generative models (e.g., Generative Adversarial Networks).

SLIDE 11

Deep Learning trends: performance

SLIDE 12

Deep Learning trends: research

AlexNet [Krizhevsky et al.] ImageNet [Jia et al.]

SLIDE 13

Realism CNN Image Editing Model

Predict Realism Improve Editing

Discriminative Model M: {x | P(real | x) = 1} [ICCV 15’]
Generative Model M: {x | x = G(z)} [SIGGRAPH 14’] [ECCV 16’]

Project Edit Transfer

Editing UI

SLIDE 14

Realism CNN Image Editing Model

Predict Realism Improve Editing

Discriminative Model M: {x | P(real | x) = 1} [ICCV 15’]

Foreground Object F + Background B → Image Composite I

SLIDE 15

Learning Visual Realism

CNN training: classify natural photos vs. composite images, using 25K natural photos vs. 25K composite images.
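The training setup on this slide, a CNN classifying 25K natural photos against 25K composites, can be miniaturized as a binary “realism” classifier. Below is a sketch using logistic regression on made-up feature vectors (the features, sizes, and separability are all hypothetical assumptions; the talk trains a deep CNN on pixels):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

# Hypothetical stand-ins for image features: natural photos and composites
# drawn from two shifted Gaussians so the toy problem is learnable.
X_nat = rng.normal(+1.0, 1.0, size=(200, 8))
X_comp = rng.normal(-1.0, 1.0, size=(200, 8))
X = np.vstack([X_nat, X_comp])
y = np.concatenate([np.ones(200), np.zeros(200)])  # 1 = natural, 0 = composite

# Logistic regression ("is this photo real?") trained by gradient descent.
w, b = np.zeros(8), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
```

The supervision trick is the point: composites are “free” negative examples, so no human realism labels are needed.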

SLIDE 16

How do we get composite images?

Target Object → Object Mask → Object Masks with Similar Shapes → Composite Images

Object Mask: (1) Human Annotation, (2) Object Proposal

[Lalonde and Efros 2007]

SLIDE 17

Ranking of Training Composites

Most realistic composites Least realistic composites

SLIDE 18

Evaluation

Dataset [Lalonde and Efros 2007]; task: binary classification; 360 realistic photos (natural images + realistic composites) vs. 360 unrealistic photos.

Area under ROC Curve:

Methods without object mask:
  • Lalonde and Efros (no mask): 0.61
  • AlexNet + SVM: 0.73
  • RealismCNN: 0.84
  • RealismCNN + SVM: 0.88
  • Human: 0.91

Methods using object mask:
  • Reinhard et al.: 0.66
  • Lalonde and Efros (with mask): 0.81
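The metric here, area under the ROC curve, equals the probability that a randomly chosen realistic photo outscores a randomly chosen unrealistic one. A small sketch (the scores below are invented for illustration, not the paper’s data):

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly, counting ties as half.
    O(n*m) pairwise version, written for clarity rather than speed."""
    s_pos = np.asarray(scores_pos, dtype=float)[:, None]
    s_neg = np.asarray(scores_neg, dtype=float)[None, :]
    return (s_pos > s_neg).mean() + 0.5 * (s_pos == s_neg).mean()

# Hypothetical realism scores for realistic vs. unrealistic photos.
realistic = [0.9, 0.8, 0.75, 0.6, 0.55]
unrealistic = [0.7, 0.5, 0.4, 0.3, 0.2]
print(round(auc(realistic, unrealistic), 2))  # -> 0.92 (23 of 25 pairs correct)
```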

SLIDE 19

Visual Realism Ranking (figure): composites from three scenes (Snowy Mountain, Ocean, Highway) ranked from least to most realistic. Red: unrealistic composite; Green: realistic composite; Blue: natural image.

SLIDE 20

Our Pipeline

Realism CNN (Predict Realism) → Image Editing Model (Improve Composites)

SLIDE 21

Improving Visual Realism

Editing model: a color adjustment g applied to the foreground object F, minimized with a Quasi-Newton method (L-BFGS):

E(g, F) = E_CNN + E_reg

Original composite (realism score: 0.0) → improved composite (realism score: 0.8)
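A sketch of this optimization with SciPy’s L-BFGS: a per-channel color gain g is adjusted to lower a toy “realism” energy plus a regularizer. The quadratic stand-in for E_CNN, the example colors, and the 0.1 weight are assumptions made here; the talk’s realism term is the RealismCNN itself.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical mean RGB of the pasted foreground and of the background.
fg = np.array([0.2, 0.4, 0.9])
bg = np.array([0.5, 0.5, 0.5])

def energy(g):
    """E(g, F) = E_cnn + E_reg: a toy 'realism' term that prefers adjusted
    foreground colors matching the background, plus a regularizer keeping
    the per-channel gain g close to the identity adjustment (all ones)."""
    e_cnn = np.sum((g * fg - bg) ** 2)
    e_reg = 0.1 * np.sum((g - 1.0) ** 2)
    return e_cnn + e_reg

res = minimize(energy, x0=np.ones(3), method="L-BFGS-B")
g_opt = res.x  # color gains with lower energy than the raw cut-and-paste
```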

SLIDE 22

Selecting Suitable Objects

Best-fitting object selected by RealismCNN Object with most similar shape

SLIDE 23

Optimizing Color Compatibility

Cut-n-paste Lalonde et al. Xue et al. Ours Object mask

SLIDE 24

Sanity Check: Real Photos

Lalonde et al. Xue et al. Ours Object mask Cut-n-paste

SLIDE 25

Visualizing and Localizing Errors (gradient map ∂E/∂I_p)

The energy E drops over L-BFGS iterations: 50.73 → 9.38 → 5.05 → 3.44 → 3.00.

SLIDE 26

Discriminative Model {x | P(real | x) = 1}

  • Pros:
    – The CNN is easy to train.
    – Graphics programs often produce better images than generative models.
    – General framework for many tasks (e.g., deblurring, retargeting, etc.).
  • Cons:
    – Task-specific: a pre-trained model cannot be applied to other tasks.
    – Graphics programs are often non-parametric and non-differentiable.
    – Graphics programs often require a user in the loop, so automatically generating results for CNN training is challenging.

  • Code: github.com/junyanz/RealismCNN
  • Data: people.eecs.berkeley.edu/~junyanz/projects/realism/
SLIDE 27

Realism CNN Image Editing Model

Predict Realism Improve Editing

Discriminative Model M: {x | P(real | x) = 1} [ICCV 15’]
Generative Model M: {x | x = G(z)} [SIGGRAPH 14’] [ECCV 16’]

Project Edit Transfer

Editing UI

SLIDE 28

Learning Natural Image Manifold

  • Deep generative models:
    – Generative Adversarial Network (GAN) [Goodfellow et al. 2014] [Radford et al. 2015] [Denton et al. 2015]
    – Variational Auto-Encoder (VAE) [Kingma and Welling 2013]
    – DRAW (Recurrent Neural Network) [Gregor et al. 2015]
    – PixelRNN and PixelCNN [Oord et al. 2016]
    – …

SLIDE 29

Image Classification via Neural Network

Input image I → “Cat”

Slides credit: Andrew Owens

SLIDE 30

Can We Generate Images with Neural Networks?

Random distribution (Gaussian noise) → Image
SLIDE 31

Generative Model

[Goodfellow et al. 2014]

Generative Adversarial Networks (GAN)

Synthesized image

SLIDE 32

Generative Model

[Goodfellow et al. 2014]

Generative Adversarial Networks (GAN)

Discriminative Model

“real”

SLIDE 33

Generative Model

[Goodfellow et al. 2014]

Generative Adversarial Networks (GAN)

Discriminative Model

“fake”
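The two-player game on these slides can be run end to end on a one-dimensional toy problem. This sketch hand-derives the gradients for an affine generator and a logistic discriminator; all sizes, learning rates, and distributions are choices made here for illustration, not the paper’s:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))

# Tiny 1-D GAN: real data ~ N(3, 1); the generator is an affine map of
# Gaussian noise, G(z) = a + b*z; the discriminator is logistic
# regression, D(x) = sigmoid(w*x + c). Gradients are written by hand.
a, b = 0.0, 1.0   # generator parameters
w, c = 0.1, 0.0   # discriminator parameters
lr = 0.05
for step in range(2000):
    x_real = rng.normal(3.0, 1.0, 64)
    z = rng.normal(0.0, 1.0, 64)
    x_fake = a + b * z

    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator: gradient ascent on log D(fake) (non-saturating loss).
    d_fake = sigmoid(w * x_fake + c)
    a += lr * np.mean((1 - d_fake) * w)
    b += lr * np.mean((1 - d_fake) * w * z)

# The generator's mean, a, typically drifts toward the real mean (3.0).
```

The same alternation drives the image-scale GANs in the talk; only the two players are deep networks instead of two-parameter maps.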

SLIDE 34

Cat Generation (w.r.t. training iterations)

SLIDE 35

GAN as Manifold Approximation

Sample training images from “Amazon Shirts” Random image samples from Generator G(z)

[Radford et al. 2015]

SLIDE 36

Traverse on the GAN Manifold

[Radford et al. 2015]

G(z0), G(z1)

Linear interpolation in z space: G(z0 + t ⋅ (z1 − z0))

Limitations:

  • not photo-realistic enough, low resolution
  • produces images randomly, no user control
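The traversal shown here is just a line segment in latent space, decoded pointwise by the generator. A minimal sketch of the latent path (the generator G itself is assumed and omitted):

```python
import numpy as np

def interpolate_z(z0, z1, steps=5):
    """Linear interpolation in latent space: z_t = z0 + t * (z1 - z0) for
    t in [0, 1]. Decoding each z_t with the generator, G(z_t), yields the
    smooth image transition shown on the slide."""
    return [z0 + t * (z1 - z0) for t in np.linspace(0.0, 1.0, steps)]

z0 = np.array([0.0, 0.0])
z1 = np.array([1.0, -2.0])
path = interpolate_z(z0, z1, steps=5)  # path[2] is the midpoint [0.5, -1.0]
```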
SLIDE 37

Overview

  • Original photo
  • Projection on manifold (Project)
  • Editing UI (Edit)
  • Transition between the original and edited projection (Transfer)
  • Different degrees of image manipulation

SLIDE 38

Overview

  • Original photo
  • Projection on manifold (Project)
  • Editing UI (Edit)
  • Transition between the original and edited projection (Transfer)
  • Different degrees of image manipulation

SLIDE 39

Projecting an Image onto the Manifold

Input: real image x_R. Output: latent vector z.

Optimization: minimize a reconstruction loss L between the generative model’s output G(z) and x_R. (Reconstruction losses: 0.196, 0.238, 0.332.)

SLIDE 40

Projecting an Image onto the Manifold

Input: real image x_R. Output: latent vector z.

Inverting network z = P(x): an auto-encoder with a fixed decoder G. (Reconstruction losses: 0.218, 0.242, 0.336, vs. 0.196, 0.238, 0.332 for optimization.)

SLIDE 41

Projecting an Image onto the Manifold

Input: real image x_R. Output: latent vector z.

Hybrid method: use the inverting network z = P(x) as initialization for the optimization problem. (Reconstruction losses: optimization 0.196, 0.238, 0.332; inverting network 0.218, 0.242, 0.336; hybrid 0.153, 0.167, 0.268.)
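The three projection strategies can be miniaturized with a linear map standing in for the generator, a toy chosen here so that the least-squares inverse plays the role of the inverting network (the talk uses a deep G, a trained network P, and L-BFGS; everything below is a hypothetical stand-in):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear 'generator' G(z) = W @ z mapping 2-D latents to 6-D 'images'.
W = rng.normal(size=(6, 2))
G = lambda z: W @ z
x_real = rng.normal(size=6)  # the image to project onto the manifold

def project(z, n_steps=500, lr=0.01):
    """Minimize the reconstruction loss ||G(z) - x_real||^2 by gradient
    descent: the same idea as the slide's optimization, in miniature."""
    for _ in range(n_steps):
        z = z - lr * 2.0 * W.T @ (G(z) - x_real)
    return z

loss = lambda z: np.sum((G(z) - x_real) ** 2)

z_opt = project(np.zeros(2))            # (1) optimization from scratch
z_net = np.linalg.pinv(W) @ x_real      # (2) 'inverting network' stand-in
z_hybrid = project(z_net, n_steps=50)   # (3) hybrid: network init + refine
```

In the deep, non-convex setting the network initialization matters much more than in this linear toy, which is why the hybrid column on the slide has the lowest losses.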

SLIDE 42

Overview

  • Original photo
  • Projection on manifold (Project)
  • Editing UI (Edit)
  • Transition between the original and edited projection (Transfer)
  • Different degrees of image manipulation

SLIDE 43

Manipulating the Latent Vector

Objective: starting from the projection z0, find z whose output G(z) matches the user guidance v_g, penalizing a constraint-violation loss L_g against the user-guidance image.
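A hedged reconstruction of this objective, following the iGAN (ECCV 16’) formulation; the constraint functions f_g, guidance values v_g, and smoothness term S with weight λ_s are notation assumed here:

```latex
z^{*} \;=\; \arg\min_{z}\; \sum_{g} \big\| f_{g}\big(G(z)\big) - v_{g} \big\|^{2}
\;+\; \lambda_{s}\, S\big(G(z),\, G(z_{0})\big)
```

Each brush stroke contributes one constraint term, and S keeps the edited result close to the starting point G(z0).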

SLIDE 44

Overview

  • Original photo
  • Projection on manifold (Project)
  • Editing UI (Edit)
  • Transition between the original and edited projection (Transfer)
  • Different degrees of image manipulation

SLIDE 45

Edit Transfer

Input, G(z0), G(z1); linear interpolation in z space.

Motion (u, v) + Color (A_{3×4}): estimate per-pixel geometric and color variation.

SLIDE 46

Edit Transfer

Input, G(z0), G(z1); linear interpolation in z space.

Motion (u, v) + Color (A_{3×4}): estimate per-pixel geometric and color variation.

SLIDE 47

Edit Transfer

Result

Input, G(z0), G(z1); linear interpolation in z space.

Motion (u, v) + Color (A_{3×4}): estimate per-pixel geometric and color variation.

SLIDE 48

Image Manipulation Demo

SLIDE 49

Image Manipulation Demo

SLIDE 50

Designing Products

SLIDE 51
SLIDE 52

Interactive Image Generation

SLIDE 53

The Simplest Generative Model: Averaging

AverageExplorer: {x | x = Σ_n w_n ⋅ I_n^warp}

  • Generative model: weighted average of warped images.
  • Limitation: cannot synthesize novel content.

[Zhu et al. 2014]
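The model on this slide is literally a weighted average. A sketch (the images and weights are made up, and warping/alignment is assumed to have happened already):

```python
import numpy as np

def average_explorer(images, weights):
    """AverageExplorer-style 'generative model': a normalized weighted
    average of already-warped, aligned images, x = sum_n w_n * I_n."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.stack(images), axes=1)

# Three tiny 2x2 constant 'images' with intensities 0.0, 0.5, 1.0.
imgs = [np.full((2, 2), v) for v in (0.0, 0.5, 1.0)]
avg = average_explorer(imgs, weights=[1, 2, 1])  # every pixel: 2/4 = 0.5
```

Because the output is a convex combination of existing pixels, the model can only interpolate; that is exactly the “cannot synthesize novel content” limitation that motivates GANs.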

SLIDE 54

Generative Image Transformation

SLIDE 55

iGAN (a.k.a. interactive GAN)

  • Get the code: github.com/junyanz/iGAN
  • Intelligent drawing tools via GAN.
  • Debugging tools for understanding and visualizing deep generative networks.
  • Work in progress: supporting more models (GAN, VAE, Theano/TensorFlow).

SLIDE 56

Generative Model {x | x = G(z), z ∈ Z}

  • Pros:
    – Task-independent: offline generative-model training is independent of the graphics application.
    – Optimizing z is easier than optimizing x.
    – Generative models keep improving.
  • Cons:
    – Low quality and low resolution, so post-processing is required (still engineering work).
    – Limitations of current generative models: cannot produce good texture.

SLIDE 57

Related work on GAN

  • Goodfellow’s NIPS 2016 Tutorial: [arXiv], [slides]
  • Early work: [Tu 07’], [Gutmann and Hyvarinen 10’], etc.
  • New models: InfoGAN, SSGAN, VAE-GAN, LAPGAN, BiGAN, CoGAN, PPGAN, etc.
  • Training techniques: DCGAN, Improved-GAN, EBGAN, Unrolling.
  • Image: inpainting, inverting features, style transfer, text-to-image, super-resolution, etc.
  • Video: frame prediction, tiny videos, etc.
SLIDE 58

Image-to-Image Translation

Image-to-Image Translation with Conditional Adversarial Nets. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. arXiv 2016. Code: github.com/phillipi/pix2pix

SLIDE 59

Image-to-Image Problems

SLIDE 60

Conditional Adversarial Networks (cGAN)

  • Loss: L1 + GAN
  • G: U-Net [Ronneberger et al. 15’]
  • D: PatchGAN (70 × 70)

SLIDE 61

Conditional GAN

SLIDE 62

Network and Loss Function

  • Loss function: L1 + GAN
  • Generator G: U-Net [Ronneberger et al. 15’]
  • Discriminator D: PatchGAN 70 × 70 (a fully convolutional network)
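The combined objective can be sketched numerically: a toy version of the generator’s loss, with a PatchGAN-style grid of discriminator scores (λ = 100 follows the pix2pix paper’s setting; the arrays, grid size, and exact loss form are illustrative assumptions here):

```python
import numpy as np

def pix2pix_generator_loss(d_patch_fake, fake, target, lam=100.0):
    """Generator objective sketch: a non-saturating GAN term averaged over
    the PatchGAN's grid of per-patch scores, plus a lambda-weighted L1
    distance to the target image."""
    eps = 1e-8
    gan = -np.mean(np.log(d_patch_fake + eps))  # fool every local patch
    l1 = np.mean(np.abs(fake - target))
    return gan + lam * l1

rng = np.random.default_rng(0)
fake = rng.uniform(size=(256, 256, 3))
target = fake.copy()                 # perfect reconstruction -> L1 term is 0
d_scores = np.full((30, 30), 0.5)    # a 70x70 PatchGAN on a 256x256 input
                                     # yields a 30x30 grid of scores
loss = pix2pix_generator_loss(d_scores, fake, target)
print(round(loss, 3))  # -> 0.693, i.e. -log(0.5) from the GAN term alone
```

The L1 term pins down low-frequency structure, while the patch-level GAN term only has to judge local texture, which is why a small 70 × 70 receptive field suffices.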

SLIDE 63

Different Losses

SLIDE 64

Architectures for Generator G

SLIDE 65

Patch Size of PatchGAN

SLIDE 66

Applications

SLIDE 67

Label→Facade

SLIDE 68

Label→Street View

SLIDE 69

Map Generation

SLIDE 70

Day→Night

SLIDE 71

Edge→Handbag

HED [Xie and Tu. 15’]

SLIDE 72

Edge→Shoe

HED [Xie and Tu. 15’]

SLIDE 73

User Sketch→Photo

SLIDE 74

Automatic Colorization

SLIDE 75

Failure Cases

  • Sparse input image.
  • Unusual input image.
SLIDE 76

Summary: Image-to-Image Problems

SLIDE 77

Cat Paper Collection

  • GitHub: github.com/junyanz/CatPapers
  • 90% of data is visual; most visual data is about cats.
  • 60+ vision, learning and graphics papers.
SLIDE 78
SLIDE 79

Thank You!

Eli, Philipp, Alyosha, Yong Jae, Tinghui, Philipp