Unpaired Image-to-Image Translation Taesung Park Alexei A. Efros - - PowerPoint PPT Presentation

SLIDE 1

Contrastive Learning for Unpaired Image-to-Image Translation

Taesung Park Alexei A. Efros Richard Zhang Jun-Yan Zhu

UC Berkeley Adobe Research ECCV 2020

SLIDE 2

What is Unpaired Image-to-Image Translation?

Training set | Test-time behavior

SLIDE 3
SLIDE 4

cycle-consistency loss

CycleGAN (Zhu et al., ICCV17), DiscoGAN (Kim et al., ICML17), DualGAN (Yi et al., ICCV17)
Also used in MUNIT (Huang et al., ECCV18) and DRIT (Lee et al., ECCV18)

SLIDE 5
SLIDE 6
SLIDE 7

interchangeable differentiated

SLIDE 8

sensitive invariant

SLIDE 9

What makes for a good output?

Input (horse) → G → Output (zebra)?

SLIDE 10

Retaining input content

Input (horse) → G → Output (zebra)

Discriminator

SLIDE 11

Retaining input content

Input (horse) → G → Output (zebra)

Corresponding patches should have high similarity

Invariant | Sensitive

Query v, positive v+, negatives v1−, …, vN−

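SLIDE 12

Patch-based Contrastive Loss

Input (horse) → G → Output (zebra)

Query v (output patch), positive v+ (input patch at the same location), negatives v1−, …, vN− (other input patches)

Cosine similarities are divided by τ and passed through a softmax; the positive is the target class (cross-entropy):

$$\ell(v, v^+, v^-) = -\log \frac{\exp(v \cdot v^+ / \tau)}{\exp(v \cdot v^+ / \tau) + \sum_{n=1}^{N} \exp(v \cdot v^-_n / \tau)}, \qquad \tau = 0.07$$

All vectors are L2-normalized, so each dot product is a cosine similarity.

  • InfoNCE loss (Gutmann & Hyvärinen, AISTATS10; van den Oord et al., 2018), used in MoCo and SimCLR
  • To produce positive pairs:
      • Handcrafted data augmentation (MoCo, SimCLR, etc.)
      • Input and synthesized image (ours)

MoCo: He et al., CVPR20; SimCLR: Chen et al., ICML20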
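The patch-based contrastive loss on this slide can be checked numerically. Below is a minimal pure-Python sketch for a single query, assuming the patch features have already been extracted. The function and variable names are hypothetical, not taken from the authors' code. It computes cosine similarities, scales them by τ = 0.07, and returns the softmax cross-entropy with the positive as the target class.

```python
import math

# Sketch only: helper names and feature values are hypothetical.


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def patch_nce(v, v_pos, v_negs, tau=0.07):
    """InfoNCE loss for one query v, its positive v_pos, and negatives v_negs."""
    # Logit 0 belongs to the positive; the rest belong to the negatives.
    logits = [cosine(v, v_pos) / tau] + [cosine(v, n) / tau for n in v_negs]
    # Numerically stable log-sum-exp: subtract the largest logit first.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # -log softmax(logits)[0], i.e. cross-entropy with the positive as target.
    return log_sum_exp - logits[0]


query = [1.0, 0.2, 0.0]                          # output patch feature
positive = [0.9, 0.3, 0.1]                       # input patch at the same location
negatives = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.5]]  # other input patches
print(patch_nce(query, positive, negatives))
```

Because τ is small, differences in cosine similarity are magnified before the softmax. A query that sits closer to a negative than to its positive is therefore penalized heavily.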

SLIDE 13

Patchwise contrastive loss

G

SLIDE 14

Patchwise contrastive loss

Multilayer, Patchwise Contrastive Loss

G_enc  G_enc  G_dec

SLIDE 15

Patchwise contrastive loss

Multilayer, Patchwise Contrastive Loss

G_enc  G_enc  G_dec

+ No fixed similarity metric (e.g., L1 or perceptual loss)
+ One-sided (no inverse mapping needed)
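In the multilayer, patchwise version, features are taken from several layers of G_enc. Each output location acts as a query, and the input feature at the same location is its positive. The other sampled locations of the same input serve as negatives.

Below is a minimal sketch under those assumptions. The nested lists stand in for encoder activations, and the function names are hypothetical. Averaging over locations and layers, rather than summing, is a choice here that only rescales the loss.

```python
import math

# Sketch only: function names and features are hypothetical stand-ins.


def _unit(v):
    """Scale a non-zero vector to unit length so dot products are cosines."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def layer_patch_nce(out_feats, in_feats, tau=0.07):
    """Mean InfoNCE over the S sampled locations of one encoder layer.

    out_feats[s] is the query (output feature at location s), in_feats[s] is its
    positive, and in_feats[t] for t != s are negatives from the same input image.
    """
    queries = [_unit(f) for f in out_feats]
    keys = [_unit(f) for f in in_feats]
    total = 0.0
    for s, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / tau for k in keys]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[s]  # cross-entropy with target index s
    return total / len(queries)


def multilayer_patch_nce(out_layers, in_layers, tau=0.07):
    """Average the per-layer losses over all chosen encoder layers."""
    losses = [layer_patch_nce(o, i, tau) for o, i in zip(out_layers, in_layers)]
    return sum(losses) / len(losses)


inp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # 3 locations, 3 channels
good = [row[:] for row in inp]   # output keeps each location's content
bad = [inp[1], inp[2], inp[0]]   # output content shuffled across locations
print(multilayer_patch_nce([good, bad], [inp, inp]))
```

No fixed similarity metric appears anywhere in this code. The loss only asks each output patch to be more similar to its own input patch than to the other patches.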

SLIDE 16

Internal vs External Patches

G

Internal Patches

SLIDE 17

Internal vs External Patches

G

Internal Patches | External Patches

MoCo (He et al., CVPR20) and SimCLR (Chen et al., ICML20) use a large set of external images as negative samples

External patches make things worse

SLIDE 18

Power of Internal Patches

  • Texture Synthesis by Non-parametric Sampling (Efros & Leung, ICCV99; Efros & Freeman, SIGGRAPH01)
  • "Zero-Shot" Super-Resolution using Deep Internal Learning (Shocher, Cohen & Irani, CVPR18)

SLIDE 19

Internal vs External Patches

input | internal patches | external patches

Mode Collapse!

SLIDE 20

Identity Loss Regularization

Normally, contrastive loss between X and G(X)

DTN (Taigman et al., ICLR17), CycleGAN (Zhu et al., ICCV17)

G: X → G(X)

SLIDE 21

Identity Loss Regularization

Identity loss regularization: contrastive loss between Y and G(Y)

DTN (Taigman et al., ICLR17), CycleGAN (Zhu et al., ICCV17)

G: Y → G(Y)

G: X → G(X)

Normally, contrastive loss between X and G(X)

SLIDE 22

CUT: Contrastive Unpaired Translation

Contrastive Loss | Identity Loss Regularization

FastCUT

λ = 1 (CUT) | λ = 10 (FastCUT)

CUT: flexible, faster than CycleGAN | FastCUT: conservative, even faster than CUT
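The λ values weight the contrastive terms in the generator objective. This sketch assumes the full objective reported in the CUT paper:

L = L_GAN + λ_X · PatchNCE(X, G(X)) + λ_Y · PatchNCE(Y, G(Y))

It also assumes the paper's weights: λ_X = λ_Y = 1 for CUT, and λ_X = 10, λ_Y = 0 for FastCUT (no identity term). The slide itself shows only λ = 1 and λ = 10. The class name and the loss values are hypothetical placeholders.

```python
from dataclasses import dataclass

# Sketch only: LossWeights and the loss values are hypothetical placeholders.


@dataclass(frozen=True)
class LossWeights:
    lambda_x: float  # weight of PatchNCE between X and G(X)
    lambda_y: float  # weight of the identity term, PatchNCE between Y and G(Y)


CUT = LossWeights(lambda_x=1.0, lambda_y=1.0)
FAST_CUT = LossWeights(lambda_x=10.0, lambda_y=0.0)  # no identity regularization


def generator_objective(gan_loss, nce_x, nce_y, w):
    """L = L_GAN + lambda_X * PatchNCE(X) + lambda_Y * PatchNCE(Y)."""
    return gan_loss + w.lambda_x * nce_x + w.lambda_y * nce_y


# Placeholder loss values for one training step.
print(generator_objective(0.5, 2.0, 3.0, CUT))       # GAN + both contrastive terms
print(generator_objective(0.5, 2.0, 3.0, FAST_CUT))  # GAN + heavily weighted X term only
```

Setting λ_Y = 0 removes the identity term entirely, so G(Y) does not need to be computed for that term.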

SLIDE 23

Lighter Footprint

[Figure: training time in sec/iter (lower is better) for CycleGAN, CUT, and FastCUT; annotation: < 0.5x]

SLIDE 24

Lighter Footprint

[Figure: training time in sec/iter (lower is better) for CycleGAN, CUT, FastCUT, MUNIT, and DRIT; annotation: < 0.5x]

SLIDE 25

Lighter Footprint

[Figure: training time in sec/iter (lower is better) for CycleGAN, CUT, FastCUT, MUNIT, DRIT, DistanceGAN, Self-DistGAN, and GcGAN; annotation: < 0.5x]

SLIDE 26

[Figure: qualitative comparison of Input, CUT, FastCUT, CycleGAN, MUNIT, DRIT, DistanceGAN, and GcGAN]

SLIDE 27

Dealing with Dataset Bias

Source training set: horse 17.9% | Target training set: zebra 36.8%

SLIDE 28

Dealing with Dataset Bias

Source training set: horse 17.9% | Target training set: zebra 36.8%

Compared: Input, CycleGAN, CUT, FastCUT

Detected pixels: zebra 30.8%, zebra 25.9%, zebra 19.1%
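One way to read the "detected pixels" numbers is as the share of pixels that a semantic segmentation model labels as horse or zebra. Comparing that share in the outputs with the share in the training sets hints at how strongly a method pushes its outputs toward the target set's statistics.

Below is a minimal sketch, assuming a per-pixel label map from some segmenter is already available. The class ID and the tiny map are hypothetical.

```python
# Sketch only: the class ID and the label map are hypothetical.


def detected_fraction(label_map, class_id):
    """Fraction of pixels in a 2-D label map assigned to class_id."""
    pixels = [p for row in label_map for p in row]
    return sum(1 for p in pixels if p == class_id) / len(pixels)


ZEBRA = 1  # hypothetical class ID produced by the segmenter
label_map = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(f"zebra pixels: {detected_fraction(label_map, ZEBRA):.1%}")
```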

SLIDE 29

Cat → Dog, Yosemite Summer → Winter, Apple → Orange, GTA → Cityscapes, Paris → Burano

SLIDE 30

FID, evaluating the realism of output images (lower is better)

[Figure: FID bar chart for CycleGAN, MUNIT, DRIT, DistanceGAN, SelfDistanceGAN, GCGAN, CUT, and FastCUT on horse2zebra, cat2dog, and Cityscapes]

SLIDE 31

Segmentation score, evaluating correspondences

[Figure: mean Intersection-over-Union (%), higher is better]

Pipeline: I2I model → segmenter → % match
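The segmentation score follows the pipeline on the slide:

1. Translate an image with the I2I model.
2. Run a segmenter on the result.
3. Measure how well its labels match the ground truth of the input.

The metric is mean Intersection-over-Union. Below is a minimal sketch on small label maps, assuming integer class IDs. It skips classes that appear in neither map, which is a common but not universal convention. The example maps are hypothetical.

```python
# Sketch only: the label maps are hypothetical examples.


def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union between two equally sized 2-D label maps."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                inter += int(p == c and t == c)
                union += int(p == c or t == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


pred = [[0, 1], [1, 1]]    # segmenter output on the translated image
target = [[0, 1], [0, 1]]  # ground-truth labels of the input image
print(f"mIoU: {100 * mean_iou(pred, target, num_classes=2):.1f}%")
```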

SLIDE 32

Single Image Translation

G

C M

Internal contrastive loss is well-suited for single image translation.

Also see InGAN (Shocher et al., ICCV19), SinGAN (Shaham et al., ICCV19)

SLIDE 33

Single Image Translation

G

C M

Internal contrastive loss is well-suited for single image translation.

Also see InGAN (Shocher et al., ICCV19), SinGAN (Shaham et al., ICCV19)

Reference photo

SLIDE 34

Single Image Translation

G

C M

Internal contrastive loss is well-suited for single image translation.

Also see InGAN (Shocher et al., ICCV19), SinGAN (Shaham et al., ICCV19)

Reference photo

D

SLIDE 35

Painting

SLIDE 36

Reference Photo

Painting

Reference Photo

SLIDE 37

Gatys et al., CVPR16

Reference Photo

Painting

Reference Photo

SLIDE 38

STROTSS (Kolkin et al., CVPR19) | Painting

Reference Photo

SLIDE 39

WCT2 (Yoo et al., ICCV19) | Painting

Reference Photo

SLIDE 40

Our translation result | Painting

Reference Photo

SLIDE 41

CycleGAN

Reference Photo

Painting

SLIDE 42

Painting

SLIDE 43

Painting | Gatys et al., CVPR16

Reference Photo

SLIDE 44

Painting | STROTSS (Kolkin et al., CVPR19)

Reference Photo

SLIDE 45

Painting | WCT2 (Yoo et al., ICCV19)

Reference Photo

SLIDE 46

Ours

Reference Photo

Painting

SLIDE 47

Painting | CycleGAN

Reference Photo

SLIDE 48

Painting | Our translation result

Reference Photo

SLIDE 49

Painting | Our translation result

Reference Photo

SLIDE 50

Painting | Our translation result

Reference Photo

SLIDE 51

Painting | Our translation result

Reference Photo

SLIDE 52

Questions or Comments?

SLIDE 53
SLIDE 54

Disentanglement? inter-image | intra-image

SLIDE 55

style | content

MUNIT (Huang, Liu, Belongie, Kautz, ECCV18)

SLIDE 56

Structure for each row

SLIDE 57

dark brown, uniform | light brown, spotted | white | white, black striped

Style for each column

SLIDE 58

Extracting style and structure from an image

E (encoder) → structure code, style code → G (generator)

SLIDE 59

Extracting style and structure from an image

E (encoder) → structure code, style code → G (generator)

SLIDE 60

Extracting style and structure from an image

E (encoder) → structure code, style code → G (generator)

SLIDE 61

Extracting style and structure from an image

E (encoder) → structure code, style code → G (generator)

Co-occurrence Patch-based Discriminator

SLIDE 62

Auto-encode

E → structure code, style code → G → Reconstruction

D

SLIDE 63

Auto-encode | Swap

E  E  G

Reconstruction

G

D  D
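The auto-encode and swap paths can be illustrated with a deliberately tiny toy. Here a hypothetical stand-in for E splits an image into two parts:

- a spatial "structure" map, the per-pixel deviation from the mean;
- a global "style" value, the mean intensity.

A hypothetical stand-in for G recombines them. This is not the learned encoder or generator; it only shows the data flow. Reconstruction uses both codes from one image, while swapping combines the structure code of one image with the style code of another.

```python
# Toy sketch only: toy_encoder and toy_generator are hypothetical stand-ins,
# not the learned E and G.


def toy_encoder(img):
    """Split an image into a spatial structure map and a global style value."""
    pixels = [p for row in img for p in row]
    style = sum(pixels) / len(pixels)                      # one global value
    structure = [[p - style for p in row] for row in img]  # keeps spatial layout
    return structure, style


def toy_generator(structure, style):
    """Recombine: add the global style back onto the spatial structure."""
    return [[s + style for s in row] for row in structure]


a = [[0.0, 1.0], [1.0, 0.0]]
b = [[5.0, 5.0], [5.0, 7.0]]
struct_a, style_a = toy_encoder(a)
struct_b, style_b = toy_encoder(b)

recon = toy_generator(struct_a, style_a)    # auto-encode path: same image back
swapped = toy_generator(struct_a, style_b)  # swap path: layout of a, style of b
print(recon)
print(swapped)
```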

SLIDE 64

Auto-encode | Swap

E  E  G

Reconstruction

G

D

Reference patches | Real/fake?

Patch co-occurrence discriminator D_patch | D

SLIDE 65

style structure

SLIDE 66

style structure

SLIDE 67

Patch Co-Occurrence Discriminator is a Texture Discriminator

What is Texture? An image that can be represented by first- and second-order statistics

Conjecture by Bela Julesz, 1962

Two textures that differ by first-order statistics

SLIDE 68

Patch Co-Occurrence Discriminator is a Texture Discriminator

Two textures that differ by second-order statistics

[Figure: for each texture, a co-occurrence table of the left pixel vs. the adjacent right pixel (dark/bright)]

What is Texture? An image that can be represented by first- and second-order statistics

Conjecture by Bela Julesz, 1962
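Julesz's distinction between first- and second-order statistics can be made concrete on binary images. The sketch below uses two hypothetical 4×8 textures (0 = dark, 1 = bright):

- first-order statistics are the pixel histogram;
- second-order statistics are counts of (left pixel, right neighbor) pairs, as in the co-occurrence tables on the slide.

The stripes and the checkerboard have identical histograms but different pair counts.

```python
from collections import Counter

# Sketch only: the two textures are hypothetical examples.


def first_order(img):
    """Histogram of pixel values."""
    return Counter(p for row in img for p in row)


def second_order(img):
    """Counts of (left pixel, right-adjacent pixel) pairs."""
    return Counter((row[i], row[i + 1]) for row in img for i in range(len(row) - 1))


stripes = [[0, 0, 1, 1] * 2 for _ in range(4)]                 # vertical stripes, 2 px wide
checker = [[(r + c) % 2 for c in range(8)] for r in range(4)]  # 1 px checkerboard

print(first_order(stripes) == first_order(checker))    # compare first-order statistics
print(second_order(stripes) == second_order(checker))  # compare second-order statistics
```

A discriminator that sees pairs or groups of patches from the same image can, in principle, pick up such joint statistics. A per-pixel histogram would miss them.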

SLIDE 69

Patch Co-Occurrence Discriminator is a Texture Discriminator

Two textures that differ by third-order statistics

Modeling joint probability is (almost) enough to capture texture

What is Texture? An image that can be represented by first- and second-order statistics

Conjecture by Bela Julesz, 1962

SLIDE 70

Patch Co-Occurrence Discriminator is a Texture Discriminator

D(patch, reference patches) =

SLIDE 71

Patch Co-Occurrence Discriminator is a Texture Discriminator

D(patch, reference patches) = Different Image

SLIDE 72

Patch Co-Occurrence Discriminator is a Texture Discriminator

D(patch, reference patches) =

SLIDE 73

Patch Co-Occurrence Discriminator is a Texture Discriminator

D(patch, reference patches) = Same Image

SLIDE 74

Patch Co-Occurrence Discriminator is a Texture Discriminator

D(patch, reference patches) =

SLIDE 75

Patch Co-Occurrence Discriminator is a Texture Discriminator

D(patch, reference patches) = Different Image

SLIDE 76

Patch Co-Occurrence Discriminator is a Texture Discriminator

D(patch, reference patches) =

SLIDE 77

Patch Co-Occurrence Discriminator is a Texture Discriminator

D(patch, reference patches) = Different (fake) Image

structure image

SLIDE 78

Auto-encode | Swap

E  E  G

Reconstruction

G

D

Reference patches | Real/fake?

Patch co-occurrence discriminator D_patch | D

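SLIDE 79

[Figure: patch co-occurrence discriminator architecture. A Patch Encoder (conv layers) embeds the real/fake patch and the reference patches; the reference embeddings are averaged and combined with the patch embedding (384- and 768-wide features appear around this step); dense layers of width 2048, 2048, 1024, and 1 output Real/Fake.]

Patch Encoder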
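As the diagram suggests, the reference embeddings are averaged and then concatenated with the embedding of the patch being judged (384 + 384 = 768 values). Dense layers then produce a single real/fake score.

Below is a minimal sketch of just that aggregation step, assuming the embeddings are already computed. The encoder and dense layers are omitted, and the function name is hypothetical.

```python
# Sketch only: cooccurrence_input is a hypothetical name, and the
# embeddings are toy values.


def cooccurrence_input(patch_emb, ref_embs):
    """Concatenate a patch embedding with the mean of the reference embeddings."""
    dim = len(patch_emb)
    mean_ref = [sum(e[i] for e in ref_embs) / len(ref_embs) for i in range(dim)]
    return list(patch_emb) + mean_ref


patch = [1.0, 2.0]               # embedding of the patch being judged
refs = [[0.0, 0.0], [2.0, 4.0]]  # embeddings of reference patches from one image
print(cooccurrence_input(patch, refs))

# With 384-dimensional embeddings, the dense layers would receive 768 values.
big = cooccurrence_input([0.0] * 384, [[1.0] * 384 for _ in range(4)])
print(len(big))
```

Averaging makes the combined input independent of the order and number of reference patches.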

SLIDE 80

Embedding a Real Input Image

Input | Ours | StyleGAN2 | Im2StyleGAN

Fast: a feed-forward pass of the encoder, orders of magnitude faster than the baselines.

Accurate: no prior distribution is enforced on the latent space, and spatial resolution is retained.

StyleGAN2 (Karras et al., CVPR20), Im2StyleGAN (Abdal et al., ICCV19)

SLIDE 81

Embedding a Real Input Image

Input | Ours | StyleGAN2 | Im2StyleGAN

Fast: a feed-forward pass of the encoder, orders of magnitude faster than the baselines.

Accurate: no prior distribution is enforced on the latent space, and spatial resolution is retained.

StyleGAN2 (Karras et al., CVPR20), Im2StyleGAN (Abdal et al., ICCV19)

SLIDE 82

Photorealistic and Disentangled Swapping Quality

Structure | Texture | Ours | StyleGAN2 | Im2StyleGAN | STROTSS | WCT2

StyleGAN2 (Karras et al., CVPR20), Im2StyleGAN (Abdal et al., ICCV19), STROTSS (Kolkin et al., CVPR19), WCT2 (Yoo et al., ICCV19)

SLIDE 83

Photorealistic and Disentangled Swapping Quality

Structure | Texture | Ours | StyleGAN2 | Im2StyleGAN | STROTSS | WCT2

StyleGAN2 (Karras et al., CVPR20), Im2StyleGAN (Abdal et al., ICCV19), STROTSS (Kolkin et al., CVPR19), WCT2 (Yoo et al., ICCV19)

SLIDE 84

Realism of generated images

Structure | Texture | Ours | StyleGAN2 | Im2StyleGAN | STROTSS | WCT2

SLIDE 85

Structure | Texture | Ours | StyleGAN2 | Im2StyleGAN | STROTSS | WCT2

SLIDE 86

Smooth Latent Space

Encoder → z1, z2, …, zN → average

Encoder → z1, z2, …, zN → average

z_add_snow = difference of the two averaged codes
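The averaging step can be written out directly:

1. Encode two groups of images.
2. Average each group's codes.
3. Take the difference of the two averages as an edit direction such as z_add_snow.

Adding a scaled copy of that direction to an input's code moves the code toward one group. Below is a minimal sketch, assuming plain lists as codes. The group contents, the function names, and the strength parameter alpha are hypothetical.

```python
# Sketch only: group contents, names, and alpha are hypothetical.


def mean_code(codes):
    """Element-wise average of equally sized code vectors."""
    return [sum(c[i] for c in codes) / len(codes) for i in range(len(codes[0]))]


def edit_direction(group_a, group_b):
    """Difference of the two groups' average codes: mean(a) - mean(b)."""
    return [x - y for x, y in zip(mean_code(group_a), mean_code(group_b))]


def apply_edit(z, direction, alpha):
    """Move z along direction; alpha > 0 moves toward group a, alpha < 0 away."""
    return [zi + alpha * di for zi, di in zip(z, direction)]


snowy = [[1.0, 2.0], [3.0, 4.0]]      # hypothetical codes of snowy images
not_snowy = [[0.0, 0.0], [0.0, 2.0]]  # hypothetical codes of images without snow
z_add_snow = edit_direction(snowy, not_snowy)

z_input = [0.0, 0.0]
print(apply_edit(z_input, z_add_snow, 0.5))   # a little more snow
print(apply_edit(z_input, z_add_snow, -0.5))  # a little less snow
```

Varying alpha continuously gives the "more snow / less snow" sweep, which is only meaningful if the latent space is smooth along that direction.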

SLIDE 87

Smooth Latent Space

more snow | less snow | input image

SLIDE 88

PCA on the Latent Space

Encoder → z1, z2, …, zN
A id f ai (e, , dee, ca, )

PCA → discovered edit vectors

GANSpace (Harkonen et al., 2020)
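PCA over many encoded codes, in the spirit of GANSpace, yields candidate edit directions without any labels. The sketch below finds only the top principal component, using power iteration on the covariance of the centered codes. In practice one would use a library PCA and keep several components. The codes and the iteration count are hypothetical.

```python
import math
import random

# Sketch only: the codes and iteration count are hypothetical.


def top_principal_component(codes, iters=100, seed=0):
    """Unit vector along the direction of largest variance of the codes."""
    n, dim = len(codes), len(codes[0])
    mean = [sum(c[i] for c in codes) / n for i in range(dim)]
    centered = [[c[i] - mean[i] for i in range(dim)] for c in codes]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # Multiply by the (unnormalized) covariance matrix: X^T (X v).
        proj = [sum(x[i] * v[i] for i in range(dim)) for x in centered]
        w = [sum(proj[k] * centered[k][i] for k in range(n)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical 2-D codes that vary mostly along the first axis.
codes = [[-2.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [2.0, -0.1]]
v = top_principal_component(codes)
print(v)
```

Each discovered direction can then be used like the averaged snow direction: add a scaled copy to a code and decode.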

SLIDE 89

Interactive UI

Preview of each principal axis

SLIDE 90

Structural Editing

Extract the structure code at this position

E

structure code | style code

SLIDE 91

Structural Editing

Overwrite the structure code here
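Because the structure code keeps a spatial layout, the structural edit amounts to copying the code vector from one grid position to another before decoding. Below is a minimal sketch, assuming the structure code is an H×W grid of channel vectors stored as nested lists. The function name, positions, and values are hypothetical, and the generator call is omitted.

```python
import copy

# Sketch only: overwrite_structure and the grid values are hypothetical.


def overwrite_structure(code, src, dst):
    """Return a copy of code in which position dst holds the vector from src.

    code is an H x W grid (list of rows) of channel vectors; src and dst are (row, col).
    """
    edited = copy.deepcopy(code)  # leave the original code untouched
    edited[dst[0]][dst[1]] = list(code[src[0]][src[1]])
    return edited


structure = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.5, 0.6], [0.7, 0.8]],
]
edited = overwrite_structure(structure, src=(0, 0), dst=(1, 1))
print(edited[1][1])
```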

SLIDE 92

User-Guided Portrait Painting to Photo

input → output

input → output

SLIDE 93

User-Guided Animal Face Transformation

input | same pose, different styles

SLIDE 94

PCA on the Structure Code

input bigger eyes gaze direction more smile 5 shadow

SLIDE 95

Editing Landscape Images

input

SLIDE 96

Editing Landscape Images

PCA on the style (texture) code | PCA on the structure code, with user-drawn mask | brush stroke visualization

1. remove road
2. draw mountain

UI with input image

1 2

SLIDE 97

Summary: GAN that can embed images

Auto-encode

E → structure code, style code → G → Reconstruction

D

SLIDE 98

Summary

inter-image (style) | intra-image (structure)

structure / style disentanglement

SLIDE 99

Summary: interactive user editing

SLIDE 100

Thank you!

https://taesung.me/ContrastiveUnpairedTranslation
https://taesung.me/SwappingAutoencoder