Contrastive Learning for Unpaired Image-to-Image Translation
Taesung Park Alexei A. Efros Richard Zhang Jun-Yan Zhu
UC Berkeley Adobe Research ECCV 2020
What is Unpaired Image-to-Image Translation?
cycle-consistency loss
CycleGAN (Zhu et al., ICCV17), DiscoGAN (Kim et al., ICML17), DualGAN (Yi et al., ICCV17)
Also used in MUNIT (Huang et al., ECCV18), DRIT (Lee et al., ECCV18)
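As a reminder of what the cycle-consistency loss measures, here is a minimal sketch. It assumes the usual L1 round-trip form; the two translators `G` and `F` are hypothetical toy functions, not trained networks.

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """Mean L1 distance between x and its round trip F(G(x))."""
    return float(np.mean(np.abs(F(G(x)) - x)))

# Toy stand-ins for the two translators (hypothetical, for illustration only).
G = lambda img: img * 2.0 + 1.0       # plays the role of "horse -> zebra"
F = lambda img: (img - 1.0) / 2.0     # exact inverse of G, "zebra -> horse"

x = np.random.default_rng(0).random((3, 8, 8))
perfect = cycle_consistency_loss(x, G, F)               # round trip recovers x
lossy = cycle_consistency_loss(x, G, lambda img: img)   # backward map is not an inverse
```

The loss needs a second, backward translator, which is the extra network that the contrastive approach in this talk avoids.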
[Figure: generator G maps Input (horse) to Output (zebra), with a Discriminator on the output; labels: Invariant, Sensitive]
[Figure: patch embeddings z; cosine similarities between them are divided by τ and passed to a softmax]
MoCo: He et al., CVPR20, SimCLR: Chen et al., ICML20
softmax(cosine similarities / τ), τ = 0.07
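The softmax over temperature-scaled cosine similarities can be sketched as below. This is a minimal NumPy version assuming one query, one positive and a set of negatives; the function name `info_nce` and the toy vectors are illustrative choices, and only the temperature 0.07 comes from the slide.

```python
import numpy as np

def info_nce(query, positive, negatives, tau=0.07):
    """Cross-entropy of picking the positive among (positive + negatives)."""
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    q, pos, neg = unit(query), unit(positive), unit(negatives)
    # Cosine similarities, scaled by the temperature; index 0 is the positive.
    logits = np.concatenate([[q @ pos], neg @ q]) / tau
    logits = logits - logits.max()                 # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return float(-log_prob[0])

rng = np.random.default_rng(0)
query = rng.normal(size=16)
negatives = rng.normal(size=(8, 16))
loss_good = info_nce(query, query + 0.01 * rng.normal(size=16), negatives)
loss_bad = info_nce(query, -query, negatives)
# If every candidate is identical, the softmax is uniform over 4 choices.
loss_uniform = info_nce(query, query, np.tile(query, (3, 1)))
```

A small τ sharpens the softmax, so a positive that is only slightly more similar than the negatives already yields a low loss.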
Multilayer, Patchwise Contrastive Loss
[Figure labels: G_enc, G_enc, G_dec, G]
Internal Patches, External Patches
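A patchwise loss with internal negatives can be sketched as follows, assuming feature maps of shape (C, H, W) taken from the same layer for the input and the output. For each sampled location, the positive is the input feature at the same location and the negatives are the other sampled locations of the same input image. The function name `patch_nce` and all sizes are hypothetical.

```python
import numpy as np

def patch_nce(feat_in, feat_out, num_patches=16, tau=0.07, seed=0):
    """Patchwise contrastive loss with negatives drawn from the same image."""
    C, H, W = feat_in.shape
    rng = np.random.default_rng(seed)
    idx = rng.choice(H * W, size=num_patches, replace=False)
    k = feat_in.reshape(C, -1)[:, idx].T      # (K, C) keys from the input
    q = feat_out.reshape(C, -1)[:, idx].T     # (K, C) queries from the output
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    logits = q @ k.T / tau                    # (K, K); diagonal holds the positives
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

feat = np.random.default_rng(1).normal(size=(32, 6, 6))
aligned = patch_nce(feat, feat)                            # output keeps the layout
misaligned = patch_nce(feat, feat[:, ::-1, ::-1].copy())   # layout flipped
```

A multilayer version would call this once per chosen encoder layer and sum the results.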
MoCo (He et al., CVPR20) and SimCLR (Chen et al., ICML20) use a large set of external images as negative samples
External patches make things worse
Texture Synthesis by Non-parametric Sampling (Efros & Leung, ICCV99; Efros & Freeman, SIGGRAPH01)
[Figure labels: input, internal patches, external patches]
Mode Collapse!
Normally, Contrastive Loss between X and G(X)
Identity loss regularization: Contrastive Loss between Y and G(Y)
DTN (Taigman et al., ICLR17), CycleGAN (Zhu et al., ICCV17)
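One way to write the two contrastive terms together is sketched below. The weights λ_X and λ_Y and the name of the adversarial term are assumed notation; the slides only state which image pairs each contrastive loss compares.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{GAN}}
  + \lambda_X \, \mathcal{L}_{\text{NCE}}\bigl(X, G(X)\bigr)
  + \lambda_Y \, \mathcal{L}_{\text{NCE}}\bigl(Y, G(Y)\bigr)
```

Setting λ_Y = 0 drops the identity regularization and leaves only the contrastive loss between X and G(X).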
Contrastive Unpaired Translation
Contrastive Loss; Identity Loss Regularization
Flexible, Faster than CycleGAN; Conservative, Even Faster than CUT
[Plot: Training time (sec/iter, lower is better) for CycleGAN, CUT, FastCUT, MUNIT, DRIT, DistanceGAN, Self-DistGAN, GcGAN; annotation: < 0.5x]
[Figure: qualitative comparison; columns: Input, CUT, FastCUT, CycleGAN, MUNIT, DRIT, DistanceGAN, GcGAN]
detected pixels: Source training set: horse 17.9%; Target training set: zebra 36.8%
[Figure: Input, CUT, FastCUT, CycleGAN; detected pixels in outputs: zebra 30.8%, zebra 25.9%, zebra 19.1%]
Cat to Dog, Yosemite Summer to Winter, Apple to Orange, GTA to Cityscapes, Paris to Burano
[Plot: CycleGAN, MUNIT, DRIT, DistanceGAN, SelfDistanceGAN, GCGAN, CUT, FastCUT compared on horse2zebra, cat2dog, cityscapes; axis: mean Intersection-over-Union (%), higher is better]
I2I Model Segmenter
% match
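The "% match" between the segmenter's prediction on a translated image and the ground-truth labels can be scored with mean Intersection-over-Union. A minimal sketch follows; the helper name `mean_iou` and the tiny label maps are illustrative, and the segmenter itself is not modeled.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union in percent over classes present in either map."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return 100.0 * float(np.mean(ious))

target = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
pred = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1]])   # one pixel of class 0 mislabeled as class 1
score = mean_iou(pred, target, num_classes=2)
```

Here class 0 scores 3/4 and class 1 scores 4/5, so the mean is 77.5%.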
Internal contrastive loss is well-suited for single image translation.
Also see InGAN (Shocher et al., ICCV19), SinGAN (Shaham et al., ICCV19)
[Figure: single-image translation with generator G; labels: C M, Reference photo]
[Figure: painting-to-photo comparison; panels: Painting, Reference Photo, Gatys et al. (CVPR16), STROTSS (Kolkin et al., CVPR19), WCT2 (Yoo et al., ICCV19), CycleGAN, Our translation result]
MUNIT (Huang, Liu, Belongie, Kautz, ECCV18)
Structure for each row
dark brown, uniform; light brown, spotted; white; white, black striped
Style for each column
Extracting style and structure from an image
[Figure labels: style code, structure code, E, G]
Co-occurrence Patch-based Discriminator
[Figure: Auto-encode and Swap paths. An image is split into a structure code and a style code; the Reconstruction and the swapped output are judged by discriminator D. Reference patches feed the Patch co-occurrence discriminator D_patch, which answers Real/fake?]
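The auto-encode and swap paths can be illustrated with a deliberately simple toy. This is not the learned networks: the "style" here is just the per-channel mean and the "structure" is the zero-mean residual, chosen only so that reconstruction and swapping can be checked exactly. The names `encode` and `generate` are hypothetical.

```python
import numpy as np

def encode(img):
    """Toy encoder: per-channel mean as 'style', zero-mean residual as 'structure'."""
    style = img.mean(axis=(1, 2), keepdims=True)   # (C, 1, 1), no spatial layout
    structure = img - style                        # (C, H, W), keeps spatial layout
    return structure, style

def generate(structure, style):
    """Toy generator: recombine the two codes."""
    return structure + style

rng = np.random.default_rng(0)
a, b = rng.random((3, 4, 4)), rng.random((3, 4, 4))
s_a, t_a = encode(a)
s_b, t_b = encode(b)

recon = generate(s_a, t_a)     # auto-encode: both codes from the same image
hybrid = generate(s_a, t_b)    # swap: structure of a, style of b
```

In the toy, `hybrid` keeps the spatial pattern of `a` while taking the channel means of `b`, which is the behavior the swap path is meant to encourage in the real model.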
Patch Co-Occurrence Discriminator is a Texture Discriminator
What is Texture? An image that can be represented by first and second-order statistics.
Conjecture by Bela Julesz, 1962
Two textures that differ by first-order statistics
Two textures that differ by second-order statistics
[Figure labels: left adjacent pixel, right pixel; dark, bright; bright, dark]
Two textures that differ by third-order statistics
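To make first-order versus second-order statistics concrete, the sketch below builds two binary textures with the same fraction of bright pixels but different co-occurrence of left and right adjacent pixels. The textures and function names are illustrative choices.

```python
import numpy as np

def first_order(tex):
    """Fraction of dark (0) and bright (1) pixels."""
    return np.bincount(tex.ravel(), minlength=2) / tex.size

def second_order(tex):
    """Co-occurrence frequencies of (left pixel, right adjacent pixel) pairs."""
    left, right = tex[:, :-1].ravel(), tex[:, 1:].ravel()
    counts = np.zeros((2, 2))
    np.add.at(counts, (left, right), 1)
    return counts / counts.sum()

stripes_v = np.tile(np.array([0, 1]), (8, 4))       # 8x8, columns alternate
stripes_h = np.tile(np.array([[0], [1]]), (4, 8))   # 8x8, rows alternate

same_first = np.allclose(first_order(stripes_v), first_order(stripes_h))
same_second = np.allclose(second_order(stripes_v), second_order(stripes_h))
```

Both textures are half dark and half bright, yet horizontal neighbors always differ in the first and always agree in the second.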
[Figure: Auto-encode, Swap and Reconstruction with discriminator D; structure image. Reference patches and the Real/fake patch each pass through a Patch Encoder (Conv layers); the reference features are averaged, and Dense layers output Real/Fake for the Patch co-occurrence discriminator D_patch]
Embedding a Real Input Image
[Figure columns: Input, Ours, StyleGAN2, Im2StyleGAN]
Fast: feed-forward pass of the encoder. Orders of magnitude faster than baselines.
Accurate: Prior distribution not enforced on the latent space. Spatial resolution is retained.
StyleGAN2 (Karras et al., CVPR20), Im2StyleGAN (Abdal et al., ICCV19)
Photorealistic and Disentangled Swapping Quality
[Figure columns: Structure, Texture, Ours, StyleGAN2, Im2StyleGAN, STROTSS, WCT2]
Realism of generated images
StyleGAN2 (Karras et al., CVPR20), Im2StyleGAN (Abdal et al., ICCV19), STROTSS (Kolkin et al., CVPR19), WCT2 (Yoo et al., ICCV19)
Smooth Latent Space
Two image sets: Encoder gives z1, z2, ..., zN, averaged to z̄ for each set
z_add_snow = z̄ − z̄′
[Figure: input image; more snow, less snow]
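The snow edit can be sketched as latent arithmetic. This assumes the edit vector is the difference between the averaged codes of two image sets, which is one reading of the slide; random arrays stand in for encoder outputs, and all names are hypothetical.

```python
import numpy as np

def edit_vector(codes_with, codes_without):
    """Difference between the mean codes of two image sets."""
    return np.mean(codes_with, axis=0) - np.mean(codes_without, axis=0)

def apply_edit(z, direction, strength):
    """Move a code along the edit direction; negative strength reverses the edit."""
    return z + strength * direction

rng = np.random.default_rng(0)
offset = np.array([1.0, 0.0, 0.0, 0.0])        # toy "snow" direction
codes_without = rng.normal(size=(50, 4))       # stand-in codes of snow-free images
codes_with = codes_without + offset            # stand-in codes of snowy images

z_add_snow = edit_vector(codes_with, codes_without)
z = rng.normal(size=4)
more_snow = apply_edit(z, z_add_snow, 1.0)
less_snow = apply_edit(z, z_add_snow, -1.0)
```

Because the edit is a plain vector addition, the strength can be varied continuously, which is what a "more snow / less snow" slider relies on.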
PCA on the Latent Space
Encoder: z1, z2, ..., zN
discovered edit vectors
GANSpace (Harkonen et al., 2020)
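Discovering edit vectors by PCA can be sketched with an SVD of the centered codes. The codes below are synthetic stand-ins for encoder outputs, and `pca_directions` is a hypothetical helper, not code from GANSpace.

```python
import numpy as np

def pca_directions(codes, k):
    """Top-k principal directions (rows) of a set of latent codes."""
    centered = codes - codes.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k], singular_values[:k]

rng = np.random.default_rng(0)
# Synthetic codes whose variance is dominated by the first coordinate.
codes = rng.normal(size=(200, 5)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0])
directions, strengths = pca_directions(codes, k=2)

z = codes[0]
edited = z + 3.0 * directions[0]   # move one code along the first discovered direction
```

Each row of `directions` is a unit vector, so the scalar multiplier directly controls how far the code moves.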
Interactive UI
Preview
Structural Editing
Extract the structure code at this position
Overwrite the structure code here
[Figure labels: structure code, style code]
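The extract-and-overwrite edit can be sketched as an array copy, assuming the structure code is a tensor of shape (C, H, W) with spatial dimensions. The shape and the helper name `copy_structure` are assumptions for illustration.

```python
import numpy as np

def copy_structure(structure, src, dst):
    """Copy the structure-code vector at src=(row, col) to dst=(row, col)."""
    edited = structure.copy()                      # leave the original code intact
    edited[:, dst[0], dst[1]] = structure[:, src[0], src[1]]
    return edited

structure = np.random.default_rng(0).normal(size=(8, 4, 4))
edited = copy_structure(structure, src=(0, 0), dst=(2, 3))
# Passing `edited` and the unchanged style code to the generator would render the edit.
```

Only the code at the destination changes; every other position and the style code stay as they were.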
User-Guided Portrait Painting to Photo
[Figure: input]
User-Guided Animal Face Transformation
[Figure: input; same pose, different styles]
PCA on the Structure Code
[Figure: input, bigger eyes, gaze direction, more smile, 5 shadow]
Editing Landscape Images
[Figure: input]
[Figure labels: PCA with the style (texture) code; PCA on the structure code, with user-drawn mask; brush stroke visualization; UI with input image]
Summary
GAN that can embed images
[Figure: Auto-encode, Reconstruction; structure code, style code; discriminator D]
structure / style disentanglement: inter-image (style), intra-image (structure)
interactive user editing
https://taesung.me/ContrastiveUnpairedTranslation
https://taesung.me/SwappingAutoencoder