Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language - PowerPoint PPT Presentation



SLIDE 1

Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language

Seonghyeon Nam, Yunji Kim, Seon Joo Kim

  • Dept. of Computer Science, Yonsei University

Seoul, South Korea

SLIDE 2

Manipulating Images with Natural Language

Icons made by Freepik from www.flaticon.com

SLIDE 3

Manipulating Images with Natural Language


This small bird has a blue crown and white belly.

SLIDE 4

Manipulating Images with Natural Language


This small bird has a blue crown and white belly. Processing... Here it is.

SLIDE 5

Related Work

  • Existing methods rely heavily on sentence-level embedding vectors
  • They fail to preserve text-irrelevant content (e.g., the background)
  • Such coarse multi-modal modeling is insufficient for disentangling visual attributes

(Figure: original image vs. results from [Reed et al., 2016], [Dong et al., 2017], and ours)

SLIDE 6

Contribution

(Figure: original image vs. results from [Reed et al., 2016], [Dong et al., 2017], and ours)

  • Our key idea is word-level local discriminators for fine-grained training
  • Our method effectively changes visual attributes while preserving text-irrelevant content

SLIDE 7

Overview of TAGAN

This flower has petals that are yellow and are very stringy.

SLIDE 8

Generator

This flower has petals that are yellow and are very stringy.

To preserve the original contents, we add a reconstruction loss:
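The equation itself did not survive extraction from the slide. In the TAGAN formulation, reconstruction is an L1 penalty between the input image and the generator output when conditioned on the image's own (matching) text, so that the generator learns to change nothing when the text already fits; symbols here follow the usual convention ($x$ is the input image, $t$ its matching text):

```latex
\mathcal{L}_{rec} = \left\lVert\, x - G(x, t) \,\right\rVert_{1}
```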

SLIDE 9

Discriminator

This flower has petals that are yellow and are very stringy.

The discriminator consists of two parts:

  • 1. Unconditional discriminator → makes the image realistic
  • 2. Text-adaptive discriminator → makes the image match the text
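As a rough sketch, the two objectives combine into a single discriminator loss. The function below is a minimal numpy illustration only; the score names, the `lam` weighting, and the exact negative-log form are assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, d_match, d_mismatch, lam=1.0):
    """Combine the unconditional and text-adaptive objectives.

    d_real, d_fake:      unconditional scores for a real and a generated image
    d_match, d_mismatch: text-adaptive scores for (real image, matching text)
                         and (real image, mismatching text)
    """
    eps = 1e-8  # numerical safety inside the logs
    # 1. Unconditional term: push real images toward 1, generated toward 0
    l_uncond = -np.log(d_real + eps) - np.log(1.0 - d_fake + eps)
    # 2. Text-adaptive term: reward matching text, penalize mismatching text
    l_cond = -np.log(d_match + eps) - np.log(1.0 - d_mismatch + eps)
    return l_uncond + lam * l_cond
```

With near-perfect scores (e.g. `d_real=0.99, d_fake=0.01, d_match=0.99, d_mismatch=0.01`) the loss is close to zero, and it grows as any of the four scores degrades.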

SLIDE 10

Text-Adaptive Discriminator

  • 1. Compute local discriminator scores

(Diagram: a text encoder produces word features w; an image encoder with global average pooling produces image feature v; each word drives a local discriminator on v.)

SLIDE 11

Text-Adaptive Discriminator

  • 1. Compute local discriminator scores
  • 2. Compute text/image attentions

(Notation: a softmax weight for each word i, and a softmax weight for each word i at image feature level j)

SLIDE 12

Text-Adaptive Discriminator

  • 1. Compute local discriminator scores
  • 2. Compute text/image attentions
  • 3. Aggregate the scores with attentions

(Notation: a softmax weight for each word i, and a softmax weight for each word i at image feature level j)
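The three steps above can be sketched in a few lines of numpy. This is an illustrative simplification under stated assumptions: each word's local discriminator is reduced to a sigmoid of a dot product, a single feature level stands in for the per-level weights, and the aggregation is a weighted geometric mean (product of local scores raised to their attention weights):

```python
import numpy as np

def text_adaptive_score(word_feats, img_feat, sent_feat):
    """word_feats: (n_words, d) word embeddings
    img_feat:   (d,) pooled image feature (v in the slides)
    sent_feat:  (d,) sentence embedding used to attend over words
    """
    # Step 1: local discriminator score per word
    # (simplified here to sigmoid(w_i . v); the paper derives a
    #  classifier per word rather than a plain dot product)
    local = 1.0 / (1.0 + np.exp(-(word_feats @ img_feat)))
    # Step 2: softmax attention over words (numerically stabilized)
    sim = word_feats @ sent_feat
    alpha = np.exp(sim - sim.max())
    alpha /= alpha.sum()
    # Step 3: aggregate the local scores with the attention weights
    return float(np.prod(local ** alpha))
```

Because every local score lies in (0, 1) and the attention weights sum to one, the aggregated score also lies in (0, 1), and words with low attention barely affect the result.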

SLIDE 13

Manipulation Results on CUB-200

SLIDE 14

Manipulation Results on Oxford-102

(Example category: Gazania; reference photo from Wikipedia)

SLIDE 15

Qualitative Comparison

(Figure: original image vs. results from [Dong et al., 2017], [Xu et al., 2018], and ours)

SLIDE 16

Conclusion

  • We propose a Text-Adaptive Generative Adversarial Network (TAGAN)
  • Our method disentangles and manipulates fine-grained visual attributes
  • Our method outperforms existing methods on CUB-200 and Oxford-102

https://github.com/woozzu/tagan

Please visit our poster (#126) for more information.