Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language - PowerPoint PPT Presentation



SLIDE 1

Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language

Seonghyeon Nam, Yunji Kim, Seon Joo Kim

  • Dept. of Computer Science, Yonsei University

Seoul, South Korea

SLIDE 2

Manipulating Images with Natural Language

Icons made by Freepik from www.flaticon.com

SLIDE 3

Manipulating Images with Natural Language


This small bird has a blue crown and white belly.

SLIDE 4

Manipulating Images with Natural Language


This small bird has a blue crown and white belly. Processing... Here it is.

SLIDE 5

Related Work

  • Existing methods rely heavily on sentence-level embedding vectors
  • They fail to preserve text-irrelevant content (e.g., the background)
  • Such coarse multi-modal modeling is insufficient for disentangling visual attributes

(Figure: original image vs. results from [Reed et al., 2016], [Dong et al., 2017], and ours)

SLIDE 6

Contribution

(Figure: original image vs. results from [Reed et al., 2016], [Dong et al., 2017], and ours)

  • Our key idea is word-level local discriminators for fine-grained training
  • Our method effectively changes visual attributes while preserving text-irrelevant content

SLIDE 7

Overview of TAGAN

This flower has petals that are yellow and are very stringy.

SLIDE 8

Generator

This flower has petals that are yellow and are very stringy.

To preserve the original contents, we add a reconstruction loss:
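The equation itself did not survive extraction from the slide. In the TAGAN formulation, reconstruction is an L1 penalty between the input image and the generator output when conditioned on the image's own (matching) text, so that the generator learns to change nothing when the text already fits; symbols here follow the usual convention ($x$ is the input image, $t$ its matching text):

```latex
\mathcal{L}_{rec} = \left\lVert\, x - G(x, t) \,\right\rVert_{1}
```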

SLIDE 9

Discriminator

This flower has petals that are yellow and are very stringy.

The discriminator consists of two parts:

  • 1. Unconditional discriminator → makes the image realistic
  • 2. Text-adaptive discriminator → makes the image match the text
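As a rough sketch, the two objectives combine into a single discriminator loss. The function below is a minimal numpy illustration only; the score names, the `lam` weighting, and the exact negative-log form are assumptions for exposition, not the paper's exact formulation:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, d_match, d_mismatch, lam=1.0):
    """Combine the unconditional and text-adaptive objectives.

    d_real, d_fake:      unconditional scores for a real and a generated image
    d_match, d_mismatch: text-adaptive scores for (real image, matching text)
                         and (real image, mismatching text)
    """
    eps = 1e-8  # numerical safety inside the logs
    # 1. Unconditional term: push real images toward 1, generated toward 0
    l_uncond = -np.log(d_real + eps) - np.log(1.0 - d_fake + eps)
    # 2. Text-adaptive term: reward matching text, penalize mismatching text
    l_cond = -np.log(d_match + eps) - np.log(1.0 - d_mismatch + eps)
    return l_uncond + lam * l_cond
```

With near-perfect scores (e.g. `d_real=0.99, d_fake=0.01, d_match=0.99, d_mismatch=0.01`) the loss is close to zero, and it grows as any of the four scores degrades.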

SLIDE 10

Text-Adaptive Discriminator

  • 1. Compute local discriminator scores

(Diagram: a text encoder produces word features w; an image encoder with global average pooling produces image feature v; each word drives a local discriminator on v.)

SLIDE 11

Text-Adaptive Discriminator

  • 1. Compute local discriminator scores
  • 2. Compute text/image attentions

(Notation: a softmax weight for each word i, and a softmax weight for each word i at image feature level j)

SLIDE 12

Text-Adaptive Discriminator

  • 1. Compute local discriminator scores
  • 2. Compute text/image attentions
  • 3. Aggregate the scores with attentions

(Notation: a softmax weight for each word i, and a softmax weight for each word i at image feature level j)
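The three steps above can be sketched in a few lines of numpy. This is an illustrative simplification under stated assumptions: each word's local discriminator is reduced to a sigmoid of a dot product, a single feature level stands in for the per-level weights, and the aggregation is a weighted geometric mean (product of local scores raised to their attention weights):

```python
import numpy as np

def text_adaptive_score(word_feats, img_feat, sent_feat):
    """word_feats: (n_words, d) word embeddings
    img_feat:   (d,) pooled image feature (v in the slides)
    sent_feat:  (d,) sentence embedding used to attend over words
    """
    # Step 1: local discriminator score per word
    # (simplified here to sigmoid(w_i . v); the paper derives a
    #  classifier per word rather than a plain dot product)
    local = 1.0 / (1.0 + np.exp(-(word_feats @ img_feat)))
    # Step 2: softmax attention over words (numerically stabilized)
    sim = word_feats @ sent_feat
    alpha = np.exp(sim - sim.max())
    alpha /= alpha.sum()
    # Step 3: aggregate the local scores with the attention weights
    return float(np.prod(local ** alpha))
```

Because every local score lies in (0, 1) and the attention weights sum to one, the aggregated score also lies in (0, 1), and words with low attention barely affect the result.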

SLIDE 13

Manipulation Results on CUB-200

SLIDE 14

Manipulation Results on Oxford-102

(Example category: Gazania; reference photo from Wikipedia)

SLIDE 15

Qualitative Comparison

(Figure: original image vs. results from [Dong et al., 2017], [Xu et al., 2018], and ours)

SLIDE 16

Conclusion

  • We propose a Text-Adaptive Generative Adversarial Network (TAGAN)
  • Our method disentangles and manipulates fine-grained visual attributes
  • Our method outperforms existing methods on CUB-200 and Oxford-102

https://github.com/woozzu/tagan

Please visit our poster (#126) for more information.