Language to Image Generation

"Generate a bird with wings that are blue and a red belly"
"Generate a bird with wings that are black and a white belly"
"Generate a bird with wings that are red and a yellow belly"
ARTIFICIAL IMAGINATION

Language-to-Image generation with GANs

We propose AttnGAN to improve text-to-image generation.

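[Figure: AttnGAN architecture. Attentional Generative Network: a Text Encoder produces a sentence feature and word features; the conditioning module $F^{ca}$ and noise $z \sim N(0, I)$ feed $F_0$, followed by attention models $F_1^{attn}$ and $F_2^{attn}$ with stages $F_1$ and $F_2$, giving hidden features $h_0$, $h_1$, $h_2$; generators $G_0$, $G_1$, $G_2$ output 64x64x3, 128x128x3 and 256x256x3 images, judged by discriminators $D_0$, $D_1$, $D_2$. Deep Attentional Multimodal Similarity Model (DAMSM): an Image Encoder extracts local image features that are compared with the word features. Example caption: "this bird is red with white and has a very short beak". Legend: Residual, FC with reshape, Upsampling, Joining, Conv3x3.]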

Attentional Generative Network:
- Takes multi-level conditions (the global sentence feature and fine-grained word features) as input.
- Generates images from low to high resolution over multiple stages.

In the first stage:
▪ based on the sentence feature, generator $G_0$ generates an image with basic color and shape;
▪ the hidden features $h_0$ are decoded from the sentence feature.
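A minimal sketch of the first-stage input, in pure Python. The architecture figure labels a conditioning module $F^{ca}$ and noise $z \sim N(0, I)$; this sketch assumes $F^{ca}$ samples $c = \mu + \sigma \odot \epsilon$ (the reparameterization used for conditioning augmentation in StackGAN), and the `mu` / `log_sigma` inputs are hypothetical stand-ins for the outputs of a learned layer on the sentence feature. The decoder $F_0$ that turns this vector into $h_0$ is not modelled.

```python
import math
import random


def conditioning_augmentation(mu, log_sigma, rng):
    """Sample c = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick).

    mu and log_sigma are hypothetical placeholders for the outputs of a learned
    layer applied to the sentence feature.
    """
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def first_stage_input(mu, log_sigma, noise_dim, rng):
    """Join the conditioning vector c with noise z ~ N(0, I).

    This joint vector is what F_0 would decode into the hidden features h_0
    (F_0 itself is not modelled here).
    """
    c = conditioning_augmentation(mu, log_sigma, rng)
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return c + z


rng = random.Random(0)
x = first_stage_input([0.1, -0.2, 0.3], [-1.0, -1.0, -1.0], noise_dim=4, rng=rng)
print(len(x))  # 7
```

Sampling $c$ rather than using the sentence feature directly gives the generator slightly different conditions for the same caption, which is the usual motivation for conditioning augmentation.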

In the following stages, attention models are added:
▪ for each region feature of the previously generated image, compute its word-context vector;
▪ concatenate the previous image region features (e.g., $h_0$) with the word-context vectors to generate an image at higher resolution.
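The region-to-word attention in these stages can be sketched in pure Python. Assumptions: each region feature attends over all word features with a dot-product softmax, and the word features are taken to be already projected into the region-feature dimension (the projection layer is omitted). Lists of lists stand in for the tensors a real implementation would use.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def word_context_vectors(regions, words):
    """Return one word-context vector per image region.

    regions: region features of the previous hidden state (e.g. h_0), one per region.
    words: word features, assumed already projected to the same dimension.
    """
    dim = len(words[0])
    contexts = []
    for h in regions:
        weights = softmax([dot(h, e) for e in words])  # attention over words
        contexts.append([sum(w * e[d] for w, e in zip(weights, words)) for d in range(dim)])
    return contexts


def join(regions, contexts):
    """Concatenate each region feature with its word-context vector."""
    return [h + c for h, c in zip(regions, contexts)]


regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[5.0, 0.0], [0.0, 5.0]]
joined = join(regions, word_context_vectors(regions, words))
# Each joined vector has 2 region dims + 2 context dims; region 0 attends mostly to word 0.
```

The joined features would then pass through the next stage's layers to produce the higher-resolution hidden features.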

The conditional GAN loss:

$$\mathcal{L}_{GAN} = V(D_0, G_0) + V(D_1, G_1) + V(D_2, G_2)$$
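A minimal sketch of this summed loss over the three stages, assuming each $V(D_i, G_i)$ is the standard conditional-GAN value function $\mathbb{E}[\log D_i(x)] + \mathbb{E}[\log(1 - D_i(\hat{x}_i))]$ estimated from a batch of discriminator outputs. Any extra unconditional terms or non-saturating generator losses a real implementation might use are not modelled; the discriminators maximize this quantity and the generators minimize it.

```python
import math


def value_function(d_real, d_fake):
    """Empirical V(D, G) = mean log D(x) + mean log(1 - D(G(z))).

    d_real: discriminator probabilities on real images paired with their text.
    d_fake: discriminator probabilities on generated images for the same text.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def gan_loss(stages):
    """L_GAN = V(D_0, G_0) + V(D_1, G_1) + V(D_2, G_2); stages holds (d_real, d_fake) pairs."""
    return sum(value_function(d_real, d_fake) for d_real, d_fake in stages)


# At equilibrium (D outputs 0.5 everywhere), each stage contributes 2 * log(0.5).
print(gan_loss([([0.5], [0.5])] * 3))  # ≈ -4.159
```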


Deep Attentional Multimodal Similarity Model (DAMSM):
❖ The text encoder (LSTM) extracts word features $e_1, e_2, \dots, e_T$.
❖ The image encoder (CNN) extracts image region features $v_1, v_2, \dots, v_N$, where $N = 289$ (a 17 × 17 grid of regions).
❖ Attention mechanism: for the $i$-th word, compute its region-context vector $c_i$, an attention-weighted sum of the region features $v_j$ whose weights are derived from $s_{i,j}$, the dot product between the features of the $i$-th word and the $j$-th image region.
❖ Compute the word-image similarity score $R(c_i, e_i)$ as the cosine similarity between $c_i$ and $e_i$.
❖ Compute the similarity score between the sentence ($D$) and the image ($Q$) by aggregating this fine-grained word-region information.
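The DAMSM scoring steps can be sketched in pure Python, with $s_{i,j}$ the dot product between word feature $e_i$ and region feature $v_j$. This is a simplified sketch under stated assumptions: the softmax over $s_{i,j}$ is sharpened by a smoothing factor `gamma1`, the word scores are aggregated into the sentence-image score $R(Q, D)$ with a log-sum-exp sharpened by `gamma2`, and the default values of both are illustrative; any additional normalization of $s_{i,j}$ used in practice is omitted.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def region_context(word, regions, gamma1):
    """c_i = sum_j alpha_ij * v_j, with alpha_i = softmax_j(gamma1 * s_ij) and s_ij = e_i . v_j."""
    alphas = softmax([gamma1 * dot(word, v) for v in regions])
    dim = len(regions[0])
    return [sum(a * v[d] for a, v in zip(alphas, regions)) for d in range(dim)]


def word_score(word, regions, gamma1=4.0):
    """R(c_i, e_i): cosine similarity between the region-context vector and the word."""
    return cosine(region_context(word, regions, gamma1), word)


def image_text_similarity(words, regions, gamma1=4.0, gamma2=5.0):
    """R(Q, D) = (1 / gamma2) * log(sum_i exp(gamma2 * R(c_i, e_i)))."""
    scores = [word_score(e, regions, gamma1) for e in words]
    return math.log(sum(math.exp(gamma2 * r) for r in scores)) / gamma2


regions = [[1.0, 0.0], [0.0, 1.0]]  # toy v_1, v_2
matching = image_text_similarity([[1.0, 0.0], [0.0, 1.0]], regions)
mismatched = image_text_similarity([[-1.0, 0.0]], regions)
# matching > mismatched: the matching words find regions that point the same way.
```

The log-sum-exp acts as a soft maximum, so a larger `gamma2` lets the best-matched words dominate the sentence-image score.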

❖ The DAMSM loss maximizes the similarity score between each image and its corresponding ground-truth text description, summed over the $M$ training pairs.
❖ The DAMSM loss thus provides a fine-grained word-region matching loss for training the generator.
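A minimal sketch of a batch matching loss of this kind, assuming the formulation in which each image $Q_i$ must pick out its own description $D_i$ from the $M$ descriptions in the batch (and vice versa) through a softmax over $\gamma_3$-scaled similarities $R(Q_i, D_j)$. `gamma3` and its default are illustrative assumptions, and only a single symmetric pair of terms is shown.

```python
import math


def log_softmax_at(scores, index):
    """log(exp(scores[index]) / sum_k exp(scores[k])), computed stably."""
    m = max(scores)
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[index] - log_total


def matching_loss(sim, gamma3=10.0):
    """Symmetric matching loss over a batch of M image-text pairs.

    sim[i][j] is R(Q_i, D_j), the similarity between image i and description j;
    the diagonal holds the ground-truth pairs.
    """
    m = len(sim)
    scaled = [[gamma3 * s for s in row] for row in sim]
    # -sum_i log P(D_i | Q_i): softmax over each row (descriptions for image i).
    loss_d_given_q = -sum(log_softmax_at(scaled[i], i) for i in range(m))
    # -sum_i log P(Q_i | D_i): softmax over each column (images for description i).
    columns = [[scaled[i][j] for i in range(m)] for j in range(m)]
    loss_q_given_d = -sum(log_softmax_at(columns[j], j) for j in range(m))
    return loss_d_given_q + loss_q_given_d


aligned = [[0.9, 0.1], [0.1, 0.9]]   # ground-truth pairs score highest
shuffled = [[0.1, 0.9], [0.9, 0.1]]  # ground-truth pairs score lowest
print(matching_loss(aligned) < matching_loss(shuffled))  # True
```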

The final objective function:

$$\mathcal{L} = \mathcal{L}_{GAN} + \lambda \mathcal{L}_{DAMSM}$$

where $\lambda$ balances the two terms.

Datasets

| | CUB-2011 train | CUB-2011 test | MS-COCO train | MS-COCO test |
|---|---|---|---|---|
| # samples | 8,855 | 2,933 | 80,000 | 40,000 |
| captions per image | 10 | 10 | 5 | 5 |

- On the CUB dataset, our AttnGAN achieves an inception score of 4.36, significantly outperforming the previous best of 3.82.
- On the COCO dataset, our AttnGAN raises the best reported inception score from 9.58 to 25.89, a relative improvement of 170.25%.

| Dataset | GAN-INT-CLS [1] | GAWWN [2] | StackGAN [3] | StackGAN-v2 [4] | PPGN [5] | Our AttnGAN |
|---|---|---|---|---|---|---|
| CUB | 2.88 ± .04 | 3.62 ± .07 | 3.70 ± .04 | 3.82 ± .06 | n/a | 4.36 ± .03 |
| COCO | 7.88 ± .07 | n/a | 8.45 ± .03 | n/a | 9.58 ± .21 | 25.89 ± .47 |

[1] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text-to-image synthesis. In ICML, 2016.
[2] S. Reed, Z. Akata, S. Mohan, S. Tenka, B. Schiele, and H. Lee. Learning what and where to draw. In NIPS, 2016.
[3] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV, 2017.
[4] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas. StackGAN++: Realistic image synthesis with stacked generative adversarial networks. arXiv:1710.10916, 2017.
[5] A. Nguyen, J. Yosinski, Y. Bengio, A. Dosovitskiy, and J. Clune. Plug & play generative networks: Conditional iterative generation of images in latent space. In CVPR, 2017.

A higher inception score means better image quality and diversity; a higher R-precision rate means the generated images are better conditioned on their text descriptions.

[Figure: inception score and corresponding R-precision rate of AttnGAN models on CUB.]
- "AttnGAN1" has one attention model and generates images at 128x128 resolution;
- "AttnGAN2" has two attention models and generates images at 256x256 resolution.
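Both metrics can be sketched in a few lines of pure Python. The inception score follows its usual definition, $\exp(\mathbb{E}_x[\mathrm{KL}(p(y \mid x) \,\|\, p(y))])$, taking per-image class probabilities from a pretrained classifier as given. For R-precision, the sketch assumes an evaluation protocol in which each generated image is scored against its ground-truth description plus a set of mismatched descriptions; with a single relevant description, R = 1 and the metric reduces to top-1 retrieval accuracy. Putting the ground truth at index 0 is a convention of this sketch, not of any particular codebase.

```python
import math


def inception_score(probs):
    """exp(mean_x KL(p(y|x) || p(y))) from per-image class probabilities.

    probs[n][k] is p(y = k | x_n) for generated image n, e.g. a pretrained
    classifier's softmax output; p(y) is estimated by averaging over images.
    """
    n, k = len(probs), len(probs[0])
    marginal = [sum(row[c] for row in probs) / n for c in range(k)]
    mean_kl = sum(
        sum(p * math.log(p / marginal[c]) for c, p in enumerate(row) if p > 0)
        for row in probs
    ) / n
    return math.exp(mean_kl)


def r_precision(candidate_scores):
    """Fraction of generated images whose ground-truth description ranks first.

    Each entry lists image-text similarity scores for one generated image, with
    the ground-truth description at index 0 and mismatched ones after it.
    Ties count in favour of the ground truth.
    """
    hits = sum(1 for scores in candidate_scores if all(s <= scores[0] for s in scores[1:]))
    return hits / len(candidate_scores)


# Confident, diverse predictions give a high score; identical ones give the minimum of 1.
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))  # ≈ 2.0
print(inception_score([[0.5, 0.5], [0.5, 0.5]]))  # 1.0
print(r_precision([[0.9, 0.1, 0.2], [0.3, 0.8, 0.1]]))  # 0.5
```

The two toy inception-score calls show why the metric rewards both quality (confident per-image predictions) and diversity (a spread-out marginal over classes).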

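[Figure: word attention maps for the caption "this bird is red with white and has a very short beak", shown for the words "this", "bird", "red", "white", "a", "very", "short", "beak".]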

[Figure: example generated for the COCO caption "A fruit stand display with bananas and kiwi."]
