Generative Adversarial Network and its Applications to Human Language Processing


  1. Generative Adversarial Network and its Applications to Human Language Processing 李宏毅 Hung-yi Lee Full version of the tutorial

  2. Outline Part I: General Introduction of Generative Adversarial Network (GAN) Part II: Applications to Natural Language Processing Part III: Applications to Speech Processing

  3. All Kinds of GAN … https://github.com/hindupuravinash/the-gan-zoo GAN, ACGAN, BGAN, CGAN, DCGAN, EBGAN, fGAN, GoGAN, …… "It is a wise choice to attend this tutorial." Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed, "Variational Approaches for Auto-Encoding Generative Adversarial Networks", arXiv, 2017

  4. Generative Adversarial Network (GAN) • Anime face generation as example: the Generator maps a vector to a high-dimensional image; the Discriminator maps an image to a score. Larger score means real, smaller score means fake.

  5. Algorithm • Initialize generator G and discriminator D • In each training iteration: Step 1: Fix generator G, and update discriminator D. Sample real objects from the database (target score 1) and feed randomly sampled vectors through the fixed G to get generated objects (target score 0). The discriminator learns to assign high scores to real objects and low scores to generated objects.

  6. Algorithm • Initialize generator G and discriminator D • In each training iteration: Step 2: Fix discriminator D, and update generator G. The generator learns to "fool" the discriminator: a vector goes through the Generator's hidden layers into the fixed Discriminator, which outputs a score (e.g. 0.13); G is updated by backpropagation through the combined large network so that this score becomes large.

  7. Algorithm • Initialize generator G and discriminator D • In each training iteration: D learning: sample some real objects (label 1) and generate some fake objects from sampled vectors through the fixed G (label 0), then update D. G learning: sample vectors, pass them through G into the fixed D, and update G so the generated images score 1.
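The two-step alternation on slides 5-7 can be sketched end to end. Below is a minimal, self-contained toy, not from the slides: a 1-D "generator" G(z) = a*z + c, a logistic "discriminator" D(x) = sigmoid(w*x + b), and hand-derived gradients; all names, data, and hyperparameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, c = 1.0, 0.0   # generator G(z) = a*z + c
w, b = 0.1, 0.0   # discriminator D(x) = sigmoid(w*x + b)
lr = 0.05

for step in range(2000):
    # Step 1: fix G, update D. Real objects should score high, fakes low.
    x_real = 3.0 + 0.5 * rng.standard_normal(32)   # "database" samples
    x_fake = a * rng.standard_normal(32) + c       # G applied to sampled vectors
    g_real = sigmoid(w * x_real + b) - 1.0         # d/dlogit of -log D(real)
    g_fake = sigmoid(w * x_fake + b)               # d/dlogit of -log(1 - D(fake))
    w -= lr * (np.mean(g_real * x_real) + np.mean(g_fake * x_fake))
    b -= lr * (np.mean(g_real) + np.mean(g_fake))

    # Step 2: fix D, update G by backpropagating through D so fakes score high.
    z = rng.standard_normal(32)
    x_fake = a * z + c
    g_x = (sigmoid(w * x_fake + b) - 1.0) * w      # grad of -log D(G(z)) w.r.t. G's output
    a -= lr * np.mean(g_x * z)
    c -= lr * np.mean(g_x)

print(f"generator offset c = {c:.2f}; the real data mean is 3")
```

Step 1 of the loop only touches (w, b) and step 2 only touches (a, c), which is exactly the fix-one-update-the-other schedule the slides describe; over training, G's samples drift toward the real distribution.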

  8. Faces generated by the machine. The images are generated by Yen-Hao Chen, Po-Chun Chien, Jun-Chen Xie, Tsung-Han Wu.

  9. Generation: an NN Generator maps vectors sampled from a specific range to outputs. Conditional Generation: the NN Generator additionally takes a condition such as "Girl with red hair and red eyes" or "Girl with yellow ribbon".

  10. Conditional GAN • Paired data: images with captions such as "blue eyes", "red hair", "short hair". • Generator: x = G(c, z), with condition c (e.g. "red hair") and z sampled from a normal distribution. • Discriminator: D(c, x) outputs a scalar saying whether x is realistic or not and, better, whether c and x are matched or not. True text-image pairs get 1; (c, generated image) pairs and mismatched pairs such as (blue hair, real red-haired image) get 0. [Scott Reed, et al., ICML, 2016]
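Slide 10's discriminator therefore sees three kinds of input pairs. A small sketch of how such a training batch could be assembled; the helper names (`make_discriminator_batch`, `generator`, `mismatch`) are made up for illustration, and strings stand in for images:

```python
def make_discriminator_batch(pairs, generator, mismatch):
    """pairs: (condition, real_image) tuples; generator(c) returns a fake
    image for condition c; mismatch(c) returns a condition that does NOT
    describe the paired image."""
    batch = []
    for c, x_real in pairs:
        batch.append(((c, x_real), 1))            # realistic AND matched -> 1
        batch.append(((c, generator(c)), 0))      # generated image -> 0
        batch.append(((mismatch(c), x_real), 0))  # real but mismatched text -> 0
    return batch

# Toy usage with strings standing in for images.
pairs = [("red hair", "img_red"), ("blue hair", "img_blue")]
fake = lambda c: "generated_" + c.replace(" ", "_")
swap = lambda c: "blue hair" if c == "red hair" else "red hair"
batch = make_discriminator_batch(pairs, fake, swap)
print(len(batch))  # 6 examples, only 2 of them labeled 1
```

The third kind of example (real image, wrong text) is what forces D to check the match between c and x rather than realism alone.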

  11. Conditional GAN x = G(c, z), c: text. Paired data: "blue eyes", "red hair", "short hair". Generated samples for "red hair, green eyes" and "blue hair, red eyes". The images are generated by Yen-Hao Chen, Po-Chun Chien, Jun-Chen Xie, Tsung-Han Wu. [Scott Reed, et al., ICML, 2016]

  12. Conditional GAN c: sound ("a dog barking sound") → G → Image. Training data collection: from video.

  13. Conditional GAN • Audio-to-image, with examples conditioned on increasingly louder input audio. The images are generated by Chia-Hung Wan and Shun-Po Chuang. https://wjohn1483.github.io/audio_to_scene/index.html

  14. Conditional GAN - Image-to-label: a multi-label image classifier can be viewed as a conditional generator, with the image as the input condition and the labels as the generated output.

  15. Conditional GAN - Image-to-label. The classifiers can have different architectures; the classifiers are trained as conditional GAN. F1 scores:

      Model        MS-COCO   NUS-WIDE
      VGG-16        56.0      33.9
      +GAN          60.4      41.2
      Inception     62.4      53.5
      +GAN          63.8      55.8
      Resnet-101    62.8      53.1
      +GAN          64.0      55.4
      Resnet-152    63.3      52.1
      +GAN          63.9      54.1
      Att-RNN       62.1      54.7
      RLSD          62.0      46.9

      [Tsai, et al., submitted to ICASSP 2019]

  16. Conditional GAN - Image-to-label (same F1 table as the previous slide). Conditional GAN outperforms other models designed for multi-label tasks (Att-RNN: 62.1/54.7, RLSD: 62.0/46.9 on MS-COCO/NUS-WIDE).

  17. Conditional GAN – Speech Recognition: Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model, https://arxiv.org/abs/1811.00787

  18. Unsupervised Conditional GAN: G takes an object in Domain X as condition and generates an object in Domain Y. Transform an object from one domain to another without paired data (e.g. style transfer). Domain X: photos; Domain Y: Vincent van Gogh's paintings. Not paired.

  19. Unsupervised Conditional Generation • Approach 1: Direct Transformation G_X→Y: Domain X → Domain Y; for texture or color change. • Approach 2: Projection to Common Space: EN_X (encoder of domain X) maps a face to its attribute code, and DE_Y (decoder of domain Y) generates the output; for larger change, keeping only the semantics.

  20. Direct Transformation: G_X→Y transforms Domain X input to become similar to Domain Y; a discriminator D_Y, trained on Domain Y, outputs a scalar indicating whether its input image belongs to domain Y or not.

  21. Direct Transformation: Not what we want! G_X→Y can ignore its input and still fool D_Y, since D_Y only checks whether the output belongs to domain Y or not.

  22. Direct Transformation [Jun-Yan Zhu, et al., ICCV, 2017] Cycle consistency: feed the output of G_X→Y into G_Y→X and require the reconstruction to be as close as possible to the original input; if G_X→Y ignored its input, there would be a lack of information for reconstruction. D_Y still outputs a scalar: does the image belong to domain Y or not.

  23. Cycle GAN: train both directions. G_X→Y followed by G_Y→X reconstructs an X input as close as possible, and G_Y→X followed by G_X→Y reconstructs a Y input as close as possible. D_X outputs a scalar: belongs to domain X or not; D_Y outputs a scalar: belongs to domain Y or not.
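The cycle-consistency constraint on this slide can be written as a reconstruction loss. A minimal numeric sketch, with toy invertible linear maps standing in for the trained generators G_X→Y and G_Y→X (the maps and numbers are invented for illustration):

```python
import numpy as np

def g_xy(x):                 # stand-in for G_X->Y (e.g. photo -> painting)
    return 2.0 * x + 1.0

def g_yx(y):                 # stand-in for G_Y->X, here the exact inverse
    return (y - 1.0) / 2.0

def cycle_loss(xs):
    """L1 distance between x and G_Y->X(G_X->Y(x)): 'as close as possible'."""
    return float(np.mean(np.abs(g_yx(g_xy(xs)) - xs)))

xs = np.linspace(-1.0, 1.0, 5)
print(cycle_loss(xs))        # 0.0: the round trip reconstructs the input

# A generator that ignores its input (slide 21's failure mode) cannot win:
ignore = lambda x: np.ones_like(x) * 7.0
print(float(np.mean(np.abs(g_yx(ignore(xs)) - xs))))  # large reconstruction error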

  24. Unsupervised Conditional Generation • Approach 1: Direct Transformation G_X→Y: Domain X → Domain Y; for texture or color change. • Approach 2: Projection to Common Space: EN_X (encoder of domain X) maps a face to its attribute code, and DE_Y (decoder of domain Y) generates the output; for larger change, keeping only the semantics.

  25. Projection to Common Space: EN_X encodes a Domain X image into a face-attribute code, which DE_Y decodes into the target image in Domain Y; symmetrically, EN_Y encodes Domain Y images and DE_X decodes back into Domain X.

  26. Projection to Common Space – Training: minimize the reconstruction error of each auto-encoder: EN_X/DE_X for Domain X images and EN_Y/DE_Y for Domain Y images.

  27. Projection to Common Space – Training: minimize the reconstruction errors, and add a discriminator of the X domain (D_X) and a discriminator of the Y domain (D_Y) so that each decoder's outputs look realistic. Because we train two auto-encoders separately, the images with the same attribute may not project to the same position in the latent space.

  28. Projection to Common Space – Training: add a domain discriminator that predicts whether a latent code comes from EN_X or EN_Y, while EN_X and EN_Y learn to fool it. The domain discriminator forces the outputs of EN_X and EN_Y to have the same distribution. [Guillaume Lample, et al., NIPS, 2017]
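A minimal numeric sketch of this shared-latent idea: two toy auto-encoders (EN_X/DE_X and EN_Y/DE_Y) with zero reconstruction error, where cross-domain transfer is encode-with-one-domain, decode-with-the-other. The linear maps and numbers below are invented for illustration; a real system trains them with the reconstruction, realism, and domain-discriminator losses described on these slides.

```python
import numpy as np

en_x = lambda x: 0.5 * x      # encoder of domain X -> shared latent code
de_x = lambda h: 2.0 * h      # decoder of domain X (inverse of en_x)
en_y = lambda y: 0.25 * y     # encoder of domain Y
de_y = lambda h: 4.0 * h      # decoder of domain Y (inverse of en_y)

x = np.array([1.0, -2.0, 3.0])        # "images" from domain X
y = np.array([4.0, 0.0, -4.0])        # "images" from domain Y

# 1) Per-domain auto-encoder objective: minimize reconstruction error.
rec_x = float(np.mean((de_x(en_x(x)) - x) ** 2))
rec_y = float(np.mean((de_y(en_y(y)) - y) ** 2))
print(rec_x, rec_y)                    # 0.0 0.0 for these exact inverses

# 2) Cross-domain transfer goes through the common space:
#    encode with EN_X, decode with DE_Y.
print(de_y(en_x(x)))                   # x carried into domain Y

# 3) The domain discriminator would receive codes like en_x(x) and en_y(y)
#    and guess which encoder produced them; EN_X and EN_Y are trained to
#    make the two code distributions indistinguishable.
print(en_x(x), en_y(y))
```

Step 3 is the point of slide 28: without the domain discriminator, nothing ties the two separately trained latent spaces together.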

  29. Projection to Common Space – Training: share the parameters of the encoders EN_X, EN_Y and of the decoders DE_X, DE_Y. Coupled GAN [Ming-Yu Liu, et al., NIPS, 2016], UNIT [Ming-Yu Liu, et al., NIPS, 2017]

  30. Projection to Common Space – Training: minimize reconstruction error with D_X and D_Y as before, plus Cycle Consistency: used in ComboGAN [Asha Anoosheh, et al., arXiv, 2017]

  31. Projection to Common Space – Training: Semantic Consistency: an image is encoded, decoded into the other domain, and re-encoded; the two latent codes should map to the same point in the latent space. Used in DTN [Yaniv Taigman, et al., ICLR, 2017] and XGAN [Amélie Royer, et al., arXiv, 2017]

  32. Outline Part I: General Introduction of Generative Adversarial Network (GAN) Part II: Applications to Natural Language Processing Part III: Applications to Speech Processing

  33. Unsupervised Conditional Generation • Image Style Transfer: photos vs. Vincent van Gogh's paintings, not paired. • Text Style Transfer: positive vs. negative sentences, not paired, e.g. "It is good." / "It is bad.", "It's a good day." / "It's a bad day.", "I love you." / "I don't love you."

  34. Cycle GAN: G_X→Y followed by G_Y→X reconstructs an X input as close as possible, and G_Y→X followed by G_X→Y reconstructs a Y input as close as possible; D_X outputs a scalar: belongs to domain X or not; D_Y outputs a scalar: belongs to domain Y or not.

  35. Cycle GAN for text (X: negative, Y: positive): G_X→Y maps "It is bad." to "It is good.", and G_Y→X reconstructs "It is bad." as close as possible; D_Y asks "positive sentence?", D_X asks "negative sentence?". In the other direction, "I love you." → G_Y→X → "I hate you." → G_X→Y → "I love you." as close as possible.

  36. Discrete Issue: the generator is a seq2seq model whose hidden layers are followed by discrete output (words). "It is bad." → G_X→Y → "It is good." → D_Y (positive sentence?). We would like to fix D_Y, treat G and D as one large network, and update G by backpropagation, but the discrete word choice is not differentiable.
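The non-differentiability on this slide can be checked numerically: the generator's final word choice is an argmax over the vocabulary, nudging the logits almost never changes the chosen word, and so the finite-difference gradient of the discriminator's score with respect to the logits is zero. The per-word scores and logits below are a made-up toy:

```python
import numpy as np

word_score = {0: 0.1, 1: 0.9, 2: 0.4}   # toy discriminator score per word id

def generator_output(logits):
    return int(np.argmax(logits))        # the discrete step: pick one word

def d_score(logits):
    return word_score[generator_output(logits)]

logits = np.array([0.2, 1.5, 0.3])
eps = 1e-4
grad = [(d_score(logits + eps * np.eye(3)[i]) - d_score(logits)) / eps
        for i in range(3)]
print(grad)   # [0.0, 0.0, 0.0]: no gradient reaches the generator
```

This is why text-style-transfer GANs commonly resort to reinforcement-learning updates or continuous relaxations of the discrete step instead of plain backpropagation.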
