GANs + Final Practice Questions
Lecture 23, CS 753
Instructor: Preethi Jyothi
Final Exam Syllabus
- 1. WFST algorithms/WFSTs used in ASR
- 2. HMM algorithms/EM/Tied state Triphone models
- 3. DNN-based acoustic models
- 4. N-gram/Smoothing/RNN language models
- 5. End-to-end ASR (CTC, LAS, RNN-T)
- 6. MFCC feature extraction
- 7. Search & Decoding
- 8. HMM-based speech synthesis models
- 9. Multilingual ASR
- 10. Speaker Adaptation
- 11. Discriminative training of HMMs
Questions can be asked on any of the 11 topics listed above. You will be allowed a single A4 cheat sheet of handwritten notes; content on both sides is permitted.
Final Project
Deliverables
- 4-5 page final report:
✓ Task definition, Methodology, Prior work, Implementation Details, Experimental Setup, Experiments and Discussion, Error Analysis (if any), Summary
- Short talk summarizing the project:
✓ Each team will get 8-10 minutes for their presentation
and 5 minutes for Q/A
✓ Clearly demarcate which team member worked on what part
Final Project Grading
- Break-up of 20 points:
- 6 points for the report
- 4 points for the presentation
- 6 points for Q/A
- 4 points for overall evaluation of the project
Final Project Schedule
- Presentations will be held on Nov 23rd and Nov 24th
- The final report in pdf format should be sent to
pjyothi@cse.iitb.ac.in before Nov 24th
- The order of presentations will be decided on a lottery basis
and shared via Moodle before Nov 9th
Generative Adversarial Networks (GANs)
[Figure: GAN setup — the generator G maps noise z to a sample x = G(z); the discriminator D(x) scores generated samples against real samples x_real.]
- The training process is formulated as a game between a generator network and a discriminator network
- Objective of the generator: create samples that seem to be from the same distribution as the training data
- Objective of the discriminator: examine a sample and distinguish between fake and real samples
- The generator tries to fool the discriminator network
Generative Adversarial Networks
- The cost function of the generator is the opposite of the discriminator's
- Minimax game: the generator and discriminator are playing a zero-sum game against each other
max_G min_D L(G, D)

where L(G, D) = E_{x∈D}[− log D(x)] + E_z[− log(1 − D(G(z)))]

Training Generative Adversarial Networks
for number of training iterations do
  for k steps do
    - Sample a minibatch of m noise samples {z^(1), …, z^(m)} from the noise prior p_g(z).
    - Sample a minibatch of m examples {x^(1), …, x^(m)} from the data-generating distribution p_data(x).
    - Update the discriminator by ascending its stochastic gradient:
        ∇_{θ_d} (1/m) Σ_{i=1}^{m} [log D(x^(i)) + log(1 − D(G(z^(i))))]
  end for
  - Sample a minibatch of m noise samples {z^(1), …, z^(m)} from the noise prior p_g(z).
  - Update the generator by descending its stochastic gradient:
        ∇_{θ_g} (1/m) Σ_{i=1}^{m} log(1 − D(G(z^(i))))
end for
The gradient-based updates can use any standard gradient-based learning rule. We used momentum in our experiments.
Image from [Goodfellow16]: https://arxiv.org/pdf/1701.00160.pdf
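The alternating updates above can be sketched end to end on a toy problem. The following is a minimal illustration under stated assumptions, not the paper's implementation: 1-D data, a one-parameter generator G(z) = θ + z, a logistic discriminator, and hand-derived gradients. It also uses the non-saturating generator update (ascend log D(G(z))) rather than descending log(1 − D(G(z))).

```python
import numpy as np

# Toy GAN: real data ~ N(3, 0.5), generator G(z) = theta + z with z ~ N(0, 1),
# discriminator D(x) = sigmoid(w*x + b). All sizes/rates are illustrative.
rng = np.random.default_rng(0)
theta, w, b = -2.0, 0.0, 0.0          # generator and discriminator parameters
m, lr_d, lr_g = 64, 0.1, 0.05

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

for _ in range(3000):
    # --- discriminator step: ascend log D(x) + log(1 - D(G(z))) ---
    x = 3.0 + 0.5 * rng.standard_normal(m)     # real minibatch
    z = rng.standard_normal(m)
    g = theta + z                               # generated minibatch
    d_real, d_fake = sigmoid(w * x + b), sigmoid(w * g + b)
    grad_w = np.mean((1 - d_real) * x) - np.mean(d_fake * g)
    grad_b = np.mean(1 - d_real) - np.mean(d_fake)
    w += lr_d * grad_w
    b += lr_d * grad_b
    # --- generator step: ascend log D(G(z)) (non-saturating variant) ---
    z = rng.standard_normal(m)
    d_fake = sigmoid(w * (theta + z) + b)
    theta += lr_g * np.mean((1 - d_fake) * w)

print(theta)  # drifts from -2 toward the data mean of 3
```

With a fixed seed, θ ends near the data mean: the discriminator first learns to separate the two distributions, and the generator then follows its gradient into the data region.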
Better objective for the generator
- Problem of saturation: if the generated sample is really poor, the generator's cost is relatively flat
- Original cost: L_GEN(G, D) = E_z[log(1 − D(G(z)))]
- Modified cost: L_GEN(G, D) = E_z[− log D(G(z))]
Large (& growing!) list of GANs
Image from https://github.com/hindupuravinash/the-gan-zoo
Conditional GANs
- Generator and discriminator receive some additional conditioning information
[Figure: conditional GAN — both the generator (x = G(z)) and the discriminator D(x) additionally receive the conditioning information C.]
Image-to-image Translation using C-GANs
[Figure: example input → output pairs — Labels to Facade, BW to Color, Aerial to Map, Labels to Street Scene, Edges to Photo, Day to Night.]
Image from Isola et al., CVPR 2017, https://arxiv.org/pdf/1611.07004.pdf
Text-to-Image Synthesis
[Figure: generated images for captions such as "this small bird has a pink breast and crown, and black primaries and secondaries", "the flower has petals that are bright pinkish purple with white stigma", "this magnificent fellow is almost all black with a red crest, and white cheek patch", and "this white and yellow flower has thin white petals and a round yellow stamen".]
Image from Reed et al., ICML 2016, https://arxiv.org/pdf/1605.05396.pdf
[Figure: generator and discriminator networks for text-to-image synthesis, both conditioned on an embedding of the caption "This flower has small, round violet petals with a dark purple center".]
Three Speech Applications of GANs
GANs for speech synthesis
[Figure: the generator maps linguistic features (plus noise) to predicted samples, trained with an MSE loss against natural samples; the discriminator is a binary classifier between predicted and natural samples.]
- The generator produces synthesised speech, which the discriminator distinguishes from real speech
- During synthesis, random noise + linguistic features generate speech
Image from Yang et al., “SPSS using GANs”, 2017
SEGAN: GANs for speech enhancement
- Enhancement: given an input noisy signal x̃, we want to clean it to obtain an enhanced signal x
- Generator G will take both x̃ and z as inputs; G is fully convolutional
Image from https://arxiv.org/pdf/1703.09452.pdf
Voice Conversion Using Cycle-GANs
Image from https://arxiv.org/abs/1711.11293
Practice Questions
HMM 101
A water sample collected from Powai lake is either Clean or Polluted. However, this information is hidden from us and all we can observe is whether the water is muddy, clear, odorless or cloudy. We start at time step 1 in the Clean state. The HMM below models this problem. Let qt and Ot denote the state and observation at time step t, respectively.
Clean: Pr(muddy) = 0.5, Pr(clear) = 0.1, Pr(odorless) = 0.2, Pr(cloudy) = 0.2
Polluted: Pr(muddy) = 0.1, Pr(clear) = 0.5, Pr(odorless) = 0.2, Pr(cloudy) = 0.2
Transitions: each state self-transitions with probability 0.8 and switches with probability 0.2.
a) What is P(O2 = clear)?
b) What is P(q2 = Clean | O2 = clear)?
c) What is P(O200 = cloudy)?
d) What's the most likely sequence of states for the following observation sequence: {O1 = clear, O2 = clear, O3 = clear, O4 = clear, O5 = clear}?
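Parts (a)-(c) can be checked numerically. A sketch, assuming (from the figure) self-transition probability 0.8, cross-transition 0.2, and a deterministic start in Clean at t = 1:

```python
# Marginal state distribution + emission probabilities for the Powai-lake HMM.
emit = {
    "Clean":    {"muddy": 0.5, "clear": 0.1, "odorless": 0.2, "cloudy": 0.2},
    "Polluted": {"muddy": 0.1, "clear": 0.5, "odorless": 0.2, "cloudy": 0.2},
}
trans = {"Clean": {"Clean": 0.8, "Polluted": 0.2},
         "Polluted": {"Clean": 0.2, "Polluted": 0.8}}

def state_dist(t):
    """Marginal distribution over states at time step t (t = 1, 2, ...)."""
    dist = {"Clean": 1.0, "Polluted": 0.0}        # start in Clean at t = 1
    for _ in range(t - 1):
        dist = {s: sum(dist[r] * trans[r][s] for r in dist) for s in dist}
    return dist

# (a) P(O2 = clear) = sum_s P(q2 = s) P(clear | s) = 0.8*0.1 + 0.2*0.5 = 0.18
d2 = state_dist(2)
p_a = sum(d2[s] * emit[s]["clear"] for s in d2)
# (b) Bayes' rule: P(q2 = Clean | O2 = clear) = 0.08 / 0.18 = 4/9
p_b = d2["Clean"] * emit["Clean"]["clear"] / p_a
# (c) both states emit cloudy with probability 0.2, so P(O200 = cloudy) = 0.2
d200 = state_dist(200)
p_c = sum(d200[s] * emit[s]["cloudy"] for s in d200)
print(p_a, p_b, p_c)
```

Part (c) needs no computation at all: since both states share the same cloudy probability, the answer is 0.2 regardless of the state distribution at t = 200.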
HMM 101
Say that we are now given a modified HMM for the water samples as shown below. Initial probabilities and transition probabilities are shown next to the arcs. (Note: You do not need to use the Viterbi algorithm to answer the next two questions.)
a) What is the most likely sequence of states given a sequence of three observations: {muddy, muddy, muddy}?
b) Say we observe a very long sequence of "muddy" (e.g. 10 million "muddy" in a row). What happens to the most likely state sequence then?
Clean: Pr(muddy) = 0.51, Pr(clear) = 0.49
Polluted: Pr(muddy) = 0.49, Pr(clear) = 0.51
Initial probabilities are 0.99 and 0.01; each state self-transitions with probability 0.9 and switches with probability 0.1.
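Although the question does not require it, a Viterbi decoder makes the intuition concrete. A sketch under an explicit assumption: the 0.99 initial probability is assigned to Polluted and 0.01 to Clean (the figure residue does not say which way). With a short run of "muddy" the strong prior dominates; with a long run the slight per-step emission advantage of Clean (0.51 vs 0.49) eventually wins.

```python
import math

# Log-space Viterbi decoder. Parameters reconstructed from the figure;
# the initial-probability assignment below is an assumption.
states = ["Clean", "Polluted"]
init = {"Clean": 0.01, "Polluted": 0.99}
trans = {"Clean": {"Clean": 0.9, "Polluted": 0.1},
         "Polluted": {"Clean": 0.1, "Polluted": 0.9}}
emit = {"Clean": {"muddy": 0.51, "clear": 0.49},
        "Polluted": {"muddy": 0.49, "clear": 0.51}}

def viterbi(obs):
    score = {s: math.log(init[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + math.log(trans[r][s]))
            ptr[s] = prev
            new_score[s] = (score[prev] + math.log(trans[prev][s])
                            + math.log(emit[s][o]))
        score, _ = new_score, back.append(ptr)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

print(viterbi(["muddy"] * 3))      # short run: the 0.99 prior dominates
print(viterbi(["muddy"] * 500))    # long run: the 0.51 emission odds win out
```

Under this assumption the 3-observation decode is all Polluted, while the 500-observation decode switches to Clean after the first step: each step in Clean gains log(0.51/0.49) ≈ 0.04, which eventually outweighs both the prior gap and the one-time cross-transition penalty.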
Handling disfluencies in ASR
Recall that a pronunciation lexicon L maps a sequence of phones to a sequence of words. In this problem, we shall modify L in order to handle some limited forms of interruptions in speech (a.k.a. disfluencies). We will consider a dictionary of two words: W1 with the phone sequence "a b c" and W2 with the phone sequence "x y z". a) Draw the state diagram of the finite-state machine L. b) We want to modify L such that it accounts for "breaks" when the speaker stops in the middle of a word and says the word all over again. For instance, the word W1 may be pronounced as "a b ⟨break⟩ a b c," where ⟨break⟩ is a special token produced by the acoustic model. In a valid phone sequence, breaks are allowed to appear only within a word, and not at the end or beginning of a word. Further, two consecutive ⟨break⟩ tokens are not allowed. But a word can be pronounced with an arbitrary number of breaks. E.g. W1 can be pronounced also as "a b ⟨break⟩ a ⟨break⟩ a b ⟨break⟩ a b c". Let L1 be an FST (obtained by modifying L from the previous part) that accepts all valid phone sequences with breaks, and outputs a corresponding sequence of words. Draw the state diagram of L1.
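The break rules can be made precise with a small checker. This is a hypothetical validator for a single occurrence of W1 only, sketched to pin down the rules, not the FST the question asks you to draw: a valid pronunciation is any number of (non-empty proper prefix of "a b c" followed by ⟨break⟩), then the full "a b c".

```python
# Validity checker for <break> rules within the single word W1 = "a b c".
WORD = ["a", "b", "c"]

def valid_with_breaks(phones):
    i = 0       # how much of the word has been matched since the last restart
    for p in phones:
        if p == "<break>":
            if i == 0 or i == len(WORD):   # break at start/end, or doubled
                return False
            i = 0                           # restart the word from scratch
        elif i < len(WORD) and p == WORD[i]:
            i += 1
        else:
            return False
    return i == len(WORD)                   # must finish with the full word

assert valid_with_breaks("a b <break> a b c".split())
assert valid_with_breaks("a b <break> a <break> a b <break> a b c".split())
assert not valid_with_breaks("<break> a b c".split())            # break at start
assert not valid_with_breaks("a b c <break>".split())            # break at end
assert not valid_with_breaks("a <break> <break> a b c".split())  # doubled break
```

The state variable i here corresponds directly to "how far into the word we are", which is also the natural state to track when drawing L1.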
Handling disfluencies in ASR
Recall that a pronunciation lexicon L maps a sequence of phones to a sequence of words. In this problem, we shall modify L in order to handle some limited forms of interruptions in speech (a.k.a. disfluencies). We will consider a dictionary of two words: W1 with the phone sequence "a b c" and W2 with the phone sequence "x y z". c) Next, we want to modify L1 such that it can account for both "breaks" and "pauses." A pause corresponds to when the speaker briefly stops in the middle of a word and continues. For instance, the word W1 may be pronounced as "a b ⟨pause⟩ c", "a ⟨break⟩ a ⟨pause⟩ b ⟨break⟩ a b c," etc. where ⟨pause⟩ is another special token produced by the acoustic model. In a valid phone sequence, these special tokens are allowed to appear only within a word, and two consecutive special tokens are not allowed. Let L2 be an FST (obtained by modifying L1 from the previous part) that accepts all valid phone sequences with breaks and pauses, and outputs a corresponding sequence of words. Draw the state diagram of L2.
Mixed Bag
An HMM-based speech synthesis system can be described using the following steps:
(A) Spectral and excitation features are extracted from a speech database
(B) Context-dependent HMMs are trained on these features
(C) These HMMs are clustered using a decision tree
(D) Durations of the HMM models are explicitly modeled
At synthesis time, for a given text sequence, the decision tree yields the appropriate HMM state sequence, which in turn determines the output spectral and excitation features (that are passed through a synthesis filter to produce speech). Say we want to add expressivity to the synthesized speech: i.e. we want to make the voice sound happy or sad, friendly or stern. Pick one of the above-mentioned steps (A)-(D) that you would modify to add expressivity and briefly justify your choice.
Mixed Bag
Find the probability Pr(drank|Mohan) = ___, given the following bigram counts: [1 pt]
  Mohan drank: 10    drank coffee: 1
  Mohan coffee: 10   drank Mohan: 5
  Mohan ate: 10      drank water: 20
Say you have an n-gram distribution which is smoothed using add-α smoothing for some α > 0. The entropy of the smoothed distribution is (A) equal to (B) less than (C) greater than the entropy of the original unsmoothed n-gram distribution. Pick one of (A), (B) or (C) and briefly justify your choice. [2 pts]
Mixed Bag
Recall neural network language models (NNLMs) as shown in the schematic diagram below. For a given context of fixed length, each word in the context (drawn from a vocabulary of size N) is projected onto a P-dimensional projection layer using a common N × P projection matrix that is shared across the different word positions in the context. The value of the ith node in the output layer corresponds directly to the probability of word i given its context.
[Figure: NNLM schematic — context words w_{j−1}, …, w_{j−n+2}, w_{j−n+1} pass through shared projections into a projection layer P, then a hidden layer H, then an output layer O with a softmax over the vocabulary of size N.]
The complexity to calculate probabilities using this NNLM is quite high. Describe one main reason why this evaluation is very costly in processing time. [3 pts]
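One way to see the cost concretely is to count the multiply-adds per forward pass. A sketch with made-up layer sizes (not from the slides): the output layer needs an H × N matrix multiply plus a softmax over all N vocabulary scores, so it dominates whenever N is much larger than the hidden layer.

```python
# Rough per-query operation counts for the NNLM above, with illustrative
# (assumed) sizes: n-1 = 3 context words, N = 100_000 vocabulary,
# P = 100 projection dims, H = 500 hidden units.
context, N, P, H = 3, 100_000, 100, 500

proj_ops = context * P              # table lookups into the projection layer
hidden_ops = (context * P) * H      # (n-1)P x H dense layer
output_ops = H * N                  # H x N dense layer + softmax over N scores

print(proj_ops, hidden_ops, output_ops)
```

With these sizes the output layer costs 5 × 10^7 operations versus 1.5 × 10^5 for the hidden layer, which is why the N-way softmax is the usual answer to this question.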
CTC Alignments
Given an input sequence x of length T and an output character sequence y = (y1, …, yN) of length N, the CTC objective function is given by:

P_CTC(y|x) = Σ_{a : B(a) = y} P(a|x)

where B maps a per-frame output sequence a = (a1, …, aT) to a final output sequence y.

a) Consider a different definition of B which first removes all occurrences of the blank symbol, and then compresses each run of an identical character to a run of length 1. Give an example of a sequence y such that there is no a with B(a) = y, for this new B. Briefly justify your answer. [1 pt]
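The answer can be checked by brute force. A sketch of the modified B (blank symbol written as "-" here): since blanks are removed before runs are collapsed, no output can ever contain two identical adjacent characters, so e.g. y = "aa" has no preimage.

```python
from itertools import product

BLANK = "-"

def new_B(a):
    """Modified B: drop all blanks first, then collapse runs of equal chars."""
    no_blanks = [c for c in a if c != BLANK]
    out = []
    for c in no_blanks:
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

# Brute force: no per-frame sequence over {a, b, -} of length <= 6 maps to "aa".
images = {new_B(a) for T in range(1, 7)
          for a in product("ab" + BLANK, repeat=T)}
print("aa" in images)    # blanks can no longer separate repeated characters
print("aba" in images)   # e.g. new_B("a-bb-a") = "aba"
```

This is exactly why standard CTC collapses runs *before* removing blanks: the blank is what makes repeated characters like "aa" reachable.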
CTC Alignments
b) Now suppose we would like to avoid the use of the blank symbol altogether. Towards this, we define a new B which works as follows. Given a = (a1, …, aT), B defines the sequence ((c1, ℓ1), (c2, ℓ2), …, (cM, ℓM)) where ci ≠ ci+1 and ℓi > 0 for all i, and a consists of c1 repeated ℓ1 times, then c2 repeated ℓ2 times, …, then cM repeated ℓM times. Then B calculates the average run length ℓ̄ = (1/M) Σ_{i=1}^{M} ℓi, and outputs y consisting of c1 repeated k1 times, then c2 repeated k2 times, …, then cM repeated kM times, where ki = max{1, ⌊ℓi/ℓ̄⌋}. Here, ki is an estimate of how many times ci needs to be repeated, depending on how ℓi compares with the average run length ℓ̄.
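This blank-free B is easy to implement directly from the definition (a sketch; the function name is mine):

```python
from itertools import groupby
from math import floor

def run_length_B(a):
    """Blank-free B from part (b): collapse a into runs (c_i, l_i), then
    repeat each c_i exactly k_i = max(1, floor(l_i / l_bar)) times, where
    l_bar is the average run length."""
    runs = [(c, len(list(g))) for c, g in groupby(a)]
    l_bar = sum(l for _, l in runs) / len(runs)
    return "".join(c * max(1, floor(l / l_bar)) for c, l in runs)

# "aaab": runs (a,3),(b,1), average 2 -> k = (floor(3/2), 1) = (1, 1)
print(run_length_B("aaab"))                 # "ab"
# ten a's, one b, one a: runs (a,10),(b,1),(a,1), average 4 -> k = (2, 1, 1)
print(run_length_B("a" * 10 + "b" + "a"))   # "aaba"
```

Note the contrast with part (a): because a sufficiently long run earns ki > 1, this B *can* emit repeated adjacent characters even without a blank symbol.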