Models w/ Latent Random Variables

Models w/ Latent Random Variables (Chunting Zhou, Junxian He)



  1. CS11-747 Neural Networks for NLP: Models w/ Latent Random Variables • Chunting Zhou, Junxian He • Site: https://phontron.com/class/nn4nlp2019/ • Slides from Graham Neubig

  2. Discriminative vs. Generative Models • Discriminative model: calculate the probability of output given input P(Y|X) • Generative model: calculate the probability of a variable P(X), or multiple variables P(X,Y) • Which of the following models are discriminative vs. generative? • Standard BiLSTM POS tagger • Globally normalized CRF POS tagger • Language model

  3. Types of Variables • Observed vs. Latent: • Observed: something that we can see from our data, e.g. X or Y • Latent: a variable that we assume exists, but whose value we aren’t given • Deterministic vs. Random: • Deterministic: variables that are calculated directly according to some deterministic function • Random (stochastic): variables that obey a probability distribution, and may take any of several (or infinitely many) values

  4. Quiz: What Types of Variables? • In an attentional sequence-to-sequence model trained using MLE/teacher forcing, are the following variables observed or latent? Deterministic or random? • The input word ids f • The encoder hidden states h • The attention values a • The output word ids e

  5. Goal of Latent Random Variable Modeling • Specify structural relationships in the context of unknown variables, to learn interpretable structure • Inject inductive bias / prior knowledge

  6. What Is a Latent Random Variable Model? • Older latent variable models • Topic models (unsupervised)

  7. What Is a Latent Random Variable Model? • Older latent variable models • Topic models (unsupervised) • Hidden Markov Model (unsupervised tagger)

  8. What Is a Latent Random Variable Model? • Older latent variable models • Topic models • Hidden Markov Model (unsupervised tagger) • Some tree-structured models (unsupervised parsing)

  9. Why Latent Random Variables? • Specify structure, but interpretable structure is often discrete • There is always a tradeoff between interpretability and flexibility

  10. What Is a Latent Random Variable Model? • Deep latent variable models • Variational Autoencoders (VAEs) • Generative Adversarial Networks (GANs) • Flow-based generative models

  11. Variational Auto-encoders (Kingma and Welling 2014)

  12. A Latent Variable Model • We observe an output x (assume a continuous vector for now) • We have a latent variable z generated from a Gaussian: $z \sim \mathcal{N}(0, I)$ • We have a function f, parameterized by Θ, that maps from z to x, where this function is usually a neural net: $x = f(z; \Theta)$ • [Graphical model: z → x with parameters Θ, inside a plate repeated over N examples]
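
The generative story on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-hidden-layer tanh network, the helper names `init_theta` and `sample_x`, and all dimensions are hypothetical choices, not something the slides specify.

```python
import math
import random

def f(z, theta):
    """Hypothetical decoder f(z; theta): one tanh hidden layer, then a linear layer."""
    W1, b1, W2, b2 = theta
    h = [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
         for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(W2, b2)]

def init_theta(z_dim, h_dim, x_dim, rng):
    """Randomly initialised parameters (assumed shapes, for illustration only)."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    return (mat(h_dim, z_dim), [0.0] * h_dim, mat(x_dim, h_dim), [0.0] * x_dim)

def sample_x(theta, z_dim, rng):
    """Draw z ~ N(0, I), then map it deterministically to x = f(z; theta)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
    return z, f(z, theta)

rng = random.Random(0)
theta = init_theta(z_dim=2, h_dim=8, x_dim=5, rng=rng)
samples = [sample_x(theta, 2, rng) for _ in range(3)]  # three (z, x) pairs
```

Note that all the randomness lives in z; given z and Θ, x is a deterministic function of them.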

  13. An Example (Doersch 2016) [Figure: latent samples z mapped through f to x]

  14. A Latent Variable Model • We observe an output x (assume a continuous vector for now) • We have a latent variable z generated from a Gaussian: $z \sim \mathcal{N}(0, I)$ • We have a function f, parameterized by Θ, that maps from z to x, where this function is usually a neural net: $x = f(z; \Theta)$ • [Graphical model: z → x with parameters Θ, inside a plate repeated over N examples]

  15. What Is Our Loss Function? • We would like to maximize the corpus log likelihood: $\log P(X) = \sum_{x \in X} \log P(x; \theta)$ • For a single example, the marginal likelihood is $P(x; \theta) = \int P(x \mid z; \theta) \, P(z) \, dz$ • We can approximate this by sampling $z$s then summing: $P(x; \theta) \approx \sum_{z \in S(x)} P(x \mid z; \theta)$, where $S(x) := \{ z' ; z' \sim P(z) \}$ (dividing the sum by $|S(x)|$ turns it into an average, which is an unbiased Monte Carlo estimate)
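
The sampling approximation can be checked numerically on a toy model where the integral has a closed form. The sketch below assumes a one-dimensional model chosen purely for illustration (it is not from the slides): $P(z) = \mathcal{N}(0, 1)$ and $P(x \mid z) = \mathcal{N}(x; z, 1)$, for which the exact marginal is $\mathcal{N}(x; 0, 2)$. The function names are hypothetical.

```python
import math
import random

def normal_pdf(v, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at v."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mc_marginal(x, num_samples, rng):
    """Estimate P(x) = integral of P(x|z) P(z) dz by averaging P(x|z) over z ~ P(z)."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)         # z ~ P(z) = N(0, 1)
        total += normal_pdf(x, z, 1.0)  # P(x | z) = N(x; z, 1)
    return total / num_samples          # average over the sample set S(x)

rng = random.Random(0)
x = 0.7
estimate = mc_marginal(x, 50_000, rng)
exact = normal_pdf(x, 0.0, 2.0)  # closed-form marginal for this toy model
print(f"Monte Carlo estimate: {estimate:.4f}, exact: {exact:.4f}")
```

In one dimension this works well, but when z is high-dimensional most prior samples give negligible $P(x \mid z)$, which is one reason to look for a better sampling distribution.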

  16. Variational Inference • $\log p(x) \geq \mathrm{ELBO}$ • The inequality holds for any q(z|x), but the lower bound is tight only if q(z|x) = p(z|x) • p(z|x) is intractable
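
Both claims on this slide (the bound holds for any q, and it is tight at the true posterior) can be verified on a toy model where everything is available in closed form. The sketch assumes the standard definition $\mathrm{ELBO} = \mathbb{E}_{q(z|x)}[\log p(x \mid z)] - \mathrm{KL}(q(z|x) \,\|\, p(z))$ and a one-dimensional model picked for illustration only: $p(z) = \mathcal{N}(0, 1)$, $p(x \mid z) = \mathcal{N}(x; z, 1)$, and a Gaussian $q(z|x) = \mathcal{N}(m, s^2)$. For this model the true posterior is $\mathcal{N}(x/2, 1/2)$.

```python
import math

def log_marginal(x):
    """Exact log p(x) for the toy model: the marginal of x is N(0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x * x / 4.0

def elbo(x, m, s2):
    """Closed-form ELBO for q(z|x) = N(m, s2) under the toy model."""
    # E_q[log p(x|z)], using E_q[(x - z)^2] = (x - m)^2 + s2
    expected_log_lik = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    # KL(N(m, s2) || N(0, 1))
    kl = 0.5 * (s2 + m * m - 1.0 - math.log(s2))
    return expected_log_lik - kl

x = 0.7
gap_at_posterior = log_marginal(x) - elbo(x, x / 2.0, 0.5)  # q equals p(z|x)
gap_elsewhere = log_marginal(x) - elbo(x, 0.0, 1.0)         # q equals the prior
print(f"gap at posterior: {gap_at_posterior:.2e}, gap with q = prior: {gap_elsewhere:.4f}")
```

The gap is zero up to floating-point error when q matches the posterior and strictly positive otherwise. In a real model the posterior has no closed form, so q is instead fitted to make the gap as small as possible.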

  17. Practice • Prove $\log p(x) \geq \mathrm{ELBO}$ • Hint: use Jensen’s inequality
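
One way to work the exercise is sketched below. It assumes the standard definition of the ELBO as $\mathbb{E}_{q(z|x)}[\log p(x, z) - \log q(z \mid x)]$ and that $q(z \mid x) > 0$ wherever $p(x, z) > 0$; the slides may write the bound in a different but equivalent form.

```latex
\begin{align*}
\log p(x)
  &= \log \int p(x, z) \, dz \\
  &= \log \int q(z \mid x) \, \frac{p(x, z)}{q(z \mid x)} \, dz
   = \log \mathbb{E}_{q(z \mid x)}\!\left[ \frac{p(x, z)}{q(z \mid x)} \right] \\
  &\geq \mathbb{E}_{q(z \mid x)}\!\left[ \log \frac{p(x, z)}{q(z \mid x)} \right]
   && \text{(Jensen's inequality, since $\log$ is concave)} \\
  &= \mathbb{E}_{q(z \mid x)}\!\left[ \log p(x \mid z) \right]
     - \mathrm{KL}\!\left( q(z \mid x) \,\|\, p(z) \right)
   = \mathrm{ELBO}.
\end{align*}
% The gap is itself a KL divergence:
%   \log p(x) - \mathrm{ELBO} = \mathrm{KL}( q(z \mid x) \,\|\, p(z \mid x) ) \geq 0,
% which is zero exactly when q(z | x) = p(z | x).
```

The last line follows from $p(x, z) = p(x \mid z) \, p(z)$, and the expression for the gap explains why the bound is tight only when q equals the true posterior.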
