Latent LSTM Allocation
Manzil Zaheer, Amr Ahmed and Alexander J Smola
Presented by Akshay Budhkar & Krishnapriya Vishnubhotla, March 3, 2018


  1. Latent LSTM Allocation. Manzil Zaheer, Amr Ahmed and Alexander J Smola. Presented by Akshay Budhkar & Krishnapriya Vishnubhotla, March 3, 2018.

  2. Outline: 1. Introduction (Latent Dirichlet Allocation; LSTMs) 2. Latent LSTM Allocation (Algorithm; Inference; Different Models) 3. Results 4. Conclusion.

  3. Latent Dirichlet Allocation: a probabilistic graphical model. Not sequential, but easily interpretable. (A generative sketch follows below.)
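As a companion to this slide, here is a minimal sketch of LDA's generative story in Python; the sizes, hyperparameters, and seed are illustrative assumptions, not values from the slides or the paper.

import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 10, 8            # topics, vocabulary size, tokens per document (toy values)
alpha, beta = 0.5, 0.1              # symmetric Dirichlet hyperparameters (assumed)

phi = rng.dirichlet(np.full(V, beta), size=K)   # K x V topic-word distributions
theta = rng.dirichlet(np.full(K, alpha))        # per-document topic mixture

doc = []
for _ in range(doc_len):
    z = rng.choice(K, p=theta)      # draw a topic for this token (no sequential dependence)
    w = rng.choice(V, p=phi[z])     # draw the word from that topic's distribution
    doc.append((z, w))
print(doc)

The point of contrast with the next slide: each token's topic is drawn independently given theta, so word order plays no role.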

  4. LSTMs: good for modeling sequential data and preserving the temporal aspect, but have too many parameters and are hard to interpret. (A parameter-count sketch follows below.)
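The "too many parameters" point can be made with a back-of-the-envelope count for a word-level LSTM language model; the vocabulary, embedding, and hidden sizes below are assumptions chosen only to show the order of magnitude.

V, E, H = 50_000, 512, 512          # vocabulary, embedding dim, hidden dim (assumed sizes)

embedding = V * E                   # input word embeddings
lstm = 4 * (E * H + H * H + H)      # one LSTM layer: 4 gates, each with input, recurrent, and bias weights
softmax = H * V + V                 # output projection over the full vocabulary

total = embedding + lstm + softmax
print(f"embedding={embedding:,}  lstm={lstm:,}  softmax={softmax:,}  total={total:,}")

The vocabulary-sized matrices dominate the count; LLA sidesteps this by letting the LSTM operate over K topics with K much smaller than V (see slide 7).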

  5. Latent LSTM Allocation (LLA) - Algorithm

  6. Graphical model for LLA

  7. Marginal probability of observing a document:
$$p(w_d \mid \mathrm{LSTM}, \phi) = \sum_{z_d} p(w_d, z_d \mid \mathrm{LSTM}, \phi) = \sum_{z_d} \prod_t p(w_{d,t} \mid z_{d,t}; \phi)\, p(z_{d,t} \mid z_{d,1:t-1}; \mathrm{LSTM}) \quad (1)$$
Uses a K × H dense matrix and a V × K sparse matrix. (A brute-force sketch of this marginal follows below.)
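For very short documents, Eq. (1) can be evaluated exactly by enumerating all K^T topic sequences. The sketch below does that with a simple stand-in function in place of the LSTM topic prior, since wiring up an actual LSTM is beside the point here; all sizes and the stand-in dynamics are assumptions.

from itertools import product
import numpy as np

rng = np.random.default_rng(1)
K, V = 3, 10                                    # topics and vocabulary (toy sizes)
phi = rng.dirichlet(np.full(V, 0.1), size=K)    # K x V topic-word matrix standing in for phi

def topic_prior(prefix):
    """Stand-in for p(z_t | z_{1:t-1}; LSTM): uniform start, then a bias toward repeating."""
    if not prefix:
        return np.full(K, 1.0 / K)
    p = np.full(K, 0.1 / (K - 1))
    p[prefix[-1]] = 0.9
    return p

def marginal_likelihood(doc):
    """Eq. (1): sum over all topic sequences of prod_t p(w_t | z_t; phi) p(z_t | z_{1:t-1})."""
    total = 0.0
    for z_seq in product(range(K), repeat=len(doc)):
        prob = 1.0
        for t, (w, z) in enumerate(zip(doc, z_seq)):
            prob *= topic_prior(list(z_seq[:t]))[z] * phi[z, w]
        total += prob
    return total

doc = [2, 7, 7, 1]                              # toy document of word ids
print(marginal_likelihood(doc))

The exponential cost of this enumeration is exactly why the paper resorts to the stochastic EM inference described on slide 9.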

  8. (Figure-only slide.)

  9. Inference: Stochastic Expectation Maximization is used to approximate the posterior. The Evidence Lower Bound (ELBO) can be written as:
$$\sum_d \log p(w_d \mid \mathrm{LSTM}, \phi) \ge \sum_d \sum_{z_d} q(z_d) \log \frac{p(z_d; \mathrm{LSTM}) \prod_t p(w_{d,t} \mid z_{d,t}; \phi)}{q(z_d)} \quad (2)$$
The conditional probability of the topic at time step t is:
$$p(z_{d,t} = k \mid w_{d,t}, z_{d,1:t-1}; \mathrm{LSTM}, \phi) \propto p(z_{d,t} = k \mid z_{d,1:t-1}; \mathrm{LSTM})\, p(w_{d,t} \mid z_{d,t} = k; \phi) \quad (3)$$
where
$$p(w_{d,t} \mid z_{d,t} = k; \phi) = \phi_{w,k} = \frac{n_{w,k} + \beta}{n_k + V\beta} \quad (4)$$
(A sketch of this sampling step follows below.)
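A sketch of the E-step implied by Eqs. (3) and (4): each token's topic is resampled proportionally to the sequence prior times the count-based emission probability. The topic_prior stand-in and the toy counts are assumptions; the real model queries the LSTM for the prior and keeps the counts as sparse sufficient statistics.

import numpy as np

rng = np.random.default_rng(2)
K, V, beta = 3, 10, 0.1
n_wk = rng.integers(1, 5, size=(V, K)).astype(float)    # word-topic counts (V x K, sparse in practice)
n_k = n_wk.sum(axis=0)                                   # per-topic totals

def topic_prior(prefix):
    """Stand-in for p(z_t | z_{1:t-1}; LSTM) over the K topics (hard-coded for K = 3)."""
    return np.full(K, 1.0 / K) if not prefix else np.roll([0.6, 0.3, 0.1], prefix[-1])

def sample_topic_sequence(doc):
    """Draw z_{d,t} with probability proportional to Eq. (3), using phi_{w,k} from Eq. (4)."""
    z_seq = []
    for w in doc:
        emission = (n_wk[w] + beta) / (n_k + V * beta)   # phi_{w,k}, Eq. (4)
        p = topic_prior(z_seq) * emission                # unnormalised Eq. (3)
        z = int(rng.choice(K, p=p / p.sum()))
        z_seq.append(z)
        n_wk[w, z] += 1                                   # add the new assignment to the counts
        n_k[z] += 1                                       # (a full sampler would also remove the old one)
    return z_seq

print(sample_topic_sequence([2, 7, 7, 1]))

In the M-step (per the paper), phi is re-estimated from the updated counts and the LSTM is fit on the sampled topic sequences.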

  10. (Figure-only slide.)

  11. Mathematical Intuition.
LDA: $$\log p(w) = \sum_t \log p(w_t \mid \mathrm{model}) = \sum_t \log \sum_{z_t} p(w_t \mid z_t)\, p(z_t \mid \mathrm{doc}) \quad (5)$$
LSTM: $$\log p(w) = \sum_t \log p(w_t \mid w_{t-1}, w_{t-2}, \ldots, w_1) \quad (6)$$
LLA: $$\log p(w) = \log \sum_{z_{1:T}} \prod_t p(w_t \mid z_t)\, p(z_t \mid z_{t-1}, z_{t-2}, \ldots, z_1) \quad (7)$$
(A short numerical contrast follows below.)
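To make the contrast concrete, here is the LDA factorisation of Eq. (5) on a toy document: the log-likelihood decomposes into independent per-token mixtures, whereas the LLA marginal of Eq. (7) couples the topics across time (see the brute-force sketch after slide 7). All quantities below are assumed toy values.

import numpy as np

rng = np.random.default_rng(3)
K, V = 3, 10
phi = rng.dirichlet(np.full(V, 0.1), size=K)    # p(w | z), K x V
theta = rng.dirichlet(np.full(K, 0.5))          # p(z | doc)

doc = [2, 7, 7, 1]
log_p = sum(np.log(phi[:, w] @ theta) for w in doc)   # Eq. (5): sum_t log sum_z p(w_t|z) p(z|doc)
print(log_p)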

  12. Different Models

  13. Perplexity vs. Number of topics (Wikipedia)

  14. Perplexity vs. Number of topics (User Search). Cannot use Char LLA, since URLs lack morphological structure.

  15. LDA Ablation Study

  16. Interpreting Cleaner Topics

  17. Interpreting Factored Topics

  18. LSTM Topic Embedding (Wikipedia)

  19. Convergence Speed

  20. Effect of Joint vs. Independent Training

  21. Final Thoughts. Pros: provides a knob to trade off interpretability and accuracy; fewer parameters for a reasonable perplexity; cleaner factored topics. Cons: did not compare to something like hierarchical LDA; Char LLA cannot be used for every problem; perplexity is not a good measure of text-generation accuracy.

  22. Bibliography.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
Galley, M., Brockett, C., Sordoni, A., Ji, Y., Auli, M., Quirk, C., Mitchell, M., Gao, J., and Dolan, B. (2015). deltaBLEU: A discriminative metric for generation tasks with intrinsically diverse targets. arXiv preprint arXiv:1506.06863.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Zaheer, M., Ahmed, A., and Smola, A. J. (2017). Latent LSTM allocation: Joint clustering and non-linear dynamic modeling of sequence data. In International Conference on Machine Learning, pages 3967–3976.
