SLIDE 1

Latent LSTM Allocation

Manzil Zaheer, Amr Ahmed and Alexander J Smola

Presented by Akshay Budhkar & Krishnapriya Vishnubhotla

March 3, 2018

Manzil Zaheer, Amr Ahmed and Alexander J Smola (Presented by Akshay Budhkar & Krishnapriya Vishnubhotla) Latent LSTM Allocation March 3, 2018 1 / 22

slide-2
SLIDE 2

Outline

1. Introduction
   - Latent Dirichlet Allocation
   - LSTMs
2. Latent LSTM Allocation
   - Algorithm
   - Inference
   - Different Models
3. Results
4. Conclusion

SLIDE 3

Latent Dirichlet Allocation

- Probabilistic graphical model
- Not sequential, but easily interpretable
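As a sketch of why LDA is not sequential, its generative process can be written out directly; `theta` and `phi` would normally be Dirichlet draws, but are fixed by hand here for illustration:

```python
import numpy as np

def generate_lda_document(doc_len, theta, phi, rng):
    """Sample one document from the LDA generative process.

    theta: (K,) per-document topic mixture (a draw from Dirichlet(alpha)).
    phi:   (K, V) per-topic word distributions (draws from Dirichlet(beta)).
    Each word's topic is drawn i.i.d. from theta: LDA is a bag-of-words
    model with no notion of word order.
    """
    K, V = phi.shape
    words = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)    # topic for this position, independent of the past
        w = rng.choice(V, p=phi[z])   # word given the topic
        words.append(w)
    return words

rng = np.random.default_rng(0)
theta = np.array([0.7, 0.3])                 # 2 topics
phi = np.array([[0.9, 0.1, 0.0],             # topic 0 favors word 0
                [0.0, 0.1, 0.9]])            # topic 1 favors word 2
doc = generate_lda_document(20, theta, phi, rng)
```

Because the topic draw at each position ignores all previous positions, the model is easy to interpret (topics are just word distributions) but blind to sequence.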

SLIDE 4

LSTMs

- Good for modeling sequential data; preserves the temporal aspect
- Too many parameters
- Hard to interpret

SLIDE 5

Latent LSTM Allocation (LLA) - Algorithm

SLIDE 6

Graphical model for LLA

SLIDE 7

The marginal probability of observing a document is

    p(w_d | LSTM, φ) = Σ_{z_d} p(w_d, z_d | LSTM, φ)
                     = Σ_{z_d} Π_t p(w_{d,t} | z_{d,t}; φ) · p(z_{d,t} | z_{d,1:t−1}; LSTM)    (1)

The model uses a dense K × H matrix (the LSTM's topic output layer) and a sparse V × K matrix (the topic-word distributions φ).
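The factorization in Eq. (1) can be made concrete on a toy model. Purely for illustration, the LSTM's topic dynamics p(z_{d,t} | z_{d,1:t−1}) are stood in for by a first-order Markov chain (`init`, `transition`), so the sum over all K^T topic sequences can be enumerated exactly:

```python
import itertools
import numpy as np

def lla_marginal(words, phi, transition, init):
    """Brute-force the marginal of Eq. (1) for a toy sequence model.

    phi[k, w] = p(w | z = k); `init` and `transition` replace the LSTM's
    topic dynamics with a first-order Markov chain so the outer sum over
    topic sequences z_d is small enough to enumerate.
    """
    K = phi.shape[0]
    T = len(words)
    total = 0.0
    for z in itertools.product(range(K), repeat=T):  # sum over z_d
        p = init[z[0]] * phi[z[0], words[0]]
        for t in range(1, T):                        # product over t
            p *= transition[z[t - 1], z[t]] * phi[z[t], words[t]]
        total += p
    return total

phi = np.array([[0.8, 0.2],
                [0.3, 0.7]])
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
init = np.array([0.5, 0.5])
p = lla_marginal([0, 1, 1], phi, transition, init)
```

Summing this marginal over every possible word sequence of the same length yields 1, which is a quick sanity check that the factorization defines a proper distribution.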

SLIDE 8

SLIDE 9

Inference

Stochastic Expectation Maximization is used to compute the posterior. The Evidence Lower Bound (ELBO) can be written as:

    Σ_d log p(w_d | LSTM, φ) ≥ Σ_d Σ_{z_d} q(z_d) log [ p(z_d; LSTM) Π_t p(w_{d,t} | z_{d,t}; φ) / q(z_d) ]    (2)

The conditional probability of the topic at time step t is:

    p(z_{d,t} = k | w_{d,t}, z_{d,1:t−1}; LSTM, φ) ∝ p(z_{d,t} = k | z_{d,1:t−1}; LSTM) · p(w_{d,t} | z_{d,t} = k; φ)    (3)

where the word-emission term uses smoothed counts:

    p(w_{d,t} | z_{d,t} = k; φ) = φ_{w,k} = (n_{w,k} + β) / (n_k + V β)    (4)
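A single E-step draw of this scheme can be sketched as follows. The vector `prior_over_topics` stands in for p(z_t = k | z_{1:t−1}; LSTM), which in the real algorithm comes from the LSTM's softmax over topics; the counts here are hypothetical:

```python
import numpy as np

def sample_topic(w, prior_over_topics, n_wk, n_k, beta, rng):
    """One stochastic-EM E-step draw of Eq. (3), using Eq. (4).

    n_wk[w, k] counts occurrences of word w under topic k, n_k[k] the
    per-topic totals, beta the Dirichlet smoother, V the vocabulary size.
    """
    V = n_wk.shape[0]
    phi_wk = (n_wk[w] + beta) / (n_k + V * beta)  # Eq. (4), for all k at once
    p = prior_over_topics * phi_wk                # Eq. (3), unnormalized
    p /= p.sum()
    return rng.choice(len(p), p=p)

rng = np.random.default_rng(1)
n_wk = np.array([[10.0, 0.0],   # word 0 seen only under topic 0 so far
                 [0.0, 10.0]])  # word 1 seen only under topic 1
n_k = n_wk.sum(axis=0)
z = sample_topic(w=0, prior_over_topics=np.array([0.5, 0.5]),
                 n_wk=n_wk, n_k=n_k, beta=0.1, rng=rng)
```

The sampled topic then updates the counts (the M-step for φ) and is fed back to the LSTM as the next input, which is what couples the two components during training.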

SLIDE 10

SLIDE 11

Mathematical Intuition

LDA:

    log p(w) = Σ_t log p(w_t | model) = Σ_t log Σ_{z_t} p(w_t | z_t) p(z_t | doc)    (5)

LSTM:

    log p(w) = Σ_t log p(w_t | w_{t−1}, w_{t−2}, . . . , w_1)    (6)

LLA:

    log p(w) = log Σ_{z_{1:T}} Π_t p(w_t | z_t) p(z_t | z_{t−1}, z_{t−2}, . . . , z_1)    (7)
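The key contrast between Eq. (5) and Eq. (7) is where the sum over topics sits. In LDA it sits inside the product over t, so the likelihood factorizes per token and word order is irrelevant; in LLA the product over t sits inside the sum over z_{1:T}, so it does not. A toy computation (hypothetical numbers) shows the LDA side of this:

```python
import numpy as np

def lda_log_lik(words, theta, phi):
    """Eq. (5): each token's topic is marginalized independently, so the
    document log-likelihood is just a sum of per-token topic mixtures."""
    return float(sum(np.log(theta @ phi[:, w]) for w in words))

theta = np.array([0.6, 0.4])              # p(z | doc)
phi = np.array([[0.8, 0.2],               # phi[k, w] = p(w | z = k)
                [0.3, 0.7]])

# Word order does not matter under Eq. (5): permuting the tokens
# permutes the summands but leaves the total unchanged.
ll_a = lda_log_lik([0, 1, 0], theta, phi)
ll_b = lda_log_lik([0, 0, 1], theta, phi)
```

Under Eq. (7) the analogous permutation changes the score, because each topic's probability depends on the topics that preceded it.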

SLIDE 12

Different Models

SLIDE 13

Perplexity vs. Number of topics (Wikipedia)

SLIDE 14

Perplexity vs. Number of topics (User Search)

Cannot use Char LLA, since URLs lack morphological structure

SLIDE 15

LDA Ablation Study

SLIDE 16

Interpreting Cleaner Topics

SLIDE 17

Interpreting Factored Topics

SLIDE 18

LSTM Topic Embedding (Wikipedia)

SLIDE 19

Convergence Speed

SLIDE 20

Effect of Joint vs. Independent Training

SLIDE 21

Final Thoughts

Pros

- Provides a knob to trade off interpretability and accuracy
- Fewer parameters for a reasonable perplexity
- Cleaner factored topics

Cons

- No comparison against something like hierarchical LDA
- Char LLA cannot be used for every problem
- Perplexity is not a good measure of text-generation accuracy

SLIDE 22

Bibliography

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.

Galley, M., Brockett, C., Sordoni, A., Ji, Y., Auli, M., Quirk, C., Mitchell, M., Gao, J., and Dolan, B. (2015). deltaBLEU: A discriminative metric for generation tasks with intrinsically diverse targets. arXiv preprint arXiv:1506.06863.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.

Zaheer, M., Ahmed, A., and Smola, A. J. (2017). Latent LSTM allocation: Joint clustering and non-linear dynamic modeling of sequence data. In International Conference on Machine Learning, pages 3967–3976.
