Gibbs Sampling for LDA


  1. Gibbs Sampling for LDA. Lei Tang, Department of CSE, Arizona State University. January 7, 2008.

  2. Graphical Representation. $\alpha$ and $\beta$ are fixed hyper-parameters. We need to estimate the parameters $\theta$ for each document and $\phi$ for each topic. $Z$ are latent variables. This setup differs from the original LDA work, where the hyper-parameters are estimated rather than fixed.
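For context, the sketch below (my addition, not from the slides; all names are illustrative) samples a toy corpus from the generative process this graphical model encodes: per-topic word distributions $\phi^{(j)} \sim \text{Dirichlet}(\beta)$, per-document topic distributions $\theta^{(d)} \sim \text{Dirichlet}(\alpha)$, then a topic $z$ and a word $w$ for each token.

```python
import numpy as np

def generate_corpus(D, K, V, doc_len, alpha, beta, seed=0):
    """Sample a toy corpus from the LDA generative model."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.full(V, beta), size=K)     # one word distribution per topic
    theta = rng.dirichlet(np.full(K, alpha), size=D)  # one topic distribution per document
    docs, zs = [], []
    for d in range(D):
        z = rng.choice(K, size=doc_len, p=theta[d])   # latent topic for each token
        w = np.array([rng.choice(V, p=phi[k]) for k in z])
        docs.append(w)
        zs.append(z)
    return docs, zs, theta, phi
```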

  3. Property of the Dirichlet. The expectation of a Dirichlet-distributed variable is $E(\mu_k) = \alpha_k / \alpha_0$, where $\alpha_0 = \sum_k \alpha_k$.
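As a quick numerical sanity check of this property (my addition, not from the slides), one can compare a Monte Carlo estimate of the Dirichlet mean against $\alpha_k / \alpha_0$:

```python
import numpy as np

# Check E(mu_k) = alpha_k / alpha_0 empirically for an arbitrary example alpha.
alpha = np.array([2.0, 5.0, 3.0])
samples = np.random.dirichlet(alpha, size=100_000)

empirical_mean = samples.mean(axis=0)  # Monte Carlo estimate of E(mu)
analytic_mean = alpha / alpha.sum()    # alpha_k / alpha_0

print(empirical_mean)  # approximately [0.2, 0.5, 0.3]
print(analytic_mean)   # exactly      [0.2, 0.5, 0.3]
```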

  4. Gibbs Variants (the loop structure of each is sketched after this list).
     1. Gibbs sampling: draw $a$ conditioned on $b, c$; draw $b$ conditioned on $a, c$; draw $c$ conditioned on $a, b$.
     2. Block Gibbs sampling: draw $a, b$ jointly conditioned on $c$; draw $c$ conditioned on $a, b$.
     3. Collapsed Gibbs sampling: draw $a$ conditioned on $c$; draw $c$ conditioned on $a$. Here $b$ is collapsed out (integrated out analytically) during the sampling process.
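A minimal structural sketch of the three variants (my addition; the `draw_*` arguments are placeholders for the model-specific full-conditional samplers, not a real API):

```python
# Loop skeletons for the three Gibbs variants over a joint p(a, b, c).
# The draw_* callables are supplied by the model; here they only stand in
# for the full-conditional (or marginalized) samplers.

def gibbs(a, b, c, draw_a, draw_b, draw_c, n_iters):
    for _ in range(n_iters):
        a = draw_a(b, c)   # a | b, c
        b = draw_b(a, c)   # b | a, c
        c = draw_c(a, b)   # c | a, b
    return a, b, c

def block_gibbs(a, b, c, draw_ab, draw_c, n_iters):
    for _ in range(n_iters):
        a, b = draw_ab(c)  # (a, b) | c, drawn jointly
        c = draw_c(a, b)   # c | a, b
    return a, b, c

def collapsed_gibbs(a, c, draw_a, draw_c, n_iters):
    # b has been integrated out analytically, so the chain only
    # visits (a, c); each draw already marginalizes over b.
    for _ in range(n_iters):
        a = draw_a(c)      # a | c  (b collapsed out)
        c = draw_c(a)      # c | a  (b collapsed out)
    return a, c
```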

  5. Collapsed Sampling for LDA. In the original paper "Finding Scientific Topics", the authors are mainly interested in text modelling (finding $Z$), so the Gibbs sampling procedure boils down to estimating $P(z_i = j \mid z_{-i}, w)$, where $\theta$ and $\phi$ are integrated out. Indeed, if we knew the exact $Z$ for each document, estimating $\theta$ and $\phi$ would be trivial.

$$P(z_i = j \mid z_{-i}, w) \propto P(z_i = j, z_{-i}, w) = P(w_i \mid z_i = j, z_{-i}, w_{-i})\, P(z_i = j \mid z_{-i}, w_{-i}) = P(w_i \mid z_i = j, z_{-i}, w_{-i})\, P(z_i = j \mid z_{-i})$$

The first term is the likelihood, and the second term acts like a prior. (The last step uses the fact that $z_i$ is independent of $w_{-i}$ given $z_{-i}$.)

  6. The first term:

$$P(w_i \mid z_i = j, z_{-i}, w_{-i}) = \int P(w_i \mid z_i = j, \phi^{(j)})\, P(\phi^{(j)} \mid z_{-i}, w_{-i})\, d\phi^{(j)} = \int \phi^{(j)}_{w_i}\, P(\phi^{(j)} \mid z_{-i}, w_{-i})\, d\phi^{(j)}$$

$$P(\phi^{(j)} \mid z_{-i}, w_{-i}) \propto P(w_{-i} \mid \phi^{(j)}, z_{-i})\, P(\phi^{(j)}) \;\sim\; \text{Dirichlet}(\beta + n^{(w)}_{-i,j})$$

Here $n^{(w)}_{-i,j}$ is the number of instances of word $w$ assigned to topic $j$, excluding the current one. The integral above is the posterior expectation of $\phi^{(j)}_{w_i}$, so using the expectation property of the Dirichlet distribution we have

$$P(w_i \mid z_i = j, z_{-i}, w_{-i}) = \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta}$$

where $n^{(\cdot)}_{-i,j}$ is the total number of words assigned to topic $j$ (excluding the current one) and $W$ is the vocabulary size.


  7. Similarly, for the second term we have

$$P(z_i = j \mid z_{-i}) = \int P(z_i = j \mid \theta^{(d)})\, P(\theta^{(d)} \mid z_{-i})\, d\theta^{(d)}$$

$$P(\theta^{(d)} \mid z_{-i}) \propto P(z_{-i} \mid \theta^{(d)})\, P(\theta^{(d)}) \;\sim\; \text{Dirichlet}(n^{(d)}_{-i,j} + \alpha)$$

where $n^{(d)}_{-i,j}$ is the number of words in document $d$ assigned to topic $j$, excluding the current one. Hence

$$P(z_i = j \mid z_{-i}) = \frac{n^{(d)}_{-i,j} + \alpha}{n^{(d)}_{-i,\cdot} + K\alpha}$$

where $n^{(d)}_{-i,\cdot}$ is the total number of topic assignments in document $d$ excluding the current one, and $K$ is the number of topics.

  8. Algorithm. Combining the two terms gives the sampling update

$$P(z_i = j \mid z_{-i}, w) \propto \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \cdot \frac{n^{(d)}_{-i,j} + \alpha}{n^{(d)}_{-i,\cdot} + K\alpha}$$

We only need to record four count variables:
     - document-topic count $n^{(d)}_{-i,j}$
     - document-topic sum $n^{(d)}_{-i,\cdot}$ (actually a constant per document)
     - topic-term count $n^{(w_i)}_{-i,j}$
     - topic-term sum $n^{(\cdot)}_{-i,j}$
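A minimal NumPy implementation sketch of this update (my addition, not the author's code; function and variable names are illustrative). The four arrays correspond directly to the counts listed above:

```python
import numpy as np

def collapsed_gibbs_lda(docs, K, V, alpha, beta, n_iters, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a sequence of word ids in [0, V).
    Returns the final topic assignments and the count tables.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))  # document-topic counts
    n_kw = np.zeros((K, V))  # topic-term counts
    n_k = np.zeros(K)        # topic-term sums

    # Random initialization of topic assignments, with counts to match.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment: these are the "-i" counts.
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Likelihood term times prior-like term; the second term's
                # denominator (n_d - 1 + K*alpha) is constant in j, so dropped.
                p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_dk[d] + alpha)
                p /= p.sum()
                k = rng.choice(K, p=p)
                # Add the new assignment back into the counts.
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
                z[d][i] = k
    return z, n_dk, n_kw, n_k
```

Usage is e.g. `z, n_dk, n_kw, n_k = collapsed_gibbs_lda(docs, K=10, V=5000, alpha=0.1, beta=0.01, n_iters=200)`; the document-topic sum is simply the (fixed) document length, so it is not stored as a separate table here.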

  9. Parameter Estimation. To obtain $\phi$ and $\theta$, there are two ways: draw one sample of $z$, or draw multiple samples of $z$ and average.

$$\phi_{j,w} = \frac{n^{(j)}_w + \beta}{\sum_{w'=1}^{V} n^{(j)}_{w'} + V\beta} \qquad \theta^{(d)}_j = \frac{n^{(d)}_j + \alpha}{\sum_{z=1}^{K} n^{(d)}_z + K\alpha}$$

where $n^{(j)}_w$ is the frequency of word $w$ assigned to topic $j$, and $n^{(d)}_z$ is the number of words in document $d$ assigned to topic $z$.
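Continuing the sketch above, both estimates fall directly out of the count tables (my addition; assumes the `n_kw` and `n_dk` arrays from the sampler sketch):

```python
import numpy as np

def estimate_phi_theta(n_kw, n_dk, alpha, beta):
    # phi[j, w] = (n_w^(j) + beta) / (sum_w' n_w'^(j) + V*beta)
    V = n_kw.shape[1]
    phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)
    # theta[d, j] = (n_j^(d) + alpha) / (sum_z n_z^(d) + K*alpha)
    K = n_dk.shape[1]
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    return phi, theta
```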

  10. Comment. Compared with variational Bayes (VB), Gibbs sampling is easier to implement and easier to extend; it is also more efficient, reaching a good approximation faster.
