SLIDE 20 10/4/2018 20
LDA: Model (cont'd)
$\alpha$ controls the mixture of topics; $\beta$ controls the distribution of words per topic.
LDA: Model (cont'd)
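Given the parameters $\alpha$ and $\beta$, the joint distribution of a topic mixture $\theta$, a set of $N$ topics $\mathbf{z}$, and a set of $N$ words $\mathbf{w}$ is given by:
$$p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta),$$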
where $p(z_n \mid \theta)$ is $\theta_i$ for the unique $i$ such that $z_n^i = 1$. Integrating over $\theta$ and summing over $\mathbf{z}$, we obtain the marginal distribution of a document:
$$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta.$$
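The two formulas above can be checked numerically on a toy example. The following is a minimal sketch, not a reference implementation. It assumes topics and words are stored as integer indices instead of one-hot vectors, that `beta[i][j]` holds p(word j | topic i), and that the integral over θ is approximated by plain Monte Carlo sampling from the Dirichlet prior. The function names are hypothetical and do not come from the slides.

```python
import math
import random


def log_dirichlet_pdf(theta, alpha):
    """Log density of Dirichlet(alpha) at theta (strictly inside the simplex)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(t) for a, t in zip(alpha, theta))


def log_joint(theta, z, w, alpha, beta):
    """log p(theta, z, w | alpha, beta) for a single document.

    z[n] is the topic index of word n and w[n] is its vocabulary index.
    """
    total = log_dirichlet_pdf(theta, alpha)
    for z_n, w_n in zip(z, w):
        # p(z_n | theta) = theta[z_n];  p(w_n | z_n, beta) = beta[z_n][w_n]
        total += math.log(theta[z_n]) + math.log(beta[z_n][w_n])
    return total


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def marginal_mc(w, alpha, beta, num_samples=10000, seed=0):
    """Monte Carlo estimate of p(w | alpha, beta).

    For each sampled theta, the sum over z_n collapses to
    sum_i theta[i] * beta[i][w_n], and the product runs over the words.
    """
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(num_samples):
        theta = sample_dirichlet(alpha, rng)
        likelihood = 1.0
        for w_n in w:
            likelihood *= sum(t * row[w_n] for t, row in zip(theta, beta))
        acc += likelihood
    return acc / num_samples
```

For example, with `alpha = [1.0, 1.0]` and `beta = [[0.9, 0.1], [0.2, 0.8]]`, calling `marginal_mc([0], alpha, beta)` should land close to 0.55, since E[θ] = (0.5, 0.5) under a uniform Dirichlet. Real documents would need a log-space computation to avoid underflow, which this sketch omits for brevity.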
Finally, taking the product of the marginal probabilities of the individual documents, we obtain the probability of a corpus $D$ of $M$ documents:
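$$p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d.$$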
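In practice the corpus probability is usually handled in log form, since a product over many documents underflows quickly. The short derivation below is a sketch under two notational assumptions that are not stated on these slides: $k$ denotes the number of topics, and $\beta_{ij}$ denotes the probability of word $j$ under topic $i$ (with $w_{dn}$ used as a vocabulary index in the subscript).

```latex
\begin{align*}
\log p(D \mid \alpha, \beta)
  &= \sum_{d=1}^{M} \log p(\mathbf{w}_d \mid \alpha, \beta), \\
\sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta)
  &= \sum_{i=1}^{k} \theta_{di}\, \beta_{i, w_{dn}}, \\
\log p(D \mid \alpha, \beta)
  &= \sum_{d=1}^{M} \log \int p(\theta_d \mid \alpha)
     \prod_{n=1}^{N_d} \sum_{i=1}^{k} \theta_{di}\, \beta_{i, w_{dn}} \; d\theta_d .
\end{align*}
```

The first line uses only the product over documents. The second line replaces the sum over the one-hot topic indicator with a sum over topic indices. The integral over $\theta_d$ remains and has no simple closed form.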