Hierarchical Dirichlet Processes


  1. Hierarchical Dirichlet Processes
     AMS 241, Fall 2010
     Vadim von Brzeski, vvonbrze@ucsc.edu

  2. Reference
     • Hierarchical Dirichlet Processes, Y. Teh, M. Jordan, M. Beal, D. Blei, Technical Report 653, Department of Statistics, UC Berkeley, 2004.
       – Also published in NIPS 2004 as Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes.
     • Some figures and equations shown here are taken directly from the above references (indicated where so).

  3. The HDP Prior
     $G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H)$
     $G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0)$ for each group $j$
     $\theta_{ji} \mid G_j \sim G_j$
     $x_{ji} \mid \theta_{ji} \sim F(\theta_{ji})$
     Source: Teh, 2004.

  4. Going back to the original definition of the DP, we can derive the relationship between $G_0$ and $G_j$: for any finite measurable partition $(A_1, \ldots, A_r)$ of the parameter space,
     $(G_j(A_1), \ldots, G_j(A_r)) \mid \alpha_0, G_0 \sim \mathrm{Dir}(\alpha_0 G_0(A_1), \ldots, \alpha_0 G_0(A_r))$
     Source: Teh, 2004.

  5. Since $G_0$ is discrete, with atoms $\{\phi_k\}$, each $G_j$ places its mass on those same atoms:
     $G_0 = \sum_{k=1}^{\infty} \beta_k \delta_{\phi_k}, \quad \beta \mid \gamma \sim \mathrm{GEM}(\gamma)$
     $G_j = \sum_{k=1}^{\infty} \pi_{jk} \delta_{\phi_k}, \quad \pi_j \mid \alpha_0, \beta \sim \mathrm{DP}(\alpha_0, \beta)$
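A minimal R sketch of this construction via truncation at K atoms; the truncation level, the N(0, 1) base measure H, and all variable names below are illustrative assumptions, not from the slides:

  set.seed(1)
  K      <- 50      # truncation level (assumed)
  gam    <- 1.0     # top-level concentration gamma
  alpha0 <- 1.0     # group-level concentration
  J      <- 3       # number of groups

  # G0 ~ DP(gamma, H): weights beta ~ GEM(gamma), atoms phi ~ H = N(0,1) (assumed)
  b    <- rbeta(K, 1, gam)
  beta <- b * cumprod(c(1, 1 - b[-K]))   # stick-breaking weights, sum < 1
  phi  <- rnorm(K)                       # shared atom locations

  # Gj ~ DP(alpha0, G0) <=> pi_j ~ DP(alpha0, beta) over the same atoms;
  # under truncation this is approximately Dirichlet(alpha0 * beta),
  # drawn here as normalized Gammas
  pi_j <- replicate(J, { w <- rgamma(K, shape = alpha0 * beta); w / sum(w) })
  # Every group re-weights the SAME atoms phi, so clusters are shared.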

  6. [Figure: draws of $G_0$ and $G_j$.]

  7. [Figure: $G_0$.]

  8. Prior and Data Model
     $\beta \mid \gamma \sim \mathrm{GEM}(\gamma)$, $\phi_k \mid H \sim H$
     $\pi_j \mid \alpha_0, \beta \sim \mathrm{DP}(\alpha_0, \beta)$
     $z_{ji} \mid \pi_j \sim \pi_j$, $x_{ji} \mid z_{ji}, (\phi_k)_k \sim F(\phi_{z_{ji}})$
     Source: Teh, 2004.

  9. Integrating out $G_j$ and $G_0$ yields the Chinese restaurant franchise:
     $\theta_{ji} \mid \theta_{j1}, \ldots, \theta_{j,i-1}, \alpha_0, G_0 \sim \sum_{t} \frac{n_{jt}}{i-1+\alpha_0}\,\delta_{\psi_{jt}} + \frac{\alpha_0}{i-1+\alpha_0}\,G_0$
     $\psi_{jt} \mid \psi_{11}, \psi_{12}, \ldots, \gamma, H \sim \sum_{k} \frac{m_k}{m_{\cdot\cdot}+\gamma}\,\delta_{\phi_k} + \frac{\gamma}{m_{\cdot\cdot}+\gamma}\,H$
     Source: Teh, 2004.
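A rough R sketch of forward-simulating seating in the Chinese restaurant franchise under this prior; the restaurant and customer counts and all object names are assumptions for illustration:

  set.seed(2)
  alpha0 <- 1.0; gam <- 1.0
  J <- 3; n_per <- 20                  # assumed: 3 restaurants, 20 customers each

  dish_counts  <- integer(0)           # m_k: number of tables serving dish k
  table_dish   <- vector("list", J)    # dish served at each table, per restaurant
  table_counts <- vector("list", J)    # n_jt: customers at each table

  for (j in 1:J) {
    table_dish[[j]] <- integer(0); table_counts[[j]] <- integer(0)
    for (i in 1:n_per) {
      # existing table w.p. prop. to n_jt, new table w.p. prop. to alpha0
      probs <- c(table_counts[[j]], alpha0)
      t <- sample(length(probs), 1, prob = probs)
      if (t > length(table_counts[[j]])) {     # new table: draw its dish
        dprobs <- c(dish_counts, gam)          # existing dish w.p. prop. to m_k
        k <- sample(length(dprobs), 1, prob = dprobs)
        if (k > length(dish_counts)) dish_counts <- c(dish_counts, 0)
        dish_counts[k] <- dish_counts[k] + 1
        table_dish[[j]]   <- c(table_dish[[j]], k)
        table_counts[[j]] <- c(table_counts[[j]], 0)
      }
      table_counts[[j]][t] <- table_counts[[j]][t] + 1
    }
  }
  # Dishes (mixture components) are shared across all J restaurants (groups).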

  10. Application: Topic Modeling
      • Topic = (multinomial) distribution over words
        – Fixed-size vocabulary; p(word | topic)
        – F: multinomial kernel; H: Dirichlet prior
      • Document = mixture of one or more topics
      • Goal = recover latent topics; use topics for clustering, finding related documents, etc.

  11. Simulated Data
      • 3 true topics
      • J = 6 docs (80 - 100 words / doc)
      • Topic proportions p = [0.4, 0.3, 0.3]
      • 2 - 3 mixture components / doc
      • V (vocabulary size) = 10
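An R sketch of generating a corpus like this one; only the sizes (3 topics, J = 6 docs, 80-100 words per doc, p, V = 10) come from the slide, while the topic-word distributions and all names are assumptions:

  set.seed(3)
  V <- 10; J <- 6
  topics <- t(sapply(1:3, function(k) {            # 3 topic-word distributions
    w <- rgamma(V, shape = 0.5); w / sum(w)        # sparse-ish multinomials (assumed)
  }))
  p <- c(0.4, 0.3, 0.3)                            # global topic proportions
  docs <- lapply(1:J, function(j) {
    n_j   <- sample(80:100, 1)                     # words in this document
    k_set <- sample(1:3, sample(2:3, 1))           # 2-3 topics used by this doc
    w_j   <- p[k_set] / sum(p[k_set])              # renormalized topic weights
    z     <- sample(k_set, n_j, replace = TRUE, prob = w_j)    # word-level topics
    sapply(z, function(k) sample(1:V, 1, prob = topics[k, ]))  # word tokens
  })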

  12. Inference via Gibbs Sampling
      [Three numbered sampling updates; equations from Teh, 2004.]
      Source: Teh, 2004.
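As one concrete example of the kind of update involved, here is a sketch of the direct-assignment conditional for a topic indicator $z_{ji}$ (one of the sampling schemes in Teh, 2004); the function and argument names and the symmetric Dirichlet parameter eta are assumptions:

  # p(z_ji = k | rest) is proportional to
  #   (n_jk + alpha0 * beta_k) * f_k(x_ji)   for existing topics k
  #   (alpha0 * beta_u)        * (1 / V)     for a brand-new topic
  resample_z <- function(w, n_jk, beta, beta_u, topic_word_counts,
                         alpha0 = 1, eta = 0.5, V = 10) {
    K <- length(n_jk)
    # f_k(x_ji): multinomial-Dirichlet predictive, using counts that should
    # already exclude word x_ji itself
    lik  <- (topic_word_counts[, w] + eta) /
            (rowSums(topic_word_counts) + V * eta)
    prob <- c((n_jk + alpha0 * beta) * lik,    # existing topics
              alpha0 * beta_u / V)             # new topic
    sample(K + 1, 1, prob = prob)
  }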

  13. TRUTH vs. ESTIMATE
      For each $x_{ji}$ whose true component was $k$, we have $B$ MCMC draws $\{\theta_{ji}^{(1)}, \theta_{ji}^{(2)}, \ldots, \theta_{ji}^{(B)}\}$:
      $\bar{\theta}_{ji}^{(B)} = \frac{1}{B}\sum_{b=1}^{B} \theta_{ji}^{(b)}, \qquad \bar{\theta}_{k} = \frac{1}{n_k}\sum_{ji} \bar{\theta}_{ji}^{(B)}$

  14. Truth vs. Posterior Point and 10/90 Interval Estimates for $E[\theta_j \mid \text{data}]$
      [Figure; legend: True, Estimate.]
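Computing these summaries from stored draws is short in R; 'theta_draws' (an assumed B x N matrix of per-observation MCMC draws) is an illustrative name:

  theta_hat <- colMeans(theta_draws)             # posterior point estimates
  ci_10_90  <- apply(theta_draws, 2, quantile,   # 10/90 interval per observation
                     probs = c(0.10, 0.90))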

  15. Simulated Data Histograms vs. Est. Posterior Predictive: $E[\theta_{j0} \mid \text{data}]$
      Draws $\theta_{j0}^{(b)}$ via the CRP configuration at state $b$; for each doc $j$, average the draws of $\theta_{j0}$ over states $b = 1, \ldots, B$.
      [Figure; legend: Data, Est. Post. Predictive.]
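A sketch of how one such predictive draw could be made from the CRP configuration at a given state; 'n_jt', 'psi_jt', and 'draw_from_G0' are assumed names:

  draw_theta_j0 <- function(n_jt, psi_jt, alpha0, draw_from_G0) {
    probs <- c(n_jt, alpha0)                   # existing tables, or a new table
    t <- sample(length(probs), 1, prob = probs)
    if (t <= length(n_jt)) psi_jt[t] else draw_from_G0()  # new table: draw from G0
  }
  # Averaging such draws over states b = 1..B gives the estimate on this slide.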

  16. Simulated Data Distributions vs. Est. Posterior Predictive for New Observation $x_{j0}$
      [Figure; legend: Data histogram, Data density est., Predictive.]

  17. R Code Available
      • Works, but SLOOOOOOOOOW….
      • http://www.numberjack.net/download/classes/ams241/project/R
