Hierarchical Dirichlet Processes, AMS 241, Fall 2010, Vadim von Brzeski (PowerPoint presentation)

SLIDE 1

Hierarchical Dirichlet Processes

AMS 241, Fall 2010 Vadim von Brzeski vvonbrze@ucsc.edu

SLIDE 2

Reference

  • Hierarchical Dirichlet Processes, Y. Teh, M. Jordan, M. Beal, D. Blei, Technical Report 653, Statistics, UC Berkeley, 2004.
    – Also published in NIPS 2004 as Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes.
  • Some figures and equations shown here are taken directly from the above references (indicated where so).

SLIDE 3

The HDP Prior

G0 | γ, H ~ DP(γ, H)

Gj | α0, G0 ~ DP(α0, G0),   for each group j

Source: Teh, 2004.
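The two-level prior above can be sketched numerically with a truncated stick-breaking construction (the truncation level K, the concentration values, and the seed are illustrative assumptions; with a finite K, πj | α0, β ~ Dirichlet(α0·β) stands in for DP(α0, β)):

```python
import numpy as np

rng = np.random.default_rng(0)

def gem(gamma, K):
    """Truncated stick-breaking draw: beta ~ GEM(gamma) with K sticks."""
    v = rng.beta(1.0, gamma, size=K)
    # beta_k = v_k * prod_{l < k} (1 - v_l)
    beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    beta[-1] += 1.0 - beta.sum()  # fold leftover stick mass into the last weight
    return beta

gamma, alpha0, K, J = 1.0, 1.0, 25, 3
beta = gem(gamma, K)                       # global weights of G0
pi = rng.dirichlet(alpha0 * beta, size=J)  # group weights: finite-K stand-in for DP(alpha0, beta)
```

Each row of pi reweights the same K global atoms, which is what lets groups share mixture components.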

SLIDE 4

G0 = Σk βk δφk,   β | γ ~ GEM(γ)   (stick-breaking)

Gj = Σk πjk δφk

Source: Teh, 2004.

Going back to the original definition of the DP, we can derive the relationship between πj and β:

SLIDE 5

(Gj(A1), …, Gj(Ar)) | α0, G0 ~ Dir(α0 G0(A1), …, α0 G0(Ar))

(πj(A1), …, πj(Ar)) | α0, β ~ Dir(α0 β(A1), …, α0 β(Ar))

⇒ πj | α0, β ~ DP(α0, β)
SLIDE 6

(Figure: draws of G0 and Gj.)

SLIDE 7
(Figure: a draw of G0.)

SLIDE 8

Prior and Data Model

G0 | γ, H ~ DP(γ, H)

Gj | α0, G0 ~ DP(α0, G0)

θji | Gj ~ Gj

xji | θji ~ F(θji)

Source: Teh, 2004.
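A minimal generative sketch of the prior-plus-data model above, using a truncated stick-breaking prior; the Gaussian kernel F, the base measure H = N(0, 3²), and all constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
K, J, n = 15, 4, 50          # truncation level, groups, observations per group
gamma, alpha0 = 1.0, 1.0

# G0 = sum_k beta_k * delta(phi_k): atoms from H, weights from (truncated) GEM(gamma)
v = rng.beta(1.0, gamma, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
beta[-1] += 1.0 - beta.sum()
phi = rng.normal(0.0, 3.0, size=K)       # H = N(0, 3^2): assumed base measure

x = np.empty((J, n))
for j in range(J):
    pi_j = rng.dirichlet(alpha0 * beta)  # G_j shares the atoms phi, reweighted
    z = rng.choice(K, size=n, p=pi_j)    # theta_ji = phi[z_ji], a draw from G_j
    x[j] = rng.normal(phi[z], 0.5)       # x_ji ~ F(theta_ji) = N(theta_ji, 0.5^2)
```

Because every group draws its θji from the same atom set phi, clusters recur across groups with group-specific frequencies.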

SLIDE 9


Source: Teh, 2004.

SLIDE 10

Application: Topic Modeling

  • Topic = (multinomial) distribution over words
    – Fixed-size vocabulary; p(word | topic)
    – F: multinomial kernel; H: Dirichlet()
  • Document = mixture of one or more topics
  • Goal = recover latent topics; use topics for clustering, finding related documents, etc.

SLIDE 11

  • 3 true topics; p = [0.4, 0.3, 0.3]
  • J = 6 docs (80–100 words / doc)
  • 2–3 mixture components / doc
  • V (vocabulary size) = 10
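A sketch of a data generator matching these settings; the Dirichlet(0.5) topic sharpness, the seed, and the reading of p as the probabilities used to select each doc's components are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
V, J, T = 10, 6, 3
p_true = np.array([0.4, 0.3, 0.3])               # p from the slide (assumed: component-selection probs)
topics = rng.dirichlet(np.full(V, 0.5), size=T)  # 3 true topics over V=10 words

docs = []
for j in range(J):
    n_comp = int(rng.integers(2, 4))                            # 2-3 components per doc
    comp = rng.choice(T, size=n_comp, replace=False, p=p_true)  # which topics this doc uses
    w = rng.dirichlet(np.ones(n_comp))                          # within-doc mixing weights
    n_words = int(rng.integers(80, 101))                        # 80-100 words per doc
    z = comp[rng.choice(n_comp, size=n_words, p=w)]             # topic per word
    words = np.array([rng.choice(V, p=topics[k]) for k in z])   # word ~ Multinomial(topics[z])
    docs.append(words)
```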
SLIDE 12

Inference via Gibbs Sampling


Source: Teh, 2004.
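The slide's sampling equations did not survive extraction; as a generic illustration of the CRP-style seating probabilities at the heart of such Gibbs samplers (a plain Chinese restaurant process step, not the paper's exact franchise scheme), consider:

```python
import numpy as np

def crp_seating_probs(table_counts, alpha):
    """P(customer joins existing table t) ∝ n_t; P(customer opens a new table) ∝ alpha."""
    counts = np.asarray(table_counts, dtype=float)
    weights = np.append(counts, alpha)   # last entry corresponds to a new table
    return weights / weights.sum()

probs = crp_seating_probs([3, 1, 2], alpha=1.0)
# probs == [3/7, 1/7, 2/7, 1/7]
```

In a real posterior sampler each weight is also multiplied by the likelihood of the data point under that table's component before normalizing.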

SLIDE 13

TRUTH:

For each xji whose true component was k, we have B MCMC draws:

  {θji(1), θji(2), …, θji(B)}

  θ̄ji = (1/B) Σb θji(b)

ESTIMATE:

  θ̂k = (1/nk) Σji θ̄ji   (sum over the nk observations ji whose true component is k)
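These averages can be computed directly from stored draws; a toy sketch with assumed shapes and values (a single true component k with mean 2.0):

```python
import numpy as np

rng = np.random.default_rng(4)
B, N = 200, 30                                    # MCMC states, observations
theta_draws = rng.normal(2.0, 0.1, size=(B, N))   # theta_ji^(b): toy stand-in draws
true_k = np.zeros(N, dtype=int)                   # here every point's true component is k=0

theta_bar = theta_draws.mean(axis=0)              # per-observation average over states b = 1..B
mask = true_k == 0
theta_hat_k = theta_bar[mask].mean()              # component estimate: average over its n_k points
```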

SLIDE 14

Truth vs. Posterior Point and 10/90 Interval Estimates for E[θj | data]

(Plot: True θj vs. Estimate.)

SLIDE 15

Simulated Data Histograms vs. Est. Posterior Predictive: E[xj0 | data]

For each doc j: average (over states b = 1..B) the draws of xj0(b) via the CRP configuration at state b.

(Plot legend: Data; Est. Post. Predictive.)

SLIDE 16

Simulated Data Distributions vs. Est. Posterior Predictive for New Observation xj0

(Plot legend: Data histogram; Data density est.; Predictive x0.)

SLIDE 17

R Code Available

  • Works, but SLOOOOOOOOOW….

http://www.numberjack.net/download/classes/ams241/project/R