SLIDE 1

Admixture of Poisson MRFs (APM): A Topic Model with Word Dependencies

David Inouye*, Pradeep Ravikumar, Inderjit Dhillon

Tuesday, June 24, 2014

* Presenter

David Inouye*, Pradeep Ravikumar, Inderjit Dhillon Admixture of Poisson MRFs (ICML 2014, Beijing, China)

SLIDE 2

Possible applications:

  • Topic Visualization
  • Corpus Summarization
  • Word Sense Disambiguation
  • Semantic Similarity
  • Document Classification

Document Corpus

  • Doc. 1 - Nuclear power, or nuclear energy, is the use of exothermic nuclear processes...
  • Doc. 2 - Theatre or theater is a collaborative form of fine art that uses live performers ...
  • Doc. 3 - A temperature is a numerical measure of hot or cold. Its measurement is….
  • Doc. 4 - Music is an art form whose medium is sound and silence…

[Figure: the document corpus is modeled either by Independent Models (LDA, PLSA, SAM, etc.; k Multinomials) or by a Dependent Model (Admixture of Poisson MRFs; k Poisson MRFs). Example topics: “Temperature” (nuclear, heat, sun, temperature, soviet) and “Fine Arts” (theater, music, plays, novels, life).]

◮ Previous topic models assume independence between words.
◮ An Admixture of Poisson MRFs (APM), however, explicitly models word dependencies.

SLIDE 3

Main Contributions

  • 1. Generalized Admixtures
  • 2. (Background) Poisson MRF [Yang et al. 2012]

    ◮ Poisson MRFs in the context of LDA
    ◮ Novel conjugate prior for a Poisson MRF

  • 3. Admixture of Poisson MRFs (APM)
  • 4. Tractable MAP parameter estimation

SLIDE 4

Formalizing Generalized Admixtures

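◮ Mixtures - Draws from a single component distribution. (Top)
◮ Admixtures - Draws from a distribution whose parameters are a convex combination of component parameters. (Bottom)

\[
\Pr_{\text{Admix.}}(x \mid w, \Phi) = \Pr_{\text{Base}}\Big( x \,\Big|\, \bar{\phi} = \Psi^{-1}\Big[ \sum_{j=1}^{k} w_j \Psi(\phi_j) \Big] \Big)
\]

◮ Examples of different Ψ

\[
\Pr_{\text{Admix.}}(x \mid w, \lambda_{1 \ldots k}) = \Pr_{\text{Poiss.}}\Big( x \,\Big|\, \bar{\lambda} = \sum_{j=1}^{k} w_j \lambda_j \Big)
\]

\[
\Pr_{\text{Admix.}}(x \mid w, \lambda_{1 \ldots k}) = \Pr_{\text{Poiss.}}\Big( x \,\Big|\, \bar{\lambda} = \exp\Big( \sum_{j=1}^{k} w_j \ln(\lambda_j) \Big) \Big)
\]

[Figure: two panels with axes x1 and x2. Top: "Documents" and Mixture Components. Bottom: Dense "Topic", Sparse "Document", Dense "Document", Sparse "Topic".]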
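The two Poisson examples above differ only in the choice of Ψ (identity versus natural log). A minimal sketch, assuming NumPy and a hypothetical helper name `admix_poisson_rate` that does not come from the talk, shows how the admixed rate is formed in each case:

```python
import numpy as np

def admix_poisson_rate(w, lams, link="identity"):
    """Combine k component Poisson rates into one admixed rate.

    w    : (k,) admixture weights, assumed nonnegative and summing to 1
    lams : (k, p) component rates, assumed strictly positive
    link : "identity" mixes the rates directly (Psi = identity);
           "log" mixes the log-rates (Psi = ln), i.e. a weighted geometric mean
    """
    w = np.asarray(w, dtype=float)
    lams = np.asarray(lams, dtype=float)
    if link == "identity":
        return w @ lams
    if link == "log":
        return np.exp(w @ np.log(lams))
    raise ValueError(f"unknown link: {link}")

w = [0.5, 0.5]
lams = [[1.0, 4.0], [4.0, 1.0]]
arithmetic = admix_poisson_rate(w, lams, "identity")  # weighted arithmetic mean
geometric = admix_poisson_rate(w, lams, "log")        # weighted geometric mean
```

With equal weights, the identity link averages the rates while the log link takes their geometric mean, so the two admixtures generally give different document-level rates from the same components.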

SLIDE 5

Examples of Admixture Models

  • 1. LDA [Blei et al. 2003]

    ◮ LDA is an admixture of Multinomials (i.e. Mult(p1), Mult(p2), · · · , Mult(pk))
    ◮ Dirichlet prior over p1...k

  • 2. Population Admixtures

    ◮ Equivalent model to LDA in genetics [Pritchard et al. 2000]
    ◮ Admixture term comes from genetics literature
    ◮ Original ancestors of population correspond to “topics”
    ◮ Individuals of a population correspond to “documents”

  • 3. Spherical Admixture Model [Reisinger et al. 2010]

    ◮ Von Mises-Fisher base distribution (an independent Gaussian analog on unit hypersphere)
    ◮ Von Mises-Fisher priors

SLIDE 6

Background: Poisson MRFs [Yang et al., 2012]

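[Figure: node conditionals P(A | B, C), P(B | A, C), P(C | A, B) and the joint P(A, B, C), marked "??".]

If we assume the node conditional distributions are Poisson, does there exist a joint MRF distribution that has these conditionals?

◮ Poisson MRF joint distribution:

\[
\Pr_{\text{PMRF}}(x \mid \theta, \Theta) \propto \exp\Big\{ \theta^T x + x^T \Theta x - \sum_{s=1}^{p} \ln(x_s!) \Big\}.
\]

◮ Node conditionals are 1-D Poissons:

\[
\Pr(x_s \mid x_{-s}, \theta_s, \Theta_s) \propto \exp\Big\{ \underbrace{(\theta_s + x^T \Theta_s)}_{\eta_s} \, x_s - \ln(x_s!) \Big\}.
\]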
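As an illustration of the node conditional above, the following sketch computes the rate exp(η_s) of one word's count given the other counts. It assumes NumPy, takes Θ_s to be column s of Θ, and assumes a zero diagonal so that x_s does not enter its own conditional; the function name is hypothetical.

```python
import numpy as np

def node_conditional_rate(x, theta, Theta, s):
    """Rate of the 1-D Poisson for word s given all other word counts.

    Follows eta_s = theta_s + x^T Theta_s from the slide.
    Assumption: Theta has a zero diagonal.
    """
    eta_s = theta[s] + x @ Theta[:, s]
    return np.exp(eta_s)

theta = np.zeros(2)
Theta = np.array([[0.0, -0.1],
                  [-0.1, 0.0]])   # negative dependency between the two words
x = np.array([3.0, 0.0])
rate_word_2 = node_conditional_rate(x, theta, Theta, s=1)
```

Here a count of 3 for the first word lowers the conditional rate of the second word from exp(0) to exp(-0.3), which is the negative-dependency behavior the model allows.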

SLIDE 7

[Figure: three panels, "Independent PMRF", "Positive Dependency PMRF", and "Negative Dependency PMRF"; axes: Count of Word 1 and Count of Word 2.]

  • 1. Each conditional ("slice") of a PMRF is 1-D Poisson.
  • 2. Distinct from Gaussian MRF
  • 3. Positive dependencies can model word co-occurrence.(a)

(a) See [Yang et al. 2013] for SPMRF model that allows for positive dependencies.

SLIDE 8

Poisson MRFs in the Context of LDA

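◮ LDA uses Multinomial distributions, but if the parameter N ∼ Poisson(x̃ = Σ_{s=1}^p x_s | λ̃ = Σ_{s=1}^p λ_s), then the joint distribution is an independent Poisson model:(1)

\[
\begin{aligned}
\Pr_{\text{Poiss}}(\tilde{x} \mid \tilde{\lambda}) \,
\Pr_{\text{Mult}}\big( x \mid \theta = (\lambda_1, \cdots, \lambda_p)/\tilde{\lambda},\; N = \tilde{x} \big)
&= \frac{e^{-\tilde{\lambda}} \tilde{\lambda}^{\tilde{x}}}{\tilde{x}!} \cdot
   \frac{\tilde{x}!}{\prod_{s=1}^{p} x_s!} \prod_{s=1}^{p} \Big( \frac{\lambda_s}{\tilde{\lambda}} \Big)^{x_s} \\
&= \frac{\tilde{x}!}{\tilde{x}!} \cdot \frac{e^{-\tilde{\lambda}}}{\prod_{s=1}^{p} x_s!}
   \prod_{s=1}^{p} \Big( \frac{\tilde{\lambda} \lambda_s}{\tilde{\lambda}} \Big)^{x_s} \\
&= \Pr_{\text{Ind. Poiss}}(x \mid \lambda_1, \cdots, \lambda_p)
 = \prod_{s=1}^{p} \frac{e^{-\lambda_s} \lambda_s^{x_s}}{x_s!}
\end{aligned}
\]

◮ Therefore, the topic-word distribution of LDA can be viewed as a special case of a Poisson MRF.

(1) Gopalan et al. (2013) recently introduced the connection between LDA and independent Poissons in the context of matrix factorization.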
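The identity in this derivation can be checked numerically. The sketch below uses only the Python standard library; the helper names and the example counts and rates are illustrative choices, not values from the talk.

```python
import math

def poisson_pmf(x, lam):
    """Pr(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def multinomial_pmf(x, probs):
    """Pr(x) for a Multinomial with N = sum(x) trials and cell probabilities probs."""
    coef = math.factorial(sum(x))
    for xs in x:
        coef //= math.factorial(xs)   # exact: each partial quotient is an integer
    return coef * math.prod(p ** xs for p, xs in zip(probs, x))

x = [2, 0, 3]               # word counts for one document
lam = [1.5, 0.5, 2.0]       # per-word Poisson rates
lam_total = sum(lam)

# Poisson over the document length times a Multinomial over the words ...
lhs = poisson_pmf(sum(x), lam_total) * multinomial_pmf(x, [l / lam_total for l in lam])
# ... equals a product of independent Poissons, one per word.
rhs = math.prod(poisson_pmf(xs, l) for xs, l in zip(x, lam))
```

The two quantities `lhs` and `rhs` agree up to floating-point error for any nonnegative integer counts and positive rates.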

SLIDE 9

Novel conjugate prior for a Poisson MRF

◮ Form of a conjugate prior:

\[
\Pr(\theta, \Theta) \propto \exp\big\{ \beta^T \theta + \beta^T \Theta \beta - \gamma A(\theta, \Theta) - \lambda_\theta \|\theta\|_2^2 - \lambda \|\mathrm{vec}(\Theta)\|_1 \big\},
\]

where A(θ, Θ) is the log partition function of a PMRF.(2)

◮ The λ‖vec(Θ)‖_1 term encourages sparsity in Θ (i.e. a Laplace prior on Θ).
◮ β can be viewed as adding pseudo-counts (similar to a Dirichlet prior for a Multinomial)

(2) λ_θ‖θ‖_2^2 and λ‖vec(Θ)‖_1 are needed for normalization of this prior distribution. In practice, λ_θ can be set arbitrarily small and is thus ignored in subsequent discussion.

SLIDE 10

Admixture of Poisson MRFs (APM)

◮ Poisson MRF base distribution
◮ Priors

    ◮ Dirichlet prior on admixture weights
    ◮ Conjugate prior on component PMRFs

\[
\Pr_{\text{APM}}(x, w, \theta^{1 \ldots k}, \Theta^{1 \ldots k})
= \Pr_{\text{PMRF}}\Big( x \,\Big|\, \bar{\theta} = \sum_{j=1}^{k} w_j \theta^j,\; \bar{\Theta} = \sum_{j=1}^{k} w_j \Theta^j \Big)
\, \Pr_{\text{Dir}}(w) \prod_{j=1}^{k} \Pr(\theta^j, \Theta^j)
\]

◮ Topics → graphs over words (from PMRF parameters)
◮ Documents → weights over topics (dimensionality reduction)
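To make the admixing step concrete, the sketch below forms the document-specific PMRF parameters as weighted sums of the k topic parameters. It assumes NumPy arrays with the shapes given in the docstring; the function name and the toy values are hypothetical.

```python
import numpy as np

def admix_pmrf_params(w, thetas, Thetas):
    """Document-specific PMRF parameters as convex combinations of topic parameters.

    w      : (k,) topic weights for one document
    thetas : (k, p) node parameters, one row per topic
    Thetas : (k, p, p) edge parameters, one matrix per topic
    """
    theta_bar = np.tensordot(w, thetas, axes=1)   # shape (p,)
    Theta_bar = np.tensordot(w, Thetas, axes=1)   # shape (p, p)
    return theta_bar, Theta_bar

w = np.array([0.25, 0.75])
thetas = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
Thetas = np.array([[[0.0, 1.0], [1.0, 0.0]],      # topic 1: positive edge
                   [[0.0, -1.0], [-1.0, 0.0]]])   # topic 2: negative edge
theta_bar, Theta_bar = admix_pmrf_params(w, thetas, Thetas)
```

Each topic contributes a graph over words, and a document's weights decide how strongly each topic's edges appear in that document's own graph.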

SLIDE 11

Parameter Estimation using Approximate Posterior

◮ Because the Poisson MRF likelihood does not have a closed-form solution, we approximate the likelihood with the pseudo log-likelihood:

\[
\mathcal{L} \approx \hat{\mathcal{L}}(X \mid W, \theta^{1 \ldots k}, \Theta^{1 \ldots k})
= \sum_{i=1}^{n} \sum_{s=1}^{p} \underbrace{\big[ \eta_{is} x_{is} - \ln(x_{is}!) - A(\eta_{is}) \big]}_{\text{Conditional Poisson log-likelihood}},
\]

where η_is = Σ_{j=1}^k w_ij (θ^j_s + x_i^T Θ^j_s) is the canonical parameter of a univariate Poisson (i.e. λ_is = exp(η_is)).
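The pseudo log-likelihood can be evaluated directly from the definition of η_is. The sketch below assumes NumPy, uses A(η) = exp(η) (the log partition function of a univariate Poisson in canonical form), and assumes each Θ^j has a zero diagonal; names and shapes are illustrative rather than taken from the authors' code.

```python
import numpy as np
from math import lgamma

def pseudo_log_likelihood(X, W, thetas, Thetas):
    """Sum of conditional Poisson log-likelihoods over documents and words.

    X      : (n, p) word counts
    W      : (n, k) admixture weights, one row per document
    thetas : (k, p) node parameters per topic
    Thetas : (k, p, p) edge parameters per topic (zero diagonal assumed)
    """
    X = np.asarray(X, dtype=float)
    # eta[i, s] = sum_j W[i, j] * (thetas[j, s] + x_i^T Thetas[j, :, s])
    eta = W @ thetas + np.einsum("ij,it,jts->is", W, X, Thetas)
    log_factorial = np.vectorize(lgamma)(X + 1.0)   # ln(x!) = lgamma(x + 1)
    return float(np.sum(eta * X - log_factorial - np.exp(eta)))

X = np.array([[1.0, 2.0]])
W = np.array([[1.0]])
thetas = np.zeros((1, 2))
Thetas = np.array([[[0.0, 0.5],
                    [0.5, 0.0]]])
value = pseudo_log_likelihood(X, W, thetas, Thetas)
```

In this toy case η = (1.0, 0.5), so the value is 2 − ln 2 − e − e^0.5; with all Θ^j set to zero the function reduces to an independent Poisson log-likelihood.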

SLIDE 12

Tractable MAP Parameter Estimation

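◮ The approximate log posterior is:

\[
\ln P(W, \theta^{1 \ldots k}, \Theta^{1 \ldots k} \mid X) \approx \hat{\mathcal{L}} + \ln(\text{priors})
\propto \sum_{i=1}^{n} \Bigg[ \sum_{s=1}^{p} \Big( \eta_{is} \underbrace{(x_{is} + \beta_s)}_{\text{pseudo-counts}} - (\gamma + 1) A(\eta_{is}) \Big)
+ \underbrace{(\alpha - 1)^T \ln(w_i)}_{\text{Dirichlet prior}} \Bigg]
- \underbrace{\sum_{j=1}^{k} \lambda \|\Theta^j\|_1}_{\ell_1 \text{ penalty for sparsity}}
\]

◮ A MAP parameter estimate can be computed by the following:

\[
\arg\min_{W, \theta^{1 \ldots k}, \Theta^{1 \ldots k}}
\underbrace{-f(W, \theta^{1 \ldots k}, \Theta^{1 \ldots k})}_{\text{differentiable}}
+ \underbrace{\delta_{\mathcal{W}}(W) + \lambda \sum_{j=1}^{k} \|\Theta^j\|_1}_{\text{nonsmooth but convex}}
\]

◮ A proximal gradient method can be used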
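A generic proximal gradient iteration for a smooth loss plus an ℓ1 penalty alternates a gradient step with soft-thresholding. The sketch below is not the authors' algorithm: it handles only the ℓ1 term (the constraint on W is omitted), uses a fixed step size, and all names are hypothetical.

```python
import numpy as np

def soft_threshold(Z, tau):
    """Proximal operator of tau * ||Z||_1: shrink every entry toward zero by tau."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def proximal_gradient_l1(grad_f, Theta0, lam, step, n_iter=100):
    """Minimize f(Theta) + lam * ||Theta||_1 given the gradient of the smooth part f."""
    Theta = np.array(Theta0, dtype=float)
    for _ in range(n_iter):
        # Gradient step on the smooth part, then the prox of the l1 penalty.
        Theta = soft_threshold(Theta - step * grad_f(Theta), step * lam)
    return Theta

# Toy smooth part: f(Theta) = 0.5 * ||Theta - B||_F^2, whose gradient is Theta - B.
B = np.array([[3.0, -0.5],
              [0.2, -2.0]])
Theta_hat = proximal_gradient_l1(lambda T: T - B, np.zeros_like(B), lam=1.0, step=1.0)
```

For this toy problem the minimizer is the soft-thresholded B, so small entries are set exactly to zero; this is the mechanism by which the ℓ1 penalty yields sparse edge matrices.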

SLIDE 13

Qualitative Experiment

[Figure: learned word-dependency graphs for two topics.
(a) “Music and Fine Arts” - node words: music, million, country, language, literature, languages, sound, persons, roman, church, movement, rome, emperor, modern, development, western, air, eastern, foreign, research, heat, musical, production, stage, theater, century, kingdom, plays, human, late, spanish, eng, india, centuries, spain, introduced, wood, life, indian, novel, died, novels, french, france, jean, indians.
(b) “Temperature” - node words: energy, earth, surface, field, nuclear, rock, space, soviet, peter, russian, heat, sun, gas, temperature, mm, deg, temperatures, average, annual.]

◮ APM topic visualizations are intuitive (versus list of word representations in LDA/PLSA)
◮ Explicit structure such as word chains (e.g. [plays − theater − musical − music])
◮ Interesting central words (e.g. [theater], [music], [temperature])

SLIDE 14

Preliminary Coherence Experiments

Dataset             # of Words    # of Documents
CMU 20 Newsgroup    200           18,846

◮ UMass Coherence Metric [Mimno et al. 2011]

\[
\mathrm{coh}_{\text{UMass}}(t) = \sum_{a=2}^{m} \sum_{b=1}^{a-1} \ln\Big( \frac{D(v_a, v_b) + \epsilon}{D(v_b)} \Big)
\]

[Plot: UMass Coherence Score vs. Num. Topics (5 to 15) for LDA and APM.]

◮ Pointwise Mutual Info. [Newman et al. 2010]

\[
\mathrm{coh}_{\text{PMI}}(t) = \frac{1}{m(m-1)} \sum_{a=1}^{m} \sum_{b \neq a} \ln\Big( \frac{\Pr(v_a, v_b) + \epsilon}{\Pr(v_a) \Pr(v_b)} \Big)
\]

[Plot: PMI Coherence Score vs. Num. Topics (5 to 15) for LDA and APM.]
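The UMass metric only needs document frequencies, so it is easy to compute on a toy corpus. In the sketch below, D(v) is the number of documents containing word v and D(v_a, v_b) the number containing both; the default ε = 1, the function name, and the toy corpus are assumptions made for illustration.

```python
import math

def umass_coherence(top_words, docs, eps=1.0):
    """UMass coherence of one topic, given its top words in ranked order.

    top_words : list of m words (v_1, ..., v_m)
    docs      : list of tokenized documents
    eps       : smoothing constant (assumed value; not stated on the slide)
    Every top word must occur in at least one document, otherwise D(v_b) = 0.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for a in range(1, len(top_words)):
        for b in range(a):
            co = doc_freq(top_words[a], top_words[b])
            score += math.log((co + eps) / doc_freq(top_words[b]))
    return score

docs = [["music", "theater"], ["music", "plays"], ["heat", "sun"]]
coherent = umass_coherence(["music", "theater"], docs)
incoherent = umass_coherence(["music", "heat"], docs)
```

Word pairs that never share a document pull the score down, so `incoherent` is lower than `coherent` on this toy corpus.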

SLIDE 15

Summary

◮ Introduced Admixture of Poisson MRFs, which explicitly models word dependencies
◮ Formalized a class of models called admixtures that generalizes previous topic models
◮ Provided tractable MAP parameter estimation
◮ Showed preliminary results on datasets

SLIDE 16

Future Work

◮ Scalability

    ◮ Obvious concern since number of parameters is O(p^2)
    ◮ Faster, parallel parameter estimation algorithm (promising initial work on this)

◮ Empirical Experiments

    ◮ Evaluate semantic meaningfulness of edges in PMRF graph (promising initial work on this)
    ◮ Word Sense Disambiguation (WSD)
    ◮ Document classification

◮ Visualization

    ◮ Visualize topics
    ◮ Visualize documents
    ◮ Visual information retrieval

SLIDE 17

Thanks for listening!

◮ Blei, D., Ng, A., and Jordan, M. Latent dirichlet allocation. JMLR, 3:993-1022, 2003.
◮ Mimno, D., Wallach, H. M., Talley, E., Leenders, M., and McCallum, A. Optimizing semantic coherence in topic models. In EMNLP, pp. 262-272, 2011.
◮ Newman, D., Noh, Y., Talley, E., Karimi, S., and Baldwin, T. Evaluating topic models for digital libraries. In Proc. of ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 215-224, 2010.
◮ Pritchard, J. K., Stephens, M., and Donnelly, P. Inference of population structure using multilocus genotype data. Genetics, 155(2):945-59, June 2000.
◮ Reisinger, J., Waters, A., Silverthorn, B., and Mooney, R. J. Spherical topic models. In ICML, pp. 903-910, 2010.
◮ Yang, E., Ravikumar, P., Allen, G. I., and Liu, Z. Graphical models via generalized linear models. In NIPS, pp. 1367-1375, 2012.
◮ Yang, E., Ravikumar, P., Allen, G., and Liu, Z. On poisson graphical models. In NIPS, pp. 1718-1726, 2013.
