A Nonparametric N-Gram Topic Model with Interpretable Latent Topics


SLIDE 1

A Nonparametric N-Gram Topic Model with Interpretable Latent Topics

Shoaib Jameel and Wai Lam

The Chinese University of Hong Kong

One line summary of the work

We will see how maintaining the order of words in a document helps improve the qualitative and quantitative results of a nonparametric topic model.

Shoaib Jameel and Wai Lam AIRS-2013, Singapore 1 / 9

SLIDE 2

Introduction and Motivation

Popular nonparametric topic models such as Hierarchical Dirichlet Processes (HDP) assume a bag-of-words paradigm. They thus lose important collocation information in the document.

Example

They cannot capture a compound word such as “neural network” in a topic. Parametric n-gram topic models also exist, but they require the user to supply the number of topics. In addition, the bag-of-words assumption makes the discovered latent topics less interpretable.

SLIDE 3

Related Work

Our work extends the Hierarchical Dirichlet Processes (HDP) model (Teh et al. JASA-2006). Goldwater et al. (ACL-2006) presented nonparametric word segmentation models in which the order of words in the document is maintained. Deane (ACL-2005) presented a nonparametric approach to extract phrasal terms. Parametric n-gram topic models such as the Bigram Topic Model (Wallach. ICML-2006), the LDA-Collocation Model (Griffiths et al. Psy. Rev-2007), and the Topical N-gram Model (Wang et al. ICDM-2007) all need the number of topics to be specified by the user.

SLIDE 4

Background - HDP Model

The Hierarchical Dirichlet Processes (HDP) model, when used as a topic model, discovers the latent topics in a collection without requiring the number of topics to be specified by the user. It can be regarded as a nonparametric version of Latent Dirichlet Allocation (LDA) (Blei et al., JMLR-2003).

Generative Process

G0 | γ, H ∼ DP(γ, H)
Gd | α, G0 ∼ DP(α, G0)
zdi | Gd ∼ Gd
wdi | zdi ∼ Multinomial(zdi)

γ and α are the concentration parameters. The base probability measure H provides the prior distribution for the factors (topics) zdi.

Graphical Model in Plate Diagram

[Plate diagram: H and γ generate G0; α and G0 generate Gd; Gd generates zdi, which generates wdi, replicated over the Nd words of each of the D documents.]
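The generative process above can be sketched with a truncated stick-breaking approximation; all hyperparameter values, the truncation level, and the toy vocabulary below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(concentration, truncation):
    """Weights of a Dirichlet-process draw via truncated stick-breaking."""
    b = rng.beta(1.0, concentration, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - b)[:-1]))
    return b * remaining

K, V, Nd = 20, 5, 10                      # truncation, vocab size, doc length
g0 = stick_breaking(1.0, K)               # G0 | gamma, H  (gamma = 1.0)
# Gd | alpha, G0: with a finite atom set, Gd's weights are ~ Dirichlet(alpha * G0);
# the small floor keeps the Dirichlet sampler numerically stable.
gd = rng.dirichlet(50.0 * g0 + 1e-3)
phi = rng.dirichlet(np.ones(V), size=K)   # topic-word distributions drawn from H
z = rng.choice(K, size=Nd, p=gd)          # z_di | Gd
w = np.array([rng.choice(V, p=phi[k]) for k in z])  # w_di | z_di ~ Multinomial
```

Because Gd resamples the atoms of the shared G0, topics are shared across documents while each document keeps its own topic proportions.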

SLIDE 5

N-gram Hierarchical Dirichlet Processes Model

This is our proposed model, called NHDP.

Our Model - NHDP

We introduce a set of binary indicator variables between consecutive words in the HDP model. A binary variable is set to 1 if the adjacent words form a bigram, and to 0 otherwise. If consecutive words form a bigram, the second word is generated solely from the previous word; unigram words are generated from the topic. This framework lets us capture the word order in the document.

SLIDE 6

N-gram Hierarchical Dirichlet Processes Model

Generative Process

G0 | γ, H ∼ DP(γ, H)
Gd | α, G0 ∼ DP(α, G0)
zdi | Gd ∼ Gd
xdi | wd,i−1 ∼ Bernoulli(ψwd,i−1)
if xdi = 1 then
    wdi | wd,i−1 ∼ Multinomial(σwd,i−1)
else
    wdi | zdi ∼ F(zdi)
end

Graphical Model in Plate Diagram

[Plate diagram: the HDP structure (H, γ → G0; α, G0 → Gd → zdi) is augmented with indicator variables xdi governed by ψ (prior ǫ) and bigram successor distributions σ (prior δ), both replicated V times; each word wdi depends on its topic zdi or, when xdi = 1, on the previous word wd,i−1, over the D documents.]

As one can see, our model can capture word dependencies in the text data.
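The word-level part of this generative story can be sketched as follows; the vocabulary, the hyperparameters, and the fixed topic mixture doc_topic are toy assumptions standing in for actual draws from Gd:

```python
import numpy as np

rng = np.random.default_rng(1)

VOCAB = ["neural", "network", "model", "data", "topic"]
V, K = len(VOCAB), 3

psi = rng.beta(1.0, 1.0, size=V)           # psi_w: P(x = 1 | previous word w)
sigma = rng.dirichlet(np.ones(V), size=V)  # sigma_w: bigram successor distribution
F = rng.dirichlet(np.ones(V), size=K)      # F(z): per-topic word distribution
doc_topic = rng.dirichlet(np.ones(K))      # toy stand-in for a draw from Gd

def generate_document(length=12):
    z0 = rng.choice(K, p=doc_topic)
    words = [rng.choice(V, p=F[z0])]       # the first word comes from a topic
    for _ in range(length - 1):
        z = rng.choice(K, p=doc_topic)         # z_di | Gd
        x = rng.random() < psi[words[-1]]      # x_di | w_{d,i-1} ~ Bernoulli(psi)
        if x:
            w = rng.choice(V, p=sigma[words[-1]])  # bigram: previous word only
        else:
            w = rng.choice(V, p=F[z])              # unigram: from the topic
        words.append(w)
    return [VOCAB[w] for w in words]

doc = generate_document()
```

Note the asymmetry: when xdi = 1 the topic zdi is drawn but unused for the word, which is exactly what makes the collocation words topic-independent.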

SLIDE 7

Posterior Inference using Gibbs Sampling

First Condition: xdi = 0

Under this condition, sampling is the same as in the HDP model.

Second Condition: xdi = 1 → Probability of a topic in a document.

$$
P(k_{dt} = k \mid \mathbf{t}, \mathbf{k}^{\neg dt}) \propto
\begin{cases}
m_{.k}^{\neg dt} \, f_k^{\neg \mathbf{w}_{dt}}(\mathbf{w}_{dt}) & \text{if } k \text{ is already used} \\
\gamma \, f_{\hat{k}}^{\neg \mathbf{w}_{dt}}(\mathbf{w}_{dt}) & \text{if } k = \hat{k}
\end{cases}
$$

$$
f_k^{\neg \mathbf{w}_{dt}}(\mathbf{w}_{dt}) =
\frac{\Gamma\left(n_{..k}^{\neg \mathbf{w}_{dt}} + V\eta\right)}{\Gamma\left(n_{..k}^{\neg \mathbf{w}_{dt}} + n_{\mathbf{w}_{dt}} + V\eta\right)}
\times
\prod_{\vartheta} \frac{\Gamma\left(n_{..k}^{\neg \mathbf{w}_{dt},\vartheta} + n_{\mathbf{w}_{dt},\vartheta} + \eta\right)}{\Gamma\left(n_{..k}^{\neg \mathbf{w}_{dt},\vartheta} + \eta\right)}
$$

Notation: kdt is the topic index variable for each table t in document d. We write w as (wdi : ∀d, i), wdt as (wdi : ∀i with tdi = t), t as (tdi : ∀d, i), k as (kdt : ∀d, t), and x as (xdi : ∀d, i). A superscript such as ¬dt or ¬di means that the variables corresponding to that index are removed from the set or from the calculation of the count. V is the vocabulary size. n¬wdi..k is the number of words in the corpus belonging to topic k with xdi = 0, excluding wdi, and nwdt is the total number of words at table t with xdi = 0.
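In practice the likelihood term is computed in log space with log-gamma functions to avoid overflow. A minimal sketch, where the count arrays are hypothetical inputs rather than the paper's actual data structures:

```python
from math import lgamma, exp

def log_table_likelihood(topic_counts, table_counts, eta):
    """log f_k^{-wdt}(wdt): likelihood of table t's unigram words under topic k.

    topic_counts[v]: occurrences of vocab word v assigned to topic k
                     (with x_di = 0, table t's words excluded).
    table_counts[v]: occurrences of vocab word v at table t (x_di = 0 only).
    eta: symmetric Dirichlet smoothing on the topic-word distributions.
    """
    V = len(topic_counts)
    n_k, n_t = sum(topic_counts), sum(table_counts)
    log_f = lgamma(n_k + V * eta) - lgamma(n_k + n_t + V * eta)
    for n_kv, n_tv in zip(topic_counts, table_counts):
        log_f += lgamma(n_kv + n_tv + eta) - lgamma(n_kv + eta)
    return log_f

# A single-word table reduces to (n_kv + eta) / (n_k + V*eta):
p = exp(log_table_likelihood([3, 2, 0], [1, 0, 0], 0.5))  # = 3.5 / 6.5
```

For an empty table the two Gamma ratios cancel and the log-likelihood is 0, as expected.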

SLIDE 8

Experimental Results

Qualitative Results

Sample topics (top words) discovered by HDP vs. NHDP:

NIPS Dataset
  HDP:  patterns, cortex, neurons, activation, pattern
  NHDP: neurons, neural network, activity, neural networks, simulations

AP Dataset
  HDP:  sports, british, cbs, miss, television
  NHDP: week, television, summer olympics, broadcast, world news tonight

Comp Dataset
  HDP:  color, usenet, ins, ftp, bit
  NHDP: windows, sun microsystems, anonymous ftp, usenet comp

Quantitative Results - Perplexity Analysis (AP Dataset)

[Bar charts: held-out perplexity of LDACOL, HDP, BiNHDP, and NHDP with 30%, 50%, 70%, and 90% of the data used for training (perplexity axes roughly 6,640–6,720; 3,290–3,410; 3,390–3,470; and 3,310–3,370, respectively); lower perplexity is better.]
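Perplexity, the held-out measure used in these experiments, is the exponentiated negative average per-token log-likelihood; a minimal sketch:

```python
from math import exp, log

def perplexity(total_log_likelihood, num_tokens):
    """Held-out perplexity from a natural-log likelihood; lower is better."""
    return exp(-total_log_likelihood / num_tokens)

# Sanity check: a model that is uniform over a 100-word vocabulary assigns
# each token probability 1/100, so its perplexity is exactly 100.
ppl = perplexity(10 * log(1 / 100), 10)
```

Intuitively, perplexity is the effective vocabulary size the model is "choosing among" per token, which is why a lower value indicates a better fit.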

SLIDE 9

Conclusion and References

We proposed a nonparametric n-gram topic model which discovers more interpretable latent topics. Our model introduces a new set of binary random variables into the HDP model and extends the HDP posterior inference scheme. Results demonstrate that our model outperforms state-of-the-art models.


References

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. JMLR, 3, 993-1022.

Wallach, H. M. (2006). Topic modeling: Beyond bag-of-words. In Proc. of ICML, pp. 977-984.

Griffiths, T. L., Steyvers, M., and Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114(2), 211.

Wang, X., McCallum, A., and Wei, X. (2007). Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In Proc. of ICDM, pp. 697-702.

Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101, pp. 1566-1581.

Deane, P. (2005). A nonparametric method for extraction of candidate phrasal terms. In Proc. of ACL, pp. 605-613.

Goldwater, S., Griffiths, T. L., and Johnson, M. (2006). Contextual dependencies in unsupervised word segmentation. In Proc. of ACL, pp. 673-680.
