Modeling Syntactic Structures of Topics with a Nested HMM LDA
Jing Jiang, Singapore Management University
ICDM 2009, December 7, 2009
Topic Models
- A generative model for discovering hidden topics from documents
- Topics are represented as word distributions
Example document:
“This makes full synchrony of activated units the default condition in the model cortex, as in Brown's model [Brown and Cooke, 1996], so that the background activation is coherent, and can be read into high order cortical levels which synchronize with it.”
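To make “word distributions” concrete, here is a minimal sketch; the two toy topics and their probabilities are invented for illustration, not output of any model:

```python
import random

# A topic is a probability distribution over the vocabulary.
# Toy, hand-picked examples (only the head of each distribution is shown).
topic_neuro = {"cell": 0.09, "synaptic": 0.07, "model": 0.06, "response": 0.05}
topic_finance = {"market": 0.10, "shares": 0.08, "bank": 0.06, "trade": 0.05}

def sample_word(topic):
    """Draw one word from a topic's word distribution."""
    words = list(topic)
    return random.choices(words, weights=[topic[w] for w in words], k=1)[0]

print(sample_word(topic_neuro))   # e.g. "cell"
```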
How to Interpret Topics?
- List the top-K frequent words
– Not easy to interpret
– How are the top words related to each other?
- Our method: model the syntactic structures of topics using a combination of hidden Markov models (HMM) and topic models (LDA)
- A preliminary solution towards meaningful representations of topics
Related Work on Syntactic LDA
- Similar to / based on [Griffiths et al. 05]
– More general, with multiple “semantic classes”
- [Boyd‐Graber & Blei 09]
– Combines parse trees with LDA
– Expensive to obtain parse trees for large text collections
- [Gruber et al. 07]
– Combines HMM with LDA
– Does not model syntax
HMM to Model Syntax
- In natural language sentences, the syntactic class of a word occurrence (noun, verb, adjective, adverb, preposition, etc.) depends on its context
- Transitions between syntactic classes follow some structure
- HMMs can be used to model these transitions
– e.g., HMM-based part-of-speech taggers
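A minimal sketch of this idea, with made-up syntactic classes and transition probabilities (not the tagger or the parameters from the paper):

```python
import random

# Toy first-order HMM over syntactic classes; the classes and the
# transition probabilities below are invented for illustration.
trans = {
    "DET":  {"NOUN": 0.9, "ADJ": 0.1},
    "ADJ":  {"NOUN": 0.8, "ADJ": 0.2},
    "NOUN": {"VERB": 0.5, "PREP": 0.4, "NOUN": 0.1},
    "VERB": {"DET": 0.6, "NOUN": 0.2, "PREP": 0.2},
    "PREP": {"DET": 0.7, "NOUN": 0.3},
}

def sample_classes(length, start="DET"):
    """Walk the transition table to sample a sequence of syntactic classes."""
    seq = [start]
    for _ in range(length - 1):
        nxt = trans[seq[-1]]
        seq.append(random.choices(list(nxt), weights=list(nxt.values()), k=1)[0])
    return seq

print(sample_classes(6))  # e.g. ['DET', 'NOUN', 'VERB', 'DET', 'ADJ', 'NOUN']
```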
Overview of Our Model
- Assumptions
– A topic is represented as an HMM
– Content states: convey the semantic meanings of topics (likely to be nouns, verbs, adjectives, etc.)
– Functional states: serve linguistic functions (e.g. prepositions and articles); word distributions of these functional states are shared among topics
– Each document has a mixture of topics
– Each sentence is generated from a single topic
Overview of Our Model
[Figure, built up over several slides: topics and the HMM states of each topic.]
The n-HMM-LDA Model
[Figure: the n-HMM-LDA model.]
Document Generation Process
1. Sample topics (word distributions) and transition probabilities.
2. Sample a topic distribution for the document (same as in LDA).
3. Sample a topic for each sentence.
4. Generate the words in the sentence using the HMM corresponding to this topic.
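Putting the four steps together, here is a schematic sketch of the generation process; the sizes, the symmetric Dirichlet priors, the uniform initial state, and the choice of which state is functional are all assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

V, T, S = 1000, 5, 4                  # vocabulary size, topics, states per HMM (assumed)
alpha, beta, gamma = 0.1, 0.01, 1.0   # hypothetical hyperparameters

# Step 1: word distributions and transition probabilities.
phi = rng.dirichlet(beta * np.ones(V), size=(T, S))   # one emission dist per (topic, state)
phi[:, 0, :] = rng.dirichlet(beta * np.ones(V))       # state 0: functional, shared by all topics
pi = rng.dirichlet(gamma * np.ones(S), size=(T, S))   # topic-specific transitions

def generate_document(n_sentences=3, sentence_len=8):
    # Step 2: topic distribution for the document (as in LDA).
    theta = rng.dirichlet(alpha * np.ones(T))
    doc = []
    for _ in range(n_sentences):
        # Step 3: one topic for the whole sentence.
        t = rng.choice(T, p=theta)
        # Step 4: emit the sentence from topic t's HMM.
        s = rng.choice(S)              # assumed uniform initial state
        words = []
        for _ in range(sentence_len):
            words.append(rng.choice(V, p=phi[t, s]))  # word ids; map to strings via a vocabulary
            s = rng.choice(S, p=pi[t, s])
        doc.append((t, words))
    return doc

print(generate_document())
```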
Variations
- Transition probabilities between states can be either topic-specific or shared by all topics
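In array terms, the two variations differ only in the shape of the transition parameters (illustrative sizes, hypothetical names):

```python
import numpy as np

T, S = 40, 4  # topics, states (illustrative)

pi_specific = np.zeros((T, S, S))  # topic-specific: one S x S transition matrix per topic
pi_shared = np.zeros((S, S))       # shared: a single S x S matrix used by every topic
```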
Model Inference: Gibbs Sampling
- Sample a topic for a sentence
- Sample a state for a word
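Schematically (our own notation, not the paper's exact derivation), the two conditionals have the form:

```latex
% Topic of sentence s in document d: document-level topic counts times
% the probability of the sentence under topic k's HMM.
P(t_s = k \mid \mathbf{t}_{-s}, \mathbf{w}, \mathbf{c})
  \propto \left( n_{d,k}^{-s} + \alpha \right) \, P(\mathbf{w}_s \mid \mathbf{c}_s, t_s = k)

% State of word i: transition into and out of the state, times the
% emission of w_i from that state under the sentence's topic t.
P(c_i = j \mid c_{i-1}, c_{i+1}, w_i, t)
  \propto P(c_i = j \mid c_{i-1}) \, P(c_{i+1} \mid c_i = j) \, P(w_i \mid c_i = j, t)
```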
Experiments – Data Sets
- NIPS publications (downloaded from http://nips.djvuzone.org/txt.html)
- Reuters‐21578
Data set                 NIPS publications*   Reuters-21578
Vocabulary               18,864               10,739
Words                    5,305,230            1,460,666
Documents for training   1,314                8,052
Documents for testing    618                  2,665
Quantitative Evaluation
- Perplexity: a commonly used metric for the generalization power of language models
- For a test document, observe the first K sentences and predict the remaining sentences
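For reference, perplexity is the exponentiated negative average per-word log-likelihood; a minimal sketch:

```python
import math

def perplexity(word_log_probs):
    """Perplexity from per-word log-likelihoods (natural log); lower is better."""
    return math.exp(-sum(word_log_probs) / len(word_log_probs))

# e.g. three held-out words to which the model assigned p = 0.01, 0.05, 0.02
print(perplexity([math.log(0.01), math.log(0.05), math.log(0.02)]))  # ~46.4
```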
LDA vs. LDA‐s
- LDA-s: n-HMM-LDA with a single state for each HMM
– Same as standard LDA with each sentence having a single topic
[Figure: perplexity results on NIPS and Reuters.]
The one-topic-per-sentence assumption helps.
HMM
- Achieves much lower perplexity, but cannot be used to discover topics
[Figure: perplexity results on NIPS.]
Increase Number of Functional States
- Fixing the number of content states to 1 and the number of topics to 40
[Figure: perplexity vs. number of functional states, on NIPS and Reuters.]
Adding more functional states decreases perplexity.
Qualitative Evaluation
- Use the top frequent words to represent a topic/state
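Representing a topic or state by its top words is a one-liner over its word distribution; a minimal sketch with an invented vocabulary:

```python
import numpy as np

def top_k_words(dist, vocab, k=3):
    """Return the k most probable words of one topic/state word distribution."""
    return [vocab[i] for i in np.argsort(dist)[::-1][:k]]

vocab = ["cell", "model", "the", "of", "synaptic"]
dist = np.array([0.30, 0.20, 0.25, 0.15, 0.10])
print(top_k_words(dist, vocab))  # ['cell', 'the', 'model']
```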
Sample Topics/States from LDA/HMM
[Table: top words of sample topics/states learned by LDA and HMM on NIPS.]
Sample States from n‐HMM‐LDA‐d
[Table: top words of sample states learned by n-HMM-LDA-d on NIPS.]
Different Content States
[Table: top words of different content states.]
Case Study (LDA)
[Figure: the example document, with each phrase annotated by the top words of the LDA topic assigned to it. The annotations are dominated by function words such as “the”, “is”, “that”, “a”, “and”, “to”, “in”, “can”, mixed with words like “signal”, “frequency”, “cell”, “model”, “response”.]
Case Study (n‐HMM‐LDA)
[Figure: the same document, with each phrase annotated by the top words of the n-HMM-LDA content state assigned to it. The annotations are content words such as “cells”, “receptive”, “synaptic”, “neurons”, “field”, “inhibitory”, “excitatory”, “input”, “response”, “direction”, “visual”, “activity”, “synapses”, “pyramidal”.]
Case Study (Comparison)
[Figure: side-by-side comparison of the LDA and n-HMM-LDA annotations of the example document.]
Conclusion
- We proposed a nested HMM-LDA (n-HMM-LDA) to model the syntactic structures of topics
– An extension of [Griffiths et al. 05]
- Experiments on two data sets show that
– The model achieves perplexity between LDA and HMM
– The model can provide more insights into the structures of topics than standard LDA
Thank You!
- Questions?