Supervised Topic Models
Atallah Hezbor and Anant Kharkar
Outline
- Intro [AK]
- LDA [AK]
○ Objective
○ Diagram
○ Motivation for sLDA
- sLDA
○ Expectation Maximization [AH]
○ Variational Inference [AH]
○ E-step [AH]
○ M-step [AK]
○ Prediction [AK]
- Experimental Setup [AH]
- Results/Conclusions [AH]
Introduction
- Topic modeling
○ Generally unsupervised
○ Learn topics - major clusters of content
- Latent Dirichlet Allocation
○ One method for topic modeling
○ Learn topic assignment for each document
- Learned topics often used for prediction
○ Analogous to PCA for regression/lasso
- sLDA - end-to-end learned LDA + regression
- Dirichlet Distribution
○ Takes a parameter vector α; a draw is a point on the probability simplex (a distribution over K topics)
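As a concrete illustration (not from the slides), a minimal NumPy sketch of drawing topic proportions from a Dirichlet; the parameter values and document count are invented:

```python
import numpy as np

# Hypothetical Dirichlet parameter vector over K = 4 topics.
alpha = np.array([0.5, 0.5, 0.5, 0.5])

# Sample topic proportions theta for 3 documents; each row is a
# point on the probability simplex (non-negative, sums to 1).
theta = np.random.dirichlet(alpha, size=3)
print(theta)               # shape (3, 4)
print(theta.sum(axis=1))   # each row sums to 1.0
```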
Latent Dirichlet Allocation
- Objective - identify major topics in document
○ Topic = distribution over words (word probabilities)
○ Use variational inference to compute parameters
○ Notation: θ (topic distribution), z (topic assignment), w (word), α (Dirichlet prior), β (topic-word distributions)
- Intractable posterior distr.
- Unsupervised topics may not be ideal for response prediction
○ E.g., genres may not be the optimal topics for predicting movie ratings
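For reference, a minimal unsupervised-LDA sketch using scikit-learn's LatentDirichletAllocation; the toy corpus and settings are invented, not the authors' setup:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Invented toy corpus; real experiments would use movie reviews or Digg articles.
docs = [
    "great movie wonderful acting",
    "terrible plot boring acting",
    "stock market prices fall",
    "market rally lifts stock prices",
]

# Bag-of-words counts, then fit a 2-topic LDA via variational inference.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)   # per-document topic proportions
print(theta.round(2))
```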
Supervised Latent Dirichlet Allocation
- Extend document generation model
○ Response variable y per document (see the generative process sketched below)
■ E.g., numerical rating, number of likes
- Formulate posterior
○ Intractable to compute
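For reference, the sLDA generative process for a document of N words, in the notation above (as given in Blei & McAuliffe's sLDA paper):

```latex
\theta \mid \alpha \sim \mathrm{Dir}(\alpha) \\
z_n \mid \theta \sim \mathrm{Mult}(\theta), \qquad
w_n \mid z_n, \beta_{1:K} \sim \mathrm{Mult}(\beta_{z_n}) \\
y \mid z_{1:N}, \eta, \sigma^2 \sim
  \mathcal{N}\!\left(\eta^\top \bar{z},\ \sigma^2\right), \qquad
\bar{z} = \tfrac{1}{N} \sum_{n=1}^{N} z_n
```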
Variational Inference
- Want to approximate posterior distribution
- Use Jensen’s inequality
○ log of an expectation ≥ expectation of the log (log is concave)
- Pick a family of variational distributions, Q
- Each q in Q has variational parameters: γ (Dirichlet), φ (multinomial)
- Variational Expectation Maximization
○ E-step: optimize w.r.t. γ, φ
○ M-step: optimize w.r.t. model parameters
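Schematically, Jensen's inequality yields the evidence lower bound (ELBO) that variational EM alternately optimizes:

```latex
\log p(w, y \mid \alpha, \beta_{1:K}, \eta, \sigma^2)
  \;\ge\; \mathbb{E}_q\!\left[\log p(\theta, z, w, y \mid \alpha, \beta_{1:K}, \eta, \sigma^2)\right]
        + H(q)
  \;=\; \mathcal{L}(\gamma, \phi;\ \alpha, \beta_{1:K}, \eta, \sigma^2)
```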
Expectation Step
- Model parameters are held fixed
- γ parametrizes the variational Dirichlet distribution over θ
- φ_j - the jth word's distribution over topics
- Maximize the lower bound with respect to γ
- Maximize the lower bound with respect to φ (updates sketched below)
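A sketch of the per-document coordinate-ascent updates, following the sLDA paper; here φ_{-j} = Σ_{n≠j} φ_n and ∘ is the element-wise product:

```latex
% Update for the variational Dirichlet parameter:
\gamma = \alpha + \sum_{n=1}^{N} \phi_n \\
% Update for each word's topic multinomial; the last two terms are the
% response-driven corrections sLDA adds to the standard LDA update:
\phi_j \;\propto\; \exp\Big\{ \mathbb{E}[\log \theta \mid \gamma]
  + \log \beta_{\cdot,\, w_j}
  + \frac{y}{N\sigma^2}\,\eta
  - \frac{2\,(\eta^\top \phi_{-j})\,\eta + (\eta \circ \eta)}{2N^2\sigma^2} \Big\}
```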
Maximization Step
- Estimate model parameters by maximizing corpus-level ELBO
- β_{1:K} - topic definitions (word distribution under topic k)
- Regression parameters - η, σ²
○ Maximize the corpus-level analogue of log p(response)
○ Expected-value normal equations yield closed-form updates (sketched below)
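A sketch of the M-step updates, following the paper; A denotes the D×K matrix whose d-th row is z̄_d for document d:

```latex
% Topics get the usual LDA-style update from the variational multinomials:
\hat{\beta}_{k,w} \;\propto\; \sum_{d=1}^{D} \sum_{n=1}^{N_d}
  \mathbf{1}[w_{d,n} = w]\ \phi^{k}_{d,n} \\
% Regression parameters solve the expected-value normal equations:
\hat{\eta} = \big(\mathbb{E}[A^\top A]\big)^{-1} \mathbb{E}[A]^\top y, \qquad
\hat{\sigma}^2 = \tfrac{1}{D}\big( y^\top y - y^\top \mathbb{E}[A]\,\hat{\eta} \big)
```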
Prediction
- Learned model params - α, β_{1:K}, η, σ²
○ η - regression coefficients learned on z̄ (empirical topic frequencies) for response y
- Predict response y for a new document given the learned model
- Variational approximation to the posterior expectation (sketched below)
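Sketch of the prediction rule: run variational inference on the new document's words alone, then take the expected response under the approximate posterior:

```latex
\mathbb{E}[y \mid w_{1:N}, \alpha, \beta_{1:K}, \eta, \sigma^2]
  \;\approx\; \eta^\top \mathbb{E}_q[\bar{z}]
  \;=\; \eta^\top \bar{\phi}, \qquad
\bar{\phi} = \tfrac{1}{N} \sum_{n=1}^{N} \phi_n
```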
Experimental Setup
- Movie review corpus [Ratings]
- Digg article corpus [Number of Diggs]
- Compared against
○ LDA + regression
○ Lasso regression
- Metrics:
○ Predictive R-squared
○ Per-word log-likelihood
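A small Python sketch of the predictive R² metric (1 - SSE/SST on held-out responses); the values below are invented:

```python
import numpy as np

def predictive_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of variance in held-out responses explained by predictions."""
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - sse / sst

# Hypothetical held-out ratings and model predictions:
y_true = np.array([3.0, 4.5, 2.0, 5.0])
y_pred = np.array([3.2, 4.0, 2.5, 4.6])
print(predictive_r2(y_true, y_pred))
```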
Results
- 8% and 9.4% improvements in prediction over the baselines
- Better topic model (higher per-word log-likelihood) for movie reviews
Conclusions
- LDA adapted to a specific purpose
○ Learn optimal topics for a specific response
- Best of both worlds
○ Predict response ○ Preserve high topic likelihood
- Lingering questions
○ More real-world examples - when does it work well?
○ How does it compare to deep feature learning?
Backup Slide
- Variational Distribution q
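Sketch of the fully factorized (mean-field) variational family used for LDA/sLDA, with q(θ | γ) a Dirichlet and each q(z_n | φ_n) a multinomial:

```latex
q(\theta, z_{1:N} \mid \gamma, \phi_{1:N})
  \;=\; q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n)
```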