  1. Supervised Topic Models Atallah Hezbor and Anant Kharkar

  2. Outline (optional. Mostly for our reference) ● Intro [AK] ● LDA [AK] ○ Objective ○ Diagram ○ Motivation for sLDA ● sLDA ○ Expectation Maximization [AH] ○ Variational Inference [AH] ○ E-step [AH] ○ M-step [AK] ○ Prediction [AK] ● Experimental Setup [AH] ● Results/Conclusions [AH]

  3. Introduction ● Topic modeling ○ Generally unsupervised ○ Learn topics - major clusters of content ● Latent Dirichlet Allocation ○ One method for topic modeling ○ Learn topic assignments for each document ● Learned topics often used for prediction ○ Analogous to using PCA components for regression/lasso ● sLDA - end-to-end learned LDA + regression ● Dirichlet Distribution ○ Takes parameter vector α
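A minimal sketch (not from the slides) of what a Dirichlet draw looks like in code; the alpha values are illustrative:

```python
import numpy as np

alpha = np.array([0.5, 0.5, 0.5])   # Dirichlet parameter vector (K = 3 topics)
theta = np.random.dirichlet(alpha)  # one draw: a distribution over the 3 topics
print(theta, theta.sum())           # nonnegative components summing to 1
```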

  4. Latent Dirichlet Allocation ● Objective - identify the major topics in a document ○ Topic = distribution over words ○ Use variational inference to compute parameters ○ θ (topic distribution), z (topic assignments), w (words), α (Dirichlet prior), β (topic-word distributions) ● Posterior distribution is intractable ● Unsupervised topics may not be ideal for response prediction ○ Genre may not be the optimal topic structure for movie reviews
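A hedged sketch of LDA's generative story with toy sizes (K, V, N and the sampled alpha/beta are illustrative, not learned values):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 8, 10                        # topics, vocabulary size, words per doc
alpha = np.full(K, 0.5)                   # Dirichlet prior over topic proportions
beta = rng.dirichlet(np.ones(V), size=K)  # beta[k]: word distribution of topic k

theta = rng.dirichlet(alpha)              # per-document topic proportions
z = rng.choice(K, size=N, p=theta)        # topic assignment for each word
w = np.array([rng.choice(V, p=beta[zn]) for zn in z])  # observed words
print(z, w)
```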

  5. Supervised Latent Dirichlet Allocation ● Extend the document generation model ○ Response variable ■ Numerical rating, number of likes ● Formulate posterior ○ Intractable to compute
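A sketch of the one step sLDA adds to the generative model: the response is drawn from a normal whose mean is a regression on the document's empirical topic frequencies (eta and sigma2 here are illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
z = np.array([0, 2, 2, 1, 0, 2])             # topic assignments from the LDA step
zbar = np.bincount(z, minlength=K) / len(z)  # empirical topic frequencies
eta = np.array([1.0, -0.5, 2.0])             # regression coefficients (illustrative)
sigma2 = 0.1                                 # response noise variance (illustrative)
y = rng.normal(eta @ zbar, np.sqrt(sigma2))  # y ~ N(eta^T zbar, sigma2)
print(y)
```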

  6. Variational Inference ● Want to approximate the posterior distribution ● Use Jensen's inequality ○ log E[x] >= E[log x] ● Pick a family of variational distributions, Q ● Each q in Q has variational params: γ, φ ● Variational Expectation Maximization ○ E-step: optimize w.r.t. γ, φ ○ M-step: optimize w.r.t. model parameters
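A quick numerical check of the Jensen's inequality direction the slide uses (the gamma-distributed sample is just an arbitrary positive random variable):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, size=100_000)  # any positive random variable works
print(np.log(x.mean()))   # log of the expectation (the larger side)
print(np.log(x).mean())   # expectation of the log (the smaller side)
```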

  7. Expectation Step ● Model parameters are fixed ● γ - parametrizes the variational Dirichlet distribution ● φ_j - the jth word's distribution over topics ● Maximize the lower bound with respect to γ ● Maximize the lower bound with respect to φ
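A sketch of the standard-LDA coordinate-ascent updates for γ and φ; sLDA's φ update also carries response terms (involving η, σ², y) that are omitted here, and the toy document and topics are illustrative:

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
K, V = 3, 8
alpha = np.full(K, 0.5)
beta = rng.dirichlet(np.ones(V), size=K)  # topics, held fixed during the E-step
words = np.array([1, 4, 4, 7, 0, 2])      # word ids of one toy document
N = len(words)

gamma = alpha + N / K                     # common initialization
phi = np.full((N, K), 1.0 / K)
for _ in range(20):
    # phi update: phi_nk ∝ beta_{k, w_n} * exp(digamma(gamma_k))
    # (the digamma(sum gamma) term cancels in the normalization)
    phi = beta[:, words].T * np.exp(digamma(gamma))
    phi /= phi.sum(axis=1, keepdims=True)
    # gamma update: gamma = alpha + sum_n phi_n
    gamma = alpha + phi.sum(axis=0)
print(gamma)
```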

  8. Maximization Step ● Estimate model parameters by maximizing the corpus-level ELBO ● β_1:K - topic definitions (word distribution under topic k) ● Regression parameters - η, σ² ○ Corpus-level analogue of log p(response) ○ Expected-value normal equations and update rules
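A sketch of solving the expected-value normal equations for η and the maximum-likelihood update for σ²; the helper name expected_moments is hypothetical, and the moment formulas follow from each z_n being an indicator vector under q:

```python
import numpy as np

def expected_moments(phi):
    """E_q[zbar] and E_q[zbar zbar^T] for one document's phi (N x K)."""
    N = phi.shape[0]
    s = phi.sum(axis=0)
    e_zbar = s / N
    # sum_{n != m} phi_n phi_m^T = s s^T - phi^T phi; sum_n E[z_n z_n^T] = diag(s)
    e_outer = (np.outer(s, s) - phi.T @ phi + np.diag(s)) / N**2
    return e_zbar, e_outer

def m_step_regression(phis, y):
    """Solve E[A^T A] eta = E[A]^T y, rows of A being the documents' zbar."""
    moments = [expected_moments(p) for p in phis]
    EA = np.array([m[0] for m in moments])      # D x K matrix of E[zbar_d]
    EAtA = sum(m[1] for m in moments)           # sum_d E[zbar_d zbar_d^T]
    eta = np.linalg.solve(EAtA, EA.T @ y)       # expected normal equations
    sigma2 = (y @ y - y @ (EA @ eta)) / len(y)  # ML update for the noise variance
    return eta, sigma2
```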

  9. Prediction ● Learned model params - α, β_1:K, η, σ² ● η - regression coefficients learned on z̄ for response y ● Predict response y for a new document given the learned model ● Variational approximation
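A minimal sketch of the prediction rule: approximate E[y | w] by η^T E_q[z̄], with φ fit on the new document (the function name is illustrative):

```python
import numpy as np

def predict_response(phi, eta):
    """phi: (N x K) word-topic distributions from running the E-step on the
    new document with the response terms dropped (y is unobserved);
    eta: learned regression coefficients."""
    e_zbar = phi.mean(axis=0)   # E_q[zbar]
    return float(eta @ e_zbar)  # approximate E[y | w]
```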

  10. Experimental Setup ● Movie review corpus [Ratings] ● Digg article corpus [Number of Diggs] ● Compared against ○ LDA + regression ○ Lasso regression ● Metrics: ○ Predictive R-squared ○ Per-word log-likelihood
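For reference, one common way to compute the predictive R-squared metric (this sketch assumes the plain 1 - SSE/SST definition; the evaluation protocol, e.g. held-out folds, is not specified here):

```python
import numpy as np

def predictive_r2(y_true, y_pred):
    """1 - SSE(model) / SSE(predict-the-mean baseline); higher is better."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sse = ((y_true - y_pred) ** 2).sum()
    sst = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - sse / sst
```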

  11. Results ● 8% and 9.4% improvements in prediction over the baselines ● Better topic model (higher per-word likelihood) on movie reviews

  12. Conclusions ● LDA adapted to a specific purpose ○ Learn optimal topics for a specific response ● Best of both worlds ○ Predict the response ○ Preserve high topic likelihood ● Lingering questions ○ More real-world examples - when does it work well? ○ How does it compare to deep feature learning?

  13. Backup Slide ● Variational Distribution q ○ Mean-field family: q(θ, z_1:N | γ, φ_1:N) = q(θ | γ) ∏_n q(z_n | φ_n)
