SLIDE 1

Supervised Topic Models

Atallah Hezbor and Anant Kharkar

SLIDE 2

Outline (optional; mostly for our reference)

  • Intro [AK]
  • LDA [AK]

    ○ Objective
    ○ Diagram
    ○ Motivation for sLDA

  • sLDA

    ○ Expectation Maximization [AH]
    ○ Variational Inference [AH]
    ○ E-step [AH]
    ○ M-step [AK]
    ○ Prediction [AK]

  • Experimental Setup [AH]
  • Results/Conclusions [AH]
SLIDE 3

Introduction

  • Topic modeling

    ○ Generally unsupervised
    ○ Learn topics: major clusters of content

  • Latent Dirichlet Allocation

    ○ One method for topic modeling
    ○ Learn a topic distribution per document and a topic assignment per word

  • Learned topics often used for prediction

    ○ Analogous to using PCA features as inputs to regression/lasso

  • sLDA: LDA + regression, learned end-to-end
  • Dirichlet Distribution

    ○ Takes a parameter vector α (sampling sketch below)
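A minimal NumPy sketch of sampling topic proportions from a Dirichlet (the α values here are arbitrary, chosen only for illustration):

    import numpy as np

    # A Dirichlet draw is a probability vector over K topics (sums to 1).
    alpha = np.array([1.0, 1.0, 1.0])     # symmetric parameter vector
    theta = np.random.dirichlet(alpha)
    print(theta, theta.sum())             # e.g. [0.21 0.47 0.32] 1.0

    # Entries of alpha below 1 concentrate mass on few topics, giving the
    # sparse per-document topic proportions typical in LDA.
    print(np.random.dirichlet([0.1, 0.1, 0.1]))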

SLIDE 4

Latent Dirichlet Allocation

  • Objective: identify the major topics in a document collection

    ○ Topic = a distribution over words (word probabilities)
    ○ Use variational inference to compute parameters
    ○ Notation: θ (topic distribution), z (topic assignment), w (word), α (Dirichlet prior), β (topics)

  • Posterior distribution is intractable
  • Unsupervised topics may not be ideal for response prediction

    ○ e.g., genre may not be the optimal set of topics for predicting movie-review ratings
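For reference, the LDA generative process in this notation (a standard statement of the model, not copied from the slides):

    \theta \mid \alpha \sim \mathrm{Dirichlet}(\alpha)
    z_n \mid \theta \sim \mathrm{Multinomial}(\theta), \quad n = 1, \dots, N
    w_n \mid z_n, \beta_{1:K} \sim \mathrm{Multinomial}(\beta_{z_n})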

SLIDE 5

Supervised Latent Dirichlet Allocation

  • Extend the document generation model

    ○ Add a per-document response variable
      ■ e.g., numerical rating, number of likes


  • Formulate the posterior over latent variables

    ○ Intractable to compute exactly
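The key sLDA addition, in Blei and McAuliffe's notation: the response is drawn from a normal linear model on the document's empirical topic frequencies:

    \bar{z} = \frac{1}{N} \sum_{n=1}^{N} z_n, \qquad
    y \mid z_{1:N}, \eta, \sigma^2 \sim \mathcal{N}\!\left(\eta^\top \bar{z},\; \sigma^2\right)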

SLIDE 6

Variational Inference

  • Want to approximate the posterior distribution
  • Use Jensen's inequality to get a tractable lower bound (the ELBO)

    ○ log E[X] >= E[log X], since log is concave

  • Pick a family of variational distributions, Q
  • Each q in Q has variational parameters: γ (Dirichlet) and φ (per-word multinomials)
  • Variational Expectation Maximization

    ○ E-step: optimize the bound w.r.t. variational parameters γ, φ
    ○ M-step: optimize the bound w.r.t. model parameters α, β, η, σ²
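The bound being maximized, in its standard variational form (consistent with the paper's setup, with q(θ, z | γ, φ) = q(θ | γ) ∏_n q(z_n | φ_n)):

    \log p(w, y \mid \alpha, \beta, \eta, \sigma^2)
    \;\ge\; \mathbb{E}_q\!\left[\log p(\theta, z, w, y \mid \alpha, \beta, \eta, \sigma^2)\right] + \mathrm{H}(q)
    \;=\; \mathcal{L}(\gamma, \phi)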

SLIDE 7

Expectation Step

  • Model parameters are held fixed
  • γ parametrizes the variational Dirichlet over topic proportions
  • φ_j is the jth word's distribution over topics
  • Maximize the lower bound with respect to γ
  • Maximize the lower bound with respect to φ
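The coordinate-ascent updates, as derived in the sLDA paper (φ_{-n} denotes Σ_{m≠n} φ_m and ∘ is the elementwise product; the last two terms in the exponent are the response-driven corrections that distinguish sLDA from plain LDA):

    \gamma \leftarrow \alpha + \sum_{n=1}^{N} \phi_n

    \phi_n \propto \exp\left\{
      \mathbb{E}_q[\log \theta \mid \gamma] + \log \beta_{\cdot, w_n}
      + \frac{y}{N\sigma^2}\,\eta
      - \frac{2(\eta^\top \phi_{-n})\,\eta + (\eta \circ \eta)}{2N^2\sigma^2}
    \right\}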
SLIDE 8

Maximization Step

  • Estimate model parameters by maximizing the corpus-level ELBO
  • β_{1:K}: topic definitions (word distribution under topic k)
  • Regression parameters: η, σ²

    ○ Maximize the corpus-level analogue of the expected log response likelihood
    ○ Solutions take the form of expected-value normal equations
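Following the paper, let A be the D×K matrix whose dth row is document d's mean topic-frequency vector (expectations taken under q). The regression updates are expected-value normal equations, and topics are re-estimated as in LDA:

    \hat{\eta} \leftarrow \left(\mathbb{E}[A^\top A]\right)^{-1} \mathbb{E}[A]^\top y,
    \qquad
    \hat{\sigma}^2 \leftarrow \frac{1}{D}\left( y^\top y - y^\top \mathbb{E}[A]\, \hat{\eta} \right)

    \hat{\beta}_{k,w} \propto \sum_{d=1}^{D} \sum_{n=1}^{N_d} \mathbb{1}[w_{d,n} = w]\, \phi_{d,n,k}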

SLIDE 9

Prediction

  • Learned model parameters: α, β_{1:K}, η, σ²

  • η: regression coefficients relating topic frequencies z̄ to the response y
  • Predict the response y for a new document given the learned model
  • Use the variational approximation to the per-document posterior
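Concretely, the paper's approximate prediction replaces the posterior mean of z̄ with its variational estimate:

    \mathbb{E}[y \mid w_{1:N}, \alpha, \beta, \eta, \sigma^2]
    \;\approx\; \eta^\top \mathbb{E}_q[\bar{z}]
    = \eta^\top \bar{\phi},
    \qquad \bar{\phi} = \frac{1}{N} \sum_{n=1}^{N} \phi_n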
SLIDE 10

Experimental Setup

  • Movie review corpus [Ratings]
  • Digg article corpus [Number of Diggs]
  • Compared against

    ○ LDA + regression
    ○ Lasso regression

  • Metrics:

    ○ Predictive R-squared
    ○ Per-word log-likelihood
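Predictive R-squared is the out-of-sample coefficient of determination; a minimal sketch (function and variable names are ours, not from the paper):

    import numpy as np

    def predictive_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """1 - SS_res / SS_tot, computed on held-out documents."""
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - y_true.mean()) ** 2)
        return 1.0 - ss_res / ss_tot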

SLIDE 11

Results

  • 8% and 9.4% improvements in prediction
  • Better topic model for movie reviews (higher per-word likelihood)
SLIDE 12

Conclusions

  • LDA adapted to a specific purpose

    ○ Learn optimal topics for a specific response

  • Best of both worlds

    ○ Predict the response
    ○ Preserve high topic likelihood

  • Lingering questions

    ○ More real-world examples: when does it work well?
    ○ How does it compare to deep feature learning?

SLIDE 13

Backup Slide

  • Variational distribution: q(θ, z | γ, φ) = q(θ | γ) ∏_n q(z_n | φ_n)