SVD-LDA: Topic Modeling for Full-Text Recommender Systems

SLIDE 1

Recommender systems From LDA to SVD-LDA

SVD-LDA: Topic Modeling for Full-Text Recommender Systems

Sergey Nikolenko

Steklov Mathematical Institute at St. Petersburg Laboratory for Internet Studies, National Research University Higher School of Economics, St. Petersburg

October 30, 2015

Sergey Nikolenko SVD-LDA

SLIDE 2

Outline

1. Recommender systems
   • Intro
   • Recsys overview

2. From LDA to SVD-LDA
   • Latent Dirichlet Allocation
   • SVD-LDA

SLIDE 3

Overview

Very brief overview of the paper:

• Our main goal is to recommend full-text items (posts in social networks, web pages, etc.) to users;
• in particular, we want to extend recommender systems with features coming from the texts; this is especially important for cold start;
• these features can come from topic modeling;
• in this work, we combine the classical SVD and LDA models into one, training them together.

SLIDE 4

Recommender systems

Recommender systems analyze user interests and attempt to predict what the current user will be most interested in right now. Collaborative filtering: given a sparse matrix of ratings assigned by users to items, predict the unknown ratings (and hence recommend the items with the best predictions):

• nearest neighbor methods (user–user and item–item), e.g., GroupLens;
• SVD (singular value decomposition): decompose the user × item matrix, reducing dimensionality as user × item = (user × feature)(feature × item) with very few features compared to users and items, and learn user and item features that can be used to make predictions.
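The decomposition above can be sketched on a toy ratings matrix; the data is purely illustrative, and a deployed recommender would factor only the observed entries rather than a dense matrix:

```python
import numpy as np

# Toy user x item rating matrix; dense SVD is used here only to show
# the dimensionality reduction itself.
R = np.array([[5., 4., 0., 1.],
              [4., 5., 1., 0.],
              [1., 0., 5., 4.],
              [0., 1., 4., 5.]])

k = 2                                     # number of latent features
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = U[:, :k] * np.sqrt(s[:k])             # user x feature (rows are p_i)
Q = Vt[:k, :].T * np.sqrt(s[:k])          # item x feature (rows are q_a)
R_hat = P @ Q.T                           # user x item = (user x feature)(feature x item)

print(np.round(R_hat, 2))
```

By the Eckart–Young theorem, this rank-k reconstruction is the best possible in Frobenius norm, which is why very few features can already capture most of the rating structure.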

SLIDE 5

Recommender systems

Formally speaking, SVD models a rating as

    r̂^{SVD}_{i,a} = µ + b_i + b_a + q_a⊤ p_i,

where

• b_i is the baseline predictor for user i;
• b_a is the baseline predictor for item a;
• q_a and p_i are the feature vectors for item a and user i.

Then b_i, b_a, q_a, p_i can be trained together by fitting actual ratings to the model (by alternating least squares). Importantly for us, if you have likes/dislikes rather than explicit ratings, you can use logistic SVD (trained by alternating logistic regression):

    p(Like_{i,a}) = σ( µ + b_i + b_a + q_a⊤ p_i ).
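A minimal sketch of logistic SVD with these components; the like/dislike data and hyperparameters are hypothetical, and plain SGD stands in for the alternating logistic regression mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical observations (user i, item a, y), y = 1 for "Like", 0 for "Dislike".
data = [(0, 0, 1), (0, 1, 1), (0, 2, 0), (1, 0, 1), (1, 2, 0), (2, 1, 0), (2, 2, 1)]
n_users, n_items, k = 3, 3, 2

mu = 0.0
b_user, b_item = np.zeros(n_users), np.zeros(n_items)
P = 0.1 * rng.standard_normal((n_users, k))    # user feature vectors p_i
Q = 0.1 * rng.standard_normal((n_items, k))    # item feature vectors q_a

lr = 0.1
for _ in range(500):
    for i, a, y in data:
        pred = sigmoid(mu + b_user[i] + b_item[a] + Q[a] @ P[i])
        g = y - pred                           # gradient of the Bernoulli log-likelihood
        mu += lr * g
        b_user[i] += lr * g
        b_item[a] += lr * g
        P[i], Q[a] = P[i] + lr * g * Q[a], Q[a] + lr * g * P[i]

p_like = sigmoid(mu + b_user[0] + b_item[0] + Q[0] @ P[0])
print(round(float(p_like), 3))   # P(Like) for user 0, item 0 (a "Like" in the data)
```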

SLIDE 6

Recommender systems

Many modifications of classical recommender systems use additional information:

• implicit user preferences (what the user viewed), e.g., SVD++;
• the time when ratings appear, e.g., timeSVD++;
• the social graph, when the users’ social network profiles are available;
• context-aware recommendations (time of day, situation, company, etc.);
• recommendations aware of other recommendations (optimizing diversity, novelty, serendipity).

In this work, we concentrate on the textual content of items.

SLIDE 7

Recommender systems

The main dataset for the project comes from a Russian recommender system Surfingbird:

• Surfingbird recommends web pages to users;
• a user clicks “Surf”, sees a new page, and maybe rates it by clicking “Like” or “Dislike”;
• web pages usually have content, often textual content;
• the text may be very useful for recommendations; how do we use it?

SLIDE 8

Outline

1. Recommender systems
   • Intro
   • Recsys overview

2. From LDA to SVD-LDA
   • Latent Dirichlet Allocation
   • SVD-LDA

SLIDE 9

Topic modeling with LDA

Latent Dirichlet Allocation (LDA) – topic modeling for a corpus of texts:

• a document is represented as a mixture of topics;
• a topic is a distribution over words;
• to generate a document, for each word we sample a topic and then sample a word from that topic;
• by learning these distributions, we learn which topics appear in the dataset and in which documents.
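The generative process above can be sketched directly; the vocabulary and topic–word distributions φ here are purely illustrative toy values:

```python
import numpy as np

# The LDA generative process on a toy model: 2 topics over a 4-word vocabulary.
rng = np.random.default_rng(1)
vocab = ["goal", "match", "gene", "dna"]
alpha = 0.5
phi = np.array([[0.45, 0.45, 0.05, 0.05],   # topic 0 favors "sports" words
                [0.05, 0.05, 0.45, 0.45]])  # topic 1 favors "biology" words

def generate_document(n_words):
    """Sample a document: draw its topic mixture, then a topic and a word per position."""
    theta = rng.dirichlet([alpha] * len(phi))      # the document's mixture of topics
    words = []
    for _ in range(n_words):
        z = rng.choice(len(phi), p=theta)          # sample a topic for this position
        words.append(vocab[rng.choice(len(vocab), p=phi[z])])  # sample a word from it
    return words

print(generate_document(8))
```

Inference (next slides) runs this process in reverse: given only the words, recover θ and φ.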

SLIDE 10

Topic modeling with LDA

Sample LDA result from (Blei, 2012):

SLIDE 11

Topic modeling with LDA

Sample LDA result from (Blei, 2012):

SLIDE 12

PGM for LDA

SLIDE 13

Inference in LDA

There are two major approaches to inference in probabilistic models with a loopy factor graph like LDA:

• variational approximations simplify the graph by approximating the underlying distribution with a simpler one, but with new parameters that are subject to optimization;
• Gibbs sampling approximates the underlying distribution by sampling a subset of variables conditional on fixed values of all other variables.

Both approaches have been applied to LDA. In a way, LDA is similar to SVD – it performs dimensionality reduction and, so to speak, decomposes document × word = (document × topic)(topic × word).

SLIDE 14

LDA likelihood

Thus, the total likelihood of the LDA model is

    p(z, w | α, β) = ∫_{θ,φ} p(θ | α) p(z | θ) p(w | z, φ) p(φ | β) dθ dφ.

And in Gibbs sampling, we sample

    p(z_w = t | z_{−w}, w, α, β) ∝ q(z_w, t, z_{−w}, w, α, β)
        = (n^{(d)}_{−w,t} + α) / ( Σ_{t′∈T} (n^{(d)}_{−w,t′} + α) ) · (n^{(w)}_{−w,t} + β) / ( Σ_{w′∈W} (n^{(w′)}_{−w,t} + β) ).
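A minimal collapsed Gibbs sampler implementing this sampling formula on a toy corpus (word ids 0..3, 2 topics; note that the document-side denominator Σ_{t′}(n^{(d)}_{−w,t′} + α) is the same for every t, so it drops out when the sampling distribution is normalized):

```python
import numpy as np

rng = np.random.default_rng(2)
docs = [[0, 0, 1, 1], [0, 1, 1, 2], [2, 2, 3, 3], [2, 3, 3, 3]]
V, T, alpha, beta = 4, 2, 0.5, 0.5

n_dt = np.zeros((len(docs), T))   # topic counts per document
n_tw = np.zeros((T, V))           # word counts per topic
n_t = np.zeros(T)                 # total words per topic
z = [[int(rng.integers(T)) for _ in doc] for doc in docs]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]; n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1

for _ in range(200):                              # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                           # remove w's assignment: the "-w" counts
            n_dt[d, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1
            # (n_dt + alpha) * (n_tw + beta) / sum_w'(n_tw' + beta), per candidate topic
            p = (n_dt[d] + alpha) * (n_tw[:, w] + beta) / (n_t + V * beta)
            t = int(rng.choice(T, p=p / p.sum())) # sample a new topic for w
            z[d][i] = t; n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1

phi = (n_tw + beta) / (n_t[:, None] + V * beta)   # estimated topic-word distributions
print(np.round(phi, 2))
```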

SLIDE 15

LDA extensions

There already exist LDA extensions relevant to our research:

• DiscLDA: LDA for classification with a class-dependent transformation in the topic mixtures;
• Supervised LDA: documents come with a response variable, and we mine topics that are indicative of the response;
• TagLDA: words have tags that mark context or linguistic features;
• Tag-LDA: documents have topical tags, and the goal is to recommend new tags to documents;
• Topics over Time: topics change their proportions with time;
• hierarchical modifications with nested topics are also important.

In this work, we develop a novel extension: SVD-LDA.

SLIDE 16

Supervised LDA

We begin with supervised LDA:

• it assumes that each document has a response variable;
• the purpose is to predict this variable rather than just “learn something about the dataset”;
• can we learn topics that are relevant to this specific response variable?

In recommender systems, the response variable would be the probability of a like, an explicit rating, or some other desirable action. This adds new variables to the graph.

SLIDE 17

PGM for sLDA

SLIDE 18

Supervised LDA

Mathematically, we add a factor corresponding to the response variable (Gaussian in sLDA):

    p(y_d | z, b, σ²) ∝ exp( −(1/2) (y_d − b⊤z̄_d − a)² ),

where z̄_d is the empirical topic distribution of document d. The total likelihood is now

    p(z | w, y, b, α, β, σ²) ∝ ∏_d [ B(n_d + α) / B(α) ] · ∏_t [ B(n_t + β) / B(β) ] · ∏_d exp( −(1/2) (y_d − b⊤z̄_d − a)² ),

SLIDE 19

Supervised LDA

The Gibbs sampling goes as

    p(z_w = t | z_{−w}, w, α, β) ∝ q(z_w, t, z_{−w}, w, α, β) · exp( −(1/2) (y_d − b⊤z̄_d − a)² )
        = (n^{(d)}_{−w,t} + α) / ( Σ_{t′∈T} (n^{(d)}_{−w,t′} + α) ) · (n^{(w)}_{−w,t} + β) / ( Σ_{w′∈W} (n^{(w′)}_{−w,t} + β) ) · exp( −(1/2) (y_d − b⊤z̄_d − a)² ),

but it is now a two-step iterative algorithm:

• sample z according to the equations above;
• train b, a as a regression.
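The two steps can be sketched as follows; all numbers are toy values, the word–topic count factor and the loop over all documents and words are omitted for brevity, and the function name is hypothetical:

```python
import numpy as np

# Illustrative sketch of one sLDA training round for a single word position
# in a 2-topic model.
rng = np.random.default_rng(3)
T, alpha = 2, 0.5
y_d = 1.2                        # response of the current document d
b, a = np.array([2.0, -1.0]), 0.1
n_dt = np.array([3.0, 4.0])      # topic counts of document d, excluding word w

def response_factor(t):
    """Gaussian factor exp(-(y_d - b^T z_bar_d - a)^2 / 2) if w is assigned topic t."""
    counts = n_dt.copy(); counts[t] += 1
    z_bar = counts / counts.sum()
    return np.exp(-0.5 * (y_d - b @ z_bar - a) ** 2)

# Step 1: sample z_w from the count factor reweighted by the response factor.
p = (n_dt + alpha) * np.array([response_factor(t) for t in range(T)])
p /= p.sum()
z_w = int(rng.choice(T, p=p))

# Step 2: refit b, a by least squares on (z_bar_d, y_d) pairs over all documents.
Z = np.array([[0.4, 0.6], [0.7, 0.3], [0.5, 0.5]])   # per-document z_bar_d (toy)
y = np.array([0.2, 1.1, 0.6])
X = np.hstack([Z, np.ones((len(y), 1))])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b, a = coef[:T], coef[T]
print(np.round(p, 3), z_w)
```

Since z̄_d sums to one, the design matrix is rank-deficient with an intercept; `lstsq` returns the minimum-norm least-squares fit, which is enough for a sketch.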

SLIDE 20

Logistic sLDA

Hence, our first results:

• a Gibbs sampling scheme for supervised LDA (the original paper offered only variational approximations);
• an extension of supervised LDA to handle logistic response variables: p = σ( b⊤z̄ + a ).

SLIDE 21

Logistic sLDA

Hence, our first results:

• logistic regression is used to train b, a;
• and the Gibbs sampling goes as

    p(z_w = t | z_{−w}, w, α, β) ∝ q(z_w, t, z_{−w}, w, α, β) · ∏_{x∈X_d} σ( b⊤z̄_d + a )^{y_x} ( 1 − σ( b⊤z̄_d + a ) )^{1−y_x}
        = (n^{(d)}_{−w,t} + α) / ( Σ_{t′∈T} (n^{(d)}_{−w,t′} + α) ) · (n^{(w)}_{−w,t} + β) / ( Σ_{w′∈W} (n^{(w′)}_{−w,t} + β) ) ×
          × exp( s_d log p_d + (|X_d| − s_d) log(1 − p_d) ),

where p_d = σ( b⊤z̄_d + a ) and s_d is the number of positive responses among the ratings X_d of document d.
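The exponent here is just the product of Bernoulli factors collapsed over X_d; a quick numeric check of that identity, taking p_d = σ(b⊤z̄_d + a) and s_d = Σ_{x∈X_d} y_x (the number of positive responses):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

p_d = sigmoid(0.7)                 # illustrative value of b^T z_bar_d + a
y = np.array([1, 0, 1, 1, 0])      # responses y_x for the ratings in X_d
s_d = int(y.sum())

# prod_x p^y (1-p)^(1-y)  vs  exp(s log p + (|X| - s) log(1 - p))
product_form = float(np.prod(p_d ** y * (1 - p_d) ** (1 - y)))
exp_form = float(np.exp(s_d * np.log(p_d) + (len(y) - s_d) * np.log(1 - p_d)))
print(np.isclose(product_form, exp_form))   # True
```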

SLIDE 22

SVD-LDA

The main result is a new unified SVD-LDA model. This time, we model the success probability as

    p(success_{i,a}) = σ( µ + b_i + b_a + q_a⊤ p_i + θ_a⊤ l_i ),

where b_i, b_a, q_a, p_i are the SVD predictors, θ_a are topic distributions, and l_i are user predictors for the topics. To avoid overfitting and improve cold start, we use the same l_i either for all users or for large clusters of users based on external (demographic) information.
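The predictor can be written down directly from this formula (all numeric values below are illustrative, not learned parameters):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def svd_lda_success_prob(mu, b_i, b_a, q_a, p_i, theta_a, l_i):
    """p(success_{i,a}) = sigma(mu + b_i + b_a + q_a^T p_i + theta_a^T l_i)."""
    return sigmoid(mu + b_i + b_a + q_a @ p_i + theta_a @ l_i)

theta_a = np.array([0.7, 0.2, 0.1])   # topic distribution of item a
l_i = np.array([0.5, -0.3, 0.0])      # (shared) user predictors for the topics
prob = svd_lda_success_prob(0.1, 0.2, -0.1,
                            np.array([0.3, 0.4]),    # q_a
                            np.array([0.1, -0.2]),   # p_i
                            theta_a, l_i)
print(round(float(prob), 3))
```

Sharing l_i across a whole user cluster means a brand-new item with text (hence θ_a) gets a meaningful score even before it collects any ratings, which is the cold-start benefit mentioned above.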

SLIDE 23

SVD-LDA

Now the total likelihood is

    p(D | µ, b_i, b_a, p_i, q_a, l_i, θ_a) = ∏_D σ( r̂^{SVD}_{i,a} + θ_a⊤ l_i )^{[r=1]} ( 1 − σ( r̂^{SVD}_{i,a} + θ_a⊤ l_i ) )^{[r=−1]},

    ln p(D | µ, b_i, b_a, p_i, q_a, l_i, θ_a) = Σ_D ( [r = 1] ln σ( r̂^{SVD}_{i,a} + θ_a⊤ l_i ) + [r = −1] ln( 1 − σ( r̂^{SVD}_{i,a} + θ_a⊤ l_i ) ) )
        = Σ_D ln | [r = −1] − σ( r̂^{SVD}_{i,a} + θ_a⊤ l_i ) |,

where r̂^{SVD}_{i,a} = µ + b_i + b_a + q_a⊤ p_i, [r = 1] is 1 if r = 1 and 0 otherwise, and θ_a = (1/N_a) Σ_{w∈a} z_w is the topic distribution of document a.
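The last equality collapses the two branches into one term, since |0 − σ(x)| = σ(x) and |1 − σ(x)| = 1 − σ(x); a quick numeric check:

```python
import numpy as np

# Check: [r=1] ln sigma(x) + [r=-1] ln(1 - sigma(x)) == ln |[r=-1] - sigma(x)|.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

for x in (-1.5, 0.0, 2.0):
    s = sigmoid(x)
    for r in (1, -1):
        two_branch = np.log(s) if r == 1 else np.log(1 - s)
        compact = np.log(abs((1 if r == -1 else 0) - s))
        assert np.isclose(two_branch, compact)
print("identity holds for r = ±1")
```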

SLIDE 24

SVD-LDA

Gibbs sampling now looks as follows:

p(zw = t | z−w, w, α, β) ∝ ∝ q(zw, t, z−w, w, α, β)p(D | µ, bi, ba, pi, qa, li, θw→t

a

) = = n(d)

−w,t + α

  • t ′∈T
  • n(d)

−w,t ′ + α

  • n(w)

−w,t + β

  • w ′∈W
  • n(w ′)

−w,t + β

p(D | µ, bi, ba, pi, qa, li, θw→t

a

) = = n(d)

−w,t + α

  • t ′∈T
  • n(d)

−w,t ′ + α

  • n(w)

−w,t + β

  • w ′∈W
  • n(w ′)

−w,t + β

× × exp

  • D
  • ln
  • [r = −1] − σ
  • ^

r SVD

i,a

+ l ⊤

i θw→t a

  • .

SLIDE 25

SVD-LDA

The training algorithm for SVD-LDA proceeds iteratively:

• train the SVD model parameters by SGD for fixed z;
• sample z for fixed SVD model parameters.

The problem is that sampling has become intractable: to compute the SVD factor, we need to go over all ratings in the recommender dataset for a given item, which is too slow for practical algorithms.

SLIDE 26

SVD-LDA

To solve this problem, we developed a (first-order) approximation to the Gibbs sampling scheme. After the computations, the approximation turns out to be very nice and simple:

    p(z_w = t | z_{−w}, w, α, β) ∝ q(z_w, t, z_{−w}, w, α, β) · p(D | µ, b_i, b_a, p_i, q_a, l_i, θ^{w→t}_a)
        ≈ (n^{(d)}_{−w,t} + α) / ( Σ_{t′∈T} (n^{(d)}_{−w,t′} + α) ) · (n^{(w)}_{−w,t} + β) / ( Σ_{w′∈W} (n^{(w′)}_{−w,t} + β) ) · exp( s_a⊤ θ^{w→t}_a )
        ∝ (n^{(d)}_{−w,t} + α) / ( Σ_{t′∈T} (n^{(d)}_{−w,t′} + α) ) · (n^{(w)}_{−w,t} + β) / ( Σ_{w′∈W} (n^{(w′)}_{−w,t} + β) ) · exp( s_t ).

This is what we used in the algorithm.

SLIDE 27

Results

During SVD-LDA training, SVD RMSE goes steadily down. (Plot: RMSE, y-axis roughly 0.50–0.52, over about 160 training iterations.)

SLIDE 28

Results

And the evaluation metrics all improve (sample results below).

    Model         T     f    NDCG     AUC      MAP      WTA      Top3     Top5
    SVD           —     5    0.9815   0.8800   0.9408   0.9454   0.9439   0.9428
    SVD-LDA       50    5    0.9829   0.8893   0.9417   0.9514   0.9469   0.9447
    SVD-LDA+dem   50    5    0.9840   0.8904   0.9428   0.9527   0.9481   0.9457
    SVD           —     20   0.9816   0.8802   0.9408   0.9453   0.9439   0.9428
    SVD-LDA       200   20   0.9828   0.8886   0.9417   0.9518   0.9468   0.9445
    SVD-LDA+dem   200   20   0.9837   0.8895   0.9427   0.9527   0.9478   0.9455

SLIDE 29

Future work

Although we have succeeded with SVD-LDA, there were some shortcomings:

• Gibbs sampling is slow;
• it is hard to develop and test many new extensions (we needed an approximation and got lucky that it worked).

We can address this by switching from LDA to pLSA (pLSI):

• the learning is simpler (basically SVD) and easier to scale;
• extensions come as different forms of regularizers, which are very simple to develop and test;
• pLSA regularizers are a very recent development, so there is still a lot of uncharted ground.

Another completely different but in some ways similar idea: distributed word representations.

SLIDE 30

Thank you for your attention!
