SLIDE 1

artm vs. lda: an svd extension case study

Sergey I. Nikolenko 1,2,3,4

1 Steklov Institute of Mathematics at St. Petersburg
2 National Research University Higher School of Economics, St. Petersburg
3 Kazan (Volga Region) Federal University, Kazan, Russia
4 Deloitte Analytics Institute, Moscow, Russia

April 7, 2016

SLIDE 2

problem setting

  • Main goal: to recommend full-text items (posts in social networks, web pages, etc.) to users.
  • In particular, enrich recommender systems with text-based features; this is especially important for cold start.
  • These features can come from topic modeling.
  • LDA extensions can be developed to extract features relevant to recommendations.

SLIDE 3

supervised lda

  • Supervised LDA (sLDA):
    • assumes that each document has a response variable;
    • the purpose is to predict this variable rather than just “learn something about the dataset”;
    • can we learn topics that are relevant to this specific response variable?
  • In recommender systems, the response variable would be the probability of a like, an explicit rating, or some other desirable action.
  • This adds new variables to the graph; the response model is written out below.
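To make the role of the response variable concrete, here is a compact restatement of the sLDA response model from Blei and McAuliffe's original paper, written in the notation this deck uses later (the logistic case with weights $b$ and bias $a$):

```latex
% sLDA response model: the document's empirical topic frequencies
% \bar{z}_d drive a generalized linear model for the response y_d.
\bar{z}_d = \frac{1}{N_d}\sum_{n=1}^{N_d} z_{d,n}, \qquad
y_d \mid \bar{z}_d \sim \mathrm{GLM}(b^\top \bar{z}_d)
% logistic case (binary response, e.g. like / no like):
p(y_d = 1 \mid \bar{z}_d) = \sigma\left(b^\top \bar{z}_d + a\right)
```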

SLIDE 4

pgm for slda

[Plate diagram of the probabilistic graphical model for supervised LDA]

SLIDE 5

logistic slda and svd-lda

  • Our previous results (MICAI 2015):
    • a Gibbs sampling scheme for supervised LDA (the original paper offered only variational approximations);
    • an extension of supervised LDA to handle logistic variables:
      $$p = \sigma\left(b^\top \bar{z} + a\right);$$
    • a new unified SVD-LDA model (see the sketch below):
      $$p(\mathrm{success}_{i,a}) = \sigma\left(\mu + b_i + b_a + q_a^\top p_i + \theta_a^\top l_i\right),$$
      where $b_i$, $b_a$, $q_a$, $p_i$ are SVD predictors, $\theta_a$ are topic distributions, and $l_i$ are user predictors for the topics, and a Gibbs sampling scheme for it.
  • Gibbs sampling was too computationally intensive, so we developed an approximate sampler (a first-order approximation). This wasn't easy.
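A minimal NumPy sketch of the SVD-LDA predictor above; the variable names (`mu`, `b_i`, `q_a`, ...) are illustrative stand-ins, not code from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def svd_lda_predict(mu, b_i, b_a, q_a, p_i, theta_a, l_i):
    """Predicted success probability for user i and item a:
    sigma(mu + b_i + b_a + q_a^T p_i + theta_a^T l_i)."""
    return sigmoid(mu + b_i + b_a + q_a @ p_i + theta_a @ l_i)

# toy example: 5 latent SVD factors, 3 topics
rng = np.random.default_rng(0)
p = svd_lda_predict(
    mu=0.1, b_i=-0.2, b_a=0.3,
    q_a=rng.normal(size=5), p_i=rng.normal(size=5),
    theta_a=rng.dirichlet(np.ones(3)),  # topic distribution of item a
    l_i=rng.normal(size=3),             # user i's topic preferences
)
print(p)  # a probability in (0, 1)
```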

SLIDE 6

artm

  • Additive Regularization of Topic Models (ARTM) simply adds regularizers to the objective function at the training stage of the basic pLSA model (Vorontsov et al., 2013, 2014, 2015):
    $$L(\Phi, \Theta) + R(\Phi, \Theta) = \sum_{d \in D}\sum_{w \in W} n_{dw} \ln p(w \mid d) + \sum_{i=1}^{r} \rho_i R_i(\Phi, \Theta).$$
  • Solve the Karush-Kuhn-Tucker conditions with Newton's method, which yields the iterative updates (see the sketch below):
    $$p_{tdw} = \mathrm{norm}^{+}_{t \in T}\left(\varphi_{wt}\theta_{td}\right),$$
    $$n_{wt} = \sum_{d \in D} n_{dw}\, p_{tdw}, \qquad n_{td} = \sum_{w \in d} n_{dw}\, p_{tdw},$$
    $$\varphi_{wt} = \mathrm{norm}^{+}_{w \in W}\left(n_{wt} + \varphi_{wt}\frac{\partial R}{\partial \varphi_{wt}}\right), \qquad \theta_{td} = \mathrm{norm}^{+}_{t \in T}\left(n_{td} + \theta_{td}\frac{\partial R}{\partial \theta_{td}}\right),$$
    where $\mathrm{norm}^{+}$ denotes non-negative normalization.
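As a rough illustration of one such regularized EM iteration, here is a dense-array NumPy sketch; the names (`dR_dphi`, `dR_dtheta`, ...) are assumptions for exposition, and production implementations (e.g., BigARTM) work with sparse batched data instead:

```python
import numpy as np

def norm_plus(x, axis):
    """Non-negative normalization: clip at zero, then normalize to sum 1."""
    x = np.maximum(x, 0.0)
    s = x.sum(axis=axis, keepdims=True)
    return np.divide(x, s, out=np.zeros_like(x), where=s > 0)

def artm_em_step(n_dw, phi, theta, dR_dphi, dR_dtheta):
    """One regularized EM iteration of ARTM.

    n_dw: (D, W) word counts; phi: (W, T); theta: (T, D).
    dR_dphi, dR_dtheta: partial derivatives of the regularizer R,
    evaluated at the current (phi, theta)."""
    # E-step: p_tdw proportional to phi_wt * theta_td, normalized over t
    p = norm_plus(phi[:, :, None] * theta[None, :, :], axis=1)  # (W, T, D)
    # M-step counts
    n_wt = np.einsum('dw,wtd->wt', n_dw, p)   # (W, T)
    n_td = np.einsum('dw,wtd->td', n_dw, p)   # (T, D)
    # regularized updates
    phi_new = norm_plus(n_wt + phi * dR_dphi, axis=0)        # over w in W
    theta_new = norm_plus(n_td + theta * dR_dtheta, axis=0)  # over t in T
    return phi_new, theta_new
```

With both derivatives set to zero this reduces to plain pLSA EM, which is exactly the point of the additive-regularization view.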

SLIDE 7

svd-artm

  • We extend ARTM with an SVD-based regularizer: the total likelihood of the dataset with ratings comprised of triples $D = \{(i, a, r)\}$ (user $i$ rated item $a$ as $r \in \{-1, 1\}$) is (see the sketch below)
    $$R(\Phi, \Theta) = \ln p(D \mid \mu, b_i, b_a, p_i, q_a, l_i, \theta_a) = \sum_{(i,a,r) \in D} \ln\left|[r = -1] - \sigma(\hat{r}_{i,a})\right|,$$
    where $[r = -1] = 1$ if $r = -1$ and $[r = -1] = 0$ otherwise, $\theta_a$ is the vector of topics trained for document $a$ in the LDA model, $\theta_a = \frac{1}{N_a}\sum_{w \in a} z_w$, where $N_a$ is the length of document $a$, and
    $$\hat{r}_{i,a} = \hat{r}^{\mathrm{SVD}}_{i,a} + \theta_a^\top l_i = \mu + b_i + b_a + q_a^\top p_i + \theta_a^\top l_i.$$

SLIDE 8

svd-artm

  • To add this regularizer to the pLSA model, we compute its partial derivatives (a sketch of the gradient computation follows below):
    $$\frac{\partial R(\Phi, \Theta)}{\partial \varphi_{wt}} = 0, \qquad \frac{\partial R(\Phi, \Theta)}{\partial \theta_{ta}} = \sum_{(i,a,r) \in D}\left([r = 1] - \sigma\left(\hat{r}^{\mathrm{SVD}}_{i,a} + \theta_a^\top l_i\right)\right)l_i.$$
  • It turns out to be exactly the same as the painstakingly developed approximation scheme for SVD-LDA!
  • So this is a good case study for ARTM: we got a reasonable scheme automatically.
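A minimal sketch of this gradient, under the same illustrative naming assumptions as the previous sketch (`r_hat_svd` keyed by (user, item) pairs is not from the paper):

```python
import numpy as np

def dR_dtheta(ratings, r_hat_svd, theta, l_user, n_topics):
    """Per-item gradient of the SVD regularizer:
    dR/dtheta_a = sum over (i, a, r) in D of
                  ([r = 1] - sigma(r_hat_svd[i, a] + theta_a . l_i)) * l_i."""
    grad = {a: np.zeros(n_topics) for a in theta}
    for i, a, r in ratings:
        s = 1.0 / (1.0 + np.exp(-(r_hat_svd[i, a] + theta[a] @ l_user[i])))
        grad[a] += ((1.0 if r == 1 else 0.0) - s) * l_user[i]
    return grad
```

This is the quantity that plugs into the regularized update from the ARTM slide, $\theta_{ta} = \mathrm{norm}^{+}_{t \in T}(n_{ta} + \theta_{ta}\,\partial R / \partial \theta_{ta})$, which is how the scheme comes out automatically.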

SLIDE 9

thank you!

Thank you for your attention!