SLIDE 1

user profiling in text-based recommender systems based on distributed word representations


Anton Alekseev and Sergey I. Nikolenko

Steklov Institute of Mathematics at St. Petersburg
National Research University Higher School of Economics, St. Petersburg
Kazan (Volga Region) Federal University, Kazan, Russia
Deloitte Analytics Institute, Moscow, Russia

April 7, 2016

SLIDE 2

intro: word embeddings

SLIDE 3

overview

  • Very brief overview of the paper:
  • we want to recommend full-text items to users;
  • in the input data, users like full-text items, and we’d like to construct thematic user profiles based on this;
  • to do so, we cluster the word embeddings of keywords;
  • then we propose a conceptual way to down-weight meaningless clusters of common words.

SLIDE 4

word embeddings

  • In this work, we construct user profiles based on texts.
  • To do so, we use distributed word representations (word embeddings).
  • Distributed word representations map each word occurring in the dictionary to a point in Euclidean space, attempting to capture semantic relationships between the words as geometric relationships in that space.

SLIDE 5

word embeddings

  • Started back in (Bengio et al., 2003), exploded after the works of Bengio et al. and Mikolov et al. (2009–2011), now used everywhere.
  • Basic idea: shallow neural networks trained to reconstruct contexts from words or words from contexts:
  • skip-gram: predict the context words $w_{t+j}$ from the word $w_t$;
  • CBOW: predict the word $w_t$ from its context $w_{t-k}, \ldots, w_{t+k}$;
  • GloVe: train a decomposition of the matrix of co-occurrences.
  • Word embeddings serve as building blocks for neural network approaches to NLP (see the training sketch below).
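To make the two objectives concrete, here is a minimal sketch of training such embeddings with the gensim library (an assumption: the slides name no toolkit, and the Russian vectors actually used in this work were pre-trained); the `sg` flag switches between CBOW and skip-gram.

```python
# Minimal sketch of training word embeddings with gensim (toy corpus;
# the vectors used in this work were pre-trained, not trained like this).
from gensim.models import Word2Vec

corpus = [
    ["users", "like", "full", "text", "items"],
    ["embeddings", "capture", "semantic", "relationships", "between", "words"],
]

# sg=0 trains CBOW (predict the word from its context);
# sg=1 would train skip-gram (predict context words from the word).
model = Word2Vec(corpus, sg=0, vector_size=100, window=5, min_count=1)

print(model.wv["embeddings"].shape)  # each word maps to a 100-dim vector
```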


SLIDE 6

word embeddings

  • Two main architectures:

[Diagram: the CBOW and skip-gram network architectures]

  • We use CBOW embeddings trained on a very large Russian dataset (thanks to Nikolay Arefyev and Alexander Panchenko!).

SLIDE 7

methods

SLIDE 8

tf-idf document profiles

  • We begin with baseline approaches.
  • Using distributed representations trained on a huge Russian corpus, we (see the sketch after this list):
  • clustered the word vectors, resulting in a semantic clustering;
  • used a vector representation of documents as the weighted sum (with tf-idf weights) of their word vectors;
  • stored baseline user profiles based on simple weighted sums of their likes in this document representation;
  • trained baseline recommender algorithms that use these profiles: ranking by cosine similarity, user-based and item-based collaborative filtering.
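A toy sketch of this baseline pipeline, assuming NumPy and scikit-learn; all data and names here (embeddings, tfidf, the documents) are illustrative stand-ins, not the actual corpus:

```python
# Toy sketch of the baseline pipeline (hypothetical names and data):
# k-means clusters over word vectors, documents as tf-idf weighted sums
# of word vectors, user profiles as sums of liked-document vectors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vocab = ["missile", "submarine", "blog", "facebook", "kissing", "scream"]
embeddings = {w: rng.normal(size=50) for w in vocab}  # stand-in word vectors
tfidf = {w: 1.0 for w in vocab}                       # stand-in tf-idf weights

# Semantic clustering of the word vectors themselves.
kmeans = KMeans(n_clusters=2, n_init=10)
clusters = kmeans.fit_predict(np.stack([embeddings[w] for w in vocab]))

def document_vector(tokens):
    """tf-idf weighted sum of a document's word vectors."""
    return np.sum([tfidf[w] * embeddings[w] for w in tokens if w in embeddings],
                  axis=0)

def user_profile(liked_docs):
    """Baseline profile: sum of liked-document vectors (unit weights here)."""
    return np.sum([document_vector(d) for d in liked_docs], axis=0)

profile = user_profile([["missile", "submarine"], ["blog", "facebook"]])
```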


SLIDE 9

new ideas and results

  • Main problem:
  • we have a clustering in the word vector space $\mathbb{R}^d$, which can also be applied to documents represented as vectors in $\mathbb{R}^d$;
  • we also have a users × documents matrix of likes;
  • how do we best compress it into individual user profiles?
  • We have tried decomposing this matrix with SVD and pLSA (see the sketch below), but with no good results. Two problems:
  • there are only likes in the dataset, no dislikes;
  • “junk” clusters with common words always fill user profiles, whatever we did.
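For illustration, a sketch of the SVD attempt on a toy binary likes matrix, assuming scikit-learn (pLSA would be analogous):

```python
# Sketch of the SVD attempt: factorize the binary users x documents
# matrix of likes (toy data). Such factor profiles did not work well,
# for the two reasons listed above.
import numpy as np
from sklearn.decomposition import TruncatedSVD

likes = np.array([
    [1, 0, 1, 0, 1],   # documents liked by user 0
    [0, 1, 0, 1, 0],   # documents liked by user 1
    [1, 1, 0, 0, 1],   # documents liked by user 2
])
svd = TruncatedSVD(n_components=2)
user_factors = svd.fit_transform(likes)   # candidate 2-dim user profiles
```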


SLIDE 10

new ideas and results

  • We can use the following natural idea:
  • represent a document as a vector of cluster likelihoods;
  • treat each user independently;
  • for every user, construct a logistic regression problem that models the probability of a like, with weights corresponding to clusters;
  • train the logistic regression; its weights constitute the user profile (see the sketch below).
  • But it also seems to suffer from the same problems: where do we get negative examples for the regression, and what do we do with “junk” clusters?
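A minimal sketch of this per-user regression, assuming scikit-learn; the data is synthetic and stands in for the cluster likelihood vectors:

```python
# Per-user logistic regression sketch (synthetic data): documents are
# vectors of cluster likelihoods, the target is liked / not liked, and
# the fitted weights become the user's profile over clusters.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_docs, n_clusters = 40, 5
X = rng.dirichlet(np.ones(n_clusters), size=n_docs)  # cluster likelihoods
y = rng.integers(0, 2, size=n_docs)                  # 1 = liked by this user

clf = LogisticRegression().fit(X, y)
profile = clf.coef_[0]   # one weight per cluster
```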


SLIDE 11

new ideas and results

  • We solve both problems with one stroke:
  • train several hundred balanced logistic regressions, choosing negative examples uniformly at random among the not-liked items;
  • then use the weight statistics (e.g., mean and variance) as the user profile;
  • this way, every logistic regression is balanced;
  • also, junk clusters with common words will now often appear in negative examples too, so they will have significantly higher variance than informative clusters!
  • Having constructed these profiles (see the sketch below), how do we make recommendations?
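A sketch of this bagging scheme under the same toy setup (scikit-learn assumed); the mean and variance of the learned weights form the profile:

```python
# Many balanced logistic regressions with negatives resampled uniformly
# from the not-liked items; per-cluster weight statistics = user profile.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_docs, n_clusters, n_rounds = 200, 5, 100
X = rng.dirichlet(np.ones(n_clusters), size=n_docs)   # cluster likelihoods
liked = rng.choice(n_docs, size=20, replace=False)    # this user's likes
not_liked = np.setdiff1d(np.arange(n_docs), liked)

weights = []
for _ in range(n_rounds):
    # balanced resample: as many random negatives as there are likes
    neg = rng.choice(not_liked, size=len(liked), replace=False)
    idx = np.concatenate([liked, neg])
    y = np.concatenate([np.ones(len(liked)), np.zeros(len(neg))])
    weights.append(LogisticRegression().fit(X[idx], y).coef_[0])

weights = np.array(weights)
profile_mean, profile_var = weights.mean(axis=0), weights.var(axis=0)
# junk clusters also show up among negatives, so their weights vary more
```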


SLIDE 12

new ideas and results

  • Recommender algorithm:
  • from the posterior distribution of weights (we used a normal distribution with the posterior mean and variance), sample several hundred different weight combinations;
  • predict the like probabilities under all these combinations;
  • rank documents by the mean predicted like probability (see the sketch below).
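A sketch of this sampling-based ranking step, continuing from the profile statistics computed in the previous snippet (names are illustrative):

```python
# Rank documents by sampling user weight vectors from a normal posterior
# approximation N(mean, var) and averaging predicted like probabilities.
import numpy as np

def rank_documents(X_docs, profile_mean, profile_var, n_samples=100, seed=3):
    rng = np.random.default_rng(seed)
    W = rng.normal(profile_mean, np.sqrt(profile_var),
                   size=(n_samples, len(profile_mean)))   # weight samples
    probs = 1.0 / (1.0 + np.exp(-X_docs @ W.T))           # sigmoid scores
    return np.argsort(-probs.mean(axis=1))                # best documents first
```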


SLIDE 13

sample user profile
(cluster id; mean and variance of the bagged regression weights; top words in the cluster)

  #     mean   variance   words
  867   0.772  0.165      hours two-hour break minute half-hour five-minute two-hour ten-hour...
  424   0.833  0.202      kissing call cry silent scream laughing nod dare restrain angry slam...
  837   0.399  0.010      youtube blog net mail facebook player online yandex user tor ado...
  366   0.396  0.042      associate attitude seems quite horoscope ideal religious face era...
  413   0.406  0.080      feel glad remember worrying offended jealous inhale pity envy suffer autumn...
  427   0.385  0.073      hijack bombing raid to steal loot bomb
  798   0.385  0.080      uro missile air defense mine RL submarine Vaenga Red Banner Pacific Fleet...

SLIDE 14

experimental evaluation

SLIDE 15

algorithms

  • So far we are comparing three baseline algorithms and our regression-based algorithm (a sketch of the cosine baseline follows below):

(1) cosine: find the documents nearest to a linear user profile with respect to cosine similarity;
(2) user-based collaborative filtering: find the nearest neighbors of a user and recommend documents according to their likes;
(3) item-based collaborative filtering: find the nearest neighbors of a document and recommend documents similar to the ones a user liked;
(4) regression-based algorithm: sample weights from the posterior distribution, recommend according to the averaged results.
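For completeness, a sketch of the first baseline's cosine ranking (NumPy assumed; names are illustrative):

```python
# Baseline (1): rank documents by cosine similarity to a linear user profile.
import numpy as np

def cosine_rank(profile, doc_vectors):
    docs = np.asarray(doc_vectors, dtype=float)
    sims = docs @ profile / (np.linalg.norm(docs, axis=1)
                             * np.linalg.norm(profile))
    return np.argsort(-sims)   # indices of the most similar documents first
```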


SLIDE 16

evaluation: metrics

  • In experimental evaluation, the regression-based recommender clearly outperforms all other methods:

  Algorithm       AUC    NDCG   Top1   Top5   Top10
  Cosine          0.514  0.779  0.511  2.471  4.757
  User-based CF   0.456  0.686  0.101  1.418  3.851
  Item-based CF   0.495  0.780  0.523  2.493  4.813
  Regression      0.530  0.796  0.562  2.667  5.153

  • Demo...


SLIDE 17

thank you!

Thank you for your attention!
