SVD-LDA: Topic Modeling for Full-Text Recommender Systems

SLIDE 1

Recommender systems From LDA to SVD-LDA

SVD-LDA: Topic Modeling for Full-Text Recommender Systems

Sergey Nikolenko

Steklov Mathematical Institute at St. Petersburg Laboratory for Internet Studies, National Research University Higher School of Economics, St. Petersburg

October 30, 2015

Sergey Nikolenko SVD-LDA

SLIDE 2

Outline

1. Recommender systems
   • Intro
   • Recsys overview

2. From LDA to SVD-LDA
   • Latent Dirichlet Allocation
   • SVD-LDA

SLIDE 3

Overview

Very brief overview of the paper:

• Our main goal is to recommend full-text items (posts in social networks, web pages, etc.) to users;
• in particular, we want to extend recommender systems with features coming from the texts; this is especially important for cold start;
• these features can come from topic modeling;
• in this work, we combine the classical SVD and LDA models into one, training them together.

SLIDE 4

Recommender systems

Recommender systems analyze user interests and attempt to predict what the current user will be most interested in right now. Collaborative filtering: given a sparse matrix of ratings assigned by users to items, predict the unknown ratings (and hence recommend the items with the best predictions):

• nearest neighbor methods (user–user and item–item), e.g., GroupLens;
• SVD (singular value decomposition): decompose the user × item matrix, reducing dimensionality as user × item = (user × feature)(feature × item) with very few features compared to users and items, and learn user and item features that can be used to make predictions.
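The decomposition above can be sketched on a toy ratings matrix; the data is purely illustrative, and a deployed recommender would factor only the observed entries rather than a dense matrix:

```python
import numpy as np

# Toy user x item rating matrix; dense SVD is used here only to show
# the dimensionality reduction itself.
R = np.array([[5., 4., 0., 1.],
              [4., 5., 1., 0.],
              [1., 0., 5., 4.],
              [0., 1., 4., 5.]])

k = 2                                     # number of latent features
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = U[:, :k] * np.sqrt(s[:k])             # user x feature (rows are p_i)
Q = Vt[:k, :].T * np.sqrt(s[:k])          # item x feature (rows are q_a)
R_hat = P @ Q.T                           # user x item = (user x feature)(feature x item)

print(np.round(R_hat, 2))
```

By the Eckart–Young theorem, this rank-k reconstruction is the best possible in Frobenius norm, which is why very few features can already capture most of the rating structure.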

SLIDE 5

Recommender systems

Formally speaking, SVD models a rating as

    r̂^{SVD}_{i,a} = µ + b_i + b_a + q_a⊤ p_i,

where

• b_i is the baseline predictor for user i;
• b_a is the baseline predictor for item a;
• q_a and p_i are the feature vectors for item a and user i.

Then b_i, b_a, q_a, p_i can be trained together by fitting actual ratings to the model (by alternating least squares). Importantly for us, if you have likes/dislikes rather than explicit ratings, you can use logistic SVD (trained by alternating logistic regression):

    p(Like_{i,a}) = σ( µ + b_i + b_a + q_a⊤ p_i ).
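A minimal sketch of logistic SVD with these components; the like/dislike data and hyperparameters are hypothetical, and plain SGD stands in for the alternating logistic regression mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical observations (user i, item a, y), y = 1 for "Like", 0 for "Dislike".
data = [(0, 0, 1), (0, 1, 1), (0, 2, 0), (1, 0, 1), (1, 2, 0), (2, 1, 0), (2, 2, 1)]
n_users, n_items, k = 3, 3, 2

mu = 0.0
b_user, b_item = np.zeros(n_users), np.zeros(n_items)
P = 0.1 * rng.standard_normal((n_users, k))    # user feature vectors p_i
Q = 0.1 * rng.standard_normal((n_items, k))    # item feature vectors q_a

lr = 0.1
for _ in range(500):
    for i, a, y in data:
        pred = sigmoid(mu + b_user[i] + b_item[a] + Q[a] @ P[i])
        g = y - pred                           # gradient of the Bernoulli log-likelihood
        mu += lr * g
        b_user[i] += lr * g
        b_item[a] += lr * g
        P[i], Q[a] = P[i] + lr * g * Q[a], Q[a] + lr * g * P[i]

p_like = sigmoid(mu + b_user[0] + b_item[0] + Q[0] @ P[0])
print(round(float(p_like), 3))   # P(Like) for user 0, item 0 (a "Like" in the data)
```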

SLIDE 6

Recommender systems

Many modifications of classical recommender systems use additional information:

• implicit user preferences (what the user viewed), e.g., SVD++;
• the time when ratings appear, e.g., timeSVD++;
• the social graph, when the users’ social network profiles are available;
• context-aware recommendations (time of day, situation, company, etc.);
• recommendations aware of other recommendations (optimizing diversity, novelty, serendipity).

In this work, we concentrate on the textual content of items.

SLIDE 7

Recommender systems

The main dataset for the project comes from a Russian recommender system Surfingbird:

• Surfingbird recommends web pages to users;
• a user clicks “Surf”, sees a new page, and maybe rates it by clicking “Like” or “Dislike”;
• web pages usually have content, often textual content;
• the text may be very useful for recommendations; how do we use it?

SLIDE 8

Outline

1. Recommender systems
   • Intro
   • Recsys overview

2. From LDA to SVD-LDA
   • Latent Dirichlet Allocation
   • SVD-LDA

SLIDE 9

Topic modeling with LDA

Latent Dirichlet Allocation (LDA) – topic modeling for a corpus of texts:

• a document is represented as a mixture of topics;
• a topic is a distribution over words;
• to generate a document, for each word we sample a topic and then sample a word from that topic;
• by learning these distributions, we learn which topics appear in the dataset and in which documents.
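The generative process above can be sketched directly; the vocabulary and topic–word distributions φ here are purely illustrative toy values:

```python
import numpy as np

# The LDA generative process on a toy model: 2 topics over a 4-word vocabulary.
rng = np.random.default_rng(1)
vocab = ["goal", "match", "gene", "dna"]
alpha = 0.5
phi = np.array([[0.45, 0.45, 0.05, 0.05],   # topic 0 favors "sports" words
                [0.05, 0.05, 0.45, 0.45]])  # topic 1 favors "biology" words

def generate_document(n_words):
    """Sample a document: draw its topic mixture, then a topic and a word per position."""
    theta = rng.dirichlet([alpha] * len(phi))      # the document's mixture of topics
    words = []
    for _ in range(n_words):
        z = rng.choice(len(phi), p=theta)          # sample a topic for this position
        words.append(vocab[rng.choice(len(vocab), p=phi[z])])  # sample a word from it
    return words

print(generate_document(8))
```

Inference (next slides) runs this process in reverse: given only the words, recover θ and φ.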

SLIDE 10

Topic modeling with LDA

Sample LDA result from (Blei, 2012):

SLIDE 11

Topic modeling with LDA

Sample LDA result from (Blei, 2012):

SLIDE 12

PGM for LDA

SLIDE 13

Inference in LDA

There are two major approaches to inference in probabilistic models with a loopy factor graph like LDA:

• variational approximations simplify the graph by approximating the underlying distribution with a simpler one, but with new parameters that are subject to optimization;
• Gibbs sampling approximates the underlying distribution by sampling a subset of variables conditional on fixed values of all other variables.

Both approaches have been applied to LDA. In a way, LDA is similar to SVD – it performs dimensionality reduction and, so to speak, decomposes document × word = (document × topic)(topic × word).

SLIDE 14

LDA likelihood

Thus, the total likelihood of the LDA model is

    p(z, w | α, β) = ∫_{θ,φ} p(θ | α) p(z | θ) p(w | z, φ) p(φ | β) dθ dφ.

And in Gibbs sampling, we sample

    p(z_w = t | z_{−w}, w, α, β) ∝ q(z_w, t, z_{−w}, w, α, β)
        = (n^{(d)}_{−w,t} + α) / ( Σ_{t′∈T} (n^{(d)}_{−w,t′} + α) ) · (n^{(w)}_{−w,t} + β) / ( Σ_{w′∈W} (n^{(w′)}_{−w,t} + β) ).
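A minimal collapsed Gibbs sampler implementing this sampling formula on a toy corpus (word ids 0..3, 2 topics; note that the document-side denominator Σ_{t′}(n^{(d)}_{−w,t′} + α) is the same for every t, so it drops out when the sampling distribution is normalized):

```python
import numpy as np

rng = np.random.default_rng(2)
docs = [[0, 0, 1, 1], [0, 1, 1, 2], [2, 2, 3, 3], [2, 3, 3, 3]]
V, T, alpha, beta = 4, 2, 0.5, 0.5

n_dt = np.zeros((len(docs), T))   # topic counts per document
n_tw = np.zeros((T, V))           # word counts per topic
n_t = np.zeros(T)                 # total words per topic
z = [[int(rng.integers(T)) for _ in doc] for doc in docs]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]; n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1

for _ in range(200):                              # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                           # remove w's assignment: the "-w" counts
            n_dt[d, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1
            # (n_dt + alpha) * (n_tw + beta) / sum_w'(n_tw' + beta), per candidate topic
            p = (n_dt[d] + alpha) * (n_tw[:, w] + beta) / (n_t + V * beta)
            t = int(rng.choice(T, p=p / p.sum())) # sample a new topic for w
            z[d][i] = t; n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1

phi = (n_tw + beta) / (n_t[:, None] + V * beta)   # estimated topic-word distributions
print(np.round(phi, 2))
```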

SLIDE 15

LDA extensions

There already exist LDA extensions relevant to our research:

• DiscLDA: LDA for classification with a class-dependent transformation in the topic mixtures;
• Supervised LDA: documents come with a response variable, and we mine topics that are indicative of the response;
• TagLDA: words have tags that mark context or linguistic features;
• Tag-LDA: documents have topical tags, and the goal is to recommend new tags to documents;
• Topics over Time: topics change their proportions with time;
• hierarchical modifications with nested topics are also important.

In this work, we develop a novel extension: SVD-LDA.

SLIDE 16

Supervised LDA

We begin with supervised LDA:

• it assumes that each document has a response variable;
• the purpose is to predict this variable rather than just “learn something about the dataset”;
• can we learn topics that are relevant to this specific response variable?

In recommender systems, the response variable would be the probability of a like, an explicit rating, or some other desirable action. This adds new variables to the graph.

SLIDE 17

PGM for sLDA

SLIDE 18

Supervised LDA

Mathematically, we add a factor corresponding to the response variable (Gaussian in sLDA):

    p(y_d | z, b, σ²) ∝ exp( −(1/2) (y_d − b⊤z̄_d − a)² ),

where z̄_d is the empirical topic distribution of document d. The total likelihood is now

    p(z | w, y, b, α, β, σ²) ∝ ∏_d [ B(n_d + α) / B(α) ] · ∏_t [ B(n_t + β) / B(β) ] · ∏_d exp( −(1/2) (y_d − b⊤z̄_d − a)² ),

SLIDE 19

Supervised LDA

The Gibbs sampling goes as

    p(z_w = t | z_{−w}, w, α, β) ∝ q(z_w, t, z_{−w}, w, α, β) · exp( −(1/2) (y_d − b⊤z̄_d − a)² )
        = (n^{(d)}_{−w,t} + α) / ( Σ_{t′∈T} (n^{(d)}_{−w,t′} + α) ) · (n^{(w)}_{−w,t} + β) / ( Σ_{w′∈W} (n^{(w′)}_{−w,t} + β) ) · exp( −(1/2) (y_d − b⊤z̄_d − a)² ),

but it is now a two-step iterative algorithm:

• sample z according to the equations above;
• train b, a as a regression.
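The two steps can be sketched as follows; all numbers are toy values, the word–topic count factor and the loop over all documents and words are omitted for brevity, and the function name is hypothetical:

```python
import numpy as np

# Illustrative sketch of one sLDA training round for a single word position
# in a 2-topic model.
rng = np.random.default_rng(3)
T, alpha = 2, 0.5
y_d = 1.2                        # response of the current document d
b, a = np.array([2.0, -1.0]), 0.1
n_dt = np.array([3.0, 4.0])      # topic counts of document d, excluding word w

def response_factor(t):
    """Gaussian factor exp(-(y_d - b^T z_bar_d - a)^2 / 2) if w is assigned topic t."""
    counts = n_dt.copy(); counts[t] += 1
    z_bar = counts / counts.sum()
    return np.exp(-0.5 * (y_d - b @ z_bar - a) ** 2)

# Step 1: sample z_w from the count factor reweighted by the response factor.
p = (n_dt + alpha) * np.array([response_factor(t) for t in range(T)])
p /= p.sum()
z_w = int(rng.choice(T, p=p))

# Step 2: refit b, a by least squares on (z_bar_d, y_d) pairs over all documents.
Z = np.array([[0.4, 0.6], [0.7, 0.3], [0.5, 0.5]])   # per-document z_bar_d (toy)
y = np.array([0.2, 1.1, 0.6])
X = np.hstack([Z, np.ones((len(y), 1))])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b, a = coef[:T], coef[T]
print(np.round(p, 3), z_w)
```

Since z̄_d sums to one, the design matrix is rank-deficient with an intercept; `lstsq` returns the minimum-norm least-squares fit, which is enough for a sketch.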

SLIDE 20

Logistic sLDA

Hence, our first results:

• a Gibbs sampling scheme for supervised LDA (the original paper offered only variational approximations);
• an extension of supervised LDA to handle logistic response variables: p = σ( b⊤z̄ + a ).

SLIDE 21

Logistic sLDA

Hence, our first results:

• logistic regression is used to train b, a;
• and the Gibbs sampling goes as

    p(z_w = t | z_{−w}, w, α, β) ∝ q(z_w, t, z_{−w}, w, α, β) · ∏_{x∈X_d} σ( b⊤z̄_d + a )^{y_x} ( 1 − σ( b⊤z̄_d + a ) )^{1−y_x}
        = (n^{(d)}_{−w,t} + α) / ( Σ_{t′∈T} (n^{(d)}_{−w,t′} + α) ) · (n^{(w)}_{−w,t} + β) / ( Σ_{w′∈W} (n^{(w′)}_{−w,t} + β) ) ×
          × exp( s_d log p_d + (|X_d| − s_d) log(1 − p_d) ),

where p_d = σ( b⊤z̄_d + a ) and s_d is the number of positive responses among the ratings X_d of document d.
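The exponent here is just the product of Bernoulli factors collapsed over X_d; a quick numeric check of that identity, taking p_d = σ(b⊤z̄_d + a) and s_d = Σ_{x∈X_d} y_x (the number of positive responses):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

p_d = sigmoid(0.7)                 # illustrative value of b^T z_bar_d + a
y = np.array([1, 0, 1, 1, 0])      # responses y_x for the ratings in X_d
s_d = int(y.sum())

# prod_x p^y (1-p)^(1-y)  vs  exp(s log p + (|X| - s) log(1 - p))
product_form = float(np.prod(p_d ** y * (1 - p_d) ** (1 - y)))
exp_form = float(np.exp(s_d * np.log(p_d) + (len(y) - s_d) * np.log(1 - p_d)))
print(np.isclose(product_form, exp_form))   # True
```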

SLIDE 22

SVD-LDA

The main result is a new unified SVD-LDA model. This time, we model the success probability as

    p(success_{i,a}) = σ( µ + b_i + b_a + q_a⊤ p_i + θ_a⊤ l_i ),

where b_i, b_a, q_a, p_i are the SVD predictors, θ_a are topic distributions, and l_i are user predictors for the topics. To avoid overfitting and improve cold start, we use the same l_i either for all users or for large clusters of users based on external (demographic) information.
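The predictor can be written down directly from this formula (all numeric values below are illustrative, not learned parameters):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def svd_lda_success_prob(mu, b_i, b_a, q_a, p_i, theta_a, l_i):
    """p(success_{i,a}) = sigma(mu + b_i + b_a + q_a^T p_i + theta_a^T l_i)."""
    return sigmoid(mu + b_i + b_a + q_a @ p_i + theta_a @ l_i)

theta_a = np.array([0.7, 0.2, 0.1])   # topic distribution of item a
l_i = np.array([0.5, -0.3, 0.0])      # (shared) user predictors for the topics
prob = svd_lda_success_prob(0.1, 0.2, -0.1,
                            np.array([0.3, 0.4]),    # q_a
                            np.array([0.1, -0.2]),   # p_i
                            theta_a, l_i)
print(round(float(prob), 3))
```

Sharing l_i across a whole user cluster means a brand-new item with text (hence θ_a) gets a meaningful score even before it collects any ratings, which is the cold-start benefit mentioned above.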

SLIDE 23

SVD-LDA

Now the total likelihood is

    p(D | µ, b_i, b_a, p_i, q_a, l_i, θ_a) = ∏_D σ( r̂^{SVD}_{i,a} + θ_a⊤ l_i )^{[r=1]} ( 1 − σ( r̂^{SVD}_{i,a} + θ_a⊤ l_i ) )^{[r=−1]},

    ln p(D | µ, b_i, b_a, p_i, q_a, l_i, θ_a) = Σ_D ( [r = 1] ln σ( r̂^{SVD}_{i,a} + θ_a⊤ l_i ) + [r = −1] ln( 1 − σ( r̂^{SVD}_{i,a} + θ_a⊤ l_i ) ) )
        = Σ_D ln | [r = −1] − σ( r̂^{SVD}_{i,a} + θ_a⊤ l_i ) |,

where r̂^{SVD}_{i,a} = µ + b_i + b_a + q_a⊤ p_i, [r = 1] is 1 if r = 1 and 0 otherwise, and θ_a = (1/N_a) Σ_{w∈a} z_w is the topic distribution of document a.
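The last equality collapses the two branches into one term, since |0 − σ(x)| = σ(x) and |1 − σ(x)| = 1 − σ(x); a quick numeric check:

```python
import numpy as np

# Check: [r=1] ln sigma(x) + [r=-1] ln(1 - sigma(x)) == ln |[r=-1] - sigma(x)|.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

for x in (-1.5, 0.0, 2.0):
    s = sigmoid(x)
    for r in (1, -1):
        two_branch = np.log(s) if r == 1 else np.log(1 - s)
        compact = np.log(abs((1 if r == -1 else 0) - s))
        assert np.isclose(two_branch, compact)
print("identity holds for r = ±1")
```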

SLIDE 24

SVD-LDA

Gibbs sampling now looks as follows:

p(zw = t | z−w, w, α, β) ∝ ∝ q(zw, t, z−w, w, α, β)p(D | µ, bi, ba, pi, qa, li, θw→t

a

) = = n(d)

−w,t + α

  • t ′∈T
  • n(d)

−w,t ′ + α

  • n(w)

−w,t + β

  • w ′∈W
  • n(w ′)

−w,t + β

p(D | µ, bi, ba, pi, qa, li, θw→t

a

) = = n(d)

−w,t + α

  • t ′∈T
  • n(d)

−w,t ′ + α

  • n(w)

−w,t + β

  • w ′∈W
  • n(w ′)

−w,t + β

× × exp

  • D
  • ln
  • [r = −1] − σ
  • ^

r SVD

i,a

+ l ⊤

i θw→t a

  • .

SLIDE 25

SVD-LDA

The training algorithm for SVD-LDA proceeds iteratively:

• train the SVD model parameters by SGD for fixed z;
• sample z for fixed SVD model parameters.

The problem is that sampling has become intractable: to compute the SVD factor, we need to go over all ratings in the recommender dataset for a given item, which is too slow for practical algorithms.

SLIDE 26

SVD-LDA

To solve this problem, we developed a (first-order) approximation to the Gibbs sampling scheme. After the computations, the approximation turns out to be very nice and simple:

    p(z_w = t | z_{−w}, w, α, β) ∝ q(z_w, t, z_{−w}, w, α, β) · p(D | µ, b_i, b_a, p_i, q_a, l_i, θ^{w→t}_a)
        ≈ (n^{(d)}_{−w,t} + α) / ( Σ_{t′∈T} (n^{(d)}_{−w,t′} + α) ) · (n^{(w)}_{−w,t} + β) / ( Σ_{w′∈W} (n^{(w′)}_{−w,t} + β) ) · exp( s_a⊤ θ^{w→t}_a )
        ∝ (n^{(d)}_{−w,t} + α) / ( Σ_{t′∈T} (n^{(d)}_{−w,t′} + α) ) · (n^{(w)}_{−w,t} + β) / ( Σ_{w′∈W} (n^{(w′)}_{−w,t} + β) ) · exp( s_t ).

This is what we used in the algorithm.

SLIDE 27

Results

During SVD-LDA training, SVD RMSE goes steadily down. (Plot: RMSE, y-axis roughly 0.50–0.52, over about 160 training iterations.)

SLIDE 28

Results

And the evaluation metrics all improve (sample results below).

    Model         T     f    NDCG     AUC      MAP      WTA      Top3     Top5
    SVD           —     5    0.9815   0.8800   0.9408   0.9454   0.9439   0.9428
    SVD-LDA       50    5    0.9829   0.8893   0.9417   0.9514   0.9469   0.9447
    SVD-LDA+dem   50    5    0.9840   0.8904   0.9428   0.9527   0.9481   0.9457
    SVD           —     20   0.9816   0.8802   0.9408   0.9453   0.9439   0.9428
    SVD-LDA       200   20   0.9828   0.8886   0.9417   0.9518   0.9468   0.9445
    SVD-LDA+dem   200   20   0.9837   0.8895   0.9427   0.9527   0.9478   0.9455

SLIDE 29

Future work

Although we have succeeded with SVD-LDA, there were some shortcomings:

• Gibbs sampling is slow;
• it is hard to develop and test many new extensions (we needed an approximation and got lucky that it worked).

We can address this by switching from LDA to pLSA (pLSI):

• the learning is simpler (basically SVD) and easier to scale;
• extensions come as different forms of regularizers, which are very simple to develop and test;
• pLSA regularizers are a very recent development, so there is still a lot of uncharted ground.

Another completely different but in some ways similar idea: distributed word representations.

SLIDE 30

Thank you for your attention!
