Recommender Systems
Presented by: Manish Mishra (271340), Raghavendran Tata (271441), Niraj Dev Pandey (271484)

Agenda
o Introduction
o Discussion of the papers
o Comparison of the papers
o Winning method

Online Update: Introduction
Online updates are central to the scalability, performance, and accuracy of recommender systems.
Papers discussed:
1. Sarwar, Badrul, et al. "Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems." Fifth International Conference on Computer and Information Science, 2002.
2. Vinagre, João, Alípio Mário Jorge, and João Gama. "Fast Incremental Matrix Factorization for Recommendation with Positive-Only Feedback." Springer International Publishing, 2014.
3. Matuszyk, Pawel, et al. "Forgetting Methods for Incremental Matrix Factorization in Recommender Systems." Proceedings of the 30th Annual ACM Symposium on Applied Computing, ACM, 2015.
Presented by – Manish Mishra
Paper 1: Incremental singular value decomposition algorithms for highly scalable recommender systems
Motivation:
o SVD-based collaborative filtering produces high-quality recommendations but has to undergo very expensive matrix factorization steps.
o The paper proposes an incremental technique for SVD-based CF that has the potential to be highly scalable while producing good predictive accuracy.
o The incremental approach builds the model in a fraction of the time while producing similar prediction accuracy.
Singular Value Decomposition (SVD): a matrix factorization technique for producing low-rank approximations.

SVD(A_{m×n}) = U_{m×m} × S_{m×n} × V^T_{n×n}

o U and V are orthogonal matrices; S is a diagonal matrix with only r nonzero entries such that s_i > 0 and s_1 ≥ s_2 ≥ … ≥ s_r, where r is the rank of matrix A.
o The r columns of U corresponding to the nonzero singular values span the column space of A (eigenvectors of AA^T), and the r columns of V span the row space of the matrix A (eigenvectors of A^T A).
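As a minimal illustration of these definitions (not from the paper), the decomposition and its rank-k truncation can be computed with numpy; the toy matrix A and the choice k = 2 below are arbitrary:

```python
import numpy as np

# Toy user-item rating matrix (m = 4 rows, n = 5 columns); values are arbitrary.
A = np.array([[5., 3., 0., 1., 4.],
              [4., 0., 0., 1., 2.],
              [1., 1., 0., 5., 4.],
              [0., 1., 5., 4., 3.]])

# SVD: A = U @ diag(s) @ Vt, with singular values s1 >= s2 >= ... >= 0.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation: keep only the k largest singular values.
k = 2
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Best rank-k approximation of A in the least-squares sense.
A_k = U_k @ S_k @ Vt_k
print(np.round(A_k, 2))
```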
Use of SVD in collaborative filtering:
o The low-rank approximation captures the latent customer-product relationship; each latent dimension corresponds to an eigenvector.
o Truncating to the r largest singular values gives the best rank-r approximation: A_{m×n} ≈ U_{m×r} × S_{r×r} × V^T_{r×n}.
*Ref: [1]
Prediction generation:

P_{i,j} = r̄_i + (U_k × √S_k^T)(i) · (√S_k × V_k^T)(j)

where P_{i,j} is the prediction for the ith customer and jth product, and r̄_i is the row average. Once the SVD decomposition is done, the prediction generation process involves only a dot product computation, which takes O(1) time, since k is a constant.
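A minimal sketch of this prediction rule in Python, assuming the factors U_k, S_k, Vt_k from the snippet above and a row-average vector r_bar computed before mean-centering A (the function name and arguments are illustrative):

```python
import numpy as np

def predict(r_bar, U_k, S_k, Vt_k, i, j):
    """P_ij = r_bar_i + (U_k * sqrt(S_k))[i] . (sqrt(S_k) * Vt_k)[:, j].
    Assumes A was normalized by subtracting row averages before the SVD."""
    sqrt_S = np.sqrt(S_k)
    user_profile = U_k @ sqrt_S      # m x k matrix of customer factors
    item_profile = sqrt_S @ Vt_k     # k x n matrix of product factors
    # A single k-dimensional dot product: O(1) time for constant k.
    return r_bar[i] + user_profile[i, :] @ item_profile[:, j]
```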
The entire recommender system algorithm works in two separate steps: an offline step that builds the SVD model, and an online step that generates predictions from it.
Algorithm: Folding-in (as per the paper)
P = U_k × U_k^T × N_u

The result is appended to the k×n matrix S_k·V_k^T.
Algorithm: Folding-in (Reference [2])
o For a new user vector u (1×n), a projection onto the current product vectors (V_k) is computed as:
  û = u × V_k × S_k^{-1}
o For a new product vector d (m×1), a projection onto the current user vectors (U_k) is computed as:
  d̂ = d^T × U_k × S_k^{-1}

*Ref: [1]
Pseudocode: Folding-in
Input: new user vectors u_1, …, u_u; current model U_k, S_k, V_k
Output: updated U_k
1 for i = 1 to u do
2     û_i := u_i × V_k × S_k^{-1}
3     append û_i as a new row of U_k
4 end
5 return U_k
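A runnable sketch of the folding-in pseudocode above, assuming numpy factors as in the earlier snippets (variable names are illustrative):

```python
import numpy as np

def fold_in_users(new_users, U_k, S_k, Vt_k):
    """Project each new user vector (1 x n) into the k-dimensional
    latent space and append it as a new row of U_k:
        u_hat = u x V_k x S_k^-1
    S_k and Vt_k stay unchanged, which is what makes folding-in cheap."""
    S_inv = np.linalg.inv(S_k)
    for u_vec in np.atleast_2d(new_users):
        u_hat = u_vec @ Vt_k.T @ S_inv   # shape (k,)
        U_k = np.vstack([U_k, u_hat])
    return U_k
```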
Incremental model building with folding-in: starting from the rank-k model

A_{m×n} ≈ U_{m×k} × S_{k×k} × V^T_{k×n}

folding in u new users yields

A'_{(m+u)×n} = Û_{(m+u)×k} × S_{k×k} × V^T_{k×n}

where the ith new user vector d_i (1×n) is projected as

d̂_i = d_i × V_{n×k} × S_k^{-1}

and appended as the (m+i)th row of U, giving Û_{(m+u)×k}.
Evaluation metric: Mean Absolute Error (MAE)

MAE = (1/N) × Σ_{i=1}^{N} |p_i − q_i|

where ⟨p_i, q_i⟩ is a ratings-prediction pair and N is the number of pairs.
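The metric is a one-liner in numpy (a sketch; the function and argument names are illustrative):

```python
import numpy as np

def mae(predictions, ratings):
    """MAE = (1/N) * sum over i of |p_i - q_i|."""
    p, q = np.asarray(predictions), np.asarray(ratings)
    return np.abs(p - q).mean()
```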
Experimental setup:
o Data source: MovieLens (www.movielens.umn.edu)
o Ratings: 100,000 ratings (users with 20 or more ratings considered)
o User-Movie matrix: 943 users (rows) and 1682 movies (columns)
o Test-training ratio x: 80%, 50% and 20%
Two hyperparameters need to be optimized before the experiment:
o Number of dimensions k: the model was trained with a different number of dimensions each time; the results were plotted and k = 14 was obtained as an optimal value.
o Basis size: an initial SVD model was built from a basis subset of users, then the rest of the (total − basis) users were projected with the folding-in technique to incrementally compute the SVD model for additional users. MAE was plotted for the experiments and the optimal basis size was chosen.
Select a basis size that is small enough to produce fast model building yet large enough to produce good prediction quality
*Ref: [1]
Results (x = 0.8):
o MAE is 0.733 at the full model size and 0.742 at a model size of 600 (only a 1.22% quality drop) => even with a small basis size it is possible to obtain good quality.
o At basis size 600 the throughput is 88.82, whereas at basis size 943 (full model) it drops to 48.9: an 81.63% performance gain.
*Ref: [1]
Conclusion:
o SVD-based CF offers good prediction performance, but computing the SVD is very expensive.
o The folding-in technique helps SVD-based recommender systems achieve high scalability while providing good predictive accuracy.
References:
[1] Sarwar, Badrul, et al. "Incremental singular value decomposition algorithms for highly scalable recommender systems." Fifth International Conference on Computer and Information Science, 2002.
[2] Deerwester, S., et al. "Indexing by Latent Semantic Analysis." Journal of the American Society for Information Science, 41(6).
[3] … Master's thesis, The University of Knoxville, Knoxville, TN, 1995.
Latent Semantic Indexing and Updating (Folding-in) Example
Original data and its term-document matrix
The "c" documents refer to human-computer interaction and the "m" documents refer to graph theory. The entries of the 12×9 term-document matrix are the frequencies with which a term occurs in a document or title.
*Ref: [4]
SVD: selecting k = 2, the best rank-2 approximation to A is A_2 = U_2 × S_2 × V_2^T.
*Ref: [4]
Two-dimensional plot of terms and documents for the 12×9 example.
o Terms representation: x-axis: 1st column of U_2 scaled by s_1; y-axis: 2nd column of U_2 scaled by s_2.
o Documents representation: x-axis: 1st column of V_2 scaled by s_1; y-axis: 2nd column of V_2 scaled by s_2.
Notice that the documents and terms pertaining to human-computer interaction are clustered around the x-axis and the graph-theory-related terms and documents are clustered around the y-axis.
*Ref: [4]
Suppose another document d = "human computer" needs to be added. Then the document vector is d_{12×1} = [1 0 1 0 0 0 0 0 0 0 0 0]^T. The projection of d_{12×1} is d̂ = d^T × U_2 × S_2^{-1}. This can be appended as a column in V^T to give U_{m×k} × S_{k×k} × V^T_{k×(n+1)}.
*Ref: [4]
Paper 2: Fast incremental matrix factorization for recommendation with positive-only feedback
Presented by – Raghavendran Tata
Motivation:
o Batch matrix factorization algorithms need to reprocess the entire data set when new data arrives.
o Recommender systems increasingly have to deal with streaming data.
o Retraining the model from scratch on growing data becomes too expensive.
o The paper proposes an incremental matrix factorization algorithm (ISGD) for positive-only feedback.
o It maintains competitive accuracy while being significantly faster.
Collaborative filtering (CF): predicting unknown user preferences from a set of known user preferences. CF makes automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating).
Matrix factorization: the latent user and item factors are learned by performing stochastic gradient descent only over the known ratings in the training set, actually taking advantage of the high sparsity of R (see the sketch below).
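A minimal sketch of an ISGD-style update in this spirit; the learning rate, regularization, and factor initialization are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def isgd_update(P, Q, u, i, lr=0.05, reg=0.01):
    """One incremental SGD step for a single observed event (u, i).
    For positive-only feedback the observed 'rating' is 1, so the
    error is 1 - p_u . q_i; only the two affected factor rows move."""
    p_u, q_i = P[u].copy(), Q[i].copy()
    err = 1.0 - p_u @ q_i
    P[u] = p_u + lr * (err * q_i - reg * p_u)
    Q[i] = q_i + lr * (err * p_u - reg * q_i)

# Illustrative usage: k = 10 latent features, random initialization.
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(100, 10))   # user factors
Q = rng.normal(scale=0.1, size=(500, 10))   # item factors
isgd_update(P, Q, u=3, i=42)
```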
* Ref [1]
Classic evaluation methodologies for recommender systems begin by splitting the ratings dataset in two subsets – training set and testing set – randomly choosing data elements from the initial dataset. However, there are some issues: a random split ignores the temporal order of the ratings, and a one-time split cannot model a continuous stream of incoming data. The paper therefore uses a prequential (test-then-learn) evaluation, sketched below.
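A sketch of the prequential loop, assuming hypothetical recommend() and update() helpers (e.g. the isgd_update above): each incoming event is first used to test the current model, then to update it:

```python
def prequential_eval(stream, model, recommend, update, n_rec=10):
    """Prequential (test-then-learn) evaluation over a rating stream.
    stream yields (user, item) events in their natural temporal order."""
    hits = 0
    for n, (u, i) in enumerate(stream, start=1):
        top_n = recommend(model, u, n_rec)   # 1. test: score before learning
        hits += int(i in top_n)
        update(model, u, i)                  # 2. learn: incremental update
    return hits / n                          # overall recall@n_rec
```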
Error measure: true values are taken as "1" and the error is measured as err_ui = 1 − r̂_ui (unobserved entries are treated as missing values).
* Ref [1]
The experiments ran 4 algorithms and compared their update times and recall@N values.
* Ref [1]
* Ref [1]
* Ref [1]
Conclusion:
o ISGD is an incremental matrix factorization algorithm for streams of positive-only feedback.
o It achieves competitive accuracy while being significantly faster than the compared alternatives.
Reference: [1] Vinagre, João, Alípio Mário Jorge, and João Gama. "Fast incremental matrix factorization for recommendation with positive-only feedback." Springer International Publishing, 2014.
Presented by – Niraj Dev Pandey
Paper 3: Forgetting methods for incremental matrix factorization in recommender systems
Motivation:
o Recommender systems learn users' preferences, but preferences are not static.
o Models must be kept up to date, as the taste of users and the perception of items change over time.
o Forgetting methods are proposed and incorporated into a stream recommender.
o It is shown that forgetting outdated data increases the quality of recommendations substantially (a sketch follows below).
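A minimal sketch of two forgetting flavors in the spirit of the paper (a sliding window over each user's events and a multiplicative fading of the user's latent vector); alpha, the window size, and the reuse of the isgd_update sketch from earlier are illustrative assumptions:

```python
from collections import deque

user_windows = {}   # per-user memory of the most recent items

def observe_with_forgetting(P, Q, u, i, alpha=0.99, window_size=100):
    """Process one event (u, i) with forgetting:
    - fading factor: damp P[u] before learning so that outdated
      preferences gradually lose their influence;
    - sliding window: remember only the user's last `window_size`
      items (the deque silently drops the oldest one)."""
    P[u] *= alpha
    w = user_windows.setdefault(u, deque(maxlen=window_size))
    w.append(i)
    isgd_update(P, Q, u, i)   # incremental learning step from the earlier sketch
```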
Concept drift: the underlying data distribution of a stream changes over time, which makes it necessary to analyze the incoming data in an online manner (https://en.wikipedia.org/wiki/Concept_drift).
Evaluation metric: sliding RMSE = √( (1/|T|) × Σ_{(u,i)∈T} (r̂_ui − r_ui)² ) (where T is a test set of the most recent ratings); a small sketch follows below.
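A sketch of the sliding RMSE over a fixed-size window of the most recent rating-prediction pairs (the window size is an illustrative assumption):

```python
import numpy as np
from collections import deque

window = deque(maxlen=1000)   # T: the most recent (prediction, rating) pairs

def sliding_rmse(r_hat, r):
    """Add the newest pair and return the RMSE over the window T."""
    window.append((r_hat, r))
    errs = np.array([(p - q) ** 2 for p, q in window])
    return float(np.sqrt(errs.mean()))
```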
Evaluation measure: sliding RMSE. Datasets: MovieLens (random sample of 1000 users) and Epinions (extended).
Table 1: Average values of sliding RMSE for each dataset (lower values are better). Our forgetting strategy outperforms the non-forgetting strategy on all datasets.
Conclusion:
o Forgetting outdated data improves the quality of recommendations.
o The approach works incrementally, taking the initial training and temporal aspects into account.
References:
[1] Matuszyk, Pawel, et al. "Forgetting methods for incremental matrix factorization in recommender systems." Proceedings of the 30th Annual ACM Symposium on Applied Computing. ACM, 2015.
[2] … Springer US.
Comparison of the papers:
o Incremental SVD (Paper 1): fast, incremental model building for SVD-based CF recommender systems via folding-in, which requires less time and storage space.
o Incremental SGD (Paper 2): fast incremental matrix factorization (ISGD) for positive-only feedback, together with a prequential evaluation framework for streaming data.
o Selective Forgetting for Incremental Matrix Factorization (Paper 3): incremental matrix factorization using forgetting techniques, modeling user preferences from recent, relevant data; the forgetting mechanism works with limited space and copes with sparsity.