Training Deep AutoEncoders for Collaborative Filtering (PowerPoint presentation)



SLIDE 1

Oleksii Kuchaiev & Boris Ginsburg

Training Deep AutoEncoders for Collaborative Filtering

SLIDE 2

Motivation

Personalized recommendations

SLIDE 3

Key points (spoiler alert)

1. Deep autoencoder for collaborative filtering
   • Improves generalization
2. Right activation function (SELU, ELU, LeakyReLU) enables deep architectures
   • No layer-wise pre-training or skip connections
3. Heavy use of dropout
4. Dense re-feeding for faster and better training
5. Beats other models on time-split Netflix data (RMSE of 0.9099 vs 0.9224)
6. Code (PyTorch-based): https://github.com/NVIDIA/DeepRecommender

Oleksii Kuchaiev and Boris Ginsburg, "Training Deep AutoEncoders for Collaborative Filtering", arXiv preprint arXiv:1708.01715 (2017).

SLIDE 4

Autoencoders & collaborative filtering

• Effects of the activation types
• Overfitting the data
• Going deeper
• Dropout
• Dense re-feeding
• Conclusions


SLIDE 5

Collaborative filtering

Rating prediction

[Figure: the rating matrix R (m users × n items), approximated as the product of two rank-r factor matrices (r hidden factors); most entries of R are unobserved.]

One of the most popular approaches: Alternating Least Squares (ALS)

R(i,j) = k iff user i gave item j rating k
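The ALS idea can be sketched on a toy rating matrix (NumPy sketch; variable names, hyperparameters and the toy data are illustrative, not from the paper):

```python
import numpy as np

def als(R, rank=2, n_iters=20, reg=0.1):
    """Alternating Least Squares on a masked rating matrix.
    R: (m, n) array with 0 marking unrated entries."""
    m, n = R.shape
    mask = R != 0
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(m, rank))  # user factors
    V = rng.normal(scale=0.1, size=(n, rank))  # item factors
    I = reg * np.eye(rank)
    for _ in range(n_iters):
        # Fix V: solve a small ridge regression per user over that user's rated items
        for i in range(m):
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + I, Vi.T @ R[i, mask[i]])
        # Fix U: same per item
        for j in range(n):
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + I, Uj.T @ R[mask[:, j], j])
    return U, V

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
U, V = als(R)
pred = U @ V.T  # dense predictions, including the unrated 0 cells
```

Each inner solve is a tiny ridge regression; the deck's contribution is replacing this bilinear model with a deep nonlinear autoencoder.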

SLIDE 6

Autoencoder

Deep learning tool of choice for dimensionality reduction

Encoder (2 layers):
e1 = g(W_e1 x + c1)
z = e2 = W_e2 e1 + c2    (z = encoder(x), the encoding)

Decoder (2 layers):
f1 = g(W_f1 z + c3)
y = f2 = W_f2 f1 + c4    (y = decoder(z), the reconstruction of x)

so y = decoder(encoder(x)).

An autoencoder can be thought of as a generalization of PCA. It is "constrained" if the decoder weights are the transpose of the encoder weights, and "de-noising" if noise is added to x.

Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
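A minimal forward-pass sketch of such a 2-layer-encoder / 2-layer-decoder autoencoder (NumPy; layer sizes, weight names and the ELU stand-in for g are illustrative; the authors' actual implementation is PyTorch):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d1, d2 = 1000, 128, 32   # input and hidden dimensions (illustrative)

def g(v):
    # ELU as a stand-in for the activation g
    return np.where(v > 0, v, np.expm1(v))

# Encoder and decoder weights (small random init)
W_e1, c1 = rng.normal(scale=0.01, size=(d1, n_items)), np.zeros(d1)
W_e2, c2 = rng.normal(scale=0.01, size=(d2, d1)), np.zeros(d2)
W_f1, c3 = rng.normal(scale=0.01, size=(d1, d2)), np.zeros(d1)
W_f2, c4 = rng.normal(scale=0.01, size=(n_items, d1)), np.zeros(n_items)

def encoder(x):
    e1 = g(W_e1 @ x + c1)
    return W_e2 @ e1 + c2         # z, the low-dimensional code

def decoder(z):
    f1 = g(W_f1 @ z + c3)
    return W_f2 @ f1 + c4         # y, reconstruction of x

x = rng.normal(size=n_items)
y = decoder(encoder(x))           # y = decoder(encoder(x))
```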

SLIDE 7

AutoEncoders for recommendations

User (item) based

Sedhain, Suvash, et al. "Autorec: Autoencoders meet collaborative filtering." Proceedings of the 24th International Conference on World Wide Web. ACM, 2015.

[Figure: a (very) sparse rating vector r is encoded into z and decoded into a dense prediction y.]

Masked Mean Squared Error
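The masked loss only scores items the user actually rated; a minimal sketch (NumPy; names and example values are illustrative):

```python
import numpy as np

def masked_mse(r, y):
    """Mean squared error over rated items only; r uses 0 for 'not rated'."""
    mask = r != 0
    return np.mean((r[mask] - y[mask]) ** 2)

r = np.array([0., 3., 0., 5., 1.])   # sparse ratings
y = np.array([2., 3., 4., 4., 1.])   # dense reconstruction
loss = masked_mse(r, y)              # only indices 1, 3, 4 contribute
```

The unrated positions (here indices 0 and 2) carry no gradient, so the model is never penalized for its predictions on items the user has not seen.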

SLIDE 8

Dataset

Time split to predict future ratings

Netflix prize training data set

Wu, Chao-Yuan, et al. "Recurrent recommender networks." Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 2017.
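A time-based split like this one (train on the past, evaluate on future ratings) can be sketched as follows; the tuple layout, dates and cutoff are illustrative, not the actual Netflix split:

```python
from datetime import date

# (user, item, rating, date) tuples; field layout is illustrative
ratings = [
    (1, 10, 5.0, date(2005, 3, 1)),
    (1, 11, 3.0, date(2005, 9, 15)),
    (2, 10, 4.0, date(2006, 1, 2)),
    (2, 12, 2.0, date(2006, 6, 30)),
]
cutoff = date(2005, 12, 31)

# Train on everything up to the cutoff, predict the future
train = [r for r in ratings if r[3] <= cutoff]
test = [r for r in ratings if r[3] > cutoff]
```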

SLIDE 9

Benchmark

Netflix prize training data set

RRN: Wu, Chao-Yuan, et al. "Recurrent recommender networks." Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 2017.
I-AR, U-AR: Sedhain, Suvash, et al. "Autorec: Autoencoders meet collaborative filtering." Proceedings of the 24th International Conference on World Wide Web. ACM, 2015.
PMF: Mnih, Andriy, and Ruslan R. Salakhutdinov. "Probabilistic matrix factorization." Advances in neural information processing systems. 2008.

Masked mean squared error (s_j = actual rating, z_j = prediction; only rated items, s_j ≠ 0, contribute):

MMSE = Σ_{j: s_j ≠ 0} (s_j − z_j)² / Σ_{j: s_j ≠ 0} 1,    RMSE = √MMSE

SLIDE 10

Autoencoders & collaborative filtering

• Effects of the activation types
• Overfitting the data
• Going deeper
• Dropout
• Dense re-feeding
• Conclusions


SLIDE 11

Activation function matters

  • We found that on this task ELU, SELU and LeakyReLU perform much better than sigmoid, ReLU, ReLU6, tanh and Swish
  • Apparently important: (a) a non-zero negative part, (b) an unbounded positive part

[Figure: training RMSE per mini-batch (x-axis: iteration, y-axis: RMSE). All lines correspond to a 4-layer autoencoder (2-layer encoder, 2-layer decoder) with hidden dimension 128; line colors correspond to different activation functions.]
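The two properties can be read off the activation definitions themselves (NumPy sketch; the SELU constants are the standard ones from Klambauer et al., 2017):

```python
import numpy as np

def relu(x):
    # zero negative part, unbounded positive part
    return np.maximum(0.0, x)

def lrelu(x, a=0.01):
    # non-zero (linear) negative part, unbounded positive part
    return np.where(x > 0, x, a * x)

def elu(x, a=1.0):
    # smooth non-zero negative part saturating at -a, unbounded positive part
    return np.where(x > 0, x, a * np.expm1(x))

def selu(x):
    # scaled ELU; lambda and alpha chosen for self-normalization
    lam, a = 1.0507009873554805, 1.6732632423543772
    return lam * np.where(x > 0, x, a * np.expm1(x))
```

ReLU, ReLU6, sigmoid and tanh each violate at least one of the two properties (zero negative part, or a bounded positive part), matching the deck's observation that they train poorly here.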

SLIDE 12

Autoencoders & collaborative filtering

• Effects of the activation types
• Overfitting the data
• Going deeper
• Dropout
• Dense re-feeding
• Conclusions


SLIDE 13

Overfit your data

Wide layers generalize poorly

[Figure: training and evaluation RMSE per epoch for single-hidden-layer autoencoders (e1 = g(W_e1 x + c1), y = W_e2 e1 + c2) with hidden size d = 128, 256, 512, 1024. Wider layers fit the training data better, but evaluation RMSE stays above 1.1 on Netflix full.]

SLIDE 14

Autoencoders & collaborative filtering

• Effects of the activation types
• Overfitting the data
• Going deeper
• Dropout
• Dense re-feeding
• Conclusions


SLIDE 15

Deeper models

Generalize better

[Figure: deep autoencoder schematic.] No layer-wise pre-training necessary!

SLIDE 16

Autoencoders & collaborative filtering

• Effects of the activation types
• Overfitting the data
• Going deeper
• Dropout
• Dense re-feeding
• Conclusions


SLIDE 17

Dropout

Helps wider models generalize

[Figure: evaluation RMSE per epoch for drop probabilities 0.0, 0.5, 0.65 and 0.8, applied to the 1024-unit coding layer of a 512-512-1024-512-512 autoencoder.]
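Dropout at the coding layer can be sketched as inverted dropout (NumPy; names and sizes are illustrative):

```python
import numpy as np

def dropout(z, p_drop, rng, train=True):
    """Inverted dropout: zero units with probability p_drop,
    rescale survivors by 1/(1-p_drop) so the expectation is unchanged."""
    if not train or p_drop == 0.0:
        return z
    keep = rng.random(z.shape) >= p_drop
    return z * keep / (1.0 - p_drop)

rng = np.random.default_rng(0)
z = np.ones(10_000)               # stand-in for the coding-layer activations
zd = dropout(z, p_drop=0.8, rng=rng)
```

With p_drop = 0.8 only about 20% of coding-layer units survive each step, and the rescaling means no correction is needed at evaluation time (train=False simply returns z).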

SLIDE 18

Autoencoders & collaborative filtering

• Effects of the activation types
• Overfitting the data
• Going deeper
• Dropout
• Dense re-feeding
• Conclusions


SLIDE 19

Dense re-feeding

Intuition: idealized scenario

Imagine a perfect f: for every rated item, f(x)_j = x_j (∀ j: x_j ≠ 0), and if the user later rates a new item k with rating r, then f(x)_k = r. Such an f predicts all present and future ratings, so f(f(x)) = f(x); by induction, f(x) should be a fixed point of f for every valid x. Note that x is sparse but f(x) is dense, and for x most of the loss is masked.

SLIDE 20

Dense re-feeding

Attempt to enforce fixed point constraint

[Figure: (very) sparse x → dense f(x); dense f(x) → dense f(f(x)).]

• Update with real data x (masked loss on f(x))
• Update with synthetic dense data f(x) (loss on f(f(x)) against f(x))
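One way to sketch the two updates, using a single linear map as a stand-in for the autoencoder f (everything here, the model, learning rate, and hand-written gradients, is illustrative, not the authors' PyTorch implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.normal(scale=0.1, size=(n, n))   # stand-in single-layer "autoencoder"

def f(x):
    return W @ x

def masked_mse_grad(x_in, target, mask):
    """Gradient of the masked MSE w.r.t. W (target treated as constant)."""
    err = (f(x_in) - target) * mask
    return np.outer(err, x_in) * (2.0 / mask.sum())

def dense_refeed_step(x, lr=0.01):
    global W
    mask = (x != 0).astype(float)
    # 1) update on the real sparse example: loss masked to rated items
    W -= lr * masked_mse_grad(x, x, mask)
    # 2) re-feed: treat the dense output f(x) as a new, fully observed example
    y = f(x)
    W -= lr * masked_mse_grad(y, y, np.ones(n))

x = np.array([0., 3., 0., 5., 1., 0.])
for _ in range(200):
    dense_refeed_step(x)
```

The second update pushes f toward the fixed-point property f(f(x)) = f(x) from the previous slide, and, unlike the first, produces gradient signal at every output unit.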

SLIDE 21

Dense re-feeding

Together with bigger LR improves generalization

[Figure: evaluation RMSE per epoch for Baseline, Baseline LR 0.005, Baseline RF (dense re-feeding), and Baseline LR 0.005 + RF.]

SLIDE 22

Results

Netflix time split data

RRN: Wu, Chao-Yuan, et al. "Recurrent recommender networks." Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 2017.
I-AR, U-AR: Sedhain, Suvash, et al. "Autorec: Autoencoders meet collaborative filtering." Proceedings of the 24th International Conference on World Wide Web. ACM, 2015.

DeepRec is our 6-layer model.

SLIDE 23

Conclusions

1. Autoencoders can replace ALS and be competitive with other methods
2. Deeper models generalize better
   • No layer-wise pre-training is necessary
3. Right activation function enables deep architectures
   • A non-zero negative part is important
   • So is an unbounded positive part
4. Heavy use of dropout is needed for wider models
5. Dense re-feeding further improves generalization

SLIDE 24

Oleksii Kuchaiev and Boris Ginsburg, "Training Deep AutoEncoders for Collaborative Filtering", arXiv preprint arXiv:1708.01715 (2017).

Code, docs and tutorial: https://github.com/NVIDIA/DeepRecommender