 
Outer Product-based Neural Collaborative Filtering

Xiangnan He (1), Xiaoyu Du (2), Xiang Wang (1), Feng Tian (3), Jinhui Tang (4) and Tat-Seng Chua (1)
1: National University of Singapore
2: Chengdu University of Information Technology
3: Northeast Petroleum University
4: Nanjing University of Science and Technology
Ø Matrix factorization (MF): a prevalent model for collaborative filtering
  – Represent a user (or an item) as a vector of latent factors (also termed an embedding)
  – Estimate an interaction as the inner product between the user embedding and the item embedding: $\hat{y}_{ui} = f(\mathbf{p}_u, \mathbf{q}_i) = \mathbf{p}_u^\top \mathbf{q}_i$
Ø Many extensions of MF
  – Model perspective: NeuMF [He et al., WWW'17], Factorization Machine, etc.
  – Learning perspective: BPR, Adversarial Personalized Ranking [He et al., SIGIR'18]
[Figure: MF as a neural architecture. Input layer (sparse one-hot): user (u), item (i) → embedding layer: user embedding P (M×K), item embedding Q (N×K) → interaction function f(p_u, q_i) → prediction ŷ_ui]
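As a minimal sketch in plain Python, the MF prediction reduces to an embedding lookup followed by an inner product. The one-hot lookup is written out explicitly to mirror the sparse input layer in the figure; the tables `P` and `Q` and the helper names are hypothetical toy values, not learned embeddings.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def lookup(one_hot: Vector, table: Matrix) -> Vector:
    """Embedding layer: multiplying a one-hot row by the table selects one row."""
    return [sum(x * row[k] for x, row in zip(one_hot, table)) for k in range(len(table[0]))]


def mf_predict(p_u: Vector, q_i: Vector) -> float:
    """MF interaction function f(p_u, q_i): the inner product of the two embeddings."""
    if len(p_u) != len(q_i):
        raise ValueError("user and item embeddings must share the size K")
    return sum(a * b for a, b in zip(p_u, q_i))


# Hypothetical toy tables (K = 3): P holds M = 2 users, Q holds N = 3 items.
P = [[0.1, 0.5, -0.2], [0.3, -0.1, 0.4]]
Q = [[0.2, 0.1, 0.0], [-0.5, 0.2, 0.3], [0.4, 0.4, 0.1]]

p_u = lookup([1.0, 0.0], P)           # user u = 0
q_i = lookup([0.0, 0.0, 1.0], Q)      # item i = 2
y_ui = mf_predict(p_u, q_i)           # approximately 0.22
```

In practice the lookup is an index into the table rather than a matrix product; the explicit version only shows why the two are equivalent.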
Ø MF uses the inner product as the interaction function.
Ø The implicit assumption of the inner product:
  • The embedding dimensions are independent of each other.
However, this assumption is impractical:
  – The embedding dimensions could be interpreted as certain properties of items [Zhang et al., SIGIR'14], which are not necessarily independent.
Recent DNN-based models use either the element-wise product or concatenation:
  – E.g., NeuMF [He et al., WWW'17], NNCF [Bai et al., CIKM'17], JRL [Zhang et al., CIKM'17], autoencoder-based CF models [Wu et al., WSDM'16]
  – Still, the relations among embedding dimensions are not explicitly modeled.
• How can we model the relations between embedding dimensions?
• Next, our proposed method:
  1. Outer product on the user and item embeddings for pairwise interaction modeling
  2. CNN on the outer-product matrix to extract and reweight prediction signals
Ø The outer product explicitly models the pairwise relations between embedding dimensions:
  – It yields a 2D matrix, named the interaction map: $E = \mathbf{p}_u \otimes \mathbf{q}_i = \mathbf{p}_u \mathbf{q}_i^\top \in \mathbb{R}^{K \times K}$
  – Each entry $e_{k_1 k_2} = p_{u,k_1} \, q_{i,k_2}$ indicates the interaction between the $k_1$-th dimension of $\mathbf{p}_u$ and the $k_2$-th dimension of $\mathbf{q}_i$
  – The diagonal of E holds exactly the signals summed by the inner product
[Figure: ONCF architecture. Input layer (sparse): user (u), item (i) → embedding layer (P M×K, Q N×K) → p_u ⊗ q_i → interaction map E → hidden layers → interaction features → prediction ŷ_ui, trained with BPR]
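A small sketch (plain Python, toy embeddings chosen for illustration) of the interaction map: entry `E[k1][k2]` is the product of dimension k1 of the user embedding and dimension k2 of the item embedding. Summing the diagonal recovers the MF inner product, while the K(K−1) off-diagonal entries are the cross-dimension relations that MF leaves out. The function name `interaction_map` is hypothetical.

```python
from typing import List


def interaction_map(p_u: List[float], q_i: List[float]) -> List[List[float]]:
    """ONCF interaction map E = p_u (outer) q_i, i.e. E[k1][k2] = p_u[k1] * q_i[k2]."""
    return [[a * b for b in q_i] for a in p_u]


p_u = [0.5, -1.0, 2.0]   # toy user embedding, K = 3
q_i = [4.0, 0.25, -0.5]  # toy item embedding

E = interaction_map(p_u, q_i)                     # K x K matrix
inner = sum(a * b for a, b in zip(p_u, q_i))      # what MF would predict
diag = sum(E[k][k] for k in range(len(p_u)))      # the same value, read off the diagonal
cross_terms = [E[a][b] for a in range(3) for b in range(3) if a != b]  # ignored by MF
```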
Ø Above the interaction map are hidden layers, which aim to extract useful signals from the 2D interaction map.
Ø A straightforward solution is to use an MLP; however, it results in too many parameters:
  – The interaction map E has K × K neurons (K is the embedding size, usually in the hundreds)
  – Requires large memory to store the model
  – Requires large amounts of training data to learn the model well
[Figure: ONCF architecture with hidden layers between the interaction map E and the prediction ŷ_ui, trained with BPR]
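The parameter blow-up can be checked with simple arithmetic. The sketch below counts weights and biases of an MLP applied to the flattened K × K map; the hidden widths (2048 and 1024) are hypothetical and not the configuration used in the paper. The first layer alone has K² × width weights, so its cost grows quadratically with the embedding size.

```python
from typing import Sequence


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params_on_map(K: int, hidden_sizes: Sequence[int]) -> int:
    """Parameters of an MLP applied to the flattened K x K interaction map."""
    total, n_in = 0, K * K
    for n_out in hidden_sizes:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total


# Hypothetical widths: two hidden layers of 2048 and 1024 units.
params_k64 = mlp_params_on_map(64, [2048, 1024])    # 10,488,832
params_k128 = mlp_params_on_map(128, [2048, 1024])  # 35,654,656
```

Doubling K from 64 to 128 roughly quadruples the first layer, which is the term that dominates the total.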
Ø ConvNCF (Convolutional NCF) uses a locally connected CNN as the hidden layers in ONCF:
Ø A CNN has far fewer parameters than an MLP.
Ø Hierarchical tower structure: a higher layer integrates information from a larger area.
Ø The final prediction summarizes all information from the interaction map.
[Figure: ConvNCF tower. 64×64 interaction map → Layer 1 (32×32) → Layer 2 (16×16) → Layer 3 (8×8) → Layer 4 (4×4) → Layer 5 (2×2) → Layer 6 (1×1) → prediction; each layer has 32 feature maps]
  § 2 fully connected layers: > 10M parameters
  § 6 convolutional layers: 20K parameters, yet better performance!
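The tower can be sketched in plain Python under stated assumptions: 2×2 kernels with stride 2 and 32 channels per layer (consistent with the 64×64 → 1×1 halving and the 32 feature maps in the figure), ReLU after each convolution, random untrained weights, and a hypothetical linear read-out `w_out`. It is an illustration of the shape and parameter arithmetic, not the authors' implementation. Under these assumptions the six layers hold 20,800 parameters, in line with the "20K" on the slide.

```python
import random
from typing import List, Tuple

FeatureMaps = List[List[List[float]]]            # channels x height x width
Kernel = List[List[float]]                       # 2 x 2
Layer = Tuple[List[List[Kernel]], List[float]]   # (weights[out][in], bias[out])


def conv2x2_stride2_relu(x: FeatureMaps, weights: List[List[Kernel]], bias: List[float]) -> FeatureMaps:
    """2x2 convolution with stride 2 (assumed), then ReLU: halves height and width."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for o in range(len(weights)):
        fmap = []
        for r in range(h // 2):
            row = []
            for s in range(w // 2):
                acc = bias[o]
                for c in range(c_in):
                    k = weights[o][c]
                    acc += (k[0][0] * x[c][2 * r][2 * s] + k[0][1] * x[c][2 * r][2 * s + 1]
                            + k[1][0] * x[c][2 * r + 1][2 * s] + k[1][1] * x[c][2 * r + 1][2 * s + 1])
                row.append(max(acc, 0.0))
            fmap.append(row)
        out.append(fmap)
    return out


def init_tower(n_layers: int = 6, channels: int = 32, seed: int = 0) -> List[Layer]:
    """Randomly initialised (untrained) tower: 1 input map, then `channels` maps per layer."""
    rng = random.Random(seed)
    layers, c_in = [], 1
    for _ in range(n_layers):
        weights = [[[[rng.gauss(0.0, 0.1) for _ in range(2)] for _ in range(2)]
                    for _ in range(c_in)] for _ in range(channels)]
        layers.append((weights, [0.0] * channels))
        c_in = channels
    return layers


def count_params(layers: List[Layer]) -> int:
    # Each 2x2 kernel has 4 weights; plus one bias per output channel.
    return sum(len(w) * len(w[0]) * 4 + len(b) for w, b in layers)


def convncf_score(p_u: List[float], q_i: List[float], layers: List[Layer], w_out: List[float]) -> float:
    x = [[[a * b for b in q_i] for a in p_u]]    # 1 x K x K interaction map
    for weights, bias in layers:
        x = conv2x2_stride2_relu(x, weights, bias)
    features = [fmap[0][0] for fmap in x]        # channels x 1 x 1 after six halvings (K = 64)
    return sum(w * f for w, f in zip(w_out, features))  # hypothetical linear read-out


rng = random.Random(1)
K = 64
p_u = [rng.gauss(0.0, 0.1) for _ in range(K)]
q_i = [rng.gauss(0.0, 0.1) for _ in range(K)]
tower = init_tower()
score = convncf_score(p_u, q_i, tower, w_out=[1.0] * 32)
n_params = count_params(tower)
```

Each output cell of layer l sees a 2^l × 2^l patch of the interaction map, which is the "higher layer integrates a larger area" property; the single cell of layer 6 sees the whole 64×64 map.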
Ø Datasets
  – Yelp: 25,815 users, 25,677 items, and 730,791 interactions.
  – Gowalla: 54,156 users, 52,400 items, and 1,249,703 interactions.
Ø Protocols
  – Leave-one-out: hold out the latest interaction of each user as the test instance.
  – Pair each test instance with 999 negative instances.
  – Top-K evaluation: rank the 1 positive among the 999 negatives.
  – Ranking lists are evaluated by Hit Ratio and NDCG (@10).
Ø Loss function
  – Bayesian Personalized Ranking (BPR)
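A minimal sketch of the evaluation protocol: for each user, rank the held-out positive among its 999 sampled negatives and score the ranking with HR@10 and NDCG@10. With a single relevant item the ideal DCG is 1, so NDCG reduces to 1 / log2(rank + 2) for a 0-based rank inside the cutoff. Counting ties against the positive is an assumption made here, as are the function names and the toy scores.

```python
import math
from typing import List, Sequence, Tuple


def rank_of_positive(pos_score: float, neg_scores: Sequence[float]) -> int:
    """0-based rank of the held-out positive; ties are counted against it (assumption)."""
    return sum(1 for s in neg_scores if s >= pos_score)


def hit_ratio_at_k(rank: int, k: int = 10) -> float:
    return 1.0 if rank < k else 0.0


def ndcg_at_k(rank: int, k: int = 10) -> float:
    # One relevant item, so IDCG = 1 and NDCG = 1 / log2(rank + 2).
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0


def evaluate(cases: List[Tuple[float, Sequence[float]]], k: int = 10) -> Tuple[float, float]:
    """Average HR@k and NDCG@k over (positive score, negative scores) pairs."""
    hr = ndcg = 0.0
    for pos, negs in cases:
        r = rank_of_positive(pos, negs)
        hr += hit_ratio_at_k(r, k)
        ndcg += ndcg_at_k(r, k)
    return hr / len(cases), ndcg / len(cases)


# Toy scores for three users, each with 999 negatives.
cases = [
    (0.9, [0.1] * 999),               # positive ranked first
    (0.5, [0.8] * 3 + [0.1] * 996),   # positive ranked fourth (rank 3)
    (0.0, [0.1] * 999),               # positive ranked last
]
hr10, ndcg10 = evaluate(cases)
```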
Ø MF-BPR [Rendle et al., UAI'09]
  – Learns MF with a pairwise classification loss.
Ø MLP [He et al., WWW'17]
  – A 3-layer multi-layer perceptron above the user and item embeddings.
Ø JRL [Zhang et al., CIKM'17]
  – A multi-layer perceptron above the element-wise product of the embeddings.
Ø NeuMF [He et al., WWW'17]
  – A neural network combining the hidden layers of MF and MLP.
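The pairwise loss behind MF-BPR (and the BPR loss used for training here) can be sketched for MF scores: for a triple (u, i, j) with observed item i and unobserved item j, the loss is −ln σ(ŷ_ui − ŷ_uj), computed below as a numerically stable softplus. The L2 term `reg`, its per-triple placement, and the helper names are illustrative assumptions; negative sampling and gradient updates are omitted.

```python
import math
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bpr_loss(P: List[List[float]], Q: List[List[float]],
             triples: List[Tuple[int, int, int]], reg: float = 0.0) -> float:
    """Mean BPR loss over (u, i, j) triples; -ln sigmoid(x) == softplus(-x)."""
    total = 0.0
    for u, i, j in triples:
        x_uij = dot(P[u], Q[i]) - dot(P[u], Q[j])   # score margin of i over j
        total += softplus(-x_uij)
        # Hypothetical L2 regularisation on the embeddings involved in this triple.
        total += reg * (dot(P[u], P[u]) + dot(Q[i], Q[i]) + dot(Q[j], Q[j]))
    return total / len(triples)


P = [[1.0, 0.0]]
Q_tied = [[0.5, 0.5], [0.5, -0.5]]      # equal scores for i and j
Q_apart = [[10.0, 0.0], [-10.0, 0.0]]   # i far above j
loss_tied = bpr_loss(P, Q_tied, [(0, 0, 1)])     # equals ln 2
loss_apart = bpr_loss(P, Q_apart, [(0, 0, 1)])   # close to 0
```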
[Table: overall performance of all methods; ∗ indicates that the improvements over all other methods are statistically significant for p < 0.05.]
Overall performance: ConvNCF > NeuMF [He et al., 2017] > JRL [Zhang et al., 2017]
Ø Modeling the relations between embedding dimensions is useful.
Ø Training an MLP well is practically difficult.
Training process of neural models that apply different operations above the embedding layer:
  – ConvNCF: outer product; GMF: element-wise product; MLP: concatenation; JRL: element-wise product
[Figure: NDCG@10 vs. training epoch (0–1400) on Yelp and Gowalla for ConvNCF, GMF, MLP, and JRL]
The outer product is a simple but effective way to merge user and item embeddings.
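To make the compared merge operations concrete, this plain-Python sketch (toy embeddings, hypothetical function names) builds the three inputs that sit above the embedding layer. The shapes differ (K, 2K, and K × K), and the element-wise product used by GMF and JRL is exactly the diagonal of the outer-product map, so the outer product keeps that signal while adding the cross-dimension terms.

```python
from typing import List


def elementwise_product(p: List[float], q: List[float]) -> List[float]:
    return [a * b for a, b in zip(p, q)]        # GMF / JRL input: K features


def concatenation(p: List[float], q: List[float]) -> List[float]:
    return list(p) + list(q)                    # MLP input: 2K features


def outer_product(p: List[float], q: List[float]) -> List[List[float]]:
    return [[a * b for b in q] for a in p]      # ConvNCF input: K x K interaction map


p_u = [1.0, 2.0, 3.0, 4.0]   # toy embeddings, K = 4
q_i = [0.5, -1.0, 2.0, 0.0]

ew = elementwise_product(p_u, q_i)
cat = concatenation(p_u, q_i)
om = outer_product(p_u, q_i)
diagonal = [om[k][k] for k in range(len(p_u))]   # same values as ew
```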
NDCG@10 of ONCF with different hidden layers:
  • ConvNCF uses a 6-layer CNN.
  • ONCF-mlp uses a 3-layer MLP above the interaction map.
[Figure: NDCG@10 vs. training epoch (0–1400) on Yelp and Gowalla for ConvNCF and ONCF-mlp]
1. ConvNCF outperforms ONCF-mlp.
2. ConvNCF is more stable than ONCF-mlp.