Xing Zhao, Qingquan Song, James Caverlee and Xia Hu




  1. Xing Zhao, Qingquan Song, James Caverlee and Xia Hu
     Department of Computer Science and Engineering, Texas A&M University, USA

  2. Dataset Statistics
     Items: Quantity (Proportion)
     • Playlists: 1,000,000
     • Unique tracks: 2,262,292 (100%)
     • Unique tracks (freq ≥ 5): 599,341 (96.05%)
     • Unique tracks (freq ≥ 100): 70,229 (80.67%)
     • Unique albums: 734,684
     • Unique artists: 295,860
     Figure: number of remaining tracks (×10^6) and cumulative share of positive samples, plotted against the number of times a track appears in the training data (1 to 40,000).
     Therefore, in some parts of our methods, we only consider these frequently occurring tracks for training.
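The frequency statistics above can be reproduced in spirit with a simple count. This is a minimal sketch on toy data, assuming that the "Proportion" column is the share of all track occurrences (positive samples) covered by tracks at or above a frequency threshold. The function name `coverage_at_threshold` and the toy playlists are hypothetical and do not come from the slides.

```python
from collections import Counter


def coverage_at_threshold(playlists, min_freq):
    """Return (number of tracks kept, share of track occurrences kept).

    A track is kept when it appears at least `min_freq` times across
    all playlists. The share is measured over track occurrences, which
    is an assumed reading of the "Proportion" column.
    """
    counts = Counter(track for playlist in playlists for track in playlist)
    total = sum(counts.values())
    kept = {track: c for track, c in counts.items() if c >= min_freq}
    share = sum(kept.values()) / total if total else 0.0
    return len(kept), share


# Toy data only, not the challenge dataset.
toy_playlists = [["a", "b", "c"], ["a", "b"], ["a", "d"]]
n_tracks, share = coverage_at_threshold(toy_playlists, min_freq=2)
print(n_tracks, round(share, 4))
```

On the real data, the same count with `min_freq=100` would be expected to show how a small fraction of tracks covers most of the positive samples, which is the motivation stated on the slide.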

  3. Our Method - TrailMix
     Cold Start (for Task 1): CC-Title
     Playlist Continuation (for Tasks 2 to 10): DNCF and C-Tree

  4. CC-Title: Context Clustering using Title
     Figure: a word-track matrix with 9,817 words (rows) and 2,262,292 tracks (columns). A cell value of 6 at (word j, track i) means that track i exists in 6 playlists whose titles contain word j. Pipeline stages: Preprocess, Normalize, Cluster, Recommend. Each word list (cluster) is paired with a track list, which is recommended for a new title, e.g. "Pop Punk 2018 Summer".

  5. CC-Title: Cont.
     Steps:
     1. Preprocessing: stemming, stop words, emoji, punctuation, etc.
     2. Building a word-track matrix of size 9,817 x 2,262,292
     3. Normalizing cells using 'IDF'
     4. Clustering words based on row similarity
     5. Recommending the tracks in each cluster for a new title
     Items: Quantity
     • Unique titles: 92,944
     • Unique normalized titles: 17,381
     • Unique non-stop normalized words: 9,817
     • Playlists without a title after processing: 22,921
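The five CC-Title steps can be sketched end to end on toy data. The slides do not specify the exact IDF formula or the clustering algorithm, so this sketch makes two loudly labeled assumptions: cells are weighted by `log(1 + n_words / df(track))`, where `df(track)` is the number of words a track co-occurs with, and words are grouped by a greedy cosine-similarity threshold. Stemming is omitted, and all function names and the stop-word list are hypothetical.

```python
import math
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "my", "for"}  # illustrative only


def tokenize(title):
    """Lowercase, drop punctuation and emoji, remove stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS]


def build_word_track_matrix(playlists):
    """matrix[word][track] = number of playlists whose title contains
    `word` and that contain `track`."""
    matrix = defaultdict(lambda: defaultdict(float))
    for title, tracks in playlists:
        for word in set(tokenize(title)):
            for track in set(tracks):
                matrix[word][track] += 1.0
    return matrix


def idf_normalize(matrix):
    """Assumed IDF weighting: down-weight tracks that co-occur with many words."""
    n_words = len(matrix)
    df = defaultdict(int)
    for row in matrix.values():
        for track in row:
            df[track] += 1
    return {
        word: {t: c * math.log(1.0 + n_words / df[t]) for t, c in row.items()}
        for word, row in matrix.items()
    }


def cosine(u, v):
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_words(weighted, threshold=0.5):
    """Assumed greedy clustering: a word joins the first cluster whose
    first word has a similar row, otherwise it starts a new cluster."""
    clusters = []
    for word in sorted(weighted):
        for cluster in clusters:
            if cosine(weighted[word], weighted[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def recommend(title, weighted, clusters, k=3):
    """Sum the weighted rows of every cluster that shares a word with the title."""
    words = set(tokenize(title))
    scores = defaultdict(float)
    for cluster in clusters:
        if words & set(cluster):
            for word in cluster:
                for track, weight in weighted[word].items():
                    scores[track] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:k]]


# Toy training playlists: (title, track ids).
toy = [("Pop Punk", ["t1", "t2"]), ("Punk Rock!", ["t2", "t3"]), ("Chill Jazz", ["t4", "t5"])]
weighted = idf_normalize(build_word_track_matrix(toy))
clusters = cluster_words(weighted)
recs = recommend("Pop Punk 2018 Summer", weighted, clusters)
print(clusters, recs)
```

Words in the new title that were never seen in training ("2018", "summer" here) simply match no cluster. At the scale quoted on the slide, a sparse matrix representation would be needed instead of nested dictionaries.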

  6. CC-Title: Cont.
     Highlight:
     1. CC-Title can handle large-scale matrix computation with high efficiency.
     2. In some cases (clusters), the performance is very good.

  7. DNCF: Decorated Neural Collaborative Filtering
     Neural Collaborative Filtering
     Pros:
     1. Simple and generic
     2. Combines the advantages of the basic matrix factorization model and an MLP.
     Cons: Not computationally efficient enough to be applied directly to the target problem, due to the huge item space and the matrix sparsity.
     He et al., "Neural Collaborative Filtering". WWW, 2017.

  8. DNCF: Cont.
     Two modifications to address the efficiency issue:
     • Training Phase: Constrained Negative Sampling.
     • Testing Phase: Constrained Recommendation with Reordering.

  9. DNCF: Cont.
     Training Phase: Constrained Negative Sampling.
     1. Constrain the negative sampling space to the tracks that appear 100 or more times in the training data.
     2. Positive samples are still drawn from the whole dataset during training, so that all tracks in the testing data keep a usable embedding and prediction. (Task 2-10)
     Figure: number of remaining tracks (×10^6) and cumulative share of positive samples, plotted against the number of times a track appears in the training data (1 to 40,000).
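The constrained negative sampling described above can be illustrated as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: negatives are drawn uniformly without replacement from the frequent-track pool, and the function names `build_negative_pool` and `sample_negatives` are hypothetical. The toy example uses `min_freq=2` in place of the threshold of 100 from the slide.

```python
import random
from collections import Counter


def build_negative_pool(playlists, min_freq=100):
    """Tracks that appear at least `min_freq` times in the training playlists."""
    counts = Counter(track for playlist in playlists for track in playlist)
    return sorted(track for track, c in counts.items() if c >= min_freq)


def sample_negatives(positive_tracks, negative_pool, n_neg, rng):
    """Draw up to `n_neg` negatives from the constrained pool,
    excluding the playlist's own (positive) tracks."""
    positives = set(positive_tracks)
    candidates = [track for track in negative_pool if track not in positives]
    return rng.sample(candidates, min(n_neg, len(candidates)))


# Toy data: positives keep every track, negatives only come from the pool.
toy_playlists = [["a", "b", "c"], ["a", "b", "d"], ["a", "d", "e"]]
pool = build_negative_pool(toy_playlists, min_freq=2)
negatives = sample_negatives(["a", "c"], pool, n_neg=2, rng=random.Random(0))
print(pool, negatives)
```

Note that the rare track "c" still appears as a positive sample for its playlist, while it can never be drawn as a negative, which mirrors the two points on the slide.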

  10. DNCF: Cont.
      Testing Phase: Constrained Recommendation with Reordering.
      1. Constrain the recommendation space by recommending only the popular tracks (>= 100 times) during the testing phase, for a more targeted prediction.
      2. Reorder the 500 predicted tracks with an ensemble trick that leverages two types of predictions provided by the Word2Vec embedding.
      Figure: diagram with labels φ1, φ2, φ3; Word2Vec (1), Word2Vec (2), DNCF; L1, L2, L3; φ1 \ L1 ∪ L2 ∪ L3.
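The two testing-phase steps can be sketched as a filter followed by a rerank. The slide does not spell out the ensemble rule, so the rank-sum (Borda-style) reordering below is a swapped-in stand-in chosen purely for illustration, not the authors' trick. Excluding seed tracks from the candidates is also an assumption, and the names `constrained_top_k` and `reorder_by_rank_sum` are hypothetical.

```python
def constrained_top_k(scores, popular_tracks, seed_tracks, k=500):
    """Keep only popular, non-seed tracks and return the k best by model score."""
    seeds = set(seed_tracks)
    candidates = [t for t in scores if t in popular_tracks and t not in seeds]
    return sorted(candidates, key=lambda t: (-scores[t], t))[:k]


def reorder_by_rank_sum(candidates, aux_score_lists):
    """Assumed reordering: add each candidate's rank under every auxiliary
    scorer to its original rank, then sort by the total (lower is better)."""
    rank_sum = {track: rank for rank, track in enumerate(candidates)}
    for aux_scores in aux_score_lists:
        ordered = sorted(candidates, key=lambda t: (-aux_scores.get(t, 0.0), t))
        for rank, track in enumerate(ordered):
            rank_sum[track] += rank
    return sorted(candidates, key=lambda t: (rank_sum[t], t))


# Toy scores: "d" is the model's favorite but is not a popular track.
model_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.95}
top = constrained_top_k(model_scores, popular_tracks={"a", "b", "c"}, seed_tracks=["a"], k=2)

# Two hypothetical auxiliary scorers standing in for the Word2Vec predictions.
aux = [{"c": 1.0, "b": 0.1}, {"c": 0.9, "b": 0.2}]
reordered = reorder_by_rank_sum(top, aux)
print(top, reordered)
```

Here both auxiliary scorers prefer "c", so it moves ahead of "b" even though the main model ranked it lower.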

  11. DNCF: Result
      Highlight:
      1. Results steadily increase with the number of seed tracks, with maximum performance at seed 25;
      2. It performs better for playlists with random seeding tracks (R) than for playlists with sequential seeding tracks.

  12. C-Tree: Constructed Tree
      A playlist is:
      1. Natural tree structure: A playlist consists of different tracks, and each track always belongs to a specific album of an artist;
      2. Meaningful cluster: The tracks in a specific playlist always share some latent similarity, such as genre, style, listening sense, etc.
      Figure: Phylogenetic Tree. (Source: https://www.creative-biostructure.com/custom-phylogenetic-tree-construction-service-399.htm)

  13. C-Tree: Cont.
      A Real Example (PID: 11548):
      • Playlist Title: Pop Puck
      • 48 tracks belong to 12 albums by 5 artists (2 rock bands and 3 pop punk bands)
      How do we compare the internal relationship?
      How do we compare it with another tree (external)?
      Figure labels: Pop punk band, Rock band.

  14. C-Tree: Cont.
      External comparison
      Incomplete Tree: A playlist that contains only part of its tracks (the seeds) and is waiting for recommendations.
      Training Data: Complete Tree
      Testing Data: Incomplete Tree

  15. C-Tree: Cont.
      Steps:
      1. Building Forest: 1 million complete trees;
      2. Comparing and normalizing the distance between the incomplete tree T-test and each complete tree T-train;
      3. Recommending the tracks (leaves) from each T-train to the incomplete tree T-test, based on the score of each leaf.
      Figure: T-test compared against Playlist 1, Playlist 2, Playlist 3, Playlist 4, …, Playlist n.
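The three C-Tree steps can be illustrated with a small sketch. The slides do not define the tree distance or the leaf score, so this example assumes a weighted Jaccard overlap at the artist, album and track levels as the normalized similarity, and scores each leaf by summing the similarity of every training tree that contains it. All names, weights and toy data are hypothetical.

```python
from collections import defaultdict


def build_tree(tracks):
    """Build {artist: {album: {track ids}}} from (artist, album, track) triples."""
    tree = defaultdict(lambda: defaultdict(set))
    for artist, album, track in tracks:
        tree[artist][album].add(track)
    return tree


def level_sets(tree):
    """Return the node sets at the artist, album and leaf (track) levels."""
    artists = set(tree)
    albums = {(a, al) for a in tree for al in tree[a]}
    leaves = {t for a in tree for al in tree[a] for t in tree[a][al]}
    return artists, albums, leaves


def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def tree_similarity(t_test, t_train, weights=(0.2, 0.3, 0.5)):
    """Assumed similarity in [0, 1]: weighted Jaccard overlap per tree level."""
    pairs = zip(weights, level_sets(t_test), level_sets(t_train))
    return sum(w * jaccard(a, b) for w, a, b in pairs)


def recommend_leaves(seed_tracks, training_playlists, k=3):
    """Score each unseen leaf by the summed similarity of the trees holding it."""
    t_test = build_tree(seed_tracks)
    seed_leaves = level_sets(t_test)[2]
    scores = defaultdict(float)
    for tracks in training_playlists:
        t_train = build_tree(tracks)
        sim = tree_similarity(t_test, t_train)
        if sim == 0.0:
            continue  # no shared artist, album or track
        for leaf in level_sets(t_train)[2] - seed_leaves:
            scores[leaf] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [leaf for leaf, _ in ranked[:k]]


# Toy forest of complete trees and one incomplete tree (the seed).
forest = [
    [("artist_x", "album_a", "t1"), ("artist_x", "album_a", "t2"), ("artist_y", "album_b", "t3")],
    [("artist_x", "album_a", "t1"), ("artist_z", "album_c", "t9")],
    [("artist_z", "album_c", "t9"), ("artist_z", "album_c", "t8")],
]
seed = [("artist_x", "album_a", "t1")]
recs = recommend_leaves(seed, forest)
print(recs)
```

Comparing one incomplete tree against a forest of 1 million complete trees this way is a linear scan, so an index from artists or albums to playlists would likely be needed at the scale the slide describes.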

  16. C-Tree: Result
      Highlight:
      1. Results steadily increase with the number of seed tracks, with maximum performance at seed 25;
      2. It performs better for playlists with random seeding tracks (R) than for playlists with sequential seeding tracks.

  17. TrailMix: Ensemble Model
      Figure: ensemble diagram with labels Num_handout, Method 1, Method 2, CC-Title, DNCF (A, B), C-Tree (A, B) and Final Recommendation.

  18. Experiment and Result
      Experiment Setting:
      • Training 80%, testing 20%: cross-validation for hyperparameter tuning
      • Testing data strictly follows the rules designed by RecSys 2018

  19. Thank you!

  20. Q&A
