Collaborative Deep Learning and Its Variants for Recommender Systems



SLIDE 1

Collaborative Deep Learning and Its Variants for Recommender Systems


Hao Wang

Joint work with Naiyan Wang, Xingjian Shi, and Dit-Yan Yeung

SLIDE 2

Recommender Systems

Rating matrix: given the observed preferences, predict the missing entries (matrix completion).

SLIDE 3

Recommender Systems with Content

Content information: Plots, directors, actors, etc.

SLIDE 4

Modeling the Content Information

  • Handcrafted features
  • Automatically learned features
  • Automatically learned features adapted for ratings

SLIDE 5

Modeling the Content Information

  • 1. Powerful features for content information → deep learning
  • 2. Feedback from rating information → non-i.i.d. collaborative deep learning

SLIDE 6

Deep Learning

  • Stacked denoising autoencoders
  • Convolutional neural networks
  • Recurrent neural networks

Typically for i.i.d. data

SLIDE 7

Modeling the Content Information

  • 1. Powerful features for content information → deep learning
  • 2. Feedback from rating information → non-i.i.d. collaborative deep learning (CDL)

SLIDE 8

Contribution

Collaborative deep learning:
  • deep learning for non-i.i.d. data
  • joint representation learning and collaborative filtering

SLIDE 9

Contribution

Collaborative deep learning Complex target: * beyond targets like classification and regression * to complete a low-rank matrix

SLIDE 10

Contribution

Collaborative deep learning Complex target First hierarchical Bayesian models for deep hybrid recommender system

SLIDE 11

Related Work


  • Not hybrid methods (ratings only)

RBM (single layer, Salakhutdinov et al., 2007) I-RBM/U-RBM (Georgiev et al., 2013)

  • Not using Bayesian modeling for joint learning

DeepMusic (van den Oord et al., 2013) HLDBN (Wang et al., 2014)

SLIDE 12

Stacked Denoising Autoencoders (SDAE)

An SDAE takes a corrupted input and learns to reconstruct the clean input [ Vincent et al. 2010 ].
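As a sketch of the idea, the snippet below implements a minimal single-layer denoising autoencoder in numpy with masking noise and tied weights; the layer sizes, learning rate, and corruption level are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.random((5, 8))                       # toy "clean" bag-of-words input
W = rng.normal(scale=0.1, size=(8, 4))       # tied weights: encoder W, decoder W.T
b_h, b_o = np.zeros(4), np.zeros(8)

def reconstruct(Xin):
    return sigmoid(sigmoid(Xin @ W + b_h) @ W.T + b_o)

err_before = np.mean((reconstruct(X) - X) ** 2)
for _ in range(500):
    Xc = X * (rng.random(X.shape) >= 0.3)    # masking (denoising) corruption
    H = sigmoid(Xc @ W + b_h)                # hidden code of the corrupted input
    Rhat = sigmoid(H @ W.T + b_o)            # reconstruction targets the *clean* input
    G = (Rhat - X) * Rhat * (1 - Rhat)       # output-layer delta (squared error)
    GH = (G @ W) * H * (1 - H)               # hidden-layer delta
    W -= 0.1 * (Xc.T @ GH + G.T @ H)         # tied-weight gradient
    b_h -= 0.1 * GH.sum(0)
    b_o -= 0.1 * G.sum(0)
err_after = np.mean((reconstruct(X) - X) ** 2)
```

Stacking several such layers and training them greedily gives the "stacked" part of the SDAE.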

SLIDE 13

Probabilistic Matrix Factorization (PMF)

Graphical model: Generative process: Objective function if using MAP:

latent vector of item j latent vector of user i rating of item j from user i

Notation:

[ Salakhutdinov et al. 2008 ]
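The MAP objective referenced on this slide appeared as an equation image; for reference, the standard PMF formulation from Salakhutdinov et al. 2008, with $u_i$ and $v_j$ the user and item latent vectors and $I_{ij}=1$ for observed ratings, is:

```latex
\min_{U,V}\; \frac{1}{2}\sum_{i,j} I_{ij}\,\bigl(R_{ij} - u_i^{\top} v_j\bigr)^2
+ \frac{\lambda_u}{2}\sum_i \lVert u_i \rVert^2
+ \frac{\lambda_v}{2}\sum_j \lVert v_j \rVert^2
```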

SLIDE 14

Probabilistic SDAE

Generalized SDAE Graphical model: Generative process:

corrupted input clean input weights and biases

Notation:

SLIDE 15

Collaborative Deep Learning (CDL)

Graphical model: Collaborative deep learning SDAE

corrupted input clean input weights and biases content representation rating of item j from user i latent vector of item j latent vector of user i

Notation:

Two-way interaction

  • More powerful representation
  • Infer missing ratings from content
  • Infer missing content from ratings

SLIDE 16

A Principled Probabilistic Framework

The framework separates a perception component (perception variables) from a task-specific component (task variables), connected by hinge variables [ Wang et al. TKDE 2016 ].

SLIDE 17

CDL with Two Components

Graphical model: Collaborative deep learning SDAE

corrupted input clean input weights and biases content representation rating of item j from user i latent vector of item j latent vector of user i

Notation:

Two-way interaction

  • More powerful representation
  • Infer missing ratings from content
  • Infer missing content from ratings

SLIDE 18

Collaborative Deep Learning

Neural network representation for degenerated CDL

SLIDE 19

Collaborative Deep Learning

Information flows from ratings to content

SLIDE 20

Collaborative Deep Learning

Information flows from content to ratings

SLIDE 21

Collaborative Deep Learning

Representation learning <-> recommendation

SLIDE 22

Learning

Maximizing the posterior probability is equivalent to maximizing the joint log-likelihood.

SLIDE 23

Learning

Prior (regularization) for user latent vectors, weights, and biases

SLIDE 24

Learning

Generating item latent vectors from content representation with Gaussian offset

SLIDE 25

Learning

‘Generating’ clean input from the output of probabilistic SDAE with Gaussian offset

SLIDE 26

Learning

Generating the input of Layer l from the output of Layer l-1 with Gaussian offset

SLIDE 27

Learning

The last term measures the error of the predicted ratings.

SLIDE 28

Learning

If λs (the precision of the Gaussian layers in the probabilistic SDAE) goes to infinity, those layers become deterministic and the likelihood simplifies accordingly.
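The Learning slides above describe the terms one by one; assembling them (in the λs → ∞ case, following the notation of the CDL paper [ Wang et al. KDD 2015 ], so this is a reconstruction rather than the slide's original equation image) gives the joint log-likelihood being maximized:

```latex
\mathcal{L} = -\frac{\lambda_u}{2}\sum_i \lVert u_i\rVert^2
  -\frac{\lambda_w}{2}\sum_l \bigl(\lVert W_l\rVert_F^2 + \lVert b_l\rVert^2\bigr)
  -\frac{\lambda_v}{2}\sum_j \bigl\lVert v_j - f_e(X_{0,j*}, W^+)^{\top}\bigr\rVert^2
  -\frac{\lambda_n}{2}\sum_j \bigl\lVert f_r(X_{0,j*}, W^+) - X_{c,j*}\bigr\rVert^2
  -\sum_{i,j}\frac{C_{ij}}{2}\bigl(R_{ij} - u_i^{\top} v_j\bigr)^2
```

Here $f_e$ is the encoder (content representation), $f_r$ the full reconstruction, and $C_{ij}$ the confidence on rating $R_{ij}$.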

SLIDE 29

Update Rules

For U and V, use block coordinate descent: For W and b, use a modified version of backpropagation:
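The block coordinate descent for U and V can be sketched as alternating ridge-style solves; this is a minimal toy version in numpy (dimensions, confidences a/b, and the variable name theta for the SDAE content representation are illustrative assumptions), where each v_j is pulled toward its content representation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 6, 3                                # users, items, latent dim
R = (rng.random((m, n)) < 0.4).astype(float)     # toy implicit 0/1 ratings
theta = rng.random((n, k))                       # content representations (from the SDAE)
U, V = rng.random((m, k)), rng.random((n, k))
lam_u, lam_v = 0.1, 1.0
a, b = 1.0, 0.01                                 # confidences for observed/unobserved entries

def objective(U, V):
    C = np.where(R > 0, a, b)
    fit = 0.5 * np.sum(C * (R - U @ V.T) ** 2)
    return fit + 0.5 * lam_u * np.sum(U ** 2) + 0.5 * lam_v * np.sum((V - theta) ** 2)

obj_before = objective(U, V)
for _ in range(10):
    for i in range(m):                           # fix V, solve exactly for each u_i
        Ci = np.diag(np.where(R[i] > 0, a, b))
        U[i] = np.linalg.solve(V.T @ Ci @ V + lam_u * np.eye(k), V.T @ Ci @ R[i])
    for j in range(n):                           # fix U, solve for each v_j (pulled toward theta_j)
        Cj = np.diag(np.where(R[:, j] > 0, a, b))
        V[j] = np.linalg.solve(U.T @ Cj @ U + lam_v * np.eye(k),
                               U.T @ Cj @ R[:, j] + lam_v * theta[j])
obj_after = objective(U, V)
```

Because each block solve is exact, the objective is non-increasing across sweeps; W and b are then updated by backpropagation with theta held as the target, as the slide notes.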

SLIDE 30

Datasets

Content information: titles and abstracts (citeulike-a [ Wang et al. KDD 2011 ]); titles and abstracts (citeulike-t [ Wang et al. IJCAI 2013 ]); movie plots (Netflix).

SLIDE 31

Evaluation Metrics

Recall: Mean Average Precision (mAP):

Higher recall and mAP indicate better recommendation performance
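For concreteness, the two metrics can be computed as follows (recall_at_m and average_precision are hypothetical helper names; mAP is the mean of the per-user average precisions):

```python
def recall_at_m(ranked, relevant, m):
    """Fraction of the user's relevant items that appear in the top-M list."""
    hits = sum(1 for item in ranked[:m] if item in relevant)
    return hits / len(relevant)

def average_precision(ranked, relevant, cutoff=500):
    """Mean of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked[:cutoff], start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / min(len(relevant), cutoff) if relevant else 0.0

ranked = ["a", "b", "c", "d", "e"]   # recommendation list for one user
relevant = {"b", "e"}                # items the user actually liked
r3 = recall_at_m(ranked, relevant, 3)        # "b" is the only hit in the top 3 -> 0.5
ap = average_precision(ranked, relevant)     # (1/2 + 2/5) / 2 = 0.45
```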

SLIDE 32

Recall@M

citeulike-t, sparse setting citeulike-t, dense setting Netflix, sparse setting Netflix, dense setting

When the ratings are very sparse: When the ratings are dense:

SLIDE 33

Mean Average Precision (mAP)

Following van den Oord et al. 2013 exactly, we set the cutoff point at 500 for each user. CDL achieves a relative performance boost of about 50%.

SLIDE 34

Example User

Moonstruck True Romance

Romance Movies

Precision: 20% VS 30%

SLIDE 35

Example User

Johnny English American Beauty

Action & Drama Movies

Precision: 20% VS 50%

SLIDE 36

Example User

Precision: 50% VS 90%

SLIDE 37

Summary: Collaborative Deep Learning

  • Non-i.i.d. (collaborative) deep learning
  • With a complex target
  • First hierarchical Bayesian models for a hybrid deep recommender system
  • Significantly advances the state of the art

SLIDE 38

Marginalized CDL [ Li et al., CIKM 2015 ]

Both CDL and marginalized CDL combine a transformation to latent factors with a reconstruction error term (figure comparing the two objectives omitted).

SLIDE 39

Collaborative Deep Ranking [ Ying et al., PAKDD 2016 ]

SLIDE 40

Generative Process: Collaborative Deep Ranking

SLIDE 41

CDL Variants


More details in http://wanghao.in/CDL.htm

SLIDE 42

Beyond Bag-of-Words: Documents as Sequences

Motivation:

  • A more natural way: take in one word at a time and model documents as sequences
  • Jointly model preferences and sequence generation under the BDL framework

“Collaborative recurrent autoencoder: recommend while learning to fill in the blanks” [ Wang et al., NIPS 2016a ]

SLIDE 43

Beyond Bag-of-Words: Documents as Sequences

Main Idea:

  • Joint learning in the BDL framework
  • Wildcard denoising for robust representation

“Collaborative recurrent autoencoder: recommend while learning to fill in the blanks” [ Wang et al., NIPS 2016a ]

SLIDE 44

Wildcard Denoising

Sentence: “This is a great idea.”

Direct denoising drops words (e.g. “this a great idea”), so the encoder/decoder RNNs see wrong transitions between the remaining words. Wildcard denoising instead replaces a dropped word with a <wc> token (e.g. “this <wc> a great idea”), preserving the sequence structure.
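The two corruption schemes can be sketched as follows (function names, corruption rate, and seeding are illustrative, not the paper's implementation):

```python
import random

def wildcard_denoise(tokens, rate=0.3, wildcard="<wc>", seed=0):
    """Replace a random subset of tokens with a wildcard instead of dropping them,
    so the sequence keeps its length and word-to-word transitions."""
    rng = random.Random(seed)
    return [wildcard if rng.random() < rate else t for t in tokens]

def direct_denoise(tokens, rate=0.3, seed=0):
    """Drop tokens outright; the remaining words form transitions that never
    occur in the clean text."""
    rng = random.Random(seed)
    return [t for t in tokens if rng.random() >= rate]

sent = "this is a great idea".split()
wc = wildcard_denoise(sent)   # same length as sent, some tokens replaced by <wc>
dd = direct_denoise(sent)     # possibly shorter than sent
```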

SLIDE 45

Documents as Sequences

Main Idea:

  • Joint learning in the BDL framework
  • Wildcard denoising for robust representation
  • Beta-pooling for variable-length sequences

“Collaborative recurrent autoencoder: recommend while learning to fill in the blanks” [ Wang et al., NIPS 2016a ]

SLIDE 46

Is a Variable-Length Weight Vector Possible?

Sequences of length 8, 6, and 4 require weight vectors of matching lengths. [ Wang et al., NIPS 2016a ]

SLIDE 47

Variable-Length Weight Vector with Beta Distributions

Multiply the 8 length-3 vectors by a length-8 weight vector (0.08, 0.18, 0.22, 0.16, 0.21, 0.10, 0.04, 0.01) to pool them into one single vector.

Use the area under the beta distribution to define the weights! [ Wang et al., NIPS 2016a ]

SLIDE 48

Variable-Length Weight Vector with Beta Distributions

Multiply the 6 length-3 vectors by a length-6 weight vector (0.13, 0.27, 0.28, 0.20, 0.10, 0.02) to pool them into one single vector.

Use the area under the beta distribution to define the weights! [ Wang et al., NIPS 2016a ]
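One way to realize beta-pooling, assuming fixed shape parameters a and b (the function names and the numerical integration are illustrative, and in the paper the shape parameters can be learned), is to take weight i as the area under the Beta(a, b) density over the i-th of L equal sub-intervals of [0, 1], so any sequence length L yields a weight vector summing to about 1:

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density, normalized via the gamma function."""
    B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

def beta_pool_weights(length, a=2.0, b=2.0, steps=200):
    """Weight i = area under Beta(a, b) over [i/length, (i+1)/length],
    approximated with the trapezoidal rule."""
    weights = []
    for i in range(length):
        lo, hi = i / length, (i + 1) / length
        xs = [lo + (hi - lo) * t / steps for t in range(steps + 1)]
        area = sum((beta_pdf(xs[t], a, b) + beta_pdf(xs[t + 1], a, b)) / 2
                   for t in range(steps)) * (hi - lo) / steps
        weights.append(area)
    return weights

def beta_pool(vectors, a=2.0, b=2.0):
    """Pool a variable-length list of equal-size vectors into one single vector."""
    w = beta_pool_weights(len(vectors), a, b)
    dim = len(vectors[0])
    return [sum(w[i] * vectors[i][d] for i in range(len(vectors))) for d in range(dim)]

w8 = beta_pool_weights(8)
pooled = beta_pool([[1.0, 0.0]] * 8)
```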

SLIDE 49

Graphical Model: Collaborative Recurrent Autoencoder

  • Joint learning in the BDL framework
  • Wildcard denoising for robust representation
  • Beta-pooling for variable-length sequences

Perception component + task-specific component [ Wang et al., NIPS 2016a ]

SLIDE 50

Incorporating Relational Information


[ Wang et al. AAAI 2017 ] [ Wang et al. AAAI 2015 ]

SLIDE 51

Probabilistic SDAE

Generalized SDAE Graphical model: Generative process:

corrupted input clean input weights and biases

Notation:

SLIDE 52

Relational SDAE: Graphical Model

corrupted input clean input adjacency matrix

Notation:

SLIDE 53

Relational SDAE: Two Components

Perception Component Task-Specific Component

SLIDE 54

Relational SDAE: Generative Process

SLIDE 55

Relational SDAE: Generative Process

SLIDE 56

Multi-Relational SDAE: Graphical Model

corrupted input clean input adjacency matrix

Notation:

Product of Q+1 Gaussians

Multiple networks: citation networks co-author networks

SLIDE 57

Relational SDAE: Objective Function

Network A → relational matrix S; relational matrix S → middle-layer representations.
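A common way to write such a relational term (the paper's exact parameterization may differ, and the weight λ_r here is an assumption) is via the graph Laplacian: λ_r · tr(XᵀLX) with L = D − S penalizes linked items whose middle-layer representations differ, since tr(XᵀLX) = ½ Σᵢⱼ Sᵢⱼ‖xᵢ − xⱼ‖². A small numeric check of that identity:

```python
import numpy as np

# Toy adjacency matrix S over 4 items and their middle-layer representations X.
S = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
X = np.array([[0.1, 0.9],
              [0.8, 0.2],
              [0.2, 0.8],
              [0.7, 0.3]])

L = np.diag(S.sum(axis=1)) - S          # graph Laplacian of the network
penalty = np.trace(X.T @ L @ X)         # relational smoothness penalty

# Equivalent pairwise form: 0.5 * sum_ij S_ij * ||x_i - x_j||^2
pairwise = 0.5 * sum(S[i, j] * np.sum((X[i] - X[j]) ** 2)
                     for i in range(4) for j in range(4))
```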

SLIDE 58

Update Rules

SLIDE 59

From Representation to Tag Recommendation

SLIDE 60

Algorithm

SLIDE 61

Datasets

SLIDE 62

Sparse Setting, citeulike-a

SLIDE 63

Case Study 1: Tagging Scientific Articles

Precision: 10% VS 60%

SLIDE 64

Case Study 2: Tagging Movies (SDAE)

Precision: 30% VS 60%

SLIDE 65

Case Study 2: Tagging Movies (RSDAE)

The tag does not appear in the tag lists of movies linked to ‘E.T. the Extra-Terrestrial’, making it very difficult to discover.

SLIDE 66

Relational SDAE as Deep Relational Topic Models

BDL-based topic models capture the topic hierarchy and inter-document relations, with a perception component and a task-specific component, unified into a probabilistic relational model for relational deep learning [ Wang et al. 2015 (AAAI) ].

SLIDE 67

(Recap) Relational SDAE: Two Components

Perception Component Task-Specific Component

SLIDE 68

Using Relational Information as Observations

Probabilistic SDAE combined with modeling relations among nodes [ Wang et al. 2017 (AAAI) ]

SLIDE 69

Be ‘Bayesian’ in Collaborative Deep Learning

SLIDE 70

Be Bayesian in BDL

Motivation:

  • Uncertainty estimation for reinforcement learning, active learning, etc.
  • Robustness to insufficient data and noise
  • More accurate prediction

“Natural-Parameter Networks: A Class of Probabilistic Neural Networks”

SLIDE 71

Be Bayesian in BDL

What We Want:

  • Solvable via backpropagation
  • Sampling-free during both training and testing
  • Intuitive and easy to implement

“Natural-Parameter Networks: A Class of Probabilistic Neural Networks”

SLIDE 72

Weights/Neurons as Distributions

Neural networks treat weights/neurons as points; natural-parameter networks treat weights/neurons as distributions.
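For Gaussian NPNs, moments can be propagated in closed form rather than by sampling. The snippet below shows the core identity for a single weight-input product of independent Gaussians (a simplified piece of what a full NPN linear layer does, not the paper's complete algorithm; the function name is illustrative):

```python
def gaussian_product_moments(w_mean, w_var, x_mean, x_var):
    """Mean and variance of w*x for independent Gaussians w and x:
    E[wx] = E[w]E[x];  Var(wx) = Vw*Vx + Vw*E[x]^2 + E[w]^2*Vx."""
    mean = w_mean * x_mean
    var = w_var * x_var + w_var * x_mean ** 2 + w_mean ** 2 * x_var
    return mean, var

# Weight w ~ N(1.0, 0.5), input x ~ N(2.0, 0.25)
m, v = gaussian_product_moments(1.0, 0.5, 2.0, 0.25)
```

Summing such moments over the inputs of a unit propagates a distribution through the layer without any sampling, which is what makes training and testing sampling-free.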

SLIDE 73

Take-home Messages

  • Probabilistic graphical models for formulating both representation learning and inference/reasoning components
  • Learnable representation serving as a bridge
  • Tight, two-way interaction is crucial

SLIDE 74

Thanks! Q&A
