Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent


  1. Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent — Rainer Gemulla, Peter J. Haas, Yannis Sismanis, Christina Teflioudi, Faraz Makari — December 17, 2011

  2. Outline: Matrix Factorization · Stochastic Gradient Descent · Distributed SGD with MapReduce · Experiments · Summary

  3. Outline (section: Matrix Factorization)

  4.–6. Matrix completion visualized (one slide, built up in steps): original image → partially observed image → reconstructed image

  7.–13. Matrix completion for recommender systems (one slide, built up in steps)
  ◮ Discover latent factors (r = 1)

                       Avatar    The Matrix   Up
                       (2.24)    (1.92)       (1.18)
      Alice   (1.98)   ? (4.4)   4 (3.8)      2 (2.3)
      Bob     (1.21)   3 (2.7)   2 (2.3)      ? (1.4)
      Charlie (2.30)   5 (5.2)   ? (4.4)      3 (2.7)

    Parenthesized values are the rank-1 factors W and H and the predicted
    ratings [WH]_ij; "?" marks unobserved entries.

  ◮ Minimum loss
        min_{W,H} Σ_{(i,j)∈Z} (V_ij − [WH]_ij)²
  ◮ With bias terms
        min_{W,H,u,m} Σ_{(i,j)∈Z} (V_ij − μ − u_i − m_j − [WH]_ij)²
  ◮ With regularization
        min_{W,H,u,m} Σ_{(i,j)∈Z} (V_ij − μ − u_i − m_j − [WH]_ij)² + λ(‖W‖ + ‖H‖ + ‖u‖ + ‖m‖)
  ◮ With time
        min_{W,H,u,m} Σ_{(i,j,t)∈Z_t} (V_ij − μ − u_i(t) − m_j(t) − [W(t)H]_ij)² + λ(‖W(t)‖ + ‖H‖ + ‖u(t)‖ + ‖m(t)‖)
  ◮ Bias, regularization, time, . . .
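The rank-1 example above can be checked numerically: multiplying the slide's user factors by the movie factors reproduces the predicted ratings shown in parentheses. A minimal sketch in NumPy, with the factor values taken from the slide:

```python
import numpy as np

# Rank-1 factors from the slide: users (m x 1) and movies (1 x n)
W = np.array([[1.98], [1.21], [2.30]])   # Alice, Bob, Charlie
H = np.array([[2.24, 1.92, 1.18]])       # Avatar, The Matrix, Up

# Predicted rating matrix: [WH]_ij = W_i* . H_*j
pred = W @ H
print(np.round(pred, 1))
# Rounded to one decimal this matches the slide:
# [[4.4 3.8 2.3]
#  [2.7 2.3 1.4]
#  [5.2 4.4 2.7]]
```

The "?" entries of the table are exactly where these predictions fill in ratings that were never observed.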

  14.–18. Generalized Matrix Factorization (one slide, built up in steps)
  ◮ A general machine learning problem
    ◮ Recommender systems, text indexing, face recognition, . . .
  ◮ Training data
    ◮ V: m × n input matrix (e.g., rating matrix)
    ◮ Z: training set of indexes in V (e.g., subset of known ratings)
  ◮ Parameter space
    ◮ W: row factors (e.g., m × r latent customer factors)
    ◮ H: column factors (e.g., r × n latent movie factors)
  ◮ Model
    ◮ L_ij(W_i∗, H_∗j): loss at element (i, j)
    ◮ Includes prediction error, regularization, auxiliary information, . . .
    ◮ Constraints (e.g., non-negativity)
  ◮ Find best model
        argmin_{W,H} Σ_{(i,j)∈Z} L_ij(W_i∗, H_∗j)
  [Diagram: entry V_ij of V, predicted from row W_i∗ of W and column H_∗j of H]
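To make the abstract objective concrete, the local loss L_ij can be instantiated as squared prediction error plus an L2 penalty. A sketch under that assumption (the function names and the λ value are illustrative, not from the slides):

```python
import numpy as np

def local_loss(W_i, H_j, v_ij, lam=0.05):
    """L_ij(W_i*, H_*j): squared prediction error at element (i, j)
    plus an L2 regularization term on the two factor vectors."""
    err = v_ij - W_i @ H_j
    return err ** 2 + lam * (W_i @ W_i + H_j @ H_j)

def total_loss(V, Z, W, H, lam=0.05):
    """Objective: sum of local losses over the training set Z of
    observed indexes (not over all m x n entries of V)."""
    return sum(local_loss(W[i], H[:, j], V[i, j], lam) for i, j in Z)
```

Summing only over Z is what distinguishes matrix completion from plain low-rank approximation: unobserved entries contribute nothing to the objective.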

  19.–20. Successful Applications
  ◮ Movie recommendation (Netflix, competition papers)
    ◮ > 12M users, > 20k movies, 2.4B ratings (projected)
    ◮ 36GB data, 9.2GB model (projected)
    ◮ Latent factor model
  ◮ Website recommendation (Microsoft, WWW10)
    ◮ 51M users, 15M URLs, 1.2B clicks
    ◮ 17.8GB data, 161GB metadata, 49GB model
    ◮ Gaussian non-negative matrix factorization
  ◮ News personalization (Google, WWW07)
    ◮ Millions of users, millions of stories, ? clicks
    ◮ Probabilistic latent semantic indexing
  Distributed processing is necessary!
  ◮ Big data ◮ Large models ◮ Expensive computations

  21. Outline (section: Stochastic Gradient Descent)

  22.–25. Stochastic Gradient Descent (one slide, built up in steps)
  ◮ Find minimum θ∗ of function L
  ◮ Pick a starting point θ₀
  ◮ Approximate the gradient L̂′(θ₀)
  ◮ Move "approximately" downhill
  ◮ Stochastic difference equation
        θ_{n+1} = θ_n − ε_n L̂′(θ_n)
  [Contour plot of the loss surface L, with the iterates stepping toward the minimum]
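The update rule above specializes directly to the factorization objective: sample one observed entry (i, j), and step W_i∗ and H_∗j along the negative gradient of that single entry's loss. A minimal sketch, unregularized and with a constant step size ε (the function name, initialization, and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_factorize(V, Z, r=1, steps=20000, eps=0.01):
    """Plain SGD for min_{W,H} sum_{(i,j) in Z} (V_ij - [WH]_ij)^2.
    Each step follows theta_{n+1} = theta_n - eps_n * Lhat'(theta_n),
    where Lhat' is the gradient at a single sampled training point."""
    m, n = V.shape
    W = rng.random((m, r))                  # random starting point theta_0
    H = rng.random((r, n))
    for _ in range(steps):
        i, j = Z[rng.integers(len(Z))]      # sample one observed entry
        err = V[i, j] - W[i] @ H[:, j]      # residual at that entry
        grad_W = -2 * err * H[:, j]         # gradient of local loss wrt W_i*
        grad_H = -2 * err * W[i].copy()     # ... and wrt H_*j (old value)
        W[i] -= eps * grad_W                # move "approximately" downhill
        H[:, j] -= eps * grad_H
    return W, H
```

Because only row i of W and column j of H appear in the sampled gradient, each step touches a tiny slice of the parameters; this locality is what the distributed scheme in the next section exploits.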
