  1. Field-aware Factorization Machines. YuChin Juan, Yong Zhuang, and Wei-Sheng Chin. NTU CSIE MLGroup. 1/18

  2. Recently, field-aware factorization machines (FFM) have been used to win two click-through rate prediction competitions hosted by Criteo [1] and Avazu [2]. In these slides we introduce the formulation of FFM together with the well-known linear model, the degree-2 polynomial model, and factorization machines. To use this model, please download LIBFFM at: http://www.csie.ntu.edu.tw/~r01922136/libffm
     [1] https://www.kaggle.com/c/criteo-display-ad-challenge
     [2] https://www.kaggle.com/c/avazu-ctr-prediction

  3. Linear Model. The formulation of the linear model is:
       φ(w, x) = wᵀx = Σ_{j ∈ C1} w_j x_j,
     where w is the model, x is a data instance, and C1 is the set of non-zero elements in x. (The bias term is not included in these slides.)
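The slides give no code, so here is a minimal Python sketch of this formulation; the weight dict `w` and the sparse instance `x` (feature index → value) below are hypothetical values chosen only for illustration.

```python
def phi_linear(w, x):
    """Linear model: phi(w, x) = sum over non-zero features j of w[j] * x[j]."""
    return sum(w[j] * x_j for j, x_j in x.items())

# Hypothetical weights and a sparse instance {feature index: value}.
w = {1: 0.5, 2: -0.2, 3: 0.1}
x = {1: 1.0, 3: 2.0}
print(phi_linear(w, x))  # 0.5*1.0 + 0.1*2.0 = 0.7
```

Only the non-zero features of x appear in the sum, matching the definition of C1 above.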

  4. Degree-2 Polynomial Model (Poly2). The formulation of Poly2 is:
       φ(w, x) = Σ_{(j1, j2) ∈ C2} w_{j1, j2} x_{j1} x_{j2},
     where C2 is the set of 2-combinations of the non-zero elements in x. (The linear terms and the bias term are not included in these slides.)

  5. Factorization Machines (FM). The formulation of FM is:
       φ(w, x) = Σ_{(j1, j2) ∈ C2} ⟨w_{j1}, w_{j2}⟩ x_{j1} x_{j2},
     where w_{j1} and w_{j2} are two vectors of length k, and k is a user-defined parameter. (The linear terms and the bias term are not included in these slides. This model is proposed in [Rendle, 2010].)
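A sketch of the FM formulation: instead of one scalar per pair, each feature j owns a single length-k latent vector, and the pair weight is the inner product of the two vectors. The vectors below are hypothetical.

```python
from itertools import combinations

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi_fm(w, x):
    """FM: pair (j1, j2) contributes <w[j1], w[j2]> * x[j1] * x[j2]."""
    return sum(dot(w[j1], w[j2]) * x[j1] * x[j2]
               for j1, j2 in combinations(sorted(x), 2))

# k = 2; hypothetical latent vectors, one per feature.
w = {1: [0.1, 0.2], 2: [0.3, -0.1], 3: [0.0, 0.5]}
x = {1: 1.0, 2: 1.0, 3: 2.0}
print(phi_fm(w, x))  # 0.01*1 + 0.1*2 + (-0.05)*2 = 0.11
```

The model size is now linear in the number of features (one vector each) rather than quadratic as in Poly2.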

  6. Field-aware Factorization Machines (FFM). The formulation of FFM is:
       φ(w, x) = Σ_{(j1, j2) ∈ C2} ⟨w_{j1, f2}, w_{j2, f1}⟩ x_{j1} x_{j2},
     where f1 and f2 are respectively the fields of j1 and j2, and w_{j1, f2} and w_{j2, f1} are two vectors of length k. (The linear terms and the bias term are not included in these slides. This model is used in [Jahrer et al., 2012]; a similar model is proposed in [Rendle and Schmidt-Thieme, 2010].)
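A sketch of the FFM formulation: each feature keeps one length-k vector per field, and when j1 meets j2 it uses the vector it reserved for j2's field (and vice versa). The `field` map and vectors below are hypothetical.

```python
from itertools import combinations

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi_ffm(w, field, x):
    """FFM: pair (j1, j2) contributes <w[(j1, field[j2])], w[(j2, field[j1])]>
    * x[j1] * x[j2]; w is keyed by (feature index, field index)."""
    return sum(dot(w[(j1, field[j2])], w[(j2, field[j1])]) * x[j1] * x[j2]
               for j1, j2 in combinations(sorted(x), 2))

# Two features in fields 1 and 2; hypothetical k = 2 vectors.
field = {1: 1, 2: 2}
w = {(1, 2): [0.1, 0.2], (2, 1): [0.3, 0.4]}
x = {1: 1.0, 2: 2.0}
print(phi_ffm(w, field, x))  # (0.1*0.3 + 0.2*0.4) * 1 * 2 = 0.22
```

Relative to FM, the latent vector chosen for a feature now depends on which field it is interacting with, which is the whole point of the "field-aware" extension.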

  7. FFM for Logistic Loss. The optimization problem is:
       min_w  (λ/2) ‖w‖² + Σ_{i=1}^{L} log(1 + exp(−y_i φ(w, x_i))),
     where
       φ(w, x) = Σ_{(j1, j2) ∈ C2} ⟨w_{j1, f2}, w_{j2, f1}⟩ x_{j1} x_{j2},
     L is the number of instances, and λ is the regularization parameter.
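As a sketch (not LIBFFM's actual solver), the objective above can be evaluated directly given precomputed φ values; the inputs below are hypothetical, and labels y_i are assumed to be ±1 as the loss formula implies.

```python
import math

def ffm_objective(phi_vals, ys, w_flat, lam):
    """Regularized logistic loss:
    (lam/2) * ||w||^2 + sum_i log(1 + exp(-y_i * phi_i)).
    phi_vals are precomputed phi(w, x_i); labels y_i are +1/-1;
    w_flat is the model flattened into one list of parameters."""
    reg = 0.5 * lam * sum(v * v for v in w_flat)
    # log1p(exp(z)) is a numerically safer form of log(1 + exp(z)) for z <= 0.
    loss = sum(math.log1p(math.exp(-y * p)) for p, y in zip(phi_vals, ys))
    return reg + loss

# Two instances: one confidently correct (y=+1, phi=2),
# one correct with smaller margin (y=-1, phi=-1).
print(ffm_objective([2.0, -1.0], [1, -1], [0.5, -0.5], 0.1))
```

A larger margin y_i·φ(w, x_i) drives that instance's loss term toward zero, while λ trades data fit against the size of the model.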

  8. A Concrete Example. Consider the following instance:
       User (Us):   YuChin (YC)
       Movie (Mo):  3Idiots (3I)
       Genre (Ge):  Comedy,Drama (Co,Dr)
       Price (Pr):  $9.99
     Note that "User," "Movie," and "Genre" are categorical variables, and "Price" is a numerical variable.

  9. A Concrete Example. Conceptually, for the linear model, φ(w, x) is:
       w_Us-Yu · x_Us-Yu + w_Mo-3I · x_Mo-3I + w_Ge-Co · x_Ge-Co + w_Ge-Dr · x_Ge-Dr + w_Pr · x_Pr,
     where x_Us-Yu = x_Mo-3I = x_Ge-Co = x_Ge-Dr = 1 and x_Pr = 9.99. Note that because "User," "Movie," and "Genre" are categorical variables, the values are all ones. (If preprocessing such as instance-wise normalization is conducted, the values may not be ones.)

  10. A Concrete Example. For Poly2, φ(w, x) is:
        w_{Us-Yu, Mo-3I} · x_Us-Yu · x_Mo-3I + w_{Us-Yu, Ge-Co} · x_Us-Yu · x_Ge-Co
      + w_{Us-Yu, Ge-Dr} · x_Us-Yu · x_Ge-Dr + w_{Us-Yu, Pr} · x_Us-Yu · x_Pr
      + w_{Mo-3I, Ge-Co} · x_Mo-3I · x_Ge-Co + w_{Mo-3I, Ge-Dr} · x_Mo-3I · x_Ge-Dr
      + w_{Mo-3I, Pr} · x_Mo-3I · x_Pr + w_{Ge-Co, Ge-Dr} · x_Ge-Co · x_Ge-Dr
      + w_{Ge-Co, Pr} · x_Ge-Co · x_Pr + w_{Ge-Dr, Pr} · x_Ge-Dr · x_Pr

  11. A Concrete Example. For FM, φ(w, x) is:
        ⟨w_Us-Yu, w_Mo-3I⟩ · x_Us-Yu · x_Mo-3I + ⟨w_Us-Yu, w_Ge-Co⟩ · x_Us-Yu · x_Ge-Co
      + ⟨w_Us-Yu, w_Ge-Dr⟩ · x_Us-Yu · x_Ge-Dr + ⟨w_Us-Yu, w_Pr⟩ · x_Us-Yu · x_Pr
      + ⟨w_Mo-3I, w_Ge-Co⟩ · x_Mo-3I · x_Ge-Co + ⟨w_Mo-3I, w_Ge-Dr⟩ · x_Mo-3I · x_Ge-Dr
      + ⟨w_Mo-3I, w_Pr⟩ · x_Mo-3I · x_Pr + ⟨w_Ge-Co, w_Ge-Dr⟩ · x_Ge-Co · x_Ge-Dr
      + ⟨w_Ge-Co, w_Pr⟩ · x_Ge-Co · x_Pr + ⟨w_Ge-Dr, w_Pr⟩ · x_Ge-Dr · x_Pr

  12. A Concrete Example. For FFM, φ(w, x) is:
        ⟨w_{Us-Yu, Mo}, w_{Mo-3I, Us}⟩ · x_Us-Yu · x_Mo-3I + ⟨w_{Us-Yu, Ge}, w_{Ge-Co, Us}⟩ · x_Us-Yu · x_Ge-Co
      + ⟨w_{Us-Yu, Ge}, w_{Ge-Dr, Us}⟩ · x_Us-Yu · x_Ge-Dr + ⟨w_{Us-Yu, Pr}, w_{Pr, Us}⟩ · x_Us-Yu · x_Pr
      + ⟨w_{Mo-3I, Ge}, w_{Ge-Co, Mo}⟩ · x_Mo-3I · x_Ge-Co + ⟨w_{Mo-3I, Ge}, w_{Ge-Dr, Mo}⟩ · x_Mo-3I · x_Ge-Dr
      + ⟨w_{Mo-3I, Pr}, w_{Pr, Mo}⟩ · x_Mo-3I · x_Pr + ⟨w_{Ge-Co, Ge}, w_{Ge-Dr, Ge}⟩ · x_Ge-Co · x_Ge-Dr
      + ⟨w_{Ge-Co, Pr}, w_{Pr, Ge}⟩ · x_Ge-Co · x_Pr + ⟨w_{Ge-Dr, Pr}, w_{Pr, Ge}⟩ · x_Ge-Dr · x_Pr

  13. A Concrete Example. In practice we need to map these features into numbers. Say we have the following mapping:
        Field name → Field index      Feature name  → Feature index
        User  → field 1               User-YuChin   → feature 1
        Movie → field 2               Movie-3Idiots → feature 2
        Genre → field 3               Genre-Comedy  → feature 3
        Price → field 4               Genre-Drama   → feature 4
                                      Price         → feature 5
      After transforming to the LIBFFM format, the data becomes:
        1:1:1 2:2:1 3:3:1 3:4:1 4:5:9.99
      In each field:feature:value triple, the first number is the index of the field, the second is the index of the feature, and the third is the value of the corresponding feature.
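A sketch of reading this format back out of a string: each whitespace-separated token splits on `:` into (field, feature, value). This parses only the triples shown on the slide; a full LIBFFM training line would also carry a label, which is omitted here.

```python
def parse_libffm_line(line):
    """Parse one line of field:feature:value triples into
    a list of (field index, feature index, value) tuples."""
    triples = []
    for token in line.split():
        f, j, v = token.split(":")
        triples.append((int(f), int(j), float(v)))
    return triples

print(parse_libffm_line("1:1:1 2:2:1 3:3:1 3:4:1 4:5:9.99"))
# [(1, 1, 1.0), (2, 2, 1.0), (3, 3, 1.0), (3, 4, 1.0), (4, 5, 9.99)]
```

Note how the two genre features share field index 3, while the numerical Price feature carries its raw value 9.99.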

  14. A Concrete Example. Now, for the linear model, φ(w, x) is:
        w_1 · 1 + w_2 · 1 + w_3 · 1 + w_4 · 1 + w_5 · 9.99

  15. A Concrete Example. For Poly2, φ(w, x) is:
        w_{1,2} · 1 · 1 + w_{1,3} · 1 · 1 + w_{1,4} · 1 · 1 + w_{1,5} · 1 · 9.99
      + w_{2,3} · 1 · 1 + w_{2,4} · 1 · 1 + w_{2,5} · 1 · 9.99
      + w_{3,4} · 1 · 1 + w_{3,5} · 1 · 9.99
      + w_{4,5} · 1 · 9.99

  16. A Concrete Example. For FM, φ(w, x) is:
        ⟨w_1, w_2⟩ · 1 · 1 + ⟨w_1, w_3⟩ · 1 · 1 + ⟨w_1, w_4⟩ · 1 · 1 + ⟨w_1, w_5⟩ · 1 · 9.99
      + ⟨w_2, w_3⟩ · 1 · 1 + ⟨w_2, w_4⟩ · 1 · 1 + ⟨w_2, w_5⟩ · 1 · 9.99
      + ⟨w_3, w_4⟩ · 1 · 1 + ⟨w_3, w_5⟩ · 1 · 9.99
      + ⟨w_4, w_5⟩ · 1 · 9.99

  17. A Concrete Example. For FFM, φ(w, x) is:
        ⟨w_{1,2}, w_{2,1}⟩ · 1 · 1 + ⟨w_{1,3}, w_{3,1}⟩ · 1 · 1 + ⟨w_{1,3}, w_{4,1}⟩ · 1 · 1 + ⟨w_{1,4}, w_{5,1}⟩ · 1 · 9.99
      + ⟨w_{2,3}, w_{3,2}⟩ · 1 · 1 + ⟨w_{2,3}, w_{4,2}⟩ · 1 · 1 + ⟨w_{2,4}, w_{5,2}⟩ · 1 · 9.99
      + ⟨w_{3,3}, w_{4,3}⟩ · 1 · 1 + ⟨w_{3,4}, w_{5,3}⟩ · 1 · 9.99
      + ⟨w_{4,4}, w_{5,3}⟩ · 1 · 9.99
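As a sketch, the (feature, field) vector indices used in this FFM expansion can be generated mechanically from slide 13's field mapping, which is a handy way to check the subscripts by hand.

```python
from itertools import combinations

# Field of each feature, taken from the mapping on slide 13:
# User -> 1, Movie -> 2, Genre -> 3 (features 3 and 4), Price -> 4.
field = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4}
x = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 9.99}

def ffm_pairs(x, field):
    """For each 2-combination (j1, j2) of non-zero features, return the
    (feature, field) keys of the two latent vectors FFM pairs up."""
    return [((j1, field[j2]), (j2, field[j1]))
            for j1, j2 in combinations(sorted(x), 2)]

pairs = ffm_pairs(x, field)
print(pairs[0])   # ((1, 2), (2, 1)): the <w_{1,2}, w_{2,1}> term
print(pairs[-1])  # ((4, 4), (5, 3)): the <w_{4,4}, w_{5,3}> term
print(len(pairs)) # 10 pairwise terms in total
```

The ten generated index pairs match the ten inner products in the expansion above, including the Genre-Genre term that uses field 3 on both sides.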

  18. References.
      Jahrer, M., Töscher, A., Lee, J.-Y., Deng, J., Zhang, H., and Spoelstra, J. (2012). Ensemble of collaborative filtering and feature engineered models for click through rate prediction.
      Rendle, S. (2010). Factorization machines. In Proceedings of ICDM.
      Rendle, S. and Schmidt-Thieme, L. (2010). Pairwise interaction tensor factorization for personalized tag recommendation. In Proceedings of WSDM.
