RaFM: Rank-Aware Factorization Machines



SLIDE 1

RaFM

Rank-Aware Factorization Machines

Yin Zheng, on behalf of Xiaoshuang Chen, Yin Zheng, Jiaxing Wang, Wenye Ma, Junzhou Huang

SLIDE 2

Motivation

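Factorization Machines

$$\hat{y} = b + \sum_{i \in \mathcal{F}} w_i x_i + \sum_{i,j \in \mathcal{F},\, i<j} \langle \mathcal{V}_i, \mathcal{V}_j \rangle \, x_i x_j$$

Factorized embeddings for each feature:

$$\mathcal{V}_i = \mathbf{v}_i$$

Modeling pairwise interactions:

$$\langle \mathcal{V}_i, \mathcal{V}_j \rangle_{FM} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{D} v_{i,f} \, v_{j,f}$$

Different features have different frequencies of occurrence.

What is the best rank of the embeddings?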
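The pairwise term is the inner product of two feature embeddings, weighted by the two feature values. A minimal Python sketch of the standard FM prediction is given here for illustration; the function name `fm_predict` and the toy numbers are assumptions for the example, not taken from the slides or the authors' code.

```python
from itertools import combinations


def fm_predict(x, b, w, v):
    """Standard FM prediction: bias + linear terms + pairwise terms.

    x: feature values, w: linear weights, v: one rank-D embedding per feature.
    This naive version loops over all pairs i < j for readability.
    """
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i, j in combinations(range(n), 2):  # all pairs with i < j
        inner = sum(p * q for p, q in zip(v[i], v[j]))  # <v_i, v_j>
        pairwise += inner * x[i] * x[j]
    return b + linear + pairwise


# Toy example (illustrative values only): 3 features, rank D = 2.
x = [1.0, 0.0, 2.0]
w = [0.1, 0.2, 0.3]
v = [[1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
y_hat = fm_predict(x, b=0.5, w=w, v=v)
```

Every feature shares the same rank D here, which is exactly the constraint that the question about the best rank calls into doubt.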

SLIDE 3

Motivation

[Figure: performance of FMs with fixed ranks on the MovieLens Tag dataset; annotations: Underfitting, Overfitting]

SLIDE 4

Basic Model

Rank-Aware Factorization Machines

[Figure panels: Low-Rank FM, High-Rank FM, Rank-Aware FM]

SLIDE 5

Basic Model

Rank-Aware Factorization Machines

$$\hat{y} = b + \sum_{i \in \mathcal{F}} w_i x_i + \sum_{i,j \in \mathcal{F},\, i<j} \langle \mathcal{V}_i, \mathcal{V}_j \rangle \, x_i x_j$$

Multiple embeddings with different ranks:

$$\mathcal{V}_i = \left\{ \mathbf{v}_i^{(1)}, \mathbf{v}_i^{(2)}, \ldots, \mathbf{v}_i^{(k_i)} \right\}$$

$k_i$: the largest rank to avoid overfitting (hyperparameters)

Choose a proper rank for computation of pairwise interaction:

$$k_{ij} = \min(k_i, k_j)$$

$$\langle \mathcal{V}_i, \mathcal{V}_j \rangle_{RaFM} = \left\langle \mathbf{v}_i^{(k_{ij})}, \mathbf{v}_j^{(k_{ij})} \right\rangle$$

  • What is the time and space complexity?
  • How to efficiently train RaFM?
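One way to read the rank selection rule: each feature i keeps embeddings at rank levels 1 to k_i, and a pair (i, j) interacts at level min(k_i, k_j), the highest level that both features have. The sketch below is a naive illustration of that rule only; the nested-list layout `V[i][k - 1]` and the name `rafm_pairwise` are assumptions made for this example.

```python
from itertools import combinations


def rafm_pairwise(x, V):
    """Naive RaFM pairwise term.

    V[i] is a list of embeddings for feature i, one per rank level
    1..k_i, so len(V[i]) plays the role of k_i (hypothetical layout).
    """
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        k_ij = min(len(V[i]), len(V[j]))          # k_ij = min(k_i, k_j)
        v_i, v_j = V[i][k_ij - 1], V[j][k_ij - 1]  # same level, same dimension
        total += sum(p * q for p, q in zip(v_i, v_j)) * x[i] * x[j]
    return total


# Toy example: level 1 has dimension 1, level 2 has dimension 2.
V = [
    [[1.0], [1.0, 2.0]],   # feature 0: k_0 = 2
    [[3.0]],               # feature 1: k_1 = 1 (e.g. a rare feature)
    [[0.5], [2.0, 1.0]],   # feature 2: k_2 = 2
]
x = [1.0, 1.0, 1.0]
pairwise = rafm_pairwise(x, V)
```

In this toy case the pair (0, 2) uses the level-2 embeddings, while any pair involving feature 1 falls back to level 1.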
SLIDE 6

Space Complexity

Active and Inactive Factors

Described by feature sets:

$$\mathcal{F}_k = \{ i \in \mathcal{F} : k_i \ge k \}$$

Active factors: $\mathbf{v}^{(p)}_{\mathcal{F}_p}$

Inactive factors: $\mathbf{v}^{(p)}_{\mathcal{F} \setminus \mathcal{F}_p}$ (need NOT be stored!)

Space complexity:

$$O\left( \sum_{k=1}^{m} D_k \, |\mathcal{F}_k| \right)$$
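To make the space complexity concrete, the following sketch counts stored embedding parameters as the sum over k of D_k |F_k|, where F_k is the set of features whose largest rank level is at least k. The helper `rafm_param_count`, the dimensions, and the rank assignments are made-up toy values for illustration.

```python
def rafm_param_count(k_max, dims):
    """Count stored embedding parameters: sum over k of D_k * |F_k|.

    k_max[i] is the largest rank level k_i of feature i (1-based);
    dims[k - 1] is the embedding dimension D_k of level k.
    """
    total = 0
    for k, d_k in enumerate(dims, start=1):
        size_f_k = sum(1 for k_i in k_max if k_i >= k)  # |F_k|
        total += d_k * size_f_k
    return total


# Toy setting: 8 features limited to level 1, 2 features allowed level 2.
k_max = [1] * 8 + [2] * 2
dims = [4, 32]
rafm_params = rafm_param_count(k_max, dims)  # 4 * 10 + 32 * 2
fm_params = dims[-1] * len(k_max)            # fixed-rank FM at the largest rank
```

With these toy numbers only the two level-2 features pay for the 32-dimensional embedding, so the count is well below that of a fixed rank-32 FM.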

SLIDE 7

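Time Complexity

$$\mathcal{F}_k = \{ i \in \mathcal{F} : k_i \ge k \}$$

Auxiliary Variables

$$\mathcal{B}_{l,k} = \sum_{i<j} \left\langle \mathbf{v}_i^{(k_{ij}^{l,k})}, \mathbf{v}_j^{(k_{ij}^{l,k})} \right\rangle x_i x_j, \qquad k_{ij}^{l,k} = \max\left(l, \min(k_{ij}, k)\right)$$

$$\mathcal{A}_{l,k} = \sum_{i,j \in \mathcal{F}_k,\, i<j} \left\langle \mathbf{v}_i^{(l)}, \mathbf{v}_j^{(l)} \right\rangle x_i x_j = \frac{1}{2} \left( \left\| \sum_{i \in \mathcal{F}_k} \mathbf{v}_i^{(l)} x_i \right\|_2^2 - \sum_{i \in \mathcal{F}_k} \left\| \mathbf{v}_i^{(l)} \right\|_2^2 x_i^2 \right)$$

Complexity of $\mathcal{A}_{l,k}$: $O\left( D_l \, |\mathcal{F}_k| \right)$

It is easy to prove that

$$\mathcal{B}_{l,k+1} = \mathcal{B}_{l,k} - \mathcal{A}_{k,k+1} + \mathcal{A}_{k+1,k+1}$$

The pairwise interaction term of RaFM is $\mathcal{B}_{1,m}$.

Time complexity:

$$O\left( \sum_{k=1}^{m} D_k \, |\mathcal{F}_k| \right)$$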
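The linear-time computation can be checked numerically on a toy case. The sketch below assumes the recursion reads B_{1,k+1} = B_{1,k} - A_{k,k+1} + A_{k+1,k+1}, started from B_{1,1} = A_{1,1}, where A_{l,k} is the rank-l interaction over F_k computed with the usual "square of sum minus sum of squares" identity. The data layout `V[i][k - 1]` and all function names are hypothetical, chosen for this example.

```python
from itertools import combinations


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def naive_pairwise(x, V):
    """Reference: every pair (i, j) interacts at level min(k_i, k_j)."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        k_ij = min(len(V[i]), len(V[j]))
        total += dot(V[i][k_ij - 1], V[j][k_ij - 1]) * x[i] * x[j]
    return total


def A(l, k, x, V):
    """A_{l,k}: level-l interactions over F_k = {i : k_i >= k}, for l <= k.

    Uses 0.5 * (||sum_i v_i x_i||^2 - sum_i ||v_i||^2 x_i^2), linear in |F_k|.
    """
    feats = [i for i in range(len(x)) if len(V[i]) >= k]
    if not feats:
        return 0.0
    dim = len(V[feats[0]][l - 1])
    s = [sum(V[i][l - 1][f] * x[i] for i in feats) for f in range(dim)]
    sq = sum(dot(V[i][l - 1], V[i][l - 1]) * x[i] ** 2 for i in feats)
    return 0.5 * (dot(s, s) - sq)


def fast_pairwise(x, V, m):
    """B_{1,m} via the recursion over rank levels 1..m."""
    b = A(1, 1, x, V)  # B_{1,1} = A_{1,1}
    for k in range(1, m):
        # Pairs inside F_{k+1} move from level k to level k + 1.
        b += A(k + 1, k + 1, x, V) - A(k, k + 1, x, V)
    return b


# Toy example: level 1 has dimension 1, level 2 has dimension 2.
V = [[[1.0], [1.0, 2.0]], [[3.0]], [[0.5], [2.0, 1.0]]]
x = [1.0, 1.0, 1.0]
fast = fast_pairwise(x, V, m=2)
slow = naive_pairwise(x, V)
```

On this toy input both routes give the same value, which is what the recursion predicts; the fast route never enumerates feature pairs.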

SLIDE 8

Learning Algorithm

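Free and Dependent Factors

$$\mathcal{F}_k = \{ i \in \mathcal{F} : k_i \ge k \}$$

Free factors: $\mathbf{v}^{(p)}_{\mathcal{F}_p \setminus \mathcal{F}_{p+1}}$

Inactive factors: $\mathbf{v}^{(p)}_{\mathcal{F} \setminus \mathcal{F}_p}$

Dependent factors: $\mathbf{v}^{(p)}_{\mathcal{F}_{p+1}}$

Bi-Level Optimization

$$\min \frac{1}{N} \sum_{\mathbf{x}} L\left( \mathcal{B}_{1,m}, y \right)$$

$$\text{s.t.} \quad \mathbf{v}^{(p)}_{\mathcal{F}_{p+1}} = \arg\min \frac{1}{N} \sum_{\mathbf{x}} L\left( \mathcal{B}_{1,p}, \mathcal{B}_{1,p+1} \right), \quad p < m$$

Pushing dependent factors to approximate free factors.

Proved by Thm. 6.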

SLIDE 9

Experiment

  • RaFM outperforms FM.
  • RaFM is also more computationally efficient than FM.

Improvement: 0.5%~15%
Model Size: 20%~66%
Training Time: 24%~95%

SLIDE 10

Experiment

RaFM vs. FM: Results on Tencent CTR Dataset

RaFM-low has similar performance to FM-32.

RaFM: 32 + 512

SLIDE 11

Thanks!

Code: https://github.com/cxsmarkchan/RaFM
Xiaoshuang Chen: https://cxsmarkchan.github.io
Yin Zheng: https://sites.google.com/site/zhengyin1126

Poster: Pacific Ballroom, Jun 13th, 6:30PM~9:00PM, Poster ID 220