A Preference-Based Bandit Framework for Personalized Recommendation


SLIDE 1

A Preference-Based Bandit Framework for Personalized Recommendation

Maryam Tavakol and Ulf Brefeld

Paderborn, Nov 8, 2016

SLIDE 2

Introduction

Personalized Recommendation

Preference Learning
Multi-armed bandits

SLIDE 3

Recommendation

SLIDE 4

Recommendation

SLIDE 5

Preference Model

Item i: {Shirt, Blue, Women, Cheap}
Item k: {Polo shirt, White, Women, Expensive}

Item i ≻ Item k:

{Shirt-Polo shirt, Blue-White, Women-Women, Cheap-Expensive}

$z_{ik} := z_i - z_k$ (the difference of the feature vectors of items i and k)
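A minimal sketch of the pairwise representation, assuming each attribute is one-hot encoded and that the item features $z_i$ are the concatenated one-hot blocks. The vocabulary `VOCAB` (including the extra value "Men") and the helper names are hypothetical, introduced only for illustration.

```python
import numpy as np

# Hypothetical attribute vocabulary for the clothing example ("Men" added for illustration).
VOCAB = {
    "type": ["Shirt", "Polo shirt"],
    "color": ["Blue", "White"],
    "gender": ["Women", "Men"],
    "price": ["Cheap", "Expensive"],
}

def item_features(item: dict) -> np.ndarray:
    """z_i: one one-hot block per attribute, concatenated in VOCAB order."""
    blocks = []
    for attr, values in VOCAB.items():
        block = np.zeros(len(values))
        block[values.index(item[attr])] = 1.0
        blocks.append(block)
    return np.concatenate(blocks)

def pairwise_features(item_i: dict, item_k: dict) -> np.ndarray:
    """z_ik := z_i - z_k, encoding the preference 'item i over item k'."""
    return item_features(item_i) - item_features(item_k)

item_i = {"type": "Shirt", "color": "Blue", "gender": "Women", "price": "Cheap"}
item_k = {"type": "Polo shirt", "color": "White", "gender": "Women", "price": "Expensive"}
z_ik = pairwise_features(item_i, item_k)
print(z_ik)  # the shared attribute "Women" cancels to a zero block
```

Attributes the two items share cancel out, so $z_{ik}$ only encodes how item i differs from item k, and $z_{ki} = -z_{ik}$.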

SLIDE 6

Payoff Model

Personalized model + average component

Per-user models (User 1, User 2, …, User m) plus one average model shared across all users (User 1 + User 2 + … + User m)

$$\mathbb{E}[r_{t,ik} \mid u_t = u_j] = \beta_t^\top z_{ik} + \theta^\top z_{ik}$$
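The payoff model can be sketched as follows, assuming the personalized vectors are stored row-wise in an array `beta` of shape (m, d) indexed by user and the shared component `theta` has shape (d,). The function name and this storage layout are hypothetical.

```python
import numpy as np

def expected_payoff(z_ik: np.ndarray, beta: np.ndarray, theta: np.ndarray, user: int) -> float:
    """Expected reward of the preference i > k for the given user: beta_user^T z_ik + theta^T z_ik."""
    personal = beta[user] @ z_ik   # user-specific part of the model
    shared = theta @ z_ik          # average component shared by all users
    return float(personal + shared)

beta = np.array([[0.5, 0.0, -0.2],
                 [0.0, 1.0, 0.0]])   # m = 2 users, d = 3 features
theta = np.array([0.1, 0.1, 0.1])
z = np.array([1.0, -1.0, 0.0])
print(expected_payoff(z, beta, theta, user=0), expected_payoff(z, beta, theta, user=1))  # 0.5 -1.0
```

The same pair can receive opposite scores for different users, while $\theta$ contributes identically for everyone. Because the model is linear in $z_{ik}$, swapping i and k flips the sign of the payoff.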

SLIDE 7

Personalized Recommendation with Qualitative Bandit

For t = 1, …, T:
1. The world generates some context
2. The learner chooses an action
3. The world reacts with a reward
Action selection: choose the arm with the highest mean reward plus confidence interval (a generalization of LinUCB)
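The three-step protocol can be sketched as a simplified LinUCB loop in which every arm is a candidate pair. This is not the authors' algorithm: it assumes a single shared model (no personalized $\beta_j$), a ridge estimate with regularizer `lam`, an exploration constant `c`, and a synthetic, noiseless world `reward_fn`; all names are hypothetical.

```python
import numpy as np

def linucb_over_pairs(contexts, reward_fn, lam=1.0, c=1.0):
    """Shared-model LinUCB where each arm is a candidate pair z_ik.

    contexts: list of (P, d) arrays; row p of contexts[t] is the pair feature of arm p at round t.
    reward_fn: callable (t, p) -> float returning the world's reward.
    Returns the final ridge estimate and the list of chosen arms.
    """
    d = contexts[0].shape[1]
    A = lam * np.eye(d)   # lam * I + sum of z z^T over chosen pairs
    b = np.zeros(d)       # sum of r * z over chosen pairs
    chosen = []
    for t, Z_t in enumerate(contexts):      # 1. the world generates a context
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b               # current ridge estimate
        mean = Z_t @ theta_hat
        width = c * np.sqrt(np.einsum("pd,de,pe->p", Z_t, A_inv, Z_t))
        p = int(np.argmax(mean + width))    # 2. the learner chooses an action
        r = reward_fn(t, p)                 # 3. the world reacts with a reward
        A += np.outer(Z_t[p], Z_t[p])
        b += r * Z_t[p]
        chosen.append(p)
    return np.linalg.solve(A, b), chosen

# Synthetic world: noiseless linear rewards driven by a hidden preference vector.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -1.0, 0.5])
contexts = [rng.normal(size=(5, 3)) for _ in range(300)]
theta_hat, chosen = linucb_over_pairs(contexts, lambda t, p: float(contexts[t][p] @ theta_star))
print(np.round(theta_hat, 2))
```

Each round reveals only the reward of the chosen pair, which is why the confidence width is needed to keep exploring directions that are still uncertain.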

SLIDE 8

Unified Optimization

Solve the objective function in the dual space
For an arbitrary loss function
Using the Fenchel-Legendre conjugate of the loss
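As a worked instance of the conjugate step, assume a squared loss on each observed pair, $\ell_i(f) = \frac{C}{2}(r_i - f)^2$, where f is the model's prediction (this scaling is our assumption). Its Fenchel-Legendre conjugate contributes the following terms to the dual:

```latex
\ell_i^*(s) = \sup_f \Big[\, s f - \tfrac{C}{2}(r_i - f)^2 \,\Big]
            = s\, r_i + \frac{s^2}{2C}
\qquad\Longrightarrow\qquad
-\sum_i \ell_i^*(-\alpha_i) = r^\top\alpha - \frac{1}{2C}\,\alpha^\top\alpha
```

The supremum is attained at $f = r_i + s/C$. Under this assumption the result matches the $r^\top\alpha$ and $-\frac{1}{2C}\alpha^\top\alpha$ terms of the squared-loss objective, while the regularizer contributes the remaining quadratic term in $\alpha$.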

SLIDE 9

Squared Loss

The problem reduces to a standard quadratic optimization problem
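The model parameters ($\alpha$, $\theta$, $\beta_j$) are obtained from

$$\max_{\alpha}\; -\frac{1}{2C}\,\alpha^\top\alpha + r^\top\alpha - \frac{1}{2}\,\alpha^\top\Big[ZZ^\top + \frac{1}{\mu}\Big(\sum_j \phi_j \otimes \phi_j^\top\Big) ZZ^\top\Big]\alpha$$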
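Because the squared-loss dual is an unconstrained concave quadratic in $\alpha$, it can be solved with a single linear system. The sketch below rests on our own reading, not confirmed by the slides: $\phi_j$ is the 0/1 indicator of which training pairs came from user j, the bracketed matrix combines $\sum_j \phi_j \phi_j^\top$ with $ZZ^\top$ elementwise, and the primal regularizes $\theta$ by $\frac{1}{2}\|\theta\|^2$ and each $\beta_j$ by $\frac{\mu}{2}\|\beta_j\|^2$ with loss $\frac{C}{2}(r_i - f_i)^2$. Under that assumption $\theta = Z^\top\alpha$ and $\beta_j = \frac{1}{\mu} Z^\top(\phi_j \odot \alpha)$. Function names are hypothetical.

```python
import numpy as np

def dual_kernel(Z, users, mu):
    """Bracketed matrix, read as ZZ^T + (1/mu) * (sum_j phi_j phi_j^T) * ZZ^T (elementwise, assumed)."""
    G = Z @ Z.T                                                    # ZZ^T
    same_user = (users[:, None] == users[None, :]).astype(float)  # sum_j phi_j phi_j^T
    return G + same_user * G / mu

def solve_squared_loss_dual(Z, r, users, C, mu):
    """Maximize -1/(2C) a^T a + r^T a - 1/2 a^T K a by solving (K + I/C) a = r."""
    K = dual_kernel(Z, users, mu)
    alpha = np.linalg.solve(K + np.eye(len(r)) / C, r)
    theta = Z.T @ alpha                                            # shared component
    betas = {int(j): Z.T @ (alpha * (users == j)) / mu            # one vector per user
             for j in np.unique(users)}
    return alpha, theta, betas

rng = np.random.default_rng(1)
Z = rng.normal(size=(8, 3))              # 8 observed pairs z_ik, d = 3
r = rng.normal(size=8)                   # their rewards
users = np.array([0, 0, 1, 1, 1, 2, 2, 0])
alpha, theta, betas = solve_squared_loss_dual(Z, r, users, C=2.0, mu=0.5)
```

For losses other than the squared loss (e.g. the hinge loss) the conjugate introduces constraints on $\alpha$, so a general QP solver would replace the single linear solve.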

SLIDE 10

Squared Loss

In the contextual bandit framework:
Mean: $\beta_t^\top z_{ik} + \theta^\top z_{ik}$
Confidence bound: $c\sqrt{z_{ik}^\top (Z^\top Z + \lambda I)^{-1} z_{ik}}$
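A sketch of the resulting scoring rule, assuming `Z_hist` holds the pairwise features observed so far (the Z in the bound), `beta_t` is the current user's personalized vector, and `c`, `lam` are the exploration and ridge constants; the function name is hypothetical. The learner then plays the candidate pair with the largest score.

```python
import numpy as np

def ucb_scores(Z_cand, beta_t, theta, Z_hist, lam=1.0, c=1.0):
    """Mean plus confidence bound for each candidate pair (one row of Z_cand per z_ik)."""
    d = Z_cand.shape[1]
    A = Z_hist.T @ Z_hist + lam * np.eye(d)            # Z^T Z + lambda I
    X = np.linalg.solve(A, Z_cand.T)                   # columns are A^{-1} z_ik
    width = c * np.sqrt(np.sum(Z_cand.T * X, axis=0))  # c * sqrt(z_ik^T A^{-1} z_ik)
    mean = Z_cand @ (beta_t + theta)                   # beta_t^T z_ik + theta^T z_ik
    return mean + width

Z_cand = np.array([[1.0, 0.0], [0.0, 2.0]])
Z_hist = np.array([[3.0, 0.0]])        # only the first direction has been observed
scores = ucb_scores(Z_cand, np.zeros(2), np.zeros(2), Z_hist)
best = int(np.argmax(scores))
```

With zero means, the score favors the second pair, whose direction has not been observed yet; this is the exploration effect of the bound. Solving the linear system instead of forming $(Z^\top Z + \lambda I)^{-1}$ explicitly is a numerical-stability choice.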

SLIDE 11

Algorithm

SLIDE 12

Summary

Personalized recommendation
Pairwise learning in bandit framework
Optimization in dual space
Learning algorithm for squared loss

SLIDE 13

Questions?

Thanks for your attention

Email: tavakol@leuphana.de